DEPARTMENT OF INFORMATION TECHNOLOGY
AZ5402- MATHEMATICAL FOUNDATIONS OF
DATA SCIENCE
ASSIGNMENT
Submitted by
SRINIVASAN S
2022510305
[Link]. AI&DS (4/8)
MADRAS INSTITUTE OF TECHNOLOGY
ANNA UNIVERSITY, MIT CAMPUS
GRAM-SCHMIDT ORTHOGONALIZATION, SVD, LEAST SQUARES APPROXIMATIONS
GRAM-SCHMIDT ORTHOGONALIZATION:
The Gram-Schmidt process is a method for orthogonalizing a set of vectors in an inner product
space, typically Euclidean space. Given a linearly independent set of vectors, the process
produces an orthogonal set of vectors that span the same subspace.
Steps of Gram-Schmidt Process:
• Start with a set of linearly independent vectors {v₁, v₂, …, vₙ}.
• Initialize the first orthogonal vector u₁ = v₁.
• For each subsequent vector vᵢ:
1. Project vᵢ onto each of the previous orthogonal vectors uⱼ.
2. Subtract these projections from vᵢ to obtain the orthogonal component uᵢ.
3. Normalize the resulting vectors if an orthonormal basis is needed.
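As a quick illustration (not part of the assignment code), a minimal NumPy sketch of these steps might look like the following; the function name `gram_schmidt` and the example vectors are made up for demonstration.

import numpy as np

def gram_schmidt(vectors):
    # Orthonormalize a list of linearly independent vectors
    basis = []
    for v in vectors:
        # Subtract the projection of v onto every basis vector found so far
        w = v - sum(np.dot(v, u) * u for u in basis)
        # Normalize so the returned basis is orthonormal
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

# Tiny example in R^2: the two rows of the output are orthogonal unit vectors
print(gram_schmidt([np.array([3.0, 1.0]), np.array([2.0, 2.0])]))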
SINGULAR VALUE DECOMPOSITION (SVD):
SVD is a factorization of a real or complex matrix that generalizes the eigendecomposition of
a square matrix to any m×n matrix. It represents a matrix A as the product of three matrices:
A = UΣVᵀ, where:
• U is an m×m orthogonal matrix.
• Σ is an m×n diagonal matrix with non-negative real numbers (singular values) on the
diagonal.
• V is an n×n orthogonal matrix.
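As a quick check of this factorization (illustrative only, not part of the assignment code), NumPy's built-in `np.linalg.svd` returns the three factors, and multiplying them back reproduces the original matrix; the small matrix below is made up for the example.

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])  # an arbitrary 3 x 2 matrix
U, s, VT = np.linalg.svd(A, full_matrices=True)      # s holds the singular values
Sigma = np.zeros(A.shape)                            # m x n diagonal matrix of singular values
np.fill_diagonal(Sigma, s)
print(np.allclose(A, U @ Sigma @ VT))                # True: A = U Σ Vᵀ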
Applications of SVD:
• Dimensionality Reduction: Reducing the number of features while preserving important
information.
• Noise Reduction: Removing noise from data.
• Solving Linear Systems: Finding least-squares solutions to linear systems.
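For example, keeping only the largest k singular values gives a rank-k approximation of a matrix, which is the idea behind the dimensionality reduction and image compression used later in this report. The snippet below is only an illustrative sketch using a random matrix.

import numpy as np

A = np.random.rand(100, 80)
U, s, VT = np.linalg.svd(A, full_matrices=False)

k = 10                                        # number of singular values to keep
A_k = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]   # rank-k approximation of A
print(A_k.shape, np.linalg.norm(A - A_k))     # same shape; the norm measures the approximation error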
LEAST SQUARES APPROXIMATIONS:
The least squares method is used to approximate the solution of overdetermined systems (more
equations than unknowns). It minimizes the sum of the squares of the residuals (differences
between observed and predicted values).
Formulation:
Given a matrix A and a vector b, the least squares solution is:
minₓ ‖Ax − b‖²
Normal Equations:
The solution can be found using the normal equations: AᵀAx = Aᵀb
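A small worked example (with made-up data, not from the assignment) of solving the normal equations in NumPy, checked against `np.linalg.lstsq`:

import numpy as np

# Overdetermined system: fit a line y = c0 + c1*x to four points
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.9, 5.1, 7.0])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # solve (AᵀA) x = Aᵀb
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # NumPy's built-in least-squares solver
print(x_normal, x_lstsq)                         # both give the same coefficients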
Using SVD for Least Squares:
If A = UΣVᵀ, the least squares solution can be found as x = VΣ⁺Uᵀb, where Σ⁺ is the pseudoinverse of Σ.
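A minimal sketch of this SVD route on the same made-up data, where Σ⁺ is formed by inverting the nonzero singular values:

import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.9, 5.1, 7.0])

U, s, VT = np.linalg.svd(A, full_matrices=False)
Sigma_pinv = np.diag(1.0 / s)                     # pseudoinverse of Σ (invert the nonzero singular values)
x_svd = VT.T @ Sigma_pinv @ U.T @ b               # x = V Σ⁺ Uᵀ b
print(np.allclose(x_svd, np.linalg.pinv(A) @ b))  # True: matches NumPy's pinv-based solution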
CODE:
IMPORTING DEPENDENCIES:
For SVD image compression, we need three libraries: PIL, numpy, and matplotlib (a minimal import block is sketched after this list).
• PIL is used to load images from files and convert them to numpy arrays for further
processing.
• NUMPY is used for creating and manipulating arrays, performing mathematical
operations, and handling linear algebra computations.
• MATPLOTLIB is used to visualize data. It provides tools to create various types of
plots and charts. In our case, we are going to visualize the image multiple times, such as
before and after compression.
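The import statements themselves are not shown in this report; a minimal block consistent with the code that follows would presumably look like this:

from PIL import Image            # load images from disk
import numpy as np               # arrays and linear algebra
import matplotlib.pyplot as plt  # plotting and image display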
LOADING THE IMAGE:
Now, for SVD image compression, we need an image. Let's begin by opening an image, loading it into a variable, converting it to a NumPy array, and then displaying it using `plt.imshow()`.
INPUT:
image = Image.open(r'C:\Users\srini\Downloads\test_image.jpg')
image_array = np.array(image)
plt.imshow(image_array)
plt.title("Happiest day 😍")
plt.show()
OUTPUT:
So here is an image of me trying to smile and not smile at the same time, but anyway, who
cares about the image? Let's start compressing the image using SVD.
IMAGE ARRAY SHAPE:
CODE:
image_array.shape
OUTPUT:
COMPRESSING THE ACTUAL COLORED IMAGE:
The image's size is 3264 x 2448 pixels, with 3 color channels (RGB), making it a 3-dimensional array. First, we'll compress a grayscale version of the image (visualized below with the magma colormap) and then move on to compressing the actual colored image.
CODE:
gray_image_array = np.dot(image_array[..., :3], [0.2989, 0.5870, 0.1140])
gray_image_array = gray_image_array.astype(np.uint8)
plt.imshow(gray_image_array, cmap='magma')
plt.title("MY PURPLE WORLD! 😮")
plt.show()
OUTPUT:
To convert the image to grayscale, the RGB channels are combined via a dot product with the weights `[0.2989, 0.5870, 0.1140]`, giving a 2-dimensional array in which each pixel is a weighted sum of its RGB values. These specific weights are chosen based on human perception, as the human eye is more sensitive to green light and less sensitive to blue light. This method of converting an image to grayscale is known as the luminosity method.
When visualizing the grayscale image using matplotlib, it's important to specify `cmap="gray"` in the `imshow()` function. This ensures that the image is displayed correctly as grayscale. If no colormap is specified, `imshow()` maps the single-channel values through its default colormap, leading to a potentially misleading false-color visualization.
CODE:
gray_image_array.shape
OUTPUT:
SVD COMPRESSION FOR GRAYSCALE IMAGE:
CODE:
def compute_svd(gray_image_array, k=50):
    # Work in float to avoid uint8 overflow when forming the covariance matrix
    gray_image_array = gray_image_array.astype(float)
    # Step 1: Compute the covariance matrix
    covariance_matrix = np.dot(gray_image_array, gray_image_array.T)
    # Step 2: Compute the eigendecomposition of the covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(covariance_matrix)
    # Step 3: Ensure all eigenvalues are non-negative
    eigenvalues = np.maximum(eigenvalues, 0)
    # Step 4: Sort eigenvalues and eigenvectors in descending order
    sorted_indices = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[sorted_indices]
    eigenvectors = eigenvectors[:, sorted_indices]
    # Step 5: Compute singular values and truncate to k dimensions
    singular_values = np.sqrt(eigenvalues[:k])
    # Step 6: Compute U and V matrices
    U = eigenvectors[:, :k]
    V = np.dot(gray_image_array.T, U)
    V = V / singular_values
    # Return the truncated matrices
    return U, singular_values, V.T
SVD:
Now we compute the SVD of the grayscale image using the `compute_svd()` function. Note that this function accepts the `k` value (the number of singular values to keep). Then we multiply the U, S, and V matrices to get our compressed image. After that, the image is clipped: any values that end up larger than 255 are set back to 255. Finally, we convert the compressed array to an unsigned 8-bit integer, because pixel values are 8-bit, ranging from 0 to 255.
CODE:
U, S, V = compute_svd(gray_image_array)
compressed_image = np.dot(U, np.dot(np.diag(S), V))
compressed_image = np.clip(compressed_image, 0, 255).astype(np.uint8)

def image_view(og, ci):
    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    plt.title('Original Image')
    plt.imshow(og, cmap='gray')
    plt.subplot(1, 2, 2)
    plt.title('Compressed Image')
    plt.imshow(ci, cmap='gray')
    plt.show()
DIFFERENCE BETWEEN COMPRESSED AND ACTUAL IMAGES:
The `image_view()` function defined above visualizes the compressed and actual images side by side so we can see the difference between them.
CODE:
image_view(gray_image_array, compressed_image)
OUTPUT:
From the image we can see that the compressed image is slightly pixelated.
def print_memory(gray_image_array, singular_values, compressed_image):
    original_size = gray_image_array.size * gray_image_array.itemsize
    # U and V come from the surrounding notebook scope (computed above)
    compressed_size = (U.size + singular_values.size + V.size) * compressed_image.itemsize
    memory_saved = original_size - compressed_size
    print(f"Original size: {original_size} bytes")
    print(f"Compressed size: {compressed_size} bytes")
    print(f"Memory saved: {memory_saved} bytes")
Now, we need to see how much memory we saved from compression. For that, a
`print_memory()` function is defined, which prints the original size, compressed size, and
memory saved.
IMAGE SIZE:
print_memory(gray_image_array, S, compressed_image)
OUTPUT:
From the above result, we can see that the original image takes up more memory than the compressed representation.
Now, let's try a different `k` value. Last time the `k` value was 50; now let's make it 30.
CODE:
U, S, V = compute_svd(gray_image_array, k=30)
compressed_image = np.dot(U, np.dot(np.diag(S), V))
compressed_image = np.clip(compressed_image, 0, 255).astype(np.uint8)
image_view(gray_image_array, compressed_image)
From the plot we can see that the image is more pixelated. I am looking like dooby dooby 😥
print_memory(gray_image_array, S, compressed_image)
OUTPUT:
SVD COMPRESSION FOR COLORED IMAGES:
Now, let's compress a colored image. The process is almost the same, but in this case, we are
separately calculating the SVD values for each layer (R, G, and B) and putting them all together
to form the compressed image.
CODE:
image_array = image_array.astype(float)
R = image_array[:, :, 0]
G = image_array[:, :, 1]
B = image_array[:, :, 2]
UR, SR, VTR = compute_svd(R)
UG, SG, VTG = compute_svd(G)
UB, SB, VTB = compute_svd(B)
R_compressed = np.dot(UR, np.dot(np.diag(SR), VTR))
G_compressed = np.dot(UG, np.dot(np.diag(SG), VTG))
B_compressed = np.dot(UB, np.dot(np.diag(SB), VTB))
compressed_image = np.stack((R_compressed, G_compressed, B_compressed), axis=2)
compressed_image = np.clip(compressed_image, 0, 255).astype(np.uint8)
def image_view_1(og, ci):
    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    plt.title('Original Image')
    plt.imshow(og)
    plt.subplot(1, 2, 2)
    plt.title('Compressed Image')
    plt.imshow(ci)
    plt.show()
INPUT:
image_view_1(np.array(image), compressed_image)
OUTPUT:
The compressed image is pixelated compared to the original image. Now, let's set the `k` value
to 30 and see the compressed image. By the way, the sizes of the images are listed below.
CODE:
def print_memory_color(image_array, compressed_image, UR, SR, VTR, UG, SG, VTG, UB, SB, VTB):
    original_size = image_array.size * image_array.itemsize
    compressed_size = (UR.size + SR.size + VTR.size +
                       UG.size + SG.size + VTG.size +
                       UB.size + SB.size + VTB.size) * compressed_image.itemsize
    memory_saved = original_size - compressed_size
    print(f"Original size: {original_size} bytes")
    print(f"Compressed size: {compressed_size} bytes")
    print(f"Memory saved: {memory_saved} bytes")
print_memory_color(image_array, compressed_image, UR, SR, VTR, UG, SG, VTG, UB, SB, VTB)
OUTPUT:
CODE:
UR, SR, VTR = compute_svd(R, 30)
UG, SG, VTG = compute_svd(G, 30)
UB, SB, VTB = compute_svd(B, 30)
R_compressed = np.dot(UR, np.dot(np.diag(SR), VTR))
G_compressed = np.dot(UG, np.dot(np.diag(SG), VTG))
B_compressed = np.dot(UB, np.dot(np.diag(SB), VTB))
compressed_image = np.stack((R_compressed, G_compressed, B_compressed), axis=2)
compressed_image = np.clip(compressed_image, 0, 255).astype(np.uint8)
image_view_1(np.array(image), compressed_image)
OUTPUT:
From the information below, we can see how much memory the compressed image is taking.
INPUT:
print_memory_color(image_array, compressed_image, UR, SR, VTR, UG, SG, VTG, UB, SB, VTB)
OUTPUT:
MEAN SQUARED ERROR:
def mean_squared_error(original, compressed):
    # Cast to float so the uint8 subtraction does not wrap around
    return np.mean((original.astype(float) - compressed.astype(float)) ** 2)

mse = mean_squared_error(np.array(image), compressed_image)
print(f"Mean Squared Error (MSE): {mse:.2f}")
OUTPUT:
To find the best value for k (rank) that produces the optimal compressed image, we will use
the Mean Squared Error (MSE) as the error measure. We will loop through different rank
values, compute the MSE for each value, and then plot the rank against the MSE value. This
will help us find the elbow point in the graph, indicating the best rank to use for optimal
compression.
CODE:
temp_img = np.array(image)
ranks = []
mse_val = []
for r in range(1, 201, 20):
    UR, SR, VTR = compute_svd(R, r)
    UG, SG, VTG = compute_svd(G, r)
    UB, SB, VTB = compute_svd(B, r)
    R_compressed = np.dot(UR, np.dot(np.diag(SR), VTR))
    G_compressed = np.dot(UG, np.dot(np.diag(SG), VTG))
    B_compressed = np.dot(UB, np.dot(np.diag(SB), VTB))
    compressed_image = np.stack((R_compressed, G_compressed, B_compressed), axis=2)
    compressed_image = np.clip(compressed_image, 0, 255).astype(np.uint8)
    image_view_1(np.array(image), compressed_image)
    mse_value = mean_squared_error(temp_img, compressed_image)
    ranks.append(r)
    mse_val.append(mse_value)
    print(f"Rank {r} - MSE: {mse_value}")
OUTPUT:
Rank 1 - MSE: 106.5747486255787
Rank 21 - MSE: 76.79286458333333
Rank 41 - MSE: 63.85901294849537
Rank 61 - MSE: 55.235299840856484
Rank 81 - MSE: 48.640845630787034
Rank 101 - MSE: 42.59190212673611
Rank 121 - MSE: 36.85522352430556
Anyway, let's plot Rank versus Mean Squared Error to find the elbow point.
INPUT:
plt.scatter(ranks, mse_val, color='blue')
plt.plot(ranks, mse_val, color='red', linestyle='-', linewidth=1)
plt.grid(True)
plt.xlabel('Rank K')
plt.ylabel('MSE')
plt.title('Rank vs MSE')
plt.show()
OUTPUT:
From the above graph, we can see that after rank 41 the curve starts to flatten, so the best k values would be roughly in the range 41 to 71.
GRAM-SCHMIDT ORTHOGONALIZATION FOR FEATURE SELECTION:
The dataset chosen is the preprocessed PIMA Indian Diabetes dataset. The idea is to first take the column with the maximum variance as a starting point. Then, we compute the component that needs to be subtracted from each remaining column to make it orthogonal to this maximum-variance column, and store the orthogonalized values. The column with the smallest orthogonalization value is chosen, and it becomes the new reference column against which the remaining columns are orthogonalized. This process is repeated until all columns are ordered, and the dataframe is rearranged accordingly, so that its columns appear in order of maximum variability.
CODE:
import pandas as pd

# Example dataset
df = pd.read_csv(r"C:\Users\srini\Downloads\archive\[Link]")
target = df['target']
df.drop(columns='target', inplace=True)

# Step 1: Compute variance for each column
variances = df.var()

# Step 2: Select the column with the maximum variance
max_variance_col = variances.idxmax()
ordered_columns = [max_variance_col]

# Step 3: Perform Gram-Schmidt-like orthogonalization
remaining_columns = set(df.columns) - set(ordered_columns)
orthogonalized_columns = {col: df[col].copy() for col in df.columns}  # copies, so df itself is not modified
CODE:
while remaining_columns:
    max_col = ordered_columns[-1]
    orthogonalization_values = {}
    for col in remaining_columns:
        projection = (np.dot(df[col], orthogonalized_columns[max_col]) /
                      np.dot(orthogonalized_columns[max_col], orthogonalized_columns[max_col]))
        orthogonalized_columns[col] -= projection * orthogonalized_columns[max_col]
        orthogonalization_values[col] = np.linalg.norm(orthogonalized_columns[col])
    # Select the column with the least orthogonalization value
    next_col = min(orthogonalization_values, key=orthogonalization_values.get)
    ordered_columns.append(next_col)
    remaining_columns.remove(next_col)

# Rearrange the DataFrame columns
df_reordered = df[ordered_columns]
df_reordered
OUTPUT:
Now, to test this, let's train a KNN model. Well, the KNN model is a lazy algorithm, like
me, but for the sake of this example, let's use it. First, let's train the model using the first 5
columns and see its score.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X = np.array(df_reordered.iloc[:, :5])
y = np.array(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
INPUT:
knn = KNeighborsClassifier(n_neighbors=3)  # You can choose the number of neighbors (k)
knn.fit(X_train, y_train.ravel())
OUTPUT:
INPUT:
knn.score(X_test, y_test)
OUTPUT:
0.8474025974025974
Now, let's train another KNN model with all the columns and see its score.
CODE:
X = np.array(df)
y = np.array(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)  # You can choose the number of neighbors (k)
knn.fit(X_train, y_train.ravel())
OUTPUT:
INPUT:
knn.score(X_test, y_test)
OUTPUT:
0.8831168831168831
Comparing the two results, the model trained on only the first five reordered columns scores slightly lower than the model trained on all columns, but it gets reasonably close with far fewer features. And thus, feature selection using Gram-Schmidt orthogonalization is completed.