Applied Machine Learning 12-02-2025
Lecture 1-4: Machine Learning Algorithms
READ THE FOLLOWING CAREFULLY:
Deadline for Assignment Submission:
11:59 PM, 01 March 2025 (strict deadline; no late submissions will be accepted).
• Assignments must be submitted via the Taxila eLearn portal using the provided submission link.
• Use a Jupyter Notebook for your solutions:
– For theoretical questions: Solve them by hand and upload clear images of your handwritten
solutions into the Jupyter Notebook.
– For coding/implementation tasks: Write and execute your code directly in the notebook.
– Ensure that all images are properly displayed in the Jupyter Notebook before submission.
• Each answer must include the corresponding question number.
• File naming format: rollno firstname lastname assignmentno.ipynb
Failure to follow the guidelines may result in penalties.
3.1 Assignments
3.1.1 Programming Questions
Consider the Iris dataset. The dataset is available here: https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html.
1. Write a small paragraph describing the Iris dataset. 2 Marks
2. Identify the features/attributes in the Iris dataset. 4 Marks
3. Identify the total number of classes in the Iris dataset. 3 Marks
4. In a table, summarize the total number of data instances in each class. (Remember that tables and figures should have self-contained, appropriate captions.) 3 Marks
5. Split the Iris dataset randomly into training (80%) and testing (20%) sets (you can use sklearn's train-test split with random seed = 42). 2 Marks
6. In a table, provide the number of data instances used for training and testing for each class. 2 Marks
7. Using the train data (obtained after splitting the total data into training and testing), perform three-fold cross-validation to find the best value of k for the k-Nearest Neighbour classifier (the value of k can range from 1 to 25; use the Euclidean norm to compute the distance). You can use the k-fold cross-validation package provided in sklearn for hyperparameter tuning: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html. 5 Marks (An illustrative sketch of this split/tune/test workflow is given after this question block.)
8. Plot the average macro F1-score obtained from three-fold cross-validation against the different values of k considered. 3 Marks
9. Identify the best value of k, i.e. the value for which you get the peak performance in three-fold cross-validation. 2 Marks
10. Using the best value of k, evaluate the performance of the k-Nearest Neighbour classifier on the test data (remember: testing should be done only once!). 2 Marks
11. Report the test accuracy, precision, recall, F1-score and macro F1-score. 4 Marks
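The following is a minimal illustrative sketch of the workflow in questions 5-11 (80/20 split, three-fold cross-validation over k, a single final test evaluation), assuming scikit-learn is available; plotting the cross-validation scores and tabulating per-class counts are left to the student.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, f1_score

# 80/20 split with a fixed random seed (question 5)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Three-fold cross-validation over k = 1..25 using the macro F1-score (question 7)
kf = KFold(n_splits=3, shuffle=True, random_state=42)
cv_scores = {}
for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    cv_scores[k] = cross_val_score(knn, X_train, y_train, cv=kf, scoring="f1_macro").mean()

best_k = max(cv_scores, key=cv_scores.get)  # question 9

# Single final evaluation on the held-out test set (questions 10-11)
final_knn = KNeighborsClassifier(n_neighbors=best_k, metric="euclidean").fit(X_train, y_train)
y_pred = final_knn.predict(X_test)
print("Best k:", best_k)
print("Macro F1 on test data:", f1_score(y_test, y_pred, average="macro"))
print(classification_report(y_test, y_pred))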
3.1.2 Vector Space
12. Define the following (refer to Chapter 3 of the book Introduction to Linear Algebra (Fifth Edition) by Prof. Gilbert Strang):
• Vector Space. 1 Mark
• Column Space of a Matrix A. 1 Mark
• Row Space of a Matrix A. 1 Mark
• Right Null Space of a Matrix A. 1 Mark
Explanation: For an m × n matrix A, the set of all vectors x ∈ R^n satisfying Ax = 0 is the right null space. This set of all such x is itself a vector space because it satisfies the properties of a vector space. For example, let x1 ∈ R^n and x2 ∈ R^n be such that Ax1 = 0 and Ax2 = 0. Then A(x1 + x2) = Ax1 + Ax2 = 0 (closed under vector addition). Now let c ≠ 0 be a scalar; then A(cx1) = cAx1 = 0 (closed under scalar multiplication). (A small numerical check of this closure argument is sketched after this list.)
• Left Null Space of a Matrix A. 1 Mark
• Dimension of a Vector Space. 1 Mark
• Basis set of a Vector Space. 1 Mark
• Rank of a Matrix A. 1 Mark
• L2 norm of a vector x. 1 Mark
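As a minimal numerical illustration of the closure argument in the Right Null Space item above (the matrix A used here is an arbitrary rank-deficient example chosen for the sketch, not one taken from the assignment):

import numpy as np
from sympy import Matrix

# An arbitrary rank-deficient example matrix (illustrative only)
A = np.array([[1, 2, 3],
              [2, 4, 6]])

# Basis vectors of the (right) null space, converted to float column vectors
null_basis = [np.array(v).astype(float) for v in Matrix(A).nullspace()]
x1, x2 = null_basis[0], null_basis[1]

# Closure under vector addition and scalar multiplication: A(x1 + x2) = 0 and A(c x1) = 0
c = 3.5
print(np.allclose(A @ (x1 + x2), 0))  # True
print(np.allclose(A @ (c * x1), 0))   # True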
Fill in the blanks:
13. Ax = b has a solution when b lies in the column space of A. 1 Mark
14. Two nonzero vectors are orthogonal when their dot product is zero. 2 Marks
15. Two nonzero vectors are orthonormal when their dot product is zero and the L2 norm of each vector is 1. 2 Marks
16. Consider a matrix A of size m × n and B = [A A] of size m × 2n (A repeated twice). A and B have the same column space and left null space. 2 Marks
17. Are the following statements True or False? Justify or give examples to support your reasoning.
• Orthogonality of two nonzero vectors implies linear independence. 2 Marks True
• Linear independence of two vectors implies orthogonality. 2 Marks False
• The dimensions of the row space and the column space of an m × n matrix A are the same. 2 Marks True
• The row rank and the column rank of an m × n matrix A are the same. 2 Marks True
• If two m × n matrices A and B have the same row space, column space, right null space and left
null space, then A = B. 2 Marks False
18. For the given matrix A, find the basis set for the column space and the row space. Also geometrically depict the basis set that spans the column space. 5 Marks (An illustrative SymPy sketch is given after the matrix below.)
A = [ 1 2 3 4
      2 4 6 8 ]
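A minimal SymPy sketch for computing the requested bases (the geometric depiction of the column-space basis is left to the student):

from sympy import Matrix

A = Matrix([[1, 2, 3, 4],
            [2, 4, 6, 8]])

# Basis for the column space: the pivot columns of A
col_basis = A.columnspace()
# Basis for the row space: the nonzero rows of the reduced row echelon form
row_basis = A.rowspace()

print("Column space basis:", col_basis)  # here a single vector, since rank(A) = 1
print("Row space basis:", row_basis)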
3.1.3 Programming Questions
19. Create a random 5 × 4 matrix A with rank 2 and a 5 × 1 vector b such that Ax = b has infinitely many solutions. Write the Python code and also generate several of these solutions using a loop. 5 Marks
## Creating a matrix with rank 2
import numpy as np
from numpy.linalg import matrix_rank
from sympy import Matrix

# A = CD has rank at most 2; with random integer factors the rank is 2 with high probability
C = np.random.randint(10, size=(5, 2))
D = np.random.randint(5, size=(2, 4))
A = np.dot(C, D)
print("The rank of A = CD is", matrix_rank(A))

## Ax = b with infinitely many solutions: choose b inside the column space of A
b = np.dot(A, np.random.randint(5, size=(A.shape[1], 1)))

# Particular solution (via the pseudo-inverse) plus an arbitrary null-space component
null_space_A = Matrix(A).nullspace()
null_basis = np.array(null_space_A).astype(float)[:, :, 0].T

for sol in range(3, 10):
    coeffs = np.random.randint(sol, size=(A.shape[1] - matrix_rank(A), 1))
    x = np.dot(np.linalg.pinv(A), b) + np.dot(null_basis, coeffs)
    if np.allclose(np.dot(A, x), b):
        print("True")
20. Create a 3 × 4 matrix with rank 3 and check whether the right null space and the left null space exist (i.e., contain nonzero vectors). Comment on your observation. Write Python code to verify. 2 Marks
import numpy as np
from numpy.linalg import matrix_rank
from sympy import Matrix

# A = CD is a 3 x 4 matrix; with random integer factors its rank is 3 with high probability
C = np.random.randint(10, size=(3, 3))
D = np.random.randint(5, size=(3, 4))
A = np.dot(C, D)
print("The rank of A = CD is", matrix_rank(A))

# Right null space: dimension n - rank = 4 - 3 = 1, so a nontrivial basis vector exists
right_null_space_A = Matrix(A).nullspace()
print("Right null space =", right_null_space_A)
print("Checking A x_right_null_space = 0:", np.dot(A, np.array(right_null_space_A[0]).astype(float)))

# Left null space: dimension m - rank = 3 - 3 = 0, so it contains only the zero vector;
# the list below is empty, which is why the last check must stay commented out
left_null_space_A = Matrix(A.T).nullspace()
print("Left null space =", left_null_space_A)
# print("Checking A.T x_left_null_space = 0:", np.dot(A.T, np.array(left_null_space_A[0]).astype(float)))
21. Is it possible to create a no-solution case for the above question? Justify your answer (Yes or No). 1 Mark
No. A is a 3 × 4 matrix with rank 3, so the dimension of its column space is 3 and the column space is all of R^3. Any b vector of size 3 × 1 can therefore be obtained as a linear combination of the columns of A, and Ax = b always has a solution.
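A minimal numerical check of this claim (the rank-3 matrix below is an illustrative stand-in for the one generated in the previous question): for every b, the augmented matrix [A | b] has the same rank as A, so Ax = b is consistent.

import numpy as np
from numpy.linalg import matrix_rank

# Illustrative 3 x 4 matrix of rank 3 (any full-row-rank matrix works here)
A = np.array([[1, 0, 0, 1],
              [0, 1, 0, 2],
              [0, 0, 1, 3]])

for _ in range(5):
    b = np.random.randint(-10, 10, size=(3, 1))
    augmented = np.hstack([A, b])
    # Consistent system: rank([A | b]) == rank(A)
    print(matrix_rank(augmented) == matrix_rank(A))  # always True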
22. Write Python code to generate ten b vectors such that Ax = b has no solution. The matrix A is given below. 5 Marks
A = [ 1 2  3  4
      2 3  4  5
      5 8 11 14
      3 5  7  9 ]
import numpy as np
from numpy.linalg import matrix_rank
from sympy import Matrix

A = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [5, 8, 11, 14], [3, 5, 7, 9]])

A_column_space_basis = np.array(Matrix(A).columnspace()).astype(float)[:, :, 0].T
print("Basis for column space of A =\n", A_column_space_basis)

A_left_null_space_basis = np.array(Matrix(A.T).nullspace()).astype(float)[:, :, 0].T
print("Basis for left null space of A =\n", A_left_null_space_basis)

print("Rank of A =", matrix_rank(A))

## Creating ten b vectors with no solution
# b = (column-space component) + (nonzero left-null-space component), so b does not lie in
# the column space of A and Ax = b is inconsistent: rank([A | b]) > rank(A)
for i in range(10):
    b = (np.dot(A_column_space_basis,
                np.random.randint(5, size=(A_column_space_basis.shape[1], 1)))
         + np.dot(A_left_null_space_basis,
                  np.random.randint(1, 5, size=(A_left_null_space_basis.shape[1], 1))))
    # np.linalg.solve(A.astype(float), b) would fail here, since A itself is singular (rank 2)
    check_matrix = np.hstack([A, b])
    print("Rank of [A | b] =", matrix_rank(check_matrix))
3.1.4 Linear Regression using Least Squares
23. Mathematically derive the matrix formulation for linear regression. 2 Marks
We consider the system where Ax ≠ b, meaning b is not in the column space of A. Instead, we approximate b by finding the x that minimizes the error:
Ax + e = b (3.1)
e = b − Ax (3.2)
At the minimum, the error vector e is orthogonal to the column space of A, meaning A^T e = 0.
Substituting e = b − Ax gives A^T (b − Ax) = 0, i.e. the normal equation A^T A x = A^T b. Solving for x (assuming A^T A is invertible): x = (A^T A)^{-1} A^T b.
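A minimal numerical sketch of the normal-equation solution derived above; the small data matrix and right-hand side below are arbitrary illustrations, not part of the assignment.

import numpy as np

# Illustrative overdetermined system: more equations than unknowns
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([[1.1], [1.9], [3.2], [3.9]])

# Normal equation: x = (A^T A)^{-1} A^T b
x_ls = np.linalg.inv(A.T @ A) @ A.T @ b

# The residual e = b - A x is orthogonal to the column space of A, so A^T e is (numerically) zero
e = b - A @ x_ls
print("x_ls =", x_ls.ravel())
print("A^T e =", (A.T @ e).ravel())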
24. Does the following system of linear equations Ax = b have a solution? If it does not have a solution, can you find an approximate solution using the following: 1 Mark
No, b does not lie in the column space of A.
• Method of least squares (you can use Python for this), and justify why the system of linear equations does not have a solution. 2 Marks
The system of linear equations Ax = b is as follows:
[ 1 0 ]   [ v11 ]   [ 1 ]
[ 0 1 ] · [ v21 ] = [ 1 ]
[ 0 0 ]             [ 1 ]
Step-by-Step Solution for the Given System Ax = b
Step 1: Understanding the Given System
We have the system of linear equations:
Ax = b
where:
A = [ 1 0 ]    x = [ v11 ]    b = [ 1 ]
    [ 0 1 ]        [ v21 ]        [ 1 ]
    [ 0 0 ]                       [ 1 ]
Step 2: Checking for a Solution
For the system Ax = b to have a solution, b must be in the column space (range) of A. The column
space of A is given by:
Col(A) = span{ (1, 0, 0)^T, (0, 1, 0)^T }
Since the third component of b is 1, but all vectors in Col(A) have a zero in the third component, b is
not in Col(A). Thus, the system has no exact solution.
Step 3: Finding an Approximate Solution Using Least Squares
Since there is no exact solution, we use the least squares method, which minimizes the error
e = b − Ax. The least squares solution is given by:
x_LS = (A^T A)^{-1} A^T b
Step 3.1: Compute A^T A

A^T = [ 1 0 0 ]
      [ 0 1 0 ]

A^T A = [ 1 0 ]
        [ 0 1 ]
Step 3.2: Compute A^T b

A^T b = [ 1 0 0 ] [ 1 ]   [ 1 ]
        [ 0 1 0 ] [ 1 ] = [ 1 ]
                  [ 1 ]
Step 3.3: Compute x_LS

Since A^T A is the identity matrix:

(A^T A)^{-1} = I

x_LS = I · A^T b = [ 1 ]
                   [ 1 ]

Thus, the least squares solution is:

x_LS = [ 1 ]
       [ 1 ]
Step 4: Python Implementation
To verify our results, we can use Python:
import numpy as np

# Define A and b
A = np.array([[1, 0], [0, 1], [0, 0]])
b = np.array([[1], [1], [1]])

# Compute the least squares solution via the pseudo-inverse
x_ls = np.linalg.pinv(A) @ b

print("Least squares solution x:", x_ls)
Thus, the least squares solution is:

x = [ 1 ]
    [ 1 ]
25. For the data (data.txt) attached in the email, find the following using Python: 2 Marks
• Find the line that best fits the data with minimum error (sum of squares). [Do not use inbuilt code in Python.]
• Find a second-degree, third-degree and fourth-degree polynomial that fits the data, respectively. Also find the error in each case and note down your inference. [Do not use inbuilt code in Python; refer to the slides for help.] 2 Marks