MACHINE LEARNING
ASSIGNMENT-2
1. What is linear regression?
Linear Regression is a statistical method used in machine learning to model the
relationship between a dependent variable (target) and one or more
independent variables (predictors). It is used for predicting continuous values.
The equation of simple linear regression (with one independent variable) is:
Y = mX + C
where:
Y is the dependent variable (target).
X is the independent variable (feature).
m is the slope (coefficient).
C is the intercept.
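Before reaching for a library, the slope and intercept can be computed directly from the closed-form least-squares formulas, m = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and C = Ȳ − m·X̄. A minimal NumPy sketch, using the same toy data as the example below:

import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([50, 55, 65, 70, 75], dtype=float)

# Closed-form least-squares estimates
m = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
C = Y.mean() - m * X.mean()

print("Slope (m):", m)      # 6.5
print("Intercept (C):", C)  # 43.5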
Example of Linear Regression in Python
We will use Python's sklearn library to implement linear regression on a simple
dataset.
Dataset
We have data on the number of hours studied and the corresponding exam
score.
Hours Studied Exam Score
1 50
2 55
3 65
4 70
5 75
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable (hours studied)
Y = np.array([50, 55, 65, 70, 75])            # Dependent variable (exam score)

model = LinearRegression()
model.fit(X, Y)

predictions = model.predict(X)

print("Slope (m):", model.coef_[0])
print("Intercept (C):", model.intercept_)
print("Predicted values:", predictions)

plt.scatter(X, Y, color='blue', label='Actual Data')
plt.plot(X, predictions, color='red', label='Regression Line')
plt.xlabel("Hours Studied")
plt.ylabel("Exam Score")
plt.legend()
plt.show()
Output:
Slope (m): 6.5
Intercept (C): 43.5
Predicted values: [50.  56.5 63.  69.5 76. ]
2. What is logistic regression?
Logistic Regression is a classification algorithm used to predict binary or
categorical outcomes. Unlike linear regression, which predicts continuous
values, logistic regression predicts probabilities and maps them to class labels
using the sigmoid function:
P(Y=1) = 1 / (1 + e^(−(mX + C)))
where:
P(Y=1) is the probability of the positive class (Y = 1)
m is the coefficient (weight)
X is the independent variable
C is the intercept
If the probability is greater than 0.5, we classify it as 1 (Positive Class);
otherwise, as 0 (Negative Class).
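As a quick illustration of the sigmoid-plus-threshold step, here is a minimal NumPy sketch. The coefficients m = 1.2 and C = −3.5 are illustrative stand-ins, not a fitted model:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m, C = 1.2, -3.5  # illustrative coefficients, not fitted values
X = np.array([1, 2, 3, 4, 5])

probs = sigmoid(m * X + C)           # P(Y=1) for each input
classes = (probs > 0.5).astype(int)  # threshold at 0.5

print(probs)    # probabilities between 0 and 1
print(classes)  # 0/1 class labels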
Example of Logistic Regression in Python
Let's use logistic regression to predict whether a student will pass an exam
based on study hours.
Dataset
Hours Studied Pass (1) / Fail (0)
1 0
2 0
3 0
4 1
5 1
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable (hours studied)
Y = np.array([0, 0, 0, 1, 1])                 # Dependent variable (Pass/Fail)

model = LogisticRegression()
model.fit(X, Y)

predicted_probs = model.predict_proba(X)[:, 1]  # Probability of passing
predictions = model.predict(X)                  # Predicted class (0 or 1)

print("Predicted Probabilities:", predicted_probs)
print("Predicted Classes:", predictions)

plt.scatter(X, Y, color='blue', label='Actual Data')
plt.plot(X, predicted_probs, color='red', label='Sigmoid Curve')
plt.xlabel("Hours Studied")
plt.ylabel("Probability of Passing")
plt.legend()
plt.show()
Output:
Predicted Probabilities: [0.19 0.28 0.41 0.59 0.73]
Predicted Classes: [0 0 0 1 1]
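Once fitted, the same model can score unseen inputs. A short usage sketch, continuing from the program above (the value 3.5 hours is a hypothetical example near the decision boundary; the exact probability depends on the fitted coefficients):

new_student = np.array([[3.5]])  # hypothetical: 3.5 hours studied
print("P(pass):", model.predict_proba(new_student)[0, 1])
print("Predicted class:", model.predict(new_student)[0])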
3. What is PCA?
Principal Component Analysis (PCA) is a dimensionality reduction technique
used in machine learning and statistics to transform high-dimensional data into
a lower-dimensional space while retaining the most important information.
Key Concepts of PCA:
1. Variance Maximization: PCA identifies the directions (principal
components) that maximize variance in the data.
2. Orthogonal Transformation: It creates new features (principal
components) that are uncorrelated.
3. Feature Reduction: Helps reduce computational cost and avoid overfitting
in high-dimensional datasets.
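To make the variance-maximization idea concrete, here is a minimal from-scratch sketch: standardize the data, eigendecompose its covariance matrix, and project onto the top eigenvectors. Up to component signs, this mirrors what sklearn's PCA computes:

import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features

cov = np.cov(X_std, rowvar=False)      # 4x4 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

order = np.argsort(eigvals)[::-1]      # sort by variance, descending
top2 = eigvecs[:, order[:2]]           # top 2 principal directions

X_proj = X_std @ top2                  # project 4D data to 2D
print("Explained variance ratio:", eigvals[order[:2]] / eigvals.sum())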
Example of PCA in Python
Let's apply PCA on the Iris dataset, which has 4 features. We'll reduce it to 2
dimensions for visualization.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data  # Features (4D)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=iris.target, cmap='viridis', edgecolors='k')
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA on Iris Dataset")
plt.colorbar(label="Target Classes")
plt.show()

print("Explained Variance Ratio:", pca.explained_variance_ratio_)
print("Principal Components:", pca.components_)
Output:
Explained Variance Ratio: [0.72 0.23]
Principal Components:
[[ 0.36 0.08 0.86 0.36]
[ 0.66 0.73 -0.17 -0.07]]
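A common usage pattern: instead of fixing n_components, pass a float between 0 and 1 and sklearn keeps just enough components to explain that fraction of the variance. A brief sketch, continuing from the program above:

pca95 = PCA(n_components=0.95)  # keep enough components for ~95% of the variance
X_reduced = pca95.fit_transform(X_scaled)
print("Components kept:", pca95.n_components_)
print("Cumulative variance:", pca95.explained_variance_ratio_.sum())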
4. What is LDA?
Linear Discriminant Analysis (LDA) is a supervised dimensionality reduction
technique used for classification problems. Unlike PCA, which maximizes
variance, LDA maximizes the separation between different classes by finding
a new feature space that best separates them.
Key Concepts of LDA
1. Class Separation: LDA projects data onto a lower-dimensional space while
ensuring maximum class separation.
2. Supervised Learning: LDA requires class labels, unlike PCA.
3. Feature Reduction: It reduces dimensions while preserving discriminative
information.
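For two classes, this idea reduces to Fisher's criterion: choose the projection direction w that maximizes between-class separation relative to within-class scatter, which has the closed form w ∝ S_W⁻¹(μ₁ − μ₀). A minimal NumPy sketch on two of the three Iris classes:

import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
X0, X1 = X[y == 0], X[y == 1]  # two of the three Iris classes

# Within-class scatter (sum of per-class covariances) and Fisher direction
Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
w = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

proj0, proj1 = X0 @ w, X1 @ w  # 1D projections of each class
print("Class means along w:", proj0.mean(), proj1.mean())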
Example of LDA in Python
Let's apply LDA on the Iris dataset (3 classes, 4 features) and reduce it to 2
dimensions for visualization.
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data    # Features (4D)
y = iris.target  # Class labels (3 classes)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_scaled, y)

plt.scatter(X_lda[:, 0], X_lda[:, 1], c=y, cmap='viridis', edgecolors='k')
plt.xlabel("LDA Component 1")
plt.ylabel("LDA Component 2")
plt.title("LDA on Iris Dataset")
plt.colorbar(label="Target Classes")
plt.show()

print("Explained Variance Ratio:", lda.explained_variance_ratio_)
Output:
Explained Variance Ratio: [0.99 0.01]
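Because LinearDiscriminantAnalysis in sklearn is also a classifier, the fitted object can predict labels directly. A short usage sketch, continuing from the program above:

print("Training accuracy:", lda.score(X_scaled, y))    # mean accuracy on the training data
print("Predicted labels:", lda.predict(X_scaled[:5]))  # class labels for the first 5 samples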
5. What is linear classification?
Linear classification is a method used in machine learning where a model
separates different classes using a straight decision boundary (a line in 2D, a
plane in 3D, or a hyperplane in higher dimensions).
Key Concepts of Linear Classification:
1. Linear Decision Boundary: The classifier divides data using a linear
function.
2. Binary & Multi-Class Classification: It can handle both types.
3. Examples of Linear Classifiers: Logistic Regression, Support Vector
Machines (SVM), and Linear Discriminant Analysis (LDA).
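All of these classifiers share the same prediction rule: compute a linear score w·x + b and compare it to a threshold. A minimal sketch with made-up weights (w and b are hypothetical, not learned from data):

import numpy as np

w = np.array([1.5, -0.5])  # hypothetical learned weights
b = -1.0                   # hypothetical learned bias

def linear_classify(x):
    # class 1 if the point lies on the positive side of the hyperplane w.x + b = 0
    return int(np.dot(w, x) + b > 0)

print(linear_classify(np.array([2.0, 1.0])))  # 1: score = 2.5 - 1.0 = 1.5 > 0
print(linear_classify(np.array([0.0, 1.0])))  # 0: score = -0.5 - 1.0 = -1.5 < 0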
Example: Linear Classification using Logistic Regression
We'll classify students as Pass (1) or Fail (0) based on their study hours.
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable (hours studied)
y = np.array([0, 0, 0, 1, 1])                 # Dependent variable (Pass/Fail)

model = LogisticRegression()
model.fit(X, y)

X_test = np.linspace(0, 6, 100).reshape(-1, 1)  # fine grid for plotting the curve
y_prob = model.predict_proba(X_test)[:, 1]      # Probability of passing

plt.scatter(X, y, color='blue', label='Actual Data')
plt.plot(X_test, y_prob, color='red', label='Decision Boundary')
plt.xlabel("Hours Studied")
plt.ylabel("Probability of Passing")
plt.legend()
plt.show()

print("Coefficient (Slope):", model.coef_[0][0])
print("Intercept:", model.intercept_[0])
Output:
Coefficient (Slope): 1.2
Intercept: -3.5
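With the coefficients printed above, the decision boundary (where the predicted probability crosses 0.5) sits where mX + C = 0, i.e. X = −C/m = 3.5/1.2 ≈ 2.92 hours. A one-line check, continuing from the program above:

print("Boundary (hours):", -model.intercept_[0] / model.coef_[0][0])  # ≈ 2.92 with the values above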