NOTES - UNIT 2 - Machine Learning
Linear Regression is one of the simplest and most commonly used machine learning algorithms for
predicting numerical values. It is used to find a relationship between independent (input) and
dependent (output) variables.
Key Concept
Linear Regression assumes that there is a straight-line (linear) relationship between the input variables
(features) and the output variable (target). It tries to fit the best possible line that represents the
relationship between them.
Formula of Linear Regression:
Y=mX+c
Where:
Y = Predicted output (dependent variable)
X = Input feature (independent variable)
m = Slope of the line (shows how much Y changes with X)
c = Intercept (where the line crosses the Y-axis)
For example, suppose our data shows that sales increase as advertising cost increases. We can then fit a
straight-line equation to predict future sales.
Y=10X+0
This means:
If you spend ₹1,000 on advertising, sales will be ₹10,000.
If you spend ₹5,000, predicted sales will be ₹50,000 (since Y = 10(5000) + 0 = 50,000).
How Does Linear Regression Work?
1. Collect Data – Gather input (X) and output (Y) values.
2. Plot Data – Visualize the relationship between X and Y.
3. Find the Best Fit Line – The algorithm calculates m (slope) and c (intercept).
4. Make Predictions – Use the formula to predict Y for any given X.
5. Evaluate Model – Check accuracy using metrics like Mean Squared Error (MSE).
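As a quick illustration of these steps, here is a minimal Python sketch using scikit-learn; the advertising figures are invented for demonstration, and the plotting step is omitted:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np
# Step 1: collected data (invented advertising spend vs. sales, in thousands of rupees)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)   # advertising cost
Y = np.array([10, 20, 30, 40, 50])             # sales
# Step 3: fit the best line (the algorithm finds m and c)
model = LinearRegression()
model.fit(X, Y)
print("Slope m:", model.coef_[0], "Intercept c:", model.intercept_)
# Step 4: make a prediction for a new X
print("Predicted sales for X = 6:", model.predict([[6]])[0])
# Step 5: evaluate with Mean Squared Error
print("MSE:", mean_squared_error(Y, model.predict(X)))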
Conclusion
Linear Regression is a simple yet powerful technique used to predict numerical values based on past data.
It tries to find a straight-line relationship between the input (X) and the output (Y). It is used in sales
forecasting, stock price prediction, and many other real-world applications. If the data follows a linear
pattern, Linear Regression can be an accurate and effective model.
In Linear Regression, we find the best straight-line equation that predicts the output based on input
values. The equation is:
y=mx+c
How are the Coefficients Estimated?
The goal is to find m and c so that the predicted values are as close as possible to the actual values in the
dataset. This is done using the Least Squares Method (also called Ordinary Least Squares - OLS).
- Steps to Estimate Coefficients:
1. Collect Data – Gather historical data with input (x) and output (y).
2. Find the Best Fit Line – The line that minimizes the error (difference between actual and
predicted values).
3. Use Mathematical Formulas to calculate m and c:
m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
c = ȳ − m·x̄
where x̄ and ȳ are the means of x and y.
Summary
We estimate coefficients (m and c) using the Least Squares Method.
Slope (m) tells us the relationship strength between x and y.
Intercept (c) tells us the starting value when x=0.
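A minimal by-hand sketch of these formulas in plain Python; the tiny dataset is invented so the answer is easy to verify (the true line is y = 2x + 1):
# Least Squares by hand: m = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)², c = ȳ − m·x̄
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]           # exactly y = 2x + 1
x_mean = sum(x) / len(x)   # 2.5
y_mean = sum(y) / len(y)   # 6.0
num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
den = sum((xi - x_mean) ** 2 for xi in x)
m = num / den              # slope -> 2.0
c = y_mean - m * x_mean    # intercept -> 1.0
print("m =", m, " c =", c)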
Ordinary Least Squares (OLS) in Machine Learning
What is OLS?
Ordinary Least Squares (OLS) is a method used in linear regression to find the best-fitting line through
a set of data points. It minimizes the difference between the actual values and the predicted values by
reducing the sum of squared errors.
How Does OLS Work?
1. Linear Equation
The relationship between input (X) and output (Y) is represented as:
Y=mX+c
where:
Y = dependent variable (output)
X = independent variable (input)
m = slope (coefficient)
c = intercept
2. Calculating the Best Line
OLS finds the best values for m and c by minimizing the sum of squared differences between actual and
predicted values.
Advantages of OLS
✓ Simple and Easy to Understand – OLS is easy to implement and interpret.
✓ Efficient for Small Datasets – Works well when the dataset is small and clean.
✓ Minimizes Errors Effectively – Reduces the sum of squared errors, leading to accurate predictions in linear relationships.
✓ Mathematically Proven – Based on strong statistical foundations, making it a reliable method for linear regression.
✓ Used in Many Applications – Common in finance, economics, and machine learning for predictive modeling.
Disadvantages of OLS
+ Assumes a Linear Relationship – Does not work well if the relationship between variables
is not linear.
+ Sensitive to Outliers – Large errors from outliers can affect the accuracy of predictions.
+ Multicollinearity Issue – If independent variables are highly correlated, OLS may not work
properly.
+ Not Suitable for High-Dimensional Data – Struggles when there are too many variables or
complex relationships.
+ Assumes Constant Variance – Assumes that errors are evenly distributed, which may not
always be true.
Conclusion
OLS is a fundamental technique in machine learning for linear regression. It helps in understanding and
predicting relationships between variables by minimizing errors.
OLS is a simple and powerful technique used for linear regression. It is quite effective for small, clean
datasets, but if the data is nonlinear or has many outliers, its performance suffers. OLS also struggles with
multicollinearity and high-dimensional data. Therefore, OLS should be used only when the data is linear
and its assumptions are satisfied. If the data is complex or large, advanced techniques such as Ridge, Lasso,
or Non-Linear Regression may work better.
The Standard Error measures how precisely a coefficient is estimated:
SE = σ / √n
where,
σ = residual standard deviation
n = sample size
Conclusion
In machine learning, checking the accuracy of coefficients is important so that we can make reliable
predictions. Metrics such as Standard Error, p-Value, Confidence Interval, R-squared, and VIF tell us
how accurate the model's coefficients are. If accuracy is low, we should use feature selection, data
cleaning, or advanced regression techniques.
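These metrics are not derived further in these notes, but as a quick illustration, here is a minimal sketch using the statsmodels library (an assumption; the notes' own examples use scikit-learn) on invented data:
import numpy as np
import statsmodels.api as sm
# Invented data: y ≈ 3x + noise
rng = np.random.default_rng(0)
x = np.arange(1, 21, dtype=float)
y = 3 * x + rng.normal(0, 2, size=20)
X = sm.add_constant(x)          # adds the intercept column
result = sm.OLS(y, X).fit()     # Ordinary Least Squares fit
# The summary shows each coefficient with its standard error, p-value,
# 95% confidence interval, and the model's R-squared
print(result.summary())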
How Do We Check a Model's Accuracy in Machine Learning?
When we train a machine learning model, it's important to check its accuracy to determine whether the
model is predicting correctly or not. Accuracy refers to how many correct predictions the model is
making.
✓ Confusion Matrix
It shows TP, TN, FP, FN (True Positive, True Negative, False Positive, False Negative), which helps
check a classification model's performance.
Example:
| Actual ↓ | Predicted 0 | Predicted 1 |
| Class 0 | TN | FP |
| Class 1 | FN | TP |
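A minimal scikit-learn sketch for computing this matrix from invented actual/predicted labels:
from sklearn.metrics import confusion_matrix
# Invented actual and predicted labels for a binary classifier
y_actual = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred   = [0, 1, 1, 1, 0, 0, 1, 0]
# Rows = actual class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_actual, y_pred))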
You use the ROC curve and AUC in the following cases:
✓ When the model is solving a binary classification problem (like spam detection, fraud detection).
✓ When the dataset is imbalanced, because AUC reduces the effect of imbalance.
✓ When you need to tune the threshold, as you can understand the effects of different threshold values using the ROC curve.
+ Imbalanced Classification
When one class is much more frequent than the other (lots of data on one side, very little on the other).
◆ Example:
Fraud Detection → 98% of transactions are normal, only 2% are fraud.
Disease Prediction → 95% of people are healthy, only 5% have the disease.
◆ Problem:
Accuracy can be misleading (the model can show high accuracy just by always predicting the majority class).
Example: 95% accuracy may simply mean the model only predicts "healthy" and ignores the disease cases.
◆ Solution:
Use metrics like Precision, Recall, F1-Score, and ROC-AUC Score, as in the sketch below.
Resampling Techniques: Oversampling (SMOTE) or Undersampling.
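A minimal sketch of why accuracy misleads here, using scikit-learn metrics on invented labels (90% majority class; SMOTE itself lives in the separate imbalanced-learn package and is not shown):
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Invented labels: 9 normal (0), 1 fraud (1); model lazily predicts everything as 0
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print("Accuracy:", accuracy_score(y_true, y_pred))                      # 0.9 - looks great
print("Precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
print("Recall:", recall_score(y_true, y_pred, zero_division=0))        # 0.0 - fraud never caught
print("F1:", f1_score(y_true, y_pred, zero_division=0))                # 0.0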
Multiple Linear Regression is a popular method in machine learning used to predict a number (output)
based on multiple factors (inputs). It is easy to understand but needs proper data handling, like choosing
the right features, removing unusual values (outliers), and checking if input factors are too similar
(multicollinearity). If the data has a clear pattern, MLR can give accurate and reliable predictions.
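A minimal scikit-learn sketch of MLR with two invented input factors:
from sklearn.linear_model import LinearRegression
import numpy as np
# Invented data: price predicted from size (sq. ft) and age (years)
X = np.array([[1000, 10], [1500, 5], [2000, 2], [2500, 1]])
y = np.array([200, 320, 450, 560])   # price in thousands
model = LinearRegression()
model.fit(X, y)
print("Coefficients (one per input):", model.coef_)
print("Intercept:", model.intercept_)
print("Predicted price for 1800 sq. ft, 4 years old:", model.predict([[1800, 4]])[0])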
A threshold is a cut-off value that decides which category the model's output falls into. The default
threshold is 0.5, but we can adjust it according to the problem.
High Threshold → Fewer false positives, but more false negatives.
Low Threshold → Fewer false negatives, but more false positives.
The best threshold is problem-specific and is optimized using the ROC Curve or F1-Score!
+ Disadvantages of Thresholds
No universal threshold → Different problems need different threshold values.
Trial & error needed → Finding the best threshold requires multiple tests (e.g., using ROC
Curve).
Impacts errors → A wrong threshold can increase false positives or false negatives.
Not useful for all models → Some models (like Decision Trees) do not rely on thresholds.
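Returning to the threshold idea above, a minimal scikit-learn sketch of applying our own cut-off instead of the default 0.5, on a tiny invented dataset:
import numpy as np
from sklearn.linear_model import LogisticRegression
# Invented 1-feature dataset
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)
# predict_proba gives P(class 1); we apply our own threshold to it
probs = model.predict_proba(X)[:, 1]
for threshold in [0.3, 0.5, 0.7]:
    preds = (probs >= threshold).astype(int)
    print(f"threshold={threshold}: predictions={preds}")
# Lower threshold -> more 1s (fewer false negatives, more false positives)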
Since machine learning models work with numbers, we need to convert qualitative predictors into
numerical form before using them in models. Some common techniques:
1. Label Encoding → Assigning numbers to categories
Example: Color (Red = 1, Blue = 2, Green = 3)
2. One-Hot Encoding → Creating separate columns for each category with binary values (0 or 1)
Example:
o Color_Red → 1 if Red, else 0
o Color_Blue → 1 if Blue, else 0
o Color_Green → 1 if Green, else 0
One-Hot Encoding is preferred when categories have no order (Nominal Variables).
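A minimal pandas sketch of both techniques on the Color example:
import pandas as pd
df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Red']})
# 1. Label Encoding: assign a number to each category
df['Color_Label'] = df['Color'].map({'Red': 1, 'Blue': 2, 'Green': 3})
# 2. One-Hot Encoding: one binary (0/1) column per category
one_hot = pd.get_dummies(df['Color'], prefix='Color')
print(df)
print(one_hot)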
Classification:
What is Classification?
Classification is a supervised machine learning technique used to categorize data into predefined
labels (classes). The model learns from labeled data and then predicts which category new data belongs
to.
Example:
Email Spam Detection: Classify emails as "Spam" or "Not Spam".
Disease Prediction: Classify patients as "Diabetic" or "Non-Diabetic".
Summary
| Type | Classes | Example | Function |
| Binary Logistic Regression | 2 classes | Spam/Not Spam | Sigmoid |
| Multiclass Logistic Regression | 3+ classes | Weather Prediction | One vs Rest |
| Multinomial Logistic Regression | Multiple labels | Fruit Classification | Softmax |
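A minimal numpy sketch of the two functions named in the table:
import numpy as np
def sigmoid(z):
    # Squashes any number into (0, 1) - used for binary classification
    return 1 / (1 + np.exp(-z))
def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1 -
    # used for multinomial classification
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()
print(sigmoid(0.0))                         # 0.5
print(softmax(np.array([2.0, 1.0, 0.1])))   # probabilities over 3 classes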
Conclusion
Logistic Regression is a powerful technique that solves Binary, Multiclass, and Multinomial problems.
Which type to use depends on the nature of the problem and the type of data.
Conclusion
Regression coefficients play an important role in machine learning models, especially in regression-based
predictions. They tell us how much each input variable (feature) affects the output. Using methods such as
Least Squares or Gradient Descent, we can estimate them accurately.
A good regression model is one with low error and high accuracy. However, selecting the right features
and handling the data properly are also essential for the model to give correct results.
1. Collecting Data
To make predictions, the first step is to collect relevant data.
The data should have input (features) and output (labels).
Example: Suppose we want to predict a student's exam score based on their study hours. Our
data might look like this:
| Study Hours (X) | Exam Score (Y) |
| 2 | 50 |
| 4 | 70 |
| 6 | 80 |
| 8 | 90 |
| 10 | 95 |
Here, Study Hours is the input (X), and Exam Score is the output (Y).
4. Making Predictions
After training, we use the model to predict new results.
Input :
from sklearn.linear_model import LinearRegression
import numpy as np
# Training data
X_train = np.array([2, 4, 6, 8, 10]).reshape(-1, 1)
y_train = np.array([50, 70, 80, 90, 95])
# Creating model
model = LinearRegression()
model.fit(X_train, y_train)  # Train the model
# Predicting the score for a new input (7 hours of study)
predicted = model.predict([[7]])
print("Predicted Score for 7 hours of study:", round(predicted[0], 1))
Output:
Predicted Score for 7 hours of study: 82.5
How Do Predictions Work in Real Life?
1. Weather Forecasting
Weather apps use past temperature, humidity, and wind speed data to predict future weather.
2. Stock Market Predictions
ML models analyze past stock prices and trends to predict future stock values.
3. Disease Detection in Healthcare
Hospitals use ML to predict if a patient has a disease based on medical history.
4. Spam Email Detection
Email services analyze past spam emails and predict which new emails are spam.
5. Online Shopping Recommendations
Amazon suggests products based on your past purchases and search history.
Machine learning models learn from past data and make future predictions. The more data a model
has, the better its predictions will be. From Netflix recommendations to self-driving cars, ML
predictions are shaping our world.
Multiple Logistic Regression is a machine learning algorithm used for classification problems when
there are multiple independent variables (inputs) but a binary output (Yes/No, 0/1, True/False).
Real-Life Example
We want to predict whether a student will pass (1) or fail (0) based on:
1. Study Hours
2. Previous Score
3. Assignments Done
Here, we have three independent variables (inputs) and one dependent variable (output: Pass/Fail).
Formula of Logistic Regression
P(Pass) = 1 / (1 + e^−(b₀ + b₁X₁ + b₂X₂ + b₃X₃))
where b₀ is the intercept and b₁, b₂, b₃ are the coefficients of the three inputs.
Example dataset:
| Study Hours | Previous Score | Assignments Done | Pass (1) / Fail (0) |
| 2 | 50 | 1 | 0 |
| 4 | 60 | 2 | 0 |
| 6 | 75 | 3 | 1 |
| 8 | 85 | 4 | 1 |
| 10 | 95 | 5 | 1 |
Provide new input values to check if the model correctly predicts pass or fail.
INPUT -
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Sample dataset (same as the table above)
data = {
    'Study_Hours': [2, 4, 6, 8, 10],
    'Previous_Score': [50, 60, 75, 85, 95],
    'Assignments_Done': [1, 2, 3, 4, 5],
    'Pass_Fail': [0, 0, 1, 1, 1]
}
# Convert to DataFrame
df = pd.DataFrame(data)
X = df[['Study_Hours', 'Previous_Score', 'Assignments_Done']]
Y = df['Pass_Fail']
# Split data into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
model = LogisticRegression()
model.fit(X_train, Y_train)
# Make predictions
Y_pred = model.predict(X_test)
# Check accuracy
print("Accuracy:", accuracy_score(Y_test, Y_pred))
# Predict for a new student
new_student = [[7, 80, 3]]  # Study Hours = 7, Previous Score = 80, Assignments Done = 3
prediction = model.predict(new_student)
print("Prediction:", "Pass" if prediction[0] == 1 else "Fail")
Linear Discriminant Analysis (LDA)
It finds a new feature space that best separates the classes in your data.
| Term | Meaning |
| Maximize Between-Class Variance | Classes should be far apart. |
| Minimize Within-Class Variance | Data points in the same class should be close to each other. |
Works well when data is normally distributed and classes have equal covariance.
LDA tries to draw a line (or axis) so that when we project all points on that line, the classes are as far
apart as possible and points within each class stay close together.
Applications of LDA:
Face recognition
Document classification
Speech recognition
Customer segmentation
Example: suppose student data has features like hours slept and internal marks, with a Pass or Fail label.
Now, if you want to visualize this data in a simple graph, LDA helps transform it so that "Pass" and
"Fail" appear clearly separated, as in the sketch below.
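A minimal scikit-learn sketch of this idea: two invented features (hours slept, internal marks) projected by LDA onto one axis that separates Pass from Fail:
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# Invented data: [hours slept, internal marks], label 1 = Pass, 0 = Fail
X = np.array([[4, 10], [5, 12], [4, 11], [8, 22], [7, 25], [9, 24]])
y = np.array([0, 0, 0, 1, 1, 1])
lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)   # project 2-D data onto 1 separating axis
print("1-D projection:", X_1d.ravel())
print("Predicted class for [6, 18]:", lda.predict([[6, 18]])[0])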
CONCLUSION
Linear Discriminant Analysis (LDA) is a supervised learning technique used for classification and
dimensionality reduction. LDA looks for features that create the maximum difference between the
different classes. The method reduces within-class scatter and increases between-class scatter so that the
classes can be separated easily.
Bayes' Theorem (Simple Explanation)
What is it?
Bayes' Theorem is a math rule that helps us find the probability of something happening, based on
some known information:
P(A|B) = P(B|A) × P(A) / P(B)
In Machine Learning, we use it to predict which category or class something belongs to.
Then we calculate and choose the class (spam or not spam) with the highest probability.
Why is it called "Naive"?
Because it assumes that all features (like words in a message) are independent — which is often not
true, but the method still works well even when this independence assumption isn't fully satisfied!
Bayes' Theorem is a simple and powerful formula that tells us how likely something is to happen when
we already know some other information.
In Machine Learning, it is the basis of the Naive Bayes Classifier, which solves classification problems
(such as spam vs. not spam, or positive vs. negative reviews).
The method is fast, simple, and quite accurate, even when its assumptions are not 100% correct.
In short:
Bayes' Theorem = using what we already know to understand new data and predict its class.
If an email says "Win money now!", we check whether these words appear more often in Spam emails
or in Not Spam emails.
If these words appear more often in spam emails → classify the email as Spam.
If these words appear more often in normal emails → classify the email as Not Spam.
Naive Bayes does this quickly and simply, which is why it is used in spam detection, sentiment analysis,
and medical prediction. A minimal sketch of this spam idea follows.
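A minimal scikit-learn sketch, with a few invented training messages:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Invented training messages and labels (1 = Spam, 0 = Not Spam)
messages = ["win money now", "free prize win", "meeting at noon",
            "lunch tomorrow", "win free money", "project report attached"]
labels = [1, 1, 0, 0, 1, 0]
# Convert words to counts, then learn word probabilities per class
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
model = MultinomialNB()
model.fit(X, labels)
# Classify a new email: Naive Bayes picks the class with the highest probability
new_email = vectorizer.transform(["win money now!"])
print("Spam" if model.predict(new_email)[0] == 1 else "Not Spam")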
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique used in classification
problems.
Goal: "Separate the different classes as much as possible while reducing the dimension of the data."
LDA is a supervised algorithm that uses class labels along with the input data and creates a new axis on
which the classes are maximally separated.
Example:
If there is only one feature, such as "exam marks", LDA decides who "passes" and who "fails" based
on a cut-off.
Meaning:
LDA projects data from high-dimensional space to a lower dimension (1D or 2D) while
maximizing class separability.
✓ Simple Steps:
1. Compute the mean of each class.
2. Measure the within-class scatter and the between-class scatter.
3. Find the axis that maximizes between-class scatter relative to within-class scatter.
4. Project the data onto that axis.
Example:
If you have features such as age, income, education, etc., LDA combines them into one new feature that
best separates the "eligible" vs "not eligible" categories.
Advantages of LDA:
1. ✓ Fast and simple to use
2. ✓ Useful for classification + dimensionality reduction
3. ✓ Works well when classes are linearly separable
4. ✓ Performs better when data is normally distributed
5. ✓ Reduces overfitting by reducing dimensions
Disadvantages of LDA:
1. + Assumes data is normally distributed (which is not always true in the real world)
2. + Sensitive to outliers
LDA is a method that compresses the feature space, creating new axes on which the different classes
can be separated easily.
LDA assumes: All classes have the same covariance (same spread/shape).
QDA assumes: Each class can have a different covariance (different shape).
Definition (Simple):
QDA tries to model the boundary between classes based on the assumption that each class follows a
normal (Gaussian) distribution, but each class has its own covariance matrix (i.e., they can spread
differently).
That is why QDA's decision boundary has a curved or quadratic shape — not a straight line as in
LDA.
| Aspect | LDA | QDA |
| Covariance matrix | Same for all classes | Different for each class |
Advantages of QDA:
1. ✓ Works better when classes are non-linear or have different shapes
2. ✓ More flexible and accurate than LDA in many real-world problems
3. ✓ Can model complex boundaries
+ Disadvantages of QDA:
1. + Requires more data, since a separate covariance matrix must be estimated for each class
2. + Can overfit when the dataset is small
Conclusion
QDA is a smart classifier that assumes each class can have its own shape, so it draws curved
boundaries and makes flexible decisions.
However, if there is little data, it can overfit.
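A minimal scikit-learn sketch comparing the two classifiers on the same invented two-feature data:
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
# Invented 2-feature data: class 1 spreads more widely than class 0
X = np.array([[1, 1], [2, 1], [1, 2], [2, 2],
              [6, 6], [8, 5], [5, 8], [9, 9]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
lda = LinearDiscriminantAnalysis().fit(X, y)     # shared covariance -> straight boundary
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # per-class covariance -> curved boundary
test_point = [[4, 4]]
print("LDA predicts:", lda.predict(test_point)[0])
print("QDA predicts:", qda.predict(test_point)[0])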