Supervised Learning
Supervised Learning is a type of machine learning where the model is trained on labeled data (each input paired with its correct output).
Classification → predicting categories or classes (e.g., spam or not spam).
Regression → predicting continuous values (e.g., house price, temperature).
Process:
1. Collect labeled training data.
2. Train a model to identify class patterns.
3. Use the model to classify new, unseen data.
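The three-step process above can be sketched with a toy 1-nearest-neighbour classifier; the feature vectors and labels below are invented purely for illustration:

```python
# Minimal sketch of the supervised-learning loop: labeled data in,
# predictions on unseen data out. A 1-nearest-neighbour rule stands in
# for the "model"; all numbers here are illustrative.

# Step 1: labeled training data — (feature vector, class label).
training_data = [
    ((1.0, 1.2), "spam"),
    ((0.9, 1.1), "spam"),
    ((3.0, 3.3), "not spam"),
    ((3.2, 2.9), "not spam"),
]

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Steps 2-3: "training" for 1-NN is just memorising the examples;
# classification assigns the label of the closest stored example.
def classify(x):
    _, label = min(training_data, key=lambda pair: distance(pair[0], x))
    return label

print(classify((1.1, 1.0)))  # near the spam cluster → "spam"
print(classify((3.1, 3.0)))  # near the not-spam cluster → "not spam"
```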
Examples:
Email filtering → Spam / Not Spam
Medical diagnosis → Disease / No Disease
Image recognition → Dog / Cat / Human
Classification Model
Definition: A classification model is a mathematical or computational representation that learns from
labeled data to predict the correct class for new inputs.
Training Process:
Feed the model with training data.
Learn patterns and relationships between features and labels.
Adjust model parameters to minimize errors.
Prediction: For any unseen input, the model assigns it to the most probable class.
Evaluation: Performance is measured using metrics such as accuracy, precision, recall, and F1-score.
Example:
Input: Age, income, browsing history
Output: Likelihood of buying a product (Yes/No)
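The evaluation metrics mentioned above (accuracy, precision, recall, F1-score) can be computed directly from predicted and true labels; the label lists in this sketch are invented for illustration:

```python
# Compute accuracy, precision, recall and F1 for a binary task
# (1 = "Yes, buys the product", 0 = "No"). Labels are illustrative.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Confusion-matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of predicted Yes, how many were right
recall = tp / (tp + fn)      # of actual Yes, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # all 0.75 for this toy data
```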
Classification – Learning Steps
Step 1: Data Collection
Gather a labeled dataset containing input features and their correct class labels.
Regression
Purpose:
Identify relationships between input features and the output variable.
Make future predictions based on historical data.
Process:
1. Collect a labeled dataset with numeric outcomes.
2. Train a regression model to fit a curve/line.
3. Predict continuous values for unseen inputs.
Examples:
Predicting house price based on location, size, and age.
Forecasting stock market trends.
Estimating student marks based on study hours.
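The regression process can be illustrated with ordinary least squares on a single feature, e.g., estimating student marks from study hours; the data points below are made up:

```python
# Fit y = a + b*x by ordinary least squares (closed form) and predict
# a continuous value for an unseen input. Data is illustrative.
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Slope and intercept from the least-squares normal equations.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks)) / \
    sum((x - mean_x) ** 2 for x in hours)
a = mean_y - b * mean_x

print(a, b)        # fitted intercept and slope
print(a + b * 6)   # predicted marks for an unseen input: 6 study hours
```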
Difference from Classification:
Classification → categorical outputs (Yes/No).
Regression → continuous outputs (numbers).
Common Regression Algorithms
1. Linear Regression
Models relationship between input (X) and output (Y) as a straight line.
2. Multiple Regression
Uses multiple independent variables to predict a dependent variable.
3. Polynomial Regression
Fits curved data by adding higher-order polynomial terms.
4. Logistic Regression
Used for binary classification, outputs probability (0–1).
Applications:
Predicting salary based on experience.
Estimating sales from advertising spend.
Advantages: Simple, interpretable, works well for linear data.
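Of the algorithms above, polynomial regression (3) can be sketched with NumPy's `polyfit`, which adds the higher-order terms for us; the data is generated from a known quadratic purely for illustration:

```python
import numpy as np

# Data drawn from y = 2 + 3x + 0.5x^2 (noise-free, illustrative),
# so a degree-2 fit should recover the coefficients almost exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x + 0.5 * x ** 2

coeffs = np.polyfit(x, y, deg=2)   # highest-order coefficient first
print(np.round(coeffs, 3))         # ≈ [0.5, 3.0, 2.0]

# Predict a continuous value for an unseen input.
print(float(np.polyval(coeffs, 6.0)))  # ≈ 2 + 18 + 18 = 38.0
```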
Multiple Regression
Definition: Extends linear regression by using two or more independent variables.
Example: Predicting house price using area, location, number of rooms, and age.
Equation: Y = b0 + b1X1 + b2X2 + … + bnXn (one coefficient per feature; in polynomial regression the terms become powers of a single X: Y = b0 + b1X + b2X^2 + b3X^3 + … + bnX^n).
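Multiple regression with several features can be sketched via NumPy's least-squares solver; the house data below is invented so that prices follow an exact linear rule, letting the fit recover the known coefficients:

```python
import numpy as np

# Predict house price from area (m^2) and number of rooms.
# Prices follow price = 10 + 0.5*area + 20*rooms exactly (illustrative),
# so least squares should recover these coefficients.
X = np.array([
    [1.0, 50.0, 2.0],   # leading 1 is the intercept column
    [1.0, 80.0, 3.0],
    [1.0, 120.0, 4.0],
    [1.0, 60.0, 2.0],
])
y = 10 + 0.5 * X[:, 1] + 20 * X[:, 2]

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 3))  # ≈ [10.0, 0.5, 20.0] = [b0, b1, b2]
```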
Decision Boundary (Logistic Regression):
Probability > 0.5 → Class 1
Probability ≤ 0.5 → Class 0
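The 0.5 decision rule applied to a logistic-regression output can be sketched as follows; the weight and bias values are made up for illustration:

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, w, b):
    """Apply the 0.5 decision boundary to the predicted probability."""
    p = sigmoid(w * x + b)
    return 1 if p > 0.5 else 0

# Illustrative parameters w=2, b=-4: the boundary sits at x = 2,
# where w*x + b = 0 and the probability is exactly 0.5.
print(predict_class(3.0, 2.0, -4.0))  # probability > 0.5 → 1
print(predict_class(1.0, 2.0, -4.0))  # probability ≤ 0.5 → 0
```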
Applications:
Medical diagnosis (disease/no disease).
Credit risk prediction (default/no default).
Maximum Likelihood Estimation (MLE)
Definition: A statistical method used to estimate parameters of a model.
Idea: Choose parameters that make the observed data most likely.
Steps:
Define likelihood function based on model.
Calculate probability of data given parameters.
Adjust parameters to maximize likelihood.
Applications in ML:
Used in Logistic Regression for parameter estimation.
Widely applied in probabilistic models (Naïve Bayes, Hidden Markov Models).
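The three MLE steps can be illustrated for a Bernoulli model: define the likelihood of the observed 0/1 data, evaluate it for candidate parameter values, and keep the maximiser — which, for this model, is the sample proportion. The observations below are invented:

```python
# MLE for a Bernoulli parameter p: evaluate the likelihood of the
# observed data on a grid of candidate p values and keep the best.
# The analytic answer is the sample mean; the grid search should agree.
data = [1, 0, 1, 1, 0, 1, 1, 1]  # illustrative 0/1 observations

def likelihood(p, data):
    """P(data | p) for independent Bernoulli(p) observations."""
    out = 1.0
    for x in data:
        out *= p if x == 1 else (1.0 - p)
    return out

candidates = [i / 100 for i in range(1, 100)]
p_hat = max(candidates, key=lambda p: likelihood(p, data))

print(p_hat)                  # grid maximiser
print(sum(data) / len(data))  # analytic MLE: the sample proportion
```

In practice the log-likelihood is maximised instead of the raw product, since multiplying many probabilities underflows; the maximiser is the same.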