
PARUL INSTITUTE OF ENGINEERING & TECHNOLOGY

FACULTY OF ENGINEERING & TECHNOLOGY

Unit 4
Classification and Regression

Supervised Learning vs. Unsupervised Learning


Supervised learning and unsupervised learning are two fundamental paradigms in
machine learning, each with its own unique characteristics and applications.
1. Supervised Learning:

- Definition: In supervised learning, the algorithm is trained on a labeled dataset, where each input data point is associated with a corresponding target or output. The goal is to learn a mapping from inputs to outputs.

- Objective: The primary objective of supervised learning is to make predictions or classify new, unseen data accurately based on the patterns learned from the labeled data.

- Examples: Classification and regression are common tasks in supervised learning. Examples include image classification (assigning labels to images), spam email detection (categorizing emails as spam or not), and predicting house prices based on features like square footage and location.

2. Unsupervised Learning:

- Definition: In unsupervised learning, the algorithm is trained on an unlabeled dataset, where no target outputs are provided. The algorithm must find patterns or structure in the data on its own.

- Objective: The main objective of unsupervised learning is to discover hidden patterns, group similar data points together, or reduce the dimensionality of the data.

- Examples: Clustering and dimensionality reduction are common tasks in unsupervised learning. Examples include clustering customers based on their purchasing behavior (customer segmentation), topic modeling in text data, and reducing the dimensionality of image data for easier visualization or processing.

Here are some key differences between the two:

Data Type:
- Supervised learning requires labeled data (input-output pairs).
- Unsupervised learning works with unlabeled data (only input data).

Objective:
- Supervised learning aims to predict or classify based on existing knowledge
from labeled data.
- Unsupervised learning aims to discover patterns or structure in data when no
prior information is available.

Applications:
- Supervised learning is used for tasks that involve making predictions or decisions, such as classification and regression.
- Unsupervised learning is used for tasks like clustering, dimensionality reduction, and data exploration.

Evaluation:
- In supervised learning, performance can be evaluated by comparing the model's predictions to the true labels using metrics like accuracy, precision, recall, or mean squared error.
- In unsupervised learning, evaluation is often less straightforward since there are no target labels. Evaluation may involve assessing the quality of clusters or the effectiveness of dimensionality reduction.
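Because supervised learning has true labels to compare against, its metrics can be computed directly. A minimal sketch of accuracy, precision, and recall for a binary classifier (the function name and example labels are illustrative, not from the notes; 1 is taken as the positive class):

```python
def classification_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

For example, `classification_metrics([1, 1, 0, 0], [1, 0, 1, 0])` gives 0.5 for all three metrics: one true positive, one false positive, one false negative, and one true negative.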

Examples:
- Supervised: Spam email detection, image classification, sentiment analysis.
- Unsupervised: Customer segmentation, anomaly detection, principal component analysis (PCA).
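The contrast can be illustrated with two toy one-dimensional tasks (everything below, names included, is a simplified sketch, not a standard algorithm or API):

```python
# Supervised: a 1-nearest-neighbour classifier needs (feature, label) pairs.
def nn_classify(labeled_points, query):
    nearest = min(labeled_points, key=lambda pair: abs(pair[0] - query))
    return nearest[1]  # predict the label of the closest training point

# Unsupervised: split unlabeled points into two groups around the midpoint
# of their range. No labels are ever seen.
def two_clusters(points):
    midpoint = (min(points) + max(points)) / 2
    low = [p for p in points if p <= midpoint]
    high = [p for p in points if p > midpoint]
    return low, high
```

The supervised function cannot run without labels; the unsupervised one never uses any.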

| Parameter | Supervised machine learning | Unsupervised machine learning |
| --- | --- | --- |
| Input data | Algorithms are trained using labelled data. | Algorithms are used against data that is not labelled. |
| Computational complexity | Simpler method. | Computationally complex. |
| Accuracy | Highly accurate. | Less accurate. |
| No. of classes | No. of classes is known. | No. of classes is not known. |
| Data analysis | Uses offline analysis. | Uses real-time analysis of data. |
| Algorithms used | Linear and logistic regression, Random Forest, Support Vector Machine, neural networks, etc. | K-means clustering, hierarchical clustering, Apriori algorithm, etc. |
| Output | Desired output is given. | Desired output is not given. |
| Training data | Uses training data to infer the model. | No training data is used. |
| Complex models | Cannot learn models as large and complex as unsupervised learning can. | Can learn larger and more complex models than supervised learning. |
| Model testing | We can test our model. | We cannot test our model. |
| Also called | Supervised learning is also called classification. | Unsupervised learning is also called clustering. |
| Example | Optical character recognition. | Finding a face in an image. |

Supervised Learning

Supervised learning is one of the primary paradigms in machine learning. In supervised learning, an algorithm learns a mapping from input data to output labels or targets by using a labeled dataset for training. Here are the key components and steps involved in supervised learning:

1. Input Data (Features): This is the set of data points or observations that the
algorithm uses to make predictions or classifications. Each data point is
represented by a set of features or attributes that describe it. Features can be
numeric, categorical, or even more complex data types, depending on the
problem.

2. Output Labels (Targets): In supervised learning, each data point in the training dataset is associated with a corresponding output label or target. These labels represent the desired outcome or prediction that the algorithm should aim to achieve.

3. Training Dataset: The training dataset is the labeled dataset used to train the
supervised learning model. It consists of a collection of input data samples and
their corresponding output labels. The model learns to make predictions by
finding patterns and relationships within this data.

4. Model Selection: Choose a machine learning algorithm or model that is suitable for the problem at hand. Common supervised learning algorithms include linear regression, decision trees, support vector machines, neural networks, and more. The choice of model depends on factors such as the nature of the data and the problem's requirements.

5. Training (Learning): The selected model is trained on the training dataset. During training, the model adjusts its internal parameters or weights to minimize the difference between its predictions and the actual output labels in the training data. This process typically involves optimization techniques such as gradient descent.
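For simple linear regression, this training loop can be sketched as plain gradient descent on the mean squared error. The function name, learning rate, and epoch count below are arbitrary illustrative choices:

```python
def train_linear_model(xs, ys, lr=0.05, epochs=5000):
    m, b = 0.0, 0.0  # start from an uninformative line
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to m and b
        grad_m = (-2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
        grad_b = (-2 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
        m -= lr * grad_m  # step against the gradient
        b -= lr * grad_b
    return m, b
```

On data generated from y = 2x + 1, such as xs = [0, 1, 2, 3] and ys = [1, 3, 5, 7], the loop converges to m ≈ 2 and b ≈ 1.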

Linear Regression

Linear regression is a fundamental statistical and machine learning technique used for modeling the relationship between a dependent variable (target) and one or more independent variables (features or predictors) by fitting a linear equation to the observed data. It is one of the simplest and most widely used regression methods and is often employed for tasks like predicting numerical values (regression problems).

Here are the key concepts and components of linear regression:

1. Linear Equation:

- In simple linear regression, which deals with one independent variable, the linear
equation is represented as: `y = mx + b`, where:
- `y` is the dependent variable (the one you want to predict).
- `x` is the independent variable (the feature).
- `m` is the slope of the line, representing the relationship between `x` and `y`.
- `b` is the y-intercept, which is the value of `y` when `x` is 0.

2. Multiple Linear Regression:

- In multiple linear regression, you have more than one independent variable, and the equation becomes: `y = b0 + b1*x1 + b2*x2 + ... + bn*xn`, where:
- `y` is still the dependent variable.
- `x1`, `x2`, ..., `xn` are the independent variables.
- `b0` is the y-intercept.
- `b1`, `b2`, ..., `bn` are the coefficients associated with each independent variable, representing their respective contributions to `y`.
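Once the coefficients are known, a prediction is just the weighted sum in the equation above. A minimal sketch (the function name and all numbers are made up for illustration):

```python
def predict(b0, coefficients, features):
    # y = b0 + b1*x1 + b2*x2 + ... + bn*xn
    return b0 + sum(b * x for b, x in zip(coefficients, features))
```

For example, with a hypothetical house-price model where `b0 = 50000`, the coefficient for square footage is 100, and the coefficient for bedroom count is 20000, `predict(50000, [100, 20000], [1500, 3])` returns 260000.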

3. Assumptions of Linear Regression:

- Linear relationship: There should be a linear relationship between the independent and dependent variables.
- Independence: The residuals (the differences between predicted and actual values) should be independent of each other.
- Homoscedasticity: The variance of the residuals should be constant across all levels of the independent variables.
- Normality: The residuals should follow a normal distribution.

4. Least Squares Method:

- Linear regression aims to find the best-fitting line by minimizing the sum of squared differences (residuals) between the predicted and actual values. This method is called the least squares method.
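For the single-variable case, the least-squares slope and intercept have a closed form that can be computed directly (a pure-Python sketch; the function name is illustrative):

```python
def least_squares_fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x  # the fitted line passes through the means
    return m, b
```

Fitting xs = [1, 2, 3, 4] against ys = [3, 5, 7, 9] recovers the line y = 2x + 1 exactly, since the data is perfectly linear.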

5. Coefficient Estimation:

- The coefficients (`m` and `b`, or `b0`, `b1`, `b2`, ...) are estimated during the training process to find the best-fitting line that minimizes the sum of squared residuals.

6. Model Evaluation:

- Common metrics for evaluating linear regression models include Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²), which measures the proportion of variance in the dependent variable explained by the model.
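All four metrics follow directly from the residuals, so they can be computed in a few lines (the function name is illustrative):

```python
import math

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n           # Mean Absolute Error
    mse = sum(e * e for e in errors) / n            # Mean Squared Error
    rmse = math.sqrt(mse)                           # Root Mean Squared Error
    mean_y = sum(y_true) / n
    ss_total = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errors) / ss_total  # R-squared
    return mae, mse, rmse, r2
```

A perfect prediction gives MAE, MSE, and RMSE of 0 and R² of 1; a model no better than predicting the mean gives R² of 0.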

7. Overfitting and Underfitting:

- Overfitting occurs when the model is too complex and fits the training data too
closely, leading to poor generalization to new data.
- Underfitting happens when the model is too simple and cannot capture the
underlying patterns in the data.

8. Regularization:

- Regularization techniques, like Ridge and Lasso regression, can be applied to prevent overfitting and improve model stability.
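A Ridge (L2) penalty can be folded directly into a gradient-descent fit: the extra `2 * lam * m` term in the gradient pulls the slope toward zero. This is a sketch under assumed hyperparameter values, not a reference implementation:

```python
def ridge_fit(xs, ys, lam=0.0, lr=0.05, epochs=5000):
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_m = (-2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
        grad_b = (-2 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
        m -= lr * (grad_m + 2 * lam * m)  # L2 penalty shrinks the slope
        b -= lr * grad_b                  # the intercept is usually not penalized
    return m, b
```

With `lam=0` this recovers the ordinary least-squares fit; increasing `lam` shrinks the slope, trading a little training error for a more stable model.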

Overfitting
Definition: Overfitting occurs when a model learns the training data too well, including
its noise and outliers, rather than capturing the underlying patterns. This leads to
excellent performance on the training set but poor performance on new, unseen data.

Symptoms:
 High accuracy or low error on the training set.
 Poor accuracy or high error on the validation or test set.
Causes:
 A model that is too complex relative to the amount and variability of the training
data (e.g., too many parameters or a very flexible model).
 Too many features or interactions that lead to a model that fits the noise in the data.
Prevention/Mitigation:
 Simplify the Model: Use a less complex model with fewer parameters.
 Regularization: Techniques like L1 (Lasso) or L2 (Ridge) regularization can
penalize large weights, helping to avoid overfitting.
 Cross-Validation: Use techniques like k-fold cross-validation to assess the model’s
performance on different subsets of the data.
 Early Stopping: Monitor the performance on a validation set and stop training when
performance starts to degrade.

 More Data: Increasing the size of the training dataset can help the model generalize
better.
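The k-fold cross-validation mentioned above can be sketched in a few lines; each data point lands in exactly one validation fold. The round-robin fold assignment here is one simple choice among many (shuffled or stratified splits are common alternatives):

```python
def k_fold_splits(items, k):
    # Assign items to k folds round-robin, then use each fold once as validation.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation
```

Averaging the model's score over the k validation folds gives a more honest estimate of generalization than a single train/test split, which helps detect overfitting early.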
