Module 6
Regression Analysis
• A statistical method to model the relationship between a dependent
(target) and one or more independent variables.
• Predicts continuous/real values such as temperature, age, salary, price,
etc.
Regression Analysis
• Supervised learning technique
• Mainly used for prediction, forecasting, time series modeling, and
determining the causal-effect relationship between variables.
• We plot a graph between the variables which best fits the given
datapoints
• Using this plot, the machine learning model can make predictions
about the data
• Regression fits a line or curve on the target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimized.
Terminologies Related to Regression Analysis:
• Dependent Variable: The factor we want to predict, also called the target variable.
• Independent Variable: The factors which affect the dependent variable, also called predictors.
• Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variables.
• Overfitting: If the algorithm works well with the training dataset but not with the test dataset, the problem is called overfitting.
• Underfitting: If the algorithm does not perform well even on the training dataset, the problem is called underfitting.
Linear Regression: Y = a + bX
• Supervised
• Linear regression shows the linear
relationship between the independent
variable and the dependent variable
• If there is only one input variable (x), then
such linear regression is called simple linear
regression.
• If there is more than one input variable, then
such linear regression is called multiple
linear regression
• Popular applications:
• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic.
Find the linear regression equation for the given data:

x    y    x²    xy
3    8     9    24
9    6    81    54
5    4    25    20
3    2     9     6

∑x = 20   ∑y = 20   ∑x² = 124   ∑xy = 104

b = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²) = (4·104 − 20·20) / (4·124 − 20²) = 16/96 = 1/6
a = (∑y − b∑x) / n = (20 − (1/6)·20) / 4 ≈ 4.17

Linear regression equation: Y = (1/6)x + 4.17
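The same result can be checked with a few lines of Python. This is a minimal sketch using the standard least-squares formulas for the slope and intercept; the only inputs are the x and y values from the table above.

```python
# Simple linear regression for the worked example via least-squares sums
xs = [3, 9, 5, 3]
ys = [8, 6, 4, 2]

n = len(xs)
sum_x = sum(xs)                               # 20
sum_y = sum(ys)                               # 20
sum_x2 = sum(x * x for x in xs)               # 124
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 104

# Slope and intercept from the normal equations for one predictor
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # 1/6
a = (sum_y - b * sum_x) / n                                    # ~4.17

print(f"Y = {b:.3f}x + {a:.2f}")              # Y = 0.167x + 4.17
```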
Multiple Linear Regression
β = (XᵀX)⁻¹ Xᵀ Y
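As a sketch of how the normal equation is evaluated in practice, the snippet below solves β = (XᵀX)⁻¹ Xᵀ Y with NumPy. The feature matrix and targets are made-up illustrative values; a column of ones is included so the first coefficient acts as the intercept.

```python
import numpy as np

# Illustrative feature matrix (column of ones for the intercept) and targets
X = np.array([[1, 3, 2],
              [1, 9, 1],
              [1, 5, 4],
              [1, 3, 3]], dtype=float)
y = np.array([8, 6, 4, 2], dtype=float)

# Normal equation: beta = (X^T X)^(-1) X^T y
# (np.linalg.pinv or np.linalg.lstsq is more numerically stable in practice)
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)   # [intercept, coefficient for x1, coefficient for x2]
```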
What Is Cost Function?
• Measures the performance of a machine learning model
• Quantifies the error between predicted and expected
values
• Mean absolute error: MAE = (1/n) Σ |yᵢ − ŷᵢ|
• Mean squared error: MSE = (1/n) Σ (yᵢ − ŷᵢ)²
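A small sketch of the two cost functions above, evaluated on the predictions of the worked example's line Y = (1/6)x + 4.17; the prediction values are taken from that example.

```python
def mae(y_true, y_pred):
    # Mean absolute error: average of |actual - predicted|
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean squared error: average of (actual - predicted)^2
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

y_true = [8, 6, 4, 2]
y_pred = [4.67, 5.67, 5.00, 4.67]   # predictions from Y = x/6 + 4.17
print(mae(y_true, y_pred), mse(y_true, y_pred))
```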
How do you find the line of best fit?
• Gradient descent is a tool to arrive at the line of best fit.
• Start with a random line and then keep adjusting the slope and y-intercept of the line little by little until you reach a local minimum, where the sum of squared errors is smallest and additional tweaks do not produce a better result.
• The size of each step is determined by the parameter α, known as the Learning Rate.
If the slope is -ve: θⱼ = θⱼ − (−ve value), so θⱼ increases.
If the slope is +ve: θⱼ = θⱼ − (+ve value), so θⱼ decreases.
If α is chosen too large, the steps may overshoot the minimum; if it is too small, convergence is slow.
Example data (Subject ID, Diet Score, Male, Age>20, BMI):

Subject ID   Diet Score   Male   Age>20   BMI
A            4            0      1        27
B            7            1      1        29
C            6            1      0        23
D            2            0      0        20
E            3            0      1        21
Logistic Regression
• Supervised
• For classification problems
• Works with categorical target variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
• Works on the concept of probability.
• β₀ = bias or intercept; β₁ = coefficient for the input (x)
• Update the parameters β₀ and β₁ to better fit the data.
• Use gradient descent or other optimization techniques to iteratively update the parameters until convergence.
• There are three types of logistic regression:
  • Binary (0/1, pass/fail)
  • Multinomial (cats, dogs, lions)
  • Ordinal (low, medium, high)
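The sketch below is a minimal illustration of binary logistic regression with one input feature, trained with gradient descent on the log-loss; the data values, learning rate, and iteration count are illustrative assumptions, not part of the slides.

```python
import math

# Illustrative data: hours studied (x) vs. pass (1) / fail (0)
xs = [2, 3, 4, 6, 7]
ys = [0, 0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = 0.0, 0.0             # intercept (bias) and coefficient for x
alpha = 0.1                   # learning rate
n = len(xs)

for _ in range(5000):
    # Gradient of the log-loss with respect to b0 and b1
    grad0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / n
    grad1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
    b0 -= alpha * grad0
    b1 -= alpha * grad1

# Predicted probability of class 1 for a new input
print(sigmoid(b0 + b1 * 5))
```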
Why not Linear Regression for Classification?

Age (x1)   Income (x2)   Bought Product (y)
25         50000         0
30         60000         1
35         70000         0
40         80000         1
45         90000         1

A straight line fitted to a 0/1 target can produce predictions below 0 or above 1, which cannot be interpreted as probabilities, so classification needs a model whose output stays between 0 and 1.
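As a quick check of why this is a poor fit, the sketch below fits ordinary least squares to the 0/1 target (using only Age, since Income is perfectly correlated with Age in this table) and shows predictions falling outside the 0–1 range.

```python
import numpy as np

age = np.array([25, 30, 35, 40, 45], dtype=float)
bought = np.array([0, 1, 0, 1, 1], dtype=float)

# Least-squares fit of bought ≈ a + b * age (np.polyfit returns slope first)
b, a = np.polyfit(age, bought, 1)
print(a + b * 50)   # prediction for Age = 50 is greater than 1
print(a + b * 15)   # prediction for Age = 15 is less than 0
```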
Naïve Bayes
• Supervised learning algorithm for solving classification problems.
• One of the most effective classification algorithms; it helps build fast machine learning models that can make quick predictions.
• It is a probabilistic classifier.
• E.g., spam filtering, sentiment analysis, and classifying articles.
Bayes' theorem: P(A|B) = P(B|A) · P(A) / P(B), where
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: the probability of the evidence given that hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.
Working of Naïve Bayes' Classifier:
• Convert the given dataset into frequency tables.
• Generate a likelihood table by finding the probabilities of the given features.
• Use Bayes' theorem to calculate the posterior probability.

Example: a 14-row weather dataset with Outlook (Sunny, Rainy, Overcast) and Play (Yes/No). Should the player play when the Outlook is Sunny? Find P(Yes|Sunny) and P(No|Sunny).

Frequency table:
Outlook     No   Yes
Overcast     0    5
Rainy        2    2
Sunny        2    3
All          4   10

Likelihood table:
P(Overcast) = 5/14 = 0.35
P(Rainy)    = 4/14 = 0.29
P(Sunny)    = 5/14 = 0.35
P(No)  = 4/14 = 0.29
P(Yes) = 10/14 = 0.71

P(Yes|Sunny) = P(Sunny|Yes) · P(Yes) / P(Sunny) = (3/10 · 0.71) / 0.35 ≈ 0.60
P(No|Sunny)  = P(Sunny|No) · P(No) / P(Sunny)  = (2/4 · 0.29) / 0.35 ≈ 0.41
Since P(Yes|Sunny) > P(No|Sunny), the prediction for a Sunny day is Play = Yes.
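A small sketch reproducing this calculation in Python from the frequency counts above; the function and variable names are illustrative.

```python
# Posterior probabilities from the Outlook/Play frequency table
counts = {                       # outlook -> {"Yes": count, "No": count}
    "Overcast": {"Yes": 5, "No": 0},
    "Rainy":    {"Yes": 2, "No": 2},
    "Sunny":    {"Yes": 3, "No": 2},
}
total = 14
total_yes = sum(c["Yes"] for c in counts.values())   # 10
total_no = sum(c["No"] for c in counts.values())     # 4

def posterior(play, outlook):
    # P(play | outlook) = P(outlook | play) * P(play) / P(outlook)
    class_total = total_yes if play == "Yes" else total_no
    likelihood = counts[outlook][play] / class_total
    prior = class_total / total
    evidence = sum(counts[outlook].values()) / total
    return likelihood * prior / evidence

print(posterior("Yes", "Sunny"))   # ~0.60
print(posterior("No", "Sunny"))    # ~0.41
```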
Nearest Neighbor
• Supervised machine learning algorithm for classification and regression problems
• Works by finding the K nearest neighbors to a given data point based on a distance metric, such as
Euclidean distance.
• The class or value of the data point is then determined by the majority vote or average of the K
neighbors.
• This approach allows the algorithm to adapt to different patterns and make predictions based on the
local structure of the data.
Nearest Neighbor steps
Find the class label that Sepal Length 5.2 and Sepal Width 3.1 belong to, for k = 2, 3, and 4.
Step 1: Selecting the optimal value of K
Step 2: Calculating distance
Step 3: Finding Nearest Neighbors
Step 4: Voting for Classification or Taking the Average for Regression
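A sketch of these four steps in Python. The query point (5.2, 3.1) comes from the slide, but the training rows below are illustrative placeholders, since the full dataset is not reproduced here.

```python
import math
from collections import Counter

# Illustrative training data: (sepal length, sepal width) -> class label
train = [
    ((5.3, 3.7), "Setosa"),
    ((5.1, 3.8), "Setosa"),
    ((7.2, 3.0), "Virginica"),
    ((5.4, 3.4), "Setosa"),
    ((5.1, 3.3), "Setosa"),
    ((5.4, 3.9), "Setosa"),
    ((7.4, 2.8), "Virginica"),
    ((6.1, 2.8), "Versicolor"),
]
query = (5.2, 3.1)

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(query, train, k):
    # Step 2: compute distances; Step 3: keep the k closest; Step 4: majority vote
    neighbours = sorted(train, key=lambda row: euclidean(query, row[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

for k in (2, 3, 4):   # Step 1: try several values of k
    print(k, knn_classify(query, train, k))
```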
Decision Trees
• Supervised learning algorithm for both classification and regression
• Each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label
• Constructed by recursively splitting the training data into subsets based
on the values of the attributes until a stopping criterion is met, such as
the maximum depth of the tree or the minimum number of samples
required to split a node.
• Entropy H(x): A measure of the randomness in the information being
processed. The higher the entropy, the harder it is to draw any
conclusions from that information.
• Information Gain (IG): a measure of the reduction in impurity
achieved by splitting a dataset on a particular feature in a decision tree.
• Pruning: The process of removing branches from the tree that do not
provide any additional information or lead to overfitting.
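A short sketch of the entropy and information-gain calculations described above, evaluated on a made-up toy split; the data and attribute names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    # H(X) = -sum(p * log2(p)) over the class proportions
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    # IG = entropy(parent) - weighted average entropy of the subsets
    # produced by splitting on the attribute at attribute_index
    parent = entropy(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute_index], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return parent - weighted

# Toy data: (Outlook, Windy) -> Play
rows = [("Sunny", "No"), ("Sunny", "Yes"), ("Overcast", "No"), ("Rainy", "No")]
labels = ["No", "No", "Yes", "Yes"]
print(information_gain(rows, labels, 0))   # gain from splitting on Outlook
```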
Create a Decision Tree - Iterative Dichotomiser 3
1. For each attribute, determine the entropy or information gain that results from splitting the dataset on that attribute.
2. Choose the attribute with the highest information gain as the node for the decision tree.
3. Divide the dataset into subsets based on the values of the selected attribute.
4. Repeat recursively: apply steps 1-3 to the subsets until a stopping condition is met, such as:
   1. All instances in a subset belong to the same class.
   2. No more attributes are available for splitting.
   3. A predefined depth limit is reached.
   4. A predefined limit on the number of instances is reached.
5. Build the tree: construct the decision tree as the recursion progresses. Each internal node represents an attribute, and each leaf node represents a class label.
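A compact, illustrative ID3 sketch following these steps; only the "pure subset" and "no attributes left" stopping conditions are implemented, and the toy dataset is made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

def id3(rows, labels, attributes):
    if len(set(labels)) == 1:                 # all instances share one class
        return labels[0]
    if not attributes:                        # no attributes left to split on
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))  # step 2
    tree = {best: {}}
    for value in set(row[best] for row in rows):                      # step 3
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)      # step 4
    return tree

rows = [{"Outlook": "Sunny", "Windy": "No"},
        {"Outlook": "Sunny", "Windy": "Yes"},
        {"Outlook": "Overcast", "Windy": "No"},
        {"Outlook": "Rainy", "Windy": "No"}]
labels = ["No", "No", "Yes", "Yes"]
print(id3(rows, labels, ["Outlook", "Windy"]))
# {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rainy': 'Yes'}}
```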