
MACHINE LEARNING

UNIT 2
21CSC305P
Maximum Likelihood Estimation

• Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of the probability distribution that best describes a given dataset.
• To analyze the data provided, we first need to identify the distribution from which the data was obtained.
• Next, we use the data to find the parameters of that distribution. A parameter is a numerical characteristic of a distribution.
Maximum Likelihood Estimation
Example distributions and their parameters:
• Normal distribution - mean (µ) and variance (σ²)
• Binomial distribution - number of trials (n) and probability of success (p)
• Gamma distribution - shape (k) and scale (θ)
• Exponential distribution - rate (λ), the inverse of the mean
• These parameters are vital for understanding the size, shape, spread, and other properties of a distribution.
• Since the data we have is usually randomly generated, we often do not know the true values of the parameters characterizing our distribution.
Examples of Probability Distributions
Normal Distribution (µ, σ²)
Blood pressure of healthy adults
Mean (µ) ≈ 120 mmHg
Variance (σ²) shows variation across individuals
Binomial Distribution (n, p)
Quality check of light bulbs
n = 20 bulbs tested
p = 0.1 defective probability
Models: Number of defective bulbs found
Gamma Distribution (k, θ)
Time until 5 patients arrive at ER
k = 5 arrivals
θ = 10 minutes (average gap)
Models: Waiting time for group events
Exponential Distribution (λ)
Time between bus arrivals
λ = 1/15 per minute
Mean waiting time = 15 minutes
Models: Time between random events
Maximum Likelihood Estimation
• An estimator is a function of the data that gives approximate values of the parameters.
Ex: the sample-mean estimator - a simple and frequently used estimator.
• Since the numerical characteristics of the distribution vary with the value of the parameter, it is not easy to estimate the parameter θ of the distribution directly.
• Maximum likelihood estimation is a process of estimation that gives an entire class of estimators called maximum likelihood estimators, or MLEs.
Maximum Likelihood Estimation

When to Use Log-Likelihood:

• When Dealing with Large Datasets: The likelihood function can become extremely small as more data points are considered, leading to computational difficulties. The log-likelihood avoids this by converting multiplication into addition.
• Simplifying Derivatives: When performing MLE, you often need to take derivatives to find the maximum. The log-likelihood simplifies this process, as the logarithm of a product becomes a sum of logarithms, making differentiation easier.
The log-likelihood is a transformed version of the likelihood function that is more mathematically and computationally convenient for optimization in machine learning models.
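For illustration, here is a minimal Python sketch (not from the slides; the data is synthetic) of MLE for a normal distribution, using the closed-form estimates and evaluating the log-likelihood at them:

```python
# Minimal sketch (not from the slides; synthetic data): MLE for a normal
# distribution. The closed-form estimates are the sample mean and the
# (biased, divide-by-n) sample variance; the log-likelihood is evaluated at them.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=120.0, scale=10.0, size=500)   # e.g. blood-pressure-like readings

def normal_log_likelihood(x, mu, sigma2):
    """Log-likelihood of i.i.d. data x under N(mu, sigma2)."""
    n = x.size
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

mu_hat = data.mean()                        # MLE of the mean
sigma2_hat = ((data - mu_hat) ** 2).mean()  # MLE of the variance (divides by n, not n-1)

print("MLE mean:", mu_hat)
print("MLE variance:", sigma2_hat)
print("log-likelihood at the MLE:", normal_log_likelihood(data, mu_hat, sigma2_hat))
```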
Maximum Likelihood Estimation in linear regression
Linear regression - solved example
Ordinary Least Squares
Ordinary Least Squares (OLS) is the most common method used to estimate the coefficients (β) in linear regression.
It finds the best-fitting line through the data by minimizing the sum of squared errors.

We assume a linear relationship:

y = β0 + β1x + ϵ

where ϵ is the error term.

OLS chooses β0 and β1 (intercept and slope) so that the total error between the predicted values (ŷ) and the actual values (y) is as small as possible.
Ordinary Least Squares
The error we minimize is the residual sum of squares,

RSS = Σ (yi − ŷi)²,

which is why the method is called least squares.
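A minimal sketch (with made-up data) of an OLS fit that minimizes this residual sum of squares:

```python
# Minimal sketch (made-up data): OLS fit of y = b0 + b1*x by minimizing the
# residual sum of squares, using numpy's least-squares solver.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])      # design matrix: [1, x]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimate of (b0, b1)
b0, b1 = beta

y_hat = X @ beta
rss = np.sum((y - y_hat) ** 2)                 # the quantity OLS minimizes
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}, RSS = {rss:.4f}")
```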

Drawback of OLS - multicollinearity

Multicollinearity happens when two or more predictors (independent variables) are highly
correlated.
This means they provide overlapping information about the target variable.
In regression, this makes it hard to separate their individual effects, leading to unstable
coefficients.
Multicollinearity
Example
We want to predict house price (y) using:
x1 = house size (sq ft)
x2 = number of rooms
Larger houses usually have more rooms, so x1 and x2 are highly correlated → multicollinearity.

OLS tries to assign coefficients β1 and β2 to fit the data.

Because x1 and x2 overlap, OLS struggles to decide how much of the price is explained by size vs. rooms.
Result: unstable and counterintuitive coefficients.
Example:
β1 = 200, β2 = −150
Ridge regression
Ridge Regression is a regularization technique used in linear regression to handle
the problem of multicollinearity and overfitting. It is also known as L2
Regularization.
In ordinary linear regression (OLS), we minimize the residual sum of squares (RSS).
Ridge Regression Objective Function
However, when features are highly correlated, OLS estimates become unstable (large variance in the coefficients). Ridge regression solves this problem by adding a penalty term to the loss function:

minimize RSS + λ Σ βj²

Ridge Regression penalty term
The penalty term λ Σ βj² penalizes large coefficients, shrinks them toward zero, reduces model complexity, and combats multicollinearity and overfitting.

Why a Squared Penalty (L2)?

The L2 penalty (squared coefficients) shrinks coefficients smoothly but never makes them exactly zero.
That is why Ridge keeps all features in the model.
(Compare with Lasso's L1 penalty, which can shrink some coefficients to exactly zero → feature selection.)
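A minimal sketch (synthetic data, arbitrary λ) comparing OLS and ridge coefficients on two highly correlated predictors:

```python
# Minimal sketch (synthetic data): OLS vs. ridge on two highly correlated
# predictors (size and rooms, as in the multicollinearity example). Ridge uses
# its closed form beta = (X^T X + lambda*I)^(-1) X^T y; lambda = 10 is arbitrary.
import numpy as np

rng = np.random.default_rng(1)
size = rng.uniform(50, 200, 100)                    # x1: house size
rooms = size / 30 + rng.normal(0, 0.2, 100)         # x2: almost a function of size
price = 3.0 * size + 10.0 * rooms + rng.normal(0, 20, 100)

X = np.column_stack([size, rooms])
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize before penalizing
y = price - price.mean()

lam = 10.0
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print("OLS coefficients:  ", beta_ols)    # can be large and unstable
print("Ridge coefficients:", beta_ridge)  # shrunk toward zero, more stable
```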
Example - OLS and Ridge regression
Logistic regression

Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal (categorical), ordinal (ordered), interval (no true zero) or ratio-level (true zero) independent variables.
Remember: although the name of the algorithm contains "regression", it is used for classification.
Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc. Instead of returning the exact values 0 and 1, it gives probabilistic values that lie between 0 and 1.
Logistic regression - Types
Logistic regression

Decision Rule

If p ≥ 0.5, predict class 1.
If p < 0.5, predict class 0.
Logistic regression - sigmoid function
Logistic regression - Coefficients
Logistic regression - solved problem

| No. | x1 | x2 | x1-x1bar | x2-x2bar | (x1-x1bar)(x2-x2bar) | (x1-x1bar)^2 | (x2-x2bar)^2 | Logit L = b0 + b1*x1 + b2*x2 | P = 1/(1+exp(-L)) | Y new | Given Y | Accuracy |
| 1 | 68 | 166 | 27.1 | 77.2 | 2092.12 | 734.41 | 5959.84 | 216.2132671 | 1 | 1 | 1 | 1 |
| 2 | 70 | 178 | 29.1 | 89.2 | 2595.72 | 846.81 | 7956.64 | 225.5147413 | 1 | 1 | 1 | 1 |
| 3 | 72 | 170 | 31.1 | 81.2 | 2525.32 | 967.21 | 6593.44 | 226.7086169 | 1 | 1 | 1 | 1 |
| 4 | 66 | 124 | 25.1 | 35.2 | 883.52 | 630.01 | 1239.04 | 194.750395 | 1 | 1 | 0 | 0 |
| 5 | 66 | 115 | 25.1 | 26.2 | 657.62 | 630.01 | 686.44 | 191.1019757 | 1 | 1 | 0 | 0 |
| 6 | 67 | 135 | 26.1 | 46.2 | 1205.82 | 681.21 | 2134.44 | 201.4280318 | 1 | 1 | 0 | 0 |
| sum | 409 | 888 | 163.6 | 355.2 | 9960.12 | 4489.66 | 24569.84 | 1255.717028 | | | | 50 |

x1-bar = 40.9, x2-bar = 88.8
Logistic regression - solved problem

b1 = Σ(x1-x1bar)(x2-x2bar) / Σ(x1-x1bar)^2 = 9960.12 / 4489.66 = 2.218457522

b2 = Σ(x1-x1bar)(x2-x2bar) / Σ(x2-x2bar)^2 = 9960.12 / 24569.84 = 0.405379929

b0 = x2bar − b1*x1bar = 88.8 − 2.218457522 × 40.9 = −1.934912666
Logistic regression - sigmoid function

| Study hours | Pass/Fail |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |

Let β0 = −31.165 and β1 = 9.19.
Use the formula p = 1 / (1 + exp(−(β0 + β1·x))) to find the pass/fail status of students who study for 3.5 hours, 5.2 hours and 4.5 hours.
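A minimal sketch applying the given coefficients and the decision rule to the requested study hours:

```python
# Minimal sketch: applying p = 1 / (1 + exp(-(b0 + b1*x))) with the given
# coefficients to the requested study hours and using the p >= 0.5 decision rule.
import numpy as np

b0, b1 = -31.165, 9.19
hours = np.array([3.5, 5.2, 4.5])

p = 1.0 / (1.0 + np.exp(-(b0 + b1 * hours)))
labels = (p >= 0.5).astype(int)      # 1 = pass, 0 = fail

for h, prob, label in zip(hours, p, labels):
    print(f"{h} hours: p = {prob:.4f} -> {'pass' if label else 'fail'}")
```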
Robust linear Regression

• Robust linear regression is designed to be less sensitive to outliers than traditional linear regression.
• Traditional linear regression minimizes the sum of squared residuals, which can be heavily influenced by outliers.
• Robust linear regression uses different techniques to mitigate the effect of outliers and produce a more reliable model.
Robust linear Regression

• Linear Regression: suitable when the data meets the assumptions, especially when there are no significant outliers and the relationship is linear.
• Robust Linear Regression: appropriate when there are outliers or when the assumptions of linear regression are violated, making it more reliable for real-world data that may not adhere perfectly to theoretical assumptions.

Robust linear Regression
Robust linear Regression - Huber Loss
Robust linear Regression - Huber Loss
Given:
Residuals: r = [−2, −0.5, 0.3, 1.2], δ = 1
Evaluate the loss and ψ(r) for each residual.
Solution:

Case 1: r = −2
Check: |−2| = 2 > δ → use the second formula.
Loss: L = δ(|r| − 0.5δ) = 1 × (2 − 0.5) = 1.5
ψ: ψ = δ × sign(−2) = −1
Result: Loss = 1.5, ψ = −1

Robust linear Regression - Huber Loss
Case 2: r = −0.5
Check: |−0.5| = 0.5 ≤ 1 → use the first formula.
Loss: L = 0.5 r² = 0.5 × 0.25 = 0.125
ψ: ψ = r = −0.5
Result: Loss = 0.125, ψ = −0.5

Case 3: r = 0.3
Check: |0.3| = 0.3 ≤ 1 → use the first formula.
Loss: L = 0.5 × (0.3)² = 0.5 × 0.09 = 0.045
ψ: ψ = r = 0.3
Result: Loss = 0.045, ψ = 0.3
Robust linear Regression - Huber Loss

Case 4: r = 1.2
Check: |1.2| = 1.2 > 1 → use the second formula.
Loss: L = 1 × (1.2 − 0.5) = 0.7
ψ: ψ = 1 × sign(1.2) = +1
Result: Loss = 0.7, ψ = +1
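A minimal sketch that reproduces all four cases at once:

```python
# Minimal sketch: Huber loss and its derivative psi for the residuals of the
# worked example, with delta = 1.
import numpy as np

def huber_loss(r, delta=1.0):
    """0.5*r^2 when |r| <= delta, else delta*(|r| - 0.5*delta)."""
    return np.where(np.abs(r) <= delta,
                    0.5 * r ** 2,
                    delta * (np.abs(r) - 0.5 * delta))

def huber_psi(r, delta=1.0):
    """r when |r| <= delta, else delta*sign(r)."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

r = np.array([-2.0, -0.5, 0.3, 1.2])
print("loss:", huber_loss(r))   # [1.5, 0.125, 0.045, 0.7]
print("psi: ", huber_psi(r))    # [-1.0, -0.5, 0.3, 1.0]
```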
Bayes' theorem - Background
Bayesian Linear Regression
Bayesian Linear Regression

Prior P(β):
Our belief about the coefficients before seeing the data.

Likelihood P(y, X | β):
Probability of observing the data, given coefficients β.

Evidence P(y, X):
Normalization constant that ensures a valid probability distribution.

Posterior P(β | y, X):
The updated distribution of β after combining the prior and the data.
This is what we ultimately use for prediction.
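A minimal sketch (under assumed conjugate choices: Gaussian prior and Gaussian noise with known variance) of computing this posterior:

```python
# Minimal sketch (assumptions: Gaussian prior beta ~ N(0, tau^2 I) and Gaussian
# noise with known variance sigma^2, both made up here). Under these assumptions
# the posterior P(beta | y, X) is Gaussian with the standard conjugate formulas.
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])   # design matrix [1, x]
true_beta = np.array([1.0, 2.5])
y = X @ true_beta + rng.normal(0, 1.0, 50)

sigma2 = 1.0   # assumed noise variance
tau2 = 10.0    # assumed prior variance on each coefficient

Sigma_n = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)  # posterior covariance
mu_n = Sigma_n @ (X.T @ y) / sigma2                           # posterior mean

print("posterior mean:", mu_n)
print("posterior covariance:\n", Sigma_n)
```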
Bayesian Linear Regression
Discriminant Functions

A discriminant function is used in classification tasks to assign a given input to one of several possible classes. It is designed to make decisions based on the values of the input features by computing a score for each class. The class with the highest score is the one to which the input is assigned.
Discriminant Functions
Discriminant Functions- Fishers LDA

Fisher's Linear Discriminant Analysis (LDA) is a supervised technique used for dimensionality reduction and classification.
It finds a projection line that maximizes the separation between class means while minimizing the within-class scatter.
The method computes a discriminant vector w ∝ Sw⁻¹(m1 − m2) to achieve optimal separation.
It is widely used in face recognition, speech recognition, and other pattern classification tasks.
Discriminant Functions- Fishers LDA

Intuition
Suppose you have two classes of points (say red and blue) in a high-dimensional feature space.
You want to project them onto a line such that they are as well separated as possible.
Fisher's idea: find the projection direction (a vector w) that maximizes the distance between the class means while minimizing the spread (variance) within each class.
Discriminant Functions- Fishers LDA

Projection
Each point x (a feature vector) is projected onto a line:

y = wᵀx

where w is the direction vector we are trying to find.

Discriminant Functions- Fishers LDA

Decision rule
Discriminant Functions- Fishers LDA

Two classes are given:


Class 1 (C1): {2, 3, 4, 5, 6}
Class 2 (C2): {7, 8, 9, 10, 11}
Find Fisher’s LDA discriminant.

Step 1. Class means

m1 = (2 + 3 + 4 + 5 + 6) / 5 = 4,  m2 = (7 + 8 + 9 + 10 + 11) / 5 = 9
Discriminant Functions- Fishers LDA

Step 2: Within-class scatter (Sw)

Sw = Σ(x − m1)² + Σ(x − m2)² = (4 + 1 + 0 + 1 + 4) + (4 + 1 + 0 + 1 + 4) = 10 + 10 = 20
Discriminant Functions- Fishers LDA

Step 3: Fisher’s weight (w)

Step 4: Decision boundary
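Since the numeric results for Steps 3 and 4 were shown on the slides as images, here is a minimal sketch reproducing the whole computation (the decision boundary is taken as the midpoint of the class means, assuming equal priors):

```python
# Minimal sketch reproducing the 1-D example: class means, within-class scatter,
# Fisher's weight w = (m1 - m2) / Sw, and (assuming equal priors) a decision
# boundary at the midpoint of the class means.
import numpy as np

c1 = np.array([2, 3, 4, 5, 6], dtype=float)
c2 = np.array([7, 8, 9, 10, 11], dtype=float)

m1, m2 = c1.mean(), c2.mean()                          # class means: 4 and 9
sw = np.sum((c1 - m1) ** 2) + np.sum((c2 - m2) ** 2)   # within-class scatter: 20
w = (m1 - m2) / sw                                     # Fisher's weight (1-D case)
boundary = 0.5 * (m1 + m2)                             # midpoint threshold: 6.5

print(f"m1 = {m1}, m2 = {m2}, Sw = {sw}, w = {w}, boundary at x = {boundary}")

x_new = 6.2                                            # classify a new point
print("x_new ->", "C1" if abs(x_new - m1) < abs(x_new - m2) else "C2")
```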


Laplace Approximation

Laplace approximation is a technique in machine learning and statistics used to approximate integrals, particularly when dealing with Bayesian inference.
It is often applied when the exact calculation of posterior distributions is intractable. The method relies on approximating the posterior distribution with a Gaussian distribution centered around the mode of the posterior.
This is achieved by taking the second-order Taylor expansion of the log-posterior distribution around its mode.
Laplace Approximation
Key Steps in Laplace Approximation:
1. Find the Mode of the Posterior Distribution:
   - Identify the maximum a posteriori (MAP) estimate, which is the mode of the posterior distribution. This can be done using optimization techniques.
2. Approximate the Posterior with a Gaussian:
   - Use a second-order Taylor expansion of the log-posterior distribution around the MAP estimate. This results in a quadratic approximation, which corresponds to a Gaussian distribution.
3. Calculate the Hessian Matrix:
   - The covariance of the approximating Gaussian is the inverse of the Hessian matrix of the negative log-posterior evaluated at the mode. The Hessian captures the curvature of the posterior distribution.
4. Obtain the Approximation:
   - With the mode and covariance matrix, the posterior is approximated as a multivariate normal distribution.
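A minimal sketch of these steps on a made-up 1-D posterior:

```python
# Minimal sketch (1-D toy problem; the negative log-posterior below is made up):
# find the MAP by optimization, estimate the curvature (Hessian) numerically at
# the mode, and use N(mode, H^-1) as the Gaussian approximation of the posterior.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_posterior(theta):
    # Toy unnormalized example: Gaussian-like prior term plus a data term.
    return 0.5 * theta ** 2 + np.log(1 + np.exp(-3 * theta))

# Step 1: mode of the posterior (MAP estimate)
theta_map = minimize_scalar(neg_log_posterior).x

# Steps 2-3: second derivative of the negative log-posterior at the mode
h = 1e-3
hessian = (neg_log_posterior(theta_map + h)
           - 2 * neg_log_posterior(theta_map)
           + neg_log_posterior(theta_map - h)) / h ** 2

# Step 4: Gaussian approximation N(theta_map, 1/hessian)
print(f"MAP = {theta_map:.4f}, approximate posterior variance = {1.0 / hessian:.4f}")
```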
Laplace Approximation
Support Vector Machine
What is support vector?
• "Support Vector Machine" (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems. However, it is mostly used for classification.
• In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate.
• Then, we perform classification by finding the hyperplane that differentiates the two classes well.
Support Vector Machine

SVM can easily handle multiple continuous and categorical variables.
SVM constructs a hyperplane in multidimensional space to separate different classes. It generates the optimal hyperplane in an iterative manner, which minimizes the classification error.
The core idea of SVM is to find a maximum marginal hyperplane (MMH) that best divides the dataset into classes.
Support Vector Machine
Support Vector Machine
Support Vectors
– Support vectors are the data points that are closest to the hyperplane. These points define the separating line by determining the margins.
– They are the most relevant points for the construction of the classifier.
Hyperplane
– A hyperplane is a decision surface that separates a set of objects having different class memberships.
Margin
– A margin is the gap between the two lines drawn through the closest points of each class.
– It is calculated as the perpendicular distance from the separating line to the support vectors (closest points).
– A larger margin between the classes is considered a good margin; a smaller margin is a bad margin.
Support Vector Machine
• The main objective is to segregate the given dataset in the best possible way.
• The distance between the nearest points of either class is known as the margin.
• The objective is to select a hyperplane with the maximum possible margin between the support vectors in the given dataset. SVM searches for the maximum marginal hyperplane in the following steps:
– Generate hyperplanes that segregate the classes well.
– Select the hyperplane with the maximum separation from the nearest data points of either class.
Support Vector Machine
Identify the right hyper-plane (Scenario-1):

Here, we have three hyper-planes (A, B, and C). Now, identify the right hyper-plane to classify stars and circles.

Remember this rule of thumb for identifying the right hyper-plane: "Select the hyper-plane which segregates the two classes better". In this scenario, hyper-plane "B" has performed this job well.
Support Vector Machine
Identify the right hyper-plane (Scenario-2):

• Here, we have three hyper-planes (A, B, and C), and all segregate the classes well. How can we identify the right hyper-plane?

• Maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide. This distance is called the margin.
Support Vector Machine
Identify the right hyper-plane (Scenario-2):

You can see that the margin for hyper-plane C is high compared to both A and B. Hence, we choose C as the right hyper-plane. Another important reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane with a low margin, there is a high chance of misclassification.
Support Vector Machine
Identify the right hyper-plane (Scenario-3):
• Hint: use the rules discussed in the previous scenarios to identify the right hyper-plane.

• Some of you may have selected hyper-plane B, as it has a higher margin than A. But here is the catch: SVM selects the hyper-plane that classifies the classes accurately before maximizing the margin. Here, hyper-plane B has a classification error while A has classified everything correctly. Therefore, the right hyper-plane is A.
Support Vector Machine
Identify the right hyper-plane (Scenario-4):
In this scenario, we cannot have a linear hyper-plane between the two classes, so how does SVM classify them? Until now, we have only looked at linear hyper-planes.
Support Vector Machine
Identify the right hyper-plane (Scenario-4):
• SVM solves this problem easily, by introducing an additional feature. Here, we will add a new feature z = x^2 + y^2. Now, let's plot the data points on the x and z axes:
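A minimal sketch (with made-up ring-shaped data) of how the added feature z = x^2 + y^2 makes the classes linearly separable:

```python
# Minimal sketch (made-up points): two classes arranged in concentric rings are
# not linearly separable in (x, y), but after adding z = x^2 + y^2 a simple
# threshold on z separates them.
import numpy as np

rng = np.random.default_rng(3)
r_in, a_in = rng.uniform(0, 1.5, 20), rng.uniform(0, 2 * np.pi, 20)      # inner class
r_out, a_out = rng.uniform(3.5, 4.5, 20), rng.uniform(0, 2 * np.pi, 20)  # outer class
inner = np.column_stack([r_in * np.cos(a_in), r_in * np.sin(a_in)])
outer = np.column_stack([r_out * np.cos(a_out), r_out * np.sin(a_out)])

z_inner = inner[:, 0] ** 2 + inner[:, 1] ** 2
z_outer = outer[:, 0] ** 2 + outer[:, 1] ** 2

# In the (x, z) plane the classes are separated by a horizontal line.
threshold = 0.5 * (z_inner.max() + z_outer.min())
print("max z (inner):", z_inner.max(), "min z (outer):", z_outer.min())
print("linear decision rule in z: predict outer class if z >", threshold)
```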
Types of SVM

• Linear SVM:
- Linear SVMs use a linear decision boundary to separate the data points of different classes.
- Linear SVMs are suitable when the data can be precisely linearly separated, i.e., a single straight line (in 2D) or a hyperplane (in higher dimensions) can entirely divide the data points into their respective classes.
• Non-Linear SVM:
- Non-linear SVMs can be used to classify data that cannot be separated into two classes by a straight line (in the 2D case).
- By using kernel functions, non-linear SVMs can handle non-linearly separable data.
Types of SVM
Support Vector Machine
1-Dimensional Data Transformation
The kernel trick uses the kernel function to work with non-linearly separable data.
A polynomial kernel with degree 2 has been applied to transform the data from 1-dimensional to 2-dimensional.
Support Vector Machine
2-Dimensional Data Transformation
In the 2-dimensional case, the kernel trick is applied similarly with a polynomial kernel of degree 2.
The observations can be classified successfully using a linear plane after projecting the data into the higher-dimensional space.
Support Vector Machine - Kernel Trick
SVM algorithms use a set of mathematical functions defined as kernels. The function of a kernel is to take data as input and transform it into the required form.
Firstly, a kernel takes the data from its original space and implicitly maps it to a higher-dimensional space. This is crucial when dealing with data that is not linearly separable in its original form.
Instead of performing computationally expensive high-dimensional calculations, the kernel function calculates the relationships or similarities between pairs of data points as if they were in this higher-dimensional space.
Support Vector Machine - Kernel Trick
Numerical Example of how a Kernel Function works

| Feature (x) | -6 | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| x²          | 36 | 25 | 16 | 9  | 4  | 1  | 0 | 1 | 4 | 9 | 16 | 25 | 36 |
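A minimal sketch reproducing this degree-2 mapping (the slide gives only x and x², no class labels, so only the mapping itself is shown):

```python
# Minimal sketch of the table above: the degree-2 feature map x -> x^2. The
# slide gives only x and x^2 (no class labels), so this just reproduces the
# mapping and notes why it helps.
import numpy as np

x = np.arange(-6, 7)       # -6, -5, ..., 6
x_squared = x ** 2

for xi, zi in zip(x, x_squared):
    print(f"x = {xi:3d} -> x^2 = {zi:3d}")

# In the (x, x^2) plane a threshold on x^2 is a straight horizontal line, so
# points that are only separable by |x| in 1-D become linearly separable.
```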
Kernel Functions
Let κ(x, x') ≥ 0 be some measure of similarity between objects x, x' ∈ X, where X is some abstract space; we will call κ a kernel function.
We define a kernel function to be a real-valued function of two arguments, κ(x, x') ∈ R, for x, x' ∈ X. Typically the function is symmetric (i.e., κ(x, x') = κ(x', x)) and non-negative (i.e., κ(x, x') ≥ 0).
Linear Kernel:
If we let φ(x) = x, we get the linear kernel, defined by just the dot product between the two object vectors: κ(x, x') = xᵀx'.
This is useful if the original data is already high dimensional and the original features are individually informative,
• e.g., a bag-of-words representation where the vocabulary size is large, or the expression levels of many genes.
• In such a case, the decision boundary is likely to be representable as a linear combination of the original features, so it is not necessary to work in some other feature space.
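A minimal sketch of the linear kernel described above, alongside the standard polynomial and RBF kernels for comparison (the input vectors are made up):

```python
# Minimal sketch: the linear kernel described above (phi(x) = x, so
# kappa(x, x') = x . x'), plus the standard polynomial and RBF kernels for
# comparison; the inputs a, b are made up.
import numpy as np

def linear_kernel(x, x2):
    return np.dot(x, x2)

def polynomial_kernel(x, x2, degree=2, c=1.0):
    return (np.dot(x, x2) + c) ** degree

def rbf_kernel(x, x2, gamma=0.5):
    return np.exp(-gamma * np.sum((x - x2) ** 2))

a = np.array([1.0, 2.0])
b = np.array([2.0, 0.5])
print("linear:    ", linear_kernel(a, b))      # 1*2 + 2*0.5 = 3.0
print("polynomial:", polynomial_kernel(a, b))  # (3 + 1)^2 = 16.0
print("RBF:       ", rbf_kernel(a, b))         # exp(-0.5 * 3.25) ≈ 0.197
```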
Kernel Functions
