

Machine Learning Basics:


Supervised Learning Algorithms
Sargur N. Srihari
[email protected]

This is part of lecture slides on Deep Learning:


http://www.cedar.buffalo.edu/~srihari/CSE676

What is Supervised Learning?


• Refers to learning algorithms that learn to
associate some input with some output given a
training set of inputs x and outputs y
• Outputs may be collected automatically or
provided by a human supervisor


Probabilistic Supervised Learning


• Most supervised learning algorithms are based
on estimating a probability distribution p(y|x)
• We can do this by using MLE to find the best
parameter vector θ for a parametric family of
distributions p(y|x;θ)
• Linear regression corresponds to the family
p(y|x;θ)=N(y|θTx,I)
• We can generalize linear regression to
classification by using a different family of
probability distributions
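As a minimal sketch of this idea (not from the slides): under p(y|x;θ)=N(y|θTx,I), maximizing the log-likelihood in θ reduces to ordinary least squares, so the MLE has a closed form. The data and names below are illustrative.

```python
import numpy as np

# Minimal sketch: MLE for p(y|x; theta) = N(y | theta^T x, I) is least squares,
# so theta can be recovered with the normal equations. Data is synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 examples, 3 features
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + rng.normal(size=100)  # Gaussian noise with identity covariance

theta_mle, *_ = np.linalg.lstsq(X, y, rcond=None)  # solves min ||X theta - y||^2
print(theta_mle)                           # close to true_theta
```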
Probabilistic Supervised Classification
• If we only have two classes we only need to
specify the distribution for one of these classes
– The probability of the other class is known
– For linear regression, the normal distribution
p(y|x;θ)=N(y|θTx,I) was parameterized by its
mean θTx; any value we supply for this mean is valid
– For classification, a distribution over a binary
variable is more complex: its mean must lie
between 0 and 1
• Solved using the logistic sigmoid to squash the
output of a linear function into the interval (0,1):
p(y=1|x;θ)=σ(θTx)
• Known as logistic regression
– Linear regression has a closed-form solution
– But logistic regression has no closed-form solution
• Its negative log-likelihood is minimized with gradient descent
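A minimal sketch of that gradient-descent loop, with synthetic data and illustrative hyperparameters:

```python
import numpy as np

# Minimal sketch of logistic regression trained by gradient descent on the
# negative log-likelihood; dataset and learning rate are illustrative.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # synthetic binary labels

theta = np.zeros(2)
learning_rate = 0.1
for _ in range(1000):
    p = sigmoid(X @ theta)                  # p(y=1|x;theta) = sigma(theta^T x)
    grad = X.T @ (p - y) / len(y)           # gradient of the mean negative log-likelihood
    theta -= learning_rate * grad
print(theta)                                # separating direction roughly along [1, 1]
```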

Strategy for Supervised Learning


• Same strategy can be applied to any
supervised learning problem
– Write down a parametric family of conditional
probability distributions over the right kind of
input and output variables


Support Vector Machines


• An influential approach to supervised learning
• Model is similar to logistic regression in that it is
driven by a linear function wTx+b
– Unlike logistic regression, SVM does not provide
probabilities, but only outputs class identity
• SVM predicts positive class when wTx+b>0
• SVM predicts negative class when wTx+b<0
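A sketch of just this decision rule; w and b stand in for parameters learned by the optimization described later.

```python
import numpy as np

# Sketch of the SVM decision rule: class identity only, no probabilities.
# w and b are placeholders for learned parameters.
def svm_predict(X, w, b):
    return np.where(X @ w + b > 0, +1, -1)
```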


Kernel trick
• A key innovation associated with SVM
• Many ML algorithms can be written as dot
products between examples
– Ex: linear function used by SVM can be rewritten as
$$f(x) = w^T x + b = b + \sum_{i=1}^{m} \alpha_i\, x^T x^{(i)}$$

• where x^(i) is a training example and α is a vector of coefficients


– Replace x by a feature function ϕ(x) and the dot
product with a kernel function k(x,x^(i)) = ϕ(x)·ϕ(x^(i))
• The · operator represents the inner product ϕ(x)^T ϕ(x^(i))
• We may not literally use the inner product
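A quick numerical check of the rewrite above, assuming w can be expressed as a linear combination of training examples (all values below are synthetic):

```python
import numpy as np

# Check: if w = sum_i alpha_i x^(i), then w^T x + b equals
# b + sum_i alpha_i x^T x^(i). All values are synthetic.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 3))   # five training examples x^(i)
alpha = rng.normal(size=5)          # coefficients alpha_i
b = 0.3
w = X_train.T @ alpha               # w as a combination of training examples

x = rng.normal(size=3)
primal = w @ x + b
dual = b + alpha @ (X_train @ x)    # b + sum_i alpha_i <x, x^(i)>
assert np.isclose(primal, dual)
```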

SVM Optimization Problem


Equation of the plane: g(x) = wᵀx + w₀. Distance from x to the plane: r = g(x)/||w||.
Support vectors are the training vectors that lie closest to the separating plane.
(In the augmented notation below, a denotes the weight vector and y_k the training patterns.)

For each input vector, let z_k = ±1 depending on whether input k is in class C1 or C2.
Thus if g(y) = 0 is a separating hyperplane then z_k g(y_k) > 0, k = 1,…,n.

Since the distance of a point y to the hyperplane g(y) = 0 is g(y)/||a||, we can require
that all points be at least distance b from it:

$$\frac{z_k\, g(y_k)}{\lVert a \rVert} \ge b, \qquad k = 1,\ldots,n$$

The goal is to find the weight vector a that satisfies this while maximizing b. To ensure
uniqueness we impose the constraint b·||a|| = 1, or b = 1/||a||, which implies that ||a||²
is to be minimized. Support vectors are the training vectors for which equality holds.

This is a quadratic optimization problem: minimize a quadratic subject to linear
inequality constraints:

$$\arg\min_{a}\ \tfrac{1}{2}\lVert a \rVert^2 \quad \text{subject to} \quad z_k\, a^T y_k \ge 1,\ \ k = 1,\ldots,n$$

It can be cast as an unconstrained problem by introducing Lagrange multipliers, one
multiplier α_k per constraint. The Lagrange function is

$$L(a,\alpha) = \tfrac{1}{2}\lVert a \rVert^2 - \sum_{k=1}^{n} \alpha_k \left[ z_k\, a^T y_k - 1 \right]$$

Dual problem:

$$L(\alpha) = \sum_{k=1}^{n} \alpha_k - \tfrac{1}{2} \sum_{k=1}^{n} \sum_{j=1}^{n} \alpha_k \alpha_j z_k z_j\, k(y_j, y_k)$$

subject to the constraints

$$\sum_{k=1}^{n} \alpha_k z_k = 0, \qquad \alpha_k \ge 0,\ \ k = 1,\ldots,n$$

where the kernel is defined as k(y_j, y_k) = y_jᵀ y_k = φ(x_j)ᵀ φ(x_k).
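A small sketch solving this dual with a generic constrained optimizer (SciPy's SLSQP) on a toy separable dataset; a dedicated QP solver would be used in practice, and the data, kernel choice (linear), and tolerance are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: hard-margin SVM dual solved with a generic constrained optimizer.
X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -1.0], [-1.5, -2.0]])
z = np.array([1.0, 1.0, -1.0, -1.0])   # labels z_k in {+1, -1}
K = X @ X.T                            # linear kernel matrix k(y_j, y_k)

def neg_dual(alpha):                   # negate L(alpha) so we can minimize
    return 0.5 * alpha @ ((np.outer(z, z) * K) @ alpha) - alpha.sum()

constraint = {"type": "eq", "fun": lambda a: a @ z}   # sum_k alpha_k z_k = 0
res = minimize(neg_dual, np.zeros(4), bounds=[(0, None)] * 4,
               constraints=constraint)

alpha = res.x
w = (alpha * z) @ X                    # recover w = sum_k alpha_k z_k y_k
sv = alpha > 1e-6                      # support vectors have alpha_k > 0
b = np.mean(z[sv] - X[sv] @ w)         # from z_k (w^T y_k + b) = 1 at the margin
print(alpha, w, b)
```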

Prediction using SVM


• After replacing dot products with kernel
evaluations, we can make predictions using
$$f(x) = b + \sum_i \alpha_i\, k\left(x, x^{(i)}\right)$$

• predicting the positive class when f(x) > 0

– The function is nonlinear in x but the relationship
between f (x) and ϕ(x) is linear
• Also the relationship between α and f (x) is linear
• The kernel-based function is exactly equivalent
to preprocessing the data by applying ϕ(x) to all
inputs, then learning a linear model in the new
transformed space
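A sketch of this prediction function; alpha, b, and the training set are assumed to come from an already-trained model, and the kernel used here is the Gaussian kernel discussed below.

```python
import numpy as np

# Sketch of kernel-based prediction f(x) = b + sum_i alpha_i k(x, x^(i)).
def gaussian_kernel(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

def svm_decision(x, X_train, alpha, b, kernel=gaussian_kernel):
    # b plus the kernel evaluations against every training example
    f = b + sum(a * kernel(x, xi) for a, xi in zip(alpha, X_train))
    return +1 if f > 0 else -1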

Efficacy & Efficiency of kernel


1. Kernel trick allows us to learn models that are
nonlinear as a function of x using convex
optimization guaranteed to converge efficiently
– Possible because ϕ is fixed and we optimize only α
2. Kernel k can be implemented more efficiently than
constructing ϕ vectors and taking their dot product
– k(x,x’) tractable even when ϕ(x) is intractable


Gaussian Kernel
• Most commonly used kernel is k(u,v) = N(u−v; 0, σ²I)
– Called radial basis: decreases along lines radiating
from u
• To see that it is a valid kernel
– Consider k(u,v) = exp(−||u−v||²/2σ²)
• By expanding the square: ||u−v||² = uᵀu + vᵀv − 2uᵀv
• we get k(u,v) = exp(−uᵀu/2σ²)·exp(uᵀv/σ²)·exp(−vᵀv/2σ²)
– Validity follows from kernel construction rules
• If k1(x,x’) is a valid kernel, so are
• k(x,x’)=f(x)k1(x,x’)f(x’) and k(x,x’)=exp(k1(x,x’))
• together with validity of linear kernel k(u,v)=uTv
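A numerical check of this factorization with arbitrary vectors (values are illustrative):

```python
import numpy as np

# Check: exp(-||u-v||^2/2s^2) = exp(-u.u/2s^2) * exp(u.v/s^2) * exp(-v.v/2s^2)
rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)
sigma = 1.5

lhs = np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))
rhs = (np.exp(-u @ u / (2 * sigma ** 2))
       * np.exp(u @ v / sigma ** 2)
       * np.exp(-v @ v / (2 * sigma ** 2)))
assert np.isclose(lhs, rhs)
```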

Intuition of Gaussian kernel


• It performs a kind of template matching
• When a test point x’ is near a template x its
response is high, putting a large weight on the
associated training label y
• Overall, the prediction combines many such
training labels weighted by the similarity of the
corresponding training examples
• SVM is not the only algorithm enhanced by the
kernel trick
– Methods employing the kernel trick are known as
kernel methods

Disadvantages of Kernel Methods


• Cost of decision function evaluation: linear in m
– Because the ith example contributes a term
αik(x,x(i)) to the decision function
– Can mitigate this by learning an α with mostly zeros
• Classification requires evaluating the kernel function only
for training examples that have a nonzero αi
• These are known as support vectors
• Cost of training: high with large data sets
• Generic kernels struggle to generalize well
– Neural network outperformed RBF kernel SVM on
the MNIST benchmark

Other simple supervised learning


• K-nearest neighbor
• Decision trees


K-Nearest Neighbors
• A family of techniques that can be used for
regression or classification
• As a nonparametric learning algorithm:
– it is not restricted to a fixed number of parameters
• A simple function of the training data
– When we want to produce output y for input x, we
find the k nearest neighbors to x in the training data X.
We then return the average of the corresponding y
values in the training set
• Works for any kind of supervised learning where we can
define an average over y values
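A minimal sketch of this procedure for the regression case (a classification variant follows on the next slide); names are illustrative.

```python
import numpy as np

# k-NN regression: average the y values of the k nearest training examples.
def knn_predict(x, X_train, y_train, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every example
    nearest = np.argsort(dists)[:k]              # indices of the k nearest
    return y_train[nearest].mean()
```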

K-Nearest Neighbor Classification


• We can average over one-hot vectors c with
c_y = 1 and c_i = 0 for all other values of i
– Can interpret the average values as giving a
probability distribution over classes
• K-nn classification has high capacity
– High accuracy given a large training set
• 1-nearest-neighbor error rate approaches at most twice the Bayes
error rate as the training set size grows, assuming 0-1 loss
– Error in excess of the Bayes rate arises from choosing a neighbor by
breaking ties between equally distant neighbors randomly
– If we let all equally distant neighbors vote, we approach the Bayes rate

• A weakness: it cannot learn that one feature is more
discriminative than another
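A sketch of the one-hot-averaging classifier described above; it assumes integer class labels, and the names are illustrative.

```python
import numpy as np

# k-NN classification by averaging one-hot vectors: the average over the
# k nearest neighbors is read as a distribution over classes.
def knn_classify(x, X_train, y_train, n_classes, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    onehot = np.eye(n_classes)[y_train[nearest]]  # rows c with c_y = 1
    probs = onehot.mean(axis=0)                   # estimated class probabilities
    return probs.argmax(), probs
```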

Decision Tree
• A learning algorithm that breaks the input
space into regions and has separate
parameters for each region
• Each node is associated with a region
of input space
• Internal nodes break that region into one
subregion for each child


How a decision tree works

Each node of the tree chooses to send the input example to the
child node on the left (0) or the child node on the right (1).
Internal nodes are circles; leaf nodes are squares.

The tree divides space into regions. A 2-D plane shows how a
decision tree might divide R². The nodes are plotted in this plane,
with each internal node drawn along with the dividing line it uses
to categorize examples, and leaf nodes drawn in the center of the
region of examples they receive. The result is a piecewise-constant
function, with one piece per leaf.
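A minimal sketch of such a piecewise-constant partition, using scikit-learn's DecisionTreeClassifier on XOR-like data; the dataset and depth are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# A depth-2 tree carves R^2 into axis-aligned regions, predicting a
# constant per leaf.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)   # XOR-like regions

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(tree.predict([[0.5, -0.5], [0.5, 0.5]]))    # constant within each region
```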
