EDA Lecture Module 2

CSE3506 - Essentials of Data Analytics

Facilitator: Dr Sathiya Narayanan S

Assistant Professor (Senior)


School of Electronics Engineering (SENSE), VIT-Chennai

Email: sathiyanarayanan.s@vit.ac.in
Handphone No.: +91-9944226963

Winter Semester 2020-21



Suggested Readings

Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, “An Introduction to Statistical Learning with Applications in R”, Springer Texts in Statistics, 2013 (Facilitator’s Recommendation).

Ethem Alpaydin, “Introduction to Machine Learning”, 3rd Edition, PHI Learning Private Limited, 2019.



Contents

1 Module 2: Classification



Module 2: Classification

Topics to be covered in Module-2

Logistic Regression
Bayes’ Theorem for classification
Decision Trees
Bagging, Boosting and Random Forest
Hyperplane for Classification
Support Vector Machines



Module 2: Classification

Logistic Regression

The most common problems that occur when we fit a linear regression model to a particular data set are: (i) non-linearity of the response-predictor relationships, (ii) outliers, and (iii) correlation of error terms.
Moreover, the linear regression model assumes that the response variable Y is quantitative (or numerical). But in many situations, Y is instead qualitative (or categorical).
Consider predicting whether an individual will default on his or her
credit card payment, on the basis of annual income and monthly
credit card balance. Since the outcome is not quantitative, the linear
regression model is not appropriate.
In general, if the response Y falls into one of two categories (Yes or
No), logistic regression is used.
Module 2: Classification

Logistic Regression

Rather than modeling Y directly, logistic regression models the probability that Y belongs to a particular category.
For example, in the case of predicting whether an individual will
default on his or her credit card payment on the basis of monthly
credit card balance, logistic regression models the probability of
default as

Pr (default=Yes | balance) = p(balance).

The values of p(balance) range from 0 to 1. For any given value of balance, a prediction can be made for default. For example, one might predict default=Yes for any individual for whom p(balance) exceeds a predefined threshold.
Logistic regression uses a logistic function to model this probability.
Module 2: Classification

Logistic Regression

The logistic function for predicting the probability of Y on the basis of a single predictor variable X can be expressed as
$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
where $\beta_0$ and $\beta_1$ are the model parameters.
To fit the above model (i.e. to determine $\beta_0$ and $\beta_1$), a method called maximum likelihood is used.
The estimates of $\beta_0$ and $\beta_1$ are chosen to maximize the likelihood function:
$$\ell(\beta_0, \beta_1) = \prod_{i:\, y_i = 1} p(x_i) \times \prod_{i':\, y_{i'} = 0} \big(1 - p(x_{i'})\big).$$
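As a small illustration, the sketch below fits $\beta_0$ and $\beta_1$ by maximum likelihood using scikit-learn; the library choice and the toy balance/default data are assumptions, not part of the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: monthly balance (single predictor X) and default status (response Y).
X = np.array([[300], [700], [1200], [1800], [2100], [2500]])  # balance
y = np.array([0, 0, 0, 1, 1, 1])                              # default: Yes = 1, No = 0

# C is the inverse regularization strength; a very large C approximates
# plain maximum-likelihood estimation (no penalty on the coefficients).
model = LogisticRegression(C=1e6)
model.fit(X, y)

beta0, beta1 = model.intercept_[0], model.coef_[0][0]
print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")

# Predicted probability p(balance) for a new individual.
print(model.predict_proba([[1500]])[0, 1])  # Pr(default=Yes | balance=1500)
```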



Module 2: Classification

Logistic Regression

The logistic function can be manipulated as follows:
$$\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}$$
The quantity $\frac{p(X)}{1 - p(X)}$ is called the odds, and can take on any value between 0 and $\infty$. Values of the odds close to 0 and $\infty$ indicate very low and very high probabilities of default, respectively.
Taking the logarithm on both sides of the above equation gives the log-odds or logit:
$$\log_e\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X.$$
The logit of a logistic regression model is linear in X. Note that $\log_e(\cdot)$ is the natural logarithm, usually denoted $\ln(\cdot)$.
Module 2: Classification

Logistic Regression

Logistic regression can be extended to multiple logistic regression (i.e. to make a 2-class prediction based on p predictor variables $X_1, X_2, \ldots, X_p$). The logistic function for multiple logistic regression can be expressed as
$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p}}.$$
Model parameters can be chosen to maximize the same likelihood function as in the case of a single predictor variable.
The logit of a multiple logistic regression model will be linear in $\{X_1, X_2, \ldots, X_p\}$.
Logistic regression can be extended to predict a response variable that
has more than two classes as well. However, for such tasks,
discriminant analysis is preferred.
Module 2: Classification
Question 2.1
Consider the following training examples
Marks scored: X = [81 42 61 59 78 49]
Grade (Pass/Fail): Y = [Pass Fail Pass Fail Pass Fail]
Assume we want to model the probability of Y of the form
$$p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}},$$
which is parameterized by $(\beta_0, \beta_1)$.

(i) Which of the following parameters would you use to model p(x)?
(a) (-119, 2)
(b) (-120, 2)
(c) (-121, 2)
(ii) With the chosen parameters, what should be the minimum mark to
ensure the student gets a ‘Pass’ grade with 95% probability?
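One way to check the candidates numerically is to evaluate the likelihood of each $(\beta_0, \beta_1)$ pair on the training examples and then invert the logistic function for the 95% requirement; the sketch below is my own working, not part of the slides.

```python
import numpy as np

X = np.array([81, 42, 61, 59, 78, 49])
y = np.array([1, 0, 1, 0, 1, 0])  # Pass = 1, Fail = 0

def likelihood(b0, b1):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * X)))
    # Product of p(x_i) over Pass examples and (1 - p(x_i)) over Fail examples.
    return np.prod(np.where(y == 1, p, 1.0 - p))

for b0, b1 in [(-119, 2), (-120, 2), (-121, 2)]:
    print((b0, b1), likelihood(b0, b1))

# Part (ii): p(x) >= 0.95  <=>  b0 + b1*x >= log(0.95 / 0.05).
b0, b1 = -120, 2  # whichever pair gives the largest likelihood above
print("minimum mark:", (np.log(0.95 / 0.05) - b0) / b1)
```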



Module 2: Classification

Bayes’ Theorem for Classification

Bayes’ theorem is used in formulating the optimal classifier. The classification task is: given an input x, find the class $\omega_i$ it belongs to. Assume there are $K \geq 2$ classes: $\omega_1, \omega_2, \ldots, \omega_K$.
The likelihood function of class k (i.e. the probability that class k has x in it) is represented as $p(x|\omega_k)$ for $k = 1, 2, \ldots, K$.
The probability of deciding that x belongs to $\omega_k$ is denoted as $p(\omega_k|x)$. This probability distribution is generally unknown and it can be estimated using Bayes’ theorem:
$$p(\omega_k|x) = \frac{p(x|\omega_k)\, p(\omega_k)}{p(x)}$$
where $p(\omega_k)$ is the probability of occurrence of class k and $p(x)$ is the probability of occurrence of x. Note that $p(x)$ is independent of k.



Module 2: Classification

Bayes’ Theorem for Classification


Both $p(x|\omega_k)$ and $p(\omega_k)$ are a priori probabilities and they can be estimated using training data. Using these a priori probabilities, the posterior probability $p(\omega_k|x)$ or its equivalent can be estimated.
The decision function for Bayes’ classifier is
$$d_j(x) = -\sum_{k=1}^{K} L_{kj}\, p(x|\omega_k)\, p(\omega_k)$$
where $L_{kj}$ is the loss/penalty due to misclassification. In general, $L_{kj}$ takes a value between 0 and 1. Since $p(x)$ is independent of k, it becomes a common term and hence it is not included in $d_j(x)$.
The decision is stated as follows:
$$x \rightarrow \omega_i \quad \text{if } d_i = \max_{j}\{d_j\} \text{ for } j = 1, 2, \ldots, K.$$
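The sketch below evaluates this decision rule for K = 2 classes with a 0/1 loss; the priors and likelihood values are assumed toy numbers, used only to make the computation concrete.

```python
import numpy as np

# Assumed toy quantities for K = 2 classes (not from the slides).
priors = np.array([0.6, 0.4])            # p(w_k)
likelihoods = np.array([0.02, 0.10])     # p(x | w_k) for the given input x
L = np.array([[0.0, 1.0],                # L[k, j]: loss of deciding w_j when w_k is true
              [1.0, 0.0]])               # 0/1 loss here

# d_j(x) = - sum_k L[k, j] * p(x | w_k) * p(w_k)
d = -(L * (likelihoods * priors)[:, None]).sum(axis=0)

decided_class = np.argmax(d) + 1
print("decision values:", d, "-> decide class", decided_class)
```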



Module 2: Classification

Question 2.2
Assume A and B are Boolean random variables (i.e. they take one of the
two possible values: True and False).
Given: p(A = True) = 0.3, p(A = False) = 0.7,
p(B = True|A = True) = 0.4, p(B = False|A = True) = 0.6,
p(B = True|A = False) = 0.6, p(B = False|A = False) = 0.4.

Calculate p(A = True|B = False) by applying Bayes’ rule.

Hint: Use the relation

p(B = False) = p(B = False|A = True) × p(A = True) + p(B = False|A = False) × p(A = False).
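A quick numerical check of this exercise (the arithmetic below is a sketch, not part of the slides):

```python
# Given probabilities
p_A = 0.3                      # p(A = True); p(A = False) = 0.7
p_Bf_given_At = 0.6            # p(B = False | A = True)
p_Bf_given_Af = 0.4            # p(B = False | A = False)

# Total probability: p(B = False)
p_Bf = p_Bf_given_At * p_A + p_Bf_given_Af * (1 - p_A)

# Bayes' rule: p(A = True | B = False)
p_At_given_Bf = p_Bf_given_At * p_A / p_Bf
print(p_Bf, p_At_given_Bf)     # 0.46 and roughly 0.391
```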



Module 2: Classification
Bayes’ Theorem for Classification
Naive Bayes’ Classifier (NBC) assumes conditional independence.
Two random variables A and B are said to be conditionally
independent given another random variable C if
p(A ∩ B|C ) = p(A, B|C ) = p(A|C ) × p(B|C ).
This implies, as long as the value of C is known and fixed, A and B
are independent. Equivalently, p(A|B, C ) = p(A|C ).
NBC is termed naive because of this strong assumption which is
unrealistic (for real data), yet very effective.
The joint probability distribution of n random variables $A_1, A_2, \ldots, A_n$ can be expressed as a product of n localized probabilities:
$$p\Big(\cap_{k=1}^{n} A_k\Big) = \prod_{k=1}^{n} p\Big(A_k \,\Big|\, \cap_{j=1}^{k-1} A_j\Big).$$
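A minimal naive Bayes sketch under the conditional-independence assumption described above; the class priors and per-feature conditional probabilities are made-up illustrative values.

```python
# Assumed quantities for a 2-class problem with two binary features A1, A2.
priors = {"w1": 0.5, "w2": 0.5}                       # p(w_k)
cond = {                                              # p(A_j = True | w_k)
    "w1": {"A1": 0.8, "A2": 0.3},
    "w2": {"A1": 0.2, "A2": 0.7},
}

def posterior_scores(a1, a2):
    """Score each class by p(w_k) * p(A1|w_k) * p(A2|w_k) (naive factorization)."""
    scores = {}
    for k in priors:
        p1 = cond[k]["A1"] if a1 else 1 - cond[k]["A1"]
        p2 = cond[k]["A2"] if a2 else 1 - cond[k]["A2"]
        scores[k] = priors[k] * p1 * p2
    total = sum(scores.values())                      # normalizing by p(x)
    return {k: v / total for k, v in scores.items()}

print(posterior_scores(a1=True, a2=False))
```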



Module 2: Classification
Bayes’ Theorem for Classification
Consider the Bayesian network in Figure 1. It is a directed acyclic
graph in which each edge corresponds to a conditional dependency,
and each node corresponds to a unique random variable.
The network has 4 nodes: Cloudy, Sprinkler, Rain and WetGrass.
Since Cloudy has an edge going into Rain, it means that
p(Rain|Cloudy) will be a factor, whose probability values are specified
next to the Rain node in a conditional probability table.
Note that Sprinkler is conditionally independent of Rain given
Cloudy. Therefore,

p(Sprinkler|Cloudy, Rain) = p(Sprinkler|Cloudy).

Using the relationships specified by the Bayesian network, the joint


probability distribution can be obtained as a product of n factors (i.e.
n probabilities) by taking advantage of conditional independence.
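For the network in Figure 1, the implied factorization is p(Cloudy, Sprinkler, Rain, WetGrass) = p(Cloudy) p(Sprinkler|Cloudy) p(Rain|Cloudy) p(WetGrass|Sprinkler, Rain). The sketch below evaluates this product with placeholder conditional probability tables, since Figure 1's actual tables are not reproduced in the text; all numbers are hypothetical.

```python
# Placeholder CPTs (the true values are in Figure 1's tables).
p_cloudy = {True: 0.5, False: 0.5}
p_sprinkler_given_c = {True: 0.1, False: 0.5}               # keyed by Cloudy
p_rain_given_c = {True: 0.8, False: 0.2}                    # keyed by Cloudy
p_wet_given_sr = {(True, True): 0.99, (True, False): 0.9,
                  (False, True): 0.9, (False, False): 0.0}  # keyed by (Sprinkler, Rain)

def joint(c, s, r, w):
    """p(C, S, R, W) as the product of the four factors of the network."""
    pc = p_cloudy[c]
    ps = p_sprinkler_given_c[c] if s else 1 - p_sprinkler_given_c[c]
    pr = p_rain_given_c[c] if r else 1 - p_rain_given_c[c]
    pw = p_wet_given_sr[(s, r)] if w else 1 - p_wet_given_sr[(s, r)]
    return pc * ps * pr * pw

# Question 2.3(a)(i): p(Cloudy=T, Sprinkler=T, Rain=F, WetGrass=T)
print(joint(True, True, False, True))
```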
Module 2: Classification

Figure 1: Bayesian network - example 1



Module 2: Classification

Question 2.3

(a) Consider the Bayesian network in Figure 1. Evaluate the following probability distribution functions:
(i) p(Cloudy = True, Sprinkler = True, Rain = False, WetGrass = True)
(ii) p(Cloudy = True, Sprinkler = False, Rain = True, WetGrass = True)

(b) Consider the Bayesian network in Figure 2. Evaluate the following probability distribution functions:
(i) p(a = 1, b = 0, c = 1, d = 1, e = 0)
(ii) p(a = 1, b = 1, c = 2, d = 0, e = 1)
(iii) p(a = 1, b = 1, c = 2, d = 0)



Module 2: Classification

Figure 2: Bayesian network - example 2



Module 2: Classification

Decision Trees
A decision tree is a hierarchical model for supervised learning. It can
be applied to both regression and classification problems.
A decision tree consists of decision nodes (root and internal) and leaf
nodes (terminal). Figure 3 shows a data set and its classification tree
(i.e. decision tree for classification).
Given an input, at each decision node, a test function is applied and
one of the branches is taken depending on the outcome of the
function. The test function gives discrete outcomes labeling the
branches (say for example, Yes or No).
The process starts at the root node (topmost decision node) and is
repeated recursively until a leaf node is hit. Each leaf node has an
output label (say for example, Class 0 or Class 1).
During the learning process, the tree grows; branches and leaf nodes are added depending on the data.
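A minimal classification-tree sketch with scikit-learn; the toy data and the library choice are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: two features per observation, two classes (0 and 1).
X = np.array([[2.0, 3.0], [1.0, 1.5], [3.5, 4.0], [4.0, 1.0], [5.0, 4.5], [0.5, 0.5]])
y = np.array([0, 0, 1, 1, 1, 0])

# Each internal node applies a univariate test x_j <= threshold; leaves carry class labels.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["x1", "x2"]))  # IF-THEN view of the fitted tree
print(tree.predict([[3.0, 2.0]]))
```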
Module 2: Classification

Figure 3: Data set (left) and the corresponding decision tree (right) - Example of
a classification tree.



Module 2: Classification

Decision Trees

Decision trees do not assume any parametric form for the class
densities and the tree structure is not fixed a priori. Therefore, a
decision tree is a non-parametric model.
Different decision trees assume different models for the test function,
say f (·). In a decision tree, the assumed model for f (·) defines the
shape of the classified regions. For example, in Figure 3, the test
functions define ‘rectangular’ regions.
In a univariate decision tree, the test function in each decision node
uses only one of the input dimensions.
In a classification tree, the ‘goodness of a split’ is quantified by an
impurity measure. Popular among them are entropy and Gini index. If
the split is such that, for all branches, all the instances choosing a
branch belong to the same class, then it is pure.
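A short sketch of the two impurity measures mentioned above, evaluated on a pure and a mixed branch (the class counts are made up):

```python
import numpy as np

def entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log2(p)).sum()

def gini(counts):
    p = np.asarray(counts, dtype=float) / sum(counts)
    return 1.0 - (p ** 2).sum()

# A pure branch (all instances in one class) versus a mixed branch.
print(entropy([10, 0]), gini([10, 0]))   # 0.0, 0.0  -> pure
print(entropy([5, 5]), gini([5, 5]))     # 1.0, 0.5  -> maximally impure for 2 classes
```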



Module 2: Classification

Question 2.4

What is specified at any non-leaf node in a decision tree?

(a) Class of instance (Class 0 or Class 1)
(b) Data value description
(c) Test function/specification
(d) Data process description



Module 2: Classification

Advantages of Decision Trees

Fast localization of the region covering an input - due to hierarchical placement of decisions. If the decisions are binary, it requires only $\log_2(b)$ decisions to localize b regions (in the best case). In the case of classification trees, there is no need to create dummy variables while handling qualitative predictors.
Easily interpretable (in graphical form) and can be converted to easily
understandable IF-THEN rules. To some extent, decision trees mirror
human decision-making. For this reason, decision trees are sometimes
preferred over more accurate but less interpretable methods.

Disadvantages of Decision Trees

Greedy learning approach - they look for the best split at each step.
Low prediction accuracy compared to methods like regression.



Module 2: Classification

Bagging, Boosting and Random Forest

Since the prediction accuracy of a decision tree is low (due to high variance), techniques like bagging, random forests, and boosting aggregate many decision trees to construct more powerful prediction models.
Bagging creates multiple copies of the original training data using
the bootstrap (i.e. random sampling), fits a separate decision tree to
each copy, and then combines all of the trees in order to create a
single, powerful prediction model. Each tree is independent of the
other trees.
Boosting works in a way similar to bagging, except that the trees are
grown sequentially. Boosting does not involve random sampling;
instead each tree is grown using information from previously grown
trees (i.e. fit on a modified version of the original training data).
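A minimal sketch contrasting the two ideas with scikit-learn (the dataset and hyperparameters are assumptions): bagging fits trees independently on bootstrap samples, while boosting grows shallow trees sequentially.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: bootstrap copies of the training data, one tree per copy (the default
# base estimator is a decision tree), predictions combined by voting.
bag = BaggingClassifier(n_estimators=100, random_state=0)

# Boosting: shallow trees grown sequentially, each fit to what the previous trees got wrong.
boost = GradientBoostingClassifier(n_estimators=100, max_depth=2, random_state=0)

for name, model in [("bagging", bag), ("boosting", boost)]:
    model.fit(X_tr, y_tr)
    print(name, model.score(X_te, y_te))
```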



Module 2: Classification

Bagging, Boosting and Random Forest


As in bagging, random forests build a number of decision trees on
bootstrapped training data. While building these trees, for each split,
a random sample of m predictors is chosen as split candidates from
the full set of p predictors and one among these m is used.
Suppose that there is one very strong predictor in the data set, along
with a number of other moderately strong predictors. In this case,
bootstrap aggregation (i.e. bagging) will not lead to a substantial
reduction in variance over a single tree.
Since in random forests only m out of p predictors are considered for each split, on average $(p-m)/p$ of the splits will not even consider the strong predictor, and therefore other predictors stand a chance. This decorrelation process reduces the variance in the average of the resulting trees and hence improves the reliability and the prediction accuracy. Typically, $m \approx \sqrt{p}$.
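A random forest sketch with m ≈ √p split candidates per node, via scikit-learn's max_features="sqrt"; the data and settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# p = 20 predictors, so roughly sqrt(20) ~ 4-5 candidates are examined at each split.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=1)
forest.fit(X_tr, y_tr)

print("test accuracy:", forest.score(X_te, y_te))
print("first few feature importances:", forest.feature_importances_.round(3)[:5])
```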
Module 2: Classification

Question 2.5

Using a small value of m in building a random forest will typically be helpful when

(a) the number of correlated samples is zero
(b) the number of correlated samples is small
(c) the number of correlated samples is large
(d) all predictors in the data set are moderately strong



Module 2: Classification
Hyperplane for Classification
A hyperplane is a flat subspace of dimension p-1, in a p-dimensional
space. It is mathematically defined as
α0 + α1 X1 + α2 X2 + ... + αp Xp = 0.
The set of points X = {X1 , X2 , ...Xp } (i.e. vectors of length p)
satisfying the above equation lie on the hyperplane.
Suppose that,
α0 + α1 X1 + α2 X2 + ... + αp Xp > 0.
This shows the set of points lie on one side of the hyperplane.
On the other hand, if
α0 + α1 X1 + α2 X2 + ... + αp Xp < 0,
then the set of points lie on the other side of the hyperplane.
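A small NumPy sketch of this side-of-the-hyperplane test, using the hyperplane 1 + 2X1 + 3X2 = 0 from Figure 4:

```python
import numpy as np

alpha0, alpha = 1.0, np.array([2.0, 3.0])   # hyperplane 1 + 2*X1 + 3*X2 = 0

points = np.array([[1.0, 1.0],    # 1 + 2 + 3 = 6  > 0 -> one side
                   [-1.0, -1.0],  # 1 - 2 - 3 = -4 < 0 -> other side
                   [1.0, -1.0]])  # 1 + 2 - 3 = 0      -> on the hyperplane

values = alpha0 + points @ alpha
print(values)            # [ 6. -4.  0.]
print(np.sign(values))   # +1, -1 or 0 tells which side each point falls on
```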
Module 2: Classification

Hyperplane for Classification

In a 2-dimensional space (i.e. for p = 2), a hyperplane is a line dividing the space into two halves. Figure 4 shows the hyperplane $1 + 2X_1 + 3X_2 = 0$ dividing a 2-dimensional space into two. Similarly, for p = 3, a hyperplane is a plane dividing the 3-dimensional space into two halves. In p > 3 dimensions, it becomes hard to visualize a hyperplane, but the notion of dividing p-dimensional space into two halves still applies.
Consider training data X of dimension n × p (i.e. an n × p data matrix consisting of n training observations in p-dimensional space) in which each of the observations falls into one of two classes, say Class -1 and Class 1. Now, given a test observation $x^*$ (i.e. a vector of p features or variables), the concept of a separating hyperplane can be used to develop a classifier that will correctly classify $x^*$.



Module 2: Classification

Figure 4: The hyperplane (i.e. line) $1 + 2X_1 + 3X_2 = 0$ in a 2-dimensional space. Blue region: set of points satisfying $1 + 2X_1 + 3X_2 > 0$. Purple region: set of points satisfying $1 + 2X_1 + 3X_2 < 0$.
Module 2: Classification

Hyperplane for Classification


If the class labels for Class -1 and Class 1 are $y_i = -1$ and $y_i = 1$, respectively, then the separating hyperplane has the property that
$$y_i(\alpha_0 + \alpha_1 x_{i,1} + \alpha_2 x_{i,2} + \ldots + \alpha_p x_{i,p}) > 0 \quad \text{for all } i = 1, 2, \ldots, n.$$
If there exists a hyperplane that separates the training observations perfectly according to their class labels, then $x^*$ can be assigned a class depending on which side of the hyperplane it is located.
As shown in Figure 5, a classifier based on a separating hyperplane leads to a linear boundary, and there can be more than one separating hyperplane. The separating hyperplane that is farthest from the training observations is considered for classification. It is called the optimal separating hyperplane or maximal margin hyperplane. Figure 6 shows one such hyperplane.
Module 2: Classification

Figure 5: Two classes of observations (shown in purple and blue), each having
two features/variables, and three separating hyperplanes.



Module 2: Classification

Figure 6: Two classes of observations (shown in purple and blue), each having two features/variables, and the optimal separating hyperplane or maximal margin hyperplane.
Module 2: Classification

Hyperplane for Classification


Let M represent the marigin of the hyperplane. The maximal marigin
hyperplane is the solution to the following optimization problem:

maximizeα0 ,α1 ,...,αp M


p
X
subject to αj2 = 1,
j=1
 p
X 
yi α0 + αj xij ≥ M for all i = 1, 2, ..., n.
j=1

The two constraints in the above optimization problem ensures that:


(i) each training observation is in the correct side of the hyperplane;
and (ii) each observation is located at least a distance M from the
hyperplane.
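In practice, the maximal margin classifier can be approximated with scikit-learn's SVC using a linear kernel and a very large cost parameter, so that almost no slack is allowed; this is a sketch under that assumption, on made-up separable data.

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data, labels in {-1, +1}.
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C leaves almost no slack, approximating the maximal margin hyperplane.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print("hyperplane coefficients:", clf.coef_[0], "intercept:", clf.intercept_[0])
print("support vectors (the points that determine the margin):")
print(clf.support_vectors_)
```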
Module 2: Classification

Hyperplane for Classification


As shown in Figure 7, the addition of a single observation leads to a dramatic change in the maximal margin hyperplane. Such highly sensitive hyperplanes are problematic in the sense that they may overfit the training data.
Consider a hyperplane that does not perfectly separate the two classes, in the interest of: (i) robustness to individual observations; and (ii) better classification of most of the training observations. A classifier based on such a hyperplane is called a support vector classifier (SVC) or soft margin classifier.
The underlying assumption is, allowing misclassification of a few
training observations will result in a better classification of the
remaining observations.
The SVC is a natural approach for two-class classification, if the
boundary between the two classes is linear.
Module 2: Classification

Figure 7: Two classes of observations (shown in purple and blue), each having
two features/variables, and two separating hyperplanes.

Module 2: Classification

Hyperplane for Classification

The hyperplane for the SVC is the solution to the following optimization problem:
$$\underset{\alpha_0, \alpha_1, \ldots, \alpha_p,\, \epsilon_1, \epsilon_2, \ldots, \epsilon_n}{\text{maximize}}\ M$$
$$\text{subject to } \sum_{j=1}^{p} \alpha_j^2 = 1,$$
$$y_i\Big(\alpha_0 + \sum_{j=1}^{p} \alpha_j x_{ij}\Big) \geq M(1 - \epsilon_i),$$
$$\epsilon_i \geq 0, \quad \sum_{i=1}^{n} \epsilon_i \leq C,$$
for all $i = 1, 2, \ldots, n$, where the slack variables $\epsilon_i$ allow individual observations to violate the margin and C is a non-negative tuning parameter.
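A soft-margin sketch: scikit-learn's SVC also exposes a parameter named C, but there it acts as a misclassification penalty (large C means a narrow, strict margin) rather than the slack budget in the formulation above; the data and values below are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Overlapping classes, so a perfectly separating hyperplane does not exist.
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, class_sep=0.8, random_state=0)

# Compare a tight margin (large penalty) with a wider, more tolerant margin.
for C in [100.0, 0.01]:
    clf = SVC(kernel="linear", C=C)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"C = {C}: mean CV accuracy = {scores.mean():.3f}")
```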


Module 2: Classification
Support Vector Machines
In real-world data, the class boundaries are often non-linear (as shown
in Figure 8) and in such scenarios, SVC or any linear classifier will
perform poorly.
In the case of the SVC, only inner products of the observations are required to compute its coefficients. This inner product can be generalized as $K(x_i, x_{i'})$, where K is some function referred to as a kernel. A linear kernel gives back the SVC.
To handle non-linear boundaries, a polynomial kernel of degree d (where d is a positive integer) is required. Using such a kernel with d > 1 leads to a more flexible decision boundary compared to that of an SVC. When the SVC is combined with a non-linear kernel, the resulting classifier is known as a support vector machine (SVM). Therefore, an SVM is an extension of the SVC that enlarges the feature space using polynomial kernels of degree d > 1, to handle non-linear boundaries.
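A polynomial-kernel sketch with d = 2, compared against a linear SVC on data with a circular class boundary (the dataset and parameters are assumptions):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two classes with a non-linear (circular) boundary: a linear SVC does poorly here.
X, y = make_circles(n_samples=400, factor=0.5, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear_svc = SVC(kernel="linear").fit(X_tr, y_tr)
poly_svm = SVC(kernel="poly", degree=2, coef0=1.0).fit(X_tr, y_tr)

print("linear SVC accuracy:", linear_svc.score(X_te, y_te))
print("degree-2 SVM accuracy:", poly_svm.score(X_te, y_te))
```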
Module 2: Classification

Figure 8: Two classes of observations (shown in purple and blue), with a non-linear boundary separating them.



Module 2: Classification

Support Vector Machines

The hyperplane for an SVM using a polynomial kernel of degree d = 2 is the solution to the following optimization problem:
$$\underset{\alpha_0, \alpha_{11}, \alpha_{12}, \ldots, \alpha_{p1}, \alpha_{p2},\, \epsilon_1, \epsilon_2, \ldots, \epsilon_n}{\text{maximize}}\ M$$
$$\text{subject to } \sum_{j=1}^{p} \sum_{k=1}^{2} \alpha_{jk}^2 = 1,$$
$$y_i\Big(\alpha_0 + \sum_{j=1}^{p} \alpha_{j1} x_{ij} + \sum_{j=1}^{p} \alpha_{j2} x_{ij}^2\Big) \geq M(1 - \epsilon_i),$$
$$\epsilon_i \geq 0, \quad \sum_{i=1}^{n} \epsilon_i \leq C, \quad \text{for all } i = 1, 2, \ldots, n.$$



Module 2: Classification

Support Vector Machines


A radial kernel or radial basis function (RBF) is a popular non-linear kernel used in SVMs. It takes the form
$$K(x_i, x_{i'}) = \exp\Big(-\gamma \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2\Big)$$
where $\gamma$ is a positive constant. For a test observation $x^*$ that is far from a training observation $x_i$, the value of $K(x^*, x_i)$ will be tiny. Therefore, the radial kernel has a local behavior, in the sense that only nearby observations have an effect on the predicted class labels.
Figure 9 shows an example of an SVM with a radial kernel on non-linear data.
Using kernels (instead of simply expanding the feature space) in an SVM is computationally advantageous. A kernel-based approach requires computation of $K(x_i, x_{i'})$ only for the $\frac{n(n-1)}{2}$ distinct pairs $i$ and $i'$.
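An RBF-kernel sketch on non-linearly separable data, in the spirit of Figure 9; the moons dataset and the γ value are assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: a non-linear boundary separates the classes.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# gamma plays the role of the positive constant in the radial kernel above;
# larger gamma makes the kernel more local.
rbf_svm = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X_tr, y_tr)

print("test accuracy:", rbf_svm.score(X_te, y_te))
```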
Module 2: Classification

Figure 9: SVM with a radial kernel, on non-linear data.
Module 2: Classification

Module-2 Summary

Logistic regression: modeling the probability that the response Y belongs to a particular category, using a logistic function, on the basis of a single variable or multiple variables.
Bayes’ theorem for classification: Bayes’ classifier using conditional independence.
Decision trees and random forests: a non-parametric, ‘information-based learning’ approach which is easy to interpret.
Hyperplane for classification: maximal margin classifier and SVC.
Support Vector Machines (SVMs): extension of the SVC to handle ‘non-linear boundaries’ between classes. Uses kernels for computational efficiency. The RBF kernel exhibits ‘local behavior’.

