UNIT II – Machine Learning

UNIT – II

CHAPTER - I

Modelling
&
Evaluation

TRAINING A MODEL (FOR SUPERVISED LEARNING)
1. Holdout Method

Drawback
 The division of the data of different classes into the training and test data may not be proportionate.
 This problem can be solved by applying stratified random sampling.
 By using this technique, the whole data is broken into several homogeneous groups, or strata, and a random sample is selected from each such stratum.
 This ensures that the generated random partitions have equal proportions of each class.
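As a minimal sketch of a stratified holdout split (scikit-learn assumed; the data set here is only a stand-in for any feature matrix X and label vector y):

# Holdout split with stratified sampling, sketched with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions equal in the training and test partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)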
K-fold Cross-validation method
A special variant of the holdout method, called repeated holdout, is sometimes employed:
 In this technique, several random holdouts are used to measure the model performance.
 In the end, the average of all the performances is taken.
 As multiple holdouts have been drawn, the training and test data are more likely to contain representative data from all classes and to resemble the original input data closely.
 This process of repeated holdout is the basis of the k-fold cross-validation technique. In k-fold cross-validation, the data set is divided into k completely distinct or non-overlapping random partitions called folds.

Overall approach for K-fold cross-validation
The value of ‘k’ in k-fold cross-validation can be set to any number. However, there are two approaches which are extremely popular:
1. 10-fold cross-validation (10-fold CV)
2. Leave-one-out cross-validation (LOOCV)
10-fold cross-validation approach:
In this approach, each of the 10 folds comprises approximately 10% of the data. One of the folds is used as the test data for validating the performance of the model trained on the remaining 9 folds (or 90% of the data). This is repeated 10 times, once for each of the 10 folds being used as the test data, with the remaining folds as the training data.
 The average performance across all folds is then reported.
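A minimal sketch of 10-fold cross-validation with scikit-learn (assumed library; the data set and classifier are only placeholders):

# 10-fold cross-validation: each fold serves once as the test data.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=42)

scores = cross_val_score(model, X, y,
                         cv=KFold(n_splits=10, shuffle=True, random_state=42))
print(scores.mean())   # average performance across all 10 folds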

 Figure depicts the detailed approach of selecting the ‘ k’ folds in k-fold

cross-validation.

 As can be observed in the figure, each of the circles resembles a

record in the input data set whereas the different colours indicate

the different classes that the records belong to.

 The entire data set is broken into ‘k’ folds – out of which one fold is

selected in each iteration as the test data set.

 The fold selected as test data set in each of the ‘ k’ iterations is

different. As shown in figure, the circles resemble the records in the

input data set, the contiguous circles represented as folds are not

subsequent records in the data set. The records in a fold are drawn by

using random sampling technique.


Detailed approach for fold selection
Leave-one-out cross-validation (LOOCV) is an extreme case of k-fold cross-validation which uses one record or data instance at a time as the test data. This is done to maximize the count of data used to train the model. It is obvious that the number of iterations for which it has to be run is equal to the total number of data instances in the input data set. Hence, it is computationally very expensive and not used much in practice.

 Bootstrap sampling: It is used to identify training and test data sets from the input data set.
 It uses the technique of Simple Random Sampling with Replacement (SRSWR).
 This technique randomly picks data instances from the input data set, with the possibility of the same data instance being picked multiple times.
 This means that from an input data set having ‘n’ data instances, bootstrapping can create one or more training data sets having ‘n’ data instances, with some of the data instances repeated multiple times. This technique is useful for input data sets of small size, i.e. with very few data instances.
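A rough sketch of bootstrap sampling (SRSWR) using NumPy only; the variable names and sizes are illustrative, not taken from the slides:

import numpy as np

rng = np.random.default_rng(42)
n = 20                              # number of data instances
data = np.arange(n)                 # stand-in for the input data set

# draw n instances WITH replacement, so some indices repeat
bootstrap_idx = rng.choice(n, size=n, replace=True)
train = data[bootstrap_idx]

# instances never picked ("out-of-bag") can serve as test data
test = data[~np.isin(data, bootstrap_idx)]
print(train, test)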
Lazy vs. Eager learner

• Eager learning:
 It tries to construct an input-independent target function during the model training phase.
 It comes up with a trained model at the end of the learning phase.
 Hence, when the test data comes in for classification, the eager learner is ready with the model and doesn’t need to refer back to the training data.
 It takes more time in the learning phase than the lazy learners.
 Some of the algorithms which adopt the eager learning approach include Decision Tree, Support Vector Machine, Neural Network, etc.

• Lazy learning:
 It doesn’t ‘learn’ anything.
 It uses the training data exactly as given and uses that knowledge to classify the unlabelled test data.
 Since lazy learning uses the training data as-is, it is also known as rote learning (i.e. a memorization technique based on repetition).
 Due to its heavy dependency on the given training data instances, it is also known as instance learning.
 Lazy learning is also called non-parametric learning.
 Lazy learners take very little time in training because not much training actually happens.
 One of the most popular algorithms for lazy learning is k-nearest neighbour.

MODEL REPRESENTATION AND INTERPRETABILITY
• The goal of supervised machine learning is to learn or derive a target function which can best determine the target variable from the set of input variables.
• A key consideration in learning the target function from the training data is the extent of generalization. This is because the input data is just a limited, specific view, and the new, unknown data in the test data set may differ quite a bit from the training data.
• The fitness of a target function approximated by a learning algorithm determines how correctly it is able to classify a set of data it has never seen.

Underfitting
• If the target function is kept too simple, it may not be able to capture the essential nuances and represent the underlying data well.
• A typical case of underfitting may occur when trying to represent non-linear data with a linear model, as demonstrated by both cases of underfitting shown in the figure.
• Many times underfitting happens due to the unavailability of sufficient training data. Underfitting results in both poor performance with the training data and poor generalization to the test data. Underfitting can be avoided by
1. using more training data
2. reducing features by effective feature selection
Underfitting and Overfitting of models
Overfitting
 The model has been designed in such a way that it matches the training data too closely.
 This has an impact on the performance of the model on the test data.
 Overfitting occurs as a result of trying to fit an excessively complex model to closely match the training data.
 The target function tries to make sure that all training data points are correctly partitioned by the decision boundary.
 This exact nature is not replicated in the unknown test data set; hence, the target function results in wrong classifications in the test data set.
Overfitting results in good performance with the training data set but poor performance with the test data set. Overfitting can be avoided by
1. using re-sampling techniques like k-fold cross-validation
2. holding back a validation data set
3. removing the nodes which have little or no predictive power for the given machine learning problem.
Bias – variance trade-off
• In supervised learning, the class value assigned by the learning model built based on the training data may differ from the actual class value. This error in learning can be of two types – errors due to ‘bias’ and errors due to ‘variance’.
Errors due to ‘Bias’:
These arise from simplifying assumptions made by the model to make the target function less complex or easier to learn. In short, they are due to underfitting of the model. Parametric models generally have high bias, making them easier to understand/interpret and faster to learn. These algorithms have poor performance on data sets which are complex in nature and do not align with the simplifying assumptions made by the algorithm. Underfitting results in high bias.
Errors due to ‘Variance’
• Errors due to variance occur from differences in the training data sets used to train the model.
• Different training data sets are used to train the model.
• Ideally, the difference between the data sets should not be significant, and models trained using different training data sets should not be too different.
• However, in case of overfitting, since the model closely matches the training data, even a small difference in the training data gets overstated in the model.
Bias-variance trade-off
• So, the problems in training a model can happen because either (a) the model is too simple and hence interprets the data only grossly, or (b) the model is extremely complex and magnifies even small differences in the training data.
• Increasing the bias will decrease the variance, and increasing the variance will decrease the bias.
• Parametric algorithms demonstrate high bias but low variance.
• Non-parametric algorithms demonstrate low bias but high variance.
• The goal of supervised machine learning is to achieve a balance between bias and variance.
• For example, in k-Nearest Neighbours (kNN), the user-configurable parameter ‘k’ can be used to make a trade-off between bias and variance. When the value of ‘k’ is increased, the model becomes simpler and bias increases; when ‘k’ is decreased, the model follows the training data more closely and variance increases.
EVALUATING PERFORMANCE OF A MODEL
Supervised learning - classification
 In supervised learning, one major task is classification.
 The responsibility of the classification model is to assign a class label to the target feature based on the values of the predictor features.
 For example, in the problem of predicting the win/loss in a cricket match, the classifier will assign a class value win/loss to the target feature based on the values of other features like
whether the team won the toss,
number of spinners in the team,
number of wins the team had in the tournament, etc.
 To evaluate the performance of the model, the number of correct classifications or predictions made by the model has to be recorded.
 A classification is said to be correct if, say for example in the given problem, the model has predicted that the team will win and it has actually won.
The accuracy of the model is calculated based on the number of correct and incorrect classifications or predictions made by the model.
 If the model has classified correctly 99 out of 100 times, i.e. 99% accuracy, this may be reasonably good in the case of a sports win predictor model, but
 the same number, 99%, may not be acceptable if the problem is to predict a critical illness. In this case, even the 1% of incorrect predictions may lead to the loss of many lives. So the model performance needs to be evaluated in the light of the learning problem.
So, let’s start by looking at model accuracy more closely. There are four possibilities with regard to the cricket match win/loss prediction:
1. the model predicted win and the team won
2. the model predicted win and the team lost
3. the model predicted loss and the team won
4. the model predicted loss and the team lost
In this problem, the obvious class of interest is ‘win’.
The first case, i.e. the model predicted win and the team won, is a case where the model has correctly classified data instances as the class of interest. These cases are referred to as True Positive (TP) cases. The second case, i.e. the model predicted win and the team lost, is a case where the model has incorrectly classified data instances as the class of interest. These cases are referred to as False Positive (FP) cases.
 The third case, i.e. the model predicted loss and the team won, is a case where the model has incorrectly classified the instance as not the class of interest. These cases are referred to as False Negative (FN) cases.
 The fourth case, i.e. the model predicted loss and the team lost, is a case where the model has correctly classified the instance as not the class of interest. These cases are referred to as True Negative (TN) cases.
All these four cases are depicted in Figure 3.7.
For any classification model, model accuracy is given by the total number of correct classifications (either as the class of interest, i.e. True Positive, or as not the class of interest, i.e. True Negative) divided by the total number of classifications done.
 A matrix containing the correct and incorrect predictions in the form of TPs, FPs, FNs and TNs is known as a confusion matrix.
 The win/loss prediction of a cricket match has two classes of interest – win and loss.
 For that reason it will generate a 2 × 2 confusion matrix.
 For a classification problem involving three classes, the confusion matrix would be 3 × 3, and so on.
Let’s assume the confusion matrix of the win/loss prediction of the cricket match problem to be as below. In the context of this confusion matrix, the total count of TPs = 85, the count of FPs = 4, the count of FNs = 2 and the count of TNs = 9.
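Laid out as a 2 × 2 table from the counts just stated, the confusion matrix is:

                    Actual: win    Actual: loss
Predicted: win        TP = 85        FP = 4
Predicted: loss       FN = 2         TN = 9

Using the standard definition, the model accuracy then works out as:
Accuracy = (TP + TN) / (TP + FP + FN + TN) = (85 + 9) / 100 = 0.94, i.e. 94%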

The percentage of misclassifications is indicated using the error rate. In the context of the above confusion matrix, the error rate is worked out below.
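Under the standard definition, with the counts from the matrix above:
Error rate = (FP + FN) / (TP + FP + FN + TN) = (4 + 2) / 100 = 0.06, i.e. 6% (equivalently, 1 − accuracy)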

Sometimes a correct prediction, both a TP as well as a TN, may happen by mere coincidence. Since such chance agreement inflates the accuracy figure, it should be adjusted for. The kappa value of a model indicates this adjusted model accuracy; it is worked out below for the confusion matrix above.
In the context of the above confusion matrix, the total count of TPs = 85, the count of FPs = 4, the count of FNs = 2 and the count of TNs = 9.
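Taking kappa in its commonly used form (the exact notation is assumed here):
κ = (P_o − P_e) / (1 − P_e)
where P_o is the observed accuracy and P_e is the accuracy expected purely by chance. With the counts above,
P_o = 0.94 and P_e = (89/100)(87/100) + (11/100)(13/100) ≈ 0.7886,
so κ ≈ (0.94 − 0.7886) / (1 − 0.7886) ≈ 0.72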

Kappa value can be 1 at the maximum, which represents perfect


agreement between model’s prediction and actual values. 27
In certain learning problems it is critical to have an extremely low number of FN cases, because a false negative can have serious consequences – for example, in cancer detection. In these problems there are some measures of model performance which are more important than accuracy. Two such measures are the sensitivity and the specificity of the model.
The sensitivity of a model measures the proportion of TP examples, i.e. positive cases, which were correctly classified. In such problems, a high value of sensitivity is more desirable than a high value of accuracy. In the context of the above confusion matrix for the cricket match win prediction problem, the sensitivity is worked out below.
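Using the standard definition and the counts from the confusion matrix above:
Sensitivity = TP / (TP + FN) = 85 / (85 + 2) ≈ 0.977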

The specificity of a model measures the proportion of negative examples which have been correctly classified. In the context of malignancy prediction of tumours, specificity gives the proportion of benign tumours which have been correctly classified. A higher value of specificity indicates a better model performance. In the context of the above confusion matrix for the cricket match win prediction problem, the specificity is worked out below.
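Again with the standard definition and the counts above:
Specificity = TN / (TN + FP) = 9 / (9 + 4) ≈ 0.692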

There are two other performance measures of a supervised learning model which are similar to sensitivity and specificity: precision and recall. While precision gives the proportion of positive predictions which are truly positive, recall gives the proportion of TP cases over all actually positive cases.
Precision indicates the reliability of a model in predicting the class of interest. When the model is related to the win/loss prediction of cricket, precision indicates how often it predicts a win correctly. A model with higher precision is perceived to be more reliable. In the context of the above confusion matrix for the cricket match win prediction problem, the precision is worked out below.
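With the standard definition and the counts above:
Precision = TP / (TP + FP) = 85 / (85 + 4) ≈ 0.955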

Recall indicates the proportion of correct predictions of positives to the total number of positives. In the case of win/loss prediction in cricket, recall gives the proportion of the total wins which were predicted correctly. In the context of the above confusion matrix for the cricket match win prediction problem, the recall is worked out below.
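By the standard definition (the same ratio as sensitivity):
Recall = TP / (TP + FN) = 85 / (85 + 2) ≈ 0.977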

F-measure is another measure of model performance which combines precision and recall. It takes the harmonic mean of precision and recall. In the context of the above confusion matrix for the cricket match win prediction problem, the F-measure is worked out below.
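In its usual form, using the precision and recall worked out above:
F-measure = (2 × Precision × Recall) / (Precision + Recall) = (2 × 0.955 × 0.977) / (0.955 + 0.977) ≈ 0.966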

Supervised learning – regression
A regression model which ensures that the difference between the predicted and actual values is low can be considered a good model.
The following figure represents a very simple problem of real estate value prediction solved using a linear regression model.
If ‘area’ is the predictor variable (input variable, say x) and ‘value’ is the target variable (output variable, say y), the linear regression model can be represented in the form shown below.
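In its simplest form (simple linear regression, with an assumed intercept a and slope b):
value = a + b × area, i.e. ŷ = a + b·x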

FIG. 3.9 Error – Predicted vs. actual value

 For a certain value of x, say x̂, the value of y is predicted as ŷ, whereas the actual value of y is Y (say).
 The distance between the actual value and the fitted or predicted value ŷ is known as the residual.
 The regression model can be considered well fitted if the residual, i.e. the difference between the actual and predicted values, is small.
R-squared is a good measure to evaluate the model fitness. It is also known as the coefficient of determination, or, for multiple regression, the coefficient of multiple determination.
The R-squared value lies between 0 and 1 (0%–100%), with a larger value representing a better fit. It is calculated as shown below.
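One common way of writing it, with SST and SSE as defined next, is:
R² = 1 − SSE / SST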

Sum of Squares Total (SST) = sum of the squared differences of each observation from the overall mean:

SST = \sum_{i=1}^{n} (y_i - \bar{y})^2

where \bar{y} is the mean of the actual values.

Sum of Squared Errors (SSE) (of prediction) = sum of the squared residuals:

SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

where \hat{y}_i is the predicted value of y_i and y_i is the actual value.
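As a small illustrative sketch in Python (the numbers and variable names are made up, not taken from the slides), these quantities and R-squared can be computed as follows:

import numpy as np

area = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])   # predictor x (illustrative)
value = np.array([310.0, 450.0, 610.0, 740.0, 900.0])        # target y (illustrative)

b, a = np.polyfit(area, value, deg=1)        # fitted slope and intercept
predicted = a + b * area

sst = np.sum((value - value.mean()) ** 2)    # total sum of squares
sse = np.sum((value - predicted) ** 2)       # sum of squared residuals
r_squared = 1 - sse / sst
print(round(r_squared, 4))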

Unsupervised learning - clustering
 Clustering algorithms try to form groupings amongst the data instances.
 But it is quite difficult to evaluate the performance of a clustering algorithm.
 There are two challenges which lie in the process of clustering:
 It is generally not known how many clusters can be formulated from a particular data set.
 Even if the number of clusters is given, the same number of clusters can be formed with different groups of data instances.
A clustering algorithm is successful if the clusters identified using the algorithm are able to achieve the right results in the overall problem domain.
There are two popular approaches adopted for cluster quality evaluation.

1. Internal evaluation
 The internal evaluation methods generally measure cluster quality based on the homogeneity of data belonging to the same cluster and the heterogeneity of data belonging to different clusters.
 The homogeneity/heterogeneity is decided by some similarity measure.
 For example, the silhouette coefficient, one of the most popular internal evaluation methods, uses distance (most commonly Euclidean or Manhattan distance) between data elements as a similarity measure.
 The value of silhouette width ranges between –1 and +1, with a high value indicating high intra-cluster homogeneity and inter-cluster heterogeneity.
For a data set clustered into ‘k’ clusters, silhouette width is calculated as shown below.
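For a single data instance i, the silhouette width is commonly written (with a(i) and b(i) as explained next) as:

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\ b(i)\}}

and the silhouette width of the whole clustering is the average of s(i) over all data instances.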

a(i) is the average distance between the i-th data instance and all other data instances belonging to the same cluster, and b(i) is the lowest average distance between the i-th data instance and the data instances of all the other clusters.

FIG. Silhouette width calculation


Let’s try to understand this in the context of the example depicted in the figure. There are four clusters, namely clusters 1, 2, 3 and 4. Let’s consider an arbitrary data element ‘i’ in cluster 1, represented by the asterisk.
a(i) is the average of the distances a_i1, a_i2, ..., a_in1 of the different data elements in cluster 1 from the i-th data element, assuming there are n1 data elements in cluster 1.
In the same way, let’s calculate the distances of the data element ‘i’ in cluster 1 from the different data elements of another cluster, say cluster 4, and take the average of all those distances, giving b14(average). Mathematically, the two averages can be written as shown below.
where n4 is the total number of elements in cluster 4. In the same way, we can calculate the values of b12(average) and b13(average). b(i) is the minimum of all these values. Hence, we can say that
b(i) = minimum [b12(average), b13(average), b14(average)]

2. External evaluation
 In this approach, the class labels are already known for the data set.
 However, the known class labels are not used during clustering.
 The clustering algorithm is assessed based on how close its results are to those known class labels.
For example, purity is one of the most popular measures of cluster quality – it evaluates the extent to which clusters contain a single class.
For a data set having ‘n’ data instances and ‘c’ known class labels, which generates ‘k’ clusters, purity is measured as shown below.
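In one common form (notation assumed), with \omega_j denoting the set of data instances placed in cluster j and c ranging over the known classes:

\text{purity} = \frac{1}{n} \sum_{j=1}^{k} \max_{c} |\omega_j \cap c|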

IMPROVING PERFORMANCE OF A MODEL
We have already discussed earlier that model selection is done based on several aspects:
1. Type of learning task in hand, i.e. supervised or unsupervised
2. Type of the data, i.e. categorical or numeric
3. Sometimes the problem domain
4. Above all, experience in working with different models to solve problems of diverse domains

What are the different ways to improve the performance of models?

Model parameter tuning is the process of adjusting the model fitting options. For example, in the popular classification model k-Nearest Neighbour (kNN), the model can be tuned by using different values of ‘k’, the number of nearest neighbours to be considered. In the same way, the number of hidden layers can be adjusted to tune the performance of a neural network model. Most machine learning models have at least one parameter which can be tuned.
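As a minimal sketch of such parameter tuning for kNN via grid search (scikit-learn assumed; the parameter grid and data set are only illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# try several values of 'k' and keep the best-performing one
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9, 11]},
    cv=5,                      # 5-fold cross-validation for each value of k
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)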
As an alternative approach to increasing the performance of one model, several models may be combined together, i.e. one model may learn one type of data set well while struggling with another type, and another model may perform well with the data set which the first one struggled with. This approach of combining different models with diverse strengths is known as ensemble (depicted in Figure 3.11). Ensembling helps in averaging out the biases of the different underlying models and also in reducing the variance. Ensemble methods combine weaker learners to create stronger ones. A performance boost can be expected even if the models are built as usual and then ensembled.

FIG. Ensemble

Following are the typical steps in the ensemble process:
 Build a number of models based on the training data.
 For diversifying the models generated, the training data subset can be varied using the allocation function.
 Sampling techniques like bootstrapping may be used to generate unique training data sets.
 Alternatively, the same training data may be used but the models combined are quite different, e.g. SVM, neural network, kNN, etc.
 The outputs from the different models are combined using a combination function.
 A very simple strategy of combining, say in case of a prediction task using an ensemble, can be majority voting of the different models combined.
 For example, if 3 out of 5 models predict ‘win’ and 2 predict ‘loss’, then the final outcome of the ensemble using majority vote would be ‘win’.
One of the earliest and most popular ensemble models is bootstrap aggregating, or bagging. Bagging uses the bootstrap sampling method to generate multiple training data sets. These training data sets are used to generate (or train) a set of models, and the outcomes of the models are combined by majority voting (classification) or by averaging (regression).
Just like bagging, boosting is another key ensemble-based technique. In this type of ensemble, weaker learning models are trained on resampled data and the outcomes are combined using a weighted voting approach based on the performance of the different models. Adaptive boosting, or AdaBoost, is a special variant of the boosting algorithm. It is based on the idea of sequentially generating weak learners, with each new learner concentrating on the examples the earlier ones found difficult.
Random forest is another ensemble-based technique. It is an ensemble of decision trees – hence the name random forest, indicating a forest of decision trees.
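A compact sketch of the three ensemble techniques named above, using scikit-learn (assumed library; defaults kept deliberately simple):

from sklearn.datasets import load_iris
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# bagging of decision trees, boosting, and a random forest
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
forest = RandomForestClassifier(n_estimators=50, random_state=42)

for name, model in [("bagging", bagging), ("boosting", boosting), ("random forest", forest)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())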
