Module 3 - Introduction To ML

Uploaded by

devaadi0713

MACHINE LEARNING

Presenter: Dr. Amit Kumar Das

Professor,
Dept. of Computer Science and Engg.,
Institute of Engineering & Management.
WHAT IS LEARNING?
TYPES OF HUMAN LEARNING

• Learning through direct guidance from an expert – is just one form …
• Learning through indirect guidance
• Learning by self
WHAT IS MACHINE LEARNING?
TYPES OF MACHINE LEARNING
• Supervised learning – also called predictive learning
• Unsupervised learning – also called descriptive learning
• Reinforcement learning
MACHINE LEARNING PROCESS
• What was the most difficult subject in the last semester?
• What if you had a list of all possible questions with answers, and a photographic memory?
MACHINE LEARNING PROCESS
• Data Input – Past data or information is utilized as a basis for future decision-making
• Abstraction – The input data is represented in a broader way through the underlying algorithm
• Generalization – The abstracted representation is generalized to form a framework for making decisions
TYPICAL ML PROBLEMS
• Prediction of results of a game
• Predicting whether a tumor is malignant or benign
• Price prediction in domains like real estate, stocks, etc.
• Demand forecasting in retail
• Customer segmentation
• Self-driving cars
PROBLEMS NOT TO BE CONSIDERED FOR ML

• Bank interest calculation
• Inventory management (except the demand forecast module)
• Customer on-boarding (except the risk prediction module)
• Tasks in which humans are very effective or frequent human intervention is needed, for example air traffic control
TYPES OF DATA
• Qualitative data (Categorical)
  ◦ Student Name, Blood group, Grade, etc.
• Quantitative data (Numerical)
  ◦ Temperature, Age, Weight, etc.
DATA EXPLORATION
• Understand the central tendency
  ◦ Mean
  ◦ Median
  ◦ Mode
• Understand data spread
  ◦ Standard Deviation
• Understand data value position

DATA EXPLORATION – CENTRAL TENDENCY

Mean vs. Median for Auto MPG

DATA EXPLORATION – DATA SPREAD
• Consider the data values of two attributes
  ◦ Attribute 1 values – 44, 46, 48, 45 and 47
  ◦ Attribute 2 values – 34, 46, 59, 39 and 52
• Both sets of values have a mean and median of 46.
• The first set of values is more concentrated or clustered around the mean / median value
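The comparison above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the helper name `summarize` is illustrative, and the sample standard deviation (dividing by n - 1) is an assumption, since the slide does not say which variant is intended.

```python
import statistics

# Attribute values taken from the slide above.
attribute_1 = [44, 46, 48, 45, 47]
attribute_2 = [34, 46, 59, 39, 52]

def summarize(values):
    """Return the mean, median and sample standard deviation of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }

summary_1 = summarize(attribute_1)
summary_2 = summarize(attribute_2)
print(summary_1)
print(summary_2)
```

Both attributes share the same mean and median, while the standard deviation of Attribute 2 is several times larger, which is what "less concentrated around the mean" looks like in numbers.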
DATA EXPLORATION – DATA VALUE POSITION

• Any data set attribute can be summarised by five values
  ◦ Minimum
  ◦ First quartile (Q1)
  ◦ Median (Q2)
  ◦ Third quartile (Q3), and
  ◦ Maximum

[Figure: number line marking Minimum, Q1, Median (Q2), Q3 and Maximum]
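As a rough illustration, the five values can be computed with the standard library. This sketch assumes the "inclusive" quartile convention of `statistics.quantiles`; other tools use slightly different conventions and may return different Q1 and Q3 values for the same data. The function name `five_number_summary` is illustrative.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # method="inclusive" treats the data as the whole population, so the
    # quartiles always lie between the observed minimum and maximum.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

values = [34, 46, 59, 39, 52]
minimum, q1, median, q3, maximum = five_number_summary(values)
iqr = q3 - q1  # inter-quartile range: the length of the box in a box plot
print(minimum, q1, median, q3, maximum, iqr)
```

The inter-quartile range (Q3 minus Q1) computed at the end is the quantity a box plot draws as its box.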
DATA EXPLORATION – BOX PLOT
DATA QUALITY

• The most frequently occurring data quality issues are:
  ◦ Missing values
  ◦ Outliers

Missing values of attribute “horsepower” in Auto MPG

REMEDIATE DATA ISSUES
• Remove missing values / outliers – If the number of affected records is not large, remove them.
• Imputation – Impute the value with the mean, median or mode
• Capping – For values that lie outside the 1.5 × IQR limits, cap them by replacing the observations below the lower limit with the value of the 5th percentile and those above the upper limit with the value of the 95th percentile
• Estimate missing values – Assign attribute values of similar data points in place of the missing value
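Imputation and capping can be sketched in a few lines. The example below is only an illustration: the `horsepower` values are made up, missing entries are assumed to be stored as `None`, the limits are taken as Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, and the function names are hypothetical.

```python
import statistics

def impute_missing(values, strategy="median"):
    """Replace None entries with the mean, median or mode of the known values."""
    known = [v for v in values if v is not None]
    chooser = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    fill_value = chooser[strategy](known)
    return [fill_value if v is None else v for v in values]

def cap_outliers(values):
    """Cap values outside the 1.5 x IQR limits at the 5th / 95th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # quantiles(n=100) returns 99 cut points; index 4 is the 5th percentile
    # and index 94 is the 95th percentile.
    percentiles = statistics.quantiles(values, n=100, method="inclusive")
    p5, p95 = percentiles[4], percentiles[94]
    capped = []
    for v in values:
        if v < lower_limit:
            capped.append(p5)
        elif v > upper_limit:
            capped.append(p95)
        else:
            capped.append(v)
    return capped

horsepower = [95, 110, None, 100, 105, 98, 102, 400]  # hypothetical values
imputed = impute_missing(horsepower, strategy="median")
capped = cap_outliers(imputed)
print(imputed)
print(capped)
```

With so few records the 95th percentile is itself pulled upward by the outlier, so capping works best on reasonably large data sets.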
ISSUES IN MACHINE LEARNING

• Relatively new and evolving technology
• In different countries, rules and regulations, cultural background and emotional maturity of people are drastically different
• Biggest fears – potential breach of privacy, discriminatory behaviour, resulting discontent
WHAT IS MODELLING IN CONTEXT OF MACHINE LEARNING?
WHAT ARE THE DIFFERENT ML ALGORITHMS?

• Supervised
  ◦ Classification – KNN, Naive Bayes, Decision Tree, etc.
  ◦ Regression – Simple Linear Regression, Logistic Regression
• Unsupervised
  ◦ Clustering – K-Means
  ◦ Market Basket Analysis
SUPERVISED LEARNING - CLASSIFICATION

[Figure: classification workflow with Labelled Training Data, Classifier, Classification Model and Test Data]
SUPERVISED LEARNING - REGRESSION

y = α + βx
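For the line y = α + βx, the intercept α and slope β are usually estimated by ordinary least squares. The following is a minimal sketch of that estimate; the data points are hypothetical and chosen to lie exactly on y = 2 + 3x, and the function name is illustrative.

```python
def fit_simple_linear_regression(x, y):
    """Return (alpha, beta) minimising the squared error of y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # beta = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Hypothetical data lying exactly on y = 2 + 3x.
x = [1, 2, 3, 4, 5]
y = [5, 8, 11, 14, 17]
alpha, beta = fit_simple_linear_regression(x, y)
print(alpha, beta)
```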
UNSUPERVISED LEARNING

[Figure: Unlabelled Data → Unsupervised Learning Model → Grouped data / Clusters]

UNSUPERVISED LEARNING - CLUSTERING

[Figure: data points grouped into Cluster 1, Cluster 2, Cluster 3 and Cluster 4]
UNSUPERVISED LEARNING – MARKET BASKET ANALYSIS
SELECTING A MODEL

• Predictive models (supervised)
  ◦ Predict the value of a category or class
    - Problems that can be solved: prediction of win/loss, fraudulent transactions, etc.
    - Examples: k-Nearest Neighbor (kNN), Naïve Bayes, Decision Tree, etc.
  ◦ Predict numerical values of the target
    - Problems that can be solved: prediction of revenue growth, rainfall amount, etc.
    - Examples: Linear Regression, Logistic Regression, etc.
SELECTING A MODEL
• Descriptive models (unsupervised)
  ◦ Group together similar data instances
    - Problems that can be solved: customer grouping or segmentation based on social, demographic, ethnic, etc. factors
    - The most popular model for clustering is k-Means
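To make k-Means concrete, here is a minimal sketch of the usual assign-then-update loop for 2-D points. The data is hypothetical, and seeding the centroids with the first k points is a simplifying assumption; practical implementations use better initialisation.

```python
import math

def k_means(points, k, iterations=100):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    centroids = [points[i] for i in range(k)]  # simplistic start: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return centroids, labels

# Hypothetical data: two visibly separate groups of points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = k_means(points, k=2)
print(labels)
```

The loop stops as soon as an iteration leaves every assignment unchanged, which is the usual convergence test for this algorithm.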
TRAIN A MODEL – HOLDOUT METHOD
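[Figure: Input Data is split into Training Data (70% - 80%) and Test Data (20% - 30%); the Training Data produces the Trained Model, and the Test Data is used to measure Model Performance]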
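A holdout split along these lines can be sketched as follows. Shuffling before the split and the fixed `seed` are assumptions made for reproducibility, and the function name `holdout_split` is illustrative.

```python
import random

def holdout_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into (training, test) partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))  # stand-in for 100 data records
train, test = holdout_split(records, train_fraction=0.8)
print(len(train), len(test))
```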
K-FOLD CROSS-VALIDATION – OVERALL APPROACH
K-FOLD CROSS-VALIDATION – DETAILED APPROACH
K-FOLD CROSS-VALIDATION (CONTD.)
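In k-fold cross-validation the data is divided into k folds, and each fold takes one turn as the test set while the remaining folds are used for training. The diagrams for these slides are not available, so the following is a minimal sketch of that standard procedure rather than the exact variant presented; the function name is illustrative and no shuffling is done.

```python
def k_fold_indices(n_records, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_records))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, k=5))
for train_idx, test_idx in folds:
    print(test_idx, train_idx)
```

The model's performance is then typically reported as the average over the k test folds.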
BOOTSTRAP SAMPLING / BOOTSTRAPPING
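Bootstrap sampling draws records at random with replacement, so a record may appear several times in the sample or not at all. The sketch below assumes the sample has the same size as the input and that the records never drawn (the "out-of-bag" records) are kept for testing; the function name is illustrative.

```python
import random

def bootstrap_sample(records, seed=None):
    """Return (sample, out_of_bag) where sample is drawn with replacement."""
    rng = random.Random(seed)
    sample = rng.choices(records, k=len(records))  # sampling with replacement
    drawn = set(sample)
    out_of_bag = [r for r in records if r not in drawn]  # never selected
    return sample, out_of_bag

records = list(range(20))  # stand-in for 20 data records
sample, out_of_bag = bootstrap_sample(records, seed=42)
print(sample)
print(out_of_bag)
```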
TRAIN A MODEL – UNDER VS. OVER FIT

[Figure: three panels showing Under fit, Balanced fit and Over fit]

TRAIN A MODEL – BIAS VS. VARIANCE
EVALUATING A MODEL - CLASSIFICATION

                     Actual Win             Actual Loss
Predicted Win        True Positive (TP)     False Positive (FP)
Predicted Loss       False Negative (FN)    True Negative (TN)

• True Positive (TP) – Predicted win, Actual win
• True Negative (TN) – Predicted loss, Actual loss
• False Positive (FP) – Predicted win, Actual loss
• False Negative (FN) – Predicted loss, Actual win
• For both TP and TN, the predicted outcome matches the actual outcome. Hence, they are correct classifications.
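The four outcomes can be counted directly from paired actual and predicted labels. This is a minimal sketch with made-up labels; "Win" is treated as the class of interest, and the function name is illustrative.

```python
def confusion_counts(actual, predicted, positive="Win"):
    """Count TP, TN, FP and FN for a two-class problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted win, actual win
        elif p != positive and a != positive:
            tn += 1  # predicted loss, actual loss
        elif p == positive:
            fp += 1  # predicted win, actual loss
        else:
            fn += 1  # predicted loss, actual win
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical outcomes for six games.
actual = ["Win", "Win", "Loss", "Loss", "Win", "Loss"]
predicted = ["Win", "Loss", "Loss", "Win", "Win", "Loss"]
counts = confusion_counts(actual, predicted)
print(counts)
```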
EVALUATING A MODEL – CLASSIFICATION (CONTD.)
                     Actual Win    Actual Loss
Predicted Win        85 (TP)       4 (FP)
Predicted Loss       2 (FN)        9 (TN)

The percentage of misclassifications is indicated using the error rate, which is measured as:

\[ \text{Error rate} = \frac{FP + FN}{TP + FP + FN + TN} \]

In context of the above confusion matrix,

\[ \text{Error rate} = \frac{4 + 2}{85 + 4 + 2 + 9} = \frac{6}{100} = 6\% \]

EVALUATING A MODEL – CLASSIFICATION (CONTD.)
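\[ \text{Kappa } (\kappa) = \frac{P(a) - P(pr)}{1 - P(pr)} \]

where P(a) = proportion of observed agreement between actual and predicted in the overall data set

\[ P(a) = \frac{TP + TN}{TP + FP + FN + TN} \]

P(pr) = proportion of expected agreement between actual and predicted data, both in case of the class of interest as well as the other classes

\[ P(pr) = \frac{TP + FN}{N} \cdot \frac{TP + FP}{N} + \frac{TN + FP}{N} \cdot \frac{TN + FN}{N}, \qquad N = TP + FP + FN + TN \]

Note: The Kappa value can be at most 1, which represents perfect agreement between the model's predictions and the actual values.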
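As a worked check, the sketch below computes accuracy, error rate and Kappa for the win/loss confusion matrix shown earlier (TP = 85, FP = 4, FN = 2, TN = 9). It assumes the standard Cohen's Kappa definition, (P(a) - P(pr)) / (1 - P(pr)), since the formula itself is not legible in the slide text; the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, error rate, P(a), P(pr) and Kappa for a 2x2 matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total
    p_a = accuracy  # observed agreement between actual and predicted
    # Expected agreement: chance of agreeing on "win" plus chance of agreeing
    # on "loss", using the actual and predicted class proportions.
    p_pr = ((tp + fn) / total) * ((tp + fp) / total) \
         + ((tn + fp) / total) * ((tn + fn) / total)
    kappa = (p_a - p_pr) / (1 - p_pr)
    return {
        "accuracy": accuracy,
        "error_rate": error_rate,
        "p_a": p_a,
        "p_pr": p_pr,
        "kappa": kappa,
    }

metrics = classification_metrics(tp=85, fp=4, fn=2, tn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```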
EVALUATING A MODEL (ROC CURVE)
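\[ \text{TPR (True Positive Rate)} = \frac{TP}{TP + FN} \]

\[ \text{FPR (False Positive Rate)} = \frac{FP}{FP + TN} \]

[Figure: Receiver Operating Characteristic curve]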
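An ROC curve plots the true positive rate against the false positive rate as the decision threshold is varied. The sketch below assumes the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN) and uses made-up labels and scores; the function name is illustrative.

```python
def roc_points(actual, scores, positive=1):
    """Return a list of (FPR, TPR) pairs, one per decision threshold."""
    n_pos = sum(1 for a in actual if a == positive)
    n_neg = len(actual) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == positive)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a != positive)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical true labels and model scores.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A curve that rises quickly towards the top-left corner indicates a model that separates the two classes well.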

EVALUATING A MODEL (REGRESSION)
[Figure: Value of the apartment unit (vertical axis) plotted against Area in square feet (horizontal axis); the Error is the gap between the Actual value and the Predicted value]

EVALUATING A MODEL (CLUSTERING)
“Clustering is in the eye of the beholder”
• Internal evaluation
  ◦ Silhouette width
• External evaluation
  ◦ Purity
EVALUATING A MODEL (CLUSTERING)
• a(i) – Average distance between the i-th data instance and all other data instances belonging to the same cluster
• b(i) – Lowest of the average distances between the i-th data instance and the data instances of each other cluster

[Figure: Silhouette width calculation across Cluster 1, Cluster 2, Cluster 3 and Cluster 4]
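The silhouette width of a data instance combines a(i) and b(i). The sketch below assumes the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), which is not spelled out in the slide text, and uses hypothetical 2-D points that are already assigned to two clusters. Returning 0 for an instance that is alone in its cluster is a common convention, adopted here as an assumption.

```python
import math

def silhouette_width(i, points, labels):
    """Silhouette width of the i-th point, in the range [-1, 1]."""
    own = labels[i]
    same = [math.dist(points[i], p)
            for j, p in enumerate(points) if labels[j] == own and j != i]
    if not same:
        return 0.0  # convention for a point that is alone in its cluster
    a_i = sum(same) / len(same)  # average distance within the own cluster
    # b(i): lowest average distance to the points of any other cluster.
    b_i = min(
        sum(math.dist(points[i], p) for p, lab in zip(points, labels) if lab == other)
        / labels.count(other)
        for other in set(labels) if other != own
    )
    return (b_i - a_i) / max(a_i, b_i)

# Hypothetical points with cluster labels already assigned.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
labels = [0, 0, 0, 1, 1, 1]
widths = [silhouette_width(i, points, labels) for i in range(len(points))]
average_width = sum(widths) / len(widths)
print(widths, average_width)
```

Values close to 1 mean an instance sits well inside its own cluster; values near 0 or below suggest it lies between clusters or in the wrong one.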

ENSEMBLE
THANK YOU & STAY TUNED!
