IDA117V
Introduction to Machine Learning
Machines learning from data
Machines / models learn to predict the future based on patterns seen in
current data.
A model learns patterns [features] and the resulting outcomes [targets] of
those patterns.
It is not as straightforward as that, because data may be very dirty; chances
are the machine will learn incorrect things from the data.
Data preprocessing + Training mechanisms
Handling noisy data:
EDA, so that we can see what the data looks like, then preprocess the data,
i.e. removing NaN values (a sketch of this step follows below).
Selecting training / testing mechanisms that reduce the chances of learning
the noise.
Supervised machine learning
Models are told exactly what to learn: they must associate certain patterns
with specific targets. There are features [inputs] and targets [outputs or
labels].
Example: building a model to predict rain given atmospheric conditions.
Unsupervised machine learning
Models learn patterns or categories that exist in the data. No labels or
targets are given.
i.e. group similar features / trends together.
Clustering, dimensionality reduction and association rules (a clustering
sketch follows below).
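A minimal sketch of one unsupervised technique, clustering, using
scikit-learn's KMeans on synthetic unlabelled data (the two groups and the
choice of 2 clusters are assumptions for illustration):

import numpy as np
from sklearn.cluster import KMeans

# Synthetic, unlabelled 2-D points forming two loose groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(5, 1, (50, 2))])

# No targets are given; KMeans groups similar points together.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])      # cluster assigned to each point
print(kmeans.cluster_centers_)  # learned group centres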
Supervised machine learning examples
Predict whether an insurance claim is fraud or valid based on some features
[0 = fraud, 1 = valid].
Features: ID, policy number, policy status, alive
Target: 0/1
Supervised machine learning examples
Data for house price prediction.
Features: floor area, number of stories, size of yard, suburb
Target: price [R]
Classification vs Regression
Classification:
[features] -> [target]
The target is a discrete value / a category.
Example: based on certain weather conditions, one can train a machine
learning algorithm to predict whether or not rain will occur.
Regression:
[features] -> [target]
The target is a continuous value.
Example: based on house features, location and size, one can train a machine
learning algorithm to predict the price of a house.
Classification vs Regression
The difference between classification and regression is important because
some measurements used to evaluate regression models cannot be used to
evaluate classification models.
Classification: binary class vs multi-class
Binary classification: there are only two targets / outputs / classes in
the data.
Multi-class classification: there are more than two targets / outputs /
classes in the data.
Linear regression
Regression: a method that tries to determine the strength and nature of the relationship between the output (Y) and
the independent variables (X).
Linear regression: there is a linear relationship between the independent variable and the dependent variable:
when the independent variable changes, the dependent variable also changes linearly.
Classic example: the price (Y) of a house may depend on the size (X); when the size of the house increases, the
price tends to increase.
Linear regression continued: car weight vs mileage
Another example of a linear relationship:
Identifying Linear relationships in data
Laerd Stats 2014
What a linear regression model learns
A linear regression model learns a line
y = mx + c, where y is the target or dependent variable, x is the independent
variable, m is the slope, and c is the y-intercept.
Several lines can be drawn that estimate the relationship between the x and y
variables.
Linear regression
A linear regression model learns a function ŷ = mx + c that minimises the
vertical error (residual) between this line and all the data points.
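A minimal sketch of fitting such a line with scikit-learn, on synthetic data
that roughly follows y = 2x + 1 (the numbers are illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 2x + 1 plus some noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (100, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, 100)

model = LinearRegression().fit(X, y)
print(model.coef_[0])    # learned slope m (close to 2)
print(model.intercept_)  # learned intercept c (close to 1)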
The residual
We want to know how well the line / model fits the data. We measure how much
error the model makes by calculating the difference between the predicted
value and the actual value:
eᵢ = yᵢ - ŷᵢ, where yᵢ is the actual value and ŷᵢ is the predicted value of
the i-th point.
Residuals
http://wiki.engageeducation.org.au/further-maths/data-analysis/residuals/
Residuals
Sometimes the residuals can be negative: eᵢ = yᵢ - ŷᵢ is negative when ŷᵢ is
greater than yᵢ.
It is more convenient to work with the squared residuals: eᵢ².
The best-fit line will have the minimum sum of squared residuals, i.e. the
sum of the squared residuals over all the data points: ∑ eᵢ².
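A small numpy sketch of computing the residuals and their sum of squares for
a handful of toy values (the numbers are made up for illustration):

import numpy as np

# Actual values and model predictions for five points.
y     = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([2.8, 5.3, 6.9, 9.4, 10.5])

residuals = y - y_hat          # e_i = y_i - ŷ_i (can be negative)
ssr = np.sum(residuals ** 2)   # sum of squared residuals, ∑ e_i²
print(residuals, ssr)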
Important measures of regression fit
The standard deviation of the residuals, also called the Root Mean Squared
Error (RMSE). A good RMSE value is close to 0, e.g. 0.2.
RMSE = √( ∑ eᵢ² / n ), where n is the total number of points in the data.
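The same formula in numpy, using made-up residuals:

import numpy as np

residuals = np.array([0.2, -0.3, 0.1, -0.4, 0.5])  # e_i for each point
n = len(residuals)
rmse = np.sqrt(np.sum(residuals ** 2) / n)  # RMSE = √( ∑ e_i² / n )
print(rmse)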
RMSE for Multiple regression
Definitions:
Simple regression (SLR):
Only one predictor variable [X, the independent variable]; the other variable
Y is the response or dependent variable.
Multiple regression (MLR): there is more than one predictor variable that
affects the output / response variable [Y]. The RMSE changes slightly for
MLR.
RMSE for SLR and MLR
RMSE (SLR) = √( ∑ eᵢ² / (n - 2) )
RMSE (MLR) = √( ∑ eᵢ² / (n - k - 1) ), where k is the number of predictor
variables.
Important measures of regression fit
Coefficient of determination, R-squared:
R² = 1 - ( ∑ eᵢ² / ∑ (yᵢ - ȳ)² ), where ȳ is the mean of the actual values.
A good R-squared is close to 1 [100%].
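A small sketch of computing R² directly from its definition, reusing the toy
values from the residuals example:

import numpy as np

y     = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # actual values
y_hat = np.array([2.8, 5.3, 6.9, 9.4, 10.5])   # predicted values

ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares, ∑ e_i²
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares, ∑ (y_i - ȳ)²
r2 = 1 - ss_res / ss_tot               # close to 1 means a good fit
print(r2)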
Overfitting and Underfitting
Overfitting:
The model knows the training data too well and cannot generalise to unseen
data. Training accuracy is high, e.g. 90%, but testing accuracy is very low,
e.g. 60%.
Underfitting:
The model has learned nothing from the data and hence cannot predict unseen
data, e.g. 50% accuracy.
Preventing overfitting
Train with more data.
Data augmentation [artificial data].
Feature selection: removing features that do not inform the outcome.
Classification
Predicting a class, not a continuous variable.
Models for classification
KNN
Logistic regression
Decision trees
SVM
etc.
Logistic regression
Types:
Binary or binomial: the target variable has only two possible values.
Multinomial: the target variable takes three or more unordered values.
Ordinal: the target variable takes three or more ordered values.
Linear regression vs Logistic regression
Linear regression: y = mx + c. Logistic regression: the sigmoid function
σ(z) = 1 / (1 + e^(-z)), which maps any input into the range (0, 1).
Logistic regression
A threshold is chosen, say 0.5; when the model returns a value greater than
0.5, the prediction is assigned to the class above the threshold on the
sigmoid curve.
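A minimal numpy sketch of the sigmoid and the 0.5 threshold, on made-up raw
model outputs:

import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + np.exp(-z))

scores = np.array([-2.0, -0.1, 0.3, 1.5])  # made-up raw model outputs
probs = sigmoid(scores)                    # probabilities in (0, 1)
preds = (probs > 0.5).astype(int)          # 1 where probability exceeds 0.5
print(probs, preds)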
Example: LRM
Using the Indian diabetes data to build a logistic regression model (see the
sketch below).
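A sketch of such a model with scikit-learn, assuming the diabetes data is
available as a file diabetes.csv with a binary Outcome column (both the file
name and the column name are assumptions):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# diabetes.csv and the "Outcome" column are assumed names for the dataset.
df = pd.read_csv("diabetes.csv")
X = df.drop(columns=["Outcome"])
y = df["Outcome"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on unseen test data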
Example: KNN
KNN
K nearest neighbours: assumes that similar things have a lot of features in
common. Examples: dogs vs cats, lions vs giraffes, etc.
There is a small distance between similar things and a very large distance
between different things.
KNN
K is a number, a hyperparameter of the model, ranging from 1 to N.
The algorithm computes the distance between a query point and the K nearest
neighbours of that point and predicts the majority class among them.
KNN
Using K = 2.
KNN example (see the sketch below).
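A minimal KNN sketch with scikit-learn, on toy 2-D points (K = 2 follows the
slide, but the data is made up for illustration):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D points with binary labels.
X = np.array([[1, 1], [1, 2], [2, 1],    # class 0 cluster
              [6, 6], [6, 7], [7, 6]])   # class 1 cluster
y = np.array([0, 0, 0, 1, 1, 1])

# K = 2: the model votes among the 2 nearest neighbours of each query point.
knn = KNeighborsClassifier(n_neighbors=2).fit(X, y)
print(knn.predict([[2, 2], [6, 5]]))  # expected: [0, 1]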
Confusion matrix
TP: true positive
Predicted diabetes and it is true.
TN: true negative
Predicted no diabetes and it is true.
FN: false negative
Predicted no diabetes yet there is diabetes [Type II error].
FP: false positive
Predicted diabetes yet there is no diabetes [Type I error].
Metrics: recall, precision and accuracy
Measuring model performance:
Recall = TP / (TP + FN); if we make fewer FN (Type II errors), the recall
will be close to 1.
Precision = TP / (TP + FP); if we make fewer FP (Type I errors), the
precision will be higher, close to 1.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
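The three metrics computed from toy confusion-matrix counts (the numbers are
illustrative):

# Toy counts: TP, TN, FP, FN.
TP, TN, FP, FN = 40, 45, 5, 10

recall    = TP / (TP + FN)                  # fewer FN (Type II) -> closer to 1
precision = TP / (TP + FP)                  # fewer FP (Type I)  -> closer to 1
accuracy  = (TP + TN) / (TP + TN + FP + FN)
print(recall, precision, accuracy)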
Metrics: F1-score
Better than accuracy because it takes class imbalance into account.
The harmonic mean of recall and precision:
F1 = 2 × (precision × recall) / (precision + recall)
It can be interpreted for all scenarios, as opposed to either recall or
precision alone.
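The F1-score computed from the same toy counts used above:

TP, FP, FN = 40, 5, 10
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(f1)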