24CSA524: Machine Learning
Remya Rajesh
K Nearest Neighbors Classification
SCATTER PLOT
Points from the visualization (scatter plot)
• The two dimensions are two features of the dataset (number_of_malignant_nodes, age)
• Target: coloured – Survived (blue), Did Not Survive (red)
• number_of_malignant_nodes – range of values – (0, 25]
• age – range of values – (0, 60]
• Each point in the plot corresponds to a patient
• Number of points (50) = number of patients (50) in the dataset
• Each patient is identified by the values of the two features (number_of_malignant_nodes, age)
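A minimal plotting sketch, assuming a pandas DataFrame df with these column names (the sample values are made up for illustration):

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; column names mirror the features described above
df = pd.DataFrame({
    "number_of_malignant_nodes": [2, 10, 22, 5],
    "age": [34, 50, 58, 41],
    "survived": [1, 1, 0, 0],
})

colors = df["survived"].map({1: "blue", 0: "red"})  # Survived = blue, Did Not Survive = red
plt.scatter(df["number_of_malignant_nodes"], df["age"], c=colors)
plt.xlabel("number_of_malignant_nodes")
plt.ylabel("age")
plt.show()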
What is Needed to Select a KNN Model?
• A correct value for 'K'
• A way to measure the closeness of neighbors
Decision Boundary
Measurement of Distance
Euclidean Distance (L2 Distance)
d(x, y) = sqrt( Σᵢ (xᵢ − yᵢ)² )
Manhattan Distance (L1 or City Block Distance)
d(x, y) = Σᵢ |xᵢ − yᵢ|
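A minimal NumPy sketch of both metrics (the two example points are made up):

import numpy as np

a = np.array([2.0, 34.0])   # e.g., (number_of_malignant_nodes, age) of one patient
b = np.array([10.0, 50.0])  # a second patient

euclidean = np.sqrt(np.sum((a - b) ** 2))  # L2: square root of summed squared differences
manhattan = np.sum(np.abs(a - b))          # L1: sum of absolute differences
print(euclidean, manhattan)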
KNN for Classification
• Load the data
• Preprocess the data
• Choose the value of K and define the distance metric
• Compute distances between the test point and all training points using the
chosen distance metric.
• Sort the distances in ascending order.
• Select the K nearest neighbors (smallest distances).
• Vote for the most frequent class among the K neighbors (majority rule).
• Assign the class label of the majority as the prediction.
• Evaluate the model
• Optimize the model
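A from-scratch sketch of the prediction steps above (function and variable names are my own, not from the slides; assumes NumPy arrays):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Compute Euclidean distances from the test point to all training points
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Sort and select the K nearest neighbors (smallest distances)
    nearest = np.argsort(dists)[:k]
    # Majority vote among the K neighbors' class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]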
For regression
• Compute distances between the test point and all training points.
• Sort the distances in ascending order.
• Select the K nearest neighbors.
• Calculate the average (or weighted average) of the target values of
the K neighbors.
• Use the computed value as the prediction.
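The same sketch adapted for regression, with the vote replaced by an average:

import numpy as np

def knn_regress(X_train, y_train, x_test, k=3):
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Plain average of the K neighbors' targets; a weighted average
    # (e.g., weights proportional to 1/distance) is a common variant
    return y_train[nearest].mean()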
Feature Scaling is important
Comparison of Feature Scaling Methods
• Standard Scaler: mean-center the data and scale to unit variance
v' = (v − μ_A) / σ_A
where μ_A and σ_A are the mean and standard deviation of attribute A
• Minimum-Maximum Scaler: scale data to a fixed range (usually 0–1)
v' = ((v − min_A) / (max_A − min_A)) · (new_max_A − new_min_A) + new_min_A
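A minimal sketch of both formulas applied to a toy array (values made up):

import numpy as np

v = np.array([10.0, 20.0, 30.0, 40.0])

standardized = (v - v.mean()) / v.std()        # Standard Scaler: zero mean, unit variance
min_max = (v - v.min()) / (v.max() - v.min())  # Min-Max Scaler to the range [0, 1]
print(standardized, min_max)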
Python
• NumPy, SciPy, Pandas: numerical computation
• Matplotlib, Seaborn: data visualization
• Scikit-learn: machine learning
Example:
Import the class containing the scaling method
from sklearn.preprocessing import StandardScaler
Create an instance of the class
StdSc = StandardScaler()
Fit the scaling parameters and then transform the data
StdSc = StdSc.fit(X_data)
X_scaled = StdSc.transform(X_data)
Other scaling method: MinMaxScaler
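MinMaxScaler follows the same fit/transform pattern; a brief sketch (feature_range is shown at its default value):

from sklearn.preprocessing import MinMaxScaler

MmSc = MinMaxScaler(feature_range=(0, 1))  # scales each feature to [0, 1]
X_scaled = MmSc.fit_transform(X_data)      # fit and transform in one step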
K Nearest Neighbors: The Syntax
Import the class containing the classification method
from sklearn.neighbors import KNeighborsClassifier
Create an instance of the class
KNN = KNeighborsClassifier(n_neighbors=3)
Fit the instance on the data and then predict the expected value
KNN = KNN.fit(X_data, y_data)
y_predict = KNN.predict(X_data)
Regression can be done with KNeighborsRegressor
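KNeighborsRegressor mirrors the classifier's syntax; a brief sketch:

from sklearn.neighbors import KNeighborsRegressor

KNNR = KNeighborsRegressor(n_neighbors=3)
KNNR = KNNR.fit(X_data, y_data)   # y_data holds continuous targets here
y_predict = KNNR.predict(X_data)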
Characteristics of the KNN model
• KNN is a non-parametric algorithm (it learns no fixed set of model parameters)
• Fast to "train" because it simply stores the data (a lazy learner)
• Slow to predict because of the many distance calculations
• Can require a lot of memory if the dataset is large
Hyperparameters to Tune
• K: Number of neighbors.
• Distance metric (e.g., Euclidean, Manhattan, etc.).
• Weighting scheme (uniform vs. distance-based, e.g., w = 1/distance); see the sketch after this list
• Neighbor search algorithm (brute force, k-d tree, ball tree).
• Extra points: Use of TreeSet in Java
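A minimal tuning sketch with scikit-learn's GridSearchCV (the parameter values are illustrative choices, not recommendations):

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],  # "distance" weights neighbors by 1/distance
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search = search.fit(X_data, y_data)
print(search.best_params_)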
What We Talk About When We Talk About "Learning"
• Learning general models from data of particular examples
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce
• Build a model that is a good and useful approximation/representation of the data
• Describe/summarize the data in the form of a model
Learning: Knowledge iterates to improve
[Diagram: prior knowledge and data feed into learning, which produces knowledge; additional data feeds that knowledge back through learning again.]
Regression vs Classification
• Regression: 𝑦 ∈ ℝ is a continuous variable
  • e.g., price prediction
• Classification: the label is a discrete variable
  • e.g., predicting the type of a residence from its features (living-room size, parking-area size): 𝑦 = mansion or villa?
[Diagram: Data is fed to an Algorithm, which produces a Model (illustrated with a neural network model).]
Training and Test Splits
Train and Test Splitting: The Syntax
• Import the train and test split function
from sklearn.model_selection import train_test_split
• Split the data and put 30% into the test set
train, test = train_test_split(data, test_size=0.3)
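In practice the features and labels are usually split together; a brief sketch (random_state is fixed only for reproducibility):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_data, y_data, test_size=0.3, random_state=42)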
Requirements for an ML Model
• Hypothesis Function - represents the mathematical model that maps
input features (X) to output predictions (Y). Different models have
different hypothesis functions.
• Examples: see the worked equations after this list
• Cost Function - represents how well the hypothesis function fits the
data. It quantifies the error between predicted and actual values.
Different models have different cost functions.
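As a standard worked example (linear regression, not taken from these slides): the hypothesis h_θ(x) = θ₀ + θ₁x maps an input x to a predicted output, and the cost J(θ) = (1/2m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² measures the squared error between predictions and actual values over m training examples.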
Supervised Learning
Classification
• Example: loan payment
• Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
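The discriminant reads directly as code; a sketch with hypothetical values for the thresholds θ1 and θ2:

THETA1, THETA2 = 50_000, 10_000  # assumed thresholds for income and savings

def risk(income, savings):
    # IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(risk(60_000, 12_000))  # low-risk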
Class C
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
is a class rule for the positive examples
Hypothesis class H – the set of all possible rectangles
Choose a hypothesis h that predicts well on unseen examples (the "test set")
h(x) = 1 if h says x is positive, 0 if h says x is negative
Generalization – how well the hypothesis classifies unseen data that is not part of the training set
Example:
Price      Engine Power   Y   h(X)
10,000,00  150            1   0
20,000,00  192            0   1
15,000,00  170            1   1
19,000,00  187            0   0
Empirical error of h – the proportion of training instances whose prediction h(X) does not match the true label Y (here the first two rows mismatch, so the error is 2/4 = 0.5)
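A one-line sketch of that computation, with the labels copied from the table:

y_true = [1, 0, 1, 0]  # Y column
y_pred = [0, 1, 1, 0]  # h(X) column

# Empirical error: fraction of instances where the prediction misses the label
error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print(error)  # 0.5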
Noise and Outliers
Noise – due to errors in data collection, wrong labelling, or other hidden (latent) attributes not considered here
Outliers – extreme cases
Linear Regression
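Following the same scikit-learn syntax pattern used above (this slide's original content was a figure; the code is an illustrative sketch):

from sklearn.linear_model import LinearRegression

LR = LinearRegression()
LR = LR.fit(X_data, y_data)     # fits intercept and coefficients by least squares
y_predict = LR.predict(X_data)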
Triple Trade-Off
• There is a trade-off between three factors (Dietterich, 2003):
  1. Complexity C of H
  2. Training set size, N
  3. Generalization error, Er, on new data
• As N increases, Er decreases
• As C(H) increases, Er first decreases and then increases