24CSA524: Machine Learning
Remya Rajesh
K Nearest Neighbors Classification
SCATTER PLOT
Points from the visualization (scatter plot)
• The two dimensions are two features of the dataset (number_of_malignant_nodes, age)
• Target: coloured – Survived (blue), Did Not Survive (red)
• number_of_malignant_nodes – range of values – (0, 25]
• age – range of values – (0, 60]
• Each point in the plot corresponds to a patient
• Number of points (50) = number of patients (50) in the dataset
• Each patient is identified by the values of the two features (number_of_malignant_nodes, age)
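A minimal plotting sketch, assuming a pandas DataFrame df with these column names (the sample values are made up for illustration):

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; column names mirror the features described above
df = pd.DataFrame({
    "number_of_malignant_nodes": [2, 10, 22, 5],
    "age": [34, 50, 58, 41],
    "survived": [1, 1, 0, 0],
})

colors = df["survived"].map({1: "blue", 0: "red"})  # Survived = blue, Did Not Survive = red
plt.scatter(df["number_of_malignant_nodes"], df["age"], c=colors)
plt.xlabel("number_of_malignant_nodes")
plt.ylabel("age")
plt.show()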
What is Needed to Select a KNN Model?
• A correct value for 'K'
• A way to measure the closeness of neighbors
Decision Boundary
Measurement of Distance
Euclidean Distance (L2 Distance)
d(x, y) = sqrt( Σᵢ (xᵢ − yᵢ)² )
Manhattan Distance (L1 or City Block Distance)
d(x, y) = Σᵢ |xᵢ − yᵢ|
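A minimal NumPy sketch of both metrics (the two example points are made up):

import numpy as np

a = np.array([2.0, 34.0])   # e.g., (number_of_malignant_nodes, age) of one patient
b = np.array([10.0, 50.0])  # a second patient

euclidean = np.sqrt(np.sum((a - b) ** 2))  # L2: square root of summed squared differences
manhattan = np.sum(np.abs(a - b))          # L1: sum of absolute differences
print(euclidean, manhattan)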
KNN for Classification
• Load the data
• Preprocess the data
• Choose the value of K and define the distance metric
• Compute distances between the test point and all training points using the
chosen distance metric.
• Sort the distances in ascending order.
• Select the K nearest neighbors (smallest distances).
• Vote for the most frequent class among the K neighbors (majority rule).
• Assign the class label of the majority as the prediction.
• Evaluate the model
• Optimize the model
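A from-scratch sketch of the prediction steps above (function and variable names are my own, not from the slides; assumes NumPy arrays):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Compute Euclidean distances from the test point to all training points
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Sort and select the K nearest neighbors (smallest distances)
    nearest = np.argsort(dists)[:k]
    # Majority vote among the K neighbors' class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]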
For regression
• Compute distances between the test point and all training points.
• Sort the distances in ascending order.
• Select the K nearest neighbors.
• Calculate the average (or weighted average) of the target values of
the K neighbors.
• Use the computed value as the prediction.
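The same sketch adapted for regression, with the vote replaced by an average:

import numpy as np

def knn_regress(X_train, y_train, x_test, k=3):
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Plain average of the K neighbors' targets; a weighted average
    # (e.g., weights proportional to 1/distance) is a common variant
    return y_train[nearest].mean()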
Feature Scaling is important
Comparison of Feature Scaling Methods
• Standard Scaler: mean-center the data and scale to unit variance
v' = (v − μ_A) / σ_A
where μ_A and σ_A are the mean and standard deviation of attribute A
• Minimum-Maximum Scaler: scale data to a fixed range (usually 0–1)
v' = ((v − min_A) / (max_A − min_A)) · (new_max_A − new_min_A) + new_min_A
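A minimal sketch of both formulas applied to a toy array (values made up):

import numpy as np

v = np.array([10.0, 20.0, 30.0, 40.0])

standardized = (v - v.mean()) / v.std()        # Standard Scaler: zero mean, unit variance
min_max = (v - v.min()) / (v.max() - v.min())  # Min-Max Scaler to the range [0, 1]
print(standardized, min_max)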
Python
• NumPy, SciPy, Pandas: numerical computation
• Matplotlib, Seaborn: data visualization
• Scikit-learn: machine learning
Example:
Import the class containing the scaling method
from sklearn.preprocessing import StandardScaler
Create an instance of the class
StdSc = StandardScaler()
Fit the scaling parameters and then transform the data
StdSc = StdSc.fit(X_data)
X_scaled = StdSc.transform(X_data)
Other scaling method: MinMaxScaler
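MinMaxScaler follows the same fit/transform pattern; a brief sketch (feature_range is shown at its default value):

from sklearn.preprocessing import MinMaxScaler

MmSc = MinMaxScaler(feature_range=(0, 1))  # scales each feature to [0, 1]
X_scaled = MmSc.fit_transform(X_data)      # fit and transform in one step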
K Nearest Neighbors: The Syntax
Import the class containing the classification method
from sklearn.neighbors import KNeighborsClassifier
Create an instance of the class
KNN = KNeighborsClassifier(n_neighbors=3)
Fit the instance on the data and then predict the expected value
KNN = KNN.fit(X_data, y_data)
y_predict = KNN.predict(X_data)
Regression can be done with KNeighborsRegressor
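KNeighborsRegressor mirrors the classifier's syntax; a brief sketch:

from sklearn.neighbors import KNeighborsRegressor

KNNR = KNeighborsRegressor(n_neighbors=3)
KNNR = KNNR.fit(X_data, y_data)   # y_data holds continuous targets here
y_predict = KNNR.predict(X_data)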
Characteristics of the KNN model
• KNN is a non-parametric algorithm (it learns no fixed set of model parameters)
• Fast to "train" because it simply stores the data (a lazy learner)
• Slow to predict because of the many distance calculations
• Can require a lot of memory if the dataset is large
Hyperparameters to Tune
• K: Number of neighbors.
• Distance metric (e.g., Euclidean, Manhattan, etc.).
• Weighting scheme (uniform vs. distance-based, e.g., w = 1/distance); see the sketch after this list
• Neighbor search algorithm (brute force, k-d tree, ball tree).
• Extra points: Use of TreeSet in Java
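A minimal tuning sketch with scikit-learn's GridSearchCV (the parameter values are illustrative choices, not recommendations):

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],  # "distance" weights neighbors by 1/distance
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search = search.fit(X_data, y_data)
print(search.best_params_)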
What We Talk About When We Talk About "Learning"
• Learning general models from data of particular examples
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce
• Build a model that is a good and useful approximation/representation of the data
• Describe/summarize the data in the form of a model
Learning: Knowledge iterates to improve
[Diagram: prior knowledge and data feed into learning, which produces knowledge; additional data feeds that knowledge back through learning again.]
Regression vs Classification
• Regression: 𝑦 ∈ ℝ is a continuous variable
  • e.g., price prediction
• Classification: the label is a discrete variable
  • e.g., predicting the type of a residence from its features (living-room size, parking-area size): 𝑦 = mansion or villa?
[Diagram: Data is fed to an Algorithm, which produces a Model (illustrated with a neural network model).]
Training and Test Splits
Train and Test Splitting: The Syntax
• Import the train and test split function
from sklearn.model_selection import train_test_split
• Split the data and put 30% into the test set
train, test = train_test_split(data, test_size=0.3)
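In practice the features and labels are usually split together; a brief sketch (random_state is fixed only for reproducibility):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_data, y_data, test_size=0.3, random_state=42)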
Requirements for an ML Model
• Hypothesis Function - represents the mathematical model that maps
input features (X) to output predictions (Y). Different models have
different hypothesis functions.
• Examples: see the worked equations after this list
• Cost Function - represents how well the hypothesis function fits the
data. It quantifies the error between predicted and actual values.
Different models have different cost functions.
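As a standard worked example (linear regression, not taken from these slides): the hypothesis h_θ(x) = θ₀ + θ₁x maps an input x to a predicted output, and the cost J(θ) = (1/2m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² measures the squared error between predictions and actual values over m training examples.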
Supervised Learning
Classification
• Example: loan payment
• Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
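The discriminant reads directly as code; a sketch with hypothetical values for the thresholds θ1 and θ2:

THETA1, THETA2 = 50_000, 10_000  # assumed thresholds for income and savings

def risk(income, savings):
    # IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(risk(60_000, 12_000))  # low-risk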
Class C
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
is a class rule for the positive examples
Hypothesis class H – the set of all possible rectangles
Choose a hypothesis h that predicts well on unseen examples (the "test set")
h(x) = 1 if h says x is positive, 0 if h says x is negative
Generalization – how well the hypothesis classifies unseen data that is not part of the training set
Example:
Price      Engine Power   Y   h(X)
10,000,00  150            1   0
20,000,00  192            0   1
15,000,00  170            1   1
19,000,00  187            0   0
Empirical error of h – the proportion of training instances whose prediction h(X) does not match the true label Y (here the first two rows mismatch, so the error is 2/4 = 0.5)
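A one-line sketch of that computation, with the labels copied from the table:

y_true = [1, 0, 1, 0]  # Y column
y_pred = [0, 1, 1, 0]  # h(X) column

# Empirical error: fraction of instances where the prediction misses the label
error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print(error)  # 0.5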
Noise and Outliers
Noise – due to errors in data collection, wrong labelling, or other hidden (latent) attributes not considered here
Outliers – extreme cases
Linear Regression
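Following the same scikit-learn syntax pattern used above (this slide's original content was a figure; the code is an illustrative sketch):

from sklearn.linear_model import LinearRegression

LR = LinearRegression()
LR = LR.fit(X_data, y_data)     # fits intercept and coefficients by least squares
y_predict = LR.predict(X_data)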
Triple Trade-Off
• There is a trade-off between three factors (Dietterich, 2003):
  1. Complexity C of H
  2. Training set size, N
  3. Generalization error, Er, on new data
• As N increases, Er decreases
• As C(H) increases, Er first decreases and then increases