
Common ML Algorithms Basics

Daniel Wehner, SAP SE


January 10, 2018


Agenda

1 Machine Learning Basics

2 Regression and Gradient Descent

3 K-Nearest-Neighbor

4 Questions



Machine Learning Basics



Introduction

What is Learning?

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
— Mitchell, 1997

The goal of ML is to generalize the experience in a way that improves performance on the task.



Some basic Terminology

• Type of Training Information
  – Supervised: the algorithm needs labeled training data
  – Unsupervised: the training data is not labeled ⇒ e.g. clustering
• Availability of Training Examples
  – Batch learning: the learner is provided with a set of training examples
  – Online learning: a constant stream of training examples
• Concept Representation
  – Model-based: the learner constructs a model (function, concept, …)
  – Instance-based: no explicit model (lazy learning); generalization on demand


Regression and Gradient Descent



Regression
Introduction

• Assume the following data:


(Scatter plot: sample data points in the x-y plane.)

• Find a function that fits the data best (⇒ Regression)

• Supervised, model-based, batch learning




Linear Regression

• Predict a value y depending on a value x

• Try to find a linear function f̂(x) = a · x + b that fits the data best

(Plot: the data points and the query x = 2.7, y = ?)



Regression
Least Squares Method
• Least Squares Method
• Minimize the sum of squared errors:

  Σ_{i=1}^{n} (f̂(x_i) − y_i)²   with   f̂(x) = a · x + b

  f̂(x_i) := predicted value, y_i := actual value

(Figure: the fitted line with the vertical errors between predictions and data points.)


Possible Solving Strategies

• Usual approach (closed-form equations for the linear case f̂(x) = a · x + b):

  a = Σ_{i=1}^{n} (x_i − x̄) · (y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)² ,   b = ȳ − a · x̄

• Matrix notation (closed-form normal equations for a degree-m polynomial):

  p⃗ = (XᵀX)⁻¹ Xᵀ y⃗

  where the i-th row of the design matrix X is (1, x_i, x_i², …, x_i^m) and y⃗ = (y_1, y_2, …, y_n)ᵀ is the vector of target values

• Gradient Descent (e. g. Batch/Stochastic Gradient Descent)
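A minimal sketch of both closed-form routes in Python/NumPy (the data set is made up for illustration; the slides' actual data is not given):

```python
import numpy as np

# Made-up sample data (the data set from the slides is not reproduced here)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([-3.1, -0.9, 1.8, 4.2, 6.1, 8.0, 10.3])

# Simple closed form: slope a and intercept b of f(x) = a*x + b
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()

# Matrix form (normal equations), here for a degree-1 polynomial
X = np.column_stack([np.ones_like(x), x])   # design matrix with rows (1, x_i)
p = np.linalg.solve(X.T @ X, X.T @ y)       # p = (X^T X)^{-1} X^T y

print(a, b)   # both routes yield the same parameters
print(p)      # [b, a] with this column ordering
```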




Data Input Space vs. Model Space

• Data input space (the input data): determined by the attributes of the data
• Model space (the error function): determined by the degrees of freedom of the model

(Figure: the input data in the x-y plane next to the error surface over the model parameters.)



Error Function (aka Loss Function)

• The error function (≙ loss function) “lives” in the model space
• Each point in the model space corresponds to a specific combination of a and b and the resulting error value e(a, b)
• In this case, the error function has the form:

  e(a, b) = (1/n) · Σ_{i=1}^{n} (y_i − (a · x_i + b))²

We want to minimize this function! ⇒ Gradient Descent



Gradient Descent
What is a Gradient? (general case)

• A gradient is the slope of a function at a specific point x0

• Computation (g = function to optimize):

  grad(x₀) = Δy/Δx = (g(x₂) − g(x₁)) / (x₂ − x₁)

(Figure: the slope at x₀ approximated by the secant through (x₁, g(x₁)) and (x₂, g(x₂)).)

Gradient Descent (Ctd.)


(general case)

Gradient Descent update rule:

  x_n = x_{n−1} − α · (g(x_{n−1}) − g(x_{n−2})) / (x_{n−1} − x_{n−2})

• If the gradient is positive, we go left
• Otherwise we go right
⇒ This is how we make sure to move towards smaller function values
• α denotes the learning rate (it dampens the steps and prevents oscillation)
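A minimal sketch of this rule in Python, with a made-up example function (two starting points are needed because the gradient is approximated from the two previous iterates):

```python
def gradient_descent_1d(g, x_prev, x_curr, alpha=0.1, iterations=100):
    """Minimize g using the secant approximation of the gradient."""
    for _ in range(iterations):
        grad = (g(x_curr) - g(x_prev)) / (x_curr - x_prev)
        x_prev, x_curr = x_curr, x_curr - alpha * grad
    return x_curr

# Example: g(x) = (x - 3)^2 has its minimum at x = 3
print(gradient_descent_1d(lambda x: (x - 3.0) ** 2, x_prev=0.0, x_curr=0.1))
```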



Gradient Descent (Ctd.)


Optimizing in multiple Dimensions

• So far we have only optimized in one dimension

• The error function has two independent variables: a, b

• Switch dimensions (e = error function):

  a_n = a_{n−1} − α · (e(a_{n−1}, b) − e(a_{n−2}, b)) / (a_{n−1} − a_{n−2})   (b = const)

  b_n = b_{n−1} − α · (e(a, b_{n−1}) − e(a, b_{n−2})) / (b_{n−1} − b_{n−2})   (a = const)

Choose one dimension to optimize and treat all other dimensions as constant.
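A sketch of this dimension-switching scheme in Python for e(a, b), again with the secant gradient and made-up data (the learning rate and iteration count are illustrative, not the slides' exact setup):

```python
def e(a, b, xs, ys):
    """Mean squared error of f(x) = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def secant_step(f, t_prev, t_curr, alpha):
    """One descent step along a single dimension."""
    if abs(t_curr - t_prev) < 1e-12:   # gradient no longer measurable
        return t_prev, t_curr
    grad = (f(t_curr) - f(t_prev)) / (t_curr - t_prev)
    return t_curr, t_curr - alpha * grad

def coordinate_descent(xs, ys, alpha=0.01, iterations=3000):
    a_prev, a = 0.0, 0.1   # two starting values per dimension are
    b_prev, b = 0.0, 0.1   # needed for the secant approximation
    for _ in range(iterations):
        a_prev, a = secant_step(lambda t: e(t, b, xs, ys), a_prev, a, alpha)  # b = const
        b_prev, b = secant_step(lambda t: e(a, t, xs, ys), b_prev, b, alpha)  # a = const
    return a, b

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]   # made-up sample data
ys = [-3.1, -0.9, 1.8, 4.2, 6.1, 8.0, 10.3]
print(coordinate_descent(xs, ys))
```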


Results of Linear Regression


for α = 0.001 and 3,000 iterations

• a ≈ 2.2667, b ≈ 2.1590, error e(2.2667, 2.1590) ≈ 429.6765
• f̂(x) = 2.2667 · x + 2.1590 ⇒ f̂(2.7) ≈ 8.2791

(Plot: the fitted line over the data, with the prediction x = 2.7, y ≈ 8.28 marked.)



Results of Linear Regression (Ctd.)

• Values for a and b computed using a closed form equation:


– a = 1.896911
– b = 2.009028

• But: the error is still not 0!
  This is because the data points do not lie exactly on a straight line!



Mathematically exact Gradient


Using partial Derivatives of Error Function

• Previously, a “simplified/naive” version of the gradient was used

• Also possible:

Gradient for parameter a:

  ∂/∂a [ (1/n) · Σ_{i=1}^{n} (y_i − (a · x_i + b))² ] = (2/n) · Σ_{i=1}^{n} −x_i · (y_i − (a · x_i + b))

Gradient for parameter b:

  ∂/∂b [ (1/n) · Σ_{i=1}^{n} (y_i − (a · x_i + b))² ] = (2/n) · Σ_{i=1}^{n} −(y_i − (a · x_i + b))

See: [Link]
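A batch-gradient-descent sketch using these exact partial derivatives (the data is made up, so the resulting a and b differ from the numbers shown earlier):

```python
def exact_gradient_descent(xs, ys, alpha=0.001, iterations=3000):
    """Batch gradient descent for f(x) = a*x + b with exact gradients."""
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(iterations):
        grad_a = (2.0 / n) * sum(-x * (y - (a * x + b)) for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum(-(y - (a * x + b)) for x, y in zip(xs, ys))
        a, b = a - alpha * grad_a, b - alpha * grad_b
    return a, b

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]   # made-up sample data
ys = [-3.1, -0.9, 1.8, 4.2, 6.1, 8.0, 10.3]
print(exact_gradient_descent(xs, ys))
```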



Disadvantage of Gradient Descent

• Disadvantage of Gradient Descent:

Might get stuck in local optimum

(Plot: a function with a local optimum next to the global optimum; gradient descent may get stuck in the local one.)


Rastrigin Function
A Nightmare for Gradient Descent

(Figure: surface plot of the Rastrigin function with its very many local optima.)


Overfitting and Underfitting


Or: How many Degrees of Freedom?

• Overfitting: the model does not generalize well (too many degrees of freedom!)
• Underfitting: the model generalizes too much (too few degrees of freedom!)


K-Nearest-Neighbor

K-Nearest-Neighbor
Overview

• Basic idea: predict the class depending on the nearest neighbors

• Instance-based, supervised, online learning


(Scatter plot: two classes of points in the x₁-x₂ plane and an unlabeled query point “?”.)


K-Nearest-Neighbor
Distance Metric

• How to measure the distance between two items i and j?

⇒ Distance Metrics
• d is a distance function d : (i, j) → [0, +∞)
• d has the following properties:
  1) d(i, j) = d(j, i) (symmetry)
  2) d(i, j) = 0 ⇔ i = j
  3) d(i, k) ≤ d(i, j) + d(j, k) (triangle inequality)



Distance Metrics
Some Examples for Continuous Values

• l_r metric (Minkowski distance):

  d(i, j) = ( Σ_{k=1}^{p} |x_{ik} − x_{jk}|^r )^{1/r}

• l_1 metric, City-Block/Manhattan distance (r = 1):

  d(i, j) = Σ_{k=1}^{p} |x_{ik} − x_{jk}|

• l_2 metric, Euclidean distance (r = 2):

  d(i, j) = √( Σ_{k=1}^{p} (x_{ik} − x_{jk})² )

• Many more: [Link]
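A minimal sketch of these metrics in Python (the printed values match the worked example on the next slide):

```python
def minkowski(i, j, r):
    """l_r (Minkowski) distance between two equally long vectors."""
    return sum(abs(a - b) ** r for a, b in zip(i, j)) ** (1.0 / r)

def manhattan(i, j):
    return minkowski(i, j, r=1)   # l_1 / city-block distance

def euclidean(i, j):
    return minkowski(i, j, r=2)   # l_2 distance

print(manhattan((4, 2), (7, 4)))   # 5.0
print(euclidean((4, 2), (7, 4)))   # sqrt(13) ≈ 3.6056
```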




Distance Metrics (Ctd.)


Manhattan Distance and Euclidean Distance

Example for the points i = (4, 2) and j = (7, 4):

  Euclidean: d(i, j) = √((7 − 4)² + (4 − 2)²) = √13 ≈ 3.61
  Manhattan: d(i, j) = |7 − 4| + |4 − 2| = 5

(Figure: the Euclidean distance as the straight line between the two points, the Manhattan distance along the axis-parallel grid.)



Distance Metrics (Ctd.)


Example for categorical Values

• Simple Matching Distance:

  SMD = number of non-matching attributes / number of attributes

  a₁: Outlook   a₂: Temperature   a₃: Humidity   a₄: Wind
  sunny         hot               high           weak
  rainy         mild              normal         weak

  ⇒ SMD = 3/4
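The same computation as a small Python sketch:

```python
def simple_matching_distance(i, j):
    """Fraction of attributes on which two items disagree."""
    return sum(a != b for a, b in zip(i, j)) / len(i)

day1 = ("sunny", "hot", "high", "weak")
day2 = ("rainy", "mild", "normal", "weak")
print(simple_matching_distance(day1, day2))   # 3/4 = 0.75
```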



Similarity Metrics as Alternative


Cosine Similarity

• Similarity metrics are an alternative to distance metrics

• A famous one is the Cosine Similarity

• The cosine similarity of two vectors a⃗, b⃗ is the cosine of the enclosed angle:

  cos ∠(a⃗, b⃗) = (a⃗ · b⃗) / (‖a⃗‖₂ · ‖b⃗‖₂) = Σ_{i=1}^{n} a_i · b_i / ( √(Σ_{i=1}^{n} a_i²) · √(Σ_{i=1}^{n} b_i²) )

• This follows since the dot product is defined as:

  a⃗ · b⃗ = ‖a⃗‖ · ‖b⃗‖ · cos ∠(a⃗, b⃗)
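A minimal Python sketch of the formula (the two vectors are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle enclosed by vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity((1, 0), (1, 1)))   # cos(45°) ≈ 0.7071
```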



Cosine Similarity (Ctd.)


visualized

(Figure: vectors i⃗, j⃗₁, j⃗₂ with enclosed angles α₁ < α₂. j⃗₁ is closer to i⃗ since cos(α₁) > cos(α₂); recall cos(0°) = 1 and cos(90°) = 0.)

Prediction with K-Nearest-Neighbor


Calculating Distances

i = (0.45, 0.15); Euclidean Distance is used


x1            x2            c    d(i, j_l)
j1 0.65521554 0.24444527 1 0.225905571
j2 0.24504847 0.78597272 1 0.668181435
j3 0.15609492 0.81457872 1 0.726667098
j4 0.56865867 0.20662444 1 0.131477021
j5 0.20714328 0.71919525 1 0.618839736
j6 0.66016056 0.27059772 1 0.242304088
j7 0.26992809 0.10746116 0 0.185028229
j8 0.38712087 0.12727304 0 0.066860300
j9 0.39236510 0.85702696 0 0.709372190
j10 0.44483550 0.67265678 0 0.522682295
j11 0.30913813 0.33331875 0 0.231187868
j12 0.03022423 0.50891496 0 0.552296701
⋮


Prediction with K-Nearest-Neighbor (Ctd.)

• All distances have been calculated

• Sort table by distances in ascending order

If similarity metrics are used ⇒ Sort in descending order!

• Look at the first k entries and count the occurrences of each class

• Predict the class with maximum score



Prediction with K-Nearest-Neighbor (Ctd.)

• k = 10 (We are only interested in the first 10 entries)

• Cnt(c = 0) = 3, Cnt(c = 1) = 7 ⇒ Predict class 1

x1            x2            c    d(i, j_l)
0.50913594 0.17425407 1 0.06391650
0.38712087 0.12727304 0 0.06686030
0.52462468 0.17319311 1 0.07814578
0.42848188 0.22721030 0 0.08015273
0.46872483 0.03031198 0 0.12114389
0.52246121 0.25967888 1 0.13145373
0.56865867 0.20662444 1 0.13147702
0.53451705 0.25122332 1 0.13186847
0.58228139 0.11870052 1 0.13593389
0.58674444 0.13294919 1 0.13780338
⋮



Algorithm 1 K-Nearest-Neighbor Algorithm


Require: new instance i, training instances S, integer k
1: distances ← ∅
2: for all j ∈ S do
3:   calculate the distance d(i, j), e.g. the Euclidean distance:
       d(i, j) ← √( Σ_{k=1}^{p} (x_{ik} − x_{jk})² )
4:   distances ← distances ∪ ⟨i, j, d(i, j)⟩
5: distances ← sortByDistance(distances, ascending=True)
6: select the top k elements from distances
7: return the majority class as prediction
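A runnable Python sketch of Algorithm 1 (the training points are rounded versions of a few rows from the earlier table):

```python
import math
from collections import Counter

def euclidean(i, j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(i, j)))

def knn_predict(query, training, k, distance=euclidean):
    """training: list of (feature_vector, class_label) pairs."""
    # Steps 1-4: distance from the query to every training instance
    distances = [(distance(query, x), c) for x, c in training]
    # Step 5: sort ascending (descending if a similarity metric is used!)
    distances.sort(key=lambda t: t[0])
    # Steps 6-7: majority class among the top k entries
    top_k = [c for _, c in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]

training = [((0.66, 0.24), 1), ((0.57, 0.21), 1), ((0.66, 0.27), 1),
            ((0.27, 0.11), 0), ((0.39, 0.13), 0), ((0.31, 0.33), 0)]
print(knn_predict((0.45, 0.15), training, k=3))   # predicts class 0 here
```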



K-Nearest-Neighbor
Use Case: Information Retrieval

• Suppose you want to create a search engine like Google
• You have to find the documents matching a query
• This is referred to as Information Retrieval

  “Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources.” — Wikipedia

Problem: Comparing the query with each document (data point) is computationally extremely expensive and therefore not viable ⇒ millions of documents!


K-Nearest-Neighbor
Use Case: Information Retrieval (Ctd.)

(Diagram: a query such as “SAP HANA Memory Sizes” is matched against the document collection by some “magic” to yield the retrieved documents.)


KD-Trees (k-dimensional trees)

• A specific data structure for representing data efficiently
• Organizes and partitions the data based on conditions
• Allows for pruning the search space

  We no longer have to visit each data point (document) in the search space! :-)

• The data is partitioned by making cuts in each dimension
• We maintain lists of the points that fall into each bin (partition)



KD-Trees (Ctd.)

• How to draw the cuts?
  – Median value in the box
  – Center point (ignores the spread of the data)
• When to stop?
  – Threshold on the number of points in a box
  – Minimum box width reached
• Switch dimensions between consecutive cuts

Don’t worry. There will be an example on the next slides ;-)



KD-Trees
Example

               X     Y
Data point 0   0.54  0.93
Data point 1   0.96  0.86
Data point 2   0.42  0.67
Data point 3   0.11  0.53
Data point 4   0.64  0.29
Data point 5   0.27  0.75
Data point 6   0.81  0.63

(Scatter plot of the seven data points in the X-Y plane.)
• How to draw the cuts?
  – Mean of the max and min value ⇒ v_cut = (min + max)/2
• When to stop?
  – Split until there are at most two points per box


KD-Trees
Example: The first Cut

• Cut X-axis

• X_cut = (min_X + max_X)/2 = (0.11 + 0.96)/2 ≈ 0.53

Root: X > 0.53?
  No  → Node 1
  Yes → Node 2



KD-Trees
Example: The first Cut (Ctd.)

(Plot: a vertical cut at X ≈ 0.53 splits the points into Node 1 (left) and Node 2 (right).)



KD-Trees
Example: Consecutive Cuts

Root: X > 0.53?
  No  → Node 1: Y > 0.64?
          No  → Node 3
          Yes → Node 4
  Yes → Node 2: Y > 0.61?
          No  → Node 5
          Yes → Node 6: X > 0.75?
                  No  → Node 7
                  Yes → Node 8



KD-Trees
Example: Consecutive Cuts (Ctd.)

(Plot: the final partition with leaf boxes N3, N4, N5, N7 and N8; the query point (0.38, 0.55) is marked “???”.)



KD-Trees
Example: Predicting

To classify the query point (0.38, 0.55), traverse the tree:

  Root: X > 0.53? No → Node 1: Y > 0.64? No → Node 3

• Compute the distances to all points in the box of Node 3 and keep the closest one as the current nearest-neighbor candidate.
• A neighboring box may still contain a closer point: inspect every box that could intersect the circle around the query point whose radius is the current candidate distance.
• Here the true nearest neighbor, data point 2 at (0.42, 0.67), actually lies in the box of Node 4, not in the query point's own box.

(Slide sequence: the partition with the query point “???”, the first candidate in N3, and the final nearest neighbor “NN” found in N4.)

KD-Trees
A very complex Example

(Figure: a KD-tree partition of a much larger, more complex point set.)



Feature Weighting

• Not all dimensions/attributes are equally important
  – Some dimensions might be completely irrelevant
  – A “normal” distance function assigns equal weights (w_k = 1) to all attributes
• Idea:
  – Use a weight for each attribute to denote its importance
  – E.g. the Weighted Euclidean Distance:

    d(i, j) = √( Σ_{k=1}^{p} w_k · (x_{ik} − x_{jk})² )

  – The weights can be set by the user or determined automatically (e.g. ⇒ RELIEF algorithm by Kira and Rendell)
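A one-function sketch of the weighted distance (the weight vector is made up; weight 0 switches an attribute off entirely):

```python
import math

def weighted_euclidean(i, j, w):
    """Euclidean distance with one importance weight per attribute."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, i, j)))

# Attribute 2 is treated as irrelevant here (weight 0), so only attribute 1 counts:
print(weighted_euclidean((4, 2), (7, 4), w=(1.0, 0.0)))   # 3.0
```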


RELIEF Algorithm
Feature Weighting

Basic idea:
In a local environment around an example x, a good attribute p should
• allow x to be discriminated from all examples of different classes (the set of misses)
  – The probability that p has a different value for x and a miss m should be high
• have the same value for all examples of the same class as x (the set of hits)
  – The probability that p has a different value for x and a hit h should be low

⇒ Try to estimate: w_p = P(v_x ≠ v_m) − P(v_x ≠ v_h),
   where v_x is the value of attribute p in example x


Algorithm 2 RELIEF Algorithm


Require: training instances S, integer r
1: ∀ w_p ∈ w⃗ : w_p ← 0.0
2: for all i ∈ {1, …, r} do
3:   select a random example x ∈ S
4:   find the nearest neighbor of a different class (near miss m)
5:   find the nearest neighbor of the same class (near hit h)
6:   for all attributes p ∈ P do
       w_p ← w_p + (1/r) · (d_p(m, x) − d_p(h, x))
7: return the attribute weights w⃗


Instance Weighting

• Idea: assign a weight to each instance
• Instances with lower weights appear more distant
  – hence they have a low impact on classification
  – an instance weight w_x = 0 ignores instance x completely
  – Example (again with the Euclidean distance):

    d(i, j) = (1 / (w_i · w_j)) · √( Σ_{k=1}^{p} (x_{ik} − x_{jk})² )

  – where (w_x ≈ 1 ⇒ good predictor, w_x < 1 ⇒ poor predictor):

    w_x = Cnt(x has predicted the correct class) / Cnt(x has been used for prediction)
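A sketch of the instance-weighted distance (note that w = 0 would divide by zero; in the slides' reading such an instance is simply infinitely distant and hence ignored):

```python
import math

def instance_weighted_distance(xi, xj, wi, wj):
    """Euclidean distance inflated for instances with low weights."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    return d / (wi * wj)   # wi = wj = 1 leaves the distance unchanged

# An instance with a poor prediction record (weight 0.5) appears twice as far:
print(instance_weighted_distance((4, 2), (7, 4), wi=1.0, wj=0.5))   # ≈ 7.21
```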


How to choose k?

• Small k ⇒ overfitting! The prediction is prone to outliers.
• Large k ⇒ generalizes better, but may underfit.



How to choose k? (Ctd.)


Solutions

• It is recommended to use odd values for k (⇒ no ties between two classes!)
• Compute k depending on the size of the data set S:

  k = √(|S| / 2)   or   k = √|S|

• Use d-fold cross-validation (e.g. d = 4):

(Diagram: the data set split into d folds; each fold serves once as the validation set while the remaining folds are used for training.)

Algorithm 3 Find-k (using Cross-Validation)


Require: list of k-values K, training data S, number of folds d
1: F ← split S into d disjoint subsets
2: for all f ∈ F do
3:   select f as the validation set
4:   train the classifier on the remaining d − 1 subsets
5:   for all k ∈ K do
6:     classify each instance in the validation set
7:     record the prediction error
8: k_opt ← the k with the minimal error rate
9: train the model on the entire data set using k_opt
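A compact Python sketch of Algorithm 3; since KNN is a lazy learner, “training” here just means selecting the reference set (the toy data is made up):

```python
import math
from collections import Counter

def euclidean(i, j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(i, j)))

def knn_predict(query, train, k):
    nearest = sorted(train, key=lambda e: euclidean(e[0], query))[:k]
    return Counter(c for _, c in nearest).most_common(1)[0][0]

def find_k(k_values, S, d):
    """d-fold cross-validation over the candidate values of k."""
    folds = [S[f::d] for f in range(d)]              # d disjoint subsets
    errors = {k: 0 for k in k_values}
    for f, val in enumerate(folds):
        train = [e for g in range(d) if g != f for e in folds[g]]
        for k in k_values:
            errors[k] += sum(knn_predict(x, train, k) != c for x, c in val)
    return min(errors, key=errors.get)               # k with the minimal error

S = [((0.1 * i, 0.05 * (i % 7)), i % 2) for i in range(24)]   # made-up toy data
print(find_k([1, 3, 5, 7], S, d=4))
```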



How to choose k? (Ctd.)


Results of Cross-Validation

(Plot: validation error over k for k = 0 … 50; the error is minimal at k = 8 or 9.)


K-Nearest-Neighbor
Advantages and Disadvantages

• Advantages:
– Easy and comprehensible
– It is easy to add new training items (no retraining required)

• Disadvantages:
– Calculating distances to all points is costly
(⇒ Subsampling, KD-Trees)

– How to choose a proper value for k? (⇒ Cross-Validation)



Any Questions?


Thank you for your attention!

Contact:
Daniel Wehner, SAP SE
