
Common ML Algorithms Basics

Daniel Wehner, SAP SE


January 10, 2018


Agenda

1 Machine Learning Basics

2 Regression and Gradient Descent

3 K-Nearest-Neighbor

4 Questions



Machine Learning Basics



Introduction

What is Learning?

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
— Mitchell, 1997

The goal of ML is to generalize the experience in a way that improves performance on the task.



Some basic Terminology

• Type of Training Information
  – Supervised: the algorithm needs labeled training data
  – Unsupervised: the training data is not labeled ⇒ e.g. clustering
• Availability of Training Examples
  – Batch learning: the learner is provided with a set of training examples
  – Online learning: a constant stream of training examples
• Concept Representation
  – Model-based: the learner constructs a model (function, concept, …)
  – Instance-based: no explicit model (lazy learning); generalization on demand


Regression and Gradient Descent



Regression
Introduction

• Assume the following data:


(Scatter plot: sample data points in the x-y plane.)

• Find a function that fits the data best (⇒ Regression)

• Supervised, model-based, batch learning




Linear Regression

• Predict a value y depending on a value x

• Try to find a linear function f̂(x) = a · x + b that fits the data best

(Plot: the data points and the query x = 2.7, y = ?)



Regression
Least Squares Method
• Least Squares Method
• Minimize the sum of squared errors:

  Σ_{i=1}^{n} (f̂(x_i) − y_i)²   with   f̂(x) = a · x + b

  f̂(x_i) := predicted value, y_i := actual value

(Figure: the fitted line with the vertical errors between predictions and data points.)


Possible Solving Strategies

• Usual approach (closed-form equations for the linear case f̂(x) = a · x + b):

  a = Σ_{i=1}^{n} (x_i − x̄) · (y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)² ,   b = ȳ − a · x̄

• Matrix notation (closed-form normal equations for a degree-m polynomial):

  p⃗ = (XᵀX)⁻¹ Xᵀ y⃗

  where the i-th row of the design matrix X is (1, x_i, x_i², …, x_i^m) and y⃗ = (y_1, y_2, …, y_n)ᵀ is the vector of target values

• Gradient Descent (e. g. Batch/Stochastic Gradient Descent)
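A minimal sketch of both closed-form routes in Python/NumPy (the data set is made up for illustration; the slides' actual data is not given):

```python
import numpy as np

# Made-up sample data (the data set from the slides is not reproduced here)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([-3.1, -0.9, 1.8, 4.2, 6.1, 8.0, 10.3])

# Simple closed form: slope a and intercept b of f(x) = a*x + b
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()

# Matrix form (normal equations), here for a degree-1 polynomial
X = np.column_stack([np.ones_like(x), x])   # design matrix with rows (1, x_i)
p = np.linalg.solve(X.T @ X, X.T @ y)       # p = (X^T X)^{-1} X^T y

print(a, b)   # both routes yield the same parameters
print(p)      # [b, a] with this column ordering
```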




Data Input Space vs. Model Space

• Data input space (the input data): determined by the attributes of the data
• Model space (the error function): determined by the degrees of freedom of the model

(Figure: the input data in the x-y plane next to the error surface over the model parameters.)



Error Function (aka Loss Function)

• The error function (≙ loss function) “lives” in the model space
• Each point in the model space corresponds to a specific combination of a and b and the resulting error value e(a, b)
• In this case, the error function has the form:

  e(a, b) = (1/n) · Σ_{i=1}^{n} (y_i − (a · x_i + b))²

We want to minimize this function! ⇒ Gradient Descent



Gradient Descent
What is a Gradient? (general case)

• A gradient is the slope of a function at a specific point x0

• Computation (g = function to optimize):

  grad(x₀) = Δy/Δx = (g(x₂) − g(x₁)) / (x₂ − x₁)

(Figure: the slope at x₀ approximated by the secant through (x₁, g(x₁)) and (x₂, g(x₂)).)

Gradient Descent (Ctd.)


(general case)

Gradient Descent update rule:

  x_n = x_{n−1} − α · (g(x_{n−1}) − g(x_{n−2})) / (x_{n−1} − x_{n−2})

• If the gradient is positive, we go left
• Otherwise we go right
⇒ This is how we make sure to move towards smaller function values
• α denotes the learning rate (it dampens the steps and prevents oscillation)
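A minimal sketch of this rule in Python, with a made-up example function (two starting points are needed because the gradient is approximated from the two previous iterates):

```python
def gradient_descent_1d(g, x_prev, x_curr, alpha=0.1, iterations=100):
    """Minimize g using the secant approximation of the gradient."""
    for _ in range(iterations):
        grad = (g(x_curr) - g(x_prev)) / (x_curr - x_prev)
        x_prev, x_curr = x_curr, x_curr - alpha * grad
    return x_curr

# Example: g(x) = (x - 3)^2 has its minimum at x = 3
print(gradient_descent_1d(lambda x: (x - 3.0) ** 2, x_prev=0.0, x_curr=0.1))
```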



Gradient Descent (Ctd.)


Optimizing in multiple Dimensions

• So far we have only optimized in one dimension

• The error function has two independent variables: a, b

• Switch dimensions (e = error function):

  a_n = a_{n−1} − α · (e(a_{n−1}, b) − e(a_{n−2}, b)) / (a_{n−1} − a_{n−2})   (b = const)

  b_n = b_{n−1} − α · (e(a, b_{n−1}) − e(a, b_{n−2})) / (b_{n−1} − b_{n−2})   (a = const)

Choose one dimension to optimize and treat all other dimensions as constant.
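A sketch of this dimension-switching scheme in Python for e(a, b), again with the secant gradient and made-up data (the learning rate and iteration count are illustrative, not the slides' exact setup):

```python
def e(a, b, xs, ys):
    """Mean squared error of f(x) = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def secant_step(f, t_prev, t_curr, alpha):
    """One descent step along a single dimension."""
    if abs(t_curr - t_prev) < 1e-12:   # gradient no longer measurable
        return t_prev, t_curr
    grad = (f(t_curr) - f(t_prev)) / (t_curr - t_prev)
    return t_curr, t_curr - alpha * grad

def coordinate_descent(xs, ys, alpha=0.01, iterations=3000):
    a_prev, a = 0.0, 0.1   # two starting values per dimension are
    b_prev, b = 0.0, 0.1   # needed for the secant approximation
    for _ in range(iterations):
        a_prev, a = secant_step(lambda t: e(t, b, xs, ys), a_prev, a, alpha)  # b = const
        b_prev, b = secant_step(lambda t: e(a, t, xs, ys), b_prev, b, alpha)  # a = const
    return a, b

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]   # made-up sample data
ys = [-3.1, -0.9, 1.8, 4.2, 6.1, 8.0, 10.3]
print(coordinate_descent(xs, ys))
```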


Results of Linear Regression


for α = 0.001 and 3,000 iterations

• a ≈ 2.2667, b ≈ 2.1590, error e(2.2667, 2.1590) ≈ 429.6765
• f̂(x) = 2.2667 · x + 2.1590 ⇒ f̂(2.7) ≈ 8.2791

(Plot: the fitted line over the data, with the prediction x = 2.7, y ≈ 8.28 marked.)



Results of Linear Regression (Ctd.)

• Values for a and b computed using a closed form equation:


– a = 1.896911
– b = 2.009028

• But: the error is still not 0!
  This is because the data points do not lie exactly on a straight line!



Mathematically exact Gradient


Using partial Derivatives of Error Function

• Previously, a “simplified/naive” version of the gradient was used

• Also possible:

Gradient for parameter a:

  ∂/∂a [ (1/n) · Σ_{i=1}^{n} (y_i − (a · x_i + b))² ] = (2/n) · Σ_{i=1}^{n} −x_i · (y_i − (a · x_i + b))

Gradient for parameter b:

  ∂/∂b [ (1/n) · Σ_{i=1}^{n} (y_i − (a · x_i + b))² ] = (2/n) · Σ_{i=1}^{n} −(y_i − (a · x_i + b))

See: [Link]
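A batch-gradient-descent sketch using these exact partial derivatives (the data is made up, so the resulting a and b differ from the numbers shown earlier):

```python
def exact_gradient_descent(xs, ys, alpha=0.001, iterations=3000):
    """Batch gradient descent for f(x) = a*x + b with exact gradients."""
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(iterations):
        grad_a = (2.0 / n) * sum(-x * (y - (a * x + b)) for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum(-(y - (a * x + b)) for x, y in zip(xs, ys))
        a, b = a - alpha * grad_a, b - alpha * grad_b
    return a, b

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]   # made-up sample data
ys = [-3.1, -0.9, 1.8, 4.2, 6.1, 8.0, 10.3]
print(exact_gradient_descent(xs, ys))
```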



Disadvantage of Gradient Descent

• Disadvantage of Gradient Descent:

Might get stuck in local optimum

(Plot: a function with a local optimum next to the global optimum; gradient descent may get stuck in the local one.)


Rastrigin Function
A Nightmare for Gradient Descent

(Figure: surface plot of the Rastrigin function with its very many local optima.)


Overfitting and Underfitting


Or: How many Degrees of Freedom?

• Overfitting: the model does not generalize well (too many degrees of freedom!)
• Underfitting: the model generalizes too much (too few degrees of freedom!)


K-Nearest-Neighbor

K-Nearest-Neighbor
Overview

• Basic idea: predict the class depending on the nearest neighbors

• Instance-based, supervised, online learning


(Scatter plot: two classes of points in the x₁-x₂ plane and an unlabeled query point “?”.)


K-Nearest-Neighbor
Distance Metric

• How to measure the distance between two items i and j?

⇒ Distance Metrics
• d is a distance function d : (i, j) → [0, +∞)
• d has the following properties:
  1) d(i, j) = d(j, i) (symmetry)
  2) d(i, j) = 0 ⇔ i = j
  3) d(i, k) ≤ d(i, j) + d(j, k) (triangle inequality)



Distance Metrics
Some Examples for Continuous Values

• l_r metric (Minkowski distance):

  d(i, j) = ( Σ_{k=1}^{p} |x_{ik} − x_{jk}|^r )^{1/r}

• l_1 metric, City-Block/Manhattan distance (r = 1):

  d(i, j) = Σ_{k=1}^{p} |x_{ik} − x_{jk}|

• l_2 metric, Euclidean distance (r = 2):

  d(i, j) = √( Σ_{k=1}^{p} (x_{ik} − x_{jk})² )

• Many more: [Link]
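A minimal sketch of these metrics in Python (the printed values match the worked example on the next slide):

```python
def minkowski(i, j, r):
    """l_r (Minkowski) distance between two equally long vectors."""
    return sum(abs(a - b) ** r for a, b in zip(i, j)) ** (1.0 / r)

def manhattan(i, j):
    return minkowski(i, j, r=1)   # l_1 / city-block distance

def euclidean(i, j):
    return minkowski(i, j, r=2)   # l_2 distance

print(manhattan((4, 2), (7, 4)))   # 5.0
print(euclidean((4, 2), (7, 4)))   # sqrt(13) ≈ 3.6056
```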




Distance Metrics (Ctd.)


Manhattan Distance and Euclidean Distance

Example for the points i = (4, 2) and j = (7, 4):

  Euclidean: d(i, j) = √((7 − 4)² + (4 − 2)²) = √13 ≈ 3.61
  Manhattan: d(i, j) = |7 − 4| + |4 − 2| = 5

(Figure: the Euclidean distance as the straight line between the two points, the Manhattan distance along the axis-parallel grid.)



Distance Metrics (Ctd.)


Example for categorical Values

• Simple Matching Distance:

  SMD = number of non-matching attributes / number of attributes

  a₁: Outlook   a₂: Temperature   a₃: Humidity   a₄: Wind
  sunny         hot               high           weak
  rainy         mild              normal         weak

  ⇒ SMD = 3/4
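The same computation as a small Python sketch:

```python
def simple_matching_distance(i, j):
    """Fraction of attributes on which two items disagree."""
    return sum(a != b for a, b in zip(i, j)) / len(i)

day1 = ("sunny", "hot", "high", "weak")
day2 = ("rainy", "mild", "normal", "weak")
print(simple_matching_distance(day1, day2))   # 3/4 = 0.75
```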



Similarity Metrics as Alternative


Cosine Similarity

• Similarity metrics are an alternative to distance metrics

• A famous one is the Cosine Similarity

• The cosine similarity of two vectors a⃗, b⃗ is the cosine of the enclosed angle:

  cos ∠(a⃗, b⃗) = (a⃗ · b⃗) / (‖a⃗‖₂ · ‖b⃗‖₂) = Σ_{i=1}^{n} a_i · b_i / ( √(Σ_{i=1}^{n} a_i²) · √(Σ_{i=1}^{n} b_i²) )

• This follows since the dot product is defined as:

  a⃗ · b⃗ = ‖a⃗‖ · ‖b⃗‖ · cos ∠(a⃗, b⃗)
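A minimal Python sketch of the formula (the two vectors are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle enclosed by vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity((1, 0), (1, 1)))   # cos(45°) ≈ 0.7071
```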



Cosine Similarity (Ctd.)


visualized

(Figure: vectors i⃗, j⃗₁, j⃗₂ with enclosed angles α₁ < α₂. j⃗₁ is closer to i⃗ since cos(α₁) > cos(α₂); recall cos(0°) = 1 and cos(90°) = 0.)

Prediction with K-Nearest-Neighbor


Calculating Distances

i = (0.45, 0.15); Euclidean Distance is used


x1            x2            c    d(i, j_l)
j1 0.65521554 0.24444527 1 0.225905571
j2 0.24504847 0.78597272 1 0.668181435
j3 0.15609492 0.81457872 1 0.726667098
j4 0.56865867 0.20662444 1 0.131477021
j5 0.20714328 0.71919525 1 0.618839736
j6 0.66016056 0.27059772 1 0.242304088
j7 0.26992809 0.10746116 0 0.185028229
j8 0.38712087 0.12727304 0 0.066860300
j9 0.39236510 0.85702696 0 0.709372190
j10 0.44483550 0.67265678 0 0.522682295
j11 0.30913813 0.33331875 0 0.231187868
j12 0.03022423 0.50891496 0 0.552296701
⋮


Prediction with K-Nearest-Neighbor (Ctd.)

• All distances have been calculated

• Sort table by distances in ascending order

If similarity metrics are used ⇒ Sort in descending order!

• Look at the first k entries and count the occurrences of each class

• Predict the class with maximum score



Prediction with K-Nearest-Neighbor (Ctd.)

• k = 10 (We are only interested in the first 10 entries)

• Cnt(c = 0) = 3, Cnt(c = 1) = 7 ⇒ Predict class 1

x1            x2            c    d(i, j_l)
0.50913594 0.17425407 1 0.06391650
0.38712087 0.12727304 0 0.06686030
0.52462468 0.17319311 1 0.07814578
0.42848188 0.22721030 0 0.08015273
0.46872483 0.03031198 0 0.12114389
0.52246121 0.25967888 1 0.13145373
0.56865867 0.20662444 1 0.13147702
0.53451705 0.25122332 1 0.13186847
0.58228139 0.11870052 1 0.13593389
0.58674444 0.13294919 1 0.13780338
⋮



Algorithm 1 K-Nearest-Neighbor Algorithm


Require: new instance i, training instances S, integer k
1: distances ← ∅
2: for all j ∈ S do
3:   calculate the distance d(i, j), e.g. the Euclidean distance:
       d(i, j) ← √( Σ_{k=1}^{p} (x_{ik} − x_{jk})² )
4:   distances ← distances ∪ ⟨i, j, d(i, j)⟩
5: distances ← sortByDistance(distances, ascending=True)
6: select the top k elements from distances
7: return the majority class as prediction
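A runnable Python sketch of Algorithm 1 (the training points are rounded versions of a few rows from the earlier table):

```python
import math
from collections import Counter

def euclidean(i, j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(i, j)))

def knn_predict(query, training, k, distance=euclidean):
    """training: list of (feature_vector, class_label) pairs."""
    # Steps 1-4: distance from the query to every training instance
    distances = [(distance(query, x), c) for x, c in training]
    # Step 5: sort ascending (descending if a similarity metric is used!)
    distances.sort(key=lambda t: t[0])
    # Steps 6-7: majority class among the top k entries
    top_k = [c for _, c in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]

training = [((0.66, 0.24), 1), ((0.57, 0.21), 1), ((0.66, 0.27), 1),
            ((0.27, 0.11), 0), ((0.39, 0.13), 0), ((0.31, 0.33), 0)]
print(knn_predict((0.45, 0.15), training, k=3))   # predicts class 0 here
```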



K-Nearest-Neighbor
Use Case: Information Retrieval

• Suppose you want to create a search engine like Google
• You have to find the documents matching a query
• This is referred to as Information Retrieval

  “Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources.” — Wikipedia

Problem: Comparing the query with each document (data point) is computationally extremely expensive and therefore not viable ⇒ millions of documents!


K-Nearest-Neighbor
Use Case: Information Retrieval (Ctd.)

(Diagram: a query such as “SAP HANA Memory Sizes” is matched against the document collection by some “magic” to yield the retrieved documents.)


KD-Trees (k-dimensional trees)

• A specific data structure for representing data efficiently
• Organizes and partitions the data based on conditions
• Allows for pruning the search space

  We no longer have to visit each data point (document) in the search space! :-)

• The data is partitioned by making cuts in each dimension
• We maintain lists of the points that fall into each bin (partition)



KD-Trees (Ctd.)

• How to draw the cuts?
  – Median value in the box
  – Center point (ignores the spread of the data)
• When to stop?
  – Threshold on the number of points in a box
  – Minimum box width reached
• Switch dimensions between consecutive cuts

Don’t worry. There will be an example on the next slides ;-)



KD-Trees
Example

               X     Y
Data point 0   0.54  0.93
Data point 1   0.96  0.86
Data point 2   0.42  0.67
Data point 3   0.11  0.53
Data point 4   0.64  0.29
Data point 5   0.27  0.75
Data point 6   0.81  0.63

(Scatter plot of the seven data points in the X-Y plane.)
• How to draw the cuts?
  – Mean of the max and min value ⇒ v_cut = (min + max)/2
• When to stop?
  – Split until there are at most two points per box


KD-Trees
Example: The first Cut

• Cut X-axis

• X_cut = (min_X + max_X)/2 = (0.11 + 0.96)/2 ≈ 0.53

Root: X > 0.53?
  No  → Node 1
  Yes → Node 2



KD-Trees
Example: The first Cut (Ctd.)

(Plot: a vertical cut at X ≈ 0.53 splits the points into Node 1 (left) and Node 2 (right).)



KD-Trees
Example: Consecutive Cuts

Root: X > 0.53?
  No  → Node 1: Y > 0.64?
          No  → Node 3
          Yes → Node 4
  Yes → Node 2: Y > 0.61?
          No  → Node 5
          Yes → Node 6: X > 0.75?
                  No  → Node 7
                  Yes → Node 8



KD-Trees
Example: Consecutive Cuts (Ctd.)

(Plot: the final partition with leaf boxes N3, N4, N5, N7 and N8; the query point (0.38, 0.55) is marked “???”.)



KD-Trees
Example: Predicting

To classify the query point (0.38, 0.55), traverse the tree:

  Root: X > 0.53? No → Node 1: Y > 0.64? No → Node 3

• Compute the distances to all points in the box of Node 3 and keep the closest one as the current nearest-neighbor candidate.
• A neighboring box may still contain a closer point: inspect every box that could intersect the circle around the query point whose radius is the current candidate distance.
• Here the true nearest neighbor, data point 2 at (0.42, 0.67), actually lies in the box of Node 4, not in the query point's own box.

(Slide sequence: the partition with the query point “???”, the first candidate in N3, and the final nearest neighbor “NN” found in N4.)

KD-Trees
A very complex Example

(Figure: a KD-tree partition of a much larger, more complex point set.)



Feature Weighting

• Not all dimensions/attributes are equally important
  – Some dimensions might be completely irrelevant
  – A “normal” distance function assigns equal weights (w_k = 1) to all attributes
• Idea:
  – Use a weight for each attribute to denote its importance
  – E.g. the Weighted Euclidean Distance:

    d(i, j) = √( Σ_{k=1}^{p} w_k · (x_{ik} − x_{jk})² )

  – The weights can be set by the user or determined automatically (e.g. ⇒ RELIEF algorithm by Kira and Rendell)
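A one-function sketch of the weighted distance (the weight vector is made up; weight 0 switches an attribute off entirely):

```python
import math

def weighted_euclidean(i, j, w):
    """Euclidean distance with one importance weight per attribute."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, i, j)))

# Attribute 2 is treated as irrelevant here (weight 0), so only attribute 1 counts:
print(weighted_euclidean((4, 2), (7, 4), w=(1.0, 0.0)))   # 3.0
```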


RELIEF Algorithm
Feature Weighting

Basic idea:
In a local environment around an example x, a good attribute p should
• allow x to be discriminated from all examples of different classes (the set of misses)
  – The probability that p has a different value for x and a miss m should be high
• have the same value for all examples of the same class as x (the set of hits)
  – The probability that p has a different value for x and a hit h should be low

⇒ Try to estimate: w_p = P(v_x ≠ v_m) − P(v_x ≠ v_h),
   where v_x is the value of attribute p in example x


Algorithm 2 RELIEF Algorithm


Require: training instances S, integer r
1: ∀ w_p ∈ w⃗ : w_p ← 0.0
2: for all i ∈ {1, …, r} do
3:   select a random example x ∈ S
4:   find the nearest neighbor of a different class (near miss m)
5:   find the nearest neighbor of the same class (near hit h)
6:   for all attributes p ∈ P do
       w_p ← w_p + (1/r) · (d_p(m, x) − d_p(h, x))
7: return the attribute weights w⃗


Instance Weighting

• Idea: assign a weight to each instance
• Instances with lower weights appear more distant
  – hence they have a low impact on classification
  – an instance weight w_x = 0 ignores instance x completely
  – Example (again with the Euclidean distance):

    d(i, j) = (1 / (w_i · w_j)) · √( Σ_{k=1}^{p} (x_{ik} − x_{jk})² )

  – where (w_x ≈ 1 ⇒ good predictor, w_x < 1 ⇒ poor predictor):

    w_x = Cnt(x has predicted the correct class) / Cnt(x has been used for prediction)
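A sketch of the instance-weighted distance (note that w = 0 would divide by zero; in the slides' reading such an instance is simply infinitely distant and hence ignored):

```python
import math

def instance_weighted_distance(xi, xj, wi, wj):
    """Euclidean distance inflated for instances with low weights."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    return d / (wi * wj)   # wi = wj = 1 leaves the distance unchanged

# An instance with a poor prediction record (weight 0.5) appears twice as far:
print(instance_weighted_distance((4, 2), (7, 4), wi=1.0, wj=0.5))   # ≈ 7.21
```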


How to choose k?

• Small k ⇒ overfitting! The prediction is prone to outliers.
• Large k ⇒ generalizes better, but may underfit.



How to choose k? (Ctd.)


Solutions

• It is recommended to use odd values for k (⇒ no ties between two classes!)
• Compute k depending on the size of the data set S:

  k = √(|S| / 2)   or   k = √|S|

• Use d-fold cross-validation (e.g. d = 4):

(Diagram: the data set split into d folds; each fold serves once as the validation set while the remaining folds are used for training.)

Algorithm 3 Find-k (using Cross-Validation)


Require: list of k-values K, training data S, number of folds d
1: F ← split S into d disjoint subsets
2: for all f ∈ F do
3:   select f as the validation set
4:   train the classifier on the remaining d − 1 subsets
5:   for all k ∈ K do
6:     classify each instance in the validation set
7:     record the prediction error
8: k_opt ← the k with the minimal error rate
9: train the model on the entire data set using k_opt
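A compact Python sketch of Algorithm 3; since KNN is a lazy learner, “training” here just means selecting the reference set (the toy data is made up):

```python
import math
from collections import Counter

def euclidean(i, j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(i, j)))

def knn_predict(query, train, k):
    nearest = sorted(train, key=lambda e: euclidean(e[0], query))[:k]
    return Counter(c for _, c in nearest).most_common(1)[0][0]

def find_k(k_values, S, d):
    """d-fold cross-validation over the candidate values of k."""
    folds = [S[f::d] for f in range(d)]              # d disjoint subsets
    errors = {k: 0 for k in k_values}
    for f, val in enumerate(folds):
        train = [e for g in range(d) if g != f for e in folds[g]]
        for k in k_values:
            errors[k] += sum(knn_predict(x, train, k) != c for x, c in val)
    return min(errors, key=errors.get)               # k with the minimal error

S = [((0.1 * i, 0.05 * (i % 7)), i % 2) for i in range(24)]   # made-up toy data
print(find_k([1, 3, 5, 7], S, d=4))
```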



How to choose k? (Ctd.)


Results of Cross-Validation

(Plot: validation error over k for k = 0 … 50; the error is minimal at k = 8 or 9.)


K-Nearest-Neighbor
Advantages and Disadvantages

• Advantages:
– Easy and comprehensible
– It is easy to add new training items (no retraining required)

• Disadvantages:
– Calculating distances to all points is costly
(⇒ Subsampling, KD-Trees)

– How to choose a proper value for k? (⇒ Cross-Validation)



Any Questions?


Thank you for your attention!

Contact:
Daniel Wehner, SAP SE
