ShortCourse QTT Lecture1
Unsupervised Learning

Lecture 1: Introduction and Evaluation

Tho Quan
qttho@[Link]
Agenda

• Introduction to supervised learning
• A case study with a simple classification approach
• Evaluating the classification
• Making it practical: an upgraded version
Supervised vs. Unsupervised Learning
Classification: Overview

• Classification: Training Data → Classification Model → Make Predictions on Unseen Data
• Example:
• Play golf
• Neural Network
Process 1: Model Construction

Classification algorithms are applied to the training data to build the classifier (model).

Training data:

NAME   RANK            YEARS   TENURED
Mike   Assistant Prof  3       no
Mary   Assistant Prof  7       yes
Bill   Professor       2       yes
Jim    Associate Prof  7       yes
Dave   Assistant Prof  6       no
Anne   Associate Prof  3       no

Learned model:
IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’
Process 2: Using the Model in Prediction

The classifier is applied to testing data and then to unseen data.

Testing data:

NAME     RANK            YEARS   TENURED
Tom      Assistant Prof  2       no
Merlisa  Associate Prof  7       no
George   Professor       5       yes
Joseph   Assistant Prof  7       yes

Unseen data: (Jeff, Professor, 4) → Tenured?
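A minimal sketch (not from the slides) of the learned rule above as code; the function name and the lower-casing of the rank are illustrative assumptions:

```python
# Illustrative sketch: the rule learned in Process 1, applied to new records.
def predict_tenured(rank: str, years: int) -> str:
    # IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

# Unseen record from the slide: (Jeff, Professor, 4)
print(predict_tenured("Professor", 4))  # -> 'yes'
```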
Classification Algorithms

• Classification Algorithms:
• Support Vector Machines
• Neural Network (multi-layer perceptron)
• Decision Tree
• K-Nearest Neighbor
• Naive Bayes Classifier…
K-Nearest Neighbor

• Classifying objects based on closest training examples in the feature space
• Approximated locally
• All computation is deferred until classification
• Classified by a majority vote of its neighbors
Data

• The training examples are vectors in a multidimensional feature space, each with a class label
• The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples
Algorithm

- k is a user-defined constant
- Choose the k nearest neighbors (NNs) of the query point
- The assigned label is the one most frequent among these k nearest training samples (a minimal sketch follows)
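A brute-force sketch of the algorithm, assuming Euclidean distance and NumPy arrays; the function name and toy data are illustrative:

```python
# Brute-force k-NN sketch (illustrative, not the course's reference code).
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    # Euclidean distance from the query to every stored training vector
    dists = np.linalg.norm(X_train - query, axis=1)
    # indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # majority vote among their labels
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = ["A", "A", "B", "B"]
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 'A'
```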
Similarity and Representation

• How do we define similarity?
• How do we represent the objects whose similarities we wish to measure?
Motivation for the VSM

• We return to the case study of document classification
• The VSM is an algebraic model for representing text documents as vectors of index terms
• A document is represented as a vector; each dimension corresponds to a separate term, and its value is a tf-idf weight
Document Collection

• A collection of n documents can be represented in the vector space model by a term-document matrix.
• An entry in the matrix corresponds to the “weight” of a term in the document; zero means the term has no significance in the document or it simply doesn’t exist in the document.

     T1    T2    ...   Tt
D1   w11   w21   ...   wt1
D2   w12   w22   ...   wt2
:    :     :           :
Dn   w1n   w2n   ...   wtn
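As an illustration (a toy sketch with made-up documents, not material from the slides), a small tf-idf weighted term-document matrix can be built directly:

```python
# Toy tf-idf term-document matrix (documents and terms are made up).
import math
from collections import Counter

docs = ["play golf today", "golf club rules", "neural network training"]
vocab = sorted({t for d in docs for t in d.split()})

# document frequency: number of documents containing each term
df = {t: sum(t in d.split() for d in docs) for t in vocab}
n = len(docs)

matrix = []
for d in docs:
    counts = Counter(d.split())
    # tf-idf weight: term frequency * log(n / document frequency)
    matrix.append([counts[t] * math.log(n / df[t]) for t in vocab])

for row in matrix:
    print([round(w, 2) for w in row])
```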
Measuring Distance

• Euclidean Distance
• Manhattan Distance
• Cosine Similarity
• Vector Length & Normalization
Euclidean Distance

Example in 2-D (figure): from u = (0, 0) to v = (2, 1), by Pythagoras a² + b² = c², so 2² + 1² = c² and c = √5.
Euclidean Distance

In p dimensions, the Euclidean distance between two points u and v is:

dist(u, v) = \sqrt{\sum_{i=1}^{p} (u_i - v_i)^2}
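A one-line NumPy version of this formula (a sketch; the function name is an assumption):

```python
import numpy as np

def euclidean(u, v):
    # square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((np.asarray(u) - np.asarray(v)) ** 2))

print(euclidean((0, 0), (2, 1)))  # -> 2.2360679... = sqrt(5)
```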
Manhattan Distance

• The Manhattan distance (a.k.a. city-block distance) is the number of units on a rectangular grid it takes to travel from point u to point v.

Example in 2-D (figure): from u = (0, 0) to v = (2, 1), 2 units horizontally + 1 unit vertically = 3 units.

D_M(u, v) = \sum_{i=1}^{p} |u_i - v_i|
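The corresponding NumPy sketch (the function name is an assumption):

```python
import numpy as np

def manhattan(u, v):
    # sum of absolute coordinate differences (city-block distance)
    return np.sum(np.abs(np.asarray(u) - np.asarray(v)))

print(manhattan((0, 0), (2, 1)))  # -> 3
```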
Cosine Similarity

• The traditional vector space model is based on a different notion of similarity: the cosine of the angle between two vectors.
• To start our consideration of cosine similarity, consider our document vectors not as points in term space, but as arrows that travel from the origin of the space to a particular address.
Calculating Cosine(x, y)

For two vectors x = (x_1, ..., x_p) and y = (y_1, ..., y_p) in p dimensions:

cos(x, y) = \frac{x \cdot y}{\sqrt{\sum_{i=1}^{p} x_i x_i} \, \sqrt{\sum_{i=1}^{p} y_i y_i}}
          = \frac{\sum_{i=1}^{p} x_i y_i}{\sqrt{\sum_{i=1}^{p} x_i^2} \, \sqrt{\sum_{i=1}^{p} y_i^2}}    (1)
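Equation (1) in NumPy (a sketch; the function name is an assumption):

```python
import numpy as np

def cosine_similarity(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # dot product divided by the product of the vector lengths, as in (1)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

print(cosine_similarity([1, 2, 0], [2, 4, 0]))  # -> 1.0 (parallel vectors)
```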
Vector Length and Normalization

Example in 2-D (figure): the vector v = (2, 1) has length c, where a² + b² = c², so 2² + 1² = c² and c = √5.
Vector Length and Normalization

• Vector length: \|v\| = \sqrt{\sum_{i=1}^{p} v_i^2} = \sqrt{v \cdot v}

• Normalization: u = \frac{v}{\|v\|}

Thus:

cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}
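A short sketch showing that once vectors are normalized to unit length, the cosine reduces to a plain dot product (toy vectors, illustrative only):

```python
import numpy as np

def normalize(v):
    # divide a vector by its length to obtain a unit vector
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

x, y = np.array([2.0, 1.0]), np.array([1.0, 3.0])
print(normalize(x) @ normalize(y))                       # cosine via unit vectors
print(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))   # same value
```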
Using the Cosine for IR

• From now on, unless otherwise specified, we can assume that all document vectors in our term-document matrix A have been normalized to unit length.
• Likewise, we always normalize our query to unit length.
• Given these assumptions, we have the classic vector space model for IR.
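Under those assumptions, ranking documents against a query is a single matrix-vector product; the sketch below uses a made-up 3-document, 4-term matrix purely for illustration:

```python
import numpy as np

# rows = documents, columns = terms (toy weights, illustrative only)
A = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0, 3.0]])
A = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit-length document rows

q = np.array([0.0, 1.0, 1.0, 0.0])
q = q / np.linalg.norm(q)                          # unit-length query

scores = A @ q                   # cosine similarity of the query to each document
ranking = np.argsort(-scores)    # best-matching documents first
print(scores, ranking)
```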
Accuracy Measures

• A natural criterion for judging the performance of a classifier is the probability of making a misclassification error.
• Misclassification: the observation belongs to one class, but the model classifies it as a member of a different class.
• A classifier that makes no errors would be perfect
• Do not expect to be able to construct such classifiers in the real world, due to:
• “noise”
• not having all the information needed to precisely classify cases.
Accuracy Measures

• To obtain an honest estimate of classification error, we use the classification matrix computed from the validation data.
• We first partition the data into training and validation sets by random selection of cases.
• We then construct a classifier using the training data,
• apply it to the validation data,
• and obtain predicted classifications for the observations in the validation set.
• We then summarize these classifications in a classification matrix.
Problem with accuracy

• If the training and test data are skewed towards one classification,
then the model will predict everything as being that class.
• In Titanic training data, 68% of people died. If one trained a model
to predict everybody died, it would be 68% accurate.
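A tiny sketch of that baseline effect (labels are made up so that 68% belong to one class):

```python
import numpy as np

# made-up labels: 68 of 100 passengers died (1), 32 survived (0)
y_true = np.array([1] * 68 + [0] * 32)

# a "classifier" that predicts the majority class for everyone
y_pred = np.ones_like(y_true)

print(np.mean(y_pred == y_true))  # -> 0.68, despite never predicting "survived"
```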
Confusion Matrix
Practical Metrics for Data Science Projects

• Case study: in the Flight Delay project with NTU, we applied the
following metrics
• Accuracy
• Precision
• Recall
• F-measure
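A sketch of these metrics computed from confusion-matrix counts (the TP/FP/FN/TN values are made up; the formulas are the standard ones):

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics derived from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# toy counts: 40 TP, 10 FP, 5 FN, 45 TN
print(classification_metrics(tp=40, fp=10, fn=5, tn=45))
```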
Practical k-NN

• We need a data structure to find near neighbors fast.
• Data structure:
• KD-tree
Example: 2-D Tree

2-d tree for these points:
(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
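A compact sketch of building such a tree by recursively splitting on alternating coordinates with a median split (the dictionary node layout is an illustrative assumption):

```python
def build_kdtree(points, depth=0):
    """Recursively build a 2-d tree: split on x at even depths, y at odd depths."""
    if not points:
        return None
    axis = depth % 2                           # alternate between x (0) and y (1)
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                     # median point becomes this node
    return {
        "point": points[mid],
        "left":  build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree["point"])  # -> (7, 2): the root under this median-split construction
```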
Tree Operation - Insert

Insert (9, 5):
• at the root (7, 2), compare x: 9 > 7, so go right to (9, 6)
• at (9, 6), compare y: 5 < 6, so go left to (8, 1)
• at (8, 1), compare x: 9 > 8, so go right
• that child is empty, so (9, 5) is inserted there
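The same walk in code (a sketch on top of the dictionary nodes from the build_kdtree example above):

```python
def kdtree_insert(node, point, depth=0):
    """Insert a point by walking down the tree, comparing one coordinate per level."""
    if node is None:
        return {"point": point, "left": None, "right": None}
    axis = depth % 2
    side = "left" if point[axis] < node["point"][axis] else "right"
    node[side] = kdtree_insert(node[side], point, depth + 1)
    return node

kdtree_insert(tree, (9, 5))  # follows (7,2) -> (9,6) -> (8,1) -> new right child
```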
Complexity

• Building a static k-d tree from n points takes O(n log n) time.
• Inserting a new point into a balanced k-d tree takes O(log n) time.
• Removing a point from a balanced k-d tree takes O(log n) time.
• Finding 1 nearest neighbour in a balanced k-d tree with randomly distributed points takes O(log n) time on average.
Search Nearest Neighbor
Search K-Nearest Neighbor

• Apply the nearest-neighbour search while maintaining the list of the k closest points found so far (see the sketch below).
• k-d trees are not suitable for efficiently finding the nearest neighbour in high-dimensional spaces; approximate nearest-neighbour methods should be used instead.
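A sketch of the single-nearest-neighbour search with backtracking, on the dictionary nodes from the build_kdtree example above (the k-nearest variant maintains a list of the k best points instead of a single best):

```python
import math

def dist(p, q):
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def kdtree_nearest(node, target, depth=0, best=None):
    """Nearest-neighbour search with backtracking on the dictionary nodes above."""
    if node is None:
        return best
    point = node["point"]
    if best is None or dist(point, target) < dist(best, target):
        best = point
    axis = depth % 2
    # descend first into the side of the splitting plane containing the target
    near, far = ("left", "right") if target[axis] < point[axis] else ("right", "left")
    best = kdtree_nearest(node[near], target, depth + 1, best)
    # only visit the far side if the splitting plane is closer than the best so far
    if abs(target[axis] - point[axis]) < dist(best, target):
        best = kdtree_nearest(node[far], target, depth + 1, best)
    return best

print(kdtree_nearest(tree, (9, 2)))  # -> (8, 1) for the example tree above
```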
