Data Mining Presentation
Mohammadi Zanjireh
◦ Slice
◦ Dice
Observations
1001 Ali 17.12
1002 Reza 13.89
1003 Maryam 16.02
1004 Hasan 15.45
Attribute types:
◦ Nominal: Subject, Occupation.
◦ Numeric
Interval_scaled.
Ratio_scaled.
Discrete vs. Continuous attributes.
Mean:
Trimmed Mean:
Median:
Mode:
Variance and Standard Deviation:
Quartile:
Q1, Q2=Median, Q3
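A minimal Python sketch (standard library only) of these measures of central tendency and dispersion; the data reuses the Grade column from the normalising example later in these slides, and the 10% trimming proportion is an illustrative assumption.

import statistics

grades = [30, 52, 84, 45, 25, 20, 91, 65, 42, 32]   # Grade values from the normalising example

mean = statistics.mean(grades)            # arithmetic mean
median = statistics.median(grades)        # middle value of the sorted data
mode = statistics.mode([1, 2, 2, 3, 4])   # most frequent value (shown on a small list with a repeat)
variance = statistics.variance(grades)    # sample variance (n - 1 in the denominator)
stdev = statistics.stdev(grades)          # sample standard deviation

def trimmed_mean(values, proportion=0.1):
    # Mean after dropping `proportion` of the values from each end of the sorted data.
    data = sorted(values)
    k = int(len(data) * proportion)
    return statistics.mean(data[k:len(data) - k] if k else data)

print(mean, median, mode, variance, stdev, trimmed_mean(grades))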
Example1:
1 2 3 4 5 6 7 8 9
Q2=5.0
Q1=2.5 Q3=7.5
Example2:
1 2 3 4 5 6 7 8 9 10
Q2=5.5
Q1=3.0 Q3=8.0
Example3:
1 2 3 4 5 6 7 8 9 10 11
Q2=6.0
Q1=3.0 Q3=9.0
Example4:
1 2 3 4 5 6 7 8 9 10 11 12
Q2=6.5
Q1=3.5 Q3=9.5
Inter Quartile Range (IQR):
IQR = Q3 - Q1
Example5:
0 2 5 8 8 8 9 9 10 10 10 11 12 12 12 14 15 20 25
Q2=10.0
Q1=8.0 Q3=12.0
IQR = Q3 - Q1 = 12.0 - 8.0 = 4.0
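A sketch of the quartile convention that reproduces the examples above (split the sorted data at the median, exclude the median itself when n is odd, and take the median of each half); this convention is an assumption read off the worked values, coded by hand rather than taken from a library.

from statistics import median

def quartiles(values):
    # Q1, Q2, Q3 using the median-of-halves rule used in the examples above.
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]       # values below the median position
    upper = data[-half:]      # values above the median position
    return median(lower), median(data), median(upper)

example5 = [0, 2, 5, 8, 8, 8, 9, 9, 10, 10, 10, 11, 12, 12, 12, 14, 15, 20, 25]
q1, q2, q3 = quartiles(example5)
print(q1, q2, q3, q3 - q1)    # Q1=8, Q2=10, Q3=12, IQR=4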
Student_1
Student_i
Student_n
Dissimilarity Matrix:
Similarity Matrix:
Sim(i,j)=1-d(i,j)
Proximity measures for Nominal attributes
Example:
Id Subject Birth_City Living_City Eye_Colour
1 Computer Teh Teh Black
2 Electricity Teh Kar Brown
3 Mechanic Qaz Qaz Brown
4 Computer Kar Qaz Green
Dissimilarity Matrix =
0.00
0.75  0.00
1.00  0.75  0.00
0.75  1.00  0.75  0.00
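A sketch reproducing the matrix above; for nominal attributes the dissimilarity is assumed to be d(i,j) = (p - m) / p, where p is the number of attributes and m the number of matching values (this rule is inferred from the values shown).

records = [
    ("Computer",    "Teh", "Teh", "Black"),
    ("Electricity", "Teh", "Kar", "Brown"),
    ("Mechanic",    "Qaz", "Qaz", "Brown"),
    ("Computer",    "Kar", "Qaz", "Green"),
]

def nominal_dissimilarity(a, b):
    # d(i,j) = (p - m) / p : fraction of attributes that do not match.
    matches = sum(x == y for x, y in zip(a, b))
    return (len(a) - matches) / len(a)

for i, r in enumerate(records):
    row = ["%.2f" % nominal_dissimilarity(r, records[j]) for j in range(i + 1)]
    print(row)    # lower triangle; matches the matrix above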
Proximity measures for Binary attributes
Symmetric Binary: d(i,j) = (r + s) / (q + r + s + t)
Asymmetric Binary: d(i,j) = (r + s) / (q + r + s)
Example: Gender is symmetric; the other attributes are asymmetric.
d(Ali, Maryam) = 2/4 = 0.50
d(Ali, Reza) = 2/4 = 0.50
d(Maryam, Reza) = 4/5 = 0.80
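A sketch of the contingency counts q, r, s, t and the two binary distances; the Ali/Reza/Maryam records themselves are not reproduced above, so the function is demonstrated on hypothetical 0/1 vectors.

def binary_dissimilarity(a, b, symmetric=True):
    # a, b: lists of 0/1 values for one object each, same attribute order.
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)   # both 1
    r = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)   # 1 in a, 0 in b
    s = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)   # 0 in a, 1 in b
    t = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)   # both 0
    if symmetric:
        return (r + s) / (q + r + s + t)   # symmetric binary distance
    return (r + s) / (q + r + s)           # asymmetric: 0/0 matches are ignored

# Hypothetical asymmetric binary attributes (1 = positive, 0 = negative):
ali, reza = [1, 0, 1, 0, 0], [1, 1, 0, 0, 0]
print(binary_dissimilarity(ali, reza, symmetric=False))   # (r + s) / (q + r + s) = 2/3 ≈ 0.67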
Euclidean Distance:
Manhattan Distance:
Supremum Distance:
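A sketch of the three distances named above for two numeric vectors; the sample points are hypothetical.

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))   # L2 norm

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))                # L1 norm

def supremum(a, b):
    return max(abs(x - y) for x, y in zip(a, b))                # L-infinity (Chebyshev)

p, q = (1.0, 2.0), (3.0, 5.0)    # hypothetical points
print(euclidean(p, q), manhattan(p, q), supremum(p, q))   # ≈ 3.61, 5.0, 3.0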
Proximity measures for Ordinal attributes
Small = 1, Medium = 2, Large = 3
Example:
Degree={Diploma, Undergraduate, Master, PhD}
Drink_size={Small, Medium, Large}
Id Degree Drink_size
1 Undergraduate Medium
2 PhD Small
3 Diploma Medium
4 Undergraduate Large
Example (ranks mapped onto [0, 1] with z = (rank - 1) / (M - 1)):
Degree={Diploma, Undergraduate, Master, PhD}
Drink_size={Small, Medium, Large}
Id Degree Drink_size
1 0.33 0.50
2 1.00 0.00
3 0.00 0.50
4 0.33 1.00
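A sketch of the rank mapping z = (rank - 1) / (M - 1) that reproduces the table above.

degree = ["Diploma", "Undergraduate", "Master", "PhD"]
drink_size = ["Small", "Medium", "Large"]

def ordinal_to_unit(value, ordered_levels):
    # Map an ordinal value onto [0, 1]: (rank - 1) / (M - 1).
    rank = ordered_levels.index(value) + 1
    return (rank - 1) / (len(ordered_levels) - 1)

rows = [("Undergraduate", "Medium"), ("PhD", "Small"),
        ("Diploma", "Medium"), ("Undergraduate", "Large")]
for d, s in rows:
    print(round(ordinal_to_unit(d, degree), 2), round(ordinal_to_unit(s, drink_size), 2))
# 0.33 0.5 / 1.0 0.0 / 0.0 0.5 / 0.33 1.0 — matches the table above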
Normalising:
Id Grade
1 30
2 52
3 84
4 45
5 25
6 20
7 91
8 65
9 42
10 32
Normalising:
Id Grade Grade-min
1 30 10
2 52 32
3 84 64
4 45 25
5 25 5
6 20 0
7 91 71
8 65 45
9 42 22
10 32 12
Normalising:
Id  Grade  Grade-min  (Grade-min)/(max-min)
1 30 10 0.14
2 52 32 0.45
3 84 64 0.90
4 45 25 0.35
5 25 5 0.07
6 20 0 0.00
7 91 71 1.00
8 65 45 0.63
9 42 22 0.31
10 32 12 0.17
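A sketch reproducing the last column above with min-max normalisation.

grades = [30, 52, 84, 45, 25, 20, 91, 65, 42, 32]

lo, hi = min(grades), max(grades)                      # 20 and 91
normalised = [(g - lo) / (hi - lo) for g in grades]    # (Grade - min) / (max - min)
print([round(v, 2) for v in normalised])
# [0.14, 0.45, 0.9, 0.35, 0.07, 0.0, 1.0, 0.63, 0.31, 0.17] — matches the last column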
Proximity measures for mixed types:
Examples:
Id Test_1(nominal) Test_2(ordinal) Test_3(numeric)
1 Code A Excellent 45
2 Code B Fair 22
3 Code C Good 64
4 Code A Excellent 28
Examples:
Id Test_1(nominal) Test_2(ordinal) Test_3(numeric)
1 Code A 1 0.55
2 Code B 0 0.00
3 Code C 0.5 1.00
4 Code A 1 0.14
Dissimilarity Matrix =
0.00
0.85  0.00
0.65  0.83  0.00
0.13  0.71  0.79  0.00
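The matrix above is consistent with averaging one dissimilarity per attribute (nominal: 0/1 mismatch, ordinal: absolute difference of the [0, 1] ranks, numeric: absolute difference of the min-max normalised values); a sketch under that assumption.

# Converted values from the table above: (Test_1 nominal, Test_2 ordinal in [0,1], Test_3 normalised)
objects = [("Code A", 1.0, 0.55), ("Code B", 0.0, 0.00),
           ("Code C", 0.5, 1.00), ("Code A", 1.0, 0.14)]

def mixed_dissimilarity(a, b):
    d_nominal = 0.0 if a[0] == b[0] else 1.0
    d_ordinal = abs(a[1] - b[1])
    d_numeric = abs(a[2] - b[2])
    return (d_nominal + d_ordinal + d_numeric) / 3     # unweighted average over the 3 attributes

for i, a in enumerate(objects):
    print(["%.2f" % mixed_dissimilarity(a, objects[j]) for j in range(i + 1)])
# 0.00 / 0.85 0.00 / 0.65 0.83 0.00 / 0.14 0.71 0.79 0.00
# ≈ the matrix above; the small differences come from rounding of the converted values.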
Exercise:
Id Test_1(nominal) Test_2(ordinal) Test_3(numeric)
Similarity Matrix =
1.00
0.94  1.00
0.17  0.12  1.00
0.07  0.17  0.23  1.00
3.1-Data Cleaning.
3.2-Data Integration.
3.3-Data Reduction.
3.4-Data Transform.
3.1- Data Cleaning
3.1.1-Missing values
o Ignore the tuple.
o Fill in the missing value manually.
o Use a global constant.
o Use the attribute mean or median.
o Use the class mean or median.
3.1- Data Cleaning
➢ 3.1.2-Noise
o Binning
o Smoothing by bin means.
o Smoothing by bin boundaries.
o Regression
o Clustering.
Example:
4, 8, 15, 21, 21, 24, 25, 28, 34
Partition into (equal-frequency) bins: bin1: 4, 8, 15. bin2: 21, 21, 24. bin3: 25, 28, 34
Smoothing by bin means: bin1: 9, 9, 9. bin2: 22, 22, 22. bin3: 29, 29, 29
Smoothing by bin boundaries: bin1: 4, 4, 15. bin2: 21, 21, 24. bin3: 25, 25, 34
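A sketch of equal-frequency binning with the two smoothing options, on the same data.

data = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
bins = [data[i:i + 3] for i in range(0, len(data), 3)]   # equal-frequency bins of size 3

# Smoothing by bin means: every value becomes its bin's mean.
by_means = [[round(sum(b) / len(b)) for _ in b] for b in bins]

# Smoothing by bin boundaries: every value moves to the closer of its bin's min/max.
by_boundaries = [[min(b) if v - min(b) <= max(b) - v else max(b) for v in b] for b in bins]

print(bins)            # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(by_means)        # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]
print(by_boundaries)   # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]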
➢ 3.3.2-Numerosity reduction
o Sampling.
o Clustering.
3.4- Data Transform
❑ Normalisation:
❑ Min_max normalisation:
❑ Z-Score normalisation:
Example:
3, 4, 4, 5, 9
x̄ = 5.00, σ = 2.34
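A sketch of z-score normalisation, (v - x̄) / σ, for the example above using the sample standard deviation.

import statistics

values = [3, 4, 4, 5, 9]
mean = statistics.mean(values)          # 5.0
sd = statistics.stdev(values)           # sample standard deviation, ≈ 2.34

z_scores = [round((v - mean) / sd, 2) for v in values]
print(z_scores)                         # [-0.85, -0.43, -0.43, 0.0, 1.71]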
• Support
Support(A=>B)=P(A∪B)
• Confidence
Confidence(A=>B) = P(B|A) = Support(A∪B) / Support(A) = Support_Count(A∪B) / Support_Count(A)
Mining Association Rules
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
• Example:
D = {T1, T2, T3, T4, T5, T6, T7, T8}
T1 = {tomato, cucumber, olive, onion, parsley}
T2 = {tomato, cucumber, parsley}
T3 = {tomato, cucumber, salt, bread, onion, parsley}
T4 = {tomato, cucumber, bread, onion}
T5 = {tomato, salt, onion}
T6 = {bread, cheese}
T7 = {tomato, onion, cucumber}
T8 = {bread, butter}
6. L3
Itemset        Sup_Count
{I1, I2, I3}   2
{I1, I2, I5}   2
7. C4 = L3 * L3, Prune
L4 = ∅.
End of Step 1.
Code of Apriori Algorithm:
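A minimal Python sketch of the Apriori idea (level-wise candidate generation, subset pruning, and support counting), run on the T100..T900 transactions used later in this example; a minimum support count of 2 is assumed, consistent with L3 above.

from itertools import combinations

# Transactions from the running example (T100..T900).
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUP_COUNT = 2   # assumed minimum support count

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

# L1: frequent 1-itemsets.
items = sorted({i for t in transactions for i in t})
level = [frozenset([i]) for i in items if support_count(frozenset([i])) >= MIN_SUP_COUNT]
frequent = {s: support_count(s) for s in level}

k = 2
while level:
    # Join step: build size-k candidates from frequent (k-1)-itemsets.
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    # Prune step: every (k-1)-subset of a candidate must itself be frequent.
    candidates = {c for c in candidates
                  if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    level = [c for c in candidates if support_count(c) >= MIN_SUP_COUNT]
    frequent.update({c: support_count(c) for c in level})
    k += 1

for s, cnt in sorted(frequent.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(s), cnt)   # e.g. ['I1', 'I2', 'I3'] 2 and ['I1', 'I2', 'I5'] 2 appear in L3 above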
Confidence: X = {I1, I2, I5} (× marks rules below the minimum confidence threshold)
{I1} => {I2, I5}: Confidence = 2/6 = 33% ×
{I2} => {I1, I5}: Confidence = 2/7 = 29% ×
{I5} => {I1, I2}: Confidence = 2/2 = 100%
{I1, I2} => {I5}: Confidence = 2/4 = 50% ×
{I1, I5} => {I2}: Confidence = 2/2 = 100%
{I2, I5} => {I1}: Confidence = 2/2 = 100%
2- T100 = {I1, I2, I5}
[FP-tree after inserting T100]
3- T200 = {I2, I4}
[FP-tree after inserting T200]
4- T300 = {I2, I3}
[FP-tree after inserting T300]
5- T400 = {I1, I2, I4}
[FP-tree after inserting T400]
6- T500 = {I1, I3}
[FP-tree after inserting T500]
7- T600 = {I2, I3}
[FP-tree after inserting T600]
8- T700 = {I1, I3}
[FP-tree after inserting T700]
9- T800 = {I1, I2, I3, I5}
[FP-tree after inserting T800]
10- T900 = {I1, I2, I3}
[FP-tree after inserting T900]
11- FP-tree is mined: We start from the last item of L.
Itemsets        List of TIDs
{I3, I4}        {}
{I3, I5}        {T800}
{I4, I5}        {}
{I1, I2, I3}    {T800, T900}
{I1, I2, I4}    {T400}
{I1, I2, I5}    {T100, T800}
• Classification
• Supervised learning vs. unsupervised learning.
Training data (Age, Salary, Class):
30   65   G
23   15   B
40   75   G
55   40   B
55  100   G
45   60   G
[Decision tree: the root tests Age ≤ 35; the left branch tests Salary ≤ 40, the right branch tests Salary ≤ 50; leaves: B, G, B, G]
Classification Rules:
Class B: {(Age<=35) and (Salary<=40)} or {(Age>35) and (Salary<=50)}
Class G: {(Age<=35) and (Salary>40)} or {(Age>35) and (Salary>50)}
Test data: (Age=25 and Salary=50) => Class G
• Generating Decision tree:
➢ ID3 (Iterative Dichotomiser 3) Algorithm:
• Invented by John Ross Quinlan, ID3 uses a top-down greedy approach to build a
decision tree. In simple words, the top-down approach means that we start
building the tree from the top and the greedy approach means that at each iteration
we select the best attribute at the present moment to create a node.
• ID3 uses Information Gain or just Gain to find the best attribute.
• The attribute with the highest Information Gain is selected as the best one.
➢ ID3 steps:
1. Calculate the Information Gain of each feature.
2. Make a decision tree node using the feature with the maximum Information Gain.
3. If all rows belong to the same class, make the current node a leaf node with that class as its label.
4. Repeat for the remaining features until we run out of features, or the decision tree has all leaf nodes.
• ID3 algorithm
[Figure: Information Gain — dataset D: 7,000 tuples, E = 0.82; one partition D2: 2,500 tuples, e2 = 0.40]
Info(D) = -Σ_{i=1..m} p_i * log2(p_i),  where p_i = |C_i,D| / |D|
Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) * Info(D_j)
RID age income student credit-rating Class: buy-computer
D: 14 tuples; buy-computer: yes = 9, no = 5
Info(D) = -Σ_{i=1..m} p_i * log2(p_i),  p_i = |C_i,D| / |D|
Info(D) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94
Partitioning D on age: youth: 5 (yes = 2, no = 3), middle: 4 (yes = 4, no = 0), senior: 5 (yes = 3, no = 2)
Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) * Info(D_j)
Info_age(D) = (5/14) * [-(2/5) log2(2/5) - (3/5) log2(3/5)] + (4/14) * [-(4/4) log2(4/4) - (0/4) log2(0/4)] + (5/14) * [-(3/5) log2(3/5) - (2/5) log2(2/5)] = 0.69
Gain(age) = Info(D) - Info_age(D) = 0.94 - 0.69 = 0.25
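A sketch that reproduces Info(D), Info_age(D), and Gain(age) from the class counts above.

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

# buy-computer class counts (yes, no) in each age partition
partitions = {"youth": (2, 3), "middle": (4, 0), "senior": (3, 2)}
n = sum(sum(p) for p in partitions.values())        # 14 tuples in D

info_D = entropy((9, 5))                                               # Info(D) = 0.94
info_age = sum(sum(p) / n * entropy(p) for p in partitions.values())   # Info_age(D) = 0.69
print(round(info_D, 2), round(info_age, 2), round(info_D - info_age, 2))   # Gain(age) ≈ 0.25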
For a continuous attribute A, the candidate split points are the midpoints of adjacent sorted values, (a_i + a_{i+1}) / 2, and each split point gives:
D1: A ≤ split_point
D2: A > split_point
• Gain Ratio is the normalised Information Gain and reduces the bias toward multi-valued attributes.
• The attribute with the highest Gain Ratio is selected as the best one.
• Gain Ratio:
GainRatio(A) = Gain(A) / SplitInfo_A(D)
SplitInfo_A(D) = -Σ_{j=1..v} (|D_j| / |D|) * log2(|D_j| / |D|)
• Example:
SplitInfo_age(D) = -(5/14) log2(5/14) - (4/14) log2(4/14) - (5/14) log2(5/14) = 1.57
GainRatio(age) = 0.25 / 1.57 = 0.16
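A sketch reproducing SplitInfo_age(D) and GainRatio(age).

from math import log2

sizes = [5, 4, 5]                       # |D_j| for age = youth, middle, senior
n = sum(sizes)
split_info_age = -sum(s / n * log2(s / n) for s in sizes)
print(round(split_info_age, 3))         # ≈ 1.577
print(round(0.25 / split_info_age, 2))  # GainRatio(age) ≈ 0.16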
• Exercise:
GainRatio(income) = ?
GainRatio(credit_rating) = ?
GainRatio(student) = ?
• Developed by Breiman.
• The attribute with the lowest Gini Index is selected as the best one.
• Gini Index:
Gini(D) = 1 - Σ_{i=1..m} p_i^2,  where p_i = |C_i,D| / |D|
Gini_A(D) = (|D1| / |D|) * Gini(D1) + (|D2| / |D|) * Gini(D2)
[Figure: binary split on age into D1 (subset1) and D2 (subset2)]
• Example:
Gini_age∈{youth, senior}(D) = Gini_age∈{middle}(D) = 0.36
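A sketch of the Gini Index for the binary split of age into {youth, senior} vs {middle}, reproducing the 0.36 above.

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# class counts (yes, no): youth (2, 3), middle (4, 0), senior (3, 2)
d1 = (2 + 3, 3 + 2)      # age in {youth, senior}: 5 yes, 5 no
d2 = (4, 0)              # age in {middle}:        4 yes, 0 no
n = sum(d1) + sum(d2)    # 14 tuples

gini_split = sum(d1) / n * gini(d1) + sum(d2) / n * gini(d2)
print(round(gini_split, 2))   # 0.36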
• Exercise: Calculate the Gini Index for all attributes and all combinations.
• Tree pruning
• Pre-pruning.
• Post-pruning.
• CHAID.
• MARS.
• Rainforest.
• Boat.
• Bayesian Classification
• Bayes' Theorem:
P(A|B) = P(B|A) * P(A) / P(B)
• Naïve Bayesian:
A, B independent of each other => P(AB) = P(A) * P(B)
X = (x1, x2, ..., xn) =>
P(X|Ci) = P(x1|Ci) * P(x2|Ci) * ... * P(xn|Ci) = Π_{k=1..n} P(xk|Ci)
• Bayesian Classification
For a continuous attribute, P(xk|Ci) is estimated with a Gaussian:
P(xk|Ci) = (1 / (√(2π) * σ_Ci)) * exp(-(xk - μ_Ci)^2 / (2 * σ_Ci^2))
RID age income student credit-rating Class: buy-computer
X ∈ C1 or C2
P(C1) = P(buy-computer = yes) = 9/14 = 0.643
P(C2) = P(buy-computer = no) = 5/14 = 0.357
• Example:
X=(age=youth, income=medium, student=yes, credit-rating=fair)
P(X|C1) = P(age=youth | buy-computer=yes) *
P(income=medium | buy-computer=yes) *
P(student=yes | buy-computer=yes) *
P(credit-rating=fair | buy-computer=yes) = (2/9) * (4/9) * (6/9) * (6/9) = 0.044
P(X|C2) = P(age=youth | buy-computer=no) *
P(income=medium | buy-computer=no) *
P(student=yes | buy-computer=no) *
P(credit-rating=fair | buy-computer=no) = (3/5) * (2/5) * (1/5) * (2/5) = 0.019
P(X|C1) * P(C1) = 0.044 * 0.643 = 0.028
P(X|C2) * P(C2) = 0.019 * 0.357 = 0.007
P(X ∈ C1) = 0.028 / (0.028 + 0.007) = 0.80
P(X ∈ C2) = 0.007 / (0.028 + 0.007) = 0.20
X ∈ C1
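A sketch that reproduces the naïve Bayesian numbers above from the conditional probabilities and the class priors.

# P(x_k | C_i) values used above, and the class priors.
likelihoods = {
    "yes": {"age=youth": 2/9, "income=medium": 4/9, "student=yes": 6/9, "credit=fair": 6/9},
    "no":  {"age=youth": 3/5, "income=medium": 2/5, "student=yes": 1/5, "credit=fair": 2/5},
}
priors = {"yes": 9/14, "no": 5/14}

scores = {}
for c in priors:
    p = priors[c]
    for v in likelihoods[c].values():
        p *= v                       # naïve independence assumption
    scores[c] = p                    # P(X | C_i) * P(C_i)

total = sum(scores.values())
for c, s in scores.items():
    print(c, round(s, 3), round(s / total, 2))   # yes 0.028 0.8 / no 0.007 0.2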
• Laplacian correction:
C1: buy_computer = yes: 1000 records
income = low: 0 records => P(income=low | C1) = 0/1000 = 0.000
income = medium: 990 records => P(income=medium | C1) = 990/1000 = 0.990
income = high: 10 records => P(income=high | C1) = 10/1000 = 0.010
• Laplacian correction (add 1 to each count):
C1: buy_computer = yes: 1000 records
income = low: 0 records => P(income=low | C1) = 1/1003 = 0.001
income = medium: 990 records => P(income=medium | C1) = 991/1003 = 0.988
income = high: 10 records => P(income=high | C1) = 11/1003 = 0.011
• Using IF-THEN Rules for Classification:
antecedent consequent
Example:
R1: IF age = youth AND student = yes THEN buy_computer = yes
• Accuracy(R) = n_correct / n_covers
• Coverage(R) = n_covers / |D|
• Extracting classification rules from a decision tree:
• Confusion Matrix:
(P = TP + FN, N = FP + TN, P′ = TP + FP, N′ = FN + TN, total = P + N = P′ + N′)
• Accuracy = (TP + TN) / (P + N)
• precision = TP / (TP + FP) = TP / P′
• recall = TP / (TP + FN) = TP / P
• Example:
                Predicted
                yes     no      Total
Actual   yes     90    210       300
         no     140   9560      9700
Total           230   9770     10000
• precision = 90/230 = 39.13%
• recall = 90/300 = 30.00%
• F = (2 * precision * recall) / (precision + recall)
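A sketch computing the measures above from the confusion matrix.

TP, FN = 90, 210        # actual yes: predicted yes / predicted no
FP, TN = 140, 9560      # actual no:  predicted yes / predicted no

accuracy = (TP + TN) / (TP + TN + FP + FN)                   # 0.965
precision = TP / (TP + FP)                                   # 90 / 230 ≈ 0.3913
recall = TP / (TP + FN)                                      # 90 / 300 = 0.30
f_measure = 2 * precision * recall / (precision + recall)    # ≈ 0.34

print(round(accuracy, 3), round(precision, 4), round(recall, 2), round(f_measure, 2))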
➢ Estimating Accuracy Methods:
1. Holdout Method: The given data are randomly partitioned into two
independent sets, a training set and a test set. Typically, two-thirds of the data
are allocated to the training set, and the remaining one-third is allocated to
the test set.
➢ Estimating Accuracy Methods:
.632 bootstrap: On average, 63.2% of the original data tuples will end up in the
bootstrap sample, and the remaining 36.8% will form the test set.
• In addition to accuracy-based measures, classifiers can also be compared with
respect to the following additional aspects:
Speed
Robustness
Scalability
Interpretability
• Introduction: Cohesion and Separation.
• Example 6-1:
Step_1: c1 = 3, c2 = 6
C1 = {X3, X5, X6, X9}, C2 = {X1, X2, X4, X7, X8, X10}
Step_2: c1 = (3 + 4 + 2 + 3) / 4 = 3.0, c2 = (5 + 9 + 6 + 8 + 5 + 7) / 6 = 6.67
C1 = {X3, X5, X6, X9}, C2 = {X1, X2, X4, X7, X8, X10}
• Exercise 1: Solve the previous example with 𝒄𝟏 = 2.0 and 𝒄𝟐 = 7.0
• Exercise 2: Solve the previous example with 𝒄𝟏 = 2.5, 𝒄𝟐 = 5.0 , and 𝒄𝟑 = 7.5
• Example 6-2: c1 = (1.0, 1.1), c2 = (5.0, 7.0)
i    X1^(i)    X2^(i)
1 1.0 1.0
2 1.5 2.0
3 3.0 4.0
4 5.0 7.0
5 3.5 5.0
6 4.5 5.0
7 3.5 4.5
Step_1: c1 = (1.0, 1.1), c2 = (5.0, 7.0)
C1 = {X^(1), X^(2), X^(3)}, C2 = {X^(4), X^(5), X^(6), X^(7)}
Step_2: c1 = ((1 + 1.5 + 3)/3, (1 + 2 + 4)/3) = (1.83, 2.33)
c2 = ((5 + 3.5 + 4.5 + 3.5)/4, (7 + 5 + 5 + 4.5)/4) = (4.125, 5.375)
C1 = {X^(1), X^(2)}, C2 = {X^(3), X^(4), X^(5), X^(6), X^(7)}
Step_3: c1 = ((1 + 1.5)/2, (1 + 2)/2) = (1.25, 1.5)
c2 = ((3 + 5 + 3.5 + 4.5 + 3.5)/5, (4 + 7 + 5 + 5 + 4.5)/5) = (3.9, 5.1)
C1 = {X^(1), X^(2)}, C2 = {X^(3), X^(4), X^(5), X^(6), X^(7)}
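A minimal k-means sketch (standard library only) on the Example 6-2 points, starting from the same initial centres; it converges to the Step_3 centres and clusters.

import math

points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
centres = [(1.0, 1.1), (5.0, 7.0)]          # initial centres from the example

for step in range(10):                      # k-means usually converges in a few iterations
    # Assignment step: each point goes to its nearest centre (Euclidean distance).
    clusters = [[] for _ in centres]
    for p in points:
        nearest = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
        clusters[nearest].append(p)
    # Update step: each centre moves to the mean of its cluster (assumes no cluster becomes empty).
    new_centres = [tuple(sum(c) / len(pts) for c in zip(*pts)) for pts in clusters]
    if new_centres == centres:
        break
    centres = new_centres

print(centres)     # ≈ [(1.25, 1.5), (3.9, 5.1)] — the Step_3 centres
print(clusters)    # {X(1), X(2)} and {X(3)..X(7)}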
• Exercise 3: Solve the previous example with 𝒄𝟏 = (3.0, 3.0) and 𝒄𝟐 = (4.0, 4.0)
• k-means is sensitive to outliers:
• Complexity of k-means: O(nkt)
• n: No of Subjects.
• k: No of Clusters.
• t: No of iterations.
• k << n
• t << n
• Evaluations of Clusters:
• Single Link
• Complete Link
• Average Link
• Centroid
• Ward's method
• Single Linkage - Nearest Neighbour
[Figure: two clusters r and s containing points a–g]
dist(r,s) = dist(b,e)
• Farthest Neighbour - Complete Linkage
[Figure: two clusters r and s containing points a–g]
dist(r,s) = dist(a,g)
• Average Linkage
[Figure: two clusters r and s containing points a–g]
• Centroid
[Figure: clusters r and s; m and n mark their centroids]
dist(r,s) = dist(m,n)
• Ward’s method: Minimizes the total within-cluster variance.
• Example 6-3: Bottom-Up approach, Single-Linkage, cut at distance 1.8
i 𝐗𝟏 𝐗𝟐
A 1.0 1.0
B 1.0 3.5
C 3.0 1.0
D 2.0 4.0
E 1.0 3.0
F 3.0 2.0
• Example 6-3: distance matrix
        A      B      C      D      E      F
A      0      2.5    2.0    3.16   2.0    2.24
B      -      0      3.24   1.12   0.5    2.5
C      -      -      0      3.16   2.83   1.0
D      -      -      -      0      1.41   2.24
E      -      -      -      -      0      1.41
F      -      -      -      -      -      0
• Example 6-3: merge B and E (the smallest distance, 0.5)
• Example 6-3: points A(1, 1), BE(1, 3.25), C(3, 1), D(2, 4), F(3, 2)
        A      BE     C      D      F
A      0      2.0    2.0    3.16   2.24
BE     -      0      2.83   1.12   1.41
C      -      -      0      3.16   1.0
D      -      -      -      0      2.24
F      -      -      -      -      0
• Example 6-3: merge C and F (distance 1.0)
• Example 6-3: points A(1, 1), BE(1, 3.25), CF(3, 1.5), D(2, 4)
        A      BE     CF     D
A      0      2.0    2.0    3.16
BE     -      0      1.41   1.12
CF     -      -      0      2.24
D      -      -      -      0
merge BE and D (distance 1.12)
• Example 6-3: remaining clusters A, BDE, CF
1.8 ≥ …
• Example 6-3: Dendrogram - AGglomerative NESting (AGNES)
[Dendrogram: distance axis 0.5, 1.0, 1.5, 2.0; leaf order A, B, E, D, C, F]
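A sketch of single-linkage agglomerative clustering on the Example 6-3 points using SciPy (an assumed dependency); distances are computed exactly from the coordinates, so a few values differ slightly from the rounded entries in the matrices above.

from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

points = [(1.0, 1.0), (1.0, 3.5), (3.0, 1.0), (2.0, 4.0), (1.0, 3.0), (3.0, 2.0)]
labels = ["A", "B", "C", "D", "E", "F"]

distances = pdist(points)                      # condensed Euclidean distance matrix
merges = linkage(distances, method="single")   # agglomerative, single (nearest-neighbour) linkage

clusters = fcluster(merges, t=1.8, criterion="distance")   # cut the dendrogram at distance 1.8
print(dict(zip(labels, clusters)))             # cluster id for each point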
• Exercise: Solve the previous example with Complete-Link, Average-Link, Centroid, and Ward's method.