Data Mining Assignment
Association Analysis (Solution)
University of Transportation Ho Chi Minh City
August 21, 2025
Instructions
Please answer the following questions clearly and show all your work. For questions requiring
calculations, define the formulas you are using before applying them.
Question 1
Consider the data set shown in Table 1 below.
Table 1: Example of market basket transactions.
Customer ID Transaction ID Items Bought
1 0001 {a, d, e}
1 0024 {a, b, c, e}
2 0012 {a, b, d, e}
2 0031 {a, c, d, e}
3 0015 {b, c, e}
3 0022 {b, d, e}
4 0029 {c, d}
4 0040 {a, b, c}
5 0033 {a, d, e}
5 0038 {a, b, e}
a) Compute the support for itemsets {e}, {b, d}, and {b, d, e} by treating each
transaction ID as a market basket.
Formula: The support of an itemset X is the fraction of transactions that contain X:
s(X) = (number of transactions containing X) / (total number of transactions)
There are a total of 10 transactions.
• The itemset {e} appears in 8 transactions (0001, 0024, 0012, 0031, 0015, 0022, 0033, 0038).
s({e}) = 8/10 = 0.8
• The itemset {b, d} appears in 2 transactions (0012, 0022).
s({b, d}) = 2/10 = 0.2
• The itemset {b, d, e} appears in 2 transactions (0012, 0022).
s({b, d, e}) = 2/10 = 0.2
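To make the counting concrete, here is a minimal Python sketch (the variable and function names are illustrative, not part of the assignment) that reproduces the three transaction-level supports from Table 1:

transactions = {
    "0001": {"a", "d", "e"},      "0024": {"a", "b", "c", "e"},
    "0012": {"a", "b", "d", "e"}, "0031": {"a", "c", "d", "e"},
    "0015": {"b", "c", "e"},      "0022": {"b", "d", "e"},
    "0029": {"c", "d"},           "0040": {"a", "b", "c"},
    "0033": {"a", "d", "e"},      "0038": {"a", "b", "e"},
}

def support(itemset, baskets):
    # Fraction of baskets that contain every item of the itemset.
    return sum(itemset <= basket for basket in baskets.values()) / len(baskets)

print(support({"e"}, transactions))            # 0.8
print(support({"b", "d"}, transactions))       # 0.2
print(support({"b", "d", "e"}, transactions))  # 0.2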
b) Use the results in part (a) to compute the confidence for the association rules
{b, d} → {e} and {e} → {b, d}. Is confidence a symmetric measure?
Formula: The confidence of a rule X → Y is the conditional probability of seeing Y
given that we have seen X.
c(X → Y) = s(X ∪ Y) / s(X)
• For the rule {b, d} → {e}:
c({b, d} → {e}) = s({b, d, e}) / s({b, d}) = 0.2 / 0.2 = 1.0 (100%)
• For the rule {e} → {b, d}:
c({e} → {b, d}) = s({b, d, e}) / s({e}) = 0.2 / 0.8 = 0.25 (25%)
No, confidence is not a symmetric measure: as the calculations show, c({b, d} → {e}) ≠ c({e} → {b, d}).
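Continuing the sketch from part (a), and reusing its transactions dictionary and support() helper, confidence follows directly from the formula above:

def confidence(lhs, rhs, baskets):
    # c(lhs -> rhs) = s(lhs ∪ rhs) / s(lhs)
    return support(lhs | rhs, baskets) / support(lhs, baskets)

print(confidence({"b", "d"}, {"e"}, transactions))  # 1.0
print(confidence({"e"}, {"b", "d"}, transactions))  # 0.25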
c) Repeat part (a) by treating each customer ID as a market basket.
First, we create a market basket for each customer, containing the union of all items they
bought. There are 5 customers.
• Customer 1: {a, b, c, d, e}
• Customer 2: {a, b, c, d, e}
• Customer 3: {b, c, d, e}
• Customer 4: {a, b, c, d}
• Customer 5: {a, b, d, e}
• The itemset {e} appears in 4 customer baskets (1, 2, 3, 5).
s({e}) = 4/5 = 0.8
• The itemset {b, d} appears in all 5 customer baskets.
s({b, d}) = 5/5 = 1.0
• The itemset {b, d, e} appears in 4 customer baskets (1, 2, 3, 5).
s({b, d, e}) = 4/5 = 0.8
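The same sketch extends to customer-level baskets; the mapping below is read off Table 1, and each basket is the union of a customer's transactions:

customer_of = {  # transaction ID -> customer ID, from Table 1
    "0001": 1, "0024": 1, "0012": 2, "0031": 2, "0015": 3,
    "0022": 3, "0029": 4, "0040": 4, "0033": 5, "0038": 5,
}
customer_baskets = {}
for tid, items in transactions.items():
    # Each customer's basket is the union of all items in their transactions.
    customer_baskets.setdefault(customer_of[tid], set()).update(items)

print(support({"e"}, customer_baskets))            # 0.8
print(support({"b", "d"}, customer_baskets))       # 1.0
print(support({"b", "d", "e"}, customer_baskets))  # 0.8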
d) Use the results in part (c) to compute the confidence for the association rules
{b, d} → {e} and {e} → {b, d}.
• For the rule {b, d} → {e}:
c({b, d} → {e}) = s({b, d, e}) / s({b, d}) = 0.8 / 1.0 = 0.8 (80%)
• For the rule {e} → {b, d}:
c({e} → {b, d}) = s({b, d, e}) / s({e}) = 0.8 / 0.8 = 1.0 (100%)
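Applying the confidence() helper from part (b) to the customer-level baskets reproduces both values:

print(confidence({"b", "d"}, {"e"}, customer_baskets))  # 0.8
print(confidence({"e"}, {"b", "d"}, customer_baskets))  # 1.0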
e) Letting (s1, c1) be the transaction-based support and confidence from parts (a)-(b),
and (s2, c2) the customer-based values from parts (c)-(d), discuss whether there are
any relationships between them.
There is no guaranteed relationship between (s1, c1) and (s2, c2). Aggregating transactions
by customer fundamentally changes the data set: it reduces the number of baskets (from 10
to 5) and enlarges each basket to the union of a customer's purchases. As parts (a)-(d)
show, this can make support and confidence increase (s({b, d}) rose from 0.2 to 1.0),
decrease (c({b, d} → {e}) fell from 1.0 to 0.8), or stay the same (s({e}) remained 0.8),
depending on the purchasing patterns of the customers.
Question 2
Suppose the Apriori algorithm is applied to the data set shown below with minsup = 30% (3
transactions).
Table 2: Example of market basket transactions.
Transaction ID Items Bought
1 {a, b, d}
2 {b, c, d}
3 {a, b, d, e}
4 {a, c, d, e}
5 {b, d, e}
6 {c, d}
7 {a, b, c}
8 {a, d, e}
9 {a, c, d}
10 {b, d}
a) Draw an itemset lattice representing the data set. Label each node with F
(Frequent), I (Infrequent after counting), or N (Not a candidate/Pruned).
Step 1: Find Frequent 1-Itemsets (L1 )
• {a}: 6, {b}: 6, {c}: 5, {d}: 9, {e}: 4. All are frequent.
• L1 = {{a}, {b}, {c}, {d}, {e}}
Step 2: Generate and Prune 2-Itemsets (C2 → L2 )
• Candidates C2 : {ab}, {ac}, {ad}, {ae}, {bc}, {bd}, {be}, {cd}, {ce}, {de}
• Count supports: {ab}:3, {ac}:3, {ad}:5, {ae}:3, {bc}:3, {bd}:6, {be}:3, {cd}:5,
{ce}:2, {de}:4
• L2 = {{a, b}, {a, c}, {a, d}, {a, e}, {b, c}, {b, d}, {b, e}, {c, d}, {d, e}}. Itemset {c,e} is
infrequent.
Step 3: Generate and Prune 3-Itemsets (C3 → L3 )
• Candidates generated from L2 : {abc}, {abd}, {abe}, {acd}, {ace}, {ade}, {bcd},
{bce}, {bde}, {cde}
• Pruning step: {ace}, {bce}, and {cde} are pruned because their subset {ce} is not in L2.
• Remaining candidates to count: {abc}, {abd}, {abe}, {acd}, {ade}, {bcd}, {bde}
• Count supports: {abc}:1, {abd}:3, {abe}:1, {acd}:3, {ade}:3, {bcd}:2, {bde}:3
• L3 = {{a, b, d}, {a, c, d}, {a, d, e}, {b, d, e}}.
Step 4: Generate and Prune 4-Itemsets (C4 → L4)
• Joining frequent 3-itemsets that share two common items yields three candidates:
{a, b, c, d} (from {a, b, d} and {a, c, d}), {a, b, d, e} (from {a, b, d} and {a, d, e}),
and {a, c, d, e} (from {a, c, d} and {a, d, e}).
• {a, b, c, d} is pruned because its subsets {a, b, c} and {b, c, d} are not in L3.
• {a, b, d, e} is pruned because its subset {a, b, e} is not in L3.
• {a, c, d, e} is pruned because its subsets {a, c, e} and {c, d, e} are not in L3.
• Every candidate is pruned before counting, so no frequent 4-itemsets are found and
the algorithm terminates.
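The level-wise procedure can also be expressed as a short Python sketch (helper names are illustrative, not part of the assignment); it generates candidates by joining frequent k-itemsets that share a (k-1)-prefix, prunes candidates that have an infrequent subset, and prints the frequent itemsets found at each level:

from itertools import combinations

transactions = [  # Table 2; minsup = 3 out of 10 baskets
    {"a", "b", "d"}, {"b", "c", "d"}, {"a", "b", "d", "e"}, {"a", "c", "d", "e"},
    {"b", "d", "e"}, {"c", "d"}, {"a", "b", "c"}, {"a", "d", "e"},
    {"a", "c", "d"}, {"b", "d"},
]
minsup = 3

def support_count(candidates):
    # Number of transactions containing each candidate itemset.
    return {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}

items = sorted({item for t in transactions for item in t})
counts = support_count([(item,) for item in items])
level = sorted(c for c, n in counts.items() if n >= minsup)  # frequent 1-itemsets
k = 1
while level:
    print(f"L{k}:", level)
    frequent = set(level)
    # Join step: merge two frequent k-itemsets that share their first k-1 items.
    candidates = [tuple(sorted(set(a) | set(b)))
                  for a, b in combinations(level, 2) if a[:-1] == b[:-1]]
    # Prune step (Apriori principle): drop candidates with an infrequent k-subset.
    candidates = [c for c in candidates
                  if all(s in frequent for s in combinations(c, k))]
    counts = support_count(candidates)
    level = sorted(c for c, n in counts.items() if n >= minsup)
    k += 1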
Lattice node labels, level by level (F = frequent, I = infrequent after support counting, N = pruned or never generated as a candidate):
• null: F
• 1-itemsets: {a} F, {b} F, {c} F, {d} F, {e} F
• 2-itemsets: {a,b} F, {a,c} F, {a,d} F, {a,e} F, {b,c} F, {b,d} F, {b,e} F, {c,d} F, {c,e} I, {d,e} F
• 3-itemsets: {a,b,c} I, {a,b,d} F, {a,b,e} I, {a,c,d} F, {a,c,e} N, {a,d,e} F, {b,c,d} I, {b,c,e} N, {b,d,e} F, {c,d,e} N
• 4-itemsets: {a,b,c,d} N, {a,b,c,e} N, {a,b,d,e} N, {a,c,d,e} N, {b,c,d,e} N
• 5-itemset: {a,b,c,d,e} N
b) What is the percentage of frequent itemsets (with respect to all itemsets in
the lattice)?
There are 2^5 = 32 itemsets in the lattice. Counting the nodes labeled 'F' (including the null set), there are 19 frequent itemsets.
Percentage of frequent itemsets = 19/32 = 59.375%
c) What is the pruning ratio of the Apriori algorithm on this data set?
The pruning ratio is the percentage of itemsets in the lattice that were never counted because they were pruned (or never generated as candidates). These are the nodes labeled 'N'; there are 9 such nodes.
Pruning ratio = 9/32 = 28.125%
d) What is the false alarm rate?
The false alarm rate is the percentage of candidate itemsets that are found to be infrequent
after support counting. These are the nodes labeled ’I’. There are 4 such nodes.
False alarm rate = 4/32 = 12.5%
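As a quick arithmetic check of parts (b) through (d), taking the node counts above (19 F, 4 I, 9 N) as given:

total_itemsets = 2 ** 5             # all subsets of {a, b, c, d, e}, null set included
n_frequent, n_infrequent, n_pruned = 19, 4, 9
print(n_frequent / total_itemsets)    # 0.59375
print(n_pruned / total_itemsets)      # 0.28125
print(n_infrequent / total_itemsets)  # 0.125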
Question 3
The Apriori algorithm uses a hash tree data structure to efficiently count the support of candi-
date itemsets. Consider the hash tree for candidate 3-itemsets shown in Figure 6.2.
a) Given a transaction that contains items {1, 3, 4, 5, 8}, which of the hash tree
leaf nodes will be visited when finding the candidates of the transaction?
To find the candidate 3-itemsets contained within the transaction {1, 3, 4, 5, 8}, we must
generate all possible 3-item subsets from the transaction and traverse the hash tree for
each one. The subsets are: {1,3,4}, {1,3,5}, {1,3,8}, {1,4,5}, {1,4,8}, {1,5,8}, {3,4,5},
{3,4,8}, {3,5,8}, {4,5,8}.
By tracing each of these subsets through the hash tree structure provided in the problem,
we find that the leaf nodes visited are L1, L3, L5, L9, and L11.
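The ten 3-item subsets can be enumerated mechanically; the small Python sketch below only lists them, since mapping each subset to a leaf depends on the hash functions of Figure 6.2:

from itertools import combinations

transaction = [1, 3, 4, 5, 8]
for subset in combinations(transaction, 3):
    print(subset)  # the ten 3-item subsets listed above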
b) Use the visited leaf nodes in part (a) to determine the candidate itemsets that
are contained in the transaction {1, 3, 4, 5, 8}.
Of the candidate itemsets stored in these visited leaf nodes, the following are also subsets of the transaction:
• L1 contains {1,4,5}
• L5 contains {1,5,8}
• L9 contains {4,5,8}
Therefore, the candidates contained in the transaction are {1, 4, 5}, {1, 5, 8}, and {4,
5, 8}.
Question 4
Answer the following questions using the data sets shown in Figure 6.6. We will apply the Apriori
algorithm to extract frequent itemsets with minsup = 10% (i.e., itemsets must be contained in
at least 1000 transactions).
a) Which data set(s) will produce the most number of frequent itemsets?
Answer: Data set (e). This dataset has dense vertical patterns, indicating that many
items are frequently purchased together across many transactions. This high correlation
will lead to the generation of a very long frequent itemset along with all of its subsets,
resulting in the largest total number of frequent itemsets.
b) Which data set(s) will produce the fewest number of frequent itemsets?
Answer: Data set (d). This dataset appears to have very sparse and random-looking
patterns. It is unlikely that any combination of items will meet the 10% support threshold,
meaning it will likely produce no frequent itemsets beyond the 1-itemsets (if any).
c) Which data set(s) will produce the longest frequent itemset?
Answer: Data set (e). For the same reason as in part (a), the strong vertical correla-
tions in this dataset suggest that a large number of items are consistently bought together,
which will form a single, very long frequent itemset.
d) Which data set(s) will produce frequent itemsets with highest maximum sup-
port?
Answer: Data set (b). This dataset shows some items that appear in very long hor-
izontal blocks, meaning these specific items are present in a large, contiguous block of
transactions. This structure will lead to certain items having a very high support count,
likely the highest maximum support among all datasets.
e) Which data set(s) will produce frequent itemsets containing items with wide-
varying support levels?
Answer: Data set (e). This dataset shows some items appearing very frequently
(dense vertical bars) while others appear much less frequently (sparse dots). This mix
of common and rare items will result in frequent itemsets containing items with a wide
range of individual support levels (e.g., from less than 20% to more than 70%).