ACCTG 6910, Spring 2003
DESB, University of Utah
Assignment 3 (3/27 – 4/8)
Question 1 (50 points): Given the following transactions, a minimum support of 50%, and a
minimum confidence of 80%, find the large itemsets, sequential patterns, rules, and lifts, and
recommend some management decisions.
TID Brand_Item_bought
100 King’s-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread
200 Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread
300 Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie
400 Wonder-Bread, Sunset-Milk, Dairyland-Cheese
a) At the granularity of item without brand (e.g., “milk” and “bread”), please identify all
large itemsets using the Apriori algorithm. Be sure to include all steps in Apriori, i.e.,
Large (k-1)-itemset → Candidate k-itemset (Join, Prune) → Large k-itemset.
Step 1: Identify all large 1-itemsets
{Apple} 2/4 = 50%
{Bread} 4/4 = 100%
{Cheese} 3/4 = 75%
{Milk} 4/4 = 100%
{Pie} 2/4 = 50%
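As a minimal sketch of Step 1, the brand can be stripped from each brand-item and every item counted at most once per transaction. This assumes Python 3 and assumes every entry has the form Brand-Item, so the item is whatever follows the last hyphen; `strip_brand` and `large_1_itemsets` are hypothetical helper names.

```python
from collections import Counter

# Question 1 transactions at brand-item granularity, keyed by TID.
TRANSACTIONS = {
    100: ["King's-Crab", "Sunset-Milk", "Dairyland-Cheese", "Best-Bread"],
    200: ["Best-Cheese", "Dairyland-Milk", "Goldenfarm-Apple", "Tasty-Pie", "Wonder-Bread"],
    300: ["Westcoast-Apple", "Dairyland-Milk", "Wonder-Bread", "Tasty-Pie"],
    400: ["Wonder-Bread", "Sunset-Milk", "Dairyland-Cheese"],
}


def strip_brand(brand_item):
    """Drop the brand prefix, e.g. 'Sunset-Milk' -> 'Milk' (assumes the item follows the last hyphen)."""
    return brand_item.rsplit("-", 1)[1]


def large_1_itemsets(transactions, min_support):
    """Return {item: support} for every item whose support reaches min_support."""
    counts = Counter()
    for items in transactions.values():
        # A set, so an item is counted at most once per transaction.
        counts.update({strip_brand(i) for i in items})
    n = len(transactions)
    return {item: c / n for item, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    for item, supp in sorted(large_1_itemsets(TRANSACTIONS, 0.5).items()):
        print(f"{{{item}}} {supp:.0%}")
```

Crab appears in only one transaction (25%), so it is the one item dropped here.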
Step 2: Generate candidate 2-itemsets by join
{Apple, Bread} {Apple, Cheese} {Apple, Milk} {Apple, Pie}
{Bread, Cheese} {Bread, Milk} {Bread, Pie}
{Cheese, Milk} {Cheese, Pie}
{Milk, Pie}
Step 3: Identify large 2-itemsets
{Apple, Bread} 2/4 = 50%
{Apple, Milk} 2/4 = 50%
{Apple, Pie} 2/4 = 50%
{Bread, Cheese} 3/4 = 75%
{Bread, Milk} 4/4 = 100%
{Bread, Pie} 2/4 = 50%
{Cheese, Milk} 3/4 = 75%
{Milk, Pie} 2/4 = 50%
Step 4: Generate candidate 3-itemsets by join
{Apple, Bread, Milk} {Apple, Bread, Pie} {Apple, Milk, Pie}
{Bread, Cheese, Milk} {Bread, Cheese, Pie} {Bread, Milk, Pie}
Step 5: Prune candidate 3-itemsets
{Apple, Bread, Milk} {Apple, Bread, Pie} {Apple, Milk, Pie}
{Bread, Cheese, Milk} {Bread, Milk, Pie}
{Bread, Cheese, Pie} is pruned because its subset {Cheese, Pie} is not a large 2-itemset.
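The join and prune steps can be sketched as follows. This is an illustrative Python version of Apriori candidate generation, assuming each itemset is stored as an alphabetically sorted tuple; `join`, `prune` and `LARGE_2` are names chosen for this example only.

```python
from itertools import combinations

# Large 2-itemsets from Step 3, each stored as a sorted tuple.
LARGE_2 = [
    ("Apple", "Bread"), ("Apple", "Milk"), ("Apple", "Pie"), ("Bread", "Cheese"),
    ("Bread", "Milk"), ("Bread", "Pie"), ("Cheese", "Milk"), ("Milk", "Pie"),
]


def join(large_prev):
    """Join step: merge two sorted (k-1)-itemsets that share their first k-2 items."""
    ordered = sorted(large_prev)
    candidates = []
    for i, p in enumerate(ordered):
        for q in ordered[i + 1:]:
            if p[:-1] == q[:-1]:
                # p < q with a shared prefix, so p[-1] < q[-1] and the result stays sorted.
                candidates.append(p + (q[-1],))
    return candidates


def prune(candidates, large_prev):
    """Prune step: keep a candidate only if all of its (k-1)-subsets are large."""
    large = set(large_prev)
    return [c for c in candidates
            if all(sub in large for sub in combinations(c, len(c) - 1))]


if __name__ == "__main__":
    c3 = join(LARGE_2)
    print(c3)                  # the six candidates of Step 4
    print(prune(c3, LARGE_2))  # ("Bread", "Cheese", "Pie") is dropped, as in Step 5
```

Because the inputs are sorted, only pairs sharing a prefix are combined, which is why {Cheese, Milk} and {Milk, Pie} produce no 3-itemset candidate.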
Step 6: Identify large 3-itemsets
{Apple, Bread, Milk} 2/4 = 50%
{Apple, Bread, Pie} 2/4 = 50%
{Apple, Milk, Pie} 2/4 = 50%
{Bread, Cheese, Milk} 3/4 = 75%
{Bread, Milk, Pie} 2/4 = 50%
Step 7: Generate candidate 4-itemsets by join
{Apple, Bread, Milk, Pie}
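Step 8: Prune candidate 4-itemsets
{Apple, Bread, Milk, Pie}
Nothing is pruned: all four 3-subsets of {Apple, Bread, Milk, Pie} are large 3-itemsets.
Step 9: Identify large 4-itemsets
{Apple, Bread, Milk, Pie} 2/4 = 50%
Step 10: Generate candidate 5-itemsets by join
No candidate 5-itemset can be generated because there is only one large 4-itemset, so the
algorithm stops.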
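Putting the steps together, a compact Apriori loop can reproduce the large itemsets of part a). This is a sketch, not an efficient implementation: it assumes the transactions fit in memory, rescans them for every candidate, and uses transactions whose brand prefixes were removed by hand.

```python
from itertools import combinations

# Question 1 transactions at item granularity (brand prefixes removed by hand).
TRANSACTIONS = [
    {"Crab", "Milk", "Cheese", "Bread"},          # TID 100
    {"Cheese", "Milk", "Apple", "Pie", "Bread"},  # TID 200
    {"Apple", "Milk", "Bread", "Pie"},            # TID 300
    {"Bread", "Milk", "Cheese"},                  # TID 400
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {k: {sorted itemset tuple: support}} for each level k that has large itemsets."""
    items = sorted({i for t in transactions for i in t})
    counted = {(i,): support((i,), transactions) for i in items}
    large = {s: v for s, v in counted.items() if v >= min_support}
    result, k = {}, 1
    while large:
        result[k] = large
        prev = sorted(large)
        # Join: extend a large k-itemset with the last item of a later one sharing its first k-1 items.
        cands = [p + (q[-1],) for i, p in enumerate(prev) for q in prev[i + 1:] if p[:-1] == q[:-1]]
        # Prune: every k-subset of a (k+1)-candidate must itself be large.
        cands = [c for c in cands if all(s in large for s in combinations(c, k))]
        # Count: keep the candidates whose support reaches the threshold.
        counted = {c: support(c, transactions) for c in cands}
        large = {c: v for c, v in counted.items() if v >= min_support}
        k += 1
    return result


if __name__ == "__main__":
    for k, itemsets in apriori(TRANSACTIONS, 0.5).items():
        for s, v in itemsets.items():
            print(k, "{" + ", ".join(s) + "}", f"{v:.0%}")
```

The loop ends as soon as a level produces no large itemset, which here happens at k = 5.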
b) At the granularity of brand-item (e.g., “Sunset-Milk” and “Wonder-Bread”), please
identify all large itemsets using the Apriori algorithm. Be sure to include all steps in
Apriori, i.e., Large (k-1)-itemset → Candidate k-itemset (Join, Prune) → Large k-itemset.
Step 1: Identify all large 1-itemsets
{Dairyland-Cheese} 2/4 = 50%
{Dairyland-Milk} 2/4 = 50%
{Sunset-Milk} 2/4 = 50%
{Tasty-Pie} 2/4 = 50%
{Wonder-Bread} 3/4 = 75%
Step 2: Generate candidate 2-itemsets by join
{Dairyland-Cheese, Dairyland-Milk} {Dairyland-Cheese, Sunset-Milk}
{Dairyland-Cheese, Tasty-Pie} {Dairyland-Cheese, Wonder-Bread}
{Dairyland-Milk, Sunset-Milk} {Dairyland-Milk, Tasty-Pie}
{Dairyland-Milk, Wonder-Bread} {Sunset-Milk, Tasty-Pie}
{Sunset-Milk, Wonder-Bread} {Tasty-Pie, Wonder-Bread}
Step 3: Identify large 2-itemsets
{Dairyland-Cheese, Sunset-Milk} 2/4 = 50%
{Dairyland-Milk, Tasty-Pie} 2/4 = 50%
{Dairyland-Milk, Wonder-Bread} 2/4 = 50%
{Tasty-Pie, Wonder-Bread} 2/4 = 50%
Step 4: Generate candidate 3-itemsets by join
{Dairyland-Milk, Tasty-Pie, Wonder-Bread}
Step 5: Prune candidate 3-itemsets
{Dairyland-Milk, Tasty-Pie, Wonder-Bread}
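Step 6: Identify large 3-itemsets
{Dairyland-Milk, Tasty-Pie, Wonder-Bread} 2/4 = 50%
Step 7: Generate candidate 4-itemsets by join
No candidate 4-itemset can be generated because there is only one large 3-itemset, so the
algorithm stops.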
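One way to double-check part b) is a brute-force count that enumerates every subset of every transaction. This is not Apriori (it skips the join and prune steps and grows exponentially with basket size), but with only four small transactions it is a cheap cross-check; `brute_force_large_itemsets` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

# Question 1 transactions at brand-item granularity.
TRANSACTIONS = [
    {"King's-Crab", "Sunset-Milk", "Dairyland-Cheese", "Best-Bread"},
    {"Best-Cheese", "Dairyland-Milk", "Goldenfarm-Apple", "Tasty-Pie", "Wonder-Bread"},
    {"Westcoast-Apple", "Dairyland-Milk", "Wonder-Bread", "Tasty-Pie"},
    {"Wonder-Bread", "Sunset-Milk", "Dairyland-Cheese"},
]


def brute_force_large_itemsets(transactions, min_support):
    """Count every non-empty subset of every transaction and keep the large ones."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            # combinations of a sorted list yields sorted tuples, one per subset.
            counts.update(combinations(items, k))
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    large = brute_force_large_itemsets(TRANSACTIONS, 0.5)
    for itemset, supp in sorted(large.items(), key=lambda kv: (len(kv[0]), kv[0])):
        print("{" + ", ".join(itemset) + "}", f"{supp:.0%}")
```

It should report the same 5 + 4 + 1 = 10 large itemsets as the Apriori steps above.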
c) Please list all association rules (i.e., association rules that meet minimum support and
minimum confidence requirements) derived from the itemsets you derived in b) and
their supports, confidences and lifts.
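The measures listed for each rule follow the standard definitions, where supp(·) is the fraction of the four transactions containing an itemset:

```latex
\mathrm{supp}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y), \qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}, \qquad
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
```

A rule is kept only if its support is at least 50% and its confidence at least 80%. For example, Wonder-Bread => Dairyland-Milk has confidence 50%/75% ≈ 66.7%, so it is not listed even though {Dairyland-Milk, Wonder-Bread} is a large itemset.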
Dairyland-Cheese => Sunset-Milk
support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2
Sunset-Milk => Dairyland-Cheese
support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2
Dairyland-Milk => Tasty-Pie
support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2
Tasty-Pie => Dairyland-Milk
support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2
Dairyland-Milk => Wonder-Bread
support = 50% confidence = 50%/50% = 100% lift = 100%/75% = 1.33
Tasty-Pie => Wonder-Bread
support = 50% confidence = 50%/50% = 100% lift = 100%/75% = 1.33
Dairyland-Milk ∧ Tasty-Pie => Wonder-Bread
support = 50% confidence = 50%/50% = 100% lift = 100%/75% = 1.33
Dairyland-Milk ∧ Wonder-Bread => Tasty-Pie
support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2
Tasty-Pie ∧ Wonder-Bread => Dairyland-Milk
support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2
Dairyland-Milk => Tasty-Pie ∧ Wonder-Bread
support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2
Tasty-Pie => Dairyland-Milk ∧ Wonder-Bread
support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2
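The rule list above can be generated mechanically from the large itemsets of part b): for each large itemset, every non-empty proper subset is tried as the left-hand side. A minimal sketch, assuming the supports are entered as fractions of the four transactions; `LARGE` and `rules` are illustrative names.

```python
from itertools import combinations

# Large itemsets from part b) with their supports (fractions of the 4 transactions).
LARGE = {
    frozenset({"Dairyland-Cheese"}): 0.5,
    frozenset({"Dairyland-Milk"}): 0.5,
    frozenset({"Sunset-Milk"}): 0.5,
    frozenset({"Tasty-Pie"}): 0.5,
    frozenset({"Wonder-Bread"}): 0.75,
    frozenset({"Dairyland-Cheese", "Sunset-Milk"}): 0.5,
    frozenset({"Dairyland-Milk", "Tasty-Pie"}): 0.5,
    frozenset({"Dairyland-Milk", "Wonder-Bread"}): 0.5,
    frozenset({"Tasty-Pie", "Wonder-Bread"}): 0.5,
    frozenset({"Dairyland-Milk", "Tasty-Pie", "Wonder-Bread"}): 0.5,
}


def rules(large, min_conf):
    """Yield (lhs, rhs, support, confidence, lift) for every rule meeting min_conf."""
    for itemset, supp in large.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                rhs = itemset - lhs
                conf = supp / large[lhs]   # supp(X u Y) / supp(X)
                lift = conf / large[rhs]   # conf / supp(Y)
                if conf >= min_conf:
                    yield lhs, rhs, supp, conf, lift


if __name__ == "__main__":
    for lhs, rhs, supp, conf, lift in rules(LARGE, 0.8):
        print(" ∧ ".join(sorted(lhs)), "=>", " ∧ ".join(sorted(rhs)),
              f"support={supp:.0%} confidence={conf:.0%} lift={lift:.2f}")
```

Because every subset of a large itemset is itself large, the lookups `large[lhs]` and `large[rhs]` always succeed.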
d) Please give one recommendation (e.g., store layout or promotion) to store
management based on the association rules and large item sets you discovered.
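The store could shelve Tasty-Pie and Wonder-Bread next to Dairyland-Milk to encourage
customers to buy them together: every customer who bought Dairyland-Milk also bought both
(Dairyland-Milk => Tasty-Pie ∧ Wonder-Bread has 100% confidence and a lift of 2).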
Question 2 (25 points): Let the minimum support be 60% when you derive large
sequences from the following transaction database.
Customer ID Transaction ID Items
A 100 1,2
A 200 3,4
A 300 5,6
A 400 1,2
B 500 1
B 600 3
B 700 5
B 800 1
C 900 2
C 1000 4
C 1100 6
C 1200 2
a) Please identify all large sequences using the Apriori algorithm. Be sure to include all
steps in Apriori, i.e., Large (k-1)-sequences → Candidate k-sequences (Join, Prune) →
Large k-sequences.
Version 1 (no itemset repeated within a sequence)
Step 1: Identify large 1-sequences
<{1}> 2/3 = 66.67%
<{2}> 2/3 = 66.67%
<{3}> 2/3 = 66.67%
<{4}> 2/3 = 66.67%
<{5}> 2/3 = 66.67%
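<{6}> 2/3 = 66.67%
Itemsets with two items, such as {1, 2}, are supported only by customer A (1/3 = 33.33%), so
every large sequence below consists of single-item elements.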
Step 2: Generate candidate 2-sequences by join
<{1}, {2}> <{2}, {1}> <{1}, {3}> <{3}, {1}>
<{1}, {4}> <{4}, {1}> <{1}, {5}> <{5}, {1}> <{1}, {6}> <{6}, {1}>
<{2}, {3}> <{3}, {2}> <{2}, {4}> <{4}, {2}>
<{2}, {5}> <{5}, {2}> <{2}, {6}> <{6}, {2}>
<{3}, {4}> <{4}, {3}> <{3}, {5}> <{5}, {3}>
<{3}, {6}> <{6}, {3}>
<{4}, {5}> <{5}, {4}> <{4}, {6}> <{6}, {4}>
<{5}, {6}> <{6}, {5}>
Step 3: Identify large 2-sequences
<{1}, {3}> 2/3 = 66.67%
<{1}, {5}> 2/3 = 66.67%
<{2}, {4}> 2/3 = 66.67%
<{2}, {6}> 2/3 = 66.67%
<{3}, {1}> 2/3 = 66.67%
<{3}, {5}> 2/3 = 66.67%
<{4}, {2}> 2/3 = 66.67%
<{4}, {6}> 2/3 = 66.67%
<{5}, {1}> 2/3 = 66.67%
<{6}, {2}> 2/3 = 66.67%
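Support counting for sequences needs a containment test: a customer supports <{x}, {y}> only if x appears in some transaction and y in a strictly later one. A minimal Python sketch, assuming each pattern element is a single item (enough here, since multi-item itemsets are supported only by customer A); `contains` and `seq_support` are hypothetical names.

```python
from itertools import permutations

# Question 2 customer sequences: each customer's transactions in transaction-ID order.
CUSTOMERS = {
    "A": [{1, 2}, {3, 4}, {5, 6}, {1, 2}],
    "B": [{1}, {3}, {5}, {1}],
    "C": [{2}, {4}, {6}, {2}],
}


def contains(customer_seq, pattern):
    """True if the items of `pattern` occur in strictly later transactions, in order.

    Each pattern element is a single item (assumption: only single items are large here).
    """
    pos = 0
    for item in pattern:
        # Greedily take the earliest remaining transaction that holds the item.
        while pos < len(customer_seq) and item not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next element must come from a later transaction
    return True


def seq_support(pattern, customers):
    """Fraction of customers whose sequence contains `pattern`."""
    return sum(contains(s, pattern) for s in customers.values()) / len(customers)


if __name__ == "__main__":
    # permutations(..., 2) yields the 30 Version 1 candidates (no repeated item).
    for p in permutations(range(1, 7), 2):
        if seq_support(p, CUSTOMERS) >= 0.6:
            print(p, f"{seq_support(p, CUSTOMERS):.2%}")
```

Replacing `permutations(range(1, 7), 2)` with `itertools.product(range(1, 7), repeat=2)` would give the 36 Version 2 candidates, which adds the large 2-sequences (1, 1) and (2, 2).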
Step 4: Generate candidate 3-sequences by join
<{1}, {3}, {5}> <{1}, {5}, {3}>
<{2}, {4}, {6}> <{2}, {6}, {4}>
<{3}, {1}, {5}> <{3}, {5}, {1}>
<{4}, {2}, {6}> <{4}, {6}, {2}>
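Step 5: Prune candidate 3-sequences
<{1}, {3}, {5}>
<{2}, {4}, {6}>
<{3}, {1}, {5}> <{3}, {5}, {1}>
<{4}, {2}, {6}> <{4}, {6}, {2}>
<{1}, {5}, {3}> is pruned because its subsequence <{5}, {3}> is not a large 2-sequence, and
<{2}, {6}, {4}> is pruned because <{6}, {4}> is not a large 2-sequence.
Step 6: Identify large 3-sequences
<{1}, {3}, {5}> 2/3 = 66.67%
<{2}, {4}, {6}> 2/3 = 66.67%
<{3}, {5}, {1}> 2/3 = 66.67%
<{4}, {6}, {2}> 2/3 = 66.67%
Step 7: Generate candidate 4-sequences by join
No candidate 4-sequence can be generated: no two large 3-sequences share their first two
elements, and joining a sequence with itself would repeat an itemset, which this version excludes.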
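The sequence join and prune for 3-sequences can be sketched in the same style. The join mirrors the one used in the steps above: a sequence is extended with the last element of another sequence that shares its prefix. The `allow_repeats` flag is an assumption introduced here to cover both versions, and elements are written as bare items, so <{1}, {3}> becomes (1, 3).

```python
# Large 2-sequences from Step 3 of Version 1, written as tuples of single items.
LARGE_2 = [(1, 3), (1, 5), (2, 4), (2, 6), (3, 1), (3, 5), (4, 2), (4, 6), (5, 1), (6, 2)]


def join(large_prev, allow_repeats=False):
    """Join two (k-1)-sequences that agree on all but their last element.

    With allow_repeats=False a sequence is not joined with itself, which (for
    repeat-free inputs) keeps every candidate free of repeated items.
    """
    ordered = sorted(large_prev)
    return [p + (q[-1],) for p in ordered for q in ordered
            if p[:-1] == q[:-1] and (allow_repeats or p != q)]


def prune(candidates, large_prev):
    """Keep a candidate only if every subsequence obtained by deleting one element is large."""
    large = set(large_prev)
    return [c for c in candidates
            if all(c[:i] + c[i + 1:] in large for i in range(len(c)))]


if __name__ == "__main__":
    c3 = join(LARGE_2)
    print(c3)                  # eight candidates, e.g. (1, 3, 5) and (1, 5, 3)
    print(prune(c3, LARGE_2))  # (1, 5, 3) and (2, 6, 4) are removed
```

With `allow_repeats=True` and the 12 large 2-sequences of Version 2, the same functions also join a sequence with itself (for example (1, 1) with (1, 1) gives (1, 1, 1)).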
Version 2 (repeated itemsets allowed within a sequence)
Step 1: Identify large 1-sequences
<{1}> 2/3 = 66.67%
<{2}> 2/3 = 66.67%
<{3}> 2/3 = 66.67%
<{4}> 2/3 = 66.67%
<{5}> 2/3 = 66.67%
<{6}> 2/3 = 66.67%
Step 2: Generate candidate 2-sequences by join
<{1}, {1}> <{1}, {2}> <{2}, {1}> <{1}, {3}> <{3}, {1}>
<{1}, {4}> <{4}, {1}> <{1}, {5}> <{5}, {1}> <{1}, {6}> <{6}, {1}>
<{2}, {2}> <{2}, {3}> <{3}, {2}> <{2}, {4}> <{4}, {2}>
<{2}, {5}> <{5}, {2}> <{2}, {6}> <{6}, {2}>
<{3}, {3}> <{3}, {4}> <{4}, {3}> <{3}, {5}> <{5}, {3}>
<{3}, {6}> <{6}, {3}>
<{4}, {4}> <{4}, {5}> <{5}, {4}> <{4}, {6}> <{6}, {4}>
<{5}, {5}> <{5}, {6}> <{6}, {5}>
<{6}, {6}>
Step 3: Identify large 2-sequences
<{1}, {1}> 2/3 = 66.67%
<{1}, {3}> 2/3 = 66.67%
<{1}, {5}> 2/3 = 66.67%
<{2}, {2}> 2/3 = 66.67%
<{2}, {4}> 2/3 = 66.67%
<{2}, {6}> 2/3 = 66.67%
<{3}, {1}> 2/3 = 66.67%
<{3}, {5}> 2/3 = 66.67%
<{4}, {2}> 2/3 = 66.67%
<{4}, {6}> 2/3 = 66.67%
<{5}, {1}> 2/3 = 66.67%
<{6}, {2}> 2/3 = 66.67%
Step 4: Generate candidate 3-sequences by join
<{1}, {1}, {1}> <{1}, {1}, {3}> <{1}, {3}, {1}> <{1}, {3}, {3}>
<{1}, {1}, {5}> <{1}, {5}, {1}> <{1}, {5}, {5}>
<{1}, {3}, {5}> <{1}, {5}, {3}>
<{2}, {2}, {2}> <{2}, {2}, {4}> <{2}, {4}, {2}> <{2}, {4}, {4}>
<{2}, {2}, {6}> <{2}, {6}, {2}> <{2}, {6}, {6}>
<{2}, {4}, {6}> <{2}, {6}, {4}>
<{3}, {1}, {1}> <{3}, {1}, {5}> <{3}, {5}, {1}> <{3}, {5}, {5}>
<{4}, {2}, {2}> <{4}, {2}, {6}> <{4}, {6}, {2}> <{4}, {6}, {6}>
<{5}, {1}, {1}> <{6}, {2}, {2}>
Step 5: Prune candidate 3-sequences
<{1}, {1}, {1}> <{1}, {1}, {3}> <{1}, {3}, {1}>
<{1}, {1}, {5}> <{1}, {5}, {1}>
<{1}, {3}, {5}>
<{2}, {2}, {2}> <{2}, {2}, {4}> <{2}, {4}, {2}>
<{2}, {2}, {6}> <{2}, {6}, {2}>
<{2}, {4}, {6}>
<{3}, {1}, {1}> <{3}, {1}, {5}> <{3}, {5}, {1}>
<{4}, {2}, {2}> <{4}, {2}, {6}> <{4}, {6}, {2}>
<{5}, {1}, {1}> <{6}, {2}, {2}>
The other candidates are pruned because one of their 2-subsequences is not large, e.g.,
<{1}, {3}, {3}> contains <{3}, {3}> and <{1}, {5}, {3}> contains <{5}, {3}>.
Step 6: Identify large 3-sequences
<{1}, {3}, {1}> 2/3 = 66.67%
<{1}, {3}, {5}> 2/3 = 66.67%
<{1}, {5}, {1}> 2/3 = 66.67%
<{2}, {4}, {2}> 2/3 = 66.67%
<{2}, {4}, {6}> 2/3 = 66.67%
<{2}, {6}, {2}> 2/3 = 66.67%
<{3}, {5}, {1}> 2/3 = 66.67%
<{4}, {6}, {2}> 2/3 = 66.67%
Step 7: Generate candidate 4-sequences by join
<{1}, {3}, {1}, {1}> <{1}, {3}, {1}, {5}> <{1}, {3}, {5}, {1}> <{1}, {3}, {5}, {5}>
<{1}, {5}, {1}, {1}>
<{2}, {4}, {2}, {2}> <{2}, {4}, {2}, {6}> <{2}, {4}, {6}, {2}> <{2}, {4}, {6}, {6}>
<{2}, {6}, {2}, {2}>
<{3}, {5}, {1}, {1}>
<{4}, {6}, {2}, {2}>
Step 8: Prune candidate 4-sequences
<{1}, {3}, {5}, {1}>
<{2}, {4}, {6}, {2}>
All other candidates are pruned because some 3-subsequence is not large, e.g.,
<{1}, {3}, {1}, {1}> contains <{3}, {1}, {1}>.
Step 9: Identify large 4-sequences
<{1}, {3}, {5}, {1}> 2/3 = 66.67%
<{2}, {4}, {6}, {2}> 2/3 = 66.67%
Step 10: Generate candidate 5-sequences by join
No 5-sequence can be large: each customer has only 4 transactions in the given dataset, so
no customer sequence can contain a 5-sequence.
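As an independent cross-check of both versions, the large sequences can be found by brute force: enumerate every subsequence each customer contains, then count customers. This is not Apriori, just an exhaustive check that is feasible only because each customer has four short transactions; it assumes single-item elements (the only large itemsets here), and `subsequences` and `large_sequences` are hypothetical names.

```python
from collections import Counter
from itertools import combinations, product

# Question 2 customer sequences: each customer's transactions in transaction-ID order.
CUSTOMERS = {
    "A": [{1, 2}, {3, 4}, {5, 6}, {1, 2}],
    "B": [{1}, {3}, {5}, {1}],
    "C": [{2}, {4}, {6}, {2}],
}


def subsequences(customer_seq):
    """All item sequences formed by picking one item from each of an increasing run of transactions."""
    found = set()
    n = len(customer_seq)
    for k in range(1, n + 1):
        for positions in combinations(range(n), k):
            for pattern in product(*(sorted(customer_seq[i]) for i in positions)):
                found.add(pattern)
    return found


def large_sequences(customers, min_support, allow_repeats=True):
    """Return {pattern: support}; allow_repeats=False mimics Version 1."""
    counts = Counter()
    for seq in customers.values():
        counts.update(subsequences(seq))  # a set, so each customer counts once per pattern
    n = len(customers)
    return {p: c / n for p, c in counts.items()
            if c / n >= min_support and (allow_repeats or len(set(p)) == len(p))}


if __name__ == "__main__":
    for version, repeats in (("Version 1", False), ("Version 2", True)):
        found = large_sequences(CUSTOMERS, 0.6, allow_repeats=repeats)
        print(version, sorted(found, key=lambda p: (len(p), p)))
```

Version 1 should end at four large 3-sequences, and Version 2 at the two large 4-sequences (1, 3, 5, 1) and (2, 4, 6, 2).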
Question 3 (25 points): Go to an e-commerce web site such as [Link] or [Link].
Discover and describe one application of association rules or sequential patterns.
Please comment on whether it is effective or needs improvement.
At [Link], the page describing a book also lists the books that customers who bought this
book also bought, the titles that customers interested in this book may also find interesting,
and books by other authors that buyers of this book also buy. This related information is
derived from association rules mined from past sales transactions. It is effective when
[Link] wants to recommend relevant books to a customer who is about to buy a book on a
certain topic. However, we do not know whether [Link] sorts the associated books by support,
confidence, or lift; sorting by one of these measures could help customers locate the books
they really need more efficiently.
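The sorting suggested above could look like the following sketch, which ranks the items recommended for a shopper's current basket by lift (or by support or confidence). It reuses the single-consequent rules from Question 1 c) purely as sample input; how the site actually ranks its suggestions is not known, and `recommend` and `METRIC_INDEX` are hypothetical names.

```python
# Single-consequent rules from Question 1 c): (antecedent, consequent, support, confidence, lift).
RULES = [
    ({"Dairyland-Cheese"}, {"Sunset-Milk"}, 0.5, 1.0, 2.0),
    ({"Sunset-Milk"}, {"Dairyland-Cheese"}, 0.5, 1.0, 2.0),
    ({"Dairyland-Milk"}, {"Tasty-Pie"}, 0.5, 1.0, 2.0),
    ({"Tasty-Pie"}, {"Dairyland-Milk"}, 0.5, 1.0, 2.0),
    ({"Dairyland-Milk"}, {"Wonder-Bread"}, 0.5, 1.0, 4 / 3),
    ({"Tasty-Pie"}, {"Wonder-Bread"}, 0.5, 1.0, 4 / 3),
    ({"Dairyland-Milk", "Tasty-Pie"}, {"Wonder-Bread"}, 0.5, 1.0, 4 / 3),
    ({"Dairyland-Milk", "Wonder-Bread"}, {"Tasty-Pie"}, 0.5, 1.0, 2.0),
    ({"Tasty-Pie", "Wonder-Bread"}, {"Dairyland-Milk"}, 0.5, 1.0, 2.0),
]

METRIC_INDEX = {"support": 2, "confidence": 3, "lift": 4}


def recommend(basket, rules, key="lift"):
    """Rank items not yet in `basket`, scoring each by the strongest applicable rule."""
    idx = METRIC_INDEX[key]
    scores = {}
    for rule in rules:
        lhs, rhs = rule[0], rule[1]
        if lhs <= basket:  # a rule applies only if its whole antecedent is in the basket
            for item in rhs - basket:
                scores[item] = max(scores.get(item, 0.0), rule[idx])
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    print(recommend({"Dairyland-Milk"}, RULES))
```

Ranking by confidence alone cannot separate these suggestions (every rule here has 100% confidence), whereas lift puts Tasty-Pie ahead of Wonder-Bread, which is one argument for exposing more than one measure.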