Counting frequent item sets in a
stream
unit4/Counting frequent item sets in a
1
stream
A-Priori Algorithm – (1)
• A two-pass approach called
A-Priori limits the need for
main memory
• Key idea: monotonicity
– If a set of items I appears at
least s times, so does every subset J of I
• Contrapositive for pairs:
If item i does not appear in s baskets, then no
pair including i can appear in s baskets
• So, how does A-Priori find freq. pairs?
unit4/Counting frequent item sets in a
2
stream
A-Priori Algorithm – (2)
unit4/Counting frequent item sets in a
3
stream
Main-Memory: Picture of A-Priori
Frequent items
Item counts
Main memory
Counts of
pairs of
frequent items
(candidate
pairs)
Pass 1 Pass 2
unit4/Counting frequent item sets in a
4
stream
Detail for A-Priori
• You can use the
triangular matrix method
with n = number of Item counts Frequent Old
item
frequent items items
#s
– May save space compared
Counts of pairs
Main memory
with storing triples of frequent
Counts of
• Trick: re-number itemsof
pairs
frequent items
frequent items 1,2,… and
keep a table relating new
numbers to original item Pass 1 Pass 2
numbers
unit4/Counting frequent item sets in a
5
stream
Frequent Triples, Etc.
• For each k, we construct two sets of
k-tuples (sets of size k):
– Ck = candidate k-tuples = those that might be
frequent sets (support > s) based on information
from the pass for k–1
– Lk = the set of truly frequent k-tuples
Count All pairs Count
All of items To be
the items the pairs explained
items from L1
C1 Filter L1 Construct C2 Filter L2 Construct C3
unit4/Counting frequent item sets in a
6
stream