Apriori Algorithm Implementation using Python
1. Introduction
The Apriori algorithm is one of the most widely used algorithms for Association Rule Mining. It is primarily used in Market Basket Analysis to identify sets of items that frequently co-occur in transactions. It works on the principle of 'downward closure': if an itemset is frequent, then all of its subsets must also be frequent, so any candidate itemset that contains an infrequent subset can be pruned without counting its support.
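A small illustration of this property (not part of the implementation in Section 4; the helper name all_subsets_frequent and its inputs are assumptions made for this sketch): a candidate k-itemset can be discarded as soon as one of its (k-1)-subsets is missing from the previous level's frequent itemsets.

from itertools import combinations

def all_subsets_frequent(candidate, prev_frequent):
    # Downward closure: every (k-1)-subset of a frequent k-itemset must
    # itself be frequent, so candidates failing this check can be pruned.
    k = len(candidate)
    return all(frozenset(subset) in prev_frequent
               for subset in combinations(candidate, k - 1))

prev_frequent = {frozenset({'milk'}), frozenset({'bread'}), frozenset({'butter'})}
print(all_subsets_frequent(frozenset({'milk', 'bread'}), prev_frequent))  # True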
2. Important Terms
1. Transaction → A collection of items (e.g., a shopping cart). Example: {milk, bread, butter}
2. Support → Frequency of occurrence of an itemset. Support(A) = (Transactions containing A) / (Total transactions). (A short calculation sketch follows this list.)
3. Frequent Itemset → An itemset whose support ≥ min_support.
4. Candidate Itemset → Potential itemsets generated in each step to check frequency.
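A quick sketch of the support formula from point 2 (the sample transactions and the helper name support are made up for illustration):

def support(itemset, transactions):
    # Fraction of transactions that contain every item of the itemset.
    count = sum(1 for transaction in transactions if itemset.issubset(transaction))
    return count / len(transactions)

sample_transactions = [{'milk', 'bread'}, {'bread', 'butter'}, {'milk', 'bread', 'butter'}]
print(support({'milk', 'bread'}, sample_transactions))  # appears in 2 of 3 transactions -> 0.666...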
3. Steps in Apriori Algorithm
Step 1: Generate 1-itemsets (single items with support ≥ min_support).
Step 2: Generate candidate 2-itemsets from frequent 1-itemsets.
Step 3: Calculate support of candidate 2-itemsets.
Step 4: Keep only the candidates with support ≥ min_support (the frequent 2-itemsets).
Step 5: Repeat for 3-itemsets, 4-itemsets, ... until no more frequent itemsets remain.
4. Python Implementation
def generate_candidates(freq_itemsets, k):
    # Join pairs of frequent (k-1)-itemsets and keep only unions of size k.
    freq_itemsets = list(freq_itemsets)
    candidates = set()
    for i in range(len(freq_itemsets)):
        for j in range(i + 1, len(freq_itemsets)):
            union = freq_itemsets[i] | freq_itemsets[j]
            if len(union) == k:
                candidates.add(union)
    return candidates

def calculate_support(transactions, candidates, min_support):
    # Count each candidate's occurrences and keep those meeting min_support.
    freq_itemsets = {}
    for candidate in candidates:
        count = 0
        for transaction in transactions:
            if candidate.issubset(transaction):
                count += 1
        support = count / len(transactions)
        if support >= min_support:
            freq_itemsets[frozenset(candidate)] = support
    return freq_itemsets

def apriori(transactions, min_support=0.5):
    # Frequent 1-itemsets form the starting level.
    items = set()
    for transaction in transactions:
        for item in transaction:
            items.add(frozenset([item]))
    freq_itemsets = calculate_support(transactions, items, min_support)
    all_freq_itemsets = dict(freq_itemsets)

    # Grow itemsets one level at a time until no frequent k-itemset remains.
    k = 2
    while freq_itemsets:
        candidates = generate_candidates(freq_itemsets.keys(), k)
        freq_itemsets = calculate_support(transactions, candidates, min_support)
        all_freq_itemsets.update(freq_itemsets)
        k += 1
    return all_freq_itemsets

transactions = [
    {'milk', 'bread', 'butter'},
    {'bread', 'butter'},
    {'milk', 'bread'},
    {'milk', 'bread', 'butter'},
    {'bread', 'butter'}
]

min_support = 0.5
freq_itemsets = apriori(transactions, min_support)

print("Frequent Itemsets with support ≥", min_support)
for itemset, support in freq_itemsets.items():
    print(set(itemset), "=>", round(support, 2))
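As an optional cross-check (continuing from the script above), the same frequent itemsets can be reproduced with the third-party mlxtend library; this sketch assumes mlxtend and pandas are installed (pip install mlxtend pandas).

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori as mlxtend_apriori

# One-hot encode the transactions into a boolean DataFrame.
dataset = [sorted(t) for t in transactions]
encoder = TransactionEncoder()
onehot = encoder.fit(dataset).transform(dataset)
df = pd.DataFrame(onehot, columns=encoder.columns_)

# Frequent itemsets with the same minimum support as above.
print(mlxtend_apriori(df, min_support=0.5, use_colnames=True))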
5. Example Execution
Dataset:
T1: {milk, bread, butter}
T2: {bread, butter}
T3: {milk, bread}
T4: {milk, bread, butter}
T5: {bread, butter}
Step 1 → 1-itemsets
{milk} = 3/5 = 0.6 (Frequent)
{bread} = 5/5 = 1.0 (Frequent)
{butter} = 4/5 = 0.8 (Frequent)
Step 2 → 2-itemsets
{milk, bread} = 3/5 = 0.6 (Frequent)
{milk, butter} = 2/5 = 0.4 (Not Frequent)
{bread, butter} = 4/5 = 0.8 (Frequent)
Step 3 → 3-itemsets
{milk, bread, butter} = 2/5 = 0.4 (Not Frequent)
No frequent 3-itemsets remain, so the algorithm stops here.
6. Final Output
Frequent Itemsets with support ≥ 0.5
{'bread'} => 1.0
{'butter'} => 0.8
{'milk'} => 0.6
{'bread', 'milk'} => 0.6
{'bread', 'butter'} => 0.8
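As the Introduction notes, these frequent itemsets are the raw material for association rule mining. As a brief illustration (continuing from the freq_itemsets dictionary computed in Section 4; the rule chosen here is just an example), the confidence of the rule {bread} → {butter} can be read directly from the supports above:

# Confidence(A -> B) = support(A ∪ B) / support(A)
# Here: support({bread, butter}) / support({bread}) = 0.8 / 1.0 = 0.8
antecedent = frozenset({'bread'})
both = frozenset({'bread', 'butter'})
print(freq_itemsets[both] / freq_itemsets[antecedent])  # 0.8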
7. Applications of Apriori
1. Market basket analysis (finding items bought together).
2. Recommendation systems (Amazon, Flipkart).
3. Web usage mining.
4. Bioinformatics (gene sequence analysis).