MSApriori Algorithm Steps:
1. Define a Minimum Item Support (MIS) for Each Item:
   - Unlike Apriori, where all items share a single minimum support, MSApriori assigns a different minimum support value to each item.
2. Sort Items by MIS:
   - Items are sorted in increasing order of their MIS values; this order is also used for the items inside every itemset.
3. Generate Frequent 1-itemsets:
   - Scanning the sorted list, the first item whose support meets or exceeds its own MIS seeds the list.
   - Each later item is added if its support meets the MIS of that seed item; an item is then frequent if its support also meets its own MIS.
4. Candidate Generation (Tailored Join & Prune Steps):
   - The join step combines frequent (k-1)-itemsets that share a prefix, so candidates contain only items with sufficient support.
   - The prune step removes candidates in which a (k-1)-item subset fails the MIS constraint.
5. Count Itemsets and Generate Frequent Itemsets:
   - A candidate is retained only if its support meets the minimum MIS among its items.
We now construct a case in which a (k-1)-item subset is infrequent, triggering the prune step of the MSApriori algorithm.
Dataset
TID Items
1 A, B, C, D
2 A, B, C
3 B, C, D
4 A
5 B
6 D
MIS Values
Item MIS (%)
A 33% (2 transactions)
B 50% (3 transactions)
C 30% (2 transactions)
D 20% (2 transactions)
Step 1: Support Counts for Each Item
Item Support Count
A 3
B 4
C 3
D 3
✅ All items meet their MIS values, so they are included in L1 (frequent 1-itemsets).
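The example can be reproduced with the init_pass sketch above (assuming it is in scope; the data below just encodes the two tables):

```python
# Encoding the example data (assumes init_pass from the sketch above).
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"B", "C", "D"},
    {"A"},
    {"B"},
    {"D"},
]
mis = {"A": 0.33, "B": 0.50, "C": 0.30, "D": 0.20}

L, F1, counts = init_pass(transactions, mis)
print(dict(counts))   # A=3, B=4, C=3, D=3 (key order may vary)
print(F1)             # ['D', 'C', 'A', 'B'] -- all items pass, in MIS order
```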
Step 2: Generate Candidate 2-Itemsets
Since the MIS-sorted order is {D, C, A, B}, we generate candidate 2-itemsets and check each against the minimum MIS of its items:
Itemset Support Count Required MIS (min) Meets MIS?
{D, C} 2 min(MIS(D), MIS(C)) = 20% (2 transactions) ✅ Yes
{D, A} 1 min(MIS(D), MIS(A)) = 20% (2 transactions) ❌ No
{D, B} 2 min(MIS(D), MIS(B)) = 20% (2 transactions) ✅ Yes
{C, A} 2 min(MIS(C), MIS(A)) = 30% (2 transactions) ✅ Yes
{C, B} 3 min(MIS(C), MIS(B)) = 30% (2 transactions) ✅ Yes
{A, B} 2 min(MIS(A), MIS(B)) = 33% (2 transactions) ✅ Yes
🚨 {D, A} is infrequent, so it is NOT included in L2.
✅ Frequent 2-itemsets:
{D, C}, {D, B}, {C, A}, {C, B}, {A, B}
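A hedged sketch of this Step 2 check, assuming the F1 list produced by init_pass above: a pair is frequent when its support fraction meets the lower of its two MIS values.

```python
# Step 2 as code: keep a pair when its support fraction meets the
# minimum MIS of its two items (F1 comes from init_pass above).
from itertools import combinations

def frequent_pairs(transactions, mis, F1):
    n = len(transactions)
    result = []
    for a, b in combinations(F1, 2):     # F1 is already in MIS order
        support = sum(1 for t in transactions if a in t and b in t)
        if support / n >= min(mis[a], mis[b]):
            result.append(((a, b), support))
    return result

# On the example data this keeps the five pairs above and rejects
# {D, A}: support 1/6 falls below MIS(D) = 20%.
```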
Step 3: Generate Candidate 3-Itemsets
Using L2, we generate:
{D, C, B}
{C, A, B}
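These candidates come from the standard prefix join over the MIS-ordered 2-itemsets; a small sketch (illustrative names):

```python
# Prefix join: two (k-1)-itemsets (kept in MIS order) combine when they
# agree on everything except their last item.
def join_step(prev_frequent):
    candidates = []
    for i, p in enumerate(prev_frequent):
        for q in prev_frequent[i + 1:]:
            if p[:-1] == q[:-1]:
                candidates.append(p + (q[-1],))
    return candidates

# join_step([('D','C'), ('D','B'), ('C','A'), ('C','B'), ('A','B')])
# -> [('D', 'C', 'B'), ('C', 'A', 'B')]
```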
Now, let’s check pruning.
Step 4: Pruning Check for {D, C, B}
To keep {D, C, B}, all its 2-item subsets must be frequent:
1. {D, C} → Support = 2 ✅
2. {D, B} → Support = 2 ✅
3. {C, B} → Support = 3 ✅
✅ All subsets are frequent, so {D, C, B} is kept.
Step 5: Pruning Check for {C, A, B}
To keep {C, A, B}, all its 2-item subsets must be frequent:
1. {C, A} → Support = 2 ✅
2. {C, B} → Support = 3 ✅
3. {A, B} → Support = 2 ✅
✅ All subsets are frequent, so {C, A, B} is kept.
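Both checks follow the same rule, sketched below. (A nuance: the published MSApriori prune relaxes this test for subsets that omit the lowest-MIS item; for this example the simple all-subsets version gives the same outcome.)

```python
# Simplified prune: a candidate survives only if every (k-1)-subset is
# frequent. prev_frequent is a set of frozensets.
from itertools import combinations

def passes_prune(candidate, prev_frequent):
    return all(
        frozenset(s) in prev_frequent
        for s in combinations(candidate, len(candidate) - 1)
    )
```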
Step 6: Generate Candidate 4-Itemset
Now, we generate {D, C, A, B}.
To be valid, all 3-item subsets must be frequent:
Subset Support Count Required MIS (min) Meets MIS?
{D, C, B} 2 20% (2 transactions) ✅ Yes
{D, C, A} 1 20% (2 transactions) ❌ No
{D, A, B} 1 20% (2 transactions) ❌ No
{C, A, B} 2 30% (2 transactions) ✅ Yes
🚨 Since {D, C, A} and {D, A, B} are not frequent, {D, C, A, B} is pruned!
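The Step 6 decision, replayed with passes_prune from the sketch above:

```python
# The frequent 3-itemsets found above, as a set of frozensets.
F3_sets = {frozenset(s) for s in [("D", "C", "B"), ("C", "A", "B")]}
print(passes_prune(("D", "C", "A", "B"), F3_sets))
# False -- {D, C, A} and {D, A, B} are missing from the frequent 3-itemsets.
```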
✅ Final Frequent 3-Itemsets
✅ {D, C, B}
✅ {C, A, B}
❌ {D, C, A, B} (pruned because {D, C, A} and {D, A, B} are infrequent)
Key Takeaways
🔹 Pruning happens when a candidate k-itemset contains an infrequent (k-1)-item subset.
🔹 Here, {D, C, A, B} was pruned because {D, C, A} and {D, A, B} were infrequent.
🔹 This pruning helps reduce unnecessary computations and improve efficiency.
🔹 How Are Association Rules Formed?
Association rules are extracted from frequent itemsets by dividing them into antecedents
(LHS) and consequents (RHS) and evaluating their confidence.
A rule X → Y is valid if:
Confidence = Support(X ∪ Y) / Support(X) ≥ MinConfidence
where:
X → Y is a candidate rule
Support(X ∪ Y) = frequency of the whole itemset (antecedent and consequent together)
Support(X) = frequency of the antecedent
MinConfidence = user-defined threshold (e.g., 60%)
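A small sketch of this computation, assuming set-valued transactions as before (names are illustrative):

```python
# Sketch of rule generation from one frequent itemset; min_conf is the
# MinConfidence threshold from the formula above.
from itertools import combinations

def rules_from_itemset(itemset, transactions, min_conf=0.6):
    def support(items):
        return sum(1 for t in transactions if set(items) <= t)

    whole = support(itemset)
    rules = []
    for r in range(1, len(itemset)):          # every proper split
        for lhs in combinations(itemset, r):
            conf = whole / support(lhs)
            if conf >= min_conf:
                rhs = tuple(i for i in itemset if i not in lhs)
                rules.append((lhs, rhs, conf))
    return rules
```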
🔹 Example: Generating Rules from {C, A, B}
Step 1: Consider the frequent itemset {C, A, B}
From our example, Support({C, A, B}) = 2.
We generate association rules by splitting {C, A, B}; a representative set of splits is shown below (the remaining single-item antecedents {A} and {B} are evaluated the same way):
Rule Confidence Calculation
{C, A} → B Conf = Support({C, A, B}) / Support({C, A}) = 2 / 2 = 100% ✅
{C, B} → A Conf = Support({C, A, B}) / Support({C, B}) = 2 / 3 = 66.67% ✅
{A, B} → C Conf = Support({C, A, B}) / Support({A, B}) = 2 / 2 = 100% ✅
{C} → {A, B} Conf = Support({C, A, B}) / Support({C}) = 2 / 3 = 66.67% ✅
Rules that meet MinConfidence (e.g., 60%) are accepted.
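Running rules_from_itemset on {C, A, B} reproduces this table, and also evaluates the single-item antecedents the table omits:

```python
for lhs, rhs, conf in rules_from_itemset(("C", "A", "B"), transactions):
    print(f"{set(lhs)} -> {set(rhs)}: {conf:.2%}")
# Matches the table, plus {A} -> {C, B} at 66.67%;
# {B} -> {C, A} fails at 2/4 = 50%.
```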
🔹 Example: Generating Rules from {D, C, B}
From our example, Support({D, C, B}) = 2.
Representative association rules:
Rule Confidence Calculation
{D, C} → B Conf = Support({D, C, B}) / Support({D, C}) = 2 / 2 = 100% ✅
{D, B} → C Conf = Support({D, C, B}) / Support({D, B}) = 2 / 2 = 100% ✅
{C, B} → D Conf = Support({D, C, B}) / Support({C, B}) = 2 / 3 = 66.67% ✅
{D} → {C, B} Conf = Support({D, C, B}) / Support({D}) = 2 / 3 = 66.67% ✅
Again, rules that meet minConfidence are accepted.
🔹 Final Output: Association Rules
After calculating confidence, we only keep rules where confidence ≥ MinConfidence (e.g.,
60%).
✅ Final valid rules (from the splits evaluated above):
1. {C, A} → B (100%)
2. {C, B} → A (66.67%)
3. {A, B} → C (100%)
4. {C} → {A, B} (66.67%)
5. {D, C} → B (100%)
6. {D, B} → C (100%)
7. {C, B} → D (66.67%)
8. {D} → {C, B} (66.67%)
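Assuming all of the sketches above are in scope, the whole walkthrough can be replayed end to end:

```python
# End-to-end replay of the example (illustrative only).
n = len(transactions)
L, F1, counts = init_pass(transactions, mis)
F2 = frequent_pairs(transactions, mis, F1)

C3 = join_step([pair for pair, _ in F2])
F2_sets = {frozenset(pair) for pair, _ in F2}
F3 = []
for cand in C3:
    if not passes_prune(cand, F2_sets):
        continue
    support = sum(1 for t in transactions if set(cand) <= t)
    if support / n >= min(mis[i] for i in cand):
        F3.append(cand)

print(F3)   # [('D', 'C', 'B'), ('C', 'A', 'B')]
for itemset in F3:
    for lhs, rhs, conf in rules_from_itemset(itemset, transactions):
        print(f"{set(lhs)} -> {set(rhs)}: {conf:.0%}")
# Prints the eight rules above plus the untabulated single-item splits
# {A} -> {C, B} and {C} -> {D, B}, both at 66.67%.
```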