
Data Mining and Predictive Modeling
Lecture 9: Association Rule Mining, Apriori Algorithm
Association Rule Mining
• It is an important data mining model studied extensively by the
database and data mining community.
• Assume all data are categorical.
• Initially used for market basket analysis to find how items purchased
by customers are related, e.g.:

Bread → Milk [sup = 5%, conf = 100%]

(Read: 5% of all transactions contain both bread and milk, and every
transaction containing bread also contains milk.)
Frequent Pattern Analysis
• Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set.
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the
context of frequent itemsets and association rule mining.
• Motivation: Finding inherent regularities in data
• What products were often purchased together?
• What are the subsequent purchases after buying a PC?
• Can we automatically classify web documents?
• Applications:
• Basket data analysis, cross-marketing, catalog design, sale campaign analysis
Market Basket Analysis
Why Frequent Pattern Mining?
• Freq. pattern: An intrinsic and important property of datasets
• Foundation for many essential data mining tasks
• Association, correlation, and causality analysis
• Sequential, structural (e.g., sub-graph) patterns
• Pattern analysis in spatiotemporal, multimedia, time-series, and
stream data
• Classification: frequent pattern analysis
• Cluster analysis: frequent pattern-based clustering
• Data warehousing: iceberg cube and cube-gradient
• Semantic data compression
• Broad applications
Basic Concepts: Frequent Patterns

• I = {i1, i2, …, im}: a set of items.
• Transaction t: a set of items such that t ⊆ I.
• Transaction Database T: a set of transactions T = {t1, t2, …, tn}.
Transaction Data: Supermarket
• Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
… …
tn: {biscuit, eggs, milk}
• Concepts:
• An item: an item/article in a basket
• I: the set of all items sold in the store
• A transaction: the items purchased in one basket; it may have a TID
(transaction ID)
• A transactional dataset: A set of transactions
The Model: Rules
• A transaction t contains X, a set of items (itemset) in I, if X ⊆ t.
• An association rule is an implication of the form
  X → Y, where X, Y ⊂ I and X ∩ Y = ∅.
• An itemset is a set of items.
  • E.g., X = {milk, bread, cereal} is an itemset.
• A k-itemset is an itemset with k items.
  • E.g., {milk, bread, cereal} is a 3-itemset.
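
To make these definitions concrete, here is a minimal Python sketch (the representation is our choice, not the slides'): itemsets as frozensets, transactions as plain sets.

# Itemsets as frozensets: hashable, so they can later serve as
# dictionary keys for support counting.
itemset = frozenset({"milk", "bread", "cereal"})   # a 3-itemset
transaction = {"milk", "bread", "cereal", "eggs"}

print(len(itemset))                   # k = 3
print(itemset.issubset(transaction))  # True: t contains X, since X ⊆ t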
Example

Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk

(Figure: Venn diagram of customers who buy beer, diaper, or both.)

• itemset: A set of one or more items
• k-itemset X = {x1, …, xk}
• (absolute) support, or support count, of X: frequency of occurrence of
the itemset X
• (relative) support, s: the fraction of transactions that contain X
(i.e., the probability that a transaction contains X)
• An itemset X is frequent if X's support is no less than a minsup
threshold
Association Rules

Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk

• Find all the rules X → Y with minimum support and confidence
  • support, s: probability that a transaction contains X ∪ Y
  • confidence, c: conditional probability that a transaction containing
    X also contains Y
• Let minsup = 50%, minconf = 50%
  • Frequent patterns: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
  • Association rules (among many more):
    • Beer → Diaper (support 60%, confidence 100%)
    • Diaper → Beer (support 60%, confidence 75%)
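
As an illustration (code and names are ours), a minimal Python sketch that computes the support and confidence values above from the transaction table:

# The five transactions from the slide (TIDs 10-50).
db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]

def support(X):
    # Fraction of transactions that contain every item in X.
    return sum(X <= t for t in db) / len(db)

def confidence(X, Y):
    # Conditional probability P(Y | X) = sup(X ∪ Y) / sup(X).
    return support(X | Y) / support(X)

print(support({"Beer"} | {"Diaper"}))    # 0.6  -> 60% support
print(confidence({"Beer"}, {"Diaper"}))  # 1.0  -> Beer → Diaper, 100%
print(confidence({"Diaper"}, {"Beer"}))  # 0.75 -> Diaper → Beer, 75%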
Frequent Itemset Mining Methods
• Apriori: A Candidate Generation-and-Test Approach
• Improving the Efficiency of Apriori
• FPGrowth: A Frequent Pattern-Growth Approach
• Frequent Pattern Mining with Vertical Data Format

Apriori Algorithm
• Proposed by R. Agrawal and R. Srikant in 1994.
• The algorithm is so named because it uses prior knowledge of the
frequent itemset properties.
• Apriori employs an iterative approach known as a level-wise search,
where k-itemsets are used to explore (k+1)-itemsets.
Apriori Method
• Method:
• Initially, scan DB once to get frequent 1-itemset
• Generate length (k+1) candidate itemsets from length k frequent
itemsets
• Terminate when no frequent or candidate set can be generated
• To improve the efficiency of the level-wise generation of frequent
itemsets, the Apriori property is used to reduce the search space:
• All non-empty subsets of a frequent itemset must also be frequent
• If {beer, diaper, nuts} is frequent, so is {beer, diaper}
Apriori - Example

Let minsup = 2 (absolute support count).

Database TDB:
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan → C1:
Itemset   sup
{A}       2
{B}       3
{C}       3
{D}       1
{E}       3

L1 ({D} pruned, sup < 2):
Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3

C2 (generated from L1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan → C2 with counts:
Itemset   sup
{A, B}    1
{A, C}    2
{A, E}    1
{B, C}    2
{B, E}    3
{C, E}    2

L2 ({A, B} and {A, E} pruned):
Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2

C3 (generated from L2): {B, C, E}

3rd scan → L3:
Itemset     sup
{B, C, E}   2
Apriori - Algorithm
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent 1-itemsets};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1
        that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
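
Below is a self-contained Python sketch of this level-wise loop (naming and representation are ours; an absolute support count is assumed). It reproduces the four-transaction example above:

from itertools import combinations

def apriori(db, min_count):
    # db: list of transactions (sets of items); min_count: absolute support.
    # Returns a dict mapping each frequent itemset (frozenset) to its count.
    counts = {}
    for t in db:                      # one scan for the 1-itemsets
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s for s, c in counts.items() if c >= min_count}
    frequent = {s: counts[s] for s in Lk}
    k = 1
    while Lk:
        # Generate (k+1)-candidates from Lk, pruning any candidate
        # with an infrequent k-subset (the Apriori property).
        Ck1 = set()
        for a in Lk:
            for b in Lk:
                u = a | b
                if len(u) == k + 1 and all(
                    frozenset(s) in Lk for s in combinations(u, k)
                ):
                    Ck1.add(u)
        counts = {c: 0 for c in Ck1}
        for t in db:                  # one scan per level
            for c in Ck1:
                if c <= t:
                    counts[c] += 1
        Lk = {c for c, n in counts.items() if n >= min_count}
        frequent.update({c: counts[c] for c in Lk})
        k += 1
    return frequent

db = [{"A","C","D"}, {"B","C","E"}, {"A","B","C","E"}, {"B","E"}]
print(apriori(db, 2))   # includes frozenset({'B','C','E'}) with count 2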
Apriori - Implementation
• How to generate candidates?
• Step 1: self-joining Lk
• Step 2: pruning
• Example of Candidate-generation
• L3={abc, abd, acd, ace, bcd}
• Self-joining: L3*L3
• abcd from abc and abd
• acde from acd and ace
• Pruning:
• acde is removed because ade is not in L3
• C4 = {abcd}
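
A compact sketch of these two steps on this example (we join by set union rather than the textbook prefix-based join; after pruning the result is identical):

from itertools import combinations

def gen_candidates(Lk, k):
    # Step 1 (self-join): unions of two frequent k-itemsets of size k+1.
    joined = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
    # Step 2 (prune): keep only candidates whose k-subsets are all in Lk.
    return {c for c in joined
            if all(frozenset(s) in Lk for s in combinations(c, k))}

L3 = {frozenset(s) for s in ("abc", "abd", "acd", "ace", "bcd")}
C4 = gen_candidates(L3, 3)
print(["".join(sorted(c)) for c in C4])  # ['abcd']; acde pruned (ade not in L3)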
Generating Association Rules from Frequent Itemsets
• Method
  • For each frequent itemset l, generate all nonempty proper subsets of l
  • For every nonempty subset s of l, output the rule s → (l − s) if
    support_count(l) / support_count(s) >= min_conf
• Because the rules are generated from frequent itemsets, each one
automatically satisfies the minimum support threshold
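
A Python sketch of this procedure (names are ours), demonstrated on the frequent itemset {B, C, E} and the support counts from the four-transaction example earlier:

from itertools import combinations

def gen_rules(l, support_count, min_conf):
    # For every nonempty proper subset s of l, emit s -> (l - s)
    # when support_count(l) / support_count(s) >= min_conf.
    rules = []
    for r in range(1, len(l)):
        for subset in combinations(l, r):
            s = frozenset(subset)
            conf = support_count[l] / support_count[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

sc = {frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
      frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
      frozenset("BCE"): 2}
for lhs, rhs, conf in gen_rules(frozenset("BCE"), sc, 0.7):
    print(lhs, "->", rhs, f"conf={conf:.2f}")
# Prints {B, C} -> {E} and {C, E} -> {B}, each with conf = 1.00.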
Generating Association Rules - Example
• Example: given a transaction database in which X = {I1, I2, I5} is a
frequent itemset, generate the association rules from X.
• The nonempty proper subsets of X are:
  {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}
• Each subset s yields the candidate rule s → (X − s):
  {I1, I2} → {I5}, {I1, I5} → {I2}, {I2, I5} → {I1},
  {I1} → {I2, I5}, {I2} → {I1, I5}, {I5} → {I1, I2}
• If the minimum confidence threshold is, say, 70%, then only the second,
third, and last rules are output, because these are the only ones
generated that are strong
Improving the Efficiency of Apriori
• Major computational challenges
• Multiple scans of transaction database
• Huge number of candidates
• Tedious workload of support counting for candidates
• Improving Apriori: general ideas
• Reduce passes of transaction database scans
• Shrink number of candidates
• Facilitate support counting of candidates
Scan Database Only Twice
• Scan 1: partition the database and find the local frequent patterns in
each partition; any globally frequent itemset must be frequent in at
least one partition.
• Scan 2: count the union of the local frequent patterns against the full
database to consolidate the global frequent patterns.
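
A Python sketch of this two-scan scheme (partition count and names are illustrative; it assumes an apriori(db, min_count) function like the sketch above):

def two_scan_frequent(db, min_sup_frac, n_parts=4):
    # Scan 1: mine each partition locally at the same relative threshold.
    # The union of local frequent itemsets is a superset of the global ones.
    size = max(1, len(db) // n_parts)
    candidates = set()
    for i in range(0, len(db), size):
        part = db[i:i + size]
        local_min = max(1, int(min_sup_frac * len(part)))
        candidates |= set(apriori(part, local_min))
    # Scan 2: count the candidates once against the full database.
    global_min = min_sup_frac * len(db)
    return {c for c in candidates
            if sum(c <= t for t in db) >= global_min}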
Hash-Based Technique: Reduce the Number of Candidates
• A k-itemset whose corresponding hashing bucket count is below the
threshold cannot be frequent
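
A sketch of this hashing idea for 2-itemsets, in the spirit of the DHP technique (bucket count and hash function are illustrative choices): while performing the first scan, every pair in each transaction is hashed into a small bucket array; a pair whose bucket total falls below the threshold cannot be frequent, so it can be dropped from C2 before counting. Collisions only inflate bucket counts, so the filter never wrongly discards a frequent pair.

from itertools import combinations

def hash_filter_pairs(db, min_count, n_buckets=7):
    # First scan: hash every 2-itemset occurrence into a bucket.
    buckets = [0] * n_buckets
    def h(pair):
        return hash(frozenset(pair)) % n_buckets
    for t in db:
        for pair in combinations(sorted(t), 2):
            buckets[h(pair)] += 1
    # A pair can be frequent only if its bucket reached min_count.
    return lambda pair: buckets[h(pair)] >= min_count

db = [{"A","C","D"}, {"B","C","E"}, {"A","B","C","E"}, {"B","E"}]
maybe_frequent = hash_filter_pairs(db, 2)
print(maybe_frequent(("A", "C")))  # True: {A, C} occurs twice, so its
                                   # bucket count is at least 2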
Example
• Consider the following database with five transactions. Let
min_sup = 60% and min_conf = 80%.
• Find all the frequent itemsets using the Apriori method.
• List all the strong association rules matching the following metarule.
Thank You
