Association Rules in Data Mining

Uploaded by Soma Srinivas

2.4 Generating Association Rules from Frequent Itemsets

Generating association rules from frequent itemsets is a crucial step in data mining that helps
us derive actionable insights from the data. Here’s a detailed discussion on how to generate
these rules, including an example for clarity.

1. Overview of Association Rules

Association rules are used to find relationships between different items in a dataset. Each rule
is of the form:

Antecedent → Consequent

Where:

 Antecedent (or Left-hand side) is a set of items.
 Consequent (or Right-hand side) is another set of items.

The goal is to discover rules that show a strong relationship between items. To measure the
strength of these rules, we use several metrics:

 Support: The proportion of transactions that contain both the antecedent and the
consequent.
 Confidence: The proportion of transactions containing the antecedent that also
contain the consequent.
 Lift: The ratio of the observed support to the expected support if the items were
independent.
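These three metrics can be written directly as small functions. The sketch below is a minimal illustration; the five-transaction list is hypothetical, chosen only so the numbers are easy to check by hand:

```python
# Hypothetical transactions, for illustration only.
transactions = [
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "diapers"},
    {"milk", "bread", "diapers"},
    {"milk", "bread", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent): joint support over antecedent support."""
    joint = set(antecedent) | set(consequent)
    return support(joint, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's baseline support."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

print(round(support({"milk", "bread"}, transactions), 4))       # 0.6
print(round(confidence({"milk"}, {"bread"}, transactions), 4))  # 0.75
print(round(lift({"milk"}, {"bread"}, transactions), 4))        # 0.9375
```

A lift below 1, as here, means the two items co-occur slightly less often than they would if purchased independently.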

2. Steps for Generating Association Rules

1. Find Frequent Itemsets: Identify all the itemsets that appear frequently in the
dataset, i.e., those that meet a minimum support threshold.
2. Generate Rules from Frequent Itemsets: For each frequent itemset, generate all
possible rules and evaluate them using metrics such as confidence and lift.
3. Prune Rules: Discard rules that do not meet the minimum confidence threshold or
other criteria.
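The three steps above can be sketched end to end. This brute-force version enumerates every itemset, so it only suits tiny datasets; the transactions and thresholds are hypothetical:

```python
from itertools import combinations

# Hypothetical toy data and thresholds, for illustration only.
transactions = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C"}]
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6

def support(itemset):
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

# Step 1: find frequent itemsets (brute force over all item combinations).
items = sorted(set().union(*transactions))
frequent = {
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if support(c) >= MIN_SUPPORT
}

# Steps 2-3: generate every antecedent -> consequent split,
# keeping only rules that meet the confidence threshold.
rules = []
for itemset in frequent:
    if len(itemset) < 2:
        continue
    for k in range(1, len(itemset)):
        for ante in combinations(itemset, k):
            ante = frozenset(ante)
            conf = support(itemset) / support(ante)
            if conf >= MIN_CONFIDENCE:
                rules.append((set(ante), set(itemset - ante), conf))

for a, c, conf in sorted(rules, key=lambda r: -r[2]):
    print(f"{a} -> {c}  (confidence {conf:.2f})")
```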

3. Example

Let’s walk through an example using a hypothetical transaction dataset. Consider a dataset
with the following transactions:

Transaction ID Items
1 {Milk, Bread}
2 {Milk, Diapers, Beer}
3 {Bread, Diapers}
4 {Milk, Bread, Diapers}
5 {Milk, Bread, Beer}
Assume we have already mined the frequent itemsets and found that the frequent itemsets
are:

 {Milk, Bread} with support = 60%
 {Milk, Diapers} with support = 40%
 {Bread, Diapers} with support = 40%
 {Milk} with support = 80%
 {Bread} with support = 80%
 {Diapers} with support = 60%

Let’s generate association rules from these frequent itemsets.

Example 1: Generating Rules from Frequent Itemset {Milk, Bread}

1. Generate Rules: From the itemset {Milk, Bread}, we can generate the following
rules:
o Rule 1: {Milk} → {Bread}
o Rule 2: {Bread} → {Milk}
2. Calculate Metrics:

For Rule 1: {Milk} → {Bread}

o Support: Support of {Milk, Bread} = 60% (since 3 out of 5 transactions
contain both Milk and Bread)
o Confidence: Confidence of {Milk} → {Bread} = (Support of {Milk, Bread}) /
(Support of {Milk}) = 60% / 80% = 75%
o Lift: Lift = Confidence of {Milk} → {Bread} / (Support of {Bread}) = 75% /
80% = 0.9375

For Rule 2: {Bread} → {Milk}

o Support: Same as Rule 1 = 60%
o Confidence: Confidence of {Bread} → {Milk} = (Support of {Milk, Bread}) /
(Support of {Bread}) = 60% / 80% = 75%
o Lift: Same as Rule 1 = 0.9375

Both rules have fairly high confidence (75%), but the lift is slightly below 1, which
suggests Milk and Bread co-occur slightly less often than they would if purchased
independently.
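The arithmetic for Rule 1 can be double-checked in a few lines, starting from the supports stated above:

```python
# Re-deriving Rule 1's metrics from the stated supports.
support_milk_bread = 0.60
support_milk = 0.80
support_bread = 0.80

confidence_rule1 = support_milk_bread / support_milk  # {Milk} -> {Bread}
lift_rule1 = confidence_rule1 / support_bread

print(round(confidence_rule1, 4))  # 0.75
print(round(lift_rule1, 4))        # 0.9375
```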

Example 2: Generating Rules from Frequent Itemset {Milk, Diapers}

1. Generate Rules: From the itemset {Milk, Diapers}, we can generate:
o Rule 1: {Milk} → {Diapers}
o Rule 2: {Diapers} → {Milk}
2. Calculate Metrics:

For Rule 1: {Milk} → {Diapers}

o Support: Support of {Milk, Diapers} = 40%
o Confidence: Confidence of {Milk} → {Diapers} = (Support of {Milk,
Diapers}) / (Support of {Milk}) = 40% / 80% = 50%
o Lift: Lift = Confidence of {Milk} → {Diapers} / (Support of {Diapers}) =
50% / 60% = 0.8333

For Rule 2: {Diapers} → {Milk}

o Support: Same as Rule 1 = 40%
o Confidence: Confidence of {Diapers} → {Milk} = (Support of {Milk,
Diapers}) / (Support of {Diapers}) = 40% / 60% = 66.67%
o Lift: Lift = Confidence of {Diapers} → {Milk} / (Support of {Milk}) =
66.67% / 80% = 0.8333

Here, the confidence for {Diapers} → {Milk} (66.67%) is higher than for
{Milk} → {Diapers} (50%), which may indicate a stronger relationship in that
direction, although the lift of 0.8333 shows the two items still co-occur less
often than independence would predict.

4. Practical Considerations

 Minimum Support and Confidence: Set thresholds for support and confidence to
filter out insignificant rules.
 Complexity: Generating and evaluating rules for very large itemsets can be
computationally expensive.
 Interpretation: Ensure that the rules make practical sense in the business context.
Lift values above 1 indicate a positive association between antecedent and consequent.

5. Conclusion

Generating association rules from frequent itemsets helps in uncovering valuable patterns and
relationships within the data. By evaluating the rules based on support, confidence, and lift,
one can derive meaningful insights that can inform decision-making processes in various
domains, from retail to healthcare.

2. Frequent Itemset Mining Methods

2.1 Apriori Algorithm


 Principle: Uses the "apriori property" that all subsets of a frequent itemset must also
be frequent.
 Steps:
1. Generate Frequent 1-itemsets: Scan the dataset and count item frequencies.
2. Generate Candidate Itemsets: Use frequent itemsets to generate larger
candidate itemsets.
3. Prune Candidates: Remove candidate itemsets that are not frequent.
4. Repeat: Continue until no more frequent itemsets can be found.
 Example: {A, B, C} is considered as a candidate for the next iteration only if all of
its subsets {A, B}, {A, C}, and {B, C} are frequent; if any subset is infrequent,
{A, B, C} can be pruned without counting it.
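The steps above can be sketched as a simplified Apriori: join frequent (k−1)-itemsets into k-item candidates, prune by the apriori property, then count supports. The dataset in the usage example is hypothetical:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Simplified Apriori sketch: returns {itemset: support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Step 1: frequent 1-itemsets.
    frequent = {}
    for i in set().union(*transactions):
        s = support(frozenset([i]))
        if s >= min_support:
            frequent[frozenset([i])] = s
    result = dict(frequent)

    k = 2
    while frequent:
        # Step 2: join frequent (k-1)-itemsets into k-item candidates.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Step 3 (apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        # Step 4: keep candidates meeting the threshold, then repeat.
        frequent = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                frequent[c] = s
        result.update(frequent)
        k += 1
    return result

# Hypothetical usage:
transactions = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C"}]
for itemset, sup in sorted(apriori(transactions, 0.4).items(), key=lambda kv: -kv[1]):
    print(set(itemset), sup)
```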

2.2 Finding Frequent Itemsets by Confined Candidate Generation

 Concept: Reduces the search space by generating candidate (k+1)-itemsets only from
frequent k-itemsets and discarding any candidate that has an infrequent subset.
 Efficiency Improvement: Reduces computational overhead and memory usage.

2.3 FPGrowth (Frequent Pattern Growth)

 Concept: Avoids candidate generation by using a compact data structure called FP-
tree (Frequent Pattern Tree).
 Steps:
1. Construct FP-Tree: Scan the dataset and build the FP-tree by inserting
transactions.
2. Mine FP-Tree: Extract frequent itemsets by traversing the FP-tree and
generating patterns.
 Advantages: Generally more efficient than Apriori, particularly for large datasets.
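A minimal sketch of the tree-construction step only (the recursive mining pass is omitted for brevity): items in each transaction are reordered by global frequency so that transactions sharing a prefix share tree nodes. The dataset is hypothetical:

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support_count):
    """Build an FP-tree: drop infrequent items, order the rest by descending
    global frequency, then insert each transaction along shared prefixes."""
    counts = Counter(item for t in transactions for item in t)
    kept = {i for i, c in counts.items() if c >= min_support_count}
    root = FPNode(None, None)
    for t in transactions:
        # Sort kept items by frequency (ties broken alphabetically).
        ordered = sorted((i for i in t if i in kept),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root

# Hypothetical usage: the shared "bread" prefix is stored once, with a count.
transactions = [{"bread", "milk"}, {"bread", "diapers"}, {"bread", "milk", "diapers"}]
root = build_fp_tree(transactions, min_support_count=2)
print({item: node.count for item, node in root.children.items()})  # {'bread': 3}
```

This compression of shared prefixes is why FP-Growth avoids the repeated dataset scans that Apriori's candidate counting requires.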

2.4 Generating Association Rules from Frequent Itemsets

 Steps:
1. Generate Rules: For each frequent itemset, generate rules by splitting the
itemset into antecedents and consequents.
2. Calculate Metrics: Compute support, confidence, and lift to evaluate the
strength and usefulness of each rule.
 Example: From the frequent itemset {A, B, C}, generate rules like {A, B} -> {C}.
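Splitting an itemset into antecedent/consequent pairs is just subset enumeration; a sketch for the {A, B, C} example:

```python
from itertools import combinations

def candidate_rules(itemset):
    """Every non-empty proper subset becomes an antecedent;
    the remaining items form the consequent."""
    itemset = frozenset(itemset)
    for k in range(1, len(itemset)):
        for ante in combinations(sorted(itemset), k):
            yield set(ante), set(itemset) - set(ante)

for a, c in candidate_rules({"A", "B", "C"}):
    print(f"{sorted(a)} -> {sorted(c)}")
```

A 3-itemset yields 6 candidate rules; each would then be scored with support, confidence, and lift as described above.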

2.5 Improving the Efficiency of Apriori

 Techniques:
1. Transaction Reduction: Remove transactions that cannot contain any frequent
itemsets from subsequent scans.
2. Itemset Pruning: Use the fact that if an itemset is infrequent, all its supersets
will be infrequent.
3. Partitioning: Partition the database into smaller chunks to reduce
computation.
 Example: Implementing a hash-based technique to reduce candidate itemsets.
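Transaction reduction (technique 1) can be sketched as follows: a transaction with fewer than two frequent items cannot contribute to any frequent 2-itemset, so it can be dropped before the next scan. The transactions and threshold below are hypothetical:

```python
from collections import Counter

# Hypothetical transactions and threshold, for illustration only.
transactions = [{"A", "B"}, {"A", "B", "C"}, {"D"}, {"A", "C"}, {"E", "F"}]
MIN_SUPPORT_COUNT = 2

# Pass 1: count single items and find the frequent ones.
counts = Counter(item for t in transactions for item in t)
frequent_items = {i for i, c in counts.items() if c >= MIN_SUPPORT_COUNT}

# Transaction reduction: keep only transactions that could still hold a
# frequent 2-itemset, and strip their infrequent items while we're at it.
reduced = [t & frequent_items for t in transactions
           if len(t & frequent_items) >= 2]

print(len(transactions), "->", len(reduced))  # 5 -> 3
```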
3. From Association Analysis to Correlation Analysis

Association analysis and correlation analysis are both important techniques in data mining
and statistics, but they serve different purposes and are used in different contexts. Here’s a
detailed discussion on how these analyses relate and transition from one to the other, with
examples for clarity.

1. Overview of Association Analysis

Association Analysis primarily focuses on identifying relationships and patterns within
transactional or categorical data. It is often used to discover frequent itemsets and derive
association rules that reveal how items are related. The metrics used in association analysis
include:

 Support: Measures how often an itemset appears in the dataset.
 Confidence: Indicates how often the consequent of the rule appears when the
antecedent is present.
 Lift: Measures how much more likely the consequent is to be present when the
antecedent is present, compared to the expected likelihood if the antecedent and
consequent were independent.

Example: In a retail dataset, if customers who buy bread often buy milk, an association rule
might be Bread → Milk, with metrics such as support = 40%, confidence = 75%, and
lift = 1.2.

2. Overview of Correlation Analysis

Correlation Analysis examines the relationship between two continuous variables to
determine whether they move together and the strength of their relationship. Unlike
association analysis, which works with categorical data and focuses on itemsets, correlation
analysis uses statistical measures to quantify relationships between continuous variables. Key
metrics include:

 Pearson Correlation Coefficient (r): Measures the linear relationship between two
variables. Values range from -1 to 1, where -1 indicates a perfect negative linear
relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear
relationship.
 Spearman's Rank Correlation: Measures the strength and direction of the
association between two ranked variables. It’s useful for non-linear relationships.
 Kendall’s Tau: Another rank-based measure of correlation, useful for small sample
sizes and when dealing with ordinal data.

Example: In a dataset of students’ study hours and exam scores, Pearson correlation might
reveal a strong positive correlation (r = 0.85), indicating that more study hours are associated
with higher exam scores.
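Pearson's r can be computed directly from its definition (covariance over the product of standard deviations). The study-hours data below are made up for illustration and do not reproduce the r = 0.85 figure above:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the
    product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Hypothetical data: study hours vs. exam scores.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 68, 74, 83, 88]
print(round(pearson_r(hours, scores), 3))
```

In practice one would typically use a library routine (e.g. `scipy.stats.pearsonr`), which also reports a significance level.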

3. Transition from Association to Correlation Analysis

While association analysis is used to find relationships between categorical items or events,
correlation analysis is used to quantify the strength and direction of relationships between
continuous variables. Here’s how you can transition from association analysis to correlation
analysis:

1. Data Preparation:
o Association Analysis: Deals with categorical data and is often used in market
basket analysis, where the goal is to find patterns like {bread, milk} →
{butter}.
o Correlation Analysis: Requires continuous data and is used to explore how
changes in one continuous variable affect another, such as how temperature
affects ice cream sales.
2. Finding Relationships:
o Association Analysis: Uses rules and metrics like support, confidence, and lift
to describe how often items are purchased together and how strong the
relationship is.
o Correlation Analysis: Uses statistical measures to describe how changes in
one variable are associated with changes in another variable.
3. Applications:
o Association Analysis: Used for recommendations (e.g., suggesting products),
understanding consumer behavior, and identifying common itemsets.
o Correlation Analysis: Used for predicting values, understanding trends, and
exploring possible relationships (bearing in mind that correlation alone does
not establish causation).

4. Example of Transition

Let’s illustrate with an example involving a retail store:

Dataset: Transactions from a store including items bought and customer demographics like
age and income.

1. Association Analysis:
o Frequent Itemsets: {Milk, Bread}, {Diapers, Beer}
o Association Rule: {Milk} → {Bread}, with support = 60%, confidence =
75%, and lift = 0.9375.
2. Correlation Analysis:
o Data: Suppose we have continuous data on customer age and their total
spending.
o Pearson Correlation: Calculate the Pearson correlation coefficient between
age and total spending.
 If the correlation coefficient is 0.65, it suggests a moderate positive
relationship, meaning as customers get older, their spending tends to
increase.

Combined Insight:

 Association Analysis helps identify that people who buy milk often buy bread.
 Correlation Analysis might reveal that older customers spend more, possibly leading
to targeted marketing strategies.

5. Practical Considerations

 Data Type: Association analysis is suited for categorical data, while correlation
analysis is used for continuous data.
 Objective: Use association analysis to find patterns and relationships in categorical
data and correlation analysis to understand relationships between continuous
variables.
 Interpretation: Association rules provide actionable insights for categorical patterns
(e.g., “Customers who buy milk often buy bread”), while correlation analysis helps
understand continuous relationships (e.g., “Older customers tend to spend more”).

6. Conclusion

Transitioning from association analysis to correlation analysis involves understanding the
type of data and the goal of the analysis. While association analysis is used for discovering
relationships between categorical items, correlation analysis quantifies the strength and
direction of relationships between continuous variables. Both techniques provide valuable
insights that can be used in various applications, from market basket analysis to
understanding customer behavior and trends.
