Association Rules: Prof. Sin-Min Lee, Department of Computer Science

The document discusses association rule mining and the Apriori algorithm. Association rule mining aims to discover relationships between items in large datasets. The Apriori algorithm uses a multi-pass approach to efficiently find all frequent itemsets, which are then used to generate association rules. It works by first finding all frequent individual items, and then uses candidate generation to iteratively find longer frequent itemsets, pruning the search space based on the downward closure property.

Uploaded by

anees_kassem

Association Rules

Prof. Sin-Min Lee Department of Computer Science

Medical

Association rule: Cholesterol level -> Heart condition

Real application in medicine: the discovery of interesting association relationships among huge numbers of gene mutations can help in determining the cause of mutation in tumours and diseases.

Examples.

Rule form: Body -> Head [support, confidence].
buys(x, diapers) -> buys(x, beers) [0.5%, 60%]
major(x, CS) ^ takes(x, DB) -> grade(x, A) [1%, 75%]

Support
Simplest question: find sets of items that appear frequently in the baskets. Support for itemset I = the number of baskets containing all items in I. Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.

Example
Items = {milk, coke, pepsi, beer, juice}. Support threshold = 3 baskets.
B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}
Frequent itemsets: {m}, {c}, {b}, {j}, {m, b}, {c, b}, {j, c}.
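The example can be checked by brute force. The following is a minimal sketch, assuming the baskets are held as Python sets in main memory; the names `BASKETS`, `support`, and `frequent_itemsets_brute_force` are illustrative and not from the slides. It enumerates every possible itemset, which is only feasible for a toy item set like this one.

```python
from itertools import combinations

# The eight baskets from the example (m=milk, c=coke, p=pepsi, b=beer, j=juice).
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)

def frequent_itemsets_brute_force(baskets, s):
    """Enumerate every non-empty itemset and keep those with support >= s."""
    items = sorted(set().union(*baskets))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            count = support(candidate, baskets)
            if count >= s:
                result[frozenset(candidate)] = count
    return result

frequent = frequent_itemsets_brute_force(BASKETS, s=3)
# Print smaller itemsets first, then alphabetically within each size.
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Note that {j, c} appears in exactly 3 baskets, so it is frequent only because the threshold is met with equality.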

Applications --- (1)

Real market baskets: chain stores keep terabytes of information about what customers buy together.
Tells how typical customers navigate stores, lets them position tempting items. Suggests tie-in tricks, e.g., run sale on diapers and raise the price of beer.

High support needed, or no $$s.

Applications --- (2)

Baskets = documents; items = words in those documents.
Lets us find words that appear together unusually frequently, i.e., linked concepts.

Baskets = sentences, items = documents containing those sentences.

Items that appear together too often could represent plagiarism.

Applications --- (3)

Baskets = Web pages; items = linked pages.
Pairs of pages with many common references may be about the same topic.

Baskets = Web pages p; items = pages that link to p.

Pages with many of the same links may be mirrors or about the same topic.

Association Rules
If-then rules about the contents of baskets. {i1, i2, ..., ik} -> j means: if a basket contains all of i1, ..., ik then it is likely to contain j. Confidence of this association rule is the probability of j given i1, ..., ik.

Example
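B1 = {m, c, b}  (+)    B2 = {m, p, j}
B3 = {m, b}     (-)    B4 = {c, j}
B5 = {m, p, b}  (-)    B6 = {m, c, b, j}  (+)
B7 = {c, b, j}         B8 = {b, c}

(+) marks baskets that contain m, b and also c; (-) marks baskets that contain m and b but not c.

An association rule: {m, b} -> c.

Confidence = 2/4 = 50%.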
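The confidence figure can be reproduced directly from the definition. This is a small sketch, assuming in-memory baskets; the function name `confidence` is chosen here for illustration only.

```python
# The eight baskets from the example.
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def confidence(body, head, baskets):
    """Fraction of the baskets containing all of `body` that also contain `head`."""
    body = set(body)
    with_body = [basket for basket in baskets if body <= basket]
    if not with_body:
        raise ValueError("no basket contains the rule body")
    with_head = sum(1 for basket in with_body if head in basket)
    return with_head / len(with_body)

# Rule {m, b} -> c: four baskets contain m and b, two of them also contain c.
print(confidence({"m", "b"}, "c", BASKETS))
```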

Interest
The interest of an association rule is the absolute value of the amount by which the confidence differs from what you would expect, were items selected independently of one another.

Example
B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}

For association rule {m, b} -> c, item c appears in 5/8 of the baskets. Interest = | 2/4 - 5/8 | = 1/8 --- not very interesting.
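The interest calculation can be sketched the same way. The snippet below assumes in-memory baskets and uses exact fractions so the result matches the 1/8 on the slide; the name `interest` is illustrative.

```python
from fractions import Fraction

# The eight baskets from the example.
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def interest(body, head, baskets):
    """|confidence of body -> head  -  fraction of all baskets containing head|."""
    body = set(body)
    with_body = [basket for basket in baskets if body <= basket]
    if not with_body:
        raise ValueError("no basket contains the rule body")
    conf = Fraction(sum(1 for b in with_body if head in b), len(with_body))
    expected = Fraction(sum(1 for b in baskets if head in b), len(baskets))
    return abs(conf - expected)

print(interest({"m", "b"}, "c", BASKETS))  # confidence 2/4 versus baseline 5/8
```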

Relationships Among Measures

Rules with high support and confidence may be useful even if they are not interesting.
We don't care whether buying bread causes people to buy milk, or whether simply a lot of people buy both bread and milk.

But high interest suggests a cause that might be worth investigating.

Finding Association Rules

A typical question: find all association rules with support at least s and confidence at least c.
Note: support of an association rule is the support of the set of items it mentions.

Hard part: finding the high-support (frequent) itemsets.

Checking the confidence of association rules involving those sets is relatively easy.
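To illustrate why the confidence check is the easy part, here is a hedged sketch that derives rules from a table of frequent itemsets and their supports. It assumes the table is already available (here it is typed in from the earlier example) and that rules have a single item on the right-hand side, as in the slides; `rules_from_frequent` is a name invented for this sketch. By monotonicity, the body of every candidate rule is itself frequent, so its support is already in the table and no further pass over the baskets is needed.

```python
# Supports of the frequent itemsets from the earlier example (threshold 3).
SUPPORTS = {
    frozenset("m"): 5, frozenset("c"): 5, frozenset("b"): 6, frozenset("j"): 4,
    frozenset("mb"): 4, frozenset("cb"): 4, frozenset("jc"): 3,
}

def rules_from_frequent(supports, c):
    """Return (body, head, confidence) for every rule with confidence >= c."""
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty body and a head
        for head in itemset:
            body = itemset - {head}
            # confidence = support(body + head) / support(body)
            conf = sup / supports[body]
            if conf >= c:
                rules.append((body, head, conf))
    return rules

for body, head, conf in rules_from_frequent(SUPPORTS, c=0.75):
    print(sorted(body), "->", head, conf)
```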

Naïve Algorithm
A simple way to find frequent pairs is:
Read file once, counting in main memory the occurrences of each pair.
Expand each basket of n items into its n(n-1)/2 pairs.

Fails if #items-squared exceeds main memory.
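A minimal sketch of the naive single-pass count follows, assuming the baskets fit in a Python list (a real run would stream them from a file). The name `count_all_pairs` is illustrative. The counter holds one entry per distinct pair seen, which is what blows up when the number of items is large.

```python
from collections import Counter
from itertools import combinations

# The eight baskets from the earlier example.
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def count_all_pairs(baskets):
    """One pass over the baskets, counting every pair of items in every basket."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a canonical (smaller, larger) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

pair_counts = count_all_pairs(BASKETS)
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= 3}
print(frequent_pairs)
```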

A-Priori Algorithm --- (1)

A two-pass approach called a-priori limits the need for main memory. Key idea: monotonicity: if a set of items appears at least s times, so does every subset.
Contrapositive for pairs: if item i does not appear in s baskets, then no pair including i can appear in s baskets.

A-Priori Algorithm --- (2)

Pass 1: Read baskets and count in main memory the occurrences of each item.
Requires only memory proportional to #items.

Pass 2: Read baskets again and count in main memory only those pairs both of whose items were found in Pass 1 to be frequent.
Requires memory proportional to the square of the number of frequent items only.
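The two passes can be sketched as below. This is an illustration under the assumption that the baskets can be iterated twice (here a list; in practice, two reads of the file); `apriori_pairs` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

# The eight baskets from the earlier example.
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def apriori_pairs(baskets, s):
    """Two-pass A-Priori restricted to pairs; returns {pair: count} with count >= s."""
    # Pass 1: count single items; memory is proportional to the number of items.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent_items = {item for item, n in item_counts.items() if n >= s}

    # Pass 2: count only pairs whose two items are both frequent.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(item for item in basket if item in frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: n for pair, n in pair_counts.items() if n >= s}

print(apriori_pairs(BASKETS, s=3))
```

In this toy data only pepsi is dropped after Pass 1, so the saving is small; the benefit grows when most items are infrequent.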

Picture of A-Priori

[Figure: main-memory layout. Pass 1 holds the item counts; Pass 2 holds the frequent items and the counts of candidate pairs.]

Detail for A-Priori

You can use the triangular matrix method with n = number of frequent items.
Saves space compared with storing triples.

Trick: number frequent items 1, 2, ... and keep a table relating new numbers to original item numbers.
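One possible realisation of the triangular matrix method is sketched here. It is an assumption-laden illustration: it numbers the frequent items from 0 rather than 1 (to match Python indexing), stores the count for the pair (i, j) with i < j in a flat list, and uses the hypothetical names `triangular_index` and `count_pairs_triangular`.

```python
from itertools import combinations

# The eight baskets from the earlier example.
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def triangular_index(i, j, n):
    """Position of pair (i, j), 0 <= i < j < n, in a flat array of n(n-1)/2 counts."""
    if not 0 <= i < j < n:
        raise ValueError("need 0 <= i < j < n")
    # Rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i) entries before row i starts.
    return i * n - i * (i + 1) // 2 + (j - i - 1)

def count_pairs_triangular(baskets, frequent_items):
    """Count pairs of frequent items using one integer per possible pair."""
    new_to_old = sorted(frequent_items)            # table: new number -> original item
    old_to_new = {item: k for k, item in enumerate(new_to_old)}
    n = len(new_to_old)
    counts = [0] * (n * (n - 1) // 2)
    for basket in baskets:
        ids = sorted(old_to_new[item] for item in basket if item in old_to_new)
        for i, j in combinations(ids, 2):
            counts[triangular_index(i, j, n)] += 1
    return counts, new_to_old

counts, new_to_old = count_pairs_triangular(BASKETS, {"m", "c", "b", "j"})
print(new_to_old)  # renumbering table
print(counts)      # one count per pair, in row-major triangular order
```

The array needs no storage for the item identifiers of each pair, which is where the space saving over storing (i, j, count) triples comes from.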

Frequent Triples, Etc.

For each k, we construct two sets of k-tuples:
Ck = candidate k-tuples = those that might be frequent sets (support at least s) based on information from the pass for k-1.
Lk = the set of truly frequent k-tuples.

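[Figure: C1 -> Filter -> L1 -> Construct -> C2 -> Filter -> L2 -> Construct -> C3. The first Filter step is the first pass; the second Filter step is the second pass.]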

A-Priori for All Frequent Itemsets

One pass for each k. Needs room in main memory to count each candidate k-tuple. For typical market-basket data and reasonable support (e.g., 1%), k = 2 requires the most memory.

Frequent Itemsets --- (2)

C1 = all items.
L1 = those counted on first pass to be frequent.
C2 = pairs, both chosen from L1.
In general, Ck = k-tuples, each k-1 of which is in Lk-1.
Lk = those candidates with support at least s.
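The construct/filter loop for all k can be sketched as follows. This is a simple illustration rather than an efficient implementation: it assumes the baskets fit in memory, generates C(k+1) by testing every combination of the items that survive in Lk, and uses the hypothetical name `apriori`.

```python
from itertools import combinations

# The eight baskets from the earlier example.
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def apriori(baskets, s):
    """Return {itemset: support} for all itemsets with support >= s."""
    baskets = [frozenset(basket) for basket in baskets]
    # C1 = all items.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Filter: one pass over the baskets, counting each candidate k-tuple.
        counts = dict.fromkeys(candidates, 0)
        for basket in baskets:
            for candidate in candidates:
                if candidate <= basket:
                    counts[candidate] += 1
        level = {c: n for c, n in counts.items() if n >= s}   # Lk
        frequent.update(level)

        # Construct: C(k+1) = (k+1)-tuples, every k-subset of which is in Lk.
        k += 1
        items = sorted(set().union(*level))
        candidates = {
            frozenset(combo)
            for combo in combinations(items, k)
            if all(frozenset(sub) in level for sub in combinations(combo, k - 1))
        }
    return frequent

result = apriori(BASKETS, s=3)
for itemset in sorted(result, key=lambda x: (len(x), sorted(x))):
    print(sorted(itemset), result[itemset])
```

On the example data the loop stops after the second pass, because no triple has all three of its pairs in L2 and so C3 is empty.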
