COS10022 – DATA SCIENCE PRINCIPLES
Dr Pei-Wei Tsai (Lecturer, Unit Convenor)
[email protected], EN508d
Week 06
Association Rules
COS10022 – Data Science Principles
Outline
Overview
Apriori Algorithm
Evaluation of Candidate Rules
Examples
Application of Association Rules
Validation and Testing
Diagnostic
3
Overview
• Association rules are commonly known as Market Basket Analysis.
• An unsupervised learning method.
• A descriptive, not predictive, method.
• Used to discover interesting hidden
relationships in a large dataset.
• The disclosed relationships are represented
as rules or frequent itemsets.
• Commonly used for mining transactions in databases.
4
Overview
• Some questions that association rules can answer:
• Which products tend to be purchased together?
• Of those customers who are similar to this person, what products do they tend to buy?
• Of those customers who have purchased this product, what other similar products do they tend to view or purchase?
The overall goal is to analyze customer buying habits by finding associations and correlations between the different items that customers place in their shopping baskets.
5
Overview
The general logic behind association rules:
1. A large collection of transactions (depicted as
three stacks of receipts, in which each transaction
consists of one or more items).
2. Association rules go through the items being
purchased to see what items are frequently
bought together and to discover a list of rules that
describe the purchasing behavior.
3. The rules suggest that when cereal is purchased,
90% of the time milk is purchased; when bread is
purchased, 40% of the time milk is purchased also;
when milk is purchased, 23% of the time cereal is
also purchased.
6
Overview
Rules
• Each rule is in the form X → Y
• This means that when item X is observed, item Y is also observed.
Itemset
• A collection of items or individual entities that contain
some kind of relationship.
• An itemset containing k items is called a k-itemset.
• k-itemset = {item 1, item 2,…,item k}
• Examples:
• A set of retail items purchased together in one transaction.
• A set of hyperlinks clicked on by one user in a single session.
8
Overview
Apriori Algorithm
• One of the most fundamental algorithms for generating association rules.
• One major component of Apriori is support.
• Given an itemset L, the support of L is the percentage of transactions that contain L.
• If 80% of all transactions contain itemset {bread}, then the support of {bread} is 0.8.
• If 60% of all transactions contain itemset {bread, butter}, then the support of {bread, butter} is 0.6.
Frequent Itemset
• Items that appear together “often enough”, i.e. the itemset meets the minimum support criterion.
• If the minimum support is set at 0.7, {bread} is considered a frequent itemset, whereas {bread, butter} is not (a short code sketch of computing support follows below).
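As a minimal illustration (not from the slides), support can be computed directly from a list of transactions; the five transactions below are made up so that {bread} has support 0.8 and {bread, butter} has support 0.6:

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread"},
    {"milk"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)

print(support({"bread"}, transactions))            # 0.8
print(support({"bread", "butter"}, transactions))  # 0.6

With a minimum support of 0.7, only {bread} would be reported as frequent, matching the example above.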
9
Overview
Apriori Property
• Also called the downward closure property.
• If an itemset is considered frequent, then any subset of that frequent itemset must also be frequent.
• If 60% of the transactions contain {bread, jam}, then at least 60% of all the transactions will contain {bread} or {jam}.
• If the support of {bread, jam} is 0.6, the support of {bread} or {jam} is at least 0.6.
• If itemset {B, C, D} is frequent, then all the subsets of this itemset, shaded in the figure, must also be frequent itemsets.
10
Overview
Association Rules
• An implication expression of the form X → Y, where X and Y are non-overlapping itemsets.
• E.g. {Milk, Diaper} → {Beer}
• Generating association rules:
• Step 1: Find frequent itemsets whose occurrences exceed a predefined minimum support threshold.
• Step 2: Derive association rules from those frequent itemsets (with the constraint of a minimum confidence threshold).
Rule Evaluation Metrics
• Support (s): the number of transactions that contain both X and Y out of the total number of transactions.
• E.g. a support of 2% means that 2% of all the transactions under analysis show that {Milk, Diaper} and {Beer} are purchased together.
• Confidence (c): the number of transactions that contain both X and Y out of the total number of transactions that contain X.
• E.g. a confidence of 60% means that 60% of customers who purchased {Milk, Diaper} also bought {Beer}.
A short sketch of computing these two metrics directly from transactions follows below.
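A small sketch (hypothetical transactions, illustrative only) of counting support and confidence for a rule X → Y:

transactions = [
    {"Milk", "Diaper", "Beer"},
    {"Milk", "Diaper"},
    {"Milk", "Bread"},
    {"Diaper", "Beer"},
    {"Milk", "Diaper", "Beer", "Bread"},
]

def rule_metrics(X, Y, transactions):
    X, Y = set(X), set(Y)
    n_total = len(transactions)
    n_x = sum(1 for t in transactions if X <= t)          # transactions containing X
    n_xy = sum(1 for t in transactions if (X | Y) <= t)   # transactions containing both X and Y
    return n_xy / n_total, n_xy / n_x                     # (support, confidence)

print(rule_metrics({"Milk", "Diaper"}, {"Beer"}, transactions))  # (0.4, 0.666...)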
11
Outline
Overview
Apriori Algorithm
Evaluation of Candidate Rules
Examples
Application of Association Rules
Validation and Testing
Diagnostic
12
Apriori Algorithm
Creating Frequent Sets
• Apriori employs an iterative approach known as a level-wise search, where
k-itemsets are used to explore (k+1)-itemsets.
• First, the set of frequent 1-itemsets is found by scanning the database to
accumulate the count for each item, and collecting those items that satisfy
minimum support. The resulting set is denoted L1.
• Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3,
and so on, until no more frequent k-itemsets can be found.
• The finding of each Lk requires one full scan of the database.
13
Example 1
(5 transactions; 6 types of items)   MinSupp = 3/5 = 0.6

L1: Items (1-itemsets)
Bread: 4, Coke: 2, Milk: 4, Beer: 3, Diaper: 4, Eggs: 1

L2: Pairs (2-itemsets); no need to generate candidates involving Coke or Eggs
{Bread, Milk}: 3, {Bread, Beer}: 2, {Bread, Diaper}: 3, {Milk, Beer}: 2, {Milk, Diaper}: 3, {Beer, Diaper}: 3

L3: Triplets (3-itemsets)
{Bread, Milk, Diaper}: 2, {Bread, Milk, Beer}: 1, {Bread, Diaper, Beer}: 2
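These counts can be reproduced with a short Python sketch; the five transactions below are an assumed basket that is consistent with the counts in the tables above:

from itertools import combinations
from collections import Counter

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def itemset_counts(transactions, k):
    # Count every k-itemset that occurs in at least one transaction.
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), k):
            counts[combo] += 1
    return counts

for k in (1, 2, 3):
    print(k, itemset_counts(transactions, k))
# With MinSupp = 0.6 (count >= 3), the frequent itemsets are the ones with count >= 3 above.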
14
Apriori Algorithm
Creating Frequent Sets
• Let’s define:
Ck as a candidate itemset of size k
Lk as a frequent itemset of size k
• Main steps of each iteration are:
1. Find the frequent itemset Lk-1 (starting from L1).
2. Join step: Ck is generated by joining Lk-1 with itself (cartesian product Lk-1 x Lk-1).
3. Prune step (Apriori Property): any candidate whose (k−1)-subset is not frequent cannot itself be frequent, and is therefore removed from Ck.
4. The frequent set Lk is then obtained from the candidates in Ck that satisfy minimum support (a code sketch of the join and prune steps is given below).
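A minimal sketch of the join and prune steps (the helper name apriori_gen is illustrative; itemsets are represented as frozensets):

from itertools import combinations

def apriori_gen(L_prev, k):
    # Join step: union pairs of frequent (k-1)-itemsets that give a k-itemset.
    candidates = {a | b for a in L_prev for b in L_prev if len(a | b) == k}
    # Prune step (Apriori property): drop candidates with an infrequent (k-1)-subset.
    return {c for c in candidates
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}

L2 = {frozenset(p) for p in [("Bread", "Milk"), ("Bread", "Diaper"),
                             ("Milk", "Diaper"), ("Beer", "Diaper")]}
print(apriori_gen(L2, 3))  # only {Bread, Milk, Diaper} survives the prune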
15
Apriori Algorithm
Illustrating Apriori Principle
• Any subset of a
frequent itemset must
also be frequent.
• Itemsets that do not
meet the minimum
support threshold are
pruned away.
16
Apriori Algorithm
Pseudo Code
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
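A minimal runnable Python version of this pseudo code (a sketch, assuming transactions are given as sets of items and min_support is a fraction):

from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)
    def frequent(candidates):
        # Keep candidates whose support meets min_support.
        return {c for c in candidates
                if sum(1 for t in transactions if c <= t) / n >= min_support}
    L = frequent({frozenset([item]) for t in transactions for item in t})  # L1
    all_frequent, k = set(L), 2
    while L:
        # Candidates generated from L(k-1): join, then prune by the Apriori property.
        C = {a | b for a in L for b in L if len(a | b) == k}
        C = {c for c in C if all(frozenset(s) in L for s in combinations(c, k - 1))}
        L = frequent(C)          # candidates in Ck with min_support
        all_frequent |= L
        k += 1
    return all_frequent          # union of all Lk

transactions = [{"Bread", "Milk"}, {"Bread", "Diaper", "Beer", "Eggs"},
                {"Milk", "Diaper", "Beer", "Coke"},
                {"Bread", "Milk", "Diaper", "Beer"},
                {"Bread", "Milk", "Diaper", "Coke"}]
print(apriori(transactions, 0.6))  # reproduces the frequent itemsets of Example 1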
17
Example 2
18
Outline
Overview
Apriori Algorithm
Evaluation of Candidate Rules
Examples
Application of Association Rules
Validation and Testing
Diagnostic
19
Evaluation of Candidate Rules
The process of creating association rules is two-staged.
• First, a set of candidate rules based on frequent itemsets is generated.
• If {Bread, Egg, Milk, Butter} is the frequent itemset, candidate rules will look like:
• {Egg, Milk, Butter} → {Bread}
• {Bread, Milk, Butter} → {Egg}
• {Bread, Egg} → {Milk, Butter}
• Etc. (a short sketch of enumerating such candidate rules is given after this list)
• Second, the appropriateness of these candidate rules is evaluated using:
• Confidence
• Lift
• Leverage
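A short sketch of enumerating candidate rules from one frequent itemset (every non-empty proper subset becomes an antecedent; item names are illustrative):

from itertools import combinations

def candidate_rules(frequent_itemset):
    items = set(frequent_itemset)
    rules = []
    for size in range(1, len(items)):                       # antecedent sizes 1 .. k-1
        for antecedent in combinations(sorted(items), size):
            rules.append((set(antecedent), items - set(antecedent)))
    return rules

for X, Y in candidate_rules({"Bread", "Egg", "Milk", "Butter"}):
    print(X, "->", Y)   # 14 candidate rules for a 4-itemset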
20
Evaluation of Candidate Rules
Confidence
• The measure of certainty or trustworthiness associated with each discovered rule.
• Mathematically, the percentage of transactions that contain both X and Y out of all transactions that contain X.

Confidence(X → Y) = Support(X + Y) / Support(X)

• E.g. if {bread, eggs, milk} has a support of 0.15 and {bread, eggs} also has a support of 0.15, the confidence of the rule {bread, eggs} → {milk} is 1.
• This means that 100% of the time a customer buys bread and eggs, milk is bought as well. The rule is therefore correct for 100% of the transactions containing bread and eggs.
21
Evaluation of Candidate Rules
Confidence
• Example from the figure: {Toothbrush} → {Milk}, Confidence = 10/(10+4) ≈ 0.7
• A relationship may be thought of as interesting when the algorithm identifies the relationship with a measure of confidence greater than or equal to the predefined threshold (i.e. the minimum confidence).
• Problem with confidence:
• Given a rule X → Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y.
• It cannot tell whether a rule reflects a true implication between X and Y or whether the rule is purely coincidental.
22
https://towardsdatascience.com/association-rules-2-aa9a77241654
Evaluation of Candidate Rules
Lift
• Measures how many times more often X and Y occur together than expected if they were statistically independent of each other.
• A measure of how X and Y are really related rather than coincidentally happening together.

Lift(X → Y) = Support(X + Y) / (Support(X) × Support(Y))

• Lift = 1 if X and Y are statistically independent.
• Lift > 1 indicates the degree of usefulness of the rule.
• A larger value of lift suggests a greater strength of the association between X and Y.
23
Evaluation of Candidate Rules
Lift
• E.g. Assuming 1000 transactions,
• If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400 of the transactions, then
Lift(milk → eggs) = 0.3 / (0.5 * 0.4) = 1.5
• If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400 of the transactions, then
Lift(milk → bread) = 0.4/(0.5*0.4) = 2.0
• Therefore it can be concluded that milk and bread have a stronger association
than milk and eggs.
24
Evaluation of Candidate Rules
Leverage
• Measures the difference between the probability of X and Y appearing together in the dataset and what would be expected if X and Y were statistically independent of each other.

Leverage(X → Y) = Support(X + Y) − Support(X) × Support(Y)

• Leverage = 0 if X and Y are statistically independent.
• Leverage > 0 indicates the degree of relationship between X and Y.
• A larger leverage value indicates a stronger relationship between X and Y.
25
Evaluation of Candidate Rules
Leverage
• E.g. Assuming 1000 transactions,
• If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400 of the
transactions, then Leverage(milk → eggs) = 0.3 - 0.5*0.4 = 0.1
• If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400 of the
transactions, then Leverage (milk → bread) = 0.4 - 0.5*0.4 = 0.2
• It again confirms that milk and bread have a stronger association
than milk and eggs.
26
Summary
• Assuming 1000 transactions,
• If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400 of the transactions.
• If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400 of the transactions.
Measure      {Milk} → {Eggs}            {Milk} → {Bread}
Confidence   0.3 / 0.5 = 0.6            0.4 / 0.5 = 0.8
Lift         0.3 / (0.5 × 0.4) = 1.5    0.4 / (0.5 × 0.4) = 2
Leverage     0.3 − 0.5 × 0.4 = 0.1      0.4 − 0.5 × 0.4 = 0.2
             Smaller                    Greater
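The table can be verified with a few lines of Python (supports taken directly from the example above; results agree up to floating-point rounding):

def evaluate(s_xy, s_x, s_y):
    confidence = s_xy / s_x
    lift = s_xy / (s_x * s_y)
    leverage = s_xy - s_x * s_y
    return confidence, lift, leverage

print(evaluate(0.3, 0.5, 0.4))  # {Milk} -> {Eggs}:  approximately (0.6, 1.5, 0.1)
print(evaluate(0.4, 0.5, 0.4))  # {Milk} -> {Bread}: approximately (0.8, 2.0, 0.2)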
27
Evaluation of Candidate Rules
• Confidence is able to identify trustworthy rules, but it cannot tell whether a rule is coincidental.
• Measures such as lift and leverage not only ensure that interesting rules are identified but also filter out the coincidental rules.
• Together, support, confidence, lift and leverage ensure the discovery of interesting and strong rules from the sample dataset.
28
Outline
Overview
Apriori Algorithm
Evaluation of Candidate Rules
Examples
Application of Association Rules
Validation and Testing
Diagnostic
29
Example 3 (Min_Supp Count = 2; Min_Conf = 50%)
Five market basket transactions with items labelled as I1, I2, I3 and so on.
1. Candidate list generation, C1: start by creating all individual items, called candidates, and calculate their support counts.
2. Create frequent 1-itemsets, L1: remove candidates that fail the min_supp count (i.e. I4 and I5). No supersets of an infrequent itemset need to be generated and tested.
30
Min_Supp Count = 2; Min_Conf = 50%
3. Generate the second candidate list, C2: C2 is generated by L1 cross join L1.
4. Create frequent 2-itemsets, L2: remove candidates that fail the min_supp count.
5. Generate the third candidate list, C3: C3 is generated by L2 cross join L2.
6. The support count for {I1, I2, I3} fails min_supp: {I1, I2} appears in 2 transactions together and {I2, I3} appears in 3 transactions together, but {I1, I2, I3} appears in only 1 transaction together. Association rule mining is completed and there will be no C4 candidate list.
31
7. Generate candidate rules: take the last non-empty frequent itemsets (i.e. L2 = {I1, I2}, {I2, I3}) and use all the non-empty subsets of each frequent itemset to generate association rules.
8. Calculate the confidence of each candidate rule: Confidence(X → Y) = Support(X + Y) / Support(X).
9. Strong association rules: the green-ticked rules in the figure are all strong rules satisfying min_conf ≥ 50% and min_supp count = 2.
32
Example 4
• Consider a database, D, consisting of 9 transactions:

TID   List of Items
T100  I1, I2, I5
T101  I2, I4
T102  I2, I3
T103  I1, I2, I4
T104  I1, I3
T105  I2, I3
T106  I1, I3
T107  I1, I2, I3, I5
T108  I1, I2, I3

• Suppose the minimum support count required is 2 (i.e. min_sup = 2/9 ≈ 22%).
• Let the minimum confidence required be 70%.
• We first have to find the frequent itemsets using the Apriori algorithm.
• Then, association rules will be generated using min. support and min. confidence.
33
Step 1: Generating 1-itemset Frequent Pattern
Scan D to count each candidate and collect C1; then compare the candidate support counts with the minimum support count (2) to obtain L1:
C1: {I1}: 6, {I2}: 7, {I3}: 6, {I4}: 2, {I5}: 2
L1: {I1}: 6, {I2}: 7, {I3}: 6, {I4}: 2, {I5}: 2

• In the first iteration of the algorithm, each item is a member of the set of candidates.
• The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets satisfying minimum support.
34
Step 2: Generating 2-itemset Frequent Pattern
Generate C2 candidates from L1 (cartesian product L1 x L1):
C2: {I1,I2}, {I1,I3}, {I1,I4}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I3,I4}, {I3,I5}, {I4,I5}
Scan D for the count of each candidate:
{I1,I2}: 4, {I1,I3}: 4, {I1,I4}: 1, {I1,I5}: 2, {I2,I3}: 4, {I2,I4}: 2, {I2,I5}: 2, {I3,I4}: 0, {I3,I5}: 1, {I4,I5}: 0
Compare the candidate support counts with the minimum support count (2):
L2: {I1,I2}: 4, {I1,I3}: 4, {I1,I5}: 2, {I2,I3}: 4, {I2,I4}: 2, {I2,I5}: 2
35
Step 2: Generating 2-itemset Frequent Pattern [Cont.]
• To discover the set of frequent 2-itemsets, L2 , the algorithm uses L1 Join L1 to
generate a candidate set of 2-itemsets, C2.
• Next, the transactions in D are scanned and the support count for each candidate
itemset in C2 is accumulated (as shown in the middle table).
• The set of frequent 2-itemsets, L2 , is then determined, consisting of those
candidate 2-itemsets in C2 having minimum support.
• Note: We haven’t used Apriori Property yet.
36
Step 3: Generating 3-itemset Frequent Pattern
Generate C3 candidates from L2 (L2 x L2), scan D for the count of each candidate, and compare the candidate support counts with the minimum support count:
C3: {I1, I2, I3}: 2, {I1, I2, I5}: 2
L3: {I1, I2, I3}: 2, {I1, I2, I5}: 2
• The generation of the set of candidate 3-itemsets, C3 , involves the use of the Apriori Property.
• In order to find C3, we compute L2 Join L2.
• C3 = L2 Join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
• Now, Join step is complete and Prune step will be used to reduce the size of C3. Prune step helps
to avoid heavy computation due to large Ck.
37
Step 3: Generating 3-itemset Frequent Pattern [Cont.]
C3 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
• Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that the four latter candidates cannot possibly be frequent. How?
• For example, let's take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3} and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we will keep {I1, I2, I3} in C3.
• Let's take another example, {I2, I3, I5}, which shows how the pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5} and {I3, I5}.
• BUT {I3, I5} is not a member of L2 and hence is not frequent, violating the Apriori property. Thus we have to remove {I2, I3, I5} from C3.
• Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking all members of the join result for pruning.
• Now, the transactions in D are scanned in order to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support.
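The prune check described here can be written directly; L2 is as listed in Step 2, and the helper name is illustrative:

from itertools import combinations

L2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}

def survives_prune(candidate, L_prev):
    # A 3-itemset is kept only if every 2-item subset of it is frequent (Apriori property).
    return all(frozenset(s) in L_prev for s in combinations(candidate, 2))

for c in [("I1", "I2", "I3"), ("I1", "I2", "I5"), ("I1", "I3", "I5"),
          ("I2", "I3", "I4"), ("I2", "I3", "I5"), ("I2", "I4", "I5")]:
    print(set(c), survives_prune(c, L2))   # only the first two candidates survive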
38
Step 4: Generating 4-itemset Frequent Pattern
• The algorithm uses L3 Join L3 to generate a candidate set of 4-itemsets, C4. Although the join results in {{I1, I2, I3, I5}}, this itemset is pruned because its subset {I2, I3, I5} is not frequent.
• Thus, C4 = ∅, and the algorithm terminates, having found all the frequent itemsets. This completes our Apriori algorithm.
• What's next? These frequent itemsets will be used to generate strong association rules (where strong association rules satisfy both minimum support and minimum confidence).
39
Step 5: Generating Association Rules from Frequent Itemsets
• Back to the example: list all frequent itemsets, L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I1,I2,I3}, {I1,I2,I5}}.
• Let's take the 3-itemset {I1, I2, I5}.
• All its non-empty subsets are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.
• The resulting candidate rules are:
• {I1,I2} → {I5}
• {I1,I5} → {I2}
• {I2,I5} → {I1}
• {I5} → {I1,I2}
• {I2} → {I1,I5}
• {I1} → {I2,I5}
40
Step 5: Generating Association Rules from Frequent Itemsets [Cont.]
• Let the minimum confidence threshold be, say, 70%.
• The resulting association rules are shown below, each listed with its confidence, where Confidence(X → Y) = Support(X + Y) / Support(X).
• R1: I1 ^ I2 → I5
• Confidence = sc{I1,I2,I5}/sc{I1,I2} = 2/4 = 50%
• R1 is Rejected.
• R2: I1 ^ I5 → I2
• Confidence = sc{I1,I2,I5}/sc{I1,I5} = 2/2 = 100%
• R2 is Selected.
• R3: I2 ^ I5 → I1
• Confidence = sc{I1,I2,I5}/sc{I2,I5} = 2/2 = 100%
• R3 is Selected.
41
Step 5: Generating Association Rules from Frequent Itemsets
[Cont.]
• R4: I1 → I2 ^ I5
• Confidence = sc{I1,I2,I5}/sc{I1} = 2/6 = 33%
• R4 is Rejected.
• R5: I2 → I1 ^ I5
• Confidence = sc{I1,I2,I5}/sc{I2} = 2/7 = 29%
• R5 is Rejected.
• R6: I5 → I1 ^ I2
• Confidence = sc{I1,I2,I5}/sc{I5} = 2/2 = 100%
• R6 is Selected.
In this way, we have found three strong association rules.
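As a sketch, the six confidence checks above can be reproduced from the transactions in database D (support counts follow the sc{...} notation used above):

D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"}, {"I1", "I3"},
     {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]

def sc(itemset):
    # Support count: number of transactions containing the itemset.
    return sum(1 for t in D if set(itemset) <= t)

rules = [({"I1", "I2"}, {"I5"}), ({"I1", "I5"}, {"I2"}), ({"I2", "I5"}, {"I1"}),
         ({"I1"}, {"I2", "I5"}), ({"I2"}, {"I1", "I5"}), ({"I5"}, {"I1", "I2"})]

for X, Y in rules:
    conf = sc(X | Y) / sc(X)
    verdict = "Selected" if conf >= 0.7 else "Rejected"
    print(X, "->", Y, f"confidence = {conf:.0%}", verdict)
# R2, R3 and R6 are selected, matching the result above.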
42
Outline
Overview
Apriori Algorithm
Evaluation of Candidate Rules
Examples
Application of Association Rules
Validation and Testing
Diagnostic
43
Applications of Association Rules
The term market basket analysis refers to a specific implementation of
association rules.
• For better merchandising – products to include/exclude from
inventory each month
• Placement of products
• Cross-selling
• Promotional programs—multiple product purchase incentives
managed through a loyalty card program
44
Applications of Association Rules
• Input: the simple point-of-sale transaction data
• Output: Most frequent affinities among items
• Example: according to the transaction data…
"Customers who bought a laptop computer and virus protection software also bought an extended service plan 70 percent of the time."
• How do you use such a pattern/knowledge?
• Put the items next to each other for ease of finding.
• Promote the items as a package (do not put one on sale if the other(s) are on sale).
• Place items far apart from each other so that the customer must walk the aisles to search for them, and by doing so potentially sees and buys other items.
45
Applications of Association Rules
Recommender systems – Amazon, Netflix:
• Clickstream analysis from web usage log files
• Website visitors to page X click on links A,B,C more than on links D,E,F
In medicine:
• relationships between symptoms and illnesses;
• diagnosis and patient characteristics and treatments (to be used in medical DSS);
• genes and their functions (to be used in genomics projects).
46
Outline
Overview
Apriori Algorithm
Evaluation of Candidate Rules
Examples
Application of Association Rules
Validation and Testing
Diagnostic
47
Validation and Testing
• The frequent itemsets and high-confidence rules are found using pre-specified minimum support and minimum confidence levels.
• Measures like lift and/or leverage then ensure that interesting rules are identified rather than coincidental ones.
• However, some of the remaining rules may be considered subjectively uninteresting because they don't yield unexpected profitable actions.
• E.g. rules like {paper} → {pencil} are not interesting/meaningful.
• Incorporating subjective knowledge requires domain experts.
• Incorporating subjective knowledge requires domain experts
• Good rules provide valuable insights for institutions to improve their business operations
48
Outline
Overview
Apriori Algorithm
Evaluation of Candidate Rules
Examples
Application of Association Rules
Validation and Testing
Diagnostic
49
Diagnostics
• Although the Apriori algorithm is easy to understand and implement, some
of the rules generated are uninteresting or practically useless.
• Additionally, some of the rules may be generated due to coincidental
relationships between the variables.
• Measures like confidence, lift, and leverage should be used along with
human insights to address this problem.
50
Diagnostics
• Another problem with association rules is that, in Phases 3 and 4 of the Data Analytics Lifecycle, the team must specify the minimum support prior to the model execution, which may lead to too many or too few rules.
• In related research, a variant of the algorithm can use a predefined target range for the number of rules so that the algorithm can adjust the minimum support accordingly.
• The algorithm requires scans of the entire database to obtain the result. Accordingly, as the database grows, it takes more time to compute in each run.
51
Diagnostics
Approaches to improve Apriori’s efficiency:
Partitioning:
• Any itemset that is potentially frequent in a transaction database must be frequent in at least one of the partitions of the
transaction database.
Sampling:
• This extracts a subset of the data with a lower support threshold and uses the subset to perform association rule mining.
Transaction reduction:
• A transaction that does not contain frequent k-itemsets is useless in subsequent scans and therefore can be ignored.
Hash-based itemset counting:
• If the corresponding hashing bucket count of a k-itemset is below a certain threshold, the k-itemset cannot be frequent.
Dynamic itemset counting:
• Only add new candidate itemsets when all of their subsets are estimated to be frequent.
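As a small illustration of transaction reduction only (a sketch, not taken from the slides), transactions that contain no frequent k-itemset can be skipped in later scans:

def reduce_transactions(transactions, frequent_k_itemsets):
    # Keep only transactions that contain at least one frequent k-itemset;
    # the rest cannot contribute to any (k+1)-itemset count.
    return [t for t in transactions if any(s <= t for s in frequent_k_itemsets)]

L2 = [{"Bread", "Milk"}, {"Bread", "Diaper"}]
transactions = [{"Bread", "Milk", "Diaper"}, {"Coke", "Eggs"}, {"Bread", "Diaper"}]
print(reduce_transactions(transactions, L2))  # the {Coke, Eggs} transaction is dropped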
52
References
• EMC Education Services. (2015). Data Science and Big Data Analytics:
Discovering, Analyzing, Visualizing and Presenting Data. Wiley.
• Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and
techniques. Elsevier.
• DataCamp, ‘Market Basket Analysis using R’, retrieved from https://www.datacamp.com/community/tutorials/market-basket-analysis-r
53