Chapter Four
Mining Association Rules in Large Databases
Outline
Association Rule
Frequent Pattern
Association mining from frequent Pattern
Issues to be considered?
Classification of Frequent Pattern Mining
Mining Frequent Itemsets: the Key Step
Algorithm to find Frequent Itemsets
The Apriori Algorithm
Generating Association Rules from Frequent Itemsets
Association Rule
An association rule is a rule of the form X ⇒ Y, with the
interestingness measures of support and confidence, where
X and Y are simple or complex statements.
A simple statement is a statement formed from a single attribute,
say age, buys, or sex, and a value, related by a relational operator.
Example:
Buy(X, “Computer”) ⇒ Buy(X, “Printer”) [supp = 25%, conf = 95%]
This means that a person X who buys a computer also buys a printer.
25% of the entire data set shows a person buying both a computer and a printer
(support). Out of the tuples that buy a computer, 95% of them also buy a
printer (confidence).
Association Rule
Example 2:
Buy(X, “Computer”) ⇒ Buy(X, “Antivirus”) [supp = 2%, conf = 60%]
Rule support and confidence are two measures of rule
interestingness. They respectively reflect the usefulness and
certainty of discovered rules.
A support of 2% means that 2% of all the transactions under
analysis show that computer and antivirus software are purchased
together.
A confidence of 60% means that 60% of the customers who
purchased a computer also bought the Antivirus.
Association rules are considered interesting if they satisfy both a
minimum support threshold and a minimum confidence
threshold. These thresholds can be set by users or domain
experts.
Association Rule
A complex statement is usually represented as a conjunction of simple statements.
Example:
Buy(X, “Computer”) ∧ Buy(X, “Printer”) ⇒ Buy(X, “Scanner”) [supp = 50%, conf = 90%]
This means that a person X who buys a computer and a printer also buys a scanner.
50% of the entire data set shows a person buying a computer, a printer, and a scanner (support).
Out of all transactions in which a person buys a computer and a printer, 90% also include a scanner (confidence).
In order to mine such association rules, we first need to discuss frequent patterns and the algorithms used to extract them.
Frequent Pattern
Frequent patterns are patterns (such as itemsets, subsequences, or substructures) that appear in a data set frequently.
An itemset is a set of two or more items that appear together in a transaction data set.
An itemset is said to be a frequent itemset if its items appear together frequently in the transaction data set.
For example, milk and bread may occur together frequently in a single transaction and hence form a frequent itemset.
A subsequence refers to items that occur in transactions in a sequential order.
For example, buying a computer at time t0 may be followed by buying a digital camera at time t1, and buying a memory card at time t2.
A subsequence that appears frequently is said to be a frequent subsequence.
Frequent Pattern
A sub structure refers to different structural forms of the data set, such as
sub-graphs, sub-trees, or sub-lattices, which may be combined with item
sets or subsequences.
If a substructure occurs frequently, it is called a (frequent) structured
pattern.
Finding such frequent patterns plays an essential role in mining
associations, correlations, classification, clustering, and other data mining
tasks as well.
Thus, frequent pattern mining has become an important data mining task and
a focused theme in data mining research.
This chapter is dedicated to methods of frequent itemset mining.
Frequent Pattern
We look into the following questions:
How can we find frequent itemsets from large amounts of data, where
the data are either transactional or relational?
How can we mine association rules in multilevel and multidimensional
space?
Which association rules are the most interesting?
How can we help or guide the mining procedure to discover interesting
associations or correlations?
How can we take advantage of user preferences or constraints to speed
up the mining process?
Frequent Pattern
Frequent itemset mining leads to the discovery of associations and correlations
among items in large transactional or relational data sets.
With massive amounts of data continuously being collected and stored, many
industries are becoming interested in mining frequent itemset patterns from
their databases.
The discovery of interesting correlation relationships among huge amounts of
business transaction records can help in many business decision-making
processes such as:
market basket analysis, catalog design, cross-marketing, loss-leader
analysis and customer shopping behavior analysis.
Association mining from frequent Pattern
Rule form: “Body (X) -> Head (Y) [support, confidence]”.
This is read as: if the body (X) occurs, then the head (Y) also occurs in the same transaction, with the stated support and confidence.
Rule support and confidence are two measures of rule interestingness. They
respectively reflect the usefulness and certainty of discovered rules.
Typically, association rules are considered interesting if they satisfy both a
minimum support threshold and a minimum confidence threshold.
Such thresholds can be set by users or domain experts.
Association mining from frequent Pattern
Additional analysis can be performed to uncover interesting statistical
correlations between associated items.
Let I = {I1, I2, …, Im} be a set of items.
Let D, the task-relevant data set, be a set of database transactions where each
transaction T is a set of items such that T ⊆ I.
Each transaction is associated with an identifier, called a TID (Transaction ID).
Let A be a set of items.
A transaction T is said to contain A if and only if A ⊆ T.
An association rule is an implication of the form A ⇒ B, where A ⊂ I, B ⊂ I,
and A ∩ B = ∅.
Association mining from frequent Pattern
The rule A ⇒ B holds in the transaction set D with support s, where s is the
percentage of transactions in D that contain A ∪ B (i.e., the union of itemsets A
and B, or, equivalently, both A and B).
This is taken to be the probability P(A ∪ B).
Support thus gives the probability that all the items (predicates) in A and B occur together:
the count of tuples that contain both A and B, divided by the total number of tuples in
the working data set.
Association mining from frequent Pattern
The rule A ⇒ B has confidence c in the transaction set D, where c is the
percentage of transactions in D containing A that also contain B.
This is taken to be the conditional probability P(B|A).
Confidence measures how often B is fulfilled when A is fulfilled:
the count of tuples that contain both A and B, divided by the number of tuples that contain A.
That is,
support(A ⇒ B) = P(A ∪ B) = support_count(A ∪ B) / |D|
confidence(A ⇒ B) = P(B|A) = support_count(A ∪ B) / support_count(A)
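As a concrete illustration (an added sketch, not part of the original slides), the two measures can be computed in Python, assuming the transaction set D is given as a list of item sets and A and B are itemsets:

def support_and_confidence(D, A, B):
    # support    = |{t in D : A ∪ B ⊆ t}| / |D|
    # confidence = |{t in D : A ∪ B ⊆ t}| / |{t in D : A ⊆ t}|
    A, B = set(A), set(B)
    n_A = sum(1 for t in D if A <= set(t))
    n_AB = sum(1 for t in D if (A | B) <= set(t))
    return n_AB / len(D), (n_AB / n_A if n_A else 0.0)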
Association mining from frequent Pattern
• Rules that satisfy both a minimum support threshold (min sup) and a minimum
confidence threshold (min conf) are called strong.
• By convention, support and confidence values are written between 0% and 100%,
rather than as fractions between 0 and 1.0 (which would need to be multiplied by 100%).
The transaction set can thus be divided into four regions: (C1) transactions in which A and B occur together, (C2) transactions in which A occurs without B, (C3) transactions in which B occurs without A, and (C4) the remaining transactions, containing neither A nor B.
Association mining from frequent Pattern
A set of items is referred to as an itemset.
An itemset that contains k items is a k-itemset.
The set {computer, antivirus software} is a 2-itemset.
The occurrence frequency of an itemset is the number of transactions that
contain the itemset.
This is also known as the frequency, support count, or count of the itemset.
Note that the itemset support defined before is sometimes referred to as
relative support, whereas the occurrence frequency is called the absolute
support.
If the relative support of an itemset I satisfies a prespecified minimum support
threshold (i.e., the absolute support of I satisfies the corresponding minimum
support count threshold), then I is a frequent itemset.
Association mining from frequent Pattern
The set of frequent k-itemsets is commonly denoted by Lk.
From the previous equations, we have
confidence(A ⇒ B) = P(B | A)
= support(A ∪ B) / support(A) (relative supports)
= support_count(A ∪ B) / support_count(A) (absolute support counts)
The above equation shows that the confidence of rule A ⇒ B can easily be
derived from the support counts of A and A ∪ B.
That is, once the support counts of A, B, and A ∪ B are found, it is straightforward
to derive the corresponding association rules A ⇒ B and B ⇒ A and check whether
they are strong.
Thus, the problem of mining association rules can be reduced to that of mining
frequent itemsets.
Association mining from frequent Pattern:
Support and Confidence example
Consider the following 4 transactions.
Transaction ID Items Bought
2000 A,B,C
1000 A,C
4000 A,D
5000 B,E,F
The support for the various item set can be computed and the result shows:
support(A) 3 support(B,E,F) 1 support(B,F) 1
support(A,C) 2 support(A,D) 1 support(E,F) 1
support(B) 2 support(A,B) 1 support(D) 1
support( C ) 2 support(B,C) 1 support(E) 1
support(A,B,C) 1 support(B,E) 1 support(F) 1
Association mining from frequent Pattern: Support and Confidence example
The following are some of the association rules, with their support and confidence,
for the four transactions above (transaction table and support counts as on the previous slide):
A ⇒ B (25%, 33.3%)      A,B ⇒ C (25%, 100%)
A ⇒ C (50%, 66.6%)      A,C ⇒ B (25%, 50%)
A ⇒ D (25%, 33.3%)      B,C ⇒ A (25%, 100%)
B ⇒ A (25%, 50%)        A ⇒ B,C (25%, 33.3%)
C ⇒ A (50%, 100%)       B ⇒ A,C (25%, 50%)
D ⇒ A (25%, 100%)       C ⇒ A,B (25%, 50%)
B ⇒ C (25%, 50%)        B,E ⇒ F (25%, 100%)
B ⇒ E (25%, 50%)        B,F ⇒ E (25%, 100%)
B ⇒ F (25%, 50%)        E,F ⇒ B (25%, 100%)
C ⇒ B (25%, 50%)        B ⇒ E,F (25%, 50%)
E ⇒ B (25%, 100%)       E ⇒ B,F (25%, 100%)
F ⇒ B (25%, 100%)       F ⇒ B,E (25%, 100%)
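As a quick check, reusing the support_and_confidence sketch introduced earlier (itself an illustration rather than part of the slides), two of the entries above can be reproduced:

# The four transactions of the example.
D = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
for ante, cons in [({"A"}, {"C"}), ({"C"}, {"A"})]:
    s, c = support_and_confidence(D, ante, cons)
    print(sorted(ante), "=>", sorted(cons), f"({s:.0%}, {c:.1%})")
# Prints ['A'] => ['C'] (50%, 66.7%) and ['C'] => ['A'] (50%, 100.0%),
# matching the table up to rounding.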
Association mining from frequent Pattern
In general, association rule mining can be viewed as a two-step process:
1. Find all frequent itemsets
2. Generate strong association rules from the frequent itemsets
Find all frequent itemsets
By definition, each of these itemsets will occur at least as frequently as a
predetermined minimum support count, min sup.
Let the minimum support be 50% for the previous example, which consists of 4 transactions (i.e., a minimum support count of 2).
This enables generation of the following frequent itemsets:
support(A) = 3
support(A,C) = 2
support(B) = 2
support(C) = 2
Association mining from frequent Pattern
Generate strong association rules from the frequent itemsets:
At this step, we need to select association rules that satisfy both minimum support and minimum confidence.
(Frequent itemsets from the previous step: support(A) = 3, support(A,C) = 2, support(B) = 2, support(C) = 2.)
• In the example considered above, only two association rules are possible: A ⇒ C and C ⇒ A.
• Let the minimum confidence be 80%, i.e. (s, c) = (50%, 80%).
• Hence the rule that fulfills the condition is C ⇒ A (50%, 100%),
• whereas A ⇒ C (50%, 66.6%) does not fulfill the confidence requirement and is filtered out.
• As the second step is much less costly than the first, the overall performance of
mining association rules is determined by the first step.
Classification of Frequent Pattern
Mining
Frequent pattern mining can be classified in various ways, based on
different criteria, two of which are
1. Based on the levels of abstraction involved in the rule set:
2. Based on the number of data dimensions involved in the rule:
Some other criteria are:
A. Based on the completeness of patterns to be mined:
B. Based on the types of values handled in the rule:
C. Based on the kinds of rules to be mined
Classification of Frequent Pattern
Mining
Based on the levels of abstraction involved in the rule set:
Based on the level of abstraction, we can classify frequent pattern mining as
single level and multiple level mining
Multiple level frequent pattern mining for association rule can find rules at
differing levels of abstraction.
For example, suppose that a set of association rules mined includes the following
rules where X is a variable representing a customer:
buys(X, “computer”) ⇒ buys(X, “HP printer”)
buys(X, “desktop computer”) ⇒ buys(X, “HP printer”)
In the above Rules, the items bought are referenced at different levels of
abstraction (e.g., “computer” is a higher-level abstraction of “desktop computer”).
If, instead, the rules within a given set do not reference items or attributes at
different levels of abstraction, then the set contains single-level association
rules.
Classification of Frequent Pattern
Mining
Based on the number of data dimensions involved in the rule:
Based on the number of data dimensions involved in the rule we can classify
frequent pattern mining as single dimensional or multidimensional
If the items or attributes in an association rule reference only one dimension,
then it is a single-dimensional association rule.
buys(X, “computer”) ⇒ buys(X, “antivirus software”)
buys(X, “computer”) ⇒ buys(X, “HP printer”)
buys(X, “laptop computer”) ⇒ buys(X, “HP printer”)
The above rules are single-dimensional association rules because they each
refer to only one attribute/dimension, buys.
If a rule references two or more dimensions, such as the dimensions or
attributes age, income, and buys, then it is a multidimensional association
rule.
The following rule is an example of a multidimensional rule:
age(X, “30. . . 39”) ^ income(X, “42K. . .48K”)buys(X, “high resolution
TV”)
Mining Frequent Itemsets: the Key Step
In order to mine association rules using frequent itemsets from a database,
we perform the following basic steps:
1. Find the frequent itemsets:
the sets of items that have minimum support.
Any subset of a frequent itemset is also frequent; i.e., if {A, B} is a
frequent itemset, both {A} and {B} are frequent (this property is stated formally after this list).
A number of algorithms have been suggested to find the sets of closed or
maximal frequent itemsets.
2. Use the frequent itemsets to generate association rules that fulfill the
confidence criterion.
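The subset property used in step 1 (the Apriori, or downward-closure, property) can be stated compactly as follows (this formulation is added here for clarity; the notation is as defined earlier):
    For any itemsets A ⊆ B ⊆ I: support_count(A) ≥ support_count(B).
Hence, if B is frequent then every subset A of B is also frequent; equivalently, if A is infrequent then every superset of A is infrequent and can be pruned without counting.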
Algorithm to find Frequent Itemsets
There are a number of algorithms for finding frequent itemsets when mining association patterns from a data set.
Some of them are:
1. The apriori algorithm
2. Frequent pattern growth method
3. Vertical data format method
Algorithm to find Frequent Itemsets
1. The apriori algorithm:
It iteratively finds frequent itemsets with cardinality from 1 to k (k-itemsets).
2. Frequent pattern growth method
Finds frequent itemsets using a divide-and-conquer method over a frequent-pattern tree.
Algorithm to find Frequent Itemsets
3. Vertical data format method
Usually the working data set is represented as a set of records, where each
record is identified by a transaction id (TID) together with its associated itemset.
This format is called the horizontal data format.
The vertical data format instead represents each record by an item name together
with the set of transaction ids of the transactions that contain that item.
This approach uses that format of the input data to discover all frequent patterns.
In this chapter we will discuss only the first approach (the Apriori algorithm).
The Apriori Algorithm
Assume:
Lk is the set of all frequent k-itemsets, ordered lexicographically
(i.e., the ith itemset in Lk is smaller than the jth itemset iff i < j).
Ck is the set of candidate k-itemsets, which is a superset of Lk.
li and lj are the ith and jth k-itemsets from a given Lk, and the elements
of each are also sorted lexicographically.
The Apriori Algorithm
The Apriori algorithm will have the following steps
Initialization
Join Step
Prune Step
Generation
The Apriori Algorithm
Initialization
Generate all the frequent itemsets with cardinality 1 (i.e., L1), in which the
elements are sorted lexicographically.
Let L1 be {{i1}, {i4}, {i7}, {i9}, {i11}} (note the ordering).
The Apriori Algorithm
Join Step:
Generate the candidate k-itemsets by joining Lk-1 with itself
(i.e., Ck = Lk-1 ⋈ Lk-1) using the following procedure:
Take any two elements from Lk-1 that agree on all of their items except the last.
Form a k-itemset by taking the union of the two (k-1)-itemsets.
Repeat the procedure for all such pairs of elements.
The Apriori Algorithm
Join Step:
o Let’s assume L2 = {{i1,i4}, {i1,i9}, {i1,i11}, {i4,i9}, {i4,i11}, {i7,i9}, {i7,i11}}.
o The candidate 3-itemsets are {{i1,i4,i9}, {i1,i4,i11}, {i1,i9,i11}, {i4,i9,i11}, {i7,i9,i11}}
(note that the itemsets are sorted, and the elements of each itemset are also sorted).
o Note that {i9,i11} is a subset of some of the generated 3-itemsets but is not in L2.
o As a result, {i9,i11} is not frequent, and hence the 3-itemsets having {i9,i11} as a
subset cannot fulfill the requirement to be frequent itemsets.
o This leads to the immediate removal of those 3 candidate 3-itemsets in the next step.
The Apriori Algorithm
Prune Step:
Generate Ck from the candidate k-itemsets by pruning, a priori, those candidates
that have subsets that are not frequent.
This is best done by checking whether a candidate k-itemset has
any (k-1)-itemset subset that is not frequent.
If such a candidate exists, it should be pruned, as it cannot be frequent.
The Apriori Algorithm
Generation:
Generate Lk from Ck by eliminating candidates that are not frequent.
This is best done by assigning a count to each k-itemset in Ck by
scanning the entire database of transactions.
The Apriori Algorithm
Input:
D, a database of transactions;
Min_sup, the minimum support count threshold.
Output:
L, frequent itemsets in D.
The Apriori Algorithm
Method:
1. L1 = find_frequent_1_itemsets(D); // initialize
2. for (k = 2; Lk-1 ≠ ∅; k++) {
3.     Ck = apriori_gen(Lk-1); // join and prune
4.     for each transaction t ∈ D { // scan D for counts
5.         Ct = subset(Ck, t); // get the subsets of t that are candidates
6.         for each candidate c ∈ Ct
7.             c.count++;
8.     }
9.     Lk = {c ∈ Ck | c.count ≥ min_sup} // generate
10. }
11. return L = ∪k Lk;
The Apriori Algorithm
procedure apriori_gen(Lk-1:frequent (k-1)-itemsets)
1. for each itemset l1 ∈ Lk-1 {
2.     for each itemset l2 ∈ Lk-1 {
3.         if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ . . . ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
4.             c = l1 ⋈ l2; // join step: generate candidates
5.             if (not has_infrequent_subset(c, Lk-1)) then
6.                 add c to Ck;
7.             else delete c; // prune step: remove unfruitful candidate
8.         }
9.     }
10. }
11. return Ck;
The Apriori Algorithm
procedure has_infrequent_subset(c: candidate k-itemset; Lk-1: frequent (k-1)-itemsets); // use prior knowledge
1. for each (k-1)-subset s of c
2.     if s ∉ Lk-1 then
3.         return TRUE;
4. return FALSE;
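The pseudocode above can be turned into a short runnable sketch. The following Python is an illustration, not the textbook's code: it assumes the database D is given as a list of transactions (each an iterable of item labels) and that min_sup is an absolute support count; the function names mirror the procedures above.

from itertools import combinations

def find_frequent_1_itemsets(D, min_sup):
    # Count each single item over all transactions; keep those meeting min_sup.
    counts = {}
    for t in D:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    return {i: c for i, c in counts.items() if c >= min_sup}

def has_infrequent_subset(c, L_prev):
    # Prune step: a candidate is unfruitful if any (k-1)-subset is not frequent.
    return any(frozenset(s) not in L_prev for s in combinations(c, len(c) - 1))

def apriori_gen(L_prev):
    # Join step: merge frequent (k-1)-itemsets that agree on all but the last (sorted) item.
    candidates = set()
    prev = [tuple(sorted(itemset)) for itemset in L_prev]
    for l1 in prev:
        for l2 in prev:
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = frozenset(l1) | frozenset(l2)
                if not has_infrequent_subset(c, L_prev):
                    candidates.add(c)
    return candidates

def apriori(D, min_sup):
    # Returns a dict mapping every frequent itemset (a frozenset) to its support count.
    D = [frozenset(t) for t in D]
    L = find_frequent_1_itemsets(D, min_sup)
    all_frequent = dict(L)
    while L:
        Ck = apriori_gen(L)                                     # join and prune
        counts = {c: sum(1 for t in D if c <= t) for c in Ck}   # scan D for counts
        L = {c: n for c, n in counts.items() if n >= min_sup}   # generate Lk
        all_frequent.update(L)
    return all_frequent

With min_sup = 2 and the four transactions of the example on the next slide, this sketch reproduces L1, L2, and L3 = {{2, 3, 5}} as traced there.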
The Apriori Algorithm — Example
Minimum support count = 2.
Database D:
TID 100: 1, 3, 4
TID 200: 2, 3, 5
TID 300: 1, 2, 3, 5
TID 400: 2, 5
Scan D for the count of each candidate 1-itemset, C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
L1 (candidates meeting the minimum support): {1}:2, {2}:3, {3}:3, {5}:3
Join L1 with itself to form C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
Scan D for the count of each candidate in C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
L2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
Join L2 with itself and prune to form C3: {2,3,5} (other 3-itemsets, e.g. {1,2,3}, {1,2,5}, {1,3,5}, each have support 1 and contain 2-subsets that are not in L2, so they are discarded)
Scan D: L3: {2,3,5}:2
Exercise: Given support and confidence values of 50% and 80%, find the frequent 3-itemsets with strong association rules.
Exercise 2:
Let the database of transactions consist of the sets
{1,2,3,4}, {1,2}, {2,3,4}, {2,3}, {1,2,4}, {3,4}, and
{2,4}. Find the frequent 3-itemsets if min_sup = 3.
Problem
Find all the frequent three-itemsets using the Apriori algorithm.
Given: min_sup = 20%
(The transaction data set for this exercise is not reproduced here.)
Generating Association Rules
Example of Apriori: Support threshold=50%,
Confidence= 60%
Table 1
Transaction List of items
T1 I1,I2,I3
T2 I2,I3,I4
T3 I4,I5
T4 I1,I2,I4
T5 I1,I2,I3,I5
T6 I1,I2,I3,I4
Solution:
Support threshold=50% => 0.5*6= 3 => min_sup=3
1. Count Of Each Item
Item Count
I1 4
I2 5
I3 4
I4 4
I5 2
2. Prune Step: The count table above shows that item I5 does not meet min_sup = 3,
so it is deleted; only I1, I2, I3, and I4 meet the min_sup count.
Item Count
I1 4
I2 5
I3 4
I4 4
3. Join Step: Form the 2-itemsets. From Table 1, find the occurrences of each 2-itemset.
4. Prune Step: The table below shows that the itemsets {I1, I4} and {I3, I4} do not
meet min_sup, so they are deleted.
Item Count
I1,I2 4
I1,I3 3
I1,I4 2
I2,I3 4
I2,I4 3
I3,I4 2
5. Join and Prune Step: Form the 3-itemsets. From Table 1, find the occurrences of
each 3-itemset; then, using the table below, check which of their 2-itemset subsets
satisfy min_sup.
Item Count
I1,I2 4
I1,I3 3
I2,I3 4
I2,I4 3
For the itemset {I1, I2, I3}, its subsets {I1, I2}, {I1, I3}, and {I2, I3} all occur in
the table above; thus {I1, I2, I3} is frequent.
For the itemset {I1, I2, I4}, among its subsets {I1, I2}, {I1, I4}, and {I2, I4},
the subset {I1, I4} is not frequent, as it does not occur in the table above;
thus {I1, I2, I4} is not frequent and is deleted.
Only {I1, I2, I3} is frequent.
Candidate 3-itemsets: {I1, I2, I3} (frequent); {I1, I2, I4}, {I1, I3, I4}, {I2, I3, I4} (not frequent).
6. Generate Association Rules: From the frequent itemset discovered above, the following
association rules can be generated:
{I1, I2} => {I3}
Confidence = support{I1, I2, I3} / support{I1, I2} = (3/4) × 100 = 75%
{I1, I3} => {I2}
Confidence = support{I1, I2, I3} / support{I1, I3} = (3/3) × 100 = 100%
{I2, I3} => {I1}
Confidence = support{I1, I2, I3} / support{I2, I3} = (3/4) × 100 = 75%
{I1} => {I2, I3}
Confidence = support{I1, I2, I3} / support{I1} = (3/4) × 100 = 75%
{I2} => {I1, I3}
Confidence = support{I1, I2, I3} / support{I2} = (3/5) × 100 = 60%
{I3} => {I1, I2}
Confidence = support{I1, I2, I3} / support{I3} = (3/4) × 100 = 75%
This shows that all of the above association rules are strong, given a minimum
confidence threshold of 60%.
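For reference, the same result can be reproduced with the Apriori sketch given after the pseudocode slides (again an illustration, not part of the original worked example):

# Transactions T1..T6 from Table 1; min_sup = 3 (50% of 6 transactions).
D = [{"I1", "I2", "I3"}, {"I2", "I3", "I4"}, {"I4", "I5"},
     {"I1", "I2", "I4"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3", "I4"}]
freq = apriori(D, min_sup=3)
print([(sorted(s), c) for s, c in freq.items() if len(s) == 3])
# Expected, as in the worked example: only (['I1', 'I2', 'I3'], 3).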
Generating Association Rules from
Frequent Itemsets
So far we have seen how frequent itemsets are generated from the transactions in a
database D.
These frequent itemsets are used to generate association rule patterns that are strong.
A strong association rule satisfies both minimum support and minimum confidence.
An association rule formed from a frequent itemset automatically fulfills the support
criterion.
So the question is: how do we know whether a given association rule fulfills the
minimum confidence criterion?
Generating Association Rules from
Frequent Itemsets
How?
For each frequent itemset S, generate all its nonempty proper subsets.
For every nonempty subset α, let β = S − α (so that α ∪ β = S), and output the rule
“α ⇒ β” if
support_count(S) × 100 ≥ support_count(α) × min_conf,
where min_conf is the minimum confidence threshold, given as a percentage.
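A minimal Python sketch of this procedure (an illustration under the same assumptions as the earlier sketches: frequent maps frozenset itemsets to absolute support counts, e.g. the output of the apriori sketch, and min_conf here is a fraction between 0 and 1):

from itertools import combinations

def generate_rules(frequent, min_conf):
    # For each frequent itemset S and each nonempty proper subset alpha,
    # output alpha => (S - alpha) if count(S) / count(alpha) >= min_conf.
    # Every subset of a frequent itemset is itself frequent (Apriori property),
    # so its count is available in `frequent`.
    rules = []
    for S, count_S in frequent.items():
        if len(S) < 2:
            continue
        for r in range(1, len(S)):
            for alpha in map(frozenset, combinations(S, r)):
                conf = count_S / frequent[alpha]
                if conf >= min_conf:
                    rules.append((alpha, S - alpha, conf))
    return rules

Applied to the output of the Table 1 example with min_conf = 0.6, this reproduces the six rules and confidence values computed above for {I1, I2, I3}, along with the rules derived from the frequent 2-itemsets.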
Mining Multi-Level Associations Rules
Items often form a hierarchy within a dimension.
Multilevel association rule mining refers to mining association patterns from items
at different concept levels of a dimension.
Items at the lower levels are expected to have lower support.
Rules regarding itemsets at the appropriate levels can be quite useful.
(Example concept hierarchy: food; below it milk and bread; below those skim, 2%, wheat,
and white; and brand names such as Fraser and Sunset at the lowest level.)
Mining Multi-Level Associations Rules
One of the most common algorithms for extracting multilevel association rules
is progressive deepening.
In multilevel association pattern extraction, one may follow a uniform support
or a reduced support approach as we move down the concept hierarchy when
extracting strong associations.
In multilevel association pattern extraction, two associations may be extracted
such that one pattern is more general and the other is a specific case of it.
In this case we may filter out the specific one when the general one dominates it.
We will see progressive deepening multilevel association rule mining, the idea of
uniform and reduced support, and rule filtering.
Mining Multi-Level Associations Rules
Progressive Deepening
A top-down, progressive deepening approach:
First find high-level strong rules:
milk ⇒ bread [20%, 60%].
Then find their lower-level “weaker” rules:
2% milk ⇒ wheat bread [6%, 50%].
(Concept hierarchy: food at level 0; milk and bread at level 1; skim, 2%, wheat, and
white at level 2; Fraser and Sunset at level 3.)
Mining Multi-Level Associations Rules
Progressive Deepening
Variations on mining multiple-level association rules:
Level-crossed association rules:
2% milk ⇒ wheat bread (level 2)
Association rules with multiple, alternative hierarchies:
2% milk ⇒ bread (levels 2 and 1)
(Same concept hierarchy as on the previous slide.)
Mining Multi-Level Associations Rules
Uniform Support
In this approach, the same (single) minimum support threshold is used to assess
how frequent a pattern is at all levels.
With this approach, there is no need to examine itemsets containing any item whose
ancestors do not have minimum support, since lower-level items do not occur as frequently.
Hence, it is computationally fast.
Mining Multi-Level Associations Rules
Uniform Support
Uniform support (limitations)
If the support threshold is
too high: we miss important low-level associations;
too low: we generate too many high-level associations, most of which
may not be interesting.
Mining Multi-Level Associations Rules
Uniform Support
Level 1 (min_sup = 5%): Milk [support = 10%]
Level 2 (min_sup = 5%): 2% Milk [support = 6%], Skim Milk [support = 4%]
If adopting the same min_support across multiple levels, then throw away an
association T if any of its ancestor itemsets is infrequent.
Mining Multi-Level Associations Rules
Reduced Support
In this approach, the algorithm reduces the required minimum support as we go
down to the lower concept levels (i.e., the minimum support decreases as the
level number increases).
There are different search strategies for implementing reduced-support multilevel
association rule mining (see the textbook).
Mining Multi-Level Associations Rules
Reduced Support
Level 1 (min_sup = 5%): Milk [support = 10%]
Level 2 (min_sup = 3%): 2% Milk [support = 6%], Skim Milk [support = 4%]
If adopting a reduced min_support at lower levels, then examine only those
descendants whose ancestor's support is frequent/non-negligible, using the
reduced min_support.
Mining Multi-Level Associations Rules
Redundancy Filtering
Some rules may be redundant due to “ancestor” relationships between items.
Example
milk ⇒ wheat bread [support = 8%, confidence = 70%]
2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
Note: 2% milk is a kind of milk, and milk is either 2% milk or skim milk.
Hence, we may say the first rule is an ancestor of the second rule.
A rule is redundant if its support is close to the “expected” value,
based on the rule's ancestor.
Mining Multi-Level Associations Rules
Redundancy Filtering
What is expected?
The expected support comes from the concept hierarchy: each ancestor expects its
descendants to account for some proportion of its share of the data.
For example: 50% of milk sold is skim milk and 50% is 2% milk.
Example
milk ⇒ wheat bread [support = 8%, confidence = 70%]
2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
In this case, the 2% support of the 2% milk rule is not what is expected (see the worked check below).
Hence, neither of the patterns is redundant.
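A rough worked check of that claim (added for clarity, using the assumed 50% share of 2% milk):
    expected support(2% milk ⇒ wheat bread) ≈ support(milk ⇒ wheat bread) × P(2% milk | milk) = 8% × 50% = 4%
The observed support of 2% deviates from the expected 4%, so the specialized rule carries information of its own and is not filtered out as redundant.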
Mining Multi-Level Associations Rules
Redundancy Filtering
Consider the two rules below:
milk ⇒ wheat bread [support = 8%, confidence = 70%]
2% milk ⇒ wheat bread [support = 8%, confidence = 72%]
This shows that all the milk sold is 2% milk, as both rules have the same support.
Hence the second rule is the better one to keep.
Mining Multi-Level Associations Rules
Redundancy Filtering
Consider the three rules below, with an expectation of 50% for each type of milk:
1. milk ⇒ wheat bread [support = 8%, confidence = 70%]
2. 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
3. skim milk ⇒ wheat bread [support = 6%, confidence = 69%]
This shows a significant difference between rules 1 and 2, but a match between
rules 1 and 3.
Hence rule 3 may be removed, as it does not add extra information given rule 1.
Rule 2 must be retained, as its support is not what is expected.
Mining Multi-Dimensional Association Rule
So far we have seen association rules derived from a single attribute (predicate),
such as buys (item sold).
Such an association rule is called a single-dimensional or intra-dimensional
association rule, because it contains a single distinct predicate (e.g., buys)
with multiple occurrences.
Example:
buys(X, “milk”) ⇒ buys(X, “bread”)
However, we may need association rules that involve predicates from two or more
attributes.
Association rules that involve two or more dimensions or predicates are referred
to as multidimensional association rules.
Mining Multi-Dimensional Association Rule
Multidimensional association rules with no repeated predicates are
called interdimensional association rules:
age(X, “19-25”) ∧ occupation(X, “student”) ⇒ buys(X, “coke”)
Multidimensional association rules with one or more repeated
predicates are called hybrid-dimensional association rules:
age(X, “19-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”)
Mining Multi-Dimensional Association Rule
In single-dimensional association rule mining, we search for frequent itemsets.
However, in multidimensional association rule mining, we search for frequent
k-predicate sets.
A k-predicate set is a set containing k conjunctive predicates.
For instance, the set of predicates {age, occupation, buys} is a 3-predicate set.
For example, we may have a frequent 3-predicate set
{age = “30..39”, income = “1K..2K”, buys = “laptop”} with support 30%,
which means that 30% of all the individuals in the data set are aged 30-39,
have an income in the 1K-2K range, and bought a laptop.
Mining Multi-Dimensional Association Rule
In a relational database, finding all frequent k-predicate sets will require
k table scans over the joined task-relevant data.
One more pass over the merged table may be required to check the confidence of
a given association pattern.
Mining Multi-Dimensional Association Rule
A data cube is well suited for mining, as the cells of an n-dimensional cuboid
correspond to the predicate sets.
Hence, mining from data cubes can be much faster.
(Lattice of cuboids: (); (age), (income), (buys); (age, income), (age, buys),
(income, buys); (age, income, buys).)
Mining Multi-Dimensional Association Rule
we use the notation Lk to refer to the set of frequent k-predicate sets.
If the resulting task-relevant data are stored in a relational table or a
data warehouse, then the frequent itemset mining algorithms we have
discussed (such as Apriori algorithm) can be modified easily so as to
find all frequent predicate sets
Mining Multi-Dimensional Association Rule
If the attributes used in a multidimensional association rule are quantitative
(numeric rather than nominal), they should be discretized before frequent itemset
extraction begins.
Techniques of quantitative attribute discretization for mining
multidimensional association rules can be categorized into two
basic approaches:
1. Static discretization
2. Dynamic discretization
Mining Multi-Dimensional Association Rule
Static discretization refers to discretizing a quantitative attribute using a
predefined concept hierarchy.
Example: income may be replaced by discrete intervals such as “0..200”,
“201..1000”, “1001..5000”, and “above 5000”.
Such concept hierarchies for quantitative attributes can be generated in the
preprocessing phase.
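A minimal sketch of static discretization (illustrative only), assuming income is a numeric attribute and reusing the predefined intervals from the example above:

def discretize_income(income):
    # Map a numeric income value onto a predefined concept-hierarchy interval.
    if income <= 200:
        return "0..200"
    elif income <= 1000:
        return "201..1000"
    elif income <= 5000:
        return "1001..5000"
    return "above 5000"

# e.g. discretize_income(850) -> "201..1000"; the mining step then treats the
# interval label like an ordinary nominal item value.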
Mining Multi-Dimensional Association Rule
Dynamic discretization refers to discretizing quantitative
attribute dynamically based on its attribute distribution
Usually binning is used for discretization
These bins may be further combined during the mining process.
The discretization process is dynamic and established so as to
satisfy some mining criteria, such as maximizing the confidence
of the rules mined.
Sample example is given below:
Mining Multi-Dimensional Association Rule
When numeric attributes are dynamically discretized, the following should be
taken into consideration: the confidence or compactness of the rules mined
should be maximized.
2-D quantitative association rules: Aquan1 ∧ Aquan2 ⇒ Acat
Example: age(X, “20...25”) ∧ income(X, “30K...40K”) ⇒ buys(X, “Laptop Computer”)
“Adjacent” association rules are then clustered to form more general rules
using a 2-D grid.
Mining Multi-Dimensional Association Rule
Example:
age(X, “30-34”) ∧ income(X, “24K-48K”) ⇒ buys(X, “high resolution TV”)
• In order to find such 2-D quantitative association rules, a number of algorithms
exist; one of the most common is the Association Rule Clustering System (ARCS).
ARCS (Association Rule Clustering System)
How does ARCS work?
This approach maps pairs of quantitative attributes onto a 2-D
grid for tuples satisfying a given categorical attribute condition.
The grid is then searched for clusters of points from which the
association rules are generated.
(The original slide shows such a grid of income versus age for some measurements, with a value at each grid point.)
ARCS (Association Rule Clustering System)
Three steps are involved in ARCS; these are:
1. binning,
2. finding frequent predicate sets,
3. clustering.
ARCS (Association Rule Clustering System)
1. Binning
• Create bins so that each quantitative value maps into a single bin, and record
count information for each pair of bins across the two dimensions, for each value
of the categorical attribute.
• At each grid point we then have a support count (i.e., how many tuples fall at a
given age and income combination).
ARCS (Association Rule Clustering System)
2. Find frequent predicate sets
• Scan to find the frequent predicate sets (those satisfying minimum
support) that also satisfy minimum confidence.
• Strong association rules can then be generated from these predicate
sets, using a rule generation algorithm
ARCS (Association Rule Clustering System)
3. Clustering
• In this step, neighbouring strong association rules are merged into more general
rules by looking at the distribution of the strong association rules in the 2-D space.
• Consider the strong association rules shown below:
age(X, 34) ∧ income(X, “31K...40K”) ⇒ buys(X, “HDTV”)
age(X, 35) ∧ income(X, “31K...40K”) ⇒ buys(X, “HDTV”)
age(X, 34) ∧ income(X, “41K...50K”) ⇒ buys(X, “HDTV”)
age(X, 35) ∧ income(X, “41K...50K”) ⇒ buys(X, “HDTV”)
ARCS (Association Rule Clustering System)
• These four rules can be clustered into the single, more general rule:
age(X, “34..35”) ∧ income(X, “31K..50K”) ⇒ buys(X, “HDTV”)
Limitations of ARCS
Only quantitative attributes on Left Hand Side (LHS) of rules.
Only 2 attributes on LHS. (2D limitation)
An alternative to ARCS
Non-grid-based
equi-depth binning
clustering based on a measure of partial completeness.
“Mining Quantitative Association Rules in Large
Relational Tables” by R. Srikant and R. Agrawal.
Interestingness Measurements
Objective measures
Two popular measurements:
support; and
confidence
Subjective measures (Silberschatz & Tuzhilin,
KDD95)
A rule (pattern) is interesting if
it is unexpected (surprising to the user); and/or
actionable (the user can do something with it)