
UNIT – IV

 Mining Frequent Patterns, Associations and Correlations: The
objective is the discovery of interesting associations and
correlations among itemsets in transactional and relational
databases.

 The discovery of frequent patterns, association, and correlation
relationships among huge amounts of data is useful in selective
marketing, decision analysis, and business management.

 A popular area of application is market basket analysis, which
studies the buying habits of customers by searching for sets of
items that are frequently purchased together or in sequence.
 This unit consists of:
1. What is frequent pattern analysis?
2. Efficient and scalable frequent itemset mining methods
3. Mining various kinds of association rules
4. From association mining to correlation analysis
5. Constraint-based association mining
Fig: Mining Frequent Patterns, Association and Correlations — the five
topics of this unit: (1) frequent pattern analysis, (2) mining methods,
(3) kinds of association rules, (4) association mining to correlation
analysis, (5) constraint-based association mining.
1. What is Frequent Pattern Analysis?
 Frequent Pattern: A pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set is called a
frequent pattern.

 First proposed by Agrawal, Imielinski, and Swami in the context of
frequent itemsets and association rule mining.

 Motivation:
– Finding inherent regularities in data
– What products were often purchased together? - Beer and diapers?!
– What are the subsequent purchases after buying a PC?
– What kinds of DNA are sensitive to this new drug?
– Can we automatically classify web documents?

Fig: Market Basket Analysis (e.g., milk and bread purchased together)


 Why is Frequent Pattern Mining Important?
 It discloses an intrinsic and important property of data sets
 It forms the foundation for many essential data mining tasks
– Association, correlation, and causality analysis
– Sequential, structural (Ex: sub-graph) patterns
– Pattern analysis in spatiotemporal, multimedia, time-series, and
stream data

– Classification: Associative classification


– Cluster analysis: Frequent pattern-based clustering
– Data warehousing: Iceberg cube and cube-gradient
– Semantic data compression: fascicles (a fascicle is a bundle or cluster)
– Broad applications
 A transaction DB is a collection of sets of items
(transactions).
 An itemset is a set of items.
 An association rule is an implication of the form X=>Y,
with X and Y itemsets.

 Support count (SC) of an itemset X is the number of
transactions that contain X.
 Support of X (also the frequency of X) = SC(X) / SC({ }), where
SC({ }) is the total number of transactions, since every transaction
contains the empty itemset.
 Support of an association rule X => Y equals the support of X U Y.
 Confidence of an association rule X => Y =
Support(X => Y) / Support(X) = SC(X U Y) / SC(X).
(These definitions are illustrated in the sketch below.)
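To make the definitions concrete, here is a minimal Python sketch (the
helper names and the tiny transaction database are illustrative, not from
the text) that computes support and confidence exactly as defined above.

```python
def support_count(db, itemset):
    """SC(X): number of transactions that contain every item of X."""
    items = set(itemset)
    return sum(1 for t in db if items <= set(t))

def support(db, itemset):
    """Support of X = SC(X) / total number of transactions."""
    return support_count(db, itemset) / len(db)

def confidence(db, lhs, rhs):
    """Confidence of lhs => rhs = SC(lhs U rhs) / SC(lhs)."""
    return support_count(db, set(lhs) | set(rhs)) / support_count(db, lhs)

# Hypothetical transaction database (each transaction is a set of items).
db = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread"}, {"milk", "butter"}]
print(support(db, {"milk", "bread"}))       # 2/4 = 0.5
print(confidence(db, {"milk"}, {"bread"}))  # 2/3 ≈ 0.67
```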
Association rule mining - Two-step Process

1. Find all frequent itemsets: by definition, each of these itemsets
occurs at least as frequently as a predetermined minimum support
count, min_sup.

2. Generate strong association rules from the frequent itemsets: by
definition, these rules must satisfy minimum support and minimum
confidence.
Example

Transaction-id | Items bought
10             | A, B, D
20             | A, C, D
30             | A, D, E
40             | B, E, F
50             | B, C, D, E, F

 Itemset X = {x1, …, xk}
 Find all the rules X => Y with minimum support and confidence
– support, s: probability that a transaction contains X U Y
– confidence, c: conditional probability that a transaction having X
also contains Y
 Let supmin = 50%, confmin = 50%
 Frequent patterns: {A:3, B:3, D:4, E:3, AD:3}
 Association rules:
A => D (support 60%, confidence 100%)
D => A (support 60%, confidence 75%)
Classification of Frequent Pattern Mining Ways
 Market basket analysis is just one form of frequent pattern mining.
 Frequent pattern mining can be classified in various ways, based on the
following criteria:
a) Based on the completeness of patterns to be mined
Complete set of Frequent Itemsets
Closed Frequent Itemsets and Maximal Frequent Itemsets
Constrained Frequent Itemsets, Approximate FI Sets
Near-match frequent itemsets, Top-k Frequent Itemsets
b) Based on the levels of abstraction involved in the rule set
buys(X, "computer") => buys(X, "HP printer")
buys(X, "laptop computer") => buys(X, "HP printer")
c) Based on the number of data dimensions involved in the rule
buys(X, "computer") => buys(X, "antivirus software")
age(X, "30-39") ^ income(X, "42K-48K") => buys(X, "HDTV")
Classification of Frequent Pattern Mining Ways (contd.)
d) Based on the types of values handled in the rule
 Boolean association rules and quantitative association rules
e) Based on the kinds of rules to be mined
 Association rules, correlation rules, strong gradient relationships
 Eg: The average sales of Sony_Digital_Camera increase over 16%
when sold together with Sony_Laptop_Computer
f) Based on the kinds of patterns to be mined
 Frequent itemset mining
 Sequential pattern mining
 Structured pattern mining
Advantages
 Helps increase business in the marketplace.
 Useful in stock market analysis.
 Both Apriori and FP-growth aim to find the complete set of
frequent patterns.
 FP-growth is more efficient and scalable than Apriori with respect
to prolific and long patterns.
Applications

 Basket data analysis
 Cross-marketing
 Catalog design
 Sale campaign analysis
 Web log (click stream) analysis
 DNA sequence analysis
2. SCALABLE METHODS FOR MINING FREQUENT PATTERNS

 A pattern (a set of items, subsequences, substructures,
etc.) that occurs frequently in a data set is called a frequent
pattern.
 Many efficient and scalable algorithms have been
developed for frequent itemset mining from which
association and correlation rules can be derived.

 The downward closure property of frequent patterns
– Any subset of a frequent itemset must also be frequent
– If {beer, diaper, nuts} is frequent, so is {beer, diaper}, i.e.,
every transaction having {beer, diaper, nuts} also contains
{beer, diaper} (see the pruning sketch below)
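As a small illustration of how this property is used for pruning, the
hypothetical helper below rejects a candidate k-itemset as soon as one of
its (k-1)-subsets is not frequent (a sketch only; the names are mine, not
from the text).

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """Return True if any (k-1)-subset of `candidate` is not in L_{k-1}.

    By the downward-closure property, such a candidate cannot be frequent
    and can be pruned without counting its support.
    """
    k = len(candidate)
    return any(frozenset(s) not in frequent_prev
               for s in combinations(candidate, k - 1))

# Hypothetical L2 (frequent 2-itemsets).
L2 = {frozenset({"beer", "diaper"}), frozenset({"beer", "nuts"}),
      frozenset({"diaper", "nuts"})}
print(has_infrequent_subset(("beer", "diaper", "nuts"), L2))   # False -> keep
print(has_infrequent_subset(("beer", "diaper", "milk"), L2))   # True  -> prune
```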
 Efficient and scalable frequent itemset mining methods
a) Finding Frequent Itemsets Using Candidate Generation
b) Generating Association Rules from Frequent Itemsets
c) Improving the Efficiency of Apriori
d) Mining Frequent Itemsets without Candidate Generation
e) Mining Frequent Itemsets Using Vertical Data Format
f) Mining Closed Frequent Itemsets

 Scalable mining methods: three major approaches
1. Apriori algorithm – Agrawal & Srikant
2. Frequent pattern growth (FP-growth) – Han, Pei & Yin
3. Vertical data format approach (CHARM) – Zaki & Hsiao

1 - Apriori Algorithm: a seminal algorithm for mining frequent
itemsets for Boolean association rules.

 Proposed by R. Agrawal and R. Srikant in 1994.
 Apriori uses a "bottom-up" approach, where frequent subsets are
extended one item at a time (a step known as candidate generation),
and groups of candidates are tested against the data.

 The Apriori algorithm is an efficient algorithm for finding all
frequent itemsets.
 The Apriori algorithm implements a level-wise search using the
frequent itemset (Apriori) property.
 It is used to find the frequent itemsets of the given data sets with
candidate generation.
 The algorithm terminates when no further successful extensions
are found.
 Apriori uses breadth-first search and a hash tree structure to count
candidate itemsets efficiently.
 It is an iterative, level-wise search, where k-itemsets are used to
explore (k+1)-itemsets.
 The Apriori algorithm can be further optimized (see "Improving
the Efficiency of Apriori" below).
 There are many interestingness measures for association rules.

 Apriori property: all nonempty subsets of a frequent itemset
must also be frequent.
Ex: 1 – Find the frequent patterns with minimum support count = 2 (Supmin = 2)

Database TDB
Tid | Items
10  | A, C, D
20  | B, C, E
30  | A, B, C, E
40  | B, E

1st scan → C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
           L1: {A}:2, {B}:3, {C}:3, {E}:3        ({D} is dropped: support 1 < 2)

C2 (join L1 with itself): {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan → C2 counts: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
           L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

C3 (join L2 with itself, then prune): {B,C,E}
3rd scan → L3: {B,C,E}:2
Ex: 2 – Construct the Apriori frequent itemsets with minimum support count = 2.
(Data set and worked solution given as figures.)
Ex: 3 – Find the frequent patterns with minimum support = 50% (i.e., support count 2)

Database D
TID | Items
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5

Scan D → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
         L1: {1}:2, {2}:3, {3}:3, {5}:3

C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D → C2 counts: {1 2}:1, {1 3}:2, {1 5}:1, {2 3}:2, {2 5}:3, {3 5}:2
         L2: {1 3}:2, {2 3}:2, {2 5}:3, {3 5}:2

C3: {2 3 5}
Scan D → L3: {2 3 5}:2
The Apriori algorithm consists of:

1. Finding frequent itemsets using candidate generation.
2. Generating association rules from frequent itemsets.
3. Improving the efficiency of Apriori.
1. Finding Frequent Itemsets Using Candidate
Generation
 Apriori property: All nonempty subsets of a frequent
itemset must also be frequent.
 A two-step process is followed, consisting of join and
prune actions.

1. The join step: to find Lk, a set of candidate k-itemsets Ck is
generated by joining Lk-1 with itself.

2. The prune step: Ck is a superset of Lk; its members may or may
not be frequent, but all of the frequent k-itemsets are included in Ck.
Candidates having an infrequent (k-1)-subset are pruned using the
Apriori property. A sketch of both steps follows.
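The following Python sketch illustrates the join and prune steps together
with the surrounding level-wise loop. It is an illustrative sketch, not the
textbook pseudocode: itemsets and transactions are represented as
frozensets, and the helper names are mine.

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Join L_{k-1} with itself, then prune candidates that have an
    infrequent (k-1)-subset (Apriori property)."""
    prev = set(L_prev)                         # frozensets of size k-1
    prev_list = sorted(prev, key=sorted)
    candidates = set()
    # Join step: keep unions of two (k-1)-itemsets that have exactly k items.
    for i in range(len(prev_list)):
        for j in range(i + 1, len(prev_list)):
            union = prev_list[i] | prev_list[j]
            if len(union) == k:
                candidates.add(union)
    # Prune step: every (k-1)-subset of a surviving candidate must be frequent.
    return {c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))}

def apriori(db, min_sup_count):
    """Level-wise search: use L_k to explore C_{k+1} until no candidates survive."""
    db = [frozenset(t) for t in db]
    items = {frozenset([i]) for t in db for i in t}
    L = {c for c in items if sum(c <= t for t in db) >= min_sup_count}
    k, all_frequent = 2, set(L)
    while L:
        Ck = apriori_gen(L, k)
        L = {c for c in Ck if sum(c <= t for t in db) >= min_sup_count}
        all_frequent |= L
        k += 1
    return all_frequent

# Hypothetical run on the TDB of Example 1 (minimum support count = 2).
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(sorted(sorted(s) for s in apriori(db, 2)))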
2. Generating Association Rules from
Frequent Itemsets
Association rule mining consists of first finding frequent itemsets
(sets of items, such as A and B, satisfying a minimum support
threshold, i.e., a percentage of the task-relevant tuples), from which
strong association rules of the form A => B are generated.

 These rules also satisfy a minimum confidence threshold (a
prespecified probability of satisfying B under the condition that A
is satisfied).
 A rule is an implication expression of the form X => Y, where X
and Y are itemsets. Eg: {Milk, Diaper} => {Beer}
 Strong association rules satisfy both minimum support and
minimum confidence.
 Confidence(A => B) = P(B|A) = Support count(A U B) /
Support count(A)

 Association rules can be generated as follows (see the sketch below):
1. For each frequent itemset l, generate all nonempty proper subsets of l.
2. For every nonempty proper subset s of l, output the rule "s => (l - s)"
if support count(l) / support count(s) >= min_conf, where min_conf is
the minimum confidence threshold.
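A sketch of this rule-generation step in Python, assuming the support
counts of all frequent itemsets are already available (the hard-coded
counts below are consistent with the confidences in the AllElectronics
example that follows; the function and variable names are illustrative).

```python
from itertools import combinations

def generate_rules(support_counts, min_conf):
    """For each frequent itemset l, emit every rule s => (l - s) whose
    confidence = SC(l) / SC(s) meets the minimum confidence threshold."""
    rules = []
    for l, sc_l in support_counts.items():
        if len(l) < 2:
            continue
        for r in range(1, len(l)):
            for s in combinations(l, r):
                s = frozenset(s)
                conf = sc_l / support_counts[s]
                if conf >= min_conf:
                    rules.append((set(s), set(l - s), conf))
    return rules

# Hypothetical support counts (consistent with the example below).
support_counts = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}
for lhs, rhs, conf in generate_rules(support_counts, 0.7):
    print(lhs, "=>", rhs, f"(confidence {conf:.0%})")
```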
 Generating association rules. Ex: The transactional data for All
Electronics.
 Suppose the data contain the frequent itemset l = {I1, I2, I5}. What
are the association rules that can be generated from l? The
nonempty subsets of l are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2},
and {I5}.
 The resulting association rules are as shown below, each listed
with its confidence:
• I1 ^ I2 => I5, confidence = 2/4 = 50%
• I1 ^ I5 => I2, confidence = 2/2 = 100%
• I2 ^ I5 => I1, confidence = 2/2 = 100%
• I1 => I2 ^ I5, confidence = 2/6 = 33%
• I2 => I1 ^ I5, confidence = 2/7 = 29%
• I5 => I1 ^ I2, confidence = 2/2 = 100%
 If the minimum confidence threshold is, say, 70%, then only the
second, third, and last rules above are output, because these are the
only ones generated that are strong.
3. Improving the Efficiency of Apriori

Many variations of the Apriori algorithm have been proposed to
improve its efficiency:
i. Hash-based technique: hashing itemsets into corresponding
buckets (sketched below).
ii. Transaction reduction: reducing the number of transactions
scanned in future iterations.
iii. Partitioning: partitioning the data to find candidate itemsets.
iv. Sampling: mining on a subset of the given data.
v. Dynamic itemset counting: adding candidate itemsets at
different points during a scan.
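As one illustration, a minimal sketch of the hash-based technique (i):
while the first scan counts 1-itemsets, all 2-itemsets of each transaction
are hashed into buckets, and a candidate 2-itemset whose bucket count is
below the minimum support count cannot be frequent and is pruned early.
The bucket count and hash function here are arbitrary choices, not part of
the original algorithm description.

```python
from itertools import combinations

def hash_bucket_counts(db, num_buckets=8):
    """Scan once: hash every 2-itemset of every transaction into a bucket."""
    buckets = [0] * num_buckets
    for t in db:
        for pair in combinations(sorted(t), 2):
            buckets[hash(pair) % num_buckets] += 1
    return buckets

def prune_c2(candidates, buckets, min_sup_count, num_buckets=8):
    """A candidate 2-itemset whose bucket count is below min_sup_count
    cannot be frequent, so it is pruned before the second full scan."""
    return [c for c in candidates
            if buckets[hash(tuple(sorted(c))) % num_buckets] >= min_sup_count]

# Hypothetical data: the TDB from Example 1, minimum support count = 2.
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
buckets = hash_bucket_counts(db)
print(prune_c2([("A", "B"), ("B", "E")], buckets, min_sup_count=2))
```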
Advantages
 Uses large itemset property.
 Easily parallelized.
 Easy to implement.

Disadvantages
 Multiple scans of the transaction database.
 Huge number of candidates.
 Tedious workload of support counting for candidates.
 Assumes the transaction database is memory resident.
 Requires many database scans.
Ideas for improvement
 Reduce the number of passes over the transaction database.
 Shrink the number of candidates.
 Facilitate the support counting of candidates.
Applications

 Basket data analysis
 Cross-marketing
 Catalog design
 Sale campaign analysis
 Web log (click stream) analysis
 DNA sequence analysis
3-Mining Frequent Patterns Without Candidate
Generation – FP Growth Algorithm
 It was proposed by Han, Pei & Yin in 2000.
 Frequent pattern growth (FP-growth) is a method of mining
frequent itemsets without candidate generation.
 It constructs a highly compact data structure (an FP-tree) to
compress the original transaction database.
 It focuses on frequent pattern (fragment) growth, which avoids
costly candidate generation, resulting in greater efficiency.
 It follows the frequent pattern growth approach: create the root
of the tree, labeled "null", and grow branches by inserting each
transaction as a prefix path of the tree.
 The FP-growth algorithm adopts a divide-and-conquer strategy.
 It avoids generating large candidate sets, leading to a good
performance gain.
 It compresses the database of frequent items into a frequent-
pattern tree (FP-tree), which retains the itemset association
information.

 FP-growth proceeds as follows (sketched in code below):
1. Scan the DB once and find the frequent 1-itemsets (single-item patterns).
2. Sort the frequent items in descending order of frequency (the f-list).
3. Scan the DB again and construct the FP-tree.
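A condensed Python sketch of these three steps, plus a helper that derives
an item's conditional pattern base from the finished tree. The classes and
names are simplified placeholders under the assumptions of the example
below (AllElectronics data, minimum support count 2); this is not a full
FP-growth implementation.

```python
from collections import Counter, defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 1, {}

def build_fp_tree(db, min_sup_count):
    # Step 1: scan the DB once and find the frequent 1-itemsets.
    counts = Counter(item for t in db for item in t)
    freq = {i: c for i, c in counts.items() if c >= min_sup_count}
    root = FPNode(None, None)
    header = defaultdict(list)            # item -> list of nodes (node-links)
    for t in db:
        # Step 2: keep frequent items, sorted by descending frequency (f-list order).
        ordered = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
        # Step 3: insert the ordered transaction into the tree, sharing prefixes.
        node = root
        for item in ordered:
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
    return root, header

def conditional_pattern_base(item, header):
    """Prefix paths ending at `item`, each carrying that node's count."""
    base = []
    for node in header[item]:
        path, p = [], node.parent
        while p is not None and p.item is not None:
            path.append(p.item)
            p = p.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

# Hypothetical run on the AllElectronics transactions (min support count = 2).
db = [{"I1","I2","I5"}, {"I2","I4"}, {"I2","I3"}, {"I1","I2","I4"}, {"I1","I3"},
      {"I2","I3"}, {"I1","I3"}, {"I1","I2","I3","I5"}, {"I1","I2","I3"}]
root, header = build_fp_tree(db, 2)
print(conditional_pattern_base("I5", header))   # [(['I2','I1'], 1), (['I2','I1','I3'], 1)]
```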
Eg: 1. AllElectronics branch transactional data, mined with the
FP-growth algorithm, minimum support count = 2.

Count how many times each item occurs in the table
(item counts: I2:7, I1:6, I3:6, I4:2, I5:2).
Order the items of each transaction by descending frequency (priority)

TID  | ITEMS        | ORDERED (by priority)
T100 | I1,I2,I5     | I2,I1,I5
T200 | I2,I4        | I2,I4
T300 | I2,I3        | I2,I3
T400 | I1,I2,I4     | I2,I1,I4
T500 | I1,I3        | I1,I3
T600 | I2,I3        | I2,I3
T700 | I1,I3        | I1,I3
T800 | I1,I2,I3,I5  | I2,I1,I3,I5
T900 | I1,I2,I3     | I2,I1,I3
Step 1: Create the root of the tree, labeled "null".
Insert T100 (ordered: I2, I1, I5) as the path null → I2:1 → I1:1 → I5:1.
Step 2: Scan the DB for the second transaction, T200 (ordered: I2, I4).
The prefix I2 is shared, so its count becomes I2:2, and a new child I4:1
is added under I2 (the existing branch I1:1 → I5:1 is unchanged).
Step 3: Scan the DB for the third transaction, T300 (ordered: I2, I3).
I2's count becomes I2:3 and a new child I3:1 is added under I2.
Step 4: Scan the DB for the fourth transaction, T400 (ordered: I2, I1, I4).
The shared prefix counts become I2:4 and I1:2, and a new child I4:1 is
added under I1.
Step 5: Scan the DB for the fifth transaction, T500 (ordered: I1, I3).
There is no shared prefix with I2, so a new branch I1:1 → I3:1 is created
directly under the root.
Step 6: Scan the DB for the sixth transaction, T600 (ordered: I2, I3).
I2's count becomes I2:5 and the I3 child of I2 becomes I3:2.
Step 7: Scan the DB for the seventh transaction, T700 (ordered: I1, I3).
The root-level branch becomes I1:2 → I3:2.
Step 8: Scan the DB for the eighth transaction, T800 (ordered: I2, I1, I3, I5).
The counts along the shared prefix are incremented (I2:6, I1:3) and the
path is extended with I3 and I5.
Step 9: Scan the DB for the ninth transaction, T900 (ordered: I2, I1, I3),
incrementing the counts along its path.

Fig: The completed FP-tree registers compressed frequent-pattern information.
Fig: Mining the FP-tree by creating conditional (sub-)pattern bases.
E.g. 3: A database has five transactions. Let min_sup = 60% and
min_conf = 80%; mine it with the FP-growth algorithm (minimum
support count = 3).

Support: 60% of 5 transactions = (60 × 5) / 100 = 3 transactions.

Confidence: first identify the possible subsets S of each frequent
itemset L and form rules S => (L - S); then calculate each rule's
confidence:

Confidence(A => B) = (# of tuples containing both A and B) /
                     (# of tuples containing A)
Fig: The FP-growth algorithm for discovering frequent itemsets
without candidate generation.
Advantages
1. No candidate generation.
2. No candidate testing.
3. No repeated scans of the entire database.
4. Never breaks a long pattern of any transaction.
5. Reduces irrelevant information: infrequent items are removed.
6. Items are kept in frequency-descending order.
7. The FP-tree is never larger than the original database.
8. Preserves complete information for frequent pattern mining.
4- Vertical Data Format (ECLAT)
 The vertical data format approach (CHARM) was proposed by Zaki &
Hsiao in 2000.

 Mining frequent itemsets using the vertical data format (ECLAT, i.e.,
Equivalence CLass Transformation) is a method that transforms a given
data set of transactions from the horizontal data format of TID-itemset
into the vertical data format of item-TID_set.

 Frequent itemsets can also be mined efficiently using the vertical data
format, which is the essence of the ECLAT algorithm.

 Data can also be presented in item-TID_set format (that is,
{item : TID_set}), where item is an item name and TID_set is the set of
identifiers of the transactions containing the item; this format is known
as the vertical data format.
Eg: 1. AllElectronics branch transactional data, mined with ECLAT,
minimum support count = 2.

 It mines the transformed data set by TID_set intersections, based on
the Apriori property and additional optimization techniques such as
diffsets.

Step 1: Transform the transaction data set D into the vertical data
format of 1-itemsets (item → TID_set).
Step 2: Intersect the TID_sets of frequent 1-itemsets to obtain the
frequent 2-itemsets.
Step 3: Intersect the TID_sets of frequent 2-itemsets to obtain the
frequent 3-itemsets. (The tables for each step are given as figures;
a small sketch of the idea follows.)
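A minimal Python sketch of this transformation and of support computation
by TID_set intersection. The data and names are illustrative; only frequent
2-itemsets are shown, but longer itemsets are obtained the same way by
intersecting further.

```python
from collections import defaultdict
from itertools import combinations

def to_vertical(db):
    """Horizontal (TID -> itemset) format to vertical (item -> TID_set) format."""
    vertical = defaultdict(set)
    for tid, items in db.items():
        for item in items:
            vertical[item].add(tid)
    return vertical

def frequent_pairs(vertical, min_sup_count):
    """Support of a 2-itemset = size of the intersection of its TID_sets."""
    out = {}
    for a, b in combinations(sorted(vertical), 2):
        tids = vertical[a] & vertical[b]
        if len(tids) >= min_sup_count:
            out[(a, b)] = tids
    return out

# Hypothetical horizontal database (TID -> items), minimum support count = 2.
db = {"T100": {"I1", "I2", "I5"}, "T200": {"I2", "I4"}, "T300": {"I2", "I3"},
      "T400": {"I1", "I2", "I4"}, "T500": {"I1", "I3"}}
vertical = to_vertical(db)
print(frequent_pairs(vertical, 2))
```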
Mining Closed Frequent Itemsets
 Frequent itemset mining may generate a huge number of frequent
itemsets, especially when the min_sup threshold is set low or when
there exist long patterns in the data set.

 Closed frequent itemsets: An itemset X is closed in a data set S if
there exists no proper super-itemset Y such that Y has the same
support count as X in S. X is a closed frequent itemset in S if X is
both closed and frequent in S.

 Maximal frequent itemsets: An itemset X is a maximal frequent
itemset (or max-itemset) in set S if X is frequent, and there exists
no super-itemset Y such that X ⊂ Y and Y is frequent in S.

 Example (two transactions {a1, ..., a100} and {a1, ..., a50}, min_sup = 1):
Closed frequent itemsets C = {{a1, a2, ..., a100} : 1; {a1, a2, ..., a50} : 2}
Maximal frequent itemsets M = {{a1, a2, ..., a100} : 1}

 It is more desirable in most cases to mine the set of closed frequent
itemsets rather than the set of all frequent itemsets, since the closed
set is much smaller yet loses no support information (see the sketch
below).
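Given the supports of all frequent itemsets, closed and maximal itemsets
follow directly from the definitions. The sketch below is illustrative
only; the tiny `freq` dictionary is made up.

```python
def closed_and_maximal(freq):
    """`freq` maps frozenset -> support count for every frequent itemset.

    Closed:  no proper superset has the same support.
    Maximal: no proper superset is frequent at all.
    """
    closed, maximal = set(), set()
    for x, sup_x in freq.items():
        supersets = [y for y in freq if x < y]
        if all(freq[y] != sup_x for y in supersets):
            closed.add(x)
        if not supersets:
            maximal.add(x)
    return closed, maximal

# Hypothetical supports, in the spirit of the {a1..a100}:1, {a1..a50}:2 example.
freq = {frozenset({"a"}): 3, frozenset({"a", "b"}): 3, frozenset({"a", "b", "c"}): 1}
closed, maximal = closed_and_maximal(freq)
print(closed)    # {a,b} and {a,b,c} are closed; {a} is not ({a,b} has the same support)
print(maximal)   # only {a,b,c} is maximal
```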
Example (mining max-patterns)

Tid | Items
10  | A, B, C, D, E
20  | B, C, D, E
30  | A, C, D, F

 1st scan: find frequent items
– A, B, C, D, E
 2nd scan: find support for the potential max-patterns
– AB, AC, AD, AE, ABCDE
– BC, BD, BE, BCDE
– CD, CE, DE, CDE
 Since BCDE is a max-pattern, there is no need to check BCD, BDE,
CDE in a later scan.
 Techniques for mining closed frequent itemsets include item merging,
sub-itemset pruning, and item skipping, as well as efficient subset
checking of generated itemsets in a pattern-tree.
5. Mining Various Kinds of Association Rules
 Multilevel association rules involve concepts at different levels
of abstraction.
 Multidimensional association rules involve more than one
dimension or predicate (e.g., rules relating what a customer buys
as well as the customer’s age).
 Quantitative association rules involve numeric attributes that
have an implicit ordering among values (e.g., age).

It consists of:
1. Mining Multilevel Association Rules
2. Mining Multidimensional Association Rules
3. Mining Multidimensional Association Rules - Static Discretization of
Quantitative Attributes
4. Mining Quantitative Association Rules - ARCS

Example-1: Mining Association Rules

Tid | Items bought
10  | Beer, Nuts, Diaper
20  | Beer, Coffee, Diaper
30  | Beer, Diaper, Eggs
40  | Nuts, Eggs, Milk
50  | Nuts, Coffee, Diaper, Eggs, Milk

 Find all the rules X => Y with minimum support and confidence
– Support, s: probability that a transaction contains X U Y
– Confidence, c: conditional probability that a transaction having X
also contains Y
 Let minsup = 50%, minconf = 50%
 Frequent patterns: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
 Association rules (many more exist):
Beer => Diaper (support 60%, confidence 100%)
Diaper => Beer (support 60%, confidence 75%)
Example-2: Mining Association Rules (min. support 50%, min. confidence 50%)

Transaction ID | Items Bought
2000           | A, B, C
1000           | A, C
4000           | A, D
5000           | B, E, F

Frequent Itemset | Support
{A}              | 75%
{B}              | 50%
{C}              | 50%
{A, C}           | 50%

 For the rule A => C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%
1. Mining Multi-Level Association Rules
 A concept hierarchy defines a sequence of mappings from a set of low-level
concepts to higher level, more general concepts.
 Data can be generalized by replacing low-level concepts within the data by their
higher-level concepts, or ancestors, from a concept hierarchy (a small sketch follows).
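A small sketch of generalization with a concept hierarchy: each low-level
item is replaced by its ancestor before (or while) mining, so higher-level
itemsets such as {computer, printer} accumulate the support of all their
descendants. The hierarchy dictionary below is hypothetical.

```python
# Hypothetical concept hierarchy: low-level item -> higher-level ancestor.
hierarchy = {
    "IBM laptop computer": "laptop computer",
    "laptop computer": "computer",
    "desktop computer": "computer",
    "HP printer": "printer",
}

def generalize(item, level_up=1):
    """Climb the hierarchy `level_up` steps (stay put if no ancestor exists)."""
    for _ in range(level_up):
        item = hierarchy.get(item, item)
    return item

def generalize_db(db, level_up=1):
    return [{generalize(i, level_up) for i in t} for t in db]

db = [{"IBM laptop computer", "HP printer"}, {"desktop computer", "HP printer"}]
print(generalize_db(db, level_up=1))  # [{'laptop computer', 'printer'}, {'computer', 'printer'}]
print(generalize_db(db, level_up=2))  # [{'computer', 'printer'}, {'computer', 'printer'}]
```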
The multilevel association rule mining is done based on variations at level wise
search.
Three variation are:
i. Using uniform minimum support for all levels (referred to as uniform
support): The same minimum support threshold is used when mining at each
level of abstraction.
ii. Using reduced minimum support at lower levels (referred to as reduced
support): Each level of abstraction has its own minimum support threshold.
This is a better approach than uniform support, because more itemsets can
be generated at the lower levels.
iii. Using item or group-based minimum support (referred to as group-based
support): Because users or experts often have insight as to which groups are
more important than others, it is sometimes more desirable to set up user-
specific, item, or group based minimal support thresholds when mining
multilevel rules.

Problem (redundant multilevel rules): a low-level item is a descendant of its
ancestor (e.g., "laptop computer" has the ancestor "computer"), so a rule at a
lower level may carry little new information beyond its ancestor rule. For example:
 buys(X, “laptop computer”) => buys(X, “HP printer”) [support = 8%, confidence = 70%]
 buys(X, “IBM laptop computer”) => buys(X, “HP printer”) [support = 2%, confidence = 72%]
2. Mining Multi-Dimensional Association
 Multidimensional association rules involve more than one
dimension or predicate (Ex: Rules relating what a
customer buys as well as the customer’s age.).
 Multidimensional association rules with no repeated predicates
are called interdimensional association rules.
 Mine multidimensional association rules with repeated predicates,
which contain multiple occurrences of some predicates, these
rules are called Hybrid-dimensional association rules.

 Categorical attributes: finite number of possible values, no ordering
among the values; handled with the data cube approach.
 Quantitative attributes: numeric, with an implicit ordering among
values; handled with discretization, clustering, and gradient approaches.
 Types of multi-dimensional rules:
1. Single-dimensional / intra-dimensional rules:
 buys(X, “milk”) => buys(X, “bread”)
2. Multi-dimensional / inter-dimensional rules: >= 2 dimensions or predicates
a) Inter-dimension association rules (no repeated predicates):
 age(X, ”19-25”) ^ occupation(X, “student”) => buys(X, “coke”)
b) Hybrid-dimension association rules (repeated predicates):
 age(X, ”19-25”) ^ buys(X, “popcorn”) => buys(X, “coke”)
3. Hybrid-dimensional rule:
age(X, “20…29”) ^ buys(X, “laptop”) => buys(X, “HP printer”)
3. Mining Multidimensional Associations - Static Discretization

 Quantitative attributes, in this case, are discretized before mining
using predefined concept hierarchies or data discretization
techniques, where numeric values are replaced by interval labels.
4. Mining Quantitative Association Rules - ARCS
 Quantitative association rules are multidimensional association rules
in which the numeric attributes are dynamically discretized during
the mining process so as to satisfy some mining criteria, such as
maximizing the confidence or compactness of the rules mined.
 Techniques can be categorized by how numerical attributes, such as
age or salary, are treated.
 ARCS (Association Rule Clustering System):
 ARCS uses equal-width binning, where the bin size for each
quantitative attribute is input by the user. Common binning strategies
(the first two are sketched below) are:
1. Equal-width binning: the interval size of each bin is the same.
2. Equal-frequency binning: each bin has approximately the same
number of tuples assigned to it.
3. Clustering-based binning: clustering is performed on the
quantitative attribute to group neighboring points (judged based on
various distance measures) into the same bin.
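A small sketch of equal-width and equal-frequency binning for a numeric
attribute such as age (pure Python; the bin count and data values are
made up for illustration).

```python
def equal_width_bins(values, k):
    """Split the value range into k intervals of equal size."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k + 1)]
    edges[-1] = hi                      # guard against floating-point drift
    return edges

def equal_frequency_bins(values, k):
    """Choose edges so that each bin holds roughly the same number of tuples."""
    s = sorted(values)
    n = len(s)
    return [s[0]] + [s[(i * n) // k] for i in range(1, k)] + [s[-1]]

ages = [23, 25, 29, 31, 34, 35, 38, 41, 45, 52, 58, 63]
print(equal_width_bins(ages, 4))        # [23.0, 33.0, 43.0, 53.0, 63]
print(equal_frequency_bins(ages, 4))    # 3 tuples per bin: [23, 31, 38, 52, 63]
```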
Ex: Clustering the association rules. Given the rules:
age(X, 34) ^ income(X, “31K…40K”) => buys(X, “HDTV”)
age(X, 35) ^ income(X, “31K…40K”) => buys(X, “HDTV”)
age(X, 34) ^ income(X, “41K…50K”) => buys(X, “HDTV”)
age(X, 35) ^ income(X, “41K…50K”) => buys(X, “HDTV”)
these rules cluster together in the grid and can be combined into the single rule
age(X, “34…35”) ^ income(X, “31K…50K”) => buys(X, “HDTV”).

Fig: A 2-D grid for tuples representing customers who purchase high-definition TVs