Efficient Sequential Pattern Mining

1) The document outlines PrefixSpan, an algorithm for efficiently mining sequential patterns. 2) PrefixSpan uses a pattern-growth approach called prefix-projection, which avoids candidate generation and needs only one scan of each projected database per recursive step. 3) The algorithm recursively projects a sequence database into smaller projected databases based on prefixes, and grows subsequences within each projected database.

Uploaded by

Ghiffari Agsarya


PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth

Authors:
  Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, Qiming Chen, Umeshwar Dayal, Mei-Chun Hsu
Presenter:
  Wojciech Stach

Outline
  Mining Sequential Patterns
    Problem statement
    Definitions & examples
    Strategies
  PrefixSpan algorithm
    Motivation
    Definitions & examples
    Algorithm
    Example
    Performance study
  Conclusions

Sequential Pattern Mining

Given
  a set of sequences, where each sequence consists of a list of elements and each element consists of a set of items
  a user-specified min_support threshold
Find all the frequent subsequences, i.e. the subsequences whose occurrence frequency in the set of sequences is no less than min_support

Notation
  <a(abc)(ac)d(cf)> has 5 elements and 9 items
  <a(abc)(ac)d(cf)> is a 9-sequence
  <a(abc)(ac)d(cf)> = <a(cba)(ac)d(cf)>   (item order within an element does not matter)
  <a(abc)(ac)d(cf)> ≠ <a(ac)(abc)d(cf)>   (element order matters)

Sequence database, min_support = 2
  id  Sequence
  10  <a(abc)(ac)d(cf)>
  20  <(ad)c(bc)(ae)>
  30  <(ef)(ab)(df)cb>
  40  <eg(af)cbc>

Solution: 53 frequent subsequences
  <a> <aa> <ab> <a(bc)> <a(bc)a> <aba> <abc> <(ab)> <(ab)c> <(ab)d> <(ab)f> <(ab)dc> <ac> <aca> <acb> <acc> <ad> <adc> <af>
  <b> <ba> <bc> <(bc)> <(bc)a> <bd> <bdc> <bf>
  <c> <ca> <cb> <cc>
  <d> <db> <dc> <dcb>
  <e> <ea> <eab> <eac> <eacb> <eb> <ebc> <ec> <ecb> <ef> <efb> <efc> <efcb>
  <f> <fb> <fbc> <fc> <fcb>
Subsequence vs. super sequence

Given two sequences α = <a1 a2 … an> and β = <b1 b2 … bm>
  α is called a subsequence of β, denoted as α ⊑ β, if there exist integers 1 ≤ j1 < j2 < … < jn ≤ m such that a1 ⊆ bj1, a2 ⊆ bj2, …, an ⊆ bjn
  β is then a super sequence of α

Examples for β = <a(abc)(ac)d(cf)>
  Subsequences of β:
    α1 = <aa(ac)d(c)>
    α2 = <(ac)(ac)d(cf)>
    α3 = <ac>
  Not subsequences of β:
    α4 = <df(cf)>
    α5 = <(cf)d>
    α6 = <(abc)dcf>

Sequence Support Count

A sequence database is a set of tuples <sid, s>
A tuple <sid, s> is said to contain a sequence α if α is a subsequence of s, i.e., α ⊑ s
The support of a sequence α is the number of tuples containing α

  id  Sequence
  10  <a(abc)(ac)d(cf)>
  20  <(ad)c(bc)(ae)>
  30  <(ef)(ab)(df)cb>
  40  <eg(af)cbc>

  α1 = <a>      support(α1) = 4
  α2 = <ac>     support(α2) = 4
  α3 = <(ab)c>  support(α3) = 2
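
The subsequence test and the support count can be sketched in Python. This is a minimal illustration and not part of the original slides: the representation (a sequence is a list of strings, each string is one element, each character one item) and the names is_subsequence, support and DB are assumptions made here.

```python
def is_subsequence(alpha, beta):
    """Return True if alpha is a subsequence of beta (both are lists of elements)."""
    j = 0
    for element in alpha:
        # Greedily match each element of alpha to the earliest remaining
        # element of beta that contains all of its items.
        while j < len(beta) and not set(element) <= set(beta[j]):
            j += 1
        if j == len(beta):
            return False
        j += 1  # the next element of alpha must match strictly later
    return True


def support(alpha, database):
    """Number of tuples <sid, s> in the database whose sequence s contains alpha."""
    return sum(1 for s in database.values() if is_subsequence(alpha, s))


# The example database from the slides, keyed by sequence id.
DB = {
    10: ["a", "abc", "ac", "d", "cf"],
    20: ["ad", "c", "bc", "ae"],
    30: ["ef", "ab", "df", "c", "b"],
    40: ["e", "g", "af", "c", "b", "c"],
}

print(support(["a"], DB), support(["a", "c"], DB), support(["ab", "c"], DB))
```

Matching each element as early as possible is sufficient here, because an earlier match never rules out a later one.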

Strategies

Apriori-property based
  AprioriSome (1995)
  AprioriAll (1995)
  DynamicSome (1995)
  GSP (1996)
Regular expression constraints
  SPIRIT (1999)
Data projection based
  FreeSpan (2000)
Motivation and Background

Shortcomings of Apriori-like approaches
  Potentially huge set of candidate sequences
  Multiple scans of databases
  Difficulties at mining long sequential patterns
FreeSpan (Frequent pattern-projected Sequential pattern mining): pattern growth method
  General idea is to use frequent items to recursively project sequence databases into smaller projected databases and grow subsequence fragments in each projected database
PrefixSpan (Prefix-projected Sequential pattern mining)
  Fewer projections and quickly shrinking sequences

Prefix

Given two sequences α = <a1 a2 … an> and β = <b1 b2 … bm>, m ≤ n
Sequence β is called a prefix of α if and only if:
  bi = ai for i ≤ m-1;
  bm ⊆ am;
  all the items in (am − bm) are alphabetically after those in bm

Examples for α = <a(abc)(ac)d(cf)>
  β = <a(abc)a> is a prefix of α
  β = <a(abc)c> is not a prefix of α (item a in (ac) is alphabetically before c)
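
The three prefix conditions can be checked mechanically. The sketch below is illustrative only: the function name is_prefix, the list-of-strings representation (one string per element, one character per item) and the treatment of an empty β as a prefix of everything are assumptions, not something the slides state.

```python
def is_prefix(beta, alpha):
    """Check whether beta is a prefix of alpha under the three conditions above."""
    m = len(beta)
    if m == 0:
        return True  # assumption: the empty sequence is a prefix of any sequence
    if m > len(alpha):
        return False
    # Condition 1: the first m-1 elements are identical.
    if any(set(beta[i]) != set(alpha[i]) for i in range(m - 1)):
        return False
    b_m, a_m = set(beta[-1]), set(alpha[m - 1])
    # Condition 2: the last element of beta is contained in the m-th element of alpha.
    if not b_m <= a_m:
        return False
    # Condition 3: every leftover item comes alphabetically after all items of b_m.
    return all(item > max(b_m) for item in a_m - b_m)


alpha = ["a", "abc", "ac", "d", "cf"]
print(is_prefix(["a", "abc", "a"], alpha))  # the slide's positive example
print(is_prefix(["a", "abc", "c"], alpha))  # the slide's negative example
```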

Projection

Given sequences α and β, such that β is a subsequence of α
A subsequence α' of sequence α is called a projection of α w.r.t. prefix β if and only if
  α' has prefix β;
  there exists no proper super-sequence α'' of α' such that α'' is a subsequence of α and also has prefix β

Example
  α  = <a(abc)(ac)d(cf)>
  β  = <(bc)a>
  α' = <(bc)(ac)d(cf)>

Postfix

Let α' = <a1 a2 … an> be the projection of α w.r.t. prefix β = <a1 a2 … am-1 a'm> (m ≤ n)
Sequence γ = <a''m am+1 … an> is called the postfix of α w.r.t. prefix β, denoted as γ = α / β, where a''m = (am − a'm)
We also denote α = β · γ

Example
  α' = <a(abc)(ac)d(cf)>
  β  = <a(abc)a>
  γ  = <(_c)d(cf)>
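
A small sketch of the postfix computation, assuming the input already has β as a prefix (i.e. it is the projection α'). The function name postfix, the list-of-strings representation and the use of a leading "_" to mark a partial first element are assumptions introduced for illustration.

```python
def postfix(alpha_proj, beta):
    """Postfix of alpha_proj w.r.t. its prefix beta.

    alpha_proj and beta are lists of elements (strings of items); beta is
    assumed to be a non-empty prefix of alpha_proj.
    """
    m = len(beta)
    # a''m = am - a'm: the items of the m-th element not covered by the prefix.
    leftover = sorted(set(alpha_proj[m - 1]) - set(beta[-1]))
    head = ["_" + "".join(leftover)] if leftover else []
    return head + list(alpha_proj[m:])


print(postfix(["a", "abc", "ac", "d", "cf"], ["a", "abc", "a"]))
print(postfix(["bc", "ac", "d", "cf"], ["bc", "a"]))
```

Both calls correspond to the slide examples: the first is the Postfix example, the second takes the projection from the Projection example and strips its prefix <(bc)a>.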
PrefixSpan Algorithm

Input: A sequence database S, and the minimum support threshold min_sup
Output: The complete set of sequential patterns
Method: Call PrefixSpan(<>, 0, S)

Subroutine PrefixSpan(α, l, S|α)
Parameters:
  α: a sequential pattern
  l: the length of α
  S|α: the α-projected database if α ≠ <>; otherwise, the sequence database S

PrefixSpan Algorithm (2)

Method
  1. Scan S|α once, find the set of frequent items b such that:
     a) b can be assembled to the last element of α to form a sequential pattern; or
     b) <b> can be appended to α to form a sequential pattern.
  2. For each frequent item b, append it to α to form a sequential pattern α', and output α';
  3. For each α', construct the α'-projected database S|α', and call PrefixSpan(α', l+1, S|α').
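
The three steps can be sketched as a short recursive Python function. This is a hedged illustration under stated assumptions, not the authors' implementation: items are single characters, each projected entry is stored as an index into the original sequence instead of a copied postfix, and the names prefixspan, grow and first_at are invented here.

```python
from collections import Counter


def prefixspan(database, min_sup):
    """Return {pattern: support}; a pattern is a tuple of sorted item tuples."""
    seqs = [[frozenset(e) for e in s] for s in database]
    patterns = {}

    def first_at(seq, start, needed):
        # Index of the earliest element at or after `start` containing `needed`.
        for j in range(start, len(seq)):
            if needed <= seq[j]:
                return j
        return None

    def grow(prefix, projected):
        # projected: (sequence index, index of the element that matched the
        # last element of the prefix); -1 for the empty prefix.
        last = frozenset(prefix[-1]) if prefix else frozenset()
        counts = Counter()
        for sid, i in projected:  # step 1: one scan of the projected database
            seq, found = seqs[sid], set()
            for j in range(i + 1, len(seq)):  # case b): append <b> as a new element
                found.update(("s", b) for b in seq[j])
            if prefix:  # case a): assemble b into the last element of the prefix
                for j in range(i, len(seq)):
                    if last <= seq[j]:
                        found.update(("i", b) for b in seq[j] if b > max(last))
            counts.update(found)  # each sequence counts an extension at most once
        for (kind, b), sup in sorted(counts.items()):
            if sup < min_sup:
                continue
            if kind == "s":
                new_prefix, needed = prefix + ((b,),), frozenset([b])
            else:
                new_prefix, needed = prefix[:-1] + (prefix[-1] + (b,),), last | {b}
            patterns[new_prefix] = sup  # step 2: output the grown pattern
            new_projected = []
            for sid, i in projected:  # step 3: build the projected database
                j = first_at(seqs[sid], i + 1 if kind == "s" else i, needed)
                if j is not None:
                    new_projected.append((sid, j))
            grow(new_prefix, new_projected)

    grow((), [(sid, -1) for sid in range(len(seqs))])
    return patterns


DB = [
    ["a", "abc", "ac", "d", "cf"],
    ["ad", "c", "bc", "ae"],
    ["ef", "ab", "df", "c", "b"],
    ["e", "g", "af", "c", "b", "c"],
]
patterns = prefixspan(DB, 2)
print(len(patterns))
```

Requiring an assembled item to be alphabetically after the items already in the last element keeps every itemset in sorted order, so each pattern is produced once.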

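PrefixSpan - Example

Sequence database (ids 10 to 40 as listed earlier), min_support = 2

1. Find length-1 sequential patterns
     <a>  <b>  <c>  <d>  <e>  <f>  <g>
      4    4    4    3    3    3    1

2. Divide search space: one projected database per frequent length-1 prefix
     Prefix <a>: <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)>, <(_b)(df)cb>, <(_f)cbc>
     Prefix <b>: <(_c)(ac)d(cf)>, <(_c)(ae)>, <(df)cb>, <c>
     Prefix <c>: <(ac)d(cf)>, <(bc)(ae)>, <b>, <bc>
     Prefix <d>: <(cf)>, <c(bc)(ae)>, <(_f)cb>
     Prefix <e>: <(_f)(ab)(df)cb>, <(af)cbc>
     Prefix <f>: <(ab)(df)cb>, <cbc>

PrefixSpan - Example (2)

3. Find subsets of sequential patterns (shown for prefix <d>)
     <d>-projected database: <(cf)>, <c(bc)(ae)>, <(_f)cb>
     Item counts:
       <a>  <b>  <c>  <d>  <e>  <(_e)>  <f>  <(_f)>
        1    2    3    0    1     0      1     1
     Frequent items <b> and <c> give the patterns <db> and <dc>
     <db>-projected database (infrequent items dropped): <(_c)>
     <dc>-projected database (infrequent items dropped): <(bc)>, <b>
       Item counts: <b> 2, <c> 1, giving the pattern <dcb>
     <dcb>-projected database: <> (no frequent items, so the recursion stops)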
PrefixSpan - characteristics

No candidate sequence needs to be generated by PrefixSpan
Projected databases keep shrinking
The major cost of PrefixSpan is the construction of projected databases
How to reduce this cost? Different projection methods:
  Bi-level projection
    reduces the number and the size of projected databases
  Pseudo-Projection
    reduces the cost of projection when the projected database can be held in main memory

Bi-level Projection

Sequence database (ids 10 to 40 as listed earlier), min_support = 2
Scan to get length-1 sequences
Construct a triangular matrix (S-matrix) instead of projected databases for each length-1 pattern

       a        b        c        d        e        f
  a    2
  b    (4,2,2)  1
  c    (4,2,1)  (3,3,2)  3
  d    (2,1,1)  (2,2,0)  (1,3,0)  0
  e    (1,2,1)  (1,2,0)  (1,2,0)  (1,1,0)  0
  f    (2,1,1)  (2,2,0)  (1,2,1)  (1,1,1)  (2,0,1)  1

The matrix yields all length-2 sequential patterns. For example, the entry (4,2,1) in row c, column a means:
  Support(<ac>) = 4
  Support(<ca>) = 2
  Support(<(ac)>) = 1
and the diagonal entry 3 in row c means Support(<cc>) = 3
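
To make the reading of the S-matrix concrete, the sketch below recomputes its entries for the example database. It is a brute-force illustration under assumptions (one subsequence test per entry rather than filling the matrix during a single scan, single-character items, and the invented names s_matrix, support and is_subsequence).

```python
def is_subsequence(alpha, beta):
    """Greedy test: each element of alpha matches the earliest possible element of beta."""
    j = 0
    for element in alpha:
        while j < len(beta) and not set(element) <= set(beta[j]):
            j += 1
        if j == len(beta):
            return False
        j += 1
    return True


def support(alpha, database):
    return sum(1 for s in database if is_subsequence(alpha, s))


def s_matrix(database, items):
    """Lower-triangular S-matrix keyed by (row item, column item)."""
    matrix = {}
    for r, y in enumerate(items):
        for x in items[:r]:
            # Off-diagonal entry: (support of <xy>, support of <yx>, support of <(xy)>).
            matrix[(y, x)] = (
                support([x, y], database),
                support([y, x], database),
                support([x + y], database),
            )
        matrix[(y, y)] = support([y, y], database)  # diagonal: support of <yy>
    return matrix


DB = [
    ["a", "abc", "ac", "d", "cf"],
    ["ad", "c", "bc", "ae"],
    ["ef", "ab", "df", "c", "b"],
    ["e", "g", "af", "c", "b", "c"],
]
matrix = s_matrix(DB, ["a", "b", "c", "d", "e", "f"])
print(matrix[("c", "a")], matrix[("c", "c")])
```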

Bi-level projection (2)

For each length-2 sequential pattern α, construct the α-projected database and find the frequent items
Construct the corresponding S-matrix

Example: <ab>-projected database
  <(_c)(ac)(cf)>
  <(_c)a>
  <c>

  Item counts:
    a  b  c  (_c)  d  (_d)  e  (_e)  f  (_f)
    2  0  2   2    0   1    0   0    1   0
  Frequent items a, c and (_c) give the patterns <aba>, <abc>, <a(bc)>

  S-matrix:
           a        c        (_c)
    a      0
    c      (1,0,1)  1
    (_c)   (∅,2,∅)  (∅,1,∅)
  The entry (∅,2,∅) gives the pattern <a(bc)a>

Bi-level projection (3) - optimization

Do we need to include every item in a postfix in the projected databases?
NO! Item pruning in projected databases by 3-way Apriori checking
  If <ac> is not frequent:
    any super-sequence of it can never be a sequential pattern
    c can be excluded from construction of the <ab>-projected database
  If <a(bd)> is not frequent:
    to construct the <a(bc)>-projected database, sequence <a(bcde)df> should be projected to <(_e)df> instead of <(_de)df>
Pseudo-Projection

Observation: postfixes of a sequence often appear repeatedly in recursive projected databases
Method: instead of constructing a physical projection by collecting all the postfixes, we can use pointers referring to the sequences in the database as a pseudo-projection
Every projection consists of two pieces of information: a pointer to the sequence in the database and the offset of the postfix in the sequence

  s1 = <a(abc)(ac)d(cf)>

  Pointer  Offset  Postfix
  s1       2       <(abc)(ac)d(cf)>
  s1       5       <(ac)d(cf)>
  s1       6       <(_c)d(cf)>

Experimental Results

Environment: 233MHz Pentium PC, 128 MB RAM, Windows NT, Visual C++ 6.0
Reported test on synthetic data set C10T8S8I8:
  1000 items
  10000 sequences
  Average number of items within elements: 8
  Average number of elements in a sequence: 8
Competitors:
  GSP
  FreeSpan
  PrefixSpan-1 (level-by-level projection)
  PrefixSpan-2 (bi-level projection)
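
The pointer-plus-offset idea can be illustrated with a few lines of Python. This is a sketch under assumptions: offsets are read as 1-based item positions in the flattened sequence (which is consistent with the table above), sequences are lists of strings with one character per item, and the names SEQUENCES, pseudo_projected and postfix_at are invented for the example.

```python
# The database holds each sequence once; a projection is only a (pointer, offset) pair.
SEQUENCES = {"s1": ["a", "abc", "ac", "d", "cf"]}

# The three pseudo-projections of s1 from the table above.
pseudo_projected = [("s1", 2), ("s1", 5), ("s1", 6)]


def postfix_at(sequence, offset):
    """Materialise, on demand, the postfix starting at the 1-based item `offset`."""
    result, seen = [], 0
    for element in sequence:
        # Number of items of this element that lie before the offset.
        start = max(offset - 1 - seen, 0)
        seen += len(element)
        rest = element[start:]
        if rest:
            # A leading "_" marks an element whose first items belong to the prefix.
            result.append(("_" if start > 0 else "") + rest)
    return result


for pointer, offset in pseudo_projected:
    print(pointer, offset, postfix_at(SEQUENCES[pointer], offset))
```

Only the two small values per entry are stored; the postfix is reconstructed from the shared sequence whenever it is needed, which is why the technique depends on the database fitting in main memory.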

[Figure: Runtime vs. support threshold]

[Figure: I/O costs vs. threshold and scalability]
Conclusions

PrefixSpan
  Efficient pattern growth method
  Outperforms both GSP and FreeSpan
  Explores prefix-projection in sequential pattern mining
  Mines the complete set of patterns but reduces the effort of candidate subsequence generation
  Prefix-projection reduces the size of projected databases and leads to efficient processing
  Bi-level projection and pseudo-projection may improve mining efficiency

References

Pei J., Han J., Mortazavi-Asl B., Pinto H., Chen Q., Dayal U., Hsu M.-C., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth, Proceedings 17th Int. Conf. Data Engineering (ICDE'01), April 2001
Agrawal R., Srikant R., Mining sequential patterns, Proceedings 1995 Int. Conf. Data Engineering (ICDE'95), pp. 3-14, 1995
Han J., Pei J., Mortazavi-Asl B., Chen Q., Dayal U., Hsu M.-C., FreeSpan: Frequent pattern-projected sequential pattern mining, Proceedings 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD'00), pp. 355-359, 2000
Srikant R., Agrawal R., Mining sequential patterns: Generalizations and performance improvements, Proceedings 5th Int. Conf. Extending Database Technology (EDBT'96), pp. 3-17, 1996
Zhao Q., Bhowmick S. S., Sequential Pattern Mining: A Survey, Technical Report, Center for Advanced Information Systems, School of Computer Engineering, Nanyang Technological University, Singapore, 2003

Thank you! Any questions?
