Data Mining-Mining Time Series Data
Time-Series Database
Consists of sequences of values or events obtained over repeated measurements of time (e.g. hourly, weekly)
Stock market analysis, economic and sales forecasting, scientific and engineering experiments, medical treatments, etc.
Can also be considered as a Sequence database
Consists of a sequence of ordered events (a time stamp is optional)
Ex: Web page traversal sequence
Time-Series data can be analyzed to:
Identify correlations
Similar / Regular patterns, trends, outliers
Trend Analysis
A time series involving a variable Y can be represented as a function of time t, Y = F(t)
Goals of Time-Series Analysis
Modeling time series - To gain insight into the mechanism that generates the series
Forecasting time series - For prediction
Trend Analysis Components
Seasonal fluctuations can conceal the true underlying movement of the series and its non-seasonal characteristics
De-seasonalize the data
Ex: 3 7 2 0 4 5 9 7 2
Moving average of order 3: 4 3 2 3 6 7 6
Weighted average (1 4 1): 5.5 2.5 1 3.5 5.5 8 6.5
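The two smoothing results above can be reproduced with a short sketch. The function name moving_average is a hypothetical helper, not part of the slides; it assumes the weights are normalized by their sum, which is what the (1 4 1) example implies (division by 6).

```python
def moving_average(values, weights):
    """Weighted moving average over a sliding window.

    Equal weights give the plain moving average of order len(weights).
    Each window sum is divided by the total weight (assumption taken
    from the slide's (1 4 1) example, where the divisor is 6).
    """
    n = len(weights)
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i:i + n])) / total
        for i in range(len(values) - n + 1)
    ]


series = [3, 7, 2, 0, 4, 5, 9, 7, 2]
simple = moving_average(series, [1, 1, 1])
weighted = moving_average(series, [1, 4, 1])
print(simple)    # [4.0, 3.0, 2.0, 3.0, 6.0, 7.0, 6.0]
print(weighted)  # [5.5, 2.5, 1.0, 3.5, 5.5, 8.0, 6.5]
```

A series of 9 values yields 7 smoothed values for a window of 3, because the first and last points have no complete window centered on them.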
Similarity Search
A normal database query finds exact matches
A similarity search finds data sequences that differ only slightly from the given query sequence
Two categories of similarity queries
Whole matching: find sequences that are similar, as a whole, to the query sequence
Subsequence matching: find sequences that contain subsequences similar to the query sequence
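A minimal sketch of the two query types, under assumptions not stated in the slides: similarity is measured by Euclidean distance, a match is any sequence within a threshold eps, and subsequence matching is read as sliding the query over a longer sequence. The function names and the sample data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def whole_matches(database, query, eps):
    """Indices of stored sequences (same length as the query) within eps."""
    return [
        i for i, s in enumerate(database)
        if len(s) == len(query) and euclidean(s, query) <= eps
    ]


def subsequence_matches(sequence, query, eps):
    """Start offsets of windows of a longer sequence within eps of the query."""
    m = len(query)
    return [
        i for i in range(len(sequence) - m + 1)
        if euclidean(sequence[i:i + m], query) <= eps
    ]


query = [1, 2, 3, 4]
database = [[1, 2, 3, 4], [1, 2, 3, 5], [9, 9, 9, 9]]
whole = whole_matches(database, query, eps=1.0)
offsets = subsequence_matches([0, 1, 2, 3, 4, 8, 1, 2, 3, 5], query, eps=1.0)
print(whole)    # [0, 1]
print(offsets)  # [1, 6]
```

Setting eps to 0 reduces both functions to the exact-match behavior of a normal database query.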
Typical Applications
Financial market
Market basket data analysis
Scientific databases
Medical diagnosis
Data Reduction and Transformation
Time-series data is high-dimensional data: each point of time can be viewed as a dimension
Dimensionality Reduction techniques
Signal processing techniques
Distance-preserving orthonormal transformations
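As one illustration of a distance-preserving orthonormal transformation, the sketch below uses the discrete Fourier transform; the slides do not name a specific transform, so this choice is an assumption. With NumPy's norm="ortho" scaling the DFT preserves Euclidean distance, and keeping only the first k coefficients gives a distance that never exceeds the true one. The helper dft_features and the sample series are hypothetical.

```python
import numpy as np


def dft_features(x, k):
    """First k coefficients of the orthonormal DFT of x (reduced representation)."""
    return np.fft.fft(np.asarray(x, dtype=float), norm="ortho")[:k]


x = np.array([3, 7, 2, 0, 4, 5, 9, 7], dtype=float)
y = np.array([4, 6, 2, 1, 4, 6, 8, 7], dtype=float)

# Distance in the time domain.
time_dist = np.linalg.norm(x - y)

# Distance between the full orthonormal DFTs: equal to time_dist (Parseval).
freq_dist = np.linalg.norm(
    np.fft.fft(x, norm="ortho") - np.fft.fft(y, norm="ortho")
)

# Distance using only 3 of the 8 coefficients: a lower bound on time_dist.
reduced_dist = np.linalg.norm(dft_features(x, 3) - dft_features(y, 3))

print(np.isclose(time_dist, freq_dist))  # True
print(reduced_dist <= time_dist)         # True
```

Because the reduced distance is a lower bound, filtering candidates on it can produce false alarms but no false dismissals; candidates that pass are then checked against the full sequences.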
Atomic matching
Find all pairs of gap-free windows of a small length that are similar
Window stitching
Stitch similar windows to form pairs of large similar subsequences, allowing gaps between atomic matches
Subsequence Ordering
Linearly order the subsequence matches to determine whether enough similar pieces exist
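The three steps above can be sketched in a heavily simplified form. Everything here is an assumption made for illustration: windows count as similar when every pair of points differs by at most eps, stitching is a greedy left-to-right chain, and "enough similar pieces" is read as a minimum fraction of the shorter sequence being covered. The names atomic_matches, stitch, and similar are hypothetical, and no scaling or offset normalization of windows is attempted.

```python
def atomic_matches(a, b, w, eps):
    """Atomic matching: all (i, j) such that the gap-free windows
    a[i:i+w] and b[j:j+w] differ by at most eps at every position."""
    return [
        (i, j)
        for i in range(len(a) - w + 1)
        for j in range(len(b) - w + 1)
        if all(abs(x - y) <= eps for x, y in zip(a[i:i + w], b[j:j + w]))
    ]


def stitch(matches, w, max_gap):
    """Window stitching: greedily chain atomic matches in increasing order,
    allowing at most max_gap unmatched points between consecutive windows
    in either sequence (overlapping windows are skipped)."""
    chain = []
    for i, j in sorted(matches):
        if not chain:
            chain.append((i, j))
            continue
        prev_i, prev_j = chain[-1]
        gap_a = i - (prev_i + w)
        gap_b = j - (prev_j + w)
        if 0 <= gap_a <= max_gap and 0 <= gap_b <= max_gap:
            chain.append((i, j))
    return chain


def similar(a, b, w=3, eps=0.5, max_gap=2, min_fraction=0.6):
    """Subsequence ordering: the stitched chain is already linearly ordered,
    so check whether it covers enough of the shorter sequence."""
    chain = stitch(atomic_matches(a, b, w, eps), w, max_gap)
    return len(chain) * w / min(len(a), len(b)) >= min_fraction


a = [1, 2, 3, 9, 9, 4, 5, 6]
b = [1, 2, 3, 0, 4, 5, 6]
matches = atomic_matches(a, b, w=3, eps=0.5)
print(matches)        # [(0, 0), (5, 4)]
print(similar(a, b))  # True
```

In this example the two atomic matches are separated by a gap of two points in a and one point in b, so they stitch into a single chain covering 6 of the 7 points of b.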