Graph Sampling
Graph sampling selects a subset of nodes or edges from a larger graph while approximately
preserving structural properties of interest, such as the degree distribution or connectivity. The goal is
a representative sample that is cheaper to analyze, visualize, or otherwise process than the full graph.
Types of graph sampling:
1. Node sampling: Selecting a subset of nodes.
2. Edge sampling: Selecting a subset of edges.
3. Snowball sampling: Starting with one or more seed nodes and repeatedly adding their neighbors.
4. Random walk sampling: Starting from a node, repeatedly moving to a randomly chosen neighbor and
keeping the nodes visited.
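The four strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy adjacency list GRAPH and the function names (node_sample, edge_sample, snowball_sample, random_walk_sample) are hypothetical, the graph is undirected, and every node has at least one neighbor.

```python
import random
from collections import deque

# Toy undirected graph as an adjacency list (hypothetical data).
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b", "e"],
    "e": ["c", "d", "f"],
    "f": ["e"],
}

def node_sample(graph, k, rng):
    """Uniformly pick k nodes and keep only the edges between them."""
    chosen = set(rng.sample(sorted(graph), k))
    return {u: [v for v in graph[u] if v in chosen] for u in chosen}

def edge_sample(graph, k, rng):
    """Uniformly pick k undirected edges, returned as sorted (u, v) pairs."""
    edges = sorted({tuple(sorted((u, v))) for u in graph for v in graph[u]})
    return rng.sample(edges, k)

def snowball_sample(graph, seed, max_nodes):
    """Breadth-first expansion from a seed until max_nodes are collected."""
    seen, queue = [seed], deque([seed])
    while queue and len(seen) < max_nodes:
        for v in graph[queue.popleft()]:
            if v not in seen and len(seen) < max_nodes:
                seen.append(v)
                queue.append(v)
    return seen

def random_walk_sample(graph, start, steps, rng):
    """Follow `steps` random edges from `start`; return the distinct nodes visited."""
    current, visited = start, {start}
    for _ in range(steps):
        current = rng.choice(graph[current])
        visited.add(current)
    return visited

rng = random.Random(0)  # fixed seed so runs are repeatable
print(node_sample(GRAPH, 3, rng))
print(edge_sample(GRAPH, 2, rng))
print(snowball_sample(GRAPH, "a", 4))  # ['a', 'b', 'c', 'd']
print(random_walk_sample(GRAPH, "a", 5, rng))
```

Node and edge sampling are uniform here for simplicity. Snowball and random walk sampling follow existing edges, so they tend to return connected samples, while uniform node sampling may return isolated nodes.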
Frequent subgraph mining
Frequent subgraph mining is a specialized area within data mining that focuses on discovering patterns
and structures in graph data. Graphs are versatile representations of complex relationships, and
frequent subgraph mining aims to identify subgraphs that occur frequently across a dataset.
Vertices (Nodes): Represent entities or objects.
Edges: Represent relationships or connections between vertices.
Subgraphs: A subset of a graph's vertices and edges that itself forms a graph, meaning every selected
edge has both of its endpoints among the selected vertices.
Frequent Subgraphs:
A subgraph is considered frequent if the fraction of graphs in the dataset that contain it (its support)
meets or exceeds a chosen minimum support threshold.
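The support test can be illustrated with a deliberately simplified sketch. It assumes that vertex labels are unique within each graph, so that "graph contains pattern" reduces to an edge-set containment check; general frequent subgraph mining needs subgraph isomorphism testing, which this example does not attempt. DATASET and the names normalize, support, and is_frequent are hypothetical.

```python
def normalize(edges):
    """Store undirected edges as frozensets so (u, v) equals (v, u)."""
    return {frozenset(e) for e in edges}

# Four small graphs, each given by its edge list (hypothetical data).
# Assumption: vertex labels are unique within a graph.
DATASET = [
    normalize([("A", "B"), ("B", "C"), ("C", "A")]),
    normalize([("A", "B"), ("B", "C"), ("C", "D")]),
    normalize([("A", "B"), ("C", "D")]),
    normalize([("B", "C"), ("C", "A"), ("A", "B")]),
]

def support(pattern, dataset):
    """Fraction of graphs whose edge set contains every edge of the pattern."""
    p = normalize(pattern)
    return sum(p <= g for g in dataset) / len(dataset)

def is_frequent(pattern, dataset, min_support):
    """A pattern is frequent if its support reaches the threshold."""
    return support(pattern, dataset) >= min_support

path = [("A", "B"), ("B", "C")]
triangle = [("A", "B"), ("B", "C"), ("C", "A")]

print(support(path, DATASET))      # 0.75
print(support(triangle, DATASET))  # 0.5
print(is_frequent(path, DATASET, 0.6), is_frequent(triangle, DATASET, 0.6))  # True False
```

With a minimum support of 0.6, the two-edge path is frequent but the triangle is not, because the triangle occurs in only two of the four graphs.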
Sequence mining
Sequence mining is a data mining technique focused on identifying patterns, trends, and relationships in
ordered sequences of data. This method is particularly useful in contexts where the order of events or
items is significant, such as time series data, customer behavior analysis, and biological data.
Key Concepts
Sequential Patterns: Patterns where the order of items matters. For example, in a customer purchase
sequence, the order of purchased items can reveal shopping behavior.
Time-Series Data: A sequence of data points indexed in time order. This is common in financial markets,
sensor data, and web logs.
Subsequences: A sequence derived from another sequence by deleting some elements without changing
the order of the remaining elements.
Frequent Patterns: Patterns that appear in a dataset with a frequency that meets or exceeds a specified
threshold.
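The subsequence and frequent pattern definitions above can be made concrete with a small sketch. The purchase data SEQUENCES and the helper names (is_subsequence, support, frequent_pairs) are hypothetical, support is counted as the number of sequences containing the pattern, and only patterns of length two are enumerated to keep the example short.

```python
def is_subsequence(pattern, sequence):
    """True if pattern can be obtained from sequence by deleting elements
    without reordering the remaining ones."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must occur in order.
    return all(item in it for item in pattern)

def support(pattern, sequences):
    """Number of sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences)

def frequent_pairs(sequences, min_support):
    """All ordered pairs (a, b), a != b, whose support meets the threshold."""
    items = sorted({x for s in sequences for x in s})
    result = {}
    for a in items:
        for b in items:
            if a != b:
                count = support([a, b], sequences)
                if count >= min_support:
                    result[(a, b)] = count
    return result

# Customer purchase sequences (hypothetical data).
SEQUENCES = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case", "phone", "charger"],
    ["phone", "case"],
]

print(is_subsequence(["phone", "charger"], SEQUENCES[0]))  # True
print(is_subsequence(["charger", "phone"], SEQUENCES[0]))  # False
print(frequent_pairs(SEQUENCES, 3))  # {('phone', 'charger'): 3}
```

Order matters here: "phone then charger" appears in three sequences, while "charger then phone" appears in none, even though both items occur together in the same three sequences.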
Tree mining
Tree mining is the process of discovering patterns, relationships, and structures in data represented in
tree form. Trees are commonly used to organize hierarchical data, and tree mining techniques can
help extract valuable insights from this data.
Key Concepts in Tree Mining
Tree Structures:
Rooted Trees: A tree with a designated root node. Each node can have zero or more children.
Subtrees: Any node and its descendants form a subtree.
Types of Tree Mining:
Frequent Subtree Mining: Identifying subtrees that appear frequently within a dataset. This is similar to
frequent itemset mining but focuses on tree structures.
Classification Trees: Trees used to classify data points based on attributes. They are built using
algorithms like CART, C4.5, or ID3.
Regression Trees: Similar to classification trees but used for predicting continuous outcomes.
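Frequent subtree mining, as described above, can be sketched for the simplest case where a subtree is a node together with all of its descendants. The example below is a minimal sketch under stated assumptions: trees are nested (label, children) tuples, sibling order is ignored, labels contain no parentheses or commas, and support is the number of trees containing the subtree. The data TREES and the names encode and frequent_subtrees are hypothetical.

```python
from collections import Counter

def encode(tree, found):
    """Return a canonical string for the subtree rooted at `tree` and record
    the encoding of every subtree below it in `found`.
    Child encodings are sorted, so sibling order is ignored."""
    label, children = tree
    code = label + "(" + ",".join(sorted(encode(c, found) for c in children)) + ")"
    found.add(code)
    return code

def frequent_subtrees(trees, min_support):
    """Subtrees (node plus all descendants) occurring in at least
    min_support of the given trees."""
    counts = Counter()
    for t in trees:
        found = set()
        encode(t, found)
        counts.update(found)  # each distinct subtree counts once per tree
    return {code: n for code, n in counts.items() if n >= min_support}

# Three small rooted trees as (label, children) tuples (hypothetical data).
TREES = [
    ("html", [("head", []), ("body", [("p", []), ("img", [])])]),
    ("html", [("body", [("img", []), ("p", [])])]),
    ("div", [("p", []), ("body", [("p", []), ("img", [])])]),
]

for code, n in sorted(frequent_subtrees(TREES, 3).items()):
    print(code, n)
# body(img(),p()) 3
# img() 3
# p() 3
```

Sorting the child encodings gives identical strings for subtrees that differ only in sibling order, so counting reduces to counting strings. Mining subtrees that omit some descendants requires candidate generation and is not covered by this sketch.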