Explain the KDD process in detail.
The KDD (Knowledge Discovery in Databases) process is a
multi-step procedure used to extract useful knowledge from
large datasets. It is often associated with data mining but
encompasses more than just the data mining step itself.
Here’s a step-by-step explanation of the KDD process:
1. Data Selection
Purpose: Identify the relevant data sources from
potentially many heterogeneous databases.
Details:
Choose data that is relevant to the analysis goals.
May involve multiple data sources (e.g., databases,
data warehouses, flat files).
Ensures the dataset is focused and manageable.
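As a rough illustration of selection, the sketch below uses Python's built-in sqlite3 module to pull only the relevant columns and rows out of a source table. The table name (sales), its columns, the year filter and the sample rows are all hypothetical and exist only for the example.

```python
import sqlite3

def select_relevant_data(conn, min_year):
    """Keep only the columns and rows relevant to the (hypothetical) analysis goal."""
    query = (
        "SELECT customer_id, year, amount "  # the free-text 'notes' column is left out
        "FROM sales WHERE year >= ? ORDER BY customer_id"
    )
    return conn.execute(query, (min_year,)).fetchall()

# Hypothetical source: an in-memory table with one column irrelevant to the analysis.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, year INTEGER, amount REAL, notes TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, 2019, 120.0, "legacy"), (2, 2022, 80.5, ""), (3, 2023, 99.9, "promo")],
)

selected = select_relevant_data(conn, min_year=2020)
print(selected)  # [(2, 2022, 80.5), (3, 2023, 99.9)]
```

In practice the same idea applies to any source: the query (or filter) encodes the analysis goal, so later steps only see a focused subset.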
2. Data Preprocessing (Cleaning)
Purpose: Remove noise and inconsistencies to improve
data quality.
Details:
Handle missing values, noisy data, and
inconsistencies.
Examples: Removing duplicates, correcting wrong
entries, dealing with outliers.
This is critical since poor-quality data leads to poor
mining results.
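A minimal cleaning sketch for one numeric field of a list of dict records might look like the following. The field name, the sample records and the plausible range (0 to max_value) are assumptions made for illustration; real projects choose imputation and outlier rules per attribute.

```python
from statistics import median

def clean(records, field, max_value):
    """Minimal cleaning sketch for one numeric field of a list of dict records.

    max_value is a hypothetical domain limit: values outside [0, max_value]
    are treated as data-entry errors (outliers) and the record is dropped.
    """
    # 1. Remove exact duplicate records, keeping the first occurrence.
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 2. Drop outliers, but keep records whose value is missing (None) for now.
    in_range = [r for r in unique if r[field] is None or 0 <= r[field] <= max_value]

    # 3. Impute missing values with the median of the remaining observed values.
    fill = median(r[field] for r in in_range if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in in_range]

raw = [
    {"id": 1, "age": 34},
    {"id": 1, "age": 34},    # duplicate
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 29},
    {"id": 4, "age": 420},   # implausible entry
]
cleaned = clean(raw, "age", max_value=120)
print(cleaned)  # [{'id': 1, 'age': 34}, {'id': 2, 'age': 31.5}, {'id': 3, 'age': 29}]
```

Outliers are removed before imputation here so that an implausible value cannot distort the fill value.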
3. Data Transformation (Integration and Reduction)
Purpose: Convert data into appropriate formats for
mining.
Details:
Integration: Combine data from different sources
into a coherent dataset.
Transformation: Normalize, aggregate, or encode data (e.g.,
scaling numeric values to a common range, encoding
categorical data as numeric indicators).
Reduction: Reduce the data volume while preserving
relevant information (e.g., feature selection,
dimensionality reduction).
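The three sub-steps can be sketched in plain Python as below. The two sources (customers and orders), their fields, and the choice of min-max scaling, one-hot encoding and "drop constant columns" as the reduction rule are illustrative assumptions; they are simple representatives of each sub-step, not the only options.

```python
def integrate(customers, orders):
    """Integration: join two hypothetical sources on a shared customer_id key."""
    by_id = {c["customer_id"]: c for c in customers}
    return [{**by_id[o["customer_id"]], **o} for o in orders if o["customer_id"] in by_id]

def min_max_scale(values):
    """Transformation: rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Transformation: encode a categorical column as 0/1 indicator columns."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def drop_constant_features(rows):
    """Reduction: very simple feature selection that drops single-valued columns."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]

customers = [{"customer_id": 1, "region": "north"}, {"customer_id": 2, "region": "south"}]
orders = [
    {"customer_id": 1, "amount": 50.0, "currency": "EUR"},
    {"customer_id": 2, "amount": 150.0, "currency": "EUR"},
    {"customer_id": 1, "amount": 100.0, "currency": "EUR"},
]

data = integrate(customers, orders)
scaled = min_max_scale([r["amount"] for r in data])         # [0.0, 1.0, 0.5]
categories, encoded = one_hot([r["region"] for r in data])  # ['north', 'south'], [[1, 0], [0, 1], [1, 0]]
reduced = drop_constant_features(data)                      # the constant "currency" column is dropped
```

Dimensionality-reduction methods such as PCA would normally come from a numerical library; the constant-column rule is used here only because it needs no dependencies.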
4. Data Mining
Purpose: Apply algorithms to extract patterns or models
from prepared data.
Details:
Use techniques like classification, clustering,
association rule mining, regression, etc.
This is the core step where intelligent methods are
applied.
The output may be patterns, trends, relationships, or
predictive models.
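As one concrete instance of a mining technique, here is a bare-bones k-means clustering sketch on 2-D points. Initialising the centroids from the first k points and the sample data are simplifying assumptions chosen so the result is deterministic; production code would typically use a library implementation with better initialisation.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means clustering on 2-D points (a minimal sketch, not optimized).

    Centroids are initialised from the first k points so the result is deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change the centroids: converged
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 1.5), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1.0, 1.0), (1.5, 2.0), (1.0, 1.5)], [(8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]]
```

The returned clusters are the discovered "patterns" here; classification, regression or association rule mining would produce different kinds of output from the same prepared data.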
5. Pattern Evaluation
Purpose: Identify the truly interesting, useful, and valid
patterns.
Details:
Assess the discovered patterns for relevance and
novelty.
Remove redundant or insignificant patterns.
Common criteria: validity (e.g., statistical
significance), novelty, usefulness, and
understandability.
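For association rules, evaluation is often done with objective interestingness measures such as support, confidence and lift. The sketch below scores every single-item rule x -> y over a small made-up basket dataset and keeps only those that pass all three thresholds; the thresholds themselves are arbitrary assumptions, and subjective criteria such as novelty still need a human judgement.

```python
from itertools import combinations

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift

def interesting_rules(transactions, min_support, min_confidence, min_lift=1.0):
    """Keep only single-item rules x -> y that pass all three thresholds."""
    items = sorted(set().union(*transactions))
    kept = []
    for x, y in combinations(items, 2):
        for ante, cons in ((x, y), (y, x)):
            s, conf, lift = rule_metrics(transactions, [ante], [cons])
            if s >= min_support and conf >= min_confidence and lift > min_lift:
                kept.append((ante, cons, round(s, 2), round(conf, 2), round(lift, 2)))
    return kept

transactions = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"milk", "jam"},
]
rules = interesting_rules(transactions, min_support=0.4, min_confidence=0.7)
print(rules)  # [('bread', 'butter', 0.6, 1.0, 1.67), ('butter', 'bread', 0.6, 1.0, 1.67)]
```

Note that jam -> milk also has confidence 1.0 and lift above 1, but it is discarded because it is supported by only one of the five transactions.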
6. Knowledge Presentation (Visualization)
Purpose: Present the mined knowledge in a user-friendly
way.
Details:
Use visualization tools like charts, graphs,
dashboards, or reports.
Helps stakeholders understand and act on the
findings.
Often includes interactive interfaces for exploring
results.
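Presentation usually relies on charting or dashboard tools, but the idea can be shown with a dependency-free text bar chart that turns mined counts into something a stakeholder can scan quickly. The cluster labels and counts below are invented placeholders, not results from any real analysis.

```python
def text_bar_chart(counts, width=20):
    """Render {label: count} as a plain-text horizontal bar chart, largest first."""
    if not counts:
        return ""
    peak = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, value in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Bar length is proportional to the value relative to the largest value.
        bar = "#" * round(width * value / peak) if peak else ""
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)

report = text_bar_chart({"Cluster A": 120, "Cluster B": 60, "Cluster C": 30})
print(report)
# Cluster A | #################### 120
# Cluster B | ########## 60
# Cluster C | ##### 30
```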