
DM Week 2 Des

The KDD (Knowledge Discovery in Databases) process involves a multi-step approach to extract useful knowledge from large datasets, including data selection, preprocessing, transformation, mining, evaluation, and representation. Each step focuses on improving data quality, applying algorithms to discover patterns, and presenting findings in an understandable format. The process emphasizes the importance of handling data properly to ensure meaningful insights are derived.

Explain the KDD process in detail.

The KDD (Knowledge Discovery in Databases) process is a multi-step procedure used to extract useful knowledge from large datasets. It is often associated with data mining but encompasses more than just the data mining step itself.

Here is a step-by-step explanation of the KDD process:

1. Data Selection

Purpose: Identify the relevant data sources from potentially many heterogeneous databases.

Details:

Choose data that is relevant to the analysis goals.

May involve multiple data sources (e.g., databases, data warehouses, flat files).

Ensures the dataset is focused and manageable.
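
The selection step can be sketched in a few lines of Python. This is only an illustration: the two sources, their field names, and the analysis goal (spending per region in 2023) are all hypothetical and do not come from any particular system.

```python
# Hypothetical records from two sources (a database table and a flat file).
customers_db = [
    {"customer_id": 1, "name": "Asha", "region": "North", "fax": "n/a"},
    {"customer_id": 2, "name": "Ben", "region": "South", "fax": "n/a"},
]
orders_file = [
    {"order_id": 10, "customer_id": 1, "amount": 250.0, "year": 2023},
    {"order_id": 11, "customer_id": 2, "amount": 90.0, "year": 2019},
    {"order_id": 12, "customer_id": 1, "amount": 40.0, "year": 2023},
]


def select(records, columns, predicate=lambda r: True):
    """Keep only the rows that satisfy `predicate` and only the named columns."""
    return [{c: r[c] for c in columns} for r in records if predicate(r)]


# Assumed analysis goal: spending per region in 2023, so names, fax numbers,
# and orders from other years are left out.
selected_customers = select(customers_db, ["customer_id", "region"])
selected_orders = select(
    orders_file, ["customer_id", "amount"], lambda r: r["year"] == 2023
)
```

Dropping irrelevant columns and rows this early is what keeps the dataset focused and manageable for the later steps.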

2. Data Preprocessing (Cleaning)

Purpose: Remove noise and inconsistencies to improve data quality.

Details:

Handle missing values, noisy data, and inconsistencies.

Examples: Removing duplicates, correcting wrong entries, dealing with outliers.

This is critical since poor-quality data leads to poor mining results.
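
A minimal cleaning sketch follows. The records, the valid age range (0 to 120), and the choice of median imputation are assumptions made for illustration; real projects pick these rules per dataset.

```python
from statistics import median

# Hypothetical raw records with the three problems named above.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 29},
    {"id": 3, "age": 29},    # exact duplicate
    {"id": 4, "age": 410},   # implausible entry, probably a typo
    {"id": 5, "age": 41},
]


def clean(records, low=0, high=120):
    # 1. Remove exact duplicates while preserving the original order.
    seen, unique = set(), []
    for r in records:
        key = (r["id"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))
    # 2. Treat values outside the assumed valid range as missing.
    for r in unique:
        if r["age"] is not None and not (low <= r["age"] <= high):
            r["age"] = None
    # 3. Fill missing values with the median of the remaining valid values.
    fill = median(r["age"] for r in unique if r["age"] is not None)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill
    return unique


cleaned = clean(raw)
```

The median is used here rather than the mean because it is less sensitive to any extreme values that slip through the range check.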

3. Data Transformation (Integration and Reduction)

Purpose: Convert data into appropriate formats for mining.

Details:

Integration: Combine data from different sources into a coherent dataset.

Transformation: Normalize or aggregate data (e.g., scaling numeric values, encoding categorical data).

Reduction: Reduce the data volume but keep relevant information (e.g., feature selection, dimensionality reduction).
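
The three sub-steps can be sketched as small functions. The tables and column names are hypothetical, min-max scaling and one-hot encoding are just one common choice each, and dropping constant columns stands in for feature selection as the simplest possible reduction.

```python
# Hypothetical inputs from two sources.
customers = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
    {"customer_id": 3, "region": "North"},
]
orders = [
    {"customer_id": 1, "amount": 250.0, "currency": "USD"},
    {"customer_id": 2, "amount": 90.0, "currency": "USD"},
    {"customer_id": 3, "amount": 170.0, "currency": "USD"},
]


def integrate(customers, orders):
    """Integration: join orders to customers on customer_id."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    return [
        {**o, "region": region_of[o["customer_id"]]}
        for o in orders
        if o["customer_id"] in region_of
    ]


def drop_constant_columns(rows):
    """Reduction: a column with a single value carries no information."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]


def min_max_scale(values):
    """Transformation: rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Transformation: encode categories as 0/1 indicator vectors."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


merged = integrate(customers, orders)
reduced = drop_constant_columns(merged)        # "currency" is dropped
scaled = min_max_scale([r["amount"] for r in reduced])
encoded, categories = one_hot([r["region"] for r in reduced])
```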

4. Data Mining

Purpose: Apply algorithms to extract patterns or models from prepared data.

Details:

Use techniques like classification, clustering, association rule mining, regression, etc.

This is the core step where intelligent methods are applied.

Output could be patterns, trends, relationships, or predictive models.
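
As one concrete instance of the techniques listed above, the sketch below counts frequent item pairs, which is the first building block of association rule mining. It is a simplified illustration, not a full Apriori implementation; the transactions and the 0.4 support threshold are assumed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs bought together often enough.

    Support is the fraction of transactions that contain both items.
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


pairs = frequent_pairs(transactions, min_support=0.4)
```

With this data, bread and milk appear together in three of the five transactions and bread and butter in two, so both pairs pass the threshold, while butter and milk (one transaction) does not.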

5. Pattern Evaluation

Purpose: Identify the truly interesting, useful, and valid patterns.

Details:

Assess the discovered patterns for relevance and novelty.

Remove redundant or insignificant patterns.

Criteria used: statistical significance, usefulness, understandability.
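
One way to make such criteria measurable for association rules is to score each candidate rule and keep only those above a threshold. The sketch below uses confidence and lift as stand-ins for "valid" and "interesting"; the transactions, candidate rules, and thresholds are assumptions for illustration.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def evaluate_rule(antecedent, consequent, transactions):
    """Score the rule antecedent -> consequent."""
    s_both = support(antecedent | consequent, transactions)
    confidence = s_both / support(antecedent, transactions)
    # Lift > 1 means the items co-occur more often than if independent.
    lift = confidence / support(consequent, transactions)
    return {"support": s_both, "confidence": confidence, "lift": lift}


# Candidate rules, e.g. produced by the mining step.
candidates = [({"butter"}, {"bread"}), ({"milk"}, {"bread"})]

MIN_CONFIDENCE = 0.8  # assumed threshold
MIN_LIFT = 1.0        # assumed threshold

kept = []
for antecedent, consequent in candidates:
    scores = evaluate_rule(antecedent, consequent, transactions)
    if scores["confidence"] >= MIN_CONFIDENCE and scores["lift"] > MIN_LIFT:
        kept.append((antecedent, consequent, scores))
```

Here only butter -> bread survives: every basket with butter also has bread. The rule milk -> bread is dropped because its confidence is below the threshold and its lift is below 1, meaning milk buyers are no more likely than average to buy bread.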

6. Knowledge Representation (Visualization)

Purpose: Present the mined knowledge in a user-friendly way.

Details:

Use visualization tools like charts, graphs, dashboards, or reports.

Helps stakeholders understand and act on the findings.

Often includes interactive interfaces for exploring results.
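
Even without a charting library, results can be presented readably. The sketch below renders a plain-text bar chart; the rule names and confidence values are hypothetical, and a real report would more likely use a plotting or dashboard tool.

```python
def text_bar_chart(values, width=20):
    """Render {label: value} as one text bar per line, scaled to `width`."""
    peak = max(values.values()) or 1  # avoid dividing by zero
    lines = []
    for label, v in values.items():
        bar = "#" * round(width * v / peak)
        lines.append(f"{label:<16} {bar} {v:.2f}")
    return "\n".join(lines)


# Hypothetical confidence scores for two mined rules.
rule_confidence = {"butter -> bread": 1.00, "milk -> bread": 0.75}
chart = text_bar_chart(rule_confidence)
print(chart)
```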
