
Data Mining Module1 Expanded Notes

The document outlines the fundamentals of data mining, including its definition, the KDD process, and various data mining tasks such as classification, prediction, and clustering. It also discusses types of data, data preprocessing, measures of central tendency, and issues like data quality and privacy. Additionally, it covers data warehousing and multidimensional data models, including schemas like star and snowflake schemas.

Uploaded by

ATHUL LAL
Copyright © All Rights Reserved

Data Mining - Module I (Expanded Notes Based on Reference Textbooks)

1. **Data Mining:**

- Data mining is the process of discovering interesting, non-trivial, implicit, previously unknown, and potentially useful patterns or knowledge from large amounts of data.

- Involves multiple disciplines, including database systems, statistics, machine learning, and artificial intelligence.

- Often used as a synonym for Knowledge Discovery in Databases (KDD); strictly speaking, it is the pattern-extraction step within the broader KDD process.

2. **KDD Process:**

- **Selection:** Choosing the relevant data from various sources.

- **Preprocessing:** Removing noise, handling missing values, and resolving inconsistencies.

- **Transformation:** Converting data into appropriate formats for mining.

- **Data Mining:** Applying algorithms to extract patterns.

- **Evaluation:** Interpreting and validating the mined knowledge.
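
The five steps can be sketched as a chain of small Python functions. This is a minimal sketch under stated assumptions: the records, field names, and the specific choices made in each step (mean imputation, min-max scaling, "average per region" as the mined pattern, a fixed interestingness threshold) are hypothetical stand-ins for whatever each step does in a real project.

```python
from statistics import mean

# Hypothetical raw records; field names and values are illustrative only.
RAW_SALES = [
    {"region": "north", "amount": 120.0, "note": "ok"},
    {"region": "north", "amount": None, "note": "missing"},
    {"region": "south", "amount": 80.0, "note": "ok"},
    {"region": "south", "amount": 100.0, "note": "ok"},
]


def select(records, fields):
    """Selection: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


def preprocess(records):
    """Preprocessing: fill missing amounts with the mean of the known amounts."""
    fill = mean([r["amount"] for r in records if r["amount"] is not None])
    return [{**r, "amount": fill if r["amount"] is None else r["amount"]} for r in records]


def transform(records):
    """Transformation: min-max normalize amounts into [0, 1]."""
    amounts = [r["amount"] for r in records]
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1.0  # avoid division by zero when all amounts are equal
    return [{**r, "amount": (r["amount"] - lo) / span} for r in records]


def mine(records):
    """Data mining (toy pattern): average normalized amount per region."""
    groups = {}
    for r in records:
        groups.setdefault(r["region"], []).append(r["amount"])
    return {region: mean(vals) for region, vals in groups.items()}


def evaluate(patterns, threshold=0.5):
    """Evaluation: keep only patterns deemed interesting (here, at or above a threshold)."""
    return {k: v for k, v in patterns.items() if v >= threshold}


patterns = mine(transform(preprocess(select(RAW_SALES, ["region", "amount"]))))
print(patterns)            # {'north': 0.75, 'south': 0.25}
print(evaluate(patterns))  # {'north': 0.75}
```

Keeping each step as a separate function mirrors the process: any one step can be swapped out (for example, a different imputation rule) without touching the others.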

3. **Data Mining Tasks:**

- **Classification:** Assign data to predefined classes using algorithms such as decision trees, k-NN, and SVM.

- **Prediction:** Estimate continuous (numeric) values, such as future or missing values, typically using regression techniques.

- **Clustering:** Group data into clusters with similar characteristics (e.g., k-means).

- **Association Rule Mining:** Discover relationships between items (e.g., Market Basket Analysis using Apriori).

- **Outlier Detection:** Identify anomalies or rare items that differ from the norm.
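
Each of the tasks above can be illustrated with a tiny pure-Python sketch. These are minimal versions for intuition only: the function names and toy data are hypothetical, k-means is seeded with the first k points purely for reproducibility, and the z-score rule with a threshold of 2 standard deviations is just one of many outlier-detection methods. Practical work would normally rely on a library implementation.

```python
from collections import Counter
from math import dist
from statistics import mean, pstdev


def knn_classify(train, query, k=3):
    """Classification: majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_line(xs, ys):
    """Prediction: least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def kmeans(points, k, iterations=10):
    """Clustering: basic k-means, seeded with the first k points for reproducibility."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [tuple(map(mean, zip(*c))) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


def support_confidence(transactions, lhs, rhs):
    """Association rules: support and confidence of the rule lhs -> rhs."""
    both = sum(1 for t in transactions if lhs | rhs <= t)
    left = sum(1 for t in transactions if lhs <= t)
    return both / len(transactions), (both / left if left else 0.0)


def zscore_outliers(values, threshold=2.0):
    """Outlier detection: values more than `threshold` population std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if sigma > 0 and abs(v - mu) / sigma > threshold]


train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(train, (2, 2)))               # A
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a + b * 5)                                 # 11.0
centroids, clusters = kmeans([p for p, _ in train], k=2)
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support_confidence(baskets, {"bread"}, {"milk"}))  # support 0.5, confidence 2/3
print(zscore_outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 50]))  # [50]
```

Classification and prediction both learn from labeled examples (a class label versus a numeric target), whereas clustering, association rule mining, and this outlier test work on unlabeled data.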

4. **Types of Data in Data Mining:**

- **Structured Data:** Relational databases, data warehouses.


- **Semi-structured Data:** XML, JSON.

- **Unstructured Data:** Text, images, videos.

- **Data Streams:** Real-time continuous data.

5. **Data Objects and Attribute Types:**

- **Nominal:** Categorical values with no order (e.g., colors).

- **Binary:** Two values like 0/1, true/false.

- **Ordinal:** Ordered values (e.g., satisfaction level).

- **Numeric:**

- **Interval:** Values with meaningful differences but no true zero (e.g., temperature in °C or °F).

- **Ratio:** Values with a true zero (e.g., height, weight).
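
One way to make the distinction concrete is to list which operations each type supports, following the usual nominal, ordinal, interval, ratio hierarchy in which each level adds operations to the one before. The operation sets, the satisfaction scale, and the helper names below are illustrative assumptions.

```python
# Operations that are meaningful for each attribute type; each level adds to the previous one.
MEANINGFUL_OPS = {
    "nominal": {"=", "!="},
    "binary": {"=", "!="},  # a nominal attribute with only two states
    "ordinal": {"=", "!=", "<", ">"},
    "interval": {"=", "!=", "<", ">", "+", "-"},
    "ratio": {"=", "!=", "<", ">", "+", "-", "*", "/"},
}

# Hypothetical ordinal scale: the list order encodes the ranking.
SATISFACTION = ["low", "medium", "high"]


def ordinal_rank(value):
    """Map an ordinal value to its rank so that order comparisons become possible."""
    return SATISFACTION.index(value)


def celsius_to_kelvin(c):
    """Kelvin has a true zero, so ratios of Kelvin temperatures are meaningful."""
    return c + 273.15


print(ordinal_rank("high") > ordinal_rank("low"))  # True
# 20 / 10 == 2.0, yet 20 °C is not "twice as hot" as 10 °C (Celsius is interval-scaled).
print(celsius_to_kelvin(20) / celsius_to_kelvin(10))  # about 1.035 on the ratio-scaled Kelvin scale
```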

6. **Data Preprocessing:**

- Essential for improving the quality of input data.

- Steps: Data cleaning, integration, transformation, reduction, and discretization.
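
Each preprocessing step can be sketched as a small function over plain Python lists. The choices shown (median imputation for cleaning, an inner join for integration, min-max normalization for transformation, systematic sampling for reduction, equal-width binning for discretization) are one common option per step, picked here for illustration; the function names and data are hypothetical.

```python
from statistics import median


def clean(values):
    """Cleaning: replace missing values (None) with the median of the observed values."""
    fill = median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def integrate(left, right, key):
    """Integration: inner-join two sources (lists of dicts) on a key; assumes keys in `right` are unique."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def min_max(values, new_min=0.0, new_max=1.0):
    """Transformation: min-max normalization into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for a constant attribute
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


def sample_every(rows, step):
    """Reduction: systematic sampling that keeps every `step`-th row."""
    return rows[::step]


def equal_width_bins(values, k):
    """Discretization: map each value to one of k equal-width bins numbered 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    return [min(int((v - lo) / width), k - 1) for v in values]


ages = clean([23, None, 35, 47, 59, 31])
print(ages)                        # [23, 35, 35, 47, 59, 31]
print(equal_width_bins(ages, 3))   # [0, 1, 1, 2, 2, 0]
print(sample_every(ages, 2))       # [23, 35, 59]
print(integrate([{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}],
                [{"id": 1, "city": "Pune"}], key="id"))
```

The median is used for imputation here because, unlike the mean, it is not pulled by extreme values.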

7. **Measures of Central Tendency:**

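- **Mean:** The arithmetic average, i.e., the sum of the values divided by their count.

- **Median:** Middle value of the sorted data, separating the higher half from the lower half (the average of the two middle values when the count is even).

- **Mode:** Most frequently occurring value; a dataset can have more than one mode.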
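
Python's standard-library `statistics` module provides all three measures; the sketch also spells out the median rule by hand. The salary list is illustrative data, chosen so that the single large value (110) pulls the mean above the median and two values tie for the mode.

```python
from statistics import mean, median, multimode

# Illustrative salary values (in thousands), sorted for readability.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]


def median_manual(values):
    """Middle value of the sorted data; mean of the two middle values when the count is even."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


print(mean(salaries))           # 58
print(median(salaries))         # 54.0
print(median_manual(salaries))  # 54.0
print(multimode(salaries))      # [52, 70]: the data are bimodal
```

`statistics.mode` would return only the first mode it encounters (52 here), which is why `multimode` is used to show both.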

8. **Classification of Data Mining Systems:**

- **Based on Data Type:** Relational, spatial, multimedia, text, time-series.

- **Based on Knowledge Type:** Characterization, discrimination, association, classification, clustering.

- **Based on Technique:** Statistical, machine learning, neural networks, visualization-based.


9. **Major Issues in Data Mining:**

- **Data Quality:** Incomplete, noisy, or inconsistent data.

- **Scalability:** Efficient algorithms for large datasets.

- **Privacy & Security:** Sensitive data protection.

- **Interpretability:** Understandable models for decision-makers.

10. **Data Warehousing:**

- A subject-oriented, integrated, time-variant, non-volatile collection of data.

- Supports decision-making by providing a unified view of enterprise data.

11. **Multidimensional Data Model:**

- Allows data to be modeled and analyzed from multiple dimensions (e.g., time, geography, product).

- Fundamental concept: **Data Cube** - a multi-dimensional array of values.
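
A data cube can be viewed as the set of all group-by aggregations over its dimensions: ignoring concept hierarchies, n dimensions give 2^n cuboids, from the base cuboid (all dimensions) up to the apex cuboid (the grand total). The sketch below computes them with plain dictionaries; the dimension names, fact rows, and the use of SUM as the aggregate are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "location", "item")  # hypothetical dimensions

# Hypothetical fact records: dimension values plus a numeric measure.
FACTS = [
    {"time": "Q1", "location": "Toronto", "item": "TV", "sales": 100},
    {"time": "Q1", "location": "Toronto", "item": "PC", "sales": 50},
    {"time": "Q1", "location": "Vancouver", "item": "TV", "sales": 70},
    {"time": "Q2", "location": "Vancouver", "item": "PC", "sales": 30},
]


def cuboid(facts, dims, measure="sales"):
    """One cuboid: SUM of the measure grouped by the given subset of dimensions."""
    cells = defaultdict(int)
    for f in facts:
        cells[tuple(f[d] for d in dims)] += f[measure]
    return dict(cells)


def all_cuboids(facts, dimensions=DIMENSIONS):
    """The full cube: one cuboid per subset of dimensions (2**n of them)."""
    return {
        dims: cuboid(facts, dims)
        for r in range(len(dimensions) + 1)
        for dims in combinations(dimensions, r)
    }


cube = all_cuboids(FACTS)
print(len(cube))           # 8 cuboids for 3 dimensions
print(cube[("time",)])     # {('Q1',): 220, ('Q2',): 30}
print(cube[()])            # {(): 250}, the apex cuboid (grand total)
```

Moving from `("time", "location")` to `("time",)` in this dictionary corresponds to a roll-up; going the other way is a drill-down.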

12. **Schemas for Multidimensional Data:**

- **Star Schema:** A central fact table connected to dimension tables.

- **Snowflake Schema:** A variant of the star schema in which dimension tables are normalized into additional tables.

- **Fact Constellation:** Multiple fact tables sharing dimension tables (also known as galaxy schema).
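
A star schema can be tried out with Python's built-in `sqlite3` module and an in-memory database. The table and column names (`fact_sales`, `dim_time`, `dim_item`, and so on) and the rows are hypothetical; the point is the shape: one fact table holding foreign keys and numeric measures, surrounded by dimension tables, and queried with a join-and-aggregate ("star join").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (hypothetical names and columns).
    CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    -- Central fact table: foreign keys to each dimension plus numeric measures.
    CREATE TABLE fact_sales (
        time_key INTEGER REFERENCES dim_time(time_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        units_sold INTEGER,
        dollars_sold REAL
    );
    INSERT INTO dim_time VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
    INSERT INTO dim_item VALUES (1, 'TV', 'BrandX'), (2, 'PC', 'BrandY');
    INSERT INTO fact_sales VALUES (1, 1, 10, 5000.0), (1, 2, 4, 3200.0), (2, 1, 6, 3000.0);
""")

# A star join: join the fact table to its dimensions and aggregate a measure.
rows = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(f.dollars_sold)
    FROM fact_sales f
    JOIN dim_time t ON f.time_key = t.time_key
    JOIN dim_item i ON f.item_key = i.item_key
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter, i.item_name
""").fetchall()
print(rows)  # [('Q1', 'PC', 3200.0), ('Q1', 'TV', 5000.0), ('Q2', 'TV', 3000.0)]
```

In a snowflake schema, `dim_item` would itself be normalized, for example by moving `brand` into a separate (hypothetical) `dim_brand` table referenced through a `brand_key`; a fact constellation would add a second fact table, say for shipments, that reuses `dim_time` and `dim_item`.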

These notes summarize the key concepts covered in Module I of Data Mining.
