Data Mining Module1 Notes ReferenceBased

The document provides an overview of data mining, defining it as the process of extracting hidden patterns from large data sets using techniques from statistics and machine learning. It outlines various data mining tasks such as classification, prediction, and cluster analysis, as well as the KDD process, which includes steps like selection and preprocessing. Additionally, it discusses data mining functionalities, classifications, issues, and central tendency measures, along with concepts related to data warehousing.

Uploaded by

ATHUL LAL
Copyright © All Rights Reserved

Data Mining - Module I Notes (Based on Reference Textbooks)

**1. Introduction to Data Mining** (Han & Kamber, Tan et al.):

- **What is Data Mining?**

  - The process of discovering interesting, useful patterns hidden in large data sets.

  - Integrates techniques from statistics, machine learning, and database systems.

**2. Data Mining Tasks** (Han & Kamber):

- **Classification:** Assigning items to predefined categories.

- **Prediction:** Estimating unknown or future values, usually numeric ones (e.g., using regression to forecast sales trends).

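- **Cluster Analysis:** Grouping a set of objects so that objects in the same group (cluster) are more similar to each other than to objects in other groups.

- **Association Rule Mining:** Discovering interesting relations between variables (e.g., items that are frequently bought together).

- **Outlier Detection:** Identifying objects that deviate markedly from the general behavior of the data.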
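Association rules are commonly scored by *support* (the fraction of all transactions that contain every item in the rule) and *confidence* (the fraction of transactions containing the antecedent that also contain the consequent). The sketch below assumes each transaction is a Python set of item names; the basket data and the helper names `count`, `support` and `confidence` are illustrative, not taken from the reference texts.

```python
# Toy market-basket data (hypothetical): each transaction is the set of items in one purchase.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires; this sketch treats its confidence as 0
    return count(set(antecedent) | set(consequent), transactions) / base


if __name__ == "__main__":
    print(support({"diapers", "beer"}, TRANSACTIONS))       # 0.6
    print(confidence({"diapers"}, {"beer"}, TRANSACTIONS))  # 0.75
```

Here the rule {diapers} → {beer} holds in 3 of the 5 baskets (support 0.6), and 3 of the 4 baskets that contain diapers also contain beer (confidence 0.75).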

**3. KDD Process** (Tan et al.):

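- **Steps:**

  1. **Selection:** Choose the target data relevant to the analysis task.

  2. **Preprocessing:** Clean the data by removing noise and handling missing values.

  3. **Transformation:** Convert the data into a form suitable for mining (e.g., normalization, feature selection, dimensionality reduction).

  4. **Data Mining:** Apply algorithms to extract patterns.

  5. **Interpretation/Evaluation:** Assess the discovered patterns and present the useful ones as knowledge.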
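The five steps can be sketched as a chain of small functions over a list of records. This is a toy illustration under stated assumptions: the records, field names and helper functions are hypothetical, mean imputation and min-max normalization stand in for whatever preprocessing and transformation a real project needs, and the "mining" step is a simple threshold filter rather than an actual mining algorithm.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "region": "north", "age": 34, "spend": 120.0},
    {"id": 2, "region": "south", "age": None, "spend": 80.0},
    {"id": 3, "region": "north", "age": 51, "spend": 300.0},
    {"id": 4, "region": "north", "age": 23, "spend": None},
    {"id": 5, "region": "north", "age": 45, "spend": 210.0},
]


def select(records, region):
    """1. Selection: keep only the target subset relevant to the task."""
    return [r for r in records if r["region"] == region]


def preprocess(records, fields):
    """2. Preprocessing: replace missing values in `fields` with that field's mean."""
    means = {f: mean(r[f] for r in records if r[f] is not None) for f in fields}
    return [
        {**r, **{f: means[f] if r[f] is None else r[f] for f in fields}}
        for r in records
    ]


def transform(records, field):
    """3. Transformation: min-max normalize `field` into the range [0, 1]."""
    lo = min(r[field] for r in records)
    hi = max(r[field] for r in records)
    span = (hi - lo) or 1  # avoid dividing by zero when the field is constant
    return [{**r, field: (r[field] - lo) / span} for r in records]


def mine(records, field, threshold):
    """4. Data mining (toy stand-in): ids of records whose value exceeds `threshold`."""
    return [r["id"] for r in records if r[field] > threshold]


def evaluate(pattern, records):
    """5. Interpretation/evaluation: fraction of the selected records the pattern covers."""
    return len(pattern) / len(records)


if __name__ == "__main__":
    data = transform(preprocess(select(RAW, "north"), ["age", "spend"]), "spend")
    pattern = mine(data, "spend", 0.4)
    print(pattern, evaluate(pattern, data))  # [3, 4, 5] 0.75
```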

**4. Data Mining Functionalities** (Han & Kamber):

- Finding patterns, associations, correlations, and trends.

- Classifying data and predicting outcomes.
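Correlation between two numeric attributes is often measured with Pearson's correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), which ranges from -1 to +1. A minimal sketch in plain Python; the function name `pearson` and the study-hours data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sum of products of deviations (the covariance numerator).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    hours = [2, 4, 6, 8, 10]       # made-up study hours
    scores = [55, 60, 70, 75, 90]  # made-up exam scores
    print(round(pearson(hours, scores), 3))
```

Values near +1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little linear relationship.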


**5. Classification of Data Mining Systems**:

- Based on the type of data (relational, transactional, spatial, multimedia, time-series).

- Based on the kind of knowledge mined (characterization, discrimination, association).

- Based on the techniques used (machine learning, statistics, visualization, database-oriented).

**6. Issues in Data Mining**:

- Handling noisy and incomplete data

- Performance and scalability

- Integration of data mining with databases

- Data ownership and privacy issues

**7. Data Objects and Attribute Types**:

- **Nominal:** Categories with no order (e.g., hair color)

- **Binary:** Two categories (e.g., true/false)

- **Ordinal:** Categories with a meaningful order (e.g., small, medium, large)

- **Numeric:**

  - **Interval:** Equal-size units but no true zero point (e.g., temperature in °C or °F)

  - **Ratio:** True zero point exists, so ratios are meaningful (e.g., age)
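One common way to summarize attribute types is by which operations give meaningful results: nominal (and binary) values can only be compared for equality, ordinal values can also be ordered, interval values also support differences, and ratio values also support ratios. The mapping below encodes that convention; the enum, set and function names are hypothetical.

```python
from enum import Enum


class AttributeType(Enum):
    NOMINAL = "nominal"
    BINARY = "binary"  # a nominal attribute with exactly two states
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# Each level keeps the operations of the level before it and adds new ones.
_EQUALITY = frozenset({"==", "!="})
_ORDER = _EQUALITY | {"<", ">"}
_DIFFERENCE = _ORDER | {"+", "-"}
_RATIO_OPS = _DIFFERENCE | {"*", "/"}

MEANINGFUL_OPS = {
    AttributeType.NOMINAL: _EQUALITY,
    AttributeType.BINARY: _EQUALITY,
    AttributeType.ORDINAL: _ORDER,
    AttributeType.INTERVAL: _DIFFERENCE,
    AttributeType.RATIO: _RATIO_OPS,
}


def allows(attr_type, op):
    """True if `op` gives a meaningful result for values of `attr_type`."""
    return op in MEANINGFUL_OPS[attr_type]


def celsius_to_kelvin(c):
    return c + 273.15


if __name__ == "__main__":
    print(allows(AttributeType.INTERVAL, "/"))  # False
    # 20 °C is not "twice as hot" as 10 °C: on the kelvin scale (true zero) the ratio is about 1.035.
    print(round(celsius_to_kelvin(20) / celsius_to_kelvin(10), 3))  # 1.035
```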

**8. Central Tendency Measures** (Han & Kamber):

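- **Mean:** Sum of the values divided by their count (sensitive to extreme values)

- **Median:** Middle value of the sorted data (the average of the two middle values when the count is even)

- **Mode:** Most frequent value (a data set can have more than one mode)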
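Python's standard `statistics` module provides all three measures (`multimode`, which returns every mode, needs Python 3.8 or later). The salary figures below are made-up sample data, and `median_manual` is a hypothetical helper that spells out the even-count rule.

```python
from statistics import mean, median, multimode

# Made-up sample: annual salaries in thousands of dollars, already sorted.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]


def median_manual(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


if __name__ == "__main__":
    print(mean(salaries))           # 58
    print(median(salaries))         # 54.0
    print(median_manual(salaries))  # 54.0
    print(multimode(salaries))      # [52, 70]
```

The single large value 110 pulls the mean (58) above the median (54), which is why the median is often preferred for skewed data; this sample is also bimodal (52 and 70).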

**9. Data Warehousing Concepts** (Paulraj Ponniah, Sam Anahory):

- **Definition:** A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.

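- **Multidimensional Data Models:**

  - **Data Cubes:** Allow data to be modeled and viewed in multiple dimensions (e.g., sales by time, item, and location).

  - **Schemas:**

    - **Star Schema:** A central fact table linked directly to one denormalized table per dimension

    - **Snowflake Schema:** A star schema whose dimension tables are normalized into additional tables

    - **Fact Constellation:** Multiple fact tables that share dimension tables (also called a galaxy schema)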
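A star schema can be sketched with Python's built-in `sqlite3` module: one fact table holding foreign keys plus a numeric measure, surrounded by one table per dimension. All table names, columns and sales figures here are hypothetical. Grouping the measure by fewer dimension attributes corresponds to rolling up the data cube.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_item     (item_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_location (loc_id  INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Fact table: one foreign key per dimension plus a numeric measure.
CREATE TABLE fact_sales (
    time_id      INTEGER REFERENCES dim_time (time_id),
    item_id      INTEGER REFERENCES dim_item (item_id),
    loc_id       INTEGER REFERENCES dim_location (loc_id),
    dollars_sold REAL
);
"""

STAR_JOIN = """
FROM fact_sales AS f
JOIN dim_time     AS t ON f.time_id = t.time_id
JOIN dim_item     AS i ON f.item_id = i.item_id
JOIN dim_location AS l ON f.loc_id  = l.loc_id
"""

# Only these dimension attributes may be grouped on, so no arbitrary text reaches the SQL.
ALLOWED = {"t.year", "t.quarter", "i.name", "i.category", "l.city", "l.country"}


def build_demo_warehouse():
    """Create an in-memory star schema filled with a few hypothetical sales rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [(1, 2024, "Q1"), (2, 2024, "Q2")])
    con.executemany("INSERT INTO dim_item VALUES (?, ?, ?)",
                    [(1, "laptop", "computer"), (2, "phone", "mobile")])
    con.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                    [(1, "Chicago", "USA"), (2, "Toronto", "Canada")])
    con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (1, 1, 1, 1000.0), (1, 2, 1, 400.0), (2, 1, 2, 800.0), (2, 2, 2, 300.0), (2, 1, 1, 500.0),
    ])
    return con


def total_sales(con, *group_cols):
    """Sum dollars_sold grouped by the given dimension attributes.

    Fewer grouping columns give a coarser roll-up; no columns gives the grand total.
    """
    unknown = set(group_cols) - ALLOWED
    if unknown:
        raise ValueError(f"not a dimension attribute: {sorted(unknown)}")
    if not group_cols:
        return con.execute(f"SELECT SUM(f.dollars_sold) {STAR_JOIN}").fetchall()
    cols = ", ".join(group_cols)
    sql = f"SELECT {cols}, SUM(f.dollars_sold) {STAR_JOIN} GROUP BY {cols} ORDER BY {cols}"
    return con.execute(sql).fetchall()


if __name__ == "__main__":
    con = build_demo_warehouse()
    print(total_sales(con))                            # [(3000.0,)]
    print(total_sales(con, "l.country"))               # [('Canada', 1100.0), ('USA', 1900.0)]
    print(total_sales(con, "l.country", "t.quarter"))
```

Normalizing `dim_item` further (for example, moving `category` into its own table referenced by `dim_item`) would turn this sketch into a snowflake schema.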

**10. Reference Texts Used:**

- Jiawei Han & Micheline Kamber, *Data Mining: Concepts and Techniques*

- Pang-Ning Tan et al., *Introduction to Data Mining*

- Arun K. Pujari, *Data Mining Techniques*

- Sam Anahory & Dennis Murray, *Data Warehousing in the Real World*

- Paulraj Ponniah, *Data Warehousing Fundamentals*
