Data Mining Notes

Data Mining, also known as Knowledge Discovery in Databases (KDD), involves extracting useful patterns from large datasets through a series of steps including data cleaning and transformation. It encompasses various tasks such as descriptive and predictive analytics, with applications in market analysis, healthcare, and fraud detection. Key techniques include classification, regression, clustering, and association rule mining, supported by evaluation metrics to assess their effectiveness.

Data Mining – Exam Notes (Easy Language)

1. Introduction
• Data Mining is the process of discovering useful patterns and knowledge from large datasets.
• It is often treated as a synonym for Knowledge Discovery in Databases (KDD), although strictly it is one step of the KDD process.
• Steps of KDD: Data Cleaning → Data Integration → Data Selection → Data Transformation → Data Mining → Pattern Evaluation → Knowledge Presentation.
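The KDD steps can be made concrete with a toy walk-through. The sketch below is only an illustration: the purchase records, field names, the min-max scaling, the "average per city" pattern, and the 0.4 threshold are all assumptions made up for this example, not part of the notes.

```python
# Toy walk-through of the KDD steps on made-up purchase records.
# All data, field names, and the threshold are illustrative assumptions.

store_a = [
    {"customer": "c1", "amount": 20.0, "city": "Pune"},
    {"customer": "c2", "amount": None, "city": "Delhi"},  # missing value
]
store_b = [
    {"customer": "c3", "amount": 80.0, "city": "Pune"},
    {"customer": "c4", "amount": 35.0, "city": "Delhi"},
]


def clean(records):
    """Data Cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Data Integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Data Selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data Transformation: min-max scale the amount into [0, 1]."""
    amounts = [r["amount"] for r in records]
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all amounts match
    return [{**r, "amount_scaled": (r["amount"] - lo) / span} for r in records]


def mine(records):
    """Data Mining: a very simple pattern, the average scaled spend per city."""
    by_city = {}
    for r in records:
        by_city.setdefault(r["city"], []).append(r["amount_scaled"])
    return {city: sum(v) / len(v) for city, v in by_city.items()}


def evaluate(patterns, threshold=0.4):
    """Pattern Evaluation: keep only patterns above an assumed threshold."""
    return {city: v for city, v in patterns.items() if v >= threshold}


def run_kdd():
    data = integrate(clean(store_a), clean(store_b))
    data = select(data, ["city", "amount"])
    data = transform(data)
    return evaluate(mine(data))


if __name__ == "__main__":
    # Knowledge Presentation: report the surviving patterns in readable form.
    for city, score in run_kdd().items():
        print(f"{city}: average scaled spend {score:.2f}")
```

Each function corresponds to one step in the chain, so the order of calls in `run_kdd` mirrors the arrows in the list of KDD steps.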
2. Types of Data
• Structured Data – tables, rows, columns.
• Unstructured Data – text, images, videos.
• Semi-structured Data – XML, JSON.

3. Data Mining Tasks
a) Descriptive – find patterns that describe the data (clustering, association rules).
b) Predictive – predict future or unknown outcomes (classification, regression).

4. Data Preprocessing
• Data Cleaning – remove noise and handle missing values.
• Data Integration – combine data from multiple sources.
• Data Transformation – normalization, aggregation.
• Data Reduction – reduce data size using PCA or sampling.

5. Classification
• Predicts a category/class label.
• Algorithms: Decision Tree, Naive Bayes, KNN, SVM, Random Forest.
• Example: Email → spam or not spam.

6. Regression
• Predicts continuous values.
• Algorithms: Linear Regression, Polynomial Regression.
• Example: Predicting house prices.

7. Clustering
• Groups similar data objects without labels.
• Algorithms: K-Means, Hierarchical Clustering, DBSCAN.
• Example: Customer segmentation.

8. Association Rule Mining
• Finds relationships among items.
• Algorithm: Apriori.
• Example: "If a customer buys bread, they also buy butter."

9. Outlier Detection
• Identifies data points that are very different from the rest.
• Useful in fraud detection.

10. Evaluation Metrics
• Classification: Accuracy, Precision, Recall, F1-score.
• Clustering: Silhouette Score, SSE (sum of squared errors).

11. Applications of Data Mining
• Market Basket Analysis
• Healthcare diagnosis
• Fraud detection (banking)
• Recommendation systems (Netflix, Amazon)
• Customer segmentation in marketing
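Market Basket Analysis rests on two standard measures for a rule such as bread → butter: support (the fraction of transactions that contain both items) and confidence (among transactions with bread, the fraction that also contain butter). The sketch below computes both on a made-up list of baskets. The transactions and function names are assumptions for illustration, and this is not a full Apriori implementation, only the measures that such algorithms threshold on.

```python
# Support and confidence for one association rule on made-up baskets.
# The transactions and helper names are illustrative assumptions.

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction also containing consequent."""
    base = count(antecedent, baskets)
    if base == 0:
        return 0.0  # rule never applies, so report zero confidence
    return count(set(antecedent) | set(consequent), baskets) / base


if __name__ == "__main__":
    print("support(bread, butter):", support({"bread", "butter"}, transactions))
    print("confidence(bread -> butter):", confidence({"bread"}, {"butter"}, transactions))
```

Here 3 of the 5 baskets contain both bread and butter, and 3 of the 4 baskets with bread also contain butter, so the rule has support 0.6 and confidence 0.75.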
