Data Mining – Exam Notes (Easy Language)

1. Introduction
• Data Mining is the process of discovering useful patterns and knowledge from large datasets.
• It is often called Knowledge Discovery in Databases (KDD), although strictly speaking data mining is one step inside the KDD process.
• Steps of KDD: Data Cleaning → Data Integration → Data Selection → Data Transformation → Data Mining → Pattern Evaluation → Knowledge Presentation.
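The first four KDD steps can be illustrated on a handful of toy records. This is a minimal sketch in plain Python under stated assumptions: the records, the field names (`id`, `age`, `spend`) and the helper functions `clean`, `integrate`, `select` and `transform` are all hypothetical, cleaning simply drops incomplete records, and the transformation is min-max normalization, (x − min) / (max − min). The mining, evaluation and presentation steps are left out.

```python
# Toy walk through the first KDD steps on hypothetical customer records.
# All field names and values are made up for illustration.

def clean(records):
    """Data Cleaning: drop any record that has a missing (None) value."""
    return [r for r in records if None not in r.values()]

def integrate(customers, purchases):
    """Data Integration: join two sources on a shared 'id' key."""
    spend = {p["id"]: p["spend"] for p in purchases}
    return [{**c, "spend": spend[c["id"]]} for c in customers if c["id"] in spend]

def select(records, fields):
    """Data Selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records, field):
    """Data Transformation: min-max normalize one numeric field to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]

customers = [
    {"id": 1, "age": 25, "city": "A"},
    {"id": 2, "age": None, "city": "B"},  # missing age: removed by cleaning
    {"id": 3, "age": 40, "city": "A"},
]
purchases = [{"id": 1, "spend": 100.0}, {"id": 3, "spend": 300.0}]

data = transform(select(integrate(clean(customers), purchases), ["age", "spend"]), "spend")
print(data)
```

Plain lists of dictionaries keep every step visible; in practice these steps are usually done with a data-frame library.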
2. Types of Data
• Structured Data – tables with rows and columns.
• Unstructured Data – text, images, videos.
• Semi-structured Data – XML, JSON.

3. Data Mining Tasks
a) Descriptive – Find patterns that describe the data (clustering, association rules).
b) Predictive – Predict unknown or future values (classification, regression).

4. Data Preprocessing
• Data Cleaning – handle missing values and remove noise.
• Data Integration – combine data from multiple sources.
• Data Transformation – normalization, aggregation.
• Data Reduction – reduce data size using PCA (Principal Component Analysis) or sampling.

5. Classification
• Predicts a category (class label).
• Algorithms: Decision Tree, Naive Bayes, KNN (k-Nearest Neighbors), SVM (Support Vector Machine), Random Forest.
• Example: Email → spam or not spam.

6. Regression
• Predicts continuous (numeric) values.
• Algorithms: Linear Regression, Polynomial Regression.
• Example: Predicting house prices.

7. Clustering
• Groups similar data objects without using class labels (unsupervised learning).
• Algorithms: K-Means, Hierarchical Clustering, DBSCAN.
• Example: Customer segmentation.

8. Association Rule Mining
• Finds relationships among items that often occur together (e.g. in shopping baskets).
• Algorithm: Apriori.
• Example: “If a customer buys bread, they are likely to also buy butter.”

9. Outlier Detection
• Identifies data points that are very different from the rest of the data.
• Useful in fraud detection.

10. Evaluation Metrics
• Classification: Accuracy, Precision, Recall, F1-score.
• Clustering: Silhouette Score, SSE (Sum of Squared Errors within clusters).

11. Applications of Data Mining
• Market Basket Analysis
• Healthcare diagnosis
• Fraud detection (banking)
• Recommendation systems (Netflix, Amazon)
• Customer segmentation in marketing
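For Section 5 (Classification), KNN is the easiest of the listed algorithms to write from scratch: a new point gets the majority label of its k closest training points. The sketch below assumes numeric feature vectors, Euclidean distance and k = 3; the two email features, the training points and the helper name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical email features: (number of links, number of "free" words)
train = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((8, 5), "spam"),
    ((9, 7), "spam"),
    ((7, 6), "spam"),
]

print(knn_predict(train, (8, 6)))
print(knn_predict(train, (1, 1)))
```

An odd k avoids tied votes when there are only two classes.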
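For Section 6 (Regression), simple linear regression with one input fits the line y = a + b·x by least squares: the slope is b = cov(x, y) / var(x) and the intercept is a = mean(y) − b·mean(x). The house sizes and prices below are invented, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = covariance(x, y) / variance(x); the 1/n factors cancel.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b

# Hypothetical data: house size (square metres) -> price (thousands)
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
a, b = fit_line(sizes, prices)
print(a, b)         # intercept and slope
print(a + b * 90)   # predicted price for a 90 square-metre house
```

Polynomial Regression uses the same least-squares idea with extra x², x³, ... terms as inputs.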
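For Section 7 (Clustering), K-Means alternates two steps until the centroids stop moving: assign every point to its nearest centroid, then move each centroid to the mean of its points. This sketch assumes 2-D points and hand-picked starting centroids (library implementations normally choose them randomly or with k-means++), and it also computes SSE, the clustering metric from Section 10. The customer data and the helper names `assign`, `kmeans` and `sse` are hypothetical.

```python
import math

def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centroids, max_iter=100):
    """Repeat assignment and update steps until the centroids stop changing."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

def sse(centroids, clusters):
    """Sum of squared distances from each point to its own cluster's centroid."""
    return sum(math.dist(p, centroids[i]) ** 2
               for i, cluster in enumerate(clusters) for p in cluster)

# Hypothetical customers: (visits per month, average spend)
customers = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(customers, [(1, 1), (8, 8)])
print(centroids)
print(clusters)
print(sse(centroids, clusters))
```

A lower SSE means tighter clusters, but SSE always shrinks as k grows, so it is compared across runs with the same k or used with the "elbow" heuristic.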
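For Section 8 (Association Rule Mining), Apriori finds frequent itemsets level by level and prunes candidates with the Apriori property: every subset of a frequent itemset must itself be frequent. A rule X → Y is then judged by support (the fraction of baskets containing both X and Y) and confidence, support(X ∪ Y) / support(X). The four baskets and the 0.5 minimum support are made-up assumptions, and the join step here is a simplified version of the textbook one.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    items = {item for t in transactions for item in t}
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        current = {c: support(c, transactions) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step (simplified): union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent

def confidence(lhs, rhs, transactions):
    """confidence(lhs -> rhs) = support(lhs | rhs) / support(lhs)"""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

# Hypothetical shopping baskets
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]
frequent = apriori(transactions, min_support=0.5)
for itemset, s in frequent.items():
    print(sorted(itemset), s)
print(confidence(frozenset({"bread"}), frozenset({"butter"}), transactions))
print(confidence(frozenset({"butter"}), frozenset({"bread"}), transactions))
```

Here {bread, butter} is frequent; butter → bread has confidence 1.0 while bread → butter has only 2/3, which is why a rule like the bread/butter example means "likely", not "always".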
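For Section 9 (Outlier Detection), one simple statistical method (not the only one) flags values whose z-score, the distance from the mean measured in standard deviations, exceeds a threshold. The transaction amounts below are invented to mimic a fraud check, and the threshold of 2 and the helper name `zscore_outliers` are assumptions.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose z-score |x - mean| / stdev is above `threshold`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [x for x in values if abs(x - mean) / stdev > threshold]

# Hypothetical card transaction amounts; the last one looks suspicious.
amounts = [20, 25, 22, 30, 27, 24, 21, 500]
print(zscore_outliers(amounts))
```

With only 8 values, no point can have a z-score above √7 ≈ 2.65 (using the population standard deviation), so the common threshold of 3 would never flag anything in a sample this small.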
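For Section 10 (Evaluation Metrics), the classification metrics come from counting true positives (TP), false positives (FP) and false negatives (FN): Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = 2 · Precision · Recall / (Precision + Recall), and Accuracy = correct predictions / all predictions. The spam labels below and the helper name `classification_metrics` are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Return (accuracy, precision, recall, f1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed positives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]
y_pred = ["spam", "spam", "not spam", "spam", "not spam", "not spam"]
print(classification_metrics(y_true, y_pred))
```

Here TP = 2, FP = 1 and FN = 1, so precision, recall and F1 all equal 2/3, and accuracy is 4 correct out of 6.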