Data Mining
Chapter 1: Introduction to Data Mining
1. Define data mining and explain its functionalities.
2. Explain the different classifications of data mining systems.
3. Describe the task primitives of data mining.
4. How does a data mining system integrate with a database or data warehouse?
5. Discuss the key issues in data mining.
6. A company uses data mining to classify customer transactions. Given that 10% of the
transactions are fraudulent, estimate the number of fraudulent transactions in a
dataset of 50,000 transactions.
7. Explain the Knowledge Discovery in Databases (KDD) process.
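For question 6 above, the estimate is the dataset size multiplied by the fraud rate. A minimal sketch, in which the helper name `expected_count` is hypothetical and the rate is assumed to apply uniformly across the dataset:

```python
def expected_count(total_records: int, rate: float) -> int:
    """Expected number of records in a category, given its proportion."""
    return round(total_records * rate)

# 10% of 50,000 transactions
fraud_estimate = expected_count(50_000, 0.10)
print(fraud_estimate)  # 5000
```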
Chapter 2: Data Pre-processing
1. What is data summarization? Explain its importance in data mining.
2. Discuss various data cleaning techniques with examples.
3. Explain the process of data integration and transformation.
4. Suppose a dataset contains missing values. If 20% of the records in a 1,000-record
dataset have missing values, how many records need imputation?
5. Describe the concept of data reduction and its techniques.
6. What is dimensionality reduction? Explain the CUR decomposition method.
7. Differentiate between feature extraction, feature transformation, and feature selection.
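For question 4 above, the count is 20% of 1,000 records. The sketch below checks this on a synthetic column and fills the gaps with the mean of the observed values. The data is made up for illustration, and mean imputation is only one of several possible cleaning choices:

```python
# Synthetic 1,000-record column; every fifth value is missing (20%).
records = [None if i % 5 == 0 else float(i) for i in range(1000)]

# Count the records that need imputation.
missing = sum(value is None for value in records)

# Mean imputation: replace each missing value with the mean of the observed ones.
observed = [value for value in records if value is not None]
mean_value = sum(observed) / len(observed)
imputed = [mean_value if value is None else value for value in records]

print(missing)  # 200
```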
Chapter 3: Concept Description, Mining Frequent Patterns, Associations, and Correlations
1. What is concept description in data mining? How is it useful?
2. Explain data generalization and summarization-based characterization.
3. What are frequent itemset mining methods? Explain any two.
4. Discuss the various types of association rules and correlation analysis.
5. Explain advanced association rule techniques and their importance.
6. How do we measure the quality of rules in data mining?
Chapter 4: Classification and Prediction
1. Differentiate between classification and prediction.
2. Explain the issues related to classification and prediction.
3. Describe statistical-based and distance-based classification algorithms.
4. Explain decision tree-based and neural network-based classification algorithms.
5. What are rule-based classification techniques? Give an example.
6. Discuss the evaluation metrics used to measure classifier accuracy.
7. Explain logistic regression and its role in prediction.
8. How can tools like WEKA and DBMiner be used in classification?
9. Given the transactions:
T1: {Milk, Bread, Butter}
T2: {Milk, Bread}
T3: {Milk, Butter}
T4: {Bread, Butter}
1. Calculate the support and confidence for the rule {Milk} → {Bread}.
2. Find the frequent itemsets using the Apriori algorithm with a minimum support of 50%.
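For question 9 above, support is the fraction of transactions containing an itemset, and the confidence of X → Y is support(X ∪ Y) / support(X). The following is a minimal sketch of both measures and of a level-wise Apriori search over the four transactions. The function names are hypothetical, and the code is written for clarity rather than efficiency:

```python
from itertools import combinations

transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Milk", "Bread"},            # T2
    {"Milk", "Butter"},           # T3
    {"Bread", "Butter"},          # T4
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def apriori(min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the minimum support.
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent

print(support({"Milk", "Bread"}))        # 0.5
print(confidence({"Milk"}, {"Bread"}))   # 2/3, about 0.667
for itemset, s in apriori(0.5).items():
    print(sorted(itemset), s)
```

With a 50% threshold, all three single items (support 0.75) and all three pairs (support 0.5) are frequent, while {Milk, Bread, Butter} appears only in T1 (support 0.25) and is discarded.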
Chapter 5: Cluster Analysis
1. What is clustering in data mining? Explain its importance.
2. Discuss the problem definition of clustering and its applications.
3. Explain the K-Means algorithm and its additional issues.
4. What is the PAM algorithm? How does it differ from K-Means?
5. Differentiate between agglomerative and divisive hierarchical clustering methods.
6. Explain outlier detection and its importance in clustering.
7. How do we perform clustering on high-dimensional data?
8. A hierarchical clustering algorithm merges clusters A and C. Given the distances
D(A, B) = 5 and D(B, C) = 7, compute the distance between B and the merged cluster using:
1. Single linkage method
2. Complete linkage method
9. Discuss clustering techniques for graph and network data.
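For question 8 above, one reading is that clusters A and C are merged and the distance from B to the merged cluster is required. Under that assumption, single linkage takes the minimum of the two distances and complete linkage takes the maximum. A minimal sketch (the function name `merged_distance` is hypothetical):

```python
def merged_distance(d_first, d_second, linkage):
    """Distance from a third cluster to the union of two merged clusters.

    d_first and d_second are the third cluster's distances to each of the
    two clusters before the merge.
    """
    if linkage == "single":
        return min(d_first, d_second)   # closest pair of points
    if linkage == "complete":
        return max(d_first, d_second)   # farthest pair of points
    raise ValueError(f"unknown linkage: {linkage}")

print(merged_distance(5, 7, "single"))    # 5
print(merged_distance(5, 7, "complete"))  # 7
```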
Chapter 6: Web Mining and Other Data Mining Techniques
1. What is web mining? Explain its different types.
2. Discuss web content mining, web usage mining, and web structure mining.
3. Explain the structure and issues related to web logs.
4. What is spatial data mining? How is it different from temporal mining?
5. Describe the concepts of multimedia mining.
6. What are the applications of distributed and parallel data mining?
7. A website has the following clickstream data:
Page A → Page B (50 clicks)
Page B → Page C (30 clicks)
Page C → Page A (20 clicks)
Compute the transition probability matrix for web usage mining.
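For the clickstream question above, a first-order Markov reading gives each transition probability as the click count from a source page to a target page, divided by all clicks leaving that source page. The sketch below assumes that row-normalized definition and that the three listed transitions are the only ones observed. The function name is hypothetical:

```python
pages = ["A", "B", "C"]
clicks = {("A", "B"): 50, ("B", "C"): 30, ("C", "A"): 20}

def transition_matrix(pages, clicks):
    """Row-normalize click counts into transition probabilities."""
    matrix = []
    for src in pages:
        row_counts = [clicks.get((src, dst), 0) for dst in pages]
        total = sum(row_counts)
        # A page with no outgoing clicks gets an all-zero row.
        matrix.append([count / total if total else 0.0 for count in row_counts])
    return matrix

for src, row in zip(pages, transition_matrix(pages, clicks)):
    print(src, row)
# A [0.0, 1.0, 0.0]
# B [0.0, 0.0, 1.0]
# C [1.0, 0.0, 0.0]
```

Because each page has only one observed outgoing link, every nonzero probability is 1. Dividing by the total of 100 clicks instead would give the share of traffic on each link (0.5, 0.3, 0.2), which is a different quantity from the transition matrix.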