Solved Paper - Data Mining & Warehousing
Q1(a) Framework of Data Warehouse:
A typical data warehouse has four components: operational databases (the data sources), the ETL
process, the data warehouse itself (staging, integration, and access layers), and front-end tools
for querying and reporting. [A diagram is usually required.]
Q1(b) Dimensional Model:
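A data structure technique optimized for querying and analysis in a data warehouse; it organizes
data into facts (measures) and dimensions (context). Design steps: 1. Choose the business process,
2. Declare the grain, 3. Identify the dimensions, 4. Identify the facts.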
Q1(c) OLTP vs OLAP:
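OLTP: Real-time transaction processing; high volume of short read/write operations on current, normalized data.
OLAP: Analytical processing; complex, read-mostly queries over historical data in a denormalized schema.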
Q1(d) EDW Parts:
1. Data Sources, 2. ETL Tools, 3. Staging Area, 4. Data Storage, 5. Metadata, 6. Query Tools.
Q1(e) Star Schema Example:
Fact Table: Transactions (measure: Amount; keys: Date, AccountID, plus a key to each dimension below)
Dimension Tables: Customer, Time, Branch, AccountType
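A minimal sketch of such a star schema, using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made for illustration; the paper only names the fact table and the four dimensions.

```python
import sqlite3

# Hypothetical banking star schema: one fact table keyed to four dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer     (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time         (time_id INTEGER PRIMARY KEY, date TEXT, month TEXT);
CREATE TABLE dim_branch       (branch_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_account_type (account_type_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE fact_transactions (
    customer_id     INTEGER REFERENCES dim_customer(customer_id),
    time_id         INTEGER REFERENCES dim_time(time_id),
    branch_id       INTEGER REFERENCES dim_branch(branch_id),
    account_type_id INTEGER REFERENCES dim_account_type(account_type_id),
    amount          REAL  -- the measure
);
INSERT INTO dim_customer VALUES (1, 'A'), (2, 'B');
INSERT INTO dim_time VALUES (1, '2024-01-05', '2024-01'), (2, '2024-02-10', '2024-02');
INSERT INTO dim_branch VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO dim_account_type VALUES (1, 'Savings'), (2, 'Current');
INSERT INTO fact_transactions VALUES
    (1, 1, 1, 1, 100.0),
    (2, 1, 2, 2, 250.0),
    (1, 2, 1, 1, 50.0);
""")

# Typical star-schema query: join the fact table to one dimension and aggregate.
rows = conn.execute("""
    SELECT b.city, SUM(f.amount)
    FROM fact_transactions AS f
    JOIN dim_branch AS b ON f.branch_id = b.branch_id
    GROUP BY b.city
    ORDER BY b.city
""").fetchall()
print(rows)
```

The fact table holds only keys and the measure; descriptive attributes such as city live in the dimension tables.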
Q1(f) Hybrid DW Model:
Combines the top-down (Inmon) and bottom-up (Kimball) approaches. Preferred when flexibility and
faster implementation are required.
Q2(a) Data to be mined:
Patterns, associations, clusters, outliers, predictive models.
Q2(b) Data Mining Metrics:
Support, Confidence, Lift, Accuracy, Precision, Recall, F-measure.
Q2(c) Statistical Description:
Includes measures of central tendency (mean, median), dispersion (variance, std deviation), and
distribution.
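A short sketch of these measures using Python's standard statistics module. The sample values are made up for illustration, and population (not sample) variance is assumed.

```python
import statistics

# Hypothetical sample of attribute values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)           # central tendency: arithmetic mean
median = statistics.median(values)       # central tendency: middle value
mode = statistics.mode(values)           # most frequent value
variance = statistics.pvariance(values)  # dispersion: population variance
std_dev = statistics.pstdev(values)      # dispersion: population std deviation

print(mean, median, mode, variance, std_dev)
```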
Q2(d) Need for Data Cleaning:
To remove noise, handle missing values, correct inconsistencies and improve data quality.
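As a small illustration, the sketch below fills a missing value with the mean and normalises inconsistent spellings. The records and the choice of mean imputation are assumptions; other strategies (median, dropping rows) are equally common.

```python
from statistics import mean

# Hypothetical raw records: one missing age and inconsistent city spellings.
records = [
    {"age": 25, "city": "pune "},
    {"age": None, "city": "Pune"},
    {"age": 35, "city": "PUNE"},
]

# Handle missing values: impute with the mean of the known ages.
known_ages = [r["age"] for r in records if r["age"] is not None]
fill_value = mean(known_ages)

cleaned = [
    {
        "age": r["age"] if r["age"] is not None else fill_value,
        # Correct inconsistencies: trim whitespace and normalise casing.
        "city": r["city"].strip().title(),
    }
    for r in records
]
print(cleaned)
```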
Q2(e) Apriori Algorithm:
Frequent itemsets: {3}, {5}, {2,5}, {1,3}
Rules example: {2}->{5}, Support=50%, Confidence=75%
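A minimal sketch of Apriori's level-wise search (join, prune, count). The baskets below are a hypothetical dataset, not the one from the question paper, so the itemsets it finds are unrelated to the answer above.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1 candidates: every single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count step: keep only candidates that meet the support threshold.
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent

# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```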
Q2(f) FP-Growth Tree:
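1. Scan the database and count the frequency of each item; discard infrequent items.
2. Order the items in each transaction by descending frequency.
3. Build the tree by inserting the ordered transactions one at a time, sharing common prefixes and incrementing node counts.
4. Extract frequent patterns from the tree via conditional pattern bases.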
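A minimal sketch of the counting, ordering, and tree-building steps, inserting one transaction at a time. The class name FPNode, the tie-break rule (alphabetical), and the sample transactions are assumptions; the pattern-extraction step is omitted for brevity.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Count item frequencies and keep only the frequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode(None)
    for t in transactions:
        # Order items by descending frequency (ties broken alphabetically).
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        # Insert the ordered transaction, sharing any existing prefix.
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root

# Hypothetical transactions.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["b"]]
root = build_fp_tree(transactions, min_count=2)
b = root.children["b"]
print(b.count, {item: child.count for item, child in b.children.items()})
```

Because frequent items are placed first, transactions that share them also share a path, which is what keeps the tree compact.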
Q3(a) Classification vs Prediction:
Classification predicts categorical (discrete) class labels; prediction forecasts continuous numeric values.
Q3(b) Linear Regression:
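Models the relationship between a predictor X and a response Y as a straight line, Y = aX + b, where a is the slope and b is the intercept. E.g., predicting sales based on advertising spend.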
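A minimal least-squares sketch for fitting a and b. The function name fit_line and the advertising/sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = aX + b with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: advertising spend (X) and sales (Y).
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
a, b = fit_line(ad_spend, sales)
predicted = a * 5 + b  # forecast sales for a spend of 5
print(a, b, predicted)
```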
Q3(c) Classifier Performance:
Metrics: Accuracy, Confusion Matrix, ROC Curve, Precision, Recall.
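A short sketch of how these metrics follow from the confusion matrix for a binary classifier. The labels below are invented, and class 1 is assumed to be the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }

# Hypothetical true labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```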
Q3(d) K-means Steps:
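1. Choose k and initialize k centroids.
2. Assign each point to its nearest centroid.
3. Update each centroid to the mean of its assigned points.
4. Repeat steps 2 and 3 until convergence (assignments or centroids stop changing).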
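A minimal pure-Python sketch of this loop. Taking the first k points as the initial centroids is a simplifying assumption made so the example is deterministic; random initialization is more usual. The points are hypothetical.

```python
def kmeans(points, k, max_iter=100):
    """Basic k-means on tuples of floats using squared Euclidean distance."""
    centroids = list(points[:k])  # assumption: first k points seed the centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        # Convergence: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```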
Q3(e) ID3 Algorithm:
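Builds a decision tree top-down, choosing at each node the attribute with the highest information
gain. Root: Age. Classification: each branch is then split on the best remaining attribute until
the leaves classify buys_computer.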
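A small sketch of the entropy and information-gain calculation that ID3 uses to pick the splitting attribute. The six rows below are a made-up toy table, not the dataset from the question, chosen so that age also comes out as the best first split.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy training data.
rows = [
    {"age": "youth",  "student": "no",  "buys_computer": "no"},
    {"age": "youth",  "student": "yes", "buys_computer": "no"},
    {"age": "middle", "student": "no",  "buys_computer": "yes"},
    {"age": "middle", "student": "yes", "buys_computer": "yes"},
    {"age": "senior", "student": "no",  "buys_computer": "no"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
]

gains = {a: information_gain(rows, a, "buys_computer") for a in ("age", "student")}
root_attribute = max(gains, key=gains.get)  # ID3 splits on the highest gain
print(gains, root_attribute)
```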
Q3(f) Clustering Applications:
Marketing, Insurance Fraud Detection, Document Categorization, Customer Segmentation.
Q4(a) Features of Data Warehouse:
Subject-oriented, Integrated, Time-variant, Non-volatile.
Q4(b) Attribute Types:
Nominal, Ordinal, Interval, Ratio.
Q4(c) Clustering Requirements:
Scalability, Ability to deal with noise, Interpretability, High dimensionality support.
Q4(d) Granularity of Facts:
The level of detail stored in the fact table. Fine granularity keeps detailed data (e.g., one row per transaction); coarse granularity stores summarized data (e.g., monthly totals).
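To illustrate, the sketch below rolls fine-grained facts up to a coarser grain. The dates and amounts are hypothetical, and dates are assumed to be ISO strings (YYYY-MM-DD).

```python
from collections import defaultdict

# Fine grain: one fact per transaction (hypothetical date and amount).
daily_facts = [
    ("2024-01-05", 100.0),
    ("2024-01-20", 50.0),
    ("2024-02-10", 250.0),
]

# Coarse grain: one fact per month, obtained by summing the detail rows.
monthly_facts = defaultdict(float)
for date, amount in daily_facts:
    month = date[:7]  # "YYYY-MM" prefix of the ISO date
    monthly_facts[month] += amount

print(dict(monthly_facts))
```

Detail can always be summarized to a coarser grain, but a warehouse that stores only the monthly totals cannot recover the individual transactions.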
Q4(e) Association Rule Metrics:
Support: Fraction of transactions that contain the itemset. Confidence: Likelihood of the consequent
given the antecedent, i.e. support(A and B) / support(A). Risk: Often linked with lift or leverage.
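A short sketch computing these metrics for a single rule A -> B, together with lift and leverage in their usual definitions. The baskets and the rule are hypothetical, and the antecedent and consequent are each assumed to occur in at least one transaction.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift, and leverage for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    sup_a = sum(a <= t for t in transactions) / n
    sup_c = sum(c <= t for t in transactions) / n
    sup_ac = sum((a | c) <= t for t in transactions) / n
    confidence = sup_ac / sup_a
    return {
        "support": sup_ac,
        "confidence": confidence,
        "lift": confidence / sup_c,           # above 1 means positive association
        "leverage": sup_ac - sup_a * sup_c,   # 0 means independence
    }

# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = rule_metrics(baskets, {"bread"}, {"milk"})
print(result)
```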
Q4(f) Classification Applications:
Spam Detection, Medical Diagnosis, Customer Churn, Credit Scoring.