Solved Data Mining Warehousing Paper

The document outlines key concepts in Data Mining and Warehousing, including the framework of a data warehouse, dimensional modeling, and the differences between OLTP and OLAP systems. It also discusses data mining techniques, metrics, and algorithms such as Apriori and FP-Growth, as well as classification and clustering methods. Additionally, it highlights the features of data warehouses and the requirements for effective clustering.

Uploaded by

Muskan Dhondney

Solved Paper - Data Mining & Warehousing

Q1(a) Framework of Data Warehouse:

A typical data warehouse has four components: Operational Database, ETL Process, Data Warehouse (staging, integration, access layers), and front-end tools. [Diagram is usually required].

Q1(b) Dimensional Model:

A data structure technique optimized for storing and querying data in a data warehouse, built from fact and dimension tables. Design steps: choose the business process, declare the grain, identify the dimensions, identify the facts.

Q1(c) OLTP vs OLAP:

OLTP: Real-time, high volume, normalized data.

OLAP: Analytical, historical data, denormalized schema.

Q1(d) EDW Parts:

1. Data Sources, 2. ETL Tools, 3. Staging Area, 4. Data Storage, 5. Metadata, 6. Query Tools.

Q1(e) Star Schema Example:

Fact Table: Transactions (Amount, Date, AccountID)

Dimension Tables: Customer, Time, Branch, AccountType
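
A minimal sketch of this star schema, using Python's built-in sqlite3 module. The table prefixes (dim_, fact_), the key columns, and the sample rows are assumptions made for illustration; only the Amount measure and the four dimension names come from the answer above.

```python
import sqlite3

# In-memory database. Key columns and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer     (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time         (time_id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE dim_branch       (branch_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_account_type (account_type_id INTEGER PRIMARY KEY, label TEXT);

-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE fact_transactions (
    amount          REAL,
    customer_id     INTEGER REFERENCES dim_customer(customer_id),
    time_id         INTEGER REFERENCES dim_time(time_id),
    branch_id       INTEGER REFERENCES dim_branch(branch_id),
    account_type_id INTEGER REFERENCES dim_account_type(account_type_id)
);

INSERT INTO dim_branch VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_transactions VALUES
    (100.0, NULL, NULL, 1, NULL),
    (250.0, NULL, NULL, 1, NULL),
    (75.0,  NULL, NULL, 2, NULL);
""")

# Typical star-schema query: join the fact table to a dimension and aggregate.
totals_by_city = dict(conn.execute("""
    SELECT b.city, SUM(f.amount)
    FROM fact_transactions AS f
    JOIN dim_branch AS b ON b.branch_id = f.branch_id
    GROUP BY b.city
""").fetchall())
```

The single fact table sits in the centre and each dimension is reached with one join, which is what gives the schema its star shape.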

Q1(f) Hybrid DW Model:

Combines the top-down and bottom-up approaches. Preferred when flexibility and faster implementation are required.

Q2(a) Data to be mined:

Patterns, associations, clusters, outliers, predictive models.


Q2(b) Data Mining Metrics:

Support, Confidence, Lift, Accuracy, Precision, Recall, F-measure.

Q2(c) Statistical Description:

Includes measures of central tendency (mean, median), dispersion (variance, std deviation), and distribution.
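
A short illustration of these measures with Python's standard statistics module. The sample values are made up, and the population forms of variance and standard deviation are used here as an assumption; an exam may expect the sample (n - 1) forms instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Central tendency
mean = statistics.mean(values)
median = statistics.median(values)

# Dispersion (population variance and standard deviation)
variance = statistics.pvariance(values)
std_dev = statistics.pstdev(values)
```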

Q2(d) Need for Data Cleaning:

To remove noise, handle missing values, correct inconsistencies and improve data quality.
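
A rough sketch of the three cleaning tasks named above, in plain Python. The records, the field names, the age limit of 120, and the choice of median imputation are all assumptions for illustration, not a prescribed method.

```python
import statistics

# Hypothetical raw records with inconsistent labels, a missing value, and noise.
records = [
    {"city": "Delhi ", "age": 34},
    {"city": "delhi",  "age": None},  # missing value
    {"city": "PUNE",   "age": 29},
    {"city": "Pune",   "age": 420},   # noise: implausible age
]

def clean(rows, max_age=120):
    # Correct inconsistencies: unify spacing and letter case.
    rows = [{**r, "city": r["city"].strip().title()} for r in rows]
    # Remove noise: treat implausible ages as missing.
    rows = [
        {**r, "age": r["age"] if r["age"] is not None and r["age"] <= max_age else None}
        for r in rows
    ]
    # Handle missing values: fill with the median of the valid ages.
    fill = statistics.median(r["age"] for r in rows if r["age"] is not None)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in rows]

cleaned = clean(records)
```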

Q2(e) Apriori Algorithm:

Frequent itemsets: {3}, {5}, {2,5}, {1,3}

Rules example: {2}->{5}, Support=50%, Confidence=75%
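
The transaction table for this question is not reproduced here, so the figures above cannot be re-derived. As a hedged sketch of how such figures are obtained, the code below runs a small level-wise Apriori on a made-up four-transaction table; its results will differ from the exam's numbers. The function names and the data are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets (level-wise search)."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c) >= min_support]
    return frequent

# Hypothetical transactions, not the exam's table.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(transactions, min_support=0.5)

# Confidence of {2} -> {5} is support({2, 5}) / support({2}).
confidence = frequent[frozenset({2, 5})] / frequent[frozenset({2})]
```

Confidence is always the support of the whole rule divided by the support of the antecedent, so a quoted support and confidence pair can be checked against the original table in the same way.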

Q2(f) FP-Growth Tree:

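1. Count the frequency of each item and drop items below minimum support.

2. Order the items in each transaction by descending frequency.

3. Build the tree by inserting each ordered transaction, sharing common prefixes.

4. Extract patterns from the tree using conditional pattern bases.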
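
Steps 1 to 3 can be sketched as follows; step 4 (mining the conditional pattern bases) is left out to keep the example short. The Node class, the function name, and the transactions are assumptions for illustration.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, a count, and child nodes keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Step 1: count item frequencies and keep only the frequent items.
    counts = Counter(i for t in transactions for i in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = Node(None)
    for t in transactions:
        # Step 2: order items by descending frequency (ties broken by item name).
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        # Step 3: insert the ordered transaction; shared prefixes reuse nodes.
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent

# Hypothetical transactions, not taken from the exam paper.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
                ["a", "d", "e"], ["a", "b", "c"]]
root, frequent = build_fp_tree(transactions, min_count=2)
```

Because four of the five transactions start with "a" after ordering, they share a single "a" node under the root, which is where the tree's compression comes from.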

Q3(a) Classification vs Prediction:

Classification predicts categorical labels; prediction forecasts continuous values.

Q3(b) Linear Regression:

Models relationship as Y = aX + b. E.g., Predicting sales based on advertising spend.
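
A small sketch of fitting Y = aX + b by ordinary least squares, where a = Sxy / Sxx and b = mean(Y) - a * mean(X). The advertising and sales figures are invented so that the fit is exact; fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: sales happen to equal 2 * ad_spend + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(ad_spend, sales)
predicted_sales = a * 5.0 + b  # forecast for an ad spend of 5.0
```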


Q3(c) Classifier Performance:

Metrics: Accuracy, Confusion Matrix, ROC Curve, Precision, Recall.
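
As an illustration, the confusion-matrix counts and the metrics derived from them can be computed like this for a binary classifier. The labels and predictions are made up, and evaluate is a hypothetical helper; the ROC curve is not covered because it needs predicted scores rather than hard labels.

```python
def evaluate(y_true, y_pred, positive=1):
    """Confusion-matrix counts and derived metrics for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical true labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
report = evaluate(y_true, y_pred)
```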

Q3(d) K-means Steps:

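1. Choose k and pick k initial centroids.

2. Assign each point to its nearest centroid.

3. Update each centroid to the mean of its assigned points.

4. Repeat steps 2 and 3 until the assignments stop changing.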
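
The steps above can be sketched in plain Python as follows. The points and the starting centroids are invented, and passing the initial centroids in explicitly (rather than choosing them at random) is an assumption made to keep the run deterministic.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Plain k-means; k is the number of initial centroids passed in (step 1)."""
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        # Step 4: stop once the centroids no longer move.
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (9.0, 8.0)])
```

The result depends on the starting centroids, which is why practical implementations usually run several random initialisations and keep the best one.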

Q3(e) ID3 Algorithm:

Builds the decision tree top-down, splitting each node on the attribute with the highest information gain. Root: Age. Classification: Uses the best attribute at each level to classify buys_computer.
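
A hedged sketch of the information-gain calculation that ID3 uses to pick a split. The exam's table is not reproduced here; the class counts below assume the familiar 14-row buys_computer textbook example (youth: 2 yes, 3 no; middle_aged: 4 yes; senior: 3 yes, 2 no), and the exam's data may differ.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def make_rows(age, label, count):
    return [{"age": age, "buys_computer": label} for _ in range(count)]

# Assumed class counts per age group (see the note above).
rows = (make_rows("youth", "yes", 2) + make_rows("youth", "no", 3)
        + make_rows("middle_aged", "yes", 4)
        + make_rows("senior", "yes", 3) + make_rows("senior", "no", 2))

gain_age = information_gain(rows, "age", "buys_computer")
```

ID3 computes this gain for every candidate attribute and places the one with the largest value at the node; under the assumed counts the gain for age comes to roughly 0.247 bits.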

Q3(f) Clustering Applications:

Marketing, Insurance Fraud Detection, Document Categorization, Customer Segmentation.

Q4(a) Features of Data Warehouse:

Subject-oriented, Integrated, Time-variant, Non-volatile.

Q4(b) Attribute Types:

Nominal, Ordinal, Interval, Ratio.

Q4(c) Clustering Requirements:

Scalability, Ability to deal with noise, Interpretability, High dimensionality support.

Q4(d) Granularity of Facts:

Level of detail. Fine granularity gives detailed data. Coarse granularity is summarized.

Q4(e) Association Rule Metrics:

Support: Frequency of itemset. Confidence: Likelihood of consequent given antecedent. Risk: Often linked with lift or leverage.
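
For reference, a small sketch computing these metrics for a rule A -> B, with lift = confidence / support(B) and leverage = support(A and B) - support(A) * support(B). The transactions and the rule_metrics helper are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift and leverage of the rule antecedent -> consequent."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    supp_a = support(antecedent)
    supp_b = support(consequent)
    supp_ab = support(antecedent | consequent)
    confidence = supp_ab / supp_a
    return {
        "support": supp_ab,
        "confidence": confidence,
        "lift": confidence / supp_b,             # > 1 means positive association
        "leverage": supp_ab - supp_a * supp_b,   # 0 means independence
    }

# Hypothetical transactions.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
metrics = rule_metrics(transactions, antecedent={2}, consequent={5})
```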

Q4(f) Classification Applications:

Spam Detection, Medical Diagnosis, Customer Churn, Credit Scoring.
