DWM Assignment Ques

Uploaded by

yashshende208
CHAPTER 1

5 MARKS
SR. NO QUESTION

1 Difference between OLTP and OLAP.
YEAR: MAY_22, MAY_23, DEC_23

2 Every data structure in the data warehouse contains the time element. Why?
YEAR: DEC_22, DEC_24

3 What are the basic building blocks of a data warehouse?
YEAR: MAY_23

4 Difference between ER modeling and dimensional modeling.
YEAR: DEC_24

10 MARKS
SR. NO QUESTION

1 Suppose that a data warehouse consists of the three dimensions time, doctor, and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit.
i) Draw a star schema diagram for the above data warehouse.
ii) Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2010?
iii) To obtain the same list, write an SQL query assuming the data are stored in a relational database with the schema fee (day, month, year, doctor, hospital, patient, count, charge).
YEAR: DEC_19

2 The college wants to record the marks for the courses completed by students using the dimensions a) Course, b) Student, c) Time, and a measure of Aggregate marks. Create a cube and describe the following operations:
i) Roll up ii) Drill down iii) Slice iv) Dice
YEAR: MAY_22, DEC_23

3 Consider the quarterly sales of four companies C1, C2, C3, C4. The dimensions are:
a) Time
b) Shopping category (Mens, Womens, Electronics, Home)
c) Company
Create a cube and describe all five OLAP operations.
YEAR: DEC_22

4 For a supermarket chain, consider the dimensions Product, Store, Time, and Promotion. The schema contains the three facts units_sales, dollar_sales, and cost_dollar. Design a star schema and calculate the maximum number of base fact table records for the values given below:
Time period: 5 years
Stores: 300 reporting daily sales
Product: 40000 products in each store (about 4000 sell daily in each store)
Promotion: a sold item may be in only one promotion in a store on a given day.
YEAR: DEC_22, DEC_24

5 Differentiate between star schema and snowflake schema. Design a star schema for company sales with three dimensions such as Location, Item, and Time.
YEAR: MAY_23

6 What is dimensional modeling? Design the data warehouse dimensional model for a wholesale furniture company. The data warehouse has to analyze the company's situation at least with respect to the Furniture, Customer, and Time. Moreover, the company needs to analyze: the furniture with respect to its type, category, and material; the customer with respect to their spatial location, by considering at least cities, regions, and states. The company is interested in learning the quantity, income, and discount of its sales.
YEAR: DEC_23
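
For part iii) of the first 10-mark question in Chapter 1, one possible query is sketched below. The table and column names are taken from the schema in the question. The in-memory SQLite database, the helper name total_fee_by_doctor, and the sample rows are hypothetical and exist only so the query can be run.

```python
import sqlite3

# Total fee collected by each doctor in 2010, using the schema from the question:
# fee (day, month, year, doctor, hospital, patient, count, charge)
QUERY = """
SELECT doctor, SUM(charge) AS total_fee
FROM fee
WHERE year = 2010
GROUP BY doctor
ORDER BY doctor
"""


def total_fee_by_doctor(rows):
    """Load rows into a throwaway in-memory table and run QUERY (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fee (day INTEGER, month INTEGER, year INTEGER, "
        'doctor TEXT, hospital TEXT, patient TEXT, "count" INTEGER, charge REAL)'
    )
    conn.executemany("INSERT INTO fee VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Hypothetical sample rows, not part of the question.
SAMPLE_ROWS = [
    (5, 1, 2010, "Dr. A", "H1", "P1", 1, 100.0),
    (9, 3, 2010, "Dr. A", "H1", "P2", 1, 150.0),
    (2, 7, 2010, "Dr. B", "H2", "P3", 1, 200.0),
    (4, 2, 2011, "Dr. B", "H2", "P1", 1, 300.0),  # ignored: not in 2010
]

print(total_fee_by_doctor(SAMPLE_ROWS))
```

The WHERE clause corresponds to selecting the year 2010, and the GROUP BY on doctor aggregates away the day and patient detail, which mirrors the roll-up asked for in part ii).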
CHAPTER 2

5 MARKS
SR. NO QUESTION

1 Explain major issues in data mining.
YEAR: MAY_22, MAY_23, MAY_24

2 Applications of data mining.
YEAR: DEC_23

3 Short note on techniques of data loading.
YEAR: DEC_23

10 MARKS
SR. NO QUESTION

1 Describe the steps involved in data mining when viewed as a process of knowledge discovery.
YEAR: DEC_19, MAY_23

2 Develop a model to predict the salary of college graduates with 10 years of work experience using linear regression.
YEAR: DEC_19

3 Suppose that the data for analysis includes the attribute salary. We have the following values for salary (in thousands of dollars), shown in increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110
i) What are the mean, median, mode, and midrange of the data?
ii) Find the first quartile (Q1) and the third quartile (Q3) of the data.
iii) Show a boxplot of the data.
YEAR: DEC_19

4 Discuss the different steps involved in data processing.
YEAR: MAY_22, DEC_23

5 Discuss the different types of attributes.
YEAR: DEC_22, DEC_24

6 Discuss different data visualization techniques.
YEAR: MAY_23, DEC_23

7 Explain the KDD process with a neat diagram. Also state any five applications of data mining.
YEAR: DEC_23, MAY_24
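
The summary statistics asked for in the Chapter 2 salary question can be checked with a short script. The sketch below uses the twelve salary values from the question. It assumes the "median of each half" convention for Q1 and Q3; other quartile conventions exist and give slightly different values. The function name describe is illustrative.

```python
import statistics

# Salary values (in thousands of dollars) from the question, already sorted.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]


def describe(values):
    """Return basic summary statistics; quartiles are medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd number of values, leave the middle value out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # the data may have more than one mode
        "midrange": (data[0] + data[-1]) / 2,
        "q1": statistics.median(lower),
        "q3": statistics.median(upper),
        "min": data[0],
        "max": data[-1],
    }


summary = describe(salaries)
print(summary)
```

The min, q1, median, q3, and max entries together form the five-number summary that a boxplot is drawn from.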
CHAPTER 3

5 MARKS
SR. NO QUESTION

1 What are the various methods for estimating a classifier's accuracy?
YEAR: MAY_22, DEC_22, DEC_23

2 What are the various issues regarding classification and prediction?
YEAR: MAY_22

3 Explain the Holdout and Random subsampling methods to evaluate the accuracy of a classifier.
YEAR: DEC_24

10 MARKS
SR. NO QUESTION

1 Why is tree pruning useful in decision tree induction? What is a drawback of using a separate set of tuples to evaluate pruning?
YEAR: DEC_19

2 Apply the Naive Bayes classifier to classify the tuple <Red, SUV, Domestic> for the given dataset below.
YEAR: DEC_22

3 Explain the decision tree based classification approach with an example. Discuss metrics for evaluating classifier performance.
YEAR: MAY_23

4 A data sample is given below. Find whether Patient X has flu or not using the Naive Bayes classifier.
If X = (chills=Y, runny nose=N, headache=Mild, fever=Y, flu=?)
YEAR: DEC_23

5 Describe in detail how to evaluate the accuracy of a classifier.
YEAR: MAY_24

6 A company wants to predict whether a customer will subscribe to a premium membership based on their demographic and browsing behavior data. The dataset contains information about customers, including age, gender, income, browsing time, and subscription status.
YEAR: MAY_24

7 Given the training data for height classification, classify the tuple t = <Rohit, M, 1.95> using Naïve Bayes classification.
YEAR: DEC_24
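
As a rough illustration of the Holdout and Random subsampling methods named in the Chapter 3 questions, the sketch below splits a labelled dataset once (holdout) and then repeats that split k times and averages the accuracies (random subsampling). The majority-class classifier, the 2/3 training fraction, the function names, and the toy data are all assumptions chosen to keep the example self-contained; they are not taken from the question bank.

```python
import random
from collections import Counter


def majority_class_accuracy(train, test):
    """Fit a trivial majority-class classifier on train and score it on test."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    correct = sum(1 for _, label in test if label == majority)
    return correct / len(test)


def holdout(data, evaluate, train_fraction=2 / 3, rng=None):
    """Holdout: shuffle once, train on one part, test on the remainder."""
    if rng is None:
        rng = random.Random()
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return evaluate(shuffled[:cut], shuffled[cut:])


def random_subsampling(data, evaluate, k=10, train_fraction=2 / 3, seed=0):
    """Random subsampling: repeat the holdout split k times and average."""
    rng = random.Random(seed)
    scores = [holdout(data, evaluate, train_fraction, rng) for _ in range(k)]
    return sum(scores) / k


# Hypothetical toy dataset of (feature, label) pairs.
toy = [(i, "yes" if i % 3 else "no") for i in range(30)]

print(holdout(toy, majority_class_accuracy, rng=random.Random(0)))
print(random_subsampling(toy, majority_class_accuracy, k=10, seed=0))
```

Any function that takes a training list and a test list and returns an accuracy can be passed in place of majority_class_accuracy.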
CHAPTER 4

5 MARKS
SR. NO QUESTION

1 Explain the K-means clustering algorithm and draw a flowchart.
YEAR: DEC_23

2 Explain the FP-Growth algorithm.
YEAR: DEC_24

10 MARKS
SR. NO QUESTION

1 Show the dendrogram created by the complete link clustering algorithm for the given set of points.
YEAR: DEC_19

2 The table below shows the six data points. Apply agglomerative clustering to find clusters. Use the Euclidean distance measure. Consider single linkage.
YEAR: MAY_22

3 Suppose that the data mining task is to cluster the following points into 3 clusters: A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9). The distance function is Euclidean distance. Suppose we initially assign A1, B1, C1 as the center of each cluster respectively. Use the k-means algorithm to show only
a) The three cluster centers after the first round of execution
b) The final three clusters
YEAR: DEC_22

4 Use the agglomerative algorithm on the following data and plot a dendrogram using link approach. The following figure contains sample data items indicating the distance between the elements.
YEAR: DEC_22

5 Explain the K-means clustering algorithm. Discuss its advantages and limitations. Apply the K-means algorithm to the following data set with 3 clusters.
Data set = {2,3,6,8,9,12,15,18,22}
YEAR: MAY_23

6 Consider the data given below. Create the adjacency matrix. Apply the complete link algorithm to cluster the given data set and draw the dendrogram.
YEAR: MAY_23

7 The following table gives the fat and protein content of items. Apply single linkage clustering and construct the dendrogram.
YEAR: MAY_24

8 Consider four objects with two attributes (X and Y). These four objects are to be grouped together into two clusters using the k-means clustering algorithm. Following are the objects with their attribute values.
YEAR: DEC_24
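
The Chapter 4 k-means question with the eight points A1 to C2 gives all the data needed to trace the algorithm. The sketch below uses those points and the initial centers A1, B1, C1 from the question; the function names are illustrative, and the code assumes no cluster ever becomes empty (true for this data).

```python
import math

# Points and initial centers as given in the question.
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}
INITIAL_CENTERS = [POINTS["A1"], POINTS["B1"], POINTS["C1"]]


def assign(points, centers):
    """Put every point into the cluster of its nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(name)
    return clusters


def recompute(points, clusters):
    """Move each center to the mean of its members (assumes no empty cluster)."""
    centers = []
    for members in clusters:
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centers


def kmeans(points, centers):
    """Iterate until assignments stop changing; return center history and final clusters."""
    history, clusters = [], None
    while True:
        new_clusters = assign(points, centers)
        if new_clusters == clusters:
            return history, clusters
        clusters = new_clusters
        centers = recompute(points, clusters)
        history.append(centers)


history, final_clusters = kmeans(POINTS, INITIAL_CENTERS)
print("Centers after round 1:", history[0])
print("Final clusters:", final_clusters)
```

history[0] answers part a) and final_clusters answers part b); the later entries of history show how the centers move in the intermediate rounds.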
CHAPTER 5

5 MARKS
SR. NO QUESTION

1 Elucidate market basket analysis with an example.
YEAR: DEC_19, DEC_22, MAY_24

10 MARKS
SR. NO QUESTION

1 Consider a transaction database given below. Use the Apriori algorithm with min-support count = 2 and min-confidence = 60% to find all frequent itemsets and strong association rules.
YEAR: DEC_19

2 Demonstrate multidimensional and multilevel association rule mining with suitable examples.
YEAR: DEC_19, MAY_23, DEC_23, MAY_24, DEC_24

3 A database has four transactions. Let min sup = 60% and min conf = 80%. Find all the frequent itemsets using the Apriori algorithm and also list all the strong association rules.
YEAR: MAY_22

4 A database has five transactions. Let minimum support = 3. Find all frequent itemsets using the FP-growth algorithm.
YEAR: DEC_22

5 Apply the Apriori algorithm on the following dataset to find strong association rules, with minimum support threshold (s = 33.33%) and minimum confidence threshold (c = 60%).
YEAR: DEC_22

6 Use the Apriori algorithm with min-support count = 2 and min-confidence = 60% to find all frequent itemsets and strong association rules.
YEAR: MAY_23

7 Consider the following transaction database with minimum support 50% and minimum confidence 66%. Find the frequent patterns and strong association rules.
YEAR: DEC_23

8 For the table given, perform the Apriori algorithm and show the frequent itemsets and strong association rules. Assume minimum support of 30% and minimum confidence of 70%.
YEAR: MAY_24

9 Given the following data, apply the Apriori algorithm. Find the frequent itemsets and strong association rules. Given support threshold = 50%, confidence = 60%.
YEAR: DEC_24
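
The transaction tables for the Chapter 5 Apriori questions are not reproduced in this document, so the sketch below runs on a small hypothetical database instead. It uses min-support count = 2 and min-confidence = 60%, the thresholds stated in two of the questions. The function names are illustrative, and this is a compact teaching version rather than an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]  # candidate 1-itemsets
    frequent, k = {}, 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        ]
    return frequent


def strong_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]  # subsets of frequent sets are frequent
                if confidence >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules


# Hypothetical transactions, not taken from any question in this document.
DB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]

frequent = apriori(DB, min_count=2)
rules = strong_rules(frequent, min_conf=0.6)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

For questions that state support as a percentage, min_count can be derived by multiplying the percentage by the number of transactions and rounding up.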
CHAPTER 6

5 MARKS
SR. NO QUESTION

1 Explain Web usage mining in detail.
YEAR: MAY_22, DEC_23, MAY_24

2 Explain Page Rank techniques in detail.
YEAR: MAY_23

3 Short note on Web content mining.
YEAR: DEC_23

4 Discuss different applications of Web mining.
YEAR: DEC_24

10 MARKS
SR. NO QUESTION

1 What is spatial data? Explain CLARANS Extension.
YEAR: DEC_19, DEC_22

2 What is Web structure mining? List the approaches used to structure the web pages to improve the effectiveness of search engines and crawlers. Explain the Page Rank techniques in detail.
YEAR: DEC_19, DEC_22

3 Is Web mining different from classical data mining? Justify your answer. Describe the types of Web mining.
YEAR: DEC_22

4 What is Web mining? Explain Web structure mining and Web usage mining in detail.
YEAR: MAY_23

5 Explain the Page Rank algorithm with an example.
YEAR: DEC_23, MAY_24, DEC_24

6 What is Web mining? Differentiate between Web mining and data mining. Explain the types of Web mining.
YEAR: DEC_24
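
For the Chapter 6 Page Rank questions that ask for an example, a small iterative computation can be sketched as follows. The three-page link graph, the damping factor of 0.85, and the function name pagerank are assumptions made for illustration. The code uses the normalized form in which the ranks sum to 1, and it assumes every linked page also appears as a key in the graph.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Iteratively compute PageRank for a dict {page: [pages it links to]}."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page shares its damped rank equally among its out-links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its damped rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        if max(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links to A.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

ranks = pagerank(GRAPH)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this graph, C receives links from both A and B and ends up with the highest rank, while B, which is linked only from A, ends up with the lowest.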
