C 1100CDT3051222011100IET301122101 Pages: 3
Reg No.:_______________ Name:__________________________
APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY
Fifth Semester B.Tech Degree Regular and Supplementary Examination December 2022 (2019 Scheme)
Course Code: CDT 305
Course Name: DATA ANALYTICS
Max. Marks: 100 Duration: 3 Hours
PART A
(Answer all questions; each question carries 3 marks) Marks
1 How is a data warehouse different from a database? (3)
2 Explain the following attribute types with examples: (3)
(i) Ordinal attribute
(ii) Nominal attribute
(iii) Numeric attribute
3 Explain any three methods for measuring the central tendency of data. (3)
4 Why do we need to pre-process the data? (3)
5 Compare rare patterns and negative patterns with the help of an example. (3)
6 Explain the join and prune actions in the Apriori algorithm. (3)
7 What is a dendrogram? How is a dendrogram constructed in agglomerative and (3)
divisive hierarchical clustering techniques?
8 Explain density-reachability and density-connectivity concepts in DBSCAN. (3)
9 What are stop words? How is a stop list created? (3)
10 What is a language model? (3)
PART B
(Answer one full question from each module, each question carries 14 marks)
Module -1
11 a) Compare roll-up and drill-down operations in OLAP with an example. (8)
b) Given two objects represented by the tuples (10, 8, 4, 12) and (20, 10, 16, 8): (6)
(i) Compute the Euclidean distance between the tuples.
(ii) Compute the Manhattan distance between the two objects.
12 a) Explain any four probability distributions. (8)
b) Explain the iterative analytics process model. (6)
Module -2
13 a) Explain covariance analysis of numeric attributes with an example. (9)
b) Suppose that the data for analysis includes the marks of students in a class. (5)
The mark values for the data tuples are (33, 37, 37, 35, 45, 36, 30, 42, 32, 31,
32, 28, 36, 31, 32, 34, 32, 32, 33, 44, 32, 36, 38, 40, 40, 36, 40, 41).
Construct the five-number summary for the dataset.
14 a) Explain any six strategies for data transformation. (6)
b) Explain any two methods for data normalization with examples. (8)
Module -3
15 a) Consider a database having nine transactions. Let min_sup=2 and (10)
min_conf=60%. Find the frequent itemsets in the database using the Apriori
algorithm.
TID ITEMS
TR1 jam, biscuit, chocolate
TR2 biscuit, butter
TR3 biscuit, milk
TR4 jam, biscuit, butter
TR5 jam, milk
TR6 biscuit, milk
TR7 jam, milk
TR8 jam, biscuit, milk, chocolate
TR9 jam, biscuit, milk
b) Explain any two key measures to quantify the strength of an association rule. (4)
Why is the Apriori algorithm slow?
16 a) Explain the pattern-growth approach for mining frequent itemsets with an (10)
example.
b) What is market basket analysis? (4)
Module -4
17 a) Explain decision tree induction with the help of an example. (8)
b) Explain any two attribute selection measures with an example. (6)
18 a) Given the following distance matrix, construct the dendrogram using (10)
agglomerative clustering with complete linkage and average linkage.
     A    B    C    D    E
A    0    8    4    6   10
B    8    0    6    5    9
C    4    6    0    8    2
D    6    5    8    0    7
E   10    9    2    7    0
b) How does the divisive hierarchical clustering method work? Explain any three (4)
challenges.
Module -5
19 a) Explain Boolean Retrieval with an example. (10)
b) Compare unigram and bigram language models. (4)
20 a) Explain tokenization, stemming and lemmatization with an example. (9)
b) What is case-folding? Explain with an example. (5)
***