Noida Institute of Engineering and Technology, Greater Noida

This document contains the questions for a Data Analytics exam: seven questions spread across three sections (A, B and C). It includes questions on topics like supervised vs. unsupervised learning, the five V's of big data, data science skills and roles, moments and outliers in data streams, linear vs. logistic regression, defuzzification methods, advantages of R over Python, K-means clustering, and distance measures. Students must answer questions ranging from brief explanations to longer examples and calculations. The questions cover fundamental concepts in data analytics as well as specific techniques like Bloom filters, probability, fuzzy logic, machine learning limitations, support vector machines, clustering vs. classification, time series analysis, the Apriori algorithm, Naive Bayes, Apache Hadoop, and more.

Uploaded by

Azeem Khan

Noida Institute of Engineering and Technology, Greater Noida

Printed Page:          Sub Code: KCS051

Paper Id:              Roll No.:

Pre-University Test (Online)

B. TECH.
(Semester 5) THEORY EXAMINATION 2020-21
Sub Name: DATA ANALYTICS

Time: 3 Hours          Total Marks: 100


Note: 1. Attempt all sections. If any required data is missing, choose it suitably.

SECTION-A
1. Attempt all questions in brief. 2 x 10 = 20
Q.No. Question Marks CO
a. Distinguish between supervised and unsupervised learning with an example. 2 CO2
b. Elaborate on the five V's of Big Data and present a suitable example. 2 CO2
c. Discuss the skill sets required to become a data scientist and explain the associated job roles. 2 CO1
d. What is meant by the kth moment of a data stream? Compute the surprise number (second moment) of the stream 3, 1, 4, 1, 3, 4, 2, 1, 2. 2 CO3
e. What is the difference between linear and logistic regression? 2 CO2
f. Explain the defuzzification process using at least two different methods, with an example. 2 CO4
g. Present the advantages of R over Python. 2 CO5
h. Assume a user wants to cluster 7 observations into 3 clusters using the K-means clustering algorithm. After the first iteration, the clusters C1, C2, C3 contain the following observations: 2 CO5
C1: {(1,1), (4,4), (7,7)}
C2: {(0,4), (4,0)}
C3: {(5,5), (9,9)}
What will the cluster centroids be if the user proceeds to a second iteration?
i. List the various types of distance measures used in clustering, with suitable examples. 2 CO3
j. How do we find outliers in a data set with respect to each feature in R? 2 CO5
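
The two short numerical parts above (d and h) can be checked with a small Python sketch. The function names `second_moment` and `centroid` are illustrative assumptions, not anything prescribed by the paper. The second moment is taken as the sum of squared occurrence counts of the distinct elements, and each new centroid as the coordinate-wise mean of the points currently assigned to the cluster.

```python
from collections import Counter


def second_moment(stream):
    """Surprise number: sum of squared occurrence counts of each distinct element."""
    return sum(count * count for count in Counter(stream).values())


def centroid(points):
    """Coordinate-wise mean of a list of equally sized points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


# Part d: the stream given in the question.
stream = [3, 1, 4, 1, 3, 4, 2, 1, 2]
surprise = second_moment(stream)

# Part h: cluster memberships after the first iteration.
clusters = {
    "C1": [(1, 1), (4, 4), (7, 7)],
    "C2": [(0, 4), (4, 0)],
    "C3": [(5, 5), (9, 9)],
}
centroids = {name: centroid(points) for name, points in clusters.items()}

print(surprise)   # 21
print(centroids)  # C1 -> (4.0, 4.0), C2 -> (2.0, 2.0), C3 -> (7.0, 7.0)
```

In the stream, 1 occurs three times and 2, 3 and 4 occur twice each, so the surprise number is 9 + 4 + 4 + 4 = 21.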

SECTION-B
2. Attempt any three of the following: 3 x 10 = 30
Q.No. Question Marks CO
a. Explain each phase of the data analytics life cycle and present it with a neat diagram. 10 CO1
b. Illustrate the working of a Bloom filter with an example. 10 CO1
c. A fair coin is tossed twice. What is the probability that both tosses result in heads, given that at least one of the tosses resulted in heads? 10 CO2

d. With respect to fuzzy logic, explain the following terms with a diagram and an appropriate example: 10 CO2
1. Core
2. Support
3. Boundary
4. Crossover point
5. Height
e. What are the limitations of machine learning? How does deep learning overcome them? Explain the perceptron learning algorithm with a neat diagram and terminology. 10 CO2
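
For part b above, a Bloom filter can be illustrated with a minimal Python sketch. The class name `BloomFilter`, the 64-bit array, the three hash functions, and the use of salted SHA-256 digests are all assumptions made for illustration; the question does not prescribe any of them.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: a bit array plus k hash positions per item."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index
        # (an illustrative choice, not a required hash family).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        # Set every bit that the item hashes to.
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["hadoop", "hive", "pig"]:
    bf.add(word)

print(bf.might_contain("hive"))   # True: inserted items are always reported
print(bf.might_contain("spark"))  # usually False, but a false positive is possible
```

The filter never reports a false negative, because `add` sets every bit that `might_contain` later checks. False positives arise when the bits of an absent item happen to have been set by other items.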

SECTION-C

3. Attempt any one part of the following: 1 x 10 = 10


Q.No. Question Marks CO
a. What is Big Data? Why do we need to analyze Big Data? List the characteristics of Big Data and the challenges in handling it. 10 CO5
b. Justify why the support vector machine is effective on high-dimensional data and discuss the polynomial kernel function for multiple classes. 10 CO2

4. Attempt any one part of the following: 1 x 10 = 10


Q.No. Question Marks CO
a. What is the difference between clustering and classification? Explain the K-means clustering algorithm stepwise. 10 CO3
b. A diagnostic test is conducted on 960 patients to detect a disorder with a prevalence rate of 6.25% in the population. Assume that the test has a specificity of 83.33%. How many people are incorrectly identified as having the disease? 10 CO2
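
The arithmetic for part b above can be sketched in Python. This assumes the 960 patients reflect the population prevalence exactly, and that 83.33% is meant as the fraction 5/6; both are assumptions not stated in the question. The variable names are illustrative.

```python
from fractions import Fraction

total = 960
prevalence = Fraction(625, 10000)  # 6.25% = 1/16
specificity = Fraction(5, 6)       # assumption: 83.33% is read as 5/6

diseased = total * prevalence              # patients who have the disorder
healthy = total - diseased                 # patients who do not
true_negatives = healthy * specificity     # healthy patients correctly cleared
false_positives = healthy - true_negatives # healthy patients flagged as diseased

print(diseased, healthy, true_negatives, false_positives)  # 60 900 750 150
```

Specificity only concerns the 900 healthy patients: 5/6 of them (750) test negative, so the remaining 150 are incorrectly identified as having the disease.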

5. Attempt any one part of the following: 1 x 10 = 10


Q.No. Question Marks CO
a. Perform agglomerative clustering using single linkage on the following data set and draw the dendrogram. 10 CO3
Distance  A  B  C  D  E
A         0  5  2  3  1
B         5  0  1  3  2
C         2  1  0  6  2
D         3  3  6  0  3
E         1  2  2  3  0
b. What is the concept of a data stream? How do we find the unique elements in a continuous stream? Explain the steps of the Flajolet-Martin algorithm with an appropriate example. 10 CO4
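
The merge order for the single-linkage question in part a above can be traced with a small Python sketch written from scratch, with no clustering library. The function names and the tie-breaking rule (the first minimal pair encountered is merged) are assumptions; the distance matrix has two pairs tied at distance 1, so a different tie-break would swap the first two merges without changing the merge heights.

```python
from itertools import combinations

labels = ["A", "B", "C", "D", "E"]
matrix = [
    [0, 5, 2, 3, 1],
    [5, 0, 1, 3, 2],
    [2, 1, 0, 6, 2],
    [3, 3, 6, 0, 3],
    [1, 2, 2, 3, 0],
]
# Pairwise distances keyed by an unordered pair of labels.
dist = {
    frozenset((labels[i], labels[j])): matrix[i][j]
    for i in range(len(labels))
    for j in range(i + 1, len(labels))
}


def linkage_gap(pair):
    """Single linkage: smallest distance between members of two clusters."""
    a, b = pair
    return min(dist[frozenset((x, y))] for x in a for y in b)


def single_linkage(names):
    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters (first one found on ties).
        a, b = min(combinations(clusters, 2), key=linkage_gap)
        merges.append((sorted(a), sorted(b), linkage_gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


merges = single_linkage(labels)
for left, right, height in merges:
    print(left, right, height)
```

With this tie-break, A and E merge at height 1, B and C merge at height 1, the two pairs join at height 2, and D joins last at height 3. These heights are the levels at which the dendrogram branches meet.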

6. Attempt any one part of the following: 1 x 10 = 10

Q.No. Question Marks CO

a. Explain the concept of the Apriori algorithm. Solve the following numerical with minimum support count = 2. Generate the association rules with a confidence value of 60%. List the items that are frequently purchased together on the basis of the association rules. 10
T1: ITEM 1, ITEM 3, ITEM 4
T2: ITEM 2, ITEM 3, ITEM 5
T3: ITEM 1, ITEM 2, ITEM 3, ITEM 5
T4: ITEM 2, ITEM 5
T5: ITEM 1, ITEM 3, ITEM 5
b. Discuss the various components of time series analysis and explain the ARIMA model. 10
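
The Apriori numerical in part a above can be traced with a brute-force Python sketch. Items are written as the integers 1 to 5 as shorthand for ITEM 1 to ITEM 5. The function names are illustrative assumptions, and the 60% threshold is assumed to mean "confidence of at least 0.6".

```python
from itertools import combinations

transactions = [
    {1, 3, 4},     # T1
    {2, 3, 5},     # T2
    {1, 2, 3, 5},  # T3
    {2, 5},        # T4
    {1, 3, 5},     # T5
]
MIN_SUPPORT = 2       # minimum support count
MIN_CONFIDENCE = 0.6  # assumed to mean "at least 60%"


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori():
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= MIN_SUPPORT
        ]
    return frequent


def association_rules(frequent):
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules


frequent = apriori()
rules = association_rules(frequent)
for lhs, rhs, confidence in rules:
    print(lhs, "->", rhs, round(confidence, 2))
```

ITEM 4 is dropped at the first level (support count 1), and the pair {ITEM 1, ITEM 2} is dropped at the second. The largest frequent itemsets are {ITEM 1, ITEM 3, ITEM 5} and {ITEM 2, ITEM 3, ITEM 5}, each with support count 2.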

7. Attempt any one part of the following: 1 x 10 = 10


Q.No. Question Marks CO
a. With a neat diagram of the Apache Hadoop ecosystem, explain the following terms: 10 CO5
1. MapReduce job workflow with diagram
2. Hive
3. Apache Pig component
4. HDFS
5. HBase
b. Using a Naïve Bayes classifier, compute the probability that a red domestic SUV will be stolen or not. Write all computational steps. 10 CO2

Example no.  Colour  Type    Origin    Stolen
1            RED     SPORTS  DOMESTIC  YES
2            RED     SPORTS  DOMESTIC  NO
3            RED     SPORTS  DOMESTIC  YES
4            YELLOW  SPORTS  DOMESTIC  NO
5            YELLOW  SPORTS  IMPORTED  YES
6            YELLOW  SUV     IMPORTED  NO
7            YELLOW  SUV     IMPORTED  YES
8            YELLOW  SUV     DOMESTIC  NO
9            RED     SUV     IMPORTED  NO
10           RED     SPORTS  IMPORTED  YES
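
The Naïve Bayes computation for part b above can be sketched in Python with exact fractions. It assumes plain relative-frequency estimates with no smoothing; the question does not say whether smoothing is expected. The function name `class_score` is an illustrative assumption.

```python
from fractions import Fraction

# (colour, type, origin, stolen) for the ten examples in the table.
rows = [
    ("RED", "SPORTS", "DOMESTIC", "YES"),
    ("RED", "SPORTS", "DOMESTIC", "NO"),
    ("RED", "SPORTS", "DOMESTIC", "YES"),
    ("YELLOW", "SPORTS", "DOMESTIC", "NO"),
    ("YELLOW", "SPORTS", "IMPORTED", "YES"),
    ("YELLOW", "SUV", "IMPORTED", "NO"),
    ("YELLOW", "SUV", "IMPORTED", "YES"),
    ("YELLOW", "SUV", "DOMESTIC", "NO"),
    ("RED", "SUV", "IMPORTED", "NO"),
    ("RED", "SPORTS", "IMPORTED", "YES"),
]


def class_score(query, label):
    """P(label) times the product of P(feature value | label), unsmoothed."""
    subset = [r for r in rows if r[3] == label]
    score = Fraction(len(subset), len(rows))  # class prior
    for idx, value in enumerate(query):
        matches = sum(1 for r in subset if r[idx] == value)
        score *= Fraction(matches, len(subset))  # conditional likelihood
    return score


query = ("RED", "SUV", "DOMESTIC")
scores = {label: class_score(query, label) for label in ("YES", "NO")}
prediction = max(scores, key=scores.get)

print(scores)      # YES -> 3/125 (0.024), NO -> 9/125 (0.072)
print(prediction)  # NO
```

For YES the factors are 5/10, 3/5, 1/5 and 2/5; for NO they are 5/10, 2/5, 3/5 and 3/5. Since 0.072 > 0.024, the example is classified as not stolen. After normalisation the two scores become 0.25 and 0.75.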
