QUESTION BANK
BCA Semester-4th
SUBJECT: Introduction to DS
2 marks
1. What is the need for Data Pre-processing?
2. Define the term Data Mining.
3. Analyse the need for “Kernel” and “Hinge Loss” in a Support Vector Machine.
4. Write the formula for cosine similarity.
5. What is the need for unsupervised learning?
6. What are the steps involved in Data Pre-processing?
7. What are the measures used to evaluate association rules in pattern mining?
8. Name the different data cleaning techniques.
9. What is High-Utility Pattern Mining?
10. How many types of Machine learning algorithms are there?
11. List various applications of data mining.
12. Why is cluster analysis required?
13. What is conditional probability? State Bayes’ Theorem.
14. Explain the term heterogeneous database.
15. What is data reduction?
16. What is Multilevel and Multidimensional Pattern Mining?
17. What are the challenges of Mining High-Dimensional Data?
18. What is Cluster Analysis?
19. What is Rule-Based Classification?
20. What are Complex Data Types in data mining?
21. Predict the future of data mining.
5 marks
1. Define Frequent sets, confidence, support and association rule.
2. How does a Support Vector Machine classify data? Explain “Hyperplane” and “Kernel”.
3. Name some variants of the Apriori Algorithm. Discuss the importance of Association Rule
Mining.
Sr. No.   Math   Computer Science   Result
1         4      3                  Fail
2         6      7                  Pass
3         7      8                  Pass
4         5      5                  Fail
5         8      8                  Pass
1. Why is the Apriori Algorithm needed? Explain.
2. Explain the term Knowledge Discovery in Databases.
3. How does data mining impact society? Discuss ethical concerns like privacy, bias,
and security risks in data mining.
4. A bag contains 4 balls. Two balls are drawn at random without replacement and are
found to be blue. What is the probability that all balls in the bag are blue?
5. Describe the Apriori algorithm with an example. What are its key steps?
6. Why is data pre-processing important in data mining? Explain with examples.
7. Discuss the role of data mining in fraud detection and cyber security.
8. What are Complex Data Types in data mining? Discuss mining techniques used for
text, spatial, multimedia, and web data.
9. Write the tasks associated with data cleaning. How are missing values in a table
handled?
10. What is the need for box plots and outliers?
10 marks
1. What is KDD? Explain data mining as a step in the process of knowledge
discovery.
2. How can we handle missing values? Explain Noisy Data.
3. Write short notes on: Unsupervised Learning, Decision Tree, Logistic Regression,
Web Mining.
4. Describe any two alternative methodologies in data mining apart from classification,
clustering, and association rule mining.
5. Explain Hierarchical Clustering and differentiate between Agglomerative and
Divisive approaches.
6. What are the different methods to evaluate the quality of clustering? Explain different
techniques.
7. Write a comparative analysis of density-based and grid-based clustering methods.
8. Write the differences between supervised learning and unsupervised learning.
Classify supervised learning algorithms. Define binary and multiclass classification
with examples.
9. Write short notes on any two:
(i) Decision Tree (ii) Clustering (iii) Similarity Matrices
10. Apply K-means clustering to the following data points:
A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9)
Take A1, B1, and C1 as the initial centers of the three clusters.
(Use Euclidean distance.)
11. Write a comparative analysis of nominal, binary, ordinal, and numeric attributes
of data.
12. How do similarity and dissimilarity matrices help to measure the similarity between
data objects?
x = (5,0,3,0,2,0,0,2,0,0) and
y = (3,0,2,0,1,1,0,1,0,1). Calculate the cosine similarity between these two vectors.
Using the sample data given above, find the class of a newly arrived data point using the
KNN algorithm.
The distance matrix is given.
Form the agglomerative hierarchical clustering using the single-link technique.
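
For the cosine similarity question above (vectors x and y), the arithmetic can be checked with a short Python sketch. It is not a model answer: it assumes the standard definition cos(x, y) = (x . y) / (||x|| ||y||), and the function name cosine_similarity is only illustrative.

```python
import math

def cosine_similarity(x, y):
    """Return (x . y) / (||x|| * ||y||) for two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))        # dot product x . y
    norm_x = math.sqrt(sum(a * a for a in x))     # Euclidean norm ||x||
    norm_y = math.sqrt(sum(b * b for b in y))     # Euclidean norm ||y||
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

# Vectors taken from the question.
x = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
y = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)

similarity = cosine_similarity(x, y)
print(round(similarity, 2))  # 0.94
```

Here x . y = 25, ||x|| = sqrt(42) and ||y|| = sqrt(17), so the similarity is 25 / sqrt(714), roughly 0.94.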