
Data Mining Algorithms - Exam 22/23

Data Mining and Big Data Analytics 2022/23

CSI-6-DMA Semester 1

Question 1

Choose the best answer to each of the following questions (1.5 marks each):

1.1. Given a two-class classification problem, which of the following models is the
worst one?
(a) A model that has 100% false positive rate and 0% true positive rate.
(b) A model that has 100% false positive rate and 100% true positive rate.
(c) A model that has 0% false positive rate and 100% true positive rate.
(d) A model that has 0% false positive rate and 0% true positive rate.

1.2. For a given association rule, moving an item from the consequent of the rule to
the antecedent of the rule never changes the __________ of the association rule.
(a) support, confidence, and lift
(b) support and lift
(c) confidence and lift
(d) support

1.3. Given three itemsets X, Y, and Z, where 𝑋 ⊂ 𝑌 ⊂ 𝑍. If Y is frequent, then
__________.
(a) 𝑋 is either frequent or infrequent
(b) both 𝑋 and 𝑍 are infrequent
(c) 𝑍 is either frequent or infrequent
(d) both 𝑋 and 𝑍 are frequent

1.4. In a confusion matrix for a classifier, the sum of all the diagonal elements in the
matrix is the total number of the __________ samples that have been classified
__________ by the classifier.
(a) training, incorrectly
(b) testing, incorrectly
(c) training, correctly
(d) testing, correctly

1.5. The k-means clustering algorithm can be used for which of the following tasks?
(a) Anomaly detection.
(b) Outlier detection.
(c) Partition a sample space into several non-overlapping segments.
(d) Unsupervised classification.
(e) All of these.


1.6. Suppose a binary decision tree has 𝑚 nodes excluding all the leaf nodes, where
𝑚 is an integer number and 𝑚 ≥ 1. Then the decision tree forms __________
mutually exclusive partitions in the sample space.
(a) 𝑚 + 1
(b) 𝑚
(c) 𝑚/2
(d) 𝑚^2

1.7. If a data mining algorithm continues to reduce the error on __________ set at a
cost of an increased error on __________, then model over-fitting happens.
(a) training, training
(b) training, testing
(c) testing, training
(d) testing, testing

1.8. Suppose that a transactional database has 𝑚 distinct items, where 𝑚 is an integer
number, then the total number of the itemsets that can be extracted from the
database is __________.
(a) 𝑚
(b) log_2 𝑚
(c) 2^𝑚 − 1
(d) 𝑚^2

1.9. In __________ modelling, a given data set is usually divided into __________.
(a) descriptive, validation and test subsets
(b) predictive, testing and validation subsets
(c) descriptive, training and testing subsets
(d) predictive, training and testing subsets

1.10. Which of the following statements is true in the context of data mining?
(a) A linear regression model can be represented in the form of a decision tree.
(b) An association rule doesn’t represent a causal relationship between items.
(c) The output of a logistic regression model indicates the likelihood
(probability) of a sample to be classified into a class.
(d) Clusters created by the k-means algorithm can be represented in the form of a
binary decision tree.
(e) All of these.

Total: 15 Marks


Question 2

A fraud warning system has been developed by an insurance company to identify any
fraudulent insurance claims with a reasonably low false-alarm rate. Two models, M1
and M2, have been constructed for the system. Each model classifies a claim as either
True class or False class. The cost matrix used in the classifier design is shown
below, and the test results of the two models are given in the following confusion
matrices:

Cost matrix for the classifier design

                        Predicted Class
                        True      False
Actual Class   True      -1         1
               False      4         0

Confusion matrices for the two classification models

Model M1
                        Predicted Class
                        True      False
Actual Class   True      30         0
               False     10        10

Model M2
                        Predicted Class
                        True      False
Actual Class   True      15        15
               False      0        20

(a) What are the accuracy and cost of each of the two classifiers? You must show
clearly how you get your answer. You may leave your answers in the form of
fractions if you wish.
(12 marks)

(b) Which model has a lower false-alarm rate, i.e., a true claim has been classified as
False? You must show clearly how you get your answer. You may leave your
answers in the form of fractions if you wish.
(6 marks)

Total: 30 Marks
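
The arithmetic behind part (a) can be organised as in the minimal sketch below. It assumes that the rows of both matrices are the actual class and the columns are the predicted class, and that the cost of a model is the sum over all cells of (number of claims in the cell) × (unit cost of that cell). The function names are illustrative only and are not part of the paper.

```python
from fractions import Fraction

# Rows: actual class (True, False); columns: predicted class (True, False).
# Layout is an assumption based on the matrices given in the question.
COST = [[-1, 1],
        [4, 0]]

CONFUSION = {
    "M1": [[30, 0], [10, 10]],
    "M2": [[15, 15], [0, 20]],
}


def accuracy(cm):
    """Fraction of claims on the diagonal (classified correctly)."""
    correct = cm[0][0] + cm[1][1]
    total = sum(sum(row) for row in cm)
    return Fraction(correct, total)


def total_cost(cm, cost):
    """Sum of count * unit cost over the four cells (assumed definition)."""
    return sum(cm[i][j] * cost[i][j] for i in range(2) for j in range(2))


for name, cm in CONFUSION.items():
    print(name, "accuracy =", accuracy(cm), "cost =", total_cost(cm, COST))
```

Using `Fraction` keeps the results in the fractional form that the question allows.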


Question 3

(a) Consider the following data types:

a. Ordinal and binary.
b. Interval and discrete.
c. Ratio and discrete.
d. Ratio and continuous.

Give one variable as an example for each of these data types. Your answer should
include some possible values that each variable can take on.
(12 marks)

(b) Consider a dataset about road accidents in the London Borough of
Southwark. The variables of the dataset are shown below. Discuss what data
pre-processing tasks may need to be undertaken, and explain why, if the k-means
clustering algorithm is to be applied to group the accidents into meaningful
segments (clusters).

Variable   Variable Description                  Data      Value range if numeric variable or
Name                                             Type      distinct values if categorical variable
ACC_ID     Accident ID                           Nominal   Sequential integer number
S_LEVEL    Level of accident severity            Ordinal   Fatal, Serious, Light
J_CTRL     Type of junction control              Nominal   Authorised person, Auto traffic signal, Stop sign
CASU       Number of casualties in an incident   Ratio     0 – 40
COST       Cost of an accident (£)               Ratio     100.00 – 5,000,000.00

(18 marks)

Total: 30 Marks
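
As an illustration of the kind of pre-processing part (b) refers to, the sketch below shows one possible pipeline, not a model answer. It assumes that ACC_ID is dropped because it is only an identifier, S_LEVEL is mapped to ranks, J_CTRL is one-hot encoded, COST is log-transformed because of its wide range, and the numeric features are min-max scaled so that no single variable dominates the Euclidean distance used by k-means. The sample records and helper names are hypothetical.

```python
import math

# Hypothetical sample records using the variable names from the table.
records = [
    {"ACC_ID": 1, "S_LEVEL": "Light", "J_CTRL": "Stop sign",
     "CASU": 0, "COST": 100.0},
    {"ACC_ID": 2, "S_LEVEL": "Serious", "J_CTRL": "Auto traffic signal",
     "CASU": 3, "COST": 25000.0},
    {"ACC_ID": 3, "S_LEVEL": "Fatal", "J_CTRL": "Authorised person",
     "CASU": 40, "COST": 5000000.0},
]

# Assumed ordering of the ordinal severity levels.
SEVERITY_RANK = {"Light": 0, "Serious": 1, "Fatal": 2}
JUNCTION_TYPES = ["Authorised person", "Auto traffic signal", "Stop sign"]


def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def preprocess(rows):
    """Build one numeric feature vector per accident; ACC_ID is left out."""
    severity = min_max([SEVERITY_RANK[r["S_LEVEL"]] for r in rows])
    casualties = min_max([r["CASU"] for r in rows])
    # Log-transform the cost before scaling to reduce its skew.
    cost = min_max([math.log10(r["COST"]) for r in rows])
    features = []
    for i, r in enumerate(rows):
        one_hot = [1.0 if r["J_CTRL"] == t else 0.0 for t in JUNCTION_TYPES]
        features.append([severity[i], casualties[i], cost[i]] + one_hot)
    return features


features = preprocess(records)
for row in features:
    print(row)
```

Each resulting vector has six components (three scaled numeric values and three one-hot indicators) and could be passed to any k-means implementation.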


Question 4

(a) Name any four methods of outlier detection. Choose one of the named methods to
explain how it works with an appropriate example. Your answer must state clearly
to which type of data each named method is applicable.
(15 marks)

(b) Write brief notes to discuss how to choose a proper minimum support threshold in
association rule analysis.
(10 marks)

Total: 25 Marks
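
For part (a), one commonly named method for a single numeric variable is the interquartile-range (IQR) rule, sketched below as an illustration only. The 1.5 × IQR fences are the conventional choice rather than anything specified in the paper, and the sample values and function name are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one value far from the rest.
sample = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(sample))
```

The rule applies to numeric (interval or ratio) data; it is not meaningful for nominal variables.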

END OF PAPER

