DNYANSAGAR INSTITUTE OF MANAGEMENT AND RESEARCH
MCQ
Specialization: Business Analytics
Course Code: 206 BA Course Name – Data Mining
Unit 1: Basic Concepts
[Link] Question Answer
According to analysts, for what can traditional IT systems provide a
foundation when they’re integrated with big data technologies like
Hadoop?
a) Big data management and data mining
1 b) Data warehousing and business intelligence
A
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
All of the following accurately describe Hadoop, EXCEPT:
a) Open source
b) Real-time
c) Java-based
2 d) Distributed computing approach B
________ has the world’s largest Hadoop cluster.
a) Apple
b) Datamatics
c) Facebook
3 d) None of the mentioned C
What are the five V’s of Big Data?
a) Volume
b) Velocity
c) Variety
4 d) All the above D
________ hides the limitations of Java behind a powerful and concise
Clojure API for Cascading.
a) Scalding
b) Cascalog
5 c) Hcatalog B
d) Hcalding
What are the main components of Big Data?
a) MapReduce
b) HDFS
c) YARN
6 d) All of these D
Prof. Ujjval More [Link]
What are the different features of Big Data Analytics?
a) Open-Source
b) Scalability D
c) Data Recovery
7 d) All the above
Define the Port Numbers for NameNode, Task Tracker and Job Tracker.
a) NameNode D
b) Task Tracker
c) Job Tracker
8 d) All of the above
This is an approach to selling goods and services in which a prospect
explicitly agrees in advance to receive marketing information.
a) customer managed relationship
C
b) data mining
c) permission marketing
9
d) one-to-one marketing
e) batch processing
This is an XML-based metalanguage developed by the Business Process
Management Initiative (BPMI) as a means of modeling business
processes, much as XML is, itself, a metalanguage with the ability to
model enterprise data.
a. BizTalk
10 B
b. BPML
c. e-biz
d. ebXML
e. ECB
This is a central point in an enterprise from which all customer contacts
are managed.
a. contact center
b. help system
11 c. multichannel marketing A
d. call center
e. help desk
This is the practice of dividing a customer base into groups of individuals
that are similar in specific ways relevant to marketing, such as age,
gender, interests, spending habits, and so on.
a. customer service chat
12 b. customer managed relationship D
c. customer life cycle
d. customer segmentation
e. change management
Movie Recommendation systems are an example of:
1. Classification
2. Clustering
3. Reinforcement Learning
4. Regression
Options:
13 a. 2 Only D
b. 1 and 2
c. 1, 2 and 3
d. 1, 2 and 3
Sentiment Analysis is an example of:
1. Regression
2. Classification
3. Clustering
4. Reinforcement Learning
Options:
14 D
A. 1 Only
B. 1 and 2
C. 1 and 3
D. 1, 2 and 4
Can decision trees be used for performing clustering?
15 A. True A
B. False
Which of the following is the most appropriate strategy for data cleaning
before performing clustering analysis, given less than desirable number
of data points:
1. Capping and flooring of variables
2. Removal of outliers
16 A
Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of the above
The problem of finding hidden structure in unlabeled data is called
A. Supervised learning
17 B
B. Unsupervised learning
C. Reinforcement learning
Task of inferring a model from labeled training data is called
A. Unsupervised learning
18 B. Supervised learning B
C. Reinforcement learning
Some telecommunication company wants to segment their customers
into distinct groups in order to send appropriate subscription offers, this
is an example of
A. Supervised learning
19 D
B. Data extraction
C. Serration
D. Unsupervised learning
Self-organizing maps are an example of
A. Unsupervised learning
20 B. Supervised learning A
C. Reinforcement learning
D. Missing data imputation
You are given data about seismic activity in Japan, and you want to
predict the magnitude of the next earthquake. This is an example of
A. Supervised learning
21 A
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
Assume you want to perform supervised learning and to predict number
of newborns according to size of storks’ population it is an example of
A. Classification
22 B
B. Regression
C. Clustering
D. Structural equation modelling
Discriminating between spam and ham e-mails is a classification task,
true or false?
23 A
A. True
B. False
In the example of predicting number of babies based on storks’
population size, number of babies is
24 A. outcome A
B. feature
C. attribute
D. observation
Data set {brown, black, blue, green , red} is example of Select one:
a. Continuous attribute
25 b. Ordinal attribute D
c. Numeric attribute
d. Nominal attribute
Which of the following activities is NOT a data mining task?
a. Predicting the future stock price of a company using historical records
26 b. Monitoring and predicting failures in a hydropower plant C
c. Extracting the frequencies of a sound wave
d. Monitoring the heart rate of a patient for abnormalities
Data Visualization in mining cannot be done using Select one:
a. Photos
27 b. Graphs A
c. Charts
d. Information Graphics
Which of the following is not a data pre-processing methods Select one:
a. Data Visualization
28 b. Data Discretization A
c. Data Cleaning
d. Data Reduction
29 Dimensionality reduction reduces the data set size by removing D
Select one:
a. composite attributes
b. derived attributes
c. relevant attributes
d. irrelevant attributes
The difference between supervised learning and unsupervised learning is given
by Select one:
a. unlike unsupervised learning, supervised learning needs labeled data
30 b. unlike unsupervised learning, supervised learning can be used to detect A
outliers
c. there is no difference
d. unlike supervised learning, unsupervised learning can form new classes
Which of the following activities is a data mining task? Select one:
a. Monitoring the heart rate of a patient for abnormalities
31 b. Extracting the frequencies of a sound wave A
c. Predicting the outcomes of tossing a (fair) pair of dice
d. Dividing the customers of a company according to their profitability
Identify the example of sequence data Select one:
a. weather forecast
32 b. data matrix D
c. market basket data
d. genomic data
To detect fraudulent usage of credit cards, the following data mining task
should be used Select one:
a. Outlier analysis
33 A
b. prediction
c. association analysis
d. feature selection
Which of the following is NOT example of ordinal attributes? Select one:
a. Zip codes
34 b. Ordered numbers A
c. Movie ratings
d. Military ranks
Data scrubbing can be defined as Select one:
a. Check field overloading
b. Delete redundant tuples
35 C
c. Use simple domain knowledge (e.g., postal code, spell-check) to detect
errors and make corrections
d. Analyzing data to discover rules and relationship to detect violators
Which data mining task can be used for predicting wind velocities as a function
of temperature, humidity, air pressure, etc.?
Select one:
36 a. Cluster Analysis B
b. Regression
c. Classification
d. Sequential pattern discovery
37 In asymmetric attribute Select one: C
a. No value is considered important over other values
b. All values are equal
c. Only non-zero value is important
d. Range of values is important
Which statement is not TRUE regarding a data mining task?
Select one:
a. Clustering is a descriptive data mining task
38 C
b. Classification is a predictive data mining task
c. Regression is a descriptive data mining task
d. Deviation detection is a predictive data mining task
Identify the example of Nominal attribute Select one:
a. Temperature
39 b. Salary D
c. Mass
d. Gender
Synonym for data mining is Select one:
a. Data Warehouse
40 b. Knowledge discovery in database B
c. Business intelligence
d. OLAP
Nominal and ordinal attributes can be collectively referred to as
attributes Select one:
a. perfect
41 B
b. qualitative
c. consistent
d. optimized
Which of the following is not a data mining task?
Select one:
a. Feature Subset Detection
42 A
b. Association Rule Discovery
c. Regression
d. Sequential Pattern Discovery
Which of the following is an Entity identification problem? Select one:
a. One person with different email address
43 b. One person’s name written in different way B
c. Title for person
d. One person with multiple phone numbers
In Binning, we first sort data and partition into (equal-frequency) bins and then
which of the following is not a valid step Select one:
a. smooth by bin boundaries
44 D
b. smooth by bin median
c. smooth by bin means
d. smooth by bin values
Incorrect or invalid data is known as Select one:
a. Missing data
b. Outlier
c. Changing data
d. Noisy data
45 D
The important characteristics of structured data are Select one:
a. Sparsity, Resolution, Distribution, Tuples
b. Sparsity, Centroid, Distribution , Dimensionality
c. Resolution, Distribution, Dimensionality ,Objects
d. Dimensionality, Sparsity, Resolution, Distribution
Which of the following are descriptive data mining activities? Select one:
a. Deviation detection
46 b. Classification C
c. Clustering
d. Regression
In a data mining task where it is not clear what type of patterns could be
interesting, the data mining system should Select one:
a. allow interaction with the user to guide the mining process
47 A
b. perform both descriptive and predictive tasks
c. perform all possible data mining tasks
d. handle different granularities of data and patterns
Correlation analysis is used for Select one:
a. handling missing values
48 b. identifying redundant attributes B
c. handling different data formats
d. eliminating noise
The number of item sets of cardinality 4 from the items lists {A, B, C, D, E}
Select one:
a. 2
49 D
b. 10
c. 20
d. 5
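The count asked for above can be checked by direct enumeration. The following is a minimal sketch in Python; the variable names are illustrative and not part of the question bank. It lists every 4-item subset of {A, B, C, D, E} and compares the count with the binomial coefficient C(5, 4).

```python
from itertools import combinations
from math import comb

items = ["A", "B", "C", "D", "E"]

# Enumerate every itemset that contains exactly 4 of the 5 items.
itemsets_of_4 = [set(c) for c in combinations(items, 4)]
count = len(itemsets_of_4)

# The enumeration should agree with the closed form C(5, 4).
closed_form = comb(len(items), 4)
print(count, closed_form)
```

Each 4-itemset is obtained by leaving out exactly one of the five items, which is why the count equals the number of items.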
Which of the following is NOT a data quality related issue?
Select one:
a. Missing values
50 D
b. Outlier records
c. Duplicate records
d. Attribute value range
206 BA – Data Mining Unit 2: Data and Preprocessing
[Link] Question Answer
MCQ
Data set {brown, black, blue, green , red} is example of Select one:
a. Continuous attribute
D
1 b. Ordinal attribute
c. Numeric attribute
d. Nominal attribute
Which of the following activities is NOT a data mining task? Select one:
a. Predicting the future stock price of a company using historical
records
2 b. Monitoring and predicting failures in a hydropower plant C
c. Extracting the frequencies of a sound wave
d. Monitoring the heart rate of a patient for abnormalities
Data Visualization in mining cannot be done using Select one:
a. Photos
3 b. Graphs A
c. Charts
d. Information Graphics
Which of the following is not a data pre-processing methods Select
one:
a. Data Visualization
4 b. Data Discretization A
c. Data Cleaning
d. Data Reduction
Dimensionality reduction reduces the data set size by removing
Select one:
a. composite attributes
5 b. derived attributes D
c. relevant attributes
d. irrelevant attributes
The difference between supervised learning and unsupervised learning
is given by Select one:
a. unlike unsupervised learning, supervised learning needs labeled
data
6 b. unlike unsupervised learning, supervised learning can be used to A
detect outliers
c. there is no difference
d. unlike supervised learning, unsupervised learning can form new
classes
Which of the following activities is a data mining task? Select one:
a. Monitoring the heart rate of a patient for abnormalities A
b. Extracting the frequencies of a sound wave
7 c. Predicting the outcomes of tossing a (fair) pair of dice
d. Dividing the customers of a company according to their profitability
Identify the example of sequence data Select one:
a. weather forecast D
b. data matrix
8 c. market basket data
d. genomic data
To detect fraudulent usage of credit cards, the following data mining
task should be used Select one:
a. Outlier analysis A
b. prediction
9 c. association analysis
d. feature selection
Which of the following is NOT example of ordinal attributes? Select
one:
a. Zip codes
10 b. Ordered numbers A
c. Movie ratings
d. Military ranks
Data scrubbing can be defined as Select one:
a. Check field overloading
b. Delete redundant tuples
c. Use simple domain knowledge (e.g., postal code, spell-check) to
11 C
detect errors and make corrections
d. Analyzing data to discover rules and relationship to detect violators
Which data mining task can be used for predicting wind velocities as a
function of temperature, humidity, air pressure, etc.? Select one:
a. Cluster Analysis
12 b. Regression B
c. Classification
d. Sequential pattern discovery
In asymmetric attribute Select one:
a. No value is considered important over other values
b. All values are equals
13 c. Only non-zero value is important C
d. Range of values is important
Which statement is not TRUE regarding a data mining task? Select one:
a. Clustering is a descriptive data mining task
b. Classification is a predictive data mining task
14 c. Regression is a descriptive data mining task C
d. Deviation detection is a predictive data mining task
Identify the example of Nominal attribute Select one:
a. Temperature
b. Salary
15 c. Mass D
d. Gender
Which is the right approach of Data Mining?
A. Infrastructure, exploration, analysis, interpretation, exploitation
B. Infrastructure, exploration, analysis, exploitation, interpretation A
16 C. Infrastructure, analysis, exploration, interpretation, exploitation
D. Infrastructure, analysis, exploration, exploitation, interpretation
Nominal and ordinal attributes can be collectively referred to
as attributes Select one:
a. perfect
17 b. qualitative B
c. consistent
d. optimized
Which of the following is not a data mining task? Select one:
a. Feature Subset Detection
b. Association Rule Discovery
18 c. Regression A
d. Sequential Pattern Discovery
Which of the following is an Entity identification problem? Select one:
a. One person with different email address
b. One person’s name written in different way
19 c. Title for person B
d. One person with multiple phone numbers
In Binning, we first sort data and partition into (equal-frequency) bins
and then which of the following is not a valid step Select one:
a. smooth by bin boundaries
20 b. smooth by bin median D
c. smooth by bin means
d. smooth by bin values
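The binning steps named in the options above can be sketched as follows. This is a minimal illustration in Python, not a reference implementation: the function names and the sample values are assumptions made for the example, and the partitioning assumes the number of values is divisible by the number of bins.

```python
from statistics import median

def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of equal size.

    Simplification: assumes len(values) is divisible by n_bins.
    """
    ordered = sorted(values)
    size = len(ordered) // n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]

def smooth_by_means(bins):
    # Replace every value in a bin by the bin mean.
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    # Replace every value in a bin by the bin median.
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    # Replace every value by the closer of the bin minimum and maximum.
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]

# Illustrative sample values (hypothetical data).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
print(bins)
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```

Means, medians and boundaries are the three smoothing variants usually listed for binning; "smooth by bin values" has no corresponding operation.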
Data independence means
A. Data is defined separately and not included in programs
B. Programs are not dependent on the physical attributes of data.
21 C. Programs are not dependent on the logical attributes of data D
D. Both (B) and (C).
22 E-R model uses this symbol to represent weak entity set? C
a) Dotted rectangle
b) Diamond
c) Doubly outlined rectangle
d) None of these
23 SET concept is used in A
a) Network Model
b) Hierarchical Model
c) Relational Model
d) None of these
Relational Algebra is
A. Data Definition Language
24 B. Meta Language C
C. Procedural query Language
D. None of the above
Key to represent relationship between tables is called
A. Primary key
B. Secondary Key
25 C
C. Foreign Key
D. None of these
______ produces the relation that has attributes of R1 and R2
A. Cartesian product
26 B. Difference A
C. Intersection
D. Product
Which of the following are the properties of entities?
A. Groups
27 B. Table C
C. Attributes
D. Switchboards
In a relation
A. Ordering of rows is immaterial
28 B. No two rows are identical C
C. (A) and (B) both are true
D. None of these
Inductive logic programming is
A. A class of learning algorithms that try to derive a Prolog program
from examples
B. A table with n independent attributes can be seen as an n-
29 A
dimensional space
C. A prediction made using an extremely simple method, such as
always predicting the same output
D. None of these
Machine learning is
A. An algorithm that can learn
B. A sub-discipline of computer science that deals with the design and
implementation of learning algorithms
30 B
C. An approach that abstracts from the actual strategy of an individual
algorithm and can therefore be applied to any other form of machine
learning.
D. None of these
Projection pursuit is
A. The result of the application of a theory or a rule in a specific case
B. One of several possible enters within a database table that is chosen
31 by the designer as the primary means of accessing the data in the table. C
C. Discipline in statistics that studies ways to find the most interesting
projections of multi-dimensional spaces
D. None of these
Node is
A. A component of a network
B. In the context of KDD and data mining, this refers to random errors
32 A
in a database table.
C. One of the defining aspects of a data warehouse
D. None of these
Statistical significance is
A. The science of collecting, organizing, and applying numerical facts
B. Measure of the probability that a certain hypothesis is incorrect
33 given certain observations. B
C. One of the defining aspects of a data warehouse, which is specially
built around all the existing applications of the operational data
D. None of these
Multi-dimensional knowledge is
A. A class of learning algorithms that try to derive a Prolog program
from examples
B. A table with n independent attributes can be seen as an n-
34 B
dimensional space
C. A prediction made using an extremely simple method, such as
always predicting the same output.
D. None of these
Noise is
A. A component of a network
B. In the context of KDD and data mining, this refers to random errors
35 B
in a database table.
C. One of the defining aspects of a data warehouse
D. None of these
Query tools are
A. A reference to the speed of an algorithm, which is quadratically
dependent on the size of the data
36 C
B. Attributes of a database table that can take only numerical values.
C. Tools designed to query a database.
D. None of these
Operational database is
A. A measure of the desired maximal complexity of data mining
algorithms
37 B
B. A database containing volatile data used for the daily operation of
an organization
C. Relational database management system
D. None of these
Prediction is
A. The result of the application of a theory or a rule in a specific case
B. One of several possible enters within a database table that is chosen
38 by the designer as the primary means of accessing the data in the table. A
C. Discipline in statistics that studies ways to find the most interesting
projections of multi-dimensional spaces.
D. None of these
39 A set of relevant data is summarized, which results in a smaller set that gives A
--------information of the data
A:) Aggregated
B:) Clustering
C:) Association analysis
D:) time series analysis
40 ------is a very important process where potentially useful and previously B
unknown information is extracted from large volumes of data
A:) warehousing
B:) data mining
C:) data cleaning
D:) data integration
41 The major components of any------system are data source, warehouse server, B
data mining engine, pattern evaluation module, graphical user interface and
knowledge base
A:) warehousing
B:) data mining
C:) data cleaning
D:) data integration
42 Database, data warehouse, World Wide Web,text files and other documents A
are the actual sources of ----------
A:) Data
B:) information
C:) RDBMS
D:) none of these
43 ------may contain one or more databases, text files, spreadsheets or other kinds A
of information repositories
A:) Data warehouse
B:) data mining
C:) data cleaning
D:) data integration
44 The data needs to be cleaned, integrated & selected before passing it to the A
database or data ------------------server
A:) warehouse
B:) data mining
C:) data cleaning
D:) data integration
45 The------needs to be cleaned, integrated and selected before passing it to the A
database or data warehouse server
A:) Data
B:) information
C:) RDBMS
D:) none of these
46 As the data is from different sources and in different formats, it cannot be used B
directly for the---------- process because the data might not be complete and
reliable
A:) warehouse
B:) data mining
C:) data cleaning
D:) data integration
47 As the--------is from different sources and in different formats, it cannot be A
used directly for the data mining process because the data might not be
complete and reliable
A:) Data
B:) information
C:) RDBMS
D:) none of these
48 A number of techniques may be performed on the------as part of cleaning, A
integration and selection.
A:) Data
B:) information
C:) RDBMS
D:) none of these
49 A number of techniques may be performed on the data as part of------, A
integration and selection.
A:) cleaning
B:) information
C:) RDBMS
D:) none of these
50 A number of techniques may be performed on the data as part of------------------- D
-----..
A:) cleaning
B:) integration
C:) selection
D:) all the above
206 BA – Data Mining Unit 3: Classification
[Link] Question Answer
Which of the following applied on warehouse?
B
a) write only
b) read only
1
c) both a & b
d) none of these
Data can be stored, retrieved and updated in …
B
a) SMTOP
b) OLTP
2
c) FTP
d) OLAP
Which of the following is a good alternative to the star schema? D
a) snow flake schema
b) star schema
3 c) star snow flake schema
d) fact constellation
Patterns that can be discovered from a given database are which
type…
a) More than one type
4 b) Multiple type always A
c) One type only
d) No specific type
Background knowledge is…
a) It is a form of automatic learning.
b) A neural network that makes use of a hidden layer
5 c) The additional acquaintance used by a learning algorithm to C
facilitate the learning process
d) None of these
Which of the following is true for Classification?
a) A subdivision of a set
6 b) A measure of the accuracy A
c) The task of assigning a classification
d) All of these
Data mining is?
a) time variant non-volatile collection of data
7 b) The actual discovery phase of a knowledge discovery process B
c) The stage of selecting the right data
d) None of these
——- is not a data mining functionality?
A) Clustering and Analysis
B) Selection and interpretation
8 B
C) Classification and regression
D) Characterization and Discrimination
Which of the following can also applied to other forms?
a) Data streams & Sequence data
b) Networked data
9 D
c) Text & Spatial data
d) All of these
Which of the following is general characteristics or features of a
target class of data?
a) Data selection
10 b) Data discrimination D
c) Data Classification
d) Data Characterization
——– is the output of KDD…
a) Query
11 b) Useful Information B
c) Data
d) information
What is noise?
a) component of a network
12 b) context of KDD and data mining B
c) aspects of a data warehouse
d) None of these
What is the adaptive system management?
a) machine language techniques
13 b) machine learning techniques B
c) machine procedures techniques
d) none of these
An essential process used for applying intelligent methods to
extract the data patterns is named as …
a) data mining
14 A
b) data analysis
c) data implementation
d) data computation
Classification and regression are the properties of…
a) data analysis
15 b) data manipulation C
c) data mining
d) none of these
A class of learning algorithm that tries to find an optimum
classification of a set of examples using the probabilistic theory is
named as …
16 a) Bayesian classifiers A
b) Dijkstra classifiers
c) doppler classifiers
d) all of these
Which of the following can be used for finding deep knowledge?
a) stacks
17 b) algorithms C
c) clues
d) none of these
We define a ______ as a subdivision of a set of examples into a
number of classes.
a) kingdom
18 b) tree C
c) classification
d) array
Group of similar objects that differ significantly from other objects
is named as …
a) classification
19 B
b) cluster
c) community
d) none of these
Combining different type of methods or information is ….
a) analysis
20 b) computation D
c) stack
d) hybrid
Which of the following is not a Data discretization Method? Select
one:
a. Histogram analysis
21 C
b. Cluster Analysis
c. Data compression
d. Binning
Which of the following data mining tasks is known as
Market Basket Analysis?
Select one:
22 a. Association Analysis A
b. Regression
c. Classification
d. Outlier Analysis
What is the adaptive system management?
a) machine language techniques
23 b) machine learning techniques B
c) machine procedures techniques
d) none of these
An essential process used for applying intelligent methods to
extract the data patterns is named as …
a) data mining
24 A
b) data analysis
c) data implementation
d) data computation
Classification and regression are the properties of…
25
a) data analysis
b) data manipulation C
c) data mining
d) none of these
A class of learning algorithm that tries to find an optimum
classification of a set of examples using the probabilistic theory is
named as …
26 a) Bayesian classifiers A
b) Dijkstra classifiers
c) doppler classifiers
d) all of these
Which of the following can be used for finding deep knowledge?
a) stacks
27 b) algorithms C
c) clues
d) none of these
We define a ______ as a subdivision of a set of examples into a
number of classes.
a) kingdom
28 C
b) tree
c) classification
d) array
Group of similar objects that differ significantly from other objects
is named as …
a) classification
29 B
b) cluster
c) community
d) none of these
Combining different type of methods or information is ….
a) analysis
30 b) computation D
c) stack
d) hybrid
What is the name of database having a set of databases from
different vendors, possibly using different database paradigms?
a) homogeneous database
31 B
b) heterogeneous database
c) hybrid database
d) none of these
32 What is the strategic value of data mining? D
a) design sensitive
b) cost sensitive
c) technical sensitive
d)time sensitive
The amount of information with in data as opposed to the amount
of redundancy or noise is known as … C
33
a) paragraph content
b) text content
c) information content
d) none of these
What is inductive learning?
a) learning by hypothesis
34 b) learning by analyzing C
c) learning by generalizing
d) none of these
Which of the following applied on warehouse?
a) write only
35 b) read only B
c) both a & b
d) none of these
Data can be stored, retrieved and updated in …
a) SMTOP
36 b) OLTP B
c) FTP
d) OLAP
Which of the following is a good alternative to the star schema?
a) snow flake schema
37 b) star schema D
c) star snow flake schema
d) fact constellation
Patterns that can be discovered from a given database are which
type…
a) More than one type
38 A
b) Multiple type always
c) One type only
d) No specific type
Background knowledge is…
a) It is a form of automatic learning.
b) A neural network that makes use of a hidden layer
39 C
c) The additional acquaintance used by a learning algorithm to
facilitate the learning process
d) None of these
Which of the following is true for Classification?
40 a) A subdivision of a set A
b) A measure of the accuracy
c) The task of assigning a classification
d) All of these
Data mining is?
a) time variant non-volatile collection of data
41 b) The actual discovery phase of a knowledge B
c) The stage of selecting the right data
d) None of these
——- is not a data mining functionality?
A) Clustering and Analysis
42 B) Selection and interpretation B
C) Classification and regression
D) Characterization and Discrimination
Which of the following can also applied to other forms?
a) Data streams & Sequence data
43 b) Networked data D
c) Text & Spatial data
d) All of these
Which of the following is general characteristics or features of a
target class of data?
a) Data selection
44 D
b) Data discrimination
c) Data Classification
d) Data Characterization
——– is the output of KDD…
a) Query
45 b) Useful Information B
c) Data
d) information
What is noise? B
a) component of a network
46 b) context of KDD and data mining
c) aspects of a data warehouse
d) None of these
47 --------is used for linearly separable data, which means if a dataset can be C
classified into two classes by using a single straight line, then such data is
termed as linearly separable data.
A:) decision tree
B:) hyperplane
C:) Linear SVM
D:) Non- Linear SVM
48 ____is used for non-linearly separated data, which means if a dataset D
cannot be classified by using a straight line, then such data is termed as
non-linear data.
A:) decision tree
B:) hyperplane
C:) Linear SVM
D:) Non- Linear SVM
49 There can be multiple lines/decision boundaries to segregate the classes in B
n-dimensional space, but we need to find out the best decision boundary
that helps to classify the data points. This best boundary is known as the----
------of SVM.
A:) decision tree
B:) hyperplane
C:) Linear SVM
D:) Non- Linear SVM
50 The type of Quantitative Attributes are-----------------. C
A:) Discrete Attributes
B:) Continuous Attributes
C:) Discrete & Continuous Attributes both
D:) none of the above
206 BA - Data Mining Unit – 4 Clustering
[Link] Question Marks
This clustering approach initially assumes that each data instance
represents a single cluster. Select one:
a. expectation maximization
1 b. K-Means clustering C
c. agglomerative clustering
d. conceptual clustering
The correlation coefficient for two real-valued attributes is What
does this value tell you?
a. The attributes are not linearly related.
b. As the value of one attribute decreases the value of the second
2 B
attribute increases.
c. As the value of one attribute increases the value of the second
attribute also increases.
d. The attributes show a linear relationship
The correlation coefficient for two real-valued attributes is What
does this value tell you?
a. The attributes are not linearly related.
b. As the value of one attribute decreases the value of the second
3 attribute increases. B
c. As the value of one attribute increases the value of the second
attribute also increases.
d. The attributes show a linear relationship
Time Complexity of k-means is given by
a. O(mn) B
b. O(tkn)
4 c. O(kn)
d. O(t2kn)
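The O(tkn) figure comes from the assignment step: in each of t iterations, every one of the n points is compared with each of the k centroids. The following is a minimal sketch in Python that counts those comparisons. It is an illustration under simplifying assumptions (the first k points are used as initial centroids, a fixed number of iterations is run with no convergence test, and the function and variable names are hypothetical).

```python
def kmeans(points, k, iterations):
    """Run a fixed number of k-means iterations and count distance evaluations."""
    # Assumption for the sketch: the first k points serve as initial centroids.
    centroids = [points[i] for i in range(k)]
    distance_evaluations = 0
    for _ in range(iterations):                      # t iterations
        clusters = [[] for _ in range(k)]
        for p in points:                             # n points
            dists = []
            for c in centroids:                      # k centroids
                dists.append(sum((a - b) ** 2 for a, b in zip(p, c)))
                distance_evaluations += 1
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, distance_evaluations

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]  # illustrative data
centroids, evaluations = kmeans(points, k=2, iterations=3)
print(centroids, evaluations)
```

With n = 4 points, k = 2 clusters and t = 3 iterations, the counter reaches t * k * n distance evaluations.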
Given a rule of the form IF X THEN Y, rule confidence is defined as
the conditional probability that B
a. Y is false when X is known to be false.
5 b. Y is true when X is known to be true.
c. X is true when Y is known to be true
d. X is false when Y is known to be false
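Rule confidence for IF X THEN Y is the conditional probability P(Y | X), estimated as support(X and Y) divided by support(X). The following is a minimal sketch in Python; the transactions and function names are hypothetical and chosen only for illustration.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from transaction counts."""
    n_x = support_count(transactions, antecedent)
    if n_x == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support_count(transactions, antecedent | consequent) / n_x

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
conf_bread_milk = confidence(transactions, {"bread"}, {"milk"})
print(conf_bread_milk)
```

Here bread occurs in 4 transactions and bread together with milk in 3, so the confidence of IF bread THEN milk is 3/4.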
Chameleon is
a. Density based clustering algorithm D
6 b. Partitioning based algorithm
c. Model based algorithm
d. Hierarchical clustering algorithm
In ______ clusterings, points may belong to multiple clusters
A
a. Non exclusive
7 b. Partial
c. Fuzzy
d. Exclusive
Find odd man out
a. DBSCAN A
b. K-means
8 c. PAM
d. K-medoids
Which statement is true about the K-Means algorithm?
a. The output attribute must be categorical. C
b. All attribute values must be categorical.
9 c. All attributes must be numeric
d. Attribute values may be either categorical or numeric
This data transformation technique works well when minimum B
and maximum values for a real-valued attribute are known.
a. z-score normalization
10 b. min-max normalization
c. logarithmic normalization
d. decimal scaling
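Min-max normalization maps a value v to (v - min) / (max - min), optionally rescaled to a new range, which is why it needs the attribute's minimum and maximum in advance. A minimal sketch; the function name and the sample values are illustrative assumptions:

```python
def min_max_normalize(values, known_min, known_max, new_min=0.0, new_max=1.0):
    """Rescale values from [known_min, known_max] to [new_min, new_max]."""
    span = known_max - known_min
    if span == 0:
        raise ValueError("known_min and known_max must differ")
    return [(v - known_min) / span * (new_max - new_min) + new_min for v in values]

# Hypothetical attribute whose range is known to be 10..50.
normalized = min_max_normalize([10, 20, 30, 50], known_min=10, known_max=50)
print(normalized)
```

A value outside the known range would fall outside [new_min, new_max], which is the usual weakness of this technique when the bounds are not truly known.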
The number of iterations in apriori
a. increases with the size of the data
b. decreases with the increase in size of the data
11 c. increases with the size of the maximum frequent set C
d. decreases with increase in size of the maximum frequent set
Which of the following are interestingness measures for
association rules?
a. recall
12 b. lift B
c. accuracy
d. compactness
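Lift compares how often X and Y occur together with how often they would co-occur if they were independent: lift(X -> Y) = support(X and Y) / (support(X) * support(Y)). A minimal sketch over a hypothetical list of transactions (the helper names `support` and `lift` are assumptions for illustration):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def lift(transactions, x, y):
    """Lift of the rule x -> y; values above 1 suggest positive association."""
    joint = support(transactions, set(x) | set(y))
    return joint / (support(transactions, x) * support(transactions, y))

# Hypothetical transactions: "a" and "b" always appear together.
transactions = [{"a", "b"}, {"a", "b"}, {"c"}, {"d"}]
lift_ab = lift(transactions, {"a"}, {"b"})
print(lift_ab)
```

Here support(a) = support(b) = support(a, b) = 0.5, so the lift is 0.5 / 0.25 = 2.0.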
13 Which one of the following is not a major strength of the neural
network approach?
a. Neural network learning algorithms are guaranteed to converge
to an optimal solution
b. Neural networks work well with datasets containing noisy data. A
c. Neural networks can be used for both supervised learning and
unsupervised clustering
d. Neural networks can be used for applications that require a time
element to be included in the data
14 Examples of qualitative attributes include D
A:) Nominal
B:) Ordinal
C:) Binary
D:) all of these
Given a frequent itemset L, if |L| = k, then there are
a. 2^k - 1 candidate association rules
15 b. 2^k candidate association rules C
c. 2^k - 2 candidate association rules
d. 2^(k-2) candidate association rules
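The count of candidate rules can be verified by enumeration. Every non-empty proper subset of L can serve as the antecedent, with the remaining items as the consequent, so a k-item set yields 2^k - 2 rules (all 2^k subsets minus the empty set and L itself). The function below is an illustrative sketch, not taken from the question bank:

```python
from itertools import combinations

def candidate_rules(itemset):
    """List every rule antecedent -> consequent that splits itemset into two non-empty parts."""
    items = sorted(itemset)
    rules = []
    for r in range(1, len(items)):                 # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

rules_k3 = candidate_rules({"A", "B", "C"})
print(len(rules_k3))   # 2**3 - 2
```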
_____ is an example of case-based learning
a. Decision trees
16 b. Neural networks D
c. Genetic algorithm
d. K-nearest neighbor
The average positive difference between computed and
desired outcome values.
a. mean positive error
17 b. mean squared error C
c. mean absolute error
d. root mean squared error
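The mean absolute error averages the absolute (always non-negative) differences between computed and desired outputs. A minimal sketch with hypothetical values:

```python
def mean_absolute_error(computed, desired):
    """Average of |computed - desired| over paired outcome values."""
    if len(computed) != len(desired) or not computed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(c - d) for c, d in zip(computed, desired)) / len(computed)

# Hypothetical outcomes: absolute differences are 2, 0, 2 and 1.
mae = mean_absolute_error([3, 5, 2, 7], [1, 5, 4, 8])
print(mae)
```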
Frequent item sets is
a. Superset of only closed frequent item sets
b. Superset of only maximal frequent item sets
18 c. Subset of maximal frequent item sets D
d. Superset of both closed frequent item sets and maximal
frequent item sets
Assume that we have a dataset containing information about
200 individuals. A supervised data mining session has
19
discovered the following rule:
IF age < 30 & credit card insurance = yes THEN life
insurance = yes
Rule Accuracy: 70% and Rule Coverage: 63%
C
How many individuals in the class life insurance= no have credit
card insurance and are less than 30 years old?
a. 63
b. 30
c. 38
d. 70
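The arithmetic can be sketched as follows. This assumes one particular reading of the two measures, which the question itself does not spell out: rule coverage is taken as the share of all 200 individuals that satisfy the rule's antecedent, and rule accuracy as the share of those covered individuals for whom the consequent is true. Under other definitions of coverage the numbers would differ.

```python
total_individuals = 200
rule_coverage = 0.63   # assumed: fraction of the dataset matching the antecedent
rule_accuracy = 0.70   # assumed: fraction of matching individuals with life insurance = yes

# Individuals who are under 30 and have credit card insurance.
covered = round(total_individuals * rule_coverage)

# Of those, the individuals for whom the rule is wrong (life insurance = no).
covered_but_no = round(covered * (1 - rule_accuracy))

print(covered, covered_but_no)
```

Under these assumptions 126 individuals match the antecedent, and about 30% of them (37.8, rounded to 38) fall in the class life insurance = no.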
Value set {poor, average, good, excellent} is an example of
a. Nominal attribute
20 D
b. Numeric attribute
c. Continuous attribute
d. Ordinal attribute
Removing duplicate records is a data mining process
called
a. data isolation
21 D
b. recovery
c. data pruning
d. data cleaning
Various visualization techniques are used in the _____ step
of KDD
a. selection
22 B
b. interpretation
c. transformation
d. data mining
Which of the following is not a Visualization Method?
a. Hierarchical visualization technique
23 b. Tuple based visualization Technique B
c. Icon based visualization techniques
d. Pixel oriented visualization technique
Data set {brown, black, blue, green, red} is an example of
a. Continuous attribute
24 D
b. Ordinal attribute
c. Numeric attribute
d. Nominal attribute
Which of the following is NOT a data quality related issue?
a. Attribute value range
25 b. Outlier records A
c. Missing values
d. Duplicate records
To detect fraudulent usage of credit cards, the following data
mining task should be used
a. Outlier analysis
26 A
b. prediction
c. association analysis
d. feature selection
Which of the following is NOT an example of an ordinal attribute?
a. Ordered numbers
27 b. Military ranks C
c. Zip codes
d. Movie ratings
Which of the following is not a data pre-processing method?
a. Data Cleaning
28 b. Data Visualization B
c. Data Discretization
d. Data Reduction
Nominal and ordinal attributes can be collectively referred to
as _____ attributes
a. perfect
29 C
b. consistent
c. qualitative
d. optimized
The number of item sets of cardinality 4 from the item list {A,
B, C, D, E}
30 a. 20 D
b. 2
c. 10
d. 5
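The number of item sets of a given cardinality is a binomial coefficient: choosing 4 items out of 5 gives C(5, 4) = 5. A short check using only the Python standard library:

```python
from itertools import combinations
from math import comb

items = ["A", "B", "C", "D", "E"]

# Enumerate every 4-item subset explicitly.
itemsets_of_size_4 = list(combinations(items, 4))

for itemset in itemsets_of_size_4:
    print(itemset)

# The closed-form count agrees with the enumeration.
print(len(itemsets_of_size_4), comb(len(items), 4))
```

Each 4-item set corresponds to leaving out exactly one of the five items, which is another way to see that there are five of them.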
Identify the example of Nominal attribute
31 a. Salary C
b. Temperature
c. Gender
d. Mass
32 Which of the following are descriptive data mining activities? A
a. Clustering
b. Deviation detection
c. Regression
d. Classification
Which statement is not TRUE regarding a data mining task?
33 D
Select one:
a. Deviation detection is a predictive data mining task
b. Classification is a predictive data mining task
c. Clustering is a descriptive data mining task
d. Regression is a descriptive data mining task
Correlation analysis is used for A
a. identifying redundant attributes
34 b. eliminating noise
c. handling missing values
d. handling different data formats
In Binning, we first sort data and partition into (equal-
frequency) bins and then which of the following is not a valid C
step
Select one:
35 a. smooth by bin boundaries
b. smooth by bin median
c. smooth by bin values
d. smooth by bin means
Which of the following is NOT data mining efficiency/scalability
issue?
Select one:
36 a. The running time of a data mining algorithm D
b. Incremental execution
c. Data partitioning
d. Easy to use user interface
Synonym for data mining is
a. Data Warehouse
37 b. Knowledge discovery in database B
c. Business intelligence
d. OLAP
Data scrubbing can be defined as
Select one: C
38 a. Check field overloading
b. Delete redundant tuples
c. Use simple domain knowledge (e.g., postal code, spell-check)
to detect errors and make corrections
d. Analyzing data to discover rules and relationship to detect
violators
Dimensionality reduction reduces the data set size by removing
a. irrelevant attributes
39 A
b. composite attributes
c. derived attributes
d. relevant attributes
In asymmetric attribute C
a. Range of values is important
40 b. No value is considered important over other values
c. Only non-zero value is important
d. All values are equals
41 -------are the systems that learn the training examples by heart and then D
generalize to new instances based on some similarity measure.
A:) decision tree
B:) Min-Max normalization
C:) Decimal Scaling Normalization
D:) instance-based learning
42 ----------builds the hypotheses from the training instances. It is also known D
as memory-based learning or lazy-learning.
A:) decision tree
B:) Min-Max normalization
C:) Decimal Scaling Normalization
D:) instance-based learning
43 Examples of instance-based learning algorithms are: D
A:) K Nearest Neighbor (KNN)
B:) Self-Organizing Map (SOM)
C:) Learning Vector Quantization (LVQ)
D:) All the above
44 ---------is one of the most popular Supervised Learning algorithms, which is A
used for Classification as well as Regression problems.
A:) Support Vector Machine (SVM)
B:) Self-Organizing Map (SOM)
C:) Learning Vector Quantization (LVQ)
D:) none of these
45 The goal of the-------algorithm is to create the best line or decision A
boundary that can segregate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future.
A:) Support Vector Machine (SVM)
B:) Self-Organizing Map (SOM)
C:) Learning Vector Quantization (LVQ)
D:) none of these
46 SVM chooses the extreme points/vectors that help in creating the----------. B
A:) decision tree
B:) hyperplane
C:) Linear tree
D:) instance-based learning
47 --------is used for linearly separable data, which means if a dataset can be C
classified into two classes by using a single straight line, then such data is
termed as linearly separable data,
A:) decision tree
B:) hyperplane
C:) Linear SVM
D:) Non- Linear SVM
48 ---------is used for non-linearly separated data, which means if a dataset D
cannot be classified by using a straight line, then such data is termed as
non-linear data.
A:) decision tree
B:) hyperplane
C:) Linear SVM
D:) Non- Linear SVM
49 There can be multiple lines/decision boundaries to segregate the classes in B
n-dimensional space, but we need to find out the best decision boundary
that helps to classify the data points. This best boundary is known as the----
------of SVM.
A:) decision tree
B:) hyperplane
C:) Linear SVM
D:) Non- Linear SVM
50 The types of quantitative attributes are-----------------. C
A:) Discrete Attributes
B:) Continuous Attributes
C:) Discrete & Continuous Attributes both
D:) none of the above
206 BA – Data Mining Unit-5 Association Analysis
Q. no Question Answer
Adaptive system management is
1 A) It uses machine-learning techniques. Here, programs can A
learn from past experience and adapt themselves to new
situations.
B) Computational procedure that takes some value as input and
produces some value as output.
C) Science of making machines perform tasks that would require
intelligence when performed by humans.
D) None of these
Bayesian classifiers is
A) A class of learning algorithm that tries to find an optimum
2 A
classification of a set of examples using the probabilistic theory.
B) Any mechanism employed by a learning system to constrain
the search space of a hypothesis.
C) An approach to the design of learning algorithms that is
inspired by the fact that when people encounter new situations,
they often explain them by reference to familiar experiences,
adapting the explanations to fit the new situation.
D) None of these
Algorithm is
A) It uses machine-learning techniques. Here program can learn
3 B
from past experience and adapt themselves to new situations.
B) Computational procedure that takes some value as input and
produces some value as output.
C) Science of making machines perform tasks that would require
intelligence when performed by humans.
D) None of these
Bias is
A) A class of learning algorithm that tries to find an optimum
classification of a set of examples using the probabilistic theory. B
B) Any mechanism employed by a learning system to constrain
the search space of a hypothesis.
C) An approach to the design of learning algorithms that is
4 inspired by the fact that when people encounter new situations,
they often explain them by reference to familiar experiences,
adapting the explanations to fit the new situation.
D) None of these
Background knowledge refers to
A) Additional acquaintance used by a learning algorithm to A
facilitate the learning process.
5 B) A neural network that makes use of a hidden layer.
C) It is a form of automatic learning.
D) None of these
Case-based learning is
A) A class of learning algorithm that tries to find an optimum C
6
classification of a set of examples using the probabilistic theory.
B) Any mechanism employed by a learning system to constrain
the search space of a hypothesis.
C) An approach to the design of learning algorithms that is
inspired by the fact that when people encounter new situations,
they often explain them by reference to familiar experiences,
adapting the explanations to fit the new situation.
D) None of these
Classification is
A) A subdivision of a set of examples into a number of classes. A
B) A measure of the accuracy, of the classification of a concept
that is given by a certain theory.
7
C) The task of assigning a classification to a set of examples
D) None of these
Binary attributes are
A) This takes only two values. In general, these values will be 0 A
and 1 and they can be coded as one bit.
B) The natural environment of a certain species.
8 C) Systems that can be used without knowledge of internal
operations.
D) None of these
Classification accuracy is
B
A) A subdivision of a set of examples into a number of classes
B) Measure of the accuracy, of the classification of a concept that
is given by a certain theory.
9
C) The task of assigning a classification to a set of examples
D) None of these
Biotope are
A) This takes only two values. In general, these values will be 0
and 1 and they can be coded as one bit.
B) The natural environment of a certain species
10 B
C) Systems that can be used without knowledge of internal
operations
D) None of these
Cluster is
A
A) Group of similar objects that differ significantly from other
11
objects
B) Operations on a database to transform or simplify data in
order to prepare it for a machine-learning algorithm
C) Symbolic representation of facts or ideas from which
information can potentially be extracted
D) None of these
Black boxes are
A) This takes only two values. In general, these values will be 0
12 C
and 1 and they can be coded as one bit.
B) The natural environment of a certain species
C) Systems that can be used without knowledge of internal
operations
D) None of these
A definition of a concept is ---- if it recognizes all the instances of
13
that concept
A
A) Complete
B) Consistent
C) Constant
D) None of these
14 Which of the following is not a data mining task? A
Select one:
a. Feature Subset Detection
b. Regression
c. Sequential Pattern Discovery
d. Association Rule Discovery
Which of the following statement is not TRUE for a Tag Cloud
15 a. Tag cloud is a visualization of statistics of user-generated tags B
b. Tag cloud can be used for numeric data only
c. The importance of a tag is indicated by font size or color
d. Tags may be listed alphabetically in a tag cloud
Which of the following data mining task is known as Market
16 Basket Analysis? C
a. Classification
b. Regression
c. Association Analysis
d. Outlier Analysis
Which of the following activities is a data mining task?
A
a. Monitoring the heart rate of a patient for abnormalities
17
b. Dividing the customers of a company according to their
profitability
c. Extracting the frequencies of a sound wave
d. Predicting the outcomes of tossing a (fair) pair of dice
Sorted data (attribute values ) for price are: 4, 8, 9, 15, 21, 21, 24, C
18 25, 26, 28, 29, 34. Identify which is NOT a bin smoothed by
boundaries?
a. Bin 2: 21, 21, 25, 25
b. Bin 1: 4, 4, 4, 15
c. Bin 1: 4, 4, 15, 15
d. Bin 3: 26, 26, 26, 34
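Smoothing by bin boundaries replaces each value with whichever bin boundary (the bin's minimum or maximum) is closer. The sketch below applies this to the price data from the question, using equal-frequency bins of four values; sending ties to the lower boundary is an assumption made here, and the function name is illustrative.

```python
def smooth_by_boundaries(sorted_values, bin_size):
    """Partition sorted values into equal-frequency bins and snap each value to the nearer boundary."""
    smoothed = []
    for start in range(0, len(sorted_values), bin_size):
        bin_values = sorted_values[start:start + bin_size]
        low, high = bin_values[0], bin_values[-1]
        # Ties go to the lower boundary (an assumption; no tie occurs in this data).
        smoothed.append([low if v - low <= high - v else high for v in bin_values])
    return smoothed

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = smooth_by_boundaries(prices, bin_size=4)
for number, values in enumerate(bins, start=1):
    print(f"Bin {number}: {values}")
```

In the first bin, 8 and 9 are both closer to 4 than to 15, so the smoothed bin is 4, 4, 4, 15.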
19 The Data Sets are made up of A
a. Data Objects
b. Attributes
c. Dimensions
d. Database
20 A collection of one or more items is called as _____ A
(a) Itemset
(b) Support
(c) Confidence
(d) Support Count
21 Frequency of occurrence of an itemset is called as _____ C
(a) Support
(b) Confidence
(c) Support Count
(d) Rules
22 An itemset whose support is greater than or equal to a minimum B
support threshold is ______
(a) Itemset
(b) Frequent Itemset
(c) Infrequent items
(d) Threshold values
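The terms in the last three questions can be tied together in a few lines: the support count is the number of transactions containing an itemset, support is that count divided by the number of transactions, and an itemset is frequent when its support meets the minimum threshold. A minimal sketch over hypothetical transactions (names and data are assumptions for illustration):

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

def is_frequent(transactions, itemset, min_support):
    """True when support (count / number of transactions) >= min_support."""
    return support_count(transactions, itemset) / len(transactions) >= min_support

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "milk", "beer"},
    {"bread", "butter"},
    {"butter", "eggs"},
]
print(support_count(transactions, {"bread", "milk"}))            # 3 of 5 transactions
print(is_frequent(transactions, {"bread", "milk"}, 0.5))         # support 0.6
print(is_frequent(transactions, {"beer", "milk"}, 0.5))          # support 0.2
```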
23 What does FP growth algorithm do? C
(a) It mines all frequent patterns through pruning rules with lesser support
(b) It mines all frequent patterns through pruning rules with higher support
(c) It mines all frequent patterns by constructing a FP tree
(d) It mines all frequent patterns by constructing an itemsets
24 What techniques can be used to improve the efficiency of apriori A
algorithm?
(a) Hash-based techniques
(b) Transaction Increases
(c) Sampling
(d) Cleaning
25 Apriori algorithm is otherwise called as_________ B
a. width-wise algorithm
b. level-wise algorithm
c. pincer-search algorithm
d. FP growth algorithm
26 Some fields where Apriori is used ____ D
a) In Education Field: Extracting association rules in data mining
of admitted students through characteristics and specialties.
b) In the Medical field: For example Analysis of the patient’s
database.
c) In Forestry:
d) All of the above
27 Some fields where Apriori is used ____ D
a) In Education Field: Extracting association rules in data mining
of admitted students through characteristics and specialties.
b) In the Medical field: For example Analysis of the patient’s
database.
c) In Forestry:
d) All of the above
28 Which method(s) is/are available for improving the efficiency D
of the algorithm?
a) Hash-Based Technique:
b) Transaction Reduction:
c) Partitioning:
d) All of the above
29 ___ is/are the components that comprise the Apriori algorithm. D
a) Support
b) Confidence
c) Lift
d) all of the above
30 What are the different ways to intrude? D
a) Buffer overflows
b) Unexpected combinations and unhandled input
c) Race conditions
d) All of the mentioned
31 What are the major components of the intrusion detection D
system?
a) Analysis Engine
b) Event provider
c) Alert Database
d) All of the mentioned
32 What are the different ways to classify an IDS? D
a) anomaly detection
b) signature based misuse
c) stack based
d) all of the mentioned
33 What is the major drawback of anomaly detection IDS? B
a) These are very slow at detection
b) It generates many false alarms
c) It doesn’t detect novel attacks
d) None of the mentioned
34 Which of the following is an advantage of anomaly detection? C
A. Rules are easy to define
B. Custom protocols can be easily analyzed
C. The engine can scale as the rule set grows
D. Malicious activity that falls within normal usage patterns is
detected
35 What are the different ways to intrude? D
a) Buffer overflows
b) Unexpected combinations and unhandled input
c) Race conditions
d) All of the mentioned
36 The primary intent of BPM is to _____ when performing D
business processes.
a) diagram results
b) adjust standards
c) record pictures
d) optimize efficiency
37 The _____ development process demonstrates the relationship D
between each early phase of development and the associated
testing phase.
a) spiral
b) prototyping
c) waterfall
d) V-model
38 Advances in healthcare technology can cut down the cost of D
care by _____.
a) Utilizing robots instead of clinical professionals
b) Replacing the need for staff
c) Keeping patients out of the hospital
d) Improving old processes
39 For what purpose do analysis tools pre-compute summaries of D
huge amounts of data?
1) In order to maintain consistency
2) For authentication
3) For data access
4) To obtain the queries response
40 Which of the following statements is incorrect about A
hierarchical clustering?
a) Hierarchical clustering is also known as
HCA
b) The choice of an appropriate metric can influence the
shape of the cluster
c) In general, the splits and merges both are determined in
a greedy manner
d) All of the above
41 In data mining, how many categories of functions are included? C
a) 5
b) 4
c) 2
d) 3
42 For the people who have {"kindle", "iphone"}, which type will D
they be classified as by CBA algorithm?
a) Type 1
b) Type 2
c) Both
d) None of the above
43 There are ____ steps in finding frequent subgraphs. B
1) 1
2) 2
3) 3
4) 4
44 A graph with all vertices having equal degree is known as a ___ B
a) Multi Graph
b) Regular Graph
c) Simple Graph
d) Complete Graph
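A regular graph is one in which every vertex has the same degree. A minimal check, assuming an undirected simple graph stored as a hypothetical adjacency dictionary that maps each vertex to the set of its neighbours:

```python
def is_regular(adjacency):
    """True when every vertex of the graph has the same number of neighbours."""
    degrees = {len(neighbours) for neighbours in adjacency.values()}
    return len(degrees) <= 1

# A 4-cycle: every vertex has degree 2.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}

# A 3-vertex path: the end vertices have degree 1, the middle one has degree 2.
path = {1: {2}, 2: {1, 3}, 3: {2}}

print(is_regular(cycle), is_regular(path))
```

A complete graph is a special case of a regular graph, since each of its n vertices has degree n - 1.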
45 B2B marketing is fundamentally different from consumer C
goods or services marketing because:
a) distribution channels for business products are significantly
longer.
b) customer relationships for business products tend to be
short-term and transactions-based.
c) organizational buyers do not consume the products or
services themselves.
d) customer service plays a smaller role in the distribution of
business products.
46 Which of the following is not one of the main factors of B
business markets?
a) The nature of demand.
b) The buy phases.
c) Buyer-Seller relationships.
d) The buying processes.
47 _______ is (are) the application(s) of Data Mining. D
1) Treatment effectiveness:
2) Healthcare management:
3) Customer relationship management:
4) All of the above
48 The data warehouse is __________. A
a) read only.
b) write only.
c) read write only.
d) none
49 The time horizon in Data warehouse is usually __________. D
a) 1-2 years.
b) 3-4 years.
c) 5-6 years.
d) 5-10 years
50 The data is stored, retrieved & updated in _______. B
a) OLAP.
b) OLTP.
c) SMTP.
d) FTP.