
Predictive Modeling MCQs IMT

The document contains multiple choice questions about statistics and predictive analytics topics. It covers concepts like data distributions, properties of normal distributions, CRISP-DM methodology and metrics to evaluate predictive models. There are 12 questions in total across 3 chapters testing fundamental statistical concepts.

Uploaded by

khushal

Multiple-Choice Questions

Chapter 1
1. Which of the following statements is true?
(a) Statistics is often based on nonparametric algorithms; no guaranteed optimum.
(b) In statistics, models are typically nonlinear.
(c) Statistics algorithms are not as efficient or stable for small data.
(d) In statistics, data is typically smaller, the model is important.
2. What are the challenges in using Predictive Analytics?
(a) Predictive models require data in the form of two-dimensional data (rows and columns).
(b) Often, deployment of predictive models requires a shift in resources for an organization.
(c) The models become too complex because of overfitting.
(d) All of the above.
3. What is the format in which data must be available for predictive modelling?
(a) One-dimension
(b) Two-dimension
(c) Three-dimension
(d) n-Dimension
4. Computational methods to discover and report influential patterns in data are known as
(a) Data mining
(b) Data discovery
(c) Data analytics
(d) All of the above
5. Predictive analytics is the process of
(a) Just cleaning data
(b) Just compressing data
(c) Guessing about present output without any data
(d) Information retrieval to make useful predictions about future outcomes
6. Discovering interesting and meaningful patterns in data is known as
(a) Data analytics
(b) Predictive analytics
(c) Data discovery
(d) All of the above
7. Analyzing and grouping/clustering inputs based on the proximity of input values to one another is
(a) Supervised learning
(b) Unsupervised learning
(c) Descriptive modeling
(d) Both (b) and (c)

Answer Keys

1. (d)  2. (d)  3. (b)  4. (c)  5. (d)  6. (d)  7. (d)
Chapter 2
1. Which of the following is a standard data mining methodology?
(a) CRISP-DM
(b) SPSS
(c) Clementine
(d) Mineset
2. Which of the following is NOT a step in CRISP-DM?
(a) Business understanding
(b) Customer understanding
(c) Data understanding
3. Which of the following experts are critical for success of predictive modelling?
(a) Domain experts
(b) Data or database experts
(c) Predictive modeling experts
(d) All of the above
4. Which of the following should be resolved during Business Understanding?
(a) What data is available to quantify the business objectives.
(b) Examine key summary characteristics about the data.
(c) Begin to enumerate problems with the data.
(d) Visualize data to gain further insights.
5. What is true about predictive modeling algorithms, assuming there are two customer records in the data who are actually brother and sister?
(a) Predictive algorithms treat the two customers' records as dependent
(b) Predictive algorithms must know they are related
(c) Predictive algorithms treat these two no differently than any other two people with similar patterns or behavior
(d) None of the above
6. Assume that we have records of each visit by a customer to a medical shop. Which of the following will be a derived variable?
(a) Customer's name
(b) Average spend in the last month
(c) Date of birth
(d) Doctor who prescribed the medicine
7. Which of the following is a metric used to assess model accuracy in a classification problem?
(a) Average error
(b) Confusion matrix
(c) Median error
(d) Median absolute error
8. Which of the following metrics is best suited to assess model accuracy in a continuous-valued estimation problem?
(a) Percent correct classification
(b) Area under the curve
(c) Average error
(d) Confusion matrix
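
The metrics named in questions 7 and 8 can be sketched in a few lines. This is a minimal illustration rather than any particular library's implementation; the labels and estimates below are made up for the example.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def percent_correct(actual, predicted):
    """Percent correct classification (PCC): share of matching labels."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)

def average_error(actual, predicted):
    """Mean signed error for a continuous-valued target."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical class labels and numeric estimates, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
cm = confusion_matrix(y_true, y_pred)
pcc = percent_correct(y_true, y_pred)
avg_err = average_error([10.0, 20.0, 30.0], [12.0, 19.0, 33.0])
```

The confusion matrix and PCC apply to class labels, while the average error applies to numeric estimates, which is the distinction the two questions draw.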
9. According to CRISP-DM, how many phases are there in a data-mining project life cycle?
(a) Five
(b) Six
(c) Four
(d) Seven
10. Most frequent metric to assess model accuracy in classification problems is
(a) PCC
(b) ROC
(c) AUC
(d) None of the above
11. ________ determine magnitude of error
(a) Average errors
(b) Mean squared error
(c) Median error
(d) Average absolute error
12. Mean, median, and mode give clear picture about data spread and variability.
(a) True
(b) False

Answer Keys

1. (a)  2. (b)  3. (d)  4. (a)  5. (c)  6. (b)  7. (b)  8. (c)  9. (b)  10. (a)  11. (d)  12. (b)
Chapter 3
1. What is true about a distribution measured by kurtosis?
(a) Kurtosis is always negative.
(b) Normal distribution will have a Kurtosis value of 2.
(c) A leptokurtic distribution is one in which Kurtosis value is more than 3.
(d) A platykurtic distribution is one in which Kurtosis value is more than 3.
2. Which of the following statements is true?
(a) The median would not change much when there is a single large outlier.
(b) The mean would not change much when there is a single large outlier.
(c) The mean is defined as the value that is exactly 50 percent of the way from the minimum to maximum value of the variable.
(d) The calculation of mean requires the data to be first sorted.
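
The outlier behaviour that question 2 asks about can be checked directly. The sketch below uses a small made-up sample and Python's standard `statistics` module; the numbers are assumptions chosen only to make the effect visible.

```python
from statistics import mean, median

values = [10, 12, 13, 15, 16]      # hypothetical sample
with_outlier = values + [1000]     # same sample plus one large outlier

# How far each statistic moves once the outlier is added.
mean_shift = mean(with_outlier) - mean(values)
median_shift = median(with_outlier) - median(values)
```

In this sample the mean moves by more than 100 while the median moves by 1.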
3. What is the correct number of two-way combinations/interactions possible if the number of variables is 5?
(a) The number of possible two-way interactions is 2.
(b) The number of possible two-way interactions is 5.
(c) The number of possible two-way interactions is 10.
(d) The number of possible two-way interactions is 20.
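
Counting two-way interactions is a matter of choosing 2 variables out of n without regard to order, that is n(n-1)/2. A minimal sketch, assuming five placeholder variable names that do not come from the source:

```python
from itertools import combinations
from math import comb

variables = ["x1", "x2", "x3", "x4", "x5"]   # hypothetical variable names

pairs = list(combinations(variables, 2))     # every unordered pair
n_pairs = comb(len(variables), 2)            # closed form: n * (n - 1) / 2
```

Enumerating the pairs and applying the closed form give the same count.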
4. If the value of a variable can range from negative infinity to positive infinity, what is the type of this variable?
(a) Categorical variables
(b) Continuous variables
(c) Binary variables
(d) Numeric variables
5. Which of the following is a property of normal distribution?
(a) The distribution is asymmetric.
(b) The mean and the median are not the same value.
(c) The median and the mode are not the same value.
(d) The mean, median and mode are all the same value.
6. Which of the following is a property of normal distribution?
(a) Approximately 95% of the data will fall between the mean and +/-1 standard deviation from the mean.
(b) Approximately 95% of the data will fall between the mean and +/-2 standard deviations from the mean.
(c) Approximately 95% of the data will fall between the mean and +/-3 standard deviations from the mean.
(d) Approximately 95% of the data will fall between the mean and +/-4 standard deviations from the mean.
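
The share of a normal distribution lying within k standard deviations of the mean follows from the error function, P(|Z| <= k) = erf(k / sqrt(2)). The sketch below evaluates it for k = 1, 2, 3; the helper name `coverage` is ours, not the source's.

```python
from math import erf, sqrt

def coverage(k):
    """Probability that a standard normal value lies within k standard deviations."""
    return erf(k / sqrt(2))

# Percentages for 1, 2 and 3 standard deviations, rounded to two decimals.
within = {k: round(100 * coverage(k), 2) for k in (1, 2, 3)}
```

This gives roughly 68%, 95% and 99.7%, the usual rule of thumb for the normal distribution.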
7. Which of the following is a property of Uniform Distribution?
(a) The distribution is asymmetric about the mean.
(b) The distribution is infinite.

(b) 0
(c) 1
(d) Greater than 1
9. What is the phenomenon called when a trend is seen in individual variables, but is reversed when variables are combined?
(a) Simpson's paradox
(b) Redskin Rule
(c) Anscombe's Quartet
(d) Platykurtic
10. Which of the following is not a property of normal distribution?
(a) It is symmetric
(b) Mean, median and mode are all same
(c) It is also called bell curve
(d) 86% of data lies between the mean and the +/-1 standard deviation
11. Generally, one-dimensional data distribution visualization can be done using
(a) Scatter plot
(b) Histogram
(c) Scatterplot matrices
(d) Anscombe's quartet
12. Which of these is true about Uniform Distribution?
(a) The distribution is infinite
(b) Mean and midpoint are different
(c) Distribution is symmetric about the mean
(d) None of the above
13. Which of these is false about correlations between two variables?
(a) Measures the numerical relationship of one variable to others
(b) One variable's meaning is related to another's
(c) Both of these
(d) None of the above

Answer Keys

3. (c)  4. (b)  5. (d)  6. (b)  7. (c)  8. (b)  9. (a)  10. (d)  11. (b)  12. (c)  13. (b)
Chapter 4
1. What is NOT a key step in data preparation related to the columns in the data?
(a) Variable naming
(b) Variable cleaning
(c) Variable selection
(d) Feature creation
2. What is NOT a key step in data preparation related to the rows in the data?
(a) Record Selection
(b) Sampling
(c) Feature Creation
(d) Record Archiving
3. What are the different approaches to handle outliers in data?
(a) Remove the outliers from the modeling data
(b) Separate the outliers and create separate models just for outliers
(c) Transform the outliers so that they are no longer outliers
(d) Bin the data
(e) All of the above
4. Which of the following statements is correct?
(a) Missing Completely at Random (MCAR) implies a conditional relationship between the missing value and other variables
(b) Missing at Random (MAR) means that there is no way to determine what the value should have been
(c) Missing Not at Random (MNAR) means the missing value can be inferred in general by the mere fact that the value is missing
(d) All of the above
5. Which of the following is a typical method to correct negative skew in distribution?
(a) Log Transform, e.g. log(x)
(b) Multiplicative Inverse 1/x
(c) Square Root or sqrt(x)
(d) Power Transform, e.g. x^2
6. If the distribution has spikes, what is a good corrective action?
(a) Binning into regions centered on spikes
(b) Log10 transform
(c) Power transform
(d) Flip transform
7. Which of the following is NOT a Single Variable Selection Technique?
(a) Chi-square Test
(b) Simpson's Paradox
(c) ANOVA
(d) Linear regression forward selection (1 step)
8. What ... sampling ...?
(a) Cross Validation
(b) Curse of dimensionality
(c) Rule of 11
(d) Temporal Sequencing
9. MCAR stands for
(a) Missing completely at random
(b) Missing conditional at random
(c) Missing convolute at random
(d) None of these
10. Missing value is
(a) mm
(b) zero
(c) False
(d) null
11. In general, min-max normalization changes range of a variable to
(a) -100 to 100
(b) -50 to 50
(c) -1 to 1
(d) 0 to 1
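
Min-max normalization rescales each value by (x - min) / (max - min), which maps the smallest value to the lower bound and the largest to the upper bound. A minimal sketch, assuming a default target range of 0 to 1 and made-up input values; the function name `min_max` is ours:

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]

scaled = min_max([20, 35, 50, 80])   # hypothetical raw values
```

Passing other bounds, for example `min_max(values, -1.0, 1.0)`, gives a different target range with the same linear mapping.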
12. ... is more accurate on the data used to train it
(a) Underfitting
(b) Overfitting
(c) Randomness
(d) All of the above

Answer Keys
1. (a)  2. (d)  3. (c)  4. (c)  5. (d)  6. (a)  7. (b)  8. (a)  9. (a)  10. (d)  11. (d)  12. (b)
Chapter 5
The following is an example database/dataset of a supermarket's transactions and five items (milk, butter, bread, beer, diapers). The presence of an item in a transaction is indicated by 1 in the item column, 0 otherwise.

[Table: transaction ID and one column per item; cell values not recoverable]

Following questions are based on the above sample database:


1. Consider an example rule for the supermarket {butter, bread} -> {milk}, meaning that if butter and bread are bought, customers also buy milk. In this rule, which is the antecedent?
(a) {butter, bread}
(b) {milk}
(c) {butter}
(d) {bread}
2. Consider an example rule for the supermarket {beer} -> {diaper}, meaning that if beer was bought, customers also buy diaper. In this rule, which is the consequent?
(a) {beer}
(b) {diaper}
(c) ->
(d) None of the above
3. In the example dataset of supermarket given above, what is the support for {bread} -> {milk}?
(a) 40%
(b) 60%
(c) 66.67%
(d) 33.33%
4. In the sample example dataset table of supermarket, what is the Antecedent Support for {bread} -> {milk}?
(a) 40%
(b) 60%
(c) 66.67%
(d) 33.33%
5. In the sample example dataset table of supermarket, what is the Confidence for {bread} -> {milk}?
(a) 40%
(b) 60%
(c) 66.67%
(d) 33.33%
6. In the sample example dataset table of supermarket, what is the Lift for {milk, bread} -> {butter}?
(a) 1
(b) 2.5
(c) 4
(d) None of the above
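
Support, antecedent support, confidence and lift can all be computed from a list of baskets. The sketch below is illustrative only: the five transactions are an assumed example, not the source's table, and the function names are ours.

```python
# Assumed five-transaction basket data, not taken from the source's table.
transactions = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence divided by how often the consequent occurs on its own."""
    return confidence(antecedent, consequent) / support(consequent)

rule_support = support({"bread", "milk"})           # rule {bread} -> {milk}
antecedent_support = support({"bread"})
rule_confidence = confidence({"bread"}, {"milk"})
rule_lift = lift({"milk", "bread"}, {"butter"})     # rule {milk, bread} -> {butter}
```

On this assumed data the rule {bread} -> {milk} has support 0.4, antecedent support 0.6 and confidence of about 0.667, and {milk, bread} -> {butter} has lift 1.25.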
7. If the sample example dataset table of supermarket is converted to
Transactional Format, how many rows of data will be present?
(a) 5
(b) 25
(c) 10
(d) 9
8. Which of the following statements is true?
(a) Data in Standard predictive analytics format can have extraordinarily
large number of columns.
(b) In Standard predictive analytics format, representation of data will be
sparse.
(c) Data in Standard predictive analytics format will normally have lesser
number of rows compared to Transactional Format.
(d) All of the above.
9. ________ is defined as the number of times a rule occurs in the data divided by the number of transactions in the data.
(a) Antecedent support
(b) Confidence
(c) Accuracy
(d) Support
10. ________ is a measure of how many times more likely the consequent will occur when antecedent is true compared to how often the consequent occurs on its own.
(a) Support
(b) Confidence
(c) Accuracy
(d) Lift
11. Generally, ________ is a data format in which there are only a few columns, but many rows.
(a) Standard predictive modelling data format
(b) Transactional format
(c) Key value
(d) None of these
Applied Predictive Analytics

Answer Keys

1. (a)  2. (b)  3. (a)  4. (b)  7. (d)  8. (d)
Chapter 6
1. Which of the following statements is incorrect?
(a) Descriptive modeling algorithms are also called unsupervised learning methods.
(b) Descriptive modeling algorithms try to find relationships between inputs.
(c) Descriptive modeling algorithms discover the best way to segment the data.
(d) Descriptive modeling algorithms try to find relationships that associate inputs to one or more target variables.
2. Which of the following statements is incorrect?
(a) Decision Tree is a commonly used unsupervised modeling algorithm.
(b) K-Means clustering is a commonly used unsupervised modeling algorithm.
(c) Kohonen Self-Organizing Maps (SOM) is a commonly used unsupervised modeling algorithm.
(d) Principal Component Analysis (PCA) is a commonly used unsupervised modeling algorithm.
3. Which of the following statements is incorrect?
(a) Inputs must be numeric for K-Means clustering algorithm.
(b) Kohonen Self-Organizing Maps (SOM) needs all data to be populated, there can be no missing values.
(c) Inputs need not be numeric for Kohonen Self-Organizing Maps (SOM) algorithm.
(d) When using Principal Component Analysis (PCA), any categorical variable to be included in the model must be converted to a number.
4. Which of the following algorithms is best suited for reducing the number of inputs for predictive models?
(a) K-Means clustering
(b) Kohonen Self-Organizing Maps (SOM)
(c) Principal Component Analysis (PCA)
(d) All of the above
5. Which of the following is NOT one of the distance metrics used in building the K-Means clustering model?
(a) Mahalanobis distance metric
(b) Milwaukee distance metric
(c) Manhattan distance metric
(d) Minkowski distance metric
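
Two of the metrics listed in question 5 are closely related: the Minkowski distance with exponent p reduces to the Manhattan distance at p = 1 and to the Euclidean distance at p = 2. A minimal sketch with two made-up points (Mahalanobis distance is left out because it also needs a covariance matrix):

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

point_a, point_b = (0.0, 0.0), (3.0, 4.0)   # hypothetical points

manhattan = minkowski(point_a, point_b, 1)  # sum of absolute differences
euclidean = minkowski(point_a, point_b, 2)  # straight-line distance
```

For these points the Manhattan distance is 7 and the Euclidean distance is 5.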
6. Which of the following is widely used as unsupervised learning neural network algorithm?
(a) Perceptron
(b) Kohonen Self-Organizing Map (SOM)
(c) Both Perceptron and Kohonen Self-Organizing Map (SOM)
(d) None of the above
7. In K-Means, what is the number of clusters in the data?
(a) Algorithm will determine the same dynamically
(b) It must be pre-specified
(c) It is always 2
(d) It is always 3
8. Which of these is not an unsupervised modeling algorithm?
(a) K-means clustering
(b) Kohonen maps
(c) Self-organizing maps (SOMs)
(d) Linear regression
9. In K-means clusters, model parameters are defined by
(a) A number of weights
(b) Number of clusters
(c) Value, one per unit
(d) One per unit
10. Generally, in Kohonen map, number of nodes are
(a) Post determined after plotting map
(b) Predetermined
(c) Predetermined by length and width of map
(d) Randomly

Answer Keys

1. (d)  2. (a)  3. (c)  4. (c)  5. (b)  6. (c)  7. (b)  8. (d)  9. (c)  10. (c)
Chapter 7
1. How cluster differs from one another is ________ problem.
(a) unsupervised learning
(b) supervised learning
(c) reinforcement learning
(d) hybrid
2. Software provides useful information about cluster but fails to explain about ________.
(a) how clusters are formed by algorithm
(b) meaning of cluster
(c) Both (a) and (b)
(d) None of the above
3. If software does not provide summaries, then it is impossible to generate summary from clusters.
(a) True
(b) False
4. Dummy variable ________ is helpful to reduce bias with dummy variables.
(a) removal
(b) scaling
(c) inclusion
(d) None of the above
5. Table 7.2 shows that cluster 1 and 2 have higher number of gifts than average gifts.

Table 7-2: Cluster Centers for K-Means 3-Cluster Model

Variable               Cluster 1   Cluster 2   Cluster 3   Overall
# Records in Cluster       8,538       8,511      30,656    47,705
LASTDATE                   0.319       0.304       0.179     0.225
FISTDATE                   0.886       0.885       0.908     0.900
RFA_2F                     0.711       0.716       0.074     0.303
D_RFA_2A                   0.382       0.390       0.300     0.331
E_RFA_2A                   0.499       0.500       0.331     0.391
F_RFA_2A                   0.369       0.366       0.568     0.496
DOMAIN3                    0.449       0.300       0.368     0.370
DOMAIN2                    0.300       0.700       0.489     0.493
DOMAIN1                    0.515       0.300       0.427     0.420
NGIFTALL_log10             0.384       0.385       0.233     0.287
LASTGIFT_log10             0.348       0.343       0.430     0.400

(a) True
(b) False
6. If data is applied to a clustering algorithm ________, then it is difficult to understand summaries after clustering.
(a) without normalization
(b) with normalization
(c) with compression
(d) none of the above
7. Generally, interval and ratio variables are problematic to interpret.
(a) True
(b) False
8. As a rule of thumb or guiding principle, the ANOVA method works ________ when there are ________ clusters.
(a) worst, small no. of
(b) best, small no. of
(c) best, large no. of
(d) worst, large no. of
9. Hierarchical clustering works well with large number of records.
(a) True
(b) False
10. Decision trees are not distance-based algorithms and therefore are ________ by ________ and skewed distributions.
(a) unaffected, outliers
(b) affected, outliers
(c) affected, normalized
(d) unaffected, normalized
11. In a multivariable problem, ANOVA determines which variables have the most significant difference in ________ values between the clusters.
(a) mean
(b) variance
(c) error
(d) none
12. Mean value for variables in each cluster is called
(a) Cluster mean
(b) Cluster average
(c) Cluster center
(d) Cluster median

Answer Key:
1.

2((3 a(a) 9.(b)


1m M mm
7. (a)
(b)
4.
&(b) n
12.
(l)
(c)
Chapter 9

1. The choice of the model assessment metric should be tied to ________ rather than ________.
(a) operational considerations, algorithmic expedience
(b) algorithmic expedience, operational considerations
(c) algorithmic considerations, operational expedience
(d) None
2. Model assessment should be done first on
(a) training data
(b) new data
(c) test data
(d) None of the above
3. Generally, a ________ built classifier always has percent correct classification (PCC) in the numeric range of ________.
(a) well, 50 to 100
(b) badly, 50 to 100
(c) badly, 0 to 10
(d) well, 0 to 10
4. A lift of a model is a ratio of model accuracy to accuracy of a random guess.
(a) True
(b) False
5. You must always be aware of the base rate to ensure models with a large baseline rate are not perceived as ________ models.
(a) good
(b) poor
(c) best
(d) None of the above
6. Which of the following confusion matrix measures uses all quadrants of the confusion matrix?
(a) PCC
(b) Recall
(c) Precision
(d) None of the above
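
The measures named in question 6 differ in which quadrants of a two-class confusion matrix they read. The sketch below uses the standard definitions with made-up counts; the function name `rates` is ours.

```python
def rates(tp, fp, fn, tn):
    """Common measures from the four quadrants of a two-class confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "pcc": 100.0 * (tp + tn) / total,   # reads all four quadrants
        "precision": tp / (tp + fp),        # reads predicted positives only
        "recall": tp / (tp + fn),           # reads actual positives only
    }

# Hypothetical counts, for illustration only.
measures = rates(tp=40, fp=10, fn=20, tn=30)
```

With these counts PCC is 70%, precision is 0.8 and recall is about 0.667.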
'
themetricis thepercentage
of foundby themodel.
(2)35am,
(b) Is
(c) 25
(d
(d) None of the above
8. The model assessment should closely match business objectives.
(a) True
(b) False
9. In assessing regression models, the value of R2 should be
(a) Fixed
(b) depends on application
(c) 0.3
(d) None of the above
10. Which of the following are commonly used metrics for regression problems?
(a) Average absolute error and R2
(b) R2 and average squared error
(c) Average percentage error and R2
(d) All of the above
11. Which one is the target for optimizing model parameters in Linear Regression?
(a) Minimizes cross entropy
(b) Minimized distance between data
(c) Minimizes squared mean error
(d) Minimizes mean squared error
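
Ordinary least squares, the usual way to fit a linear regression, picks the slope and intercept that make the average squared residual as small as possible. A minimal one-variable sketch with made-up data follows; it uses the textbook closed form and is not a reconstruction of any specific software's procedure.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(xs, ys, slope, intercept):
    """Average of the squared residuals for a given line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]     # hypothetical inputs
ys = [2.9, 5.2, 6.8, 9.1]     # hypothetical noisy targets
slope, intercept = fit_line(xs, ys)
best = mean_squared_error(xs, ys, slope, intercept)
nudged = mean_squared_error(xs, ys, slope + 0.1, intercept)
```

Moving the slope away from the fitted value, as `nudged` does, can only raise the error on the same data.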

Answer Keys

1. (b)  4. (a)  5. (a)  6. (a)  7. (b)  8. (a)  9. (b)  10. (b)  11. (c)
Chapter 10
1. Model ensembles improve model accuracy and robustness.
(a) True
(b) False
2. The best models have
(a) high bias, low variance.
(b) high bias, high variance.
(c) low bias, high variance.
(d) low bias, low variance.
3. ________ is an important requirement for building good bagged ensembles.
(a) Underfitting the model
(b) Overfitting the model
(c) Exact fitting the model
(d) None of the above
4. In boosting algorithm, final predictions are made based on ________ of predictions from all models.
(a) average
(b) median
(c) weighted average
(d) None of the above
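
The options in question 4 contrast two ways of combining the predictions of several models: a plain average, which treats every model alike, and a weighted average, which lets some models count more. The sketch below uses made-up predictions and weights; real boosting algorithms derive the weights from each model's error, which is not shown here.

```python
def weighted_average(predictions, weights):
    """Combine model predictions, letting higher-weight models count more."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)

predictions = [0.2, 0.6, 0.9]   # hypothetical outputs of three models
weights = [1.0, 2.0, 3.0]       # hypothetical per-model weights

plain = sum(predictions) / len(predictions)        # every model counts equally
weighted = weighted_average(predictions, weights)  # later models count more
```

With equal weights the two results coincide, so the plain average is a special case of the weighted one.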
5. At each split in the tree, rather than considering all input variables as candidates, only a random subset of variables is considered in random forest.
(a) True
(b) False
6. TreeNET has proven to be an accurate predictor with the benefit that very little data cleanup is needed for the trees before modeling.
(a) True
(b) False
7. Ensembles are the methods which not only increase model accuracy but also
(a) they increase only model sensitivity.
(b) they reduce risk of deploying poor model.
(c) they reduce error.
(d) None of the above.
8. Sometimes the ensemble will significantly reduce the behavioral complexity.
(a) True
(b) False
9. Ensembles are often considered black box models, meaning that what they do is not transparent to the modeler or domain expert.
(a) True
(b) False
10. Ensembles are an appropriate solution to all problems.
(a) True
(b) False
11. In general, regression requires for good results if the high-complexity model has a ________ bias, but it has a ________ variance for training data set.
(a) High, low
(b) Low, high
(c) Low, low
(d) High, high

Answer Keys

1. (a)  2. (d)  3. (b)  4. (c)  5. (a)  6. (a)  7. (b)  8. (a)  9. (a)  10. (b)  11. (a)
