
Machine Learning MCQ Guide

This document is a multiple-choice question bank on machine learning concepts, with later questions moving on to data mining and data warehousing. The first 29 questions cover topics such as adaptive system management, Bayesian classifiers, algorithms, bias, background knowledge, case-based learning, classification, binary attributes, classification accuracy, clusters, black boxes, data mining, discovery, DNA, hybrid systems, Euclidean distance, hidden knowledge, heterogeneous databases, enumeration, heuristics, hybrid learning, Kohonen self-organizing maps, and incremental learning. Each question is followed by several possible answers, with the correct answer identified.

Uploaded by

Dip
Copyright
© All Rights Reserved

MCQ for unit 1: Introduction to Machine learning

1) Adaptive system management is

A) It uses machine-learning techniques: programs can learn
from past experience and adapt themselves to new situations.
B) Computational procedure that takes some value as input and
produces some value as output.
C) Science of making machines perform tasks that would require
intelligence when performed by humans.
D) None of these

Answer: A

2) A Bayesian classifier is

A) A class of learning algorithms that tries to find an optimum
classification of a set of examples using probability
theory.
B) Any mechanism employed by a learning system to constrain the
search space of a hypothesis.
C) An approach to the design of learning algorithms that is
inspired by the fact that when people encounter new situations,
they often explain them by reference to familiar experiences,
adapting the explanations to fit the new situation.
D) None of these

Answer: A
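
To make option A concrete, here is a naive Bayes classifier, one common kind of Bayesian classifier that additionally assumes attributes are independent given the class. This is a minimal sketch under stated assumptions: categorical attribute values, add-one (Laplace) smoothing, and a tiny made-up weather dataset. The function names `train_naive_bayes` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count class frequencies and per-class attribute-value frequencies.

    examples: list of (attributes, label) pairs, where attributes is a dict
    mapping attribute names to categorical values.
    """
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (label, attribute) -> value counts
    vocab = defaultdict(set)             # attribute -> values seen in training
    for attrs, label in examples:
        for attr, value in attrs.items():
            value_counts[(label, attr)][value] += 1
            vocab[attr].add(value)
    return class_counts, value_counts, vocab


def classify(model, attrs):
    """Return the label that maximises P(label) * product of P(value | label)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for attr, value in attrs.items():
            # Add-one smoothing keeps an unseen value from zeroing the product
            seen = value_counts[(label, attr)][value]
            score *= (seen + 1) / (count + len(vocab[attr]))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training data
examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
]
model = train_naive_bayes(examples)
print(classify(model, {"outlook": "rainy", "windy": "yes"}))  # stay
```

The prediction is the class with the highest posterior score, which is exactly the "optimum classification using probability theory" idea in option A.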

3) Algorithm is

A) It uses machine-learning techniques: programs can learn
from past experience and adapt themselves to new situations.
B) Computational procedure that takes some value as input and
produces some value as output.
C) Science of making machines perform tasks that would require
intelligence when performed by humans.
D) None of these

Answer: B

4) Bias is

A) A class of learning algorithms that tries to find an optimum
classification of a set of examples using probability
theory.
B) Any mechanism employed by a learning system to constrain the
search space of a hypothesis.
C) An approach to the design of learning algorithms that is
inspired by the fact that when people encounter new situations,
they often explain them by reference to familiar experiences,
adapting the explanations to fit the new situation.
D) None of these

Answer: B

5) Background knowledge refers to

A) Additional knowledge used by a learning algorithm to
facilitate the learning process.
B) A neural network that makes use of a hidden layer.
C) It is a form of automatic learning.
D) None of these

Answer: A

6) Case-based learning is

A) A class of learning algorithms that tries to find an optimum
classification of a set of examples using probability
theory.
B) Any mechanism employed by a learning system to constrain the
search space of a hypothesis.
C) An approach to the design of learning algorithms that is
inspired by the fact that when people encounter new situations,
they often explain them by reference to familiar experiences,
adapting the explanations to fit the new situation.
D) None of these

Answer: C
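
The case-based idea in option C can be sketched as a retrieve, reuse, and retain loop: find the stored case most similar to the new situation, reuse its solution, and store the new experience. This is a minimal sketch, assuming cases are attribute dictionaries and that similarity is simply the number of matching attributes (a hypothetical choice). It omits the adaptation step the definition mentions: the retrieved solution is reused unchanged.

```python
def overlap(problem, features):
    """Similarity = number of attributes on which the two descriptions agree."""
    return sum(1 for key, value in problem.items() if features.get(key) == value)


def solve(case_base, problem):
    """Retrieve the most similar stored case, reuse its solution unchanged,
    and retain the new problem with that solution as a new case."""
    _, solution = max(case_base, key=lambda case: overlap(problem, case[0]))
    case_base.append((problem, solution))
    return solution


# Hypothetical case base of past PC faults and their fixes
case_base = [
    ({"power_led": "off", "fan": "silent", "beeps": 0}, "check the power supply"),
    ({"power_led": "on", "fan": "running", "beeps": 3}, "reseat the memory"),
]
print(solve(case_base, {"power_led": "on", "fan": "running", "beeps": 2}))  # reseat the memory
```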

7) Classification is

A) A subdivision of a set of examples into a number of classes.
B) A measure of the accuracy of the classification of a
concept that is given by a certain theory.
C) The task of assigning a classification to a set of examples
D) None of these

Answer: A

8) Binary attributes are

A) Attributes that take only two values. In general, these values are
0 and 1, and they can be coded as one bit.
B) The natural environment of a certain species.
C) Systems that can be used without knowledge of internal
operations.
D) None of these

Answer: A
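
Because each binary attribute needs only one bit, several of them can be packed into a single integer. This is a minimal sketch; the attribute names and the functions `encode` and `decode` are hypothetical.

```python
def encode(record, attributes):
    """Pack binary (0/1) attribute values into one integer, one bit each."""
    bits = 0
    for position, name in enumerate(attributes):
        value = record[name]
        if value not in (0, 1):
            raise ValueError(f"attribute {name!r} is not binary: {value!r}")
        bits |= value << position
    return bits


def decode(bits, attributes):
    """Recover each attribute value by reading back its bit."""
    return {name: (bits >> position) & 1 for position, name in enumerate(attributes)}


ATTRIBUTES = ["married", "owns_car", "has_children"]
packed = encode({"married": 1, "owns_car": 0, "has_children": 1}, ATTRIBUTES)
print(packed)  # 5 (binary 101)
```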

9) Classification accuracy is

A) A subdivision of a set of examples into a number of classes
B) A measure of the accuracy of the classification of a concept
that is given by a certain theory.
C) The task of assigning a classification to a set of examples
D) None of these

Answer: B
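
In practice, classification accuracy is usually reported as the fraction of examples whose predicted class matches the true class. A minimal sketch with made-up labels; `classification_accuracy` is a hypothetical name.

```python
def classification_accuracy(predicted, actual):
    """Fraction of examples whose predicted class equals the true class."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


print(classification_accuracy(["yes", "no", "yes", "yes"],
                              ["yes", "no", "no", "yes"]))  # 0.75
```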

10) A biotope is

A) Attributes that take only two values. In general, these values are
0 and 1, and they can be coded as one bit.
B) The natural environment of a certain species
C) Systems that can be used without knowledge of internal
operations
D) None of these

Answer: B

11) Cluster is

A) A group of similar objects that differ significantly from
other objects
B) Operations on a database to transform or simplify data in
order to prepare it for a machine-learning algorithm
C) Symbolic representation of facts or ideas from which
information can potentially be extracted
D) None of these

Answer: A

12) Black boxes are

A) Attributes that take only two values. In general, these values are
0 and 1, and they can be coded as one bit.
B) The natural environment of a certain species
C) Systems that can be used without knowledge of internal
operations
D) None of these

Answer: C

13) A definition of a concept is ----- if it recognizes all the
instances of that concept.

A) Complete
B) Consistent
C) Constant
D) None of these

Answer: A

14) Data mining is

A) The actual discovery phase of a knowledge discovery process
B) The stage of selecting the right data for a KDD process
C) A subject-oriented, integrated, time-variant, non-volatile
collection of data in support of management decisions
D) None of these

Answer: A

15) A definition of a concept is ----- if it does not classify any
negative examples as coming within the concept.

A) Complete
B) Consistent
C) Constant
D) None of these

Answer: B

16) Data selection is

A) The actual discovery phase of a knowledge discovery process
B) The stage of selecting the right data for a KDD process
C) A subject-oriented, integrated, time-variant, non-volatile
collection of data in support of management decisions
D) None of these

Answer: B

17) A classification task refers to

A) A subdivision of a set of examples into a number of classes
B) A measure of the accuracy of the classification of a
concept that is given by a certain theory.
C) The task of assigning a classification to a set of examples
D) None of these

Answer: C

18) DNA (deoxyribonucleic acid) is

A) Information hidden within a database that can only be recovered if
one is given certain clues (for example, encrypted
information).
B) The process of extracting implicit, previously unknown, and
potentially useful information from data
C) An extremely complex molecule that occurs in human
chromosomes and that carries genetic information in the form of
genes.
D) None of these

Answer: C

19) Hybrid is

A) Combining different types of methods or information
B) Approach to the design of learning algorithms that is
structured along the lines of the theory of evolution.
C) Decision support systems that contain an information base
filled with the knowledge of an expert formulated in terms of
if-then rules.
D) None of these

Answer: A

20) Discovery is

A) Information hidden within a database that can only be recovered if
one is given certain clues (for example, encrypted
information).
B) The process of extracting implicit, previously unknown, and
potentially useful information from data.
C) An extremely complex molecule that occurs in human
chromosomes and that carries genetic information in the form of
genes.
D) None of these

Answer: B

21) Euclidean distance measure is

A) A stage of the KDD process in which new data is added to the
existing selection.
B) The process of finding a solution for a problem simply by
enumerating all possible solutions according to some pre-defined
order and then testing them
C) The distance between two points as calculated using the
Pythagorean theorem.
D) None of these

Answer: C
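
Generalizing the Pythagorean theorem to n dimensions gives d(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2). A minimal sketch of that formula; the function name is hypothetical, and Python 3.8+ also ships the same computation as `math.dist`.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))  # 5.0, the 3-4-5 right triangle
```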

22) Hidden knowledge refers to

A) A set of databases from different vendors, possibly using
different database paradigms
B) An approach to a problem that is not guaranteed to work but
performs well in most cases
C) Information that is hidden in a database and that cannot be
recovered by a simple SQL query.
D) None of these

Answer: C

23) Enrichment is

A) A stage of the KDD process in which new data is added to the
existing selection
B) The process of finding a solution for a problem simply by
enumerating all possible solutions according to some pre-defined
order and then testing them
C) The distance between two points as calculated using the
Pythagorean theorem.
D) None of these

Answer: A

24) Heterogeneous databases refer to

A) A set of databases from different vendors, possibly using
different database paradigms
B) An approach to a problem that is not guaranteed to work but
performs well in most cases.
C) Information that is hidden in a database and that cannot be
recovered by a simple SQL query.
D) None of these

Answer: A

25) Enumeration refers to

A) A stage of the KDD process in which new data is added to the
existing selection.
B) The process of finding a solution for a problem simply by
enumerating all possible solutions according to some pre-defined
order and then testing them
C) The distance between two points as calculated using the
Pythagorean theorem.
D) None of these

Answer: B
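
Option B can be illustrated with the subset-sum problem: list every subset in a fixed order and test each one. This is a minimal sketch with a hypothetical function name; note that a set of n numbers has 2^n subsets, which is why pure enumeration only suits small problems.

```python
from itertools import combinations


def find_subset_with_sum(numbers, target):
    """Enumerate subsets in a fixed order (smallest size first, then in the
    order itertools.combinations yields them) and return the first one whose
    sum equals target, or None if no subset works."""
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:  # test each candidate solution
                return subset
    return None


print(find_subset_with_sum([3, 34, 4, 12, 5, 2], 9))  # (4, 5)
```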

26) Heuristic is

A) A set of databases from different vendors, possibly using
different database paradigms
B) An approach to a problem that is not guaranteed to work but
performs well in most cases
C) Information that is hidden in a database and that cannot be
recovered by a simple SQL query.
D) None of these

Answer: B
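
Greedy coin change is a classic heuristic in the sense of option B: always take the largest coin that still fits. It works well for many coin systems but is not guaranteed to find the fewest coins. This is a minimal sketch comparing it with an exact dynamic-programming answer; both function names and the coin values are hypothetical examples.

```python
def greedy_change(amount, coins):
    """Heuristic: repeatedly take the largest coin that still fits."""
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            used.append(coin)
    return used if amount == 0 else None


def optimal_change(amount, coins):
    """Exact minimum number of coins by dynamic programming, for comparison."""
    best = [0] + [None] * amount
    for value in range(1, amount + 1):
        options = [best[value - c] for c in coins
                   if c <= value and best[value - c] is not None]
        best[value] = min(options) + 1 if options else None
    return best[amount]


print(greedy_change(6, [1, 3, 4]))   # [4, 1, 1]: three coins
print(optimal_change(6, [1, 3, 4]))  # 2, using 3 + 3
```

With everyday coin values such as 1, 5, 10, and 25 the greedy answer happens to be optimal, which is the "performs well in most cases" part of the definition.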

27) Hybrid learning is

A) Machine learning involving different techniques
B) The learning algorithm analyzes the examples on a
systematic basis and makes incremental adjustments to the theory
that is learned
C) Learning by generalizing from examples
D) None of these

Answer: A

28) A Kohonen self-organizing map is

A) The process of finding the right formal representation of a
certain body of knowledge in order to represent it in a
knowledge-based system
B) It automatically maps an external signal space into a
system's internal representational space. It is useful in the
performance of classification tasks
C) A process where an individual learns how to carry out a
certain task when making a transition from a situation in which
the task cannot be carried out to a situation in which the same
task under the same circumstances can be carried out.
D) None of these

Answer: B

29) Incremental learning refers to

A) Machine learning involving different techniques
B) The learning algorithm analyzes the examples on a
systematic basis and makes incremental adjustments to the theory
that is learned
C) Learning by generalizing from examples
D) None of these

Answer: B
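
The perceptron rule is one well-known instance of incremental learning as described in option B: examples are processed one at a time, and the current theory (a weight vector and bias) is adjusted only when it misclassifies an example. This is a minimal sketch, assuming +1/-1 labels and a learning rate of 1.0. The AND data is linearly separable, so the updates eventually stop.

```python
def predict(weights, bias, features):
    """Classify as +1 or -1 with the current linear theory."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else -1


def update(weights, bias, features, label, rate=1.0):
    """Process one example: adjust the theory only if it misclassifies it."""
    if predict(weights, bias, features) != label:
        weights = [w + rate * label * x for w, x in zip(weights, features)]
        bias += rate * label
    return weights, bias


# Examples of logical AND, labelled +1/-1, presented one at a time
stream = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias = [0.0, 0.0], 0.0
for _ in range(10):  # several passes over the stream
    for features, label in stream:
        weights, bias = update(weights, bias, features, label)
print(weights, bias)
```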

30) Knowledge engineering is

A) The process of finding the right formal representation of a
certain body of knowledge in order to represent it in a
knowledge-based system
B) It automatically maps an external signal space into a
system's internal representational space. It is useful in the
performance of classification tasks.
C) A process where an individual learns how to carry out a
certain task when making a transition from a situation in which
the task cannot be carried out to a situation in which the same
task under the same circumstances can be carried out.
D) None of these

Answer: A

31) Information content is

A) The amount of information within data, as opposed to the
amount of redundancy or noise.
B) One of the defining aspects of a data warehouse
C) A restriction that requires data in one column of a database
table to be a subset of another column.
D) None of these
Answer: A
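
One standard way to quantify information content is Shannon entropy, a measure that is not named in the question itself: highly redundant data has low entropy per symbol. This is a minimal sketch with a hypothetical function name; it looks only at symbol frequencies and ignores symbol order.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy in bits per symbol, from observed symbol frequencies:
    H = sum of p * log2(1 / p) over the distinct symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(entropy_bits("aaaaaaaa"))  # 0.0: fully redundant
print(entropy_bits("abcdabcd"))  # 2.0: four equally likely symbols
```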

32) Inductive learning is

A) Machine learning involving different techniques
B) The learning algorithm analyzes the examples on a
systematic basis and makes incremental adjustments to the theory
that is learned
C) Learning by generalizing from examples
D) None of these

Answer: C

33) An inclusion dependency is

A) The amount of information within data, as opposed to the
amount of redundancy or noise
B) One of the defining aspects of a data warehouse
C) A restriction that requires data in one column of a database
table to be a subset of another column
D) None of these

Answer: C
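
A foreign key is a familiar example of an inclusion dependency: every customer ID in an orders table must also appear in the customers table. This is a minimal sketch of checking such a restriction on in-memory rows; the table and column names are hypothetical.

```python
def inclusion_dependency_holds(rows, column, referenced_rows, referenced_column):
    """True if every value of rows[column] also occurs in
    referenced_rows[referenced_column], i.e. the first column's values form
    a subset of the second column's values."""
    allowed = {row[referenced_column] for row in referenced_rows}
    return all(row[column] in allowed for row in rows)


customers = [{"id": 10}, {"id": 11}, {"id": 12}]
orders = [{"order_no": 1, "customer_id": 10}, {"order_no": 2, "customer_id": 12}]
print(inclusion_dependency_holds(orders, "customer_id", customers, "id"))  # True
```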

34) KDD (Knowledge Discovery in Databases) refers to

A) The non-trivial extraction of implicit, previously unknown, and
potentially useful information from data
B) A set of columns in a database table that can be used to
identify each record within this table uniquely.
C) A collection of interesting and useful patterns in a database
D) None of these

Answer: A

35) Learning is

A) The process of finding the right formal representation of a
certain body of knowledge in order to represent it in a
knowledge-based system
B) It automatically maps an external signal space into a
system's internal representational space. It is useful in the
performance of classification tasks.
C) A process where an individual learns how to carry out a
certain task when making a transition from a situation in which
the task cannot be carried out to a situation in which the same
task under the same circumstances can be carried out.
D) None of these

Answer: C

36) Naive prediction is

A) A class of learning algorithms that try to derive a Prolog
program from examples.
B) A table with n independent attributes can be seen as an
n-dimensional space.
C) A prediction made using an extremely simple method, such as
always predicting the same output.
D) None of these

Answer: C
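
A common naive predictor is the majority-class baseline, which always predicts the most frequent training label. More elaborate models are often compared against it. This is a minimal sketch with made-up labels; `majority_class_predictor` is a hypothetical name.

```python
from collections import Counter


def majority_class_predictor(training_labels):
    """Build a predictor that ignores its input and always returns the most
    frequent label seen in training."""
    majority = Counter(training_labels).most_common(1)[0][0]
    return lambda example: majority


predict = majority_class_predictor(["no", "no", "yes", "no"])
test_labels = ["no", "yes", "no", "no"]
hits = sum(predict(None) == label for label in test_labels)
print(hits / len(test_labels))  # 0.75
```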

37) A learning algorithm is

A) An algorithm that can learn
B) A sub-discipline of computer science that deals with the
design and implementation of learning algorithms.
C) A machine-learning approach that abstracts from the actual
strategy of an individual algorithm and can therefore be applied
to any other form of machine learning.
D) None of these

Answer: A

38) Knowledge refers to

A) The non-trivial extraction of implicit, previously unknown, and
potentially useful information from data
B) A set of columns in a database table that can be used to
identify each record within this table uniquely
C) A collection of interesting and useful patterns in a database
D) None of these

Answer: C

39) Node is

A) A component of a network
B) In the context of KDD and data mining, this refers to random
errors in a database table.
C) One of the defining aspects of a data warehouse
D) None of these

Answer: A

40) Machine learning is

A) An algorithm that can learn
B) A sub-discipline of computer science that deals with the
design and implementation of learning algorithms
C) An approach that abstracts from the actual strategy of an
individual algorithm and can therefore be applied to any other
form of machine learning.
D) None of these

Answer: B

41) Projection pursuit is

A) The result of the application of a theory or a rule in a
specific case
B) One of several possible keys within a database table that
is chosen by the designer as the primary means of accessing the
data in the table.
C) Discipline in statistics that studies ways to find the most
interesting projections of multi-dimensional spaces
D) None of these

Answer: C

42) Inductive logic programming is

A) A class of learning algorithms that try to derive a Prolog
program from examples
B) A table with n independent attributes can be seen as an
n-dimensional space
C) A prediction made using an extremely simple method, such as
always predicting the same output
D) None of these

Answer: A

43) Statistical significance is

A) The science of collecting, organizing, and applying
numerical facts
B) Measure of the probability that a certain hypothesis is
incorrect given certain observations.
C) One of the defining aspects of a data warehouse, which is
specially built around all the existing applications of the
operational data
D) None of these

Answer: B

44) Multi-dimensional knowledge is

A) A class of learning algorithms that try to derive a Prolog
program from examples
B) A table with n independent attributes can be seen as an
n-dimensional space
C) A prediction made using an extremely simple method, such as
always predicting the same output.
D) None of these

Answer: B

45) Prediction is

A) The result of the application of a theory or a rule in a
specific case
B) One of several possible keys within a database table that
is chosen by the designer as the primary means of accessing the
data in the table.
C) Discipline in statistics that studies ways to find the most
interesting projections of multi-dimensional spaces.
D) None of these

Answer: A

46) Query tools are

A) A reference to the speed of an algorithm, which is
quadratically dependent on the size of the data
B) Attributes of a database table that can take only numerical
values.
C) Tools designed to query a database.
D) None of these

Answer: C

47) Operational database is

A) A measure of the desired maximal complexity of data mining
algorithms
B) A database containing volatile data used for the daily
operation of an organization
C) Relational database management system
D) None of these

Answer: B

48) Which of the following is/are the Data mining tasks?

(a) Regression
(b) Classification
(c) Clustering
(d) Inference of association rules
(e) All (a), (b), (c) and (d) above.

Answer: E
Explanation: Regression, classification, clustering, and the
inference of association rules are all data mining tasks.

49) In a data warehouse, if D1 and D2 are two conformed
dimensions, then

(a) D1 may be an exact replica of D2
(b) D1 may be at a rolled-up level of granularity compared to
D2
(c) Columns of D1 may be a subset of D2 and vice versa
(d) Rows of D1 may be a subset of D2 and vice versa
(e) All (a), (b), (c) and (d) above.

Answer: E
Explanation: Conformed dimensions are either identical or one is a
shrunken version of the other. So D1 may be an exact replica of D2,
may be at a rolled-up level of granularity, or may hold a subset of
D2's columns or rows (and vice versa).

50) Which of the following is not an ETL tool?

(a) Informatica
(b) Oracle warehouse builder
(c) Datastage
(d) Visual studio
(e) DT/studio.

Answer: D
Explanation: Visual Studio is not an ETL tool.

51) ...................... is an essential process where
intelligent methods are applied to extract data patterns.
A) Data warehousing
B) Data mining
C) Text mining
D) Data selection

Answer: B) Data mining

52) Data mining can also be applied to other forms of data, such
as ................

i) Data streams
ii) Sequence data
iii) Networked data
iv) Text data
v) Spatial data

A) i, ii, iii and v only
B) ii, iii, iv and v only
C) i, iii, iv and v only
D) All i, ii, iii, iv and v

Answer: D) All i, ii, iii, iv and v

53) Which of the following is not a data mining functionality?

A) Characterization and Discrimination
B) Classification and regression
C) Selection and interpretation
D) Clustering and Analysis

Answer: C) Selection and interpretation

54) ............................. is a summarization of the
general characteristics or features of a target class of data.

A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection

Answer: A) Data Characterization

55) ............................. is a comparison of the general
features of the target class data objects against the general
features of objects from one or multiple contrasting classes.

A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection

Answer: C) Data discrimination

56) The strategic value of data mining is ......................

A) cost-sensitive
B) work-sensitive
C) time-sensitive
D) technical-sensitive

Answer: C) time-sensitive

57) ............................. is the process of finding a
model that describes and distinguishes data classes or concepts.

A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection

Answer: B) Data Classification

58) The various aspects of data mining methodologies
is/are ...................

i) Mining various and new kinds of knowledge
ii) Mining knowledge in multidimensional space
iii) Pattern evaluation and pattern or constraint-guided
mining.
iv) Handling uncertainty, noise, or incompleteness of data

A) i, ii and iv only
B) ii, iii and iv only
C) i, ii and iii only
D) All i, ii, iii and iv

Answer: D) All i, ii, iii and iv

59) The full form of KDD is ..................

A) Knowledge Database
B) Knowledge Discovery Database
C) Knowledge Data House
D) Knowledge Data Definition

Answer: B) Knowledge Discovery Database (more precisely, Knowledge Discovery in Databases)

60) The output of KDD is .............

A) Data
B) Information
C) Query
D) Useful information

Answer: D) Useful information

61) The full form of OLAP is

A) Online Analytical Processing
B) Online Advanced Processing
C) Online Advanced Preparation
D) Online Analytical Performance

Answer: A) Online Analytical Processing

62) ......................... is a subject-oriented, integrated,
time-variant, nonvolatile collection of data in support of
management decisions.

A) Data Mining
B) Data Warehousing
C) Document Mining
D) Text Mining

Answer: B) Data Warehousing

63) The data is stored, retrieved and updated
in ....................

A) OLAP
B) OLTP
C) SMTP
D) FTP

Answer: B) OLTP

64) An .................. system is market-oriented and is used
for data analysis by knowledge workers, including managers,
executives, and analysts.

A) OLAP
B) OLTP
C) Both of the above
D) None of the above

Answer: A) OLAP

65) ........................ is a good alternative to the star
schema.

A) Star schema
B) Snowflake schema
C) Fact constellation
D) Star-snowflake schema

Answer: C) Fact constellation

66) The ............................ exposes the information
being captured, stored, and managed by operational systems.

A) top-down view
B) data warehouse view
C) data source view
D) business query view

Answer: C) data source view

67) The type of relationship in star schema is ...............

A) many to many
B) one to one
C) one to many
D) many to one

Answer: C) one to many

68) The .................. allows the selection of the relevant
information necessary for the data warehouse.

A) top-down view
B) data warehouse view
C) data source view
D) business query view

Answer: A) top-down view

69) Which of the following is not a component of a data
warehouse?

A) Metadata
B) Current detail data
C) Lightly summarized data
D) Component Key

Answer: D) Component Key

70) Which of the following is not a kind of data warehouse
application?

A) Information processing
B) Analytical processing
C) Data mining
D) Transaction processing

Answer: D) Transaction processing

71) Data warehouse architecture is based
on .......................

A) DBMS
B) RDBMS
C) Sybase
D) SQL Server

Answer: B) RDBMS

72) .......................... supports basic OLAP operations,
including slice and dice, drill-down, roll-up and pivoting.

A) Information processing
B) Analytical processing
C) Data mining
D) Transaction processing

Answer: B) Analytical processing
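
Two of the OLAP operations listed in question 72 can be sketched on a tiny in-memory fact table: a slice fixes one dimension to a single value, and a roll-up aggregates the measure over the dimensions you drop. This is a minimal sketch only; the table, its values, and the function names are hypothetical, and real OLAP servers work on cubes rather than Python lists. Drill-down and pivoting are not shown.

```python
from collections import defaultdict

# Hypothetical fact table rows: (region, quarter, product, units_sold)
SALES = [
    ("North", "Q1", "TV", 10),
    ("North", "Q2", "TV", 12),
    ("South", "Q1", "TV", 7),
    ("South", "Q1", "Phone", 20),
    ("North", "Q1", "Phone", 15),
]
REGION, QUARTER, PRODUCT, UNITS = range(4)  # column positions


def slice_rows(facts, dimension, value):
    """Slice: keep only the facts whose given dimension equals value."""
    return [row for row in facts if row[dimension] == value]


def roll_up(facts, keep):
    """Roll-up: sum units_sold over every dimension not listed in keep."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[d] for d in keep)] += row[UNITS]
    return dict(totals)


print(roll_up(SALES, [REGION]))                              # {('North',): 37, ('South',): 27}
print(roll_up(slice_rows(SALES, QUARTER, "Q1"), [PRODUCT]))  # {('TV',): 17, ('Phone',): 35}
```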

73) The core of the multidimensional model is
the ....................... , which consists of a large set of
facts and a number of dimensions.

A) Multidimensional cube
B) Dimensions cube
C) Data cube
D) Data model

Answer: C) Data cube

74) The data from the operational environment
enter ........................ of the data warehouse.

A) Current detail data
B) Older detail data
C) Lightly Summarized data
D) Highly summarized data

Answer: A) Current detail data

75) A data warehouse ......................

A) is updated by end users.
B) contains numerous naming conventions and formats
C) is organized around important subject areas
D) contains only current data

Answer: C) is organized around important subject areas

76) Business intelligence and data warehousing are used
for ..............

A) Forecasting
B) Data Mining
C) Analysis of large volumes of product sales data
D) All of the above

Answer: D) All of the above

77) Data warehouse contains ................ data that is never
found in the operational environment.

A) normalized
B) informational
C) summary
D) denormalized

Answer: C) summary

78) ................... are responsible for running queries and
reports against data warehouse tables.

A) Hardware
B) Software
C) End users
D) Middle ware

Answer: C) End users

79) The biggest drawback of the level indicator in the classic
star schema is that it limits ............

A) flexibility
B) quantify
C) qualify
D) ability

Answer: A) flexibility

80) ............................. are designed to overcome any
limitations placed on the warehouse by the nature of the
relational data model.

A) Operational database
B) Relational database
C) Multidimensional database
D) Data repository

Answer: C) Multidimensional database

81) Which of the following is the most important when deciding
on the data structure of a data mart?

(a) XML data exchange standards
(b) Data access tools to be used
(c) Metadata naming conventions
(d) Extract, Transform, and Load (ETL) tool to be used
(e) All (a), (b), (c) and (d) above.

Answer: B
Explanation: The data access tools to be used are the most
important consideration when deciding on the data structure of a
data mart.

82) The process of removing the deficiencies and loopholes in
the data is called

(a) Aggregation of data
(b) Extracting of data
(c) Cleaning up of data.
(d) Loading of data
(e) Compression of data.

Answer: C
Explanation: The process of removing the deficiencies and
loopholes in the data is called cleaning up of data.

83) Which one manages both current and historic transactions?

(a) OLTP
(b) OLAP
(c) Spread sheet
(d) XML
(e) All (a), (b), (c) and (d) above.

Answer: B
Explanation: Online Analytical Processing (OLAP) manages both
current and historic transactions.

84) Which of the following is the collection of data objects
that are similar to one another within the same group?

(a) Partitioning
(b) Grid
(c) Cluster
(d) Table
(e) Data source.

Answer: C
Explanation: Cluster is the collection of data objects that are
similar to one another within the same group.

85) Which of the following employs data mining techniques to
analyze the intent of a user query and provides additional
generalized or associated information relevant to the query?

(a) Iceberg query method
(b) Data analyzer
(c) Intelligent query answering
(d) DBA
(e) Query parser.

Answer: C
Explanation: Intelligent query answering employs data mining
techniques to analyze the intent of a user query and provides
additional generalized or associated information relevant to the
query.

86) Which of the following processes includes data cleaning, data
integration, data selection, data transformation, data mining,
pattern evaluation and knowledge presentation?

(a) KDD process
(b) ETL process
(c) KTL process
(d) MDX process
(e) None of the above.

Answer: A
Explanation: The KDD process includes data cleaning, data
integration, data selection, data transformation, data mining,
pattern evaluation, and knowledge presentation.

87) At which level can we create dimensional models?

(a) Business requirements level
(b) Architecture models level
(c) Detailed models level
(d) Implementation level
(e) Testing level.

Answer: B
Explanation: Dimensional models can be created at Architecture
models level.

88) Which of the following is not related to dimension table
attributes?

(a) Verbose
(b) Descriptive
(c) Equally unavailable
(d) Complete
(e) Indexed.

Answer: C
Explanation: Equally unavailable is not related to dimension
table attributes.

89) Data warehouse bus matrix is a combination of

(a) Dimensions and data marts
(b) Dimensions and facts
(c) Facts and data marts
(d) Dimensions and detailed facts
(e) All (a), (b), (c) and (d) above.

Answer: A
Explanation: Data warehouse bus matrix is a combination of
Dimensions and data marts.

90) Which of the following is not a managing issue in the
modeling process?

(a) Content of primary units column
(b) Document each candidate data source
(c) Do regions report to zones
(d) Walk through business scenarios
(e) Ensure that the transaction edit flat is used for analysis.

Answer: E
Explanation: Ensure that the transaction edit flat is used for
analysis is not the managing issue in the modeling process.

91) Data modeling technique used for data marts is

(a) Dimensional modeling
(b) ER – model
(c) Extended ER – model
(d) Physical model
(e) Logical model.

Answer: A
Explanation: Data modeling technique used for data marts is
Dimensional modeling.

92) A warehouse architect is trying to determine what data must
be included in the warehouse. A meeting has been arranged with a
business analyst to understand the data requirements. Which of
the following should be included in the agenda?

(a) Number of users
(b) Corporate objectives
(c) Database design
(d) Routine reporting
(e) Budget.

Answer: D
Explanation: Routine reporting should be included in the
agenda.

93) An OLAP tool provides for

(a) Multidimensional analysis
(b) Roll-up and drill-down
(c) Slicing and dicing
(d) Rotation
(e) Setting up only relations.

Answer: C
Explanation: An OLAP tool provides for Slicing and dicing.

94) A synonym for data mining is

(a) Data warehouse
(b) Knowledge discovery in database
(c) ETL
(d) Business intelligence
(e) OLAP.

Answer: B
Explanation: A synonym for data mining is knowledge discovery in
databases (KDD).

95) Which of the following statements is true?

(a) A fact table describes the transactions stored in a DWH
(b) A fact table describes the granularity of data held in a
DWH
DWH
(c) The fact table of a data warehouse is the main store of
descriptions of the transactions stored in a DWH
(d) The fact table of a data warehouse is the main store of all
of the recorded transactions over time
(e) A fact table maintains the old records of the database.

Answer: D
Explanation: The fact table of a data warehouse is the main
store of all of the recorded transactions over time is the
correct statement.

96) The most common kind of queries in a data warehouse are

(a) Inside-out queries
(b) Outside-in queries
(c) Browse queries
(d) Range queries
(e) All (a), (b), (c) and (d) above.

Answer: A
Explanation: The most common kind of queries in a data
warehouse are inside-out queries.

97) Concept description is the basic form of the

(a) Predictive data mining
(b) Descriptive data mining
(c) Data warehouse
(d) Relational data base
(e) Proactive data mining.

Answer: B
Explanation: Concept description is the basic form of
descriptive data mining.

98) The Apriori property means

(a) If a set cannot pass a test, all of its supersets will fail
the same test as well
(b) To improve the efficiency of the level-wise generation of
frequent itemsets
(c) If a set can pass a test, all of its supersets will fail
the same test as well
(d) To decrease the efficiency of the level-wise generation of
frequent itemsets
(e) All (a), (b), (c) and (d) above.

Answer: A
Explanation: The Apriori property (all nonempty subsets of a
frequent itemset must also be frequent) is anti-monotone: if a set
cannot pass a test, all of its supersets will fail the same test as
well. The property is used to improve the efficiency of the
level-wise generation of frequent itemsets, but that is its purpose
rather than its meaning.
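
The level-wise search and its pruning step can be sketched in plain Python: candidate k-itemsets are built from frequent (k-1)-itemsets, and any candidate with an infrequent (k-1)-subset is discarded before counting, because of the Apriori property. This is a minimal, unoptimized sketch, assuming transactions are Python sets and support is an absolute count; the function name and the shopping data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining.

    transactions: list of sets of items; min_support: minimum absolute count.
    Returns a dict mapping each frequent itemset (frozenset) to its count.
    """
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([item]) for t in transactions for item in t}
    frequent = {c: n for c, n in count(items).items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): a candidate with any infrequent
        # (k-1)-subset cannot be frequent, so skip counting it
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        frequent = {c: n for c, n in count(candidates).items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, support in apriori(transactions, min_support=2).items():
    print(sorted(itemset), support)
```

Here {bread, milk, butter} is never counted: its subset {milk, butter} occurs only once, so the prune step removes it.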

99) Which of the following forms the set of data created to support a
specific short-lived business situation?

(a) Personal data marts
(b) Application models
(c) Downstream systems
(d) Disposable data marts
(e) Data mining models.

Answer: D
Explanation: Disposable data marts are sets of data created to
support a specific short-lived business situation.

100) What is/are the different types of metadata?

I. Administrative.
II. Business.
III. Operational.

(a) Only (I) above
(b) Both (II) and (III) above
(c) Both (I) and (II) above
(d) Both (I) and (III) above
(e) All (I), (II) and (III) above.

Answer: E
Explanation: The different types of metadata are
Administrative, Business and Operational.

101) Multiple regression means

(a) Data are modeled using a straight line
(b) Data are modeled using a curved line
(c) Extension of linear regression involving only one
predictor variable
(d) Extension of linear regression involving more than one
predictor variable
(e) All (a), (b), (c) and (d) above.

Answer: D
Explanation: Multiple regression is an extension of linear
regression that involves more than one predictor variable.

102) Which of the following should not be considered for each
dimension attribute?

(a) Attribute name
(b) Rapid changing dimension policy
(c) Attribute definition
(d) Sample data
(e) Cardinality.

Answer: B
Explanation: A rapidly changing dimension policy should not be
considered for each dimension attribute.

103) A Business Intelligence system requires data from:

(a) Data warehouse
(b) Operational systems
(c) All possible sources within the organization and possibly
from external sources
(d) Web servers
(e) Database servers.

Answer: C
Explanation: A business intelligence system requires data from all
possible sources within the organization, and possibly from
external sources, not only from the data warehouse.

104) Data mining application domains are

(a) Biomedical
(b) DNA data analysis
(c) Financial data analysis
(d) Retail industry and telecommunication industry
(e) All (a), (b), (c) and (d) above.

Answer: E
Explanation: Data mining application domains are Biomedical,
DNA data analysis, Financial data analysis and Retail industry
and telecommunication industry

105. The generalization of multidimensional attributes of a
complex object class, performed by examining each attribute,
generalizing each attribute to simple-value data and constructing
a multidimensional data cube, is called

(a) Object cube
(b) Relational cube
(c) Transactional cube
(d) Tuple
(e) Attribute.

Answer: A
Explanation: This process is called an object cube: each attribute
of the complex object class is generalized to simple-value data,
and the results form a multidimensional data cube.

106. Building a data mart for a business process/department that
is very critical for your organization is which kind of project?
(a) High risk high reward
(b) High risk low reward
(c) Low risk low reward
(d) Low risk high reward
(e) Involves high risks.

Answer: A
Explanation: Building a data mart for a business process/department
that is very critical for your organization is a high risk, high
reward project.

107. Which of the following tools will a business intelligence
system have?

(a) OLAP tool
(b) Data mining tool
(c) Reporting tool
(d) Both (a) and (b) above
(e) (a), (b) and (c) above.

Answer: E
Explanation: A business intelligence system will have OLAP, data
mining and reporting tools.

108. A feature F1 can take certain values: A, B, C, D, E, & F and
represents the grade of students from a college.

1) Which of the following statements is true in this case?

A) Feature F1 is an example of nominal variable.
B) Feature F1 is an example of ordinal variable.
C) It doesn’t belong to any of the above category.
D) Both of these

Solution: (B)

Ordinal variables are variables whose categories have an order.
For example, grade A should be considered a higher grade than
grade B.

2) Which of the following is an example of a deterministic
algorithm?

A) PCA
B) K-Means
C) None of the above

Solution: (A)

A deterministic algorithm is one whose output does not change
across different runs. PCA gives the same result every time it is
run, but k-means does not, because its result depends on the
random initialization of the centroids.

3) [True or False] The Pearson correlation between two variables
is zero, but their values can still be related to each other.

A) TRUE

B) FALSE

Solution: (A)

Take Y = X^2 with X distributed symmetrically around zero. Y is not
only associated with X, it is a function of X, yet the Pearson
correlation between them is 0, because Pearson correlation
measures only linear association.
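A quick numerical check of the Y = X^2 example, as a minimal Python sketch. The sample values are an illustrative assumption, chosen symmetric around zero so that the linear association cancels exactly; `pearson` is a small hand-written helper, not a library function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [-2, -1, 0, 1, 2]        # symmetric around zero
ys = [x ** 2 for x in xs]     # Y is a deterministic function of X
print(pearson(xs, ys))        # 0.0: no *linear* relationship
```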

4) Which of the following statement(s) is / are true for
Gradient Descent (GD) and Stochastic Gradient Descent (SGD)?

1. In GD and SGD, you update a set of parameters in an
iterative manner to minimize the error function.

2. In SGD, you have to run through all the samples in your
training set for a single update of a parameter in each
iteration.

3. In GD, you either use the entire data or a subset of
training data to update a parameter in each iteration.

A) Only 1

B) Only 2

C) Only 3

D) 1 and 2

E) 2 and 3

F) 1,2 and 3

Solution: (A)

In SGD, each iteration uses a batch that generally contains a
random sample of the data, whereas in GD each iteration uses all
of the training observations. Statements 2 and 3 therefore
describe the two methods the wrong way round.
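The difference between the two update rules can be sketched on a one-parameter least-squares problem. This is a minimal illustration under stated assumptions: noiseless toy data y = 3x, a fixed learning rate, and single-sample SGD (a mini-batch of size 1). The function names `gd` and `sgd` are hypothetical.

```python
import random

def gd(xs, ys, lr=0.01, steps=200):
    """Batch gradient descent: every update uses ALL training samples."""
    w, n = 0.0, len(xs)
    for _ in range(steps):
        # gradient of the mean squared error (w*x - y)^2 over the full set
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

def sgd(xs, ys, lr=0.01, steps=500, seed=0):
    """Stochastic gradient descent: each update uses ONE random sample."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))
        grad = 2 * xs[i] * (w * xs[i] - ys[i])
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x for x in xs]     # toy data generated with true weight 3
print(gd(xs, ys), sgd(xs, ys))  # both approach 3.0
```

Both minimize the same error iteratively (statement 1); only the amount of data per update differs.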

5) Which of the following hyperparameter(s), when increased, may
cause a random forest to overfit the data?

1. Number of Trees

2. Depth of Tree

3. Learning Rate

A) Only 1

B) Only 2

C) Only 3

D) 1 and 2

E) 2 and 3

F) 1,2 and 3

Solution: (B)

Usually, increasing the depth of the trees causes overfitting.
Learning rate is not a hyperparameter of random forest. Increasing
the number of trees does not cause overfitting; it mainly reduces
the variance of the averaged prediction.

6) Imagine, you are working with “Analytics Vidhya” and you want
to develop a machine learning algorithm which predicts the
number of views on the articles.
Your analysis is based on features like author name, number of
articles written by the same author on Analytics Vidhya in past
and a few other features. Which of the following evaluation
metric would you choose in that case?

1. Mean Square Error

2. Accuracy

3. F1 Score

A) Only 1

B) Only 2

C) Only 3

D) 1 and 3

E) 2 and 3

F) 1 and 2

Solution: (A)

The number of views of an article is a continuous target variable,
so this is a regression problem, and mean squared error is the
appropriate evaluation metric. Accuracy and F1 score are
classification metrics.
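For reference, mean squared error is just the average squared gap between actual and predicted values. A minimal Python sketch; the view counts below are hypothetical numbers, not data from the question.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [1200, 450, 3000]      # hypothetical article view counts
predicted = [1100, 500, 2800]   # hypothetical model predictions
print(mean_squared_error(actual, predicted))  # 17500.0
```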

7) Given below are three images (1,2,3). Which of the following
options is correct for these images?

[Figure: images 1, 2 and 3, each plotting an activation function]

A) 1 is tanh, 2 is ReLU and 3 is SIGMOID activation functions.

B) 1 is SIGMOID, 2 is ReLU and 3 is tanh activation functions.

C) 1 is ReLU, 2 is tanh and 3 is SIGMOID activation functions.

D) 1 is tanh, 2 is SIGMOID and 3 is ReLU activation functions.

Solution: (D)

The range of the SIGMOID function is (0, 1).

The range of the tanh function is (-1, 1).

The range of the ReLU function is [0, infinity).

So option D is the right answer.
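The three ranges can be checked numerically. A minimal Python sketch evaluating each activation on an assumed grid from -5 to 5; the grid is arbitrary and only illustrates the bounds.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

# Evaluate each activation on a grid from -5 to 5 and report the observed range
grid = [i / 10 for i in range(-50, 51)]
for name, f in [("sigmoid", sigmoid), ("tanh", math.tanh), ("relu", relu)]:
    values = [f(x) for x in grid]
    print(f"{name}: min={min(values):.4f}, max={max(values):.4f}")
```

Only tanh produces negative outputs; sigmoid stays strictly between 0 and 1, and ReLU is never negative but unbounded above.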

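8) Below are the 8 actual values of the target variable in the
train file.

[0,0,0,1,1,1,1,1]

What is the entropy of the target variable?

A) -(5/8 log(5/8) + 3/8 log(3/8))

B) 5/8 log(5/8) + 3/8 log(3/8)

C) 3/8 log(5/8) + 5/8 log(3/8)

D) 5/8 log(3/8) – 3/8 log(5/8)

Solution: (A)

The formula for entropy is Entropy = -Σ p_i log(p_i), summed over
the classes i, where p_i is the fraction of values in class i.
Here p(1) = 5/8 and p(0) = 3/8, so the entropy is
-(5/8 log(5/8) + 3/8 log(3/8)).

So the answer is A.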
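Option A can be evaluated directly. A minimal Python sketch; using base-2 logarithms (entropy in bits) is an assumption, since the question does not fix the base.

```python
import math
from collections import Counter

def entropy(labels, base=2):
    """Shannon entropy -sum(p_i * log(p_i)) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n, base) for c in Counter(labels).values())

target = [0, 0, 0, 1, 1, 1, 1, 1]
print(round(entropy(target), 4))  # 0.9544 bits
```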

9) Let’s say you are working with categorical feature(s) and
you have not looked at the distribution of the categorical
variable in the test data.

You want to apply one hot encoding (OHE) on the categorical
feature(s). What challenges may you face if you have applied OHE
on a categorical variable of the train dataset?

A) All categories of categorical variable are not present in the
test dataset.

B) Frequency distribution of categories is different in train as
compared to the test dataset.

C) Train and Test always have same distribution.

D) Both A and B

E) None of these

Solution: (D)

Both are true. OHE will fail to encode categories that are present
in test but not in train, which can be one of the main challenges
when applying OHE. The challenge in option B is also real: you
need to be more careful when applying OHE if the frequency
distribution is not the same in train and test.
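The unseen-category problem can be shown with a hand-rolled encoder. This is a minimal Python sketch, assuming the vocabulary is learned on train only and that unseen test categories are mapped to an all-zero row (one common choice; the helper names are hypothetical).

```python
def fit_categories(train_values):
    """Learn the category vocabulary from the training column only."""
    return sorted(set(train_values))

def one_hot(values, categories):
    """Encode values against a fixed vocabulary learned on train."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        if v in index:
            row[index[v]] = 1
        # A category unseen in train gets an all-zero row: its identity is lost.
        rows.append(row)
    return rows

categories = fit_categories(["red", "green", "blue", "green"])
print(categories)                                # ['blue', 'green', 'red']
print(one_hot(["green", "purple"], categories))  # [[0, 1, 0], [0, 0, 0]]
```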

10) The skip-gram model is one of the best models used in the Word2vec
algorithm for word embeddings. Which one of the following models
depicts the skip-gram model?
A) A

B) B

C) Both A and B

D) None of these

Solution: (B)

Both models (model1 and model2) are used in the Word2vec algorithm.
Model1 represents the CBOW model, whereas model2 represents the
skip-gram model.

11) Let’s say you are using activation function X in the hidden
layers of a neural network. At a particular neuron for a given
input, you get the output as “-0.0001”. Which of the following
activation functions could X represent?

A) ReLU

B) tanh

C) SIGMOID
D) None of these

Solution: (B)

The function is tanh, because its output range is (-1, 1). ReLU
and SIGMOID cannot produce negative outputs.

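12) [True or False] LogLoss evaluation metric can have negative
values.

A) TRUE
B) FALSE

Solution: (B)

Log loss cannot have negative values: it averages -log(p) over the
probabilities p assigned to the true classes, and -log(p) ≥ 0 for
0 < p ≤ 1.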
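A minimal Python sketch of binary log loss shows why it is never negative and why confident mistakes are penalised heavily. The clipping constant `eps` is an assumption, added so that log(0) never occurs.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: -mean(y*log(p) + (1-y)*log(1-p)).

    Probabilities are clipped to [eps, 1-eps] so that log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

print(log_loss([1, 0], [0.9, 0.1]))  # confident and correct: small loss (about 0.105)
print(log_loss([1, 0], [0.1, 0.9]))  # confident and wrong: large loss (about 2.303)
```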

13) Which of the following statements is/are true about “Type-1”
and “Type-2” errors?

1. Type1 is known as false positive and Type2 is known as
false negative.

2. Type1 is known as false negative and Type2 is known as
false positive.

3. Type1 error occurs when we reject a null hypothesis when it
is actually true.

A) Only 1

B) Only 2

C) Only 3

D) 1 and 2

E) 1 and 3

F) 2 and 3

Solution: (E)
In statistical hypothesis testing, a type I error is the
incorrect rejection of a true null hypothesis (a “false
positive”), while a type II error is incorrectly retaining a
false null hypothesis (a “false negative”).

14) Which of the following is/are important step(s) to
pre-process the text in NLP-based projects?

1. Stemming

2. Stop word removal

3. Object Standardization

A) 1 and 2

B) 1 and 3

C) 2 and 3

D) 1,2 and 3

Solution: (D)

Stemming is a rudimentary rule-based process of stripping suffixes
(“ing”, “ly”, “es”, “s”, etc.) from a word.

Stop words are words that are not relevant to the context of the
data, for example is/am/are.

Object standardization, which replaces non-standard tokens such as
acronyms and slang with standard words, is also a good way to
pre-process the text.
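Stemming and stop-word removal can be illustrated with a toy pipeline. This is a deliberately crude Python sketch: the stop-word list, the suffix list and the 3-character minimum stem are illustrative assumptions, and real projects use a proper stemmer (for example Porter) and a full stop-word list.

```python
# Tiny illustrative stop-word and suffix lists, not taken from any NLP library.
STOP_WORDS = {"is", "am", "are", "the", "a", "an"}
SUFFIXES = ("ing", "ly", "es", "s")

def stem(word):
    """Crude rule-based stemmer: strip the first matching suffix."""
    for suffix in SUFFIXES:
        # keep at least 3 characters so short words are not mangled
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = text.lower().split()
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The cats are playing quickly"))  # ['cat', 'play', 'quick']
```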

15) Suppose you want to project high dimensional data into lower
dimensions. The two most famous dimensionality reduction
algorithms used here are PCA and t-SNE. Let’s say you have
applied both algorithms respectively on data “X” and you got the
datasets “X_projected_PCA” and “X_projected_tSNE”.

Which of the following statements is true for “X_projected_PCA”
& “X_projected_tSNE”?

A) X_projected_PCA will have interpretation in the nearest
neighbour space.

B) X_projected_tSNE will have interpretation in the nearest
neighbour space.

C) Both will have interpretation in the nearest neighbour space.

D) None of them will have interpretation in the nearest
neighbour space.

Solution: (B)

The t-SNE algorithm considers nearest-neighbour points when it
reduces the dimensionality of the data. So, after using t-SNE, we
can think of the reduced dimensions as having an interpretation in
the nearest neighbour space. This is not the case for PCA.

Context: 16-17

Given below are three scatter plots for two features (Image 1, 2
& 3 from left to right).

16) In the above images, which of the following is/are examples
of multi-collinear features?

A) Features in Image 1

B) Features in Image 2

C) Features in Image 3

D) Features in Image 1 & 2

E) Features in Image 2 & 3

F) Features in Image 3 & 1

Solution: (D)

In Image 1 the features have a high positive correlation, whereas
in Image 2 they have a high negative correlation, so in both
images the pair of features is an example of multicollinear
features.

17) In the previous question, suppose you have identified
multi-collinear features. Which of the following action(s) would
you perform next?

1. Remove both collinear variables.

2. Instead of removing both variables, we can remove only one
variable.

3. Removing correlated variables might lead to loss of
information. In order to retain those variables, we can use
penalized regression models like ridge or lasso regression.

A) Only 1

B) Only 2

C) Only 3

D) Either 1 or 3

E) Either 2 or 3

Solution: (E)

You should not remove both features, because after removing both
you lose all of the information they carry. Instead, either remove
only one of the features or use a regularized model such as L1
(lasso) or L2 (ridge) regression.

18) Adding a non-important feature to a linear regression model
may result in:

1. Increase in R-square

2. Decrease in R-square

A) Only 1 is correct

B) Only 2 is correct

C) Either 1 or 2

D) None of these

Solution: (A)

Adding a feature to the feature space, whether it is important or
not, never decreases R-squared on the training data: least squares
could always set the new coefficient to zero, so the fit can only
stay the same or improve. In practice it usually increases slightly.
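This can be checked with ordinary least squares in NumPy. A minimal sketch on synthetic data; the data-generating process and the random seed are assumptions made only for illustration.

```python
import numpy as np

def r_squared(X, y):
    """Training R^2 of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)
irrelevant = rng.normal(size=50)  # a synthetic feature unrelated to y

r2_one = r_squared(x, y)
r2_two = r_squared(np.column_stack([x, irrelevant]), y)
print(r2_one, r2_two)  # r2_two is never smaller than r2_one
```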

19) Suppose you are given three variables X, Y and Z. The
Pearson correlation coefficients for (X, Y), (Y, Z) and (X, Z)
are C1, C2 & C3 respectively.

Now, you have added 2 to all values of X (i.e. new values become
X+2), subtracted 2 from all values of Y (i.e. new values are
Y-2) and Z remains the same. The new coefficients for (X,Y),
(Y,Z) and (X,Z) are given by D1, D2 & D3 respectively. How do
the values of D1, D2 & D3 relate to C1, C2 & C3?

A) D1= C1, D2 < C2, D3 > C3

B) D1 = C1, D2 > C2, D3 > C3

C) D1 = C1, D2 > C2, D3 < C3

D) D1 = C1, D2 < C2, D3 < C3

E) D1 = C1, D2 = C2, D3 = C3

F) Cannot be determined

Solution: (E)
Correlation between the features won’t change if you add or
subtract a constant, because the mean shifts by the same constant
and each value's deviation from the mean is unchanged.
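The shift invariance can be confirmed with NumPy's `corrcoef`. A minimal sketch; the synthetic X, Y and Z below are assumptions used only to have some non-trivial correlations.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)
z = -0.3 * y + rng.normal(size=100)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

before = (corr(x, y), corr(y, z), corr(x, z))          # C1, C2, C3
after = (corr(x + 2, y - 2), corr(y - 2, z), corr(x + 2, z))  # D1, D2, D3
print(np.allclose(before, after))  # True
```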

20) Imagine you are solving a classification problem with
highly imbalanced classes. The majority class is observed 99% of
the time in the training data.

Your model has 99% accuracy after taking the predictions on test
data. Which of the following is true in such a case?

1. Accuracy metric is not a good idea for imbalanced class
problems.

2. Accuracy metric is a good idea for imbalanced class
problems.

3. Precision and recall metrics are good for imbalanced class
problems.

4. Precision and recall metrics aren’t good for imbalanced
class problems.

A) 1 and 3

B) 1 and 4

C) 2 and 3

D) 2 and 4

Solution: (A)

A model that always predicts the majority class already reaches
99% accuracy while never detecting the minority class, so accuracy
is misleading here. Precision and recall on the minority class
expose this failure.
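The accuracy trap can be reproduced with a trivial majority-class predictor. A minimal Python sketch; reporting 0.0 precision when there are no positive predictions is a convention assumed here.

```python
def precision_recall(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive predictions -> 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [0] * 99 + [1]   # 99% majority class
y_pred = [0] * 100        # a "model" that always predicts the majority class
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.99
print(precision_recall(y_true, y_pred))   # (0.0, 0.0)
```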

21) In ensemble learning, you aggregate the predictions of weak
learners, so that an ensemble of these models gives a better
prediction than the individual models.

Which of the following statements is / are true for weak
learners used in an ensemble model?

1. They don’t usually overfit.

2. They have high bias, so they cannot solve complex learning
problems

3. They usually overfit.

A) 1 and 2

B) 1 and 3

C) 2 and 3

D) Only 1

E) Only 2

F) None of the above

Solution: (A)

Weak learners are simple models that are only sure about a
particular part of a problem. So they usually don’t overfit, which
means that weak learners have low variance and high bias.

22) Which of the following options is/are true for K-fold
cross-validation?

1. Increase in K will result in higher time required to cross
validate the result.

2. Higher values of K will result in higher confidence on the
cross-validation result as compared to lower value of K.

3. If K=N, then it is called Leave one out cross validation,
where N is the number of observations.

A) 1 and 2

B) 2 and 3

C) 1 and 3

D) 1,2 and 3

Solution: (D)

A larger k means less bias towards overestimating the true
expected error (as the training folds are closer to the total
dataset) and a higher running time (as you are getting closer to
the limit case: Leave-One-Out CV). We also need to consider the
variance of the accuracy across the k folds when selecting k.

Question Context 23-24

Cross-validation is an important step in machine learning for
hyperparameter tuning. Let’s say you are tuning a
hyperparameter “max_depth” for GBM by selecting it from 10
different depth values (values are greater than 2) for a
tree-based model using 5-fold cross validation.

Training the algorithm (on a model with max_depth 2) on 4 folds
takes 10 seconds, and the prediction on the remaining fold takes
2 seconds.

Note: Ignore hardware dependencies from the equation.

23) Which of the following options is true for the overall execution
time for 5-fold cross validation with 10 different values of
“max_depth”?

A) Less than 100 seconds

B) 100 – 300 seconds

C) 300 – 600 seconds

D) More than or equal to 600 seconds

E) None of the above

F) Can’t estimate

Solution: (D)

Each iteration for depth “2” in 5-fold cross validation takes
10 seconds for training and 2 seconds for testing, so the 5 folds
take 12*5 = 60 seconds. Since we are searching over 10 depth
values, the search would take 60*10 = 600 seconds. But training
and testing a model with depth greater than 2 takes more time than
depth “2”, so the overall time would be greater than 600 seconds.

24) In the previous question, suppose you train the same algorithm
while tuning 2 hyperparameters, say “max_depth” and “learning_rate”.

You want to select the right value for “max_depth” (from the
given 10 depth values) and learning rate (from the given 5 different
learning rates). In such cases, which of the following will
represent the overall time?

A) 1000-1500 seconds

B) 1500-3000 seconds

C) More than or equal to 3000 seconds

D) None of these

Solution: (C)

The reasoning is the same as in question 23. There are now
10 × 5 = 50 hyperparameter combinations, and each 5-fold run takes
at least 60 seconds, so the search takes at least 50*60 = 3000
seconds; depths greater than 2 only make it slower.
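The arithmetic behind questions 23 and 24 can be written as a lower bound on grid-search time. A minimal Python sketch, assuming every configuration costs at least as much as the depth-2 model (10 s training plus 2 s prediction per fold); `min_grid_search_time` is a hypothetical helper.

```python
def min_grid_search_time(n_configs, folds=5, train_s=10, predict_s=2):
    """Lower bound on k-fold grid-search time in seconds.

    Uses the depth-2 costs as the cheapest possible per-fold cost;
    deeper trees only take longer.
    """
    return n_configs * folds * (train_s + predict_s)

print(min_grid_search_time(10))      # 10 depth values           -> 600
print(min_grid_search_time(10 * 5))  # 10 depths x 5 learning rates -> 3000
```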

25) Given below is a scenario for training error TE and
validation error VE for a machine learning algorithm M1. You
want to choose a hyperparameter (H) based on TE and VE.

H    TE    VE
1    105   90
2    200   85
3    250   96
4    105   85
5    300   100

Which value of H will you choose based on the above table?

A) 1
B) 2

C) 3

D) 4

E) 5

Solution: (D)

H = 4 has the lowest training error (105, tied with H = 1) and the
lowest validation error (85, tied with H = 2), so it is the best
choice.

26) What would you do in PCA to get the same projection as SVD?

A) Transform data to zero mean

B) Transform data to zero median

C) Not possible

D) None of these

Solution: (A)

When the data has a zero mean vector, PCA gives the same
projections as SVD; otherwise you have to centre the data first
before taking the SVD.
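The equivalence can be checked in NumPy. This is a minimal sketch on synthetic data with a deliberately non-zero mean (an assumption for illustration). PCA is computed from the eigenvectors of the covariance matrix, then compared with the SVD of the centred data. Singular and eigen vectors are only defined up to a sign flip, so columns are compared up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 3-D data with a non-zero mean
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.5, 0.0],
                                          [0.0, 0.5, 0.3]]) + 10.0
Xc = X - X.mean(axis=0)  # centre the data (zero mean vector)

# PCA: eigenvectors of the covariance matrix, sorted by decreasing variance
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
components = eigvecs[:, np.argsort(eigvals)[::-1]]
pca_proj = Xc @ components

# SVD of the centred data: rows of Vt are the principal directions
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_proj = Xc @ Vt.T

def same_up_to_sign(a, b):
    """Compare projections column by column, allowing a sign flip."""
    return all(np.allclose(a[:, j], b[:, j]) or np.allclose(a[:, j], -b[:, j])
               for j in range(a.shape[1]))

print(same_up_to_sign(pca_proj, svd_proj))
```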

Question Context 27-28

Assume there is a black box algorithm, which takes training data
with multiple observations (t1, t2, t3, …, tn) and a new
observation (q1). The black box outputs the nearest neighbor of
q1 (say ti) and its corresponding class label ci.

You can also think of this black box algorithm as the same as 1-NN
(1-nearest neighbor).

27) It is possible to construct a k-NN classification algorithm
based on this black box alone.

Note: Where n (number of training observations) is very large
compared to k.
A) TRUE

B) FALSE

Solution: (A)

In the first step, you pass an observation (q1) to the black box
algorithm, which returns the nearest observation and its class.

In the second step, you throw out the nearest observation from the
training data and input the observation (q1) again. The black box
algorithm returns the next nearest observation and its class.

You need to repeat this procedure k times and then take a majority
vote over the k returned class labels.
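The procedure above can be sketched in Python. `one_nn_black_box` is a hypothetical stand-in for the black box, implemented here with Euclidean distance as an assumption. `knn_from_1nn` only ever calls the black box, removing each returned neighbour before the next call, and then takes a majority vote.

```python
import math
from collections import Counter

def one_nn_black_box(train, q):
    """Stand-in for the 1-NN black box: returns the (point, label) pair nearest to q."""
    return min(train, key=lambda item: math.dist(item[0], q))

def knn_from_1nn(train, q, k):
    """k-NN classification built only from repeated 1-NN queries."""
    remaining = list(train)   # copy, so the caller's training data is untouched
    labels = []
    for _ in range(k):
        neighbour = one_nn_black_box(remaining, q)
        labels.append(neighbour[1])
        remaining.remove(neighbour)  # throw out the neighbour just returned
    return Counter(labels).most_common(1)[0][0]  # majority vote

train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_from_1nn(train, (1, 1), k=3))  # a
```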

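28) Instead of using the 1-NN black box, we want to use the j-NN
(j>1) algorithm as the black box. Which of the following options is
correct for finding k-NN using j-NN?

1. j must be a proper factor of k

2. j > k

3. Not possible

A) 1

B) 2

C) 3

Solution: (A)

The procedure is the same as in question 27: each call returns j
neighbours, which are then removed from the training data. If j is
a proper factor of k, k/j calls return exactly the k nearest
neighbours.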

29) Suppose you are given 7 scatter plots 1-7 (left to right)
and you want to compare Pearson correlation coefficients between
variables of each scatterplot.

Which of the following is in the right order?

1. 1<2<3<4

2. 1>2>3>4

3. 7<6<5<4

4. 7>6>5>4

A) 1 and 3

B) 2 and 3

C) 1 and 4

D) 2 and 4

Solution: (B)

From image 1 to 4 the correlation decreases (in absolute value).
From image 4 to 7 the magnitude of the correlation increases, but
the values are negative (for example, 0, -0.3, -0.7, -0.99).

30) You can evaluate the performance of a binary class
classification problem using different metrics such as accuracy,
log-loss, F-Score. Let’s say you are using the log-loss
function as the evaluation metric.

Which of the following options is / are true for the interpretation
of log-loss as an evaluation metric?

1. If a classifier is confident about an incorrect
classification, then log-loss will penalise it heavily.

2. For a particular observation, if the classifier assigns a very
small probability to the correct class, then the
corresponding contribution to the log-loss will be very
large.

3. Lower the log-loss, the better is the model.

A) 1 and 3

B) 2 and 3

C) 1 and 2

D) 1,2 and 3

Solution: (D)

All three statements are true. The contribution of an observation
to log-loss is -log(p), where p is the probability assigned to its
correct class, so it grows without bound as p approaches 0 (points
1 and 2), and a lower log-loss indicates a better model (point 3).

Question 31-32

Below are five samples given in the dataset.

Note: Visual distance between the points in the image represents
the actual distance.

31) Which of the following is the leave-one-out cross-validation
accuracy for 3-NN (3-nearest neighbor)?

A) 0

B) 0.4

C) 0.8

D) 1

Solution: (C)

In Leave-One-Out cross validation, we select (n-1) observations
for training and 1 observation for validation. Consider each point
as a validation point and find the 3 nearest points to it. If you
repeat this procedure for all points, you get the correct
classification for all of the positive class given in the above
figure, but the negative class will be misclassified. Hence you get
80% accuracy.

32) Which of the following values of K will have the lowest
leave-one-out cross validation accuracy?

A) 1NN

B) 3NN

C) 4NN

D) All have same leave one out error

Solution: (A)

In 1-NN every point will be misclassified, which means that you
will get 0% accuracy.

33) Suppose you are given the below data and you want to apply a
logistic regression model for classifying it in two given
classes.
You are using logistic regression with L1 regularization, where C
is the regularization parameter and w1 & w2 are the coefficients
of x1 and x2. Here a larger C means a stronger L1 penalty.

Which of the following options is correct when you increase the
value of C from zero to a very large value?

A) First w2 becomes zero and then w1 becomes zero

B) First w1 becomes zero and then w2 becomes zero

C) Both become zero at the same time

D) Both cannot be zero even after very large value of C

Solution: (B)

By looking at the image, we see that even using just x2, we
can efficiently perform classification. So w1 will become 0 first.
As the regularization parameter increases further, w2 will come
closer and closer to 0.

34) Suppose we have a dataset which can be trained with 100%
accuracy with the help of a decision tree of depth 6. Now consider
the points below and choose the option based on these points.

Note: All other hyperparameters are the same and other factors are
not affected.

1. Depth 4 will have high bias and low variance

2. Depth 4 will have low bias and low variance

A) Only 1

B) Only 2

C) Both 1 and 2

D) None of the above

Solution: (A)

If you fit a decision tree of depth 4 to such data, it is more
likely to underfit the data. In case of underfitting you will have
high bias and low variance.

35) Which of the following options can be used to get the global
minimum in the k-Means algorithm?

1. Try to run algorithm for different centroid initialization

2. Adjust number of iterations

3. Find out the optimal number of clusters

A) 2 and 3

B) 1 and 3

C) 1 and 2

D) All of above

Solution: (D)

All of these options can be tuned to help find the global minimum.
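Option 1 (multiple centroid initializations) is the most common in practice. The sketch below is a minimal NumPy illustration, not a library API. It runs Lloyd's algorithm from several random starts and keeps the run with the lowest within-cluster sum of squares (inertia). Restarts make a good local optimum far more likely but do not guarantee the global one. The three-blob data is synthetic and assumed only for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """One run of Lloyd's algorithm from a random initialization."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every point to every centroid: shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    inertia = d2.min(axis=1).sum()  # within-cluster sum of squares
    return centroids, d2.argmin(axis=1), inertia

def kmeans_restarts(X, k, n_init=10):
    """Run k-means from several initializations and keep the lowest inertia."""
    runs = [kmeans(X, k, seed=s) for s in range(n_init)]
    return min(runs, key=lambda run: run[2])

rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])
best_centroids, best_labels, best_inertia = kmeans_restarts(X, k=3)
print(best_inertia)
```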

36) Imagine you are working on a project which is a binary
classification problem. You trained a model on the training dataset
and got the below confusion matrix on the validation dataset.

Based on the above confusion matrix, choose which option(s)
below are correct.

1. Accuracy is ~0.91

2. Misclassification rate is ~0.91

3. False positive rate is ~0.95

4. True positive rate is ~0.95

A) 1 and 3

B) 2 and 4
C) 1 and 4

D) 2 and 3

Solution: (C)

The accuracy (correct classification) is (50+100)/165, which is
nearly equal to 0.91.

The true positive rate is the fraction of actual positives that you
predict as positive, so it is 100/105 ≈ 0.95; it is also known as
“Sensitivity” or “Recall”.
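The four quantities in the options can be computed from confusion-matrix counts. The matrix itself is not reproduced in this document. The counts below (TN = 50, FP = 10, FN = 5, TP = 100) are inferred from the figures quoted in the solution: 165 in total, 150 correct, and 105 actual positives of which 100 are predicted positive. Treat them as an assumption.

```python
# Counts inferred from the totals quoted above (assumption: TN=50, FP=10, FN=5, TP=100)
tn, fp, fn, tp = 50, 10, 5, 100

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
misclassification = (fp + fn) / total
tpr = tp / (tp + fn)   # sensitivity / recall
fpr = fp / (fp + tn)

print(f"accuracy={accuracy:.2f} misclassification={misclassification:.2f} "
      f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

Only statements 1 and 4 match these values, consistent with option C.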

37) For which of the following hyperparameters is a higher value
better for the decision tree algorithm?

1. Number of samples used for split

2. Depth of tree

3. Samples for leaf

A) 1 and 2

B) 2 and 3

C) 1 and 3

D) 1, 2 and 3

E) Can’t say

Solution: (E)

For all three hyperparameters, increasing the value does not
necessarily improve performance. For example, if we have a very
high value of depth of tree, the resulting tree may overfit the
data and would not generalize well. On the other hand, if we have
a very low value, the tree may underfit the data. So, we can’t say
for sure that “higher is better”.

Context 38-39
Imagine you have a 28 * 28 image and you run a 3 * 3
convolution on it with an input depth of 3 and an
output depth of 8.

Note: Stride is 1 and you are using same padding.

38) What is the dimension of the output feature map when you are
using the given parameters?

A) 28 width, 28 height and 8 depth

B) 13 width, 13 height and 8 depth

C) 28 width, 13 height and 8 depth

D) 13 width, 28 height and 8 depth

Solution: (A)

The formula for calculating output size is

output size = (N – F + 2P)/S + 1

where N is input size, F is filter size, P is padding and S is
stride. With same padding, P = 1 for a 3 * 3 filter, so the output
size is (28 – 3 + 2)/1 + 1 = 28, and the output depth equals the
number of filters, 8.
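The output-size formula can be wrapped in a small helper. This is a minimal Python sketch; flooring the division for strides that do not divide evenly is the usual convention, assumed here. The parameters of question 39 are not reproduced in this document. No padding with stride 2 is just one hypothetical setting that gives 13.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * padding) // stride + 1

# 28x28 input, 3x3 filter, stride 1, "same" padding (p = 1 for a 3x3 filter)
side = conv_output_size(28, 3, stride=1, padding=1)
out_depth = 8  # the output depth equals the number of filters
print(side, side, out_depth)            # 28 28 8
print(conv_output_size(28, 3, stride=2))  # hypothetical: no padding, stride 2 -> 13
```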

39) What are the dimensions of the output feature map when you are
using the following parameters?

A) 28 width, 28 height and 8 depth

B) 13 width, 13 height and 8 depth

C) 28 width, 13 height and 8 depth

D) 13 width, 28 height and 8 depth

Solution: (B)

Same as above

40) Suppose we were plotting the visualization for different
values of C (Penalty parameter) in the SVM algorithm. For some
reason, we forgot to tag the C values with visualizations. In
that case, which of the following options best explains the C
values for the images below (1,2,3 left to right, so C values
are C1 for image1, C2 for image2 and C3 for image3) in case of
rbf kernel?

A) C1 = C2 = C3

B) C1 > C2 > C3

C) C1 < C2 < C3

D) None of these

Solution: (C)

C is the penalty parameter of the error term. It controls the
trade-off between a smooth decision boundary and classifying the
training points correctly. For large values of C, the
optimization will choose a smaller-margin hyperplane.

1. Which of the following is a widely used and effective machine learning algorithm
based on the idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
ANSWER: D

2. The most widely used metrics and tools to assess a classification model are:
A. Confusion matrix
B. Cost-sensitive accuracy
C. Area under the ROC curve
D. All of these
ANSWER: D

3. Which of the following is a good test dataset characteristic?
A. Large enough to yield meaningful results
B. Is representative of the dataset as a whole
C. Both A and B
D. None of these
ANSWER: C

4. How do you handle missing or corrupted data in a dataset?
A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of these
ANSWER: D

5. What is the purpose of performing cross-validation?
A. To assess the predictive performance of the models
B. To judge how the trained model performs outside the sample on test data
C. Both A and B
D. None of these
ANSWER: C

6. Statistical significance is
A. The science of collecting, organizing and applying numerical facts
B. Measure of the probability that a certain hypothesis is incorrect given certain
observations
C. One of the defining aspects of a data warehouse, which is specially built around
all the existing applications of the operational data
D. None of these
ANSWER: B

7. Which of the following is an example of feature extraction?
A. Constructing bag of words vector from an email
B. Applying PCA to project large high-dimensional data
C. Removing stopwords in a sentence
D. All of these
ANSWER: D

8. How can you prevent a clustering algorithm from getting stuck in bad local optima?
A. Set the same seed value for each run
B. Use multiple random initializations
C. Both A and B
D. None of these
ANSWER: B

9. Adaptive system management is
A. It uses machine learning techniques, and programs can learn from past experience and
adapt themselves to new situations
B. Computational procedure that takes some value as input and produces some value as
output
C. Science of making machines perform tasks that would require intelligence when
performed by humans
D. None of these
ANSWER: A

10. Binary attributes are
A. Attributes that take only two values. In general, these values will be 0 and 1, and they can
be coded as one bit
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
ANSWER: A

11. Background knowledge refers to
A. Additional knowledge used by a learning algorithm to facilitate the learning
process
B. Neural network that makes use of a hidden layer
C. It is a form of automatic learning
D. None of these
ANSWER: A

12. Classification is
A. Subdivision of a set of examples into a number of classes
B. Measure of the accuracy of the classification of a concept that is given by a
certain theory
C. The task of assigning a classification to a set of examples
D. None of these
ANSWER: A

13. Classification accuracy is
A. Subdivision of a set of examples into a number of classes
B. Measure of the accuracy of the classification of a concept that is given by a
certain theory
C. The task of assigning a classification to a set of examples
D. None of these
ANSWER: B

14. Cluster is
A. Group of similar objects that differ significantly from other objects
B. Operations on a database to transform or simplify data in order to prepare it for
a machine-learning algorithm
C. Symbolic representation of facts or ideas from which information can potentially
be extracted
D. None of these
ANSWER: A

15. Suppose you are given an EM algorithm that finds maximum likelihood estimates for
a model with latent variables. You are asked to modify the algorithm so that it finds MAP
estimates instead. Which step or steps do you need to modify?
A. Expectation
B. Maximization
C. No modification necessary
D. Both A & B
ANSWER: B

16. Compared to the variance of the Maximum Likelihood Estimate (MLE), the variance
of the Maximum A Posteriori (MAP) estimate is ________
A. Higher
B. Same
C. Lower
D. It could be any of the above
ANSWER: C

17. Incremental learning refers to
A. Machine-learning involving different techniques
B. The learning algorithm analyzes the examples on a systematic basis and makes
incremental adjustments to the theory that is learned
C. Learning by generalizing from examples
D. None of these
ANSWER: B

18. Inductive learning is
A. Machine-learning involving different techniques
B. The learning algorithm analyzes the examples on a systematic basis and makes
incremental adjustments to the theory that is learned
C. Learning by generalizing from examples
D. None of these
ANSWER: C

19. Predicting whether or not it will rain tomorrow evening at a particular time
is a type of _________ problem.
A. Classification
B. Regression
C. Unsupervised learning
D. All of these
ANSWER: A

20. Machine learning is
A. An algorithm that can learn
B. Sub-discipline of computer science that deals with the design and implementation
of learning algorithms
C. An approach that abstracts from the actual strategy of an individual algorithm and
can therefore be applied to any other form of machine learning.
D. None of these
ANSWER: B

21. A feature F1 can take certain values: A, B, C, D, E, & F and represents the grade of
students from a college. Which of the following statements is true in this case?
A. Feature F1 is an example of nominal variable.
B. Feature F1 is an example of ordinal variable.
C. It doesn’t belong to any of the above category.
D. Both of A & B
ANSWER: B

22. If your training loss increases with the number of epochs, which of the following could
be a possible issue with the learning process?
A. Regularization is too low and model is overfitting
B. Regularization is too high and model is underfitting
C. Step size is too large
D. Step size is too small
ANSWER: C

23. Given a large dataset of medical records from patients suffering from heart disease,
try to learn whether there might be different clusters of such patients for which we might
tailor separate treatments. What kind of learning problem is this?
A. Supervised learning
B. Unsupervised learning
C. Both A and B
D. None of these
ANSWER: B

24. Multi-dimensional knowledge is
A. A class of learning algorithms that try to derive a Prolog program from examples
B. A table with n independent attributes can be seen as an n-dimensional space
C. A prediction made using an extremely simple method, such as always predicting the
same output
D. None of these
ANSWER: B

25. The mutual information
A. Is symmetric
B. Is always non-negative
C. Both A and B
D. None of these
ANSWER: C

26. Classifying email as spam, labeling webpages based on their content, and voice
recognition are examples of _____.
A. Supervised learning
B. Unsupervised learning
C. Machine learning
D. Deep learning
ANSWER: A

27. Deep learning is a subfield of machine learning whose algorithms are
inspired by the structure and function of the brain, called _____.
A. Machine learning
B. Artificial neural networks
C. Deep learning
D. Robotics
ANSWER: B

28. The term "machine learning" was coined by _____.
A. John McCarthy
B. Niklaus Wirth
C. Joseph Weizenbaum
D. Arthur Samuel
ANSWER: D

29. When the number of output classes is greater than two, there are two main possibilities
to manage a classification problem:
A. One-vs-all, One-vs-one
B. One-vs-one, Many-vs-one
C. One-vs-many, Many-vs-one
D. None of these
ANSWER: A

30. For a neural network, which one of these structural assumptions is the one that
most affects the trade-off between underfitting (i.e. a high bias model) and overfitting
(i.e. a high variance model):
A. The learning rate
B. The number of hidden nodes
C. The initial choice of weights
D. The use of a constant-term unit input
ANSWER: B

31. ___________ refers to a model that can neither model the training data nor
generalize to new data.
A. Good fitting
B. Overfitting
C. Underfitting
D. All of these
ANSWER: C

32. Given two Boolean random variables, A and B, where P(A) = 1/2, P(B) = 1/3, and P(A
| ¬B) = 1/4, what is P(A | B)?
A. 1/6
B. 1/4
C. 3/4
D. 1
ANSWER: D
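The answer follows from the law of total probability, P(A) = P(A|B)P(B) + P(A|¬B)P(¬B), solved for P(A|B). A minimal sketch using exact fractions (the variable names are illustrative only):

```python
from fractions import Fraction

p_a = Fraction(1, 2)              # P(A)
p_b = Fraction(1, 3)              # P(B)
p_a_given_not_b = Fraction(1, 4)  # P(A | not B)

# Law of total probability: P(A) = P(A|B) P(B) + P(A|not B) P(not B).
# Rearranged to isolate P(A|B):
p_a_given_b = (p_a - p_a_given_not_b * (1 - p_b)) / p_b

print(p_a_given_b)  # 1
```

Working step by step: 1/2 - (1/4)(2/3) = 1/3, and (1/3) / (1/3) = 1, which is option D.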

33. Suppose your model is overfitting. Which of the following is NOT a valid way to
try and reduce the overfitting?
A. Increase the amount of training data
B. Improve the optimization algorithm being used for error minimization
C. Decrease the model complexity
D. Reduce the noise in the training data
ANSWER: B

34. Predicting whether or not it will rain tomorrow evening at a particular time
is a type of _________ problem.
A. Classification
B. Regression
C. Unsupervised learning
D. All of these
ANSWER: A

35. Given a large dataset of medical records from patients suffering from heart disease,
try to learn whether there might be different clusters of such patients for which we might
tailor separate treatments. What kind of learning problem is this?
A. Supervised learning
B. Unsupervised learning
C. Both A and B
D. Neither A nor B
ANSWER: B

36. Given a large dataset of medical records from patients suffering from heart disease,
try to learn whether there might be different clusters of such patients for which we might
tailor separate treatments. What kind of learning problem is this?
A. Supervised learning
B. Unsupervised learning
C. Both A and B
D. Neither A nor B
ANSWER: B

37. Which of the following is NOT supervised learning?
A. Decision Tree
B. PCA
C. Linear Regression
D. Naive Bayesian
ANSWER: B

38. In 1984, the computer scientist _______ proposed a mathematical approach to
determine whether a problem is learnable by a computer.
A. John McCarthy
B. Niklaus Wirth
C. L. Valiant
D. Arthur Samuel
ANSWER: C

39. In binary classification, which error measure or loss function is used?
A. Non-negative error measure
B. Mean square error
C. Zero-one loss
D. None of these
ANSWER: C
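The zero-one loss charges 1 for every misclassified example and 0 for every correct one; averaged over a dataset it equals the misclassification rate. A minimal sketch (the function name and the sample labels are illustrative assumptions):

```python
def zero_one_loss(y_true, y_pred):
    """Average zero-one loss: 1 for each misclassified example, 0 otherwise."""
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)

loss = zero_one_loss([1, 0, 1, 1], [1, 1, 1, 0])
print(loss)  # 0.5
```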

40. Benefits of parametric machine learning algorithms:
A. Complex, slow, more training data
B. Simpler, faster, less training data
C. Both A and B
D. Neither A nor B
ANSWER: B

41. Limitations of parametric machine learning algorithms are:
A. Highly Constrained
B. Limited Complexity
C. Poor Fit
D. All of these
ANSWER: D

42. Artificial neural networks are an example of:
A. Nonparametric model
B. Parametric models
C. Both A and B
D. None of these
ANSWER: B

43. Benefits of non-parametric machine learning algorithms:
A. More data, Slower, Overfitting
B. Flexibility, Power, Performance
C. Both A and B
D. Neither A nor B
ANSWER: B

44. Limitations of non-parametric machine learning algorithms:
A. More data, Slower, Overfitting
B. Flexibility, Power, Performance
C. Both A and B
D. Neither A nor B
ANSWER: A

45. Naive Bayes is an example of:
A. Nonparametric model
B. Parametric models
C. Both A and B
D. Neither A nor B
ANSWER: B

46. Which of the following is a wrong statement about the maximum likelihood approach?
A. This method doesn’t always involve probability calculations
B. It finds a tree that best accounts for the variation in a set of sequences
C. The method is similar to the maximum parsimony method
D. The analysis is performed on each column of a multiple sequence alignment
ANSWER: A

47. The main disadvantage of maximum likelihood methods is that they are _____
A. Mathematically less folded
B. Mathematically less complex
C. Computationally lucid
D. Computationally intense
ANSWER: D

48. Which learning is often preferable to MAP learning?
A. Expectation-maximization
B. Log-likelihood (L)
C. Maximum-likelihood (ML)
D. None of these
ANSWER: C

49. Which of the following is a measure used in information theory?
A. Entropy
B. Cross-entropy
C. Conditional entropy
D. All of these
ANSWER: D

50. Which measure uses bits in information theory?
A. Entropy
B. Cross-entropy
C. Conditional entropy
D. All of these
ANSWER: A
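Entropy is the measure classically quoted in bits (base-2 logarithm), and cross-entropy and conditional entropy are built from the same quantity. A small sketch computing all three, assuming made-up distributions chosen only for illustration:

```python
import math

def entropy(p):
    # H(p) = -sum p_i log2 p_i, in bits; terms with p_i = 0 contribute nothing.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum p_i log2 q_i: bits needed to code samples from p with a code built for q.
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def conditional_entropy(joint):
    # H(Y | X) = sum_x P(x) H(Y | X = x), where joint[x][y] = P(X = x, Y = y).
    total = 0.0
    for row in joint:
        px = sum(row)
        if px > 0:
            total += px * entropy([pxy / px for pxy in row])
    return total

fair_coin = [0.5, 0.5]
h = entropy(fair_coin)                       # 1.0 bit
ce = cross_entropy(fair_coin, [0.25, 0.75])  # about 1.2075 bits
joint = [[0.25, 0.25],  # X = 0
         [0.50, 0.00]]  # X = 1
hc = conditional_entropy(joint)              # 0.5 bits
```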

MCQ questions for unit 3: Regression

Multiple choice questions

1) True-False: Linear Regression is a supervised machine learning algorithm.

A) TRUE
B) FALSE

Solution: (A)

Yes, linear regression is a supervised learning algorithm because it uses true labels for
training. A supervised learning algorithm needs an input variable (x) and an output variable
(Y) for each example.

2) True-False: Linear Regression is mainly used for Regression.

A) TRUE
B) FALSE

Solution: (A)

Linear regression models a dependent variable that takes continuous values.

3) True-False: Is it possible to design a linear regression algorithm using a neural
network?

A) TRUE
B) FALSE

Solution: (A)

True. A neural network can be used as a universal approximator, so it can definitely
implement a linear regression algorithm.
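In fact no hidden layer is needed: a single neuron with an identity activation computes w * x + b, and training it on the mean squared error is exactly least-squares linear regression. A minimal sketch, assuming plain batch gradient descent and a hand-picked learning rate and epoch count:

```python
def train_linear_neuron(xs, ys, lr=0.05, epochs=2000):
    """A single neuron with identity activation: y_hat = w * x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = train_linear_neuron(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```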

4) Which of the following methods do we use to find the best fit line for data in Linear
Regression?

A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B

Solution: (A)

In linear regression, we try to minimize the least square errors of the model to identify the
line of best fit.

5) Which of the following evaluation metrics can be used to evaluate a model while
modeling a continuous output variable?

A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error

Solution: (D)

Since linear regression gives continuous output values, we use the mean squared error
metric to evaluate model performance. The remaining options are used for
classification problems.
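Mean squared error is simply the average of the squared differences between actual and predicted values. A minimal sketch with made-up numbers:

```python
def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between actual and predicted values.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

mse = mean_squared_error([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 3.0, 2.0])
print(mse)  # 1.25
```

Here the errors are 0, -1, 0 and 2, so the squared errors sum to 5 and their mean is 5 / 4 = 1.25.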

6) True-False: Lasso regularization can be used for variable selection in linear
regression.

A) TRUE
B) FALSE

Solution: (A)

True. Lasso regression applies an absolute-value (L1) penalty, which can shrink some of the
coefficients to exactly zero.

7) Which of the following is true about residuals?

A) Lower is better
B) Higher is better
C) A or B depend on the situation
D) None of these

Solution: (A)

Residuals refer to the error values of the model. Therefore lower residuals are desired.

8) Suppose that we have N independent variables (X1, X2, …, Xn) and the dependent variable is
Y. Now imagine that you are applying linear regression by fitting the best-fit line using least
square error on this data.

You found that the correlation coefficient for one of its variables (say X1) with Y is -0.95.

Which of the following is true for X1?

A) Relation between X1 and Y is weak
B) Relation between X1 and Y is strong
C) Relation between X1 and Y is neutral
D) Correlation can’t judge the relationship

Solution: (B)

The absolute value of the correlation coefficient denotes the strength of the relationship.
Since the absolute correlation is very high, the relationship between X1 and Y is strong (the
negative sign only means that Y tends to decrease as X1 increases).

9) You are given two variables V1 and V2 that have the following two characteristics:

1. If V1 increases, then V2 also increases

2. If V1 decreases, then V2’s behavior is unknown

Looking at these two characteristics, which of the following options is correct for the
Pearson correlation between V1 and V2?

A) Pearson correlation will be close to 1
B) Pearson correlation will be close to -1
C) Pearson correlation will be close to 0
D) None of these

Solution: (D)

We cannot determine the correlation coefficient from statement 1 alone; we need to
consider both statements. Consider V1 as x and V2 as |x|. The correlation
coefficient would not be close to 1 in such a case.

10) Suppose the Pearson correlation between V1 and V2 is zero. In such a case, is it right
to conclude that V1 and V2 do not have any relation between them?

A) TRUE
B) FALSE

Solution: (B)

The Pearson correlation coefficient between two variables might be zero even when they have a
relationship between them. A zero coefficient only means that there is no linear
relationship between them. Examples include y = |x| or y = x^2.
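The y = x^2 case can be checked directly. This sketch computes the Pearson coefficient from its definition (covariance divided by the product of the standard deviations), assuming x values placed symmetrically around zero:

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # y is completely determined by x
print(pearson(xs, ys))  # 0.0
```

The symmetry matters: if all x values were positive, x and x^2 would be strongly positively correlated.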

11) Which of the following offsets do we use in linear regression’s least-squares line
fit? Suppose the horizontal axis is the independent variable and the vertical axis is the dependent
variable.

A) Vertical offset
B) Perpendicular offset
C) Both, depending on the situation
D) None of the above

Solution: (A)

We always consider residuals as vertical offsets: the direct differences
between the actual Y values and the values predicted by the line. Perpendicular offsets are useful in PCA.

12) True-False: Is overfitting more likely when you have a huge amount of data to
train on?

A) TRUE
B) FALSE

Solution: (B)

With a small training dataset, it’s easier to find a hypothesis that fits the training data exactly,
i.e. to overfit.

13) We can also compute the coefficient of linear regression with the help of an
analytical method called “Normal Equation”. Which of the following is/are true about
Normal Equation?

1. We don’t have to choose the learning rate

2. It becomes slow when number of features is very large

3. There is no need to iterate

A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1,2 and 3

Solution: (D)

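Instead of gradient descent, the Normal Equation, θ = (XᵀX)⁻¹Xᵀy, can also be used to find the
coefficients. It needs no learning rate and no iterations, but computing the inverse of XᵀX
becomes slow when the number of features is very large.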
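A minimal NumPy sketch of the Normal Equation, assuming a tiny hand-made dataset with an intercept column. It solves the linear system (XᵀX)θ = Xᵀy rather than forming the inverse explicitly, which is the numerically safer equivalent:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])         # generated from y = 1 + 2x
X = np.column_stack([np.ones_like(x), x])  # design matrix: intercept column + feature

# Normal equation (X^T X) theta = X^T y: no learning rate, no iterations.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [1. 2.]
```

The cost is in building and solving the n-by-n system XᵀX for n features, which is why the method slows down when the number of features is very large.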

14) Which of the following statements is true about the sum of residuals of A and B?

The graphs below show two fitted regression lines (A & B) on randomly generated data. Now, I
want to find the sum of residuals in both cases A and B.

Note:

1. The scale is the same in both graphs for both axes.

2. The X axis is the independent variable and the Y axis is the dependent variable.

A) A has a higher sum of residuals than B
B) A has a lower sum of residuals than B
C) Both have the same sum of residuals
D) None of these

Solution: (C)

For a least-squares line that includes an intercept, the sum of residuals is always zero, so both
have the same sum of residuals.
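The zero-sum property can be checked numerically. This sketch fits an ordinary least-squares line with an intercept to made-up data and sums the residuals; without an intercept term the property does not hold in general:

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = intercept + slope * x.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(sum(residuals))  # zero up to floating-point rounding
```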

Question Context 15-17:

Suppose you have fitted a complex regression model on a dataset. Now, you are using
Ridge regression with penalty x.

15) Choose the option that best describes the bias.

A) In case of very large x, bias is low
B) In case of very large x, bias is high
C) We can’t say about bias
D) None of these

Solution: (B)

If the penalty is very large, the model is less complex, so the bias will be high.

16) What will happen when you apply a very large penalty?

A) Some of the coefficients will become absolute zero
B) Some of the coefficients will approach zero but not become absolute zero
C) Both A and B depending on the situation
D) None of these

Solution: (B)

In lasso some of the coefficient values become zero, but in ridge the coefficients
become close to zero without becoming exactly zero.

17) What will happen when you apply a very large penalty in the case of Lasso?
A) Some of the coefficients will become zero
B) Some of the coefficients will approach zero but not become absolute zero
C) Both A and B depending on the situation
D) None of these

Solution: (A)

As already discussed, lasso applies an absolute penalty, so some of the coefficients will
become zero.
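The ridge/lasso contrast in questions 16 and 17 has a closed form in one special case. This sketch assumes an orthonormal design (XᵀX = I), the ridge objective ||y - Xb||² + λ||b||² and the lasso objective ½||y - Xb||² + λ||b||₁. Under those assumptions ridge rescales every least-squares coefficient, while lasso soft-thresholds them. The starting coefficients are hypothetical:

```python
import math

def ridge_orthonormal(beta_ols, lam):
    # Ridge with X^T X = I: b = beta_ols / (1 + lam), shrunk but never exactly zero.
    return [b / (1.0 + lam) for b in beta_ols]

def lasso_orthonormal(beta_ols, lam):
    # Lasso with X^T X = I: soft-thresholding, so any |b| <= lam becomes exactly zero.
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in beta_ols]

beta_ols = [3.0, -0.5, 1.2]  # hypothetical least-squares coefficients
lam = 1.0
ridge = ridge_orthonormal(beta_ols, lam)
lasso = lasso_orthonormal(beta_ols, lam)
print(ridge)  # [1.5, -0.25, 0.6]
print(lasso)  # the middle coefficient is exactly zero
```

For a general design matrix there is no such simple formula for lasso, but the same qualitative behavior holds.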

18) Which of the following statements is true about outliers in linear regression?

A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these

Solution: (A)

In most cases the slope of the regression line changes because of outliers, so linear
regression is sensitive to outliers.
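A quick illustration with made-up data: replacing a single point with an outlier triples the least-squares slope, because squared errors give large residuals a disproportionate weight.

```python
def least_squares_slope(xs, ys):
    # Slope of the ordinary least-squares line: S_xy / S_xx.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
clean = [2.0, 4.0, 6.0, 8.0, 10.0]
with_outlier = [2.0, 4.0, 6.0, 8.0, 30.0]  # last point replaced by an outlier

print(least_squares_slope(xs, clean))         # 2.0
print(least_squares_slope(xs, with_outlier))  # 6.0
```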

19) Suppose you plotted a scatter plot of the residuals against the predicted values in
linear regression and found that there is a relationship between them. Which
conclusion would you draw in this situation?

A) Since there is a relationship, our model is not good
B) Since there is a relationship, our model is good
C) Can’t say
D) None of these

Solution: (A)

There should not be any relationship between the predicted values and the residuals. If there is
any relationship between them, the model has not fully captured the
information in the data.

Question Context 20-22:

Suppose that you have a dataset D1, you design a linear regression model with a degree 3
polynomial, and you find that the training and testing error is “0”; in other words, it
fits the data perfectly.

20) What will happen when you fit a degree 4 polynomial in linear regression?
A) There is a high chance that the degree 4 polynomial will overfit the data
B) There is a high chance that the degree 4 polynomial will underfit the data
C) Can’t say
D) None of these

Solution: (A)

Since a degree 4 polynomial is more complex than the degree 3 model, it will again fit the
training data perfectly and may overfit. In such a case the training error will be zero, but the
test error may not be zero.

21) What will happen when you fit a degree 2 polynomial in linear regression?
A) There is a high chance that the degree 2 polynomial will overfit the data
B) There is a high chance that the degree 2 polynomial will underfit the data
C) Can’t say
D) None of these

Solution: (B)

If a degree 3 polynomial fits the data perfectly, it’s highly likely that a simpler model (a degree
2 polynomial) will underfit the data.
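The underfitting claim can be reproduced with NumPy's polynomial least-squares fit. This sketch assumes data generated exactly by the cubic y = x³ - x on nine evenly spaced points (the dataset D1 itself is not given):

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 9)
y = x ** 3 - x  # data generated exactly by a cubic

def training_mse(degree):
    coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

mse_deg3 = training_mse(3)  # essentially zero: the model family contains the true curve
mse_deg2 = training_mse(2)  # clearly positive: too simple, so it underfits
```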

22) In terms of bias and variance, which of the following is true when you fit a degree 2
polynomial?

A) Bias will be high, variance will be high
B) Bias will be low, variance will be high
C) Bias will be high, variance will be low
D) Bias will be low, variance will be low

Solution: (C)

Since a degree 2 polynomial is less complex than a degree 3 polynomial, the bias will be
high and the variance will be low.

Question Context 23:

The graphs below (A, B, C from left to right) show the cost function against the number of
iterations.

23) Suppose l1, l2 and l3 are the learning rates for A, B and C respectively. Which of
the following is true about l1, l2 and l3?

A) l2 < l1 < l3
B) l1 > l2 > l3
C) l1 = l2 = l3
D) None of these

Solution: (A)

With a high learning rate the step is large: the objective function decreases quickly at
first, but it overshoots the minimum and starts increasing after a few iterations.

With a low learning rate the step is small, so the objective function decreases
slowly.
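The effect of the learning rate can be reproduced on a toy objective. This sketch assumes f(w) = w², not the cost function behind the original graphs, and the three learning rates are arbitrary choices:

```python
def gradient_descent_costs(lr, w0=1.0, steps=5):
    """Minimize f(w) = w**2 (gradient 2w) and record the cost after each step."""
    w = w0
    costs = []
    for _ in range(steps):
        w -= lr * 2 * w
        costs.append(w ** 2)
    return costs

slow = gradient_descent_costs(0.01)      # small steps: cost falls slowly
fast = gradient_descent_costs(0.3)       # moderate steps: cost falls quickly
diverging = gradient_descent_costs(1.1)  # too large: each step overshoots and cost grows
```

On this objective each update multiplies w by (1 - 2·lr), so any learning rate above 1 makes |w| grow and the cost rise on every iteration.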

Question Context 24-25:

We have been given a dataset with n records in which the input attribute is x and the
output attribute is y. Suppose we use a linear regression method to model this data. To test
our linear regressor, we split the data randomly into a training set and a test set.

24) Now we increase the training set size gradually. As the training set size increases,
what do you expect will happen with the mean training error?

A) Increase
B) Decrease
C) Remain constant
D) Can’t say

Solution: (D)

Training error may increase or decrease depending on the values that are used to fit the
model. If the values added to the training set gradually contain more outliers, the error might
increase.

25) What do you expect will happen with bias and variance as you increase the size
of the training data?

A) Bias increases and variance increases
B) Bias decreases and variance increases
C) Bias decreases and variance decreases
D) Bias increases and variance decreases
E) Can’t say

Solution: (D)

As we increase the size of the training data, the bias would increase while the variance
would decrease.

Question Context 26:

Consider the following data, where one input (X) and one output (Y) are given.

26) What would be the root mean square training error for this data if you run a
linear regression model of the form Y = A0 + A1X?

A) Less than 0
B) Greater than zero
C) Equal to 0
D) None of these

Solution: (C)

A straight line fits this data perfectly, so the error will be zero.

Question Context 27-28:

Suppose you have been given the following scenarios for training and validation error for
linear regression.

Scenario  Learning Rate  Number of Iterations  Training Error  Validation Error
1         0.1            1000                  100             110
2         0.2            600                   90              105
3         0.3            400                   110             110
4         0.4            300                   120             130
5         0.4            250                   130             150

27) Which of the following scenarios would give you the right hyperparameters?

A) 1
B) 2
C) 3
D) 4

Solution: (B)

Option B (scenario 2) is the best choice because it has the lowest training error as well as the
lowest validation error.

28) Suppose you got the tuned hyperparameters from the previous question. Now,
imagine you want to add a variable to the variable space such that this added feature is
important. Which of the following would you observe in such a case?

A) Training error will decrease and validation error will increase
B) Training error will increase and validation error will increase
C) Training error will increase and validation error will decrease
D) Training error will decrease and validation error will decrease
E) None of the above

Solution: (D)

If the added feature is important, both the training and validation error will decrease.

Question Context 29-30:

Suppose you find that your linear regression model is underfitting the data.

29) In such a situation, which of the following options would you consider?

1. I will add more variables

2. I will start introducing polynomial degree variables

3. I will remove some variables

A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3

Solution: (A)

In case of underfitting, you need to add more variables to the variable space, or you can add
some polynomial degree variables, to make the model more complex so that it can fit the
data better.

30) The situation is the same as in the previous question (underfitting). Which of the
following regularization algorithms would you prefer?

A) L1
B) L2
C) Any
D) None of these

Solution: (D)

I wouldn’t use any regularization method, because regularization is used to combat overfitting.

MCQs on Logistic Regression

1) True-False: Is Logistic regression a supervised machine learning algorithm?

A) TRUE
B) FALSE

Solution: A

True, logistic regression is a supervised learning algorithm because it uses true labels for
training. A supervised learning algorithm needs input variables (x) and a target
variable (Y) when you train the model.

2) True-False: Is Logistic regression mainly used for Regression?

A) TRUE
B) FALSE

Solution: B

Logistic regression is a classification algorithm; don’t be misled by the word “regression” in its name.

3) True-False: Is it possible to design a logistic regression algorithm using a neural
network algorithm?

A) TRUE
B) FALSE

Solution: A

True. A neural network is a universal approximator, so it can implement a logistic regression
algorithm.
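As with linear regression, no hidden layer is needed: a single neuron with a sigmoid activation, trained by minimizing the log loss (the negative log-likelihood), is exactly logistic regression. A minimal sketch on a toy one-feature dataset, assuming plain batch gradient descent with a hand-picked learning rate:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_sigmoid_neuron(xs, ys, lr=0.1, epochs=1000):
    """A single neuron with sigmoid activation, i.e. logistic regression."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss (negative log-likelihood).
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = train_sigmoid_neuron(xs, ys)
probs = [sigmoid(w * x + b) for x in xs]
```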

4) True-False: Is it possible to apply a logistic regression algorithm to a 3-class
classification problem?

A) TRUE
B) FALSE

Solution: A

Yes, we can apply logistic regression to a 3-class classification problem by using the
one-vs-all method.
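A sketch of one-vs-all for three classes: train one binary logistic regression per class (that class against the rest) and predict the class whose model gives the highest probability. The 2-D points, learning rate and epoch count are illustrative assumptions; a library implementation would normally be used in practice.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_binary(points, targets, lr=0.1, epochs=2000):
    """Binary logistic regression on 2-D points, fitted by batch gradient descent."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), t in zip(points, targets):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - t
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

def one_vs_all(points, labels):
    # One binary model per class: that class (target 1) against all others (target 0).
    return {c: train_binary(points, [1 if y == c else 0 for y in labels])
            for c in sorted(set(labels))}

def predict(models, point):
    x1, x2 = point
    # Choose the class whose "this class vs. rest" model gives the highest probability.
    return max(models, key=lambda c: sigmoid(models[c][0] * x1 + models[c][1] * x2 + models[c][2]))

points = [(0.0, 3.0), (0.5, 3.5), (-3.0, -2.0), (-3.5, -2.5), (3.0, -2.0), (3.5, -2.5)]
labels = [0, 0, 1, 1, 2, 2]
models = one_vs_all(points, labels)
print([predict(models, p) for p in points])  # compare with labels
```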

5) Which of the following methods do we use to best fit the data in logistic
regression?

A) Least Square Error
B) Maximum Likelihood
C) Jaccard distance
D) Both A and B

Solution: B

Logistic regression is trained by maximum likelihood estimation.

6) Which of the following evaluation metrics cannot be applied to logistic regression
output to compare it with the target?

A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error

Solution: D

Since logistic regression is a classification algorithm, its target is a class label rather than a
real-valued quantity, so mean squared error is not used to evaluate it.

7) One very good way to analyze the performance of logistic regression
is AIC, which is similar to R-squared in linear regression. Which of the following is
true about AIC?

A) We prefer a model with minimum AIC value
B) We prefer a model with maximum AIC value
C) Both but depend on the situation
D) None of these

Solution: A

We select the logistic regression model with the lowest AIC. For more information,
refer to this source: http://www4.ncsu.edu/~shu3/Presentation/AIC.pdf
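AIC is defined as 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood, so it rewards fit but penalizes extra parameters. A minimal sketch comparing two models; the parameter counts and log-likelihoods are hypothetical numbers, not results from a real fit:

```python
def aic(num_params, log_likelihood):
    # Akaike information criterion: AIC = 2k - 2 ln(L).
    return 2 * num_params - 2 * log_likelihood

# Hypothetical fitted models: (number of parameters, maximized log-likelihood).
candidates = {"model_a": (3, -120.0), "model_b": (5, -118.5)}
scores = {name: aic(k, ll) for name, (k, ll) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)  # {'model_a': 246.0, 'model_b': 247.0} model_a
```

Model B fits slightly better (higher log-likelihood), but its two extra parameters cost more than the improvement, so model A has the lower AIC and is preferred.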

8) [True-False] Standardisation of features is required before training a logistic
regression.

A) TRUE
B) FALSE

Solution: B

Standardization isn’t required for logistic regression. The main goal of standardizing
features is to help the optimization technique converge.

9) Which of the following algorithms do we use for variable selection?

A) LASSO
B) Ridge
C) Both
D) None of these

Solution: A

In the case of lasso we apply an absolute penalty; as the penalty increases, some of the
coefficients of the variables may become zero.

Context: 10-11

Consider the following model for logistic regression: P(y = 1 | x; w) = g(w0 + w1x),
where g(z) is the logistic function.

In the above equation, P(y = 1 | x; w), viewed as a function of x, is a curve whose shape we
can change by changing the parameters w.

10) What would be the range of p in such a case?

A) (0, inf)
B) (-inf, 0)
C) (0, 1)
D) (-inf, inf)

Solution: C

For any real value of x, from −∞ to +∞, the logistic function gives an
output in (0, 1).
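A quick numerical check of the range: this sketch evaluates g(w0 + w1x) over a spread of x values, with w0 and w1 as hypothetical parameter values. The logistic function is written in a numerically stable form so that large negative arguments do not overflow.

```python
import math

def logistic(z):
    # Numerically stable form of g(z) = 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

w0, w1 = -1.0, 2.0  # hypothetical parameters
xs = [-10.0, -1.0, 0.0, 0.5, 1.0, 10.0]
ps = [logistic(w0 + w1 * x) for x in xs]
print(ps)  # every value lies strictly between 0 and 1; x = 0.5 gives exactly 0.5
```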

11) In the above question, which function makes p lie between (0, 1)?

A) logistic function
B) Log likelihood function
C) Mixture of both
D) None of them

Solution: A

The explanation is the same as for question 10.

Context: 12-13

Suppose you train a logistic regression classifier and your hypothesis function H is

12) Which of the following figures will represent the decision boundary given by the
above classifier?

A) to D): [option figures not reproduced]

Solution: B

Option B is the right answer. Our line will be represented by y = g(-6+x2), which
is shown in both option A and option B. Option B is correct because when you
put the value x2 = 6 into the equation you get y = g(0), i.e. y = 0.5, so that point lies on
the line; if you increase x2 beyond 6 you will get negative values, so the output
will be the region y = 0.

13) If you replace the coefficient of x1 with x2, what would the output figure be?

A) to D): [option figures not reproduced]

Solution: D

Same explanation as in the previous question.

14) Suppose you have been given a fair coin and you want to find the odds of
getting heads. Which of the following options is true in such a case?

A) odds will be 0
B) odds will be 0.5
C) odds will be 1
D) None of these

Solution: C

Odds are defined as the ratio of the probability of success to the probability of failure. For
a fair coin, the probability of success is 1/2 and the probability of failure is 1/2, so the odds
are 1.

15) The logit function (given as l(x)) is the log of the odds function. What could be the
range of the logit function on the domain x = [0, 1]?

A) (– ∞ , ∞)
B) (0,1)
C) (0, ∞)
D) (- ∞, 0)

Solution: A

For our purposes, the odds function has the advantage of transforming the probability
function, which has values from 0 to 1, into an equivalent function with values between 0
and ∞. When we take the natural log of the odds function, we get a range of values from -∞
to ∞.
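A minimal sketch of the odds and logit functions described above. The probe probabilities 1e-9 and 1 - 1e-9 are arbitrary choices near the ends of the interval, used to show the logit heading toward -∞ and +∞:

```python
import math

def odds(p):
    # Probability of success divided by probability of failure.
    return p / (1.0 - p)

def logit(p):
    # Log of the odds, defined for 0 < p < 1.
    return math.log(odds(p))

print(odds(0.5))        # 1.0 (the fair coin of question 14)
print(logit(0.5))       # 0.0
print(logit(1e-9))      # large negative number
print(logit(1 - 1e-9))  # large positive number
```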

16) Which of the following options is true?

A) Linear regression error values have to be normally distributed, but in the case of logistic
regression this is not the case
B) Logistic regression error values have to be normally distributed, but in the case of linear
regression this is not the case
C) Both linear regression and logistic regression error values have to be normally
distributed
D) Neither linear regression nor logistic regression error values have to be normally
distributed

Solution: A

Only A is true.

17) Which of the following is true regarding the logistic function for any value “x”?

Note:
Logistic(x): is the logistic function of any number “x”

Logit(x): is the logit function of any number “x”

Logit_inv(x): is the inverse logit function of any number “x”

A) Logistic(x) = Logit(x)
B) Logistic(x) = Logit_inv(x)
C) Logit_inv(x) = Logit(x)
D) None of these

Solution: B
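Option B can be confirmed numerically: applying the logit to the output of the logistic function gives back the original input, so the logistic function is the inverse logit. A minimal sketch with arbitrary test points:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

for x in [-3.0, -0.5, 0.0, 2.0]:
    # logit(logistic(x)) recovers x, so logistic = inverse of logit.
    assert math.isclose(logit(logistic(x)), x, abs_tol=1e-12)
```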

MCQ For UNIT 2

1) A feature F1 can take the values A, B, C, D, E, and F and represents the grades of students
from a college. Which of the following statements is true in this case?

A) Feature F1 is an example of nominal variable.
B) Feature F1 is an example of ordinal variable.
C) It doesn’t belong to any of the above category.
D) Both of these

Solution: (B)

Ordinal variables are variables whose categories have an order. For example,
grade A should be considered a higher grade than grade B.

2) Which of the following is an example of a deterministic algorithm?

A) PCA

B) K-Means

C) None of the above

Solution: (A)

A deterministic algorithm is one whose output does not change across different runs. PCA
gives the same result if we run it again, but k-means does not, because its result depends on
the random initialization.

3) [True or False] The Pearson correlation between two variables is zero, but their
values can still be related to each other.

A) TRUE

B) FALSE

Solution: (A)

Take Y = X^2 with X symmetric about zero. They are not only associated, but one is a function
of the other, and the Pearson correlation between them is 0.

4) Which of the following statement(s) is / are true for Gradient Descent (GD) and
Stochastic Gradient Descent (SGD)?

1. In GD and SGD, you update a set of parameters in an iterative manner to
minimize the error function.

2. In SGD, you have to run through all the samples in your training set for a
single update of a parameter in each iteration.

3. In GD, you either use the entire data or a subset of training data to update a
parameter in each iteration.

A) Only 1

B) Only 2

C) Only 3

D) 1 and 2

E) 2 and 3

F) 1,2 and 3

Solution: (A)

In SGD, each iteration uses a batch that generally contains a random sample of the data,
whereas in GD each iteration uses all of the training observations.
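The difference is easiest to see side by side. This sketch fits y ≈ w * x on noise-free toy data, assuming single-example SGD (a mini-batch version would average the gradient over a small random subset instead). Batch GD computes one update per pass from all examples; SGD updates after every shuffled example:

```python
import random

def batch_gd(xs, ys, lr=0.05, epochs=200):
    """Gradient descent for y ~ w * x: every update uses all training examples."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

def sgd(xs, ys, lr=0.05, epochs=200, seed=0):
    """Stochastic gradient descent: each update uses one randomly ordered example."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated from y = 2x
w_gd = batch_gd(xs, ys)
w_sgd = sgd(xs, ys)
```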

5) Which of the following hyperparameter(s), when increased, may cause a random
forest to overfit the data?

1. Number of Trees

2. Depth of Tree

3. Learning Rate

A) Only 1

B) Only 2

C) Only 3

D) 1 and 2

E) 2 and 3

F) 1,2 and 3

Solution: (B)

Usually, increasing the depth of the trees causes overfitting. Learning rate is not a
hyperparameter of a random forest. Increasing the number of trees generally does not cause
overfitting.

6) Imagine you are working with “Analytics Vidhya” and you want to develop a
machine learning algorithm that predicts the number of views on the articles.

Your analysis is based on features like author name, the number of articles written by the
same author on Analytics Vidhya in the past, and a few other features. Which of the
following evaluation metrics would you choose in that case?

1. Mean Square Error

2. Accuracy

3. F1 Score

A) Only 1

B) Only 2

C) Only 3

D) 1 and 3

E) 2 and 3

F) 1 and 2

Solution: (A)

The number of views of an article is a continuous target variable, so this is a regression
problem. So, mean squared error will be used as the evaluation
metric.

7) Given below are three images (1, 2, 3). Which of the following options is correct for
these images?

[Images 1 to 3 not reproduced]

A) 1 is tanh, 2 is ReLU and 3 is sigmoid activation functions.

B) 1 is sigmoid, 2 is ReLU and 3 is tanh activation functions.

C) 1 is ReLU, 2 is tanh and 3 is sigmoid activation functions.

D) 1 is tanh, 2 is sigmoid and 3 is ReLU activation functions.

Solution: (D)

The range of the sigmoid function is (0, 1).

The range of the tanh function is (-1, 1).

The range of the ReLU function is [0, ∞).

So option D is the right answer.

8) Below are the 8 actual values of the target variable in the training file.

[0,0,0,1,1,1,1,1]

What is the entropy of the target variable?

A) -(5/8 log(5/8) + 3/8 log(3/8))

B) 5/8 log(5/8) + 3/8 log(3/8)


C) 3/8 log(5/8) + 5/8 log(3/8)

D) 5/8 log(3/8) – 3/8 log(5/8)

Solution: (A)

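The formula for entropy is H(Y) = -Σ p_i log(p_i), where p_i is the proportion of class i.
Here p(1) = 5/8 and p(0) = 3/8, so H(Y) = -(5/8 log(5/8) + 3/8 log(3/8)).

So the answer is A.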
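Evaluating option A numerically, assuming base-2 logarithms so that the result is in bits (the question leaves the base unspecified):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: H = -sum p_i * log2(p_i) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

target = [0, 0, 0, 1, 1, 1, 1, 1]
h = entropy(target)
print(round(h, 4))  # 0.9544
```

The value is just under 1 bit, the maximum for two classes, because the 5/3 split is close to balanced.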

9) Let’s say you are working with categorical feature(s) and you have not looked at
the distribution of the categorical variable in the test data.

You want to apply one-hot encoding (OHE) to the categorical feature(s). What
challenges may you face if you have applied OHE to a categorical variable of the training
dataset?

A) All categories of categorical variable are not present in the test dataset.

B) Frequency distribution of categories is different in train as compared to the test dataset.

C) Train and Test always have same distribution.

D) Both A and B

E) None of these

Solution: (D)

Both are true. OHE will fail to encode categories that are present in the test set but not in the
training set, so this could be one of the main challenges when applying OHE. The challenge given
in option B is also real: you need to be more careful when applying OHE if the frequency
distribution is not the same in train and test.
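The first challenge can be seen with a hand-rolled encoder. This sketch learns the category vocabulary from training data only (the city names are hypothetical), then encodes a test value that never appeared in training:

```python
def fit_one_hot(train_values):
    """Learn the category vocabulary from the training data only."""
    return sorted(set(train_values))

def transform_one_hot(categories, values):
    # A category never seen in training matches no column, so its row is all zeros.
    return [[1 if v == c else 0 for c in categories] for v in values]

train_city = ["Delhi", "Mumbai", "Delhi", "Pune"]  # hypothetical data
test_city = ["Mumbai", "Chennai"]                  # "Chennai" never appears in training
categories = fit_one_hot(train_city)
encoded = transform_one_hot(categories, test_city)
print(categories)  # ['Delhi', 'Mumbai', 'Pune']
print(encoded)     # [[0, 1, 0], [0, 0, 0]]
```

The unseen category silently becomes an all-zero row here. For comparison, scikit-learn's `OneHotEncoder` raises an error on unknown categories by default and produces all zeros only when `handle_unknown="ignore"` is set.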

10) The skip-gram model is one of the best models used in the Word2vec algorithm for word
embedding. Which of the following models depicts the skip-gram model?
A) A

B) B

C) Both A and B

D) None of these

Solution: (B)

Both models (model 1 and model 2) are used in the Word2vec algorithm. Model 1 represents
a CBOW model, whereas model 2 represents the skip-gram model.
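The distinction is in the training pairs: skip-gram predicts each surrounding context word from the center word, while CBOW predicts the center word from its context. A minimal sketch that generates skip-gram (center, context) pairs, assuming a symmetric window of size 1 and a toy three-word sentence:

```python
def skip_gram_pairs(tokens, window=1):
    pairs = []
    for i, center in enumerate(tokens):
        # Skip-gram: the center word is the input, each nearby word is a target.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skip_gram_pairs(["the", "cat", "sat"], window=1)
print(pairs)  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```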

11) Let’s say you are using activation function X in the hidden layers of a neural network.
At a particular neuron, for a given input, you get the output “-0.0001”. Which of
the following activation functions could X represent?

A) ReLU

B) tanh

C) SIGMOID

D) None of these

Solution: (B)

The function is tanh, because its output range is (-1, 1); ReLU and sigmoid cannot output
negative values.
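A quick check, assuming the pre-activation input is -0.0001 (near zero, tanh(z) is approximately z). Only tanh produces a negative output:

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

activations = {"ReLU": relu, "tanh": math.tanh, "sigmoid": sigmoid}
z = -0.0001
outputs = {name: f(z) for name, f in activations.items()}
can_be_negative = [name for name, value in outputs.items() if value < 0]
print(can_be_negative)  # ['tanh']
```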

12) [True or False] LogLoss evaluation metric can have negative values.

A) TRUE
B) FALSE

Solution: (B)

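Log loss cannot have negative values: it averages terms of the form -log(p) with 0 < p ≤ 1, and
each such term is greater than or equal to zero.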
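A minimal binary log-loss sketch. The clipping constant `eps` is an assumption that keeps log(0) from occurring, and the labels and probabilities are made up:

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        # Each term is -log of the probability given to the true class, so it is >= 0.
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

loss = log_loss([1, 0, 1], [0.9, 0.2, 0.6])
print(round(loss, 4))  # 0.2798
```

Even for perfect predictions the loss only approaches 0 from above; it never goes below it.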

13) Which of the following statements is/are true about “Type-1” and “Type-2” errors?

1. Type1 is known as false positive and Type2 is known as false negative.

2. Type1 is known as false negative and Type2 is known as false positive.

3. Type1 error occurs when we reject a null hypothesis when it is actually true.

A) Only 1

B) Only 2

C) Only 3

D) 1 and 2

E) 1 and 3

F) 2 and 3

Solution: (E)

In statistical hypothesis testing, a type I error is the incorrect rejection of a true null
hypothesis (a “false positive”), while a type II error is incorrectly retaining a false null
hypothesis (a “false negative”).
14) Which of the following is/are important step(s) for pre-processing text in NLP-based projects?

1. Stemming

2. Stop word removal

3. Object Standardization

A) 1 and 2

B) 1 and 3

C) 2 and 3

D) 1,2 and 3

Solution: (D)

Stemming is a rudimentary rule-based process of stripping suffixes (“ing”, “ly”, “es”, “s”, etc.) from a word.

Stop words are words that carry little meaning relevant to the context of the data, for example is/am/are.

Object standardization (mapping non-standard tokens such as acronyms, hashtags and colloquial slang to standard words) is also a good way to pre-process text.
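The three steps can be sketched as a tiny pipeline. Everything here is a hypothetical illustration: the stop-word list, suffix list and slang dictionary are deliberately minimal, and the suffix stripper is far cruder than a real stemmer such as the Porter stemmer.

```python
import re

# Deliberately tiny, hypothetical resources for illustration only.
STOP_WORDS = {"is", "am", "are", "the", "a", "an"}
SUFFIXES = ("ing", "ly", "es", "s")                # tried in this order
STANDARD_FORMS = {"rt": "retweet", "luv": "love"}  # slang/acronym -> standard word


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [STANDARD_FORMS.get(t, t) for t in tokens]  # object standardization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [naive_stem(t) for t in tokens]               # stemming


print(preprocess("RT the cats are playing"))  # ['retweet', 'cat', 'play']
```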

15) Suppose you want to project high-dimensional data into lower dimensions. The two most famous dimensionality reduction algorithms used here are PCA and t-SNE. Let’s say you have applied both algorithms to data “X” and obtained the datasets “X_projected_PCA” and “X_projected_tSNE”.

Which of the following statements is true for “X_projected_PCA” and “X_projected_tSNE”?

A) X_projected_PCA will have interpretation in the nearest neighbour space.

B) X_projected_tSNE will have interpretation in the nearest neighbour space.

C) Both will have interpretation in the nearest neighbour space.

D) None of them will have interpretation in the nearest neighbour space.


Solution: (B)

The t-SNE algorithm considers nearest-neighbour points when reducing the dimensionality of the data, so the reduced dimensions can also be interpreted in the nearest-neighbour space. This is not the case for PCA.

Context: 16-17

Given below are three scatter plots for two features (Image 1, 2 & 3 from left to right).

16) In the above images, which of the following is/are examples of multi-collinear features?

A) Features in Image 1

B) Features in Image 2

C) Features in Image 3

D) Features in Image 1 & 2

E) Features in Image 2 & 3

F) Features in Image 3 & 1

Solution: (D)

In Image 1 the features have a high positive correlation, whereas in Image 2 they have a high negative correlation, so the feature pairs in both images are examples of multicollinear features.
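Since the scatter plots are not available here, the sketch below uses hypothetical data that mimics Images 1 and 2 and flags feature pairs whose absolute Pearson correlation exceeds a threshold. The 0.9 threshold is an assumption; in practice it is a judgment call (variance inflation factors are a common alternative).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collinear_pairs(features, threshold=0.9):
    """Return (name1, name2, r) for every pair with |r| above the threshold."""
    flagged = []
    for (n1, v1), (n2, v2) in combinations(features.items(), 2):
        r = pearson(v1, v2)
        if abs(r) > threshold:
            flagged.append((n1, n2, round(r, 3)))
    return flagged


features = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with x1, like Image 1
    "x3": [10, 8, 6.1, 3.9, 2],       # falls with x1, like Image 2
}
flagged = collinear_pairs(features)
print(flagged)
```

Both strongly positive and strongly negative correlations are flagged, because multicollinearity depends on the magnitude of r, not its sign.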
17) In the previous question, suppose you have identified multi-collinear features. Which of the following action(s) would you perform next?

1. Remove both collinear variables.

2. Instead of removing both variables, we can remove only one variable.

3. Removing correlated variables might lead to loss of information. In order to retain those variables, we can use penalized regression models like ridge or lasso regression.

A) Only 1

B) Only 2

C) Only 3

D) Either 1 or 3

E) Either 2 or 3

Solution: (E)

You should not remove both features, because you would lose all of the information they carry. Instead, either remove only one of them or use a regularization method such as L1 (lasso) or L2 (ridge).
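Why ridge can keep both variables: with perfectly collinear features the least-squares normal equations are singular, but adding λI makes them solvable. The sketch below is a closed-form two-feature ridge fit with no intercept, on hypothetical data where x2 is exactly 2 * x1; the function name is illustrative, not a library API.

```python
def ridge_two_features(x1, x2, y, lam):
    """Closed-form ridge regression with two features and no intercept.

    Solves (X^T X + lam * I) w = X^T y for the 2x2 case.
    """
    a = sum(u * u for u in x1) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + lam
    r1 = sum(u * t for u, t in zip(x1, y))
    r2 = sum(v * t for v, t in zip(x2, y))
    det = a * d - b * b
    if det == 0:
        raise ValueError("X^T X + lam*I is singular (lam = 0 with collinear features)")
    # Inverse of the symmetric matrix [[a, b], [b, d]] applied to [r1, r2]
    return (d * r1 - b * r2) / det, (a * r2 - b * r1) / det


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]   # exactly 2 * x1: perfectly collinear with x1
y = [3.0, 6.0, 9.0, 12.0]   # y = 3 * x1

w1, w2 = ridge_two_features(x1, x2, y, lam=1.0)
print(w1, w2)  # both non-zero: the weight is shared between x1 and x2
```

Calling it with `lam=0.0` raises the error, because ordinary least squares has no unique solution here. Lasso (L1) behaves differently and tends to zero out one of the two coefficients, which this sketch does not show.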

18) Adding a non-important feature to a linear regression model may result in:

1. Increase in R-square

2. Decrease in R-square

A) Only 1 is correct

B) Only 2 is correct

C) Either 1 or 2

D) None of these

Solution: (A)
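When a feature is added to a least-squares linear regression model, R-squared never decreases, whether the feature is important or not, because the fit could always set the new coefficient to zero and reproduce the previous model. In practice it usually increases at least slightly.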
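A small check of this claim, assuming ordinary least squares with an intercept fitted through the normal equations. The data and the "noise" feature are hypothetical values chosen only for illustration; the solver is a plain Gaussian elimination suitable for tiny systems.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def r_squared(columns, y):
    """Fit least squares with an intercept (normal equations) and return R^2."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * t for row, t in zip(X, y)) for a in range(p)]
    w = solve(XtX, Xty)
    fitted = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
noise = [0.3, -1.0, 0.8, 0.1, -0.6, 0.9]  # arbitrary values with no intended link to y

r2_base = r_squared([x], y)
r2_more = r_squared([x, noise], y)
print(r2_base, r2_more)  # r2_more is never smaller than r2_base
```

This is why adjusted R-squared, which penalizes extra features, is often preferred when comparing models with different numbers of features.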

19) Suppose, you are given three variables X, Y and Z. The Pearson correlation
coefficients for (X, Y), (Y, Z) and (X, Z) are C1, C2 & C3 respectively.

Now, you have added 2 to all values of X (i.e. new values become X+2), subtracted 2 from all values of Y (i.e. new values are Y-2), and Z remains the same. The new coefficients for (X,Y), (Y,Z) and (X,Z) are given by D1, D2 & D3 respectively. How do the values of D1, D2 & D3 relate to C1, C2 & C3?

A) D1= C1, D2 < C2, D3 > C3

B) D1 = C1, D2 > C2, D3 > C3

C) D1 = C1, D2 > C2, D3 < C3

D) D1 = C1, D2 < C2, D3 < C3

E) D1 = C1, D2 = C2, D3 = C3

F) Cannot be determined

Solution: (E)

Pearson correlation does not change when you add a constant to, or subtract a constant from, a feature, because it depends only on each value’s deviation from the feature’s mean.
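The invariance follows directly from the definition of the Pearson coefficient, since a constant shift moves each value and its mean by the same amount:

```latex
r_{XY} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}

x_i' = x_i + 2 \;\Rightarrow\; \bar{x}' = \bar{x} + 2 \;\Rightarrow\; x_i' - \bar{x}' = x_i - \bar{x}

y_i' = y_i - 2 \;\Rightarrow\; \bar{y}' = \bar{y} - 2 \;\Rightarrow\; y_i' - \bar{y}' = y_i - \bar{y}
```

Every deviation in the numerator and denominator is unchanged, and Z is untouched, so D1 = C1, D2 = C2 and D3 = C3.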

20) Imagine you are solving a classification problem with highly imbalanced classes. The majority class is observed 99% of the time in the training data.

Your model has 99% accuracy on the test data. Which of the following is true in such a case?

1. Accuracy metric is not a good idea for imbalanced class problems.

2. Accuracy metric is a good idea for imbalanced class problems.

3. Precision and recall metrics are good for imbalanced class problems.

4. Precision and recall metrics aren’t good for imbalanced class problems.
A) 1 and 3

B) 1 and 4

C) 2 and 3

D) 2 and 4

Solution: (A)

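With a 99% majority class, a model that always predicts the majority class already reaches about 99% accuracy, so accuracy says little about how well the minority class is detected; precision and recall on the minority class expose this. For more detail, refer to question 4 in the referenced article.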
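A minimal sketch of the trap, using a hypothetical test set with 99 majority-class (0) examples and one minority-class (1) example, and a "model" that always predicts the majority class. Precision is reported as 0.0 when there are no positive predictions, which is a common convention rather than a mathematical necessity.

```python
def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


y_true = [0] * 99 + [1]   # 99% majority class
y_pred = [0] * 100        # always predict the majority class
print(metrics(y_true, y_pred))  # (0.99, 0.0, 0.0)
```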

21) In ensemble learning, you aggregate the predictions of weak learners, so that an ensemble of these models gives a better prediction than the individual models.

Which of the following statements is / are true for weak learners used in ensemble
model?

1. They don’t usually overfit.

2. They have high bias, so they cannot solve complex learning problems

3. They usually overfit.

A) 1 and 2

B) 1 and 3

C) 2 and 3

D) Only 1

E) Only 2

F) None of the above

Solution: (A)

Weak learners are only sure about a particular part of a problem, so they usually don’t overfit. In other words, weak learners have low variance and high bias.
22) Which of the following options is/are true for K-fold cross-validation?

1. Increase in K will result in higher time required to cross validate the result.

2. Higher values of K will result in higher confidence in the cross-validation result as compared to lower values of K.

3. If K=N, then it is called Leave one out cross validation, where N is the number
of observations.

A) 1 and 2

B) 2 and 3

C) 1 and 3

D) 1,2 and 3

Solution: (D)

A larger K means less bias towards overestimating the true expected error (the training folds are closer to the full dataset) and a longer running time (you get closer to the limit case, leave-one-out CV). When selecting K, we also need to consider the variance of the accuracy across the K folds.
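Statements 1 and 3 can be made concrete with a small index splitter: one model is fitted per fold, so the work grows with K, and K = N gives leave-one-out. This is a simplified sketch with contiguous, unshuffled folds; library implementations usually offer shuffling as well.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_splits(n, k):
    """Yield (train_indices, validation_indices); one model is trained per split."""
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        yield [i for i in range(n) if i not in held_out], fold


n = 6
print(len(list(cv_splits(n, 3))))  # 3 model fits
loo = list(cv_splits(n, n))         # K = N: leave-one-out
print(len(loo), loo[0])             # 6 ([1, 2, 3, 4, 5], [0])
```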

Question Context 23-24

Cross-validation is an important step in machine learning for hyperparameter tuning.

Let’s say you are tuning the hyperparameter “max_depth” for a GBM by selecting it from 10 different depth values (all greater than 2) for a tree-based model, using 5-fold cross-validation.

Training the algorithm (with max_depth 2) on 4 folds takes 10 seconds, and predicting on the remaining fold takes 2 seconds.

Note: Ignore hardware dependencies from the equation.

23) Which of the following options is true for the overall execution time of 5-fold cross-validation with 10 different values of “max_depth”?

A) Less than 100 seconds

B) 100 – 300 seconds

C) 300 – 600 seconds

D) More than or equal to 600 seconds

E) None of the above

F) Can’t estimate

Solution: (D)

Each iteration for depth “2” in 5-fold cross-validation takes 10 seconds for training and 2 seconds for testing, so the 5 folds take 12*5 = 60 seconds. Since we are searching over 10 depth values, the search would take 60*10 = 600 seconds if every depth cost as much as depth “2”. Training and testing a model with depth greater than 2 takes longer than with depth “2”, so the overall time would be greater than 600 seconds.
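The arithmetic above as a one-line lower bound, assuming (as the solution does) that every candidate depth costs at least as much as the depth-2 baseline; the function name is illustrative.

```python
def grid_search_time_lower_bound(n_folds, train_s, predict_s, n_values):
    """Total time if every candidate value cost as much as the depth-2 baseline.

    Deeper trees take longer, so the real total is at least this large.
    """
    per_fold = train_s + predict_s      # 10 s training + 2 s prediction
    per_value = n_folds * per_fold      # one full 5-fold CV run
    return n_values * per_value


print(grid_search_time_lower_bound(5, 10, 2, 10))  # 600
```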

24) In the previous question, suppose you train the same algorithm while tuning 2 hyperparameters, say “max_depth” and “learning_rate”.

You want to select the right value for “max_depth” (from the given 10 depth values) and the learning rate (from 5 different learning rates). In such cases, which of the following will represent the overall time?

A) 1000-1500 seconds

B) 1500-3000 seconds

C) More than or equal to 3000 seconds

D) None of these

Solution: (D)

Same as question number 23.

25) Given below is a scenario for training error TE and validation error VE for a machine learning algorithm M1. You want to choose a hyperparameter (H) based on TE and VE.

H    TE    VE
1    105   90
2    200   85
3    250   96
4    105   85
5    300   100

Which value of H will you choose based on the above table?

A) 1

B) 2

C) 3

D) 4

E) 5

Solution: (D)

H = 4 is the best choice: it has the lowest validation error (85, tied with H = 2) and, unlike H = 2, also a low training error (105).
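The selection rule can be written as a sort key. The tie-breaking on training error is an assumption that matches the reasoning above: prefer the lowest validation error, then the lower training error.

```python
# (H, training error TE, validation error VE), copied from the table above
rows = [(1, 105, 90), (2, 200, 85), (3, 250, 96), (4, 105, 85), (5, 300, 100)]

# Prefer the lowest validation error; break ties with the lower training error.
best_h, best_te, best_ve = min(rows, key=lambda r: (r[2], r[1]))
print(best_h)  # 4
```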

26) What would you do in PCA to get the same projection as SVD?

A) Transform data to zero mean

B) Transform data to zero median

C) Not possible

D) None of these

Solution: (A)

When the data has a zero mean vector, PCA will have the same projections as SVD; otherwise, you have to centre the data before taking the SVD.
Question Context 27-28

Assume there is a black box algorithm, which takes training data with multiple observations (t1, t2, t3, …, tn) and a new observation (q1). The black box outputs the nearest neighbor of q1 (say ti) and its corresponding class label ci.

You can also think of this black box algorithm as being the same as 1-NN (1-nearest neighbor).

27) It is possible to construct a k-NN classification algorithm based on this black box
alone.

Note: Where n (number of training observations) is very large compared to k.

A) TRUE

B) FALSE

Solution: (A)

In the first step, you pass an observation (q1) to the black box algorithm, which returns the nearest observation and its class.

In the second step, you throw the returned nearest observation out of the training data and input the observation (q1) again. The black box algorithm will again return the nearest remaining observation and its class.

You repeat this procedure k times and then classify q1 from the k returned classes, for example by majority vote.
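The procedure can be sketched by simulating the black box. The squared Euclidean distance inside the stand-in black box and the majority vote at the end are assumptions; the point is that `knn_from_black_box` only ever calls the 1-NN routine.

```python
from collections import Counter


def one_nn_black_box(train, q):
    """Stand-in for the black box: the (point, label) in train nearest to q."""
    return min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], q)))


def knn_from_black_box(train, q, k):
    """k-NN built only from repeated 1-NN calls, followed by a majority vote."""
    remaining = list(train)
    neighbours = []
    for _ in range(k):
        nearest = one_nn_black_box(remaining, q)
        neighbours.append(nearest)
        remaining.remove(nearest)  # throw it out, then query the black box again
    label = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
    return label, neighbours


train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"),
         ((0.9, 1.1), "b"), ((0.3, 0.1), "b")]
label, neighbours = knn_from_black_box(train, (0.05, 0.05), k=3)
print(label, [point for point, _ in neighbours])
```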

28) Instead of using the 1-NN black box, we want to use the j-NN (j>1) algorithm as the black box. Which of the following options is correct for finding k-NN using j-NN?

1. j must be a proper factor of k

2. j > k

3. Not possible

A) 1

B) 2

C) 3
Solution: (A)

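Same idea as question 27: call the j-NN black box k/j times, each time removing the j returned neighbours from the training data. This only works when j divides k.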

29) Suppose you are given 7 scatter plots, 1-7 (left to right), and you want to compare the Pearson correlation coefficients between the variables of each scatter plot.

Which of the following is in the right order?

1. 1<2<3<4

2. 1>2>3>4

3. 7<6<5<4

4. 7>6>5>4

A) 1 and 3

B) 2 and 3

C) 1 and 4

D) 2 and 4

Solution: (B)

From image 1 to 4 the correlation decreases. From image 4 to 7 its absolute value increases, but the values are negative (for example, 0, -0.3, -0.7, -0.99), so the correlation itself keeps decreasing: 7<6<5<4.

30) You can evaluate the performance of a binary classification model using different metrics such as accuracy, log-loss and F-score. Let’s say you are using the log-loss function as the evaluation metric.

Which of the following options is/are true for the interpretation of log-loss as an evaluation metric?

1. If a classifier is confident about an incorrect classification, then log-loss will penalise it heavily.

2. If, for a particular observation, the classifier assigns a very small probability to the correct class, then the corresponding contribution to the log-loss will be very large.

3. The lower the log-loss, the better the model.

A) 1 and 3

B) 2 and 3

C) 1 and 2

D) 1,2 and 3

Solution: (D)

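All three follow from the definition of log-loss: each observation contributes -log(p), where p is the probability the classifier assigns to the true class. This contribution grows without bound as p approaches 0, so confident wrong predictions are penalised heavily, and a lower log-loss means a better model.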
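A minimal binary log-loss implementation shows how a single observation's contribution grows as the probability assigned to the correct class shrinks. The clipping constant `eps` is a common safeguard against log(0), chosen here as an assumption.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred is the predicted probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Contribution of a single positive example (y = 1) at different confidence levels
for p in (0.9, 0.5, 0.1, 0.01):
    print(p, round(log_loss([1], [p]), 3))
# 0.9 0.105
# 0.5 0.693
# 0.1 2.303
# 0.01 4.605
```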

Question 31-32

Below are five samples given in the dataset.

Note: Visual distance between the points in the image represents the actual distance.

31) Which of the following is the leave-one-out cross-validation accuracy for 3-NN (3-nearest neighbor)?
A) 0

B) 0.4

C) 0.8

D) 1

Solution: (C)

In leave-one-out cross-validation, we select (n-1) observations for training and 1 observation for validation. Treat each point in turn as the validation point and find its 3 nearest points. Repeating this procedure for all points, every positive-class point in the above figure is classified correctly, but the negative-class point is misclassified. Hence you get 80% accuracy.
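The procedure can be sketched for 1-D data. The five points below are hypothetical (the figure's coordinates are not available), arranged so that, as in the solution, four "+" points are classified correctly and the single "-" point is not.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (1-D data)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loocv_accuracy(data, k):
    """Hold out each observation once, train on the other n-1, and score."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += knn_predict(train, x, k) == label
    return correct / len(data)


# Hypothetical 1-D data (not the points in the figure): four "+" and one "-".
data = [(1.0, "+"), (1.9, "+"), (3.0, "-"), (4.2, "+"), (5.3, "+")]
print(loocv_accuracy(data, k=3))  # 0.8: only the "-" point is misclassified
```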

32) Which of the following values of K will have the least leave-one-out cross-validation accuracy?

A) 1NN

B) 3NN

C) 4NN

D) All have same leave one out error

Solution: (A)

With 1-NN, every point in the figure is misclassified (its nearest neighbour belongs to the other class), which means you get 0% accuracy.

33) Suppose you are given the below data and you want to apply a logistic regression
model for classifying it in two given classes.
You are using logistic regression with L1 regularization.

Where C is the regularization parameter and w1 & w2 are the coefficients of x1 and x2.

Which of the following option is correct when you increase the value of C from zero to a
very large value?

A) First w2 becomes zero and then w1 becomes zero

B) First w1 becomes zero and then w2 becomes zero

C) Both become zero at the same time

D) Both cannot be zero even after a very large value of C

Solution: (B)

Looking at the image, we see that classification can be performed efficiently using x2 alone. So w1 becomes 0 first, and as the regularization parameter increases further, w2 is driven closer and closer to 0 as well.

34) Suppose we have a dataset which can be fitted with 100% accuracy with the help of a decision tree of depth 6. Now consider the points below and choose the option based on these points.

Note: All other hyperparameters are the same and other factors are not affected.

1. Depth 4 will have high bias and low variance

2. Depth 4 will have low bias and low variance

A) Only 1

B) Only 2

C) Both 1 and 2

D) None of the above

Solution: (A)

Fitting a decision tree of depth 4 to such data will more likely underfit it, and underfitting means high bias and low variance.

35) Which of the following options can be used to reach the global minimum in the k-Means algorithm?

1. Try to run algorithm for different centroid initialization

2. Adjust number of iterations

3. Find out the optimal number of clusters

A) 2 and 3
B) 1 and 3

C) 1 and 2

D) All of above

Solution: (D)

All of these options can be tuned to help the algorithm reach the global minimum, although k-means itself does not guarantee finding it.
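Option 1 (multiple centroid initializations) can be sketched as Lloyd's algorithm on 1-D data, restarted several times with the run of lowest within-cluster sum of squares kept. This is similar in spirit to the `n_init` parameter of scikit-learn's `KMeans`, but the functions below are a hypothetical standalone sketch, and restarts still only make a good optimum more likely, not certain.

```python
import random


def kmeans_1d(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm on 1-D data from a random initialisation."""
    centroids = rng.sample(points, k)               # random initial centroids
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                            # assignment step
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        new = [sum(c) / len(c) if c else centroids[j]  # update step (keep empty ones)
               for j, c in enumerate(clusters)]
        if new == centroids:                        # converged
            break
        centroids = new
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, inertia


def best_of_restarts(points, k, n_init=10, seed=0):
    """Run k-means n_init times and keep the run with the lowest within-cluster SSE."""
    rng = random.Random(seed)
    runs = [kmeans_1d(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])


data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
centroids, inertia = best_of_restarts(data, k=3)
print(sorted(centroids), inertia)
```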

36) Imagine you are working on a binary classification problem. You trained a model on the training dataset and got the below confusion matrix on the validation dataset.

Based on the above confusion matrix, choose which option(s) below will give you
correct predictions?

1. Accuracy is ~0.91

2. Misclassification rate is ~ 0.91

3. False positive rate is ~0.95

4. True positive rate is ~0.95

A) 1 and 3

B) 2 and 4

C) 1 and 4
D) 2 and 3

Solution: (C)

The accuracy (proportion of correct classifications) is (50+100)/165, which is approximately 0.91.

The true positive rate is the proportion of actual positives that are predicted correctly, so it is 100/105 ≈ 0.95. It is also known as “sensitivity” or “recall”.
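Since the matrix image is not available, the counts below are reconstructed from the numbers in the solution: TP = 100 and TP + FN = 105 give FN = 5, TN = 50, and a total of 165 leaves FP = 10. Under that assumption, all four rates in the options can be computed directly.

```python
# Counts reconstructed from the solution's numbers (assumed layout)
tp, fn = 100, 5    # actual positives: 105
fp, tn = 10, 50    # actual negatives: 60
total = tp + fn + fp + tn                    # 165

accuracy = (tp + tn) / total                 # (100 + 50) / 165
misclassification_rate = (fp + fn) / total   # 1 - accuracy
tpr = tp / (tp + fn)                         # sensitivity / recall
fpr = fp / (fp + tn)

print(round(accuracy, 2), round(misclassification_rate, 2), round(tpr, 2), round(fpr, 2))
# 0.91 0.09 0.95 0.17
```

Only statements 1 and 4 match these values, which is option C.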

37) For which of the following hyperparameters is a higher value better for the decision tree algorithm?

1. Number of samples used for split

2. Depth of tree

3. Samples for leaf

A) 1 and 2

B) 2 and 3

C) 1 and 3

D) 1, 2 and 3

E) Can’t say

Solution: (E)

For all three options 1, 2 and 3, increasing the value of the parameter does not necessarily increase performance. For example, with a very high tree depth, the resulting tree may overfit the data and not generalize well; with a very low value, the tree may underfit the data. So we can’t say for sure that “higher is better”.

Context 38-39

Imagine you have a 28 * 28 image and you run a 3 * 3 convolution neural network on it with an input depth of 3 and an output depth of 8.

Note: Stride is 1 and you are using same padding.

38) What is the dimension of the output feature map when you are using the given parameters?

A) 28 width, 28 height and 8 depth

B) 13 width, 13 height and 8 depth

C) 28 width, 13 height and 8 depth

D) 13 width, 28 height and 8 depth

Solution: (A)

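The formula for calculating the output size is

output size = (N – F + 2P)/S + 1

where N is the input size, F is the filter size, P is the padding and S is the stride. With same padding and a 3 * 3 filter, P = 1, so the output size is (28 – 3 + 2)/1 + 1 = 28; the output depth equals the number of filters, 8.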

Read this article to get a better understanding.
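The output-size formula as a small helper. The floor division reflects the usual convention when the stride does not divide evenly; the "same" padding value (F - 1)/2 assumes an odd filter size and stride 1.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size of a convolution: floor((N - F + 2P) / S) + 1."""
    return (n - f + 2 * padding) // stride + 1


f = 3
same_padding = (f - 1) // 2   # P = 1 keeps the size unchanged for a 3x3 filter at stride 1
side = conv_output_size(28, f, stride=1, padding=same_padding)
n_filters = 8                 # output depth = number of filters
print(side, side, n_filters)  # 28 28 8
```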

39) What are the dimensions of the output feature map when you are using the following parameters?

A) 28 width, 28 height and 8 depth

B) 13 width, 13 height and 8 depth

C) 28 width, 13 height and 8 depth

D) 13 width, 28 height and 8 depth

Solution: (B)

Same as above

40) Suppose we were plotting the visualization for different values of C (penalty parameter) in the SVM algorithm. For some reason, we forgot to tag the C values on the visualizations. In that case, which of the following options best explains the C values for the images below (1, 2, 3 from left to right, so the C values are C1 for image 1, C2 for image 2 and C3 for image 3) in the case of an RBF kernel?
A) C1 = C2 = C3

B) C1 > C2 > C3

C) C1 < C2 < C3

D) None of these

Solution: (C)
MCQ questions for unit 4: Naïve Bayes and Support Vector Machine

1. How many terms are required for building a Bayes model?
a) 1
b) 2
c) 3
d) 4 Answer: c
Explanation: The three required terms are a conditional probability and two unconditional probabilities.
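Bayes’ rule, P(A|B) = P(B|A) P(A) / P(B), uses exactly those three terms: one conditional probability and two unconditional ones. The numbers below are hypothetical, and P(B) is obtained from the law of total probability.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Hypothetical numbers: P(A) = 0.01, P(B|A) = 0.9, P(B|not A) = 0.05
p_a, p_b_given_a, p_b_given_not_a = 0.01, 0.9, 0.05
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # total probability: 0.0585
print(round(posterior(p_b_given_a, p_a, p_b), 4))      # 0.1538
```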
2. What is needed to make probabilistic systems feasible in the world?
a) Reliability
b) Crucial robustness
c) Feasibility
d) None of the mentioned Answer: b
Explanation: Model-based knowledge provides the crucial robustness needed to make probabilistic systems feasible in the real world.
3. Where can Bayes’ rule be used?
a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query Answer: d
Explanation: Bayes rule can be used to answer the probabilistic queries conditioned on
one piece of evidence.
4. What does a Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned Answer: a
Explanation: A Bayesian network provides a complete description of the domain.
5. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned Answer: b
Explanation: Every entry in the full joint probability distribution can be calculated from the information in the network.
6. How can a Bayesian network be used to answer any query?
a) Full distribution
b) Joint distribution
c) Partial distribution
d) All of the mentioned Answer: b
Explanation: If a Bayesian network is a representation of the joint distribution, then it can answer any query by summing all the relevant joint entries.
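A sketch of answering a query by summing joint entries, using a hypothetical three-node network (Rain and Sprinkler both influence WetGrass) with made-up probability tables. Each joint entry is the product of every node’s probability given its parents, and the query is the ratio of two sums of such entries.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (all Boolean).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
P_WET_GIVEN = {  # P(WetGrass = True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Full joint entry = product of each node's probability given its parents."""
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1 - p_wet)


def p_rain_given_wet():
    """P(Rain = True | WetGrass = True) by summing the relevant joint entries."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den


print(round(p_rain_given_wet(), 4))  # 0.7396
```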
7. How can the compactness of a Bayesian network be described?
a) Locally structured
b) Fully structured
c) Partial structure
d) All of the mentioned Answer: a
Explanation: The compactness of a Bayesian network is an example of a very general property of locally structured systems.
8. What is local structure usually associated with?
a) Hybrid
b) Dependent
c) Linear
d) None of the mentioned Answer: c
Explanation: Local structure is usually associated with linear rather than exponential
growth in complexity.
9. Which condition is used to influence a variable directly by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned Answer: b
Explanation: None.
10. What is the relationship between a node and its predecessors when creating a Bayesian network?
a) Functionally dependent
b) Dependent
c) Conditionally independent
d) Both Conditionally dependent & Dependent Answer: c
Explanation: The semantics used to derive a method for constructing Bayesian networks lead to the consequence that a node is conditionally independent of its other predecessors, given its parents.

11. What do you mean by generalization error in terms of the SVM?

A) How far the hyperplane is from the support vectors
B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM

Solution: B

Generalisation error in statistics is generally the out-of-sample error which is the measure
of how accurately a model can predict values for previously unseen data.
12. The minimum time complexity for training an SVM is O(n²). According to this fact, what sizes of datasets are not best suited for SVMs?

A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter

Solution: A

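Because training time grows at least quadratically with the number of samples, SVMs become slow on large datasets; they work best on small to medium-sized datasets with a clear classification boundary.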

13. The effectiveness of an SVM depends upon:

A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above

Solution: D

The effectiveness of an SVM depends on how you choose these 3 basic requirements so that they maximise performance while reducing error and overfitting.

14. SVMs are less effective when:

A) The data is linearly separable
B) The data is clean and ready to use
C) The data is noisy and contains overlapping points

Solution: C

When the data has noise and overlapping points, there is a problem in drawing a clear hyperplane
without misclassifying.

15. Suppose you are using RBF kernel in SVM with high Gamma value. What does this
signify?

A) The model would consider even far away points from hyperplane for modeling
B) The model would consider only the points close to the hyperplane for modeling
C) The model would not be affected by distance of points from hyperplane for modeling
D) None of the above

Solution: B
The gamma parameter of the RBF kernel controls how far the influence of a single training point reaches: with a low gamma, even points far from the hyperplane influence it, and with a high gamma, only points close to it do.

For a low gamma, the model will be too constrained and include all points of the training dataset, without really capturing the shape.

For a higher gamma, the model will capture the shape of the dataset well, although a very high gamma can overfit.
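The effect of gamma is visible in the RBF kernel value itself, k(x, z) = exp(-gamma * ||x - z||²), which acts as the similarity between a point and another point. The two test points below are arbitrary choices for illustration.

```python
import math


def rbf(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


origin = (0.0, 0.0)
near, far = (0.5, 0.0), (2.0, 0.0)   # squared distances 0.25 and 4
for gamma in (0.1, 1.0, 10.0):
    print(gamma, round(rbf(origin, near, gamma), 3), round(rbf(origin, far, gamma), 3))
```

With gamma = 0.1 the far point still has a similarity of about 0.67, while with gamma = 10 even the near point drops to about 0.08 and the far point’s similarity is essentially 0, so only very close points matter.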

16. The cost parameter in the SVM means:

A) The number of cross-validations to be made
B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above

Solution: C

The cost parameter decides how much an SVM should be allowed to “bend” with the data. For a
low cost, you aim for a smooth decision surface and for a higher cost, you aim to classify more
points correctly. It is also simply referred to as the cost of misclassification.
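The role of the cost parameter is explicit in the standard soft-margin SVM objective (the textbook primal form, shown for reference), where C weights the slack variables that measure margin violations:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0
```

A small C makes slack cheap, favouring a wide margin and a smooth decision surface; a large C makes every violation expensive, pushing the model to classify more training points correctly.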

17. Which of the following are real world applications of the SVM?

A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above

Solution: D

SVMs are versatile models: variants have been applied to problems ranging from regression to clustering, and to tasks such as text categorization, image classification and handwriting recognition.

18. We usually use feature normalization before using the Gaussian kernel in SVM. What is
true about feature normalization?

1. We do feature normalization so that new feature will dominate others
2. Sometimes, feature normalization is not feasible in case of categorical variables
3. Feature normalization always helps when we use Gaussian kernel in SVM

A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3

Solution: B
Statements one and two are correct.

19. What is/are true about kernel in SVM?

1. Kernel function maps low dimensional data to high dimensional space
2. It’s a similarity function

A) 1
B) 2
C) 1 and 2
D) None of these

Solution: C

Both the given statements are correct.
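Both statements can be checked with one example kernel, the homogeneous degree-2 polynomial kernel k(x, z) = (x · z)². It equals an ordinary dot product after an explicit map into a 3-D space (statement 1), yet is computed entirely in the original 2-D space, and as a dot product it measures similarity (statement 2). The choice of kernel and inputs is illustrative.

```python
import math


def phi(v):
    """Explicit degree-2 feature map for a 2-D input: (v1^2, sqrt(2)*v1*v2, v2^2)."""
    v1, v2 = v
    return (v1 * v1, math.sqrt(2) * v1 * v2, v2 * v2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z), dot(phi(x), phi(z)))  # both 121 (up to floating-point rounding)
```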

1. Which of the following is a widely used and effective machine learning algorithm based on
the idea of bagging?
a. Decision Tree
b. Regression
c. Classification
d. Random Forest - answer
2. To find the minimum or the maximum of a function, we set the gradient to zero because:
a. The value of the gradient at extrema of a function is always zero - answer
b. Depends on the type of problem
c. Both A and B
d. None of the above
3. The most widely used metrics and tools to assess a classification model are:
a. Confusion matrix
b. Cost-sensitive accuracy
c. Area under the ROC curve
d. All of the above - answer
4. Which of the following is a good test dataset characteristic?
a. Large enough to yield meaningful results
b. Is representative of the dataset as a whole
c. Both A and B - answer
d. None of the above
5. Which of the following is a disadvantage of decision trees?
a. Factor analysis
b. Decision trees are robust to outliers
c. Decision trees are prone to be overfit - answer
d. None of the above
6. How do you handle missing or corrupted data in a dataset?
a. Drop missing rows or columns
b. Replace missing values with mean/median/mode
c. Assign a unique category to missing values
d. All of the above - answer
7. What is the purpose of performing cross-validation?
a. To assess the predictive performance of the models
b. To judge how the trained model performs outside the sample on test data
c. Both A and B - answer
8. Why is second order differencing in time series needed?
a. To remove non-stationarity
b. To find the maxima or minima at the local point
c. Both A and B - answer
d. None of the above
9. When performing regression or classification, which of the following is the correct way to
preprocess the data?
a. Normalize the data → PCA → training - answer
b. PCA → normalize PCA output → training
c. Normalize the data → PCA → normalize PCA output → training
d. None of the above
10. Which of the following is an example of feature extraction?
a. Constructing bag of words vector from an email
b. Applying PCA to project large high-dimensional data
c. Removing stopwords in a sentence
d. All of the above - answer
11. What is pca.components_ in Sklearn?
a. Set of all eigen vectors for the projection space - answer
b. Matrix of principal components
c. Result of the multiplication matrix
d. None of the above options
12. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important
b. Assumes that all the features in a dataset are independent
c. Both A and B - answer
d. None of the above options
13. Which of the following statements about regularization is not correct?
a. Using too large a value of lambda can cause your hypothesis to underfit the data.
b. Using too large a value of lambda can cause your hypothesis to overfit the data.
c. Using a very large value of lambda cannot hurt the performance of your hypothesis.
d. None of the above - answer
14. How can you prevent a clustering algorithm from getting stuck in bad local optima?
a. Set the same seed value for each run
b. Use multiple random initializations - answer
c. Both A and B
d. None of the above
15. Which of the following techniques can be used for normalization in text mining?
a. Stemming
b. Lemmatization
c. Stop Word Removal
d. Both A and B - answer
16. In which of the following cases will K-means clustering fail to give good results? 1) Data
points with outliers 2) Data points with different densities 3) Data points with nonconvex
shapes
a. 1 and 2
b. 2 and 3
c. 1, 2, and 3 - answer
d. 1 and 3
17. Which of the following is a reasonable way to select the number of principal components "k"?
a. Choose k to be the smallest value so that at least 99% of the variance is retained. - answer
b. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
c. Choose k to be the largest value so that 99% of the variance is retained.
d. Use the elbow method
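The rule in option (a) as code: sort the component variances (eigenvalues of the covariance matrix) in decreasing order and stop at the first k whose cumulative share reaches the target. The eigenvalues below are hypothetical.

```python
def smallest_k(eigenvalues, target=0.99):
    """Smallest k whose top-k principal components retain at least `target` of the variance."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    retained = 0.0
    for k, v in enumerate(values, start=1):
        retained += v
        if retained / total >= target:
            return k
    return len(values)


# Hypothetical eigenvalues (variance along each principal component)
print(smallest_k([5.0, 3.0, 1.5, 0.45, 0.05]))  # 4: cumulative shares 0.5, 0.8, 0.95, 0.995
```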
18. You run gradient descent for 15 iterations with a=0.3 and compute J(theta) after each iteration. You find that the value of J(theta) decreases quickly and then levels off. Based on this, which of the following conclusions seems most plausible?
a. Rather than using the current value of a, use a larger value of a (say a=1.0)
b. Rather than using the current value of a, use a smaller value of a (say a=0.1)
c. a=0.3 is an effective choice of learning rate - answer
d. None of the above
19. What is a sentence parser typically used for?
a. It is used to parse sentences to check if they are utf-8 compliant.
b. It is used to parse sentences to derive their most likely syntax tree structures. - answer
c. It is used to parse sentences to assign POS tags to all tokens.
d. It is used to check if sentences can be parsed into meaningful tokens.
20. Suppose you have trained a logistic regression classifier and, for a new example x, it outputs the prediction hθ(x) = 0.2. This means
a. Our estimate for P(y=1 | x)
b. Our estimate for P(y=0 | x) - answer
c. Our estimate for P(y=1 | x)
d. Our estimate for P(y=0 | x)

1) If you remove any one of the red points from the data, will the decision boundary change?
A) Yes
B) No
Solution: A
These three examples are positioned such that removing any one of them introduces slack in the constraints, so the decision boundary would completely change.

21. [True or False] If you remove the non-red circled points from the data, will the decision boundary change?
A) True
B) False
Solution: B
The remaining (non-red-circled) points in the data won’t affect the decision boundary much, so removing them will not change it.

22. What do you mean by generalization error in terms of the SVM?
A) How far the hyperplane is from the support vectors
B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM
Solution: B
Generalisation error in statistics is generally the out-of-sample error which is the measure
of how accurately a model can predict values for previously unseen data.

23. When the C parameter is set to infinite, which of the following holds true?
A) The optimal hyperplane if exists, will be the one that completely separates the data
B) The soft-margin classifier will separate the data
C) None of the above
Solution: A
With such a high misclassification penalty, a soft margin effectively no longer exists, because there is no room for error.

24. What do you mean by a hard margin?
A) The SVM allows very low error in classification
B) The SVM allows high amount of error in classification
C) None of the above
Solution: A
A hard margin means that an SVM is very rigid in classification and tries to fit the training set extremely well, which can cause overfitting.

25. The minimum time complexity for training an SVM is O(n²). According to this fact, what sizes of datasets are not best suited for SVMs?
A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter
Solution: A
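Because training time grows at least quadratically with the number of samples, SVMs become slow on large datasets; they work best on small to medium-sized datasets with a clear classification boundary.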

26. The effectiveness of an SVM depends upon:
A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above
Solution: D
The effectiveness of an SVM depends on how you choose these 3 basic requirements so that they maximise performance while reducing error and overfitting.
27. Support vectors are the data points that lie closest to the decision surface.
A) TRUE
B) FALSE
Solution: A
They are the points closest to the hyperplane and the hardest ones to classify. They also
have a direct bearing on the location of the decision surface.

28. SVMs are less effective when:
A) The data is linearly separable
B) The data is clean and ready to use
C) The data is noisy and contains overlapping points
Solution: C
When the data has noise and overlapping points, there is a problem in drawing a clear
hyperplane without misclassifying.

29. Suppose you are using RBF kernel in SVM with high Gamma value. What does this
signify?
A) The model would consider even far away points from hyperplane for modeling
B) The model would consider only the points close to the hyperplane for modeling
C) The model would not be affected by distance of points from hyperplane for
modeling
D) None of the above
Solution: B
The gamma parameter of the RBF kernel controls how far the influence of a single training point reaches: with a low gamma, even points far from the hyperplane influence it, and with a high gamma, only points close to it do.

For a low gamma, the model will be too constrained and include all points of the training dataset, without really capturing the shape.
For a higher gamma, the model will capture the shape of the dataset well, although a very high gamma can overfit.

30. The cost parameter in the SVM means:
A) The number of cross-validations to be made
B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above
Solution: C
The cost parameter decides how much an SVM should be allowed to “bend” with the
data. For a low cost, you aim for a smooth decision surface and for a higher cost, you aim
to classify more points correctly. It is also simply referred to as the cost of
misclassification.

31. Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. Now think that you want to build an SVM model which has a quadratic kernel function of polynomial degree 2 that uses the slack penalty parameter C as one of its hyperparameters. Based on that, answer the following question.
What would happen when you use a very large value of C (C->infinity)?
Note: With a small C, the model was also classifying all data points correctly.

A) We can still classify data correctly for given setting of hyper parameter C
B) We can not classify data correctly for given setting of hyper parameter C
C) Can’t Say
D) None of these
Solution: A
For large values of C, the penalty for misclassifying points is very high, so the decision
boundary will perfectly separate the data if possible.

32. What would happen when you use very small C (C~0)?
A) Misclassification would happen
B) Data will be correctly classified
C) Can’t say
D) None of these
Solution: A
The classifier can maximize the margin between most of the points, while misclassifying
a few points, because the penalty is so low.
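A small sketch of the trade-off behind Questions 30-32, assuming the standard primal soft-margin objective 0.5 * ||w||^2 + C * (sum of hinge losses) on hypothetical 1-D toy data. It compares two hand-picked candidate boundaries rather than running a real solver: a small C favours the wide margin even though every point violates it, while a large C favours the boundary with no violations.

```python
def soft_margin_objective(w, b, xs, ys, C):
    """Primal soft-margin SVM objective for 1-D inputs:
    0.5 * w**2 + C * sum of hinge losses max(0, 1 - y * (w * x + b))."""
    hinge = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in zip(xs, ys))
    return 0.5 * w * w + C * hinge

# Hypothetical toy data, labels in {-1, +1}.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]

# Two hypothetical candidate boundaries, both with b = 0:
# w = 1.0 gives a narrow margin with no violations,
# w = 0.1 gives a wide margin that every point falls inside.
candidates = {"narrow margin (w=1.0)": 1.0, "wide margin (w=0.1)": 0.1}

def preferred(C):
    """Return the candidate with the lower objective value for this C."""
    return min(candidates,
               key=lambda name: soft_margin_objective(candidates[name], 0.0, xs, ys, C))

for C in (0.01, 100.0):
    print(f"C={C}: {preferred(C)}")
```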

33. If I am using all features of my dataset and I achieve 100% accuracy on my training set,
but ~70% on validation set, what should I look out for?
A) Underfitting
B) Nothing, the model is perfect
C) Overfitting
Solution: C
A training accuracy of 100% alongside a validation accuracy of only about 70% is a typical
sign that the model is overfitting the training data.

34. Which of the following are real world applications of the SVM?
A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above
Solution: D
SVMs are versatile: beyond classification tasks such as text categorization, image
classification and handwriting recognition, variants exist for regression (support vector
regression) and for clustering (support vector clustering).

Question Context: 35-37
Suppose you have trained an SVM with a linear decision boundary and, after training, you
correctly infer that your SVM model is underfitting.
35. Which of the following would you be more likely to try in the next iteration of the
SVM?
A) You want to increase your data points
B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features
Solution: C
The best option here would be to create more features for the model.

36. Suppose you gave the correct answer in the previous question. What do you think is
actually happening?
1. We are lowering the bias
2. We are lowering the variance
3. We are increasing the bias
4. We are increasing the variance

A) 1 and 2
B) 2 and 3
C) 1 and 4
D) 2 and 4
Solution: C
Adding features makes the model more complex, which lowers the bias and increases the variance.

37. In the above scenario, suppose you want to change one of the SVM's hyperparameters so that
the effect is the same as in the previous questions, i.e. the model no longer underfits. What
would you do?
A) We will increase the parameter C
B) We will decrease the parameter C
C) Changing C has no effect
D) None of these
Solution: A
Increasing C is the right choice here: a larger C penalizes margin violations more heavily,
which reduces regularization and lets the model fit the training data more closely.

38. We usually use feature normalization before using the Gaussian kernel in SVM. What is
true about feature normalization?
1. We do feature normalization so that no single feature dominates the others
2. Sometimes, feature normalization is not feasible in case of categorical variables
3. Feature normalization always helps when we use the Gaussian kernel in SVM
A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3
Solution: B
Statements one and two are correct.
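An illustration of statement 1 on hypothetical age/income data: the Gaussian kernel depends on the squared Euclidean distance, so before scaling, the feature measured in the largest units decides almost all of that distance. The helpers `standardize` and `income_share` are illustrative names, and z-score standardization is only one possible choice of normalization.

```python
import statistics

def standardize(column):
    """Rescale one feature column to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)
    return [(v - mu) / sigma for v in column]

def income_share(ages, incomes, i, j):
    """Fraction of the squared Euclidean distance between rows i and j that
    comes from the income feature (this distance is what the Gaussian kernel uses)."""
    d_age = (ages[i] - ages[j]) ** 2
    d_inc = (incomes[i] - incomes[j]) ** 2
    return d_inc / (d_age + d_inc)

# Hypothetical data: age in years, income in dollars.
ages = [25.0, 35.0, 45.0, 55.0]
incomes = [30000.0, 60000.0, 45000.0, 90000.0]

raw_share = income_share(ages, incomes, 0, 1)
scaled_share = income_share(standardize(ages), standardize(incomes), 0, 1)
print(f"income share of distance: raw={raw_share:.6f}, standardized={scaled_share:.2f}")
```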

Question Context: 39-41
Suppose you are dealing with a 4-class classification problem and want to train an SVM model on
the data using the one-vs-all method. Answer the questions below.
39. How many times do we need to train the SVM model in this case?
A) 1
B) 2
C) 3
D) 4
Solution: D
For a 4 class problem, you would have to train the SVM at least 4 times if you are using a
one-vs-all method.

40. Suppose the classes are equally distributed in the data, and training the SVM once in the
one-vs-all setting takes 10 seconds. How many seconds would it take to train the one-vs-all
method end to end?
A) 20
B) 40
C) 60
D) 80
Solution: B
It would take 10×4 = 40 seconds
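A minimal sketch of how one-vs-all builds its training targets, assuming hypothetical class labels "a" to "d": each class gets its own binary problem (+1 for that class, -1 for all others), so a 4-class problem needs 4 fits. The 10-second fit time is the assumption from Question 40, not a measurement.

```python
def one_vs_all_targets(labels, classes):
    """For each class, build the binary target vector used to train its
    one-vs-all classifier: +1 for that class, -1 for every other class."""
    return {c: [1 if y == c else -1 for y in labels] for c in classes}

labels = ["a", "b", "c", "d", "a", "c"]  # hypothetical 4-class labels
classes = sorted(set(labels))
targets = one_vs_all_targets(labels, classes)

seconds_per_fit = 10  # assumed time for one binary SVM fit, as in Question 40
print(len(targets), "binary SVMs,", len(targets) * seconds_per_fit, "seconds in total")
```

With only two classes the two target vectors are exact mirror images of each other, which is why a single binary SVM is enough in that case.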

41. Suppose your problem has changed and the data now has only 2 classes. How many times would
you need to train the SVM in this case?
A) 1
B) 2
C) 3
D) 4
Solution: A
With only two classes, training a single binary SVM once is enough.

1. Support Vector Machine works well with,
a) Linear Scenarios
b) Non-linear Scenarios
c) Both of these
d) None of these

Answer: c) Both of these

2. Which of the following is best for MNIST dataset classification,


a) Naïve Bayes
b) Support Vector Machines
c) Random forest
d) Decision tree

Answer: b) Support Vector Machines

3. The elements lying on the two margin boundaries that separate two classes are called,
a) Linear Vectors
b) Support Vectors
c) Test Vectors
d) None of these

Answer: b) Support Vectors

4. Scikit-learn supports which kernels,


a) Polynomial kernels
b) Sigmoid kernels
c) Custom kernels
d) All of these

Answer: d) All of these

5. Which of the following is the default kernel used in SVM,


a) Polynomial kernel
b) Sigmoid kernel
c) Custom kernel
d) Radial Basis Function

Answer: d) Radial Basis Function

6. The gamma parameter in RBF determines,


a) Amplitude of the function
b) Altitude of the function
c) Complexity of the function
d) None of these

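Answer: a) Amplitude of the function (strictly, gamma controls the width of the Gaussian: a larger gamma gives a narrower kernel, so each support vector influences a smaller region)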


7. Scikit-learn allows us to create which kernel as a normal python function,
a) Polynomial kernel
b) Custom kernel
c) Sigmoid kernel
d) All of these

Answer: b) Custom kernel

8. To find out a trade-off between precision and number of support vectors, scikit-learn provides
an implementation called as,
a) NuSVC
b) BuSVC
c) MuSVC
d) AuSVC

Answer: a) NuSVC

9. The RBF kernel is based on the function:

a)

b)

c)
d) None of these

Answer: a)

10. The polynomial kernel is based on the function:

a)

b)

c)
d) None of these

Answer: b)
11. The sigmoid kernel is based on this function:

a)

b)

c)
d) None of these

Answer: c)
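For reference, here are the three kernels in the parameterization scikit-learn documents for `SVC` (`gamma`, `coef0` as r, `degree` as d), written as plain Python functions. The default argument values below are arbitrary illustration values, not scikit-learn's defaults (which are `gamma='scale'`, `coef0=0.0`, `degree=3`).

```python
import math

def dot(x, z):
    """Inner product <x, z> of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=2):
    """K(x, z) = (gamma * <x, z> + coef0) ** degree"""
    return (gamma * dot(x, z) + coef0) ** degree

def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    """K(x, z) = tanh(gamma * <x, z> + coef0)"""
    return math.tanh(gamma * dot(x, z) + coef0)

x, z = [1.0, 2.0], [3.0, 0.5]
print(rbf_kernel(x, z), polynomial_kernel(x, z), sigmoid_kernel(x, z))
```

The RBF kernel also shows the "similarity function" view of Question 12: it equals 1 for identical points and decays towards 0 as they move apart.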

12. What is/are true about kernel in SVM,


1. It maps low dimensional data to high dimensional data.
2. It is a similarity function.

a) 1
b) 2
c) Both 1 and 2
d) None of these

Answer: c) Both 1 and 2

13. Which type of classifier is SVM,


a) Discriminative
b) Generative
c) Both
d) None of these

Answer: a) Discriminative

14. SVM is used to solve which type of problems,


a) Classification
b) Regression
c) Clustering
d) Both Classification and Regression

Answer: d) Both Classification and Regression

15. SVM is which type of learning algorithm,


a) Supervised
b) Unsupervised
c) Both
d) None of these
Answer: a) Supervised

16. The goal of SVM is to,


a) Find the optimal separating hyperplane which minimizes the margin of training data.
b) Find the optimal separating hyperplane which maximizes the margin of training data.
c) Both
d) None of these

Answer: b) Find the optimal separating hyperplane which maximizes the margin of training data.

17. The equation for hyperplane is,

a)

b)

c)
d) None of these

Answer: a)
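For reference, the standard notation uses a weight vector $\mathbf{w}$, a bias $b$ and labels $y_i \in \{-1, +1\}$. Maximizing the margin $2/\lVert\mathbf{w}\rVert$ (the goal stated in Question 16) is equivalent to minimizing $\tfrac{1}{2}\lVert\mathbf{w}\rVert^2$:

```latex
\begin{aligned}
&\text{hyperplane:} && \mathbf{w}^\top \mathbf{x} + b = 0 \\
&\text{margin width:} && \frac{2}{\lVert \mathbf{w} \rVert} \\
&\text{hard-margin problem:} && \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
  \quad \text{s.t.}\quad y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 \ \ \forall i
\end{aligned}
```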

18. What is a kernel in SVM?


a) SVM algorithms use a set of mathematical functions that are defined as the kernel
b) SVM algorithms use a set of logarithmic functions that are defined as the kernel
c) SVM algorithms use a set of exponential functions that are defined as the kernel
d) SVM algorithms use a set of algebraic functions that are defined as the kernel

Answer: a) SVM algorithms use a set of mathematical functions that are defined as the kernel

19. Which of the following is false,


a) SVMs are very good when we have no idea about the data.
b) It works well with unstructured and semi-structured data.
c) The kernel trick is real strength of SVM.
d) It scales relatively well to low dimensional data.

Answer: d) It scales relatively well to low dimensional data.

20. Which of the following is false,


a) SVM algorithm is suitable for large data sets.
b) It does not perform well when the data has more noise.
c) SVM algorithm is not suitable for large data sets.
d) None of these

Answer: a) SVM algorithm is suitable for large data sets.


1. The Naive Bayes Classifier is a _____ in probability.
A. Technique.
B. Process.
C. Classification.
D. None of these answers are correct.
ANSWER: D

2. How many terms are required for building a Bayes model?


A. 1
B. 2
C. 3
D. 4
ANSWER: C

3. Where can Bayes' rule be used?


A. Solving queries
B. Increasing complexity
C. Decreasing complexity
D. Answering probabilistic query
ANSWER: D

4. _____ is the mathematical likelihood that something will occur.


A. Classification
B. Probability
C. Naive Bayes Classifier
D. None
ANSWER: B

5. ______________: a binary distribution, useful when a feature can be present or absent


A. Bernoulli
B. multinomial
C. Gaussian
D. None
ANSWER: A

6. Naïve Bayes Algorithm is a ________ learning algorithm.


A. Supervised
B. Reinforcement
C. Unsupervised
D. None of these
ANSWER: A

7. Examples of Naïve Bayes Algorithm is/are


A. Spam filtration
B. Sentiment analysis
C. Classifying articles
D. All of the above
ANSWER: D
8. What is needed to make probabilistic systems feasible in the world?
A. Feasibility
B. Reliability
C. Crucial robustness
D. None of the above
ANSWER: C

9. Probability provides a way of summarizing the ______ that comes from our laziness and
ignorance.
A. Belief
B. Uncertainty
C. Joint probability distributions
D. Randomness
ANSWER: B

10. The entries in the full joint probability distribution can be calculated as
A. Using variables
B. Both Using variables & information
C. Using information
D. All of the above
ANSWER: C

11. Which of the following is correct about the Naive Bayes?


A. Assumes that all the features in a dataset are independent
B. Assumes that all the features in a dataset are equally important
C. None
D. All of the above
ANSWER: D

12. Naïve Bayes algorithm is based on _______ and used for solving classification problems.
A. Bayes Theorem
B. Candidate elimination algorithm
C. EM algorithm
D. None of the above
ANSWER: A
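A minimal sketch of a multinomial Naïve Bayes text classifier built directly on Bayes' theorem, assuming a hypothetical toy spam/ham corpus and Laplace (add-one) smoothing. Each class is scored by log P(class) + sum of log P(word | class), i.e. the numerator of Bayes' theorem under the naive independence assumption; the evidence term is the same for every class and is dropped. The function names are illustrative, not a library API.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (list_of_words, label). Returns class priors, per-class word counts, vocabulary."""
    labels = Counter(label for _, label in docs)
    priors = {c: n / len(docs) for c, n in labels.items()}
    counts = {c: Counter() for c in labels}
    for words, label in docs:
        counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    return priors, counts, vocab

def predict(words, priors, counts, vocab):
    """Pick the class maximizing log P(c) + sum_w log P(w | c), with add-one smoothing."""
    best, best_score = None, -math.inf
    for c in priors:
        total = sum(counts[c].values())
        score = math.log(priors[c])
        for w in words:
            if w in vocab:  # words never seen in training are skipped
                score += math.log((counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical training corpus.
docs = [
    ("win money now".split(), "spam"),
    ("cheap money offer".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
]
model = train(docs)
print(predict("money offer".split(), *model))
print(predict("meeting tomorrow".split(), *model))
```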

13. Types of Naïve Bayes Model:


A. Bernoulli
B. multinomial
C. Gaussian
D. All of above
ANSWER: D

14. Disadvantages of Naïve Bayes Classifier


A. Naive Bayes assumes that all features are independent or unrelated, so it cannot learn
the relationship between features.
B. It performs well in Multi-class predictions as compared to the other Algorithms.
C. Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets.
D. It is the most popular choice for text classification problems.
ANSWER: A

15. The benefit of Naïve Bayes
A. Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets.
B. It is the most popular choice for text classification problems.
C. It can be used for Binary as well as Multi-class Classifications.
D. All of the above
ANSWER: D

16. How can SVM be classified


A. It is a model trained using unsupervised learning. It can be used for classification
and regression.
B. It is a model trained using unsupervised learning. It can be used for classification
but not for regression.
C. It is a model trained using supervised learning. It can be used for classification
and regression.
D. It is a model trained using unsupervised learning. It can be used for classification
but not for regression.
ANSWER: C

17. What do you mean by a hard margin


A. The SVM allows very low error in classification
B. The SVM allows high amount of error in classification
C. None of the above
D. All of above
ANSWER: A

18. The effectiveness of an SVM depends upon:


A. Selection of Kernel
B. Kernel Parameters
C. Soft Margin Parameter C
D. All of the above
ANSWER: D

19. Support vectors are the data points that lie closest to the decision surface.
A. TRUE
B. FALSE
ANSWER: A

20. The SVM’s are less effective when:


A. The data is linearly separable
B. The data is clean and ready to use
C. The data is noisy and contains overlapping points
ANSWER: C

21. The cost parameter in the SVM means:


A. The number of cross-validations to be made
B. The kernel to be used
C. The tradeoff between misclassification and simplicity of the model
D. None of the above
ANSWER: C

22. Which of the following are real world applications of the SVM?
A. Text and Hypertext Categorization
B. Image Classification
C. Clustering of News Articles
D. All of the above
ANSWER:D

23. Gaussian naive Bayes is useful when working with continuous values whose probabilities
can be modeled using a Gaussian distribution
A. Bernoulli
B. multinomial
C. Gaussian
D. All of above
ANSWER: C

24. A multinomial distribution is useful to model feature vectors where each value
represents the number of occurrences of a term or its relative frequency
A. Bernoulli
B. multinomial
C. Gaussian
D. All of above
ANSWER: B

25. Gaussian naive Bayes is limited due to


A. Mean and variance
B. Mean and Median
C. Median and covariance
D. Mean and standard deviation
ANSWER:A
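Gaussian Naïve Bayes needs only a class prior plus a mean and a variance per class (and, with several features, per feature). A one-feature sketch on hypothetical data; the function names are illustrative, and library implementations typically add a small variance-smoothing term that is omitted here.

```python
import math
import statistics

def fit_gaussian_nb(xs, ys):
    """Per class: (prior, mean, variance) of a single continuous feature."""
    params = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        params[c] = (len(vals) / len(xs), statistics.mean(vals), statistics.pvariance(vals))
    return params

def gaussian_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict_gaussian_nb(x, params):
    """Class with the largest prior * Gaussian likelihood."""
    return max(params, key=lambda c: params[c][0] * gaussian_pdf(x, params[c][1], params[c][2]))

# Hypothetical 1-D data: a feature value and a class label.
xs = [1.0, 1.2, 0.8, 3.0, 3.3, 2.7]
ys = ["low", "low", "low", "high", "high", "high"]
params = fit_gaussian_nb(xs, ys)
print(predict_gaussian_nb(1.1, params), predict_gaussian_nb(2.9, params))
```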

26. The two classes are normally separated by a margin with two boundaries where a few
elements lie. Those elements are called
A. principal components
B. support vectors
C. factors
D. None
ANSWER: B

27. What is/are true about kernel in SVM?
1. Kernel function maps low dimensional data to high dimensional space.
2. It's a similarity function.
A. 1
B. 2
C. 1 and 2
D. None of these
ANSWER: C
28. Support vector machine (SVM) is a _________ classifier
A. Discriminative
B. Generative
ANSWER: A

29. SVM is termed as ________ classifier


A. Maximum margin
B. Minimum margin
ANSWER:A

30. The training examples closest to the separating hyperplane are called as _______
A. Training vector
B. Testing Vector
C. Support margin
D. Support vector
ANSWER:D

31. Which of the following is a type of SVM?


A. Maximum margin classifier
B. Soft margin classifier
C. Support vector regression
D. All of the above
ANSWER: D

32. The goal of the SVM is to __________


A. Find the optimal separating hyperplane which minimizes the margin of training data
B. Find the optimal separating hyperplane which maximizes the margin of training data
ANSWER:B

33. When using R, which of the following package is used for SVM?
A. b1072
B. c1071
C. d2012
D. e1071
ANSWER:D

34. What are the different kernel functions in SVM?


A. Linear Kernel
B. Polynomial kernel
C. Radial basis kernel
D. Sigmoid kernel
E. All of the above
ANSWER:E

35. Which of the following might be valid reasons for preferring an SVM over a neural
network?
A. An SVM can automatically learn to apply a non-linear transformation on the input space;
a neural net cannot.
B. An SVM can effectively map the data to an infinite-dimensional space; a neural net
cannot.
C. An SVM should not get stuck in local minima, unlike a neural net.
D. The transformed (basis function) representation constructed by an SVM is usually
easier to visualise/interpret than for a neural net.
ANSWER: B,C

36. You are given a labeled binary classification data set with N data points and D features.
Suppose that N < D. In training an SVM on this data set, which of the following kernels
is likely to be most appropriate?
A. Linear kernel
B. Quadratic kernel
C. Higher-order polynomial kernel
D. RBF kernel
ANSWER: A
UNIT I
1. What is classification?
a) when the output variable is a category, such as “red” or “blue” or “disease” and “no
disease”.
b) when the output variable is a real value, such as “dollars” or “weight”.

Ans: Solution A

2. What is regression?
a) When the output variable is a category, such as “red” or “blue” or “disease” and “no
disease”.
b) When the output variable is a real value, such as “dollars” or “weight”.

Ans: Solution B

3. What is supervised learning?


a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.

Ans: Solution B

4. What is Unsupervised learning?


a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.

Ans: Solution A

5. What is Semi-Supervised learning?


a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.

Ans: Solution D
6. What is Reinforcement learning?
a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.

Ans: Solution C

7. Sentiment Analysis is an example of:
1. Regression
2. Classification
3. Clustering
4. Reinforcement Learning

Options:
A. 1 Only
B. 1 and 2
C. 1 and 3
D. 1, 2 and 4

Ans : Solution D

8. The process of forming general concept definitions from examples of concepts to be learned.
a) Deduction
b) abduction
c) induction
d) conjunction

Ans : Solution C

9. Computers are best at learning


a) facts.
b) concepts.
c) procedures.
d) principles.
Ans : Solution A

10. Data used to build a data mining model.


a) validation data
b) training data
c) test data
d) hidden data

Ans : Solution B

11. Supervised learning and unsupervised clustering both require at least one
a) hidden attribute.
b) output attribute.
c) input attribute.
d) categorical attribute.

Ans : Solution C

12. Supervised learning differs from unsupervised clustering in that supervised learning requires
a) at least one input attribute.
b) input attributes to be categorical.
c) at least one output attribute.
d) output attributes to be categorical.

Ans : Solution C

13. A regression model in which more than one independent variable is used to predict the
dependent variable is called
a) a simple linear regression model
b) a multiple regression models
c) an independent model
d) none of the above

Ans : Solution B

14. A term used to describe the case when the independent variables in a multiple regression model
are correlated is
a) Regression
b) correlation
c) multicollinearity
d) none of the above

Ans : Solution C
15. A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1 increases by 1 unit (holding x2
constant), y will
a) increase by 3 units
b) decrease by 3 units
c) increase by 4 units
d) decrease by 4 units

Ans : Solution A
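A quick numerical check of the coefficient interpretation, using the model from the question; the starting values x1 = 5 and x2 = 7 are arbitrary, and `predict` is a hypothetical helper. Holding x2 constant, a one-unit increase in x1 changes y by exactly the x1 coefficient.

```python
def predict(x1, x2):
    """The fitted multiple regression model from the question: y = 2 + 3*x1 + 4*x2."""
    return 2 + 3 * x1 + 4 * x2

# Raise x1 by one unit while holding x2 fixed.
x1, x2 = 5, 7
change = predict(x1 + 1, x2) - predict(x1, x2)
print(change)
```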

16. A multiple regression model has


a) only one independent variable
b) more than one dependent variable
c) more than one independent variable
d) none of the above

Ans : Solution C

17. A measure of goodness of fit for the estimated regression equation is the
a) multiple coefficient of determination
b) mean square due to error
c) mean square due to regression
d) none of the above

Ans : Solution A

18. The adjusted multiple coefficient of determination accounts for


a) the number of dependent variables in the model
b) the number of independent variables in the model
c) unusually large predictors
d) none of the above

Ans : Solution B

19. The multiple coefficient of determination is computed by


a) dividing SSR by SST
b) dividing SST by SSR
c) dividing SST by SSE
d) none of the above

Ans : Solution A

20. For a multiple regression model, SST = 200 and SSE = 50. The multiple coefficient of
determination is
a) 0.25
b) 4.00
c) 0.75
d) none of the above

Ans : Solution C
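The computation from the given sums of squares, as a small sketch: because SST = SSR + SSE, the multiple coefficient of determination is SSR / SST = 1 - SSE / SST. The function name `r_squared` is illustrative.

```python
def r_squared(sst, sse):
    """Coefficient of determination: R^2 = SSR / SST = 1 - SSE / SST,
    using the decomposition SST = SSR + SSE."""
    ssr = sst - sse
    return ssr / sst

print(r_squared(200, 50))
```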

21. A nearest neighbor approach is best used


a) with large-sized datasets.
b) when irrelevant attributes have been removed from the data.
c) when a generalized model of the data is desirable.
d) when an explanation of what has been found is of primary importance.

Ans : Solution B

22. Another name for an output attribute.


a) predictive variable
b) independent variable
c) estimated variable
d) dependent variable

Ans : Solution D

23. Classification problems are distinguished from estimation problems in that


a) classification problems require the output attribute to be numeric.
b) classification problems require the output attribute to be categorical.
c) classification problems do not allow an output attribute.
d) classification problems are designed to predict future outcome.

Ans : Solution B

24. Which statement is true about prediction problems?


a) The output attribute must be categorical.
b) The output attribute must be numeric.
c) The resultant model is designed to determine future outcomes.
d) The resultant model is designed to classify current behavior.

Ans : Solution C

25. Which statement about outliers is true?


a) Outliers should be identified and removed from a dataset.
b) Outliers should be part of the training dataset but should not be present in the test
data.
c) Outliers should be part of the test dataset but should not be present in the training
data.
d) The nature of the problem determines how outliers are used.
Ans : Solution D

26. Which statement is true about neural network and linear regression models?
a) Both models require input attributes to be numeric.
b) Both models require numeric attributes to range between 0 and 1.
c) The output of both models is a categorical attribute value.
d) Both techniques build models whose output is determined by a linear sum of weighted
input attribute values.

Ans : Solution A

27. Which of the following is a common use of unsupervised clustering?


a) detect outliers
b) determine a best set of input attributes for supervised learning
c) evaluate the likely performance of a supervised learner model
d) determine if meaningful relationships can be found in a dataset

Ans : Solution A

28. The average positive difference between computed and desired outcome values.
a) root mean squared error
b) mean squared error
c) mean absolute error
d) mean positive error

Ans : Solution C

29. Selecting data so as to assure that each class is properly represented in both the training and
test set.
a) cross validation
b) stratification
c) verification
d) bootstrapping

Ans : Solution B

30. The standard error is defined as the square root of this computation.
a) The sample variance divided by the total number of sample instances.
b) The population variance divided by the total number of sample instances.
c) The sample variance divided by the sample mean.
d) The population variance divided by the sample mean.

Ans : Solution A
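A minimal sketch of the definition on a hypothetical sample: divide the sample variance (n - 1 denominator) by the number of instances, then take the square root.

```python
import math
import statistics

def standard_error(sample):
    """Standard error: sqrt(sample variance / number of sample instances)."""
    return math.sqrt(statistics.variance(sample) / len(sample))

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values
print(standard_error(sample))
```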
31. Data used to optimize the parameter settings of a supervised learner model.
a) Training
b) Test
c) Verification
d) Validation

Ans : Solution D

32. Bootstrapping allows us to


a) choose the same training instance several times.
b) choose the same test set instance several times.
c) build models with alternative subsets of the training data several times.
d) test a model with alternative subsets of the test data several times.

Ans : Solution A
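A minimal sketch of drawing one bootstrap training set from a hypothetical list of 10 instances: sampling is uniform *with replacement*, so the same training instance can be chosen several times, and the instances never chosen (the "out-of-bag" ones) can serve as a test set. The seed is fixed only to make the sketch reproducible.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) instances uniformly with replacement."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)  # fixed seed for reproducibility
data = list(range(10))  # hypothetical instance ids
sample = bootstrap_sample(data, rng)
out_of_bag = sorted(set(data) - set(sample))
print("bootstrap sample:", sample)
print("out-of-bag instances:", out_of_bag)
```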

33. The correlation between the number of years an employee has worked for a company and the
salary of the employee is 0.75. What can be said about employee salary and years worked?
a) There is no relationship between salary and years worked.
b) Individuals that have worked for the company the longest have higher salaries.
c) Individuals that have worked for the company the longest have lower salaries.
d) The majority of employees have been with the company a long time.
e) The majority of employees have been with the company a short period of time.

Ans : Solution B

34. The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
a) The attributes are not linearly related.
b) As the value of one attribute increases the value of the second attribute also increases.
c) As the value of one attribute decreases the value of the second attribute increases.
d) The attributes show a curvilinear relationship.

Ans : Solution C
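A small sketch of the Pearson correlation coefficient on hypothetical data where y tends to fall as x rises; the result is strongly negative, which is the situation described by a value such as -0.85.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: as x goes up, y tends to go down.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 9, 5, 4]
print(round(pearson_r(xs, ys), 2))
```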

35. The average squared difference between classifier predicted output and actual output.
a) mean squared error
b) root mean squared error
c) mean absolute error
d) mean relative error

Ans : Solution A

36. Simple regression assumes a __________ relationship between the input attribute and output
attribute.
a) Linear
b) Quadratic
c) reciprocal
d) inverse

Ans : Solution A

37. Regression trees are often used to model _______ data.


a) Linear
b) Nonlinear
c) Categorical
d) Symmetrical

Ans : Solution B

38. The leaf nodes of a model tree are


a) averages of numeric output attribute values.
b) nonlinear regression equations.
c) linear regression equations.
d) sums of numeric output attribute values.

Ans : Solution C

39. Logistic regression is a ________ regression technique that is used to model data having a
_____outcome.
a) linear, numeric
b) linear, binary
c) nonlinear, numeric
d) nonlinear, binary

Ans : Solution D

40. This technique associates a conditional probability value with each data instance.
a) linear regression
b) logistic regression
c) simple regression
d) multiple linear regression

Ans : Solution B

41. This supervised learning technique can process both numeric and categorical input attributes.
a) linear regression
b) Bayes classifier
c) logistic regression
d) backpropagation learning
Ans : Solution B

42. With Bayes classifier, missing data items are


a) treated as equal compares.
b) treated as unequal compares.
c) replaced with a default value.
d) ignored.

Ans : Solution D

43. This clustering algorithm merges and splits nodes to help modify nonoptimal partitions.
a) agglomerative clustering
b) expectation maximization
c) conceptual clustering
d) K-Means clustering

Ans : Solution C

44. This clustering algorithm initially assumes that each data instance represents a single cluster.
a) agglomerative clustering
b) conceptual clustering
c) K-Means clustering
d) expectation maximization

Ans : Solution A

45. This unsupervised clustering algorithm terminates when mean values computed for the current
iteration of the algorithm are identical to the computed mean values for the previous iteration.
a) agglomerative clustering
b) conceptual clustering
c) K-Means clustering
d) expectation maximization

Ans : Solution C
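A minimal sketch of K-Means (Lloyd's algorithm) on hypothetical 1-D data, using exactly the stopping rule in the question: iterate until the means computed in this iteration are identical to those of the previous one. The initial means are chosen by hand here; real implementations usually pick them randomly or with a seeding heuristic.

```python
def kmeans_1d(points, means):
    """K-Means on 1-D data. Stops when the means computed in this iteration
    are identical to the means from the previous iteration."""
    while True:
        clusters = [[] for _ in means]
        for p in points:
            # assign each point to its nearest current mean
            nearest = min(range(len(means)), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # recompute each mean; keep the old mean if a cluster ends up empty
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:
            return means, clusters
        means = new_means

means, clusters = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 2.0])
print(means, clusters)
```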

46. Machine learning techniques differ from statistical techniques in that machine learning methods
a) typically assume an underlying distribution for the data.
b) are better able to deal with missing and noisy data.
c) are not able to explain their behavior.
d) have trouble with large-sized datasets.

Ans : Solution B
UNIT II

1. True or False: Overfitting is more likely when you have a huge amount of data to train on.
A) TRUE
B) FALSE
Ans Solution: (B)
With a small training dataset, it's easier to find a hypothesis that fits the training data exactly,
i.e. to overfit.

2. What is pca.components_ in Sklearn?
A. Set of all eigenvectors for the projection space
B. Matrix of principal components
C. Result of the multiplication matrix
D. None of the above options
Ans A

3.Which of the following techniques would perform better for reducing dimensions of a data
set?
A. Removing columns which have too many missing values
B. Removing columns which have high variance in data
C. Removing columns with dissimilar data trends
D. None of these
Ans Solution: (A)
If a column has too many missing values (say 99%), we can remove it.

4.It is not necessary to have a target variable for applying dimensionality reduction
algorithms.
A. TRUE
B. FALSE
Ans Solution: (A)
Unsupervised methods such as PCA need no target variable; only supervised methods such as LDA require one.

5. PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
Ans Solution: (A)
Sometimes it is very useful to plot the data in lower dimensions. We can take the first 2 principal
components and then visualize the data using scatter plot.

6. The most popularly used dimensionality reduction algorithm is Principal Component Analysis
(PCA). Which of the following is/are true about PCA?
1. PCA is an unsupervised method
2. It searches for the directions in which the data have the largest variance
3. Maximum number of principal components <= number of features
4. All principal components are orthogonal to each other
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. All of the above

Ans D
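A small pure-Python sketch of PCA for two features on a hypothetical toy dataset, using the closed-form eigendecomposition of the 2x2 sample covariance matrix. It illustrates the statements above: no labels are used, there are at most as many components as features, the components are orthogonal, and they are ordered by the variance they capture. (In scikit-learn, `PCA().components_` stores such unit directions as rows.)

```python
import math

def pca_2d(points):
    """PCA for 2-D data via the closed-form eigendecomposition of the 2x2
    sample covariance matrix. Returns (variances, unit directions), sorted by
    decreasing variance. No labels are needed: PCA is unsupervised."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)        # var(x)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)        # var(y)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)  # cov(x, y)
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    variances = [half_trace + disc, half_trace - disc]
    if b == 0:  # covariance is already diagonal: the axes are the components
        directions = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        directions = []
        for lam in variances:
            vx, vy = b, lam - a  # solves (Cov - lam * I) v = 0 when b != 0
            norm = math.hypot(vx, vy)
            directions.append((vx / norm, vy / norm))
    return variances, directions

# Hypothetical, positively correlated 2-D data.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0), (2.3, 2.7)]
variances, directions = pca_2d(points)
(u1, u2), (v1, v2) = directions
print("variances:", variances)
print("components:", directions, "dot product:", u1 * v1 + u2 * v2)
```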

7. PCA works better if:
1. There is a linear structure in the data
2. The data lies on a curved surface and not on a flat surface
3. The variables are scaled in the same unit
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1 ,2 and 3
Ans Solution: (C)

8. What happens when you get features in lower dimensions using PCA?
1. The features will still have interpretability
2. The features will lose interpretability
3. The features must carry all information present in data
4. The features may not carry all information present in data
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
Ans Solution: (D)
When you get the features in lower dimensions then you will lose some information of data
most of the times and you won’t be able to interpret the lower dimension data.

9. Which of the following option(s) is / are true?


1. You need to initialize parameters in PCA
2. You don't need to initialize parameters in PCA
3. PCA can be trapped into local minima problem
4. PCA can't be trapped into local minima problem
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
Ans Solution: (D)
PCA is a deterministic algorithm: it has no parameters to initialize, and it does not suffer from
the local minima problem that affects many machine learning algorithms.

10. Which of the following statements is true about t-SNE in comparison to PCA?
A. When the data is huge (in size), t-SNE may fail to produce better results.
B. t-SNE always produces better results regardless of the size of the data
C. PCA always performs better than t-SNE for smaller size data.
D. None of these
Ans Solution: (A)
t-SNE is computationally expensive on large datasets, so it may fail to produce better results
there; neither method is always better than the other.

11. [ True or False ] PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE

Solution: (A)
Sometimes it is very useful to plot the data in lower dimensions. We can take the first 2 principal
components and then visualize the data using scatter plot.

12. A feature F1 can take the values A, B, C, D, E and F, and represents the grade of students
from a college.
Which of the following statements is true in this case?
A) Feature F1 is an example of nominal variable.
B) Feature F1 is an example of ordinal variable.
C) It doesn’t belong to any of the above category.
D) Both of these
Solution: (B)
Ordinal variables are variables whose categories have an order. For example, grade A is
considered a higher grade than grade B.

13. Which of the following is an example of a deterministic algorithm?


A) PCA
B) K-Means
C) None of the above
Solution: (A)
A deterministic algorithm is that in which output does not change on different runs. PCA would
give the same result if we run again, but not k-means.
UNIT III

1. Which of the following methods do we use to best fit the data in Logistic Regression?
A) Least Square Error
B) Maximum Likelihood
C) Jaccard distance
D) Both A and B
Ans Solution: B

2. Choose which of the following options is true regarding One-Vs-All method in Logistic
Regression.
A) We need to fit n models in n-class classification problem
B) We need to fit n-1 models to classify into n classes
C) We need to fit only 1 model to classify into n classes
D) None of these
Ans Solution: A

3. Suppose, You applied a Logistic Regression model on a given data and got a training accuracy
X and testing accuracy Y. Now, you want to add a few new features in the same data. Select the
option(s) which is/are correct in such a case.
Note: Consider remaining parameters are same.
A) Training accuracy increases
B) Training accuracy increases or remains the same
C) Testing accuracy decreases
D) Testing accuracy increases or remains the same
Ans Solution: A and D
Adding more features gives the model more information with which to fit the training data, so
training accuracy increases. Testing accuracy, however, increases only if the added features are
significant.

4. Which of the following algorithms do we use for Variable Selection?


A) LASSO
B) Ridge
C) Both
D) None of these
Ans Solution: A
In lasso, we apply an absolute-value penalty; as the penalty increases, some of the variable
coefficients may become exactly zero, which removes those variables from the model.
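A minimal sketch of this effect, assuming NumPy and synthetic data in which only the first two of five features matter: lasso is fitted here by coordinate descent with a soft-thresholding step, which is one common way to solve it, not the only one. The helper names soft_threshold and lasso_cd are hypothetical.

```python
import numpy as np

def soft_threshold(rho, alpha):
    # Shrinks rho towards zero; returns exactly 0.0 when |rho| <= alpha
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_cd(X, y, alpha, n_iters=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1, without intercept."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iters):
        for j in range(p):
            # Partial residual: the prediction error with feature j left out
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            z = X[:, j] @ X[:, j] / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=200)  # 2 useful features

w_weak = lasso_cd(X, y, alpha=0.01)    # small penalty
w_strong = lasso_cd(X, y, alpha=0.5)   # large penalty
print(np.round(w_weak, 3))
print(np.round(w_strong, 3))           # the last three coefficients are exactly zero
```

The soft-thresholding step is what produces exact zeros; a squared (ridge) penalty would only scale coefficients down.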

5. Which of the following statement is true about outliers in Linear regression?


A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
Ans Solution: (A)
In most cases, outliers change the slope of the regression line, so linear regression is sensitive
to outliers.
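A small NumPy illustration, using hypothetical points that lie exactly on y = 2x + 1: moving a single point far from the line noticeably changes the least-squares slope.

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                          # points exactly on the line y = 2x + 1
slope_clean, _ = np.polyfit(x, y, deg=1)

y_outlier = y.copy()
y_outlier[-1] += 50.0                      # one hypothetical outlier at x = 9
slope_outlier, _ = np.polyfit(x, y_outlier, deg=1)

print(round(slope_clean, 3))               # 2.0
print(round(slope_outlier, 3))             # 4.727
```

The shift equals (9 − 4.5) · 50 / Σ(xᵢ − 4.5)² = 225 / 82.5 ≈ 2.727, so one point more than doubles the slope.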

6. Which of the following methods do we use to find the best fit line for data in Linear
Regression?
A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Ans Solution: (A)
In linear regression, we minimize the sum of squared errors of the model to find the line of best
fit.

7. Which of the following is true about Residuals?


A) Lower is better
B) Higher is better
C) A or B depend on the situation
D) None of these
Ans Solution: (A)
Residuals are the error values of the model (observed minus predicted values), so lower residuals are desired.

8. Suppose you plotted a scatter plot between the residuals and predicted values in linear
regression and you found that there is a relationship between them. Which of the following
conclusions would you draw about this situation?

A) Since there is a relationship, our model is not good
B) Since there is a relationship, our model is good
C) Can’t say
D) None of these
Ans Solution: (A)
There should not be any relationship between predicted values and residuals. If there is any
relationship between them, the model has not fully captured the information in the data.

9. Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge
regression with penalty x.
Choose the option that best describes the bias.
A) In case of very large x; bias is low
B) In case of very large x; bias is high
C) We can’t say about bias
D) None of these
Ans Solution: (B)
If the penalty is very large, the model is less complex, so the bias will be high.
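Writing the penalty x as λ, a minimal NumPy sketch (synthetic data, closed-form ridge solution, no intercept; the helper ridge_fit is hypothetical) shows the trend: as λ grows, the coefficients shrink towards zero and the training error rises, i.e. the model becomes simpler and more biased.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression (no intercept): w = (X^T X + lam*I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([4.0, -3.0, 2.0]) + rng.normal(size=50)   # hypothetical data

penalties = [0.0, 10.0, 1000.0, 1e6]
coef_norms, train_mses = [], []
for lam in penalties:
    w = ridge_fit(X, y, lam)
    coef_norms.append(np.linalg.norm(w))
    train_mses.append(np.mean((y - X @ w) ** 2))
    print(f"penalty={lam:g}: coefficients={np.round(w, 4)}, training MSE={train_mses[-1]:.3f}")
```

Even at λ = 10⁶ the coefficients are tiny but not exactly zero, which is the ridge behaviour discussed in questions 25 and 26.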

10. Which of the following option is true?


A) Linear regression error values have to be normally distributed, but in logistic regression this
is not the case
B) Logistic regression error values have to be normally distributed, but in linear regression this
is not the case
C) Both linear regression and logistic regression error values have to be normally distributed
D) Neither linear regression nor logistic regression error values have to be normally
distributed
Ans Solution: A
Normally distributed errors are an assumption of linear regression (used for its standard inference); logistic regression models a binary outcome through a Bernoulli likelihood, so it makes no such assumption.

11. Suppose you have trained a logistic regression classifier and, for a new example x, it outputs
a prediction hθ(x) = 0.2. This means
A) Our estimate for P(y=1 | x)
B) Our estimate for P(y=0 | x)
C) Our estimate for P(y=1 | x)
D) Our estimate for P(y=0 | x)
Ans Solution: B
The hypothesis hθ(x) is the estimate of P(y=1 | x), so P(y=1 | x) = 0.2 and P(y=0 | x) = 1 − 0.2 = 0.8.

12. True-False: Linear Regression is a supervised machine learning algorithm.


A) TRUE
B) FALSE
Solution: (A)
Yes, linear regression is a supervised learning algorithm because it uses true labels for training.
A supervised learning algorithm needs an input variable (x) and an output variable (Y) for each
example.

13. True-False: Linear Regression is mainly used for Regression.


A) TRUE
B) FALSE
Solution: (A)
Linear regression predicts a dependent variable that takes continuous values.

14. True-False: Is it possible to design a linear regression algorithm using a neural network?

A) TRUE
B) FALSE

Solution: (A)

True. A neural network is a universal approximator, so it can certainly implement a linear
regression algorithm; a single neuron with an identity activation trained on squared error is a linear regression model.
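A minimal sketch of that claim, assuming NumPy and synthetic data: a "network" of one neuron with identity activation, trained by gradient descent on mean squared error, ends up with the same weights and bias as ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -0.5]) + 0.7 + 0.05 * rng.normal(size=200)   # hypothetical data

# A "network" with a single neuron: identity activation, weights w and bias b
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(2000):
    err = (X @ w + b) - y                 # forward pass minus targets
    w -= lr * 2 * X.T @ err / len(y)      # gradient of the mean squared error w.r.t. w
    b -= lr * 2 * err.mean()              # gradient w.r.t. the bias

# Ordinary least squares on [X, 1] for comparison
A = np.column_stack([X, np.ones(len(y))])
coef_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(w, 4), round(b, 4))
print(np.round(coef_ols, 4))              # same weights and bias
```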

15. Which of the following methods do we use to find the best fit line for data in Linear
Regression?
A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Solution: (A)
In linear regression, we minimize the sum of squared errors of the model to find the line of best
fit.

16. Which of the following evaluation metrics can be used to evaluate a model while modeling
a continuous output variable?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
Solution: (D)
Since linear regression outputs continuous values, we use the mean squared error metric to
evaluate the model's performance. The remaining options are used for classification
problems.

17. True-False: Lasso Regularization can be used for variable selection in Linear Regression.
A) TRUE
B) FALSE
Solution: (A)
True. Lasso regression applies an absolute-value penalty, which makes some of the coefficients
zero.

18. Which of the following is true about Residuals?


A) Lower is better
B) Higher is better
C) A or B depend on the situation
D) None of these
Solution: (A)
Residuals are the error values of the model (observed minus predicted values), so lower residuals are desired.

19. Suppose that we have n independent variables (X1, X2, …, Xn) and the dependent variable is Y.
Now imagine that you are applying linear regression by fitting the best-fit line using least square
error on this data.
You found that the correlation coefficient of one of its variables (say X1) with Y is -0.95.
Which of the following is true for X1?
A) Relation between the X1 and Y is weak
B) Relation between the X1 and Y is strong
C) Relation between the X1 and Y is neutral
D) Correlation can’t judge the relationship
Solution: (B)
The absolute value of the correlation coefficient denotes the strength of the relationship.
Since the absolute correlation is very high, the relationship between X1 and Y is strong.

20. You are given two variables V1 and V2 with the following two characteristics:
1. If V1 increases, then V2 also increases
2. If V1 decreases, then the behavior of V2 is unknown
Looking at these two characteristics, which of the following options is correct for the
Pearson correlation between V1 and V2?
A) Pearson correlation will be close to 1
B) Pearson correlation will be close to -1
C) Pearson correlation will be close to 0
D) None of these

Solution: (D)
We cannot comment on the correlation coefficient using statement 1 alone; we need to
consider both statements. Consider V1 as x and V2 as |x|: the correlation coefficient would not
be close to 1 in such a case.

21. Suppose Pearson correlation between V1 and V2 is zero. In such case, is it right to
conclude that V1 and V2 do not have any relation between them?
A) TRUE
B) FALSE
Solution: (B)
The Pearson correlation coefficient between two variables can be zero even when they are related.
A zero correlation coefficient only means that they do not move together linearly. Examples are
y = |x| or y = x^2 when x is symmetric around zero.
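A quick NumPy check of these examples, using x values chosen for illustration to be symmetric around zero: both y = x^2 and y = |x| are fully determined by x, yet their Pearson correlation with x is essentially zero (this also covers the y = |x| case from question 20).

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)              # values symmetric around zero
r_square = np.corrcoef(x, x ** 2)[0, 1]      # y = x^2 depends exactly on x
r_abs = np.corrcoef(x, np.abs(x))[0, 1]      # y = |x| depends exactly on x
print(f"{r_square:.6f} {r_abs:.6f}")         # both essentially zero
```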
22. True-False: Is overfitting more likely when you have a huge amount of training data?
A) TRUE
B) FALSE
Solution: (B)
With a small training dataset, it is easier to find a hypothesis that fits the training data exactly,
i.e. to overfit.

23. We can also compute the coefficients of linear regression with the help of an analytical
method called the “Normal Equation”. Which of the following is/are true about the Normal Equation?
1. We don’t have to choose the learning rate
2. It becomes slow when the number of features is very large
3. There is no need to iterate

A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1,2 and 3
Solution: (D)
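Instead of gradient descent, the Normal Equation θ = (XᵀX)⁻¹Xᵀy can be used to find the
coefficients directly. It needs no learning rate and no iterations, but computing (XᵀX)⁻¹ costs
roughly cubic time in the number of features, so it becomes slow when there are very many features.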
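A minimal NumPy comparison on synthetic data: the Normal Equation gives θ in one linear solve, while batch gradient descent on the same cost needs a chosen learning rate and many iterations to reach the same answer. Using np.linalg.solve instead of forming the inverse explicitly is a numerical choice; it computes the same θ.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])     # first column is the intercept
y = X @ np.array([0.5, 2.0, -1.0]) + 0.1 * rng.normal(size=n)  # hypothetical data

# Normal Equation: solve (X^T X) theta = X^T y in one step; no learning rate, no iterations
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the cost (1/2n)*||X theta - y||^2 needs both
theta_gd = np.zeros(3)
learning_rate = 0.1
for _ in range(1000):
    theta_gd -= learning_rate * X.T @ (X @ theta_gd - y) / n

print(np.round(theta_ne, 4))
print(np.round(theta_gd, 4))
```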

Question Context 24-26:


Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge
regression with penalty x.
24. Choose the option that best describes the bias.
A) In case of very large x; bias is low
B) In case of very large x; bias is high
C) We can’t say about bias
D) None of these
Solution: (B)
If the penalty is very large, the model is less complex, so the bias will be high.

25. What will happen when you apply a very large penalty?
A) Some of the coefficients will become exactly zero
B) Some of the coefficients will approach zero but not become exactly zero
C) Both A and B depending on the situation
D) None of these
Solution: (B)
In lasso, some of the coefficient values become zero, but in Ridge the coefficients become close
to zero without becoming exactly zero.

26. What will happen when you apply a very large penalty in the case of Lasso?
A) Some of the coefficients will become zero
B) Some of the coefficients will approach zero but not become exactly zero
C) Both A and B depending on the situation
D) None of these
Solution: (A)
As already discussed, lasso applies an absolute-value penalty, so some of the coefficients will become
zero.

27. Which of the following statement is true about outliers in Linear regression?
A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
Solution: (A)
In most cases, outliers change the slope of the regression line, so linear regression is sensitive
to outliers.

28. Suppose you plotted a scatter plot between the residuals and predicted values in linear
regression and you found that there is a relationship between them. Which of the following
conclusions would you draw about this situation?

A) Since there is a relationship, our model is not good
B) Since there is a relationship, our model is good
C) Can’t say
D) None of these
Solution: (A)
There should not be any relationship between predicted values and residuals. If there is any
relationship between them, the model has not fully captured the information in the data.

Question Context 29-31:


Suppose that you have a dataset D1, you design a linear regression model with a degree 3
polynomial, and you find that the training and testing errors are both “0”; in other words, the
model perfectly fits the data.
29. What will happen when you fit a degree 4 polynomial in linear regression?
A) There are high chances that the degree 4 polynomial will overfit the data
B) There are high chances that the degree 4 polynomial will underfit the data
C) Can’t say
D) None of these
Solution: (A)
Since a degree 4 polynomial is more complex than the degree 3 model, it will again fit the training
data perfectly, and it may overfit. In such a case the training error will be zero, but the test error
may not be zero.
30. What will happen when you fit a degree 2 polynomial in linear regression?
A) There are high chances that the degree 2 polynomial will overfit the data
B) There are high chances that the degree 2 polynomial will underfit the data
C) Can’t say
D) None of these
Solution: (B)
If a degree 3 polynomial fits the data perfectly, it is highly likely that a simpler model (a degree 2
polynomial) will underfit the data.

31. In terms of bias and variance, which of the following is true when you fit a degree 2
polynomial?

A) Bias will be high, variance will be high


B) Bias will be low, variance will be high
C) Bias will be high, variance will be low
D) Bias will be low, variance will be low
Solution: (C)
Since a degree 2 polynomial will be less complex as compared to degree 3, the bias will be high
and variance will be low.
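A small NumPy sketch of questions 29 to 31, using hypothetical noise-free cubic data to stand in for D1: degree 3 and degree 4 fits both reach (numerically) zero training error, while degree 2 cannot represent the cubic term and keeps a large training error, which is the high-bias case. Because the data has no noise, this sketch does not show the extra variance of the degree 4 model on new data.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 30)
y = x ** 3 - 2.0 * x + 1.0          # hypothetical noise-free cubic data standing in for D1

train_mse = {}
for degree in (2, 3, 4):
    coeffs = np.polyfit(x, y, deg=degree)
    train_mse[degree] = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {train_mse[degree]:.6f}")
```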

Question Context 32-33:


We have been given a dataset with n records, in which x is the input attribute and y is the output
attribute. Suppose we use a linear regression method to model this data. To test our linear
regressor, we split the data randomly into a training set and a test set.
32. Now we increase the training set size gradually. As the training set size increases, what do
you expect will happen with the mean training error?

A) Increase
B) Decrease
C) Remain constant
D) Can’t Say
Solution: (D)
Training error may increase or decrease depending on the values that are used to fit the model.
If the training values gradually include more outliers, the error might increase.

33. What do you expect will happen with bias and variance as you increase the size of the training
data?

A) Bias increases and Variance increases


B) Bias decreases and Variance increases
C) Bias decreases and Variance decreases
D) Bias increases and Variance decreases
E) Can’t Say
Solution: (D)
As we increase the size of the training data, the bias would increase while the variance would
decrease.

Question Context 34:


Consider the following data, where one input (X) and one output (Y) are given.

34. What would be the root mean square training error for this data if you run a Linear
Regression model of the form (Y = A0 + A1X)?

A) Less than 0
B) Greater than zero
C) Equal to 0
D) None of these
Solution: (C)
A line fits the given data perfectly, so the root mean square training error will be zero.

Question Context 35-36:


Suppose you have been given the following scenarios of training and validation error for Linear
Regression.

Scenario | Learning Rate | Number of iterations | Training Error | Validation Error
1        | 0.1           | 1000                 | 100            | 110
2        | 0.2           | 600                  | 90             | 105
3        | 0.3           | 400                  | 110            | 110
4        | 0.4           | 300                  | 120            | 130
5        | 0.4           | 250                  | 130            | 150

35. Which of the following scenarios would give you the right hyperparameters?
A) 1
B) 2
C) 3
D) 4
Solution: (B)
Option B (scenario 2) is the best choice because it gives both the lowest training error and the lowest validation error.
36. Suppose you got the tuned hyperparameters from the previous question. Now, imagine
you want to add a variable to the variable space such that this added feature is important. Which
of the following would you observe in such a case?
A) Training Error will decrease and Validation error will increase
B) Training Error will increase and Validation error will increase
C) Training Error will increase and Validation error will decrease
D) Training Error will decrease and Validation error will decrease
E) None of the above
Solution: (D)
If the added feature is important, the training and validation error would decrease.

Question Context 37-38:


Suppose you find that your linear regression model is underfitting the data.
37. In such a situation, which of the following options would you consider?
1. I will add more variables
2. I will start introducing polynomial degree variables
3. I will remove some variables
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
Solution: (A)
In case of underfitting, you need to add more variables to the variable space, or add some
polynomial-degree variables, to make the model complex enough to fit the data better.
38. The situation is the same as in the previous question (underfitting). Which of the following
regularization algorithms would you prefer?

A) L1
B) L2
C) Any
D) None of these
Solution: (D)
I won’t use any regularization methods because regularization is used in case of overfitting.

39. True-False: Is Logistic regression a supervised machine learning algorithm?


A) TRUE
B) FALSE
Solution: A
True. Logistic regression is a supervised learning algorithm because it uses true labels for
training. A supervised learning algorithm needs input variables (x) and a target variable (Y)
when you train the model.

40. True-False: Is Logistic regression mainly used for Regression?


A) TRUE
B) FALSE
Solution: B
Logistic regression is a classification algorithm; don’t be misled by the word “regression” in its name.

41. True-False: Is it possible to design a logistic regression algorithm using a Neural Network
Algorithm?
A) TRUE
B) FALSE
Solution: A
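True. A neural network is a universal approximator, so it can implement a logistic regression
algorithm; a single neuron with a sigmoid activation trained on the log loss is a logistic regression model.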
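A minimal sketch, assuming NumPy and hypothetical labels drawn from a logistic model: one neuron with a sigmoid activation, trained by gradient descent on the mean log loss. Minimizing the log loss is the same as maximizing the likelihood, so the gradient vanishing at the end is the maximum-likelihood condition referred to in question 43.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
p_true = sigmoid(X @ np.array([2.0, -1.0]) + 0.5)
y = (rng.random(300) < p_true).astype(float)      # hypothetical labels from a logistic model

# One neuron with a sigmoid activation, trained on the mean log loss (cross-entropy)
w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(3000):
    p = sigmoid(X @ w + b)                # forward pass: estimated P(y = 1 | x)
    w -= lr * X.T @ (p - y) / len(y)      # gradient of the mean log loss w.r.t. w
    b -= lr * np.mean(p - y)              # gradient w.r.t. the bias

# At the maximum-likelihood solution the gradient is (numerically) zero
p = sigmoid(X @ w + b)
grad_norm = np.linalg.norm(np.append(X.T @ (p - y), np.sum(p - y)) / len(y))
print(np.round(w, 2), round(b, 2), grad_norm)
```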

42. True-False: Is it possible to apply a logistic regression algorithm to a 3-class classification
problem?
A) TRUE
B) FALSE
Solution: A
Yes, we can apply logistic regression to a 3-class classification problem by using the One-vs-All
method.

43. Which of the following methods do we use to best fit the data in Logistic Regression?
A) Least Square Error
B) Maximum Likelihood
C) Jaccard distance
D) Both A and B
Solution: B
Logistic regression is trained using maximum likelihood estimation.

44. Which of the following evaluation metrics can not be applied in case of logistic regression
output to compare with target?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
Solution: D
Since logistic regression is a classification algorithm, its output is not a real-valued target, so
mean squared error cannot be used to evaluate it.

45. One of the very good methods to analyze the performance of Logistic Regression is AIC,
which is similar to R-Squared in Linear Regression. Which of the following is true about AIC?
A) We prefer a model with minimum AIC value
B) We prefer a model with maximum AIC value
C) Both but depend on the situation
D) None of these
Solution: A
In logistic regression, we select the model with the least AIC.
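As a worked illustration of the rule AIC = 2k − 2 ln(L), where k is the number of parameters and L the maximized likelihood, here is a small NumPy sketch. The predicted probabilities and parameter counts for the two models are made up for the example.

```python
import numpy as np

def log_likelihood(y, p):
    """Bernoulli log-likelihood of labels y (0/1) under predicted probabilities p."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def aic(log_lik, k):
    """Akaike Information Criterion, 2k - 2 ln(L); lower is better."""
    return 2 * k - 2 * log_lik

y = np.array([1, 0, 1, 1, 0])
p_model_a = np.array([0.9, 0.2, 0.8, 0.7, 0.1])   # hypothetical model A, k = 2 parameters
p_model_b = np.array([0.6, 0.4, 0.6, 0.6, 0.4])   # hypothetical model B, k = 3 parameters

aic_a = aic(log_likelihood(y, p_model_a), k=2)
aic_b = aic(log_likelihood(y, p_model_b), k=3)
print(round(aic_a, 3), round(aic_b, 3))           # model A has the lower AIC, so it is preferred
```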

46. [True-False] Standardisation of features is required before training a Logistic Regression.


A) TRUE
B) FALSE
Solution: B
Standardization isn’t required for logistic regression. The main goal of standardizing features is
to help convergence of the technique used for optimization.

47. Which of the following algorithms do we use for Variable Selection?


A) LASSO
B) Ridge
C) Both
D) None of these

Solution: A
In lasso, we apply an absolute-value penalty; as the penalty increases, some of the variable
coefficients may become zero.
Context: 48-49

Consider the following model for logistic regression: P(y = 1 | x, w) = g(w0 + w1x),
where g(z) is the logistic function.

In the above equation, P(y = 1 | x; w), viewed as a function of x, is the function we can get by changing the
parameters w.

48. What would be the range of p in such a case?

A) (0, inf)
B) (-inf, 0 )
C) (0, 1)
D) (-inf, inf)

Solution: C

For any value of x from −∞ to +∞, the logistic function gives an output in the range
(0, 1).

49. In the above question, which function do you think makes p lie in (0, 1)?

A) logistic function
B) Log likelihood function
C) Mixture of both
D) None of them

Solution: A

The explanation is the same as for question 48.

50. Suppose you have been given a fair coin and you want to find out the odds of getting heads.
Which of the following option is true for such a case?

A) odds will be 0
B) odds will be 0.5
C) odds will be 1
D) None of these

Solution: C

Odds are defined as the ratio of the probability of success to the probability of failure. For a fair
coin, the probability of success is 1/2 and the probability of failure is 1/2, so the odds are 1.

51. The logit function(given as l(x)) is the log of odds function. What could be the range of logit
function in the domain x=[0,1]?
A) (– ∞ , ∞)
B) (0,1)
C) (0, ∞)
D) (- ∞, 0)

Solution: A

For our purposes, the odds function has the advantage of transforming the probability function, which
has values from 0 to 1, into an equivalent function with values between 0 and ∞. When we take the
natural log of the odds function, we get a range of values from -∞ to ∞.

52. Which of the following option is true?

A) Linear regression error values have to be normally distributed, but in logistic regression this is
not the case
B) Logistic regression error values have to be normally distributed, but in linear regression this is
not the case
C) Both linear regression and logistic regression error values have to be normally distributed
D) Neither linear regression nor logistic regression error values have to be normally distributed

Solution: A

Normally distributed errors are an assumption of linear regression (used for its standard inference); logistic regression models a binary outcome through a Bernoulli likelihood, so it makes no such assumption.

53. Which of the following is true regarding the logistic function for any value “x”?

Note:
Logistic(x): is a logistic function of any number “x”

Logit(x): is a logit function of any number “x”

Logit_inv(x): is an inverse logit function of any number “x”

A) Logistic(x) = Logit(x)
B) Logistic(x) = Logit_inv(x)
C) Logit_inv(x) = Logit(x)
D) None of these

Solution: B
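The definitions used in questions 50, 51 and 53 can be checked with a few lines of plain Python (the function names are chosen for illustration): the odds of a fair coin are 1, the logit grows without bound as p approaches 0 or 1, and applying logit after logistic returns the original number, i.e. the logistic function is the inverse logit.

```python
import math

def odds(p):
    return p / (1 - p)              # P(success) / P(failure)

def logit(p):
    return math.log(odds(p))        # log of odds: maps (0, 1) onto (-inf, inf)

def logistic(x):
    return 1 / (1 + math.exp(-x))   # maps (-inf, inf) onto (0, 1)

print(odds(0.5))                    # fair coin: 1.0
print(logit(0.5))                   # 0.0
print(logit(1e-9), logit(1 - 1e-9)) # about -20.7 and +20.7: unbounded as p -> 0 or 1
for x in (-5.0, 0.0, 3.0):
    print(x, logit(logistic(x)))    # logit undoes logistic, so Logistic = Logit_inv
```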

54. Suppose you have been given two scatter plots, “a” and “b”, for two classes (blue for the positive and red for the
negative class). In scatter plot “a”, you correctly classified all data points using logistic regression (the black
line is the decision boundary).

How will the bias change on using high (infinite) regularisation?
A) Bias will be high
B) Bias will be low
C) Can’t say
D) None of these

Solution: A

The model will become very simple, so the bias will be very high.

55. Suppose, You applied a Logistic Regression model on a given data and got a training accuracy X
and testing accuracy Y. Now, you want to add a few new features in the same data. Select the
option(s) which is/are correct in such a case.

Note: Consider remaining parameters are same.

A) Training accuracy increases


B) Training accuracy increases or remains the same
C) Testing accuracy decreases
D) Testing accuracy increases or remains the same

Solution: A and D

Adding more features gives the model more information with which to fit the training data, so training
accuracy increases. Testing accuracy, however, increases only if the added features are significant.

56. Choose which of the following options is true regarding One-Vs-All method in Logistic Regression.

A) We need to fit n models in n-class classification problem


B) We need to fit n-1 models to classify into n classes
C) We need to fit only 1 model to classify into n classes
D) None of these
Solution: A

If there are n classes, then n separate logistic regression models have to be fitted, where each model
predicts the probability of one category against the rest of the categories combined.
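A minimal One-vs-All sketch, assuming NumPy, plain gradient-descent logistic regression and a hypothetical 3-class dataset (the helper names are illustrative, not from a library): one binary model is fitted per class, and prediction picks the class whose model gives the highest probability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary_logreg(X, y, lr=0.1, n_iters=3000):
    """Gradient-descent logistic regression; X is assumed to contain a bias column."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def fit_one_vs_all(X, labels, n_classes):
    # One binary model per class: class k (label 1) against all other classes (label 0)
    return np.array([fit_binary_logreg(X, (labels == k).astype(float))
                     for k in range(n_classes)])

def predict_one_vs_all(W, X):
    # Choose the class whose model gives the highest estimated probability
    return np.argmax(sigmoid(X @ W.T), axis=1)

rng = np.random.default_rng(5)
centers = np.array([[0.0, 3.0], [3.0, -2.0], [-3.0, -2.0]])   # hypothetical class centres
labels = np.repeat(np.arange(3), 50)
points = centers[labels] + rng.normal(size=(150, 2))
X = np.column_stack([np.ones(150), points])                  # bias column + 2 features

W = fit_one_vs_all(X, labels, n_classes=3)
accuracy = np.mean(predict_one_vs_all(W, X) == labels)
print(W.shape, accuracy)                                     # 3 models for 3 classes
```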

57. Below are two different logistic models with different values for β0 and β1.

Which of the following statement(s) is true about the β0 and β1 values of the two logistic models (Green, Black)?

Note: consider Y = β0 + β1*X. Here, β0 is intercept and β1 is coefficient.

A) β1 for Green is greater than Black


B) β1 for Green is lower than Black
C) β1 for both models is same
D) Can’t Say

Solution: B

The black model (X1) has β0 = 0, β1 = 1, while the green model (X4) has β0 = 0, β1 = −1, so β1 for Green is lower than for Black.

Context 58-60

Below are three scatter plots (A, B, C from left to right) with hand-drawn decision boundaries for logistic
regression.
58. Which of the figures above shows that the decision boundary is overfitting the training
data?

A) A
B) B
C) C
D) None of these

Solution: C

In figure C, the decision boundary is not smooth, which means it is overfitting the data.

59. What do you conclude after seeing this visualization?

1. The training error in the first plot is the largest compared to the second and third plots.

2. The best model for this regression problem is the last (third) plot because it has minimum
training error (zero).

3. The second model is more robust than first and third because it will perform best on unseen
data.

4. The third model is overfitting more compared to the first and second.

5. All will perform same because we have not seen the testing data.

A) 1 and 3
B) 1 and 3
C) 1, 3 and 4
D) 5

Solution: C

The trend in the graphs looks like a quadratic trend over the independent variable X. A higher-degree
polynomial (right graph) might have a very high accuracy on the training population but is expected to fail
badly on the test dataset. In the left graph, the training error is the largest because the model underfits the
training data.

60. Suppose, above decision boundaries were generated for the different value of regularization.
Which of the above decision boundary shows the maximum regularization?

A) A
B) B
C) C
D) All have equal regularization

Solution: A

More regularization means a larger penalty and therefore a less complex decision boundary, which is what
figure A shows.

61. Suppose you are using a Logistic Regression model on a huge dataset. One of the problems you may face
on such huge data is that logistic regression will take a very long time to train.

What would you do if you want to train logistic regression on the same data so that it takes less time while
giving comparatively similar (though not necessarily identical) accuracy?

A) Decrease the learning rate and decrease the number of iteration


B) Decrease the learning rate and increase the number of iteration
C) Increase the learning rate and increase the number of iteration
D) Increase the learning rate and decrease the number of iteration

Solution: D

If you decrease the number of iterations, training will surely take less time but will not reach the same
accuracy; to get similar (though not identical) accuracy, you need to increase the learning rate.
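A minimal NumPy sketch of this trade-off on a hypothetical dataset: cutting the iterations from 1000 to 100 at a learning rate of 0.1 leaves the log loss noticeably higher, while raising the learning rate to 1.0 with 100 iterations brings it back to about the baseline. The specific rates are illustrative; a rate that is too large can make gradient descent diverge.

```python
import numpy as np

def train_logreg(X, y, lr, n_iters):
    """Batch gradient descent for logistic regression; returns the final mean log loss."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    z = X @ w
    # Mean log loss written as log(1 + e^z) - y*z, using logaddexp for stability
    return np.mean(np.logaddexp(0.0, z) - y * z)

rng = np.random.default_rng(6)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # bias column + 2 features
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([0.5, 1.0, -1.0]))))
y = (rng.random(n) < p_true).astype(float)                    # hypothetical labels

loss_baseline = train_logreg(X, y, lr=0.1, n_iters=1000)   # small rate, many iterations
loss_fewer = train_logreg(X, y, lr=0.1, n_iters=100)       # fewer iterations only
loss_faster = train_logreg(X, y, lr=1.0, n_iters=100)      # fewer iterations, larger rate
print(f"{loss_baseline:.4f} {loss_fewer:.4f} {loss_faster:.4f}")
```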

62. Following is the loss function in logistic regression (Y-axis: loss function, X-axis: log probability) for a
two-class classification problem.

Note: Y is the target class

Which of the following images shows the cost function for y = 1?


A) A
B) B
C) Both
D) None of these

Solution: A

A is the true answer, as the loss decreases as the log probability increases.

63. Suppose the following graph is a cost function for logistic regression.

Now, how many local minima are present in the graph?

A) 1
B) 2
C) 3
D) 4

Solution: C
There are three local minima present in the graph

64. Can a Logistic Regression classifier do a perfect classification on the data below?

Note: You can use only the X1 and X2 variables, where X1 and X2 can take only two binary values (0, 1).

A) TRUE
B) FALSE
C) Can’t say
D) None of these

Solution: B

No, logistic regression only forms a linear decision surface, but the examples in the figure are not linearly
separable.
UNIT IV

1. SVMs are less effective when:

A) The data is linearly separable


B) The data is clean and ready to use
C) The data is noisy and contains overlapping points

Ans Solution: C

When the data has noise and overlapping points, there is a problem in drawing a clear hyperplane
without misclassifying.

2. The cost parameter in the SVM means:

A) The number of cross-validations to be made


B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above

Ans Solution: C

The cost parameter decides how much an SVM should be allowed to “bend” with the data. For a low
cost, you aim for a smooth decision surface and for a higher cost, you aim to classify more points
correctly. It is also simply referred to as the cost of misclassification.
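To make the trade-off concrete, here is a small NumPy sketch of the primal soft-margin objective 0.5·||w||² + C·Σ hinge losses, evaluated for two hand-picked boundaries on a hypothetical 1-D dataset. It is not an SVM solver; it only shows which candidate the objective prefers. With a low C the smooth, wide-margin boundary wins despite one margin violation; with a high C the tighter boundary that classifies every point correctly wins.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin SVM objective 0.5*||w||^2 + C * sum(hinge); y in {-1, +1}."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)   # zero for points beyond the margin on the correct side
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Hypothetical 1-D data: the negative point at x = 0.2 sits close to the positives
X = np.array([[-2.0], [-1.0], [0.2], [1.0], [2.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0])

smooth = (np.array([1.0]), 0.0)     # wide margin, misclassifies the point at x = 0.2
tight = (np.array([2.5]), -1.5)     # separates every point with margin >= 1

preferred_by_C = {}
for C in (0.1, 100.0):
    obj_smooth = soft_margin_objective(*smooth, X, y, C)
    obj_tight = soft_margin_objective(*tight, X, y, C)
    preferred_by_C[C] = "smooth" if obj_smooth < obj_tight else "tight"
    print(f"C={C}: smooth={obj_smooth:.3f}, tight={obj_tight:.3f}, preferred={preferred_by_C[C]}")
```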

3. Which of the following are real world applications of the SVM?

A) Text and Hypertext Categorization


B) Image Classification
C) Clustering of News Articles
D) All of the above

Ans Solution: D

SVMs are highly versatile models that can be used for practically all real-world problems, ranging from
regression to clustering and handwriting recognition.

4. Which of the following is true about Naive Bayes?

A) Assumes that all the features in a dataset are equally important
B) Assumes that all the features in a dataset are independent
C) Both A and B
D) None of the above options

Ans Solution: C

5. What do you mean by generalization error in terms of the SVM?

A) How far the hyperplane is from the support vectors


B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM

Ans Solution: B

Generalisation error in statistics is generally the out-of-sample error which is the measure of how
accurately a model can predict values for previously unseen data.

6. SVMs are less effective when:

A) The data is linearly separable


B) The data is clean and ready to use
C) The data is noisy and contains overlapping points

Ans Solution: C

When the data has noise and overlapping points, there is a problem in drawing a clear hyperplane
without misclassifying.

7. What is/are true about kernels in SVM?

1. Kernel functions map low-dimensional data to a high-dimensional space


2. It’s a similarity function

A) 1
B) 2
C) 1 and 2
D) None of these

Ans Solution: C

Both the given statements are correct.
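A small NumPy check of both statements for one specific kernel, the homogeneous quadratic kernel k(a, b) = (a · b)² on 2-D inputs: it equals an ordinary dot product after the explicit 3-D mapping φ(x) = (x1², √2·x1·x2, x2²), and as an inner product it acts as a similarity score. Other kernels (such as RBF) correspond to different, possibly infinite-dimensional, feature maps.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    """Homogeneous quadratic kernel k(a, b) = (a . b)^2, computed in the original 2-D space."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])
k_input = poly2_kernel(a, b)           # no explicit mapping needed
k_feature = np.dot(phi(a), phi(b))     # same quantity as an inner product in 3-D
print(k_input, k_feature)              # equal up to floating-point rounding
```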

Question Context 8-9:

Suppose you are using a linear SVM classifier for a 2-class classification problem. Now you have been
given the following data, in which some points are circled in red; these represent the support vectors.
8. If you remove any one of the red points from the data, will the decision boundary
change?

A) Yes
B) No

Solution: A

These three examples are positioned such that removing any one of them introduces slack in the
constraints. So the decision boundary would completely change.

9. [True or False] If you remove the non-red circled points from the data, will the decision boundary
change?

A) True
B) False

Solution: B

On the other hand, the rest of the points in the data won’t affect the decision boundary much.

10. What do you mean by generalization error in terms of the SVM?

A) How far the hyperplane is from the support vectors


B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM

Solution: B

Generalization error in statistics is generally the out-of-sample error which is the measure of how
accurately a model can predict values for previously unseen data.
11. When the C parameter is set to infinite, which of the following holds true?

A) The optimal hyperplane, if it exists, will be the one that completely separates the data
B) The soft-margin classifier will separate the data
C) None of the above

Solution: A

At such a high misclassification penalty, a soft margin cannot hold, as there is no
room for error.

12. What do you mean by a hard margin?

A) The SVM allows very low error in classification


B) The SVM allows high amount of error in classification
C) None of the above

Solution: A

A hard margin means that an SVM is very rigid in classification and tries to work extremely well in the
training set, causing overfitting.

13. The minimum time complexity for training an SVM is O(n²). Given this fact, which dataset sizes
are not best suited for SVMs?

A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter

Solution: A

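Since training time grows at least quadratically with the number of samples n, SVMs become slow on large datasets; they function best on smaller datasets with a clear classification boundary.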

14. The effectiveness of an SVM depends upon:

A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above

Solution: D

The SVM effectiveness depends upon how you choose the basic 3 requirements mentioned above in
such a way that it maximises your efficiency, reduces error and overfitting.

15. Support vectors are the data points that lie closest to the decision surface.
A) TRUE
B) FALSE

Solution: A

They are the points closest to the hyperplane and the hardest ones to classify. They also have a direct
bearing on the location of the decision surface.

16. SVMs are less effective when:

A) The data is linearly separable


B) The data is clean and ready to use
C) The data is noisy and contains overlapping points

Solution: C

When the data has noise and overlapping points, there is a problem in drawing a clear hyperplane
without misclassifying.

17. Suppose you are using an RBF kernel in SVM with a high gamma value. What does this signify?

A) The model would consider even far away points from hyperplane for modeling
B) The model would consider only the points close to the hyperplane for modeling
C) The model would not be affected by distance of points from hyperplane for modeling
D) None of the above

Solution: B

The gamma parameter in SVM tuning signifies the influence of points either near or far away from the
hyperplane.

For a low gamma, the model will be too constrained and include all points of the training dataset,
without really capturing the shape.

For a higher gamma, the model will capture the shape of the dataset well.
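Inside the RBF kernel k(a, b) = exp(−γ·||a − b||²), gamma controls how quickly the similarity between two points decays with distance. A small NumPy check with two illustrative points: at γ = 0.1 a point 3 units away still has a similarity of about 0.4, while at γ = 10 its similarity is practically zero, so only very close points have any influence.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF (Gaussian) kernel exp(-gamma * ||a - b||^2): 1 for identical points, -> 0 with distance."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

x = np.array([0.0, 0.0])
near = np.array([0.5, 0.0])   # squared distance 0.25
far = np.array([3.0, 0.0])    # squared distance 9.0

similarity = {}
for gamma in (0.1, 10.0):
    similarity[gamma] = (rbf_kernel(x, near, gamma), rbf_kernel(x, far, gamma))
    print(f"gamma={gamma}: near={similarity[gamma][0]:.3g}, far={similarity[gamma][1]:.3g}")
```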

18. The cost parameter in the SVM means:

A) The number of cross-validations to be made
B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above

Solution: C
The cost parameter decides how much an SVM should be allowed to “bend” with the data. For a low
cost, you aim for a smooth decision surface and for a higher cost, you aim to classify more points
correctly. It is also simply referred to as the cost of misclassification.
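This minimal sketch makes the trade-off concrete. It evaluates the standard primal soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses), for two hand-picked linear boundaries on a made-up toy dataset. It does not train an SVM; it only shows which boundary the objective prefers at a small C and at a large C. The helper soft_margin_objective and the candidate boundaries are hypothetical.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Hypothetical helper: 0.5*||w||^2 + C * sum of hinge losses (primal soft-margin SVM)."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)   # slack each point needs
    return 0.5 * float(w @ w) + C * float(hinge.sum())

# Toy data: only the first coordinate matters; x=0.5 is a "noisy" negative point.
X = np.array([[-2.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.5, 0.0]])
y = np.array([-1, -1, 1, 1, -1])

wide = (np.array([0.5, 0.0]), 0.0)     # smooth boundary at x=0, misclassifies x=0.5
narrow = (np.array([4.0, 0.0]), -3.0)  # boundary at x=0.75, classifies every point

for C in (0.01, 100.0):
    scores = {name: soft_margin_objective(w, b, X, y, C)
              for name, (w, b) in {"wide": wide, "narrow": narrow}.items()}
    # Low C prefers the simple, wide-margin boundary; high C prefers zero training errors.
    print(C, min(scores, key=scores.get), scores)
```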

19. Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. You want to build an SVM with a quadratic kernel (polynomial of degree 2) that uses the penalty parameter C, which weights the slack variables, as one of its hyperparameters. Answer the following question based on this setup.

What would happen if you used a very large value of C (C → infinity)?

Note: With a small C, the model was also classifying all data points correctly.

A) We can still classify the data correctly for the given setting of hyperparameter C
B) We cannot classify the data correctly for the given setting of hyperparameter C
C) Can't say
D) None of these

Solution: A

For large values of C, the penalty for misclassifying points is very high, so the decision boundary will
perfectly separate the data if possible.

20. What would happen if you used a very small C (C ≈ 0)?

A) Misclassification would happen
B) Data will be correctly classified
C) Can’t say
D) None of these

Solution: A

Because the penalty for misclassification is so low, the classifier maximizes the margin for most of the points and allows a few points to be misclassified.

21. If I use all the features of my dataset and achieve 100% accuracy on the training set but only ~70% on the validation set, what should I look out for?

A) Underfitting
B) Nothing, the model is perfect
C) Overfitting

Solution: C

A model that reaches 100% training accuracy but only ~70% validation accuracy fits the training data much better than it generalizes. This is the classic sign of overfitting.

22. Which of the following are real-world applications of SVMs?

A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above

Solution: D

SVMs are versatile models that have been applied to many real-world problems. Examples include text categorization, image classification, regression (support vector regression), clustering (support vector clustering) and handwriting recognition.

Question Context: 23 – 25

Suppose you have trained an SVM with a linear decision boundary. After training, you correctly infer that the model is underfitting.

23. Which of the following options would you most likely consider the next time you iterate on the SVM?

A) You want to increase your data points
B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features

Solution: C

The best option here would be to create more features for the model.

24. Suppose you gave the correct answer in the previous question. What do you think is actually happening?

1. We are lowering the bias
2. We are lowering the variance
3. We are increasing the bias
4. We are increasing the variance

A) 1 and 2
B) 2 and 3
C) 1 and 4
D) 2 and 4

Solution: C

Adding features makes the model more complex. This lowers the bias and increases the variance.

25. In the previous question, suppose you want to change one of the SVM's hyperparameters to get the same effect, i.e. so that the model no longer underfits. What would you do?

A) We will increase the parameter C
B) We will decrease the parameter C
C) Changing C has no effect
D) None of these

Solution: A

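Increasing C is the right choice here. A larger C weakens the regularization, because misclassified training points are penalized more heavily. The model then fits the training data more closely and underfits less.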

26. We usually use feature normalization before using the Gaussian kernel in an SVM. What is true about feature normalization?

1. We do feature normalization so that no feature will dominate the others
2. Sometimes, feature normalization is not feasible in the case of categorical variables
3. Feature normalization always helps when we use the Gaussian kernel in an SVM

A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3

Solution: B

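Statements 1 and 2 are correct. Statement 3 is false: normalization usually helps with a Gaussian kernel, but it is not guaranteed to help in every case.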
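The Gaussian (RBF) kernel depends only on Euclidean distances, so a feature measured on a large scale can swamp the others. This minimal sketch uses hypothetical data and units to show that effect and how StandardScaler removes it. StandardScaler, make_pipeline and SVC are standard scikit-learn APIs. The helper squared_distance_share is hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: feature 0 is on a scale of ~1, feature 1 on a scale of ~1000.
X = np.array([[1.0, 1000.0], [2.0, 1100.0], [3.0, 3000.0], [4.0, 3100.0]])
y = np.array([0, 0, 1, 1])

def squared_distance_share(a, b):
    """Hypothetical helper: fraction of the squared Euclidean distance from each feature."""
    d2 = (a - b) ** 2
    return d2 / d2.sum()

raw_share = squared_distance_share(X[0], X[1])   # feature 1 dominates the distance
X_scaled = StandardScaler().fit_transform(X)     # zero mean, unit variance per column

# In practice, chain the scaler and the RBF SVM so that the scaling
# learned on the training data is reused at prediction time.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(raw_share, X_scaled.std(axis=0), model.predict(X))
```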

Question Context: 27-29

Suppose you are dealing with a 4-class classification problem and want to train an SVM model on the data using the one-vs-all method. Answer the questions below.

27. How many times do we need to train the SVM model in this case?

A) 1
B) 2
C) 3
D) 4

Solution: D

With one-vs-all, one binary SVM is trained per class (that class versus all the others), so a 4-class problem needs 4 trainings.
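This minimal sketch counts the trained models on synthetic data. scikit-learn's SVC handles multiclass problems with a one-vs-one scheme internally, so the sketch wraps it in OneVsRestClassifier to get the one-vs-all setup described in the question. The data is generated only for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic 4-class data (illustrative only)
X, y = make_blobs(n_samples=200, centers=4, random_state=0)

# One-vs-rest (one-vs-all): one binary SVM per class, "class k" vs "all other classes"
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
print(len(ovr.estimators_))   # one fitted binary SVM per class
```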
28. Suppose the classes are equally distributed in the data, and training the SVM once in the one-vs-all setting takes 10 seconds. How many seconds would it take to train the one-vs-all method end to end?

A) 20
B) 40
C) 60
D) 80

Solution: B

It would take 10×4 = 40 seconds

29. Suppose the problem has changed and the data now has only 2 classes. How many times do we need to train the SVM in this case?

A) 1
B) 2
C) 3
D) 4

Solution: A

With only two classes, a single binary SVM separates them, so training once is enough.

Question context: 30 –31

Suppose you are using an SVM with a polynomial kernel of degree 2. You apply it to the data and find that it fits the data perfectly, i.e. training and testing accuracy are both 100%.

30. Now suppose you increase the complexity (the degree of the polynomial kernel). What do you think will happen?

A) Increasing the complexity will overfit the data
B) Increasing the complexity will underfit the data
C) Nothing will happen since your model was already 100% accurate
D) None of these

Solution: A

Increasing the complexity of the model (the degree of the kernel) would make the algorithm overfit the data.

31. In the previous question, after increasing the complexity, you found that training accuracy was still 100%. What is the reason behind that?

1. Since the data is fixed and we are fitting more polynomial terms or parameters, the algorithm starts memorizing everything in the data
2. Since the data is fixed, the SVM doesn't need to search in a big hypothesis space

A) 1
B) 2
C) 1 and 2
D) None of these

Solution: C

Both the given statements are correct.

32. What is/are true about kernel in SVM?

1. A kernel function maps low-dimensional data to a high-dimensional space
2. It's a similarity function

A) 1
B) 2
C) 1 and 2
D) None of these

Solution: C

Both the given statements are correct.
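Both properties show up in a small worked example. This sketch assumes the homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2 on 2-D inputs, and uses a hypothetical explicit feature map phi into 3 dimensions. The kernel value, which acts as a similarity score, equals the dot product in the higher-dimensional space, so the SVM never has to compute phi itself.

```python
import numpy as np

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Hypothetical explicit feature map from 2-D to 3-D that matches (x . z)^2."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Same number two ways: kernel in the original space vs dot product in the mapped space
print(poly2_kernel(x, z), phi(x) @ phi(z))
```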

UNIT V

1. Which of the following is a widely used and effective machine learning algorithm based on the
idea of bagging?

a) Decision Tree
b) Regression
c) Classification
d) Random Forest

Ans D

2. Which of the following is a disadvantage of decision trees?

a) Factor analysis
b) Decision trees are robust to outliers
c) Decision trees are prone to be overfit
d) None of the above

Ans C

3. Can decision trees be used for performing clustering?

a. True
b. False

Ans Solution: (A)

Decision trees can also be used to form clusters in the data, but clustering often generates natural clusters and does not depend on any objective function.

4. Which of the following algorithm is most sensitive to outliers?

a. K-means clustering algorithm
b. K-medians clustering algorithm
c. K-modes clustering algorithm
d. K-medoids clustering algorithm

Ans Solution: (A)
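K-means moves each centroid to the mean of its assigned points. K-medians uses the median, and K-medoids uses an actual data point. This minimal sketch uses a hypothetical 1-D cluster to show why the mean, and therefore K-means, reacts much more strongly to a single outlier.

```python
import numpy as np

# A hypothetical 1-D cluster, then the same cluster with one extreme outlier added.
cluster = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = np.append(cluster, 100.0)

# K-means would place the centroid at the mean; K-medians at the median.
mean_shift = with_outlier.mean() - cluster.mean()
median_shift = np.median(with_outlier) - np.median(cluster)
print(f"mean moves by {mean_shift:.2f}, median moves by {median_shift:.2f}")
```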

5. Sentiment analysis is an example of:

1. Regression
2. Classification
3. Clustering
4. Reinforcement Learning

Options:

a. 1 Only
b. 1 and 2
c. 1 and 3
d. 1, 2 and 4

Ans D

6. Which of the following is the most appropriate strategy for data cleaning before performing clustering analysis, given fewer data points than desired?

1. Capping and flooring of variables
2. Removal of outliers

Options:
a. 1 only
b. 2 only
c. 1 and 2
d. None of the above

Ans A

7 Which of the following is/are true about bagging trees?

1. In bagging trees, individual trees are independent of each other
2. Bagging is the method for improving the performance by aggregating the results of weak learners

A) 1
B) 2
C) 1 and 2
D) None of these

Ans Solution: C

Both statements are true. In bagging, the individual trees are independent of each other because each one is trained on a different bootstrap sample of the data. Random forests also give each tree a different subset of features.

8. Which of the following is/are true about boosting trees?

1. In boosting trees, individual weak learners are independent of each other
2. It is the method for improving the performance by aggregating the results of weak learners

A) 1
B) 2
C) 1 and 2
D) None of these

Ans Solution: B

In boosting, the individual weak learners are not independent of each other, because each tree corrects the errors of the previous trees. Both bagging and boosting can be seen as ways of improving on the base learners' results.

9. In a random forest, you can generate hundreds of trees (say T1, T2, …, Tn) and then aggregate their results. Which of the following is true about an individual tree (Tk) in a random forest?
1. Individual tree is built on a subset of the features

2. Individual tree is built on all the features

3. Individual tree is built on a subset of observations

4. Individual tree is built on full set of observations

A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4

Ans Solution: A

Random forest is based on the bagging concept. It uses a fraction of the samples and a fraction of the features to build each individual tree.

10. Suppose you are using a bagging based algorithm say a RandomForest in model building.
Which of the following can be true?

1. The number of trees should be as large as possible

2. You will have interpretability after using Random Forest

A) 1
B) 2
C) 1 and 2
D) None of these

Ans Solution: A

Since Random Forest aggregates the results of different weak learners, we would want as many trees as is practical when building the model. Random Forest is a black-box model, so you lose interpretability when you use it.

11. Which of the following is/are true about Random Forest and Gradient Boosting ensemble
methods?

1. Both methods can be used for classification task

2. Random Forest is used for classification whereas Gradient Boosting is used for regression tasks

3. Random Forest is used for regression whereas Gradient Boosting is used for classification tasks

4. Both methods can be used for regression task


A) 1
B) 2
C) 3
D) 4
E) 1 and 4

Solution: E

Both algorithms are designed for classification as well as regression tasks.

12. In a random forest, you can generate hundreds of trees (say T1, T2, …, Tn) and then aggregate their results. Which of the following is true about an individual tree (Tk) in a random forest?

1. Individual tree is built on a subset of the features

2. Individual tree is built on all the features

3. Individual tree is built on a subset of observations

4. Individual tree is built on full set of observations

A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4

Solution: A

Random forest is based on the bagging concept. It uses a fraction of the samples and a fraction of the features to build each individual tree.

13. Which of the following algorithms do not use a learning rate as one of their hyperparameters?

1. Gradient Boosting

2. Extra Trees

3. AdaBoost

4. Random Forest

A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4

Solution: D
Random Forest and Extra Trees don’t have learning rate as a hyperparameter.
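This can be checked directly in scikit-learn, assuming its ensemble classes stand in for the four algorithms. get_params() lists each estimator's constructor hyperparameters, and only the two boosting methods expose learning_rate.

```python
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)

models = [GradientBoostingClassifier(), ExtraTreesClassifier(),
          AdaBoostClassifier(), RandomForestClassifier()]

# get_params() returns the constructor hyperparameters of each estimator
has_learning_rate = {type(m).__name__: "learning_rate" in m.get_params() for m in models}
print(has_learning_rate)
```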

14. Which of the following is not an example of an ensemble learning algorithm?

A) Random Forest
B) Adaboost
C) Extra Trees
D) Gradient Boosting
E) Decision Trees

Solution: E

A single decision tree does not aggregate the results of multiple models, so it is not an ensemble algorithm.

15. Suppose you are using a bagging based algorithm say a RandomForest in model building. Which of
the following can be true?

1. The number of trees should be as large as possible

2. You will have interpretability after using Random Forest

A) 1
B) 2
C) 1 and 2
D) None of these

Solution: A

Since Random Forest aggregates the results of different weak learners, we would want as many trees as is practical when building the model. Random Forest is a black-box model, so you lose interpretability when you use it.

16. True or False: Bagging is suitable for high-variance, low-bias models.

A) TRUE
B) FALSE

Solution: A

Bagging is suitable for high-variance, low-bias models, i.e. complex models, because averaging many such models reduces the variance.

17. When applying bagging to regression trees, which of the following is/are true?

1. We build N regression trees from N bootstrap samples

2. We take the average of the N regression trees

3. Each tree has a high variance with low bias


A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1,2 and 3

Solution: D

All three statements are correct.
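The three statements map directly onto code. This minimal sketch bags regression trees by hand on made-up noisy sine data. It builds N fully grown DecisionTreeRegressor models on N bootstrap samples and averages their predictions. scikit-learn's BaggingRegressor automates the same idea. The values of N, the data and the test grid are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)   # made-up noisy data

N = 30  # number of bootstrap samples / trees
trees = []
for _ in range(N):
    idx = rng.integers(0, len(X), size=len(X))   # statement 1: bootstrap sample (rows with replacement)
    # statement 3: a fully grown tree has low bias but high variance
    trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

X_test = np.linspace(0, 6, 50).reshape(-1, 1)
per_tree = np.array([tree.predict(X_test) for tree in trees])   # shape (N, 50)
bagged = per_tree.mean(axis=0)                                   # statement 2: average the N trees
print(per_tree.std(axis=0).mean(), bagged[:5])
```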

18. How do you select the best hyperparameters in tree-based models?

A) Measure performance over training data
B) Measure performance over validation data
C) Both of these
D) None of these

Solution: B

Hyperparameters should be compared on validation data. Training performance rewards overfitting, and the test set is kept for the final performance estimate.

19. In which of the following scenarios is gain ratio preferred over information gain?

A) When a categorical variable has a very large number of categories
B) When a categorical variable has a very small number of categories
C) The number of categories is not the reason
D) None of these

Solution: A

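For high-cardinality variables, gain ratio is preferred over information gain. Information gain is biased toward attributes with many distinct values, and gain ratio corrects for this by dividing by the split information.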
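This minimal sketch shows the bias on a made-up 8-row dataset. It compares a row-ID attribute (one category per row) with an informative binary attribute. It uses the textbook definitions: gain ratio = information gain / split information, where split information is the entropy of the attribute's own values. Information gain prefers the useless ID attribute; gain ratio prefers the binary one. The helper names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Hypothetical helper: Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def information_gain(attribute, labels):
    """Hypothetical helper: label entropy minus the weighted entropy after splitting on the attribute."""
    n = len(labels)
    conditional = 0.0
    for value in set(attribute):
        subset = [lab for a, lab in zip(attribute, labels) if a == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def gain_ratio(attribute, labels):
    """Hypothetical helper: information gain divided by the split information."""
    split_info = entropy(attribute)
    return information_gain(attribute, labels) / split_info if split_info > 0 else 0.0

labels = [1, 1, 1, 1, 0, 0, 0, 0]
row_id = list(range(8))                             # high cardinality: one category per row
binary = ["a", "a", "a", "a", "a", "b", "b", "b"]   # two categories, mostly aligned with the label

for name, attr in (("row_id", row_id), ("binary", binary)):
    print(name, round(information_gain(attr, labels), 3), round(gain_ratio(attr, labels), 3))
```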

20. Suppose you are given the following training and validation errors for Gradient Boosting at different tree depths. Which hyperparameter setting would you choose?

Scenario   Depth   Training Error   Validation Error
1          2       100              110
2          4       90               105
3          6       50               100
4          8       45               105
5          10      30               150

A) 1
B) 2
C) 3
D) 4

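Solution: C

Scenario 3 (depth 6) has the lowest validation error (100), so it should generalize best. Scenarios 2 and 4 tie at 105. Between those two, the shallower depth (scenario 2) would be preferred, but both are worse than scenario 3.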

21. Which of the following is/are not true about DBSCAN clustering algorithm:

1. For data points to be in a cluster, they must be in a distance threshold to a core point

2. It has strong assumptions for the distribution of data points in dataspace

3. It has a substantially high time complexity of order O(n³)

4. It does not require prior knowledge of the no. of desired clusters

5. It is robust to outliers

Options:

A. 1 only

B. 2 only

C. 4 only

D. 2 and 3

Solution: D

• DBSCAN can form clusters of arbitrary shape and does not make strong assumptions about the distribution of data points in the data space.

• DBSCAN's average time complexity is about O(n log n) when a spatial index is used (O(n²) in the worst case), far below O(n³).
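This minimal sketch uses scikit-learn's DBSCAN on made-up points to illustrate statements 1, 4 and 5. The number of clusters is never specified; eps (the distance threshold) and min_samples define what counts as dense. The isolated point is labelled -1 (noise) instead of distorting a cluster. The eps and min_samples values are chosen only for this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, 1.0], [1.0, 1.1],   # dense group 1
              [5.0, 5.0], [5.1, 5.0], [4.9, 5.0], [5.0, 5.1],   # dense group 2
              [20.0, 20.0]])                                      # isolated point

# No number of clusters is given: eps and min_samples define density.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(labels)   # noise points are labelled -1
```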

22. Point out the correct statement.

a) The choice of an appropriate metric will influence the shape of the clusters
b) Hierarchical clustering is also called HCA
c) In general, the merges and splits are determined in a greedy manner
d) All of the mentioned
Answer: d
Explanation: Some elements may be close to one another according to one distance and farther away
according to another.

23. Which of the following is required by K-means clustering?

a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned

Answer: d
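Explanation: K-means follows a partitioning approach. It needs a distance metric to assign points to clusters, the number of clusters k, and initial centroids from which to start iterating.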
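All three requirements appear as explicit inputs in a bare-bones version of the algorithm. This minimal sketch implements plain Lloyd's iterations in NumPy with Euclidean distance. k is implied by the number of initial centroids you pass in. The helper name kmeans and the toy data are hypothetical. Library implementations such as scikit-learn's KMeans add smarter initialization and convergence checks.

```python
import numpy as np

def kmeans(X, init_centroids, n_iter=10):
    """Hypothetical helper: plain Lloyd's algorithm.

    Inputs K-means needs: a distance metric (Euclidean here), the number of
    clusters (len(init_centroids)), and the initial centroids themselves.
    """
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(X, init_centroids=[[0.0, 0.0], [10.0, 10.0]])
print(labels, centroids)
```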

24. Point out the wrong statement.

a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned

Answer: c
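Explanation: k-nearest neighbours is a supervised method for classification and regression, whereas k-means is an unsupervised clustering method. They are different algorithms.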

25. Which of the following functions is used for k-means clustering?

a) k-means
b) k-mean
c) heat map
d) none of the mentioned

Answer: a
Explanation: K-means requires a number of clusters.

26. K-means is not deterministic, and it also consists of a number of iterations.

a) True
b) False

Answer: a
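Explanation: The initial centroids are usually chosen at random, so different runs can give different results. The algorithm iterates, refining its estimate of the cluster centroids, until the estimate stops changing.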
This sheet is for 1-mark questions. Each question lists options a to d, followed by the correct answer (a/b/c/d); some questions refer to an image.
1. In reinforcement learning, if the feedback is negative, it is defined as ____.
a) Penalty
b) Overlearning
c) Reward
d) None of the above
Answer: a

2. According to ____, it's a key success factor for the survival and evolution of all species.
a) Claude Shannon's theory
b) Gini Index
c) Darwin's theory
d) None of the above
Answer: c

3. How can you avoid overfitting?
a) By using a lot of data
b) By using inductive machine learning
c) By using validation only
d) None of the above
Answer: a

4. What are the popular algorithms of machine learning?
a) Decision Trees and Neural Networks (backpropagation)
b) Probabilistic networks and Nearest Neighbor
c) Support vector machines
d) All of the above
Answer: d

5. What is a "training set"?
a) A training set is used to test the accuracy of the hypotheses generated by the learner.
b) A set of data used to discover the potentially predictive relationship.
c) Both A & B
d) None of the above
Answer: b

6. Common deep learning applications include ____.
a) Image classification, real-time visual tracking
b) Autonomous car driving, logistic optimization
c) Bioinformatics, speech recognition
d) All of the above
Answer: d

7. What is the function of "supervised learning"?
a) Classification, predicting time series, annotating strings
b) Speech recognition, regression
c) Both A & B
d) None of the above
Answer: c

8. Common unsupervised applications include:
a) Object segmentation
b) Similarity detection
c) Automatic labeling
d) All of the above
Answer: d

9. Reinforcement learning is particularly efficient when ____.
a) the environment is not completely deterministic
b) it's often very dynamic
c) it's impossible to have a precise error measure
d) All of the above
Answer: d

10. If there is only a discrete number of possible outcomes (called categories), the process becomes a ____.
a) Regression
b) Classification
c) Model-free
d) Categories
Answer: b

11. Which of the following are supervised learning applications?
a) Spam detection, pattern detection, natural language processing
b) Image classification, real-time visual tracking
c) Autonomous car driving, logistic optimization
d) Bioinformatics, speech recognition
Answer: a

12. During the last few years, many ____ algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state.
a) Logical
b) Classical
c) Classification
d) None of the above
Answer: d

13. Which of the following sentences is correct?
a) Machine learning relates to the study, design and development of algorithms that give computers the capability to learn without being explicitly programmed.
b) Data mining can be defined as the process in which unstructured data is used to extract knowledge or unknown interesting patterns.
c) Both A & B
d) None of the above
Answer: c
14. What is "overfitting" in machine learning?
a) When a statistical model describes random error or noise instead of the underlying relationship, "overfitting" occurs.
b) Robots are programmed so that they can perform the task based on data they gather from sensors.
c) While involving the process of learning, "overfitting" occurs.
d) A set of data is used to discover the potentially predictive relationship.
Answer: a

15. What is a "test set"?
a) A test set is used to test the accuracy of the hypotheses generated by the learner.
b) It is a set of data used to discover the potentially predictive relationship.
c) Both A & B
d) None of the above
Answer: a

16. ____ is much more difficult because it's necessary to determine a supervised strategy to train a model for each feature and, finally, to predict their value.
a) Removing the whole line
b) Creating a sub-model to predict those features
c) Using an automatic strategy to input them according to the other known values
d) All of the above
Answer: b

17. How is it possible to use a different placeholder? Through the parameter ____.
a) regression
b) classification
c) random_state
d) missing_values
Answer: d

18. If you need a more powerful scaling feature, with superior control over outliers and the possibility to select a quantile range, there's also the class ____.
a) RobustScaler
b) DictVectorizer
c) LabelBinarizer
d) FeatureHasher
Answer: a

19. scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply ____ to each element of a dataset.
a) max, l0 and l1 norms
b) max, l1 and l2 norms
c) max, l2 and l3 norms
d) max, l3 and l4 norms
Answer: b
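This minimal sketch runs Normalizer on made-up data with each of the three norms it supports ("l1", "l2" and "max"). Unlike column-wise scalers, it rescales each sample (row) independently by that row's own norm.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, -1.0]])

for norm in ("l1", "l2", "max"):
    # Each row is divided by its own norm; columns are not looked at together.
    print(norm, Normalizer(norm=norm).fit_transform(X))
```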
20. There are also many univariate methods that can be used to select the best features according to specific criteria based on ____.
a) F-tests and p-values
b) chi-square
c) ANOVA
d) All of the above
Answer: a

21. Which of the following selects only a subset of features belonging to a certain percentile?
a) SelectPercentile
b) FeatureHasher
c) SelectKBest
d) All of the above
Answer: a

22. ____ performs a PCA with non-linearly separable data sets.
a) SparsePCA
b) KernelPCA
c) SVD
d) None of the mentioned
Answer: b

23. A feature F1 can take the values A, B, C, D, E and F, and represents the grade of students from a college. Which of the following statements is true in this case?
a) Feature F1 is an example of a nominal variable.
b) Feature F1 is an example of an ordinal variable.
c) It doesn't belong to any of the above categories.
d) Both of these
Answer: b

24. What would you do in PCA to get the same projection as SVD?
a) Transform data to zero mean
b) Transform data to zero median
c) Not possible
d) None of these
Answer: a
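This minimal sketch shows why centring is the key step. It takes the SVD of the mean-centred data and compares the projection with scikit-learn's PCA. The two agree up to the sign of each component, which neither method fixes uniquely. The data is randomly generated for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.2]])
X = rng.normal(size=(100, 3)) @ mixing + 5.0   # made-up data with a non-zero mean

# PCA is the SVD of the zero-mean data matrix
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
svd_projection = X_centered @ Vt[:2].T

pca_projection = PCA(n_components=2).fit_transform(X)

# Components are only defined up to sign, so compare absolute values.
print(np.allclose(np.abs(svd_projection), np.abs(pca_projection)))
```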
25. What are PCA, KPCA and ICA used for?
a) Principal Components Analysis
b) Kernel-based Principal Component Analysis
c) Independent Component Analysis
d) All of the above
Answer: d

26. Can a model trained for item-based similarity also choose from a given set of items?
a) YES
b) NO
Answer: a

27. What are common feature selection methods in a regression task?
a) Correlation coefficient
b) Greedy algorithms
c) All of the above
d) None of these
Answer: c

28. The parameter ____ allows specifying the percentage of elements to put into the test/training set.
a) test_size
b) training_size
c) All of the above
d) None of these
Answer: c

29. In many classification problems, the target ____ is made up of categorical labels which cannot immediately be processed by any algorithm.
a) random_state
b) dataset
c) test_size
d) All of the above
Answer: b

30. ____ adopts a dictionary-oriented approach, associating to each category label a progressive integer number.
a) LabelEncoder class
b) LabelBinarizer class
c) DictVectorizer
d) FeatureHasher
Answer: a

31. If a linear regression model fits the training data perfectly, i.e., the training error is zero, then ____.
a) Test error is also always zero
b) Test error is non-zero
c) Couldn't comment on test error
d) Test error is equal to train error
Answer: c

32. Which of the following metrics can be used for evaluating regression models? i) R Squared ii) Adjusted R Squared iii) F Statistics iv) RMSE / MSE / MAE
a) ii and iv
b) i and ii
c) ii, iii and iv
d) i, ii, iii and iv
Answer: d

33. How many coefficients do you need to estimate in a simple linear regression model (one independent variable)?
a) 1
b) 2
c) 3
d) 4
Answer: b

34. In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?
a) by 1
b) no change
c) by the intercept
d) by its slope
Answer: d
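This minimal sketch covers questions 33 and 34 on made-up data. A simple linear regression has exactly two fitted coefficients, and a 1-unit change in the input moves the prediction by the slope. np.polyfit with deg=1 returns the slope first, then the intercept. The helper predict is hypothetical.

```python
import numpy as np

# Made-up data that roughly follows y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# A simple linear regression has exactly two coefficients: slope and intercept.
slope, intercept = np.polyfit(x, y, deg=1)

def predict(value):
    """Hypothetical helper: prediction from the fitted line."""
    return intercept + slope * value

# Increasing the input by 1 unit changes the prediction by the slope.
print(slope, intercept, predict(4.0) - predict(3.0))
```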
35. The function used for linear regression in R is ____.
a) lm(formula, data)
b) lr(formula, data)
c) lrm(formula, data)
d) regression.linear(formula, data)
Answer: a

36. In the syntax of the linear model lm(formula, data, ...), data refers to ____.
a) Matrix
b) Vector
c) Array
d) List
Answer: b

37. In the mathematical equation of linear regression $Y = \beta_1 + \beta_2 X + \epsilon$, $(\beta_1, \beta_2)$ refers to ____.
a) (X-intercept, Slope)
b) (Slope, X-Intercept)
c) (Y-Intercept, Slope)
d) (Slope, Y-Intercept)
Answer: c

38. Linear regression is a supervised machine learning algorithm.
a) TRUE
b) FALSE
Answer: a

39. Is it possible to design a linear regression algorithm using a neural network?
a) TRUE
b) FALSE
Answer: a
40. Which of the following methods do we use to find the best-fit line for data in linear regression?
a) Least Square Error
b) Maximum Likelihood
c) Logarithmic Loss
d) Both A and B
Answer: a

41. Which of the following evaluation metrics can be used to evaluate a model while modeling a continuous output variable?
a) AUC-ROC
b) Accuracy
c) Logloss
d) Mean-Squared-Error
Answer: d

42. Which of the following is true about residuals?
a) Lower is better
b) Higher is better
c) A or B depending on the situation
d) None of these
Answer: a

43. Overfitting is more likely when you have a huge amount of data to train on.
a) TRUE
b) FALSE
Answer: b

44. Which of the following statements is true about outliers in linear regression?
a) Linear regression is sensitive to outliers
b) Linear regression is not sensitive to outliers
c) Can't say
d) None of these
Answer: a

45. Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and found that there is a relationship between them. Which of the following conclusions do you draw?
a) Since there is a relationship, our model is not good
b) Since there is a relationship, our model is good
c) Can't say
d) None of these
Answer: a

46. Naive Bayes classifiers are a collection of ____ algorithms.
a) Classification
b) Clustering
c) Regression
d) All
Answer: a

47. The Naive Bayes classifier is a ____ learning algorithm.
a) Supervised
b) Unsupervised
c) Both
d) None
Answer: a

48. The features being classified are independent of each other in a Naïve Bayes classifier.
a) False
b) TRUE
Answer: b

49. The features being classified are ____ of each other in a Naïve Bayes classifier.
a) Independent
b) Dependent
c) Partially dependent
d) None
Answer: a

50. Bayes' theorem is given by $P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$ (image: bayes.jpg), where:
1. P(H) is the probability of hypothesis H being true.
2. P(E) is the probability of the evidence (regardless of the hypothesis).
3. P(E|H) is the probability of the evidence given that the hypothesis is true.
4. P(H|E) is the probability of the hypothesis given that the evidence is there.
a) True
b) FALSE
Answer: a

51. In the given image (bayes.jpg), P(H|E) is the ____ probability.
a) Posterior
b) Prior
Answer: a

52. In the given image (bayes.jpg), P(H) is the ____ probability.
a) Posterior
b) Prior
Answer: b
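This worked example uses hypothetical numbers. With a prior P(H) = 0.01, a likelihood P(E|H) = 0.9 and a false-alarm rate P(E|not H) = 0.05, the posterior P(H|E) is only about 0.15. The small prior outweighs the strong likelihood. P(E) is obtained from the law of total probability. The function name posterior is hypothetical.

```python
def posterior(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """Hypothetical helper: Bayes' theorem, P(H|E) = P(E|H) * P(H) / P(E)."""
    # P(E) by the law of total probability over H and not-H
    evidence = likelihood_e_given_h * prior_h + likelihood_e_given_not_h * (1 - prior_h)
    return likelihood_e_given_h * prior_h / evidence

p = posterior(prior_h=0.01, likelihood_e_given_h=0.9, likelihood_e_given_not_h=0.05)
print(f"posterior P(H|E) = {p:.4f}")
```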
53. Conditional probability is a measure of the probability of an event given that another event has already occurred.
a) True
b) FALSE
Answer: a

54. Bayes' theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
a) True
b) FALSE
Answer: a

55. The Bernoulli Naïve Bayes classifier uses a ____ distribution.
a) Continuous
b) Discrete
c) Binary
Answer: c

56. The Multinomial Naïve Bayes classifier uses a ____ distribution.
a) Continuous
b) Discrete
c) Binary
Answer: b

57. The Gaussian Naïve Bayes classifier uses a ____ distribution.
a) Continuous
b) Discrete
c) Binary
Answer: a

58. The binarize parameter in scikit-learn's BernoulliNB sets the threshold for binarizing sample features.
a) True
b) FALSE
Answer: a

59. The Gaussian distribution, when plotted, gives a bell-shaped curve which is symmetric about the ____ of the feature values.
a) Mean
b) Variance
c) Discrete
d) Random
Answer: a

60. SVMs directly give us the posterior probabilities $P(y = 1 \mid x)$ and $P(y = -1 \mid x)$.
a) True
b) FALSE
Answer: b

61. Any linear combination of the components of a multivariate Gaussian is a univariate Gaussian.
a) True
b) FALSE
Answer: a

62. Solving a non-linear separation problem with a hard-margin kernelized SVM (Gaussian RBF kernel) might lead to overfitting.
a) True
b) FALSE
Answer: a

63. SVM is a ____ algorithm.
a) Classification
b) Clustering
c) Regression
d) All
Answer: a

64. SVM is a ____ learning algorithm.
a) Supervised
b) Unsupervised
c) Both
d) None
Answer: a

65. The linear SVM classifier works by drawing a straight line between two classes.
a) True
b) FALSE
Answer: a

66. Which of the following functions provides unsupervised prediction?
a) cl_forecast
b) cl_nowcast
c) cl_precast
d) None of the mentioned
Answer: d

67. Which of the following is a characteristic of the best machine learning method?
a) Fast
b) Accuracy
c) Scalable
d) All of the above
Answer: d

68. What are the different algorithm techniques in machine learning?
a) Supervised Learning and Semi-supervised Learning
b) Unsupervised Learning and Transduction
c) Both A & B
d) None of the mentioned
Answer: c

69. What is the standard approach to supervised learning?
a) Split the set of examples into the training set and the test set
b) Group the set of examples into the training set and the test set
c) A set of observed instances tries to induce a general rule
d) Learns programs from data
Answer: a
70. Which of the following is not machine learning?
a) Artificial Intelligence
b) Rule-based inference
c) Both A & B
d) None of the mentioned
Answer: b

71. What is model selection in machine learning?
a) The process of selecting models among different mathematical models which are used to describe the same data set
b) When a statistical model describes random error or noise instead of the underlying relationship
c) Finding interesting directions in data and finding novel observations / database cleaning
d) All of the above
Answer: a

72. Which are two techniques of machine learning?
a) Genetic Programming and Inductive Learning
b) Speech recognition and Regression
c) Both A & B
d) None of the mentioned
Answer: a

73. Even if there are no actual supervisors, ____ learning is also based on feedback provided by the environment.
a) Supervised
b) Reinforcement
c) Unsupervised
d) None of the above
Answer: b

74. What does learning exactly mean?
a) Robots are programmed so that they can perform the task based on data they gather from sensors.
b) A set of data is used to discover the potentially predictive relationship.
c) Learning is the ability to change according to external stimuli and remembering most of all previous experiences.
d) It is a set of data used to discover the potentially predictive relationship.
Answer: c

75. It is necessary to allow the model to develop a generalization ability and avoid a common problem called ____.
a) Overfitting
b) Overlearning
c) Classification
d) Regression
Answer: a

76. Techniques that involve the usage of both labeled and unlabeled data are called ____.
a) Supervised
b) Semi-supervised
c) Unsupervised
d) None of the above
Answer: b

77. In reinforcement learning, if the feedback is negative, it is defined as ____.
a) Penalty
b) Overlearning
c) Reward
d) None of the above
Answer: a

78. According to ____, it's a key success factor for the survival and evolution of all species.
a) Claude Shannon's theory
b) Gini Index
c) Darwin's theory
d) None of the above
Answer: c

79. A supervised scenario is characterized by the concept of a ____.
a) Programmer
b) Teacher
c) Author
d) Farmer
Answer: b

80. Overlearning is caused by an excessive ____.
a) Capacity
b) Regression
c) Reinforcement
d) Accuracy
Answer: a

81. Which of the following is an example of a deterministic algorithm?
a) PCA
b) K-Means
c) None of the above
Answer: a

82. Which of the following models includes a backwards elimination feature selection routine?
a) MCV
b) MARS
c) MCRS
d) All of the above
Answer: b

83. Can we extract knowledge without applying feature selection?
a) YES
b) NO
Answer: a

84. While using feature selection on the data, does the number of features decrease?
a) NO
b) YES
Answer: b

85. Which of the following are several models for feature extraction?
a) regression
b) classification
c) None of the above
Answer: c

86. ____ provides some built-in datasets that can be used for testing purposes.
a) scikit-learn
b) classification
c) regression
d) None of the above
Answer: a

87. While using ____, all labels are turned into sequential numbers.
a) LabelEncoder class
b) LabelBinarizer class
c) DictVectorizer
d) FeatureHasher
Answer: a

88. ____ produce sparse matrices of real numbers that can be fed into any machine learning model.
a) DictVectorizer
b) FeatureHasher
c) Both A & B
d) None of the mentioned
Answer: c

89. scikit-learn offers the class ____, which is responsible for filling the holes using a strategy based on the mean, median, or frequency.
a) LabelEncoder
b) LabelBinarizer
c) DictVectorizer
d) Imputer
Answer: d
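The Imputer class named in question 89 comes from older scikit-learn releases. Current versions provide the same functionality as SimpleImputer in sklearn.impute. This minimal sketch applies its mean, median and most_frequent strategies to made-up data. The missing_values parameter sets the placeholder treated as missing (np.nan here).

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with one missing value (np.nan) in each column
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [7.0, np.nan],
              [3.0, 4.0]])

filled = {strategy: SimpleImputer(missing_values=np.nan, strategy=strategy).fit_transform(X)
          for strategy in ("mean", "median", "most_frequent")}

for strategy, result in filled.items():
    # Show the two values that were filled in: row 1 col 0, and row 2 col 1
    print(strategy, result[1, 0], result[2, 1])
```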
90 Which of the following scale data by MinMaxScaler MaxAbsScaler Both A & B None of the Mentioned C --
removing elements that don't belong to a
given range or by considering a maximum
absolute value.
91 scikit-learn also provides a class for per- Normalizer Imputer Classifier All above A --
sample normalization,_____
92 ______dataset with many features normalized unnormalized Both A & B None of the Mentioned B --
contains information proportional to the
independence of all features and their
variance.
93 In order to assess how much information Concuttent matrix Convergance matrix Supportive matrix Covariance matrix D --
is brought by each component, and the
correlation among them, a useful tool is
the_____.
94 The_____ parameter can assume different run start stop C --
values which determine how the data init
matrix is initially processed.
95 ______ allows exploiting the natural SparsePCA KernelPCA SVD init parameter A --
sparsity of data while extracting principal
components.
96 Which of the following evaluation metrics AUC-ROC Accuracy Logloss Mean-Squared-Error D --
can be used to evaluate a model while
modeling a continuous output variable?
97 Which of the following is true about Lower is better Higher is better A or B depend on the None of these A --
residuals? situation
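This sketch shows why mean squared error suits a continuous target and why smaller residuals are better. The closer the predictions track the true values, the smaller the residuals and the MSE. The helper names and numbers are made up for illustration.

```python
def residuals(y_true, y_pred):
    """Observed minus predicted value for each sample."""
    return [t - p for t, p in zip(y_true, y_pred)]


def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals."""
    r = residuals(y_true, y_pred)
    return sum(e * e for e in r) / len(r)


y_true = [3.0, 5.0, 7.0]
good = [3.1, 4.9, 7.0]
poor = [2.0, 6.0, 9.0]
print(mean_squared_error(y_true, good))  # about 0.0067
print(mean_squared_error(y_true, poor))  # 2.0
```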
98 Overfitting is more likely when you have TRUE FALSE B --
a huge amount of data to train?
99 Which of the following statements is true Linear regression is sensitive Linear regression is not Can't say None of these A --
about outliers in linear regression? to outliers sensitive to outliers
100 Suppose you plotted a scatter plot Since there is a Since there is a Can't say None of these A --
between the residuals and predicted relationship, our relationship, our
values in linear regression and you found model is not good model is good
that there is a relationship between them.
Which of the following conclusions would you
draw about this situation?
101 Let's say a "linear regression" model You will always have test You cannot have test error None of the above C --
perfectly fits the training data (train error error zero zero
is zero). Now, which of the following
statements is true?
102 In a linear regression problem, we are If R Squared increases, this If R Squared decreases, this Individually R squared None of these. C --
using "R-squared" to measure variable is significant. variable is not significant. cannot tell about variable
goodness-of-fit. We add a feature in linear importance. We can't say
regression model and retrain the same anything about it right now.
model. Which of the following options is
true?
103 Which of the following is true about Linear Regression with Linear Regression with Linear Regression with None of these A --
Heteroskedasticity? varying error terms constant error terms zero error terms
104 Which of the following assumptions do 1, 2 and 3. 1, 3 and 4. 1 and 3. All of the above. D --
we make while deriving linear regression
parameters? 1. The true relationship
between dependent y and predictor x is
linear. 2. The model errors are statistically
independent. 3. The errors are normally
distributed with a 0 mean and constant
standard deviation. 4. The predictor x is
non-stochastic and is measured error-free.
105 To test linear relationship of y(dependent) Scatter plot Barchart Histograms None of these A --
and x(independent) continuous variables,
which of the following plots is best suited?
106 Which of the following steps / assumptions The polynomial degree Whether we learn the The use of a constant-term A --
in regression modeling impacts the trade- weights by matrix inversion
off between under-fitting and over-fitting or gradient descent
the most.
107 Can we calculate the skewness of TRUE FALSE B --
variables based on mean and median?
108 Which of the following is true about Ridge regression uses Lasso regression uses Both use subset selection None of above B --
"Ridge" or "Lasso" regression subset selection of features subset selection of features of features
methods in case of feature selection?
109 Which of the following statement(s) can 1 and 2 1 and 3 2 and 4 None of the above A --
be true after adding a variable to a linear
regression model? 1. R-Squared and
Adjusted R-squared both increase. 2. R-Squared
increases and Adjusted R-squared
decreases. 3. R-Squared decreases and
Adjusted R-squared decreases. 4. R-Squared
decreases and Adjusted R-squared increases.
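The adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1) for n samples and p predictors, explains statements 1 and 2. A new predictor raises adjusted R-squared only if it lifts R-squared enough to pay for the lost degree of freedom. For ordinary least squares fitted on the same training data, R-squared itself cannot drop when a predictor is added, which rules out statements 3 and 4. The R² values below are made up for illustration, and the helper name is hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n samples and p predictors (intercept not counted in p)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = 20
base = adjusted_r2(0.80, n, p=2)         # model before adding the variable
small_gain = adjusted_r2(0.805, n, p=3)  # R² rises a little: adjusted R² falls (statement 2)
big_gain = adjusted_r2(0.85, n, p=3)     # R² rises a lot: adjusted R² rises too (statement 1)
print(round(base, 3), round(small_gain, 3), round(big_gain, 3))  # 0.776 0.768 0.822
```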
110 How many coefficients do you need to 1 2 Can't say B --
estimate in a simple linear regression
model (One independent variable)?
111 In the given image, P(H) Posterior Prior B bayes.jpg
is the __________ probability.
112 Conditional probability is a measure of the True FALSE A --
probability of an event given that another
event has already occurred.
113 Gaussian distribution when plotted, gives Mean Variance Discrete Random A --
a bell shaped curve which is symmetric
about the _______ of the feature values.
114 SVMs directly give us the posterior True FALSE B --
probabilities P(y = 1|x) and P(y = -1|x).
115 SVM is a ______ algorithm. Classification Clustering Regression All A --
116 What is/are true about kernels in SVM? 1. 1 2 1 and 2 None of these C --
A kernel function maps low-dimensional data
to a high-dimensional space. 2. It's a
similarity function.
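This sketch illustrates statements 1 and 2 together. It assumes 2-D inputs and the homogeneous degree-2 polynomial kernel k(x, z) = (x · z)². The kernel value is a similarity score, and it equals an ordinary dot product in an explicit 3-D feature space. So the SVM can work in that higher-dimensional space without ever building it. The helper names are hypothetical.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit 3-D feature map whose dot product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


x, z = (1.0, 2.0), (3.0, 4.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(poly2_kernel(x, z))  # 121.0
print(round(explicit, 9))  # 121.0
```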
117 Suppose you are building an SVM model on Misclassification would Data will be correctly Can't say None of these A --
data X. The data X can be error-prone, happen classified
which means that you should not trust
any specific data point too much. Now
suppose that you want to build an SVM model
with a quadratic kernel function (polynomial
degree 2) that uses the slack penalty C
as one of its hyperparameters.
What would happen when you
use a very small C (C ≈ 0)?
118 The cost parameter in the SVM means: The number of cross- The kernel to be used The tradeoff between None of the above C --
validations to be made misclassification and
simplicity of the model
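This sketch shows the trade-off that C controls, using the standard primal soft-margin objective 0.5·||w||² + C·Σ max(0, 1 - yᵢ(w·xᵢ + b)). Here C weights the total slack (hinge loss); it is not a slack variable itself. The code is not an SVM solver. It only evaluates the objective for two hand-picked candidate classifiers on made-up 1-D data. With a small C, the simpler classifier that misclassifies one point has the lower objective. With a large C, the classifier that fits every point wins. All names are hypothetical.

```python
def hinge(label, score):
    """Hinge loss (slack) for one point: 0 when it is on or beyond its margin, on the correct side."""
    return max(0.0, 1.0 - label * score)


def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin SVM objective: 0.5 * ||w||^2 + C * total hinge loss."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    slack = sum(hinge(label, sum(wj * xj for wj, xj in zip(w, x)) + b)
                for x, label in zip(X, y))
    return regularizer + C * slack


# Made-up 1-D data: the point at 0.5 is labeled -1 but lies close to the +1 point.
X = [(2.0,), (-2.0,), (0.5,)]
y = [1, -1, -1]
w_simple, b_simple = (1.0,), 0.0  # small ||w||, but predicts +1 for the point at 0.5
w_fit, b_fit = (2.0,), -2.0       # larger ||w||, every point on or outside its margin

for C in (0.1, 10.0):
    simple = soft_margin_objective(w_simple, b_simple, X, y, C)
    fitted = soft_margin_objective(w_fit, b_fit, X, y, C)
    print(C, "simple" if simple < fitted else "fitted")
# 0.1 simple
# 10.0 fitted
```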
119 Bayes' theorem describes the True FALSE A --
probability of an event, based on prior
knowledge of conditions that might be
related to the event.
120 The Bernoulli Naïve Bayes classifier assumes a Continuous Discrete Binary C --
___________ distribution.
121 If you remove the non-red-circled points TRUE FALSE B svm.jpg
from the data, will the decision boundary
change?
122 How do you handle missing or corrupted a. Drop missing rows or b. Replace missing values c. Assign a unique d. All of the above D --
data in a dataset? columns with mean/median/mode category to missing values
123 The binarize parameter of scikit-learn's True FALSE A --
BernoulliNB sets the threshold for binarizing
sample features.
124 Which of the following statements about A. Attributes are B. Attributes are C. Attributes are D. Attributes can B --
Naive Bayes is incorrect? equally important. statistically dependent of statistically independent of be nominal or numeric
one another given the class one another given the
value. class value.
125 SVMs are less effective when: The data is linearly separable The data is clean and ready The data is noisy and C --
to use contains overlapping
points
126 Naive Bayes classification is _______________ Supervised Unsupervised Both None A --
learning.
127 The features being classified are independent False TRUE B --
of each other in the Naïve Bayes classifier.
128 The features being classified are __________ of Independent Dependent Partial Dependent None A --
each other in the Naïve Bayes classifier.
129 Bayes' theorem is given by the formula in the image, where 1. P(H) True FALSE A bayes.jpg
is the probability of hypothesis H being
true.
2. P(E) is the probability of the
evidence (regardless of the hypothesis).
3. P(E|H) is the probability of the evidence
given that hypothesis is true.
4. P(H|E) is the probability of the
hypothesis given that the evidence is
there.
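Here is a worked sketch of the formula P(H|E) = P(E|H)·P(H) / P(E). The code expands P(E) with the law of total probability. The probabilities are made up for illustration, and the helper name is hypothetical.

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """P(H|E) = P(E|H) * P(H) / P(E), with P(E) from the law of total probability."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)  # evidence, regardless of H
    return p_e_given_h * p_h / p_e


# Made-up numbers: prior P(H) = 0.01, likelihoods P(E|H) = 0.9 and P(E|not H) = 0.05.
print(round(posterior(0.01, 0.9, 0.05), 4))  # 0.1538
```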
130 Any linear combination of the True FALSE A --
components of a multivariate Gaussian is
a univariate Gaussian.

This sheet is for 2 Mark questions
S.r No Question a b c d Correct Answer Image
e.g 1 Write down question Option a Option b Option c Option d a/b/c/d img.jpg
1 A supervised scenario is characterized by Programmer Teacher Author Farmer B
the concept of a _____.
2 Overlearning is caused by an excessive Capacity Regression Reinforcement Accuracy A
______.
3 If there is only a discrete number of Modelfree Categories Prediction None of above B
possible outcomes, they are called _____.
4 What is the standard approach to split the set of examples into group the set of examples a set of observed learns programs from data A
supervised learning? the training set and the test into the training set and the instances tries to induce a
test general rule
5 Some people are using the term ___ Inference Interference Accuracy None of above A
instead of prediction only to avoid the
weird idea that machine learning is a sort
of modern magic.
6 The term _____ can be freely used, but Accuracy Cluster Regression Prediction D
with the same meaning adopted in
physics or system theory.
7 Which are two techniques of Machine Genetic Programming and Speech recognition and Both A & B None of the Mentioned A
Learning? Inductive Learning Regression
8 Even if there are no actual supervisors, Supervised Reinforcement Unsupervised None of the above B
________ learning is also based on
feedback provided by the environment.
9 Common deep learning applications / Real-time visual object Classic approaches Automatic labeling Bio-inspired adaptive B
problems can also be solved using ____ identification systems
10 Identify the various approaches for Concept Vs Classification Symbolic Vs Statistical Inductive Vs Analytical All above D
machine learning. Learning Learning Learning
11 What is the function of "Unsupervised Find clusters of the data and Find interesting directions Interesting coordinates All D
Learning"? find low-dimensional in data and find novel and correlations
representations of the data observations/ database
cleaning
12 What are the two methods used for the Platt Calibration and Isotonic Statistics and A
calibration in Supervised Learning? Regression Information Retrieval
13 What is the standard approach to split the set of examples into group the set of examples a set of observed learns programs from data A
supervised learning? the training set and the test into the training set and the instances tries to induce a
test general rule
14 Which of the following is not Machine Artificial Intelligence Rule based inference Both A & B None of the Mentioned B
Learning?
15 What is Model Selection in Machine The process of selecting when a statistical model Find interesting directions All above A
Learning? models among different describes random error or in data and find novel
mathematical models, which noise instead of underlying observations/ database
are used to describe the relationship cleaning
same data set
16 _____ provides some built-in datasets that scikit-learn classification regression None of the above A
can be used for testing purposes.
17 While using _____ all labels are LabelEncoder class LabelBinarizer class DictVectorizer FeatureHasher A
turned into sequential numbers.
18 _______ produce sparse matrices of real DictVectorizer FeatureHasher Both A & B None of the Mentioned C
numbers that can be fed into any machine
learning model.
19 scikit-learn offers the class ______, which is LabelEncoder LabelBinarizer DictVectorizer Imputer D
responsible for filling the holes using a
strategy based on the mean, median, or
most frequent value.
20 Which of the following scale data by MinMaxScaler MaxAbsScaler Both A & B None of the Mentioned C
removing elements that don't belong to a
given range or by considering a maximum
absolute value.
21 Which of the following models MCV MARS MCRS All above B
include a backwards elimination feature
selection routine?
22 Can we extract knowledge without applying YES NO A
feature selection?
23 When feature selection is applied to the data, NO YES B
does the number of features decrease?
24 Which of the following are several models regression classification None of the above C
for feature extraction?
25 scikit-learn also provides a class for per- Normalizer Imputer Classifier All above A
sample normalization, _____.
26 ______ dataset with many features normalized unnormalized Both A & B None of the Mentioned B
contains information proportional to the
independence of all features and their
variance.
27 In order to assess how much information Concurrent matrix Convergence matrix Supportive matrix Covariance matrix D
is brought by each component, and the
correlation among them, a useful tool is
the _____.
28 The _____ parameter can assume different run start stop C
values which determine how the data init
matrix is initially processed.
29 ______ allows exploiting the natural SparsePCA KernelPCA SVD init parameter A
sparsity of data while extracting principal
components.
30 Which of the following is an example of a PCA K-Means None of the above A
deterministic algorithm?
31 Let's say a "linear regression" model A. You will always have test B. You cannot have test C. None of the above c
perfectly fits the training data (train error error zero error zero
is zero). Now, which of the following
statements is true?
32 In a linear regression problem, we are A. If R Squared increases, B. If R Squared decreases, C. Individually R squared D. None of these. c
using "R-squared" to measure this variable is significant. this variable is not cannot tell about variable
goodness-of-fit. We add a feature in linear significant. importance. We can't say
regression model and retrain the same anything about it right now.
model. Which of the following options is
true?
33 Which of the following is true about A. Linear Regression with B. Linear Regression with C. Linear Regression with D. None of these a
Heteroskedasticity? varying error terms constant error terms zero error terms
34 Which of the following assumptions do A. 1, 2 and 3. B. 1, 3 and 4. C. 1 and 3. D. All of the above. d
we make while deriving linear regression
parameters? 1. The true relationship
between dependent y and predictor x is
linear. 2. The model errors are statistically
independent. 3. The errors are normally
distributed with a 0 mean and constant
standard deviation. 4. The predictor x is
non-stochastic and is measured error-free.
35 To test linear relationship of y(dependent) A. Scatter plot B. Barchart C. Histograms D. None of these a
and x(independent) continuous variables,
which of the following plots is best suited?
36 Generally, which of the following A. 1 and 2 B. only 1 C. only 2 D. None of these. b
method(s) is/are used for predicting a
continuous dependent variable? 1. Linear
Regression 2. Logistic Regression
37 Suppose you are training a linear A. Both are False B. 1 is False and 2 is True C. 1 is True and 2 is False D. Both are True c
regression model. Now consider these
points. 1. Overfitting is more likely if we
have less data. 2. Overfitting is more likely
when the hypothesis space is small. Which
of the above statement(s) are correct?
38 Suppose we fit "Lasso Regression" to a A. It is more likely for X1 to B. It is more likely for X1 to C. Can't say D. None of these b
data set which has 100 features be excluded from the model be included in the model
(X1, X2, ..., X100). Now, we rescale one of
these features by multiplying it by 10 (say
that feature is X1), and then refit Lasso
regression with the same regularization
parameter. Now, which of the following
options will be correct?
39 Which of the following is true about A. Ridge regression uses B. Lasso regression uses C. Both use subset D. None of above b
"Ridge" or "Lasso" regression subset selection of features subset selection of features selection of features
methods in case of feature selection?
40 Which of the following statement(s) can A. 1 and 2 B. 1 and 3 C. 2 and 4 D. None of the above a
be true after adding a variable to a linear
regression model? 1. R-Squared and
Adjusted R-squared both increase. 2. R-Squared
increases and Adjusted R-squared
decreases. 3. R-Squared decreases and
Adjusted R-squared decreases. 4. R-Squared
decreases and Adjusted R-squared increases.
41 We can also compute the coefficients of A. 1 and 2 B. 1 and 3. C. 2 and 3. D. 1, 2 and 3. d
linear regression with the help of an
analytical method called the "Normal
Equation". Which of the following is/are
true about the "Normal Equation"? 1. We
don't have to choose the learning rate. 2.
It becomes slow when the number of features
is very large. 3. No need to iterate.
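This sketch applies the Normal Equation, θ = (XᵀX)⁻¹Xᵀy, to simple linear regression, where X has a column of ones and a column of x. It returns exactly two coefficients (intercept and slope) in one step, with no learning rate and no iterations. In general, inverting XᵀX costs roughly cubic time in the number of features, which is why the method slows down when there are very many features. The helper name and the data are hypothetical.

```python
def normal_equation_simple(x, y):
    """Closed-form least squares for y ~ theta0 + theta1 * x via theta = (X^T X)^-1 X^T y.

    X has a column of ones and a column of x, so X^T X = [[n, sx], [sx, sxx]]
    is 2x2 and can be inverted by hand: no learning rate, no iterations.
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx             # determinant of X^T X
    theta0 = (sxx * sy - sx * sxy) / det  # first row of adj(X^T X) @ X^T y, divided by det
    theta1 = (n * sxy - sx * sy) / det    # second row of adj(X^T X) @ X^T y, divided by det
    return [theta0, theta1]


print(normal_equation_simple([0, 1, 2, 3], [1, 3, 5, 7]))  # [1.0, 2.0]
```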
42 How many coefficients do you need to A. 1 B. 2 C. Can't Say b
estimate in a simple linear regression
model (One independent variable)?
43 If two variables are correlated, is it A. Yes B. No b
necessary that they have a linear
relationship?
44 Correlated variables can have zero A. True B. False a
correlation coefficient. True or False?
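This sketch shows why the answer is True. Below, y is completely determined by x (y = x²). Yet the Pearson correlation coefficient, which measures only linear association, is exactly zero on this symmetric sample. The `pearson` helper is written out by hand for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # y depends on x completely, but not linearly
print(pearson(x, y))    # 0.0
```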
45 Which of the following options is true A. The relationship is B. The relationship is not C. The relationship is not D. The relationship is d
regarding "Regression" and symmetric between x and y symmetric between x and y symmetric between x and symmetric between x and y
"Correlation"? Note: y is the dependent in both. in both. y in case of correlation but in case of correlation but in
variable and x is the independent variable. in case of regression it is case of regression it is not
symmetric. symmetric.
46 What is/are true about kernels in SVM? 1. 1 2 1 and 2 None of these c
A kernel function maps low-dimensional data
to a high-dimensional space. 2. It's a
similarity function.
47 Suppose you are building an SVM model on Misclassification would Data will be correctly Can't say None of these a
data X. The data X can be error-prone, happen classified
which means that you should not trust
any specific data point too much. Now
suppose that you want to build an SVM model
with a quadratic kernel function (polynomial
degree 2) that uses the slack penalty C
as one of its hyperparameters.
What would happen when you
use a very small C (C ≈ 0)?
48 Suppose you are using a linear SVM yes no a svm.jpg
classifier on a two-class classification
problem. Now you have been given the
following data, in which some points are
circled in red; these represent the support
vectors. If you remove any one of the red-circled
points from the data, will the
decision boundary change?
49 If you remove the non-red-circled points TRUE FALSE b svm.jpg
from the data, will the decision boundary
change?
50 When the C parameter is set to infinity, The optimal hyperplane if The soft-margin classifier None of the above a
which of the following holds true? exists, will be the one that will separate the data
completely separates the
data
51 Suppose you are building an SVM model on We can still classify data We cannot classify data Can't Say None of these a
data X. The data X can be error-prone, correctly for given setting of correctly for given setting
which means that you should not trust hyper parameter C of hyper parameter C
any specific data point too much. Now
suppose that you want to build an SVM model
with a quadratic kernel function (polynomial
degree 2) that uses the slack penalty C
as one of its hyperparameters.
What would happen when you
use a very large value of C (C -> infinity)?
52 SVM can solve linear and non- TRUE FALSE a
linear problems.
53 The objective of the support vector TRUE FALSE a
machine algorithm is to find a hyperplane
in an N-dimensional space (N being the
number of features) that distinctly
classifies the data points.
54 Hyperplanes are _____________ boundaries usual decision parallel b
that help classify the data points.
55 The _____of the hyperplane depends upon dimension classification reduction a
the number of features.
56 Hyperplanes are decision boundaries that TRUE FALSE a
help classify the data points.
57 SVM algorithms use a set of TRUE FALSE a
mathematical functions that are defined
as the kernel.
58 In SVM, Kernel function is used to map a TRUE FALSE a
lower dimensional data into a higher
dimensional data.
59 In SVR we try to fit the error within a TRUE FALSE a
certain threshold.
60 When the C parameter is set to infinity, The optimal hyperplane if The soft-margin classifier None of the above a
which of the following holds true? exists, will be the one that will separate the data
completely separates the
data
61 How do you handle missing or corrupted a. Drop missing rows or b. Replace missing values c. Assign a unique d. All of the above d
data in a dataset? columns with mean/median/mode category to missing values
62 What is the purpose of performing cross- a. To assess the predictive b. To judge how the trained c. Both A and B c
validation? performance of the models model performs outside the
sample on test data
63 Which of the following is true about Naive a. Assumes that all the b. Assumes that all the c. Both A and B d. None of the above option c
Bayes? features in a dataset are features in a dataset are
equally important independent
64 Which of the following statements about A. Attributes are B. Attributes are C. Attributes are D. Attributes can b
Naive Bayes is incorrect? equally important. statistically dependent of statistically independent of be nominal or numeric
one another given the class one another given the
value. class value.
65 Which of the following PCA Decision Tree Naive Bayesian Linear regression a
is not supervised learning?
66 How can you avoid overfitting ? By using a lot of data By using inductive machine By using validation only None of above A --
learning
67 What are the popular algorithms of Decision Trees and Neural Probabilistic networks and Support vector machines All D --
Machine Learning? Networks (back Nearest Neighbor
propagation)
68 What is a "Training set"? Training set is used to test A set of data is used to Both A & B None of above B --
the accuracy of the discover the potentially
hypotheses generated by the predictive relationship.
learner.
69 Identify the various approaches for Concept Vs Classification Symbolic Vs Statistical Inductive Vs Analytical All above D --
machine learning. Learning Learning Learning
70 What is the function of "Unsupervised Find clusters of the data and Find interesting directions Interesting coordinates All D --
Learning"? find low-dimensional in data and find novel and correlations
representations of the data observations/ database
cleaning
71 What are the two methods used for the Platt Calibration and Isotonic Statistics and A --
calibration in Supervised Learning? Regression Information Retrieval
72 ______can be adopted when it's necessary Supervised Semi-supervised Reinforcement Clusters B --
to categorize a large amount of data with
a few complete examples or when there's
the need to impose some constraints to a
clustering algorithm.
73 In reinforcement learning, this feedback is Overfitting Overlearning Reward None of above C --
usually called ___.
74 In the last decade, many researchers Deep learning Machine learning Reinforcement learning Unsupervised learning A --
started training bigger and bigger models,
built with several different layers, which is
why this approach is called _____.
75 There's a growing interest in pattern Regression Accuracy Modelfree Scalable C --
recognition and associative memories
whose structure and functioning are
similar to what happens in the neocortex.
Such an approach also allows simpler
algorithms called _____
76 ______ showed better performance than Machine learning Deep learning Reinforcement learning Supervised learning B --
other approaches, even without a context-
based model
77 Common deep learning applications / Real-time visual object Classic approaches Automatic labeling Bio-inspired adaptive B --
problems can also be solved using ____ identification systems
78 Some people are using the term ___ Inference Interference Accuracy None of above A --
instead of prediction only to avoid the
weird idea that machine learning is a sort
of modern magic.
79 The term _____ can be freely used, but Accuracy Cluster Regression Prediction D --
with the same meaning adopted in
physics or system theory.
80 If there is only a discrete number of Modelfree Categories Prediction None of above B --
possible outcomes, they are called _____.
81 A feature F1 can take certain values: A, B, Feature F1 is an example of Feature F1 is an example of It doesn't belong to any Both of these B --
C, D, E, & F, and represents the grade of nominal variable. ordinal variable. of the above category.
students from a college.
Which of the following statements is true in
this case?
82 What would you do in PCA to get the Transform data to zero mean Transform data to zero Not possible None of these A --
same projection as SVD? median
83 What is PCA, KPCA and ICA used for? Principal Components Kernel based Principal Independent Component All above D --
Analysis Component Analysis Analysis
84 Can a model trained for item based YES NO A --
similarity also choose from a given set of
items?
85 What are common feature selection correlation coefficient Greedy algorithms All above None of these C --
methods in regression task?
86 The parameter ______ allows specifying test_size training_size All above None of these C --
the percentage of elements to put into the
test/training set
87 In many classification problems, the random_state dataset test_size All above B --
target ______ is made up of categorical
labels which cannot immediately be
processed by any algorithm.
88 _______ adopts a dictionary-oriented LabelEncoder class LabelBinarizer class DictVectorizer FeatureHasher A --
approach, associating to each category
label a progressive integer number.
89 ________ is much more difficult because it's Removing the whole line Creating sub-model to Using an automatic All above B --
necessary to determine a supervised predict those features strategy to impute them
strategy to train a model for each feature according to the other
and, finally, to predict their value known values
90 It's possible to use a different regression classification random_state missing_values D --
placeholder through the
parameter _______.
91 If you need a more powerful scaling RobustScaler DictVectorizer LabelBinarizer FeatureHasher A --
feature, with a superior control on outliers
and the possibility to select a quantile
range, there's also the class ________.
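Here is a rough sketch of the idea behind RobustScaler with its default 25-75 quantile range. Each value is centered on the median and divided by the interquartile range, so a single outlier barely affects how the other values are scaled. The code uses `statistics.quantiles(..., method="inclusive")` (Python 3.8+), which interpolates linearly between data points. `robust_scale` is a hypothetical helper, not scikit-learn's code.

```python
from statistics import quantiles


def robust_scale(column):
    """Center on the median and divide by the interquartile range (25th to 75th percentile)."""
    q1, med, q3 = quantiles(column, n=4, method="inclusive")
    return [(x - med) / (q3 - q1) for x in column]


# The outlier 100 stays large, while the inliers keep a useful spread.
print(robust_scale([1, 2, 3, 4, 100]))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```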
92 scikit-learn also provides a class for per- max, l0 and l1 norms max, l1 and l2 norms max, l2 and l3 norms max, l3 and l4 norms B --
sample normalization, Normalizer. It can
apply ________ to each element of a dataset.
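This sketch performs per-sample normalization with the three norms. Each row is divided by its l1 norm (sum of absolute values), its l2 norm (Euclidean length), or its max norm (largest absolute value). All-zero rows are returned unchanged. `normalize_row` is a hypothetical helper, not scikit-learn's Normalizer.

```python
import math


def normalize_row(row, norm="l2"):
    """Divide one sample by its l1, l2, or max norm; all-zero rows are returned unchanged."""
    if norm == "l1":
        scale = sum(abs(v) for v in row)
    elif norm == "l2":
        scale = math.sqrt(sum(v * v for v in row))
    elif norm == "max":
        scale = max(abs(v) for v in row)
    else:
        raise ValueError(f"unknown norm: {norm!r}")
    return [v / scale for v in row] if scale else list(row)


sample = [3.0, -4.0]
print(normalize_row(sample, "l2"))   # [0.6, -0.8]
print(normalize_row(sample, "max"))  # [0.75, -1.0]
```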
93 There are also many univariate methods F-tests and p-values chi-square ANOVA All above A --
that can be used in order to select the
best features according to specific criteria
based on ________.
94 Which of the following selects only a SelectPercentile FeatureHasher SelectKBest All above A --
subset of features belonging to a certain
percentile?
95 ________ performs a PCA with non-linearly SparsePCA KernelPCA SVD None of the Mentioned B --
separable data sets.
96 If two variables are correlated, is it Yes No B --
necessary that they have a linear
relationship?
97 Correlated variables can have zero TRUE FALSE A --
correlation coefficient. True or False?
98 Suppose we fit "Lasso Regression" to a It is more likely for X1 to be It is more likely for X1 to be Can't say None of these B --
data set which has 100 features excluded from the model included in the model
(X1, X2, ..., X100). Now, we rescale one of
these features by multiplying it by 10 (say
that feature is X1), and then refit Lasso
regression with the same regularization
parameter. Now, which of the following
options will be correct?
99 If a linear regression model perfectly fits Test error is also always zero Test error is non-zero Couldn't comment on Test error is equal to Train C --
the training data, i.e., the train error is zero, then the test error error
_____________________
100 Which of the following metrics can be ii and iv i and ii ii, iii and iv i, ii, iii and iv D --
used for evaluating regression models? i)
R Squared ii) Adjusted R Squared iii) F
Statistics iv) RMSE / MSE / MAE
101 In the syntax of the R linear model function Matrix Vector Array List B --
lm(formula, data, ...), data refers to a ______.
102 Linear Regression is a supervised TRUE FALSE A --
machine learning algorithm.
103 Is it possible to design a linear regression TRUE FALSE A --
algorithm using a neural network?
104 Which of the following methods do we Least Square Error Maximum Likelihood Logarithmic Loss Both A and B A --
use to find the best fit line for data in
Linear Regression?
105 Suppose you are training a linear Both are False 1 is False and 2 is True 1 is True and 2 is False Both are True C --
regression model. Now consider these
points. 1. Overfitting is more likely if we
have less data. 2. Overfitting is more likely
when the hypothesis space is small. Which
of the above statement(s) are correct?
106 We can also compute the coefficients of 1 and 2 1 and 3. 2 and 3. 1, 2 and 3. D --
linear regression with the help of an
analytical method called the "Normal
Equation". Which of the following is/are
true about the "Normal Equation"? 1. We
don't have to choose the learning rate. 2.
It becomes slow when the number of features
is very large. 3. No need to iterate.
107 Which of the following options is true The relationship is The relationship is not The relationship is not The relationship is D --
regarding "Regression" and symmetric between x and y symmetric between x and y symmetric between x and symmetric between x and y
"Correlation"? Note: y is the dependent in both. in both. y in case of correlation but in case of correlation but in
variable and x is the independent variable. in case of regression it is case of regression it is not
symmetric. symmetric.
108 In a simple linear regression model (one by 1 no change by intercept by its slope D --
independent variable), if we change the
input variable by 1 unit, how much will the
output variable change?
109 Generally, which of the following 1 and 2 only 1 only 2 None of these. B --
method(s) is/are used for predicting a
continuous dependent variable? 1. Linear
Regression 2. Logistic Regression
110 How many coefficients do you need to 1 2 3 4 B --
estimate in a simple linear regression
model (One independent variable)?
111 Suppose you are building an SVM model on We can still classify data We cannot classify data Can't Say None of these A --
data X. The data X can be error-prone, correctly for given setting of correctly for given setting
which means that you should not trust hyper parameter C of hyper parameter C
any specific data point too much. Now
suppose that you want to build an SVM model
with a quadratic kernel function (polynomial
degree 2) that uses the slack penalty C
as one of its hyperparameters.
What would happen when you
use a very large value of C (C -> infinity)?
112 SVM can solve linear and non- TRUE FALSE A --
linear problems.
113 The objective of the support vector TRUE FALSE A --
machine algorithm is to find a hyperplane
in an N-dimensional space (N being the
number of features) that distinctly
classifies the data points.
114 Hyperplanes are _____________ boundaries usual decision parallel B --
that help classify the data points.
115 When the C parameter is set to infinity, The optimal hyperplane if The soft-margin classifier None of the above A --
which of the following holds true? exists, will be the one that will separate the data
completely separates the
data
116 SVM is a ______ learning algorithm. Supervised Unsupervised Both None A --
117 The linear SVM classifier works by True FALSE A --
drawing a straight line between two
classes.
118 In a real problem, you should check to see TRUE FALSE B --
if the SVM is separable and then include
slack variables if it is not separable.
119 Which of the following are real-world applications of the SVM?
a) Text and Hypertext Categorization | b) Image Classification | c) Clustering of News Articles | d) All of the above | Answer: D
120 The _____ of the hyperplane depends upon the number of features.
a) dimension | b) classification | c) reduction | Answer: A
121 Hyperplanes are decision boundaries that help classify the data points.
a) TRUE | b) FALSE | Answer: A
122 SVM algorithms use a set of mathematical functions that are defined as the kernel.
a) TRUE | b) FALSE | Answer: A
123 Naive Bayes classifiers are a collection of ------------------ algorithms.
a) Classification | b) Clustering | c) Regression | d) All | Answer: A
124 In the given image, P(H|E) is __________ probability.
a) Posterior | b) Prior | Answer: A | Image: bayes.jpg
125 Solving a non-linear separation problem with a hard-margin kernelized SVM (Gaussian RBF kernel) might lead to overfitting.
a) TRUE | b) FALSE | Answer: A
126 100 people are at a party. The given data shows how many wear pink or not, and whether each guest is a man or not. Imagine a pink-wearing guest leaves: was it a man?
a) TRUE | b) FALSE | Answer: A | Image: man.jpg
127 For the given weather data, calculate the probability of playing.
a) 0.4 | b) 0.64 | c) 0.29 | d) 0.75 | Answer: B | Image: weather data.jpg
128 In SVM, a kernel function is used to map lower-dimensional data into higher-dimensional data.
a) TRUE | b) FALSE | Answer: A
129 In SVR we try to fit the error within a certain threshold.
a) TRUE | b) FALSE | Answer: A
130 When the C parameter is set to infinite, which of the following holds true?
a) The optimal hyperplane, if it exists, will be the one that completely separates the data | b) The soft-margin classifier will separate the data | c) None of the above | Answer: A

This sheet is for 3 Mark questions
1 Which of the following is a characteristic of the best machine learning method?
a) fast | b) accuracy | c) scalable | d) All above | Answer: D
2 What are the different algorithm techniques in Machine Learning?
a) Supervised Learning and Semi-supervised Learning | b) Unsupervised Learning and Transduction | c) Both A & B | d) None of the Mentioned | Answer: C
3 ______ can be adopted when it's necessary to categorize a large amount of data with a few complete examples or when there's the need to impose some constraints on a clustering algorithm.
a) Supervised | b) Semi-supervised | c) Reinforcement | d) Clusters | Answer: B
4 In reinforcement learning, this feedback is usually called ___.
a) Overfitting | b) Overlearning | c) Reward | d) None of above | Answer: C
5 In the last decade, many researchers started training bigger and bigger models, built with several different layers; that's why this approach is called _____.
a) Deep learning | b) Machine learning | c) Reinforcement learning | d) Unsupervised learning | Answer: A
6 What does learning exactly mean?
a) Robots are programmed so that they can perform the task based on data they gather from sensors. | b) A set of data is used to discover the potentially predictive relationship. | c) Learning is the ability to change according to external stimuli and remembering most of all previous experiences. | d) It is a set of data used to discover the potentially predictive relationship. | Answer: C
7 It is necessary to allow the model to develop a generalization ability and avoid a common problem called ______.
a) Overfitting | b) Overlearning | c) Classification | d) Regression | Answer: A
8 Techniques that involve the usage of both labeled and unlabeled data are called ___.
a) Supervised | b) Semi-supervised | c) Unsupervised | d) None of the above | Answer: B
9 There's a growing interest in pattern recognition and associative memories whose structure and functioning are similar to what happens in the neocortex. Such an approach also allows simpler algorithms called _____.
a) Regression | b) Accuracy | c) Model-free | d) Scalable | Answer: C
10 ______ showed better performance than other approaches, even without a context-based model.
a) Machine learning | b) Deep learning | c) Reinforcement learning | d) Supervised learning | Answer: B
11 Which of the following sentences is correct?
a) Machine learning relates to the study, design and development of the algorithms that give computers the capability to learn without being explicitly programmed. | b) Data mining can be defined as the process in which the unstructured data tries to extract knowledge or unknown interesting patterns. | c) Both A & B | d) None of the above | Answer: C
12 What is 'overfitting' in machine learning?
a) When a statistical model describes random error or noise instead of the underlying relationship, 'overfitting' occurs. | b) Robots are programmed so that they can perform the task based on data they gather from sensors. | c) While involving the process of learning, 'overfitting' occurs. | d) A set of data is used to discover the potentially predictive relationship. | Answer: A
13 What is a 'test set'?
a) A test set is used to test the accuracy of the hypotheses generated by the learner. | b) It is a set of data used to discover the potentially predictive relationship. | c) Both A & B | d) None of above | Answer: A
14 What is the function of 'Supervised Learning'?
a) Classifications, Predict time series, Annotate strings | b) Speech recognition, Regression | c) Both A & B | d) None of above | Answer: C
15 Common unsupervised applications include:
a) Object segmentation | b) Similarity detection | c) Automatic labeling | d) All above | Answer: D
16 Reinforcement learning is particularly efficient when ______________.
a) the environment is not completely deterministic | b) it's often very dynamic | c) it's impossible to have a precise error measure | d) All above | Answer: D
17 During the last few years, many ______ algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state.
a) Logical | b) Classical | c) Classification | d) None of above | Answer: D
18 Common deep learning applications include ____.
a) Image classification, Real-time visual tracking | b) Autonomous car driving, Logistic optimization | c) Bioinformatics, Speech recognition | d) All above | Answer: D
19 If there is only a discrete number of possible outcomes (called categories), the process becomes a ______.
a) Regression | b) Classification | c) Model-free | d) Categories | Answer: B
20 Which of the following are supervised learning applications?
a) Spam detection, Pattern detection, Natural Language Processing | b) Image classification, Real-time visual tracking | c) Autonomous car driving, Logistic optimization | d) Bioinformatics, Speech recognition | Answer: A
21 Let's say you are working with categorical feature(s) and you have not looked at the distribution of the categorical variable in the test data. You want to apply one-hot encoding (OHE) on the categorical feature(s). What challenges may you face if you have applied OHE on a categorical variable of the train dataset?
a) All categories of the categorical variable are not present in the test dataset. | b) The frequency distribution of categories is different in train as compared to the test dataset. | c) Train and Test always have the same distribution. | d) Both A and B | Answer: D
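Option (a) can be made concrete with a hypothetical, dependency-free encoder. It learns the category vocabulary from the training column only, so a category that appears only in the test data has no column of its own. In this sketch such a value becomes an all-zero row. scikit-learn's `OneHotEncoder` behaves the same way when constructed with `handle_unknown='ignore'`; its default is to raise an error instead.

```python
def fit_one_hot(train_values):
    """Learn the category vocabulary from the training column only."""
    return sorted(set(train_values))

def transform_one_hot(values, categories):
    """Encode each value as a 0/1 vector; unseen categories become all zeros."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        if v in index:  # a test-only category has no column to switch on
            row[index[v]] = 1
        rows.append(row)
    return rows

categories = fit_one_hot(["red", "green", "red", "blue"])
print(categories)                                         # ['blue', 'green', 'red']
print(transform_one_hot(["green", "purple"], categories))  # [[0, 1, 0], [0, 0, 0]]
```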
22 Which of the following sentences is FALSE regarding regression?
a) It relates inputs to outputs. | b) It is used for prediction. | c) It may be used for interpretation. | d) It discovers causal relationships. | Answer: D
23 Which of the following methods is used to find the optimal features for cluster analysis?
a) k-Means | b) Density-Based Spatial Clustering | c) Spectral Clustering Find clusters | d) All above | Answer: D
24 scikit-learn also provides functions for creating dummy datasets from scratch:
a) make_classification() | b) make_regression() | c) make_blobs() | d) All above | Answer: D
25 _____, which can accept a NumPy RandomState generator or an integer seed.
a) make_blobs | b) random_state | c) test_size | d) training_size | Answer: B
26 In many classification problems, the target dataset is made up of categorical labels which cannot immediately be processed by any algorithm. An encoding is needed, and scikit-learn offers at least _____ valid options.
a) 1 | b) 2 | c) 3 | d) 4 | Answer: B
27 In which of the following is each categorical label first turned into a positive integer and then transformed into a vector where only one feature is 1 while all the others are 0?
a) LabelEncoder class | b) DictVectorizer | c) LabelBinarizer class | d) FeatureHasher | Answer: C
28 ______ is the most drastic one and should be considered only when the dataset is quite large, the number of missing features is high, and any prediction could be risky.
a) Removing the whole line | b) Creating a sub-model to predict those features | c) Using an automatic strategy to impute them according to the other known values | d) All above | Answer: A
29 It's possible to specify whether the scaling process must include both mean and standard deviation using the parameters ________.
a) with_mean=True/False | b) with_std=True/False | c) Both A & B | d) None of the Mentioned | Answer: C
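The sketch below is a minimal pure-Python analogue of standardization that shows what the two switches in question 29 do. The helper is hypothetical; scikit-learn's `StandardScaler` exposes the real `with_mean` and `with_std` parameters. Like `StandardScaler`, the sketch divides by the population standard deviation.

```python
import statistics

def standard_scale(column, with_mean=True, with_std=True):
    """Hypothetical analogue of standardization with two on/off switches."""
    # with_mean=True: subtract the column mean (center the data)
    mean = statistics.fmean(column) if with_mean else 0.0
    # with_std=True: divide by the population standard deviation (ddof=0)
    std = statistics.pstdev(column) if with_std else 1.0
    if std == 0:
        std = 1.0  # constant column: leave the scale unchanged
    return [(x - mean) / std for x in column]

data = [2.0, 4.0, 6.0, 8.0]
print(standard_scale(data))                   # centered, unit variance
print(standard_scale(data, with_mean=False))  # only divided by the std
```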
30 Which of the following selects the best K high-score features?
a) SelectPercentile | b) FeatureHasher | c) SelectKBest | d) All above | Answer: C
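`SelectKBest` keeps the k features with the highest scores produced by a scoring function, while `SelectPercentile` keeps a percentage of them. If the per-feature scores have already been computed, the selection step reduces to the following plain-Python sketch. The helper is hypothetical.

```python
def select_k_best(scores, k):
    """Return the indices of the k highest-scoring features, in column order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

feature_scores = [0.2, 5.1, 0.7, 3.3, 4.0]
print(select_k_best(feature_scores, k=2))  # [1, 4]
```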
31 How does the number of observations influence overfitting? Choose the correct answer(s). Note: all other parameters are the same.
1. In case of fewer observations, it is easy to overfit the data.
2. In case of fewer observations, it is hard to overfit the data.
3. In case of more observations, it is easy to overfit the data.
4. In case of more observations, it is hard to overfit the data.
a) 1 and 4 | b) 2 and 3 | c) 1 and 3 | d) None of these | Answer: A
32 Suppose you have fitted a complex regression model on a dataset. Now you are using Ridge regression with tuning parameter lambda to reduce its complexity. Choose the option(s) below which describe the relationship of bias and variance with lambda.
a) In case of very large lambda; bias is low, variance is low | b) In case of very large lambda; bias is low, variance is high | c) In case of very large lambda; bias is high, variance is low | d) In case of very large lambda; bias is high, variance is high | Answer: C
33 What is/are true about ridge regression?
1. When lambda is 0, the model works like a linear regression model.
2. When lambda is 0, the model doesn't work like a linear regression model.
3. When lambda goes to infinity, we get very, very small coefficients approaching 0.
4. When lambda goes to infinity, we get very, very large coefficients approaching infinity.
a) 1 and 3 | b) 1 and 4 | c) 2 and 3 | d) 2 and 4 | Answer: A
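For a single feature with no intercept, ridge regression has a one-line closed form. Setting the derivative of sum((y - w*x)^2) + lambda*w^2 to zero gives w = sum(x*y) / (sum(x^2) + lambda). The sketch below uses toy data and a hypothetical helper to show statements 1 and 3 of question 33: lambda = 0 reproduces the ordinary least-squares slope, and a very large lambda shrinks the coefficient toward 0.

```python
def ridge_1d(xs, ys, lam):
    """Ridge coefficient for y ~ w*x (single feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Minimizes sum((y - w*x)^2) + lam * w^2  ->  w = sxy / (sxx + lam)
    return sxy / (sxx + lam)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
for lam in (0.0, 1.0, 100.0, 1e6):
    print(lam, ridge_1d(xs, ys, lam))  # coefficient shrinks as lam grows
```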
34 Which of the following method(s) does not have a closed-form solution for its coefficients?
a) Ridge regression | b) Lasso | c) Both Ridge and Lasso | d) None of both | Answer: B
35 The function used for linear regression in R is __________.
a) lm(formula, data) | b) lr(formula, data) | c) lrm(formula, data) | d) regression.linear(formula, data) | Answer: A
36 In the mathematical equation of linear regression, Y = β1 + β2X + ε, (β1, β2) refers to __________.
a) (X-intercept, Slope) | b) (Slope, X-intercept) | c) (Y-intercept, Slope) | d) (Slope, Y-intercept) | Answer: C
37 Suppose that we have N independent variables (X1, X2, ..., Xn) and the dependent variable is Y. Now imagine that you are applying linear regression by fitting the best-fit line using least squares error on this data. You found that the correlation coefficient of one of its variables (say X1) with Y is -0.95. Which of the following is true for X1?
a) The relation between X1 and Y is weak | b) The relation between X1 and Y is strong | c) The relation between X1 and Y is neutral | d) Correlation can't judge the relationship | Answer: B
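The correlation coefficient r lies in [-1, 1]. Its sign gives the direction of the linear relationship and its magnitude gives the strength, so r = -0.95 indicates a strong negative relationship. Below is a minimal pure-Python computation of Pearson's r on made-up data (hypothetical helper):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x1 = [1, 2, 3, 4, 5]
y = [10, 8, 6.5, 3.5, 2]   # y falls as x1 rises
r = pearson_r(x1, y)
print(round(r, 3))         # close to -1: strong negative relationship
```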
38 We have been given a dataset with n records in which we have input attribute x and output attribute y. Suppose we use a linear regression method to model this data. To test our linear regressor, we split the data into a training set and a test set randomly. Now we increase the training set size gradually. As the training set size increases, what do you expect will happen with the mean training error?
a) Increase | b) Decrease | c) Remain constant | d) Can't say | Answer: D
39 We have been given a dataset with n records in which we have input attribute x and output attribute y. Suppose we use a linear regression method to model this data. To test our linear regressor, we split the data into a training set and a test set randomly. What do you expect will happen with bias and variance as you increase the size of the training data?
a) Bias increases and variance increases | b) Bias decreases and variance increases | c) Bias decreases and variance decreases | d) Bias increases and variance decreases | Answer: D
40 Suppose you find that your linear regression model is underfitting the data. In such a situation, which of the following options would you consider?
1. I will add more variables.
2. I will start introducing polynomial degree variables.
3. I will remove some variables.
a) 1 and 2 | b) 2 and 3 | c) 1 and 3 | d) 1, 2 and 3 | Answer: A
41 Problem: Players will play if the weather is sunny. Is this statement correct?
a) TRUE | b) FALSE | Answer: A | Image: weather data.jpg
42 The Multinomial Naïve Bayes classifier models a ___________ distribution.
a) Continuous | b) Discrete | c) Binary | Answer: B
43 For the given weather data, calculate the probability of not playing.
a) 0.4 | b) 0.64 | c) 0.36 | d) 0.5 | Answer: C | Image: weather data.jpg
44 Suppose you have trained an SVM with a linear decision boundary. After training the SVM, you correctly infer that your SVM model is underfitting. Which of the following options would you be more likely to consider when iterating on the SVM next time?
a) You want to increase your data points | b) You want to decrease your data points | c) You will try to calculate more variables | d) You will try to reduce the features | Answer: C
45 The minimum time complexity for training an SVM is O(n²). According to this fact, what sizes of datasets are not best suited for SVMs?
a) Large datasets | b) Small datasets | c) Medium sized datasets | d) Size does not matter | Answer: A
46 The effectiveness of an SVM depends upon:
a) Selection of Kernel | b) Kernel Parameters | c) Soft Margin Parameter C | d) All of the above | Answer: D
47 What do you mean by generalization error in terms of the SVM?
a) How far the hyperplane is from the support vectors | b) How accurately the SVM can predict outcomes for unseen data | c) The threshold amount of error in an SVM | Answer: B
48 What do you mean by a hard margin?
a) The SVM allows very low error in classification | b) The SVM allows a high amount of error in classification | c) None of the above | Answer: A
49 We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization?
1. We do feature normalization so that no single feature dominates the others.
2. Sometimes, feature normalization is not feasible in the case of categorical variables.
3. Feature normalization always helps when we use the Gaussian kernel in SVM.
a) 1 | b) 1 and 2 | c) 1 and 3 | d) 2 and 3 | Answer: B
50 Support vectors are the data points that lie closest to the decision surface.
a) TRUE | b) FALSE | Answer: A
51 Which of the following is not supervised learning?
a) PCA | b) Decision Tree | c) Naive Bayesian | d) Linear regression | Answer: A
52 Suppose you are using an RBF kernel in SVM with a high gamma value. What does this signify?
a) The model would consider even far-away points from the hyperplane for modeling | b) The model would consider only the points close to the hyperplane for modeling | c) The model would not be affected by the distance of points from the hyperplane for modeling | d) None of the above | Answer: B
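The RBF kernel scores similarity as exp(-gamma * d^2), where d is the distance between two points, so gamma controls how quickly a point's influence decays with distance. The plain-Python sketch below uses illustrative distances. It shows that with a large gamma the similarity of anything but very close points collapses to almost zero, which is why a high-gamma model is driven only by nearby points.

```python
import math

def rbf_kernel(distance, gamma):
    """Gaussian RBF similarity exp(-gamma * d^2) for a Euclidean distance d."""
    return math.exp(-gamma * distance ** 2)

for gamma in (0.1, 10.0):
    sims = [round(rbf_kernel(d, gamma), 4) for d in (0.1, 1.0, 3.0)]
    print(gamma, sims)  # with gamma=10, distant points contribute ~0
```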
53 The Gaussian Naïve Bayes classifier models a ___________ distribution.
a) Continuous | b) Discrete | c) Binary | Answer: A
54 If I am using all features of my dataset and I achieve 100% accuracy on my training set, but ~70% on the validation set, what should I look out for?
a) Underfitting | b) Nothing, the model is perfect | c) Overfitting | Answer: C
55 What is the purpose of performing cross-validation?
a) To assess the predictive performance of the models | b) To judge how the trained model performs outside the sample on test data | c) Both A and B | Answer: C
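Cross-validation repeatedly holds out one part of the data for testing and trains on the rest, which serves both purposes in question 55. Below is a minimal sketch of unshuffled k-fold splitting. The helper is hypothetical, but its fold-size rule (the first n mod k folds get one extra sample) mirrors scikit-learn's `KFold` with `shuffle=False`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Everything outside the current fold is used for training
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

for train_idx, test_idx in k_fold_indices(7, 3):
    print(test_idx)
# [0, 1, 2]
# [3, 4]
# [5, 6]
```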
56 Which of the following is true about Naive Bayes?
a) Assumes that all the features in a dataset are equally important | b) Assumes that all the features in a dataset are independent | c) Both A and B | d) None of the above options | Answer: C
57 Suppose you are using a linear SVM classifier on a 2-class classification problem. You have been given the following data, in which some points are circled red; these represent the support vectors. If you remove any one of the red points from the data, will the decision boundary change?
a) Yes | b) No | Answer: A | Image: svm.jpg
58 Linear SVMs have no hyperparameters that need to be set by cross-validation.
a) TRUE | b) FALSE | Answer: B
59 For the given weather data, what is the probability that players will play if the weather is sunny?
a) 0.5 | b) 0.26 | c) 0.73 | d) 0.6 | Answer: D | Image: weather data.jpg
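The weather table appears only as an image (weather data.jpg), so the counts below are an assumption. They are taken from the classic 14-day play/weather table: 9 "Yes" days and 5 "No" days, with 3 "Yes" and 2 "No" among the Sunny days. These counts are consistent with the keyed answers to questions 43, 59 and 127. The sketch applies Bayes' theorem to them.

```python
# Counts ASSUMED from the classic 14-day weather/"play" table
# (9 Yes, 5 No; Sunny: 3 Yes, 2 No); the image itself is not reproduced here.
total = 14
play_yes, play_no = 9, 5
sunny_yes, sunny_no = 3, 2

p_yes = play_yes / total                     # P(Yes)
p_no = play_no / total                       # P(No)
p_sunny = (sunny_yes + sunny_no) / total     # P(Sunny)
p_sunny_given_yes = sunny_yes / play_yes     # P(Sunny | Yes)

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny

print(round(p_yes, 2), round(p_no, 2), round(p_yes_given_sunny, 2))
# 0.64 0.36 0.6
```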
60 100 people are at a party. The given data shows how many wear pink or not, and whether each guest is a man or not. Imagine a pink-wearing guest leaves: what is the probability that it was a man?
a) 0.4 | b) 0.2 | c) 0.6 | d) 0.45 | Answer: B | Image: man.jpg
61 Problem: Players will play if the weather is sunny. Is this statement correct?
a) TRUE | b) FALSE | Answer: A | Image: weather data.jpg
62 For the given weather data, calculate the probability of playing.
a) 0.4 | b) 0.64 | c) 0.29 | d) 0.75 | Answer: B | Image: weather data.jpg
63 For the given weather data, calculate the probability of not playing.
a) 0.4 | b) 0.64 | c) 0.36 | d) 0.5 | Answer: C | Image: weather data.jpg
64 For the given weather data, what is the probability that players will play if the weather is sunny?
a) 0.5 | b) 0.26 | c) 0.73 | d) 0.6 | Answer: D | Image: weather data.jpg
65 100 people are at a party. The given data shows how many wear pink or not, and whether each guest is a man or not. Imagine a pink-wearing guest leaves: what is the probability that it was a man?
a) 0.4 | b) 0.2 | c) 0.6 | d) 0.45 | Answer: B | Image: man.jpg
66 100 people are at a party. The given data shows how many wear pink or not, and whether each guest is a man or not. Imagine a pink-wearing guest leaves: was it a man?
a) TRUE | b) FALSE | Answer: A | Image: man.jpg
67 What do you mean by generalization error in terms of the SVM?
a) How far the hyperplane is from the support vectors | b) How accurately the SVM can predict outcomes for unseen data | c) The threshold amount of error in an SVM | Answer: B
68 What do you mean by a hard margin?
a) The SVM allows very low error in classification | b) The SVM allows a high amount of error in classification | c) None of the above | Answer: A
69 The minimum time complexity for training an SVM is O(n²). According to this fact, what sizes of datasets are not best suited for SVMs?
a) Large datasets | b) Small datasets | c) Medium sized datasets | d) Size does not matter | Answer: A
70 The effectiveness of an SVM depends upon:
a) Selection of Kernel | b) Kernel Parameters | c) Soft Margin Parameter C | d) All of the above | Answer: D
71 Support vectors are the data points that lie closest to the decision surface.
a) TRUE | b) FALSE | Answer: A
72 SVMs are less effective when:
a) The data is linearly separable | b) The data is clean and ready to use | c) The data is noisy and contains overlapping points | Answer: C
73 Suppose you are using an RBF kernel in SVM with a high gamma value. What does this signify?
a) The model would consider even far-away points from the hyperplane for modeling | b) The model would consider only the points close to the hyperplane for modeling | c) The model would not be affected by the distance of points from the hyperplane for modeling | d) None of the above | Answer: B
74 The cost parameter in the SVM means:
a) The number of cross-validations to be made | b) The kernel to be used | c) The tradeoff between misclassification and simplicity of the model | d) None of the above | Answer: C
75 If I am using all features of my dataset and I achieve 100% accuracy on my training set, but ~70% on the validation set, what should I look out for?
a) Underfitting | b) Nothing, the model is perfect | c) Overfitting | Answer: C
76 Which of the following are real-world applications of the SVM?
a) Text and Hypertext Categorization | b) Image Classification | c) Clustering of News Articles | d) All of the above | Answer: D
77 Suppose you have trained an SVM with a linear decision boundary. After training the SVM, you correctly infer that your SVM model is underfitting. Which of the following options would you be more likely to consider when iterating on the SVM next time?
a) You want to increase your data points | b) You want to decrease your data points | c) You will try to calculate more variables | d) You will try to reduce the features | Answer: C
78 We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization?
1. We do feature normalization so that no single feature dominates the others.
2. Sometimes, feature normalization is not feasible in the case of categorical variables.
3. Feature normalization always helps when we use the Gaussian kernel in SVM.
a) 1 | b) 1 and 2 | c) 1 and 3 | d) 2 and 3 | Answer: B
79 Linear SVMs have no hyperparameters that need to be set by cross-validation.
a) TRUE | b) FALSE | Answer: B
80 In a real problem, you should check to see if the SVM is separable and then include slack variables if it is not separable.
a) TRUE | b) FALSE | Answer: B
MCQs on Unit 1 (One mark questions)

1. A system which performs the following tasks: taking input, processing input and providing output, can be called:

a. Adaptive System
b. Classic System- answer
c. Reinforced Learning
d. None of the above

2. A system which performs the following tasks: taking input, processing input, providing output and tuning parameters through feedback from the environment, can be termed:

a. Adaptive System- answer
b. Classic System.
c. Non Adaptive System
d. None of the above

3. Classification and Regression techniques fall under the category of

a. Supervised Learning- answer
b. Unsupervised Learning
c. Semi-supervised Learning
d. Reinforcement Learning

4. Supervised Learning algorithms are accompanied by both Input and Expected Output?

a. True- answer
b. False

5. Linear Regression, Random Forest , SVM are examples of

a. Supervised Learning- answer
b. Unsupervised Learning
c. Semi-Supervised Learning
d. Reinforcement Learning

6. Decision Tree algorithm can work on

a. Only Categorical values
b. Only Continuous values
c. Both Categorical and Continuous values- answer
d. None of the above
7. If the input and output variables are continuous in nature, which technique is more preferred?

a. Regression- answer
b. Classification
c. Association Rule mining
d. All of these

8. k-NN algorithm does more computation on ‘test’ time rather than ‘train’ time.

a. True- answer
b. False

9. Which of the following distance metric can be used in k-NN?

a. Manhattan
b. Minkowski
c. Jaccard
d. Mahalanobis
e. All can be used- answer
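Most of the metrics in question 9 are simple to compute. In the plain-Python sketch below (hypothetical helpers), Minkowski distance covers Manhattan (p = 1) and Euclidean (p = 2) as special cases, and Jaccard distance compares sets of attributes. Mahalanobis distance also needs a covariance matrix and is omitted here.

```python
def minkowski(a, b, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two sets of attributes."""
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)

p, q = (0, 0), (3, 4)
print(minkowski(p, q, 1))   # Manhattan: 7.0
print(minkowski(p, q, 2))   # Euclidean: 5.0
print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```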

10. Which of the following machine learning algorithm can be used for imputing missing values
of both categorical and continuous variables?

a. K-NN- answer
b. Linear Regression
c. Logistic Regression
d. Decision Tree

11. Which of the following algorithms is NOT an example of an ensemble learning algorithm?

a. Random Forest
b. Adaboost
c. Gradient Boosting
d. Decision Trees- answer

12. Spam detection, pattern detection, NLP are examples of

a. Semi-supervised learning.
b. Supervised Learning- answer
c. Unsupervised Learning
d. All of these

13. Clustering technique & Association rule mining are examples of

a. Supervised Learning
b. Semi-supervised Learning
c. Unsupervised Learning- answer
d. Reinforcement Learning

14. Unsupervised Learning algorithms are accompanied by both Input and Expected Output?

a. True
b. False (Only Input) - answer

15. K-Means technique is an example of

a. Clustering- answer
b. Classification
c. Regression
d. Association

16. Which of the following is/are types of clustering

a. Centroid-based Clustering
b. Density-based Clustering
c. Hierarchical Clustering
d. All of the above- answer

17. Learning algorithms that use both labelled and unlabelled data can be categorised as

a. Supervised Algorithms
b. Unsupervised Algorithms
c. Semi-supervised Algorithms- answer
d. Reinforcement Learning

18. Reinforcement learning is particularly efficient when the environment is NOT completely deterministic.

a. True- answer
b. False

19. When the number of output classes is greater than two, which strategy (or strategies) can be used to handle them?

a. One-vs-All
b. One-vs-One
c. Both of them- answer
d. None of the above
20. In One-vs-All strategy how many classifiers are trained for n classes

a. 1
b. n- answer
c. n/2
d. None of the above

21. In One-vs-One strategy how many classifiers are trained for n classes

a. 1
b. n
c. n*(n-1)/2- answer
d. n/2
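The classifier counts in questions 20 and 21 follow from how the binary sub-problems are formed. This small sketch with four hypothetical class labels shows both strategies:

```python
from itertools import combinations

def one_vs_all_pairs(classes):
    """One binary problem per class: that class vs. all the others."""
    return [(c, "rest") for c in classes]

def one_vs_one_pairs(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))

classes = ["cat", "dog", "bird", "fish"]
print(len(one_vs_all_pairs(classes)))   # n = 4
print(len(one_vs_one_pairs(classes)))   # n*(n-1)/2 = 6
```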

22. When the model isn't able to capture the dynamics shown by the same training set, such a situation is called

a. Underfitting- answer
b. Overfitting
c. Normal fitting
d. Regularization

23. When the model can associate almost perfectly all the known samples to the corresponding
output values, but when an unknown input is presented, the corresponding prediction error
can be very high, such situation is called as

a. Underfitting
b. Overfitting- answer
c. Normal fitting
d. None of these

24. The formula given below is to calculate_____________

a. Posterior Probability in Naïve Bayes Classifier- answer
b. Prior Probability in Naïve Bayes Classifier
c. Entropy in Decision Tree classifier
d. None of the above

25. The following formula is used to calculate __________

a. Information Gain
b. Entropy- answer
c. Probability of an event
d. None of the above
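The formula image for question 25 is missing. If it is the usual Shannon entropy used in decision trees, H(S) = -Σ p_i log2(p_i), a minimal sketch of it is:

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    # Empty classes contribute nothing (0 * log 0 is taken as 0)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))   # 0.94
print(entropy([7, 7]))             # 1.0 (evenly mixed node)
print(entropy([14, 0]))            # 0.0 (pure node)
```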

26. Which algorithm is not a type of Parametric Learning?

a. Logistic Regression
b. Naïve Bayes
c. K-Nearest Neighbors- answer
d. Simple Neural Networks

27. What is Machine learning?

a. The autonomous acquisition of knowledge through the use of computer programs- answer
b. The autonomous acquisition of knowledge through the use of manual programs
c. The selective acquisition of knowledge through the use of computer programs
d. The selective acquisition of knowledge through the use of manual programs

28. Which of the following is not a factor that affects the performance of a learner system?

a. Representation scheme used
b. Training scenario
c. Type of feedback
d. Good data structures- answer

29. Which system is based on static or permanent structures?

a. Adaptive system
b. Non-adaptive system- answer
c. Both
d. None of the above

30. Which is not a type of supervised learning algorithm?

a. K-Nearest Neighbor
b. Decision Tree
c. K-means- answer
d. Linear Regression

31. From following, which are the approaches to Machine Learning?

a. Supervised Learning
b. Unsupervised Learning
c. Reinforcement Learning
d. All of the above- answer

32. In which type of Learning, both features and labels are given to an algorithm?

a. Supervised Learning- answer
b. Unsupervised Learning
c. Reinforcement Learning
d. None of the above

33. In which type of learning, the algorithm maps input variable to output variable?

a. Supervised Learning- answer
b. Unsupervised Learning
c. Reinforcement Learning
d. None of the above

34. Which is not a type of Supervised Learning?

a. Classification
b. Regression
c. Clustering- answer
d. None of the above

35. Which approach should be used for e-mail spam filtering?

a. Classification- answer
b. Clustering
c. Regression
d. Association

36. Which approach should be used to predict the sales of a supermarket?

a. Classification
b. Clustering
c. Regression- answer
d. Association

37. In which learning technique, the system discovers patterns from dataset?

a. Supervised Learning
b. Unsupervised Learning- answer
c. Reinforcement Learning
d. None of the above
38. In which type of learning, the problem can be solved without knowing labels?

a. Supervised Learning
b. Unsupervised Learning- answer
c. All of the above
d. None of the above

39. Which type of problem discovers groups of data based on similarities?

a. Clustering- answer
b. Association
c. Regression
d. None of the above

40. Which type of problem discovers rules to describe large data?

a. Clustering
b. Association- answer
c. Regression
d. None of the above

41. From the following, which is best suited to build a game of chess?

a. Supervised Learning
b. Unsupervised Learning
c. Deep Learning- answer
d. None of the above

42. In which type, rewards and punishments are given as a feedback?

a. Supervised Learning
b. Unsupervised Learning
c. Reinforcement Learning- answer
d. None of the above

43. Which approach should be used for automatic labelling?

a. Supervised Learning
b. Unsupervised Learning- answer
c. Reinforcement Learning
d. None of the above

44. From the options, which application should you solve with deep learning for the best performance?

a. Spam filtering
b. Image classification- answer
c. Sales prediction
d. Automatic labelling
45. A neural network model is said to be inspired by the human brain. Which of the following statement(s) correctly represents a real neuron?

a. A neuron has a single input and a single output only
b. A neuron has multiple inputs but a single output only
c. A neuron has a single input but multiple outputs
d. All of the above statements are valid- answer

46. What is unsupervised learning?

a. Features of group explicitly stated
b. Number of groups may be known
c. Neither feature & nor number of groups is known- answer
d. None of the mentioned

47. Which is not a correct statement with respect to Deep Learning?

a. Large computing power is required
b. Less complex than machine learning- answer
c. Difficulty in interpreting the resulting models
d. Requires large amount of labelled data

48. Which algorithm is not a type of Non-parametric learning?

a. Naïve bayes- answer
b. C4.5
c. K-Nearest Neighbor
d. Support Vector Machines

49. In which type, the training data is modelled very well?

a. Underfitting
b. Overfitting- answer
c. Both
d. Neither a nor b

50. Which model gives poor performance on training data?

a. Underfitting- answer
b. Overfitting
c. Both
d. None of the above
Unit 1: Two marks questions
1. The goal(s) of the supervised learning system is (are) ___________
a. Training a system that must also work with samples never seen before.
b. To allow the model to develop a generalization ability and avoid a common problem
called over fitting
c. Supervisor: to provide the agent with a precise measure of its error
d. All of the above- answer

2. Identify the type of model for the given problem

a. Reinforcement learning
b. Supervised learning- answer
c. Unsupervised learning
d. Semi-supervised learning

3. The goal(s) of classification techniques is (are) __________
a. Try to find the best separating hyperplane (in this case, it's a linear problem).
b. Reduce the number of misclassifications
c. Increasing the noise-robustness
d. All of these- answer

4. Let D be a training set of n samples, where each sample is represented by a vector X of m features, X = (x1, x2, x3, ..., xm). Consider c classes: C1, C2, ..., Cc.
A Bayesian classifier predicts that tuple X belongs to class Ci if and only if:
a. P(Ci|X) > P(Cj|X) for 1 <= j <= c, j != i. Thus we maximize P(Ci|X) - answer
b. P(Ci|X) < P(Cj|X) for 1 <= j <= c, j != i. Thus we maximize P(Ci|X)
c. P(Ci|X) > P(Cj|X) for 1 <= j <= c, j != i. Thus we maximize P(Cj|X)
d. None of the above
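In practice the maximization in option (a) is done without computing P(X), because P(X) is the same for every class. Here is a minimal sketch with made-up probabilities for two hypothetical classes C1 and C2:

```python
def predict_map(priors, likelihoods):
    """Pick the class Ci maximizing P(X|Ci) * P(Ci), proportional to P(Ci|X)."""
    scores = {c: likelihoods[c] * priors[c] for c in priors}
    # P(X) is identical for every class, so it is dropped from the comparison
    return max(scores, key=scores.get), scores

# Illustrative numbers only (hypothetical): P(Ci) and P(X|Ci)
priors = {"C1": 0.6, "C2": 0.4}
likelihoods = {"C1": 0.2, "C2": 0.5}
label, scores = predict_map(priors, likelihoods)
print(label)   # C2 (0.5*0.4 = 0.20 beats 0.2*0.6 = 0.12)
```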

5. The problem of high variance and low bias is called__________
a. Over-fitting- answer
b. Underfitting
c. Normal fitting
d. Best fitting

6. Identify the type of Machine learning approach to solve the given problems:
Decision Support System to predict the decision to play Match or not to play
a. Reinforcement learning
b. Supervised learning- answer
c. Unsupervised learning
d. Semi-supervised learning

7. Identify the type of Machine learning approach to solve the given problems:
Grouping of documents retrieved by Google Search Engine
a. Reinforcement learning
b. Supervised learning
c. Unsupervised learning- answer
d. Semi-supervised learning

8. Identify the type of Machine learning approach to solve the given problems:
System to predict price of product in next year
a. Reinforcement learning
b. Supervised learning- answer
c. Unsupervised learning
d. Semi-supervised learning

9. Identify the type of Machine learning approach to solve the given problems:
System to predict the suitable treatment
a. Reinforcement learning
b. Supervised learning
c. Unsupervised learning
d. Semi-supervised learning

10. Identify the type of Machine learning approach to solve the given problems:
System for Driverless Car
a. Reinforcement learning- answer
b. Supervised learning
c. Unsupervised learning
d. Semi-supervised learning

11. Which is true for AI, ML and DL (deep learning)?
a. AI>ML>DL- answer
b. DL>ML>AI
c. ML>AI>DL
d. DL>ML>AI
MCQs on Unit 2: Feature selection (Two marks)

1. Which statements are true when creating training and test datasets?
a. Both datasets must reflect the original distribution
b. The original dataset must be randomly shuffled before the split phase in order to avoid
correlation between consecutive elements
c. Both a and b - answer
d. None of the above

2. Which function does SK-Learn provide to create train and test data?
a. train_test_split- answer
b. test_train_split
c. TestTrainSplit
d. Split_test_train
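A short sketch of questions 1 and 2 in practice, on hypothetical toy arrays: `train_test_split` shuffles by default, and `stratify=y` keeps the class proportions of the original data in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)      # 20 toy samples, 2 features
y = np.array([0] * 10 + [1] * 10)     # balanced labels, stored in order

# shuffle=True (the default) breaks the ordering; stratify=y preserves
# the 50/50 class balance in both the training and the test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=True, stratify=y, random_state=42)

print(X_train.shape, X_test.shape)   # (15, 2) (5, 2)
```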

3. The scikit-learn LabelEncoder class:
a. Adopts a dictionary-oriented approach
b. Associates each category label with a progressive integer number
c. Uses that number as an index into an instance array called classes_
d. All of the above- answer
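The dictionary-oriented behaviour can be seen directly. A minimal sketch with made-up city labels: each label is mapped to an integer, and that integer is its index in `classes_`, which is sorted.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["Paris", "Tokyo", "Paris", "Amsterdam"])

print(le.classes_)                 # ['Amsterdam' 'Paris' 'Tokyo']
print(codes)                       # [1 2 1 0]
print(le.inverse_transform([2]))   # ['Tokyo'], i.e. le.classes_[2]
```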

4. The scikit-learn Imputer class (SimpleImputer in current releases) fills the holes using a strategy based on the:
a. mean
b. median
c. frequency (the most frequent entry)
d. All of the above- answer

5. Consider the 3-dimensional dataset given below:
   x     y     z
   1    NaN    2
   2     3    NaN
  -1     4     2
SK-Learn Imputer mean strategy will fill the missing values with:
a. 3, 2
b. 4, 2
c. 3.5, 2 - answer
d. Difficult to tell

6. Consider the 3-dimensional dataset given below:
   x     y     z
   1    NaN    2
   2     3    NaN
  -1     4     2
SK-Learn Imputer median strategy will fill the missing values with:
a. 3, 2
b. 4, 2
c. 3.5, 2 - answer
d. Difficult to tell

7. Consider the 3-dimensional dataset given below:
   x     y     z
   1    NaN    2
   2     3    NaN
  -1     4     2
SK-Learn Imputer most_frequent strategy will fill the missing values with:
a. 3, 2- answer
b. 4, 2
c. 3.5, 2
d. Difficult to tell
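Questions 5 to 7 can be checked directly. This sketch assumes a recent scikit-learn, where the old `Imputer` class has been replaced by `sklearn.impute.SimpleImputer`; the dataset is the one given above.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Same 3 x 3 dataset as questions 5-7 (columns x, y, z)
data = np.array([[1.0, np.nan, 2.0],
                 [2.0, 3.0, np.nan],
                 [-1.0, 4.0, 2.0]])

filled = {}
for strategy in ("mean", "median", "most_frequent"):
    imp = SimpleImputer(missing_values=np.nan, strategy=strategy)
    out = imp.fit_transform(data)
    # (value placed in the y hole, value placed in the z hole)
    filled[strategy] = (float(out[0, 1]), float(out[1, 2]))

print(filled)
# {'mean': (3.5, 2.0), 'median': (3.5, 2.0), 'most_frequent': (3.0, 2.0)}
```

For `most_frequent`, column y has a tie between 3 and 4; SimpleImputer documents that the smallest of the tied values is returned, which is why the answer to question 7 is 3.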

8. Which statement(s) is (are) true for SK-Learn MinMaxScaler?
a. Works well for cases when the distribution is not Gaussian
b. Works well when the standard deviation is very small
c. It is sensitive to outliers
d. All of these- answer

9. _________________ uses the interquartile range, which makes it robust to outliers.
a. MinMaxScaler
b. StandardScaler
c. RobustScaler- answer
d. None of these

10. Consider Q1 = 31 and Q3 = 119. The interquartile range (IQR) will be ______
a. 88 - answer
b. -88
c. 150
d. -150
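A sketch tying questions 8 to 10 together, on a hypothetical column with one outlier: IQR = Q3 - Q1, and RobustScaler divides by that range (25th to 75th percentile by default), while MinMaxScaler lets the outlier dictate the range and squashes every other value.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

q1, q3 = 31, 119
iqr = q3 - q1
print(iqr)   # 88

# Toy column with one outlier (100)
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# RobustScaler: (x - median) / IQR, so the outlier barely affects the scale
robust = RobustScaler().fit_transform(x).ravel()   # -1, -0.5, 0, 0.5, 48.5
# MinMaxScaler: (x - min) / (max - min), so the outlier sets the range
minmax = MinMaxScaler().fit_transform(x).ravel()   # 0, 1/99, 2/99, 3/99, 1
```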
MCQs on unit 2 (One mark question)
1) Which of the following modules contains the train_test_split() function?

A) sklearn.feature_extraction
B) sklearn.preprocessing
C) sklearn.model_selection- answer
D) sklearn.decomposition

2) What is the default value of test_size in train_test_split() when both test_size and train_size are None?

A) 0.33
B) 0.25 - answer
C) 0.50
D) 0.20

3) Which approach does the LabelEncoder class adopt?

A) Dictionary-oriented- answer
B) List-oriented
C) Tree-oriented
D) Map-oriented

4) Which hashing technique does the FeatureHasher class in scikit-learn adopt?

A) SHA256
B) MD5
C) MurmurHash 3- answer
D) BLAKE3
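A minimal FeatureHasher sketch (the `color=`/`size=` tokens are made-up category strings). scikit-learn documents that FeatureHasher uses the signed 32-bit version of MurmurHash3 to map each token to one of `n_features` columns, so no vocabulary has to be stored.

```python
from sklearn.feature_extraction import FeatureHasher

# Each sample is a list of category strings; they are hashed into 8 columns
hasher = FeatureHasher(n_features=8, input_type="string")
X = hasher.transform([["color=red", "size=L"], ["color=blue"]])

print(X.shape)       # (2, 8), returned as a SciPy sparse matrix
print(X.toarray())   # entries are +1 or -1 (alternate_sign=True by default)
```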

5) Which of the following is the best option to handle missing data?

A) Removing the whole line
B) Creating sub-model to predict those features
C) Using an automatic strategy to impute them according to the other known values - answer
D) Inserting random values

6) When performing regression or classification, which of the following is the correct way to
preprocess the data?

A) Normalize the data → PCA → training - answer
B) PCA → normalize PCA output → training
C) Normalize the data → PCA → normalize PCA output → training
D) None of the above

7) What is pca.components_ in Sklearn?

A) Set of all eigen vectors for the projection space - answer
B) Matrix of principal components
C) Result of the multiplication matrix
D) None of the above options
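To see why option A holds, this sketch (toy correlated data from an arbitrary mixing matrix) compares `pca.components_` with the leading eigenvectors of the sample covariance matrix. The rows match up to sign, and `explained_variance_` matches the largest eigenvalues, which also illustrates questions 10 and 21.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Correlated 3-D toy data (hypothetical mixing matrix)
X = rng.normal(size=(500, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.2]])

pca = PCA(n_components=2).fit(X)

# Eigen-decomposition of the sample covariance matrix
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # largest first
top_vecs = eigvecs[:, order[:2]].T       # two leading eigenvectors as rows

# Each row of components_ is one of those eigenvectors (up to sign)
print(np.abs(np.sum(pca.components_ * top_vecs, axis=1)))   # close to [1, 1]
print(pca.explained_variance_, eigvals[order[:2]])           # matching eigenvalues
```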

8) How do you handle missing or corrupted data in a dataset?

A) Drop missing rows or columns
B) Replace missing values with mean/median/mode
C) Assign a unique category to missing values
D) All of the above - answer

9) The KernelPCA class is used to perform PCA on which kind of data?

A) non-linearly separable data sets - answer
B) linearly separable data sets
C) categorical data sets
D) Heterogeneous data sets

10) Principal component analysis is a method that projects the data onto a small set of new features
(principal components) which retain the largest amount of:

A) Total covariance
B) Total variance - answer
C) Total count
D) Mean

11) In the following loss function, which parameter controls the level of sparsity?

A) xi
B) c - answer
C) D
D) αi

12) Which parameter determines the number of atoms in the scikit-learn DictionaryLearning class?

A) alpha
B) n_jobs
C) n_components - answer
D) tol

13) In KernelPCA, the default value for gamma is:

A) 1.0/number of features - answer
B) 2.0/number of features
C) 10/number of features
D) None of above

14) Non negative matrix factorization algorithm optimizes a loss function based on?

A) L1 Norm
B) Frobenius norm - answer
C) linalgnorm
D) matrix norm

15) Which of the following encoding techniques is efficient for dealing with a large number of possible
categories?

A) Effect Encoding
B) Feature Hashing
C) One Hot Encoding
D) Bin counting scheme - answer

16) Which scaling technique scales data without being affected by outliers?

A) Robust Scaling - answer
B) Min Max Scaling
C) Standardized Scaling
D) Z-score Scaling

17) Which feature selection technique uses a recursive approach?

A) Filter Methods
B) Wrapper Methods - answer
C) Embedded Methods
D) Subset Methods

18) Which of the following can be applied to a dataset with more than one dimension?

A) Mean
B) Standard Deviation
C) Covariance - answer
D) Variance

19) In sparse principal component analysis, sparse loadings can be obtained by imposing which
constraint on the regression coefficients?

A) Ridge
B) Lasso - answer
C) Linear
D) Logistic

20) What provides better statistical regularization?

A) Sparse PCA - answer
B) Kernel PCA
C) Non-negative Matrix Factorization
D) Atom Extraction

21) The eigenvector with the ____ eigenvalue is the principal component of the dataset.

A) Lowest
B) Highest - answer
C) Mean
D) Zero

22) The trace is equal to the ___ of the eigenvalues.

A) Difference
B) Sum - answer
C) Product
D) Mean
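A quick numerical check of question 22 on a small symmetric matrix chosen for illustration: the trace (sum of the diagonal) equals the sum of the eigenvalues.

```python
import numpy as np

# A small symmetric (covariance-like) matrix, chosen for illustration
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])

eigvals = np.linalg.eigvalsh(A)
print(np.trace(A))     # 12.0
print(eigvals.sum())   # approximately 12.0 (floating-point rounding)
```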

23) In which scaling technique can the upper and lower bounds be specified by the user?

A) Robust Scaling
B) Min Max Scaling - answer
C) Standardized Scaling
D) Z-score Scaling

24) Principal component analysis (PCA) can be used with variables of any mathematical types:
quantitative, qualitative, or a mixture of these types.

A) True
B) False - answer

25) Variances and covariances can be computed for variables of any mathematical types:
quantitative, qualitative, or a mixture of these types.

A) True
B) False - answer
Unit- 3: Regression (One mark)

1. A process by which we estimate the value of dependent variable on the basis of one or more
independent variables is called:
a. Correlation
b. Regression - answer
c. Residual
d. Slope
2. All data points falling along a straight line is called:
a. Linear relationship - answer
b. Non linear relationship
c. Residual
d. Scatter diagram
3. A relationship where the flow of the data points is best represented by a curve is called:
a. Linear relationship
b. Nonlinear relationship - answer
c. Linear positive
d. Linear negative

4. The value we would predict for the dependent variable when the independent variables are all
equal to zero is called:
(a) Slope
(b) Sum of residual
(c) Intercept - answer
(d) Difficult to tell

5. The predicted rate of response of the dependent variable to changes in the independent variable is
called:
(a) Slope - answer
(b) Intercept
(c) Error
(d) Regression equation
6. The slope of the regression line of Y on X is also called the:
(a) Correlation coefficient of X on Y
(b) Correlation coefficient of Y on X
(c) Regression coefficient of X on Y
(d) Regression coefficient of Y on X - answer
8. In simple linear regression, the number of unknown constants is:
(a) One
(b) Two - answer
(c) Three
(d) Four
9. In a simple regression equation, the number of variables involved is:
(a) 0
(b) 1
(c) 2 - answer
(d) 3
10. If the value of any regression coefficient is zero, then two variables are:
(a) Qualitative
(b) Correlated
(c) Dependent
(d) Independent- answer
11. In SK-Learn, LinearRegression offers two instance variables, __________ and ____________
a) intercept_ and coef_ - answer
b) Intercept and coef
c) Slope and Intercept
d) Slope and Coef
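A minimal sketch of the two fitted attributes from question 11, on noise-free toy points lying on y = 3 + 2x. It also shows the two unknown constants of simple linear regression from question 8.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free points on the line y = 3 + 2x
X = np.array([[0.0], [1.0], [2.0], [3.0]])   # 2-D: (n_samples, n_features)
y = 3.0 + 2.0 * X.ravel()

reg = LinearRegression().fit(X, y)
print(reg.intercept_)   # approximately 3.0 (the intercept)
print(reg.coef_)        # approximately [2.0] (one slope per feature)
```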
12. _________ regression adds a shrinkage penalty to the ordinary least squares loss function to
limit the squared L2 norm of the weights:

a) Lasso
b) LassoCV
c) Ridge - answer
d) ElasticNet
13. _____________ regressor imposes a penalty on the L1 norm of w to determine a potentially
higher number of null coefficients:

a) Lasso - answer
b) RidgeCV
c) Ridge
d) ElasticNet
14. A Regression approach to avoid the problem of outliers is offered by _______________
a) Linear Regression
b) Logistic Regression
c) RANSAC Regressor - answer
d) Polynomial Regressor
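A sketch of question 14 on toy data with one gross outlier. RANSACRegressor repeatedly fits its base model (LinearRegression by default) on random minimal subsets and keeps the largest consensus set; by default the inlier threshold is the median absolute deviation of y.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel()   # points on y = 1 + 2x
y[-1] = 100.0               # one gross outlier at x = 9

ols = LinearRegression().fit(X, y)
ransac = RANSACRegressor(random_state=0).fit(X, y)

print(ols.coef_)                 # slope pulled far above 2 by the outlier
print(ransac.estimator_.coef_)   # slope close to 2, fitted on inliers only
print(ransac.inlier_mask_)       # last point flagged as an outlier (False)
```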

15. Model with high variance and low bias is called_________________


a) Over-fitted model - answer
b) Under-fitted model
c) Best fitted
d) None of the above

16. ________ occurs when our model neither fits the training data nor generalizes to new data.
a) Over-fitting
b) Under-fitting - answer
c) Best fitting
d) None of the above

17. ________________ is the process of adding information in order to solve an ill-posed problem
or to prevent overfitting

a) Under-fitting
b) Regularization - answer
c) Best fitting
d) None of the above

18. ____________ selects only some features while reducing the coefficients of the others to zero.
This property is known as feature selection.

a) Lasso - answer
b) RidgeCV
c) Ridge
d) ElasticNet
19. ______ combines both Lasso and Ridge Regression into one model with two penalty factors, one
proportional to L1 norm and other proportional to L2 norm.
a) LassoCV
b) RidgeCV
c) ElasticNet - answer
d) None of the above
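A sketch contrasting the penalties in questions 12, 13, 18, 19 and 27, on synthetic data where only the first feature matters. The alpha values are illustrative. L2 (Ridge) shrinks every coefficient but leaves them non-zero, L1 (Lasso) sets irrelevant ones exactly to zero, ElasticNet mixes both through `l1_ratio`, and a very large L1 penalty zeroes everything.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] + rng.normal(scale=0.1, size=100)   # only feature 0 matters

ridge = Ridge(alpha=0.5).fit(X, y)                     # L2 penalty
lasso = Lasso(alpha=0.5).fit(X, y)                     # L1 penalty
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)   # L1 and L2 combined
huge = Lasso(alpha=100.0).fit(X, y)                    # very large L1 penalty

print(np.count_nonzero(ridge.coef_))   # 5: every coefficient stays non-zero
print(np.count_nonzero(lasso.coef_))   # 1: only feature 0 survives
print(huge.coef_)                      # all zeros
```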
20. ____________ minimizes the cost function by gradually updating the weight values.

a. Gradient Descent - answer
b. Perceptron
c. Grid search
d. None of the above
21. _______ is a technique that allows using linear models even when the dataset has strong
non-linearities. The idea is to add some extra variables computed from the existing ones, using
(in this case) only polynomial combinations.

a) Linear Regression
b) Logistic Regression
c) RANSAC Regressor
d) Polynomial Regressor - answer
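A sketch of the idea in question 21: generate the extra x² column with PolynomialFeatures, then fit an ordinary linear model on the expanded features. The quadratic target below is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-3, 3, 20).reshape(-1, 1)
y = 1.0 - 2.0 * x.ravel() + 0.5 * x.ravel() ** 2   # a quadratic, not a line

# Add the extra column x^2, then fit a model that is linear in its parameters
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression()).fit(x, y)

lin = model.named_steps["linearregression"]
print(lin.intercept_, lin.coef_)   # approximately 1.0 and [-2.0, 0.5]
print(model.score(x, y))           # R^2 of about 1.0
```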
22. The Regression technique that uses sigmoid function is called________________
a) Linear Regression
b) Logistic Regression - answer
c) RANSAC Regressor
d) Polynomial Regressor

23. Confusion Matrix can be used to measure the performance of _______________ model.
a) Linear Regression
b) Logistic Regression - answer
c) RANSAC Regressor
d) Polynomial Regressor
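A confusion-matrix sketch for question 23, using hypothetical labels (for example, thresholded logistic regression outputs). The same counts give the TPR and FPR that an ROC curve plots (question 41).

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical classifier output

cm = confusion_matrix(y_true, y_pred)   # rows: actual class, columns: predicted
tn, fp, fn, tp = cm.ravel()
print(cm)               # [[3 1]
                        #  [1 3]]
tpr = tp / (tp + fn)    # true positive rate (recall)
fpr = fp / (fp + tn)    # false positive rate
print(tpr, fpr)         # 0.75 0.25
```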
24. The residual is defined as the difference between the:
a) actual value of y and the estimated value of y - answer
b) actual value of x and the estimated value of x
c) actual value of y and the estimated value of x
d) actual value of x and the estimated value of y

25)Which of the following methods do we use to find the best fit line for data in Linear
Regression?
A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Answer:(A)

26)True- False: Overfitting is more likely when you have a huge amount of data to train.
A) TRUE
B) FALSE
Solution: (B)

27) What will happen when you apply very large penalty in the case of Lasso?
A) Some of the coefficients will become zero
B) Some of the coefficients will be approaching to zero but not absolute zero
C) Both A and B depending on the situation
D) None of these
Solution: (A)

28) Generally, which of the following method(s) is used for predicting continuous dependent
variable?
1. Linear Regression 2. Logistic Regression
A) 1 and 2
B)only 1
C)only 2
D)None of these
Solution:(B)

29) The full form of ROC is:
A)Regression Operation Characteristics Curve
B)Receiver Operating Characteristics Curve
C)Regression Operating Characteristics Curve
D)Ridge Operation Characteristics Curve
Solution:(B)

30) F score is given by:

A)F=2*(precision+recall)/precision*recall
B)F=(precision+recall)/precision*recall
C)F=2*(precision*recall)/(precision+recall)
D)F=precision+recall
Solution:(C)

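A check of the F-score formula (the harmonic mean of precision and recall) against scikit-learn's `f1_score`, on hypothetical labels chosen so that precision and recall differ.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]   # TP = 3, FN = 1, FP = 2, TN = 2

precision = precision_score(y_true, y_pred)   # 3 / (3 + 2) = 0.6
recall = recall_score(y_true, y_pred)         # 3 / (3 + 1) = 0.75
f = 2 * (precision * recall) / (precision + recall)

print(round(f, 4), round(f1_score(y_true, y_pred), 4))   # 0.6667 0.6667
```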
31)Which is L1 regression
A)Lasso
B)Ridge
C)polynomial
D)Isotonic
Answer A

32)Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature
selection?
A) Ridge regression uses subset selection of features
B)Lasso regression uses subset selection of features
C)Both use subset selection of features
D)None of the above
Solution:(B)

33) SSE can never be:
(A) larger than SST
(B) smaller than SST
(C)equal to 1
(D)equal to zero
Solution:(A)

34) Which of the following is correct about regularized regression?
a) Can help with bias trade-off
b) Cannot help with model selection
c) Cannot help with variance trade-off
d) All of the mentioned
Solution:(A)

35) Which of the following statement is true about outliers in Linear regression?
A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
Solution: (A)

36) What do you expect will happen with bias and variance as you increase the size of training
data?
A) Bias increases and Variance increases
B) Bias decreases and Variance increases
C) Bias decreases and Variance decreases
D) Bias increases and Variance decreases
Solution: (D)

37) The Pearson correlation between two variables is zero, but their values can still be related
to each other.
A) TRUE
B) FALSE
Solution: (A)

38) Which of the following statement(s) is/are true for Gradient Descent (GD) and Stochastic
Gradient Descent (SGD)?
1. In GD and SGD, you update a set of parameters in an iterative manner to minimize the
error function.
2. In SGD, you have to run through all the samples in your training set for a single update of
a parameter in each iteration.
3. In GD, you either use the entire data or a subset of training data to update a parameter in
each iteration.
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
Solution:(A)
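A NumPy sketch of the difference behind question 38 (and two-mark question 10), for a hypothetical linear model y = w·x + b with MSE loss on noise-free toy data. Batch GD uses all samples for every update. SGD updates after each single sample, so each update is cheaper.

```python
import numpy as np

# Toy data on the line y = 1 + 2x (hypothetical, noise-free)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x

def batch_gd(x, y, lr=0.1, iters=2000):
    """Each update uses the MSE gradient computed over ALL samples."""
    w = b = 0.0
    for _ in range(iters):
        err = (w * x + b) - y
        w -= lr * 2.0 * np.mean(err * x)
        b -= lr * 2.0 * np.mean(err)
    return w, b

def sgd(x, y, lr=0.1, epochs=200, seed=0):
    """Each update uses the gradient of ONE sample, visited in random order."""
    rng = np.random.default_rng(seed)
    w = b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            err = (w * x[i] + b) - y[i]
            w -= lr * 2.0 * err * x[i]
            b -= lr * 2.0 * err
    return w, b

print(batch_gd(x, y))   # both approach w = 2.0, b = 1.0
print(sgd(x, y))
```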

39) When hypothesis tests and confidence limits are to be used, the residuals are assumed
to follow the __________distribution.
A) Formal
B) Mutual
C) Normal
D) Abnormal
Solution:(C)

40) The error due to simplistic assumptions made by the model in fitting the data is called:
A)variance
B)bias
C)MSE
D)none of these
Solution:(B)

41) ROC curves show the trade-off between which parameters?
A)TPR and FPR
B)TNR And TPR
C)FPR and TNR
D)FPR and FNR
Solution:(A)
42)The accuracy of the model can be measured by
A)The area above ROC curve
B)The area under ROC curve
C)All of the above
D)None of the above
Solution:(B)

43) Least square method calculates the best-fitting line for the observed data by minimizing the sum
of the squares of the _______ deviations.
a) Vertical
b) Horizontal
c) Both of these
d) None of these
Solution:(A)
Unit-3 (Two marks)
1. The regression line yhat = 3 + 2x has been fitted to the data points (4,8), (2,5), and (1,2). The
residual sum of squares will be:
a) 10
b) 15
c) 13
d) 22 - answer
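The residual sum of squares for question 1, worked step by step: predict with ŷ = 3 + 2x, subtract the predictions from the observed y, square, and sum.

```python
import numpy as np

x = np.array([4.0, 2.0, 1.0])
y = np.array([8.0, 5.0, 2.0])

y_hat = 3.0 + 2.0 * x          # fitted values: 11, 7, 5
residuals = y - y_hat          # -3, -2, -3
rss = np.sum(residuals ** 2)   # 9 + 4 + 9
print(rss)                     # 22.0
```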
2. Suppose you have trained a logistic regression classifier and, for a new example x, it outputs the
prediction hθ(x) = 0.2. This means:
a. Our estimate for P(y=1 | x)
b. Our estimate for P(y=0 | x) - answer
c. Our estimate for P(y=1 | x)
d. Our estimate for P(y=0 | x)

3. A regression analysis between sales (in $1000) and advertising (in $100) resulted in the following
least squares line: yhat = 75 +6x. This implies that if advertising is $800, then the predicted amount
of sales (in dollars) is:
a. $4875
b. $123,000 - answer
c. $487,500
d. $12,300
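The units are the trap in question 3: x is measured in hundreds of dollars and ŷ in thousands of dollars. Converting explicitly:

```python
# yhat = 75 + 6x, with x in units of $100 of advertising and yhat in units of $1000 of sales
advertising_dollars = 800
x = advertising_dollars / 100   # 8 units of $100
y_hat = 75 + 6 * x              # 123 units of $1000
sales_dollars = y_hat * 1000
print(sales_dollars)            # 123000.0
```

Plugging in x = 800 directly gives 75 + 4800 = 4875, which ignores both unit conversions.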
4. The value for SSE equals zero. This means that the coefficient of determination (r^2) must equal:
a. 0.0.
b. -1.0.
c. 2.3.
d. 1.0 - answer

5. Below equation shows the loss function of ____________________

a) Logistic Regression Model
b) Linear Regression Model - answer
c) Gaussian Naïve Bayes Model
d) Polynomial Model
6. For the given results of a recently conducted study on the correlation of the number of hours spent
driving with the risk of developing acute backache. The Intercept of the line is_______.

a) 12.58 - answer
b) 10.58
c) 11.85
d) 10.85

7. For the given results of a recently conducted study on the correlation of the number of hours spent
driving with the risk of developing acute backache. The slope of the line is_______.

a) 4.59 - answer
b) 10.58
c) 5.85
d) 10.85

8. For the given vectors of outputs, the mean squared error is ________.
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

a) 0.45
b) 0.375 - answer
c) 0.56
d) None of the above
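The mean squared error for question 8, computed by hand and with scikit-learn's `mean_squared_error`:

```python
from sklearn.metrics import mean_squared_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

# Squared errors: 0.25, 0.25, 0, 1 -> sum 1.5 -> mean 1.5 / 4
manual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual, mean_squared_error(y_true, y_pred))   # 0.375 0.375
```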

9) The correct relationship between SST, SSR, and SSE is given by:
a) SSR = SST + SSE
b) SST = SSR + SSE
c) SSE = SSR – SST
d) all of the above
Solution:(B)
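A numerical check of SST = SSR + SSE for a least-squares line with an intercept, on made-up data. The same quantities give r² = SSR/SST = 1 - SSE/SST, which is why SSE = 0 implies r² = 1 (question 4).

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares line
y_hat = intercept + slope * x

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression (explained) sum of squares
sse = np.sum((y - y_hat) ** 2)          # error (residual) sum of squares

print(np.isclose(sst, ssr + sse))   # True
print(1 - sse / sst)                # coefficient of determination r^2
```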

10)Stochastic gradient descent performs less computation per update than batch gradient descent.
A)True
B)False
Solution:(A)

11) A parameter that is external to the model and whose value cannot be estimated from data is called:
A)Hyperparameter
B)Model Parameter
C)Outlier
D)Regularization constant
Solution:(A)

12) Which strategy is used for tuning hyperparameters?
A)Gradient Descent
B)Feature Scaling
C)Regularization
D)Grid Search
Solution:(D)
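A grid-search sketch for questions 11 and 12, using Ridge's `alpha` as the example hyperparameter on a synthetic regression problem. `alpha` is fixed before training, while `coef_` holds model parameters learned by `fit`. The grid values are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Try each alpha with 5-fold cross-validation and keep the best one
search = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)

print(search.best_params_)                  # the alpha with the best mean CV score
print(search.best_estimator_.coef_.shape)   # (10,) parameters learned by fit
```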

13) Which is another term for true positive rate?
A)precision
B)Recall
C)Specificity
D)Fscore
Solution:(B)

14)The most widely used metrics and tools to assess a classification model are:
A)Confusion matrix
B)Cost-sensitive accuracy
C)Area under the ROC curve
D)All of the above
Solution:(D)

15) The regularization term in ridge regression is:
A) λ (sum of the absolute value of coefficients)
B) λ (sum of the square of coefficients)
C)λ square
D)None on these
Solution:(B)

16) In practice, the line of best fit or regression line is found when _____________
a) Sum of residuals (Σ(Y – h(X))) is minimum
b) Sum of the absolute value of residuals (Σ|Y-h(X)|) is maximum
c) Sum of the square of residuals (Σ(Y-h(X))^2) is minimum
d) Sum of the square of residuals (Σ(Y-h(X))^2) is maximum
Solution:(C)
Unit- 4 : Naïve Bayes and SVM
(one mark)
1. Naive bayes falls under which category-
a. Unsupervised classification learning
b. Supervised classification learning
c. Semi- supervised classification learning
d. Reinforcement learning
Ans - b
2. What machine learning task is the Naive Bayes algorithm used for?
a. dimensionality reduction
b. clustering
c. classification
d. regression
Ans - c
3. Naive Bayes assumption about data is-
a. input is independent, conditional on the output label.
b. input is dependent, conditional on the output label.
c. input is independent, not conditional on the output label.
d. input is dependent, not conditional on the output label.
Ans - a

4. Bayes rule:
a. P(A |B) = P(B|A) .P(B) / P(A)
b. P(A |B) = P(B|A) .P(A) / P(B)
c. P(A |B) = P(B|A) .P(A)
d. P(A |B) = P(B|A) .P(B)
Ans - b

5. Which is not a main type of naive bayes classifier -
a. Bernoulli naive bayes
b. Multinomial naive bayes
c. Gaussian naive bayes
d. Complement Naive bayes
Ans - d

6. Which type of naive bayes classifier is suited for imbalanced datasets -
a. Bernoulli naive bayes
b. Multinomial naive bayes
c. Gaussian naive bayes
d. Complement Naive bayes
Ans - d

7. Which type of naive bayes classifier is best suited for document classification problem -
a. Bernoulli naive bayes
b. Multinomial naive bayes
c. Gaussian naive bayes
d. Complement Naive bayes
Ans - b

8. Which type of naive bayes classifier is usually used for yes/no type boolean predictors-
a. Bernoulli naive bayes
b. Multinomial naive bayes
c. Gaussian naive bayes
d. Complement Naive bayes
Ans - a
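A toy sketch of questions 7 and 8 on a made-up four-document corpus. MultinomialNB works on word counts, which is the usual choice for documents. BernoulliNB works on present/absent (yes/no) features.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy corpus
docs = ["free prize money", "win free money now",
        "meeting schedule today", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()
counts = vec.fit_transform(docs)   # word counts -> MultinomialNB
multi = MultinomialNB().fit(counts, labels)

binary = (counts > 0).astype(int)  # word present / absent -> BernoulliNB
bern = BernoulliNB().fit(binary, labels)

new = vec.transform(["free money"])
print(multi.predict(new), bern.predict((new > 0).astype(int)))   # ['spam'] ['spam']
```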

9. Naive Bayes is termed as 'Naive' because it assumes-
a. Dependence between every pair of feature in the data.
b. It is multiclass classifier
c. It is not multiclass classifier
d. Independence between every pair of feature in the data.
Ans- d
10. SVM Classifiers and Linear Classifiers are strictly:
a. Probabilistic Binary Linear Classifier
b. Probabilistic Multiclass classifier
c. Non Probabilistic Binary Linear Classifier
d. Non Probabilistic Multiclass classifier
Ans - c
11. SVM falls under which category-
a. Unsupervised classification learning
b. Supervised classification learning
c. Semi- supervised classification learning
d. Reinforcement learning
Ans - b
12. The effectiveness of an SVM depends upon:
a. Selection of Kernel
b. Kernel Parameters
c. Soft Margin Parameter C
d. All of the above
Ans- D
13. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important
b. Assumes that all the features in a dataset are independent
c. Both A and B - answer
d. None of the above options

(Two marks)
1. A jar contains marbles of several different colors: 1 red, 2 green, 4 blue, and 8 yellow.
All the marbles are the same size and shape. If Peter takes out a marble from the jar without
looking, what is the probability that he will NOT choose a yellow marble?
a. 7/15
b. 8/15
c. 7/8
d. 5/8
Ans- a

2. If we train a Naive Bayes classifier using infinite training data that satisfies all of its modeling
assumptions, then in general, what can we say about the training error (error in training data) and
test error (error in held-out test data)?
a. It may not achieve either zero training error or zero test error
b. It will always achieve zero training error and zero test error.
c. It will always achieve zero training error but may not achieve zero test error.
d. It may not achieve zero training error but will always achieve zero test error.
Ans - a

3. If P(A) = 0.10, P(B) = 0.05 and P(B|A) = 7% (0.07), find P(A|B).
a. 0.35
b. 0.34
c. 0.14
d. 0.15
Ans - c
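Worked numbers for two-mark questions 1 and 3: a simple counting probability, then Bayes' rule P(A|B) = P(B|A)·P(A)/P(B).

```python
from fractions import Fraction

# Question 1: marbles (1 red, 2 green, 4 blue, 8 yellow)
counts = {"red": 1, "green": 2, "blue": 4, "yellow": 8}
total = sum(counts.values())                        # 15
p_not_yellow = Fraction(total - counts["yellow"], total)
print(p_not_yellow)                                 # 7/15

# Question 3: Bayes' rule
p_a, p_b, p_b_given_a = 0.10, 0.05, 0.07
p_a_given_b = p_b_given_a * p_a / p_b               # 0.07 * 0.10 / 0.05
print(round(p_a_given_b, 2))                        # 0.14
```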
4. Which method is provided by scikit learn to tackle large scale classification for which full training
set might not fit in memory-
a. Memory_manage method
b. Partial_manage method
c. Partial_fit method
d. None of the above
Ans - c
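A sketch of out-of-core training with `partial_fit` (question 4), using GaussianNB on a synthetic dataset streamed in chunks of 100 rows. The full list of classes must be passed on the first call, because a later chunk might not contain every label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
classes = np.unique(y)   # all labels must be declared on the first call

clf = GaussianNB()
for start in range(0, len(X), 100):   # stream the data in chunks of 100
    X_chunk, y_chunk = X[start:start + 100], y[start:start + 100]
    if start == 0:
        clf.partial_fit(X_chunk, y_chunk, classes=classes)
    else:
        clf.partial_fit(X_chunk, y_chunk)

full = GaussianNB().fit(X, y)   # same model trained on everything at once
print(np.allclose(clf.theta_, full.theta_))   # per-class means agree: True
```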
5. If I am using all features of my dataset and I achieve 100% accuracy on my training set, but ~70%
on validation set, what should I look out for?
a. Underfitting
b. Nothing, the model is perfect
c. Overfitting
d. None of the above
Ans- C

6. What is/are true about the kernel in SVM?
1. Kernel function maps low dimensional data to high dimensional space
2. It’s a similarity function
a. 1
b. 2
c. 1 and 2
d. None of these
Ans- C

7. The performance of SVM depends on which factors?
a. the number of training instances
b. the distribution of the data
c. linear vs. non-linear problems
d. input scale of the features
e. All of the above
Ans - e

8. What do you mean by generalization error in terms of the SVM?
a. How far the hyperplane is from the support vectors
b. How accurately the SVM can predict outcomes for unseen data
c. How much you want to avoid misclassification of each training example
d. How far the influence of a single training example reaches.
Ans- b

9. What does the regularisation parameter (C) tell us in SVM?
a. How far the hyperplane is from the support vectors
b. How accurately the SVM can predict outcomes for unseen data
c. How much you want to avoid misclassification of each training example
d. How far the influence of a single training example reaches.
Ans - c

10. What does the gamma parameter tell us in SVM?
a. How far the hyperplane is from the support vectors
b. How accurately the SVM can predict outcomes for unseen data
c. How much you want to avoid misclassification of each training example
d. How far the influence of a single training example reaches.
Ans - d
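A sketch showing where the two parameters from questions 9 and 10 appear in scikit-learn's `SVC`, on the synthetic two-moons data. The C and gamma values are illustrative, not tuned.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# C: penalty for misclassifying training points (large C -> fewer violations tolerated)
# gamma: reach of a single training point in the RBF kernel (large gamma -> very local)
soft = SVC(kernel="rbf", C=0.1, gamma=0.5).fit(X, y)
hard = SVC(kernel="rbf", C=100.0, gamma=0.5).fit(X, y)

print(soft.n_support_.sum(), hard.n_support_.sum())   # smaller C usually keeps more support vectors
print(soft.score(X, y), hard.score(X, y))             # training accuracy
```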

11. SVMs are less effective when:
a. The data is linearly separable
b. The data is clean and ready to use
c. The data is noisy and contains overlapping points
d. None of the above
Ans- c

12. Which of the following are real world applications of the SVM?
a. Text and Hypertext Categorization
b. Image Classification
c. Clustering of News Articles
d. All of the above
Ans- d

13. What is the kernel trick?
a. Polynomial and exponential kernels calculate the separation line in lower dimensions.
b. Polynomial and exponential kernels calculate the separation line in higher dimensions.
c. Polynomial or exponential kernels calculate the separation line in lower dimensions.
d. Polynomial or exponential kernels calculate the separation line in higher dimensions.
Ans - b
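A small illustration of the kernel trick (questions 6 and 13), using the homogeneous degree-2 polynomial kernel as the example. Computing (a·b)² in the original 2-D space gives the same value as a dot product in an explicit 3-D feature space, so the higher-dimensional mapping never has to be built. The vectors are arbitrary.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2, computed in the original 2-D space."""
    return float(np.dot(a, b)) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

print(poly_kernel(a, b))               # 1.0, since a . b = 3 - 2 = 1
print(float(np.dot(phi(a), phi(b))))   # same value (up to rounding), via the 3-D space
```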

Unit-V (One mark)


2. SK-Learn provides the _______ built-in class for a decision tree classifier.
a) DTClassifier
b) DecisionTreeClassifier - answer
c) Tree
d) None of the above

3. What approach is taken by Decision Tree for Knowledge Engineering?
a) Inductive - answer
b) Association Rules
c) Statistical
d) Substitutive

4. Which of the following is a widely used and effective machine learning algorithm based on the
idea of bagging?
a. Decision Tree
b. Regression
c. Classification
d. Random Forest - answer
5. In the following formula from the decision tree family, what do A and D represent?
Gain(A) = Cross_Entropy(D) – Entropy_A(D)

a. Attribute, Decision
b. Attribute, Dataset- answer
c. Probability, Dataset
d. None of the above

6. In the following formula from the decision tree family, which of the given statements are true?
Gain(A) = Cross_Entropy(D) – Entropy_A(D)

a. Gain(A) should be maximum.
b. The attribute A with highest gain is chosen as the splitting attribute
c. Both a and b - answer
d. None of the above
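A minimal information-gain sketch for questions 5 and 6. It assumes Shannon entropy for both terms, Gain(A) = Entropy(D) - Entropy_A(D), where Entropy_A(D) is the size-weighted entropy of the subsets produced by splitting D on attribute A. The tiny dataset is made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(attribute_values, labels):
    """Entropy(D) minus the weighted entropy of the subsets created by splitting on A."""
    n = len(labels)
    split_entropy = 0.0
    for value in set(attribute_values):
        subset = [lab for a, lab in zip(attribute_values, labels) if a == value]
        split_entropy += len(subset) / n * entropy(subset)
    return entropy(labels) - split_entropy

labels = ["yes", "yes", "no", "no"]
outlook = ["sunny", "sunny", "rain", "rain"]   # separates the classes perfectly
windy = ["true", "false", "true", "false"]     # tells us nothing about the class

print(info_gain(outlook, labels), info_gain(windy, labels))   # 1.0 0.0
```

The attribute with the highest gain (here `outlook`) would be chosen as the splitting attribute.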

7. A _________ is a decision support tool that uses a tree-like graph or model of decisions and
their possible consequences, including chance event outcomes, resource costs, and utility.
a. Decision tree- answer
b. Graphs
c. Trees
d. Neural Networks

8. What is a Decision Tree?
a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each branch represents
outcome of test and each leaf node represents class label
c) Flow-Chart & Structure in which internal node represents test on an attribute,
each branch represents outcome of test and each leaf node represents class label- answer
d) None of the mentioned

9. Decision Trees can be used for Classification Tasks.
a) True- answer
b) False

10. The most widely used metrics and tools to assess a classification model are:
a. Confusion matrix
b. Cost-sensitive accuracy
c. Area under the ROC curve
d. All of the above - answer
11. Which of the following is a good test dataset characteristic?
a. Large enough to yield meaningful results
b. Is representative of the dataset as a whole
c. Both A and B - answer
d. None of the above
12. Which of the following is a disadvantage of decision trees?
a. Factor analysis
b. Decision trees are robust to outliers
c. Decision trees are prone to be overfit - answer
d. None of the above
13. What is the purpose of performing cross-validation?
a. To assess the predictive performance of the models
b. To judge how the trained model performs outside the sample on test data
c. Both A and B – answer
d. None of the above
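A cross-validation sketch for question 13, using the built-in `DecisionTreeClassifier` on the Iris dataset. Each fold is scored on data the model did not see during training. `max_depth=3` is an illustrative choice that limits overfitting (question 12).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold CV: train on 4 folds, score on the held-out fold, repeat 5 times
scores = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0), X, y, cv=5)
print(scores.round(3), scores.mean().round(3))
```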

14. Which of the following is/are true about bagging trees?

1. In bagging trees, individual trees are independent of each other

2. Bagging is the method for improving the performance by aggregating the results of weak
learners

A) 1
B) 2
C) 1 and 2- answer
D) None of these

15. Which of the following is/are true about boosting trees?

1. In boosting trees, individual weak learners are independent of each other

2. It is the method for improving the performance by aggregating the results of weak learners

A) 1
B) 2- answer
C) 1 and 2
D) None of these

16. Which of the following algorithms is not an example of an ensemble learning algorithm?
A) Random Forest
B) Adaboost
C) Extra Trees
D) Gradient Boosting
E) Decision Trees- answer

17. Suppose you are using a bagging-based algorithm, say a RandomForest, in model building.
Which of the following can be true?

1. The number of trees should be as large as possible

2. You will have interpretability after using RandomForest

A) 1- answer
B) 2
C) 1 and 2
D) None of these
18. True-False: Bagging is suitable for high-variance, low-bias models.

A) TRUE- answer
B) FALSE

19. In which of the following scenarios is the gain ratio preferred over Information Gain?

A) When a categorical variable has a very large number of categories - answer
B) When a categorical variable has a very small number of categories
C) The number of categories is not the reason
D) None of these

20. In K-means clustering, the distance between each sample and each centroid is computed and the
sample is assigned to the cluster where the distance is minimum. This approach is often called ----

a. Minimizing the inertia of the clusters- answer
b. Minimizing no. of clusters
c. Maximizing the inertia of the clusters
d. None of the above
21. Which statements are true about K-means method of clustering?

1)The process is iterative

2)All the distances are recomputed.

3)The algorithm stops when the centroids become stable and, therefore, the inertia is minimized

4) All of these- answer
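A K-means sketch for questions 20 and 21 (and the KMeans attributes question later in the unit), on four hand-picked points forming two obvious clusters. `inertia_` is the sum of squared distances from each sample to its nearest centroid, which the iterations minimize.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # (0, 0.5) and (10, 10.5), in some order
print(km.inertia_)           # 1.0: each point lies 0.5 from its centroid, 4 * 0.25
print(km.labels_)            # cluster index for each sample
```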

22. [True or False] The k-NN algorithm does more computation at test time than at train time.

A) TRUE - answer
B) FALSE

23. Which of the following statements is true for k-NN classifiers?

A) The classification accuracy is better with larger values of k
B) The decision boundary is smoother with smaller values of k
C) The decision boundary is linear
D) k-NN does not require an explicit training step- answer
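A k-NN sketch for questions 22 and 23 (and two-mark question 12), on synthetic data with some flipped labels. `fit` only stores the training set, and the neighbor search happens at prediction time. With k = 1 every training point is its own nearest neighbor, so training accuracy is 1.0 even when the model generalizes poorly.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=5, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)   # fit just stores the data
    print(k, knn.score(X_tr, y_tr), knn.score(X_te, y_te))
# k = 1 memorizes the training set (training accuracy 1.0), which can overfit
```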

Unit-5 (Two marks)


1. Which of the following is/are true about Random Forest and Gradient Boosting ensemble
methods?

1. Both methods can be used for classification tasks
2. Random Forest is used for classification whereas Gradient Boosting is used for regression
tasks
3. Random Forest is used for regression whereas Gradient Boosting is used for classification
tasks
4. Both methods can be used for regression tasks
A) 1
B) 2
C) 3
D) 4
E) 1 and 4 – answer

2. In Random Forest you can generate hundreds of trees (say T1, T2, …, Tn) and then aggregate the
results of these trees. Which of the following is true about an individual tree (Tk) in Random Forest?

1. Individual tree is built on a subset of the features
2. Individual tree is built on all the features
3. Individual tree is built on a subset of observations
4. Individual tree is built on full set of observations

A) 1 and 3 - answer
B) 1 and 4
C) 2 and 3
D) 2 and 4
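A sketch of the two sources of randomness in question 2, using scikit-learn's `RandomForestClassifier` on synthetic data. `bootstrap=True` gives each tree a bootstrap sample of the observations, and `max_features` restricts the features considered. In scikit-learn the feature subset is drawn at each split rather than once per tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rf = RandomForestClassifier(
    n_estimators=50,
    max_features="sqrt",   # each split considers a random subset of the features
    bootstrap=True,        # each tree sees a bootstrap sample of the observations
    random_state=0,
).fit(X, y)

print(len(rf.estimators_))   # 50 individual trees, combined by voting
```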

3. Which of the following algorithms do not use learning rate as one of their hyperparameters?

1. Gradient Boosting
2. Extra Trees
3. AdaBoost
4. Random Forest

A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4 - answer

4. Which of the following algorithms would you take into consideration in your final model
building on the basis of performance?

Suppose you are given the following graph, which shows the ROC curves for two different
classification algorithms: Random Forest (red) and Logistic Regression (blue).
A) Random Forest - answer
B) Logistic Regression
C) Both of the above
D) None of these

5. Which of the following is true about training and testing error in such a case?
Suppose you want to apply the AdaBoost algorithm on data D which has T observations. You
set half the data for training and half for testing initially. Now you want to increase the
number of data points for training T1, T2 … Tn where T1 < T2…. Tn-1 < Tn.

A) The difference between training error and test error increases as the number of observations
increases
B) The difference between training error and test error decreases as the number of
observations increases - answer
C) The difference between training error and test error will not change
D) None of These

6. In random forest or gradient boosting algorithms, features can be of any type. For example,
it can be a continuous feature or a categorical feature. Which of the following option is true
when you consider these types of features?
A) Only Random forest algorithm handles real valued attributes by discretizing them
B) Only Gradient boosting algorithm handles real valued attributes by discretizing them
C) Both algorithms can handle real valued attributes by discretizing them- answer
D) None of these

7. Consider the following figure for answering the next few questions. In the figure, X1 and X2
are the two features and the data points are represented by dots (-1 is the negative class and +1 is
the positive class). You first split the data based on feature X1 (say the splitting point is x11),
which is shown in the figure using a vertical line. Every value less than x11 will be predicted
as the positive class and every value greater than x11 will be predicted as the negative class.

How many data points are misclassified in the above image?
A) 1- answer
B) 2
C) 3
D) 4
8. Suppose you are working on a binary classification problem with 3 input features, and you
chose to apply a bagging algorithm (X) on this data. You chose max_features = 2 and
n_estimators = 3. Now, assume that each estimator has 70% accuracy.
Note: Algorithm X is aggregating the results of individual estimators based on maximum
voting
What will be the maximum accuracy you can get?
A) 70%
B) 80%
C) 90%
D) 100%- answer
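The 100% figure is a best case. A small sketch with hypothetical labels, assuming the three estimators make their mistakes on completely different samples, shows how majority voting can reach it even though each estimator alone is only 70% accurate:

```python
import numpy as np

# Hypothetical ground truth for 10 samples of a binary problem (all class 1).
y_true = np.ones(10, dtype=int)

# Start with 3 estimators that are all correct, then give each one
# 3 mistakes on a different, non-overlapping set of samples.
preds = np.tile(y_true, (3, 1))  # shape (3 estimators, 10 samples)
preds[0, 0:3] = 0  # estimator 1 wrong on samples 0-2
preds[1, 3:6] = 0  # estimator 2 wrong on samples 3-5
preds[2, 6:9] = 0  # estimator 3 wrong on samples 6-8

each_acc = (preds == y_true).mean(axis=1)  # accuracy of each estimator

# Majority vote: predict 1 when at least 2 of the 3 estimators say 1.
vote = (preds.sum(axis=0) >= 2).astype(int)
vote_acc = (vote == y_true).mean()

print(each_acc)  # [0.7 0.7 0.7]
print(vote_acc)  # 1.0
```

If two estimators were wrong on the same sample, the vote would be wrong there too, so in practice the ensemble accuracy usually stays below this maximum.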

9. Which of the following is true about the Gradient Boosting trees?

1. In each stage, introduce a new regression tree to compensate for the shortcomings of the existing
model
2. We can use the gradient descent method to minimize the loss function

A) 1
B) 2
C) 1 and 2- answer
D) None of these
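Both statements can be seen in a minimal sketch of gradient boosting for squared-error loss. It assumes a single numeric feature with at least two distinct values and uses depth-1 regression trees (stumps) as weak learners; fit_stump and gradient_boost are hypothetical helpers, not a library API. Each stage fits a new tree to the negative gradient of the loss (for squared error, the residuals) and adds it with a small learning rate, which is gradient descent in function space:

```python
import numpy as np


def fit_stump(x, r):
    """Fit a depth-1 regression tree to targets r on a 1-D feature x."""
    best = None
    for t in np.unique(x)[:-1]:  # candidate thresholds (keep both sides non-empty)
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def gradient_boost(x, y, n_stages=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss L = 0.5 * (y - F)^2."""
    f0 = y.mean()  # initial constant model
    pred = np.full(y.shape, f0)
    stumps = []
    for _ in range(n_stages):
        # The negative gradient of L with respect to F is the residual y - F.
        residual = y - pred
        stump = fit_stump(x, residual)  # new tree compensates for current errors
        pred = pred + learning_rate * stump(x)  # gradient-descent step
        stumps.append(stump)
    return lambda z: f0 + learning_rate * sum(s(z) for s in stumps)


x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x)
model = gradient_boost(x, y)
print(np.mean((y - y.mean()) ** 2), np.mean((y - model(x)) ** 2))
```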

9. In scikit-learn, which of the following are built into the KMeans class?

a. cluster_centers_
b. inertia_
c. n_clusters
d. all of the above - answer
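In scikit-learn, n_clusters is a constructor parameter of sklearn.cluster.KMeans, while cluster_centers_ and inertia_ are attributes set after fitting (the trailing underscore marks fitted attributes). A small sketch on six made-up points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of three points each (made-up data).
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.n_clusters)        # parameter passed to the constructor: 2
print(km.cluster_centers_)  # fitted attribute: one row per cluster centre
print(km.inertia_)          # fitted attribute: sum of squared distances to the nearest centre
```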

10. In which of the following cases will K-means clustering fail to give good results?
1) Data points with outliers
2) Data points with different densities
3) Data points with nonconvex shapes
A. 1 and 2
B. 2 and 3
C. 1, 2, and 3 - answer
D. 1 and 3
11. Which of the following is a reasonable way to select the number of clusters "k"?
A. Choose k to be the smallest value so that at least 99% of the variance is retained.
B. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C. Choose k to be the largest value so that 99% of the variance is retained.
D. Use the elbow method - answer
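The elbow method can be sketched by fitting K-means for a range of k and recording inertia_ (the within-cluster sum of squared distances). The data below is a hypothetical set of three tight groups, so the curve should drop sharply up to k = 3 and flatten afterwards:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three tight groups of three points each.
blob = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
X = np.vstack([blob, blob + [10.0, 10.0], blob + [20.0, 0.0]])

# Within-cluster sum of squares (inertia) for each candidate k.
inertias = {}
for k in range(1, 7):
    inertias[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

In practice one plots the inertias against k and picks the k at the bend; real data rarely shows a bend this clear.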
12. A company has built a kNN classifier that gets 100% accuracy on training data. When they
deployed this model on the client side, it was found that the model is not at all accurate.
Which of the following might have gone wrong?
Note: The model was successfully deployed and no technical issues were found on the client side except
the model performance.

A) It is probably an overfitted model - answer
B) It is probably an underfitted model
C) Can’t say
D) None of these

13. k-NN is very likely to overfit due to the curse of dimensionality. Which of the
following options would you consider to handle such a problem?

1. Dimensionality Reduction
2. Feature selection

A) 1
B) 2
C) 1 and 2 - answer
D) None of these

14. In the image below, which would be the best value for k, assuming that the algorithm you are
using is k-Nearest Neighbor?

A) 3
B) 10 - answer
C) 20
D) 50

15. Which of the following is/are not true about the DBSCAN clustering algorithm?

1. For data points to be in a cluster, they must be within a distance threshold of a core point
2. It has strong assumptions for the distribution of data points in the data space
3. It has a substantially high time complexity of order O(n^3)
4. It does not require prior knowledge of the number of desired clusters
5. It is robust to outliers

Options:

A. 1 only
B. 2 only
C. 4 only
D. 2 and 3 - answer

Unit-6 (Two marks)

1. After performing K-Means Clustering analysis on a dataset, you observed the following
dendrogram. Which of the following conclusions can be drawn from the dendrogram?

A. There were 28 data points in the clustering analysis

B. The best number of clusters for the analyzed data points is 4

C. The proximity function used is Average-link clustering

D. The above dendrogram interpretation is not possible for K-Means clustering analysis - answer

3. In the figure below, if you draw a horizontal line at y = 2, what will be the number
of clusters formed?

A. 1

B. 2 - answer
C. 3

D. 4

4. What would be the best choice for the number of clusters based on the following results?

A. 5

B. 6 - answer

C. 14

D. Greater than 14

5. Which of the following is/are not true about the centroid-based K-Means clustering algorithm
and the distribution-based expectation-maximization clustering algorithm?

1. Both start with random initializations
2. Both are iterative algorithms
3. Both have strong assumptions that the data points must fulfill
4. Both are sensitive to outliers
5. Expectation maximization algorithm is a special case of K-Means
6. Both require prior knowledge of the number of desired clusters
7. The results produced by both are non-reproducible.

Options:

A. 1 only
B. 5 only - answer

C. 1 and 3

D. 6 and 7

7. If you are using multinomial mixture models with the expectation-maximization algorithm for
clustering a set of data points into two clusters, which of the following assumptions is important?

A. All the data points follow two Gaussian distributions

B. All the data points follow n Gaussian distributions (n > 2)

C. All the data points follow two multinomial distributions - answer

D. All the data points follow n multinomial distributions (n > 2)

8. Below is a mathematical representation of a neuron.

The different components of the neuron are denoted as:

• x1, x2, …, xN: These are the inputs to the neuron. They can either be the actual observations
from the input layer or intermediate values from one of the hidden layers.
• w1, w2, …, wN: The weight of each input.
• bi: The bias unit. This is a constant value added to the input of the activation
function; it works like an intercept term.
• a: The activation of the neuron, which can be represented as a = f(w1x1 + w2x2 + … + wNxN + bi),
where f is the activation function.
• y: The output of the neuron.

Considering the above notation, will a line equation (y = mx + c) fall into the category of a
neuron?

A. Yes- answer
B. No
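The answer follows from choosing a single input x, weight m, bias c and an identity (linear) activation, since then a = mx + c. A minimal sketch, where neuron is a hypothetical helper:

```python
import numpy as np


def neuron(x, w, b, activation=lambda z: z):
    """Single artificial neuron: a = f(w . x + b); f defaults to the identity."""
    return activation(np.dot(w, x) + b)


# y = m*x + c as a neuron with one input, weight m, bias c and a linear activation.
m, c = 2.0, 1.0
ys = [float(neuron(np.array([x]), np.array([m]), c)) for x in (0.0, 1.0, 2.5)]
print(ys)  # [1.0, 3.0, 6.0]
```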

9. In the graph below, we observe that the error has many “ups and downs”.

Should we be worried?

A. Yes, because this means there is a problem with the learning rate of the neural network.
B. No, as long as there is a cumulative decrease in both training and validation error,
we don’t need to worry - answer

Unit-6 (One mark)

1. Which of the following metrics do we have for finding the dissimilarity between two clusters in
hierarchical clustering?

1. Single-link
2. Complete-link
3. Average-link

Options:

A. 1 and 2

B. 1 and 3

C. 2 and 3

D. 1, 2 and 3 - answer

2. Which of the following statement(s) correctly represent(s) a real neuron?

A. A neuron has a single input and a single output only

B. A neuron has multiple inputs but a single output only

C. A neuron has a single input but multiple outputs

D. A neuron has multiple inputs and multiple outputs

E. All of the above statements are valid - answer

3. If you increase the number of hidden layers in a Multi Layer Perceptron, the classification
error on test data always decreases. True or False?

A. True

B. False - answer

4. You are building a neural network that gets input from the previous layer as well as from
itself.

Which of the following architectures has feedback connections?

A. Recurrent Neural network - answer

B. Convolutional Neural Network

C. Restricted Boltzmann Machine

D. None of these

5. In which neural net architecture does weight sharing occur?
A. Convolutional Neural Network
B. Recurrent Neural Network
C. Fully Connected Neural Network
D. Both A and B - answer

6. In a neural network, which of the following techniques is used to deal with overfitting?
A. Dropout
B. Regularization
C. Batch Normalization
D. All of these - answer

7. What is a dead unit in a neural network?

A. A unit which doesn’t get updated during training by any of its neighbours - answer
B. A unit which does not respond completely to any of the training patterns

C. The unit which produces the biggest sum-squared error

D. None of these

8. Suppose a convolutional neural network is trained on the ImageNet dataset (an object recognition
dataset). This trained model is then given a completely white image as input. The output
probabilities for this input would be equal for all classes. True or False?
A. True
B. False - answer

9. For an image recognition problem (recognizing a cat in a photo), which architecture of neural
network would be better suited to solve the problem?
A. Multi Layer Perceptron
B. Convolutional Neural Network - answer
C. Recurrent Neural network
D. Perceptron

10. What are the factors to consider when selecting the depth of a neural network?

1. Type of neural network (e.g. MLP, CNN, etc.)
2. Input data
3. Computation power, i.e. hardware and software capabilities
4. Learning rate
5. The output function to map

A. 1, 2, 4, 5

B. 2, 3, 4, 5

C. 1, 3, 4, 5

D. All of these - answer

11. Movie recommendation systems are an example of:

1. Classification
2. Clustering
3. Reinforcement Learning
4. Regression

Options:
A. 2 only
B. 1 only
C. 1 and 2
D. 2 and 3 - answer
13. Recommendation systems are used in which of the following applications:
a. Banking
b. Shopping
c. Search Engine
d. All of the above – answer

14. Which of the following are methods of recommendation systems?

a. Naïve user-based systems
b. Content-based systems
c. Model-free collaborative filtering
d. All of the above – answer

15. Select the correct option related to hierarchical clustering.

a. Creates sets of clusters
b. Uses a tree data structure (dendrogram)
c. Only b
d. Both a and b - answer

16. Agglomerative clustering is based on the __________ approach.

a. Top Down
b. Bottom Up - answer
c. Linear
d. Partition

17. For each pair of clusters A and B, which linkage method computes the distance between the clusters
as the maximum distance between their points, d(A, B) = max{ d(a, b) : a ∈ A, b ∈ B }?

a. Single link
b. Complete link -answer
c. Average link
d. Ward’s Linkage
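The three linkage criteria differ only in how the pairwise point distances between two clusters are combined: single link takes the minimum, complete link the maximum and average link the mean. A sketch assuming Euclidean distance and two made-up clusters (Ward’s linkage is omitted):

```python
import math
from itertools import product


def single_link(A, B):
    """Minimum distance between any point of A and any point of B."""
    return min(math.dist(a, b) for a, b in product(A, B))


def complete_link(A, B):
    """Maximum distance between any point of A and any point of B."""
    return max(math.dist(a, b) for a, b in product(A, B))


def average_link(A, B):
    """Mean of all pairwise distances between A and B."""
    return sum(math.dist(a, b) for a, b in product(A, B)) / (len(A) * len(B))


A = [(0.0, 0.0), (0.0, 1.0)]
B = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(A, B))    # 3.0
print(complete_link(A, B))  # sqrt(26), about 5.099
print(average_link(A, B))
```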

18. A ___________ is a graphical method to better understand the agglomeration process. It shows in a
static way how the aggregations are performed, starting from the bottom (where all samples are separated)
up to the top (where the linkage is complete).

a. Flow chart
b. Histogram
c. Dendrogram –answer
d. Decision tree

19. Which of the following functions are activation functions?
a. ReLU
b. Tanh
c. Sigmoid
d. All of the above- answer
20. Which activation function is used by most deep networks nowadays?
a. ReLU - answer
b. Tanh
c. Sigmoid
d. All of the above
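For reference, the three activation functions listed in question 19 can be written in a few lines with NumPy. This is a sketch, not any particular framework's implementation:

```python
import numpy as np


def relu(z):
    """Rectified linear unit: max(0, z), element-wise."""
    return np.maximum(0.0, z)


def sigmoid(z):
    """Logistic sigmoid: squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def tanh(z):
    """Hyperbolic tangent: squashes inputs into (-1, 1)."""
    return np.tanh(z)


z = np.array([-2.0, 0.0, 3.0])
print(relu(z))  # [0. 0. 3.]
print(sigmoid(z))
print(tanh(z))
```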

21. ___________ are general computers which can learn algorithms to map input
sequences to output sequences
a. CNN
b. RNN- answer
c. Deep Q-Learning
d. All of these
UNIT I
1. What is classification?
a) when the output variable is a category, such as “red” or “blue” or “disease” and “no
disease”.
b) when the output variable is a real value, such as “dollars” or “weight”.

Ans: Solution A

2. What is regression?
a) When the output variable is a category, such as “red” or “blue” or “disease” and “no
disease”.
b) When the output variable is a real value, such as “dollars” or “weight”.

Ans: Solution B

3. What is supervised learning?
a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.

Ans: Solution B

4. What is Unsupervised learning?
a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.

Ans: Solution A

5. What is Semi-Supervised learning?
a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.

Ans: Solution D
6. What is Reinforcement learning?
a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.

Ans: Solution C

7. Sentiment Analysis is an example of:

1. Regression
2. Classification
3. Clustering
4. Reinforcement Learning

Options:

A. 1 Only

B. 1 and 2

C. 1 and 3

D. 1, 2 and 4

Ans : Solution D

8. The process of forming general concept definitions from examples of concepts to be learned.
a) Deduction
b) abduction
c) induction
d) conjunction

Ans : Solution C

9. Computers are best at learning
a) facts.
b) concepts.
c) procedures.
d) principles.
Ans : Solution A

10. Data used to build a data mining model.
a) validation data
b) training data
c) test data
d) hidden data

Ans : Solution B

11. Supervised learning and unsupervised clustering both require at least one
a) hidden attribute.
b) output attribute.
c) input attribute.
d) categorical attribute.

Ans : Solution C

12. Supervised learning differs from unsupervised clustering in that supervised learning requires
a) at least one input attribute.
b) input attributes to be categorical.
c) at least one output attribute.
d) output attributes to be categorical.

Ans : Solution C

13. A regression model in which more than one independent variable is used to predict the
dependent variable is called
a) a simple linear regression model
b) a multiple regression model
c) an independent model
d) none of the above

Ans : Solution B

14. A term used to describe the case when the independent variables in a multiple regression model
are correlated is
a) Regression
b) correlation
c) multicollinearity
d) none of the above

Ans : Solution C
15. A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1 increases by 1 unit (holding x2
constant), y will
a) increase by 3 units
b) decrease by 3 units
c) increase by 4 units
d) decrease by 4 units

Ans : Solution A
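For question 15, the coefficient on x1 is the change in y per unit change in x1 when x2 is held fixed. A quick check in plain Python, with arbitrary starting values:

```python
def predict(x1, x2):
    """The multiple regression model from question 15: y = 2 + 3*x1 + 4*x2."""
    return 2 + 3 * x1 + 4 * x2


# Increase x1 by one unit while holding x2 fixed (the starting values are arbitrary).
change = predict(6, 5) - predict(5, 5)
print(change)  # 3
```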

16. A multiple regression model has
a) only one independent variable
b) more than one dependent variable
c) more than one independent variable
d) none of the above

Ans : Solution C

17. A measure of goodness of fit for the estimated regression equation is the
a) multiple coefficient of determination
b) mean square due to error
c) mean square due to regression
d) none of the above

Ans : Solution A

18. The adjusted multiple coefficient of determination accounts for
a) the number of dependent variables in the model
b) the number of independent variables in the model
c) unusually large predictors
d) none of the above

Ans : Solution B

19. The multiple coefficient of determination is computed by
a) dividing SSR by SST
b) dividing SST by SSR
c) dividing SST by SSE
d) none of the above

Ans : Solution A

20. For a multiple regression model, SST = 200 and SSE = 50. The multiple coefficient of
determination is
a) 0.25
b) 4.00
c) 0.75
d) none of the above

Ans : Solution C
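Questions 19 and 20 rest on the identity R² = SSR / SST = 1 - SSE / SST, which holds for a least-squares fit with an intercept (where SST = SSR + SSE). A small sketch; r_squared is a hypothetical helper:

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination computed as 1 - SSE / SST."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
    sse = np.sum((y - y_hat) ** 2)     # error (residual) sum of squares
    return 1.0 - sse / sst


# Question 20: SST = 200 and SSE = 50, so SSR = SST - SSE = 150.
sst, sse = 200.0, 50.0
ssr = sst - sse
print(ssr / sst)  # 0.75

# The same helper applied to some made-up predictions.
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```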

21. A nearest neighbor approach is best used
a) with large-sized datasets.
b) when irrelevant attributes have been removed from the data.
c) when a generalized model of the data is desirable.
d) when an explanation of what has been found is of primary importance.

Ans : Solution B

22. Another name for an output attribute.
a) predictive variable
b) independent variable
c) estimated variable
d) dependent variable

Ans : Solution D

23. Classification problems are distinguished from estimation problems in that
a) classification problems require the output attribute to be numeric.
b) classification problems require the output attribute to be categorical.
c) classification problems do not allow an output attribute.
d) classification problems are designed to predict future outcome.

Ans : Solution B

24. Which statement is true about prediction problems?
a) The output attribute must be categorical.
b) The output attribute must be numeric.
c) The resultant model is designed to determine future outcomes.
d) The resultant model is designed to classify current behavior.

Ans : Solution C

25. Which statement about outliers is true?
a) Outliers should be identified and removed from a dataset.
b) Outliers should be part of the training dataset but should not be present in the test
data.
c) Outliers should be part of the test dataset but should not be present in the training
data.
d) The nature of the problem determines how outliers are used.
Ans : Solution D

26. Which statement is true about neural network and linear regression models?
a) Both models require input attributes to be numeric.
b) Both models require numeric attributes to range between 0 and 1.
c) The output of both models is a categorical attribute value.
d) Both techniques build models whose output is determined by a linear sum of weighted
input attribute values.

Ans : Solution A

27. Which of the following is a common use of unsupervised clustering?
a) detect outliers
b) determine a best set of input attributes for supervised learning
c) evaluate the likely performance of a supervised learner model
d) determine if meaningful relationships can be found in a dataset

Ans : Solution A

28. The average positive difference between computed and desired outcome values.
a) root mean squared error
b) mean squared error
c) mean absolute error
d) mean positive error

Ans : Solution C
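The error measures in questions 28 and 35 can be compared on the same made-up predictions: mean absolute error averages the absolute (positive) differences, mean squared error averages the squared differences, and root mean squared error is the square root of the latter:

```python
import math


def mean_absolute_error(y, y_hat):
    """Average of the absolute (positive) differences."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mean_squared_error(y, y_hat):
    """Average of the squared differences."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def root_mean_squared_error(y, y_hat):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(y, y_hat))


y = [3.0, -0.5, 2.0, 7.0]     # desired (actual) outputs
y_hat = [2.5, 0.0, 2.0, 8.0]  # computed (predicted) outputs
print(mean_absolute_error(y, y_hat))  # 0.5
print(mean_squared_error(y, y_hat))   # 0.375
print(root_mean_squared_error(y, y_hat))
```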

29. Selecting data so as to assure that each class is properly represented in both the training and
test set.
a) cross validation
b) stratification
c) verification
d) bootstrapping

Ans : Solution B

30. The standard error is defined as the square root of this computation.
a) The sample variance divided by the total number of sample instances.
b) The population variance divided by the total number of sample instances.
c) The sample variance divided by the sample mean.
d) The population variance divided by the sample mean.

Ans : Solution A
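A sketch of question 30's definition, using the sample variance (denominator n - 1) from Python's statistics module on a made-up sample:

```python
import math
import statistics


def standard_error(sample):
    """Square root of (sample variance / number of sample instances)."""
    return math.sqrt(statistics.variance(sample) / len(sample))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
print(statistics.variance(data))  # sample variance with n - 1 in the denominator
print(standard_error(data))
```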
31. Data used to optimize the parameter settings of a supervised learner model.
a) Training
b) Test
c) Verification
d) Validation

Ans : Solution D

32. Bootstrapping allows us to
a) choose the same training instance several times.
b) choose the same test set instance several times.
c) build models with alternative subsets of the training data several times.
d) test a model with alternative subsets of the test data several times.

Ans : Solution A
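Bootstrapping draws a training sample of the same size as the original data uniformly with replacement, so the same instance can be chosen several times and some instances are left out (out-of-bag). A minimal sketch with the standard library; the seed and data are arbitrary:

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) instances uniformly at random *with replacement*."""
    return rng.choices(data, k=len(data))


rng = random.Random(42)  # fixed seed so the run is repeatable
train = list(range(10))  # stand-ins for 10 training instances
sample = bootstrap_sample(train, rng)
out_of_bag = [x for x in train if x not in sample]  # instances never drawn

print(sample)      # may contain repeated instances
print(out_of_bag)  # instances left out of this bootstrap sample
```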

33. The correlation between the number of years an employee has worked for a company and the
salary of the employee is 0.75. What can be said about employee salary and years worked?
a) There is no relationship between salary and years worked.
b) Individuals that have worked for the company the longest have higher salaries.
c) Individuals that have worked for the company the longest have lower salaries.
d) The majority of employees have been with the company a long time.
e) The majority of employees have been with the company a short period of time.

Ans : Solution B

34. The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
a) The attributes are not linearly related.
b) As the value of one attribute increases the value of the second attribute also increases.
c) As the value of one attribute decreases the value of the second attribute increases.
d) The attributes show a curvilinear relationship.

Ans : Solution C

35. The average squared difference between classifier predicted output and actual output.
a) mean squared error
b) root mean squared error
c) mean absolute error
d) mean relative error

Ans : Solution A

36. Simple regression assumes a __________ relationship between the input attribute and output
attribute.
a) Linear
b) Quadratic
c) reciprocal
d) inverse

Ans : Solution A

37. Regression trees are often used to model _______ data.
a) Linear
b) Nonlinear
c) Categorical
d) Symmetrical

Ans : Solution B

38. The leaf nodes of a model tree are
a) averages of numeric output attribute values.
b) nonlinear regression equations.
c) linear regression equations.
d) sums of numeric output attribute values.

Ans : Solution C

39. Logistic regression is a ________ regression technique that is used to model data having a
_____ outcome.
a) linear, numeric
b) linear, binary
c) nonlinear, numeric
d) nonlinear, binary

Ans : Solution D

40. This technique associates a conditional probability value with each data instance.
a) linear regression
b) logistic regression
c) simple regression
d) multiple linear regression

Ans : Solution B

41. This supervised learning technique can process both numeric and categorical input attributes.
a) linear regression
b) Bayes classifier
c) logistic regression
d) backpropagation learning
Ans : Solution B

42. With Bayes classifier, missing data items are
a) treated as equal compares.
b) treated as unequal compares.
c) replaced with a default value.
d) ignored.

Ans : Solution D

43. This clustering algorithm merges and splits nodes to help modify nonoptimal partitions.
a) agglomerative clustering
b) expectation maximization
c) conceptual clustering
d) K-Means clustering

Ans : Solution C

44. This clustering algorithm initially assumes that each data instance represents a single cluster.
a) agglomerative clustering
b) conceptual clustering
c) K-Means clustering
d) expectation maximization

Ans : Solution A

45. This unsupervised clustering algorithm terminates when mean values computed for the current
iteration of the algorithm are identical to the computed mean values for the previous iteration.
a) agglomerative clustering
b) conceptual clustering
c) K-Means clustering
d) expectation maximization

Ans : Solution C
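The termination rule in question 45 can be seen in a from-scratch sketch of K-means (Lloyd's algorithm). It assumes the caller supplies the initial centers and that no cluster becomes empty; the function name kmeans is hypothetical:

```python
import numpy as np


def kmeans(X, init_centers, max_iter=100):
    """Lloyd's K-means; stops when the cluster means no longer change."""
    centers = np.asarray(init_centers, dtype=float)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as the mean of its points
        # (assumes every cluster keeps at least one point).
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(len(centers))])
        # Termination: the means equal those of the previous iteration.
        if np.allclose(new_centers, centers):
            return new_centers, labels, iteration
        centers = new_centers
    return centers, labels, max_iter


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels, n_iter = kmeans(X, init_centers=[[0, 0], [10, 10]])
print(labels, n_iter)  # [0 0 0 1 1 1] 2
```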

46. Machine learning techniques differ from statistical techniques in that machine learning methods
a) typically assume an underlying distribution for the data.
b) are better able to deal with missing and noisy data.
c) are not able to explain their behavior.
d) have trouble with large-sized datasets.

Ans : Solution B
UNIT –II

1. True-False: Overfitting is more likely when you have a huge amount of data to train on.
A) TRUE
B) FALSE
Ans Solution: (B)
With a small training dataset, it’s easier to find a hypothesis that fits the training data exactly,
i.e. to overfit.

2. What is pca.components_ in Sklearn?
A. Set of all eigenvectors for the projection space
B. Matrix of principal components
C. Result of the multiplication matrix
D. None of the above options
Ans A

3. Which of the following techniques would perform better for reducing the dimensions of a data
set?
A. Removing columns which have too many missing values
B. Removing columns which have high variance in data
C. Removing columns with dissimilar data trends
D. None of these
Ans Solution: (A)
If a column has too many missing values (say 99%), then we can remove such columns.

4.It is not necessary to have a target variable for applying dimensionality reduction
algorithms.
A. TRUE
B. FALSE
Ans Solution: (A)
Unsupervised methods such as PCA do not need a target variable. LDA, by contrast, is an example of a
supervised dimensionality reduction algorithm.

5. PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
Ans Solution: (A)
Sometimes it is very useful to plot the data in lower dimensions. We can take the first 2 principal
components and then visualize the data using scatter plot.

6. The most popularly used dimensionality reduction algorithm is Principal Component Analysis
(PCA). Which of the following is/are true about PCA?
1. PCA is an unsupervised method
2. It searches for the directions in which the data have the largest variance
3. Maximum number of principal components <= number of features
4. All principal components are orthogonal to each other
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. All of the above

Ans D
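The four statements in question 6 can be checked with scikit-learn's PCA on synthetic data: fitting needs no target, components_ has one row per principal component and one column per feature, its rows are orthonormal, and explained_variance_ is sorted in decreasing order. The data below is randomly generated for illustration only:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 3 features with very different spreads.
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

pca = PCA(n_components=2).fit(X)  # unsupervised: no target variable is passed
W = pca.components_               # shape (n_components, n_features) = (2, 3)

print(W.shape)
print(np.round(W @ W.T, 6))     # approximately the identity: components are orthonormal
print(pca.explained_variance_)  # largest-variance direction comes first
Z = pca.transform(X)            # data projected onto the 2 components
```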

7. PCA works better if there is:
1. A linear structure in the data
2. Data lying on a curved surface rather than a flat surface
3. Variables scaled in the same unit
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1 ,2 and 3
Ans Solution: (C)

8. What happens when you get features in lower dimensions using PCA?
1. The features will still have interpretability
2. The features will lose interpretability
3. The features must carry all information present in data
4. The features may not carry all information present in data
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
Ans Solution: (D)
When you get the features in lower dimensions, you will lose some of the information in the data
most of the time, and you won’t be able to interpret the lower-dimensional data.

9. Which of the following option(s) is/are true?
1. You need to initialize parameters in PCA
2. You don’t need to initialize parameters in PCA
3. PCA can be trapped into local minima problem
4. PCA can’t be trapped into local minima problem
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
Ans Solution: (D)
PCA is a deterministic algorithm which doesn’t have parameters to initialize, and it doesn’t have
the local minima problem that many machine learning algorithms have.

10. Which of the following statements is true about t-SNE in comparison to PCA?
A. When the data is huge (in size), t-SNE may fail to produce better results.
B. t-SNE always produces better results regardless of the size of the data
C. PCA always performs better than t-SNE for smaller size data.
D. None of these
Ans Solution: (A)
Option A is correct

11. [ True or False ] PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE

Solution: (A)
Sometimes it is very useful to plot the data in lower dimensions. We can take the first 2 principal
components and then visualize the data using scatter plot.

12. A feature F1 can take the values A, B, C, D, E, and F, and represents the grade of students from
a college.
Which of the following statements is true in this case?
A) Feature F1 is an example of a nominal variable.
B) Feature F1 is an example of an ordinal variable.
C) It doesn’t belong to any of the above categories.
D) Both of these
Solution: (B)
Ordinal variables are variables whose categories have an order. For example, grade A should be
considered a higher grade than grade B.

13. Which of the following is an example of a deterministic algorithm?
A) PCA
B) K-Means
C) None of the above
Solution: (A)
A deterministic algorithm is one whose output does not change across runs. PCA would give the same
result if we ran it again, but k-means would not, because it depends on a random initialization.
UNIT –III

1. Which of the following methods do we use to best fit the data in Logistic Regression?
A) Least Square Error
B) Maximum Likelihood
C) Jaccard distance
D) Both A and B
Ans Solution: B

2. Choose which of the following options is true regarding One-Vs-All method in Logistic
Regression.
A) We need to fit n models in n-class classification problem
B) We need to fit n-1 models to classify into n classes
C) We need to fit only 1 model to classify into n classes
D) None of these
Ans Solution: A

3. Suppose you applied a Logistic Regression model to given data and got a training accuracy
X and testing accuracy Y. Now, you want to add a few new features to the same data. Select the
option(s) which is/are correct in such a case.
Note: Consider that the remaining parameters are the same.
A) Training accuracy increases
B) Training accuracy increases or remains the same
C) Testing accuracy decreases
D) Testing accuracy increases or remains the same
Ans Solution: A and D
Adding more features to the model will increase the training accuracy because the model has more
information to fit the logistic regression. But testing accuracy increases only if the feature is
found to be significant.

4. Which of the following algorithms do we use for Variable Selection?
A) LASSO
B) Ridge
C) Both
D) None of these
Ans Solution: A
In lasso we apply an absolute (L1) penalty; as the penalty increases, some of the coefficients of
the variables may become exactly zero.
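A sketch of the difference with scikit-learn on synthetic data where only the first two of five features affect y. With a moderate alpha, the L1 penalty of Lasso typically drives the coefficients of the irrelevant features exactly to zero, while Ridge only shrinks them. The alpha values are illustrative, not tuned:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only features 0 and 1 influence y; features 2-4 are irrelevant.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 (absolute) penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 (squared) penalty

print(np.round(lasso.coef_, 3))  # irrelevant features are typically exactly 0
print(np.round(ridge.coef_, 3))  # small but non-zero
```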

5. Which of the following statements is true about outliers in Linear regression?
A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
Ans Solution: (A)
The slope of the regression line will change due to outliers in most of the cases. So Linear
Regression is sensitive to outliers.

6. Which of the following methods do we use to find the best fit line for data in Linear
Regression?
A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Ans Solution: (A)
In linear regression, we try to minimize the least square errors of the model to identify the line
of best fit.
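Least squares fitting of a line can be sketched with NumPy by solving for the slope and intercept that minimize the sum of squared residuals. The data points are made up:

```python
import numpy as np

# Made-up points that lie roughly on a straight line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Design matrix [x, 1] so the unknowns are (slope, intercept).
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

residuals = y - (slope * x + intercept)
print(slope, intercept)
print(np.sum(residuals ** 2))  # the quantity least squares minimizes
```

np.polyfit(x, y, 1) returns the same slope and intercept.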

7. Which of the following is true about Residuals?
A) Lower is better
B) Higher is better
C) A or B depend on the situation
D) None of these
Ans Solution: (A)
Residuals refer to the error values of the model. Therefore lower residuals are desired.

8. Suppose you plotted a scatter plot between the residuals and predicted values in linear
regression and you found that there is a relationship between them. Which of the following
conclusions would you draw about this situation?

A) Since there is a relationship, our model is not good
B) Since there is a relationship, our model is good
C) Can’t say
D) None of these
Ans Solution: (A)
There should not be any relationship between predicted values and residuals. If there exists any
relationship between them, it means that the model has not perfectly captured the information
in the data.

9. Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge
regression with penalty x.
Choose the option which best describes the bias.
A) In case of very large x; bias is low
B) In case of very large x; bias is high
C) We can’t say about bias
D) None of these
Ans Solution: (B)
If the penalty is very large, the model is less complex, so the bias would be high.

10. Which of the following options is true?
A) Linear Regression error values have to be normally distributed, but in the case of Logistic
Regression this is not the case
B) Logistic Regression error values have to be normally distributed, but in the case of Linear
Regression this is not the case
C) Both Linear Regression and Logistic Regression error values have to be normally distributed
D) Neither Linear Regression nor Logistic Regression error values have to be normally
distributed
Ans Solution: A

11. Suppose you have trained a logistic regression classifier and it outputs a prediction
hθ(x) = 0.2 for a new example x. This means
Our estimate for P(y=1 | x)
Our estimate for P(y=0 | x)
Our estimate for P(y=1 | x)
Our estimate for P(y=0 | x)
Ans Solution: B

12. True-False: Linear Regression is a supervised machine learning algorithm.
A) TRUE
B) FALSE
Solution: (A)
Yes, Linear regression is a supervised learning algorithm because it uses true labels for training.
A supervised learning algorithm should have an input variable (x) and an output variable (Y) for each
example.

13. True-False: Linear Regression is mainly used for Regression.
A) TRUE
B) FALSE
Solution: (A)
Linear Regression has dependent variables that have continuous values.
14. True-False: Is it possible to design a Linear regression algorithm using a neural network?

A) TRUE
B) FALSE

Solution: (A)

True. A Neural network can be used as a universal approximator, so it can definitely implement
a linear regression algorithm.

15. Which of the following methods do we use to find the best fit line for data in Linear
Regression?
A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Solution: (A)
In linear regression, we try to minimize the least square errors of the model to identify the line
of best fit.

16. Which of the following evaluation metrics can be used to evaluate a model while modeling
a continuous output variable?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
Solution: (D)
Since linear regression gives continuous output values, in such a case we use the mean squared
error metric to evaluate model performance. The remaining options are used in the case of a
classification problem.

17. True-False: Lasso Regularization can be used for variable selection in Linear Regression.
A) TRUE
B) FALSE
Solution: (A)
True. In lasso regression we apply an absolute penalty, which makes some of the coefficients
zero.

18. Which of the following is true about residuals?


A) Lower is better
B) Higher is better
C) A or B depend on the situation
D) None of these
Solution: (A)
Residuals are the error values of the model, so lower residuals (in magnitude) are desired.

19. Suppose that we have N independent variables (X1, X2, …, XN) and the dependent variable is Y.
Now imagine that you are applying linear regression by fitting the best fit line using least square
error on this data.
You find that the correlation coefficient of one of the variables (say X1) with Y is -0.95.
Which of the following is true for X1?
A) Relation between the X1 and Y is weak
B) Relation between the X1 and Y is strong
C) Relation between the X1 and Y is neutral
D) Correlation can’t judge the relationship
Solution: (B)
The absolute value of the correlation coefficient denotes the strength of the relationship; the
sign only gives its direction. Since |-0.95| is very high, the relationship between X1 and Y is
strong (and negative).
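A quick NumPy check on invented data shows that a correlation near -1 signals a strong (negative) linear relationship; the magnitude, not the sign, measures strength:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 8.1, 5.9, 4.2, 1.8])  # Y falls almost linearly as X1 rises

r = np.corrcoef(x1, y)[0, 1]  # Pearson correlation, close to -1
strength = abs(r)             # strength of the linear relationship
```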

20. Suppose you are given two variables V1 and V2 with the following two characteristics:
1. If V1 increases, then V2 also increases
2. If V1 decreases, then V2's behavior is unknown
Looking at these two characteristics, which of the following options is correct for the
Pearson correlation between V1 and V2?
A) Pearson correlation will be close to 1
B) Pearson correlation will be close to -1
C) Pearson correlation will be close to 0
D) None of these

Solution: (D)
We cannot comment on the correlation coefficient using statement 1 alone; we need to consider
both statements. Consider V1 as x and V2 as |x|: the correlation coefficient would not be close
to 1 in such a case.

21. Suppose the Pearson correlation between V1 and V2 is zero. In such a case, is it right to
conclude that V1 and V2 do not have any relation between them?
A) TRUE
B) FALSE
Solution: (B)
The Pearson correlation coefficient between two variables can be zero even when they have a
relationship. A zero coefficient only means that there is no linear relationship, i.e., they
don’t move together along a straight line. Examples are y = |x| or y = x^2.
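The examples y = x^2 and y = |x| can be checked numerically. On a grid that is symmetric around zero (an assumption that makes the result exact), the Pearson correlation is zero even though y is completely determined by x:

```python
import numpy as np

x = np.arange(-3, 4, dtype=float)  # -3, -2, ..., 3: symmetric around 0

r_square = np.corrcoef(x, x ** 2)[0, 1]   # y = x^2
r_abs = np.corrcoef(x, np.abs(x))[0, 1]   # y = |x|
```

On a non-symmetric grid (for example only positive x) both correlations become clearly positive, which is why the sign and size of r describe only the linear part of a relationship.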
22. True-False: Is overfitting more likely when you have a huge amount of data to train on?
A) TRUE
B) FALSE
Solution: (B)
With a small training dataset, it’s easier to find a hypothesis that fits the training data exactly,
i.e., to overfit.

23. We can also compute the coefficient of linear regression with the help of an analytical
method called “Normal Equation”. Which of the following is/are true about Normal Equation?
1. We don’t have to choose the learning rate
2. It becomes slow when number of features is very large
3. There is no need to iterate

A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1,2 and 3
Solution: (D)
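Instead of gradient descent, the normal equation θ = (XᵀX)⁻¹Xᵀy can be used to find the
coefficients in closed form, so there is no learning rate to choose and no iteration (statements
1 and 3). However, computing (XᵀX)⁻¹ costs roughly O(n³) in the number of features n, so it
becomes slow when n is very large (statement 2).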
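A minimal sketch of the normal equation in NumPy, assuming the design matrix already includes a column of ones for the intercept. It solves (XᵀX)θ = Xᵀy with `np.linalg.solve` rather than forming an explicit inverse, which is numerically safer:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form linear regression: solve (X^T X) theta = X^T y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)

x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column + feature
y = 1.0 + 2.0 * x                          # toy data on y = 1 + 2x
theta = normal_equation(X, y)              # [intercept, slope]
```

No learning rate and no loop appear anywhere, which is the point of statements 1 and 3.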

Question Context 24-26:


Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge
regression with penalty x.
24. Choose the option which describes bias in the best manner.
A) In case of very large x; bias is low
B) In case of very large x; bias is high
C) We can’t say about bias
D) None of these
Solution: (B)
A very large penalty makes the model less complex, so the bias would be high.

25. What will happen when you apply a very large penalty?
A) Some of the coefficients will become absolute zero
B) Some of the coefficients will approach zero but not absolute zero
C) Both A and B depending on the situation
D) None of these
Solution: (B)
In lasso, some of the coefficient values become exactly zero, but in ridge, the coefficients become
close to zero without reaching zero.
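To see ridge shrinkage that never reaches zero, here is the closed-form ridge solution w = (XᵀX + λI)⁻¹Xᵀy on a tiny hypothetical dataset. The orthogonal columns and the missing intercept are simplifying assumptions for this sketch:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge regression without an intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = 3.0 * X[:, 0] + 0.1 * X[:, 1]  # true coefficients: 3.0 and 0.1

# Coefficients shrink as the penalty grows but stay nonzero
coefs = {lam: ridge(X, y, lam) for lam in (0.0, 10.0, 1000.0)}
```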

26. What will happen when you apply a very large penalty in the case of Lasso?
A) Some of the coefficients will become zero
B) Some of the coefficients will approach zero but not absolute zero
C) Both A and B depending on the situation
D) None of these
Solution: (A)
As already discussed, lasso applies an absolute (L1) penalty, so some of the coefficients will become
exactly zero.
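For contrast, a minimal coordinate-descent sketch of lasso shows the soft-thresholding step that sets small coefficients exactly to zero. The conventions (no intercept, squared error scaled by 1/(2n)) and the helper names `soft_threshold` and `lasso_cd` are our own assumptions, not a specific library's API:

```python
import numpy as np

def soft_threshold(z, gamma):
    # Shrink z toward zero by gamma; return exactly 0 when |z| <= gamma
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove feature j's current contribution
            r_j = y - X @ w + X[:, j] * w[j]
            z = X[:, j] @ r_j / n
            w[j] = soft_threshold(z, alpha) / (X[:, j] @ X[:, j] / n)
    return w

X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = 3.0 * X[:, 0] + 0.1 * X[:, 1]

w = lasso_cd(X, y, alpha=0.5)      # small coefficient is set exactly to 0
w_big = lasso_cd(X, y, alpha=5.0)  # very large penalty: every coefficient is 0
```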

27. Which of the following statements is true about outliers in linear regression?
A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
Solution: (A)
In most cases, outliers noticeably change the slope of the regression line, because squaring the
errors gives large residuals a disproportionate weight. So linear regression is sensitive to outliers.
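A toy illustration (data invented for this sketch): a single corrupted point is enough to flip the sign of the least-squares slope fitted with `np.polyfit`:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_clean = 2.0 * x
y_outlier = y_clean.copy()
y_outlier[-1] = -10.0  # one corrupted observation (true value would be 10)

# np.polyfit returns [slope, intercept] for degree 1
slope_clean = np.polyfit(x, y_clean, 1)[0]      # 2.0
slope_outlier = np.polyfit(x, y_outlier, 1)[0]  # about -0.86
```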

28. Suppose you plotted a scatter plot between the residuals and predicted values in linear
regression and you found that there is a relationship between them. Which of the following
conclusions would you draw about this situation?

A) Since there is a relationship, our model is not good


B) Since there is a relationship, our model is good
C) Can’t say
D) None of these
Solution: (A)
There should not be any relationship between predicted values and residuals. If a relationship
exists, the model has not fully captured the information in the data.
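As a sketch on invented data, fitting a straight line to a quadratic relationship leaves residuals with a clear U-shaped pattern against the predicted values, which is exactly the kind of relationship this question warns about:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x ** 2                        # the true relationship is quadratic

slope, intercept = np.polyfit(x, y, 1)
y_pred = intercept + slope * x    # approx. [-2, 2, 6, 10, 14]
residuals = y - y_pred            # approx. [2, -1, -2, -1, 2]: U-shaped, not random
```

A model that had captured the structure (here, one with an x² term) would leave residuals scattered without a pattern.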

Question Context 29-31:


Suppose that you have a dataset D1 and you design a linear regression model with a degree 3
polynomial, and you find that the training and testing errors are “0”; in other words, it
perfectly fits the data.
29. What will happen when you fit degree 4 polynomial in linear regression?
A) There are high chances that degree 4 polynomial will over fit the data
B) There are high chances that degree 4 polynomial will under fit the data
C) Can’t say
D) None of these
Solution: (A)
Since a degree 4 model is more complex than the degree 3 model, it will again fit the training data
perfectly (overfitting). In such a case, the training error will be zero but the test error may not be
zero.
30. What will happen when you fit degree 2 polynomial in linear regression?
A) There are high chances that degree 2 polynomial will over fit the data
B) There are high chances that degree 2 polynomial will under fit the data
C) Can’t say
D) None of these
Solution: (B)
If a degree 3 polynomial fits the data perfectly, it’s highly likely that a simpler model (degree 2
polynomial) will underfit the data.

31. In terms of bias and variance, which of the following is true when you fit a degree 2
polynomial?

A) Bias will be high, variance will be high


B) Bias will be low, variance will be high
C) Bias will be high, variance will be low
D) Bias will be low, variance will be low
Solution: (C)
Since a degree 2 polynomial is less complex than a degree 3 polynomial, the bias will be high
and the variance will be low.
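The degree 2 / 3 / 4 comparison from questions 29 to 31 can be reproduced on hypothetical data generated by a cubic polynomial: degrees 3 and 4 drive the training error to (numerically) zero, while degree 2 cannot represent the cubic term:

```python
import numpy as np

x = np.arange(6, dtype=float)
y = x ** 3 - 2 * x  # data generated exactly by a degree 3 polynomial

def train_mse(degree):
    """Training mean squared error of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

mse = {d: train_mse(d) for d in (2, 3, 4)}
# mse[2] is clearly positive (underfit); mse[3] and mse[4] are ~0
```

Training error alone cannot tell the degree 3 and degree 4 models apart; held-out data is needed to reveal the extra variance of the degree 4 model.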

Question Context 32-33:


We have been given a dataset with n records in which we have input attribute x and output
attribute y. Suppose we use a linear regression method to model this data. To test our linear
regressor, we randomly split the data into a training set and a test set.
32. Now we increase the training set size gradually. As the training set size increases, what do
you expect will happen with the mean training error?

A) Increase
B) Decrease
C) Remain constant
D) Can’t Say
Solution: (D)
Training error may increase or decrease depending on the values that are used to fit the model.
If the values added to the training set gradually include more outliers, the error might increase.

33. What do you expect will happen with bias and variance as you increase the size of training
data?

A) Bias increases and Variance increases


B) Bias decreases and Variance increases
C) Bias decreases and Variance decreases
D) Bias increases and Variance decreases
E) Can’t Say
Solution: (D)
As we increase the size of the training data, the bias would increase while the variance would
decrease.

Question Context 34:


Consider the following data where one input (X) and one output (Y) are given.

34. What would be the root mean square training error for this data if you run a Linear
Regression model of the form (Y = A0+A1X)?

A) Less than 0
B) Greater than zero
C) Equal to 0
D) None of these
Solution: (C)
A line fits this data perfectly, so the root mean square training error will be zero.

Question Context 35-36:


Suppose you have been given the following scenario for training and validation error for Linear
Regression.
Scenario | Learning Rate | Number of iterations | Training Error | Validation Error
1        | 0.1           | 1000                 | 100            | 110
2        | 0.2           | 600                  | 90             | 105
3        | 0.3           | 400                  | 110            | 110
4        | 0.4           | 300                  | 120            | 130
5        | 0.4           | 250                  | 130            | 150

35. Which of the following scenarios would give you the right hyperparameters?
A) 1
B) 2
C) 3
D) 4
Solution: (B)
Scenario 2 (option B) is the best choice because it has the lowest training error (90) and the lowest
validation error (105).
36. Suppose you got the tuned hyperparameters from the previous question. Now, imagine
you want to add a variable to the variable space such that this added feature is important. Which
of the following would you observe in such a case?
A) Training Error will decrease and Validation error will increase
B) Training Error will increase and Validation error will increase
C) Training Error will increase and Validation error will decrease
D) Training Error will decrease and Validation error will decrease
E) None of the above
Solution: (D)
If the added feature is important, both the training and the validation error would decrease.

Question Context 37-38:


Suppose you find that your linear regression model is underfitting
the data.
37. In such a situation, which of the following options would you consider?
1. I will add more variables
2. I will start introducing polynomial degree variables
3. I will remove some variables
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
Solution: (A)
In case of underfitting, you need to introduce more variables into the variable space, or you can add
some polynomial degree variables, to make the model complex enough to fit the data
better.
38. Now the situation is the same as in the previous question (underfitting). Which of the following
regularization algorithms would you prefer?

A) L1
B) L2
C) Any
D) None of these
Solution: (D)
I won’t use any regularization method, because regularization is used to combat overfitting, not underfitting.

39. True-False: Is Logistic regression a supervised machine learning algorithm?


A) TRUE
B) FALSE
Solution: A
True. Logistic regression is a supervised learning algorithm because it uses true labels for
training. A supervised learning algorithm needs input variables (x) and a target variable (Y)
when you train the model.

40. True-False: Is Logistic regression mainly used for Regression?


A) TRUE
B) FALSE
Solution: B
Logistic regression is a classification algorithm; don’t be misled by the word “regression” in its name.

41. True-False: Is it possible to design a logistic regression algorithm using a Neural Network
Algorithm?
A) TRUE
B) FALSE
Solution: A
True. A neural network is a universal approximator, so it can implement logistic regression;
a single neuron with a sigmoid activation trained with log loss is exactly logistic regression.

42. True-False: Is it possible to apply a logistic regression algorithm on a 3-class classification
problem?
A) TRUE
B) FALSE
Solution: A
Yes. We can apply logistic regression to a 3-class classification problem by using the one-vs-all
method.

43. Which of the following methods do we use to best fit the data in Logistic Regression?
A) Least Square Error
B) Maximum Likelihood
C) Jaccard distance
D) Both A and B
Solution: B
Logistic regression uses maximum likelihood estimation (MLE) to fit its parameters.
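A minimal sketch of maximum-likelihood fitting for logistic regression by gradient ascent, on a toy 1-D dataset invented for illustration. The same code is a one-neuron network with a sigmoid activation, which ties in with question 41. The learning rate and iteration count are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.5, n_iter=2000):
    """Maximize the average Bernoulli log-likelihood by gradient ascent."""
    w0, w1 = 0.0, 0.0
    for _ in range(n_iter):
        p = sigmoid(w0 + w1 * x)
        # Gradient of the average log-likelihood w.r.t. (w0, w1)
        w0 += lr * np.mean(y - p)
        w1 += lr * np.mean((y - p) * x)
    return w0, w1

def log_likelihood(w0, w1, x, y):
    p = sigmoid(w0 + w1 * x)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
w0, w1 = fit_logistic(x, y)
predictions = (sigmoid(w0 + w1 * x) > 0.5).astype(float)
```

Because this toy data is perfectly separable, the likelihood keeps increasing as w1 grows; in practice a regularization term or a stopping rule keeps the weights finite.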

44. Which of the following evaluation metrics can not be applied in case of logistic regression
output to compare with target?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
Solution: D
Since logistic regression is a classification algorithm, its output is a class label or class probability
rather than a real-valued target, so mean squared error is not used to evaluate it.
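Log loss (cross-entropy) is the natural metric for probabilistic classifier outputs. A minimal NumPy version follows; the clipping constant `eps` is our own choice to avoid log(0):

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    """Average negative log-likelihood of binary labels under predicted P(y=1)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

loss_good = log_loss([1, 0], [0.8, 0.2])  # confident and right: -ln(0.8) ≈ 0.223
loss_bad = log_loss([1, 0], [0.2, 0.8])   # confident and wrong: -ln(0.2) ≈ 1.609
```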

45. One of the very good methods to analyze the performance of Logistic Regression is AIC,
which is similar to R-Squared in Linear Regression. Which of the following is true about AIC?
A) We prefer a model with minimum AIC value
B) We prefer a model with maximum AIC value
C) Both but depend on the situation
D) None of these
Solution: A
We select the logistic regression model with the lowest AIC.
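AIC is computed as AIC = 2k - 2 ln(L), where k is the number of parameters and L the maximized likelihood, so it rewards fit but penalizes complexity. The two models and their log-likelihoods below are hypothetical numbers used only to show the comparison:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fitted models: (maximized log-likelihood, number of parameters)
models = {"model_a": (-120.0, 3), "model_b": (-118.5, 6)}
scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)  # the model with the minimum AIC
```

Here model_b fits slightly better (higher log-likelihood) but its three extra parameters cost more than the gain, so model_a wins.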

46. [True-False] Standardisation of features is required before training a Logistic Regression.


A) TRUE
B) FALSE
Solution: B
Standardization isn’t required for logistic regression. The main goal of standardizing features is
to help the optimization technique (for example, gradient descent) converge.

47. Which of the following algorithms do we use for Variable Selection?


A) LASSO
B) Ridge
C) Both
D) None of these

Solution: A
In lasso we apply an absolute penalty; as the penalty increases, some of the variable
coefficients may become zero.
Context: 48-49

Consider the following model for logistic regression: P(y = 1 | x, w) = g(w0 + w1x)
where g(z) is the logistic function.

In the above equation, P(y = 1 | x; w), viewed as a function of x, is the function we get for a
given choice of the parameters w.

48. What would be the range of p in such a case?

A) (0, inf)
B) (-inf, 0 )
C) (0, 1)
D) (-inf, inf)

Solution: C

For values of x anywhere on the real line from −∞ to +∞, the logistic function gives an output
in the open interval (0, 1).
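A short NumPy check that the logistic function keeps p strictly inside (0, 1); the parameters w0 and w1 are arbitrary example values:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

w0, w1 = 0.5, -2.0             # arbitrary example parameters
x = np.linspace(-5, 5, 101)
p = logistic(w0 + w1 * x)      # P(y = 1 | x, w) for each x
midpoint = logistic(0.0)       # 0.5 where w0 + w1 * x = 0
```

Mathematically the output never reaches 0 or 1; in floating point, however, very large |z| can round to exactly 0.0 or 1.0, which is why practical implementations clip probabilities before taking logs.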

49. In the above question, which function makes p lie between 0 and 1?

A) logistic function
B) Log likelihood function
C) Mixture of both
D) None of them

Solution: A

The explanation is the same as for question 48: the logistic function maps any real input into (0, 1).

50. Suppose you have been given a fair coin and you want to find out the odds of getting heads.
Which of the following option is true for such a case?

A) odds will be 0
B) odds will be 0.5
C) odds will be 1
D) None of these

Solution: C

Odds are defined as the ratio of the probability of success to the probability of failure. For a fair
coin, the probability of success is 1/2 and the probability of failure is 1/2, so the odds are 1.

51. The logit function (given as l(x)) is the log of the odds function. What could be the range of the logit
function in the domain x=[0,1]?
A) (– ∞ , ∞)
B) (0,1)
C) (0, ∞)
D) (- ∞, 0)

Solution: A

For our purposes, the odds function has the advantage of transforming the probability function, which
has values from 0 to 1, into an equivalent function with values between 0 and ∞. When we take the
natural log of the odds function, we get a range of values from -∞ to ∞.

52. Which of the following option is true?

A) Linear Regression error values have to be normally distributed but in case of Logistic Regression it is
not the case
B) Logistic Regression error values have to be normally distributed but in case of Linear Regression it is
not the case
C) Both Linear Regression and Logistic Regression error values have to be normally distributed
D) Both Linear Regression and Logistic Regression error values do not have to be normally distributed

Solution:A

53. Which of the following is true regarding the logistic function for any value “x”?

Note:
Logistic(x): is a logistic function of any number “x”

Logit(x): is a logit function of any number “x”

Logit_inv(x): is an inverse logit function of any number “x”

A) Logistic(x) = Logit(x)
B) Logistic(x) = Logit_inv(x)
C) Logit_inv(x) = Logit(x)
D) None of these

Solution: B
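The odds, logit, and logistic functions from questions 50, 51 and 53 fit together as follows (a sketch; the function names are our own):

```python
import numpy as np

def odds(p):
    return p / (1 - p)

def logit(p):
    # Log-odds: maps (0, 1) onto (-inf, inf)
    return np.log(odds(p))

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

fair_coin_odds = odds(0.5)         # 1.0 (question 50)
fair_coin_logit = logit(0.5)       # 0.0, the centre of the logit's range
ps = np.array([0.01, 0.25, 0.5, 0.75, 0.99])
round_trip = logistic(logit(ps))   # recovers ps, so Logistic(x) = Logit_inv(x)
```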

54. Suppose you are given two scatter plots, “a” and “b”, for two classes (blue for the positive and red for the
negative class). In scatter plot “a”, you correctly classified all data points using logistic regression (the black
line is the decision boundary). How will the bias change on using high (infinite) regularisation?
A) Bias will be high
B) Bias will be low
C) Can’t say
D) None of these

Solution: A

The model will become very simple, so the bias will be very high.

55. Suppose you applied a Logistic Regression model on given data and got a training accuracy X
and testing accuracy Y. Now, you want to add a few new features to the same data. Select the
option(s) which is/are correct in such a case.

Note: Consider that the remaining parameters are the same.

A) Training accuracy increases


B) Training accuracy increases or remains the same
C) Testing accuracy decreases
D) Testing accuracy increases or remains the same

Solution: A and D

Adding more features to the model will increase the training accuracy because the model has more
information with which to fit the training data. Testing accuracy, however, increases only if the added
feature is found to be significant.

56. Choose which of the following options is true regarding One-Vs-All method in Logistic Regression.

A) We need to fit n models in n-class classification problem


B) We need to fit n-1 models to classify into n classes
C) We need to fit only 1 model to classify into n classes
D) None of these
Solution: A

If there are n classes, then n separate logistic regression models have to be fit, each predicting the
probability of one category versus all the other categories combined.
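A compact one-vs-all sketch: n binary logistic regressions, one per class, with the prediction taken from the most confident model. The toy 2-D data, learning rate, and iteration count are assumptions chosen so that each class is linearly separable from the rest:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, lr=0.1, n_iter=2000):
    """Gradient-descent logistic regression; X already includes a bias column."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def fit_one_vs_all(X, labels, n_classes):
    Xb = np.column_stack([np.ones(len(X)), X])
    # One model per class: class k (label 1) versus all other classes (label 0)
    return [fit_binary(Xb, (labels == k).astype(float)) for k in range(n_classes)]

def predict(models, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    scores = np.column_stack([sigmoid(Xb @ w) for w in models])
    return np.argmax(scores, axis=1)  # class whose model is most confident

X = np.array([[0, 4], [0, 6], [1, 5],         # class 0
              [4, 0], [6, 0], [5, 1],         # class 1
              [-5, -4], [-4, -5], [-5, -5]],  # class 2
             dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
models = fit_one_vs_all(X, labels, n_classes=3)  # n = 3 classes -> 3 models
```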

57. Below are two different logistic models with different values for β0 and β1.

Which of the following statement(s) is true about β0 and β1 values of the two logistic models (Green, Black)?

Note: consider Y = β0 + β1*X. Here, β0 is intercept and β1 is coefficient.

A) β1 for Green is greater than Black


B) β1 for Green is lower than Black
C) β1 for both models is same
D) Can’t Say

Solution: B

The black curve has β0 = 0, β1 = 1 and the green curve has β0 = 0, β1 = −1, so β1 for Green is lower than for Black.

Context 58-60

Below are three scatter plots (A, B, C, from left to right) with hand-drawn decision boundaries for logistic
regression.
58. Which of the above figures shows that the decision boundary is overfitting the training
data?

A) A
B) B
C) C
D) None of these

Solution: C

In figure C, the decision boundary is not smooth, which means it is overfitting the data.

59. What do you conclude after seeing this visualization?

1. The training error in the first plot is maximum compared with the second and third plots.

2. The best model for this regression problem is the last (third) plot because it has minimum
training error (zero).

3. The second model is more robust than first and third because it will perform best on unseen
data.

4. The third model overfits more than the first and second.

5. All will perform same because we have not seen the testing data.

A) 1 and 3
B) 1 and 3
C) 1, 3 and 4
D) 5

Solution: C

The trend in the graphs looks like a quadratic trend over the independent variable X. A higher degree (right
graph) polynomial might have a very high accuracy on the training population but is expected to fail badly
on the test dataset. In the left graph, the training error is maximum because the model underfits the
training data.

60. Suppose the above decision boundaries were generated for different values of regularization.
Which of the above decision boundary shows the maximum regularization?

A) A
B) B
C) C
D) All have equal regularization

Solution: A

More regularization means a larger penalty and therefore a less complex decision boundary, which is
what figure A shows.

61. Suppose you are using a Logistic Regression model on a huge dataset. One of the problems you may face
on such huge data is that logistic regression will take a very long time to train. What would you do if you
want to train logistic regression on the same data in less time while getting comparable (though possibly
not identical) accuracy?

A) Decrease the learning rate and decrease the number of iterations


B) Decrease the learning rate and increase the number of iterations
C) Increase the learning rate and increase the number of iterations
D) Increase the learning rate and decrease the number of iterations

Solution: D

If you decrease the number of iterations, training will surely take less time, but on its own this will not
give the same accuracy; to get similar (though not identical) accuracy, you also need to increase the learning rate.
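A toy sketch of this trade-off. It uses plain gradient descent on f(w) = (w - 3)^2 rather than logistic regression, a deliberate simplification: a larger learning rate reaches the same tolerance in far fewer iterations. The helper name and tolerance are our own choices:

```python
def iterations_to_converge(lr, w0=0.0, target=3.0, tol=1e-6, max_iter=10_000):
    """Gradient descent on f(w) = (w - target)^2; returns steps needed to reach tol."""
    w = w0
    for step in range(1, max_iter + 1):
        w -= lr * 2 * (w - target)  # gradient of (w - target)^2 is 2 * (w - target)
        if abs(w - target) < tol:
            return step
    return max_iter  # did not converge within max_iter

slow = iterations_to_converge(lr=0.05)  # error shrinks by 0.9 per step
fast = iterations_to_converge(lr=0.4)   # error shrinks by 0.2 per step
```

The learning rate cannot be raised without limit: on this objective, any learning rate above 1.0 makes the iterates diverge.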

62. Following is the loss function in logistic regression (y-axis: loss; x-axis: log probability) for a
two-class classification problem. Which of the following images shows the cost function for y = 1?

Note: Y is the target class


A) A
B) B
C) Both
D) None of these

Solution: A

A is the correct answer, as for y = 1 the loss decreases as the log probability increases.

63. Suppose the following graph is a cost function for logistic regression.

Now, how many local minima are present in the graph?

A) 1
B) 2
C) 3
D) 4

Solution: C
There are three local minima present in the graph

64. Can a Logistic Regression classifier do a perfect classification on the below data?

Note: You can use only the X1 and X2 variables, where X1 and X2 can take only the binary values 0 and 1.

A) TRUE
B) FALSE
C) Can’t say
D) None of these

Solution: B

No. Logistic regression only forms a linear decision surface, but the examples in the figure are not linearly
separable.
UNIT IV

1. SVMs are less effective when:

A) The data is linearly separable


B) The data is clean and ready to use
C) The data is noisy and contains overlapping points

Ans Solution: C

When the data has noise and overlapping points, it is difficult to draw a clear separating hyperplane
without misclassifying some points.

2. The cost parameter in the SVM means:

A) The number of cross-validations to be made


B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above

Ans Solution: C

The cost parameter decides how much an SVM should be allowed to “bend” with the data. For a low
cost, you aim for a smooth decision surface and for a higher cost, you aim to classify more points
correctly. It is also simply referred to as the cost of misclassification.
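The trade-off that C controls is visible in the standard soft-margin primal objective, 0.5·||w||² + C·Σ max(0, 1 - yᵢ(w·xᵢ + b)). The sketch below evaluates it for a fixed, hypothetical w and b on three invented points; one point sits inside the margin, and raising C magnifies its cost:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y_i (w . x_i + b))."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * (w @ w) + C * hinge.sum()

X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 0.0])  # hypothetical weight vector, b = 0

low_c = soft_margin_objective(w, 0.0, X, y, C=1.0)    # 0.5 + 1 * 0.5 = 1.0
high_c = soft_margin_objective(w, 0.0, X, y, C=10.0)  # 0.5 + 10 * 0.5 = 5.5
```

With a small C, the margin term 0.5·||w||² dominates and a smooth, wide-margin boundary is preferred; with a large C, the optimizer bends the boundary to push every point outside the margin.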

3. Which of the following are real world applications of the SVM?

A) Text and Hypertext Categorization


B) Image Classification
C) Clustering of News Articles
D) All of the above

Ans Solution: D

SVMs are versatile models that have been applied to a wide range of real-world problems, from
regression to clustering and handwriting recognition.

4. Which of the following is true about Naive Bayes?

A) Assumes that all the features in a dataset are equally important

B) Assumes that all the features in a dataset are independent

C) Both A and B

D) None of the above options


Ans Solution: C

5. What do you mean by generalization error in terms of the SVM?

A) How far the hyperplane is from the support vectors


B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM

Ans Solution: B

Generalisation error in statistics is generally the out-of-sample error which is the measure of how
accurately a model can predict values for previously unseen data.

6. SVMs are less effective when:

A) The data is linearly separable


B) The data is clean and ready to use
C) The data is noisy and contains overlapping points

Ans Solution: C

When the data has noise and overlapping points, it is difficult to draw a clear separating hyperplane
without misclassifying some points.

7. What is/are true about kernels in SVM?

1. A kernel function maps low dimensional data to a high dimensional space
2. It’s a similarity function

A) 1
B) 2
C) 1 and 2
D) None of these

Ans Solution: C

Both the given statements are correct.
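Both statements can be seen in a small example: the degree-2 polynomial kernel k(x, z) = (x·z)² is a similarity score computed in the original 2-D space, yet it equals an ordinary dot product after mapping into 3-D with φ(x) = (x1², √2·x1·x2, x2²). The points are arbitrary, and `phi` is our own helper:

```python
import numpy as np

def poly2_kernel(x, z):
    # Degree-2 homogeneous polynomial kernel, computed in the original 2-D space
    return (x @ z) ** 2

def phi(x):
    # Explicit 2-D -> 3-D feature map whose dot product equals poly2_kernel
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
k_implicit = poly2_kernel(x, z)  # (1*3 + 2*(-1))^2 = 1
k_explicit = phi(x) @ phi(z)     # 9 - 12 + 4 = 1, via the 3-D features
```

The kernel never builds the 3-D vectors, which is what makes very high-dimensional (even infinite-dimensional, for the RBF kernel) feature spaces affordable.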

Question Context: 8–9

Suppose you are using a linear SVM classifier on a 2-class classification problem. You have been
given the following data, in which some points, circled in red, represent the support vectors.
8. If you remove any one of the red-circled points from the data, will the decision boundary
change?

A) Yes
B) No

Solution: A

These three examples are positioned such that removing any one of them introduces slack in the
constraints. So the decision boundary would completely change.

9. [True or False] If you remove the points that are not circled in red from the data, will the decision
boundary change?

A) True
B) False

Solution: B

By contrast, the rest of the points in the data won’t affect the decision boundary much.

10. What do you mean by generalization error in terms of the SVM?

A) How far the hyperplane is from the support vectors


B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM

Solution: B

Generalization error in statistics is generally the out-of-sample error which is the measure of how
accurately a model can predict values for previously unseen data.
11. When the C parameter is set to infinite, which of the following holds true?

A) The optimal hyperplane, if it exists, will be the one that completely separates the data
B) The soft-margin classifier will separate the data
C) None of the above

Solution: A

With such a high misclassification penalty, the soft margin effectively disappears, as there is no
room for error.

12. What do you mean by a hard margin?

A) The SVM allows very low error in classification


B) The SVM allows high amount of error in classification
C) None of the above

Solution: A

A hard margin means that an SVM is very rigid in classification and tries to fit the training set
extremely well, which can cause overfitting.

13. The minimum time complexity for training an SVM is O(n²). According to this fact, what sizes of
datasets are not best suited for SVMs?

A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter

Solution: A

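Because training time grows at least quadratically with the number of samples, SVMs are poorly suited
to large datasets; they work best on smaller datasets with a clear classification boundary.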

14. The effectiveness of an SVM depends upon:

A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above

Solution: D

The SVM effectiveness depends upon how you choose the basic 3 requirements mentioned above in
such a way that it maximises your efficiency, reduces error and overfitting.

15. Support vectors are the data points that lie closest to the decision surface.
A) TRUE
B) FALSE

Solution: A

They are the points closest to the hyperplane and the hardest ones to classify. They also have a direct
bearing on the location of the decision surface.

16. SVMs are less effective when:

A) The data is linearly separable


B) The data is clean and ready to use
C) The data is noisy and contains overlapping points

Solution: C

When the data has noise and overlapping points, it is difficult to draw a clear separating hyperplane
without misclassifying some points.

17. Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?

A) The model would consider even far away points from hyperplane for modeling
B) The model would consider only the points close to the hyperplane for modeling
C) The model would not be affected by distance of points from hyperplane for modeling
D) None of the above

Solution: B

The gamma parameter in SVM tuning signifies the influence of points either near or far away from the
hyperplane.

For a low gamma, the model will be too constrained and include all points of the training dataset,
without really capturing the shape.

For a higher gamma, the model will capture the shape of the dataset well.
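The Gaussian (RBF) kernel exp(-gamma·||x - z||²) shows why gamma controls reach: with a high gamma, the similarity between two points even at distance 1 collapses toward zero, so each training point influences only its immediate neighborhood. The values below are for illustration only:

```python
import numpy as np

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * (diff @ diff))

x, z = [0.0, 0.0], [1.0, 0.0]          # two points at distance 1
k_low = rbf_kernel(x, z, gamma=0.1)    # exp(-0.1) ≈ 0.905: still strongly similar
k_high = rbf_kernel(x, z, gamma=10.0)  # exp(-10) ≈ 4.5e-5: influence almost gone
```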

18. The cost parameter in the SVM means:

A) The number of cross-validations to be made


B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above

Solution: C
The cost parameter decides how much an SVM should be allowed to “bend” with the data. For a low
cost, you aim for a smooth decision surface and for a higher cost, you aim to classify more points
correctly. It is also simply referred to as the cost of misclassification.

19. Suppose you are building an SVM model on data X. The data X can be error-prone, which means that
you should not trust any specific data point too much. Now think that you want to build an SVM model
with a quadratic kernel (polynomial of degree 2) that uses the slack penalty C as one of its
hyperparameters. Based on that, answer the following question.

What would happen when you use a very large value of C (C → infinity)?

Note: With a small C, all data points were also classified correctly.

A) We can still classify data correctly for given setting of hyper parameter C
B) We can not classify data correctly for given setting of hyper parameter C
C) Can’t Say
D) None of these

Solution: A

For large values of C, the penalty for misclassifying points is very high, so the decision boundary will
perfectly separate the data if possible.

20. What would happen when you use very small C (C~0)?

A) Misclassification would happen


B) Data will be correctly classified
C) Can’t say
D) None of these

Solution: A

The classifier can maximize the margin between most of the points, while misclassifying a few points,
because the penalty is so low.

21. If I am using all features of my dataset and I achieve 100% accuracy on my training set, but ~70% on
validation set, what should I look out for?

A) Underfitting
B) Nothing, the model is perfect
C) Overfitting

Solution: C

A gap between 100% training accuracy and ~70% validation accuracy is a typical sign that the model is
overfitting the data.
22. Which of the following are real world applications of the SVM?

A) Text and Hypertext Categorization


B) Image Classification
C) Clustering of News Articles
D) All of the above

Solution: D

SVMs are versatile models that have been applied to a wide range of real-world problems, from
regression to clustering and handwriting recognition.

Question Context: 23 – 25

Suppose you have trained an SVM with a linear decision boundary. After training, you correctly infer
that your SVM model is underfitting.

23. Which of the following options would you most likely consider when iterating on the SVM next time?

A) You want to increase your data points


B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features

Solution: C

The best option here would be to create more features for the model.

24. Suppose you gave the correct answer in the previous question. What do you think is actually
happening?

1. We are lowering the bias


2. We are lowering the variance
3. We are increasing the bias
4. We are increasing the variance

A) 1 and 2
B) 2 and 3
C) 1 and 4
D) 2 and 4

Solution: C

A more complex model lowers the bias and increases the variance.
25. In the above question, suppose you want to change one of the SVM's hyperparameters so that the effect
is the same as in the previous questions, i.e., the model will not underfit.

A) We will increase the parameter C


B) We will decrease the parameter C
C) Changing C has no effect
D) None of these

Solution: A

Increasing C is the right thing to do here: a larger C penalizes misclassification more heavily, which
weakens regularization and yields a more complex model that is less likely to underfit.

26. We usually use feature normalization before using the Gaussian kernel in SVM. What is true about
feature normalization?

1. We do feature normalization so that no feature will dominate the others
2. Sometimes, feature normalization is not feasible in the case of categorical variables
3. Feature normalization always helps when we use a Gaussian kernel in SVM

A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3

Solution: B

Statements one and two are correct.
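Statement 1 can be illustrated with a hypothetical two-feature dataset in which one column is on a scale about 1000 times larger than the other. Before normalization, that column contributes almost all of the squared distance that a Gaussian kernel sees; after z-scoring each column, both contribute comparably:

```python
import numpy as np

# Hypothetical features: column 0 in single units, column 1 in thousands (e.g., income)
X = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]])

def squared_diffs(A, i, j):
    # Per-feature contribution to the squared Euclidean distance inside a Gaussian kernel
    return (A[i] - A[j]) ** 2

raw = squared_diffs(X, 0, 1)                  # [1, 4_000_000]: column 1 dominates
X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each column
scaled = squared_diffs(X_std, 0, 1)           # [1.5, 6.0]: comparable contributions
```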

Question Context: 27-29

Suppose you are dealing with a 4-class classification problem and you want to train an SVM model on the
data using the one-vs-all method. Now answer the questions below.

27. How many times do we need to train our SVM model in such a case?

A) 1
B) 2
C) 3
D) 4

Solution: D

For a 4 class problem, you would have to train the SVM at least 4 times if you are using a one-vs-all
method.
28. Suppose the classes have the same distribution in the data. Now, say that training once in the one-vs-all
setting takes the SVM 10 seconds. How many seconds would it require to train the one-vs-all method end
to end?

A) 20
B) 40
C) 60
D) 80

Solution: B

It would take 10×4 = 40 seconds

29. Suppose your problem has changed, and the data now has only 2 classes. How many times do you
think we need to train the SVM in such a case?

A) 1
B) 2
C) 3
D) 4

Solution: A

Training the SVM only one time would give you appropriate results

Question context: 30 –31

Suppose you are using an SVM with a polynomial kernel of degree 2. Now think that you have applied
this on data and found that it perfectly fits the data, which means that training and testing accuracy are 100%.

30. Now, suppose you increase the complexity (or the polynomial degree of this kernel). What do
you think will happen?

A) Increasing the complexity will overfit the data
B) Increasing the complexity will underfit the data
C) Nothing will happen since your model was already 100% accurate
D) None of these

Solution: A

Increasing the complexity of the model would make the algorithm overfit the data.

31. In the previous question, after increasing the complexity you found that training accuracy was still 100%. What is the reason behind that?

1. Since the data is fixed and we are fitting more polynomial terms or parameters, the algorithm starts memorizing everything in the data
2. Since the data is fixed, the SVM doesn't need to search in a big hypothesis space

A) 1
B) 2
C) 1 and 2
D) None of these

Solution: C

Both the given statements are correct.

32. What is/are true about kernel in SVM?

1. A kernel function maps low-dimensional data to a high-dimensional space
2. It's a similarity function

A) 1
B) 2
C) 1 and 2
D) None of these

Solution: C

Both the given statements are correct.
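Both statements can be seen with the homogeneous degree-2 polynomial kernel K(x, z) = (x·z)². For 2-D inputs it equals an ordinary dot product after the explicit map φ(x) = (x₁², √2·x₁x₂, x₂²) into 3-D space, and being a dot product it acts as a similarity score. A minimal sketch with hypothetical vectors:

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit map from 2-D to 3-D whose dot product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x, z = (1.0, 2.0), (3.0, -1.0)  # hypothetical inputs
print(poly2_kernel(x, z), dot(phi(x), phi(z)))  # equal up to floating-point rounding
```

The kernel computes the high-dimensional dot product without ever building φ(x), which is what makes high-degree or infinite-dimensional maps (such as the Gaussian kernel's) practical.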

UNIT V

1. Which of the following is a widely used and effective machine learning algorithm based on the
idea of bagging?

a) Decision Tree
b) Regression
c) Classification
d) Random Forest

Ans D

2. Which of the following is a disadvantage of decision trees?

a) Factor analysis
b) Decision trees are robust to outliers
c) Decision trees are prone to overfitting
d) None of the above

Ans C

3. Can decision trees be used for performing clustering?

a. True
b. False

Ans Solution: (A)

Decision trees can also be used to form clusters in the data, but clustering often generates natural clusters and is not dependent on any objective function.

4. Which of the following algorithms is most sensitive to outliers?

a. K-means clustering algorithm
b. K-medians clustering algorithm
c. K-modes clustering algorithm
d. K-medoids clustering algorithm

Ans Solution: (A)
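K-means places each centroid at the mean of its cluster, and a mean is pulled by extreme values, whereas K-medians uses the median and K-medoids uses an actual data point. A two-line comparison with hypothetical values illustrates the difference:

```python
from statistics import mean, median

values = [1, 2, 3, 4, 100]  # hypothetical cluster with one outlier (100)

centre_kmeans = mean(values)      # pulled far towards the outlier
centre_kmedians = median(values)  # stays with the bulk of the data
print(centre_kmeans, centre_kmedians)  # 22 3
```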

5. Sentiment Analysis is an example of:

1. Regression
2. Classification
3. Clustering
4. Reinforcement Learning

Options:

a. 1 Only
b. 1 and 2
c. 1 and 3
d. 1, 2 and 4

Ans D

6. Which of the following is the most appropriate strategy for data cleaning before performing clustering analysis, given a less than desirable number of data points?

1. Capping and flooring of variables
2. Removal of outliers

Options:
a. 1 only
b. 2 only
c. 1 and 2
d. None of the above

Ans A

7 Which of the following is/are true about bagging trees?

1. In bagging trees, individual trees are independent of each other
2. Bagging is the method for improving the performance by aggregating the results of weak learners

A) 1
B) 2
C) 1 and 2
D) None of these

Ans Solution: C

Both options are true. In bagging, the individual trees are independent of each other because each one is trained on its own bootstrap sample of the data (a random forest additionally draws random subsets of the features).
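A minimal Python sketch of the two ingredients of bagging: each tree gets its own bootstrap sample, drawn with replacement and independently of the other trees, and the trees' predictions are combined by majority vote (averaging would be used for regression). The function names and the toy row indices are illustrative assumptions; no actual tree is trained here.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as each bagged tree does."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Combine the class predictions of the individual trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = list(range(10))  # hypothetical row indices of a training set
samples = [bootstrap_sample(rows, rng) for _ in range(3)]  # one sample per tree
print(samples)
print(majority_vote(["spam", "ham", "spam"]))  # spam
```

Because no sample depends on another tree's output, the trees could be trained in parallel, which is the practical meaning of "independent" in statement 1.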

8. Which of the following is/are true about boosting trees?

1. In boosting trees, individual weak learners are independent of each other
2. It is the method for improving the performance by aggregating the results of weak learners

A) 1
B) 2
C) 1 and 2
D) None of these

Ans Solution: B

In boosting, the individual weak learners are not independent of each other, because each tree corrects the errors of the previous trees. Both bagging and boosting can be considered ways of improving on the results of the base learners.
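The dependence between boosting rounds shows up in AdaBoost's reweighting step, sketched below in Python (AdaBoost is used here as a concrete example of boosting; gradient boosting instead fits each new tree to the errors of the current ensemble). Points the current weak learner misclassifies gain weight, so the next learner's training problem depends on this one's mistakes. The four equally weighted points and the single mistake are hypothetical.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One AdaBoost round; assumes the weighted error is strictly between 0 and 0.5."""
    error = sum(w for w, miss in zip(weights, misclassified) if miss)
    alpha = 0.5 * math.log((1 - error) / error)  # this learner's vote weight
    # Up-weight mistakes, down-weight correct points, then renormalize.
    new = [w * math.exp(alpha if miss else -alpha) for w, miss in zip(weights, misclassified)]
    total = sum(new)
    return alpha, [w / total for w in new]


alpha, weights = adaboost_reweight([0.25] * 4, [True, False, False, False])
print(round(alpha, 3), [round(w, 3) for w in weights])  # 0.549 [0.5, 0.167, 0.167, 0.167]
```

After one round, the single misclassified point carries half of the total weight, so the next weak learner is pushed to get it right.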

9. In Random forest you can generate hundreds of trees (say T1, T2, …, Tn) and then aggregate the results of these trees. Which of the following is true about an individual (Tk) tree in Random Forest?

1. Individual tree is built on a subset of the features

2. Individual tree is built on all the features

3. Individual tree is built on a subset of observations

4. Individual tree is built on full set of observations

A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4

Ans Solution: A

Random forest is based on the bagging concept: it considers a fraction of the samples and a fraction of the features when building each individual tree.
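What each tree Tk sees can be sketched as follows: a bootstrap sample of the rows plus a random subset of the features. This is a simplified illustration with hypothetical names; Breiman's random forest, like scikit-learn's implementation, actually redraws the feature subset at every split rather than once per tree.

```python
import random


def tree_inputs(n_rows, features, max_features, rng):
    """Rows and features one random-forest tree is built from (simplified)."""
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]  # bootstrap: with replacement
    feats = rng.sample(features, max_features)            # subset, without replacement
    return rows, feats


rng = random.Random(42)
features = ["age", "income", "tenure", "region", "score"]  # hypothetical columns
for k in range(1, 4):
    rows, feats = tree_inputs(100, features, 2, rng)
    print(f"T{k}: {len(set(rows))} distinct rows, features {feats}")
```

Each tree typically sees only about two thirds of the distinct rows and a few of the columns, which is why statements 1 and 3 hold.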

10. Suppose you are using a bagging-based algorithm, say a RandomForest, in model building. Which of the following can be true?

1. The number of trees should be as large as possible

2. You will have interpretability after using Random Forest

A) 1
B) 2
C) 1 and 2
D) None of these

Ans Solution: A

Since Random Forest aggregates the results of different weak learners, we would want as many trees as is feasible in model building. Random Forest is a black-box model, so you lose interpretability after using it.

11. Which of the following is/are true about Random Forest and Gradient Boosting ensemble
methods?

1. Both methods can be used for classification task

2. Random Forest is used for classification whereas Gradient Boosting is used for regression task

3. Random Forest is used for regression whereas Gradient Boosting is used for classification task

4. Both methods can be used for regression task


A) 1
B) 2
C) 3
D) 4
E) 1 and 4

Solution: E

Both algorithms are designed for classification as well as regression tasks.

12. In Random forest you can generate hundreds of trees (say T1, T2, …, Tn) and then aggregate the results of these trees. Which of the following is true about an individual (Tk) tree in Random Forest?

1. Individual tree is built on a subset of the features

2. Individual tree is built on all the features

3. Individual tree is built on a subset of observations

4. Individual tree is built on full set of observations

A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4

Solution: A

Random forest is based on the bagging concept: it considers a fraction of the samples and a fraction of the features when building each individual tree.

13. Which of the following algorithms don't use learning rate as one of their hyperparameters?

1. Gradient Boosting

2. Extra Trees

3. AdaBoost

4. Random Forest

A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4

Solution: D

Random Forest and Extra Trees don't have learning rate as a hyperparameter.

14. Which of the following is not an example of an ensemble learning algorithm?

A) Random Forest
B) Adaboost
C) Extra Trees
D) Gradient Boosting
E) Decision Trees

Solution: E

A single decision tree doesn't aggregate the results of multiple trees, so it is not an ensemble algorithm.

15. Suppose you are using a bagging-based algorithm, say a RandomForest, in model building. Which of the following can be true?

1. The number of trees should be as large as possible

2. You will have interpretability after using RandomForest

A) 1
B) 2
C) 1 and 2
D) None of these

Solution: A

Since Random Forest aggregates the results of different weak learners, we would want as many trees as is feasible in model building. Random Forest is a black-box model, so you lose interpretability after using it.

16. True-False: Bagging is suitable for high-variance, low-bias models.

A) TRUE
B) FALSE

Solution: A

Bagging is suitable for high-variance, low-bias models, in other words for complex models, because averaging many such models reduces their variance.

17. To apply bagging to regression trees, which of the following is/are true?

1. We build N regression trees with N bootstrap samples

2. We take the average of the N regression trees

3. Each tree has high variance with low bias


A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1,2 and 3

Solution: D

All of the options are correct and self-explanatory

18. How do we select the best hyperparameters in tree-based models?

A) Measure performance over training data
B) Measure performance over validation data
C) Both of these
D) None of these

Solution: B

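Hyperparameters should be compared on validation data: performance on training data favours overly complex models, and the test data should be kept aside for the final evaluation.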

19. In which of the following scenarios is gain ratio preferred over Information Gain?

A) When a categorical variable has a very large number of categories
B) When a categorical variable has a very small number of categories
C) The number of categories is not the reason
D) None of these

Solution: A

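For high-cardinality categorical variables, gain ratio is preferred over Information Gain, because Information Gain is biased towards attributes with many distinct values; gain ratio corrects this by dividing by the split information.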
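The bias that gain ratio corrects is easy to reproduce. In the Python sketch below (hypothetical data), an ID-like attribute with a distinct value per row and a genuine binary attribute both split the labels perfectly, so both get an Information Gain of 1 bit; dividing by the split information (the entropy of the attribute's own values) halves the ID attribute's score while leaving the binary attribute at 1.

```python
import math
from collections import Counter


def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(attribute, labels):
    groups = {}
    for a, label in zip(attribute, labels):
        groups.setdefault(a, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(attribute, labels):
    # Split information = entropy of the attribute's own value distribution
    # (assumed non-zero, i.e. the attribute takes at least two values).
    return information_gain(attribute, labels) / entropy(attribute)


labels = ["yes", "yes", "no", "no"]  # hypothetical target
row_id = ["r1", "r2", "r3", "r4"]    # high-cardinality, ID-like attribute
binary = ["x", "x", "y", "y"]        # two-valued attribute
for name, attr in (("row_id", row_id), ("binary", binary)):
    print(name, information_gain(attr, labels), gain_ratio(attr, labels))
```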

20. Suppose you are given the following training and validation errors for Gradient Boosting at different tree depths. Which hyperparameter setting would you choose?

Scenario  Depth  Training Error  Validation Error
1         2      100             110
2         4      90              105
3         6      50              100
4         8      45              105
5         10     30              150

A) 1
B) 2
C) 3
D) 4

Solution: C

Scenario 3 (depth 6) has the lowest validation error (100). Scenarios 2 and 4 tie at 105, and between those two the shallower depth of scenario 2 would be preferred, but neither beats scenario 3.

21. Which of the following is/are not true about the DBSCAN clustering algorithm?

1. For data points to be in a cluster, they must be within a distance threshold of a core point

2. It has strong assumptions for the distribution of data points in dataspace

3. It has substantially high time complexity of order O(n³)

4. It does not require prior knowledge of the no. of desired clusters

5. It is robust to outliers

Options:

A. 1 only
B. 2 only
C. 4 only
D. 2 and 3

Solution: D

- DBSCAN can form clusters of arbitrary shape and does not make strong assumptions about the distribution of data points in the data space.
- DBSCAN's average time complexity is O(n log n) when a spatial index is used for the neighbourhood queries (O(n²) in the worst case), far below O(n³).
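Statements 1, 4 and 5 are visible in a compact DBSCAN sketch: clusters grow outward from core points, the number of clusters is never specified, and isolated points are labelled noise (-1). This is a simplified Python illustration with a naive O(n) neighbour search per point, which makes it O(n²) overall; the O(n log n) figure assumes a spatial index. The points and the `eps` and `min_pts` values are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) for each point, or -1 for noise."""
    labels = [None] * len(points)  # None = not visited yet

    def region(i):
        # Naive neighbourhood query (includes the point itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:  # not a core point: provisionally noise
            labels[i] = -1
            continue
        cluster += 1                   # start a new cluster from core point i
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:        # former noise becomes a border point
                labels[j] = cluster
                continue
            if labels[j] is not None:  # already assigned
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # j is a core point too: keep expanding
                queue.extend(j_neighbours)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]  # hypothetical
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```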

22. Point out the correct statement.

a) The choice of an appropriate metric will influence the shape of the clusters
b) Hierarchical clustering is also called HCA
c) In general, the merges and splits are determined in a greedy manner
d) All of the mentioned
Answer: d
Explanation: Some elements may be close to one another according to one distance and farther away
according to another.

23. Which of the following is required by K-means clustering?

a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned

Answer: d
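Explanation: K-means follows a partitioning approach: it needs a distance metric to assign points to clusters, the number of clusters k, and an initial guess of the centroids from which to start iterating.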

24. Point out the wrong statement.

a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned

Answer: c
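Explanation: k-nearest neighbour is a supervised method for classification or regression, whereas k-means is an unsupervised clustering method; apart from the name, the two are unrelated.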

25. Which of the following functions is used for k-means clustering?

a) k-means
b) k-mean
c) heat map
d) none of the mentioned

Answer: a
Explanation: K-means requires a number of clusters.

26. K-means is not deterministic, and it also consists of a number of iterations.

a) True
b) False

Answer: a
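Explanation: K-means usually starts from randomly chosen initial centroids, so different runs can converge to different clusterings; it then iterates assignment and update steps until it produces the final estimate of the cluster centroids.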
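A minimal Lloyd's-algorithm sketch in Python makes the three requirements from question 23 explicit (a distance metric, here Euclidean via `math.dist`; the number of clusters, implied by the number of initial centroids; and the initial centroids themselves) and shows why K-means is not deterministic: the same four hypothetical points end up in different clusters depending on where the algorithm starts.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[k].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


points = [(0, 0), (0, 2), (10, 0), (10, 2)]  # hypothetical data
print(kmeans(points, [(0, 0), (10, 0)])[0])  # [(0.0, 1.0), (10.0, 1.0)]
print(kmeans(points, [(0, 0), (0, 2)])[0])   # [(5.0, 0.0), (5.0, 2.0)]
```

Both runs converge, but to different partitions (left/right versus bottom/top), which is why K-means is often restarted from several random initializations and the best result kept.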
27.
((MARKS)) 1
((QUESTION)) Which of the following steps / assumptions in regression modeling impacts the trade-off between under-fitting and over-fitting the most?
((OPTION_A)) The polynomial degree
((OPTION_B)) Whether we learn the weights by matrix inversion or gradient descent
((OPTION_C)) The use of a constant-term
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))

((MARKS)) 1
((QUESTION)) Suppose you have the following data with one real-value input variable and one real-value output variable. What is the leave-one-out cross-validation mean squared error in the case of linear regression (Y = bX + c)?
[Data plot not reproduced in this document.]
((OPTION_A)) 10/27
((OPTION_B)) 20/27
((OPTION_C)) 50/27
((OPTION_D)) 49/27
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))
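Leave-one-out cross-validation refits the model n times, each time holding out one point and squaring its prediction error. The Python sketch below does this exactly with `fractions.Fraction`. Because the question's data plot is not reproduced here, the three points used are hypothetical and only illustrate the procedure.

```python
from fractions import Fraction


def fit_line(xs, ys):
    """Exact least-squares fit of y = b*x + c."""
    n = len(xs)
    x_bar, y_bar = Fraction(sum(xs), n), Fraction(sum(ys), n)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return b, y_bar - b * x_bar


def loocv_mse(xs, ys):
    """Hold out each point in turn, fit on the rest, and average the squared errors."""
    squared_errors = []
    for i in range(len(xs)):
        b, c = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared_errors.append((ys[i] - (b * xs[i] + c)) ** 2)
    return sum(squared_errors) / len(squared_errors)


print(loocv_mse([0, 1, 2], [1, 3, 2]))  # hypothetical points -> 27/4
```

With only three points, each held-out fit is simply the line through the other two, so the procedure is easy to check by hand.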

((MARKS)) 1
((QUESTION)) Which of the following is/are true about “Maximum Likelihood estimate (MLE)”?
1. MLE may not always exist
2. MLE always exists
3. If the MLE exists, it (they) may not be unique
4. If the MLE exists, it (they) must be unique
((OPTION_A)) 1 and 4
((OPTION_B)) 2 and 3
((OPTION_C)) 1 and 3
((OPTION_D)) 2 and 4
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))

((MARKS)) 1
((QUESTION)) Let’s say a “Linear regression” model perfectly fits the training data (train error is zero). Which of the following statements is true?
((OPTION_A)) You will always have zero test error
((OPTION_B)) You cannot have zero test error
((OPTION_C)) None of the above
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))

((MARKS)) 1
((QUESTION)) Which one of the statements is true regarding residuals in regression analysis?
((OPTION_A)) Mean of residuals is always zero
((OPTION_B)) Mean of residuals is always less than zero
((OPTION_C)) Mean of residuals is always greater than zero
((OPTION_D)) There is no such rule for residuals.
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION)) For least-squares regression with an intercept term, the residuals sum to zero, so their mean is zero.
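The rule behind option A holds for ordinary least squares with an intercept term: the fitted line passes through (x̄, ȳ), so the residuals sum, and hence average, to exactly zero. A quick exact check in Python with hypothetical data:

```python
from fractions import Fraction


def ols_line(xs, ys):
    """Least-squares slope and intercept, computed exactly."""
    n = len(xs)
    x_bar, y_bar = Fraction(sum(xs), n), Fraction(sum(ys), n)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


xs, ys = [1, 2, 3, 4], [2, 3, 5, 4]  # hypothetical data
slope, intercept = ols_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(residuals, sum(residuals))
```

Dropping the intercept (forcing the line through the origin) generally breaks this property, which is why the rule is tied to models that include a constant term.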

((MARKS)) 1
((QUESTION)) Which of the following is true about Heteroskedasticity?
((OPTION_A)) Linear Regression with varying error terms
((OPTION_B)) Linear Regression with constant error terms
((OPTION_C)) Linear Regression with zero error terms
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION)) Heteroskedasticity means that the variance of the error terms is not constant across observations.
((MARKS)) 1
((QUESTION)) Which of the following indicates a fairly strong relationship between X and Y?
((OPTION_A)) Correlation coefficient = 0.9
((OPTION_B)) The p-value for the null hypothesis Beta coefficient = 0 is 0.0001
((OPTION_C)) The t-statistic for the null hypothesis Beta coefficient = 0 is 30
((OPTION_D)) None of these
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))

((MARKS)) 1
((QUESTION)) Which of the following assumptions do we make while deriving linear regression parameters?
1. The true relationship between dependent y and predictor x is linear
2. The model errors are statistically independent
3. The errors are normally distributed with a 0 mean and constant standard deviation.
((OPTION_A)) 1, 2 & 3
((OPTION_B)) 1 & 3
((OPTION_C)) All of above
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))

((MARKS)) 1
((QUESTION)) To test the linear relationship between continuous variables y (dependent) and x (independent), which of the following plots is best suited?
((OPTION_A)) Scatter plot
((OPTION_B)) Bar chart
((OPTION_C)) Histograms
((OPTION_D)) None of these
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Generally, which of the following method(s) is used for predicting a continuous dependent variable?
1. Linear Regression
2. Logistic Regression
((OPTION_A)) 1 & 2
((OPTION_B)) Only 1
((OPTION_C)) Only 2
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))

((MARKS)) 1
((QUESTION)) The correlation between the age and health of a person was found to be -1.09. On this basis, you would tell the doctors that:
((OPTION_A)) The age is a good predictor of health
((OPTION_B)) The age is a poor predictor of health
((OPTION_C)) None of these
((OPTION_D)) All of the above
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION)) A correlation coefficient always lies between -1 and 1, so a value of -1.09 points to an error in the calculation rather than to any conclusion about age and health.

((MARKS)) 1
((QUESTION)) Which of the following offsets do we use in a least-squares line fit? Suppose the horizontal axis is the independent variable and the vertical axis is the dependent variable.
((OPTION_A)) Vertical offset
((OPTION_B)) Perpendicular offset
((OPTION_C)) Both, but it depends on the situation
((OPTION_D)) Both A & B
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Suppose we have generated the data with the help of polynomial regression of degree 3 (degree 3 will perfectly fit this data). Now consider the points below and choose the option based on them.
1. Simple Linear regression will have high bias and low variance
2. Simple Linear regression will have low bias and high variance
3. Polynomial of degree 3 will have low bias and high variance
4. Polynomial of degree 3 will have low bias and low variance
((OPTION_A)) Only 1
((OPTION_B)) 1 & 3
((OPTION_C)) 1 & 4
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))

((MARKS)) 1
((QUESTION)) Suppose you are training a linear regression model. Now consider these points.
1. Overfitting is more likely if we have less data
2. Overfitting is more likely when the hypothesis space is small
Which of the above statement(s) are correct?
((OPTION_A)) Both are False
((OPTION_B)) 1 is False and 2 is True
((OPTION_C)) 1 is True and 2 is False
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))

((MARKS)) 1
((QUESTION)) Suppose we fit “Lasso Regression” to a data set which has 100 features (X1, X2, …, X100). Now, we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit Lasso regression with the same regularization parameter. Which of the following options will be correct?
((OPTION_A)) It is more likely for X1 to be excluded from the model
((OPTION_B)) It is more likely for X1 to be included in the model
((OPTION_C)) Can’t say
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION)) After rescaling, X1 needs a coefficient only one tenth as large to have the same effect, so the L1 penalty on it shrinks and X1 is more likely to stay in the model.

((MARKS)) 1
((QUESTION)) Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature selection?
((OPTION_A)) Ridge regression uses subset selection of features
((OPTION_B)) Lasso regression uses subset selection of features
((OPTION_C)) Both use subset selection of features
((OPTION_D)) All of the above
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))

((MARKS)) 1
((QUESTION)) Which of the following statement(s) can be true after adding a variable to a linear regression model?
1. R-Squared and Adjusted R-squared both increase
2. R-Squared increases and Adjusted R-squared decreases
3. R-Squared decreases and Adjusted R-squared decreases
4. R-Squared decreases and Adjusted R-squared increases
((OPTION_A)) 1 and 2
((OPTION_B)) 1 and 3
((OPTION_C)) 2 and 4
((OPTION_D)) None of these
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION)) R-Squared never decreases when a variable is added, while Adjusted R-squared can move in either direction.

((MARKS)) 1
((QUESTION)) Which of the following metrics can be used for evaluating regression models?
1. R Squared
2. Adjusted R Squared
3. F Statistics
4. RMSE / MSE / MAE
((OPTION_A)) 2 and 4
((OPTION_B)) 1 and 2
((OPTION_C)) 2, 3 and 4
((OPTION_D)) All of the above
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))

((MARKS)) 1
((QUESTION)) We can also compute the coefficients of linear regression with the help of an analytical method called the “Normal Equation”. Which of the following is/are true about the “Normal Equation”?
1. We don’t have to choose the learning rate
2. It becomes slow when the number of features is very large
3. No need to iterate
((OPTION_A)) 1 and 2
((OPTION_B)) 1 & 3
((OPTION_C)) 2 & 3
((OPTION_D)) 1, 2 & 3
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))
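All three statements can be seen in a direct implementation: the Normal Equation solves (XᵀX)β = Xᵀy in one pass, with no learning rate and no iterations, but forming and solving the p × p system grows roughly cubically with the number of features p. The Python sketch below uses exact `Fraction` arithmetic and plain Gauss-Jordan elimination (chosen for clarity; production code would use a numerical library), with hypothetical data whose first column of ones provides the intercept.

```python
from fractions import Fraction


def normal_equation(X, y):
    """Solve (X^T X) beta = X^T y exactly; assumes X^T X is invertible."""
    p = len(X[0])
    xtx = [[sum(Fraction(row[i]) * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(Fraction(row[i]) * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]

    # Gauss-Jordan elimination: a fixed number of steps, no learning rate.
    for col in range(p):
        pivot = next(r for r in range(col, p) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(p):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


X = [[1, 1], [1, 2], [1, 3], [1, 4]]  # hypothetical data: column of ones = intercept
y = [2, 3, 5, 4]
print(normal_equation(X, y))  # [Fraction(3, 2), Fraction(4, 5)]
```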

((MARKS)) 1
((QUESTION)) The expected value of Y is a linear function of the X (X1, X2, …, Xn) variables, and the regression line is defined as:
Y = β0 + β1X1 + β2X2 + … + βnXn
Which of the following statement(s) are true?
1. If Xi changes by an amount ∆Xi, holding other variables constant, then the expected value of Y changes by a proportional amount βi∆Xi, for some constant βi (which in general could be a positive or negative number).
2. The value of βi is always the same, regardless of the values of the other X’s.
3. The total effect of the X’s on the expected value of Y is the sum of their separate effects.
((OPTION_A)) 1 and 2
((OPTION_B)) 1 and 3
((OPTION_C)) 2 and 3
((OPTION_D)) 1, 2 and 3
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))

((MARKS)) 1
((QUESTION)) How many coefficients do you need to estimate in a simple linear regression model (one independent variable)?
((OPTION_A)) 1
((OPTION_B)) 2
((OPTION_C)) Can’t say
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION)) The intercept and the slope.

((MARKS)) 2
((QUESTION)) The graphs below show two fitted regression lines (A & B) on randomly generated data. Now, I want to find the sum of residuals in both cases A and B.
[Graphs not reproduced in this document.]
Which of the following statements is true about the sum of residuals of A and B?
((OPTION_A)) A has a higher sum of residuals than B
((OPTION_B)) A has a lower sum of residuals than B
((OPTION_C)) Both have the same sum of residuals
((OPTION_D)) None of these
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))

((MARKS)) 1
((QUESTION)) If two variables are correlated, is it necessary that they have a linear relationship?
((OPTION_A)) YES
((OPTION_B)) NO
((OPTION_C)) Both A & B
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))

((MARKS)) 1
((QUESTION)) Correlated variables can have a zero correlation coefficient. True or False?
((OPTION_A)) TRUE
((OPTION_B)) FALSE
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION)) The (Pearson) correlation coefficient only measures linear association, so variables with a strong non-linear dependence can still have a coefficient of zero.
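A standard example of dependent variables with zero (Pearson) correlation is y = x² on values symmetric about zero: y is completely determined by x, yet the linear association cancels out. A self-contained Python check with hypothetical values:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # y depends completely on x, but not linearly
print(pearson(xs, ys))    # 0.0
```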

((MARKS)) 1
((QUESTION)) Suppose I applied a logistic regression model to data and got training accuracy X and testing accuracy Y. Now I want to add a few new features to the data. Select the option(s) that are correct in such a case.
Note: Consider that the remaining parameters are the same.
1. Training accuracy always decreases.
2. Training accuracy always increases or remains the same.
3. Testing accuracy always decreases.
4. Testing accuracy always increases or remains the same.
((OPTION_A)) Only 2
((OPTION_B)) Only 1
((OPTION_C)) Only 3
((OPTION_D)) All of the above
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) The graph below represents a regression line predicting Y from X. The values on the graph show the residual for each predicted value. Use this information to compute the SSE.
[Graph not reproduced in this document.]
((OPTION_A)) 3.02
((OPTION_B)) 0.75
((OPTION_C)) 1.01
((OPTION_D)) None of these
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))

((MARKS)) 1
((QUESTION)) Suppose the distribution of salaries in a company X has a median of $35,000, and 25th and 75th percentiles of $21,000 and $53,000 respectively.
Would a person with a salary of $1 be considered an outlier?
((OPTION_A)) YES
((OPTION_B)) NO
((OPTION_C)) More information is required
((OPTION_D)) None of these
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))
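One common convention, Tukey's 1.5 × IQR fences, can be applied to the quoted quartiles as in the Python sketch below. The question does not say which outlier rule applies, so the rule and the factor 1.5 are assumptions here. Under this rule $1 falls inside the fences, while a different rule or more knowledge about the shape of the salary distribution could give a different verdict, which is why more information is needed.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low, high = tukey_fences(21_000, 53_000)  # quartiles quoted in the question
salary = 1
print(low, high, salary < low or salary > high)  # -27000.0 101000.0 False
```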

((MARKS)) 1
((QUESTION)) Which of the following options is true regarding “Regression” and “Correlation”?
Note: y is the dependent variable and x is the independent variable.
((OPTION_A)) The relationship is symmetric between x and y in both.
((OPTION_B)) The relationship is not symmetric between x and y in both.
((OPTION_C)) The relationship is not symmetric between x and y in case of correlation but in case of regression it is symmetric.
((OPTION_D)) The relationship is symmetric between x and y in case of correlation but in case of regression it is not symmetric.
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION)) Correlation is symmetric: corr(x, y) = corr(y, x). Regression is not: regressing y on x gives a different line from regressing x on y.
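The asymmetry can be checked directly. In the Python sketch below (hypothetical data), swapping x and y leaves the correlation unchanged, but the least-squares slope of y on x differs from the slope of x on y, so the two regressions describe different lines.

```python
import math
from fractions import Fraction


def centered_sums(xs, ys):
    """Exact sums Sxy, Sxx, Syy around the means."""
    n = len(xs)
    mx, my = Fraction(sum(xs), n), Fraction(sum(ys), n)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy


def correlation(xs, ys):
    sxy, sxx, syy = centered_sums(xs, ys)
    return float(sxy) / math.sqrt(float(sxx) * float(syy))


def slope(xs, ys):
    """Least-squares slope when regressing ys on xs."""
    sxy, sxx, _ = centered_sums(xs, ys)
    return sxy / sxx


xs, ys = [1, 2, 3, 4], [1, 3, 7, 9]  # hypothetical data
print(correlation(xs, ys), correlation(ys, xs))  # identical
print(slope(xs, ys), slope(ys, xs))              # 14/5 7/20
```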
((MARKS)) 1
((QUESTION)) True-False: Is Logistic regression a supervised machine learning algorithm?
((OPTION_A)) TRUE
((OPTION_B)) FALSE
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))

((MARKS)) 1
((QUESTION)) True-False: Is Logistic regression mainly used for Regression?
((OPTION_A)) TRUE
((OPTION_B)) FALSE
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))

((MARKS)) 1
((QUESTION)) True-False: Is it possible to design a logistic regression algorithm using a Neural Network Algorithm?
((OPTION_A)) TRUE
((OPTION_B)) FALSE
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) True-False: Is it possible to apply a logistic regression algorithm to a 3-class classification problem?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) TRUE
THIS IS MANDATORY OPTION
((OPTION_B)) FALSE
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
This is optional
((OPTION_D))
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Which of the following methods do we use to best fit the data in Logistic
Regression?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) Least Square Error
THIS IS MANDATORY OPTION
((OPTION_B)) Maximum Likelihood
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) Jaccard distance
This is optional
((OPTION_D)) Both A and B
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
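Maximum likelihood picks the coefficients that make the observed labels most probable. In other words, it maximizes the log-likelihood sum of [y*log(p) + (1 - y)*log(1 - p)], where p = sigmoid(b0 + b1*x). The sketch below is a minimal illustration only. It uses plain gradient ascent on a made-up one-feature dataset, whereas library implementations typically rely on Newton or quasi-Newton solvers. The learning rate and step count are arbitrary assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of labels ys under p = sigmoid(b0 + b1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_mle(xs, ys, lr=0.1, steps=5000):
    """Gradient ascent on the log-likelihood (its gradient is sum((y - p) * [1, x]))."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Hypothetical, non-separable data.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_mle(xs, ys)
print(b0, b1, log_likelihood(b0, b1, xs, ys))
```

The data are deliberately non-separable. On perfectly separable data, the likelihood keeps increasing as the coefficients grow, so no finite maximum exists.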
((QUESTION)) AIC (Akaike information criterion) is a common measure for assessing and comparing logistic regression models, loosely analogous to R-squared in linear regression. Which of the following is true about AIC?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) We prefer a model with minimum AIC value
THIS IS MANDATORY OPTION
((OPTION_B)) We prefer a model with maximum AIC value
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) Both but depend on the situation
This is optional
((OPTION_D)) None of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
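AIC is defined as 2k - 2*ln(L_hat). Here k is the number of estimated parameters and L_hat is the maximized likelihood. The 2k term penalizes complexity, and the model with the lowest AIC is preferred. The following is a sketch with made-up log-likelihood values for two hypothetical models:

```python
def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L_hat)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical fitted models: (name, maximized log-likelihood, number of parameters).
models = [("small", -120.0, 3), ("large", -118.5, 6)]
scores = {name: aic(ll, k) for name, ll, k in models}
best = min(scores, key=scores.get)
print(scores, best)  # {'small': 246.0, 'large': 249.0} small
```

Here the larger model fits slightly better. However, its three extra parameters cost more than the gain in likelihood, so the smaller model has the lower AIC.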
((QUESTION)) True-False: Is standardisation of features required before training a logistic regression model?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) TRUE
THIS IS MANDATORY OPTION
((OPTION_B)) FALSE
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
This is optional
((OPTION_D))
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Which of the following algorithms do we use for variable selection?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) LASSO
THIS IS MANDATORY OPTION
((OPTION_B)) Ridge
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) Both
This is optional
((OPTION_D)) All of these
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional
((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Suppose you have been given a fair coin and you want to find out the odds of getting heads. Which of the following options is true for such a case?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) odds will be 0
THIS IS MANDATORY OPTION
((OPTION_B)) odds will be 0.5
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) odds will be 1
This is optional
((OPTION_D)) None of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E C
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
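Odds compare the probability of an event with the probability of its complement: odds = p / (1 - p). For a fair coin p = 0.5, so the odds of heads are 0.5 / 0.5 = 1 ("1 to 1"), not 0.5. The following is a minimal sketch using exact fractions. The helper name is hypothetical:

```python
from fractions import Fraction

def odds(p):
    """Odds in favour of an event with probability p (0 <= p < 1)."""
    return p / (1 - p)

print(odds(Fraction(1, 2)))  # 1: fair coin, heads and tails equally likely
print(odds(Fraction(3, 4)))  # 3: an event three times as likely to happen as not
```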
((QUESTION)) The logit function (given as l(x)) is the log of the odds function. What could be the range of the logit function in the domain x = [0, 1]?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) (– ∞ , ∞)
THIS IS MANDATORY OPTION
((OPTION_B)) (0,1)
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) (0, ∞)
This is optional
((OPTION_D)) (- ∞, 0)
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
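The logit function l(p) = ln(p / (1 - p)) is 0 at p = 0.5. It heads towards -infinity as p approaches 0 and towards +infinity as p approaches 1, which is why its range is (-infinity, infinity). Strictly, it is undefined at the endpoints themselves; calling this sketch with exactly 0 or 1 raises an error. The following is a quick numerical check with arbitrary sample probabilities:

```python
import math

def logit(p):
    """Log-odds ln(p / (1 - p)), defined for p strictly between 0 and 1."""
    return math.log(p / (1 - p))

# Probabilities approaching 0 and 1 give large negative and positive log-odds.
for p in (1e-9, 0.1, 0.5, 0.9, 1 - 1e-9):
    print(p, logit(p))
```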
((QUESTION)) Which of the following options is true?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) Linear regression error values have to be normally distributed, but in the case of logistic regression this is not required
THIS IS MANDATORY OPTION
((OPTION_B)) Logistic regression error values have to be normally distributed, but in the case of linear regression this is not required
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) Both linear regression and logistic regression error values have to be
normally distributed
This is optional
((OPTION_D)) Neither linear regression nor logistic regression error values have to be
normally distributed
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional
((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Which of the following is true regarding the logistic function for any value “x”?
Note: Logistic(x) is the logistic function of any number “x”; Logit(x) is the logit function of any number “x”; Logit_inv(x) is the inverse logit function of any number “x”.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) Logistic(x) = Logit(x)
THIS IS MANDATORY OPTION
((OPTION_B)) Logistic(x) = Logit_inv(x)
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) Logit_inv(x) = Logit(x)
This is optional
((OPTION_D)) None of these
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 2
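The logistic function sigmoid(x) = 1 / (1 + e^(-x)) and the logit function ln(p / (1 - p)) undo each other, which is why Logistic(x) = Logit_inv(x). The following is a round-trip check in Python. The sample values are arbitrary:

```python
import math

def logistic(x):
    """Logistic (sigmoid) function: maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Logit (log-odds) function: maps p in (0, 1) back onto the real line."""
    return math.log(p / (1 - p))

# Applying logit after logistic returns the original input (up to rounding),
# i.e. the logistic function is the inverse of the logit.
for x in (-3.0, -0.5, 0.0, 2.0):
    print(x, logit(logistic(x)))
```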
((QUESTION)) Suppose you applied a logistic regression model to a given dataset and got a training accuracy X and a testing accuracy Y. Now you want to add a few new features to the same data. Select the option(s) that is/are correct in such a case.
Note: Assume that all the remaining parameters stay the same.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) Training accuracy increases
THIS IS MANDATORY OPTION
((OPTION_B)) Training accuracy increases or remains the same
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) Testing accuracy decreases
This is optional
((OPTION_D)) Testing accuracy increases or remains the same
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A&D
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Choose which of the following options is true regarding One-Vs-All
method in Logistic Regression.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) We need to fit n models in n-class classification problem
THIS IS MANDATORY OPTION
((OPTION_B)) We need to fit n-1 models to classify into n classes
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) We need to fit only 1 model to classify into n classes
This is optional
((OPTION_D))
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional
((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
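One-vs-all (one-vs-rest) trains one binary classifier per class, each separating that class from all the others. It then predicts the class whose classifier is most confident, so an n-class problem needs n models. The following is a minimal stdlib-only sketch on a hypothetical three-cluster dataset. Each binary model is fitted by plain batch gradient descent, an arbitrary choice for illustration. The helper names, learning rate and step count are assumptions:

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    """Linear score w[0] + w[1]*x[0] + w[2]*x[1] + ... for a feature tuple x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_binary(X, y, lr=0.1, steps=2000):
    """Binary logistic regression fitted by batch gradient descent on the mean log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = sigmoid(score(w, x)) - t
            grad[0] += err / n
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def fit_one_vs_all(X, labels):
    """Fit one 'class c vs. the rest' model per class: n models for n classes."""
    return {c: fit_binary(X, [1 if lab == c else 0 for lab in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Return the class whose binary model assigns x the highest probability."""
    return max(models, key=lambda c: sigmoid(score(models[c], x)))

# Hypothetical toy data: three well-separated clusters in 2-D.
X = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.4),
     (4.0, 0.0), (4.5, 0.3), (4.2, 0.6),
     (2.0, 4.0), (2.3, 4.5), (1.8, 4.2)]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

models = fit_one_vs_all(X, labels)
print(len(models))  # 3: one model per class
print([predict(models, p) for p in [(0.1, 0.1), (4.3, 0.2), (2.1, 4.4)]])
```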
((QUESTION)) Suppose you are using a logistic regression model on a huge dataset. One of the problems you may face with such huge data is that logistic regression will take a very long time to train. What would you do if you want to train logistic regression on the same data so that it takes less time while still giving comparable (though not necessarily identical) accuracy?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) Decrease the learning rate and decrease the number of iterations
THIS IS MANDATORY OPTION
((OPTION_B)) Decrease the learning rate and increase the number of iterations
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) Increase the learning rate and increase the number of iterations
This is optional
((OPTION_D)) Increase the learning rate and decrease the number of iterations
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E D
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 2
((QUESTION)) The following figures show the loss function in logistic regression (y-axis: loss; x-axis: log probability) for a two-class classification problem, where Y is the target class. Which of the following images shows the cost function for Y = 1?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) A
THIS IS MANDATORY OPTION
((OPTION_B)) B
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) BOTH
This is optional
((OPTION_D)) NONE OF THESE
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
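For reference, the per-example logistic-regression (cross-entropy) cost is -log(p) when y = 1 and -log(1 - p) when y = 0, where p is the predicted probability of class 1. For y = 1, the cost approaches 0 as p approaches 1 and grows without bound as p approaches 0. The following is a short sketch tabulating both branches. The probabilities are arbitrary sample points:

```python
import math

def cost(p, y):
    """Per-example cross-entropy (log) loss for a predicted probability p of class 1."""
    return -math.log(p) if y == 1 else -math.log(1 - p)

for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"p={p:.2f}  cost(y=1)={cost(p, 1):.3f}  cost(y=0)={cost(p, 0):.3f}")
```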
((QUESTION)) Logistic regression is used when you want to:
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) Predict a dichotomous variable from continuous or dichotomous variables.

THIS IS MANDATORY OPTION


((OPTION_B)) Predict a continuous variable from dichotomous variables.
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) Predict any categorical variable from several other categorical variables.

This is optional
((OPTION_D)) Predict a continuous variable from dichotomous or continuous variables

This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional
((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) The odds ratio is

ENTER CONTENT. QTN CAN HAVE IMAGES ALSO


((OPTION_A)) The ratio of the probability of an event not happening to the probability of the event happening.

THIS IS MANDATORY OPTION


((OPTION_B)) The probability of an event occurring.

THIS IS ALSO MANDATORY OPTION


((OPTION_C)) The ratio of the odds after a unit change in the predictor to the original odds.

This is optional
((OPTION_D)) The ratio of the probability of an event happening to the probability of the event not happening.

This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E C
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
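If the model is logit(p) = b0 + b1*x, the odds are e^(b0 + b1*x). Raising x by one unit therefore multiplies the odds by e^b1, and that factor is the odds ratio for a unit change in the predictor. The following sketch uses hypothetical coefficients to confirm this numerically:

```python
import math

def odds_at(x, b0, b1):
    """Odds p/(1-p) implied by a logistic model with logit(p) = b0 + b1*x."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p / (1 - p)

b0, b1 = -1.2, 0.7   # hypothetical coefficients
x = 2.0
ratio = odds_at(x + 1, b0, b1) / odds_at(x, b0, b1)
print(ratio, math.exp(b1))  # both approximately 2.0138
```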
((QUESTION)) Large values of the log-likelihood statistic indicate:

ENTER CONTENT. QTN CAN HAVE IMAGES ALSO


((OPTION_A)) That there are a greater number of explained vs. unexplained observations.

THIS IS MANDATORY OPTION


((OPTION_B)) That the statistical model fits the data well.

THIS IS ALSO MANDATORY OPTION


((OPTION_C)) That as the predictor variable increases, the likelihood of the outcome occurring decreases.

This is optional
((OPTION_D)) That the statistical model is a poor fit of the data.

This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Logistic regression assumes a:

ENTER CONTENT. QTN CAN HAVE IMAGES ALSO


((OPTION_A)) Linear relationship between continuous predictor variables and the outcome variable.

THIS IS MANDATORY OPTION


((OPTION_B)) Linear relationship between continuous predictor variables and the logit of the outcome variable.

THIS IS ALSO MANDATORY OPTION


((OPTION_C)) Linear relationship between continuous predictor variables.

This is optional
((OPTION_D)) Linear relationship between observations.

This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) In binary logistic regression:

ENTER CONTENT. QTN CAN HAVE IMAGES ALSO


((OPTION_A)) The dependent variable is continuous.

THIS IS MANDATORY OPTION


((OPTION_B)) The dependent variable is divided into two equal subcategories.

THIS IS ALSO MANDATORY OPTION


((OPTION_C)) The dependent variable consists of two categories.

This is optional
((OPTION_D)) There is no dependent variable.

This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E C
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) The correlation coefficient is used to determine
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A)) A specific value of the y-variable given a specific value of the x-variable
THIS IS MANDATORY OPTION
((OPTION_B)) A specific value of the x-variable given a specific value of the y-variable
THIS IS ALSO MANDATORY OPTION
((OPTION_C)) The strength of the relationship between the x and y variables

This is optional
((OPTION_D)) none
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E C
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Choose the option that is incorrect regarding machine learning (ML) and artificial intelligence (AI)
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
ML is an alternate way of programming intelligent machines.
THIS IS MANDATORY OPTION
((OPTION_B))
ML and AI have very different goals
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
ML is a set of techniques that turns a dataset into software.
This is optional
((OPTION_D))
AI is a software that can emulate the human mind
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
Which of the following statements is FALSE regarding regression?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
It is used for prediction
THIS IS MANDATORY OPTION
((OPTION_B))
It may be used for interpretation
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
It relates inputs to outputs.
This is optional
((OPTION_D))
It discovers causal relationships
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E D
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
Grid search is
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Linear in D
THIS IS MANDATORY OPTION
((OPTION_B))
Exponential in D
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Linear in N
This is optional
((OPTION_D))
Both B&C
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E D
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
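Suppose each of D parameters is tried at k grid values. Grid search then evaluates k^D points, which is exponential in D. Each evaluation of an MSE-type cost sums over all N training points, which is linear in N. The following is a minimal sketch. The data, the grid, and the helper names are hypothetical:

```python
import itertools

def grid_search(cost, grid_values, D):
    """Try every point of a D-dimensional grid; return the best point and the number of evaluations."""
    best, best_cost, evaluations = None, float("inf"), 0
    for point in itertools.product(grid_values, repeat=D):  # len(grid_values) ** D points
        evaluations += 1
        c = cost(point)
        if c < best_cost:
            best, best_cost = point, c
    return best, evaluations

# Hypothetical data generated by y = 1*x1 + 2*x2 (N = 3, D = 2).
data = [((1.0, 2.0), 5.0), ((2.0, 0.0), 2.0), ((0.0, 1.0), 2.0)]

def mse(beta):
    """MSE with the 1/(2N) convention; each call is one pass over all N points."""
    return sum((y - sum(b * xi for b, xi in zip(beta, x))) ** 2
               for x, y in data) / (2 * len(data))

best, n_evals = grid_search(mse, [0.0, 0.5, 1.0, 1.5, 2.0], D=2)
print(best, n_evals)  # (1.0, 2.0) 25
```

Adding one more parameter with the same 5-value grid would multiply the number of evaluations by 5.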
((QUESTION))
Find the incorrect statement: the gradient of a continuous and differentiable function
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
is zero at a minimum
THIS IS MANDATORY OPTION
((OPTION_B))
is non-zero at a maximum
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
is zero at a saddle point
This is optional
((OPTION_D))
decreases as you get closer to the minimum
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Consider a linear-regression model with N = 3 and D = 1 with input-output pairs as follows: y1 = 2, x1 = 1, y2 = 3, x2 = 1, y3 = 3, x3 = 2. What is the gradient of the mean-square error (MSE), defined as L = 1/(2N) Σ (y_n − β0 − β1·x_n)², with respect to β1 when β0 = 0 and β1 = 1? Give your answer correct to two decimal digits.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
-1.67
THIS IS MANDATORY OPTION
((OPTION_B))
2
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
3
This is optional
((OPTION_D))
4
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
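Write the cost as L(β0, β1) = 1/(2N) Σ (y_n − β0 − β1·x_n)². Its partial derivative is then ∂L/∂β1 = −(1/N) Σ x_n·(y_n − β0 − β1·x_n). The following small helper assumes this 1/(2N) convention; with a plain 1/N convention the result doubles. It makes the arithmetic easy to check for any set of input-output pairs. The function name is hypothetical:

```python
def mse_grad_beta1(xs, ys, b0, b1):
    """Partial derivative of L = 1/(2N) * sum((y - b0 - b1*x)**2) with respect to b1."""
    n = len(xs)
    return -sum(x * (y - b0 - b1 * x) for x, y in zip(xs, ys)) / n

# Hypothetical pairs (not the ones in the question); the residuals are 1, 2 and 3.
print(round(mse_grad_beta1([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 0.0, 1.0), 2))  # -4.67
```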
((QUESTION)) Let us say that we have computed the gradient of our cost function and stored it in a vector g. What is the cost of one gradient descent update given the gradient?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
O(D)
THIS IS MANDATORY OPTION
((OPTION_B))
O(N)
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
O(ND)
This is optional
((OPTION_D))
O(ND²)
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
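Once g is available, the update β ← β − γ·g touches each of the D coordinates exactly once, so it costs O(D) regardless of N. The O(ND) work lies in computing the gradient, not in applying it. The following is a minimal sketch with hypothetical values. The step size γ is called lr here:

```python
def gd_step(beta, g, lr):
    """One gradient-descent update beta <- beta - lr * g: one multiply-add per coordinate, so O(D)."""
    return [b - lr * gi for b, gi in zip(beta, g)]

beta = [0.5, -1.0, 2.0]   # hypothetical current parameters (D = 3)
g = [0.2, -0.4, 1.0]      # gradient assumed to be already computed
print(gd_step(beta, g, lr=0.5))
```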
((QUESTION)) You observe the following while fitting a linear regression to the data: as you increase the amount of training data, the test error decreases and the training error increases. The training error is quite low (almost what you expect it to be), while the test error is much higher than the training error. What do you think is the main reason behind this behavior? Choose the most probable option.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
High variance
THIS IS MANDATORY OPTION
((OPTION_B))
High model bias
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
High estimation bias
This is optional
((OPTION_D))
None of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Adding more basis functions in a linear model... (pick the most probable option)
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Decreases model bias
THIS IS MANDATORY OPTION
((OPTION_B))
Decreases estimation bias
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Decreases variance
This is optional
((OPTION_D))
Doesn’t affect bias and variance
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
The problem of finding hidden structure in unlabeled data is called
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Supervised learning
THIS IS MANDATORY OPTION
((OPTION_B))
Unsupervised learning
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Reinforcement learning
This is optional
((OPTION_D))
None of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional
((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
The task of inferring a model from labeled training data is called
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Unsupervised learning
THIS IS MANDATORY OPTION
((OPTION_B))
Supervised learning
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Reinforcement learning
This is optional
((OPTION_D))
None of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) A telecommunication company wants to segment its customers into distinct groups in order to send appropriate subscription offers. This is an example of
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Supervised learning
THIS IS MANDATORY OPTION
((OPTION_B))
Data extraction
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Serration
This is optional
((OPTION_D))
Unsupervised learning
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E D
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
Self-organizing maps are an example of
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Unsupervised learning
THIS IS MANDATORY OPTION
((OPTION_B))
Supervised learning
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Reinforcement learning
This is optional
((OPTION_D))
Missing data imputation
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) You are given data about seismic activity in Japan, and you want to predict the magnitude of the next earthquake. This is an example of
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Supervised learning
THIS IS MANDATORY OPTION
((OPTION_B))
Unsupervised learning
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Serration
This is optional
((OPTION_D))
None of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Assume you want to perform supervised learning and predict the number of newborns according to the size of the storks’ population (http://www.brixtonhealth.com/storksBabies.pdf). This is an example of
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Classification
THIS IS MANDATORY OPTION
((OPTION_B))
Regression
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Clustering
This is optional
((OPTION_D))
None of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional
((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Discriminating between spam and ham e-mails is a classification task, true or false?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
TRUE
THIS IS MANDATORY OPTION
((OPTION_B))
FALSE
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
This is optional
((OPTION_D))
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) In the example of predicting the number of babies based on the storks’ population size, the number of babies is the
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Outcome
THIS IS MANDATORY OPTION
((OPTION_B))
Feature
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Attribute
This is optional
((OPTION_D))
None of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) It may be better to avoid the ROC curve as a metric because it can suffer from the accuracy paradox.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
TRUE
THIS IS MANDATORY OPTION
((OPTION_B))
FALSE
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
This is optional
((OPTION_D))
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
Which of the following is not involved in data mining?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Knowledge extraction
THIS IS MANDATORY OPTION
((OPTION_B))
Data archaeology
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Data exploration
This is optional
((OPTION_D))
Data transformation
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E D
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
The expected value or _______ of a random variable is the center of its distribution.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Mode
THIS IS MANDATORY OPTION
((OPTION_B))
median
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
mean
This is optional
((OPTION_D))
None of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E C
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
Point out the correct statement.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Some cumulative distribution function F is non-decreasing and right-continuous
THIS IS MANDATORY OPTION
((OPTION_B))
Every cumulative distribution function F is decreasing and right-continuous
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Every cumulative distribution function F is increasing and left-continuous
This is optional
((OPTION_D))
None of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E D
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
Which of the following properties of a random variable is a measure of spread?
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
variance
THIS IS MANDATORY OPTION
((OPTION_B))
standard deviation
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
empirical mean
This is optional
((OPTION_D))
All of the above
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
The square root of the variance is called the ________ deviation
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
empirical
THIS IS MANDATORY OPTION
((OPTION_B))
mean
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
continuous
This is optional
((OPTION_D))
standard
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E D
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
For continuous random variables, the CDF is the derivative of the PDF.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
TRUE
THIS IS MANDATORY OPTION
((OPTION_B))
FALSE
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
This is optional
((OPTION_D))
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
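The statement has the relationship reversed. For a continuous random variable, the PDF is the derivative of the CDF: f(x) = F'(x). The following small numerical check uses the standard normal distribution from Python's statistics.NormalDist (available since Python 3.8). It applies a central finite difference with an arbitrarily chosen step h:

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)
h = 1e-5
for x in (-1.0, 0.0, 0.5, 2.0):
    # Central finite difference of the CDF approximates the PDF at x.
    slope_of_cdf = (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)
    print(x, slope_of_cdf, dist.pdf(x))
```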
((QUESTION)) Cumulative distribution functions are used to specify the distribution of multivariate random variables.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
TRUE
THIS IS MANDATORY OPTION
((OPTION_B))
FALSE
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
This is optional
((OPTION_D))
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) Consider the results of a medical experiment that aims to predict whether someone is going to develop myopia based on some physical measurements and heredity. In this case, the input dataset consists of the person’s medical characteristics and the target
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Regression
THIS IS MANDATORY OPTION
((OPTION_B))
Decision Tree
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Clustering
This is optional
((OPTION_D))
Association Rule
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) The purpose of a machine learning model is to approximate an unknown function that associates input elements with output ones.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
TRUE
THIS IS MANDATORY OPTION
((OPTION_B))
FALSE
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
This is optional
((OPTION_D))
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional
((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION))
The training set is normally a representation of a global distribution.
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
TRUE
THIS IS MANDATORY OPTION
((OPTION_B))
FALSE
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
This is optional
((OPTION_D))
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E A
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 2
((QUESTION)) The model has excessive capacity and is no longer able to generalize beyond the original dynamics provided by the training set. This problem is called
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Underfitting
THIS IS MANDATORY OPTION
((OPTION_B))
Overfitting
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Both
This is optional
((OPTION_D))
None
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional

((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) The model can associate almost perfectly all the known samples with the corresponding output values, but when an unknown input is presented, the corresponding prediction error can be very high. This problem is called
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO
((OPTION_A))
Underfitting
THIS IS MANDATORY OPTION
((OPTION_B))
Overfitting
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Both
This is optional
((OPTION_D))
None
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E B
((EXPLANATION)) This is also optional
((MARKS)) QUESTION IS OF HOW MANY MARKS? (1 OR 2 OR 3 UPTO 10) 1
((QUESTION)) ---------- may prove to be more difficult to discover as it could be initially considered
ENTER CONTENT. QTN CAN HAVE IMAGES ALSO the result of a perfect fitting
((OPTION_A))
Underfitting
THIS IS MANDATORY OPTION
((OPTION_B))
Overfitting
THIS IS ALSO MANDATORY OPTION
((OPTION_C))
Both
This is optional
((OPTION_D))
None
This is optional
((OPTION_E)) This is optional. If optional keep empty so that
system will skip this option
((CORRECT_CHOICE)) Either A or B or C or D or E
((EXPLANATION)) This is also optional

((MARKS)) 1
((QUESTION)) When working with a supervised scenario, we define a non-negative error measure e_m, which takes two arguments and allows us to compute a total error value over the whole dataset. Those two arguments are
((OPTION_A)) expected and predicted output
((OPTION_B)) calculated and predicted output
((OPTION_C)) calculated and measured output
((OPTION_D)) none
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
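A minimal sketch of such a total error, assuming the squared difference as the per-sample measure e_m (the question does not fix a particular e_m; the function names are hypothetical):

```python
def squared_error(expected, predicted):
    """Per-sample, non-negative error measure e_m(expected, predicted)."""
    return (expected - predicted) ** 2


def total_error(expected_outputs, predicted_outputs, e_m=squared_error):
    """Sum e_m over every (expected, predicted) pair in the dataset."""
    return sum(e_m(y, y_hat) for y, y_hat in zip(expected_outputs, predicted_outputs))


y_true = [1.0, 0.0, 2.0]
y_pred = [0.5, 0.0, 3.0]
print(total_error(y_true, y_pred))  # 0.25 + 0.0 + 1.0 = 1.25
```

Any other non-negative per-sample measure, such as the absolute difference, can be passed as `e_m` instead.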

((MARKS)) 1
((QUESTION)) The initial value represents a starting point on the surface of an n-variable function. A generic training algorithm has to find the global minimum or a point quite close to it (there is always a tolerance to avoid an excessive number of iterations and a consequent risk of overfitting). This measure is also called
((OPTION_A)) loss function
((OPTION_B)) predicted output
((OPTION_C)) measured output
((OPTION_D)) mean square error
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
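To make the idea concrete, here is a minimal gradient-descent sketch on a hypothetical two-variable loss L(w1, w2) = (w1 - 1)^2 + 2(w2 + 2)^2. It starts from an initial value and stops once the gradient norm falls below a tolerance, the kind of stopping rule the question alludes to. The loss, learning rate and tolerance are illustrative assumptions.

```python
import math


def loss(w):
    """Hypothetical convex loss with its global minimum at (1, -2)."""
    return (w[0] - 1) ** 2 + 2 * (w[1] + 2) ** 2


def grad(w):
    """Analytic gradient of loss()."""
    return [2 * (w[0] - 1), 4 * (w[1] + 2)]


def gradient_descent(w0, lr=0.1, tol=1e-6, max_iter=10_000):
    """Step against the gradient until its norm is below tol (or max_iter is reached)."""
    w = list(w0)
    for it in range(max_iter):
        g = grad(w)
        if math.hypot(*g) < tol:
            return w, it
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, max_iter


w_star, iters = gradient_descent([5.0, 5.0])
print(w_star, iters, loss(w_star))
```

A larger tolerance stops earlier at a point that is only "quite close" to the minimum; a smaller one costs more iterations.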
((MARKS)) 1
((QUESTION)) In 1984, the computer scientist L. Valiant proposed a mathematical approach to determine whether a problem is learnable by a computer. The name of this technique is
((OPTION_A)) Max likelihood
((OPTION_B)) Zero-one loss error
((OPTION_C)) Probably approximately correct
((OPTION_D)) none
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))
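In the PAC framework, a learner should with probability at least 1 - delta return a hypothesis whose error is at most epsilon. A standard textbook consequence, for a finite hypothesis class H and a learner that always returns a hypothesis consistent with the training data (the realizable case), is that m >= (1/epsilon)(ln|H| + ln(1/delta)) samples suffice. The sketch below only evaluates that bound; it illustrates the PAC idea and is not Valiant's original formulation.

```python
import math


def pac_sample_size(hypothesis_count, epsilon, delta):
    """Samples sufficient for a consistent learner over a finite class (realizable PAC bound)."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


# e.g. 2**10 hypotheses, at most 10% error, with 95% confidence
print(pac_sample_size(2 ** 10, epsilon=0.1, delta=0.05))
```

The bound grows only logarithmically with |H| and 1/delta, but linearly with 1/epsilon.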

((MARKS)) 1
((QUESTION)) In particular, a concept is a subset of input patterns X which determine the same output element.
((OPTION_A)) TRUE
((OPTION_B)) FALSE
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))

((MARKS)) 1
((QUESTION)) Therefore, learning a concept (parametrically) means minimizing the corresponding loss function restricted to a specific class, while learning all possible concepts (belonging to the same universe) means finding the minimum of a global loss function.
((OPTION_A)) TRUE
((OPTION_B)) FALSE
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Exponential time complexity could lead to computational explosions when the datasets are too large or the optimization starting point is very far from an acceptable minimum. Moreover, it is important to remember the so-called …….
((OPTION_A)) curse of dimensionality
((OPTION_B)) Hughes phenomenon
((OPTION_C)) Probably approximately correct
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))

((MARKS)) 1
((QUESTION)) In many cases, in order to capture the full expressivity, it is necessary to have a very large dataset, and without enough training data the approximation can become problematic. This is called…
((OPTION_A)) curse of dimensionality
((OPTION_B)) Hughes phenomenon
((OPTION_C)) Probably approximately correct
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))

((MARKS)) 1
((QUESTION)) The first term is called
((OPTION_A)) A posteriori
((OPTION_B)) A priori
((OPTION_C)) Likelihood
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))

((MARKS)) 1
((QUESTION)) The second term is called
((OPTION_A)) A posteriori
((OPTION_B)) A priori
((OPTION_C)) Likelihood
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))

((MARKS)) 1
((QUESTION)) The third term is called
((OPTION_A)) A posteriori
((OPTION_B)) A priori
((OPTION_C)) Likelihood
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))
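The three terms presumably come from Bayes' theorem, in which the a posteriori probability of a hypothesis is proportional to its a priori probability multiplied by the likelihood of the observed data (the expression itself is not preserved here). A minimal sketch over two hypothetical hypotheses, normalizing prior times likelihood to obtain the posterior:

```python
def posterior(priors, likelihoods):
    """Bayes' rule for discrete hypotheses: P(h|D) = P(D|h) * P(h) / sum_h' P(D|h') * P(h')."""
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


priors = {"h1": 0.5, "h2": 0.5}       # a priori probabilities P(h)
likelihoods = {"h1": 0.8, "h2": 0.2}  # likelihoods P(observed data | h)
print(posterior(priors, likelihoods))  # a posteriori probabilities P(h | observed data)
```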
((MARKS)) 1
((QUESTION)) Choose the option that is incorrect regarding machine learning (ML) and artificial intelligence (AI).
((OPTION_A)) ML is an alternate way of programming intelligent machines.
((OPTION_B)) ML and AI have very different goals.
((OPTION_C)) ML is a set of techniques that turns a dataset into software.
((OPTION_D)) AI is software that can emulate the human mind.
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Which of the following sentences is FALSE regarding regression?
((OPTION_A)) It is used for prediction.
((OPTION_B)) It may be used for interpretation.
((OPTION_C)) It relates inputs to outputs.
((OPTION_D)) It discovers causal relationships.
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Grid search is (where D is the number of parameters searched over and N is the number of training samples)
((OPTION_A)) Linear in D
((OPTION_B)) Exponential in D
((OPTION_C)) Linear in N
((OPTION_D)) Both B & C
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))
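A minimal sketch of why grid search is exponential in D and linear in N: with k candidate values per parameter there are k^D grid points, and evaluating the loss at each point requires a pass over all N samples. The linear model without bias, the mean-squared-error loss, the data and the parameter grid are hypothetical choices for illustration.

```python
import itertools


def mse_loss(w, X, y):
    """Mean squared error of a linear model without bias; one O(N*D) pass over the data."""
    total = 0.0
    for xn, yn in zip(X, y):
        pred = sum(wd * xd for wd, xd in zip(w, xn))
        total += (yn - pred) ** 2
    return total / len(y)


def grid_search(X, y, values_per_dim):
    """Try every combination of candidate values; returns (best_w, best_loss, evaluations)."""
    D = len(X[0])
    best_w, best_loss, evaluations = None, float("inf"), 0
    for w in itertools.product(values_per_dim, repeat=D):  # len(values_per_dim) ** D points
        evaluations += 1
        current = mse_loss(w, X, y)
        if current < best_loss:
            best_w, best_loss = w, current
    return best_w, best_loss, evaluations


X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
y = [3.0, 0.0, 0.0]  # generated by w = (1, -1, 1)
grid = [-1.0, 0.0, 1.0, 2.0]
best_w, best_loss, evaluations = grid_search(X, y, grid)
print(best_w, best_loss, evaluations)  # evaluations == 4 ** 3 == 64
```

Adding one more parameter multiplies the number of loss evaluations by 4 here, while doubling the number of samples only doubles the cost of each evaluation.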
((MARKS)) 1
((QUESTION)) Find the incorrect statement regarding the gradient of a continuous and differentiable function. The gradient
((OPTION_A)) is zero at a minimum
((OPTION_B)) is non-zero at a maximum
((OPTION_C)) is zero at a saddle point
((OPTION_D)) decreases as you get closer to the minimum
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))
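A quick numerical check, using central finite differences (an approximation with a hypothetical step h) on three simple functions: the gradient vanishes at the minimum of x^2 + y^2, at the maximum of -(x^2 + y^2), and at the saddle point of x^2 - y^2, which is why option B is the incorrect statement.

```python
def numerical_gradient(f, point, h=1e-5):
    """Central-difference approximation of the gradient of f at point."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def bowl(p):
    """Minimum at the origin."""
    return p[0] ** 2 + p[1] ** 2


def dome(p):
    """Maximum at the origin."""
    return -(p[0] ** 2 + p[1] ** 2)


def saddle(p):
    """Saddle point at the origin."""
    return p[0] ** 2 - p[1] ** 2


for name, f in [("minimum", bowl), ("maximum", dome), ("saddle", saddle)]:
    print(name, numerical_gradient(f, [0.0, 0.0]))
```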
((MARKS)) 1
((QUESTION)) Consider a linear-regression model with N = 3 and D = 1 and the following input-output pairs: y1 = 2, x1 = 1; y2 = 3, x2 = 1; y3 = 3, x3 = 2. What is the gradient of the mean-square error (MSE) with respect to β1 when β0 = 0 and β1 = 1? Give your answer correct to two decimal digits.
((OPTION_A)) -1.67
((OPTION_B)) 2
((OPTION_C)) 3
((OPTION_D)) 4
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
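A minimal sketch of this computation, reading y1 as 2 (the value consistent with the keyed answer) and assuming the convention MSE(β0, β1) = (1/2N) Σ (y_n - β0 - β1 x_n)^2, whose partial derivative with respect to β1 is -(1/N) Σ (y_n - β0 - β1 x_n) x_n. Without the factor 1/2 the gradient would be twice as large.

```python
def mse_gradient(beta0, beta1, xs, ys):
    """Gradient of MSE = 1/(2N) * sum((y - beta0 - beta1*x)**2) w.r.t. (beta0, beta1)."""
    n = len(xs)
    residuals = [y - beta0 - beta1 * x for x, y in zip(xs, ys)]
    d_beta0 = -sum(residuals) / n
    d_beta1 = -sum(e * x for e, x in zip(residuals, xs)) / n
    return d_beta0, d_beta1


xs = [1, 1, 2]
ys = [2, 3, 3]
_, d_beta1 = mse_gradient(0, 1, xs, ys)
print(round(d_beta1, 2))  # -5/3, i.e. -1.67 to two decimals (-1.66 if truncated)
```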
((MARKS)) 1
((QUESTION)) Say we have computed the gradient of our cost function and stored it in a vector g. What is the cost of one gradient-descent update, given the gradient?
((OPTION_A)) O(D)
((OPTION_B)) O(N)
((OPTION_C)) O(ND)
((OPTION_D)) O(ND^2)
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) You observe the following while fitting a linear regression to the data: as you increase the amount of training data, the test error decreases and the training error increases. The training error is quite low (almost what you expect it to be), while the test error is much higher than the training error. What do you think is the main reason behind this behavior? Choose the most probable option.
((OPTION_A)) High variance
((OPTION_B)) High model bias
((OPTION_C)) High estimation bias
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Adding more basis functions in a linear model... (pick the most probable option)
((OPTION_A)) Decreases model bias
((OPTION_B)) Decreases estimation bias
((OPTION_C)) Decreases variance
((OPTION_D)) Doesn't affect bias and variance
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) The problem of finding hidden structure in unlabeled data is called
((OPTION_A)) Supervised learning
((OPTION_B)) Unsupervised learning
((OPTION_C)) Reinforcement learning
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))
((MARKS)) 1
((QUESTION)) The task of inferring a model from labeled training data is called
((OPTION_A)) Unsupervised learning
((OPTION_B)) Supervised learning
((OPTION_C)) Reinforcement learning
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))
((MARKS)) 1
((QUESTION)) A telecommunication company wants to segment its customers into distinct groups in order to send appropriate subscription offers. This is an example of
((OPTION_A)) Supervised learning
((OPTION_B)) Data extraction
((OPTION_C)) Serration
((OPTION_D)) Unsupervised learning
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))
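Segmenting customers without predefined labels is typically approached with a clustering algorithm. A minimal k-means (Lloyd's algorithm) sketch on hypothetical one-dimensional customer data, say monthly spend, with hand-picked initial centroids:

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on 1-D data; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(len(centroids)), key=lambda k: abs(p - centroids[k])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centroids.append(sum(members) / len(members) if members else c)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


spend = [10, 12, 11, 95, 100, 105]
centers, groups = kmeans_1d(spend, centroids=[10, 105])
print(centers, groups)
```

No customer is ever labeled in advance: the groups emerge from the data alone, which is what makes this an unsupervised task.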
((MARKS)) 1
((QUESTION)) Self-organizing maps are an example of
((OPTION_A)) Unsupervised learning
((OPTION_B)) Supervised learning
((OPTION_C)) Reinforcement learning
((OPTION_D)) Missing data imputation
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) You are given data about seismic activity in Japan, and you want to predict the magnitude of the next earthquake. This is an example of
((OPTION_A)) Supervised learning
((OPTION_B)) Unsupervised learning
((OPTION_C)) Serration
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Assume you want to perform supervised learning and to predict the number of newborns according to the size of the storks' population (http://www.brixtonhealth.com/storksBabies.pdf). This is an example of
((OPTION_A)) Classification
((OPTION_B)) Regression
((OPTION_C)) Clustering
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Discriminating between spam and ham e-mails is a classification task. True or false?
((OPTION_A)) True
((OPTION_B)) False
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) In the example of predicting the number of babies based on the storks' population size, the number of babies is
((OPTION_A)) Outcome
((OPTION_B)) Feature
((OPTION_C)) Attribute
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) It may be better to avoid the ROC curve as a metric, as it can suffer from the accuracy paradox.
((OPTION_A)) True
((OPTION_B)) False
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))
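A minimal sketch of the accuracy paradox on hypothetical, heavily imbalanced labels: a classifier that gives every sample the same score (and therefore always predicts the majority class) reaches 95% accuracy, while its ROC AUC, computed here as the fraction of (positive, negative) pairs ranked correctly with ties counted as one half, is only 0.5, no better than chance. The accuracy paradox affects accuracy, not the ROC curve.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1] * 5 + [0] * 95  # 5% positives
scores = [0.0] * 100         # constant score: always "negative"
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), roc_auc(y_true, scores))  # 0.95 0.5
```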
((MARKS)) 1
((QUESTION)) Which of the following is not involved in data mining?
((OPTION_A)) Knowledge extraction
((OPTION_B)) Data archaeology
((OPTION_C)) Data exploration
((OPTION_D)) Data transformation
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))
((MARKS)) 1
((QUESTION)) The expected value or _______ of a random variable is the center of its distribution.
((OPTION_A)) Mode
((OPTION_B)) Median
((OPTION_C)) Mean
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Point out the correct statement.
((OPTION_A)) Some cumulative distribution function F is non-decreasing and right-continuous
((OPTION_B)) Every cumulative distribution function F is decreasing and right-continuous
((OPTION_C)) Every cumulative distribution function F is increasing and left-continuous
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Which of the following properties of a random variable is a measure of spread?
((OPTION_A)) Variance
((OPTION_B)) Standard deviation
((OPTION_C)) Empirical mean
((OPTION_D)) All of the above
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) The square root of the variance is called the ________ deviation.
((OPTION_A)) empirical
((OPTION_B)) mean
((OPTION_C)) continuous
((OPTION_D)) standard
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))
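For a concrete example, a fair six-sided die (a hypothetical choice) has expected value 3.5, variance 35/12 and standard deviation sqrt(35/12) ≈ 1.708. The sketch computes them directly from the definitions for any discrete random variable.

```python
import math


def moments(values, probs):
    """Mean, variance and standard deviation of a discrete random variable."""
    mean = sum(v * p for v, p in zip(values, probs))
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, variance, math.sqrt(variance)


faces = [1, 2, 3, 4, 5, 6]
mean, variance, std = moments(faces, [1 / 6] * 6)
print(mean, variance, std)
```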
((MARKS)) 1
((QUESTION)) For continuous random variables, the CDF is the derivative of the PDF.
((OPTION_A)) True
((OPTION_B)) False
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))
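The relation actually runs the other way: the PDF is the derivative of the CDF. A minimal numerical check for an exponential distribution with rate lambda (a hypothetical choice), whose CDF is F(x) = 1 - exp(-lambda x) and PDF is f(x) = lambda exp(-lambda x) for x >= 0:

```python
import math

RATE = 2.0  # hypothetical rate parameter lambda


def cdf(x):
    """CDF of the exponential distribution."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0


def pdf(x):
    """PDF of the exponential distribution."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in [0.1, 0.5, 1.0, 2.0]:
    print(x, derivative(cdf, x), pdf(x))  # the last two columns agree closely
```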
((MARKS)) 1
((QUESTION)) Cumulative distribution functions are used to specify the distribution of multivariate random variables.
((OPTION_A)) True
((OPTION_B)) False
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
((QUESTION)) Consider the results of a medical experiment that aims to predict whether someone is going to develop myopia based on some physical measurements and heredity. In this case, the input dataset consists of the person's medical characteristics and the target variable is binary: 1 for those who are likely to develop myopia and 0 for those who aren't. This can best be classified as
((OPTION_A)) Regression
((OPTION_B)) Decision Tree
((OPTION_C)) Clustering
((OPTION_D)) Association Rule
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))
((MARKS)) 1
((QUESTION)) The purpose of a machine learning model is to approximate an unknown function that associates input elements with output ones.
((OPTION_A)) True
((OPTION_B)) False
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)
Training set is normally a representation of a global distribution
((QUESTION))
ENTER
CONTENT. QTN
CAN HAVE
IMAGES ALSO

((OPTION_A)) True
THIS IS
MANDATORY
OPTION

((OPTION_B)) False
THIS IS ALSO
MANDATORY
OPTION

((OPTION_C))
This is optional

((OPTION_D))
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH A
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 2
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)
The model has an excessive capacity and it's not more able to
((QUESTION))
generalize considering the original dynamics provided by the training set. This
ENTER problem is called as
CONTENT. QTN
CAN HAVE
IMAGES ALSO
Underfitting
((OPTION_A))
THIS IS
MANDATORY
OPTION

((OPTION_B)) Overfitting

THIS IS ALSO
MANDATORY
OPTION

((OPTION_C)) Both
This is optional

((OPTION_D)) None
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)
It can associate almost perfectly all the known samples to the corresponding
((QUESTION))
output
ENTER values, but when an unknown input is presented, the corresponding prediction
CONTENT. QTN error can be very high, This problem is called as
CAN HAVE
IMAGES ALSO
Underfitting
((OPTION_A))
THIS IS
MANDATORY
OPTION
Overfitting
((OPTION_B))
THIS IS ALSO
MANDATORY
OPTION

((OPTION_C)) Both
This is optional

((OPTION_D)) None
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)
---------- may prove to be more difficult to discover as it could be initially
((QUESTION))
considered the result of a perfect fitting
ENTER
CONTENT. QTN
CAN HAVE
IMAGES ALSO
Underfitting
((OPTION_A))
THIS IS
MANDATORY
OPTION
Overfitting
((OPTION_B))
THIS IS ALSO
MANDATORY
OPTION

((OPTION_C)) Both
This is optional

((OPTION_D)) None
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)
when working with a supervised scenario, we define a non-negative error
((QUESTION))
measure em which takes two arguments and allows us to compute a total error
ENTER value over the whole dataset. Those two arguments are.
CONTENT. QTN
CAN HAVE
IMAGES ALSO
expected and predicted output
((OPTION_A))
THIS IS
MANDATORY
OPTION
calculated and predicted output
((OPTION_B))
THIS IS ALSO
MANDATORY
OPTION
calculated and measured output
((OPTION_C))
This is optional
none
((OPTION_D))
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option
A
((CORRECT_CH
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)
Initial value represents a starting point over the surface of a n-variables function.
((QUESTION))
A
ENTER generic training algorithm has to find the global minimum or a point quite close
CONTENT. QTN to it
CAN HAVE (there's always a tolerance to avoid an excessive number of iterations and a
IMAGES ALSO consequent risk
of overfitting). This measure is also called

loss function
((OPTION_A))
THIS IS
MANDATORY
OPTION
predicted output
((OPTION_B))
THIS IS ALSO
MANDATORY
OPTION
measured output
((OPTION_C))
This is optional
mean square error
((OPTION_D))
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH A
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)

((QUESTION)) In 1984, the computer scientist L. Valiant


proposed a mathematical approach to determine whether a problem is learnable
ENTER by a
CONTENT. QTN computer. The name of this technique is
CAN HAVE
IMAGES ALSO
Max likelihood
((OPTION_A))
THIS IS
MANDATORY
OPTION

((OPTION_B)) Zero one loss error


THIS IS ALSO
MANDATORY
OPTION

((OPTION_C)) Probably approximately correct


This is optional
none
((OPTION_D))
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH C
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)

((QUESTION)) In particular, a concept is a subset of input patterns X which determine the same
output element
ENTER
CONTENT. QTN
CAN HAVE
IMAGES ALSO

((OPTION_A)) True
THIS IS
MANDATORY
OPTION

((OPTION_B)) False
THIS IS ALSO
MANDATORY
OPTION

((OPTION_C))
This is optional

((OPTION_D))
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH A
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)

((QUESTION)) Therefore, learning a


concept (parametrically) means minimizing the corresponding loss function
ENTER restricted to a
CONTENT. QTN specific class, while learning all possible concepts (belonging to the same
CAN HAVE universe), means
IMAGES ALSO finding the minimum of a global loss function

((OPTION_A)) True
THIS IS
MANDATORY
OPTION

((OPTION_B)) False
THIS IS ALSO
MANDATORY
OPTION

((OPTION_C))
This is optional

((OPTION_D))
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH A
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)

((QUESTION)) An exponential time could lead to computational explosions when the datasets
are too large
ENTER or the optimization starting point is very far from an acceptable minimum.
CONTENT. QTN Moreover, it's
CAN HAVE important to remember the so-called …….
IMAGES ALSO

((OPTION_A)) curse of dimensionality


THIS IS
MANDATORY
OPTION

((OPTION_B)) Hughes phenomenon


THIS IS ALSO
MANDATORY
OPTION

((OPTION_C)) Probably approximately correct


This is optional

((OPTION_D))
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH A
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)

((QUESTION)) In many cases, in order to capture the full expressivity, it's


necessary to have a very large dataset and without enough training data, the
ENTER approximation
CONTENT. QTN can become problematic. This is called…
CAN HAVE
IMAGES ALSO

((OPTION_A)) curse of dimensionality


THIS IS
MANDATORY
OPTION

((OPTION_B)) Hughes phenomenon


THIS IS ALSO
MANDATORY
OPTION

((OPTION_C)) Probably approximately correct


This is optional

((OPTION_D))
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH B
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)

((QUESTION))
ENTER
CONTENT. QTN
CAN HAVE First term is called as
IMAGES ALSO

((OPTION_A)) posteriori
THIS IS
MANDATORY
OPTION

((OPTION_B)) Apriori
THIS IS ALSO
MANDATORY
OPTION

((OPTION_C)) likelihood.
This is optional

((OPTION_D))
This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH A
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
((MARKS)) 1
QUESTION IS OF
HOW MANY
MARKS? (1 OR 2
OR 3 UPTO 10)

((QUESTION)) In Bayes' theorem, the second term is called:
((OPTION_A)) A posteriori
((OPTION_B)) A priori
((OPTION_C)) Likelihood
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) B
((EXPLANATION))
((MARKS)) 1

((QUESTION)) In Bayes' theorem, the third term is called:
((OPTION_A)) A posteriori
((OPTION_B)) A priori
((OPTION_C)) Likelihood
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))
((MARKS))

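For reference, Bayes' theorem is P(A|B) = P(B|A) * P(A) / P(B), where P(A|B) is the a posteriori (posterior) probability, P(A) the a priori (prior) probability, P(B|A) the likelihood, and P(B) the evidence. A minimal Python sketch with hypothetical numbers (a rare condition and an imperfect test) shows how the terms combine; the function name `posterior` and all probabilities are illustrative assumptions.

```python
def posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Hypothetical numbers: A = "has the condition", B = "test is positive".
p_a = 0.01              # prior P(A)
p_b_given_a = 0.90      # likelihood P(B|A)
p_b_given_not_a = 0.05  # false-positive rate P(B|not A)

# Evidence P(B) via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = posterior(p_a, p_b_given_a, p_b)
print(f"P(A|B) = {p_a_given_b:.4f}")  # P(A|B) = 0.1538
```

Even with a 90% likelihood, the small prior keeps the posterior near 15%, which is why all three terms matter.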
((QUESTION)) We can create an object of an abstract class.
((OPTION_A))
((OPTION_B))
((OPTION_C))
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE))
((EXPLANATION))
((MARKS))

((QUESTION)) Which of the following steps or assumptions in regression modeling most affects the trade-off between underfitting and overfitting?
((OPTION_A)) The polynomial degree
((OPTION_B)) Whether we learn the weights by matrix inversion or gradient descent
((OPTION_C)) The use of a constant term
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1

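A minimal NumPy sketch of why the polynomial degree controls this trade-off: the same least-squares solver is used at every degree, and only the degree changes. The noisy sine target, sample sizes and seed are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def noisy_sine(x: np.ndarray) -> np.ndarray:
    """Hypothetical target: one period of a sine wave plus Gaussian noise."""
    return np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)


def mse(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.mean((y - y_hat) ** 2))


x_train = np.linspace(0.0, 1.0, 15)
y_train = noisy_sine(x_train)
x_test = np.linspace(0.02, 0.98, 50)
y_test = noisy_sine(x_test)

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    test_err = mse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_err, test_err)
    print(f"degree={degree}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Training error can only fall as the degree rises (each model class contains the previous one), while test error typically bottoms out at a moderate degree; the degree 1 line underfits the sine.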
((QUESTION)) Suppose you have the following data with one real-valued input variable and one real-valued output variable. What is the leave-one-out cross-validation mean squared error for linear regression (Y = bX + c)?
((OPTION_A)) 10/27
((OPTION_B)) 20/27
((OPTION_C)) 50/27
((OPTION_D)) 49/27
((OPTION_E))
((CORRECT_CHOICE)) D
((EXPLANATION))
((MARKS)) 1

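Leave-one-out cross-validation fits the model n times, each time holding out one point and scoring the prediction on it; the LOOCV MSE is the average of the n squared errors. A minimal sketch in pure Python using exact fractions, run on a small hypothetical dataset X = (0, 2, 3), Y = (2, 2, 1) chosen for illustration; `fit_line` and `loocv_mse` are illustrative names.

```python
from fractions import Fraction


def fit_line(points: list[tuple[Fraction, Fraction]]) -> tuple[Fraction, Fraction]:
    """Ordinary least squares for y = b*x + c (needs at least two distinct x values)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    c = mean_y - b * mean_x
    return b, c


def loocv_mse(points: list[tuple[Fraction, Fraction]]) -> Fraction:
    """Hold out each point in turn, fit on the rest, average the squared errors."""
    errors = []
    for i, (x_out, y_out) in enumerate(points):
        train = points[:i] + points[i + 1:]
        b, c = fit_line(train)
        errors.append((y_out - (b * x_out + c)) ** 2)
    return sum(errors) / len(errors)


data = [(Fraction(0), Fraction(2)), (Fraction(2), Fraction(2)), (Fraction(3), Fraction(1))]
print(loocv_mse(data))  # 49/27
```

On this dataset the three held-out squared errors are 4, 4/9 and 1, whose mean is 49/27.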
((QUESTION)) Which of the following is/are true about the maximum likelihood estimate (MLE)?
1. The MLE may not always exist.
2. The MLE always exists.
3. If the MLE exists, it may not be unique.
4. If the MLE exists, it must be unique.
((OPTION_A)) 1 and 4
((OPTION_B)) 2 and 3
((OPTION_C)) 1 and 3
((OPTION_D)) 2 and 4
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))
((MARKS)) 1

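A minimal pure-Python sketch of both behaviours behind statements 1 and 3, using two textbook cases chosen for illustration (the function names are hypothetical): the location MLE of a Laplace model is any median, so it is not unique for an even sample; and a one-weight logistic model on perfectly separable labels has a log-likelihood that keeps increasing with the weight, so no finite maximizer exists.

```python
import math


def laplace_neg_log_lik(mu: float, data: list[float]) -> float:
    """Negative log-likelihood of Laplace(mu, b=1), dropping the constant n*log(2)."""
    return sum(abs(x - mu) for x in data)


def logistic_log_lik(w: float, xs: list[float], ys: list[int]) -> float:
    """Log-likelihood of a one-weight logistic model P(y=1|x) = sigmoid(w*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = w * x if y == 1 else -w * x
        total -= math.log1p(math.exp(-z))  # log sigmoid(z)
    return total


# Non-uniqueness: every mu between the two middle values is equally good.
data = [1.0, 2.0, 3.0, 4.0]
print([laplace_neg_log_lik(mu, data) for mu in (2.0, 2.5, 3.0)])  # [4.0, 4.0, 4.0]

# Non-existence: separable labels, log-likelihood rises toward 0 without reaching it.
xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
print([logistic_log_lik(w, xs, ys) for w in (1.0, 5.0, 20.0)])
```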
((QUESTION)) Suppose a linear regression model perfectly fits the training data (training error is zero). Which of the following statements is true?
((OPTION_A)) You will always have zero test error
((OPTION_B)) You cannot have zero test error
((OPTION_C)) None of the above
((OPTION_D))
((OPTION_E))
((CORRECT_CHOICE)) C
((EXPLANATION))
((MARKS)) 1

((QUESTION)) Which of the following statements is true regarding residuals in regression analysis?
((OPTION_A)) Mean of residuals is always zero
((OPTION_B)) Mean of residuals is always less than zero
((OPTION_C)) Mean of residuals is always greater than zero
((OPTION_D)) There is no such rule for residuals
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1

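Option A holds in the usual setting of ordinary least squares with an intercept term: the normal equation for the intercept forces the residuals to sum to zero. A minimal NumPy sketch on hypothetical data shows this, and also shows that the property can fail when the line is forced through the origin (no intercept).

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, 50)
y = 3.0 * x + 5.0 + rng.normal(0.0, 2.0, 50)  # hypothetical data

# Least squares WITH an intercept column of ones.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Least squares WITHOUT an intercept (regression through the origin).
slope_only, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
residuals_no_intercept = y - x * slope_only[0]

print(f"mean residual, with intercept:    {residuals.mean():.2e}")
print(f"mean residual, without intercept: {residuals_no_intercept.mean():.3f}")
```

The first mean is zero up to floating-point rounding; the second is clearly nonzero because the model cannot absorb the true intercept.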
((QUESTION)) Which of the following is true about heteroskedasticity?
((OPTION_A)) Linear regression with varying error terms
((OPTION_B)) Linear regression with constant error terms
((OPTION_C)) Linear regression with zero error terms
((OPTION_D)) None of the above
((OPTION_E))
((CORRECT_CHOICE)) A
((EXPLANATION))
((MARKS)) 1

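Heteroskedasticity means the variance of the error term is not constant across observations; the constant-variance case in option B is homoskedasticity. A minimal NumPy sketch on hypothetical data whose noise grows with x shows the typical symptom: residuals from a straight-line fit fan out as x increases. A formal check would use a test such as Breusch-Pagan; this sketch only compares residual spread in the lower and upper halves of the data.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(1.0, 10.0, 400))
# Hypothetical heteroskedastic data: noise standard deviation is 0.5 * x.
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5 * x)

b, c = np.polyfit(x, y, 1)  # straight-line OLS fit: slope, intercept
residuals = y - (b * x + c)

half = x.size // 2
spread_low = residuals[:half].std()   # observations with small x
spread_high = residuals[half:].std()  # observations with large x
print(f"residual std, small x: {spread_low:.2f}; large x: {spread_high:.2f}")
```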
((QUESTION)) Which of the following indicates a fairly strong relationship between


X and Y?
ENTER
CONTENT. QTN
CAN HAVE
IMAGES ALSO

((OPTION_A)) A. Correlation coefficient = 0.9

THIS IS
MANDATORY
OPTION

((OPTION_B)) . The p-value for the null hypothesis Beta coefficient =0 is 0.0001
THIS IS ALSO
MANDATORY
OPTION

((OPTION_C)) The t-statistic for the null hypothesis Beta coefficient=0 is 30


This is optional

((OPTION_D)) None of these

This is optional

((OPTION_E))
This is optional.
If optional keep
empty so that
system will skip
this option

((CORRECT_CH A
OICE)) Either A
or B or C or D or
E

((EXPLANATION
)) This is also
optional
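Options B and C measure statistical significance, not the strength of the relationship. For a fixed correlation r, the t-statistic for testing β = 0 in simple regression is t = r·√(n − 2)/√(1 − r²), so it grows with the sample size n. This Python sketch uses sample sizes chosen only for illustration. A weak correlation on a huge sample gives t above 30, while r = 0.9 on a small sample gives a far smaller t.

```python
import math


def t_stat_from_r(r, n):
    """t-statistic for H0: slope = 0 in simple linear regression, given Pearson r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


weak_but_large = t_stat_from_r(0.1, 100_000)  # weak relationship, huge sample
strong_but_small = t_stat_from_r(0.9, 10)     # strong relationship, tiny sample

print(f"r = 0.1, n = 100000 -> t = {weak_but_large:.1f}")
print(f"r = 0.9, n = 10     -> t = {strong_but_small:.1f}")
```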
((MARKS)) 1

((QUESTION)) Which of the following assumptions do we make while deriving linear regression parameters?
1. The true relationship between dependent y and predictor x is linear
2. The model errors are statistically independent
3. The errors are normally distributed with a 0 mean and constant standard deviation

((OPTION_A)) 1, 2 & 3

((OPTION_B)) 1 & 3

((OPTION_C)) All of the above

((OPTION_D))

((OPTION_E))

((CORRECT_CHOICE)) C

((EXPLANATION))
((MARKS)) 1

((QUESTION)) To test the linear relationship between y (dependent) and x (independent) continuous variables, which of the following plots is best suited?

((OPTION_A)) Scatter plot

((OPTION_B)) Bar chart

((OPTION_C)) Histogram

((OPTION_D)) None of these

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Generally, which of the following method(s) is used for predicting a continuous dependent variable?
1. Linear Regression
2. Logistic Regression

((OPTION_A)) 1 & 2

((OPTION_B)) Only 1

((OPTION_C)) Only 2

((OPTION_D)) None of the above

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
((MARKS)) 1

((QUESTION)) The correlation between the age and health of a person was found to be -1.09. On the basis of this, you would tell the doctors that:

((OPTION_A)) The age is a good predictor of health

((OPTION_B)) The age is a poor predictor of health

((OPTION_C)) None of these

((OPTION_D)) All of the above

((OPTION_E))

((CORRECT_CHOICE)) C

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Which of the following offsets do we use in the case of a least-squares line fit? Suppose the horizontal axis is the independent variable and the vertical axis is the dependent variable.

((OPTION_A)) Vertical offset

((OPTION_B)) Perpendicular offset

((OPTION_C)) Both, depending on the situation

((OPTION_D)) Both A & B

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
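Ordinary least squares chooses the line that minimises the sum of squared vertical offsets (the residuals in y). Minimising perpendicular distances is a different method, known as orthogonal regression or total least squares. This Python sketch uses made-up data. It fits the OLS line and checks that nudging its slope or intercept only increases the vertical sum of squares.

```python
def vertical_sse(b0, b1, xs, ys):
    """Sum of squared vertical offsets y - (b0 + b1 * x)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def ols_line(xs, ys):
    """Least-squares (intercept, slope) for y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.2, 2.8, 5.1, 7.2, 8.9]
b0, b1 = ols_line(xs, ys)
best = vertical_sse(b0, b1, xs, ys)

# Every perturbed line has a larger vertical SSE than the OLS line.
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]:
    assert vertical_sse(b0 + d0, b1 + d1, xs, ys) > best
print(f"OLS line: y = {b0:.3f} + {b1:.3f} x, vertical SSE = {best:.4f}")
```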
((MARKS)) 1

((QUESTION)) Suppose we have generated the data with the help of polynomial regression of degree 3 (degree 3 will perfectly fit this data). Now consider the points below and choose the option based on them.
1. Simple linear regression will have high bias and low variance
2. Simple linear regression will have low bias and high variance
3. Polynomial of degree 3 will have low bias and high variance
4. Polynomial of degree 3 will have low bias and low variance

((OPTION_A)) Only 1

((OPTION_B)) 1 & 3

((OPTION_C)) 1 & 4

((OPTION_D)) None of the above

((OPTION_E))

((CORRECT_CHOICE)) C

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Suppose you are training a linear regression model. Now consider these points.
1. Overfitting is more likely if we have less data
2. Overfitting is more likely when the hypothesis space is small
Which of the above statement(s) are correct?

((OPTION_A)) Both are False

((OPTION_B)) 1 is False and 2 is True

((OPTION_C)) 1 is True and 2 is False

((OPTION_D)) None of the above

((OPTION_E))

((CORRECT_CHOICE)) C

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Suppose we fit “Lasso Regression” to a data set that has 100 features (X1, X2, …, X100). Now we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit Lasso regression with the same regularization parameter.
Which of the following options will be correct?

((OPTION_A)) It is more likely for X1 to be excluded from the model

((OPTION_B)) It is more likely for X1 to be included in the model

((OPTION_C)) Can’t say

((OPTION_D)) None of the above

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
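Below is a single-feature sketch of the effect in Python. The data and λ are hypothetical, and the closed form applies only to one predictor with no intercept. For the objective (1/2n)·Σ(y − βx)² + λ|β|, the solution is β = S(Σxy/n, λ) / (Σx²/n), where S is soft-thresholding. Multiplying x by 10 means the same fit needs a coefficient one tenth as large. The L1 penalty then costs less, so the feature is less likely to be shrunk to zero.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_single_feature(xs, ys, lam):
    """Lasso coefficient for one predictor, no intercept, objective (1/2n)*SSE + lam*|beta|."""
    n = len(xs)
    rho = sum(x * y for x, y in zip(xs, ys)) / n
    z = sum(x * x for x in xs) / n
    return soft_threshold(rho, lam) / z


xs = [0.1, -0.1, 0.2, -0.2]
ys = [0.1, -0.1, 0.2, -0.2]
lam = 0.05  # same regularization strength for both fits

beta_original = lasso_single_feature(xs, ys, lam)                   # shrunk to exactly 0
beta_rescaled = lasso_single_feature([10 * x for x in xs], ys, lam)  # survives the penalty
print(beta_original, beta_rescaled)
```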
((MARKS)) 1

((QUESTION)) Which of the following is true about “Ridge” or “Lasso” regression methods in the case of feature selection?

((OPTION_A)) Ridge regression uses subset selection of features

((OPTION_B)) Lasso regression uses subset selection of features

((OPTION_C)) Both use subset selection of features

((OPTION_D)) All of the above

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
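To see why only Lasso performs subset selection, compare the one-predictor closed forms. This Python sketch uses hypothetical data and no intercept. Ridge, with objective (1/2n)·Σ(y − βx)² + (λ/2)·β², gives β = (Σxy/n) / (Σx²/n + λ). That coefficient shrinks toward zero but never reaches it while Σxy ≠ 0. Lasso's soft-thresholding, by contrast, sets β exactly to zero once |Σxy/n| ≤ λ.

```python
def ridge_single_feature(xs, ys, lam):
    """Ridge coefficient for one predictor, no intercept: (1/2n)*SSE + (lam/2)*beta**2."""
    n = len(xs)
    rho = sum(x * y for x, y in zip(xs, ys)) / n
    z = sum(x * x for x in xs) / n
    return rho / (z + lam)


def lasso_single_feature(xs, ys, lam):
    """Lasso coefficient for one predictor, no intercept: (1/2n)*SSE + lam*|beta|."""
    n = len(xs)
    rho = sum(x * y for x, y in zip(xs, ys)) / n
    z = sum(x * x for x in xs) / n
    shrunk = max(abs(rho) - lam, 0.0)  # soft-thresholding
    return (shrunk if rho >= 0 else -shrunk) / z


xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.3, 0.1, 0.4, 0.2]  # weak relationship with x

for lam in [0.1, 1.0, 10.0]:
    print(f"lam = {lam:5}: ridge = {ridge_single_feature(xs, ys, lam):.4f}, "
          f"lasso = {lasso_single_feature(xs, ys, lam):.4f}")
```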
((MARKS)) 1

((QUESTION)) Which of the following statement(s) can be true after adding a variable to a linear regression model?
1. R-squared and adjusted R-squared both increase
2. R-squared increases and adjusted R-squared decreases
3. R-squared decreases and adjusted R-squared decreases
4. R-squared decreases and adjusted R-squared increases

((OPTION_A)) 1 and 2

((OPTION_B)) 1 and 3

((OPTION_C)) 2 and 4

((OPTION_D)) None of these

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
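Adding a predictor can never lower R² on the training data, because the larger model can always set the new coefficient to zero. Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) charges for the extra predictor, so it can move either way. Here is a short Python sketch with hypothetical R² values:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = 20
before = adjusted_r2(0.50, n, p=1)     # one predictor
useful = adjusted_r2(0.70, n, p=2)     # new predictor explains a lot more
marginal = adjusted_r2(0.505, n, p=2)  # new predictor barely helps

print(f"before: {before:.4f}, useful: {useful:.4f}, marginal: {marginal:.4f}")
```

In the "useful" case both measures rise (statement 1). In the "marginal" case R² rises while adjusted R² falls (statement 2).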
((MARKS)) 1

((QUESTION)) Which of the following metrics can be used for evaluating regression models?
1. R-squared
2. Adjusted R-squared
3. F-statistic
4. RMSE / MSE / MAE

((OPTION_A)) 2 and 4

((OPTION_B)) 1 and 2

((OPTION_C)) 2, 3 and 4

((OPTION_D)) All of the above

((OPTION_E))

((CORRECT_CHOICE)) D

((EXPLANATION))
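The error-based metrics in item 4, together with R², can be computed directly from predictions. The Python sketch below uses the standard library and made-up values. Adjusted R² and the F-statistic also need the number of predictors, so they are left out here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": 1 - sse / sst}


metrics = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(metrics)
```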
((MARKS)) 1

((QUESTION)) We can also compute the coefficients of linear regression with the help of an analytical method called the “Normal Equation”. Which of the following is/are true about the “Normal Equation”?
1. We don’t have to choose the learning rate
2. It becomes slow when the number of features is very large
3. There is no need to iterate

((OPTION_A)) 1 and 2

((OPTION_B)) 1 and 3

((OPTION_C)) 2 and 3

((OPTION_D)) 1, 2 and 3

((OPTION_E))

((CORRECT_CHOICE)) D

((EXPLANATION))
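This minimal Python sketch implements the Normal Equation by solving (XᵀX)β = Xᵀy directly with Gaussian elimination. The helper name and toy data are illustrative. There is no learning rate and no iteration. However, building and solving the p × p system costs roughly O(n·p² + p³), which is why the method slows down when the number of features p is very large.

```python
def normal_equation(X, y):
    """Solve (X^T X) beta = X^T y by Gaussian elimination with partial pivoting.

    X is a list of rows; include a column of 1s for the intercept.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    A = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]

    # Forward elimination.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]

    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (A[i][p] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta


# Data generated exactly by y = 1 + 2x, so the solution should be close to [1, 2].
xs = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, x] for x in xs]
y = [1.0 + 2.0 * x for x in xs]
beta = normal_equation(X, y)
print(beta)
```

The solution has one intercept and one slope, which also illustrates that a simple linear regression estimates two coefficients.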
((MARKS)) 1

((QUESTION)) The expected value of Y is a linear function of the X (X1, X2, …, Xn) variables, and the regression line is defined as:
Y = β0 + β1 X1 + β2 X2 + … + βn Xn
Which of the following statement(s) are true?
1. If Xi changes by an amount ∆Xi, holding other variables constant, then the expected value of Y changes by a proportional amount βi ∆Xi, for some constant βi (which in general could be a positive or negative number).
2. The value of βi is always the same, regardless of the values of the other X’s.
3. The total effect of the X’s on the expected value of Y is the sum of their separate effects.

((OPTION_A)) 1 and 2

((OPTION_B)) 1 and 3

((OPTION_C)) 2 and 3

((OPTION_D)) 1, 2 and 3

((OPTION_E))

((CORRECT_CHOICE)) D

((EXPLANATION))
((MARKS)) 1

((QUESTION)) How many coefficients do you need to estimate in a simple linear regression model (one independent variable)?

((OPTION_A)) 1

((OPTION_B)) 2

((OPTION_C)) Can’t say

((OPTION_D))

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
((MARKS)) 2

((QUESTION)) The graphs below show two fitted regression lines (A & B) on randomly generated data. Now, I want to find the sum of residuals in both cases A and B.
Which of the following statements is true about the sum of residuals of A and B?

((OPTION_A)) A has a higher sum than B

((OPTION_B)) A has a lower sum than B

((OPTION_C)) Both have the same sum

((OPTION_D)) None of these

((OPTION_E))

((CORRECT_CHOICE)) C

((EXPLANATION))
((MARKS)) 1

((QUESTION)) If two variables are correlated, is it necessary that they have a linear relationship?

((OPTION_A)) YES

((OPTION_B)) NO

((OPTION_C)) Both A & B

((OPTION_D)) None of the above

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Related (dependent) variables can have a zero correlation coefficient. True or False?

((OPTION_A)) TRUE

((OPTION_B)) FALSE

((OPTION_C))

((OPTION_D))

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
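Here is a classic counterexample as a Python sketch. Take x symmetric around zero and y = x². Then y is completely determined by x, yet the Pearson correlation coefficient is exactly zero, because it only measures linear association.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # perfectly (but non-linearly) dependent on x
r = pearson(xs, ys)
print(r)
```

The same example shows that a relationship between two variables need not be linear.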
((MARKS)) 1

((QUESTION)) Suppose I applied a logistic regression model to data and got training accuracy X and testing accuracy Y. Now I want to add a few new features to the data. Select the option(s) that are correct in such a case.
Note: Consider that the remaining parameters are the same.
1. Training accuracy always decreases.
2. Training accuracy always increases or remains the same.
3. Testing accuracy always decreases.
4. Testing accuracy always increases or remains the same.

((OPTION_A)) Only 2

((OPTION_B)) Only 1

((OPTION_C)) Only 3

((OPTION_D)) All of the above

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
((MARKS)) 1

((QUESTION)) The graph below represents a regression line predicting Y from X. The values on the graph show the residual for each predicted value. Use this information to compute the SSE.

((OPTION_A)) 3.02

((OPTION_B)) 0.75

((OPTION_C)) 1.01

((OPTION_D)) None of these

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Suppose the distribution of salaries in a company X has a median of $35,000, and the 25th and 75th percentiles are $21,000 and $53,000 respectively.
Would a person with a salary of $1 be considered an outlier?

((OPTION_A)) YES

((OPTION_B)) NO

((OPTION_C)) More information is required

((OPTION_D)) None of these

((OPTION_E))

((CORRECT_CHOICE)) C

((EXPLANATION))
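Whether $1 counts as an outlier depends on the rule adopted, which is why more information is needed. One common convention, assumed here and not the only one, is Tukey's rule: flag values outside Q1 − 1.5·IQR and Q3 + 1.5·IQR. Under that rule the lower fence is negative, so $1 would not be flagged:

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences using the interquartile range."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(21_000, 53_000)
salary = 1
flagged = salary < lower or salary > upper
print(f"fences: ({lower:,.0f}, {upper:,.0f}); outlier under this rule: {flagged}")
```

A different rule, or domain knowledge that a $1 salary is implausible, could lead to the opposite conclusion.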
((MARKS)) 1

((QUESTION)) Which of the following options is true regarding “Regression” and “Correlation”?
Note: y is the dependent variable and x is the independent variable.

((OPTION_A)) The relationship is symmetric between x and y in both.

((OPTION_B)) The relationship is not symmetric between x and y in both.

((OPTION_C)) The relationship is not symmetric between x and y in the case of correlation, but in the case of regression it is symmetric.

((OPTION_D)) The relationship is symmetric between x and y in the case of correlation, but in the case of regression it is not symmetric.

((OPTION_E))

((CORRECT_CHOICE)) D

((EXPLANATION))
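The asymmetry is easy to check numerically with this Python sketch on toy data. The correlation of x with y equals that of y with x. However, the least-squares slope for regressing y on x differs from the slope for regressing x on y, and their product equals r².

```python
import math


def slope(xs, ys):
    """Least-squares slope when regressing ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys))


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

print("r(x, y) =", pearson(x, y), " r(y, x) =", pearson(y, x))
print("slope of y on x =", slope(x, y), " slope of x on y =", slope(y, x))
```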
((MARKS)) 1

((QUESTION)) True-False: Is Logistic regression a supervised machine learning algorithm?

((OPTION_A)) TRUE

((OPTION_B)) FALSE

((OPTION_C))

((OPTION_D))

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
((MARKS)) 1

((QUESTION)) True-False: Is Logistic regression mainly used for Regression?

((OPTION_A)) TRUE

((OPTION_B)) FALSE

((OPTION_C))

((OPTION_D))

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
((MARKS)) 1

((QUESTION)) True-False: Is it possible to design a logistic regression algorithm using a Neural Network Algorithm?

((OPTION_A)) TRUE

((OPTION_B)) FALSE

((OPTION_C))

((OPTION_D))

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
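Logistic regression is equivalent to a neural network with no hidden layer. It is a single neuron with a sigmoid activation, trained by minimising binary cross-entropy, which is the same as maximising the likelihood. Here is a stdlib-only Python sketch using plain gradient descent on hypothetical one-feature data. The learning rate and number of epochs are arbitrary choices for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_neuron(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on mean cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean cross-entropy: mean((p - y) * x) for w and mean(p - y) for b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 1]  # overlapping classes, so the weights stay finite
w, b = train_logistic_neuron(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, p(y=1 | x=3) = {sigmoid(3 * w + b):.3f}")
```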
((MARKS)) 1

((QUESTION)) True-False: Is it possible to apply a logistic regression algorithm on a 3-class Classification problem?

((OPTION_A)) TRUE

((OPTION_B)) FALSE

((OPTION_C))

((OPTION_D))

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Which of the following methods do we use to best fit the data in Logistic Regression?

((OPTION_A)) Least Square Error

((OPTION_B)) Maximum Likelihood

((OPTION_C)) Jaccard distance

((OPTION_D)) Both A & B

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
((MARKS)) 1

((QUESTION)) One of the very good methods to analyze the performance of Logistic Regression is AIC, which is similar to R-Squared in Linear Regression. Which of the following is true about AIC?

((OPTION_A)) We prefer a model with the minimum AIC value

((OPTION_B)) We prefer a model with the maximum AIC value

((OPTION_C)) Both, depending on the situation

((OPTION_D)) None of the above

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
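AIC is defined as AIC = 2k − 2·ln(L̂), where k is the number of estimated parameters and L̂ is the maximised likelihood. Lower values therefore mean a better trade-off between fit and complexity. The log-likelihood values below are hypothetical and are used only to show the comparison:

```python
def aic(num_params, log_likelihood):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical fitted models: (number of parameters, maximised log-likelihood)
models = {"small": (3, -100.0), "large": (5, -99.5)}
scores = {name: aic(k, ll) for name, (k, ll) in models.items()}
best = min(scores, key=scores.get)  # lowest AIC wins
print(scores, "-> prefer", best)
```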
((MARKS)) 1

((QUESTION)) True-False: Standardisation of features is required before training a Logistic Regression.

((OPTION_A)) TRUE

((OPTION_B)) FALSE

((OPTION_C))

((OPTION_D))

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Which of the following algorithms do we use for Variable Selection?

((OPTION_A)) LASSO

((OPTION_B)) Ridge

((OPTION_C)) Both

((OPTION_D)) All of these

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Suppose you have been given a fair coin and you want to find out the odds of getting heads. Which of the following options is true for such a case?

((OPTION_A)) Odds will be 0

((OPTION_B)) Odds will be 0.5

((OPTION_C)) Odds will be 1

((OPTION_D)) None of the above

((OPTION_E))

((CORRECT_CHOICE)) C

((EXPLANATION))
((MARKS)) 1

((QUESTION)) The logit function (given as l(x)) is the log of the odds function. What could be the range of the logit function in the domain x = [0, 1]?

((OPTION_A)) (−∞, ∞)

((OPTION_B)) (0, 1)

((OPTION_C)) (0, ∞)

((OPTION_D)) (−∞, 0)

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
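The odds p/(1 − p) run from 0 to ∞ as p goes from 0 to 1. The logit l(p) = ln(p/(1 − p)) therefore runs over the whole real line. It equals 0 at p = 0.5, where the odds equal 1 (the fair-coin case). This small Python sketch evaluates both near the ends of the interval; at the endpoints themselves the logit tends to ±∞.

```python
import math


def odds(p):
    """Odds in favour of an event with probability p."""
    return p / (1 - p)


def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(odds(p))


for p in [1e-9, 0.1, 0.5, 0.9, 1 - 1e-9]:
    print(f"p = {p:<12g} odds = {odds(p):<12.4g} logit = {logit(p):8.3f}")
```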
((MARKS)) 1

((QUESTION)) Which of the following options is true?

((OPTION_A)) Linear Regression error values have to be normally distributed, but in the case of Logistic Regression this is not the case

((OPTION_B)) Logistic Regression error values have to be normally distributed, but in the case of Linear Regression this is not the case

((OPTION_C)) Both Linear Regression and Logistic Regression error values have to be normally distributed

((OPTION_D)) Neither Linear Regression nor Logistic Regression error values have to be normally distributed

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Which of the following is true regarding the logistic function for any value “x”?
Note:
Logistic(x): a logistic function of any number “x”
Logit(x): a logit function of any number “x”
Logit_inv(x): an inverse logit function of any number “x”

((OPTION_A)) Logistic(x) = Logit(x)

((OPTION_B)) Logistic(x) = Logit_inv(x)

((OPTION_C)) Logit_inv(x) = Logit(x)

((OPTION_D)) None of these

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
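This Python sketch is a quick numerical check that the logistic function 1/(1 + e^(−x)) undoes the logit ln(p/(1 − p)). In other words, Logistic(x) = Logit_inv(x):

```python
import math


def logistic(x):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1 - p))


for x in [-5.0, -1.0, 0.0, 2.5, 7.0]:
    p = logistic(x)
    print(f"x = {x:5.1f}  logistic(x) = {p:.6f}  logit(logistic(x)) = {logit(p):.6f}")
```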
((MARKS)) 2

((QUESTION)) Suppose you applied a Logistic Regression model to the given data and got a training accuracy X and a testing accuracy Y. Now, you want to add a few new features to the same data. Select the option(s) that are correct in such a case.
Note: Consider that the remaining parameters are the same.

((OPTION_A)) Training accuracy increases

((OPTION_B)) Training accuracy increases or remains the same

((OPTION_C)) Testing accuracy decreases

((OPTION_D)) Testing accuracy increases or remains the same

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Choose which of the following options is true regarding the One-vs-All method in Logistic Regression.

((OPTION_A)) We need to fit n models in an n-class classification problem

((OPTION_B)) We need to fit n−1 models to classify into n classes

((OPTION_C)) We need to fit only 1 model to classify into n classes

((OPTION_D))

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Suppose you are using a Logistic Regression model on a huge dataset. One of the problems you may face on such huge data is that Logistic Regression will take a very long time to train. What would you do if you want to train Logistic Regression on the same data so that it takes less time while giving comparatively similar accuracy (may not be the same)?

((OPTION_A)) Decrease the learning rate and decrease the number of iterations

((OPTION_B)) Decrease the learning rate and increase the number of iterations

((OPTION_C)) Increase the learning rate and increase the number of iterations

((OPTION_D)) Increase the learning rate and decrease the number of iterations

((OPTION_E))

((CORRECT_CHOICE)) D

((EXPLANATION))
((MARKS)) 2

((QUESTION)) Following is the loss function in logistic regression (Y-axis: loss function, X-axis: log probability) for a two-class classification problem.
Note: Y is the target class.
Which of the following images shows the cost function for y = 1?

((OPTION_A)) A

((OPTION_B)) B

((OPTION_C)) Both

((OPTION_D)) None of these

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
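The images are not reproduced here, but the expected shape follows from the binary cross-entropy. Let p be the predicted probability of class 1. For y = 1 the cost is −ln(p), so it is near 0 as p → 1 and grows without bound as p → 0. For y = 0 the cost is −ln(1 − p), the mirror image. This Python sketch tabulates both against p:

```python
import math


def log_loss(y, p):
    """Binary cross-entropy for one example with label y in {0, 1} and predicted P(y=1) = p."""
    return -math.log(p) if y == 1 else -math.log(1 - p)


for p in [0.01, 0.25, 0.5, 0.75, 0.99]:
    print(f"p = {p:.2f}  cost(y=1) = {log_loss(1, p):.3f}  cost(y=0) = {log_loss(0, p):.3f}")
```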
((MARKS)) 1

((QUESTION)) Logistic regression is used when you want to:

((OPTION_A)) Predict a dichotomous variable from continuous or dichotomous variables.

((OPTION_B)) Predict a continuous variable from dichotomous variables.

((OPTION_C)) Predict any categorical variable from several other categorical variables.

((OPTION_D)) Predict a continuous variable from dichotomous or continuous variables

((OPTION_E))

((CORRECT_CHOICE)) A

((EXPLANATION))
((MARKS)) 1

((QUESTION)) The odds ratio is:

((OPTION_A)) The ratio of the probability of an event not happening to the probability of the event happening.

((OPTION_B)) The probability of an event occurring.

((OPTION_C)) The ratio of the odds after a unit change in the predictor to the original odds.

((OPTION_D)) The ratio of the probability of an event happening to the probability of the event not happening.

((OPTION_E))

((CORRECT_CHOICE)) C

((EXPLANATION))
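In a logistic regression model, log(odds) = β0 + β1·x. Raising x by one unit therefore multiplies the odds by e^β1. That factor is the odds ratio described in option C, and it is the same whatever the starting value of x. Here is a Python sketch with hypothetical coefficients:

```python
import math


def odds_at(x, b0, b1):
    """Odds of the outcome under the logistic model log(odds) = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)


b0, b1 = -1.5, 0.4  # hypothetical fitted coefficients
for x in [0.0, 2.0, 5.0]:
    ratio = odds_at(x + 1, b0, b1) / odds_at(x, b0, b1)
    print(f"x = {x}: odds ratio = {ratio:.6f}, exp(b1) = {math.exp(b1):.6f}")
```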
((MARKS)) 1

((QUESTION)) Large values of the log-likelihood statistic indicate:

((OPTION_A)) That there are a greater number of explained vs. unexplained observations.

((OPTION_B)) That the statistical model fits the data well.

((OPTION_C)) That as the predictor variable increases, the likelihood of the outcome occurring decreases.

((OPTION_D)) That the statistical model is a poor fit of the data.

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
((MARKS)) 1

((QUESTION)) Logistic regression assumes a:

((OPTION_A)) Linear relationship between continuous predictor variables and the outcome variable.

((OPTION_B)) Linear relationship between continuous predictor variables and the logit of the outcome variable.

((OPTION_C)) Linear relationship between continuous predictor variables.

((OPTION_D)) Linear relationship between observations.

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
((MARKS)) 1

((QUESTION)) In binary logistic regression:

((OPTION_A)) The dependent variable is continuous.

((OPTION_B)) The dependent variable is divided into two equal subcategories.

((OPTION_C)) The dependent variable consists of two categories.

((OPTION_D)) There is no dependent variable.

((OPTION_E))

((CORRECT_CHOICE)) C

((EXPLANATION))
((MARKS)) 1

((QUESTION)) The correlation coefficient is used to determine:

((OPTION_A)) A specific value of the y-variable given a specific value of the x-variable

((OPTION_B)) A specific value of the x-variable given a specific value of the y-variable

((OPTION_C)) The strength of the relationship between the x and y variables

((OPTION_D)) None

((OPTION_E))

((CORRECT_CHOICE)) C

((EXPLANATION))
((MARKS))

((QUESTION))

((OPTION_A))

((OPTION_B))

((OPTION_C))

((OPTION_D))

((OPTION_E))

((CORRECT_CHOICE)) B

((EXPLANATION))
((MARKS)) 2

This sheet is for 3 Mark questions
1) Which of the following is a characteristic of the best machine learning method?
A) Fast
B) Accuracy
C) Scalable
D) All of the above
Answer: D

2) What are the different algorithm techniques in machine learning?
A) Supervised Learning and Semi-supervised Learning
B) Unsupervised Learning and Transduction
C) Both A & B
D) None of the mentioned
Answer: C

3) ______ can be adopted when it's necessary to categorize a large amount of data with a few complete examples or when there's the need to …
A) Supervised
B) Semi-supervised
C) Reinforcement
D) Clusters
Answer: B

4) In reinforcement learning, this feedback is usually called ___.
A) Overfitting
B) Overlearning
C) Reward
D) None of the above
Answer: C

5) In the last decade, many researchers started training bigger and bigger models, built with several different layers; that's why this approach is called _____.
A) Deep learning
B) Machine learning
C) Reinforcement learning
D) Unsupervised learning
Answer: A

6) What does learning exactly mean?
A) Robots are programmed so that they can …
B) A set of data is used to discover the …
C) Learning is the ability to change …
D) It is a set of data used to discover the …
Answer: C

7) When it is necessary to allow the model to develop a generalization ability and avoid a common problem called ______.
A) Overfitting
B) Overlearning
C) Classification
D) Regression
Answer: A

8) Techniques that involve the usage of both labeled and unlabeled data are called ___.
A) Supervised
B) Semi-supervised
C) Unsupervised
D) None of the above
Answer: B

9) There's a growing interest in pattern recognition and associative memories whose structure and functioning are similar to what happens in the neocortex. Such an …
A) Regression
B) Accuracy
C) Model-free
D) Scalable
Answer: C

10) ______ showed better performance than other approaches, even without a context-based model.
A) Machine learning
B) Deep learning
C) Reinforcement learning
D) Supervised learning
Answer: B

11) Which of the following sentences is correct?
A) Machine learning relates to the study, …
B) Data mining can be defined as the process …
C) Both A & B
D) None of the above
Answer: C

12) What is 'overfitting' in machine learning?
A) When a statistical model describes random error or noise instead of …
B) Robots are programmed so that they can perform the task based on data they gather from …
C) While involving the process of learning, 'overfitting' occurs.
D) A set of data is used to discover the potentially predictive relationship.
Answer: A

13) What is a 'test set'?
A) A test set is used to test the accuracy of the hypotheses generated by the learner.
B) It is a set of data used to discover the potentially predictive relationship.
C) Both A & B
D) None of the above
Answer: A

14) What is the function of 'supervised learning'?
A) Classifications, predict time series, annotate strings
B) Speech recognition, regression
C) Both A & B
D) None of the above
Answer: C

15) Common unsupervised applications include:
A) Object segmentation
B) Similarity detection
C) Automatic labeling
D) All of the above
Answer: D

16) Reinforcement learning is particularly efficient when ______.
A) the environment is not completely deterministic
B) it's often very dynamic
C) it's impossible to have a precise error measure
D) All of the above
Answer: D

17) During the last few years, many ______ algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state.
A) Logical
B) Classical
C) Classification
D) None of the above
Answer: D

18) Common deep learning applications include ____.
A) Image classification, real-time visual tracking
B) Autonomous car driving, logistic optimization
C) Bioinformatics, speech recognition
D) All of the above
Answer: D

19) If there is only a discrete number of possible outcomes (called categories), the process becomes a ______.
A) Regression
B) Classification
C) Model-free
D) Categories
Answer: B

20) Which of the following are supervised learning applications?
A) Spam detection, pattern detection, natural language processing
B) Image classification, real-time visual tracking
C) Autonomous car driving, logistic optimization
D) Bioinformatics, speech recognition
Answer: A

21) Let's say you are working with categorical feature(s) and you have not looked at the distribution of the categorical variable in the test data. You want to apply one-hot encoding (OHE) on the categorical feature(s). What challenges may you face if you have applied OHE on a categorical variable of the train dataset?
A) All categories of the categorical variable are not present in the test dataset.
B) The frequency distribution of categories is different in the train dataset as compared to the test dataset.
C) Train and test always have the same distribution.
D) Both A and B
Answer: D

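The challenge in option A can be reproduced with scikit-learn's `OneHotEncoder`. This sketch uses a hypothetical toy column in which the category `"blue"` appears only in the test data. With the default `handle_unknown="error"`, the transform fails. With `handle_unknown="ignore"`, the unseen category is encoded as an all-zero row.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column: "blue" appears only in the test data.
train = np.array([["red"], ["green"], ["red"], ["green"]])
test = np.array([["red"], ["blue"]])

# Default encoder (handle_unknown="error"): an unseen test category raises.
strict = OneHotEncoder().fit(train)
try:
    strict.transform(test)
    unseen_raised = False
except ValueError as exc:
    unseen_raised = True
    print("transform failed:", exc)

# handle_unknown="ignore" encodes unseen categories as an all-zero row instead.
lenient = OneHotEncoder(handle_unknown="ignore").fit(train)
encoded = lenient.transform(test).toarray()
print(lenient.categories_)  # categories learned from the training data only
print(encoded)
```

The encoder only knows the categories it saw during `fit`. Inspecting the test distribution before encoding avoids surprises of this kind.
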
22) Which of the following sentences is FALSE regarding regression?
A) It relates inputs to outputs.
B) It is used for prediction.
C) It may be used for interpretation.
D) It discovers causal relationships.
Answer: D

23) Which of the following methods is used to find the optimal features for cluster analysis?
A) k-Means
B) Density-Based Spatial Clustering
C) Spectral Clustering Find clusters
D) All of the above
Answer: D

24) scikit-learn also provides functions for creating dummy datasets from scratch:
A) make_classification()
B) make_regression()
C) make_blobs()
D) All of the above
Answer: D

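Here is a short sketch calling the three generators from `sklearn.datasets`. The sample sizes and feature counts are arbitrary choices for illustration.

```python
from sklearn.datasets import make_blobs, make_classification, make_regression

# Labeled data for a classification task: 100 samples, 4 features, 2 classes.
X_clf, y_clf = make_classification(n_samples=100, n_features=4, n_informative=2,
                                   n_redundant=0, n_classes=2, random_state=0)

# Continuous target for a regression task, with a little Gaussian noise.
X_reg, y_reg = make_regression(n_samples=100, n_features=3, noise=0.5, random_state=0)

# Isotropic Gaussian blobs, handy for clustering examples.
X_blob, y_blob = make_blobs(n_samples=90, centers=3, n_features=2, random_state=0)

print(X_clf.shape, X_reg.shape, X_blob.shape)
```

Each function returns a feature matrix and a target vector. The data can be fed straight into an estimator's `fit`.
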
25) ______ can accept a NumPy RandomState generator or an integer seed.
A) make_blobs
B) random_state
C) test_size
D) training_size
Answer: B

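`random_state` appears in many scikit-learn functions, including the dataset generators and `train_test_split`. This sketch, with an arbitrary sample size and seed, shows the two accepted forms. An integer seed gives identical data on every call. A shared `RandomState` instance advances each time it is used.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same integer seed -> identical data on every call.
X_a, _ = make_blobs(n_samples=20, centers=2, random_state=42)
X_b, _ = make_blobs(n_samples=20, centers=2, random_state=42)

# A RandomState instance is also accepted; it is consumed as it is used,
# so reusing the same instance yields a different draw the second time.
rng = np.random.RandomState(42)
X_c, _ = make_blobs(n_samples=20, centers=2, random_state=rng)
X_d, _ = make_blobs(n_samples=20, centers=2, random_state=rng)

print(np.array_equal(X_a, X_b), np.array_equal(X_c, X_d))
```
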
26) In many classification problems, the target dataset is made up of categorical labels which cannot immediately be processed by any algorithm. An encoding is needed, and scikit-learn offers at least _____ valid options.
A) 1
B) 2
C) 3
D) 4
Answer: B

27) In which of the following is each categorical label first turned into a positive integer and then transformed into a vector where only one feature is 1 while all the others are 0?
A) LabelEncoder class
B) LabelBinarizer class
C) DictVectorizer
D) FeatureHasher
Answer: B

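The behaviour described in this question matches scikit-learn's `LabelBinarizer`. By contrast, `LabelEncoder` stops at the integer codes. Here is a minimal comparison on hypothetical labels.

```python
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

labels = ["cat", "dog", "bird", "dog"]

# LabelEncoder only maps each label to an integer (classes are sorted).
enc = LabelEncoder()
codes = enc.fit_transform(labels)
print(codes)

# LabelBinarizer produces one column per class with a single 1 in each row.
lb = LabelBinarizer()
onehot = lb.fit_transform(labels)
print(lb.classes_)
print(onehot)
print(lb.inverse_transform(onehot))  # back to the original labels
```

With only two classes, `LabelBinarizer` returns a single 0/1 column rather than two columns.
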
28) ______ is the most drastic one and should be considered only when the dataset is quite large, the number of missing features is high, and any prediction could be risky.
A) Removing the whole line
B) Creating a sub-model to predict those features
C) Using an automatic strategy to impute them according to the other known values
D) All of the above
Answer: A

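This sketch contrasts option A with the imputation strategy in option C. It uses NumPy and scikit-learn's `SimpleImputer` on a small hypothetical array. The mean strategy is one choice among several.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data set with missing entries marked as np.nan.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 5.0]])

# Option A: drop every row ("whole line") that contains a missing value.
complete_rows = X[~np.isnan(X).any(axis=1)]

# Option C: fill each gap from the known values of its column (here, the mean).
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)

print(complete_rows)
print(X_filled)
```

Dropping rows discards half of this tiny dataset. This is why the question calls removal the most drastic option.
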
29) It's possible to specify if the scaling process must include both mean and standard deviation using the parameters ________.
A) with_mean=True/False
B) with_std=True/False
C) Both A & B
D) None of the mentioned
Answer: C

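In scikit-learn these parameters belong to `StandardScaler`. The sketch below, on a hypothetical 3x2 array, shows the effect of each combination.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Subtract the column mean and divide by the column standard deviation.
both = StandardScaler(with_mean=True, with_std=True).fit_transform(X)
# Only center the data (no scaling).
center_only = StandardScaler(with_mean=True, with_std=False).fit_transform(X)
# Only scale to unit variance (no centering).
scale_only = StandardScaler(with_mean=False, with_std=True).fit_transform(X)

print(both)
print(center_only)
print(scale_only)
```

Setting `with_mean=False` is useful for sparse inputs, where centering would destroy sparsity.
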
30) Which of the following selects the best K high-score features?
A) FeatureHasher
B) SelectKBest
C) SelectPercentile
D) All of the above
Answer: B

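This sketch uses synthetic data: one informative column and three noise columns, all hypothetical. It compares `SelectKBest`, which keeps a fixed number k of top-scoring features, with `SelectPercentile`, which keeps a percentage of them. `f_classif` (ANOVA F-score) is assumed here as the scoring function; other score functions can be passed.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

rng = np.random.RandomState(0)
y = np.array([0] * 50 + [1] * 50)
informative = y + rng.normal(scale=0.3, size=100)  # strongly tied to the label
noise = rng.normal(size=(100, 3))                   # unrelated to the label
X = np.column_stack([noise[:, 0], informative, noise[:, 1], noise[:, 2]])

# Keep exactly k = 1 feature with the highest score.
kbest = SelectKBest(score_func=f_classif, k=1).fit(X, y)
print(kbest.get_support())

# Keep the top 50% of features by score.
pct = SelectPercentile(score_func=f_classif, percentile=50).fit(X, y)
print(pct.get_support())
```
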
31) How does the number of observations influence overfitting? Choose the correct answer(s). Note: all other parameters are the same.
1. In case of fewer observations, it is easy to overfit the data.
2. In case of fewer observations, it is hard to overfit the data.
3. In case of more observations, it is easy to overfit the data.
4. In case of more observations, it is hard to overfit the data.
A) 1 and 4
B) 2 and 3
C) 1 and 3
D) None of these
Answer: A

32) Suppose you have fitted a complex regression model on a dataset. Now, you are using ridge regression with tuning parameter lambda to reduce its complexity. Choose the option(s) below which describe the relationship of bias and variance with lambda.
A) In case of very large lambda, bias is low and variance is low
B) In case of very large lambda, bias is low and variance is high
C) In case of very large lambda, bias is high and variance is low
D) In case of very large lambda, bias is high and variance is high
Answer: C

33) What is/are true about ridge regression?
1. When lambda is 0, the model works like a linear regression model.
2. When lambda is 0, the model doesn't work like a linear regression model.
3. When lambda goes to infinity, we get very, very small coefficients approaching 0.
4. When lambda goes to infinity, we get very, very large coefficients approaching infinity.
A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4
Answer: A

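Statements 1 and 3 follow from the closed-form ridge solution w = (X^T X + λI)^-1 X^T y. The NumPy sketch below uses hypothetical data with no intercept term; the helper name `ridge_coefficients` is chosen for illustration. It checks that λ = 0 reproduces ordinary least squares and that a very large λ drives the coefficients toward 0.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^-1 X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Ordinary least squares for comparison.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

for lam in [0.0, 10.0, 1e6]:
    print(lam, ridge_coefficients(X, y, lam))
```

The same closed form is what distinguishes ridge from lasso. The lasso's absolute-value penalty is not differentiable at 0, so it is usually solved iteratively, for example by coordinate descent.
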
34) Which of the following method(s) does not have a closed-form solution for its coefficients?
A) Ridge regression
B) Lasso
C) Both Ridge and Lasso
D) Neither
Answer: B

35) The function used for linear regression in R is __________.
A) lm(formula, data)
B) lr(formula, data)
C) lrm(formula, data)
D) regression.linear(formula, data)
Answer: A

36) In the mathematical equation of linear regression Y = β1 + β2X + ϵ, (β1, β2) refers to __________.
A) (X-intercept, slope)
B) (slope, X-intercept)
C) (Y-intercept, slope)
D) (slope, Y-intercept)
Answer: C

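Fitting the model to hypothetical noise-free data generated from Y = 3 + 2X shows which coefficient is which. The coefficient on the column of ones is the Y-intercept β1, and the coefficient on X is the slope β2.

```python
import numpy as np

# Hypothetical data generated from Y = 3 + 2X (beta1 = 3, beta2 = 2), no noise.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 3.0 + 2.0 * X

# Design matrix with a column of ones so the first coefficient is the intercept.
A = np.column_stack([np.ones_like(X), X])
(beta1, beta2), *_ = np.linalg.lstsq(A, Y, rcond=None)

print(beta1, beta2)  # beta1: Y-intercept, beta2: slope
```
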
37) Suppose that we have N independent variables (X1, X2, … Xn) and the dependent variable is Y. Now imagine that you are applying linear regression by fitting the best-fit line using least squares error on this data. You found that the correlation coefficient for one of its variables (say X1) with Y is -0.95. Which of the following is true for X1?
A) The relation between X1 and Y is weak
B) The relation between X1 and Y is strong
C) The relation between X1 and Y is neutral
D) Correlation can't judge the relationship
Answer: B

38) We have been given a dataset with n records in which we have input attribute x and output attribute y. Suppose we use a linear regression method to model this data. To test our linear regressor, we split the data into a training set and a test set randomly. Now we increase the training set size gradually. As the training set size increases, what do you expect will happen with the mean training error?
A) Increase
B) Decrease
C) Remain constant
D) Can't say
Answer: D

39) We have been given a dataset with n records in which we have input attribute x and output attribute y. Suppose we use a linear regression method to model this data. To test our linear regressor, we split the data into a training set and a test set randomly. What do you expect will happen with bias and variance as you increase the size of the training data?
A) Bias increases and variance increases
B) Bias decreases and variance increases
C) Bias decreases and variance decreases
D) Bias increases and variance decreases
Answer: D

40) Suppose you find that your linear regression model is underfitting the data. In such a situation, which of the following options would you consider?
1. I will add more variables.
2. I will start introducing polynomial degree variables.
3. I will remove some variables.
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
Answer: A

41) Problem: Players will play if the weather is sunny. Is this statement correct?
Image: weather data.jpg
A) TRUE
B) FALSE
Answer: A

42) The Multinomial Naïve Bayes classifier is based on a ___________ distribution.
A) Continuous
B) Discrete
C) Binary
Answer: B

43) For the given weather data, calculate the probability of not playing.
Image: weather data.jpg
A) 0.4
B) 0.64
C) 0.36
D) 0.5
Answer: C

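The weather data image is not reproduced in this document. The sketch below therefore ASSUMES the common 14-day textbook table of Outlook against Play (9 Yes, 5 No), which is consistent with the 0.36 key. Under that assumption, it computes P(No) for this question. It also uses Bayes' rule to compute P(Yes | Sunny) for the earlier "play if sunny" question.

```python
from fractions import Fraction

# ASSUMED counts (the original weather data image is not available):
# the common 14-day textbook table of Outlook vs. Play.
counts = {
    "Sunny":    {"Yes": 3, "No": 2},
    "Overcast": {"Yes": 4, "No": 0},
    "Rainy":    {"Yes": 2, "No": 3},
}

total = sum(c["Yes"] + c["No"] for c in counts.values())  # number of days
n_yes = sum(c["Yes"] for c in counts.values())
n_no = sum(c["No"] for c in counts.values())

# Prior probability of not playing.
p_no = Fraction(n_no, total)

# Bayes' rule: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_sunny_given_yes = Fraction(counts["Sunny"]["Yes"], n_yes)
p_yes = Fraction(n_yes, total)
p_sunny = Fraction(counts["Sunny"]["Yes"] + counts["Sunny"]["No"], total)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny

print(float(p_no))               # P(not playing)
print(float(p_yes_given_sunny))  # P(play | sunny)
```

Under these assumed counts, P(Yes | Sunny) exceeds P(No | Sunny). That is why the "play if sunny" statement is keyed as TRUE.
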
44) Suppose you have trained an SVM with a linear decision boundary. After training the SVM, you correctly infer that your SVM model is underfitting. Which of the following options would you be more likely to consider when iterating on the SVM next time?
A) You want to increase your data points
B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features
Answer: C
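"Calculating more variables" means engineering new features that let a linear boundary fit the data. The sketch below uses a Pegasos-style stochastic subgradient solver for the primal linear SVM as a stand-in for a library SVM (the solver choice, regularization strength `lam`, epochs and ring-shaped data are all assumptions). Adding one engineered variable, the squared distance from the origin, turns a problem a line cannot separate into one it can.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D data: class +1 inside a circle of radius 1, class -1 in a ring outside it.
n = 200
inner = np.arange(n) < n // 2
angles = rng.uniform(0, 2 * np.pi, n)
radii = np.where(inner, rng.uniform(0, 1, n), rng.uniform(1.5, 2.5, n))
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.where(inner, 1.0, -1.0)

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss."""
    r = np.random.default_rng(seed)
    Xb = np.column_stack([X, np.ones(len(X))])  # constant column acts as the bias term
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in r.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (Xb[i] @ w) < 1:  # margin violated: hinge subgradient is active
                w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
            else:
                w = (1 - eta * lam) * w
    return w

def accuracy(X, y, w):
    """Fraction of points whose predicted sign matches the label."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return float(np.mean(np.sign(Xb @ w) == y))

acc_raw = accuracy(X, y, train_linear_svm(X, y))
# Add one engineered variable, the squared distance from the origin, and retrain.
X_aug = np.column_stack([X, (X ** 2).sum(axis=1)])
acc_aug = accuracy(X_aug, y, train_linear_svm(X_aug, y))
print(acc_raw, acc_aug)
```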
45) The minimum time complexity for training an SVM is O(n²). According to this fact, what sizes of datasets are not best suited for SVMs?
A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter
Answer: A
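Quadratic growth is easy to see from the kernel (Gram) matrix, which has one entry per pair of training points. This back-of-the-envelope sketch assumes a solver that materializes the full dense matrix in float64; real solvers often cache only part of it, so treat the figures as an upper-bound illustration of the n² scaling.

```python
def kernel_matrix_gib(n_samples, bytes_per_entry=8):
    """Memory for a dense n x n kernel matrix of float64 values, in GiB."""
    return n_samples ** 2 * bytes_per_entry / 2 ** 30

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} samples -> {kernel_matrix_gib(n):10.3f} GiB")
```

Multiplying the number of samples by 10 multiplies the storage by 100.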
46) The effectiveness of an SVM depends upon:
A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above
Answer: D
47) What do you mean by generalization error in terms of the SVM?
A) How far the hyperplane is from the support vectors
B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM
Answer: B
48) What do you mean by a hard margin?
A) The SVM allows very low error in classification
B) The SVM allows high amount of error in classification
C) None of the above
Answer: A
49) We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization?
1. We do feature normalization so that no feature dominates the others.
2. Sometimes, feature normalization is not feasible in the case of categorical variables.
3. Feature normalization always helps when we use the Gaussian kernel in SVM.
A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3
Answer: B
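The reason for normalizing before a Gaussian kernel is that the kernel depends on the squared Euclidean distance, so a feature measured in large units swamps the others. A sketch with hypothetical (age, income) rows and an assumed gamma of 0.5, comparing the kernel value before and after z-score standardization.

```python
import math

def rbf(u, v, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

# Hypothetical rows: (age in years, income in dollars).
rows = [[25, 40_000], [60, 41_000], [30, 90_000], [45, 65_000]]
print(rbf(rows[0], rows[1]))  # raw: the $1,000 income gap swamps the 35-year age gap
z = standardize(rows)
print(rbf(z[0], z[1]))        # standardized: both features contribute to the distance
```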
50) Support vectors are the data points that lie closest to the decision surface.
A) TRUE
B) FALSE
Answer: A
51) Which of the following is not supervised learning?
A) PCA
B) Decision Tree
C) Naive Bayesian
D) Linear regression
Answer: A
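PCA is unsupervised because it is computed from the feature matrix alone; no target labels enter the calculation. A minimal NumPy sketch via the singular value decomposition of the centred data (the correlated 3-D data is hypothetical).

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the centred data matrix. Only X is used: no labels."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                 # principal directions, one per row
    explained_var = s[:n_components] ** 2 / (len(X) - 1)
    return Xc @ components.T, components, explained_var

rng = np.random.default_rng(0)
# Hypothetical correlated 3-D data.
mixing = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 0.2]])
X = rng.normal(size=(100, 3)) @ mixing
scores, components, explained_var = pca(X, 2)
print(explained_var)
```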
52) Suppose you are using an RBF kernel in SVM with a high Gamma value. What does this signify?
A) The model would consider even far away points from the hyperplane for modeling
B) The model would consider only the points close to the hyperplane for modeling
C) The model would not be affected by the distance of points from the hyperplane for modeling
D) None of the above
Answer: B
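Gamma controls how fast the RBF kernel value K = exp(-gamma * d²) decays with distance d, and an SVM weights each support vector's influence on a prediction by that kernel value. A sketch comparing an assumed low gamma (0.1) with an assumed high gamma (10): with the high value, points even a short distance away contribute almost nothing, so only nearby points shape the boundary.

```python
import math

def rbf(distance, gamma):
    """Gaussian (RBF) kernel value for two points a given Euclidean distance apart."""
    return math.exp(-gamma * distance ** 2)

for gamma in (0.1, 10.0):
    # Kernel weight of a point at distance d, relative to 1.0 at distance 0.
    print(gamma, [round(rbf(d, gamma), 4) for d in (0.0, 0.5, 1.0, 2.0)])
```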
53) The Gaussian Naïve Bayes classifier is used for a ___________ distribution.
A) Continuous
B) Discrete
C) Binary
Answer: A
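Gaussian Naïve Bayes models each continuous feature, within each class, with a normal density whose mean and standard deviation are estimated from the training data. A sketch with a hypothetical temperature feature and assumed equal class priors.

```python
import math
import statistics

def gaussian_pdf(x, mean, std):
    """Normal density used by Gaussian Naive Bayes for a continuous feature."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# Hypothetical continuous feature (temperature) observed for each class.
temps = {"Yes": [21.0, 23.5, 22.0, 24.0], "No": [30.0, 31.5, 29.0, 32.0]}
priors = {"Yes": 0.5, "No": 0.5}  # assumed equal class priors

def classify(x):
    """Pick the class with the largest prior * Gaussian likelihood."""
    scores = {}
    for cls, values in temps.items():
        mean, std = statistics.fmean(values), statistics.stdev(values)
        scores[cls] = priors[cls] * gaussian_pdf(x, mean, std)
    return max(scores, key=scores.get)

print(classify(22.5), classify(30.5))
```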
54) If I am using all features of my dataset and I achieve 100% accuracy on my training set, but ~70% on the validation set, what should I look out for?
A) Underfitting
B) Nothing, the model is perfect
C) Overfitting
Answer: C
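A 1-nearest-neighbour classifier shows the overfitting pattern in its purest form: it memorizes the training set, so training accuracy is 100%, while validation accuracy is much lower when the labels are noisy. The 1-D data and the 30% label-flip rate below are hypothetical.

```python
import random

random.seed(3)

def make_data(n):
    """Hypothetical 1-D data: label is 1 when x > 0.5, with 30% of labels flipped as noise."""
    data = []
    for _ in range(n):
        x = random.random()
        label = int(x > 0.5)
        if random.random() < 0.3:
            label = 1 - label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, data):
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)

train, valid = make_data(200), make_data(200)
train_acc = accuracy(train, train)  # each training point is its own nearest neighbour
valid_acc = accuracy(train, valid)
print(f"train {train_acc:.0%}, validation {valid_acc:.0%}")
```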
55) What is the purpose of performing cross-validation?
A) To assess the predictive performance of the models
B) To judge how the trained model performs outside the sample on test data
C) Both A and B
Answer: C
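In k-fold cross-validation every sample is held out exactly once, so the averaged score estimates performance on data each model was not trained on. A minimal sketch of the index splitting in plain Python; the shuffle seed and the rule of giving earlier folds one extra sample when n is not divisible by k are assumptions.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over shuffled indices."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Earlier folds get one extra sample when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, test_idx
        start += size

for train_idx, test_idx in k_fold_indices(10, 3):
    print(len(train_idx), len(test_idx))
```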
56) Which of the following is true about Naive Bayes?
A) Assumes that all the features in a dataset are equally important
B) Assumes that all the features in a dataset are independent
C) Both A and B
D) None of the above options
Answer: C
57) Suppose you are using a linear SVM classifier on a two-class classification problem. You have been given the following data, in which some points, circled in red, represent the support vectors. If you remove any one of the red points from the data, will the decision boundary change?
Image: svm.jpg
A) Yes
B) No
Answer: A
58) Linear SVMs have no hyperparameters that need to be set by cross-validation.
A) TRUE
B) FALSE
Answer: B
59) For the given weather data, what is the probability that players will play if the weather is sunny?
Image: weather data.jpg
A) 0.5
B) 0.26
C) 0.73
D) 0.6
Answer: D
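The weather-data questions reduce to counting rows and applying Bayes' rule, P(Yes | Sunny) = P(Sunny | Yes) · P(Yes) / P(Sunny). The image is not reproduced in this text, so the records below are hypothetical and will not reproduce the keyed answers; substitute the counts from weather data.jpg to check them.

```python
from collections import Counter

# Hypothetical (outlook, play) records; these are NOT the counts in weather data.jpg.
records = [("Sunny", "Yes"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
           ("Overcast", "Yes"), ("Rainy", "Yes"), ("Sunny", "No"), ("Overcast", "Yes")]

def prior(records, play="Yes"):
    """P(play) estimated as a relative frequency, e.g. P(Yes) or P(No)."""
    return sum(p == play for _, p in records) / len(records)

def posterior(records, outlook, play="Yes"):
    """P(play | outlook) via Bayes' rule: P(outlook | play) * P(play) / P(outlook)."""
    n = len(records)
    plays = Counter(p for _, p in records)
    outlooks = Counter(o for o, _ in records)
    joint = Counter(records)
    p_outlook_given_play = joint[(outlook, play)] / plays[play]  # likelihood
    p_outlook = outlooks[outlook] / n                            # evidence
    return p_outlook_given_play * prior(records, play) / p_outlook

print(prior(records, "No"), posterior(records, "Sunny"))
```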
60) 100 people are at a party. The given data shows how many wear pink or not, and whether each is a man or not. Imagine a pink-wearing guest leaves. What is the probability that the guest is a man?
Image: man.jpg
A) 0.4
B) 0.2
C) 0.6
D) 0.45
Answer: B
61) Problem: Players will play if the weather is sunny. Is this statement correct?
Image: weather data.jpg
A) TRUE
B) FALSE
Answer: A
62) For the given weather data, calculate the probability of playing.
Image: weather data.jpg
A) 0.4
B) 0.64
C) 0.29
D) 0.75
Answer: B
63) For the given weather data, calculate the probability of not playing.
Image: weather data.jpg
A) 0.4
B) 0.64
C) 0.36
D) 0.5
Answer: C
64) For the given weather data, what is the probability that players will play if the weather is sunny?
Image: weather data.jpg
A) 0.5
B) 0.26
C) 0.73
D) 0.6
Answer: D
65) 100 people are at a party. The given data shows how many wear pink or not, and whether each is a man or not. Imagine a pink-wearing guest leaves. What is the probability that the guest is a man?
Image: man.jpg
A) 0.4
B) 0.2
C) 0.6
D) 0.45
Answer: B
66) 100 people are at a party. The given data shows how many wear pink or not, and whether each is a man or not. Imagine a pink-wearing guest leaves. Was it a man?
Image: man.jpg
A) TRUE
B) FALSE
Answer: A
67) What do you mean by generalization error in terms of the SVM?
A) How far the hyperplane is from the support vectors
B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM
Answer: B
68) What do you mean by a hard margin?
A) The SVM allows very low error in classification
B) The SVM allows high amount of error in classification
C) None of the above
Answer: A
69) The minimum time complexity for training an SVM is O(n²). According to this fact, what sizes of datasets are not best suited for SVMs?
A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter
Answer: A
70) The effectiveness of an SVM depends upon:
A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above
Answer: D
71) Support vectors are the data points that lie closest to the decision surface.
A) TRUE
B) FALSE
Answer: A
72) SVMs are less effective when:
A) The data is linearly separable
B) The data is clean and ready to use
C) The data is noisy and contains overlapping points
Answer: C
73) Suppose you are using an RBF kernel in SVM with a high Gamma value. What does this signify?
A) The model would consider even far away points
B) The model would consider only the
C) The model would not
D) None