BigData ML

The document contains multiple-choice questions on machine learning concepts such as clustering, big data characteristics, Apache Spark features, dimensionality reduction, regression, classification, and data preprocessing techniques. Key questions cover the definition of clustering, which option is not a characteristic of big data, which statement about Apache Spark is false, whether dimensionality reduction is a clustering, feature extraction, classification or regression problem, and which kind of regression finds a relationship between independent variables and a continuous dependent variable.

Uploaded by

Mo Farhan

5.
The procedure of organizing the items of a given collection into groups based on similar
features is called ________
(1 Point)

Decision Trees

Regression

Association

Clustering
6.
The terms used in machine learning with big data are: (i) Pattern recognition (ii) Data
mining (iii) Data slang (iv) Predictive analytics
(1 Point)

(iii) is wrong.

(ii) is correct.

Only (i) and (iv) are correct.

All are correct.

7.
Which of these is not a characteristic of Big Data?
(1 Point)

Veracity

Volume

Integrity

Variety
8.
Which of the following is false for Apache Spark?
(1 Point)

Enables powerful interactive and data analytics applications across live streaming data

It is the kernel of Spark

It enables users to run SQL/HQL queries on top of Spark.

Provides an execution platform for all Spark applications

9.
Dimensionality Reduction is a
(1 Point)

Clustering Problem

Feature Extraction Problem

Classification Problem

Regression Problem
10.
The process of constructing a mathematical model that can be used to predict one
variable from another variable is called
(1 Point)

Correlation

Outlier

Regression

Cluster Analysis
11.
How is a KNN model used for classification?
(1 Point)

All the neighbours that are ‘K’ distance apart from the new sample point determine the label
for the new sample.

The class labels of ‘K’ neighbouring samples determine the label for the new sample.

All the training samples within a circle of ‘K’ radius determine the label for the new sample.
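
For readers reviewing this question, here is a minimal sketch of how a k-nearest-neighbours classifier is commonly implemented: the new sample takes the majority label among its k closest training samples. The function name `knn_predict`, the Euclidean distance choice, and the toy data are assumptions made for illustration and are not part of the quiz.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # Collect the class labels of the k closest samples.
    nearest_labels = [train_labels[i] for i in order[:k]]
    # The most common label among those neighbours becomes the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: three samples of class "a", two of class "b".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["a", "a", "a", "b", "b"]
knn_result = knn_predict(points, labels, (1.1, 1.0), k=3)
```

Here K counts neighbours; it is not a distance or a radius.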
12.
Choose the false statement.
(1 Point)

Association analysis defines what an item set is.

One can uncover unexpected and useful relationships with association analysis

Association rules are not used to determine when items or events occur together

The goal is to come up with a set of rules to capture associations between items or events
13.
What are the two parts of the data understanding phase of CRISP-DM?
(1 Point)

Data exploration and data preprocessing

Data acquisition and data exploration

Data acquisition and data preprocessing

Data preprocessing and data modelling

14.
The main steps in the k-means clustering algorithm are
(1 Point)

Calculate the centroids, then determine the appropriate stopping criterion depending on the
number of centroids.

Assign each sample to the closest centroid, then calculate the new centroid.

Calculate the distances between the cluster centroids, then find the two closest centroids.

Count the number of samples, then determine the initial centroids.
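
As a study aid, the following is a minimal sketch of the k-means loop as it is usually described: assign each sample to its nearest centroid, recompute each centroid as the mean of its assigned samples, and repeat until nothing changes. The helper name `kmeans_step`, the starting centroids, the iteration cap, and the handling of empty clusters are all assumptions made for this illustration.

```python
import math


def kmeans_step(samples, centroids):
    """One k-means iteration: assign samples, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for s in samples:
        # Assignment step: find the index of the closest centroid.
        nearest = min(range(len(centroids)),
                      key=lambda j: math.dist(s, centroids[j]))
        clusters[nearest].append(s)
    new_centroids = []
    for j, members in enumerate(clusters):
        if members:
            # Update step: per-coordinate mean of the assigned samples.
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(centroids[j])
    return new_centroids


# Hypothetical toy data with two well-separated groups.
samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
for _ in range(10):  # iteration cap used as a simple safeguard
    updated = kmeans_step(samples, centroids)
    if updated == centroids:  # stop once the centroids no longer move
        break
    centroids = updated
```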

15.
__________ is the process of predicting numeric values rather than categories.
(1 Point)

Regression

Classification

Prediction

Analysis
16.
Misclassification rate is another name for:
(1 Point)

Classification Rate

Training Rate

Error Rate

Testing Rate
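
To make the term concrete, here is a small sketch that computes the fraction of misclassified samples, which equals one minus the accuracy. The function name `misclassification_rate` and the example labels are hypothetical.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(1 for truth, pred in zip(y_true, y_pred) if truth != pred)
    return wrong / len(y_true)


# Hypothetical labels: one of the four predictions is wrong.
y_true = ["a", "b", "a", "b"]
y_pred = ["a", "a", "a", "b"]
rate = misclassification_rate(y_true, y_pred)
accuracy = 1.0 - rate
```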
17.
What is the purpose of exploring data?
(1 Point)

To digitize your data.

To gather your data into one repository

To gain a better understanding of your data.

To generate labels for your data

18.
Which of the following is NOT a way to handle missing values?
(1 Point)

Drop samples with missing values

Replace missing values with the median value

Replace a missing value with an outlier

Replace missing values with the most probable value.
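
Two of the commonly used strategies named in the options, dropping incomplete samples and filling gaps with the median, can be sketched as follows. The sketch assumes missing values are represented as `None`; the function names and the toy data are hypothetical.

```python
from statistics import median


def drop_missing(rows):
    """Keep only the rows (tuples) that contain no missing value."""
    return [row for row in rows if None not in row]


def impute_median(values):
    """Replace each missing entry with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [22, None, 35, 41, None, 28]
imputed = impute_median(ages)

# Hypothetical rows of (age, city); the second row is incomplete.
rows = [(22, "Pune"), (None, "Delhi"), (35, "Mumbai")]
complete_rows = drop_missing(rows)
```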

19.
Which of the following is a measure used in decision trees when selecting the splitting
criterion that partitions the data in the best possible manner?
(1 Point)

Gini Index

Association

Probability

Regression
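
As background for this question, the Gini index is one impurity measure that decision tree learners commonly use to score candidate splits: lower weighted impurity means purer child nodes. The sketch below uses hypothetical function names and toy labels.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a binary split, weighted by the size of each child node."""
    total = len(left) + len(right)
    return (len(left) / total) * gini_impurity(left) \
        + (len(right) / total) * gini_impurity(right)


# A split that separates the classes perfectly versus one that does not.
pure_split = weighted_gini(["yes", "yes"], ["no", "no"])
mixed_split = weighted_gini(["yes", "no"], ["yes", "no"])
```

Under this measure, a tree learner would prefer the first split, since its weighted impurity is lower.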
20.
What is involved in data wrangling?
(1 Point)

Removing the outliers

Removing noise from the data

Feature selection and Feature transformation

Cleaning the data

21.
Which of the following is NOT an example of regression?
(1 Point)

Estimating the amount of rain

Predicting the price of a stock

Determining whether power usage will rise or fall

Predicting the demand for a product

22.
Which of the following is not affected by the curse of dimensionality?
(1 Point)

KNN

Correlation

Decision Tree

Naïve Bayes
23.
Choose the correct statement
(1 Point)

Bar plots never use aggregation - not sure

Bar plots are drawn for numeric variables

Histograms and bar plots are used for categorical and numeric data respectively.

Bar plots always use a numeric binner

24.
A model that overfits will not _______ well to new data.
(1 Point)

Regularise

Generalize

Justify

Optimize
25.
In linear regression, the least squares method is used to
(1 Point)

Determine the regression line that best fits the samples

Determine whether the target is categorical or numerical

Determine how to partition the data into training and test sets.

Determine the distance between two pairs of samples.
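
For context, ordinary least squares with a single feature picks the slope and intercept that minimize the sum of squared vertical distances between the samples and the line. The closed-form sketch below uses a hypothetical function name and toy data that lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples generated from y = 2x + 1 with no noise.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```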

26.
Sentiment analysis is an example of:
(1 Point)

Regression, Classification and Clustering

Regression Only

Regression, Classification, Clustering and Reinforcement Learning

Regression, Classification and Reinforcement

27.
Which of these statements is true about samples and variables?
(1 Point)

All

A sample is an instance or example of an entity.

A variable describes a special characteristic of an entity in your data.

A sample can have many variables to describe it.

28.
Merging duplicate records while retaining relevant data is an example that illustrates the
use of _________ knowledge to address a data quality issue.
(1 Point)

data

none of these

feature

domain
29.
What is Dimensionality reduction?
(1 Point)

Dimensionality reduction is scaling variable values to a smaller range

Dimensionality reduction is analysing data in a high-dimensional space

Generation of synthetic data from original data

Dimensionality reduction is finding a smaller subset of features that can effectively capture
the characteristics of the input data
30.
Cluster results can be used to
(1 Point)

Segment the data into groups so that each group can be analyzed further

Create labeled samples for a classification task

All of these choices are valid uses of the resulting clusters.

Determine anomalous samples

Classify new samples

31.
Which categories of machine learning algorithms are supervised?
(1 Point)

Classification and clustering

Regression and clustering

Regression and association analysis

Classification and regression

32.
Which of the following is not a type of clustering algorithm?
(1 Point)

Centroid clustering

K-Means clustering

Density clustering

Simple clustering
33.
Which is not a way to accomplish pre-pruning in decision trees?
(1 Point)

Stop if number of records < some threshold

Stop if improvement in impurity measure < some threshold

None

Stop when the tree is grown to its maximum size

34.
________ regression finds a relationship between one or more features (independent
variables) and a continuous variable (dependent variable).
(1 Point)

None of These

Non-linear

Linear

Both of these - not sure

Doubts - 8, 23, 26, 32, 34. Please write the question and answer no.
PLEASE CONFIRM REMAINING 4

-> Choose the correct statement - Bar plots never use aggregation - I chose this since
bar plots are for categorical data and histograms are for numeric data

-> regression finds a relationship between one or more features (independent
variables) and a continuous variable (dependent variable). - Both of these

-> Which of the following is not a type of clustering algorithm? - Simple Clustering - SURE?

-> Which of the following is false for Apache Spark? - Enables powerful interactive and
data analytics application across live streaming data

-> Sentiment Analysis is an example of: - Regression, Classification and Reinforcement

submit??

Submitted - shameek ++
Thanks everyone
Submitted - Devesh
