Data Prep Quiz for Analysts

This document is a quiz on data preparation techniques. It covers topics like data quality issues, imputing missing data, outliers, feature selection, and principal component analysis (PCA). The questions ask about identifying data quality issues, replacing missing values, defining outliers, using domain knowledge to address issues, feature selection methods, the goal of feature sets, properties of zero-normalized data, and statements that are not true about PCA.

Uploaded by Mr. Padmanaban V

Quiz 4 - Data Preparation

1. Which of the following is NOT a data quality issue?

• Inconsistent data
• Scaled data
• Missing values
• Duplicate data

2. Imputing missing data means to

• replace missing values with something reasonable.
• drop samples with missing values.
• replace missing values with outliers.
• merge samples with missing values.
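
As a minimal sketch of what imputation can look like in practice, the snippet below fills missing entries with the mean of the observed values. The helper name `impute_with_mean`, the use of `None` to mark a missing value, and the sample data are assumptions made for illustration; mean imputation is only one of several reasonable choices.

```python
from statistics import mean


def impute_with_mean(values):
    """Return a copy of values with None entries replaced by the observed mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up sample data; None marks a missing value (an assumption of this sketch).
ages = [34, None, 29, 41, None]
imputed_ages = impute_with_mean(ages)
print(imputed_ages)
```

The number of samples is unchanged: no rows are dropped or merged, only the gaps are filled.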

3. A data sample with values that are considerably different from the rest of the data samples in the dataset is called a/an _____________.

• Outlier
• Invalid data
• Noise
• Inconsistent data

4. Which one of the following examples illustrates the use of domain knowledge to address a data quality issue?

• Simply discard the samples that lie significantly outside the distribution of your data
• Drop samples with missing values
• Merge duplicate records while retaining relevant data
• None of these

5. Which of the following is NOT an example of feature selection?

• Adding an in-state feature based on an applicant's home state.
• Re-formatting an address field into separate street address, city, state, and zip code fields.
• Removing a feature with a lot of missing values.
• Replacing a missing value with the variable mean.

6. Which one of the following is the best feature set for your analysis?

• Feature set with the smallest set of features that best capture the characteristics of the data for the intended application
• Feature set with the smallest number of features
• Feature set with the largest number of features
• Feature set that contains exclusively re-coded features

7. The mean value and the standard deviation of a zero-normalized feature are

• mean = 0 and standard deviation = 0
• mean = 1 and standard deviation = 0
• mean = 0 and standard deviation = 1
• mean = 1 and standard deviation = 1
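
One way to check the properties of a zero-normalized feature is to compute them directly. The sketch below assumes that "zero-normalized" means z-score standardization, that is, subtracting the mean and dividing by the population standard deviation. The function name `zero_normalize` and the sample heights are made up for illustration.

```python
from statistics import mean, pstdev


def zero_normalize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot normalize a constant feature")
    return [(v - mu) / sigma for v in values]


# Made-up sample data for illustration.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = zero_normalize(heights_cm)

# Print the mean and standard deviation of the normalized feature.
print(round(mean(z), 6), round(pstdev(z), 6))
```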

8. Which of the following is NOT true about PCA?

• PCA stands for principal component analysis
• PC1 and PC2, the first and second principal components, respectively, are always orthogonal to each other.
• PC1, the first principal component, captures the largest amount of variance in the data along a single dimension.
• PCA is a dimensionality reduction technique that removes a feature that is very correlated with another feature.
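
The statements about PC1 and PC2 can be explored with a small two-dimensional example. The sketch below is a simplified illustration, not a general PCA implementation: it handles only two features and uses the closed-form rotation angle of the 2x2 covariance matrix. The function name `pca_2d` and the sample points are assumptions. Note that the components it returns are new axes built from both original features, rather than a subset of the original features.

```python
import math


def pca_2d(points):
    """Return (pc1, pc2, var1, var2) for a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n

    # Entries of the 2x2 population covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in points) / n
    c = sum((y - my) ** 2 for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / n

    # Angle of the direction that maximizes the projected variance.
    theta = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))

    def variance_along(d):
        # Variance of the data projected onto unit direction d.
        return a * d[0] ** 2 + 2 * b * d[0] * d[1] + c * d[1] ** 2

    return pc1, pc2, variance_along(pc1), variance_along(pc2)


# Made-up, strongly correlated sample data.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
pc1, pc2, var1, var2 = pca_2d(points)

dot = pc1[0] * pc2[0] + pc1[1] * pc2[1]
print("PC1 . PC2 =", dot)
print("variance along PC1 and PC2:", var1, var2)
```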
