Data Science MCQs
Time: 30 Minutes Marks: 30
* Indicates required question
Which of the following is a method for dealing with high-cardinality categorical variables? *
A) One-hot encoding
B) Min-Max Scaling
C) Frequency encoding
D) Imputation
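Illustration: a minimal sketch of frequency encoding in plain Python, where each category is replaced by how often it occurs, so the column stays a single numeric feature however many distinct categories it has. The function name `frequency_encode` and the sample city list are made up for this example.

```python
from collections import Counter

def frequency_encode(values):
    """Replace each category with its relative frequency in `values`."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]

# Hypothetical sample column; a real high-cardinality column would have far more categories.
cities = ["Pune", "Nagpur", "Pune", "Delhi", "Pune", "Nagpur"]
encoded = frequency_encode(cities)
print(encoded)
```

Note that two categories with the same frequency receive the same code, which is a known trade-off of this encoding.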
What is the purpose of standardization in data preprocessing? *
A) To scale data to a range of 0 to 1
B) To remove outliers from the dataset
C) To ensure the data has a mean of 0 and a standard deviation of 1
D) To handle missing values
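Illustration: a minimal sketch of standardization (z-score scaling) using only the Python standard library. The function name `standardize` and the sample values are assumptions for this example, and the population standard deviation is used as the divisor.

```python
from statistics import mean, pstdev

def standardize(xs):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(xs)
    sigma = pstdev(xs)  # would be 0 for a constant feature; not handled in this sketch
    return [(x - mu) / sigma for x in xs]

# Hypothetical feature values.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights)
print(mean(z), pstdev(z))
```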
How does feature scaling help in machine learning models like k-NN or SVM? *
A) It reduces the dataset size
B) It prevents overfitting
C) It ensures that features contribute equally to distance calculations
D) It improves model interpretability
Which of the following is NOT a common technique to handle class imbalance in a dataset? *
A) Oversampling
B) Undersampling
C) One-hot encoding
D) Synthetic data generation (SMOTE)
Why is it important to shuffle data before splitting it into training and testing sets? *
A) To remove outliers
B) To avoid any bias due to the order of the data
C) To improve model accuracy
D) To reduce dimensionality
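Illustration: a minimal sketch of shuffling before a train/test split, assuming the rows arrive sorted by class. The helper `shuffled_split`, its default `test_fraction`, and the seed are hypothetical choices for this example.

```python
import random

def shuffled_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of `rows`, then cut it into (train, test)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Labels sorted by class: taking the first rows as a test set without
# shuffling would give a test set containing only class 0.
labels = [0] * 6 + [1] * 6
train, test = shuffled_split(labels)
print(len(train), len(test))
```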
What type of scaling method would you use for features that follow a normal distribution? *
A) Min-Max scaling
B) Z-score normalization
C) One-hot encoding
D) Log scaling
Which data preprocessing step is essential when dealing with categorical variables in a linear regression model? *
A) One-hot encoding
B) Min-Max Scaling
C) Log transformation
D) Imputation
Which technique is commonly used to handle missing data? *
A) One-hot encoding
B) Imputation
C) Dimensionality reduction
D) PCA
Which of the following can cause a machine learning model to perform poorly? *
A) Feature scaling
B) Feature engineering
C) Irrelevant or redundant features
D) Data splitting
Which of the following is NOT a data preprocessing step? *
A) Data normalization
B) Data augmentation
C) Model evaluation
D) Missing value imputation
Which of the following methods can be used to detect outliers in a dataset? *
A) Min-Max Scaling
B) Z-Score Method
C) One-hot encoding
D) Imputation
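Illustration: a minimal sketch of z-score based outlier detection in plain Python. The function name `zscore_outliers`, the cutoff of 3 standard deviations (a common convention, not a fixed rule), and the sample readings are assumptions for this example.

```python
from statistics import mean, pstdev

def zscore_outliers(xs, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(xs)
    sigma = pstdev(xs)
    return [x for x in xs if abs(x - mu) / sigma > threshold]

# Hypothetical sensor readings with one extreme value at the end.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 9, 95]
print(zscore_outliers(readings))
```

Because the mean and standard deviation are themselves pulled by extreme values, this check can miss outliers in very small samples.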
One-hot encoding is typically applied to which type of data? *
A) Numerical data
B) Ordinal data
C) Categorical data
D) Continuous data
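Illustration: a minimal sketch of one-hot encoding in plain Python, where each distinct value becomes its own 0/1 column. The function name `one_hot` and the colour values are made up for this example, and the columns are ordered alphabetically as an arbitrary choice.

```python
def one_hot(values):
    """Return (categories, rows), where each row has a single 1 in its category's column."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical column of colour names.
cats, rows = one_hot(["red", "green", "blue", "green"])
print(cats)
for row in rows:
    print(row)
```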
In which situation would you apply dimensionality reduction techniques like PCA? *
A) When the dataset contains missing values
B) When the dataset contains a large number of correlated features
C) When you want to remove outliers
D) When the dataset has no categorical variables
Which of the following is a method to reduce overfitting in decision trees? *
A) Feature scaling
B) Pruning
C) Z-Score normalization
D) One-hot encoding
Which of the following is used to deal with multicollinearity in regression problems? *
A) Standardization
B) L2 Regularization
C) One-hot encoding
D) Min-Max scaling
What is the result of applying Principal Component Analysis (PCA) on a dataset? *
A) Reduced number of features while retaining as much variance as possible
B) Elimination of duplicate rows in the dataset
C) Removal of outliers
D) Increase in the number of features
Which of the following is an example of feature extraction? *
A) Scaling numeric features
B) Transforming categorical data into numerical format
C) Using PCA to reduce feature dimensionality
D) Filling missing values in the dataset
Which of the following can be used to fill missing numerical values? *
A) Mean, median, or mode
B) One-hot encoding
C) PCA
D) Z-Score
What does it mean if a dataset is said to be “highly imbalanced”? *
A) The dataset contains a large number of features
B) One or more classes occur much more frequently than others
C) The dataset has many missing values
D) The dataset contains outliers
Which technique reduces the dimensionality of a dataset by creating new features based on the old ones? *
A) Feature scaling
B) Feature selection
C) Feature extraction
D) Data augmentation
What is the main reason for splitting a dataset into training and testing sets? *
A) To improve the performance of the model
B) To prevent overfitting
C) To assess the model’s generalization ability
D) To generate more data
Min-Max scaling, in its default form, rescales all values to lie between: *
A) 0 and 10
B) -1 and 1
C) 0 and 1
D) -10 and 10
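Illustration: a minimal sketch of Min-Max scaling in plain Python. The function name `min_max_scale` and its `new_min`/`new_max` parameters are assumptions for this example; the defaults give the usual target range, and other ranges are possible by changing them.

```python
def min_max_scale(xs, new_min=0.0, new_max=1.0):
    """Linearly map the smallest value to `new_min` and the largest to `new_max`."""
    lo, hi = min(xs), max(xs)
    span = hi - lo  # would be 0 for a constant feature; not handled in this sketch
    return [new_min + (x - lo) / span * (new_max - new_min) for x in xs]

# Hypothetical feature values.
scaled = min_max_scale([20, 30, 40, 60])
print(scaled)
```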
How does SMOTE (Synthetic Minority Over-sampling Technique) handle imbalanced datasets? *
A) By undersampling the majority class
B) By oversampling the majority class
C) By generating synthetic examples for the minority class
D) By removing outliers in the dataset
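Illustration: a simplified, SMOTE-like sketch in plain Python. It is not a full implementation of SMOTE; it only shows the core interpolation idea of placing a new point on the line segment between a sample and one of its nearest same-class neighbours. The function name `smote_like`, the value of `k`, the seed, and the sample points are all assumptions for this example.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create `n_new` synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of `base`, by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-D points belonging to the under-represented class.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (2.5, 2.5)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Since every new point lies between two existing points of the same class, the synthetic samples stay inside the region those points already cover.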
What is the primary goal of feature selection? *
A) To remove noise from the data
B) To select features that have the most impact on the target variable
C) To standardize features
D) To impute missing data
Which of the following is NOT a common strategy for dealing with missing data? *
A) Deleting rows with missing values
B) Filling missing values with zeros
C) Filling missing values using a machine learning model
D) Ignoring the missing values during training
What is the main purpose of data preprocessing in machine learning? *
A) To create new features
B) To improve the quality of the data
C) To discard irrelevant data
D) To balance the dataset
What is the primary function of the Box-Cox transformation? *
A) To reduce the number of features
B) To transform a skewed distribution so that it is closer to Gaussian
C) To handle missing values
D) To encode categorical variables
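Illustration: a minimal sketch of the Box-Cox transformation for a fixed, hand-picked parameter `lam`. For strictly positive y it computes (y^lam - 1) / lam when lam is non-zero and log(y) when lam is zero. The function name `box_cox` and the sample values are assumptions for this example; in practice the parameter is usually estimated from the data rather than chosen by hand.

```python
import math

def box_cox(ys, lam):
    """Apply the Box-Cox power transform with parameter `lam` to positive values."""
    if any(y <= 0 for y in ys):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(y) for y in ys]
    return [(y ** lam - 1) / lam for y in ys]

# Hypothetical right-skewed values.
skewed = [1.0, 2.0, 4.0, 8.0, 16.0]
as_log = box_cox(skewed, 0)     # lam = 0 reduces to the natural log
as_sqrt = box_cox(skewed, 0.5)  # lam = 0.5 is a shifted, scaled square root
print(as_log)
print(as_sqrt)
```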
When would you apply log transformation to a feature in a dataset? *
A) When the feature contains negative values
B) When the feature has a normal distribution
C) When the feature has a skewed distribution
D) When the feature is categorical
Which of the following is NOT a characteristic of robust scaling? *
A) It uses the median for centering the data
B) It scales the data based on percentiles
C) It is highly sensitive to outliers
D) It handles data with outliers better than Min-Max scaling
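Illustration: a minimal sketch of robust scaling using only the Python standard library, centering on the median and dividing by the interquartile range. The function name `robust_scale`, the sample values, and the choice of the "inclusive" quartile method are assumptions for this example.

```python
from statistics import median, quantiles

def robust_scale(xs):
    """Center on the median and scale by the interquartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    med = median(xs)
    iqr = q3 - q1  # would be 0 if the middle half of the data is constant; not handled here
    return [(x - med) / iqr for x in xs]

# Hypothetical feature with one extreme value.
values = [1, 2, 3, 4, 100]
scaled = robust_scale(values)
print(scaled)
```

The extreme value barely affects the median and quartiles, so the remaining values keep a sensible spread after scaling.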
What is the main purpose of data normalization? *
A) To reduce the number of features
B) To encode categorical variables
C) To scale numeric features to a common range
D) To remove missing values
This form was created inside of Indian Institute of Information Technology, Nagpur.