IE425 Spring25 Quiz1

The document is a quiz for the IE 425 Data Mining course, consisting of various types of questions including true/false, short answer, and multiple choice, focused on data mining concepts. It covers topics such as structured vs unstructured data, feature selection, regression models, and the implications of feature scaling. Students are instructed to provide reasoning for their answers and to manage their time effectively during the 40-minute quiz.

Uploaded by ahmetyes123
Copyright © All Rights Reserved

18.03.2025

IE 425 Data Mining


Quiz 1, Spring 2025
Duration: 40 minutes
Instructions:
• Answer all questions within the given time.
• Show your reasoning where required. Partial credit may be given for incomplete but well-reasoned answers.
• For multiple-choice questions, select the best answer.
• For true/false questions, justify your answer with a brief explanation.
• Clearly label your answers.
Good luck!

Questions (10 pts each)


1. True/False: Structured data always consists of numerical values, while unstructured data consists
only of text or images.

2. A dataset contains features with highly different variances. You apply covariance-based feature
selection and observe that some features with lower variance are not selected, even though they are
relevant.
• What could be the reason for this outcome?
• How would you modify the data preprocessing step to ensure fair feature selection?
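A minimal sketch of the scale effect this question probes, assuming that "covariance-based feature selection" means ranking features by |Cov(feature, target)|. The two features and the target are synthetic and hypothetical: both features carry exactly the same signal and differ only in scale, so any difference in their scores comes from variance alone.

```python
import random
import statistics


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def standardize(x):
    """Rescale to zero mean and unit sample standard deviation (z-scores)."""
    m, s = statistics.fmean(x), statistics.stdev(x)
    return [(v - m) / s for v in x]


random.seed(0)
n = 500
signal = [random.gauss(0.0, 1.0) for _ in range(n)]
target = [s + random.gauss(0.0, 0.5) for s in signal]

# Hypothetical features: identical information, very different variances.
features = {
    "low_var": [0.01 * s for s in signal],
    "high_var": [100.0 * s for s in signal],
}

raw_scores = {name: covariance(x, target) for name, x in features.items()}
scaled_scores = {name: covariance(standardize(x), target) for name, x in features.items()}

print(raw_scores)     # high_var scores 10,000 times higher than low_var
print(scaled_scores)  # the two scores match once the features are standardized
```

Because covariance is linear in each argument, multiplying a feature by c multiplies its score by c; standardizing first removes that dependence on units.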

3. True/False: Lasso (L1-penalized regression) can completely remove some features from a model by
setting their coefficients to zero.
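One way to see how an L1 penalty can yield exact zeros: in the coordinate-descent view of Lasso (with a standardized feature), each coefficient update applies the soft-thresholding operator S(ρ, λ) = sign(ρ)·max(|ρ| − λ, 0), where ρ is the feature's covariance with the current partial residual. The sketch below shows that operator next to a proportional shrink ρ / (1 + λ) for contrast; treating that second form as the ridge (L2) update assumes an orthonormal design and a particular scaling of λ.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """L1 (Lasso) coordinate update: shrink rho toward 0 by lam, clip to exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0  # |rho| <= lam: the coefficient is set to exactly zero


def proportional_shrink(rho: float, lam: float) -> float:
    """L2 (ridge-style) update for contrast: shrinks nonzero rho but never to exactly 0."""
    return rho / (1.0 + lam)


for rho in [-2.0, -0.3, 0.0, 0.4, 1.5]:
    print(rho, soft_threshold(rho, 0.5), proportional_shrink(rho, 0.5))
```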

4. Short Answer: Describe the geometric and probabilistic interpretations of data. Provide an
example where one interpretation is more useful than the other.
Hint: Think about how data can be represented in a coordinate space (geometric) versus how we
summarize data with distributions (probabilistic).
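A toy sketch contrasting the two views on hypothetical two-feature data. The geometric view answers "which stored item is closest to this query?" using Euclidean distance; the probabilistic view summarizes one feature as a distribution (assumed normal here, purely for illustration) and answers "how likely is a value above a threshold?".

```python
import math
import statistics

# Hypothetical observations: each row holds two measurements of one item.
points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (0.8, 1.9), (1.4, 2.4), (1.1, 1.7)]

# Geometric view: rows are points in a coordinate space; similarity is distance.
query = (1.25, 2.25)
nearest = min(points, key=lambda p: math.dist(p, query))

# Probabilistic view: the first feature is treated as samples from a distribution
# (assumed normal for this sketch) and summarized by its mean and standard deviation.
first_feature = [p[0] for p in points]
model = statistics.NormalDist.from_samples(first_feature)
p_above = 1.0 - model.cdf(1.6)  # estimated P(X1 > 1.6) under the normal assumption

print("nearest neighbour:", nearest)
print(f"mean={model.mean:.3f}, sd={model.stdev:.3f}, P(X1 > 1.6)={p_above:.3f}")
```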

5. You train a linear regression model on the Boston Housing dataset to predict house prices. You
then add random noise as extra features to the dataset.
• What impact will this have on the model coefficients?
• How can applying L1-penalized regression (Lasso) help mitigate this issue?
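A minimal sketch of this experiment on synthetic data rather than the Boston Housing dataset itself, so the numbers are illustrative only. It assumes centered columns, two informative features plus five pure-noise features, and fits both ordinary least squares (λ = 0) and Lasso with the same coordinate-descent loop, which minimizes (1/(2n))·‖y − Xb‖² + λ·‖b‖₁. The value λ = 0.5 is an arbitrary choice for this toy data, and all function names are hypothetical.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho toward 0 by lam; return exactly 0.0 when |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1; lam=0 gives least squares.

    Assumes the columns of X and y are centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = list(y)  # y - Xb for the current b (b starts at all zeros)
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that leaves feature j out.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
            b[j] = new_bj
    return b


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]


rng = random.Random(42)
n, n_noise = 200, 5
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
noise_cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_noise)]
target = [3.0 * u - 2.0 * v + rng.gauss(0, 1) for u, v in zip(x1, x2)]

# Columns 0-1 are informative; columns 2-6 are pure random noise.
cols = [center(c) for c in [x1, x2] + noise_cols]
X = [list(row) for row in zip(*cols)]
y = center(target)

ols = coordinate_descent(X, y, lam=0.0)
lasso = coordinate_descent(X, y, lam=0.5)  # lam chosen arbitrarily for this toy data

print("OLS noise coefficients:  ", [round(c, 3) for c in ols[2:]])
print("Lasso noise coefficients:", lasso[2:])
```

Least squares assigns small but nonzero weights to every noise column, while the soft-thresholding step sets them to exactly zero once their covariance with the residual falls below λ; the informative coefficients survive but are shrunk toward zero.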

6. You are working with a dataset where one feature (X1) has values in the range [0, 1], while another
feature (X2) has values in the range [1000, 5000].
• What issue might arise in models that rely on geometric interpretations?
• How can this issue be corrected?
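A small sketch of the issue with Euclidean distance, using two hypothetical candidate points and min-max scaling to the ranges stated in the question ([0, 1] for X1 and [1000, 5000] for X2). Z-score standardization would be an equally common correction; min-max is used here only because the ranges are given.

```python
import math

LOWS, HIGHS = (0.0, 1000.0), (1.0, 5000.0)  # feature ranges from the question


def min_max(point):
    """Map each coordinate to [0, 1] using the known feature ranges."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(point, LOWS, HIGHS))


query = (0.10, 3000.0)
candidates = {
    "a": (0.90, 3010.0),  # X1 very different, X2 almost identical
    "b": (0.12, 3400.0),  # X1 almost identical, X2 moderately different
}

raw_nearest = min(candidates, key=lambda k: math.dist(query, candidates[k]))
scaled_nearest = min(
    candidates, key=lambda k: math.dist(min_max(query), min_max(candidates[k]))
)

print("nearest on raw features:   ", raw_nearest)     # a (X2 dominates the distance)
print("nearest on scaled features:", scaled_nearest)  # b (X1 now counts as well)
```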

7. True/False: If a model has both high training error and high test error, it is likely overfitting.

8. Multiple Choice: Which of the following can be a consequence of an overly complex model?
a) High bias
b) Low variance
c) Poor generalization to new data
d) Reduced computational time
9. Multiple Choice: Which of the following is an example of an unsupervised learning task?
a) Predicting stock prices based on past trends
b) Grouping customers based on purchasing behavior
c) Classifying emails as spam or not spam
d) Diagnosing diseases using labeled medical data
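As an illustration of an unsupervised task, here is a minimal k-means (Lloyd's algorithm) sketch that groups hypothetical customers described by two already-scaled features (say, spending and visit frequency) without using any labels. The farthest-first initialization is an assumption made only to keep this toy run deterministic; k-means is one clustering method among many.

```python
import math


def kmeans(points, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next initial center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    def nearest_center(p):
        return min(range(k), key=lambda i: math.dist(p, centers[i]))

    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p)].append(p)
        # Update step: move each center to the mean of its points (keep it if empty).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, [nearest_center(p) for p in points]


# Hypothetical customers: (scaled annual spend, scaled visits per month), no labels.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = kmeans(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```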
10. Multiple Choice: Which of the following statements about feature scaling is correct?
a) It is only necessary when features have different units
b) It is important for distance-based models but has no effect on linear regression
c) It ensures all features contribute equally to model training
d) It removes correlation between features
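A quick check of how standardization interacts with plain (unpenalized) least squares, on hypothetical one-feature data: the fitted slope changes with the feature's scale, while the predictions do not. The helper `fit_line` is hypothetical, and the sketch says nothing about penalized models such as Lasso, whose fits do depend on feature scale.

```python
import statistics


def fit_line(x, y):
    """Least-squares slope and intercept for a single feature."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# Hypothetical data: one feature measured in the thousands.
x = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
y = [2.0, 2.9, 4.1, 5.0, 6.1]

mean_x, sd_x = statistics.fmean(x), statistics.stdev(x)
x_std = [(v - mean_x) / sd_x for v in x]  # z-scores

slope_raw, icpt_raw = fit_line(x, y)
slope_std, icpt_std = fit_line(x_std, y)

new_x = 3500.0
pred_raw = slope_raw * new_x + icpt_raw
pred_std = slope_std * (new_x - mean_x) / sd_x + icpt_std  # scale the new point the same way

print(slope_raw, slope_std)  # slopes differ by a factor of sd_x
print(pred_raw, pred_std)    # predictions agree up to floating-point rounding
```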
