
Data Preprocessing Questions

This document contains theory and problem-based questions related to data preprocessing, covering topics such as data cleaning, data quality, normalization techniques, and strategies for data transformation. It includes practical exercises on smoothing, normalization, and binning methods with specific datasets. The questions aim to deepen understanding of data preprocessing concepts and their application in real-world scenarios.

Uploaded by ssitavinya2022
Copyright © All Rights Reserved

Data Preprocessing - Unit 2 Chapter 3 Questions

Theory Questions

1. Differentiate between data cleaning, data integration, data reduction, and data transformation with suitable examples.

2. List and briefly explain the six key elements of data quality.

3. Explain the need for data preprocessing in real-world data mining applications.

4. Differentiate between dimensionality reduction and numerosity reduction.

5. List and describe the different methods for handling missing values during data cleaning.

6. Explain the concept of normalization. What are the commonly used normalization techniques?

7. Explain the steps involved in data integration. How does it help avoid redundancies and inconsistencies?

8. Describe the different strategies for data transformation with examples (e.g., smoothing, aggregation).

9. Explain the process of data discretization and concept hierarchy generation with examples.

10. Differentiate between supervised and unsupervised discretization, and between top-down and bottom-up approaches.

Problem-Based Questions

1. A dataset contains age values: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.

(a) Use smoothing by bin means with bin size 3.

(b) Comment on the effect of smoothing.
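A minimal Python sketch of Problem 1(a), assuming "bin size 3" means equal-frequency (equal-depth) bins of three values each. The helper name `smooth_by_bin_means` is hypothetical. The 27 ages are already sorted, so they split into nine bins, and every value is replaced by its bin's mean (rounded to two decimals here).

```python
# Age values from Problem 1 (already sorted in ascending order).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bin_means(values, depth):
    """Split sorted values into equal-frequency bins of `depth` items and
    replace every value in a bin by the bin mean (rounded to 2 decimals)."""
    data = sorted(values)
    smoothed = []
    for start in range(0, len(data), depth):
        bin_values = data[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.append([round(bin_mean, 2)] * len(bin_values))
    return smoothed


for number, bin_values in enumerate(smooth_by_bin_means(ages, 3), start=1):
    print(f"Bin {number}: {bin_values}")
```

Run as written, the nine bin means are 14.67, 18.33, 21.0, 24.0, 26.67, 33.67, 35.0, 40.33 and 56.0. For part (b), note that a bin of identical values (35, 35, 35) is unchanged, while the last bin's mean of 56.0 is pulled upward by the single large value 70.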

2. Normalize the values 200, 300, 400, 600, 1000 using:

(a) Min-max normalization with range [0,1]

(b) Z-score normalization

(c) Decimal scaling normalization.
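A hedged Python sketch of the three normalizations in Problem 2. Assumptions: z-score normalization uses the population standard deviation (some texts use the sample standard deviation, which gives different numbers), and decimal scaling divides by 10^j with j the smallest integer such that max(|v'|) < 1. The function names are hypothetical.

```python
from statistics import mean, pstdev

values = [200, 300, 400, 600, 1000]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer making max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


print("min-max:", min_max(values))
print("z-score:", [round(z, 4) for z in z_score(values)])
scaled, j = decimal_scaling(values)
print(f"decimal scaling (j = {j}):", scaled)
```

With mean 500 and population standard deviation of about 282.84, this gives min-max values 0.0, 0.125, 0.25, 0.5, 1.0 and z-scores of roughly -1.0607, -0.7071, -0.3536, 0.3536, 1.7678. For decimal scaling, j = 3 would map 1000 to exactly 1.0, which is not strictly below 1, so the sketch picks j = 4 and returns 0.02, 0.03, 0.04, 0.06, 0.1.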



3. Use min-max normalization to transform the value 35 from a dataset where min = 13 and max = 70.
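Problem 3 does not state a target range. Assuming the same [0, 1] range as Problem 2(a), the min-max formula works out as follows:

```latex
v' = \frac{v - \min_A}{\max_A - \min_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A
   = \frac{35 - 13}{70 - 13}\,(1 - 0) + 0
   = \frac{22}{57} \approx 0.386
```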

4. A dataset has two attributes: age and body fat.

(a) Perform Z-score normalization

(b) Compute the correlation coefficient and determine the type of correlation.
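The sheet does not list the age and body-fat values, so the Python sketch below only defines the computation; the toy lists at the bottom are hypothetical and are not the exercise data. It uses population z-scores, for which Pearson's r, defined as the sum of (a_i - mean_A)(b_i - mean_B) divided by n times sigma_A times sigma_B, equals the mean of the products of paired z-scores. Function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Population z-scores: (v - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long attributes,
    computed as the mean product of their paired population z-scores."""
    if len(a) != len(b):
        raise ValueError("both attributes need the same number of values")
    za, zb = z_scores(a), z_scores(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)


def correlation_type(r, tol=1e-9):
    """Classify r by sign; values within `tol` of zero count as uncorrelated."""
    if r > tol:
        return "positive correlation"
    if r < -tol:
        return "negative correlation"
    return "no linear correlation"


# Hypothetical toy values for illustration only (not the exercise data).
age = [20, 30, 40, 50]
body_fat = [10, 15, 25, 30]
r = pearson_r(age, body_fat)
print(round(r, 3), correlation_type(r))
```

On Python 3.10 or later, `statistics.correlation(age, body_fat)` computes the same Pearson coefficient and can serve as a cross-check.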

5. A sales dataset has values: 5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215.

Apply equal-width and equal-frequency binning.

Comment on the advantages of each.
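Problem 5 does not give the number of bins, so this Python sketch assumes k = 3. The helper names are hypothetical. Equal-width binning uses half-open intervals of width (max - min) / k, with the maximum placed in the last bin; equal-frequency binning gives each bin the same number of sorted values (sizes differ by at most one when n is not divisible by k).

```python
sales = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]


def equal_width_bins(values, k):
    """Split the range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; equal-width bins are undefined")
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        # Half-open intervals; min(..., k - 1) puts the maximum in the last bin.
        index = min(int((v - lo) / width), k - 1)
        bins[index].append(v)
    return bins, width


def equal_frequency_bins(values, k):
    """Split sorted values into k bins whose sizes differ by at most one."""
    data = sorted(values)
    n = len(data)
    bins, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        bins.append(data[start:start + size])
        start += size
    return bins


width_bins, width = equal_width_bins(sales, 3)
print(f"equal-width (width = {width}):", width_bins)
print("equal-frequency:", equal_frequency_bins(sales, 3))
```

With three bins the width is 70, and equal-width places nine of the twelve values in [5, 75), one value (92) in [75, 145) and two (204, 215) in [145, 215], because the two large values stretch the range. Equal-frequency yields [5, 10, 11, 13], [15, 35, 50, 55] and [72, 92, 204, 215]: balanced counts, but bin ranges that vary widely. This contrast is one way to frame the comparison of advantages the question asks for.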
