Machine Learning Techniques
Assignment-4
Aim – Prepare data through cleaning, transformation, and feature engineering for
model training.
Theory –
Q.1 Why is data preparation considered one of the most time-consuming but critical steps in
the CRISP-ML process?
Q.2 Differentiate between data cleaning, data transformation, and feature engineering with one
example each.
Q.3 Explain the difference between normalization and standardization. In which scenarios
would you prefer one over the other?
Q.4 Suppose you have time-series data (daily sales). Which feature engineering techniques
could you apply to make the data more useful for machine learning models?
Q.5 Why is it important to detect and treat data leakage during feature engineering? Give an
example.
Q.6 You are preparing a dataset for predicting student exam performance with features like
`StudyHours`, `Attendance`, and `ParentalIncome`. Which data cleaning issues might you
expect? Which transformations could improve the dataset? Suggest two new features you
could engineer.
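Illustrative sketch for Q.3 and Q.4 (a starting point, not a model answer): the snippet below shows, in plain Python, what min-max normalization, z-score standardization, a lag feature, and a rolling mean over past days might look like on a small daily-sales series. The function names and the sample values are hypothetical and chosen only for illustration. In practice a library such as pandas or scikit-learn would normally be used instead.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Normalization: rescale values linearly into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: shift to zero mean and scale to unit (population) std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def lag_feature(series, lag=1):
    """Value observed `lag` days earlier; None where no history exists."""
    n = len(series)
    k = min(lag, n)
    return [None] * k + list(series[:n - k])


def rolling_mean_past(series, window=3):
    """Mean of the previous `window` days.

    The current day is excluded on purpose, so the feature only uses
    information that would have been available before that day.
    """
    out = []
    for i in range(len(series)):
        if i < window:
            out.append(None)  # not enough history yet
        else:
            out.append(mean(series[i - window:i]))
    return out


# Hypothetical daily sales figures, for illustration only
daily_sales = [10, 20, 30, 40, 50]

normalized = min_max_normalize(daily_sales)
standardized = standardize(daily_sales)
lag_1 = lag_feature(daily_sales, lag=1)
rolling_3 = rolling_mean_past(daily_sales, window=3)
```

Note that the rolling mean deliberately leaves out the current day's value. Including it would let the target leak into the feature, which is the kind of issue Q.5 asks about.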
Reference Study Material -
Web References:
1. What is feature engineering | Feature Engineering Tutorial Python # 1
https://www.youtube.com/watch?v=pYVScuY-GPk
2. Effective Data Cleaning and Preparation Techniques | Stanford Data Ocean Lecture
Series #2
https://www.youtube.com/watch?v=7D2Ngvgrl5Y
3. Practical AI 009: Data Transformation and Cleaning
https://www.youtube.com/watch?v=3Gac0g7KCQA
4. Feature Engineering Full Course - in 1 Hour | Beginner Level
https://www.youtube.com/watch?v=uu8um0JmYA8