
Assignment 2

The assignment focuses on training a regression model using the Boston Housing dataset to predict median home values, emphasizing prediction accuracy and interpretability. It includes steps for data exploration, preprocessing, model selection, evaluation using various metrics, and feature importance analysis. Additionally, there is a bonus section for improving model performance through techniques like feature engineering or regularization, with submission guidelines for a well-documented Jupyter Notebook or Python script.

Uploaded by queensman43
Copyright © All Rights Reserved

Assignment 2: Regression Task with Evaluation Methods

Objective:
Train a regression model on a real-world dataset, emphasizing both prediction accuracy and
interpretability.

Dataset:
Use the Boston Housing dataset from the UCI repository. This dataset records various attributes
of housing in Boston suburbs (506 records, 13 predictor variables); the task is to predict the
median value of owner-occupied homes (MEDV, in thousands of dollars).
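The loading step can be sketched in plain Python. This sketch assumes a local copy of the original whitespace-separated UCI file with 14 columns and no header row, stored under the hypothetical name `housing.data`; `COLUMNS`, `parse_housing`, and `load_housing` are illustrative names, not part of any library. (scikit-learn's old `load_boston` loader was removed in version 1.2, so a manual load like this, or pandas' `read_csv` with `sep=r"\s+"` and `header=None`, is one way to get the data.)

```python
from typing import Dict, List

# Standard column order of the Boston Housing data; MEDV (the target) is last.
COLUMNS = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV",
]


def parse_housing(text: str) -> List[Dict[str, float]]:
    """Parse whitespace-separated rows into one dict per record."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        records.append({name: float(v) for name, v in zip(COLUMNS, fields)})
    return records


def load_housing(path: str = "housing.data") -> List[Dict[str, float]]:
    # "housing.data" is an assumed local file name, not a guaranteed download path.
    with open(path, encoding="utf-8") as f:
        return parse_housing(f.read())
```

Raising on a wrong field count catches a mismatched file format early instead of silently mislabeling columns.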

Data Exploration & Preprocessing:


● Load the dataset.
● Handle missing values, if any.
● Visualize the distribution of the target variable (MEDV, the median value of owner-occupied
homes).
● Explore relationships between the predictors and the target variable using scatter plots or a
correlation matrix.
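For the preprocessing bullets above, a minimal sketch might fill gaps with each column's median and then rank predictors by their Pearson correlation with MEDV. Median imputation is only one assumed strategy (the assignment leaves the choice open), and the helper names are hypothetical. Records are assumed to be dicts keyed by column name, with `None` marking a missing value.

```python
import math
import statistics
from typing import Dict, List, Optional


def impute_median(records: List[Dict[str, Optional[float]]]) -> List[Dict[str, float]]:
    """Replace None in each column with the median of that column's observed values."""
    columns = list(records[0])
    medians = {}
    for col in columns:
        observed = [r[col] for r in records if r[col] is not None]
        medians[col] = statistics.median(observed)
    return [
        {col: medians[col] if r[col] is None else r[col] for col in columns}
        for r in records
    ]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; NaN if either sequence is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return float("nan")
    return cov / (sx * sy)


def correlations_with_target(records: List[Dict[str, float]], target: str = "MEDV") -> Dict[str, float]:
    """Correlation of every predictor column with the target column."""
    ys = [r[target] for r in records]
    return {
        col: pearson([r[col] for r in records], ys)
        for col in records[0]
        if col != target
    }
```

Sorting the result with `sorted(corr.items(), key=lambda kv: abs(kv[1]), reverse=True)` lists the strongest linear relationships first. Scatter plots remain worthwhile alongside it, since Pearson correlation only captures linear association.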

Data Splitting:
Divide the dataset into a training set (70%) and a test set (30%).
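A reproducible 70/30 split can be sketched by shuffling with a fixed seed and slicing. `split_train_test` and the seed value 42 are illustrative choices; with scikit-learn, `train_test_split(X, y, test_size=0.3, random_state=42)` from `sklearn.model_selection` does the same job.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    rows: Sequence[T], test_fraction: float = 0.3, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of rows with a fixed seed, then split off the test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # local RNG, so the global seed is unaffected
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```

Fixing the seed makes the reported metrics reproducible; for 506 rows this yields 354 training and 152 test rows. The test set should stay untouched until the final evaluation.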

Model Selection & Training:


● Choose a regression algorithm (e.g., Linear Regression, Decision Trees, or SVM
regression).
● Train the model using the training data.
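As one concrete choice for this step, here is a sketch of ordinary least squares (multiple linear regression) fitted through the normal equations (XᵀX)β = Xᵀy, with a column of ones for the intercept. It is written in dependency-free Python for illustration only; `LinearRegressionOLS` and `solve` are hypothetical names, and in practice `sklearn.linear_model.LinearRegression`, which uses a more numerically stable least-squares solver, is the usual tool.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class LinearRegressionOLS:
    """Ordinary least squares with an intercept, fitted via the normal equations."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "LinearRegressionOLS":
        rows = [[1.0, *x] for x in X]  # prepend a column of ones for the intercept
        p = len(rows[0])
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
        xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
        beta = solve(xtx, xty)
        self.intercept_, self.coef_ = beta[0], beta[1:]
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.intercept_ + sum(c * v for c, v in zip(self.coef_, x)) for x in X]
```

Typical use is `model = LinearRegressionOLS().fit(X_train, y_train)` followed by `model.predict(X_test)`. The normal equations are adequate for 13 predictors but can lose precision when predictors are strongly collinear.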

Model Evaluation:
● Use the test set to evaluate the model.
● Calculate the following evaluation metrics on the test set:
○ Mean Absolute Error (MAE)
○ Mean Squared Error (MSE)
○ Root Mean Squared Error (RMSE)
○ R-squared (Coefficient of Determination)
○ Adjusted R-squared
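The five metrics can be computed from the test-set targets y and predictions ŷ as sketched below, where n is the number of test rows and p (`n_features`) is the number of predictors the model uses. Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) penalizes R² for each additional predictor. `regression_metrics` is a hypothetical helper; `sklearn.metrics` offers `mean_absolute_error`, `mean_squared_error`, and `r2_score`, and adjusted R² can be derived from `r2_score` with the same formula.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float], n_features: int
) -> Dict[str, float]:
    """MAE, MSE, RMSE, R^2 and adjusted R^2 for one set of predictions."""
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_features - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > n_features + 1")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n   # mean |y - y_hat|
    mse = sum(r * r for r in residuals) / n    # mean (y - y_hat)^2
    rmse = math.sqrt(mse)                      # same units as MEDV
    mean_y = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                 # 1 - SS_res / SS_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "Adjusted R2": adj_r2}
```

Because RMSE is in the target's units (thousands of dollars for MEDV), it is often the easiest of the five to explain in the report.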
Feature Importance:
● Depending on the chosen model, determine the importance of each feature.
● Discuss the significance of each feature in predicting the target variable.
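Because the assignment leaves the model open, a model-agnostic option is permutation importance: shuffle one feature column in the test set and measure how much the error grows. The sketch below uses MAE as the error and accepts any `predict` function; the function names, `n_repeats`, and the seed are assumptions. scikit-learn provides an equivalent in `sklearn.inspection.permutation_importance`; for linear models, coefficients on standardized features, and for tree models, the fitted `feature_importances_` attribute, are common alternatives.

```python
import random
from typing import Callable, Dict, List, Sequence

Predict = Callable[[List[List[float]]], List[float]]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Predict,
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    feature_names: Sequence[str],
    n_repeats: int = 10,
    seed: int = 0,
) -> Dict[str, float]:
    """Average increase in MAE when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    X = [list(row) for row in X]
    baseline = mean_absolute_error(y, predict(X))
    importances = {}
    for j, name in enumerate(feature_names):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_absolute_error(y, predict(X_perm)) - baseline)
        importances[name] = sum(increases) / n_repeats
    # Most important feature first.
    return dict(sorted(importances.items(), key=lambda kv: kv[1], reverse=True))
```

Correlated predictors can share credit (shuffling one leaves its partner intact), so the ranking is best treated as a guide for the discussion rather than a definitive ordering.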

Improvement (Bonus):
● Apply at least one technique to improve the model's performance, such as:
○ Feature engineering.
○ Polynomial regression.
○ Regularization techniques (L1/L2).
● Re-evaluate the model using the same metrics and compare the results with the initial model.
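As one example of the L2 option, ridge regression adds a penalty α‖w‖² to the least-squares loss, giving the closed form (ZᵀZ + αI)w = Zᵀ(y - ȳ) on standardized features Z, with the intercept left unpenalized. The sketch assumes standardization (so the penalty treats all predictors on the same scale) and uses a hypothetical `RidgeRegression` class; `sklearn.linear_model.Ridge` and `Lasso` (the L1 variant) are the usual library implementations. The value of α should be chosen on the training data, for example by cross-validation, not on the test set, so the comparison with the initial model stays fair.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardize(X: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[float], List[float]]:
    """Center each column to mean 0 and scale to standard deviation 1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(var ** 0.5 or 1.0)  # guard against constant columns
    Z = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]
    return Z, means, stds


class RidgeRegression:
    """L2-regularized least squares on standardized features; intercept unpenalized."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "RidgeRegression":
        Z, self.means_, self.stds_ = standardize(X)
        y_mean = sum(y) / len(y)
        yc = [v - y_mean for v in y]
        p = len(Z[0])
        # (Z^T Z + alpha * I) w = Z^T (y - mean(y)); Z is centered, so the intercept is mean(y).
        a = [[sum(r[i] * r[j] for r in Z) + (self.alpha if i == j else 0.0)
              for j in range(p)] for i in range(p)]
        b = [sum(r[i] * v for r, v in zip(Z, yc)) for i in range(p)]
        self.coef_ = solve(a, b)  # coefficients on the standardized scale
        self.intercept_ = y_mean
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [
            self.intercept_
            + sum(w * (v - m) / s for w, v, m, s in zip(self.coef_, x, self.means_, self.stds_))
            for x in X
        ]
```

With `alpha=0` this reduces to ordinary least squares; larger values shrink the coefficients toward zero, trading a little bias for lower variance.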

Submission Guidelines:
● Submit a Jupyter Notebook or a Python script containing all the code used for the
assignment.
● The code should be well-commented to explain your reasoning at each step.
● Include visualizations for data exploration, feature importance, and results.
● Include a report (1-2 pages) summarizing your findings, the model's performance metrics,
and any conclusions drawn from the exercise.

Evaluation Criteria:
1. Data Preprocessing: Clean handling and transformation of data.
2. Implementation: Correctness and clarity of code.
3. Evaluation: Correct computation of the metrics.
4. Interpretation: Insight into feature importance and model performance.
5. Improvement: Effectiveness and clarity of the improvement technique.

By the end of this assignment, you should have a solid understanding of regression tasks, the
intricacies of feature interactions, and the significance of model interpretability.
