Housing Price Prediction Linear Regression Assignment
Overview
This assignment will help you develop practical skills in linear regression modeling, from basic
single-predictor models to advanced techniques including regularization. You will work with
real-world housing data to predict house prices.
Notation: Throughout this assignment, use b for the intercept and w (with subscripts w1, w2, ...
for multiple features) for coefficients.
Learning Outcomes
By completing this assignment, you will be able to:
Dataset
California Housing Dataset (available via scikit-learn)
Deliverable: Include a summary table of statistics and a statement about data quality in your
report.
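One possible way to build the summary table and gather evidence for the data-quality statement is sketched below. The helper name summarize_features is hypothetical, and the small demonstration frame is made up purely to show the shape of the output.

```python
import numpy as np
import pandas as pd


def summarize_features(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary statistics plus a missing-value count.

    Hypothetical helper for illustration; not prescribed by the assignment.
    """
    summary = df.describe().T               # count, mean, std, min, quartiles, max
    summary["missing"] = df.isna().sum()    # aligned on column names
    return summary


# Tiny made-up frame, used only to show what the table looks like
demo = pd.DataFrame({"MedInc": [3.2, 5.1, np.nan, 8.3],
                     "HouseAge": [20.0, 35.0, 41.0, 12.0]})
demo_summary = summarize_features(demo)
print(demo_summary[["count", "mean", "min", "max", "missing"]])
```

With X and y from the dataset-loading code at the end of this assignment, summarize_features(X.assign(MedHouseVal=y)) would cover every feature and the target in one table. A nonzero "missing" column or implausible min/max values are the kind of observations a data-quality statement could mention.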
Deliverable: Scatter plot with regression line, model equation, performance metrics, and
interpretation.
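A minimal sketch of a single-predictor fit that yields the model equation and the two metrics follows. The function name fit_single_predictor is hypothetical, the model is scored on the same rows it was fitted on (an assumption), and the demonstration data are made up so that the expected equation is known in advance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def fit_single_predictor(x, y):
    """Fit y = b + w1 * x and return (b, w1, R^2, RMSE) on the same data."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)   # scikit-learn expects a 2-D X
    y = np.asarray(y, dtype=float)
    model = LinearRegression().fit(x, y)
    pred = model.predict(x)
    r2 = r2_score(y, pred)
    rmse = np.sqrt(mean_squared_error(y, pred))
    return float(model.intercept_), float(model.coef_[0]), float(r2), float(rmse)


# Made-up, exactly linear data generated from y = 2 + 3x
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = 2.0 + 3.0 * x_demo
b, w1, r2, rmse = fit_single_predictor(x_demo, y_demo)
print(f"y = {b:.2f} + {w1:.2f} * x   R2 = {r2:.3f}   RMSE = {rmse:.3f}")
```

On the housing data, a call such as fit_single_predictor(X["MedInc"], y) would work the same way; MedInc is used here only as an example predictor. The regression line for the scatter plot can then be drawn from b + w1 * x over the observed range of x.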
4. Compare the performance with the simple linear regression from Part 1:
• R² comparison
• RMSE comparison
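The R² and RMSE comparison could be organized as in the sketch below. The function name compare_simple_vs_multiple is hypothetical, both models are fitted and scored on the same rows (an assumption), and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def compare_simple_vs_multiple(X: pd.DataFrame, y, single_feature: str) -> pd.DataFrame:
    """R2 and RMSE for a one-feature model and an all-features model."""
    y = np.asarray(y, dtype=float)
    feature_sets = {"simple": [single_feature], "multiple": list(X.columns)}
    rows = {}
    for label, cols in feature_sets.items():
        pred = LinearRegression().fit(X[cols], y).predict(X[cols])
        rows[label] = {"R2": r2_score(y, pred),
                       "RMSE": np.sqrt(mean_squared_error(y, pred))}
    return pd.DataFrame(rows).T     # one row per model, one column per metric


# Synthetic data in which the target depends on two features
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 2)), columns=["f1", "f2"])
y_demo = 1.0 + 2.0 * X_demo["f1"] - 1.5 * X_demo["f2"] + rng.normal(scale=0.1, size=200)
comparison = compare_simple_vs_multiple(X_demo, y_demo, "f1")
print(comparison)
```

When both models are scored on their own training rows, the multiple-feature model can never show a lower R² than the nested single-feature model, so the size of the gap is the informative part of the comparison.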
1. Create a correlation matrix including all features and the target variable
3. Identify and list the top 3 features most correlated with house value
4. Check for multicollinearity: Are there any pairs of features highly correlated with each
other? (correlation > 0.7 or < −0.7)
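The correlation steps above can be sketched as follows. The function name correlation_report is hypothetical, ranking by absolute correlation is an assumption about what "most correlated" means, and the demonstration columns a, b, c, d are synthetic.

```python
import numpy as np
import pandas as pd


def correlation_report(X: pd.DataFrame, y, target_name: str = "MedHouseVal",
                       threshold: float = 0.7):
    """Return (correlation matrix, top-3 features vs. target, collinear pairs)."""
    data = X.copy()
    data[target_name] = np.asarray(y)
    corr = data.corr()                                    # Pearson by default
    with_target = corr[target_name].drop(target_name)
    order = with_target.abs().sort_values(ascending=False).index
    top3 = with_target.reindex(order).head(3)             # keeps the sign
    features = list(X.columns)
    pairs = [(a, b, float(corr.loc[a, b]))
             for i, a in enumerate(features)
             for b in features[i + 1:]
             if abs(corr.loc[a, b]) > threshold]          # > 0.7 or < -0.7
    return corr, top3, pairs


# Synthetic features: b is nearly a copy of a; c and d are independent
rng = np.random.default_rng(0)
a = rng.normal(size=300)
X_demo = pd.DataFrame({"a": a,
                       "b": a + 0.05 * rng.normal(size=300),
                       "c": rng.normal(size=300),
                       "d": rng.normal(size=300)})
y_demo = 2.0 * X_demo["a"] + 1.5 * X_demo["c"] + 0.1 * rng.normal(size=300)
corr, top3, pairs = correlation_report(X_demo, y_demo, target_name="target")
print(top3)
print(pairs)
```

A heatmap of the returned matrix, for example with seaborn's heatmap function, is one common way to present it in a report.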
3. Analyze the plot: Is there any visible pattern (e.g., funnel shape, curves)?
4. Interpretation: What does the pattern (or lack thereof) indicate about your model?
Deliverable: Residual plot, histogram, Q-Q plot, outlier analysis table, and interpretations.
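A sketch of how the three diagnostic plots and an outlier table might be produced is given below. The function name residual_diagnostics is hypothetical, and flagging points whose standardized residual exceeds 3 in absolute value is an assumed rule, since the outlier criterion is not stated here. The demonstration data are synthetic, with one deliberately planted outlier.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats


def residual_diagnostics(y_true, y_pred, z_threshold: float = 3.0):
    """Return (figure with three diagnostic panels, table of flagged outliers)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    axes[0].scatter(y_pred, resid, s=8, alpha=0.5)
    axes[0].axhline(0.0, color="red", linewidth=1)
    axes[0].set(xlabel="Predicted value", ylabel="Residual",
                title="Residuals vs. predicted")
    axes[1].hist(resid, bins=30)
    axes[1].set(xlabel="Residual", ylabel="Count", title="Residual histogram")
    stats.probplot(resid, dist="norm", plot=axes[2])   # Q-Q plot against a normal
    fig.tight_layout()

    # Assumed outlier rule: |standardized residual| above z_threshold
    z = (resid - resid.mean()) / resid.std(ddof=1)
    table = pd.DataFrame({"actual": y_true, "predicted": y_pred,
                          "residual": resid, "z_score": z})
    return fig, table[table["z_score"].abs() > z_threshold]


# Synthetic predictions with one planted outlier at position 10
rng = np.random.default_rng(0)
pred_demo = np.linspace(0.0, 10.0, 100)
resid_demo = rng.normal(scale=1.0, size=100)
resid_demo[10] = 50.0
fig, outliers = residual_diagnostics(pred_demo + resid_demo, pred_demo)
print(outliers)
```

In the residuals-vs-predicted panel, a funnel shape suggests non-constant variance and a curve suggests a missing nonlinear term; points that bend away from the reference line in the Q-Q panel indicate non-normal residuals.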
2. Fit the multiple linear regression model (from Task 2.1) on the training data only
• Training R²
• Testing R²
• Training RMSE
• Testing RMSE
4. Create a comparison table or bar chart showing training vs. testing performance
5. Analysis:
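The training-versus-testing evaluation in the steps above can be sketched as follows. The function name train_test_report is hypothetical, and the 80/20 split and random_state=42 are assumptions, since the split settings are not shown here. The data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def train_test_report(X, y, test_size: float = 0.2, random_state: int = 42) -> pd.DataFrame:
    """Fit on the training rows only; report R2 and RMSE for both splits."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    model = LinearRegression().fit(X_train, y_train)
    splits = {"train": (X_train, y_train), "test": (X_test, y_test)}
    rows = {}
    for name, (X_part, y_part) in splits.items():
        pred = model.predict(X_part)
        rows[name] = {"R2": r2_score(y_part, pred),
                      "RMSE": np.sqrt(mean_squared_error(y_part, pred))}
    return pd.DataFrame(rows).T


# Synthetic linear data with moderate noise
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(500, 3)), columns=["f1", "f2", "f3"])
y_demo = 1.0 + X_demo.to_numpy() @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=500)
report = train_test_report(X_demo, y_demo)
print(report)
```

The returned frame can serve directly as the comparison table, and report.plot(kind="bar") is one way to turn it into a bar chart. A training score that is much better than the testing score is the usual sign of overfitting.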
• RoomsPerHousehold = AveRooms / AveOccup
• BedroomsPerRoom = AveBedrms / AveRooms
6. Interpretation: Which engineered feature(s) seem most valuable? How can you tell?
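The two ratio features RoomsPerHousehold and BedroomsPerRoom could be added as in this sketch. The function name add_engineered_features is hypothetical, and only the two ratios visible in this task are covered. The demonstration rows are made up.

```python
import pandas as pd


def add_engineered_features(X: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of X with the two ratio features appended."""
    out = X.copy()                                   # leave the input untouched
    out["RoomsPerHousehold"] = out["AveRooms"] / out["AveOccup"]
    out["BedroomsPerRoom"] = out["AveBedrms"] / out["AveRooms"]
    return out


# Made-up rows using the California Housing column names
demo = pd.DataFrame({"AveRooms": [6.0, 4.0],
                     "AveBedrms": [1.5, 1.0],
                     "AveOccup": [3.0, 2.0]})
engineered = add_engineered_features(demo)
print(engineered)
```

One way to judge which engineered feature is most valuable is to refit the model with and without each new column and compare the testing R², alongside the size of the fitted coefficient.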
1. Implement Ridge Regression with the following alpha values: [0.1, 1, 10, 100]
• Use the feature set from Task 3.2 (with engineered features)
• For each alpha, train on the training set
• Report the testing R² score for each alpha
2. Implement Lasso Regression with the same alpha values: [0.1, 1, 10, 100]
6. Analysis:
• Which features did Lasso shrink to zero (or near zero, |w| < 0.01)?
• What does this tell you about feature importance?
• Explain the difference between Ridge and Lasso in your own words
Deliverable: Alpha vs. R² plot, coefficient comparison table, and regularization analysis.
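The Ridge and Lasso sweep over the listed alpha values might look like the sketch below. The function name regularization_sweep is hypothetical, features are not standardized because no scaling step is visible in this task (an assumption that affects coefficient sizes), and max_iter is raised only to help Lasso converge. The data are synthetic, with one informative feature and two irrelevant ones.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import r2_score


def regularization_sweep(X_train, X_test, y_train, y_test,
                         alphas=(0.1, 1.0, 10.0, 100.0), zero_tol=0.01):
    """Return (testing R2 per alpha, Lasso coefficients, near-zero features)."""
    scores, lasso_coefs = {}, {}
    for alpha in alphas:
        ridge = Ridge(alpha=alpha).fit(X_train, y_train)
        lasso = Lasso(alpha=alpha, max_iter=10_000).fit(X_train, y_train)
        scores[alpha] = {"ridge_R2": r2_score(y_test, ridge.predict(X_test)),
                         "lasso_R2": r2_score(y_test, lasso.predict(X_test))}
        lasso_coefs[alpha] = pd.Series(lasso.coef_, index=X_train.columns)
    scores = pd.DataFrame(scores).T.rename_axis("alpha")
    lasso_coefs = pd.DataFrame(lasso_coefs)          # rows: features, columns: alphas
    near_zero = {alpha: list(lasso_coefs.index[lasso_coefs[alpha].abs() < zero_tol])
                 for alpha in alphas}                # features with |w| < zero_tol
    return scores, lasso_coefs, near_zero


# Synthetic data: only x0 influences the target
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(300, 3)), columns=["x0", "x1", "x2"])
y_demo = 3.0 * X_demo["x0"].to_numpy() + 0.1 * rng.normal(size=300)
scores, lasso_coefs, near_zero = regularization_sweep(
    X_demo.iloc[:240], X_demo.iloc[240:], y_demo[:240], y_demo[240:])
print(scores)
print(near_zero)
```

Plotting the scores frame with a logarithmic alpha axis, for example scores.plot(logx=True), gives an alpha vs. R² chart. In the synthetic example Lasso sets the irrelevant coefficients exactly to zero, whereas Ridge only shrinks coefficients toward zero without eliminating them.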
4 Deliverables
4.1 Python Code (Jupyter Notebook or .py file)
• Well-commented code explaining each step
• Clear section headers matching the assignment parts
• Code should run without errors
• Include all necessary imports at the beginning
Visualization Guidelines:
5 Grading Rubric
Component                                               Points
Part 1: Data Exploration & Single Predictor Modeling        30
  Task 1.1: Understanding Your Data                         10
  Task 1.2: Building Your First Model                       20
Part 2: Multiple Predictors & Model Diagnostics             40
  Task 2.1: Expanding to Multiple Features                  15
  Task 2.2: Understanding Feature Relationships             10
  Task 2.3: Validating Model Assumptions                    15
Part 3: Model Optimization & Enhancement                    30
  Task 3.1: Evaluating Model Generalization                 10
  Task 3.2: Creating Better Features                        10
  Task 3.3: Preventing Overfitting with Regularization      10
Code Quality & Documentation                                10
  Clear comments and structure                               5
  Code runs without errors                                   5
Report Quality & Analysis                                   15
  Clear writing and organization                             5
  Depth of analysis and insights                             7
  Professional formatting                                    3
Visualizations                                              10
  All required plots included                                5
  Quality and clarity of visualizations                      5
Total                                                      100
6 Submission Guidelines
What to Submit
Create a ZIP file containing:
Submission Platform
Moodle
Important Dates
• Assignment Release: Thu. Oct 09 2025
• Due Date: Wed. Oct 22 2025 at 11:50 pm
Academic Integrity
• You may discuss concepts with classmates, but all code and writing must be your own
• Properly cite any external resources or code snippets used
• Use of AI tools (ChatGPT, Copilot, etc.) must be disclosed in your report
• Plagiarism will result in zero points and potential disciplinary action
7 Resources
Required Python Libraries
Install using pip or conda:
pip install numpy pandas matplotlib seaborn scikit-learn
Helpful Documentation
• Scikit-learn: https://scikit-learn.org/stable/
• Linear Regression: https://scikit-learn.org/stable/modules/linear_model.html
• California Housing Dataset: https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset
• Matplotlib: https://matplotlib.org/
• Seaborn: https://seaborn.pydata.org/
• Pandas: https://pandas.pydata.org/
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score, mean_squared_error
# Load dataset
california = fetch_california_housing()
X = pd.DataFrame(california.data, columns=california.feature_names)
y = california.target
Good Luck!
Remember: The goal is to learn and understand linear regression,
not just to get the right answers. Show your thinking process!