Avinash Shukla (27)
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES
# TO THE CORRECT LOCATION (/kaggle/input) IN YOUR NOTEBOOK,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR NOTEBOOK.
import os
import sys
from tempfile import NamedTemporaryFile
from urllib.request import urlopen
from urllib.parse import unquote, urlparse
from urllib.error import HTTPError
from zipfile import ZipFile
import tarfile
import shutil

CHUNK_SIZE = 40960
# The encoded download URL after the ':' is omitted here (cut off in the export).
DATA_SOURCE_MAPPING = 'boston-house-price-prediction:'

KAGGLE_INPUT_PATH = '/kaggle/input'
KAGGLE_WORKING_PATH = '/kaggle/working'
KAGGLE_SYMLINK = 'kaggle'

!umount /kaggle/input/ 2> /dev/null
shutil.rmtree('/kaggle/input', ignore_errors=True)
os.makedirs(KAGGLE_INPUT_PATH, 0o777, exist_ok=True)
os.makedirs(KAGGLE_WORKING_PATH, 0o777, exist_ok=True)

try:
    os.symlink(KAGGLE_INPUT_PATH, os.path.join("..", 'input'), target_is_directory=True)
except FileExistsError:
    pass
try:
    os.symlink(KAGGLE_WORKING_PATH, os.path.join("..", 'working'), target_is_directory=True)
except FileExistsError:
    pass

for data_source_mapping in DATA_SOURCE_MAPPING.split(','):
    directory, download_url_encoded = data_source_mapping.split(':')
    download_url = unquote(download_url_encoded)
    filename = urlparse(download_url).path
    destination_path = os.path.join(KAGGLE_INPUT_PATH, directory)
    try:
        with urlopen(download_url) as fileres, NamedTemporaryFile() as tfile:
            total_length = fileres.headers['content-length']
            print(f'Downloading {directory}, {total_length} bytes compressed')
            dl = 0
            data = fileres.read(CHUNK_SIZE)
            while len(data) > 0:
                dl += len(data)
                tfile.write(data)
                done = int(50 * dl / int(total_length))
                sys.stdout.write(f"\r[{'=' * done}{' ' * (50 - done)}] {dl} bytes downloaded")
                sys.stdout.flush()
                data = fileres.read(CHUNK_SIZE)
            if filename.endswith('.zip'):
                with ZipFile(tfile) as zfile:
                    zfile.extractall(destination_path)
            else:
                with tarfile.open(tfile.name) as archive:
                    archive.extractall(destination_path)
            print(f'\nDownloaded and uncompressed: {directory}')
    except HTTPError as e:
        print(f'Failed to load (likely expired) {download_url} to path {destination_path}')
        continue
    except OSError as e:
        print(f'Failed to load {download_url} to path {destination_path}')
        continue

print('Data source import complete.')
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = pd.read_csv('/kaggle/input/boston-house-price-

# Displaying the first few rows of the dataframe
print("First 5 rows of the dataset:")
print(data.head())
First 5 rows of the dataset:
crim zn indus chas nox rm age dis rad tax ptratio \
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7
b lstat medv
0 396.90 4.98 24.0
1 396.90 9.14 21.6
2 392.83 4.03 34.7
3 394.63 2.94 33.4
4 396.90 5.33 36.2
# Checking for any missing values in the dataset
print("\nMissing values in the dataset:")
print(data.isnull().sum())
Missing values in the dataset:
crim 0
zn 0
indus 0
chas 0
nox 0
rm 5
age 0
dis 0
rad 0
tax 0
ptratio 0
b 0
lstat 0
medv 0
dtype: int64
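Only the rm column (average number of rooms) has missing values, five in total. A quick, purely illustrative look at the affected rows (a sketch, not part of the original notebook; the notebook itself imputes column means under Question No.04):

# Show the rows where 'rm' is missing before deciding how to handle them
print(data[data['rm'].isnull()])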
# Displaying the summary statistics of the dataset
print("\nSummary Statistics:")
print(data.describe())
Summary Statistics:
             crim          zn       indus        chas         nox          rm
count  506.000000  506.000000  506.000000  506.000000  506.000000  501.000000
mean     3.613524   11.363636   11.136779    0.069170    0.554695    6.284341
std      8.601545   23.322453    6.860353    0.253994    0.115878    0.705587
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000
25%      0.082045    0.000000    5.190000    0.000000    0.449000    5.884000
50%      0.256510    0.000000    9.690000    0.000000    0.538000    6.208000
75%      3.677083   12.500000   18.100000    0.000000    0.624000    6.625000
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000

              age         dis         rad         tax     ptratio           b
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000
mean    68.574901    3.795043    9.549407  408.237154   18.455534  356.674032
std     28.148861    2.105710    8.707259  168.537116    2.164946   91.294864
min      2.900000    1.129600    1.000000  187.000000   12.600000    0.320000
25%     45.025000    2.100175    4.000000  279.000000   17.400000  375.377500
50%     77.500000    3.207450    5.000000  330.000000   19.050000  391.440000
75%     94.075000    5.188425   24.000000  666.000000   20.200000  396.225000
max    100.000000   12.126500   24.000000  711.000000   22.000000  396.900000

            lstat        medv
count  506.000000  506.000000
mean    12.653063   22.532806
std      7.141062    9.197104
min      1.730000    5.000000
25%      6.950000   17.025000
50%     11.360000   21.200000
75%     16.955000   25.000000
max     37.970000   50.000000
# Visualizing missing data (optional, for better understanding of where values are absent)
plt.figure(figsize=(10, 6))
sns.heatmap(data.isnull(), cbar=False, cmap="viridis")
plt.title('Missing Values Heatmap')
plt.show()
# Visualizing the distribution of a few important features
plt.figure(figsize=(14, 10))

plt.subplot(2, 2, 1)
sns.histplot(data['crim'], kde=True)
plt.title('Crime Rate Distribution')

plt.subplot(2, 2, 2)
sns.histplot(data['rm'], kde=True)
plt.title('Average Number of Rooms Distribution')

plt.subplot(2, 2, 3)
sns.histplot(data['lstat'], kde=True)
plt.title('Lower Status Population (%) Distribution')

plt.subplot(2, 2, 4)
sns.histplot(data['medv'], kde=True)
plt.title('Median Home Value Distribution')

plt.tight_layout()
plt.show()
Question No.02
# Calculate the correlation matrix
correlation_matrix = data.corr()

# Displaying the correlation matrix
print("Correlation Matrix:")
print(correlation_matrix)
Correlation Matrix:
crim zn indus chas nox rm age \
crim 1.000000 -0.200469 0.406583 -0.055892 0.420972 -0.219433 0.352734
zn -0.200469 1.000000 -0.533828 -0.042697 -0.516604 0.311173 -0.569537
indus 0.406583 -0.533828 1.000000 0.062938 0.763651 -0.394193 0.644779
chas -0.055892 -0.042697 0.062938 1.000000 0.091203 0.091468 0.086518
nox 0.420972 -0.516604 0.763651 0.091203 1.000000 -0.302751 0.731470
rm -0.219433 0.311173 -0.394193 0.091468 -0.302751 1.000000 -0.240286
age 0.352734 -0.569537 0.644779 0.086518 0.731470 -0.240286 1.000000
dis -0.379670 0.664408 -0.708027 -0.099176 -0.769230 0.203507 -0.747881
rad 0.625505 -0.311948 0.595129 -0.007368 0.611441 -0.210718 0.456022
tax 0.582764 -0.314563 0.720760 -0.035587 0.668023 -0.292794 0.506456
ptratio 0.289946 -0.391679 0.383248 -0.121515 0.188933 -0.357612 0.261515
b -0.385064 0.175520 -0.356977 0.048788 -0.380051 0.128107 -0.273534
lstat 0.455621 -0.412995 0.603800 -0.053929 0.590879 -0.615721 0.602339
medv -0.388305 0.360445 -0.483725 0.175260 -0.427321 0.696169 -0.376955
dis rad tax ptratio b lstat medv
crim -0.379670 0.625505 0.582764 0.289946 -0.385064 0.455621 -0.388305
zn 0.664408 -0.311948 -0.314563 -0.391679 0.175520 -0.412995 0.360445
indus -0.708027 0.595129 0.720760 0.383248 -0.356977 0.603800 -0.483725
chas -0.099176 -0.007368 -0.035587 -0.121515 0.048788 -0.053929 0.175260
nox -0.769230 0.611441 0.668023 0.188933 -0.380051 0.590879 -0.427321
rm 0.203507 -0.210718 -0.292794 -0.357612 0.128107 -0.615721 0.696169
age -0.747881 0.456022 0.506456 0.261515 -0.273534 0.602339 -0.376955
dis 1.000000 -0.494588 -0.534432 -0.232471 0.291512 -0.496996 0.249929
rad -0.494588 1.000000 0.910228 0.464741 -0.444413 0.488676 -0.381626
tax -0.534432 0.910228 1.000000 0.460853 -0.441808 0.543993 -0.468536
ptratio -0.232471 0.464741 0.460853 1.000000 -0.177383 0.374044 -0.507787
b 0.291512 -0.444413 -0.441808 -0.177383 1.000000 -0.366087 0.333461
lstat -0.496996 0.488676 0.543993 0.374044 -0.366087 1.000000 -0.737663
medv 0.249929 -0.381626 -0.468536 -0.507787 0.333461 -0.737663 1.000000
# Visualize the correlation matrix using a heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix Heatmap')
plt.show()
# Identify the features with the highest positive and negative correlations with house prices
# Assume 'medv' is the target variable (median home value)
target_variable = 'medv'
correlation_with_target = correlation_matrix[target_variable].sort_values(ascending=False)

# Display the features with highest positive and negative correlations
print("\nFeatures with highest positive correlation with house prices:")
print(correlation_with_target[correlation_with_target > 0].head())

print("\nFeatures with highest negative correlation with house prices:")
print(correlation_with_target[correlation_with_target < 0].head())
Features with highest positive correlation with house prices:
medv 1.000000
rm 0.696169
zn 0.360445
b 0.333461
dis 0.249929
Name: medv, dtype: float64
Features with highest negative correlation with house prices:
age -0.376955
rad -0.381626
crim -0.388305
nox -0.427321
tax -0.468536
Name: medv, dtype: float64
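Because correlation_with_target is sorted in descending order, the list above shows the negative correlations closest to zero rather than the strongest ones. A minimal sketch (assuming the correlation_with_target series computed above) that surfaces the strongest negative correlations instead:

# Sort ascending so the most strongly negative correlations with medv come first
# (lstat and ptratio, per the correlation matrix printed earlier)
print(correlation_with_target.sort_values().head())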
Question No.03
# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Select the features and the target variable
X = data.drop(columns=['medv'])  # Features (all columns except the target)
y = data['medv']                 # Target variable (house prices)

# Split the dataset into training and testing sets
# (80/20 split, matching the 404/102 shapes below; the random_state value is an assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature variables using StandardScaler
scaler = StandardScaler()

# Fit the scaler on the training data and transform both training and testing data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Print shapes to verify the splits
print("Training data shape:", X_train_scaled.shape)
print("Testing data shape:", X_test_scaled.shape)
Training data shape: (404, 13)
Testing data shape: (102, 13)
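One detail worth noting here (an observation added for clarity, not from the original notebook): the five missing rm values are still present at this point; StandardScaler disregards NaNs when fitting and passes them through when transforming, which is why this cell runs even though a model could not yet be trained on the data. A small nan-aware sanity check:

# NaNs from the 'rm' column survive scaling and are split across train and test
print("NaNs in scaled train/test:", np.isnan(X_train_scaled).sum(), np.isnan(X_test_scaled).sum())
# The scaler's statistics come from the training split only, so (ignoring NaNs)
# the scaled training features have mean ~0 and standard deviation ~1
print("Train means ~ 0:", np.allclose(np.nanmean(X_train_scaled, axis=0), 0))
print("Train stds  ~ 1:", np.allclose(np.nanstd(X_train_scaled, axis=0), 1))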
Question No.04
from sklearn.linear_model import LinearRegression

# Impute missing values using the mean for each column
data.fillna(data.mean(), inplace=True)

# Re-select the features and target variable after imputation
X = data.drop(columns=['medv'])  # Features (all columns except the target)
y = data['medv']                 # Target variable (house prices)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature variables using StandardScaler
scaler = StandardScaler()

# Fit the scaler on the training data and transform both training and testing data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the Linear Regression model
model = LinearRegression()
model.fit(X_train_scaled, y_train)
LinearRegression()
# Display the model's coefficients and intercept
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
Model Coefficients: [-1.00208747 0.69855082 0.28733122 0.71955092 -2.02070833 3.13708935
-0.17081271 -3.06972351 2.25417948 -1.76697719 -2.04359481 1.12936985
-3.61451369]
Model Intercept: 22.796534653465343
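Because the features were standardized, the coefficient magnitudes are directly comparable. A short sketch (assuming the X and model objects defined above) that pairs each coefficient with its feature name makes the array easier to read:

# Map each coefficient back to its column name and sort by effect size
coef_table = pd.Series(model.coef_, index=X.columns).sort_values()
print(coef_table)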
Question No.05
# Import necessary libraries
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

# Predict the house prices using the testing data
y_pred = model.predict(X_test_scaled)

# Calculate and display performance metrics
# Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, y_pred)
# Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
# Root Mean Squared Error (RMSE)
rmse = np.sqrt(mse)

print(f"Mean Absolute Error (MAE): {mae}")
print(f"Mean Squared Error (MSE): {mse}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
Mean Absolute Error (MAE): 3.2064039639003856
Mean Squared Error (MSE): 24.40482518814648
Root Mean Squared Error (RMSE): 4.940124005341008
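For context (a sketch added here, not part of the original notebook), the RMSE of about 4.94 can be compared against a naive baseline that always predicts the training-set mean; since medv has a standard deviation of roughly 9.2, the baseline error should land near that figure:

# Naive baseline: predict the mean of the training targets for every test sample
baseline_pred = np.full(len(y_test), y_train.mean())
baseline_rmse = np.sqrt(mean_squared_error(y_test, baseline_pred))
print(f"Baseline RMSE (predict the mean): {baseline_rmse:.3f}")
print(f"Linear regression RMSE: {rmse:.3f}")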
# Plot the predicted vs actual house prices
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, color='blue', edgecolor='k')
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', linestyle='--')
plt.title('Predicted vs Actual House Prices')
plt.xlabel('Actual House Prices')
plt.ylabel('Predicted House Prices')
plt.grid(True)
plt.show()