Task 1 - Data Analytics in Python

This report investigates the Manchester Housing dataset to identify factors influencing property values using the CRISP-DM framework. Key findings indicate that floor space and the number of bedrooms are most strongly associated with prices, while waterfront status has a minor effect. The analysis recommends focusing on larger properties with better amenities for pricing strategies and investment decisions.

Uploaded by

sahilsahilkamboj510


COM7024

MSc Data Science

Programming for Data Analytics

Investigating the Manchester Housing Market

STU218659

Lee Braiden
Investigating the Manchester Housing Market
The main goal of this report is to examine the Manchester Housing dataset and offer insights that support informed decisions. The analysis follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework, whose stages are Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment. The report also covers statistical testing, a demonstration of the Central Limit Theorem and the use of Python for data analysis.

Exploring Business Factors

The main objective is to identify the factors that affect property values in Manchester, focusing on features such as floor space, construction year, waterfront location and available amenities. The study aims to inform pricing strategies, real estate development decisions and investment opportunities.

Data Understanding

Dataset Overview

The dataset contains various attributes of properties in Manchester, including:

• Price

• Waterfront status

• Floor Space

• Year Built

• Bedrooms

• Bathrooms

• Location

• Property Type

• Condition

• Lot Size

• Amenities

First, we loaded the dataset and displayed the first 10 rows for initial inspection.
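Before the descriptive statistics, a quick structural check can confirm that the columns listed above are present and show how many values each one is missing. The sketch below is a minimal illustration, not the code used in the report: the helper `profile_columns` is hypothetical, and apart from `Price`, `Waterfront`, `Floor Space`, `Year Built` and `Amenities` (which appear in the appendix code) the column spellings are assumed from the list above.

```python
import pandas as pd

# Columns listed in the Dataset Overview. Only Price, Waterfront, Floor Space,
# Year Built and Amenities are confirmed by the appendix code; the rest are assumed.
EXPECTED_COLUMNS = [
    "Price", "Waterfront", "Floor Space", "Year Built", "Bedrooms",
    "Bathrooms", "Location", "Property Type", "Condition", "Lot Size",
    "Amenities",
]


def profile_columns(df: pd.DataFrame, expected=EXPECTED_COLUMNS):
    """Return a per-column summary (dtype, missing, distinct) and any expected columns that are absent."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isnull().sum(),
        "distinct": df.nunique(),  # NaN is not counted as a distinct value
    })
    absent = [col for col in expected if col not in df.columns]
    return summary, absent


if __name__ == "__main__":
    # Tiny synthetic frame standing in for manchester_housing_data.csv
    demo = pd.DataFrame({
        "Price": [250000.0, 310000.0, None],
        "Waterfront": [0, 1, 0],
        "Amenities": ["Garden", None, "Garage"],
    })
    summary, absent = profile_columns(demo)
    print(summary)
    print("Absent columns:", absent)
```

On the real data this would be called as `profile_columns(data)` straight after `pd.read_csv`.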

Descriptive Statistics

We computed descriptive statistics for waterfront properties to understand their characteristics and variability. The findings indicated that waterfront properties generally command higher prices, offer more spacious living areas and come with a wider range of amenities than non-waterfront properties.
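The appendix code describes only the waterfront subset (`data[data['Waterfront'] == 1].describe()`), so the comparison with non-waterfront homes is not shown directly. One way to put both groups side by side is a `groupby` on `Waterfront`. The sketch below assumes `Waterfront` is coded 0/1 as in the appendix; the helper name and the demo values are illustrative only.

```python
import pandas as pd


def compare_by_waterfront(df: pd.DataFrame, columns=("Price", "Floor Space")) -> pd.DataFrame:
    """Count, mean, median and standard deviation of each column, split by Waterfront (0/1)."""
    return df.groupby("Waterfront")[list(columns)].agg(["count", "mean", "median", "std"])


if __name__ == "__main__":
    # Illustrative values only, not taken from the Manchester dataset
    demo = pd.DataFrame({
        "Waterfront": [0, 0, 0, 1, 1],
        "Price": [200000, 220000, 240000, 300000, 360000],
        "Floor Space": [900, 1000, 1100, 1400, 1600],
    })
    print(compare_by_waterfront(demo))
```

The `count` column is worth reading first: if the waterfront group is small, its mean and median are far less certain than those of the non-waterfront group.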

Data Preparation

Data Cleaning and Transformation

Missing values must be identified and handled before analysis. We replaced missing values in categorical variables with the most frequently occurring value (the mode) and checked data types, converting them where necessary. This ensured that all records were usable and consistently typed for analysis.
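A minimal sketch of these two preparation steps, generalised from the appendix (which imputes only `Amenities` and casts four columns with `astype`). It converts the key numeric columns first, so that, for example, a price stored as text is not treated as a category, and then fills gaps in every non-numeric column with its mode. The helper name `prepare` and the use of pandas' nullable `Int64` dtype are assumptions; `Int64` is chosen because a plain `astype(int)` raises an error if any missing values remain.

```python
import pandas as pd

# Assumed target dtypes for the columns the appendix converts; "Int64" is pandas'
# nullable integer type, which tolerates missing values where astype(int) would fail.
NUMERIC_TYPES = {"Price": "float64", "Floor Space": "float64",
                 "Waterfront": "Int64", "Year Built": "Int64"}


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce key numeric columns, then mode-impute every remaining non-numeric column."""
    out = df.copy()

    # 1. Numeric conversion first; entries that cannot be parsed become missing.
    for col, dtype in NUMERIC_TYPES.items():
        if col in out.columns:
            out[col] = pd.to_numeric(out[col], errors="coerce").astype(dtype)

    # 2. Fill gaps in non-numeric (text/category) columns with the most frequent value.
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        modes = out[col].mode(dropna=True)
        if not modes.empty:  # an all-missing column has no mode, so it is left unchanged
            out[col] = out[col].fillna(modes.iloc[0])
    return out


if __name__ == "__main__":
    raw = pd.DataFrame({
        "Price": ["250000", "n/a", "310000"],
        "Waterfront": [0, None, 1],
        "Year Built": [1990, 2005, None],
        "Amenities": ["Garden", None, "Garden"],
    })
    clean = prepare(raw)
    print(clean)
    print(clean.dtypes)
```

Numeric gaps are deliberately left missing here, matching the report, which describes imputation only for categorical variables.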

Statistical Test: T-test

An independent two-sample t-test was performed to compare the mean prices of waterfront and non-waterfront properties. It gave a t-statistic of 0.210 and a p-value of 0.836. Because the p-value is far above the conventional 0.05 significance level, the test provides no evidence of a statistically significant difference in mean price between waterfront and non-waterfront properties.
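Applied to the reported result, the decision rule is simply p = 0.836 > 0.05, so the null hypothesis of equal mean prices is not rejected. The sketch below wraps that rule in a helper. It deliberately swaps in Welch's t-test (`equal_var=False`), which does not assume the two groups share a variance; the appendix uses SciPy's default pooled-variance test, so Welch's version could give slightly different numbers on the real data. The helper name and the synthetic samples are illustrative.

```python
import numpy as np
from scipy import stats


def compare_prices(group_a, group_b, alpha=0.05):
    """Welch's two-sample t-test on two price samples; returns (t, p, significant_at_alpha)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    a, b = a[~np.isnan(a)], b[~np.isnan(b)]  # drop missing prices explicitly
    t_stat, p_val = stats.ttest_ind(a, b, equal_var=False)
    return float(t_stat), float(p_val), bool(p_val < alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic prices with the same mean but unequal spreads and group sizes
    waterfront = rng.normal(300_000, 80_000, size=40)
    non_waterfront = rng.normal(300_000, 50_000, size=400)
    t, p, significant = compare_prices(waterfront, non_waterfront)
    print(f"t = {t:.3f}, p = {p:.3f}, significant at 5%: {significant}")
```

On the real data the call would be `compare_prices(waterfront_prices, non_waterfront_prices)`, using the two Series defined in the appendix.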

Central Limit Theorem Demonstration

To demonstrate the Central Limit Theorem, we drew 1,000 random samples of 30 properties each (with replacement) from the Price column and plotted a histogram of the sample means. The distribution of these means resembled a normal distribution. This illustrates the theorem: as the sample size grows, the sampling distribution of the mean price approaches a normal distribution, whether or not the underlying price distribution is normal.
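The simulation can be extended to show two practical consequences of the theorem: the spread of the sample means shrinks roughly as σ/√n, and their skewness fades as n grows. The sketch below uses a right-skewed lognormal population as a stand-in for the `Price` column (an assumption, since the real distribution was not reported) and, like the appendix, draws 1,000 samples with replacement for each sample size. The helper names are hypothetical.

```python
import numpy as np


def sample_means(population, sample_size, n_samples=1000, seed=0):
    """Means of n_samples random samples of size sample_size, drawn with replacement."""
    rng = np.random.default_rng(seed)
    draws = rng.choice(np.asarray(population), size=(n_samples, sample_size), replace=True)
    return draws.mean(axis=1)


def skewness(x):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / x.std() ** 3


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Right-skewed synthetic "prices" standing in for data['Price'] (assumption)
    prices = rng.lognormal(mean=12.5, sigma=0.5, size=5000)
    sigma = prices.std()
    print(f"population skewness = {skewness(prices):.2f}")
    for n in (5, 30, 200):
        means = sample_means(prices, n)
        print(f"n = {n:3d}: sd of means = {means.std():9.0f}, "
              f"sigma/sqrt(n) = {sigma / np.sqrt(n):9.0f}, skewness = {skewness(means):.2f}")
```

The two standard-deviation columns should track each other closely, while the skewness column moves towards zero as n increases.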
Modeling and Analysis

Correlation Analysis

Correlation matrices were computed before and after data preprocessing to understand
relationships between numeric variables. Key correlations identified include:

• A moderate positive correlation (0.390) between Floor Space and Price.

• A weak positive correlation (0.094) between Year Built and Price.

• A very weak positive correlation (0.045) between Waterfront status and Price.

Heatmaps were used to visualize these correlations, highlighting the relationships between
different property attributes.
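The correlations above are Pearson coefficients, which measure linear association and can be pulled around by a few very expensive properties. Reporting a rank-based Spearman coefficient alongside them is a cheap robustness check. The sketch below, with a hypothetical helper name, ranks every numeric column by the absolute size of its Pearson correlation with `Price`; it assumes `Price` is already numeric.

```python
import pandas as pd


def price_correlations(df: pd.DataFrame, target: str = "Price") -> pd.DataFrame:
    """Pearson and Spearman correlation of each numeric column with target, strongest Pearson first."""
    numeric = df.select_dtypes(include="number")
    table = pd.DataFrame({
        "pearson": numeric.corr(method="pearson")[target],
        "spearman": numeric.corr(method="spearman")[target],  # rank-based, less sensitive to outliers
    }).drop(index=target)
    order = table["pearson"].abs().sort_values(ascending=False).index
    return table.loc[order]


if __name__ == "__main__":
    # Illustrative values only
    demo = pd.DataFrame({
        "Price": [180000, 210000, 250000, 320000, 900000],
        "Floor Space": [800, 950, 1100, 1300, 1500],
        "Year Built": [1965, 2001, 1988, 2010, 1995],
    })
    print(price_correlations(demo).round(3))
```

A large gap between the two columns for a variable usually points to outliers or a non-linear but monotonic relationship.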

Visualizations

Several plots were created to visualize relationships between variables:

• Distribution of Floor Space: This histogram showed the spread and central tendency of
floor space across properties.

• Year Built vs. Price: A scatter plot showed a weak positive trend, consistent with the correlation of 0.094, suggesting that newer properties tend to be priced slightly higher.

• Floor Space vs. Price: A scatter plot demonstrated a clear positive relationship,
suggesting that larger properties command higher prices.

• Waterfront vs. Price: A box plot showed that waterfront properties generally have higher
median prices, though the variability within each category was considerable.

Evaluation

The analysis revealed key insights:

• There is a moderate positive correlation between Floor Space and Price.

• Bedrooms and Bathrooms have strong positive correlations with Price.

• Waterfront status has little measurable effect on Price: the t-test found no statistically significant difference in mean price (p = 0.836).

These findings suggest that floor space and the number of bedrooms are the factors most strongly associated with property prices, while year built and waterfront status matter less.
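Labels such as "moderate" and "minor" describe a point estimate; a confidence interval shows how much uncertainty surrounds it. A standard approximation uses the Fisher z-transform, z = atanh(r), whose standard error is 1/√(n − 3). The sketch below applies it to the coefficients reported above. The sample size in the demo is hypothetical, because the report does not state how many properties the dataset contains, and the helper name is illustrative.

```python
import math


def correlation_ci(r: float, n: int, z_crit: float = 1.959964):
    """Approximate 95% confidence interval for a Pearson correlation via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    n_hypothetical = 500  # assumed; use len(data) for the real dataset
    for label, r in [("Floor Space", 0.390), ("Year Built", 0.094), ("Waterfront", 0.045)]:
        lo, hi = correlation_ci(r, n_hypothetical)
        print(f"{label:12s} r = {r:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```

If an interval straddles zero, the data cannot distinguish that correlation from no association at all at the 5% level.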

Recommendations

1. Focus on Floor Space and Amenities: Properties with larger floor space and better amenities should be priced higher, as these factors are most strongly associated with higher property prices.

2. Year Built Consideration: Newer properties are only slightly more valuable, so construction year matters less than floor space and amenities.

3. Investment in Non-Waterfront Properties: Given that the t-test found no statistically significant price difference between waterfront and non-waterfront properties, investing in well-located non-waterfront properties with good amenities may be more cost-effective.

This investigation of the Manchester Housing dataset has clarified which factors are associated with property prices. By following the CRISP-DM framework, we examined the data, applied statistical and visual analysis techniques and drew conclusions to guide pricing and investment decisions.
Appendix

# Importing required libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy import stats

# Path of Manchester Housing dataset

file_path = r'C:\Users\Administrator\Desktop\DataAnalytics\manchester_housing_data.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataset

print("First 10 rows of the dataset:")
print(data.head(10))

# Descriptive statistics for waterfront properties

print("Descriptive statistics for waterfront properties:")
waterfront_properties = data[data['Waterfront'] == 1]
print(waterfront_properties.describe())

# Graph the distribution of floor space

plt.figure(figsize=(10, 6))
sns.histplot(data['Floor Space'], kde=True)
plt.title('Distribution of Floor Space')
plt.xlabel('Floor Space (sq ft)')
plt.ylabel('Frequency')
plt.show()

# Correlation matrix for numeric columns

print("\nCorrelation matrix for numeric columns:")
numeric_cols = data.select_dtypes(include=[np.number])
correlation_matrix = numeric_cols.corr()
print(correlation_matrix)

# Visualize the correlation matrix

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

# Scatter plot for Year Built vs. Price

plt.figure(figsize=(10, 6))
sns.scatterplot(x='Year Built', y='Price', data=data)
plt.title('Year Built vs. Price')
plt.xlabel('Year Built')
plt.ylabel('Price')
plt.show()
# Scatter plot for Floor Space vs. Price
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Floor Space', y='Price', data=data)
plt.title('Floor Space vs. Price')
plt.xlabel('Floor Space (sq ft)')
plt.ylabel('Price')
plt.show()

# Box plot for Waterfront vs. Price

plt.figure(figsize=(10, 6))
sns.boxplot(x='Waterfront', y='Price', data=data)
plt.title('Waterfront vs. Price')
plt.xlabel('Waterfront')
plt.ylabel('Price')
plt.show()

# Correlation between Floor Space and Price

correlation_floor_space_price = data['Floor Space'].corr(data['Price'])
print(f"\nCorrelation between Floor Space and Price: {correlation_floor_space_price:.3f}")

# Correlation between Year Built and Price

correlation_year_price = data['Year Built'].corr(data['Price'])
print(f"Correlation between Year Built and Price: {correlation_year_price:.3f}")

# Central Limit Theorem

sample_means = []
for _ in range(1000):
    sample = data['Price'].sample(30, replace=True)
    sample_means.append(sample.mean())

plt.figure(figsize=(10, 6))
sns.histplot(sample_means, kde=True)
plt.title('Sampling Distribution of the Sample Mean [Central Limit Theorem]')
plt.xlabel('Sample Mean of Price')
plt.ylabel('Frequency')
plt.show()

# T-test (Statistical test) to compare prices of waterfront vs. non-waterfront properties

print("\nPerforming T-test to compare prices of waterfront vs. non-waterfront properties:")
waterfront_prices = data[data['Waterfront'] == 1]['Price']
non_waterfront_prices = data[data['Waterfront'] == 0]['Price']

t_stat, p_val = stats.ttest_ind(waterfront_prices, non_waterfront_prices)

print(f"Results: t-statistic = {t_stat:.3f}, p-value = {p_val:.3f}")
# Identifying missing values in data
print("\nIdentifying missing values in the dataset:")
missing_values = data.isnull().sum()
print("Missing Values in Dataset:\n", missing_values)
# Impute missing values
data['Amenities'] = data['Amenities'].fillna(data['Amenities'].mode()[0])

# Checking data types and converting them if necessary

data['Price'] = data['Price'].astype(float)
data['Waterfront'] = data['Waterfront'].astype(int)
data['Floor Space'] = data['Floor Space'].astype(float)
data['Year Built'] = data['Year Built'].astype(int)

# To use only numeric columns for correlation

numeric_cols_post = data.select_dtypes(include=[np.number])
correlation_matrix_post = numeric_cols_post.corr()
print("\nCorrelation Matrix after Preprocessing:\n", correlation_matrix_post)

# Visualize the updated correlation matrix

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix_post, annot=True, cmap='coolwarm')
plt.title('Updated Correlation Matrix')
plt.show()

Output (in sequence)
