
Group Assignment #1

The assignment for FIN 4503 requires students to work in groups on a case study involving the Boston Housing dataset, which includes 167 cases and 11 attributes. Students must handle missing data and identify outliers in the PTRATIO predictor, highlighting them and providing possible causes. Additionally, they must substitute missing data with NaN and provide Python code for data omission and imputation, with a submission deadline of January 26, 2023.


FIN 4503 – Winter 2023

ASSIGNMENT #1 (5%)
INSTRUCTIONS:

• This assignment is to be completed in groups.
• Use MLA or APA format.
• The solutions must be submitted via Blackboard through the assignment’s link.
• Due date: midnight of 01/26/2023.

CASE STUDY:

The Boston Housing dataset contains data collected by the U.S. Census Service concerning housing in
the area of Boston, Massachusetts. It was obtained from the StatLib archive
(http://lib.stat.cmu.edu/datasets/boston). The dataset has 167 cases.
The data were originally published by Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the
demand for clean air', J. Environ. Economics & Management, vol. 5, pp. 81-102, 1978.

The BostonHousing.xlsx file has 11 attributes and contains several imperfections (missing values
and outliers). As described earlier, most algorithms will not process records with these
imperfections.
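Before editing the sheet, it can help to locate the imperfections programmatically. The sketch below is a minimal example, not a prescribed solution: it assumes the workbook has already been loaded into a pandas DataFrame (for instance with `pd.read_excel("BostonHousing.xlsx")`, which reads empty cells as NaN and needs the openpyxl package for .xlsx files). The column names and values in the demo frame are hypothetical placeholders, not the actual contents of the file, and the helper names `missing_counts` and `missing_cells` are illustrative. PTRATIO is skipped in the cell listing because Part A treats it separately.

```python
import numpy as np
import pandas as pd


def missing_counts(df: pd.DataFrame) -> dict:
    """Number of missing (NaN/None) cells in each column."""
    return {col: int(n) for col, n in df.isna().sum().items()}


def missing_cells(df: pd.DataFrame, exclude=("PTRATIO",)) -> list:
    """(row label, column name) for every missing cell outside the excluded columns."""
    cols = [c for c in df.columns if c not in exclude]
    mask = df[cols].isna().to_numpy()
    rows, col_pos = np.nonzero(mask)  # row-major order: row by row, left to right
    return [(df.index[int(r)], cols[int(c)]) for r, c in zip(rows, col_pos)]


if __name__ == "__main__":
    # Hypothetical rows; the real file's columns and values will differ.
    demo = pd.DataFrame({
        "CRIM": [0.006, np.nan, 0.027],
        "RM": [6.575, 6.421, np.nan],
        "PTRATIO": [15.3, np.nan, 17.8],
    })
    print(missing_counts(demo))  # {'CRIM': 1, 'RM': 1, 'PTRATIO': 1}
    print(missing_cells(demo))   # [(1, 'CRIM'), (2, 'RM')]
```

Row labels here are pandas' zero-based index, so with a single header row in the sheet, adding 2 gives the Excel row number of each cell to highlight.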

REQUIREMENTS:
PART A
Use the provided data file in the following tasks:
1. For every predictor except PTRATIO, perform the necessary “Handling Missing Data” operations
on the missing values and highlight those cells in yellow.

2. Find possible outliers in the PTRATIO predictor. The possible causes of outliers are:
(a) A non-numeric value was typed.
(b) The decimal point was shifted during data entry.
(c) The case is a genuine outlier.
Highlight the cells with outlier cases and state the possible cause, indicating a, b, or c.
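The three causes listed above can be screened for with a simple rule of thumb. The sketch below is one possible heuristic, not the method the course prescribes: it flags PTRATIO values outside Tukey's fences (Q1 - 1.5 x IQR and Q3 + 1.5 x IQR), labels a non-blank cell that cannot be parsed as a number as (a), labels a numeric outlier as (b) when dividing or multiplying it by 10 or 100 brings it back inside the fences, and otherwise labels it (c). The 1.5 multiplier, the shift factors, and the function name `classify_ptratio_outliers` are all assumptions, and the sample values are invented for illustration.

```python
import numpy as np
import pandas as pd


def classify_ptratio_outliers(values: pd.Series, k: float = 1.5) -> dict:
    """Map row label -> 'a', 'b' or 'c' for suspected PTRATIO outliers.

    Heuristic labels (assumed, not prescribed by the assignment):
      a: the cell holds text that cannot be parsed as a number
      b: outside the fences, but shifting the decimal point one or two
         places brings it inside them (likely a data-entry slip)
      c: outside the fences with no decimal shift that explains it
    """
    numeric = pd.to_numeric(values, errors="coerce")  # unparseable text -> NaN
    q1, q3 = numeric.quantile([0.25, 0.75])           # NaN is ignored here
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr

    labels = {}
    for idx, raw in values.items():
        x = numeric.loc[idx]
        if pd.isna(x):
            blank = pd.isna(raw) or (isinstance(raw, str) and not raw.strip())
            if not blank:              # something was typed, but it is not a number
                labels[idx] = "a"
            continue                   # truly blank cells are missing data, not outliers
        if lo <= x <= hi:
            continue
        shifted = (x / 10, x / 100, x * 10, x * 100)
        labels[idx] = "b" if any(lo <= s <= hi for s in shifted) else "c"
    return labels


if __name__ == "__main__":
    # Invented sample column: 187.0 mimics a slipped decimal, "1S.2" a typo,
    # 45.0 an extreme value, NaN a blank cell.
    ptratio = pd.Series([15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2,
                         15.2, 21.0, 187.0, "1S.2", 45.0, np.nan])
    print(classify_ptratio_outliers(ptratio))  # {10: 'b', 11: 'a', 12: 'c'}
```

Because the fences are computed from data that still contains the suspect entries, a column with many bad values can widen them, so each flagged cell still deserves a manual check before it is highlighted.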

PART B
Use the provided data file in the following tasks:
1. Replace the missing data with NaN (not a number).
2. Write and provide Python code to implement:
A. Omission
B. Imputation
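Part B can be sketched with pandas as follows. This is a minimal example under stated assumptions, not the expected submission: the placeholder strings treated as missing (`MISSING_MARKERS`) are guesses and should be adjusted to whatever the file actually contains, omission is taken to mean listwise deletion (dropping any record that has a NaN), and imputation fills each numeric column with its median or mean. The function names are illustrative and the demo frame is invented. Note that `pd.read_excel` already turns empty cells into NaN, so the first step mainly catches text placeholders.

```python
import numpy as np
import pandas as pd

# Assumed placeholders for "missing"; adjust to what the file really uses.
MISSING_MARKERS = {"", "?", "NA", "N/A", "-"}


def to_nan(df: pd.DataFrame, markers=MISSING_MARKERS) -> pd.DataFrame:
    """Task 1: replace blank or placeholder text with NaN, then make columns numeric."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:
            out[col] = out[col].map(
                lambda v: np.nan if isinstance(v, str) and v.strip() in markers else v
            )
            try:
                out[col] = pd.to_numeric(out[col])
            except (ValueError, TypeError):
                pass  # leave columns with genuine non-numeric entries (e.g. typos) as text
    return out


def omit(df: pd.DataFrame) -> pd.DataFrame:
    """Task 2A (omission): drop every record that contains at least one NaN."""
    return df.dropna(how="any")


def impute(df: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Task 2B (imputation): fill NaN in numeric columns with the column median or mean."""
    if strategy not in ("median", "mean"):
        raise ValueError("strategy must be 'median' or 'mean'")
    out = df.copy()
    num = out.select_dtypes(include="number").columns
    fill = out[num].median() if strategy == "median" else out[num].mean()
    out[num] = out[num].fillna(fill)
    return out


if __name__ == "__main__":
    # Invented rows with a "?" and an empty string standing in for missing values.
    raw = pd.DataFrame({
        "CRIM": [0.006, "?", 0.027, 0.032],
        "RM": [6.575, 6.421, "", 6.998],
        "PTRATIO": [15.3, 17.8, 17.8, 18.7],
    })
    clean = to_nan(raw)
    print(omit(clean))    # keeps only rows 0 and 3
    print(impute(clean))  # CRIM row 1 -> 0.027, RM row 2 -> 6.575
```

The median is the default here because it is less sensitive than the mean to extreme values such as the PTRATIO outliers from Part A; passing `strategy="mean"` makes it easy to compare the two.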
