VIT – AP University Amaravati
Problem Set-I
Course Title: Fundamentals of Data Science
Class: M.Sc.
Instructor’s name: Dr. Aastha
Semester: Fall Freshers 2023-24
Download the dataset shared via mail and perform the following analysis using Python.
Q.1) Print the first five rows of the dataframe and use a suitable function to extract the column
names and dimension of the dataframe.
Q.2) Extract the datatype of each column.
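
As a starting point for Q.1 and Q.2, the sketch below shows one possible way to inspect a dataframe with pandas. The small dataframe and its column names ("Project", "Estimated Cost", "Revised Cost") are hypothetical stand-ins for the shared dataset, which would normally be loaded from the downloaded file instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the shared dataset (assumed column names).
df = pd.DataFrame({
    "Project": ["A", "B", "C", "D", "E", "F"],
    "Estimated Cost": [100.0, 250.0, np.nan, 400.0, 120.0, 90.0],
    "Revised Cost": [110.0, np.nan, 300.0, 450.0, 125.0, 95.0],
})

first_five = df.head()            # first five rows
column_names = list(df.columns)   # column names
dimension = df.shape              # (number of rows, number of columns)
column_types = df.dtypes          # datatype of each column

print(first_five)
print(column_names)
print(dimension)
print(column_types)
```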
Q.3) What is the count of missing values in each column of the dataframe? Also find the
total number of missing values and express it as a percentage.
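
For Q.3, a minimal sketch of counting missing values is given below. It assumes the percentage is taken relative to all cells in the dataframe (rows times columns); the toy dataframe and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "Remarks" is an entirely empty column.
df = pd.DataFrame({
    "Estimated Cost": [100.0, np.nan, 300.0, 400.0],
    "Revised Cost": [110.0, 220.0, np.nan, np.nan],
    "Remarks": [np.nan, np.nan, np.nan, np.nan],
})

missing_per_column = df.isna().sum()            # missing count per column
total_missing = int(missing_per_column.sum())   # missing count overall
missing_pct = 100 * total_missing / df.size     # df.size = rows * columns

print(missing_per_column)
print(total_missing, round(missing_pct, 2))
```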
Q.4) What is the dimension of the dataframe after removing the rows with missing values? Is the
dimension the same after removing the columns with missing values?
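
For Q.4, the two removals can be compared with `DataFrame.dropna`, as in the sketch below. The data is a hypothetical example, so the shapes printed here say nothing about the actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few gaps.
df = pd.DataFrame({
    "Project": ["A", "B", "C", "D"],
    "Estimated Cost": [100.0, np.nan, 300.0, 400.0],
    "Revised Cost": [110.0, 220.0, np.nan, 440.0],
})

rows_removed = df.dropna(axis=0)   # drop every row containing a missing value
cols_removed = df.dropna(axis=1)   # drop every column containing a missing value

print(rows_removed.shape)
print(cols_removed.shape)
```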
Q.5) Drop the entirely empty columns and then impute the new dataframe with forward and
backward fill. Also, extract the names of the columns that were dropped. Next, obtain a fresh
new dataframe and impute the missing values with the mean of the column.
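
One possible way to approach Q.5 is sketched below on hypothetical data. The question does not say whether the fresh dataframe should keep the empty columns; this sketch assumes it starts from the dataframe with the empty columns already dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "Remarks" is an entirely empty column.
df = pd.DataFrame({
    "Estimated Cost": [np.nan, 200.0, np.nan, 400.0],
    "Revised Cost": [110.0, np.nan, 330.0, np.nan],
    "Remarks": [np.nan, np.nan, np.nan, np.nan],
})

# Drop columns in which every value is missing and record their names.
non_empty = df.dropna(axis=1, how="all")
dropped_columns = df.columns.difference(non_empty.columns).tolist()

# Forward fill first, then backward fill to cover leading gaps.
filled = non_empty.ffill().bfill()

# Fresh copy (assumption: without the empty columns), imputed with column means.
fresh = non_empty.copy()
mean_imputed = fresh.fillna(fresh.mean(numeric_only=True))

print(dropped_columns)
print(filled)
print(mean_imputed)
```

Forward fill alone leaves a gap when the first value of a column is missing, which is why the backward fill is chained after it here.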
Q.6) Calculate the mean and median for the estimated cost and revised cost. What is your
observation?
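
For Q.6, the mean and median of both columns can be computed in one call, as sketched below. The column names and values are hypothetical; note that pandas skips missing values by default when computing these statistics.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one large value and one missing value.
df = pd.DataFrame({
    "Estimated Cost": [100.0, 200.0, 300.0, 1400.0],
    "Revised Cost": [110.0, 220.0, 330.0, np.nan],
})

# Rows of the result are the statistics, columns are the cost columns.
summary = df[["Estimated Cost", "Revised Cost"]].agg(["mean", "median"])

print(summary)
```

As a general hint, a mean well above the median usually indicates a few very large values pulling the mean upward.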
Q.7) Create a new cleaned dataframe, after applying backward/forward fill, that contains only two
columns, namely the Revised Cost and the Estimated Cost. Transform the dataframe using the following
normalization techniques: min-max scaling and z-score.
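
For Q.7, both transformations can be written directly with pandas arithmetic, as in the sketch below. The data and column names are hypothetical, and the z-score here uses the population standard deviation (`ddof=0`) as an assumption; pandas defaults to the sample standard deviation (`ddof=1`), which gives slightly different values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data restricted to the two cost columns.
df = pd.DataFrame({
    "Estimated Cost": [100.0, np.nan, 300.0, 500.0],
    "Revised Cost": [np.nan, 220.0, 330.0, 440.0],
})

# Cleaned dataframe: forward fill, then backward fill for leading gaps.
costs = df[["Revised Cost", "Estimated Cost"]].ffill().bfill()

# Min-max scaling: x' = (x - min) / (max - min), mapped to [0, 1] per column.
min_max = (costs - costs.min()) / (costs.max() - costs.min())

# Z-score: z = (x - mean) / std, with population std (ddof=0) assumed here.
z_score = (costs - costs.mean()) / costs.std(ddof=0)

print(min_max)
print(z_score)
```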