You will be given a COVID-19 dataset, read in with Python code, and are required to complete the tasks below:
1. (4.5 marks, 1.5 marks for each) Well-structured Python Jupyter Notebook
a. Contains title cells and subtitle cells.
b. Python code is reasonably separated into groups (code cells) by functionality:
Code for importing libraries
Code for loading data
Code for data cleaning
Code for data analysis
Code for visualizations
c. Contains meaningful comments and sensible variable and function names.
2. (10 marks, 2 marks each) Download the CSV data with pandas using the code below:
import pandas as pd
deaths_df = pd.read_csv('[Link] 19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_[Link]')
Identify and handle missing values in the dataset.
Remove duplicate entries if any.
Convert the date column to a consistent date format (e.g., YYYY-MM-DD).
Save the cleaned dataset to a new CSV file.
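As a hint, the four cleaning steps above could be sketched as follows. This is a minimal sketch on an invented toy table, not a model answer: it assumes a wide layout with one column per date written as M/D/YY, and the names toy_df, clean_df and deaths_cleaned.csv are hypothetical. Check the real columns of deaths_df before adapting it.

```python
import pandas as pd

# Toy stand-in for the loaded data; column names and values are invented
# and only assume a wide layout with one column per date.
toy_df = pd.DataFrame({
    "Province/State": [None, None, "X"],
    "Country/Region": ["A", "A", "B"],
    "1/22/20": [0, 0, 1],
    "1/23/20": [1, 1, None],
})

id_cols = ["Province/State", "Country/Region"]
date_cols = [c for c in toy_df.columns if c not in id_cols]

clean_df = toy_df.copy()

# Missing identifiers: replace with a placeholder label.
clean_df["Province/State"] = clean_df["Province/State"].fillna("Unknown")

# Missing counts: carry the previous day's value forward, then fall back to 0.
# This is one possible choice and assumes the counts are cumulative.
clean_df[date_cols] = clean_df[date_cols].ffill(axis=1).fillna(0).astype(int)

# Remove exact duplicate rows.
clean_df = clean_df.drop_duplicates().reset_index(drop=True)

# Rename the date columns to the YYYY-MM-DD format.
iso_dates = pd.to_datetime(date_cols, format="%m/%d/%y").strftime("%Y-%m-%d")
clean_df = clean_df.rename(columns=dict(zip(date_cols, iso_dates)))

# Save the cleaned table; the file name is a hypothetical choice.
clean_df.to_csv("deaths_cleaned.csv", index=False)
```

Forward-filling is only sensible for cumulative counts; for other kinds of columns, dropping the rows or filling with a fixed value may be more appropriate.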
3. (3 marks) Display the first 5 rows of the loaded data (1 mark) and write a short summary of the data (2 marks).
4. (2.5 marks) Calculate the mean and median of the daily cases.
5. (3 marks) Get the daily death cases worldwide (hint: sum the daily death cases over all countries).
6. (3 marks) Get the daily increase in death cases by defining a function (hint: subtract yesterday's death cases from today's, using the data obtained in task 5).
7. (4 marks) Visualize the data obtained in task 5 with the matplotlib library.
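
As a hint for tasks 4 to 7, the worldwide totals, the daily increase function and the plot could be sketched as follows. This is a rough sketch on invented toy values, assuming one row per country and one cumulative count per date column; the names toy_df, worldwide and daily_increase are hypothetical and not part of the provided code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy cumulative death counts: one row per country, one column per date.
# All values are invented for illustration.
toy_df = pd.DataFrame({
    "Country/Region": ["A", "B"],
    "2020-01-22": [0, 1],
    "2020-01-23": [2, 1],
    "2020-01-24": [5, 4],
})


def daily_increase(cumulative):
    """Return each day's total minus the previous day's total."""
    # The first date has no previous day, so its increase is set to 0 here.
    return cumulative.diff().fillna(0).astype(int)


# Task 5: sum over all countries for each date.
date_cols = [c for c in toy_df.columns if c != "Country/Region"]
worldwide = toy_df[date_cols].sum(axis=0)
worldwide.index = pd.to_datetime(worldwide.index)

# Task 6: day-over-day difference of the worldwide totals.
increase = daily_increase(worldwide)

# Task 4: mean and median of the daily values.
mean_increase = increase.mean()
median_increase = increase.median()

# Task 7: line plot of the worldwide totals.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(worldwide.index, worldwide.values, marker="o", label="Worldwide deaths")
ax.set_xlabel("Date")
ax.set_ylabel("Death cases")
ax.set_title("Worldwide COVID-19 death cases (toy data)")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

In a Jupyter Notebook the figure is displayed at the end of the cell. Whether the first day's increase should be 0, the first total, or left missing is a design choice worth stating in a comment.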