Assignment 3

The document outlines a step-by-step procedure for analyzing a dataset using Python's pandas library. It details how to load the dataset, manipulate and group the data, and extract insights such as the states with the most and least vaccines distributed, the model with the highest distribution, and average daily distributions for specific vaccines. The process includes using functions like read_excel(), iloc, tolist(), and various indexing methods to derive the required information.

Uploaded by

paulsidira10

Lilongwe University of Agriculture and Natural Resources

NRC CAMPUS
TO: Mr. James Kambere.
FROM: Bright Manda
PASSCODE: 240302107
Step-by-step Procedure

*I take the path of the dataset through user input, then pass it to a function that takes the path as a parameter.

*Inside the function, the dataset path is passed to the read_excel() function of pandas to load it, then iloc is used to slice the first ten rows.

*To work with the dataset more easily, I created a list from each required column using the tolist() method.
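The loading steps above can be sketched roughly as follows. This is a minimal sketch, not the submitted code: the function names `load_first_rows` and `columns_to_lists` and the column names `Location` and `Distributed_Novavax` are assumptions made for illustration, and it assumes that "slice the first ten rows" means keeping rows 0 to 9. The demo values are made up.

```python
import pandas as pd


def load_first_rows(path, n_rows=10):
    """Load the Excel file at `path` and keep only the first `n_rows` rows."""
    # Reading .xlsx files needs an Excel engine such as openpyxl to be installed.
    df = pd.read_excel(path)
    return df.iloc[:n_rows]


def columns_to_lists(df, columns):
    """Return a dict mapping each column name to a plain Python list of its values."""
    return {name: df[name].tolist() for name in columns}


# Typical use (not executed here, since it needs a real file):
#   path = input("Path to the dataset: ")
#   lists = columns_to_lists(load_first_rows(path), ["Location", "Distributed_Novavax"])

# Tiny in-memory stand-in for the spreadsheet, with made-up values.
demo = pd.DataFrame({"Location": ["KY", "TX", "KY"], "Distributed_Novavax": [5, 7, 9]})
demo_lists = columns_to_lists(demo.iloc[:2], ["Location", "Distributed_Novavax"])
print(demo_lists)
```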

*Then I grouped the data by Location, one model at a time, to remove duplicate states and sum the values for each state.
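As an illustration of the grouping step, the sketch below sums every model column per state in one call, whereas the procedure above groups the models one by one. The column names and numbers are hypothetical placeholders, not values from the assignment's dataset.

```python
import pandas as pd

# Made-up rows: each state appears more than once, as in the non-grouped data.
df = pd.DataFrame({
    "Location": ["KY", "TX", "KY", "TX"],
    "Distributed_Moderna": [10, 20, 30, 40],
    "Distributed_Janssen": [1, 2, 3, 4],
})

# One row per state, with the values of each model summed.
grouped = df.groupby("Location", as_index=False)[
    ["Distributed_Moderna", "Distributed_Janssen"]
].sum()

# Lists built from the grouped data, aligned by index.
grouped_locations = grouped["Location"].tolist()
grouped_moderna = grouped["Distributed_Moderna"].tolist()
grouped_janssen = grouped["Distributed_Janssen"].tolist()
print(grouped_locations, grouped_moderna, grouped_janssen)
```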

*Then I created lists from the grouped data.

-To find the state with the most vaccines distributed in total, I find the index of the maximum value in the grouped list of total distributed, then use that index to access the corresponding state in the location list.

*The same applies to the state with the fewest vaccines distributed in total: the index of the minimum value in the grouped list of total distributed is used to access its corresponding state in the location list.
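The index lookup used for the maximum and minimum can be sketched with plain lists. The helper name `state_at_extreme` and the sample values are assumptions for illustration. Note that `list.index` returns the first match, so if two states tie, the earlier one in the list is reported.

```python
def state_at_extreme(locations, totals, pick=max):
    """Return the state whose total is the max (or min, if pick=min) of `totals`."""
    idx = totals.index(pick(totals))  # position of the extreme value
    return locations[idx]             # the same position in the location list


# Made-up grouped lists, aligned by index.
grouped_locations = ["KY", "TX", "CA"]
grouped_totals = [40, 60, 25]

most = state_at_extreme(grouped_locations, grouped_totals, max)
least = state_at_extreme(grouped_locations, grouped_totals, min)
print(most, least)
```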

*To find the model with the highest number of vaccines distributed, I sum the values in each model's list separately, then output the model with the highest sum.
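One way to compare the per-model sums is shown below. This is a sketch under assumptions: holding the lists in a dictionary is a choice made here for brevity, and the numbers are invented.

```python
# Made-up per-model lists (one value per row of the data).
model_lists = {
    "Moderna": [10, 20, 30],
    "Janssen": [1, 2, 3],
    "Novavax": [5, 7, 9],
}

# Sum each model's list separately.
model_totals = {name: sum(values) for name, values in model_lists.items()}

# Pick the model whose sum is largest.
top_model = max(model_totals, key=model_totals.get)
print(top_model, model_totals[top_model])
```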

*To find the state with the most Janssen distributed, the index of the maximum value in the Janssen list of the grouped data is used to find its corresponding state in the grouped location list.

*To find the day with the most Novavax distributed, I used the index of the maximum value in the Novavax list of the non-grouped data to find its corresponding day in the date list.
-I used the datetime module to format the day readably.
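The date lookup and formatting might look like the sketch below. The dates and counts are invented, and the format string is only one possible choice; the source does not say which format was used.

```python
from datetime import datetime

# Made-up non-grouped data: one date and one Novavax count per row.
dates = [datetime(2022, 8, 1), datetime(2022, 8, 2), datetime(2022, 8, 3)]
novavax = [120, 80, 300]

# Index of the largest Novavax value, then the date at the same index.
peak_index = novavax.index(max(novavax))
peak_day = dates[peak_index]

# Weekday name, day of month, month name, year.
formatted = peak_day.strftime("%A, %d %B %Y")
print(formatted)
```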

*To find the average number of Moderna vaccines distributed in KY per day, I took the value from the distributed_modena list of the grouped data that sits at the same index as KY in the grouped location list. Then I divided that value by the number of times KY appears in the non-grouped data, which I found using a count function.
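A small sketch of the KY average, using made-up numbers. It assumes that each KY row in the non-grouped data represents one day, so the row count equals the number of days. The variable names here are illustrative, not the ones used in the submitted code.

```python
# Made-up non-grouped location column (one entry per row).
raw_locations = ["KY", "TX", "KY", "KY", "TX"]

# Made-up grouped lists, aligned by index.
grouped_locations = ["KY", "TX"]
grouped_moderna = [90, 50]

# Total Moderna for KY: the value at the same index as KY in the grouped locations.
ky_total = grouped_moderna[grouped_locations.index("KY")]

# Number of KY rows in the non-grouped data (assumed to be one row per day).
ky_days = raw_locations.count("KY")

ky_average = ky_total / ky_days
print(ky_average)
```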
