Lilongwe University of Agriculture and Natural Resources
NRC CAMPUS
TO: Mr. James Kambere.
FROM: Bright Manda
PASSCODE: 240302107
Step-by-Step Procedure
*I take the path to the dataset from user input,
then pass that path as a parameter to a function.
*Inside the function, the dataset path is passed to the
pandas read_excel() function to load the data, then iloc
is used to slice the first ten rows.
*To make the dataset easier to work with, I created a list
from each required column using the tolist() method.
*Then I grouped the data by Location, one model at a time,
so that each state appears only once and its values are
summed per state.
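
A minimal sketch of the loading, slicing, list and grouping steps above.
The column names "Location", "Distributed" and "Distributed_Janssen", the
helper names, and all numbers are assumptions made for illustration; the
real spreadsheet may use different headers. The sketch sums several
columns in one groupby call rather than one model at a time, and it uses
a small in-memory table so that it runs without the Excel file.

```python
import pandas as pd


def load_dataset(path, n_rows=10):
    """Read the Excel file at `path` and keep only the first `n_rows` rows."""
    return pd.read_excel(path).iloc[:n_rows]


def group_by_location(df, value_columns):
    """Sum each value column per Location so every state appears once."""
    return df.groupby("Location", as_index=False)[value_columns].sum()


# Made-up stand-in for the spreadsheet; load_dataset(path) would supply
# the real rows.
sample = pd.DataFrame({
    "Location": ["KY", "TX", "KY", "NY"],
    "Distributed": [100, 250, 50, 80],
    "Distributed_Janssen": [10, 30, 5, 20],
})

first_rows = sample.iloc[:3]                  # same slicing idea as load_dataset
locations = first_rows["Location"].tolist()   # one list per required column

grouped = group_by_location(first_rows, ["Distributed", "Distributed_Janssen"])
grouped_locations = grouped["Location"].tolist()
grouped_totals = grouped["Distributed"].tolist()

print(locations)
print(grouped_locations, grouped_totals)
```

By default groupby sorts the group keys, so the grouped lists come out in
alphabetical order of state, and the location list and value lists stay
aligned index by index.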
*Then I created lists from the grouped data.
-To find the state with the most vaccines distributed in
total, I take the index of the maximum value in the grouped
list of total distributed, then use that index to access
the corresponding state in the grouped location list.
*The same applies to the state with the fewest vaccines distributed in total.
In the grouped lists, the index of the minimum value in the list
of total distributed is used to access its corresponding state
in the location list.
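
The index lookup used for the highest and lowest totals can be sketched
with plain lists. The function name state_with_extreme and the sample
values are hypothetical; only the technique (index of the max or min
value, then the same index in the location list) comes from the steps
above.

```python
def state_with_extreme(locations, totals, pick=max):
    """Return the state whose total is largest (pick=max) or smallest (pick=min).

    If several states tie, list.index() returns the first one.
    """
    return locations[totals.index(pick(totals))]


# Made-up grouped lists, aligned index by index.
grouped_locations = ["KY", "NY", "TX"]
grouped_totals = [150, 80, 250]

most = state_with_extreme(grouped_locations, grouped_totals, max)
least = state_with_extreme(grouped_locations, grouped_totals, min)

print("Most distributed:", most)
print("Least distributed:", least)
```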
*To find the model with the highest number of vaccines distributed,
I sum the values in each model's list separately,
then output the model with the highest sum.
*To find the state with the most Janssen vaccines distributed, the index of
the maximum value in the Janssen list of the grouped data is used to
find its corresponding state in the grouped location list.
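
A small sketch of the two steps above, assuming the grouped model lists
are kept in a dictionary. The dictionary layout, the variable names and
the numbers are illustrative assumptions, not the original code.

```python
# Made-up grouped lists; every list is aligned with grouped_locations.
grouped_locations = ["KY", "NY", "TX"]
model_lists = {
    "Janssen": [15, 20, 30],
    "Moderna": [60, 25, 90],
    "Novavax": [5, 0, 10],
}

# Sum each model's list separately, then pick the model with the highest sum.
model_totals = {name: sum(values) for name, values in model_lists.items()}
top_model = max(model_totals, key=model_totals.get)

# State with the most Janssen doses: index of the max value, then the
# same index in the grouped location list.
janssen = model_lists["Janssen"]
top_janssen_state = grouped_locations[janssen.index(max(janssen))]

print(top_model, model_totals[top_model])
print(top_janssen_state)
```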
*To find the day with the most Novavax vaccines distributed, I used the index
of the maximum value in the Novavax list of the non-grouped data to find its
corresponding day in the date list.
-I used the datetime module to print the day in a readable format.
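
The date lookup and formatting can be sketched as follows. The sketch
assumes the dates are stored as "YYYY-MM-DD" strings and picks one
possible output format; values loaded with pandas may already be
timestamp objects, in which case the parsing step is not needed. All
values are made up.

```python
from datetime import datetime

# Made-up non-grouped lists, one entry per row of the dataset.
dates = ["2022-08-01", "2022-08-02", "2022-08-03"]
novavax = [0, 40, 10]

# Index of the largest Novavax value gives the matching day.
peak_day = dates[novavax.index(max(novavax))]

# Parse the string, then print it in a more readable form.
pretty_day = datetime.strptime(peak_day, "%Y-%m-%d").strftime("%A, %d %B %Y")

print(pretty_day)
```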
*To find the average number of Moderna vaccines in KY per day, I used
the value from the grouped distributed_modena list that sits at the
same index as KY in the grouped location list. Then I divided that
value by the number of times KY appears in the non-grouped data, which
I found using a count function.
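
A minimal sketch of the KY average, assuming each row of the non-grouped
data is one day's record for a state, so the number of KY rows equals the
number of KY days. The function name average_per_day and the sample
values are hypothetical.

```python
def average_per_day(state, grouped_locations, grouped_values, raw_locations):
    """Divide a state's grouped total by how many rows it has in the raw data."""
    total = grouped_values[grouped_locations.index(state)]
    days = raw_locations.count(state)   # frequency of the state in the raw list
    return total / days


# Made-up lists: raw_locations is non-grouped, the other two are grouped.
raw_locations = ["KY", "TX", "KY", "NY"]
grouped_locations = ["KY", "NY", "TX"]
grouped_moderna = [60, 25, 90]

ky_average = average_per_day("KY", grouped_locations, grouped_moderna, raw_locations)

print(ky_average)
```

If the state is missing from the lists, index() raises ValueError, so a
real script may want to check for that before dividing.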