Sample Mini Project

This document describes a project that uses machine learning models to predict future COVID-19 cases in Tamil Nadu, India. The objectives are to develop a linear regression model to predict near-future COVID cases and compare predictions to actual daily cases. The dataset is from Kaggle and contains over 15,000 records of daily confirmed cases, deaths, and recoveries in India from January 2020 to June 2021. Linear regression, LASSO, SVM, and exponential smoothing models are used to predict numbers in the next 10 days and their results are analyzed.

PREDICTION OF COVID19 CASES USING MACHINE LEARNING

A MINI PROJECT REPORT


Submitted by

BHARANI RAAJ T 311518205011


JOHNSON PRAVEEN G 311518205021

In partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY

MEENAKSHI SUNDARARAJAN ENGINEERING COLLEGE

KODAMBAKKAM, CHENNAI – 600024.

ANNA UNIVERSITY: CHENNAI 600 025

AUGUST 2021

ANNA UNIVERSITY: CHENNAI 600 025


BONAFIDE CERTIFICATE

Certified that this project report “PREDICTION OF COVID19 CASES USING
MACHINE LEARNING” is the bonafide work of BHARANI RAAJ T (311518205011)
and JOHNSON PRAVEEN G (311517205046) who carried out the project work under my
supervision.

SIGNATURE
Mrs.A.Kanimozhi, M.E., [PhD].,
HEAD OF THE DEPARTMENT
Associate Professor
Information Technology
Meenakshi Sundararajan Engineering College
#363, Arcot Road, Kodambakkam,
Chennai-600 024

SIGNATURE
Mrs.S.VIJAYALAKSHMI, M.Tech.,
SUPERVISOR
Associate Professor
Information Technology
Meenakshi Sundararajan Engineering College
#363, Arcot Road, Kodambakkam,
Chennai-600 024

Submitted for the project viva voce held at Meenakshi Sundararajan Engineering
College on 09-08-2021.

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

We sincerely thank the management of Meenakshi Sundararajan
Engineering College for supporting and motivating us during the
course of study.

We extend our heartfelt gratitude and sincere thanks to our secretary
Dr.K.S.Babai., B.Sc., B.E., M.S., FIE., Ph.D., Meenakshi
Sundararajan Engineering College for being pivotal in all our
endeavors.

We extend our heartfelt gratitude and sincere thanks to our principal
Dr.P.K.Suresh, M.S, Ph.D., Meenakshi Sundararajan Engineering
College for his guidance and support.

We would like to thank Dr.K.Uma Rani, Ph.D., Dean Academics,
Meenakshi Sundararajan Engineering College, for being a pillar of
support throughout our four years of undergraduate study.

We would like to thank Dr.A.Kanimozhi, M.E., Ph.D., Associate
Professor, Head of the Department, Information Technology,
Meenakshi Sundararajan Engineering College, for encouraging us all
through the four years and guiding us in every aspect.

We would like to thank our internal guide Mrs.S.Vijayalakshmi,
M.Tech., Associate Professor, Information Technology, Meenakshi
Sundararajan Engineering College for providing constructive feedback
and for helping us take various decisions during our project phase.

We also thank all other faculty members of the Department of Information
Technology for their guidance and encouragement.
TABLE OF CONTENTS

CHAPTER NUMBER    TITLE
1       INTRODUCTION
1.1     What is Machine Learning?
1.2     Types of Machine Learning
2       OBJECTIVES
3       DATASET
4       METHOD USED - LINEAR REGRESSION
5       FORMULA
6       EXPERIMENTS
7       PREDICTION
8       HARDWARE/SOFTWARE REQUIREMENTS
9       SOFTWARE DESCRIPTION
10      RESULT DISCUSSIONS AND CONCLUSION
11      CODE
12      REFERENCES
ABSTRACT

Machine learning (ML) based forecasting mechanisms have proved
their significance in anticipating perioperative outcomes to improve
decision making on the future course of action. ML models
have long been used in many application domains that need the
identification and prioritization of adverse factors for a threat. Several
prediction methods are popularly used to handle forecasting
problems. This study demonstrates the capability of ML models to
forecast the number of upcoming patients affected by COVID-19,
which is presently considered a potential threat to mankind. In
particular, four standard forecasting models, namely linear regression
(LR), least absolute shrinkage and selection operator (LASSO),
support vector machine (SVM), and exponential smoothing (ES), have
been used in this study to forecast the threatening factors of COVID-19.
Three types of predictions are made by each of the models: the
number of newly infected cases, the number of deaths, and the
number of recoveries in the next 10 days. The results of the
study show that these methods are a promising mechanism for the
current scenario of the COVID-19 pandemic. ES performs best
among all the models used, followed by LR and LASSO, which
perform well in forecasting the new confirmed cases, death rate
and recovery rate, while SVM performs poorly in all
the prediction scenarios given the available dataset.
INTRODUCTION

1.1 MACHINE LEARNING


Machine learning is a branch of artificial intelligence (AI) and
computer science which focuses on the use of data and algorithms to
imitate the way that humans learn, gradually improving its accuracy.
Machine learning is an important component of the growing field of
data science. Through the use of statistical methods, algorithms are
trained to make classifications or predictions, uncovering key insights
within data mining projects. These insights subsequently drive
decision making within applications and businesses, ideally impacting
key growth metrics. As big data continues to expand and grow, the
market demand for data scientists will increase, requiring them to
assist in the identification of the most relevant business questions and
subsequently the data to answer them.

Machine learning classifiers fall into three primary categories.

Supervised machine learning

Supervised learning, also known as supervised machine learning, is
defined by its use of labeled datasets to train algorithms to
classify data or predict outcomes accurately. As input data is fed into
the model, it adjusts its weights until the model has been fitted
appropriately. This occurs as part of the cross-validation process to
ensure that the model avoids overfitting or underfitting. Supervised
learning helps organizations solve a variety of real-world
problems at scale, such as classifying spam into a separate folder from
your inbox. Some methods used in supervised learning include neural
networks, naïve Bayes, linear regression, logistic regression, random
forest, support vector machine (SVM), and more.

Unsupervised machine learning

Unsupervised learning, also known as unsupervised machine learning,
uses machine learning algorithms to analyze and cluster unlabeled
datasets. These algorithms discover hidden patterns or data groupings
without the need for human intervention. Its ability to discover
similarities and differences in information makes it well suited
to exploratory data analysis, cross-selling strategies, customer
segmentation, and image and pattern recognition. It is also used to reduce
the number of features in a model through the process of
dimensionality reduction; principal component analysis (PCA) and
singular value decomposition (SVD) are two common approaches for
this. Other algorithms used in unsupervised learning include neural
networks, k-means clustering, probabilistic clustering methods, and
more.

Semi-supervised learning

Semi-supervised learning offers a happy medium between supervised
and unsupervised learning. During training, it uses a smaller labeled
data set to guide classification and feature extraction from a larger,
unlabeled data set. Semi-supervised learning can solve the problem of
not having enough labeled data (or not being able to afford to label
enough data) to train a supervised learning algorithm.
OBJECTIVES

The main objectives of this project are:

1. To develop a machine learning model that predicts the near-future
number of COVID-19 cases in Tamil Nadu by implementing
Linear Regression.

2. To find the predicted number of cases per day and compare it
with the actual cases on that day by plotting a graph.

3. To analyze feature selection methods and understand their
working principle.
DATASET

The dataset is publicly available on the Kaggle website and comes
from an ongoing study of COVID-19 cases in India. It was picked
from Kaggle because Kaggle is a widely used public data platform.
The dataset provides information on how many new cases
arose every day and the number of deaths per day. It contains
more than 15,000 records and 9 attributes. The attributes include
date, state, confirmed cases, cured cases and number of deaths. The
dataset is in CSV (comma-separated values) format and is
loaded into a data frame using the pandas library in Python.

[Figure: Original dataset containing the COVID-19 cases in India from
31-Jan-2020 to 01-Jun-2021]

Considering the large volume of the dataset, the prediction is
restricted to the state of Tamil Nadu.

[Figure: Dataset containing only COVID-19 cases in Tamil Nadu]
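
The loading and filtering steps described in this chapter can be sketched as follows. This is only an illustration, not the project's actual code: the column names (Date, State/UnionTerritory, Cured, Deaths, Confirmed) and the sample rows are assumptions made for the example, and a small inline CSV string stands in for the Kaggle file.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for the Kaggle CSV file.
# Column names and values are assumptions for illustration only.
SAMPLE_CSV = """Date,State/UnionTerritory,Cured,Deaths,Confirmed
2021-05-30,Tamil Nadu,100,2,150
2021-05-30,Kerala,80,1,120
2021-05-31,Tamil Nadu,130,3,190
2021-05-31,Kerala,95,1,140
"""


def load_state(csv_source, state):
    """Read the CSV into a DataFrame and keep only the rows for one state."""
    df = pd.read_csv(csv_source, parse_dates=["Date"])
    state_df = df[df["State/UnionTerritory"] == state]
    return state_df.sort_values("Date").reset_index(drop=True)


tamil_nadu = load_state(io.StringIO(SAMPLE_CSV), "Tamil Nadu")
print(tamil_nadu)
```

With the real dataset, the first argument of load_state would be the path to the downloaded CSV file instead of the in-memory string.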
METHOD USED

LINEAR REGRESSION

Linear regression tells us the value of the dependent variable for an
arbitrary value of the independent (explanatory) variable, e.g.
Twitter revenue based on the number of Twitter users.

From a machine learning perspective, it is the simplest model one can try
out on a dataset. If you have a hunch that the data follows a straight-line
trend, linear regression can give quick and reasonably
accurate results.

Many simple predictions are cases of linear regression. We first observe
the trend and then predict based on the trend, e.g. how hard you must
brake depending on the distance to the car ahead of you. Not all
situations follow a linear trend, though. For example, the rise of Bitcoin
from 2015 to 2016 was roughly linear, but in 2017 it suddenly became
exponential, so Bitcoin after 2017 would not be predicted well by linear
regression.

Hence it is important to understand that even though linear regression
can be the first attempt at understanding the data, it may not always be
ideal.

Here’s how we do linear regression:

1. We plot our dependent variable (y-axis) against the independent
variable (x-axis).
2. We try to plot a straight line and measure the correlation.
3. We keep changing the direction of our straight line until we get the
best fit, i.e. the line with the smallest total squared error.
4. We extrapolate from this line to find new values on the y-axis.
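
In practice the line is not adjusted by hand: the best-fitting slope and intercept can be computed directly by least squares. The sketch below illustrates the four steps with NumPy on made-up observations; the function name fit_line and the data are hypothetical and are not taken from the project code.

```python
import numpy as np


def fit_line(x, y):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    c = y_mean - m * x_mean
    return m, c


# Made-up observations that lie close to a straight line.
x_obs = [0, 1, 2, 3, 4]
y_obs = [1.1, 2.9, 5.2, 6.8, 9.0]

m, c = fit_line(x_obs, y_obs)
r = np.corrcoef(x_obs, y_obs)[0, 1]  # correlation between x and y (step 2)
y_new = m * 5 + c                    # extrapolate to a new x value (step 4)
print(f"slope={m:.3f}, intercept={c:.3f}, r={r:.3f}, y(5)={y_new:.3f}")
```

A correlation close to 1 (or -1) suggests that a straight line describes the data well; a value near 0 suggests that linear regression is a poor choice for that data.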
FORMULA

Linear regression is a form of supervised learning. Supervised learning
involves the set of problems where we use existing data to train our
machine. In the beer example, we already know the data for the first 10
months and only have to predict the data for the 11th and 12th months.

Linear regression can involve multiple independent variables, e.g.
house price (dependent) depending on both location (independent)
and land area (independent), but in its simplest form it involves one
independent variable.

In its generic form it is written as

$$Y = \alpha_0 x_0 + \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_n x_n$$

where all the alphas are coefficients that our machine learning
algorithm has to figure out. The x’s are known because they are
independent; we can set them to anything. What we need to find is Y.

For a single independent variable the equation is reduced to

$$Y = \alpha_0 x_0 + \alpha_1 x_1$$

Simple Linear Regression

For simplification x0 is set to be equal to 1 and alpha0 is given the
name c. x1 is called x and alpha1 = m. It reduces to:

$$y = mx + c$$
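
To make the reduction concrete, the sketch below solves the generic form with x0 fixed to 1 and reads off c and m from the fitted coefficients. It uses numpy.linalg.lstsq on made-up data that lies exactly on a known line; it is an illustration of the formula, not the project's code.

```python
import numpy as np

# Made-up data generated exactly from y = 3x + 2.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3.0 * x + 2.0

# Generic form: each row is (x0, x1) with x0 fixed to 1.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution for the coefficients (alpha0, alpha1).
alphas, *_ = np.linalg.lstsq(X, y, rcond=None)
c, m = alphas  # alpha0 is the intercept c, alpha1 is the slope m

print(f"c = {c:.2f}, m = {m:.2f}")
```

Adding further columns to X would give the multi-variable case with alpha2, alpha3 and so on, without changing the rest of the code.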
EXPERIMENTS

Since the dataset contains many rows and attributes, we first used a
few built-in pandas functions to explore it. Functions such as
unique(), head() and tail() were tried out to get a simple
and better understanding of the data.
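
A minimal sketch of these exploration calls is given below. The data frame is made up and its column names are assumptions for illustration; only the pandas methods head(), tail() and unique() come from the text.

```python
import pandas as pd

# Made-up rows; the column names are assumptions for illustration.
df = pd.DataFrame(
    {
        "State": ["Tamil Nadu", "Kerala", "Tamil Nadu", "Delhi", "Kerala"],
        "Confirmed": [150, 120, 190, 300, 140],
    }
)

first_rows = df.head(2)        # first two rows
last_rows = df.tail(2)         # last two rows
states = df["State"].unique()  # distinct values, in order of first appearance

print(first_rows)
print(last_rows)
print(states)
```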

Then a heat map is plotted from the dataset to show the relationship
between the cured rate and the death rate.
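
The report does not state exactly what the heat map displays. One common choice, assumed here, is a heat map of the correlation matrix of the numeric columns. The sketch below computes that matrix with pandas on made-up values; the hypothetical helper plot_heatmap draws it with seaborn.heatmap and is defined but not called, so the example also runs where no display is available.

```python
import pandas as pd

# Made-up cumulative counts; column names are assumptions.
df = pd.DataFrame(
    {
        "Cured": [100, 130, 170, 220, 260],
        "Deaths": [2, 3, 4, 6, 7],
    }
)

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()
print(corr)


def plot_heatmap(corr_matrix):
    """Draw the correlation matrix as a heat map (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
    plt.show()
```

Calling plot_heatmap(corr) in a Jupyter Notebook cell would render the figure inline.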

Now that the data has been prepared, it is ready for prediction using
the linear regression model.
PREDICTION

Two lists, namely train and test, are created. train contains 80%
of the integer data and the remaining 20% is stored in test. This
split is made only for convenience.

Then the data is copied into two new arrays called datax and
datay, which hold the x-axis and y-axis values respectively. Each
record in the dataset is appended to datax and datay with the help of a
for loop.

The function returns the two new arrays when called.
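
The split and the loop described above could look roughly like the sketch below. The function names split_series and make_xy, the use of the day index as the x value, and the sample counts are all assumptions for illustration; the report does not show what datax actually holds.

```python
import numpy as np


def split_series(values, train_fraction=0.8):
    """Split a time-ordered sequence into leading train and trailing test parts."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


def make_xy(values, start_index=0):
    """Build datax (day index) and datay (case count) with an explicit loop."""
    datax, datay = [], []
    for offset, value in enumerate(values):
        datax.append([start_index + offset])  # one feature per row
        datay.append(value)
    return np.array(datax), np.array(datay)


# Made-up daily case counts for ten consecutive days.
cases = [10, 12, 15, 19, 24, 30, 37, 45, 54, 64]

train, test = split_series(cases)
train_x, train_y = make_xy(train)
test_x, test_y = make_xy(test, start_index=len(train))
print(train_x.shape, train_y.shape, test_x.ravel(), test_y)
```

Keeping the test part at the end of the series, rather than sampling it at random, means the model is checked on days that come after everything it was fitted on.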

A linear regression model is used, and the two new arrays datax
and datay are fitted to it. The model.predict() function then predicts
future values from the given dataset.
The model predicts the number of cases for the next 90 days.
The actual number of cases per day and the predicted values are
relatively close, which supports the approach taken in the
project. The final output is shown as a bar graph, which makes the
closeness of the actual and predicted values easier to see.
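
The fit-then-predict step and the comparison with actual values can be sketched as follows. This is not the project's code: numpy.polyfit with degree 1 stands in for the linear regression model object, the helper predict is hypothetical, and the case counts are made up.

```python
import numpy as np

# Made-up daily case counts; the first 8 days train the model, the last 2 test it.
days = np.arange(10)
cases = np.array([10, 13, 15, 18, 21, 22, 26, 28, 31, 33], dtype=float)
train_days, test_days = days[:8], days[8:]
train_cases, test_cases = cases[:8], cases[8:]

# Degree-1 polynomial fit = straight line; returns (slope, intercept).
slope, intercept = np.polyfit(train_days, train_cases, deg=1)


def predict(day_index):
    """Predicted case count for the given day index or array of indices."""
    return slope * np.asarray(day_index) + intercept


predicted = predict(test_days)
mae = np.mean(np.abs(predicted - test_cases))  # mean absolute error on held-out days

for day, actual, guess in zip(test_days, test_cases, predicted):
    print(f"day {day}: actual={actual:.0f}, predicted={guess:.1f}")
print(f"mean absolute error: {mae:.2f}")
```

A single error figure such as the mean absolute error complements the bar graph by stating how close the predictions are in numbers rather than by eye.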
SOFTWARE/HARDWARE REQUIREMENTS

The requirements specification lists all the hardware and software
requirements for the project development. The development machine must
meet the following minimum requirements:

Navigator: Anaconda

Platform: Jupyter Notebook

RAM: 512 MB

Operating System: Windows 7/8/10
Coding Language: Python
SOFTWARE DESCRIPTION

PYTHON
Python is a high-level programming language.
Python offers multiple choices for developing a GUI (Graphical User
Interface). Of all the GUI options, tkinter is the most commonly used.
It is the standard Python interface to the Tk GUI toolkit
shipped with Python. Python with tkinter is the quickest and
easiest way to create GUI applications; creating a GUI with tkinter is
a simple task.

The Jupyter Notebook is an open source web application that you can
use to create and share documents that contain live code, equations,
visualizations, and text. Jupyter Notebook is maintained by the people
at Project Jupyter.

Jupyter Notebook is a spin-off from the IPython project,
which used to have an IPython Notebook project itself. The name
Jupyter comes from the core programming languages that
it supports: Julia, Python, and R. Jupyter ships with the IPython
kernel, which allows you to write your programs in Python, but there
are currently over 100 other kernels that you can also use.
RESULT DISCUSSIONS AND CONCLUSIONS

The project is a basic implementation of Machine Learning and Data
Science concepts combined. Even though the project is a
simple demonstration of COVID-19 case prediction, it is possible to
extend it to other prediction problems. With the same concepts it is
possible to predict outputs from various data sources. For example,
the same approach could be used to predict the value of stocks
from the previous history of increases and decreases in stock
market values, provided the values follow a roughly linear trend.
Another application of this project could be the result analysis of a
student: considering his/her previous performances, it is possible to
predict the future marks of the student. Thus we sincerely hope that
this project will be of use to society in general.
CODE

The code prepares the data, visualizes it, pre-processes it, builds
the model and then evaluates it. The code has
been written in the Python programming language using Jupyter
Notebook as the IDE. The experiments and all the model building are
done using Python libraries. The code is available in the Git
repository at the following link:

https://github.com/bharaniraaj17/Covid19_prediction

Libraries used:
1. NumPy
2. Pandas
3. Matplotlib
4. Seaborn
REFERENCES

GeeksforGeeks
Tutorialspoint
Stack Overflow
AI Tamil
