
MULTIPLE DISEASE PREDICTION USING MACHINE

LEARNING

A PROJECT REPORT

Submitted by

A ALWIN SELVA KUMAR (210519205003)

S GOKUL (210519205052)

in partial fulfillment for the award of the degree

of

BACHELOR OF TECHNOLOGY

in

INFORMATION TECHNOLOGY

DMI COLLEGE OF ENGINEERING

ANNA UNIVERSITY :: CHENNAI 600 025

MAY 2023
ANNA UNIVERSITY :: CHENNAI 600 025

BONAFIDE CERTIFICATE
Certified that this project report "MULTIPLE DISEASE PREDICTION

USING MACHINE LEARNING" is the bonafide work of

A. ALWIN SELVA KUMAR (210519205004) and S. GOKUL (210519205014)

who carried out the project work under my supervision.

SIGNATURE SIGNATURE

Dr. B. Muthu Kumar, M.E., (Ph.D.) Mrs. S. Derisha Mahil, M.E.

HEAD OF THE DEPARTMENT SUPERVISOR

Professor Assistant Professor

Information Technology Information Technology

DMI College of Engineering DMI College of Engineering

Palanchur, Chennai-600123. Palanchur, Chennai-600123.

Viva Voce held on ------------.

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

At the outset from the core of our heart, we thank the LORD ALMIGHTY for
the manifold blessings showered on us and strengthening us to complete our
project work without any hurdles.

We stand indebted in the gratitude of our beloved Founder and Chairman


Rev. Fr. Dr. J. E. Arul Raj, for all the facilities provided at our institution. We
also give sincere thanks to our beloved Correspondent Rev. Sr. MK.Teresa.

We thank our principal Dr. N. Azhagesan, M.Tech., Ph.D., who has also


served as an inspiration for us to carry out the responsibilities for doing this
project.

We proudly express our esteemed gratitude to our Head of

Department Dr. B. Muthu Kumar, M.E., Ph.D., our project coordinator
Mrs. M. Maheswari, M.E., and our project guide Mrs. S. Derisha Mahil, M.E.,
for their encouragement and assistance towards the completion of our project.

It is a pleasure to acknowledge our heartfelt gratitude to all the staff

members of the IT department who helped us bring our project to completion.
Further thanks to the non-teaching staff for extending the lab facilities.

We thank our parents, our family members and friends for their moral
support.

A.ALWIN SELVA KUMAR


S.GOKUL

ABSTRACT

Health education around infectious disease issues and the demand for
high-quality medical care have motivated this work. This report provides a
current, comprehensive assessment of machine learning techniques for
forecasting infectious disease concerns. We also discuss the challenges,
constraints, and potential applications of machine learning in the study of
infection and disease. We collect research and articles on machine learning
techniques for predicting infectious disease concerns by examining reliable
databases. In conducting this systematic review, we also apply linear
regression and the PRISMA approach. After that, we categorize the collected
articles, which address infections and disorders such as schizophrenia,
bipolar disorder, anxiety and depression, PTSD, and infectious diseases in
children. We analyse the challenges and limitations machine learning
researchers have faced when researching infectious illness issues as we
discuss the findings. We also make specific recommendations for future
research and advancements in the application of machine learning to the study
of infection and disease. We will use the input to read the dataset, clean it,
and then check whether there are null values. We then perform feature
engineering, apply the model, and forecast the results. Machine learning,
prediction-oriented Natural Language Processing (NLP), and other applications
of artificial intelligence are required. The purpose of this study is to
forecast the onset of infectious disease using two machine learning
techniques, namely data visualization and prognosis algorithms based on
neural networks, recurrent neural networks, and boosting.

TABLE OF CONTENTS

ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
  1.1 OBJECTIVE
  1.2 METHODOLOGY
  1.3 SCOPE OF THE PROJECT
2 LITERATURE SURVEY
3 SYSTEM ANALYSIS
  3.1 EXISTING SYSTEM
    3.1.1 DISADVANTAGES
  3.2 PROPOSED SYSTEM
    3.2.1 ADVANTAGES
  3.3 SYSTEM REQUIREMENTS
4 SYSTEM DESIGNS
  4.1 ARCHITECTURE DIAGRAM
  4.2 MODULES
  4.3 COLLECTION OF DATA
  4.4 PRE-PROCESSING THE DATA
    4.4.1 FORMATTING
    4.4.2 CLEANING
    4.4.3 SAMPLING
  4.5 EXTRACTION OF FEATURES
  4.6 EVALUATING THE MODEL
  4.7 DATA FLOW DIAGRAM
  4.8 UML DIAGRAM
    4.8.1 USE CASE DIAGRAM
    4.8.2 CLASS DIAGRAM
    4.8.3 SEQUENCE DIAGRAM
    4.8.4 ACTIVITY DIAGRAM
5 IMPLEMENTATION
  5.1 DOMAIN SPECIFICATION
    5.1.1 MACHINE LEARNING
    5.1.2 MACHINE LEARNING VS TRADITIONAL PROGRAMMING
    5.1.3 SUPERVISED LEARNING
    5.1.4 UNSUPERVISED LEARNING
    5.1.5 REINFORCEMENT LEARNING
  5.2 TENSORFLOW
    5.2.1 TENSORFLOW ARCHITECTURE
  5.3 PYTHON OVERVIEW
  5.4 ANACONDA NAVIGATOR
6 TESTING
  6.1 TESTING
7 RESULT
8 CONCLUSION AND DISCUSSION
  8.1 CONCLUSION
  8.2 FUTURE WORK
9 APPENDIX
CHAPTER I

INTRODUCTION

When coupled, data analysis and artificial intelligence offer new


opportunities and improve results in the field of predictive modelling, both of
which have shown their value in various study fields. Researchers are utilizing the
power of analytics to analyse the data and derive results from it. The past has
seen catastrophic epidemics that claimed millions of lives and led to a global
economic downturn. As a result, it is crucial that an epidemic be stopped as soon as
it starts. While gathering and keeping information about the ongoing pandemic
takes a lot of time and resources, a portion of this issue can be solved by leveraging
social media platforms to gather the most recent data.

1.1 OBJECTIVE
The objective of using Machine Learning (ML) for health education during
an infectious disease outbreak is to create effective and targeted educational
materials that can help prevent the spread of the disease. By analyzing the data
from social media conversations and other sources, ML algorithms can identify
patterns and insights that help understand the concerns and questions of different
populations. This information can then be used to create personalized educational
materials that specifically address those concerns and provide relevant information.

1.2 METHODOLOGY

 DATA COLLECTION : Gather data from various sources such as social


media platforms, online forums, news articles, and other relevant sources.
This data should consist of conversations, comments, and posts related to the
infectious disease outbreak. Ensure that the data collected is representative
of different populations and contains diverse perspectives.
 DATA ANALYSIS : Perform exploratory data analysis to gain insights into
the collected data. This analysis can involve statistical measures,
visualizations, and topic modeling techniques to understand the trends,
patterns, and common themes present in the data. This step helps in
understanding the concerns, questions, and misconceptions of different
populations.
 LABELING AND ANNOTATION : Annotate the data by assigning
appropriate labels or categories to different pieces of text. This can involve
classifying the text into predefined categories, identifying key topics,
sentiment labeling, or any other relevant annotations based on the project's
objectives. The annotation process may require domain expertise and can be
done manually or through automated techniques.
 MODEL DEVELOPMENT : Select appropriate ML algorithms and
develop models based on the project objectives. This can include techniques
such as classification algorithms (e.g., logistic regression, random forest,
support vector machines), clustering algorithms (e.g., K-means, hierarchical
clustering), or recommendation algorithms (e.g., collaborative filtering).
Consideration should be given to model performance, interpretability, and
scalability.
 MODEL TRAINING AND EVALUATION : Split the annotated data into
training and evaluation sets. Train the ML models using the training data and
fine-tune them to optimize their performance. Evaluate the models using
appropriate evaluation metrics such as accuracy, precision, recall, F1-score,
or area under the receiver operating characteristic curve (AUC-ROC). Iterate
on the model development and evaluation process to improve the models'
performance.
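The training-and-evaluation step above can be sketched as follows. This is a hypothetical example: the six posts, their labels, and the split proportions are invented for illustration, assuming a scikit-learn setup.

```python
# Hypothetical sketch: train and evaluate a text classifier on annotated posts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

posts = [
    "how does the disease spread in schools",
    "masks do not work at all",                # misconception
    "where can I get vaccinated",
    "the outbreak was invented by the media",  # misconception
    "what are the early symptoms",
    "vitamin pills cure the infection",        # misconception
]
labels = [0, 1, 0, 1, 0, 1]  # 0 = question, 1 = misconception

# Turn the raw text into TF-IDF features.
X = TfidfVectorizer().fit_transform(posts)

# Hold out part of the annotated data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.33, random_state=42, stratify=labels)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("F1:", f1_score(y_test, pred))
```

On a real corpus the same pattern applies; only the vectorizer settings and the label scheme change.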

1.3 SCOPE OF THE PROJECT
 Gather relevant data from sources such as social media platforms, online
forums, news articles, and other relevant sources.
 Clean and preprocess the collected data by removing noise, irrelevant
information, and duplicate entries.
 Apply techniques such as text normalization, stop word removal, and
sentiment analysis to enhance the quality of the data.
 Perform statistical analysis and visualizations to gain insights into the
collected data.
 Identify common concerns, questions, and misconceptions related to the
infectious disease outbreak.
 Assign appropriate labels or categories to the collected text data based on the
project objectives.
 Perform manual or automated annotation processes to label the data for
training the ML models.
 Ensure the quality and accuracy of the annotated data.
 Utilize the trained ML models to generate personalized educational
materials.
 Create informative and engaging content such as infographics, videos,
articles, or interactive platforms.
 Ensure that the educational materials effectively address concerns, provide
accurate information, and promote healthy behaviors.
 Deploy the generated educational materials and monitor their effectiveness.
 Collect user feedback, track engagement metrics, and conduct surveys or
interviews to evaluate the impact of the materials.
 Incorporate user feedback to refine the ML models and improve the
educational materials.
CHAPTER 2

LITERATURE SURVEY

Using Social Media for Actionable Disease Surveillance and Outbreak


Management: A Systematic Literature Review by G. Velasco et al. (2021): This
study explores the use of social media data for disease surveillance and outbreak
management. It discusses the potential of social media platforms in providing real-
time insights into public sentiment, concerns, and behaviors during infectious
disease outbreaks.

Machine Learning for Infectious Disease Prediction and Surveillance by L.


Yang et al. (2019): This paper provides an overview of the applications of ML in
infectious disease prediction and surveillance. It discusses various ML techniques
such as classification, clustering, and time series analysis in the context of disease
outbreak prediction, identification of high-risk areas, and early detection of
outbreaks.

Natural Language Processing and Machine Learning for Health and


Biomedical Applications by A. Holzinger et al. (2017): This survey paper covers
various ML and Natural Language Processing (NLP) techniques used in health and
biomedical applications. It provides insights into text mining, sentiment analysis,
text classification, and recommendation systems, which are relevant for developing
educational materials in the project.

Text Mining and Social Media Analytics for Improved Health Management:
A Review by C. Castillo et al. (2019): This review article focuses on text mining
and social media analytics for health management. It discusses the use of ML and
NLP techniques to analyze social media conversations and extract valuable insights

for public health interventions, including disease education and prevention
strategies.

CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

In the existing system, a stacked DNN, five fine-tuned DNN models obtained

from hyperparameter tuning are stacked, and the output of each DNN model
becomes the input of a meta-model in the form of a fully connected layer. The
proposed feature extraction method outperformed the existing feature extraction
and was able to separate data between classes better. Furthermore, the proposed
stacked DNN model achieved an accuracy of 0.934 on the testing data,
outperforming single DNN models and other state-of-the-art machine learning
algorithms.

3.1.1 DISADVANTAGES

 Axillary,
 Deep learning,
 Electronic nose,
 Feature extraction,
 Infectious respiratory disease, stacked.

3.2 PROPOSED MODEL

We will use the input to read the dataset, clean it, and then check
whether there are null values. We then perform feature engineering, apply the
model, and forecast the results. Machine learning, prediction-oriented Natural
Language Processing (NLP), and other applications of artificial intelligence
are required. The purpose of this study is to forecast the onset of infectious
disease using two machine learning techniques, namely data visualization and
prognosis algorithms based on neural networks, recurrent neural networks, and
boosting.
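A minimal sketch of this flow, assuming a pandas/scikit-learn setup; the toy in-memory dataset and column names below are invented stand-ins for the real data:

```python
# Sketch of the proposed flow: load, check for nulls, clean, fit, forecast.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "fever":    [1, 0, 1, 1, 0, None],
    "cough":    [1, 1, 0, 1, 0, 1],
    "infected": [1, 0, 1, 1, 0, 1],
})

# Check for null values, then drop (or impute) the affected rows.
print(df.isnull().sum())
df = df.dropna()

X, y = df[["fever", "cough"]], df["infected"]
model = RandomForestClassifier(random_state=0).fit(X, y)

# Forecast the outcome for a new, unseen record.
forecast = model.predict(pd.DataFrame({"fever": [1], "cough": [0]}))
print(forecast)
```

In practice the same skeleton holds, with imputation often preferred over dropping rows when data is scarce.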

3.2.1 ADVANTAGES

 Less time-consuming.

 We apply neural network algorithms,
 recurrent neural networks, and boosting.
 NLP techniques are also implemented.

3.3 SYSTEM REQUIREMENTS

Hardware Requirements

 Windows / Linux / Mac laptop or desktop with minimum 4 GB

RAM

Software Requirements

 Python / Anaconda Navigator

 Python language
 Jupyter Notebook

CHAPTER 4

SYSTEM DESIGNS

4.1 ARCHITECTURE DIAGRAM

FIG 4.1 ARCHITECTURE DIAGRAM

4.2 MODULES

 COLLECTION OF DATA
 PRE-PROCESSING THE DATA
 EXTRACTION OF FEATURES
 EVALUATING THE MODEL
4.3 COLLECTION OF DATA
Data collection is a process that gathers information on health education
from a variety of sources, which is then utilised to create machine learning
models. A set of cervical cancer data with features is the type of data used in this
work. The selection of the subset of all accessible data that you will be working
with is the focus of this stage. Ideally, ML challenges begin with a large amount of
data (examples or observations) for which you already know the desired solution.
Labelled data is information for which you already know the desired outcome.
4.4 PRE-PROCESSING THE DATA
Format, clean, and sample from your chosen data to organise it. There are
three typical steps in data pre-processing:
4.4.1 FORMATTING
It's possible that the format of the data you've chosen is not one that allows
you to deal with it. The data may be in a proprietary file format and you would like
it in a relational database or text file, or the data may be in a relational database
and you would like it in a flat file.
4.4.2 CLEANING
Data cleaning is the process of replacing missing data. There can be data
instances that are insufficient and lack the information you think you need to
address the issue. These occurrences might need to be eliminated.
4.4.3 SAMPLING
You may have access to much more data than you actually need that has
been carefully chosen. Algorithms may require more compute and memory to run
as well as take significantly longer to process larger volumes of data. You can
choose a smaller representative sample of the chosen data, which may be much
faster for exploring and testing ideas, rather than thinking about the complete
dataset.
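The three pre-processing steps above can be sketched with pandas; the DataFrame and its column names are invented for illustration:

```python
# Sketch of the three pre-processing steps: formatting, cleaning, sampling.
import pandas as pd

df = pd.DataFrame({
    "age":    ["34", "51", "29", None, "45", "62"],
    "result": ["pos", "neg", "pos", "neg", None, "pos"],
})

# Formatting: convert the numeric column from text to numbers.
df["age"] = pd.to_numeric(df["age"])

# Cleaning: remove instances that lack the information we need.
df = df.dropna()

# Sampling: work with a smaller representative subset of the chosen data.
sample = df.sample(n=3, random_state=0)
print(sample)
```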
4.5 EXTRACTION OF FEATURES
The next step is to A process of attribute reduction is feature extraction.
Feature extraction actually alters the attributes as opposed to feature selection,
which ranks the current attributes according to their predictive relevance. The
original attributes are linearly combined to generate the changed attributes, or
features. Finally, the Classifier algorithm is used to train our models. We make use
of the acquired labelled dataset. The models will be assessed using the remaining
labelled data we have. Pre-processed data was categorised using a few machine
learning methods. Random forest classifiers were selected.
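As one common illustration of this idea, PCA builds features as linear combinations of the original attributes; this sketch (with synthetic data, assuming scikit-learn) feeds the reduced features to the random forest classifier named above:

```python
# Sketch of feature extraction: PCA produces new features that are
# linear combinations of the original attributes.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                 # 100 samples, 8 attributes
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # synthetic labels

# Reduce the 8 attributes to 3 linearly combined features.
X_reduced = PCA(n_components=3).fit_transform(X)
print(X_reduced.shape)  # (100, 3)

# Train the selected classifier on the extracted features.
clf = RandomForestClassifier(random_state=0).fit(X_reduced, y)
print("training accuracy:", clf.score(X_reduced, y))
```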

4.6 EVALUATING THE MODEL

The model development process includes a step called model evaluation.

It helps to find the model that best represents our data and to predict how
well that model will perform in the future. In data science, it is not
acceptable to evaluate model performance on the training data, because this
can quickly lead to overly optimistic and overfitted models. Hold-out and
cross-validation are two techniques used in data science to assess models.
Both approaches use a test set (unseen by the model) to assess model
performance and prevent overfitting. Each classification model's performance
is estimated based on its average score, and the outcome is presented as a
graph of the categorised data.
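A small sketch contrasting the two evaluation techniques named above, hold-out and cross-validation, on a synthetic scikit-learn dataset:

```python
# Hold-out vs cross-validation on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
clf = RandomForestClassifier(random_state=0)

# Hold-out: one fixed split, with the test set unseen during training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout = clf.fit(X_tr, y_tr).score(X_te, y_te)

# Cross-validation: the average score over 5 train/test folds.
cv = cross_val_score(clf, X, y, cv=5).mean()

print(f"hold-out: {holdout:.3f}  cross-val: {cv:.3f}")
```

Cross-validation gives a more stable estimate at the cost of training the model several times.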
4.6.1 PROPOSED APPROACHES

1. We start by using the dataset of fraud transactions.


2. Filter the dataset in accordance with the needs, then construct a new dataset with
attributes that correspond to the analysis to be performed.

3. Pre-process the dataset before using it.
4. Distinguish training from testing data.
5. Analyse the testing dataset using the classification algorithm after training the
model with training data.
6. You will receive results as accuracy metrics at the end.

4.7 DATA FLOW DIAGRAM

LEVEL 0: Dataset Collection → Pre-processing → Random Selection → Trained & Testing Dataset

LEVEL 1: Dataset Collection → Pre-processing → Apply Algorithm → Feature Extraction

LEVEL 2: Classify the Dataset → Accuracy of Result → Prediction of Infectious Disease Outbreak → Finalize the Accuracy of Infectious Disease Outbreak
4.8 UML DIAGRAM
Unified Modelling Language (UML) is used to specify, visualize, modify,
build, and document the artefacts of object-oriented software-intensive systems
under development. UML provides a standard way to visualize a system's
architectural blueprint, including elements such as:

● Actors
● Business processes
● (Logical) components
● Activities
● Programming language statements
● Database schemas

4.8.1 USE CASE DIAGRAM

UML is a standard language for specifying, visualizing, building, and

documenting the artefacts of software systems. UML was developed under the
Object Management Group (OMG), and the UML 1.0 draft specification was
submitted to OMG in January 1997. OMG is continuously committed to creating
true industry standards. UML stands for Unified Modeling Language; it is a
visual language used to create software designs.

FIG 4.8.1 USE CASE DIAGRAM

4.8.2 CLASS DIAGRAM


Class diagrams are a key building block of object-oriented modeling. They are
used for general conceptual modeling of the application's system and for
detailed modeling that translates the models into programming code. Class
diagrams can also be used for data modeling. The classes in a class diagram
represent both the main elements and interactions in the application and the
classes to be programmed. In the diagram, a class is represented by a box
containing three compartments.

FIG 4.8.2 CLASS DIAGRAM

4.8.3 SEQUENCE DIAGRAM


A sequence diagram represents the objects involved in an interaction
horizontally, with time running vertically. A use case is a type of behaviour
classifier that represents a
declaration of provided behaviour. Each use case specifies specific behaviour. This
may include variants that the subject can perform cooperatively with one or more
of her actors. A use case defines the behaviour provided by a subject without
reference to the internal structure of the subject. These actions, including
interactions between actors and subjects, can lead to changes in the subject's state
and communication with its environment. A use case can have many variations on
the basic behaviour, such as anomalous behaviour and error handling.

FIG 4.8.3 SEQUENCE DIAGRAM

4.8.4 ACTIVITY DIAGRAM


Activity diagrams are graphical representations of workflows containing step-by-
step activities and actions, with support for selection, repetition, and
concurrency. The Unified Modelling Language allows you to use activity diagrams
to describe, step by step, the business and operational workflows of the
components in a system. Activity diagrams show the overall control flow of this
project.

FIG 4.8.4 ACTIVITY DIAGRAM

CHAPTER 5

IMPLEMENTATION

5.1 DOMAIN SPECIFICATION

5.1.1 MACHINE LEARNING

Machine learning is a system that can learn from examples through self-
improvement, without being explicitly coded by a programmer. The breakthrough
is the idea that a machine can learn independently from data (that is, samples)
to produce accurate results. Machine learning combines data with statistical
tools to predict an output, and companies use these results to generate
actionable insights. Machine learning is closely related to data mining and
Bayesian predictive modelling. The machine takes data as input and uses an
algorithm to produce a response. A typical machine learning task is to provide
a recommendation: for those who have a Netflix account, all recommendations of
movies or series are based on the user's historical data. Tech companies are
using unsupervised learning to improve the user experience with personalized
recommendations.

Machine learning is also used for a variety of tasks like fraud detection,
predictive maintenance, portfolio optimization, automating tasks and so on.

5.1.2 MACHINE LEARNING VS TRADITIONAL PROGRAMMING
Traditional programming differs significantly from machine learning. In
traditional programming, a programmer codes all the rules in consultation with an
expert in the industry for which software is being developed. The machine
executes the output after the logical statements. As your system becomes more
complex, you will need to create more rules. Similarly, the odds of success in
unfamiliar situations are lower than in known ones.

First, machines learn by discovering patterns. This discovery is thanks to


data. It's important for data scientists to choose carefully which data to make
available to machines. A list of attributes used to solve a problem is called a feature
vector. A feature vector can be thought of as a subset of data used to address a
problem. Therefore, the learning stage is used to describe the data and summarize
it into a model.

The following bullet points sum up the straightforward nature of machine learning
programs:

1. Establish a question

2. Compile data

3. Display data graphically

4. Exercise algorithm

5. Evaluate the algorithm

6. Gather comments

7. Make algorithm improvements

8. Repetition of steps 4 through 7 until contentment.

9. Use the model to predict something.

The algorithm applies this knowledge to new sets of data once it becomes
proficient at arriving at the correct conclusions.

5.1.3 SUPERVISED LEARNING

An algorithm uses training data and feedback from humans to learn the

relationship between given inputs and a likely output. For instance, a
practitioner can use marketing expenses and the weather forecast as input data
to forecast the sales of cans. You can use supervised learning when the output
data is known.

Classification

Imagine you want to predict the gender of a customer for a commercial.

You would start by gathering data on the height, weight, job, salary,
purchasing basket, etc. from your customer database. You know the gender of
each of your customers; it can only be male or female. The objective of the
classifier is to assign a probability of being a male or a female (that is,
the label) based on the information (that is, the features you have collected).
Once the model has learned how to recognize male or female, you can use new
data to make a prediction. For instance, you have just received new information
from an unknown customer, and you want to know whether it is a male or a
female. If the classifier predicts male = 70%, it means the algorithm is 70%
sure that this customer is a male, and 30% that it is a female. The label may
have two or more classes. The above example has only two classes, but if a
classifier needs to recognize objects, it may have dozens of classes (such as
jar, table, footwear, etc.).
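The "male = 70%" example can be sketched with a probabilistic classifier; the customer attributes and their values below are invented for illustration:

```python
# Toy sketch: a classifier outputting a class probability such as "70% male".
from sklearn.linear_model import LogisticRegression

# Attributes: [height_cm, weight_kg]; label: 1 = male, 0 = female (invented data).
X = [[178, 80], [165, 55], [182, 90], [158, 50], [175, 78], [160, 52]]
y = [1, 0, 1, 0, 1, 0]

clf = LogisticRegression().fit(X, y)

# For a new, unknown customer the model returns class probabilities.
proba = clf.predict_proba([[180, 85]])[0]
print(f"female: {proba[0]:.0%}  male: {proba[1]:.0%}")
```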

Regression

When the output is a continuous value, the task is a regression. For instance,
a financial analyst may need to forecast the value of a stock based on a range
of features like equity, previous stock performances, and macroeconomic
indices. The system will be trained to estimate the price of the stocks with
the lowest possible error.

5.1.4 UNSUPERVISED LEARNING

In unsupervised learning, an algorithm explores input data

without being given an explicit output variable (e.g., exploring customer
demographic data to identify patterns). You can use it when you do not know
how to classify the data and want the algorithm to find patterns and classify
the data for you.

5.1.5 REINFORCEMENT LEARNING

Reinforcement learning is a subfield of machine learning in which

systems are trained by receiving virtual "rewards" or "penalties," essentially
learning by trial and error. Google's DeepMind has used reinforcement learning
to beat a human champion in the game of Go. Reinforcement learning is also
used in video games to improve the gaming experience by providing smarter
bots.

5.2 TENSORFLOW

The world's most famous deep learning library is Google's TensorFlow.

Google uses machine learning throughout its products to power its search
engine, translation, image captioning and recommendations. To give a concrete
example, Google users can experience a faster and more refined search with AI:
when a user types a keyword into the search bar, Google suggests what the next
word could be. Google wants to use machine learning on its large datasets to
give users the best experience. Three different groups use machine learning:

● Researchers.
● Data scientists.
● Programmers.

5.2.1 TENSORFLOW ARCHITECTURE

 Data pre-processing.
 Build the model.
 Train and estimate the model.

TensorFlow gets its name from the way it operates: it takes input in the form

of a multi-dimensional array, also known as a tensor. You can construct a
flowchart of operations (called a graph) to perform on that input. The input
enters at one end, flows through this system of multiple operations, and comes
out the other end as output. That is why it is called TensorFlow: the tensor
goes in, flows through a list of operations, and then comes out the other
side.

5.3 PYTHON OVERVIEW

Python is a high-level, interpreted, interactive and object-oriented

scripting language. Python is designed to be highly readable. It uses English
keywords frequently where other languages use punctuation, and it has fewer
syntactic constructions than other languages.

Python is interpreted: Python is processed at runtime by the interpreter. There

is no need to compile the program before executing it. This is similar to PERL
and PHP.

● Python is interactive: you can actually sit at a Python prompt and
communicate with the interpreter directly to write your programs.

● Python is object-oriented: Python supports an object-oriented style, or

programming method that encapsulates code within objects.

● Python is a beginner's language: Python is a great language for beginner-

level programmers and supports the development of a wide range of
applications, from simple text processing to WWW browsers to games.

History of Python

Python was developed in the late 1980s and early 1990s by Guido van Rossum at
the National Research Institute for Mathematics and Computer Science in the
Netherlands. Python is derived from many other languages, including ABC,
Modula-3, C, C++, Algol-68, SmallTalk, the Unix shell, and other scripting
languages. Python is copyrighted; like Perl, Python source code is now
available under the GNU General Public License (GPL). Python is now maintained
by a core development team at the institute, although Guido van Rossum still
plays a vital role in directing its progress.

5.4 ANACONDA NAVIGATOR


Anaconda Navigator is a desktop graphical user interface (GUI) included in
the Anaconda distribution that allows you to launch applications and easily
manage conda packages, environments, and channels without using command-line
commands. Navigator can search for packages on Anaconda Cloud or in a local
Anaconda repository. It is free for Linux, macOS, and Windows.

Many scientific packages depend on specific versions of other packages.

Data scientists frequently use multiple versions of numerous packages and use
a variety of environments to separate these different versions. The
command-line program conda is both a package manager and an environment
manager, helping data scientists ensure that each version of each package has
all the dependencies it requires and works correctly.

Navigator is an easy, point-and-click way to work with packages and

environments without needing to type conda commands in a terminal window.

You can use it to find the packages you want, install them in an
environment, run the packages, and update them, all inside Navigator.

The following applications are available by default in Navigator:

 JupyterLab
 Jupyter Notebook
 Qt Console
 Spyder
 VS Code
 Glueviz
 Orange 3 App
 Rodeo
 RStudio
Advanced conda users can also build their own custom Navigator applications.
How do you run code in Navigator? The simplest way is with Spyder: from the
Navigator Home tab, click Spyder to write and execute your code. You can
also use Jupyter Notebook the same way. Jupyter Notebook is a widely used
system that combines code, explanatory text, output, figures, and interactive
interfaces into a single notebook file that is edited, viewed, and used in a
web browser.

CHAPTER 6

TESTING

6.1 TESTING
● Software testing is an investigation conducted to provide stakeholders with
information about the quality of the tested product or functionality. Software
testing also provides an objective, independent view of the software so that
the business can appreciate and understand the risks of operating the system.
Testing methods include, but are not limited to, the process of executing
programs with the intent of finding software bugs. Software testing can also
be stated as the process of validating and verifying that a software program
works as expected.

Using NLP

NLP (Natural Language Processing) is a field of computer science and artificial


intelligence that deals with the interaction between computers and human
language. NLP aims to enable computers to understand, interpret, and generate
human language in a way that is natural and intuitive for people.

NLP techniques are used in a wide range of applications, including:

1. Text classification: Used to categorize text data into predefined categories based on
its content.
2. Sentiment analysis: Used to determine the sentiment expressed in a piece of text,
such as positive, negative, or neutral.
3. Named entity recognition: Used to extract named entities from text, such as people,
organizations, and locations.
4. Part-of-speech tagging: Used to identify the parts of speech in a sentence, such as
nouns, verbs, and adjectives.

5. Summarization: Used to condense a large piece of text into a shorter, more concise
summary.
6. Question answering: Used to automatically answer questions posed in natural
language.
7. Machine translation: Used to automatically translate text from one language to
another.
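A minimal sketch can make one of these techniques, sentiment analysis (item 2 above), concrete. The word lists below are toy assumptions, not a real sentiment lexicon; production systems would use trained models or curated resources:

```python
# Minimal lexicon-based sentiment analysis sketch.
# The word sets below are illustrative toys, not a real sentiment lexicon.
POSITIVE = {"good", "great", "excellent", "helpful", "recovered"}
NEGATIVE = {"bad", "poor", "terrible", "sick", "worse"}

def sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The treatment results were excellent and the patient recovered"))  # → positive
print(sentiment("The patient felt worse and the symptoms were terrible"))           # → negative
```

Real systems replace the hand-written lexicon with a classifier trained on labelled text, but the input/output contract is the same: raw text in, a sentiment label out.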

NLP is a rapidly growing field, and recent advances in deep learning have led to
significant improvements in NLP performance. However, NLP still faces many
challenges, including dealing with ambiguity and context, understanding sarcasm
and irony, and handling different languages and dialects.

Overall, NLP has the potential to transform the way people interact with computers
and to enable new applications in areas such as customer service, e-commerce, and
information retrieval.

Using AdaBoost, XGBoost, and CatBoost

Boosting algorithms are among the most powerful machine learning algorithms,
offering the highest performance and accuracy. All boosting algorithms work by
learning from the errors of previously trained models and try to avoid repeating
the mistakes made by the earlier weak learners.

It is also a common question in data science interviews. This section describes
the main differences between the GradientBoosting, AdaBoost, XGBoost, CatBoost,
and LightGBM algorithms, along with their working mechanics and mathematics.

Gradient boosting

Gradient Boosting works on the principle of stagewise addition: multiple weak
learners are trained on the same data set, and their outputs are added together
to form one strong learner as the final model. In gradient boosting, the first
weak learner is not trained on the dataset; it simply returns the mean of the
target column. The residuals of the first weak learner's output are then
computed and used as the target for the next weak learner to train on.

Following the same pattern, each subsequent weak learner is trained and new
residuals are computed, which serve as the targets for the next weak learner. In
this way, the process continues until the residuals approach zero.

For gradient boosting, the data set must be in the form of numeric or categorical
data, and the loss function used to compute the residuals must be differentiable
at all points.
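The stagewise residual-fitting loop described above can be sketched in plain Python. The stump weak learner, the toy data, and the learning rate below are illustrative assumptions, not the report's actual model:

```python
# Toy gradient boosting for regression with squared loss (illustrative sketch).
# Weak learner: a depth-1 "stump" that predicts the residual mean on each side
# of the best threshold over x.

def fit_stump(x, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def boost(x, y, rounds=30, lr=0.5):
    pred = [sum(y) / len(y)] * len(y)      # first "learner": just the mean
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals)
        # Add the scaled stump output; residuals shrink round by round.
        pred = [pi + lr * (lv if xi <= t else rv) for pi, xi in zip(pred, x)]
    return pred

x = [1, 2, 3, 4]
y = [1.0, 2.0, 3.0, 4.0]
print([round(p, 2) for p in boost(x, y)])
```

After enough rounds the combined predictions closely track the targets, which is exactly the "sum of weak learners trained on residuals" idea stated above.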

XGBoost

XGBoost is a distributed gradient boosting library designed to be fast and
scalable when training machine learning models. It is an ensemble learning
method that strengthens predictions by combining those of several weak models.
XGBoost stands for "Extreme Gradient Boosting" and is one of the most popular
and widely used machine learning algorithms, due to its ability to handle large
data sets and its use in many machine learning tasks such as classification and
regression.

One of the main features of XGBoost is efficient missing value handling.
This allows you to handle real data with missing values without the need for
significant preprocessing. Additionally, XGBoost includes built-in parallel
processing support, enabling models to be trained on sizable datasets quickly.
Applications for XGBoost include click-through rate prediction, recommendation
engines, and Kaggle competitions, among others. Additionally, its high degree of
customizability lets you fine-tune various model parameters to enhance
performance.

Extreme Gradient Boosting, or XGBoost, was proposed by researchers at the
University of Washington. It is a C++ library designed to speed up gradient
boosting training.

To understand XGBoost, you first need to understand trees, particularly decision
trees.

Decision tree:

A decision tree is a flowchart-like tree structure where each internal node
specifies a test on an attribute, each branch represents an outcome of the test,
and each leaf node (terminal node) holds a class label. The tree is "learned" by
splitting the source set into subsets based on attribute value tests. Recursive
partitioning, a method that repeats this process for each derived subset, is
used here. Recursion finishes when every node in a subset shares the same value
for the target variable, or when splitting adds no further value to the
prediction.
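The recursive partitioning just described can be sketched as follows. The symptom attributes and labels are hypothetical toy data, and the impurity measure (Gini) is one common choice of split criterion:

```python
# Minimal recursive-partitioning sketch of decision-tree learning (illustrative).
# Each internal node tests one attribute; recursion stops when a subset is pure
# or no attributes remain. Unseen attribute values are not handled in this toy.

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1 or not attrs:          # pure subset → leaf
        return max(set(labels), key=labels.count)
    def split_score(a):                              # weighted Gini after split
        total = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            total += len(sub) / len(labels) * gini(sub)
        return total
    best = min(attrs, key=split_score)               # best attribute to test
    node = {"attr": best, "branches": {}}
    for v in set(r[best] for r in rows):             # recurse on each subset
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        node["branches"][v] = build_tree(
            sub_rows, sub_labels, [a for a in attrs if a != best])
    return node

def predict(node, row):
    while isinstance(node, dict):                    # walk from root to a leaf
        node = node["branches"][row[node["attr"]]]
    return node

# Hypothetical symptom data: attributes 'fever' and 'cough'.
rows = [{"fever": "yes", "cough": "yes"}, {"fever": "yes", "cough": "no"},
        {"fever": "no", "cough": "yes"}, {"fever": "no", "cough": "no"}]
labels = ["flu", "flu", "cold", "healthy"]
tree = build_tree(rows, labels, ["fever", "cough"])
print(predict(tree, {"fever": "yes", "cough": "no"}))  # → flu
```

On this toy data the tree first splits on 'fever' (the lower-impurity split) and then on 'cough', mirroring the recursive-partitioning description above.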

CatBoost

CatBoost is a boosting algorithm that works very well with categorical datasets,
as it has a special method for handling them: categorical features are encoded
based on the output (target) columns, so the target statistics are taken into
account during encoding, which improves accuracy on categorical datasets. The
main difference that makes CatBoost stand out from the rest is how its decision
trees grow: the grown trees in CatBoost are symmetric. The library can be
installed easily with pip install catboost.
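The target-based encoding idea can be illustrated with a simplified sketch. Note this greedy, smoothed mean encoding is only an approximation of CatBoost's ordered target statistics, and the category names and labels below are hypothetical:

```python
# Simplified target (mean) encoding of a categorical feature (illustrative).
# CatBoost itself uses ordered target statistics; this greedy version with
# prior smoothing just shows the core idea: encode a category by its target mean.

def target_encode(categories, targets, prior_weight=1.0):
    prior = sum(targets) / len(targets)          # global target mean
    sums, counts = {}, {}
    for c, t in zip(categories, targets):
        sums[c] = sums.get(c, 0.0) + t
        counts[c] = counts.get(c, 0) + 1
    # Blend each category's mean with the prior to stabilise rare categories.
    return [(sums[c] + prior_weight * prior) / (counts[c] + prior_weight)
            for c in categories]

cats = ["smoker", "smoker", "non-smoker", "non-smoker", "non-smoker"]
disease = [1, 1, 0, 1, 0]
print(target_encode(cats, disease))
```

Each category is replaced by a number that already carries information about the target column, which is why this family of encodings helps boosted trees on categorical data.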

Comparing different algorithms

After fitting the data to the model, all algorithms give roughly similar results. Here
LightGBM seems to perform poorly compared to other algorithms, but XGBoost
works well in this case.

To visualize the performance of all algorithms on the same data, we can also
plot y_test against y_pred for each algorithm.
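Besides plotting, the models can be compared numerically on the same held-out data. The prediction lists below are hypothetical stand-ins for the outputs of fitted models, chosen only to illustrate the comparison:

```python
# Comparing several models' predictions against y_test with a shared metric.
# The prediction lists are hypothetical; in practice they come from fitted models.

def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_test = [3.0, 5.0, 7.0, 9.0]
preds = {
    "GradientBoosting": [2.8, 5.1, 7.2, 8.7],
    "XGBoost":          [3.0, 4.9, 7.1, 9.0],
    "LightGBM":         [2.0, 5.8, 6.1, 9.9],
}
# Print models from best (lowest MSE) to worst.
for name, y_pred in sorted(preds.items(), key=lambda kv: mse(y_test, kv[1])):
    print(f"{name}: MSE = {mse(y_test, y_pred):.3f}")
```

A shared metric like MSE makes the visual comparison of y_test versus y_pred quantitative: the lower the error, the closer that model's points lie to the diagonal.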

Using RNNs for training

The network receives one input per time step. The current state is then computed
from the current input and the previous state. The current state ht becomes
ht-1 at the next time step. Depending on the problem, we can take any number of
time steps and combine information from all previous states. Once all time steps
are complete, the final state is used to compute the output. The output is then
compared with the actual target output, producing an error. The error is then
back-propagated through the network to update the weights and train the network
(RNN).
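The forward pass described above can be sketched with a single-unit RNN. The weights, inputs, and target below are arbitrary illustrative values, and the weight-update (training) step is omitted:

```python
import math

# Minimal forward pass of a single-unit RNN (illustrative weights).
# h_t = tanh(w_x * x_t + w_h * h_{t-1} + b); the final state gives the output.

def rnn_forward(xs, w_x=0.5, w_h=0.8, w_y=1.2, b=0.0):
    h = 0.0                                   # initial state h_0
    for x in xs:                              # one input per time step
        h = math.tanh(w_x * x + w_h * h + b)  # current state from input + prior state
    return w_y * h                            # output computed from the final state

y = rnn_forward([1.0, 0.5, -0.25])
error = (1.0 - y) ** 2                        # squared error vs. a target of 1.0
print(round(y, 4), round(error, 4))
```

Training would back-propagate this error through every time step (backpropagation through time) to update w_x, w_h, w_y, and b.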

A software program, application, or product under test:

● Meets the business and technical requirements that guided its design and
development.

● Works as expected and behaves consistently.

TESTING METHODS:

1. Functional Testing
Functional tests provide systematic demonstrations that functions tested are
available as specified by the business and technical requirements, system
documentation, and user manuals.

Functional testing is centered on the following items:

● Functions: identified functions must be exercised.

● Output: identified classes of software outputs must be exercised.

● Systems/Procedures: interfacing systems or procedures must be invoked and
work properly.

2. Integration Testing

Software integration testing is the incremental integration testing of two or
more integrated software components on a single platform, to expose failures
caused by interface defects.

Test Case for Excel Sheet Verification:

Here in machine learning we are handling a dataset stored in an Excel sheet, so
whenever values are required we first need to check the Excel file. Later,
classification will use the particular columns of the dataset.
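One way to script this check is to verify the file's header row before classification touches its columns. The report's data lives in an Excel sheet; the sketch below uses CSV via the standard library to stay dependency-free, and the column names are hypothetical:

```python
import csv
import io

# Sketch of the dataset-verification test case: before training, confirm the
# sheet contains every column the classifier expects. The required column names
# below are hypothetical examples.

REQUIRED_COLUMNS = {"fever", "cough", "fatigue", "disease"}

def verify_columns(csv_text):
    """Return (ok, missing) for the header row of the given CSV content."""
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = REQUIRED_COLUMNS - set(h.strip() for h in header)
    return (not missing, missing)

sample = "fever,cough,fatigue,disease\nyes,no,yes,flu\n"
ok, missing = verify_columns(sample)
print(ok, missing)  # → True set()
```

For a real Excel file the same check could be run on the sheet's first row (e.g. after loading it with a spreadsheet library), failing fast with the names of any missing columns.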

Test Case 1:

Code:

Output:

System Requirements:-

 Hardware:
1. OS – Windows 7, 8 and 10 (32 and 64 bit)
2. RAM – 4GB
 Software:
1. Python / Anaconda Navigator
2. Jupyter Notebook (Python)

CHAPTER –7
RESULT
Multiple disease prediction using machine learning can have several positive
outcomes. Here are some potential results:

Real-time information dissemination: ML can analyze and process large amounts
of data from various sources, such as news articles, social media posts, and
scientific literature. This allows for the rapid extraction of relevant information
about infectious disease outbreaks. Health education systems based on ML can
provide up-to-date information about the outbreak, including symptoms,
preventive measures, and treatment options. This timely dissemination of accurate
information can help in controlling the spread of the disease and minimizing
misinformation.

Personalized recommendations: ML can analyze individual data, such as
symptoms, medical history, and demographic information, to provide personalized
recommendations for individuals. This can include advice on self-care, when to
seek medical attention, or specific preventive measures based on the individual's
circumstances. Personalized recommendations can enhance the effectiveness of
health education by tailoring the information to the specific needs of each person.

Language and cultural sensitivity: ML can be used to overcome language barriers
by providing health education materials in multiple languages. It can also take into
account cultural nuances and adapt the content accordingly. This ensures that the

information is accessible and relevant to individuals from diverse linguistic and
cultural backgrounds, improving their understanding and compliance with
preventive measures.

Early detection and monitoring: ML algorithms can analyze social media posts,
online forums, and other digital platforms to detect early signals of an infectious
disease outbreak. By monitoring trends and detecting relevant keywords or
patterns, health education systems can provide early warnings and alerts to the
public and healthcare authorities. Early detection can help initiate timely
responses, such as increased surveillance, contact tracing, or vaccination
campaigns.

Data-driven decision-making: ML can assist in analyzing large datasets, including
clinical data, epidemiological reports, and public health records. By extracting
insights from these data, health education systems can provide evidence-based
recommendations for policymakers, healthcare professionals, and the general
public. This data-driven approach can lead to more effective strategies in
preventing and controlling infectious disease outbreaks.

It's important to note that the effectiveness of health education based on ML
depends on the quality and accuracy of the underlying data, as well as the
robustness of the algorithms and systems in place. Additionally, privacy concerns
and ethical considerations should be addressed to ensure the responsible use of
personal data in ML-based health education systems.

CHAPTER – 8
CONCLUSION AND DISCUSSION
8.1 CONCLUSION
A widespread infectious disease has a negative impact on human life and the
global economic infrastructure. Recovery from such an outbreak consumes a
tremendous amount of time and resources and can take decades. Containment is the
first step in dealing with an infectious disease outbreak. In such cases, speed
is of the essence, because any delay could result in the exponential destruction
of both the economy and human life. In order to predict the potential pace of
escalation of an infectious disease, governments and health ministries all over
the world must always be one step ahead. The majority of countries are
ill-equipped to deal with such unforeseen outbreaks. Predictive modelling brings
about a revolutionary transformation, since it can serve as the first line of
defense in containing an illness or sickness in its early stages.

8.2 FUTURE WORK:

Future work can be extended and made more precise by adding further components
and modules, including tracking regional traffic and international flight data.
To understand more about a current epidemic, advanced ML methods and machine
learning algorithms can be utilized in conjunction with large amounts of data.
With features like periodic extraction, analysis, and prediction, the entire
model can be fully automated.

CHAPTER-9

APPENDIX

