Kali Charan Nigam Institute of Technology Banda
(U.P)
SESSION 2024-25
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
A MAJOR PROJECT SYNOPSIS (KCS-753)
On
“Multiple Diseases Prediction System”
Submitted in partial fulfillment of the requirements for the degree
Of
Bachelor of Technology
In
Computer Science & Engineering
Under the Guidance of
“Mr Abhishek Tiwari”
Submitted To:
Mr. Vivek Tripathi
Submitted By:
Nainshi Chaurasiya : 2101390100047
Nikhil Kumar : 2101390100048
Aman Yadav : 2101390100006
CONTENTS
1.INTRODUCTION
1.1 OBJECTIVES
1.2 SCOPE OF THE PROJECT
2.REQUIREMENT SPECIFICATION
2.1SOFTWARE REQUIREMENTS
2.2HARDWARE REQUIREMENTS
3. TECHNOLOGIES
4.ALGORITHM
5.DESIGNING
5.1MODULES
5.2MODULES DESCRIPTIONS
5.3DFD(Data Flow Diagram)
6.FUTURE SCOPE
7.REFERENCES
1.INTRODUCTION
Healthcare is a critical domain where early diagnosis and treatment can save lives. The Multiple
Diseases Prediction System is a machine-learning-based application that leverages data from
patients (age, gender, symptoms, medical test results, etc.) to predict possible diseases and
recommend appropriate specialists or further tests.
The world is going through a purple patch of technology in which the demand for intelligence
and accuracy keeps rising. Many people today are heavily dependent on the internet, yet they
pay little attention to their physical health.
People often ignore minor health problems and do not visit a hospital, and these problems can
turn into serious diseases over time. Taking advantage of this growing technology, our basic
aim is to develop a system that predicts multiple diseases from the symptoms entered by
patients, without requiring a visit to a hospital or physician.
Machine Learning is a subset of AI that mainly deals with the study of algorithms that
improve with the use of data and experience. A machine learning workflow has two phases:
training, in which a model is fitted to known data, and testing, in which the fitted model is
evaluated on data it has not seen. Machine Learning provides an efficient platform in the
medical field for addressing various healthcare issues at a much faster rate.
Two major kinds of Machine Learning are Supervised Learning and Unsupervised
Learning. In supervised learning, a model is built from well-labelled data. In
unsupervised learning, by contrast, the model learns from unlabelled data.
The intent is to identify a Machine Learning algorithm that is efficient and
accurate for the prediction of disease. In this project, the supervised Machine Learning
approach is used for predicting diseases.
The main feature of the system is its use of machine learning algorithms, which support
early and accurate prediction of diseases and better patient care.
1.1 OBJECTIVES
To develop a predictive system using machine learning to diagnose multiple diseases based
on user input.
To evaluate and compare the performance of different machine learning models for disease
prediction.
To provide a tool for healthcare professionals that can assist in identifying multiple diseases
in a patient with high accuracy.
1.2 SCOPE OF THE PROJECT
Disease Coverage: The system will predict a set of common diseases such as Diabetes, Heart
Disease, Cancer, Hypertension, and Pneumonia based on available data.
Data Input: The model will take input data in the form of patient symptoms, medical history,
demographic details (age, gender, etc.), and clinical test results.
Prediction Type: The system will output a probability for each disease, indicating the
likelihood of the patient having it.
2.REQUIREMENT SPECIFICATION
2.1SOFTWARE REQUIREMENTS
User Interface Designing: HTML, CSS, JavaScript, Streamlit
Programming Language: Python, Machine Learning
Operating System: Windows
IDE: Jupyter / VS Code
2.2HARDWARE REQUIREMENTS
Processor: Intel Core i3
Hard Disk: 128 GB or more
RAM: 1 GB or more
Other: Keyboard, Mouse, Monitor, etc.
3. TECHNOLOGIES
Python: Python is a high-level, interpreted programming language known for its simplicity and
readability. It was created by Guido van Rossum and first released in 1991. Python has become
one of the most popular programming languages in the world due to its versatility, ease of use,
and wide range of applications.
Machine Learning: Machine learning (ML) is a type of artificial intelligence (AI) that provides
systems the ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on developing computer programs that can access data
and use it to learn for themselves.
Machine Learning Library:
Scikit-Learn: A powerful library for various machine learning algorithms, including
classification, regression, clustering and model selection.
Data Manipulation Libraries:
NumPy: For numerical computations and array operations.
Pandas: For data analysis and manipulation.
Data Visualization Libraries:
Matplotlib: For creating static, animated, and interactive visualizations.
Seaborn: A high-level data visualization library based on Matplotlib.
Web Development:
HTML: It defines the content and layout of a webpage using elements and tags. Provides
the skeleton or structure of a website.
CSS: CSS is a style sheet language used to control the appearance and layout of HTML
elements. Adds styling to a webpage (e.g., colours, fonts, spacing, and positioning).
JavaScript: JavaScript is a programming language used to add interactivity, behaviour,
and logic to web pages. It is also the basis of modern development frameworks such as
React, Angular, and Vue.js.
Web Development Framework:
Django: For building the web application through which the model is deployed.
Cloud Platform:
Google Cloud Platform / Amazon Web Services (AWS) / Microsoft Azure: For
deploying and scaling the model.
4.ALGORITHM
1. SVM(Support Vector Machine)
2. KNN(K-Nearest Neighbors)
3. Logistic Regression
4. Naïve Bayes
1.SVM(Support Vector Machine) Algorithm
A Support Vector Machine (SVM) is a powerful supervised machine learning algorithm primarily
used for classification tasks. It can also be employed for regression and outlier detection. The core
idea behind SVM is to find the optimal hyperplane that best separates data points into different
classes.
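The hyperplane idea can be illustrated with a minimal sketch, assuming a linear SVM trained by sub-gradient descent on the regularised hinge loss. This is only one simple way to fit such a model and is not necessarily how the project will train it; in practice a library implementation such as Scikit-Learn's SVC would normally be used. The function names, hyperparameters, and toy data are hypothetical.

```python
def train_linear_svm(rows, labels, learning_rate=0.01, reg=0.01, epochs=200):
    """Fit a linear SVM by sub-gradient descent on the regularised hinge loss.

    Labels must be +1 or -1. Returns the hyperplane as (weights, bias).
    """
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            if margin >= 1:
                # Point is on the correct side with enough margin:
                # only apply the regularisation shrinkage.
                weights = [w - learning_rate * reg * w for w in weights]
            else:
                # Point is inside the margin or misclassified:
                # shrink the weights and push the hyperplane away from it.
                weights = [w - learning_rate * (reg * w - y * xi)
                           for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def svm_predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Toy, made-up data: two already-centred features; -1 = healthy, +1 = disease.
rows = [[-2.0, -1.0], [-1.0, -1.5], [-1.5, -2.0],
        [1.0, 1.5], [2.0, 1.0], [1.5, 2.0]]
labels = [-1, -1, -1, 1, 1, 1]

weights, bias = train_linear_svm(rows, labels)
new_patient_class = svm_predict(weights, bias, [2.0, 2.0])
```

The sketch covers only the two-class, linearly separable case. Predicting several diseases would need one such classifier per disease or a multi-class scheme, and non-linear boundaries would need a kernel.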
2.K-Nearest Neighbors (KNN) Algorithm
The K-Nearest Neighbors (KNN) algorithm is a simple, supervised machine learning method used
for both classification and regression tasks. It's based on the principle that similar things tend to be
close to each other.
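A minimal sketch of the KNN classification rule follows, assuming Euclidean distance and a simple majority vote. The function name, the two features, and the toy values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up data: (fasting glucose in mg/dL, BMI) -> label.
train_points = [(85, 22.0), (90, 24.5), (95, 23.0),
                (150, 31.0), (165, 33.5), (180, 35.0)]
train_labels = ["healthy", "healthy", "healthy",
                "diabetic", "diabetic", "diabetic"]

prediction = knn_predict(train_points, train_labels, query=(160, 32.0), k=3)
```

Because KNN relies on distances, features measured on very different scales should be normalized first. Otherwise the feature with the largest numeric range dominates the vote.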
3. Logistic Regression Algorithm
Logistic regression is a popular supervised machine learning algorithm primarily used for binary
classification tasks. It predicts the probability of an instance belonging to a particular class based on
the input features.
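The probability output can be sketched as follows, assuming the standard sigmoid function and plain batch gradient descent on the log-loss. The function names, learning rate, number of epochs, and toy data are hypothetical; a library implementation would typically use a more robust optimizer.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # derivative of the log-loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Move against the average gradient.
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict_probability(weights, bias, x):
    """Probability that x belongs to the positive class (label 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy, made-up data: one feature scaled to [0, 1]; 1 = disease present.
rows = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(rows, labels)
high_risk = predict_probability(weights, bias, [0.9])
low_risk = predict_probability(weights, bias, [0.1])
```

A threshold, commonly 0.5, turns the probability into a yes/no prediction. For several diseases, one such model per disease is a simple option.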
4.Naive Bayes Algorithm
The Naive Bayes algorithm is a simple yet powerful supervised machine learning algorithm, primarily
used for classification tasks. It's based on Bayes' theorem with the "naive" assumption of
independence between features.
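A minimal sketch of this idea follows, assuming binary (present/absent) symptoms and Laplace smoothing, that is, a Bernoulli variant of Naive Bayes. The function names, the three symptoms, and the toy records are hypothetical.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, labels):
    """Estimate class priors and per-symptom likelihoods from 0/1 data."""
    n_features = len(samples[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for x, y in zip(samples, labels):
        class_counts[y] += 1
        for j, value in enumerate(x):
            feature_counts[y][j] += value
    model = {}
    for y, count in class_counts.items():
        prior = count / len(samples)
        # Laplace smoothing (+1 / +2) avoids zero probabilities.
        likelihoods = [(feature_counts[y][j] + 1) / (count + 2)
                       for j in range(n_features)]
        model[y] = (prior, likelihoods)
    return model


def predict_naive_bayes(model, x):
    """Return the normalised posterior probability of each class for x."""
    log_scores = {}
    for y, (prior, likelihoods) in model.items():
        log_score = math.log(prior)
        for value, p in zip(x, likelihoods):
            # The "naive" step: features are treated as independent,
            # so their log-likelihoods simply add up.
            log_score += math.log(p if value else 1.0 - p)
        log_scores[y] = log_score
    # Convert log-scores back to probabilities that sum to 1.
    top = max(log_scores.values())
    exp_scores = {y: math.exp(s - top) for y, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {y: s / total for y, s in exp_scores.items()}


# Toy, made-up records: [fever, cough, chest pain] -> disease.
samples = [[1, 1, 0], [1, 1, 0], [1, 0, 0],
           [0, 0, 1], [0, 1, 1], [0, 0, 1]]
labels = ["pneumonia"] * 3 + ["heart disease"] * 3

model = train_naive_bayes(samples, labels)
posterior = predict_naive_bayes(model, [1, 1, 0])
```

For continuous inputs such as test results, a Gaussian variant that models each feature with a per-class mean and variance is the usual alternative.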
5.DESIGNING
5.1MODULES
Data Collection Module
Data Preparation Module
Model Training Module
Disease Prediction Module
5.2MODULES DESCRIPTIONS
Data Collection and Preprocessing Module
Data Sources:
Collect data from various sources, including medical records, research papers, and public
health databases.
Ensure data quality, accuracy, and completeness.
Data Cleaning:
Handle missing values using techniques like imputation or deletion.
Identify and remove outliers or inconsistencies.
Data Preprocessing:
Normalize or standardize numerical features to a common scale.
Encode categorical features using appropriate techniques (e.g., one-hot encoding, label
encoding).
Split the data into training and testing sets.
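The cleaning and preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, assuming mean imputation, min-max scaling, one-hot encoding, and a seeded random split; the helper names and sample values are hypothetical, and a real pipeline would more likely rely on Pandas and Scikit-Learn.

```python
import random


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def train_test_split(rows, test_ratio=0.25, seed=42):
    """Shuffle the rows reproducibly and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


ages = impute_mean([25, None, 45, 50])      # -> [25, 40.0, 45, 50]
scaled_ages = min_max_scale(ages)           # -> [0.0, 0.6, 0.8, 1.0]
genders = one_hot(["F", "M", "F", "M"])     # -> [[1, 0], [0, 1], [1, 0], [0, 1]]
train_rows, test_rows = train_test_split(list(range(8)))
```

To avoid leaking information from the test set, the imputation mean and the scaling range should be computed on the training split only and then reused for the test split.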
Model Selection and Training Module
Algorithm Selection:
Choose appropriate machine learning algorithms based on the problem and dataset
characteristics.
Consider algorithms like Logistic Regression, Decision Trees, Random Forest, Support
Vector Machines, XGBoost, and Neural Networks.
Model Training:
Train the selected models on the pre-processed training data.
Tune hyperparameters to optimize model performance.
Use techniques like cross-validation to assess model generalization.
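Cross-validation can be sketched as follows, assuming k contiguous folds and accuracy as the score. The `fit` and `predict` callables are hypothetical placeholders for any of the algorithms above; the majority-class model is used here only to keep the example short.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples)
                     if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def cross_validate(rows, labels, fit, predict, k=5):
    """Return the mean accuracy of a model over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, rows[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(rows, labels):
    """Baseline 'model': remember the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    """Always predict the remembered label."""
    return model


# Toy, made-up data: the last fold contains only the rare class.
rows = list(range(10))
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
mean_accuracy = cross_validate(rows, labels, fit_majority, predict_majority, k=5)
```

In this toy run the last fold scores zero because it holds only the rare class, which is why data is usually shuffled or stratified before the folds are formed.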
Deployment Module
Model Integration:
Integrate the trained model into the web application.
Deployment Platform:
Deploy the web application on a cloud platform (e.g., Heroku, AWS, GCP) for
accessibility.
User Interface Module
User Input:
Provide a questionnaire or direct input fields for users to enter their symptoms.
Validate user input to ensure accuracy and completeness.
Prediction Display:
Display the predicted diseases and their probabilities in a clear and understandable
format.
Provide additional information or recommendations based on the predictions.
User Feedback:
Implement a feedback mechanism to collect user feedback and improve the system.
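The input validation and prediction display described above could look like the following minimal sketch. The field names, the age range, and the output format are assumptions made for illustration and are not fixed by this synopsis.

```python
def validate_input(form, required_fields):
    """Return a list of error messages; an empty list means the input is usable."""
    errors = []
    for field in required_fields:
        if form.get(field) in (None, ""):
            errors.append(f"{field} is required")
    age = form.get("age")
    if age not in (None, "") and not (
        isinstance(age, (int, float)) and 0 < age < 120
    ):
        errors.append("age must be a number between 0 and 120")
    return errors


def format_predictions(probabilities, top_n=3):
    """Sort diseases by probability and render them as readable lines."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [f"{disease}: {prob:.0%}" for disease, prob in ranked[:top_n]]


# Made-up model output for one patient.
predicted = {"Diabetes": 0.72, "Hypertension": 0.18,
             "Pneumonia": 0.05, "Cancer": 0.03}
lines = format_predictions(predicted, top_n=2)
errors = validate_input({"age": 150, "symptoms": ""}, ["age", "symptoms"])
```

Returning all validation errors at once, rather than stopping at the first, lets the interface show the user everything that needs correcting in a single pass.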
5.3DFD(Data Flow Diagram)
The DFD is also called a bubble chart. It is a simple graphical formalism that represents
a system in terms of the input data to the system, the processing carried out on this data,
and the output data generated by the system.
The data flow diagram (DFD) is one of the most important modelling tools. It is used to
model the system components: the system processes, the data used by the processes, the
external entities that interact with the system, and the information flows within the
system.
A DFD shows how information moves through the system and how it is modified by a
series of transformations. It is a graphical technique that depicts information flow and the
transformations that are applied as data moves from input to output.
A DFD may be used to represent a system at any level of abstraction. It may be partitioned
into levels that represent increasing information flow and functional detail.
Level 0 DFD:
[Figure: User -> Predict disease -> Multiple diseases prediction system]
Level 1 DFD:
[Figure: The user (U1) enters details and symptoms. The user data is transferred and
processed, the algorithm is applied using the datasets, and the predicted result (the
predicted diseases) is displayed to the user.]
6.FUTURE SCOPE
Enhance the functionality of the prediction engine by providing the details of the 5 nearest
hospitals or medical facilities to the location entered by the user.
Provide a user account that allows users to keep track of their medical test data and
get suggestions or support for finding the right specialists or the tests to be taken.
Provide admin controls to upload or delete the datasets used to train the model.
Automate the process of training the model and exporting a pickle file of the trained model,
which will be consumed by the APIs to predict the disease.
Mail a detailed report of the prediction engine's results, along with the location and
contact information of the 5 nearest facilities.
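The pickle-based hand-off between training and the prediction API mentioned above can be sketched with Python's standard pickle module. The dictionary below is only a stand-in for a trained model, and the file name and helper names are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: a dictionary of made-up parameters.
trained_model = {
    "weights": [0.4, 0.4],
    "bias": -0.1,
    "classes": ["healthy", "diabetic"],
}


def save_model(model, path):
    """Serialise the model to a pickle file after training."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load the model back, e.g. when the prediction API starts up."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round trip through a temporary directory.
model_path = os.path.join(tempfile.mkdtemp(), "disease_model.pkl")
save_model(trained_model, model_path)
restored_model = load_model(model_path)
```

Pickle files can execute arbitrary code when loaded, so the API should only load model files produced by the project's own training process.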
Improved Accuracy:
Use larger datasets for better model training.
Incorporate advanced models like deep learning.
Multi-Language Support:
Add support for regional languages to increase accessibility.
Integration with Healthcare Systems:
Integrate the prediction system into hospital management systems to make predictions in
real time.
Handling Imbalanced Datasets:
Implement advanced techniques like SMOTE (Synthetic Minority Over-sampling
Technique) to balance classes in datasets.
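The core idea of SMOTE can be sketched as follows: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified illustration, not the full published algorithm or a library implementation; the function name, parameters, and toy data are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating between neighbours.

    `minority` must contain at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the chosen sample.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up minority-class samples with two scaled features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]]
new_samples = smote_oversample(minority, n_synthetic=5)
```

Oversampling of this kind should be applied to the training split only, so that the test set still reflects the real class distribution.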
Deep Learning Models:
Experiment with deeper neural networks or use reinforcement learning for continuous
improvement of the prediction model.
7.REFERENCES
1. “Artificial Intelligence in Healthcare” by Parashar Shah and Priti Dube.
2. “Healthcare Data Analytics” by Chandan K. Reddy and Kennesaw State University.
3. “Deep Learning for Healthcare” by A.K. Sharma.
4. www.stackoverflow.com
5. Guide “Mr. Abhishek Tiwari.”