ML-Based Multi-Disease Prediction
3 authors, including:
Sifatullah Siddiqi
Jawaharlal Nehru University
All content following this page was uploaded by Sifatullah Siddiqi on 11 June 2024.
This work is licensed under Creative Commons Attribution 4.0 International License.
International Journal of Engineering and Management Research e-ISSN: 2250-0758 | p-ISSN: 2394-6962
Volume-13, Issue-3 (June 2023)
consumption. Furthermore, several prevailing systems rely on limited parameters, leading to potentially erroneous outcomes[8].

1.3 Proposed System
We present an innovative solution that revolutionizes disease prediction in healthcare analysis. Our proposed system transcends the conventional approach by enabling the simultaneous prediction of multiple diseases. By consolidating diverse analyses into a single unified platform, users can efficiently access accurate predictions for various conditions. With a focus on enhancing both accuracy and efficiency, our model considers a comprehensive set of parameters, ensuring reliable results. By eliminating the need for multiple models and streamlining the prediction process, our system holds the potential to significantly improve healthcare outcomes while optimizing resource allocation[9].

To implement multiple disease analyses, we will utilize machine learning algorithms and the Streamlit framework. When accessing the web application, users can select the specific disease they wish to predict and input the corresponding parameters. Streamlit will then invoke the appropriate model and provide the patient's status as the output. This research contributes to the advancement of healthcare analytics, providing a unique and holistic approach to disease prediction that has the capacity to transform patient care on a global scale[8,9,10].

II. BACKGROUND

The field of healthcare has witnessed significant advancements in recent years, thanks to the emergence of machine learning techniques. With the growing availability of health data and increasing computing power, machine learning has become a powerful tool for predicting and diagnosing various diseases[11,12,13].

The benefits of employing machine learning for multiple disease prediction are numerous. Firstly, it enables healthcare professionals to identify individuals who are at a higher risk of developing multiple diseases, facilitating early intervention and preventive measures. Secondly, it aids in optimizing healthcare resource allocation by prioritizing high-risk patients and ensuring timely interventions. Furthermore, machine learning algorithms can assist in the identification of disease patterns and risk factors, contributing to the development of targeted public health strategies[14].

While significant progress has been made in the field of multiple disease prediction using machine learning, there are still challenges that need to be addressed. These include the availability and quality of health data, ensuring patient privacy and data security, and the interpretability and explainability of the predictive models. Additionally, the integration of machine learning algorithms into existing healthcare systems requires careful consideration of regulatory frameworks, ethical guidelines, and healthcare workflows[15].

In conclusion, the application of machine learning in multiple disease prediction holds immense potential for revolutionizing healthcare. By harnessing the power of these algorithms, healthcare providers can proactively identify individuals at risk, enhance diagnosis accuracy, and optimize treatment strategies. However, it is crucial to address the challenges associated with data quality, privacy, interpretability, and regulatory compliance to ensure the successful implementation of machine learning-based predictive models in healthcare settings.

Several tools and technologies were used to develop this project[16].

Tools Used:
1: Kaggle - A platform that provides access to diverse datasets.
2: Google Colaboratory - A data analysis and machine learning tool.
3: Anaconda - A distribution that aims to simplify package management and deployment.
4: Spyder IDE - An open-source, cross-platform integrated development environment.
5: Streamlit Cloud - A service to deploy, manage, and share apps directly from Streamlit.

Technologies Used:
1: Python - A dynamically typed, high-level, general-purpose programming language.
2: NumPy - A Python library that adds support for large, multi-dimensional arrays and matrices.
3: Pandas - A software library written for the Python programming language for data manipulation and analysis.
4: Sklearn - A free software machine learning library for the Python programming language.
5: Machine Learning Algorithms - Supervised learning is the type of machine learning in which machines are trained using well-labelled training data and, on the basis of that data, predict the output.
6: Pickle - The Python pickle module is used for serializing and de-serializing a Python object structure.
7: Streamlit - A free, open-source framework to rapidly build and share machine learning web apps.

III. SYSTEM ANALYSIS

3.1 Functional Requirement
The system allows the patient to predict the different diseases.
The user adds disease-specific strategies, and training models based on the user's strategy are published.
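The selection flow described in Section 1.3, in which the user picks a disease, enters its parameters, and the application invokes the matching model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `SAVED_MODELS` and `predict_status`, the two-parameter inputs, and all weight values are hypothetical, and the Streamlit user interface is left out so that the dispatch logic stands alone. Each model is stored with the pickle module, as listed under Technologies Used.

```python
import math
import pickle

# Hypothetical logistic-regression parameters per disease. The values are
# illustrative only and are not taken from the paper. Each entry is pickled,
# standing in for a trained model saved to disk after training.
SAVED_MODELS = {
    "diabetes": pickle.dumps({"weights": [0.04, 0.02], "bias": -6.0}),
    "heart disease": pickle.dumps({"weights": [0.03, 0.01], "bias": -5.0}),
}


def predict_status(disease, parameters):
    """Load the model for the chosen disease and return the patient's status."""
    if disease not in SAVED_MODELS:
        raise ValueError(f"No model available for {disease!r}")

    # De-serialize the stored model for the selected disease.
    model = pickle.loads(SAVED_MODELS[disease])
    if len(parameters) != len(model["weights"]):
        raise ValueError("Unexpected number of input parameters")

    # Logistic model: probability = sigmoid(bias + weights . parameters).
    score = model["bias"] + sum(
        w * x for w, x in zip(model["weights"], parameters)
    )
    probability = 1.0 / (1.0 + math.exp(-score))
    return "positive" if probability >= 0.5 else "negative"


print(predict_status("diabetes", [150.0, 30.0]))
print(predict_status("diabetes", [90.0, 20.0]))
```

In a web front end, the disease name would come from a selection widget and the parameters from input fields; keeping one saved model per disease lets new diseases be added by registering another entry.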
3.2 Non-Functional Requirements
The website will provide many benefits during the forecast period.
The website must be reliable and consistent.

IV. SYSTEM MODEL

For developing this project, we have used a supervised machine learning model. Supervised learning is the simplest machine learning approach to understand: the input data, called training data, has a known label or result as its output, so it works on the principle of input-output pairs. It requires creating a function that can be trained using a training data set and is then applied to unknown data to make predictions. Supervised learning is task-based and tested on labelled data sets[17].

We can implement a supervised learning model on simple real-life problems. We have employed machine learning models to predict the likelihood of different diseases based on user-input symptoms. To ensure accurate predictions, we selected different machine learning algorithms for each disease, considering their accuracy[18].

For each disease, we carefully chose a specific machine learning algorithm that best captures the patterns and relationships between symptoms and diseases. These algorithms include logistic regression, support vector machines (SVM), and K-Nearest Neighbours (KNN). The selection was based on their ability to effectively analyse the dataset and provide accurate predictions[19].

By leveraging different machine learning algorithms for different diseases and selecting the most accurate models, our project aims to provide reliable disease predictions, supporting healthcare professionals and users in early detection and intervention.

V. EXPERIMENT

5.1 Hypothesis Generation
The hypothesis for the Multiple Disease Prediction System is that, by analysing general medical data with advanced machine learning algorithms, it is possible to accurately predict the likelihood of individuals acquiring specific diseases. The hypothesis assumes that there are underlying patterns and relationships within the medical data that can be leveraged to develop a robust predictive model.

5.2 Collection of Data
To initiate this project, we began by collecting data from various sources. We utilized Kaggle as a platform to import relevant datasets, which serve as valuable resources for practice, research, and as a foundation for constructing machine learning models. These curated datasets provide a solid starting point, offering a diverse range of information that can be leveraged to train and validate our prediction system accurately.

5.3 Data Pre-Processing / Removal of Unwanted Data
The collected data serves various purposes, and as it is sourced from diverse platforms, it can contain a substantial amount of information. However, this imported data may also include unwanted or noisy elements that require pre-processing. The primary objective of data pre-processing is to refine the dataset by removing irrelevant or redundant information, addressing missing values, and handling outliers or noise. This step ensures that only the necessary and high-quality data is retained for further analysis[20].

5.4 Feature Selection
Feature selection is a critical step in the data analysis process, aimed at identifying and selecting the most relevant and informative features from a dataset. With the abundance of available data, feature selection plays a crucial role in enhancing the performance and efficiency of machine learning models. We have used statistical measures and correlation analysis techniques to assess the importance and relevance of each feature[21].

By performing feature selection, we aim to streamline the input data and retain only the most valuable features for our predictive models. This process helps us focus on the most influential factors and ensures that our model's predictions are based on the most meaningful and impactful variables. Ultimately, feature selection enables us to improve the accuracy and efficiency of our multiple disease prediction system, contributing to better healthcare outcomes and empowering medical professionals with valuable insights[22].

5.5 Model Building
For our multiple disease prediction system, we have utilized supervised machine learning algorithms, namely Logistic Regression, Support Vector Machines (SVM), and K-Nearest Neighbours (KNN). These algorithms have been chosen for their effectiveness in classification tasks and their ability to handle multi-class prediction scenarios[23].

By leveraging these supervised machine learning algorithms, we aim to develop robust and accurate models for multiple disease prediction. Each algorithm brings its unique strengths and considerations, allowing us to explore different approaches and select the most suitable model for each disease category. Through this model-building process, we aim to provide reliable and timely predictions[24].

5.6 Deployment
Our multiple disease prediction system has been deployed using the Streamlit Cloud server, providing a user-friendly web interface for easy access and interaction. With Streamlit Cloud, users can input disease parameters and receive accurate predictions for multiple diseases simultaneously. This deployment ensures accessibility and scalability, empowering healthcare professionals and individuals to make informed decisions about their health.
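The per-disease model selection described in Section 5.5 can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes scikit-learn is installed, substitutes a synthetic dataset from `make_classification` for the Kaggle data, and uses hypothetical names (`candidates`, `scores`, `best_name`) and hyperparameters that the paper does not specify. Each of the three named algorithms is trained on the same split, and the one with the highest held-out accuracy is kept.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one disease dataset (assumption: the real system
# would load a cleaned Kaggle dataset here instead).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The three algorithms named in the paper; hyperparameters are illustrative.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="linear"),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, estimator in candidates.items():
    # Scale features first; SVM and KNN are sensitive to feature scale.
    model = make_pipeline(StandardScaler(), estimator)
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data

# Keep the algorithm with the highest held-out accuracy for this disease.
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

Repeating this loop once per disease dataset yields one chosen algorithm per disease. In practice, choosing the model with cross-validation or a separate validation split, and reporting accuracy on data not used for the choice, gives a less optimistic estimate.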
$$x \cdot \bar{w} + \bar{b} = 0$$
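Read as the separating hyperplane of a linear SVM, one of the algorithms used in this work, the equation above gives a simple decision rule: a sample x is assigned to one class when x · w + b is non-negative and to the other class otherwise. The sketch below illustrates that rule under this assumption. The function names and the values of `w` and `b` are hypothetical, chosen only for illustration; in a trained SVM they would be learned from the data.

```python
def decision_value(x, w, b):
    """Return x . w + b, the signed position of x relative to the hyperplane."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def classify(x, w, b):
    """Return +1 on or above the hyperplane and -1 below it."""
    return 1 if decision_value(x, w, b) >= 0 else -1


# Hypothetical hyperplane parameters (not taken from the paper).
w = (2.0, -1.0)
b = -1.0

print(decision_value((3.0, 1.0), w, b), classify((3.0, 1.0), w, b))
print(decision_value((0.0, 1.0), w, b), classify((0.0, 1.0), w, b))
```

For the first sample the value is 2·3 − 1·1 − 1 = 4, so it falls on the positive side; for the second it is 0 − 1 − 1 = −2, so it falls on the negative side.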
Addition of More Diseases: In the future, we can expand the system by incorporating additional diseases into the existing web application. This would enable users to predict a broader range of diseases and further enhance the system's usefulness in healthcare.

Accuracy Improvement: As part of ongoing research and development, we can strive to improve the accuracy of disease predictions. By refining the machine learning algorithms, optimizing feature selection, and incorporating more comprehensive datasets, we can reduce false predictions and increase the overall accuracy of the system. This would ultimately contribute to lowering the mortality rate by enabling timely interventions and treatments.

Integrating our multiple disease prediction system with electronic health records can provide a more comprehensive and personalized healthcare experience. By leveraging patient data from EHR systems, we can enhance the accuracy of predictions and enable healthcare professionals to make informed decisions based on the patient's medical history.

Mobile Application Development: Developing a mobile application version of the multiple disease prediction system would enhance accessibility and convenience for users. It would allow individuals to access the system on their smartphones, providing real-time disease predictions and empowering them to take proactive measures for their health anytime, anywhere.

By focusing on these future directions, we can further advance the field of disease prediction, improve healthcare outcomes, and make a positive impact on individuals' lives.

REFERENCES

[1] Gopisetti, L. D., Kummera, S. K. L., Pattamsetti, S. R., Kuna, S., Parsi, N. & Kodali, H. P. (2023, January). Multiple disease prediction system using machine learning and streamlit. In 5th International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 923-931. IEEE.
[2] Keniya, R., Khakharia, A., Shah, V., Gada, V., Manjalkar, R., Thaker, T., ... & Mehendale, N. (2020). Disease prediction from various symptoms using machine learning. Available at: SSRN 3661426.
[3] Srivastava, S., Haroon, M. & Bajaj, A. (2013, September). Web document information extraction using class attribute approach. In 4th International Conference on Computer and Communication Technology (ICCCT), pp. 17-22. IEEE.
[4] Raja, M. S., Anurag, M., Reddy, C. P. & Sirisala, N. R. (2021, January). Machine learning based heart disease prediction system. In 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1-5. IEEE.
[5] Khan, W. & Haroon, M. (2022). An unsupervised deep learning ensemble model for anomaly detection in static attributed social networks. International Journal of Cognitive Computing in Engineering, 3, 153-160.
[6] Khan, R., Haroon, M. & Husain, M. S. (2015, April). Different technique of load balancing in distributed system: A review paper. In 2015 Global Conference on Communication Technologies (GCCT), pp. 371-375. IEEE.
[7] Haroon, M. & Husain, M. (2015, March). Interest Attentive Dynamic Load Balancing in distributed systems. In 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 1116-1120. IEEE.
[8] Husain, M. S. & Haroon, D. M. (2020). An enriched information security framework from various attacks in the IoT. International Journal of Innovative Research in Computer Science & Technology (IJIRCST).
[9] Haroon, M. & Husain, M. (2013). Analysis of a dynamic load balancing in multiprocessor system. International Journal of Computer Science engineering and Information Technology Research, 3(1).
[10] Khan, W. (2021). An exhaustive review on state-of-the-art techniques for anomaly detection on attributed networks. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 12(10), 6707-6722.
[11] Haroon, M. & Husain, M. (2013). Different types of systems model for dynamic load balancing. IJERT, 2(3).
[12] Siddiqui, Z. A. & Haroon, M. (2023). Research on significant factors affecting adoption of blockchain technology for enterprise distributed applications based on integrated MCDM FCEM-MULTIMOORA-FG method. Engineering Applications of Artificial Intelligence, 118, 105699.
[13] Khan, N. & Haroon, M. (2022). Comparative study of various crowd detection and classification methods for safety control system. Available at: SSRN 4146666.
[14] Siddiqui, Z. A. & Haroon, M. (2022). Application of artificial intelligence and machine learning in blockchain technology. In Artificial Intelligence and Machine Learning for EDGE Computing, pp. 169-185. Academic Press.
[15] Tripathi, M. M., Haroon, M., Khan, Z. & Husain, M. S. (2022). Security in digital