RAJKIYA ENGINEERING COLLEGE
BIJNOR
FINAL YEAR PROJECT [KIT753]
(2024-2025)
GROUP ID-06
AN EFFICIENT ENSEMBLE BASED MODEL FOR PREDICTION OF
NEURODEGENERATIVE MALADY AND ACUTE ENCEPHALOPATHY
UNDER THE GUIDANCE OF:
[Link] KUMAR (ASSISTANT PROFESSOR)
SUBMITTED BY:
ANISHA YADAV, 2107350130014
RAMANDEEP RATAN, 21073501
SUMANSHI ROY, 2107350130061
MUSHARRAF ALI, 220735013900
CONTENT
1. Introduction
2. Literature Review
3. Research Gap
4. Problem Statement
5. Objectives
6. Methodology
7. Finalized Work
8. Conclusion
9. Project Timeline
[Link]
INTRODUCTION
• Neurodegenerative maladies and acute encephalopathy pose a major
global health burden due to their progressive, irreversible nature,
leading to neuronal degeneration, motor impairment, cognitive decline,
and loss of independence.
• Dementia affected over 55 million individuals worldwide as of 2021.
Alzheimer’s disease accounts for around 60–70% of these cases, and
Parkinson’s Disease (PD) is among the other neurodegenerative causes.
By 2050, this figure is expected to more than double to 139
million (Figure 1). The worldwide cost of dementia is projected to be
above $1 trillion each year, indicating a significant economic burden [1].
• This study uses a dataset of acoustic features extracted from voice
recordings to train and test several machine-learning models.
Figure 1. Global Parkinson’s Disease Prevalence Over the Years
LITERATURE REVIEW
RESEARCH GAP
• Traditional classifiers yield high false-positive and false-negative rates,
limiting diagnostic accuracy.
• Few studies have investigated ranking-based ensemble strategies to improve
predictive performance for neurodegenerative diseases.
• Imbalanced and limited datasets hinder robust model training and testing.
• Inefficient feature selection leads to higher computation time and reduced
model efficiency, particularly in large datasets.
PROBLEM STATEMENT
• Neurodegenerative diseases and acute encephalopathy are
challenging to diagnose early due to the limitations of traditional
methods, which are time-consuming, costly, and reliant on expert
interpretation.
• Symptom overlap and variable disease progression further
complicate diagnosis with traditional methods.
• Existing single-model machine learning approaches lack
robustness and accuracy, necessitating the development of an
efficient ensemble-based predictive model to enhance diagnostic
precision and support timely interventions.
OBJECTIVES
1. Survey the literature and related work to identify their
strengths and limitations.
2. Create models that address these limitations.
3. Apply an ensemble-based technique to overcome the limitations
and achieve high accuracy.
4. Write a research article on this topic.
METHODOLOGY
1. Feature Importance Analysis: SHAP analysis identified key clinical and demographic features for
predicting neurodegenerative diseases and acute encephalopathy.
2. Data Preprocessing: Raw data was cleaned by handling missing values and encoding categorical variables.
3. Feature Engineering: Relevant features were engineered based on the SHAP analysis, creating new
features and transforming existing ones.
4. Model Selection & Ensemble Development: Multiple machine learning models were evaluated and
combined using ensemble techniques such as Stacking and Boosting.
5. Model Training: Models were trained using cross-validation, and hyperparameters were tuned to optimize
performance.
6. Model Evaluation & Validation: Performance was assessed using metrics such as accuracy, precision, and recall.
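The model selection, training, and evaluation steps can be sketched roughly as follows. This is a minimal illustration and not the project's actual pipeline: the data is a synthetic stand-in generated with scikit-learn, the logistic-regression meta-learner is an assumption, and all hyperparameters are placeholders. Only the Random Forest + Gradient Boosting stacking combination comes from the slides.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the acoustic feature table (assumption: the real
# features and the 'status' label would be loaded from the project's data).
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           weights=[0.25, 0.75], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Two base learners whose predictions feed a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
model = make_pipeline(StandardScaler(), stack)

# 5-fold cross-validated accuracy, computed on the training split only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

# Final fit and evaluation on the held-out test split.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
metrics = {
    "cv_accuracy_mean": float(np.mean(cv_scores)),
    "test_accuracy": accuracy_score(y_test, y_pred),
    "test_precision": precision_score(y_test, y_pred),
    "test_recall": recall_score(y_test, y_pred),
}
print(metrics)
```

Placing the scaler inside the pipeline means it is refit on each training fold, so no information from the validation folds leaks into preprocessing.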
FINALIZED WORK
The [Link] dataset is likely sourced from the UCI Machine Learning Repository.
The distplot function combines a histogram and a kernel density estimate (KDE)
plot to provide a comprehensive view of the data's distribution.
•Univariate analysis: Visualizing the distribution of each feature:
the range, central tendency, and skewness.
•Bivariate analysis: Comparing feature distributions between the healthy
and Parkinson's groups (shown in different colors). Features whose
distributions differ clearly between the groups are likely to be
important for classification.
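To make the two ingredients of such a plot concrete, the sketch below computes a density histogram and a Gaussian KDE for one feature in each group, along with range, median, and skewness. The feature values are randomly generated placeholders, not the project's data, and the helper name `describe` is hypothetical. For plotting, recent seaborn releases deprecate `distplot` in favour of `histplot(..., kde=True)` or `displot`.

```python
import numpy as np
from scipy.stats import gaussian_kde, skew

rng = np.random.default_rng(0)
# Hypothetical values of one feature (status 0 = healthy, 1 = Parkinson's).
healthy = rng.normal(loc=180.0, scale=20.0, size=60)
parkinsons = rng.normal(loc=145.0, scale=30.0, size=140)

def describe(values, grid):
    """Return histogram densities, bin edges, KDE curve and summary statistics."""
    hist, edges = np.histogram(values, bins=15, density=True)
    kde_curve = gaussian_kde(values)(grid)  # smooth density estimate on the grid
    summary = {
        "min": float(values.min()),
        "max": float(values.max()),
        "median": float(np.median(values)),
        "skewness": float(skew(values)),
    }
    return hist, edges, kde_curve, summary

grid = np.linspace(50.0, 280.0, 200)
for name, values in [("healthy", healthy), ("parkinsons", parkinsons)]:
    hist, edges, kde_curve, summary = describe(values, grid)
    print(name, summary)
```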
CONT.
Boxplots visualize the distribution of features and highlight differences.
•Feature spread and central tendency: Median, quartiles, and potential outliers for each feature
(status 0 and 1).
•Redundant features: Used with correlation analysis to identify highly correlated features for
reducing data.
•Median: The central line inside the box represents the median value of the feature.
•Quartiles: The box itself shows the interquartile range (IQR), containing the middle 50% of the data.
•Whiskers: Lines extending from the box show the range of the data, excluding outliers.
•Outliers: Individual points beyond the whiskers are potential outliers, indicating unusual or
extreme values for that feature.
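The boxplot outlier rule and the correlation-based reduction can be sketched as below. This is an illustration under assumptions: the column names `f1` to `f3` and their values are made up, the 1.5 × IQR whisker factor is the common boxplot convention, and the 0.9 correlation threshold is an arbitrary choice rather than the value used in the project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
base = rng.normal(size=100)
# Hypothetical feature table: f2 is nearly a scaled copy of f1, f3 is independent.
df = pd.DataFrame({
    "f1": base,
    "f2": base * 2.0 + rng.normal(scale=0.01, size=100),
    "f3": rng.normal(size=100),
})
df.loc[0, "f3"] = 15.0  # inject one obvious extreme value

def iqr_outliers(series, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

def drop_correlated(frame, threshold=0.9):
    """Drop the later feature of each pair whose |correlation| exceeds threshold."""
    corr = frame.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return frame.drop(columns=to_drop), to_drop

outlier_mask = iqr_outliers(df["f3"])
reduced, dropped = drop_correlated(df)
print("outliers in f3:", int(outlier_mask.sum()))
print("dropped features:", dropped)
```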
CONT.
Visualizing class distribution of the target variable ('status') before
applying data balancing techniques like SMOTE (Synthetic Minority
Over-sampling Technique).
•Visualizes Class Distribution: The barplot shows how many
instances belong to each class (healthy and Parkinson’s).
•Bar Height = Instance Count (for specific class)
•Comparing bar heights shows a difference between the
number of healthy and Parkinson's instances, which indicates class
imbalance.
•Confirmation of Balancing: The same plot is used after SMOTE to
confirm that the dataset has been balanced (roughly equal class counts)
by creating synthetic instances of the minority class (Parkinson's disease).
•The height of each bar then represents the number of instances in each
class after SMOTE has been applied.
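The idea behind SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in plain NumPy as follows. This is a simplified illustration, not the implementation used in the project (which would more likely rely on a library such as imbalanced-learn). The function name `smote_sketch`, the class sizes, and the label coding are all hypothetical.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)           # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base_idx = rng.integers(0, n, size=n_new)
    nb_idx = neighbours[base_idx, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))              # position along the segment
    return X_min[base_idx] + gap * (X_min[nb_idx] - X_min[base_idx])

rng = np.random.default_rng(42)
# Hypothetical imbalanced data: 150 majority rows, 50 minority rows.
X_major = rng.normal(0.0, 1.0, size=(150, 4))
X_minor = rng.normal(2.0, 1.0, size=(50, 4))

X_new = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_bal = np.vstack([X_major, X_minor, X_new])
y_bal = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_new)))

# These counts are the bar heights of the "after SMOTE" class distribution plot.
print("class counts after balancing:", np.bincount(y_bal))
```

Oversampling is normally applied to the training split only, so that synthetic points derived from test samples do not leak into evaluation.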
CONCLUSION
The study focused on developing an effective ensemble model
to predict neurodegenerative disease and acute
encephalopathy. Various machine learning algorithms were
evaluated, including XGBoost, KNN, Kernel SVM, Random Forest,
Decision Tree, Logistic Regression, Naive Bayes, and SVM.
XGBoost achieved the highest accuracy (over 98%),
outperforming others like KNN and Kernel SVM (90–93%), while
the Stacking Classifier (Random Forest + Gradient Boosting)
reached 97%.
Fig 6: Accuracy Score
Fig 7: After adding stacking Classifier
Data preprocessing played a key role in enhancing model
performance. SMOTE addressed class imbalance by over-sampling
the minority class, and feature scaling normalized the
data. Correlation analysis and data reduction helped tackle
multicollinearity by eliminating redundant features.
Cross-validation with fold values ranging from 2 to 20 confirmed the
robustness of the models, with XGBoost maintaining an average
PROJECT TIMELINE
FUTURE SCOPE
• Real-Time Integration: Connect models with clinical data streams and
wearable devices for early detection.
• Advanced Algorithms: Explore CatBoost, LightGBM, CNNs (for imaging),
RNNs/LSTMs (for sequential data), and AutoML tools for improved
performance.
• Multimodal Data Fusion: Combine genetic, imaging, and lab data using
deep learning for more accurate diagnosis.
• Clinical Deployment: Develop web/mobile-based Clinical Decision
Support Systems (CDSS).
• Personalized Medicine: Use clustering (e.g., DBSCAN) to segment
patients and recommend tailored treatments.
• Federated Learning: Train models across institutions while preserving
data privacy, enabling global scalability.
PUBLICATION
• Research paper titled “An Efficient Ensemble Based Model For Prediction Of
Neurodegenerative Malady And Acute Encephalopathy” has been accepted for
ICAISI 2025 (International Conference on Advanced Informatics for Sustainable
Innovation).
• The paper was presented at the ICAISI 2025 CONFERENCE on 31st May, 2025.
• It will be published in a Scopus-indexed journal as part of the official conference proceedings.
• The research contributes to the field of “Artificial Intelligence in Healthcare”, focusing on early
detection of Parkinson’s disease.
REFERENCES
1. [Link] accessed online on 05 April, 2025.
2. Ali, L., Chakraborty, C., He, Z., Cao, W., Imrana, Y., & Rodrigues, J. J. (2023). A novel sample and feature dependent ensemble approach for Parkinson’s disease
detection. Neural Computing and Applications, 35(22), 15997-16010.
3. Syam, V., Safal, S., Bhutia, O., Singh, A. K., Giri, D., Bhandari, S. S., & Panigrahi, R. (2023). A non-invasive method for prediction of neurodegenerative diseases
using gait signal features. Procedia Computer Science, 218, 1529-1541.
4. Pilania, U., Kumar, M., Singh, S., & Adhalkha, P. (2023, December). A Deep Ensemble Learning Network-based Approach to Detect Neurodegenerative Diseases.
In 2023 2nd International Conference on Automation, Computing and Renewable Systems (ICACRS) (pp. 1033-1038). IEEE.
5. Kiran, A., Alsaadi, M., Dutta, A. K., Raparthi, M., Soni, M., Alsubai, S., ... & Asenso, E. (2024). Bio-inspired Deep Learning-Personalized Ensemble Alzheimer's
Diagnosis Model for Mental Well-being. SLAS Technology, 100161.
6. Goyal, P., Rani, R., & Singh, K. (2024). An efficient ranking-based ensembled multiclassifier for neurodegenerative diseases classification using deep learning.
Journal of Neural Transmission, 1-27.
7. Shah, M., Shandilya, A., Patel, K., Mehta, M., Sanghavi, J., & Pandya, A. (2024). Neuropsychological detection and prediction using machine learning algorithms: a
comprehensive review. Intelligent Medicine, 4(3), 177-187.
8. Gourie-Devi, M. (2014). Epidemiology of neurological disorders in India: Review of background, prevalence and incidence of epilepsy, stroke, Parkinson's disease
and tremors. Neurology India, 62(6), 588-598.
THANK YOU
(Any Queries?)