Epileptic Seizure - Report

The Epilepsy dataset consists of single-channel EEG recordings from 500 subjects, transformed into 11,500 samples for seizure detection classification. The dataset underwent preprocessing, including binary class conversion and under-sampling to address class imbalance, and was evaluated using various machine learning models, with the Light Gradient Boosting Machine achieving the highest accuracy. Additionally, Principal Component Analysis was applied to reduce dimensionality while maintaining optimal model performance.


Epilepsy Dataset

The Epilepsy dataset consists of single-channel EEG recordings from 500 subjects,
collected by Andrzejak et al. in 2001. Each recording lasted 23.6 seconds.

Transformation Process

1. Splitting and Shuffling: The 23.6-second recordings were divided into 1-second segments,
resulting in 11,500 samples. These samples were shuffled to reduce sample-subject
association.
2. Data Distribution: The samples were split into 60 training, 20 validation, and 11,420 test
samples. Validation samples were added to the end of the training file.
3. Class Labels: Initially, there were five classes:
●​ Eyes open
●​ Eyes closed
●​ EEG from a healthy brain region
●​ EEG from the tumor region
●​ Seizure episode
These were merged into two classes: seizure and non-seizure.

Final Format
The dataset comprises:
●​ Total Samples: 11,500
●​ Classes: 2 (seizure, non-seizure)
●​ Dimensions: 178 (each 1-second sample at 178 Hz)

The dataset is utilized for time series classification to detect epileptic seizures using EEG
data.

For the purpose of this paper, we don't need to download the original dataset or repeat the preprocessing steps. Instead, the preprocessed data is readily available and can be downloaded as a CSV file from the following sources:

1. Time Series Classification
2. Epileptic Seizures Dataset
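
As a minimal sketch of loading the data, assuming the commonly distributed CSV layout (178 signal columns plus a target column y holding the five original labels); the file name here is a placeholder:

import pandas as pd

# Placeholder file name; download the preprocessed CSV from one of the
# sources above and adjust the path.
df = pd.read_csv("epileptic_seizures.csv")

# Assumed layout: 178 feature columns (one per sample of the 1-second,
# 178 Hz segment) and a target column "y" with the five original labels.
X = df.drop(columns=["y"]).values
y = df["y"].values
print(X.shape)  # expected: (11500, 178)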
Preprocessing

Original Class Distribution

Originally, the data is divided into 5 classes, each representing 20% of the dataset. In other
words, there is no class imbalance in the data, as all classes have equal representation.

Binary Class Conversion


In the initial phase of our study, we applied essential preprocessing steps to the dataset. Our
dataset consists of five target labels, which we converted into binary labels for the detection of
epileptic seizures. Specifically, we identified one class labeled 'ES' (Y=1), representing the
presence of a seizure. The remaining classes were collectively categorized to signify the
absence of epileptic seizures.

This conversion into binary labels addressed the primary objective of seizure detection but
introduced a new challenge: class distribution imbalance. This issue necessitates further
exploration and appropriate handling to ensure accurate and unbiased model performance.
Upon transforming our dataset into a binary classification problem to detect epileptic seizures,
we encountered a significant class imbalance. The dataset comprised only 2,300 samples
indicating seizures and 9,200 samples without seizures.

Addressing this class imbalance is crucial, and sampling methods are a common solution.
Among the two primary methods—under-sampling and over-sampling—over-sampling is
unsuitable for our study. Over-sampling creates new samples through interpolation, which could
misrepresent the true characteristics of EEG data or human brain scans, thereby potentially
impairing the model's effectiveness.

Consequently, we employed under-sampling techniques to mitigate the class imbalance, ensuring that our models remain accurate and reliable in detecting epileptic seizures.

Under Sampling

Following the application of the undersampling process, we achieved a balanced data distribution. Our dataset now comprises 50% non-epileptic seizure samples and 50% epileptic seizure samples. This adjustment has effectively removed bias towards any particular class, resulting in a more robust and accurate dataset for analysis.
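
Both preprocessing steps can be sketched in a few lines. We assume label 1 marks the seizure class ('ES', Y=1) and use imbalanced-learn's RandomUnderSampler as one straightforward implementation of the random under-sampling described above:

import numpy as np
from imblearn.under_sampling import RandomUnderSampler

# X, y as loaded from the CSV in the earlier sketch.
# Binary conversion: class 1 (seizure episode) -> 1, all other classes -> 0.
y_binary = (y == 1).astype(int)

# Random under-sampling: discard majority-class samples until both classes
# are equally represented (2,300 samples each).
rus = RandomUnderSampler(random_state=42)
X_balanced, y_balanced = rus.fit_resample(X, y_binary)
print(np.bincount(y_balanced))  # expected: [2300 2300]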

Splitting Data
In our study, we approached data splitting with a comprehensive methodology. Rather than
performing a single split, we executed three distinct splits: 90-10%, 80-20%, and 75-25%. This
approach allowed us to compare the results across different splits and ensure the robustness of
our analysis.
Min Max Normalization
Previous studies have shown that normalizing or standardizing datasets often enhances model
performance. While some models are unaffected by normalization, others benefit significantly
from this process. Therefore, we applied min-max normalization to optimize our model's
performance.
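
A minimal sketch of one split combined with min-max normalization, where each feature is rescaled as x' = (x - min) / (max - min). Fitting the scaler on the training data only is our assumption, as the report does not specify this detail:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Three splits were compared; set test_size to 0.10, 0.20, or 0.25.
X_train, X_test, y_train, y_test = train_test_split(
    X_balanced, y_balanced, test_size=0.20, stratify=y_balanced, random_state=42
)

# Min-max normalization, fitted on the training data only to avoid
# leaking test-set statistics into the model.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)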

Machine Learning Models

K-Nearest Neighbours
The first model we tested was the k-nearest neighbors (KNN). While we standardized the
procedure for training and evaluating all models, KNN required a more tailored approach due to
the necessity of determining the optimal value of 'k' (the number of nearest neighbors). To
identify the best 'k' value, we tested the model with various 'k' values.
When comparing results for different K values, we observed that the highest performance, with an accuracy of 91%, was achieved when K was set to 1. As the K value increased, performance consistently decreased across accuracy, precision, recall, and F1-score. Therefore, we concluded that the optimal value of K is 1. These measures were calculated on the testing data, which showed the best performance at K=1, with a precision of 93%, recall of 91%, and F1-score of 91%. These results were invariant to the split ratio.

Furthermore, comparison graphs for different K values and the results for accuracy, recall,
precision, and F1-score on the testing data indicated that the split ratio did not significantly
impact KNN performance. The optimal K value remained 1, regardless of the split ratio. In our
study, we used an 80/20 split, with 80% of the data for training and 20% for testing. Although the
performance metrics decreased slightly under this split, the precision was 92%, recall was 90%,
and F1-score was 91%. Even when using a 75-25 split ratio, the performance metrics remained
consistent.
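
The k-sweep can be reproduced with a short loop. This is a sketch using scikit-learn defaults, since the exact range of k values tested is not specified:

from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Sweep k and evaluate on the held-out test split from earlier.
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = accuracy_score(y_test, knn.predict(X_test))
    print(f"k={k:2d}  accuracy={acc:.3f}")  # best performance observed at k=1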

Classification Models
In our study, we focused on binary classification models for the seizure classification task. We
selected the following models: Light Gradient Boosting Machine, Extreme Gradient Boosting
Machine, CatBoost Classifier, Random Forest, Support Vector Machines, K-Nearest Neighbors,
Naive Bayes models, and Decision Trees. These models were specifically chosen to analyze
their performance on the dataset.

Notably, models such as Extreme Gradient Boosting Machine and CatBoost Classifier were not
previously tested in any of the supplied papers. Thus, we consider these hybrid models as
innovative additions for our study.
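
A sketch of the shared training and evaluation loop over these models, using default hyperparameters since the report does not list the exact settings:

from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score, mean_absolute_error, mean_squared_error

models = {
    "LGBM": LGBMClassifier(),
    "XGBoost": XGBClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
    "Random Forest": RandomForestClassifier(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=1),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(),
}

for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5  # RMSE computed from MSE
    print(f"{name}: acc={accuracy_score(y_test, pred):.4f}  "
          f"f1={f1_score(y_test, pred):.4f}  "
          f"mae={mean_absolute_error(y_test, pred):.4f}  rmse={rmse:.4f}")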

Error Comparison

From the visualization, we can infer two main points: First, the mean absolute error (MAE) and
root mean squared error (RMSE) follow the same trend, indicating they provide consistent
information. Both measures are low for the Light Gradient Boosting Classifier, CatBoost
Classifier, Random Forest, Extreme Gradient Boosting (XGBoost) Classifier, and Support Vector
Machine (SVM). These are our top five models in terms of error comparison.

Among these top five models, the Light Gradient Boosting Classifier and CatBoost Classifier
have identical errors, while the Random Forest and XGBoost Classifiers also show identical
error results. Therefore, we can conclude that the best model based on error comparison is the
Light Gradient Boosting Classifier. However, the CatBoost Classifier also yields the same
performance, at least in terms of MAE and RMSE, essentially providing equivalent results.
Despite slight variations in results among different models, the best performers identified are the
Light Gradient Boosting Machine, CatBoost Classifier, and Extreme Gradient Boosting
Classifier. As we increase the amount of training data—from a 75-25 split to an 80-20 split and
finally a 90-10 split—there is a general trend of decreasing error, both in terms of mean absolute
error and root mean squared error.

Interestingly, the Extreme Gradient Boosting Classifier performs best with a 75-25 split, unlike
other models which show optimal performance with a 90-10 split. This highlights the necessity
of considering different split ratios for different models to achieve the best performance
outcomes.
Accuracy Comparison

Based on a comparison of multiple models, the Light Gradient Boosting Machine (LGBM)
Classifier achieved the highest accuracy at 97.83%. This was followed by the CatBoost
Classifier, Random Forest, and Extreme Gradient Boosting Machines, which ranked 2nd, 3rd,
and 4th, respectively. The Support Vector Machine (SVM) model came in 5th, which differs from
previous studies where K-Nearest Neighbors (KNN) often outperformed SVM and Naive Bayes.
It is important to note that some studies did not include SVM, making direct comparisons
challenging.

Contrary to expectations, KNN's accuracy was 91.09%, placing it among the lowest-performing
models. Logistic Regression, as anticipated due to its simplicity, exhibited lower accuracy. In this
context, hybrid models demonstrated promising results by achieving higher accuracies than
most traditional models. This suggests that, with further optimization, hybrid models might
become top performers.
When comparing the accuracies of the 75-25 split with those of the 80-20 and 90-10 splits,
some noteworthy observations emerge. Interestingly, the best-performing model is no longer the
Light Gradient Boosting Machine, but the Extreme Gradient Boosting Machine. This suggests
that with less training data, the Extreme Gradient Boosting Machine performs better. This improvement could be attributed to reduced overfitting when less data is used for training, with the larger test set providing a more reliable estimate of the model's robustness.

It is possible that the model may be overfitting when using the 90-10 or 80-20 splits, whereas
the 75-25 split appears to mitigate overfitting, at least for the Extreme Gradient Boosting
Machine. Its accuracy with the 75-25 split is 97.13%, which, although not as high as the 97.83%
achieved by the Light Gradient Boosting Machine, is still noteworthy. In this particular split
scenario, the Extreme Gradient Boosting Machine emerges as the best model, surpassing the
Light Gradient Boosting Machine.

Precision & Recall

This graph is particularly noteworthy as it shows two lines: the red line represents the precision
curve and the green line represents the recall curve. The precision and recall values for each
model are as follows:
●​ Precision: The highest precision is achieved by the K-Nearest Neighbors model, while
the lowest precision is by the Stochastic Gradient Descent model.
●​ Recall: Conversely, the highest recall is achieved by the Stochastic Gradient Descent
model, and the lowest recall is by the K-Nearest Neighbors model.

These findings indicate that both K-Nearest Neighbors and Stochastic Gradient Descent are not
ideal models. Additionally, the Logistic Regression model performs poorly, with low precision
and recall values.

Models such as Naive Bayes and SVM are relatively good, but not as effective when compared
to Light Gradient Boosting Machine, CatBoost, XGBoost, and Random Forest. The Random
Forest model can be excluded from the best model analysis due to the noticeable difference
between its precision and recall values. In contrast, the Light Gradient Boosting Machine and
CatBoost models exhibit almost negligible differences between their precision and recall values.

The Light Gradient Boosting Machine stands out with both precision and recall values at
approximately 97.75%, making it the top-performing model based on these measures.

The performance of the models in terms of precision and recall, with respect to the split size,
shows a clear trend of decreased performance when using smaller split ratios. However, when
examining the Extreme Gradient Boosting Classifier, a slight performance improvement is
observed. From an overall perspective, the performance remains relatively stable, but specific
metrics show minor improvements.

When analyzing recall, it is evident that the Extreme Gradient Boosting Classifier's performance
is marginally better with the 75-25 split ratio. This highlights the nuanced differences in model
performance based on the split ratio used.

F1 Score

When comparing the models based on the F1 score, the Light Gradient Boosting Machine
(LGBM) emerges as the top performer with an F1 score of 97.75%. Following LGBM, the
CatBoost Classifier, Random Forest, and Extreme Gradient Boosting Classifier also
demonstrate strong performance. The Support Vector Machine (SVM) and Naive Bayes models
perform well, albeit to a lesser extent.

Overall, LGBM consistently outperforms other models across various metrics, including error
rates, accuracy, and precision-recall scores. The top five models are LGBM, Random Forest,
CatBoost Classifier, Extreme Gradient Boosting Classifier, and SVM.

Given these results, it is advisable to focus on hyperparameter tuning for these top models.
Additionally, applying Principal Component Analysis (PCA) to these models may further refine
their performance. While the current comparison is based on a 90/10 data split, varying the split
ratios could provide further insights for final model selection. At present, LGBM stands out as
the best-performing model.
When comparing the F1 scores across different split ratios (75-25, 80-20, and 90-10), it is
evident that models generally achieve the highest F1 scores with the 90-10 split. The
best-performing models, such as the Light Gradient Boosting Machine (LGBM) Classifier,
achieved an F1 score of 97.75% with the 90-10 split. However, this score decreased to 96.27%
with the 80-20 split and 96.2% with the 75-25 split.

Interestingly, the Extreme Gradient Boosting Classifier performs notably better with the 75-25
split, achieving an F1 score of 97.08%. This performance is higher compared to its scores with
the 80-20 split (96.62%) and the 90-10 split (96.83%). Despite this improvement, its
performance is still slightly lower than that of the Light Gradient Boosting Machine.

ROC AUC Score

The Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC)
scores reinforce the previous findings. The Light Gradient Boosting Machine (LGBM) Classifier
exhibits the highest ROC AUC score at 97.82%, followed by the CatBoost Classifier, Random
Forest, and Extreme Gradient Boosting Classifier. These results confirm that the LGBM
Classifier remains the best-performing model.
The comparison of ROC AUC scores reveals consistent performance trends. The Light Gradient
Boosting Machine (LGBM) continues to be the best model with a ROC AUC score of 97.82%.
However, its performance decreases noticeably with an 80-20 split, where the ROC AUC score
drops to 96.39%.

In contrast, the Extreme Gradient Boosting Classifier shows promising results. Its original
performance with a 90-10 split was 96.94%, and it only slightly decreased to 96.73% with an
80-20 split. This suggests that the Extreme Gradient Boosting Classifier is a robust model that
warrants further evaluation, particularly if the required performance is not achieved with other
models.

The results reinforce previous findings, indicating that the Light Gradient Boosting Machine
(LGBM) Classifier consistently outperforms other models. In the 90-10 split, LGBM delivers the
best performance. However, the Extreme Gradient Boosting Classifier demonstrates greater
robustness across various split ratios. For instance, its accuracy is 96.94% in the 90-10 split,
96.73% in the 80-20 split, and 97.13% in the 75-25 split.

Despite this, the maximum accuracy of the Extreme Gradient Boosting Classifier remains
slightly lower than that of LGBM. Given that these results are based on test data, it is evident
that LGBM is already a highly robust model.
In conclusion, if LGBM does not meet the required performance, the alternative models to
consider are the Extreme Gradient Boosting Classifier, CatBoost Classifier, and Random Forest.

Principal Component Analysis


Principal Component Analysis (PCA) is a crucial technique in data analysis and machine
learning for simplifying datasets by reducing their dimensions. The primary objective of PCA is
to transform data into a new coordinate system, where the first principal component captures
the greatest variance, followed by subsequent components capturing progressively smaller
variances. This process helps identify the most important features of the dataset, reduce noise,
and facilitate easier visualization and analysis.

In this study, PCA is applied to our dataset to reduce its dimensions. Initially, the data will be compressed into 2 and 3 dimensions for visualization and baseline evaluation; subsequently, the number of components that yields a performance between 97% and 99%, our target range, will be determined.

The dataset currently comprises 178 features, and the goal is to significantly reduce this number
while maintaining optimal model performance. By systematically analyzing the data, the minimal
number of components required to achieve the desired performance level will be identified. This
approach aims to streamline the dataset and enhance the efficiency of the model.
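
A sketch of a first diagnostic: the cumulative explained variance shows how much of the signal each additional component retains. Note that the study ultimately selects the number of components by downstream model performance rather than by variance alone:

from sklearn.decomposition import PCA

# Fit PCA on the normalized training data and inspect cumulative variance.
pca = PCA().fit(X_train)
cumulative = pca.explained_variance_ratio_.cumsum()
for n in (2, 3, 10, 41, 48):
    print(f"{n:3d} components -> {cumulative[n - 1]:.1%} of variance retained")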

Models
Having reached the stage where we are utilizing Principal Component Analysis (PCA), it is now
time to focus on selecting the best models. Instead of applying PCA to all the models, we will
only apply it to those that have already demonstrated good performance. We have identified the
top three and top five models, and it is evident that their performances will remain comparatively
higher than other models even after PCA is applied.

At this stage, we are selecting only the best models for further analysis. The models we are
referring to are the Light Gradient Boosting Machine (LGBM) Classifier, CatBoost Classifier,
Random Forest, and Extreme Gradient Boosting Classifier. From this point onwards, these four
models will be compared. The comparison methodology remains the same, but the data will
now be compressed into different dimensions using PCA.
2D Data

Upon compressing the data into two dimensions, it becomes evident that the data is highly separable, allowing for a clear distinction between class zero and class one (Y=0 and Y=1).
Visually, it appears that a linear model or a support vector machine could effectively differentiate
between the two classes.

Although this separability is apparent, models may still encounter challenges as they are not
inherently designed for such straightforward classifications. Nevertheless, the data remains
suitably formatted for classification tasks. The ability to distinguish between the two classes
following compression indicates that the process is effective for this classification task. Thus, we
have successfully reduced the data to two components while preserving its separability.
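
A minimal sketch of this 2D visualization, assuming matplotlib and the normalized training data from the earlier sketches:

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Project the training data onto the first two principal components.
X2 = PCA(n_components=2).fit_transform(X_train)
for label, color in [(0, "tab:blue"), (1, "tab:red")]:
    mask = y_train == label
    plt.scatter(X2[mask, 0], X2[mask, 1], s=4, c=color, alpha=0.5, label=f"Y={label}")
plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.legend()
plt.show()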

Error Comparison
Even without examining the bar graphs comparing the 2D PCA data and the 90/10 split model
performances (in terms of mean absolute error and root mean squared error), it is evident that
the error has increased. This was expected, given that we have reduced the data from 178
features to just two dimensions. Despite using PCA, which aims to retain the maximum amount
of information possible, this reduction is significant.

Remarkably, even after this considerable reduction, the errors remain relatively low. The root
mean squared errors are around 0.26, and the mean absolute errors are around 0.06. While
these values are still quite low, comparing them to the previous best performances achieved
with the 90/10 split shows a noticeable increase.

Although the rise in error may seem substantial, it is relatively minor: the root mean squared error increased from 0.17 to 0.26, a difference of just 0.09. Even after reducing the number of features in the data, the best performance is still achieved by the Light Gradient Boosting Machine (LGBM) Classifier, showcasing its strength and robustness.
Accuracy Comparison

Upon examining the table, it is apparent that the accuracy of the models has decreased
significantly. Previously, the Light Gradient Boosting Machine (LGBM) achieved the highest
accuracy at 97.83%, followed by the CatBoost Classifier at 97.61%, Random Forest at 96.96%,
and Extreme Gradient Boosting (XGBoost) at 96.96%. However, after applying Principal
Component Analysis (PCA) and reducing the data to two dimensions, the highest accuracy
achieved is 94.57% by the LGBM. CatBoost, Random Forest, and XGBoost also fall within the 93-94% range, with slight variations in their results.

This indicates that reducing the dimensionality to just two components significantly impacts the
models' ability to capture information in the data, leading to a decrease in performance.
Nonetheless, despite the reduction from 178 features to just two, the models still achieve
around 93% accuracy, which is quite impressive.

Overall, while there is a noticeable drop in accuracy, the performance remains relatively high,
showcasing the effectiveness of PCA in dimensionality reduction without overly compromising
the model's predictive capabilities.
Precision & Recall

Analyzing the two-dimensional data, it is evident that the top-performing models at this stage
are the Light Gradient Boosting Machine (LGBM) and CatBoost classifiers. This conclusion is
drawn based on their balance between precision and recall values while maintaining high
performance in both metrics. However, when these values are compared to the original dataset
with 178 features, there is a noticeable decline in the scores.
This decline is significant, indicating that reducing the data to just two dimensions does not
adequately capture the necessary information for optimal model performance. This observation
is further supported by the visual analysis of the two-dimensional data, which reveals inherent
difficulties in data separation.

In summary, while LGBM and CatBoost classifiers continue to perform well, the reduction to two
dimensions substantially impacts their effectiveness, highlighting the challenges of
dimensionality reduction.

F1 Score

The same results are reflected in the F1 scores as well. The F1 scores on the original dataset,
containing 178 features, are significantly higher compared to those on the two-dimensional data.
This substantial reduction in performance clearly indicates that using two-dimensional data is
not adequate for maintaining the models' effectiveness.
3D Data
The analysis of the three-dimensional data reveals more interesting patterns compared to the
two-dimensional data. In the two-dimensional data, we observed that class 1 (Y=1) encloses
class 0 (Y=0). This enclosing pattern persists in three dimensions as well.

When examining the three-dimensional graph from a distant perspective, Y=0 samples might
not be visible because they are encapsulated within the three-dimensional space, forming a
"blob." Identifying these samples would require zooming in, which can be challenging. This
observation indicates that in three dimensions, class 0 samples are still enclosed by class 1
samples.
MAE & RMSE

We anticipated that the errors would be comparatively lower with the three-dimensional data,
given that it retains more information than the two-dimensional version. Indeed, there is a
reduction in errors, but they remain higher compared to the original data with 178 features.
Specifically, the errors have increased by at least 0.1, which is not ideal.

Nevertheless, this is a positive development. Despite using significantly fewer features, we still
achieve comparatively low errors. This underscores the potential of Principal Component
Analysis (PCA). By slightly increasing the number of dimensions, we can further enhance
performance, demonstrating PCA's capability to preserve essential information even with a
reduced number of features.
Accuracy Comparison

Interestingly, the model accuracies remain impressive even with the three-dimensional data. We
have reduced the number of features from 178 to just three dimensions and still achieved an
accuracy of 96.09% without any hyperparameter tuning. This suggests that with tuning, we
might be able to further boost the accuracy, possibly reaching 97% or 98%, which is remarkable.

Even with only three-dimensional data, the performance is commendable. If we increase the
number of dimensions to five or six, the model's performance is likely to improve significantly.
This highlights the potential of using Principal Component Analysis (PCA) for dimensionality
reduction while maintaining high model accuracy.
Precision & Recall

The precision and recall values also indicate an improvement in model performance with the increased dimensionality. This demonstrates that slight increases in the number of components or the application of hyperparameter tuning can result in significantly enhanced model versions.

F1 Score

The results indicate that increasing the number of features enhances model performance.
Notably, even with just three features, we achieve an F1 score of approximately 95.91%, or
roughly 96%. This high performance with only three features, compared to the original 178,
demonstrates that it is unnecessary to use all 178 features. Instead, we can identify the optimal
number of components or features needed.

Rather than discussing this in terms of feature selection, we focus on Principal Component
Analysis (PCA) for feature extraction and data decomposition to reduce dimensionality. By
reducing the data to an optimal number of features or components, we can still achieve high
performance. The key is to determine the best value of "N" for the number of features required,
which will be significantly lower than 178 while still maintaining excellent performance.

No. Of Components

From the data presented, it is evident that the optimal number of components is 48. However,
when examining the number of components at 41, we observe that the performance is relatively
similar. In terms of accuracy, the results are identical, and the F1 score shows only a marginal
decrease. Therefore, we have chosen 41 components. Our primary objective is to perform
dimensionality reduction or feature extraction using PCA, and it is crucial to use as few features
as possible while maintaining high accuracy.

Based on the data, 41 components meet both conditions, offering a balance between
dimensionality reduction and performance. While 48 components may seem excessive, 41
components provide a more manageable number while still delivering excellent performance.
Another important observation is that by retaining more components, we achieved a performance of 98%. Achieving 98.48% accuracy simply by increasing the number of retained components establishes a baseline that we aim to surpass. Consequently, the next step is to identify a model that can take the accuracy to even higher levels.

The results align with our expectations: as the number of components increases, model
accuracy, especially for the Light Gradient Boosting Machine (LGBM), improves steadily. Initially,
accuracy starts at 88%, jumps to 94%, and eventually reaches 98%, the highest observed so
far. The optimal number of components is 41, yielding an accuracy of 98.48%.

Although hyperparameter tuning has not yet been performed, which could further enhance
results, it is evident that selecting 41 components offers a high likelihood of achieving a better
model. Notably, beyond 41 components, accuracy slightly decreases—from 98.48% at 41
components to approximately 97% at 45-46 components. This indicates that 41 components are
ideal for our data.

Therefore, hyperparameter tuning will be applied with the number of components set to 41.
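
A sketch of the component sweep behind this choice, stepping through candidate values of N and scoring the LGBM classifier on the test split; the step size here is for brevity, and the actual scan in the study is finer:

from lightgbm import LGBMClassifier
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, f1_score

# X_train/X_test/y_train/y_test: the normalized split from earlier sketches.
for n in range(1, 51, 5):
    pca = PCA(n_components=n).fit(X_train)
    model = LGBMClassifier().fit(pca.transform(X_train), y_train)
    pred = model.predict(pca.transform(X_test))
    print(f"N={n:2d}  acc={accuracy_score(y_test, pred):.4f}  "
          f"f1={f1_score(y_test, pred):.4f}")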
Optimization
Introduction to Optimization
Optimization is a crucial aspect of machine learning model development, aimed at enhancing
model performance by fine-tuning hyperparameters. The goal is to identify the best combination
of hyperparameters that results in an optimal balance between accuracy, generalization, and
computational efficiency.

Foundation of Optimization
The optimization process begins with data preprocessing to ensure that the input data is
well-prepared for model training. This includes scaling features using MinMaxScaler, performing
dimensionality reduction via Principal Component Analysis (PCA), and handling class
imbalances using Random Under-Sampling.

Once the dataset is preprocessed, different machine learning models such as XGBoost,
LightGBM, and Random Forest are employed. However, the default hyperparameter settings of
these models may not yield the best performance. Therefore, an optimization framework is
necessary to systematically explore the hyperparameter space and identify the best
configurations.

Optimization Process
To optimize the model, Optuna, an advanced hyperparameter optimization framework, is
utilized. The process involves:

1.​ Defining an Objective Function: The objective function is formulated based on a relevant evaluation metric such as ROC-AUC, accuracy, or F1-score.
2.​ Search Space Definition: The range of hyperparameters to be optimized, such as
learning rate, tree depth, and number of estimators, is specified.
3.​ Bayesian Optimization with Optuna: Optuna utilizes a tree-structured Parzen
estimator (TPE) approach to efficiently navigate the hyperparameter space and identify
promising configurations.
4.​ Trial Execution and Evaluation: Multiple trials are conducted where different
hyperparameter sets are tested, and performance metrics are recorded.
5.​ Best Hyperparameter Selection: After several trials, the optimal hyperparameter
combination is selected based on performance metrics.
6.​ Model Training with Optimized Parameters: The final model is trained using the
best-found hyperparameters to maximize predictive performance.
Optimization Setup for Each Model
1.​ XGBoost Optimization
○​ Search space includes learning rate, max depth, number of estimators, and
subsample ratio.
○​ Optuna iteratively selects and evaluates hyperparameter sets using TPE.
○​ The best-performing configuration is used to retrain XGBoost.
2.​ LightGBM Optimization
○​ Search space covers learning rate, max depth, number of leaves, and boosting
type.
○​ Optuna optimizes hyperparameters while considering performance on validation
data.
○​ The optimal parameters are used to finalize the LightGBM model.
3.​ Random Forest Optimization
○​ Key hyperparameters such as the number of trees, max depth, and minimum
samples split are tuned.
○​ Optuna conducts trials to find the best balance between complexity and
performance.
○​ The selected hyperparameter set is used to train the final Random Forest model.

By systematically exploring hyperparameter configurations using Optuna, this optimization process ensures that the selected models achieve superior performance while preventing overfitting and computational inefficiencies.
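
As a minimal sketch of this pipeline for the LGBM classifier, with the F1-score as the objective (the target function chosen below); the search ranges are illustrative assumptions, not the study's exact configuration:

import optuna
from lightgbm import LGBMClassifier
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score

# 41-component training data, derived from the earlier normalized split.
pca41 = PCA(n_components=41).fit(X_train)
X41_train = pca41.transform(X_train)

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
    }
    model = LGBMClassifier(**params)
    # Cross-validated F1 on the training data as the value to maximize.
    return cross_val_score(model, X41_train, y_train, cv=3, scoring="f1").mean()

# TPE (tree-structured Parzen estimator) sampler, as described above.
study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=50)
print(study.best_value, study.best_params)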

Optimization Outcome
The optimization of all three models was conducted with respect to their hyperparameters, with the F1-score chosen as the target function. This decision was made because accuracy alone may be misleading, whereas the F1-score accounts for both precision and recall, providing a more robust assessment of model performance. The best values achieved are reported under Performance Measure below.
Optimization History

The optimization history of the Light Gradient Boosting Machine reveals that the initial best performance was observed during the zeroth trial. The optimization process demonstrated consistent progress from the outset, as evidenced by the proximity of the performance points to their optimized versions. Notably, the majority of the points exhibited F1 scores above 98%, indicating the robustness of the model and the effectiveness of the optimization process in enhancing the model's performance.

The zeroth trial yielded the initial best performance, which was subsequently improved through
continuous optimization. This iterative process ultimately culminated in the attainment of the
optimal performance during the 43rd trial.

The optimization history of the Extreme Gradient Boosting model reveals that the optimal
performance was achieved in the initial trials. Specifically, the model attained a near 99% F1
score by the second trial, which remained consistent throughout subsequent trials. Despite the
absence of further improvements, the initially high performance underscores the model's
robustness and efficacy from the outset.
The optimization history of the Random Forest model displays a more gradual improvement
compared to the other models. Initially, the model's best performance was approximately 95%,
which is lower than the performances observed in the other models. However, the model
demonstrated steady improvement over the course of several trials. Notably, after the 20th trial,
the model's performance improved significantly, achieving its best performance of approximately
98% by the 21st trial.

After optimizing all three models using the 41-dimensional dataset, which had been reduced
from 178 dimensions, we achieved remarkable performance metrics. Specifically, the Light
Gradient Boosting Machine classifier reached 99% accuracy, 99% F1 score, 99% precision, and
99% recall. This impressive performance was possible due to the continuous optimization of the
model, ultimately reaching a peak performance of 99.13% on the testing set.

Although the high performance could be attributed to extensive hyperparameter tuning, it is important to note that the same hyperparameter tuning was applied to other models, such as Extreme Gradient Boosting and Random Forest, over 50 trials. Despite this, the other models did not achieve comparable results. This indicates that the Light Gradient Boosting Machine classifier is not overfitting or particularly hyper-tuned to the current output, but rather, it consistently produces high-quality results, achieving nearly 99% accuracy on various test datasets.
Performance Measure

F1 Score

Given that the optimization objective was set to maximize the F1 score, it is not surprising that
the models performed best in this metric. Specifically, the Light Gradient Boosting model
achieved an F1 score of 99.131%, followed by the Extreme Gradient Boosting model with an F1
score of 98.478%, and finally, the Random Forest model with an F1 score of 98.261%. This
clearly demonstrates the superior performance of more advanced models such as the Light
Gradient Boosting Machine and the Extreme Gradient Boosting classifier.

Accuracy

The primary objective of this study was to achieve the highest possible accuracy while
employing Principal Component Analysis (PCA) for feature extraction. Initial experiments
determined that using 41 components provided an optimal balance between accuracy and F1
score. Even without hyperparameter tuning, the model achieved an accuracy of 98%.
Given this initial performance, it was anticipated that further hyperparameter tuning would
enhance the model's accuracy. As expected, the Light Gradient Boosting Machine classifier's
accuracy increased to 99.13% following optimization. In comparison, the Extreme Gradient
Boosting (XGB) and Random Forest (RF) models achieved accuracies of 98.47% and 98.26%,
respectively. These results indicate that the Light Gradient Boosting Machine classifier is the
most effective model for this dataset.

MAE & RMSE

In addition to achieving high accuracy and F1 scores, it is crucial to maintain low error values. In
this study, we measured the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE)
to evaluate model performance. The Light Gradient Boosting Machine classifier emerged as the
best model, with a MAE of just 0.009 and an RMSE of 0.09. These extremely low error values
further underscore the model's effectiveness and reliability.
Precision & Recall

Having established that the Light Gradient Boosting Machine classifier achieved an overall
performance score of 99.13%, we observe no immediate necessity to delve into individual
precision and recall scores. Nonetheless, examining these metrics reveals that the precision
value (99.134%) is slightly higher than the recall value (99.13%). This observation is significant
as it highlights the classifier's ability to maintain a high level of accuracy in positive predictions.
While other models also have their respective precision and recall scores, they did not achieve
comparable performance, thereby establishing the Light Gradient Boosting Machine classifier as
the most effective model in our study.

Summary
This study explored the detection of epileptic seizures using machine learning models on EEG
data. The dataset comprised EEG recordings from 500 subjects, with data preprocessed into a
binary classification problem—seizure (Y=1) and non-seizure (Y=0). The initial dataset exhibited
class imbalance, necessitating an undersampling approach to balance the data distribution and
enhance model fairness.
To evaluate model performance, multiple classification algorithms were tested, including
K-Nearest Neighbors (KNN), Random Forest, Support Vector Machines (SVM), Naïve Bayes,
Decision Trees, and advanced ensemble methods such as Light Gradient Boosting Machine
(LGBM), Extreme Gradient Boosting (XGBoost), and CatBoost classifiers. The study employed
various data split ratios (90-10, 80-20, 75-25) to assess model stability across different training
and testing proportions.

Among the tested models, LGBM emerged as the best performer with an accuracy of 97.83%,
followed closely by CatBoost and XGBoost. The study also highlighted that ensemble methods
outperformed traditional models like KNN and Naïve Bayes, which demonstrated relatively lower
accuracies. A deeper error analysis using MAE and RMSE confirmed that LGBM and CatBoost
had the lowest error rates, reinforcing their robustness in seizure detection.

Principal Component Analysis (PCA) was applied to assess dimensionality reduction's impact
on performance. Reducing the dataset to 41 components provided an optimal balance
between accuracy and computational efficiency, achieving a peak accuracy of 98.48%. Further
analysis showed that an increase in the number of dimensions improved performance until it
plateaued around 41 features, beyond which accuracy slightly declined.

Hyperparameter tuning using Optuna was employed to optimize model configurations, focusing
on accuracy, F1-score, and error minimization. The LGBM classifier, after hyperparameter
tuning, achieved an outstanding 99.13% accuracy, an F1-score of 99.131%, and exceptionally
low error values (MAE: 0.009, RMSE: 0.09). In comparison, XGBoost and Random Forest
classifiers exhibited slightly lower but still commendable performances.

Conclusion
This study successfully demonstrated the efficacy of machine learning models in detecting
epileptic seizures from EEG data. The findings highlight the superiority of ensemble learning
techniques, particularly the Light Gradient Boosting Machine (LGBM) classifier, which
consistently outperformed other models across multiple evaluation metrics. By leveraging
dimensionality reduction through PCA and optimizing hyperparameters via Optuna, the study
achieved state-of-the-art classification performance, attaining an accuracy of 99.13% on the
testing dataset.

The implications of this research extend beyond model optimization; it underscores the critical
role of feature selection and hyperparameter tuning in enhancing classification accuracy. The
results further emphasize the potential of EEG-based machine learning approaches in real-time
seizure detection, which could lead to significant advancements in automated epileptic
diagnosis and patient monitoring systems.

Future work could focus on expanding the dataset to incorporate more diverse EEG recordings,
exploring deep learning techniques, and developing real-time implementation frameworks.
Additionally, integrating domain-specific feature engineering and explainable AI techniques
could further improve model interpretability and clinical applicability. Overall, this research
provides a strong foundation for deploying machine learning solutions in the medical domain,
particularly for epilepsy detection and management.
