0% found this document useful (0 votes)

54 views18 pages

A Deep Learning Model Using A Bi-LSTM Network

This research presents a deep learning framework using Bi-LSTM to predict student academic performance, achieving an average accuracy of 88.23%. The model outperforms traditional machine learning algorithms and incorporates SHAP for interpretability, allowing stakeholders to understand feature contributions. The study aims to facilitate early identification of at-risk students and improve academic outcomes through timely interventions.

Uploaded by

bharathkrishna2088

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

54 views18 pages

A Deep Learning Model Using A Bi-LSTM Network

Uploaded by

bharathkrishna2088

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

TYPE Original Research

PUBLISHED 23 June 2025

DOI 10.3389/feduc.2025.1581247

Predicting student academic

OPEN ACCESS performance using Bi-LSTM: a
deep learning framework with
EDITED BY
Antonino Ferraro,
Pegaso University, Italy

REVIEWED BY
Mostafa Aboulnour Salem,
SHAP-based interpretability and
King Faisal University, Saudi Arabia
Gail Augustine,
Walden University, United States
statistical validation
*CORRESPONDENCE
Silvia Gaftandzhieva Emi Kalita 1, Abdullah Mana Alfarwan 2, Houssam El Aouifi 3,4,
[email protected] Ashima Kukkar 5, Sadiq Hussain 1, Tazid Ali 1 and
RECEIVED 19March 2025
ACCEPTED 02 June 2025
Silvia Gaftandzhieva 6*
PUBLISHED 23 June 2025 1
Centre for Computer Science and Applications, Dibrugarh University, Dibrugarh, India, 2 Department
CITATION of Education and Psychology, Najran University, Najran, Saudi Arabia, 3 FSJES, Ibn Zohr University,
Kalita E, Alfarwan AM, El Aouifi H, Kukkar A, Ait Melloul, Morocco, 4 IRF-SIC Laboratory, Faculty of Science, Ibn Zohr University, Agadir, Morocco,
Hussain S, Ali T and Gaftandzhieva S (2025)
5
Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, India,
Predicting student academic performance
6
Faculty of Mathematics and Informatics, University of Plovdiv Paisii Hilendarski, Plovdiv, Bulgaria
using Bi-LSTM: a deep learning framework
with SHAP-based interpretability and
statistical validation. Introduction: Educational Data Mining (EDM) involves analysing educational
Front. Educ. 10:1581247. data to identify patterns and trends. By uncovering these insights, educators
doi: 10.3389/feduc.2025.1581247
can better understand student learning, optimise teaching methods, and refine
COPYRIGHT
curriculum. One of the main tasks in educational data mining is predicting
© 2025 Kalita, Alfarwan, El Aouifi, Kukkar,
Hussain, Ali and Gaftandzhieva. This is an the student’s academic performance because it makes it possible to provide
open-access article distributed under the appropriate interventions supporting students’ achievements. Predicting the
terms of the Creative Commons Attribution
student’s academic performance also helps to identify at-risk students and
License (CC BY). The use, distribution or
reproduction in other forums is permitted, explore the possibility of providing intervention techniques.
provided the original author(s) and the
Methods: In this paper, a deep learning model using a Bi-LSTM network is
copyright owner(s) are credited and that the
original publication in this journal is cited, in introduced to predict second term GPA.
accordance with accepted academic
Results: The model had an average accuracy of 88.23% and was statistically
practice. No use, distribution or reproduction
is permitted which does not comply with better than traditional machine learning algorithms such as CatBoost, XGBoost,
these terms. Hist Gradient Boosting, and LightGBM for accuracy, precision, recall, or F1-
score metrics. The results are also analysed with the help of SHAP values for
model interpretability to understand feature contributions, making the proposed
framework more transparent. The performance of models is also compared
using various statistical tests.
Discussion: The results demonstrate that BI-LSTM performance is significantly
different from other models. Hence, the proposed model provides a way to
prevent student dropouts and improve academic achievements.

KEYWORDS

student academic outcome, XAI, SHAP, Bi-LSTM, student dropout, statistical test

1 Introduction
Student academic performance is a key factor when evaluating the outcome of global
education systems. Our civilisation depends heavily on education, which is a crucial
component. Research in many areas, particularly education, has been impacted by information
and communication technology. For instance, the recent COVID-19 pandemic forced many
countries to adopt various e-learning platforms (Albreiki et al., 2021). Higher education
institutions prioritise student academic achievement as a key indicator of quality education.

Frontiers in Education 01 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

However, identifying the factors that significantly impact student achievement, we can incorporate variables like aptitude test results,
success early in their academic journey is a complex challenge. Several high school GPAs, and the student’s graduating high school. We think
useful strategies have been employed to address the academic that a student’s success during their first year of college can be used as
performance issues of the students (Bravo-Agapito et al., 2021; Alamri a predictor of how well they will perform during the remaining years
and Alharbi, 2021; Hamsa et al., 2016). These resources may not of their education. These elements enable students to receive early
be easily implemented everywhere. Also, while technology has feedback and take steps to enhance their performance. The main
improved student performance prediction, further work is necessary purpose of this study is to achieve early classification of at-risk
to achieve higher accuracy through new data and techniques. students and the prediction of their GPA to allow timely intervention
Additionally, clustering and classification techniques are presented to by educators and other policymakers. That is why recognising
identify the impact of students’ performance early on the GPA. Grade potential dropouts can help an institution improve dropout and
Point Average, commonly known as GPA, is the widely used and retention rates. The three key objectives of this research are:
accepted criterion for determining student academic performance. It
is a very significant component of the overall academic evaluation • To predict the at-risk students using classification so that the
process. However, there is a need to predict GPA initially to easily teachers and policymakers can stop the possible dropout of
track and address any student who is most likely to drop out during these students.
their academic period. To address this challenge, this study applies • To find the best classifiers among different classifiers to predict
modern computational techniques. the at-risk students that may be applied to similar datasets of
Student performance is a major component of the learning other Universities.
process. Predicting student performance is necessary to identify those • To utilise SHAP (Shapley Additive exPlanations) to interpret the
most likely to experience poor academic accomplishment in the results, providing stakeholders with insights into the key features
future. The data may be helpful and utilised to make predictions if it influencing predictions and reinforcing the principles of
has been converted into knowledge. Therefore, the information could Explainable AI (XAI).
help students reach their academic goals and enhance the quality of • To compute the performance of the best classifier with others, a
education and learning. This study, Educational Data Mining (EDM), statistical analysis such as the mean, median, standard deviation,
analyses data from educational backgrounds using data mining t-test–test, bootstrap confidence levels, Friedman test, Effect Sizes
techniques (Kaunang and Rotikan, 2018; Yağcı, 2022). EDM (Cohen’s d) and Tukey’s HSD Test are employed on the four
application also assists in preparing action plans for enhancing student performance metrics.
performance. This will ultimately lead to improved teaching, learning,
and the overall student experience within the institution (Ajibade This study aims to improve predictive accuracy while providing
et al., 2022; Nabil et al., 2021). Analysing academic data with machine comprehensible and practical recommendations to educational
learning has shown promising results in identifying learning patterns stakeholders using deep learning methodologies and interpretability
and predicting student performance (Hussain and Khan, 2023). tools like SHAP. The proposed framework offers a reference model for
Through the application of ML algorithms, an assessment of student early GPA prediction, contributing to better academic outcomes and
outcomes can be made due to the identification of patterns that exist fewer student dropouts.
within the data (Dabhade et al., 2021). While machine learning offers The rest of the paper is organised as follows. Section 2 describes
potential for academic data analysis, traditional model-building the related works, while Section 3 depicts the methodology section.
methods are inadequate. They suffer from issues such as lack of Results and discussions were described in Section 4, and Section 5 is
interpretability, vulnerability to overfitting in imbalanced datasets, the conclusion.
and difficulty managing feature interdependencies (Alam and
Mohanty, 2022). These limitations, in turn, make it difficult for those
who apply the models to make important decisions based on the 2 Related work
provided information by the models. Deep Learning (DL) has
emerged as a promising solution to address the limitations of The growth and development of a country depend on the
traditional machine learning models (Rodríguez-Hernández et al., achievements of students in school. Therefore, various researchers
2021). However, even with DL, handling the complexities and work to develop diverse methods for the early prediction of student’s
non-linear relationships found in academic datasets remains a academic performance.
significant challenge (Waheed et al., 2020; Lee et al., 2021; Al-Azazi Sarker et al. (2024) conducted a study by applying the EDM
and Ghurab, 2023; Shen, 2024; Sateesh et al., 2023; Manigandan et al., method to investigate student achievement in higher secondary
2024). Moreover, DL’s capability of handling big data will enhance the education in Bangladesh. The research focused on categorising
prediction accuracy of GPA if integrated with workflows of handling students into good, average, and poorly performing groups. It
imbalanced data and the feature importance workflow, as shown in evaluated their academic performance through four key aspects:
Figure 1. assessment of probable outcomes, comparison of subject-wise
Academic achievement is significant since it is closely related to performance analysis, performance trends, and internal examination
the favourable results that we appreciate. Students’ academic pattern parameters. Therefore, a two-year dataset of humanities
achievement in college or university is one of the aspects that students was used, and five machine learning algorithms were used for
contribute to academic success. Every college or university’s analysis, such as Naive Bayes (NB), Decision Tree (DT), Random
performance is still determined by the total academic achievement of Forest (RF), Neural Network (NN), and Nearest Neighbour. The study
its students. To enhance our analysis and prediction of academic demonstrated a clear correlation between students’ performance

Frontiers in Education 02 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

FIGURE 1
A flowchart of the ML & DL process with the constraints and transition stages.

during the term and their final grades, and it also identified specific RF, Support Vector Machine (SVM), and XGB. The proposed system
subjects that significantly contribute to high academic achievement. also included nine feature selection techniques, including variance
Such a concept can help college administrations with intervention threshold and recursive feature elimination. The ensemble DXK
strategies that can be used to help low achievers while motivating (DT + XGB + KNN) model achieved 97.83% accuracy with 80:20 data
high achievers. proportions, showing better results than traditional classifiers.
Kukkar et al. (2023) proposed a new Student Academic Furthermore, the ACO-DT Model achieved a 98.15% accuracy rate
Performance Predicting (SAPP) system to enhance the prediction and was higher than all the models used. The authors highlighted that
accuracy and solve performance prediction issues. The proposed more research should enhance the performance of more accurate and
system combined 4-layer stacked LSTM with RF and Gradient faster predictions.
Boosting (GB) algorithms. The system performance was evaluated Another analysis was done by Liang et al. (2024) using five
using Accuracy, Precision, F-measure and Recall parameters on a machine learning models to predict academic performance in an
newly created emotional dataset with an OULAD dataset. The engineering mechanics course with inputs as online learning
accuracy of the proposed SAPP system was around 96%, which is behaviours and comprehensive performance and outputs as final exam
higher than ANN, RNN, CNN, SVM, DT, and NB. These results scores (FESs). The best performance was achieved by GB Regression
supported its accuracy over other approaches employed in student (GBR) with RMSE (9.3595) and a correlation coefficient of (0.7558).
prediction performance. Thus, they found that the Intellectual Education Score (IES) was the
Mahawar and Rattan (2025) developed a performance prediction most important performance indicator affecting the change in the
model using ML models involving demographic, social, psychological, scores. Live viewing rate (LVR), replay viewing rate (RVR), and
and economic indicators. An online survey was performed, and a number of completed assignments (NOCA) were critical for FESs.
dataset of pre-year undergraduate students was considered for analysis They presented practical information for educators who could
using eight different ML classifiers, namely, Logistic Regression (LR), incorporate or modify particular practices to help a student at risk.

Frontiers in Education 03 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

Huang and Zeng (2024) developed a novel academic Kapucu et al. (2024) explored ML and DL approaches to predict
performance prediction model leveraging dual graph NN to utilise student performance in science classes. They collected the data from
both interaction-based structural information and attribute feature 445 students in grades 5–8 from a school in Central Anatolia, Turkey,
spaces of students. The model included a local academic performance during the 2022–2023 academic year. The results revealed that out of
representation module obtained from online interaction activities several factors, the average number of books read per year significantly
and a global representation module constructed from attribute affected performance more than other factors. The DNN model
features with the help of dynamic graph convolution. These various achieved the highest accuracy, i.e., 90%.
data representations are integrated with a learning module that Nurudeen et al. (2024) established the correlation between the
analyses information from individual and overall perspectives to first-year GPA and the final-year CGPA. Data were collected using an
predict performance on a test. The experiment outcome showed that ex-post facto design and analysed using Pearson’s correlation and
performance was improved with 83.96% accuracy for pass/fail regression in Minitab. It was also found that first-year GPA had a
prediction and 90.18% for pass/withdraw prediction in a public consistently high correlation (i.e., 0.9334) with the final-year CGPA,
dataset. Additionally, ablation studies were performed to validate proving that early academic performance is a major determinant of
these improvements and to showcase that the proposed model success. However, other demographic characteristics were not
outperformed the other approaches. significantly related to CGPA.
Hussain et al. (2024) implemented an innovative deep learning The problem of imbalanced datasets in learning was
approach that uses the Levenberg Marquardt Algorithm (MLA), minimised by Wang et al. (2023). They proposed a ProbSAP
which solves problems like insufficient attributes and model system for predicting academic performance. The ProbSAP
complexity in the current approach. The input data included the incorporated three key modules: a cooperative data enhancement
assignments, class tests, midterm scores, and attendance. This data sub-module for improving data quality, accessible in large-scale
is fed through the NN via four input variables, three hidden layers metadata clustering sub-module for reducing potential imbalances
and an output layer. The proposed model obtained an accuracy of of academic features, and the XGBoost-based prediction
88.6%, more accurate than previous approaches. The study sub-module for final course mark prediction. The comparative
achieved its goal of predicting final grades, which proved beneficial assessments revealed that ProbSAP leads to lower mean absolute
for students, teachers, and educational leaders by providing error than the current methods, including CNN, SVR, and
actionable information. Catboost-SHAP, and improved on an average by up to 84.76%. It
Kukkar et al. (2024) developed a system that analysed the provided a sample accuracy above 98%; there is less than 1–9%
sequences and long-dependent structures of OULAD and self- prediction error. Table 1 showcases different state-of-the-art
derived emotional data using RNN and LSTM networks. studies in this domain.
Integrating RF, SVM, NB, and DT with RNN and LSTM improves
the method’s predictive capability. The proposed model with the
RNN + LSTM + RF model achieved a high accuracy of 97% as 3 Methodology
compared to the other models: RNN + LSTM + SVM with 90.67%,
RNN + LSTM + NB with 86.45% and RNN + LSTM + DT with In this section, the different methods used in this study for
84.42%. This method effectively modelled the intricate time- second-term GPA prediction are explained in detail. The design,
dependent relationships within the data and outperformed all other implementation, and evaluation of the proposed methodologies and
tested configurations. their comparison with the conventional machine learning approaches
Demographic and personality features are combined by Shaninah are also explained as follows.
and Mohd Noor (2024) to develop a SAP prediction model. They
collected the dataset from 305 students studying at Al-Zintan
University, Libya, through a questionnaire containing 44 questions. 3.1 Different methods utilised in the study
The proposed approach involved one latent dependent construct, i.e.,
SAP and five independent constructs. Both were tested using This section provides a detailed analysis of seven methods,
PLS-SEM, which was more effective in handling smaller samples and examining their architecture, functionality, and effectiveness in
complex models than CB-SEM. The research outcomes identified predicting second-term GPA. Following this, we discuss the
personality features as the most influential factors that affect advantages and disadvantages of each method in the context of
SAP performance. academic performance prediction.
The issues faced by DHH students in their education were
addressed by Raji et al. (2024). They proposed a new ML system with 3.1.1 XGBoost
LIME and SHAP methods. The proposed system predicted the student E-Xtreme gradient Boosting is a Machine learning technique
at risk and weighted the key risk factors like early intervention, family known for its exceptional predictive performance. It is also
deafness history, mode of communication, and type of schooling. renowned for its high accuracy, efficiency and speed. It creates a
They generated a new dataset combining 454 DHH student records sequence of weak learners, and based on this sequence, it develops
with synthetic and SMOTE datasets. After that, various ML methods an accurate predictive model. XGBoost minimises the overfitting
were applied, among which a stacked model with XGB + RF + Extra problem by improving generalisation. Mostly, it works on
Trees gained 92.99% accuracy. This system provided practical classification and regression problems. It can handle missing
recommendations allowing stakeholders to enhance DHH students’ values, which allows the model to handle real-world data with
performance. missing values without requiring pre-processing. Boost has key

Frontiers in Education 04 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

TABLE 1 Some of the state-of-the-art studies with their findings and limitations.

Researchers Dataset Attributes Methods Best Method Findings Limitations

The established
relationship
between internal Focused only on
2-year dataset Internal exams,
NB, DT, RF, NN, and end-term humanities students;
Sarker et al. (2024) (Humanities subject-wise RF
Nearest Neighbour performance broader generalisation
students) performance, trends
identified subjects is limited.
contributing to high
grades.

Achieved 96%
RF, GB, ANN, Requires additional
Emotional + OULAD Emotional states, Stacked LSTM + accuracy; enhanced
Kukkar et al. (2023) CNN, SVM, DT, real-world validation
datasets academic records RF + GB prediction over
NB for diverse datasets.
traditional methods.

Identified effective
Limited to pre-year
features using
Demographic, social, undergraduates;
Mahawar and Rattan Online survey (pre- LR, RF, SVM, XGB, ACO-DT (98.15% advanced feature
psychological, and economic data
(2025) year undergraduate) DXK, ACO-DT accuracy) selection; improved
economic factors inconsistencies may
accuracy with
affect generalisation.
ensemble models.

Found IES, LVR,

Applied only to
RVR, and NOCA as
Engineering Online behaviours, engineering
critical factors;
Liang et al. (2024) Mechanics course comprehensive NB, DT, RF, GBR GBR mechanics, external
RMSE: 9.3595,
Data performance applicability is
correlation
untested.
coefficient: 0.7558.

Combined local and

global student
Requires more diverse
Interaction activities, Dual Graph NN features for pass/fail
Huang and Zeng (2024) OULAD Dual Graph NN datasets for robust
attribute features (90.18% accuracy) predictions;
validation.
validated using
ablation studies.

Successfully
predicted final
Attendance, grades using simple Accuracy is slightly
BS program 1st- NN + MLA (88.6%
Hussain et al. (2024) assignments, midterm MLA input features; lower than modern
semester Data accuracy)
scores, class tests beneficial for ensemble methods.
educators and
policy-makers.

Captured complex
temporal
Temporal dependencies with Needs scalability
Emotional + OULAD RNN, LSTM, RF, RNN + LSTM + RF
Kukkar et al. (2023) dependencies from superior testing for larger
datasets SVM, NB, DT (97% accuracy)
sequence-based data performance datasets.
compared to other
combinations.

Identified
personality traits as
Personality traits, Limited sample size;
Shaninah and Mohd most influential on
305 students (survey) demographics, PLS-SEM, CB-SEM PLS-SEM focused only on
Noor (2024) SAP; performed
employment factors Libyan universities.
well with smaller
sample sizes.

(Continued)

Frontiers in Education 05 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

TABLE 1 (Continued)

Researchers Dataset Attributes Methods Best Method Findings Limitations

Predicted at-risk
Focused solely on the
Communication students and
DHH population,
454 DHH student mode, family deafness XGB, RF, Extra Stacked Model identified key risk
Raji et al. (2024) applicability to
records history, early Trees (92.99% accuracy) factors affecting
broader populations is
intervention DHH student
unknown.
outcomes.

Determined books
Applied only to grades
Number of books read read per year as a
445 students (grades DNN (90% 5–8; additional factors
Kapucu et al. (2024) per year, midterm DNN significant factor for
5–8) accuracy) for higher education
scores predicting science
are not included.
course performance.

Strong correlation
between first-year
GPA and final Focused only on GPA
Regression, Regression
First- and final-year Demographics, first- CGPA; progression; external
Nurudeen et al. (2024) Pearson’s (Correlation:
GPAs year GPA demographic factors were not
Correlation 0.9334)
variables had no considered.
significant
influence.

ProbSAP reduced
MAE by 84.76% and
Requires extensive
achieved 98%
Massive educational Academic features, XGBoost, CNN, computational
Wang et al. (2023) ProbSAP accuracy in
dataset metadata clustering SVR, ProbSAP resources for large-
predictions with a
scale datasets.
reduced error
margin (1–9%).

features; it uses a decision tree as the base learner. To enhance its overfitting, high performance, interpretability and scalability are
performance, this approach supports parallel processing for the advantages of Cat Boost (Prokhorenkova et al., 2018).
improved efficiency and scalability and utilises regularisation to Mathematically, CatBoost can be calculated as follows:
avoid overfitting. Its advantages are High accuracy, efficiency,
M N
handling large datasets and interpretability (Chen and ( x ) F0 ( x ) + ∑
F= ∑ f
m 1 =i 1 m
=
( xi ,yi ) (1)
Guestrin, 2016).

N is the samples of a Training data set and M features, where each

3.1.2 CatBoost sample is defined as (x i , y i), where x i is a vector of M features and y i
Yandex develops CatBoost to handle categorical and numerical is the corresponding target variablе, CatBoost try to learn a function
data. CatBoost, or categorical boosting, is an open-source library F ( x ) that predicts the target variable y (see Equation 1).
to solve the problem of regression and classification with many Here,
independent features. It uses Symmetric Weighted Quantile Sketch F ( x ) is the overall prediction function that CatBoost try to learn.
(SWQS) to handle missing values in the dataset and reduce the It selects an input vector x and predicts the variable y.
overfitting problem, which improves the performance of the F0 ( x ) is the initial prediction. It is the mean of the target variable
dataset. It also applies Ordered Boosting to handle difficulties like in the training dataset.
M
large cardinality faced by categorical data. CatBoost utilises random ∑ m =1 defines the summation over the ensemble of trees. M is the
permutations and gradient-based optimisation, which leads to total number of trees in the ensemble.
N
improved performance on large and complex datasets. In each ∑i =1 is the summation over the training samples? N is the total
iteration of the algorithm, CatBoost define the loss function by number of training samples.
calculating the negative gradient from the current prediction then, f m ( x i ,y i ) denotes the prediction of the mth tree for the ith
this gradient is used to update the prediction by adding a scaled training sample. In the overall prediction process, each tree within
gradient to the current prediction. CatBoost uses gradient-based the ensemble contributes by leveraging its training sample predictions.
optimisation to build decision trees to achieve accurate predictions. The equation represents that the total prediction F ( x ) is summing
It then employs Ordered Boosting for faster model convergence up the initial guess F0 ( x ) with thе predictions of each tree f m ( x i ,y i )
and improved accuracy, which is particularly beneficial for datasets for each training sample. This summation is done for all trees (m) and
with numerous features. Categorial feature handling, reduced all training samples (i).

Frontiers in Education 06 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

3.1.3 Histogram based gradient boosting Finally, it combines forward and backward passes to capture past and
Traditional Gradient Boosting is an ensemble decision tree future context. In the forward pass, it can process the input from
algorithm; it is slow to train the model, to minimise this problem, Hist starting to ending and from ending to starting in the backward pass.
Gradient Boosting or Histogram Based Gradient Boosting (HGB) In Figure 2, the input sequence represents some data like characters
concept is introduced. Hist Gradient Boosting is an effective in a text or words in a sentence, etc., these data points are transformed into
implementation of traditional gradient boosting. This boosting technique dense vectors. The Bi-LSTM layer applies its parameter to the vector
divides data into bins and histograms, reducing the computational sequence. In the forward pass, information is collected from the past
complexity and memory usage. These bins or histograms are used to find (prior time steps), and in the backward pass, information is recorded from
the gradient of the loss function and then update the model using the the future (following time steps). The output of the BiLSTM is the
calculated gradients. It is an iterative process until it reaches the stopping combination of the hidden steps from forward and backward directions
criteria or convergence. Hist Gradient Boosting offers advantages such (Graves and Schmidhuber, 2005) (Equation 2).
as accelerated gradient computation, scalability for large datasets and
f b
high-dimensional features, and resilience to outliers and noisy data. The p=
t pt + pt (2)
common application of Hist Gradient Boosting is classification,
regression and recommendation systems (Si et al., 2017).
Where.
3.1.4 LightGBM pt is the probability record from both the forward and backward
Microsoft’s LightGBM is a fast and efficient gradient-boosting LSTM network, i.e., the final probability vector;
framework for high performance. It tackles classification, regression, ptf probability vector found from the forward LSTM network.
and ranking problems through a tree-structured approach, combining ptb probability vector found from the backward LSTM network.
weak models into a strong predictor. LightGBM’s focus on large and
small gradient instances contributes to its accuracy. It is a flexible 3.1.6 SHAP (Shaply additive explanations)
model because it can support various objective functions. Due to its The concept of cooperative game theory and sharply values is
support for sparse data, LightGBM is highly memory-efficient. Its the foundation of SHAP (Lundberg and Lee, 2017). The output of
operation involves initialising a basic model and then calculating the ML model is interpreted and explained using the Shapley
gradients. LightGBM applies some efficient algorithms to get an Additive Explanations (SHAP) framework. SHAP values help to
efficient model by searching the optimal split point in each feature.it understand the contribution of each feature in model prediction.
is an iterative process and updates the model prediction based on split SHAP values explain the significance of each feature and how it
point and calculated gradients, continuously adding new decision affects the output and interaction between features. The positive
trees until a stopping criterion is met, which may be either a maximum SHAP value of a feature gives a positive impact on model
no. of trees or minimum improvement in performance. The high prediction, and the negative value gives a negative impact on
accuracy, speed, scalability, efficient histogram construction, and low model prediction. The magnitude represents the strength of the
memory usage are the advantages of LightGBM (Ke et al., 2017). effect. SHAP uses the training data to measure the contribution of
The selection of the methodologies depends on the problems, each feature, and then a reference value is calculated. This reference
datasets and performance matrices because the following methodologies value helps to represent the average prediction for the dataset.
also have some demerits. XGBoost gives high accuracy but can suffer SHAP value defines the difference between the predicted value and
from overfitting. CatBoost can handle categorical features, but it is reference value for each SHAP value and is calculated by
resilient to outliers. Hist GB is fast and memory-efficient, but it gives considering all possible feature coalitions. Under considering all
minimum accuracy. LightGBM is also fast and memory-efficient and potential feature coalitions, the SHAP value defines the difference
gives more accuracy but can be less robust to outliers. between predicted and reference values for each. Finally, SHAP
values are used to determine how each feature affects the outcome
3.1.5 BiLSTM and to understand and interpret the result. However, gaining
Bi-directional Long Short-Term Memory, commonly known as insight helps the model to make decisions. Interpretability, model
Bi-LSTM, belongs to the recurrent neural networks (RNNs) category. explainability and feature selection are the advantages of SHAP.
It is called a sequence model because it processes sequential data. It has
two LSTM layers, so it is Bi-directional. The first one is Forward LSTM, 3.1.7 SMOTE
and the other one is Backward LSTM. Simultaneously, these two LSTM Synthetic Minority Over Sampling Technique (SMOTE) is
layers process the input sequence in forward and backward directions. known to handle imbalanced datasets of machine learning models

FIGURE 2
Structure of BiLSTM.

Frontiers in Education 07 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

(Chawla et al., 2002). SMOTE helps solve oversampling, average (HSGPA), American College Testing (ACT) composite score,
undersampling and threshold moving issues. The underrepresented and grade point averages for the first (FTGPA) and second terms
minority class causes the majority class to dominate the class (STGPA). STGPA is our target variable. The dataset consisted of
distribution. Therefore, SMOTE handles these imbalanced issues three cohorts of students’ records (N = 6,500) on six variables
by generating a sample of minority classes. SMOTE identifies some (features).
minority class instances from the imbalanced dataset. Once
minority instances are identified, find their K-nearest neighbours 3.2.2 Data pre-processing
and generate synthetic samples by interpolating between each The dataset underwent a systematic preparation process to
minority instance and its K-nearest neighbours. SMOTE repeats ensure its reliability and accuracy. Data cleaning was a critical step
these steps to get a more balanced dataset. involving the identification and removal of missing values, as well
as the elimination of duplicate records to maintain data consistency.
These measures were essential to produce a clean and error-free
3.2 Data pre-processing to model dataset, providing a robust foundation for subsequent
evaluation analytical tasks.
In addition to data cleaning, data augmentation was applied
In this work, we followed a systematic methodology starting to enhance the dataset. This process involved generating new data
with data pre-processing, which involved data preparation, points by introducing small random perturbations to key features,
transformation, and oversampling to address class imbalance such as HSGPA, ACT, and FTGPA. Adding subtle variations to the
issues. The raw dataset was cleaned and transformed into a suitable data increased its diversity, better reflecting real-world variability.
format, and oversampling techniques were applied to balance the This data augmentation expanded the dataset and enhanced the
data. This resulted in a refined new dataset, which was then used model’s general ability, leading to more robust analyses. Figure 4
for model development and evaluation to assess the performance shows the distribution of classes before the data
and accuracy of the proposed approach. Figure 3 describes the augmentation process.
steps of our model. To further address the class imbalance, SMOTE (Synthetic
Minority Over-Sampling Technique) was applied. SMOTE
3.2.1 Dataset description generates synthetic data points for the minority classes, ensuring a
The dataset was collected from a Middle Western University, more balanced data distribution across all classes. This balance is
USA. The dataset comprised sex, age, high school grade point critical for training machine learning models, as it prevents bias

FIGURE 3
Structure of proposed model.

Frontiers in Education 08 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

toward any particular class and ensures that the model is equally • Recurrent Layers:
exposed to all possible outcomes, improving its overall • The core of the model leverages a combination of BiLSTM and
performance and generalisation ability. The final balanced BiGRU layers:
distribution is shown in Figure 5. • 1st Layer: A Bidirectional LSTM layer with 512 units and
return_sequences = True, allowing the output sequence to
3.2.3 Model architecture be passed to the next layer.
The model implemented is a Recurrent Neural Network (RNN) • 2nd Layer: A Bidirectional GRU layer with 256 units
architecture utilising Bidirectional Long Short-Term Memory configured to output sequences for further processing.
(Bi-LSTM) and Bidirectional Gated Recurrent Units (Bi-GRU) layers • 3rd Layer: Another Bidirectional LSTM layer with 256 units,
to capture sequential patterns in the data (Figure 6 shows the proposed reducing the sequence to a single vector representation.
model architecture).
• Dense Layers:
• Input Pre-processing: • A stack of fully connected layers captures complex, high-level
• The input features are reshaped to a 3D tensor of shape representations of the processed sequential data:
(samples, time steps, features). Here: • Dense(64) → Dense(32) → BatchNormalization →
• samples correspond to the number of training/ Dense(16) → Dense(8) layers refine the feature space.
testing samples. • Batch normalisation ensures stability and mitigates the risk
• time steps are set to 1, signifying a single time step. of vanishing/exploding gradients.
• features represent the number of input features.
• Dropout:
• Dropout layers introduce regularisation, preventing
overfitting by randomly setting a fraction of units to zero
during training.

• Output Layer:
• A Dense layer with four units and a sigmoid activation
function outputs class probabilities for the four classes.

The model was trained for up to 200 epochs with a batch size of 128,
while early stopping was applied to prevent overfitting. Early stopping
monitored the validation loss and halted training if no improvement
was observed for 15 consecutive epochs, restoring the best model
weights to ensure optimal performance. Dropout was applied with a rate
of 0.2 in the fully connected layers to reduce overfitting by randomly
FIGURE 4 deactivating some units during training. The model was compiled using
Distribution of classes before the data augmentation process. the Adam optimiser, which is efficient and adaptive, and the categorical
cross-entropy loss function, suitable for multi-class classification tasks.

FIGURE 5
Distribution of classes after the data augmentation process.

Frontiers in Education 09 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

FIGURE 6
Model architecture.

The accuracy metric was used to evaluate the model’s performance and F1-score, which help assess the performance of the models in
during training and validation (Figure 7). predicting the target variable, STGPA. The algorithms used in the
comparison include CatBoost, XGBoost, HistGradientBoosting, and
LightGBM (Figure 8).
For each algorithm, the following metrics were calculated based
4 Results and discussion on the values of True Positives (TP), False Positives (FP), True
Negatives (TN), and False Negatives (FN)
In this section, we will describe the results obtained from comparing
the performance of various machine learning algorithms. The evaluation 1 Accuracy: This metric measures the proportion of correct
was based on several key metrics, including accuracy, precision, recall, predictions made by the model relative to the total number of

Frontiers in Education 10 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

FIGURE 7
Training and validation accuracy over epochs.

FIGURE 8
Training and validation loss over epochs.

predictions (see Equation 3). Higher accuracy indicates better (see Equation 4). This is particularly crucial when incorrect
overall performance. positive predictions have significant negative consequences.

TP + TN TP
Accuracy = (3) Precision = (4)
TP + TN + FP + FN TP + FP

3 Recall: This metric indicates how well the model identifies all
2 Precision: This metric measures the proportion of true positive relevant instances of the positive class (see Equation 5). It is
predictions among all positive predictions made by the model critical when false negatives are costly.

Frontiers in Education 11 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

TP 4.2 Model evaluation

Recall = (5)
TP + FN
The research developed a deep learning approach with Bi-LSTM
for predicting academic performance of students by using GPA as the
4 F1-score: This is the harmonic mean of precision and recall, main metric. The proposed model underwent an evaluation test
offering a balance between the two (see Equation 6). A high F1 where it competed with CatBoost, XGBoost, HistGradientBoosting,
score indicates that the model performs well in precision and recall. and LightGBM through a classification methodology based on
accuracy, precision, recall, and F1-score metrics. The Bi-LSTM model
receives SHAP (SHapley Additive Explanations) interpretation to
Recall ∗ precision achieve transparency and trustworthiness for its global and
F1Score= 2 ∗ (6)
Recall + precision local output.

4.2.1 Comparative model performance

Table 2 paired with Figure 9 showed that the Bi-LSTM model
4.1 Model performance outperformed other models in all performance metrics. Among
the ML models, XGBoost achieves the highest accuracy (87.14%),
The training process consisted of 200 epochs, and throughout this precision (86.94%), recall (87.18%), and F1-score (86.98%),
period, the model exhibited a significant increase in performance. demonstrating its superior performance. The deep learning
Starting with a training accuracy of 63%, it gradually increases to 91%, model, Bi-LSTM, outperforms all others with the highest accuracy
while validation accuracy stabilises above 87%, demonstrating strong (88%) and significantly higher precision (92.02%), recall (92.11%),
generalisation. The cross-entropy loss decreases consistently for training and F1-score (91.98%), indicating its effectiveness in capturing
and validation sets, indicating effective optimisation. Early stopping is complex patterns and achieving better overall results. The
applied to monitor the validation loss and prevent overfitting; the performance metrics from Figure 9 demonstrated Bi-LSTM
training halts when the validation loss ceases to improve for a predefined exceeds all other models particularly with respect to precision and
number of epochs. This approach ensures the model achieves optimal recall levels where it displays substantial superiority over baseline
performance without overfitting the training data. evaluations. For instance, the higher precision means Bi-LSTM is

TABLE 2 Models’ performance comparison.

Model Accuracy (%) Precision (%) Recall (%) F1-Score (%)

CatBoost 86.61 86.38 86.65 86.42

XGBoost 87.14 86.94 87.18 86.98

HistGradientBoosting 86.63 86.42 86.68 86.43

LightGBM 86.03 85.79 86.07 85.82

Deep Learning (Bi-LSTM) 88.23 92.02 92.11 91.98

Bold values indicate that the method has high accuracy.

FIGURE 9
Performance models by metrics.

Frontiers in Education 12 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

more accurate in identifying students who are truly at risk of numerical values to each attribute to identify how much they impact
underperforming, reducing false interventions. The recall prediction results. The obtained insights from SHAP evaluations can
mechanism protected the identification of most students who be seen in Figure 10 of the SHAP Violin Summary Plot and Figure 11
need attention. The F1-score demonstrated that Bi-LSTM achieves of the SHAP Heatmap Plot.
better overall performance through its single balanced metric
reflecting both precision gains and recall enhancement. The 4.2.2.1 SHAP violin plot
information presented here becomes vital for educators who need Figure 9 revealed that: Among all predictive factors, FTGPA
systems that perform detection and intervention activities without (First Term GPA) showed the greatest impact because its data
making errors. The significant margin demonstrated an important distribution extends the furthest toward zero from the x-axis.
increase in the trustworthiness of models particularly when Student performance in first term and high school together with
applied to real-world academic tasks. ACT scores demonstrated similar importance levels which capture
their academic development and standardised testing abilities. Data
4.2.2 Feature importance via SHAP values from the model indicated that AGE and SEX variables had only
The opacity of DL models required the use of SHAP to explain small predictive power due to their negligible impact. The graphical
Bi-LSTM output and validate its predictions. SHAP attributes representation proved academic historical data supersedes

FIGURE 10
Violin summary plot based on SHAP values.

FIGURE 11
Heatmap plot based on SHAP values.

Frontiers in Education 13 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

TABLE 3 Results of the descriptive statics test.

Metric Accuracy (%) Precision (%) Recall (%) F1-Score (%)

Count 5 5 5 5

Mean 86.93 87.51 87.74 87.53

Std 0.83 2.55 2.48 2.52

Min 86.03 85.79 86.07 85.82

25% 86.61 86.38 86.65 86.42

50% 86.63 86.42 86.68 86.43

75% 87.14 86.94 87.18 86.98

Max 88.23 92.02 92.11 91.98

demographic characteristics in predicting GPA which strengthens • Percentiles (25, 50, 75%): provide insights into the distribution of
the model’s application relevance for educational purposes. performance scores.

4.2.2.2 SHAP Heatmap These descriptive statistics reveal that Bi-LSTM consistently
Figure 11 demonstrated local explanation through visual presentation outperforms other models in accuracy and F1-score, with a notable
of how individual student predictions relate to each feature. A positive difference in precision and recall.
SHAP contribution appears as red while negative SHAP influence shows
up as blue. For instance: Predicted GPA values are consistently higher 4.3.2 Friedman test
when the students demonstrate high FTGPA and HSGPA levels which The Friedman test for repeated measures is applied to compare the
appear as red in color. The model uses blue color to identify instances models and identify any significant differences in their performance
when variables have lower values which results in predicted outcomes that across the four metrics. The results are:
decrease. The model’s predictions received confidence through this
approach allowing advisors to identify explanation reasons for each • Chi-squared: 11.1600
prediction so they can deliver tailored guidance. • p-value: 0.0109
The stakeholders can identify the students at risk early and deliver
appropriate exhortation in an auspicious manner. This can help Thus, a p-value of 0.0109 gives a sign of difference between the
prevent students from dropping out of the institution and improve the models, meaning that Bi-LSTM is statistically different from the
institution’s overall performance. others when comparing the mean value for the complete combination
of all aspects.

4.3 Statistical analysis 4.3.3 Bootstrap confidence intervals

To evaluate the uncertainty of the differences between the models,
In this section, the performance of selected ML and DL models the bootstrap 95% confidence intervals are calculated for the
are compared using the mean, median, standard deviation, t-test–test, comparisons of each model against Bi-LSTM. The intervals for the
bootstrap confidence levels, Friedman test, Effect Sizes (Cohen’s d) difference in performance metrics (e.g., accuracy, precision, recall, and
and Tukey’s HSD Test (Penick and Brewer, 1972; Liu and Xu, 2022; F1-score) are as follows:
Carpenter and Bithell, 2000). These statistical measures provide
crucial information for assessing both the operational efficiency and • CatBoost vs. Bi-LSTM: (−5.6125, −2.6625)
the overall credibility of the models. This section provides further • XGBoost vs. Bi-LSTM: (−5.0625, −2.1175)
insight into evaluating the performance metrics of various models • HistGradientBoosting vs. Bi-LSTM: (−5.5900, −2.6400)
using statistical measures such as accuracy, precision, recall, F1-score • LightGBM vs. Bi-LSTM: (−6.2075, −3.2475)
and statistical tests at varying thresholds.
The negative values in these intervals support the authors’
4.3.1 Descriptive statistics conclusion that Bi-LSTM performs better than these models in
Descriptive statistics provide an overview of the performance of accuracy and other measures. The confidence intervals show
each model in terms of the key metrics: accuracy, Precision, Recall, statistically significant differences where Bi-LSTM performs higher
and F1-score. The following table summarises the descriptive statistics than other models.
for each metric across all models (Table 3):
4.3.4 Effect sizes (Cohen’s d)
• Mean: the average performance across all models. Cohen’s d is used to measure the difference in performance
• Standard Deviation: indicates the variability of each between the models and Bi-LSTM. The following results are obtained:
model’s performance.
• Min/Max: represents the lowest and highest performance, • CatBoost vs. Bi-LSTM: −3.9095
respectively. • XGBoost vs. Bi-LSTM: −3.4453

Frontiers in Education 14 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

TABLE 4 Tukey’s HSD test results.

Model 1 Model 2 Mean difference p-value Reject null hypothesis

Bi-LSTM CatBoost −4.57 0.0000 Yes

Bi-LSTM HistGradientBoosting −4.545 0.0000 Yes

Bi-LSTM LightGBM −5.16 0.0000 Yes

Bi-LSTM XGBoost −4.03 0.0001 Yes

• HistGradientBoosting vs. Bi-LSTM: −3.8882 4.4.2 Interpretation of evaluation metrics in

• LightGBM vs. Bi-LSTM: −4.4107 context
Statistical metrics generate quantitative comparisons yet educators
Cohen’s d values indicate large effect sizes and show that the need to interpret these metrics in actual educational scenarios.
Bi-LSTM model was significantly better than the other models. A
negative Cohen’s d value suggests that Bi-LSTM is more accurate than • High accuracy ensures the model is generally correct in
the others. its predictions.
• High precision reduces false alarms, meaning fewer students are
4.3.5 Tukey’s HSD test incorrectly flagged as at-risk.
Finally, Tukey’s Honestly Significant Difference (HSD) test is used • High recall ensures that truly at-risk students are not overlooked,
to compare all the models. The proposed model achieves the best which is critical for timely interventions.
performance of all the models and is statistically significantly different • High F1-score indicates a balanced and reliable predictive system
from CatBoost, XGBoost, HistGradientBoosting, and LightGBM with that can be trusted in operational settings.
a p-value <0.05. From Table 4, it is observed that Bi-LSTM is statistically
superior in predicting student performance than traditional models. Academic institutions deploying the Bi-LSTM model can predict
underperforming students in advance and deploy focused resources
to prevent failure through early intervention.
4.4 Discussion
4.4.3 Model interpretability and feature relevance
The research objective focused on creating a DL model (using Black-box deep learning methods historically lacked transparency
Bi-LSTM) to predict academic performance of students based on which makes explainability vital when student decisions are at stake in
GPA through an interpretable approach. The model is tested against any educational context. This research uses SHAP as an interpretation
four competitive ML algorithms namely CatBoost, XGBoost, tool for model predictive output. Through SHAP explanation
HistGradientBoosting, and LightGBM. This research utilised stakeholders can learn about standard influences from features while
comprehensive evaluation standards together with extensive simultaneously gaining case-by-case interpretation ability for specialised
statistical examinations that ensured strong model interventions. The most influential features, as shown in the SHAP
performance assessment. summary and heatmap plots, include:

4.4.1 Comparative performance of models • First-Term GPA (FTGPA): Reflects initial academic performance
All performance evaluation metrics from Table 2 demonstrate a and is a strong early indicator.
clear superiority of Bi-LSTM compared to ML approaches for all • High School GPA (HSGPA): Captures foundational
precision, recall, accuracy and F1-score measures. In particular: academic preparedness.
• Standardised Test Scores (ACT): Signifies cognitive aptitude and
• Bi-LSTM achieved 88.23% accuracy, outperforming the next-best readiness for college-level curriculum.
model, XGBoost, which reached 87.14%.
• Precision and Recall, both critical for identifying at-risk students, Research in educational data mining supports a clear connection
reached 92.02 and 92.11%, respectively, for Bi-LSTM. These between previous academic performance and future student
values are significantly higher than those of all ML counterparts achievement levels. Understanding the connection between data
(which ranged from 85.79 to 87.18%). points and student outcomes through SHAP analysis makes model
• The F1-score of Bi-LSTM (91.98%) reflects an excellent balance transparency possible which leads to better adoption by HEIs top
between precision and recall, signifying that the model effectively management and administrators of non-technical backgrounds.
minimises both false positives and false negatives.
4.4.4 Statistical validation of performance
The research demonstrated that deep learning algorithms such as superiority
Bi-LSTM exceed traditional ML models when processing educational The following statistical techniques were used to validate the
data through sequential and contextual dependency modelling. The findings along with their generalizability and credibility levels:
model employed bidirectional memory to access past and future
temporal data which proved crucial for understanding • The Friedman test, a non-parametric test for comparing multiple
academic trajectories. models over multiple datasets or metrics, revealed a statistically

Frontiers in Education 15 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

significant difference (χ2 = 11.16, p = 0.0109) among the models. • Temporal Dynamics: Real-time updates and time-series
This confirms that the observed performance differences are not changes have not been included into the present model
due to random variation. framework. The predictive capabilities and applicability of
• Bootstrap confidence intervals were calculated to assess the the model will improve by implementing longitudinal
uncertainty around the performance gaps. All intervals tracking systems.
comparing Bi-LSTM with other models (e.g., CatBoost, • Holistic Feature Space: Additional metadata about mental
LightGBM) had negative lower and upper bounds, indicating health and financial stress as well as engagement levels is
Bi-LSTM consistently outperformed its counterparts with missing from the current model assessment. Future versions
95% confidence. of the model must incorporate socio-emotional and
• Cohen’s d effect size provided further confirmation. The behavioural information to build a predictive instrument
magnitude of the effect sizes ranged from −3.4 to −4.4, with a broader scope.
representing very large effects. This statistically supports the
assertion that Bi-LSTM is meaningfully better, not
just marginally. 5 Conclusion
• Tukey’s HSD (Honestly Significant Difference) test
confirmed pairwise statistical superiority of Bi-LSTM over In this work, we proposed a deep learning-based model,
each individual model (p < 0.0001 in all cases), providing specifically a Bi-LSTM (Bidirectional Long Short-Term Memory)
robust post-hoc evidence to the Friedman results. network, to predict the second-term GPA. Our model was
evaluated against several other algorithms, including CatBoost,
Our analysis utilised multiple approaches for validation to XGBoost, HistGradientBoosting, and LightGBM, using key
enhance the credibility of our study’s findings. These evaluations performance metrics such as accuracy, precision, recall, and
create confidence in decision-makers who typically need F1-score. The results demonstrated that our proposed Bi-LSTM
empirical validation to feel comfortable adopting model outperforms the traditional machine learning algorithms
AI-based systems. in terms of predictive accuracy, highlighting the potential of deep
learning techniques for academic performance prediction. This
4.4.5 Relevance for non-technical stakeholders type of model can be utilised to mitigate student dropout and
The technical aspects of this study produce significant enhance the performance of the students. One of the limitations
practical benefits for educational institutions. This statistical of the study is the size of the dataset. In future, we shall try to
model generated results which serve practical strategic purposes: collect more data to boost the performance of the deep learning
model. The integration of deep learning strategies and SHAP
• HEIs top management and academic advisors can use the values in a single framework could overcome the challenges of
predictive results, along with SHAP explanations, to engage the trade-off between the student academic performance model’s
students in informed discussions and recommend tailored explainability and intricacy and augment model accuracy and
support plans. transparency. The performance of selected ML and DL models
• Administrators can incorporate the model into early alert are also compared using the mean, median, standard deviation,
systems to drive data-informed policies aimed at reducing t-test–test, bootstrap confidence levels, Friedman test, Effect
dropout rates and improving overall Sizes (Cohen’s d) and Tukey’s HSD Test. The results demonstrate
institutional performance. that BI-LSTM performance is significantly different from other
• Policymakers can explore this model as a blueprint for models. This study could open horizons for other researchers to
scalable national or state-level educational interventions, conduct analogous studies in the domain.
especially in systems that are resource-constrained but rich
in historical academic data.
Data availability statement
The Bi-LSTM model provided a unique combination between
outstanding predictive capabilities and easy interpretability The data analyzed in this study is subject to the following
which makes it essential for education domains requiring both licenses/restrictions: data will be provided on a request. Requests
technical brilliance and ethical clarity. to access these datasets should be directed to [email protected].

4.4.6 Limitations and future directions

Despite the encouraging results, this study also has several Author contributions
limitations such as:
EK: Writing – original draft, Writing – review & editing. AA:
• Dataset Size: The current dataset worked well for analysis yet Writing – original draft, Writing – review & editing. HE: Writing –
it may fail to show differences between student populations original draft, Writing – review & editing. AK: Writing – original draft,
across various institutions and geographic areas. Future Writing – review & editing. SH: Writing – original draft, Writing – review
research should develop a larger research dataset which & editing. TA: Writing – original draft, Writing – review & editing. SG:
encompasses multiple institutions. Writing – original draft, Writing – review & editing.

Frontiers in Education 16 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

Funding Generative AI statement

The author(s) declare that financial support was received for the The authors declare that no Gen AI was used in the creation of
research and/or publication of this article. This paper is financed by this manuscript.
the European Union-NextGenerationEU, through the National
Recovery and Resilience Plan of the Republic of Bulgaria, project No
BG-RRP-2.004-0001-C01. Publisher’s note
All claims expressed in this article are solely those of the authors
Conflict of interest and do not necessarily represent those of their affiliated
organizations, or those of the publisher, the editors and the
The authors declare that the research was conducted in the reviewers. Any product that may be evaluated in this article, or claim
absence of any commercial or financial relationships that could that may be made by its manufacturer, is not guaranteed or endorsed
be construed as a potential conflict of interest. by the publisher.

References
Ajibade, S. S. M., Dayupay, J., Ngo-Hoang, D. L., Oyebode, O. J., and Sasan, J. M. Kaunang, F. J., and Rotikan, R. (2018). Students' academic performance prediction
(2022). Utilization of ensemble techniques for prediction of the academic performance using data mining. In 2018 third international conference on informatics and computing
of students. J. Optoelectron. Laser 41, 48–54. (ICIC) (1–5).
Alam, A., and Mohanty, A. (2022). Predicting students’ performance employing Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., et al. (2017). LightGBM: a
educational data mining techniques, machine learning, and learning analytics. In highly efficient gradient boosting decision tree. Adv. Neural Inf. Proces. Syst. 30,
International conference on communication, networks and computing (166–177). 3149–3157.
Cham: Springer Nature Switzerland.
Kukkar, A., Mohana, R., Sharma, A., and Nayyar, A. (2023). Prediction of student
Alamri, R., and Alharbi, B. (2021). Explainable student performance prediction academic performance based on their emotional wellbeing and interaction on various
models: a systematic review. IEEE Access 9, 33132–33143. doi: e-learning platforms. Educ. Inf. Technol. 28, 9655–9684. doi: 10.1007/s10639-022-11573-9
10.1109/ACCESS.2021.3061368
Kukkar, A., Mohana, R., Sharma, A., and Nayyar, A. (2024). A novel methodology
Al-Azazi, F. A., and Ghurab, M. (2023). ANN-LSTM: a deep learning model for early using RNN+ LSTM+ ML for predicting student’s academic performance. Educ. Inf.
student performance prediction in MOOC. Heliyon 9:e15382. doi: Technol. 29, 14365–14401. doi: 10.1007/s10639-023-12394-0
10.1016/j.heliyon.2023.e15382
Lee, C. A., Tzeng, J. W., Huang, N. F., and Su, Y. S. (2021). Prediction of student
Albreiki, B., Zaki, N., and Alashwal, H. (2021). A systematic literature review of performance in massive open online courses using deep learning system based on
student performance prediction using machine learning techniques. Educ. Sci. 11:552. learning behaviors. Educ. Technol. Soc. 24, 130–146.
doi: 10.3390/educsci11090552
Liang, G., Jiang, C., Ping, Q., and Jiang, X. (2024). Academic performance prediction
Bravo-Agapito, J., Romero, S. J., and Pamplona, S. (2021). Early prediction of associated with synchronous online interactive learning behaviours based on the
undergraduate student's academic performance in completely online learning: a machine learning approach. Interact. Learn. Environ. 32, 3092–3107. doi:
five-year study. Comput. Human Behav. 115:106595. doi: 10.1080/10494820.2023.2167836
10.1016/j.chb.2020.106595
Liu, J., and Xu, Y. (2022). T-friedman test: a new statistical test for multiple comparison
Carpenter, J., and Bithell, J. (2000). Bootstrap confidence intervals: when, which, with an adjustable conservativeness measure. Int. J. Comput. Intell. Syst. 15:29. doi:
what? A practical guide for medical statisticians. Stat. Med. 19, 1141–1164. doi: 10.1007/s44196-022-00083-8
10.1002/(SICI)1097-0258(20000515)19:9<1141::AID-SIM479>3.0.CO;2-F
Lundberg, S. M., and Lee, S.-I. (2017). A unified approach to interpreting model
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: predictions. Adv. Neural Inform. Proces. Syst. 30, 4765–4774.
synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357. doi:
Mahawar, K., and Rattan, P. (2025). Empowering education: harnessing ensemble
10.1613/jair.953
machine learning approach and ACO-DT classifier for early student academic
Chen, T., and Guestrin, C. (2016). XGBoost: a scalable tree boosting system. performance prediction. Educ. Inf. Technol. 30, 4639–4667. doi:
Proceedings of the 22nd ACM SIGKDD international conference on knowledge 10.1007/s10639-024-12976-6
discovery and data mining, 785–794
Manigandan, E., Anispremkoilraj, P., Kumar, B. S., Satre, S. M., Chauhan, A., and
Dabhade, P., Agarwal, R., Alameen, K. P., Fathima, A. T., Sridharan, R., and Jeyaganthan, C. (2024). An effective BiLSTM-CRF based approach to predict student
Gopakumar, G. (2021). Educational data mining for predicting students’ academic achievement: an experimental evaluation. In 2024 2nd international conference on
performance using machine learning algorithms. Mater. Today Proc. 47, 5260–5267. doi: intelligent data communication technologies and internet of things (IDCIoT) (pp.
10.1016/j.matpr.2021.05.646 779–784). IEEE.
Graves, A., and Schmidhuber, J. (2005). Framewise phoneme classification with Nabil, A., Seyam, M., and Abou-Elfetouh, A. (2021). Prediction of students’ academic
bidirectional LSTM and other neural network architectures. Neural Netw. 18, 602–610. performance based on courses’ grades using deep neural networks. IEEE Access 9,
doi: 10.1016/j.neunet.2005.06.042 140731–140746. doi: 10.1109/ACCESS.2021.3119596
Hamsa, H., Indiradevi, S., and Kizhakkethottam, J. J. (2016). Student academic Nurudeen, A. H., Fakhrou, A., Lawal, N., and Ghareeb, S. (2024). Academic
performance prediction model using decision tree and fuzzy genetic algorithm. Procedia performance of engineering students: a predictive validity study of first-year GPA and
Technol. 25, 326–332. doi: 10.1016/j.protcy.2016.08.114 final-year CGPA. Eng. Rep. 6:e12766. doi: 10.1002/eng2.12766
Huang, Q., and Zeng, Y. (2024). Improving academic performance predictions with Penick, J. E., and Brewer, J. K. (1972). The power of statistical tests in science teachnig
dual graph neural networks. Complex Intell. Syst. 10, 3557–3575. doi: research. J. Res. Sci. Teach. 9, 377–381. doi: 10.1002/tea.3660090410
10.1007/s40747-024-01344-z
Prokhorenkova, L., Gusev, G., and Vorobev, A. (2018). CatBoost: gradient
Hussain, M. M., Akbar, S., Hassan, S. A., Aziz, M. W., and Urooj, F. (2024). Prediction boosting on decision trees with categorical features support. Proceedings of the 2nd
of student’s academic performance through data mining approach. J. Inform. Web Eng. ACM SIGKDD international conference on knowledge discovery and data mining,
3, 241–251. doi: 10.33093/jiwe.2024.3.1.16 1125–1134.
Hussain, S., and Khan, M. Q. (2023). Student-performulator: predicting students’ Raji, N. R., Kumar, R. M. S., and Biji, C. L. (2024). Explainable machine learning
academic performance at secondary and intermediate level using machine learning. prediction for the academic performance of deaf scholars. IEEE Access 12, 23595–23612.
Ann. Data Sci. 10, 637–655. doi: 10.1007/s40745-021-00341-0
Rodríguez-Hernández, C. F., Musso, M., Kyndt, E., and Cascallar, E. (2021). Artificial
Kapucu, M. S., Özcan, H., and Aypay, A. (2024). Predicting secondary school students' neural networks in academic performance prediction: systematic implementation and
academic performance in science course by machine learning. Int. J. Technol. Educ. Sci. predictor evaluation. Comput. Educ. Artif. Intell. 2:100018. doi:
8, 41–62. doi: 10.46328/ijtes.518 10.1016/j.caeai.2021.100018

Frontiers in Education 17 frontiersin.org

Kalita et al. 10.3389/feduc.2025.1581247

Sarker, S., Paul, M. K., Thasin, S. T. H., and Hasan, M. A. M. (2024). Analyzing Si, S., Zhang, S., and Keerthi, S. S. (2017). Histogram-based gradient boosting for
students' academic performance using educational data mining. Comput. Educ. Artif. categorical and numerical features. Proceedings of the 23rd ACM SIGKDD international
Intell. 7:100263. doi: 10.1016/j.caeai.2024.100263 conference on knowledge discovery and data mining, 765–774
Sateesh, N., Rao, P. S., and Lakshmi, D. R. (2023). Deep belief bi-directional LSTM Waheed, H., Hassan, S. U., Aljohani, N. R., Hardman, J., Alelyani, S., and
network-based intelligent student's performance prediction model with entropy Nawaz, R. (2020). Predicting academic performance of students from VLE big data
weighted fuzzy rough set mining. Int. J. Intell. Inf. Database Syst. 16, 107–142. doi: using deep learning models. Comput. Human Behav. 104:106189. doi:
10.1504/IJIIDS.2023.131411 10.1016/j.chb.2019.106189
Shaninah, F. S. E., and Mohd Noor, M. H. (2024). The impact of big five personality Wang, X., Zhao, Y., Li, C., and Ren, P. (2023). ProbSAP: a comprehensive and high-
trait in predicting student academic performance. J. Appl. Res. High. Educ. 16, 523–539. performance system for student academic performance prediction. Pattern Recogn.
doi: 10.1108/JARHE-08-2022-0274 137:109309. doi: 10.1016/j.patcog.2023.109309
Shen, Y. (2024). Using long short-term memory networks (LSTM) to predict student Yağcı, M. (2022). Educational data mining: prediction of students' academic
academic achievement: dynamic learning path adjustment. In Proceedings of the 2024 performance using machine learning algorithms. Smart Learn. Environ. 9:11. doi:
international conference on machine intelligence and digital applications (627–634). 10.1186/s40561-022-00192-z

Frontiers in Education 18 frontiersin.org

Early Prediction of Student Performance in Face-To-Face Education Environments A Hybrid Deep Learning
No ratings yet
Early Prediction of Student Performance in Face-To-Face Education Environments A Hybrid Deep Learning
15 pages
1 s2.0 S2772503025000180 Main
No ratings yet
1 s2.0 S2772503025000180 Main
16 pages
Prediction of Student Academic Performance Based On Their Emotional Wellbeing and Interaction On Various e Learning Platforms
No ratings yet
Prediction of Student Academic Performance Based On Their Emotional Wellbeing and Interaction On Various e Learning Platforms
30 pages
Students Performance Prediction
No ratings yet
Students Performance Prediction
5 pages
Article 4
No ratings yet
Article 4
9 pages
Revue D'intelligence Artificielle: Received: 1 October 2021 Accepted: 21 October 2021
No ratings yet
Revue D'intelligence Artificielle: Received: 1 October 2021 Accepted: 21 October 2021
7 pages
Predicting Student Success
No ratings yet
Predicting Student Success
3 pages
Computational Intelligence and Neuroscience - 2022 - Alsariera - Assessment and Evaluation of Different Machine Learning
No ratings yet
Computational Intelligence and Neuroscience - 2022 - Alsariera - Assessment and Evaluation of Different Machine Learning
11 pages
AI Models for Online Student Success
No ratings yet
AI Models for Online Student Success
25 pages
Competency Learning and Student Centric
No ratings yet
Competency Learning and Student Centric
14 pages
Predicting Student Performance From Online Engagement Activities Using Novel Statistical Features
No ratings yet
Predicting Student Performance From Online Engagement Activities Using Novel Statistical Features
19 pages
Machine Learning for Student Performance Prediction
No ratings yet
Machine Learning for Student Performance Prediction
17 pages
14 Predicting Students Performance in Educational Data Mining
No ratings yet
14 Predicting Students Performance in Educational Data Mining
4 pages
Applsci 11 00237 v3
No ratings yet
Applsci 11 00237 v3
28 pages
Predicting and Interpreting Student Performance Using Ensemble Models and Shapley Additive Explanations
No ratings yet
Predicting and Interpreting Student Performance Using Ensemble Models and Shapley Additive Explanations
16 pages
A Deep Learning Approach Towards Student Performance Prediction in Online Courses Challenges Based On A Global Perspective
No ratings yet
A Deep Learning Approach Towards Student Performance Prediction in Online Courses Challenges Based On A Global Perspective
6 pages
Introduce and Related Work
No ratings yet
Introduce and Related Work
3 pages
Data Mining Approach To Predict Academic Performance of Students
No ratings yet
Data Mining Approach To Predict Academic Performance of Students
11 pages
Ai-Based Early Prediction and Intervention For Student Academic Performance in Higher Education
No ratings yet
Ai-Based Early Prediction and Intervention For Student Academic Performance in Higher Education
19 pages
Career Predction
No ratings yet
Career Predction
10 pages
An E Cient Deep Learning Approach For Prediction of Student Performance Using Neural Network
No ratings yet
An E Cient Deep Learning Approach For Prediction of Student Performance Using Neural Network
13 pages
Huang2021 Article AFeatureWeightedSupportVectorM
No ratings yet
Huang2021 Article AFeatureWeightedSupportVectorM
13 pages
(IJCST-V11I4P11) :vaibhav Sharma, Manoj Patil
No ratings yet
(IJCST-V11I4P11) :vaibhav Sharma, Manoj Patil
3 pages
Ncisem-2022 Paper 24
No ratings yet
Ncisem-2022 Paper 24
13 pages
A Hybrid Model Integrating Recurrent Neural Networks and The Semi-Supervised Support Vector Machine For Identification of Early Student Dropout Risk
No ratings yet
A Hybrid Model Integrating Recurrent Neural Networks and The Semi-Supervised Support Vector Machine For Identification of Early Student Dropout Risk
31 pages
Second
No ratings yet
Second
27 pages
2023-Contextualizing The Current State of Research On The Use Ofmachine Learning For Student Performance Prediction Asystematic Literature Review
No ratings yet
2023-Contextualizing The Current State of Research On The Use Ofmachine Learning For Student Performance Prediction Asystematic Literature Review
25 pages
Academic Prediction
No ratings yet
Academic Prediction
5 pages
PredictingStudentSuccess-AutoML PrePrint
No ratings yet
PredictingStudentSuccess-AutoML PrePrint
23 pages
Computer Science Students Academic Performance Prediction Using Ai
No ratings yet
Computer Science Students Academic Performance Prediction Using Ai
68 pages
Rofiat Project
No ratings yet
Rofiat Project
5 pages
When Probabilities Are Not Enough - A Frameworkfor Causal Explanations of Student Success Models
No ratings yet
When Probabilities Are Not Enough - A Frameworkfor Causal Explanations of Student Success Models
24 pages
Leveraging Machine Learning Approaches For Predicting Students' Academic Success An Analytical Perspective
No ratings yet
Leveraging Machine Learning Approaches For Predicting Students' Academic Success An Analytical Perspective
16 pages
Explainable AI Methods For Predicting Student Grades and Improving Academic Success
No ratings yet
Explainable AI Methods For Predicting Student Grades and Improving Academic Success
10 pages
1st Review.1
No ratings yet
1st Review.1
10 pages
A Systematic Literature Review
No ratings yet
A Systematic Literature Review
28 pages
Jeml 0102005
No ratings yet
Jeml 0102005
7 pages
An Integrated System Framework For Predicting Students' Academic Performance in Higher Educational Institutions
No ratings yet
An Integrated System Framework For Predicting Students' Academic Performance in Higher Educational Institutions
9 pages
Using Educational Data Mining To Predict Students
No ratings yet
Using Educational Data Mining To Predict Students
17 pages
Early Student Performance Prediction
No ratings yet
Early Student Performance Prediction
12 pages
Educational Data Mining For Predicting Studentsâ ™ Academic Performance Using Machine Learning Algorithms
No ratings yet
Educational Data Mining For Predicting Studentsâ ™ Academic Performance Using Machine Learning Algorithms
8 pages
Predicting Student Performance with DNN
No ratings yet
Predicting Student Performance with DNN
9 pages
Academic Performance Prediction Using Machine Learning Approaches A Survey
No ratings yet
Academic Performance Prediction Using Machine Learning Approaches A Survey
18 pages
Final PPT Gruop 143k
No ratings yet
Final PPT Gruop 143k
26 pages
Multi-Label Feature Aware XGBoost Model For Student Performance Assessment Using Behavior Data in Online Learning Environment
No ratings yet
Multi-Label Feature Aware XGBoost Model For Student Performance Assessment Using Behavior Data in Online Learning Environment
7 pages
The Power of Deep Learning Techniques For Predicting Student Performance in Virtual Learning Environments A Systematic Literature Review
No ratings yet
The Power of Deep Learning Techniques For Predicting Student Performance in Virtual Learning Environments A Systematic Literature Review
29 pages
12 IV April 2024
No ratings yet
12 IV April 2024
8 pages
Journal Publications
No ratings yet
Journal Publications
13 pages
ML in Student Performance Analysis
No ratings yet
ML in Student Performance Analysis
15 pages
Can We Predict Student Performance Based On Tabular and Textual Data
No ratings yet
Can We Predict Student Performance Based On Tabular and Textual Data
12 pages
SFA Paper 10
No ratings yet
SFA Paper 10
2 pages
Shsconf Glob2021 09001
No ratings yet
Shsconf Glob2021 09001
10 pages
Machine Learning for Student Dropout Prediction
No ratings yet
Machine Learning for Student Dropout Prediction
4 pages
Prediction of Students Performance With Learning Coefficients Using Regression Based Machine Learning Models
No ratings yet
Prediction of Students Performance With Learning Coefficients Using Regression Based Machine Learning Models
11 pages
28596-Texto Del Manuscrito-113757-1-10-20240415
No ratings yet
28596-Texto Del Manuscrito-113757-1-10-20240415
31 pages
SFA Paper 4
No ratings yet
SFA Paper 4
2 pages
10 1016@j Caeai 2021 100018
No ratings yet
10 1016@j Caeai 2021 100018
15 pages
English Module Q3 Week1
No ratings yet
English Module Q3 Week1
9 pages
LSTM Predicts Campus Energy Use
No ratings yet
LSTM Predicts Campus Energy Use
12 pages
Stata Regression Techniques Overview
No ratings yet
Stata Regression Techniques Overview
134 pages
Excel Data Mining Guide
100% (1)
Excel Data Mining Guide
178 pages
Coca-Cola Stock Price LSTM Prediction
No ratings yet
Coca-Cola Stock Price LSTM Prediction
13 pages
HR Analytics
No ratings yet
HR Analytics
22 pages
Johnsson 1987
No ratings yet
Johnsson 1987
9 pages
2
No ratings yet
2
2 pages
An Introduction To Probabilistic Seismic Hazard Analysis (PSHA)
No ratings yet
An Introduction To Probabilistic Seismic Hazard Analysis (PSHA)
72 pages
Data Analytics Approaches Explained
No ratings yet
Data Analytics Approaches Explained
50 pages
Journal of Dental Research: Meta-Analytical Review of Parameters Involved in Dentin Bonding
No ratings yet
Journal of Dental Research: Meta-Analytical Review of Parameters Involved in Dentin Bonding
8 pages
Data Science
No ratings yet
Data Science
29 pages
F
No ratings yet
F
21 pages
ICCET 2022 - Proceedings Release v1
No ratings yet
ICCET 2022 - Proceedings Release v1
357 pages
Senior Prac Research 2 Q1 - M2
No ratings yet
Senior Prac Research 2 Q1 - M2
20 pages
One Column IEEE Journal Article
No ratings yet
One Column IEEE Journal Article
10 pages
Weekly Gas Purchase Decision Model
No ratings yet
Weekly Gas Purchase Decision Model
37 pages
Mathematics 08 01245 v2
No ratings yet
Mathematics 08 01245 v2
29 pages
Disease Prediction Using Machine Learning Over Big Data
No ratings yet
Disease Prediction Using Machine Learning Over Big Data
8 pages
The Amazing Egg Drop Project
No ratings yet
The Amazing Egg Drop Project
8 pages
The Decay of Western Civilization: Double Relaxed Darwinian Selection
No ratings yet
The Decay of Western Civilization: Double Relaxed Darwinian Selection
45 pages
MD - Walid - 20103160 - Sec I Complete
No ratings yet
MD - Walid - 20103160 - Sec I Complete
29 pages
Research Aptitude by Talvir PDF
50% (2)
Research Aptitude by Talvir PDF
27 pages
Augmented Data and Neural Networks For Robust Epidemic Forecasting: Application To COVID-19 in Italy
No ratings yet
Augmented Data and Neural Networks For Robust Epidemic Forecasting: Application To COVID-19 in Italy
26 pages
Black Pages of Shustah Cards
No ratings yet
Black Pages of Shustah Cards
3 pages
Thoughts by Rakesh Jhunjhunwala
No ratings yet
Thoughts by Rakesh Jhunjhunwala
6 pages
Bross (1990) Cybernetic Model
No ratings yet
Bross (1990) Cybernetic Model
7 pages
Codsoft Report
No ratings yet
Codsoft Report
26 pages
Wine Prediction
100% (1)
Wine Prediction
13 pages
Data Table: Ecology Lesson 1: Name: - Grade & Section
No ratings yet
Data Table: Ecology Lesson 1: Name: - Grade & Section
5 pages

A Deep Learning Model Using A Bi-LSTM Network

Uploaded by

A Deep Learning Model Using A Bi-LSTM Network

Uploaded by

TYPE Original Research

PUBLISHED 23 June 2025

Predicting student academic

Frontiers in Education 01 frontiersin.org

Frontiers in Education 02 frontiersin.org

Frontiers in Education 03 frontiersin.org

Frontiers in Education 04 frontiersin.org

Researchers Dataset Attributes Methods Best Method Findings Limitations

Found IES, LVR,

Combined local and

Frontiers in Education 05 frontiersin.org

Researchers Dataset Attributes Methods Best Method Findings Limitations

N is the samples of a Training data set and M features, where each

Frontiers in Education 06 frontiersin.org

Frontiers in Education 07 frontiersin.org

Frontiers in Education 08 frontiersin.org

Frontiers in Education 09 frontiersin.org

Frontiers in Education 10 frontiersin.org

Frontiers in Education 11 frontiersin.org

TP 4.2 Model evaluation

4.2.1 Comparative model performance

TABLE 2 Models’ performance comparison.

Model Accuracy (%) Precision (%) Recall (%) F1-Score (%)

XGBoost 87.14 86.94 87.18 86.98

HistGradientBoosting 86.63 86.42 86.68 86.43

LightGBM 86.03 85.79 86.07 85.82

Deep Learning (Bi-LSTM) 88.23 92.02 92.11 91.98

Frontiers in Education 12 frontiersin.org

Frontiers in Education 13 frontiersin.org

TABLE 3 Results of the descriptive statics test.

Metric Accuracy (%) Precision (%) Recall (%) F1-Score (%)

Mean 86.93 87.51 87.74 87.53

Std 0.83 2.55 2.48 2.52

Min 86.03 85.79 86.07 85.82

25% 86.61 86.38 86.65 86.42

50% 86.63 86.42 86.68 86.43

75% 87.14 86.94 87.18 86.98

Max 88.23 92.02 92.11 91.98

4.3 Statistical analysis 4.3.3 Bootstrap confidence intervals

Frontiers in Education 14 frontiersin.org

TABLE 4 Tukey’s HSD test results.

Model 1 Model 2 Mean difference p-value Reject null hypothesis

Bi-LSTM HistGradientBoosting −4.545 0.0000 Yes

Bi-LSTM LightGBM −5.16 0.0000 Yes

Bi-LSTM XGBoost −4.03 0.0001 Yes

• HistGradientBoosting vs. Bi-LSTM: −3.8882 4.4.2 Interpretation of evaluation metrics in

Frontiers in Education 15 frontiersin.org

4.4.6 Limitations and future directions

Frontiers in Education 16 frontiersin.org

Funding Generative AI statement

Frontiers in Education 17 frontiersin.org

Frontiers in Education 18 frontiersin.org

You might also like