FIGURE 1. Bi-GRU with BERT urgent classification model.
FIGURE 2. Document classification model formed by incorporating BERT with one additional output layer. Figure adapted from Devlin et al. [26].
TABLE 1. Examples of MOOC discussion forum posts and their label urgency scores.
TABLE 2. Examples of MOOC discussion forum posts and their label urgency scores.
To prevent overfitting, I used the dropout mechanism during model training [30]. The batch size is set to 128, and the dropout rate is adjusted for each group of datasets.
FIGURE 4. The learning curves and validation curves (loss versus number of epochs) of RNN in three experiments using data sets A, B, and C, where the best values of losses are 0.28, 0.33, and 0.4 at epochs 196, 237, and 222, respectively.
FIGURE 5. The learning curves and validation curves (loss versus number of epochs) of CNN in three experiments using data sets A, B, and C, where the best values of losses are 0.248, 0.34, and 0.33 at epochs 4, 5, and 6, respectively.
FIGURE 6. The learning curves and validation curves (loss versus number of epochs) of FASTTEXT in three experiments using data sets A, B, and C, where the best values of losses are 0.276, 0.33, and 0.41 at epochs 9, 8, and 7, respectively.
FIGURE 7. The learning curves and validation curves (loss versus number of epochs) of LSTM in three experiments using data sets A, B, and C, where the best values of losses are 0.238, 0.29, and 0.30 at epochs 9, 6, and 8, respectively.
FIGURE 8. The learning curves and validation curves (loss versus number of epochs) of BERT in three experiments using data sets A, B, and C, where the best values of losses are 0.20, 0.24, and 0.30 at epochs 4, 4, and 4, respectively.
TABLE 4. Experimental results on group B.
TABLE 5. Experimental results on group C.
FIGURE 9. The PR curves of RNN in three experiments using data sets A, B, and C, where AUC values were equal to 0.741, 0.734, and 0.657, respectively.
FIGURE 10. The PR curves of CNN in three experiments using data sets A, B, and C, where AUC values were equal to 0.754, 0.761, and 0.734, respectively.
FIGURE 11. The PR curves of FASTTEXT in three experiments using data sets A, B, and C, where AUC values were equal to 0.751, 0.728, and 0.683, respectively.
FIGURE 12. The PR curves of LSTM in three experiments using data sets A, B, and C, where AUC values were equal to 0.799, 0.797, and 0.759, respectively.
FIGURE 13. The PR curves of GRU with BERT in three experiments using data sets A, B, and C, where AUC values were equal to 0.822, 0.836, and 0.792, respectively.
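The PR-curve captions report one AUC value per model and data set. The captions do not say how these areas were computed, so the following is only a minimal sketch of one common approach: sweep a decision threshold over the predicted urgency scores, collect (recall, precision) points, and integrate with the trapezoidal rule. The function names `pr_curve` and `pr_auc`, the toy labels, and the choice of trapezoidal integration are all assumptions made for illustration, not the author's procedure.

```python
def pr_curve(y_true, scores):
    """Return (recalls, precisions) obtained by lowering the threshold
    through the scores from highest to lowest.

    y_true: list of 0/1 labels (1 = urgent post, a hypothetical encoding).
    scores: list of predicted scores, higher means more likely urgent.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Indices sorted by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Conventional starting point: recall 0, precision 1.
    recalls, precisions = [0.0], [1.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume all items tied at this score before emitting a point.
        while i < len(order) and scores[order[i]] == threshold:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recalls.append(tp / total_pos)
        precisions.append(tp / (tp + fp))
    return recalls, precisions


def pr_auc(recalls, precisions):
    """Area under the PR curve using the trapezoidal rule (an assumption)."""
    area = 0.0
    for k in range(1, len(recalls)):
        width = recalls[k] - recalls[k - 1]
        height = (precisions[k] + precisions[k - 1]) / 2
        area += width * height
    return area


# Toy data for illustration only; not taken from the paper.
labels = [1, 0, 1, 0]
preds = [0.9, 0.8, 0.7, 0.1]
r, p = pr_curve(labels, preds)
print(round(pr_auc(r, p), 4))
```

Other conventions, such as average precision with step-wise interpolation, give slightly different numbers for the same predictions, so values computed this way are not guaranteed to match those in the captions.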