XLNET SQuAD2.0 Fine-Tuning - What May Have Changed? #2651
❓ Questions & Help
I fine-tuned xlnet-large-cased on SQuAD 2.0 in November 2019 with Transformers v2.1.1, yielding satisfactory results:
xlnet_large_squad2_512_bs48
{
"exact": 82.07698138633876,
"f1": 85.898874470488,
"total": 11873,
"HasAns_exact": 79.60526315789474,
"HasAns_f1": 87.26000954590184,
"HasAns_total": 5928,
"NoAns_exact": 84.54163162321278,
"NoAns_f1": 84.54163162321278,
"NoAns_total": 5945,
"best_exact": 83.22243746315169,
"best_exact_thresh": -11.112004280090332,
"best_f1": 86.88541353813282,
"best_f1_thresh": -11.112004280090332
}
using this script:
#!/bin/bash
export OMP_NUM_THREADS=6
RUN_SQUAD_DIR=/media/dn/dssd/nlp/transformers/examples
SQUAD_DIR=${RUN_SQUAD_DIR}/scripts/squad2.0
MODEL_PATH=${RUN_SQUAD_DIR}/runs/xlnet_large_squad2_512_bs48
python -m torch.distributed.launch --nproc_per_node=2 ${RUN_SQUAD_DIR}/run_squad.py \
--model_type xlnet \
--model_name_or_path xlnet-large-cased \
--do_train \
--train_file ${SQUAD_DIR}/train-v2.0.json \
--predict_file ${SQUAD_DIR}/dev-v2.0.json \
--version_2_with_negative \
--num_train_epochs 3 \
--learning_rate 3e-5 \
--adam_epsilon 1e-6 \
--max_seq_length 512 \
--doc_stride 128 \
--save_steps 2000 \
--per_gpu_train_batch_size 1 \
--gradient_accumulation_steps 24 \
--output_dir ${MODEL_PATH}
CUDA_VISIBLE_DEVICES=0 python ${RUN_SQUAD_DIR}/run_squad.py \
--model_type xlnet \
--model_name_or_path ${MODEL_PATH} \
--do_eval \
--train_file ${SQUAD_DIR}/train-v2.0.json \
--predict_file ${SQUAD_DIR}/dev-v2.0.json \
--version_2_with_negative \
--max_seq_length 512 \
--per_gpu_eval_batch_size 48 \
--output_dir ${MODEL_PATH}
$@
After upgrading Transformers to v2.3.0, I re-ran the same script above to see whether the fine-tuning results would improve. Instead I got the following:
xlnet_large_squad2_512_bs48
Results: {
'exact': 45.32131727448834,
'f1': 45.52929325627209,
'total': 11873,
'HasAns_exact': 0.0,
'HasAns_f1': 0.4165483859174251,
'HasAns_total': 5928,
'NoAns_exact': 90.51303616484441,
'NoAns_f1': 90.51303616484441,
'NoAns_total': 5945,
'best_exact': 50.07159100480081,
'best_exact_thresh': 0.0,
'best_f1': 50.07229287739689,
'best_f1_thresh': 0.0}
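To make the regression easier to see at a glance, the two runs can be diffed programmatically. The dictionaries below just copy the key numbers reported above; note that HasAns_exact collapses to 0.0 while NoAns_exact rises, which is consistent with the v2.3.0 model predicting "no answer" for nearly every question.

```python
# Metrics copied from the two runs reported in this issue:
# "old" = Transformers v2.1.1, "new" = Transformers v2.3.0.
old = {"exact": 82.07698138633876, "f1": 85.898874470488,
       "HasAns_exact": 79.60526315789474, "NoAns_exact": 84.54163162321278}
new = {"exact": 45.32131727448834, "f1": 45.52929325627209,
       "HasAns_exact": 0.0, "NoAns_exact": 90.51303616484441}

# Print each metric side by side with the delta between versions.
for key in old:
    print(f"{key}: {old[key]:.2f} -> {new[key]:.2f} (delta {new[key] - old[key]:+.2f})")
```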
I'm looking for potential explanations for this loss of performance. I have searched the Transformers releases and issues for anything pertaining to XLNet, with no clues. Are there new fine-tuning hyperparameters I've missed that now need to be set, or that didn't exist in earlier Transformers versions? Could a later PyTorch/TensorFlow version be the issue? I may have to recreate the November 2019 environment for a re-run to verify the earlier results, and then incrementally update Transformers, PyTorch, TensorFlow, etc.
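One way to approach that last step: freeze the suspected-good November 2019 versions into a requirements file, rebuild that environment in a fresh virtualenv, confirm the old numbers, and then bump one package at a time. A minimal sketch (the file name is my own choice; the version pins are the ones reported in this issue):

```shell
# Pin the Nov 2019 package versions reported in this issue.
cat > requirements-nov2019.txt <<'EOF'
transformers==2.1.1
torch==1.3.0
tensorflow==2.0.0
EOF
cat requirements-nov2019.txt
# In a fresh virtualenv: pip install -r requirements-nov2019.txt
# Then re-run fine-tuning, verify the earlier results, and upgrade
# one pinned package at a time to bisect which change caused the drop.
```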
Current system configuration:
OS: Linux Mint 19.3, based on Ubuntu 18.04.3 LTS and Linux kernel 5.0
GPU/CPU: 2 x NVIDIA 1080Ti / Intel i7-8700
Seasonic 1300W Prime Gold Power Supply
CyberPower 1500VA/1000W battery backup
Transformers: 2.3.0
PyTorch: 1.3.0
TensorFlow: 2.0.0
Python: 3.7.5

