XLNET SQuAD2.0 Fine-Tuning - What May Have Changed? #2651
❓ Questions & Help
I fine-tuned xlnet-large-cased on SQuAD 2.0 in November 2019 with Transformers v2.1.1, yielding satisfactory results:
xlnet_large_squad2_512_bs48
{
"exact": 82.07698138633876,
"f1": 85.898874470488,
"total": 11873,
"HasAns_exact": 79.60526315789474,
"HasAns_f1": 87.26000954590184,
"HasAns_total": 5928,
"NoAns_exact": 84.54163162321278,
"NoAns_f1": 84.54163162321278,
"NoAns_total": 5945,
"best_exact": 83.22243746315169,
"best_exact_thresh": -11.112004280090332,
"best_f1": 86.88541353813282,
"best_f1_thresh": -11.112004280090332
}
using this script:
#!/bin/bash
export OMP_NUM_THREADS=6
RUN_SQUAD_DIR=/media/dn/dssd/nlp/transformers/examples
SQUAD_DIR=${RUN_SQUAD_DIR}/scripts/squad2.0
MODEL_PATH=${RUN_SQUAD_DIR}/runs/xlnet_large_squad2_512_bs48
python -m torch.distributed.launch --nproc_per_node=2 ${RUN_SQUAD_DIR}/run_squad.py \
--model_type xlnet \
--model_name_or_path xlnet-large-cased \
--do_train \
--train_file ${SQUAD_DIR}/train-v2.0.json \
--predict_file ${SQUAD_DIR}/dev-v2.0.json \
--version_2_with_negative \
--num_train_epochs 3 \
--learning_rate 3e-5 \
--adam_epsilon 1e-6 \
--max_seq_length 512 \
--doc_stride 128 \
--save_steps 2000 \
--per_gpu_train_batch_size 1 \
--gradient_accumulation_steps 24 \
--output_dir ${MODEL_PATH}
CUDA_VISIBLE_DEVICES=0 python ${RUN_SQUAD_DIR}/run_squad.py \
--model_type xlnet \
--model_name_or_path ${MODEL_PATH} \
--do_eval \
--train_file ${SQUAD_DIR}/train-v2.0.json \
--predict_file ${SQUAD_DIR}/dev-v2.0.json \
--version_2_with_negative \
--max_seq_length 512 \
--per_gpu_eval_batch_size 48 \
--output_dir ${MODEL_PATH}
$@
After upgrading Transformers to v2.3.0, I re-ran the same script above to see whether the fine-tuning results would improve. Instead I got the following:
xlnet_large_squad2_512_bs48
Results: {
'exact': 45.32131727448834,
'f1': 45.52929325627209,
'total': 11873,
'HasAns_exact': 0.0,
'HasAns_f1': 0.4165483859174251,
'HasAns_total': 5928,
'NoAns_exact': 90.51303616484441,
'NoAns_f1': 90.51303616484441,
'NoAns_total': 5945,
'best_exact': 50.07159100480081,
'best_exact_thresh': 0.0,
'best_f1': 50.07229287739689,
'best_f1_thresh': 0.0}
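To make the regression easier to see at a glance, the two runs can be diffed programmatically. The dictionaries below just copy the key numbers reported above; note that HasAns_exact collapses to 0.0 while NoAns_exact rises, which is consistent with the v2.3.0 model predicting "no answer" for nearly every question.

```python
# Metrics copied from the two runs reported in this issue:
# "old" = Transformers v2.1.1, "new" = Transformers v2.3.0.
old = {"exact": 82.07698138633876, "f1": 85.898874470488,
       "HasAns_exact": 79.60526315789474, "NoAns_exact": 84.54163162321278}
new = {"exact": 45.32131727448834, "f1": 45.52929325627209,
       "HasAns_exact": 0.0, "NoAns_exact": 90.51303616484441}

# Print each metric side by side with the delta between versions.
for key in old:
    print(f"{key}: {old[key]:.2f} -> {new[key]:.2f} (delta {new[key] - old[key]:+.2f})")
```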
I'm looking for potential explanations for this loss of performance. I have searched the Transformers releases and issues for anything pertaining to XLNet, with no clues. Are there new fine-tuning hyperparameters I've missed that now need to be set, or that didn't exist in earlier Transformers versions? Could a later PyTorch/TensorFlow version be the issue? I may have to recreate the November 2019 environment for a re-run to verify the earlier results, and then incrementally update Transformers, PyTorch, TensorFlow, etc.
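One way to approach that last step: freeze the suspected-good November 2019 versions into a requirements file, rebuild that environment in a fresh virtualenv, confirm the old numbers, and then bump one package at a time. A minimal sketch (the file name is my own choice; the version pins are the ones reported in this issue):

```shell
# Pin the Nov 2019 package versions reported in this issue.
cat > requirements-nov2019.txt <<'EOF'
transformers==2.1.1
torch==1.3.0
tensorflow==2.0.0
EOF
cat requirements-nov2019.txt
# In a fresh virtualenv: pip install -r requirements-nov2019.txt
# Then re-run fine-tuning, verify the earlier results, and upgrade
# one pinned package at a time to bisect which change caused the drop.
```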
Current system configuration:
OS: Linux Mint 19.3, based on Ubuntu 18.04.3 LTS and Linux kernel 5.0
GPU/CPU: 2 x NVIDIA 1080Ti / Intel i7-8700
Seasonic 1300W Prime Gold Power Supply
CyberPower 1500VA/1000W battery backup
Transformers: 2.3.0
PyTorch: 1.3.0
TensorFlow: 2.0.0
Python: 3.7.5

