To reproduce the main results (Figure 4) reported in the paper using PyTorch and 🤗 Transformers, you can use `run_eval.py`. The script works:
- on CPU (even though it will likely be very slow)
- on a single GPU (single process)
- on multiple GPUs in a distributed environment (multiple processes)
- on multiple GPUs with model parallelism (single process)
The results will be saved as a JSON file in the `output_dir` folder.
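Once a run finishes, the metrics can be inspected with plain `json`. The sketch below simulates a results file and reads it back; the file name (`results.json`) and the key layout are assumptions for illustration, so check your `output_dir` for the exact file the script wrote:

```python
import json
import os
import tempfile

# Stand-in for the --output_dir you passed to run_eval.py (e.g. ./debug).
output_dir = tempfile.mkdtemp()

# Simulate a results file; the actual file name and keys produced by
# run_eval.py may differ from this hypothetical layout.
sample = {
    "dataset_name": "super_glue",
    "dataset_config_name": "rte",
    "template_name": "must be true",
    "evaluation": {"accuracy": 0.5},
}
results_path = os.path.join(output_dir, "results.json")
with open(results_path, "w") as f:
    json.dump(sample, f)

# Reading the saved metrics back is ordinary JSON loading.
with open(results_path) as f:
    results = json.load(f)

print(results["evaluation"]["accuracy"])
```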
Here's the command to launch the evaluation on a single process:
```bash
python run_eval.py \
--dataset_name super_glue \
--dataset_config_name rte \
--template_name "must be true" \
--model_name_or_path bigscience/T0_3B \
--output_dir ./debug
```

You are expected to modify the `dataset_name`, the `dataset_config_name` and the `template_name`. The list of templates per data(sub)set is available in this file.
If you evaluate on ANLI (R1, R2 or R3), the `dataset_config_name` should be `dev_r1`, `dev_r2` or `dev_r3`.
To launch the evaluation in a distributed environment (multiple GPUs), you should use the accelerate launcher (please refer to Accelerate for installation):
```bash
accelerate launch run_eval.py \
--dataset_name super_glue \
--dataset_config_name rte \
--template_name "must be true" \
--model_name_or_path bigscience/T0_3B \
--output_dir ./debug
```

When the model is too big to fit on a single GPU, you can use model parallelism to split it across multiple GPUs. You should add the flag `--parallelize` when calling the script:
```bash
python run_eval.py \
--dataset_name super_glue \
--dataset_config_name rte \
--template_name "must be true" \
--model_name_or_path bigscience/T0_3B \
--output_dir ./debug \
--parallelize
```

Note that model parallelism is still an experimental feature in 🤗 Transformers.