Bonito is an open-source model for generating task-specific synthetic instruction tuning datasets conditioned on unannotated text.
This repo contains code to reproduce the experiments from the Bonito paper. For the Bonito package, see the bonito repo.
To install all the relevant packages, run the following:
conda create -n bonito-experiments python==3.9
conda activate bonito-experiments
pip3 install -r requirements.txt
To train models, run the following script:
deepspeed training/train_decoder.py --model_name_or_path mistralai/Mistral-7B-v0.1 --supervision_source bonito --dataset_name pubmed_qa --output_dir output/models/bonito_pubmed_qa_mistral
Options:
- model_name_or_path: The model to train. We consider {mistralai/Mistral-7B-v0.1, meta-llama/Llama-2-7b-hf, mistralai/Mistral-7B-Instruct-v0.2} in our experiments. You can train any language model of your choice. Default is mistralai/Mistral-7B-v0.1.
- supervision_source: The source of supervision used to train the model: a synthetic instruction tuning dataset, unannotated text, or a general instruction tuning dataset. Your choices include {bonito, dapt, mistral_instruct, zephyr_beta, p3}. Default is bonito.
- dataset_name: The synthetic dataset. Your choices include {pubmed_qa, privacy_qa, squadshifts_nyt, squadshifts_amazon, squadshifts_reddit, contract_nli, vitaminc}. All the datasets are retrieved from BatsResearch/bonito-experiment.
- checkpoint_model_id_or_path (Optional): Loads the LoRA adapter instruction tuned on P3. The adapter depends on model_name_or_path. Use BatsResearch/Mistral-7B-v0.1-P3 for mistralai/Mistral-7B-v0.1 and BatsResearch/Llama-2-7b-hf-P3 for meta-llama/Llama-2-7b-hf. You can also pass a local checkpoint. Default is None.
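The pairing between a base model and its P3 adapter, as described for checkpoint_model_id_or_path, can be captured in a small lookup. This is an illustrative sketch rather than code from this repo: the helper name p3_adapter_for is hypothetical, and only the two pairings listed above are included.

```python
from typing import Optional

# Pairings taken from the checkpoint_model_id_or_path description above.
P3_ADAPTERS = {
    "mistralai/Mistral-7B-v0.1": "BatsResearch/Mistral-7B-v0.1-P3",
    "meta-llama/Llama-2-7b-hf": "BatsResearch/Llama-2-7b-hf-P3",
}


def p3_adapter_for(model_name_or_path: str) -> Optional[str]:
    """Return the P3 LoRA adapter id for a base model, or None if none is listed."""
    return P3_ADAPTERS.get(model_name_or_path)
```

A result of None means no P3 adapter is listed for that base model, which matches the option's default.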
Notes:
- If you are using a multi-GPU environment, ensure you adjust per_device_train_batch_size and gradient_accumulation_steps to achieve an effective batch size of 16.
- We train the model for 10,000 steps. If the dataset has fewer than 160,000 samples, then we train for 1 epoch.
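The two notes are linked by simple arithmetic: 10,000 steps at an effective batch size of 16 consume 160,000 samples, which is where the threshold comes from. The sketch below works this out, assuming the usual data-parallel relation (effective batch size = number of GPUs × per-device batch size × gradient accumulation steps). The function names are illustrative, and rounding the final partial batch up is an assumption, not something the repo states.

```python
import math

TARGET_EFFECTIVE_BATCH_SIZE = 16  # from the note above
MAX_STEPS = 10_000                # from the note above


def gradient_accumulation_steps(num_gpus: int, per_device_train_batch_size: int) -> int:
    """Accumulation steps needed to reach the target effective batch size."""
    per_step = num_gpus * per_device_train_batch_size
    if per_step <= 0 or TARGET_EFFECTIVE_BATCH_SIZE % per_step != 0:
        raise ValueError(
            f"{num_gpus} GPUs x batch size {per_device_train_batch_size} "
            f"cannot reach an effective batch size of {TARGET_EFFECTIVE_BATCH_SIZE}"
        )
    return TARGET_EFFECTIVE_BATCH_SIZE // per_step


def training_steps(num_samples: int) -> int:
    """Steps to train: one epoch for small datasets, capped at MAX_STEPS."""
    # Assumption: a final partial batch still counts as one step.
    one_epoch = math.ceil(num_samples / TARGET_EFFECTIVE_BATCH_SIZE)
    return min(MAX_STEPS, one_epoch)
```

For example, 4 GPUs with a per-device batch size of 2 need 2 accumulation steps, and a dataset of 1,600 samples trains for 100 steps.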
We evaluate the pretrained and fine-tuned models on prompted datasets.
We use ranked evaluation for pubmed_qa, privacy_qa, contract_nli, and vitaminc and SQuAD evaluation for squadshifts_nyt, squadshifts_amazon, and squadshifts_reddit.
All the evaluation datasets are uploaded to BatsResearch/bonito-experiment-eval.
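As a rough illustration of the two evaluation styles: ranked evaluation, as commonly done in T0-style setups, scores every answer choice and predicts the highest-scoring one, while SQuAD evaluation typically reports exact match and token-level F1 after answer normalization. The sketch below assumes the per-choice scores (for example, log-likelihoods) are already computed. The function names are illustrative, and the scripts in this repo may differ in detail.

```python
import re
import string
from collections import Counter


def rank_choices(choice_scores):
    """Return the index of the highest-scoring answer choice."""
    return max(range(len(choice_scores)), key=lambda i: choice_scores[i])


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

With hypothetical scores [-4.2, -1.3, -2.0], rank_choices selects the second choice (index 1).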
The following script evaluates the model on the target dataset:
deepspeed evaluation/evaluate_decoder.py --dataset_name pubmed_qa --model_name_or_path mistralai/Mistral-7B-v0.1 --checkpoint_model_id_or_path <checkpoint_path> --output_dir results/bonito-mistral-pubmed_qa --bf16
Options:
- checkpoint_model_id_or_path: Path to the checkpoint directory or the Hugging Face model id. This is the path to the trained model. Default is None.
- model_name_or_path: The model to evaluate. We consider {mistralai/Mistral-7B-v0.1, meta-llama/Llama-2-7b-hf, mistralai/Mistral-7B-Instruct-v0.2} in our experiments. You can evaluate any language model of your choice. Default is mistralai/Mistral-7B-v0.1.
- dataset_name: The evaluation dataset. Your choices include {pubmed_qa, privacy_qa, contract_nli, vitaminc}. Default is None.
- output_dir: The directory to save the evaluation results. Default is results.
Additional options:
- template_name: Runs evaluation for a specific template in the dataset. See the jinja templates for pubmed_qa, privacy_qa, contract_nli, and vitaminc in the templates directory. Default is None.
The following script merges the base model with the checkpoint adapter and evaluates the model on five templates from the SQuADShifts dataset:
python3 evaluation/merge_and_evaluate_squad.py --dataset_name squadshifts_nyt --model_name_or_path mistralai/Mistral-7B-v0.1 --checkpoint_model_id_or_path <checkpoint_path> --output_dir results/bonito-mistral-squadshifts_nyt
Options:
- checkpoint_model_id_or_path: Path to the checkpoint directory or the Hugging Face model id. This is the path to the trained model. Default is None.
- model_name_or_path: The model to evaluate. We consider {mistralai/Mistral-7B-v0.1, meta-llama/Llama-2-7b-hf, mistralai/Mistral-7B-Instruct-v0.2} in our experiments. You can evaluate any language model of your choice. Default is mistralai/Mistral-7B-v0.1.
- dataset_name: The evaluation dataset. Your choices include {squadshifts_nyt, squadshifts_amazon, squadshifts_reddit}. Default is None.
- output_dir: The directory to save the evaluation results. Default is results.
Notes:
- We use SQuADShifts templates from promptsource.
- The merging operation saves a new model in the scratch directory. Change the --scratch path to save the model in a different directory. Additionally, ensure you have enough disk space to save the model.
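Merging a LoRA adapter into its base model is commonly done with the peft library. The sketch below shows that general pattern, assuming the checkpoint is a PEFT LoRA adapter. It is not the code in merge_and_evaluate_squad.py, and the function name and default scratch directory are illustrative.

```python
def merge_adapter(base_model, adapter, scratch_dir="scratch/merged"):
    """Merge a LoRA adapter into its base model and save the result to scratch_dir."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base weights, then attach the adapter on top of them.
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter)

    # Fold the LoRA weights into the base weights and drop the adapter wrappers.
    merged = model.merge_and_unload()

    # Save a full standalone copy of the model together with its tokenizer.
    merged.save_pretrained(scratch_dir)
    AutoTokenizer.from_pretrained(base_model).save_pretrained(scratch_dir)
    return scratch_dir
```

Because the merged model is a full copy rather than a small adapter, a 7B-parameter model stored at 2 bytes per parameter takes on the order of 14 GB, which is why free disk space matters here.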
To generate the CTGA-v1 dataset, run the following script:
python3 ctga/task_type_bonito.py --output_dir output/dataset/ctga-v1
To train the Bonito model, run the following script:
deepspeed training/train_decoder.py --model_name_or_path mistralai/Mistral-7B-v0.1 --training_type="bonito_training" --dataset_name ctga-v1 --output_dir output/model/bonito_ctga-v1_mistral --max_steps 100000 --max_eval_samples 10000 --save_steps 10000 --save_total_limit 10
To generate instruction tuning datasets, run the following script:
python3 generation/generate_data.py --model_name_or_path BatsResearch/bonito-v1 --output_dir output/dataset/contract_nli --dataset_name contract_nli --task_type nli
Options:
- model_name_or_path: The model used to generate the synthetic dataset. We use BatsResearch/bonito-v1 in our experiments. You can generate datasets using any language model of your choice. Default is BatsResearch/bonito-v1.
- output_dir: The directory to save the generated dataset. Default is output/dataset.
- dataset_name: The name of the dataset. Your choices include {pubmed_qa, privacy_qa, squadshifts_nyt, squadshifts_amazon, squadshifts_reddit, contract_nli, vitaminc}. Default is None.
- task_type: The task type of the dataset. Your choices include {exqa, ynqa, nli, mcqa, qg, qa, coref, paraphrase, paraphrase_id, sent_comp, sentiment, summarization, text_gen, topic_class, wsd, te}. Default is None.
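When generating several datasets, it can help to validate the dataset and task type against the choices listed above before launching a run. The helper below is a hypothetical convenience wrapper, not part of this repo. It only assembles the command line shown above, and the per-dataset output directory layout simply follows that example.

```python
import shlex

# Choices copied from the option descriptions above.
DATASETS = {
    "pubmed_qa", "privacy_qa", "squadshifts_nyt", "squadshifts_amazon",
    "squadshifts_reddit", "contract_nli", "vitaminc",
}
TASK_TYPES = {
    "exqa", "ynqa", "nli", "mcqa", "qg", "qa", "coref", "paraphrase",
    "paraphrase_id", "sent_comp", "sentiment", "summarization", "text_gen",
    "topic_class", "wsd", "te",
}


def build_generate_command(dataset_name, task_type,
                           model="BatsResearch/bonito-v1",
                           output_root="output/dataset"):
    """Return the generate_data.py command as a shell-safe string."""
    if dataset_name not in DATASETS:
        raise ValueError(f"unknown dataset_name: {dataset_name!r}")
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    args = [
        "python3", "generation/generate_data.py",
        "--model_name_or_path", model,
        "--output_dir", f"{output_root}/{dataset_name}",
        "--dataset_name", dataset_name,
        "--task_type", task_type,
    ]
    return shlex.join(args)
```

Calling build_generate_command("contract_nli", "nli") reproduces the command shown above, and an unlisted task type raises a ValueError before anything is launched.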
The training code is adapted from Q-LoRA. The evaluation code is adapted from t-zero.
If you use Bonito in your research, please cite the following paper:
@article{bonito:arxiv24,
Author = {Nihal V. Nayak and Yiyang Nan and Avi Trost and Stephen H. Bach},
Title = {Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation},
Volume = {arXiv:2402.18334 [cs.CL]},
Year = {2024}}