Yun-Yen Chuang · Hung-Min Hsu · Kevin Lin · Chen-Sheng Gu · Ling-Zhen Li · Ray-I Chang · Hung-yi Lee
[Slide]
[Poster]
This project is based on our paper accepted at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), titled "Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration". You can find the paper here.
Comparison between an S2S-Diffusion model (i.e., DiffuSeq) and the proposed Meta-DiffuB. Shades of color represent different amounts of imposed noise.
Unlike prior works, which impose a fixed noise schedule, we introduce a novel scheduler-exploiter framework, Meta-DiffuB, which achieves trainable noise scheduling inspired by Meta Exploration. Our scheduler model schedules contextualized noise, enhancing the training and generation of the S2S-Diffusion model and yielding state-of-the-art (SOTA) performance compared with previous S2S-Diffusion models, as detailed in Section 4 of the paper.
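To give a rough feel for the idea, here is a toy sketch of fixed versus contextualized noise scheduling. This is our own illustration, not the paper's actual scheduler model: the linear beta schedule and the per-sentence `difficulty` scaling are assumptions made purely for demonstration.

```python
# Toy sketch: fixed vs. contextualized noise scheduling.
# The schedule shape and the "difficulty" scaling are illustrative
# assumptions, NOT the scheduler learned by Meta-DiffuB.

def fixed_schedule(num_steps):
    """A standard linear beta schedule: identical for every sentence."""
    beta_min, beta_max = 1e-4, 0.02
    return [beta_min + (beta_max - beta_min) * t / (num_steps - 1)
            for t in range(num_steps)]

def contextualized_schedule(num_steps, difficulty):
    """Hypothetical contextualized scheduler: modulates the noise by a
    per-sentence 'difficulty' score in [0, 1] (imagined as being
    predicted from the source context)."""
    base = fixed_schedule(num_steps)
    return [b * (0.5 + difficulty) for b in base]

# An "easy" sentence receives less noise than a "hard" one at every step.
easy = contextualized_schedule(10, difficulty=0.1)
hard = contextualized_schedule(10, difficulty=0.9)
assert all(e < h for e, h in zip(easy, hard))
```

The point of the sketch is only the contrast: a fixed schedule treats all sentences identically, while a trainable scheduler can adapt the amount of noise to each sentence's context.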
Our implementation is based on Python 3.8, PyTorch 1.11, and Fairseq 0.10.2. The following commands install the dependencies and this package in a Conda environment:

```shell
conda install pytorch==1.11.0 -c pytorch
pip install -e .
```
After confirming that fairseq is installed, replace the corresponding code in the installed environment with the code from the fairseq and fairseq_cli folders that we provide.
For the non-translation tasks, we follow the DiffuSeq dataset settings.
Prepare datasets and put them under the datasets folder.
Take datasets/WA/train.jsonl as an example. We use four datasets in our paper.
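Each dataset file stores one example per line in JSON Lines format. To the best of our knowledge, DiffuSeq-style files hold a source/target pair under `src` and `trg` keys; treat the exact key names as an assumption and check them against your downloaded files. A minimal loader sketch:

```python
import json
import os
import tempfile

def load_jsonl(path):
    """Read a DiffuSeq-style .jsonl file into a list of dicts.
    Assumes one JSON object per line with 'src' and 'trg' fields
    (key names assumed; verify against the actual dataset)."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples

# Tiny self-contained demo with a synthetic file:
path = os.path.join(tempfile.gettempdir(), "demo_train.jsonl")
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps({"src": "what is diffusion ?",
                        "trg": "a noising process ."}) + "\n")

data = load_jsonl(path)
print(data[0]["src"])  # -> what is diffusion ?
```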
| Task | Dataset | Training Samples | Source | Used in Meta-DiffuB |
|---|---|---|---|---|
| Open-domain Dialogue | Commonsense Conversation | 3382k | CCM | download |
| Question Generation | Quasar-T | 117k | OpenQA | download |
| Text Simplification | Wiki-Auto | 677k | Wiki-auto | download |
| Paraphrase | Quora Question Pairs | 144k | Kaggle | download |
For the translation task, we follow the instructions of Fairseq to preprocess the translation datasets. We then apply knowledge distillation using Transformer models trained on the same datasets. To binarize the distilled and tokenized datasets, run the following command (taking the IWSLT14 De-En dataset as an example):
```shell
fairseq-preprocess \
    --source-lang de --target-lang en \
    --trainpref {PATH-TO-YOUR-DATASET}/train \
    --validpref {PATH-TO-YOUR-DATASET}/valid \
    --testpref {PATH-TO-YOUR-DATASET}/test \
    --destdir data-bin/iwslt14_de_en_distill \
    --joined-dictionary \
    --workers 20
```
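After binarization, `fairseq-preprocess` writes per-language dictionaries and indexed binary shards into `--destdir`. As a small sanity check, here is a helper of our own (not part of this repo); the file-name pattern is assumed from Fairseq's usual `{split}.{src}-{tgt}.{lang}.bin/.idx` convention, so verify it against your Fairseq version.

```python
import os

def expected_databin_files(src, tgt, splits=("train", "valid", "test")):
    """Files fairseq-preprocess typically emits into --destdir.
    The naming convention is an assumption; check your Fairseq version."""
    files = [f"dict.{src}.txt", f"dict.{tgt}.txt"]
    for split in splits:
        for lang in (src, tgt):
            files.append(f"{split}.{src}-{tgt}.{lang}.bin")
            files.append(f"{split}.{src}-{tgt}.{lang}.idx")
    return files

def missing_files(destdir, src, tgt):
    """Return the expected files that are absent from destdir."""
    return [f for f in expected_databin_files(src, tgt)
            if not os.path.exists(os.path.join(destdir, f))]

# e.g. missing_files("data-bin/iwslt14_de_en_distill", "de", "en")
# should be an empty list after a successful preprocessing run.
```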
All training, inference, and evaluation scripts are located in the {model_type}/scripts directory. For example, to train Meta-DiffuB-Difformer on the QQP dataset, simply run:

```shell
bash scripts/qqp/train.sh
```

To run inference and evaluate Meta-DiffuB-Difformer on the QQP dataset, run:

```shell
bash scripts/qqp/evaluate.sh
```

For Meta-DiffuB-DiffuSeq, a different approach is required: instead of bash scripts, Jupyter Notebook files are used for training, inference, and evaluation. Specifically:

- To train Meta-DiffuB-DiffuSeq, execute scripts/Train.ipynb in Jupyter Notebook.
- To run inference, execute scripts/Inference.ipynb in Jupyter Notebook.
- To evaluate the model, execute scripts/Evaluate.ipynb in Jupyter Notebook.
You can modify the parameters in the .ipynb files (such as the dataset) to fit your specific usage scenario.
We also provide the code of the other S2S-Diffusion models that we ran for our experiments.