
Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration

Yun-Yen Chuang · Hung-Min Hsu · Kevin Lin · Chen-Sheng Gu · Ling-Zhen Li · Ray-I Chang · Hung-yi Lee
[Slide] [Poster]

Our Paper at NeurIPS 2024

This project is based on our paper accepted at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), titled "Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration". You can find the paper here.

Meta-DiffuB

Figure: Comparison between the S2S-Diffusion model (i.e., DiffuSeq) and the proposed Meta-DiffuB. The shades of color represent different amounts of noise being imposed. Unlike prior works that use a fixed noise schedule, we introduce a novel scheduler-exploiter framework, Meta-DiffuB, which achieves trainable noise scheduling inspired by Meta Exploration. Our scheduler model schedules contextualized noise, enhancing the training and generation of the S2S-Diffusion model and resulting in state-of-the-art (SOTA) performance compared to previous S2S-Diffusion models, as detailed in Section 4 of the paper.

Getting started

Our implementation is based on Python 3.8, PyTorch 1.11, and Fairseq 0.10.2. The following commands install the dependencies and this package in a Conda environment:

conda install pytorch==1.11.0 -c pytorch
pip install -e .

After confirming that fairseq is installed, replace the corresponding code in the installed environment with the code from the fairseq and fairseq_cli folders provided in this repository.
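The replacement step above can be sketched programmatically. This is a hedged illustration, not part of the repository: the function name `patch_installed_package` and the assumption that the patched folders sit at the repository root are ours; a plain `cp -r` over the installed package directory works just as well.

```python
import importlib.util
import pathlib
import shutil

# Illustrative sketch: locate an installed package and overwrite it with
# patched sources. Names and layout here are assumptions, not the repo's API.
def patch_installed_package(pkg_name: str, patched_dir: str) -> bool:
    """Copy the files in patched_dir over the installed location of pkg_name.

    Returns False when the package is not installed (nothing is copied).
    """
    spec = importlib.util.find_spec(pkg_name)
    if spec is None or spec.origin is None:
        return False
    target = pathlib.Path(spec.origin).parent  # e.g. .../site-packages/fairseq
    shutil.copytree(patched_dir, target, dirs_exist_ok=True)
    return True

# Example (assuming this runs from the repository root):
# patch_installed_package("fairseq", "fairseq")
# patch_installed_package("fairseq_cli", "fairseq_cli")
```

Note that `shutil.copytree(..., dirs_exist_ok=True)` requires Python 3.8+, which matches the environment above.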

Datasets

For the non-translation tasks, we follow the DiffuSeq dataset settings. Prepare the datasets and put them under the datasets folder (e.g., datasets/WA/train.jsonl). We use four datasets in our paper.

| Task | Dataset | Training Samples | Source | Used in Meta-DiffuB |
| --- | --- | --- | --- | --- |
| Open-domain Dialogue | Commonsense Conversation | 3382k | CCM | download |
| Question Generation | Quasar-T | 117k | OpenQA | download |
| Text Simplification | Wiki-Auto | 677k | Wiki-auto | download |
| Paraphrase | Quora Question Pairs | 144k | Kaggle | download |
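Each dataset file is a jsonl file with one example per line. The `src`/`trg` field names below follow the DiffuSeq convention but are an assumption here; verify them against the files you download. The `load_jsonl` helper is illustrative, not part of this repository:

```python
import json

# Hypothetical example of one line of a DiffuSeq-style train.jsonl file.
# Field names ("src" for the condition, "trg" for the target) are assumed;
# check them against the actual downloaded dataset.
example_line = {"src": "what do you call a group of lions ?",
                "trg": "a group of lions is called a pride ."}

def load_jsonl(path):
    """Load one {"src": ..., "trg": ...} pair per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```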

For the translation task, we follow the instructions of Fairseq to preprocess the translation datasets. We then apply knowledge distillation using Transformer models trained on the same datasets. To binarize the distilled and tokenized datasets, run the following command (taking the IWSLT14 De-En dataset as an example):

fairseq-preprocess \
    --source-lang de --target-lang en \
    --trainpref {PATH-TO-YOUR-DATASET}/train \
    --validpref {PATH-TO-YOUR-DATASET}/valid \
    --testpref {PATH-TO-YOUR-DATASET}/test \
    --destdir data-bin/iwslt14_de_en_distill \
    --joined-dictionary \
    --workers 20

Training, Inference, and Evaluation

All training, inference, and evaluation scripts are located in the {model_type}/scripts directory. For example, to train Meta-DiffuB-Difformer on the QQP dataset, simply run:

bash scripts/qqp/train.sh

To run inference and evaluate Meta-DiffuB-Difformer on the QQP dataset, run:

bash scripts/qqp/evaluate.sh

For Meta-DiffuB-DiffuSeq, a different approach is required. Instead of bash scripts, Jupyter Notebook files are used for training, inference, and evaluation. Specifically:

  • To train Meta-DiffuB-DiffuSeq, execute scripts/Train.ipynb in Jupyter Notebook.
  • To run inference, execute scripts/Inference.ipynb in Jupyter Notebook.
  • To evaluate the model, execute scripts/Evaluate.ipynb in Jupyter Notebook.

You can modify the parameters in the .ipynb files (such as the dataset) to fit your specific usage scenario.

Baseline Model Reference

The code of the other S2S-Diffusion models we ran in our experiments.
