GitHub - ashaba1in/smoothie: The official implementation of "Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation"

Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation

Paper: https://arxiv.org/pdf/2505.18853

Requirements

Python libraries: See requirements.txt for exact library dependencies. You can use the following commands with Miniconda3 to create and activate your Python environment:
- conda create --name smoothie python=3.9
- conda activate smoothie
- conda install pip
- pip install -r requirements.txt
- python -m spacy download en

Dataset loading

For the Newsela-Auto and Quasar-T datasets, you first need to download the files train.json, valid.json and test.json from the DiffuSeq GitHub repository and put them in the ./datasets/ folder.
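Before running the loader for these two datasets, a quick check like the following sketch can confirm that the three files are in place. The helper name `missing_dataset_files` is hypothetical and not part of this repository, and it assumes the files sit directly inside the folder you pass in.

```python
from pathlib import Path

# File names as listed in this README.
REQUIRED_FILES = ("train.json", "valid.json", "test.json")


def missing_dataset_files(folder="./datasets"):
    """Return the required file names that are not present in `folder`.

    Hypothetical helper: it assumes the files are stored directly in
    `folder`, not in a per-dataset subfolder.
    """
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All three DiffuSeq files were found.")
```

An empty list means all three files were found in the given folder.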

Then you should run the following command:

python -m data.load --dataset_name=dataset_name

For any other dataset used in the paper, you can run the command above without downloading anything.

The 'dataset_name' is one of the following:

'rocstories'
'qqp'
'xsum'
'newsela-auto'
'quasar_t'
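If you launch the loader from your own scripts, a small wrapper can catch a mistyped dataset name before anything runs. This is a minimal sketch: `build_load_command` is a hypothetical helper, not part of the repository, and it only assembles the command shown above from the names in this list.

```python
import sys

# Dataset names copied from the list in this README.
DATASET_NAMES = ("rocstories", "qqp", "xsum", "newsela-auto", "quasar_t")


def build_load_command(dataset_name):
    """Return the argv list for `python -m data.load` after checking the name.

    Hypothetical helper: it does not run anything, it only builds the command.
    """
    if dataset_name not in DATASET_NAMES:
        raise ValueError(
            f"unknown dataset {dataset_name!r}; expected one of {DATASET_NAMES}"
        )
    return [sys.executable, "-m", "data.load", f"--dataset_name={dataset_name}"]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root, for example `subprocess.run(build_load_command("qqp"), check=True)`.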

Diffusion training

To train the basic Smoothie setup, run

torchrun --nproc_per_node=n train_diffusion.py --dataset_name dataset_name --smooth_diffusion

This script will train the Smoothie model used in the paper.

Diffusion evaluation

To evaluate the trained model, run

torchrun --nproc_per_node=n evaluate_diffusion.py --dataset_name dataset_name --smooth_diffusion --checkpoints_name checkpoints_name

where checkpoints_name is the name of the folder with the saved checkpoint. By default, it is smoothie-{dataset_name}.
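The naming rule and the evaluation command can be put together in a short sketch. Both functions below are hypothetical helpers, not part of the repository. They reproduce the command line shown above and fill in the default folder name when none is given. `nproc` stands for the `n` in `--nproc_per_node=n`, the number of processes torchrun starts on the node.

```python
def default_checkpoints_name(dataset_name):
    """Default checkpoint folder name as described in this README."""
    return f"smoothie-{dataset_name}"


def build_eval_command(nproc, dataset_name, checkpoints_name=None):
    """Return the argv list for the evaluation command (hypothetical helper).

    If `checkpoints_name` is omitted, the README default is passed explicitly.
    """
    if nproc < 1:
        raise ValueError("nproc must be at least 1")
    name = checkpoints_name or default_checkpoints_name(dataset_name)
    return [
        "torchrun",
        f"--nproc_per_node={nproc}",
        "evaluate_diffusion.py",
        "--dataset_name", dataset_name,
        "--smooth_diffusion",
        "--checkpoints_name", name,
    ]
```

For example, `" ".join(build_eval_command(2, "qqp"))` produces the evaluation command for two processes with the checkpoint folder `smoothie-qqp`.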

