Using Quantized Embedding Vectors to Optimize Controllable Diffusion Language Models

Overview

This is the official repo for the paper: Quantized Embedding Vectors for Controllable Diffusion Language Models.

QE-Diffusion Controllable LM builds on a Controllable Diffusion Language Model (DLM) whose latent space is modeled by a Denoising Diffusion Probabilistic Model (DDPM), constrained by task requirements (such as topic, grammar, and length), and adapted by a rounding process that bridges the discrete text and the continuous input. Quantization, especially fixed quantization of the embedding vectors, can decrease the complexity of the Controllable DLM's embedding space. However, it cannot improve DLMs, because their embedding space needs higher complexity. Compared to previous controllable text generation models, this method not only decreases the perplexity of the generated text but can also, in theory, accelerate inference.
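To make the idea of fixed quantization concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: the helper names `quantize_fixed` and `distinct_values`, the toy embeddings, and the step size of 0.4 are all assumptions chosen for illustration. It only shows how snapping every component to a fixed grid reduces the number of distinct values the embedding space can take.

```python
def quantize_fixed(vector, step=0.4):
    """Snap each component to the nearest multiple of `step` (a fixed, uniform grid).

    `step` is an illustrative value, not a setting taken from the paper.
    """
    return [round(x / step) * step for x in vector]


def distinct_values(vectors):
    """Collect the distinct component values used across a set of vectors."""
    return {round(x, 6) for vec in vectors for x in vec}


# Toy 3-dimensional "word embeddings" (hypothetical numbers).
embeddings = [
    [0.13, -0.52, 0.91],
    [0.18, -0.47, 0.88],
    [-0.75, 0.33, 0.02],
]

quantized = [quantize_fixed(vec) for vec in embeddings]

# Nearby vectors collapse onto the same grid point, so the quantized
# space uses fewer distinct values than the original one.
print(quantized)
print(len(distinct_values(embeddings)), "->", len(distinct_values(quantized)))
```

In this toy setup the first two vectors, which start close together, land on the same grid point after quantization; that is the sense in which the embedding space becomes less complex.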

The limitation of Controllable Diffusion Language Models

"There are drawbacks to the Diffusion-LMs that we constructed: (1) it has higher perplexity; (2) decoding is substantially slower; and (3) training converges more slowly."

The solutions

The proposed method contains two main steps: QE-DLM and Classifier. In the first step, QE-DLM denoises a sequence of quantized Gaussian vectors that are added to word vectors. The quantized embedding vectors then compress and remodel the discrete latent space through a reverse diffusion process. In the second step, the Classifier applies gradient updates on the continuous latent space according to the control. The DLM demonstrates its capability to generate fluent text, and a suitable classifier effectively constrains the generated text to a specific control target, such as a parse tree.
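The second step and the rounding process can be sketched with a toy example. This is a simplified illustration under stated assumptions, not the code in this repository: the denoising network is omitted entirely, the "classifier" is a hand-written quadratic log-probability whose gradient pulls the latent toward a target point, and the names `nearest_word`, `guided_step`, `toy_grad`, and the two-word vocabulary are hypothetical.

```python
def nearest_word(latent, vocab):
    """Rounding step: return the word whose embedding is closest to `latent`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(vocab, key=lambda word: sq_dist(latent, vocab[word]))


def guided_step(latent, grad_log_prob, step_size=0.1):
    """One control update: move the continuous latent along the classifier gradient."""
    grad = grad_log_prob(latent)
    return [x + step_size * g for x, g in zip(latent, grad)]


# Toy classifier (assumption): log p(control | x) = -0.5 * ||x - target||^2,
# so its gradient with respect to x is simply (target - x).
target = [0.8, -0.4]

def toy_grad(latent):
    return [t - x for t, x in zip(target, latent)]


# Hypothetical two-word vocabulary with 2-dimensional embeddings.
vocab = {"cheap": [0.8, -0.4], "pricey": [-0.8, 0.4]}

latent = [-0.1, 0.1]
before = nearest_word(latent, vocab)

# Repeated gradient updates steer the latent toward the controlled region.
for _ in range(20):
    latent = guided_step(latent, toy_grad)

after = nearest_word(latent, vocab)
print(before, "->", after)
```

In a real setting the gradient would come from a trained classifier and the control updates would be interleaved with denoising steps; the sketch isolates only how a gradient update on the continuous latent changes which discrete word the rounding step selects.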

Requirements

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install -e improved-diffusion/ 
pip install -e transformers/
pip install -e loralib/
pip install spacy
pip install datasets 
pip install huggingface_hub
pip install wandb

Train Diffusion-LM with Quantized Embeddings:

cd improved-diffusion
mkdir diffusion_models

cd Q-Controllable-DLM/batch_file
sbatch DiffLM_Quantized_fixed04_e2e.batch
sbatch DiffLM_Quantized_fixed04_ROCStory.batch
sbatch DiffLM_Quantized_fixed04_WikiText103.batch

Acknowledgement

This code is based on Controllable Diffusion Language Model and Improved Diffusion. Thanks to their authors for the wonderful work.

For details of the methods and results, please refer to our paper.

@article{kang2024quantized,
  title={Quantized Embedding Vectors for Controllable Diffusion Language Models},
  author={Kang, Cheng and Chen, Xinye and Hu, Yong and Novak, Daniel},
  journal={arXiv preprint arXiv:2402.10107},
  year={2024}
}
