CODA: Repurposing Continuous VAEs for Discrete Tokenization

This is the official implementation of CODA, introduced in CODA: Repurposing Continuous VAEs for Discrete Tokenization.

🔆 Highlights

We identify that training conventional VQ tokenizers is inherently challenging: it requires both compressing visual signals into a compact representation and discretizing them into a fixed set of codes. This often leads to unstable training, low codebook utilization, and limited reconstruction quality. Instead of training discrete tokenizers from scratch, we introduce CODA (COntinuous-to-Discrete Adaptation), which adapts off-the-shelf continuous VAEs, already optimized for perceptual compression, into discrete tokenizers via a carefully designed discretization process. This ensures stable and efficient training while retaining the strong visual fidelity of continuous VAEs.
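The paper's actual discretization design is not reproduced here; purely as a rough illustration of the continuous-to-discrete idea, the sketch below quantizes latents from a frozen continuous VAE encoder to their nearest entries in a learned codebook. The class name, codebook size, and straight-through quantization are assumptions for this sketch, not this repo's API.

# Minimal sketch, assuming a frozen continuous VAE encoder whose latents z
# have shape (B, C, H, W). Illustrative only; not CODA's exact discretization.
import torch
import torch.nn as nn

class NearestCodeQuantizer(nn.Module):
    def __init__(self, num_codes: int = 16384, dim: int = 16):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)        # (B*H*W, C)
        dist = torch.cdist(flat, self.codebook.weight)     # pairwise L2 distances
        idx = dist.argmin(dim=1)                           # discrete token ids
        zq = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        # straight-through estimator: decode from quantized latents while
        # letting gradients flow back to the continuous side
        return z + (zq - z).detach(), idx

With the encoder frozen, only the codebook (plus any adaptation layers) needs training, which is the source of the stability and efficiency claimed above.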

🔧 Usage

Tokenizer

Install the required environment with

git clone [email protected]:LeapLabTHU/CODA.git
cd CODA/tokenizer
pip install -r requirements.txt

Prepare the required pretrained models and dataset:

  1. Prepare the ImageNet dataset and replace PATH_TO_IMAGENET with the corresponding path on your machine; the data should follow the layout below.
  2. Prepare the pretrained models (the MAR VAE, the FLUX VAE, and the StyleGAN DINO discriminator) and arrange them as in the checkpoints tree below; a quick sanity-check sketch follows the trees.
data
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
│
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...


checkpoints
├── mar_vae
│   ├── kl16.safetensors
│
├── flux_vae
│   ├── config.json
│   ├── diffusion_pytorch_model.safetensors
│
├── dino_disc
│   ├── dino_deitsmall16_pretrain.safetensors
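As a convenience, here is a minimal sketch that verifies this layout before launching training. The paths are taken directly from the trees above; adjust them if your layout differs.

# Sanity-check the expected data/checkpoint layout described above.
from pathlib import Path

required = [
    "data/train",
    "data/val",
    "checkpoints/mar_vae/kl16.safetensors",
    "checkpoints/flux_vae/config.json",
    "checkpoints/flux_vae/diffusion_pytorch_model.safetensors",
    "checkpoints/dino_disc/dino_deitsmall16_pretrain.safetensors",
]
missing = [p for p in required if not Path(p).exists()]
if missing:
    raise FileNotFoundError(f"missing required paths: {missing}")
print("layout looks good")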

Training

bash run.sh

See run.sh for the detailed configurations of the MAR-based and FLUX-based models.

📚 Model Zoo

Model            | Link
MAR, $V=16384$   | link
FLUX, $V=65536$  | link
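The released checkpoints are .safetensors files. Purely as an illustration, they can be inspected with the safetensors library; the file path below is a hypothetical download location, and the actual model classes that consume the weights live in this repository's tokenizer code.

# Inspect a downloaded tokenizer checkpoint. The path is a placeholder for
# wherever you save the file from the links above.
from safetensors.torch import load_file

state_dict = load_file("checkpoints/coda_tokenizer.safetensors")  # hypothetical path
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))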

🔎 Code Release

  • Generation training code & checkpoints
  • Tokenizer checkpoints
  • Tokenizer training codes

📬 Contact

⛽⛽⛽ [email protected]

🔖 Acknowledgements

Our implementation is based on vaex, VQGAN, SEED-Voken, MAR, pytorch-fid.

We thank the authors for their excellent work.
