Use a pre-trained LLM to generate high-fidelity tabular data!
🧠 Blog Post
📄 Paper
🐈 Demo notebook
🤗 HuggingFace: sonicc/tabby-distilgpt2-diabetes
Follow the instructions below to install the necessary packages, then try things out with our simple pre-trained demo Tabby, or train and sample from your own Tabby!
Run the following:
conda create --name tabby python=3.11.4
conda activate tabby
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install tqdm matplotlib jupyter pandas scikit-learn
pip install transformers==4.44.2 accelerate datasets ucimlrepo openml bitsandbytes wandb openpyxl huggingface_hub
We have created the demo.ipynb notebook as a quick and easy way to try out Tabby synthesis, no fine-tuning required!
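Before opening the notebooks, it can help to confirm that the environment matches the install commands above. The following is a minimal sketch, not part of the Tabby codebase: the helper names `installed_versions` and `report` are hypothetical, and the distribution names (in particular `torch` for PyTorch) are assumptions based on the install commands.

```python
from importlib import metadata

# Distribution names inferred from the install commands above (assumption:
# PyTorch registers its metadata under the name "torch").
REQUIRED = ["torch", "transformers", "accelerate", "datasets", "pandas", "scikit-learn"]
# The only version pinned by the pip command above.
PINNED = {"transformers": "4.44.2"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report(versions, pinned):
    """Return one human-readable status line per package."""
    lines = []
    for name, found in versions.items():
        if found is None:
            lines.append(f"{name}: MISSING")
        elif name in pinned and found != pinned[name]:
            lines.append(f"{name}: {found} (README pins {pinned[name]})")
        else:
            lines.append(f"{name}: {found}")
    return lines


if __name__ == "__main__":
    print("\n".join(report(installed_versions(REQUIRED), PINNED)))
```

Run it inside the activated `tabby` environment; any line marked MISSING or showing a pin mismatch points at the install step to repeat.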
The data-prep.ipynb notebook will download and format datasets as they were prepared in our paper. If you wish to add a new dataset, just make sure it has the same file structure, config file, and train/val/test splits as the datasets created in this notebook.
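For a new dataset, the train/val/test requirement could be met with a reproducible shuffle-and-cut like the sketch below. This is only an illustration: `split_rows` is a hypothetical helper, the 80/10/10 proportions are an assumption, and the file names and config format expected by the codebase should be copied from what data-prep.ipynb actually produces.

```python
import random


def split_rows(rows, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle rows reproducibly and cut them into train/val/test lists.

    The default fractions are illustrative assumptions, not the paper's values.
    """
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_rows(range(100))
    print(len(train), len(val), len(test))
```

Fixing the seed means the same rows land in the same split on every run, which keeps evaluation comparable across experiments.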
The file trainplain.py is the main entry point for interacting with our codebase. It allows you to train both Tabby and non-Tabby models, with or without Plain training.
Here is the command used to train our demo model:
python trainplain.py -p ./ckpt -d diabetes -mh -t -n 10
The -p flag sets where the model checkpoints, training logs, and samples are saved; -d specifies the dataset; -mh applies MoE to the LM head; -t performs training (defaults to Plain); and -n specifies the number of samples to produce after training. Many other flags control further aspects of training and sampling; to list them all, run:
python trainplain.py --help
Credit: Some code in the src directory is from the Great repository.
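When launching several training runs, the demo training command shown earlier can be assembled programmatically. The sketch below uses only the five flags documented in this README (-p, -d, -mh, -t, -n); the function `build_train_command` is a hypothetical helper, and the assumption that omitting -mh or -t simply disables that behavior should be checked against `python trainplain.py --help`.

```python
import shlex


def build_train_command(ckpt_dir, dataset, n_samples,
                        moe_head=True, train=True, python="python"):
    """Build the trainplain.py argument list from the flags documented above."""
    cmd = [python, "trainplain.py", "-p", str(ckpt_dir), "-d", dataset]
    if moe_head:
        cmd.append("-mh")  # apply MoE to the LM head
    if train:
        cmd.append("-t")   # perform training (Plain by default)
    cmd += ["-n", str(n_samples)]  # number of samples to produce after training
    return cmd


if __name__ == "__main__":
    # Reproduces the demo command from this README.
    print(shlex.join(build_train_command("./ckpt", "diabetes", 10)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell-quoting problems with paths that contain spaces.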
If you use Tabby, please cite:
@misc{cromp2026tabbylanguagemodelarchitecture,
title={Tabby: A Language Model Architecture for Tabular and Structured Data Synthesis},
author={Sonia Cromp and Satya Sai Srinath Namburi GNVV and Mohammed Alkhudhayri and Catherine Cao and Samuel Guo and Nicholas Roberts and Frederic Sala},
year={2026},
eprint={2503.02152},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2503.02152},
}
Thanks for checking out Tabby! We'd love to hear any thoughts you might have. Feel free to contact me (Sonia Cromp) at [[email protected]].
