Haolei Bai1, Lingcheng Kong1,2, Xueyi Chen1, Jiamian Wang3, Zhiqiang Tao3, Huan Wang1
1Westlake University, 2The Hong Kong University of Science and Technology, 3Rochester Institute of Technology
## News

- 2026.03.03: We release our training and evaluation code!
- 2026.02.13: We release DICE-1.7B, DICE-4B, and DICE-8B on Hugging Face!
- 2026.02.13: The paper is on arXiv!
## Installation

```bash
conda env create -f environment.yml
```

## Data

We provide the curated CuKe dataset for SFT in `DICE/training/sft/llama_factory_sdar/data/CuKe_dataset.json`, and the data for the two stages of BiC-RL in the folder `DICE/rl_data`.

## Training

### SFT

We follow the training process of SDAR; you may check here for more instructions.

```bash
cd DICE/training/sft/llama_factory_sdar
torchrun --nnodes 1 --node_rank 0 --nproc_per_node 8 --master_addr 127.0.0.1 --master_port 12345 ./src/llamafactory/launcher.py ./examples/train_full_sdar/sdar_8b_full.yaml
```

### BiC-RL

```bash
cd DICE/training/rl

# kernel infilling stage
python rl.py config=configs/rl_sdar_kernel_infilling-8b.yaml

# end-to-end kernel generation stage
python rl.py config=configs/rl_sdar_kernel_final-8b.yaml
```

## Evaluation

We evaluate all models on KernelBench. You can train the SDAR-series models with the provided training scripts, or directly download the DICE-series models from Hugging Face.
```bash
cd DICE/evaluation

# generation
python scripts/generate_samples.py run_name=DICE_8b_level_1 dataset_src=huggingface level=1 use_local_model=True local_model_path="/path/to/DICE-8B/" gen_length=4096
python scripts/generate_samples.py run_name=DICE_8b_level_2 dataset_src=huggingface level=2 use_local_model=True local_model_path="/path/to/DICE-8B/" gen_length=4096
python scripts/generate_samples.py run_name=DICE_8b_level_3 dataset_src=huggingface level=3 use_local_model=True local_model_path="/path/to/DICE-8B/" gen_length=4096

# evaluation
python scripts/eval_from_generations.py run_name=DICE_8b_level_1 dataset_src=local level=1 timeout=300
```
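A KernelBench-style harness judges a generated kernel on two axes: numerical correctness against the reference PyTorch module, and wall-clock speedup over a baseline measured on your hardware. The sketch below illustrates those two checks with made-up numbers; the function names, tolerances, and measurements are ours for illustration, not KernelBench's actual API.

```python
# Minimal sketch of the two checks a KernelBench-style harness performs:
# (1) the candidate kernel's outputs match the reference within tolerance,
# (2) the candidate is faster than the measured baseline.
# All names and numbers here are illustrative, not KernelBench's code.

def is_correct(candidate_out, reference_out, atol=1e-4):
    """Element-wise closeness check, analogous to torch.allclose."""
    if len(candidate_out) != len(reference_out):
        return False
    return all(abs(c - r) <= atol for c, r in zip(candidate_out, reference_out))

def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate runs than the baseline."""
    return baseline_ms / candidate_ms

# Illustrative per-task results: (passed correctness?, baseline ms, candidate ms)
results = [
    (True, 2.0, 1.0),   # correct and 2.0x faster
    (True, 3.0, 4.0),   # correct but slower than baseline
    (False, 1.5, 0.5),  # fast but wrong -> does not count
]

# fast_1: fraction of tasks that are both correct and faster than baseline
fast_1 = sum(ok and speedup(b, c) > 1.0 for ok, b, c in results) / len(results)
print(f"fast_1 = {fast_1:.2f}")  # only the first task qualifies
```

A kernel that is fast but numerically wrong contributes nothing to the score, which is why the correctness gate runs before any timing comparison.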
You need to first obtain the baseline time on your hardware (please refer to KernelBench), then run the analysis:

```bash
python scripts/benchmark_eval_analysis.py run_name=DICE_8b_level_1 level=1 hardware=A100 baseline=baseline_time_torch
```

## Acknowledgements

We are grateful to SDAR, TraceRL, KernelBench, and cudaLLM for releasing their code publicly, which greatly facilitated our work.

## Citation
If you find DICE useful for your research or projects, please consider citing our work:
```bibtex
@article{bai2026dice,
  title={DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels},
  author={Bai, Haolei and Kong, Lingcheng and Chen, Xueyi and Wang, Jiamian and Tao, Zhiqiang and Wang, Huan},
  journal={arXiv preprint arXiv:2602.11715},
  year={2026}
}
```