
DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

Haolei Bai¹, Lingcheng Kong¹˒², Xueyi Chen¹, Jiamian Wang³, Zhiqiang Tao³, Huan Wang¹

¹Westlake University, ²The Hong Kong University of Science and Technology, ³Rochester Institute of Technology

[HuggingFace Daily Paper]

🗓️ Plan

  • 2026.03.03: We release our training and evaluation code!
  • 2026.02.13: We release DICE-1.7B, DICE-4B, and DICE-8B on Hugging Face!
  • 2026.02.13: The paper is on arXiv!

🚀 Usage

1. Install dependencies

```bash
conda env create -f environment.yml
```

2. Training

2.1 Prepare dataset

We provide the curated CuKe dataset for SFT in

`DICE/training/sft/llama_factory_sdar/data/CuKe_dataset.json`

We provide the data for the two stages of BiC-RL in the folder

`DICE/rl_data`
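Before training, it is worth confirming that the SFT file parses and holds the expected number of examples. The sketch below is illustrative only: it assumes the dataset is a single JSON array of example dicts (the common LLaMA-Factory layout); the actual schema of `CuKe_dataset.json` may differ.

```python
import json

def summarize_sft_dataset(path):
    """Load a JSON SFT dataset and report basic statistics.

    Assumes the file is one JSON array of example dicts, as is typical
    for LLaMA-Factory-style datasets; adjust if CuKe_dataset.json uses
    a different layout.
    """
    with open(path) as f:
        data = json.load(f)
    return {
        "num_examples": len(data),
        "first_keys": sorted(data[0].keys()) if data else [],
    }
```

A quick call such as `summarize_sft_dataset("data/CuKe_dataset.json")` then reports the example count and the field names of the first record.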

2.2 Supervised fine-tuning

We follow the training process of SDAR; you may check here for more instructions.

```bash
cd DICE/training/sft/llama_factory_sdar
torchrun --nnodes 1 --node_rank 0 --nproc_per_node 8 --master_addr 127.0.0.1 --master_port 12345 ./src/llamafactory/launcher.py ./examples/train_full_sdar/sdar_8b_full.yaml
```

2.3 BiC-RL

```bash
cd DICE/training/rl
# kernel infilling stage
python rl.py config=configs/rl_sdar_kernel_infilling-8b.yaml
# end-to-end kernel generation stage
python rl.py config=configs/rl_sdar_kernel_final-8b.yaml
```
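The `config=...` arguments above follow the common key=value override style for RL training scripts. The snippet below is a hypothetical illustration of how such overrides can be parsed into a nested config dict — it is not DICE's actual argument parser, and the key names are made up for the example.

```python
def parse_overrides(argv):
    """Parse key=value CLI arguments into a nested dict.

    Dotted keys (e.g. "trainer.lr=1e-5") create nested sections.
    Illustrative sketch of the override style, not the repo's parser.
    """
    cfg = {}
    for arg in argv:
        key, _, value = arg.partition("=")
        node = cfg
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return cfg
```

For instance, `parse_overrides(["config=configs/rl_sdar_kernel_final-8b.yaml", "trainer.lr=1e-5"])` yields a dict whose `config` entry names the YAML file and whose `trainer` section holds the learning-rate override.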

3. Evaluation

We evaluate all models on KernelBench. You can train SDAR-series models with the provided training scripts, or directly download the DICE-series models from Hugging Face.

```bash
cd DICE/evaluation
# generation
python scripts/generate_samples.py run_name=DICE_8b_level_1 dataset_src=huggingface level=1 use_local_model=True local_model_path="/path/to/DICE-8B/" gen_length=4096
python scripts/generate_samples.py run_name=DICE_8b_level_2 dataset_src=huggingface level=2 use_local_model=True local_model_path="/path/to/DICE-8B/" gen_length=4096
python scripts/generate_samples.py run_name=DICE_8b_level_3 dataset_src=huggingface level=3 use_local_model=True local_model_path="/path/to/DICE-8B/" gen_length=4096

# evaluation
python scripts/eval_from_generations.py run_name=DICE_8b_level_1 dataset_src=local level=1 timeout=300

# you need to first obtain the baseline time on your hardware (please refer to KernelBench)
python scripts/benchmark_eval_analysis.py run_name=DICE_8b_level_1 level=1 hardware=A100 baseline=baseline_time_torch
```
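The analysis step compares generated kernels against the baseline time measured on your hardware. As a rough sketch of that kind of aggregation — with made-up field names, since KernelBench's actual output schema is not shown here — one can compute a pass rate and a geometric-mean speedup over the correct kernels:

```python
import math

def summarize_kernel_results(results, baseline_times):
    """Aggregate per-problem kernel results into summary metrics.

    `results` maps problem id -> {"correct": bool, "runtime_ms": float};
    `baseline_times` maps problem id -> baseline runtime (ms) on the same
    hardware. Field names are illustrative, not KernelBench's schema.
    Speedup is the geometric mean over correct kernels only.
    """
    correct = [pid for pid, r in results.items() if r["correct"]]
    pass_rate = len(correct) / len(results) if results else 0.0
    if not correct:
        return {"pass_rate": pass_rate, "geomean_speedup": None}
    log_sum = sum(
        math.log(baseline_times[pid] / results[pid]["runtime_ms"])
        for pid in correct
    )
    return {
        "pass_rate": pass_rate,
        "geomean_speedup": math.exp(log_sum / len(correct)),
    }
```

The geometric mean is the usual choice here because per-problem speedups are ratios, so a single outlier does not dominate the aggregate the way it would with an arithmetic mean.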

🙌 Acknowledgement

We are grateful to the authors of SDAR, TraceRL, KernelBench, and cudaLLM for releasing their code publicly, which greatly facilitated our work.

📖 Citation

If you find DICE useful for your research or projects, please consider citing our work:

@article{bai2026dice,
  title={DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels},
  author={Bai, Haolei and Kong, Lingcheng and Chen, Xueyi and Wang, Jiamian and Tao, Zhiqiang and Wang, Huan},
  journal={arXiv preprint arXiv:2602.11715},
  year={2026}
}
