Skip to content

Conversation

@Willy-Chan
Copy link
Collaborator

Thunderkittens backend support, but with a simplified load_inline() implementation.

This turns out to be much simpler than using separate files, but has its own tradeoffs and implications.

Also, users don't have to put the TK repo in the root directory, it's automatically cloned to the Modal.

@willhu-jpg
Copy link
Collaborator

willhu-jpg commented Dec 31, 2025

We should add instructions for installing the TK dir in local environments to the README. Think there's also a mismatch with the CUDA version in the current TK repo too. It's hard coded to 12.6 if we rely on TK instructions to call 'source env.src'

@Willy-Chan
Copy link
Collaborator Author

Willy-Chan commented Jan 1, 2026

We should add instructions for installing the TK dir in local environments to the README. Think there's also a mismatch with the CUDA version in the current TK repo too. It's hard coded to 12.6 if we rely on TK instructions to call 'source env.src'

@willhu-jpg Added instructions to the README, lmk what you think. Also, could you clarify the CUDA version point? Like the tk-v2 branch uses a different version than kernelbench right now?

@willhu-jpg
Copy link
Collaborator

README changes look good. For the CUDA version, checkout the env.src in the TK repo. Typical setup is to call "source env.src" to set all the environment variables, but they have it hardcoded to 12.6.

@willhu-jpg willhu-jpg self-requested a review January 2, 2026 03:03
Copy link
Collaborator

@willhu-jpg willhu-jpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes look good otherwise! let's merge it :)

README.md Outdated
tk_root = os.environ.get("THUNDERKITTENS_ROOT", "/root/ThunderKittens")
```

This allows the kernel to include the right TK primitives.s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit typo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, fixed the typo. @simonguozirui TK works on CUDA versions 12.6 - 12.9, rn in modal scripts the system uses CUDA 12.8 by default which should be fine. Locally users will set CUDA_HOME, default is 12.6 for TK's env.src. Lmk if this arrangement sounds fine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's awesome we will likely stick to CUDA 12.8 for modal container. I’ll add a note that local users should either override CUDA_HOME or modify env.src to point to the appropriate CUDA version on their local dev box.


This allows the kernel to include the right TK primitives.

*NOTE*: Right now, all generated ThunderKittens kernels are required to be in datatype format BF16. FP16 support is TBD.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be good for now! Most TK kernels from the TK paper was done in BF16. We will add code path to restrict it.

# thunderkittens requires bf16 and H100 GPU
if backend == "thunderkittens":
config.precision = "bf16"
config.gpu = "H100"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically blackwell and H200 too but we can worry about that later (i will address that in the enforcing H100 vs 200 PR)

"--expt-extended-lambda",
"-DKITTENS_HOPPER",
"-DKITTENS_BLACKWELL",
"-diag-suppress=20012",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a smart way of doing it

If you plan on using `scripts/generate_and_eval_single_sample.py` using `backend=thunderkittens`, make sure to git clone the ThunderKittens repo and you set the following environment variable to point to your local ThunderKittens directory:

```bash
export THUNDERKITTENS_ROOT=/Users/willychan/Desktop/projects/KernelBench/ThunderKittens
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix to be generic and i will add some comments etc

@simonguozirui
Copy link
Collaborator

simonguozirui commented Jan 3, 2026

Validate TK example execution both locally on H200 and modal H200

python3 scripts/run_and_check.py   ref_origin=local   ref_arch_src_path=src/kernelbench/prompts/model_ex_add_thunderkittens.py   kernel_src_path=src/kernelbench/prompts/model_new_ex_add_thunderkittens.py   eval_mode=<local, modal>   gpu=H200   backend=thunderkittens precision=bf16

We will use matrix add example rather than vector add example for thunderkittens, similar to our last attempt so the tk example can leverage TK primitives and programming model.

Copy link
Collaborator

@simonguozirui simonguozirui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validated it works! I will add more comments in the next PR since I don't have push access to this branch on Willy's repo.

Thank you so much @Willy-Chan for making TK work easily for KernelBench (with the load_inline format rather than the TK python binding interface). This has long awaited since Simran and my first attempt before NeurIPS '24 (branch thunderkittens), and thank you for the many attempts #101 #104. Much thanks to @willhu-jpg @simran-arora for your advice and feedback too!

@simonguozirui simonguozirui merged commit f393682 into ScalingIntelligence:main Jan 3, 2026
ethanboneh pushed a commit that referenced this pull request Jan 6, 2026
* run and check works with new TK format

* generate_and_eval_single_sample working with TK backend

* eval from generations changes

* generate_samples working

* generate_single_examples adapted

* updated git cloning of TK repo: using the tk-v2 branch

* removed unnecessary pip installs now that uv works

* fixed bug to prompts.toml pointing to correct file

* README explaining how to run with TK locally

* nit typo fix

* current commits only support bf16, clarification to README
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants