-
Notifications
You must be signed in to change notification settings - Fork 110
Simplified Thunderkittens Port #107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplified Thunderkittens Port #107
Conversation
1ce1f1b to
faf0935
Compare
|
We should add instructions for installing the TK dir in local environments to the README. Think there's also a mismatch with the CUDA version in the current TK repo too. It's hard coded to 12.6 if we rely on TK instructions to call 'source env.src' |
@willhu-jpg Added instructions to the README, lmk what you think. Also, could you clarify the CUDA version point? Like the tk-v2 branch uses a different version than kernelbench right now? |
|
README changes look good. For the CUDA version, checkout the env.src in the TK repo. Typical setup is to call "source env.src" to set all the environment variables, but they have it hardcoded to 12.6. |
willhu-jpg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look good otherwise! let's merge it :)
README.md
Outdated
| tk_root = os.environ.get("THUNDERKITTENS_ROOT", "/root/ThunderKittens") | ||
| ``` | ||
|
|
||
| This allows the kernel to include the right TK primitives.s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, fixed the typo. @simonguozirui TK works on CUDA versions 12.6 - 12.9, rn in modal scripts the system uses CUDA 12.8 by default which should be fine. Locally users will set CUDA_HOME, default is 12.6 for TK's env.src. Lmk if this arrangement sounds fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's awesome we will likely stick to CUDA 12.8 for modal container. I’ll add a note that local users should either override CUDA_HOME or modify env.src to point to the appropriate CUDA version on their local dev box.
|
|
||
| This allows the kernel to include the right TK primitives. | ||
|
|
||
| *NOTE*: Right now, all generated ThunderKittens kernels are required to be in datatype format BF16. FP16 support is TBD. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be good for now! Most TK kernels from the TK paper was done in BF16. We will add code path to restrict it.
| # thunderkittens requires bf16 and H100 GPU | ||
| if backend == "thunderkittens": | ||
| config.precision = "bf16" | ||
| config.gpu = "H100" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically blackwell and H200 too but we can worry about that later (i will address that in the enforcing H100 vs 200 PR)
| "--expt-extended-lambda", | ||
| "-DKITTENS_HOPPER", | ||
| "-DKITTENS_BLACKWELL", | ||
| "-diag-suppress=20012", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a smart way of doing it
| If you plan on using `scripts/generate_and_eval_single_sample.py` using `backend=thunderkittens`, make sure to git clone the ThunderKittens repo and you set the following environment variable to point to your local ThunderKittens directory: | ||
|
|
||
| ```bash | ||
| export THUNDERKITTENS_ROOT=/Users/willychan/Desktop/projects/KernelBench/ThunderKittens |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix to be generic and i will add some comments etc
|
Validate TK example execution both locally on H200 and modal H200 We will use matrix add example rather than vector add example for thunderkittens, similar to our last attempt so the tk example can leverage TK primitives and programming model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validated it works! I will add more comments in the next PR since I don't have push access to this branch on Willy's repo.
Thank you so much @Willy-Chan for making TK work easily for KernelBench (with the load_inline format rather than the TK python binding interface). This has long awaited since Simran and my first attempt before NeurIPS '24 (branch thunderkittens), and thank you for the many attempts #101 #104. Much thanks to @willhu-jpg @simran-arora for your advice and feedback too!
* run and check works with new TK format * generate_and_eval_single_sample working with TK backend * eval from generations changes * generate_samples working * generate_single_examples adapted * updated git cloning of TK repo: using the tk-v2 branch * removed unnecessary pip installs now that uv works * fixed bug to prompts.toml pointing to correct file * README explaining how to run with TK locally * nit typo fix * current commits only support bf16, clarification to README
Thunderkittens backend support, but with a simplified load_inline() implementation.
This turns out to be much simpler than using separate files, but has its own tradeoffs and implications.
Also, users don't have to put the TK repo in the root directory, it's automatically cloned to the Modal.