
Conversation

@simonguozirui (Collaborator) commented Nov 3, 2025

KernelBench by default uses PyTorch's default tensor precision, which is fp32.
All results reported on KernelBench so far are in fp32.

However, as more inference and training techniques move toward lower precision, it is important that we support a variety of precisions to understand performance comprehensively. This addresses the issue raised in #79.

Specifically:

  • Precision support (fp32, fp16, bf16): cast inputs and weights into the target precision
  • Tolerance mapping for the various precisions during eval, referencing PyTorch's suggested tolerances
  • TileLang DSL support - this was previously blocked because TileLang operates on fp16 and bf16

We will also add this info in the model generation prompt in another PR.

Now, for a KernelBench run, you can specify the desired precision as an argument.
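As a rough illustration, the casting step could look like the sketch below. The dtype map, function name, and argument spelling here are assumptions for illustration, not the exact KernelBench API.

```python
# Minimal sketch: map a precision string to a torch dtype and cast the model
# weights and floating-point inputs before the forward pass. Names are assumed.
import torch

DTYPE_MAP = {
    "fp32": torch.float32,
    "fp16": torch.float16,
    "bf16": torch.bfloat16,
}

def cast_to_precision(model: torch.nn.Module, inputs, precision: str = "fp32"):
    """Cast model weights and inputs to the target precision (same for both)."""
    dtype = DTYPE_MAP[precision]
    model = model.to(dtype=dtype)
    # Only cast floating-point tensors; leave integer inputs (e.g. indices) untouched.
    inputs = [
        x.to(dtype=dtype) if torch.is_tensor(x) and x.is_floating_point() else x
        for x in inputs
    ]
    return model, inputs
```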

initial implementation for various precision support on input and weights (during forward pass)

use same precision for both

Co-authored-by: Simon Guo <[email protected]>
Co-authored-by: Sahan Paliskara <[email protected]>
@PaliC (Collaborator) commented Nov 3, 2025

For the tolerances, we took inspiration from torchbench. You can look at all of the references / tolerance decisions here.

The specific tolerances we used are inspired by here, plus backendbench, where we use 1e-02 for everything at fp16.
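For reference, a minimal sketch of how such a precision-to-tolerance mapping might be expressed. Only the fp16 value (1e-02) comes from the comment above; the fp32 and bf16 entries and all names are illustrative assumptions, not the exact values or API in the PR.

```python
# Sketch of a precision-to-tolerance mapping used during correctness checks.
import torch

TOLERANCES = {
    "fp32": {"atol": 1e-5, "rtol": 1e-5},  # assumption
    "fp16": {"atol": 1e-2, "rtol": 1e-2},  # per the comment: 1e-02 for everything at fp16
    "bf16": {"atol": 1e-2, "rtol": 1e-2},  # assumption (bf16 has fewer mantissa bits than fp16)
}

def outputs_match(ref: torch.Tensor, out: torch.Tensor, precision: str) -> bool:
    """Compare a candidate kernel's output against the reference at the given precision."""
    tol = TOLERANCES[precision]
    return torch.allclose(ref, out, atol=tol["atol"], rtol=tol["rtol"])
```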

@simonguozirui simonguozirui added the enhancement label Nov 3, 2025
@simonguozirui simonguozirui changed the title Precision Support + Cache Clear Precision Support + TileLang Support Nov 4, 2025
@simonguozirui simonguozirui changed the title Precision Support + TileLang Support Precision Support + TileLang Integration Nov 4, 2025
@nathanjpaek (Collaborator) left a comment


lgtm, added tilelang

@simonguozirui (Collaborator, Author)

Thanks to @nathanjpaek for the comprehensive testing! We verified that, after specifying the precision argument, the eval function uses the target precision. We also checked timing across the different precisions and tested against the main-branch version without an explicit precision specification (which is fp32).

With bf16 and fp16 support, we can add more DSLs and support more targeted optimizations.

@simonguozirui simonguozirui merged commit 2c3dbda into main Nov 5, 2025
siddagra pushed a commit to siddagra/KernelBench that referenced this pull request Nov 10, 2025
* initial implementation for various precision support on input and weights (during forward pass)
use same precision for both

Co-authored-by: Simon Guo <[email protected]>
Co-authored-by: Sahan Paliskara <[email protected]>

* add tilelang

* update requirements for tilelang

* add precision to other files

* tested and updated readme

---------

Co-authored-by: Sahan Paliskara <[email protected]>
Co-authored-by: Nathan Paek <[email protected]>
Co-authored-by: nathanjp <[email protected]>