
Conversation

@simonguozirui
Collaborator

@simonguozirui simonguozirui commented Dec 23, 2025

Kernel Static Checker - Pattern-based validation for GPU kernel code.
See src/kernel_static_checker.py

The goal is to flag reward hacking patterns (both strictly prohibited ones and potentially problematic ones) by statically examining the code.

Warning: This list is by no means complete, nor is it a replacement for runtime checks.
We welcome feedback and contributions as the community finds new ways to hack.

  • Bypass hacks (PyTorch wrapping, try-except fallback, inheritance bypass)
  • Disallowed high-level torch operations (depending on the setting)
  • Backend implementation requirements (CUDA or DSL features must actually be used)

Usage:
result = validate_kernel_static(code, backend="cuda")
The call returns a tuple (valid, errors, warnings).
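
A minimal usage sketch, assuming the signature above; the import path and the shape of errors/warnings (lists of strings here) are assumptions:

# Minimal usage sketch; import path and error/warning payloads are assumptions.
from kernel_static_checker import validate_kernel_static

with open("generated_kernel.py") as f:
    kernel_code = f.read()

valid, errors, warnings = validate_kernel_static(kernel_code, backend="cuda")
if not valid:
    for err in errors:
        print("ERROR:", err)
for warn in warnings:
    print("WARNING:", warn)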

Right now, all these checks are regex-matching-based.
In the future we can add AST-based checking (@PaliC) to make the checks more reliable, along with more runtime-level checks and an LM-as-a-judge.

Source of checks:

  • Kevin paper, especially the reward hacking part (Appendix F)
  • Snippets of code from our other projects
  • DeepReinforce's awesome blogpost

@simonguozirui simonguozirui changed the title [WIP] Checker of Kernel Code to ensure Language [WIP] Static Kernel Code Checker Jan 4, 2026
Collaborator

@PaliC PaliC left a comment


There's a lot of work to do. One immediate piece of feedback: if you want to ban something, I would just replace it with a useful error, or use ast to capture it (as you'd need to do for try/except) and emit a useful error when those nodes show up.
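
A minimal sketch of the AST idea for the try/except case, assuming plain CPython ast; the function name and error wording are illustrative:

import ast

def find_try_except_fallbacks(kernel_code: str) -> list[str]:
    # Walk the parsed module and flag every try/except block with a useful error.
    errors = []
    for node in ast.walk(ast.parse(kernel_code)):
        if isinstance(node, ast.Try):
            errors.append(
                f"line {node.lineno}: try/except detected; falling back to a "
                "reference implementation on failure is not allowed"
            )
    return errors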

uv.lock Outdated
Collaborator


As discussed, remove this :P

More seriously, just update .gitignore.

@@ -0,0 +1,270 @@
"""
Tests for kernel_static_checker.py
Collaborator


Would recommend using fixtures for this (a common strategy in web dev / JS testing).

I can give you some examples if you want

Collaborator Author


Yeah, I will clean up the test file with some adversarial examples and reusable fixtures; this was just a start.

has_tilelang = "@T.prim_func" in kernel_code or "tvm.build" in kernel_code

has_custom_implementation = any([
    has_triton_kernel,
Collaborator

@PaliC PaliC Jan 4, 2026


For the Python DSLs, an idea would be to patch the decorators so they do some inspection / timing to validate that the kernel is actually executed and takes up most of the runtime.
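
A hypothetical sketch of that patching idea, kept generic; the launch log and the plain-call launch convention are illustrative (real DSLs such as Triton launch via kernel[grid](...), so a production patch would hook that path instead):

import functools
import time

launch_log = []  # (kernel name, wall-clock seconds) per observed launch

def instrument(decorator):
    # Wrap a DSL decorator so every launch of the decorated kernel is recorded.
    def patched(fn):
        compiled = decorator(fn)
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            out = compiled(*args, **kwargs)
            launch_log.append((fn.__name__, time.perf_counter() - start))
            return out
        return wrapper
    return patched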

Collaborator Author


good idea, we can use that in the runtime-based validation later on.

# Kevin Rule 1: PyTorch wrapping detection
# Zero reward for kernels containing torch.nn or torch.nn.functional
pytorch_patterns = [
"torch.nn.functional",
Collaborator


I would just overwrite these libraries to emit a useful error if you want to ban torch
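
A hypothetical sketch of the overwrite idea at evaluation time; the op list is an illustrative subset, not the checker's actual ban list:

import torch.nn.functional as F

def _banned(name):
    def stub(*args, **kwargs):
        raise RuntimeError(f"{name} is banned: implement this op as a custom kernel")
    return stub

# Illustrative subset of ops to replace with error-raising stubs.
for op in ("softmax", "relu", "layer_norm"):
    setattr(F, op, _banned(f"torch.nn.functional.{op}"))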

Collaborator Author


I see what you mean by overwriting now. Let's keep this a static checker for now; we can think about smarter runtime checks later.

"torch.softmax", "torch.log_softmax", "torch.tanh", "torch.sigmoid",
"torch.hardsigmoid", "torch.silu", "torch.mish",
# Normalization
"torch.batch_norm", "torch.group_norm", "torch.layer_norm",
Collaborator


Just overwrite these with an error; I think that's better.

@simonguozirui simonguozirui changed the title [WIP] Static Kernel Code Checker Static Kernel Code Checker Jan 5, 2026
@alexzhang13
Collaborator

These are cool. I didn't take a careful enough look to make precise comments, but in general I like the idea of all these tests. Also make sure people have the option to use or not use them (especially for the L3+ kernels, where it might be too hard to avoid PyTorch ops entirely).

Run with pytest:
pytest src/kernelbench/unit_tests/test_precision.py -v
or
uv run pytest src/kernelbench/unit_tests/test_precision.py -v
Collaborator Author


@ethanboneh this is a bit redundant and too extensive; perhaps we can clean it up with the real precision checks in the results PR.

@simonguozirui
Collaborator Author

Thanks to @alexzhang13 and @PaliC for the thoughtful reviews and discussions!

The goal of this PR is to introduce an initial checker system. While I like @PaliC’s AST-based checker idea, this PR focuses on a regex-based approach, which already catches a large class of problematic patterns in practice. Many of these patterns are derived from observations during Kevin RL training, as well as issues highlighted in Jiwei’s recent blog post.

One advantage of regex-based checking is that it allows us to efficiently scan and validate large numbers of already-generated kernels. Going forward, we can complement this with more dynamic checkers in the evaluation pipeline, and I also plan to explore adding an LM-as-a-judge checker.
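
A minimal sketch of that batch-scanning use, assuming the import path and a flat directory of generated kernel files (both are assumptions):

from pathlib import Path
from kernel_static_checker import validate_kernel_static

results = {}
for path in Path("generated_kernels").glob("*.py"):
    valid, errors, warnings = validate_kernel_static(path.read_text(), backend="cuda")
    results[path.name] = (valid, errors, warnings)

flagged = [name for name, (ok, _, _) in results.items() if not ok]
print(f"{len(flagged)} / {len(results)} kernels flagged by static checks")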

This work is part of our ongoing effort to mitigate reward hacking, as discussed in #74, and we hope it can serve as a useful resource for the community. The current set of patterns is not meant to be exhaustive, and contributions of additional checks are very welcome.

Since this system is still experimental, we will merge it but it is gated behind a flag. Users can configure groups of patterns that are either disallowed or treated as warnings, depending on their use case. See the file for more details on our suggestions for use cases.

In particular, a note on the degree to which PyTorch computational ops should be allowed:

# Since KernelBench problems use PyTorch as a reference, there could be settings where
# model-generated code:
# 1. Replaces some (not all) ops with custom kernels, while others are kept in Torch
#    --> More practical from a performance perspective (i.e., build better systems), since you want to use whatever makes the best system for your use case.
# 2. Replaces all computational ops with custom kernels
#    --> Could be helpful from an eval (model ability on transpile + optimization) / RL training perspective
# Depending on the setting you use, you can move the checks below (pytorch_wrap, torch_computation_ops)
# from WARNING to STRICT
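
A hypothetical configuration sketch of the WARNING/STRICT split described in the comment above; the group names and variables are illustrative, not the checker's actual API:

# Hypothetical grouping; the real flag and group names may differ.
STRICT_CHECKS = {"bypass_hacks", "backend_requirements"}
WARNING_CHECKS = {"pytorch_wrap", "torch_computation_ops"}

# Setting 2 (eval / RL training): require every computational op to be a custom kernel.
STRICT_CHECKS |= {"pytorch_wrap", "torch_computation_ops"}
WARNING_CHECKS -= {"pytorch_wrap", "torch_computation_ops"}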

# --- PyTorch NN Module Wrapping ---
# Allows: nn.Module, nn.Parameter, nn.ParameterList, nn.ParameterDict, 
#         nn.ModuleList, nn.ModuleDict, nn.init (needed for model structure)
# Blocks: nn.Linear, nn.Conv2d, nn.ReLU, etc. (compute layers)
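
A minimal regex sketch of this allow/block split; the helper name is illustrative:

import re

ALLOWED_NN = {"Module", "Parameter", "ParameterList", "ParameterDict",
              "ModuleList", "ModuleDict", "init"}

def find_blocked_nn_usages(code: str) -> list[str]:
    # Flag nn.<Name> usages that are compute layers rather than model structure.
    hits = []
    for match in re.finditer(r"\bnn\.([A-Za-z_]\w*)", code):
        if match.group(1) not in ALLOWED_NN:
            hits.append(match.group(0))
    return hits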
