-
Notifications
You must be signed in to change notification settings - Fork 110
Eval Unit Tests for Adversarial Eval Testing #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adds a unit test to check that a generated kernel which modifies the original inputs fails the correctness check. For the square matmul problem, the kernel zeros out the inputs and returns a matrix of 0s. This will fail correctness/pass the test as long as the reference implementation is ran first. If we swap the order, the test will fail as the reference implementation will operate on tensors of 0s and it will look like the generated kernel computed the correct output.
Adds a unit test to check that a generated kernel which attempts to access the result from the PyTorch reference model in memory fails the correctness check. If a generated kernel uses empty_like, the CUDA caching allocator can re-use the physical memory of the previously computed result. All the kernel needs to do is return immediately and it will pass the correctness check. Note that in order to reproduce this, we need to copy the PyTorch output to the CPU and delete the output object. Then empty_like will fetch the physical memory for the output object.
f24640f to
9097e65
Compare
use generic matmul shape for cache reuse adversarial kernel rather than requiring a square matmul.
c734bbc to
5bb2679
Compare
dc02d5c to
3d7ff72
Compare
…nto eval-unit-tests
make a non-blocking non-default stream, and use cublasGemmEx rather than at::matmul:
eval script now flags excessive speedups by timing pytorch reference.
|
Thanks @bkal01 to create the adversarial kernel with additional cuda stream.. Now we have unit test and eval timing functions that only time the main cuda_stream might suffer from such attack, but we have added a heuristics way to check it (see if speedup is bigger than some threshold like 10x or 5x). Here is an example using naive |
|
We added an optional and gated logic in the eval function |
|
Tysm @bkal01 for the great work and being super careful. These unit tests would be super helpful for us to test the eval function with adversarial examples. Merging these for now but we can add more later. Right now we added a simple excessive speedup check (heuristics like >5x, 10x) mark it as suspicious. A better approach is to create a SoL modeling (ongoing effort) based on program ops and hardware specs. Also started to add the draft of eval / benchmarking guide here. @PaliC and team will pick up in other PRs. |
adds unit tests for eval scripts
eval scripts should:
emptywhich can get allocated the same physical memory as the PyTorch reference outputsoutputobject at some point before the custom kernel is run, the CUDA cache allocator might give that un-erased physical memory to the custom kernel and it will incorrectly passKernelExecResultmetadata.