Log max_autotune exceptions (#159687) #159688
This pull request was exported from Phabricator. Differential Revision: D79420953
@pytorchbot label "topic: not user facing"
get_chromium_event_logger().add_event_data(
    event_name, autotune_choices_stats=payload
)
sys.stderr.write(f"Autotune Choices Stats:\n{payload}\n")
This is how we log stuff here? No logging API?
Happy to learn about other logging APIs.
The logging is mainly done by calling get_chromium_event_logger().add_event_data(), so the data becomes part of the _template_autotuning event.
The sys.stderr.write is just for local debugging; I can remove it.
Btw, this part of the code actually belongs to another PR (#159496) that has already landed. I'll make any necessary changes in a separate PR.
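On the reviewer's question about a logging API: one option for the local-debug line is Python's standard `logging` module rather than a raw `sys.stderr.write`. The following is only a minimal sketch; the logger name and the `log_autotune_stats` helper are hypothetical, and it assumes the payload is a JSON-serializable dict, which the PR does not confirm.

```python
import json
import logging

# Hypothetical logger name; the PR does not specify one.
log = logging.getLogger("autotune_debug_sketch")


def log_autotune_stats(payload: dict) -> str:
    """Serialize the stats payload and emit it at DEBUG level.

    Returns the serialized string so callers can inspect what was logged.
    """
    text = json.dumps(payload, sort_keys=True)
    # Lazy %-style formatting: the final message is only built when DEBUG is enabled.
    log.debug("Autotune Choices Stats:\n%s", text)
    return text
```

With this shape, the debug output stays silent by default and can be switched on through the usual logging configuration instead of editing the code.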
stashuk-olek left a comment
lgtm
Summary:
Exceptions during autotune kernel precompilation are now systematically captured and reported via the chromium_event_logger, enabling better debugging and analysis of autotune failures.
Currently, exceptions are dumped to the console in the following format:
```
[0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 262144 Hardware limit:232448 Reducing block sizes or `num_stages` may help.
[0/0] Runtime error during autotuning:
[0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 262144 Hardware limit:232448 Reducing block sizes or `num_stages` may help..
[0/0] Ignoring this choice.
```
With this change, the exception details will also be logged in the metadata of the `{name}_template_precompiling` event.
The format:
```
{
"exceptions": [
{
"choice_type": "triton",
"choice": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4, num_consumer_groups=0, num_buffers_warp_spec=0",
"exception_message": "No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 262144 Hardware limit:232448 Reducing block sizes or `num_stages` may help.",
"exception": "OutOfMemoryError",
"required_memory": "262144",
"hardware_limit": "232448"
}
]
}
```
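To illustrate how fields such as `exception`, `required_memory`, and `hardware_limit` could be derived from the console message shown earlier, here is a small sketch. The function name and both regular expressions are assumptions made for illustration; the PR does not show how its extraction is actually implemented.

```python
import re
from typing import Dict

# Both patterns are assumptions based on the sample message in this PR,
# not code taken from the PR itself.
_OUT_OF_RESOURCE_RE = re.compile(r"Required:\s*(\d+)\s*Hardware limit:\s*(\d+)")
_WRAPPED_TYPE_RE = re.compile(r"No valid triton configs\.\s*(\w+):")


def parse_autotune_exception(message: str) -> Dict[str, str]:
    """Build a record shaped like one entry of the "exceptions" list."""
    info = {"exception_message": message}

    # The wrapping RuntimeError embeds the inner exception's type name.
    wrapped = _WRAPPED_TYPE_RE.search(message)
    if wrapped:
        info["exception"] = wrapped.group(1)

    # Out-of-resource errors carry the required and available sizes.
    limits = _OUT_OF_RESOURCE_RE.search(message)
    if limits:
        info["required_memory"] = limits.group(1)
        info["hardware_limit"] = limits.group(2)
    return info
```

Messages that do not match either pattern simply yield a record with only `exception_message`, so unexpected error formats are still reported.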
Test Plan:
buck2 run //scripts/wychi:test_autotune_mm 2>&1 > /tmp/mylog.txt
Rollback Plan:
Reviewed By: stashuk-olek
Differential Revision: D79420953
if not pt2_compile_substack:
    return

current_event = pt2_compile_substack[-1]
This smells like an anti-pattern. @jamesjwu wdyt?
This is not ideal, but it involves minimal changes. The assumption here is that the log occurs within the _template_precompiling event, which holds true with the current implementation.
Another approach would be to pass the exceptions by altering the return type of make_precompile_fn() and wait_on_futures().
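The alternative mentioned here, returning the exceptions to the caller instead of attaching them to whatever event is on top of the stack, might look roughly like the sketch below. It is not the real `make_precompile_fn()` or `wait_on_futures()`; the `_sketch` functions, their signatures, and the record fields are hypothetical stand-ins built on `concurrent.futures`.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def wait_on_futures_sketch(
    futures: Dict[Future, str],
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Wait for every future; return (succeeded choices, exception records)."""
    succeeded: List[str] = []
    exceptions: List[Dict[str, str]] = []
    for future, choice_name in futures.items():
        exc = future.exception()  # blocks until this future has finished
        if exc is None:
            succeeded.append(choice_name)
        else:
            exceptions.append(
                {
                    "choice": choice_name,
                    "exception": type(exc).__name__,
                    "exception_message": str(exc),
                }
            )
    return succeeded, exceptions


def precompile_all_sketch(
    choices: Dict[str, Callable[[], None]],
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Run each (hypothetical) precompile callable on a thread pool."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(fn): name for name, fn in choices.items()}
        return wait_on_futures_sketch(futures)
```

The caller that owns the event could then log the returned records itself, which avoids depending on which event happens to be current when the failure occurs. The trade-off is the wider signature change noted in the comment above.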
Since the "anti-pattern" is not very harmful, I want to proceed with the change to gather some preliminary data.
Based on whether the data proves useful or valuable, I will either revert the PR if it’s not helpful or find a better location to log the errors if further action is needed.
sounds good
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
Summary:
Exceptions during autotune kernel precompilation are now systematically captured and reported via the chromium_event_logger, enabling better debugging and analysis of autotune failures.
Currently, exceptions are dumped to the console in the following format:
```
[0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 262144 Hardware limit:232448 Reducing block sizes or `num_stages` may help.
[0/0] Runtime error during autotuning:
[0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 262144 Hardware limit:232448 Reducing block sizes or `num_stages` may help..
[0/0] Ignoring this choice.
```
The exception tracebacks:
```
# inner exception
traceback:
File "/torch/_inductor/runtime/triton_heuristics.py", line 603, in _make_launchers
launchers.append(result.make_launcher())
^^^^^^^^^^^^^^^^^^^^^^
File "/torch/_inductor/runtime/triton_heuristics.py", line 1503, in make_launcher
self.kernel.load_kernel(device)
File "/torch/_inductor/runtime/static_cuda_launcher.py", line 113, in load_kernel
(self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel(
# wrapped exception
traceback:
File "/usr/local/fbcode/platform010/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<trimmed>#link-tree/torch/_inductor/select_algorithm.py", line 2596, in precompile_with_captured_stdout
choice.precompile()
File "<trimmed>#link-tree/torch/_inductor/select_algorithm.py", line 1881, in precompile
self.bmreq.precompile()
File "<trimmed>#link-tree/torch/_inductor/autotune_process.py", line 660, in precompile
getattr(mod, self.kernel_name).precompile()
File "<trimmed>#link-tree/torch/_inductor/runtime/triton_heuristics.py", line 440, in precompile
self._make_launchers()
File "<trimmed>#link-tree/torch/_inductor/runtime/triton_heuristics.py", line 608, in _make_launchers
raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}")
```
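The two tracebacks reflect a wrap-and-reraise pattern: the inner error is caught and a `RuntimeError` is raised whose message embeds the inner type name. The sketch below reproduces only that message shape. `OutOfMemoryError` here is a local stand-in class, the `_sketch` functions are hypothetical, and raising from inside the `except` block is an assumption about the surrounding code.

```python
class OutOfMemoryError(Exception):
    """Local stand-in for the inner error type; not the real Triton/torch class."""


def load_kernel_sketch() -> None:
    # Simulates the failure raised while loading the kernel.
    raise OutOfMemoryError(
        "out of resource: triton_mm Required: 262144 Hardware limit:232448"
    )


def make_launchers_sketch() -> None:
    try:
        load_kernel_sketch()
    except Exception as exc:
        # Same message shape as the `raise` on the last traceback line above.
        # Raising inside `except` keeps the inner error on `__context__`.
        raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}")
```

Because the inner type name is part of the outer message, a consumer that sees only the message text can still recover that the root cause was an `OutOfMemoryError`.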
With this change, the exception details will also be logged in the metadata of the `{name}_template_precompiling` event.
The format:
```
{
"exceptions": [
{
"choice_type": "triton",
"choice": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=4, num_consumer_groups=0, num_buffers_warp_spec=0",
"exception_message": "No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 262144 Hardware limit:232448 Reducing block sizes or `num_stages` may help.",
"exception": "OutOfMemoryError",
"required_memory": "262144",
"hardware_limit": "232448"
}
]
}
```
Test Plan:
buck2 run //scripts/wychi:test_autotune_mm 2>&1 > /tmp/mylog.txt
Rollback Plan:
Differential Revision: D79420953
Pull Request resolved: pytorch#159688
Approved by: https://github.com/stashuk-olek
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben