-
Notifications
You must be signed in to change notification settings - Fork 26.3k
[aotd] Subclasses profile logging #136478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136478
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (3 Unrelated Failures)As of commit d7fa84c with merge base 87f1990 ( BROKEN TRUNK - The following jobs failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
torch/_functorch/config.py
Outdated
| # adds patch, save_config, invalid config checks, etc | ||
| install_config_module(sys.modules[__name__]) | ||
|
|
||
| aotd_debug_profile: bool = os.environ.get("PYTORCH_AOTD_DEBUG_PROFILE", "0") == "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs docs
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
The goal is: Optional profiling of AOTD Runtime overhead, to chromium_logger. Changes to ChromiumLogger: Adding listener to be able to subscribe to events of the logger to: 1/ Verify in the tests that events are sent correctly 2/ Use the data from events for adhoc perf analysis of changes in AOTD ``` TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging ``` ``` SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns ``` Perfetto trace:  cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
| ) | ||
|
|
||
| # Uncomment for adhoc aotd overhead perf experimentation | ||
| _print_stat(" SUBCLASS") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be commented out before you check it in (So we don't print during CI runs by default)?
Also, I think you mentioned in our 1:1 - but were you thinking of adding this into a benchmark somewhere that we can track over time for regressions? (e.g. we have some benchmarks here, although they are mainly compile-time related https://github.com/pytorch/pytorch/tree/main/benchmarks/dynamo/pr_time_benchmarks/benchmarks)
torch/_dynamo/utils.py
Outdated
| # TODO: log to init/id tlparse after I add support for it | ||
| log.info("ChromiumEventLogger initialized with id %s", self.id_) | ||
|
|
||
| def add_listener(self, listener: Callable[[Dict[str, Any]], None]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jamesjwu, since IIRC you added the ChromiumEventLogger code
| @staticmethod | ||
| def forward(ctx, *deduped_flat_tensor_args): | ||
| if torch._functorch.config.aotd_debug_profile: | ||
| get_chromium_event_logger().log_event_start( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you sanity check that get_chromium_event_logger().log_event_start() is fast enough that it doesn't throw off our measurements? (especially since these runtime overhead benchmarks are in the range of micro-to-milliseconds, while most compile time benchmarks are seconds-to-minutes)
|
Does this subsume #135501 ? |
jamesjwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable from chromium event side. Can you run a benchmark like nanogpt with training enabled and post the full chromium trace after adding this PR?
Overall, I think I'd just like a little more documentation on what AOTD is somewhere near these events/config
|
|
||
| def runtime_wrapper(args: List[Any]): | ||
| if torch._functorch.config.aotd_debug_profile: | ||
| get_chromium_event_logger().log_event_start( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but this causes the event to be logged at runtime instead of compile time, correct? (I.e. the compiled function gets run by the runtime wrapper in between log event start and log event end). So you're measuring the entire runtime of the compiled function by doing this. Is that actually the event time that you want to measure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, here I want to measure runtime overhead of subclasses. Wrapping-unwrapping in wrappers etc.
Do you know, if nanogpt uses subclasses? |
The goal is: Optional profiling of AOTD Runtime overhead, to chromium_logger. Changes to ChromiumLogger: Adding listener to be able to subscribe to events of the logger to: 1/ Verify in the tests that events are sent correctly 2/ Use the data from events for adhoc perf analysis of changes in AOTD ``` TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging ``` ``` SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns ``` Perfetto trace:  cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
The goal is: Optional profiling of AOTD Runtime overhead, to chromium_logger. Changes to ChromiumLogger: Adding listener to be able to subscribe to events of the logger to: 1/ Verify in the tests that events are sent correctly 2/ Use the data from events for adhoc perf analysis of changes in AOTD ``` TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging ``` ``` SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns ``` Perfetto trace:  cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
The goal is: Optional profiling of AOTD Runtime overhead, to chromium_logger. Changes to ChromiumLogger: Adding listener to be able to subscribe to events of the logger to: 1/ Verify in the tests that events are sent correctly 2/ Use the data from events for adhoc perf analysis of changes in AOTD ``` TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging ``` ``` SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns ``` Perfetto trace:  cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
The goal is: Optional profiling of AOTD Runtime overhead, to chromium_logger. Changes to ChromiumLogger: Adding listener to be able to subscribe to events of the logger to: 1/ Verify in the tests that events are sent correctly 2/ Use the data from events for adhoc perf analysis of changes in AOTD ``` TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging ``` ``` SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns ``` Perfetto trace:  cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Stack from ghstack (oldest at bottom):
The goal is:
Optional profiling of AOTD Runtime overhead, to chromium_logger.
Changes to ChromiumLogger:
Adding listener to be able to subscribe to events of the logger to:
1/ Verify in the tests that events are sent correctly
2/ Use the data from events for adhoc perf analysis of changes in AOTD
Perfetto trace:
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec