Skip to content

Conversation

@IvanKobzarev
Copy link
Contributor

@IvanKobzarev IvanKobzarev commented Sep 23, 2024

Stack from ghstack (oldest at bottom):

The goal is:
Optional profiling of AOTD Runtime overhead, to chromium_logger.

Changes to ChromiumLogger:

Adding listener to be able to subscribe to events of the logger to:
1/ Verify in the tests that events are sent correctly
2/ Use the data from events for adhoc perf analysis of changes in AOTD

TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging
   SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns
   SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns
NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns
NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns

Perfetto trace:

Screenshot 2024-09-24 at 16 22 54

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec

@pytorch-bot
Copy link

pytorch-bot bot commented Sep 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136478

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit d7fa84c with merge base 87f1990 (image):

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

IvanKobzarev added a commit that referenced this pull request Sep 23, 2024
ghstack-source-id: dd99cf4
Pull Request resolved: #136478
# adds patch, save_config, invalid config checks, etc
install_config_module(sys.modules[__name__])

aotd_debug_profile: bool = os.environ.get("PYTORCH_AOTD_DEBUG_PROFILE", "0") == "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs docs

@ezyang ezyang requested a review from jamesjwu September 24, 2024 01:37
@IvanKobzarev IvanKobzarev added the topic: not user facing topic category label Sep 24, 2024
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
The goal is:
Optional profiling of AOTD Runtime overhead, to chromium_logger.

Changes to ChromiumLogger:

Adding listener to be able to subscribe to events of the logger to:
1/ Verify in the tests that events are sent correctly
2/ Use the data from events for adhoc perf analysis of changes in AOTD

```
TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging
```
```
   SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns
   SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns
NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns
NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns
```



Perfetto trace:

![Screenshot 2024-09-24 at 16 22 54](https://github.com/user-attachments/assets/8a5293bb-6dbc-40e6-935a-5c5bd79ae717)

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Sep 24, 2024
ghstack-source-id: 05fa17d
Pull Request resolved: #136478
)

# Uncomment for adhoc aotd overhead perf experimentation
_print_stat(" SUBCLASS")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be commented out before you check it in (So we don't print during CI runs by default)?

Also, I think you mentioned in our 1:1 - but were you thinking of adding this into a benchmark somewhere that we can track over time for regressions? (e.g. we have some benchmarks here, although they are mainly compile-time related https://github.com/pytorch/pytorch/tree/main/benchmarks/dynamo/pr_time_benchmarks/benchmarks)

# TODO: log to init/id tlparse after I add support for it
log.info("ChromiumEventLogger initialized with id %s", self.id_)

def add_listener(self, listener: Callable[[Dict[str, Any]], None]) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jamesjwu, since IIRC you added the ChromiumEventLogger code

@staticmethod
def forward(ctx, *deduped_flat_tensor_args):
if torch._functorch.config.aotd_debug_profile:
get_chromium_event_logger().log_event_start(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you sanity check that get_chromium_event_logger().log_event_start() is fast enough that it doesn't throw off our measurements? (especially since these runtime overhead benchmarks are in the range of micro-to-milliseconds, while most compile time benchmarks are seconds-to-minutes)

@bdhirsh
Copy link
Contributor

bdhirsh commented Sep 24, 2024

Does this subsume #135501 ?

@IvanKobzarev
Copy link
Contributor Author

Does this subsume #135501 ?

I've reused the test logic and applied Edwards comment to move config in fuctorch/config, but dropped profile module that I started in that PR in favor of using chromium_logger. I will abandon/close #135501.

Copy link
Contributor

@jamesjwu jamesjwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable from chromium event side. Can you run a benchmark like nanogpt with training enabled and post the full chromium trace after adding this PR?

Overall, I think I'd just like a little more documentation on what AOTD is somewhere near these events/config


def runtime_wrapper(args: List[Any]):
if torch._functorch.config.aotd_debug_profile:
get_chromium_event_logger().log_event_start(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but this causes the event to be logged at runtime instead of compile time, correct? (I.e. the compiled function gets run by the runtime wrapper in between log event start and log event end). So you're measuring the entire runtime of the compiled function by doing this. Is that actually the event time that you want to measure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, here I want to measure runtime overhead of subclasses. Wrapping-unwrapping in wrappers etc.

@IvanKobzarev
Copy link
Contributor Author

nanogpt

Do you know, if nanogpt uses subclasses?

The goal is:
Optional profiling of AOTD Runtime overhead, to chromium_logger.

Changes to ChromiumLogger:

Adding listener to be able to subscribe to events of the logger to:
1/ Verify in the tests that events are sent correctly
2/ Use the data from events for adhoc perf analysis of changes in AOTD

```
TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging
```
```
   SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns
   SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns
NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns
NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns
```



Perfetto trace:

![Screenshot 2024-09-24 at 16 22 54](https://github.com/user-attachments/assets/8a5293bb-6dbc-40e6-935a-5c5bd79ae717)

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Sep 30, 2024
ghstack-source-id: ce74ec6
Pull Request resolved: #136478
The goal is:
Optional profiling of AOTD Runtime overhead, to chromium_logger.

Changes to ChromiumLogger:

Adding listener to be able to subscribe to events of the logger to:
1/ Verify in the tests that events are sent correctly
2/ Use the data from events for adhoc perf analysis of changes in AOTD

```
TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging
```
```
   SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns
   SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns
NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns
NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns
```



Perfetto trace:

![Screenshot 2024-09-24 at 16 22 54](https://github.com/user-attachments/assets/8a5293bb-6dbc-40e6-935a-5c5bd79ae717)

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 31, 2024
ghstack-source-id: fefb69a
Pull Request resolved: #136478

python test/distributed/_tensor/test_dtensor_compile.py -k
test_unwrap_async_collective_tensor_tangent
The goal is:
Optional profiling of AOTD Runtime overhead, to chromium_logger.

Changes to ChromiumLogger:

Adding listener to be able to subscribe to events of the logger to:
1/ Verify in the tests that events are sent correctly
2/ Use the data from events for adhoc perf analysis of changes in AOTD

```
TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging
```
```
   SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns
   SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns
NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns
NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns
```



Perfetto trace:

![Screenshot 2024-09-24 at 16 22 54](https://github.com/user-attachments/assets/8a5293bb-6dbc-40e6-935a-5c5bd79ae717)

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
The goal is:
Optional profiling of AOTD Runtime overhead, to chromium_logger.

Changes to ChromiumLogger:

Adding listener to be able to subscribe to events of the logger to:
1/ Verify in the tests that events are sent correctly
2/ Use the data from events for adhoc perf analysis of changes in AOTD

```
TORCH_TRACE="/var/tmp/tt" python test/functorch/test_aotdispatch.py -k test_aotd_debug_profile_overhead_logging
```
```
   SUBCLASS FW_aotd_overhead_avg:2264.533ns 51.080% of fwd_compiled_module_duration_avg_ns:4433.325ns fwd_runtime_wrapper_duration_avg_ns:6697.858ns
   SUBCLASS BW_aotd_overhead_avg:1449.392ns 19.426% of bwd_compiled_module_duration_avg:7461.050ns bwd_total_duration_avg:8910.442ns
NO_SUBCLASS FW_aotd_overhead_avg:2215.717ns 49.924% of fwd_compiled_module_duration_avg_ns:4438.175ns fwd_runtime_wrapper_duration_avg_ns:6653.892ns
NO_SUBCLASS BW_aotd_overhead_avg:1206.492ns 16.328% of bwd_compiled_module_duration_avg:7389.025ns bwd_total_duration_avg:8595.517ns
```



Perfetto trace:

![Screenshot 2024-09-24 at 16 22 54](https://github.com/user-attachments/assets/8a5293bb-6dbc-40e6-935a-5c5bd79ae717)

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Oct 31, 2024
ghstack-source-id: 55f834b
Pull Request resolved: #136478

python test/functorch/test_aotdispatch.py -k
test_aotd_debug_profile_overhead_logging
@github-actions
Copy link
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Dec 30, 2024
@github-actions github-actions bot closed this Jan 29, 2025
@github-actions github-actions bot deleted the gh/IvanKobzarev/72/head branch March 1, 2025 02:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants