-
Notifications
You must be signed in to change notification settings - Fork 26.3k
[IG] Avoid generation of empty merge cpu submodule by splitter v2 #140794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary:
Customize splitter behavior to mark `get_attr` nodes as acc supported.
Currently these nodes are excluded by `FxNetAccNodesFinder` which marks all nodes with op not in `CALLABLE_NODE_OPS` ("call_module", "call_function", "call_method") as unsupported.
Before this change, merge-net is split into an almost empty cpu submodule with a single empty output node:
```
INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:###### debug_print nodes for _run_on_cpu_0
INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:Found output node: n.name='output', n.target='output', n.args=((),), n.kwargs={}, n.meta={}
INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:return ()
INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:
_run_on_cpu_0 stats for merge:
[output] output: 1
```
full log: P1678727348 (generated using same command as below)
Test Plan:
Tested by lowering `ig_organic_feed_cn_v2_mtml` using cmd:
```
buck run mode/opt-split-dwarf //tgif/cli:cli -- --model-name=ig_organic_feed_cn_v2_mtml --model-type ig_organic_feed_cn_v2_mtml --world-size=1 --storage-mode 1 --inference-dtype=FP16 --meta-transform=False --use-random-weights=True --accelerator-arch=3 --enable-input-dist=True --embedding-tables-dtype=FP16 --mtia-use-torch-export=True embedding-quantization-pass torchrec-sharding-pass tgif-split-pass gen-app-graph-pass tgif-mtia-lowering-pass dense-quantization-pass save-torch-package-pass generate-model-package-pass pack-weights-and-save-pass 2>&1 | tee /tmp/publish_ig_organic_feed_cn_v2_mtml_mtia_export_20241114_splitter_2.log
```
Output shows only 1 acc submodule is generated for merge:
```
INFO 18:33:15.951 1735650 utils.py:235: [TGIF] num of acc submodules: 1
INFO 18:33:15.952 1735650 utils.py:236: [TGIF] num of cpu submodules: 0
INFO 18:33:16.534 1735650 logging_utils.py:53: [TGIF] _run_on_acc_0 graph module debug info: https://www.internalfb.com/intern/everpaste/?color=0&handle=GK4VKhWsDKF9VdsDAKxhR6KAlhJ0br0LAAAz
INFO 18:33:16.534 1735650 utils.py:257: [TGIF] Start MTIA lowering _run_on_acc_0 in merge, device ordinal: -1
```
full log: P1679596796
Differential Revision: D65983916
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140794
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ✅ No FailuresAs of commit 52c229a with merge base ee3a4f0 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
This pull request was exported from Phabricator. Differential Revision: D65983916 |
ezyang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making it kwarg only
|
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged) |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…torch#140794) Summary: Customize splitter behavior to mark `get_attr` nodes as acc supported. Currently these nodes are excluded by `FxNetAccNodesFinder` which marks all nodes with op not in `CALLABLE_NODE_OPS` ("call_module", "call_function", "call_method") as unsupported. Before this change, merge-net is split into an almost empty cpu submodule with a single empty output node: ``` INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:###### debug_print nodes for _run_on_cpu_0 INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:Found output node: n.name='output', n.target='output', n.args=((),), n.kwargs={}, n.meta={} INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:return () INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model: _run_on_cpu_0 stats for merge: [output] output: 1 ``` full log: P1678727348 (generated using same command as below) Test Plan: Tested by lowering `ig_organic_feed_cn_v2_mtml` using cmd: ``` buck run mode/opt-split-dwarf //tgif/cli:cli -- --model-name=ig_organic_feed_cn_v2_mtml --model-type ig_organic_feed_cn_v2_mtml --world-size=1 --storage-mode 1 --inference-dtype=FP16 --meta-transform=False --use-random-weights=True --accelerator-arch=3 --enable-input-dist=True --embedding-tables-dtype=FP16 --mtia-use-torch-export=True embedding-quantization-pass torchrec-sharding-pass tgif-split-pass gen-app-graph-pass tgif-mtia-lowering-pass dense-quantization-pass save-torch-package-pass generate-model-package-pass pack-weights-and-save-pass 2>&1 | tee /tmp/publish_ig_organic_feed_cn_v2_mtml_mtia_export_20241114_splitter_2.log ``` Output shows only 1 acc submodule is generated for merge: ``` INFO 18:33:15.951 1735650 utils.py:235: [TGIF] num of acc submodules: 1 INFO 18:33:15.952 1735650 utils.py:236: [TGIF] num of cpu submodules: 0 INFO 18:33:16.534 1735650 logging_utils.py:53: [TGIF] _run_on_acc_0 graph module debug info: https://www.internalfb.com/intern/everpaste/?color=0&handle=GK4VKhWsDKF9VdsDAKxhR6KAlhJ0br0LAAAz INFO 18:33:16.534 1735650 utils.py:257: [TGIF] Start MTIA lowering _run_on_acc_0 in merge, device ordinal: -1 ``` full log: P1679596796 Differential Revision: D65983916 Pull Request resolved: pytorch#140794 Approved by: https://github.com/ezyang
Summary:
Customize splitter behavior to mark
get_attrnodes as acc supported.Currently these nodes are excluded by
FxNetAccNodesFinderwhich marks all nodes with op not inCALLABLE_NODE_OPS("call_module", "call_function", "call_method") as unsupported.Before this change, merge-net is split into an almost empty cpu submodule with a single empty output node:
full log: P1678727348 (generated using same command as below)
Test Plan:
Tested by lowering
ig_organic_feed_cn_v2_mtmlusing cmd:Output shows only 1 acc submodule is generated for merge:
full log: P1679596796
Differential Revision: D65983916
cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv