Closed
Labels
module: ddp, module: dynamic shapes, oncall: distributed, oncall: pt2, pt2d-triage-nov2024, triaged
Description
🐛 Describe the bug
I am exhausting recompilations on this module and hitting "torch._dynamo hit config.cache_size_limit (64)", even with the cache size limit extended.
I am only enabling compilation on this module:
https://github.com/hustvl/ViTMatte/blob/main/modeling/decoder/detail_capture.py#L132-L140
Error logs
function: 'forward' (/workspace/modeling/decoder/detail_capture.py:131)
[rank6]:W1101 02:43:39.336000 143 site-packages/torch/_dynamo/convert_frame.py:896] [1/64] last reason: 1/0: tensor 'L['images']' size mismatch at index 2. expected 1696, actual 1248
[rank6]:W1101 02:43:39.336000 143 site-packages/torch/_dynamo/convert_frame.py:896] [1/64] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[rank6]:W1101 02:43:39.336000 143 site-packages/torch/_dynamo/convert_frame.py:896] [1/64] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html.
[rank3]:W1101 02:43:40.706000 140 site-packages/torch/_dynamo/convert_frame.py:896] [1/64] torch._dynamo hit config.cache_size_limit (64)
[rank3]:W1101 02:43:40.706000 140 site-packages/torch/_dynamo/convert_frame.py:896] [1/64] function: 'forward' (/workspace/modeling/decoder/detail_capture.py:131)
[rank3]:W1101 02:43:40.706000 140 site-packages/torch/_dynamo/convert_frame.py:896] [1/64] last reason: 1/0: tensor 'L['images']' size mismatch at index 2. expected 1248, actual 1376
Minified repro
No response
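Although there is no minified repro, the warnings under Error logs already show which input dimension keeps changing. As a rough diagnostic sketch (not part of the original report), the following tallies the sizes reported in the "size mismatch" messages. The helper name `summarize_size_mismatches` and the regular expression are assumptions made for illustration, matched only against the log format shown above.

```python
import re
from collections import defaultdict

# Matches the "last reason" text of the Dynamo warnings quoted above.
# The pattern is an assumption derived from those two log lines only.
MISMATCH = re.compile(
    r"tensor '(?P<name>.+?)' size mismatch at index (?P<dim>\d+)\. "
    r"expected (?P<expected>\d+), actual (?P<actual>\d+)"
)


def summarize_size_mismatches(log_text):
    """Map (tensor name, dim index) to the sorted sizes seen in the log."""
    seen = defaultdict(set)
    for m in MISMATCH.finditer(log_text):
        key = (m.group("name"), int(m.group("dim")))
        seen[key].update((int(m.group("expected")), int(m.group("actual"))))
    return {key: sorted(sizes) for key, sizes in seen.items()}


# The two "last reason" messages from the logs above, prefixes trimmed.
sample = (
    "[1/64] last reason: 1/0: tensor 'L['images']' size mismatch at index 2. "
    "expected 1696, actual 1248\n"
    "[1/64] last reason: 1/0: tensor 'L['images']' size mismatch at index 2. "
    "expected 1248, actual 1376\n"
)

for (name, dim), sizes in summarize_size_mismatches(sample).items():
    print(f"{name} dim {dim}: {sizes}")
```

If a single dimension accounts for most of the entries, as index 2 of `images` does here, that dimension is a candidate for being treated as dynamic, for example via `torch._dynamo.mark_dynamic` or `torch.compile(..., dynamic=True)`. Whether that avoids the recompiles in this DDP setup is not established by the logs.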
Versions
nightly
cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @chauhang @penguinwu @ezyang @bobrenjc93