Fix FSDP2 and distributed checkpointing imports for older PyTorch versions #46141

Merged: 3outeille merged 6 commits into huggingface:main from ryota-komatsu:fix-fsdp-import-version on May 26, 2026

Conversation

@ryota-komatsu
Contributor

What does this PR do?

This PR updates the PyTorch version constraints for specific distributed features to prevent ImportError and ModuleNotFoundError crashes on older PyTorch versions:

  • Bumps the minimum PyTorch requirement for FSDP2 from >=2.5 to >=2.6.
  • Adds a minimum PyTorch requirement of >=2.7 for distributed checkpoint saving.

Currently, attempting to initialize FSDP2 with torch==2.5 results in an import error because CPUOffloadPolicy, MixedPrecisionPolicy, and OffloadPolicy are not available in 'torch.distributed.fsdp' for that version.

Similarly, attempting to use distributed checkpointing on versions earlier than torch==2.7 crashes because the torch.distributed.checkpoint.hf_storage module, which provides HuggingFaceStorageWriter, does not exist in those versions.
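The version gating described above can be sketched roughly as follows. This is a minimal illustration, not the code changed in this PR: the helper names (`parse_major_minor`, `meets_minimum`, `import_fsdp2_policies`, `import_hf_storage_writer`) are hypothetical, and only the two minimum versions and the import paths come from the description. Only the major and minor components are compared, so pre-release builds are treated like the final release.

```python
import re

# Minimum PyTorch versions as stated in this PR description.
MIN_TORCH_FOR_FSDP2 = (2, 6)
MIN_TORCH_FOR_DCP_SAVE = (2, 7)


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '2.6.0+cu124'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum):
    """True if the (major, minor) of `version` is at least `minimum`."""
    return parse_major_minor(version) >= minimum


def import_fsdp2_policies(torch_version):
    """Hypothetical helper: check the version before attempting the import."""
    if not meets_minimum(torch_version, MIN_TORCH_FOR_FSDP2):
        raise ImportError(f"FSDP2 requires torch>=2.6, found torch=={torch_version}")
    # Deferred import: older installs never reach this statement.
    from torch.distributed.fsdp import (
        CPUOffloadPolicy,
        MixedPrecisionPolicy,
        OffloadPolicy,
    )
    return CPUOffloadPolicy, MixedPrecisionPolicy, OffloadPolicy


def import_hf_storage_writer(torch_version):
    """Hypothetical helper: same pattern for distributed checkpoint saving."""
    if not meets_minimum(torch_version, MIN_TORCH_FOR_DCP_SAVE):
        raise ImportError(
            f"Distributed checkpoint saving requires torch>=2.7, found torch=={torch_version}"
        )
    from torch.distributed.checkpoint.hf_storage import HuggingFaceStorageWriter
    return HuggingFaceStorageWriter
```

In practice the caller would pass `torch.__version__`. Deferring the imports into the functions means that merely importing the module does not fail on an older PyTorch; the error only surfaces, with a clear message, when the feature is actually requested.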

Tracebacks

transformers/distributed/fsdp.py", line 34, in <module>
    from torch.distributed.fsdp import CPUOffloadPolicy, MixedPrecisionPolicy, OffloadPolicy
ImportError: cannot import name 'CPUOffloadPolicy' from 'torch.distributed.fsdp'

transformers/distributed/utils.py", line 42, in <module>
    from torch.distributed.checkpoint.hf_storage import HuggingFaceStorageWriter
ModuleNotFoundError: No module named 'torch.distributed.checkpoint.hf_storage'

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@3outeille
Member

3outeille commented May 25, 2026

Thanks for your PR! CI is not passing because of #46187. @ArthurZucker, we can merge this and fix the issues related to cohere_moe later.

@ryota-komatsu ryota-komatsu force-pushed the fix-fsdp-import-version branch from 8f98cbe to 7971318 Compare May 25, 2026 14:30
@ryota-komatsu ryota-komatsu force-pushed the fix-fsdp-import-version branch from 7971318 to a7d46fb Compare May 25, 2026 23:41
@3outeille 3outeille enabled auto-merge May 26, 2026 15:35
@3outeille 3outeille added this pull request to the merge queue May 26, 2026
Merged via the queue into huggingface:main with commit 634500b May 26, 2026
30 checks passed
vasqu added a commit that referenced this pull request May 28, 2026
* Revert "init FSDP through from_pretrained (#46102)"

This reverts commit 0588858.

* Revert "Fix FSDP2 and distributed checkpointing imports for older PyTorch versions (#46141)"

This reverts commit 634500b.

* Revert "Update cohere2_moe tp_plan (#46189)"

This reverts commit e65c3a2.

* Revert "FSDP + TP & native save/load distributed (#45028)"

This reverts commit 9ba8e85.

* fix

* they should have been deleted I think

* these are actually needed changes

* oops
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request May 28, 2026
…sions (huggingface#46141)

* Fix PyTorch requirement for FSDP2 to >=2.6

* Fix PyTorch requirement for distributed checkpoint saving to >=2.7

---------

Co-authored-by: Ferdinand Mom <[email protected]>