Skip to content

feat: Allow DistributedSampler to accept Sampler as input#171597

Open
ankitlade12 wants to merge 2 commits intopytorch:mainfrom
ankitlade12:feature/distributed-sampler-accepts-sampler
Open

feat: Allow DistributedSampler to accept Sampler as input#171597
ankitlade12 wants to merge 2 commits intopytorch:mainfrom
ankitlade12:feature/distributed-sampler-accepts-sampler

Conversation

@ankitlade12
Copy link
Copy Markdown

@ankitlade12 ankitlade12 commented Jan 1, 2026

Fixes #23430

This change enables composition of sampling strategies in distributed training by allowing DistributedSampler to wrap another Sampler.

Changes:

  • Add Union[Dataset, Sampler] type hint to init
  • Add _input_is_sampler flag for runtime detection
  • Update iter to use wrapped sampler's indices when applicable
  • Update docstring with new usage examples
  • Add 7 comprehensive unit tests covering various sampler types

Example usage:
base_sampler = WeightedRandomSampler(weights, num_samples)
sampler = DistributedSampler(base_sampler, shuffle=False)

Fixes pytorch#23430

This change enables composition of sampling strategies in distributed
training by allowing DistributedSampler to wrap another Sampler.

Changes:
- Add Union[Dataset, Sampler] type hint to __init__
- Add _input_is_sampler flag for runtime detection
- Update __iter__ to use wrapped sampler's indices when applicable
- Update docstring with new usage examples
- Add 7 comprehensive unit tests covering various sampler types

Example usage:
    base_sampler = WeightedRandomSampler(weights, num_samples)
    sampler = DistributedSampler(base_sampler, shuffle=False)
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot bot commented Jan 1, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/171597

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b677028 with merge base b35a75b (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: dataloader release notes category label Jan 1, 2026
@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla bot commented Jan 1, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@vadimkantorov
Copy link
Copy Markdown
Contributor

vadimkantorov commented Jan 1, 2026

Some design concerns from the original issue:

…mpler

- Propagate set_epoch (or set_seed) to child samplers
- Add split_contiguous parameter to allow contiguous index blocks
- Add unit tests for both new features
@ankitlade12
Copy link
Copy Markdown
Author

Some design concerns from the original issue:

@ankitlade12
Copy link
Copy Markdown
Author

I've addressed both concerns in this implementation:

  1. set_epoch propagation: I've updated the code so that calling DistributedSampler.set_epoch will now automatically call .set_epoch() (or .set_seed()) on the wrapped sampler if those methods exist. This keeps the child sampler in sync.

  2. Splitting strategies: I added a new split_contiguous parameter.

  • By default, it uses interleaved chunks (same as before).
  • If set to True, it uses contiguous chunks. This solves the issue where sorted datasets could cause unbalanced workloads across GPUs.

I've added 10 tests to verify that both the epoch propagation and the splitting strategies work as expected.

@janeyx99 janeyx99 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jan 8, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 9, 2026

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Mar 9, 2026
@ankitlade12
Copy link
Copy Markdown
Author

Hey @divyanshk @ramanishsingh @aelavender — just wanted to bump this PR. It was recently marked as Stale, but it's ready for review. I've addressed the design concerns raised by @vadimkantorov (set_epoch propagation and contiguous vs. interleaved splitting) and added tests for both.

Would really appreciate it if you could take a look when you get a chance. Happy to make any changes based on your feedback. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

open source release notes: dataloader release notes category Stale triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature request] Let DistributedSampler take a Sampler as input

4 participants