PyTorch Data Sampler benchmark #156974

divyanshk · 2025-06-26T17:55:32Z

Motivation

Many PRs that optimize samplers (e.g. #147706, #137423) rely on an ad hoc script for benchmarking. The script and its output are often copied into each PR. We want to begin centralizing benchmarks for torch.utils.data components.

What?

This PR adds a new sub-folder for data under benchmarks. It is intended to hold benchmarking scripts for torch.utils.data components such as the dataloader and samplers.
Specifically, this PR includes a simple script to time samplers. A script like this is often copy-pasted into PRs that optimize samplers. Keeping it in a centralized location should prevent that and establish a common standard.
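
As a rough illustration of what such a timing script might look like, here is a minimal sketch. It is not the script added in this PR. The helper names `batches` and `time_sampler` are hypothetical, and `batches` is a pure-Python stand-in for a batch sampler so that the example runs without PyTorch installed.

```python
import timeit
from statistics import mean


def batches(indices, batch_size, drop_last):
    """Hypothetical stand-in for a batch sampler: yields lists of indices."""
    batch = []
    for idx in indices:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final partial batch unless the caller asked to drop it.
    if batch and not drop_last:
        yield batch


def time_sampler(make_sampler, repeats=5):
    """Return the mean wall-clock seconds needed to exhaust a fresh sampler.

    make_sampler is a zero-argument callable that builds a new iterable,
    so every repeat starts from an unconsumed sampler.
    """
    def run():
        for _ in make_sampler():
            pass

    return mean(timeit.repeat(run, number=1, repeat=repeats))


# Sweep a few configurations, similar in spirit to the table in this PR.
for batch_size in (4, 64):
    for drop_last in (True, False):
        seconds = time_sampler(
            lambda: batches(range(10_000), batch_size, drop_last)
        )
        print(f"batch_size={batch_size} drop_last={drop_last}: {seconds:.4f}s")
```

To time a real sampler instead of the stand-in, pass a factory such as `lambda: BatchSampler(SequentialSampler(range(n)), batch_size, drop_last)` (both classes from `torch.utils.data`) to `time_sampler`.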

Output

Benchmark Results:
+--------------+-------------+----------------+-----------+-----------+
|   Batch Size | Drop Last   |   Original (s) |   New (s) | Speedup   |
+==============+=============+================+===========+===========+
|            4 | True        |         0.004  |    0.0088 | -119.62%  |
+--------------+-------------+----------------+-----------+-----------+
|            4 | False       |         0.0083 |    0.009  | -9.23%    |
+--------------+-------------+----------------+-----------+-----------+
|            8 | True        |         0.003  |    0.0074 | -147.64%  |
+--------------+-------------+----------------+-----------+-----------+
|            8 | False       |         0.0054 |    0.0075 | -38.72%   |
+--------------+-------------+----------------+-----------+-----------+
|           64 | True        |         0.0021 |    0.0056 | -161.92%  |
+--------------+-------------+----------------+-----------+-----------+
|           64 | False       |         0.0029 |    0.0055 | -92.50%   |
+--------------+-------------+----------------+-----------+-----------+
|          640 | True        |         0.002  |    0.0055 | -168.75%  |
+--------------+-------------+----------------+-----------+-----------+
|          640 | False       |         0.0024 |    0.0062 | -161.35%  |
+--------------+-------------+----------------+-----------+-----------+
|         6400 | True        |         0.0021 |    0.0055 | -160.13%  |
+--------------+-------------+----------------+-----------+-----------+
|         6400 | False       |         0.0021 |    0.0068 | -215.46%  |
+--------------+-------------+----------------+-----------+-----------+
|        64000 | True        |         0.0042 |    0.0065 | -55.29%   |
+--------------+-------------+----------------+-----------+-----------+
|        64000 | False       |         0.0029 |    0.0077 | -169.56%  |
+--------------+-------------+----------------+-----------+-----------+
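
The Speedup column appears consistent with the relative change `(original - new) / original`, expressed as a percentage, so negative values mean the new implementation was slower. The following is a minimal sketch under that assumption. The function name `speedup_percent` is hypothetical, and because the times printed in the table are rounded, recomputing from them will not reproduce the table's percentages exactly.

```python
def speedup_percent(original_s: float, new_s: float) -> float:
    """Relative speedup of new over original, in percent.

    Positive means the new implementation is faster, negative means slower.
    Assumed formula; the PR text does not state it explicitly.
    """
    if original_s <= 0:
        raise ValueError("original_s must be positive")
    return (original_s - new_s) / original_s * 100.0


# First table row (batch size 4, drop_last=True), using the rounded times shown.
print(f"{speedup_percent(0.004, 0.0088):.2f}%")
```

With the rounded inputs from the first row this gives roughly -120%, close to the -119.62% reported from the unrounded measurements.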

pytorch-bot · 2025-06-26T17:55:36Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156974

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

VolumeLimitExceeded Issue for linux.2xlarge and linux.4xlarge

⏳ 1 Pending, 2 Unrelated Failures

As of commit a36401b with merge base d061a02:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-jammy-py3.13-clang12 / test (dynamo_wrapped, 2, 3, linux.2xlarge) (gh) (similar failure)
'test/test_reductions.py::TestReductionsCPU::test_sum_all_cpu_float64'

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu, unstable) (gh) (#153987)
MISSING REGRESSION TEST

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2025-06-26T17:55:37Z

The committers listed above are authorized under a signed CLA.

✅ login: divyanshk / name: Divyansh Khanna (a36401b, 2d84a54, b21723e, 3efcb12, c392b4f, 3bd8193)

divyanshk · 2025-06-26T18:05:16Z

/easycla

divyanshk · 2025-06-26T18:11:39Z

/easycla :-)

divyanshk · 2025-06-26T23:04:45Z

The CI error seems to have been fixed in a recent PR, #157010.

ramanishsingh · 2025-06-26T23:16:43Z

Thanks for adding this benchmarking script. :)

divyanshk · 2025-06-26T23:50:16Z

@pytorchbot merge

pytorchmergebot · 2025-06-26T23:51:59Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

divyanshk and others added 5 commits June 23, 2025 15:00

torch.utils.data benchmarking

2d84a54

basic benchmark for samplers

3bd8193

Add note to samplers.py

c392b4f

update README.md and clean script

3efcb12

update benchmark README.md

b21723e

pytorch-bot bot added the release notes: dataloader release notes category label Jun 26, 2025

fix linter issues

a36401b

divyanshk marked this pull request as ready for review June 26, 2025 23:03

divyanshk requested a review from ramanishsingh as a code owner June 26, 2025 23:03

divyanshk requested a review from scotts June 26, 2025 23:03

ramanishsingh approved these changes Jun 26, 2025

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 26, 2025

pytorchmergebot added the merging label Jun 26, 2025

pytorchmergebot added the Merged label Jun 27, 2025

pytorchmergebot closed this in e6d8ed0 Jun 27, 2025

pytorchmergebot removed the merging label Jun 27, 2025
