Conversation


@didicout didicout commented May 6, 2022

Fixes #76950

Improve the performance of iteration on `BatchSampler`, especially when `batch_size` is large.

Python 3.6.8:

```
  batch_size  drop_last      origin(avg)    origin(std)   new(avg)   new(std)  speedup
------------  -----------  -------------  -------------  ---------  ---------  -------
           4  True                 0.026      0.0001183      0.032  0.0004839  -18.07%
           4  False                0.026      2.326e-05      0.022  3.317e-05  15.92%
           8  True                 0.025      0.0001441      0.023  0.0001204  9.43%
           8  False                0.025      4.457e-05      0.019  3.951e-05  30.90%
          64  True                 0.024      0.0005505      0.015  0.0002987  54.99%
          64  False                0.023      8.014e-05      0.016  1.999e-05  49.64%
         640  True                 0.024      0.0005598      0.015  2.327e-05  66.26%
         640  False                0.024      2.229e-05      0.016  2.358e-05  48.32%
        6400  True                 0.025      0.0005789      0.015  2.935e-05  69.06%
        6400  False                0.025      2.529e-05      0.017  6.737e-05  45.17%
```

Python 3.8.12:

```
  batch_size  drop_last      origin(avg)    origin(std)   new(avg)   new(std)  speedup
------------  -----------  -------------  -------------  ---------  ---------  -------
           4  True                 0.017      0.0001389      0.019  3.026e-05  -10.50%
           4  False                0.017      9.909e-06      0.017  2.37e-05   -0.78%
           8  True                 0.016      1.251e-05      0.013  2.748e-05  24.40%
           8  False                0.016      5.523e-05      0.015  1.156e-05  10.20%
          64  True                 0.015      2.483e-05      0.008  1.859e-05  90.96%
          64  False                0.015      2.622e-05      0.012  3.227e-05  26.09%
         640  True                 0.016      6.405e-05      0.007  1.836e-05  112.88%
         640  False                0.016      1.284e-05      0.013  0.0004079  20.09%
        6400  True                 0.016      8.298e-05      0.008  1.507e-05  111.80%
        6400  False                0.016      1.716e-05      0.014  0.0004602  18.37%
```

See the issue page for more details on the tests.
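The diff itself is not shown in this conversation. As a rough illustration of the kind of change described above (an editor's sketch, not necessarily the exact merged code; the `batched_indices` helper name is hypothetical), per-element `append` calls and `len()` checks in `BatchSampler.__iter__` can be replaced with batch-at-a-time construction:

```python
from typing import Iterable, Iterator, List

def batched_indices(sampler: Iterable[int], batch_size: int,
                    drop_last: bool) -> Iterator[List[int]]:
    # Hypothetical stand-in for BatchSampler.__iter__, for illustration.
    if drop_last:
        sampler_iter = iter(sampler)
        while True:
            try:
                # Build each full batch in one comprehension instead of
                # appending one element at a time with a len() check.
                yield [next(sampler_iter) for _ in range(batch_size)]
            except StopIteration:
                break
    else:
        batch = [0] * batch_size  # preallocate rather than grow a list
        idx_in_batch = 0
        for idx in sampler:
            batch[idx_in_batch] = idx
            idx_in_batch += 1
            if idx_in_batch == batch_size:
                yield batch
                idx_in_batch = 0
                batch = [0] * batch_size
        if idx_in_batch > 0:
            yield batch[:idx_in_batch]  # keep the short final batch
```

For example, `list(batched_indices(range(10), 4, drop_last=False))` yields `[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]`, while `drop_last=True` stops at `[[0, 1, 2, 3], [4, 5, 6, 7]]`.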

@facebook-github-bot
Contributor

Hi @didicout!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

Contributor

facebook-github-bot commented May 6, 2022

❌ 1 New Failure

As of commit 4844bb5 (more details on the Dr. CI page):

  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-05-10T14:50:31.5063431Z processing existing schema:  text(__torch__.torch.classes.profiling.SourceRef _0) -> (str _0)
2022-05-10T14:50:31.5065070Z processing existing schema:  count(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-05-10T14:50:31.5066320Z processing existing schema:  duration_ns(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-05-10T14:50:31.5067965Z processing existing schema:  source(__torch__.torch.classes.profiling.SourceStats _0) -> (__torch__.torch.classes.profiling.SourceRef _0)
2022-05-10T14:50:31.5069911Z processing existing schema:  line_map(__torch__.torch.classes.profiling.SourceStats _0) -> (Dict(int, __torch__.torch.classes.profiling.InstructionStats) _0)
2022-05-10T14:50:31.5071101Z processing existing schema:  __init__(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-05-10T14:50:31.5072661Z processing existing schema:  enable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-05-10T14:50:31.5073876Z processing existing schema:  disable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-05-10T14:50:31.5075883Z processing existing schema:  _dump_stats(__torch__.torch.classes.profiling._ScriptProfile _0) -> (__torch__.torch.classes.profiling.SourceStats[] _0)
2022-05-10T14:50:31.5078593Z processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (NoneType _0)
2022-05-10T14:50:31.5079066Z The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 
2022-05-10T14:50:31.5079351Z 
2022-05-10T14:50:31.5079428Z Broken ops: [
2022-05-10T14:50:31.5079873Z 	aten::set_.source_Tensor_storage_offset(Tensor(a!) self, Tensor source, int storage_offset, int[] size, int[] stride=[]) -> (Tensor(a!))
2022-05-10T14:50:31.5080485Z 	aten::linalg_vander(Tensor x, *, int? N=None) -> (Tensor)
2022-05-10T14:50:31.5080709Z ]
2022-05-10T14:50:31.6152653Z + cleanup
2022-05-10T14:50:31.6153010Z + retcode=1
2022-05-10T14:50:31.6153321Z + set +x
2022-05-10T14:50:31.6185610Z ##[error]Process completed with exit code 1.
2022-05-10T14:50:31.6227907Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@soulitzer soulitzer requested review from VitalyFedyunin and ejguan May 9, 2022 04:40
@soulitzer soulitzer added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label May 9, 2022
@ejguan ejguan self-assigned this May 9, 2022
Contributor

@ejguan ejguan left a comment

Thank you for proactively trying to make `BatchSampler` better. Do you mind listing the performance gain in the PR summary along with your testing script? Besides, I also commented about adding a few tests with `batch_size` 4 or 8 in the issue.

Author

didicout commented May 9, 2022

Thank you for your reply. I have posted some new test results in the issue. I will list the performance numbers in the PR summary if that's okay.

Contributor

@ejguan ejguan left a comment

Please ping me when you have the PR summary ready! Really appreciate your work.

@didicout
Author

@ejguan Done. :)

@ejguan ejguan added the module: dataloader (Related to torch.utils.data.DataLoader and Sampler), release notes: dataloader (release notes category), and topic: improvements (topic category) labels May 10, 2022
@facebook-github-bot
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@ejguan ejguan left a comment

LGTM

Contributor

ejguan commented May 10, 2022

@pytorchbot merge this please

Contributor

ejguan commented May 10, 2022

Do you mind also adding std to the table in summary? Thank you

@didicout
Author

> Do you mind also adding std to the table in summary? Thank you

Done. :)

facebook-github-bot pushed a commit that referenced this pull request May 13, 2022
Summary:
Fixes #76950

Improve the performance of iteration on `BatchSampler`, especially when `batch_size` is large.

Python 3.6.8:
```
  batch_size  drop_last     speedup
------------  -----------   -------
           4  True          -18.07%
           4  False         15.92%
           8  True          9.43%
           8  False         30.90%
          64  True          54.99%
          64  False         49.64%
         640  True          66.26%
         640  False         48.32%
        6400  True          69.06%
        6400  False         45.17%
```

Python 3.8.12:
```
  batch_size  drop_last    speedup
------------  -----------  --------
           4  True         -10.50%
           4  False        -0.78%
           8  True         24.40%
           8  False        10.20%
          64  True         90.96%
          64  False        26.09%
         640  True         112.88%
         640  False        20.09%
        6400  True         111.80%
        6400  False        18.37%
```

See the issue page for more details on the tests.

Pull Request resolved: #76951
Approved by: https://github.com/ejguan

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/d22d749a0e8436829e564796bf6cd67847199453

Reviewed By: samdow

Differential Revision: D36282565

Pulled By: samdow

fbshipit-source-id: cbdf554f5a9941a892e10b651c0dc709e25c7a87
pytorchmergebot pushed a commit that referenced this pull request Oct 13, 2024
Builds upon #76951.

Benchmarking code is the same as in #76950.

AMD Ryzen Threadripper PRO 3995WX:
```
  batch_size  drop_last      origin     new  speedup
------------  -----------  --------  ------  ---------
           4  True           0.94    0.5706  64.74%
           4  False          0.9745  0.9468  2.93%
           8  True           0.7423  0.3715  99.82%
           8  False          0.7974  0.5666  40.73%
          64  True           0.5394  0.2085  158.76%
          64  False          0.6083  0.2697  125.51%
         640  True           0.5448  0.1985  174.41%
         640  False          0.7085  0.2308  206.91%
        6400  True           0.5554  0.2028  173.88%
        6400  False          0.7711  0.2109  265.60%
       64000  True           0.556   0.2091  165.82%
       64000  False          0.7803  0.2078  275.58%
```

When `drop_last == True`, it uses `zip` to speed things up.
When `drop_last == False`, it uses `itertools` to speed things up.

`itertools` was the fastest way I could find that handles the last batch when it is smaller than `batch_size`. I have a pure-Python method too, but it is slower when `batch_size` is 4 or 8, so I have committed the `itertools` version for now. (A sketch of both paths appears after this commit message.)

Happy to chat further about this change :-) I understand you may not want to introduce the `itertools` module into [sampler.py](https://github.com/pytorch/pytorch/blob/main/torch/utils/data/sampler.py).
Pull Request resolved: #137423
Approved by: https://github.com/Skylion007
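The diff for #137423 is not shown here either, but the two paths described in the commit message are easy to sketch. The following is an editor's reconstruction from that description, not the merged code; the `batched_indices` helper name is hypothetical:

```python
import itertools
from typing import Iterable, Iterator, List

def batched_indices(sampler: Iterable[int], batch_size: int,
                    drop_last: bool) -> Iterator[List[int]]:
    # Hypothetical illustration of the zip/itertools approach.
    if drop_last:
        # All batch_size entries of args alias a single iterator, so each
        # tuple produced by zip advances it batch_size times; zip stops at
        # the first incomplete batch, dropping the remainder for free.
        args = [iter(sampler)] * batch_size
        for batch in zip(*args):
            yield list(batch)
    else:
        # islice pulls up to batch_size elements per call; the final call
        # returns a short batch, which is still yielded.
        sampler_iter = iter(sampler)
        while batch := list(itertools.islice(sampler_iter, batch_size)):
            yield batch
```

The `zip(*[iter(s)] * n)` trick is a well-known Python idiom for clustering an iterable into fixed-length groups; it moves per-element work out of Python-level bookkeeping, which is consistent with the larger speedups reported above for `drop_last == True`.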

Labels

cla signed · Merged · module: dataloader (Related to torch.utils.data.DataLoader and Sampler) · open source · release notes: dataloader (release notes category) · topic: improvements (topic category) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Faster BatchSampler

6 participants