[DataLoader] Fix collation logic by NivekT · Pull Request #97789 · pytorch/pytorch

NivekT · 2023-03-28T15:39:23Z

Stack from ghstack (oldest at bottom):

-> [DataLoader] Fix collation logic #97789

Similar to #97737, a previous auto-refactor changed how bytes are handled during collation, which can potentially lead to performance regression. This PR undoes that.

[ghstack-poisoned]

pytorch-bot · 2023-03-28T15:39:28Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97789

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit bc0d041:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

albanD

Thanks!
Should we have a one off test for this?

FYI @malfet

NivekT · 2023-03-28T16:55:01Z

@pytorchbot merge -r

pytorchmergebot · 2023-03-28T16:57:42Z

@pytorchbot successfully started a rebase job. Check the current status here

Similar to #97737, a previous auto-refactor changed how `bytes` are handled during collation, which can potentially lead to performance regression. This PR undoes that. [ghstack-poisoned]

pytorchmergebot · 2023-03-28T16:58:00Z

Successfully rebased gh/nivekt/57/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/97789)

ghstack-source-id: 126f4d7 Pull Request resolved: #97789

pytorchmergebot · 2023-03-28T16:59:08Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

Revisit `torch._six.string_classes` (which is `(str, bytes)`) removal: `isinstance(obj, string_classes) -> isinstance(obj, str)`. Both `str` and `bytes` are `Sequence` classes. ```python In [1]: from typing import Sequence In [2]: issubclass(bytes, Sequence) Out[2]: True In [3]: issubclass(str, Sequence) Out[3]: True ``` Re-add `bytes` to type guards like: ```python def is_seq(obj): return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)) ``` Ref: - #94709 (comment) - #97737 - #97789 Pull Request resolved: #97863 Approved by: https://github.com/Skylion007, https://github.com/albanD

Similar to pytorch#97737, a previous auto-refactor changed how `bytes` are handled during collation, which can potentially lead to performance regression. This PR undoes that. Pull Request resolved: pytorch#97789 Approved by: https://github.com/albanD

…97863) Revisit `torch._six.string_classes` (which is `(str, bytes)`) removal: `isinstance(obj, string_classes) -> isinstance(obj, str)`. Both `str` and `bytes` are `Sequence` classes. ```python In [1]: from typing import Sequence In [2]: issubclass(bytes, Sequence) Out[2]: True In [3]: issubclass(str, Sequence) Out[3]: True ``` Re-add `bytes` to type guards like: ```python def is_seq(obj): return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)) ``` Ref: - pytorch#94709 (comment) - pytorch#97737 - pytorch#97789 Pull Request resolved: pytorch#97863 Approved by: https://github.com/Skylion007, https://github.com/albanD

Similar to #97737, a previous auto-refactor changed how `bytes` are handled during collation, which can potentially lead to performance regression. This PR undoes that. Pull Request resolved: #97789 Approved by: https://github.com/albanD

…97789, #97863) (#98055) * [DataLoader] Short circuit pin_memory recursion when operating on bytes (#97737) Slack thread: https://pytorch.slack.com/archives/GEEQ2K4MD/p1679962409906099 I was seeing some massive (~2x) slowdowns on a job after running it on PyTorch 2.0. From some profiling in `py-spy` it looked like the pin_memory thread was doing a lot more work than before. Looking at a trace in `nsys` I saw the thread doing the forward pass having a bunch of `pthread_cond_timedwait` with GIL reacquire calls in it’s call stack, and it seemed like the thread doing the forward pass was getting blocked (waiting for the GIL) by the pin memory thread (which was holding the GIL). After some debugging I found out the issue. If a `bytes` was passed into `pin_memory`, previously in 1.13 (before #94709) it would short-circuit and return here https://github.com/pytorch/pytorch/blob/d922c29a22e4bf0fba49526f7536395eb8cd66f4/torch/utils/data/_utils/pin_memory.py#L54-L55 since `bytes` was in `torch._six.string_classes`: ``` >>> from torch._six import string_classes >>> string_classes (<class 'str'>, <class 'bytes'>) >>> ``` However after #94709, if a `bytes` was passed into `pin_memory` it would fall into here instead https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L68-L73 because the previous check is now doing `isinstance(data, str)` instead of `isinstance(data, (str, bytes))`! https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L56-L57 As a result, `pin_memory` gets called recursively for each element in the `bytes` leading to a ton of wasted recursion. This also explains the slowdown / GIL contention I was seeing. This PR simply changes `isinstance(data, str)` to `isinstance(data, (str, bytes))` to match the behavior before #94709 Pull Request resolved: #97737 Approved by: https://github.com/albanD, https://github.com/NivekT * [DataLoader] Fix collation logic (#97789) Similar to #97737, a previous auto-refactor changed how `bytes` are handled during collation, which can potentially lead to performance regression. This PR undoes that. Pull Request resolved: #97789 Approved by: https://github.com/albanD * Revisit `torch._six.string_classes` removal (#94709) (#97863) Revisit `torch._six.string_classes` (which is `(str, bytes)`) removal: `isinstance(obj, string_classes) -> isinstance(obj, str)`. Both `str` and `bytes` are `Sequence` classes. ```python In [1]: from typing import Sequence In [2]: issubclass(bytes, Sequence) Out[2]: True In [3]: issubclass(str, Sequence) Out[3]: True ``` Re-add `bytes` to type guards like: ```python def is_seq(obj): return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)) ``` Ref: - #94709 (comment) - #97737 - #97789 Pull Request resolved: #97863 Approved by: https://github.com/Skylion007, https://github.com/albanD --------- Co-authored-by: Eric Zhang <[email protected]> Co-authored-by: Kevin Tse <[email protected]>

[DataLoader] Fix collation logic

9feaaca

[ghstack-poisoned]

pytorch-bot bot added the release notes: dataloader release notes category label Mar 28, 2023

NivekT requested a review from albanD March 28, 2023 15:40

NivekT added the topic: improvements topic category label Mar 28, 2023

albanD approved these changes Mar 28, 2023

View reviewed changes

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 28, 2023

albanD mentioned this pull request Mar 28, 2023

[BE] Remove dependency on six and future #94709

Closed

NivekT added this to the 2.0.1 milestone Mar 28, 2023

Update on "[DataLoader] Fix collation logic"

bc0d041

Similar to #97737, a previous auto-refactor changed how `bytes` are handled during collation, which can potentially lead to performance regression. This PR undoes that. [ghstack-poisoned]

pytorchmergebot pushed a commit that referenced this pull request Mar 28, 2023

[DataLoader] Fix collation logic

45ce310

ghstack-source-id: 126f4d7 Pull Request resolved: #97789

pytorchmergebot added the Merged label Mar 28, 2023

pytorchmergebot closed this in bb42104 Mar 28, 2023

XuehaiPan mentioned this pull request Mar 29, 2023

Revisit torch._six.string_classes removal (#94709) #97863

Closed

This was referenced Mar 31, 2023

[release v2.0.1] Revisit torch._six.string_classes removal (#97737, #97789, #97863) #98055

Merged

[v2.0.1] Release Tracker #97272

Closed

NivekT mentioned this pull request Apr 4, 2023

[v2.0.1][DataLoader] Fix collation logic #98348

Closed

jbschlosser mentioned this pull request Apr 17, 2023

default_collate inconsistently structured data #99227

Closed

facebook-github-bot deleted the gh/nivekt/57/head branch June 8, 2023 18:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DataLoader] Fix collation logic#97789

[DataLoader] Fix collation logic#97789
NivekT wants to merge 2 commits intogh/nivekt/57/basefrom
gh/nivekt/57/head

NivekT commented Mar 28, 2023 •

edited by pytorchmergebot

Loading

Uh oh!

pytorch-bot bot commented Mar 28, 2023 •

edited

Loading

Uh oh!

albanD left a comment

Uh oh!

NivekT commented Mar 28, 2023

Uh oh!

pytorchmergebot commented Mar 28, 2023

Uh oh!

pytorchmergebot commented Mar 28, 2023

Uh oh!

pytorchmergebot commented Mar 28, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

NivekT commented Mar 28, 2023 • edited by pytorchmergebot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Mar 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97789

⏳ No Failures, 1 Pending

Uh oh!

albanD left a comment

Choose a reason for hiding this comment

Uh oh!

NivekT commented Mar 28, 2023

Uh oh!

pytorchmergebot commented Mar 28, 2023

Uh oh!

pytorchmergebot commented Mar 28, 2023

Uh oh!

pytorchmergebot commented Mar 28, 2023

Merge started

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

NivekT commented Mar 28, 2023 •

edited by pytorchmergebot

Loading

pytorch-bot bot commented Mar 28, 2023 •

edited

Loading