[release v2.0.1] Revisit torch._six.string_classes removal (#97737, #97789, #97863)#98055

Merged
atalman merged 3 commits into pytorch:release/2.0 from XuehaiPan:revisit-string-classes
Apr 5, 2023

@XuehaiPan
Collaborator

ezhang887 and others added 3 commits March 31, 2023 10:14
…es (pytorch#97737)

Slack thread: https://pytorch.slack.com/archives/GEEQ2K4MD/p1679962409906099

I was seeing a massive (~2x) slowdown on a job after running it on PyTorch 2.0. Profiling with `py-spy` showed the pin_memory thread doing a lot more work than before. In an `nsys` trace, the thread doing the forward pass had a bunch of `pthread_cond_timedwait` calls with GIL reacquisition in its call stack, so it looked like the forward-pass thread was blocked waiting for the GIL while the pin_memory thread held it.

After some debugging I found the issue. In 1.13 (before pytorch#94709), if a `bytes` object was passed into `pin_memory`, it would short-circuit and return here
https://github.com/pytorch/pytorch/blob/d922c29a22e4bf0fba49526f7536395eb8cd66f4/torch/utils/data/_utils/pin_memory.py#L54-L55
since `bytes` was in `torch._six.string_classes`:
```
>>> from torch._six import string_classes
>>> string_classes
(<class 'str'>, <class 'bytes'>)
>>>
```

However, after pytorch#94709, if a `bytes` object was passed into `pin_memory`, it would fall into here instead
https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L68-L73
because the previous check is now doing `isinstance(data, str)` instead of `isinstance(data, (str, bytes))`!
https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L56-L57

As a result, `pin_memory` gets called recursively for each element of the `bytes` object (one call per byte), leading to a ton of wasted recursion. This also explains the slowdown / GIL contention I was seeing.

This PR simply changes `isinstance(data, str)` to `isinstance(data, (str, bytes))` to match the behavior before pytorch#94709
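
To illustrate the cost of the missing `bytes` guard, here is a minimal pure-Python sketch. It is not the actual `pin_memory` code: `walk`, `leaf_types`, and `calls` are hypothetical names, and the function only mimics the "return strings early, otherwise recurse into sequences" shape described above, counting how many calls are made.

```python
from collections.abc import Sequence


def walk(data, leaf_types, calls):
    """Hypothetical stand-in for a recursive container walk (not PyTorch code)."""
    calls[0] += 1  # count every invocation, including the top-level one
    if isinstance(data, leaf_types):
        return data  # short-circuit: string-like data is returned untouched
    if isinstance(data, Sequence):
        # Iterating over bytes yields ints, so each byte triggers one more call.
        items = [walk(item, leaf_types, calls) for item in data]
        try:
            return type(data)(items)  # rebuild the original container type
        except TypeError:
            return items
    return data  # any other leaf (e.g. an int) is returned as-is


payload = b"x" * 1000

calls_with_bytes_guard = [0]
result_with_guard = walk(payload, (str, bytes), calls_with_bytes_guard)

calls_str_only_guard = [0]
result_str_only = walk(payload, (str,), calls_str_only_guard)

print(calls_with_bytes_guard[0], calls_str_only_guard[0])
```

In this sketch both variants return an equal `bytes` value, but the `str`-only guard makes 1001 calls for a 1000-byte payload instead of 1, all of them in pure Python while holding the GIL.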

Pull Request resolved: pytorch#97737
Approved by: https://github.com/albanD, https://github.com/NivekT
Similar to pytorch#97737, a previous auto-refactor changed how `bytes` objects are handled during collation, which can potentially lead to a performance regression. This PR undoes that change.
Pull Request resolved: pytorch#97789
Approved by: https://github.com/albanD
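
As a rough illustration of why the `bytes` handling matters during collation, the sketch below uses a simplified, hypothetical `collate_sketch` function. It is an assumption about the general shape of a recursive collate step (keep string-like samples as-is, transpose other sequences), not the real `default_collate` implementation.

```python
from collections.abc import Sequence


def collate_sketch(batch, leaf_types):
    """Hypothetical simplified collate step (not the PyTorch implementation)."""
    elem = batch[0]
    if isinstance(elem, leaf_types):
        return list(batch)  # string-like samples are kept whole
    if isinstance(elem, Sequence):
        # Transpose the batch and collate each position separately.
        return [collate_sketch(column, leaf_types) for column in zip(*batch)]
    return list(batch)  # other leaves (e.g. ints) are gathered into a list


batch = [b"ab", b"cd"]

kept_whole = collate_sketch(batch, (str, bytes))
split_per_byte = collate_sketch(batch, (str,))

print(kept_whole)
print(split_per_byte)
```

With `bytes` treated as a leaf the samples come back unchanged; with a `str`-only check each sample is taken apart byte by byte, which changes the output and does work proportional to the payload size.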
…97863)

Revisit `torch._six.string_classes` (which is `(str, bytes)`) removal: `isinstance(obj, string_classes) -> isinstance(obj, str)`.

Both `str` and `bytes` are `Sequence` classes.

```python
In [1]: from typing import Sequence

In [2]: issubclass(bytes, Sequence)
Out[2]: True

In [3]: issubclass(str, Sequence)
Out[3]: True
```

Re-add `bytes` to type guards like:

```python
from typing import Sequence

def is_seq(obj):
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))
```
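
A small sketch of the difference between the two guards, for comparison. The name `is_seq_str_only` is hypothetical and stands for the guard as it looked after the auto-refactor; `Sequence` is imported from `collections.abc` here, which is an assumption about the import and behaves the same as `typing.Sequence` for these checks.

```python
from collections.abc import Sequence


def is_seq_str_only(obj):
    # Guard after the refactor (hypothetical name): only `str` is excluded.
    return isinstance(obj, Sequence) and not isinstance(obj, str)


def is_seq(obj):
    # Guard with `bytes` re-added, matching the old `string_classes` behavior.
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))


samples = ["abc", b"abc", [1, 2], (1, 2)]
str_only_results = [is_seq_str_only(s) for s in samples]
restored_results = [is_seq(s) for s in samples]

print(str_only_results)
print(restored_results)
```

The two guards only disagree on the `bytes` sample: the `str`-only version treats it as a generic sequence to be traversed, while the restored version treats it as a leaf.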

Ref:

- pytorch#94709 (comment)
- pytorch#97737
- pytorch#97789
Pull Request resolved: pytorch#97863
Approved by: https://github.com/Skylion007, https://github.com/albanD
@pytorch-bot

pytorch-bot bot commented Mar 31, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98055

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ff41020:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the "release notes: onnx" label (torch.onnx related changes that should show up in the release notes) Mar 31, 2023
@XuehaiPan
Collaborator Author

@pytorchbot label 'topic: bug fixes'

@atalman atalman merged commit c039d2f into pytorch:release/2.0 Apr 5, 2023
@XuehaiPan XuehaiPan deleted the revisit-string-classes branch April 6, 2023 02:10

Labels

open source; release notes: onnx (torch.onnx related changes that should show up in the release notes); topic: bug fixes (topic category)
