Skip to content

Conversation

@mikaylagawarecki
Copy link
Contributor

@mikaylagawarecki mikaylagawarecki commented Jun 21, 2024

Since we use DEFAULT_PROTOCOL=2, some functions/classes that were renamed from python 2-->3 will be pickled with their python2 name. This PR ensures that when a mod GLOBAL <python2_mod>.<python2_name> is encountered, following the strategy used by pickle it is properly mapped to <python3_mod>.<python3_name>.

This fix ensures that add_safe_globals works properly for such functions/classes (i.e. users will allowlist the python3 func and the weights_only unpickler will do the appropriate translation when checking whether a class was allowlisted).

An example is as follows:
__builtin__ was named to builtins, see the release notes for Python 3.0

Renamed module __builtin__ to builtins (removing the underscores, adding an ‘s’). The builtins variable found in most global namespaces is unchanged. To modify a builtin, you should use builtins, not __builtins__!

However, since we use DEFAULT_PROTOCOL=2, builtins will be pickled with their module string as __builtin__.

>>> import pickle
>>> import pickletools
>>> print.__module__
'builtins'
>>> with open('print.pkl', 'wb') as f:
>>>      pickle.dump(print, f, protocol=2) # 2 because this is the default protocol used by pytorch
>>> with open('print.pkl', 'rb') as f:
>>>     pickletools.dis(f)
0: \x80 PROTO      2
2: c    GLOBAL     '__builtin__ print' # pickle saves the module string as __builtin__ !!! :(
21: q    BINPUT     0
23: .    STOP

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Jun 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129244

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit afb4496 with merge base 0acd09a (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]
[ghstack-poisoned]
@mikaylagawarecki mikaylagawarecki requested a review from albanD June 21, 2024 21:16
@mikaylagawarecki mikaylagawarecki marked this pull request as ready for review June 21, 2024 21:16
[ghstack-poisoned]
# weights = torch.load(buf, weights_only = True)

import functools as _functools
from _compat_pickle import IMPORT_MAPPING, NAME_MAPPING
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a comment that this is privte cpython.
try/catch to handle when it's missing (nice error message)
And make it is available for all versions we use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid warning on import torch, I'm warning only when Unpickler.load is called, does this sound reasonable to you?

[ghstack-poisoned]
@mikaylagawarecki mikaylagawarecki requested a review from albanD June 24, 2024 20:34
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@mikaylagawarecki mikaylagawarecki added release notes: python_frontend python frontend release notes category topic: improvements topic category topic: bug fixes topic category and removed topic: improvements topic category labels Jun 24, 2024
pytorchmergebot pushed a commit that referenced this pull request Jun 25, 2024
…n.add_safe_globals (#129251)

Previously, allowlisting functions/classes via `torch.serialization.add_safe_globals(obj)` for the `weights_only` Unpickler had the following effect:

- For a [`GLOBAL`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1939) instruction, `GLOBAL obj.__module__ obj.__name__` would be allowed and translated back to obj to be pushed back to the stack.
- For a [`REDUCE`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1982) instruction where we expect the stack to contain `func` and `args`, `func` is allowed if it was added via `add_safe_globals`

However, it did not have an effect on `BUILD` and `NEWOBJ` instructions

Some classes may be rebuilt via [`NEWOBJ`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L2091-L2104) instruction, which indicates that their constructor should be used to rebuild the class.

Further, a [`BUILD`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1984-L2007) instruction might be used if an object's `__reduce__`/`__reduce_ex__` returns a non-None value for `state`. Which indicates a `__setstate__` or `__dict__.update`.

**This PR makes sure that adding objects to the allowlist will also allow `NEWOBJ` and `BUILD` instructions for them.**

In particular, the update for `NEWOBJ` should unblock allowlisting of [`ScaledMMConfig`](https://github.com/pytorch-labs/float8_experimental/blob/d4ade877dff327ea7f51e91f7cc218ae956e8cfd/float8_experimental/float8_tensor.py#L26-L30) in float8_experimental @drisspg

Pull Request resolved: #129251
Approved by: https://github.com/albanD
ghstack dependencies: #129244
pytorchmergebot pushed a commit that referenced this pull request Jun 25, 2024
Also changes default for `weights_only` to `None` per comment below (hence the `suppress-bc-linter` tag)

Pull Request resolved: #129239
Approved by: https://github.com/albanD
ghstack dependencies: #129244, #129251
mikaylagawarecki added a commit to mikaylagawarecki/pytorch that referenced this pull request Jun 25, 2024
ghstack-source-id: de329c7
Pull Request resolved: pytorch#129244

(cherry picked from commit cc99c01)
atalman pushed a commit that referenced this pull request Jun 26, 2024
* Fix allowlisting of builtins for weights_only unpickler

ghstack-source-id: de329c7
Pull Request resolved: #129244

(cherry picked from commit cc99c01)

* Allow NEWOBJ instruction for items added via torch.serialization.add_safe_globals

ghstack-source-id: 34a8fc3
Pull Request resolved: #129251

(cherry picked from commit 50b888d)

* Add warning for weights_only

ghstack-source-id: ffa772c
Pull Request resolved: #129239

(cherry picked from commit b3f9aa3f8f4c03b40fed53423d4a0a9340e3bd09)

* Add example for torch.serialization.add_safe_globals

ghstack-source-id: 6dc3275
Pull Request resolved: #129396

(cherry picked from commit ed8c36eda0f4dcf7b1d9c5eb2fb1cdccdf3fee6e)
atalman added a commit that referenced this pull request Jun 26, 2024
atalman added a commit that referenced this pull request Jun 26, 2024
…4" (#129571)

Revert "Cherry pick #129244, #129251, #129239, 129396 into release/2.4 (#129478)"

This reverts commit 22a4d46.
mikaylagawarecki added a commit to mikaylagawarecki/pytorch that referenced this pull request Jun 26, 2024
Since we use [`DEFAULT_PROTOCOL=2`](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L62), some functions/classes that were renamed from python 2-->3 will be pickled with their python2 name. This PR ensures that when a mod `GLOBAL <python2_mod>.<python2_name> ` is encountered, [following the strategy used by pickle](https://github.com/python/cpython/blob/main/Lib/pickle.py#L1590C13-L1593C63) it is properly mapped to `<python3_mod>.<python3_name>`.

This fix ensures that `add_safe_globals` works properly for such functions/classes (i.e. users will allowlist the python3 func and the weights_only unpickler will do the appropriate translation when checking whether a class was allowlisted).

An example is as follows:
`__builtin__` was named to `builtins`, see the [release notes for Python 3.0](https://docs.python.org/3/whatsnew/3.0.html)

> Renamed module `__builtin__` to [`builtins`](https://docs.python.org/3/library/builtins.html#module-builtins) (removing the underscores, adding an ‘s’). The __builtins__ variable found in most global namespaces is unchanged. To modify a builtin, you should use [builtins](https://docs.python.org/3/library/builtins.html#module-builtins), not `__builtins__`!

However, since we use [`DEFAULT_PROTOCOL=2`](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L62), builtins will be pickled with their module string as `__builtin__`.

```python
>>> import pickle
>>> import pickletools
>>> print.__module__
'builtins'
>>> with open('print.pkl', 'wb') as f:
>>>      pickle.dump(print, f, protocol=2) # 2 because this is the default protocol used by pytorch
>>> with open('print.pkl', 'rb') as f:
>>>     pickletools.dis(f)
0: \x80 PROTO      2
2: c    GLOBAL     '__builtin__ print' # pickle saves the module string as __builtin__ !!! :(
21: q    BINPUT     0
23: .    STOP
```

Pull Request resolved: pytorch#129244
Approved by: https://github.com/albanD
mikaylagawarecki added a commit to mikaylagawarecki/pytorch that referenced this pull request Jun 26, 2024
…n.add_safe_globals (pytorch#129251)

Previously, allowlisting functions/classes via `torch.serialization.add_safe_globals(obj)` for the `weights_only` Unpickler had the following effect:

- For a [`GLOBAL`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1939) instruction, `GLOBAL obj.__module__ obj.__name__` would be allowed and translated back to obj to be pushed back to the stack.
- For a [`REDUCE`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1982) instruction where we expect the stack to contain `func` and `args`, `func` is allowed if it was added via `add_safe_globals`

However, it did not have an effect on `BUILD` and `NEWOBJ` instructions

Some classes may be rebuilt via [`NEWOBJ`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L2091-L2104) instruction, which indicates that their constructor should be used to rebuild the class.

Further, a [`BUILD`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1984-L2007) instruction might be used if an object's `__reduce__`/`__reduce_ex__` returns a non-None value for `state`. Which indicates a `__setstate__` or `__dict__.update`.

**This PR makes sure that adding objects to the allowlist will also allow `NEWOBJ` and `BUILD` instructions for them.**

In particular, the update for `NEWOBJ` should unblock allowlisting of [`ScaledMMConfig`](https://github.com/pytorch-labs/float8_experimental/blob/d4ade877dff327ea7f51e91f7cc218ae956e8cfd/float8_experimental/float8_tensor.py#L26-L30) in float8_experimental @drisspg

Pull Request resolved: pytorch#129251
Approved by: https://github.com/albanD
ghstack dependencies: pytorch#129244
atalman pushed a commit that referenced this pull request Jun 27, 2024
* Fix allowlisting of builtins for weights_only unpickler (#129244)

Since we use [`DEFAULT_PROTOCOL=2`](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L62), some functions/classes that were renamed from python 2-->3 will be pickled with their python2 name. This PR ensures that when a mod `GLOBAL <python2_mod>.<python2_name> ` is encountered, [following the strategy used by pickle](https://github.com/python/cpython/blob/main/Lib/pickle.py#L1590C13-L1593C63) it is properly mapped to `<python3_mod>.<python3_name>`.

This fix ensures that `add_safe_globals` works properly for such functions/classes (i.e. users will allowlist the python3 func and the weights_only unpickler will do the appropriate translation when checking whether a class was allowlisted).

An example is as follows:
`__builtin__` was named to `builtins`, see the [release notes for Python 3.0](https://docs.python.org/3/whatsnew/3.0.html)

> Renamed module `__builtin__` to [`builtins`](https://docs.python.org/3/library/builtins.html#module-builtins) (removing the underscores, adding an ‘s’). The __builtins__ variable found in most global namespaces is unchanged. To modify a builtin, you should use [builtins](https://docs.python.org/3/library/builtins.html#module-builtins), not `__builtins__`!

However, since we use [`DEFAULT_PROTOCOL=2`](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L62), builtins will be pickled with their module string as `__builtin__`.

```python
>>> import pickle
>>> import pickletools
>>> print.__module__
'builtins'
>>> with open('print.pkl', 'wb') as f:
>>>      pickle.dump(print, f, protocol=2) # 2 because this is the default protocol used by pytorch
>>> with open('print.pkl', 'rb') as f:
>>>     pickletools.dis(f)
0: \x80 PROTO      2
2: c    GLOBAL     '__builtin__ print' # pickle saves the module string as __builtin__ !!! :(
21: q    BINPUT     0
23: .    STOP
```

Pull Request resolved: #129244
Approved by: https://github.com/albanD

* Allow BUILD/NEWOBJ instruction for items added via torch.serialization.add_safe_globals (#129251)

Previously, allowlisting functions/classes via `torch.serialization.add_safe_globals(obj)` for the `weights_only` Unpickler had the following effect:

- For a [`GLOBAL`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1939) instruction, `GLOBAL obj.__module__ obj.__name__` would be allowed and translated back to obj to be pushed back to the stack.
- For a [`REDUCE`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1982) instruction where we expect the stack to contain `func` and `args`, `func` is allowed if it was added via `add_safe_globals`

However, it did not have an effect on `BUILD` and `NEWOBJ` instructions

Some classes may be rebuilt via [`NEWOBJ`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L2091-L2104) instruction, which indicates that their constructor should be used to rebuild the class.

Further, a [`BUILD`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1984-L2007) instruction might be used if an object's `__reduce__`/`__reduce_ex__` returns a non-None value for `state`. Which indicates a `__setstate__` or `__dict__.update`.

**This PR makes sure that adding objects to the allowlist will also allow `NEWOBJ` and `BUILD` instructions for them.**

In particular, the update for `NEWOBJ` should unblock allowlisting of [`ScaledMMConfig`](https://github.com/pytorch-labs/float8_experimental/blob/d4ade877dff327ea7f51e91f7cc218ae956e8cfd/float8_experimental/float8_tensor.py#L26-L30) in float8_experimental @drisspg

Pull Request resolved: #129251
Approved by: https://github.com/albanD
ghstack dependencies: #129244

* Remove dependency on private _compat_pickle in CPython

ghstack-source-id: 7d6ee40
Pull Request resolved: #129509
@github-actions github-actions bot deleted the gh/mikaylagawarecki/225/head branch July 26, 2024 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged release notes: python_frontend python frontend release notes category topic: bug fixes topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants