-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Add utility to get all unsafe globals in checkpoint #139106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add utility to get all unsafe globals in checkpoint #139106
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139106
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 7132a41 with merge base 9af1816 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
| "Expected input to be a checkpoint returned by torch.save but got a torchscript checkpoint" | ||
| ) | ||
| data_file = io.BytesIO(zip_file.get_record("data.pkl")) | ||
| instructions = pickletools.genops(data_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure it's safe to use it? Per https://docs.python.org/3/library/pickletools.html#pickletools.genops it looks like it might do some unsafe deserialization to retrieve the arguments...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding from the implementation is that it does not import any modules or call any functions serialized within the pickle file. This huggingface article also calls out that pickletools does not execute any code link
re the decoding of the arguments, that only reads ints/strings and functions used seem to be codecs.decode, struct.unpack, bytearray is this potentially unsafe?
For example, the instruction name and args for a checkpoint created as follows (note none of the arguments on the stack will be there)
import torch
import io
from torch.testing._internal.two_tensor import TwoTensor
t = torch.randn(2, 3)
tt = TwoTensor(t, t)
buffer = io.BytesIO()
torch.save(tt, buffer)
buffer.seek(0)are
Click here to see output
PROTO 2
GLOBAL torch._tensor _rebuild_from_type_v2
BINPUT 0
MARK None
GLOBAL torch._utils _rebuild_wrapper_subclass
BINPUT 1
GLOBAL torch.testing._internal.two_tensor TwoTensor
BINPUT 2
MARK None
BINGET 2
GLOBAL torch float32
BINPUT 3
BININT1 2
BININT1 3
TUPLE2 None
BINPUT 4
BININT1 3
BININT1 1
TUPLE2 None
BINPUT 5
BININT1 0
GLOBAL torch.serialization _get_layout
BINPUT 6
BINUNICODE torch.strided
BINPUT 7
TUPLE1 None
BINPUT 8
REDUCE None
BINPUT 9
GLOBAL torch device
BINPUT 10
BINUNICODE cpu
BINPUT 11
TUPLE1 None
BINPUT 12
REDUCE None
BINPUT 13
NEWFALSE None
TUPLE None
BINPUT 14
EMPTY_DICT None
BINPUT 15
MARK None
BINUNICODE a
BINPUT 16
GLOBAL torch._utils _rebuild_tensor_v2
BINPUT 17
MARK None
MARK None
BINUNICODE storage
BINPUT 18
GLOBAL torch FloatStorage
BINPUT 19
BINUNICODE 0
BINPUT 20
BINUNICODE cpu
BINPUT 21
BININT1 6
TUPLE None
BINPUT 22
BINPERSID None
BININT1 0
BININT1 2
BININT1 3
TUPLE2 None
BINPUT 23
BININT1 3
BININT1 1
TUPLE2 None
BINPUT 24
NEWFALSE None
GLOBAL collections OrderedDict
BINPUT 25
EMPTY_TUPLE None
REDUCE None
BINPUT 26
TUPLE None
BINPUT 27
REDUCE None
BINPUT 28
BINUNICODE b
BINPUT 29
BINGET 28
SETITEMS None
TUPLE None
BINPUT 30
REDUCE None
BINPUT 31
STOP None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A potential alternative would be for me to rewrite the pickletools logic to only get the strings after the GLOBAL instruction, would this be preferable? i.e. add a mode to weights_only unpickler to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #139221 that does this without pickletools for consideration, that might give us a bit more confidence that nothing unsafe is being done. wdyt?
That said I don't think pickletools i doing anything that jumps out to me as blatantly unsafe.
[ghstack-poisoned]
cc albanD [ghstack-poisoned]
Address #129698 cc albanD [ghstack-poisoned]
…heckpoint (no pickletools dependency)" #139106 without pickletools [ghstack-poisoned]
…ckletools dependency)" #139106 without pickletools [ghstack-poisoned]
Address #129698 cc albanD [ghstack-poisoned]
…heckpoint (no pickletools dependency)" #139106 without pickletools [ghstack-poisoned]
…ckletools dependency)" #139106 without pickletools [ghstack-poisoned]
…heckpoint (no pickletools dependency)" #139106 without pickletools [ghstack-poisoned]
…ckletools dependency)" #139106 without pickletools [ghstack-poisoned]
…heckpoint (no pickletools dependency)" #139106 without pickletools [ghstack-poisoned]
…ckletools dependency)" #139106 without pickletools [ghstack-poisoned]
…heckpoint (no pickletools dependency)" #139106 without pickletools [ghstack-poisoned]
…ckletools dependency)" #139106 without pickletools [ghstack-poisoned]
…heckpoint (no pickletools dependency)" #139106 without pickletools [ghstack-poisoned]
…ckletools dependency)" #139106 without pickletools [ghstack-poisoned]
|
Closing as #139221 was the preferred approach |
…heckpoint (no pickletools dependency)" #139106 without pickletools [ghstack-poisoned]
…ckletools dependency)" #139106 without pickletools [ghstack-poisoned]
…ependency) (pytorch#139221) Fixes pytorch#129698 pytorch#139106 without pickletools Pull Request resolved: pytorch#139221 Approved by: https://github.com/malfet ghstack dependencies: pytorch#138936
Address #129698
Stack from ghstack (oldest at bottom):
cc @albanD