Skip to content

Conversation

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139106

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 7132a41 with merge base 9af1816 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

"Expected input to be a checkpoint returned by torch.save but got a torchscript checkpoint"
)
data_file = io.BytesIO(zip_file.get_record("data.pkl"))
instructions = pickletools.genops(data_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure it's safe to use it? Per https://docs.python.org/3/library/pickletools.html#pickletools.genops it looks like it might do some unsafe deserialization to retrieve the arguments...

Copy link
Contributor Author

@mikaylagawarecki mikaylagawarecki Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding from the implementation is that it does not import any modules or call any functions serialized within the pickle file. This huggingface article also calls out that pickletools does not execute any code link

re the decoding of the arguments, that only reads ints/strings and functions used seem to be codecs.decode, struct.unpack, bytearray is this potentially unsafe?

For example, the instruction name and args for a checkpoint created as follows (note none of the arguments on the stack will be there)

import torch
import io

from torch.testing._internal.two_tensor import TwoTensor


t = torch.randn(2, 3)
tt = TwoTensor(t, t)

buffer = io.BytesIO()
torch.save(tt, buffer)
buffer.seek(0)

are

Click here to see output
PROTO 2
GLOBAL torch._tensor _rebuild_from_type_v2
BINPUT 0
MARK None
GLOBAL torch._utils _rebuild_wrapper_subclass
BINPUT 1
GLOBAL torch.testing._internal.two_tensor TwoTensor
BINPUT 2
MARK None
BINGET 2
GLOBAL torch float32
BINPUT 3
BININT1 2
BININT1 3
TUPLE2 None
BINPUT 4
BININT1 3
BININT1 1
TUPLE2 None
BINPUT 5
BININT1 0
GLOBAL torch.serialization _get_layout
BINPUT 6
BINUNICODE torch.strided
BINPUT 7
TUPLE1 None
BINPUT 8
REDUCE None
BINPUT 9
GLOBAL torch device
BINPUT 10
BINUNICODE cpu
BINPUT 11
TUPLE1 None
BINPUT 12
REDUCE None
BINPUT 13
NEWFALSE None
TUPLE None
BINPUT 14
EMPTY_DICT None
BINPUT 15
MARK None
BINUNICODE a
BINPUT 16
GLOBAL torch._utils _rebuild_tensor_v2
BINPUT 17
MARK None
MARK None
BINUNICODE storage
BINPUT 18
GLOBAL torch FloatStorage
BINPUT 19
BINUNICODE 0
BINPUT 20
BINUNICODE cpu
BINPUT 21
BININT1 6
TUPLE None
BINPUT 22
BINPERSID None
BININT1 0
BININT1 2
BININT1 3
TUPLE2 None
BINPUT 23
BININT1 3
BININT1 1
TUPLE2 None
BINPUT 24
NEWFALSE None
GLOBAL collections OrderedDict
BINPUT 25
EMPTY_TUPLE None
REDUCE None
BINPUT 26
TUPLE None
BINPUT 27
REDUCE None
BINPUT 28
BINUNICODE b
BINPUT 29
BINGET 28
SETITEMS None
TUPLE None
BINPUT 30
REDUCE None
BINPUT 31
STOP None

Copy link
Contributor Author

@mikaylagawarecki mikaylagawarecki Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A potential alternative would be for me to rewrite the pickletools logic to only get the strings after the GLOBAL instruction, would this be preferable? i.e. add a mode to weights_only unpickler to do this

Copy link
Contributor Author

@mikaylagawarecki mikaylagawarecki Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #139221 that does this without pickletools for consideration, that might give us a bit more confidence that nothing unsafe is being done. wdyt?

That said I don't think pickletools i doing anything that jumps out to me as blatantly unsafe.

@mikaylagawarecki mikaylagawarecki added module: python frontend For issues relating to PyTorch's Python frontend topic: improvements topic category release notes: python_frontend python frontend release notes category and removed module: python frontend For issues relating to PyTorch's Python frontend labels Oct 29, 2024
mikaylagawarecki added a commit that referenced this pull request Oct 30, 2024
…heckpoint (no pickletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 30, 2024
…ckletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 30, 2024
…heckpoint (no pickletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 30, 2024
…ckletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 30, 2024
…heckpoint (no pickletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 30, 2024
…ckletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 31, 2024
…heckpoint (no pickletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 31, 2024
…ckletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 31, 2024
…heckpoint (no pickletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 31, 2024
…ckletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 31, 2024
…heckpoint (no pickletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Oct 31, 2024
…ckletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
@mikaylagawarecki
Copy link
Contributor Author

mikaylagawarecki commented Oct 31, 2024

Closing as #139221 was the preferred approach

mikaylagawarecki added a commit that referenced this pull request Nov 1, 2024
…heckpoint (no pickletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Nov 1, 2024
…ckletools dependency)"

#139106 without pickletools




[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Nov 1, 2024
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
@github-actions github-actions bot deleted the gh/mikaylagawarecki/279/head branch December 1, 2024 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release notes: python_frontend python frontend release notes category topic: improvements topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants