Skip to content

Conversation

@jjlilley
Copy link

@jjlilley jjlilley commented Oct 16, 2019

Stack from ghstack:

For pushLong() in Pickler, it looks like we only use for a single use case, with a 10-byte value.

We were handling > 256 bytes incorrectly, by using a LONG4 opcode (expecting 4-byte length), but pushing 8 bytes. We could harden this handling, but rather than improve codepaths that we never expect to use, this change simply removes the incorrect codepath and adds and assert.

Differential Revision: D17934174

LONG4 length should be a 4-byte integer, per:
  https://github.com/python/cpython/blob/2.7/Lib/pickletools.py#L677

Document also seems to indicate that it should be signed.
Also, support on the unpickle side.

Differential Revision: [D17934174](https://our.internmc.facebook.com/intern/diff/D17934174/)

[ghstack-poisoned]
@jjlilley jjlilley requested a review from apaszke as a code owner October 16, 2019 00:21
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Oct 16, 2019
jjlilley pushed a commit that referenced this pull request Oct 16, 2019
LONG4 length should be a 4-byte integer, per:
  https://github.com/python/cpython/blob/2.7/Lib/pickletools.py#L677

Document also seems to indicate that it should be signed.
Also, support on the unpickle side.

Differential Revision: [D17934174](https://our.internmc.facebook.com/intern/diff/D17934174/)

ghstack-source-id: 91957163
Pull Request resolved: #28057
@jjlilley jjlilley requested review from resistor and zdevito October 16, 2019 00:34
Copy link
Contributor

@zdevito zdevito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LONG4 is suppose to first read 4 bytes to determine the size of the data to read and then read that many bytes. I am not sure what our implementation does (either before or after the patch). LONG1 looks more correct, though even that only supports numbers that are precisely 8 bytes. @driazati for what is suppose to happen here. I think we should just delete LONG4 handling, and assert in LONG1 handling that length == 8 on write (currently we only do it on read).

@jjlilley
Copy link
Author

Thanks for the feedback.
I'm obviously not the expert in pickle opcodes, mostly noticed this in the other change and wanted to avoid an incorrect-size read.
AFAICT, it looks like there's only one caller of the pushLong function, and it's called with a 10-byte magic number, so asserting size==8 on writing a LONG1 isn
't going to universally work.

@driazati
Copy link
Contributor

LONG1 of size 10 is needed for #25109, but I don't think we have a real use case for LONG4 in the pickler, since TorchScript's ints are all int64_ts. I think we should just remove the LONG4 handling and error out if the code tries to go down that path.

@jjlilley
Copy link
Author

In any case, closing this specific PR.

@jjlilley jjlilley closed this Oct 16, 2019
@jjlilley jjlilley reopened this Oct 16, 2019
@jjlilley jjlilley changed the title [pytorch] Fix LONG4 length in pickler. [pytorch] Fix pushLong() issue pickler. Oct 16, 2019
@jjlilley jjlilley changed the title [pytorch] Fix pushLong() issue pickler. [pytorch] Fix pushLong() issue in pickler. Oct 16, 2019
For pushLong() in Pickler, it looks like we only use for a single use case, with a 10-byte value.

We were handling > 256 bytes incorrectly, by using a LONG4 opcode (expecting 4-byte length), but pushing 8 bytes. We could harden this handling, but rather than improve codepaths that we never expect to use, this change simply removes the incorrect codepath and adds and assert.

Differential Revision: [D17934174](https://our.internmc.facebook.com/intern/diff/D17934174/)

[ghstack-poisoned]
jjlilley pushed a commit that referenced this pull request Oct 16, 2019
Pull Request resolved: #28057

LONG4 length should be a 4-byte integer, per:
  https://github.com/python/cpython/blob/2.7/Lib/pickletools.py#L677

Document also seems to indicate that it should be signed.
Also, support on the unpickle side.
ghstack-source-id: 92038641

Differential Revision: [D17934174](https://our.internmc.facebook.com/intern/diff/D17934174/)
@jjlilley
Copy link
Author

LONG1 of size 10 is needed for #25109, but I don't think we have a real use case for LONG4 in the pickler, since TorchScript's ints are all int64_ts. I think we should just remove the LONG4 handling and error out if the code tries to go down that path.

That sounds reasonable, uploading a version that simply removes the incorrect codepath and asserts that the size fits in 1 byte.

@jjlilley jjlilley requested a review from zdevito October 16, 2019 20:18
}
TORCH_INTERNAL_ASSERT(
size <= std::numeric_limits<uint8_t>::max(),
"Cannot pickle a long with a size larger than 256 bytes");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say Cannot pickle a long with a size larger than 1 byte

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change this message, but given that data.size() can be up to 256, saying that the long's size can't be larger than 1 byte doesn't seem accurate.

I could say the more-awkward-but-accurate:
"Cannot pickle a long with the size of size is larger than 1 byte"

Or, is there less-awkward-but-also-accurate preferred version of the assert text?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just Cannot pickle a long larger than 255 bytes (0 is a valid size, 256 is not)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks!

For pushLong() in Pickler, it looks like we only use for a single use case, with a 10-byte value.

We were handling > 256 bytes incorrectly, by using a LONG4 opcode (expecting 4-byte length), but pushing 8 bytes. We could harden this handling, but rather than improve codepaths that we never expect to use, this change simply removes the incorrect codepath and adds and assert.

Differential Revision: [D17934174](https://our.internmc.facebook.com/intern/diff/D17934174/)

[ghstack-poisoned]
jjlilley pushed a commit that referenced this pull request Oct 16, 2019
Pull Request resolved: #28057

For pushLong() in Pickler, it looks like we only use for a single use case, with a 10-byte value.

We were handling > 256 bytes incorrectly, by using a LONG4 opcode (expecting 4-byte length), but pushing 8 bytes. We could harden this handling, but rather than improve codepaths that we never expect to use, this change simply removes the incorrect codepath and adds and assert.

ghstack-source-id: 92048325

Differential Revision: [D17934174](https://our.internmc.facebook.com/intern/diff/D17934174/)
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in ff00e8c.

@facebook-github-bot facebook-github-bot deleted the gh/jjlilley/3/head branch October 28, 2019 22:16
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#28057

For pushLong() in Pickler, it looks like we only use for a single use case, with a 10-byte value.

We were handling > 256 bytes incorrectly, by using a LONG4 opcode (expecting 4-byte length), but pushing 8 bytes. We could harden this handling, but rather than improve codepaths that we never expect to use, this change simply removes the incorrect codepath and adds and assert.

ghstack-source-id: 92048325

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17934174

fbshipit-source-id: ecc1ca37dbcc87151fc5bf2ffb6b05dff91d3667
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants