Conversation

@jjlilley jjlilley commented Oct 11, 2019

Summary:
This change uses a small buffer in the Unpickler to avoid
calling reader_() byte-by-byte. In particular, the unpickler has a
tight loop reading 1-byte opcodes.

This can be more efficient because we avoid the variable-sized
memcpy (due to templating) and std::function indirection for the
common fast path.

This improves the unpickle-1m-ints benchmark by ~20%.

This change requires changing the std::function<> interface
to Unpickler to return size_t rather than bool, but there are
only a few uses of this API.
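To illustrate the idea, here is a minimal sketch of such a buffered reader (names, buffer size, and the `BufferedReader` class are illustrative, not the actual PyTorch code). The point is that the 1-byte opcode fast path becomes a buffer index bump, and `reader_()` is invoked once per buffer refill rather than once per byte; the reader also returns `size_t` (bytes read) rather than `bool`, matching the interface change described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch of the small-buffer technique, assuming a
// reader_() callback that returns the number of bytes it produced.
class BufferedReader {
 public:
  explicit BufferedReader(std::function<std::size_t(char*, std::size_t)> reader)
      : reader_(std::move(reader)) {}

  // Fast path for the tight 1-byte opcode loop: usually just an index bump,
  // with no memcpy and no std::function call.
  char read1() {
    if (buffer_pos_ == buffer_remaining_) {
      // Slow path: refill the whole buffer with a single bulk reader_() call.
      buffer_remaining_ = reader_(buffer_, kBufferSize);
      buffer_pos_ = 0;
      assert(buffer_remaining_ > 0 && "unexpected end of input");
    }
    return buffer_[buffer_pos_++];
  }

 private:
  static constexpr std::size_t kBufferSize = 256;  // illustrative size
  std::function<std::size_t(char*, std::size_t)> reader_;
  char buffer_[kBufferSize];
  std::size_t buffer_pos_ = 0;        // next unread byte in buffer_
  std::size_t buffer_remaining_ = 0;  // valid bytes currently in buffer_
};
```

With a 256-byte buffer, a run of one million 1-byte opcode reads costs on the order of four thousand `reader_()` calls instead of one million, which is consistent with the benchmark improvement reported above.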

Test Plan: tests

Differential Revision: D17869980

fbshipit-source-id: fdfc6acea613ad8d7ec2dd70ceafa4cb79c0367a
@jjlilley jjlilley requested a review from apaszke as a code owner October 11, 2019 02:03
@pytorchbot pytorchbot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Oct 11, 2019
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D17869980

@jjlilley jjlilley requested review from resistor, suo and zdevito October 11, 2019 17:20
memcpy(&data[0], buffer_.data() + buffer_pos_, from_old_buf);
}
const size_t needed = length - from_old_buf;
size_t nread = reader_(&data[from_old_buf], needed);
Contributor
Shouldn't we do an over-read here? Otherwise won't the buffer usually be empty?

Contributor
This is because needed might be larger than the buffer size (the value can be a string of any length), so we cannot always read into the buffer; there may not be enough room. There is also a tradeoff: once the size is large, reading into the buffer and then copying into the string adds cost, while the relative overhead of the reader_ call is smaller.

Author
Right, the reason is the arbitrary length.
There seems to be a threshold (maybe a cache line or two?) below which the double-buffering cost is effectively negligible, but once the read grows too large, reading directly into the destination makes more sense.
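A sketch of the large-read path being discussed (struct and field names here are illustrative, not the exact PyTorch implementation): small reads drain the scratch buffer, and the remainder, which may exceed the buffer size, is read straight into the caller's storage, as in the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical buffered-unpickler state, assuming a reader_ callback
// that returns the number of bytes read.
struct UnpicklerBuf {
  std::function<std::size_t(char*, std::size_t)> reader_;
  char buffer_[64];                     // small scratch buffer
  std::size_t buffer_pos_ = 0;          // next unread byte in buffer_
  std::size_t buffer_remaining_ = 0;    // valid bytes in buffer_
};

// Copy whatever is left in the buffer, then read the (possibly large)
// remainder directly into the destination: `needed` can be arbitrarily
// large (e.g. a string payload), so it may not fit in the scratch buffer.
void readBytes(UnpicklerBuf& u, char* data, std::size_t length) {
  const std::size_t from_old_buf =
      std::min(length, u.buffer_remaining_ - u.buffer_pos_);
  if (from_old_buf > 0) {
    std::memcpy(data, u.buffer_ + u.buffer_pos_, from_old_buf);
    u.buffer_pos_ += from_old_buf;
  }
  const std::size_t needed = length - from_old_buf;
  if (needed > 0) {
    const std::size_t nread = u.reader_(data + from_old_buf, needed);
    if (nread != needed) {
      throw std::runtime_error("unexpected end of input");
    }
  }
}
```

This avoids the extra buffer-then-copy memcpy for large reads, while small reads still amortize the `reader_` call through the buffer.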

Contributor

@zdevito zdevito left a comment
Looks good, thanks!

@jjlilley
Author

Thanks for looking at this!

@facebook-github-bot
Contributor

This pull request has been merged in 7e8420b.

thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#27727

This change uses a small buffer in the Unpickler to avoid
calling reader_() byte-by-byte. Particularly, the unpickler has a
tight loop reading 1-byte opcodes.

This can be more efficient because we avoid the variable-sized
memcpy (due to templating) and std::function indirection for the
common fast path.

This improves the unpickle-1m-ints benchmark by ~20%.

This change requires changing the std::function<> interface
to Unpickler to return size_t rather than bool, but there are
only a few uses of this api.

Test Plan:
buck test caffe2/test/...
benchmark in experimental/jeremyl/c2/SerializationBench

Differential Revision: D17869980

fbshipit-source-id: 37e752744d19e12b7282252c8963355970bd4feb