Buffer to speed Unpickler #27727
Conversation
Summary: This change uses a small buffer in the Unpickler to avoid calling reader_() byte-by-byte. In particular, the unpickler has a tight loop reading 1-byte opcodes. Buffering is more efficient because it avoids the variable-sized memcpy (due to templating) and the std::function indirection on the common fast path. This improves the unpickle-1m-ints benchmark by ~20%.

This change requires the std::function<> interface to Unpickler to return size_t rather than bool, but there are only a few uses of this API.

Test Plan: tests

Differential Revision: D17869980
fbshipit-source-id: fdfc6acea613ad8d7ec2dd70ceafa4cb79c0367a
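The fast path described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual PyTorch code: the class and member names (`BufferedReader`, `read1`) are hypothetical, but it shows the idea of a small fixed buffer satisfying most 1-byte opcode reads without going through the `std::function` callback, which now returns `size_t` (bytes read) rather than `bool`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical sketch: a small fixed buffer amortizes reader_() calls
// so the 1-byte opcode loop rarely pays the std::function indirection.
class BufferedReader {
 public:
  explicit BufferedReader(std::function<size_t(char*, size_t)> reader)
      : reader_(std::move(reader)) {}

  // Fast path: most calls are served straight from the buffer; the
  // reader_ callback is only invoked when the buffer runs dry.
  char read1() {
    if (buffer_pos_ == buffer_len_) {
      buffer_len_ = reader_(buffer_.data(), buffer_.size());
      buffer_pos_ = 0;
      assert(buffer_len_ > 0 && "unexpected end of input");
    }
    return buffer_[buffer_pos_++];
  }

 private:
  std::function<size_t(char*, size_t)> reader_;  // returns bytes read
  std::array<char, 256> buffer_;                 // small; size is illustrative
  size_t buffer_pos_ = 0;                        // next unread byte
  size_t buffer_len_ = 0;                        // valid bytes in buffer_
};
```

For a stream of a million 1-byte opcodes, this reduces callback invocations from one per byte to roughly one per 256 bytes, which is consistent with the ~20% win reported on the unpickle-1m-ints benchmark.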
This pull request was exported from Phabricator. Differential Revision: D17869980
```cpp
  memcpy(&data[0], buffer_.data() + buffer_pos_, from_old_buf);
}
const size_t needed = length - from_old_buf;
size_t nread = reader_(&data[from_old_buf], needed);
```
Shouldn't we do an over-read here? Otherwise won't the buffer usually be empty?
This is because needed might be larger than the buffer size (the payload is a string of arbitrary length), so we cannot always read into the buffer; there might not be enough room there. There is also a tradeoff: once the size is large, reading into the buffer and then copying into the string adds a cost that grows with the size, while the relative overhead of the reader_ call shrinks.
Right, the reason is the arbitrary length.
It seems there is a threshold (maybe around a cache line or two?) below which the double-buffering cost is effectively small; once the read grows beyond that, reading directly into the destination makes more sense.
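The read path being discussed can be sketched like this. The structure (struct name, buffer size, exact member names other than those visible in the quoted diff) is illustrative rather than the actual PyTorch implementation: drain whatever remains in the small buffer first, then fetch the rest straight into the destination, since `length` may exceed the buffer's capacity.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Illustrative sketch of the read path under discussion: copy the
// buffered tail first, then read the remainder directly into `data`,
// because `length` can exceed the buffer size.
struct UnpicklerSketch {
  std::function<size_t(char*, size_t)> reader_;  // returns bytes read
  std::vector<char> buffer_ = std::vector<char>(64);
  size_t buffer_pos_ = 0;   // next unread byte in buffer_
  size_t buffer_len_ = 0;   // valid bytes in buffer_

  bool readBytes(char* data, size_t length) {
    // Drain whatever the small buffer still holds.
    const size_t from_old_buf = std::min(length, buffer_len_ - buffer_pos_);
    if (from_old_buf != 0) {
      std::memcpy(data, buffer_.data() + buffer_pos_, from_old_buf);
      buffer_pos_ += from_old_buf;
    }
    const size_t needed = length - from_old_buf;
    if (needed == 0) {
      return true;
    }
    // Large reads bypass the buffer entirely; double-buffering them
    // would add a memcpy whose cost grows with `needed`.
    size_t nread = reader_(data + from_old_buf, needed);
    return nread == needed;
  }
};
```

This matches the tradeoff in the thread: below roughly a cache line or two the extra copy through the buffer is cheap, but for large reads the direct read avoids a size-proportional memcpy.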
zdevito
left a comment
Looks good, thanks!
Thanks for looking at this!
This pull request has been merged in 7e8420b. |
Summary:
Pull Request resolved: pytorch#27727

This change uses a small buffer in the Unpickler to avoid calling reader_() byte-by-byte. In particular, the unpickler has a tight loop reading 1-byte opcodes. Buffering is more efficient because it avoids the variable-sized memcpy (due to templating) and the std::function indirection on the common fast path. This improves the unpickle-1m-ints benchmark by ~20%.

This change requires the std::function<> interface to Unpickler to return size_t rather than bool, but there are only a few uses of this API.

Test Plan: buck test caffe2/test/... benchmark in experimental/jeremyl/c2/SerializationBench

Differential Revision: D17869980
fbshipit-source-id: 37e752744d19e12b7282252c8963355970bd4feb