Make JIT Serialization support arbitrary std::function<> IO #27586
Conversation
This pull request was exported from Phabricator. Differential Revision: D17822962
caffe2/serialize/inline_container.h
Is it really providing any value to have both paths? Could we not just convert all callers to use the std::function path?
That sounds good, I'll unify behind std::function<>.
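For illustration, a minimal sketch of what unifying behind the callback could look like (the helper name make_ostream_writer is hypothetical, not from this PR): the std::ostream overload becomes a thin wrapper over the std::function<> path, so only one implementation exists internally.

```cpp
#include <cstddef>
#include <functional>
#include <ostream>

// Hypothetical helper: adapt an std::ostream to the writer-callback shape,
// so the ostream overload can forward to the std::function<> path.
using WriterFn = std::function<size_t(const void* data, size_t size)>;

WriterFn make_ostream_writer(std::ostream& out) {
  return [&out](const void* data, size_t size) -> size_t {
    out.write(static_cast<const char*>(data),
              static_cast<std::streamsize>(size));
    return out.good() ? size : 0;  // bytes accepted by the sink
  };
}
```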
test/cpp/api/serialize.cpp
For API orthogonality, we should probably make the same changes to load as well.
Yes, totally agree.
My intention was partly to gauge whether there would be objections to this type of approach that I hadn't thought of, before building the parallel load path. But I'll go ahead and change load as well.
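As context for why the load side is more involved: the reader needs arbitrary-position reads (the archive format seeks), not just sequential appends. Below is a sketch of one plausible shape; this exact signature is an assumption, not necessarily the PR's API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>

// Hypothetical positioned-read callback: the reader asks for n bytes at
// offset pos and gets back the number of bytes actually copied.
using ReaderFn = std::function<size_t(uint64_t pos, void* buf, size_t n)>;

ReaderFn make_string_reader(const std::string& src) {  // hypothetical helper
  return [&src](uint64_t pos, void* buf, size_t n) -> size_t {
    if (pos >= src.size()) return 0;
    const size_t avail =
        std::min<size_t>(n, src.size() - static_cast<size_t>(pos));
    std::memcpy(buf, src.data() + pos, avail);
    return avail;
  };
}
```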
This is a good idea. I'll let Owen mark it when it is ready to land.
Force-pushed 648da20 to 7061fea
I just uploaded an updated version.
Force-pushed 7061fea to aaf28be
caffe2/serialize/inline_container.h
Why does this need to be a unique_ptr? I believe ofstream can be default constructed.
It didn't need to be a unique_ptr<>; I was mostly trying to save work in the case where it's not needed. Reverted this part.
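For readers following along, the two member shapes under discussion look roughly like this (illustrative only, not the actual inline_container.h members):

```cpp
#include <fstream>
#include <memory>

// std::ofstream is default-constructible, so a plain member starts as a
// closed stream and can be open()ed later.
struct PlainMember {
  std::ofstream file_;
};

// The unique_ptr form only defers constructing the (small) stream object
// until a file path is actually used, which the review settled against.
struct DeferredMember {
  std::unique_ptr<std::ofstream> file_;
};
```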
Force-pushed aaf28be to 7d01b5e
Force-pushed 7d01b5e to 1ce2d05
(also fixed a few issues to make clang-tidy happy)
I looked into the failing tests; they're related to a bug in c10/util/tempfile.h, which was constructing a temp file name with a trailing '\0' in the std::string. Uploading the fix.
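To illustrate the bug class (a sketch, not the actual c10/util/tempfile.h code): constructing a std::string from a buffer's full length keeps the C terminator as a real character in the filename.

```cpp
#include <cassert>
#include <string>
#include <vector>

int main() {
  std::vector<char> name_buf = {'/', 't', 'm', 'p', '/', 'x', '\0'};
  // Buggy: the range constructor counts the terminator, so the '\0'
  // becomes part of the filename string itself.
  std::string buggy(name_buf.begin(), name_buf.end());
  assert(buggy.size() == 7 && buggy.back() == '\0');
  // Fixed: construct from the C string, stopping at the terminator.
  std::string fixed(name_buf.data());
  assert(fixed.size() == 6);
  return 0;
}
```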
Make JIT Serialization support arbitrary std::function<> IO (pytorch#27586)

Summary:
Pull Request resolved: pytorch#27586

Right now, torch::save() uses std::ostream, which results in unnecessary data copies in practice. Similar for torch::load().

Adding a std::function<size_t(const void*, size_t)> as an output option, parallel to the existing filename and std::ostream apis, gives users the flexibility to emit directly to a backing store.

For a simple case of appending the output to a std::string, we observe significant benchmark savings (on order of -50%), even with the minor std::function<> dispatch overhead. The main reason is that std::ostringstream effectively requires 2 extra copies of the data beyond a simple string.append lambda.

We also provide a parallel api for the load(), though this one is slightly more complex due to the need to do arbitrary position reads.

Test Plan:
buck test mode/dev-nosan caffe2/test/...
(Basic serialization test in caffe2/test/cpp/api/serialize.cpp)

Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443 (1M time goes from 90ms -> 40ms, albeit with crc patch applied)

Differential Revision: D17822962

fbshipit-source-id: 2cf2b24ad7d4212f2e4a49d7abc257cbdf33d206
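As a usage illustration of the writer option described in the summary, here is a minimal sketch assuming the torch::save(value, writer) overload this PR adds; the tensor and buffer are made up for the example.

```cpp
#include <torch/torch.h>
#include <string>

int main() {
  torch::Tensor t = torch::rand({2, 3});
  std::string buf;
  // Each chunk is appended straight into the string, avoiding the extra
  // copies an std::ostringstream would make.
  torch::save(t, [&buf](const void* data, size_t size) -> size_t {
    buf.append(static_cast<const char*>(data), size);
    return size;  // bytes consumed
  });
  // buf now holds the serialized archive bytes.
  return 0;
}
```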
Force-pushed 1ce2d05 to 9818c04
facebook-github-bot left a comment:
@jjlilley is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thanks for looking at this!
It seems that this PR is breaking macOS builds on master, which also shows up in the CI for this PR.
Sorry, but to unbreak master I will have to revert this PR.
Sorry about this! Thanks for reverting.
We'll need to look at the GitHub CI on this PR - specifically
Make JIT Serialization support arbitrary std::function<> IO (pytorch#27586)

Summary:
Right now, torch::save() uses std::ostream, which results in unnecessary data copies in practice. Similar for torch::load().

Adding a std::function<size_t(const void*, size_t)> as an output option, parallel to the existing filename and std::ostream apis, gives users the flexibility to emit directly to a backing store.

For a simple case of appending the output to a std::string, we observe significant benchmark savings (on order of -50%), even with the minor std::function<> dispatch overhead. The main reason is that std::ostringstream effectively requires 2 extra copies of the data beyond a simple string.append lambda.

We also provide a parallel api for the load(), though this one is slightly more complex due to the need to do arbitrary position reads.

Pull Request resolved: pytorch#27586

Test Plan:
buck test mode/dev-nosan caffe2/test/...
(Basic serialization test in caffe2/test/cpp/api/serialize.cpp)

Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443 (1M time goes from 90ms -> 40ms, albeit with crc patch applied)

Differential Revision: D17822962

Pulled By: jjlilley

fbshipit-source-id: d344a7e59707f3b30d42280fbab78f87399e4d10