Conversation

@jjlilley commented Oct 15, 2019

Stack from ghstack:

Right now, torch::save() uses std::ostream, which in practice results in unnecessary
data copies. The same applies to torch::load().

Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream APIs, gives users the
flexibility to emit directly to a backing store.
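
As a concrete illustration, here is a minimal sketch of the writer overload in use. The tensor value and variable names are illustrative; the callback signature matches the std::function<size_t(const void*, size_t)> described above, receiving each chunk of output bytes and returning how many it consumed.

```cpp
#include <torch/torch.h>

#include <cstddef>
#include <string>

int main() {
  // Serialize straight into a std::string: each chunk of output bytes is
  // appended to the target, and the callback reports how many it took.
  std::string serialized;
  torch::Tensor t = torch::ones({2, 2});
  torch::save(t, [&serialized](const void* data, size_t size) -> size_t {
    serialized.append(static_cast<const char*>(data), size);
    return size;
  });
  return 0;
}
```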

For the simple case of appending the output to a std::string, we observe
significant benchmark savings (on the order of 50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires two extra copies of the data
beyond a simple string.append lambda.
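
To make the copy count concrete, here is an illustrative (torch-free) comparison of the two destinations. The exact buffering behavior depends on the standard library implementation, but the stream path always pays at least the extra str() copy, on top of any intermediate buffering inside the full save path:

```cpp
#include <sstream>
#include <string>

int main() {
  const char chunk[] = "payload bytes";
  const size_t n = sizeof(chunk) - 1;

  // Stream path: one copy into the stringstream's internal buffer on
  // write(), and a second copy out when the contents are extracted.
  std::ostringstream oss;
  oss.write(chunk, n);                 // copy 1
  std::string via_stream = oss.str();  // copy 2

  // Lambda path from above: a single copy into the destination string.
  std::string via_append;
  via_append.append(chunk, n);
  return 0;
}
```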

We also provide a parallel API for load(), though this one is
slightly more complex due to the need to support arbitrary-position reads.
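
For illustration, a sketch of what the load-side counterpart might look like. The two-callback form below (an offset-based reader plus a total-size function) is an assumption based on the description above, not the definitive signature; the key point is that the reader is handed a position, so the backing store must support random access.

```cpp
#include <torch/torch.h>

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>

int main() {
  // Bytes produced by the writer sketch above.
  std::string serialized;
  torch::Tensor t = torch::ones({2, 2});
  torch::save(t, [&serialized](const void* data, size_t size) -> size_t {
    serialized.append(static_cast<const char*>(data), size);
    return size;
  });

  // Positional reader: copy up to n bytes starting at `pos` into buf and
  // return the number of bytes actually read.
  torch::Tensor restored;
  torch::load(
      restored,
      [&serialized](uint64_t pos, void* buf, size_t n) -> size_t {
        size_t len = std::min(static_cast<size_t>(serialized.size() - pos), n);
        std::memcpy(buf, serialized.data() + pos, len);
        return len;
      },
      [&serialized]() -> size_t { return serialized.size(); });
  return 0;
}
```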

Differential Revision: [D17939034](https://our.internmc.facebook.com/intern/diff/D17939034/)

@facebook-github-bot added the oncall: jit label (Add this issue/PR to JIT oncall triage queue) on Oct 15, 2019
jjlilley pushed a commit that referenced this pull request Oct 15, 2019
Pull Request resolved: #28039

ghstack-source-id: 91965707
jjlilley pushed a commit that referenced this pull request Oct 15, 2019
Pull Request resolved: #28039

ghstack-source-id: 91972076
@jjlilley requested review from resistor and zdevito on October 16, 2019
@jjlilley (Author)

This change is a follow-up to #27586, which was merged earlier today but then reverted due to a size_t issue on the Mac builds (sorry!).

This one should pass (it has passed most of the Mac builds, with one more pending).
If you could take a look, that would be great!

@jjlilley (Author)

(The only difference from the previous change is a static_cast<> converting the uint64_t to size_t in both of the std::min calls.)
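
For anyone curious about the failure mode: on 64-bit macOS, size_t is unsigned long while uint64_t is unsigned long long, so the mixed-type std::min call fails template deduction there. A small self-contained illustration (the variable names and values are hypothetical):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

int main() {
  uint64_t remaining = 1024;  // e.g. bytes left past the current read offset
  size_t requested = 512;     // e.g. capacity of the caller's buffer

  // size_t len = std::min(remaining, requested);  // breaks on Mac builds:
  // T deduces to both uint64_t and size_t, which are distinct types there.
  size_t len = std::min(static_cast<size_t>(remaining), requested);
  (void)len;
  return 0;
}
```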

@jjlilley (Author)

Thank you!

@facebook-github-bot (Contributor)

This pull request has been merged in 2e0294c.

@facebook-github-bot deleted the gh/jjlilley/1/head branch on October 28, 2019
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020

Summary:
Pull Request resolved: pytorch#28039

Test Plan:
buck test mode/dev-nosan caffe2/test/...
  (Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
  Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
  (1M time goes from 90ms to 40ms, albeit with the CRC patch applied)

Differential Revision: D17939034

fbshipit-source-id: 344cce46f74b6438cb638a8cfbeccf4e1aa882d7