Conversation

@jjlilley commented Oct 15, 2019

Stack from ghstack:

Right now, torch::save() uses std::ostream, which in practice results in unnecessary
data copies. The same applies to torch::load().

Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream APIs, gives users the
flexibility to emit directly to a backing store.
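
As a concrete illustration, here is a minimal sketch of the writer overload in use. The tensor value and variable names are illustrative; the callback signature matches the std::function<size_t(const void*, size_t)> described above, receiving each chunk of output bytes and returning how many it consumed.

```cpp
#include <torch/torch.h>

#include <cstddef>
#include <string>

int main() {
  // Serialize straight into a std::string: each chunk of output bytes is
  // appended to the target, and the callback reports how many it took.
  std::string serialized;
  torch::Tensor t = torch::ones({2, 2});
  torch::save(t, [&serialized](const void* data, size_t size) -> size_t {
    serialized.append(static_cast<const char*>(data), size);
    return size;
  });
  return 0;
}
```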

For the simple case of appending the output to a std::string, we observe
significant benchmark savings (on the order of 50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires two extra copies of the data
beyond a simple string.append lambda.
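
To make the copy count concrete, here is an illustrative (torch-free) comparison of the two destinations. The exact buffering behavior depends on the standard library implementation, but the stream path always pays at least the extra str() copy, on top of any intermediate buffering inside the full save path:

```cpp
#include <sstream>
#include <string>

int main() {
  const char chunk[] = "payload bytes";
  const size_t n = sizeof(chunk) - 1;

  // Stream path: one copy into the stringstream's internal buffer on
  // write(), and a second copy out when the contents are extracted.
  std::ostringstream oss;
  oss.write(chunk, n);                 // copy 1
  std::string via_stream = oss.str();  // copy 2

  // Lambda path from above: a single copy into the destination string.
  std::string via_append;
  via_append.append(chunk, n);
  return 0;
}
```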

We also provide a parallel API for load(), though this one is
slightly more complex due to the need to support arbitrary-position reads.
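
For illustration, a sketch of what the load-side counterpart might look like. The two-callback form below (an offset-based reader plus a total-size function) is an assumption based on the description above, not the definitive signature; the key point is that the reader is handed a position, so the backing store must support random access.

```cpp
#include <torch/torch.h>

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>

int main() {
  // Bytes produced by the writer sketch above.
  std::string serialized;
  torch::Tensor t = torch::ones({2, 2});
  torch::save(t, [&serialized](const void* data, size_t size) -> size_t {
    serialized.append(static_cast<const char*>(data), size);
    return size;
  });

  // Positional reader: copy up to n bytes starting at `pos` into buf and
  // return the number of bytes actually read.
  torch::Tensor restored;
  torch::load(
      restored,
      [&serialized](uint64_t pos, void* buf, size_t n) -> size_t {
        size_t len = std::min(static_cast<size_t>(serialized.size() - pos), n);
        std::memcpy(buf, serialized.data() + pos, len);
        return len;
      },
      [&serialized]() -> size_t { return serialized.size(); });
  return 0;
}
```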

Differential Revision: [D17939034](https://our.internmc.facebook.com/intern/diff/D17939034/)

@facebook-github-bot added the oncall: jit label (Add this issue/PR to JIT oncall triage queue) on Oct 15, 2019
jjlilley pushed a commit that referenced this pull request Oct 15, 2019
Pull Request resolved: #28039

ghstack-source-id: 91965707
jjlilley pushed a commit that referenced this pull request Oct 15, 2019
Pull Request resolved: #28039

ghstack-source-id: 91972076
@jjlilley requested review from resistor and zdevito on October 16, 2019
@jjlilley (Author)

This change is a follow-up to #27586, which was merged earlier today but then reverted due to a size_t issue on the Mac builds (sorry!).

This one should pass (it has passed most of the Mac builds, with one more pending).
If you could take a look, that would be great!

@jjlilley (Author)

(The only difference from the previous change is a static_cast<> converting the uint64_t to size_t in both of the std::min calls.)
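
For anyone curious about the failure mode: on 64-bit macOS, size_t is unsigned long while uint64_t is unsigned long long, so the mixed-type std::min call fails template deduction there. A small self-contained illustration (the variable names and values are hypothetical):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

int main() {
  uint64_t remaining = 1024;  // e.g. bytes left past the current read offset
  size_t requested = 512;     // e.g. capacity of the caller's buffer

  // size_t len = std::min(remaining, requested);  // breaks on Mac builds:
  // T deduces to both uint64_t and size_t, which are distinct types there.
  size_t len = std::min(static_cast<size_t>(remaining), requested);
  (void)len;
  return 0;
}
```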

@jjlilley (Author)

Thank you!

@facebook-github-bot (Contributor)

This pull request has been merged in 2e0294c.

@facebook-github-bot deleted the gh/jjlilley/1/head branch on October 28, 2019
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020

Summary:
Pull Request resolved: pytorch#28039

Test Plan:
buck test mode/dev-nosan caffe2/test/...
  (Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
  Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
  (1M time goes from 90ms to 40ms, albeit with the CRC patch applied)

Differential Revision: D17939034

fbshipit-source-id: 344cce46f74b6438cb638a8cfbeccf4e1aa882d7