
Conversation

@jjlilley commented Oct 9, 2019

Summary:
Right now, torch::save() uses std::ostream, which results in unnecessary
data copies in practice. The same applies to torch::load().

Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream APIs, gives users the
flexibility to emit directly to a backing store.

For the simple case of appending the output to a std::string, we observe
significant benchmark savings (on the order of -50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires two extra copies of the data
beyond a simple string.append lambda.

We also provide a parallel API for load(), though this one is
slightly more complex because it needs to support reads at arbitrary positions.
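
For illustration, a writer callback on the save path might look like the sketch below; the exact overload that torch::save() exposes for the std::function path is an assumption here, not quoted from the code. The ostringstream path, by contrast, copies the bytes into the stream's internal buffer and then again when pulling them out via .str().

```cpp
#include <torch/torch.h>

#include <cstddef>
#include <string>

int main() {
  torch::nn::Linear model(4, 4);

  // Serialize straight into a std::string via a writer callback, avoiding the
  // intermediate buffering that std::ostringstream would introduce.
  std::string out;
  auto writer = [&out](const void* data, size_t size) -> size_t {
    out.append(static_cast<const char*>(data), size);
    return size;  // report how many bytes were consumed
  };

  // Assumed overload: torch::save() forwarding a
  // std::function<size_t(const void*, size_t)> to the output archive.
  torch::save(model, writer);
  return 0;
}
```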

@pytorchbot added the caffe2, oncall: jit, and module: cpp labels Oct 9, 2019
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D17822962

@jjlilley requested review from resistor and zdevito October 9, 2019 00:42
Contributor

Is it really providing any value to have both paths? Could we not just convert all callers to use the std::function path?

Author

That sounds good, I'll unify behind std::function<>.

Contributor

For API orthogonality, we should probably make the same changes to load as well.

Author

Yes, totally agree.
My intention was partly to gauge whether there were objections to this kind of approach that I hadn't considered, before doing the parallel load path. But I'll go ahead and change load as well.

@zdevito removed their request for review October 9, 2019 17:22
@zdevito commented Oct 9, 2019

This is a good idea. I'll let Owen mark when it is ready to land.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D17822962

@jjlilley commented Oct 9, 2019

I just uploaded an updated version:

  1. There is std::function<> support for both the torch::load() and torch::save() codepaths now (a sketch of the load-side callback follows below).
  2. I simplified/unified the lowest-layer PyTorchStreamWriter implementation a bit, but chose not to remove the filename-based constructor API because of the Python binding to that constructor.
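
A rough sketch of what the load-side callback can look like, servicing reads at arbitrary positions out of an in-memory buffer. The torch::load() overload and its exact signature (read function plus total size) are assumptions for illustration:

```cpp
#include <torch/torch.h>

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>

void restore_from_string(const std::string& bytes, torch::nn::Linear& model) {
  // Reader callback: copy up to n bytes starting at pos into buf and return
  // how many bytes were actually read.
  auto reader = [&bytes](uint64_t pos, void* buf, size_t n) -> size_t {
    if (pos >= bytes.size()) {
      return 0;
    }
    size_t nread = std::min(bytes.size() - static_cast<size_t>(pos), n);
    std::memcpy(buf, bytes.data() + pos, nread);
    return nread;
  };

  // Assumed overload: a read function plus the total size, since the archive
  // needs to seek to arbitrary positions rather than stream sequentially.
  torch::load(model, reader, bytes.size());
}
```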

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D17822962

@resistor commented Oct 10, 2019

Why does this need to be a unique_ptr? I believe ofstream can be default constructed.

Author

It didn't need to be a unique_ptr<>; I was mostly trying to avoid constructing the stream in the case where it isn't needed. Reverted this part; see the sketch below.
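
For reference, a minimal sketch of the two variants being discussed (member and type names here are illustrative, not the actual PyTorchStreamWriter code):

```cpp
#include <fstream>
#include <memory>
#include <string>

// Reverted-to variant: a plain member. std::ofstream is default-constructible,
// so it can sit closed until the filename-based path actually needs it.
struct FileWriterA {
  explicit FileWriterA(const std::string& path) {
    file_.open(path, std::ios::binary);
  }
  std::ofstream file_;
};

// Originally proposed variant: allocate the stream lazily so nothing at all is
// constructed when the std::function/callback path is used instead of a file.
struct FileWriterB {
  explicit FileWriterB(const std::string& path)
      : file_(std::make_unique<std::ofstream>(path, std::ios::binary)) {}
  std::unique_ptr<std::ofstream> file_;
};
```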

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D17822962


@jjlilley

(also fixed a few issues to make clang-tidy happy)

@jjlilley

Looked into the failing tests: they trace to a bug in c10/util/tempfile.h, which was constructing a temp file name with a trailing '\0' in the std::string. Uploading the fix.
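
The bug pattern, in simplified form (an illustration of the failure mode, not the literal tempfile.h code): the std::string is built from a character buffer that still contains the terminating NUL, so the '\0' becomes part of the filename.

```cpp
#include <cstring>
#include <string>
#include <vector>

std::string temp_name_buggy(const char* pattern) {
  // Buffer holds the pattern plus its terminating '\0'.
  std::vector<char> buf(pattern, pattern + std::strlen(pattern) + 1);
  // BUG: the trailing '\0' is copied into the string, so .size() is one too
  // large and the path no longer matches the file actually created on disk.
  return std::string(buf.begin(), buf.end());
}

std::string temp_name_fixed(const char* pattern) {
  std::vector<char> buf(pattern, pattern + std::strlen(pattern) + 1);
  // Fix: stop at the first NUL by constructing from the C string.
  return std::string(buf.data());
}
```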

…#27586)

Summary:
Pull Request resolved: pytorch#27586


Test Plan:
buck test mode/dev-nosan caffe2/test/...
      (Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
      Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
        (1M time goes from 90ms -> 40ms, albeit with crc patch applied)

Differential Revision: D17822962

fbshipit-source-id: 2cf2b24ad7d4212f2e4a49d7abc257cbdf33d206
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D17822962

@pytorchbot added the module: internals label Oct 10, 2019
@jjlilley changed the title from "Make JIT Serialization support arbitrary std::function<> output" to "Make JIT Serialization support arbitrary std::function<> IO" Oct 11, 2019
@jjlilley requested review from suo and zdevito October 11, 2019 23:59
@zdevito removed their request for review October 15, 2019 03:48
@facebook-github-bot left a comment

@jjlilley is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jjlilley

Thanks for looking at this!

@yf225 commented Oct 15, 2019

It seems that this PR is breaking macOS builds on master:

Oct 10 15:32:13 ../torch/csrc/api/src/serialize/input-archive.cpp:115:22: error: no matching function for call to 'min'
Oct 10 15:32:13       size_t nread = std::min(pos + n, size_) - pos;
Oct 10 15:32:13                      ^~~~~~~~

which also shows up in the CI for this PR.
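
This is the classic mixed-type std::min failure: on macOS, size_t is unsigned long while uint64_t is unsigned long long, so if pos + n and size_ end up as different unsigned types, template argument deduction for std::min fails. A sketch of the failure and one possible fix (the actual member types in input-archive.cpp are assumed here):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

size_t bytes_available(uint64_t pos, size_t n, size_t size_) {
  // Does not compile on macOS: pos + n is uint64_t (unsigned long long) while
  // size_ is size_t (unsigned long), so std::min cannot deduce a single T.
  // size_t nread = std::min(pos + n, size_) - pos;

  // Fix: name the type explicitly (or cast one of the operands).
  size_t nread = static_cast<size_t>(std::min<uint64_t>(pos + n, size_) - pos);
  return nread;
}
```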

@yf225 commented Oct 15, 2019

Sorry, to unbreak master I will have to revert this PR.

@jjlilley

Sorry about this! Thanks for reverting.
How do I notice this issue in the future, since this didn't show up on Sandcastle?

@yf225 commented Oct 15, 2019

We'll need to look at the GitHub CI on this PR - specifically, pytorch_macos_10_13_py3_build and pytorch_macos_10_13_cuda9_2_cudnn7_py3_build were failing.

@facebook-github-bot

@jjlilley merged this pull request in cbe5ab1.

thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020