[pytorch] String optimizations related to serialization. #28230
Conversation
This change improves the pickling small data benchmark by roughly 20%. One of the main issues was that we were spending 25%+ of the cpu profile time in std::[o]stringstream constructors alone.

Two main parts:
- Change some std::stringstream to std::ostringstream, when they showed up on hot-ish paths, and it was trivial to convert them. Roughly 27% of the std::stringstream constructor time is spent building the constituent std::basic_istream. If the istream isn't needed, don't construct it.
- For a couple of very hot paths (e.g. Pickler::pushGlobal), just convert to traditional string::append(). std::ostringstream is convenient, but not particularly efficient.

Differential Revision: [D17982181](https://our.internmc.facebook.com/intern/diff/D17982181/)
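To make the two parts concrete, here is a minimal before/after sketch. This is illustrative only, not the actual PyTorch code; the function name and signature are made up.

```cpp
#include <sstream>
#include <string>

// Before: std::stringstream constructs both the istream and the ostream
// halves, even though only the ostream half is ever used here.
std::string typeNameBefore(const char* scalarTypeName) {
  std::stringstream ss;              // pays for the unused basic_istream too
  ss << scalarTypeName << "Storage";
  return ss.str();
}

// Part 1: write-only streams can be std::ostringstream, which skips the
// basic_istream construction entirely.
std::string typeNameOstream(const char* scalarTypeName) {
  std::ostringstream oss;
  oss << scalarTypeName << "Storage";
  return oss.str();
}

// Part 2: on very hot paths, plain std::string::append() avoids the stream
// machinery altogether (this mirrors the pattern used in Pickler::pushGlobal).
std::string typeNameAppend(const char* scalarTypeName) {
  return std::string(scalarTypeName).append("Storage");
}
```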
(fwiw, pprof trace is attached to D17982181; this diff should be ready for a look, and gets the % of time spent in just std::stringstream-related constructors to under 10%, from 25%+. I don't think we need wholesale std::stringstream changes, but tweaking the hotter paths seems sensible).
data_type << toString(tensor.scalar_type()) << "Storage";
pushGlobal("torch", data_type.str());
std::string data_type =
    std::string(toString(tensor.scalar_type())).append("Storage");
Is there any difference between append and +?
append() is going to be slightly more efficient because it assumes we can clobber the lhs string, unlike operator+() which maintains the input lhs and rhs args.
(operator+() unconditionally copies both lhs and rhs into a newly created string, while append() only has to make sure the destination buffer is big enough and then copy in the rhs. And for small strings like these, if the result still fits in the small-string-optimization buffer, there may be no reallocation at all.)
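A tiny sketch of the difference (hypothetical helper names, just to illustrate the allocation behavior):

```cpp
#include <string>

// operator+ always builds a brand-new string and copies both operands;
// the lhs argument is left untouched, so its buffer can't be reused.
std::string viaPlus(const std::string& lhs, const std::string& rhs) {
  return lhs + rhs;
}

// append() mutates lhs in place: if the existing capacity (including the
// small-string-optimization buffer) is already big enough, only the rhs
// bytes are copied and no reallocation happens.
std::string viaAppend(std::string lhs, const std::string& rhs) {
  lhs.append(rhs);
  return lhs;  // by-value parameter, so the return is eligible for a move
}
```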
This pull request has been merged in 3d74550.
Summary:
Pull Request resolved: pytorch#28230

This change improves the pickling small data benchmark by roughly 30% (25.8usec -> 18.05usec). One of the main issues was that we were spending 25%+ of the cpu profile time in std::[o]stringstream constructors alone.

Two main parts:
- Change some std::stringstream to std::ostringstream, when they showed up on hot-ish paths, and it was trivial to convert them. Roughly 27% of the std::stringstream constructor time is spent building the constituent std::basic_istream. If the istream isn't needed, don't construct it.
- For a couple of very hot paths (e.g. Pickler::pushGlobal), just convert to traditional string::append(). std::ostringstream is convenient, but not particularly efficient.

ghstack-source-id: 92153103

Test Plan:
Benchmarking: buck build mode/opt experimental/jeremyl/c2:SerializationBench
Correctness: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17982181
fbshipit-source-id: 7fd4d267293231244c10c1e5b8f4951a7a3d852f
Stack from ghstack:
This change improves the pickling small data benchmark by roughly 30%
(25.8usec -> 18.05usec on my machine, with pending zip changes also applied).
One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.
Two main parts:
- Change some std::stringstream to std::ostringstream, when they showed up on hot-ish paths, and it was trivial to convert them. Roughly 27% of the std::stringstream constructor time is spent building the constituent std::basic_istream. If the istream isn't needed, don't construct it.
- For a couple of very hot paths (e.g. Pickler::pushGlobal), just convert to traditional string::append(). std::ostringstream is convenient, but not particularly efficient.
Differential Revision: D17982181
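If you want to eyeball the stringstream-vs-ostringstream constructor cost yourself, a rough timing sketch along these lines works. It assumes GCC/Clang for the asm barrier, and the numbers vary by machine and standard library, so treat it as illustrative only rather than part of the actual benchmark used in this PR.

```cpp
#include <chrono>
#include <iostream>
#include <sstream>

// Times n default constructions/destructions of a stream type.
// Only a rough illustration of relative cost, not a rigorous benchmark.
template <typename Stream>
double avgCtorNs(int n) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < n; ++i) {
    Stream s;
    // Keep the compiler from optimizing the construction away (GCC/Clang only).
    asm volatile("" : : "r"(&s) : "memory");
  }
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(end - start).count() / n;
}

int main() {
  constexpr int kIters = 200000;
  std::cout << "std::stringstream  ctor: " << avgCtorNs<std::stringstream>(kIters) << " ns\n";
  std::cout << "std::ostringstream ctor: " << avgCtorNs<std::ostringstream>(kIters) << " ns\n";
  return 0;
}
```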