Skip to content

Conversation

@jjlilley
Copy link

@jjlilley jjlilley commented Oct 17, 2019

Stack from ghstack:

This change improves the pickling small data benchmark by roughly 30%
(25.8usec -> 18.05usec on my machine, with pending zip changes also applied).

One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.

Two main parts

  • Change some std::stringstream to std::ostringstream, when they
    showed up on hot-ish paths, and it was trivial to convert them.

    Roughly 27% of the std::stringstream constructor time is spent
    building the constituent std::basic_istream. If the istream isn't
    needed, don't construct it.

  • For a couple of very hot paths (e.g. Pickler::pushGlobal), just
    convert to traditional string::append(). std::ostringstream is
    convenient, but not particularly efficient.

Differential Revision: D17982181

This change improves the pickling small data benchmark by roughly 20%.

One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.

Two main parts
 - Change some std::stringstream to std::ostringstream, when they
   showed up on hot-ish paths, and it was trivial to convert them.

   Roughly 27% of the std::stringstream constructor time is spent
   building the constituent std::basic_istream. If the istream isn't
   needed, don't construct it.

 - For a couple of very hot paths (e.g. Pickler::pushGlobal), just
   convert to traditional string::append(). std::ostringstream is
   convenient, but not particularly efficient.

Differential Revision: [D17982181](https://our.internmc.facebook.com/intern/diff/D17982181/)

[ghstack-poisoned]
@jjlilley jjlilley requested a review from apaszke as a code owner October 17, 2019 17:03
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Oct 17, 2019
This change improves the pickling small data benchmark by roughly 20%.

One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.

Two main parts
 - Change some std::stringstream to std::ostringstream, when they
   showed up on hot-ish paths, and it was trivial to convert them.

   Roughly 27% of the std::stringstream constructor time is spent
   building the constituent std::basic_istream. If the istream isn't
   needed, don't construct it.

 - For a couple of very hot paths (e.g. Pickler::pushGlobal), just
   convert to traditional string::append(). std::ostringstream is
   convenient, but not particularly efficient.

Differential Revision: [D17982181](https://our.internmc.facebook.com/intern/diff/D17982181/)

[ghstack-poisoned]
This change improves the pickling small data benchmark by roughly 30%
(25.8usec -> 18.05usec on my machine).

One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.

Two main parts
 - Change some std::stringstream to std::ostringstream, when they
   showed up on hot-ish paths, and it was trivial to convert them.

   Roughly 27% of the std::stringstream constructor time is spent
   building the constituent std::basic_istream. If the istream isn't
   needed, don't construct it.

 - For a couple of very hot paths (e.g. Pickler::pushGlobal), just
   convert to traditional string::append(). std::ostringstream is
   convenient, but not particularly efficient.

Differential Revision: [D17982181](https://our.internmc.facebook.com/intern/diff/D17982181/)

[ghstack-poisoned]
jjlilley pushed a commit that referenced this pull request Oct 17, 2019
Pull Request resolved: #28230

This change improves the pickling small data benchmark by roughly 30%.
(25.8usec -> 18.05usec).

One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.

Two main parts
 - Change some std::stringstream to std::ostringstream, when they
   showed up on hot-ish paths, and it was trivial to convert them.
   Roughly 27% of the std::stringstream constructor time is spent
   building the constituent std::basic_istream. If the istream isn't
   needed, don't construct it.

 - For a couple of very hot paths (e.g. Pickler::pushGlobal), just
   convert to traditional string::append(). std::ostringstream is
   convenient, but not particularly efficient.
ghstack-source-id: 92114972

Differential Revision: [D17982181](https://our.internmc.facebook.com/intern/diff/D17982181/)
@jjlilley jjlilley requested review from driazati and zdevito October 17, 2019 18:51
@jjlilley
Copy link
Author

(fwiw, pprof trace is attached to D17982181; this diff should be ready for a look, and gets the % of time spent in just std::stringstream-related constructors to under 10%, from 25%+. I don't think we need wholesale std::stringstream changes, but tweaking the hotter paths seems sensible).

This change improves the pickling small data benchmark by roughly 30%
(25.8usec -> 18.05usec on my machine, with pending zip changes also applied).

One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.

Two main parts
 - Change some std::stringstream to std::ostringstream, when they
   showed up on hot-ish paths, and it was trivial to convert them.

   Roughly 27% of the std::stringstream constructor time is spent
   building the constituent std::basic_istream. If the istream isn't
   needed, don't construct it.

 - For a couple of very hot paths (e.g. Pickler::pushGlobal), just
   convert to traditional string::append(). std::ostringstream is
   convenient, but not particularly efficient.

Differential Revision: [D17982181](https://our.internmc.facebook.com/intern/diff/D17982181/)

[ghstack-poisoned]
jjlilley pushed a commit that referenced this pull request Oct 17, 2019
Pull Request resolved: #28230

This change improves the pickling small data benchmark by roughly 30%.
(25.8usec -> 18.05usec).

One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.

Two main parts
 - Change some std::stringstream to std::ostringstream, when they
   showed up on hot-ish paths, and it was trivial to convert them.
   Roughly 27% of the std::stringstream constructor time is spent
   building the constituent std::basic_istream. If the istream isn't
   needed, don't construct it.

 - For a couple of very hot paths (e.g. Pickler::pushGlobal), just
   convert to traditional string::append(). std::ostringstream is
   convenient, but not particularly efficient.
ghstack-source-id: 92153103

Differential Revision: [D17982181](https://our.internmc.facebook.com/intern/diff/D17982181/)
data_type << toString(tensor.scalar_type()) << "Storage";
pushGlobal("torch", data_type.str());
std::string data_type =
std::string(toString(tensor.scalar_type())).append("Storage");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any difference between append and +?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

append() is going to be slightly more efficient because it assumes we can clobber the lhs string, unlike operator+() which maintains the input lhs and rhs args.

(operator+() unconditionally copies both lhs and rhs into a newly-created string. append() just needs to ensure the buffer is big enough and then copies in the rhs. And for small strings like these, if the string size is within the default in-place buffer, there's a chance no resizing is needed)

@zdevito zdevito removed their request for review October 18, 2019 03:50
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 3d74550.

@facebook-github-bot facebook-github-bot deleted the gh/jjlilley/7/head branch October 28, 2019 22:16
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#28230

This change improves the pickling small data benchmark by roughly 30%.
(25.8usec -> 18.05usec).

One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.

Two main parts
 - Change some std::stringstream to std::ostringstream, when they
   showed up on hot-ish paths, and it was trivial to convert them.
   Roughly 27% of the std::stringstream constructor time is spent
   building the constituent std::basic_istream. If the istream isn't
   needed, don't construct it.

 - For a couple of very hot paths (e.g. Pickler::pushGlobal), just
   convert to traditional string::append(). std::ostringstream is
   convenient, but not particularly efficient.
ghstack-source-id: 92153103

Test Plan:
Benchmarking: buck build mode/opt experimental/jeremyl/c2:SerializationBench
  Correctness: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17982181

fbshipit-source-id: 7fd4d267293231244c10c1e5b8f4951a7a3d852f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants