[pytorch] String optimizations related to serialization. #28230
Conversation
This change improves the pickling small data benchmark by roughly 20%. One of the main issues was that we were spending 25%+ of the cpu profile time in std::[o]stringstream constructors alone.

Two main parts:
- Change some std::stringstream to std::ostringstream, when they showed up on hot-ish paths, and it was trivial to convert them. Roughly 27% of the std::stringstream constructor time is spent building the constituent std::basic_istream. If the istream isn't needed, don't construct it.
- For a couple of very hot paths (e.g. Pickler::pushGlobal), just convert to traditional string::append(). std::ostringstream is convenient, but not particularly efficient.

Differential Revision: [D17982181](https://our.internmc.facebook.com/intern/diff/D17982181/)
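To make the two parts concrete, here is a minimal before/after sketch. This is illustrative only, not the actual PyTorch code; the function name and signature are made up.

```cpp
#include <sstream>
#include <string>

// Before: std::stringstream constructs both the istream and the ostream
// halves, even though only the ostream half is ever used here.
std::string typeNameBefore(const char* scalarTypeName) {
  std::stringstream ss;              // pays for the unused basic_istream too
  ss << scalarTypeName << "Storage";
  return ss.str();
}

// Part 1: write-only streams can be std::ostringstream, which skips the
// basic_istream construction entirely.
std::string typeNameOstream(const char* scalarTypeName) {
  std::ostringstream oss;
  oss << scalarTypeName << "Storage";
  return oss.str();
}

// Part 2: on very hot paths, plain std::string::append() avoids the stream
// machinery altogether (this mirrors the pattern used in Pickler::pushGlobal).
std::string typeNameAppend(const char* scalarTypeName) {
  return std::string(scalarTypeName).append("Storage");
}
```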
(fwiw, pprof trace is attached to D17982181; this diff should be ready for a look, and gets the % of time spent in just std::stringstream-related constructors to under 10%, from 25%+. I don't think we need wholesale std::stringstream changes, but tweaking the hotter paths seems sensible).
data_type << toString(tensor.scalar_type()) << "Storage";
pushGlobal("torch", data_type.str());
std::string data_type =
    std::string(toString(tensor.scalar_type())).append("Storage");
Is there any difference between append and +?
append() is going to be slightly more efficient because it assumes we can clobber the lhs string, unlike operator+() which maintains the input lhs and rhs args.
(operator+() unconditionally copies both lhs and rhs into a newly created string, while append() only has to make sure the destination buffer is big enough and then copy in the rhs. And for small strings like these, if the result still fits in the small-string-optimization buffer, there may be no reallocation at all.)
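A tiny sketch of the difference (hypothetical helper names, just to illustrate the allocation behavior):

```cpp
#include <string>

// operator+ always builds a brand-new string and copies both operands;
// the lhs argument is left untouched, so its buffer can't be reused.
std::string viaPlus(const std::string& lhs, const std::string& rhs) {
  return lhs + rhs;
}

// append() mutates lhs in place: if the existing capacity (including the
// small-string-optimization buffer) is already big enough, only the rhs
// bytes are copied and no reallocation happens.
std::string viaAppend(std::string lhs, const std::string& rhs) {
  lhs.append(rhs);
  return lhs;  // by-value parameter, so the return is eligible for a move
}
```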
This pull request has been merged in 3d74550.
Summary:
Pull Request resolved: pytorch#28230

This change improves the pickling small data benchmark by roughly 30% (25.8usec -> 18.05usec). One of the main issues was that we were spending 25%+ of the cpu profile time in std::[o]stringstream constructors alone.

Two main parts:
- Change some std::stringstream to std::ostringstream, when they showed up on hot-ish paths, and it was trivial to convert them. Roughly 27% of the std::stringstream constructor time is spent building the constituent std::basic_istream. If the istream isn't needed, don't construct it.
- For a couple of very hot paths (e.g. Pickler::pushGlobal), just convert to traditional string::append(). std::ostringstream is convenient, but not particularly efficient.

ghstack-source-id: 92153103

Test Plan:
Benchmarking: buck build mode/opt experimental/jeremyl/c2:SerializationBench
Correctness: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17982181
fbshipit-source-id: 7fd4d267293231244c10c1e5b8f4951a7a3d852f
Stack from ghstack:
This change improves the pickling small data benchmark by roughly 30%
(25.8usec -> 18.05usec on my machine, with pending zip changes also applied).
One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.
Two main parts:
- Change some std::stringstream to std::ostringstream, when they showed up on hot-ish paths, and it was trivial to convert them. Roughly 27% of the std::stringstream constructor time is spent building the constituent std::basic_istream. If the istream isn't needed, don't construct it.
- For a couple of very hot paths (e.g. Pickler::pushGlobal), just convert to traditional string::append(). std::ostringstream is convenient, but not particularly efficient.
Differential Revision: D17982181
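If you want to eyeball the stringstream-vs-ostringstream constructor cost yourself, a rough timing sketch along these lines works. It assumes GCC/Clang for the asm barrier, and the numbers vary by machine and standard library, so treat it as illustrative only rather than part of the actual benchmark used in this PR.

```cpp
#include <chrono>
#include <iostream>
#include <sstream>

// Times n default constructions/destructions of a stream type.
// Only a rough illustration of relative cost, not a rigorous benchmark.
template <typename Stream>
double avgCtorNs(int n) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < n; ++i) {
    Stream s;
    // Keep the compiler from optimizing the construction away (GCC/Clang only).
    asm volatile("" : : "r"(&s) : "memory");
  }
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(end - start).count() / n;
}

int main() {
  constexpr int kIters = 200000;
  std::cout << "std::stringstream  ctor: " << avgCtorNs<std::stringstream>(kIters) << " ns\n";
  std::cout << "std::ostringstream ctor: " << avgCtorNs<std::ostringstream>(kIters) << " ns\n";
  return 0;
}
```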