
Conversation

@jjlilley commented Nov 6, 2019

Stack from ghstack:

This change adds a few optimizations to process_group_agent:

 1) Don't add an extra tensor during serialization for the message id;
    instead, put it in the preamble tensor. This saves roughly 15%
    overhead for minimal-sized RPCs.

 2) Add a payload-only fastpath (sketched below).
    a) For very tiny messages (e.g. acks, status updates), this reduces RPC overhead by roughly 50% by removing the zip-file overhead and related setup/copying in torch::load()/torch::save().
    b) If we do end up with large non-tensor payloads, this saves ~25% in benchmarks by avoiding the copying in torch::save()/torch::load().

Differential Revision: [D18352261](https://our.internmc.facebook.com/intern/diff/D18352261/)
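
As a rough illustration of both ideas, here is a minimal, self-contained C++ sketch — not the actual process_group_agent code; all names (`Preamble`, `WireFormat`, `encode`) are hypothetical. It assumes a fixed-size preamble that carries the message id and an out-of-band format tag, and a payload-only path that ships raw bytes with no zip-container framing:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical out-of-band tag telling the receiver how to decode the body.
enum class WireFormat : int64_t { TorchSave = 0, PayloadOnly = 1 };

// Fixed-size preamble; the message id rides here instead of in an extra tensor.
struct Preamble {
  int64_t msgId;
  int64_t messageType;
  int64_t format;       // a WireFormat value
  int64_t payloadSize;
};

// Choose the fastpath when the message carries no tensors: the raw payload
// bytes go on the wire directly, skipping torch::save()'s zip container and
// the setup/copying it entails.
std::string encode(int64_t msgId, int64_t messageType,
                   const std::string& payload, bool hasTensors) {
  Preamble p{msgId, messageType,
             static_cast<int64_t>(hasTensors ? WireFormat::TorchSave
                                             : WireFormat::PayloadOnly),
             static_cast<int64_t>(payload.size())};
  std::string wire(sizeof(Preamble) + payload.size(), '\0');
  std::memcpy(&wire[0], &p, sizeof(p));
  std::memcpy(&wire[sizeof(p)], payload.data(), payload.size());
  return wire;  // with tensors, a torch::save() blob would follow instead
}
```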

jjlilley pushed a commit that referenced this pull request Nov 6, 2019
ghstack-source-id: 93389049
Pull Request resolved: #29324
@mrshenli (Contributor) left a comment

Question: if we switch from torch::save/torch::load to jit pickle/unpickle, do we still need to explicitly mark the serialization mode?

jjlilley pushed a commit that referenced this pull request Nov 7, 2019
Pull Request resolved: #29324
ghstack-source-id: 93413704
@jjlilley (Author) commented Nov 7, 2019

> Question: if we switch from torch::save/torch::load to jit pickle/unpickle, do we still need to explicitly mark the serialization mode?

If we decide that there's a better pickling mechanism, we can always get rid of this.

My main goal here was that, in case different sorts of payloads have more optimal encoding schemes, we have an out-of-band (and effectively free) mechanism to specify which scheme is in use.

Right now, if we have a minimally-sized response from an RPC (e.g. an ACK), without any tensors, we expend more effort/bytes encoding this than we need to.

Being able to alternate between torch::save / jit::pickle / payload-only / whatever, depending on the input, seems like it could help in some cases (a decoder sketch follows below).
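
As a hedged sketch of what that dispatch could look like, reusing the hypothetical `Preamble`/`WireFormat` types from the encoder sketch above — `DecodedMessage` and `deserializeWithTorchLoad` are likewise stand-ins, not the real rpc Message API:

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Stand-in result type; the real agent builds an rpc Message instead.
struct DecodedMessage {
  std::string body;
  int64_t messageType;
  int64_t msgId;
};

// General-purpose path (declaration only): unzip/unpickle via
// torch::load()-style machinery.
DecodedMessage deserializeWithTorchLoad(const Preamble& p, std::string bytes);

// Because the format tag travels out of band in the preamble, the receiver
// picks a decoder without sniffing the payload bytes, and adding a new
// scheme (e.g. jit::pickle) only needs a new enum value.
DecodedMessage decode(const Preamble& p, std::string bytes) {
  switch (static_cast<WireFormat>(p.format)) {
    case WireFormat::PayloadOnly:
      // Tensor-free fastpath: the bytes are the message body as-is.
      return DecodedMessage{std::move(bytes), p.messageType, p.msgId};
    case WireFormat::TorchSave:
    default:
      return deserializeWithTorchLoad(p, std::move(bytes));
  }
}
```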

jjlilley pushed a commit that referenced this pull request Nov 7, 2019
Pull Request resolved: #29324
ghstack-source-id: 93489064
jjlilley pushed a commit that referenced this pull request Nov 8, 2019
Pull Request resolved: #29324
ghstack-source-id: 93577550
@jjlilley (Author) commented Nov 9, 2019

Closing this PR for a bit; there's another approach that I'd like to take...

@jjlilley closed this Nov 9, 2019
@facebook-github-bot deleted the gh/jjlilley/21/head branch December 9, 2019 15:16