
Conversation

@driazati (Contributor) commented Aug 30, 2019

The Pickler previously distinguished between tensors inlined in the pickle binary (matching the format of torch.save()) and tensors saved elsewhere, with only a reference stored in the binary. This PR moves that distinction out to torch::pickle_save to match the eager Python interface.
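The inlined-vs-referenced split described above mirrors Python pickle's persistent_id protocol, which is roughly what torch.save uses under the hood to keep tensor storages out of the pickle stream. A minimal stdlib-only sketch of the idea (MockTensor and the index-based table are illustrative stand-ins, not PyTorch internals):

```python
import io
import pickle

# Illustrative stand-in for a tensor; not a PyTorch type.
class MockTensor:
    def __init__(self, data):
        self.data = data

# External "tensor table": values stored outside the pickle binary,
# with only an index written into the stream.
tensor_table = []

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, MockTensor):
            tensor_table.append(obj)
            return len(tensor_table) - 1  # a reference, not the data itself
        return None  # everything else is inlined as usual

class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        return tensor_table[pid]

buf = io.BytesIO()
RefPickler(buf).dump({"weights": MockTensor([1.0, 2.0]), "step": 7})
buf.seek(0)
restored = RefUnpickler(buf).load()
print(restored["weights"].data, restored["step"])  # [1.0, 2.0] 7
```

Dropping the reference path means every tensor's bytes travel inside the one binary, which is the behavior this PR pushes up into torch::pickle_save.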

The change can be seen in register_prim_ops.cpp, where the call to jit::pickle is now torch::pickle_save.

Differential Revision: D17175215

@pytorchbot added labels on Aug 30, 2019: caffe2; oncall: jit (Add this issue/PR to JIT oncall triage queue); module: build (Build system issues); module: cpp (Related to C++ API)
@driazati driazati changed the title [jit][wip][skip ci] Remove torch.save stuff from pickler [jit] Remove torch.save-related logic from pickler Sep 4, 2019
@pytorchbot added the oncall: distributed label (Add this issue/PR to distributed oncall triage queue) on Sep 4, 2019
@driazati driazati requested a review from zdevito September 4, 2019 00:12
@zdevito (Contributor) left a comment

The changes that move torch::save out are good. I do not think we should be removing tensor references at the same time, though. They are the most appropriate way for the RPC world to deal with tensor data. In particular, if the tensors are on GPUs there may be a more efficient pathway to copy them, but the logic inside the pickler will force a copy to the CPU and a copy back. It would be best to split the torch::save changes from the tensor-reference changes and revisit how the pickler should work for RPC.

archive.save_to(std::forward<SaveToArgs>(args)...);
}

TORCH_API std::vector<char> save(const torch::IValue& ivalue);
Contributor comment:

This is the C++ Module API. I am not sure this is the right place to put the save function. It might be more appropriately defined where torch::load for modules is defined.

Contributor comment:

+1, this API doesn't seem to belong here: it returns std::vector<char>, while all the other torch::save APIs return nothing and are used as torch::save(value, filepath). I'd suggest naming this function pickle_save or similar, so that C++ API end users won't be confused.

btw, do we have this function in Python API? Or is it an implementation detail that would only be used by JIT?

@driazati (Contributor, Author) replied:

The function (renamed to pickle_save) should produce the same output as torch.save() in Python, so there should be a similar API here too. The rename is fine given the clash with the existing torch::save/torch::load. It's intended to be used as a public API function, e.g. in #25591.
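For comparison, Python's own pickle module draws exactly the line the naming suggestion above asks for: a file-writing entry point that returns nothing, and a separately named bytes-returning one. A quick stdlib sketch of the two shapes:

```python
import io
import pickle

value = {"step": 7, "lr": 0.1}

# File-oriented entry point: writes to a stream and returns nothing,
# analogous to torch::save(value, filepath).
buf = io.BytesIO()
pickle.dump(value, buf)

# Bytes-returning entry point: analogous to the proposed pickle_save.
blob = pickle.dumps(value)

print(pickle.loads(blob) == value)  # True
```

Keeping the two shapes under distinct names (dump vs dumps, save vs pickle_save) avoids overload-resolution surprises for callers.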

@pietern (Contributor) commented Sep 5, 2019

Thanks for flagging, @zdevito. I agree it would be great if the copy to CPU is done as part of torch::save instead of in the pickler (with a .cpu() on every tensor before writing it, for example, or an fmap beforehand).

What I don't yet fully get here is the tradeoff of storing a tensor instead of a storage object in WriteableStorageData. It looks like keeping a storage object here is enough, as long as the metadata of a tensor (dtype, sizes, strides) is taken care of by the pickler as part of the reference to the storage. That way, we only need to deal with a c10::Device, a pointer, and a size in bytes. If I'm not mistaken, this metadata must be taken care of outside the pickler, since the pickler only has a reference to this WriteableStorageData entry. Am I reading this wrong?

Also cc @lerks since we had a discussion about this.
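The copy-to-CPU pre-pass suggested above (a .cpu() applied via an fmap over the value tree, before the pickler ever sees device tensors) could look roughly like the following. FakeTensor, fmap, and to_cpu here are hypothetical stand-ins to show the shape of the idea, not PyTorch APIs:

```python
# Hedged sketch: walk a nested value and replace device-resident tensors
# with host copies before handing the structure to the pickler.

class FakeTensor:
    def __init__(self, data, device):
        self.data, self.device = data, device

    def cpu(self):
        # Pretend device-to-host copy.
        return FakeTensor(list(self.data), "cpu")

def fmap(value, fn):
    """Recursively apply fn to every leaf of a nested dict/list/tuple."""
    if isinstance(value, dict):
        return {k: fmap(v, fn) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(fmap(v, fn) for v in value)
    return fn(value)

def to_cpu(v):
    return v.cpu() if isinstance(v, FakeTensor) and v.device != "cpu" else v

state = {"w": FakeTensor([1, 2], "cuda:0"), "step": 3}
host_state = fmap(state, to_cpu)  # pickle host_state, not state
print(host_state["w"].device)  # cpu
```

Doing the copy in this pre-pass keeps the pickler device-agnostic, which is the separation of concerns being argued for in this thread.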

@pietern (Contributor) commented Sep 9, 2019

@driazati Looking at the recent changes, do you now plan to keep the tensor table around and always serialize tensor metadata (since pushTensorReference is now gone)?

If so, the comment in the summary should be updated.

@driazati driazati requested a review from zdevito September 11, 2019 18:02
@driazati driazati requested a review from zdevito September 13, 2019 20:19
@facebook-github-bot (Contributor) commented:
@driazati merged this pull request in 61197e9.


Labels

caffe2; Merged; module: build (Build system issues); module: cpp (Related to C++ API); oncall: distributed (Add this issue/PR to distributed oncall triage queue); oncall: jit (Add this issue/PR to JIT oncall triage queue)


8 participants