
Conversation

@zdevito (Contributor) commented Jul 24, 2019

Stack from ghstack:

This allows tensors to be loaded directly from the pickle archive,
using the same encoding as the eager serializer. The one difference
is that the tensor data is still stored in a zip file rather
than appended to the archive, which still allows for aligning the tensor data.

With this change, we no longer rely on the json file for any information, so this PR also stops writing the json file and loads modules directly from the pickle.

New layout is:

  • code/ - all code goes in these folders with python names
  • constants{.pkl,/} - pickled code constants; these appear in the code, so they cannot go in data, since that would create a circular dependency.
  • data{.pkl,/} - pickled data, with tensors in the data/ folder.
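For illustration, the layout above can be mimicked and inspected with Python's zipfile module. This is only a sketch: the entry names below are hypothetical examples of the described structure, not taken from a real archive (a real one is produced by torch.jit.save):

```python
import io
import zipfile

# Build an in-memory zip mimicking the layout described above.
# Entry names are illustrative, not copied from a real torchscript archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model/code/__torch__.py", "# compiled module code\n")
    zf.writestr("model/constants.pkl", b"")    # pickled code constants
    zf.writestr("model/data.pkl", b"")         # pickled module data
    zf.writestr("model/data/0", b"\x00" * 16)  # raw tensor storage

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()

print(names)
```

Keeping the tensor payloads as separate zip entries (rather than inline in the pickle stream) is what allows the writer to pad and align them.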

Differential Revision: D16452815

[WIP] Load tensors directly from pickle archive

gh-metadata: pytorch pytorch 23281 gh/zdevito/78/head
result = autograd::make_variable(result, requires_grad);
stack_.push_back(std::move(result));
});
} else if (module_name == "collections" && class_name == "OrderedDict") {
Collaborator:
This is a bit scary. Is there a stricter condition we can enforce here (e.g. that this path is reached only during tensor deserialization)?
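For context on why the unpickler must recognize this pair of names at all: collections.OrderedDict is referenced by a GLOBAL opcode in ordinary Python pickles, so the literal module and class names appear in the stream. A quick check with the standard pickle module:

```python
import collections
import pickle

# At protocol 2, pickle names the class via a GLOBAL opcode, so the
# module and class names appear verbatim in the serialized bytes.
payload = pickle.dumps(collections.OrderedDict(a=1), protocol=2)
print(b"collections" in payload, b"OrderedDict" in payload)
```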

}
});
} else if (
module_name == "torch._utils" && class_name == "_rebuild_tensor_v2") {
Collaborator:
It would be nice to extract this into a subfunction and to document explicitly, in both the Python and C++ implementations, that the two functions must match semantically.
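The reason the two sides must match is the shape of the pickle REDUCE protocol: the stream records only the name of a rebuild function plus its arguments, and the loader resolves the name and calls it. A toy Python analogue (Point and _rebuild_point are hypothetical, standing in for torch._utils._rebuild_tensor_v2; the C++ unpickler instead pattern-matches the module/class names and reimplements the rebuild logic, which is why the implementations can drift apart):

```python
import pickle

def _rebuild_point(x, y):
    # Hypothetical rebuild helper: the loader resolves this function by
    # name and calls it with the pickled arguments.
    return Point(x, y)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __reduce__(self):
        # (callable, args) is exactly the payload of the REDUCE opcode.
        return (_rebuild_point, (self.x, self.y))

p = pickle.loads(pickle.dumps(Point(1, 2)))
print(p.x, p.y)
```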

Load tensors directly from pickle archive
@pytorchbot pytorchbot added caffe2 oncall: distributed Add this issue/PR to distributed oncall triage queue labels Aug 20, 2019
@zdevito zdevito requested a review from suo August 20, 2019 03:59
@zdevito zdevito changed the title from "[WIP] Load tensors directly from pickle archive" to "Load tensors directly from pickle archive" Aug 20, 2019

ScriptModuleSerializer2(std::ostream* ofs) : ofs_(), writer_(ofs) {}

void writeArchive(const std::string& archive_name, const IValue& value) {
Member:
Doesn't look like this needs to be public?

/// `bounds_checker` is a function that returns `true` if the reader can read
/// more data, and `false` if it cannot (i.e. if a stream has hit its end of
/// file)
///
Member:
This comment is now out of date; it still references the bounds_checker arg.


public:
// tensors inside the pickle are references to the tensor_table
Unpickler(
Member:
Is this version legacy-only? If so, we should note it in the comments

Contributor (author):

It is possible that RPC will be using this version, and completely handling tensor serialization on its own. Not sure though.

module_name == "torch._utils" &&
(class_name == "_rebuild_tensor_v2" ||
class_name == "_rebuild_qtensor")) {
bool quantized = class_name == "_rebuild_qtensor";
Member:
Ugh, I don't like that quantized tensors are somehow a special case in the serialization format.

Contributor (author):

I agree, but this is how it works in eager serialization, so we are stuck with it if we want torch.save from TorchScript to work with torch.load.

/// file)
///
/// See `torch::pickle` for details.
TORCH_API IValue unpickle(
Member:
In general we should think a bit about how we want the Pickle C++ API to look at the end of this. @driazati, any thoughts? At least I think we should get rid of tensor_table if possible

@driazati (Contributor) commented Aug 20, 2019:

Any use of the tensor_table would be covered by pickle's persistent_id/persistent_load functions (which we could implement if we need to in the future). This is already how eager's serialization works anyway, so it's fine to get rid of the tensor_table.
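The persistent_id mechanism mentioned above can be sketched in pure Python. FakeTensor, tensor_table, and the pickler/unpickler subclasses below are hypothetical stand-ins, not PyTorch code; they show how bulk data can live outside the pickle stream while the stream carries only an index:

```python
import io
import pickle

class FakeTensor:
    """Stand-in for a tensor whose bulk data lives outside the pickle."""
    def __init__(self, data):
        self.data = data

tensor_table = []  # out-of-band storage, like the serializer's tensor table

class TablePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Instead of pickling the tensor bytes inline, put them in the
        # side table and emit only the table index into the stream.
        if isinstance(obj, FakeTensor):
            tensor_table.append(obj.data)
            return len(tensor_table) - 1
        return None  # everything else is pickled normally

class TableUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the index back into the out-of-band data.
        return FakeTensor(tensor_table[pid])

buf = io.BytesIO()
TablePickler(buf).dump({"weight": FakeTensor(b"\x01\x02")})
buf.seek(0)
restored = TableUnpickler(buf).load()
print(restored["weight"].data)
```

In the PR's scheme the "side table" becomes named records in the zip archive, and persistent_load is the read_record callback that fetches them.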


void writeArchive(const std::string& archive_name, const IValue& value) {
std::vector<char> data;
Pickler data_pickle(
Contributor:
It looks like jit::pickle(...) would be fine here instead of the manual use of the Pickler

Contributor (author):
Right after this it uses data_pickle.tensorData, which is not exposed by those functions.

ss << archive_name << "/" << name;
return std::get<0>(reader_->getRecord(ss.str()));
};
Unpickler unpickler(
Contributor:
Same here, this can be jit::unpickle

Contributor (author):
The user-exposed pickle functions do not take the read_record or device_ arguments needed here to load the tensors.

@facebook-github-bot (Contributor)

@zdevito merged this pull request in e2cccce.


Labels

caffe2, Merged, oncall: distributed, oncall: jit

8 participants