UnpicklingError when trying to load multiple objects from a file #18436

@baldassarreFe

Description

🐛 UnpicklingError when trying to load multiple objects from a file

Pickle allows dumping multiple self-contained objects into the same file and loading them back later through subsequent reads. PyTorch's serialization has the same behavior when using an in-memory buffer like io.BytesIO, but raises an error when using a regular file.

To Reproduce

This is the behavior of the standard pickle module:

import io
import torch
import pickle

b = open('/tmp/file.pt', 'wb')
for i in range(3):
    was_at = b.tell()
    pickle.dump(torch.ones(10), b)
    print(f'{i}: {was_at:04d}-{b.tell():04d} ({b.tell()-was_at:03d})')
b.close()
>>> 0: 0000-0427 (427)
>>> 1: 0427-0854 (427)
>>> 2: 0854-1281 (427)

i = 0
b = open('/tmp/file.pt', 'rb')
while True:
    try:
        was_at = b.tell()
        pickle.load(b)
        print(f'{i}: {was_at:04d}-{b.tell():04d} ({b.tell()-was_at:03d})')
        i += 1
    except EOFError:
        break
b.close()
>>> 0: 0000-0427 (427)
>>> 1: 0427-0854 (427)
>>> 2: 0854-1281 (427)

PyTorch works fine with io.BytesIO; I get the same behavior:

b = io.BytesIO()
for i in range(3):
    was_at = b.tell()
    torch.save(torch.ones(10), b)
    print(f'{i}: {was_at:04d}-{b.tell():04d} ({b.tell()-was_at:03d})')
>>> 0: 0000-0377 (377)
>>> 1: 0377-0754 (377)
>>> 2: 0754-1131 (377)

i = 0
b.seek(0)
while True:
    try:
        was_at = b.tell()
        torch.load(b)
        print(f'{i}: {was_at:04d}-{b.tell():04d} ({b.tell()-was_at:03d})')
        i += 1
    except EOFError:
        break
>>> 0: 0000-0377 (377)
>>> 1: 0377-0754 (377)
>>> 2: 0754-1131 (377)

However, an UnpicklingError is raised when using PyTorch's serialization methods on a regular file:

b = open('/tmp/file.pt', 'wb')
for i in range(3):
    was_at = b.tell()
    torch.save(torch.ones(10), b)
    print(f'{i}: {was_at:04d}-{b.tell():04d} ({b.tell()-was_at:03d})')
b.close()
>>> 0: 0000-0377 (377)
>>> 1: 0377-0754 (377)
>>> 2: 0754-1131 (377)

i = 0
b = open('/tmp/file.pt', 'rb')
while True:
    try:
        was_at = b.tell()
        torch.load(b)
        print(f'{i}: {was_at:04d}-{b.tell():04d} ({b.tell()-was_at:03d})')
        i += 1
    except EOFError:
        break
b.close()
>>> 0: 0000--425 (-425)
---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
<ipython-input-38-a8789bdba75a> in <module>
   12     try:
   13         was_at = b.tell()
---> 14         torch.load(b)
   15         print(f'{i}: {was_at:04d}-{b.tell():04d} ({b.tell()-was_at:03d})')
   16         i+=1

.../python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module)
  366         f = open(f, 'rb')
  367     try:
--> 368         return _load(f, map_location, pickle_module)
  369     finally:
  370         if new_fd:

.../python3.7/site-packages/torch/serialization.py in _load(f, map_location, pickle_module)
  530             f.seek(0)
  531 
--> 532     magic_number = pickle_module.load(f)
  533     if magic_number != MAGIC_NUMBER:
  534         raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: invalid load key, '\x0a'.

Note how the read position inside the file, as reported by b.tell(), ends up negative: -425.
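A workaround until this is fixed: slurp the whole file into an in-memory buffer and load from that, since loading repeatedly from io.BytesIO works as shown above. The sketch below uses pickle as a stand-in for torch.save/torch.load so it is self-contained; the load_all helper is a hypothetical name, and you would pass load=torch.load for real PyTorch files.

import io
import os
import pickle
import tempfile

def load_all(path, load=pickle.load):
    # Read the whole file into an in-memory buffer, then load
    # objects one by one until the buffer is exhausted.
    with open(path, 'rb') as f:
        buf = io.BytesIO(f.read())
    objects = []
    end = len(buf.getbuffer())
    while buf.tell() < end:
        objects.append(load(buf))
    return objects

# Demo: write three records (pickle stands in for torch.save here).
path = os.path.join(tempfile.mkdtemp(), 'file.pt')
with open(path, 'wb') as f:
    for i in range(3):
        pickle.dump([i] * 10, f)

objs = load_all(path)

This avoids handing the real file object to torch.load at all, at the cost of holding the entire file in memory.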

Expected behavior

The serialization methods of pickle and PyTorch should behave the same way: torch.load should read one object per call and leave the file position at the start of the next object.

Environment

PyTorch version: 1.0.1.post2
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.2 LTS
GCC version: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1
GPU models and configuration: GPU 0: GeForce GTX 1050 Ti with Max-Q Design
Nvidia driver version: 418.43
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] 19.0.3
[conda] pytorch                   1.0.1           py3.7_cuda10.0.130_cudnn7.4.2_2    pytorch

Additional context

The main reason I want to serialize multiple objects individually, rather than packing them in a list, is that they represent inputs that may be created at different times by different processes, but that I still want to process in batch.
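For that multi-writer use case, another stopgap that sidesteps the file-position issue entirely is to serialize each object to its own buffer and write it as a length-prefixed record; the reader then hands each payload to the loader as a fresh BytesIO. A minimal sketch, with pickle standing in for torch.save/torch.load and hypothetical helper names write_record/read_records:

import io
import pickle
import struct

def write_record(f, obj, save=pickle.dump):
    # Serialize obj to its own buffer, then write an 8-byte
    # little-endian length prefix followed by the payload.
    # Pass save=torch.save for PyTorch objects.
    buf = io.BytesIO()
    save(obj, buf)
    payload = buf.getvalue()
    f.write(struct.pack('<Q', len(payload)))
    f.write(payload)

def read_records(f, load=pickle.load):
    # Yield objects one record at a time; each payload is loaded
    # from a fresh BytesIO, so the loader never moves f's position.
    while True:
        header = f.read(8)
        if len(header) < 8:
            return
        (size,) = struct.unpack('<Q', header)
        yield load(io.BytesIO(f.read(size)))

f = io.BytesIO()
for i in range(3):
    write_record(f, {'step': i})
f.seek(0)
records = list(read_records(f))

The length prefix also makes it cheap to skip records without deserializing them, which helps when several processes append to the same log-style file.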

    Labels

    module: pickle, module: serialization, triaged
