Skip to content

Conversation

@yn4bit
Copy link
Contributor

@yn4bit yn4bit commented May 24, 2019

Summary:
Fixed an issue where models can not be loaded in a 32-bit environment like Raspbian.

Summary:
Fixed an issue where models can not be loaded in a 32-bit environment like Raspbian.
@pytorchbot pytorchbot added the module: internals Related to internal abstractions in c10 and ATen label May 24, 2019
@ezyang
Copy link
Contributor

ezyang commented May 24, 2019

Thanks. Is there a way we can test this?

@yn4bit
Copy link
Contributor Author

yn4bit commented May 27, 2019

Thanks. Is there a way we can test this?

Yes. Testing is possible with the Raspberry Pi.

If you load the model like net.load_state_dict (torch.load (Config.model_save_path)) before the patch, the following backtrace will be output.

ValueError Traceback (most recent call last)
in
----> 1 net.load_state_dict(torch.load(Config.model_save_path))
2
/usr/local/lib/python3.5/dist-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
385 f = f.open('rb')
386 try:
--> 387 return _load(f, map_location, pickle_module, **pickle_load_args)
388 finally:
389 if new_fd:
/usr/local/lib/python3.5/dist-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
579 for key in deserialized_storage_keys:
580 assert key in deserialized_objects
--> 581 deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
582 if offset is not None:
583 offset = f.tell()
ValueError: whence value 16552 unsupported

If you apply a patch, the above backtrace will not be output and loading will succeed.

However, it takes 8 hours to build a PyTorch environment with Raspberry Pi3, so it is not reasonable if you do not have it on hand. So I have not tried it, but it might be better to test in a 32-bit Linux virtual environment.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ezyang merged this pull request in 5259616.

@philippslang
Copy link
Contributor

philippslang commented Sep 28, 2019

As far as I can tell this will overflow if fd_current_pos doesn't fit in 32 bits, effectively hardcoding this to 32 bit system and limiting the load to <2G files. Am I missing something @ezyang ?

@peterjc123
Copy link
Collaborator

Yes, this is a perfect fix for Unix systems, where sizeof(long)=4 in 32-bit system and sizeof(long)=8 in 64-bit system. But for Windows, sizeof(long)=4 for both 32-bit and 64-system systems. It is actually breaking the Windows build.

@yn4bit
Copy link
Contributor Author

yn4bit commented Sep 30, 2019

In the first place, it seems that the logic to convert from va_arg to the stack for Python in PyObject_CallMethod () does not support LLP64 such as Windows.

The do_mkvalue format string "l" is parsed with PyLong_FromLong (va_arg (* p_va, long)) and depends on the size of the long. Also, even if the format string is "L", it will be cast to long in PyLong_FromLongLong (), so you may not be able to handle files over 2GB on Windows.

https://github.com/python/cpython/blob/master/Python/modsupport.c#L320

@peterjc123
Copy link
Collaborator

@yn4bit But PyLong is not only 32 bit, right? So what's the problem with PyLong_FromLongLong?

@yn4bit
Copy link
Contributor Author

yn4bit commented Sep 30, 2019

I'm sorry. PyLong_FromLongLong () was referring to another file (Misc/coverity_model.c). Looking at Objects/longobject.c, it seems that it is stored in the Python stack in 64bit. Then, this part seems to be able to be corrected by passing long long as shown below.

const auto seek_return = PyObject_CallMethod(file, "seek", "Li", (long long)fd_current_pos, 0);

@peterjc123
Copy link
Collaborator

Would you please make a PR on this?

@yn4bit
Copy link
Contributor Author

yn4bit commented Sep 30, 2019

OK. Wait a minute.

@ezyang
Copy link
Contributor

ezyang commented Sep 30, 2019

@plang85 thank you for catching this before release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: internals Related to internal abstractions in c10 and ATen open source

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants