Skip to content

Conversation

@yn4bit
Copy link
Contributor

@yn4bit yn4bit commented Sep 30, 2019

Fixes #26998

Yoshiaki Nakamura and others added 2 commits September 30, 2019 12:54
Change-Id: I71aa665ecc7afae16c3b553ed73ec3c11797ac30
Summary:
Fixed seek offset size to 64bit.

Change-Id: Ie4a5a4713c4e0b7b8c31f9b79db06f7bcd236ce0
@pytorchbot pytorchbot added the module: internals Related to internal abstractions in c10 and ATen label Sep 30, 2019
@peterjc123 peterjc123 requested a review from ezyang September 30, 2019 04:11
@peterjc123
Copy link
Collaborator

Could you please add a test as well?

Change-Id: I76fcfec78fa990dfd1f716b9c82ac40aad9d1507
@yn4bit
Copy link
Contributor Author

yn4bit commented Sep 30, 2019

@peterjc123
I tried to modify the existing test to check read / write of files over 2GB. Any concerns with this approach?

@peterjc123
Copy link
Collaborator

@yn4bit That's great! Thank you.

@ezyang ezyang added this to the 1.3 milestone Sep 30, 2019
@ezyang
Copy link
Contributor

ezyang commented Sep 30, 2019

It's a little bad since Windows tests are failing on master but I'm going to optimistically go ahead and land this.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

peterjc123 pushed a commit to peterjc123/pytorch that referenced this pull request Sep 30, 2019
Summary:
Fixes pytorch#26998
Pull Request resolved: pytorch#27047

Differential Revision: D17666050

Pulled By: ezyang

fbshipit-source-id: f02ebd5320ae25f8949be20d0744fe3cd3e2fee9
(cherry picked from commit 1afe3fc)
@facebook-github-bot
Copy link
Contributor

@ezyang merged this pull request in 1afe3fc.

@ezyang
Copy link
Contributor

ezyang commented Sep 30, 2019

There's something wrong with the tests:


16:36:07 ======================================================================
16:36:07 ERROR: test_serialization_offset (__main__.TestTorch)
16:36:07 ----------------------------------------------------------------------
16:36:07 Traceback (most recent call last):
16:36:07   File "test_torch.py", line 5034, in test_serialization_offset
16:36:07     b_loaded = torch.load(f)
16:36:07   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\serialization.py", line 426, in load
16:36:07     return _load(f, map_location, pickle_module, **pickle_load_args)
16:36:07   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\serialization.py", line 620, in _load
16:36:07     deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
16:36:07   File "C:\Jenkins\Miniconda3\lib\tempfile.py", line 485, in func_wrapper
16:36:07     return func(*args, **kwargs)
16:36:07 OSError: [Errno 22] Invalid argument
16:36:07 

@ezyang
Copy link
Contributor

ezyang commented Sep 30, 2019

I'm going to revert this for now until we figure out what exactly the problem is

@peterjc123
Copy link
Collaborator

peterjc123 commented Oct 1, 2019

@yn4bit This one gets reverted, could you please figure out a new one?

Update: Let me do this for you.

// the file descriptor is returned to original position and
// the file handle at python call-site needs updating to the
// advanced postion
const auto fd_current_pos = lseek(fd, 0, SEEK_CUR);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the MSVC document here, looks like we need to change the function too.

peterjc123 pushed a commit to peterjc123/pytorch that referenced this pull request Oct 1, 2019
Summary:
Fixes pytorch#26998
Pull Request resolved: pytorch#27047

Differential Revision: D17666050

Pulled By: ezyang

fbshipit-source-id: f02ebd5320ae25f8949be20d0744fe3cd3e2fee9
(cherry picked from commit 1afe3fc)
(cherry picked from commit 5846008)
@yn4bit
Copy link
Contributor Author

yn4bit commented Oct 1, 2019

@peterjc123 Thank you for your comment!

I think lseek is certainly suspicious. Python portable_lseek () is as follows, so I think that it needs to be changed.

#ifdef MS_WINDOWS
     res = _lseeki64(fd, pos, whence);
#else
     res = lseek(fd, pos, whence);
#endif

However, since OSError has finally occurred, I think that the cause of the direct error is the value passed to the Python seek or a restriction inside Python. I would like to correct this after analyzing the generation mechanism, but there is no Windows development environment at hand ...

@peterjc123
Copy link
Collaborator

peterjc123 commented Oct 1, 2019

@yn4bit I'm testing the fix at #27125. You could comment there if you have more thoughts.

@yn4bit
Copy link
Contributor Author

yn4bit commented Oct 1, 2019

@peterjc123 Thank you for your testing. Wow, this is a surprise to solve the problem. I don't understand the behavior of Windows ...
I checked the patch and it's looks good to me.

soumith pushed a commit that referenced this pull request Oct 4, 2019
* Fixed seek offset size to 64bit. (#27047)

Summary:
Fixes #26998
Pull Request resolved: #27047

Differential Revision: D17666050

Pulled By: ezyang

fbshipit-source-id: f02ebd5320ae25f8949be20d0744fe3cd3e2fee9
(cherry picked from commit 1afe3fc)

* Use _lseeki64 instead for MSVC

(cherry picked from commit f49f78d)
pdlive215 pushed a commit to pdlive215/pytorch that referenced this pull request Nov 27, 2019
Summary:
Fixes pytorch#26998
Pull Request resolved: pytorch#27047

Differential Revision: D17666050

Pulled By: ezyang

fbshipit-source-id: f02ebd5320ae25f8949be20d0744fe3cd3e2fee9
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Fixes pytorch#26998
Pull Request resolved: pytorch#27047

Differential Revision: D17666050

Pulled By: ezyang

fbshipit-source-id: f02ebd5320ae25f8949be20d0744fe3cd3e2fee9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: internals Related to internal abstractions in c10 and ATen open source

Projects

None yet

Development

Successfully merging this pull request may close these issues.

torch.load() - OSError on a file larger than 2GB

6 participants