Skip to content

Conversation

@syed-ahmed
Copy link
Collaborator

@syed-ahmed syed-ahmed commented Jun 3, 2019

Stack from ghstack:

Resubmit of #20886

Summary:

This PR removes curandStateMTGP32 usages since it's not stream-safe.
Main changes are:

  • It modifies THCTensor_(getRNGState) and THCTensor_(setRNGState) to not read/write curandStateMTGP anymore.
  • It modifies RRelu.cu and cuda multinomial kernels to use curandStatePhilox
  • It deletes new_state.clone() from torch.cuda.random.py to get a performance boost.

Differential Revision: D15632929

@pytorchbot pytorchbot added module: cuda Related to torch.cuda, and CUDA support in general module: internals Related to internal abstractions in c10 and ATen module: operators labels Jun 3, 2019
Remove curandStateMTGP32 usage

gh-metadata: pytorch pytorch 21301 gh/syed-ahmed/13/head
@ryanchesler
Copy link

Sorry if this is a noob question but is there some way for me to be able to start utilizing these changes before they are fully tested? I am running across this error and hoping this fix will resolve them.

Remove curandStateMTGP32 usage

gh-metadata: pytorch pytorch 21301 gh/syed-ahmed/13/head
@syed-ahmed
Copy link
Collaborator Author

@ryanchesler The changes should be landing very soon if you could wait a bit. If not, you can find the full diff here: https://github.com/pytorch/pytorch/commits/gh/syed-ahmed/13/orig. It's the first 5 commits.

@syed-ahmed syed-ahmed requested a review from ezyang June 4, 2019 05:53
Remove curandStateMTGP32 usage

gh-metadata: pytorch pytorch 21301 gh/syed-ahmed/13/head
@syed-ahmed
Copy link
Collaborator Author

@ezyang all tests have passed in the stack. GitHub is just not updating the statuses

@ryanchesler
Copy link

Awesome. Glad this is making it through. Hopefully it solves the issue blocking me.

@ezyang
Copy link
Contributor

ezyang commented Jun 4, 2019

Hey @syed-ahmed this stack is conflicting with cauchy which I just landed. Can you rebase past that?

Remove curandStateMTGP32 usage

gh-metadata: pytorch pytorch 21301 gh/syed-ahmed/13/head
@syed-ahmed
Copy link
Collaborator Author

Rebased :)

@zou3519 zou3519 deleted the gh/syed-ahmed/13/head branch June 5, 2019 21:08
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 5, 2019
Summary:
Pull Request resolved: pytorch/pytorch#21301
ghimport-source-id: d4516237a8fb46d1f74c47532e849e5926fc6a79

Differential Revision: D15632929

Pulled By: ezyang

fbshipit-source-id: b5147edb95dc3d71f87581aa2ab002e48c3fef30
@facebook-github-bot
Copy link
Contributor

@ezyang merged this pull request in 0e3c4a0.

@sbelharbi
Copy link

Hi,
Sorry for this question.
I am facing some issues with rng states (see here).
It was suggested that this merge has fixed the issue.
By checking the date of this merge and the date of the latest version of Pytorch 1.1.0 (April 30), it seems that Pytorch 1.1.0 does not have this feature yet.

Do you have an idea when this feature will be available in a stable version?

I want to publish my code, and make it clear which version of Pytorch I used so it will be easy to install/reproduce.

Thanks!

@ezyang
Copy link
Contributor

ezyang commented Jun 16, 2019

It will probably be included in 1.2 (assuming it doesn't get reverted before branch cut). You can also try using a nightly to get binaries with the fix earlier.

@sbelharbi
Copy link

Thanks!
The issue seems to be fixed in the nightly build (https://download.pytorch.org/whl/nightly/cu100/torch_nightly-1.2.0.dev20190616-cp37-cp37m-linux_x86_64.whl).

I hope 1.2.0 will be released soon!

@sbelharbi
Copy link

sbelharbi commented Aug 9, 2019

hi,
@ezyang @syed-ahmed
sorry to bother you.
was this commit integrated in v1.2.0 which was released today?
https://github.com/pytorch/pytorch/releases/tag/v1.2.0

I searched in the notes (key: rng, 21301), and did not find it.
thanks

@ezyang
Copy link
Contributor

ezyang commented Aug 9, 2019

Yes it's in 1.2.0; 0e3c4a0 is reachable from v1.2.0 branch. It does look like this is missing from the changelog notes.

@sbelharbi
Copy link

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: cuda Related to torch.cuda, and CUDA support in general module: internals Related to internal abstractions in c10 and ATen open source

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants