
Conversation

@suo (Member) commented Aug 20, 2019

Stack from ghstack:

Trying to fix #2575.
[Here](https://gist.github.com/suo/7b0bc4b49d3c9e095b9f7eef8fa7c6e8) is all the TLS in libtorch.so (thanks @ezyang for figuring out how to find this).

I noticed that `CallbackManager::sample_zero_one()::gen` has a size of 5000 bytes, which is much larger than any of the other entries, so this change makes it heap-allocated instead.

Caveat: I have no idea whether this will actually fix anything, or whether making this variable heap-allocated is a bad idea.

[pytorch ci] [win ci] [caffe2 ci] [binary ci]

Differential Revision: D16936370
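For illustration, here is a minimal sketch of the pattern this change applies, not the exact diff: keep only a pointer in thread-local storage and heap-allocate the large `std::mt19937` engine on first use. The seeding via `std::random_device` is an assumption made for the sketch.

```cpp
#include <memory>
#include <random>

// Sketch only: sizeof(std::mt19937) is 5000 bytes on 64-bit libstdc++,
// and a `static thread_local std::mt19937` puts all of it in the static
// TLS block of every thread. Holding the engine behind a thread_local
// unique_ptr keeps the per-thread TLS footprint to a single pointer.
double sample_zero_one() {
  static thread_local std::unique_ptr<std::mt19937> gen;
  if (!gen) {
    // Assumed seeding; the real code may seed differently.
    gen = std::make_unique<std::mt19937>(std::random_device{}());
  }
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  return dist(*gen);
}
```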

@pytorchbot added the module: autograd label (Related to torch.autograd, and the autograd engine in general) on Aug 20, 2019
suo added a commit that referenced this pull request Aug 20, 2019

ghstack-source-id: 262d66a
Pull Request resolved: #24911
@suo requested review from apaszke, ezyang, and zdevito on August 20, 2019 at 17:07
@ezyang (Contributor) commented Aug 21, 2019

You need to explicitly enable the 3.5 test to see if this worked. You do this by (1) marking this config as XImportant in .circleci/cimodel/data/pytorch_build_data.py and (2) adding the job name to .circleci/scripts/should_run_job.py. Verify that it worked by inspecting the job in question.

@ezyang (Contributor) commented Aug 21, 2019

This is probably right and we should land it ASAP in any case.

@apaszke (Contributor) commented Aug 26, 2019

We should also just switch to an RNG with a significantly smaller state than this one.
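As a hedged illustration of this point (not what PyTorch ultimately adopted): a generator such as splitmix64 satisfies the C++ UniformRandomBitGenerator requirements with 8 bytes of state, versus the ~5000 bytes of `std::mt19937`, so it could drive the same `std::uniform_real_distribution`. The `SplitMix64` name and the seeding below are illustrative.

```cpp
#include <cstdint>
#include <random>

// Sketch of a small-state RNG (splitmix64). Entire state: one uint64_t.
struct SplitMix64 {
  using result_type = uint64_t;
  uint64_t state;  // 8 bytes, vs. ~5000 bytes for std::mt19937

  explicit SplitMix64(uint64_t seed) : state(seed) {}
  static constexpr result_type min() { return 0; }
  static constexpr result_type max() { return UINT64_MAX; }

  result_type operator()() {
    uint64_t z = (state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
  }
};

// Same sampling pattern as before; the per-thread TLS cost is now 8 bytes,
// so no heap allocation is needed at all.
double sample_zero_one_small() {
  static thread_local SplitMix64 gen(std::random_device{}());
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  return dist(gen);
}
```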

@ezyang (Contributor) commented Aug 26, 2019

For the record, this PR does NOT actually work. More discussion at #2575 (comment)
