
Conversation

@suo (Member) commented Aug 20, 2019

Stack from ghstack:

Trying to fix #2575.
[Here](https://gist.github.com/suo/7b0bc4b49d3c9e095b9f7eef8fa7c6e8) is all the TLS in libtorch.so (thanks @ezyang for figuring out how to find this).

I noticed that `CallbackManager::sample_zero_one()::gen` has a size of 5000 bytes, which is much larger than any of the other entries, so this change makes it heap-allocated instead.

Caveat: I have no idea whether this will actually fix anything, or whether making this variable heap-allocated is a bad idea.

[pytorch ci] [win ci] [caffe2 ci] [binary ci]

Differential Revision: D16936370
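For illustration, here is a minimal sketch of the pattern this change applies, not the exact diff: keep only a pointer in thread-local storage and heap-allocate the large `std::mt19937` engine on first use. The seeding via `std::random_device` is an assumption made for the sketch.

```cpp
#include <memory>
#include <random>

// Sketch only: sizeof(std::mt19937) is 5000 bytes on 64-bit libstdc++,
// and a `static thread_local std::mt19937` puts all of it in the static
// TLS block of every thread. Holding the engine behind a thread_local
// unique_ptr keeps the per-thread TLS footprint to a single pointer.
double sample_zero_one() {
  static thread_local std::unique_ptr<std::mt19937> gen;
  if (!gen) {
    // Assumed seeding; the real code may seed differently.
    gen = std::make_unique<std::mt19937>(std::random_device{}());
  }
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  return dist(*gen);
}
```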

@pytorchbot added the module: autograd label (Related to torch.autograd, and the autograd engine in general) on Aug 20, 2019
suo added a commit that referenced this pull request Aug 20, 2019

ghstack-source-id: 262d66a
Pull Request resolved: #24911
@suo requested review from apaszke, ezyang, and zdevito on August 20, 2019 at 17:07
@ezyang (Contributor) commented Aug 21, 2019

You need to explicitly enable the 3.5 test to see if this worked. You do this by (1) marking this config as XImportant in .circleci/cimodel/data/pytorch_build_data.py and (2) adding the job name to .circleci/scripts/should_run_job.py. Verify that it worked by inspecting the job in question.

@ezyang (Contributor) commented Aug 21, 2019

This is probably right and we should land it ASAP in any case.

@apaszke (Contributor) commented Aug 26, 2019

We should also just switch to an RNG with a significantly smaller state than this one.
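As a hedged illustration of this point (not what PyTorch ultimately adopted): a generator such as splitmix64 satisfies the C++ UniformRandomBitGenerator requirements with 8 bytes of state, versus the ~5000 bytes of `std::mt19937`, so it could drive the same `std::uniform_real_distribution`. The `SplitMix64` name and the seeding below are illustrative.

```cpp
#include <cstdint>
#include <random>

// Sketch of a small-state RNG (splitmix64). Entire state: one uint64_t.
struct SplitMix64 {
  using result_type = uint64_t;
  uint64_t state;  // 8 bytes, vs. ~5000 bytes for std::mt19937

  explicit SplitMix64(uint64_t seed) : state(seed) {}
  static constexpr result_type min() { return 0; }
  static constexpr result_type max() { return UINT64_MAX; }

  result_type operator()() {
    uint64_t z = (state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
  }
};

// Same sampling pattern as before; the per-thread TLS cost is now 8 bytes,
// so no heap allocation is needed at all.
double sample_zero_one_small() {
  static thread_local SplitMix64 gen(std::random_device{}());
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  return dist(gen);
}
```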

@ezyang (Contributor) commented Aug 26, 2019

For the record, this PR does NOT actually work. More discussion at #2575 (comment)
