Avoid log(0.0) in the 'gumbel_softmax' function #20179
Conversation
@syed-ahmed Do you think exponential is just wrong for doing this? @zhaoyanpeng Do you think you could add your sample code here to the test suite as a slowTest? Grep for examples of using this decorator.
@ezyang A test function has been added. I was not able to reproduce the error using the test function, but it should be noted that, in theory, it is possible to sample a zero from the exponential distribution.
0 can be sampled from `-log(1 - u)`, where `u ∈ [0, 1)`.
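As a quick numeric check of the boundary case (illustrative only):

```python
import math

u = 0.0                  # the boundary of [0, 1)
print(-math.log(1 - u))  # -0.0: an exact zero exponential sample is possible in theory
```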
I have the same question, and I can reproduce this problem. Below is my own script; in some cases, the
For CUDA, currently we are doing exponential like this: curand_uniform's range is (0, 1], so it's totally possible that exponential gives 0.0 when log gets 1. On the CPU side, we don't see it because uniform's range there is [0, 1) and we do a log(1 - u) in the exponential formula so that it does not give zero. As a result, exponential on the CPU side has bounds of (0, ∞), which also matches the support stated on Wikipedia. One workaround I could add for the CUDA side in this PR is to check for 1 and replace it with std::nextafter(1.0, 0.0).
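As a plain-Python sketch of the workaround described above (this is not the actual CUDA kernel; the function name and the lambda handling are illustrative):

```python
import math  # math.nextafter requires Python 3.9+


def exponential_from_uniform(u, lambd=1.0):
    """Map a uniform draw u in (0, 1] to an exponential sample, avoiding an exact 0.0."""
    # curand_uniform-style draws can be exactly 1.0, and -log(1.0) == -0.0,
    # so nudge 1.0 down to the previous representable float first.
    if u == 1.0:
        u = math.nextafter(1.0, 0.0)
    return -math.log(u) / lambd


print(-math.log(1.0))                 # -0.0: the problematic boundary case
print(exponential_from_uniform(1.0))  # a tiny positive value instead
```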
@zhaoyanpeng Maybe the global manual_seed in the test suite is not letting you reproduce the 0s. Maybe add the
@syed-ahmed Actually, I copied the sample code above into the test function at first, but the error did not occur. I just tested it again and still could not reproduce the error.
@pytorchbot rebase this please
We should be reseeding on every test case, but if we need a specific seed for this test, it seems fine to explicitly reseed.
I think ideally what we'd want is something like:
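(Not the snippet from the original comment; just a minimal sketch of explicit reseeding inside a test case, with a placeholder test name, seed, and sample count.)

```python
import unittest

import torch


class TestExponentialSampling(unittest.TestCase):  # placeholder test class
    @unittest.skipUnless(torch.cuda.is_available(), "requires CUDA")
    def test_exponential_is_strictly_positive(self):
        # Reseed explicitly so this test does not depend on whatever RNG state
        # earlier tests in the suite left behind.
        torch.manual_seed(11)  # arbitrary placeholder seed
        samples = torch.empty(1000000, device="cuda").exponential_()
        self.assertTrue(bool((samples > 0).all()))
```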
I did that on purpose, since I did not think I had explored the possible zero exponential enough; I have now explored it a bit more. I prepared two PyTorch environments: the first is a locally installed PyTorch via Conda; the second is built from the PyTorch source code. Both use version 1.1.0. I ran the sample code above in the first environment and got a zero for the 376731-th exponential. With the same seed and device as in the first environment, I modified
By the way, would it be better if we had a function
@pytorchbot rebase this please
This PR has run afoul of some TorchScript changes on master. cc @wanchaol |
I think it's because `torch.finfo` is not a JIT-recognized type, and JIT thought it was an ATen op instead. Sorry, I did not have the full context on why we need to do `clamp(min_representive_number)` here; it seems like it is dealing with edge cases. Can we do it in other ways instead?
This is fixing a numerical stability problem, where we need to avoid division by zero. So it will be hard to avoid doing something that's not morally the same thing, though maybe there is another way to spell this that avoids an finfo call?
Excerpts from Wanchao's message of 2019-06-07 11:42:31 -0700:
… I think it's because `torch.finfo` is not a JIT-recognized type, and JIT thought it was an ATen op instead. Sorry, I did not have the full context on why we need to do `clamp(min_representive_number)` here; it seems like it is dealing with edge cases. Can we do it in other ways instead?
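For illustration, the kind of clamp under discussion might look like the following in eager mode (a sketch, not the exact code from the PR; the variable names are made up):

```python
import torch

logits = torch.randn(4, 8)
# Clamp the exponential noise away from 0 before taking the log, using the
# smallest positive normal value representable in this dtype.
tiny = torch.finfo(logits.dtype).tiny
gumbels = -torch.empty_like(logits).exponential_().clamp(min=tiny).log()
```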
Hi @zhaoyanpeng! Thank you for your pull request and welcome to our community.
Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with the CLA label. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
tensor.exponential_() on a CUDA device may generate 0.0; below is the code for reproducing the error.
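A rough sketch of such a reproduction loop (the seed, tensor size, and number of rounds are illustrative assumptions, not the values from the original script):

```python
import torch

torch.manual_seed(0)  # placeholder seed
device = torch.device("cuda")

for round_idx in range(1, 11):
    samples = torch.empty(1000000, device=device).exponential_()
    zero_positions = (samples == 0).nonzero(as_tuple=False)
    if zero_positions.numel() > 0:
        entry = zero_positions[0].item()
        print(f"{samples[entry].item()} is sampled in the {entry}-th entry "
              f"in the {round_idx}-th sampling")
        break
```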
output: -0.0 is sampled in the 376731-th entry in the 1-th sampling