optionally zero infinite losses in CTCLoss #16199
Conversation
@jinserk, @SeanNaren in case it's of interest to you.
Thank you for letting me know!! :D
I managed to reproduce the CI failures, so I'm hopeful to find the root cause soonish.
@soumith can this make 1.0.1?
It's broken with CuDNN right now in a way that is hard to understand and, while I'm not Soumith, I'd see it more in 1.1 after it is fixed.
Ah, I misread the order of the comment and your last commit -- I assumed the remaining failures were spurious.
and bring back changes lost in merge
So the errors don't look to me like anything related to the patch, and I would think it is good to go.
I'd love to merge it, but it says there are merge conflicts.
The big IntList rename hit it. I'll rebase.
I think it should be good now. |
facebook-github-bot left a comment
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Here is a stab at implementing an option to zero out infinite losses (and NaN gradients). It might be nicer to move the zeroing to the respective kernels. The default is currently `False` to mimic the old behaviour, but I'd be half inclined to set the default to `True`, because the behaviour wasn't consistent between CuDNN and Native anyways and the NaN gradients aren't terribly useful. This topic seems to come up regularly, e.g. in #14335 Pull Request resolved: pytorch/pytorch#16199 Differential Revision: D14020462 Pulled By: ezyang fbshipit-source-id: 5ba8936c66ec6e61530aaf01175dc49f389ae428
@t-vi a naive question please: I was having issues with the NaN loss even with […]. I just want to be sure that the GPU-CPU difference (installing torch+cu and using the CPU runtime) was definitely the issue.
Here is a stab at implementing an option to zero out infinite losses (and NaN gradients).
It might be nicer to move the zeroing to the respective kernels.
The default is currently `False` to mimic the old behaviour, but I'd be half inclined to set the default to `True`, because the behaviour wasn't consistent between CuDNN and Native anyways and the NaN gradients aren't terribly useful. This topic seems to come up regularly, e.g. in #14335
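A minimal sketch of how the option described above might be used once this lands, assuming it is exposed as a `zero_infinity` keyword on `nn.CTCLoss` (the flag name and the zeroing behaviour are taken from this PR's summary; shapes and values are illustrative only):

```python
import torch
import torch.nn as nn

T, N, C = 5, 2, 10  # input length, batch size, number of classes (incl. blank)
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

# The second sample's target (length 8) is longer than the input (length 5),
# so no valid alignment exists and its loss would be +inf without zeroing.
targets = torch.randint(1, C, (N, 8), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.tensor([3, 8], dtype=torch.long)

ctc = nn.CTCLoss(zero_infinity=True, reduction='none')
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss)            # the infeasible sample contributes 0 instead of inf
loss.sum().backward()  # its gradient is zeroed rather than propagating NaN
```

With `zero_infinity=False` (the current default per the summary), the second sample would yield an infinite loss and NaN gradients instead.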