Fix #11752: fix numerical issue in log_softmax #21672
Conversation
test/test_nn.py
Outdated
def test_log_softmax(self):
    x_small = torch.ones(1, 2, dtype=torch.float32)
    x_big = x_small * 1e16
softmax and log_softmax should be invariant under addition, but not multiplication, right? (The multiplicative factor is an inverse temperature, right?)
Should this line read x_big = x_small + 1e16?
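For reference, a quick sketch (mine, not part of the PR) of the invariance being invoked: adding a constant to every logit leaves log_softmax unchanged, while multiplying acts as an inverse temperature and changes the result.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0]])

print(F.log_softmax(x, dim=1))          # tensor([[-2.4076, -1.4076, -0.4076]])
print(F.log_softmax(x + 100.0, dim=1))  # additive shift: identical values
print(F.log_softmax(x * 100.0, dim=1))  # multiplicative scale: sharpens toward the max entry
```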
Oops... this test happens to work in this special case, but it's a mistake. I've pushed a new commit.
  scalar_t max_input = max_input_arr[j];
  vec256::map(
-     [tmp_sum](Vec x) { return x - Vec(tmp_sum); },
+     [tmp_sum, max_input](Vec x) { return x - Vec(max_input) - Vec(tmp_sum); },
Does the compiler optimize this into computing max_input + tmp_sum before the map?
It shouldn't be optimized into computing the sum first.
In some cases the inputs are very large while the correction term is small, so a pre-added value would be absorbed and effectively ignored (the old behavior). You can see an example here: #11752 (comment)
I have also looked up the log_softmax implementations in MXNet and TensorFlow. They likewise write x - Vec(max_input) - Vec(tmp_sum) as a single expression to fix the order of computation.
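To make the absorption concrete, here is a minimal float64 sketch (plain Python, outside the PR), assuming the old code folded the max into tmp_sum ahead of time, as the diff suggests:

```python
import math

x = 1e16                 # a "big logit"
max_input = 1e16         # the row max
tmp_sum = math.log(2.0)  # log(sum(exp(x - max))) for two equal logits

# Near 1e16 the float64 spacing is ~2, so adding log(2) to the max is absorbed.
old = x - (max_input + tmp_sum)  # 0.0: the log-sum correction vanished
new = (x - max_input) - tmp_sum  # -0.6931...: the correct log_softmax value
print(old, new)
```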
Could you add a comment here so people won't "optimize" this away in the future? Thanks!
It's done.
…echu/pytorch into fix_numerical_issue_in_log_softmax
wolegechu left a comment
@VitalyFedyunin So strange... I merged the test/test_nn.py file from the upstream master branch into my branch, and now GitHub highlights code in green that doesn't belong to me.
…echu/pytorch into fix_numerical_issue_in_log_softmax
I pulled in some commits that had been merged into master after I created this PR. @VitalyFedyunin emmm... anyway, I have removed all the unrelated changes.
facebook-github-bot left a comment
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: pytorch/pytorch#11866 corrected this issue in the function `host_softmax` (aten/src/ATen/native/SoftMax.cpp), but when I tried the example proposed in pytorch/pytorch#11752, `log_softmax` was still not working for big logits. Looking into the source code, I found that the example had called `vec_host_softmax_lastdim`, not `host_softmax`. This change fixes the issue in `_vec_log_softmax_lastdim` and adds a test for `log_softmax`.
Pull Request resolved: pytorch/pytorch#21672
Differential Revision: D15856327
Pulled By: VitalyFedyunin
fbshipit-source-id: 7a1fd3c0a03d366c99eb873e235361e4fcfa7567
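For reference, a sketch of the reproduction path described in the summary (shapes are illustrative, not necessarily the exact tensor from #11752). On CPU, log_softmax over the last dimension dispatches to the vectorized kernel patched here; before the fix the big-logit case collapsed to zeros, and afterwards both calls print log(1/16) ≈ -2.7726 in every position:

```python
import torch
import torch.nn.functional as F

x_small = torch.ones(1, 16, dtype=torch.float32)
x_big = x_small + 1e16  # shift by a large constant; log_softmax should be unchanged

print(F.log_softmax(x_small, dim=1))
print(F.log_softmax(x_big, dim=1))
```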
@VitalyFedyunin merged this pull request in b403b10.