Conversation

wolegechu (Contributor) commented Jun 12, 2019

#11866 corrected this issue in the function host_softmax (aten/src/ATen/native/SoftMax.cpp). But when I tried the example proposed in #11752, log_softmax still did not work for big logits.

I looked into the source code and found that the example calls vec_host_softmax_lastdim, not host_softmax.

This PR fixes the issue in _vec_log_softmax_lastdim and adds a test for log_softmax.
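For reference, a minimal repro in the spirit of the example from #11752 (this snippet is a paraphrase, not the issue's exact code):

```python
import torch

# Large logits: log_softmax of [1e16, 1e16] should be [-log(2), -log(2)],
# but before this fix the vectorized last-dim path could return 0 instead.
x = torch.full((1, 2), 1e16, dtype=torch.float32)
print(torch.log_softmax(x, dim=1))  # expected: tensor([[-0.6931, -0.6931]])
```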

@pytorchbot added the labels module: cpu (CPU specific problem, e.g., perf, algorithm), module: nn (Related to torch.nn), and module: operators on Jun 12, 2019
@wolegechu changed the title from "Fix #11752: fix numerical issue in log softmax" to "Fix #11752: fix numerical issue in log_softmax" on Jun 12, 2019
test/test_nn.py Outdated

def test_log_softmax(self):
    x_small = torch.ones(1, 2, dtype=torch.float32)
    x_big = x_small * 1e16
Contributor commented:
softmax and log_softmax should be invariant under addition, but not multiplication, right? (A multiplicative factor is an inverse temperature, right?)

Should this line read x_big = x_small + 1e16?
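(A quick numerical check of the invariance being described; illustrative only, not part of the PR:)

```python
import torch

x = torch.randn(5, dtype=torch.float64)
# Invariant under adding a constant to every logit...
print(torch.allclose(torch.softmax(x, 0), torch.softmax(x + 100.0, 0)))  # True
# ...but not under multiplying: the scale acts as an inverse temperature.
print(torch.allclose(torch.softmax(x, 0), torch.softmax(x * 2.0, 0)))    # False (in general)
```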

wolegechu (Contributor, author) commented Jun 12, 2019:

Oops, this test happens to pass in this special case, but it is a mistake. I've written a new commit.
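A sketch of the shift-invariance style of test being discussed (illustrative; the committed test may differ — it assumes the test_nn.py conventions, where F is torch.nn.functional):

```python
def test_log_softmax(self):
    x_small = torch.ones(1, 2, dtype=torch.float32)
    x_big = x_small + 1e16  # additive shift, which log_softmax should be invariant to
    self.assertEqual(F.log_softmax(x_small, -1), F.log_softmax(x_big, -1))
```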

  scalar_t max_input = max_input_arr[j];
  vec256::map(
-     [tmp_sum](Vec x) { return x - Vec(tmp_sum); },
+     [tmp_sum, max_input](Vec x) { return x - Vec(max_input) - Vec(tmp_sum); },
Collaborator commented:

Does the compiler optimize this into combining max_input and tmp_sum before the vec256::map?

wolegechu (Contributor, author) commented Jun 12, 2019:

It shouldn't be optimized into combining them beforehand.

In some cases the inputs are very large numbers whose differences are small, so if the two subtrahends were combined first, the contribution of the small term would be swallowed (as in the old code). You can see the example here: #11752 (comment).

I have also looked up the log_softmax implementations in MXNet and TensorFlow. They likewise write x - Vec(max_input) - Vec(tmp_sum) in a single expression to ensure the computation order.
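A plain-float illustration of why the left-to-right order matters here (Python for brevity, not the ATen code):

```python
import math

x = 1e16                 # a big logit
max_input = 1e16         # row max
tmp_sum = math.log(2.0)  # log of the softmax denominator

# Subtracting the max first keeps the small term visible:
print((x - max_input) - tmp_sum)  # -0.693... (correct)

# Combining the two subtrahends first swallows the small term,
# and the result collapses to 0:
print(x - (max_input + tmp_sum))  # 0.0 (wrong)
```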

Collaborator commented:

Could you add a comment here so people won't "optimize" this away in the future? Thanks!

wolegechu (Contributor, author) commented:

It's done.

wolegechu (Contributor, author) left a comment:

@VitalyFedyunin So strange... I merged the test/test_nn.py file from the upstream master branch into my branch, and now GitHub highlights code in green that doesn't belong to me.

@gchanan added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jun 13, 2019
wolegechu (Contributor, author) commented Jun 14, 2019:

> @VitalyFedyunin So strange... I merged the test/test_nn.py file from the upstream master branch into my branch, and now GitHub highlights code in green that doesn't belong to me.

The merge pulled in commits that had landed on master after I created this PR.

@VitalyFedyunin emmmmm... anyway, I have removed all the unrelated changes.

facebook-github-bot (Contributor) left a comment:

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 17, 2019
Summary:
pytorch/pytorch#11866 corrected this issue in the function `host_softmax` (aten/src/ATen/native/SoftMax.cpp). But when I tried the example proposed in pytorch/pytorch#11752, `log_softmax` still did not work for big logits.

I looked into the source code and found that the example calls `vec_host_softmax_lastdim`, not `host_softmax`.

This code fixes the issue in `_vec_log_softmax_lastdim` and adds a test for `log_softmax`.
Pull Request resolved: pytorch/pytorch#21672

Differential Revision: D15856327

Pulled By: VitalyFedyunin

fbshipit-source-id: 7a1fd3c0a03d366c99eb873e235361e4fcfa7567
facebook-github-bot (Contributor) commented:
@VitalyFedyunin merged this pull request in b403b10.

@wolegechu deleted the fix_numerical_issue_in_log_softmax branch on June 18, 2019 at 04:06

Labels

Merged · module: cpu (CPU specific problem, e.g., perf, algorithm) · module: nn (Related to torch.nn) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
