Skip to content

Conversation

@ktarplee
Copy link

There is a regression in softmin in 0.4.1 that was not present in 0.4.0. The behavior of softmin(x) should match softmax(-x) however instead it is implemented (in v0.4.1) as -softmax(x). These are not the same. The fix is trivial because the bug is due to operator precedence.

This is a major regression that broke my training. I'm not sure how a unit test did not catch this.

x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmax should be
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmax is implemented incorrectly

In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])

In 0.4.0 this produces the correct values
tensor([ 0.6668, 0.2453, 0.0547, 0.0332])
tensor([ 0.6668, 0.2453, 0.0547, 0.0332])
tensor([ 0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])

@soumith
Copy link
Contributor

soumith commented Jul 31, 2018

this is bad, sorry about the regression. Would you be up for adding a test in test_nn.py?

@ktarplee
Copy link
Author

I would be happy to add to test_nn.py however it looks like somewhat of a pain to build and test pytorch from source.

@soumith
Copy link
Contributor

soumith commented Jul 31, 2018

@ktarplee no worries, we'll add it and push this PR through.

@ssnl
Copy link
Collaborator

ssnl commented Jul 31, 2018

Thanks for the fix. Can you also instead submit this PR to the master branch?

@soumith soumith changed the base branch from v0.4.1 to master July 31, 2018 22:14
@soumith
Copy link
Contributor

soumith commented Jul 31, 2018

@ssnl i've changed it to be on master.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Summary:
There is a regression in softmin in 0.4.1 that was not present in 0.4.0.  The behavior of softmin(x) should match softmax(-x) however instead it is implemented (in v0.4.1) as -softmax(x).  These are not the same.  The fix is trivial because the bug is due to operator precedence.

This is a major regression that broke my training.  I'm not sure how a unit test did not catch this.

```
x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmax should be
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmax is implemented incorrectly
```
In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])

In 0.4.0 this produces the correct values
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.0278,  0.0755,  0.3385,  0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
Pull Request resolved: pytorch#10066

Differential Revision: D9106995

Pulled By: soumith

fbshipit-source-id: 7332503c6077e8461ad6cd72422c749cf6ca595b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants