
Conversation

@ailzhang (Contributor):

Fixes #18057 according to @colesbury's suggestion. Thanks!
cc: @ezyang

@facebook-github-bot (Contributor) left a comment:

@ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ailzhang requested a review from @ezyang on March 19, 2019 at 14:24.
@ailzhang (Contributor, Author):

@pytorchbot retest this please

@ailzhang (Contributor, Author):

@pytorchbot rebase this please

@ezyang (Contributor) left a comment:

Nice! I didn't know this was a trick you could do. Do you know how it's justified?
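For readers without the inline diff context: the trick in question is presumably the one visible in the hunk further down, computing the product of the two norms with a single square root. With the squared norms $w_1 = \sum_i x_{1,i}^2$ and $w_2 = \sum_i x_{2,i}^2$, it is justified by

$$\|x_1\|\,\|x_2\| = \sqrt{w_1}\,\sqrt{w_2} = \sqrt{w_1 w_2},$$

which holds because both factors are nonnegative, and it replaces two elementwise square roots with one.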

@ezyang (Contributor) commented Mar 19, 2019:

Oh, I see Sam's comment now. If you wanna be nice, put it in a comment :)

@facebook-github-bot (Contributor) left a comment:

@ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

```diff
-  return w12.div_((w1 * w2).clamp_min_(eps));
+  Tensor w1 = at::sum(x1 * x1, dim);
+  Tensor w2 = at::sum(x2 * x2, dim);
+  Tensor n12 = (w1 * w2).sqrt_().clamp_min(eps);
```
Collaborator:

Arguably we should use rsqrt here for better precision :)

@colesbury (Member):

This makes the precision worse.

On CPU, rsqrt(x) is implemented as 1/sqrt(x), so that is now three operations with rounding instead of two. (There is no std::rsqrt in C++, and the x86 rsqrt instructions are low-precision.)

With CUDA it's a little different, but sqrt(x) is generally more precise on modern GPUs.
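A standalone scalar sketch of the CPU rounding count (a hypothetical illustration in plain C++, not the ATen kernel; `w12` stands for the dot product and `w1w2` for the product of squared norms):

```cpp
#include <cmath>
#include <cstdio>

// sqrt path: two rounded operations after forming w1w2.
double via_sqrt(double w12, double w1w2) {
  double n = std::sqrt(w1w2);  // rounding 1
  return w12 / n;              // rounding 2
}

// "rsqrt" path on CPU: no std::rsqrt exists, so it is spelled
// 1/sqrt(x), which costs an extra rounded operation.
double via_rsqrt(double w12, double w1w2) {
  double r = 1.0 / std::sqrt(w1w2);  // roundings 1 and 2
  return w12 * r;                    // rounding 3
}

int main() {
  // The two paths can differ in the last bit for the same inputs.
  std::printf("%.17g\n%.17g\n", via_sqrt(0.3, 0.1), via_rsqrt(0.3, 0.1));
}
```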

@ailzhang (Contributor, Author):

Thanks @colesbury! I will send a patch to fix this.

```diff
-  return w12.div_((w1 * w2).clamp_min_(eps));
+  Tensor w1 = at::sum(x1 * x1, dim);
+  Tensor w2 = at::sum(x2 * x2, dim);
+  Tensor n12 = (w1 * w2).rsqrt_().clamp_min(eps);
```
Collaborator:

Thanks! Although you probably now want to do either clamp_min(eps * eps) before rsqrt or clamp_max(1.0 / eps) after rsqrt.

@ailzhang (Contributor, Author):

Ah nice catch! Thanks!
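To spell out the suggestion above (a minimal scalar sketch with hypothetical helper names, assuming eps > 0 and IEEE doubles): because rsqrt is monotone decreasing, clamping the squared-norm product below by eps*eps before the rsqrt yields exactly min(rsqrt(w), 1/eps), the same bound as clamping the result above by 1/eps afterwards. Either form keeps the inverse norm, and hence the cosine quotient, finite for near-zero vectors, whereas the clamp_min(eps) after rsqrt_ in the hunk above bounds the wrong side and never engages when the inverse is huge.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Hypothetical scalar versions of the two equivalent fixes.
// w1w2 = ||x1||^2 * ||x2||^2, eps > 0.
double inv_norm_clamp_before(double w1w2, double eps) {
  // clamp_min(eps * eps) before the rsqrt
  return 1.0 / std::sqrt(std::max(w1w2, eps * eps));
}

double inv_norm_clamp_after(double w1w2, double eps) {
  // rsqrt, then clamp_max(1.0 / eps)
  return std::min(1.0 / std::sqrt(w1w2), 1.0 / eps);
}

int main() {
  const double eps = 1e-8;
  const double samples[] = {0.0, 1e-20, 1.0};
  for (double w : samples) {
    // Both variants agree, and both cap the inverse norm at 1/eps.
    std::printf("w=%g  before=%g  after=%g\n", w,
                inv_norm_clamp_before(w, eps), inv_norm_clamp_after(w, eps));
  }
}
```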

@facebook-github-bot (Contributor) left a comment:

@ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request on Mar 20, 2019:
Summary:
fixes #18057 according to colesbury's suggestion. Thanks!
cc: ezyang
Pull Request resolved: pytorch/pytorch#18168

Differential Revision: D14520953

Pulled By: ailzhang

fbshipit-source-id: 970e6cfb482d857a81721ec1d0ee4a4df84a0450

Labels: none yet
Projects: none yet

Successfully merging this pull request may close these issues:

- cosine_similarity function produces results more than 1.0 (#18057)

6 participants