[WIP] fix different round behavior on CPU and GPU #16498 #17443
Conversation
can you also add test cases?

yes, add a test case to test/test_cuda.py and this should be good to go
I think this doesn't address the issue with respect to CPU rounding, which is still different based on the instruction set. See #16498 (comment)
I thought only one blank line is allowed between functions.
I saw an error message saying that a blank line was missing, which is why I added it.
@soumith, a unit test has been added. @vishwakftw, what do you mean by "instruction set"? I didn't see the CPU implementation of torch.round() wrapping std::round. Besides, based on my tests, the CPU version of torch.round() follows half-to-even mode. If you have counterexamples, please let me know.
@musikisomorphie the half-to-even behavior comes from the vectorized implementation, which is what you see in your CPU tests.
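As a quick illustration of the half-to-even ("banker's rounding") convention discussed here, plain Python can be used: the built-in round() follows the same rule as numpy and the vectorized CPU path of torch.round. This is an illustrative sketch, not PyTorch code:

```python
# Half-to-even: exact halves round to the nearest even integer, so
# consecutive halves do not all move in the same direction. This matches
# numpy.round and the vectorized CPU implementation of torch.round.
halves = [0.5, 1.5, 2.5, 3.5, -0.5, -1.5, -2.5]
rounded = [round(x) for x in halves]
print(rounded)  # [0, 2, 2, 4, 0, -2, -2]
```

Note that 0.5 rounds down to 0 while 1.5 rounds up to 2, which is exactly the behavior the CPU tests in this thread observed.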
I think it'd be better to label this PR as
@vishwakftw thank you for your feedback.
@bhushan23, @soumith, should we merge this PR, or is there anything else to address?
facebook-github-bot
left a comment
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Actually, replacing round with nearbyint in:
and TH_MATH_NAME(round) with TH_MATH_NAME(nearbyint) everywhere (just two files) should suffice.
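A sketch of why this substitution matters: C's round() always rounds halves away from zero, while nearbyint() honors the current floating-point rounding mode, which defaults to round-to-nearest-even (FE_TONEAREST). The two conventions can be emulated in Python with hypothetical helper names (this is not the PyTorch implementation):

```python
import math

def round_half_away(x):
    # Emulates C round(): exact halves always round away from zero.
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

def round_half_even(x):
    # Emulates C nearbyint() under the default FE_TONEAREST mode:
    # exact halves round to the nearest even integer, which is also
    # what Python's built-in round() does.
    return round(x)

for x in [0.5, 1.5, 2.5, -0.5, -1.5]:
    print(x, round_half_away(x), round_half_even(x))
# 0.5, 2.5, and -0.5 diverge between the two conventions;
# 1.5 and -1.5 happen to agree.
```

Swapping round for nearbyint in the TH kernels therefore makes the scalar CPU path agree with the vectorized half-to-even path and with the fixed GPU behavior.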
Ok, I will fix them. Thanks for the tip.
facebook-github-bot
left a comment
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Please merge master into this branch and avoid force-pushing.
Is there any way to test CPU behavior for the non-vectorized version, which has been modified?
Pretty much no, as all calls now dispatch to the vectorized version. The only round() which is not covered by the vectorized implementation is HalfTensor(a).cpu().round(), but right now we throw "round_out is not implemented for type torch.HalfTensor" anyway.
Do you mean I should rebase my commit without using force-push?
Just pull the most recent master into your repo (do not rebase).
I updated my branch, then ran git push origin my-branch. Is that correct?
Yes.
Hi @VitalyFedyunin, @bhushan23, how is this PR progressing?
facebook-github-bot
left a comment
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot retest this please
@pytorchbot rebase this please
Sorry, only maintainers are authorized to rebase other people's PRs. Feel free to try again on one of your PRs! (To learn more about this bot, see Bot commands.)
Hi @VitalyFedyunin, should I rebase this PR myself? |
Yes please; internal tests are failing (where they were not supposed to), so I was trying to check whether a rebase would help.
add cuda round test
add a new blank line after test_cuda_round
convert all cpu round to nearbyint
I rebased and force-pushed my branch; without force, the terminal shows:
facebook-github-bot
left a comment
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: xxtemp, colesbury, bhushan23, zou3519, convert GPU round behavior to half-to-even, consistent with the torch CPU version and numpy. Your feedback is welcome. See #16498
Pull Request resolved: pytorch/pytorch#17443
Differential Revision: D14261786
Pulled By: VitalyFedyunin
fbshipit-source-id: 98156436b545d72769831a89e2775d43ad913ebc
* upstream/master: (24 commits)
  Automatic update of fbcode/onnx to 96c58ceeacf0f2b73d752e413e4fd78787a12da3 (pytorch#17676)
  Set the default ONNX opset to the latest stable opset (i.e., 9) (pytorch#17736)
  Add module attributes (pytorch#17309)
  refactoring serialization of ONNX initializers to be name-based (pytorch#17420)
  ONNX Export for Max and Average Pooling in CEIL_MODE
  use flake8-mypy (pytorch#17721)
  use fp16<->fp32 intrinsic (pytorch#17496)
  Implement a Caffe2 standalone LSTM operator (pytorch#17726)
  caffe2:libtorch_cuda depends on caffe2:caffe2_gpu (pytorch#17729)
  add tensor and cost inference functions (pytorch#17684)
  ONNX Export Narrow op
  Keep the dim_type of hinted shape as BATCH if possible (pytorch#17734)
  fix different round behavior on CPU and GPU pytorch#16498 (pytorch#17443)
  Warn about memory overlaps on expanded tensors (pytorch#17576)
  fix exp fam. formula
  refactor caffe2 operator constructors - 10/9 (pytorch#17659)
  Improve ONNX symbolic for logsoftmax and softmax (pytorch#17672)
  Enable using CMD when building cpp extensions on Windows
  Do not rename net boundary inputs/outputs during ssaRewrite. (pytorch#17545)
  Reapply D14078519 (pytorch#17596)
  ...
@xxtemp, @colesbury, @bhushan23, @zou3519, this converts the GPU round behavior to half-to-even, consistent with the torch CPU version and numpy. Your feedback is welcome.
See #16498