Conversation

@musikisomorphie (Contributor) commented Feb 24, 2019

@xxtemp, @colesbury, @bhushan23, @zou3519: convert GPU round behavior to half-to-even, consistent with the torch CPU version and numpy. Your feedback is welcome.
See #16498
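
For reference, a minimal sketch (not code from this PR) of the half-to-even behavior being targeted; the GPU call assumes a CUDA build:

```python
import numpy as np
import torch

x = [0.5, 1.5, 2.5, 3.5, -0.5, -1.5]

# numpy breaks ties by rounding to the nearest even integer:
print(np.round(x))  # [ 0.  2.  2.  4. -0. -2.]

# With this change, torch.round should match numpy on both devices:
t = torch.tensor(x)
print(torch.round(t))         # tensor([ 0.,  2.,  2.,  4., -0., -2.])
print(torch.round(t.cuda()))  # same values (previously 2.5 -> 3 on GPU)
```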

@musikisomorphie changed the title from "convert cuda round to half to even" to "fix different round behavior on CPU and GPU #16498" on Feb 24, 2019
@bhushan23 (Contributor)

Can you also add test cases?

@soumith (Contributor) commented Feb 25, 2019

Yes, add a test case to test/test_cuda.py and this should be good to go.
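
As a rough sketch (not the test ultimately added in this PR; the thread later mentions a test_cuda_round() helper), such a case could look like:

```python
import torch

def test_cuda_round(self):
    # Ties must round to the nearest even integer on both devices.
    a = torch.tensor([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
    expected = torch.tensor([-2., -2., -0., 0., 2., 2.])
    self.assertEqual(a.round(), expected)               # CPU path
    self.assertEqual(a.cuda().round().cpu(), expected)  # GPU must agree
```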

@vishwakftw (Contributor)

I think this doesn't address the issue with respect to CPU rounding, which still differs depending on the instruction set. See #16498 (comment)

@ezyang changed the title from "fix different round behavior on CPU and GPU #16498" to "[WIP] fix different round behavior on CPU and GPU #16498" on Feb 25, 2019
@bhushan23 (Contributor) left a review comment:

I thought only one blank line is allowed between functions.

@musikisomorphie (Contributor, Author)

> I thought only one blank line is allowed between functions.

I saw an error saying that one blank line was missing; that is why I added it.
I guess that since the indentation differs between test_cuda_round() and load_ignore_file(), two blank lines are required.

@musikisomorphie (Contributor, Author)

@soumith, a unit test has been added.

@vishwakftw What do you mean by "instruction set"? I didn't see a case where the CPU implementation of torch.round() wraps std::round. Besides, based on my tests, the CPU version of torch.round() follows half-to-even mode. If you have counterexamples, please let me know.

@vishwakftw (Contributor)

@musikisomorphie the half-to-even behavior comes from the vectorized implementation, which is what you see in your CPU tests.

std::round(2.5) rounds to 3 (away from zero), whereas the vectorized implementation rounds 2.5 to 2 (nearest even). The patch for this issue would be to wrap the CPU implementation with std::rint instead of std::round.
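
The two tie-breaking modes are easy to see from Python's standard decimal module; this is only an analogy for the C++ behavior (ROUND_HALF_UP corresponds to std::round, ROUND_HALF_EVEN to std::rint under the default floating-point rounding mode):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

for v in ("0.5", "1.5", "2.5"):
    away = Decimal(v).quantize(Decimal("1"), rounding=ROUND_HALF_UP)    # like std::round
    even = Decimal(v).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)  # like std::rint
    print(v, away, even)
# 0.5 1 0
# 1.5 2 2
# 2.5 3 2
```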

@vishwakftw (Contributor)

I think it'd be better to label this PR as bc-breaking.

@musikisomorphie (Contributor, Author) commented Feb 26, 2019

> @musikisomorphie the half-to-even behavior comes from the vectorized implementation, which is what you see in your CPU tests.
>
> std::round(2.5) rounds to 3 (away from zero), whereas the vectorized implementation rounds 2.5 to 2 (nearest even). The patch for this issue would be to wrap the CPU implementation with std::rint instead of std::round.

@vishwakftw thank you for your feedback.
What I did here is make torch.round() consistent with np.round(), i.e., half-to-even, on both CPU and GPU.
I suppose what you expect is replacing every std::round in PyTorch's source code with std::rint, so that the whole framework strictly sticks to half-to-even mode.
There is a mismatch between my revision and your expectation; unfortunately, I may not have enough time to work on the latter.

@musikisomorphie (Contributor, Author)

@bhushan23, @soumith, should we merge this PR?

@facebook-github-bot (Contributor) left a comment:

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@VitalyFedyunin self-requested a review February 28, 2019 17:11
@VitalyFedyunin (Contributor) commented Feb 28, 2019

Actually, replacing round with nearbyint in:

return map(std::round);

and TH_MATH_NAME(round) with TH_MATH_NAME(nearbyint) everywhere (just two files) should suffice.

@musikisomorphie (Contributor, Author)

> TH_MATH_NAME(round)

OK, I will fix them. Thanks for the tip.

@facebook-github-bot (Contributor) left a comment:

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@VitalyFedyunin (Contributor)

Please merge master into this branch and avoid force-pushing.

@vishwakftw (Contributor)

Is there any way to test the CPU behavior of the non-vectorized version that has been modified?

@VitalyFedyunin (Contributor)

Pretty much no, as all calls now dispatch to the vectorized version. Asking @musikisomorphie to remove the old TH code would be too much for this PR.

The only round() not covered by the vectorized implementation is HalfTensor(a).cpu().round(), but right now we throw "round_out is not implemented for type torch.HalfTensor" anyway.
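
For context, a quick way to reproduce that error at the time (a sketch; the exact message may vary by build):

```python
import torch

t = torch.randn(4).half()  # a CPU torch.HalfTensor
try:
    t.round()
except RuntimeError as e:
    print(e)  # e.g. "round_out is not implemented for type torch.HalfTensor"
```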

@musikisomorphie (Contributor, Author)

> Please merge master into this branch and avoid force-pushing.

Do you mean I should rebase my commit without using force-push?

@VitalyFedyunin (Contributor)

Just pull the most recent master into your repo (do not rebase).

@musikisomorphie (Contributor, Author)

> Just pull the most recent master into your repo (do not rebase).

I updated my branch, then ran git push origin my-branch. Is that correct?

@bhushan23 (Contributor)

Yes.
I prefer to keep everything clean with no merge commits (which will happen here), but that's fine.

@musikisomorphie (Contributor, Author)

Hi @VitalyFedyunin, @bhushan23, how is this PR progressing?

@facebook-github-bot (Contributor) left a comment:

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@VitalyFedyunin (Contributor)

@pytorchbot retest this please

3 similar comments
@VitalyFedyunin (Contributor)

@pytorchbot retest this please

@VitalyFedyunin (Contributor)

@pytorchbot retest this please

@VitalyFedyunin (Contributor)

@pytorchbot retest this please

@VitalyFedyunin (Contributor)

@pytorchbot rebase this please

@pytorchbot (Collaborator)

Sorry, only maintainers are authorized to rebase other people's PRs. Feel free to try again on one of your PRs!

(To learn more about this bot, see Bot commands.)

@musikisomorphie (Contributor, Author)

> @pytorchbot rebase this please

Hi @VitalyFedyunin, should I rebase this PR myself?

@VitalyFedyunin (Contributor)

Yes please, there are internal tests failing (where they are not supposed to), so I was trying to check whether a rebase would help.

add cuda round test

add a new blank line after test_cuda_round

convert all cpu round to nearbyint
@musikisomorphie (Contributor, Author)

> Yes please, there are internal tests failing (where they are not supposed to), so I was trying to check whether a rebase would help.

I rebased and force-pushed my branch; without --force, the terminal shows:

To https://github.com/musikisomorphie/pytorch.git
! [rejected] round -> round (non-fast-forward)
error: failed to push some refs to 'https://github.com/musikisomorphie/pytorch.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

@facebook-github-bot (Contributor) left a comment:

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Mar 7, 2019
Summary:
xxtemp, colesbury, bhushan23, zou3519: convert GPU round behavior to half-to-even, consistent with the torch CPU version and numpy. Your feedback is welcome.
See #16498
Pull Request resolved: pytorch/pytorch#17443

Differential Revision: D14261786

Pulled By: VitalyFedyunin

fbshipit-source-id: 98156436b545d72769831a89e2775d43ad913ebc
@musikisomorphie deleted the round branch March 7, 2019 08:52
petrex pushed a commit to petrex/pytorch that referenced this pull request Mar 7, 2019
* upstream/master: (24 commits)
  Automatic update of fbcode/onnx to 96c58ceeacf0f2b73d752e413e4fd78787a12da3 (pytorch#17676)
  Set the default ONNX opset to the latest stable opset (i.e., 9) (pytorch#17736)
  Add module attributes (pytorch#17309)
  - refactoring serialization of ONNX initializers to be name-based (pytorch#17420)
  ONNX Export for Max and Average Pooling in CEIL_MODE
  use flake8-mypy (pytorch#17721)
  use fp16<->fp32 intrinsic (pytorch#17496)
  Implement a Caffe2 standalone LSTM operator (pytorch#17726)
  caffe2:libtorch_cuda depends on caffe2:caffe2_gpu (pytorch#17729)
  add tensor and cost inference functions (pytorch#17684)
  ONNX Export Narrow op
  Keep the dim_type of hinted shape as BATCH if possible (pytorch#17734)
  fix different round behavior on CPU and GPU pytorch#16498 (pytorch#17443)
  Warn about memory overlaps on expanded tensors (pytorch#17576)
  fix exp fam. formula
  refactor caffe2 operator constructors - 10/9 (pytorch#17659)
  Improve ONNX symbolic for logsoftmax and softmax (pytorch#17672)
  Enable using CMD when building cpp extensions on Windows
  Do not rename net boundary inputs/outputs during ssaRewrite. (pytorch#17545)
  Reapply D14078519 (pytorch#17596)
  ...