Avoid undefined behavior by union type punning in round_to_bfloat16 by N-Dekker · Pull Request #41070 · tensorflow/tensorflow

N-Dekker · 2020-07-03T22:19:39Z

Use std::memcpy instead of union based type punning, to avoid undefined behavior.
See also C++ Core Guidelines: "Don't use a union for type punning"
https://github.com/isocpp/CppCoreGuidelines/blob/v0.8/CppCoreGuidelines.md#c183-dont-use-a-union-for-type-punning
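As a rough illustration of the change described above (not the actual TensorFlow diff), the sketch below contrasts the two approaches. The names `FloatBits`, `float_to_bits` and `float_to_bits_example` are hypothetical, and the expected bit pattern assumes that `float` is an IEEE 754 binary32 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical union of the kind the guideline warns about. Writing `f` and
// then reading `u` reads an inactive member, which is undefined behavior in C++.
union FloatBits {
  float f;
  std::uint32_t u;
};

// Hypothetical replacement: copy the object representation with std::memcpy,
// which is well-defined for trivially copyable types of equal size.
inline std::uint32_t float_to_bits(float v) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch assumes a 32-bit float");
  std::uint32_t u;
  std::memcpy(&u, &v, sizeof(u));
  return u;
}

// Usage: under the IEEE 754 assumption, 1.0f has the bit pattern 0x3F800000.
inline void float_to_bits_example() {
  assert(float_to_bits(1.0f) == 0x3F800000u);
}
```

Optimizing compilers typically turn a fixed-size `std::memcpy` like this into a plain register move, so the well-defined version should not cost anything extra in practice.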

googlebot · 2020-07-03T22:19:44Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with `@googlebot I signed it!` and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

N-Dekker · 2020-07-03T22:24:12Z

@googlebot I signed it!

googlebot · 2020-07-03T22:24:15Z

CLAs look good, thanks!

Dynmi

Delicate and nice change

jaingaurav

Could you remove the FP32 union on line 180 as well?

Also adding @rmlarsen to the review.

jaingaurav · 2020-07-06T19:43:53Z

@N-Dekker: Could you also make this change in Eigen? The relevant file is https://bitbucket.org/eigen/eigen/src/default/Eigen/src/Core/arch/Default/Half.h.

N-Dekker · 2020-07-06T21:09:41Z

> @N-Dekker: Could you also make this change in Eigen? The relevant file is https://bitbucket.org/eigen/eigen/src/default/Eigen/src/Core/arch/Default/Half.h.

Thanks for the suggestion, @jaingaurav. I have never contributed to Eigen before, but I could give it a try. Specifically for their bfloat16 implementation, I think that would be this union: https://gitlab.com/libeigen/eigen/-/blob/386d809bde475c65b7940f290efe80e6a05878c4/Eigen/src/Core/arch/Default/BFloat16.h#L315 Right? Bfloat16 is my main interest for now, as I'm maintaining another implementation myself: https://github.com/biovault/biovault_bfloat16

Unfortunately, it appears that Eigen does not have a convenient bit_cast function template like absl::bit_cast. So I'd either have to propose bit_cast to Eigen as well, or fix their bfloat16 by memcpy-ing the bits from float to unsigned int, inside their float_to_bfloat16_rtne.

Anyway, it would certainly help if this pull request would make it into the master branch of TensorFlow 😃 I'll have another look tomorrow!
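For readers unfamiliar with the idea, a `bit_cast`-style helper of the kind mentioned in this comment can be sketched on top of `std::memcpy`. This is a minimal sketch, not Abseil's or Eigen's actual code; the names `bit_cast_sketch` and `bit_cast_sketch_example` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: reinterpret the bytes of `from` as a value of type To.
// The static_asserts restrict it to cases where a byte-wise copy is valid.
template <typename To, typename From>
inline To bit_cast_sketch(const From& from) {
  static_assert(sizeof(To) == sizeof(From),
                "source and destination must have the same size");
  static_assert(std::is_trivially_copyable<From>::value,
                "source must be trivially copyable");
  static_assert(std::is_trivially_copyable<To>::value,
                "destination must be trivially copyable");
  static_assert(std::is_default_constructible<To>::value,
                "destination must be default constructible");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Usage: a float survives a round trip through its 32-bit representation.
inline void bit_cast_sketch_example() {
  const std::uint32_t bits = bit_cast_sketch<std::uint32_t>(1.0f);
  assert(bit_cast_sketch<float>(bits) == 1.0f);
}
```

Code bases that can require C++20 could use `std::bit_cast` from `<bit>` instead of a hand-written helper.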

jaingaurav · 2020-07-06T21:37:29Z

@N-Dekker: Yeah I'd recommend updating both Half.h & BFloat16.h.

We're trying to merge your changes but there seem to be some BUILD file issues. I think you need to include the absl dependency else we are getting errors like: 'absl/base/casts.h': No such file or directory

N-Dekker · 2020-07-07T20:55:36Z

> We're trying to merge your changes but there seem to be some BUILD file issues. I think you need to include the absl dependency else we are getting errors like: 'absl/base/casts.h': No such file or directory

Thank you @jaingaurav, but I'm sorry, I'm not really familiar with the build system! Would it be sufficient to just add "@com_google_absl//absl/base" to tensorflow/core/lib/bfloat16/BUILD ? I guess somewhere here, right?
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/lib/bfloat16/BUILD#L18

jaingaurav · 2020-07-07T21:32:52Z

Yep I believe that should work

jaingaurav · 2020-07-07T22:52:43Z

@N-Dekker: I believe there is an Ubuntu sanity check failure due to the ordering of the build deps. Could you please address that?

N-Dekker · 2020-07-08T05:24:22Z

> I believe there is an Ubuntu sanity check failure due to the ordering of the build deps. Could you please address that?

@jaingaurav: Thanks, I see now in https://source.cloud.google.com/results/invocations/c3882af9-15f4-4c40-b79a-37a233c1d006/targets/%2F%2Ftensorflow%2Ftools%2Fci_build:gen_ci_sanity_out/log

FAIL: buildifier found errors and/or warnings in above BUILD files.
buildifier suggested the following changes:
tensorflow/core/lib/bfloat16/BUILD:
18d17
<         "@com_google_absl//absl/base",
20a20
>         "@com_google_absl//absl/base",
exit status 1
Please fix manually or run buildifier <file> to auto-fix.

I guess that means that it suggests moving the line down, from 18 to 20, right? Do you have a clue why? I'm asking because other BUILD files do have "@com_google_absl//absl/base" in front of their "//tensorflow/core..." dependencies, for example: https://github.com/tensorflow/tensorflow/blob/v2.3.0-rc0/tensorflow/compiler/aot/BUILD#L43

Anyway, I'll give it a try.

N-Dekker · 2020-07-08T09:07:31Z

@jaingaurav Still build failures, unfortunately! I guess because of the "absl/base" dependency as well. Would you suggest that I simply avoid using Abseil for this pull request? (Instead of absl::bit_cast<uint32_t>(v), the member function could directly call std::memcpy(&u, &v, sizeof(float)), of course.)

I'm also asking because I have another pull request in mind that may or may not use bit_cast.
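The Abseil-free variant suggested in this comment could look roughly like the sketch below. It is an assumption-laden illustration, not the code from this pull request: the function name `round_to_bfloat16_bits`, the NaN result `0x7FC0`, and the bias-based round-to-nearest-even step are choices made for this example, and `float` is assumed to be IEEE 754 binary32.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: convert a float to the 16 bits of a bfloat16,
// rounding to nearest with ties going to the even result.
inline std::uint16_t round_to_bfloat16_bits(float v) {
  if (std::isnan(v)) {
    return 0x7FC0;  // a quiet NaN pattern, chosen for this sketch
  }
  std::uint32_t u;
  std::memcpy(&u, &v, sizeof(float));  // replaces the union read, no Abseil needed
  // Add 0x7FFF, plus 1 if the lowest kept bit is set, so that exact ties
  // round towards the value whose lowest kept bit is 0.
  const std::uint32_t lsb = (u >> 16) & 1u;
  u += 0x7FFFu + lsb;
  return static_cast<std::uint16_t>(u >> 16);
}

// Usage: 1.0f is exactly representable, so its upper 16 bits are returned.
inline void round_to_bfloat16_bits_example() {
  assert(round_to_bfloat16_bits(1.0f) == 0x3F80);
}
```

The only part that differs from an `absl::bit_cast` based version is the `std::memcpy` line, which is why dropping the Abseil dependency is a small change.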

jaingaurav · 2020-07-08T19:41:01Z

@N-Dekker: Some of the failures seem unrelated. However, given that we're unlikely to use absl in Eigen, an alternate fix without absl would be preferred.

jaingaurav · 2020-07-09T23:42:13Z

Apologies @N-Dekker but we're actively reworking this code to use the Eigen implementation. As a result, we'll address this casting behavior when performing that work.

N-Dekker · 2020-07-10T07:11:06Z

> Apologies @N-Dekker but we're actively reworking this code to use the Eigen implementation. As a result, we'll address this casting behavior when performing that work.

No problem @jaingaurav, thank you for the information! Would you still suggest that I make a merge request on this issue for https://gitlab.com/libeigen/eigen ?

I'm also asking because I was about to prepare another possible improvement: I believe the performance of conversion from an integer type to bfloat16 could be improved significantly.

jaingaurav · 2020-07-11T05:39:28Z

@N-Dekker: I think I'd hold off just a bit until @rmlarsen completes the work. He said he might be able to fix this issue when doing that work.

If the performance improvement is isolated, maybe you can start proposing that in eigen right away?

N-Dekker · 2020-07-11T11:45:00Z

FYI, I just submitted a bfloat16 merge request to Eigen, albeit on a different subject: "Allow implicit conversion from bfloat16 to float and double", https://gitlab.com/libeigen/eigen/-/merge_requests/163

N-Dekker · 2020-07-14T19:18:04Z

@jaingaurav FYI, the eigen::bfloat16 merge request that corresponds to this tensorflow::bfloat16 pull request has just been approved and merged into the Eigen master branch by your Google colleague Rasmus Munk Larsen (@rmlarsen) 😃
"Avoid undefined behavior by union type punning in float_to_bfloat16_rtne"
https://gitlab.com/libeigen/eigen/-/merge_requests/164
https://gitlab.com/libeigen/eigen/-/commit/b11f817bcff04276f3024d6780f56a137968b81a

N-Dekker · 2020-07-17T14:13:54Z

@jaingaurav

> If the performance improvement is isolated, maybe you can start proposing that in eigen right away?

FYI, the eigen::bfloat16 performance improvement merge request that I submitted is currently being reviewed: "Faster conversion from integer types to bfloat16", https://gitlab.com/libeigen/eigen/-/merge_requests/166

google-ml-butler bot added the "size:S" (CL Change Size: Small) label on Jul 3, 2020

googlebot added the "cla: no" label on Jul 3, 2020

googlebot added the "cla: yes" label and removed the "cla: no" label on Jul 3, 2020

Dynmi previously approved these changes on Jul 5, 2020

google-ml-butler bot added the "kokoro:force-run" (Tests on submitted change) and "ready to pull" (PR ready for merge process) labels on Jul 5, 2020

kokoro-team removed the "kokoro:force-run" (Tests on submitted change) label on Jul 5, 2020

gbaned self-assigned this on Jul 6, 2020

gbaned added the "comp:core" (issues related to core part of tensorflow) label and removed the "ready to pull" (PR ready for merge process) label on Jul 6, 2020

gbaned requested a review from jaingaurav on July 6, 2020 04:23

jaingaurav requested a review from rmlarsen on July 6, 2020 05:25

jaingaurav suggested changes on Jul 6, 2020

N-Dekker dismissed Dynmi’s stale review via 3720dbc on July 6, 2020 06:15

N-Dekker force-pushed the remove-union-type-punning-round_to_bfloat16 branch from 69de499 to 3720dbc on July 6, 2020 06:15

jaingaurav previously approved these changes on Jul 6, 2020

google-ml-butler bot added the "kokoro:force-run" (Tests on submitted change) and "ready to pull" (PR ready for merge process) labels on Jul 6, 2020

kokoro-team removed the "kokoro:force-run" (Tests on submitted change) label on Jul 6, 2020

N-Dekker dismissed jaingaurav’s stale review via a8b6ce3 on July 7, 2020 21:47

N-Dekker force-pushed the remove-union-type-punning-round_to_bfloat16 branch from 3720dbc to a8b6ce3 on July 7, 2020 21:47

google-ml-butler bot added the "kokoro:force-run" (Tests on submitted change) and "ready to pull" (PR ready for merge process) labels on Jul 7, 2020

kokoro-team removed the "kokoro:force-run" (Tests on submitted change) label on Jul 7, 2020

N-Dekker dismissed jaingaurav’s stale review via 229cb2f on July 8, 2020 05:25

N-Dekker force-pushed the remove-union-type-punning-round_to_bfloat16 branch from a8b6ce3 to 229cb2f on July 8, 2020 05:25

google-ml-butler bot removed the "ready to pull" (PR ready for merge process) label on Jul 8, 2020

jaingaurav previously approved these changes on Jul 8, 2020

google-ml-butler bot added the "kokoro:force-run" (Tests on submitted change) and "ready to pull" (PR ready for merge process) labels on Jul 8, 2020

kokoro-team removed the "kokoro:force-run" (Tests on submitted change) label on Jul 8, 2020

N-Dekker dismissed jaingaurav’s stale review via 34e9fe9 on July 8, 2020 20:30

N-Dekker force-pushed the remove-union-type-punning-round_to_bfloat16 branch from 229cb2f to 34e9fe9 on July 8, 2020 20:30

google-ml-butler bot removed the "ready to pull" (PR ready for merge process) label on Jul 8, 2020

gbaned requested a review from jaingaurav on July 9, 2020 07:18

jaingaurav approved these changes on Jul 9, 2020

google-ml-butler bot added the "kokoro:force-run" (Tests on submitted change) and "ready to pull" (PR ready for merge process) labels on Jul 9, 2020

kokoro-team removed the "kokoro:force-run" (Tests on submitted change) label on Jul 9, 2020

jaingaurav closed this on Jul 9, 2020
