Skip to content

Avoid undefined behavior by union type punning in round_to_bfloat16#41070

Closed
N-Dekker wants to merge 1 commit intotensorflow:masterfrom
N-Dekker:remove-union-type-punning-round_to_bfloat16
Closed

Avoid undefined behavior by union type punning in round_to_bfloat16#41070
N-Dekker wants to merge 1 commit intotensorflow:masterfrom
N-Dekker:remove-union-type-punning-round_to_bfloat16

Conversation

@N-Dekker
Copy link
Copy Markdown

@N-Dekker N-Dekker commented Jul 3, 2020

Use std::memcpy instead of union based type punning, to avoid undefined behavior.
See also C++ Core Guidelines: "Don't use a union for type punning"
https://github.com/isocpp/CppCoreGuidelines/blob/v0.8/CppCoreGuidelines.md#c183-dont-use-a-union-for-type-punning

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label Jul 3, 2020
@googlebot
Copy link
Copy Markdown

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@N-Dekker
Copy link
Copy Markdown
Author

N-Dekker commented Jul 3, 2020

@googlebot I signed it!

@googlebot
Copy link
Copy Markdown

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes and removed cla: no labels Jul 3, 2020
Dynmi
Dynmi previously approved these changes Jul 5, 2020
Copy link
Copy Markdown
Contributor

@Dynmi Dynmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delicate and nice change

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 5, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 5, 2020
@gbaned gbaned self-assigned this Jul 6, 2020
@gbaned gbaned added comp:core issues related to core part of tensorflow and removed ready to pull PR ready for merge process labels Jul 6, 2020
@gbaned gbaned requested a review from jaingaurav July 6, 2020 04:23
@jaingaurav jaingaurav requested a review from rmlarsen July 6, 2020 05:25
Copy link
Copy Markdown
Contributor

@jaingaurav jaingaurav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remove the FP32 union on line 180 as well?

Also adding @rmlarsen to the review.

@N-Dekker N-Dekker force-pushed the remove-union-type-punning-round_to_bfloat16 branch from 69de499 to 3720dbc Compare July 6, 2020 06:15
jaingaurav
jaingaurav previously approved these changes Jul 6, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 6, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 6, 2020
@jaingaurav
Copy link
Copy Markdown
Contributor

@N-Dekker
Copy link
Copy Markdown
Author

N-Dekker commented Jul 6, 2020

@N-Dekker: Could you also make this change in Eigen? The relevant file is https://bitbucket.org/eigen/eigen/src/default/Eigen/src/Core/arch/Default/Half.h.

Thanks for the suggestion, @jaingaurav I never contributed to Eigen before, but I could give it a try. Specifically for their bfloat16 implementation, I think that would be this union: https://gitlab.com/libeigen/eigen/-/blob/386d809bde475c65b7940f290efe80e6a05878c4/Eigen/src/Core/arch/Default/BFloat16.h#L315 Right? Bfloat16 is my main interest for now, as I'm maintaining another implementation myself! https://github.com/biovault/biovault_bfloat16

Unfortunately, it appears that Eigen does not have a convenient bit_cast function template like absl::bit_cast. So I'd either have to propose bit_cast to Eigen as well, or fix their bfloat16 by memcpy-ing the bits from float to unsigned int, inside their float_to_bfloat16_rtne.

Anyway, it would certainly help if this pull request would make it into the master branch of TensorFlow 😃 I'll have another look tomorrow!

@jaingaurav
Copy link
Copy Markdown
Contributor

@N-Dekker: Yeah I'd recommend updating both Half.h & BFloat16.h.

We're trying to merge your changes but there seem to be some BUILD file issues. I think you need to include the absl dependency else we are getting errors like: 'absl/base/casts.h': No such file or directory

@N-Dekker
Copy link
Copy Markdown
Author

N-Dekker commented Jul 7, 2020

We're trying to merge your changes but there seem to be some BUILD file issues. I think you need to include the absl dependency else we are getting errors like: 'absl/base/casts.h': No such file or directory

Thank you @jaingaurav, but I'm sorry, I'm not really familiar with the build system! Would it be sufficient to just add "@com_google_absl//absl/base" to tensorflow/core/lib/bfloat16/BUILD ? I guess somewhere here, right?
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/lib/bfloat16/BUILD#L18

@jaingaurav
Copy link
Copy Markdown
Contributor

Yep I believe that should work

@N-Dekker N-Dekker force-pushed the remove-union-type-punning-round_to_bfloat16 branch from 3720dbc to a8b6ce3 Compare July 7, 2020 21:47
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 7, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 7, 2020
@jaingaurav
Copy link
Copy Markdown
Contributor

@N-Dekker: I believe there is a ubuntu sanity check failure due to the ordering of the build deps. Could you please address that?

@N-Dekker
Copy link
Copy Markdown
Author

N-Dekker commented Jul 8, 2020

I believe there is a ubuntu sanity check failure due to the ordering of the build deps. Could you please address that?

@jaingaurav: Thanks, I see now in https://source.cloud.google.com/results/invocations/c3882af9-15f4-4c40-b79a-37a233c1d006/targets/%2F%2Ftensorflow%2Ftools%2Fci_build:gen_ci_sanity_out/log

FAIL: buildifier found errors and/or warnings in above BUILD files.
buildifier suggested the following changes:
tensorflow/core/lib/bfloat16/BUILD:
18d17
<         "@com_google_absl//absl/base",
20a20
>         "@com_google_absl//absl/base",
exit status 1
Please fix manually or run buildifier <file> to auto-fix.

I guess that means that it suggests moving the line down, from 18 to 20, right? Do you have a clue why? I'm asking because other BUILD files do have "@com_google_absl//absl/base" in front of their "//tensorflow/core..." dependencies, for example: https://github.com/tensorflow/tensorflow/blob/v2.3.0-rc0/tensorflow/compiler/aot/BUILD#L43

Anyway, I'll give it a try.

@N-Dekker N-Dekker force-pushed the remove-union-type-punning-round_to_bfloat16 branch from a8b6ce3 to 229cb2f Compare July 8, 2020 05:25
@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Jul 8, 2020
jaingaurav
jaingaurav previously approved these changes Jul 8, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 8, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 8, 2020
@N-Dekker
Copy link
Copy Markdown
Author

N-Dekker commented Jul 8, 2020

@jaingaurav Still build failures, unfortunately! I guess because of the "absl/base" dependency as well. Would you suggest me to simply avoid using Abseil for this pull request? (Instead of absl::bit_cast<uint32_t>(v), the member function could directly call std::memcpy(&u, &v, sizeof(float)), of course.)

I'm asking you also because I have another pull request in mind that may or may not use bit_cast

@jaingaurav
Copy link
Copy Markdown
Contributor

@N-Dekker: Some of the failures seem unrelated. However, given that we're unlikely to use absl in Eigen an alternate fix without absl would be preferred.

@N-Dekker N-Dekker force-pushed the remove-union-type-punning-round_to_bfloat16 branch from 229cb2f to 34e9fe9 Compare July 8, 2020 20:30
@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Jul 8, 2020
@gbaned gbaned requested a review from jaingaurav July 9, 2020 07:18
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 9, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 9, 2020
@jaingaurav
Copy link
Copy Markdown
Contributor

Apologies @N-Dekker but we're actively reworking this code to use the Eigen implementation. As a result, we'll address this casting behavior when performing that work.

@jaingaurav jaingaurav closed this Jul 9, 2020
@N-Dekker
Copy link
Copy Markdown
Author

Apologies @N-Dekker but we're actively reworking this code to use the Eigen implementation. As a result, we'll address this casting behavior when performing that work.

No problem @jaingaurav, , thank you for the information! Would you still suggest me to make a merge request on this issue for https://gitlab.com/libeigen/eigen ?

I'm asking also because I was about to prepare another possible improvement: I believe the performance of conversion from an integer type to bfloat could be improved significantly.

@jaingaurav
Copy link
Copy Markdown
Contributor

@N-Dekker: I think I'd hold off just a bit until @rmlarsen completes the work. He said he might be able to fix this issue when doing that work.

If the performance improvement is isolated, maybe you can start proposing that in eigen right away?

@N-Dekker
Copy link
Copy Markdown
Author

FYI I just submitted a bfloat32 merge request to Eigen, albeit on a different subject: "Allow implicit conversion from bfloat16 to float and double", https://gitlab.com/libeigen/eigen/-/merge_requests/163

@N-Dekker
Copy link
Copy Markdown
Author

@jaingaurav FYI, The eigen::bfloat16 merge request that corresponds to this tensorflow::bfloat16 pull request has just been approved and merged onto the Eigen master branch by your Google colleague Rasmus Munk Larsen (@rmlarsen) 😃
"Avoid undefined behavior by union type punning in float_to_bfloat16_rtne"
https://gitlab.com/libeigen/eigen/-/merge_requests/164
https://gitlab.com/libeigen/eigen/-/commit/b11f817bcff04276f3024d6780f56a137968b81a

@N-Dekker
Copy link
Copy Markdown
Author

N-Dekker commented Jul 17, 2020

@jaingaurav

If the performance improvement is isolated, maybe you can start proposing that in eigen right away?

FYI, the eigen::bfloat16 performance improvement merge request that I submitted is currently being reviewed: "Faster conversion from integer types to bfloat16", https://gitlab.com/libeigen/eigen/-/merge_requests/166

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes comp:core issues related to core part of tensorflow ready to pull PR ready for merge process size:S CL Change Size: Small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants