-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Convert paths, arguments, and env vars to Unicode for remote execution. #15333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @tjgq |
tjgq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not entirely obvious to me that Platform has the same issue (its contents originate from command line arguments, not Starlark). Adding a test would be the way to be sure, but feel free to punt on it, since non-ASCII values are probably not as common there.
FYI, while this PR is most definitely welcome, I suspect it doesn't completely fix #14381, since that bug report also mentions an encoding issue in bazel aquery. So feel free to mention it, but I'd suggest leaving it open for now.
src/main/java/com/google/devtools/build/lib/remote/util/Utils.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/remote/util/Utils.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/remote/merkletree/MerkleTree.java
Outdated
Show resolved
Hide resolved
I think the bad I've tweaked this PR so it also fixes
While platform properties can be obtained from CLI flags, they also come from Starlark (the For reference, the following property configurations are equivalent: And result in the same (incorrect) |
tjgq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tweaked this PR so it also fixes
aquery, using the new helper functions.
Thanks! Do you mind also adding a test for bazel aquery, probably in src/test/shell/integration/aquery_test.sh?
While platform properties can be obtained from CLI flags, they also come from Starlark (the
platformrule orexec_propertiescommon attribute). I agree that non-ASCII values in platform properties are probably rare, so not going to worry about it for now, but it might be worth fixing in the future.
Thanks, I wasn't aware of the exec_properties attribute. I agree that we don't have to fix this right now.
src/main/java/com/google/devtools/build/lib/remote/util/Utils.java
Outdated
Show resolved
Hide resolved
Done -- I added a test for Fixing the other For those, I've added a |
|
Note on the Also, apparently, on Windows strings for file paths are Unicode and strings from Starlark are UTF-8. I extended the logic in Here's how the new |
| * to be durable against unusual encoding settings, and does not guarantee | ||
| * that the escaping process is reverseable. | ||
| * | ||
| * Non-printable ASCII characters are formatted as `{U+00XX}`. Characters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with the standard-ish \uXXXX notation to make the output copy-pastable into a script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/main/java/com/google/devtools/build/lib/remote/util/Utils.java
Outdated
Show resolved
Hide resolved
tjgq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm going to import this now. Apologies for my confusion earlier and the review delay (I was out on Friday).
…utput and remote execution protocol. (#15447) For historical reasons, Bazel internally encodes non-ASCII in BUILD/bzl files by taking individual input bytes (assumed to be UTF-8) and storing them in a String as the corresponding Latin1 characters. This encoding must be undone whenever these strings escape to the outside world. Fixes #14381. Closes #15333. PiperOrigin-RevId: 445941013 Co-authored-by: John Millikin <[email protected]>
Fixes #14381
I didn't touch the
Platformconstruction because it seems to be used in parts of Bazel that aren't specific to remote execution. Maybe it'd be fine to change too? Comments welcomed.