Commit a58fe3f
Fix most Unicode encoding bugs
Bazel aims to support arbitrary file system path encodings (even raw byte sequences) by attempting to force the JVM to use a Latin-1 locale for OS interactions. As a result, Bazel internally encodes `String`s as raw byte arrays with a Latin-1 coder and no encoding information. Whenever it interacts with encoding-aware APIs, this may require a reencoding of the `String` contents, depending on the OS and availability of a Latin-1 locale.
This PR introduces the concepts of *internal*, *Unicode*, and *platform* strings and adds dedicated optimized functions for converting between these three types (see the class comment on the new `StringEncoding` helper class for details). These functions are then used to standardize and fix conversion throughout the code base. As a result, a number of new end-to-end integration tests for the handling of Unicode in file paths, command-line arguments and environment variables now pass.
Full support for Unicode beyond the current active code page on Windows is left to a follow-up PR as it may require patching the embedded JDK.
* Replace ad-hoc conversion logic with the new consistent set of helper functions.
* Make more parts of the Bazel client's Windows implementation Unicode-aware. This also fixes the behavior of `SetEnv` on Windows, which previously would remove an environment variable if passed an empty value for it, which doesn't match the Unix behavior.
* Drop the `charset` parameter from all methods related to parameter files. The `ISO-8859-1` vs. `UTF-8` choice was flawed since Bazel's internal string representation doesn't maintain any encoding information - `ISO-8859-1` just meant "write out raw bytes", which is the only choice that matches what arguments would look like if passed on the command line.
* Convert server args to the internal string representation. The arguments for requests to the server were already converted to Bazel's internal string representation, which resulted in a mismatch between `--client_cwd` and `--workspace_directory` if the workspace path contains non-ASCII characters.
* Read the downloader config using Bazel's filesystem implementation.
* Make `MacOSXFsEventsDiffAwareness` UTF-8 aware. It previously used the `GetStringUTF` JNI method, which, despite its name, doesn't return the UTF-8 representation of a string, but modified CESU-8 (nobody ever wants this).
* Correctly reencode path strings for `LocalDiffAwareness`.
* Correctly reencode the value of `user.dir`.
* Correctly turn `ExecRequest` fields into strings for `ProcessBuilder` for `bazel --batch run`. This makes it possible to reenable the `test_consistent_command_line_encoding` test, fixing #1775.
* Fix encoding issues in `TargetCompleteEvents`.
* Fix encoding issues in `SubprocessFactory` implementations.
* Drop obsolete warning if `file.encoding` doesn't equal `ISO-8859-1` as file names are encoded with `sun.jnu.encoding` now.
* Consistently reencode internal strings passed into and out of `FileSystem` implementations, e.g. if reading a symlink target. Tests are added that verify the interaction between `FileSystem` implementations and the Java (N)IO APIs on Unicode file paths.
Fixes #1775.
Fixes #11602.
Fixes #18293.
Work towards #374.
Work towards #23859.
Closes #24010.
PiperOrigin-RevId: 694114597
Change-Id: I5bdcbc14a90dd1f0f34698aebcbd07cd2bde7a231 parent f6585d4 commit a58fe3f
File tree
98 files changed
+1398
-819
lines changed- src
- main
- cpp
- java/com/google/devtools/build/lib
- actions
- analysis
- actions
- starlark
- bazel/repository
- downloader
- query2
- aquery
- remote
- merkletree
- rules
- cpp
- java
- runtime
- commands/info
- sandbox
- server
- shell
- skyframe
- serialization/strings
- unix
- unsafe
- util
- vfs
- windows
- worker
- native
- darwin
- test
- cpp
- java/com/google/devtools/build/lib
- analysis
- starlark
- util
- exec
- local
- remote
- sandbox
- skyframe
- testutil
- unix
- unsafe
- util
- vfs
- shell
- bazel
- integration
- tools/remote
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
98 files changed
+1398
-819
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
143 | 143 | | |
144 | 144 | | |
145 | 145 | | |
146 | | - | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
147 | 152 | | |
148 | 153 | | |
149 | 154 | | |
| |||
179 | 184 | | |
180 | 185 | | |
181 | 186 | | |
| 187 | + | |
182 | 188 | | |
183 | 189 | | |
184 | 190 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
372 | 372 | | |
373 | 373 | | |
374 | 374 | | |
375 | | - | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
376 | 378 | | |
377 | 379 | | |
378 | 380 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
15 | | - | |
16 | 15 | | |
17 | | - | |
18 | 16 | | |
19 | 17 | | |
20 | 18 | | |
| |||
937 | 935 | | |
938 | 936 | | |
939 | 937 | | |
940 | | - | |
| 938 | + | |
| 939 | + | |
941 | 940 | | |
942 | 941 | | |
943 | 942 | | |
944 | 943 | | |
945 | | - | |
946 | | - | |
947 | | - | |
| 944 | + | |
| 945 | + | |
| 946 | + | |
948 | 947 | | |
949 | 948 | | |
950 | 949 | | |
| |||
970 | 969 | | |
971 | 970 | | |
972 | 971 | | |
973 | | - | |
974 | | - | |
| 972 | + | |
| 973 | + | |
975 | 974 | | |
976 | 975 | | |
977 | | - | |
| 976 | + | |
| 977 | + | |
| 978 | + | |
| 979 | + | |
978 | 980 | | |
979 | 981 | | |
980 | 982 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
| 22 | + | |
22 | 23 | | |
23 | 24 | | |
24 | | - | |
| 25 | + | |
25 | 26 | | |
26 | 27 | | |
27 | 28 | | |
| |||
32 | 33 | | |
33 | 34 | | |
34 | 35 | | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
58 | 58 | | |
59 | 59 | | |
60 | 60 | | |
| 61 | + | |
| 62 | + | |
61 | 63 | | |
62 | 64 | | |
63 | 65 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
42 | 47 | | |
43 | 48 | | |
44 | 49 | | |
| |||
48 | 53 | | |
49 | 54 | | |
50 | 55 | | |
51 | | - | |
52 | 56 | | |
53 | 57 | | |
54 | 58 | | |
| |||
512 | 516 | | |
513 | 517 | | |
514 | 518 | | |
515 | | - | |
516 | | - | |
| 519 | + | |
| 520 | + | |
517 | 521 | | |
518 | 522 | | |
519 | 523 | | |
| |||
583 | 587 | | |
584 | 588 | | |
585 | 589 | | |
586 | | - | |
587 | | - | |
588 | | - | |
589 | | - | |
590 | | - | |
591 | | - | |
592 | | - | |
593 | | - | |
594 | | - | |
595 | | - | |
596 | | - | |
597 | | - | |
598 | | - | |
599 | | - | |
600 | | - | |
601 | | - | |
602 | | - | |
603 | | - | |
604 | | - | |
605 | | - | |
606 | | - | |
607 | | - | |
608 | | - | |
609 | | - | |
610 | | - | |
611 | | - | |
612 | | - | |
613 | | - | |
614 | | - | |
615 | | - | |
616 | | - | |
617 | | - | |
618 | | - | |
619 | | - | |
620 | | - | |
621 | | - | |
622 | | - | |
623 | | - | |
624 | | - | |
625 | | - | |
626 | | - | |
627 | | - | |
628 | | - | |
629 | | - | |
630 | | - | |
631 | | - | |
632 | | - | |
633 | | - | |
634 | | - | |
635 | | - | |
636 | | - | |
637 | | - | |
638 | | - | |
639 | | - | |
640 | | - | |
641 | | - | |
642 | | - | |
643 | | - | |
644 | | - | |
645 | | - | |
646 | | - | |
647 | | - | |
648 | | - | |
649 | | - | |
650 | 590 | | |
651 | 591 | | |
652 | 592 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
325 | 325 | | |
326 | 326 | | |
327 | 327 | | |
328 | | - | |
329 | 328 | | |
330 | 329 | | |
331 | 330 | | |
| |||
478 | 477 | | |
479 | 478 | | |
480 | 479 | | |
| 480 | + | |
481 | 481 | | |
482 | 482 | | |
483 | 483 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
193 | 193 | | |
194 | 194 | | |
195 | 195 | | |
196 | | - | |
197 | 196 | | |
198 | 197 | | |
199 | 198 | | |
| |||
0 commit comments