[8.0.0] Fix more Unicode bugs by fmeum · Pull Request #24403 · bazelbuild/bazel

fmeum · 2024-11-19T23:20:28Z

Use Latin-1 in many native file write rules for consistency with the internal encoding.
Use Latin-1 for the resolved repository file and the JSON profile.
Fix unused_input_list handling of non-ASCII characters in file names.
Flip the legacy_utf8 parameter of repository_ctx.file to False and make it a no-op. With the previous default, any non-ASCII characters would be written out as double encoded UTF-8, which is not a useful choice.
Change repository_ctx.template to operate on raw bytes for consistency with repository_ctx.read and to fix substitution with non-ASCII keys/values.
Move some usages of UTF_8 closer to their usage site to clarify why they are correct.
Fixes parsing of dependency files with Unicode character contents (/showIncludes and .d files)

PiperOrigin-RevId: 698111811
Change-Id: Ie43bab9eb5963bf81690dd8985d358f544a711c9 (cherry picked from commit 3fdec93)

* Use Latin-1 in many native file write rules for consistency with the internal encoding. * Use Latin-1 for the resolved repository file and the JSON profile. * Fix `unused_input_list` handling of non-ASCII characters in file names. * Flip the `legacy_utf8` parameter of `repository_ctx.file` to `False` and make it a no-op. With the previous default, any non-ASCII characters would be written out as double encoded UTF-8, which is not a useful choice. * Change `repository_ctx.template` to operate on raw bytes for consistency with `repository_ctx.read` and to fix substitution with non-ASCII keys/values. * Move some usages of `UTF_8` closer to their usage site to clarify why they are correct. * Fixes parsing of dependency files with Unicode character contents (`/showIncludes` and `.d` files) Closes bazelbuild#24182. PiperOrigin-RevId: 698111811 Change-Id: Ie43bab9eb5963bf81690dd8985d358f544a711c9 (cherry picked from commit 3fdec93)

Prior to Bazel 8, manifest files containing non-ASCII characters were written with UTF-16LE encoding instead of UTF-8 on Windows: - bazelbuild/bazel#24231 - bazelbuild/bazel#24350 - bazelbuild/bazel#24403 This led to disable failing tests in CI: `//tests/zip:unicode_test`: ``` File "pkg\private\manifest.py", line 59, in read_entries_from raw_entries = json.loads(fh.read()) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 338: invalid start byte ``` `//tests/mappings:utf8_manifest_test`: ``` File "tests\mappings\manifest_test_lib.py", line 39, in assertManifestsMatch got = json.loads(g_fp.read()) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 354: invalid start byte ``` Since the manifest is plain JSON, the fix simply consists in detecting whether the second byte is `0`, where the default UTF-8 decoding would fail, in which case we assume the file is UTF-16LE-encoded. The code is slightly reorganized to factor out the encoding selection. This allows to enable `//tests/mappings:utf8_manifest_test` and `//tests/zip:unicode_test` tests in Windows CI.

fmeum requested a review from a team as a code owner November 19, 2024 23:20

fmeum requested review from Wyverald and removed request for a team November 19, 2024 23:20

fmeum requested a review from a team November 19, 2024 23:20

Fix record pattern

bf07883

fmeum mentioned this pull request Nov 20, 2024

[8.0.0] Fix more Unicode bugs #24242

Closed

fmeum requested a review from tjgq November 20, 2024 11:38

meteorcloudy approved these changes Nov 20, 2024

View reviewed changes

meteorcloudy added this pull request to the merge queue Nov 20, 2024

Merged via the queue into bazelbuild:release-8.0.0 with commit ed38e47 Nov 20, 2024

github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Nov 20, 2024

fmeum deleted the 24242-cherry branch November 20, 2024 12:40

rdesgroppes mentioned this pull request Feb 8, 2026

fix: handle UTF-16LE-encoded manifests on Windows bazelbuild/rules_pkg#1021

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[8.0.0] Fix more Unicode bugs#24403

[8.0.0] Fix more Unicode bugs#24403
meteorcloudy merged 2 commits intobazelbuild:release-8.0.0from
fmeum:24242-cherry

fmeum commented Nov 19, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

fmeum commented Nov 19, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants