Replace printables table with `unicode_data.rs` tables by Jules-Bertholet · Pull Request #155527 · rust-lang/rust

Jules-Bertholet · 2026-04-19T21:46:54Z

This gets rid of the printable.py script, ensuring that unicode-table-generator handles all our Unicode data table generation needs.

There are also some drive-by documentation improvements in library/core/src/char/methods.rs.

There is one change in behavior: we now consider all characters with the Default_Ignorable_Code_Point property to be unprintable, because these characters may otherwise be hidden or invisible.

I've chosen to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on char.
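To illustrate what "one table per property" could look like, here is a minimal Rust sketch. It assumes each property is stored as a sorted list of non-overlapping, inclusive code point ranges searched with `binary_search_by`. The real `unicode_data.rs` tables use more compact encodings, and `PropertyTable`, `DEFAULT_IGNORABLE_SAMPLE`, `CharPropertyExt`, and `is_default_ignorable_sample` are hypothetical names. The sample table holds only a few `Default_Ignorable_Code_Point` ranges, not the full property.

```rust
use std::cmp::Ordering;

/// Hypothetical per-property table: sorted, non-overlapping, inclusive
/// code point ranges. One table per property keeps lookups independent.
struct PropertyTable {
    ranges: &'static [(u32, u32)],
}

impl PropertyTable {
    /// Returns true if `c` falls inside one of the table's ranges.
    fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        self.ranges
            .binary_search_by(|&(lo, hi)| {
                if hi < cp {
                    Ordering::Less // whole range lies below `cp`
                } else if lo > cp {
                    Ordering::Greater // whole range lies above `cp`
                } else {
                    Ordering::Equal // `cp` is inside this range
                }
            })
            .is_ok()
    }
}

/// Illustrative subset of Default_Ignorable_Code_Point (NOT the full property).
static DEFAULT_IGNORABLE_SAMPLE: PropertyTable = PropertyTable {
    ranges: &[
        (0x00AD, 0x00AD), // SOFT HYPHEN
        (0x034F, 0x034F), // COMBINING GRAPHEME JOINER
        (0x200B, 0x200F), // ZERO WIDTH SPACE .. RIGHT-TO-LEFT MARK
        (0x3164, 0x3164), // HANGUL FILLER
        (0xFE00, 0xFE0F), // VARIATION SELECTOR-1..16
    ],
};

/// Hypothetical shape of a public `char` method backed by a single table.
trait CharPropertyExt {
    fn is_default_ignorable_sample(self) -> bool;
}

impl CharPropertyExt for char {
    fn is_default_ignorable_sample(self) -> bool {
        DEFAULT_IGNORABLE_SAMPLE.contains(self)
    }
}
```

Because each property has its own table, a lookup maps directly onto one `char` method, at the cost of some duplicated data where properties overlap.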

@rustbot label A-Unicode

rustbot · 2026-04-19T21:46:58Z

library/core/src/unicode/unicode_data.rs is generated by the src/tools/unicode-table-generator tool.

If you want to modify unicode_data.rs, please modify the tool then regenerate the library source file via ./x run src/tools/unicode-table-generator instead of editing unicode_data.rs manually.

rustbot · 2026-04-19T21:47:00Z

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

Owners of files modified in this PR: @scottmcm, libs
@scottmcm, libs expanded to 7 candidates
Random selection from Mark-Simulacrum, jhpratt, scottmcm

Mark-Simulacrum · 2026-04-26T18:49:45Z

> There is one change in behavior: we now consider all characters with the Default_Ignorable_Code_Point property to be unprintable. These characters can be hidden/invisible otherwise.

Nominating for libs-api to FCP this. @Jules-Bertholet can you write up how that affects the public API of std? i.e., where is that unprintability used (only in Debug impls of str)?

Mark-Simulacrum

Can you split out the re-ordering and renaming in char/methods.rs? It's very hard to review the diff for me when methods are moved around in the file. It also seems entirely unrelated to the core change here and I'd rather have separate commits at least.

The changes look broadly reasonable though, I'd be happy to accept them if separated out (including maybe from the libs-api facing change).

rustbot · 2026-04-26T18:57:28Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

This gets rid of the `printable.py` script, ensuring that `unicode-table-generator` handles all our Unicode data table generation needs. I've elected to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on `char`.

Jules-Bertholet · 2026-04-26T22:56:02Z

@rustbot ready

6079a98 is the libs-API-relevant change. It affects the `Debug` impls for `char` and the various string types, as well as the `escape_debug()` methods on `char` and `str`. The following characters will now always be escaped: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BDefault_Ignorable_Code_Point%7D-%5Cp%7BCf%7D-%5Cp%7BCn%7D
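As a rough illustration of which characters this affects, the sketch below hard-codes the Default_Ignorable_Code_Point ranges that are neither `Cf` nor `Cn` (the set in the link above), as listed in recent versions of `DerivedCoreProperties.txt`; check them against the Unicode version std actually uses. `NEWLY_ESCAPED`, `is_newly_escaped`, and `escape_newly_unprintable` are hypothetical helpers that isolate only the new behavior. std's real `escape_debug` also escapes quotes, backslashes, control characters, and more.

```rust
use std::fmt::Write;

/// Default_Ignorable_Code_Point ranges that are neither Cf nor Cn,
/// i.e. the characters newly escaped by this change (hand-copied; verify
/// against the Unicode data version in use).
const NEWLY_ESCAPED: &[(u32, u32)] = &[
    (0x034F, 0x034F),   // COMBINING GRAPHEME JOINER
    (0x115F, 0x1160),   // HANGUL CHOSEONG FILLER, HANGUL JUNGSEONG FILLER
    (0x17B4, 0x17B5),   // KHMER VOWEL INHERENT AQ, KHMER VOWEL INHERENT AA
    (0x180B, 0x180D),   // MONGOLIAN FREE VARIATION SELECTOR ONE..THREE
    (0x180F, 0x180F),   // MONGOLIAN FREE VARIATION SELECTOR FOUR
    (0x3164, 0x3164),   // HANGUL FILLER
    (0xFE00, 0xFE0F),   // VARIATION SELECTOR-1..16
    (0xFFA0, 0xFFA0),   // HALFWIDTH HANGUL FILLER
    (0xE0100, 0xE01EF), // VARIATION SELECTOR-17..256
];

/// True if `c` is in one of the ranges above (linear scan; the table is tiny).
fn is_newly_escaped(c: char) -> bool {
    let cp = c as u32;
    NEWLY_ESCAPED.iter().any(|&(lo, hi)| (lo..=hi).contains(&cp))
}

/// Escapes only the characters above as `\u{hex}` (lowercase, no padding,
/// matching the style of `char::escape_unicode`); everything else is copied.
fn escape_newly_unprintable(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if is_newly_escaped(c) {
            write!(out, "\\u{{{:x}}}", c as u32).unwrap();
        } else {
            out.push(c);
        }
    }
    out
}
```

One visible consequence, if the real tables match this sketch: an emoji presentation sequence such as U+2764 U+FE0F would print its variation selector as `\u{fe0f}` in `Debug` output.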

Note that we may also wish to stop escaping format control characters which are not default-ignorable. The list of characters this would affect: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BCf%7D-%5Cp%7BDefault_Ignorable_Code_Point%7D
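The difference between the rule in this PR and that possible follow-up can be sketched as two predicates over two sets. The character samples below are tiny hand-picked stand-ins for the full `Cf` and Default_Ignorable_Code_Point tables, and the function names are hypothetical. Only these two properties are modeled, although std escapes other categories (such as control characters) as well.

```rust
/// Tiny hand-picked samples; NOT complete property tables.
const CF_SAMPLE: &[char] = &[
    '\u{00AD}', // SOFT HYPHEN: Cf and default-ignorable
    '\u{0600}', // ARABIC NUMBER SIGN: Cf, not default-ignorable
    '\u{200B}', // ZERO WIDTH SPACE: Cf and default-ignorable
    '\u{FFF9}', // INTERLINEAR ANNOTATION ANCHOR: Cf, not default-ignorable
];
const DI_SAMPLE: &[char] = &['\u{00AD}', '\u{200B}', '\u{3164}', '\u{FE0F}'];

fn is_cf(c: char) -> bool {
    CF_SAMPLE.contains(&c)
}

fn is_default_ignorable(c: char) -> bool {
    DI_SAMPLE.contains(&c)
}

/// With this PR (two properties only): format controls and
/// default-ignorables are both escaped.
fn escaped_with_this_pr(c: char) -> bool {
    is_cf(c) || is_default_ignorable(c)
}

/// Possible follow-up: escape only default-ignorables, so format controls
/// that are not default-ignorable would be printed as-is.
fn escaped_with_follow_up(c: char) -> bool {
    is_default_ignorable(c)
}
```

Characters that are both `Cf` and default-ignorable (such as U+200B) stay escaped either way; only the `Cf`-but-not-ignorable ones (such as U+0600) would change.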

nia-e · 2026-04-28T17:34:59Z

We discussed this in today's @rust-lang/libs-api meeting; +1 for us, but we'd like someone with more Unicode knowledge to weigh in to be safe, so cc @Manishearth (off-topic: it would be nice to have an @rust-lang/unicode-knowers ping group, since these issues arise pretty often)

Manishearth · 2026-04-28T20:28:54Z

I didn't look too closely, but this seems fine. From a quick look the printability concern is for debug output, and yes, being more conservative there makes sense.

Manishearth · 2026-04-28T20:29:52Z

While you shouldn't depend on ICU4X in the stdlib, it may be worth using ICU4X to get your unicode properties, instead of fetching them yourself. This does mean you are beholden to ICU4X for unicode updates though.

Amanieu · 2026-05-05T15:28:02Z

@rfcbot merge libs-api

rust-rfcbot · 2026-05-05T15:28:05Z

Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 19, 2026

rustbot assigned Mark-Simulacrum Apr 19, 2026

rustbot added the A-Unicode Area: Unicode label Apr 19, 2026

Jules-Bertholet force-pushed the riir-printable branch from c2aeaba to fd13759 on April 19, 2026, 22:55

Jules-Bertholet force-pushed the riir-printable branch 2 times, most recently from ab09b17 to a799ecf on April 20, 2026, 00:59

Mark-Simulacrum added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Apr 26, 2026

Mark-Simulacrum requested changes Apr 26, 2026

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 26, 2026

Jules-Bertholet added 4 commits April 26, 2026 18:36

Improve core::char::methods.rs docs

ffa8436

And rename a struct field.

char: move is_numeric next to is_alphanumeric

da089a7

Consider all Default_Ignorable_Code_Points unprintable

6079a98

These characters may be hidden/invisible otherwise.

Jules-Bertholet force-pushed the riir-printable branch from a799ecf to 6079a98 on April 26, 2026, 22:40

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 26, 2026

Amanieu removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label May 5, 2026

rust-rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels May 5, 2026

Jules-Bertholet wants to merge 4 commits into rust-lang:main from Jules-Bertholet:riir-printable

8 participants
