Replace printables table with unicode_data.rs tables#155527
Replace printables table with unicode_data.rs tables#155527Jules-Bertholet wants to merge 4 commits intorust-lang:mainfrom
unicode_data.rs tables#155527Conversation
|
If you want to modify |
|
rustbot has assigned @Mark-Simulacrum. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
c2aeaba to
fd13759
Compare
This comment has been minimized.
This comment has been minimized.
ab09b17 to
a799ecf
Compare
Nominating for libs-api to FCP this. @Jules-Bertholet can you write up how that affects the public API of std? i.e., where is that unprintability used (only in Debug impls of str)? |
There was a problem hiding this comment.
Can you split out the re-ordering and renaming in char/methods.rs? It's very hard to review the diff for me when methods are moved around in the file. It also seems entirely unrelated to the core change here and I'd rather have separate commits at least.
The changes look broadly reasonable though, I'd be happy to accept them if separated out (including maybe from the libs-api facing change).
|
Reminder, once the PR becomes ready for a review, use |
And rename a struct field.
This gets rid of the `printable.py` script, ensuring that `unicode-table-generator` handles all our Unicode data table generation needs. I've elected to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on `char`.
These characters may be hidden/invisible otherwise.
a799ecf to
6079a98
Compare
|
@rustbot ready 6079a98 is the libs-API-relevant change. It affects the Note that we may also wish to stop escaping format control characters which are not default-ignorable. The list of characters this would affect: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BCf%7D-%5Cp%7BDefault_Ignorable_Code_Point%7D |
|
We discussed this in today's @rust-lang/libs-api meeting; +1 for us, but we'd like someone with more unicode knowledge to weigh in to be safe so cc @Manishearth (offtopic, it would be nice to have an @rust-lang/unicode-knowers ping group since these issues arise pretty often) |
|
I didn't look too closely, but this seems fine. From a quick look the printability concern is for debug output, and yes, being more conservative there makes sense. |
|
While you shouldn't depend on ICU4X in the stdlib, it may be worth using ICU4X to get your unicode properties, instead of fetching them yourself. This does mean you are beholden to ICU4X for unicode updates though. |
|
@rfcbot merge libs-api |
|
Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
This gets rid of the
printable.pyscript, ensuring thatunicode-table-generatorhandles all our Unicode data table generation needs.There are also some drive-by documentation improvements in
library/core/char/methods.rs.There is one change in behavior: we now consider all characters with the
Default_Ignorable_Code_Pointproperty to be unprintable. These characters can be hidden/invisible otherwise.I've chosen to give each Unicode property its own table, instead of merging them all into one. This is slightly less efficient in terms of space, but should allow us to expose these tables in the future with public methods on
char.@rustbot label A-Unicode