Charset: Create compat-utf8.php module with fallback code.
#9825
This is the second in a series of patches to modernize and standardize UTF-8 handling.

When the fallback UTF-8 validation code was added it was placed inside formatting.php; however, that validation logic can be reused for a number of related UTF-8 functions. To facilitate this it should move into a new location and be loaded early.

This patch is the first half of doing that, whereby the original fallback function is moved unchanged to the `compat-utf8.php` module. The follow-up patch will abstract the UTF-8 scanning logic for reuse.

Splitting this into a move and a separate change involves an extra step, but facilitates tracking the heritage of the code through the changes.

Developed in #9825
Discussed in https://core.trac.wordpress.org/ticket/63863

Follow-up to: [60630].

See #63863.

git-svn-id: https://develop.svn.wordpress.org/trunk@60743 602fd350-edb4-49c9-b593-d223f7449a82
Merged in 31cac36
Trac ticket: Core-63863
Follow-up to: #9317, [60630]
See: (#9825), #9830, #9498, #9826, #9827, #9798, #9828, #9829

This patch introduces a new early-loaded `compat-utf8.php` file which provides the UTF-8 validation fallback. This is part of a broader effort to unify and standardize UTF-8 handling.

This is an intermediate change in order to better facilitate source-code tracking. This code was introduced in `formatting.php` and originally was intended to be duplicated inside of `wp_check_invalid_utf8()`, but the difference between the two functions is so minor that the part of the code which scans through the bytes in a string should be abstracted and reused. This reusable function would ideally load early enough to be used to polyfill methods like `mb_substr()`, so I am moving this early.

While this is not the abstracted code, this change will be helpful by providing continuity between the function as it stands today and as it will transform when reused by `wp_check_invalid_utf8()`. In other words, this change exists to make sure that source control shows that this function moved first, and then was changed later. To move it and change it in one go is likely to sever its history.

This will turn into `_wp_scan_utf8()`, the updated iteration of #6883, which provides a fast, spec-compliant, and streamable UTF-8 parser.
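
For readers unfamiliar with what a byte-scanning UTF-8 validation fallback looks like, here is a minimal sketch. It is not the code moved by this patch and not the planned `_wp_scan_utf8()`; the function names `example_is_valid_utf8_fallback()` and `example_is_valid_utf8()` are hypothetical and exist only for illustration. It assumes the well-formed byte ranges from the Unicode Standard (no overlong forms, no surrogates, nothing above U+10FFFF) and prefers `mb_check_encoding()` when the mbstring extension is available.

```php
<?php
/**
 * Hypothetical fallback: reports whether $bytes is well-formed UTF-8
 * by scanning it one byte sequence at a time.
 */
function example_is_valid_utf8_fallback( string $bytes ): bool {
	$length = strlen( $bytes );
	$at     = 0;

	while ( $at < $length ) {
		$b0 = ord( $bytes[ $at ] );

		// ASCII bytes are always valid on their own.
		if ( $b0 <= 0x7F ) {
			++$at;
			continue;
		}

		// Pick the number of continuation bytes and the allowed
		// range for the first continuation byte.
		if ( $b0 >= 0xC2 && $b0 <= 0xDF ) {
			list( $need, $min, $max ) = array( 1, 0x80, 0xBF );
		} elseif ( 0xE0 === $b0 ) {
			list( $need, $min, $max ) = array( 2, 0xA0, 0xBF ); // Rejects overlong forms.
		} elseif ( 0xED === $b0 ) {
			list( $need, $min, $max ) = array( 2, 0x80, 0x9F ); // Rejects surrogates.
		} elseif ( $b0 >= 0xE1 && $b0 <= 0xEF ) {
			list( $need, $min, $max ) = array( 2, 0x80, 0xBF );
		} elseif ( 0xF0 === $b0 ) {
			list( $need, $min, $max ) = array( 3, 0x90, 0xBF ); // Rejects overlong forms.
		} elseif ( 0xF4 === $b0 ) {
			list( $need, $min, $max ) = array( 3, 0x80, 0x8F ); // Rejects code points above U+10FFFF.
		} elseif ( $b0 >= 0xF1 && $b0 <= 0xF3 ) {
			list( $need, $min, $max ) = array( 3, 0x80, 0xBF );
		} else {
			// 0x80-0xC1 and 0xF5-0xFF can never start a sequence.
			return false;
		}

		// The sequence is truncated if its last byte is past the end.
		if ( $at + $need >= $length ) {
			return false;
		}

		$b1 = ord( $bytes[ $at + 1 ] );
		if ( $b1 < $min || $b1 > $max ) {
			return false;
		}

		// Any remaining continuation bytes must be in 0x80-0xBF.
		for ( $i = 2; $i <= $need; $i++ ) {
			$bn = ord( $bytes[ $at + $i ] );
			if ( $bn < 0x80 || $bn > 0xBF ) {
				return false;
			}
		}

		$at += $need + 1;
	}

	return true;
}

/**
 * Hypothetical wrapper: uses mbstring when present, else the fallback.
 */
function example_is_valid_utf8( string $bytes ): bool {
	if ( function_exists( 'mb_check_encoding' ) ) {
		return mb_check_encoding( $bytes, 'UTF-8' );
	}

	return example_is_valid_utf8_fallback( $bytes );
}
```

A scanner shaped like this, a loop over byte sequences with an explicit offset, is the kind of logic that could plausibly be shared between a validator and a function that replaces invalid spans, which is the reuse this PR series is working toward.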