@dmsnell dmsnell commented Sep 10, 2025

Trac ticket: Core-63863
Follow-up to: #9317, [60630]
See: (#9825), #9830, #9498, #9826, #9827, #9798, #9828, #9829

This patch introduces a new early-loaded compat-utf8.php file which provides the UTF-8 validation fallback. This is part of a broader effort to unify and standardize UTF-8 handling.

This is an intermediate change to make source-code tracking easier. The code was introduced in formatting.php and was originally intended to be duplicated inside of wp_check_invalid_utf8(), but the two functions differ so little that the part of the code which scans through the bytes in a string should be abstracted and reused. That reusable function would ideally load early enough to polyfill functions like mb_substr(), so I am moving this code into an early-loaded file.
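To illustrate why a shared byte scanner is worth extracting, here is a minimal sketch in Python (chosen only for illustration; it is not the PHP code in this patch). The names `utf8_sequence_length`, `is_valid_utf8`, and `utf8_substr` are hypothetical, and the byte ranges follow the well-formed UTF-8 table from the Unicode Standard. One helper recognizes a single sequence, and both a validator and a code-point-aware substring are built on top of it.

```python
def utf8_sequence_length(data: bytes, at: int) -> int:
    """Return the byte length (1-4) of the well-formed UTF-8 sequence
    starting at offset `at`, or 0 if the bytes there are not well-formed.
    Hypothetical helper, not a WordPress function."""
    b0 = data[at]
    if b0 <= 0x7F:
        return 1  # ASCII

    # Pick the total length and the allowed range of the second byte.
    if 0xC2 <= b0 <= 0xDF:
        length, lo, hi = 2, 0x80, 0xBF
    elif b0 == 0xE0:
        length, lo, hi = 3, 0xA0, 0xBF  # rejects overlong 3-byte forms
    elif b0 == 0xED:
        length, lo, hi = 3, 0x80, 0x9F  # rejects UTF-16 surrogates
    elif 0xE1 <= b0 <= 0xEF:
        length, lo, hi = 3, 0x80, 0xBF
    elif b0 == 0xF0:
        length, lo, hi = 4, 0x90, 0xBF  # rejects overlong 4-byte forms
    elif 0xF1 <= b0 <= 0xF3:
        length, lo, hi = 4, 0x80, 0xBF
    elif b0 == 0xF4:
        length, lo, hi = 4, 0x80, 0x8F  # rejects values above U+10FFFF
    else:
        return 0  # 0x80-0xC1 and 0xF5-0xFF never start a sequence

    if at + length > len(data):
        return 0  # truncated sequence
    if not lo <= data[at + 1] <= hi:
        return 0
    for k in range(at + 2, at + length):
        if not 0x80 <= data[k] <= 0xBF:
            return 0
    return length


def is_valid_utf8(data: bytes) -> bool:
    """Validation built on the shared scanner."""
    at = 0
    while at < len(data):
        step = utf8_sequence_length(data, at)
        if step == 0:
            return False
        at += step
    return True


def utf8_substr(data: bytes, start: int, length: int) -> bytes:
    """Slice by code points rather than bytes (non-negative arguments only).
    Assumption for this sketch: each invalid byte counts as one unit."""
    offsets = []  # byte offset at which each unit begins
    at = 0
    while at < len(data):
        offsets.append(at)
        at += utf8_sequence_length(data, at) or 1
    offsets.append(len(data))

    count = len(offsets) - 1
    start = min(start, count)
    end = min(start + length, count)
    return data[offsets[start]:offsets[end]]
```

Because both callers share one scanning routine, a fix to the byte-range rules applies to validation and to substring extraction at the same time, which is the kind of reuse this move is preparing for.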

While this is not the abstracted code, this change will be helpful by providing continuity between the function as it stands today and as it will transform when reused by wp_check_invalid_utf8(). In other words, this change exists to make sure that source control shows that this function moved first, and then was changed later. To move it and change it in one go is likely to sever its history.

This will turn into _wp_scan_utf8(), the updated iteration of #6883, which provides a fast, spec-compliant, and streamable UTF-8 parser.

@github-actions

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@github-actions

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@dmsnell dmsnell force-pushed the utf8/new-compat-module branch from b77dc49 to 28bcb62 Compare September 15, 2025 18:57
pento pushed a commit that referenced this pull request Sep 15, 2025
This is the second in a series of patches to modernize and standardize UTF-8 handling.

When the fallback UTF-8 validation code was added it was placed inside formatting.php; however, that validation logic can be reused for a number of related UTF-8 functions. To facilitate this it should move into a new location and be loaded early. This patch is the first half of doing that, whereby the original fallback function is moved unchanged to the `compat-utf8.php` module. The follow-up patch will abstract the UTF-8 scanning logic for reuse. Splitting this into a move and a separate change involves an extra step, but facilitates tracking the heritage of the code through the changes.

Developed in #9825
Discussed in https://core.trac.wordpress.org/ticket/63863

Follow-up to: [60630].

See #63863.


git-svn-id: https://develop.svn.wordpress.org/trunk@60743 602fd350-edb4-49c9-b593-d223f7449a82
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Sep 15, 2025
This is the second in a series of patches to modernize and standardize UTF-8 handling.

When the fallback UTF-8 validation code was added it was placed inside formatting.php; however, that validation logic can be reused for a number of related UTF-8 functions. To facilitate this it should move into a new location and be loaded early. This patch is the first half of doing that, whereby the original fallback function is moved unchanged to the `compat-utf8.php` module. The follow-up patch will abstract the UTF-8 scanning logic for reuse. Splitting this into a move and a separate change involves an extra step, but facilitates tracking the heritage of the code through the changes.

Developed in WordPress/wordpress-develop#9825
Discussed in https://core.trac.wordpress.org/ticket/63863

Follow-up to: [60630].

See #63863.

Built from https://develop.svn.wordpress.org/trunk@60743


git-svn-id: http://core.svn.wordpress.org/trunk@60079 1a063a9b-81f0-0310-95a4-ce76da25c4cd
dmsnell commented Sep 15, 2025

Merged in 31cac36

@dmsnell dmsnell closed this Sep 15, 2025
@dmsnell dmsnell deleted the utf8/new-compat-module branch September 15, 2025 19:12
github-actions bot pushed a commit to platformsh/wordpress-performance that referenced this pull request Sep 15, 2025
jonnynews pushed a commit to spacedmonkey/wordpress-develop that referenced this pull request Sep 24, 2025