@ppiwo ppiwo commented Oct 9, 2025

Adds non-breaking hyphen to the list of characters converted to regular hyphens in sanitize_title_with_dashes()

Trac ticket: #64089

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.


Drafted commit message

Formatting: Replace non-breaking hyphens with hyphens in sanitize_title_with_dashes().

Developed in #10204

Follow-up to [18705], [36775].

Props patpiwo, westonruter.
See #31790, #10797.
Fixes #64089.

Adds URL-encoded non-breaking hyphen (%e2%80%91) to the list of characters
converted to regular hyphens in sanitize_title_with_dashes()

Fixes ticket #64089
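For reference, the URL-encoded form of the non-breaking hyphen follows directly from its UTF-8 bytes (E2 80 91). Below is a minimal PHP sketch. It assumes the title has already been URL-encoded and lowercased when the replacement runs, and it mirrors, as an assumption, the existing str_replace() list for the no-break space, en dash, and em dash. replace_encoded_dashes() is a hypothetical helper for illustration, not the core function:

```php
<?php
// U+2011 NON-BREAKING HYPHEN is the three UTF-8 bytes E2 80 91.
// rawurlencode() emits uppercase hex, so lowercase it to match the
// lowercase patterns used below.
$encoded = strtolower( rawurlencode( "\u{2011}" ) );

// Hypothetical helper: replace URL-encoded dash-like characters with '-'.
// Assumes the input is already URL-encoded and lowercased.
function replace_encoded_dashes( string $title ): string {
	return str_replace(
		array( '%c2%a0', '%e2%80%91', '%e2%80%93', '%e2%80%94' ),
		'-',
		$title
	);
}

echo $encoded, "\n";                                   // %e2%80%91
echo replace_encoded_dashes( 'e%e2%80%91mail' ), "\n"; // e-mail
```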
github-actions bot commented Oct 9, 2025

Hi @ppiwo! 👋

Thank you for your contribution to WordPress! 💖

It looks like this is your first pull request to wordpress-develop. Here are a few things to be aware of that may help you out!

No one monitors this repository for new pull requests. Pull requests must be attached to a Trac ticket to be considered for inclusion in WordPress Core. To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description.

Pull requests are never merged on GitHub. The WordPress codebase continues to be managed through the SVN repository that this GitHub repository mirrors. Please feel free to open pull requests to work on any contribution you are making.

More information about how GitHub pull requests can be used to contribute to WordPress can be found in the Core Handbook.

Please include automated tests. Including tests in your pull request is one way to help your patch be considered faster. To learn about WordPress' test suites, visit the Automated Testing page in the handbook.

If you have not had a chance, please review the Contribute with Code page in the WordPress Core Handbook.

The Developer Hub also documents the various coding standards that are followed:

Thank you,
The WordPress Project

github-actions bot commented Oct 9, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props patpiwo, westonruter, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

github-actions bot commented Oct 9, 2025

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@westonruter
Member

@ppiwo Could you also add test cases for this as part of https://github.com/WordPress/wordpress-develop/blob/trunk/tests/phpunit/tests/formatting/sanitizeTitleWithDashes.php ?

@westonruter
Member

I wanted to take this as an opportunity to improve maintenance of this going forward so I've added ff2d2a7. This is begging for a review from @dmsnell!

@westonruter westonruter requested a review from dmsnell October 23, 2025 00:07
Member

@dmsnell dmsnell left a comment

thanks for the ping, @westonruter. we recently had similar work in #9103 (Core-62995).

@ppiwo we could consider an approach similar to the one taken over there, which is to rely on a Unicode-supported PCRE to replace every character with the Dash_Punctuation property, and also every Space_Separator.

if ( _wp_can_use_pcre_u() ) {
	$title = preg_replace( '~[\p{Pd}\p{Zs}]~u', '-', $title );
}

Over time I think it’s okay to be more and more restrictive on these, but I hope we push more in the direction of finding ways to ensure the titles and filenames more closely match the content they are associated with.
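To try the character-property approach outside WordPress, here is a minimal standalone PHP sketch. _wp_can_use_pcre_u() is a WordPress helper, so this version instead checks whether preg_replace() failed. The function name replace_dash_and_space() is hypothetical:

```php
<?php
// Replace every Dash_Punctuation (\p{Pd}) and Space_Separator (\p{Zs})
// character with an ASCII hyphen-minus, using PCRE's UTF-8 mode (/u).
function replace_dash_and_space( string $title ): string {
	$result = preg_replace( '~[\p{Pd}\p{Zs}]~u', '-', $title );

	// preg_replace() returns null on failure, e.g. invalid UTF-8 under /u;
	// in that case fall back to the unmodified title.
	return null === $result ? $title : $result;
}

// U+2011 non-breaking hyphen, U+00A0 no-break space, U+2014 em dash.
echo replace_dash_and_space( "e\u{2011}mail\u{00A0}list\u{2014}archive" ), "\n";
// e-mail-list-archive
```

The plain ASCII space (U+0020) is itself a Space_Separator, so it is converted as well. That should be harmless here, since the function goes on to turn whitespace into dashes anyway.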

	if ( function_exists( 'mb_chr' ) ) {
		$replacements[] = rawurlencode( mb_chr( $decimal_codepoint, 'UTF-8' ) );
	}
}
Member

@dmsnell dmsnell Oct 23, 2025

I strongly discourage replacements that attempt to match normative character references, or which mix UTF-8 characters and HTML character references. these lead to strange edge cases and can easily lead to situations where we cannot accomplish what should be allowable.

to that end, if we want to make these replacements, I would encourage backing up to the top of this function and replacing strip_tags() with a run through the HTML API to extract the title as decoded plaintext. once that’s done we can examine raw UTF-8 replacements and not have to concern ourselves with whether someone wrote &nbsp; or &nbsp or &#0000000160: all of these decode into the same U+00A0 code point.

If not wanting to reconsider this function more holistically, this can still be decoded as WP_HTML_Decoder::decode_text_node( $title ) before making these replacements. They can be done rather swiftly with strtr(). Further, since we are creating a static replacements array, we don’t have to use a potentially-missing runtime function to generate them: we can use Unicode string literals like \u{2011} for the patterns/matches.
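Here is a sketch of that suggestion, for illustration only. WP_HTML_Decoder::decode_text_node() is WordPress-specific, so this standalone version swaps in PHP's html_entity_decode() with ENT_HTML5. That substitution is an assumption for testing outside WordPress, and PHP's decoder may not accept every legacy form the HTML spec allows (for example, references without a trailing semicolon). The names DASH_REPLACEMENTS and replace_decoded_dashes() are hypothetical:

```php
<?php
// Static map from raw UTF-8 characters to a hyphen, written with PHP's
// \u{...} escapes instead of being generated at runtime via mb_chr().
const DASH_REPLACEMENTS = array(
	"\u{00A0}" => '-', // NO-BREAK SPACE
	"\u{2011}" => '-', // NON-BREAKING HYPHEN
	"\u{2013}" => '-', // EN DASH
	"\u{2014}" => '-', // EM DASH
);

function replace_decoded_dashes( string $title ): string {
	// Stand-in for WP_HTML_Decoder::decode_text_node(): decode character
	// references first so "&nbsp;", "&#160;" and "&#xa0;" all become U+00A0.
	$decoded = html_entity_decode( $title, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

	// With the text decoded, only raw UTF-8 characters need to be matched.
	return strtr( $decoded, DASH_REPLACEMENTS );
}

echo replace_decoded_dashes( 'e&#8209;mail&nbsp;list' ), "\n"; // e-mail-list
```

Because the map is static, strtr() applies it in a single pass, and nothing depends on mb_chr() being available at runtime.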

Also a quick side note: HTML’s named character references are case-sensitive, so while I am guessing the use of str_ireplace() is to catch uppercase variations of &nbsp;, if it actually does that it will transform plaintext content and not the placeholder for a no-break space.
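To illustrate the case-sensitivity point with a pair of references whose case clearly matters: &Alpha; and &alpha; name two different characters, yet str_ireplace() treats them as the same pattern. Here is a small sketch using PHP's html_entity_decode() as the decoder:

```php
<?php
// Named character references are case-sensitive: these decode to
// U+0391 GREEK CAPITAL LETTER ALPHA and U+03B1 GREEK SMALL LETTER ALPHA.
$upper = html_entity_decode( '&Alpha;', ENT_QUOTES | ENT_HTML5, 'UTF-8' );
$lower = html_entity_decode( '&alpha;', ENT_QUOTES | ENT_HTML5, 'UTF-8' );

// A case-insensitive search for '&alpha;' also matches '&Alpha;',
// so both distinct references get replaced.
$replaced = str_ireplace( '&alpha;', 'X', '&Alpha; &alpha;' );

var_dump( $upper === "\u{0391}" ); // bool(true)
var_dump( $lower === "\u{03B1}" ); // bool(true)
echo $replaced, "\n";              // X X
```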

Member

(My bad for using str_ireplace() in my addition.)

Member

it’s fine. that’s why we review each other’s work.

did I guess the purpose right?

Member

@westonruter westonruter Oct 23, 2025

Partially. The purpose was so that %c2%a0 and %C2%A0 would both be matched, and likewise the lowercase and uppercase spellings of &nbsp;. I forgot that named entities are case-sensitive in HTML, which is ironic since everything else is case-insensitive (although I'm sure I'm not entirely accurate there).
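Hex digits in percent-encoded octets are case-insensitive (RFC 3986), so %C2%A0 and %c2%a0 encode the same bytes. One way to cover both without a case-insensitive search over the whole string is to normalize only the octets first. Here is a minimal sketch. The helper name lowercase_octets() is hypothetical, and this is one option among several, not what the patch does:

```php
<?php
// Lowercase only the hex digits of %XX octets, leaving other text as-is.
function lowercase_octets( string $title ): string {
	return preg_replace_callback(
		'/%[0-9A-Fa-f]{2}/',
		static function ( array $m ): string {
			return strtolower( $m[0] );
		},
		$title
	);
}

$title = lowercase_octets( 'Foo%C2%A0Bar%c2%a0Baz' );
echo str_replace( '%c2%a0', '-', $title ), "\n"; // Foo-Bar-Baz
```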

Member

in that case it’d be fine to leave it in after decoding, but we wouldn’t want or need to replace character references; they will have already been replaced.

Member

@dmsnell How about this: to unblock closing this ticket, I'll just revert my addition (ff2d2a7) so the new non-breaking hyphen will be accounted for using the existing logic. Then we create a new ticket for 7.0 that can refactor the entire function using the new tooling you've been working on?

Member

that’s fine @westonruter, though my comment about decoding was intended to keep this moving without blocking it. I think your suggestion was good, but it needed decoding.

somewhere I believe I have an existing branch for this entire thing. like usual, there are complications…

so either the way it is now, or the way you had it but with decoding, sounds good. I am going to mark my approval and leave it up to you two. 🙇‍♂️

Member

Follow-up ticket: Core-64089

…into ppiwo/trunk

* 'trunk' of https://github.com/WordPress/wordpress-develop:
  Twenty Sixteen: Document the `twentysixteen_author_avatar_size` filter.
  Abilities API: Ensure public method is used in the codebase
  General: Add comment explaining use of queried object in `feed_links_extra()` instead of global `$post`.
  Posts, Post Types: Update `get_the_modified_author()` to handle missing global `$post` and add (missing) `$post` arg.
  General: Improve resilience of `feed_links_extra()` when global `$post` is not set.
  Twenty Sixteen: Document the `twentysixteen_content_width` filter.
  Template activation: fix unique slug filtering.
  Coding Standards: Use Yoda conditions consistently in `wp-includes/formatting.php`.
pento pushed a commit that referenced this pull request Oct 25, 2025
…nitize_title_with_dashes()`.

Developed in #10204

Follow-up to [18705], [36775].

Props patpiwo, westonruter, dmsnell.
See #31790, #10797.
Fixes #64089.


git-svn-id: https://develop.svn.wordpress.org/trunk@61061 602fd350-edb4-49c9-b593-d223f7449a82
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Oct 25, 2025
@westonruter
Member

Committed in r61061

github-actions bot pushed a commit to gilzow/wordpress-performance that referenced this pull request Oct 25, 2025