Skip to content

Conversation

@sirreal
Copy link
Member

@sirreal sirreal commented Dec 3, 2025

Ensure the class name modifications preserve correct HTML encoding when calling add_class and get_updated_html repeatedly.

There was an inconsistency in this branching where $existing_class would get a correctly decoded attribute value if there was a value returned from get_enqueued_attribute_value() but would otherwise get raw, unparsed HTML if there was a class attribute in the HTML:

$existing_class = $this->get_enqueued_attribute_value( 'class' );
if ( null === $existing_class || true === $existing_class ) {
$existing_class = '';
}
if ( false === $existing_class && isset( $this->attributes['class'] ) ) {
$existing_class = substr(
$this->html,
$this->attributes['class']->value_starts_at,
$this->attributes['class']->value_length
);
}

Note that ::get_enqueued_attribute_value() returns a decoded value:

return WP_HTML_Decoder::decode_attribute( $enqueued_value );

This inconsistent behavior was hidden by the use of esc_attr() which avoids double-escaping character references, so when ::set_attribute() was later called it would only encoded character references that did not appear to already be encoded.

$this->set_attribute( 'class', $class );

When :set_attribute() was changed in r60919 (4892d46) to always escape attribute values, the double-escaping manifested.

WP_HTML_Tag_Processor and WP_HTML_Processor may incorrectly encode class names containing the characters &, <, >, ", or ' when modifying them via class methods like ::add_class() and calling ::get_updated_html().

For example:

$p = new WP_HTML_Tag_Processor('<div></div>');
$p->next_tag();
$p->add_class('&');
echo $p->get_updated_html() . "\n";
$p->add_class('OK');
echo $p->get_updated_html() . "\n";

Will print:

<div class="&amp;"></div>
<div class="&amp;amp; OK"></div>

Notice that the first pass is correct, & has been correctly encoded in the class attribute as &. However, after calling ::add_class() and ::get_updated_html() again, the & has incorrectly been double-encoded as &amp;amp;.

The same code in WordPress 6.8 would print:

<div class="&amp;"></div>
<div class="&amp; OK"></div>

This is related to r60919 that was released in WordPress 6.9. The double-encoding behavior was present before, but it was "corrected" in this case by the use of esc_attr() that avoids any double-encoding. When esc_attr() usage was removed in r60919, the double-escaping behavior manifests causing this issue.

Trac ticket: https://core.trac.wordpress.org/ticket/64340

Originally reported in WordPress/gutenberg#73713.


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@github-actions
Copy link

github-actions bot commented Dec 3, 2025

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@sirreal sirreal force-pushed the trac-64340/html-api-double-encoded-class-names branch from e5a60c7 to cf42da8 Compare December 3, 2025 13:16
Comment on lines +2890 to +2894
'expected' => '<pre foo="bar" id="<code" class="wp-block-code &lt;code is poetry&gt; firstTag"><code class="secondTag">This &lt;is> a &lt;strong is="true">thing.</code></pre><span>test</span>',
),
'HTML tag brackets in attribute values and data markup' => array(
'input' => '<pre id="<code-&gt;-block-&gt;" class="wp-block-code <code is poetry&gt;"><code>This &lt;is> a &lt;strong is="true">thing.</code></pre><span>test</span>',
'expected' => '<pre foo="bar" id="<code-&gt;-block-&gt;" class="wp-block-code &lt;code is poetry&amp;gt; firstTag"><code class="secondTag">This &lt;is> a &lt;strong is="true">thing.</code></pre><span>test</span>',
'expected' => '<pre foo="bar" id="<code-&gt;-block-&gt;" class="wp-block-code &lt;code is poetry&gt; firstTag"><code class="secondTag">This &lt;is> a &lt;strong is="true">thing.</code></pre><span>test</span>',
Copy link
Member Author

@sirreal sirreal Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests seem to have been ensuring incorrect behavior.

In both cases updated here an input class like poetry&gt; (decoded value poetry>) was changed to poetry&amp;gt; (decoded value poetry&gt;).

They were updated in #10143. This change restores their original (correct) assertion.

@sirreal sirreal marked this pull request as ready for review December 3, 2025 13:38
@github-actions
Copy link

github-actions bot commented Dec 3, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@sirreal sirreal requested a review from dmsnell December 3, 2025 13:38
$this->html,
$this->attributes['class']->value_starts_at,
$this->attributes['class']->value_length
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve been staring at this for a bit now and have no idea how we missed it. Surely we didn’t overlook this since 6.2 in the original introduction?

but you know what? we did!

php > $p = new WP_HTML_Tag_Processor( '<div class="&amp; &#x61;bc">' );
php > $p->next_tag();
php > $p->add_class( '&' );
php > $p->add_class( 'abc' );
php > echo $p->get_updated_html();
<div class="&amp; &#x61;bc &amp; abc">
php > $p->remove_class( 'abc' );
php > echo $p->get_updated_html();
<div class="&amp; &#x61;bc &amp;">
php > $p->remove_class( '&' );
php > echo $p->get_updated_html();
<div class="&amp; &#x61;bc &amp;">

now I don’t remember why; maybe it was performance-related or maybe it was a choice to lean on already-existing bugs, but I suppose it’s proper here to fix it regardless.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there's been a latent issue for a while that was covered up by the behavior of esc_html(). It's become very clear with the change in 6.9 to use more rigorous attribute encoding.

@sirreal
Copy link
Member Author

sirreal commented Dec 3, 2025

@ktmn this is a proposed fix for WordPress/gutenberg#73713.

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’d be nice to have performance evaluations of this, but that could be hard. It’s not directly in the hot path, and only happens when flushing updates. Either way, I prefer correctness than raw performance, and WP_HTML_Decoder::decode_attribute() is already checking for the presence of & so in most calls will be little more than a noop.

I tested and confirmed the broken behavior, that it originally appeared in 6.2 with the inception of the Tag Processor, and that the test is corrected. The theory checks out, so this should be solid.

pento pushed a commit that referenced this pull request Dec 3, 2025
Some class names with HTML character references could be mishandled, for example:
- Failure to remove an existing class like `&` with `::remove_class( '&' )`
- Double-encoding of an existing class like `&` after a modification, becoming `&amp;`

The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919].

Developed in #10591.

Props jonsurrell, dmsnell.
Fixes #64340.


git-svn-id: https://develop.svn.wordpress.org/trunk@61346 602fd350-edb4-49c9-b593-d223f7449a82
@github-actions
Copy link

github-actions bot commented Dec 3, 2025

A commit was made that fixes the Trac ticket referenced in the description of this pull request.

SVN changeset: 61346
GitHub commit: 74b60c2

This PR will be closed, but please confirm the accuracy of this and reopen if there is more work to be done.

@github-actions github-actions bot closed this Dec 3, 2025
@sirreal sirreal deleted the trac-64340/html-api-double-encoded-class-names branch December 3, 2025 17:28
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Dec 3, 2025
Some class names with HTML character references could be mishandled, for example:
- Failure to remove an existing class like `&` with `::remove_class( '&' )`
- Double-encoding of an existing class like `&` after a modification, becoming `&amp;`

The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919].

Developed in WordPress/wordpress-develop#10591.

Props jonsurrell, dmsnell.
Fixes #64340.

Built from https://develop.svn.wordpress.org/trunk@61346


git-svn-id: http://core.svn.wordpress.org/trunk@60658 1a063a9b-81f0-0310-95a4-ce76da25c4cd
github-actions bot pushed a commit to gilzow/wordpress-performance that referenced this pull request Dec 3, 2025
Some class names with HTML character references could be mishandled, for example:
- Failure to remove an existing class like `&` with `::remove_class( '&' )`
- Double-encoding of an existing class like `&` after a modification, becoming `&amp;`

The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919].

Developed in WordPress/wordpress-develop#10591.

Props jonsurrell, dmsnell.
Fixes #64340.

Built from https://develop.svn.wordpress.org/trunk@61346


git-svn-id: https://core.svn.wordpress.org/trunk@60658 1a063a9b-81f0-0310-95a4-ce76da25c4cd
pento pushed a commit that referenced this pull request Dec 4, 2025
Some class names with HTML character references could be mishandled, for example:
- Failure to remove an existing class like `&` with `::remove_class( '&' )`
- Double-encoding of an existing class like `&` after a modification, becoming `&amp;`

The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919].

Developed in #10591.

Reviewed by dmsnell, mamaduka.
Merges [61346] to the 6.9 branch.

Props jonsurrell, dmsnell.
Fixes #64340.


git-svn-id: https://develop.svn.wordpress.org/branches/6.9@61350 602fd350-edb4-49c9-b593-d223f7449a82
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Dec 4, 2025
Some class names with HTML character references could be mishandled, for example:
- Failure to remove an existing class like `&` with `::remove_class( '&' )`
- Double-encoding of an existing class like `&` after a modification, becoming `&amp;`

The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919].

Developed in WordPress/wordpress-develop#10591.

Reviewed by dmsnell, mamaduka.
Merges [61346] to the 6.9 branch.

Props jonsurrell, dmsnell.
Fixes #64340.

Built from https://develop.svn.wordpress.org/branches/6.9@61350


git-svn-id: http://core.svn.wordpress.org/branches/6.9@60662 1a063a9b-81f0-0310-95a4-ce76da25c4cd
@ktmn
Copy link

ktmn commented Dec 4, 2025

The fix works well for me.

Couple things I tested (working as intended I think):

$html = '<div class="& &amp;"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
var_dump($p->has_class('&')); // true
var_dump($p->has_class('&amp;')); // false
var_dump($p->has_class('&amp;amp;')); // false
$p->add_class('test');
$html = $p->get_updated_html();
echo $html; // <div class="&amp; test"></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->add_class('&');
$html = $p->get_updated_html();
echo $html; // <div class="&"></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&');
$html = $p->get_updated_html();
echo $html; // <div ></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&amp;');
$html = $p->get_updated_html();
echo $html; // <div class="&"></div>
$html = '<div class="&amp;"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&');
$html = $p->get_updated_html();
echo $html; // <div ></div>
$html = '<div class="&amp;"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&amp;');
$html = $p->get_updated_html();
echo $html; // <div class="&amp;"></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->add_class('&amp;');
$html = $p->get_updated_html();
echo $html; // <div class="&amp; &amp;amp;"></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->add_class('&amp;amp;');
$html = $p->get_updated_html();
echo $html; // <div class="&amp; &amp;amp;amp;"></div>

Any & is always converted to &amp;, for CSS selectors both work OK:

echo <<<HTML
<style>.\[\&\]\:text-red { color: red; }</style>
<div class="[&]:text-red">Text is red</div>
<div class="[&amp;]:text-red">Text is still red</div>
HTML;

(not encoding the chars when not needed would look prettier (and shorter) though, if it's possible to do it safely)

While the class names here get encoded, CSS stays working:

echo '<style>.\[\&_\.class\\_\\_name\]\:font-bold .class__name { font-weight: bold; }</style>';

$html = <<<HTML
<div class="[&_.class\_\_name]:font-bold"><p class="class__name">Text</p></div>
<div class="[&amp;_.class\_\_name]:font-bold"><p class="class__name">Text</p></div>
HTML;

$p = new WP_HTML_Tag_Processor($html);
while($p->next_tag()) {
	$p->add_class('test');
}
$html = $p->get_updated_html();

echo $html;
// <div class="[&amp;_.class\_\_name]:font-bold test"><p class="class__name test">Text</p></div>
// <div class="[&amp;_.class\_\_name]:font-bold test"><p class="class__name test">Text</p></div>
echo <<<HTML
<style>
	.\[\&\:\:before\]\:content-\[\'\\2193\'\]:before,
	.\[\&\:\:before\]\:content-\[\"\\2193\"\]:before,
	.\[\&\:\:before\]\:content-\[\'↓\'\]:before,
	.\[\&\:\:before\]\:content-\[\"\"\]:before {
		content: "↓";
	}
</style>
HTML;

$html = <<<HTML
<p class="[&::before]:content-['↓']">Test</p>
<p class="[&::before]:content-[&quot;↓&quot;]">Test</p>
<p class="[&amp;::before]:content-['↓']">Test</p>
<p class="[&amp;::before]:content-[&quot;↓&quot;]">Test</p>
<p class='[&::before]:content-["↓"]'>Test</p>
<p class="[&::before]:content-['\\2193']">Test</p>
<p class="[&::before]:content-[&quot;\\2193&quot;]">Test</p>
<p class="[&amp;::before]:content-['\\2193']">Test</p>
<p class="[&amp;::before]:content-[&quot;\\2193&quot;]">Test</p>
HTML;

$p = new WP_HTML_Tag_Processor($html);
while($p->next_tag('p')) {
	$p->add_class('test');
}
$html = $p->get_updated_html();
echo $html;

// <p class="[&amp;::before]:content-[&apos;↓&apos;] test">Test</p>
// <p class="[&amp;::before]:content-[&quot;↓&quot;] test">Test</p>
// <p class="[&amp;::before]:content-[&apos;↓&apos;] test">Test</p>
// <p class="[&amp;::before]:content-[&quot;↓&quot;] test">Test</p>
// <p class="[&amp;::before]:content-[&quot;↓&quot;] test">Test</p>
// <p class="[&amp;::before]:content-[&apos;\2193&apos;] test">Test</p>
// <p class="[&amp;::before]:content-[&quot;\2193&quot;] test">Test</p>
// <p class="[&amp;::before]:content-[&apos;\2193&apos;] test">Test</p>
// <p class="[&amp;::before]:content-[&quot;\2193&quot;] test">Test</p>

Encoding stayed stable after running processor output through the processor again.

@sirreal
Copy link
Member Author

sirreal commented Dec 4, 2025

Awesome testing, thanks for following up @ktmn. I agree, the HTML processing appears to be correct in the examples you shared.


Any & is always converted to &amp;, for CSS selectors both work OK

This is correct whenever the class attribute is modified. There are a few cases where the class attribute did not require modification and the tag processor leaves the plain & class attribute value as-is. We can see it in the examples that add a class that is already present or remove a class that is not present.


(not encoding the chars when not needed would look prettier (and shorter) though, if it's possible to do it safely)

This is technically possible and allowed, but generally not worth the extra complexity. &amp; will decode as &, but & may be interpreted in different ways depending on what follows, so &amp; is just less error-prone. It's possible to encode fewer things, but encoding all of &<>"' aligns with what WordPress generally does and is safe.


The last two (more complex) cases don't render exactly the same for me. I think it's more related to how the style is being printed and may be related to quoting, escaping, PHP versions, and possibly even GitHub's code fence escaping. It's hard to say.

Here are the working style elements for me the do have the correct behavior with the example output from the tag processor:

<style>.\[\&_\.class\\_\\_name\]\:font-bold .class__name { font-weight: bold; }</style>

<style>
	.\[\&\:\:before\]\:content-\[\'\\2193\'\]:before,
	.\[\&\:\:before\]\:content-\[\"\\2193\"\]:before,
	.\[\&\:\:before\]\:content-\[\'↓\'\]:before,
	.\[\&\:\:before\]\:content-\[\"↓\"\]:before {
		content: "↓";
	}
</style>

Here's the example in the Live DOM Viewer. If you don't know it, it's a very handy tool for inspecting HTML processing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants