-
Notifications
You must be signed in to change notification settings - Fork 3.2k
HTML API: Ensure correct encoding of modified class names #10591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML API: Ensure correct encoding of modified class names #10591
Conversation
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
e5a60c7 to
cf42da8
Compare
| 'expected' => '<pre foo="bar" id="<code" class="wp-block-code <code is poetry> firstTag"><code class="secondTag">This <is> a <strong is="true">thing.</code></pre><span>test</span>', | ||
| ), | ||
| 'HTML tag brackets in attribute values and data markup' => array( | ||
| 'input' => '<pre id="<code->-block->" class="wp-block-code <code is poetry>"><code>This <is> a <strong is="true">thing.</code></pre><span>test</span>', | ||
| 'expected' => '<pre foo="bar" id="<code->-block->" class="wp-block-code <code is poetry&gt; firstTag"><code class="secondTag">This <is> a <strong is="true">thing.</code></pre><span>test</span>', | ||
| 'expected' => '<pre foo="bar" id="<code->-block->" class="wp-block-code <code is poetry> firstTag"><code class="secondTag">This <is> a <strong is="true">thing.</code></pre><span>test</span>', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests seem to have been ensuring incorrect behavior.
In both cases updated here an input class like poetry> (decoded value poetry>) was changed to poetry&gt; (decoded value poetry>).
They were updated in #10143. This change restores their original (correct) assertion.
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the Core Committers: Use this line as a base for the props when committing in SVN: To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
| $this->html, | ||
| $this->attributes['class']->value_starts_at, | ||
| $this->attributes['class']->value_length | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve been staring at this for a bit now and have no idea how we missed it. Surely we didn’t overlook this since 6.2 in the original introduction?
but you know what? we did!
php > $p = new WP_HTML_Tag_Processor( '<div class="& abc">' );
php > $p->next_tag();
php > $p->add_class( '&' );
php > $p->add_class( 'abc' );
php > echo $p->get_updated_html();
<div class="& abc & abc">
php > $p->remove_class( 'abc' );
php > echo $p->get_updated_html();
<div class="& abc &">
php > $p->remove_class( '&' );
php > echo $p->get_updated_html();
<div class="& abc &">now I don’t remember why; maybe it was performance-related or maybe it was a choice to lean on already-existing bugs, but I suppose it’s proper here to fix it regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there's been a latent issue for a while that was covered up by the behavior of esc_html(). It's become very clear with the change in 6.9 to use more rigorous attribute encoding.
|
@ktmn this is a proposed fix for WordPress/gutenberg#73713. |
dmsnell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’d be nice to have performance evaluations of this, but that could be hard. It’s not directly in the hot path, and only happens when flushing updates. Either way, I prefer correctness than raw performance, and WP_HTML_Decoder::decode_attribute() is already checking for the presence of & so in most calls will be little more than a noop.
I tested and confirmed the broken behavior, that it originally appeared in 6.2 with the inception of the Tag Processor, and that the test is corrected. The theory checks out, so this should be solid.
Some class names with HTML character references could be mishandled, for example: - Failure to remove an existing class like `&` with `::remove_class( '&' )` - Double-encoding of an existing class like `&` after a modification, becoming `&` The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919]. Developed in #10591. Props jonsurrell, dmsnell. Fixes #64340. git-svn-id: https://develop.svn.wordpress.org/trunk@61346 602fd350-edb4-49c9-b593-d223f7449a82
Some class names with HTML character references could be mishandled, for example: - Failure to remove an existing class like `&` with `::remove_class( '&' )` - Double-encoding of an existing class like `&` after a modification, becoming `&` The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919]. Developed in WordPress/wordpress-develop#10591. Props jonsurrell, dmsnell. Fixes #64340. Built from https://develop.svn.wordpress.org/trunk@61346 git-svn-id: http://core.svn.wordpress.org/trunk@60658 1a063a9b-81f0-0310-95a4-ce76da25c4cd
Some class names with HTML character references could be mishandled, for example: - Failure to remove an existing class like `&` with `::remove_class( '&' )` - Double-encoding of an existing class like `&` after a modification, becoming `&` The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919]. Developed in WordPress/wordpress-develop#10591. Props jonsurrell, dmsnell. Fixes #64340. Built from https://develop.svn.wordpress.org/trunk@61346 git-svn-id: https://core.svn.wordpress.org/trunk@60658 1a063a9b-81f0-0310-95a4-ce76da25c4cd
Some class names with HTML character references could be mishandled, for example: - Failure to remove an existing class like `&` with `::remove_class( '&' )` - Double-encoding of an existing class like `&` after a modification, becoming `&` The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919]. Developed in #10591. Reviewed by dmsnell, mamaduka. Merges [61346] to the 6.9 branch. Props jonsurrell, dmsnell. Fixes #64340. git-svn-id: https://develop.svn.wordpress.org/branches/6.9@61350 602fd350-edb4-49c9-b593-d223f7449a82
Some class names with HTML character references could be mishandled, for example: - Failure to remove an existing class like `&` with `::remove_class( '&' )` - Double-encoding of an existing class like `&` after a modification, becoming `&` The second case manifested after double-encoding prevention was removed from `::set_attribute()` in [60919]. Developed in WordPress/wordpress-develop#10591. Reviewed by dmsnell, mamaduka. Merges [61346] to the 6.9 branch. Props jonsurrell, dmsnell. Fixes #64340. Built from https://develop.svn.wordpress.org/branches/6.9@61350 git-svn-id: http://core.svn.wordpress.org/branches/6.9@60662 1a063a9b-81f0-0310-95a4-ce76da25c4cd
|
The fix works well for me. Couple things I tested (working as intended I think): $html = '<div class="& &"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
var_dump($p->has_class('&')); // true
var_dump($p->has_class('&')); // false
var_dump($p->has_class('&amp;')); // false
$p->add_class('test');
$html = $p->get_updated_html();
echo $html; // <div class="& test"></div>$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->add_class('&');
$html = $p->get_updated_html();
echo $html; // <div class="&"></div>$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&');
$html = $p->get_updated_html();
echo $html; // <div ></div>$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&');
$html = $p->get_updated_html();
echo $html; // <div class="&"></div>$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&');
$html = $p->get_updated_html();
echo $html; // <div ></div>$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&');
$html = $p->get_updated_html();
echo $html; // <div class="&"></div>$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->add_class('&');
$html = $p->get_updated_html();
echo $html; // <div class="& &amp;"></div>$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->add_class('&amp;');
$html = $p->get_updated_html();
echo $html; // <div class="& &amp;amp;"></div>Any echo <<<HTML
<style>.\[\&\]\:text-red { color: red; }</style>
<div class="[&]:text-red">Text is red</div>
<div class="[&]:text-red">Text is still red</div>
HTML;(not encoding the chars when not needed would look prettier (and shorter) though, if it's possible to do it safely) While the class names here get encoded, CSS stays working: echo '<style>.\[\&_\.class\\_\\_name\]\:font-bold .class__name { font-weight: bold; }</style>';
$html = <<<HTML
<div class="[&_.class\_\_name]:font-bold"><p class="class__name">Text</p></div>
<div class="[&_.class\_\_name]:font-bold"><p class="class__name">Text</p></div>
HTML;
$p = new WP_HTML_Tag_Processor($html);
while($p->next_tag()) {
$p->add_class('test');
}
$html = $p->get_updated_html();
echo $html;
// <div class="[&_.class\_\_name]:font-bold test"><p class="class__name test">Text</p></div>
// <div class="[&_.class\_\_name]:font-bold test"><p class="class__name test">Text</p></div>echo <<<HTML
<style>
.\[\&\:\:before\]\:content-\[\'\\2193\'\]:before,
.\[\&\:\:before\]\:content-\[\"\\2193\"\]:before,
.\[\&\:\:before\]\:content-\[\'↓\'\]:before,
.\[\&\:\:before\]\:content-\[\"↓\"\]:before {
content: "↓";
}
</style>
HTML;
$html = <<<HTML
<p class="[&::before]:content-['↓']">Test</p>
<p class="[&::before]:content-["↓"]">Test</p>
<p class="[&::before]:content-['↓']">Test</p>
<p class="[&::before]:content-["↓"]">Test</p>
<p class='[&::before]:content-["↓"]'>Test</p>
<p class="[&::before]:content-['\\2193']">Test</p>
<p class="[&::before]:content-["\\2193"]">Test</p>
<p class="[&::before]:content-['\\2193']">Test</p>
<p class="[&::before]:content-["\\2193"]">Test</p>
HTML;
$p = new WP_HTML_Tag_Processor($html);
while($p->next_tag('p')) {
$p->add_class('test');
}
$html = $p->get_updated_html();
echo $html;
// <p class="[&::before]:content-['↓'] test">Test</p>
// <p class="[&::before]:content-["↓"] test">Test</p>
// <p class="[&::before]:content-['↓'] test">Test</p>
// <p class="[&::before]:content-["↓"] test">Test</p>
// <p class="[&::before]:content-["↓"] test">Test</p>
// <p class="[&::before]:content-['\2193'] test">Test</p>
// <p class="[&::before]:content-["\2193"] test">Test</p>
// <p class="[&::before]:content-['\2193'] test">Test</p>
// <p class="[&::before]:content-["\2193"] test">Test</p>Encoding stayed stable after running processor output through the processor again. |
|
Awesome testing, thanks for following up @ktmn. I agree, the HTML processing appears to be correct in the examples you shared.
This is correct whenever the class attribute is modified. There are a few cases where the class attribute did not require modification and the tag processor leaves the plain
This is technically possible and allowed, but generally not worth the extra complexity. The last two (more complex) cases don't render exactly the same for me. I think it's more related to how the style is being printed and may be related to quoting, escaping, PHP versions, and possibly even GitHub's code fence escaping. It's hard to say. Here are the working style elements for me the do have the correct behavior with the example output from the tag processor: <style>.\[\&_\.class\\_\\_name\]\:font-bold .class__name { font-weight: bold; }</style>
<style>
.\[\&\:\:before\]\:content-\[\'\\2193\'\]:before,
.\[\&\:\:before\]\:content-\[\"\\2193\"\]:before,
.\[\&\:\:before\]\:content-\[\'↓\'\]:before,
.\[\&\:\:before\]\:content-\[\"↓\"\]:before {
content: "↓";
}
</style>Here's the example in the Live DOM Viewer. If you don't know it, it's a very handy tool for inspecting HTML processing. |
Ensure the class name modifications preserve correct HTML encoding when calling
add_classandget_updated_htmlrepeatedly.There was an inconsistency in this branching where
$existing_classwould get a correctly decoded attribute value if there was a value returned fromget_enqueued_attribute_value()but would otherwise get raw, unparsed HTML if there was a class attribute in the HTML:wordpress-develop/src/wp-includes/html-api/class-wp-html-tag-processor.php
Lines 2339 to 2350 in 3a87241
Note that
::get_enqueued_attribute_value()returns a decoded value:wordpress-develop/src/wp-includes/html-api/class-wp-html-tag-processor.php
Line 2734 in 3a87241
This inconsistent behavior was hidden by the use of
esc_attr()which avoids double-escaping character references, so when::set_attribute()was later called it would only encoded character references that did not appear to already be encoded.wordpress-develop/src/wp-includes/html-api/class-wp-html-tag-processor.php
Line 2479 in 3a87241
When
:set_attribute()was changed in r60919 (4892d46) to always escape attribute values, the double-escaping manifested.Trac ticket: https://core.trac.wordpress.org/ticket/64340
Originally reported in WordPress/gutenberg#73713.
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.