Command Palette: Use WP_HTML_Processor and WP_HTML_Decoder to generate menu label and menu URL #10480
I'll prepare the commit message now, ready for when I commit this pull request.
@dmsnell @mukeshpanchal27 @peterwilsoncc If there are no other blockers, I'd like to commit this before the RC1 release, but what do you think? This PR can be considered a code quality improvement, so if there are any blockers, we can punt it to a future release.
	}

	return trim( implode( '', $text_parts ) );
};
hey nice job @t-hamano getting this built. I hope it wasn’t too obscure to figure out.
this looks like it should be solid, but I can share a couple of points of feedback.
Finding root-level text nodes
when creating a fragment (with the default <body> context) we will always have an open HTML element and BODY element, meaning that root-level text will always have a depth of 3 (and likewise, the breadcrumb depth will be three).
this means we can eliminate the nested loop and directly check if the depth is 3. we don’t have to capture the root depth. that open HTML and BODY are guarantees with how it works.
On the other hand, we can also test this via the breadcrumbs. I found an issue that we should probably change/fix on matches_breadcrumbs(), because that won’t work here, but for the time being this would.
while ( $processor->next_token() ) {
	if ( array( 'HTML', 'BODY', '#text' ) !== $processor->get_breadcrumbs() ) {
		continue;
	}
	$text_parts…
}
Efficiency and reliability
The use of the HTML Processor is particularly convenient because it provides depth automatically. On the other hand, if you find that it’s too slow or fails too frequently (because it receives the fraction of input documents it can’t parse) then we can still adjust the lever on the reliability/practicality spectrum. The Tag Processor will not fail with the same parsing issues the PCRE matches did, even though that can lead to some kinds of parsing failures (with, for example, mismatched tags).
Still, the Tag Processor won’t abort partway through a document and is considerably faster than the fully-fledged HTML Processor. If we were to choose this approach, we’d want to track depth manually, which again could be wrong because HTML is so wonderfully complex (whereas the HTML Processor will not be wrong here).
$processor = new WP_HTML_Tag_Processor( $label );
$depth     = 0;
while ( $processor->next_token() ) {
	$token_type = $processor->get_token_type();
	if ( '#text' === $token_type ) {
		// Only collect text that sits outside of every element.
		if ( 0 === $depth ) {
			$text_parts…
		}
		continue;
	}
	if ( '#tag' !== $token_type ) {
		// Comments and other non-tag tokens never change the depth.
		continue;
	}
	if ( $processor->is_tag_closer() ) {
		--$depth;
	} elseif ( ! WP_HTML_Processor::is_void( $processor->get_token_name() ) ) {
		++$depth;
	}
}
The choice is up to you. The only thing I’d watch out for is that occasionally we get things like “nested” A tags, and those can cause the HTML Processor to abort out of caution.
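The manual depth tracking described above can be tried outside WordPress with a small stand-in. This is only a sketch under stated assumptions: the token arrays below are a hypothetical simplification of what a tokenizer reports (they are not the HTML API's data structures), `collect_root_text()` is an illustrative name, and the void-element list is the current one from the HTML standard rather than whatever `WP_HTML_Processor::is_void()` checks internally.

```php
<?php
/**
 * Collects text that sits outside of every element by tracking depth by hand.
 *
 * Each token is a hypothetical array with a 'type' of '#tag', '#text' or
 * '#comment'; tags carry 'name' and 'closer', text carries 'text'.
 */
function collect_root_text( array $tokens ): string {
	// Void elements never have a closing tag, so they must not change the depth.
	$void_elements = array( 'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR' );
	$text_parts    = array();
	$depth         = 0;

	foreach ( $tokens as $token ) {
		if ( '#text' === $token['type'] ) {
			if ( 0 === $depth ) {
				$text_parts[] = $token['text'];
			}
			continue;
		}
		if ( '#tag' !== $token['type'] ) {
			continue; // Comments and similar tokens do not affect depth.
		}
		if ( in_array( $token['name'], $void_elements, true ) ) {
			continue;
		}
		if ( $token['closer'] ) {
			// Clamp at zero so a stray closing tag cannot hide later root text.
			$depth = max( 0, $depth - 1 );
		} else {
			++$depth;
		}
	}

	return trim( implode( '', $text_parts ) );
}

// Tokens for: Comments <span><span>5</span><br></span>
$tokens = array(
	array( 'type' => '#text', 'text' => 'Comments ' ),
	array( 'type' => '#tag', 'name' => 'SPAN', 'closer' => false ),
	array( 'type' => '#tag', 'name' => 'SPAN', 'closer' => false ),
	array( 'type' => '#text', 'text' => '5' ),
	array( 'type' => '#tag', 'name' => 'SPAN', 'closer' => true ),
	array( 'type' => '#tag', 'name' => 'BR', 'closer' => false ),
	array( 'type' => '#tag', 'name' => 'SPAN', 'closer' => true ),
);

echo collect_root_text( $tokens ), "\n"; // Comments
```

The clamp on the closing-tag branch is a design choice of this sketch, not something taken from the review: it trades strictness for never going below the root when the markup has unmatched closers.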
@t-hamano let me know your preferences on this based on my feedback. I’m happy to approve the work if we still want to get it in, understanding that it’s now after the RC1 deadline. I believe that with a couple of sign-offs we can still do so.
Thanks for the feedback! I tried the latter approach. By the way, I personally feel there is no need to rush this PR into 6.9 🙂
dmsnell
left a comment
everything that seems important checks out. thanks for working through this. I agree on not rushing this into 6.9
  	$menu_url = $menu_slug;
  } elseif ( ! empty( menu_page_url( $menu_slug, false ) ) ) {
- 	$menu_url = html_entity_decode( menu_page_url( $menu_slug, false ), ENT_QUOTES, get_bloginfo( 'charset' ) );
+ 	$menu_url = WP_HTML_Decoder::decode_attribute( menu_page_url( $menu_slug, false ) );
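As background for the change on this line, here is a small standalone check of how the replaced html_entity_decode() call treats an escaped URL. It is only a sketch: the URL is a made-up example, and it does not run WP_HTML_Decoder, so it shows the old behavior only. The second check illustrates one way PHP's decoder differs from HTML parsing rules: it leaves a reference with no trailing semicolon untouched, whereas the HTML standard recognizes some legacy named references such as `&amp` without one in certain contexts.

```php
<?php
// A made-up, HTML-escaped admin URL such as a menu URL helper might return.
$escaped = 'admin.php?page=my-plugin&amp;tab=general';

// The well-formed reference is decoded back to a plain ampersand.
$decoded = html_entity_decode( $escaped, ENT_QUOTES, 'UTF-8' );
echo $decoded, "\n"; // admin.php?page=my-plugin&tab=general

// A reference without the terminating semicolon is returned unchanged.
$unterminated = html_entity_decode( 'a=1&amp', ENT_QUOTES, 'UTF-8' );
echo $unterminated, "\n"; // a=1&amp
```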
let’s also earmark this for a follow-up to address the issue in menu_page_url() that it fails to escape $menu_slug in the case where there’s no parent slug.
maybe we could create a ticket for that?
I took a closer look at the implementation of the menu_page_url function, namely here:
wordpress-develop/src/wp-admin/includes/plugin.php
Lines 1929 to 1933 in cd301d0
if ( $parent_slug && ! isset( $_parent_pages[ $parent_slug ] ) ) {
	$url = admin_url( add_query_arg( 'page', $menu_slug, $parent_slug ) );
} else {
	$url = admin_url( 'admin.php?page=' . $menu_slug );
}
I had assumed that the add_query_arg() function would URL-encode the string, but the following test shows that it does not:
$encoded = add_query_arg( 'page', 'test #1&2', 'admin.php' );
$direct  = 'admin.php?page=test #1&2';
echo bin2hex( $encoded ) === bin2hex( $direct ) ? 'Equal' : 'Not Equal';
// Output: "Equal"
So in the menu_page_url() function, the $menu_slug is not escaped in either case, and we need to fix that. Is my understanding correct?
@t-hamano it’s actually the other case that’s more problematic because it doesn’t even attempt to escape the $menu_slug. I don’t know off-hand, but I think that # is a special case here; you may try passing something like é or 🏴 and see if it encodes that.
The second clause should read…
} else {
	$url = admin_url( add_query_arg( 'page', $menu_slug, 'admin.php' ) );
}
or something like that. it’s missing the call to add query args.
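To make the encoding question concrete without a WordPress install, the sketch below shows what percent-encoding the slug would produce for the inputs mentioned in this thread. It is an illustration only: `build_page_query()` is a hypothetical helper, not part of WordPress, and using rawurlencode() here is an assumption about one possible fix rather than the approach agreed on in this review.

```php
<?php
/**
 * Hypothetical helper: builds the admin query string with the slug
 * percent-encoded, so reserved characters cannot alter the URL structure.
 */
function build_page_query( string $menu_slug ): string {
	return 'admin.php?page=' . rawurlencode( $menu_slug );
}

// The slug from the test above: space, "#" and "&" are all encoded.
echo build_page_query( 'test #1&2' ), "\n"; // admin.php?page=test%20%231%262

// A non-ASCII slug ("é" written as an escape so the file encoding does not matter).
echo build_page_query( "\u{E9}" ), "\n"; // admin.php?page=%C3%A9
```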
Thanks for the review, @dmsnell! Finally, could you review the commit message? I'm concerned that the explanation may not be accurate.
Co-authored-by: Weston Ruter <[email protected]>
Generally I just write “use HTML API” rather than noting the specific classes and methods. It might be valuable to tweak the note on “tag removal” since that is what originally led me to misunderstand the problem.
- Command Palette: Use WP_HTML_Tag_Processor and WP_HTML_Decoder for menu labels and URLs.
+ Command Palette: Use HTML API for more reliable menu labels and URLs.

- Replace regex-based HTML tag removal with WP_HTML_Tag_Processor to properly extract text nodes from menu labels. This ensures only root-level text nodes are collected.
+ Replace regex-based HTML parsing with WP_HTML_Tag_Processor to properly extract text nodes from menu labels. This ensures only root-level text nodes are collected.

- Additionally, replace html_entity_decode() with WP_HTML_Decoder::decode_attribute() for URL decoding to use the modern HTML API for consistent attribute decoding.
+ Additionally, replace html_entity_decode() with WP_HTML_Decoder::decode_attribute() with the menu URL for consistent attribute decoding.

Follow-up to [61124], [61126], [61127], [61142].

Props: dmsnell, madhavishah01, peterwilsoncc, wildworks.
Fixes #64177, #64196.
I tossed in some minor styling updates in there, which you are free to ignore, but since you asked…
dmsnell
left a comment
Looks fine from the perspective of the HTML API use. The new version of the code seems clearer in intent to me, making it a bit more explicit that the goal is to strip away any elements with all their content.
Not sure why we do that or want that, but that’s neither here nor there.
@dmsnell Thanks for the feedback! I will use that message and commit as per your suggestion.
Trac ticket: https://core.trac.wordpress.org/ticket/64233
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.