Make WordPress Core

Opened 5 weeks ago

Closed 5 weeks ago

#64513 closed defect (bug) (invalid)

HTML_Processor gets wrong breadcrumbs for elements in <head>

Reported by: vicobot
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 6.9
Component: HTML API
Keywords:
Focuses:
Cc:

Description

If you visit elements that typically appear in the <head>, such as META or LINK, WP_HTML_Processor returns a breadcrumbs array containing BODY instead of HEAD, e.g.

Array
(
    [0] => HTML
    [1] => BODY
    [2] => META
)

How to reproduce:

  • call $processor = \WP_HTML_Processor::create_fragment($html) on a full HTML page, e.g. by hooking into wp_template_enhancement_output_buffer
  • call $processor->next_tag('META'), or target some other tag that lives in <head>
  • call $processor->get_breadcrumbs() and inspect the resulting array, e.g. with print_r() or by running array_diff() against the expected result ['HTML', 'HEAD', 'META']
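
The steps above can be sketched roughly as follows. This is a minimal sketch that assumes it runs inside a loaded WordPress environment; the literal $html string is a hypothetical stand-in for the page captured from the output buffer, not the markup from the original report.

```php
<?php
// Hypothetical stand-in for a full page captured from the output buffer.
$html = '<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
      . '<body><p>Hello</p></body></html>';

// Fragment parser with its default context, as in the reproduction steps.
$processor = \WP_HTML_Processor::create_fragment( $html );

// create_fragment() can return null, so guard before using the processor.
if ( null !== $processor && $processor->next_tag( 'META' ) ) {
    // The report observes HTML, BODY, META here rather than HTML, HEAD, META.
    print_r( $processor->get_breadcrumbs() );
}
```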

WordPress: 6.9 (php8.2-apache, Docker)

Change History (2)

#1 @westonruter
5 weeks ago

  • Keywords close added

@vicobot this is because you're parsing in BODY mode. You need to use WP_HTML_Processor::create_full_parser() instead of WP_HTML_Processor::create_fragment().
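
A minimal sketch of the suggested change, assuming a loaded WordPress environment (6.7 or later, where create_full_parser() is available). The $html string is a hypothetical sample page used only for illustration.

```php
<?php
// Hypothetical sample of a complete HTML document.
$html = '<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
      . '<body><p>Hello</p></body></html>';

// Full-document parser: treats the input as a whole page, start to finish.
$processor = \WP_HTML_Processor::create_full_parser( $html );

// The factory can return null, so guard before using the processor.
if ( null !== $processor && $processor->next_tag( 'META' ) ) {
    // With a full parse, the META in the document head should be
    // reported under HEAD, i.e. HTML, HEAD, META.
    print_r( $processor->get_breadcrumbs() );
}
```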

#2 @dmsnell
5 weeks ago

  • Keywords close removed
  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from new to closed

Oops, I was prepping my response while @westonruter was responding. To add to what he wrote, the difference between those methods is that create_fragment() is specifically designed to operate on inner HTML within a specified context element, the default being BODY.

Use this for cases where you are processing chunks of HTML that will be found within a bigger HTML document, such as rendered block output within a post, or the_content inside a rendered site layout.

If you use create_full_parser(), it assumes that you are providing the full HTML for a page from start to finish, and the initial META elements will appear within a HEAD. However, per the HTML spec, parsing a META tag creates an element inside the current element regardless of the parser’s current insertion mode.

You can see this demonstrated using your browser’s interpretation of the HTML, where META remains inside the BODY.
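
The same behavior can be checked from PHP. The following is a rough sketch, assuming a loaded WordPress environment (6.7 or later); the markup and the "late" META name are hypothetical and chosen only to place a META after the BODY has been opened.

```php
<?php
// Hypothetical document whose only META appears after BODY content.
$html = '<!DOCTYPE html><html><head></head>'
      . '<body><p>Hello</p><meta name="late" content="1"></body></html>';

// Even a full-document parse keeps this META where it was encountered.
$processor = \WP_HTML_Processor::create_full_parser( $html );

if ( null !== $processor && $processor->next_tag( 'META' ) ) {
    // Per the explanation above, the META is created in the current
    // element, so the breadcrumbs should contain BODY rather than HEAD.
    print_r( $processor->get_breadcrumbs() );
}
```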
