Skip to content

HTML API: Fix invalid comment syntax cases#4256

Closed
dmsnell wants to merge 14 commits intoWordPress:trunkfrom
dmsnell:html-api/support-invalid-first-character-in-tag-name-comments
Closed

HTML API: Fix invalid comment syntax cases#4256
dmsnell wants to merge 14 commits intoWordPress:trunkfrom
dmsnell:html-api/support-invalid-first-character-in-tag-name-comments

Conversation

@dmsnell
Copy link
Copy Markdown
Member

@dmsnell dmsnell commented Mar 21, 2023

Status

  • Satisfy linter requirements
  • Create Trac ticket #58007-trac
  • Expand comments, docblocks, add version and Trac ticket refereces
  • Create backport in Gutenberg

What

Adds support for a few invalid HTML comment forms.

  • comments created by means of a tag closer with an invalid tag name, e.g. </3>
  • comments closed with the invalid --!> closer (comments should be closed by --> but if the ! appears it will also close it, in error).
  • empty tag name elements, which are technically skipped over and aren't comments, e.g. </>

Why

We want to continue to find and handle edge cases in HTML so that the Tag Processor remains reliable in the face of unexpected inputs.

How

Captures the patterns described in the HTML specification for parse errors relating to invalid comment construction. Since these are comments it's appropriate to proceed consuming input until the end, and there's no need to look for other tags or attributes in the process.

cc: @adamziel @gziolo

@dmsnell dmsnell force-pushed the html-api/support-invalid-first-character-in-tag-name-comments branch 2 times, most recently from f841c78 to 25df194 Compare March 21, 2023 23:10
@dmsnell dmsnell force-pushed the html-api/support-invalid-first-character-in-tag-name-comments branch from f6cfa42 to 2f22fda Compare March 28, 2023 23:42
@dmsnell dmsnell marked this pull request as ready for review March 29, 2023 04:00
@dmsnell dmsnell changed the title WIP: HTML API: Support comments created by invalid tag name in tag closers HTML API: Support comments created by invalid tag name in tag closers Mar 29, 2023
Copy link
Copy Markdown
Contributor

@costdev costdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @dmsnell!

I've added some docs-related feedback and just one question regarding whether it's possible that an array offset may not exist.

Comment on lines +1045 to +1047
/*
* Abruptly-closed empty comments are a sequence of dashes followed by `>`.
*/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/*
* Abruptly-closed empty comments are a sequence of dashes followed by `>`.
*/
// Abruptly-closed empty comments are a sequence of dashes followed by `>`.

This should use the single line comment format.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inlined in ddcb554

Comment on lines +1672 to +1673
* @ticket 58007
* @dataProvider data_next_tag_ignores_invalid_first_character_of_tag_name_comments
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @ticket 58007
* @dataProvider data_next_tag_ignores_invalid_first_character_of_tag_name_comments
* @ticket 58007
*
* @dataProvider data_next_tag_ignores_invalid_first_character_of_tag_name_comments

@ticket and @dataProvider should be separated by an empty line

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split in ddcb554

* @ticket 58007
* @dataProvider data_next_tag_ignores_invalid_first_character_of_tag_name_comments
*
* @param string $html_with_markers HTML containing an invalid tag closer whose element before and element after contain the "start" and "end" CSS classes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param string $html_with_markers HTML containing an invalid tag closer whose element before and element after contain the "start" and "end" CSS classes.
* @param string $html_with_markers HTML containing an invalid tag closer whose element before
and element after contain the "start" and "end" CSS classes.

Let's break this line to limit the need for horizontal scrolling.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

broken in ddcb554.

Comment on lines +1687 to +1688
*
* @ticket 58007
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*
* @ticket 58007

Data providers don't need the @ticket annotation

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed in ddcb554

return false;
}

if ( '>' === $html[ $closer_at + 2 ] ) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that $closer_at + 2 may not exist as an offset?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it was, and thanks for noticing! I've added new tests and fixes in ae59e1a and 926d491 which validate this particular case of unchecked indices

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dmsnell! I love the dataset names! 😆

Comment on lines +1673 to +1674
*
* @dataProvider data_next_tag_ignores_invalid_first_character_of_tag_name_comments
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*
* @dataProvider data_next_tag_ignores_invalid_first_character_of_tag_name_comments
*
* @covers WP_HTML_Tag_Processor::next_tag
*
* @dataProvider data_next_tag_ignores_invalid_first_character_of_tag_name_comments

See this ticket as a reference for the test docblock order/spacing of annotations.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaced out in 8813327

Comment on lines +1763 to +1764
* @ticket 58007
* @covers WP_HTML_Tag_Processor::next_tag
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @ticket 58007
* @covers WP_HTML_Tag_Processor::next_tag
* @ticket 58007
*
* @covers WP_HTML_Tag_Processor::next_tag

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaced out in 8813327

Comment on lines +1785 to +1786
* @covers WP_HTML_Tag_Processor::next_tag
* @dataProvider data_html_with_unclosed_comments
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @covers WP_HTML_Tag_Processor::next_tag
* @dataProvider data_html_with_unclosed_comments
* @covers WP_HTML_Tag_Processor::next_tag
*
* @dataProvider data_html_with_unclosed_comments

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaced out in 8813327

Comment on lines +1817 to +1818
* @covers WP_HTML_Tag_Processor::next_tag
* @dataProvider data_abruptly_closed_empty_comments
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @covers WP_HTML_Tag_Processor::next_tag
* @dataProvider data_abruptly_closed_empty_comments
* @covers WP_HTML_Tag_Processor::next_tag
*
* @dataProvider data_abruptly_closed_empty_comments

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaced out in 8813327

@dmsnell
Copy link
Copy Markdown
Member Author

dmsnell commented Apr 12, 2023

@costdev I think I've addressed all your comments now, thanks. must say I tried to get the docblock ordering and formatting right, but for some reason that's consistently hard to do

Copy link
Copy Markdown
Contributor

@costdev costdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @dmsnell! Looks like there's just a PHPCS issue to resolve, but other than that this looks good to me 👍

@dmsnell
Copy link
Copy Markdown
Member Author

dmsnell commented Apr 12, 2023

So there were @costdev - sorry for missing that. I'll double-check once the tests finish again to make sure no remain.

* If a non-alpha starts the tag name in a tag closer it's a comment.
* Find the first `>`, which closes the comment.
*
* See https://github.com/WordPress/wordpress-develop/pull/4256
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to link to the spec as a somewhat more canonical source instead? While it's fair that this PR states the intent behind this change, I've been itching to see why we can actually do this 😅

How about e.g.

Suggested change
* See https://github.com/WordPress/wordpress-develop/pull/4256
* See https://html.spec.whatwg.org/#parse-error-invalid-first-character-of-tag-name

(which is the reference we're also using in the tests)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch; this was probably me pasting the wrong thing and not noticing it

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in 663055b

@dmsnell dmsnell changed the title HTML API: Support comments created by invalid tag name in tag closers HTML API: Fix two invalid comment syntax cases Apr 20, 2023
@dmsnell dmsnell force-pushed the html-api/support-invalid-first-character-in-tag-name-comments branch from 663055b to 47d4b4f Compare April 20, 2023 12:32
@dmsnell dmsnell changed the title HTML API: Fix two invalid comment syntax cases HTML API: Fix invalid comment syntax cases Apr 20, 2023
Copy link
Copy Markdown
Contributor

@ockham ockham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@dmsnell dmsnell closed this Apr 20, 2023
@dmsnell dmsnell deleted the html-api/support-invalid-first-character-in-tag-name-comments branch April 20, 2023 17:10
@ockham
Copy link
Copy Markdown
Contributor

ockham commented Apr 20, 2023

Committed to Core trunk in r55667.

@ockham
Copy link
Copy Markdown
Contributor

ockham commented Apr 20, 2023

Backported to Core's 6.2 branch in r55668.

dmsnell added a commit to WordPress/gutenberg that referenced this pull request Apr 20, 2023
 - Support a few extra invalid comment syntaxes. WordPress/wordpress-develop#4256
 - Invalidate bookmarks which have been eliminated by enqueued changes. WordPress/wordpress-develop#4116
 - Expose whether the currently-matched tag has the self-closing flag. WordPress/wordpress-develop#4266
 - Avoid double-writing an attribute value if given case-variations of the name. WordPress/wordpress-develop#4337
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Apr 21, 2023
 - Support a few extra invalid comment syntaxes. WordPress/wordpress-develop#4256
 - Invalidate bookmarks which have been eliminated by enqueued changes. WordPress/wordpress-develop#4116
 - Expose whether the currently-matched tag has the self-closing flag. WordPress/wordpress-develop#4266
 - Avoid double-writing an attribute value if given case-variations of the name. WordPress/wordpress-develop#4337
 - Linting updates from WordPress/wordpress-develop#4360.
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Apr 24, 2023
 - Support a few extra invalid comment syntaxes. WordPress/wordpress-develop#4256
 - Invalidate bookmarks which have been eliminated by enqueued changes. WordPress/wordpress-develop#4116
 - Expose whether the currently-matched tag has the self-closing flag. WordPress/wordpress-develop#4266
 - Avoid double-writing an attribute value if given case-variations of the name. WordPress/wordpress-develop#4337
 - Linting updates from WordPress/wordpress-develop#4360.
 - Avoid losing previous updates in certain cases when seeking earlier in a document. WordPress/wordpress-develop#4345
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants