Surprising parsing behavior with active formatting elements and PLAINTEXT #8009

@securityMB

The HTML spec has the following note in the part of the tree construction section that covers PLAINTEXT:

Once a start tag with the tag name "plaintext" has been seen, that will be the last token ever seen other than character tokens (and the end-of-file token), because there is no way to switch out of the PLAINTEXT state.

This is not true: a spec-compliant parser can still create an element inside <plaintext>. Consider the following HTML:

<p><a><plaintext>x

It will create the following DOM tree:

└─ #document
   └─ html
      ├─ head
      └─ body
         ├─ p
         │  └─ a
         └─ plaintext
            └─ a
               └─ #text: x

I find this quite unexpected, and it doesn't happen with raw text elements such as <xmp> or <style>. The difference between <xmp> and <plaintext> is that the steps for the <xmp> start tag include "Reconstruct the active formatting elements" before the element is inserted, while for <plaintext> the active formatting elements are only reconstructed when the first character token is processed, after <plaintext> has already been inserted.

I looked for potential security implications of this behavior but couldn't find any. Still, I find the behavior surprising and believe it should be fixed by adding "Reconstruct the active formatting elements" to the steps for the <plaintext> start tag as well.
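
To make the difference concrete, here is a minimal Python sketch of the steps involved. It is a toy model, not a real parser: it takes a list of start tags plus one run of text, and it only models the stack of open elements, the list of active formatting elements, "close a p element", and a simplified "reconstruct" step (no markers, no adoption agency algorithm). The names `ToyTreeBuilder`, `build`, and the `fix_plaintext` flag are hypothetical and exist only for illustration; `fix_plaintext=True` models the change proposed above.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def dump(self):
        # Serialize the subtree as name(child,child,...)
        if not self.children:
            return self.name
        return self.name + "(" + ",".join(c.dump() for c in self.children) + ")"


class ToyTreeBuilder:
    """Toy model of a few "in body" steps; an assumption-laden sketch."""

    def __init__(self, fix_plaintext=False):
        self.body = Node("body")
        self.stack = [self.body]    # stack of open elements
        self.active = []            # list of active formatting elements
        self.text_mode = False      # True once the "text" insertion mode is entered
        self.fix_plaintext = fix_plaintext  # hypothetical flag for the proposed fix

    def insert(self, node, push=True):
        self.stack[-1].children.append(node)
        if push:
            self.stack.append(node)

    def reconstruct(self):
        # Simplified: re-create every trailing entry that is no longer open.
        i = len(self.active)
        while i > 0 and self.active[i - 1] not in self.stack:
            i -= 1
        for j in range(i, len(self.active)):
            clone = Node(self.active[j].name)
            self.insert(clone)
            self.active[j] = clone

    def close_p(self):
        # Pop elements up to and including the nearest open <p>, if any.
        if any(n.name == "p" for n in self.stack):
            while self.stack.pop().name != "p":
                pass

    def start_tag(self, name):
        if name == "p":
            self.close_p()
            self.insert(Node("p"))
        elif name == "a":
            self.reconstruct()
            node = Node("a")
            self.insert(node)
            self.active.append(node)
        elif name == "xmp":
            self.close_p()
            self.reconstruct()      # present in the <xmp> steps
            self.insert(Node("xmp"))
            self.text_mode = True   # raw text parsing switches the insertion mode
        elif name == "plaintext":
            self.close_p()
            if self.fix_plaintext:
                self.reconstruct()  # the step this issue proposes to add
            self.insert(Node("plaintext"))
            # Insertion mode is unchanged; only the tokenizer state switches.
        else:
            raise ValueError("tag not modeled: " + name)

    def characters(self, data):
        if not self.text_mode:
            self.reconstruct()      # "in body" character handling reconstructs first
        self.insert(Node("#text: " + data), push=False)


def build(tags, text, fix_plaintext=False):
    builder = ToyTreeBuilder(fix_plaintext)
    for tag in tags:
        builder.start_tag(tag)
    builder.characters(text)
    return builder.body.dump()


print(build(["p", "a", "plaintext"], "x"))
print(build(["p", "a", "xmp"], "x"))
print(build(["p", "a", "plaintext"], "x", fix_plaintext=True))
```

Under these assumptions, the three calls print `body(p(a),plaintext(a(#text: x)))`, `body(p(a),a(xmp(#text: x)))`, and `body(p(a),a(plaintext(#text: x)))`. The first matches the DOM tree shown above; the third shows <plaintext> ending up inside the reconstructed <a>, the same shape <xmp> already produces.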
