Conversation
|
Also need to test this in regular parsing mode (without |
Done. |
| #data | ||
| <math></p><foo> | ||
| #errors | ||
| 10: HTML end tag “p” in a foreign namespace context. | ||
| #document | ||
| | <html> | ||
| | <head> | ||
| | <body> | ||
| | <math math> | ||
| | <p> | ||
| | <foo> |
There was a problem hiding this comment.
This test agrees with Firefox but I'm not entirely sure I see how the spec gives this parsing. (I assume the others are similar, this is just the one I happened to look into.)
As I understand it, the <math> should go through a few insertion modes, inserting html, head, and body elements before being processed in the "in body" insertion mode. The rules say to insert a math element in the MathML namespace.
The </p> is handled by the rules for parsing in a foreign context under the "any other end tag." Step 2 gives a parse error. In step 6, node (the body element) is an element in the HTML namespace so step 7 says to process according to the rules in the current insertion mode which is still "in body."
The rules for </p> in the "in body" insertion mode say that if there isn't a p element in button scope (and there is not), then this is another parse error (which is missing from this test) and to insert a p element.
Inserting the p should happen at the appropriate place for inserting a node. No override target is specified so target is the current node (the math element). Foster parenting isn't enabled so the "adjusted insertion location [is] inside target, after its last child (if any)."
As far as I can tell, this should cause the following errors.
- expected doctype token
- unexpected end tag in foreign content (from the parsing in foreign context)
- unexpected end tag (from in body)
- unexpected EOF (because
foois not closed)
And the DOM should have the p and foo elements as children of math.
There must be something wrong with my analysis, but I can't figure out what it is. Any help is appreciated.
There was a problem hiding this comment.
After reading your analysis, I'm no longer convinced the test is correct either, but maybe you just managed to confuse me :)
(I didn't check the number of parse errors before.)
There was a problem hiding this comment.
The test agrees with Firefox, Safari, and Chrome. I must be missing something. The Gumbo parser (at least the updated version that appears in Nokogumbo and now in Nokogiri) parses based on my understanding and puts the p and foo elements as children of math.
I just checked html5ever. It doesn't have an up-to-date html5lib-tests submodule so I added the test and it produces the same output as Gumbo (including putting the foo element in the MathML namespace):
input: <math></p><foo>
got:
| <html>
| <head>
| <body>
| <math math>
| <p>
| <math foo>
expected:
| <html>
| <head>
| <body>
| <math math>
| <p>
| <foo>
Could this be a spec bug?
There was a problem hiding this comment.
I filed an issue in whatwg/html. I suspect I'm just misreading the spec, but I'm not sure where.
There was a problem hiding this comment.
See also whatwg/html#5113 and whatwg/html#6736
I believe your reading is correct - and that it was pointed out that this created a dangerous loophole for exploiting sanitizer round tripping. I'm not certain but I think maybe due to the nature of the problem vendors and some libraries may have addressed it before the spec was actually updated to reflect the necessary change.
|
Oh, I merged this before the corresponding spec PR; apologies for the resulting confusion. |
This is because of a change in the HTML5 parsing spec to special case closing </p> and </br> tags in foreign content (svg, mathml). See whatwg/html#5113 and html5lib/html5lib-tests#135
This is because of a change in the HTML5 parsing spec to special case closing </p> and </br> tags in foreign content (svg, mathml). See whatwg/html#5113 and html5lib/html5lib-tests#135
See whatwg/html#6736