Module: [email protected]
Repro steps (Linux):
$ cd $(mktemp --directory)
$ npm i parse5-sax-parser
[... npm output installing [email protected] ...]
$ node << 'EOF'
const Parser = require('parse5-sax-parser')
const p = new Parser()
p.on('comment', (c) => console.log(c))
p.end('<!--test <<-->')
EOF
{ text: 'test <!', sourceCodeLocation: undefined }
Expected: test <<, not test <!.
Rationale: I find this behaviour confusing because the HTML spec on comments does not limit how < can be used inside comment text. A comment containing two consecutive less-than signs should be legal, but currently cannot be represented.
Analysis:
I have only skimmed the source and don't know the details, but it looks wrong to me that the tokeniser switches state from COMMENT_STATE to COMMENT_LESS_THAN_SIGN_STATE when it encounters <. COMMENT_LESS_THAN_SIGN_STATE then treats a following < as ! and produces the odd output shown above.
Surely COMMENT_STATE represents the state where we're inside the text part of the comment? The only non-error references out of there should be to itself (parsing more content) or to COMMENT_END_DASH_STATE, right?
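For comparison, here is a minimal sketch of the comment states as I read them in the HTML spec's tokenization section. It is not parse5's code: the function name readCommentData and the state names are made up for illustration, the input is assumed to be the text right after "<!--", the comment-start states are skipped (so it assumes the body does not begin with - or >), and the "<!" bang states are collapsed because they only exist to report a nested-comment parse error. If this reading is right, the spec does leave the comment state on <, but the less-than-sign state appends the current character, so a second < should be stored as < rather than !.

```javascript
// Simplified sketch of the spec's comment tokenizer states (assumption: my
// reading of the HTML spec; names are illustrative, not taken from parse5).
function readCommentData(input) {
  let state = 'comment';
  let data = '';
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    switch (state) {
      case 'comment':
        i++;
        if (ch === '<') { data += ch; state = 'commentLessThanSign'; }
        else if (ch === '-') { state = 'commentEndDash'; }
        else { data += ch; }
        break;
      case 'commentLessThanSign':
        if (ch === '!') { data += ch; state = 'comment'; i++; } // bang states collapsed
        else if (ch === '<') { data += ch; i++; }               // append the current character
        else { state = 'comment'; }                             // reconsume in comment state
        break;
      case 'commentEndDash':
        if (ch === '-') { state = 'commentEnd'; i++; }
        else { data += '-'; state = 'comment'; }                // reconsume
        break;
      case 'commentEnd':
        if (ch === '>') return data;                            // comment token emitted
        if (ch === '!') { state = 'commentEndBang'; i++; }
        else if (ch === '-') { data += '-'; i++; }
        else { data += '--'; state = 'comment'; }               // reconsume
        break;
      case 'commentEndBang':
        if (ch === '>') return data;                            // "--!>" also closes the comment
        if (ch === '-') { data += '--!'; state = 'commentEndDash'; i++; }
        else { data += '--!'; state = 'comment'; }              // reconsume
        break;
    }
  }
  return data; // end of input inside the comment
}

console.log(readCommentData('test <<-->')); // prints: test <<
```

With the repro input from above, this sketch keeps both less-than signs, which matches the expected output.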