Bug: Inside comments, << is parsed as <! #325

@anko

Module: parse5-sax-parser (version as installed by the npm command below)

Repro steps (Linux):

$ cd $(mktemp --directory)
$ npm i parse5-sax-parser
[... npm output installing [email protected] ...]
$ node << 'EOF'
const Parser = require('parse5-sax-parser')
const p = new Parser()
p.on('comment', (c) => console.log(c))
p.end('<!--test <<-->')
EOF
{ text: 'test <!', sourceCodeLocation: undefined }

Expected: test <<, not test <!.

Rationale: I find the above behaviour confusing because the HTML spec on comments does not limit how < can be used inside comments. A comment containing two consecutive less-than signs should be legal, but currently the parser cannot represent it: the second < comes out as !.

Analysis:

I've only just walked into the source and don't know the details, but it appears wrong to me that the tokeniser switches state from COMMENT_STATE to COMMENT_LESS_THAN_SIGN_STATE when encountering <. COMMENT_LESS_THAN_SIGN_STATE then appends ! instead of < when it sees a second <, which causes the weird output seen above.

Surely COMMENT_STATE represents the state where we're inside the text part of the comment? The only non-error references out of there should be to itself (parsing more content) or to COMMENT_END_DASH_STATE, right?
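
Whether or not a separate less-than state is warranted, the behaviour I'd expect is that every < ends up in the comment text unchanged. Here is a minimal sketch of that handling. The function name `readCommentText` and the string state labels are hypothetical and are not parse5's internals; the input is assumed to start right after `<!--`; the `--!>` ending and all parse-error reporting are left out.

```javascript
// Hypothetical helper: returns the comment text for input that starts
// right after "<!--". Only a simplified subset of comment states is modelled.
function readCommentText(input) {
  let state = 'comment'
  let text = ''
  let i = 0

  while (i < input.length) {
    const ch = input[i]

    switch (state) {
      case 'comment':
        if (ch === '<') {
          text += ch
          state = 'commentLessThanSign'
        } else if (ch === '-') {
          state = 'commentEndDash'
        } else {
          text += ch
        }
        i++
        break

      case 'commentLessThanSign':
        if (ch === '<') {
          // Append the character actually seen, i.e. '<' and not '!'.
          text += ch
          i++
        } else if (ch === '!') {
          // Simplification: nested-comment detection is skipped here.
          text += ch
          state = 'comment'
          i++
        } else {
          // Reconsume the same character in the plain comment state.
          state = 'comment'
        }
        break

      case 'commentEndDash':
        if (ch === '-') {
          state = 'commentEnd'
          i++
        } else {
          // A lone '-' is ordinary text; reconsume the current character.
          text += '-'
          state = 'comment'
        }
        break

      case 'commentEnd':
        if (ch === '>') return text
        if (ch === '-') {
          text += '-'
          i++
        } else {
          // "--" not followed by '>' is ordinary text; reconsume.
          text += '--'
          state = 'comment'
        }
        break
    }
  }

  // Input ended before "-->"; return what was collected.
  return text
}

console.log(readCommentText('test <<-->'))
```

With this handling the repro input yields `test <<`, which matches the expected result above.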
