Proposal: Replace the token queue with an event-handler system #403

@fb55

Why

The token queue adds a level of indirection that makes some issues harder to fix. For example, #292 is easy to fix once the token queue is gone. Debugging is also complicated at the moment, because stack traces end at the token queue.

With the queue gone, stack traces will point at the corresponding line in the tokenizer. V8 will also be able to optimise more aggressively: in my branch that combines all of the changes, I see a ~15% performance increase on htmlparser-benchmark.

Game plan

  1. Update the tokenizer to produce events. A QueuedTokenizer class will wrap the tokenizer and provide an interface for the parser. Opened as refactor(tokenizer): Introduce events #404
  2. Invert event processing in the parser. The parser currently checks the insertion mode first, and then the token type. By inverting this (checking the token type first, then the insertion mode), we prepare the parser to accept the events from (1). Opened as refactor(parser): Invert event processing #405
  3. Tie everything together. Have the updated parser from (2) consume the tokenizer events from (1). Opened as refactor(parser): Consume tokenizer events #419
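
To make step (1) concrete, here is a minimal sketch of an event-producing tokenizer plus a queue wrapper around it. All names below (`TokenHandler`, `EventTokenizer`, `getNextToken`, the token shapes) are assumptions made up for illustration and are not taken from the linked PRs; the toy tokenizer only understands attribute-less tags and text.

```typescript
// Hypothetical token shapes; the real tokenizer has more kinds and more fields.
type Token =
    | { type: 'startTag'; tagName: string }
    | { type: 'endTag'; tagName: string }
    | { type: 'characters'; chars: string };

// One callback per token kind, invoked synchronously by the tokenizer.
interface TokenHandler {
    onStartTag(tagName: string): void;
    onEndTag(tagName: string): void;
    onCharacters(chars: string): void;
}

// Toy tokenizer: recognises `<name>`, `</name>` and plain text only.
class EventTokenizer {
    private readonly handler: TokenHandler;

    constructor(handler: TokenHandler) {
        this.handler = handler;
    }

    write(html: string): void {
        const pattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)/g;
        let match: RegExpExecArray | null;
        while ((match = pattern.exec(html)) !== null) {
            if (match[3] !== undefined) {
                // Called directly, so a stack trace points into this method.
                this.handler.onCharacters(match[3]);
            } else if (match[1] === '/') {
                this.handler.onEndTag(match[2]);
            } else {
                this.handler.onStartTag(match[2]);
            }
        }
    }
}

// Wrapper that buffers events so a pull-based consumer can still ask for
// "the next token" while the tokenizer itself only emits events.
class QueuedTokenizer implements TokenHandler {
    private readonly queue: Token[] = [];
    private readonly tokenizer: EventTokenizer;

    constructor() {
        this.tokenizer = new EventTokenizer(this);
    }

    write(html: string): void {
        this.tokenizer.write(html);
    }

    // Returns undefined once the buffered tokens are exhausted.
    getNextToken(): Token | undefined {
        return this.queue.shift();
    }

    onStartTag(tagName: string): void {
        this.queue.push({ type: 'startTag', tagName });
    }

    onEndTag(tagName: string): void {
        this.queue.push({ type: 'endTag', tagName });
    }

    onCharacters(chars: string): void {
        this.queue.push({ type: 'characters', chars });
    }
}

const queued = new QueuedTokenizer();
queued.write('<p>Hi</p>');
const first = queued.getNextToken(); // { type: 'startTag', tagName: 'p' }
```

In this sketch the wrapper is just another handler, so the tokenizer never needs to know whether its events are buffered or consumed directly.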

(1) and (2) do not depend on one another and can be merged independently.
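
The inversion in step (2) can be illustrated with a reduced example. This is only a sketch under stated assumptions: the two insertion modes, the two token types, and the returned handler labels are placeholders, and the real parser has many more of each.

```typescript
// Hypothetical, heavily reduced parser state and token set.
type InsertionMode = 'inHead' | 'inBody';
type InputToken =
    | { type: 'startTag'; tagName: string }
    | { type: 'characters'; chars: string };

// Before: a single entry point that checks the insertion mode first,
// then the token type. It needs a token object to inspect.
function processModeFirst(mode: InsertionMode, token: InputToken): string {
    if (mode === 'inHead') {
        return token.type === 'startTag'
            ? `startTagInHead:${token.tagName}`
            : `charactersInHead:${token.chars}`;
    }
    return token.type === 'startTag'
        ? `startTagInBody:${token.tagName}`
        : `charactersInBody:${token.chars}`;
}

// After: one entry point per token type, each checking the insertion mode.
// These methods have the same shape as tokenizer event callbacks.
class TypeFirstParser {
    mode: InsertionMode;

    constructor(mode: InsertionMode) {
        this.mode = mode;
    }

    onStartTag(tagName: string): string {
        return this.mode === 'inHead'
            ? `startTagInHead:${tagName}`
            : `startTagInBody:${tagName}`;
    }

    onCharacters(chars: string): string {
        return this.mode === 'inHead'
            ? `charactersInHead:${chars}`
            : `charactersInBody:${chars}`;
    }
}

const bodyParser = new TypeFirstParser('inBody');
const viaEvents = bodyParser.onStartTag('div');
const viaQueue = processModeFirst('inBody', { type: 'startTag', tagName: 'div' });
// Both are 'startTagInBody:div': the dispatch order changes, the chosen handler does not.
```

Both forms pick the same handler for every (mode, token type) pair; the type-first form simply exposes entry points that a tokenizer could call without going through a queue.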

cc @wooorm @43081j
