Releases: fb55/htmlparser2

v12.0.0

20 Mar 23:11

What's Changed

This release aligns HTML parsing with the WHATWG spec. Almost all changes apply to HTML mode only; XML mode is unaffected unless noted.

Raw-text & RCDATA tags

  • <iframe>, <noembed>, <noframes>, and <plaintext> are now raw-text tags; their content is no longer parsed as HTML
  • <textarea> now decodes entities in its content, as <title> already did
  • Self-closing <script/>, <style/>, etc. now enter their raw-text state (the / is ignored per spec) unless recognizeSelfClosing is enabled
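To make the rules above concrete, here is a minimal TypeScript sketch of how an HTML-mode tokenizer might pick a content model after an open tag. The function names, the ContentModel labels, and the tag sets are illustrative assumptions drawn from these notes, not htmlparser2's internal code; the recognizeSelfClosing argument stands in for the parser option of the same name.

```typescript
type ContentModel = "rawtext" | "rcdata" | "html" | "closed";

// Tag sets assembled from these release notes; not an exhaustive or official list.
const RAW_TEXT_TAGS = new Set(["script", "style", "iframe", "noembed", "noframes", "plaintext"]);
const RCDATA_TAGS = new Set(["title", "textarea"]);

// Hypothetical helper: decide how content after an HTML-mode open tag is tokenized.
function contentModelAfterOpenTag(
  tagName: string,
  selfClosing: boolean,
  recognizeSelfClosing: boolean,
): ContentModel {
  // With recognizeSelfClosing enabled, a trailing "/" closes the element immediately.
  if (selfClosing && recognizeSelfClosing) return "closed";
  // Otherwise the "/" is ignored, so <script/> still enters raw-text content.
  const name = tagName.toLowerCase();
  if (RAW_TEXT_TAGS.has(name)) return "rawtext";
  if (RCDATA_TAGS.has(name)) return "rcdata";
  return "html";
}

// Raw text is kept verbatim; RCDATA is text in which entities are decoded.
function decodesEntities(model: ContentModel): boolean {
  return model === "rcdata";
}
```

Only the RCDATA branch decodes entities, which is why <textarea> now matches <title> while <iframe> content stays verbatim.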

SVG & MathML

  • Tag names inside <svg> are case-adjusted per spec (foreignObject, clipPath, etc.)
  • CDATA sections inside foreign content are treated as text
  • Special-tag detection is disabled inside foreign content
  • Stray </svg> / </math> no longer corrupt the parser's context tracking
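The tag-name adjustment works from a lookup table in the WHATWG spec. The sketch below copies a handful of its entries rather than the full table; adjustTagName and its inSvg flag are hypothetical, and the sketch assumes names are lowercased before lookup.

```typescript
// A subset of the spec's SVG tag-name adjustment table (lowercase -> camelCase).
const SVG_TAG_ADJUSTMENTS: ReadonlyMap<string, string> = new Map([
  ["clippath", "clipPath"],
  ["fegaussianblur", "feGaussianBlur"],
  ["foreignobject", "foreignObject"],
  ["lineargradient", "linearGradient"],
  ["radialgradient", "radialGradient"],
  ["textpath", "textPath"],
]);

// Hypothetical helper: restore the canonical case of a tag name.
function adjustTagName(name: string, inSvg: boolean): string {
  const lower = name.toLowerCase();
  // Outside <svg>, HTML element names simply stay lowercase.
  if (!inSvg) return lower;
  // Inside <svg>, known names get their camelCase form; others stay lowercase.
  return SVG_TAG_ADJUSTMENTS.get(lower) ?? lower;
}
```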

Comments & declarations

  • <!-->, <!--->, <!->, <!> are now parsed as comments, per spec
  • <?…> and non-DOCTYPE <!…> in HTML mode emit bogus comments instead of being silently dropped
  • <!DOCTYPEhtml> (no space) is recognized as a DOCTYPE
  • Unclosed comments, <!DOCTYPE, <?…, <![CDATA[… at EOF emit the correct token type
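A rough sketch of how the start of a markup declaration can be classified under these rules. classifyMarkup and the MarkupKind labels are hypothetical; the spec itself treats <!-> and <!> as bogus comments, which still come out as comment tokens, and per the SVG & MathML notes a CDATA section in foreign content ends up emitted as text.

```typescript
type MarkupKind = "comment" | "bogus-comment" | "doctype" | "cdata";

// Hypothetical classifier for input starting at "<!" or "<?" in HTML mode.
function classifyMarkup(input: string, inForeignContent: boolean): MarkupKind | null {
  // Processing instructions are not part of HTML; they become bogus comments.
  if (input.startsWith("<?")) return "bogus-comment";
  if (!input.startsWith("<!")) return null;
  if (input.startsWith("<!--")) return "comment";
  // Case-insensitive, and no whitespace is required after the keyword (<!DOCTYPEhtml>).
  if (input.slice(2, 9).toLowerCase() === "doctype") return "doctype";
  // CDATA is only recognized inside foreign content (SVG/MathML).
  if (input.startsWith("<![CDATA[")) return inForeignContent ? "cdata" : "bogus-comment";
  // Everything else, including <!-> and <!>, becomes a bogus comment.
  return "bogus-comment";
}
```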

Implicit open/close

  • <h1>–<h6> implicitly close other headings
  • <a> closes a previous <a>
  • Nested <form> is ignored when one is already open
  • <image> is rewritten to <img> outside foreign content
  • </> is silently ignored instead of emitted as text
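The implicit open/close rules can be approximated with a stack of open element names. This is a deliberately simplified, hypothetical model (openTag is not an htmlparser2 function): the spec handles a repeated <a> through the list of active formatting elements and the adoption agency algorithm, which this sketch reduces to closing the earlier <a> and everything opened inside it.

```typescript
const HEADINGS = new Set(["h1", "h2", "h3", "h4", "h5", "h6"]);
// A few void elements; they are never pushed onto the stack.
const VOID_TAGS = new Set(["img", "br", "hr", "input", "meta", "link"]);

// Hypothetical, simplified model: returns the stack after opening `rawName`.
function openTag(stack: readonly string[], rawName: string, inForeign = false): string[] {
  const next = [...stack];
  // <image> is rewritten to <img>, but not inside SVG/MathML.
  const name = rawName === "image" && !inForeign ? "img" : rawName;

  // A nested <form> is ignored while another form is open.
  if (name === "form" && next.includes("form")) return next;

  // Opening a heading while the current node is a heading closes that heading.
  const current = next[next.length - 1];
  if (HEADINGS.has(name) && current !== undefined && HEADINGS.has(current)) {
    next.pop();
  }

  // Opening <a> closes an earlier <a> (and, in this sketch, its descendants).
  if (name === "a") {
    const previous = next.lastIndexOf("a");
    if (previous !== -1) next.splice(previous);
  }

  if (!VOID_TAGS.has(name)) next.push(name);
  return next;
}
```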

Other fixes

  • Fixed reset() not clearing attribute state, which could leak data across parseComplete() calls

#2387

Full Changelog: v11.0.0...v12.0.0

v11.0.0

19 Mar 11:23

Breaking Changes

  • The module is now ESM only #2381
    • CommonJS require() is no longer supported in legacy environments; use import instead.
    • The minimum Node.js version is now 20.19.0.
  • Dependencies have been bumped to their latest major versions: domhandler v6, domutils v4, domelementtype v3, entities v8.
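For projects upgrading to v11, a quick runtime check against the new minimum can be sketched as follows. meetsMinimum is a hypothetical helper, not part of htmlparser2; the comment shows the ESM import style that replaces require().

```typescript
// Hypothetical helper: does a "major.minor.patch" version meet a minimum?
function meetsMinimum(version: string, minimum = "20.19.0"): boolean {
  const parse = (v: string): number[] =>
    v
      .replace(/^v/, "")
      .split(/[.-]/)
      .slice(0, 3)
      .map((part) => Number.parseInt(part, 10) || 0);
  const have = parse(version);
  const need = parse(minimum);
  for (let i = 0; i < 3; i++) {
    const a = have[i] ?? 0;
    const b = need[i] ?? 0;
    if (a !== b) return a > b;
  }
  return true; // equal versions satisfy the minimum
}

// v11+ is ESM-only: load it with `import { Parser } from "htmlparser2";`, not require().
if (!meetsMinimum(process.versions.node)) {
  console.warn(`htmlparser2 v11+ expects Node.js >= 20.19.0, found ${process.versions.node}`);
}
```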

Features

  • Added WebWritableStream for the Web Streams API, enabling direct piping from fetch() response bodies into the parser #2376
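These notes don't show WebWritableStream's constructor, so instead of guessing at its signature, the sketch below builds a comparable sink by hand from standard Web Streams pieces (WritableStream, TextDecoder, Response), all globals in current Node.js. createTextSink and collect are hypothetical names; in real code, onText is where a parser's write() would be called.

```typescript
// Hypothetical sink: decodes UTF-8 byte chunks and passes each string to onText.
function createTextSink(onText: (chunk: string) => void): WritableStream<Uint8Array> {
  const decoder = new TextDecoder("utf-8");
  return new WritableStream<Uint8Array>({
    write(chunk) {
      // stream: true buffers an incomplete multi-byte sequence until the next chunk.
      const text = decoder.decode(chunk, { stream: true });
      if (text.length > 0) onText(text);
    },
    close() {
      // Flush anything the decoder is still holding.
      const rest = decoder.decode();
      if (rest.length > 0) onText(rest);
    },
  });
}

// Pipe a response body (here built locally instead of fetched) into the sink.
async function collect(html: string): Promise<string> {
  const parts: string[] = [];
  const body = new Response(html).body;
  if (body === null) throw new Error("Response has no body");
  await body.pipeTo(createTextSink((text) => parts.push(text)));
  return parts.join("");
}
```

The stream: true option is the important design choice: a fetch() body can split a multi-byte UTF-8 character across chunks, and the decoder holds the partial bytes until the rest arrive.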

Bug Fixes

  • Comments now accept --!> as a closing sequence per the HTML spec, and <!--> is recognized as an empty comment in HTML mode #2382
  • XML processing instructions (<?xml ... ?>) now require the full ?> closing sequence instead of just > #2382
  • Fixed reset() not clearing isSpecial and sequenceIndex state, which could cause incorrect parsing after reuse #2382
  • Fixed XML comment parsing: <!--> is no longer treated as a complete comment in xmlMode #2383
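The HTML-mode comment-closing rules above can be sketched as a small scanner. parseHtmlComment is hypothetical and covers HTML mode only; in xmlMode, per the last fix above, <!--> does not close the comment.

```typescript
// Hypothetical HTML-mode comment scanner for input that starts with "<!--".
// Returns the comment data and the index just past the closing sequence,
// or null when the input is not a comment or the comment is unclosed.
function parseHtmlComment(input: string): { data: string; end: number } | null {
  if (!input.startsWith("<!--")) return null;
  const start = 4;
  // <!--> and <!---> are abruptly closed empty comments in HTML mode.
  if (input.startsWith(">", start)) return { data: "", end: start + 1 };
  if (input.startsWith("->", start)) return { data: "", end: start + 2 };
  for (let i = start; i < input.length; i++) {
    // Both --> and --!> close a comment.
    if (input.startsWith("-->", i)) return { data: input.slice(start, i), end: i + 3 };
    if (input.startsWith("--!>", i)) return { data: input.slice(start, i), end: i + 4 };
  }
  return null; // unclosed at EOF: the caller decides what to emit
}
```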

Other Changes

  • Expanded README with full API reference, parser options, events, and practical examples #2384

Full Changelog: v10.1.0...v11.0.0

v10.1.0

21 Jan 14:21

What's Changed

  • entities was bumped from 6.0.1 to 7.0.1, bringing size & speed improvements #2215
  • Test files are no longer shipped in the published module 72da671

New Contributors

  • @KTibow made their first contribution, bumping us to eslint 9 in #2204

Full Changelog: v10.0.0...v10.1.0

v10.0.0

24 Dec 10:49

v9.1.0

05 Jan 11:06

v9.0.0

10 May 09:05

Breaking Changes

  • The tokenizer now uses the EntityDecoder from the entities module #1480
    • Parsing of entities in attributes is now aligned with the HTML spec, and some inputs will produce different results. For example, in <a href='&amp=boo'> the attribute value is no longer modified.
    • The ontextentity tokenizer callback now has an endIndex argument; if you use the tokenizer directly, check that your index handling still lines up.
  • Stacks inside the parser have been reversed. #1511
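The spec rule behind the attribute change: inside an attribute value, a named character reference that is not terminated by ; and is followed by = or an ASCII letter or digit is left as literal text, while in text content it would still be decoded. A minimal sketch with a four-entry table standing in for the full entity list; decodeAttributeValue is hypothetical and not the entities module's API.

```typescript
// Tiny stand-in for the full named-entity table (all four are legacy names
// that the spec also matches without a trailing semicolon).
const ENTITIES: Record<string, string> = { amp: "&", lt: "<", gt: ">", quot: '"' };

// Hypothetical decoder applying the attribute-value rule.
function decodeAttributeValue(value: string): string {
  let out = "";
  let i = 0;
  while (i < value.length) {
    if (value[i] !== "&") {
      out += value[i];
      i++;
      continue;
    }
    // Longest table entry that matches right after "&".
    let matched: string | undefined;
    for (const name of Object.keys(ENTITIES)) {
      if (value.startsWith(name, i + 1) && (matched === undefined || name.length > matched.length)) {
        matched = name;
      }
    }
    if (matched === undefined) {
      out += "&";
      i++;
      continue;
    }
    const afterName = i + 1 + matched.length;
    if (value[afterName] === ";") {
      out += ENTITIES[matched];
      i = afterName + 1;
      continue;
    }
    const next = value[afterName] ?? "";
    if (next === "=" || /[A-Za-z0-9]/.test(next)) {
      // Historical rule for attributes: keep "&name" as literal text.
      out += "&";
      i++;
      continue;
    }
    out += ENTITIES[matched];
    i = afterName;
  }
  return out;
}
```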

Features

  • Added a createDocumentStream function, analogous to createDomStream (which is now deprecated) #1510

Full Changelog: v8.0.2...v9.0.0

v8.0.2

22 Mar 23:43

Bug Fixes

  • Reset tokenizer baseState after closing tag name by @KillyMXI in #1460

Other changes

  • Dependency version bumps
  • GitHub Workflows security hardening by @sashashura in #1365
  • refactor(lint): Add eslint-plugin-n and -unicorn by @fb55 in #1352
  • chore(test): Move from JSON tests to specs by @fb55 in #1354
  • docs(readme): Use GitHub Actions CI badge by @fb55 in #1374

Full Changelog: v8.0.1...v8.0.2

v8.0.1

29 Apr 15:44

  • Added missing WritableStream export in the package.json 6923fca

Full Changelog: v8.0.0...v8.0.1

v8.0.0

23 Apr 11:54

Breaking

  • The deprecated FeedHandler class has been removed #1166
    • See #1166 for how to migrate.
  • TypeScript >= 4.5 is now required; see #1242
  • The types from domhandler and domutils have changed; the deprecated normalizeWhitespace option was removed #1164
  • The parser was updated to no longer concatenate strings. This led to several changes to internal interfaces. #1045
    • This reduces the memory overhead when parsing streams and avoids copying memory.
    • This is breaking if you were previously extending the parser's internals.
  • Parser.write() and Parser.end() now only accept string arguments. If you were previously passing a Buffer, convert it to a string first (e.g. parser.write(buffer.toString())), or use WritableStream, which handles decoding for you.
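One caveat with the buffer.toString() conversion: calling it on each chunk separately corrupts any multi-byte UTF-8 character that straddles a chunk boundary. A sketch using Node's StringDecoder, which carries partial sequences between chunks; the StringSink interface and feedBuffers helper are hypothetical stand-ins for a parser's write() and end().

```typescript
import { StringDecoder } from "node:string_decoder";

// Hypothetical stand-in for anything with a string-only write()/end().
interface StringSink {
  write(chunk: string): void;
  end(): void;
}

// Hypothetical helper: feed Buffer chunks to a string-only sink.
function feedBuffers(buffers: Iterable<Buffer>, sink: StringSink): void {
  const decoder = new StringDecoder("utf8");
  for (const buf of buffers) {
    // write() holds back an incomplete multi-byte sequence at the chunk edge.
    const text = decoder.write(buf);
    if (text.length > 0) sink.write(text);
  }
  // end() flushes whatever is still buffered.
  const rest = decoder.end();
  if (rest.length > 0) sink.write(rest);
  sink.end();
}
```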

Features

  • htmlparser2 is now a dual CommonJS & ESM module #1165

Full Changelog: v7.2.0...v8.0.0

v7.2.0

11 Nov 14:33

What's Changed

Docs:

  • docs(readme): make parseDocument() example clearer by @cameronsteele in #998

Refactors:

  • Introduce sequences & fast forwarding by @fb55 in #1007
  • Emit text before entities once entity is confirmed by @fb55 in #1009

The refactors led to a combined ~5% speed-up.
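The idea of matching fixed sequences incrementally, such as the end tag that terminates a raw-text element, can be sketched as below. This illustrates the general technique, not the tokenizer's actual code, and SequenceMatcher is a hypothetical class. Because progress is stored between chunks, it also shows why a reset() has to clear a field like sequenceIndex.

```typescript
// Hypothetical incremental matcher for a fixed sequence such as "</script".
// Progress survives between chunks, so a match split across two writes is found.
// The naive restart on mismatch is only correct for sequences whose first
// character does not reappear later in the sequence (true for "</script").
class SequenceMatcher {
  private readonly sequence: string;
  private sequenceIndex = 0;

  constructor(sequence: string) {
    this.sequence = sequence.toLowerCase();
  }

  /** Returns the index in `chunk` just past a completed match, or -1. */
  feed(chunk: string): number {
    for (let i = 0; i < chunk.length; i++) {
      const c = chunk.charAt(i).toLowerCase();
      if (c !== this.sequence.charAt(this.sequenceIndex)) {
        // Mismatch: start over, but this character may begin a new match.
        this.sequenceIndex = 0;
        if (c !== this.sequence.charAt(0)) continue;
      }
      this.sequenceIndex++;
      if (this.sequenceIndex === this.sequence.length) {
        this.sequenceIndex = 0;
        return i + 1;
      }
    }
    return -1;
  }

  /** Clears partial progress so it cannot leak into the next document. */
  reset(): void {
    this.sequenceIndex = 0;
  }
}
```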

New Contributors

  • @cameronsteele made their first contribution in #998

Full Changelog: v7.1.2...v7.2.0