Incorrect behaviour on HTML entities without ending semicolon #581

@paramonov

Description

Parsing started to behave incorrectly in v7; it worked correctly in v6.

How to reproduce:

const cheerio = require("cheerio");

const html = `<div class="contacts">
    <h3>Contacts</h3>
        <p>Website:&nbsp<a href="http://some.link/here" target="_blank">some.link</a></p>
        <p>Address:&nbsp<span>some address</span></p>
    </div>`;
const $ = cheerio.load(html);

const contacts = $("div.contacts p")
  .map((i, p) => $(p).text())
  .get();

console.log(contacts);

Expected output:

[
  'Website: some.link',
  'Address: some address'
]

Actual output:

[
  'Website:Áa href="http://some.link/here" target="_blank">some.link',
  'Address:Áspan>some address'
]
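
Possible workaround until this is fixed: add the missing semicolon before handing the markup to the parser. This is only a sketch. `terminateNbsp` is a hypothetical helper, not part of cheerio, and it assumes that `&nbsp` is the only unterminated entity in the input and that it never appears inside attribute values (for example in a query string), where rewriting it would change the value.

```javascript
// Hypothetical helper (not a cheerio API): appends the missing ";" to
// "&nbsp" references so the parser receives a terminated entity.
function terminateNbsp(html) {
  // Match "&nbsp" only when it is not already followed by ";".
  return html.replace(/&nbsp(?!;)/g, "&nbsp;");
}

const fixed = terminateNbsp(
  '<p>Website:&nbsp<a href="http://some.link/here" target="_blank">some.link</a></p>'
);
console.log(fixed);
```

The result can then be passed to `cheerio.load` in place of the original string. Entities that are already terminated are left untouched, so running the helper twice gives the same result.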
