Incorrect decoding of branch of size 1 with a value that's already encoded #852

@paramonov

Hello there, I ran into a serious issue while using the cheerio library.
I first opened an issue against parse5, but after further investigation I realised the problem is here.

I see two different behaviours for legacy entities in the same situation, depending on the entity:

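// Assumption: decodeHTML is the function exported by the "entities" package.
import { decodeHTML } from "entities";

const texts = [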
    "&cent<", // legacy
    "&nbsp<", // legacy
    "&middot<", // legacy
    "&ensp<", // not legacy
];

for (const text of texts) {
    console.log(`"${text}"`, `"${decodeHTML(text)}"`);
}

I'd expect the following output:

"&cent<" "¢<"
"&nbsp<" " <"
"&middot<" "·<"
"&ensp<" "&ensp<"

but the actual output is:

"&cent<" "¢<"
"&nbsp<" "Á"
"&middot<" "·<"
"&ensp<" "&ensp<"

I tried to fix it myself, but it would take me too long because I lack experience with byte-level operations.
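
For reference, here is a minimal sketch of the rule behind the expected output above: a legacy entity decodes even without its trailing semicolon, while a non-legacy one is left untouched unless the semicolon is present. The `ENTITIES` table and the `decodeNamedEntities` helper are hypothetical and written only for illustration; this is not the library's implementation, it covers only the four entities from this report, and it ignores prefix matching (for example `&centfoo`) and attribute-specific rules.

```javascript
// Hypothetical four-entry table for illustration only; not the library's data.
const ENTITIES = new Map([
    ["cent", { value: "\u00A2", legacy: true }],
    ["nbsp", { value: "\u00A0", legacy: true }],
    ["middot", { value: "\u00B7", legacy: true }],
    ["ensp", { value: "\u2002", legacy: false }],
]);

// Hypothetical helper: decodes named entities found in text content.
function decodeNamedEntities(input) {
    return input.replace(/&([a-zA-Z]+)(;?)/g, (match, name, semicolon) => {
        const entry = ENTITIES.get(name);
        // Unknown names are left exactly as written.
        if (entry === undefined) return match;
        // Without a semicolon, only legacy entities may be decoded.
        if (semicolon === "" && !entry.legacy) return match;
        return entry.value;
    });
}

for (const text of ["&cent<", "&nbsp<", "&middot<", "&ensp<"]) {
    console.log(JSON.stringify(text), JSON.stringify(decodeNamedEntities(text)));
}
```

With this sketch, `&nbsp<` becomes a non-breaking space (U+00A0) followed by `<`, the same way `&cent<` and `&middot<` already decode, and `&ensp<` stays unchanged.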
