How would I parse character references as literal bytes and not codepoints? #667

@Dekkonot

I have an element like this:

<element>&#240;&#159;&#152;&#131;</element>

If those character references are interpreted as literal bytes, they form the byte sequence f0 9f 98 83, which is the UTF-8 encoding of U+1F603, or 😃. Instead, each reference is expanded as a codepoint (U+00F0, U+009F, U+0098, U+0083), so the text comes out as c3 b0 c2 9f c2 98 c2 83 (this sequence is not printable, but you may inspect it here).
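To make the difference concrete, here is a small standalone sketch of the two interpretations. It uses only the Rust standard library and no XML parser; the names `REFS`, `as_codepoints`, and `as_bytes` are made up for illustration.

```rust
/// Values taken from the character references in the example element.
const REFS: [u8; 4] = [240, 159, 152, 131];

/// Interpret each value as a Unicode codepoint (U+0000..=U+00FF),
/// which is how XML defines numeric character references.
fn as_codepoints(values: &[u8]) -> String {
    values.iter().map(|&b| char::from(b)).collect()
}

/// Interpret each value as one raw byte of a UTF-8 sequence.
fn as_bytes(values: &[u8]) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(values.to_vec())
}

fn main() {
    // Codepoint interpretation: four characters, eight bytes once encoded as UTF-8.
    let codepoints = as_codepoints(&REFS);
    println!("{:02x?}", codepoints.as_bytes());

    // Byte interpretation: four bytes that decode to a single character.
    let text = as_bytes(&REFS).expect("the four bytes are valid UTF-8");
    println!("{}", text);
}
```

The first line printed should be the eight-byte sequence described above, and the second should be the single emoji.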

This is very much how it is meant to work, and I am aware of that. Unfortunately, the format was not my decision and is not under my control. So I'd like to know if there's an obvious way to change how escapes are handled, without having to manually iterate through the bytes returned by a Text event.
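For reference, the manual fallback I'd like to avoid looks roughly like the sketch below. It assumes the raw, still-escaped content of the text node is available as a byte slice; `decode_refs_as_bytes` is a hypothetical helper, not part of any library, and it leaves named entities such as `&amp;` untouched.

```rust
/// Hypothetical helper: expands numeric character references (`&#NNN;` or
/// `&#xHH;`) in `raw` into single bytes and copies every other byte through.
/// Returns `None` for a malformed reference or a value above 255.
fn decode_refs_as_bytes(raw: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(raw.len());
    let mut i = 0;
    while i < raw.len() {
        if raw[i..].starts_with(b"&#") {
            // Locate the `;` that terminates this reference.
            let end = i + raw[i..].iter().position(|&b| b == b';')?;
            let digits = std::str::from_utf8(&raw[i + 2..end]).ok()?;
            // A leading `x` marks a hexadecimal reference.
            let value = match digits.strip_prefix('x') {
                Some(hex) => u8::from_str_radix(hex, 16).ok()?,
                None => digits.parse::<u8>().ok()?,
            };
            out.push(value);
            i = end + 1;
        } else {
            out.push(raw[i]);
            i += 1;
        }
    }
    Some(out)
}

fn main() {
    let raw = b"&#240;&#159;&#152;&#131;";
    let bytes = decode_refs_as_bytes(raw).expect("well-formed references");
    let text = String::from_utf8(bytes).expect("valid UTF-8");
    println!("{}", text);
}
```

This works on the example element, but it re-implements escape handling by hand, which is exactly what I'm hoping the parser can do for me.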
