
nodeLabel truncates in the middle of Unicode surrogate pairs #11697

@wildlyinaccurate

Description


Provide the steps to reproduce

  1. Run Lighthouse (LH) on https://www.yfood.eu/ with JSON output (e.g. --output=json)

What is the current behavior?

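The label audit result for the .c-regionswitch__select--footer element has a nodeLabel that was truncated in the middle of a UTF-16 surrogate pair, leaving a lone high surrogate (\ud83c) at the end: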

{
  "node": {
    "type": "node",
    "selector": ".c-regionswitch__select--footer",
    "path": "1,HTML,1,BODY,1,DIV,4,DIV,0,FOOTER,1,DIV,0,DIV,0,SELECT",
    "snippet": "<select class=\"c-regionswitch__select c-regionswitch__select--footer\">",
    "explanation": "Fix any of the following:\n  aria-label attribute does not exist or is empty\n  aria-labelledby attribute does not exist, references elements that do not exist or references elements that are empty\n  Form element does not have an implicit (wrapped) <label>\n  Form element does not have an explicit <label>\n  Element has no title attribute or the title attribute is empty",
    "nodeLabel": "🇩🇪 Deutschland\n🇬🇧 United Kingdom\n🇵🇱 Polska\n🇳🇱 Nederland\n🇫🇷 France\n🇨\ud83c"
  }
}

Not all JSON parsers accept this. For example, PHP's json_decode function fails with the error "Single unpaired UTF-16 surrogate".

Edit: Go's json.Unmarshal also has problems with it.
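A minimal sketch of how this output can arise, assuming the label is cut by counting UTF-16 code units (as String.prototype.slice does). This is an illustration, not Lighthouse's actual truncation code, and naiveTruncate is a hypothetical name. The flag 🇨🇭 is chosen only as an example: each regional-indicator symbol is one code point stored as a surrogate pair, so keeping three code units leaves a lone high surrogate, which JSON.stringify then writes out as the escape \ud83c, the same shape as the nodeLabel above.

```typescript
// Hypothetical naive truncation (not Lighthouse's code): counts UTF-16 code
// units, so it can stop between a high and a low surrogate.
function naiveTruncate(text: string, maxLength: number): string {
  return text.length > maxLength ? text.slice(0, maxLength) : text;
}

// "🇨🇭" is two code points (U+1F1E8, U+1F1ED), each a surrogate pair,
// so the string is 4 UTF-16 code units long.
const flag = "🇨🇭";
const cut = naiveTruncate(flag, 3);

console.log(cut.length);                     // 3
console.log(cut.charCodeAt(2).toString(16)); // d83c (a lone high surrogate)

// Well-formed JSON.stringify (ES2019+) escapes the lone surrogate.
console.log(JSON.stringify(cut));            // "🇨\ud83c"
```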

What is the expected behavior?

When truncating strings, surrogate pairs should be kept intact: the cut should land on a code point boundary so the result is valid Unicode and can be parsed by any JSON parser.
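One way to get that behavior, sketched under the assumption that the length limit is measured in UTF-16 code units: if the last unit that would be kept is a high surrogate, drop it too, so a pair is never split. The helper names are hypothetical and not part of Lighthouse's API.

```typescript
// Hypothetical helper: true for code units 0xD800 to 0xDBFF, i.e. the first
// half of a UTF-16 surrogate pair.
function isHighSurrogate(codeUnit: number): boolean {
  return codeUnit >= 0xd800 && codeUnit <= 0xdbff;
}

// Truncate to at most `maxLength` UTF-16 code units without splitting a
// surrogate pair. Assumes `text` is itself well-formed UTF-16.
function truncatePreservingSurrogates(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  let end = Math.max(0, maxLength);
  // If the last kept unit is a high surrogate, its low half would be cut off,
  // so back off by one and drop the whole pair instead.
  if (end > 0 && isHighSurrogate(text.charCodeAt(end - 1))) {
    end -= 1;
  }
  return text.slice(0, end);
}

// Usage: a limit of 3 keeps only the first regional indicator instead of
// leaving a lone surrogate behind.
const safe = truncatePreservingSurrogates("🇨🇭", 3);
console.log(JSON.stringify(safe)); // "🇨"
```

This keeps every code point whole, but it can still split a multi-code-point grapheme such as a flag (🇨🇭 becomes 🇨). If that matters, Intl.Segmenter with granularity "grapheme" can be used to cut at grapheme boundaries instead.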

Environment Information

  • Affected Channels: CLI
  • Lighthouse version: 6.4.1
  • Operating System: Ubuntu 20.10 (Linux 5.8.0)
