Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dom: Always replace unpaired surrogates when handling page text #35381

Merged
merged 1 commit into from
Feb 9, 2025

Conversation

mrobinson
Copy link
Member

Background:

JavaScript strings are potentially ill-formed UTF-16 (arbitrary
Vec) and can contain unpaired surrogates. Rust’s String type is
well-formed UTF-8 and can not contain any surrogate. Surrogates are
never emitted when decoding bytes from the network, but they can sneak
in through document.write, the Element.innerHtml setter, or other DOM
APIs.

In 2015, Servo launched an experiment to see if unpaired surrogates
cropped up in page content. That experiment caused Servo to panic if
unpaired surrogates were encountered with a request to report the page
to bug #6564. During that time several pages were reported with unpaired
surrogates, causing Servo to panic. In addition, when running the WPT
tests Servo will never panic due to the -Z replace-surrogates option
being passed by the test driver.

Motivation:

After this 10 year experiment, it's clear that unpaired surrogates are a
real concern in page content. Several reports were filed of Servo
panicking after encountering them in real world pages. A complete fix for
this issue would be to somehow maintain unpaired surrogates in the DOM,
but that is a much larger task than simply emitting U+FFD instead of an
unpaired surrogate.

Since it is clear that this kind of content exists, it is better for
Servo to try its best to handle the content rather than crash as
production browsers should not crash due to user content when possible.
In this change, I modify Servo to always replace unpaired surrogates.

It would have been ideal to only crash when debug assertions are
enabled, but debug assertions are enabled by default in release mode --
so this wouldn't be effective for WPT tests.


  • ./mach build -d does not report any errors
  • ./mach test-tidy does not report any errors
  • There are tests for these changes

@servo-wpt-sync
Copy link
Collaborator

🤖 Opened new upstream WPT pull request (web-platform-tests/wpt#50594) with upstreamable changes.

@servo-wpt-sync
Copy link
Collaborator

📝 Transplanted new upstreamable changes to existing upstream WPT pull request (web-platform-tests/wpt#50594).

Copy link
Member

@jdm jdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

Background:

> JavaScript strings are potentially ill-formed UTF-16 (arbitrary
> Vec<u16>) and can contain unpaired surrogates. Rust’s String type is
> well-formed UTF-8 and can not contain any surrogate. Surrogates are
> never emitted when decoding bytes from the network, but they can sneak
> in through document.write, the Element.innerHtml setter, or other DOM
> APIs.

In 2015, Servo launched an experiment to see if unpaired surrogates
cropped up in page content. That experiment caused Servo to panic if
unpaired surrogates were encountered with a request to report the page
to bug servo#6564. During that time several pages were reported with unpaired
surrogates, causing Servo to panic. In addition, when running the WPT
tests Servo will never panic due to the `-Z replace-surrogates` option
being passed by the test driver.

Motivation:

After this 10 year experiment, it's clear that unpaired surrogates are a
real concern in page content. Several reports were filed of Servo
panicking after encountering them in real world pages. A complete fix for
this issue would be to somehow maintain unpaired surrogates in the DOM,
but that is a much larger task than simply emitting U+FFD instead of an
unpaired surrogate.

Since it is clear that this kind of content exists, it is better for
Servo to try its best to handle the content rather than crash as
production browsers should not crash due to user content when possible.
In this change, I modify Servo to always replace unpaired surrogates.

It would have been ideal to only crash when debug assertions are
enabled, but debug assertions are enabled by default in release mode --
so this wouldn't be effective for WPT tests.

Signed-off-by: Martin Robinson <[email protected]>
@servo-wpt-sync
Copy link
Collaborator

📝 Transplanted new upstreamable changes to existing upstream WPT pull request (web-platform-tests/wpt#50594).

@mrobinson mrobinson added this pull request to the merge queue Feb 9, 2025
Merged via the queue into servo:main with commit 75cf3d7 Feb 9, 2025
22 checks passed
@mrobinson mrobinson deleted the replace-surrogates branch February 9, 2025 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants