[Gecko Bug 1701828] meta charset rewrite. #31927
Merged
moz-wptsync-bot merged 1 commit into master on Dec 9, 2021
wpt-pr-bot (Collaborator) approved these changes on Dec 7, 2021 and left a comment:
The review process for this patch is being conducted in the Firefox project.
Implements whatwg/html#6962. Improves performance when <meta charset> occurs in head but after the first kilobyte and aligns behavior better with WebKit and Blink.

The main change is to avoid reloads when meta appears within head but after the first kilobyte. Prior to this change, Gecko reloaded in that case (in compliance with the spec!) even though WebKit and Blink did not.

Differences from WebKit and Blink:

* WebKit and Blink honor <meta charset> in <noscript>. This implementation does not.
* WebKit and Blink look for meta as if the tree builder was unaware of foreign content. This implementation is foreign content-aware. This makes a difference for CDATA sections that contain a > before the meta as well as for style and script elements within foreign content. This could happen if the CDATA section that has mysteriously been introduced around what looks like a meta tag also contains another, prior tag-looking run of text.
* This implementation processes rel=preload and speculative loads that are seen before <meta charset> has been seen. WebKit and Blink instead first look for the meta and rewind before starting speculative parsing.
* Unlike WebKit, if there is neither an honored meta nor syntax resembling an XML declaration, detection from content takes place (as in Blink).
* Unlike Blink, if there is neither an honored meta nor syntax resembling an XML declaration, the detection from content does not depend on network buffer boundaries.
* Unlike Blink, detection from content can trigger a reload at the end of the stream if the guess made at that point differs from the first guess. (See below for the definition of the input to the first guess.)

Differences from the old spec and Gecko previously:

* Meta inside script and RCDATA elements is no longer honored.
* Late meta is now ignored and no longer triggers a reload.
* Later meta counts as early enough meta: In addition to the previous meta within the first 1024 bytes, now a meta that started within the first 1024 bytes counts as early enough. Additionally, if by then there hasn't been a template start tag and head hasn't ended, meta occurring before the earlier of the end of the head or a template start tag counts as early enough.
* Meta now counts as not-late even if the encoding label has numeric character reference escapes.
* Syntax resembling an XML declaration longer than a kilobyte is honored if there is no honored meta.
* If there is neither an honored meta nor syntax resembling an XML declaration, the initial chardetng scan is potentially longer than before: the first 1024 bytes, the token spanning the 1024-byte boundary if there is such a token, and, if by then head hasn't ended and there hasn't been a template start tag, until the end of the template start tag or the end of the token that causes head to end, whichever comes first. However, if the token implying the end of the head is a text token, only the bytes up to the end of the previous non-text token are considered. (This definition avoids depending on network buffer boundaries.)
* XML View Source now uses the code for syntax resembling an XML declaration instead of expat for extracting the internal encoding label.

Reftests are added as both WPT and Gecko reftests in order to test both http: and file: URL scenarios. The Gecko tests retain the WPT <link> tags in order to use the exact same bytes.

An encoding declaration has been added to a number of old tests that didn't intend to test the new speculation behavior, especially in the context of https://bugzilla.mozilla.org/show_bug.cgi?id=1727750 .

Differential Revision: https://phabricator.services.mozilla.com/D125808
bugzilla-url: https://bugzilla.mozilla.org/show_bug.cgi?id=1701828
gecko-commit: 9a8abd87cc7935f29b94248c1a6f8203faa14403
gecko-reviewers: smaug
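The "early enough" rule for meta described above can be illustrated with a small sketch. This is not Gecko's code: the function name, the constant, and the byte-offset model are hypothetical, and the sketch assumes the meta is otherwise eligible (for example, not inside script, RCDATA, or noscript). A `None` offset means the corresponding event never occurs in the stream.

```python
from typing import Optional

# Hypothetical name; the description refers to "the first 1024 bytes".
PRESCAN_LIMIT = 1024


def meta_is_early_enough(
    meta_start: int,
    head_end: Optional[int],
    template_start: Optional[int],
) -> bool:
    """Return True if a meta starting at byte offset `meta_start` would be honored.

    `head_end` is the offset at which head ends and `template_start` is the
    offset of the first template start tag (None if it never happens).
    Illustrative model only, not the actual parser logic.
    """
    # A meta that starts within the first 1024 bytes always counts.
    if meta_start < PRESCAN_LIMIT:
        return True
    # Past the first kilobyte, the meta must come before the earlier of
    # the end of head or a template start tag.
    boundaries = [o for o in (head_end, template_start) if o is not None]
    if not boundaries:
        # Head never ended and no template start tag was seen.
        return True
    return meta_start < min(boundaries)
```

Under this model, a meta at offset 5000 is honored when head ends at offset 6000, but is treated as late (and ignored rather than triggering a reload) when a template start tag appeared at offset 2000.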
00129e4 to c399bd1
Differential Revision: https://phabricator.services.mozilla.com/D125808
bugzilla-url: https://bugzilla.mozilla.org/show_bug.cgi?id=1701828
gecko-commit: 3dfd3c94a105e095aada0b356f1106370de222d3
gecko-reviewers: smaug