script: Prescan byte stream to determine encoding before parsing document#41376
simonwuelker merged 9 commits into servo:main
| [in-script.html]
|   expected: FAIL
These charset/in-* tests fail because the prescanning algorithm doesn't know that it should ignore things like <script><meta charset="windows-1251"></script>. I'll fix this in a followup, since this PR is large enough already.
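As a rough illustration of the follow-up described above, here is a minimal sketch of a scanner that skips over `<script>` contents before looking for a `<meta charset>` declaration. This is not Servo's prescan code and not the specification's algorithm (which also deals with comments, other attributes, `http-equiv`, and more); the function name `prescan_charset` and the plain substring matching are assumptions made purely for illustration.

```rust
/// Hypothetical, heavily simplified prescan: returns the (lowercased) label of
/// the first `<meta charset="...">` that is not inside a `<script>` element.
fn prescan_charset(bytes: &[u8]) -> Option<String> {
    const META: &str = "<meta charset=\"";
    const SCRIPT_END: &str = "</script>";

    // Lowercase a lossy copy so that the matching is ASCII case-insensitive.
    let text = String::from_utf8_lossy(bytes).to_ascii_lowercase();
    let mut rest = text.as_str();

    loop {
        let meta = rest.find(META);
        let script = rest.find("<script");
        match (meta, script) {
            // A script starts before the next meta tag: skip past its end tag.
            // If the script is never closed, nothing after it is considered.
            (_, Some(s)) if meta.map_or(true, |m| s < m) => {
                let end = rest[s..].find(SCRIPT_END)?;
                rest = &rest[s + end + SCRIPT_END.len()..];
            }
            // The next meta tag is outside of any script: extract its value.
            (Some(m), _) => {
                let value = &rest[m + META.len()..];
                let close = value.find('"')?;
                return Some(value[..close].to_string());
            }
            // No meta tag left in the buffered bytes.
            (None, _) => return None,
        }
    }
}
```

With this sketch, the input from the comment above yields no encoding, while a `<meta charset>` that follows the closing `</script>` is still picked up.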
7047f44 to 2766716
mrobinson left a comment
Amazing work @simonwuelker!
| if let Some(encoding) = encoding_detector.buffer(chunk) {
|     document.set_encoding(encoding);
|     let buffered_bytes = mem::take(&mut encoding_detector.buffered_bytes);
|     *self = Self::Decoding(NetworkDecoder {
|         decoder: Some(LossyDecoder::new_encoding_rs(
|             encoding,
|             NetworkSink::default(),
|         )),
|         encoding,
|     });
|     return self.push(&buffered_bytes, document);
| }
|
| None
Suggested change:
| - if let Some(encoding) = encoding_detector.buffer(chunk) {
| -     document.set_encoding(encoding);
| -     let buffered_bytes = mem::take(&mut encoding_detector.buffered_bytes);
| -     *self = Self::Decoding(NetworkDecoder {
| -         decoder: Some(LossyDecoder::new_encoding_rs(
| -             encoding,
| -             NetworkSink::default(),
| -         )),
| -         encoding,
| -     });
| -     return self.push(&buffered_bytes, document);
| - }
| - None
| + encoding_detector.buffer(chunk).map(|encoding| {
| +     document.set_encoding(encoding);
| +     let buffered_bytes = mem::take(&mut encoding_detector.buffered_bytes);
| +     *self = Self::Decoding(NetworkDecoder {
| +         decoder: Some(LossyDecoder::new_encoding_rs(
| +             encoding,
| +             NetworkSink::default(),
| +         )),
| +         encoding,
| +     });
| +     return self.push(&buffered_bytes, document);
| + })
Assigning to *self in the closure doesn't seem to be accepted by the borrow checker, because it also needs encoding_detector.buffered_bytes, which is borrowed from self.
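To make the borrow-checker point concrete, here is a minimal sketch of the `if let` shape that does compile. The `Detector` and `State` types and the four-byte threshold are hypothetical stand-ins, not the types from this PR; only the borrowing pattern is meant to carry over.

```rust
use std::mem;

// Hypothetical stand-in for the encoding detector.
struct Detector {
    buffered_bytes: Vec<u8>,
}

impl Detector {
    /// Buffers `chunk` and reports a made-up decision once four bytes arrived.
    fn buffer(&mut self, chunk: &[u8]) -> Option<&'static str> {
        self.buffered_bytes.extend_from_slice(chunk);
        (self.buffered_bytes.len() >= 4).then_some("utf-8")
    }
}

// Hypothetical stand-in for the decoder state machine.
enum State {
    Detecting(Detector),
    Decoding { encoding: &'static str, seen: Vec<u8> },
}

impl State {
    /// Returns the number of bytes handed to the decoder so far, if decoding.
    fn push(&mut self, chunk: &[u8]) -> Option<usize> {
        match self {
            State::Decoding { seen, .. } => {
                seen.extend_from_slice(chunk);
                Some(seen.len())
            }
            State::Detecting(detector) => {
                if let Some(encoding) = detector.buffer(chunk) {
                    // Last use of `detector`: its borrow of `*self` ends here.
                    let buffered = mem::take(&mut detector.buffered_bytes);
                    // With that borrow finished, replacing `*self` is allowed.
                    *self = State::Decoding { encoding, seen: Vec::new() };
                    return self.push(&buffered);
                }
                None
            }
        }
    }
}
```

In the `map` variant, the closure would have to hold both the borrow of the detector (which points into `*self`) and mutable access to `*self` for its whole lifetime, which is presumably why it is rejected. In the statement form, the detector borrow simply ends after `mem::take`, before `*self` is assigned.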
|   expected: ERROR
| [WebGL test #3]
|   expected: FAIL
Likely an unrelated intermittent issue that I missed in my try run (:
2766716 to 586a9f2
586a9f2 to 768d35a
🔨 Triggering try run (#20350147469) for Linux (WPT)
768d35a to eabd62b
Test results for linux-wpt from try job (#20350147469):
Flaky unexpected result (24)
Stable unexpected results that are known to be intermittent (31)
✨ Try run (#20350147469) succeeded.
…ment (#41376)

Servo currently completely ignores `<meta charset>` tags. When we find one with an encoding that is incompatible with the current one, we should reload the page and start over with the new encoding. A common optimization, which has even made its way into the specification, is to wait for a few bytes to arrive and inspect them for `meta` tags, so the browser is able to use the correct encoding from the very beginning.

In practice, I've run into problems with our WPT harness when reloading the page after `meta` tags. Therefore, this change implements the optimization first, so we never have to reload when running WPT. I've implemented prescanning in a way where we wait for 1024 bytes to arrive or for one second to pass, whichever happens first.

This causes a large number of web platform tests to flip around. I've looked at most of the new failures and I believe they're reasonable.

Testing: New tests start to pass.
Part of #6414

---------

Signed-off-by: Simon Wülker <[email protected]>
eabd62b to ea7333d
Found a source of intermittency (22403fc) and fixing it gives us another 300 passes :)
🔨 Triggering try run (#20364703193) for Linux (WPT)
Test results for linux-wpt from try job (#20364703193):
Flaky unexpected result (22)
Stable unexpected results that are known to be intermittent (33)
Stable unexpected results (17)
ea7333d to 004ab4f
Servo currently completely ignores `<meta charset>` tags. When we find one with an encoding that is incompatible with the current one, we should reload the page and start over with the new encoding. A common optimization, which has even made its way into the specification, is to wait for a few bytes to arrive and inspect them for `meta` tags, so the browser is able to use the correct encoding from the very beginning.

In practice, I've run into problems with our WPT harness when reloading the page after `meta` tags. Therefore, this change implements the optimization first, so we never have to reload when running WPT. I've implemented prescanning in a way where we wait for 1024 bytes to arrive or for one second to pass, whichever happens first.

This causes a large number of web platform tests to flip around. I've looked at most of the new failures and I believe they're reasonable.
Testing: New tests start to pass.
Part of #6414
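
The "1024 bytes or one second, whichever happens first" rule from the description can be sketched roughly as follows. This is an illustrative assumption, not the code in this PR: `PrescanBuffer`, its constants, and the explicit `now` parameter are hypothetical names chosen to keep the example deterministic. A real implementation would presumably also need a timer so that the deadline fires even when no further chunk arrives, which this sketch does not model.

```rust
use std::time::{Duration, Instant};

// Thresholds taken from the description; the names are hypothetical.
const PRESCAN_BYTES: usize = 1024;
const PRESCAN_TIMEOUT: Duration = Duration::from_secs(1);

/// Collects incoming bytes until there is enough data (or enough time has
/// passed) to run the encoding prescan.
struct PrescanBuffer {
    started: Instant,
    bytes: Vec<u8>,
}

impl PrescanBuffer {
    fn new(now: Instant) -> Self {
        PrescanBuffer { started: now, bytes: Vec::new() }
    }

    /// Appends `chunk` and reports whether the prescan should run now.
    /// `now` is passed in explicitly so the behavior does not depend on
    /// the wall clock.
    fn push(&mut self, chunk: &[u8], now: Instant) -> bool {
        self.bytes.extend_from_slice(chunk);
        self.bytes.len() >= PRESCAN_BYTES
            || now.duration_since(self.started) >= PRESCAN_TIMEOUT
    }
}
```

Here a small chunk arriving right away keeps the buffer waiting, whereas either a full 1024 bytes or a chunk arriving after the one-second deadline makes `push` return `true`.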