javascript URL: define JS string-to-byte conversion better #1129

Response bodies are byte sequences, but the javascript: URL algorithm (as of #1107) uses a JS string as the response body.

#### Black-box testing plan

At https://github.com/whatwg/html/pull/1107#discussion_r60952976 Boris proposed a test plan that would let us determine the string-to-byte conversion in a black-box way:

> In terms of test matrix, if we need to determine this in a black-box way, it seems to me that the following are somewhat useful cases to test:
> 1. Return string is all ASCII (charCodeAt() < 128 for all indices).
> 2. Return string has charCodeAt() < 256 for all indices, but does not fall into case 1.
> 3. Return string does not have any charCodeAt values corresponding to UTF-16 surrogate code unit values, but does not fall into cases 1 or 2.
> 4. Return string has surrogate code units which are all paired properly.
> 5. Return string has unpaired surrogate code units.
> 
> Each of these should be tested in situations in which the source of the javascript: URL is either UTF-8 or ISO-8859-1/Windows-1252.  That is, either an iframe in a document with that encoding with src pointing to a javascript: URL, or a link in a document with that encoding with href pointing to a javascript: URL.  Probably test both scenarios.
> 
> The tests should look for the following things:
> 1. What is the `document.body.textContent` of the resulting document?
> 2. What is the `document.charset` of the resulting document?
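
The five string cases depend only on the UTF-16 code units of the returned string, so a test harness can check which case a candidate string exercises before using it. Below is a minimal TypeScript sketch under that assumption; `classifyReturnString`, `buildMatrix`, and `MatrixEntry` are hypothetical helpers, not part of any spec or browser API. `buildMatrix` enumerates the 5 × 2 × 2 combinations of string case, source encoding, and embedding (iframe `src` vs. link `href`) suggested above.

```typescript
// Hypothetical helpers for the black-box test matrix; not a spec or browser API.
type MatrixCase = 1 | 2 | 3 | 4 | 5;

interface MatrixEntry {
  stringCase: MatrixCase;
  sourceEncoding: string; // encoding of the document hosting the javascript: URL
  embedding: string;      // how the javascript: URL is navigated to
}

function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// Classify a string by its UTF-16 code units (charCodeAt values).
function classifyReturnString(s: string): MatrixCase {
  let maxUnit = 0;
  let sawSurrogate = false;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (isHighSurrogate(unit)) {
      sawSurrogate = true;
      if (i + 1 < s.length && isLowSurrogate(s.charCodeAt(i + 1))) {
        i++; // properly paired: skip the low surrogate
        continue;
      }
      return 5; // high surrogate with no low surrogate after it
    }
    if (isLowSurrogate(unit)) {
      return 5; // low surrogate with no high surrogate before it
    }
    maxUnit = Math.max(maxUnit, unit);
  }
  if (sawSurrogate) return 4; // surrogates present, all paired
  if (maxUnit < 128) return 1; // all ASCII (vacuously true for "")
  if (maxUnit < 256) return 2; // Latin-1 range, but not all ASCII
  return 3; // some unit above 255, no surrogates
}

// Every string case crossed with both source encodings and both embeddings.
function buildMatrix(): MatrixEntry[] {
  const cases: MatrixCase[] = [1, 2, 3, 4, 5];
  // The Encoding Standard maps the "iso-8859-1" label to windows-1252.
  const encodings: string[] = ["UTF-8", "windows-1252"];
  const embeddings: string[] = ["iframe src", "link href"];
  const entries: MatrixEntry[] = [];
  for (const stringCase of cases) {
    for (const sourceEncoding of encodings) {
      for (const embedding of embeddings) {
        entries.push({ stringCase, sourceEncoding, embedding });
      }
    }
  }
  return entries;
}
```

With this ordering, a string whose surrogates are all paired falls into case 4 even if it also contains other characters above U+00FF, since case 3 requires the absence of surrogates.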

#### Relevant implementer reports
##### Gecko

From https://github.com/whatwg/html/pull/1107#discussion_r60949229

> What Gecko does in terms of conversion to bytes is that it examines the returned string to see whether all `charCodeAt()` values are 255 or less.  If so, the string is treated as byte-inflated ISO-8859-1 data (and a response is synthesized which has "ISO-8859-1" as its encoding, with the byte-deflated bytes as data).  This allows generation of non-text data, dating back to when we supported javascript: in `<img>`, say.
> 
> Otherwise the return value is treated as a sequence of UTF-16 code units encoding a Unicode string and converted to UTF-8 bytes (insert handwaving about what happens to unpaired surrogates here).  The synthesized response has "UTF-8" as its encoding.
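
The Gecko behavior described above can be sketched as follows. This is a minimal TypeScript sketch with hypothetical names (`geckoLikeStringToBody`, `SynthesizedBody`, `utf8EncodeCodeUnits`); it is not Gecko's code. Because the report leaves unpaired surrogates unspecified, the sketch assumes they are replaced with U+FFFD, which matches converting the string to a scalar value string before UTF-8 encoding.

```typescript
// Hypothetical sketch of the conversion described for Gecko; not Gecko's code.
interface SynthesizedBody {
  encoding: "ISO-8859-1" | "UTF-8";
  bytes: Uint8Array;
}

// UTF-8 encode a JS string from its UTF-16 code units.
function utf8EncodeCodeUnits(s: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    // Combine a high surrogate with a following low surrogate.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      }
    }
    // ASSUMPTION: an unpaired surrogate becomes U+FFFD.
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return new Uint8Array(out);
}

function geckoLikeStringToBody(s: string): SynthesizedBody {
  let allLatin1 = true;
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 255) {
      allLatin1 = false;
      break;
    }
  }
  if (allLatin1) {
    // "Byte-deflate": each code unit (0..255) becomes exactly one byte.
    const bytes = new Uint8Array(s.length);
    for (let i = 0; i < s.length; i++) bytes[i] = s.charCodeAt(i);
    return { encoding: "ISO-8859-1", bytes };
  }
  // Otherwise treat the string as UTF-16 and synthesize a UTF-8 body.
  return { encoding: "UTF-8", bytes: utf8EncodeCodeUnits(s) };
}
```

For example, `"café"` yields an ISO-8859-1 body with bytes `63 61 66 E9`, while `"€"` yields a UTF-8 body with bytes `E2 82 AC`.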

##### Blink

From https://github.com/whatwg/html/pull/1107#discussion_r60949637 with further analysis by Boris in https://github.com/whatwg/html/pull/1107#discussion_r60952976:

> I don't think Blink does conversion to bytes at all here.  See the FIXME comment in https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/DocumentWriter.cpp&l=75&ct=xref_jump_to_def&cl=GROK&gsn=appendReplacingData as called from https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/loader/DocumentLoader.cpp&sq=package:chromium&l=684&rcl=1461583037 (DocumentLoader::replaceDocumentWhileExecutingJavaScriptURL).

##### EdgeHTML

From https://github.com/whatwg/html/pull/1107#discussion_r60950447

> In Edge we specify two code pages for transformation. The first is the calculated code page which is always CP_UCS_2 which translates to Unicode, ISO 10646 according to comments. We then specify as the source code page CPSRC_NATIVEDATA which means native data, known to be CP_UCS_2 so don't allow any sort of fallbacks.