Add get an encoder and encode or fail for URLs#238
Conversation
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Depends on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.
andreubotella
left a comment
There was a problem hiding this comment.
Some of these changes depend on #237 (comment).
encoding.bs
Outdated
| <div class=note> | ||
| <p>In addition to the <a>decode</a>, <a>BOM sniff</a>, and <a>encode</a> algorithms below, | ||
| standards needing these legacy hooks will most likely also need to use <a>get an encoding</a> (to | ||
| turn a <a>label</a> into an <a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an |
There was a problem hiding this comment.
| turn a <a>label</a> into an <a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an | |
| turn a <a>label</a> into an <a for=/>encoding</a> instance) and <a>get an output encoding</a> (to turn an |
There was a problem hiding this comment.
I filed #240 on this. I'd like to go for a more minimal approach and can post a PR after this lands if that's okay.
| <ol> | ||
| <li><p>Assert: <var>encoding</var> is not <a>replacement</a> or <a>UTF-16BE/LE</a>. | ||
|
|
||
| <li><p>Return <var>encoding</var>'s <a for=/>encoder</a>. |
There was a problem hiding this comment.
| <li><p>Return <var>encoding</var>'s <a for=/>encoder</a>. | |
| <li><p>Return a new instance of <var>encoding</var>'s <a for=/>encoder</a>. |
There's a difference between an encoder (an "encoder class", so to speak) and an encoder instance, which has state. This hook should also be renamed to "get an encoder instance".
See also #237 (comment).
| </ol> | ||
|
|
||
| <p>To <dfn export>encode or fail</dfn> an I/O queue of scalar values <var>ioQueue</var> given an | ||
| <a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these |
There was a problem hiding this comment.
| <a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these | |
| <a for=/>encoder</a> instance <var>encoderInstance</var> and an I/O queue of bytes <var>output</var>, run these |
|
|
||
| <ol> | ||
| <li><p>Let <var>potentialError</var> be the result of <a>running</a> <var>encoder</var> with | ||
| <var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>". |
There was a problem hiding this comment.
| <var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>". | |
| <var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>". | |
| <li><p><a for="I/O queue">Push</a> <a>end-of-queue</a> into <var>encoder</var>. |
Needed so the conversion to a byte sequence in whatwg/url#558 doesn't hang.
There was a problem hiding this comment.
I did this, but adjusted the wording slightly and pushed into output instead.
encoding.bs
Outdated
| </ol> | ||
|
|
||
| <div class=note> | ||
| <p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> alive as |
There was a problem hiding this comment.
| <p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> alive as | |
| <p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> instace alive as |
|
I thought of an alternative that seems slightly nicer, which is that we push the error itself into the output I/O queue. That still gives the caller freedom to deal with errors in whatever way they want, but preserves the existing contract of encode (other than having to pass a different error mode) and avoids having to hand out encoders. And if the remaining caller of encode really is text/plain (I need to double check that) it might well make sense to do away with "html" entirely. |
97ddfa3 to
beecec9
Compare
|
I seemed to remember that HTML also specified some encoding for multipart/form-data, and indeed, although it doesn't use the "encode" algorithm, what it describes is equivalent to encoding with replacement. I'm not very keen on the idea of returning an I/O queue of bytes or errors – I'd prefer encode to take an error-handling callback that can push into the output I/O queue, especially since multipart/form-data and text/plain could use the same callback which would be different from the one in the URL standard. But this isn't a blocker for me. |
|
Pushing into the output queue is no good (unless you get to push errors in, but at that point...). URL needs to process non-errors differently from errors. I suppose it could use the callback to do something else, but that seems worse than the current alternatives to me. |
|
Whoops, excuse my brain fart there. In that case, I'm still not too keen on it, but it seems to be the best way to solve this, so that's fine by me. |
|
I much prefer the concept of an encoder instance over the concept of error objects intermingling in a queue with bytes, because the former is closer to actual implementation concepts. |
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.
Please excuse the branch name. The first commit is #237 and I will rebase this once @andreubotella has looked at that.
I suspect some changes might be needed here around my IO queue usage. In particular, in this scenario we probably do want to push end-of-queue upon encountering an error, to make conversion to bytes easier for the caller.
Preview | Diff