Add get an encoder and encode or fail for URLs by annevk · Pull Request #238 · whatwg/encoding

annevk · 2020-10-21T11:01:58Z

Please excuse the branch name. The first commit is #237 and I will rebase this once @andreubotella has looked at that.

I suspect some changes might be needed here around my IO queue usage. In particular, in this scenario we probably do want to push end-of-queue upon encountering an error, to make conversion to bytes easier for the caller.


encoding.bs

Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Depends on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.
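To make the statefulness concrete, here is a minimal sketch using Python's `iso2022_jp` codec as a stand-in for the Encoding Standard's ISO-2022-JP encoder. That substitution is an assumption, and the two are not guaranteed to agree on every input. The names `new_encoder`, `held`, and `fresh` are illustrative only.

```python
import codecs

# Assumption: Python's "iso2022_jp" codec stands in for the Encoding
# Standard's ISO-2022-JP encoder; the two may differ on edge cases.
new_encoder = codecs.getincrementalencoder("iso2022_jp")

held = new_encoder(errors="strict")
first = held.encode("あ")            # emits ESC $ B, then the two-byte code
second = held.encode("い")           # same instance: no escape needed
tail = held.encode("", final=True)   # flush: ESC ( B switches back to ASCII

# A new instance starts in the ASCII state, so it emits the escape again.
fresh = new_encoder(errors="strict").encode("い")
```

The bytes produced for the same code point depend on what the instance has already seen, which is why a caller that encodes piecewise has to hold onto one instance.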

andreubotella

(posted by mistake, the actual review is below)

andreubotella

Some of these changes depend on #237 (comment).

andreubotella · 2020-10-21T22:54:13Z

encoding.bs

+<div class=note>
+ <p>In addition to the <a>decode</a>, <a>BOM sniff</a>, and <a>encode</a> algorithms below,
+ standards needing these legacy hooks will most likely also need to use <a>get an encoding</a> (to
+ turn a <a>label</a> into an <a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an


Suggested change

- turn a <a>label</a> into an <a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an
+ turn a <a>label</a> into an <a for=/>encoding</a> instance) and <a>get an output encoding</a> (to turn an

I filed #240 on this. I'd like to go for a more minimal approach and can post a PR after this lands if that's okay.


andreubotella · 2020-10-21T22:56:09Z

encoding.bs

+<ol>
+ <li><p>Assert: <var>encoding</var> is not <a>replacement</a> or <a>UTF-16BE/LE</a>.
+
+ <li><p>Return <var>encoding</var>'s <a for=/>encoder</a>.


Suggested change

- <li><p>Return <var>encoding</var>'s <a for=/>encoder</a>.
+ <li><p>Return a new instance of <var>encoding</var>'s <a for=/>encoder</a>.

There's a difference between an encoder (an "encoder class", so to speak) and an encoder instance, which has state. This hook should also be renamed to "get an encoder instance".

See also #237 (comment).

andreubotella · 2020-10-21T22:56:24Z

encoding.bs

+</ol>
+
+<p>To <dfn export>encode or fail</dfn> an I/O queue of scalar values <var>ioQueue</var> given an
+<a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these


Suggested change

- <a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these
+ <a for=/>encoder</a> instance <var>encoderInstance</var> and an I/O queue of bytes <var>output</var>, run these

andreubotella · 2020-10-21T22:57:54Z

encoding.bs

+
+<ol>
+ <li><p>Let <var>potentialError</var> be the result of <a>running</a> <var>encoder</var> with
+ <var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".


Suggested change

- <var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".
+ <var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".
+ <li><a for="I/O queue">Push</a> <a>end-of-queue</a> into <var>encoder</var>.

Needed so the conversion to a byte sequence in whatwg/url#558 doesn't hang.

I did this, but adjusted the wording slightly and pushed into output instead.

Whoops, nice catch.
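To illustrate why the missing end-of-queue matters, here is a minimal sketch, not the spec's algorithm. `END_OF_QUEUE` and `to_byte_sequence` are hypothetical names, and a `deque` stands in for an I/O queue. A synchronous sketch cannot wait for more items, so it raises where a real read would keep waiting.

```python
from collections import deque

END_OF_QUEUE = object()  # hypothetical sentinel for the end-of-queue item

def to_byte_sequence(queue: deque) -> bytes:
    """Drain a queue of byte values up to the end-of-queue sentinel."""
    out = bytearray()
    while True:
        if not queue:
            # A real I/O queue read would wait for more items here, which is
            # the hang described above; this sketch raises instead.
            raise RuntimeError("no end-of-queue: a real read would keep waiting")
        item = queue.popleft()
        if item is END_OF_QUEUE:
            return bytes(out)
        out.append(item)

terminated = deque([0x61, 0x62, END_OF_QUEUE])
result = to_byte_sequence(terminated)
```

Pushing end-of-queue into `output` after an error, as was done in the PR, gives the conversion a defined stopping point.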

andreubotella · 2020-10-21T22:58:15Z

encoding.bs

+</ol>
+
+<div class=note>
+ <p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> alive as


Suggested change

- <p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> alive as
+ <p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> instance alive as

annevk · 2020-10-22T06:57:55Z

I thought of an alternative that seems slightly nicer, which is that we push the error itself into the output I/O queue. That still gives the caller freedom to deal with errors in whatever way they want, but preserves the existing contract of encode (other than having to pass a different error mode) and avoids having to hand out encoders. And if the remaining caller of encode really is text/plain (I need to double check that) it might well make sense to do away with "html" entirely.

Fixes #235.
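A rough sketch of this alternative, under stated assumptions: `EncodeError`, `encode_with_errors`, and `percent_encode_all` are hypothetical names, Python's `cp1252` codec stands in for a legacy encoding, and every byte is percent-encoded for simplicity (a real URL caller would leave some bytes as-is). The `%26%23…%3B` form for errors is chosen for illustration.

```python
import codecs
from dataclasses import dataclass
from typing import Iterator, Union

@dataclass(frozen=True)
class EncodeError:
    """Hypothetical error item carrying the code point that failed to encode."""
    code_point: int

def encode_with_errors(text: str, label: str) -> Iterator[Union[int, EncodeError]]:
    """Yield byte values, with an EncodeError in place of each failure."""
    # The encoder instance never leaves this function.
    encoder = codecs.getincrementalencoder(label)(errors="strict")
    for ch in text:
        try:
            yield from encoder.encode(ch)
        except UnicodeEncodeError:
            yield EncodeError(ord(ch))
    yield from encoder.encode("", final=True)

def percent_encode_all(text: str, label: str) -> str:
    """Caller side: bytes and errors take different paths."""
    parts = []
    for item in encode_with_errors(text, label):
        if isinstance(item, EncodeError):
            # "&#NNNN;" in percent-encoded form, chosen here for illustration.
            parts.append(f"%26%23{item.code_point}%3B")
        else:
            parts.append(f"%{item:02X}")
    return "".join(parts)

encoded = percent_encode_all("a€☃", "cp1252")
```

The caller decides what an error becomes, and no encoder is handed out.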

andreubotella · 2020-10-22T13:00:24Z

I seemed to remember that HTML also specified some encoding for multipart/form-data, and indeed, although it doesn't use the "encode" algorithm, what it describes is equivalent to encoding with replacement.

I'm not very keen on the idea of returning an I/O queue of bytes or errors – I'd prefer encode to take an error-handling callback that can push into the output I/O queue, especially since multipart/form-data and text/plain could use the same callback which would be different from the one in the URL standard. But this isn't a blocker for me.
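For comparison, the callback idea might look roughly like the sketch below. `encode`, `ErrorHandler`, and `form_data_handler` are hypothetical names, and Python's stateless `cp1252` codec is assumed so that writing ASCII bytes straight into the output is safe. A stateful encoder such as ISO-2022-JP would need more care.

```python
import codecs
from typing import Callable

# Hypothetical callback type: receives the failing code point and the output.
ErrorHandler = Callable[[int, bytearray], None]

def encode(text: str, label: str, on_error: ErrorHandler) -> bytes:
    """Encode text, delegating each unencodable code point to on_error."""
    encoder = codecs.getincrementalencoder(label)(errors="strict")
    output = bytearray()
    for ch in text:
        try:
            output.extend(encoder.encode(ch))
        except UnicodeEncodeError:
            on_error(ord(ch), output)
    output.extend(encoder.encode("", final=True))
    return bytes(output)

def form_data_handler(code_point: int, output: bytearray) -> None:
    """Shared by multipart/form-data and text/plain in this sketch."""
    output.extend(f"&#{code_point};".encode("ascii"))

body = encode("caf\u00e9 \u2603", "cp1252", form_data_handler)
```

Because the handler writes into the same output as ordinary bytes, the caller can no longer tell the two apart afterwards.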

annevk · 2020-10-22T13:05:26Z

Pushing into the output queue is no good (unless you get to push errors in, but at that point...). URL needs to process non-errors differently from errors. I suppose it could use the callback to do something else, but that seems worse than the current alternatives to me.

andreubotella · 2020-10-22T13:11:14Z

Whoops, excuse my brain fart there. In that case, I'm still not too keen on it, but it seems to be the best way to solve this, so that's fine by me.

hsivonen · 2020-10-22T16:34:14Z

I much prefer the concept of an encoder instance over the concept of error objects intermingling in a queue with bytes, because the former is closer to actual implementation concepts.
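A minimal sketch of the encoder-instance approach, assuming Python's `iso2022_jp` incremental encoder as a stand-in and assuming a failed `encode` call leaves the instance usable. `encode_or_fail` here is a hypothetical analogue of the spec hook, not its definition.

```python
import codecs
from collections import deque
from typing import Optional

def encode_or_fail(io_queue: deque, encoder_instance, output: bytearray) -> Optional[int]:
    """Encode until the queue is empty or a code point fails.

    Returns the failing code point, or None once everything was encoded.
    """
    while io_queue:
        ch = io_queue.popleft()
        try:
            output.extend(encoder_instance.encode(ch))
        except UnicodeEncodeError:
            return ord(ch)  # remaining input stays queued for the next call
    output.extend(encoder_instance.encode("", final=True))
    return None

# The caller keeps one instance alive across calls.
encoder_instance = codecs.getincrementalencoder("iso2022_jp")(errors="strict")
io_queue = deque("あ\U0001F600い")
output = bytearray()
errors = []
while (error := encode_or_fail(io_queue, encoder_instance, output)) is not None:
    errors.append(error)  # a URL-style caller would emit its own escape here
```

Errors come back as return values, so the output only ever holds bytes, at the cost of the caller owning the instance's lifetime.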

Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.

annevk requested a review from hsivonen October 21, 2020 11:01

hsivonen suggested changes Oct 21, 2020


annevk mentioned this pull request Oct 21, 2020

Encoding: ISO-2022-JP encoder "SO/SI ESC" test web-platform-tests/wpt#26158

Merged

andreubotella mentioned this pull request Oct 21, 2020

Reorganize hooks for standards #237

Merged

andreubotella requested changes Oct 21, 2020


Add get an encoder and encode or fail for URLs

beecec9

Fixes #235.

annevk force-pushed the annevk/hooks-for-standards branch from 97ddfa3 to beecec9 Compare October 22, 2020 10:25

annevk mentioned this pull request Oct 23, 2020

Talk about instances of decoders and encoders #240

Closed

address some of the review feedback

4632a05

annevk requested a review from hsivonen October 23, 2020 11:54

hsivonen approved these changes Oct 23, 2020


annevk merged commit c55584b into master Oct 23, 2020

annevk deleted the annevk/hooks-for-standards branch October 23, 2020 14:44

annevk mentioned this pull request Oct 23, 2020

Refactor query state to operate on a buffer whatwg/url#558

Merged

3 tasks

