Conversation
domenic
left a comment
There was a problem hiding this comment.
Excited to see further work on data-driven tests!!
url/resources/percent-encoding.py
Outdated
| value = request.GET.first(b"value") | ||
| encoding = request.GET.first(b"encoding") | ||
|
|
||
| output_value = numeric_references(unquote(value).decode(b"utf-8")) |
There was a problem hiding this comment.
I worry a bit about all the steps of smuggling the value through URL parameters. If a browser doesn't do percent-encoding correctly, then probably it will fail to get the correct output_value, messing up the test.
I guess the assumption is that the browser does utf-8 percent encoding correctly?
If you wanted to avoid that assumption, you could use another encoding scheme, e.g. base64 encoding or maybe JSON.stringify.
There was a problem hiding this comment.
Yeah, that might be a good improvement here.
There was a problem hiding this comment.
It doesn't seem like the output of JSON.stringify is always ASCII so maybe I'll leave this as-is for now.
There was a problem hiding this comment.
base64 with atob/btoa and Python's equivalent seems like it'd work nicely?
There was a problem hiding this comment.
How does that work given the non-ASCII input?
There was a problem hiding this comment.
Ah, right, it's indeed a bit more complicated. https://github.com/jsdom/whatwg-url/blob/d5f65c2bef5d88f873203d33289ddfe13f2eee1f/live-viewer/live-viewer.js#L135-L150 is what I was thinking, but perhaps it's not worth it, especially given that you'd have to implement it in both Python and JS.
There was a problem hiding this comment.
I made this work, but it took longer than expected due to some weirdness from WPT.
domenic
left a comment
There was a problem hiding this comment.
Interesting that every browser seems to fail some of these... but different ones.
url/resources/percent-encoding.py
Outdated
| value = request.GET.first(b"value") | ||
| encoding = request.GET.first(b"encoding") | ||
|
|
||
| output_value = numeric_references(unquote(value).decode(b"utf-8")) |
There was a problem hiding this comment.
base64 with atob/btoa and Python's equivalent seems like it'd work nicely?
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.
|
This test case should probably be added as well, since the bug that makes Firefox fail on "−" with Shift_JIS is more general than that: |
| "windows-1252": "%86" | ||
| } | ||
| }, | ||
| "This uses a trailing A to prevent the URL parser from trimming the C0 control.", |
There was a problem hiding this comment.
It looks like in to-ascii.json we had a "comment" field inside each test case, which we could use for single-test case comments. Might be worth reusing here. (whatwg-url creates the test case description by appending the comment, if it exists.)
|
Still LGTM. I incorporated these into whatwg-url, although as discussed previously whatwg-url doesn't do non-UTF-8 encodings, so it's not as helpful as one might like. Still, it gives a basic sanity check, I guess. jsdom/whatwg-url#174 |
Refs: web-platform-tests/wpt#26317 PR-URL: #36032 Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Joyee Cheung <[email protected]>
Refs: web-platform-tests/wpt#26317 PR-URL: #36032 Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Joyee Cheung <[email protected]>
@domenic @rmisev @achristensen07 thoughts? (There are many ways to expand this to also cover form submission and such, but I thought I'd start out simple to ensure we agree on a common ground.)