Amend preload integrity check to match implementations by noamr · Pull Request #7738 · whatwg/html

noamr · 2022-03-22T13:53:52Z

In conjunction with whatwg/fetch#1418

At least two implementers are interested (and none opposed):
- Already implemented
Tests are written and can be reviewed and commented upon at:
- Add a few cases to preload SRI web-platform-tests/wpt#33326
Implementation bugs are filed:
- WebKit 238206
- Gecko 1751835
- Chromium 981419
  (See WHATWG Working Mode: Changes for more details.)

/infrastructure.html ( diff )
/links.html ( diff )
/semantics.html ( diff )

domenic

LGTM editorially, let us know when the template is filled out.

noamr · 2022-03-22T17:34:12Z

> LGTM editorially, let us know when the template is filled out.

Done

domenic · 2022-03-22T17:36:48Z

Hmm so how is it "already implemented" if everyone has lots of failing tests? Are we sure we have multiple implementers interested in converging on the tested behavior?

noamr · 2022-03-22T17:40:22Z

> Hmm so how is it "already implemented" if everyone has lots of failing tests? Are we sure we have multiple implementers interested in converging on the tested behavior?

The Chromium/Gecko bugs were open before.
Most of the tests pass on all browsers. The ones that fail are buggy edge cases that also didn't exactly comply with the previously specified behavior.

domenic · 2022-03-22T18:16:35Z

Hmm OK, good enough for me, but let's give it a couple days for anyone to chime in. /cc @hiroshige-g.

This and the Fetch PR will together close #7736, right?

noamr · 2022-03-22T18:32:47Z

> Hmm OK, good enough for me, but let's give it a couple days for anyone to chime in. /cc @hiroshige-g.
>
> This and the Fetch PR will together close #7736, right?

Right.

source

hiroshige-g · 2022-03-24T10:58:19Z

Also it is helpful to add a note to https://html.spec.whatwg.org/multipage/links.html#consume-a-preloaded-resource to explain why we have a custom matching logic for integrity metadata (and not for other parts of the request).
Probably the note with [SRI] at https://w3c.github.io/preload/#processing is sufficient.

domenic

LGTM again, but I would love explicit LGTMs from @hiroshige-g and @annevk since I clearly missed some stuff last time :)

annevk · 2022-04-05T09:35:00Z

source

+
+     <li><p>the user-agent has determined that <var>entry</var>'s <span data-x="preload integrity
+     metadata">integrity metadata</span>'s algorithm is more collision-resistant than
+     <var>integrityMetadata</var>'s algorithm <ref spec=SRI></p></li>


This only works if it's the hash for the same content, right? But it seems you can only determine that if you already have the content, or am I missing something?

I think I'd prefer we stick to Chrome's behavior of strict equality. Perhaps @mozfreddyb has thoughts?

Yea I agree that strict equality is cleaner. If Mozilla folks are ok with this I will be happy to revise.

Note that the integrity is checked again at the end of main fetch anyway, since my previous preload PR. In this case, if the consumer's integrity metadata is weaker and invalid, the preload will be consumed but the consumption will fail. I added a row to the Google Sheet for this scenario.

annevk · 2022-04-05T09:41:56Z

source

+    where a developer specifies subresource integrity metadata on a preload request, but not the
+    following resource request. If the preload request fails subresource integrity verification and
+    is discarded, the resource request will fetch and consume a potentially-malicious response from
+    the network without verifying its integrity. <ref spec=SRI></p>


It might also be worth calling out that mismatching SRI leads to new requests. This raises another question: from the above it seems there is no normalization going on, which seems plausible, and I don't think additional complexity is warranted here. But did we test with one integrity metadata containing ? (which ends up being ignored) and the other not containing it?

cc @baek9

It's actually not tested, and seems like normalization happens in implementations!
Added a test: web-platform-tests/wpt#33508

Will amend the equality test to be more like an SRI equality rather than a string equality.

Done in new revision.
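
As a rough illustration of the difference between a string equality and an "SRI equality", here is a minimal Python sketch. It is not taken from the patch or from the SRI spec: `parse_integrity` and `integrity_equal` are hypothetical names, and the sketch assumes that everything after `?` in a hash expression is an option to be ignored, that only sha256/sha384/sha512 are recognised, and that the order of hash expressions does not matter.

```python
from typing import FrozenSet, Tuple

# Assumption: only these algorithm names are recognised.
KNOWN_ALGORITHMS = {"sha256", "sha384", "sha512"}


def parse_integrity(metadata: str) -> FrozenSet[Tuple[str, str]]:
    """Parse an integrity string into a set of (algorithm, digest) pairs.

    Options (anything after '?') are dropped and tokens with an
    unrecognised algorithm are skipped.
    """
    entries = set()
    for token in metadata.split():
        expression = token.split("?", 1)[0]  # strip options
        algorithm, sep, digest = expression.partition("-")
        if sep and digest and algorithm.lower() in KNOWN_ALGORITHMS:
            entries.add((algorithm.lower(), digest))
    return frozenset(entries)


def integrity_equal(a: str, b: str) -> bool:
    """Compare two integrity strings after parsing, not as raw strings."""
    return parse_integrity(a) == parse_integrity(b)
```

With this sketch, `integrity_equal("sha256-abc", "sha256-abc?foo=bar")` is true even though the two strings differ, which is the kind of case the new test is meant to cover.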

mozfreddyb · 2022-04-07T10:44:04Z

source

+     <li><p>The user-agent has determined that
+     <var>entry</var>'s <span data-x="preload integrity metadata">integrity metadata</span>'s
+     algorithm is more collision-resistant than <var>integrityMetadata</var>'s algorithm
+     <ref spec=SRI></p></li>
+    </ul>


I would like for this to explicitly call out the "get strongest metadata from set" algorithm in SRI. I like the explicit "browser has specific priorities" better than saying "more collision-resistant".
(NB: My crypto education is very dated by now, but I also believe that the term "more collision-resistant" won't formally hold).

"More collision resistant" is copied from the aforementioned SRI algorithm :)
https://www.w3.org/TR/SRI/#dfn-getprioritizedhashfunction-a-b

but sure

Whoops. 🤐

mozfreddyb · 2022-04-07T10:44:25Z

source

+    <p class="note">A mistmatch in integrity metadata between the preload and the consumer, even if
+    both match the data, would lead to an additional fetch from the network.</p>


Thank you for calling this out specifically.

mozfreddyb · 2022-04-07T10:47:16Z

source

+    <p class="note">It is important that <span>network error</span>s are added to the preload cache
+    so that if a preload request results in an error, the erroneous response isn't re-requested
+    from the network later. This also has security implications; consider the case where a
+    developer specifies subresource integrity metadata on a preload request, but not the following
+    resource request. If the preload request fails subresource integrity verification and is
+    discarded, the resource request will fetch and consume a potentially-malicious response from
+    the network without verifying its integrity. <ref spec=SRI></p>
+   </li>


I believe this might be "new behavior" for some existing implementations, where we previously might have said that the 2nd request is just "bad security decisions" and the developer's fault for not repeating the integrity value in all references. I like that we can gradually move away from this.

Right. It is copied from the previous preload spec.

noamr · 2022-04-07T11:28:51Z

@mozfreddyb: revised based on your comments. Another look? :)

mozfreddyb

Formally speaking, I'm not an HTML reviewer. Looks good from my SRI perspective.

noamr · 2022-04-22T15:23:54Z

@annevk I think there's consensus about this, no? I've rebased the patch.

- If consumer or preload don't have SRI or SRIs are the same, accept the preload
- If both have different SRIs, accept the preload only in some UA-defined decision that the preload algo is stronger than the consumer (which Gecko makes and Chromium currently ignores)
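
The two rules summarized above can be sketched in Python as follows. This is only an illustration of the summary as written, not spec text: `should_accept_preload`, `ALGORITHM_PRIORITY`, and the `ua_prefers_stronger_preload` flag are hypothetical names, the sha256 < sha384 < sha512 ordering is an assumed UA priority, and each metadata string is assumed to hold a single hash expression. A later comment in this thread reports that Chromium and Gecko ignore the preload when only the consumer has SRI, so the first branch is broader than what those implementations do.

```python
from typing import Optional

# Assumption: a hypothetical UA priority order; the real ranking is UA-defined.
ALGORITHM_PRIORITY = {"sha256": 1, "sha384": 2, "sha512": 3}


def algorithm_of(metadata: str) -> str:
    """Return the algorithm prefix of a single hash expression."""
    return metadata.split("-", 1)[0].lower()


def should_accept_preload(
    preload_sri: Optional[str],
    consumer_sri: Optional[str],
    ua_prefers_stronger_preload: bool,
) -> bool:
    # Rule 1: either side has no SRI, or both are the same.
    if not preload_sri or not consumer_sri:
        return True
    if preload_sri == consumer_sri:
        return True
    # Rule 2: different SRIs; accept only if the UA decides the preload's
    # algorithm is stronger than the consumer's.
    if not ua_prefers_stronger_preload:
        return False
    preload_rank = ALGORITHM_PRIORITY.get(algorithm_of(preload_sri), 0)
    consumer_rank = ALGORITHM_PRIORITY.get(algorithm_of(consumer_sri), 0)
    return preload_rank > consumer_rank
```

Passing `ua_prefers_stronger_preload=False` models a UA that ignores the second rule and only reuses the preload on an exact match.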

noamr · 2022-05-02T18:14:49Z

> @annevk I think there's consensus about this, no? I've rebased the patch.

^^^

annevk

Overall this looks reasonable to me, though I think we should open some issues against SRI to get the details better defined.

source

annevk · 2022-05-03T09:50:55Z

source

+     <li>
+      <p><var>consumerIntegrityMetadata</var> is equal to <var>preloadIntegrityMetadata</var>.</p>
+
+      <p class="note">This comparison would ignore unknown integrity options.</p>
+     </li>


This seems a bit too hand-wavy. Can we open an issue against SRI to define an equality operation and link that issue?

Filed w3c/webappsec-subresource-integrity#116

Thanks! Please turn the "note" into "XXX" and link that.

domfarolino

Found this in my inbox and took a look from my old preload+SRI perspective. This looks good, thanks a lot.

source

noamr · 2022-05-16T07:53:14Z

@annevk can we merge this? It has been reviewed by multiple people now.

domfarolino · 2022-05-18T05:07:57Z

From whatwg/fetch#1418 (comment) I see:

> Resource has SRI but preload doesn't. Both Chromium & Gecko ignore the preload in that case
> ...
> In Chromium, the preload is consumed only if the SRI matches 1:1 (or consumer doesn't have SRI)

A while back I was helping a Firefox engineer understand what Chromium was doing in cases like this, and I wrote up https://docs.google.com/document/d/1DBSR97lO52ye3lA-Z0GiQSPN57MTE9ArQgkE-ZHJ0Ag/edit#. The last three rows show that Chromium will never reuse a preload when it didn't have integrity metadata but the consuming request did. This is consistent with row 3 in https://docs.google.com/spreadsheets/d/1Hw-4akCuzSYO2oT4iWEqk4RNUuNy0dW4HT6ZF169aHs/edit#gid=0 I believe.

The only reason that Chromium does not reuse the preload in cases like this is due to an implementation optimization: we didn't want to store the bytes around forever after the response has been processed, only to be reused later to run a hash algorithm on as dictated by the consuming request.

But what is our spec goal? If we could reuse the preload here that seems ideal, and for spec purposes it seems like we can right? The preload entry contains the full preload response, which I think has all of the raw bytes required to compute a hash from in the future during consumption. I'm curious why Gecko does not reuse the request. Is it just matching Blink's behavior intentionally, or does it fall prey to the same optimization? If Gecko can easily support this, and we decide that theoretically it would be nice to do so, then

I can envision a Chromium implementation that can make this work

Upon preloading, computes all possible hashes of the resource based on the original raw bytes, then saves all of these hashes and processes the response thus discarding the raw bytes. I think this would allow Chromium to reuse the preloaded resource here because when the consuming request comes around with integrity metadata, we can compare that with the pre-computed hashes on the preload resource.

... and maybe we can change our expectations around that case.

But if we decide that the behavior asserted in the test above (preload does not contain IM --> consuming request does contain IM --> consuming request does not reuse preload and goes to network) is what we want, then I suppose we can proceed. I'm just trying to protect against the scenario where we want some behavior, but are not achieving it because of an implementation optimization in Blink.
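
The pre-computation idea described above could look roughly like the following Python sketch. It is an illustration only, not Chromium code: `precompute_digests` and `matches_precomputed` are hypothetical names, the algorithm list is assumed to be sha256/sha384/sha512, and matching any recognised hash expression is a simplification (a real SRI check would prioritise the strongest algorithm in the set).

```python
import base64
import hashlib
from typing import Dict

# Assumption: the set of algorithms a consumer might ask for.
SRI_ALGORITHMS = ("sha256", "sha384", "sha512")


def precompute_digests(body: bytes) -> Dict[str, str]:
    """Hash the raw bytes once with every algorithm, at preload time.

    After this the raw bytes can be discarded; only the digests are kept.
    """
    return {
        algorithm: base64.b64encode(hashlib.new(algorithm, body).digest()).decode("ascii")
        for algorithm in SRI_ALGORITHMS
    }


def matches_precomputed(digests: Dict[str, str], consumer_integrity: str) -> bool:
    """Check a consumer's integrity string against the stored digests."""
    for token in consumer_integrity.split():
        expression = token.split("?", 1)[0]  # ignore options
        algorithm, sep, value = expression.partition("-")
        if sep and digests.get(algorithm.lower()) == value:
            return True
    return False
```

The trade-off in this sketch is a few extra hash computations per preload in exchange for not keeping the raw bytes around until a consumer shows up.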

noamr · 2022-05-18T06:34:53Z

I think the purpose of this PR is to first reflect the current state of implementations.
If there is willingness from Gecko and from Chromium to change the behavior we can also change the spec. I think, as you say, that it can be fine-tuned to consume the preload in more cases, but I'm wondering if those cases are actually useful and not hypothetical edge cases.

Either way, I suggest to first proceed with aligning the spec with reality and then discuss changing both.

domfarolino · 2022-05-18T20:58:12Z

> Either way, I suggest to first proceed with aligning the spec with reality and then discuss changing both.

Yep, sounds good to me! I just wanted to make sure we had dialogue here about possibly going the other direction in the future.

LGTM

hiroshige-g · 2022-05-19T19:02:39Z

> Either way, I suggest to first proceed with aligning the spec with reality and then discuss changing both.

LGTM.

> Resource has SRI but preload doesn't. Both Chromium & Gecko ignore the preload in that case

I expect high complexity both in spec and impl to allow reuse in this case. There have been ideas/issues floating around SRI, but we haven't made much progress there due to high complexity and insufficient motivations. I'm happy to revisit some of them to align with the current spec (and further optimization if we have sufficient motivation and bandwidth).

noamr mentioned this pull request Mar 22, 2022

Integrity-metadata should not be a preload key whatwg/fetch#1418

Closed

3 tasks

domenic approved these changes Mar 22, 2022

domenic added the topic: link label Mar 22, 2022

noamr force-pushed the remove-sri-from-cache-key branch from 69c62b5 to 9cdb12d Compare March 23, 2022 15:37

noamr changed the title ~~Remove integrity metadata from preload key~~ Amend preload integrity check to match implementations Mar 23, 2022

hiroshige-g mentioned this pull request Mar 24, 2022

Add a few cases to preload SRI web-platform-tests/wpt#33326

Merged

hiroshige-g reviewed Mar 24, 2022

domenic approved these changes Mar 28, 2022

annevk reviewed Apr 5, 2022

noamr force-pushed the remove-sri-from-cache-key branch from c18c7d8 to 195d584 Compare April 5, 2022 10:33

mozfreddyb reviewed Apr 7, 2022

mozfreddyb approved these changes Apr 7, 2022

noamr force-pushed the remove-sri-from-cache-key branch from d26fb8b to 768e3f8 Compare April 22, 2022 15:23

noamr added 8 commits April 23, 2022 17:00

Add condition

2344be8

Remove case

ebaf5a4

Import note about SRI from preload

969f17a

note

27a66bb

Normalize SRI

3e47361

Use better refs for 'strongest' algo

acce25f

Rebase early hints

78098a9

noamr force-pushed the remove-sri-from-cache-key branch from 768e3f8 to 78098a9 Compare April 24, 2022 07:23

annevk reviewed May 3, 2022

Dot

a97ec7a

This was referenced May 3, 2022

Be more specific about "most collision-resistant" w3c/webappsec-subresource-integrity#115

Open

Define equality comparator for metadata w3c/webappsec-subresource-integrity#116

Open

domenic added the topic: resource hints (inc. preload) label May 4, 2022

Add xxx note

f5cac9f

domfarolino approved these changes May 16, 2022

Applies -> apply

6adfd40

Nits

ca2d82a

domenic merged commit ad09edc into whatwg:main May 19, 2022

noamr deleted the remove-sri-from-cache-key branch May 20, 2022 10:54

hiroshige-g mentioned this pull request Jun 1, 2022

Hash method mismatch in comsuming preloads can allow response not matching consumer's integrity #7973

Closed
