Skip to content

Amend preload integrity check to match implementations#7738

Merged
domenic merged 12 commits intowhatwg:mainfrom
noamr:remove-sri-from-cache-key
May 19, 2022
Merged

Amend preload integrity check to match implementations#7738
domenic merged 12 commits intowhatwg:mainfrom
noamr:remove-sri-from-cache-key

Conversation

@noamr
Copy link
Collaborator

@noamr noamr commented Mar 22, 2022

In conjunction with whatwg/fetch#1418


/infrastructure.html ( diff )
/links.html ( diff )
/semantics.html ( diff )

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM editorially, let us know when the template is filled out.

@noamr
Copy link
Collaborator Author

noamr commented Mar 22, 2022

LGTM editorially, let us know when the template is filled out.

Done

@domenic
Copy link
Member

domenic commented Mar 22, 2022

Hmm so how is it "already implemented" if everyone has lots of failing tests? Are we sure we have multiple implementers interested in converging on the tested behavior?

@noamr
Copy link
Collaborator Author

noamr commented Mar 22, 2022

Hmm so how is it "already implemented" if everyone has lots of failing tests? Are we sure we have multiple implementers interested in converging on the tested behavior?

The Chromium/Gecko bugs were open before.
Most of the tests pass on all browsers - the ones that fail are buggy edge cases that also didn't exactly comply with the previous specified behavior.

@domenic
Copy link
Member

domenic commented Mar 22, 2022

Hmm OK, good enough for me, but let's give it a couple days for anyone to chime in. /cc @hiroshige-g.

This and the Fetch PR will together close #7736 , right?

@noamr
Copy link
Collaborator Author

noamr commented Mar 22, 2022

Hmm OK, good enough for me, but let's give it a couple days for anyone to chime in. /cc @hiroshige-g.

This and the Fetch PR will together close #7736 , right?

Right.

@noamr noamr force-pushed the remove-sri-from-cache-key branch from 69c62b5 to 9cdb12d Compare March 23, 2022 15:37
@noamr noamr changed the title Remove integrity metadata from preload key Amend preload integrity check to match implementations Mar 23, 2022
@hiroshige-g
Copy link
Contributor

Also it is helpful to add a note to https://html.spec.whatwg.org/multipage/links.html#consume-a-preloaded-resource to explain why we have a custom matching logic for integrity metadata (and not for other parts of the request).
Probably the note with [SRI] at https://w3c.github.io/preload/#processing is sufficient.

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM again, but I would love explicit LGTMs from @hiroshige-g and @annevk since I clearly missed some stuff last time :)

source Outdated

<li><p>the user-agent has determined that <var>entry</var>'s <span data-x="preload integrity
metadata">integrity metadata</span>'s algorithm is more collision-resistant than
<var>integrityMetadata</var>'s algorithm <ref spec=SRI></p></li>
Copy link
Member

@annevk annevk Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works if it's the hash for the same content, right? But it seems you can only determine that if you already have the content, or am I missing something?

I think I'd prefer we stick to Chrome's behavior of strict equality. Perhaps @mozfreddyb has thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I agree that strict equality is cleaner. If Mozilla folks are ok with this I will be happy to revise.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that anyway the integrity is checked again at the end of main fetch, since my previous preload PR. In this case, if the integrity of the consumer is weaker and invalid, it will consume the preload but the consume would fail. Added a row in the Google Sheet for this scenario.

source Outdated
where a developer specifies subresource integrity metadata on a preload request, but not the
following resource request. If the preload request fails subresource integrity verification and
is discarded, the resource request will fetch and consume a potentially-malicious response from
the network without verifying its integrity. <ref spec=SRI></p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be worth calling out that mismatching SRI leads to new requests. This raises another question, it seems from the above there is no normalization going on, which seems plausible and I don't think additional complexity is warranted here. But did we test with integrity metadata containing ? (which ends up being ignored) and the other not containing that?

cc @baek9

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually not tested, and seems like normalization happens in implementations!
Added a test: web-platform-tests/wpt#33508

Will amend the equality test to be more like an SRI equality rather than a string equality.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in new revision.

@noamr noamr force-pushed the remove-sri-from-cache-key branch from c18c7d8 to 195d584 Compare April 5, 2022 10:33
source Outdated
Comment on lines 25818 to 25822
<li><p>The user-agent has determined that
<var>entry</var>'s <span data-x="preload integrity metadata">integrity metadata</span>'s
algorithm is more collision-resistant than <var>integrityMetadata</var>'s algorithm
<ref spec=SRI></p></li>
</ul>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like for this to explicitly call out the "get strongest metadata from set" algorithm in SRI. I like the explicit "browser has specific priorities" better than saying "more collision-resistant".
(NB: My crypto education is very dated by now, but I also believe that the term "more collision-resistant" won't formally hold).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"More collision resistant" is copied from the aforementioned SRI algorithm :)
https://www.w3.org/TR/SRI/#dfn-getprioritizedhashfunction-a-b

but sure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. 🤐

Comment on lines 25826 to 25827
<p class="note">A mistmatch in integrity metadata between the preload and the consumer, even if
both match the data, would lead to an additional fetch from the network.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for calling this out specifically.

source Outdated
Comment on lines 25829 to 25836
<p class="note">It is important that <span>network error</span>s are added to the preload cache
so that if a preload request results in an error, the erroneous response isn't re-requested
from the network later. This also has security implications; consider the case where a
developer specifies subresource integrity metadata on a preload request, but not the following
resource request. If the preload request fails subresource integrity verification and is
discarded, the resource request will fetch and consume a potentially-malicious response from
the network without verifying its integrity. <ref spec=SRI></p>
</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this might be "new behavior" for some existing implementation. Where we previously might have said that the 2nd request is just "bad security decisions" and the developer's fault for not repeating the integrity value in all references. I like that we can gradually move away from this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It is copied the previous preload spec

@noamr
Copy link
Collaborator Author

noamr commented Apr 7, 2022

@mozfreddyb: revised based on your comments. Another look? :)

Copy link
Contributor

@mozfreddyb mozfreddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formally speaking, I'm not an HTML reviewer. Looks good from my SRI perspective.

@noamr noamr force-pushed the remove-sri-from-cache-key branch from d26fb8b to 768e3f8 Compare April 22, 2022 15:23
@noamr
Copy link
Collaborator Author

noamr commented Apr 22, 2022

@annevk I think there's consensus about this, no? I've rebased the patch.

noamr added 8 commits April 23, 2022 17:00
- If consumer or preload don't have SRI or SRIs are the same, accept the preload
- If both have different SRIs, accept the preload only in some UA-defined
  decision that the preload algo is stronger than the consumer
  (which Gecko makes and Chromium currently ignores)
@noamr noamr force-pushed the remove-sri-from-cache-key branch from 768e3f8 to 78098a9 Compare April 24, 2022 07:23
@noamr
Copy link
Collaborator Author

noamr commented May 2, 2022

@annevk I think there's consensus about this, no? I've rebased the patch.

^^^

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks reasonable to me, though I think we should open some issues against SRI to get the details better defined.

Comment on lines 26177 to 26181
<li>
<p><var>consumerIntegrityMetadata</var> is equal to <var>preloadIntegrityMetadata</var>.</p>

<p class="note">This comparison would ignore unknown integrity options.</p>
</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit too hand-wavy. Can we open an issue against SRI to define an equality operation and link that issue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please turn the "note" into "XXX" and link that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@domfarolino domfarolino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found this in my inbox and took a look from my old preload+SRI perspective. This looks good, thanks a lot.

@noamr
Copy link
Collaborator Author

noamr commented May 16, 2022

@annevk can we merge this? It had been reviewed by multiple people now.

@domfarolino
Copy link
Member

From whatwg/fetch#1418 (comment) I see:

Resource has SRI but preload doesn't. Both Chromium & Gecko ignore the preload in that case
...
In Chromium, the preload is consumed only if the SRI matches 1:1 (or consumer doesn't have SRI)

A while back I was helping a Firefox engineer understand what Chromium was doing in cases like this, and I wrote up https://docs.google.com/document/d/1DBSR97lO52ye3lA-Z0GiQSPN57MTE9ArQgkE-ZHJ0Ag/edit#. The last three rows show that Chromium will never reuse a preload when it didn't have integrity metadata but the consuming request did. This is consistent with row 3 in https://docs.google.com/spreadsheets/d/1Hw-4akCuzSYO2oT4iWEqk4RNUuNy0dW4HT6ZF169aHs/edit#gid=0 I believe.

The only reason that Chromium does not reuse the preload in cases like this is due to an implementation optimization: we didn't want to store the bytes around forever after the response has been processed, only to be reused later to run a hash algorithm on as dictated by the consuming request.

But what is our spec goal? If we could reuse the preload here that seems ideal, and for spec purposes it seems like we can right? The preload entry contains the full preload response, which I think has all of the raw bytes required to compute a hash from in the future during consumption. I'm curious why Gecko does not reuse the request. Is it just matching Blink's behavior intentionally, or does it fall prey to the same optimization? If Gecko can easily support this, and we decide that theoretically it would be nice to do so, then

I can envision a Chromium implementation that can make this work Upon preloading, computes all possible hashes of the resource based on the original raw bytes, then saves all of these hashes and processes the response thus discarding the raw bytes. I think this would allow Chromium to reuse the preloaded resource here because when the consuming request comes around with integrity metadata, we can compare that with the pre-computed hashes on the preload resource.

... and maybe we can change our expectations around that case.

But if we decide that the behavior asserted in the test above (preload does not contain IM --> consuming request does contain IM --> consuming request does not reuse preload and goes to network) is what we want, then I suppose we can proceed. I'm just trying to protect against the scenario where we want some behavior, but are not achieving it because of an implementation optimization in Blink.

@noamr
Copy link
Collaborator Author

noamr commented May 18, 2022

From whatwg/fetch#1418 (comment) I see:

Resource has SRI but preload doesn't. Both Chromium & Gecko ignore the preload in that case
...
In Chromium, the preload is consumed only if the SRI matches 1:1 (or consumer doesn't have SRI)

A while back I was helping a Firefox engineer understand what Chromium was doing in cases like this, and I wrote up https://docs.google.com/document/d/1DBSR97lO52ye3lA-Z0GiQSPN57MTE9ArQgkE-ZHJ0Ag/edit#. The last three rows show that Chromium will never reuse a preload when it didn't have integrity metadata but the consuming request did. This is consistent with row 3 in https://docs.google.com/spreadsheets/d/1Hw-4akCuzSYO2oT4iWEqk4RNUuNy0dW4HT6ZF169aHs/edit#gid=0 I believe.

The only reason that Chromium does not reuse the preload in cases like this is due to an implementation optimization: we didn't want to store the bytes around forever after the response has been processed, only to be reused later to run a hash algorithm on as dictated by the consuming request.

But what is our spec goal? If we could reuse the preload here that seems ideal, and for spec purposes it seems like we can right? The preload entry contains the full preload response, which I think has all of the raw bytes required to compute a hash from in the future during consumption. I'm curious why Gecko does not reuse the request. Is it just matching Blink's behavior intentionally, or does it fall prey to the same optimization? If Gecko can easily support this, and we decide that theoretically it would be nice to do so, then

I can envision a Chromium implementation that can make this work
... and maybe we can change our expectations around that case.

But if we decide that the behavior asserted in the test above (preload does not contain IM --> consuming request does contain IM --> consuming request does not reuse preload and goes to network) is what we want, then I suppose we can proceed. I'm just trying to protect against the scenario where we want some behavior, but are not achieving it because of an implementation optimization in Blink.

I think the purpose of this PR is to first reflect the current state of implementations.
If there is willingness from Gecko and from Chromium to change the behavior we can also change the spec. I think, as you say, that it can be fine-tuned to consume the preload in more cases, but I'm wondering if those cases are actually useful and not hypothetical edge cases.

Either way, I suggest to first proceed with aligning the spec with reality and then discuss changing both.

@domfarolino
Copy link
Member

Either way, I suggest to first proceed with aligning the spec with reality and then discuss changing both.

Yep, sounds good to me! I just wanted to make sure we had dialogue here about possibly going the other direction in the future.

LGTM

@domenic domenic merged commit ad09edc into whatwg:main May 19, 2022
@hiroshige-g
Copy link
Contributor

Either way, I suggest to first proceed with aligning the spec with reality and then discuss changing both.

LGTM.

Resource has SRI but preload doesn't. Both Chromium & Gecko ignore the preload in that case

I expect high complexity both in spec and impl to allow reuse in this case. There have been ideas/issues floating around SRI, but we haven't made much progress there due to high complexity and insufficient motivations. I'm happy to revisit some of them to align with the current spec (and further optimization if we have sufficient motivation and bandwidth).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants