Amend preload integrity check to match implementations#7738
Amend preload integrity check to match implementations#7738domenic merged 12 commits intowhatwg:mainfrom
Conversation
domenic
left a comment
There was a problem hiding this comment.
LGTM editorially, let us know when the template is filled out.
Done |
|
Hmm so how is it "already implemented" if everyone has lots of failing tests? Are we sure we have multiple implementers interested in converging on the tested behavior? |
The Chromium/Gecko bugs were open before. |
|
Hmm OK, good enough for me, but let's give it a couple days for anyone to chime in. /cc @hiroshige-g. This and the Fetch PR will together close #7736 , right? |
Right. |
69c62b5 to
9cdb12d
Compare
|
Also it is helpful to add a note to https://html.spec.whatwg.org/multipage/links.html#consume-a-preloaded-resource to explain why we have a custom matching logic for integrity metadata (and not for other parts of the request). |
domenic
left a comment
There was a problem hiding this comment.
LGTM again, but I would love explicit LGTMs from @hiroshige-g and @annevk since I clearly missed some stuff last time :)
source
Outdated
|
|
||
| <li><p>the user-agent has determined that <var>entry</var>'s <span data-x="preload integrity | ||
| metadata">integrity metadata</span>'s algorithm is more collision-resistant than | ||
| <var>integrityMetadata</var>'s algorithm <ref spec=SRI></p></li> |
There was a problem hiding this comment.
This only works if it's the hash for the same content, right? But it seems you can only determine that if you already have the content, or am I missing something?
I think I'd prefer we stick to Chrome's behavior of strict equality. Perhaps @mozfreddyb has thoughts?
There was a problem hiding this comment.
Yea I agree that strict equality is cleaner. If Mozilla folks are ok with this I will be happy to revise.
There was a problem hiding this comment.
Note that anyway the integrity is checked again at the end of main fetch, since my previous preload PR. In this case, if the integrity of the consumer is weaker and invalid, it will consume the preload but the consume would fail. Added a row in the Google Sheet for this scenario.
source
Outdated
| where a developer specifies subresource integrity metadata on a preload request, but not the | ||
| following resource request. If the preload request fails subresource integrity verification and | ||
| is discarded, the resource request will fetch and consume a potentially-malicious response from | ||
| the network without verifying its integrity. <ref spec=SRI></p> |
There was a problem hiding this comment.
It might also be worth calling out that mismatching SRI leads to new requests. This raises another question, it seems from the above there is no normalization going on, which seems plausible and I don't think additional complexity is warranted here. But did we test with integrity metadata containing ? (which ends up being ignored) and the other not containing that?
cc @baek9
There was a problem hiding this comment.
It's actually not tested, and seems like normalization happens in implementations!
Added a test: web-platform-tests/wpt#33508
Will amend the equality test to be more like an SRI equality rather than a string equality.
There was a problem hiding this comment.
Done in new revision.
c18c7d8 to
195d584
Compare
source
Outdated
| <li><p>The user-agent has determined that | ||
| <var>entry</var>'s <span data-x="preload integrity metadata">integrity metadata</span>'s | ||
| algorithm is more collision-resistant than <var>integrityMetadata</var>'s algorithm | ||
| <ref spec=SRI></p></li> | ||
| </ul> |
There was a problem hiding this comment.
I would like for this to explicitly call out the "get strongest metadata from set" algorithm in SRI. I like the explicit "browser has specific priorities" better than saying "more collision-resistant".
(NB: My crypto education is very dated by now, but I also believe that the term "more collision-resistant" won't formally hold).
There was a problem hiding this comment.
"More collision resistant" is copied from the aforementioned SRI algorithm :)
https://www.w3.org/TR/SRI/#dfn-getprioritizedhashfunction-a-b
but sure
| <p class="note">A mistmatch in integrity metadata between the preload and the consumer, even if | ||
| both match the data, would lead to an additional fetch from the network.</p> |
There was a problem hiding this comment.
Thank you for calling this out specifically.
source
Outdated
| <p class="note">It is important that <span>network error</span>s are added to the preload cache | ||
| so that if a preload request results in an error, the erroneous response isn't re-requested | ||
| from the network later. This also has security implications; consider the case where a | ||
| developer specifies subresource integrity metadata on a preload request, but not the following | ||
| resource request. If the preload request fails subresource integrity verification and is | ||
| discarded, the resource request will fetch and consume a potentially-malicious response from | ||
| the network without verifying its integrity. <ref spec=SRI></p> | ||
| </li> |
There was a problem hiding this comment.
I believe this might be "new behavior" for some existing implementation. Where we previously might have said that the 2nd request is just "bad security decisions" and the developer's fault for not repeating the integrity value in all references. I like that we can gradually move away from this.
There was a problem hiding this comment.
Right. It is copied the previous preload spec
|
@mozfreddyb: revised based on your comments. Another look? :) |
mozfreddyb
left a comment
There was a problem hiding this comment.
Formally speaking, I'm not an HTML reviewer. Looks good from my SRI perspective.
d26fb8b to
768e3f8
Compare
|
@annevk I think there's consensus about this, no? I've rebased the patch. |
- If consumer or preload don't have SRI or SRIs are the same, accept the preload - If both have different SRIs, accept the preload only in some UA-defined decision that the preload algo is stronger than the consumer (which Gecko makes and Chromium currently ignores)
768e3f8 to
78098a9
Compare
^^^ |
annevk
left a comment
There was a problem hiding this comment.
Overall this looks reasonable to me, though I think we should open some issues against SRI to get the details better defined.
| <li> | ||
| <p><var>consumerIntegrityMetadata</var> is equal to <var>preloadIntegrityMetadata</var>.</p> | ||
|
|
||
| <p class="note">This comparison would ignore unknown integrity options.</p> | ||
| </li> |
There was a problem hiding this comment.
This seems a bit too hand-wavy. Can we open an issue against SRI to define an equality operation and link that issue?
There was a problem hiding this comment.
There was a problem hiding this comment.
Thanks! Please turn the "note" into "XXX" and link that.
domfarolino
left a comment
There was a problem hiding this comment.
Found this in my inbox and took a look from my old preload+SRI perspective. This looks good, thanks a lot.
|
@annevk can we merge this? It had been reviewed by multiple people now. |
|
From whatwg/fetch#1418 (comment) I see:
A while back I was helping a Firefox engineer understand what Chromium was doing in cases like this, and I wrote up https://docs.google.com/document/d/1DBSR97lO52ye3lA-Z0GiQSPN57MTE9ArQgkE-ZHJ0Ag/edit#. The last three rows show that Chromium will never reuse a preload when it didn't have integrity metadata but the consuming request did. This is consistent with row 3 in https://docs.google.com/spreadsheets/d/1Hw-4akCuzSYO2oT4iWEqk4RNUuNy0dW4HT6ZF169aHs/edit#gid=0 I believe. The only reason that Chromium does not reuse the preload in cases like this is due to an implementation optimization: we didn't want to store the bytes around forever after the response has been processed, only to be reused later to run a hash algorithm on as dictated by the consuming request. But what is our spec goal? If we could reuse the preload here that seems ideal, and for spec purposes it seems like we can right? The preload entry contains the full preload response, which I think has all of the raw bytes required to compute a hash from in the future during consumption. I'm curious why Gecko does not reuse the request. Is it just matching Blink's behavior intentionally, or does it fall prey to the same optimization? If Gecko can easily support this, and we decide that theoretically it would be nice to do so, then I can envision a Chromium implementation that can make this workUpon preloading, computes all possible hashes of the resource based on the original raw bytes, then saves all of these hashes and processes the response thus discarding the raw bytes. I think this would allow Chromium to reuse the preloaded resource here because when the consuming request comes around with integrity metadata, we can compare that with the pre-computed hashes on the preload resource.... and maybe we can change our expectations around that case. But if we decide that the behavior asserted in the test above (preload does not contain IM --> consuming request does contain IM --> consuming request does not reuse preload and goes to network) is what we want, then I suppose we can proceed. I'm just trying to protect against the scenario where we want some behavior, but are not achieving it because of an implementation optimization in Blink. |
I think the purpose of this PR is to first reflect the current state of implementations. Either way, I suggest to first proceed with aligning the spec with reality and then discuss changing both. |
Yep, sounds good to me! I just wanted to make sure we had dialogue here about possibly going the other direction in the future. LGTM |
LGTM.
I expect high complexity both in spec and impl to allow reuse in this case. There have been ideas/issues floating around SRI, but we haven't made much progress there due to high complexity and insufficient motivations. I'm happy to revisit some of them to align with the current spec (and further optimization if we have sufficient motivation and bandwidth). |
In conjunction with whatwg/fetch#1418
(See WHATWG Working Mode: Changes for more details.)
/infrastructure.html ( diff )
/links.html ( diff )
/semantics.html ( diff )