http2: Change accounting of WINDOW_UPDATE frames#14924
http2: Change accounting of WINDOW_UPDATE frames#14924yanavlasov merged 11 commits intoenvoyproxy:mainfrom
Conversation
Signed-off-by: Yan Avlasov <[email protected]>
antoniovicente
left a comment
There was a problem hiding this comment.
Just some naming nits in comments. Thanks for fixing this.
| // | ||
| // 1 + 2 * (inbound_streams + | ||
| // max_inbound_window_update_frames_per_data_frame_sent * outbound_data_frames) | ||
| // 10 * (1 + inbound_streams) + |
There was a problem hiding this comment.
Where is 10 coming from? It's a big multiplier, without any justification stated here.
There was a problem hiding this comment.
Consider that the default value of max_inbound_window_update_frames_per_data_frame_sent is 10 and it comes with a bonus factor of 2.
I think that one possible flaw in this algorithm is that once you have sent a few outbound data frames, the remote endpoint is able to send you a lot of window updates without triggering the limit.
There was a problem hiding this comment.
I've changed the formula once again. Basically it adds 1 DATA frame for each new stream. This will accommodate the case of server preemptively sending WINDOW_UPDATE.
Also dropped 2 multiplier. I do not think it makes sense in the new formula.
| // | ||
| // 1 + 2 * (inbound_streams + | ||
| // max_inbound_window_update_frames_per_data_frame_sent * outbound_data_frames) | ||
| // 10 * (1 + inbound_streams) + |
There was a problem hiding this comment.
Consider that the default value of max_inbound_window_update_frames_per_data_frame_sent is 10 and it comes with a bonus factor of 2.
I think that one possible flaw in this algorithm is that once you have sent a few outbound data frames, the remote endpoint is able to send you a lot of window updates without triggering the limit.
| 1 + 2 * (inbound_streams_ + | ||
| max_inbound_window_update_frames_per_data_frame_sent_ * outbound_data_frames_)) { | ||
| (10 * (1 + opened_streams_) + | ||
| (2 * max_inbound_window_update_frames_per_data_frame_sent_ * outbound_data_frames_))) { |
There was a problem hiding this comment.
Could you change this to:
(2 * max_inbound_window_update_frames_per_data_frame_sent_ * (outbound_data_frames_ + 1))) {
An issue I see in the current formula is that setting max_inbound_window_update_frames_per_data_frame_sent_ to uint32 max does not effectively disable this flood check. It would be good to have a mechanism to mitigate any future issue in this logic by effectively disabling the check.
There was a problem hiding this comment.
Ok, I think this will work too.
There was a problem hiding this comment.
I've massaged it a bit more to accommodate cases where DATA frames have not yet been sent.
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
|
/lgtm api |
antoniovicente
left a comment
There was a problem hiding this comment.
Thanks for the fix.
| // | ||
| // 1 + 2 * (inbound_streams + | ||
| // max_inbound_window_update_frames_per_data_frame_sent * outbound_data_frames) | ||
| // max_inbound_window_update_frames_per_data_frame_sent * (outbound_data_frames + opened_streams + 1) |
There was a problem hiding this comment.
I believe that the 2x multiplier was added to account for the same WINDOW_UPDATE increment to be received on a connection (stream == 0) and an individual stream (stream != 0) levels. The new formula doesn't account for that.
I believe the old formula is:
1- allow initialWINDOW_UPDATEframe on the connection level,2 * inbound_streams- allow initialWINDOW_UPDATEframe for each inbound stream (2x for connection and stream levels),2 * max_inbound_window_update_frames_per_data_frame_sent * outbound_data_frames- tunable multiplier of allowedWINDOW_UPDATEframes received in response to each sentDATAframe (2x for connection and stream levels).
The new formula seems to apply the old tunable to non-DATA frames as well. It might very well work, since the limit is going to be higher than the existing one, but the reasoning behind the formula is much less clear.
There was a problem hiding this comment.
The reasoning is to make tunable allowed number of WINDOW_UPDATE frames before any DATA was sent. This makes cases like facebook's or cases where server may preemptively open stream window knowing that it has large body work better.
I think agree that 2x multiplier needs to be put back.
Note that in the more restrictive case where max_inbound_window_update_frames_per_data_frame_sent == 1 the new formula (with 2x multiplier put back) does not increase the limit compared to the old formula.
There was a problem hiding this comment.
The reasoning is to make tunable allowed number of WINDOW_UPDATE frames before any DATA was sent. This makes cases like facebook's or cases where server may preemptively open stream window knowing that it has large body work better.
My point is that if there is a specific scenario that we want to address (e.g. upstream servers sending 2 WINDOW_UPDATE frames for each new stream), then it should be part of the formula, and we could either hardcode the value we deem reasonable (e.g. 2 * inbound_streams) or introduce a new tunable. This PR should improve the formula, and not simply increase the limit.
Also, using max_inbound_window_update_frames_per_data_frame_sent to control number of WINDOW_UPDATE frames allowed before receiving any DATA frames seems quite confusing.
There was a problem hiding this comment.
I'm going to just add a fixed constant to solve the issue we have in prod. We can discuss the formula later and do the deprecation dance if we need to.
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
| revert to the legacy path canonicalizer, enable the runtime flag | ||
| `envoy.reloadable_features.remove_forked_chromium_url`. | ||
| * http: increase the maximum allowed number of initial connection WINDOW_UPDATE frames sent by the peer from 1 to 5. | ||
| * http: upstream flood and abuse checks increment the count of opened HTTP/2 streams when Envoy sends initial HEADERS frame for the new stream. Before the counter was incrementred when |
There was a problem hiding this comment.
nit: Long line with line-break. Either merge the next line into this one, or apply line breaks around 100 columns instead of 185.
Signed-off-by: Yan Avlasov <[email protected]>
c0298e3
Signed-off-by: Yan Avlasov <[email protected]>
|
@zuercher or @envoyproxy/senior-maintainers: Could you do a review pass? |
| // | ||
| // the connection is terminated. The ``http2.inbound_priority_frames_flood`` stat tracks | ||
| // the connection is terminated. For downstream connections the `opened_streams` is incremented when | ||
| // Envoy receives complete response headers from the upstream server. For upstream connection the |
There was a problem hiding this comment.
@yanavlasov Is this comment right? I'm having a hard time seeing where that's happening. It looks to me like it's happening when receiving the first header frame.
Commit Message:
Allow more WINDOW_UPDATE frames to be received for connections with no open streams or for streams without any DATA frames. This is to make Envoy work with upstream H/2 servers that send multiple connection WINDOW_UPDATE frames before any streams were opened, or for new streams before any DATA was sent.
Additionally fix the issue where upstream flood checker was incrementing opened stream counter when receiving last HEADERS frame from the upstream server. It now correctly incremented when Envoy opens a new stream to the upstream server.
Risk Level: Low (WINDOW_UPDATE flood budged is slightly increased)
Testing: Unit Tests
Docs Changes: proto doc
Release Notes: Yes
Platform Specific Features: N/A
Signed-off-by: Yan Avlasov [email protected]