Skip to content

http2: Change accounting of WINDOW_UPDATE frames#14924

Merged
yanavlasov merged 11 commits intoenvoyproxy:mainfrom
yanavlasov:window-update-flood
Feb 11, 2021
Merged

http2: Change accounting of WINDOW_UPDATE frames#14924
yanavlasov merged 11 commits intoenvoyproxy:mainfrom
yanavlasov:window-update-flood

Conversation

@yanavlasov
Copy link
Copy Markdown
Contributor

@yanavlasov yanavlasov commented Feb 3, 2021

Commit Message:
Allow more WINDOW_UPDATE frames to be received for connections with no open streams or for streams without any DATA frames. This is to make Envoy work with upstream H/2 servers that send multiple connection WINDOW_UPDATE frames before any streams were opened, or for new streams before any DATA was sent.

Additionally fix the issue where upstream flood checker was incrementing opened stream counter when receiving last HEADERS frame from the upstream server. It now correctly incremented when Envoy opens a new stream to the upstream server.

Risk Level: Low (WINDOW_UPDATE flood budged is slightly increased)
Testing: Unit Tests
Docs Changes: proto doc
Release Notes: Yes
Platform Specific Features: N/A

Signed-off-by: Yan Avlasov [email protected]

@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
API shepherd assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #14924 was opened by yanavlasov.

see: more, trace.

antoniovicente
antoniovicente previously approved these changes Feb 3, 2021
Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some naming nits in comments. Thanks for fixing this.

//
// 1 + 2 * (inbound_streams +
// max_inbound_window_update_frames_per_data_frame_sent * outbound_data_frames)
// 10 * (1 + inbound_streams) +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is 10 coming from? It's a big multiplier, without any justification stated here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider that the default value of max_inbound_window_update_frames_per_data_frame_sent is 10 and it comes with a bonus factor of 2.

I think that one possible flaw in this algorithm is that once you have sent a few outbound data frames, the remote endpoint is able to send you a lot of window updates without triggering the limit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the formula once again. Basically it adds 1 DATA frame for each new stream. This will accommodate the case of server preemptively sending WINDOW_UPDATE.
Also dropped 2 multiplier. I do not think it makes sense in the new formula.

//
// 1 + 2 * (inbound_streams +
// max_inbound_window_update_frames_per_data_frame_sent * outbound_data_frames)
// 10 * (1 + inbound_streams) +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider that the default value of max_inbound_window_update_frames_per_data_frame_sent is 10 and it comes with a bonus factor of 2.

I think that one possible flaw in this algorithm is that once you have sent a few outbound data frames, the remote endpoint is able to send you a lot of window updates without triggering the limit.

1 + 2 * (inbound_streams_ +
max_inbound_window_update_frames_per_data_frame_sent_ * outbound_data_frames_)) {
(10 * (1 + opened_streams_) +
(2 * max_inbound_window_update_frames_per_data_frame_sent_ * outbound_data_frames_))) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change this to:

(2 * max_inbound_window_update_frames_per_data_frame_sent_ * (outbound_data_frames_ + 1))) {

An issue I see in the current formula is that setting max_inbound_window_update_frames_per_data_frame_sent_ to uint32 max does not effectively disable this flood check. It would be good to have a mechanism to mitigate any future issue in this logic by effectively disabling the check.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think this will work too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've massaged it a bit more to accommodate cases where DATA frames have not yet been sent.

Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
@lizan
Copy link
Copy Markdown
Member

lizan commented Feb 5, 2021

/lgtm api

@repokitteh-read-only repokitteh-read-only bot removed the api label Feb 5, 2021
antoniovicente
antoniovicente previously approved these changes Feb 6, 2021
Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

//
// 1 + 2 * (inbound_streams +
// max_inbound_window_update_frames_per_data_frame_sent * outbound_data_frames)
// max_inbound_window_update_frames_per_data_frame_sent * (outbound_data_frames + opened_streams + 1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the 2x multiplier was added to account for the same WINDOW_UPDATE increment to be received on a connection (stream == 0) and an individual stream (stream != 0) levels. The new formula doesn't account for that.

I believe the old formula is:

  • 1 - allow initial WINDOW_UPDATE frame on the connection level,
  • 2 * inbound_streams - allow initial WINDOW_UPDATE frame for each inbound stream (2x for connection and stream levels),
  • 2 * max_inbound_window_update_frames_per_data_frame_sent * outbound_data_frames - tunable multiplier of allowed WINDOW_UPDATE frames received in response to each sent DATA frame (2x for connection and stream levels).

The new formula seems to apply the old tunable to non-DATA frames as well. It might very well work, since the limit is going to be higher than the existing one, but the reasoning behind the formula is much less clear.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning is to make tunable allowed number of WINDOW_UPDATE frames before any DATA was sent. This makes cases like facebook's or cases where server may preemptively open stream window knowing that it has large body work better.
I think agree that 2x multiplier needs to be put back.
Note that in the more restrictive case where max_inbound_window_update_frames_per_data_frame_sent == 1 the new formula (with 2x multiplier put back) does not increase the limit compared to the old formula.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning is to make tunable allowed number of WINDOW_UPDATE frames before any DATA was sent. This makes cases like facebook's or cases where server may preemptively open stream window knowing that it has large body work better.

My point is that if there is a specific scenario that we want to address (e.g. upstream servers sending 2 WINDOW_UPDATE frames for each new stream), then it should be part of the formula, and we could either hardcode the value we deem reasonable (e.g. 2 * inbound_streams) or introduce a new tunable. This PR should improve the formula, and not simply increase the limit.

Also, using max_inbound_window_update_frames_per_data_frame_sent to control number of WINDOW_UPDATE frames allowed before receiving any DATA frames seems quite confusing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to just add a fixed constant to solve the issue we have in prod. We can discuss the formula later and do the deprecation dance if we need to.

Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
PiotrSikora
PiotrSikora previously approved these changes Feb 10, 2021
Signed-off-by: Yan Avlasov <[email protected]>
PiotrSikora
PiotrSikora previously approved these changes Feb 10, 2021
Copy link
Copy Markdown
Contributor

@PiotrSikora PiotrSikora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

antoniovicente
antoniovicente previously approved these changes Feb 10, 2021
revert to the legacy path canonicalizer, enable the runtime flag
`envoy.reloadable_features.remove_forked_chromium_url`.
* http: increase the maximum allowed number of initial connection WINDOW_UPDATE frames sent by the peer from 1 to 5.
* http: upstream flood and abuse checks increment the count of opened HTTP/2 streams when Envoy sends initial HEADERS frame for the new stream. Before the counter was incrementred when
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Long line with line-break. Either merge the next line into this one, or apply line breaks around 100 columns instead of 185.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
@antoniovicente
Copy link
Copy Markdown
Contributor

@zuercher or @envoyproxy/senior-maintainers: Could you do a review pass?

Copy link
Copy Markdown
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@yanavlasov yanavlasov merged commit 5bdcdd6 into envoyproxy:main Feb 11, 2021
//
// the connection is terminated. The ``http2.inbound_priority_frames_flood`` stat tracks
// the connection is terminated. For downstream connections the `opened_streams` is incremented when
// Envoy receives complete response headers from the upstream server. For upstream connection the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanavlasov Is this comment right? I'm having a hard time seeing where that's happening. It looks to me like it's happening when receiving the first header frame.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants