
quic: supporting connections with zero initial available streams #18775

Merged: alyssawilk merged 6 commits into envoyproxy:main from alyssawilk:zero_streams on Nov 9, 2021

Conversation

@alyssawilk (Contributor):

Risk Level: Medium (conn pool refactors)
Testing: unit, integration
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Alyssa Wilk <[email protected]>
@RyanTheOptimist (Contributor) left a comment:

Looks good! I have a couple of small questions.

parent_.transitionActiveClientState(*this, ActiveClient::State::READY);
// If there's waiting streams, make sure the pool will now serve them.
parent_.onUpstreamReady();
} else if (currentUnusedCapacity() == 0 && state() == ActiveClient::State::READY) {
Contributor:

With HTTP/3, MAX_STREAMS can only increase. I'm not sure this is possible. Is this code reached in an integration test?

Contributor:

Oh wait, looking at how this is called, is the num_streams argument the new MAX_STREAMS value (which must always increase) or is it the current stream capacity? I think maybe the latter. Mind adding a comment to this method? (You'd think I could remember from the previous PR, but apparently not.)

Contributor Author:

We don't need this today, but didn't you say this can happen for a rejected 0-RTT handshake?
If we start out assuming some number of streams and create some, then the 0-RTT handshake is rejected and we fail over to a real handshake with fewer streams, I assume we can lower the functional capacity?

Contributor:

Oh, hah! Yes, I did say that... and subsequently forgot it. You're totally right. Would it be worth adding a comment that says something like:

// With HTTP/3 this can only happen during a rejected 0-RTT handshake.

// Fake this to make sure we set the connection pool stream limit correctly
// before use. This may result in OnCanCreateNewOutgoingStream with zero
// available streams.
OnCanCreateNewOutgoingStream(false);
Contributor:

If I'm reading EnvoyQuicClientSession::OnCanCreateNewOutgoingStream() correctly, if streamsAvailable() is 0 it will be a no-op. Are we sure we need to call it here when there are no streams available?

Contributor Author:

Why is it a no-op?
We'll call onMaxStreamsChanged(0) which will update that initial 100 streams to 0, no?

Contributor:

Man, I hate the GitHub UI. I missed that this method is changed in this PR specifically to make it not a no-op.

Lordy. Major reviewer fail on my part!

@RyanTheOptimist (Contributor) left a comment:

Man, not my day! Sorry for the confused review comments!


if (streams_available > 0) {
http_connection_callbacks_->onMaxStreamsChanged(streams_available);
}
http_connection_callbacks_->onMaxStreamsChanged(streams_available);
Contributor:

I should have thought about this in the previous PR, but I think we might want to tweak the name/documentation for this method. (I'd be happy to do this in a followup if that works for you).

The argument to this method is the number of currently available streams. But the "max streams" value in HTTP/3 is the largest stream id that can be opened (well, divided by four, but whatever :>). In other words, the terminology here is slightly misaligned. I think I would be inclined to rename this onStreamsAvailableChanged() or some such. WDYT?

Member:

I agree with the suggested name change. Also, be aware that there are two existing limits on pools: max concurrent requests, and max requests over the lifetime of the connection. For any of these cases, probably best to use a verbose name so it's clear which limit is being referenced.

@alyssawilk (Contributor Author), Oct 28, 2021:

I'm wary of this name. I think if we change it to onStreamsAvailableChanged, it weakens the clarity, since the current APIs are named after the protocol frames. We have
onMaxStreamsChanged (MAX_STREAMS frame for HTTP/3) and
onSettings (SETTINGS affecting the number of streams, among other things, for HTTP/2);
both change the streams available, but in very different ways.

Edit: would be happy to make the protocol more clear. onHttp3MaxStreamsChanged?

@ggreenway (Member):

Why is there initially zero available streams?

CI failure is real.

/wait


pool_.decrClusterStreamCapacity(stream_limit_);

// Make sure that when the connected event is raised, there is no call to
// onPoolReady, and the client is marked as busy.
Member:

If a connection is set up with an initial capacity of zero, do we assume it will become non-zero soon?

If so, is there any timer/timeout currently watching for this to take too long? I assume the connect_timeout is done already, because we're connected.

If not, should the pool create another connection? If we don't, won't the request be stuck here with no way to make forward progress?

Contributor Author:

I'm wary of creating new connections in this case; I'd assume new connections would also start with capacity 0.
In general, if we hit connection and stream limits, we queue without a connection timeout (#18748), so I think this is comparable.

Member:

When in this state, the downstream-request-timeout is still in place, correct? So if nothing happens, eventually we'll give up (assuming there is a configured timeout)?

Another thought on this: should we not count the connection as established, from the pool's perspective, until it has non-zero capacity? Maybe the existing connect_timeout should cover this condition, because it's logically not fully established yet.

Contributor Author:

I think if we change this to time out, we'd want to also change the way we time out HTTP/1 and HTTP/2 streams when we hit the connection + stream limits, and all would be outside the scope of this PR.
@mattklein123 for thoughts and additional visibility.

Member:

> I think if we change this to time out, we'd want to also change the way we time out HTTP/1 and HTTP/2 streams when we hit the connection + stream limits, and all would be outside the scope of this PR. @mattklein123 for thoughts and additional visibility.

I think I'm missing a lot of context here, but AIUI for H3 we can have a connected connection which cannot accept any streams? Is that correct? To me this seems fundamentally different from H1 and H2 in which a new connection should be able to accept at least 1 stream? Without thinking about it too much I tend to agree with @ggreenway that a connection that doesn't ever have at least 1 available stream isn't really connected and should timeout? I think this is different from H1 or H2? Either way I agree this can be done as a separate PR whatever we decide.

Member:

Do we handle the above corner case correctly for H2? If it's possible there I agree it's the same. Either way I suggest we file an issue to track this and then follow up later?

Contributor Author:

Looking, I think we'll black-hole the traffic, like we would for HTTP/3 before this series of fixes.
Filed #18880.

Member:

I think it is different for h2, because in h2 it is allowed to start sending frames (new stream, etc) before receiving a SETTINGS frame from the peer, which is what could set the max-streams to zero.

Also, if we get a SETTINGS frame with a stream limit of zero, we move the connection to draining and never use it again.

Member:

Also, this function doesn't make sense to me. It seems like we should use min(peer value, envoy limit), not to only adjust downward.

// Adjust the concurrent stream limit if the negotiated concurrent stream limit
// is lower than the local max configured streams.
//
// Note: if multiple streams are assigned to a connection before the settings
// are received, they may still be reset by the peer. This could be avoided by
// not considering http/2 connections connected until the SETTINGS frame is
// received, but that would result in a latency penalty instead.
void MultiplexedActiveClientBase::onSettings(ReceivedSettings& settings) {
  if (settings.maxConcurrentStreams().has_value() &&
      settings.maxConcurrentStreams().value() < concurrent_stream_limit_) {
    int64_t old_unused_capacity = currentUnusedCapacity();
    // Given config limits old_unused_capacity should never exceed int32_t.
    // TODO(alyssawilk) move remaining_streams_, concurrent_stream_limit_ and
    // currentUnusedCapacity() to be explicit int32_t
    ASSERT(std::numeric_limits<int32_t>::max() >= old_unused_capacity);
    concurrent_stream_limit_ = settings.maxConcurrentStreams().value();
    int64_t delta = old_unused_capacity - currentUnusedCapacity();
    parent_.decrClusterStreamCapacity(delta);
    ENVOY_CONN_LOG(trace, "Decreasing stream capacity by {}", *codec_client_, delta);
    negative_capacity_ += delta;
  }
  // As we don't increase stream limits when maxConcurrentStreams goes up, treat
  // a stream limit of 0 as a GOAWAY.
  if (concurrent_stream_limit_ == 0) {
    parent_.transitionActiveClientState(*this, ActiveClient::State::DRAINING);
  }
}

Contributor Author:

You're correct; that's why I filed a tracking bug for it :-)

@RyanTheOptimist (Contributor):

> Why is there initially zero available streams?

HTTP/3 is different from HTTP/2 in how it manages streams: stream management is handled at the QUIC layer (instead of the HTTP layer). With QUIC, the number of streams the client can open is 0 until the limit is raised by the server, either via a MAX_STREAMS frame after the handshake or via a transport param during the handshake. The vast majority of servers will set a non-zero limit in the transport param during the handshake, which means that as soon as the connection is connected, streams may be created. But some will not, and the spec explicitly permits this behavior.

(There are more stream limit differences but this explains why the limit is 0 initially)

Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
ggreenway previously approved these changes Nov 8, 2021

@ggreenway (Member) left a comment:

LGTM

Signed-off-by: Alyssa Wilk <[email protected]>
@alyssawilk alyssawilk merged commit f1ed59b into envoyproxy:main Nov 9, 2021
mpuncel added a commit to mpuncel/envoy that referenced this pull request Nov 10, 2021
* main: (71 commits)
  bazel: fix macOS build (envoyproxy#18920)
  http: switching from 100 to 1xx (envoyproxy#18904)
  grpc: implement BufferedAsyncClient for bidirectional gRPC stream (envoyproxy#18129)
  bazel: add repository arg to benchmark_test (envoyproxy#18795)
  rocketmq_proxy: Improvement for map find (envoyproxy#18909)
  tls: unit test: spiffe signed by intermediate cert (envoyproxy#18914)
  Test for FilterConfigPerRoute dtor called on worker thread. (envoyproxy#18927)
  deps: Bump `com_google_protobuf` -> 3.19.1 (envoyproxy#18930)
  deps: Bump `com_googlesource_code_re2` -> 2021-11-01 (envoyproxy#18933)
  cvescan: Move cvescan data to yaml (envoyproxy#18947)
  remove unnecessary file level not unimplemented hide annotation (envoyproxy#18924)
  test: moving echo test (envoyproxy#18938)
  test: fixing a test flake (envoyproxy#18899)
  deps: Revert pyparsing bump (envoyproxy#18946)
  deps: Bump `build_bazel_rules_apple` -> 0.32.0 (envoyproxy#18932)
  deps: Bump `com_github_bazelbuild_buildtools` -> 4.2.3 (envoyproxy#18931)
  build(deps): bump pycparser from 2.20 to 2.21 in /tools/dependency (envoyproxy#18936)
  quic: supporting connections with zero initial available streams (envoyproxy#18775)
  test: moving proxy proto (envoyproxy#18939)
  build(deps): bump pyparsing from 3.0.4 to 3.0.5 in /tools/dependency (envoyproxy#18937)
  ...

Signed-off-by: Michael Puncel <[email protected]>
@alyssawilk alyssawilk deleted the zero_streams branch August 4, 2022 00:57