Skip to content

Change RR to consider a subchannel in TF if it has failed since it was READY.#20245

Merged
markdroth merged 1 commit intogrpc:masterfrom
markdroth:rr_connectivity_state
Sep 19, 2019
Merged

Change RR to consider a subchannel in TF if it has failed since it was READY.#20245
markdroth merged 1 commit intogrpc:masterfrom
markdroth:rr_connectivity_state

Conversation

@markdroth
Copy link
Copy Markdown
Member

@markdroth markdroth commented Sep 12, 2019

This ensures that RR will report TRANSIENT_FAILURE (and thus fail pending RPCs instead of queuing them indefinitely) when all subchannels are failing, even if the subchannels are bouncing back and forth between CONNECTING and TRANSIENT_FAILURE.

Unfortunately, the tests I've added here don't actually prove that this change works correctly, because they pass even without the change. As noted in the TODO comments there, we currently don't actually seem to have a good way to test the case that this PR was meant to address (although if you can think of a way to do this, I'd love to hear it).


This change is Reviewable

@markdroth markdroth added the release notes: no Indicates if PR should not be in release notes label Sep 12, 2019
@markdroth markdroth requested a review from apolcyn as a code owner September 12, 2019 18:15
Copy link
Copy Markdown
Contributor

@AspirinSJL AspirinSJL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are Java and Go doing the same? @ejona86 @dfawley

Copy link
Copy Markdown
Contributor

@AspirinSJL AspirinSJL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @apolcyn)


src/core/ext/filters/client_channel/lb_policy/round_robin/round_robin.cc, line 388 at r1 (raw file):

do not change the aggregation state 

Nit: might be clearer to say something like "do not report state changes" so that the "not changing state" behavior sounds more like a subchannel limitation other than aggregation state limitation for every subchannel.


test/cpp/end2end/client_lb_end2end_test.cc, line 1294 at r1 (raw file):

  EXPECT_TRUE(WaitForChannelReady(channel.get()));
  // Now kill the servers.  The channel should transition to TRANSIENT_FAILURE.
  // TODO(roth): This test should ideally check that even when the

This is unfortunate but I can't think of any easy way to tune the connection time either.

@markdroth markdroth force-pushed the rr_connectivity_state branch from 15493f3 to 6cbd610 Compare September 16, 2019 15:48
Copy link
Copy Markdown
Member Author

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The impetus for this PR was a suggestion from Eric and Doug. I don't think they've made this change in Java or Go yet, but we'll probably do this as part of the xDS work, since we're thinking that we'll want to use this same aggregation algorithm across priorities in the xds policy.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @apolcyn)


src/core/ext/filters/client_channel/lb_policy/round_robin/round_robin.cc, line 388 at r1 (raw file):

Previously, AspirinSJL (Juanli Shen) wrote…
do not change the aggregation state 

Nit: might be clearer to say something like "do not report state changes" so that the "not changing state" behavior sounds more like a subchannel limitation other than aggregation state limitation for every subchannel.

I've updated the comment to hopefully make this clearer.

Copy link
Copy Markdown
Contributor

@AspirinSJL AspirinSJL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @apolcyn)

@markdroth markdroth force-pushed the rr_connectivity_state branch from 6cbd610 to 01eeec7 Compare September 18, 2019 16:22
Copy link
Copy Markdown
Contributor

@AspirinSJL AspirinSJL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @apolcyn)

@markdroth
Copy link
Copy Markdown
Member Author

Known failures: #18840

@markdroth markdroth added release notes: yes Indicates if PR needs to be in release notes lang/core and removed release notes: no Indicates if PR should not be in release notes labels Sep 19, 2019
@markdroth markdroth merged commit b0e428c into grpc:master Sep 19, 2019
@markdroth markdroth deleted the rr_connectivity_state branch September 19, 2019 15:37
@lock lock Bot locked as resolved and limited conversation to collaborators Dec 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

lang/core release notes: yes Indicates if PR needs to be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants