grpclb: skip fallback if the LB is already in fallback mode#8253
Merged
voidzcy merged 3 commits intogrpc:masterfrom Jun 11, 2021
Merged
Conversation
…gered by receiving updated addresses containing backend addresses only.
Contributor
Author
|
Updated to eliminate the strange code reuse of FallbackModeTask for checking if fallback operation is needed. Always manually check if it is already in fallback and do so directly if fallback operations are needed. PTAL. |
ejona86
approved these changes
Jun 11, 2021
Member
|
When merging, include all the useful details from this PR's description in the commit message, including the reference to the bug number. |
… receiving no LB addresses.
Contributor
|
I built our repro test against this PR and deployed, looks good so far. With grpc 1.38.0 the bigtable client would reliably fail writes but with this PR I'm not seeing any errors. |
voidzcy
added a commit
to voidzcy/grpc-java
that referenced
this pull request
Jun 15, 2021
Manually checks if the gRPCLB policy is already in fallback mode when trying to fallback due to receiving address update without LB addresses. Commit b956f88 added an invariant check in the FallbackModeTask runnable to ensure the task is fired only when the LB is not already in fallback mode. However, that commit missed the case that receiving address updates without LB addresses can trigger the run of FallbackModeTask runnable, because the existing implementation chose to reuse the code in FallbackModeTask. In such case, running FallbackModeTask could break the invariant check as the LB policy may already in fallback mode. This change eliminates the reuse of FallbackModeTask for handling address update without LB address. That is, every time receiving address update, we manually check if it is already in fallback instead of reusing to FallbackModeTask perform the check. Note there was a discussion brought up whether we should force entering fallback (shutdown existing subchannels) or we should still keep the balancer connection. Different languages have already diverged on this. Go shuts down the balancer connection and all subchannel connections to force using fallback addresses. C-core keep the balancer connection working and does not shutdown subchannels, only let fallback happens after the existing balancer connection and subchannel connections become broken. Java shuts down the balancer connection but not subchannels. This change does not try to change the existing behavior, but only fixes the invariant check breakage. ------------------- See bug reported in b/190700476
voidzcy
added a commit
that referenced
this pull request
Jun 15, 2021
… v1.38.x) (#8253) (#8259) Manually checks if the gRPCLB policy is already in fallback mode when trying to fallback due to receiving address update without LB addresses. Commit b956f88 added an invariant check in the FallbackModeTask runnable to ensure the task is fired only when the LB is not already in fallback mode. However, that commit missed the case that receiving address updates without LB addresses can trigger the run of FallbackModeTask runnable, because the existing implementation chose to reuse the code in FallbackModeTask. In such case, running FallbackModeTask could break the invariant check as the LB policy may already in fallback mode. This change eliminates the reuse of FallbackModeTask for handling address update without LB address. That is, every time receiving address update, we manually check if it is already in fallback instead of reusing to FallbackModeTask perform the check. Note there was a discussion brought up whether we should force entering fallback (shutdown existing subchannels) or we should still keep the balancer connection. Different languages have already diverged on this. Go shuts down the balancer connection and all subchannel connections to force using fallback addresses. C-core keep the balancer connection working and does not shutdown subchannels, only let fallback happens after the existing balancer connection and subchannel connections become broken. Java shuts down the balancer connection but not subchannels. This change does not try to change the existing behavior, but only fixes the invariant check breakage. ------------------- See bug reported in b/190700476 Backport of #8253
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When receiving updated addresses from the resolver with backend addresses only, gRPCLB will executes the fallback task. However, the invariant check added by #8035 was trying to ensure the fallback task is never fired if the LB is already in fallback mode (avoid overwriting the
fallbackReason). But seems #8035 missed the case that the fallback task can be run by receiving updated addresses with backend addresses only.This change eliminates the reuse of FallbackModeTask for handling address update without LB address. That is, every time receiving address update, we manually check if it is already in fallback instead of reusing to FallbackModeTask perform the check.
See reported error in b/190700476