grpclb-in-DNS clients fall back too early when the LB server ends the stream right after sending a server list #4887

@apolcyn

context: I'm working on a set of grpclb-in-DNS interop tests in this PR: grpc/grpc#16727, which uses a fake grpclb load balancer that's in: https://github.com/apolcyn/grpc-go/blob/grpclb_interop_client/interop/fake_grpclb/fake_grpclb.go

One of the scenarios is as follows:

  • One load balancer
    • the load balancer ends the stream with an OK status immediately after sending the server list.
  • One DNS server that serves an SRV record pointing to that load balancer
  • One "backend", which the load balancer points to
  • Zero "fallback" servers

The client, when run in this test scenario, is expected to contact the balancer, get the backend address, and complete the test RPC with the backend. The Java client, however, ends up attempting to fall back to the nonexistent fallback server right away.

I looked into this a bit and I believe the problem is as follows:

  1. the client receives the server list, and starts creating connections to backends
  2. the client receives the "end-of-stream" from the balancer immediately afterwards and checks if it should fall back
  3. at the moment the client decides whether to fall back, the newly created subchannels are still in the connecting state, so the client proceeds to fall back.

Because the fallback timer hasn't gone off yet, I believe the client should be waiting a bit longer to give the newly created subchannels more time to connect. I'd imagine that this balancer server behavior is OK and could be used e.g. when the balancer needs to shed load.
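To make the suggested behavior concrete, here is a minimal sketch of the decision I have in mind when the balancer stream ends. It is a simplified model, not the actual grpc-java code: `SubchannelState`, `FallbackSketch`, and `shouldFallBackOnStreamEnd` are hypothetical names, and the rule that a still-pending fallback timer should defer the decision while any backend is connecting is my assumption about the desired behavior.

```java
// Hypothetical, simplified model of the fallback decision; not grpc-java code.
enum SubchannelState { IDLE, CONNECTING, READY, TRANSIENT_FAILURE }

final class FallbackSketch {
    private FallbackSketch() {}

    /**
     * Decides whether to enter fallback at the moment the balancer stream ends.
     *
     * @param fallbackTimerPending true if the fallback timer has not fired yet
     * @param backends states of the subchannels created from the server list
     */
    static boolean shouldFallBackOnStreamEnd(
            boolean fallbackTimerPending, SubchannelState... backends) {
        boolean anyConnecting = false;
        for (SubchannelState state : backends) {
            if (state == SubchannelState.READY) {
                // A usable backend exists, so there is no reason to fall back.
                return false;
            }
            if (state == SubchannelState.CONNECTING) {
                anyConnecting = true;
            }
        }
        // Assumption: while the timer is still pending, give connecting
        // backends a chance instead of falling back immediately.
        if (fallbackTimerPending && anyConnecting) {
            return false;
        }
        return true;
    }
}
```

In the scenario above, the stream ends while the only backend is still connecting and the timer is pending, so `shouldFallBackOnStreamEnd(true, SubchannelState.CONNECTING)` returns `false` and the client would keep waiting for the backend rather than falling back.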

The test can be reproduced on a machine with docker installed, by checking out this PR, creating a ../grpc-go directory with this branch, and running:

tools/run_tests/run_grpclb_interop_tests.py -l java --scenarios_file=tools/run_tests/generated/lb_interop_test_scenarios.json scenario_name=client_referred_to_backend_insecure_short_stream_True --no_skips
