Skip to content

alts: Add retry loop when making RPC in ALTS's TestFullHandshake.#6183

Merged
easwars merged 5 commits intogrpc:masterfrom
matthewstevenson88:debugging-flake
Apr 11, 2023
Merged

alts: Add retry loop when making RPC in ALTS's TestFullHandshake.#6183
easwars merged 5 commits intogrpc:masterfrom
matthewstevenson88:debugging-flake

Conversation

@matthewstevenson88
Copy link
Copy Markdown
Contributor

@matthewstevenson88 matthewstevenson88 commented Apr 10, 2023

This fixes the flaky test introduced in #6177 and flagged in #6182. Before this PR, I could reproduce the flake ~1.7% of time on local runs. With this PR, I cannot reproduce the flake locally.

Fixes #6182

RELEASE NOTES: none

@matthewstevenson88 matthewstevenson88 changed the title alts: Add grpc.WithBlock() to ALTS TestFullHandshake to prevent flaki… alts: Add grpc.WithBlock() dial option in ALTS's TestFullHandshake. Apr 10, 2023
@matthewstevenson88 matthewstevenson88 marked this pull request as ready for review April 10, 2023 22:35
@matthewstevenson88 matthewstevenson88 changed the title alts: Add grpc.WithBlock() dial option in ALTS's TestFullHandshake. alts: Add retry loop when making RPC in ALTS's TestFullHandshake. Apr 11, 2023
@matthewstevenson88
Copy link
Copy Markdown
Contributor Author

@easwars This should be ready for review when you have a chance. :)

The failing test seems to be an unrelated XDS flake.

Comment thread credentials/alts/alts_test.go Outdated
c := testgrpc.NewTestServiceClient(conn)
if _, err = c.UnaryCall(ctx, &testpb.SimpleRequest{}, grpc.WaitForReady(true)); err != nil {
t.Errorf("c.UnaryCall() failed: %v", err)
for ; ctx.Err() == nil; <-time.After(5 * time.Second) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be more like this:

  • You have an overall deadline of 10s
  • You retry every 10ms
  • You make the RPCs without the WaitForReady call option
  • I'm not convinced that DeadlineExceeded is something where you retry. It should ideally be something else. You probably will see the correct code now if you make the RPC without the WaitForReady call option.
	const defaultTestTimeout = 10 * time.Second
	const defaultTestShortTimeout = 10 * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
	defer cancel()
	c := testgrpc.NewTestServiceClient(conn)
	// TODO: A comment explaining why we need to retry the RPC until it succeeds.
	for ; ctx.Err() == nil; <-time.After(defaultTestShortTimeout) {
		_, err = c.UnaryCall(ctx, &testpb.SimpleRequest{})
		if err == nil {
			break
		}
		if code := status.Code(err); code == <expected code when handshake fails> {
			continue
		}
		t.Fatalf("c.UnaryCall() failed with unexpected error: %v", err)
	}
	if ctx.Err() != nil {
		t.Fatalf("Timeout when waiting for UnaryCall RPC to succeed")
	}

What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, the WaitForReady option was masking an Unavailable error code, which makes more sense. And thanks for the tips! Implemented as you suggested. :)

@easwars easwars merged commit 06de8f8 into grpc:master Apr 11, 2023
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Oct 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: TestFullHandshake

2 participants