
Retries for gRPC services are not activated when the grpc-status field is present only in the response trailer #7701

@jberm

What is the issue?

When a gRPC server error occurs, some frameworks set the grpc-status field only in the response trailer. In that case linkerd does not retry the request; it appears to rely on grpc-status being present in the initial response headers rather than in the trailer that follows the payload.

It would be great if linkerd accommodated this behavior. I've seen multiple gRPC implementations, such as akka-grpc and Python's grpcio, set grpc-status only in the response trailer, which makes them incompatible with linkerd's retry logic. I've created a sample akka-grpc application to demonstrate the issue:
akka-grpc-status.tar.gz
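
To make the distinction concrete, here is a minimal sketch of the two cases. It is not linkerd's implementation; the GrpcResponse type, the function names, and the status code 13 (INTERNAL) are assumptions chosen for illustration. It only models where grpc-status can appear and how a check that reads the headers alone would miss a failure reported in the trailer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GrpcResponse:
    """Simplified, hypothetical model of a gRPC response's metadata."""
    headers: Dict[str, str] = field(default_factory=dict)
    trailers: Dict[str, str] = field(default_factory=dict)


def status_from_headers_only(resp: GrpcResponse) -> Optional[int]:
    """Read grpc-status from the initial headers only."""
    raw = resp.headers.get("grpc-status")
    return int(raw) if raw is not None else None


def status_from_headers_or_trailers(resp: GrpcResponse) -> Optional[int]:
    """Read grpc-status from the headers, falling back to the trailers."""
    raw = resp.headers.get("grpc-status")
    if raw is None:
        raw = resp.trailers.get("grpc-status")
    return int(raw) if raw is not None else None


def is_failure(status: Optional[int]) -> bool:
    """Treat any known non-zero status as a failure (0 is OK)."""
    return status is not None and status != 0


# Assumed error code 13 (INTERNAL); the sample app may use a different one.
greet1 = GrpcResponse(headers={"grpc-status": "13"}, trailers={"grpc-status": "13"})
greet2 = GrpcResponse(headers={}, trailers={"grpc-status": "13"})

for name, resp in (("greet1", greet1), ("greet2", greet2)):
    print(
        name,
        "headers-only failure:", is_failure(status_from_headers_only(resp)),
        "| with trailers:", is_failure(status_from_headers_or_trailers(resp)),
    )
```

In this model the headers-only check sees no failure for greet2, which matches the behavior described above: the request is never classified as failed, so it is never retried.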

How can it be reproduced?

You can verify this issue using the uploaded sample application. It requires the following tools:

  • java >= 11
  • minikube
  • linkerd-edge-22.4.1 cli

First install linkerd using the following command:

linkerd install --set proxyInit.runAsRoot=true | kubectl apply -f -

Follow these instructions to build the sample project and deploy it to minikube:

./sbt stage
eval $(minikube docker-env)

docker build -t akka-grpc-status/backend-service backend-service/target/universal/stage/
docker build -t akka-grpc-status/frontend-service frontend-service/target/universal/stage/

kubectl apply -f manifest.yaml

To observe the difference in retry behaviors, first connect to the frontend-service by running:

kubectl port-forward -n akka-grpc-status svc/frontend-service 8080:8080

Follow the logs of the backend service:

POD=$(kubectl get pods -n akka-grpc-status --no-headers -o custom-columns=":metadata.name" | grep backend-service)
kubectl logs -f -n akka-grpc-status $POD backend-service

Make a request to the endpoint that provides the grpc-status field in only the response trailer:

curl localhost:8080/greet2

Observe that the retry policy is not activated: "entering greet2" is printed only once in the backend-service logs.

Make a request to the endpoint that provides the grpc-status field in both the response header and trailer:

curl localhost:8080/greet1

Observe that the retry policy is activated: "entering greet1" is printed many times in the backend-service logs.

Logs, error output, etc

The backend-service logs from the sample application should look similar to the following:

entering greet2
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
entering greet1
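
As a quick way to check the retry behavior without reading the log by eye, the sketch below counts "entering ..." lines per endpoint in captured log text. The count_entries helper is hypothetical and assumes each backend invocation logs exactly one line of the form shown above.

```python
from collections import Counter


def count_entries(log_text: str) -> Counter:
    """Count 'entering <endpoint>' lines per endpoint in backend-service logs."""
    prefix = "entering "
    counts: Counter = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            counts[line[len(prefix):]] += 1
    return counts


# Same shape as the log above: one greet2 entry, then repeated greet1 entries.
sample_log = "entering greet2\n" + "entering greet1\n" * 19
counts = count_entries(sample_log)
for endpoint, n in sorted(counts.items()):
    # A count of 1 means the request reached the backend once and was not retried.
    print(f"{endpoint}: {n} attempt(s), retried={n > 1}")
```

The log text could be captured by redirecting the kubectl logs command above to a file and passing its contents to count_entries.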

Output of linkerd check -o short

Status check results are √

Environment

  • Kubernetes 1.21.2
  • Minikube, EKS
  • Mac OS X 10.15.7
  • edge-22.1.4

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

No response
