gRPC status code labels are reported as description strings rather than numbers #11449

@patrick-steadman

What is the issue?

After upgrading linkerd-proxy to cr.l5d.io/linkerd/proxy:stable-2.14.0, the grpc_status metric label is a full-sentence description containing whitespace, like "The operation completed successfully", rather than a numeric code like "2". This was likely introduced in an earlier release.

This change breaks existing dashboards and monitoring that expect numeric gRPC response codes. The description strings also contain whitespace and other characters that don't work well in tags.

How can it be reproduced?

It can be reproduced by running version stable-2.14.0 of the linkerd-proxy and making gRPC requests to a meshed service. The Prometheus/OpenMetrics output of the /metrics endpoint will then look something like:

route_response_total{direction="inbound",dst="algolia.thundercats.svc.cluster.local:8080",rt_route="GetSearchApiKey",status_code="200",classification="success",grpc_status="The operation completed successfully",error=""} 12
route_response_total{direction="inbound",dst="algolia.thundercats.svc.cluster.local:8080",rt_route="GetSearchApiKeys",status_code="200",classification="success",grpc_status="The operation completed successfully",error=""} 2
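To check a scrape for affected series, something like the following could be run over the text of /metrics. This is only a rough sketch: the function names are made up for illustration, the label parsing is simplified (it assumes label values contain no escaped quotes), and it assumes an empty grpc_status is what non-gRPC traffic reports.

```rust
/// Extracts the value of `label` from one Prometheus text-format sample line.
/// Simplified sketch: assumes label values contain no escaped quotes.
fn label_value<'a>(line: &'a str, label: &str) -> Option<&'a str> {
    let needle = format!("{}=\"", label);
    for (idx, _) in line.match_indices(needle.as_str()) {
        // Only accept a match that starts a label, i.e. follows `{` or `,`.
        let prefix = &line[..idx];
        if prefix.ends_with('{') || prefix.ends_with(',') {
            let start = idx + needle.len();
            let end = start + line[start..].find('"')?;
            return Some(&line[start..end]);
        }
    }
    None
}

/// True when the value is empty (assumed to mean non-gRPC traffic) or a
/// decimal gRPC status code in the defined range 0..=16.
fn is_numeric_or_empty(value: &str) -> bool {
    value.is_empty()
        || (value.bytes().all(|b| b.is_ascii_digit())
            && value.parse::<u8>().map_or(false, |code| code <= 16))
}

/// Returns every `grpc_status` value in `metrics` that is neither empty
/// nor a numeric code, skipping `#` comment lines.
fn non_numeric_grpc_statuses(metrics: &str) -> Vec<&str> {
    metrics
        .lines()
        .filter(|line| !line.starts_with('#'))
        .filter_map(|line| label_value(line, "grpc_status"))
        .filter(|value| !is_numeric_or_empty(value))
        .collect()
}
```

Passing the two sample lines above to non_numeric_grpc_statuses would return the description string twice; a proxy that emits numeric codes would yield an empty list.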

I think it'd also be pretty easy to reproduce this in a failing unit test case here: https://github.com/linkerd/linkerd2-proxy/blob/45b324f7b4bc5221b5ca796f79830d04ecce8e79/linkerd/app/core/src/metrics.rs#L425C48-L425C48

Logs, error output, etc

route_response_total{direction="inbound",dst="algolia.thundercats.svc.cluster.local:8080",rt_route="GetSearchApiKey",status_code="200",classification="success",grpc_status="The operation completed successfully",error=""} 12
route_response_total{direction="inbound",dst="algolia.thundercats.svc.cluster.local:8080",rt_route="GetSearchApiKeys",status_code="200",classification="success",grpc_status="The operation completed successfully",error=""} 2

output of linkerd check -o short

linkerd-identity
----------------
‼ issuer cert is valid for at least 60 days
    issuer certificate will expire on 2023-10-03T18:50:09Z
    see https://linkerd.io/2.14/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints

linkerd-webhooks-and-apisvc-tls
-------------------------------
‼ proxy-injector cert is valid for at least 60 days
    certificate will expire on 2023-10-02T18:50:08Z
    see https://linkerd.io/2.14/checks/#l5d-proxy-injector-webhook-cert-not-expiring-soon for hints
‼ sp-validator cert is valid for at least 60 days
    certificate will expire on 2023-10-02T18:04:59Z
    see https://linkerd.io/2.14/checks/#l5d-sp-validator-webhook-cert-not-expiring-soon for hints
‼ policy-validator cert is valid for at least 60 days
    certificate will expire on 2023-10-02T18:50:09Z
    see https://linkerd.io/2.14/checks/#l5d-policy-validator-webhook-cert-not-expiring-soon for hints

linkerd-version
---------------
‼ cli is up-to-date
    is running version 2.14.0 but the latest stable version is 2.14.1
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 2.14.0 but the latest stable version is 2.14.1
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
        * linkerd-destination-59b7b9ddbd-fw2gp (stable-2.14.0)
        * linkerd-destination-59b7b9ddbd-gkp7t (stable-2.14.0)
        * linkerd-destination-59b7b9ddbd-t5lzr (stable-2.14.0)
        * linkerd-identity-759686967-7lddr (stable-2.14.0)
        * linkerd-identity-759686967-cdfwm (stable-2.14.0)
        * linkerd-identity-759686967-zq2xm (stable-2.14.0)
        * linkerd-proxy-injector-64fcfb4cd7-8dm28 (stable-2.14.0)
        * linkerd-proxy-injector-64fcfb4cd7-nntv6 (stable-2.14.0)
        * linkerd-proxy-injector-64fcfb4cd7-v7b75 (stable-2.14.0)
    see https://linkerd.io/2.14/checks/#l5d-cp-proxy-version for hints

linkerd-jaeger
--------------
‼ jaeger extension proxies are up-to-date
    some proxies are not running the current version:
        * collector-5f57dc685b-488nb (stable-2.14.0)
        * jaeger-79dd465474-qmtnc (stable-2.14.0)
        * jaeger-injector-84c8c45df4-l8z54 (stable-2.14.0)
    see https://linkerd.io/2.14/checks/#l5d-jaeger-proxy-cp-version for hints

linkerd-viz
-----------
‼ tap API server cert is valid for at least 60 days
    certificate will expire on 2023-10-02T22:52:07Z
    see https://linkerd.io/2.14/checks/#l5d-tap-cert-not-expiring-soon for hints
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
        * metrics-api-86f769585-hwwrc (stable-2.14.0)
        * prometheus-76d7bcc46f-zz7dd (stable-2.14.0)
        * tap-55f596cf7-lcctv (stable-2.14.0)
        * tap-55f596cf7-wcbb6 (stable-2.14.0)
        * tap-55f596cf7-wst2z (stable-2.14.0)
        * tap-injector-6656db5976-xcqdg (stable-2.14.0)
        * web-7df59675cc-47977 (stable-2.14.0)
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cp-version for hints

Status check results are √

Environment

  • Kubernetes 2.25
  • Linkerd 2.14

Possible solution

Add a call to .code() here so that a numeric code is emitted? https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/app/core/src/metrics.rs#L425C48-L425C48
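To illustrate the idea, here is a minimal, self-contained sketch. It does not use the proxy's actual types: Code below is a stand-in enum I made up, with only three of the gRPC codes listed, and only the Ok description is taken from the output above (the other two strings are placeholders). The sketch converts the code to an integer with a cast; whether the real type offers a .code() method or needs a different conversion is something I have not verified.

```rust
use std::fmt;

/// Stand-in for a gRPC status code type (hypothetical, not the proxy's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Code {
    Ok = 0,
    Cancelled = 1,
    Unknown = 2,
}

impl fmt::Display for Code {
    /// Prints a human-readable description. Only the `Ok` text comes from
    /// the reported metrics; the other strings are placeholders.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Code::Ok => "The operation completed successfully",
            Code::Cancelled => "cancelled (placeholder description)",
            Code::Unknown => "unknown (placeholder description)",
        })
    }
}

/// Label as seen in the report: formatting the code with `Display`
/// puts the description into the label value.
fn grpc_status_label_description(code: Code) -> String {
    format!("grpc_status=\"{}\"", code)
}

/// Label with a numeric value: the code is converted to an integer
/// before it is formatted.
fn grpc_status_label_numeric(code: Code) -> String {
    format!("grpc_status=\"{}\"", code as i32)
}
```

A unit test along these lines could assert that the label rendered for Code::Ok is grpc_status="0"; with the Display-based formatting that assertion fails, which would make it a reasonable regression test.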

Additional context

No response

Would you like to work on fixing this bug?

yes
