Description
What is the issue?
After upgrading linkerd-proxy to cr.l5d.io/linkerd/proxy:stable-2.14.0, the grpc_status label on metrics is a full-sentence description containing whitespace, like "The operation completed successfully", rather than a numeric code like "0". The change was likely introduced in an earlier release.
This change breaks existing dashboards and monitoring that expect the numeric gRPC response codes. The description strings also contain whitespace and other characters that don't work well in tags.
How can it be reproduced?
It can be reproduced by running version stable-2.14.0 of the linkerd-proxy and making gRPC requests to a meshed service. The proxy's /metrics endpoint (Prometheus/OpenMetrics format) then returns output that looks something like:
route_response_total{direction="inbound",dst="algolia.thundercats.svc.cluster.local:8080",rt_route="GetSearchApiKey",status_code="200",classification="success",grpc_status="The operation completed successfully",error=""} 12
route_response_total{direction="inbound",dst="algolia.thundercats.svc.cluster.local:8080",rt_route="GetSearchApiKeys",status_code="200",classification="success",grpc_status="The operation completed successfully",error=""} 2
I think it'd also be pretty easy to reproduce this in a failing unit test case here: https://github.com/linkerd/linkerd2-proxy/blob/45b324f7b4bc5221b5ca796f79830d04ecce8e79/linkerd/app/core/src/metrics.rs#L425C48-L425C48
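To check scraped /metrics output for the regression without touching the proxy code, a small helper along these lines might be enough. This is only a sketch: label_value and grpc_status_is_numeric are hypothetical names, not part of linkerd2-proxy, and the parser assumes label values contain no escaped quotes.

```rust
/// Extracts the value of `label` from one Prometheus text-format sample line.
/// Hypothetical helper; assumes label values contain no escaped quotes.
fn label_value<'a>(line: &'a str, label: &str) -> Option<&'a str> {
    // Only look inside the `{...}` label set.
    let open = line.find('{')?;
    let close = line.rfind('}')?;
    let labels = line.get(open + 1..close)?;

    let needle = format!("{}=\"", label);
    let mut from = 0;
    while let Some(pos) = labels[from..].find(&needle) {
        let at = from + pos;
        // Require a label boundary so `status` does not match `grpc_status`.
        let at_boundary = at == 0 || labels.as_bytes()[at - 1] == b',';
        if at_boundary {
            let value_start = at + needle.len();
            let len = labels[value_start..].find('"')?;
            return Some(&labels[value_start..value_start + len]);
        }
        from = at + needle.len();
    }
    None
}

/// Returns true only when the line carries a `grpc_status` label whose
/// value parses as an unsigned integer.
fn grpc_status_is_numeric(line: &str) -> bool {
    label_value(line, "grpc_status")
        .map(|value| value.parse::<u32>().is_ok())
        .unwrap_or(false)
}
```

Run against the sample lines above, grpc_status_is_numeric would return false, since the label value is the description string rather than a number.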
Logs, error output, etc
route_response_total{direction="inbound",dst="algolia.thundercats.svc.cluster.local:8080",rt_route="GetSearchApiKey",status_code="200",classification="success",grpc_status="The operation completed successfully",error=""} 12
route_response_total{direction="inbound",dst="algolia.thundercats.svc.cluster.local:8080",rt_route="GetSearchApiKeys",status_code="200",classification="success",grpc_status="The operation completed successfully",error=""} 2
output of linkerd check -o short
linkerd-identity
----------------
‼ issuer cert is valid for at least 60 days
issuer certificate will expire on 2023-10-03T18:50:09Z
see https://linkerd.io/2.14/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints
linkerd-webhooks-and-apisvc-tls
-------------------------------
‼ proxy-injector cert is valid for at least 60 days
certificate will expire on 2023-10-02T18:50:08Z
see https://linkerd.io/2.14/checks/#l5d-proxy-injector-webhook-cert-not-expiring-soon for hints
‼ sp-validator cert is valid for at least 60 days
certificate will expire on 2023-10-02T18:04:59Z
see https://linkerd.io/2.14/checks/#l5d-sp-validator-webhook-cert-not-expiring-soon for hints
‼ policy-validator cert is valid for at least 60 days
certificate will expire on 2023-10-02T18:50:09Z
see https://linkerd.io/2.14/checks/#l5d-policy-validator-webhook-cert-not-expiring-soon for hints
linkerd-version
---------------
‼ cli is up-to-date
is running version 2.14.0 but the latest stable version is 2.14.1
see https://linkerd.io/2.14/checks/#l5d-version-cli for hints
control-plane-version
---------------------
‼ control plane is up-to-date
is running version 2.14.0 but the latest stable version is 2.14.1
see https://linkerd.io/2.14/checks/#l5d-version-control for hints
linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
some proxies are not running the current version:
* linkerd-destination-59b7b9ddbd-fw2gp (stable-2.14.0)
* linkerd-destination-59b7b9ddbd-gkp7t (stable-2.14.0)
* linkerd-destination-59b7b9ddbd-t5lzr (stable-2.14.0)
* linkerd-identity-759686967-7lddr (stable-2.14.0)
* linkerd-identity-759686967-cdfwm (stable-2.14.0)
* linkerd-identity-759686967-zq2xm (stable-2.14.0)
* linkerd-proxy-injector-64fcfb4cd7-8dm28 (stable-2.14.0)
* linkerd-proxy-injector-64fcfb4cd7-nntv6 (stable-2.14.0)
* linkerd-proxy-injector-64fcfb4cd7-v7b75 (stable-2.14.0)
see https://linkerd.io/2.14/checks/#l5d-cp-proxy-version for hints
linkerd-jaeger
--------------
‼ jaeger extension proxies are up-to-date
some proxies are not running the current version:
* collector-5f57dc685b-488nb (stable-2.14.0)
* jaeger-79dd465474-qmtnc (stable-2.14.0)
* jaeger-injector-84c8c45df4-l8z54 (stable-2.14.0)
see https://linkerd.io/2.14/checks/#l5d-jaeger-proxy-cp-version for hints
linkerd-viz
-----------
‼ tap API server cert is valid for at least 60 days
certificate will expire on 2023-10-02T22:52:07Z
see https://linkerd.io/2.14/checks/#l5d-tap-cert-not-expiring-soon for hints
‼ viz extension proxies are up-to-date
some proxies are not running the current version:
* metrics-api-86f769585-hwwrc (stable-2.14.0)
* prometheus-76d7bcc46f-zz7dd (stable-2.14.0)
* tap-55f596cf7-lcctv (stable-2.14.0)
* tap-55f596cf7-wcbb6 (stable-2.14.0)
* tap-55f596cf7-wst2z (stable-2.14.0)
* tap-injector-6656db5976-xcqdg (stable-2.14.0)
* web-7df59675cc-47977 (stable-2.14.0)
see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cp-version for hints
Status check results are √
Environment
- Kubernetes 2.25
- Linkerd 2.14
Possible solution
Add a call to .code() here so that a numeric code is emitted? https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/app/core/src/metrics.rs#L425C48-L425C48
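A rough illustration of the idea, not the actual proxy code: the Code enum below is a hypothetical stand-in for the real status code type, listing only three of the standard gRPC codes, and its Display impl is assumed to print a description, as the metrics output above suggests. If the real type behaves the same way, the label would need an explicit integer conversion rather than Display formatting. The two function names are made up for this sketch.

```rust
use std::fmt;

/// Hypothetical stand-in for the gRPC status code type used by the proxy.
/// Only three of the standard codes are listed; the numeric values follow
/// the gRPC specification (OK = 0, CANCELLED = 1, UNKNOWN = 2).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Code {
    Ok = 0,
    Cancelled = 1,
    Unknown = 2,
}

impl fmt::Display for Code {
    // Assumption: Display prints a human-readable description, which is
    // what appears in the grpc_status label today.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let description = match self {
            Code::Ok => "The operation completed successfully",
            Code::Cancelled => "The operation was cancelled",
            Code::Unknown => "Unknown error",
        };
        f.write_str(description)
    }
}

/// Formats the label through Display, reproducing the reported behavior.
fn label_with_description(code: Code) -> String {
    format!("grpc_status=\"{}\"", code)
}

/// Formats the label from the numeric discriminant instead.
fn label_with_number(code: Code) -> String {
    format!("grpc_status=\"{}\"", code as i32)
}
```

With this stand-in, label_with_number(Code::Ok) produces grpc_status="0", while label_with_description(Code::Ok) produces the full sentence seen in the metrics output.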
Additional context
No response
Would you like to work on fixing this bug?
yes