feat(routing/http/server): expose prometheus metrics #718
Codecov Report
Attention: Patch coverage is
@@ Coverage Diff @@
## main #718 +/- ##
==========================================
+ Coverage 60.38% 60.40% +0.02%
==========================================
Files 243 243
Lines 31021 31039 +18
==========================================
+ Hits 18731 18749 +18
+ Misses 10626 10625 -1
- Partials 1664 1665 +1
... and 10 files with indirect coverage changes
Tested with #679 in ipfs/someguy#87 and lgtm.
Preview of metrics produced by this PR:
# HELP delegated_routing_server_http_request_duration_seconds The latency of the HTTP requests.
# TYPE delegated_routing_server_http_request_duration_seconds histogram
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="0.1"} 0
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="0.5"} 0
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="2"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="5"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="8"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="10"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="20"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="30"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_request_duration_seconds_sum{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 1.6826577409999999
delegated_routing_server_http_request_duration_seconds_count{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="0.1"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="0.5"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="2"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="5"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="8"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="10"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="20"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="30"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_request_duration_seconds_sum{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 3.079128329
delegated_routing_server_http_request_duration_seconds_count{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 2
# HELP delegated_routing_server_http_requests_inflight The number of inflight requests being handled at the same time.
# TYPE delegated_routing_server_http_requests_inflight gauge
delegated_routing_server_http_requests_inflight{handler="/routing/v1/peers/{peer-id}",service=""} 0
delegated_routing_server_http_requests_inflight{handler="/routing/v1/providers/{cid}",service=""} 0
# HELP delegated_routing_server_http_response_size_bytes The size of the HTTP responses.
# TYPE delegated_routing_server_http_response_size_bytes histogram
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="100"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="10000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="100000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+06"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+07"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+08"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+09"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_response_size_bytes_sum{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 398
delegated_routing_server_http_response_size_bytes_count{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="100"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1000"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="10000"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="100000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+06"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+07"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+08"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+09"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_response_size_bytes_sum{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 42133
delegated_routing_server_http_response_size_bytes_count{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 2
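The preview above can be read more easily once it is clear that each `le` bucket is cumulative: it counts every observation less than or equal to its bound, and `+Inf` counts all of them. The following is a minimal sketch of that bookkeeping, not the code used by this PR. The bounds are copied from the `le` labels above; the two observations are hypothetical values chosen only so that the result reproduces the `/routing/v1/providers/{cid}` duration rows, and `cumulativeBuckets` and `mean` are illustrative helper names.

```go
package main

import (
	"fmt"
	"sort"
)

// durationBounds are the upper bounds (in seconds) taken from the `le` labels
// of the request duration histogram shown above.
var durationBounds = []float64{0.1, 0.5, 1, 2, 5, 8, 10, 20, 30}

// cumulativeBuckets returns, for each bound, how many observations are less
// than or equal to it, followed by one final entry for +Inf.
func cumulativeBuckets(bounds, observations []float64) []uint64 {
	counts := make([]uint64, len(bounds)+1)
	for _, v := range observations {
		// Index of the first bound >= v; len(bounds) means only +Inf matches.
		i := sort.SearchFloat64s(bounds, v)
		for ; i < len(counts); i++ {
			counts[i]++
		}
	}
	return counts
}

// mean derives the average from the _sum and _count series of a histogram.
func mean(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

func main() {
	// Hypothetical observations (seconds); not taken from the PR.
	observations := []float64{0.05, 3.029128329}
	fmt.Println(cumulativeBuckets(durationBounds, observations))

	// _sum and _count for /routing/v1/providers/{cid} as shown above.
	fmt.Printf("mean latency: %.3fs\n", mean(3.079128329, 2))
}
```

Read this way, the providers rows say that one request finished within 0.1 seconds and the other took between 2 and 5 seconds.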
I'm going to merge this so we can bubble it up to someguy and delegated-ipfs.dev.
	}

	if server.promRegistry == nil {
		server.promRegistry = prometheus.NewRegistry()
iiuc this will disable metrics for users by default, requiring opt-in via WithPrometheusRegistry.
I noticed we already call NewRegistry() in other places, and changing this requires a slight refactor of tests, so let's keep this as-is and clean up in #722 without blocking this PR.
Yeah. Creating a new registry that the caller has no access to is an anti-pattern, because the caller can't expose those metrics.
We should definitely default to the default registry.
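The fallback discussed here can be sketched with a functional option. This is only an illustration of the pattern, not boxo's code. `Registry` and `DefaultRegistry` are hypothetical stand-ins so the example runs with the standard library alone. In client_golang the corresponding shared value is `prometheus.DefaultRegisterer`. `WithPrometheusRegistry` is the option name mentioned in the review, but its signature here is assumed.

```go
package main

import "fmt"

// Registry is a hypothetical stand-in for a Prometheus registry. It only
// remembers the names of the metrics registered with it.
type Registry struct{ names []string }

// Register records a metric name in the registry.
func (r *Registry) Register(name string) { r.names = append(r.names, name) }

// DefaultRegistry plays the role of the process-wide registry that an
// application typically already exposes on its metrics endpoint.
var DefaultRegistry = &Registry{}

type server struct{ promRegistry *Registry }

// Option configures a server.
type Option func(*server)

// WithPrometheusRegistry lets the caller supply its own registry.
// The name comes from the review; the signature is an assumption.
func WithPrometheusRegistry(reg *Registry) Option {
	return func(s *server) { s.promRegistry = reg }
}

func newServer(opts ...Option) *server {
	s := &server{}
	for _, opt := range opts {
		opt(s)
	}
	if s.promRegistry == nil {
		// Fall back to the shared default instead of a private registry,
		// so the metrics are visible without any opt-in.
		s.promRegistry = DefaultRegistry
	}
	s.promRegistry.Register("delegated_routing_server_http_request_duration_seconds")
	return s
}

func main() {
	newServer()
	fmt.Println("default registry:", DefaultRegistry.names)

	custom := &Registry{}
	newServer(WithPrometheusRegistry(custom))
	fmt.Println("custom registry:", custom.names)
}
```

With this shape, callers that pass nothing still get metrics on the registry they already expose, and callers that need isolation (tests, for example) can pass their own.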
…request timeouts (#87)
* fix: larger duration buckets for better visibility
* feat: log accept header
* fix: move instrumentation to boxo
* feat: add tracing with auth token
* feat: add 30 second request timeout
* chore: remove replace directive
* chore: add missing funcSampler
* chore: remove request timeout
  this isn't working too well. We need to look more deeply into this
* chore: update changelog
* chore: go mod tidy
* chore: go-libp2p-kad-dht v0.28.1
* chore: latest boxo#720
* chore: mod tidy
* chore: boxo main with ipfs/boxo#720 and ipfs/boxo#718
* Apply suggestions from code review
  Co-authored-by: Marcin Rataj <[email protected]>
* fix: typo
---------
Co-authored-by: Daniel N <[email protected]>
Co-authored-by: Marcin Rataj <[email protected]>
What's in this PR
Why
Until now, we relied on instrumentation in consumers of the delegated routing server, e.g. in someguy. The problem with instrumenting from the outside is that you cannot get endpoint- or handler-specific metrics.
The duration buckets were chosen based on probelab data and production someguy metrics. In many cases, requests take over 10 seconds, so the buckets extend up to 30 seconds.