metrics: Export process metrics using prometheus-client #2552

Merged
olix0r merged 1 commit into main from ver/prom-process on Dec 6, 2023

@olix0r olix0r commented Dec 6, 2023

  • Move process metrics from linkerd-app-core to linkerd-metrics (with a feature flag).
  • Add a linkerd_metrics::prom::registry helper that automatically configures process metrics when the feature is enabled.
  • Add a process_threads metric to help surface when the multi-core proxy runtime is in use.
  • All of this uses prometheus-client to set up for future reusability.
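
For context on the new process_threads gauge, here is a minimal sketch of one way to read an OS thread count, assuming a Linux host where /proc/self/status exposes a `Threads:` field. It is not taken from this PR's diff: the function names `parse_thread_count` and `process_threads` are hypothetical, and the proxy may obtain the value differently.

```rust
use std::fs;

/// Extracts the `Threads:` field from text laid out like Linux's
/// `/proc/<pid>/status`. Returns `None` if the field is missing or
/// its value is not an unsigned integer.
/// (Hypothetical helper for illustration; not part of linkerd-metrics.)
fn parse_thread_count(status: &str) -> Option<u64> {
    status
        .lines()
        // Keep only the remainder of the line that starts with "Threads:".
        .find_map(|line| line.strip_prefix("Threads:"))
        // The value is separated from the key by whitespace (a tab on Linux).
        .and_then(|rest| rest.trim().parse().ok())
}

/// Reads the current process's OS thread count.
/// Assumption: Linux-style procfs is mounted; elsewhere this returns `None`.
fn process_threads() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_thread_count(&status)
}
```

A gauge fed from a value like this would read 2 in the "After" sample below, and a higher value would point to the multi-core runtime being in use.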

Before

# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1701551542
# HELP process_uptime_seconds_total Total time since the process started (in seconds)
# TYPE process_uptime_seconds_total counter
process_uptime_seconds_total 1782.137
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.72
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 111042560
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 33910784
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 28
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576
# HELP proxy_build_info Proxy build info
# TYPE proxy_build_info gauge
proxy_build_info{version="2.213.0",git_sha="9f7e7ac",profile="release",date="2023-11-16T23:24:26Z",vendor="linkerd"} 1

After

# HELP proxy_build_info Proxy build info.
# TYPE proxy_build_info gauge
proxy_build_info{date="2023-12-06T02:15:30Z",git_sha="9c29322d5",profile="release",vendor="code@ver-sea",version="0.0.0-dev.9c29322d5"} 1
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch).
# TYPE process_start_time_seconds gauge
# UNIT process_start_time_seconds seconds
process_start_time_seconds 1701829321.4647413
# HELP process_uptime_seconds Total time since the process started (in seconds)
# TYPE process_uptime_seconds counter
# UNIT process_uptime_seconds seconds
process_uptime_seconds_total 51.986633717
# HELP process_cpu_seconds Total user and system CPU time spent in seconds
# TYPE process_cpu_seconds counter
# UNIT process_cpu_seconds seconds
process_cpu_seconds_total 0.04
# HELP process_virtual_memory_bytes Virtual memory size in bytes
# TYPE process_virtual_memory_bytes gauge
# UNIT process_virtual_memory_bytes bytes
process_virtual_memory_bytes 108208128
# HELP process_resident_memory_bytes Resident memory size in bytes
# TYPE process_resident_memory_bytes gauge
# UNIT process_resident_memory_bytes bytes
process_resident_memory_bytes 27471872
# HELP process_open_fds Number of open file descriptors
# TYPE process_open_fds gauge
process_open_fds 21
# HELP process_max_fds Maximum number of open file descriptors
# TYPE process_max_fds gauge
process_max_fds 1048576
# HELP process_threads Number of OS threads in the process.
# TYPE process_threads gauge
process_threads 2
# EOF
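
The "After" output differs from "Before" in a way that is easy to miss: the HELP and TYPE lines for counters now name the family without the `_total` suffix (for example process_cpu_seconds), a UNIT line is added, and only the sample line keeps `_total`. The sketch below reproduces that layout by hand. It is a hypothetical `encode_counter` helper written against the standard library only, not prometheus-client's encoder.

```rust
use std::fmt::Write;

/// Renders one counter family in the layout seen in the "After" output.
/// `name` is the family name WITHOUT the `_total` suffix; the suffix is
/// appended only on the sample line.
/// (Hypothetical helper for illustration; not prometheus-client's API.)
fn encode_counter(
    out: &mut String,
    name: &str,
    help: &str,
    unit: Option<&str>,
    value: f64,
) -> std::fmt::Result {
    // Metadata lines refer to the family name.
    writeln!(out, "# HELP {name} {help}")?;
    writeln!(out, "# TYPE {name} counter")?;
    // The unit line is optional; several gauges in the sample have none.
    if let Some(unit) = unit {
        writeln!(out, "# UNIT {name} {unit}")?;
    }
    // The sample line carries the `_total` suffix.
    writeln!(out, "{name}_total {value}")
}
```

Because the sample names process_cpu_seconds_total and process_uptime_seconds_total are unchanged, existing queries against those series should keep matching; only the metadata lines changed shape.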

@olix0r olix0r requested a review from a team as a code owner December 6, 2023 02:28
codecov bot commented Dec 6, 2023

Codecov Report

Merging #2552 (73ef529) into main (f72cc7f) will increase coverage by 0.01%.
The diff coverage is 80.21%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2552      +/-   ##
==========================================
+ Coverage   67.54%   67.56%   +0.01%     
==========================================
  Files         330      330              
  Lines       14764    14781      +17     
==========================================
+ Hits         9973     9987      +14     
- Misses       4791     4794       +3     
Files Coverage Δ
linkerd/app/core/src/lib.rs 87.50% <ø> (ø)
linkerd/app/core/src/metrics.rs 98.49% <100.00%> (-0.03%) ⬇️
linkerd/app/inbound/src/test_util.rs 100.00% <100.00%> (ø)
linkerd/app/outbound/src/test_util.rs 100.00% <100.00%> (ø)
linkerd/app/src/lib.rs 87.11% <100.00%> (ø)
linkerd/metrics/src/counter.rs 91.30% <ø> (ø)
linkerd/metrics/src/lib.rs 100.00% <100.00%> (ø)
linkerd/tracing/src/lib.rs 50.00% <0.00%> (-0.67%) ⬇️
linkerd/metrics/src/process.rs 81.01% <81.01%> (ø)

... and 3 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@olix0r olix0r merged commit 31b2aea into main Dec 6, 2023
@olix0r olix0r deleted the ver/prom-process branch December 6, 2023 03:12
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 13, 2023
This change culminates recent work to restructure the balancer to use a
PoolQueue so that balancer changes may occur independently of request
processing. This replaces independent discovery buffering so that the
balancer task is responsible for polling discovery streams without
independent buffering. Requests are buffered and processed as soon as
the pool has available backends. Fail-fast circuit breaking is enforced
on the balancer's queue so that requests can't get stuck in a queue
indefinitely.

In general, the new balancer is instrumented directly with metrics, and
the relevant metric name prefix and labelset is provided by the stack.
These include detailed queue metrics, such as request (in-queue)
latency histograms, as well as failfast states, discovery update counts,
and balancer endpoint pool sizes.

---

* outbound: Move queues into the concrete stack (linkerd/linkerd2-proxy#2539)
* metrics: Remove unused features (linkerd/linkerd2-proxy#2542)
* Add the PoolQueue middleware (linkerd/linkerd2-proxy#2540)
* ci: Fixup codecov config (linkerd/linkerd2-proxy#2545)
* ci: Cancel prior runs (linkerd/linkerd2-proxy#2546)
* ci: Skip ARM builds during non-release CI (linkerd/linkerd2-proxy#2547)
* deps: Update tokio, tonic, and prost (linkerd/linkerd2-proxy#2544)
* build(deps): bump tj-actions/changed-files from 40.2.0 to 40.2.1 (linkerd/linkerd2-proxy#2549)
* metrics: Use prometheus-client for proxy_build_info (linkerd/linkerd2-proxy#2551)
* balance: Add a p2c Pool implementation (linkerd/linkerd2-proxy#2541)
* metrics: Export process metrics using prometheus-client (linkerd/linkerd2-proxy#2552)
* linkerd_identity: split `linkerd_identity::Id` into DNS and URI variants (linkerd/linkerd2-proxy#2538)
* outbound: Move HTTP balancer into its own module (linkerd/linkerd2-proxy#2554)
* app: Setup prom registry for use in balancers (linkerd/linkerd2-proxy#2555)
* vscode: Move workspace settings to devcontainer (linkerd/linkerd2-proxy#2557)
* build(deps): bump tj-actions/changed-files from 40.2.1 to 40.2.2 (linkerd/linkerd2-proxy#2556)
* balance: Instrument metrics in pool balancer (linkerd/linkerd2-proxy#2558)
* Enable PoolQueue balancer (linkerd/linkerd2-proxy#2559)

Signed-off-by: Oliver Gould <[email protected]>
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 14, 2023