[FEATURE] AWS SD: Add MSK Role#17600

Merged
SuperQ merged 1 commit into prometheus:main from matt-gp:aws-msk-service-discovery
Feb 5, 2026

Conversation

@matt-gp
Collaborator

@matt-gp matt-gp commented Nov 22, 2025

AWS offers a Kafka service called Amazon Managed Streaming for Apache Kafka (MSK). It lets users run Kafka clusters without the overhead of managing the infrastructure themselves.

As part of the monitoring configuration for these clusters, you can enable the JMX Exporter and/or Node Exporter, so that the nodes will expose metrics that users can collect.

There are some caveats with this:

  • Only works with provisioned clusters, i.e. not serverless.
  • Only Brokers expose Node Exporter metrics.
  • Only Brokers and KRaft Controller nodes expose JMX Exporter metrics, i.e. not ZooKeeper nodes.
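
The caveats above can be sketched as a small Go function that decides which scrape targets a node yields. This is only an illustration, not the code in this PR: the `NodeType` values and the `exporterTargets` helper are hypothetical names, and the ports (11001 for JMX Exporter, 11002 for Node Exporter) are taken from the AWS MSK open monitoring documentation rather than from this change.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// NodeType is a hypothetical enum for the kinds of nodes in an MSK cluster.
type NodeType string

const (
	Broker          NodeType = "BROKER"
	KRaftController NodeType = "CONTROLLER"
	ZooKeeper       NodeType = "ZOOKEEPER"
)

// Ports assumed from the AWS MSK open monitoring documentation.
const (
	jmxExporterPort  = 11001
	nodeExporterPort = 11002
)

// exporterTargets returns the host:port targets a node would expose,
// following the caveats listed in the PR description:
//   - JMX Exporter: Brokers and KRaft Controllers only.
//   - Node Exporter: Brokers only.
//   - ZooKeeper nodes expose neither.
func exporterTargets(host string, t NodeType, jmxEnabled, nodeEnabled bool) []string {
	var targets []string
	if jmxEnabled && (t == Broker || t == KRaftController) {
		targets = append(targets, net.JoinHostPort(host, strconv.Itoa(jmxExporterPort)))
	}
	if nodeEnabled && t == Broker {
		targets = append(targets, net.JoinHostPort(host, strconv.Itoa(nodeExporterPort)))
	}
	return targets
}

func main() {
	// Hostnames here are placeholders, not real MSK endpoints.
	fmt.Println(exporterTargets("b-1.example", Broker, true, true))
	fmt.Println(exporterTargets("c-1.example", KRaftController, true, true))
	fmt.Println(exporterTargets("z-1.example", ZooKeeper, true, true))
}
```

Serverless clusters are not modelled here because, per the first caveat, they expose no exporter endpoints at all.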

Which issue(s) does the PR fix:

Does this PR introduce a user-facing change?

[FEATURE] AWS SD: Add MSK Role #17600

@matt-gp matt-gp force-pushed the aws-msk-service-discovery branch 7 times, most recently from 9d399e9 to d88b2a1 Compare November 22, 2025 15:35
@matt-gp matt-gp changed the title AWS SD: Add MSK Role [FEATURE] AWS SD: Add MSK Role Nov 22, 2025
@krajorama
Member

Hello from the bug scrub!

We haven't identified a reviewer at this time. Who would be a good candidate? Also pinging @sysadmind @alanprot possibly?

@matt-gp
Collaborator Author

matt-gp commented Nov 25, 2025

@SuperQ might be a good reviewer, as he has reviewed some of the other AWS service discovery code.

@SuperQ SuperQ self-requested a review November 25, 2025 15:40
@matt-gp matt-gp force-pushed the aws-msk-service-discovery branch from d88b2a1 to f48cb27 Compare November 25, 2025 20:20
@matt-gp matt-gp requested a review from SuperQ November 25, 2025 20:22
@matt-gp matt-gp force-pushed the aws-msk-service-discovery branch 2 times, most recently from a57ffbb to 200045a Compare November 25, 2025 21:36
@matt-gp matt-gp force-pushed the aws-msk-service-discovery branch 4 times, most recently from 1a82c0f to dbfefd7 Compare December 19, 2025 15:29
@matt-gp
Collaborator Author

matt-gp commented Jan 2, 2026

We should rebase on top of #17769 and then add a dedicated MSK test to TestMultipleSDConfigsDoNotShareState

@matt-gp matt-gp marked this pull request as draft January 2, 2026 14:10
@matt-gp matt-gp force-pushed the aws-msk-service-discovery branch 4 times, most recently from bb56360 to 18a5794 Compare January 22, 2026 20:09
@matt-gp matt-gp marked this pull request as ready for review January 22, 2026 20:49
@matt-gp matt-gp force-pushed the aws-msk-service-discovery branch from 18a5794 to 636c2b0 Compare January 22, 2026 20:53
@matt-gp matt-gp force-pushed the aws-msk-service-discovery branch from 636c2b0 to b46a8bf Compare January 22, 2026 21:20
@matt-gp
Collaborator Author

matt-gp commented Jan 22, 2026

@SuperQ would you be able to take another look when you get a chance?

@matt-gp matt-gp force-pushed the aws-msk-service-discovery branch from b46a8bf to a65d075 Compare January 28, 2026 18:47
@matt-gp matt-gp requested review from a team and sysadmind as code owners January 28, 2026 18:47
@matt-gp
Collaborator Author

matt-gp commented Jan 28, 2026

@sysadmind would you be able to look when you get a chance?

Contributor

@sysadmind sysadmind left a comment

This looks okay to me. I don't have much ability to support MSK myself, so this would mostly be on you, @matt-gp to maintain.

@matt-gp
Collaborator Author

matt-gp commented Jan 31, 2026

Hi, I have an immediate need for this so am happy to maintain it.

@matt-gp
Collaborator Author

matt-gp commented Feb 5, 2026

@SuperQ Sorry to nudge you, are you able to take a look? I have another PR that this is holding up. Thanks

Member

@SuperQ SuperQ left a comment

LGTM, sorry, this review got lost in my inbox.

@SuperQ SuperQ merged commit cf9d093 into prometheus:main Feb 5, 2026
32 checks passed
@matt-gp matt-gp deleted the aws-msk-service-discovery branch February 5, 2026 14:59
wbollock pushed a commit to wbollock/prometheus that referenced this pull request Feb 6, 2026
…timestamp) (prometheus#17411) (prometheus#17600)

Relates to
prometheus#16944 (comment)

Signed-off-by: bwplotka <[email protected]>
Signed-off-by: matt-gp <[email protected]>
Co-authored-by: Bartlomiej Plotka <[email protected]>
Signed-off-by: Will Bollock <[email protected]>
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Mar 12, 2026
##### [`v3.10.0`](https://github.com/prometheus/prometheus/releases/tag/v3.10.0)

Prometheus now offers a distroless Docker image variant alongside the default
busybox image. The distroless variant provides enhanced security with a minimal
base image, uses UID/GID 65532 (nonroot) instead of nobody, and removes the
VOLUME declaration. Both variants are available with `-busybox` and `-distroless`
tag suffixes (e.g., `prom/prometheus:latest-busybox`, `prom/prometheus:latest-distroless`).
The busybox image remains the default with no suffix for backwards compatibility
(e.g., `prom/prometheus:latest` points to the busybox variant).

For users migrating existing **named** volumes from the busybox image to the distroless variant, the ownership can be adjusted with:

```
docker run --rm -v prometheus-data:/prometheus alpine chown -R 65532:65532 /prometheus
```

Then, the container can be started with the old volume with:

```
docker run -v prometheus-data:/prometheus prom/prometheus:latest-distroless
```

Users migrating from bind mounts might need to adjust permissions too, depending on their setup.

- \[CHANGE] Alerting: Add `alertmanager` dimension to the following metrics: `prometheus_notifications_dropped_total`, `prometheus_notifications_queue_capacity`, `prometheus_notifications_queue_length`. [#16355](prometheus/prometheus#16355)
- \[CHANGE] UI: Hide expanded alert annotations by default, enabling more information density on the `/alerts` page. [#17611](prometheus/prometheus#17611)
- \[FEATURE] AWS SD: Add MSK Role. [#17600](prometheus/prometheus#17600)
- \[FEATURE] PromQL: Add `fill()` / `fill_left()` / `fill_right()` binop modifiers for specifying default values for missing series. [#17644](prometheus/prometheus#17644)
- \[FEATURE] Web: Add OpenAPI 3.2 specification for the HTTP API at `/api/v1/openapi.yaml`. [#17825](prometheus/prometheus#17825)
- \[FEATURE] Dockerfile: Add distroless image variant using UID/GID 65532 and no VOLUME declaration. Busybox image remains default. [#17876](prometheus/prometheus#17876)
- \[FEATURE] Web: Add on-demand wall time profiling under `<URL>/debug/pprof/fgprof`. [#18027](prometheus/prometheus#18027)
- \[ENHANCEMENT] PromQL: Add more detail to histogram quantile monotonicity info annotations. [#15578](prometheus/prometheus#15578)
- \[ENHANCEMENT] Alerting: Independent alertmanager sendloops. [#16355](prometheus/prometheus#16355)
- \[ENHANCEMENT] TSDB: Experimental support for early compaction of stale series in the memory with configurable threshold `stale_series_compaction_threshold` in the config file. [#16929](prometheus/prometheus#16929)
- \[ENHANCEMENT] Service Discovery: Service discoveries are now removable from the Prometheus binary through the Go build tag `remove_all_sd` and individual service discoveries can be re-added with the build tags `enable_<sd name>_sd`. Users can build a custom Prometheus with only the necessary SDs for a smaller binary size. [#17736](prometheus/prometheus#17736)
- \[ENHANCEMENT] Promtool: Support promql syntax features `promql-duration-expr` and `promql-extended-range-selectors`. [#17926](prometheus/prometheus#17926)
- \[PERF] PromQL: Avoid unnecessary label extraction in PromQL functions. [#17676](prometheus/prometheus#17676)
- \[PERF] PromQL: Improve performance of regex matchers like `.*-.*-.*`. [#17707](prometheus/prometheus#17707)
- \[PERF] OTLP: Add label caching for OTLP-to-Prometheus conversion to reduce allocations and improve latency. [#17860](prometheus/prometheus#17860)
- \[PERF] API: Compute `/api/v1/targets/relabel_steps` in a single pass instead of re-running relabeling for each prefix. [#17969](prometheus/prometheus#17969)
- \[PERF] tsdb: Optimize LabelValues intersection performance for matchers. [#18069](prometheus/prometheus#18069)
- \[BUGFIX] PromQL: Prevent query strings containing only UTF-8 continuation bytes from crashing Prometheus. [#17735](prometheus/prometheus#17735)
- \[BUGFIX] Web: Fix missing `X-Prometheus-Stopping` header for `/-/ready` endpoint in `NotReady` state. [#17795](prometheus/prometheus#17795)
- \[BUGFIX] PromQL: Fix PromQL `info()` function returning empty results when filtering by a label that exists on both the input metric and `target_info`. [#17817](prometheus/prometheus#17817)
- \[BUGFIX] TSDB: Fix a bug during exemplar buffer grow/shrink that could cause exemplars to be incorrectly discarded. [#17863](prometheus/prometheus#17863)
- \[BUGFIX] UI: Fix broken graph display after page reload, due to broken Y axis min encoding/decoding. [#17869](prometheus/prometheus#17869)
- \[BUGFIX] TSDB: Fix memory leaks in buffer pools by clearing reference fields (Labels, Histogram pointers, metadata strings) before returning buffers to pools. [#17879](prometheus/prometheus#17879)
- \[BUGFIX] PromQL: info function: fix series without identifying labels not being returned. [#17898](prometheus/prometheus#17898)
- \[BUGFIX] OTLP: Filter `__name__` from OTLP attributes to prevent duplicate labels. [#17917](prometheus/prometheus#17917)
- \[BUGFIX] TSDB: Fix division by zero when computing stale series ratio with empty head. [#17952](prometheus/prometheus#17952)
- \[BUGFIX] OTLP: Fix potential silent data loss for sum metrics. [#17954](prometheus/prometheus#17954)
- \[BUGFIX] PromQL: Fix smoothed interpolation across counter resets. [#17988](prometheus/prometheus#17988)
- \[BUGFIX] PromQL: Fix panic with `@` modifier on empty ranges. [#18020](prometheus/prometheus#18020)
- \[BUGFIX] PromQL: Fix `avg_over_time` for a single native histogram. [#18058](prometheus/prometheus#18058)
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Mar 13, 2026