Skip to content

Make service discoveries removable through build tags#17736

Merged
ArthurSens merged 7 commits intomainfrom
removesds-buildtags
Jan 8, 2026
Merged

Make service discoveries removable through build tags#17736
ArthurSens merged 7 commits intomainfrom
removesds-buildtags

Conversation

@ArthurSens
Copy link
Copy Markdown
Member

@ArthurSens ArthurSens commented Dec 24, 2025

Prometheus ships with quite a few service discovery mechanisms, each bringing its own dependencies (AWS SDK, Azure SDK, Kubernetes client-go, etc.). This results in a ~170MB binary even for users who only need a subset of these SDs.
This PR introduces build tags that allow users to exclude specific service discoveries at compile time, reducing binary size for deployments that don't require all SD mechanisms.

This is useful for:

  • Edge/IoT deployments with limited storage
  • Single-cloud deployments that only need one cloud provider's SD
  • Other systems that import the Prometheus service discovery codebase, like the OTel-Collector, that want to reduce their binary size as well.

What I'm doing here is splitting the codebase that imports service discoveries into several files, each with its own build tag for exclusion, so that all service discoveries continue to be enabled by default.

Results

Build Configuration Binary Size Reduction
All SDs 170 MB
No k8s/aws/azure/gce 66 MB -61%
file+http only 49 MB -71%

Does this PR introduce a user-facing change?

[ENHANCEMENT] Service Discovery: Service discoveries are now removable from the Prometheus binary through the Go build tag `remove_all_sd` and individual service discoveries can be re-added with the build tags `enable_<sd name>_sd`. Users can build a custom Prometheus with only the necessary SDs for a smaller binary size.

@roidelapluie
Copy link
Copy Markdown
Member

I fail to understand the logic behind this.

Why do you import service discoveries you do not need?

Copy link
Copy Markdown
Member

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved but see @roidelapluie question

@ArthurSens
Copy link
Copy Markdown
Member Author

ArthurSens commented Jan 3, 2026

I fail to understand the logic behind this.

Why do you import service discoveries you do not need?

Could you elaborate? I'm struggling to understand the question... The change I'm doing is to enable not importing SDs you don't need 🤔

Before this change, all SD's imports are in the same file and they are all registered together by their init() function, like here for example. With the change in this PR I'm splitting the imports into separate files and each file have their respective tags for exclusion

@ArthurSens
Copy link
Copy Markdown
Member Author

One thing that I think might be missing here is documentation, I'm not sure where's the right place for that though

@SuperQ
Copy link
Copy Markdown
Member

SuperQ commented Jan 3, 2026

I think @roidelapluie is asking about why the existing plugins.yml / generator is not sufficient to configure the discovery inclusion.

This is what we've used so far to build customized binaries.

@ArthurSens
Copy link
Copy Markdown
Member Author

I think @roidelapluie is asking about why the existing plugins.yml / generator is not sufficient to configure the discovery inclusion.

This is what we've used so far to build customized binaries.

Oh gotcha! To be honest, I didn't notice that plugins.yml was already one way of doing this 😮. It doesn't work for downstream projects that use Prometheus Scrape manager as a library, though.

@SuperQ
Copy link
Copy Markdown
Member

SuperQ commented Jan 3, 2026

I think build tags are probably a better Go-native way to do the configuration. If we do this, we should remove the plugins.yml setup so we keep things consistent.

@roidelapluie
Copy link
Copy Markdown
Member

I think build tags are probably a better Go-native way to do the configuration. If we do this, we should remove the plugins.yml setup so we keep things consistent.

Yes, I think keeping both would be a distraction.

Oh gotcha! To be honest, I didn't notice that plugins.yml was already one way of doing this 😮. It doesn't work for downstream projects that use Prometheus Scrape manager as a library, though.

In the initial design, those projects should simply not import the libraries then and have their own "plugins" package which imports the SD they want.

@ArthurSens
Copy link
Copy Markdown
Member Author

Oh gotcha! To be honest, I didn't notice that plugins.yml was already one way of doing this 😮. It doesn't work for downstream projects that use Prometheus Scrape manager as a library, though.

In the initial design, those projects should simply not import the libraries then and have their own "plugins" package which imports the SD they want.

Got it, I guess that works for most use cases with exception for the OTel Collector 🙈. The collector was designed to be as customizable as possible during build time, our goal there is to let users build their own collectors and not decide things for them

Signed-off-by: Arthur Silva Sens <[email protected]>
Signed-off-by: Arthur Silva Sens <[email protected]>
Signed-off-by: Arthur Silva Sens <[email protected]>
Signed-off-by: Arthur Silva Sens <[email protected]>
Signed-off-by: Arthur Silva Sens <[email protected]>
@ArthurSens ArthurSens force-pushed the removesds-buildtags branch from bec9083 to f198b86 Compare January 5, 2026 14:01
Co-authored-by: Julien <[email protected]>
Signed-off-by: Arthur Silva Sens <[email protected]>
@ArthurSens ArthurSens merged commit 14de1eb into main Jan 8, 2026
63 of 64 checks passed
@ArthurSens ArthurSens deleted the removesds-buildtags branch January 8, 2026 13:06
atoulme pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request Jan 21, 2026
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description
This PR is built on top of
prometheus/prometheus#17736. There are a lot of
go mod changes, and that also inherits fixes to interface changes that
were made since Prometheus 3.8

The only change that is really about making SDs removable is this change
in imports in `factory.go`:
```diff
-	_ "github.com/prometheus/prometheus/discovery/install" // init() of this package registers service discovery impl.
+	_ "github.com/prometheus/prometheus/plugins" // init() of this package registers service discovery impl.
```

<!-- Issue number (e.g. #1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes #44406

<!--Describe what testing was performed and which tests were added.-->
#### Testing

Used two builder configs and measured the binary size difference:

Without build tags(120MB):
```yaml
dist:
  module: github.com/open-telemetry/otelcontribcol
  name: otelcontribcol
  version: 0.143.0
  output_path: ./cmd/otelcontribcol/sd-test/full

receivers:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.143.0
    path: ./receiver/prometheusreceiver

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.143.0
```

With build tags(63MG):
```yaml
dist:
  module: github.com/open-telemetry/otelcontribcol
  name: otelcontribcol
  version: 0.143.0
  output_path: ./cmd/otelcontribcol/sd-test/no-sd
  build_tags: "remove_all_sd"

receivers:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.143.0
    path: ./receiver/prometheusreceiver

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.143.0
```

---------

Signed-off-by: Arthur Silva Sens <[email protected]>
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Mar 12, 2026
##### [\`v3.10.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.10.0)

Prometheus now offers a distroless Docker image variant alongside the default
busybox image. The distroless variant provides enhanced security with a minimal
base image, uses UID/GID 65532 (nonroot) instead of nobody, and removes the
VOLUME declaration. Both variants are available with `-busybox` and `-distroless`
tag suffixes (e.g., `prom/prometheus:latest-busybox`, `prom/prometheus:latest-distroless`).
The busybox image remains the default with no suffix for backwards compatibility
(e.g., `prom/prometheus:latest` points to the busybox variant).

For users migrating existing **named** volumes from the busybox image to the distroless variant, the ownership can be adjusted with:

```
docker run --rm -v prometheus-data:/prometheus alpine chown -R 65532:65532 /prometheus
```

Then, the container can be started with the old volume with:

```
docker run -v prometheus-data:/prometheus prom/prometheus:latest-distroless
```

User migrating from bind mounts might need to ajust permissions too, depending on their setup.

- \[CHANGE] Alerting: Add `alertmanager` dimension to following metrics: `prometheus_notifications_dropped_total`, `prometheus_notifications_queue_capacity`, `prometheus_notifications_queue_length`. [#16355](prometheus/prometheus#16355)
- \[CHANGE] UI: Hide expanded alert annotations by default, enabling more information density on the `/alerts` page. [#17611](prometheus/prometheus#17611)
- \[FEATURE] AWS SD: Add MSK Role. [#17600](prometheus/prometheus#17600)
- \[FEATURE] PromQL: Add `fill()` / `fill_left()` / `fill_right()` binop modifiers for specifying default values for missing series. [#17644](prometheus/prometheus#17644)
- \[FEATURE] Web: Add OpenAPI 3.2 specification for the HTTP API at `/api/v1/openapi.yaml`. [#17825](prometheus/prometheus#17825)
- \[FEATURE] Dockerfile: Add distroless image variant using UID/GID 65532 and no VOLUME declaration. Busybox image remains default. [#17876](prometheus/prometheus#17876)
- \[FEATURE] Web: Add on-demand wall time profiling under `<URL>/debug/pprof/fgprof`. [#18027](prometheus/prometheus#18027)
- \[ENHANCEMENT] PromQL: Add more detail to histogram quantile monotonicity info annotations. [#15578](prometheus/prometheus#15578)
- \[ENHANCEMENT] Alerting: Independent alertmanager sendloops. [#16355](prometheus/prometheus#16355)
- \[ENHANCEMENT] TSDB: Experimental support for early compaction of stale series in the memory with configurable threshold `stale_series_compaction_threshold` in the config file. [#16929](prometheus/prometheus#16929)
- \[ENHANCEMENT] Service Discovery: Service discoveries are now removable from the Prometheus binary through the Go build tag `remove_all_sd` and individual service discoveries can be re-added with the build tags `enable_<sd name>_sd`. Users can build a custom Prometheus with only the necessary SDs for a smaller binary size. [#17736](prometheus/prometheus#17736)
- \[ENHANCEMENT] Promtool: Support promql syntax features `promql-duration-expr` and `promql-extended-range-selectors`. [#17926](prometheus/prometheus#17926)
- \[PERF] PromQL: Avoid unnecessary label extraction in PromQL functions. [#17676](prometheus/prometheus#17676)
- \[PERF] PromQL: Improve performance of regex matchers like `.*-.*-.*`. [#17707](prometheus/prometheus#17707)
- \[PERF] OTLP: Add label caching for OTLP-to-Prometheus conversion to reduce allocations and improve latency. [#17860](prometheus/prometheus#17860)
- \[PERF] API: Compute `/api/v1/targets/relabel_steps` in a single pass instead of re-running relabeling for each prefix. [#17969](prometheus/prometheus#17969)
- \[PERF] tsdb: Optimize LabelValues intersection performance for matchers. [#18069](prometheus/prometheus#18069)
- \[BUGFIX] PromQL: Prevent query strings containing only UTF-8 continuation bytes from crashing Prometheus. [#17735](prometheus/prometheus#17735)
- \[BUGFIX] Web: Fix missing `X-Prometheus-Stopping` header for `/-/ready` endpoint in `NotReady` state. [#17795](prometheus/prometheus#17795)
- \[BUGFIX] PromQL: Fix PromQL `info()` function returning empty results when filtering by a label that exists on both the input metric and `target_info`. [#17817](prometheus/prometheus#17817)
- \[BUGFIX] TSDB: Fix a bug during exemplar buffer grow/shrink that could cause exemplars to be incorrectly discarded. [#17863](prometheus/prometheus#17863)
- \[BUGFIX] UI: Fix broken graph display after page reload, due to broken Y axis min encoding/decoding. [#17869](prometheus/prometheus#17869)
- \[BUGFIX] TSDB: Fix memory leaks in buffer pools by clearing reference fields (Labels, Histogram pointers, metadata strings) before returning buffers to pools. [#17879](prometheus/prometheus#17879)
- \[BUGFIX] PromQL: info function: fix series without identifying labels not being returned. [#17898](prometheus/prometheus#17898)
- \[BUGFIX] OTLP: Filter `__name__` from OTLP attributes to prevent duplicate labels. [#17917](prometheus/prometheus#17917)
- \[BUGFIX] TSDB: Fix division by zero when computing stale series ratio with empty head. [#17952](prometheus/prometheus#17952)
- \[BUGFIX] OTLP: Fix potential silent data loss for sum metrics. [#17954](prometheus/prometheus#17954)
- \[BUGFIX] PromQL: Fix smoothed interpolation across counter resets. [#17988](prometheus/prometheus#17988)
- \[BUGFIX] PromQL: Fix panic with `@` modifier on empty ranges. [#18020](prometheus/prometheus#18020)
- \[BUGFIX] PromQL: Fix `avg_over_time` for a single native histogram. [#18058](prometheus/prometheus#18058)
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Mar 13, 2026
##### [\`v3.10.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.10.0)

Prometheus now offers a distroless Docker image variant alongside the default
busybox image. The distroless variant provides enhanced security with a minimal
base image, uses UID/GID 65532 (nonroot) instead of nobody, and removes the
VOLUME declaration. Both variants are available with `-busybox` and `-distroless`
tag suffixes (e.g., `prom/prometheus:latest-busybox`, `prom/prometheus:latest-distroless`).
The busybox image remains the default with no suffix for backwards compatibility
(e.g., `prom/prometheus:latest` points to the busybox variant).

For users migrating existing **named** volumes from the busybox image to the distroless variant, the ownership can be adjusted with:

```
docker run --rm -v prometheus-data:/prometheus alpine chown -R 65532:65532 /prometheus
```

Then, the container can be started with the old volume with:

```
docker run -v prometheus-data:/prometheus prom/prometheus:latest-distroless
```

User migrating from bind mounts might need to ajust permissions too, depending on their setup.

- \[CHANGE] Alerting: Add `alertmanager` dimension to following metrics: `prometheus_notifications_dropped_total`, `prometheus_notifications_queue_capacity`, `prometheus_notifications_queue_length`. [#16355](prometheus/prometheus#16355)
- \[CHANGE] UI: Hide expanded alert annotations by default, enabling more information density on the `/alerts` page. [#17611](prometheus/prometheus#17611)
- \[FEATURE] AWS SD: Add MSK Role. [#17600](prometheus/prometheus#17600)
- \[FEATURE] PromQL: Add `fill()` / `fill_left()` / `fill_right()` binop modifiers for specifying default values for missing series. [#17644](prometheus/prometheus#17644)
- \[FEATURE] Web: Add OpenAPI 3.2 specification for the HTTP API at `/api/v1/openapi.yaml`. [#17825](prometheus/prometheus#17825)
- \[FEATURE] Dockerfile: Add distroless image variant using UID/GID 65532 and no VOLUME declaration. Busybox image remains default. [#17876](prometheus/prometheus#17876)
- \[FEATURE] Web: Add on-demand wall time profiling under `<URL>/debug/pprof/fgprof`. [#18027](prometheus/prometheus#18027)
- \[ENHANCEMENT] PromQL: Add more detail to histogram quantile monotonicity info annotations. [#15578](prometheus/prometheus#15578)
- \[ENHANCEMENT] Alerting: Independent alertmanager sendloops. [#16355](prometheus/prometheus#16355)
- \[ENHANCEMENT] TSDB: Experimental support for early compaction of stale series in the memory with configurable threshold `stale_series_compaction_threshold` in the config file. [#16929](prometheus/prometheus#16929)
- \[ENHANCEMENT] Service Discovery: Service discoveries are now removable from the Prometheus binary through the Go build tag `remove_all_sd` and individual service discoveries can be re-added with the build tags `enable_<sd name>_sd`. Users can build a custom Prometheus with only the necessary SDs for a smaller binary size. [#17736](prometheus/prometheus#17736)
- \[ENHANCEMENT] Promtool: Support promql syntax features `promql-duration-expr` and `promql-extended-range-selectors`. [#17926](prometheus/prometheus#17926)
- \[PERF] PromQL: Avoid unnecessary label extraction in PromQL functions. [#17676](prometheus/prometheus#17676)
- \[PERF] PromQL: Improve performance of regex matchers like `.*-.*-.*`. [#17707](prometheus/prometheus#17707)
- \[PERF] OTLP: Add label caching for OTLP-to-Prometheus conversion to reduce allocations and improve latency. [#17860](prometheus/prometheus#17860)
- \[PERF] API: Compute `/api/v1/targets/relabel_steps` in a single pass instead of re-running relabeling for each prefix. [#17969](prometheus/prometheus#17969)
- \[PERF] tsdb: Optimize LabelValues intersection performance for matchers. [#18069](prometheus/prometheus#18069)
- \[BUGFIX] PromQL: Prevent query strings containing only UTF-8 continuation bytes from crashing Prometheus. [#17735](prometheus/prometheus#17735)
- \[BUGFIX] Web: Fix missing `X-Prometheus-Stopping` header for `/-/ready` endpoint in `NotReady` state. [#17795](prometheus/prometheus#17795)
- \[BUGFIX] PromQL: Fix PromQL `info()` function returning empty results when filtering by a label that exists on both the input metric and `target_info`. [#17817](prometheus/prometheus#17817)
- \[BUGFIX] TSDB: Fix a bug during exemplar buffer grow/shrink that could cause exemplars to be incorrectly discarded. [#17863](prometheus/prometheus#17863)
- \[BUGFIX] UI: Fix broken graph display after page reload, due to broken Y axis min encoding/decoding. [#17869](prometheus/prometheus#17869)
- \[BUGFIX] TSDB: Fix memory leaks in buffer pools by clearing reference fields (Labels, Histogram pointers, metadata strings) before returning buffers to pools. [#17879](prometheus/prometheus#17879)
- \[BUGFIX] PromQL: info function: fix series without identifying labels not being returned. [#17898](prometheus/prometheus#17898)
- \[BUGFIX] OTLP: Filter `__name__` from OTLP attributes to prevent duplicate labels. [#17917](prometheus/prometheus#17917)
- \[BUGFIX] TSDB: Fix division by zero when computing stale series ratio with empty head. [#17952](prometheus/prometheus#17952)
- \[BUGFIX] OTLP: Fix potential silent data loss for sum metrics. [#17954](prometheus/prometheus#17954)
- \[BUGFIX] PromQL: Fix smoothed interpolation across counter resets. [#17988](prometheus/prometheus#17988)
- \[BUGFIX] PromQL: Fix panic with `@` modifier on empty ranges. [#18020](prometheus/prometheus#18020)
- \[BUGFIX] PromQL: Fix `avg_over_time` for a single native histogram. [#18058](prometheus/prometheus#18058)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants