Switch `bazel` remote config from executor to cache with fallback by rdesgroppes · Pull Request #47945 · DataDog/datadog-agent

rdesgroppes · 2026-03-17T16:48:14Z

What does this PR do?

Adjust the cache Bazel config following a complete review of the remote flags:

- switch --remote_executor to --remote_cache, since the RBE infrastructure is not ready,
- add --incompatible_remote_local_fallback_for_remote_cache to enable local fallback on remote cache failures,
- remove --remote_local_fallback_strategy=sandboxed from the linux and macos configs, where it had become a dead letter.

Motivation

1. `--remote_executor` to `--remote_cache`

The build farm is used as a remote cache, not a remote executor.
Quoting @JSGette, who filed bazelbuild/bazel#29129:

> fallback is basically broken for remote executor if an endpoint is unavailable before the build. So if buildbarn was to fail during the build Bazel would just continue locally.

2. `--incompatible_remote_local_fallback_for_remote_cache`

The flag was introduced with Bazel 8.5.1 (bazelbuild/bazel#27996) since --remote_local_fallback alone does not cover cache-layer failures.
Both flags are required because they are AND-gated at the single decision point:

          boolean shouldLocalFallback =
              options.remoteLocalFallbackForRemoteCache && options.remoteLocalFallback;

Without both flags, a RemoteExecutionCapabilitiesException (capabilities fetch failure) causes a hard build failure instead of a local fallback.
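
To make the AND-gate concrete, here is a minimal sketch that mirrors the quoted expression and prints its truth table. The class `FallbackGate` and its method are hypothetical names for illustration only; this is not Bazel's code, just the same boolean logic under the assumption that the two options are plain booleans.

```java
/** Toy model of the AND-gated decision quoted above; not Bazel's actual class. */
public final class FallbackGate {
    private FallbackGate() {}

    /**
     * Returns true only when both flags are enabled, mirroring
     * remoteLocalFallbackForRemoteCache && remoteLocalFallback.
     */
    public static boolean shouldLocalFallback(
            boolean remoteLocalFallback, boolean remoteLocalFallbackForRemoteCache) {
        return remoteLocalFallbackForRemoteCache && remoteLocalFallback;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        // Enumerate all four flag combinations.
        for (boolean fallback : values) {
            for (boolean fallbackForCache : values) {
                System.out.printf(
                        "remote_local_fallback=%b, fallback_for_remote_cache=%b -> local fallback: %b%n",
                        fallback, fallbackForCache,
                        shouldLocalFallback(fallback, fallbackForCache));
            }
        }
    }
}
```

Only the last of the four combinations yields a local fallback, which is why the `cache` config needs both flags set.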

3. `--remote_local_fallback_strategy` removal

The flag is unreachable in a --remote_cache-only setup: its single call site is RemoteSpawnRunner.execLocally(), gated by RemoteExecutionService.mayBeExecutedRemotely(), which requires remoteExecutor != null.
It is also @Deprecated (bazelbuild/bazel#7480).
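
The reachability argument can be illustrated with a toy model. Everything below is a simplified assumption written for this explanation: `StrategyReachability`, `RemoteConfig`, and `strategyConsulted` are hypothetical names, and the endpoint string is a placeholder. Only the `remoteExecutor != null` guard is taken from the description above.

```java
import java.util.Optional;

/** Toy model of why the fallback strategy is unreachable without an executor; not Bazel's code. */
public final class StrategyReachability {
    private StrategyReachability() {}

    /** Hypothetical stand-in for the subset of remote options discussed here. */
    public static final class RemoteConfig {
        final String remoteExecutor;        // null when only a cache is configured
        final String remoteCache;
        final String localFallbackStrategy; // e.g. "sandboxed"

        public RemoteConfig(String remoteExecutor, String remoteCache, String localFallbackStrategy) {
            this.remoteExecutor = remoteExecutor;
            this.remoteCache = remoteCache;
            this.localFallbackStrategy = localFallbackStrategy;
        }
    }

    /** Mirrors the guard described above: remote execution requires an executor endpoint. */
    public static boolean mayBeExecutedRemotely(RemoteConfig config) {
        return config.remoteExecutor != null;
    }

    /** The strategy is only ever read on the remote-execution path. */
    public static Optional<String> strategyConsulted(RemoteConfig config) {
        if (!mayBeExecutedRemotely(config)) {
            return Optional.empty(); // cache-only setup: the flag is never looked at
        }
        return Optional.ofNullable(config.localFallbackStrategy);
    }

    public static void main(String[] args) {
        String endpoint = "grpcs://cache.example.invalid"; // placeholder endpoint
        RemoteConfig cacheOnly = new RemoteConfig(null, endpoint, "sandboxed");
        RemoteConfig withExecutor = new RemoteConfig(endpoint, endpoint, "sandboxed");
        System.out.println("cache only:    " + strategyConsulted(cacheOnly));
        System.out.println("with executor: " + strategyConsulted(withExecutor));
    }
}
```

In this model the cache-only configuration never reads the strategy value, which matches the claim that the flag can be dropped without changing behavior.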

Describe how you validated your changes

The flag is a no-op outside --config=cache (local builds unaffected by default), but CI benefits from it by default.

Additional Notes

The stale registerRemoteSpawnStrategy() javadoc ("otherwise does nothing" when no executor is set) is a known source-reading hazard: the null-check was deliberately removed in bazelbuild/bazel#13490 (2021), moving the guard to canExec().
The javadoc was never updated.

agent-platform-auto-pr · 2026-03-17T17:08:14Z

Files inventory check summary

File checks results against ancestor 5c143af1:

Results for datadog-agent_7.79.0~devel.git.316.4492ca3.pipeline.105301854-1_amd64.deb:

No change detected

agent-platform-auto-pr · 2026-03-17T17:18:22Z

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor 5c143af
📊 Static Quality Gates Dashboard
🔗 SQG Job

31 successful checks with minimal change (< 2 KiB)

	Quality gate	Current Size
✅	agent_deb_amd64	753.088 MiB
✅	agent_deb_amd64_fips	710.028 MiB
✅	agent_heroku_amd64	313.321 MiB
✅	agent_msi	604.869 MiB
✅	agent_rpm_amd64	753.072 MiB
✅	agent_rpm_amd64_fips	710.011 MiB
✅	agent_rpm_arm64	731.490 MiB
✅	agent_rpm_arm64_fips	691.450 MiB
✅	agent_suse_amd64	753.072 MiB
✅	agent_suse_amd64_fips	710.011 MiB
✅	agent_suse_arm64	731.490 MiB
✅	agent_suse_arm64_fips	691.450 MiB
✅	docker_agent_amd64	813.391 MiB
✅	docker_agent_arm64	816.579 MiB
✅	docker_agent_jmx_amd64	1004.306 MiB
✅	docker_agent_jmx_arm64	996.273 MiB
✅	docker_cluster_agent_amd64	203.957 MiB
✅	docker_cluster_agent_arm64	218.419 MiB
✅	docker_cws_instrumentation_amd64	7.142 MiB
✅	docker_cws_instrumentation_arm64	6.689 MiB
✅	docker_dogstatsd_amd64	39.234 MiB
✅	docker_dogstatsd_arm64	37.445 MiB
✅	dogstatsd_deb_amd64	29.878 MiB
✅	dogstatsd_deb_arm64	28.030 MiB
✅	dogstatsd_rpm_amd64	29.878 MiB
✅	dogstatsd_suse_amd64	29.878 MiB
✅	iot_agent_deb_amd64	43.285 MiB
✅	iot_agent_deb_arm64	40.332 MiB
✅	iot_agent_deb_armhf	41.084 MiB
✅	iot_agent_rpm_amd64	43.286 MiB
✅	iot_agent_suse_amd64	43.286 MiB

On-wire sizes (compressed)

	Quality gate	Change	Size (prev → curr → max)
✅	agent_deb_amd64	+14.28 KiB (0.01% increase)	174.806 → 174.820 → 178.360
✅	agent_deb_amd64_fips	-16.6 KiB (0.01% reduction)	165.367 → 165.351 → 172.790
✅	agent_heroku_amd64	+6.57 KiB (0.01% increase)	75.003 → 75.009 → 79.970
✅	agent_msi	+36.0 KiB (0.03% increase)	138.395 → 138.430 → 146.220
✅	agent_rpm_amd64	+5.65 KiB (0.00% increase)	177.629 → 177.635 → 181.830
✅	agent_rpm_amd64_fips	+25.45 KiB (0.01% increase)	167.663 → 167.688 → 173.370
✅	agent_rpm_arm64	+5.43 KiB (0.00% increase)	159.572 → 159.577 → 163.060
✅	agent_rpm_arm64_fips	-8.65 KiB (0.01% reduction)	151.437 → 151.429 → 156.170
✅	agent_suse_amd64	+5.65 KiB (0.00% increase)	177.629 → 177.635 → 181.830
✅	agent_suse_amd64_fips	+25.45 KiB (0.01% increase)	167.663 → 167.688 → 173.370
✅	agent_suse_arm64	+5.43 KiB (0.00% increase)	159.572 → 159.577 → 163.060
✅	agent_suse_arm64_fips	-8.65 KiB (0.01% reduction)	151.437 → 151.429 → 156.170
✅	docker_agent_amd64	+3.81 KiB (0.00% increase)	268.199 → 268.203 → 272.480
✅	docker_agent_arm64	neutral	255.390 MiB → 261.060
✅	docker_agent_jmx_amd64	neutral	336.843 MiB → 341.100
✅	docker_agent_jmx_arm64	-2.31 KiB (0.00% reduction)	320.032 → 320.029 → 325.620
✅	docker_cluster_agent_amd64	neutral	71.374 MiB → 72.920
✅	docker_cluster_agent_arm64	neutral	67.010 MiB → 68.220
✅	docker_cws_instrumentation_amd64	neutral	2.999 MiB → 3.330
✅	docker_cws_instrumentation_arm64	neutral	2.729 MiB → 3.090
✅	docker_dogstatsd_amd64	neutral	15.174 MiB → 15.820
✅	docker_dogstatsd_arm64	neutral	14.490 MiB → 14.830
✅	dogstatsd_deb_amd64	-2.24 KiB (0.03% reduction)	7.895 → 7.893 → 8.790
✅	dogstatsd_deb_arm64	neutral	6.777 MiB → 7.710
✅	dogstatsd_rpm_amd64	neutral	7.906 MiB → 8.800
✅	dogstatsd_suse_amd64	neutral	7.906 MiB → 8.800
✅	iot_agent_deb_amd64	neutral	11.401 MiB → 12.040
✅	iot_agent_deb_arm64	-2.44 KiB (0.02% reduction)	9.706 → 9.704 → 10.450
✅	iot_agent_deb_armhf	neutral	9.941 MiB → 10.620
✅	iot_agent_rpm_amd64	neutral	11.419 MiB → 12.060
✅	iot_agent_suse_amd64	neutral	11.419 MiB → 12.060

cit-pr-commenter-54b7da · 2026-03-17T17:33:21Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: d706f2d0-84b2-42ed-bb79-ebc1f40ada28

Baseline: e6be752
Comparison: e4c7857
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	docker_containers_cpu	% cpu utilization	+1.28	[-1.71, +4.26]	1	Logs

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	quality_gate_logs	% cpu utilization	+2.23	[+0.59, +3.86]	1	Logs bounds checks dashboard
➖	docker_containers_cpu	% cpu utilization	+1.28	[-1.71, +4.26]	1	Logs
➖	ddot_logs	memory utilization	+0.41	[+0.35, +0.48]	1	Logs
➖	docker_containers_memory	memory utilization	+0.24	[+0.16, +0.32]	1	Logs
➖	quality_gate_metrics_logs	memory utilization	+0.12	[-0.11, +0.36]	1	Logs bounds checks dashboard
➖	uds_dogstatsd_20mb_12k_contexts_20_senders	memory utilization	+0.09	[+0.04, +0.15]	1	Logs
➖	otlp_ingest_logs	memory utilization	+0.06	[-0.04, +0.17]	1	Logs
➖	uds_dogstatsd_to_api_v3	ingress throughput	+0.01	[-0.19, +0.21]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	+0.01	[-0.20, +0.21]	1	Logs
➖	tcp_syslog_to_blackhole	ingress throughput	-0.00	[-0.14, +0.14]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	-0.00	[-0.11, +0.10]	1	Logs
➖	quality_gate_idle	memory utilization	-0.01	[-0.06, +0.04]	1	Logs bounds checks dashboard
➖	file_to_blackhole_1000ms_latency	egress throughput	-0.01	[-0.44, +0.41]	1	Logs
➖	ddot_metrics_sum_delta	memory utilization	-0.02	[-0.19, +0.15]	1	Logs
➖	file_to_blackhole_0ms_latency	egress throughput	-0.04	[-0.57, +0.49]	1	Logs
➖	file_to_blackhole_500ms_latency	egress throughput	-0.05	[-0.42, +0.33]	1	Logs
➖	file_to_blackhole_100ms_latency	egress throughput	-0.08	[-0.17, +0.00]	1	Logs
➖	quality_gate_idle_all_features	memory utilization	-0.09	[-0.13, -0.06]	1	Logs bounds checks dashboard
➖	file_tree	memory utilization	-0.12	[-0.17, -0.06]	1	Logs
➖	ddot_metrics_sum_cumulativetodelta_exporter	memory utilization	-0.20	[-0.42, +0.03]	1	Logs
➖	ddot_metrics	memory utilization	-0.26	[-0.44, -0.08]	1	Logs
➖	ddot_metrics_sum_cumulative	memory utilization	-0.27	[-0.41, -0.13]	1	Logs
➖	otlp_ingest_metrics	memory utilization	-0.95	[-1.11, -0.79]	1	Logs

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	docker_containers_cpu	simple_check_run	10/10	688 ≥ 26
✅	docker_containers_memory	memory_usage	10/10	272.86MiB ≤ 370MiB
✅	docker_containers_memory	simple_check_run	10/10	704 ≥ 26
✅	file_to_blackhole_0ms_latency	memory_usage	10/10	0.19GiB ≤ 1.20GiB
✅	file_to_blackhole_0ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10	0.23GiB ≤ 1.20GiB
✅	file_to_blackhole_1000ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_100ms_latency	memory_usage	10/10	0.20GiB ≤ 1.20GiB
✅	file_to_blackhole_100ms_latency	missed_bytes	10/10	0B = 0B
✅	file_to_blackhole_500ms_latency	memory_usage	10/10	0.22GiB ≤ 1.20GiB
✅	file_to_blackhole_500ms_latency	missed_bytes	10/10	0B = 0B
✅	quality_gate_idle	intake_connections	10/10	3 = 3	bounds checks dashboard
✅	quality_gate_idle	memory_usage	10/10	174.26MiB ≤ 175MiB	bounds checks dashboard
✅	quality_gate_idle_all_features	intake_connections	10/10	3 = 3	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	492.63MiB ≤ 550MiB	bounds checks dashboard
✅	quality_gate_logs	intake_connections	10/10	4 ≤ 6	bounds checks dashboard
✅	quality_gate_logs	memory_usage	10/10	207.16MiB ≤ 220MiB	bounds checks dashboard
✅	quality_gate_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard
✅	quality_gate_metrics_logs	cpu_usage	10/10	428.13 ≤ 2000	bounds checks dashboard
✅	quality_gate_metrics_logs	intake_connections	10/10	4 ≤ 6	bounds checks dashboard
✅	quality_gate_metrics_logs	memory_usage	10/10	415.58MiB ≤ 475MiB	bounds checks dashboard
✅	quality_gate_metrics_logs	missed_bytes	10/10	0B = 0B	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".

CI Pass/Fail Decision

✅ Passed. All Quality Gates passed.

quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.

### What does this PR do?

Add `--incompatible_remote_local_fallback_for_remote_cache` to the `cache` config.

### Motivation

`--remote_local_fallback` only covers remote executor unavailability. Remote cache errors, often leading Bazel to exit with status code [34](https://dd.slack.com/archives/C08SK4B0FK8/p1773737357223759), bypass the fallback and fail the build. `--incompatible_remote_local_fallback_for_remote_cache`, introduced with Bazel 8.5.1 (bazelbuild/bazel#27996) extends the fallback to remote cache errors as well.

### Describe how you validated your changes

Local builds unaffected. The flag is a no-op outside `--config=cache`.

### Additional Notes

Also remove a now stale comment that said "fallback doesn't work for --remote_cache" since it does, with this flag turned on.

JSGette · 2026-03-31T19:07:02Z

> The build farm is used as a remote cache, not a remote executor.
> --remote_executor dispatched actions to a remote executor, which caused --remote_local_fallback to cover execution failures only, i.e. not cache failures.

Not exactly, fallback is basically broken for remote executor if an endpoint is unavailable before the build. So if buildbarn was to fail during the build Bazel would just continue locally.

rdesgroppes · 2026-03-31T22:29:10Z

> Not exactly, fallback is basically broken for remote executor if an endpoint is unavailable before the build. So if buildbarn was to fail during the build Bazel would just continue locally.

I updated the PR description quoting your words and mentioning the issue you filed upstream:

--remote_local_fallback bypassed when --remote_executor endpoint is unreachable bazelbuild/bazel#29129.

rdesgroppes added changelog/no-changelog No changelog entry needed qa/no-code-change No code change in Agent code requiring validation labels Mar 17, 2026

rdesgroppes marked this pull request as ready for review March 17, 2026 16:49

rdesgroppes requested a review from a team as a code owner March 17, 2026 16:49

dd-octo-sts Bot added internal Identify a non-fork PR team/agent-build labels Mar 17, 2026

github-actions Bot added the short review PR is simple enough to be reviewed quickly label Mar 17, 2026

JSGette reviewed Mar 18, 2026

Comment thread .bazelrc Outdated

github-actions Bot added medium review PR review might take time and removed short review PR is simple enough to be reviewed quickly labels Mar 18, 2026

rdesgroppes force-pushed the regis.desgroppes/bazel-remote-local-fallback-cache branch from 235686b to 5d6eea3 Compare March 31, 2026 15:58

rdesgroppes added 2 commits March 31, 2026 18:30

Address review comments

4492ca3

rdesgroppes force-pushed the regis.desgroppes/bazel-remote-local-fallback-cache branch from 5d6eea3 to 4492ca3 Compare March 31, 2026 16:30

rdesgroppes changed the title ~~Extend Bazel remote_local_fallback to remote cache errors~~ Switch bazel remote config from executor to cache with fallback Mar 31, 2026

JSGette approved these changes Mar 31, 2026

gh-worker-dd-mergequeue-cf854d Bot merged commit e4c7857 into main Mar 31, 2026
263 checks passed

gh-worker-dd-mergequeue-cf854d Bot deleted the regis.desgroppes/bazel-remote-local-fallback-cache branch March 31, 2026 23:12

github-actions Bot added this to the 7.79.0 milestone Mar 31, 2026
