system-probe: Add support for building and testing Rust binary #44635

Merged
dd-mergequeue[bot] merged 6 commits into main from vitkyrka/rust
Jan 13, 2026
@vitkyrka vitkyrka commented Dec 29, 2025

What does this PR do?

As a first step towards integrating the minimal Rust privileged agent for
discovery, add basic support for building a Rust binary and invoking it from
the Go tests, including in KMT. The binary is not shipped.
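
As a rough illustration of the "invoke it from the Go tests" step, the sketch below runs an external binary with `os/exec` and returns its output. The helper name `runHelper` and the `HELPER_BIN` environment variable are hypothetical; the PR's actual test wiring is not shown in this conversation and may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runHelper executes the binary at path with the given arguments and returns
// its trimmed standard output. A non-zero exit status is reported as an error
// that includes whatever the process wrote to standard error.
func runHelper(path string, args ...string) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(path, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("running %s: %w (stderr: %q)", path, err, stderr.String())
	}
	return strings.TrimSpace(stdout.String()), nil
}

func main() {
	// HELPER_BIN is a hypothetical variable pointing at the built binary.
	bin := os.Getenv("HELPER_BIN")
	if bin == "" {
		fmt.Println("HELPER_BIN is not set; nothing to run")
		return
	}
	out, err := runHelper(bin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

In a real test the same helper would typically be called from a `testing.T` function, with the failure reported through the test framework instead of `os.Exit`.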

Since this is the first use of the Rust toolchain in the Datadog agent build,
this is done using a dummy program in order to separate any build, toolchain
and CI related issues and concerns from any concerns about the actual new
privileged agent (which will be imported in a separate, later step).

Note that this does not yet address how to invoke Rust unit tests in the agent
build and in KMT; that will be dealt with separately.

Motivation

https://datadoghq.atlassian.net/browse/DSCVR-313

Describe how you validated your changes

CI

Additional Notes

The Rust binary is not statically linked, so it has issues running on
CentOS 7.9, which likely has an older glibc (I wasn't able to debug this fully
since I had issues setting up that distro in KMT locally).
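
One way to check the glibc hypothesis without access to the distro is to look at which versioned glibc symbols the binary imports. The sketch below does that with Go's `debug/elf` package; the function names are hypothetical and this was not part of the PR. It assumes a dynamically linked ELF binary whose symbols carry `GLIBC_x.y` version tags.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseGlibcVersion turns a symbol version such as "GLIBC_2.17" into its
// numeric components. ok is false for anything else (e.g. "GLIBC_PRIVATE").
func parseGlibcVersion(v string) (parts []int, ok bool) {
	if !strings.HasPrefix(v, "GLIBC_") {
		return nil, false
	}
	for _, field := range strings.Split(strings.TrimPrefix(v, "GLIBC_"), ".") {
		n, err := strconv.Atoi(field)
		if err != nil {
			return nil, false
		}
		parts = append(parts, n)
	}
	return parts, true
}

// versionLess reports whether version a is older than version b.
func versionLess(a, b []int) bool {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return len(a) < len(b)
}

// newestGlibc returns the highest GLIBC_x.y entry in versions, or "" if
// none of the entries is a numeric glibc version.
func newestGlibc(versions []string) string {
	best := ""
	var bestParts []int
	for _, v := range versions {
		parts, ok := parseGlibcVersion(v)
		if !ok {
			continue
		}
		if best == "" || versionLess(bestParts, parts) {
			best, bestParts = v, parts
		}
	}
	return best
}

// requiredGlibc opens an ELF file and returns the newest glibc symbol
// version it imports. A binary without a dynamic symbol table (for example
// a statically linked one) is expected to yield an error here.
func requiredGlibc(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	syms, err := f.ImportedSymbols()
	if err != nil {
		return "", err
	}
	versions := make([]string, 0, len(syms))
	for _, s := range syms {
		versions = append(versions, s.Version)
	}
	return newestGlibc(versions), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass the path of an ELF binary to inspect")
		return
	}
	v, err := requiredGlibc(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("newest glibc symbol version required:", v)
}
```

If the reported version is newer than the glibc installed on the target distro, the dynamic loader would refuse to start the binary, which would be consistent with the failure described above.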

Building the binary statically with -C target-feature=+crt-static did not work
because rcrt1.o is apparently missing from the toolchain.

A fix could have been to include and use the musl toolchain, which is the one
recommended for statically compiling Rust, but instead of complicating things
further I just skipped the tests on CentOS 7.x for now.
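
The skip decision can be sketched as a small predicate over the platform name and version. In the example below, reading `/etc/os-release` is only a stand-in so that the program is self-contained; the PR itself obtains these values from the agent's `kernel` package, and the names `parseOSRelease` and `shouldSkip` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseOSRelease extracts KEY=VALUE pairs from os-release style content,
// dropping surrounding quotes from the values.
func parseOSRelease(content string) map[string]string {
	fields := make(map[string]string)
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, found := strings.Cut(line, "=")
		if !found {
			continue
		}
		fields[key] = strings.Trim(value, `"'`)
	}
	return fields
}

// shouldSkip reports whether the dynamically linked test binary should be
// skipped, which here means any CentOS 7.x release.
func shouldSkip(platform, platformVersion string) bool {
	return platform == "centos" && strings.HasPrefix(platformVersion, "7")
}

func main() {
	data, err := os.ReadFile("/etc/os-release")
	if err != nil {
		fmt.Println("cannot read /etc/os-release:", err)
		return
	}
	fields := parseOSRelease(string(data))
	fmt.Println("skip:", shouldSkip(fields["ID"], fields["VERSION_ID"]))
}
```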

@vitkyrka vitkyrka added changelog/no-changelog No changelog entry needed qa/done QA done before merge and regressions are covered by tests team/agent-discovery labels Dec 29, 2025
@github-actions github-actions bot added component/system-probe medium review PR review might take time labels Dec 29, 2025
agent-platform-auto-pr bot commented Dec 29, 2025

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor 1b40b8c
📊 Static Quality Gates Dashboard

31 successful checks with minimal change (< 2 KiB)
| Quality gate | Current Size |
|---|---|
| agent_deb_amd64 | 705.502 MiB |
| agent_deb_amd64_fips | 700.787 MiB |
| agent_heroku_amd64 | 326.527 MiB |
| agent_msi | 570.926 MiB |
| agent_rpm_amd64 | 705.488 MiB |
| agent_rpm_amd64_fips | 700.774 MiB |
| agent_rpm_arm64 | 686.999 MiB |
| agent_rpm_arm64_fips | 683.140 MiB |
| agent_suse_amd64 | 705.488 MiB |
| agent_suse_amd64_fips | 700.774 MiB |
| agent_suse_arm64 | 686.999 MiB |
| agent_suse_arm64_fips | 683.140 MiB |
| docker_agent_amd64 | 767.236 MiB |
| docker_agent_arm64 | 773.367 MiB |
| docker_agent_jmx_amd64 | 958.115 MiB |
| docker_agent_jmx_arm64 | 952.965 MiB |
| docker_cluster_agent_amd64 | 180.749 MiB |
| docker_cluster_agent_arm64 | 196.618 MiB |
| docker_cws_instrumentation_amd64 | 7.135 MiB |
| docker_cws_instrumentation_arm64 | 6.689 MiB |
| docker_dogstatsd_amd64 | 38.785 MiB |
| docker_dogstatsd_arm64 | 37.065 MiB |
| dogstatsd_deb_amd64 | 30.004 MiB |
| dogstatsd_deb_arm64 | 28.152 MiB |
| dogstatsd_rpm_amd64 | 30.004 MiB |
| dogstatsd_suse_amd64 | 30.004 MiB |
| iot_agent_deb_amd64 | 42.998 MiB |
| iot_agent_deb_arm64 | 40.119 MiB |
| iot_agent_deb_armhf | 40.704 MiB |
| iot_agent_rpm_amd64 | 42.999 MiB |
| iot_agent_suse_amd64 | 42.999 MiB |

On-wire sizes (compressed)

| Quality gate | Change | Size (prev → curr → max) |
|---|---|---|
| agent_deb_amd64 | -26.27 KiB (0.01% reduction) | 173.431 → 173.405 → 174.490 |
| agent_deb_amd64_fips | +12.31 KiB (0.01% increase) | 172.365 → 172.377 → 173.750 |
| agent_heroku_amd64 | neutral | 87.061 MiB |
| agent_msi | +8.0 KiB (0.01% increase) | 142.754 → 142.762 → 143.020 |
| agent_rpm_amd64 | -9.31 KiB (0.01% reduction) | 176.088 → 176.078 → 177.660 |
| agent_rpm_amd64_fips | -9.75 KiB (0.01% reduction) | 175.154 → 175.145 → 176.600 |
| agent_rpm_arm64 | -2.37 KiB (0.00% reduction) | 159.495 → 159.493 → 161.260 |
| agent_rpm_arm64_fips | +3.69 KiB (0.00% increase) | 158.810 → 158.814 → 160.550 |
| agent_suse_amd64 | -9.31 KiB (0.01% reduction) | 176.088 → 176.078 → 177.660 |
| agent_suse_amd64_fips | -9.75 KiB (0.01% reduction) | 175.154 → 175.145 → 176.600 |
| agent_suse_arm64 | -2.37 KiB (0.00% reduction) | 159.495 → 159.493 → 161.260 |
| agent_suse_arm64_fips | +3.69 KiB (0.00% increase) | 158.810 → 158.814 → 160.550 |
| docker_agent_amd64 | neutral | 261.024 MiB |
| docker_agent_arm64 | -8.43 KiB (0.00% reduction) | 250.057 → 250.049 → 252.630 |
| docker_agent_jmx_amd64 | neutral | 329.664 MiB |
| docker_agent_jmx_arm64 | neutral | 314.667 MiB |
| docker_cluster_agent_amd64 | neutral | 63.858 MiB |
| docker_cluster_agent_arm64 | neutral | 60.134 MiB |
| docker_cws_instrumentation_amd64 | neutral | 2.994 MiB |
| docker_cws_instrumentation_arm64 | neutral | 2.726 MiB |
| docker_dogstatsd_amd64 | neutral | 15.014 MiB |
| docker_dogstatsd_arm64 | neutral | 14.341 MiB |
| dogstatsd_deb_amd64 | neutral | 7.938 MiB |
| dogstatsd_deb_arm64 | neutral | 6.813 MiB |
| dogstatsd_rpm_amd64 | neutral | 7.950 MiB |
| dogstatsd_suse_amd64 | neutral | 7.950 MiB |
| iot_agent_deb_amd64 | neutral | 11.264 MiB |
| iot_agent_deb_arm64 | +2.09 KiB (0.02% increase) | 9.628 → 9.631 → 10.450 |
| iot_agent_deb_armhf | -2.06 KiB (0.02% reduction) | 9.827 → 9.825 → 10.620 |
| iot_agent_rpm_amd64 | neutral | 11.278 MiB |
| iot_agent_suse_amd64 | neutral | 11.278 MiB |

cit-pr-commenter bot commented Dec 29, 2025

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 3268966e-2d21-4402-88e5-66d6e8c2c90e

Baseline: 0e29828
Comparison: 7f3b4d6
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| | docker_containers_cpu | % cpu utilization | -1.52 | [-4.50, +1.46] | 1 | Logs |

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| | ddot_metrics | memory utilization | +1.04 | [+0.81, +1.26] | 1 | Logs |
| | quality_gate_logs | % cpu utilization | +0.98 | [-0.50, +2.46] | 1 | Logs, bounds checks dashboard |
| | otlp_ingest_metrics | memory utilization | +0.41 | [+0.26, +0.56] | 1 | Logs |
| | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | +0.37 | [+0.14, +0.60] | 1 | Logs |
| | file_tree | memory utilization | +0.21 | [+0.15, +0.27] | 1 | Logs |
| | quality_gate_idle | memory utilization | +0.20 | [+0.15, +0.24] | 1 | Logs, bounds checks dashboard |
| | docker_containers_memory | memory utilization | +0.19 | [+0.12, +0.27] | 1 | Logs |
| | ddot_logs | memory utilization | +0.14 | [+0.07, +0.21] | 1 | Logs |
| | quality_gate_idle_all_features | memory utilization | +0.11 | [+0.07, +0.15] | 1 | Logs, bounds checks dashboard |
| | ddot_metrics_sum_cumulative | memory utilization | +0.10 | [-0.05, +0.26] | 1 | Logs |
| | tcp_dd_logs_filter_exclude | ingress throughput | +0.00 | [-0.08, +0.09] | 1 | Logs |
| | uds_dogstatsd_to_api | ingress throughput | -0.01 | [-0.14, +0.12] | 1 | Logs |
| | uds_dogstatsd_to_api_v3 | ingress throughput | -0.02 | [-0.14, +0.11] | 1 | Logs |
| | file_to_blackhole_1000ms_latency | egress throughput | -0.03 | [-0.46, +0.39] | 1 | Logs |
| | file_to_blackhole_0ms_latency | egress throughput | -0.03 | [-0.42, +0.35] | 1 | Logs |
| | file_to_blackhole_100ms_latency | egress throughput | -0.05 | [-0.10, -0.01] | 1 | Logs |
| | file_to_blackhole_500ms_latency | egress throughput | -0.06 | [-0.43, +0.31] | 1 | Logs |
| | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.09 | [-0.14, -0.03] | 1 | Logs |
| | otlp_ingest_logs | memory utilization | -0.16 | [-0.25, -0.06] | 1 | Logs |
| | ddot_metrics_sum_delta | memory utilization | -0.43 | [-0.64, -0.22] | 1 | Logs |
| | docker_containers_cpu | % cpu utilization | -1.52 | [-4.50, +1.46] | 1 | Logs |
| | quality_gate_metrics_logs | memory utilization | -1.79 | [-1.99, -1.60] | 1 | Logs, bounds checks dashboard |
| | tcp_syslog_to_blackhole | ingress throughput | -2.28 | [-2.40, -2.17] | 1 | Logs |

Bounds Checks: ✅ Passed

| perf | experiment | bounds_check_name | replicates_passed | links |
|---|---|---|---|---|
| | docker_containers_cpu | simple_check_run | 10/10 | |
| | docker_containers_memory | memory_usage | 10/10 | |
| | docker_containers_memory | simple_check_run | 10/10 | |
| | file_to_blackhole_0ms_latency | lost_bytes | 10/10 | |
| | file_to_blackhole_0ms_latency | memory_usage | 10/10 | |
| | file_to_blackhole_1000ms_latency | lost_bytes | 10/10 | |
| | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | |
| | file_to_blackhole_100ms_latency | lost_bytes | 10/10 | |
| | file_to_blackhole_100ms_latency | memory_usage | 10/10 | |
| | file_to_blackhole_500ms_latency | lost_bytes | 10/10 | |
| | file_to_blackhole_500ms_latency | memory_usage | 10/10 | |
| | quality_gate_idle | intake_connections | 10/10 | bounds checks dashboard |
| | quality_gate_idle | memory_usage | 10/10 | bounds checks dashboard |
| | quality_gate_idle_all_features | intake_connections | 10/10 | bounds checks dashboard |
| | quality_gate_idle_all_features | memory_usage | 10/10 | bounds checks dashboard |
| | quality_gate_logs | intake_connections | 10/10 | bounds checks dashboard |
| | quality_gate_logs | lost_bytes | 10/10 | bounds checks dashboard |
| | quality_gate_logs | memory_usage | 10/10 | bounds checks dashboard |
| | quality_gate_metrics_logs | cpu_usage | 10/10 | bounds checks dashboard |
| | quality_gate_metrics_logs | intake_connections | 10/10 | bounds checks dashboard |
| | quality_gate_metrics_logs | lost_bytes | 10/10 | bounds checks dashboard |
| | quality_gate_metrics_logs | memory_usage | 10/10 | bounds checks dashboard |

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
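
The three criteria above can be sketched as a single predicate. This is an illustration only: the `experiment` type and its field names are hypothetical and do not come from the Regression Detector's code, and the last sample row is made up to show a case that would be flagged.

```go
package main

import (
	"fmt"
	"math"
)

// experiment holds the values reported per row in the tables above.
type experiment struct {
	name         string
	deltaMeanPct float64 // estimated Δ mean %
	ciLow        float64 // lower bound of the 90% confidence interval
	ciHigh       float64 // upper bound of the 90% confidence interval
	erratic      bool    // true if the configuration marks it "erratic"
}

// effectSizeTolerance is the |Δ mean %| threshold quoted in the report.
const effectSizeTolerance = 5.0

// isRegression applies the three criteria: the effect is large enough, the
// confidence interval does not contain zero, and the experiment is not
// marked erratic.
func isRegression(e experiment) bool {
	bigEnough := math.Abs(e.deltaMeanPct) >= effectSizeTolerance
	ciExcludesZero := e.ciLow > 0 || e.ciHigh < 0
	return bigEnough && ciExcludesZero && !e.erratic
}

func main() {
	rows := []experiment{
		// Values taken from the tables above.
		{name: "tcp_syslog_to_blackhole", deltaMeanPct: -2.28, ciLow: -2.40, ciHigh: -2.17},
		{name: "docker_containers_cpu", deltaMeanPct: -1.52, ciLow: -4.50, ciHigh: 1.46, erratic: true},
		// Hypothetical row, not from the report.
		{name: "made_up_example", deltaMeanPct: -6.0, ciLow: -7.0, ciHigh: -5.0},
	}
	for _, e := range rows {
		fmt.Printf("%s: regression=%v\n", e.name, isRegression(e))
	}
}
```

Note that tcp_syslog_to_blackhole has a confidence interval that excludes zero but still is not flagged, because its effect size is below the 5.00% tolerance.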

CI Pass/Fail Decision

Passed. All Quality Gates passed.

  • quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.

@vitkyrka vitkyrka changed the title from "rust" to "system-probe: Add support for building and testing Rust binary" Dec 30, 2025
@vitkyrka vitkyrka marked this pull request as ready for review January 2, 2026 10:58
@vitkyrka vitkyrka requested review from a team as code owners January 2, 2026 10:58
@vitkyrka vitkyrka added the ask-review Ask required teams to review this PR label Jan 2, 2026
@usamasaqib usamasaqib left a comment

Looks good to me. Just added a suggestion for resolving the glibc issue.

platformVersion, err := kernel.PlatformVersion()
require.NoError(t, err)

if platform == "centos" && strings.HasPrefix(platformVersion, "7") {

When building test artifacts we generally use a custom gcc toolchain which has an appropriately old version of glibc. The path to it is passed through the DD_CC env var in the GitLab job. I think you should be able to pass this through to cargo somehow to work around this problem.

@rdesgroppes rdesgroppes Jan 2, 2026

DD_CC env var: we (@DataDog/agent-build) advise against using non-hermetic dependencies whenever possible, even for testing purposes. The available cc toolchains should allow you to reference their hermetic components (compiler, glibc, etc.)
That being said, we acknowledge the transitional state of the ongoing migration to Bazel and can revisit this later should you decide to go for a host-provided compiler at this stage.

Contributor Author

I think one part of the problem is that the C/C++ toolchain we use for Bazel doesn't support -static-pie, otherwise we could try to build the test binaries statically. That toolchain is built by us (in some other repo) so we could maybe change it to be built with the appropriate support.

vagrant@agent-dev-ubuntu-24:~$ echo 'int main(){}' | gcc -static-pie -x c -
[0.067s] vagrant@agent-dev-ubuntu-24:~$ echo 'int main(){}' | .cache/bazel/_bazel_vagrant/8a1be3ba35926e170a8cb57e37b5b3bd/external/gcc_toolchain++gcc_toolchains+gcc_toolchain_aarch64/xbin/gcc -static-pie -x c -
/home/vagrant/.cache/bazel/_bazel_vagrant/8a1be3ba35926e170a8cb57e37b5b3bd/external/gcc_toolchain++gcc_toolchains+gcc_toolchain_aarch64/bin/../lib/gcc/aarch64-unknown-linux-gnu/12.3.0/../../../../aarch64-unknown-linux-gnu/bin/ld.bfd: cannot find rcrt1.o: No such file or directory
collect2: error: ld returned 1 exit status
[0.046s] 1  vagrant@agent-dev-ubuntu-24:~$ echo 'int main(){}' | .cache/bazel/_bazel_vagrant/8a1be3ba35926e170a8cb57e37b5b3bd/external/gcc_toolchain++gcc_toolchains+gcc_toolchain_aarch64/xbin/gcc -static -x c -
[0.081s] vagrant@agent-dev-ubuntu-24:~$

@rdesgroppes rdesgroppes left a comment

LGTM

In CI, enable rustfmt and clippy checks for Rust targets during the
build phase.

This ensures that the rules are enforced by CI while still allowing
developers to temporarily ignore them locally during development.
@vitkyrka vitkyrka requested a review from a team as a code owner January 9, 2026 15:30
@vitkyrka vitkyrka requested a review from fabbing January 9, 2026 15:30
 Conflicts:
	MODULE.bazel
	MODULE.bazel.lock
@vitkyrka

/merge

gh-worker-devflow-routing-ef8351 bot commented Jan 13, 2026

View all feedbacks in Devflow UI.

2026-01-13 14:31:22 UTC ℹ️ Start processing command /merge


2026-01-13 14:31:34 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 60m (p90).


2026-01-13 15:31:08 UTC ℹ️ MergeQueue: This merge request was merged

@dd-mergequeue dd-mergequeue bot merged commit ba0d1dc into main Jan 13, 2026
456 checks passed
@dd-mergequeue dd-mergequeue bot deleted the vitkyrka/rust branch January 13, 2026 15:31
@github-actions github-actions bot added this to the 7.76.0 milestone Jan 13, 2026
dd-mergequeue bot pushed a commit that referenced this pull request Jan 15, 2026
### What does this PR do?

Import the Rust code for the privileged discovery agent and hook it up to the
system-probe build/test system, replacing the sample binary which was added
previously in #44635.

This binary is not currently shipped.

The contents of pkg/discovery/module/rust are taken from DataDog/sd-agent@d1255b6.
No modifications have been made in this PR other than removing these files,
which are now unnecessary/unused:

```
 delete mode 100644 pkg/discovery/module/rust/.bazelignore
 delete mode 100644 pkg/discovery/module/rust/.bazelrc
 delete mode 100644 pkg/discovery/module/rust/.bazelversion
 delete mode 100644 pkg/discovery/module/rust/.gitattributes
 delete mode 100644 pkg/discovery/module/rust/.github/CODEOWNERS
 delete mode 100644 pkg/discovery/module/rust/.github/chainguard/self.gitlab.read.sts.yaml
 delete mode 100644 pkg/discovery/module/rust/.github/workflows/bazel-ci.yaml
 delete mode 100644 pkg/discovery/module/rust/.github/workflows/ci.yaml
 delete mode 100644 pkg/discovery/module/rust/.github/workflows/release.yaml
 delete mode 100644 pkg/discovery/module/rust/.gitignore
 delete mode 100644 pkg/discovery/module/rust/MODULE.bazel
 delete mode 100644 pkg/discovery/module/rust/MODULE.bazel.lock
```

Any comments on the code itself will preferably be addressed in follow-up PRs,
to keep the import clean.

Note also that future work is needed to ensure that the resulting
binary is built with the exact size optimization options, by applying the
build options from the Cargo.toml (or the removed .bazelrc) in a more suitable
place.  This is also postponed since the binary is not being
shipped yet.

### Motivation

https://datadoghq.atlassian.net/browse/DSCVR-313

### Describe how you validated your changes

CI.

### Additional Notes

For the handling of external crates (dependencies), I've added them via
`crate_universe`'s [direct dependencies](https://bazelbuild.github.io/rules_rust/crate_universe_bzlmod.html#direct-dependencies).

The alternative would have been to point to a Cargo.toml.  Pointing to
pkg/discovery/module/rust's Cargo.toml from the main MODULE.bazel didn't seem correct, and
adding a top-level Cargo.toml didn't seem like the way to go since we'll only
be supporting Bazel for (top-level) builds.  The organization of the
crates.MODULE.bazel file will likely have to be revisited in the future if we
get different Rust packages with conflicting requirements for crate features or
versions.


Co-authored-by: vincent.whitchurch <[email protected]>
theomagellan pushed a commit that referenced this pull request Jan 19, 2026