feat: process context publication#1585
feat: process context publication#1585gh-worker-dd-mergequeue-cf854d[bot] merged 23 commits intomainfrom
Conversation
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
BenchmarksComparisonBenchmark execution time: 2026-03-02 13:15:17 Comparing candidate commit 0af80fa in PR branch Found 1 performance improvements and 2 performance regressions! Performance is the same for 54 metrics, 2 unstable metrics. scenario:benching serializing traces from their internal representation to msgpack
scenario:credit_card/is_card_number/ 378282246310005
CandidateCandidate benchmark detailsGroup 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Group 11
Group 12
Group 13
Group 14
Group 15
Group 16
Group 17
Group 18
Group 19
BaselineOmitted due to size. |
05a3bcd to
6598cc3
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1585 +/- ##
==========================================
+ Coverage 71.08% 71.18% +0.09%
==========================================
Files 424 425 +1
Lines 62493 62666 +173
==========================================
+ Hits 44426 44609 +183
+ Misses 18067 18057 -10
🚀 New features to boost your workflow:
|
6598cc3 to
43208fe
Compare
libdd-library-config/Cargo.toml
Outdated
| rand = "0.8.3" | ||
| rmp = "0.8.14" | ||
| rmp-serde = "1.3.0" | ||
| rustix = { version = "1.1.3", features = ["param", "mm", "process"] } |
There was a problem hiding this comment.
why rustix rather than the libc crate?
There was a problem hiding this comment.
IMHO it's higher-level and a nicer to use than libc (for example memfd_create returns a Result and an RAII file descriptor that is automatically closed upon drop, while the one from libc returns a c_int, etc.). I saw a bunch of occurrences already in Cargo.lock and thought it was pulled already anyway. But upon scrutiny it seems that the 1.1.3 major version bucket is mostly used by tempfile, which is a dev dependency most of the time? So maybe this is actually pulling some additional stuff.
There are a bunch of other dependencies that use the 0.38 version of rustix, so I can also downgrade to this one. But it's just a slight QoL improvement; happy to switch to libc if you think it's better for whatever reason.
There was a problem hiding this comment.
Might be worth giving a quick check on the size of the resulting builds (I thought we had a github action that posted that but I'm not seeing it?).
In particular, we don't have a specific set target size that we need to stay under, but for many reasons we often have to ship variants so 1 MiB extra does add up if we need to ship e.g. N architectures and M versions.
Having said that, I feel like I'd seen rustix before but hadn't paid a lot of attention to it.
Looking at https://crates.io/crates/rustix it lists that it can work even without libc and that's amazing! If we could drop libc as a dependency from libdatadog would be super-unlock, since one situation where we end up needing to repeat builds is sometimes needing to have builds for both for glibc AND musl.
There was a problem hiding this comment.
If we could drop libc as a dependency from libdatadog would be super-unlock, since one situation where we end up needing to repeat builds is sometimes needing to have builds for both for glibc AND musl.
That's not really possible, because std uses libc
There was a problem hiding this comment.
That's not really possible, because std uses libc
Yeah, I know. Rust is very disapponting in this :P
There was a problem hiding this comment.
The CI comment on artifact size seems to point at it being the same ( #1585 (comment) ) so no strong feelings here -- any concerns with keeping as-is @paullegranddc ?
fedbeb2 to
74d2641
Compare
ivoanjo
left a comment
There was a problem hiding this comment.
Did a pass on it! Sorry from my part if there is a bit of a confusion with older versions of the spec being implemented, I'll make sure to keep a close eye on the libdatadog impl so it doesn't fall behind while things are still sometimes moving in the spec.
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
ivoanjo
left a comment
There was a problem hiding this comment.
Left a few more comments! I think this is pretty much good to go (once undrafted and CI is happy)
libdd-library-config/Cargo.toml
Outdated
| rand = "0.8.3" | ||
| rmp = "0.8.14" | ||
| rmp-serde = "1.3.0" | ||
| rustix = { version = "1.1.3", features = ["param", "mm", "process"] } |
There was a problem hiding this comment.
The CI comment on artifact size seems to point at it being the same ( #1585 (comment) ) so no strong feelings here -- any concerns with keeping as-is @paullegranddc ?
Add an explicit rustix dependency (which was already pulled as a transitive dependency). Prep work for process context sharing.
The wrong mapping name was used for discovery. Co-authored-by: Scott Gerring <[email protected]>
Co-authored-by: Ivo Anjo <[email protected]>
ivoanjo
left a comment
There was a problem hiding this comment.
👍 LGTM.
We discussed and are aware these will be needed as follow-ups:
- Support for updates
- Support for forks
- Actually wiring it up
...but I think as-is this PR is good to go and we can build atop it
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|
# What does this PR do? This PR implements the missing update sequence of the [OTel process context](open-telemetry/opentelemetry-specification#4719). This allows for modification of an already published OTel process context. # Motivation This PR is a planned follow-up of #1585, which was knowingly incomplete: it could publish a fresh context, but wouldn't be able to update the context. This PR implements the missing update functionality. See the OTEP spec mentioned above for more information about the general motivation. # Additional Notes This only part of #1585 follow-ups. In particular, it does not handle forks (where we should publish a new context instead of updating it if there's been a fork since last publication), which is left for a follow-up. # How to test the change? This PR includes a test for the update sequence. It could also be hooked up in workflow where any reader implementation of the OTEP spec could try to read a published/updated context. Co-authored-by: yann.hamdaoui <[email protected]>
Depends #1640 # What does this PR do? This PR is a follow up of #1585 and #1640 which implement OTel process context publication. It adds a proper handling of the case of publishing a new context after a `fork`. # Motivation Some language runtimes (e.g. Python or Ruby) resorts to `fork`. In this case, the OTel process context of the child must be re-published (the process context of the parent is explicitly NOT inherited through `MADVISE_DONTFORK`). However, since they share the same copy of the static handler, prior to this PR, the publication would try to access the non-inherited mapping, potentially causing a crash. This PR properly handles this case by storing the PID of the publisher, so that upon update, we can detect if there's been a fork since (and we are a child), in which case we can re-create a mapping from scratch. # How to test the change? There is currently no test, because having tests that `fork` is not trivial to set up in the current Rust test framework (it is admittedly possible, but is a larger question that should be treated separately IMHO). However, two follow-up PRs are coming with an FFI and libdatadog-side protobuf encoding of the payload, which will make it possible to test the fork behavior e.g. from Ruby, once they land. Co-authored-by: yann.hamdaoui <[email protected]>
Depends #1650 # What does this PR do? Follow-up of #1585 #1640 #1650 Adds the protobuf definition for the OTel process context and associated messages, and make the interface of the publisher higher-level by taking the new structured `ProcessContext` value instead of a raw bytes payload. # Motivation Since libdatadog is already taking care of some protobuf encoding, and it's supposedly faster and simpler to do here rather than on the side of each language runtime, it makes sense to offer an interface with a struct that is encoded by libdatadog. The process context will be a `ProcessContext`-based opaque pointer on the FFI side, with proper setters/getters. # Additional Notes I was not sure where to put the protobuf definition, as there are a bunch of protobuf-dedicated crates in libdatadog. I feel like the OTel process context is about tracing metadata, but I'm happy to move it elsewhere if it makes more sense. # How to test the change? Once the FFI lands, we'll be able to publish the context from a language runtime and check the whole process end-to-end. For now, I feel like an additional test would mostly test `prost`, which isn't very valuable. Co-authored-by: yann.hamdaoui <[email protected]>
# What does this PR do? [feat: process context publication](#1585) [ci: pass macos label to downstream project](#1647) [chore: implement otel process ctx update](#1640) [chore(crashtracking): emit a best effort stacktrace for Mac](#1645) [ci: remove depth so it can cause problems when getting the diffs](#1657) [build(macOS): set the LC_ID_DYLIB for mac binaries to set correct name for linking](#1646) [chore(ci): fix crashtracker receiver binary rpath setting](#1652) [chore(deps): bump blazesym to 0.2.3 and blazesym-c to 0.1.7](#1654) [feat: otel process ctxt protobuf encoding](#1651) [chore(crashtracker): fix benchmark job](#1664) # Motivation What inspired you to submit this pull request? # Additional Notes Anything else we should know when reviewing? # How to test the change? Describe here in detail how the change can be validated. Co-authored-by: gyuheon.oh <[email protected]>
Depends #1653 # What does this PR do? This is the final follow-up of #1585 (and sequels) for OTel process context publication. This hooks the OTel process context sharing in the existing `store_tracer_metadata`, which now publishes using both "protocols". # Motivation This makes the OTel process context publication happen automatically for existing users of libdatadog that would publish tracer metadata already, in a transparent way. # Additional Notes N/A # How to test the change? Expected to be tested thereafter for Ruby. Co-authored-by: yann.hamdaoui <[email protected]>
# Release proposal for libdd-library-config and its dependencies This PR contains version bumps based on public API changes and commits since last release. ## libdd-trace-protobuf **Next version:** `2.0.0` **Semver bump:** `major` **Tag:** `libdd-trace-protobuf-v2.0.0` ### Commits - feat(trace-protobuf)!: Add two fields to ClientGroupedStats [SVLS-8627] (#1630) - feat: otel process ctxt protobuf encoding (#1651) ## libdd-library-config **Next version:** `1.1.0` **Semver bump:** `minor` **Tag:** `libdd-library-config-v1.1.0` ### Commits - chore: update otel process ctx protocol (#1713) - feat: publish tracer metadata as OTel process ctx (#1658) - feat: otel process ctxt protobuf encoding (#1651) - fix: handle fork in otel process ctx (#1650) - chore: implement otel process ctx update (#1640) - feat: process context publication (#1585) - ci: update nightly in CI to 2026-02-08 (#1539) - fix(stable_config): [APMAPI-1690] add >100mb check for stable config files (#1432) - chore: add changelog for every published crate (#1396) [SVLS-8627]: https://datadoghq.atlassian.net/browse/SVLS-8627?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [APMAPI-1690]: https://datadoghq.atlassian.net/browse/APMAPI-1690?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ --------- Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>
# Release proposal for libdd-library-config and its dependencies This PR contains version bumps based on public API changes and commits since last release. ## libdd-trace-protobuf **Next version:** `2.0.0` **Semver bump:** `major` **Tag:** `libdd-trace-protobuf-v2.0.0` ### Commits - feat(trace-protobuf)!: Add two fields to ClientGroupedStats [SVLS-8627] (#1630) - feat: otel process ctxt protobuf encoding (#1651) ## libdd-library-config **Next version:** `1.1.0` **Semver bump:** `minor` **Tag:** `libdd-library-config-v1.1.0` ### Commits - chore: update otel process ctx protocol (#1713) - feat: publish tracer metadata as OTel process ctx (#1658) - feat: otel process ctxt protobuf encoding (#1651) - fix: handle fork in otel process ctx (#1650) - chore: implement otel process ctx update (#1640) - feat: process context publication (#1585) - ci: update nightly in CI to 2026-02-08 (#1539) - fix(stable_config): [APMAPI-1690] add >100mb check for stable config files (#1432) - chore: add changelog for every published crate (#1396) [SVLS-8627]: https://datadoghq.atlassian.net/browse/SVLS-8627?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [APMAPI-1690]: https://datadoghq.atlassian.net/browse/APMAPI-1690?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ --------- Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>
# Release proposal for libdd-library-config and its dependencies This PR contains version bumps based on public API changes and commits since last release. ## libdd-trace-protobuf **Next version:** `2.0.0` **Semver bump:** `major` **Tag:** `libdd-trace-protobuf-v2.0.0` ### Commits - feat(trace-protobuf)!: Add two fields to ClientGroupedStats [SVLS-8627] (#1630) - feat: otel process ctxt protobuf encoding (#1651) ## libdd-library-config **Next version:** `1.1.0` **Semver bump:** `minor` **Tag:** `libdd-library-config-v1.1.0` ### Commits - chore: update otel process ctx protocol (#1713) - feat: publish tracer metadata as OTel process ctx (#1658) - feat: otel process ctxt protobuf encoding (#1651) - fix: handle fork in otel process ctx (#1650) - chore: implement otel process ctx update (#1640) - feat: process context publication (#1585) - ci: update nightly in CI to 2026-02-08 (#1539) - fix(stable_config): [APMAPI-1690] add >100mb check for stable config files (#1432) - chore: add changelog for every published crate (#1396) [SVLS-8627]: https://datadoghq.atlassian.net/browse/SVLS-8627?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [APMAPI-1690]: https://datadoghq.atlassian.net/browse/APMAPI-1690?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ --------- # What does this PR do? A brief description of the change being made with this pull request. # Motivation What inspired you to submit this pull request? # Additional Notes Anything else we should know when reviewing? # How to test the change? Describe here in detail how the change can be validated. [SVLS-8627]: https://datadoghq.atlassian.net/browse/SVLS-8627?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [APMAPI-1690]: https://datadoghq.atlassian.net/browse/APMAPI-1690?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [SVLS-8627]: https://datadoghq.atlassian.net/browse/SVLS-8627?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>
# Release proposal for libdd-library-config and its dependencies This PR contains version bumps based on public API changes and commits since last release. ## libdd-trace-protobuf **Next version:** `2.0.0` **Semver bump:** `major` **Tag:** `libdd-trace-protobuf-v2.0.0` ### Commits - feat(trace-protobuf)!: Add two fields to ClientGroupedStats [SVLS-8627] (#1630) - feat: otel process ctxt protobuf encoding (#1651) ## libdd-library-config **Next version:** `1.1.0` **Semver bump:** `minor` **Tag:** `libdd-library-config-v1.1.0` ### Commits - chore: update otel process ctx protocol (#1713) - feat: publish tracer metadata as OTel process ctx (#1658) - feat: otel process ctxt protobuf encoding (#1651) - fix: handle fork in otel process ctx (#1650) - chore: implement otel process ctx update (#1640) - feat: process context publication (#1585) - ci: update nightly in CI to 2026-02-08 (#1539) - fix(stable_config): [APMAPI-1690] add >100mb check for stable config files (#1432) - chore: add changelog for every published crate (#1396) [SVLS-8627]: https://datadoghq.atlassian.net/browse/SVLS-8627?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [APMAPI-1690]: https://datadoghq.atlassian.net/browse/APMAPI-1690?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ --------- Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>
What does this PR do?
This PR implements the publication protocol of the otel process context sharing proposal.
This is intended as a minimally viable starting point. Next steps are kept for follow-up PRs, which could include for example:
Motivation
This feature allows a process to expose data to an external process, typically an eBPF profiler. Please refer to the OTEP linked above for a detailed motivation.
Additional Notes
Some notes on dependencies:
rustixfor that since it's already pulled as a transitive dependency (with the same major version bucket), and is nicely higher-level thanlibc.MemFdwrapper crate used (e.g herelibdatadog/libdd-library-config/src/tracer_metadata.rs
Line 54 in 0bd90fd
There are a number of design choices or assumptions that might be interesting to discuss further:
/proc/<pid>/mapsand syscalls to do so, so the concurrency model is a bit unclear. We basically settled on the mental model being that we use atomics as if the reader was another thread of the same program, which sounds like the best we can do and should prevent re-ordering at least on the writer side (using OS-level sync is another solution, but was deemed too costly for the upcoming thread-level context).Pin<Box<[u8]>>?), and maybe offer the option - or do it automatically, depending on the size - of moving the payload directly after the header, as allowed by the spec. This is left for future work.How to test the change?
A test is included for both
publishandupdate, with a subsequent read. Another way would be to use those functions and use any other reader implementation (e.g. in C) as a separate process.