OTEP: Process Context: Sharing Resource Attributes with External Readers#4719
ivoanjo wants to merge 57 commits into open-telemetry:main
This OTEP introduces a standard mechanism for OpenTelemetry SDKs to publish process-level resource attributes for access by out-of-process readers such as the OpenTelemetry eBPF Profiler.

External readers like the OpenTelemetry eBPF Profiler operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. We propose a mechanism for OpenTelemetry SDKs to publish process-level resource attributes, through a standard format based on Linux anonymous memory mappings.

When an SDK initializes (or updates its resource attributes) it publishes this information to a small, fixed-size memory region that external processes can discover and read. The OTEL eBPF profiler will then, upon observing a previously-unseen process, probe and read this information, associating it with any profiling samples taken from a given process.

_I'm opening this PR as a draft with the intention of sharing with the Profiling SIG for an extra round of feedback before asking for a wider review._

_This OTEP is based on [Sharing Process-Level Resource Attributes with the OpenTelemetry eBPF Profiler](https://docs.google.com/document/d/1-4jo29vWBZZ0nKKAOG13uAQjRcARwmRc4P313LTbPOE/edit?tab=t.0), big thanks to everyone that provided feedback and helped refine the idea so far._
Marking as ready for review!
So this would be a new requirement for eBPF profiler implementations? My issue is the lack of safe support for Erlang/Elixir to do this, whereas something that could just be accessed as a file or socket wouldn't have that issue. We'd have to pull in a third-party library, or implement one ourselves, that is a NIF to make these calls, and that brings in instability many would rather not have when the goal of our SDK is to not be able to bring down a user's program if the SDK crashes -- unless they specifically configure it to do so.
No, a hard requirement should not be the goal: for starters, this is Linux-only (for now), so right out of the gate this means it's not going to be available everywhere. Having this discussion is exactly why it was included as one of the open questions in the doc 👍

Our view is that we should go for recommended to implement and recommended to enable by default. In languages/runtimes where it's easy to do so (Go, Rust, Java 22+, possibly Ruby, ...etc?) we should be able to deliver this experience. For others, such as Erlang/Elixir and Java 8-21 (which requires a native library, similar to Erlang/Elixir), the goal would be to make it very easy to enable/use for users that want it, but still optional so as to not impact anyone that is not interested.

We should probably record the above guidance on the OTEP, if/once we're happy with it 🤔
cc @open-telemetry/specs-entities-approvers for extra eyes
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Co-authored-by: Florian Lehner <[email protected]>
Following discussion so far, we can probably avoid having our home-grown `OtelProcessCtx` and instead use the common OTEL `Resource` message.
This PR adds an experimental C/C++ implementation for the "Process Context" OTEP being proposed in open-telemetry/opentelemetry-specification#4719.

This implementation previously lived in https://github.com/ivoanjo/proc-level-demo/tree/main/anonmapping-clib and, as discussed during the OTEL profiling SIG meeting, we want to add it to this repository so it becomes easier to find and contribute to.

I've made sure to include a README explaining how to use it. Here's the ultra-quick start (Linux-only):

```bash
$ ./build.sh
$ ./build/example_ctx --keep-running
Published: service=my-service, instance=123d8444-2c7e-46e3-89f6-6217880f7123, env=prod, version=4.5.6, sdk=example_ctx.c/c/1.2.3, resources=resource.key1=resource.value1,resource.key2=resource.value2
Continuing forever, to exit press ctrl+c...
TIP: You can now `sudo ./otel_process_ctx_dump.sh 267023` to see the context

# In another shell
$ sudo ./otel_process_ctx_dump.sh 267023 # Update this to match the PID from above
Found OTEL context for PID 267023
Start address: 756f28ce1000
00000000  4f 54 45 4c 5f 43 54 58  02 00 00 00 0b 68 55 47  |OTEL_CTX.....hUG|
00000010  70 24 7d 18 50 01 00 00  a0 82 6d 7e 6a 5f 00 00  |p$}.P.....m~j_..|
00000020
Parsed struct:
  otel_process_ctx_signature       : "OTEL_CTX"
  otel_process_ctx_version         : 2
  otel_process_ctx_published_at_ns : 1764606693650819083 (2025-12-01 16:31:33 GMT)
  otel_process_payload_size        : 336
  otel_process_payload             : 0x00005f6a7e6d82a0
Payload dump (336 bytes):
00000000  0a 25 0a 1b 64 65 70 6c  6f 79 6d 65 6e 74 2e 65  |.%..deployment.e|
00000010  6e 76 69 72 6f 6e 6d 65  6e 74 2e 6e 61 6d 65 12  |nvironment.name.|
...
Protobuf decode:
attributes {
  key: "deployment.environment.name"
  value {
    string_value: "prod"
  }
}
attributes {
  key: "service.instance.id"
  value {
    string_value: "123d8444-2c7e-46e3-89f6-6217880f7123"
  }
}
attributes {
  key: "service.name"
  value {
    string_value: "my-service"
  }
}
...
```

Note that because the upstream OTEP is still under discussion, this implementation is experimental and may need changes to match up with the final version of the OTEP.
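For readers who want to check the "Parsed struct" values by hand, here is a minimal Python sketch that decodes the 32 header bytes from the hex dump in the quick start. The field layout (little-endian, packed, 8-byte signature, 4-byte version, 8-byte timestamp, 4-byte payload size, 8-byte payload pointer) is an assumption inferred from that dump, not taken from the OTEP text, and `parse_header` is a hypothetical helper name; the layout may change while the OTEP is under discussion.

```python
import struct
from datetime import datetime, timezone

# Header bytes copied from the hex dump in the quick start (32 bytes).
HEADER_HEX = (
    "4f 54 45 4c 5f 43 54 58 02 00 00 00 0b 68 55 47 "
    "70 24 7d 18 50 01 00 00 a0 82 6d 7e 6a 5f 00 00"
)

# ASSUMPTION: little-endian, no padding, field order as printed by the dump script.
HEADER_FORMAT = "<8sIQIQ"


def parse_header(raw: bytes) -> dict:
    """Decode the fixed-size header (hypothetical helper, layout inferred from the dump)."""
    if len(raw) != struct.calcsize(HEADER_FORMAT):
        raise ValueError("unexpected header size")
    signature, version, published_at_ns, payload_size, payload_ptr = struct.unpack(
        HEADER_FORMAT, raw
    )
    if signature != b"OTEL_CTX":
        raise ValueError("signature mismatch, not a process context header")
    return {
        "signature": signature,
        "version": version,
        "published_at_ns": published_at_ns,
        "payload_size": payload_size,
        "payload_ptr": payload_ptr,
    }


header = parse_header(bytes.fromhex(HEADER_HEX))
published_at = datetime.fromtimestamp(
    header["published_at_ns"] // 1_000_000_000, tz=timezone.utc
)
print(header["version"], header["payload_size"], hex(header["payload_ptr"]), published_at)
```

Note that `payload_ptr` is an address in the publishing process, so an external reader would still need to read that process's memory to fetch the payload; this sketch only covers decoding the header bytes.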
As pointed out during review, these don't necessarily exist for some resources so let's streamline the spec for now.
**What does this PR do?**

This PR bumps the libdatadog dependency from version 28.0.2.1.0 to 29.0.0.1.0. This new version brings:

* macOS build fixes needed to unblock #5351
* libdatadog support for the [OTel process context](open-telemetry/opentelemetry-specification#4719) (I plan to submit a PR with some integration for testing this separately -- commit is already waiting)

**Motivation:**

Adopt latest libdatadog.

**Change log entry**

Yes. Upgrade libdatadog dependency to version 29.0.0

**Additional Notes:**

N/A

**How to test the change?**

Green CI is good, as usual.
**What does this PR do?**

This PR adds integration tests for the [libdatadog](DataDog/libdatadog#1658) support for [OTel process context](open-telemetry/opentelemetry-specification#4719).

**Motivation:**

Starting in libdatadog v29, the process context is automatically published together with the process discovery feature, so we don't need to do anything from the Ruby side to turn it on. Yet, we want to make sure this is in good shape as this is a WIP effort, so having a few Ruby-level integration tests seems nice.

**Additional Notes:**

For Datadog folks -- more details on this work can be found on #ebpf-context-propagation on our Slack.

The process context is stored as a protobuf message in a very special memory location, hence the new test using protobuf + the weird code to parse memory.

**How to test the change?**

Validate that new spec is passing!
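To make the "protobuf message" part concrete, here is a small Python sketch that hand-encodes one string attribute in protobuf wire format and compares it against the first 32 payload bytes shown in the hex dump earlier in this thread. It assumes the payload uses the same field numbers as the common OTel `Resource`, `KeyValue`, and `AnyValue` messages (`attributes = 1`, `key = 1`, `value = 2`, `string_value = 1`); the helper names are hypothetical, and the final message in the OTEP may differ.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, body: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2): tag, length, body."""
    tag = (field_number << 3) | 2
    return varint(tag) + varint(len(body)) + body


def encode_string_attribute(key: str, value: str) -> bytes:
    """Encode one attribute entry. ASSUMPTION: OTel Resource/KeyValue/AnyValue field numbers."""
    any_value = length_delimited(1, value.encode("utf-8"))  # AnyValue.string_value
    key_value = length_delimited(1, key.encode("utf-8")) + length_delimited(2, any_value)
    return length_delimited(1, key_value)  # Resource.attributes


# First 32 payload bytes, copied from the payload dump earlier in this thread.
DUMP_PREFIX = bytes.fromhex(
    "0a 25 0a 1b 64 65 70 6c 6f 79 6d 65 6e 74 2e 65 "
    "6e 76 69 72 6f 6e 6d 65 6e 74 2e 6e 61 6d 65 12"
)

encoded = encode_string_attribute("deployment.environment.name", "prod")
print(encoded[: len(DUMP_PREFIX)] == DUMP_PREFIX)
```

A real reader or test would more likely use a generated protobuf class than hand-rolled encoding; the point here is only that the dumped bytes are consistent with a plain attribute list.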
### What does this PR do?

Update OTel process context to v2 (see open-telemetry/opentelemetry-specification#4719)

### Motivation

Bring implementation in dd-trace-go up-to-date with proposal.

### Reviewer's Checklist

<!-- * Authors can use this list as a reference to ensure that there are no problems during the review but the signing off is to be done by the reviewer(s). -->

- [x] Changed code has unit tests for its functionality at or near 100% coverage.
- [ ] [System-Tests](https://github.com/DataDog/system-tests/) covering this feature have been added and enabled with the va.b.c-dev version tag.
- [ ] There is a benchmark for any new code, or changes to existing code.
- [ ] If this interacts with the agent in a new way, a system test has been added.
- [ ] New code is free of linting errors. You can check this by running `make lint` locally.
- [ ] New code doesn't break existing tests. You can check this by running `make test` locally.
- [ ] Add an appropriate team label so this PR gets put in the right place for the release notes.
- [ ] All generated files are up to date. You can check this by running `make generate` locally.
- [x] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running `make fix-modules` locally.

Unsure? Have a question? Request a review!

JIRA [PROF-14086]

Co-authored-by: nicolas.savoire <[email protected]>
FYI @ivoanjo Usually we need at least 4 approvals for OTEPs, so we are almost there 😃
As pointed out by @florianl and discussed with the Profiling SIG on 2026-03-25, having a dedicated namespace and keeping it at `v1development` gives us a bit more leeway to evolve this.
78178a1 to 39a0d58
reyang left a comment
I've left several suggestions, overall looks good to me!
Co-authored-by: Reiley Yang <[email protected]>

`CHANGELOG.md` file updated for non-trivial changes