OTEP: Process Context: Sharing Resource Attributes with External Readers #4719

Open
ivoanjo wants to merge 57 commits into open-telemetry:main from ivoanjo:ivoanjo/profiling-process-ctx

ivoanjo commented Oct 31, 2025

Changes

External readers like the OpenTelemetry eBPF Profiler operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. We propose a mechanism for OpenTelemetry SDKs to publish process-level resource attributes, through a standard format based on Linux anonymous memory mappings.

When an SDK initializes (or updates its resource attributes) it publishes this information to a small, fixed-size memory region that external processes can discover and read. The OTEL eBPF profiler will then, upon observing a previously-unseen process, probe and read this information, associating it with any profiling samples taken from a given process.
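
A minimal sketch of the publishing side may help make this concrete. It assumes the 32-byte packed header layout (signature, version, timestamp, payload size, payload pointer) that the experimental C implementation's dump shows later in this thread. The `publish` helper, the use of `time.time_ns()` for the timestamp, and the choice of Python are illustrative assumptions rather than part of the OTEP, and the step that lets a reader discover the mapping is not covered here.

```python
import ctypes
import mmap
import struct
import time

# Assumed layout, inferred from the hexdump later in this thread:
# packed little-endian signature, version, published_at_ns,
# payload_size, payload pointer.
HEADER_FORMAT = "<8sIQIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32 bytes


def publish(payload: bytes):
    """Write a header describing `payload` into a fresh anonymous mapping.

    Hypothetical helper for illustration only. Returns the mapping and the
    payload buffer; both must stay alive while the context is published.
    """
    # Copy the payload into a buffer whose address can be taken.
    payload_buf = (ctypes.c_char * len(payload)).from_buffer_copy(payload)

    # Anonymous private mapping that holds only the fixed-size header.
    region = mmap.mmap(
        -1, HEADER_SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    )
    region[:HEADER_SIZE] = struct.pack(
        HEADER_FORMAT,
        b"OTEL_CTX",                    # signature seen in the dump
        2,                              # version seen in the dump
        time.time_ns(),                 # assumption: wall-clock nanoseconds
        len(payload),
        ctypes.addressof(payload_buf),  # raw pointer to the payload bytes
    )
    return region, payload_buf
```

Because the header stores only a raw pointer, the caller has to keep both returned objects alive for as long as the context should remain readable.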

Why open as draft: I'm opening this PR as a draft with the intention of sharing with the Profiling SIG for an extra round of feedback before asking for a wider review.

This OTEP is based on Sharing Process-Level Resource Attributes with the OpenTelemetry eBPF Profiler, big thanks to everyone that provided feedback and helped refine the idea so far.

This OTEP introduces a standard mechanism for OpenTelemetry SDKs to
publish process-level resource attributes for access by out-of-process
readers such as the OpenTelemetry eBPF Profiler.

_This OTEP is based on
[Sharing Process-Level Resource Attributes with the OpenTelemetry eBPF Profiler](https://docs.google.com/document/d/1-4jo29vWBZZ0nKKAOG13uAQjRcARwmRc4P313LTbPOE/edit?tab=t.0),
big thanks to everyone that provided feedback and helped refine the
idea so far._

ivoanjo (Author) commented Nov 5, 2025

Marking as ready for review!

@ivoanjo ivoanjo marked this pull request as ready for review November 5, 2025 12:19
@ivoanjo ivoanjo requested review from a team as code owners November 5, 2025 12:19

tsloughter (Member) commented

So this would be a new requirement for eBPF profiler implementations?

My issue is the lack of safe support for Erlang/Elixir to do this, whereas something that could just be accessed as a file or socket wouldn't have that issue. We'd have to pull in a third-party library, or implement one ourselves, that is a NIF to make these calls. That brings in instability many would rather not have when the goal of our SDK is to not be able to bring down a user's program if the SDK crashes, unless they specifically configure it to do so.

ivoanjo (Author) commented Nov 6, 2025

> So this would be a new requirement for eBPF profiler implementations?

No, a hard requirement should not be the goal: for starters, this is Linux-only (for now), so right out of the gate it's not going to be available everywhere.

Having this discussion is exactly why it was included as one of the open questions in the doc 👍


Our view is that we should go for recommended to implement and recommended to enable by default.

In languages/runtimes where it's easy to do so (Go, Rust, Java 22+, possibly Ruby, ...etc?) we should be able to deliver this experience.

For others, such as Erlang/Elixir and Java 8-21 (which requires a native library, similar to Erlang/Elixir), the goal would be to make it very easy to enable/use for users that want it, but still optional so as to not impact anyone that is not interested.

We should probably record the above guidance on the OTEP, if/once we're happy with it 🤔

carlosalberto (Contributor) commented

cc @open-telemetry/specs-entities-approvers for extra eyes

github-actions commented

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 15, 2025
@github-actions github-actions bot removed the Stale label Nov 18, 2025
Following discussion so far, we can probably avoid having our home-grown
`OtelProcessCtx` and instead use the common OTEL `Resource` message.
ivoanjo added a commit to ivoanjo/sig-profiling that referenced this pull request Dec 1, 2025
This PR adds an experimental C/C++ implementation for the "Process
Context" OTEP being proposed in
open-telemetry/opentelemetry-specification#4719

This implementation previously lived in
https://github.com/ivoanjo/proc-level-demo/tree/main/anonmapping-clib
and as discussed during the OTEL profiling SIG meeting we want to add
it to this repository so it becomes easier to find and contribute to.

I've made sure to include a README explaining how to use it. Here's
the ultra-quick start (Linux-only):

```bash
$ ./build.sh
$ ./build/example_ctx --keep-running
Published: service=my-service, instance=123d8444-2c7e-46e3-89f6-6217880f7123, env=prod, version=4.5.6, sdk=example_ctx.c/c/1.2.3, resources=resource.key1=resource.value1,resource.key2=resource.value2
Continuing forever, to exit press ctrl+c...
TIP: You can now `sudo ./otel_process_ctx_dump.sh 267023` to see the context

 # In another shell
$ sudo ./otel_process_ctx_dump.sh 267023 # Update this to match the PID from above
Found OTEL context for PID 267023
Start address: 756f28ce1000
00000000  4f 54 45 4c 5f 43 54 58  02 00 00 00 0b 68 55 47  |OTEL_CTX.....hUG|
00000010  70 24 7d 18 50 01 00 00  a0 82 6d 7e 6a 5f 00 00  |p$}.P.....m~j_..|
00000020
Parsed struct:
  otel_process_ctx_signature       : "OTEL_CTX"
  otel_process_ctx_version         : 2
  otel_process_ctx_published_at_ns : 1764606693650819083 (2025-12-01 16:31:33 GMT)
  otel_process_payload_size        : 336
  otel_process_payload             : 0x00005f6a7e6d82a0
Payload dump (336 bytes):
00000000  0a 25 0a 1b 64 65 70 6c  6f 79 6d 65 6e 74 2e 65  |.%..deployment.e|
00000010  6e 76 69 72 6f 6e 6d 65  6e 74 2e 6e 61 6d 65 12  |nvironment.name.|
...
Protobuf decode:
attributes {
  key: "deployment.environment.name"
  value {
    string_value: "prod"
  }
}
attributes {
  key: "service.instance.id"
  value {
    string_value: "123d8444-2c7e-46e3-89f6-6217880f7123"
  }
}
attributes {
  key: "service.name"
  value {
    string_value: "my-service"
  }
}
...
```

Note that because the upstream OTEP is still under discussion, this
implementation is experimental and may need changes to match up with
the final version of the OTEP.
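
To make the dump above easier to follow, here is a small sketch that parses the same 32 header bytes. The packed little-endian layout `<8sIQIQ` is an assumption inferred from the hexdump and the "Parsed struct" output (it is not taken from the OTEP text), and `parse_header` is a hypothetical helper name.

```python
import struct
from datetime import datetime, timezone

# Assumed layout, inferred from the dump: 8-byte signature, uint32 version,
# uint64 published_at_ns, uint32 payload_size, uint64 payload pointer,
# with no padding between fields.
HEADER_FORMAT = "<8sIQIQ"

# The 32 header bytes exactly as shown in the hexdump above.
HEADER_DUMP = bytes.fromhex(
    "4f 54 45 4c 5f 43 54 58 02 00 00 00 0b 68 55 47 "
    "70 24 7d 18 50 01 00 00 a0 82 6d 7e 6a 5f 00 00"
)


def parse_header(raw: bytes) -> dict:
    """Unpack the header and check its signature (illustrative helper)."""
    if len(raw) != struct.calcsize(HEADER_FORMAT):
        raise ValueError("unexpected header size")
    signature, version, published_at_ns, payload_size, payload_ptr = (
        struct.unpack(HEADER_FORMAT, raw)
    )
    if signature != b"OTEL_CTX":
        raise ValueError("not an OTEL process context header")
    return {
        "signature": signature.decode("ascii"),
        "version": version,
        "published_at_ns": published_at_ns,
        "payload_size": payload_size,
        "payload_ptr": payload_ptr,
    }


header = parse_header(HEADER_DUMP)
# Convert the nanosecond timestamp to a UTC datetime (whole seconds only).
published_at = datetime.fromtimestamp(
    header["published_at_ns"] // 1_000_000_000, tz=timezone.utc
)
print(header)
print(published_at.isoformat())
```

The unpacked values line up with the "Parsed struct" section of the dump: version 2, a 336-byte payload, and a payload pointer of `0x00005f6a7e6d82a0`.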
As pointed out during review, these don't necessarily exist for some
resources so let's streamline the spec for now.
ivoanjo and others added 5 commits March 13, 2026 10:11
Thanks @yannham for pointing out the previous text in the description
was ambiguous and for proposing this tweak to remove the ambiguity.
We want to prevent the read of `payload` and `payload_size` being
reordered with the read of `monotonic_published_at_ns`.

Thanks @yannham for the suggestion.
ivoanjo added a commit to DataDog/dd-trace-rb that referenced this pull request Mar 16, 2026
**What does this PR do?**

This PR bumps the libdatadog dependency from version 28.0.2.1.0 to
29.0.0.1.0.

This new version brings:

* macOS build fixes needed to unblock #5351
* libdatadog support for the
  [OTel process context](open-telemetry/opentelemetry-specification#4719)
  (I plan to submit a PR with some integration for testing this
  separately -- commit is already waiting)

**Motivation:**

Adopt latest libdatadog.

**Change log entry**

Yes. Upgrade libdatadog dependency to version 29.0.0

**Additional Notes:**

N/A

**How to test the change?**

Green CI is good, as usual.
ivoanjo added a commit to DataDog/dd-trace-rb that referenced this pull request Mar 16, 2026
**What does this PR do?**

This PR adds integration tests for the
[libdatadog](DataDog/libdatadog#1658) support
for
[OTel process context](open-telemetry/opentelemetry-specification#4719).

**Motivation:**

Starting in libdatadog v29, the process context is automatically
published together with the process discovery feature, so we don't
need to do anything from the Ruby side to turn it on.

Yet, we want to make sure this is in good shape as this is a WIP
effort so having a few Ruby-level integration tests seems nice.

**Additional Notes:**

For Datadog folks -- more details on this work can be found on
 #ebpf-context-propagation on our slack.

The process context is stored as a protobuf message in a
very special memory location, hence the new test using
protobuf + the weird code to parse memory.
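
As an illustration of what "stored as a protobuf message" means at the byte level, the sketch below walks the first 32 payload bytes from the C implementation's dump quoted earlier in this thread, using only the standard protobuf wire format. It assumes the payload is an OTel `Resource` message (field 1 is the repeated `attributes`, each a `KeyValue` with `key` = 1 and `value` = 2); the helper names are hypothetical, and a real reader would normally use generated protobuf classes instead.

```python
def read_varint(buf: bytes, pos: int):
    """Decode one base-128 varint at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_tag(buf: bytes, pos: int):
    """Decode a field tag; return (field_number, wire_type, next_pos)."""
    key, pos = read_varint(buf, pos)
    return key >> 3, key & 0x07, pos


# First 32 payload bytes, copied from the "Payload dump" shown earlier.
PAYLOAD_PREFIX = bytes.fromhex(
    "0a 25 0a 1b 64 65 70 6c 6f 79 6d 65 6e 74 2e 65 "
    "6e 76 69 72 6f 6e 6d 65 6e 74 2e 6e 61 6d 65 12"
)


def decode_first_key(buf: bytes) -> dict:
    """Walk Resource.attributes[0] far enough to recover its key."""
    outer_field, outer_wire, pos = read_tag(buf, 0)   # Resource.attributes
    entry_len, pos = read_varint(buf, pos)            # length of the KeyValue
    key_field, key_wire, pos = read_tag(buf, pos)     # KeyValue.key
    key_len, pos = read_varint(buf, pos)
    key = buf[pos:pos + key_len].decode("utf-8")
    pos += key_len
    value_field, value_wire, pos = read_tag(buf, pos)  # KeyValue.value
    return {
        "outer": (outer_field, outer_wire),
        "entry_len": entry_len,
        "key": key,
        "value_tag": (value_field, value_wire),
    }


print(decode_first_key(PAYLOAD_PREFIX))
```

Wire type 2 means length-delimited, which is how protobuf encodes both strings and nested messages, so the tag following the key introduces the nested `value` message.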

**How to test the change?**

Validate that new spec is passing!
gh-worker-dd-mergequeue-cf854d bot pushed a commit to DataDog/dd-trace-go that referenced this pull request Mar 20, 2026
### What does this PR do?

Update OTel process context to v2 (see open-telemetry/opentelemetry-specification#4719)

### Motivation

Bring implementation in dd-trace-go up-to-date with proposal.

### Reviewer's Checklist
<!--
* Authors can use this list as a reference to ensure that there are no problems
  during the review but the signing off is to be done by the reviewer(s).
-->

- [x] Changed code has unit tests for its functionality at or near 100% coverage.
- [ ] [System-Tests](https://github.com/DataDog/system-tests/) covering this feature have been added and enabled with the va.b.c-dev version tag.
- [ ] There is a benchmark for any new code, or changes to existing code.
- [ ] If this interacts with the agent in a new way, a system test has been added.
- [ ] New code is free of linting errors. You can check this by running `make lint` locally.
- [ ] New code doesn't break existing tests. You can check this by running `make test` locally.
- [ ] Add an appropriate team label so this PR gets put in the right place for the release notes.
- [ ] All generated files are up to date. You can check this by running `make generate` locally.
- [x] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running `make fix-modules` locally.

Unsure? Have a question? Request a review!

JIRA [PROF-14086] 

Co-authored-by: nicolas.savoire <[email protected]>

carlosalberto (Contributor) commented Mar 24, 2026

FYI @ivoanjo Usually we need at least 4 approvals for OTEPs, so we are almost there 😃

ivoanjo (Author) commented Mar 25, 2026

As pointed out by @florianl and discussed with the Profiling SIG on
2026-03-25, having a dedicated namespace and keeping it at `v1development`
gives us a bit more leeway to evolve this.
@ivoanjo ivoanjo force-pushed the ivoanjo/profiling-process-ctx branch from 78178a1 to 39a0d58 Compare March 25, 2026 16:38

reyang (Member) left a comment

I've left several suggestions, overall looks good to me!
