[Proposal] Adding OpenTelemetry Trace Support to MCP #269
-
observed 👀
-
Hi @altryne, solid proposal! Adding standardized OTel tracing is definitely the right move for production MCP. I've been building the Ithena Governance SDK, so here are my thoughts on the specifics:

This fits perfectly with the approach in the Ithena SDK; we already handle propagating the incoming trace context. Really valuable direction for the protocol. Happy to share insights from the governance/observability layer perspective as this moves forward.
-
This takes a different approach to telemetry than is typical for OpenTelemetry, where each node sends its telemetry to the backend via its own export mechanism. Why should MCP be any different from other distributed systems, such as HTTP API servers, where that pattern is the norm? It also has the potential to expose internal implementation details as part of the output, which can be both a security and a competitive concern. I agree that MCP needs telemetry support, but I look at it as another RPC mechanism, and it should be modelled as such.

What I think is missing is:

The C# SDK has a first round of OTel support: modelcontextprotocol/csharp-sdk#183
-
Speaking from the OpenTelemetry side, there is no need to wrap OTLP into MCP; just instrument the server and let it send OTLP to a user-specified endpoint.
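To make that concrete, here is a minimal sketch (not from this thread) of a Node-based MCP server instrumenting itself with the OTel SDK and exporting spans to whatever OTLP endpoint the user configures; the package choices and the service name are assumptions for illustration.

```ts
// Minimal sketch: the MCP server process instruments itself and exports
// spans directly to a user-specified OTLP endpoint, with no MCP changes.
// Assumes @opentelemetry/sdk-node and @opentelemetry/exporter-trace-otlp-http.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  serviceName: 'my-mcp-server', // hypothetical name for illustration
  // With no explicit URL, the exporter honors OTEL_EXPORTER_OTLP_ENDPOINT
  // (and related env vars), so the user picks the destination.
  traceExporter: new OTLPTraceExporter(),
});

sdk.start();
```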
-
This should be the title. The current title makes it sound much broader.

@samsp-msft and @lmolkova are right. #246 in particular is needed even in this proposal: spans still need the right context even if the client is responsible for sending them to their 'final' destination. With that, nothing else is needed to have full observability. Configuring OTel exporters in the servers isn't that hard.

I can see how it could be useful for the SDKs to take care of exporter configuration so that you only need to configure once in the client. It's also possible that you want to export spans somewhere that's inaccessible to the servers, e.g. you may have a locally hosted OTel backend and remote servers. But it needs to be clearer that this is what's being proposed and why.

There are probably also other ways to achieve the same goal. I can imagine generic OTel components that could be reused in other similar scenarios instead of building something specific to MCP. You're basically just using the MCP client as an OTel collector/proxy.
-
I thought about this approach too, but I agree with @samsp-msft that it seems like a mistake.

Overall I think there are two cases:

Let's concentrate on #246, and on making logs better if required, as a first step.
-
TL;DR: focus first on propagation. Release that and get experience with it; then tackle the narrower OTel bits, possibly in the OTel org.

My 2p is to focus on propagation as a start, and not to assume a specific instrumentation approach or data layout, or constrain to a specific signal like traces, metrics, or logging. In other words, first make it possible to correlate/join a trace. This solves the most important part, and other things can follow after practice.

For example, for what's currently called traceToken, you can use that field, or add headers, or add a specific field for the W3C traceparent header. An instrumented SDK can then inject and extract the trace context from that header, placing it as the current span. Not only does this keep things simpler, it also provides a path for anything that supports the W3C propagation spec but isn't strictly OTel. This would include other open source projects like Zipkin, as well as vendor SDKs. This inject/extract part has limited overhead and doesn't require a specific data model to be applied. The highest impact is that you can achieve the same trace today when converting a local function to one split over stdio.

Note: we don't need MCP clients to become OTel collectors, because with some configuration, stdio subprocesses can inherit the same auto-instrumentation as the parent, either directly or implicitly. For example, if you run node like this, the resulting subprocess will get the same auto-instrumentation hooks as the parent:

```sh
# using elastic distribution of otel, but I think this works with normal also
node --env-file .env -r @elastic/opentelemetry-node index.js
```

```js
// MCP server is a subprocess, and we want all arguments given to node to
// be visible, notably anything like '-r @elastic/opentelemetry-node'.
const transport = new StdioClientTransport({
  command: process.execPath,
  args: [...process.execArgv, ...process.argv.slice(1), SERVER_ARG],
});
```
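To illustrate the inject/extract idea, here is a rough sketch using the standard @opentelemetry/api propagation API with the request's `_meta` object as the carrier; treating `_meta` as the carrier is an assumption for illustration, not something the spec defines today.

```ts
// Sketch: W3C trace context propagation over MCP request metadata.
// Assumes only @opentelemetry/api; using _meta as the carrier is an
// illustrative assumption, not part of the MCP spec.
import { context, propagation } from '@opentelemetry/api';

// Client side: copy the active trace context (traceparent/baggage) into _meta.
function injectTraceContext(meta: Record<string, string> = {}): Record<string, string> {
  propagation.inject(context.active(), meta);
  return meta;
}

// Server side: restore the caller's context before running the handler,
// so any spans created inside join the caller's trace.
function runWithTraceContext<T>(meta: Record<string, string>, handler: () => T): T {
  const extracted = propagation.extract(context.active(), meta);
  return context.with(extracted, handler);
}
```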
-
Hey @altryne, just checked out your OTel tracing proposal for MCP. Interesting approach to have servers send trace spans back to clients! I've been implementing MCP tools with observability lately, and this could really solve the "black box" problem we're all facing.

I see both sides of this debate. On one hand, the standard OTel approach, where each component exports its own telemetry, makes sense for traditional distributed systems. But for MCP, where servers are essentially "tool calls" in a client's workflow, having the client stitch together the full trace feels more natural. The concern about exposing internal details is valid, but servers could control what spans they expose. I'm thinking this could start with basic timing data and gradually add more detail as needed.

Have you considered a hybrid approach? Maybe keep the context propagation from #246 but also allow this optional span return mechanism for servers that want deeper integration with client observability tools? Either way, getting proper tracing into MCP is crucial as we build more complex agentic workflows. Nice to see this getting attention!
-
Hi all, for reference Dagger implements both MCP and OTel, for full observability of tools. It works great and required no extension of either protocol.

IMO if you want a unified trace across the LLM and tools (to see the context around the tool call), then you should unify your observability stack across the MCP client and server. This can be done at the runtime layer: either by literally having the same runtime on both sides (e.g. agent SDKs and frameworks); by executing stdio MCP servers with injected OTel collector configuration; or by configuring your MCP clients and remote MCP servers to send to the same OTel collectors, then reconciling context when rendering traces.
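One way to do the "injected OTel collector configuration" for stdio servers is to have the client pass the exporter settings to the child process when it spawns it. A rough sketch with the TypeScript MCP SDK's StdioClientTransport, where the entry point, endpoint, and service name are placeholders:

```ts
// Sketch: launching a stdio MCP server with OTel exporter configuration
// injected via environment variables, so both sides report to the same
// collector. Paths and endpoint values are placeholders.
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const transport = new StdioClientTransport({
  command: 'node',
  args: ['./my-mcp-server.js'], // hypothetical server entry point
  env: {
    ...(process.env as Record<string, string>),
    // Point the server's OTel SDK at the same collector the client uses.
    OTEL_EXPORTER_OTLP_ENDPOINT: 'http://localhost:4318',
    OTEL_SERVICE_NAME: 'my-mcp-server',
  },
});
```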
-
(Placeholder) I will be writing a response to this in the next 24 hours on behalf of Comet. We welcome OTel support but need to be conscious of the mechanics. Thanks for bringing this up 🙏
-
Clean core type definitions. Looks like a solid foundation for the protocol. Great direction.
-
There's already been plenty of discussion on security concerns; I'll mostly leave that to the respective comment threads. What I'd like to add on top of that is that having the client aggregate and emit traces from servers seems unnecessary compared to standard OTel patterns, if servers are already propagating trace IDs for association purposes (the focus of #246).

This proposal makes some amount of sense if the end-user happens to also control all of the MCP servers in use, which is more or less the status quo. I don't believe that will continue to be the case in the future, however: clients and servers will regularly interact with peers outside of their trust boundary that they do not want to share raw traces with, and they may still want observability relative to other nodes within their trust boundary. What this proposal would force servers to do is make a binary decision to either emit or not emit spans to all clients, and make clients responsible for sending those aggregated spans to a collector. Instead, if #246 is adopted, servers will individually send spans to a collector (the usual OTel pattern), which makes more sense when the end-user does not control all MCP servers in use. Each server or client owner would be responsible for sending spans to their own collector, ensuring that data isn't inadvertently leaked.

The key detail I want to highlight here is that when the end-user does own all servers they're using, these two models behave in the same way. The end-user will still have their own span collector that they would individually point all of their servers to, enabling the same degree of observability at the cost of slightly more configuration (an extra environment variable on every server, perhaps).
-
I think this is a bit of a handwave for two reasons:
I can easily see this turning into a situation where many MCP servers wind up running two parallel OTel pipelines, one for sending spans to clients and one for internal collectors, to reconcile those differences. Not only would that be error-prone for server implementors, it (again) raises the question of what value this proposal adds over #246.
-
traceparent and baggage are the way to go here
-
Pre-submission Checklist
Your Idea
Abstract
This proposal outlines a mechanism to integrate OpenTelemetry (OTel) tracing capabilities into the Model Context Protocol (MCP). The goal is to address the "black box" nature of MCP server interactions within agentic workflows, enabling end-to-end observability for debugging, performance analysis, and system understanding. We propose adding a standardized way for MCP servers to emit OTel trace spans back to the calling MCP client, leveraging the existing protocol structure for notifications while adhering to open standards and maintaining semantic clarity between different observability signals.
Motivation
As MCP gains traction as the potential "HTTP for agents," the complexity of applications built upon it will increase. Agent developers rely heavily on observability tools (like Weights & Biases Weave, LangSmith, Braintrust, Arize Phoenix etc.) to understand and debug the flow of execution, especially within complex chains involving multiple tool calls and LLM interactions.
Currently, when an agent (MCP client) invokes a tool or resource via an MCP server, the internal operations of that server are opaque. The client only sees the request and the final response. This lack of visibility presents significant challenges.
This proposal aims to solve these issues by establishing a standard, OpenTelemetry-based mechanism for MCP servers to report their internal trace information back to the client, enabling full-stack observability for agentic applications. This aligns with the goal of making MCP a robust and production-ready protocol for the growing agent ecosystem.
Diagram
Proposal Details
We propose extending MCP to support the transmission of OpenTelemetry trace data from the server back to the client.
Standard: OpenTelemetry (OTel) will be the standard for representing observability data due to its vendor-neutrality, widespread adoption, and rich semantic conventions (including evolving conventions for GenAI). This proposal focuses initially on Traces. Support for Metrics could be considered in future proposals.
Transmission Mechanism: Trace data generated by the server during the execution of a client request (e.g., `tools/call`, `resources/read`) will be sent back to the client via MCP notifications. This aligns with MCP's existing mechanism for asynchronous server-to-client communication, such as the `notifications/message` notification used for logging. This approach is preferred over requiring servers to push data directly to an OTel collector.

New Notification Type (Rationale): OpenTelemetry defines three primary observability signals: Logs, Metrics, and Traces. MCP already supports structured logging via the `notifications/message` mechanism. While it is technically possible to overload this existing notification for trace data (see Alternative/Interim Mechanism below), we propose adding a new, dedicated notification type specifically for OTel trace data: `notifications/otel/trace`. Its `params` carry:
- `traceToken`: The token provided by the client in the original request's `_meta` field. This MUST be included if the server is sending traces correlated to a specific client request that included a `traceToken`.
- `resourceSpans`: An array of OTel ResourceSpans objects, serialized according to the OTel OTLP/JSON format. (Exact schema details based on OTLP/JSON.)

Justification for a Dedicated Type: A dedicated notification keeps trace data semantically distinct from structured logging and leaves room for parallel notification types should dedicated metrics (`notifications/otel/metrics`) or a richer OTel log format (`notifications/otel/logs`) become desirable later.

Example Notification Payload (Conceptual OTLP/JSON):
{ "jsonrpc": "2.0", "method": "notifications/otel/trace", "params": { "traceToken": "client-req-abc-789", // Echoed from the originating request "resourceSpans": [ { "resource": { /* OTel Resource Attributes */ }, "scopeSpans": [ { "scope": { /* OTel InstrumentationScope Attributes */ }, "spans": [ { "traceId": "a1b2c3d4...", // Hex encoded "spanId": "e5f6a7b8...", // Hex encoded "parentSpanId": "c9d0e1f2...", // Optional, Hex encoded "name": "internal_api_call", "kind": "SPAN_KIND_CLIENT", "startTimeUnixNano": "1678886400123456789", "endTimeUnixNano": "1678886400987654321", "attributes": [ /* OTel Attributes */ ], "status": { /* OTel Status */ } // ... other OTel Span fields } // ... more spans from this scope ] } // ... more scopeSpans ] } // ... more resourceSpans (though likely just one per notification) ] } }Alternative/Interim Mechanism (using Logging): As mentioned, MCP currently supports structured logging via
Alternative/Interim Mechanism (using Logging): As mentioned, MCP currently supports structured logging via `notifications/message`. It is technically possible to transmit OTel span data using this existing mechanism by encoding the span data within the `data` field, perhaps using a specific `logger` name (e.g., `otel_trace`). However, we recommend the dedicated `notifications/otel/trace` mechanism for long-term clarity, standardization, and alignment with OTel principles.

Trace Correlation & Stitching (Using `traceToken`):
To reliably associate server-side trace notifications with the specific client request that triggered them, especially in concurrent scenarios, we propose adapting the existing `progressToken` pattern.

A client wishing to receive correlated trace data for a specific request (e.g., `tools/call`, `resources/read`) MUST include a `traceToken` field within the `_meta` object of that request's `params`.

Example Client Request with `traceToken`:
{ "jsonrpc": "2.0", "id": 123, "method": "tools/call", "params": { "_meta": { "traceToken": "client-req-abc-789" // Client-generated unique token // "progressToken": "client-prog-xyz-123" // Could also exist }, "name": "my_tool", "arguments": { /* ... */ } } }If a server supports the
otel.tracescapability and receives a request containing a traceToken, it SHOULD attempt to generate and send trace data related to that request's execution.Any
notifications/otel/tracemessages sent by the server that directly result from processing that specific request MUST include the identical traceToken value in their params.The client uses the received
traceTokenin the notification to unambiguously associate the contained OTel spans with the correct originating client request and its corresponding client-side span.If a client does not include a
traceTokenin its request, a server supportingotel.tracesMAY still emit trace notifications (e.g., for background server activity), but these notifications MUST NOT include a traceToken and cannot be directly correlated by the client using this mechanism.New Server Capability: Servers supporting this feature MUST declare a new capability during initialization:
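For illustration, the client side of this correlation could look roughly like the following sketch, which keeps a map from the `traceToken` values it generated to the originating requests; the types and helper names here are assumptions, not SDK APIs.

```ts
// Sketch: correlating incoming `notifications/otel/trace` notifications with
// the originating requests via traceToken. Types are simplified; the
// notification shape follows this proposal, not an existing SDK API.
interface OtelTraceNotificationParams {
  traceToken?: string;
  resourceSpans: unknown[]; // OTLP/JSON ResourceSpans
}

const pendingTraces = new Map<string, { toolName: string; startedAt: number }>();

// Before sending a request, remember the token placed in _meta.traceToken.
function registerRequest(traceToken: string, toolName: string): void {
  pendingTraces.set(traceToken, { toolName, startedAt: Date.now() });
}

// When a trace notification arrives, look up the originating request and
// hand the spans to whatever observability backend the client uses.
function onOtelTraceNotification(params: OtelTraceNotificationParams): void {
  if (!params.traceToken) {
    // Uncorrelated server activity: still exportable, just not stitched
    // under a specific client request.
    exportSpans(params.resourceSpans, undefined);
    return;
  }
  exportSpans(params.resourceSpans, pendingTraces.get(params.traceToken));
}

// Placeholder for the client's own export/stitching logic.
function exportSpans(resourceSpans: unknown[], origin?: { toolName: string }): void {
  console.log(`received ${resourceSpans.length} resourceSpans`, origin?.toolName ?? '(uncorrelated)');
}
```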
{ "capabilities": { // ... other capabilities "otel": { "traces": true // Indicates support for sending trace notifications // "metrics": false, // Future placeholder // "logs": false // Future placeholder } } }Clients can check for
capabilities.otel.traces === trueto know if a server might send these notifications.Schema Changes
Add the following to the `definitions` section of the MCP JSON schema:

(Note: Fully defining the OTLP/JSON structure within the MCP schema might be overly verbose. Alternatively, the schema could simply state that `params` is an object conforming to the OTLP/JSON ResourceSpans specification and link to it.)

Use Case / User Story
As Sarah, an Agent Developer, I'm debugging my customer support agent. It uses an MCP-based Notion tool to fetch KB articles. Users report intermittent slowness.
Without MCP Observability: My Weave trace shows a long duration for the `tools/call` to the Notion tool, but I don't know why it's slow. Is it network latency to the tool? Slow database queries within the tool? An inefficient internal function?

With MCP Observability (this proposal):
- My client sees that the Notion server declares the `otel.traces` capability.
- While handling the `tools/call`, the Notion server generates internal OTel spans (e.g., `notion_api_request`, `process_results`).
- The server sends those spans back to my client via `notifications/otel/trace`.
- My Weave trace for the `tools/call` now expands to show the nested server-side spans received from the Notion tool.
- I can see that the `notion_api_request` span within the server took 90% of the time.

Security Considerations
Backwards Compatibility
This proposal is additive:
- Existing clients and servers that do not support the `otel.traces` capability will ignore the related messages and capability flags.

Alternatives Considered
- Transmitting span data over the existing logging notification, `notifications/message`. A dedicated `notifications/otel/trace` provides better long-term clarity and structure; the logging route is considered viable only as a temporary workaround or for initial proofs-of-concept.

Open Questions / Future Work
- Exact payload encoding for `notifications/otel/trace` (OTLP/JSON recommended). Confirming full schema definition vs. referencing the external spec.
- Which request types should support `traceToken` correlation (e.g., `tools/call`, `resources/read`).
- Future signal support (`notifications/otel/metrics`, `notifications/otel/logs`).

References
Scope