SEP-1913: Trust and Sensitivity Annotations#1913

Open
SamMorrowDrums wants to merge 11 commits into modelcontextprotocol:main from SamMorrowDrums:sep-trust-annotations

Conversation

@SamMorrowDrums
Contributor

SEP: Trust and Sensitivity Annotations

Summary

This SEP proposes trust and sensitivity annotations for MCP requests and responses, enabling clients and servers to track, propagate, and enforce trust boundaries on data as it flows through tool invocations.

Motivation

As MCP adoption grows, data flows across tool boundaries without standardized trust metadata. This creates security gaps:

  1. Indirect Prompt Injection: Data from untrusted sources enters context without markers
  2. Data Exfiltration: Sensitive information can be passed to external destinations without policy enforcement
  3. Cross-Organization Boundaries: No way to mark internal vs. external data

Key Features

Annotations

  • sensitiveHint: Granular sensitivity levels (low, medium, high)
  • privateHint: Marks internal/private data
  • openWorldHint: Indicates untrusted/external data sources
  • maliciousActivityHint: Signals detected suspicious patterns
  • attribution: Provenance tracking for audit trails

Propagation Rules

  • Sensitivity escalates (never decreases) within an agent session
  • Boolean hints use union semantics (once true, stays true)
  • Attribution accumulates across context boundaries
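The propagation rules above can be sketched as a small merge function. This is a hypothetical illustration — the field names follow the SEP's hint names, but the `propagate` helper and its API are not part of the proposal:

```typescript
// Hypothetical sketch of the propagation rules; not part of the SEP text.
type Sensitivity = "low" | "medium" | "high";

interface TrustAnnotations {
  sensitiveHint?: Sensitivity;
  privateHint?: boolean;
  openWorldHint?: boolean;
  maliciousActivityHint?: boolean;
  attribution?: string[]; // provenance entries, e.g. URIs of source records
}

const RANK: Record<Sensitivity, number> = { low: 0, medium: 1, high: 2 };

// Merge a new result's annotations into the session's accumulated state.
function propagate(session: TrustAnnotations, incoming: TrustAnnotations): TrustAnnotations {
  return {
    // Sensitivity only escalates: keep the higher of the two levels.
    sensitiveHint:
      session.sensitiveHint && incoming.sensitiveHint
        ? RANK[incoming.sensitiveHint] > RANK[session.sensitiveHint]
          ? incoming.sensitiveHint
          : session.sensitiveHint
        : incoming.sensitiveHint ?? session.sensitiveHint,
    // Boolean hints use union semantics: once true, stays true.
    privateHint: session.privateHint || incoming.privateHint,
    openWorldHint: session.openWorldHint || incoming.openWorldHint,
    maliciousActivityHint: session.maliciousActivityHint || incoming.maliciousActivityHint,
    // Attribution accumulates across context boundaries.
    attribution: [...(session.attribution ?? []), ...(incoming.attribution ?? [])],
  };
}
```

Under these rules a session that has already seen high-sensitivity private data can never be "downgraded" by a later low-sensitivity result; new attributions are simply appended.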

Integration Points

Related Work

Open Questions

  1. Label namespaces for organization-specific classifications
  2. Declassification mechanisms
  3. Cross-server annotation sharing

Closes #711

/cc @dend (sponsor)

@dsp-ant dsp-ant changed the title SEP: Trust and Sensitivity Annotations SEP-1913: Trust and Sensitivity Annotations Dec 3, 2025
@localden localden self-assigned this Dec 4, 2025
Note over Web MCP: Detects prompt injection<br/>in page content
Web MCP-->>Client: Result (maliciousActivityHint: true,<br/>openWorldHint: true)

Client->>User: ⚠️ Warning: Potential malicious content detected

This comment was marked as resolved.


Nit: Should we call them MCP Server (FILE) and MCP Server (HTTP)

Although it is kind of implied, hence the nit.

User->>Client: "Summarize this webpage"
Client->>Web MCP: tools/call (fetch URL)

Note over Web MCP: Detects prompt injection<br/>in page content

@realArcherL realArcherL Dec 11, 2025


Should we also highlight that this is the best opportunity for servers to apply preventative measures against indirect prompt injection (e.g. Spotlighting, Prompt Sandwich, etc.)?

For example: the server applies Spotlighting and marks the data along with additional instructions. reference

OR do we want clients to deal with it, since the real attack of prompt injection(s) begins with LLMs?


I haven't read it fully, but it seems like this is just a notification mechanism; maybe this should be a new field inside the schema for suggesting a mitigation, if the server wants to provide one.

@localden localden added the draft SEP proposal with a sponsor. label Jan 21, 2026
@SamMorrowDrums
Contributor Author

SamMorrowDrums commented Jan 22, 2026

@localden, @rreichel3 (OpenAI) is seeking to co-author this SEP, as they see significant value for MCP Apps and want to ensure that it does what they need, especially with respect to the consequences of tool calls (such as being irreversible). Would you be happy to also take a look at Robert's PR?

  • OpenAI is in a unique position to require adoption of certain spec features for inclusion in their app store, which I think would be a boost
  • I also think Robert's ideas are cool and showcase the potential for this proposal.

He's going to get Nick Cooper to take a look also.

@SamMorrowDrums
Contributor Author

@localden @nickcoai I merged @rreichel3's PR, so we now have a co-author.

@SamMorrowDrums SamMorrowDrums marked this pull request as ready for review January 28, 2026 20:33
@SamMorrowDrums SamMorrowDrums requested a review from a team as a code owner January 29, 2026 13:54
SamMorrowDrums and others added 11 commits January 30, 2026 16:01
Introduces trust and sensitivity annotations for MCP requests and responses,
enabling clients and servers to track, propagate, and enforce trust boundaries
on data as it flows through tool invocations.

Key features:
- Result annotations: sensitiveHint, privateHint, openWorldHint, maliciousActivityHint, attribution
- Request annotations for propagating trust context
- Propagation rules ensuring sensitivity markers persist across agent sessions
- Integration with Tool Resolution (modelcontextprotocol#1862) for pre-execution annotations
- Per-item annotations for mixed results (e.g., search results)
- Defense-in-depth approach complementing tool-level annotations

Closes modelcontextprotocol#711
… type

- Extend existing ToolAnnotations with trust fields (privateHint, sensitiveHint, etc.)
- Leverage existing openWorldHint with refined meaning per context
- Remove per-item annotations (response-level aggregation only)
- Remove _meta nesting - trust annotations live in flat annotations field
- Add Alternative 1 explaining why separate type was rejected
- Update Tool Resolution integration to use flat annotations
- Rename DRAFT-trust-annotations.md to 1913-trust-and-sensitivity-annotations.md
- Update header to match SEP-1850 template format (dash-prefixed list)
- Add full PR URL
- Move issue reference to note below header
- Regenerate SEP documentation for docs site
- **User consent** cannot be meaningfully enforced without knowing a tool's real-world impact.
- **Distrust by default** leads to confirmation fatigue and bad user experience.

Action security metadata provides a declarative contract that describes where inputs go, where outputs originate, and what outcomes the tool can cause. This complements trust annotations, which track data characteristics in transit.

Action security metadata provides a declarative contract that describes where inputs go, where outputs originate, and what outcomes the tool can cause. This complements trust annotations, which track data characteristics in transit.

Just for my understanding. Suppose my MCP is hosted inside a cluster as a pod and it needs egress to an internal service, or maybe an external one. Why would I enforce the security rule for data flow inside code running in that pod (I mean at the protocol level)? Shouldn't I do it at the infra (egress) level?

where inputs go, where outputs originate

I mean, shouldn't it be controlled at the infra level, not the protocol level? Since LLM clients are not deterministic, shouldn't we enforce security rules deterministically?

Contributor Author


Annotations are handled by clients, not LLMs themselves, so deterministic policy enforcement is exactly the sort of thing this could enable.


Indicates the origin of returned data.

- **untrustedPublic** — Public but unverified sources.

Are enterprise setups allowing untrustedPublic? There must already be a check at the egress controller, whatever the company is using.

@connor4312
Contributor

connor4312 commented Feb 4, 2026

  • maliciousActivityHint I have some concerns about this:

    • This is returned in tools/resolve which happens, in theory, before the actual tool execution happens. If I have a fetch_webpage tool, a server won't know if the response is potentially malicious before actually doing the fetch. It could in theory pre-fetch and cache the result, but that requires statefulness and also breaks the notion that "Resolution requests should complete in milliseconds" from SEP-1862
    • As a client this is not maximally useful for presenting warnings to users. Tool result size is unbounded. In my vision of strong injection/malicious detection in VS Code, we would use a model and highlight portions of the tool result which were flagged as concerning, for potential manual review. The boolean hint just says something is wrong, without letting me give any better UX to users.
    • Generally speaking from the view of a client, I'm not going to trust the implementation of malicious content detection of random MCP servers. We will, at some point, do something in-product for this in VS Code. That will be tested, benchmarked, and controlled by user preference and organization policy. I might use maliciousActivityHint as a hint to give more or less scrutiny to content an MCP server returns, but nothing more.
  • Same tools/resolve concern for other hints. I think these would better belong on the Annotations which are associated with each ContentBlock in the result. That would also let you naturally be able to give ranges to which given annotations apply (byte offsets or code points, depending on the content type)

  • InputMetadata/ReturnMetadata seem okay. I would note that, unlike maliciousActivityHint, I would be able to trust these as a client. The server is an authorized entity of whatever service it's representing, e.g. emails, and so I'm okay using its categorization of sources/destinations/outcomes. I think these metadata are generally fine, but I am not an expert in the regulatory/data classification area.

  • RequestAnnotations.attribution -- as a client I don't think I can represent this very well. It can both be too comprehensive and also incomplete:

    • I don't know which resources the model synthesized into a given tool call, so I would have to present every resource/annotation I encountered in the conversation, which does not seem useful.
    • I don't know every resource and data encountered in a conversation. E.g. a model can use a terminal tool my client doesn't specifically recognize and that could pull in data from any number of unknown sources. Or to give another example, a previous agent session may have generated a file as intermediate content derived from any number of sources, and a new session that pulls it in would see 'just a file.'

@sep-automation-bot

Maintainer Activity Check

Hi @localden!

You're assigned to this SEP but there hasn't been any activity from you in 19 days.

Please provide an update on:

  • Current status of your review/work
  • Any blockers or concerns
  • Expected timeline for next steps

If you're no longer able to sponsor this SEP, please let us know so we can find another maintainer.


This is an automated message from the SEP lifecycle bot.

@SamMorrowDrums
Contributor Author

About to propose an Annotations working group that will look at this amongst others.

@SamMorrowDrums
Contributor Author

@connor4312 agreed on most of that. With regards to:

I don't know every resource and data encountered in a conversation. E.g. a model can use a terminal tool my client doesn't specifically recognize and that could pull in data from any number of unknown sources. Or to give another example, a previous agent session may have generated a file as intermediate content derived from any number of sources, and a new session that pulls it in would see 'just a file.'

I think this feature will not be one coding agents care about, but agents with no terminal access that access medical records, for example, might well want a complete account of all records accessed (HIPAA compliance etc.), and might also want to guarantee that records from, say, a different identity are never mixed into the same session.

So I think you are right to flag this as a niche feature, but one that people trying to do this kind of thing can use, for example for https://github.com/mcp-security-standard/mcp-server-security-standard


@JustinCappos JustinCappos left a comment


Really interesting proposal!

My biggest comment: I'm a little worried about how usable and how useful this is. I don't know if different people creating the same tool would use the same security annotations. I feel the categories are quite squishy / broadly defined at times and as such, I wonder how clients would use this. It feels like clients will want to err on the side of using dangerous tools instead of blocking tools and that might neuter much of the usefulness here, but this is just a gut reaction. I think having a small, informal study of a few people would give you some ammo to argue for usefulness and consistency.

This pattern enables:

- **Deterministic enforcement** through declarative contracts for tool behavior
- **Data exfiltration prevention** by tracking when sensitive data flows to open-world destinations

Can you really do this in situations where it might be transformed by some other step? What assumptions do we make about the destination of this? How


- **Deterministic enforcement** through declarative contracts for tool behavior
- **Data exfiltration prevention** by tracking when sensitive data flows to open-world destinations
- **Prompt injection defense** by marking untrusted data sources

Is a binary (labeled / not labeled) likely sufficient for this? Would something indicating what is data vs instructions be better?

Contributor Author

@SamMorrowDrums SamMorrowDrums Mar 3, 2026


I kind of agree and disagree at the same time, because you could just as easily be asking an agent to work on a task from a public source as you could be pulling down public data.

I don't know how I'd define the distinction in a way that wouldn't require pre-labelling.

Then we get to the scenario where, for example, a GitHub issue might be instructions in one context, but data in another where you are looking into why a change was made.


**1. Indirect Prompt Injection**

Data from untrusted sources (web pages, emails, user-generated content) enters the context without markers indicating its origin. An attacker can embed instructions in this data that the model may execute.

If we want to stop prompt injection, would it be better to instead have these be labeled as coming from email (i.e. data not code/prompts)? I don't really understand how to label github vs email with a trust level of high, low, medium, etc.


**2. Data Exfiltration**

Sensitive information (credentials, PII, proprietary data) can be passed to tools that write to external destinations. Without declared data classifications and action metadata, clients cannot enforce policies like "don't email private repo content to external addresses."

How do you track flow in the system given data with a label? How do you associate this with a later LLM output?


**4. Compliance Requirements**

Regulated industries (healthcare, finance) need audit trails and sensitivity classifications. Without standardized annotations, each implementation reinvents this wheel.

Are you thinking about in-toto attestations or some other cryptographically verifiable means for this?


`DataClass` keeps sensitivity simple for common cases while allowing regulated data to be scoped. The `regulated` form declares applicable regimes; it does not assert compliance.

`RegulatoryScope` accepts arbitrary strings. The following are suggested examples for common regimes: GDPR, CCPA, HIPAA, GLBA, PCI-DSS, FERPA, COPPA, SOX.
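The `DataClass`/`RegulatoryScope` shapes described above might be sketched as follows. This is a hypothetical TypeScript illustration — the simple member names are placeholders, and only the `regulated` form and the `DataClass | DataClass[]` polymorphism are taken from the text:

```typescript
// Hypothetical sketch of the DataClass / RegulatoryScope shapes described
// above. The plain string members are placeholders, not the SEP's exact enum.
type RegulatoryScope = string; // arbitrary strings; e.g. "GDPR", "HIPAA", "PCI-DSS"

type DataClass =
  | "low"
  | "medium"
  | "high"
  | { regulated: RegulatoryScope[] }; // declares applicable regimes; does not assert compliance

// A result may declare a single class or several (DataClass | DataClass[]):
const sensitivity: DataClass | DataClass[] = [
  "high",
  { regulated: ["HIPAA", "GDPR"] },
];
```

Note that the `regulated` form is purely declarative: it tells the client which regimes apply to the data, not that the server has verified compliance with them.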

Wouldn't you also want to indicate versions and specific parts of the scope which are implicated? These also often will touch on a huge array of these in different places. How should lists of these be handled? Do these need to be normalized in some way?

- **system** — Data is stored by the platform and not accessible to users or developers.
- **user** — Data is stored and visible only to the end user.
- **internal** — Data is stored and visible to a restricted internal audience.
- **public** — Data may be transmitted to or stored in publicly accessible systems.

I'm not sure what these mean in some cases. Is something visible to my friends on Facebook public? Is data about my taxes which is not individually identifiable but which goes into statistics on a website listed as public or???


Describes the real-world impact of invoking the tool.

- **benign** — No persistent state change outside the tool's execution context, or changes limited to private drafts that are not transmitted or shared.

The items in this section feel hard to reason about.

If I make a change which doesn't really matter (e.g., incrementing a visitor counter), this is not very impactful, but it is technically irreversible, since I have no way to undo it.



#### Why Not Information Flow Control (IFC) Labels?

@JustinCappos [suggested](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/711#issuecomment-2967516811) IFC-style categorical labels instead of linear sensitivity. This is a valid approach with tradeoffs:

I should note that my thinking about if IFC should be used anywhere in this space has evolved since then. 😄 I don't think IFC is a great fit, but I do think it does fit relatively well if you use annotations, so I guess the comment still stands.

- Missing annotations treated as unknown (not as "safe")
- Clients should apply appropriate defaults for unlabeled data
- No enforcement happens without annotation support


I'd love to see a quick and informal study where you get 5 people to apply annotations to the same set of MCP tools and see if their annotations are the same or vary.


I think your assumptions are valid @JustinCappos but I don't think developer-side self-enforcement is the goal.

The value in the spec change is that it opens the door to downstream MCP Registry providers and marketplaces doing their own enforcement against a standardized interface and set of criteria.

For example: if you want to get the 'Verified' badge in bestmcpmarketplace.example.com you have to submit your server and documentation for review, and the bestmcpmarketplace team can check that your annotations are valid.

That said, I think more detail on the criteria would be a welcome enhancement to this SEP and the spec.

@kapil8811

Hi Sam, great work on this SEP. I am a Senior Software Engineer currently working in a similar domain; I was prototyping data classification tooling for MCP when I came across this PR. The unified approach indeed solves for trust annotations and security metadata. I would love to contribute in any way I can. I am going to start with a usability study by creating a spec-based implementation, and I will share the results within a week. I hope that helps.

@kapil8811

kapil8811 commented Mar 1, 2026

Hi @sam, I have done an LLM-based study where I curated a list of scenarios and tested it with various LLM models to validate the spec. Here are the results:

<file link removed>

I have a reference SDK implementation ready, and I am currently testing the spec enhancement. I am looking for volunteers to participate; once I have enough folks ready, I will share the usability data.

sensitivity: DataClass | DataClass[];
}

type DataClass =

Shall we also add internal here? We could put it under regulated scope, but that in itself is a little ambiguous.

// ... existing fields ...

// Trust extensions
maliciousActivityHint?: boolean;

I am not sure how we are going to use maliciousActivityHint. Server-side detection of malicious activity in the response can be questionable, as the output may not be accurate all the time. As part of the tool output, shall we also provide more context by adding a maliciousActivityDetail field, which could contain the type (prompt injection [harmful content, XPIA, UPIA], data leak, PII not anonymized, credential leak), a description (reasoning), a cursor to the location in the output, severity, etc.?

2. Clients **MAY** enforce basic policies (block, escalate, require confirmation)
3. Clients **SHOULD** surface `maliciousActivityHint` to users
4. Clients **MAY** present attribution data as part of confirmation dialogs
5. Clients **SHOULD** consider action security metadata for policy and consent decisions, and **MUST** treat all annotations as untrusted hints unless the server is trusted.
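As a concrete illustration of these client responsibilities, a minimal policy check might look like the sketch below. The specific policy rules (and the `decide` helper) are hypothetical examples, not requirements from the SEP; the one normative point reflected is that annotations from untrusted servers are hints, never proof of safety:

```typescript
// Hypothetical client-side enforcement sketch. The rules below are examples;
// annotations are treated as untrusted hints unless the server is trusted.
interface ResultAnnotations {
  sensitiveHint?: "low" | "medium" | "high";
  openWorldHint?: boolean;
  maliciousActivityHint?: boolean;
}

type Decision = "allow" | "confirm" | "block";

function decide(a: ResultAnnotations, serverTrusted: boolean): Decision {
  // Surface maliciousActivityHint to the user (here: block outright).
  if (a.maliciousActivityHint) return "block";
  // Example exfiltration policy: high-sensitivity data from or to an
  // open-world boundary requires explicit user confirmation.
  if (a.sensitiveHint === "high" && a.openWorldHint) return "confirm";
  // Missing annotations mean unknown, not safe: results from untrusted
  // servers without a sensitivity label still get a confirmation step.
  if (!serverTrusted && a.sensitiveHint === undefined) return "confirm";
  return "allow";
}
```

A real client would likely layer organization policy and user preference on top of this, but the shape — map annotations plus server trust to block/confirm/allow — stays the same.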

We are clearly stating that the client may or may not take actionable steps on the server-side trust annotations, but that raises one question: do we need a feedback loop where the client's findings can be added to the propagation pipeline? In cases where the client runs additional evaluators and finds actionable insights, we can't feed them back to the server.
Right now the only option is to override the response metadata sensitivity labels, but that may break the trust model, as it could result in provenance loss on the sensitivity labels. Any monotonic change in sensitivity labels is a good signal for server tools and can be actionable in nature.

@sep-automation-bot

This comment was marked as outdated.

@localden
Contributor

localden commented Mar 11, 2026

Thank you @SamMorrowDrums and @rreichel3 for the thorough work on this SEP and for being responsive to the feedback that's already come in.

I got a chance to noodle a bit on the SEP, and the overall problem framing is fairly strong. A few areas where I think this aligns well with general MCP expectations:

  • This SEP addresses a very real structural gap - MCP has no universal data classification primitive. There's a bit of tension with the "Capability over compensation" principle, but let's shelve that for now.
  • Everything is technically optional, so graceful degradation is possible.
  • The SEP doesn't try to mandate a policy engine built from the ground-up (probably the biggest worry I had initially), although I do worry about how annotations are interpreted consistently.

That being said, there are a few concerns.

This SEP adds a few schema modifications and a thorny array-or-scalar polymorphism on enum fields. If the taxonomy turns out to be wrong, I worry that we can't remove or easily modify it. Can we do a narrower first cut at the set of proposed hint conventions to validate the core assumptions of this proposal?

The more I think about it, maybe extensions is the place to test this approach before committing it to the spec if this requires a larger schema overhaul. That would help clarify some use cases and a concrete thing to adopt, and give us real signal on whether the enum boundaries hold up.

As far as I can tell, this SEP doesn't have a reference implementation yet. I'm not super confident that the DataClass taxonomy generalizes well beyond some basic examples, and I would actually echo @JustinCappos' concern here that a few people annotating the same tool might disagree on scenarios like whether outcomes: irreversible vs consequential applies. You can have an implementer state that their tool is not at the same level of "danger" as a user interpreting it. How do we rationalize those and provide a good enough framework to make this consistently predictable for customers that will ultimately be exposed to these hints (albeit indirectly)?

A simpler hint type maybe could go into the spec directly, but I'd like to get some client implementer feedback first.

Do we have any strong pull from real-world scenarios suggesting that this kind of annotation can work at scale? Given your work at GitHub, curious how that is applicable more broadly.

Lastly, I would be concerned with some of the self-declared nature of these hints creating potentially wrong expectations around the tool safety. If you have a poorly configured (or worse, malicious) MCP server, will this create a false sense of security?

@sep-automation-bot

Maintainer Activity Check

Hi @localden!

You're assigned to this SEP but there hasn't been any activity from you in 19 days.

Please provide an update on:

  • Current status of your review/work
  • Any blockers or concerns
  • Expected timeline for next steps

If you're no longer able to sponsor this SEP, please let us know so we can find another maintainer.


This is an automated message from the SEP lifecycle bot.


Labels

draft SEP proposal with a sponsor. security SEP

Development

Successfully merging this pull request may close these issues.

[SPEC] Annotations for MCP Requests and Responses (security/privacy)
