[Feature] Built-in intent drift detection for Agent Mode  - prevent multi-step scope creep

### Pre-submit Checks

- [x] I have [searched Warp feature requests](https://github.com/warpdotdev/warp/issues?q=is%3Aissue+label%3AFEATURE) and there are no duplicates
- [x] I have [searched Warp docs](https://docs.warp.dev) and my feature is not there

### Describe the solution you'd like?

### Problem

When using Agent Mode or Oz for multi-step tasks, agents can experience **cumulative intent drift** - they start with a clear instruction like "refactor the auth module" and by step 6 they're modifying unrelated files, adding unexpected dependencies, or changing environment variables outside the original scope.

This happens not because the model is wrong, but because multi-step context causes gradual faithfulness erosion. Users only discover it after reviewing the full transcript.

### Proposed Solution

Add optional **intent drift monitoring** as a built-in capability for Agent Mode runs. The idea:

1. At the start of a multi-step task, the agent captures the user's intent as a contract (task description + constraints)
2. 2. After each step, the agent scores its own action against the original contract
3. 3. When faithfulness drops below a threshold, the agent pauses and asks the user before continuing
This is similar to how "ci-fix" validates its own fixes by re-running CI - but applied to intent preservation across all multi-step tasks.

### Prior Art

I've built an open-source implementation of this called [Anchor](https://github.com/anchor-engine/anchor-sdk) that uses HHEM-2.1-Open (Vectara's hallucination evaluation model, Apache 2.0) to score agent actions against intent contracts.

**Benchmark results** (CNN/DailyMail corpus, Gemma 4 31B + HHEM-2.1-Open):

| Metric | Baseline | + Anchor | Delta |
|--------|----------|----------|-------|
| High-fidelity grounding (HHEM >= 0.95) | 47.1% | 56.3% | **+9.2%** |
| Mean HHEM faithfulness score | 0.9257 | 0.9307 | +0.005 |
| Hallucination rate (<0.50) | 0.8% | 0.7% | -0.1% |

- 3:1 improvement ratio on interventions (52.9% improved vs 17.2% degraded)
- - 0% regression on IFEval (instruction following) and Tau2 (task completion)
### Implementation
 - catch when the agent touches /src/database
 - - **Dependency-locked tasks**: "Don't add new npm packages" - catch when the agent runs npm install
 - - - **Migration tasks**: "Convert all class components to hooks" - catch when the agent starts rewriting unrelated utilities
 - - - - **Security-sensitive work**: "Only modify the RBAC module" - prevent accidental changes to auth tokens or secrets
### Links

- Oz skill PR: https://github.com/warpdotdev/oz-skills/pull/16
- - SDK: [github.com/anchor-engine/anchor-sdk](https://github.com/anchor-engine/anchor-sdk)
- - - npm: "npm install @anchor-engine/sdk"
I've already submitted an Oz skill for this as a PR to "warpdotdev/oz-skills": https://github.com/warpdotdev/oz-skills/pull/16

The skill works today via the Anchor API, but the concept could also be built natively into Agent Mode using any hallucination evaluation model.

### Use Cases

- **Scoped refactoring**: "Only modify files in /src/auth" 

### Is your feature request related to a problem? Please describe.

_No response_

### Additional context

_No response_

### Operating system (OS)

Windows

### How important is this feature to you?

1 (Not too important)

### Warp Internal (ignore) - linear-label:39cc6478-1249-4ee7-950b-c428edfeecd1

None

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Built-in intent drift detection for Agent Mode - prevent multi-step scope creep #9472

Pre-submit Checks

Describe the solution you'd like?

Problem

Proposed Solution

Prior Art

Implementation

Links

Use Cases

Is your feature request related to a problem? Please describe.

Additional context

Operating system (OS)

How important is this feature to you?

Warp Internal (ignore) - linear-label:39cc6478-1249-4ee7-950b-c428edfeecd1

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Metric	Baseline	+ Anchor	Delta
High-fidelity grounding (HHEM >= 0.95)	47.1%	56.3%	+9.2%
Mean HHEM faithfulness score	0.9257	0.9307	+0.005
Hallucination rate (<0.50)	0.8%	0.7%	-0.1%

[Feature] Built-in intent drift detection for Agent Mode - prevent multi-step scope creep #9472

Description

Pre-submit Checks

Describe the solution you'd like?

Problem

Proposed Solution

Prior Art

Implementation

Links

Use Cases

Is your feature request related to a problem? Please describe.

Additional context

Operating system (OS)

How important is this feature to you?

Warp Internal (ignore) - linear-label:39cc6478-1249-4ee7-950b-c428edfeecd1

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions