Pre-submit Checks
Describe the solution you'd like?
Problem
When using Agent Mode or Oz for multi-step tasks, agents can experience cumulative intent drift - they start with a clear instruction like "refactor the auth module" and by step 6 they're modifying unrelated files, adding unexpected dependencies, or changing environment variables outside the original scope.
This happens not because the model is wrong, but because context accumulated over many steps gradually erodes faithfulness to the original instruction. Users typically discover the drift only after reviewing the full transcript.
Proposed Solution
Add optional intent drift monitoring as a built-in capability for Agent Mode runs. The idea:
- At the start of a multi-step task, the agent captures the user's intent as a contract (task description + constraints)
- After each step, the agent scores its own action against the original contract
- When faithfulness drops below a threshold, the agent pauses and asks the user before continuing
This is similar to how "ci-fix" validates its own fixes by re-running CI - but applied to intent preservation across all multi-step tasks.
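To make the capture / score / pause loop concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than Anchor's or Warp's actual API: the names `IntentContract`, `DriftMonitor`, `keyword_overlap_scorer`, and `first_pause` are hypothetical, the 0.5 threshold is arbitrary, and the word-overlap scorer is only a toy stand-in for a real faithfulness model such as HHEM.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class IntentContract:
    """User intent captured once at the start of a run (hypothetical shape)."""
    task: str
    constraints: Tuple[str, ...] = ()


def _tokens(text: str) -> set:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap_scorer(contract: IntentContract, action: str) -> float:
    """Toy stand-in for a faithfulness model (assumption, not HHEM).

    Returns the fraction of the action's words that also appear in the contract.
    """
    contract_words = _tokens(" ".join((contract.task,) + contract.constraints))
    action_words = _tokens(action)
    if not action_words:
        return 1.0  # nothing to judge
    return len(action_words & contract_words) / len(action_words)


@dataclass
class DriftMonitor:
    contract: IntentContract
    scorer: Callable[[IntentContract, str], float]
    threshold: float = 0.5  # assumed value; a real deployment would tune this
    history: List[float] = field(default_factory=list)

    def check(self, action: str) -> bool:
        """Score one step; True means continue, False means pause and ask the user."""
        score = self.scorer(self.contract, action)
        self.history.append(score)
        return score >= self.threshold


def first_pause(monitor: DriftMonitor, actions: List[str]) -> Optional[int]:
    """Index of the first step that falls below the threshold, or None."""
    for index, action in enumerate(actions):
        if not monitor.check(action):
            return index
    return None


if __name__ == "__main__":
    contract = IntentContract(
        task="refactor the auth module",
        constraints=("only modify files in src/auth",),
    )
    monitor = DriftMonitor(contract, keyword_overlap_scorer)
    steps = [
        "refactor auth module login helper",
        "add redis dependency and edit env variables",
    ]
    print("pause at step:", first_pause(monitor, steps))
```

Because the scorer is passed in as a plain callable, a model-based scorer could replace the toy one without touching the monitor loop.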
Prior Art
I've built an open-source implementation of this called Anchor that uses HHEM-2.1-Open (Vectara's hallucination evaluation model, Apache 2.0) to score agent actions against intent contracts.
Benchmark results (CNN/DailyMail corpus, Gemma 4 31B + HHEM-2.1-Open):
| Metric | Baseline | + Anchor | Delta |
| --- | --- | --- | --- |
| High-fidelity grounding (HHEM >= 0.95) | 47.1% | 56.3% | +9.2% |
| Mean HHEM faithfulness score | 0.9257 | 0.9307 | +0.005 |
| Hallucination rate (<0.50) | 0.8% | 0.7% | -0.1% |
- 3:1 improvement ratio on interventions (52.9% improved vs 17.2% degraded)
- 0% regression on IFEval (instruction following) and Tau2 (task completion)
Implementation
Links
I've already submitted an Oz skill for this as a PR to "warpdotdev/oz-skills": "feat: add anchor-drift-monitor skill - real-time intent drift detection for Oz runs" (oz-skills#16).
The skill works today via the Anchor API, but the concept could also be built natively into Agent Mode using any hallucination evaluation model.
Use Cases
- Scoped refactoring: "Only modify files in /src/auth" - catch when the agent touches /src/database
- Dependency-locked tasks: "Don't add new npm packages" - catch when the agent runs npm install
- Migration tasks: "Convert all class components to hooks" - catch when the agent starts rewriting unrelated utilities
- Security-sensitive work: "Only modify the RBAC module" - prevent accidental changes to auth tokens or secrets
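Two of the use cases described in this request (scoped refactoring and dependency-locked tasks) have constraints that can also be checked deterministically, alongside any model-based score. The Python sketch below is an assumption-laden illustration, not how Anchor or Agent Mode works: `outside_scope` and `adds_npm_dependency` are hypothetical helper names, and the npm check is a simple heuristic over the command string.

```python
import shlex
from pathlib import PurePosixPath
from typing import Iterable, List


def outside_scope(touched_files: Iterable[str], allowed_dirs: Iterable[str]) -> List[str]:
    """Return the touched files that are not inside any allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    violations = []
    for name in touched_files:
        path = PurePosixPath(name)
        # Compare whole path components, so /src/authz does not match /src/auth.
        if not any(d == path or d in path.parents for d in allowed):
            violations.append(name)
    return violations


def adds_npm_dependency(command: str) -> bool:
    """Heuristic: True if the command looks like `npm install <package>`.

    A bare `npm install` only restores what package.json already lists,
    so it is not flagged.
    """
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "npm":
        return False
    if tokens[1] not in {"install", "i", "add"}:
        return False
    # Any non-flag argument after the subcommand is treated as a package name.
    return any(not t.startswith("-") for t in tokens[2:])


if __name__ == "__main__":
    touched = ["/src/auth/login.ts", "/src/database/pool.ts"]
    print(outside_scope(touched, ["/src/auth"]))
    print(adds_npm_dependency("npm install lodash"))
```

Checks like these are cheap to run after every step, and a non-empty result could trigger the same pause-and-ask behavior as a low faithfulness score.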
Is your feature request related to a problem? Please describe.
No response
Additional context
No response
Operating system (OS)
Windows
How important is this feature to you?
1 (Not too important)
Warp Internal (ignore) - linear-label:39cc6478-1249-4ee7-950b-c428edfeecd1
None