Skip to content

[Feature] Built-in intent drift detection for Agent Mode - prevent multi-step scope creep #9472

@HasRahm

Description

@HasRahm

Pre-submit Checks

Describe the solution you'd like?

Problem

When using Agent Mode or Oz for multi-step tasks, agents can experience cumulative intent drift - they start with a clear instruction like "refactor the auth module" and by step 6 they're modifying unrelated files, adding unexpected dependencies, or changing environment variables outside the original scope.

This happens not because the model is wrong, but because multi-step context causes gradual faithfulness erosion. Users only discover it after reviewing the full transcript.

Proposed Solution

Add optional intent drift monitoring as a built-in capability for Agent Mode runs. The idea:

  1. At the start of a multi-step task, the agent captures the user's intent as a contract (task description + constraints)
    1. After each step, the agent scores its own action against the original contract
    1. When faithfulness drops below a threshold, the agent pauses and asks the user before continuing
      This is similar to how "ci-fix" validates its own fixes by re-running CI - but applied to intent preservation across all multi-step tasks.

Prior Art

I've built an open-source implementation of this called Anchor that uses HHEM-2.1-Open (Vectara's hallucination evaluation model, Apache 2.0) to score agent actions against intent contracts.

Benchmark results (CNN/DailyMail corpus, Gemma 4 31B + HHEM-2.1-Open):

Metric Baseline + Anchor Delta
High-fidelity grounding (HHEM >= 0.95) 47.1% 56.3% +9.2%
Mean HHEM faithfulness score 0.9257 0.9307 +0.005
Hallucination rate (<0.50) 0.8% 0.7% -0.1%
  • 3:1 improvement ratio on interventions (52.9% improved vs 17.2% degraded)
    • 0% regression on IFEval (instruction following) and Tau2 (task completion)

Implementation

  • catch when the agent touches /src/database
    • Dependency-locked tasks: "Don't add new npm packages" - catch when the agent runs npm install
      • Migration tasks: "Convert all class components to hooks" - catch when the agent starts rewriting unrelated utilities
        • Security-sensitive work: "Only modify the RBAC module" - prevent accidental changes to auth tokens or secrets

Links

The skill works today via the Anchor API, but the concept could also be built natively into Agent Mode using any hallucination evaluation model.

Use Cases

  • Scoped refactoring: "Only modify files in /src/auth"

Is your feature request related to a problem? Please describe.

No response

Additional context

No response

Operating system (OS)

Windows

How important is this feature to you?

1 (Not too important)

Warp Internal (ignore) - linear-label:39cc6478-1249-4ee7-950b-c428edfeecd1

None

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:agentAgent workflows, conversations, prompts, cloud mode, and AI-specific UI.enhancementNew feature or request.repro:mediumThe report suggests a plausible repro path, but some uncertainty remains.triagedIssue has received an initial automated triage pass.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions