
triage routing: context size metadata biases complexity classification for simple queries in long conversations #2228

@bug-ops

Description


Summary

build_triage_prompt includes "{msg_count} messages, ~{token_estimate} tokens" as conversation context in the triage classification prompt. In long conversations (50+ messages of history), this causes simple queries to be misclassified as complex or expert, because the LLM infers that an ongoing long conversation must itself be complex.

Root Cause

// crates/zeph-llm/src/router/triage.rs:350
format!(
    r#"...Conversation context: {msg_count} messages, ~{token_estimate} tokens\n\nUser message:\n{truncated}..."#
)

The triage model (gpt-4o-mini) sees "51 messages, ~15000 tokens" alongside a simple query like "Solve: 3+4" and may upgrade its tier estimate due to the implied ongoing complexity.
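A minimal sketch of the failure mode (the template is paraphrased from the excerpt above; the real prompt in triage.rs is longer): for a trivial query in a long session, the context-size line dominates what the classifier reads.

```rust
// Paraphrased sketch of the prompt construction, not the exact template.
fn build_triage_prompt(msg_count: usize, token_estimate: usize, truncated: &str) -> String {
    format!(
        "Conversation context: {msg_count} messages, ~{token_estimate} tokens\n\nUser message:\n{truncated}"
    )
}

fn main() {
    // The classifier sees "51 messages, ~15000 tokens" before it sees
    // the actual query "Solve: 3+4".
    let prompt = build_triage_prompt(51, 15_000, "Solve: 3+4");
    println!("{prompt}");
}
```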

Reproduction

  1. Start a session with a long history (50+ messages — e.g., the persistent testing.toml conversation)
  2. Enable routing = "triage" with two providers
  3. Send: Solve: 3+4
  4. Observe: initial classification is tier="simple", but the same session's follow-up calls return tier="expert" or tier="medium"
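A configuration along these lines should cover step 2. This is an illustrative sketch: only `routing = "triage"` is taken from the report; the provider table layout, provider names, and model values are assumptions, not from the repo.

```toml
# Hypothetical config sketch for reproducing the issue.
routing = "triage"

# Two providers so triage has tiers to route between (names are invented).
[providers.cheap]
model = "gpt-4o-mini"    # also acts as the triage classifier per the report

[providers.expert]
model = "claude-sonnet"  # selected when triage returns a higher tier
```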

Expected

Classification should be based primarily on the content of the current user message, not the accumulated conversation size. A simple arithmetic query should reliably return tier="simple" regardless of conversation length.

Actual

triage routing: chat_with_tools tier="simple"  ← initial tool call
triage routing: chat tier="expert"             ← follow-up with long history context
triage routing: chat tier="medium"

Impact

  • MEDIUM: wrong tier → wrong provider selected → cost/quality mismatch
  • Particularly affects long sessions where all queries eventually get escalated
  • Makes cost optimization unreliable for sustained use

Fix Direction

Option A: Remove msg_count/token_estimate from the triage prompt and classify only the content of the last user message.
Option B: Replace absolute counts with bucketed labels (short/medium/long context) to reduce noise.
Option C: Add a large_context_threshold above which the classification prompt omits context size for small queries.
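Options B and C could be combined. A hedged sketch, assuming invented names and cutoffs (the constants, function names, and bucket boundaries are illustrative, not from the codebase):

```rust
// Illustrative thresholds -- tune against real traffic.
const LARGE_CONTEXT_THRESHOLD: usize = 2_000; // tokens
const SMALL_QUERY_CHARS: usize = 64;          // chars

/// Option B: map an absolute token estimate onto a coarse label so the
/// triage model sees "long" instead of "~15000 tokens".
fn context_bucket(token_estimate: usize) -> &'static str {
    match token_estimate {
        0..=1_000 => "short",
        1_001..=8_000 => "medium",
        _ => "long",
    }
}

/// Option C: in a large context, a short query gets no size hint at all,
/// so "Solve: 3+4" is classified on content alone.
fn context_line(token_estimate: usize, user_msg: &str) -> Option<String> {
    if token_estimate > LARGE_CONTEXT_THRESHOLD && user_msg.len() < SMALL_QUERY_CHARS {
        return None;
    }
    Some(format!("Conversation context: {}", context_bucket(token_estimate)))
}

fn main() {
    // Long session + trivial query: the size hint is dropped entirely.
    println!("{:?}", context_line(15_000, "Solve: 3+4"));
    // Short session + substantial query: a coarse bucket is passed through.
    println!("{:?}", context_line(500, "Explain the borrow checker in detail"));
}
```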

Discovered in CI-210 (2026-03-27) during live triage routing verification.

Metadata

Labels

P2 (High value, medium complexity) · bug (Something isn't working) · llm (zeph-llm crate: Ollama, Claude)
