triage routing: context size metadata biases complexity classification for simple queries in long conversations #2228
Description
Summary
`build_triage_prompt` includes "`{msg_count} messages, ~{token_estimate} tokens`" as conversation context in the triage classification prompt. In long conversations (50+ messages of history), this causes simple queries to be misclassified as complex or expert, because the LLM infers that the ongoing conversation must be complex.
Root Cause
```rust
// crates/zeph-llm/src/router/triage.rs:350
format!(
    r#"...Conversation context: {msg_count} messages, ~{token_estimate} tokens\n\nUser message:\n{truncated}..."#
)
```

The triage model (gpt-4o-mini) sees "51 messages, ~15000 tokens" alongside a simple query like "Solve: 3+4" and may upgrade its tier estimate because of the implied ongoing complexity.
Reproduction
- Start a session with a long history (50+ messages, e.g. the persistent `testing.toml` conversation)
- Enable `routing = "triage"` with two providers
- Send: `Solve: 3+4`
- Observe: the initial classification is `tier="simple"`, but the same session's follow-up calls return `tier="expert"` or `tier="medium"`
Expected
Classification should be based primarily on the content of the current user message, not on the accumulated conversation size. A simple arithmetic query should reliably return `tier="simple"` regardless of conversation length.
Actual
```
triage routing: chat_with_tools tier="simple"   ← initial tool call
triage routing: chat tier="expert"              ← follow-up with long history context
triage routing: chat tier="medium"
```
Impact
- MEDIUM: wrong tier → wrong provider selected → cost/quality mismatch
- Particularly affects long sessions where all queries eventually get escalated
- Makes cost optimization unreliable for sustained use
Fix Direction
Option A: Remove msg_count/token_estimate from the triage prompt — classify only the last user message content.
Option B: Replace absolute counts with bucketed labels: short/medium/long context to reduce noise.
Option C: Add a large_context_threshold where the classification prompt skips context size for small queries.
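Option B could be sketched roughly as follows. The function name and threshold values are illustrative assumptions, not from the codebase; the point is only that the triage model sees a coarse label instead of raw numbers it can anchor on:

```rust
// Hypothetical sketch of Option B: map absolute counts to coarse bucket
// labels ("short"/"medium"/"long") so the triage model cannot anchor on
// large raw numbers. Thresholds here are illustrative assumptions.
fn context_bucket(msg_count: usize, token_estimate: usize) -> &'static str {
    if msg_count > 30 || token_estimate > 8_000 {
        "long"
    } else if msg_count > 10 || token_estimate > 2_000 {
        "medium"
    } else {
        "short"
    }
}

fn main() {
    // The repro case (51 messages, ~15000 tokens) collapses to just "long",
    // while a fresh session stays "short".
    println!("{}", context_bucket(51, 15_000));
    println!("{}", context_bucket(3, 500));
}
```

The prompt would then say e.g. "Conversation context: long" instead of "51 messages, ~15000 tokens", which removes the numeric signal while still letting Option C-style threshold logic reuse the same bucketing.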
Discovered in CI-210 (2026-03-27) during live triage routing verification.