feat: add model fallback support for /compact (compaction) #14543
Description
Problem
When the primary model is overloaded (e.g. 500 new_api_error: "负载已经达到上限", i.e. "load has reached its upper limit"), /compact fails immediately with no recovery path. Unlike the chat reply pipeline, which has runWithModelFallback() (retry plus fallback to configured alternative models), the compaction path calls session.compact() directly with no retry or fallback logic.
This means compaction is the only user-facing LLM operation that has zero fault tolerance — chat replies, tool calls, and image generation all have fallback support, but compaction does not.
Current Behavior
- User runs /compact
- compactEmbeddedPiSessionDirect() creates a session with the primary model
- Calls session.compact(customInstructions) → API returns 500 (overloaded)
- Error is caught by the outer catch → returns { ok: false, reason: "..." }
- User sees: ⚙️ Compaction failed: 500 {"error":{...}}
No retry, no fallback.
Expected Behavior
Compaction should have the same resilience as chat replies:
- Retry once on transient/overload errors (same model, short delay)
- Fall back to agents.defaults.model.fallbacks if the primary model is consistently unavailable
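The two behaviors above could combine roughly as in the sketch below. Every name in it (CompactResult, CompactAttempt, isTransientError, compactWithResilience) is hypothetical and not taken from the codebase. The error classification by message text is an assumption, as is the choice to stop immediately on non-transient errors.

```typescript
// Hypothetical types and helpers for illustration only.
type CompactResult = { ok: true; model: string } | { ok: false; reason: string };
type CompactAttempt = (model: string) => Promise<void>;

// Assumption: overload/transient errors can be recognised from the message text.
function isTransientError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /\b(429|500|502|503|529)\b|overloaded/i.test(message);
}

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function compactWithResilience(
  attempt: CompactAttempt,
  primary: string,
  fallbacks: string[],
  retryDelayMs = 1000,
): Promise<CompactResult> {
  let lastReason = "unknown error";
  for (const model of [primary, ...fallbacks]) {
    // Each model gets one initial attempt plus one retry after a short delay.
    for (let tryIndex = 0; tryIndex < 2; tryIndex++) {
      try {
        await attempt(model);
        return { ok: true, model };
      } catch (err) {
        lastReason = err instanceof Error ? err.message : String(err);
        // Assumed policy: non-transient errors are not retried or passed to fallbacks.
        if (!isTransientError(err)) return { ok: false, reason: lastReason };
        if (tryIndex === 0) await delay(retryDelayMs);
      }
    }
    // Both attempts failed with transient errors: move on to the next candidate.
  }
  return { ok: false, reason: lastReason };
}
```

With a primary model that always returns an overload error and one fallback, this sketch calls the primary twice, then the fallback once.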
Why This Is Non-Trivial
The compaction function (compactEmbeddedPiSessionDirect) creates the full agent session (auth, tools, system prompt, session manager, etc.) before calling session.compact(). To use a fallback model, the entire session setup needs to be re-executed with different model parameters — it's not a simple "swap the model and retry" situation.
A possible approach:
- Extract the model-dependent session setup into a helper
- Wrap it with runWithModelFallback() or a similar pattern
- Pass fallback candidates from agents.defaults.model.fallbacks
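A minimal sketch of that shape follows, under stated assumptions. CompactableSession, SessionFactory, withModelCandidates, and compactWithFallback are hypothetical names, and the dispose() method and string return value of compact() are assumed for illustration. withModelCandidates stands in for the "similar pattern" and does not reproduce the real signature of runWithModelFallback().

```typescript
// All names below are hypothetical stand-ins, not the project's real API.
interface CompactableSession {
  compact(customInstructions?: string): Promise<string>;
  dispose(): void;
}

// Stand-in for the extracted model-dependent setup
// (auth, tools, system prompt, session manager, etc.).
type SessionFactory = (model: string) => Promise<CompactableSession>;

// Generic wrapper: tries each candidate in order and returns the first success.
async function withModelCandidates<T>(
  candidates: string[],
  run: (model: string) => Promise<T>,
): Promise<{ model: string; value: T }> {
  const errors: string[] = [];
  for (const model of candidates) {
    try {
      return { model, value: await run(model) };
    } catch (err) {
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all candidates failed: ${errors.join("; ")}`);
}

async function compactWithFallback(
  createSession: SessionFactory,
  primary: string,
  fallbacks: string[],
  customInstructions?: string,
): Promise<{ model: string; value: string }> {
  return withModelCandidates([primary, ...fallbacks], async (model) => {
    // The whole session is rebuilt per candidate, since setup depends on the model.
    const session = await createSession(model);
    try {
      return await session.compact(customInstructions);
    } finally {
      session.dispose();
    }
  });
}
```

The point of the factory parameter is that the fallback wrapper never needs to know how a session is built; it only needs something it can call again with a different model.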
Workaround
We're currently using a local patch that wraps session.compact() with a retry loop (up to 2 retries with incremental delay). This handles transient overloads but doesn't support model fallback.
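The local patch itself is not shown here. A retry loop of the kind described might look like the sketch below, where retryWithIncrementalDelay and its default delay are hypothetical and the delay is assumed to grow linearly.

```typescript
// Hypothetical helper; an illustration of the described workaround, not the actual patch.
async function retryWithIncrementalDelay<T>(
  operation: () => Promise<T>,
  maxRetries = 2,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (attempt > 0) {
      // Delay grows linearly: baseDelayMs before retry 1, 2 * baseDelayMs before retry 2.
      await new Promise<void>((resolve) => setTimeout(resolve, attempt * baseDelayMs));
    }
    try {
      return await operation();
    } catch (err) {
      lastError = err;
    }
  }
  // All attempts failed: surface the most recent error to the caller.
  throw lastError;
}
```

In the compaction path this would wrap the existing call, along the lines of retryWithIncrementalDelay(() => session.compact(customInstructions)). Because the session is reused, every attempt targets the same model.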
Related
- PR feat(agents): retry empty-stream once before fallback #13820 — adds empty-stream retry + overload classification for the chat reply pipeline
- The chat reply path uses runWithModelFallback(), which handles both retry and fallback elegantly