get_or_create_agent should restore session state on LRU re-creation #7615
Description
Problem
When AgentManager::get_or_create_agent re-creates an agent for a session that was evicted from the LRU cache, the new agent starts as a blank shell — no provider, no MCP extensions, no recipe instructions. The session data is persisted on disk but get_or_create_agent doesn't restore any of it.
This affects any client using the /reply endpoint (which routes through get_or_create_agent) after LRU eviction. The first visible symptom is "Provider not set", but the agent is also missing all tools and system prompt additions.
What's lost on LRU eviction
| State | Persisted in DB? | Restored by get_or_create_agent? | Restored by restart_agent? |
|---|---|---|---|
| Provider (name + model config) | Yes | No | Yes |
| MCP extension connections | Yes (configs) | No | Yes (re-spawned) |
| Recipe instructions (system prompt) | Yes (recipe_json) | No | Yes |
| final_output_tool (response schema) | No | No | Yes (from recipe) |
| system_prompt_override | No | No | No |
| frontend_tools | No | No | No |
resume_agent and restart_agent both handle provider + extension restoration, and restart_agent also re-applies recipe instructions. But get_or_create_agent — the path used by /reply — does none of this.
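As a rough illustration of what "restore what is persisted" could mean for the rows in the table, here is a minimal sketch. `PersistedSession`, `AgentState`, and `restore_from_session` are hypothetical names invented for this example, not the actual goose types, and the provider, extension configs, and recipe are simplified to plain strings.

```rust
/// Hypothetical snapshot of the per-session data the table lists as persisted.
#[derive(Debug, Clone, Default)]
struct PersistedSession {
    provider: Option<String>,       // provider name + model config, simplified
    extension_configs: Vec<String>, // MCP extension configs, simplified
    recipe_json: Option<String>,    // recipe, kept as an opaque string here
}

/// Hypothetical in-memory agent state (assumption: the real Agent holds more).
#[derive(Debug, Clone, Default, PartialEq)]
struct AgentState {
    provider: Option<String>,
    extensions: Vec<String>,
    recipe_json: Option<String>,
    // Not persisted according to the table, so nothing to restore from.
    system_prompt_override: Option<String>,
    frontend_tools: Vec<String>,
}

/// Rebuild as much agent state as the persisted session allows.
/// Fields without a persisted source are left at their defaults.
fn restore_from_session(session: &PersistedSession) -> AgentState {
    AgentState {
        provider: session.provider.clone(),
        extensions: session.extension_configs.clone(),
        recipe_json: session.recipe_json.clone(),
        ..AgentState::default()
    }
}
```

In this sketch the recipe is only carried over. A real implementation would presumably parse it and re-apply the instructions and final_output_tool, as restart_agent is described as doing. system_prompt_override and frontend_tools stay empty because there is no persisted copy to read.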
How it happens
- Session A is fully configured (provider, extensions, recipe)
- Enough new sessions are created to push A out of the LRU (default 100, configurable via GOOSE_MAX_ACTIVE_AGENTS)
- A new /reply arrives for session A
- get_or_create_agent creates a fresh Agent with only default_provider (usually None)
- The reply fails with "Provider not set", or, if a provider happens to be set, proceeds with no tools available
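The sequence above can be reproduced with a toy model. This is only a sketch: `ToyAgent` and `ToyManager` are hypothetical stand-ins, not the real `Agent` or `AgentManager`, and the LRU is a simple `HashMap` plus `VecDeque` with a capacity of 2 instead of the default 100.

```rust
use std::collections::{HashMap, VecDeque};

/// Toy stand-in for an agent (assumption: the real Agent holds far more).
#[derive(Debug, Clone, Default, PartialEq)]
struct ToyAgent {
    provider: Option<String>,
    extensions: Vec<String>,
}

/// Toy LRU cache keyed by session id.
struct ToyManager {
    capacity: usize,
    agents: HashMap<String, ToyAgent>,
    order: VecDeque<String>, // front = least recently used
}

impl ToyManager {
    fn new(capacity: usize) -> Self {
        Self { capacity, agents: HashMap::new(), order: VecDeque::new() }
    }

    /// On a cache miss this creates a blank agent and restores nothing,
    /// which mirrors the behavior described in this issue.
    fn get_or_create(&mut self, session_id: &str) -> &mut ToyAgent {
        // Drop any old position so the session can be re-added as most recent.
        self.order.retain(|id| id.as_str() != session_id);
        if !self.agents.contains_key(session_id) {
            if self.agents.len() >= self.capacity {
                // Evict the least recently used session.
                if let Some(victim) = self.order.pop_front() {
                    self.agents.remove(&victim);
                }
            }
            self.agents.insert(session_id.to_string(), ToyAgent::default());
        }
        self.order.push_back(session_id.to_string());
        self.agents
            .get_mut(session_id)
            .expect("agent is present after insert")
    }
}

/// Configure session A, push it out of the cache, then ask for it again.
fn agent_after_eviction() -> ToyAgent {
    let mut mgr = ToyManager::new(2);
    let a = mgr.get_or_create("A");
    a.provider = Some("example-provider".to_string());
    a.extensions.push("example-extension".to_string());
    mgr.get_or_create("B");
    mgr.get_or_create("C"); // cache is full, so A is evicted here
    mgr.get_or_create("A").clone() // re-created as a blank agent
}
```

Calling `agent_after_eviction()` returns an agent equal to `ToyAgent::default()`: the provider and extension configured in the first step are gone, which corresponds to the "Provider not set" failure in the last step.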
Context
We hit this consistently in an internal Slack bot that manages many concurrent sessions. With GOOSE_MAX_ACTIVE_AGENTS set to 20-45 in production, LRU eviction is frequent under normal load. We've added client-side workarounds (detect the error, invalidate the stale session, re-configure, retry), but ideally get_or_create_agent would restore what it can from the persisted session data — similar to how restart_agent already does.
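For reference, the client-side workaround has roughly the following shape. This is a simplified sketch, not our bot's actual code: `ReplyError` and `reply_with_recovery` are hypothetical names, and the two closures stand in for the real /reply call and for the "invalidate the stale session and re-configure" step.

```rust
/// Hypothetical error type; only the "Provider not set" case triggers recovery.
#[derive(Debug, PartialEq)]
enum ReplyError {
    ProviderNotSet,
    Other(String),
}

/// Send a reply; if the agent turns out to be a blank shell, re-configure the
/// session once and retry. Any other error is passed through untouched.
fn reply_with_recovery<R, C>(mut send_reply: R, mut reconfigure: C) -> Result<String, ReplyError>
where
    R: FnMut() -> Result<String, ReplyError>,
    C: FnMut() -> Result<(), ReplyError>,
{
    match send_reply() {
        Err(ReplyError::ProviderNotSet) => {
            // Detected the eviction symptom: invalidate and re-configure, then retry once.
            reconfigure()?;
            send_reply()
        }
        other => other,
    }
}
```

The retry is deliberately limited to a single attempt so a session that cannot be re-configured fails fast instead of looping. Note that this only recovers from the visible "Provider not set" error; it does not catch the case where a provider is set but tools are silently missing.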