get_or_create_agent should restore session state on LRU re-creation #7615

@wpfleger96

Problem

When AgentManager::get_or_create_agent re-creates an agent for a session that was evicted from the LRU cache, the new agent starts as a blank shell — no provider, no MCP extensions, no recipe instructions. The session data is persisted on disk but get_or_create_agent doesn't restore any of it.

This affects any client using the /reply endpoint (which routes through get_or_create_agent) after LRU eviction. The first visible symptom is "Provider not set", but the agent is also missing all tools and system prompt additions.

What's lost on LRU eviction

| State | Persisted in DB? | Restored by get_or_create_agent? | Restored by restart_agent? |
| --- | --- | --- | --- |
| Provider (name + model config) | Yes | No | Yes |
| MCP extension connections | Yes (configs) | No | Yes (re-spawned) |
| Recipe instructions (system prompt) | Yes (recipe_json) | No | Yes |
| final_output_tool (response schema) | No | No | Yes (from recipe) |
| system_prompt_override | No | No | No |
| frontend_tools | No | No | No |

resume_agent and restart_agent both handle provider + extension restoration, and restart_agent also re-applies recipe instructions. But get_or_create_agent — the path used by /reply — does none of this.
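To make the request concrete, here is a minimal sketch of what restoring on a cache miss could look like. Everything in it is hypothetical: `PersistedSession`, `Agent`, and `restore_from_session` are simplified stand-ins invented for illustration, not the real goose types or signatures. It only covers the state the table above marks as persisted.

```rust
/// Hypothetical view of the per-session data the table marks as persisted.
#[derive(Debug, Clone, Default)]
struct PersistedSession {
    provider_name: Option<String>,
    model: Option<String>,
    /// Stand-in for stored MCP extension configs.
    extension_configs: Vec<String>,
    /// Stand-in for instructions parsed out of recipe_json.
    recipe_instructions: Option<String>,
}

/// Hypothetical, heavily simplified agent (not the real Agent type).
#[derive(Debug, Default, PartialEq)]
struct Agent {
    provider: Option<(String, String)>,
    extensions: Vec<String>,
    system_prompt_additions: Option<String>,
}

/// Build an agent from persisted data instead of starting from a blank shell.
/// system_prompt_override and frontend_tools are not persisted according to
/// the table, so a restore like this cannot recover them.
fn restore_from_session(session: &PersistedSession) -> Agent {
    let mut agent = Agent::default();

    // Only set the provider when both the name and the model are known.
    if let (Some(name), Some(model)) = (&session.provider_name, &session.model) {
        agent.provider = Some((name.clone(), model.clone()));
    }

    // A real implementation would re-spawn the MCP connections from these
    // configs; this sketch just copies them over.
    agent.extensions = session.extension_configs.clone();

    // Re-apply recipe instructions as system prompt additions.
    agent.system_prompt_additions = session.recipe_instructions.clone();

    agent
}
```

A session with nothing persisted still yields a blank agent here, so the "Provider not set" error would remain possible for sessions that were never configured.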

How it happens

  1. Session A is fully configured (provider, extensions, recipe)
  2. Enough new sessions are created to push A out of the LRU (default 100, configurable via GOOSE_MAX_ACTIVE_AGENTS)
  3. A new /reply arrives for session A
  4. get_or_create_agent creates a fresh Agent with only default_provider (usually None)
  5. The reply fails with "Provider not set" — or if provider happens to be set, proceeds with no tools available
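The steps above can be reproduced with a toy model. This is only a sketch under stated assumptions: `ToyManager` and its `Agent` are invented for illustration and use a hand-rolled LRU from the standard library, not the real AgentManager or its cache. It mirrors the reported behaviour, where a cache miss produces a default (blank) agent.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for an agent; `provider` is None until configured.
#[derive(Debug, Clone, Default, PartialEq)]
struct Agent {
    provider: Option<String>,
    extensions: Vec<String>,
}

/// Toy manager with a bounded LRU of live agents (not the real AgentManager).
struct ToyManager {
    capacity: usize,
    /// Session ids ordered by recency; the front is the least recently used.
    order: VecDeque<String>,
    agents: HashMap<String, Agent>,
}

impl ToyManager {
    fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), agents: HashMap::new() }
    }

    /// Mirrors the reported behaviour: a cache miss yields a blank agent.
    fn get_or_create_agent(&mut self, session_id: &str) -> &mut Agent {
        // Mark this session as the most recently used.
        self.order.retain(|id| id != session_id);
        self.order.push_back(session_id.to_string());

        if !self.agents.contains_key(session_id) {
            // Evict the least recently used agent when the cache is full.
            if self.agents.len() >= self.capacity {
                if let Some(evicted) = self.order.pop_front() {
                    self.agents.remove(&evicted);
                }
            }
            // Nothing is restored from persisted session data here.
            self.agents.insert(session_id.to_string(), Agent::default());
        }
        self.agents.get_mut(session_id).expect("agent was just inserted")
    }
}

/// Configure session "A", touch `other_sessions` new sessions, then ask for
/// "A" again and return a copy of whatever agent comes back.
fn agent_after(capacity: usize, other_sessions: usize) -> Agent {
    let mut manager = ToyManager::new(capacity);
    {
        let a = manager.get_or_create_agent("A");
        a.provider = Some("example-provider".to_string());
        a.extensions.push("example-extension".to_string());
    }
    for i in 0..other_sessions {
        manager.get_or_create_agent(&format!("s{}", i));
    }
    manager.get_or_create_agent("A").clone()
}
```

With a capacity of 2 and two other sessions, `agent_after(2, 2)` returns a default agent with no provider and no extensions, while `agent_after(100, 2)` returns the configured one because "A" was never evicted.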

Context

We hit this consistently in an internal Slack bot that manages many concurrent sessions. With GOOSE_MAX_ACTIVE_AGENTS set to 20-45 in production, LRU eviction is frequent under normal load. We've added client-side workarounds (detect the error, invalidate the stale session, re-configure, retry), but ideally get_or_create_agent would restore what it can from the persisted session data — similar to how restart_agent already does.
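For reference, the client-side workaround has roughly the shape below. This is an illustrative sketch, not our actual bot code: the `SessionClient` trait, `ReplyError`, and `FakeClient` are hypothetical names, and a real client would have to detect the condition from the server's error response rather than from a typed enum.

```rust
/// Hypothetical error type standing in for the server's error response.
#[derive(Debug, PartialEq)]
enum ReplyError {
    ProviderNotSet,
    Other(String),
}

/// Hypothetical client operations, abstracted so the retry logic is testable.
trait SessionClient {
    fn reply(&mut self, session_id: &str, message: &str) -> Result<String, ReplyError>;
    fn invalidate(&mut self, session_id: &str);
    fn reconfigure(&mut self, session_id: &str) -> Result<(), ReplyError>;
}

/// Detect the error, invalidate the stale session, re-configure, retry once.
fn reply_with_recovery<C: SessionClient>(
    client: &mut C,
    session_id: &str,
    message: &str,
) -> Result<String, ReplyError> {
    match client.reply(session_id, message) {
        Err(ReplyError::ProviderNotSet) => {
            client.invalidate(session_id);
            client.reconfigure(session_id)?;
            // Single retry; a second failure is returned to the caller.
            client.reply(session_id, message)
        }
        // Successes and unrelated errors pass through unchanged.
        other => other,
    }
}

/// Fake client that behaves like an evicted session until it is re-configured.
struct FakeClient {
    configured: bool,
    calls: Vec<&'static str>,
}

impl SessionClient for FakeClient {
    fn reply(&mut self, _session_id: &str, message: &str) -> Result<String, ReplyError> {
        self.calls.push("reply");
        if self.configured {
            Ok(format!("echo: {}", message))
        } else {
            Err(ReplyError::ProviderNotSet)
        }
    }

    fn invalidate(&mut self, _session_id: &str) {
        self.calls.push("invalidate");
    }

    fn reconfigure(&mut self, _session_id: &str) -> Result<(), ReplyError> {
        self.calls.push("reconfigure");
        self.configured = true;
        Ok(())
    }
}
```

Every client of /reply would need some version of this, which is why restoring inside get_or_create_agent seems like the better place for the fix.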
