Tool-pair summarization creates hidden messages that are silently deleted on next client round-trip #7413

@bzqzheng

Summary

When tool-pair summarization fires, it creates server-side hidden messages (userVisible: false, agentVisible: true) and mutates prior message metadata without emitting a HistoryReplaced event. The UI never learns about these changes. On the next user turn, the UI sends conversation_so_far containing its stale copy of the conversation, and the server calls replace_conversation with it — permanently deleting the hidden summaries and potentially reverting metadata changes the server just made.

This is in contrast to full compaction (compact_messages), which correctly emits HistoryReplaced and keeps the UI in sync.

Affected Code

  • crates/goose/src/agents/agent.rs — tool-pair summarization block (~L1580–1608): calls update_message_metadata and add_message but never emits AgentEvent::HistoryReplaced
  • crates/goose-server/src/routes/reply.rs (~L288–305): unconditionally calls replace_conversation with client-supplied conversation_so_far
  • crates/goose/src/agents/agent.rs (~L951, ~L1034, ~L1493): the three places where HistoryReplaced IS emitted — tool-pair summarization is the missing fourth

Reproduction

Setup: Set GOOSE_TOOL_CALL_CUTOFF: 3 in ~/.config/goose/config.yaml to make summarization trigger quickly.

  1. Start a fresh session

  2. Send a message that forces chained tool calls:

    "List the files in /tmp, read the 3 most recently modified ones, and tell me what you find"

  3. After the response, query the session DB:

    SELECT message_id, role, metadata_json
    FROM messages
    WHERE session_id = (SELECT id FROM sessions ORDER BY updated_at DESC LIMIT 1)
    ORDER BY created_timestamp;

    Observe: a new userVisible: false, agentVisible: true message (the hidden summary) and one or two messages now marked agentVisible: false.

  4. Send any follow-up message ("thanks")

  5. Query the DB again

Expected: hidden summary persists; agentVisible: false metadata preserved.

Actual: the hidden summary is deleted and the server re-runs summarization from scratch. Other messages are also dropped.

Evidence

Captured from a live test session:

After turn 1:

msg_6df69722  assistant  agentVisible:false   ← summarized, hidden from AI
msg_7e3e673e  user       agentVisible:false   ← summarized, hidden from AI
msg_803f5a4f  user       userVisible:false, agentVisible:true  ← hidden summary

After sending "thanks":

msg_011qCnLc  assistant  agentVisible:false   ← different message ID, msg_6df69722 gone
msg_7e3e673e  user       agentVisible:false   ← preserved
msg_cc2f0950  user       userVisible:false, agentVisible:true  ← brand new summary

msg_803f5a4f is completely gone. The server recreated the work it just did.

Impact

  • Tool-pair summarization silently undoes itself every turn
  • Agent context is not compressed as intended — tool pairs the system tried to retire may re-enter the context window
  • In long sessions where stale tool results (e.g. failed extension checks) were being compressed away, those results can persist in context longer than expected, contributing to stale reasoning loops
  • Severity: medium. No data loss visible to the user; the system partially self-heals by re-running summarization. But the intended token economy and context hygiene of tool-pair summarization are effectively broken.

Root Cause

maybe_summarize_tool_pair in agent.rs calls update_message_metadata and session_manager.add_message but does not emit AgentEvent::HistoryReplaced. The UI's local state therefore never reflects the server's metadata mutations. On the next /reply request, conversation_so_far carries the stale pre-mutation state, and reply.rs calls replace_conversation unconditionally, overwriting the server's changes.
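
The overwrite described above can be reproduced in a small self-contained model. This is only a sketch: `Message`, `Server`, `summarize_tool_pair`, `reply`, and `simulate_stale_overwrite` are hypothetical stand-ins, not goose's real types or signatures, and the model assumes the client appends its new user message to its own copy before sending.

```rust
// Simplified model of the bug. All names here are illustrative stand-ins.
#[derive(Clone, Debug, PartialEq)]
struct Message {
    id: String,
    user_visible: bool,
    agent_visible: bool,
}

impl Message {
    fn new(id: &str, user_visible: bool, agent_visible: bool) -> Self {
        Message { id: id.to_string(), user_visible, agent_visible }
    }
}

/// Server-side message store for a single session.
struct Server {
    messages: Vec<Message>,
}

impl Server {
    /// Stand-in for tool-pair summarization: hide the pair from the agent
    /// and append a hidden summary. Nothing is reported back to the client.
    fn summarize_tool_pair(&mut self, pair_ids: &[&str], summary_id: &str) {
        for message in self.messages.iter_mut() {
            if pair_ids.contains(&message.id.as_str()) {
                message.agent_visible = false;
            }
        }
        self.messages.push(Message::new(summary_id, false, true));
    }

    /// Stand-in for the reply route: unconditionally replace the stored
    /// history with whatever the client sent.
    fn reply(&mut self, conversation_so_far: Vec<Message>) {
        self.messages = conversation_so_far;
    }
}

/// Runs one summarization followed by one client round-trip and returns
/// the server's history afterwards.
fn simulate_stale_overwrite() -> Vec<Message> {
    let mut server = Server {
        messages: vec![
            Message::new("tool_request", true, true),
            Message::new("tool_response", true, true),
        ],
    };
    // The client's copy is taken before summarization and never refreshed.
    let mut client_copy = server.messages.clone();
    server.summarize_tool_pair(&["tool_request", "tool_response"], "summary");
    // Next user turn: the client sends its stale copy plus the new message.
    client_copy.push(Message::new("thanks", true, true));
    server.reply(client_copy);
    server.messages
}
```

In this model the history returned by `simulate_stale_overwrite` no longer contains the `summary` message, and both tool messages are back to `agent_visible: true`, which mirrors the behavior captured in the Evidence section.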

Proposed Fix

Immediate (low risk): After applying tool-pair summarization changes in agent.rs, emit AgentEvent::HistoryReplaced(conversation.clone()). This is identical to how full compaction notifies the UI. The UI will refresh its local state and subsequent conversation_so_far payloads will be accurate.
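
A minimal sketch of the immediate fix, under the same simplifying assumptions: the types below are hypothetical stand-ins, and returning the event from the function stands in for however the agent actually delivers events to the UI. The only part taken from this issue is the idea of emitting `AgentEvent::HistoryReplaced(conversation.clone())` once the summarization changes have been applied.

```rust
// Illustrative stand-ins; not goose's real types or signatures.
#[derive(Clone, Debug, PartialEq)]
struct Message {
    id: String,
    user_visible: bool,
    agent_visible: bool,
}

#[derive(Debug, PartialEq)]
enum AgentEvent {
    HistoryReplaced(Vec<Message>),
}

/// Applies tool-pair summarization and returns the event that tells the
/// client to replace its local history with the server's.
fn summarize_and_notify(
    conversation: &mut Vec<Message>,
    pair_ids: &[&str],
    summary_id: &str,
) -> AgentEvent {
    // Hide the summarized pair from the agent.
    for message in conversation.iter_mut() {
        if pair_ids.contains(&message.id.as_str()) {
            message.agent_visible = false;
        }
    }
    // Append the hidden summary (not shown to the user, shown to the agent).
    conversation.push(Message {
        id: summary_id.to_string(),
        user_visible: false,
        agent_visible: true,
    });
    // The step that is missing today: publish the new history.
    AgentEvent::HistoryReplaced(conversation.clone())
}

/// Client-side handling: adopt the server's history wholesale.
fn apply_event(local: &mut Vec<Message>, event: AgentEvent) {
    match event {
        AgentEvent::HistoryReplaced(history) => *local = history,
    }
}
```

Once the client applies the event, its local copy matches the server's, so the next conversation_so_far payload carries the hidden summary and the updated metadata instead of overwriting them.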

Follow-up (tracked separately): The conversation_so_far ownership model is inherently fragile — the client should not be the source of truth for server-managed state. Consider a server-authoritative reply contract where the client can only append new messages, not replace existing ones.
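
One possible shape for such an append-only contract is sketched below. `AppendRequest`, `Session`, and `append` are hypothetical names introduced for illustration; the issue does not specify the design, and a real implementation would also need to cover edits, retries, and authentication.

```rust
// Illustrative stand-ins; not goose's real types or API.
#[derive(Clone, Debug, PartialEq)]
struct Message {
    id: String,
    user_visible: bool,
    agent_visible: bool,
}

/// Hypothetical request body: the client sends only messages it created,
/// never its copy of the whole conversation.
struct AppendRequest {
    new_messages: Vec<Message>,
}

/// Server-side session; the server is the only owner of the history.
struct Session {
    messages: Vec<Message>,
}

impl Session {
    /// Appends the client's new messages. Any request that reuses the id of
    /// a stored message is rejected as a whole, so the client cannot
    /// overwrite or delete server-managed state.
    fn append(&mut self, request: AppendRequest) -> Result<(), String> {
        for incoming in &request.new_messages {
            if self.messages.iter().any(|stored| stored.id == incoming.id) {
                return Err(format!("message {} already exists", incoming.id));
            }
        }
        self.messages.extend(request.new_messages);
        Ok(())
    }
}
```

With this shape a stale client can at worst append a message; hidden summaries and metadata changes made by the server survive every round-trip regardless of what the client believes the history to be.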

References

  • Full compaction emits correctly: agent.rs ~L1034
  • Recovery compaction emits correctly: agent.rs ~L1493
  • Slash command compaction emits correctly: agent.rs ~L951
  • Tool-pair summarization: agent.rs ~L1580–1608 (missing emission)
  • AgentEvent::HistoryReplaced handling in UI: useChatStream.ts ~L294
