Improve reliability of in-flight session switching and stream reattachment #1466

@dso2ng

Problem

Hermes WebUI supports long-running chat turns, including tool-heavy runs and delegated work. A common workflow is:

  1. Start a chat turn in session A.
  2. While A is still running, use the sidebar to inspect session B.
  3. Return to session A.

The expected behavior is that A remains visible as an in-flight turn: live output, tool cards, approval/clarify prompts, composer state, and the sidebar running indicator should all stay consistent.

In practice, this path is unreliable. Depending on timing, users can lose visibility into the running turn, see stale thinking/tool UI, miss an approval/clarify prompt, or return to a session whose sidebar and main-pane state no longer agree.

Why this matters

Hermes turns can run for a long time. Users should be able to inspect other sessions while work continues, then return to the running session without needing a refresh or manual recovery.

This is especially important for:

  • long-running tool calls
  • approval/clarify prompts
  • delegated subagent sessions
  • context-compression/session-rotation flows
  • browser reload/reconnect recovery

Current state model

From reading the current WebUI code, the active-pane state and per-session runtime state are mixed in a few places:

  • S.session, S.messages, S.busy, and S.activeStreamId describe the currently viewed pane.
  • INFLIGHT[session_id], LIVE_STREAMS[session_id], and server-side active_stream_id describe running sessions.
  • loadSession() switches the active pane and can reattach to a stream if the selected session is still running.
  • attachLiveStream() owns the SSE connection for a session/stream.
  • approval/clarify UI is still partly global and is stopped/hidden during session switching.

The architecture direction I would like to steer toward is:

The active pane is only a projection. Running state belongs to the session that owns the stream.

This can be done incrementally without changing the project's no-build, no-framework design.
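
To make the "projection" idea concrete, here is a minimal sketch in vanilla JS. It is not the current WebUI code: `runtimeBySession`, `markRunning`, `markSettled`, `projectActivePane`, and `isRunning` are hypothetical names chosen for illustration. The point is only that pane values such as `busy` and `activeStreamId` are derived from the viewed session's runtime entry rather than stored as independent globals.

```javascript
// Hypothetical per-session runtime registry (illustration only,
// not the actual INFLIGHT / LIVE_STREAMS structures).
const runtimeBySession = new Map();

// Record that a session owns a running stream.
function markRunning(sessionId, streamId) {
  runtimeBySession.set(sessionId, { streamId, busy: true });
}

// Drop runtime state for one session when its turn settles.
function markSettled(sessionId) {
  runtimeBySession.delete(sessionId);
}

// The active pane is derived on demand from the viewed session's entry.
function projectActivePane(viewedSessionId) {
  const rt = runtimeBySession.get(viewedSessionId);
  return {
    sessionId: viewedSessionId,
    busy: rt ? rt.busy : false,
    activeStreamId: rt ? rt.streamId : null,
  };
}

// Sidebar indicators read the registry directly, never the pane.
function isRunning(sessionId) {
  return runtimeBySession.has(sessionId);
}
```

With this shape, viewing B while A runs yields a non-busy pane for B, while the sidebar can still report A as running.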

Suspected failure modes

  1. Stream reattach transport race

    • SSE events are effectively one-shot queue entries.
    • Returning to a running session may close/reopen the same stream, creating a small window where events can be consumed but not reflected into the pane/cache.
  2. Global active-pane state vs per-session runtime state

    • S.busy and S.activeStreamId are active-pane values.
    • Some UI paths can misrepresent background running sessions if they read only the global active-pane state.
  3. Approval/clarify prompts are not fully session-scoped

    • Switching sessions stops/hides polling/cards globally.
    • A prompt requested by session A can be hidden while viewing B and may not be restored clearly when returning to A.
  4. Inflight cleanup can have cross-session side effects

    • A completion/error/cancel for session A should not clear reconnect/runtime state for unrelated running session B.
  5. Background completion and canonical session updates

    • When a session completes while not viewed, the sidebar still needs accurate title/message count/running state and any canonical session id/lineage mapping.
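
One possible way to narrow failure modes 1 and 2 is to have the transport write into a per-session event log and let the pane render from that log, so an event consumed while another session is viewed is not lost. The sketch below assumes this design; `eventLogBySession`, `recordEvent`, and `readEventsSince` are hypothetical names, and the real event shape may differ.

```javascript
// Hypothetical per-session event log (illustration only).
const eventLogBySession = new Map();

// Called by the transport handler for every event, regardless of
// which session is currently viewed.
function recordEvent(sessionId, event) {
  let log = eventLogBySession.get(sessionId);
  if (!log) {
    log = [];
    eventLogBySession.set(sessionId, log);
  }
  log.push(event);
}

// Returns the events at or after `cursor` plus the next cursor, so a pane
// can catch up after the user returns to the session.
function readEventsSince(sessionId, cursor) {
  const log = eventLogBySession.get(sessionId) || [];
  return { events: log.slice(cursor), nextCursor: log.length };
}
```

A pane returning to session A would call `readEventsSince('A', lastCursor)` and render whatever arrived in the meantime before resuming live updates.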

Proposed incremental PR sequence

I am not proposing a large rewrite as the first step. A safer sequence would be:

  1. Add regression coverage for switching away from a running session and returning to it.
  2. Reuse an existing live stream when returning to the same running session instead of tearing down/reopening the same transport.
  3. Scope inflight cleanup to the completed session.
  4. Make approval/clarify prompt state explicitly session-owned.
  5. Ensure background completion refreshes sidebar state and canonical session mapping.
  6. If those invariants land, consider a small refactor to centralize per-session runtime state in vanilla JS.
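
Steps 2 and 3 could look roughly like the sketch below. It is written under assumptions: `liveStreams`, `attachOrReuse`, and `cleanupSession` are hypothetical names, and `openTransport` stands in for whatever currently opens the SSE connection (it is assumed to return an object with a `close()` method). The real `attachLiveStream()` may need a different shape.

```javascript
// Hypothetical registry: sessionId -> { streamId, transport }.
const liveStreams = new Map();

// Reuse the open transport when returning to the same running stream;
// only open a new one when the session has no stream or a different one.
function attachOrReuse(sessionId, streamId, openTransport) {
  const existing = liveStreams.get(sessionId);
  if (existing && existing.streamId === streamId) {
    return existing; // same stream: no teardown, no reopen
  }
  if (existing) {
    existing.transport.close(); // stale stream for this session only
  }
  const entry = { streamId, transport: openTransport(sessionId, streamId) };
  liveStreams.set(sessionId, entry);
  return entry;
}

// Cleanup is keyed by session and stream, so a completion for A
// cannot clear B, or a newer stream that replaced the old one on A.
function cleanupSession(sessionId, streamId) {
  const entry = liveStreams.get(sessionId);
  if (!entry || entry.streamId !== streamId) {
    return false;
  }
  entry.transport.close();
  liveStreams.delete(sessionId);
  return true;
}
```

Keying cleanup on both ids is a design choice in this sketch: it makes a late completion event for an old stream a no-op instead of a cross-session side effect.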

Verification scenarios

A useful test/manual matrix for this line of work:

| Scenario | Expected |
| --- | --- |
| A running -> switch B -> switch A | A shows running state and live output continues or reconnects safely. |
| A running -> switch B -> A completes in background | Sidebar clears running indicator; opening A shows completed transcript. |
| A running -> approval requested while viewing B | A is marked as needing attention; approval appears when A is opened. |
| A running -> browser reload | A reconnects or restores the settled session without losing visibility. |
| A compresses/rotates in background | Sidebar points to the latest canonical session/lineage row. |
| A and B both running | Completion/cleanup for A does not clear B's runtime/reconnect state. |
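
The approval scenario in the matrix above can be sketched with session-owned prompt state. This is an illustration under assumptions, not the existing approval/clarify code: `pendingPrompts`, `requestPrompt`, `resolvePrompt`, `needsAttention`, and `promptsForPane` are hypothetical names, and prompts are assumed to carry an `id`.

```javascript
// Hypothetical store: sessionId -> array of pending prompts.
const pendingPrompts = new Map();

// A prompt is recorded against the session that requested it,
// independent of which session is currently viewed.
function requestPrompt(sessionId, prompt) {
  const list = pendingPrompts.get(sessionId) || [];
  list.push(prompt);
  pendingPrompts.set(sessionId, list);
}

// Resolving removes one prompt from its owning session only.
function resolvePrompt(sessionId, promptId) {
  const remaining = (pendingPrompts.get(sessionId) || [])
    .filter((p) => p.id !== promptId);
  if (remaining.length > 0) {
    pendingPrompts.set(sessionId, remaining);
  } else {
    pendingPrompts.delete(sessionId);
  }
}

// Sidebar: does this session have something waiting for the user?
function needsAttention(sessionId) {
  return pendingPrompts.has(sessionId);
}

// Pane: prompts to show for the session being opened (a copy).
function promptsForPane(viewedSessionId) {
  return (pendingPrompts.get(viewedSessionId) || []).slice();
}
```

Switching sessions then only changes which list the pane reads; it does not hide or discard a prompt owned by another session.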

Constraints

Any solution should preserve the current WebUI constraints:

  • vanilla JS
  • no build step
  • no bundler
  • no frontend framework
  • small reviewable PRs

I plan to start with small focused PRs rather than a broad architecture change, unless maintainers prefer a different direction.

Metadata

Labels

  • enhancement: New feature or request
  • sprint-candidate: Strong candidate for next sprint
  • ux: User experience / visual polish