Skip to content

perf: session switch parallelization + message pagination (v0.50.229)#1181

Closed
nesquena-hermes wants to merge 8 commits intomasterfrom
integrate/pr1158
Closed

perf: session switch parallelization + message pagination (v0.50.229)#1181
nesquena-hermes wants to merge 8 commits intomasterfrom
integrate/pr1158

Conversation

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Attribution: Original work by @jasonjcwu (PR #1158). Integrated with fixes.

What this ships

Performance: session switch parallelization — three independent bottlenecks made concurrent:

  • Promise.all() for expanded-directory pre-fetches (N×RTT → 1×RTT)
  • ThreadPoolExecutor(max_workers=3) for git status/ahead/behind subprocesses
  • loadDir() and highlightCode() dispatched concurrently on the idle path

Message pagination for long conversations — sessions with > 30 messages load the most-recent 30 on switch. Older messages load on scroll-to-top or via the "↑ load older" indicator at the top of the list.

Fixes applied during integration

  • static/messages.js — reset _messagesTruncated in the done SSE handler. After a streaming response, the server sends the full session; without this fix the "load older" indicator would persist incorrectly after the first message exchange.
  • static/i18n.js — added load_older_messages key to all 7 locales (en, ru, es, de, zh, zh-Hant, ko). The original only had hardcoded English.
  • static/style.css — added .load-older-indicator CSS class instead of inline styles.
  • static/ui.js — indicator now uses CSS class + t('load_older_messages') i18n call.

All four _messagesTruncated reset paths from the original hold comment are addressed by the author:

  • loadSession() resets on every session switch ✅
  • cmdRetry() resets after reload ✅
  • cmdUndo() resets after reload ✅
  • _runManualCompression() resets after reload ✅

Test results

2676 tests passing. Browser QA 21/21 (desktop 1440×900 + mobile iPhone 14). Pagination msg_limit=30 confirmed hitting the server via log inspection.

fxd-jason and others added 7 commits April 27, 2026 22:48
Three changes to speed up session switching:

1. workspace.js: Parallelize expanded-dir pre-fetching in loadDir().
   Previously, expanded directories were fetched sequentially with
   for-await, creating a waterfall of N API calls. Now uses
   Promise.all() to fetch all expanded dirs concurrently.

2. sessions.js: Overlap loadDir() network requests with highlightCode().
   The idle path ran highlightCode() (CPU-bound Prism.js) before loadDir,
   delaying the file-tree fetch. Now loadDir() is kicked off first so the
   HTTP request is in-flight while Prism runs, then awaited.

3. workspace.py: Parallelize git_info_for_workspace() subprocess calls.
   git status, rev-list (ahead), and rev-list (behind) were called
   serially (3 subprocess.run calls, each up to 3s timeout). Now they
   run concurrently via ThreadPoolExecutor, reducing wall time by ~60%.
6 tests across 3 classes:
- TestLoadDirParallelPrefetch: static analysis of Promise.all in workspace.js
- TestLoadSessionIdleOverlap: static analysis of loadDir-before-highlightCode ordering
- TestGitInfoParallel: runtime tests with threading.Barrier concurrency proof
  and wall-clock timing (parallel ~0.1s vs serial ~0.3s for 3 git commands)
Add msg_limit and msg_before parameters to /api/session endpoint:
- Initial load fetches only last 30 messages (tail window)
- Scroll-to-top triggers lazy loading of older messages via cursor
- Payload reduced by ~70% for large sessions (100+ messages)
- undo/retry/compress auto-reset truncation state
- Export uses server-side generation, unaffected

Backend (api/routes.py):
- msg_limit: returns only the last N messages with truncation flag
- msg_before: cursor-based paging for scroll-to-top loading

Frontend (static/sessions.js):
- _INITIAL_MSG_LIMIT=30 for initial paginated load
- _messagesTruncated tracks pending older messages
- _loadOlderMessages() prepends older messages on scroll
- _ensureAllMessagesLoaded() for operations needing full history

Frontend (static/ui.js):
- Scroll-to-top detection (scrollTop < 80px) triggers lazy load
- 'Load older messages' visual indicator when truncated

Tests: 19 total (13 new for message pagination), all passing
Issues found and fixed:

1. workspace.js: S._expandedDirs||[] fallback produced Array (no .size)
   → changed to ||new Set()

2. routes.py: _messages_truncated flag was wrong for msg_before paging
   → separate truncation logic per code path, compare against _slice
   not _all_msgs

3. routes.py: msg_before used timestamp comparison which fails when
   messages lack _ts (827 msgs with _ts=0 in real data) or have
   duplicate timestamps
   → changed to index-based cursor (0-based position in full array)
   → added _messages_offset field so frontend knows cursor position
   → added input validation (try/except, max(1,...), bounds clamping)

4. sessions.js: _oldestIdx tracks cursor position, reset on session
   switch, updated from _messages_offset on each page load

5. workspace.py: moved concurrent.futures import to module top level

6. tests: updated to match index-based cursor, added 8 new tests:
   - test_expanded_dirs_fallback_is_set
   - test_msg_before_index_based_slicing
   - test_msg_before_zero_returns_empty
   - test_msg_before_equal_total
   - test_truncation_flag_with_msg_before
   - test_messages_offset_initial_load/with_msg_before
   - test_oldest_idx_tracking/reset_on_session_switch
   - test_load_older_uses_index_cursor
   - test_msg_before_bounds_clamping

27/27 tests pass, 2583/2583 total (5 pre-existing failures unrelated)
Race condition: if user switches sessions while _loadOlderMessages() is
in-flight, the stale response could land on the new session's S.messages.

Guards verified:
- _loadOlderMessages checks _loadingSessionId !== sid after await (bails out)
- Guard appears BEFORE S.messages mutation (stale data cannot land)
- loadSession resets _loadingOlder=false on switch (no stale lock)
- loadSession resets _messagesTruncated=false and _oldestIdx=0 on switch

Added: _loadingOlder=false reset in loadSession() reset block.
Added: 5 tests in TestSessionSwitchCancellation proving all guards exist
and are ordered correctly.

32/32 tests pass.
- messages.js: reset _messagesTruncated in done handler (server response is always full)
- ui.js: use CSS class + i18n for load-older indicator
- style.css: add .load-older-indicator class
- i18n.js: add load_older_messages key
- CHANGELOG: add v0.50.229 entry
The cancellation guard at the top of _loadOlderMessages checks
_loadingSessionId !== null && _loadingSessionId !== sid. That catches
the case where a NEW session load is in progress when the older-messages
fetch returns. It misses the case where:

  1. user is on session A, _loadOlderMessages in flight
  2. user switches to session B; loadSession(B) completes; sets
     _loadingSessionId = null
  3. older-messages response for A returns
     -> _loadingSessionId === null
     -> guard evaluates to (null !== null) && ... -> false -> NO bail
     -> S.messages = [...olderMsgs, ...S.messages] prepends A's old
        messages onto B's S.messages

Tighten the guard by also comparing the captured sid against the
currently-active S.session.session_id. Together they cover both windows:
mid-switch (new sid loading) and post-switch (no load in progress).

Added a regression test (test_load_older_compares_against_active_session_id)
that asserts the new check is present and runs before any S.messages
mutation. Existing 5 cancellation tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown
Owner

@nesquena nesquena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — end-to-end ✅ (approved after stale-response race fix pushed)

What this ships

v0.50.229 — absorbing @jasonjcwu's PR #1158 with integration fixes:

Performance: parallelize the session switch hot path

Message pagination for long conversations

  • ?msg_limit=N&msg_before=I query params on /api/session at api/routes.py:775-794
  • Initial load: last 30 messages (_INITIAL_MSG_LIMIT)
  • Scroll-to-top triggers _loadOlderMessages with msg_before cursor (index-based, not timestamp)
  • All undo/retry/compress/done paths reset _messagesTruncated

Traced against upstream hermes-agent

Pulled fresh nousresearch/hermes-agent tarball.

Backend changes touch api/routes.py and api/workspace.py. The _messages_truncated and _messages_offset response fields are WebUI-internal (added to the JSON shape on top of the agent's session). The agent doesn't read these — s.compact() does NOT include them, they're only set by the route handler. Cross-tool risk = zero.

git_info_for_workspace is purely workspace.py — agent doesn't read git status. Threading is safe (subprocess calls block on pipes, not GIL).

What I caught — stale-response race in _loadOlderMessages

The PR's cancellation guard at static/sessions.js:373 (pre-fix):

if (!data || !data.session || (_loadingSessionId !== null && _loadingSessionId !== sid)) return;

This catches the mid-switch case (new session load in progress when older-messages returns). It misses the post-switch case:

  1. User on session A, _loadOlderMessages issues fetch (sid = A)
  2. User clicks session B → loadSession(B) completes → _loadingSessionId = null (sessions.js:141)
  3. Older-messages response for A returns → guard evaluates (null !== null) && ... = false → guard does NOT bail
  4. S.messages = [...olderMsgs, ...S.messages] prepends A's older messages onto B's S.messages → corruption

The reset block at sessions.js:117-122 clears _loadingOlder = false; _messagesTruncated = false; _oldestIdx = 0 on switch — but those are state for FUTURE calls. The in-flight promise is unaffected.

What I pushed — d974388

Tightened the guard to ALSO compare S.session.session_id !== sid:

if (!data || !data.session) return;
if (!S.session || S.session.session_id !== sid) return;
if (_loadingSessionId !== null && _loadingSessionId !== sid) return;

Both layers together cover both windows (mid-switch and post-switch). The S.session.session_id check is the authoritative "is this still the active session?" — _loadingSessionId is only meaningful while a load is in progress.

Plus a regression test (test_load_older_compares_against_active_session_id) that asserts:

  • S.session.session_id !== sid is present in _loadOlderMessages body
  • The check appears BEFORE the S.messages = [...olderMsgs, ...S.messages] mutation

CI re-ran green on d974388: 3.11 ✅, 3.12 ✅, 3.13 ✅.

End-to-end trace

Backend pagination math

api/routes.py:802-820:

  • Initial load (no msg_before): _truncated_msgs = _all_msgs[-msg_limit:] (last N)
  • Cursor paging (msg_before=I): _slice = _all_msgs[:_before_idx] (everything before cursor), then _slice[-msg_limit:] (last N before cursor)
  • Bounds: _before_idx = max(0, min(int(msg_before), len(_all_msgs))) — clamped at both ends
  • Truncation flag: len(_slice) > msg_limit for cursor path; len(_all_msgs) > msg_limit for initial path
  • Offset (cursor): _before_idx - len(_truncated_msgs) — index of first returned message in full array
  • Offset (initial): len(_all_msgs) - len(_truncated_msgs) — same semantics

Walked through with N=100, msg_limit=30:

  • Initial: returns msgs[70:100], offset=70, truncated=True ✅
  • msg_before=70, msg_limit=30: returns msgs[40:70], offset=40, truncated=True (40 > 0) ✅
  • msg_before=30, msg_limit=30: returns msgs[0:30], offset=0, truncated=False ✅
  • msg_before=10, msg_limit=30: returns msgs[0:10] (clamped by slice), offset=0, truncated=False ✅

git_info_for_workspace parallelization

api/workspace.py:618-628ThreadPoolExecutor(max_workers=3) per call. Each git subprocess blocks on the pipe, not the GIL. The with block ensures the pool is shut down after each call.

Cost: 3 thread creates per call. Trade-off vs the ~50-200ms wall-clock saving (subprocess.run timeouts can stack). Acceptable.

Frontend cancellation matrix (after my fix)

Scenario _loadingSessionId S.session.session_id Bail?
Same session, no switch null === sid no — apply ✅
Mid-switch to B (B loading) B sid (still A) yes — _loadingSessionId !== sid
Post-switch to B (B loaded) null B (!== sid) yes — S.session.session_id !== sid
Session deleted while in-flight null undefined / null yes — !S.session
Response error n/a n/a yes — `!data

done SSE handler resetting _messagesTruncated

api/streaming.py:1964: raw_session = s.compact() | {'messages': s.messages, 'tool_calls': tool_calls} — emits the FULL session, no _messages_truncated field. So static/messages.js:762's _messagesTruncated = !!d.session._messages_truncated evaluates !!undefined = false. Correctly resets the flag. ✅

_ensureAllMessagesLoaded for export/undo

static/sessions.js:400-407 fetches the full session WITHOUT msg_limit so cmdRetry/cmdUndo/manual compression always operate on the complete history. cmdRetry/cmdUndo/_runManualCompression also reset _messagesTruncated = false after the reload — locking in the new full-history state.

Edge-case trace

Scenario Expected Actual
Session with 5 messages no truncation, no indicator len(_all_msgs) > msg_limit is False
Session with 30 messages exactly no truncation 30 > 30 is False
Session with 31 messages last 30 returned, indicator shown
Scroll-to-top fires twice rapidly _loadingOlder gate blocks second if (_loadingOlder...) return
User switches session mid-fetch guard bails on response ✅ both _loadingSessionId !== sid and S.session.session_id !== sid checks
User switches session post-fetch (window I caught) guard bails on response ✅ after my fix
msg_before > total messages clamped to total, returns prefix min(int(msg_before), len(_all_msgs))
msg_before == 0 empty slice, no older _oldestIdx <= 0 early return
msg_limit=0 (invalid) clamped to 1 max(1, int(_msg_limit))
msg_limit=string (invalid) None (no truncation) ✅ try/except ValueError
Concurrent git_info calls each gets its own pool ✅ per-call ThreadPoolExecutor
Send message in truncated session done event resets truncation, full history loaded ✅ d.session has all messages
Export from truncated session server-side generation, unaffected ✅ /api/session/export reads s.messages directly

Tests

  • PR's targeted tests: 32/32 in test_parallel_session_switch.py. Plus my new test = 33/33.
  • Local full suite: 2629 passed, 47 skipped, 1 PR-unrelated pre-existing failure (test_sprint3.py::test_workspace_add_rejects_system_paths — macOS-only).
  • CI on PR after my push: ✅ test (3.11), ✅ test (3.12), ✅ test (3.13).
  • Behavioural verification: manual trace through bound clamping, truncation flag math, and cancellation matrix above.

Other audit — confirmed correct

  • i18n coverage: load_older_messages added in all 7 locales (en, ru, es, de, zh, zh-Hant, ko).
  • CSS: .load-older-indicator is a focused single-rule class. No specificity conflicts.
  • S._expandedDirs||new Set() fallback: was ||[] originally, fixed during integration. Set has .size and is iterable — works with [...expanded] spread. ✅
  • Index-based cursor (not timestamp): avoids issues with messages that have _ts=0 or duplicate timestamps. The PR description explicitly calls out the 827-msg dataset that motivated this change.
  • JS syntax: node --check passes on all touched files.

Minor observations (non-blocking)

  • The _INITIAL_MSG_LIMIT = 30 constant is hard-coded. Could be a per-user setting in the future (especially for power users with very long histories who want bigger initial loads). Out of scope.
  • The done event handler resets _messagesTruncated = !!d.session._messages_truncated — the done event always emits the full session, but if a future change ever adds _messages_truncated to the SSE payload, this would silently flip the flag. Not currently a risk since the field is only set in routes.py.
  • Per-call ThreadPoolExecutor(max_workers=3) has thread-creation overhead (~ms per call). For workspaces hit by every session list refresh, a module-level pool would be cheaper. Acceptable for now since git_info is gated by is_git.

Recommendation

Approved (after race-fix push). Three independent perf optimizations + a substantial pagination feature, all wired correctly and well-tested. The cancellation race I caught is a real corruption window — the PR's own test suite covered the mid-switch case but not the post-switch case. Tightened the guard with an S.session.session_id !== sid check and added a regression test that locks the new invariant. CI green; full suite 2629 passed; no agent coupling. Parked at approval — ready for the release agent's merge/tag pipeline.

nesquena-hermes added a commit that referenced this pull request Apr 27, 2026
)

Merged as v0.50.229. 2678 tests passing. Browser QA 21/21.

All three PRs were independently reviewed and approved by @nesquena with reviewer commits pulled in:
- #1181 (#1158): `d974388` (stale-response race in _loadOlderMessages)
- #1182: `7e20006` (full-scan fallback path consistency)
- #1180: `a5ad154` (regression test for iOS zoom threshold)

Thanks @jasonjcwu (#1158)!
@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

Merged as v0.50.229 via #1183. Thank you @jasonjcwu!

@nesquena-hermes nesquena-hermes deleted the integrate/pr1158 branch April 27, 2026 23:52
JKJameson pushed a commit to JKJameson/hermes-webui that referenced this pull request Apr 29, 2026
…squena#1183)

Merged as v0.50.229. 2678 tests passing. Browser QA 21/21.

All three PRs were independently reviewed and approved by @nesquena with reviewer commits pulled in:
- nesquena#1181 (nesquena#1158): `d974388` (stale-response race in _loadOlderMessages)
- nesquena#1182: `7e20006` (full-scan fallback path consistency)
- nesquena#1180: `a5ad154` (regression test for iOS zoom threshold)

Thanks @jasonjcwu (nesquena#1158)!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants