perf: session switch parallelization + message pagination (v0.50.229)#1181
perf: session switch parallelization + message pagination (v0.50.229)#1181nesquena-hermes wants to merge 8 commits intomasterfrom
Conversation
Three changes to speed up session switching: 1. workspace.js: Parallelize expanded-dir pre-fetching in loadDir(). Previously, expanded directories were fetched sequentially with for-await, creating a waterfall of N API calls. Now uses Promise.all() to fetch all expanded dirs concurrently. 2. sessions.js: Overlap loadDir() network requests with highlightCode(). The idle path ran highlightCode() (CPU-bound Prism.js) before loadDir, delaying the file-tree fetch. Now loadDir() is kicked off first so the HTTP request is in-flight while Prism runs, then awaited. 3. workspace.py: Parallelize git_info_for_workspace() subprocess calls. git status, rev-list (ahead), and rev-list (behind) were called serially (3 subprocess.run calls, each up to 3s timeout). Now they run concurrently via ThreadPoolExecutor, reducing wall time by ~60%.
6 tests across 3 classes: - TestLoadDirParallelPrefetch: static analysis of Promise.all in workspace.js - TestLoadSessionIdleOverlap: static analysis of loadDir-before-highlightCode ordering - TestGitInfoParallel: runtime tests with threading.Barrier concurrency proof and wall-clock timing (parallel ~0.1s vs serial ~0.3s for 3 git commands)
Add msg_limit and msg_before parameters to /api/session endpoint: - Initial load fetches only last 30 messages (tail window) - Scroll-to-top triggers lazy loading of older messages via cursor - Payload reduced by ~70% for large sessions (100+ messages) - undo/retry/compress auto-reset truncation state - Export uses server-side generation, unaffected Backend (api/routes.py): - msg_limit: returns only the last N messages with truncation flag - msg_before: cursor-based paging for scroll-to-top loading Frontend (static/sessions.js): - _INITIAL_MSG_LIMIT=30 for initial paginated load - _messagesTruncated tracks pending older messages - _loadOlderMessages() prepends older messages on scroll - _ensureAllMessagesLoaded() for operations needing full history Frontend (static/ui.js): - Scroll-to-top detection (scrollTop < 80px) triggers lazy load - 'Load older messages' visual indicator when truncated Tests: 19 total (13 new for message pagination), all passing
Issues found and fixed: 1. workspace.js: S._expandedDirs||[] fallback produced Array (no .size) → changed to ||new Set() 2. routes.py: _messages_truncated flag was wrong for msg_before paging → separate truncation logic per code path, compare against _slice not _all_msgs 3. routes.py: msg_before used timestamp comparison which fails when messages lack _ts (827 msgs with _ts=0 in real data) or have duplicate timestamps → changed to index-based cursor (0-based position in full array) → added _messages_offset field so frontend knows cursor position → added input validation (try/except, max(1,...), bounds clamping) 4. sessions.js: _oldestIdx tracks cursor position, reset on session switch, updated from _messages_offset on each page load 5. workspace.py: moved concurrent.futures import to module top level 6. tests: updated to match index-based cursor, added 8 new tests: - test_expanded_dirs_fallback_is_set - test_msg_before_index_based_slicing - test_msg_before_zero_returns_empty - test_msg_before_equal_total - test_truncation_flag_with_msg_before - test_messages_offset_initial_load/with_msg_before - test_oldest_idx_tracking/reset_on_session_switch - test_load_older_uses_index_cursor - test_msg_before_bounds_clamping 27/27 tests pass, 2583/2583 total (5 pre-existing failures unrelated)
Race condition: if user switches sessions while _loadOlderMessages() is in-flight, the stale response could land on the new session's S.messages. Guards verified: - _loadOlderMessages checks _loadingSessionId !== sid after await (bails out) - Guard appears BEFORE S.messages mutation (stale data cannot land) - loadSession resets _loadingOlder=false on switch (no stale lock) - loadSession resets _messagesTruncated=false and _oldestIdx=0 on switch Added: _loadingOlder=false reset in loadSession() reset block. Added: 5 tests in TestSessionSwitchCancellation proving all guards exist and are ordered correctly. 32/32 tests pass.
- messages.js: reset _messagesTruncated in done handler (server response is always full) - ui.js: use CSS class + i18n for load-older indicator - style.css: add .load-older-indicator class - i18n.js: add load_older_messages key - CHANGELOG: add v0.50.229 entry
…er truncation reset
The cancellation guard at the top of _loadOlderMessages checks
_loadingSessionId !== null && _loadingSessionId !== sid. That catches
the case where a NEW session load is in progress when the older-messages
fetch returns. It misses the case where:
1. user is on session A, _loadOlderMessages in flight
2. user switches to session B; loadSession(B) completes; sets
_loadingSessionId = null
3. older-messages response for A returns
-> _loadingSessionId === null
-> guard evaluates to (null !== null) && ... -> false -> NO bail
-> S.messages = [...olderMsgs, ...S.messages] prepends A's old
messages onto B's S.messages
Tighten the guard by also comparing the captured sid against the
currently-active S.session.session_id. Together they cover both windows:
mid-switch (new sid loading) and post-switch (no load in progress).
Added a regression test (test_load_older_compares_against_active_session_id)
that asserts the new check is present and runs before any S.messages
mutation. Existing 5 cancellation tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
nesquena
left a comment
There was a problem hiding this comment.
Review — end-to-end ✅ (approved after stale-response race fix pushed)
What this ships
v0.50.229 — absorbing @jasonjcwu's PR #1158 with integration fixes:
Performance: parallelize the session switch hot path
Promise.all()for expanded-dir pre-fetches inloadDir()(N×RTT → 1×RTT) at static/workspace.js:62-79ThreadPoolExecutor(max_workers=3)forgit status/rev-list ahead/rev-list behindsubprocess calls at api/workspace.py:609-628loadDir()dispatched concurrently withhighlightCode()on the idle path at static/sessions.js:269-273
Message pagination for long conversations
?msg_limit=N&msg_before=Iquery params on/api/sessionat api/routes.py:775-794- Initial load: last 30 messages (
_INITIAL_MSG_LIMIT) - Scroll-to-top triggers
_loadOlderMessageswithmsg_beforecursor (index-based, not timestamp) - All
undo/retry/compress/donepaths reset_messagesTruncated
Traced against upstream hermes-agent
Pulled fresh nousresearch/hermes-agent tarball.
Backend changes touch api/routes.py and api/workspace.py. The _messages_truncated and _messages_offset response fields are WebUI-internal (added to the JSON shape on top of the agent's session). The agent doesn't read these — s.compact() does NOT include them, they're only set by the route handler. Cross-tool risk = zero.
git_info_for_workspace is purely workspace.py — agent doesn't read git status. Threading is safe (subprocess calls block on pipes, not GIL).
What I caught — stale-response race in _loadOlderMessages
The PR's cancellation guard at static/sessions.js:373 (pre-fix):
if (!data || !data.session || (_loadingSessionId !== null && _loadingSessionId !== sid)) return;This catches the mid-switch case (new session load in progress when older-messages returns). It misses the post-switch case:
- User on session A,
_loadOlderMessagesissues fetch (sid = A) - User clicks session B →
loadSession(B)completes →_loadingSessionId = null(sessions.js:141) - Older-messages response for A returns → guard evaluates
(null !== null) && ...= false → guard does NOT bail S.messages = [...olderMsgs, ...S.messages]prepends A's older messages onto B's S.messages → corruption
The reset block at sessions.js:117-122 clears _loadingOlder = false; _messagesTruncated = false; _oldestIdx = 0 on switch — but those are state for FUTURE calls. The in-flight promise is unaffected.
What I pushed — d974388
Tightened the guard to ALSO compare S.session.session_id !== sid:
if (!data || !data.session) return;
if (!S.session || S.session.session_id !== sid) return;
if (_loadingSessionId !== null && _loadingSessionId !== sid) return;Both layers together cover both windows (mid-switch and post-switch). The S.session.session_id check is the authoritative "is this still the active session?" — _loadingSessionId is only meaningful while a load is in progress.
Plus a regression test (test_load_older_compares_against_active_session_id) that asserts:
S.session.session_id !== sidis present in_loadOlderMessagesbody- The check appears BEFORE the
S.messages = [...olderMsgs, ...S.messages]mutation
CI re-ran green on d974388: 3.11 ✅, 3.12 ✅, 3.13 ✅.
End-to-end trace
Backend pagination math
- Initial load (no
msg_before):_truncated_msgs = _all_msgs[-msg_limit:](last N) - Cursor paging (
msg_before=I):_slice = _all_msgs[:_before_idx](everything before cursor), then_slice[-msg_limit:](last N before cursor) - Bounds:
_before_idx = max(0, min(int(msg_before), len(_all_msgs)))— clamped at both ends - Truncation flag:
len(_slice) > msg_limitfor cursor path;len(_all_msgs) > msg_limitfor initial path - Offset (cursor):
_before_idx - len(_truncated_msgs)— index of first returned message in full array - Offset (initial):
len(_all_msgs) - len(_truncated_msgs)— same semantics
Walked through with N=100, msg_limit=30:
- Initial: returns msgs[70:100], offset=70, truncated=True ✅
- msg_before=70, msg_limit=30: returns msgs[40:70], offset=40, truncated=True (40 > 0) ✅
- msg_before=30, msg_limit=30: returns msgs[0:30], offset=0, truncated=False ✅
- msg_before=10, msg_limit=30: returns msgs[0:10] (clamped by slice), offset=0, truncated=False ✅
git_info_for_workspace parallelization
api/workspace.py:618-628 — ThreadPoolExecutor(max_workers=3) per call. Each git subprocess blocks on the pipe, not the GIL. The with block ensures the pool is shut down after each call.
Cost: 3 thread creates per call. Trade-off vs the ~50-200ms wall-clock saving (subprocess.run timeouts can stack). Acceptable.
Frontend cancellation matrix (after my fix)
| Scenario | _loadingSessionId | S.session.session_id | Bail? |
|---|---|---|---|
| Same session, no switch | null | === sid | no — apply ✅ |
| Mid-switch to B (B loading) | B | sid (still A) | yes — _loadingSessionId !== sid ✅ |
| Post-switch to B (B loaded) | null | B (!== sid) | yes — S.session.session_id !== sid ✅ |
| Session deleted while in-flight | null | undefined / null | yes — !S.session ✅ |
| Response error | n/a | n/a | yes — `!data |
done SSE handler resetting _messagesTruncated
api/streaming.py:1964: raw_session = s.compact() | {'messages': s.messages, 'tool_calls': tool_calls} — emits the FULL session, no _messages_truncated field. So static/messages.js:762's _messagesTruncated = !!d.session._messages_truncated evaluates !!undefined = false. Correctly resets the flag. ✅
_ensureAllMessagesLoaded for export/undo
static/sessions.js:400-407 fetches the full session WITHOUT msg_limit so cmdRetry/cmdUndo/manual compression always operate on the complete history. cmdRetry/cmdUndo/_runManualCompression also reset _messagesTruncated = false after the reload — locking in the new full-history state.
Edge-case trace
| Scenario | Expected | Actual |
|---|---|---|
| Session with 5 messages | no truncation, no indicator | ✅ len(_all_msgs) > msg_limit is False |
| Session with 30 messages exactly | no truncation | ✅ 30 > 30 is False |
| Session with 31 messages | last 30 returned, indicator shown | ✅ |
| Scroll-to-top fires twice rapidly | _loadingOlder gate blocks second |
✅ if (_loadingOlder...) return |
| User switches session mid-fetch | guard bails on response | ✅ both _loadingSessionId !== sid and S.session.session_id !== sid checks |
| User switches session post-fetch (window I caught) | guard bails on response | ✅ after my fix |
| msg_before > total messages | clamped to total, returns prefix | ✅ min(int(msg_before), len(_all_msgs)) |
| msg_before == 0 | empty slice, no older | ✅ _oldestIdx <= 0 early return |
| msg_limit=0 (invalid) | clamped to 1 | ✅ max(1, int(_msg_limit)) |
| msg_limit=string (invalid) | None (no truncation) | ✅ try/except ValueError |
| Concurrent git_info calls | each gets its own pool | ✅ per-call ThreadPoolExecutor |
| Send message in truncated session | done event resets truncation, full history loaded | ✅ d.session has all messages |
| Export from truncated session | server-side generation, unaffected | ✅ /api/session/export reads s.messages directly |
Tests
- PR's targeted tests: 32/32 in
test_parallel_session_switch.py. Plus my new test = 33/33. - Local full suite: 2629 passed, 47 skipped, 1 PR-unrelated pre-existing failure (
test_sprint3.py::test_workspace_add_rejects_system_paths— macOS-only). - CI on PR after my push: ✅ test (3.11), ✅ test (3.12), ✅ test (3.13).
- Behavioural verification: manual trace through bound clamping, truncation flag math, and cancellation matrix above.
Other audit — confirmed correct
- i18n coverage:
load_older_messagesadded in all 7 locales (en, ru, es, de, zh, zh-Hant, ko). - CSS:
.load-older-indicatoris a focused single-rule class. No specificity conflicts. S._expandedDirs||new Set()fallback: was||[]originally, fixed during integration. Set has.sizeand is iterable — works with[...expanded]spread. ✅- Index-based cursor (not timestamp): avoids issues with messages that have
_ts=0or duplicate timestamps. The PR description explicitly calls out the 827-msg dataset that motivated this change. - JS syntax:
node --checkpasses on all touched files.
Minor observations (non-blocking)
- The
_INITIAL_MSG_LIMIT = 30constant is hard-coded. Could be a per-user setting in the future (especially for power users with very long histories who want bigger initial loads). Out of scope. - The
doneevent handler resets_messagesTruncated = !!d.session._messages_truncated— thedoneevent always emits the full session, but if a future change ever adds_messages_truncatedto the SSE payload, this would silently flip the flag. Not currently a risk since the field is only set in routes.py. - Per-call
ThreadPoolExecutor(max_workers=3)has thread-creation overhead (~ms per call). For workspaces hit by every session list refresh, a module-level pool would be cheaper. Acceptable for now since git_info is gated byis_git.
Recommendation
Approved (after race-fix push). Three independent perf optimizations + a substantial pagination feature, all wired correctly and well-tested. The cancellation race I caught is a real corruption window — the PR's own test suite covered the mid-switch case but not the post-switch case. Tightened the guard with an S.session.session_id !== sid check and added a regression test that locks the new invariant. CI green; full suite 2629 passed; no agent coupling. Parked at approval — ready for the release agent's merge/tag pipeline.
) Merged as v0.50.229. 2678 tests passing. Browser QA 21/21. All three PRs were independently reviewed and approved by @nesquena with reviewer commits pulled in: - #1181 (#1158): `d974388` (stale-response race in _loadOlderMessages) - #1182: `7e20006` (full-scan fallback path consistency) - #1180: `a5ad154` (regression test for iOS zoom threshold) Thanks @jasonjcwu (#1158)!
|
Merged as v0.50.229 via #1183. Thank you @jasonjcwu! |
…squena#1183) Merged as v0.50.229. 2678 tests passing. Browser QA 21/21. All three PRs were independently reviewed and approved by @nesquena with reviewer commits pulled in: - nesquena#1181 (nesquena#1158): `d974388` (stale-response race in _loadOlderMessages) - nesquena#1182: `7e20006` (full-scan fallback path consistency) - nesquena#1180: `a5ad154` (regression test for iOS zoom threshold) Thanks @jasonjcwu (nesquena#1158)!
What this ships
Performance: session switch parallelization — three independent bottlenecks made concurrent:
Promise.all()for expanded-directory pre-fetches (N×RTT → 1×RTT)ThreadPoolExecutor(max_workers=3)for git status/ahead/behind subprocessesloadDir()andhighlightCode()dispatched concurrently on the idle pathMessage pagination for long conversations — sessions with > 30 messages load the most-recent 30 on switch. Older messages load on scroll-to-top or via the "↑ load older" indicator at the top of the list.
Fixes applied during integration
static/messages.js— reset_messagesTruncatedin thedoneSSE handler. After a streaming response, the server sends the full session; without this fix the "load older" indicator would persist incorrectly after the first message exchange.static/i18n.js— addedload_older_messageskey to all 7 locales (en, ru, es, de, zh, zh-Hant, ko). The original only had hardcoded English.static/style.css— added.load-older-indicatorCSS class instead of inline styles.static/ui.js— indicator now uses CSS class +t('load_older_messages')i18n call.All four
_messagesTruncatedreset paths from the original hold comment are addressed by the author:loadSession()resets on every session switch ✅cmdRetry()resets after reload ✅cmdUndo()resets after reload ✅_runManualCompression()resets after reload ✅Test results
2676 tests passing. Browser QA 21/21 (desktop 1440×900 + mobile iPhone 14). Pagination
msg_limit=30confirmed hitting the server via log inspection.