perf: session switch parallelization + message pagination (v0.50.229) by nesquena-hermes · Pull Request #1181 · nesquena/hermes-webui

nesquena-hermes · 2026-04-27T23:05:03Z

Attribution: Original work by @jasonjcwu (PR #1158). Integrated with fixes.

What this ships

Performance: session switch parallelization — three independent bottlenecks made concurrent:

Promise.all() for expanded-directory pre-fetches (N×RTT → 1×RTT)
ThreadPoolExecutor(max_workers=3) for git status/ahead/behind subprocesses
loadDir() and highlightCode() dispatched concurrently on the idle path

Message pagination for long conversations — sessions with > 30 messages load the most-recent 30 on switch. Older messages load on scroll-to-top or via the "↑ load older" indicator at the top of the list.

Fixes applied during integration

static/messages.js — reset _messagesTruncated in the done SSE handler. After a streaming response, the server sends the full session; without this fix the "load older" indicator would persist incorrectly after the first message exchange.
static/i18n.js — added load_older_messages key to all 7 locales (en, ru, es, de, zh, zh-Hant, ko). The original only had hardcoded English.
static/style.css — added .load-older-indicator CSS class instead of inline styles.
static/ui.js — indicator now uses CSS class + t('load_older_messages') i18n call.

All four _messagesTruncated reset paths from the original hold comment are addressed by the author:

loadSession() resets on every session switch ✅
cmdRetry() resets after reload ✅
cmdUndo() resets after reload ✅
_runManualCompression() resets after reload ✅

Test results

2676 tests passing. Browser QA 21/21 (desktop 1440×900 + mobile iPhone 14). Pagination msg_limit=30 confirmed hitting the server via log inspection.

Three changes to speed up session switching: 1. workspace.js: Parallelize expanded-dir pre-fetching in loadDir(). Previously, expanded directories were fetched sequentially with for-await, creating a waterfall of N API calls. Now uses Promise.all() to fetch all expanded dirs concurrently. 2. sessions.js: Overlap loadDir() network requests with highlightCode(). The idle path ran highlightCode() (CPU-bound Prism.js) before loadDir, delaying the file-tree fetch. Now loadDir() is kicked off first so the HTTP request is in-flight while Prism runs, then awaited. 3. workspace.py: Parallelize git_info_for_workspace() subprocess calls. git status, rev-list (ahead), and rev-list (behind) were called serially (3 subprocess.run calls, each up to 3s timeout). Now they run concurrently via ThreadPoolExecutor, reducing wall time by ~60%.

6 tests across 3 classes: - TestLoadDirParallelPrefetch: static analysis of Promise.all in workspace.js - TestLoadSessionIdleOverlap: static analysis of loadDir-before-highlightCode ordering - TestGitInfoParallel: runtime tests with threading.Barrier concurrency proof and wall-clock timing (parallel ~0.1s vs serial ~0.3s for 3 git commands)

Add msg_limit and msg_before parameters to /api/session endpoint: - Initial load fetches only last 30 messages (tail window) - Scroll-to-top triggers lazy loading of older messages via cursor - Payload reduced by ~70% for large sessions (100+ messages) - undo/retry/compress auto-reset truncation state - Export uses server-side generation, unaffected Backend (api/routes.py): - msg_limit: returns only the last N messages with truncation flag - msg_before: cursor-based paging for scroll-to-top loading Frontend (static/sessions.js): - _INITIAL_MSG_LIMIT=30 for initial paginated load - _messagesTruncated tracks pending older messages - _loadOlderMessages() prepends older messages on scroll - _ensureAllMessagesLoaded() for operations needing full history Frontend (static/ui.js): - Scroll-to-top detection (scrollTop < 80px) triggers lazy load - 'Load older messages' visual indicator when truncated Tests: 19 total (13 new for message pagination), all passing

Issues found and fixed: 1. workspace.js: S._expandedDirs||[] fallback produced Array (no .size) → changed to ||new Set() 2. routes.py: _messages_truncated flag was wrong for msg_before paging → separate truncation logic per code path, compare against _slice not _all_msgs 3. routes.py: msg_before used timestamp comparison which fails when messages lack _ts (827 msgs with _ts=0 in real data) or have duplicate timestamps → changed to index-based cursor (0-based position in full array) → added _messages_offset field so frontend knows cursor position → added input validation (try/except, max(1,...), bounds clamping) 4. sessions.js: _oldestIdx tracks cursor position, reset on session switch, updated from _messages_offset on each page load 5. workspace.py: moved concurrent.futures import to module top level 6. tests: updated to match index-based cursor, added 8 new tests: - test_expanded_dirs_fallback_is_set - test_msg_before_index_based_slicing - test_msg_before_zero_returns_empty - test_msg_before_equal_total - test_truncation_flag_with_msg_before - test_messages_offset_initial_load/with_msg_before - test_oldest_idx_tracking/reset_on_session_switch - test_load_older_uses_index_cursor - test_msg_before_bounds_clamping 27/27 tests pass, 2583/2583 total (5 pre-existing failures unrelated)

Race condition: if user switches sessions while _loadOlderMessages() is in-flight, the stale response could land on the new session's S.messages. Guards verified: - _loadOlderMessages checks _loadingSessionId !== sid after await (bails out) - Guard appears BEFORE S.messages mutation (stale data cannot land) - loadSession resets _loadingOlder=false on switch (no stale lock) - loadSession resets _messagesTruncated=false and _oldestIdx=0 on switch Added: _loadingOlder=false reset in loadSession() reset block. Added: 5 tests in TestSessionSwitchCancellation proving all guards exist and are ordered correctly. 32/32 tests pass.

- messages.js: reset _messagesTruncated in done handler (server response is always full) - ui.js: use CSS class + i18n for load-older indicator - style.css: add .load-older-indicator class - i18n.js: add load_older_messages key - CHANGELOG: add v0.50.229 entry

…er truncation reset

The cancellation guard at the top of _loadOlderMessages checks _loadingSessionId !== null && _loadingSessionId !== sid. That catches the case where a NEW session load is in progress when the older-messages fetch returns. It misses the case where: 1. user is on session A, _loadOlderMessages in flight 2. user switches to session B; loadSession(B) completes; sets _loadingSessionId = null 3. older-messages response for A returns -> _loadingSessionId === null -> guard evaluates to (null !== null) && ... -> false -> NO bail -> S.messages = [...olderMsgs, ...S.messages] prepends A's old messages onto B's S.messages Tighten the guard by also comparing the captured sid against the currently-active S.session.session_id. Together they cover both windows: mid-switch (new sid loading) and post-switch (no load in progress). Added a regression test (test_load_older_compares_against_active_session_id) that asserts the new check is present and runs before any S.messages mutation. Existing 5 cancellation tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

nesquena

Review — end-to-end ✅ (approved after stale-response race fix pushed)

What this ships

v0.50.229 — absorbing @jasonjcwu's PR #1158 with integration fixes:

Performance: parallelize the session switch hot path

Promise.all() for expanded-dir pre-fetches in loadDir() (N×RTT → 1×RTT) at static/workspace.js:62-79
ThreadPoolExecutor(max_workers=3) for git status / rev-list ahead / rev-list behind subprocess calls at api/workspace.py:609-628
loadDir() dispatched concurrently with highlightCode() on the idle path at static/sessions.js:269-273

Message pagination for long conversations

?msg_limit=N&msg_before=I query params on /api/session at api/routes.py:775-794
Initial load: last 30 messages (_INITIAL_MSG_LIMIT)
Scroll-to-top triggers _loadOlderMessages with msg_before cursor (index-based, not timestamp)
All undo/retry/compress/done paths reset _messagesTruncated

Traced against upstream hermes-agent

Pulled fresh nousresearch/hermes-agent tarball.

Backend changes touch api/routes.py and api/workspace.py. The _messages_truncated and _messages_offset response fields are WebUI-internal (added to the JSON shape on top of the agent's session). The agent doesn't read these — s.compact() does NOT include them, they're only set by the route handler. Cross-tool risk = zero.

git_info_for_workspace is purely workspace.py — agent doesn't read git status. Threading is safe (subprocess calls block on pipes, not GIL).

What I caught — stale-response race in `_loadOlderMessages`

The PR's cancellation guard at static/sessions.js:373 (pre-fix):

if (!data || !data.session || (_loadingSessionId !== null && _loadingSessionId !== sid)) return;

This catches the mid-switch case (new session load in progress when older-messages returns). It misses the post-switch case:

User on session A, _loadOlderMessages issues fetch (sid = A)
User clicks session B → loadSession(B) completes → _loadingSessionId = null (sessions.js:141)
Older-messages response for A returns → guard evaluates (null !== null) && ... = false → guard does NOT bail
S.messages = [...olderMsgs, ...S.messages] prepends A's older messages onto B's S.messages → corruption

The reset block at sessions.js:117-122 clears _loadingOlder = false; _messagesTruncated = false; _oldestIdx = 0 on switch — but those are state for FUTURE calls. The in-flight promise is unaffected.

What I pushed — `d974388`

Tightened the guard to ALSO compare S.session.session_id !== sid:

if (!data || !data.session) return;
if (!S.session || S.session.session_id !== sid) return;
if (_loadingSessionId !== null && _loadingSessionId !== sid) return;

Both layers together cover both windows (mid-switch and post-switch). The S.session.session_id check is the authoritative "is this still the active session?" — _loadingSessionId is only meaningful while a load is in progress.

Plus a regression test (test_load_older_compares_against_active_session_id) that asserts:

S.session.session_id !== sid is present in _loadOlderMessages body
The check appears BEFORE the S.messages = [...olderMsgs, ...S.messages] mutation

CI re-ran green on d974388: 3.11 ✅, 3.12 ✅, 3.13 ✅.

End-to-end trace

Backend pagination math

api/routes.py:802-820:

Initial load (no msg_before): _truncated_msgs = _all_msgs[-msg_limit:] (last N)
Cursor paging (msg_before=I): _slice = _all_msgs[:_before_idx] (everything before cursor), then _slice[-msg_limit:] (last N before cursor)
Bounds: _before_idx = max(0, min(int(msg_before), len(_all_msgs))) — clamped at both ends
Truncation flag: len(_slice) > msg_limit for cursor path; len(_all_msgs) > msg_limit for initial path
Offset (cursor): _before_idx - len(_truncated_msgs) — index of first returned message in full array
Offset (initial): len(_all_msgs) - len(_truncated_msgs) — same semantics

Walked through with N=100, msg_limit=30:

Initial: returns msgs[70:100], offset=70, truncated=True ✅
msg_before=70, msg_limit=30: returns msgs[40:70], offset=40, truncated=True (40 > 0) ✅
msg_before=30, msg_limit=30: returns msgs[0:30], offset=0, truncated=False ✅
msg_before=10, msg_limit=30: returns msgs[0:10] (clamped by slice), offset=0, truncated=False ✅

`git_info_for_workspace` parallelization

api/workspace.py:618-628 — ThreadPoolExecutor(max_workers=3) per call. Each git subprocess blocks on the pipe, not the GIL. The with block ensures the pool is shut down after each call.

Cost: 3 thread creates per call. Trade-off vs the ~50-200ms wall-clock saving (subprocess.run timeouts can stack). Acceptable.

Frontend cancellation matrix (after my fix)

Scenario	_loadingSessionId	S.session.session_id	Bail?
Same session, no switch	null	=== sid	no — apply ✅
Mid-switch to B (B loading)	B	sid (still A)	yes — `_loadingSessionId !== sid` ✅
Post-switch to B (B loaded)	null	B (!== sid)	yes — `S.session.session_id !== sid` ✅
Session deleted while in-flight	null	undefined / null	yes — `!S.session` ✅
Response error	n/a	n/a	yes — `!data

`done` SSE handler resetting `_messagesTruncated`

api/streaming.py:1964: raw_session = s.compact() | {'messages': s.messages, 'tool_calls': tool_calls} — emits the FULL session, no _messages_truncated field. So static/messages.js:762's _messagesTruncated = !!d.session._messages_truncated evaluates !!undefined = false. Correctly resets the flag. ✅

`_ensureAllMessagesLoaded` for export/undo

static/sessions.js:400-407 fetches the full session WITHOUT msg_limit so cmdRetry/cmdUndo/manual compression always operate on the complete history. cmdRetry/cmdUndo/_runManualCompression also reset _messagesTruncated = false after the reload — locking in the new full-history state.

Edge-case trace

Scenario	Expected	Actual
Session with 5 messages	no truncation, no indicator	✅ `len(_all_msgs) > msg_limit` is False
Session with 30 messages exactly	no truncation	✅ `30 > 30` is False
Session with 31 messages	last 30 returned, indicator shown	✅
Scroll-to-top fires twice rapidly	`_loadingOlder` gate blocks second	✅ `if (_loadingOlder...) return`
User switches session mid-fetch	guard bails on response	✅ both `_loadingSessionId !== sid` and `S.session.session_id !== sid` checks
User switches session post-fetch (window I caught)	guard bails on response	✅ after my fix
msg_before > total messages	clamped to total, returns prefix	✅ `min(int(msg_before), len(_all_msgs))`
msg_before == 0	empty slice, no older	✅ `_oldestIdx <= 0` early return
msg_limit=0 (invalid)	clamped to 1	✅ `max(1, int(_msg_limit))`
msg_limit=string (invalid)	None (no truncation)	✅ try/except ValueError
Concurrent git_info calls	each gets its own pool	✅ per-call ThreadPoolExecutor
Send message in truncated session	done event resets truncation, full history loaded	✅ d.session has all messages
Export from truncated session	server-side generation, unaffected	✅ /api/session/export reads s.messages directly

Tests

PR's targeted tests: 32/32 in test_parallel_session_switch.py. Plus my new test = 33/33.
Local full suite: 2629 passed, 47 skipped, 1 PR-unrelated pre-existing failure (test_sprint3.py::test_workspace_add_rejects_system_paths — macOS-only).
CI on PR after my push: ✅ test (3.11), ✅ test (3.12), ✅ test (3.13).
Behavioural verification: manual trace through bound clamping, truncation flag math, and cancellation matrix above.

Other audit — confirmed correct

i18n coverage: load_older_messages added in all 7 locales (en, ru, es, de, zh, zh-Hant, ko).
CSS: .load-older-indicator is a focused single-rule class. No specificity conflicts.
S._expandedDirs||new Set() fallback: was ||[] originally, fixed during integration. Set has .size and is iterable — works with [...expanded] spread. ✅
Index-based cursor (not timestamp): avoids issues with messages that have _ts=0 or duplicate timestamps. The PR description explicitly calls out the 827-msg dataset that motivated this change.
JS syntax: node --check passes on all touched files.

Minor observations (non-blocking)

The _INITIAL_MSG_LIMIT = 30 constant is hard-coded. Could be a per-user setting in the future (especially for power users with very long histories who want bigger initial loads). Out of scope.
The done event handler resets _messagesTruncated = !!d.session._messages_truncated — the done event always emits the full session, but if a future change ever adds _messages_truncated to the SSE payload, this would silently flip the flag. Not currently a risk since the field is only set in routes.py.
Per-call ThreadPoolExecutor(max_workers=3) has thread-creation overhead (~ms per call). For workspaces hit by every session list refresh, a module-level pool would be cheaper. Acceptable for now since git_info is gated by is_git.

Recommendation

Approved (after race-fix push). Three independent perf optimizations + a substantial pagination feature, all wired correctly and well-tested. The cancellation race I caught is a real corruption window — the PR's own test suite covered the mid-switch case but not the post-switch case. Tightened the guard with an S.session.session_id !== sid check and added a regression test that locks the new invariant. CI green; full suite 2629 passed; no agent coupling. Parked at approval — ready for the release agent's merge/tag pipeline.

@nesquena

) Merged as v0.50.229. 2678 tests passing. Browser QA 21/21. All three PRs were independently reviewed and approved by @nesquena with reviewer commits pulled in: - #1181 (#1158): `d974388` (stale-response race in _loadOlderMessages) - #1182: `7e20006` (full-scan fallback path consistency) - #1180: `a5ad154` (regression test for iOS zoom threshold) Thanks @jasonjcwu (#1158)!

nesquena-hermes · 2026-04-27T23:27:27Z

Merged as v0.50.229 via #1183. Thank you @jasonjcwu!

@nesquena

…squena#1183) Merged as v0.50.229. 2678 tests passing. Browser QA 21/21. All three PRs were independently reviewed and approved by @nesquena with reviewer commits pulled in: - nesquena#1181 (nesquena#1158): `d974388` (stale-response race in _loadOlderMessages) - nesquena#1182: `7e20006` (full-scan fallback path consistency) - nesquena#1180: `a5ad154` (regression test for iOS zoom threshold) Thanks @jasonjcwu (nesquena#1158)!

fxd-jason and others added 7 commits April 27, 2026 22:48

fix(i18n): add load_older_messages key to all locales; fix done-handl…

43ab1df

…er truncation reset

nesquena-hermes requested a review from nesquena April 27, 2026 23:05

nesquena approved these changes Apr 27, 2026

View reviewed changes

nesquena-hermes mentioned this pull request Apr 27, 2026

fix: batch v0.50.229 — session perf, ephemeral sessions, iOS zoom #1183

Merged

nesquena-hermes closed this Apr 27, 2026

nesquena-hermes mentioned this pull request Apr 27, 2026

Perf/parallelize session switch #1158

Closed

nesquena-hermes deleted the integrate/pr1158 branch April 27, 2026 23:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: session switch parallelization + message pagination (v0.50.229)#1181

perf: session switch parallelization + message pagination (v0.50.229)#1181
nesquena-hermes wants to merge 8 commits intomasterfrom
integrate/pr1158

nesquena-hermes commented Apr 27, 2026

Uh oh!

nesquena left a comment

Uh oh!

nesquena-hermes commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

nesquena-hermes commented Apr 27, 2026

What this ships

Fixes applied during integration

Test results

Uh oh!

nesquena left a comment

Choose a reason for hiding this comment

Review — end-to-end ✅ (approved after stale-response race fix pushed)

What this ships

Traced against upstream hermes-agent

What I caught — stale-response race in _loadOlderMessages

What I pushed — d974388

End-to-end trace

Backend pagination math

git_info_for_workspace parallelization

Frontend cancellation matrix (after my fix)

done SSE handler resetting _messagesTruncated

_ensureAllMessagesLoaded for export/undo

Edge-case trace

Tests

Other audit — confirmed correct

Minor observations (non-blocking)

Recommendation

Uh oh!

nesquena-hermes commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

What I caught — stale-response race in `_loadOlderMessages`

What I pushed — `d974388`

`git_info_for_workspace` parallelization

`done` SSE handler resetting `_messagesTruncated`

`_ensureAllMessagesLoaded` for export/undo