release: v0.50.246 — 5-PR batch by nesquena-hermes · Pull Request #1343 · nesquena/hermes-webui

nesquena-hermes · 2026-04-30T16:24:17Z

Release v0.50.246 — 5-PR batch

Constituent PRs

#	Author	Subject	Lines
#1335	@bergeouss	feat(chat): render fenced code blocks in user messages (#1325)	+161 −1
#1337	@dso2ng	fix(ui+pwa): mermaid render error cleanup + cache-busting on static asset URLs	+140 −28
#1338	nesquena-hermes (already APPROVED by nesquena)	fix(streaming+ui): preserve user message on cancel + persist activity panel expand state (#1298)	+419 −5
#1339	@jimdawdy-hub	fix(streaming): handle list `fallback_providers` config	+18 −12
#1341	@fxd-jason	fix(models): persist context_length / threshold_tokens / last_prompt_tokens (#1318 split)	+12 −0

Held with feedback (not in this release)

fix(settings): preserve provider identity on model picker reload (#1313) #1336 (@bergeouss) — closed as superseded by Fix default model rehydration when providers share slash-qualified model IDs #1326 (already in v0.50.245), plus drops an empty-id guard. Detailed comment explains.
feat(sessions): add /branch command to fork conversations from any message (#465) #1342 (@bergeouss) — /branch slash command. Real CI failures (60s timeout in own test, Korean locale parity broken) + scope concerns. Held with a 2-path "fix tests + scope discussion or split" comment.

Pre-release gate

✅ pytest: 3343 passed, 2 skipped, 3 xpassed, 0 failed in 83s (up from 3309 baseline, +34 new tests across the batch)
✅ Flaky test stabilized: test_checkpoint_fires_on_activity_counter_increment was failing intermittently on Python 3.11 in CI (timing-based polling). Replaced time.sleep windows with threading.Event synchronization. Verified locally over multiple runs.
⏳ Opus diff review: queued in parallel (claude-opus-4-7, max thinking, full diff brief)
⏳ Browser sanity QA: pending (touches static/ui.js, static/index.html, static/sw.js)

CHANGELOG conflict resolution

PR #1338's CHANGELOG entry was originally added under [v0.50.245]. That section already shipped, so the squash conflict was resolved by keeping HEAD (the shipped v0.50.245 entries unchanged) and re-adding the #1338 entries fresh under the new [v0.50.246] section. No content lost — just relocated.

Diff stats

.env.example                                     |   2 +-  (already in v0.50.245)
CHANGELOG.md                                     |  20 +
CONTRIBUTORS.md                                  | new file
README.md                                        |  35 +-
ROADMAP.md                                       |   4 +-
SPRINTS.md                                       |  36 +-
ARCHITECTURE.md                                  |  13 +-
TESTING.md                                       |   6 +-
api/models.py                                    |  11 +
api/routes.py                                    |   8 +-
api/streaming.py                                 |  44 +-
static/index.html                                |  18 +-
static/sw.js                                     |   4 +
static/ui.js                                     | 200 ++++++--
tests/test_1325_user_fenced_code.py              | new (123 lines)
tests/test_issue1298_cancel_and_activity.py      | new (267 lines)
tests/test_issue347.py                           |  15 +
tests/test_issue765_streaming_persistence.py     |  21 +-
tests/test_onboarding_static.py                  |   2 +-
tests/test_pr1339_fallback_providers_list.py     | new (75 lines)
tests/test_pwa_manifest_sw.py                    |  20 +
tests/test_renderer_js_behaviour.py              |  20 +
tests/test_service_worker_api_cache.py           | new
tests/test_sprint9.py                            |   2 +-

Closes #1298, #1318 (via partial fix), #1325.

…ns in Session model (#1318 split) From PR #1341. Co-authored-by: fxd-jason <[email protected]>

…single fallback_model dict From PR #1339. Co-authored-by: Jim Dawdy <[email protected]>

… asset URLs on every release From PR #1337. Co-authored-by: Dennis Soong <[email protected]>

From PR #1335. Co-authored-by: bergeouss <[email protected]>

…-panel expand state (#1298) From PR #1338. Already independently APPROVED by nesquena before being absorbed into v0.50.246. CHANGELOG entries from this PR were dropped during squash (the v0.50.245 section is already shipped); they will be re-added under [v0.50.246] in the release commit. Co-authored-by: nesquena-hermes <[email protected]>

…back list - tests/test_issue765_streaming_persistence.py — replace timing-based polling in test_checkpoint_fires_on_activity_counter_increment with deterministic threading.Event-driven sync. The old version used time.sleep(0.15)+(0.25)+(0.25) with a 0.1s polling thread, which under CI scheduling jitter could miss the second increment and complete with only 1 save instead of 2. Now waits up to 3.0s for save_count to advance to the target after each increment. Locally observed flake on Python 3.11 in CI run 25175204451. - tests/test_pr1339_fallback_providers_list.py — new structural test that asserts streaming.py handles both legacy fallback_model (single dict) and new fallback_providers (list form) without calling .get() on a list. Three assertions: both keys consulted, list-form has explicit isinstance check, _fallback_resolved defaults to None.

Combines: - 4 contributor PRs (#1335 user fenced code, #1337 mermaid+cache-bust, #1339 fallback_providers list, #1341 context_length persistence) - Self-built #1338 (cancel data-loss + activity panel) — already independently APPROVED by nesquena before absorption - CONTRIBUTORS.md and markdown refresh from #1340 See CHANGELOG.md for the full list with author credit.

…us review) Pre-release Opus review on v0.50.246 caught a SHOULD-FIX in PR #1338's cancel_stream synthesis: the symmetric substring guard (_pending_user in _last_content OR _last_content in _pending_user) was too loose. Common confirmation replies ("ok", "yes", "go") in the prior turn would match longer follow-up prompts ("ok please continue"), the synthesis would be skipped, and the user's typed text would be lost — exactly the data-loss bug #1298 was supposed to fix. The fix: gate the substring check on a timestamp comparison. Only treat the latest user turn as 'already merged by the streaming thread' if its timestamp is at or after pending_started_at. Earlier turns whose content happens to be a substring of the pending must not short-circuit synthesis. Also drops the symmetric (_last_content in _pending_user) branch — that direction was the false-positive vector. Keeps the equality and prefix match (workspace-prefix tolerance from the streaming thread). Adds tests/test_issue1298_cancel_and_activity.py:: test_cancel_synthesizes_when_prior_turn_content_is_substring_of_pending — regression for the exact 'ok' → 'ok please continue' scenario.

nesquena-hermes · 2026-04-30T16:30:04Z

Opus pre-release review — applied in commit `f328f3b`

Ran claude-opus-4-7 --thinking enabled --effort max against the production diff. MUST-FIX: none. SHOULD-FIX: 1 (applied). NIT: 8 (deferred).

SHOULD-FIX applied: `cancel_stream` substring guard false-positive

PR #1338's substring guard for "is the latest user turn already the synthesized message" was symmetric:

if _pending_user in _last_content or _last_content in _pending_user:
    _already_persisted = True

The second branch (_last_content in _pending_user) is too loose. Scenario:

Prior user turn: "ok" (common confirmation reply)
User now types: "ok please continue with the analysis"
Cancel races the streaming thread → reaches synthesis with pending_user_message set, latest message in s.messages is the prior "ok" turn
"ok" in "ok please continue with the analysis" → True → _already_persisted = True → synthesis skipped → the new message is lost, re-introducing the exact data-loss bug Bug: Activity panel collapses during streaming and latest user message disappears after Stop #1298 was supposed to fix

Fix (in f328f3b): gate the substring check on a pending_started_at timestamp comparison. The latest user turn is only treated as "already merged" if its timestamp is at or after the current turn's pending_started_at. Earlier turns whose content is a coincidental substring no longer short-circuit synthesis. Drops the symmetric _last_content in _pending_user branch entirely — that direction was the false-positive vector. Keeps == equality + _pending_user in _last_content (workspace-prefix tolerance, which the streaming thread legitimately produces).

Added regression test: test_cancel_synthesizes_when_prior_turn_content_is_substring_of_pending — sets up exactly the "ok" → "ok please continue" scenario and verifies the synthesis runs.

NITs deferred (non-blocking)

#	Description	Action
1	Duplicate inline `from urllib.parse import quote` in `routes.py`	Hoist to module top in a follow-up
2	`_renderUserFencedBlocks` placeholder collision (`\x00UF<n>\x00` not escaped by `esc()`)	Low-likelihood cosmetic glitch — strip NUL upfront in a follow-up
3	Lang regex differs slightly between `_renderUserFencedBlocks` and `renderMd`	Cosmetic, no functional impact
4	`fallback_model` precedence over `fallback_providers` is silent	Add a one-time log warning if both keys present
5	`typeof === 'function'` guards on locally-declared functions are dead code	Drop in a follow-up
6	`_renderUserFencedBlocks` XSS surface — verified clean (allowlist regex, `esc()` everywhere, no HTML injection vector)	No action
7	`quote(WEBUI_VERSION, safe="")` correctness — verified handles all `git describe` outputs (tags, commits, `--dirty`, `unknown`)	No action
8	`finalizeThinkingCard` only respects user-expand, not user-collapse — verified the singleton is cleared between turns	No action

Final verification

✅ pytest after fix: 3344 passed, 0 failed in 74s (+1 from the regression test)
✅ f328f3b pushed to release/v0.50.246

Ready for review.

nesquena

Review — end-to-end ✅ (approve with one substantive non-blocking concern on #1341)

Release v0.50.246 integrating 5 PRs. #1338 was previously formally approved by nesquena (review-4206679556) and the squash matches that approved code byte-for-byte (verified below). The other four constituents (#1335, #1337, #1339, #1341) are reviewed here for the first time.

Squash audit (byte-identical to open PRs)

e2d33ff #1341  12+/0  across 2 files       ✅ matches open PR
09e12e3 #1339  18+/12-  across 1 file       ✅ matches open PR
fbe84d2 #1337  140+/28- across 10 files     ✅ matches open PR
1fa740d #1335  161+/1-  across 2 files      ✅ matches open PR
d4b055c #1338  417+/5-  across 3 files      ✅ matches prior approved diff (the
                                                2-line gap vs the open PR's 419+/-5
                                                is the CHANGELOG entries that were
                                                relocated under [v0.50.246] in the
                                                release commit, per the PR body)
50418cd  test stabilizer + #1339 regression — release-branch test commit

Confirmed #1338 carries the approved code

d4b055c matches d90a605 (the head of PR #1338) for the actual code surface:

_liveActivityUserExpanded markers in static/ui.js: 6 occurrences in both
_onLiveActivityToggle: 2 occurrences in both
The # ── Preserve the user's typed message before clearing pending state (#1298) ── comment in api/streaming.py: 1 occurrence in both
tests/test_issue1298_cancel_and_activity.py is byte-identical (diff shows 0 lines)

So my prior end-to-end trace for #1338 stands and I'm not re-tracing here.

End-to-end traces (new content)

PR #1335 — render fenced code blocks in user messages (@bergeouss, fixes #1325)

New helper _renderUserFencedBlocks(text) at static/ui.js:60-87. Pipeline:

Match /```([a-zA-Z0-9_+-]*)\n([\s\S]*?)```/g. Each fence becomes <pre><code> (with diff/patch colored-line variant) and is stashed under a \x00UF<idx>\x00 placeholder.
The remainder of the string is escaped via esc() and \n → <br>.
Placeholders are restored from the stash.

Wire-up at static/ui.js:3087: user-message branch in renderMessages() swaps esc(...).replace(/\n/g,'<br>') for _renderUserFencedBlocks(content). Assistant branch unchanged (still renderMd).

Security: the stashed code is esc(code) — already HTML-safe before stashing. The placeholder pattern uses \x00 null bytes which are extremely unlikely in user input but technically possible via JSON-encoded chat input (�). If a user types literal \x00UF99\x00, the regex matches and stash[99] is undefined → coerces to literal "undefined" text in output. Not an XSS vector — content of any matched stash entry is already escaped at stash time. Worst case is cosmetic UX (literal "undefined" text). Behavioural test (test_1325_user_fenced_code.py) actually runs the JS via node — confirmed pass.

PR #1337 — Mermaid render error cleanup + PWA cache-bust (@dso2ng)

Two related fixes:

Mermaid temp DOM cleanup at static/ui.js:4138-4151. When mermaid.render(id, code) fails, the lib leaves a body-level <div id="d<id>">…</div> containing a "Syntax error in text" SVG. The previous code never removed this. Fix: document.getElementById('d'+id) is removed on both success AND failure paths. Failed blocks also drop the mermaid-block class so a later render pass can't retry the same broken content.
Tighter mermaid heuristic at static/ui.js:1085-1097: rejects line-numbered tool output (/^\s*\d+\|/.test(firstCodeLine)) and requires the first non-comment line to start with a known mermaid keyword (graph, flowchart, sequenceDiagram, pie, gantt, etc., 30+ keywords).
PWA cache-bust at static/index.html:50,954-964: script tags and sw.js registration get ?v=__WEBUI_VERSION__. The token is server-side substituted in api/routes.py:720-723 and api/routes.py:781-786 — uses urllib.parse.quote(WEBUI_VERSION, safe="") to URL-encode (defensive against unusual git-tag formats). WEBUI_VERSION is read from git describe --tags --always --dirty or api/_version.py (api/updates.py:60-92), so it's server-controlled, not user-input.
Service worker bypass for itself at static/sw.js:65-67: if (url.pathname.endsWith('/sw.js')) return; — returning from the fetch handler lets the browser do a normal network fetch, ensuring the freshly-versioned sw.js is always retrieved.

Cross-tool: webui-only, no agent surface.

PR #1339 — handle list `fallback_providers` config (@jimdawdy-hub)

Pre-fix at api/streaming.py:1701: _fallback = _cfg.get('fallback_model') or None — if user had fallback_providers: [{...}, {...}] (the chained-fallback form), this returned None. Even if it had returned the list, _fallback.get('model', '') on a list would raise AttributeError.

Post-fix:

_fallback = _cfg.get('fallback_model') or _cfg.get('fallback_providers') or None
isinstance(_fallback, list) → iterate to first valid entry; isinstance(_fallback, dict) → use directly.

Verified against agent CLI behaviour at /tmp/hermes-agent-fresh/cli.py:2055-2059: the CLI does the same dual-key lookup but in opposite order (fallback_providers first, fallback_model fallback). If a user has BOTH keys set (rare), the CLI will use fallback_providers while the WebUI will use fallback_model. Minor inconsistency, not a bug since this is an edge case (users typically have only one).

Behavioural test (test_pr1339_fallback_providers_list.py) is structural-only (asserts that streaming.py contains the dual-key check via grep). The trace above covers the actual behaviour.

PR #1341 — context_length / threshold_tokens / last_prompt_tokens persistence (@fxd-jason, "#1318 split")

This is the constituent that needs the most attention. The PR enables persistence but does not add the writer, so the user-visible bug it claims to fix is NOT actually fixed by this PR alone.

What the PR does:

Layer	Change	File:line
`Session.__init__`	Accepts the 3 fields as named kwargs	api/models.py:321-322,347-349
`Session.save()`	Lists them in `METADATA_FIELDS`	api/models.py:369
`Session.compact()`	Exposes them on the dict	api/models.py:461-463
`/api/session` GET	Returns them in response	api/routes.py:929-931

What the PR does NOT do, and what's needed to actually fix the bug:

$ grep -rn "= .*context_length\|context_length =" api/
api/models.py:347:        self.context_length = context_length    # constructor only
api/streaming.py:2209:                usage['context_length'] = getattr(_cc, 'context_length', 0) or 0
                                       ↑ writes to the SSE usage payload, NOT to session

There is no writer to s.context_length anywhere in the codebase. The PR description claims "These fields are written by streaming.py during context compression" — that's inaccurate. They're written to the SSE usage payload at api/streaming.py:2209-2211, but never to the session itself.

Without a writer, the data flow is:

agent.context_compressor.context_length  (live during streaming)
    ↓ getattr at api/streaming.py:2209
SSE 'done' event → usage payload     (delivered to live client)
    ↓
S.lastUsage in browser memory          (fine while page is open)

  ── page reload here → S.lastUsage cleared ──

GET /api/session → s.context_length    (None on every reload, since no
                                        writer ever populated it)
    ↓
sessions.js:500 _pick(u.context_length, _s.context_length)  → 0 default
    ↓
Context-ring shows 0%                  (the bug, still present)

The CHANGELOG entry at line 16 says "The frontend context-ring indicator was previously losing its percentage on every session load because the Session model silently dropped these fields when reconstructing from disk." That's only true if something was writing to __dict__['context_length'] so save()'s extra dict caught it (see api/models.py:375-377). Without a writer, nothing lands in __dict__ either. The "silently dropped" phrasing implies the data was being written but lost on load — but it was never being written to the Session in the first place.

To actually fix the bug, a 3-line addition is needed near api/streaming.py:2208:

_cc = getattr(agent, 'context_compressor', None)
if _cc:
    usage['context_length'] = getattr(_cc, 'context_length', 0) or 0
    usage['threshold_tokens'] = getattr(_cc, 'threshold_tokens', 0) or 0
    usage['last_prompt_tokens'] = getattr(_cc, 'last_prompt_tokens', 0) or 0
+    # Persist on session so the indicator restores after a page reload (#1318)
+    s.context_length = usage['context_length']
+    s.threshold_tokens = usage['threshold_tokens']
+    s.last_prompt_tokens = usage['last_prompt_tokens']

This sits inside the existing with _agent_lock: block (api/streaming.py:2152) which already owns the session lock during the post-result merge, and the subsequent s.save() would persist them.

I'm not pushing this fix — it's a release-batch PR and the title says "(#1318 split)", explicitly signalling this is a deliberate part-1 component. Adding the writer here would expand scope beyond what fxd-jason / the maintainer chose to land. But the user-visible claim in the CHANGELOG is misleading and a follow-up PR is needed to actually close #1318. Approving as-is because the data-structure changes are safe and load-bearing for whatever follows; the CHANGELOG entry should be tightened to "data-structure scaffolding for #1318 — writer in follow-up".

Cross-PR interactions

Three PRs touch static/ui.js (#1335, #1337, #1338) and one touches api/streaming.py (#1339, #1338). Verified disjoint regions:

#1335 — adds _renderUserFencedBlocks near line 60 + swap at line ~3087 (renderMessages user branch).
#1337 — modifies renderMd mermaid heuristic near line 1085 + renderMermaidBlocks near line 4135.
#1338 — adds _liveActivityUserExpanded tracker near line 2545 + ensureActivityGroup re-create branch near 2562 + finalizeThinkingCard near 4180.
#1339 — modifies streaming.py:1701-1722 (fallback resolution).
#1338 — modifies streaming.py:2576-2627 (cancel_stream recovery block).

No overlap. Tests pass.

Cross-tool consistency

#1339 fallback_providers: agent CLI handles both keys (cli.py:2055-2059). WebUI now consistent.
#1341 context_length / threshold_tokens / last_prompt_tokens: webui-only fields, surfaced from agent's compressor via SSE; no agent-side persistence implication. Cross-tool clean.
#1337 cache-bust: webui-only, no agent surface.
#1335 user fenced code: webui rendering only, no agent surface.
#1338 cancel recovery: webui session state only, no agent surface (pending_user_message is a WebUI-only field).

Security audit

✅ No new endpoints, auth gates, or file-serving code paths.
✅ No eval/exec/Function()/shell=True in the diff.
✅ No secrets/tokens in the diff.
✅ #1335 user fenced code: stashed code is escaped before stashing; null-byte placeholder pattern is XSS-safe (worst case is "undefined" text rendering).
✅ #1337 cache-bust token substitution: urllib.parse.quote(WEBUI_VERSION, safe="") defensively URL-encodes; WEBUI_VERSION is server-controlled (git describe / build-time file).
✅ #1339 fallback list iteration: isinstance guards on every step prevent type confusion.
✅ #1338: re-confirmed against my prior approval.
✅ #1341: data-structure-only, no security surface.

Edge-case matrix

#	Scenario	Expected	Actual
1335	Fence-only user message	Renders as `<pre><code>` with language class	✅ harness
1335	Plain text + fence	Plain text escaped (no markdown), fence rendered	✅ harness
1335	Diff/patch language	Per-line +/- coloring	✅ harness
1335	User types `\x00UF0\x00` literally	No XSS, just literal text	✅ stash content already escaped
1337	Valid mermaid block	Renders SVG, removes temp DOM	✅ trace
1337	Invalid mermaid (syntax error)	Falls back to code block, removes temp DOM	✅ trace
1337	Line-numbered tool output marked `mermaid`	Now skipped (heuristic)	✅ trace
1337	New release deployed	Browser fetches fresh assets via `?v=`	✅ trace
1339	`fallback_model: {model: ..., provider: ...}` (legacy)	Used as-is	✅ dict branch
1339	`fallback_providers: [{model: ...}, {...}]` (chain)	First valid entry used	✅ list branch
1339	`fallback_providers: []` (empty list)	`_fallback_resolved = None`	✅ no valid entries
1339	Both keys set	`fallback_model` wins (legacy-first)	⚠️ CLI prefers `fallback_providers` — minor inconsistency
1341	New session, no streaming yet	Fields = None	✅
1341	After streaming, in-memory session	Fields = None still (no writer)	❌ user-visible bug NOT fixed
1341	Page reload, session loaded from disk	Fields = None still (nothing was saved)	❌ user-visible bug NOT fixed

Tests

PR-specific tests across 10 files: 155/155 pass
Full suite: 3291 passed, 54 skipped, 3 xpassed, 0 failed in 15.75s (3348 collected)
The PR description claims "3343 passed" — close enough; the discrepancy is consistent with skipped/xpassed/subtests being counted differently across runs.
test_pr1339_fallback_providers_list.py is structural (regex-based) but the trace covers the runtime behaviour.

Minor observations (non-blocking)

#1341 is data-structure-only: see #1341 trace above. Approving on the basis that the persistence scaffolding is harmless and load-bearing for whatever follows; CHANGELOG entry should ideally be reworded to make clear the writer is still pending. The user-visible bug ("context-ring shows 0% after reload") is NOT fixed by this PR alone.
#1339 ordering inconsistency: WebUI tries fallback_model first, CLI tries fallback_providers first. If a user has both set, behaviour differs. Recommend matching the CLI's preference order in a follow-up. Not a bug since both keys aren't supposed to coexist.
#1335 null-byte placeholder collision: literal \x00UF<N>\x00 in user input could trigger a placeholder restore for an unrelated stash index. Not exploitable (output is escaped either way) but a hardened version could use a Math.random() salt in the placeholder. Cosmetic edge case.
PR description's test count 3343 passed vs my local 3291 passed — a 52-test gap. Probably environment differences (some tests skip in non-CI runs); 0 failures is what matters.
CHANGELOG line 16 — "silently dropped these fields when reconstructing from disk" — overstates the fix. Without a writer, nothing was ever written to disk in the first place. See #1341 trace above.

Recommendation

✅ Approved. Five constituent traces clean (security-safe, behaviorally correct, no cross-tool regressions). #1338 squash matches the previously-approved code byte-for-byte. The flagged #1341 gap (no writer) is a documentation/scope issue, not a correctness regression — the persistence scaffolding is safe to merge and unblocks the follow-up writer PR. Approval is conditional on a follow-up landing the writer (3 lines in streaming.py near line 2208) before claiming #1318 fully fixed.

Parked at approval — ready for the release agent's merge/tag pipeline.

Pre-release Opus + nesquena review on v0.50.246 caught that PR #1341 added the data-structure scaffolding (Session.__init__ accepts the 3 fields, save() persists them, compact() exposes them, GET /api/session returns them) but did NOT add the writer that actually populates them. Without a writer, the user-visible bug (context-ring shows 0% after page reload) was NOT fixed by #1341 alone — the fields stayed None forever because nothing wrote to s.context_length anywhere. Adds the writer at api/streaming.py:2188 (post-merge per-turn save block, before s.save()) so the values from agent.context_compressor land on disk and survive page reloads. Also moves the SSE usage payload comment to clarify that the live SSE payload and the session-level persistence are now distinct paths (payload below, persistence above). Adds tests/test_pr1341_context_window_persistence.py — 6 structural + round-trip tests covering Session __init__/save/compact, the routes response, and the streaming.py writer placement. Closes #1318 (the actual user-visible bug, not just the scaffolding).

nesquena-hermes · 2026-04-30T16:43:32Z

Update — addressed nesquena's #1341 concern in this release

Nathan said "fix small things ourselves" and your review explicitly recommended:

To actually fix the bug, a 3-line addition is needed near api/streaming.py:2208 (...)
Approval is conditional on a follow-up landing the writer

Rather than punt to a follow-up, I landed the writer in this release as a5c10d5. The branch went from "scaffolding-only #1341" to "fully fixes #1318 end-to-end."

Two new commits since your APPROVED review

SHA	Title	Lines
`0d599b4`	fix(streaming): persist context_length on session — completes #1318 fix	+155/-1
`a5c10d5`	(amended into above — adds test ordering fix + CHANGELOG tightening)

What the writer does

Added inside the post-merge per-turn save block at api/streaming.py:2188, before s.save():

# Persist context window data on the session so the context-ring
# indicator survives a page reload (#1318). Must run BEFORE
# s.save() for the same reason as the reasoning trace above.
_cc_for_save = getattr(agent, 'context_compressor', None)
if _cc_for_save:
    s.context_length = getattr(_cc_for_save, 'context_length', 0) or 0
    s.threshold_tokens = getattr(_cc_for_save, 'threshold_tokens', 0) or 0
    s.last_prompt_tokens = getattr(_cc_for_save, 'last_prompt_tokens', 0) or 0

The SSE usage payload block below was preserved unchanged (still emits the same data live to the browser); a comment was added clarifying that the live SSE path and the session-level persistence are now distinct.

CHANGELOG tightening (per your minor observation)

The v0.50.246 entry for #1341 was rewritten to **Context window indicator persists across page reloads (#1318 — fully fixed)** and now explicitly mentions the writer in api/streaming.py. The "silently dropped these fields when reconstructing from disk" phrasing you flagged as overstating the fix is gone.

New regression test: `test_pr1341_context_window_persistence.py`

6 tests, all green:

test_streaming_persists_context_fields_on_session_before_save — the writer is present before s.save() (anchored on the unique if _reasoning_text and s.messages: marker).
test_session_init_accepts_context_fields — __init__ signature includes all 3 fields.
test_session_metadata_fields_includes_context_fields — METADATA_FIELDS covers all 3.
test_session_compact_exposes_context_fields — compact() output dict includes all 3.
test_routes_session_get_returns_context_fields — GET response includes all 3.
test_session_round_trip_persists_context_fields — real round-trip: write fields, save to a tmp_path SESSION_DIR (via monkeypatch.setattr — no module reload, so we don't pollute sys.modules for sibling tests), reload via Session.load(), all 3 values match.

The first round of this test broke test_session_sidecar_repair.py::TestLastResortSyncDelegation because the original implementation del'd api.models from sys.modules, which corrupted shared module state for tests that ran after it. Switched to monkeypatch.setattr(models, "SESSION_DIR", ...) — purely local to the test, no global state touched.

Test count

Pre-fix gate: 3344 passed
Post-writer gate: 3350 passed, 0 failed in 76.65s (+6 new tests, no regressions)

Re-requesting review

Since this is a substantive code addition post-approval, the prior approval's caveat ("conditional on a follow-up") is now moot — the writer landed here, not in a follow-up. Re-requesting review so you can confirm the writer placement and the round-trip semantics are correct. CI is also re-running on a5c10d5.

nesquena

Followup Review — both flagged concerns addressed ✅ (clean re-approve)

This is a followup to my prior approval (4206870751). Two new commits landed on the release branch since then, and they directly address the two concerns my earlier trace flagged:

New commit	Fixes	Origin
`f328f3b`	Cancel-stream substring-guard false positive	Pre-release Opus review
`a5c10d5`	#1341 missing writer (the user-visible bug NOT fixed before)	My earlier review observation

Both fixes traced cleanly. Re-approving.

`f328f3b` — gate substring guard on `pending_started_at` timestamp

The concern. PR #1338 added a content-match guard in cancel_stream() to avoid double-appending the user's typed turn when the streaming thread had already merged it. The guard was symmetric:

if _pending_user in _last_content or _last_content in _pending_user:
    _already_persisted = True

The _last_content in _pending_user direction is a false-positive vector. If the prior user turn was a short confirmation ("ok"), and the cancelled turn is a longer follow-up ("ok please continue"), then "ok" in "ok please continue" is True → guard fires → synthesis is skipped → user's "ok please continue" is lost. Exactly the data-loss bug #1298 was supposed to fix.

The fix. Two changes at api/streaming.py:2623-2643:

Add a timestamp gate: read _pending_started = _cs.pending_started_at and _last_ts = _last_user.get('timestamp'); only treat the latest user turn as "already merged" if _last_ts >= _pending_started. An earlier turn whose content happens to be a substring of the new pending must NOT short-circuit synthesis.
Drop the symmetric in direction: replace _pending_user in _last_content or _last_content in _pending_user with _pending_user == _last_content or _pending_user in _last_content. Keeps equality (exact match) and prefix-tolerance (the streaming thread prepends [Workspace: ...] which makes _last_content a strict superset of _pending_user), drops the false-positive direction.

Race trace under the new guard:

Cancel timing	`_pending_user`	`_last_content` (latest user msg)	`_last_ts` vs `_pending_started`	`isinstance str`	content match	Result
Beats merge, prior was "ok"	"ok please continue"	"ok"	`T1 < T2` → fail	str → True	n/a	Synthesize ✓
Loses race, plain text	"ok please continue"	"ok please continue"	`T2 == T2` → pass	str	equality True	Skip ✓
Loses race, workspace prefix	"ok please continue"	"[Workspace: /tmp]\nok please continue"	`T2 == T2` → pass	str	`_pending_user in _last_content` True	Skip ✓
Loses race, multimodal	"describe this"	`[{type: 'text', ...}, {type: 'image_url', ...}]`	`T2 == T2` → pass	list → False	n/a	Synthesize ✓ — but only if `_pending_user is set`, which the streaming thread already cleared (under same lock as merge), so the outer `if _pending_user and ...` short-circuits first → no double-append ✓
Legacy turn with no timestamp	"ok please continue"	"ok"	`0 < T2` → fail	str → True	n/a	Synthesize ✓ — defaults to safe behavior (avoid data loss)

All scenarios safe. The new test test_cancel_synthesizes_when_prior_turn_content_is_substring_of_pending locks the regression. ✅

`a5c10d5` — adds the missing writer for `#1341`

The concern from my prior review: PR #1341 added the Session data structure (constructor params, METADATA_FIELDS, compact(), /api/session route) but did NOT add the writer in streaming.py. Without a writer, s.context_length stayed None forever and the post-reload context-ring still showed 0% — so the user-visible bug was NOT fixed by #1341 alone.

The fix. New 5-line writer block at api/streaming.py:2188-2198, placed BEFORE the existing s.save() at line 2199:

# Persist context window data on the session so the context-ring
# indicator survives a page reload (#1318). Must run BEFORE
# s.save() for the same reason as the reasoning trace above.
_cc_for_save = getattr(agent, 'context_compressor', None)
if _cc_for_save:
    s.context_length = getattr(_cc_for_save, 'context_length', 0) or 0
    s.threshold_tokens = getattr(_cc_for_save, 'threshold_tokens', 0) or 0
    s.last_prompt_tokens = getattr(_cc_for_save, 'last_prompt_tokens', 0) or 0
s.save()

End-to-end trace verified:

Block sits inside with _agent_lock: (opened at api/streaming.py:1969). Same lock as all other session mutations. ✅
Runs after the result-merge and BEFORE s.save() — so the values are in __dict__ when save serializes. ✅
getattr(_cc_for_save, '...', 0) or 0 defends against missing fields and explicit None from older agent builds. ✅
The values now flow: agent.context_compressor → s.context_length (in-memory) → s.save() writes to disk via METADATA_FIELDS round-trip → Session.load() reads them back via the named kwargs from #1341 → /api/session GET returns them → frontend _pick(u.X, _s.X) falls back to _s.X after page reload (when S.lastUsage is empty). ✅
The SSE usage payload below at line 2207-2211 still captures the same values — live streaming clients still get them in real time. The new writer is the persistence path that survives reload. The two paths are now distinct, with comments updated to make that clear.

Lock safety: identical to other session writers in the same with _agent_lock: block (s.tool_calls, s.active_stream_id = None, s.input_tokens += ...). No concurrent reader can race because both readers (HTTP /api/session handler, cancel_stream) acquire _get_session_agent_lock(sid) for the same lock. Safe. ✅

Round-trip test verified: test_session_round_trip_persists_context_fields actually saves a Session with the fields set, calls Session.load(sid), asserts the values come back identical (200000 / 180000 / 45123). Real round-trip, not regex. ✅

CHANGELOG entry rewritten to remove the misleading "silently dropped these fields" phrasing and credit the writer addition. The new entry honestly reflects that #1341 was scaffolding-only and the writer was added during pre-release review.

Tests after these followup fixes

tests/test_issue1298_cancel_and_activity.py — 10/10 pass (added test_cancel_synthesizes_when_prior_turn_content_is_substring_of_pending)
tests/test_pr1341_context_window_persistence.py — 6/6 pass (5 structural + 1 round-trip)
Full suite: 3298 passed, 54 skipped, 3 xpassed, 0 failed in 15.59s (up from 3291 in my prior run; +7 new tests across these two commits)

Edge-case re-check

Scenario	Before `f328f3b`	After `f328f3b`
Prior turn "ok", new turn "ok please continue", cancelled	❌ data loss (substring guard short-circuited synthesis)	✅ synthesis fires (timestamp gate detects T1 < T2)
Same content as prior, cancelled	✅ no double-append (equality match still fires)	✅ no double-append (equality kept)
Same content with workspace prefix, cancelled	✅ no double-append (prefix tolerance)	✅ no double-append (`_pending_user in _last_content` kept)
Multimodal user turn, cancelled mid-merge	✅ short-circuits via `pending_user_message=None` cleared under same lock	✅ same
Legacy turn with no `timestamp` field	n/a	✅ defaults to synthesize (safe — no data loss)

Scenario	Before `a5c10d5`	After `a5c10d5`
Streaming turn completes, page kept open	✅ live SSE updates context ring	✅ same (unchanged)
Streaming turn completes, page reloaded	❌ context ring shows 0% (fields never persisted)	✅ context ring restores from `/api/session`
Multiple turns over a session lifetime	❌ post-reload always 0%	✅ shows last-known values
Older agent build with no `context_compressor`	✅ skipped (no SSE values either)	✅ skipped (`if _cc_for_save:` guards)
`context_compressor` exists but values are 0	✅ SSE shows 0	✅ persisted as 0 — same

Recommendation

✅ Approved (followup). Both new commits trace cleanly, address the exact concerns raised in pre-release review, and have the right shape:

f328f3b removes the false-positive guard direction AND adds a timestamp gate as belt-and-suspenders. Regression test locks the substring scenario.
a5c10d5 adds the missing writer in the right place (inside _agent_lock, before s.save()). Round-trip test verifies persistence end-to-end. CHANGELOG honestly reflects the scope of what was actually fixed.

#1318 is now actually fixed (not just scaffolded). #1298 substring guard hardened against the false-positive case.

Parked at approval — ready for the release agent's merge/tag pipeline.

nesquena-hermes · 2026-04-30T16:51:16Z

Shipped — v0.50.246 ✅

Merge: dec4b486 (2026-04-30 16:49 UTC)
Tag: v0.50.246
Production restarted on port 8787, health check OK, cache-bust live (?v=v0.50.246 injected on every static asset).

Constituent PRs closed with credit

#	Author	Subject
#1335	@bergeouss	feat(chat): render fenced code blocks in user messages (#1325)
#1337	@dso2ng	fix(ui+pwa): mermaid render error cleanup + cache-busting on static asset URLs
#1338	nesquena-hermes	fix(streaming+ui): preserve user message on cancel + persist activity-panel expand state (#1298)
#1339	@jimdawdy-hub	fix(streaming): handle list `fallback_providers` config
#1341	@fxd-jason	fix(models): persist context_length / threshold_tokens / last_prompt_tokens (#1318 fully fixed)

Issues closed

Bug: Activity panel collapses during streaming and latest user message disappears after Stop #1298 (cancel data-loss + activity panel collapse) — closed via fix(#1298): preserve user message on cancel + persist activity-panel expand state #1338
fix: context window indicator not updating + responsive sidebar + chat area width at 769px #1318 (context window indicator persistence) — closed via fix: persist context_length/threshold_tokens/last_prompt_tokens in Session model #1341 + writer addition (a5c10d5)
feat(chat): render fenced code blocks in user message bubbles #1325 (fenced code in user messages) — closed via feat(chat): render fenced code blocks in user messages #1335

Held with detailed feedback

feat(sessions): add /branch command to fork conversations from any message (#465) #1342 (@bergeouss) — /branch slash command. Real CI failures + scope concerns. Held with 2-path "fix tests + scope discussion or split" comment.

Closed as superseded

fix(settings): preserve provider identity on model picker reload (#1313) #1336 (@bergeouss) — superseded by Fix default model rehydration when providers share slash-qualified model IDs #1326 (already in v0.50.245), plus dropped an empty-id guard.

Pre-release gate timeline

Initial squash + tests: 3343 passed
Opus pre-release review caught the cancel_stream substring guard false-positive — fix landed in f328f3b
nesquena APPROVED with one substantive flag (fix: persist context_length/threshold_tokens/last_prompt_tokens in Session model #1341 had no writer)
Writer added in a5c10d5 — converted scaffolding-only fix: persist context_length/threshold_tokens/last_prompt_tokens in Session model #1341 into a complete fix: context window indicator not updating + responsive sidebar + chat area width at 769px #1318 fix
nesquena re-APPROVED on the new commits
Final pytest: 3350 passed, 0 failed in 76.65s
CI: 3.11 / 3.12 / 3.13 all green
Browser sanity verified: fenced code XSS-safe, cache-bust live, Mermaid stricter detection live

Thanks for the thorough end-to-end review on both passes — catching that #1341 was scaffolding-only saved the user-visible bug from staying broken until a follow-up.

nesquena-hermes and others added 7 commits April 30, 2026 16:17

fix(models): persist context_length/threshold_tokens/last_prompt_toke…

e2d33ff

…ns in Session model (#1318 split) From PR #1341. Co-authored-by: fxd-jason <[email protected]>

fix(streaming): handle list fallback_providers config in addition to …

09e12e3

…single fallback_model dict From PR #1339. Co-authored-by: Jim Dawdy <[email protected]>

fix(ui+pwa): avoid stale Mermaid render errors and bust cached static…

fbe84d2

… asset URLs on every release From PR #1337. Co-authored-by: Dennis Soong <[email protected]>

feat(chat): render fenced code blocks in user messages (#1325)

1fa740d

From PR #1335. Co-authored-by: bergeouss <[email protected]>

nesquena-hermes requested a review from nesquena April 30, 2026 16:24

nesquena-hermes mentioned this pull request Apr 30, 2026

feat(chat): render fenced code blocks in user messages #1335

Closed

nesquena approved these changes Apr 30, 2026

View reviewed changes

nesquena-hermes requested a review from nesquena April 30, 2026 16:43

nesquena approved these changes Apr 30, 2026

View reviewed changes

nesquena-hermes merged commit dec4b48 into master Apr 30, 2026
3 checks passed

nesquena-hermes deleted the release/v0.50.246 branch April 30, 2026 16:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: v0.50.246 — 5-PR batch#1343

release: v0.50.246 — 5-PR batch#1343
nesquena-hermes merged 9 commits intomasterfrom
release/v0.50.246

nesquena-hermes commented Apr 30, 2026

Uh oh!

nesquena-hermes commented Apr 30, 2026

Uh oh!

nesquena left a comment

Uh oh!

nesquena-hermes commented Apr 30, 2026

Uh oh!

nesquena left a comment

Uh oh!

Uh oh!

nesquena-hermes commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nesquena-hermes commented Apr 30, 2026

Release v0.50.246 — 5-PR batch

Constituent PRs

Held with feedback (not in this release)

Pre-release gate

CHANGELOG conflict resolution

Diff stats

Uh oh!

nesquena-hermes commented Apr 30, 2026

Opus pre-release review — applied in commit f328f3b

SHOULD-FIX applied: cancel_stream substring guard false-positive

NITs deferred (non-blocking)

Final verification

Uh oh!

nesquena left a comment

Choose a reason for hiding this comment

Review — end-to-end ✅ (approve with one substantive non-blocking concern on #1341)

Squash audit (byte-identical to open PRs)

Confirmed #1338 carries the approved code

End-to-end traces (new content)

PR #1335 — render fenced code blocks in user messages (@bergeouss, fixes #1325)

PR #1337 — Mermaid render error cleanup + PWA cache-bust (@dso2ng)

PR #1339 — handle list fallback_providers config (@jimdawdy-hub)

PR #1341 — context_length / threshold_tokens / last_prompt_tokens persistence (@fxd-jason, "#1318 split")

Cross-PR interactions

Cross-tool consistency

Security audit

Edge-case matrix

Tests

Minor observations (non-blocking)

Recommendation

Uh oh!

nesquena-hermes commented Apr 30, 2026

Update — addressed nesquena's #1341 concern in this release

Two new commits since your APPROVED review

What the writer does

CHANGELOG tightening (per your minor observation)

New regression test: test_pr1341_context_window_persistence.py

Test count

Re-requesting review

Uh oh!

nesquena left a comment

Choose a reason for hiding this comment

Followup Review — both flagged concerns addressed ✅ (clean re-approve)

f328f3b — gate substring guard on pending_started_at timestamp

a5c10d5 — adds the missing writer for #1341

Tests after these followup fixes

Edge-case re-check

Recommendation

Uh oh!

Uh oh!

nesquena-hermes commented Apr 30, 2026

Shipped — v0.50.246 ✅

Constituent PRs closed with credit

Issues closed

Held with detailed feedback

Closed as superseded

Pre-release gate timeline

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Opus pre-release review — applied in commit `f328f3b`

SHOULD-FIX applied: `cancel_stream` substring guard false-positive

PR #1339 — handle list `fallback_providers` config (@jimdawdy-hub)

New regression test: `test_pr1341_context_window_persistence.py`

`f328f3b` — gate substring guard on `pending_started_at` timestamp

`a5c10d5` — adds the missing writer for `#1341`