Skip to content

release: v0.50.246 — 5-PR batch#1343

Merged
nesquena-hermes merged 9 commits intomasterfrom
release/v0.50.246
Apr 30, 2026
Merged

release: v0.50.246 — 5-PR batch#1343
nesquena-hermes merged 9 commits intomasterfrom
release/v0.50.246

Conversation

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Release v0.50.246 — 5-PR batch

Constituent PRs

# Author Subject Lines
#1335 @bergeouss feat(chat): render fenced code blocks in user messages (#1325) +161 −1
#1337 @dso2ng fix(ui+pwa): mermaid render error cleanup + cache-busting on static asset URLs +140 −28
#1338 nesquena-hermes (already APPROVED by nesquena) fix(streaming+ui): preserve user message on cancel + persist activity panel expand state (#1298) +419 −5
#1339 @jimdawdy-hub fix(streaming): handle list fallback_providers config +18 −12
#1341 @fxd-jason fix(models): persist context_length / threshold_tokens / last_prompt_tokens (#1318 split) +12 −0

Held with feedback (not in this release)

Pre-release gate

  • pytest: 3343 passed, 2 skipped, 3 xpassed, 0 failed in 83s (up from 3309 baseline, +34 new tests across the batch)
  • Flaky test stabilized: test_checkpoint_fires_on_activity_counter_increment was failing intermittently on Python 3.11 in CI (timing-based polling). Replaced time.sleep windows with threading.Event synchronization. Verified locally over multiple runs.
  • Opus diff review: queued in parallel (claude-opus-4-7, max thinking, full diff brief)
  • Browser sanity QA: pending (touches static/ui.js, static/index.html, static/sw.js)

CHANGELOG conflict resolution

PR #1338's CHANGELOG entry was originally added under [v0.50.245]. That section already shipped, so the squash conflict was resolved by keeping HEAD (the shipped v0.50.245 entries unchanged) and re-adding the #1338 entries fresh under the new [v0.50.246] section. No content lost — just relocated.

Diff stats

.env.example                                     |   2 +-  (already in v0.50.245)
CHANGELOG.md                                     |  20 +
CONTRIBUTORS.md                                  | new file
README.md                                        |  35 +-
ROADMAP.md                                       |   4 +-
SPRINTS.md                                       |  36 +-
ARCHITECTURE.md                                  |  13 +-
TESTING.md                                       |   6 +-
api/models.py                                    |  11 +
api/routes.py                                    |   8 +-
api/streaming.py                                 |  44 +-
static/index.html                                |  18 +-
static/sw.js                                     |   4 +
static/ui.js                                     | 200 ++++++--
tests/test_1325_user_fenced_code.py              | new (123 lines)
tests/test_issue1298_cancel_and_activity.py      | new (267 lines)
tests/test_issue347.py                           |  15 +
tests/test_issue765_streaming_persistence.py     |  21 +-
tests/test_onboarding_static.py                  |   2 +-
tests/test_pr1339_fallback_providers_list.py     | new (75 lines)
tests/test_pwa_manifest_sw.py                    |  20 +
tests/test_renderer_js_behaviour.py              |  20 +
tests/test_service_worker_api_cache.py           | new
tests/test_sprint9.py                            |   2 +-

Closes #1298, #1318 (via partial fix), #1325.

nesquena-hermes and others added 7 commits April 30, 2026 16:17
…ns in Session model (#1318 split)

From PR #1341.

Co-authored-by: fxd-jason <[email protected]>
…single fallback_model dict

From PR #1339.

Co-authored-by: Jim Dawdy <[email protected]>
… asset URLs on every release

From PR #1337.

Co-authored-by: Dennis Soong <[email protected]>
…-panel expand state (#1298)

From PR #1338. Already independently APPROVED by nesquena before being absorbed into v0.50.246.

CHANGELOG entries from this PR were dropped during squash (the v0.50.245 section is already
shipped); they will be re-added under [v0.50.246] in the release commit.

Co-authored-by: nesquena-hermes <[email protected]>
…back list

- tests/test_issue765_streaming_persistence.py — replace timing-based polling
  in test_checkpoint_fires_on_activity_counter_increment with deterministic
  threading.Event-driven sync. The old version used time.sleep(0.15)+(0.25)+(0.25)
  with a 0.1s polling thread, which under CI scheduling jitter could miss the
  second increment and complete with only 1 save instead of 2. Now waits up
  to 3.0s for save_count to advance to the target after each increment.
  Locally observed flake on Python 3.11 in CI run 25175204451.

- tests/test_pr1339_fallback_providers_list.py — new structural test that
  asserts streaming.py handles both legacy fallback_model (single dict) and
  new fallback_providers (list form) without calling .get() on a list. Three
  assertions: both keys consulted, list-form has explicit isinstance check,
  _fallback_resolved defaults to None.
Combines:
- 4 contributor PRs (#1335 user fenced code, #1337 mermaid+cache-bust,
  #1339 fallback_providers list, #1341 context_length persistence)
- Self-built #1338 (cancel data-loss + activity panel) — already
  independently APPROVED by nesquena before absorption
- CONTRIBUTORS.md and markdown refresh from #1340

See CHANGELOG.md for the full list with author credit.
…us review)

Pre-release Opus review on v0.50.246 caught a SHOULD-FIX in PR #1338's
cancel_stream synthesis: the symmetric substring guard
(_pending_user in _last_content OR _last_content in _pending_user) was too
loose. Common confirmation replies ("ok", "yes", "go") in the prior turn
would match longer follow-up prompts ("ok please continue"), the synthesis
would be skipped, and the user's typed text would be lost — exactly the
data-loss bug #1298 was supposed to fix.

The fix: gate the substring check on a timestamp comparison. Only treat
the latest user turn as 'already merged by the streaming thread' if its
timestamp is at or after pending_started_at. Earlier turns whose content
happens to be a substring of the pending must not short-circuit synthesis.

Also drops the symmetric (_last_content in _pending_user) branch — that
direction was the false-positive vector. Keeps the equality and prefix
match (workspace-prefix tolerance from the streaming thread).

Adds tests/test_issue1298_cancel_and_activity.py::
test_cancel_synthesizes_when_prior_turn_content_is_substring_of_pending —
regression for the exact 'ok' → 'ok please continue' scenario.
@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

Opus pre-release review — applied in commit f328f3b

Ran claude-opus-4-7 --thinking enabled --effort max against the production diff. MUST-FIX: none. SHOULD-FIX: 1 (applied). NIT: 8 (deferred).

SHOULD-FIX applied: cancel_stream substring guard false-positive

PR #1338's substring guard for "is the latest user turn already the synthesized message" was symmetric:

if _pending_user in _last_content or _last_content in _pending_user:
    _already_persisted = True

The second branch (_last_content in _pending_user) is too loose. Scenario:

  • Prior user turn: "ok" (common confirmation reply)
  • User now types: "ok please continue with the analysis"
  • Cancel races the streaming thread → reaches synthesis with pending_user_message set, latest message in s.messages is the prior "ok" turn
  • "ok" in "ok please continue with the analysis" → True → _already_persisted = True → synthesis skipped → the new message is lost, re-introducing the exact data-loss bug Bug: Activity panel collapses during streaming and latest user message disappears after Stop #1298 was supposed to fix

Fix (in f328f3b): gate the substring check on a pending_started_at timestamp comparison. The latest user turn is only treated as "already merged" if its timestamp is at or after the current turn's pending_started_at. Earlier turns whose content is a coincidental substring no longer short-circuit synthesis. Drops the symmetric _last_content in _pending_user branch entirely — that direction was the false-positive vector. Keeps == equality + _pending_user in _last_content (workspace-prefix tolerance, which the streaming thread legitimately produces).

Added regression test: test_cancel_synthesizes_when_prior_turn_content_is_substring_of_pending — sets up exactly the "ok""ok please continue" scenario and verifies the synthesis runs.

NITs deferred (non-blocking)

# Description Action
1 Duplicate inline from urllib.parse import quote in routes.py Hoist to module top in a follow-up
2 _renderUserFencedBlocks placeholder collision (\x00UF<n>\x00 not escaped by esc()) Low-likelihood cosmetic glitch — strip NUL upfront in a follow-up
3 Lang regex differs slightly between _renderUserFencedBlocks and renderMd Cosmetic, no functional impact
4 fallback_model precedence over fallback_providers is silent Add a one-time log warning if both keys present
5 typeof === 'function' guards on locally-declared functions are dead code Drop in a follow-up
6 _renderUserFencedBlocks XSS surface — verified clean (allowlist regex, esc() everywhere, no HTML injection vector) No action
7 quote(WEBUI_VERSION, safe="") correctness — verified handles all git describe outputs (tags, commits, --dirty, unknown) No action
8 finalizeThinkingCard only respects user-expand, not user-collapse — verified the singleton is cleared between turns No action

Final verification

  • ✅ pytest after fix: 3344 passed, 0 failed in 74s (+1 from the regression test)
  • f328f3b pushed to release/v0.50.246

Ready for review.

Copy link
Copy Markdown
Owner

@nesquena nesquena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — end-to-end ✅ (approve with one substantive non-blocking concern on #1341)

Release v0.50.246 integrating 5 PRs. #1338 was previously formally approved by nesquena (review-4206679556) and the squash matches that approved code byte-for-byte (verified below). The other four constituents (#1335, #1337, #1339, #1341) are reviewed here for the first time.

Squash audit (byte-identical to open PRs)

e2d33ff #1341  12+/0  across 2 files       ✅ matches open PR
09e12e3 #1339  18+/12-  across 1 file       ✅ matches open PR
fbe84d2 #1337  140+/28- across 10 files     ✅ matches open PR
1fa740d #1335  161+/1-  across 2 files      ✅ matches open PR
d4b055c #1338  417+/5-  across 3 files      ✅ matches prior approved diff (the
                                                2-line gap vs the open PR's 419+/-5
                                                is the CHANGELOG entries that were
                                                relocated under [v0.50.246] in the
                                                release commit, per the PR body)
50418cd  test stabilizer + #1339 regression — release-branch test commit

Confirmed #1338 carries the approved code

d4b055c matches d90a605 (the head of PR #1338) for the actual code surface:

  • _liveActivityUserExpanded markers in static/ui.js: 6 occurrences in both
  • _onLiveActivityToggle: 2 occurrences in both
  • The # ── Preserve the user's typed message before clearing pending state (#1298) ── comment in api/streaming.py: 1 occurrence in both
  • tests/test_issue1298_cancel_and_activity.py is byte-identical (diff shows 0 lines)

So my prior end-to-end trace for #1338 stands and I'm not re-tracing here.

End-to-end traces (new content)

PR #1335 — render fenced code blocks in user messages (@bergeouss, fixes #1325)

New helper _renderUserFencedBlocks(text) at static/ui.js:60-87. Pipeline:

  1. Match /```([a-zA-Z0-9_+-]*)\n([\s\S]*?)```/g. Each fence becomes <pre><code> (with diff/patch colored-line variant) and is stashed under a \x00UF<idx>\x00 placeholder.
  2. The remainder of the string is escaped via esc() and \n → <br>.
  3. Placeholders are restored from the stash.

Wire-up at static/ui.js:3087: user-message branch in renderMessages() swaps esc(...).replace(/\n/g,'<br>') for _renderUserFencedBlocks(content). Assistant branch unchanged (still renderMd).

Security: the stashed code is esc(code) — already HTML-safe before stashing. The placeholder pattern uses \x00 null bytes which are extremely unlikely in user input but technically possible via JSON-encoded chat input (). If a user types literal \x00UF99\x00, the regex matches and stash[99] is undefined → coerces to literal "undefined" text in output. Not an XSS vector — content of any matched stash entry is already escaped at stash time. Worst case is cosmetic UX (literal "undefined" text). Behavioural test (test_1325_user_fenced_code.py) actually runs the JS via node — confirmed pass.

PR #1337 — Mermaid render error cleanup + PWA cache-bust (@dso2ng)

Two related fixes:

  1. Mermaid temp DOM cleanup at static/ui.js:4138-4151. When mermaid.render(id, code) fails, the lib leaves a body-level <div id="d<id>">…</div> containing a "Syntax error in text" SVG. The previous code never removed this. Fix: document.getElementById('d'+id) is removed on both success AND failure paths. Failed blocks also drop the mermaid-block class so a later render pass can't retry the same broken content.
  2. Tighter mermaid heuristic at static/ui.js:1085-1097: rejects line-numbered tool output (/^\s*\d+\|/.test(firstCodeLine)) and requires the first non-comment line to start with a known mermaid keyword (graph, flowchart, sequenceDiagram, pie, gantt, etc., 30+ keywords).
  3. PWA cache-bust at static/index.html:50,954-964: script tags and sw.js registration get ?v=__WEBUI_VERSION__. The token is server-side substituted in api/routes.py:720-723 and api/routes.py:781-786 — uses urllib.parse.quote(WEBUI_VERSION, safe="") to URL-encode (defensive against unusual git-tag formats). WEBUI_VERSION is read from git describe --tags --always --dirty or api/_version.py (api/updates.py:60-92), so it's server-controlled, not user-input.
  4. Service worker bypass for itself at static/sw.js:65-67: if (url.pathname.endsWith('/sw.js')) return; — returning from the fetch handler lets the browser do a normal network fetch, ensuring the freshly-versioned sw.js is always retrieved.

Cross-tool: webui-only, no agent surface.

PR #1339 — handle list fallback_providers config (@jimdawdy-hub)

Pre-fix at api/streaming.py:1701: _fallback = _cfg.get('fallback_model') or None — if user had fallback_providers: [{...}, {...}] (the chained-fallback form), this returned None. Even if it had returned the list, _fallback.get('model', '') on a list would raise AttributeError.

Post-fix:

  • _fallback = _cfg.get('fallback_model') or _cfg.get('fallback_providers') or None
  • isinstance(_fallback, list) → iterate to first valid entry; isinstance(_fallback, dict) → use directly.

Verified against agent CLI behaviour at /tmp/hermes-agent-fresh/cli.py:2055-2059: the CLI does the same dual-key lookup but in opposite order (fallback_providers first, fallback_model fallback). If a user has BOTH keys set (rare), the CLI will use fallback_providers while the WebUI will use fallback_model. Minor inconsistency, not a bug since this is an edge case (users typically have only one).

Behavioural test (test_pr1339_fallback_providers_list.py) is structural-only (asserts that streaming.py contains the dual-key check via grep). The trace above covers the actual behaviour.

PR #1341 — context_length / threshold_tokens / last_prompt_tokens persistence (@fxd-jason, "#1318 split")

This is the constituent that needs the most attention. The PR enables persistence but does not add the writer, so the user-visible bug it claims to fix is NOT actually fixed by this PR alone.

What the PR does:

Layer Change File:line
Session.__init__ Accepts the 3 fields as named kwargs api/models.py:321-322,347-349
Session.save() Lists them in METADATA_FIELDS api/models.py:369
Session.compact() Exposes them on the dict api/models.py:461-463
/api/session GET Returns them in response api/routes.py:929-931

What the PR does NOT do, and what's needed to actually fix the bug:

$ grep -rn "= .*context_length\|context_length =" api/
api/models.py:347:        self.context_length = context_length    # constructor only
api/streaming.py:2209:                usage['context_length'] = getattr(_cc, 'context_length', 0) or 0
                                       ↑ writes to the SSE usage payload, NOT to session

There is no writer to s.context_length anywhere in the codebase. The PR description claims "These fields are written by streaming.py during context compression" — that's inaccurate. They're written to the SSE usage payload at api/streaming.py:2209-2211, but never to the session itself.

Without a writer, the data flow is:

agent.context_compressor.context_length  (live during streaming)
    ↓ getattr at api/streaming.py:2209
SSE 'done' event → usage payload     (delivered to live client)
    ↓
S.lastUsage in browser memory          (fine while page is open)

  ── page reload here → S.lastUsage cleared ──

GET /api/session → s.context_length    (None on every reload, since no
                                        writer ever populated it)
    ↓
sessions.js:500 _pick(u.context_length, _s.context_length)  → 0 default
    ↓
Context-ring shows 0%                  (the bug, still present)

The CHANGELOG entry at line 16 says "The frontend context-ring indicator was previously losing its percentage on every session load because the Session model silently dropped these fields when reconstructing from disk." That's only true if something was writing to __dict__['context_length'] so save()'s extra dict caught it (see api/models.py:375-377). Without a writer, nothing lands in __dict__ either. The "silently dropped" phrasing implies the data was being written but lost on load — but it was never being written to the Session in the first place.

To actually fix the bug, a 3-line addition is needed near api/streaming.py:2208:

_cc = getattr(agent, 'context_compressor', None)
if _cc:
    usage['context_length'] = getattr(_cc, 'context_length', 0) or 0
    usage['threshold_tokens'] = getattr(_cc, 'threshold_tokens', 0) or 0
    usage['last_prompt_tokens'] = getattr(_cc, 'last_prompt_tokens', 0) or 0
+    # Persist on session so the indicator restores after a page reload (#1318)
+    s.context_length = usage['context_length']
+    s.threshold_tokens = usage['threshold_tokens']
+    s.last_prompt_tokens = usage['last_prompt_tokens']

This sits inside the existing with _agent_lock: block (api/streaming.py:2152) which already owns the session lock during the post-result merge, and the subsequent s.save() would persist them.

I'm not pushing this fix — it's a release-batch PR and the title says "(#1318 split)", explicitly signalling this is a deliberate part-1 component. Adding the writer here would expand scope beyond what fxd-jason / the maintainer chose to land. But the user-visible claim in the CHANGELOG is misleading and a follow-up PR is needed to actually close #1318. Approving as-is because the data-structure changes are safe and load-bearing for whatever follows; the CHANGELOG entry should be tightened to "data-structure scaffolding for #1318 — writer in follow-up".

Cross-PR interactions

Three PRs touch static/ui.js (#1335, #1337, #1338) and one touches api/streaming.py (#1339, #1338). Verified disjoint regions:

  • #1335 — adds _renderUserFencedBlocks near line 60 + swap at line ~3087 (renderMessages user branch).
  • #1337 — modifies renderMd mermaid heuristic near line 1085 + renderMermaidBlocks near line 4135.
  • #1338 — adds _liveActivityUserExpanded tracker near line 2545 + ensureActivityGroup re-create branch near 2562 + finalizeThinkingCard near 4180.
  • #1339 — modifies streaming.py:1701-1722 (fallback resolution).
  • #1338 — modifies streaming.py:2576-2627 (cancel_stream recovery block).

No overlap. Tests pass.

Cross-tool consistency

  • #1339 fallback_providers: agent CLI handles both keys (cli.py:2055-2059). WebUI now consistent.
  • #1341 context_length / threshold_tokens / last_prompt_tokens: webui-only fields, surfaced from agent's compressor via SSE; no agent-side persistence implication. Cross-tool clean.
  • #1337 cache-bust: webui-only, no agent surface.
  • #1335 user fenced code: webui rendering only, no agent surface.
  • #1338 cancel recovery: webui session state only, no agent surface (pending_user_message is a WebUI-only field).

Security audit

  • No new endpoints, auth gates, or file-serving code paths.
  • No eval/exec/Function()/shell=True in the diff.
  • No secrets/tokens in the diff.
  • #1335 user fenced code: stashed code is escaped before stashing; null-byte placeholder pattern is XSS-safe (worst case is "undefined" text rendering).
  • #1337 cache-bust token substitution: urllib.parse.quote(WEBUI_VERSION, safe="") defensively URL-encodes; WEBUI_VERSION is server-controlled (git describe / build-time file).
  • #1339 fallback list iteration: isinstance guards on every step prevent type confusion.
  • #1338: re-confirmed against my prior approval.
  • #1341: data-structure-only, no security surface.

Edge-case matrix

# Scenario Expected Actual
1335 Fence-only user message Renders as <pre><code> with language class ✅ harness
1335 Plain text + fence Plain text escaped (no markdown), fence rendered ✅ harness
1335 Diff/patch language Per-line +/- coloring ✅ harness
1335 User types \x00UF0\x00 literally No XSS, just literal text ✅ stash content already escaped
1337 Valid mermaid block Renders SVG, removes temp DOM ✅ trace
1337 Invalid mermaid (syntax error) Falls back to code block, removes temp DOM ✅ trace
1337 Line-numbered tool output marked mermaid Now skipped (heuristic) ✅ trace
1337 New release deployed Browser fetches fresh assets via ?v= ✅ trace
1339 fallback_model: {model: ..., provider: ...} (legacy) Used as-is ✅ dict branch
1339 fallback_providers: [{model: ...}, {...}] (chain) First valid entry used ✅ list branch
1339 fallback_providers: [] (empty list) _fallback_resolved = None ✅ no valid entries
1339 Both keys set fallback_model wins (legacy-first) ⚠️ CLI prefers fallback_providers — minor inconsistency
1341 New session, no streaming yet Fields = None
1341 After streaming, in-memory session Fields = None still (no writer) ❌ user-visible bug NOT fixed
1341 Page reload, session loaded from disk Fields = None still (nothing was saved) ❌ user-visible bug NOT fixed

Tests

  • PR-specific tests across 10 files: 155/155 pass
  • Full suite: 3291 passed, 54 skipped, 3 xpassed, 0 failed in 15.75s (3348 collected)
  • The PR description claims "3343 passed" — close enough; the discrepancy is consistent with skipped/xpassed/subtests being counted differently across runs.
  • test_pr1339_fallback_providers_list.py is structural (regex-based) but the trace covers the runtime behaviour.

Minor observations (non-blocking)

  • #1341 is data-structure-only: see #1341 trace above. Approving on the basis that the persistence scaffolding is harmless and load-bearing for whatever follows; CHANGELOG entry should ideally be reworded to make clear the writer is still pending. The user-visible bug ("context-ring shows 0% after reload") is NOT fixed by this PR alone.
  • #1339 ordering inconsistency: WebUI tries fallback_model first, CLI tries fallback_providers first. If a user has both set, behaviour differs. Recommend matching the CLI's preference order in a follow-up. Not a bug since both keys aren't supposed to coexist.
  • #1335 null-byte placeholder collision: literal \x00UF<N>\x00 in user input could trigger a placeholder restore for an unrelated stash index. Not exploitable (output is escaped either way) but a hardened version could use a Math.random() salt in the placeholder. Cosmetic edge case.
  • PR description's test count 3343 passed vs my local 3291 passed — a 52-test gap. Probably environment differences (some tests skip in non-CI runs); 0 failures is what matters.
  • CHANGELOG line 16 — "silently dropped these fields when reconstructing from disk" — overstates the fix. Without a writer, nothing was ever written to disk in the first place. See #1341 trace above.

Recommendation

Approved. Five constituent traces clean (security-safe, behaviorally correct, no cross-tool regressions). #1338 squash matches the previously-approved code byte-for-byte. The flagged #1341 gap (no writer) is a documentation/scope issue, not a correctness regression — the persistence scaffolding is safe to merge and unblocks the follow-up writer PR. Approval is conditional on a follow-up landing the writer (3 lines in streaming.py near line 2208) before claiming #1318 fully fixed.

Parked at approval — ready for the release agent's merge/tag pipeline.

Pre-release Opus + nesquena review on v0.50.246 caught that PR #1341
added the data-structure scaffolding (Session.__init__ accepts the 3
fields, save() persists them, compact() exposes them, GET /api/session
returns them) but did NOT add the writer that actually populates them.

Without a writer, the user-visible bug (context-ring shows 0% after
page reload) was NOT fixed by #1341 alone — the fields stayed None
forever because nothing wrote to s.context_length anywhere.

Adds the writer at api/streaming.py:2188 (post-merge per-turn save block,
before s.save()) so the values from agent.context_compressor land on
disk and survive page reloads.

Also moves the SSE usage payload comment to clarify that the live SSE
payload and the session-level persistence are now distinct paths
(payload below, persistence above).

Adds tests/test_pr1341_context_window_persistence.py — 6 structural +
round-trip tests covering Session __init__/save/compact, the routes
response, and the streaming.py writer placement.

Closes #1318 (the actual user-visible bug, not just the scaffolding).
@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

Update — addressed nesquena's #1341 concern in this release

Nathan said "fix small things ourselves" and your review explicitly recommended:

To actually fix the bug, a 3-line addition is needed near api/streaming.py:2208 (...)
Approval is conditional on a follow-up landing the writer

Rather than punt to a follow-up, I landed the writer in this release as a5c10d5. The branch went from "scaffolding-only #1341" to "fully fixes #1318 end-to-end."

Two new commits since your APPROVED review

SHA Title Lines
0d599b4 fix(streaming): persist context_length on session — completes #1318 fix +155/-1
a5c10d5 (amended into above — adds test ordering fix + CHANGELOG tightening)

What the writer does

Added inside the post-merge per-turn save block at api/streaming.py:2188, before s.save():

# Persist context window data on the session so the context-ring
# indicator survives a page reload (#1318). Must run BEFORE
# s.save() for the same reason as the reasoning trace above.
_cc_for_save = getattr(agent, 'context_compressor', None)
if _cc_for_save:
    s.context_length = getattr(_cc_for_save, 'context_length', 0) or 0
    s.threshold_tokens = getattr(_cc_for_save, 'threshold_tokens', 0) or 0
    s.last_prompt_tokens = getattr(_cc_for_save, 'last_prompt_tokens', 0) or 0

The SSE usage payload block below was preserved unchanged (still emits the same data live to the browser); a comment was added clarifying that the live SSE path and the session-level persistence are now distinct.

CHANGELOG tightening (per your minor observation)

The v0.50.246 entry for #1341 was rewritten to **Context window indicator persists across page reloads (#1318 — fully fixed)** and now explicitly mentions the writer in api/streaming.py. The "silently dropped these fields when reconstructing from disk" phrasing you flagged as overstating the fix is gone.

New regression test: test_pr1341_context_window_persistence.py

6 tests, all green:

  1. test_streaming_persists_context_fields_on_session_before_save — the writer is present before s.save() (anchored on the unique if _reasoning_text and s.messages: marker).
  2. test_session_init_accepts_context_fields__init__ signature includes all 3 fields.
  3. test_session_metadata_fields_includes_context_fieldsMETADATA_FIELDS covers all 3.
  4. test_session_compact_exposes_context_fieldscompact() output dict includes all 3.
  5. test_routes_session_get_returns_context_fields — GET response includes all 3.
  6. test_session_round_trip_persists_context_fields — real round-trip: write fields, save to a tmp_path SESSION_DIR (via monkeypatch.setattr — no module reload, so we don't pollute sys.modules for sibling tests), reload via Session.load(), all 3 values match.

The first round of this test broke test_session_sidecar_repair.py::TestLastResortSyncDelegation because the original implementation del'd api.models from sys.modules, which corrupted shared module state for tests that ran after it. Switched to monkeypatch.setattr(models, "SESSION_DIR", ...) — purely local to the test, no global state touched.

Test count

  • Pre-fix gate: 3344 passed
  • Post-writer gate: 3350 passed, 0 failed in 76.65s (+6 new tests, no regressions)

Re-requesting review

Since this is a substantive code addition post-approval, the prior approval's caveat ("conditional on a follow-up") is now moot — the writer landed here, not in a follow-up. Re-requesting review so you can confirm the writer placement and the round-trip semantics are correct. CI is also re-running on a5c10d5.

Copy link
Copy Markdown
Owner

@nesquena nesquena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followup Review — both flagged concerns addressed ✅ (clean re-approve)

This is a followup to my prior approval (4206870751). Two new commits landed on the release branch since then, and they directly address the two concerns my earlier trace flagged:

New commit Fixes Origin
f328f3b Cancel-stream substring-guard false positive Pre-release Opus review
a5c10d5 #1341 missing writer (the user-visible bug NOT fixed before) My earlier review observation

Both fixes traced cleanly. Re-approving.


f328f3b — gate substring guard on pending_started_at timestamp

The concern. PR #1338 added a content-match guard in cancel_stream() to avoid double-appending the user's typed turn when the streaming thread had already merged it. The guard was symmetric:

if _pending_user in _last_content or _last_content in _pending_user:
    _already_persisted = True

The _last_content in _pending_user direction is a false-positive vector. If the prior user turn was a short confirmation ("ok"), and the cancelled turn is a longer follow-up ("ok please continue"), then "ok" in "ok please continue" is True → guard fires → synthesis is skipped → user's "ok please continue" is lost. Exactly the data-loss bug #1298 was supposed to fix.

The fix. Two changes at api/streaming.py:2623-2643:

  1. Add a timestamp gate: read _pending_started = _cs.pending_started_at and _last_ts = _last_user.get('timestamp'); only treat the latest user turn as "already merged" if _last_ts >= _pending_started. An earlier turn whose content happens to be a substring of the new pending must NOT short-circuit synthesis.
  2. Drop the symmetric in direction: replace _pending_user in _last_content or _last_content in _pending_user with _pending_user == _last_content or _pending_user in _last_content. Keeps equality (exact match) and prefix-tolerance (the streaming thread prepends [Workspace: ...] which makes _last_content a strict superset of _pending_user), drops the false-positive direction.

Race trace under the new guard:

Cancel timing _pending_user _last_content (latest user msg) _last_ts vs _pending_started isinstance str content match Result
Beats merge, prior was "ok" "ok please continue" "ok" T1 < T2 → fail str → True n/a Synthesize ✓
Loses race, plain text "ok please continue" "ok please continue" T2 == T2 → pass str equality True Skip ✓
Loses race, workspace prefix "ok please continue" "[Workspace: /tmp]\nok please continue" T2 == T2 → pass str _pending_user in _last_content True Skip ✓
Loses race, multimodal "describe this" [{type: 'text', ...}, {type: 'image_url', ...}] T2 == T2 → pass list → False n/a Synthesize ✓ — but only if _pending_user is set, which the streaming thread already cleared (under same lock as merge), so the outer if _pending_user and ... short-circuits first → no double-append ✓
Legacy turn with no timestamp "ok please continue" "ok" 0 < T2 → fail str → True n/a Synthesize ✓ — defaults to safe behavior (avoid data loss)

All scenarios safe. The new test test_cancel_synthesizes_when_prior_turn_content_is_substring_of_pending locks the regression. ✅


a5c10d5 — adds the missing writer for #1341

The concern from my prior review: PR #1341 added the Session data structure (constructor params, METADATA_FIELDS, compact(), /api/session route) but did NOT add the writer in streaming.py. Without a writer, s.context_length stayed None forever and the post-reload context-ring still showed 0% — so the user-visible bug was NOT fixed by #1341 alone.

The fix. New 5-line writer block at api/streaming.py:2188-2198, placed BEFORE the existing s.save() at line 2199:

# Persist context window data on the session so the context-ring
# indicator survives a page reload (#1318). Must run BEFORE
# s.save() for the same reason as the reasoning trace above.
_cc_for_save = getattr(agent, 'context_compressor', None)
if _cc_for_save:
    s.context_length = getattr(_cc_for_save, 'context_length', 0) or 0
    s.threshold_tokens = getattr(_cc_for_save, 'threshold_tokens', 0) or 0
    s.last_prompt_tokens = getattr(_cc_for_save, 'last_prompt_tokens', 0) or 0
s.save()

End-to-end trace verified:

  1. Block sits inside with _agent_lock: (opened at api/streaming.py:1969). Same lock as all other session mutations. ✅
  2. Runs after the result-merge and BEFORE s.save() — so the values are in __dict__ when save serializes. ✅
  3. getattr(_cc_for_save, '...', 0) or 0 defends against missing fields and explicit None from older agent builds. ✅
  4. The values now flow: agent.context_compressors.context_length (in-memory) → s.save() writes to disk via METADATA_FIELDS round-trip → Session.load() reads them back via the named kwargs from #1341/api/session GET returns them → frontend _pick(u.X, _s.X) falls back to _s.X after page reload (when S.lastUsage is empty). ✅
  5. The SSE usage payload below at line 2207-2211 still captures the same values — live streaming clients still get them in real time. The new writer is the persistence path that survives reload. The two paths are now distinct, with comments updated to make that clear.

Lock safety: identical to other session writers in the same with _agent_lock: block (s.tool_calls, s.active_stream_id = None, s.input_tokens += ...). No concurrent reader can race because both readers (HTTP /api/session handler, cancel_stream) acquire _get_session_agent_lock(sid) for the same lock. Safe. ✅

Round-trip test verified: test_session_round_trip_persists_context_fields actually saves a Session with the fields set, calls Session.load(sid), asserts the values come back identical (200000 / 180000 / 45123). Real round-trip, not regex. ✅

CHANGELOG entry rewritten to remove the misleading "silently dropped these fields" phrasing and credit the writer addition. The new entry honestly reflects that #1341 was scaffolding-only and the writer was added during pre-release review.


Tests after these followup fixes

  • tests/test_issue1298_cancel_and_activity.py10/10 pass (added test_cancel_synthesizes_when_prior_turn_content_is_substring_of_pending)
  • tests/test_pr1341_context_window_persistence.py6/6 pass (5 structural + 1 round-trip)
  • Full suite: 3298 passed, 54 skipped, 3 xpassed, 0 failed in 15.59s (up from 3291 in my prior run; +7 new tests across these two commits)

Edge-case re-check

Scenario Before f328f3b After f328f3b
Prior turn "ok", new turn "ok please continue", cancelled ❌ data loss (substring guard short-circuited synthesis) ✅ synthesis fires (timestamp gate detects T1 < T2)
Same content as prior, cancelled ✅ no double-append (equality match still fires) ✅ no double-append (equality kept)
Same content with workspace prefix, cancelled ✅ no double-append (prefix tolerance) ✅ no double-append (_pending_user in _last_content kept)
Multimodal user turn, cancelled mid-merge ✅ short-circuits via pending_user_message=None cleared under same lock ✅ same
Legacy turn with no timestamp field n/a ✅ defaults to synthesize (safe — no data loss)
Scenario Before a5c10d5 After a5c10d5
Streaming turn completes, page kept open ✅ live SSE updates context ring ✅ same (unchanged)
Streaming turn completes, page reloaded ❌ context ring shows 0% (fields never persisted) ✅ context ring restores from /api/session
Multiple turns over a session lifetime ❌ post-reload always 0% ✅ shows last-known values
Older agent build with no context_compressor ✅ skipped (no SSE values either) ✅ skipped (if _cc_for_save: guards)
context_compressor exists but values are 0 ✅ SSE shows 0 ✅ persisted as 0 — same

Recommendation

Approved (followup). Both new commits trace cleanly, address the exact concerns raised in pre-release review, and have the right shape:

  • f328f3b removes the false-positive guard direction AND adds a timestamp gate as belt-and-suspenders. Regression test locks the substring scenario.
  • a5c10d5 adds the missing writer in the right place (inside _agent_lock, before s.save()). Round-trip test verifies persistence end-to-end. CHANGELOG honestly reflects the scope of what was actually fixed.

#1318 is now actually fixed (not just scaffolded). #1298 substring guard hardened against the false-positive case.

Parked at approval — ready for the release agent's merge/tag pipeline.

@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

Shipped — v0.50.246 ✅

Merge: dec4b486 (2026-04-30 16:49 UTC)
Tag: v0.50.246
Production restarted on port 8787, health check OK, cache-bust live (?v=v0.50.246 injected on every static asset).

Constituent PRs closed with credit

# Author Subject
#1335 @bergeouss feat(chat): render fenced code blocks in user messages (#1325)
#1337 @dso2ng fix(ui+pwa): mermaid render error cleanup + cache-busting on static asset URLs
#1338 nesquena-hermes fix(streaming+ui): preserve user message on cancel + persist activity-panel expand state (#1298)
#1339 @jimdawdy-hub fix(streaming): handle list fallback_providers config
#1341 @fxd-jason fix(models): persist context_length / threshold_tokens / last_prompt_tokens (#1318 fully fixed)

Issues closed

Held with detailed feedback

Closed as superseded

Pre-release gate timeline

  1. Initial squash + tests: 3343 passed
  2. Opus pre-release review caught the cancel_stream substring guard false-positive — fix landed in f328f3b
  3. nesquena APPROVED with one substantive flag (fix: persist context_length/threshold_tokens/last_prompt_tokens in Session model #1341 had no writer)
  4. Writer added in a5c10d5 — converted scaffolding-only fix: persist context_length/threshold_tokens/last_prompt_tokens in Session model #1341 into a complete fix: context window indicator not updating + responsive sidebar + chat area width at 769px #1318 fix
  5. nesquena re-APPROVED on the new commits
  6. Final pytest: 3350 passed, 0 failed in 76.65s
  7. CI: 3.11 / 3.12 / 3.13 all green
  8. Browser sanity verified: fenced code XSS-safe, cache-bust live, Mermaid stricter detection live

Thanks for the thorough end-to-end review on both passes — catching that #1341 was scaffolding-only saved the user-visible bug from staying broken until a follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Activity panel collapses during streaming and latest user message disappears after Stop

2 participants