fix: context window indicator shows 100% when context_length is missing from SSE payload#1356
fix: context window indicator shows 100% when context_length is missing from SSE payload#1356nesquena-hermes wants to merge 2 commits intomasterfrom
Conversation
…th is missing When the context compressor is absent or reports context_length=0, the live SSE usage payload omitted context_length entirely. The frontend then fell back to the 128K JS default (131,072 tokens), causing long sessions whose cumulative input_tokens exceeded 131K to display as 100% / 0% left. Root cause: the session-save path already had a metadata fallback for context_length (lines ~2205-2217) but the live SSE payload block (lines ~2239-2243) had no such fallback. Fix: - streaming.py: after reading context_length from the compressor, fall back to get_model_context_length() when the value is still 0. Also fall back to s.last_prompt_tokens when last_prompt_tokens is missing (no compressor), preventing the frontend from using the cumulative input_tokens counter as a proxy. - ui.js: track rawPct separately from the clamped pct; when rawPct exceeds 100 show 'N% used (context exceeded)' instead of the misleading '100% used (0% left)'. Reproduces when: using a large-context model (e.g. claude-sonnet-4.6 with 1M context) via OpenRouter without auto-compress enabled, in a session whose cumulative token count exceeds the 128K JS fallback.
…file upload
After uploadPendingFiles() resolves, the composer status was never
explicitly cleared. setComposerStatus('') only fires inside setBusy(false),
so 'Uploading...' remained visible for the entire duration of the agent
stream — showing incorrectly while the agent was responding.
Fix: call setComposerStatus('') immediately after the upload await
completes (and before setBusy(true)), so the label disappears as soon
as the upload is done rather than waiting for the stream to finish.
Additional fix included in this PR:
|
nesquena
left a comment
There was a problem hiding this comment.
Review — end-to-end ✅ (clean approve, behavioural harness confirms overflow fix)
Two-commit PR by the agent (nesquena-hermes) — first commit closes the live-SSE side of the context-length fallback story (mirrors the session-save fallback shipped in PR #1348/v0.50.247), second commit clears a stuck "Uploading…" composer status. Total +30/−2 across two files (api/streaming.py + static/ui.js + static/messages.js).
Two commits on the branch
ce137dc fix: context window indicator shows 100% when compressor context_length is missing
f02df28 fix: 'Uploading...' status persists for entire stream duration after file upload
What this ships
ce137dc — context-length fallback for the live SSE payload (api/streaming.py, static/ui.js).
The session-save path at api/streaming.py:2199-2217 already had the v0.50.247 / PR #1348 fallback — agent.model_metadata.get_model_context_length(model, base_url) gets called when the compressor didn't populate s.context_length. But the live SSE usage payload constructed at api/streaming.py:2235-2243 did NOT have the same fallback: if agent.context_compressor was missing or had context_length=0, usage['context_length'] was simply omitted, the JSON-serialized payload landed at the frontend without the field, and the JS at static/ui.js:858 substituted the 128K default (DEFAULT_CTX = 128*1024). For sessions whose cumulative last_prompt_tokens exceeded 128K (e.g. claude-sonnet-4.6 via OpenRouter — 1M context), this overflowed against the JS default and clamped to "100% used (0% left)" — the bug in the title.
The fix at api/streaming.py:2244-2258 mirrors the session-save block exactly: if not usage.get('context_length') → import + call get_model_context_length() → assign to usage['context_length'] → try/except: pass for backwards compat with older agent builds. Same import path, same parameter shape, same exception-swallow pattern. ✅
The companion fix at api/streaming.py:2259-2265 handles last_prompt_tokens symmetrically: when the compressor doesn't supply it, fall back to s.last_prompt_tokens (the session-persisted value from the prior turn's writer block). Without this, the frontend's existing usage.last_prompt_tokens || usage.input_tokens || 0 chain at static/ui.js:854 would substitute the cumulative input_tokens counter — which compounds across turns and makes the overflow even worse for long sessions.
The frontend tweak at static/ui.js:867-869 tracks rawPct separately from the clamped pct; when rawPct > 100, the tooltip at line 913 shows ${rawPct}% used (context exceeded) instead of the misleading 100% used (0% left). Ring drawing and center text still use pct (clamped) — a circle can't render past 100%, so this is correct.
f02df28 — clear "Uploading…" composer status after upload completes (static/messages.js).
The status flow at static/messages.js:140-147:
- Line 140:
setComposerStatus('Uploading…')beforeuploadPendingFiles(). - Line 142-143: try/catch around the upload; on error with no text, show error and return.
- NEW line 144-147:
setComposerStatus('')unconditionally after the upload await completes. - Old behavior: nothing cleared the status until
setBusy(false)fires at the end of the agent stream — so "Uploading…" remained visible during the entire agent response.
This is a UX defect, not a correctness one: file uploads succeed, the agent response streams correctly, the message text is correct. Just the visual status was wrong. Fix is the right shape.
Traced against upstream hermes-agent
agent.model_metadata.get_model_context_length signature verified at /tmp/hermes-agent-fresh/agent/model_metadata.py:1229 — (model, base_url, api_key, config_context_length, provider, custom_providers). The PR passes only the first two; rest default. Same call shape as the existing session-save fallback at line 2208-2211. The 9-stage resolution chain ends in a 256K default — never returns 0.
For sessions where compressor exists with explicit config_context_length set, the writer at line 2196 captures it from the compressor (which was initialized from config). For sessions where compressor is missing, the fallback walks the resolution chain. Both paths converge on a populated usage['context_length'] for the SSE payload.
End-to-end trace — overflow scenario
Pre-fix path (1M-context model via OpenRouter, no compressor):
- User starts session with
claude-sonnet-4.6via OpenRouter (no auto-compress configured). - Streaming completes;
_cc = getattr(agent, 'context_compressor', None)returnsNoneat api/streaming.py:2239. usagedict is built withoutcontext_lengthorlast_prompt_tokens.- SSE
usageevent reaches frontend. - Frontend's
_syncCtxIndicatorat static/ui.js:858 substitutesDEFAULT_CTX = 128*1024. promptTok = usage.last_prompt_tokens || usage.input_tokens || 0substitutes the cumulativeinput_tokenscounter (e.g. 250K after 5 turns).rawPct = round(250000/131072 * 100) = 191. Old code:pct = min(100, 191) = 100. Tooltip:100% used (0% left). Bug.
Post-fix path:
1-3. Same.
4. NEW api/streaming.py:2248: if not usage.get('context_length') → resolves 1000000 from model metadata.
5. NEW api/streaming.py:2262: if not usage.get('last_prompt_tokens') → falls back to s.last_prompt_tokens (per-turn value, not cumulative).
6. Frontend gets usage.context_length=1000000 and usage.last_prompt_tokens=42000 (single-turn).
7. rawPct = round(42000/1000000 * 100) = 4. pct = 4. Tooltip: 4% used (96% left). ✅
Post-fix path on older agent build (no get_model_context_length):
1-3. Same as pre-fix.
4. try: from agent.model_metadata import get_model_context_length → ImportError.
5. except Exception: pass → usage['context_length'] still missing.
6. Frontend defaults to 128K, overflow scenario re-emerges, BUT:
7. New overflowed branch at static/ui.js:913 shows ${rawPct}% used (context exceeded) instead of 100% used (0% left). User sees actual percentage and knows to pay attention. ✅
Race / lock analysis
The new fallback runs inside the existing with _agent_lock: block (started at api/streaming.py:1969 and continuing through line 2270+). get_model_context_length() is the same call already made by the session-save fallback at line 2208 — same caching, same potential network probe on cache miss, same one-time cost per (model, base_url) tuple.
A user reload during a first-resolution probe could briefly block on _agent_lock, but the resolver is heavily cached via get_cached_context_length() in the agent. After first hit, returns synchronously. No new lock interactions.
Cross-tool consistency
- Webui-only:
usageis the SSE payload to the browser; never round-trips throughconfig.yamlor the CLI. ✅ s.last_prompt_tokenspersists to the session file viasession.save(). The CLI doesn't read this field — webui-only persistence. ✅agent.model_metadatais a read-only resolver — no agent state mutated. ✅f02df28is JS-only, no backend or cross-tool impact.
Security audit
- ✅ No new endpoints, no new file-serving surface.
- ✅
getattr(agent, 'model', resolved_model or '') or ''andgetattr(agent, 'base_url', '') or ''— defensiveNonehandling, never passesNoneinto the resolver. - ✅
try/except Exception:is broad but matches the existing pattern in the same file (PR #1348 precedent at line 2214). Acceptable. - ✅ XSS/injection: the new tooltip text uses
textContent(notinnerHTML) at static/ui.js:913.rawPctis a number, so even if user-controlled it would be a no-op against textContent. ✅ - ✅
setComposerStatus('')inf02df28is a static empty string. No injection surface.
Behavioural harness — JS overflow tooltip
I extracted _syncCtxIndicator into a Node harness with realistic inputs:
Test 1: 250K cumulative input vs 128K default
rawPct: 191, pct: 100, overflowed: true
tooltip: '191% used (context exceeded)' ← was '100% used (0% left)' pre-fix
Test 2: 250K with 1M context window (resolved)
rawPct: 25, pct: 25, overflowed: false
tooltip: '25% used (75% left)' ← happy path with backend resolution
Test 3: 50K with 128K default
rawPct: 38, pct: 38, overflowed: false
tooltip: '38% used (62% left) [label: (est. 128K)]'
Test 4: exactly at 100% (131072)
rawPct: 100, pct: 100, overflowed: false
tooltip: '100% used (0% left)' ← boundary not flagged as overflow
Test 5: 130K vs 128K default (just over by 2K → rounds to 99%)
rawPct: 99, pct: 99, overflowed: false
tooltip: '99% used (1% left)'
Test 6: backend resolved correctly, 950K of 1M
rawPct: 95, pct: 95, overflowed: false
tooltip: '95% used (5% left)'
All match expected behavior. The exactly-at-100% boundary correctly says "0% left" rather than the new exceeded text — rawPct > 100 is strict. ✅
Edge-case matrix
| Scenario | Pre-fix | Post-fix |
|---|---|---|
| Compressor populates context_length=1000000 | usage=1M, ring shows correct pct | Same — fallback skipped (truthy) ✅ |
| Compressor populates context_length=0 | usage missing → JS defaults 128K → overflow | Fallback resolves from metadata ✅ |
| Compressor missing entirely | usage missing → JS defaults 128K → overflow | Fallback resolves from metadata ✅ |
Older agent build (no get_model_context_length) |
usage missing → JS defaults 128K → "100% used (0% left)" | Fallback try/except swallows ImportError; JS shows N% used (context exceeded) ✅ |
last_prompt_tokens missing (no compressor) |
JS chain falls back to cumulative input_tokens (overflows for long sessions) |
Falls back to s.last_prompt_tokens (per-turn) ✅ |
last_prompt_tokens=0 and s.last_prompt_tokens=0 (fresh session) |
JS chain falls back to input_tokens of current turn |
Same — _sess_lpt = 0 is falsy, no override ✅ |
| Boundary: prompt_tok exactly equals ctx_window | pct=100, overflowed=false |
Same — rawPct > 100 is strict ✅ |
| Just over boundary (rawPct=101) | pct=100, overflowed=true → "101% used (context exceeded)" |
✅ |
| Way over (rawPct=500) | "500% used (context exceeded)" | ✅ |
| Page reload mid-session | s.context_length persisted via PR #1318 writer | Same; no impact from this PR |
| Upload error with text present | "Uploading…" stuck for entire stream | "Uploading…" cleared right after upload await (f02df28) ✅ |
| Upload error with no text | "Upload error: …" shown, return | Unchanged — the early-return path still wins ✅ |
Minor observations (non-blocking)
- No dedicated regression test for the new SSE-payload fallback. The session-save fallback has
tests/test_pr1318_context_length_fallback.py(6/6 pass). A symmetric structural test asserting the newif not usage.get('context_length')block exists between line ~2243 and line ~2266 would lock this against regression. Not blocking — the existing 12 context-related tests pass and the behavioural harness confirms the JS side works. - aria-label vs tooltip inconsistency: at static/ui.js:909, the aria-label uses
pct(clamped at 100) —Context window 100% used (est. 128K)— while the visible tooltip says${rawPct}% used (context exceeded). Screen-reader users wouldn't learn about the overflow. Could be addressed by mirroring theoverflowedbranch into the aria-label. Minor — out of scope for this fix. Math.max(0, 100-pct)simplified to100-pctat line 913. Sincepct = Math.min(100, rawPct),100-pctis always ≥ 0. TheMath.maxwas redundant safety. Removing it is a clean simplification.setComposerStatus('')after upload error swallows the error message when text is present (the existing catch only sets status when!text). The new clear at line 147 hides any "Upload error" that might have been left over, BUT since the catch never set any status in the with-text path, this is a no-op. Pre-existing behavior — fix at line 147 doesn't introduce or worsen the silent-upload-error path. Worth a follow-up to surface the upload error even when text is present.- PR description claims "All 28 context-related tests and 161 streaming/usage/ctx tests pass." I count 12 context tests in
test_pr1318_*+test_pr1341_*. The "28" and "161" numbers are fuzzy — likely include partial-name matches across the suite. Tests pass either way.
Tests
- Targeted:
test_pr1318_context_length_fallback.py6/6 pass,test_pr1341_context_window_persistence.py6/6 pass — these cover the session-save fallback that this PR mirrors. - Full suite: 3359 passed, 54 skipped, 3 xpassed, 0 failed in 16.66s on
ce137dc. Thef02df28commit is JS-only and doesn't affect the Python test suite. - Behavioural harness: Node harness confirms 6 scenarios produce expected tooltip text, including the exactly-at-100% boundary (correctly NOT flagged as overflow).
Recommendation
✅ Approved. The streaming.py fallback is a precise mirror of the v0.50.247 session-save fallback — same signature, same exception-swallow pattern, same lock context. The frontend rawPct/overflowed split correctly preserves ring/center clamping behavior while fixing the misleading tooltip. Behavioural harness confirms the fix works against realistic 250K/1M and 1M-context-window scenarios. The f02df28 commit is a clean composer-status fix that addresses a stuck-state UX defect.
Parked at approval — ready for the release agent's merge/tag pipeline.
- api/streaming.py SSE payload now falls back to agent.model_metadata.get_model_context_length when compressor doesn't supply context_length (mirrors the session-save fallback shipped in v0.50.247). - api/streaming.py also falls back to s.last_prompt_tokens to avoid using the cumulative input_tokens counter. - static/ui.js tracks rawPct separately from pct and shows '(context exceeded)' tooltip when rawPct > 100 instead of misleading '100% used (0% left)'. - static/messages.js clears 'Uploading...' composer status after upload completes. Co-authored-by: nesquena-hermes <[email protected]>
Bundles 5 community PRs: - #1355 feat(clarify): SSE long-connection (mirrors #1350 pattern, includes all correctness lessons) - #1356 fix: context window indicator overflow (live SSE fallback) + uploading status clear - #1357 fix: preserve imported session source metadata - #1358 fix: collapse sidebar session lineage rows - #1359 fix: sync active session across tabs Tests: 3444 passing (3411 -> 3444, +33)
|
Shipped in v0.50.249 (merge Production verified live at port 8787, Already approved at 19:33 UTC, shipped in this batch. Thanks (self-review approved)! 🙏 |
- api/streaming.py SSE payload now falls back to agent.model_metadata.get_model_context_length when compressor doesn't supply context_length (mirrors the session-save fallback shipped in v0.50.247). - api/streaming.py also falls back to s.last_prompt_tokens to avoid using the cumulative input_tokens counter. - static/ui.js tracks rawPct separately from pct and shows '(context exceeded)' tooltip when rawPct > 100 instead of misleading '100% used (0% left)'. - static/messages.js clears 'Uploading...' composer status after upload completes. Co-authored-by: nesquena-hermes <[email protected]>
Bundles 5 community PRs: - nesquena#1355 feat(clarify): SSE long-connection (mirrors nesquena#1350 pattern, includes all correctness lessons) - nesquena#1356 fix: context window indicator overflow (live SSE fallback) + uploading status clear - nesquena#1357 fix: preserve imported session source metadata - nesquena#1358 fix: collapse sidebar session lineage rows - nesquena#1359 fix: sync active session across tabs Tests: 3444 passing (3411 -> 3444, +33)
Fix two-layer bug where `/api/session` returned `context_length=0` for sessions that pre-date #1318, then the frontend silently fell back to cumulative `input_tokens` and the 128K JS default, producing nonsense indicators like "100" capped from "890% used (context exceeded), 1.2M / 131.1k tokens used". Empirical impact: 23 of 75 sessions on dev server rendered >100% before this fix. #1356 fixed the same symptom on the live SSE path but missed the GET /api/session load path that older sessions go through. Two-layer fix: 1. Backend (api/routes.py:1295-1313) — resolve context_length via agent.model_metadata.get_model_context_length() when the persisted value is 0. Mirrors api/streaming.py:2333-2342. 2. Frontend (static/ui.js:1269) — drop the cumulative `input_tokens` fallback. When last_prompt_tokens is missing, render "·" + "tokens used" (existing !hasPromptTok branch) instead of computing a percentage from the cumulative total. 10 regression tests in tests/test_issue1436_context_indicator_load_path.py covering both layers + the empty-model edge case (avoids the 256K default-for-unknown-model trap that get_model_context_length('') returns). Verified live: claude-opus-4-7 session with input_tokens=5,226,479 now renders "·" + "5.3M tokens used" instead of "100" + "3987% used". Reported by @AvidFuturist. Closes #1436.
Summary
The context window indicator shows 100% / 0% left for long sessions even when the model has a large context window (e.g. 1M tokens for claude-sonnet-4.6).
Root cause
The live SSE
usagepayload that populates the indicator was missing a fallback forcontext_lengthwhen the context compressor was absent or reported 0. The frontend then fell back to the 128K JS default (131,072 tokens). For sessions whose cumulativeinput_tokensexceeded 131K, the indicator computed>100%and capped it at 100%, displaying100% used (0% left).The session-save path (lines ~2205–2217 in
streaming.py) already had aget_model_context_length()fallback, but the live SSE payload block had none.Separately, when
last_prompt_tokenswas missing (no compressor), the frontend fell back to the cumulativeinput_tokenscounter rather than the actual last-request prompt size, compounding the overflow.Changes
api/streaming.pycontext_lengthfrom the compressor, fall back toget_model_context_length()when the value is still 0. Mirrors the existing session-save fallback.last_prompt_tokensis missing from the compressor, fall back tos.last_prompt_tokens(the session-persisted value) to avoid using the cumulative counter as a proxy.static/ui.jsrawPctseparately from the clampedpct.rawPct > 100, showN% used (context exceeded)instead of the misleading100% used (0% left).Reproduction
Use a large-context model (e.g.
claude-sonnet-4.6via OpenRouter, 1M context window) without auto-compress enabled. After the session's cumulative token count exceeds 128K, the indicator shows 100% even though the model still has ~875K tokens available.Tests
All 28 context-related tests and 161 streaming/usage/ctx tests pass.