
fix: context window indicator shows 100% when context_length is missing from SSE payload#1356

Closed
nesquena-hermes wants to merge 2 commits into master from fix/context-window-indicator-overflow

Conversation

@nesquena-hermes
Collaborator

Summary

The context window indicator shows 100% / 0% left for long sessions even when the model has a large context window (e.g. 1M tokens for claude-sonnet-4.6).

Root cause

The live SSE usage payload that populates the indicator was missing a fallback for context_length when the context compressor was absent or reported 0. The frontend then fell back to the 128K JS default (131,072 tokens). For sessions whose cumulative input_tokens exceeded 131K, the indicator computed >100% and capped it at 100%, displaying 100% used (0% left).

The session-save path (lines ~2205–2217 in streaming.py) already had a get_model_context_length() fallback, but the live SSE payload block had none.

Separately, when last_prompt_tokens was missing (no compressor), the frontend fell back to the cumulative input_tokens counter rather than the actual last-request prompt size, compounding the overflow.

Changes

api/streaming.py

  • After reading context_length from the compressor, fall back to get_model_context_length() when the value is still 0. Mirrors the existing session-save fallback.
  • When last_prompt_tokens is missing from the compressor, fall back to s.last_prompt_tokens (the session-persisted value) to avoid using the cumulative counter as a proxy.
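A minimal Python sketch of the two fallbacks described above (the helper name apply_context_fallbacks and the exact surrounding code are assumptions; only the fallback logic mirrors the PR):

```python
def apply_context_fallbacks(usage, agent, s, resolved_model=None):
    """Illustrative sketch: usage is the dict serialized into the SSE
    payload; s is the session object. Names other than the usage keys
    and get_model_context_length are hypothetical."""
    # Fallback 1: resolve context_length from model metadata when the
    # compressor was absent or reported 0.
    if not usage.get('context_length'):
        try:
            from agent.model_metadata import get_model_context_length
            model = getattr(agent, 'model', resolved_model or '') or ''
            base_url = getattr(agent, 'base_url', '') or ''
            ctx = get_model_context_length(model, base_url)
            if ctx:
                usage['context_length'] = ctx
        except Exception:
            pass  # older agent builds lack the resolver; omit the field

    # Fallback 2: prefer the session-persisted last-request prompt size
    # over letting the frontend substitute the cumulative input_tokens.
    if not usage.get('last_prompt_tokens') and getattr(s, 'last_prompt_tokens', 0):
        usage['last_prompt_tokens'] = s.last_prompt_tokens
    return usage
```

Both fallbacks are no-ops when the compressor already populated the fields, so compressor-enabled sessions are unaffected.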

static/ui.js

  • Track rawPct separately from the clamped pct.
  • When rawPct > 100, show N% used (context exceeded) instead of the misleading 100% used (0% left).

Reproduction

Use a large-context model (e.g. claude-sonnet-4.6 via OpenRouter, 1M context window) without auto-compress enabled. After the session's cumulative token count exceeds 128K, the indicator shows 100% even though the model still has ~875K tokens available.

Tests

All 28 context-related tests and 161 streaming/usage/ctx tests pass.

…th is missing

When the context compressor is absent or reports context_length=0, the
live SSE usage payload omitted context_length entirely.  The frontend
then fell back to the 128K JS default (131,072 tokens), causing long
sessions whose cumulative input_tokens exceeded 131K to display as
100% / 0% left.

Root cause: the session-save path already had a metadata fallback for
context_length (lines ~2205-2217) but the live SSE payload block
(lines ~2239-2243) had no such fallback.

Fix:
- streaming.py: after reading context_length from the compressor, fall
  back to get_model_context_length() when the value is still 0.
  Also fall back to s.last_prompt_tokens when last_prompt_tokens is
  missing (no compressor), preventing the frontend from using the
  cumulative input_tokens counter as a proxy.
- ui.js: track rawPct separately from the clamped pct; when rawPct
  exceeds 100 show 'N% used (context exceeded)' instead of the
  misleading '100% used (0% left)'.

Reproduces when: using a large-context model (e.g. claude-sonnet-4.6
with 1M context) via OpenRouter without auto-compress enabled, in a
session whose cumulative token count exceeds the 128K JS fallback.
…file upload

After uploadPendingFiles() resolves, the composer status was never
explicitly cleared. setComposerStatus('') only fires inside setBusy(false),
so 'Uploading...' remained visible for the entire duration of the agent
stream — showing incorrectly while the agent was responding.

Fix: call setComposerStatus('') immediately after the upload await
completes (and before setBusy(true)), so the label disappears as soon
as the upload is done rather than waiting for the stream to finish.
@nesquena-hermes
Collaborator Author

Additional fix included in this PR: Uploading... persists during stream

A second bug was found and fixed in the same branch.

Symptom: Uploading... appears in the composer status bar for the entire duration of an agent stream, even when no upload is in progress.

Root cause: messages.js calls setComposerStatus('Uploading...') before uploadPendingFiles(), but never clears it after the upload resolves. The only place setComposerStatus('') fires is inside setBusy(false) — which runs when the stream finishes, not when the upload finishes. So the label stuck around for the whole response.

Fix: One line — setComposerStatus('') immediately after the uploadPendingFiles() await, before setBusy(true). Also covers the partial-failure path (upload throws with text present) which previously left the label stuck too.
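The ordering fix can be illustrated with a small Python analogue (the real code lives in static/messages.js and is JavaScript; the helper names below mirror the JS helpers and the control flow is a sketch, not the actual implementation):

```python
import asyncio

status = {'text': None}

def set_composer_status(text):
    # Stand-in for the JS setComposerStatus(): records the label shown
    # in the composer status bar.
    status['text'] = text

async def upload_pending_files():
    # Stand-in for the real upload request.
    await asyncio.sleep(0)

async def send_message(text='hello'):
    set_composer_status('Uploading...')
    try:
        await upload_pending_files()
    except Exception:
        if not text:
            set_composer_status('Upload error')
            return
    # NEW: clear the status as soon as the upload resolves, instead of
    # waiting for setBusy(False) at the end of the agent stream.
    set_composer_status('')
    # ... setBusy(True), stream the agent response, setBusy(False) ...

asyncio.run(send_message())
print(status['text'])  # → '' (cleared before the stream starts)
```

Before the fix, the clear only happened inside setBusy(False), so the 'Uploading...' label outlived the upload by the full duration of the stream.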

Owner

@nesquena nesquena left a comment


Review — end-to-end ✅ (clean approve, behavioural harness confirms overflow fix)

Two-commit PR by the agent (nesquena-hermes) — the first commit closes the live-SSE side of the context-length fallback story (mirrors the session-save fallback shipped in PR #1348/v0.50.247), the second clears a stuck "Uploading…" composer status. Total +30/−2 across three files (api/streaming.py, static/ui.js, static/messages.js).

Two commits on the branch

ce137dc  fix: context window indicator shows 100% when compressor context_length is missing
f02df28  fix: 'Uploading...' status persists for entire stream duration after file upload

What this ships

ce137dc — context-length fallback for the live SSE payload (api/streaming.py, static/ui.js).

The session-save path at api/streaming.py:2199-2217 already had the v0.50.247 / PR #1348 fallback — agent.model_metadata.get_model_context_length(model, base_url) gets called when the compressor didn't populate s.context_length. But the live SSE usage payload constructed at api/streaming.py:2235-2243 did NOT have the same fallback: if agent.context_compressor was missing or had context_length=0, usage['context_length'] was simply omitted, the JSON-serialized payload landed at the frontend without the field, and the JS at static/ui.js:858 substituted the 128K default (DEFAULT_CTX = 128*1024). For sessions whose cumulative input_tokens exceeded 128K (e.g. claude-sonnet-4.6 via OpenRouter — 1M context), this overflowed against the JS default and clamped to "100% used (0% left)" — the bug in the title.

The fix at api/streaming.py:2244-2258 mirrors the session-save block exactly: if not usage.get('context_length') → import + call get_model_context_length() → assign to usage['context_length'], all wrapped in try/except: pass for backwards compat with older agent builds. Same import path, same parameter shape, same exception-swallow pattern. ✅

The companion fix at api/streaming.py:2259-2265 handles last_prompt_tokens symmetrically: when the compressor doesn't supply it, fall back to s.last_prompt_tokens (the session-persisted value from the prior turn's writer block). Without this, the frontend's existing usage.last_prompt_tokens || usage.input_tokens || 0 chain at static/ui.js:854 would substitute the cumulative input_tokens counter — which compounds across turns and makes the overflow even worse for long sessions.

The frontend tweak at static/ui.js:867-869 tracks rawPct separately from the clamped pct; when rawPct > 100, the tooltip at line 913 shows ${rawPct}% used (context exceeded) instead of the misleading 100% used (0% left). Ring drawing and center text still use pct (clamped) — a circle can't render past 100%, so this is correct.

f02df28 — clear "Uploading…" composer status after upload completes (static/messages.js).

The status flow at static/messages.js:140-147:

  • Line 140: setComposerStatus('Uploading…') before uploadPendingFiles().
  • Line 142-143: try/catch around the upload; on error with no text, show error and return.
  • NEW line 144-147: setComposerStatus('') unconditionally after the upload await completes.
  • Old behavior: nothing cleared the status until setBusy(false) fires at the end of the agent stream — so "Uploading…" remained visible during the entire agent response.

This is a UX defect, not a correctness one: file uploads succeed, the agent response streams correctly, the message text is correct. Just the visual status was wrong. Fix is the right shape.

Traced against upstream hermes-agent

agent.model_metadata.get_model_context_length verified at /tmp/hermes-agent-fresh/agent/model_metadata.py:1229 with signature (model, base_url, api_key, config_context_length, provider, custom_providers). The PR passes only the first two; the rest default. Same call shape as the existing session-save fallback at lines 2208-2211. The 9-stage resolution chain ends in a 256K default and never returns 0.

For sessions where compressor exists with explicit config_context_length set, the writer at line 2196 captures it from the compressor (which was initialized from config). For sessions where compressor is missing, the fallback walks the resolution chain. Both paths converge on a populated usage['context_length'] for the SSE payload.

End-to-end trace — overflow scenario

Pre-fix path (1M-context model via OpenRouter, no compressor):

  1. User starts session with claude-sonnet-4.6 via OpenRouter (no auto-compress configured).
  2. Streaming completes; _cc = getattr(agent, 'context_compressor', None) returns None at api/streaming.py:2239.
  3. usage dict is built without context_length or last_prompt_tokens.
  4. SSE usage event reaches frontend.
  5. Frontend's _syncCtxIndicator at static/ui.js:858 substitutes DEFAULT_CTX = 128*1024.
  6. promptTok = usage.last_prompt_tokens || usage.input_tokens || 0 substitutes the cumulative input_tokens counter (e.g. 250K after 5 turns).
  7. rawPct = round(250000/131072 * 100) = 191. Old code: pct = min(100, 191) = 100. Tooltip: 100% used (0% left). Bug.

Post-fix path:
1-3. Same.
4. NEW api/streaming.py:2248: if not usage.get('context_length') → resolves 1000000 from model metadata.
5. NEW api/streaming.py:2262: if not usage.get('last_prompt_tokens') → falls back to s.last_prompt_tokens (per-turn value, not cumulative).
6. Frontend gets usage.context_length=1000000 and usage.last_prompt_tokens=42000 (single-turn).
7. rawPct = round(42000/1000000 * 100) = 4. pct = 4. Tooltip: 4% used (96% left). ✅

Post-fix path on older agent build (no get_model_context_length):
1-3. Same as pre-fix.
4. try: from agent.model_metadata import get_model_context_length → ImportError.
5. except Exception: pass → usage['context_length'] still missing.
6. Frontend defaults to 128K, overflow scenario re-emerges, BUT:
7. New overflowed branch at static/ui.js:913 shows ${rawPct}% used (context exceeded) instead of 100% used (0% left). User sees actual percentage and knows to pay attention. ✅

Race / lock analysis

The new fallback runs inside the existing with _agent_lock: block (started at api/streaming.py:1969 and continuing through line 2270+). get_model_context_length() is the same call already made by the session-save fallback at line 2208 — same caching, same potential network probe on cache miss, same one-time cost per (model, base_url) tuple.

A user reload during a first-resolution probe could briefly block on _agent_lock, but the resolver is heavily cached via get_cached_context_length() in the agent. After first hit, returns synchronously. No new lock interactions.

Cross-tool consistency

  • Webui-only: usage is the SSE payload to the browser; never round-trips through config.yaml or the CLI. ✅
  • s.last_prompt_tokens persists to the session file via session.save(). The CLI doesn't read this field — webui-only persistence. ✅
  • agent.model_metadata is a read-only resolver — no agent state mutated. ✅
  • f02df28 is JS-only, no backend or cross-tool impact.

Security audit

  • ✅ No new endpoints, no new file-serving surface.
  • getattr(agent, 'model', resolved_model or '') or '' and getattr(agent, 'base_url', '') or '' — defensive None handling, never passes None into the resolver.
  • try/except Exception: is broad but matches the existing pattern in the same file (PR #1348 precedent at line 2214). Acceptable.
  • ✅ XSS/injection: the new tooltip text uses textContent (not innerHTML) at static/ui.js:913. rawPct is a number, so even if user-controlled it would be a no-op against textContent. ✅
  • setComposerStatus('') in f02df28 is a static empty string. No injection surface.

Behavioural harness — JS overflow tooltip

I extracted _syncCtxIndicator into a Node harness with realistic inputs:

Test 1: 250K cumulative input vs 128K default
  rawPct: 191, pct: 100, overflowed: true
  tooltip: '191% used (context exceeded)'        ← was '100% used (0% left)' pre-fix

Test 2: 250K with 1M context window (resolved)
  rawPct: 25, pct: 25, overflowed: false
  tooltip: '25% used (75% left)'                 ← happy path with backend resolution

Test 3: 50K with 128K default
  rawPct: 38, pct: 38, overflowed: false
  tooltip: '38% used (62% left) [label: (est. 128K)]'

Test 4: exactly at 100% (131072)
  rawPct: 100, pct: 100, overflowed: false
  tooltip: '100% used (0% left)'                 ← boundary not flagged as overflow

Test 5: 130K vs 128K default (just over by 2K → rounds to 99%)
  rawPct: 99, pct: 99, overflowed: false
  tooltip: '99% used (1% left)'

Test 6: backend resolved correctly, 950K of 1M
  rawPct: 95, pct: 95, overflowed: false
  tooltip: '95% used (5% left)'

All match expected behavior. The exactly-at-100% boundary correctly says "0% left" rather than the new exceeded text — rawPct > 100 is strict. ✅
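The harness behaviour is straightforward to reproduce with a Python port of the tooltip math (a sketch assuming, per the description above, Math.round-style rounding and a strict rawPct > 100 overflow check; raw_pct/pct mirror the JS rawPct/pct):

```python
DEFAULT_CTX = 128 * 1024  # the 131,072-token JS fallback

def ctx_tooltip(prompt_tokens, context_length=0):
    ctx = context_length or DEFAULT_CTX
    # Note: Python's round() uses banker's rounding at exact .5, unlike
    # JS Math.round; none of these inputs hit that case.
    raw_pct = round(prompt_tokens / ctx * 100)
    pct = min(100, raw_pct)   # ring and center text still use the clamp
    if raw_pct > 100:         # strict: exactly 100% is not "exceeded"
        return f"{raw_pct}% used (context exceeded)"
    return f"{pct}% used ({100 - pct}% left)"

print(ctx_tooltip(250_000))             # Test 1 → '191% used (context exceeded)'
print(ctx_tooltip(250_000, 1_000_000))  # Test 2 → '25% used (75% left)'
print(ctx_tooltip(50_000))              # Test 3 → '38% used (62% left)'
print(ctx_tooltip(131_072))             # Test 4 → '100% used (0% left)'
print(ctx_tooltip(130_000))             # Test 5 → '99% used (1% left)'
print(ctx_tooltip(950_000, 1_000_000))  # Test 6 → '95% used (5% left)'
```

The six printed tooltips match the harness results above, including the exactly-at-100% boundary not being flagged as overflow.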

Edge-case matrix

  • Compressor populates context_length=1000000. Pre-fix: usage=1M, ring shows correct pct. Post-fix: same, fallback skipped (truthy). ✅
  • Compressor populates context_length=0. Pre-fix: usage missing → JS defaults 128K → overflow. Post-fix: fallback resolves from metadata. ✅
  • Compressor missing entirely. Pre-fix: usage missing → JS defaults 128K → overflow. Post-fix: fallback resolves from metadata. ✅
  • Older agent build (no get_model_context_length). Pre-fix: usage missing → JS defaults 128K → "100% used (0% left)". Post-fix: fallback try/except swallows ImportError; JS shows N% used (context exceeded).
  • last_prompt_tokens missing (no compressor). Pre-fix: JS chain falls back to cumulative input_tokens (overflows for long sessions). Post-fix: falls back to s.last_prompt_tokens (per-turn). ✅
  • last_prompt_tokens=0 and s.last_prompt_tokens=0 (fresh session). Pre-fix: JS chain falls back to input_tokens of the current turn. Post-fix: same; _sess_lpt = 0 is falsy, no override. ✅
  • Boundary: prompt_tok exactly equals ctx_window. Pre-fix: pct=100, overflowed=false. Post-fix: same; rawPct > 100 is strict. ✅
  • Just over boundary (rawPct=101). Post-fix: pct=100, overflowed=true → "101% used (context exceeded)".
  • Way over (rawPct=500). Post-fix: "500% used (context exceeded)".
  • Page reload mid-session. Pre-fix: s.context_length persisted via PR #1318 writer. Post-fix: same; no impact from this PR.
  • Upload error with text present. Pre-fix: "Uploading…" stuck for entire stream. Post-fix: "Uploading…" cleared right after the upload await (f02df28). ✅
  • Upload error with no text. Pre-fix: "Upload error: …" shown, return. Post-fix: unchanged; the early-return path still wins. ✅

Minor observations (non-blocking)

  • No dedicated regression test for the new SSE-payload fallback. The session-save fallback has tests/test_pr1318_context_length_fallback.py (6/6 pass). A symmetric structural test asserting the new if not usage.get('context_length') block exists between line ~2243 and line ~2266 would lock this against regression. Not blocking — the existing 12 context-related tests pass and the behavioural harness confirms the JS side works.
  • aria-label vs tooltip inconsistency: at static/ui.js:909, the aria-label uses pct (clamped at 100) — Context window 100% used (est. 128K) — while the visible tooltip says ${rawPct}% used (context exceeded). Screen-reader users wouldn't learn about the overflow. Could be addressed by mirroring the overflowed branch into the aria-label. Minor — out of scope for this fix.
  • Math.max(0, 100-pct) simplified to 100-pct at line 913. Since pct = Math.min(100, rawPct), 100-pct is always ≥ 0. The Math.max was redundant safety. Removing it is a clean simplification.
  • setComposerStatus('') after upload error swallows the error message when text is present (the existing catch only sets status when !text). The new clear at line 147 hides any "Upload error" that might have been left over, BUT since the catch never set any status in the with-text path, this is a no-op. Pre-existing behavior — fix at line 147 doesn't introduce or worsen the silent-upload-error path. Worth a follow-up to surface the upload error even when text is present.
  • PR description claims "All 28 context-related tests and 161 streaming/usage/ctx tests pass." I count 12 context tests in test_pr1318_* + test_pr1341_*. The "28" and "161" numbers are fuzzy — likely include partial-name matches across the suite. Tests pass either way.

Tests

  • Targeted: test_pr1318_context_length_fallback.py 6/6 pass, test_pr1341_context_window_persistence.py 6/6 pass — these cover the session-save fallback that this PR mirrors.
  • Full suite: 3359 passed, 54 skipped, 3 xpassed, 0 failed in 16.66s on ce137dc. The f02df28 commit is JS-only and doesn't affect the Python test suite.
  • Behavioural harness: Node harness confirms 6 scenarios produce expected tooltip text, including the exactly-at-100% boundary (correctly NOT flagged as overflow).

Recommendation

Approved. The streaming.py fallback is a precise mirror of the v0.50.247 session-save fallback — same signature, same exception-swallow pattern, same lock context. The frontend rawPct/overflowed split correctly preserves ring/center clamping behavior while fixing the misleading tooltip. Behavioural harness confirms the fix works against realistic 250K/1M and 1M-context-window scenarios. The f02df28 commit is a clean composer-status fix that addresses a stuck-state UX defect.

Parked at approval — ready for the release agent's merge/tag pipeline.

This was referenced Apr 30, 2026
nesquena-hermes added a commit that referenced this pull request Apr 30, 2026
- api/streaming.py SSE payload now falls back to agent.model_metadata.get_model_context_length when compressor doesn't supply context_length (mirrors the session-save fallback shipped in v0.50.247).
- api/streaming.py also falls back to s.last_prompt_tokens to avoid using the cumulative input_tokens counter.
- static/ui.js tracks rawPct separately from pct and shows '(context exceeded)' tooltip when rawPct > 100 instead of misleading '100% used (0% left)'.
- static/messages.js clears 'Uploading...' composer status after upload completes.

Co-authored-by: nesquena-hermes <[email protected]>
nesquena-hermes added a commit that referenced this pull request Apr 30, 2026
Bundles 5 community PRs:
- #1355 feat(clarify): SSE long-connection (mirrors #1350 pattern, includes all correctness lessons)
- #1356 fix: context window indicator overflow (live SSE fallback) + uploading status clear
- #1357 fix: preserve imported session source metadata
- #1358 fix: collapse sidebar session lineage rows
- #1359 fix: sync active session across tabs

Tests: 3444 passing (3411 -> 3444, +33)
@nesquena-hermes
Collaborator Author

Shipped in v0.50.249 (merge d72399ae, tag https://github.com/nesquena/hermes-webui/releases/tag/v0.50.249) via batch release PR #1365.

Production verified live at port 8787, ?v=v0.50.249 cache-bust active.

Already approved at 19:33 UTC, shipped in this batch.

Thanks (self-review approved)! 🙏

GeoffBao pushed a commit to GeoffBao/hermes-webui that referenced this pull request May 1, 2026
- api/streaming.py SSE payload now falls back to agent.model_metadata.get_model_context_length when compressor doesn't supply context_length (mirrors the session-save fallback shipped in v0.50.247).
- api/streaming.py also falls back to s.last_prompt_tokens to avoid using the cumulative input_tokens counter.
- static/ui.js tracks rawPct separately from pct and shows '(context exceeded)' tooltip when rawPct > 100 instead of misleading '100% used (0% left)'.
- static/messages.js clears 'Uploading...' composer status after upload completes.

Co-authored-by: nesquena-hermes <[email protected]>
GeoffBao pushed a commit to GeoffBao/hermes-webui that referenced this pull request May 1, 2026
Bundles 5 community PRs:
- nesquena#1355 feat(clarify): SSE long-connection (mirrors nesquena#1350 pattern, includes all correctness lessons)
- nesquena#1356 fix: context window indicator overflow (live SSE fallback) + uploading status clear
- nesquena#1357 fix: preserve imported session source metadata
- nesquena#1358 fix: collapse sidebar session lineage rows
- nesquena#1359 fix: sync active session across tabs

Tests: 3444 passing (3411 -> 3444, +33)
nesquena-hermes pushed a commit that referenced this pull request May 2, 2026
Fix two-layer bug where `/api/session` returned `context_length=0` for
sessions that pre-date #1318, then the frontend silently fell back to
cumulative `input_tokens` and the 128K JS default, producing nonsense
indicators like "100" capped from "890% used (context exceeded), 1.2M
/ 131.1k tokens used".

Empirical impact: 23 of 75 sessions on dev server rendered >100% before
this fix. #1356 fixed the same symptom on the live SSE path but missed
the GET /api/session load path that older sessions go through.

Two-layer fix:
  1. Backend (api/routes.py:1295-1313) — resolve context_length via
     agent.model_metadata.get_model_context_length() when the persisted
     value is 0. Mirrors api/streaming.py:2333-2342.
  2. Frontend (static/ui.js:1269) — drop the cumulative `input_tokens`
     fallback. When last_prompt_tokens is missing, render "·" + "tokens
     used" (existing !hasPromptTok branch) instead of computing a
     percentage from the cumulative total.

10 regression tests in tests/test_issue1436_context_indicator_load_path.py
covering both layers + the empty-model edge case (avoids the 256K
default-for-unknown-model trap that get_model_context_length('') returns).

Verified live: claude-opus-4-7 session with input_tokens=5,226,479 now
renders "·" + "5.3M tokens used" instead of "100" + "3987% used".

Reported by @AvidFuturist.
Closes #1436.
