Hermes Web UI — Sprints 11-14: multi-provider models, settings, session QoL, alerts, polish #2

Merged
nesquena-hermes merged 1 commit into master from sync-sprints-11-14
Mar 31, 2026

@nesquena-hermes (Collaborator)

Sprint 11 (v0.13): multi-provider model support, streaming smoothness

  • Dynamic model dropdown populated from configured API keys (OpenAI, Anthropic, Google, DeepSeek, GLM, Kimi, MiniMax, OpenRouter, Nous Portal)
  • Scroll pinning during streaming (no forced scroll when user has scrolled up)
  • All route handlers extracted to api/routes.py (server.py now ~76 lines)

Sprint 12 (v0.14): settings panel, SSE reconnect, session QoL

  • Settings panel (gear icon) -- persist default model and workspace server-side
  • SSE auto-reconnect on network blips
  • Pin/star sessions to top of sidebar
  • Import session from JSON export

Sprint 13 (v0.15): cron alerts, background errors, session duplicate, tab title

  • Cron completion alerts: toast per completion + unread badge on Tasks tab
  • Background agent error banner when a non-active session errors mid-stream
  • Session duplicate button
  • Browser tab title reflects active session name

Sprint 14 (v0.16): Mermaid diagrams, file ops, session archive/tags, timestamps

  • Mermaid diagram rendering inline (dark theme, lazy CDN load)
  • File rename (double-click in file tree) and create folder
  • Session archive (hide without deleting, toggle to show)
  • Session tags -- #hashtag in title becomes colored chip + click-to-filter
  • Message timestamps (HH:MM on hover, full date as tooltip)
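
A minimal sketch of how the session-tag bullet above could work, written in Python for illustration only. The function name `split_title_tags`, the regex, the lowercasing of tags, and the removal of tags from the displayed title are all assumptions; the PR only states that a #hashtag in the title becomes a chip and a click-to-filter target.

```python
import re
from typing import List, Tuple

# Assumption: a tag is '#' followed by letters, digits, '_' or '-',
# and must start the title or follow whitespace (so "issue#2" is not a tag).
TAG_RE = re.compile(r"(?<!\S)#([A-Za-z0-9_-]+)")


def split_title_tags(title: str) -> Tuple[str, List[str]]:
    """Return (title without tags, lowercased tags) for a session title."""
    tags = [t.lower() for t in TAG_RE.findall(title)]
    clean = " ".join(TAG_RE.sub("", title).split())
    return clean, tags


def filter_titles_by_tag(titles: List[str], tag: str) -> List[str]:
    """Click-to-filter: keep only titles carrying the given tag."""
    wanted = tag.lower().lstrip("#")
    return [t for t in titles if wanted in split_title_tags(t)[1]]
```

A chip renderer would consume the second element of the tuple, and the sidebar filter would call `filter_titles_by_tag` with the clicked chip's text.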

Test suite: 224 tests across 14 sprint files + regression gate, 0 failures.

@nesquena-hermes nesquena-hermes merged commit d90fb42 into master Mar 31, 2026
@nesquena-hermes nesquena-hermes deleted the sync-sprints-11-14 branch March 31, 2026 07:05
Ola-Turmo pushed a commit to Ola-Turmo/hermes-webui that referenced this pull request Apr 9, 2026
Hermes Web UI — Sprints 11-14: multi-provider models, settings, sessi…
nesquena-hermes pushed a commit that referenced this pull request Apr 12, 2026
- Title/badge/source: 2025 → 2026, sources updated to Chatbot Arena/BenchLM
- Overall: Opus 4.6 (#1, 1504 Arena Elo), Gemini 3.1 Pro (#2), GPT-5.4 (#3),
  Sonnet 4.6 (#4), DeepSeek V3.2 (#5, replaces DeepSeek R1)
- Coding: Opus 4.6 (#1, powers Claude Code/Cursor), GPT-5.4 (#2, SWE-Pro 57.7%),
  Sonnet 4.6 (#3), Gemini 3.1 Pro (#4), DeepSeek V3.2 (#5)
  Removed: Sonnet 4.5, Gemini 2.5 Pro, GPT-5.3 Codex
- Writing: Opus 4.6 (#1, Mazur 8.53), Gemini 3.1 Pro (#2, Arena CW 1487),
  Sonnet 4.6 (#3, EQ-Bench CW 1936), GPT-5.4 (#4), Meta Muse Spark (#5)
  Removed: Llama 4 Maverick (replaced by Meta Muse Spark)
- Search: Gemini 3.1 Pro (#1), Grok 4 (#2, live X data), Sonnet 4.6+tool (#3),
  GPT-5.4 (#4), Gemini 3 Flash Thinking (#5)
  Added Grok 4 for real-time social/X data
- Reasoning: Gemini 3.1 Pro (#1, GPQA 95.45%), GPT-5.4 (#2), Opus 4.6 (#3),
  Gemini 3 Flash Thinking (#4, best value), DeepSeek V3.2 (#5)
  Removed: Kimi K2, DeepSeek R1 standalone
- Quick picker: all model names updated to 2026 versions
- Setup boxes: OpenAI gpt-5-4 names, Google gemini-3-1-pro-preview,
  self-hosted section updated to DeepSeek V3.2 + Muse Spark
nesquena-hermes pushed a commit that referenced this pull request Apr 12, 2026
Coding section:
- Gemini 3.1 Pro: add Terminal-Bench 78.4% (highest of any frontier model on CLI/DevOps)
  Score badge updated to show Terminal-Bench rather than SWE-bench Verified
- GPT-5.4: note Terminal-Bench 75.1% in description, consolidate pill text
- #5 DeepSeek V3.2 → Qwen 3.6-Plus: leads Terminal-Bench at 61.6%, 88.2% GPQA,
  1M context, available now on Alibaba Cloud + OpenRouter

Writing section (reordered based on EQ-Bench CW scores):
- #1 Claude Sonnet 4.6 (1936 EQ-Bench CW — highest, best voice consistency)
- #2 Claude Opus 4.6 (Mazur 8.53, IF Arena #1, 1M context for literary depth)
- #3 Gemini 3.1 Pro (Arena CW #1 1487, AI-tell avoidance, 2M context)
- #4 GPT-5.4 (noted as ~9th on Arena CW, better for structured/commercial writing)
- #5 Meta Muse Spark → Kimi K2.5 ($0.60/.50, ~1700 EQ-Bench CW, live API)
  Muse Spark removed — no commercial API available yet

Reasoning section:
- Gemini 3.1 Pro GPQA: 95.45% → 94.1% (more conservative/recent figure, consistent
  with both agents' data)
- Added ARC-AGI-2 77.1% for Gemini 3.1 Pro (#1 on visual reasoning too)
- Opus 4.6: added note that Sonnet leads GDPval-AA (1633 Elo #1) for throughput
- #5 DeepSeek V3.2 → Qwen 3.6-Plus (88.2% GPQA, 1M context, same model as coding)

Quick picker:
- Creative writing: Opus → Sonnet 4.6 (EQ-Bench #1, 85% cheaper)
- Hard reasoning: 95.45% → 94.1%, add ARC-AGI-2 mention
- Budget pick: DeepSeek V3.2 → Gemini 3 Flash Thinking ($0.50/1M, 89.8% GPQA)

Setup boxes:
- Self-hosted: Muse Spark → Qwen 3.6-Plus + Gemma 4 26B MoE (Apache 2.0,
  82.3% GPQA with 3.8B active params, best edge/self-hosted reasoning)

Overall section: unchanged (top 5 still correct per both agents)
Search section: unchanged (no new data from either agent)
nesquena-hermes pushed a commit that referenced this pull request Apr 14, 2026
index.html:
- Hero badge: 56k+ → 77k+ GitHub stars (NousResearch/hermes-agent: 76,811)

community/index.html:
- All 24 project star counts updated to live GitHub values (April 13 2026)
  Notable bumps: gbrain 4.8k→7.4k, hermes-webui 1.3k→1.8k, hermes-agent 56k+→77k,
  hermes-workspace 1.1k→1.3k, hudui 571→827, self-evolution 893→1.6k
- Removed awizermann/scarf (repo no longer exists on GitHub)
- Added 3 new high-value projects:
  - vectorize-io/hindsight (9.1k ⭐) → Memory Providers: Official Hermes memory plugin,
    biomimetic 3-tier memory, topped LongMemEval Jan 2026
  - rohitg00/agentmemory (1.3k ⭐) → Memory Providers: #1 persistent memory for agents,
    explicit Hermes integration path, 43 MCP tools
  - builderz-labs/mission-control (4.1k ⭐) → Tools & Orchestration: self-hosted agent
    platform with Hermes gateway integration
- Updated project count: 25 → 27

models/index.html:
- Gemini 3.1 Pro: bumped from #4 Arena (1492 Elo) to #1 (1505 Elo)
- Claude Opus 4.6: updated from #1 (1504) to #2 (1503 Elo) — still top tier
- Updated datestamp to April 13, 2026
nesquena-hermes pushed a commit that referenced this pull request Apr 16, 2026
Factual fixes:
- Hindsight link now points to github.com/vectorize-io/hindsight (hindsight.ai 403)
- Holographic: removed external link (domain for sale), note it is built into Hermes
- RetainDB: flagged domain availability as unverified
- Hindsight benchmarks added: LongMemEval 91-94.6%, BEAM 64.1%, LoCoMo 89.6%
- Holographic hosting corrected to Local only (was Cloud)
- Honcho hosting corrected to Cloud + self-host (was Cloud only)
- ByteRover license kept as Partial OSS (not MIT)
- Supermemory stars footnoted: consumer frontend, not core engine

Content additions:
- Architecture notes for all 8 providers (TEMPR, HRR/SQLite, L0/L1/L2, triple-store, etc.)
- Full pricing detail: paid tiers, per-token cloud rates
- GitHub star counts in table and provider cards
- Tool counts (2-5) in table and provider cards
- Config snippets section with 6 provider examples
- Last verified April 2026 timestamp in hero

Recommendation fixes:
- Coding agents: Hindsight #1 (TEMPR BM25, entity extraction), ByteRover #2
- Knowledge wiki: Hindsight #1 (highest LongMemEval), Supermemory #2
- Privacy: Holographic #1 (zero deps, local SQLite), then Hindsight, OpenViking
- Added Personal AI companion use case (Honcho #1)
- Benchmark winner updated to Hindsight on LongMemEval axis

Table structure:
- Best for column moved to position 2 (was last/rightmost)
- Paid from column added
- Stars and Tools columns added (hidden on narrow screens)
- Benchmark column shows multiple scores for Hindsight/ByteRover
snutp added a commit to snutp/hermes-webui that referenced this pull request Apr 19, 2026
Codex review blocker #2:
Storing a single key causes the wrong project to be resumed on
multi-tab use or project switch. Change the primary key to
'webui:last:<port>' and keep the legacy global key for compatibility.
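
The key scheme in this commit message can be sketched as follows. This is a Python model that uses a plain dict in place of browser storage. The legacy key name `webui:last`, the function names, and the choice to keep writing the legacy key are assumptions; only the `'webui:last:<port>'` shape comes from the commit.

```python
from typing import MutableMapping, Optional

LEGACY_KEY = "webui:last"  # hypothetical name for the legacy global key


def primary_key(port: int) -> str:
    """Per-port key in the 'webui:last:<port>' shape named in the commit."""
    return f"webui:last:{port}"


def save_last_project(store: MutableMapping[str, str], port: int, project: str) -> None:
    """Write the per-port key; also write the legacy key (assumed) for old readers."""
    store[primary_key(port)] = project
    store[LEGACY_KEY] = project


def load_last_project(store: MutableMapping[str, str], port: int) -> Optional[str]:
    """Prefer the per-port key; fall back to the legacy global key."""
    return store.get(primary_key(port), store.get(LEGACY_KEY))
```

With two UIs on ports 8787 and 8788 sharing one store, each resumes its own project, while a port that has never saved falls back to whatever the legacy key holds.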
nesquena-hermes pushed a commit that referenced this pull request Apr 21, 2026
JKJameson pushed a commit to JKJameson/hermes-webui that referenced this pull request Apr 25, 2026
Hermes Web UI — Sprints 11-14: multi-provider models, settings, sessi…
zhonghuaY referenced this pull request in zhonghuaY/hermes-webui Apr 27, 2026
…#2)

Revert to standard /v1 base URL and pass keyword via x-instance-keyword
HTTP header in extra_headers. The gw: model prefix prevents Responses
API auto-detection; the gateway strips it for strict matching.

Co-authored-by: Albert Yang <[email protected]>
Co-authored-by: Copilot <[email protected]>
bergeouss added a commit to bergeouss/hermes-webui that referenced this pull request Apr 27, 2026
Address reviewer feedback on nesquena#1152 (blocker #2):
- Document page reload behavior: state PERSISTS (server-side in-memory)
- Document cross-tab isolation: state is SHARED per session
- Document server restart: state is LOST (in-memory only)
- Correct misleading 'resets on page reload' comment in messages.js
- Update backend route comment in routes.py
bsgdigital pushed a commit to bsgdigital/hermes-webui that referenced this pull request Apr 30, 2026
- New CONTRIBUTORS.md: full ranked credit roll for all 66 contributors
  (5+ tiers), with first/latest release versions, single-PR roll, and
  attribution methodology. Generated from git log + gh pulls API +
  CHANGELOG mention parsing.

- README.md: stack-ranked top-10 contributors table at the top of the
  Contributors section, link to CONTRIBUTORS.md for the full list.
  Updated test count (1898 → 3309). Refreshed @franksong2702 and
  @bergeouss entries to reflect their broader bodies of work (now
  the #1 and #2 external contributors).

- ARCHITECTURE.md: removed stale 'tracks upstream v0.50.36' header;
  bumped current shipped build to v0.50.245 with current architecture
  state notes (streaming-markdown vendoring, byte-range streaming,
  configurable-model-badges).

- ROADMAP.md / SPRINTS.md / TESTING.md: header/last-updated bumps to
  v0.50.245 and 3309 tests. SPRINTS.md 'Where we are now' section
  refreshed for current CLI/Claude parity (~95% Claude parity now).

Generated by aggregating CHANGELOG attribution lines, gh PR API
authors, and CHANGELOG version-section walks. Internal/bot accounts
filtered out.
nesquena-hermes added a commit that referenced this pull request Apr 30, 2026
…al reconnect

The original PR #1355 frontend health timer at 60s force-reconnected every
60s regardless of activity, with a comment claiming it was a 'no event in
60s' detector. This churned one TCP/SSE setup per minute per active
session.

Fix: track lastEventAt on initial+clarify event arrivals; only reconnect
when the gap exceeds 60s. The server sends keepalive comments every 30s,
but EventSource silently consumes those — we can only bump lastEventAt on
real application events. Under healthy conditions the timer never fires.

Sits on top of nesquena's 604b44a deadlock fix (parallel work — Nathan
caught the same Opus MUST-FIX before my push landed). Opus SHOULD-FIX #2
from the v0.50.249 pre-release review.
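
The stale-detector logic described above can be modelled roughly as below. The real timer lives in the frontend JavaScript; this Python sketch only illustrates the decision rule. The class name `StaleDetector` and the injectable clock are assumptions made so the logic can be exercised without waiting.

```python
import time
from typing import Callable


class StaleDetector:
    """Reconnect only when no application event arrived within the threshold."""

    def __init__(self, threshold_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold_s = threshold_s
        self._clock = clock
        self.last_event_at = clock()

    def on_event(self) -> None:
        """Call for each real application event (for example initial or clarify)."""
        self.last_event_at = self._clock()

    def should_reconnect(self) -> bool:
        """Timer tick: True only if the gap since the last event exceeds the threshold."""
        return self._clock() - self.last_event_at > self._threshold_s
```

The unconditional version amounts to `should_reconnect()` always returning True on every tick, which is the per-minute reconnect churn the commit removes.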
JKJameson pushed a commit to JKJameson/hermes-webui that referenced this pull request Apr 30, 2026
…al reconnect (nesquena#1367)

Follow-up to v0.50.249 / PR nesquena#1365 absorbing Opus SHOULD-FIX #2.
Originally reset out of nesquena#1365 because the reviewer flagged it as
out-of-scope; brought back per follow-up guidance that
correctness-improving changes should ship even when out of scope.

The clarify SSE health timer at static/messages.js:1715 was an
unconditional 60s force-reconnect, not the 'no event in 60s' detector
its comment claimed. Now actually a stale-detector that tracks
lastEventAt on initial+clarify event arrivals; only reconnects when
the gap exceeds 60s. Under healthy conditions the timer never fires.

Co-authored-by: nesquena-hermes <[email protected]>
@nesquena-hermes nesquena-hermes mentioned this pull request Apr 30, 2026
5 tasks
nesquena-hermes pushed a commit to ccqqlo/hermes-webui that referenced this pull request May 2, 2026
…na#1458 Bug #1)

Issue nesquena#1458 reports persistent-host crashes (≥1/day) when running the WebUI
under launchd KeepAlive on macOS. Root cause: `bootstrap.py` calls
`subprocess.Popen([python, "server.py"], start_new_session=True)`, probes
/health, then exits 0. Under any process supervisor (launchd, systemd,
supervisord, runit, s6), the supervisor sees its tracked PID exit, marks
the program as "completed," and respawns it. The new bootstrap fails to
bind port 8787 (orphaned server still has it), exits non-zero, supervisor
respawns again — loop until the orphan crashes for some other reason and
the next respawn finds the port free.

This PR addresses Bug #1 of the three failure modes tracked in nesquena#1458:
the `bootstrap.py` double-fork breaking process supervisors. Bug #2
(state.db FD leak) and Bug #3 (HTTP-unhealthy wedge) remain open under
the same issue — they need diagnosis data before a fix can land.

Changes
-------

1. `bootstrap.py`:
   - New `--foreground` argparse flag with help text mentioning launchd /
     systemd / supervisord.
   - New `_detect_supervisor()` that returns the env var name for any
     supervisor it detects: `INVOCATION_ID` / `JOURNAL_STREAM` /
     `NOTIFY_SOCKET` (systemd, s6), `XPC_SERVICE_NAME` (launchd),
     `SUPERVISOR_ENABLED` (supervisord), or `HERMES_WEBUI_FOREGROUND` for
     the explicit user opt-in. Truthy values for the explicit opt-in:
     `1` / `true` / `yes` / `on` (case-insensitive).
   - `main()` branches on `args.foreground or _detect_supervisor()`:
     - **Foreground path:** chdir to `agent_dir or REPO_ROOT`, then
       `os.execv(python, [python, server_path])` to replace the bootstrap
       process image with the server. The supervisor sees the long-lived
       server as the original child. No `wait_for_health` probe — the
       supervisor's KeepAlive / Restart=on-failure handles liveness.
     - **Default path:** unchanged. Spawn server as detached child via
       `Popen + start_new_session=True`, probe /health, return 0. This
       still works for interactive `bash start.sh` invocations.
   - Resolved env vars (HOST/PORT/STATE_DIR/AGENT_DIR) are now mutated on
     `os.environ` directly instead of into a local `env` copy so they
     are inherited across `os.execv`.

2. `docs/supervisor.md` (new): runnable launchd plist, systemd .service,
   and supervisord conf examples + a diagnostic recipe (`lsof` + ppid
   chain) for catching the orphan-loop in production.

3. `.gitignore`: allowlist `docs/supervisor.md` (the directory uses an
   opt-in pattern; matches the existing `!docs/docker.md` precedent).

4. `tests/test_bootstrap_foreground.py` (new): 35 regression tests
   covering the argparse flag, `_detect_supervisor()` behavior across all
   five supervisor env vars, the explicit opt-in's truthy/falsy values,
   and `main()`'s execv-vs-Popen routing decision under each input
   combination. `os.execv` is monkeypatched in the routing tests — we
   pin the structural choice (which call is made, with which args, in
   which cwd, with which env) not the post-exec behavior.
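
The routing decision in item 1 can be sketched as below, under stated assumptions. The environment variable names and truthy values come from the description above; the check order, the rule that any non-empty value counts as detection, and the helper names `detect_supervisor`, `choose_launch_mode`, and `launch_foreground` are hypothetical. The actual `_detect_supervisor()` may apply stricter per-variable checks.

```python
import os
from typing import Mapping, Optional

SUPERVISOR_ENV_VARS = (
    "INVOCATION_ID", "JOURNAL_STREAM", "NOTIFY_SOCKET",  # systemd, s6
    "XPC_SERVICE_NAME",                                  # launchd
    "SUPERVISOR_ENABLED",                                # supervisord
)
OPT_IN_VAR = "HERMES_WEBUI_FOREGROUND"
TRUTHY = {"1", "true", "yes", "on"}


def detect_supervisor(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the env var that signals a supervisor, else None."""
    if env.get(OPT_IN_VAR, "").strip().lower() in TRUTHY:
        return OPT_IN_VAR
    for name in SUPERVISOR_ENV_VARS:
        if env.get(name):  # assumption: any non-empty value counts
            return name
    return None


def choose_launch_mode(foreground_flag: bool,
                       env: Mapping[str, str] = os.environ) -> str:
    """'execv' replaces the bootstrap process; 'popen' is the detached default."""
    return "execv" if foreground_flag or detect_supervisor(env) else "popen"


def launch_foreground(python: str, server_path: str) -> None:
    """Replace this process image with the server; does not return on success."""
    os.execv(python, [python, server_path])
```

Because `os.execv` keeps the same PID, the supervisor's tracked process becomes the long-lived server, which is the property the foreground path relies on.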

Why this scope and no more
--------------------------

Bug #2 (state.db FD leak) lists 5 candidate paths and asks the reporter
for `lsof -p <pid> | sort | uniq -c | sort -rn | head -20` output to
disambiguate. Until that data lands, any "fix" would be speculative —
explicitly out of scope per the contributor-pickup comment on the issue.

Bug #3 (launchd-running, port-listening, HTTP-unhealthy) was added in
@stefanpieter's reply comment. Diagnosis is in flight; no concrete fix
shape yet. Also out of scope.

Running locally end-to-end verifies the behavior:

```
[bootstrap] Starting Hermes Web UI on http://127.0.0.1:8789 (foreground mode: --foreground)
$ pgrep -af 'server.py'
2997632 /home/.../python /tmp/wt-fix-1458/server.py
$ ps -o ppid -p 2997632
2997581   ← bash that ran bootstrap.py — same PID as the original bootstrap
$ ps -p 2997581 -o cmd
... bootstrap.py ...   ← but exec'd into server.py
```

The same PID that bash forked for `bootstrap.py` is now `server.py`.
A supervisor watching that PID would correctly observe the long-lived
server. No double-fork.

Verification
------------

- 3811 tests pass (`pytest tests/` — full suite, +51 from this PR plus
  master-merge-in)
- All 35 new bootstrap-foreground tests pass
- `bash scripts/run-browser-tests.sh` PASS (HTTP API checks against worktree)
- `bash scripts/webui_qa_agent.sh 8789` PASS (23/23 visual QA)
- Live verified: server starts cleanly under both `--foreground` and
  `HERMES_WEBUI_FOREGROUND=1`; PID lineage confirms no double-fork

Closes nesquena#1458 (Bug #1 only). Bugs #2 and #3 remain tracked under the
issue.
nesquena-hermes added a commit that referenced this pull request May 3, 2026
Stage 272: 3 PRs — #1493 sidebar cancel + #1495 state.db FD leak fix + #1492 P0 polish bundle (closes #1466 #1469 #1484 #1486 #1494; refs #1458 Bug #2)
lanehollard pushed a commit to lanehollard/hermes-webui that referenced this pull request May 3, 2026
… sidebar polling (nesquena#1494)

Production WebUI on macOS launchd reproduced an HTTP-unhealthy wedge after
nesquena#1483 closed the bootstrap supervisor double-fork: process alive, port
listening, every HTTP request reset by peer before a response. The reporter
(@insecurejezza) traced it to FD exhaustion — 366 open FDs on the wedged
process, 238 of them `~/.hermes/state.db`, `state.db-wal`, and `state.db-shm`.

Root cause: four sqlite callsites use `with sqlite3.connect(...) as conn:`.
Python's sqlite3 connection context manager only commits or rolls back on
exit; it does NOT close the connection. `/api/sessions` polling calls these
on every sidebar refresh, so each poll leaked one or more open state.db FDs
until the process hit macOS's soft FD limit and new sqlite3.connect() calls
inside fresh request handlers raised before any response bytes were written.

Fix: wrap each `sqlite3.connect(...)` in `contextlib.closing(...)` so the
connection is explicitly closed on scope exit, in addition to the auto-
commit / rollback semantics that `Connection.__exit__` already provides.
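
The difference between the two patterns can be seen in a small standalone demo. This is an illustration with an in-memory database, not the patched callsites; the helper `is_open` is a hypothetical name. The nested `with conn:` is one way to keep commit and rollback behaviour alongside `closing()`.

```python
import sqlite3
from contextlib import closing


def is_open(conn: sqlite3.Connection) -> bool:
    """Return True if the connection still accepts statements."""
    try:
        conn.execute("SELECT 1")
        return True
    except sqlite3.ProgrammingError:  # raised when operating on a closed connection
        return False


# Leaky pattern: the connection context manager commits, but does not close.
with sqlite3.connect(":memory:") as leaked:
    leaked.execute("CREATE TABLE t (x INTEGER)")
leaked_still_open = is_open(leaked)
leaked.close()  # tidy up the demo handle

# Fixed pattern: closing() calls conn.close() when the block exits.
with closing(sqlite3.connect(":memory:")) as conn:
    with conn:  # inner block keeps commit / rollback semantics
        conn.execute("CREATE TABLE t (x INTEGER)")
fixed_still_open = is_open(conn)
```

In a long-running server the first pattern holds the file handle until garbage collection gets to it, which is how a polling endpoint can accumulate open handles.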

Callsites patched:
- api/agent_sessions.py:read_importable_agent_session_rows
- api/agent_sessions.py:read_session_lineage_metadata
- api/models.py:get_cli_session_messages
- api/models.py:delete_cli_session

Reporter's verification (post-patch, 100-request stress loop against
/api/sessions and /api/projects):

  batch=1 fd=92 state_handles=0
  batch=2 fd=92 state_handles=0
  ...
  batch=5 fd=92 state_handles=0

Pre-patch the same loop made FD count and state.db handle count climb
monotonically.

4 regression tests in tests/test_issue1494_state_db_fd_leak.py monkeypatch
sqlite3.connect with a tracking wrapper that records .close() calls and
assert every connection opened by each of the four functions is explicitly
closed. Verified to fail (catching the original bug) when the closing()
wrap is reverted: "leaked 5 of 5 sqlite connection(s) — context-manager-
only `with sqlite3.connect()` does not close. Wrap in contextlib.closing()."
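
A tracking wrapper of the kind described can be sketched like this. It is not the code in tests/test_issue1494_state_db_fd_leak.py; the names `count_leaks`, `TrackingConnection`, `leaky`, and `fixed` are hypothetical, and the sketch uses the `factory` argument of `sqlite3.connect` to substitute a subclass that records `close()` calls.

```python
import sqlite3
from contextlib import closing
from typing import Callable, Tuple
from unittest import mock


def count_leaks(func: Callable[[], None]) -> Tuple[int, int]:
    """Run func with sqlite3.connect patched; return (leaked, opened)."""
    opened = []
    closed_ids = set()
    real_connect = sqlite3.connect

    class TrackingConnection(sqlite3.Connection):
        def close(self) -> None:
            closed_ids.add(id(self))  # record the explicit close() call
            super().close()

    def tracking_connect(*args, **kwargs):
        kwargs.setdefault("factory", TrackingConnection)
        conn = real_connect(*args, **kwargs)
        opened.append(conn)
        return conn

    with mock.patch.object(sqlite3, "connect", tracking_connect):
        func()

    leaked = [c for c in opened if id(c) not in closed_ids]
    for c in leaked:
        c.close()  # release handles the function under test left open
    return len(leaked), len(opened)


def leaky() -> None:
    with sqlite3.connect(":memory:") as conn:  # commits, never closes
        conn.execute("SELECT 1")


def fixed() -> None:
    with closing(sqlite3.connect(":memory:")) as conn:  # closes on exit
        conn.execute("SELECT 1")
```

A regression test would then assert that the leaked count is zero for each patched function, and the same assertion fails when the `closing()` wrap is removed.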

This addresses Bug #2 of the umbrella issue nesquena#1458. Bug #3 (HTTP-unhealthy
wedge in the absence of FD exhaustion) remains open pending separate
diagnostic data — explicit scope discipline.

Closes nesquena#1494
Refs nesquena#1458 (Bug #2 of 3)

Co-authored-by: insecurejezza <[email protected]>
lanehollard pushed a commit to lanehollard/hermes-webui that referenced this pull request May 3, 2026
lanehollard pushed a commit to lanehollard/hermes-webui that referenced this pull request May 3, 2026
…ix + P0 polish bundle (nesquena#1466 nesquena#1494 nesquena#1469 nesquena#1484 nesquena#1486)

3 PRs in this batch (3866 → 3874 tests, +8):

- nesquena#1493 (@dso2ng) — sidebar Stop response cancels row's stream not active pane's (closes nesquena#1466, follow-up to nesquena#1480)
- nesquena#1495 (self-built; reported by @insecurejezza in nesquena#1494) — state.db connection FD leak in sidebar polling (closes nesquena#1494, addresses Bug #2 of nesquena#1458)
- nesquena#1492 (@bergeouss) — P0 bugfixes bundle: tool-card args readability + CLI rename persistence + scroll pinning + sw.js relative-path regression test (closes nesquena#1469 nesquena#1484 nesquena#1486)

This release closes Bug #2 of the umbrella issue nesquena#1458. Bug #1 was closed by v0.50.269 (nesquena#1483) + v0.50.270 (nesquena#1487). Bug #3 (HTTP-unhealthy without FD exhaustion) is the remaining work item.
waldmanz pushed a commit to waldmanz/hermes-webui that referenced this pull request May 4, 2026
Both flagged by pre-release Opus advisor; both clearly defensive and small
enough to absorb in-release per the reviewer-flagged-fix-in-release-not-followup
policy.

SHOULD-FIX #1 (api/routes.py:_clear_stale_stream_state, ~25 LOC):
After the metadata-only reload (nesquena#1559 Layer 2), the local 'session'
variable is reassigned to the full-load object but the caller still holds
the original metadata-only stub. /api/session then returns the stale
active_stream_id at routes.py:1791, causing the frontend to attempt one
ghost SSE reconnect before recovering. Fix: capture original_stub at
function entry, then patch its in-memory active_stream_id and pending_*
fields to None after both the early-return (full-load already cleared)
path AND the successful-mutation path. Now the caller's read returns
fresh state, no ghost reconnect.

SHOULD-FIX #2 (api/models.py:Session.save, ~20 LOC):
The .bak write at api/models.py:436 used write_text() which truncates-
then-writes — a crash mid-write or concurrent backup-producing save
could leave a torn .bak. Recovery defends correctly (JSONDecodeError →
returns -1 → 'no_action'), so the failure mode was 'backup lost' not
'spurious restore'. Fix: tmp + os.replace pattern matching the main file
write at line 446-453. Now backup either lands cleanly or doesn't land
at all.
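
The tmp + os.replace pattern named above can be sketched as follows. This is a generic illustration, not the code in api/models.py; the function name `atomic_write_text`, the temp-file naming, and the `fsync` call are assumptions.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write text so readers see either the old file or the new one, never a torn one."""
    path = Path(path)
    # The temp file must sit in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic swap into place
    except BaseException:
        try:
            os.unlink(tmp_name)  # do not leave a stray temp file behind
        except FileNotFoundError:
            pass
        raise
```

A crash before `os.replace` leaves the previous backup untouched, which matches the stated goal that the backup either lands cleanly or does not land at all.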

4026/4026 tests pass post-absorb.
nesquena-hermes added a commit that referenced this pull request May 4, 2026
SHOULD-FIX #1 (renamed-root client cross-alias): drop strict-equality client
filter at static/sessions.js:1853. Server-side _profiles_match cross-aliases
'default'-tagged rows to a renamed root 'kinni'; the strict-equality client
would reject them, dropping every legacy session for renamed-root users. The
server is now solely authoritative for profile scoping.

SHOULD-FIX #2 (messaging-source dedupe ordering): _keep_latest_messaging_session_per_source
now runs AFTER the profile filter at api/routes.py:2078. Before, it ran on
the merged-cross-profile list with profile-blind keys, discarding the older
profile's row across profiles before the scope filter — leaving zero rows for
any messaging identity the active profile shared with another profile.
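
The ordering problem can be reproduced with a toy example. The row fields (`profile`, `source`, `updated_at`) and both helper names are hypothetical stand-ins for the real functions in api/routes.py; the sketch only shows why deduping before filtering can leave a profile with zero rows.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def keep_latest_per_source(rows: List[Row]) -> List[Row]:
    """Keep only the most recently updated row for each messaging source."""
    latest: Dict[str, Row] = {}
    for row in rows:
        key = row["source"]  # profile-blind key, as in the described bug
        if key not in latest or row["updated_at"] > latest[key]["updated_at"]:
            latest[key] = row
    return list(latest.values())


def filter_by_profile(rows: List[Row], profile: str) -> List[Row]:
    """Keep only rows that belong to the active profile."""
    return [r for r in rows if r["profile"] == profile]


rows = [
    {"id": "a", "profile": "work", "source": "telegram:42", "updated_at": 100},
    {"id": "b", "profile": "home", "source": "telegram:42", "updated_at": 200},
]

# Old order: dedupe across profiles first, then filter. The 'work' row is lost.
buggy = filter_by_profile(keep_latest_per_source(rows), "work")

# New order: filter to the active profile first, then dedupe within it.
fixed = keep_latest_per_source(filter_by_profile(rows, "work"))
```

Here `buggy` is empty because the newer 'home' row won the dedupe and was then filtered out, while `fixed` keeps the 'work' row.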

NIT #3: _projects_migrated flag now set only AFTER successful save_projects.
NIT #4: cleaned dead test code in test_is_root_profile_invalidation_drops_stale.
NIT #5: _create_profile_fallback's clone_from=='default' literal now routes
through _is_root_profile() for parity with the 5 other callsites.

+2 regression tests pin the SHOULD-FIX shapes:
- test_keep_latest_messaging_runs_after_profile_filter (source-string ordering)
- test_static_sessions_js_trusts_server_profile_scoping (no client re-filter)

4173 -> 4175 tests pass. 0 regressions.