Harden auto title generation for reasoning models by nesquena-hermes · Pull Request #1026 · nesquena/hermes-webui

nesquena-hermes · 2026-04-25T18:44:00Z

Attribution

This is a fresh PR carrying the exact code from @franksong2702's original PR #1002, which was closed when the repo briefly blocked new PRs. Full credit to Frank for the work.

Original PR: #1002

Summary

This fixes the concrete first-exchange title-generation failure discussed in #869 for reasoning-style models such as Ollama Cloud kimi-k2.6.

The root cause is not title quality. It is an output-budget / response-shape mismatch: reasoning models can spend the completion budget in reasoning and return empty content with finish_reason: length, so Hermes falls back to the first-message local summary.

Changes

Raises the title-generation completion budget to a reasoning-safe 512 tokens.
Retries once with 1024 tokens when the response is empty because of finish_reason: length or reasoning-only output.
Applies the same retry policy to both routes:
- configured auxiliary.title_generation route
- active-agent fallback route
Preserves MiniMax reasoning_split as a provider-specific enhancement, but no longer treats MiniMax as the only reasoning-model case.
Improves diagnostics: when local fallback is used, title_status now preserves the underlying reason, e.g. local_summary:llm_length_aux.
Keeps the existing auxiliary.title_generation path intact so users can still route title generation to a cheaper/faster dedicated model.

Scope

This PR targets the bug side of #869: provisional first-message titles not being replaced after the first assistant reply when title generation returns empty content.

It does not implement continuous title re-summarization as the conversation evolves. That remains a separate feature/design scope.

Verification

python3 -m py_compile api/streaming.py
pytest tests/test_title_aux_routing.py -q
pytest tests/test_sprint41.py tests/test_title_sanitization.py tests/test_issues_853_857.py tests/test_title_aux_routing.py -q

nesquena

Review — end-to-end ✅ (clean, no fixes needed)

Traced against upstream hermes-agent

Fresh nousresearch/hermes-agent tarball pulled. Pure WebUI change inside api/streaming.py — agent.auxiliary_client.call_llm (the agent-side surface) is called as before, just with a larger max_tokens and a retry loop. Agent code untouched. ✓

What this PR does

Hardens auto title generation against reasoning models that route output into message.reasoning and finish with finish_reason: length and empty content. Direct fix for the concrete bug surfaced by @TUBEOF in #869 with kimi-k2.6 on Ollama Cloud.

Six surgical changes in api/streaming.py:

Title budget bumped 160 → 512. Removes the MiniMax-only 384 special case; everyone gets the reasoning-safe baseline.
_title_retry_completion_budget() returns 1024 — used on retry when the first call returned empty due to length/reasoning.
_extract_title_response(resp, *, aux=False) — new helper. Returns (content, empty_status). Distinguishes llm_length, llm_empty_reasoning, and generic llm_empty based on finish_reason and presence of reasoning/reasoning_content/thinking fields.
Retry loop in both generate_title_raw_via_aux and generate_title_raw_via_agent. Inner for budget_idx, max_tokens in enumerate(budgets) runs once at 512, then if _title_retry_status(last_status) triggers (length / empty_reasoning), appends 1024 and runs again. Retries are scoped to the same prompt index — outer for prompt in prompts continues unchanged.
last_status tracking — both routes now return the most specific failure status (llm_length_aux, llm_empty_reasoning_aux, etc.) instead of the generic llm_error_aux on exhaustion.
local_summary:{llm_status} — when the local summary fallback is used, _put_title_status includes the underlying LLM failure reason. The frontend console.info('[title]', …) listener already logs reason verbatim, so existing diagnostics flows pick this up automatically.

The MiniMax reasoning_split: true extra_body is preserved as a provider-specific enhancement (not removed alongside the budget change).

End-to-end trace (the #869 happy path)

User in Hermes WebUI: types first message, sends. Background title gen fires.

generate_title_raw_via_aux calls agent.auxiliary_client.call_llm with max_tokens=512.
Provider returns {choices[0]: {message: {content: "", reasoning: "..."}, finish_reason: "length"}}.
_extract_title_response → ('', 'llm_length_aux'). last_status = 'llm_length_aux'.
_title_retry_status('llm_length_aux') is True → append 1024 to budgets.
Loop runs again with max_tokens=1024.
Provider now has enough headroom — emits {message: {content: "Useful Title"}, finish_reason: "stop"}.
Returns ('Useful Title', 'llm_aux_retry'). Sidebar gets the real title.

If retry also fails, last_status propagates up. _run_background_title_update sees source='fallback', builds fallback_reason = 'local_summary:llm_length_aux', emits via _put_title_status. Frontend console: [title] {status: 'fallback', reason: 'local_summary:llm_length_aux', ...} — operator can now distinguish "retry exhausted on length" from "provider exception" without reading server logs.

Edge-case trace

Scenario	Behaviour
Non-reasoning model (most providers)	First call at 512 returns content → no retry. Status `llm_aux`. ✅
Reasoning model with adequate budget	First call at 512 returns content → no retry. Status `llm_aux`. ✅
Reasoning model that needs more headroom (the #869 case)	First call empty+length → retry at 1024 → content. Status `llm_aux_retry`. ✅
Reasoning model that needs >1024 tokens	Both calls empty → status `llm_length_aux` → fallback to local summary with reason `local_summary:llm_length_aux`. ✅
Provider returns 5xx	First call raises → `last_status = 'llm_error_aux'`, no retry, fallback with `local_summary:llm_error_aux`. ✅
Provider returns empty content + finish_reason=stop (no reasoning)	`llm_empty_aux`, no retry (not in `_title_retry_status` set). Falls back. ✅ — correct, retrying length would just waste tokens.
Active-agent path (no `auxiliary.title_generation` configured)	Same retry logic mirrored in `generate_title_raw_via_agent` for OpenAI, anthropic_messages, codex_responses api modes. The OpenAI branch uses `_extract_title_response(resp)` for status; codex/anthropic branches use a generic `llm_empty` (less precise but safe — these api modes don't typically return reasoning fields). ✅
MiniMax (uses `reasoning_split` extra_body)	Still gets `reasoning_split=True`. Budget went from 384 → 512 (mild bump). Retry at 1024 still applies. ✅
Budget cost concern	Title gen runs once per session (after first exchange), and only when prior call hit length/empty-reasoning. Worst case: 512+1024 = 1536 reasoning-tokens per session. For a typical 10-message session at 100K tokens, this is <2% overhead. Acceptable. ✅
`_safe_obj_value` MagicMock detection	Audited: production code paths only ever pass real provider response objects (`module.startswith('unittest.mock')` is False). Detection only fires when legacy MagicMock test fixtures are used; new tests use dicts/SimpleNamespace. Mild ugliness in production code but no functional risk. Not worth pushing a fix. ✅

Backwards compatibility

Frontend static/messages.js:658 only console.infos the title_status payload — no exact-match comparisons on reason. Adding :{llm_status} suffix is purely additive diagnostic info. ✅
Existing test tests/test_sprint41.py::test_streaming_emits_title_status_sse_event checks for put_event('title_status', payload) literal — unaffected. ✅
No protocol changes, no new fields, no DB migrations.

Tests

tests/test_title_aux_routing.py — 22/22 pass (5 new tests added: budget defaults, aux retry, aux exhausted status, agent retry, fallback reason preservation)
tests/test_sprint41.py, test_title_sanitization.py, test_issues_853_857.py — 73/73 pass
Full local suite: 2090 passed, 47 skipped, 1 deselected, 0 unrelated failed (test_workspace_add_rejects_system_paths fails on master too — pre-existing)
python3 -c "import ast; ast.parse(open('api/streaming.py').read())" — clean

Security audit

No new input paths. max_tokens and extra_body payloads are server-internal constants.
_extract_title_response does string-only comparisons on finish_reason (lowercased) and bounded getattr on response objects. No injection vector.
_safe_text_value uses str(value or '').strip() — safe coercion.
Fallback reason local_summary:{llm_status} only includes server-controlled status strings (llm_length, llm_empty_reasoning, etc.) — no user-controlled content. ✓

Recommendation

This is the cleanest of the three "integrate" PRs. Surgical fix for a real reproducible bug (TUBEOF's curl output in #869 confirmed kimi-k2.6 returns reasoning + empty content + length). The fix matches the diagnosis exactly: bump budget, retry on length/empty-reasoning, preserve diagnostic reason on fallback.

The MiniMax-specific 384 budget is replaced with a uniform 512 baseline plus a generic retry. This is a net architectural simplification — fewer provider-specific branches.

Credit: @franksong2702 for the original work in #1002 and the systematic diagnosis in the #869 thread.

Closes the bug-fix portion of #869. The continuous re-summarization feature request (titles that adapt as the conversation evolves) remains a separate scope, as the PR description correctly notes.

✅ Approved. Ready for merge. Independent of #1024 and #1025 — no merge-order constraints.

nesquena-hermes · 2026-04-25T19:57:41Z

Thanks for the detailed writeup, @franksong2702 (via @nesquena-hermes reconstruction) — this is a clean, targeted bug fix for the reasoning-model title-generation failure.

What looks good:

Root cause is correctly identified: reasoning models spend their completion budget in reasoning tokens and return empty content with finish_reason: length — not a title quality issue, but a budget mismatch.
The two-stage budget strategy (512 → retry at 1024 on length failure) is proportionate and avoids over-allocating on every call.
Applying the same retry policy to both routes (configured auxiliary.title_generation and the active-agent fallback) ensures consistent behavior regardless of routing.
Preserving the MiniMax reasoning_split path while generalizing the retry logic is the right call — no regressions for users already relying on that path.
title_status: local_summary:llm_length_aux diagnostic makes it much easier to distinguish "LLM returned empty due to budget" from other fallback reasons.
test_title_aux_routing.py covers the core new behavior.

One minor note:

The retry cap is 1024 tokens. For very large reasoning model contexts this may still hit finish_reason: length — but this is an edge case and the fallback behavior (local summary) is reasonable. A follow-up could expose title_max_tokens as a config option if needed, but that's out of scope here.

This looks ready. ✅

@franksong2702

…n UX, folder create, model errors, session speed, title gen (#1031) * fix: remove orphaned i18n keys from top-level LOCALES object Three Traditional Chinese translation keys (cmd_status, memory_saved, profile_delete_title) were placed outside any locale block between the en and ru blocks in static/i18n.js. They became top-level properties of the LOCALES object, causing them to appear as invalid language options in the Settings > Preferences dropdown. The correct translations already exist in the zh-Hant locale block. Fixes #1008 * fix: block stale SSE events from polluting new session's DOM - appendThinking(): guard with !S.session||!S.activeStreamId to drop events from a previous session's SSE stream during a session switch - appendLiveToolCard(): same guard for consistency - finalizeThinkingCard(): scroll thinking-card-body to top when scroll is pinned, so completed response is immediately visible - appendThinking(): auto-scroll thinking card body to bottom while streaming if user is watching (scroll pinned) * Fix empty agent sessions in sidebar * fix: resolve cron UI UX issues — icon ambiguity, toast overlap, running status Fixes #995 — three sub-issues in the Cron Jobs UI: 1. Dual play icons ambiguous: Resume button now shows a distinct play+bar icon (play triangle + vertical line) instead of the identical triangle used by Run now. 2. Toast notification overlapping header buttons: Added position:relative; z-index:10 to .main-view-header so it stacks above the fixed toast (z-index:100 within its layer). 3. No running status after trigger: After triggering a job, the status badge immediately shows 'running…' with a CSS spinner animation, and polls the cron list every 3s (up to 30s) to refresh when the job completes. - Added cron_status_running i18n key in all 5 locales (en, es, de, ru, zh, zh-Hant) - Added .detail-badge.running CSS class with spinner animation - New functions: _setCronDetailStatus(), _startCronRunningPoll() * fix(#1011): address review feedback — poll cleanup, badge persistence, 30s fallback - _clearCronDetail() now clears _cronRunningPoll interval on navigation - Poll re-applies 'running' badge after loadCrons() re-render (prevents flicker) - When poll ends (30s max), detail re-renders with actual status as fallback * feat: create folder and add space directly from UI (#782) - After creating a folder via the file tree New folder button, offer to add it as a space via confirm dialog - Add Create folder if it doesnt exist checkbox in the New Space form - Backend: support create flag in /api/workspaces/add to mkdir before validation - i18n: 4 new keys (folder_add_as_space_title/msg/btn, workspace_auto_create_folder) in all 6 locales * fix: validate workspace path before mkdir to prevent orphan directories Review feedback (critical): the previous code called mkdir() before validate_workspace_to_add(), which meant a rejected path (e.g. system dir) would leave an orphan directory on disk. New flow: 1. Resolve path and check against blocked system roots BEFORE any mutation 2. mkdir() only if path passes the blocklist check 3. Full validation (exists, is_dir) after mkdir Also imports _workspace_blocked_roots for the pre-mutation blocklist check. * fix(#1014): classify model-not-found errors with helpful message - Add model_not_found error type to streaming.py exception classifier - Detect 404, 'not found', 'does not exist', 'invalid model' patterns - Strip HTML tags from provider error messages (nginx 404 pages, etc.) - Add model_not_found branch to apperror handler in messages.js - Add i18n key model_not_found_label in all 6 locales - 15 tests covering detection, sanitization, frontend, and i18n * feat(ui): add live TPS stat to header Adds a TPS (Tokens Per Second) chip to the right of the header title bar that updates live while AI output is streaming. Metering (api/metering.py) - Tracks per-session output + reasoning tokens via GlobalMeter singleton - Per-session TPS = total_tokens / elapsed_time - Global TPS = average of active sessions' TPS values - HIGH/LOW are max/min of global_tps snapshots over a 60-minute rolling window (only recorded when > 0, so idle periods are excluded) - Thread-safe with a single lock Metering events emitted from streaming.py - Throttled at 100ms from token/reasoning/tool callbacks so the display updates rapidly during fast token streams - 1Hz ticker as fallback for slow streams (exits when no active sessions) - Final stats emitted on stream end Routes (api/routes.py) - Removed POST /api/metering/interval endpoint (dynamic interval via focus/blur was replaced with simple always-1s-when-active approach) UI (static/messages.js, index.html, style.css) - TPS chip in titlebar: shows 'N.N t/s . N.N high . N.N low' - Default: '0.0 t/s . 0.0 high' when idle - Display updates on every metering SSE event (throttled to 100ms) * feat: session restore speed + title gen reasoning hardening (#1025, #1026) PR #1025 (@franksong2702): Speed up large session restore paths - GET /api/session?messages=0 now parses only metadata before the messages array - Metadata-only loads no longer populate the full-session LRU cache - Frontend lazy fetch uses resolve_model=0 to avoid cold model-catalog lookup - Hard reload no longer waits for populateModelDropdown() before restoring session PR #1026 (@franksong2702): Harden auto title generation for reasoning models - Raises title-gen completion budget to 512 tokens (reasoning-safe) - Retries once with 1024 tokens on empty content / finish_reason:length - Applies retry to both auxiliary and active-agent fallback routes - Preserves underlying failure reason in title_status on local fallback Co-authored-by: Frank Song <[email protected]> * feat: session attention indicators in right slot + last_message_at timestamps (#1024) PR #1024 (@franksong2702): Polish session attention indicators - Streaming spinners and unread dots now reuse the right-side actions slot - Running/unread rows hide timestamps; idle/read rows keep right-aligned timestamps - Date group carets point down when expanded, right when collapsed - Pinned group no longer repeats pinned-star icon per row - Running indicators appear immediately after send (local busy state while /api/sessions catches up) - Sidebar sorting/grouping/timestamps now prefer last_message_at (derived from last real message) so metadata-only saves don't make old sessions appear under Today Co-authored-by: Frank Song <[email protected]> * docs: v0.50.207 release notes — 10 PRs, 2169 tests (+36) --------- Co-authored-by: bergeouss <[email protected]> Co-authored-by: Josh <[email protected]> Co-authored-by: Frank Song <[email protected]> Co-authored-by: nesquena-hermes <[email protected]>

nesquena-hermes · 2026-04-25T20:08:36Z

Merged in v0.50.207 via stage PR #1031. Thank you @franksong2702! 🎉

@franksong2702

…n UX, folder create, model errors, session speed, title gen (nesquena#1031) * fix: remove orphaned i18n keys from top-level LOCALES object Three Traditional Chinese translation keys (cmd_status, memory_saved, profile_delete_title) were placed outside any locale block between the en and ru blocks in static/i18n.js. They became top-level properties of the LOCALES object, causing them to appear as invalid language options in the Settings > Preferences dropdown. The correct translations already exist in the zh-Hant locale block. Fixes nesquena#1008 * fix: block stale SSE events from polluting new session's DOM - appendThinking(): guard with !S.session||!S.activeStreamId to drop events from a previous session's SSE stream during a session switch - appendLiveToolCard(): same guard for consistency - finalizeThinkingCard(): scroll thinking-card-body to top when scroll is pinned, so completed response is immediately visible - appendThinking(): auto-scroll thinking card body to bottom while streaming if user is watching (scroll pinned) * Fix empty agent sessions in sidebar * fix: resolve cron UI UX issues — icon ambiguity, toast overlap, running status Fixes nesquena#995 — three sub-issues in the Cron Jobs UI: 1. Dual play icons ambiguous: Resume button now shows a distinct play+bar icon (play triangle + vertical line) instead of the identical triangle used by Run now. 2. Toast notification overlapping header buttons: Added position:relative; z-index:10 to .main-view-header so it stacks above the fixed toast (z-index:100 within its layer). 3. No running status after trigger: After triggering a job, the status badge immediately shows 'running…' with a CSS spinner animation, and polls the cron list every 3s (up to 30s) to refresh when the job completes. - Added cron_status_running i18n key in all 5 locales (en, es, de, ru, zh, zh-Hant) - Added .detail-badge.running CSS class with spinner animation - New functions: _setCronDetailStatus(), _startCronRunningPoll() * fix(nesquena#1011): address review feedback — poll cleanup, badge persistence, 30s fallback - _clearCronDetail() now clears _cronRunningPoll interval on navigation - Poll re-applies 'running' badge after loadCrons() re-render (prevents flicker) - When poll ends (30s max), detail re-renders with actual status as fallback * feat: create folder and add space directly from UI (nesquena#782) - After creating a folder via the file tree New folder button, offer to add it as a space via confirm dialog - Add Create folder if it doesnt exist checkbox in the New Space form - Backend: support create flag in /api/workspaces/add to mkdir before validation - i18n: 4 new keys (folder_add_as_space_title/msg/btn, workspace_auto_create_folder) in all 6 locales * fix: validate workspace path before mkdir to prevent orphan directories Review feedback (critical): the previous code called mkdir() before validate_workspace_to_add(), which meant a rejected path (e.g. system dir) would leave an orphan directory on disk. New flow: 1. Resolve path and check against blocked system roots BEFORE any mutation 2. mkdir() only if path passes the blocklist check 3. Full validation (exists, is_dir) after mkdir Also imports _workspace_blocked_roots for the pre-mutation blocklist check. * fix(nesquena#1014): classify model-not-found errors with helpful message - Add model_not_found error type to streaming.py exception classifier - Detect 404, 'not found', 'does not exist', 'invalid model' patterns - Strip HTML tags from provider error messages (nginx 404 pages, etc.) - Add model_not_found branch to apperror handler in messages.js - Add i18n key model_not_found_label in all 6 locales - 15 tests covering detection, sanitization, frontend, and i18n * feat(ui): add live TPS stat to header Adds a TPS (Tokens Per Second) chip to the right of the header title bar that updates live while AI output is streaming. Metering (api/metering.py) - Tracks per-session output + reasoning tokens via GlobalMeter singleton - Per-session TPS = total_tokens / elapsed_time - Global TPS = average of active sessions' TPS values - HIGH/LOW are max/min of global_tps snapshots over a 60-minute rolling window (only recorded when > 0, so idle periods are excluded) - Thread-safe with a single lock Metering events emitted from streaming.py - Throttled at 100ms from token/reasoning/tool callbacks so the display updates rapidly during fast token streams - 1Hz ticker as fallback for slow streams (exits when no active sessions) - Final stats emitted on stream end Routes (api/routes.py) - Removed POST /api/metering/interval endpoint (dynamic interval via focus/blur was replaced with simple always-1s-when-active approach) UI (static/messages.js, index.html, style.css) - TPS chip in titlebar: shows 'N.N t/s . N.N high . N.N low' - Default: '0.0 t/s . 0.0 high' when idle - Display updates on every metering SSE event (throttled to 100ms) * feat: session restore speed + title gen reasoning hardening (nesquena#1025, nesquena#1026) PR nesquena#1025 (@franksong2702): Speed up large session restore paths - GET /api/session?messages=0 now parses only metadata before the messages array - Metadata-only loads no longer populate the full-session LRU cache - Frontend lazy fetch uses resolve_model=0 to avoid cold model-catalog lookup - Hard reload no longer waits for populateModelDropdown() before restoring session PR nesquena#1026 (@franksong2702): Harden auto title generation for reasoning models - Raises title-gen completion budget to 512 tokens (reasoning-safe) - Retries once with 1024 tokens on empty content / finish_reason:length - Applies retry to both auxiliary and active-agent fallback routes - Preserves underlying failure reason in title_status on local fallback Co-authored-by: Frank Song <[email protected]> * feat: session attention indicators in right slot + last_message_at timestamps (nesquena#1024) PR nesquena#1024 (@franksong2702): Polish session attention indicators - Streaming spinners and unread dots now reuse the right-side actions slot - Running/unread rows hide timestamps; idle/read rows keep right-aligned timestamps - Date group carets point down when expanded, right when collapsed - Pinned group no longer repeats pinned-star icon per row - Running indicators appear immediately after send (local busy state while /api/sessions catches up) - Sidebar sorting/grouping/timestamps now prefer last_message_at (derived from last real message) so metadata-only saves don't make old sessions appear under Today Co-authored-by: Frank Song <[email protected]> * docs: v0.50.207 release notes — 10 PRs, 2169 tests (+36) --------- Co-authored-by: bergeouss <[email protected]> Co-authored-by: Josh <[email protected]> Co-authored-by: Frank Song <[email protected]> Co-authored-by: nesquena-hermes <[email protected]>

Harden title generation for reasoning models

261ffc8

nesquena-hermes requested a review from nesquena April 25, 2026 18:44

nesquena-hermes mentioned this pull request Apr 25, 2026

Harden auto title generation for reasoning models #1002

Closed

nesquena mentioned this pull request Apr 25, 2026

Speed up large session restore paths #1025

Closed

nesquena-hermes added bug Something isn't working merge soon Reviewed and queued for next release batch labels Apr 25, 2026

nesquena approved these changes Apr 25, 2026

View reviewed changes

nesquena-hermes mentioned this pull request Apr 25, 2026

v0.50.207: batch of 10 PRs — TPS stat, SSE guard, session polish, cron UX, folder create, model errors, session speed, title gen #1031

Merged

nesquena-hermes closed this Apr 25, 2026

nesquena-hermes deleted the integrate/pr1002 branch April 25, 2026 20:08

nesquena mentioned this pull request Apr 26, 2026

perf+ui: fix session-switch bottleneck -- /api/models from ~30s to ~1ms on restart #1060

Closed

franksong2702 mentioned this pull request Apr 27, 2026

bug(title): do not persist generic Conversation topic fallback #1155

Closed

nesquena-hermes mentioned this pull request Apr 27, 2026

fix(title): do not persist generic fallback titles #1157

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harden auto title generation for reasoning models#1026

Harden auto title generation for reasoning models#1026
nesquena-hermes wants to merge 1 commit intomasterfrom
integrate/pr1002

nesquena-hermes commented Apr 25, 2026

Uh oh!

nesquena left a comment

Uh oh!

nesquena-hermes commented Apr 25, 2026

Uh oh!

nesquena-hermes commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nesquena-hermes commented Apr 25, 2026

Attribution

Summary

Changes

Scope

Verification

Uh oh!

nesquena left a comment

Choose a reason for hiding this comment

Review — end-to-end ✅ (clean, no fixes needed)

Traced against upstream hermes-agent

What this PR does

End-to-end trace (the #869 happy path)

Edge-case trace

Backwards compatibility

Tests

Security audit

Recommendation

Uh oh!

nesquena-hermes commented Apr 25, 2026

Uh oh!

nesquena-hermes commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants