
Show agent turn duration in WebUI #1592

Merged
2 commits merged into nesquena:master from Michaelyklam:feat/turn-duration-display
May 4, 2026

Conversation

@Michaelyklam
Contributor

Summary

  • measure assistant turn duration from the backend pending_started_at timestamp and include it in the streaming done usage payload
  • persist the value on assistant messages as _turnDuration so reloads keep the display
  • show Done in … on the compact Activity row, and as a subtle assistant footer chip in expanded tool-call mode
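
The wiring described by these bullets can be sketched end-to-end in a few lines. This is a hedged reconstruction, not the PR's code: only `pending_started_at`, `duration_seconds`, and `_turnDuration` come from the PR; the function name and session shape are assumptions.

```python
import time

def build_done_usage(session, usage: dict) -> dict:
    """Sketch: attach the turn duration to the streaming `done` usage payload.

    `session.pending_started_at` is set when the turn starts (per the PR);
    `build_done_usage` itself is a hypothetical helper name.
    """
    started = getattr(session, 'pending_started_at', None)
    if started is None:
        # Recovered/legacy flows where the marker is absent: fall back to now.
        started = time.time()
    usage['duration_seconds'] = round(time.time() - started, 3)
    return usage
```

The frontend then copies `duration_seconds` onto the assistant message as `_turnDuration` so reloads keep the display.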

Screenshots / QA

  • Compact mode: Activity row shows Done in 1m 12s, no duplicate footer duration
  • Expanded mode: individual tool cards remain expanded and assistant footer shows Done in 1m 12s

Local browser QA screenshots were captured during validation:

  • MEDIA:/home/michael/.hermes/cache/screenshots/browser_screenshot_143f3490bff248628f89441683062dbf.png
  • MEDIA:/home/michael/.hermes/cache/screenshots/browser_screenshot_74f62a3c45ae4d37ab05c898aa850752.png

Tests

  • python -m pytest tests/test_turn_duration_display.py tests/test_ui_tool_call_cleanup.py tests/test_streaming_markdown.py tests/test_sprint42.py tests/test_sprint49.py -q → 97 passed
  • git diff --check
  • python -m py_compile api/streaming.py
  • Full python -m pytest tests/ -q attempted: 4088 passed, 9 failed, 2 skipped. The 9 failures are existing environment/config-sensitive tests unrelated to this change (test_issue1094_provider_bugs.py, test_model_resolver.py, onboarding MVP tests, and test_sprint28.py::test_personalities_empty_when_none_exist).

@nesquena-hermes
Collaborator

Thanks @Michaelyklam — duration on the Activity row is the kind of small thing that meaningfully changes how the UI feels during long agent turns. The implementation reads cleanly: backend stores _turnDuration on the assistant message so reloads keep the display, the streaming done payload now carries duration_seconds, and the renderer hides the footer chip when compact activity is showing the same value to avoid duplication. That last bit is a nice touch.

I pulled the branch and ran the new + adjacent suites:

tests/test_turn_duration_display.py
tests/test_ui_tool_call_cleanup.py
tests/test_streaming_markdown.py
tests/test_sprint42.py tests/test_sprint49.py
→ 97 passed in 2.41s

Plus a focused sweep across streaming / sse / usage / activity-tagged tests (~310 passed). No new regressions surfaced.

A few notes from the diff

  1. Start-time fallback when pending_started_at is missing. Line in api/streaming.py:

    _turn_started_at = getattr(s, 'pending_started_at', None) or time.time()

    This is the right default for normal flows — /api/chat/start sets pending_started_at = time.time() immediately before the agent thread spawns. For a recovered/re-started session where pending state was already cleared by _recover_pending_turn before this code runs, the fallback to "now" means duration starts ticking from inside _run_agent_streaming, which is close enough to the truth that I don't think it's worth over-engineering. Worth a one-line comment though, since "or time.time()" reads as if it could silently produce a near-zero duration in a regression scenario.

  2. The 0 falsy edge case. getattr(...) or time.time() will also fall through if pending_started_at == 0 (truthy-falsy). That's not realistic since it's a UNIX timestamp set with time.time(), but pending_started_at if pending_started_at is not None else time.time() is a touch safer if you want to be explicit. Optional.
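
    A self-contained illustration of that falsy edge, showing how the two fallback spellings diverge on a zero timestamp:

    ```python
    import time

    pending_started_at = 0  # pathological falsy timestamp, not a realistic value

    # `or` falls through to "now" on ANY falsy value, including 0:
    via_or = pending_started_at or time.time()

    # An explicit None check preserves 0:
    via_none_check = pending_started_at if pending_started_at is not None else time.time()

    print(via_or != 0, via_none_check == 0)  # True True
    ```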

  3. Compact-mode duplicate suppression. The new logic in renderMessages():

    const compactActivityForMessage = isSimplifiedToolCalling() && (
      assistantThinking.has(mi) ||
      (S.toolCalls || []).some(tc => tc && (tc.assistant_msg_idx !== undefined ? tc.assistant_msg_idx : -1) === mi)
    );
    const durationText = compactActivityForMessage ? '' : _formatTurnDuration(msg._turnDuration);

    This correctly suppresses the footer chip when the Activity row is showing duration. One subtle corner: a turn that produces only an assistant text reply (no tool calls, no thinking) has no Activity group, so the duration falls through to the footer, which is what we want. A thinking-only turn might look like it would be compacted via assistantThinking.has(mi), but compactActivityForMessage is gated on isSimplifiedToolCalling() first, so with compact mode off the footer chip is preserved. The footer is only suppressed when there is actually an Activity row to show the duration in. Good.

  4. The data-turn-duration attribute round-trip. _syncToolCallGroupSummary reads group.dataset.turnDuration, which means a group rendered before duration arrived (e.g. mid-stream) won't show the duration text until the attribute is set + summary is re-synced. The attribute set happens in renderMessages() based on sourceMsg._turnDuration, and attachLiveStream updates lastAsst._turnDuration from d.usage.duration_seconds on the done payload. So the path is: done event → _turnDuration populated → next render sets data-turn-duration → _syncToolCallGroupSummary reads it. Looks correctly wired.

  5. Tiny formatter nit. _formatTurnDuration returns ${m}m ${s}s for ≥60s but ${h}h ${m}m for ≥3600s — dropping seconds at the hour boundary. That's fine and probably desired (nobody cares about the seconds in a 2h17m turn), but a 1h00m12s turn renders as "1h 0m" which reads slightly odd. Not blocking.
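
    Since the formatter is only described here, a hedged Python reconstruction of the behaviour in question (the real _formatTurnDuration is WebUI JavaScript; the thresholds are as described above, the name and exact strings here are illustrative):

    ```python
    def _format_turn_duration(seconds):
        # Mirrors the described behaviour: m/s below an hour, h/m above it,
        # with seconds dropped past the hour boundary.
        s = int(round(seconds))
        if s < 60:
            return f"{s}s"
        if s < 3600:
            return f"{s // 60}m {s % 60}s"
        return f"{s // 3600}h {(s % 3600) // 60}m"

    print(_format_turn_duration(72))    # 1m 12s
    print(_format_turn_duration(3612))  # 1h 0m  <- the slightly odd reading
    ```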

  6. 97 passed locally vs the 9 unrelated failures in full pytest. Confirmed those are environment-dependent (test_issue1094_provider_bugs.py, model_resolver, onboarding MVP, sprint28 personalities) and pre-existed master. Not blocking on this PR.

One thing worth double-checking

The _turnDuration is only persisted on the last assistant message in the loop:

if s.messages:
    for _dm in reversed(s.messages):
        if isinstance(_dm, dict) and _dm.get('role') == 'assistant':
            _dm['_turnDuration'] = round(_turn_duration_seconds, 3)
            break

That matches how the existing _turnUsage is written elsewhere, so it's consistent. But if a turn produces multiple assistant messages (a tool-call assistant message followed by a final-text assistant message, the standard tool-use pattern), only the final one gets the duration. The Activity row shows duration via data-turn-duration set on the assistant index that has the activity group, so it's worth checking which index that is.

Looking at renderMessages():

const sourceMsg=S.messages[aIdx]||{};
if(sourceMsg._turnDuration!==undefined) group.setAttribute('data-turn-duration', String(sourceMsg._turnDuration));

aIdx here is the assistant index that anchors the activity group. Tool-use turns typically have one consolidated final assistant message that owns the activity group, so this should land correctly. Worth a manual QA pass on a multi-turn-step scenario (Codex doing 5+ tool calls before its final reply) to confirm "Done in 3m 12s" actually appears on the Activity row in both compact and expanded modes. I think your screenshots capture this case, but it's worth verifying once more.
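
To make the multi-assistant-message concern concrete, the reverse-scan can be reproduced in isolation (the message contents here are made up):

```python
def persist_turn_duration(messages, duration_seconds):
    # Same reverse-scan as the PR's loop: only the LAST assistant
    # message in the turn receives _turnDuration.
    for msg in reversed(messages):
        if isinstance(msg, dict) and msg.get('role') == 'assistant':
            msg['_turnDuration'] = round(duration_seconds, 3)
            break

msgs = [
    {'role': 'user', 'content': 'run the checks'},
    {'role': 'assistant', 'tool_calls': ['...']},          # tool-call step
    {'role': 'tool', 'content': 'ok'},
    {'role': 'assistant', 'content': 'All checks pass.'},  # final reply
]
persist_turn_duration(msgs, 72.0)
print('_turnDuration' in msgs[1], '_turnDuration' in msgs[3])  # False True
```

So the duration lands on the final reply only, which is fine as long as that message is the one anchoring the activity group.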

Verdict

This is well-scoped, has solid test coverage, and the UX choice to dedupe duration between Activity row and footer is the right call. Will queue for stage review. Thanks for the careful work.

@Michaelyklam
Contributor Author

Thanks for the careful review — I made the small cleanup you called out around the start-time fallback.

Follow-up commit: 0eddb05

What changed:

  • Switched the start-time fallback from getattr(..., None) or time.time() to an explicit is not None check, so even an explicit falsy timestamp is preserved.
  • Added a short comment explaining that pending_started_at is the normal path and time.time() is only for recovered/legacy flows where the marker is absent.
  • Tightened the regression test to cover that explicit fallback and comment so this doesn't drift back.

I also re-checked the multi-step/tool-call UI path you mentioned with a synthetic browser QA turn containing thinking + 3 tools:

  • Compact mode: single Activity: thinking + 3 tools row shows Done in 3m 12s, with no duplicate footer duration.
  • Expanded mode: individual tool rows are shown and the assistant footer shows Done in 3m 12s, with no compact Activity duration row.

Verification:

  • /home/michael/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_turn_duration_display.py tests/test_ui_tool_call_cleanup.py tests/test_streaming_markdown.py tests/test_sprint42.py tests/test_sprint49.py -q → 97 passed
  • git diff --check
  • /home/michael/.hermes/hermes-agent/venv/bin/python -m py_compile api/streaming.py
  • Browser QA screenshots:
    • compact: MEDIA:/home/michael/.hermes/cache/screenshots/browser_screenshot_252bbf66db4044219902f0b1687fc28c.png
    • expanded: MEDIA:/home/michael/.hermes/cache/screenshots/browser_screenshot_e3201d647b3b4e93a1491eded15fd16e.png

nesquena-hermes pushed a commit that referenced this pull request May 4, 2026
… — 4094→4111 tests

- #1586 (Michaelyklam): login asset SW cache exemption
- #1590 (Michaelyklam): hot-apply compact tool activity setting
- #1591 (Michaelyklam): first-turn sidebar visibility (optimistic upserts)
- #1592 (Michaelyklam): turn duration display (Done in 1m 12s) + Opus follow-up (truthy-check on _pending_started_at)
- #1464 (JKJameson, maintainer-augmented): workspace dropdown sort+search+chip-sync (rebased + ternary fix + regression test)

Maintainer-side test fixes in stage:
- tests/test_465_session_branching.py: widen compact() search window 1500→3000
- tests/test_regressions.py: anchor on api('/api/chat/start' instead of comment line

Browser API sanity: 11/11 passed. Live UX verification: vision-confirmed dropdown sort+search+empty-state on test server. Opus advisor: SHIP AS-IS.
@nesquena-hermes closed this pull request by merging all changes into nesquena:master in 4559163 on May 4, 2026
pull Bot pushed a commit to JamesWilliam1977/hermes-webui that referenced this pull request May 4, 2026