
feat(chat): MCP tool call loop, engine status timeline, and UI polish#509

Merged
jundot merged 3 commits into jundot:main from rayone:feat/chat-mcp-tools
Apr 4, 2026

Conversation

Contributor

@rayone rayone commented Apr 1, 2026

Problem

The built-in chat UI (/admin/chat) silently dropped responses when a model decided to call an MCP tool. streamResponse only handled delta.content and delta.reasoning_content — on finish_reason: tool_calls there was no content delta, so streamingContent stayed empty and nothing was pushed to the message list.

Solution

MCP tool call loop

  • Accumulate delta.tool_calls chunks during streaming (OpenAI chunked format, keyed by index)
  • On finish_reason: tool_calls: push a hidden assistant message with tool_calls, execute all tools in parallel via POST /v1/mcp/execute, push hidden tool result messages, then recurse into streamResponse so the model receives full tool context and streams a final answer
  • Depth limit via MAX_TOOL_DEPTH = 10 constant; abort guard before recursion respects user clicking Stop
  • 30s per-tool timeout (AbortSignal.timeout + AbortSignal.any), HTTP error detection on non-2xx responses
  • Bookkeeping assistant/tool protocol messages hidden from display via _ui: false
  • Simplified messagesForApi to a single filter+map pass that preserves tool_calls and tool_call_id fields
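The accumulation step can be sketched roughly as follows (a minimal sketch of the OpenAI chunked format; the function and variable names are illustrative, not the ones in chat.html):

```javascript
// Sketch: merge OpenAI-style streamed tool_call chunks, keyed by `index`.
// Each chunk carries a fragment of one call; `id` and `function.name`
// typically arrive once, while `function.arguments` arrives as string
// pieces that must be concatenated in order.
function accumulateToolCalls(acc, deltaToolCalls) {
  for (const chunk of deltaToolCalls) {
    const i = chunk.index;
    if (!acc[i]) {
      acc[i] = { id: "", type: "function", function: { name: "", arguments: "" } };
    }
    if (chunk.id) acc[i].id = chunk.id;
    if (chunk.function?.name) acc[i].function.name += chunk.function.name;
    if (chunk.function?.arguments) acc[i].function.arguments += chunk.function.arguments;
  }
  return acc;
}
```

On `finish_reason: tool_calls` the accumulated object values become the `tool_calls` array of the hidden assistant message.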

Engine status timeline

  • Prefill progress poller (polls /admin/api/stats every 500ms): shows Prefilling N%, then Generating
  • Per-tool timing logged on completion/failure with check/error icons and elapsed time
  • Collapsible timeline panel (activity icon toggle in message header) — entries persist on the completed message and can be expanded post-completion
  • Live status line (Starting / Prefilling / Generating / Calling tool…) with animated dots, rendered below the timeline entries
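The per-tool timing entries could look something like this (an illustrative sketch; the icon characters and formatting are assumptions, not the exact markup in chat.html):

```javascript
// Sketch: format one timeline entry for a completed or failed tool call,
// with a check/error icon and elapsed time in seconds.
function toolTimelineEntry(name, startMs, endMs, ok) {
  const elapsed = ((endMs - startMs) / 1000).toFixed(1);
  const icon = ok ? "✓" : "✗";
  return `${icon} ${name} (${elapsed}s)`;
}
```

Entries like these persist on the completed message, so the panel can be re-expanded after the stream finishes.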

Reasoning bubble

  • reasoning_content deltas accumulate live in a collapsible Thinking… bubble rendered as markdown
  • Collapses to Thinking (no ellipsis) once final content arrives
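Routing the two delta kinds into separate buffers can be sketched like this (state shape and names are illustrative):

```javascript
// Sketch: accumulate reasoning and answer deltas separately, collapsing
// the "Thinking…" bubble as soon as real content starts streaming.
function applyDelta(state, delta) {
  if (delta.reasoning_content) state.thinking += delta.reasoning_content;
  if (delta.content) {
    state.content += delta.content;
    state.thinkingOpen = false; // collapse the bubble once the answer begins
  }
  return state;
}
```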

Bug fixes

  • Response box no longer disappears on reasoning-only or empty responses — always push an assistant message on stream completion
  • Timeline toggle works consistently after content arrives — removed !finalContent guard that blocked the panel
  • Alpine reactivity for _perfLog: messages.splice(i, 1, { ...msg, ...patch }) instead of direct property assignment so Alpine detects the change and re-renders the toggle button
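The reactivity fix boils down to replacing the array element wholesale instead of mutating a property on it, which a sketch makes concrete (helper name is hypothetical):

```javascript
// Sketch: patch a message by splicing in a fresh object, so reactive
// frameworks that intercept splice (like Alpine) detect the change.
function patchMessage(messages, i, patch) {
  messages.splice(i, 1, { ...messages[i], ...patch });
}
```

Direct assignment (`messages[i]._perfLog = ...`) keeps the same object reference, so the framework may never re-render the toggle button; splicing in a new object forces the update.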

Files changed

File Change
omlx/admin/templates/chat.html MCP loop, status timeline, thinking bubble, bug fixes
tests/test_chat_tool_call.py New file — 18 tests

Testing

pytest tests/test_chat_tool_call.py -v   # 18 passed

Covers message filtering, chunk accumulation, depth boundary, result format contracts, tool status error format, and abort guard logic.

Manually verified with Qwen3.5-9B-MLX-4bit + simultaneous filesystem and Tavily MCP servers on macOS (Apple Silicon).

Checklist

  • Feature branch from main
  • Tests added and passing
  • SPDX-License-Identifier: Apache-2.0 present in new Python file
  • Code style matches project conventions
  • pyproject.toml unchanged — no new dependencies

rayone added 2 commits April 3, 2026 16:28
Remove the _noisy filter that was blocking Starting, Generating,
Prefilling, Loading model, and Thinking status from appearing in
the collapsible timeline panel.
@jundot
Owner

jundot commented Apr 4, 2026

Nice work on this. The MCP tool loop, status timeline, and thinking bubble are all solid additions. I verified the endpoint contracts (/v1/mcp/tools, /v1/mcp/execute, /admin/api/stats) and everything lines up. Security-wise, all new x-html bindings properly go through DOMPurify.sanitize(marked.parse()), so there are no XSS concerns.

I found a couple of small bugs that I'll fix in a follow-up commit after merging:

  1. Hidden messages visible in UI: assistant messages with _ui: false (the tool_calls bookkeeping turns) still show as empty bubbles because the template only checks msg.role === 'assistant'. Adding && msg._ui !== false to the x-show on line 555 fixes it.

  2. Stray </think> on abort: the abort handler still appends </think> to streamingContent, but reasoning now accumulates in streamingThinking separately. If a user clicks Stop during thinking, a literal </think> can appear in the visible content.
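Fix (1) amounts to a stricter visibility predicate, roughly (as a plain-function sketch of the Alpine x-show expression; the actual binding lives in chat.html):

```javascript
// Sketch: only render assistant turns that aren't tool_calls bookkeeping.
// Messages flagged _ui: false exist solely to carry protocol state.
function isVisibleAssistant(msg) {
  return msg.role === "assistant" && msg._ui !== false;
}
```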

Nothing blocking, merging now.

@jundot jundot merged commit 9b6ce99 into jundot:main Apr 4, 2026
jundot added a commit that referenced this pull request Apr 4, 2026
- add msg._ui !== false guard to assistant message x-show so tool_calls
  bookkeeping turns don't render as empty bubbles
- replace streamingContent </think> append with thinkingState reset in
  abort handler, since reasoning now accumulates in streamingThinking
- fix whitespace inconsistencies introduced in #509
