feat(chat): MCP tool call loop, engine status timeline, and UI polish #509
jundot merged 3 commits into jundot:main
The built-in chat UI (/admin/chat) silently dropped responses whenever a model returned finish_reason: tool_calls. streamResponse only handled delta.content and delta.reasoning_content, so tool-calling models produced no visible output.

MCP tool call loop:
- Accumulate delta.tool_calls streaming chunks by index (OpenAI format)
- On finish_reason tool_calls: push hidden assistant message with tool_calls, execute all tools in parallel via POST /v1/mcp/execute, push hidden tool result messages, recurse into streamResponse so the model streams a final answer with full tool context
- Depth limit (MAX_TOOL_DEPTH = 10); abort guard before recursion
- 30s per-tool timeout via AbortSignal.timeout + AbortSignal.any
- HTTP error detection on /v1/mcp/execute (non-2xx)
- Hide bookkeeping assistant/tool messages via _ui: false flag
- Simplified messagesForApi to single filter+map preserving tool fields

Engine status timeline and streaming UI:
- Prefill progress poller (/admin/api/stats every 500ms): Prefilling N%
- Per-tool timing with check/error icons and elapsed time
- Collapsible timeline panel (activity icon toggle) persists on completed messages and can be expanded post-completion
- Live status line (Starting / Prefilling / Generating / Calling tool) with animated dots, always at the bottom of the timeline
- Live reasoning bubble: reasoning_content deltas in a collapsible Thinking bubble rendered as markdown

Bug fixes:
- Response box no longer disappears on reasoning-only or empty responses
- Timeline toggle works consistently: removed !finalContent guard
- Alpine reactivity: messages.splice instead of direct property mutation for _perfLog attachment so Alpine re-renders the toggle button

Tests (tests/test_chat_tool_call.py, 18 tests):
- Message filtering, chunk accumulation, depth boundary, result format contracts, tool status error format, abort guard logic

Tested with Qwen3.5-9B-MLX-4bit + simultaneous filesystem and Tavily MCP servers on macOS (Apple Silicon).
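The chunk accumulation step in the commit message above can be illustrated with a minimal sketch. It assumes the OpenAI streaming shape, where each delta.tool_calls entry carries an index and a fragment of the call. The helper name accumulateToolCalls and the tool names read_file and search are hypothetical and do not come from this PR; the actual code in chat.html may be structured differently.

```javascript
// Sketch only: merge OpenAI-style streamed tool_calls fragments by index.
// `accumulateToolCalls` is a hypothetical helper name, not taken from the PR.
function accumulateToolCalls(acc, deltaToolCalls) {
  for (const part of deltaToolCalls ?? []) {
    const i = part.index;
    // The first fragment seen for an index creates its slot.
    if (!acc[i]) {
      acc[i] = { id: "", type: "function", function: { name: "", arguments: "" } };
    }
    if (part.id) acc[i].id = part.id;
    if (part.function?.name) acc[i].function.name += part.function.name;
    // Arguments arrive as JSON text split across chunks; concatenate verbatim
    // and parse only after the stream has finished.
    if (part.function?.arguments) acc[i].function.arguments += part.function.arguments;
  }
  return acc;
}

// Illustrative stream: two calls, with the first call's JSON split in two
// and interleaved with the second call.
const chunks = [
  [{ index: 0, id: "call_a", function: { name: "read_file", arguments: "" } }],
  [{ index: 0, function: { arguments: '{"path":' } }],
  [{ index: 1, id: "call_b", function: { name: "search", arguments: '{"q":"mlx"}' } }],
  [{ index: 0, function: { arguments: '"/tmp/x"}' } }],
];
const toolCalls = chunks.reduce(accumulateToolCalls, []);
```

Keying by index rather than by id matters because, in this format, only the first fragment of a call usually carries the id; later fragments are matched to their call by index alone.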
Remove the _noisy filter that was blocking Starting, Generating, Prefilling, Loading model, and Thinking status from appearing in the collapsible timeline panel.
Nice work on this. The MCP tool loop, status timeline, and thinking bubble are all solid additions. I verified the endpoint contracts (…). I found a couple of small bugs that I'll fix in a follow-up commit after merging:
Nothing blocking, merging now.
- add msg._ui !== false guard to assistant message x-show so tool_calls bookkeeping turns don't render as empty bubbles
- replace streamingContent </think> append with thinkingState reset in abort handler, since reasoning now accumulates in streamingThinking
- fix whitespace inconsistencies introduced in #509
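The msg._ui !== false guard makes more sense next to the API-side pass, which has to keep the same hidden turns. The sketch below shows both views over one message list. The helper names visibleMessages and toApiMessages, the role allow-list, and the list_dir tool are assumptions for illustration; the real filter predicate inside messagesForApi is not shown in this PR text.

```javascript
// Sketch only: one message list, two views. Message shapes follow the OpenAI
// chat format; `visibleMessages` and `toApiMessages` are hypothetical names.
const messages = [
  { role: "user", content: "List /tmp" },
  // Bookkeeping turn: carries tool_calls and is hidden from the UI.
  {
    role: "assistant", content: "", _ui: false,
    tool_calls: [{ id: "call_a", type: "function",
                   function: { name: "list_dir", arguments: '{"path":"/tmp"}' } }],
  },
  { role: "tool", tool_call_id: "call_a", content: "a.txt", _ui: false },
  { role: "assistant", content: "There is one file: a.txt" },
];

// Display side: the same predicate as the x-show guard. Messages without a
// _ui field stay visible because undefined !== false.
const visibleMessages = (msgs) => msgs.filter((m) => m._ui !== false);

// API side: a single filter+map pass. The role allow-list is an assumption.
const API_ROLES = new Set(["system", "user", "assistant", "tool"]);
function toApiMessages(msgs) {
  return msgs
    .filter((m) => API_ROLES.has(m.role))
    .map((m) => {
      // Copy only protocol fields, so UI-only flags such as _ui are dropped.
      const out = { role: m.role, content: m.content };
      if (m.tool_calls) out.tool_calls = m.tool_calls;
      if (m.tool_call_id) out.tool_call_id = m.tool_call_id;
      return out;
    });
}
```

With this split, the hidden assistant and tool turns never render as bubbles, yet the model still receives them on the next request.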
Problem
The built-in chat UI (/admin/chat) silently dropped responses when a model decided to call an MCP tool. streamResponse only handled delta.content and delta.reasoning_content; on finish_reason: tool_calls there was no content delta, so streamingContent stayed empty and nothing was pushed to the message list.
Solution
MCP tool call loop
- Accumulate delta.tool_calls chunks during streaming (OpenAI chunked format, keyed by index)
- On finish_reason: tool_calls: push a hidden assistant message with tool_calls, execute all tools in parallel via POST /v1/mcp/execute, push hidden tool result messages, then recurse into streamResponse so the model receives full tool context and streams a final answer
- Depth limit via the MAX_TOOL_DEPTH = 10 constant; abort guard before recursion respects user clicking Stop
- 30s per-tool timeout (AbortSignal.timeout + AbortSignal.any), HTTP error detection on non-2xx responses
- Bookkeeping assistant/tool protocol messages hidden from display via _ui: false
- Simplified messagesForApi to a single filter+map pass that preserves tool_calls and tool_call_id fields
Engine status timeline
- Prefill progress poller (/admin/api/stats every 500ms): shows Prefilling N% → Generating
- Live status line (Starting / Prefilling / Generating / Calling tool…) with animated dots, rendered below the timeline entries
Reasoning bubble
- reasoning_content deltas accumulate live in a collapsible Thinking… bubble rendered as markdown
- Label changes to Thinking (no ellipsis) once final content arrives
Bug fixes
- Timeline toggle: removed the !finalContent guard that blocked the panel
- Alpine reactivity for _perfLog: messages.splice(i, 1, { ...msg, ...patch }) instead of direct property assignment so Alpine detects the change and re-renders the toggle button
Files changed
- omlx/admin/templates/chat.html
- tests/test_chat_tool_call.py
Testing
Manually verified with Qwen3.5-9B-MLX-4bit + simultaneous filesystem and Tavily MCP servers on macOS (Apple Silicon).
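The depth boundary and abort guard can also be exercised without a live model or MCP server. The sketch below is one possible shape for the loop and is not the code in chat.html. streamOnce and executeTool are hypothetical injected stand-ins for the streaming request and the POST /v1/mcp/execute call, and the behaviour at the depth limit (a plain assistant notice) and on tool failure (an "Error: …" result string) are assumptions.

```javascript
// Sketch only: recursive tool loop with a depth limit and an abort guard.
// `streamOnce` and `executeTool` are hypothetical injected stand-ins.
const MAX_TOOL_DEPTH = 10; // value stated in the PR

async function runToolLoop(messages, deps, depth = 0) {
  const { streamOnce, executeTool, signal } = deps;
  const { content, toolCalls, finishReason } = await streamOnce(messages);

  // Normal completion: show the answer and stop.
  if (finishReason !== "tool_calls") {
    messages.push({ role: "assistant", content });
    return messages;
  }
  // Depth limit (assumed behaviour): stop instead of running more tools.
  if (depth >= MAX_TOOL_DEPTH) {
    messages.push({ role: "assistant", content: "Tool call depth limit reached." });
    return messages;
  }

  // Hidden bookkeeping turn that records what the model asked for.
  messages.push({ role: "assistant", content: content ?? "", tool_calls: toolCalls, _ui: false });

  // Run all tools in parallel; a failure becomes that tool's result text
  // so the remaining results are still delivered.
  const results = await Promise.all(
    toolCalls.map(async (call) => {
      try {
        return String(await executeTool(call));
      } catch (err) {
        return `Error: ${err.message}`;
      }
    })
  );
  toolCalls.forEach((call, i) => {
    messages.push({ role: "tool", tool_call_id: call.id, content: results[i], _ui: false });
  });

  // Abort guard: do not start another round after the user pressed Stop.
  if (signal?.aborted) return messages;
  return runToolLoop(messages, deps, depth + 1);
}
```

Injecting the two functions keeps the loop testable: a fake streamOnce that always returns finish_reason tool_calls is called MAX_TOOL_DEPTH + 1 times before the loop gives up, and an AbortController aborted during tool execution stops it after a single round.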
Checklist
- main
- SPDX-License-Identifier: Apache-2.0 present in new Python file
- pyproject.toml unchanged, no new dependencies