feat(chat): MCP tool call loop, engine status timeline, and UI polish by rayone · Pull Request #509 · jundot/omlx

rayone · 2026-04-01T10:16:11Z

Problem

The built-in chat UI (/admin/chat) silently dropped responses when a model decided to call an MCP tool. streamResponse only handled delta.content and delta.reasoning_content — on finish_reason: tool_calls there was no content delta, so streamingContent stayed empty and nothing was pushed to the message list.

Solution

MCP tool call loop

Accumulate delta.tool_calls chunks during streaming (OpenAI chunked format, keyed by index)
On finish_reason: tool_calls: push a hidden assistant message with tool_calls, execute all tools in parallel via POST /v1/mcp/execute, push hidden tool result messages, then recurse into streamResponse so the model receives full tool context and streams a final answer
Depth limit via MAX_TOOL_DEPTH = 10 constant; abort guard before recursion respects user clicking Stop
30s per-tool timeout (AbortSignal.timeout + AbortSignal.any), HTTP error detection on non-2xx responses
Bookkeeping assistant/tool protocol messages hidden from display via _ui: false
Simplified messagesForApi to a single filter+map pass that preserves tool_calls and tool_call_id fields

Engine status timeline

Prefill progress poller (polls /admin/api/stats every 500ms): shows Prefilling N% → Generating
Per-tool timing logged on completion/failure with check/error icons and elapsed time
Collapsible timeline panel (activity icon toggle in message header) — entries persist on the completed message and can be expanded post-completion
Live status line (Starting / Prefilling / Generating / Calling tool…) with animated dots, rendered below the timeline entries

Reasoning bubble

reasoning_content deltas accumulate live in a collapsible Thinking… bubble rendered as markdown
Collapses to Thinking (no ellipsis) once final content arrives

Bug fixes

Response box no longer disappears on reasoning-only or empty responses — always push an assistant message on stream completion
Timeline toggle works consistently after content arrives — removed !finalContent guard that blocked the panel
Alpine reactivity for _perfLog: messages.splice(i, 1, { ...msg, ...patch }) instead of direct property assignment so Alpine detects the change and re-renders the toggle button

Files changed

File	Change
`omlx/admin/templates/chat.html`	MCP loop, status timeline, thinking bubble, bug fixes
`tests/test_chat_tool_call.py`	New file — 18 tests

Testing

pytest tests/test_chat_tool_call.py -v   # 18 passed

Manually verified with Qwen3.5-9B-MLX-4bit + simultaneous filesystem and Tavily MCP servers on macOS (Apple Silicon).

Checklist

Feature branch from main
Tests added and passing
SPDX-License-Identifier: Apache-2.0 present in new Python file
Code style matches project conventions
pyproject.toml unchanged — no new dependencies

The built-in chat UI (/admin/chat) silently dropped responses whenever a model returned finish_reason: tool_calls — streamResponse only handled delta.content and delta.reasoning_content, so tool-calling models produced no visible output. MCP tool call loop: - Accumulate delta.tool_calls streaming chunks by index (OpenAI format) - On finish_reason tool_calls: push hidden assistant message with tool_calls, execute all tools in parallel via POST /v1/mcp/execute, push hidden tool result messages, recurse into streamResponse so the model streams a final answer with full tool context - Depth limit (MAX_TOOL_DEPTH = 10); abort guard before recursion - 30s per-tool timeout via AbortSignal.timeout + AbortSignal.any - HTTP error detection on /v1/mcp/execute (non-2xx) - Hide bookkeeping assistant/tool messages via _ui: false flag - Simplified messagesForApi to single filter+map preserving tool fields Engine status timeline and streaming UI: - Prefill progress poller (/admin/api/stats every 500ms): Prefilling N% - Per-tool timing with check/error icons and elapsed time - Collapsible timeline panel (activity icon toggle) persists on completed messages and can be expanded post-completion - Live status line (Starting / Prefilling / Generating / Calling tool) with animated dots, always at the bottom of the timeline - Live reasoning bubble: reasoning_content deltas in a collapsible Thinking bubble rendered as markdown Bug fixes: - Response box no longer disappears on reasoning-only or empty responses - Timeline toggle works consistently: removed !finalContent guard - Alpine reactivity: messages.splice instead of direct property mutation for _perfLog attachment so Alpine re-renders the toggle button Tests (tests/test_chat_tool_call.py, 18 tests): - Message filtering, chunk accumulation, depth boundary, result format contracts, tool status error format, abort guard logic Tested with Qwen3.5-9B-MLX-4bit + simultaneous filesystem and Tavily MCP servers on macOS (Apple Silicon).

Remove the _noisy filter that was blocking Starting, Generating, Prefilling, Loading model, and Thinking status from appearing in the collapsible timeline panel.

jundot · 2026-04-04T20:18:18Z

Nice work on this. The MCP tool loop, status timeline, and thinking bubble are all solid additions. I verified the endpoint contracts (/v1/mcp/tools, /v1/mcp/execute, //admin/api/stats) and everything lines up. Security-wise, all new x-html bindings properly go through DOMPurify.sanitize(marked.parse()) so no XSS concerns.

I found a couple of small bugs that i'll fix in a follow-up commit after merging:

Hidden messages visible in UI: assistant messages with _ui: false (the tool_calls bookkeeping turns) still show as empty bubbles because the template only checks msg.role === 'assistant'. Adding && msg._ui !== false to the x-show on line 555 fixes it.
Stray </think> on abort: the abort handler still appends </think> to streamingContent, but reasoning now accumulates in streamingThinking separately. If a user clicks Stop during thinking, a literal </think> can appear in the visible content.

Nothing blocking, merging now.

- add msg._ui !== false guard to assistant message x-show so tool_calls bookkeeping turns don't render as empty bubbles - replace streamingContent </think> append with thinkingState reset in abort handler, since reasoning now accumulates in streamingThinking - fix whitespace inconsistencies introduced in #509

jundot force-pushed the main branch from 2d46d30 to d0f5a38 Compare April 2, 2026 02:13

rayone added 2 commits April 3, 2026 16:28

fix(chat): show engine status events in timeline

ad67523

Remove the _noisy filter that was blocking Starting, Generating, Prefilling, Loading model, and Thinking status from appearing in the collapsible timeline panel.

fix(chat): show engine status events in timeline

f1bdd49

Remove the _noisy filter that was blocking Starting, Generating, Prefilling, Loading model, and Thinking status from appearing in the collapsible timeline panel.

jundot merged commit 9b6ce99 into jundot:main Apr 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(chat): MCP tool call loop, engine status timeline, and UI polish#509

feat(chat): MCP tool call loop, engine status timeline, and UI polish#509
jundot merged 3 commits intojundot:mainfrom
rayone:feat/chat-mcp-tools

rayone commented Apr 1, 2026

Uh oh!

jundot commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rayone commented Apr 1, 2026

Problem

Solution

MCP tool call loop

Engine status timeline

Reasoning bubble

Bug fixes

Files changed

Testing

Checklist

Uh oh!

jundot commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants