fix: strip webui metadata from messages before LLM API call (#66) #67
Conversation
The webui stores display-only fields on messages (attachments, timestamp, _ts) for UI rendering. These leaked into the conversation_history passed to AIAgent.run_conversation(). Most providers ignore unknown fields, but Z.AI/GLM tries to deserialize 'attachments' as its native ChatAttachments type, causing HTTP 400 on every subsequent message after an image upload.

Fix: _sanitize_messages_for_api() creates a clean copy with only API-standard keys (role, content, tool_calls, tool_call_id, name, refusal) before passing to run_conversation(). Applied to both the streaming path (streaming.py) and non-streaming path (routes.py).

Closes #66

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Agent review — APPROVED ✅

Full diff, security audit, and test run completed. Diff: 2 files, +25 / -2.

Code review: Implementation is correct.
- Whitelist approach (`role`, `content`, `tool_calls`, `tool_call_id`, `name`, `refusal`).
- Root cause is correctly identified: display-only webui fields (`attachments`, `timestamp`, `_ts`) leaking into the API payload.

Merging now.
Adds a "Local Ollama" tile to Settings → Providers that auto-detects
a host-side Ollama daemon, lists installed models, and lets the user
pick one with a single click. Routes through the existing `custom`
OpenAI-compat path — no hermes-agent change needed.
- api/ollama.py — detection probe ordered host.docker.internal →
  localhost (10s cache); /api/tags wrapper that flattens the response
  into name/size/params/quantization; use_model() that writes
  model.{provider:custom,base_url,name} into config.yaml and triggers
  the gateway hot-reload added in v0.2.0 PR nesquena#61. (A sketch of
  the probe ordering follows this list.)
- routes.py — registers GET /api/ollama/{status,models},
POST /api/ollama/{refresh,use-model}.
- static/index.html — tile in Settings → Providers with status dot,
model list, refresh button. Distinct empty/not-found states with
clear next-step guidance (install Ollama / pull a model).
- static/panels.js — loadOllamaLocal() on Settings open;
refreshOllamaLocal() / useOllamaModel() handlers.
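For illustration, a minimal sketch of an ordered probe with a short-lived cache. Only the candidate order, the /api/tags endpoint, and the 10s TTL come from the description above; the port (Ollama's default 11434), function names, and cache shape are assumptions, not the PR's actual code:

```python
# Illustrative sketch of the host-detection probe described above.
import time
import urllib.request

_CANDIDATES = ["http://host.docker.internal:11434", "http://localhost:11434"]
_CACHE_TTL = 10.0  # seconds, per the description above
_cache = {"ts": 0.0, "base_url": None}

def detect_ollama():
    """Return the first reachable Ollama base URL, or None; cached for 10s."""
    if time.monotonic() - _cache["ts"] < _CACHE_TTL:
        return _cache["base_url"]
    found = None
    for base in _CANDIDATES:
        try:
            # /api/tags is a cheap endpoint any running daemon serves
            with urllib.request.urlopen(base + "/api/tags", timeout=2):
                found = base
                break
        except OSError:
            continue  # unreachable candidate; try the next one
    _cache["ts"] = time.monotonic()
    _cache["base_url"] = found
    return found
```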
The fitb container needs `--add-host=host.docker.internal:host-gateway`
on Linux for the probe to reach the host. The parent repo adds that
flag in Electron's docker-manager.js, install.sh, and the README's
docker-run example.
Phase 1+2 of issue fox-in-the-box-ai/fox-in-the-box#66; phase 3
(pull/delete UI) and phase 4 (onboarding integration) tracked
separately as nesquena#67 and #11 respectively.
Closes the remaining ACs from nesquena#66 — users no longer need a terminal to install or remove Ollama models. Two new endpoints, a richer Local Ollama tile, real-time progress with speed and ETA.

Backend (api/ollama.py):
- get_models() now returns total_size_bytes for the disk-usage indicator
- _validate_model_name() — server-side allowlist regex (^[A-Za-z0-9._:/-]+$, ≤200 chars). Rejects shell metacharacters, whitespace, path traversal, and control chars before any subprocess or HTTP call could touch the value (see the sketch below)
- _iter_pull_stream() / stream_pull() — SSE proxy for Ollama's NDJSON /api/pull. Three event types:
  - progress — every status line with model context attached
  - done — terminal success, triggers cache invalidation
  - error — validation, daemon-down, HTTP 4xx, mid-stream error, JSON parse, network drop. Emitted, then the stream closes. Client disconnect mid-pull is logged at info — Ollama keeps pulling in the background by design.
- delete_model() — looks up size from the cached /api/tags before deleting so the response includes freed_bytes for a friendly "deleted X — freed Y MB" toast. Distinguishes 404 (not installed) from generic Ollama errors.

Routes (api/routes.py):
- POST /api/ollama/pull (SSE)
- POST /api/ollama/delete (POST, not DELETE — hermes-webui's request dispatcher only routes GET/POST at the framework level)

Frontend (static/panels.js + index.html):
- Total-disk indicator in the tile header (visible whenever the daemon is reachable)
- Per-row Delete button next to the existing Use button. The confirmation dialog includes the freed-space estimate; the toast reports the actual freed bytes from the server response
- Pull form below the model list: free-form input with a datalist of curated suggestions (llama3.1:8b, mistral:7b, phi4-mini, deepseek-coder-v2:16b, gemma3:4b, qwen3:4b)
- Recommended-models card shown when zero models are installed — one-click Pull buttons for the four canonical recommendations
- Progress block uses fetch streaming + a manual SSE parse (browsers' EventSource is GET-only; we want POST + a body for symmetry with the rest of the API). Renders a percentage bar, current/total bytes, instantaneous bytes/sec speed (rolling 4-sample window), and ETA. After success the list reloads so the new model immediately appears (covers the AC).

Verified e2e against an extended mock Ollama:
- pull → SSE stream emits 11 progress events then done; the mock model store reflects the new model
- post-pull /api/ollama/models shows the model with the correct size; total_size_bytes increases
- delete returns {ok:true, freed_bytes:N}; post-refresh the model is gone; the total decreases
- error paths: empty/invalid name in pull, invalid chars in delete, delete-nonexistent, daemon down — all surface friendly errors, no crashes, no shell injection
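A sketch of what that allowlist check could look like. The regex and the 200-char cap come from the description above; the boolean signature and the explicit traversal guard are assumptions (dots and slashes are legal in model names, so the character class alone would not block `..`):

```python
import re

# Allowlist from the PR description: letters, digits, dot, underscore,
# colon, slash, hyphen; at most 200 characters.
_MODEL_NAME_RE = re.compile(r"^[A-Za-z0-9._:/-]+$")

def _validate_model_name(name: str) -> bool:
    """True if the name is safe to forward to a subprocess or HTTP call.

    The character class excludes shell metacharacters, whitespace, and
    control characters outright; dots and slashes are legal in model
    names (e.g. registry paths), so path traversal gets its own check.
    """
    if not name or len(name) > 200:
        return False
    if ".." in name:  # assumption: traversal guarded separately from the regex
        return False
    return _MODEL_NAME_RE.fullmatch(name) is not None
```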
Summary
Fixes #66 -- after attaching an image to chat, subsequent messages fail with HTTP 400 from Z.AI/GLM because the webui `attachments` field leaks into the API payload.

Root cause: The webui stores display-only metadata on messages (`attachments`, `timestamp`, `_ts`) for UI rendering. These are passed to `AIAgent.run_conversation()` via `conversation_history=s.messages`. Most providers silently ignore unknown fields, but Z.AI/GLM tries to deserialize `attachments` as its native `ChatAttachments` type and fails with a JSON parse error.

Fix: New `_sanitize_messages_for_api()` helper in `streaming.py` creates a clean copy of messages with only API-standard keys (`role`, `content`, `tool_calls`, `tool_call_id`, `name`, `refusal`). Applied to both code paths:
- `streaming.py:190` (streaming/SSE path)
- `routes.py:952` (non-streaming fallback path)
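For illustration, a helper along these lines would match the description. The function name and key set are from the PR text; the body is a sketch, not the actual diff:

```python
# Keys permitted through to the provider API, per the PR description.
_API_KEYS = {"role", "content", "tool_calls", "tool_call_id", "name", "refusal"}

def _sanitize_messages_for_api(messages):
    """Copy each message, keeping only API-standard keys.

    The originals are not mutated, so webui-only fields (attachments,
    timestamp, _ts) stay on the session for UI rendering.
    """
    return [{k: v for k, v in m.items() if k in _API_KEYS} for m in messages]
```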
The original session data is unchanged -- `attachments`, `timestamp`, etc. remain for the UI.

Files changed
- `api/streaming.py` -- new `_sanitize_messages_for_api()`, applied to the `run_conversation()` call
- `api/routes.py` -- sanitized `run_conversation()` call

Test plan
- `attachments` badge still shows on user messages in the UI (display data preserved)

Generated with Claude Code