Bug Description
In openviking/models/vlm/backends/openai_vlm.py, all four completion methods (get_completion, get_completion_async, get_vision_completion, get_vision_completion_async) accept a thinking: bool = False parameter, but this parameter is never used to set enable_thinking in the API request body.
Models like qwen3.5-plus and qwen3.5-flash (DashScope) default to thinking mode (chain-of-thought reasoning). Since enable_thinking is never disabled, every memory extraction call triggers full CoT reasoning, causing severe timeouts.
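Roughly, the buggy shape looks like this (a simplified illustration, not the actual body of openai_vlm.py; the client/signature details are assumed):

```python
# Simplified sketch of the bug: `thinking` is accepted but never forwarded,
# so the model's server-side default (thinking ON for qwen3.5-plus/flash) applies.
def get_completion(client, model, messages, thinking: bool = False):
    kwargs = {"model": model, "messages": messages}
    # BUG: nothing here reads `thinking`, so enable_thinking is never set
    return client.chat.completions.create(**kwargs)
```

The same pattern repeats in all four methods, async and vision variants included.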
Steps to Reproduce
- Configure openviking with qwen3.5-plus or qwen3.5-flash as the VLM model via DashScope
- Integrate with OpenClaw with autoCapture: true
- Send a message through the channel (e.g., Feishu)
- Observe OpenClaw logs after the agent replies
Expected Behavior
When thinking=False (the default), the backend should pass extra_body={"enable_thinking": False} to the API, disabling unnecessary chain-of-thought reasoning for simple memory extraction tasks.
Actual Behavior
auto-capture failed: AbortError: This operation was aborted appears in OpenClaw logs after 15-60 seconds. The root cause is that qwen3.5-plus/flash default to thinking mode, so each /extract API call spends 60+ seconds on CoT reasoning before the request times out.
Minimal Reproducible Example
# Fix: add to each kwargs block in openai_vlm.py before the API call
if not thinking:
    kwargs["extra_body"] = {"enable_thinking": False}
Error Logs
2026-03-24T15:13:27 openviking: auto-capture failed: AbortError: This operation was aborted
2026-03-24T15:30:02 openviking: auto-capture failed: AbortError: This operation was aborted
# Timing: capture-check triggers at T+0, AbortError at T+15s (default) or T+60s (after raising timeoutMs)
OpenViking Version
0.2.9
Python Version
3.11
Operating System
Linux
Model Backend
OpenAI
Additional Context
Measured API latency on DashScope:
- qwen3.5-plus (thinking ON, default): ~9s for a simple "hello" → 60s+ for conversation extract
- qwen3.5-flash (thinking ON, default): ~5s for "hello"
- qwen-turbo (no thinking): ~0.8s
The thinking parameter already exists in the method signature — it just needs to be wired to extra_body.
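Until that wiring lands, a user-side workaround is to wrap the client's create call so every request carries the flag. This is a sketch; the attribute path vlm.client in the usage comment is an assumption about openviking's internals, but the wrapper itself only relies on the OpenAI-SDK-style keyword-argument interface:

```python
import functools

def force_no_thinking(create_fn):
    """Wrap an OpenAI-SDK-style chat.completions.create so every request
    carries extra_body={"enable_thinking": False} unless the caller
    already set enable_thinking explicitly."""
    @functools.wraps(create_fn)
    def wrapper(**kwargs):
        extra = dict(kwargs.get("extra_body") or {})
        extra.setdefault("enable_thinking", False)
        kwargs["extra_body"] = extra
        return create_fn(**kwargs)
    return wrapper

# Hypothetical usage (attribute name assumed):
# vlm.client.chat.completions.create = force_no_thinking(
#     vlm.client.chat.completions.create)
```

setdefault keeps any explicit enable_thinking=True intact, so callers that genuinely want CoT are unaffected.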