[Bug]: VLM backend thinking parameter defined but never passed to API (causes auto-capture timeout with thinking-enabled models) #923

@wangxiaojun1990

Bug Description

In `openviking/models/vlm/backends/openai_vlm.py`, all four completion methods (`get_completion`, `get_completion_async`, `get_vision_completion`, `get_vision_completion_async`) accept a `thinking: bool = False` parameter, but this parameter is never used to set `enable_thinking` in the API request body.

Models like `qwen3.5-plus` and `qwen3.5-flash` (DashScope) default to thinking mode (chain-of-thought reasoning). Since `enable_thinking` is never disabled, every memory extraction call triggers full CoT reasoning, causing severe timeouts.

Steps to Reproduce

  1. Configure openviking with qwen3.5-plus or qwen3.5-flash as the VLM model via DashScope
  2. Integrate with OpenClaw with `autoCapture: true`
  3. Send a message through the channel (e.g., Feishu)
  4. Observe OpenClaw logs after the agent replies

Expected Behavior

When `thinking=False` (the default), the backend should pass `extra_body={"enable_thinking": False}` to the API, disabling unnecessary chain-of-thought reasoning for simple memory extraction tasks.
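A minimal sketch of the expected wiring, written as a standalone helper for clarity (`build_request_kwargs` is illustrative, not a function in openviking — the real fix lives inside each completion method):

```python
def build_request_kwargs(model: str, messages: list, thinking: bool = False) -> dict:
    """Illustrative helper: build chat-completion kwargs, wiring the
    existing `thinking` flag to DashScope's enable_thinking switch."""
    kwargs = {"model": model, "messages": messages}
    if not thinking:
        # DashScope's OpenAI-compatible endpoint reads enable_thinking from
        # extra_body; omit the key entirely when thinking is wanted, so
        # providers that don't know the flag never see it.
        kwargs["extra_body"] = {"enable_thinking": False}
    return kwargs
```

With the default `thinking=False`, the resulting kwargs carry `extra_body={"enable_thinking": False}`; with `thinking=True`, `extra_body` is absent and the provider's default applies.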

Actual Behavior

`auto-capture failed: AbortError: This operation was aborted` appears in OpenClaw logs after 15–60 seconds. The root cause is that `qwen3.5-plus`/`qwen3.5-flash` default to thinking mode, and each `/extract` API call spends 60+ seconds on CoT reasoning before timing out.

Minimal Reproducible Example

```python
# Fix: add to each kwargs block in openai_vlm.py before the API call
if not thinking:
    kwargs["extra_body"] = {"enable_thinking": False}
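The fix can also be verified offline with a stub client that records what would be sent to the API (the stub and `patched_get_completion` below are illustrative, mimicking only the `completions.create(**kwargs)` call shape):

```python
class StubCompletions:
    """Records the kwargs of the last create() call instead of hitting the API."""
    def __init__(self):
        self.last_kwargs = None

    def create(self, **kwargs):
        self.last_kwargs = kwargs
        return None  # a real client would return a ChatCompletion

def patched_get_completion(completions, messages, thinking: bool = False):
    # Stand-in for a patched completion method in openai_vlm.py.
    kwargs = {"model": "qwen3.5-plus", "messages": messages}
    if not thinking:
        kwargs["extra_body"] = {"enable_thinking": False}
    return completions.create(**kwargs)

stub = StubCompletions()
patched_get_completion(stub, [{"role": "user", "content": "hello"}])
assert stub.last_kwargs["extra_body"] == {"enable_thinking": False}
```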

Error Logs

```
2026-03-24T15:13:27 openviking: auto-capture failed: AbortError: This operation was aborted
2026-03-24T15:30:02 openviking: auto-capture failed: AbortError: This operation was aborted
# Timing: capture-check triggers at T+0, AbortError at T+15s (default) or T+60s (after raising timeoutMs)
```

OpenViking Version

0.2.9

Python Version

3.11

Operating System

Linux

Model Backend

OpenAI

Additional Context

Measured API latency on DashScope:

  • `qwen3.5-plus` (thinking ON, default): ~9s for a simple "hello" → 60s+ for conversation extract
  • `qwen3.5-flash` (thinking ON, default): ~5s for "hello"
  • `qwen-turbo` (no thinking): ~0.8s

The `thinking` parameter already exists in the method signatures — it just needs to be wired to `extra_body`.
