Bug Description
In openviking/models/vlm/backends/openai_vlm.py, all four completion methods (get_completion, get_completion_async, get_vision_completion, get_vision_completion_async) accept a thinking: bool = False parameter, but this parameter is never used to set enable_thinking in the API request body.
Models like qwen3.5-plus and qwen3.5-flash (DashScope) default to thinking mode (chain-of-thought reasoning). Since enable_thinking is never disabled, every memory extraction call triggers full CoT reasoning, causing severe timeouts.
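Roughly, the buggy shape looks like this (a simplified illustration, not the actual body of openai_vlm.py; the client/signature details are assumed):

```python
# Simplified sketch of the bug: `thinking` is accepted but never forwarded,
# so the model's server-side default (thinking ON for qwen3.5-plus/flash) applies.
def get_completion(client, model, messages, thinking: bool = False):
    kwargs = {"model": model, "messages": messages}
    # BUG: nothing here reads `thinking`, so enable_thinking is never set
    return client.chat.completions.create(**kwargs)
```

The same pattern repeats in all four methods, async and vision variants included.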
Steps to Reproduce
- Configure openviking with qwen3.5-plus or qwen3.5-flash as the VLM model via DashScope
- Integrate with OpenClaw with autoCapture: true
- Send a message through the channel (e.g., Feishu)
- Observe OpenClaw logs after the agent replies
Expected Behavior
When thinking=False (the default), the backend should pass extra_body={"enable_thinking": False} to the API, disabling unnecessary chain-of-thought reasoning for simple memory extraction tasks.
Actual Behavior
auto-capture failed: AbortError: This operation was aborted appears in OpenClaw logs after 15-60 seconds. The root cause is that qwen3.5-plus/flash default to thinking mode, so each /extract API call spends 60+ seconds on CoT reasoning before the request times out.
Minimal Reproducible Example
# Fix: add to each kwargs block in openai_vlm.py before the API call
if not thinking:
    kwargs["extra_body"] = {"enable_thinking": False}
Error Logs
2026-03-24T15:13:27 openviking: auto-capture failed: AbortError: This operation was aborted
2026-03-24T15:30:02 openviking: auto-capture failed: AbortError: This operation was aborted
# Timing: capture-check triggers at T+0, AbortError at T+15s (default) or T+60s (after raising timeoutMs)
OpenViking Version
0.2.9
Python Version
3.11
Operating System
Linux
Model Backend
OpenAI
Additional Context
Measured API latency on DashScope:
- qwen3.5-plus (thinking ON, default): ~9s for a simple "hello" → 60s+ for conversation extract
- qwen3.5-flash (thinking ON, default): ~5s for "hello"
- qwen-turbo (no thinking): ~0.8s
The thinking parameter already exists in the method signature — it just needs to be wired to extra_body.
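Until that wiring lands, a user-side workaround is to wrap the client's create call so every request carries the flag. This is a sketch; the attribute path vlm.client in the usage comment is an assumption about openviking's internals, but the wrapper itself only relies on the OpenAI-SDK-style keyword-argument interface:

```python
import functools

def force_no_thinking(create_fn):
    """Wrap an OpenAI-SDK-style chat.completions.create so every request
    carries extra_body={"enable_thinking": False} unless the caller
    already set enable_thinking explicitly."""
    @functools.wraps(create_fn)
    def wrapper(**kwargs):
        extra = dict(kwargs.get("extra_body") or {})
        extra.setdefault("enable_thinking", False)
        kwargs["extra_body"] = extra
        return create_fn(**kwargs)
    return wrapper

# Hypothetical usage (attribute name assumed):
# vlm.client.chat.completions.create = force_no_thinking(
#     vlm.client.chat.completions.create)
```

setdefault keeps any explicit enable_thinking=True intact, so callers that genuinely want CoT are unaffected.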