Misc. bug: enable_thinking param cannot turn off thinking for qwen3.5 #20182

@Goulustis

Description

Name and Version

$./llama-cli --version
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 8215 (17a4258)
built with GNU 13.3.0 for Linux x86_64

The model still thinks even with enable_thinking: false.

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

./llama-cli -m ~/models/Qwen3.5-9B-Q4_K_M.gguf -cnv -t 10 -c 2048 --chat-template-kwargs '{"enable_thinking": false}'

Problem description & steps to reproduce

Run the command above and type "hello". You will see that thinking mode is still enabled despite enable_thinking being set to false.
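For context, a minimal sketch of the expected behavior: in Qwen-style chat templates, enable_thinking=false is supposed to pre-fill the assistant turn with an empty <think></think> block so the model skips its reasoning phase. The template below is a simplified stand-in written in plain Python, not the actual Qwen3.5 Jinja template; role markers and the render_prompt helper are illustrative assumptions.

```python
# Simplified stand-in (assumption: not the real Qwen3.5 chat template) for
# how enable_thinking is expected to gate the thinking block.

def render_prompt(messages, enable_thinking=True):
    """Render a ChatML-style prompt, gating the think block on enable_thinking."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    if not enable_thinking:
        # Pre-filled empty think block: the model sees its "thoughts" as
        # already closed and proceeds directly to the answer.
        parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)

msgs = [{"role": "user", "content": "hello"}]
print("<think>" in render_prompt(msgs, enable_thinking=True))   # False
print("<think>" in render_prompt(msgs, enable_thinking=False))  # True
```

The bug report suggests the kwarg is not reaching (or not being honored by) the template rendering in llama-cli, since the model still emits a thinking phase.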

First Bad Commit

No response

Relevant log output
