Name and Version
$ ./llama-cli --version
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 8215 (17a4258)
built with GNU 13.3.0 for Linux x86_64
The model still thinks even with enable_thinking: false.
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
./llama-cli -m ~/models/Qwen3.5-9B-Q4_K_M.gguf -cnv -t 10 -c 2048 --chat-template-kwargs '{\"enable_thinking\": false}'
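One possible cause worth ruling out first (an assumption on my part, not confirmed from any log output): inside POSIX single quotes, backslashes are preserved literally, so llama-cli would receive the exact bytes {\"enable_thinking\": false}, which is not valid JSON and could be silently ignored by the kwargs parser. A quick check of the two quoting styles:

```python
import json

# Inside single quotes the shell preserves backslashes verbatim, so this
# is the exact string the command above passes to --chat-template-kwargs:
received = r'{\"enable_thinking\": false}'
try:
    json.loads(received)
except json.JSONDecodeError:
    print("escaped form is not valid JSON")  # this branch runs

# Without the backslashes the argument parses as intended:
print(json.loads('{"enable_thinking": false}'))  # -> {'enable_thinking': False}
```

If the unescaped form '{"enable_thinking": false}' still leaves thinking enabled, the quoting is not the issue and this looks like a genuine bug.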
Problem description & steps to reproduce
Run the command above and type "hello". The model still emits thinking output even though enable_thinking is set to false.
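For context, here is a minimal sketch of how an enable_thinking kwarg is typically consumed by a Qwen3-style chat template. This is NOT llama.cpp's actual implementation; the assumption is that when enable_thinking is false, the template pre-fills an empty think block so the model skips its reasoning phase, which is the behavior that appears broken here:

```python
# Simplified sketch of Qwen3-style prompt assembly -- not llama.cpp code.
# Assumption: the real chat template pre-fills an empty <think> block
# when enable_thinking is false, which suppresses the thinking phase.

def build_prompt(user_msg: str, enable_thinking: bool = True) -> str:
    prompt = f"<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n"
    if not enable_thinking:
        # The pre-filled empty reasoning block signals the model not to think.
        prompt += "<think>\n\n</think>\n\n"
    return prompt

print(build_prompt("hello", enable_thinking=False))
```

If the kwarg were reaching the template, the rendered prompt should contain the empty think block; the reported behavior suggests it never does.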
First Bad Commit
No response
Relevant log output