Closed
Description
llama.cpp places any thinking/reasoning content into a reasoning_content field in its JSON response to chat/completions:
$ curl -X POST http://localhost:8000/api/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen3-0.6B-GGUF",
"messages": [
{"role": "user", "content": "What is the population of Paris?"}
],
"stream": true
}'
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"Okay"}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":","}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":" so"}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":" I"}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":" need"}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}
However, FLM does not. It leaves reasoning_content empty and puts all thinking/reasoning, including the <think> tag itself, into the main content field:
$ curl -X POST http://localhost:8000/api/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen3-0.6b-FLM",
"messages": [
{"role": "user", "content": "What is the population of Paris?"}
],
"stream": true
}'
data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":"<think>","reasoning_content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":"\n","reasoning_content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":"Okay","reasoning_content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":",","reasoning_content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":" the","reasoning_content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":" user","reasoning_content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":" is","reasoning_content":""},"finish_reason":null}]}
Can FLM match the behavior of llama.cpp here? That would make parsing thinking in UIs like Lemonade's easier.
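Until the server does this itself, a client could re-split the stream on its side. The following is only a minimal sketch of that idea, not how FLM or llama.cpp implement it. It assumes the reasoning is wrapped in literal `<think>` and `</think>` tags (only the opening tag is visible in the output above, so the closing tag is an assumption), and the names `ThinkSplitter`, `feed`, and `flush` are hypothetical.

```python
OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"  # assumed; the closing tag is not shown in the output above


class ThinkSplitter:
    """Routes streamed text to 'reasoning_content' or 'content' based on think tags."""

    def __init__(self):
        self.in_think = False  # True while between <think> and </think>
        self.buf = ""          # held-back text that may be the start of a tag

    def feed(self, text):
        """Accept one delta's content; return a list of (field, text) pairs."""
        self.buf += text
        out = []
        while self.buf:
            tag = CLOSE_TAG if self.in_think else OPEN_TAG
            field = "reasoning_content" if self.in_think else "content"
            idx = self.buf.find(tag)
            if idx != -1:
                # Emit text before the tag, drop the tag, switch mode.
                if idx > 0:
                    out.append((field, self.buf[:idx]))
                self.buf = self.buf[idx + len(tag):]
                self.in_think = not self.in_think
                continue
            # No full tag: hold back the longest suffix that could begin one,
            # in case the tag is split across two chunks.
            keep = 0
            for n in range(min(len(tag) - 1, len(self.buf)), 0, -1):
                if tag.startswith(self.buf[-n:]):
                    keep = n
                    break
            cut = len(self.buf) - keep
            if cut > 0:
                out.append((field, self.buf[:cut]))
            self.buf = self.buf[cut:]
            break
        return out

    def flush(self):
        """Call at end of stream; emits any held-back text as-is."""
        if not self.buf:
            return []
        field = "reasoning_content" if self.in_think else "content"
        out = [(field, self.buf)]
        self.buf = ""
        return out


if __name__ == "__main__":
    splitter = ThinkSplitter()
    # Content deltas as in the FLM output above.
    for piece in ["<think>", "\n", "Okay", ",", " the", " user", " is"]:
        for field, chunk in splitter.feed(piece):
            print(field, repr(chunk))
```

A UI would call `feed` with each delta's `content` string and render the two fields separately, then call `flush` once the stream ends. Doing this in the server would still be preferable, since every client would otherwise need its own copy of this logic.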