Place thinking tokens into reasoning_content field #283

@jeremyfowers

llama.cpp places any thinking/reasoning content into a reasoning_content field in its JSON response to chat/completions:

$ curl -X POST http://localhost:8000/api/v1/chat/completions   -H "Content-Type: application/json"   -d '{
        "model": "Qwen3-0.6B-GGUF",
        "messages": [
          {"role": "user", "content": "What is the population of Paris?"}
        ],
        "stream": true
      }'
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"Okay"}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":","}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":" so"}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":" I"}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":" need"}}],"created":1765329204,"id":"chatcmpl-nYkKVUuO55nZsMjRF9LaaSKW4HWFjHJF","model":"Qwen3-0.6B-Q4_0.gguf","system_fingerprint":"b7247-7ca5991d2","object":"chat.completion.chunk"}

However, FLM does not. It puts all thinking/reasoning into the main content field, starting with a <think> tag, and leaves reasoning_content empty:

$ curl -X POST http://localhost:8000/api/v1/chat/completions   -H "Content-Type: application/json"   -d '{
        "model": "Qwen3-0.6b-FLM",
        "messages": [
          {"role": "user", "content": "What is the population of Paris?"}
        ],
        "stream": true
      }'
data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":"<think>","reasoning_content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":"\n","reasoning_content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":"Okay","reasoning_content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":",","reasoning_content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":" the","reasoning_content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":" user","reasoning_content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-c4c4ede103caf48bc380697c","object":"chat.completion.chunk","created":1765328898,"model":"qwen3:0.6b","system_fingerprint":"fp_82e7d53fcc912bed","choices":[{"index":0,"delta":{"role":"assistant","content":" is","reasoning_content":""},"finish_reason":null}]}

Can FLM match the behavior of llama.cpp here? That would make it easier for UIs like Lemonade's to parse thinking content.
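
For context, here is a rough sketch of the kind of client-side workaround a UI would need today to get llama.cpp-style output from the FLM stream. It is only an illustration, not Lemonade's or FLM's actual code: the helper names `content_pieces` and `split_think_stream` are hypothetical, it assumes the reasoning is closed with a matching `</think>` tag (the excerpt above only shows the opening `<think>`), and it assumes an OpenAI-style `data: [DONE]` terminator, which is not shown above either.

```python
import json
from typing import Iterable, Iterator, Tuple

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"  # assumed closing tag; not visible in the excerpt above


def content_pieces(sse_lines: Iterable[str]) -> Iterator[str]:
    """Extract delta.content strings from 'data: {...}' stream lines."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]


def _held_back(buffer: str, tag: str) -> int:
    """Length of the longest buffer suffix that is a proper prefix of tag."""
    for size in range(min(len(buffer), len(tag) - 1), 0, -1):
        if tag.startswith(buffer[-size:]):
            return size
    return 0


def split_think_stream(pieces: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (field, text) pairs: text between the think tags is routed to
    'reasoning_content', everything else to 'content'."""
    buffer = ""
    thinking = False
    for piece in pieces:
        buffer += piece
        while buffer:
            tag = CLOSE_TAG if thinking else OPEN_TAG
            field = "reasoning_content" if thinking else "content"
            pos = buffer.find(tag)
            if pos != -1:
                # Emit the text before the tag, drop the tag, flip the mode.
                if pos:
                    yield field, buffer[:pos]
                buffer = buffer[pos + len(tag):]
                thinking = not thinking
                continue
            # No full tag yet: hold back a possible partial tag at the end.
            keep = _held_back(buffer, tag)
            emit = buffer[:len(buffer) - keep]
            if emit:
                yield field, emit
            buffer = buffer[len(buffer) - keep:]
            break
    if buffer:
        # Stream ended with a partial tag still buffered: flush it as text.
        yield ("reasoning_content" if thinking else "content"), buffer


if __name__ == "__main__":
    # Made-up pieces in the style of the FLM output above; the closing tag
    # is deliberately split across two pieces.
    demo = ["<think>", "\n", "Okay", ",", " the", " user", "</th", "ink>", "Answer"]
    for field, text in split_think_stream(demo):
        print(field, repr(text))
```

Every client would have to carry buffering logic like this (including the split-tag case), which is why emitting `reasoning_content` on the server side would be simpler.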
