Eval bug: parser error when specifying `max_tokens` #20229

### Name and Version

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7600 XT (RADV NAVI33) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
version: 8233 (c5a778891)
built with GNU 15.2.1 for Linux x86_64



### Operating systems

Linux

### GGML backends

HIP, Vulkan

### Hardware

- Strix Halo
- Xeon Platinum 8368 + Radeon RX 7900 XTX

### Models

Issue verified to affect at least:

- Qwen3.5 35B, 122B, 397B
- Step 3.5 Flash
- gpt-oss-120b

At various quants.

### Problem description & steps to reproduce

I started seeing parser errors last night, and they seem to be linked to the `max_tokens` parameter:

```
curl -X POST -H "Content-Type: application/json" --data '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "Warmup Warmup Warmup Warmup Warmup Warmup Warmup Warmup Warmup Warmup "}], "max_tokens": 1}' http://127.0.0.1:10011/v1/chat/completions
```

This fails for all models I've tried, with errors like this:

gpt-oss-120b:
```
{"error":{"code":500,"message":"Failed to parse input at pos 0: <|channel|>","type":"server_error"}}
```

Qwen3.5:
```
{"error":{"code":500,"message":"Failed to parse input at pos 8: ","type":"server_error"}}
```

Removing `max_tokens` fixes it for all models. For `gpt-oss-120b`, increasing it to 3 also fixes it (2 does not). For Qwen3-397B-A17B (unsloth quants), no value of `max_tokens` I tried was sufficient, so for now I filter out `max_tokens` using llama-swap, which is obviously not desirable.
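To narrow down the threshold per model, a sweep over `max_tokens` values may help. The following is only a rough sketch: the URL, model name and prompt are copied from the reproduction above, while the helper names (`build_payload`, `is_parser_error`, `probe`) and the range of values are hypothetical choices for illustration. It assumes the failure always surfaces as the `Failed to parse input at pos` message shown above.

```shell
#!/usr/bin/env bash
# Sketch of a max_tokens sweep. URL, MODEL and PROMPT come from the report;
# the helper names below are made up for this example.
URL="http://127.0.0.1:10011/v1/chat/completions"
MODEL="gpt-oss-120b"
PROMPT="Warmup Warmup Warmup Warmup Warmup Warmup Warmup Warmup Warmup Warmup "

# Build the JSON request body for a given max_tokens value.
build_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": %d}' \
    "$MODEL" "$PROMPT" "$1"
}

# Return success if the response body contains the parser error from the report.
is_parser_error() {
  case "$1" in
    *'Failed to parse input at pos'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Send one request and report whether it hit the parser error.
probe() {
  local resp
  resp=$(curl -sS -X POST -H "Content-Type: application/json" \
    --data "$(build_payload "$1")" "$URL") \
    || { echo "max_tokens=$1: request failed"; return 2; }
  if is_parser_error "$resp"; then
    echo "max_tokens=$1: parser error: $resp"
  else
    echo "max_tokens=$1: ok"
  fi
}

# Example usage (requires the server to be running):
#   for n in 1 2 3 4 8 16 32; do probe "$n"; done
```

Changing `MODEL` and re-running the loop should show whether the smallest working value differs between models, as it appears to for `gpt-oss-120b` and the Qwen quants.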

Might be related to https://github.com/ggml-org/llama.cpp/pull/18675

### First Bad Commit

c5a778891 (could have been an issue before, haven't had time to bisect)

### Relevant log output


No relevant output in logs
