Name and Version
llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
version: 7403 (5c8a717)
built with GNU 13.3.0 for Linux x86_64
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server --models-preset config.ini
(same problem with: llama-server --models-preset config.ini --no-mmap)
Problem description & steps to reproduce
Content of my config.ini:
[gpt-oss-120b]
model = /path/to/models/gpt-oss-120b/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf
ctx-size = 32768
temp = 1.0
top-p = 1.0
top-k = 0
min-p = 0
gpu-layers = -1
split-mode = layer
tensor-split = 0.5,0.5
main-gpu = 0
numa = isolate
reasoning-format = none
flash-attn = on
jinja = 1
no-mmap = 1
chat-template-kwargs = {"reasoning_effort": "high"}
The problem: the "no-mmap = 1" entry isn't converted to --no-mmap.
For boolean arguments, the documentation explains that on/off, 1/0, or true/false can be used.
But that isn't respected here, and a "--mmap" flag is added instead:
srv load: spawning server instance with name=gpt-oss-120b on port 57733
srv load: spawning server instance with args:
srv load: /path/to/llama-server
srv load: --chat-template-kwargs
srv load: {"reasoning_effort": "high"}
srv load: --host
srv load: 127.0.0.1
srv load: --jinja
srv load: --min-p
srv load: 0
srv load: --mmap
srv load: --numa
srv load: isolate
srv load: --port
srv load: 57733
srv load: --reasoning-format
srv load: none
srv load: --temp
srv load: 1.0
srv load: --top-k
srv load: 0
srv load: --top-p
srv load: 1.0
srv load: --alias
srv load: gpt-oss-120b
srv load: --ctx-size
srv load: 32768
srv load: --flash-attn
srv load: on
srv load: --model
srv load: /path/to/models/gpt-oss-120b/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf
srv load: --main-gpu
srv load: 0
srv load: --n-gpu-layers
srv load: -1
srv load: --split-mode
srv load: layer
srv load: --tensor-split
srv load: 0.5,0.5
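For illustration, here is a minimal C++ sketch of the conversion I would expect from the documented behavior. This is not the actual llama.cpp preset loader; the helper names (parse_preset_bool, append_bool_flag) are hypothetical:

// Hypothetical sketch only -- not the real llama.cpp preset-loading code.
#include <optional>
#include <string>
#include <vector>

// Accept the boolean spellings the documentation lists: on/off, 1/0, true/false.
static std::optional<bool> parse_preset_bool(const std::string & v) {
    if (v == "on"  || v == "1" || v == "true")  return true;
    if (v == "off" || v == "0" || v == "false") return false;
    return std::nullopt; // not a boolean value
}

// For a boolean key, emit the bare flag only when the value is truthy:
// "no-mmap = 1" -> "--no-mmap", "no-mmap = 0" -> (nothing emitted).
static void append_bool_flag(std::vector<std::string> & args,
                             const std::string & key, const std::string & value) {
    if (auto b = parse_preset_bool(value); b && *b) {
        args.push_back("--" + key);
    }
}

With a mapping like that, the spawn log above would contain --no-mmap and no bare --mmap.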
I haven't found any way to pass this --no-mmap parameter from the config.ini.
My hardware doesn't support mmap, so it's impossible to use the config.ini at all right now.
First Bad Commit
No response