llama: introduce support for model-embedded sampling parameters #17120

taronaeo · 2025-11-09T12:38:35Z

This PR allows sampler parameters to be set from GGUF KV metadata, so model creators can embed recommended sampler settings in the model file. The embedded values apply unless explicitly overridden with CLI flags.

This is handy for users who do not want to tinker with the settings but still want the recommended settings applied.

Priority of Sampling Parameters

1. User flags (e.g., setting --temp 0.6)
2. Model-embedded recommendation (e.g., general.sampling.temp = 0.6)
3. Default hardcoded values in common_params_sampling
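
The priority order above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not the C++ code in this PR: the function name resolve_sampling, the DEFAULTS table, and the use of plain dicts are assumptions made for the example, and the default values are placeholders.

```python
from typing import Any, Dict, Mapping

# Placeholder defaults for illustration only; the real values live in
# common_params_sampling and may differ.
DEFAULTS: Dict[str, Any] = {"temp": 0.8, "top_k": 40, "top_p": 0.95, "min_p": 0.05}


def resolve_sampling(
    user_flags: Mapping[str, Any],
    model_meta: Mapping[str, Any],
    defaults: Mapping[str, Any] = DEFAULTS,
) -> Dict[str, Any]:
    """Pick each parameter from the highest-priority source that provides it."""
    resolved: Dict[str, Any] = {}
    for name, default in defaults.items():
        key = f"general.sampling.{name}"
        if name in user_flags:
            # 1. A flag the user set explicitly always wins.
            resolved[name] = user_flags[name]
        elif key in model_meta:
            # 2. Otherwise use the recommendation embedded in the model.
            resolved[name] = model_meta[key]
        else:
            # 3. Fall back to the hardcoded default.
            resolved[name] = default
    return resolved


if __name__ == "__main__":
    meta = {"general.sampling.temp": 0.6}
    print(resolve_sampling({}, meta))
    print(resolve_sampling({"temp": 1.0}, meta))
```

Note that the sketch keys on whether a flag is present rather than on its value, so a user flag that happens to equal the default still overrides the embedded recommendation.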

Introduced Metadata

general.sampling.sequence
general.sampling.top_k
general.sampling.top_p
general.sampling.min_p
general.sampling.xtc_probability
general.sampling.xtc_threshold
general.sampling.temp
general.sampling.penalty_last_n
general.sampling.penalty_repeat
general.sampling.mirostat
general.sampling.mirostat_tau
general.sampling.mirostat_eta

Please let me know if we should introduce more sampling parameters.

Embedding From Safetensors into GGUF

By default, the conversion script attempts to find generation_config.json within the model directory and automatically adds the recommended sampler parameters to the GGUF metadata. If a sampling parameter is not available in that file, users can also specify it with --metadata metadata.json.

Note that --metadata metadata.json takes precedence over generation_config.json and will overwrite metadata if duplicate keys are found.
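
A rough Python sketch of that precedence, for illustration only. It is not the code in convert_hf_to_gguf.py: the helper name collect_sampling_metadata and the GEN_CONFIG_TO_GGUF table are hypothetical, and the field-name mapping shown is an assumption about how generation_config.json fields could line up with the general.sampling.* keys.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Assumed mapping from generation_config.json fields to GGUF keys
# (illustrative; the mapping used by the conversion script may differ).
GEN_CONFIG_TO_GGUF = {
    "temperature": "general.sampling.temp",
    "top_k": "general.sampling.top_k",
    "top_p": "general.sampling.top_p",
    "min_p": "general.sampling.min_p",
    "repetition_penalty": "general.sampling.penalty_repeat",
}


def collect_sampling_metadata(
    model_dir: str, metadata_path: Optional[str] = None
) -> Dict[str, Any]:
    """Merge generation_config.json with an optional --metadata override file."""
    result: Dict[str, Any] = {}

    # Step 1: recommendations shipped with the model, if the file exists.
    gen_config = Path(model_dir) / "generation_config.json"
    if gen_config.is_file():
        data = json.loads(gen_config.read_text(encoding="utf-8"))
        for src, dst in GEN_CONFIG_TO_GGUF.items():
            if data.get(src) is not None:
                result[dst] = data[src]

    # Step 2: the metadata file takes precedence and overwrites duplicates.
    if metadata_path is not None:
        overrides = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
        for key, value in overrides.items():
            if key.startswith("general.sampling."):
                result[key] = value

    return result
```

Applying the override file second is what makes it win on duplicate keys, and it is also how parameters that generation_config.json does not carry can still be embedded.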

$ cat > metadata.json << EOF 
{
    "general.sampling.temp": 0.6
}
EOF

$ python3 convert_hf_to_gguf.py --outfile deepseek-r1-distill-qwen-1.5b.gguf --metadata metadata.json deepseek-r1-distill-qwen-1.5b/

$ ./build/bin/llama-cli -m deepseek-r1-distill-qwen-1.5b.gguf -p "Write me a dog walking business idea 1. " -no-cnv -n 1 -t 10 2>&1 | grep "temp"    
llama_model_loader: - kv   2:                       general.sampling.temp f32             = 0.600000
llama_model_loader: - kv  27:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
        top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.600
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
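
To check which sampling keys ended up in a converted model without reading the full log by eye, the loader lines above can be filtered with a small script. This is a minimal sketch that assumes the "llama_model_loader: - kv" line layout shown in the output above; the function name embedded_sampling_from_log is hypothetical and the log format is not a stable interface.

```python
import re
from typing import Dict, Tuple

# Matches lines such as:
#   llama_model_loader: - kv   2:   general.sampling.temp f32   = 0.600000
KV_LINE = re.compile(r"^llama_model_loader: - kv\s+\d+:\s+(\S+)\s+(\S+)\s+=\s+(.*)$")


def embedded_sampling_from_log(log_text: str) -> Dict[str, Tuple[str, str]]:
    """Return {key: (type, raw value)} for every general.sampling.* kv line."""
    found: Dict[str, Tuple[str, str]] = {}
    for line in log_text.splitlines():
        match = KV_LINE.match(line.strip())
        if match is None:
            continue  # not a kv line
        key, type_name, value = match.groups()
        if key.startswith("general.sampling."):
            found[key] = (type_name, value.strip())
    return found


if __name__ == "__main__":
    import sys

    # Example: ./build/bin/llama-cli ... 2>&1 | python3 this_script.py
    for key, (type_name, value) in embedded_sampling_from_log(sys.stdin.read()).items():
        print(f"{key} ({type_name}) = {value}")
```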

Signed-off-by: Aaron Teo <[email protected]>

CISC · 2025-11-09T12:55:23Z

$ cat > metadata.json << EOF 
{
    "general.sampler.temp": 0.6
}
EOF

So, you're suggesting that parameters should be added manually before conversion? How likely is that to happen?

AFAIK most models come with recommended (though, some are likely to just be copy-pasted from somewhere) settings in generation_config.json, so perhaps a better idea to get them from there?

Edit: or is that automatically added to metadata?

taronaeo · 2025-11-09T12:59:09Z

$ cat > metadata.json << EOF 
{
    "general.sampler.temp": 0.6
}
EOF
So, you're suggesting that parameters should be added manually before conversion? How likely is that to happen?

AFAIK most models come with recommended (though, some are likely to just be copy-pasted from somewhere) settings in generation_config.json, so perhaps a better idea to get them from there?

You're right, I didn't spot that. Well I guess I have to rework the code such that it pulls generation_config.json from the model directory, maps to general.sampler.* and we can skip the --metadata flag.

Green-Sky · 2025-11-09T13:43:33Z

I think the sampling sequence is important too. Also, I personally only really tend to use min-p and xtc (not in your proposal).

taronaeo · 2025-11-09T14:10:59Z

@Green-Sky Will include general.sampler.xtc_probability and general.sampler.xtc_threshold first, then --samplers SEQUENCE.

@CISC RE generation_config.json vs. the custom --metadata file, I've realised that generation_config.json does not actually document (non-standard) support for parameters such as mirostat. In this case, we'll still need support for --metadata metadata.json to cover these parameters, unless there is a better way of handling this.

Signed-off-by: Aaron Teo <[email protected]>

CISC · 2025-11-09T14:25:17Z

@CISC RE generation_config.json vs. the custom --metadata file, I've realised that generation_config.json does not actually document (non-standard) support for parameters such as mirostat. In this case, we'll still need support for --metadata metadata.json to cover these parameters, unless there is a better way of handling this.

Does transformers even have this parameter?

taronaeo · 2025-11-09T14:31:24Z

@CISC RE generation_config.json vs. the custom --metadata file, I've realised that generation_config.json does not actually document (non-standard) support for parameters such as mirostat. In this case, we'll still need support for --metadata metadata.json to cover these parameters, unless there is a better way of handling this.

Does transformers even have this parameter?

Doesn't look like it. Followed some of Ollama's supported parameters: https://ollama.readthedocs.io/en/modelfile/#parameter

Signed-off-by: Aaron Teo <[email protected]>

taronaeo · 2025-11-14T03:25:15Z

@CISC any update on this PR?

CISC · 2025-11-14T08:38:05Z

@CISC any update on this PR?

Thanks for the reminder. :)

taronaeo · 2025-11-21T15:31:32Z

@ggerganov soft ping. In case it was missed, the latest changes are ready for review :)

CISC · 2025-11-24T13:18:27Z

include/llama.h

+    // Get sampling metadata key name. Returns nullptr if the key is invalid
+    LLAMA_API const char * llama_model_meta_key_str(enum llama_model_meta_key key);


Although the comment is true right now, I guess the intended future use is broader than sampling?

taronaeo · 2025-11-25

Although the comment is true right now, I guess the intended future use is broader than sampling?

Yep, we just need to update the llama_model_meta_key to include other metadata keys :)

taronaeo added 2 commits November 9, 2025 19:31

common: introduce auto sampling params from metadata

70f568a

Signed-off-by: Aaron Teo <[email protected]>

gguf-py: introduce new kv for conversion scripts

7de014e

Signed-off-by: Aaron Teo <[email protected]>

taronaeo requested review from CISC and ggerganov as code owners November 9, 2025 12:38

gguf-py: fix formatting

c41bb28

Signed-off-by: Aaron Teo <[email protected]>

github-actions bot added the python python script changes label Nov 9, 2025

gguf-py: fix more formatting issues

caa7a03

Signed-off-by: Aaron Teo <[email protected]>

gguf-py: introduce support for reading from generation_config.py

44addce

Signed-off-by: Aaron Teo <[email protected]>

gguf-py: simplified gen_config loading

0f8d637

Signed-off-by: Aaron Teo <[email protected]>

DajanaV mentioned this pull request Nov 9, 2025

UPSTREAM PR #17120: llama: introduce support for model-embedded sampling parameters auroralabs-loci/llama.cpp#147

Closed

taronaeo added 5 commits November 9, 2025 22:42

llama: add support for xtc sampler

6cf3900

Signed-off-by: Aaron Teo <[email protected]>

chore: formatting

c8845ff

Signed-off-by: Aaron Teo <[email protected]>

common: introduce support for general.sampler.sequence

33ddb27

Signed-off-by: Aaron Teo <[email protected]>

gguf-py: revert test_metadata.py

fd3fa3a

Signed-off-by: Aaron Teo <[email protected]>

gguf-py: fix linting

fc91c10

Signed-off-by: Aaron Teo <[email protected]>

ggerganov reviewed Nov 14, 2025

gguf-py/gguf/constants.py

common/common.h (outdated)

common/common.cpp

taronaeo added 4 commits November 14, 2025 23:56

llama: rename sampler to sampling

a2701ec

Signed-off-by: Aaron Teo <[email protected]>

llama: sampling_config to user_sampling_config for clarity

954278c

Signed-off-by: Aaron Teo <[email protected]>

common: missed updating a variable

19f4c10

Signed-off-by: Aaron Teo <[email protected]>

llama: move metadata keys to libllama

f58b758

Signed-off-by: Aaron Teo <[email protected]>

taronaeo requested a review from danbev as a code owner November 16, 2025 10:20

revert: build-xcframework.sh

6e33d2c

Signed-off-by: Aaron Teo <[email protected]>

llama: fix typo b/w temp and temperature

a6494f4

Signed-off-by: Aaron Teo <[email protected]>

ggerganov approved these changes Nov 24, 2025

CISC approved these changes Nov 24, 2025

taronaeo merged commit 877566d into ggml-org:master Nov 25, 2025
74 checks passed

ORippler mentioned this pull request Dec 12, 2025

sampling : add support for backend sampling #17004

Open

25 tasks
