llama: introduce support for model-embedded sampling parameters #17120
Signed-off-by: Aaron Teo <[email protected]>
So, you're suggesting that parameters should be added manually before conversion? How likely is that to happen? AFAIK most models come with recommended (though, some are likely to just be copy-pasted from somewhere) settings in
Edit: or is that automatically added to metadata?
You're right, I didn't spot that. Well I guess I have to rework the code such that it pulls
I think
@Green-Sky Will include @CISC RE
Does
Doesn't look like it. Followed some of Ollama's supported parameters: https://ollama.readthedocs.io/en/modelfile/#parameter
@CISC any update on this PR?
Thanks for the reminder. :)
@ggerganov soft ping. In case it was missed, the latest changes are ready for review :)
// Get sampling metadata key name. Returns nullptr if the key is invalid
LLAMA_API const char * llama_model_meta_key_str(enum llama_model_meta_key key);
Although the comment is true right now, I guess the intended future use is broader than sampling?
> Although the comment is true right now, I guess the intended future use is broader than sampling?
Yep, we just need to update llama_model_meta_key to include other metadata keys :)
ref: #17088
This PR allows sampler parameters to be set from GGUF KV metadata, so model creators can embed recommended sampler settings in the model file. The embedded settings apply unless they are explicitly overridden with CLI flags.
Handy for users who do not want to tinker with the settings but want the recommended settings applied.
Priority of Sampling Parameters
1. CLI flags (e.g. --temp 0.6)
2. Model-embedded GGUF metadata (e.g. general.sampling.temp = 0.6)
3. Default values from common_params_sampling
Introduced Metadata
- general.sampling.sequence
- general.sampling.top_k
- general.sampling.top_p
- general.sampling.min_p
- general.sampling.xtc_probability
- general.sampling.xtc_threshold
- general.sampling.temp
- general.sampling.penalty_last_n
- general.sampling.penalty_repeat
- general.sampling.mirostat
- general.sampling.mirostat_tau
- general.sampling.mirostat_eta
Please let me know if we should introduce more sampling parameters.
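The priority order described above can be sketched as a small fallback chain. This is only an illustration under assumptions: resolve_sampling_param is a hypothetical helper, not a function from llama.cpp, and the PR's actual implementation may track "explicitly set by the user" differently.

```cpp
#include <optional>

// Hypothetical helper (not part of llama.cpp): choose the effective value of
// one sampling parameter following the priority order from the PR description.
template <typename T>
T resolve_sampling_param(const std::optional<T> & cli_value,   // set only if the user passed a flag
                         const std::optional<T> & gguf_value,  // set only if the GGUF has the key
                         const T & default_value) {            // built-in default
    if (cli_value.has_value()) {
        return *cli_value;   // 1. an explicit CLI flag always wins
    }
    if (gguf_value.has_value()) {
        return *gguf_value;  // 2. otherwise use the model-embedded recommendation
    }
    return default_value;    // 3. otherwise fall back to the default
}
```

For example, with a CLI temperature of 0.6 and an embedded value of 0.7, the helper returns 0.6; with no CLI flag it returns the embedded 0.7.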
Embedding From Safetensors into GGUF
By default, the conversion step looks for generation_config.json in the model directory and automatically adds the recommended sampler parameters to the GGUF metadata. If a sampling parameter is not available in that file, users can also specify --metadata metadata.json.
Note that --metadata metadata.json takes precedence over generation_config.json and will overwrite metadata if duplicate keys are found.
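The precedence rule between the two files amounts to an overwrite-on-duplicate merge. The sketch below is a simplified illustration, not the conversion code: merge_sampling_metadata and kv_map are hypothetical names, values are kept as strings for brevity, and it assumes both inputs have already been parsed and mapped to general.sampling.* keys.

```cpp
#include <map>
#include <string>

// Simplified key/value view of sampling metadata (values kept as strings).
using kv_map = std::map<std::string, std::string>;

// Hypothetical merge: start from the values found in generation_config.json,
// then apply the entries from the --metadata file on top.
kv_map merge_sampling_metadata(const kv_map & generation_config,
                               const kv_map & metadata_override) {
    kv_map merged = generation_config;
    for (const auto & [key, value] : metadata_override) {
        merged[key] = value;  // duplicate key: the --metadata value overwrites
    }
    return merged;
}
```

Keys present in only one of the two sources pass through unchanged, so the override file can be used both to correct a value and to add a parameter that generation_config.json does not provide.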