Llama4 RoPE fix #12889
Conversation
@ngxson :)
@ggerganov FYI, I wrote a small script here: https://gist.github.com/ngxson/6dec015080121d239caa668332fba3f8 It calculates the values; the comparison is below (on the left, the old values). For now, I have no idea how to set this after loading the GGUF (as discussed via DM), but feel free to make suggestions!
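The gist itself is not reproduced in this thread, but the llama3-style RoPE frequency scaling that the updated config uses can be sketched as below. This is a hedged illustration, not the gist's code: the default parameter values (`factor=8.0`, `low_freq_factor=1.0`, `high_freq_factor=4.0`, `old_context_len=8192`) are assumptions standing in for whatever the model's `config.json` actually ships.

```python
import math

def llama3_rope_factors(head_dim: int, rope_theta: float = 500000.0,
                        factor: float = 8.0, low_freq_factor: float = 1.0,
                        high_freq_factor: float = 4.0,
                        old_context_len: int = 8192) -> list[float]:
    """Per-dimension correction factors for llama3-style RoPE scaling.

    Parameter defaults are illustrative assumptions, not values taken
    from the Llama4 Scout config.
    """
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    factors = []
    for i in range(0, head_dim, 2):
        freq = 1.0 / (rope_theta ** (i / head_dim))
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # high-frequency dims: leave unscaled
            factors.append(1.0)
        elif wavelen > low_freq_wavelen:
            # low-frequency dims: fully scaled
            factors.append(factor)
        else:
            # smooth interpolation between the two regimes
            smooth = (old_context_len / wavelen - low_freq_factor) / \
                     (high_freq_factor - low_freq_factor)
            factors.append(1.0 / ((1 - smooth) / factor + smooth))
    return factors
```

With these defaults the factors ramp monotonically from 1.0 on the fastest-rotating dimensions to the full `factor` on the slowest ones, which is why a converter has to emit them as a tensor rather than a single scalar.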
Thanks. AFAIU the upstream models have been updated with a new RoPE config, which technically requires re-converting existing GGUF models. I don't think there is an elegant way to avoid this conversion and handle it seamlessly so that old GGUFs work with the new rope factors; it would always be some non-trivial hack that remains in the codebase forever. So I think it is better to just recommend re-conversion/re-download of the models. We can put a notice in the README's hot topics?
Yes, sounds ok to me; if it's too hacky then let's not do it. I think most people will use the quantizations from @unslothai or @bartowski1182 anyway (which were, and will be, updated very quickly), so we probably don't need to add a notice.
Yeah, I'll let it sit another week to make sure there's nothing else breaking, then throw the reconverted model up.
Llama4 Scout's config.json changed its RoPE scaling, so we need to remove the assert, since it breaks on Llama 4.
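As a hedged illustration of the failure mode (this is not the actual llama.cpp code, and the config values are made up): a hard assert pinning the expected shape of `rope_scaling` breaks the moment upstream edits `config.json`, whereas tolerantly reading the block keeps both old and new configs loadable.

```python
import json

def read_rope_scaling(config: dict) -> dict:
    """Tolerantly read the rope_scaling block from a parsed config.json.

    Hypothetical sketch: a strict check such as
    `assert config["rope_scaling"] is None` (or one pinning a specific
    rope_type) fails as soon as upstream changes the config, as happened
    with Llama4 Scout. Falling back to a default avoids that.
    """
    rope_scaling = config.get("rope_scaling")
    if rope_scaling is None:
        return {"rope_type": "none"}
    return rope_scaling

# Made-up configs illustrating the old and new shapes of the field:
old_cfg = json.loads('{"rope_scaling": null}')
new_cfg = json.loads('{"rope_scaling": {"rope_type": "llama3", "factor": 8.0}}')
```

The design choice mirrors the discussion above: rather than hacking old behavior to coexist with the new config forever, the loader just accepts whatever the current `config.json` declares, and old GGUFs are re-converted.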