🚨 [v5] Refactor RoPE for layer types #39847
Conversation
ArthurZucker left a comment:
🤗
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

[For maintainers] Suggested jobs to run (before merge) run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon, cohere
ArthurZucker left a comment:
Thanks for iterating and bearing with my requests!
src/transformers/models/efficientloftr/modeling_efficientloftr.py (outdated, resolved)
Visually inspected the first ~20 models; for the rest I trust our testing suite. Will fix a few inconsistencies and merge tomorrow.
run-slow: gpt2, qwen2_vl, gemma3, llama, mistral, llava, lfm2_vl, olmo, gemma, paligemma, qwen2_5_omni

[For maintainers] Suggested jobs to run (before merge) run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon
Just to be sure, let the slow CI run on some important models.
This comment contains run-slow, running the specified jobs: models: ['models/gemma', 'models/gemma3', 'models/gpt2', 'models/lfm2_vl', 'models/llama', 'models/llava', 'models/mistral', 'models/olmo', 'models/paligemma', 'models/qwen2_5_omni', 'models/qwen2_vl']

🤞🏻
This PR caused QwenLM/Qwen3-Omni#93 |
* update
* batch update model code
* typos
* too many diffs, dump
* dump again
* another dump
* fix copies
* make `rope_scaling_dict` self attr
* fix a few more tests
* another update
* fix a few more tests, hopefully last ones
* fox copies
* fix copies again
* fix newly added models, I hate rebasing on main
* update config files
* modular files
* fix rope utils test
* docstring has to be indented more, why?
* oops forgot to update some modualr files
* copy from doesn't copy decorators?
* fix overriden test as well
* add a new test
* fix failing tests again
* update docstrings
* fix phi3
* fix two models
* fix copies
* forgot to add
* stupid bug from modular conversion
* fix slow tests
* update to call rotary emb once per model forward
* 3K tests failing?!
* update
* update more models
* fix copies
* fix the rest of tests hopefully
* fix after rebase
* fix the rope tests
* fix docs omni
* change a bit
* models with layer types
* why it was deleted?
* fix a few tests
* fix last test!
* delete extra empty lines
* add a test case
* more changes
* fix models
* typing hint for nested rope params
* missed when resolving conflicts
* delete layer types and fix typo
* fix copies
* fix copies
* update docs text
* docs
* huuge update all models
* fix copies
* rename attr to align with new format
* delete redundant rope tests
* trigger ci
* update the case
* this is why i hate rebasing
* maybe fixed?
* oops
* now fix?
* fix last tests and copies
* fix copies?
* fix minimax and gemma3n
* update typo
* deprecation end version
* final fix copies :fingers-crossed:
* oh my, add the docs in toctree
* oke, this is really the last fix
* fix copies and hope that tests won't start failing again
* use rope scaling if saved
* fix slow tests
* fix cwm and unrelated deepseek
* fix last
* update
* hope it works now, it took so long
* lets keep None for now, I will try to remove after checking tests
* some more fixes, i find and replace does not always find all cases
* last fix of tests
* arthur's comment for extra foreward kwargs
* delete unused code
* fix slow qwen tests
* delete layer types from models
* faulty modular conversion
* fix qwen omni
* fix copies and style
* address my comment
---------
Co-authored-by: ydshieh <[email protected]>
What does this PR do?
This PR enables RoPE layers to compute different frequencies for different layer types, which will help us support models like ModernBert without monkey-patching the config on the fly.
Main changes:
* `rope_parameters` is a required attribute if a model has RoPE layers. The attribute must be a dict containing `rope_theta` and optionally other parameters that configure RoPE. If we want different params per layer type, it should be a nested dict of the format `{"full_attn": {**rope_params}, "sliding_attn": {**different_rope_params}}` (see the sketch after this list)
* `rope_scaling` is deprecated in favor of `rope_parameters` and raises a warning. The latter name is more descriptive
* The rotary embedding forward is standardized in the same shape as `eager_attention_forward`, and copied with modular in each file
* The rotary embedding computes `inv_freq` for each layer type. If a given layer type has no RoPE parameters saved in the config (e.g. `config.rope_scaling` has no key == "sliding_window"), we raise an error
* RoPE parameters are typed with a `TypedDict`. It will make our lives easier when we decide to enforce strict type validation on configs

The changes are BC and we will support old-format config files, standardizing them when initializing the config class. The best way to review is to start from `modeling_rope_utils.py` -> all llama model files -> gemma2 and gemma3 model files -> tests.
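To make the new format concrete, below is a minimal sketch of a per-layer-type `rope_parameters` dict together with an illustrative `TypedDict` for the inner entries. The field names other than `rope_theta` (e.g. `rope_type`, `factor`) and the class name `RopeParameters` are assumptions for illustration, not taken verbatim from this PR.

```python
from typing import TypedDict


# Illustrative shape of a single RoPE parameter group. Only rope_theta is
# required by the description above; the other fields are assumed examples.
class RopeParameters(TypedDict, total=False):
    rope_type: str
    rope_theta: float
    factor: float


# Nested per-layer-type format described above: one parameter group for
# full-attention layers and a different one for sliding-attention layers.
rope_parameters: dict[str, RopeParameters] = {
    "full_attn": {"rope_type": "default", "rope_theta": 1_000_000.0},
    "sliding_attn": {"rope_type": "default", "rope_theta": 10_000.0},
}
```

With a single layer type, the same attribute would simply hold one flat parameter group, e.g. `{"rope_type": "default", "rope_theta": 10_000.0}`.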