
@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Aug 1, 2025

What does this PR do?

This PR enables RoPE layers to compute different frequencies for different layer types, which will help us support models like ModernBert without monkey-patching the config on the fly.

Main changes:

  • In config classes, rope_parameters is a required attribute if the model has RoPE layers. The attribute must be a dict containing rope_theta and, optionally, other parameters that configure RoPE. If we want different parameters per layer type, it should be a nested dict of the format {"full_attn": {**rope_params}, "sliding_attn": {**different_rope_params}} (see the sketch after this list)
  • The config attribute rope_scaling is deprecated in favor of rope_parameters and raises a warning; the new name is more descriptive
  • The default RoPE frequency computation is moved to the model definition, similar to eager_attention_forward, and copied into each file with modular
  • The RoPE layer now looks for layer types in the config and computes inv_freq for each type. If a given layer type has no RoPE parameters saved in the config (e.g. config.rope_scaling has no "sliding_window" key), we raise an error
  • All models copy their RoPE layers from llama when possible, so that changing one file will update them everywhere. Models with layer types copy from gemma2
  • Config classes now have type hints in all language models, and the RoPE parameters attribute is typed with a TypedDict. This will make our lives easier when we decide to enforce strict type validation on configs

The changes are backward compatible: we will keep supporting old-format config files and standardize them when initializing the config class. The best way to review is to start from modeling_rope_utils.py -> all llama model files -> gemma2 and gemma3 model files -> tests
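To make the formats above concrete, here is a minimal sketch in plain Python. The layer-type keys ("full_attn", "sliding_attn") follow the example in this description, and default_inv_freq plus head_dim=128 are illustrative names/values, not the exact helpers or keys in modeling_rope_utils.py:

```python
import torch

# Two accepted shapes for `rope_parameters` (illustrative, per the description above):

# 1) One set of RoPE parameters shared by every layer
rope_parameters = {"rope_theta": 10_000.0}

# 2) A nested dict with different parameters per layer type
rope_parameters = {
    "full_attn": {"rope_theta": 1_000_000.0},
    "sliding_attn": {"rope_theta": 10_000.0},
}


def default_inv_freq(rope_theta: float, head_dim: int) -> torch.Tensor:
    # Standard RoPE inverse frequencies; a stand-in for the default computation
    # that this PR moves into each model definition.
    return 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))


# With the nested form, the RoPE layer computes one inv_freq per layer type:
inv_freqs = {
    layer_type: default_inv_freq(params["rope_theta"], head_dim=128)
    for layer_type, params in rope_parameters.items()
}
```

With the flat form, the same frequencies are shared by every layer; with the nested form, each layer type gets its own inv_freq.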

Collaborator

@ArthurZucker ArthurZucker left a comment

🤗

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon, cohere

Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks for iterating and bearing with my requests!

@zucchini-nlp
Member Author

Visually inspected the first ~20 models; for the rest, I trust our testing suite. Will fix a few inconsistencies and merge tomorrow

@zucchini-nlp
Member Author

run-slow: gpt2, qwen2_vl, gemma3, llama, mistral, llava, lfm2_vl, olmo, gemma, paligemma, qwen2_5_omni

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon

@zucchini-nlp
Member Author

Just to be sure, I let the slow CI run on some important models

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/gemma', 'models/gemma3', 'models/gpt2', 'models/lfm2_vl', 'models/llama', 'models/llava', 'models/mistral', 'models/olmo', 'models/paligemma', 'models/qwen2_5_omni', 'models/qwen2_vl']
quantizations: [] ...

@zucchini-nlp
Member Author

🤞🏻

@zucchini-nlp zucchini-nlp merged commit 10de06d into huggingface:main Oct 17, 2025
22 of 23 checks passed
@BakerBunker
Contributor

This PR caused QwenLM/Qwen3-Omni#93

@ydshieh
Collaborator

ydshieh commented Oct 21, 2025

cc @zucchini-nlp

yonigozlan pushed a commit to yonigozlan/transformers that referenced this pull request Oct 21, 2025
* update

* batch update model code

* typos

* too many diffs, dump

* dump again

* another dump

* fix copies

* make `rope_scaling_dict` self attr

* fix a few more tests

* another update

* fix a few more tests, hopefully last ones

* fox copies

* fix copies again

* fix newly added models, I hate rebasing on main

* update config files

* modular files

* fix rope utils test

* docstring has to be indented more, why?

* oops forgot to update some modualr files

* copy from doesn't copy decorators?

* fix overriden test as well

* add a new test

* fix failing tests again

* update docstrings

* fix phi3

* fix two models

* fix copies

* forgot to add

* stupid bug from modular conversion

* fix slow tests

* update to call rotary emb once per model forward

* 3K tests failing?!

* update

* update more models

* fix copies

* fix the rest of tests hopefully

* fix after rebase

* fix the rope tests

* fix docs omni

* change a bit

* models with layer types

* why it was deleted?

* fix a few tests

* fix last test!

* delete extra empty lines

* add a test case

* more changes

* fix models

* typing hint for nested rope params

* missed when resolving conflicts

* delete layer types and fix typo

* fix copies

* fix copies

* update docs text

* docs

* huuge update all models

* fix copies

* rename attr to align with new format

* delete redundant rope tests

* trigger ci

* update the case

* this is why i hate rebasing

* maybe fixed?

* oops

* now fix?

* fix last tests and copies

* fix copies?

* fix minimax and gemma3n

* update typo

* deprecation end version

* final fix copies :fingers-crossed:

* oh my, add the docs in toctree

* oke, this is really the last fix

* fix copies and hope that tests won't start failing again

* use rope scaling if saved

* fix slow tests

* fix cwm and unrelated deepseek

* fix last

* update

* hope it works now, it took so long

* lets keep None for now, I will try to remove after checking tests

* some more fixes, i find and replace does not always find all cases

* last fix of tests

* arthur's comment for extra foreward kwargs

* delete unused code

* fix slow qwen tests

* delete layer types from models

* faulty modular conversion

* fix qwen omni

* fix copies and style

* address my comment

---------

Co-authored-by: ydshieh <[email protected]>
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025