
@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Aug 1, 2025

What does this PR do?

This PR enables RoPE layers to compute different frequencies for different layer types, which will help us support models like ModernBert without monkey-patching the config on the fly.

Main changes:

  • In config classes, rope_parameters is a required attribute if the model has RoPE layers. The attribute must be a dict containing rope_theta and, optionally, other parameters that configure RoPE. If we want different parameters per layer type, it should be a nested dict of the format {"full_attn": {**rope_params}, "sliding_attn": {**different_rope_params}} (see the sketch after this list)
  • The config attribute rope_scaling is deprecated in favor of rope_parameters and raises a warning; the new name is more descriptive
  • The default RoPE frequency computation is moved to the model definition, similar to eager_attention_forward, and copied into each file with modular
  • The RoPE layer now looks for layer types in the config and computes inv_freq for each type. If a given layer type has no RoPE parameters saved in the config (e.g. config.rope_scaling has no "sliding_window" key), we raise an error
  • All models copy their RoPE layers from llama when possible, so that changing one file will update them everywhere. Models with layer types copy from gemma2
  • Config classes now have type hints in all language models, and the RoPE parameters attribute is typed with a TypedDict. This will make our lives easier when we decide to enforce strict type validation on configs

The changes are backward compatible: we will keep supporting old-format config files and standardize them when initializing the config class. The best way to review is to start from modeling_rope_utils.py -> all llama model files -> gemma2 and gemma3 model files -> tests
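To make the formats above concrete, here is a minimal sketch in plain Python. The layer-type keys ("full_attn", "sliding_attn") follow the example in this description, and default_inv_freq plus head_dim=128 are illustrative names/values, not the exact helpers or keys in modeling_rope_utils.py:

```python
import torch

# Two accepted shapes for `rope_parameters` (illustrative, per the description above):

# 1) One set of RoPE parameters shared by every layer
rope_parameters = {"rope_theta": 10_000.0}

# 2) A nested dict with different parameters per layer type
rope_parameters = {
    "full_attn": {"rope_theta": 1_000_000.0},
    "sliding_attn": {"rope_theta": 10_000.0},
}


def default_inv_freq(rope_theta: float, head_dim: int) -> torch.Tensor:
    # Standard RoPE inverse frequencies; a stand-in for the default computation
    # that this PR moves into each model definition.
    return 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))


# With the nested form, the RoPE layer computes one inv_freq per layer type:
inv_freqs = {
    layer_type: default_inv_freq(params["rope_theta"], head_dim=128)
    for layer_type, params in rope_parameters.items()
}
```

With the flat form, the same frequencies are shared by every layer; with the nested form, each layer type gets its own inv_freq.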

Collaborator

@ArthurZucker ArthurZucker left a comment

🤗

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon, cohere

Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks for iterating and bearing with my requests!

@zucchini-nlp
Member Author

Visually inspected the first ~20 models; for the rest, I trust our testing suite. Will fix a few inconsistencies and merge tomorrow

@zucchini-nlp
Member Author

run-slow: gpt2, qwen2_vl, gemma3, llama, mistral, llava, lfm2_vl, olmo, gemma, paligemma, qwen2_5_omni

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon

@zucchini-nlp
Member Author

Just to be sure, I let the slow CI run on some important models

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/gemma', 'models/gemma3', 'models/gpt2', 'models/lfm2_vl', 'models/llama', 'models/llava', 'models/mistral', 'models/olmo', 'models/paligemma', 'models/qwen2_5_omni', 'models/qwen2_vl']
quantizations: [] ...

@zucchini-nlp
Member Author

🤞🏻

@zucchini-nlp zucchini-nlp merged commit 10de06d into huggingface:main Oct 17, 2025
22 of 23 checks passed
@BakerBunker
Contributor

This PR caused QwenLM/Qwen3-Omni#93

@ydshieh
Collaborator

ydshieh commented Oct 21, 2025

cc @zucchini-nlp

yonigozlan pushed a commit to yonigozlan/transformers that referenced this pull request Oct 21, 2025
* update

* batch update model code

* typos

* too many diffs, dump

* dump again

* another dump

* fix copies

* make `rope_scaling_dict` self attr

* fix a few more tests

* another update

* fix a few more tests, hopefully last ones

* fox copies

* fix copies again

* fix newly added models, I hate rebasing on main

* update config files

* modular files

* fix rope utils test

* docstring has to be indented more, why?

* oops forgot to update some modualr files

* copy from doesn't copy decorators?

* fix overriden test as well

* add a new test

* fix failing tests again

* update docstrings

* fix phi3

* fix two models

* fix copies

* forgot to add

* stupid bug from modular conversion

* fix slow tests

* update to call rotary emb once per model forward

* 3K tests failing?!

* update

* update more models

* fix copies

* fix the rest of tests hopefully

* fix after rebase

* fix the rope tests

* fix docs omni

* change a bit

* models with layer types

* why it was deleted?

* fix a few tests

* fix last test!

* delete extra empty lines

* add a test case

* more changes

* fix models

* typing hint for nested rope params

* missed when resolving conflicts

* delete layer types and fix typo

* fix copies

* fix copies

* update docs text

* docs

* huuge update all models

* fix copies

* rename attr to align with new format

* delete redundant rope tests

* trigger ci

* update the case

* this is why i hate rebasing

* maybe fixed?

* oops

* now fix?

* fix last tests and copies

* fix copies?

* fix minimax and gemma3n

* update typo

* deprecation end version

* final fix copies :fingers-crossed:

* oh my, add the docs in toctree

* oke, this is really the last fix

* fix copies and hope that tests won't start failing again

* use rope scaling if saved

* fix slow tests

* fix cwm and unrelated deepseek

* fix last

* update

* hope it works now, it took so long

* lets keep None for now, I will try to remove after checking tests

* some more fixes, i find and replace does not always find all cases

* last fix of tests

* arthur's comment for extra foreward kwargs

* delete unused code

* fix slow qwen tests

* delete layer types from models

* faulty modular conversion

* fix qwen omni

* fix copies and style

* address my comment

---------

Co-authored-by: ydshieh <[email protected]>
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025