Fix a regression in encoder-decoder generation cache initialization by kaixuanliu · Pull Request #46111 · huggingface/transformers

kaixuanliu · 2026-05-20T09:13:10Z

For encoder-decoder models, generate() was passing the decoder config to both the self-attention cache and the cross-attention cache. This is incorrect for models like T5Gemma with decoder sliding-window layers: the cross-attention cache could inherit the decoder sliding-window structure and truncate encoder key/value states, causing FlashAttention generation to crash.

This PR keeps the decoder config for the self-attention cache, but initializes the cross-attention cache without the decoder config so cross-attention remains full-length.
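
To make the failure mode concrete, here is a minimal sketch of the truncation described above. It does not use the transformers cache classes: `ToyCacheLayer`, `build_cross_attention_layer`, and the window size of 4 are hypothetical stand-ins, assumed only for illustration. The real cache layers may size their window differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyCacheLayer:
    """Hypothetical stand-in for one layer of a key/value cache.

    If `sliding_window` is set, only the most recent `sliding_window`
    positions are kept; otherwise every position is kept.
    """

    sliding_window: Optional[int] = None
    keys: List[int] = field(default_factory=list)

    def update(self, new_keys: List[int]) -> List[int]:
        self.keys.extend(new_keys)
        if self.sliding_window is not None:
            # Drop everything older than the window.
            self.keys = self.keys[-self.sliding_window:]
        return self.keys


def build_cross_attention_layer(
    decoder_window: Optional[int], pass_decoder_config: bool
) -> ToyCacheLayer:
    """Mimic the two initialization choices from the PR description."""
    window = decoder_window if pass_decoder_config else None
    return ToyCacheLayer(sliding_window=window)


encoder_positions = list(range(10))  # 10 encoder tokens, identified by position
decoder_window = 4                   # hypothetical decoder sliding-window size

# Before the fix: the cross-attention cache inherits the decoder window.
buggy = build_cross_attention_layer(decoder_window, pass_decoder_config=True)
# After the fix: the cross-attention cache is built without the decoder config.
fixed = build_cross_attention_layer(decoder_window, pass_decoder_config=False)

buggy_kv = buggy.update(encoder_positions)
fixed_kv = fixed.update(encoder_positions)

print(len(buggy_kv))  # 4: encoder states truncated to the decoder window
print(len(fixed_kv))  # 10: full-length cross-attention
```

In this toy setup the "buggy" layer keeps only the last four encoder positions, which mirrors how a cross-attention cache that inherits the decoder's sliding-window structure would no longer match the full-length encoder states.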

Tested with:

pytest -q -rA tests/models/t5gemma/test_modeling_t5gemma.py::T5GemmaModelTest::test_generate_beyond_sliding_window_with_flash_attn

@vasqu @ArthurZucker @Cyrilvallez, pls help review, thx!

kaixuanliu · 2026-05-25T12:30:57Z

Moved the fix into the T5Gemma modeling code, and fixed a device-mismatch bug that caused this test to fail:
tests/models/t5gemma/test_modeling_t5gemma.py::T5GemmaModelTest::test_model_parallel_beam_search

vasqu

Can we also add a fast test for this? Or at least a faster test?

Left some smaller comments

vasqu · 2026-05-25T12:59:04Z

            # We do not pass the config to the cross attn cache to avoid initializing SWA
            # --> we use full attention between our cross attentions
            past_key_values = EncoderDecoderCache(DynamicCache(config=self.config), DynamicCache())
+        elif (


IMO we should instead override this the same way t5gemma2 does; see

transformers/src/transformers/models/t5gemma2/modular_t5gemma2.py

Line 1106 in e65c3a2

def _prepare_cache_for_generation(
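
The suggestion above amounts to keeping the shared generation code generic and letting the model override the cache-preparation hook. A rough sketch of that pattern follows. Every class here (`ToyConfig`, `ToyCache`, `ToyGenerationMixin`, `ToyEncoderDecoderModel`) is hypothetical, and the zero-argument method signature is a simplification; the real `_prepare_cache_for_generation` takes arguments that are omitted here.

```python
from typing import Dict, Optional


class ToyConfig:
    """Hypothetical decoder config carrying a sliding-window size."""

    def __init__(self, sliding_window: Optional[int]) -> None:
        self.sliding_window = sliding_window


class ToyCache:
    """Hypothetical cache that only records which window it was built with."""

    def __init__(self, config: Optional[ToyConfig] = None) -> None:
        # No config means no sliding window, i.e. full attention.
        self.sliding_window = config.sliding_window if config is not None else None


class ToyGenerationMixin:
    """Stand-in for shared generation code that knows nothing model-specific."""

    def __init__(self, config: ToyConfig) -> None:
        self.config = config

    def _prepare_cache_for_generation(self) -> Dict[str, ToyCache]:
        # Generic behaviour: both caches are built from the decoder config.
        return {
            "self_attention": ToyCache(self.config),
            "cross_attention": ToyCache(self.config),
        }


class ToyEncoderDecoderModel(ToyGenerationMixin):
    """Model-level override, so the shared code path stays untouched."""

    def _prepare_cache_for_generation(self) -> Dict[str, ToyCache]:
        caches = super()._prepare_cache_for_generation()
        # Rebuild the cross-attention cache without the config so it is
        # not initialized with the decoder's sliding window.
        caches["cross_attention"] = ToyCache()
        return caches


model = ToyEncoderDecoderModel(ToyConfig(sliding_window=4))
caches = model._prepare_cache_for_generation()
print(caches["self_attention"].sliding_window)   # 4
print(caches["cross_attention"].sliding_window)  # None
```

The design point is that only the model that needs full-length cross-attention changes behaviour; other models using the generic hook are unaffected.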

vasqu · 2026-05-25T13:00:49Z

+            if encoder_attention_mask is None:
+                encoder_attention_mask = torch.ones(
+                    encoder_hidden_states.shape[:2], device=inputs_embeds.device, dtype=torch.bool
+                )


This seems weird to me, any reason this was actually needed? If yes, please add a comment as to why

Well, it is something of a workaround here. If we pass encoder_attention_mask to create_bidirectional_mask, it hits early_exit in create_bidirectional_mask and so avoids the device mismatch that would otherwise occur afterwards. I think this is a common issue for cross-attention masks, so I will fix the bug in masking_utils.py.

kaixuanliu · 2026-05-26T06:11:09Z

@vasqu I have resolved the comments you left, can you help review it again? Thx!

vasqu

LGTM. Can we move the mask fix to a different PR? Other than that, we can merge.

vasqu · 2026-05-26T13:26:47Z

+    # Use `inputs_embeds.device` to stay consistent with `_preprocess_mask_arguments`, which moves the 2D
+    # `attention_mask` to that device. In model parallel setups, `encoder_hidden_states` may live on a different
+    # device than `inputs_embeds` (e.g. cross-attention from a decoder to encoder states).
+    device = inputs_embeds.device


Can we move this to a separate PR? Thanks for fixing it, though; that is a good point.

Moved to #46221.

kaixuanliu · 2026-05-26T14:29:20Z

@vasqu Done.

github-actions · 2026-05-26T14:30:06Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: t5gemma

vasqu · 2026-05-26T15:14:26Z

Thanks! Merging

…uggingface#46111)

* Fix a regression in encoder-decoder generation cache initialization

Signed-off-by: Liu, Kaixuan <[email protected]>

* move the fix to modeling part

Signed-off-by: root <[email protected]>

* update code

Signed-off-by: Liu, Kaixuan <[email protected]>

* Override `_prepare_cache_for_generation`

Signed-off-by: Liu, Kaixuan <[email protected]>

* add related fast test

Signed-off-by: Liu, Kaixuan <[email protected]>

* revert masking_utils.py change

Signed-off-by: Liu, Kaixuan <[email protected]>

---------

Signed-off-by: Liu, Kaixuan <[email protected]>
Signed-off-by: root <[email protected]>
Co-authored-by: root <[email protected]>

kaixuanliu added 2 commits May 20, 2026 09:08

Fix a regression in encoder-decoder generation cache initialization

589406b

Signed-off-by: Liu, Kaixuan <[email protected]>

Merge branch 'main' into cross-aatn-cache

323d999

kaixuanliu marked this pull request as draft May 25, 2026 08:22

root and others added 2 commits May 25, 2026 10:57

move the fix to modeling part

a2fb452

Signed-off-by: root <[email protected]>

Merge branch 'main' into cross-aatn-cache

234fc82

kaixuanliu marked this pull request as ready for review May 25, 2026 12:29

github-actions Bot requested review from ArthurZucker and Rocketknight1 May 25, 2026 12:29

vasqu reviewed May 25, 2026

kaixuanliu added 5 commits May 26, 2026 03:17

update code

7a50631

Signed-off-by: Liu, Kaixuan <[email protected]>

Merge branch 'main' into cross-aatn-cache

d07e371

Override _prepare_cache_for_generation

8923de2

Signed-off-by: Liu, Kaixuan <[email protected]>

Merge branch 'cross-aatn-cache' of https://github.com/kaixuanliu/transformers into cross-aatn-cache

f9c323e

add related fast test

86981ae

Signed-off-by: Liu, Kaixuan <[email protected]>

vasqu approved these changes May 26, 2026

revert masking_utils.py change

de923c9

Signed-off-by: Liu, Kaixuan <[email protected]>

vasqu enabled auto-merge May 26, 2026 15:14

vasqu added this pull request to the merge queue May 26, 2026

Merged via the queue into huggingface:main with commit 90e3c4f May 26, 2026
23 checks passed

kaixuanliu deleted the cross-aatn-cache branch May 27, 2026 02:07
