[core] reuse AttentionMixin for compatible classes #12463
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
for name, module in self.named_children():
    fn_recursive_attn_processor(name, module, processor)
```

```python
# Copied from diffusers.models.unets.unet_2d_condition.UNet2DConditionModel.set_default_attn_processor
```
Perhaps it's out of scope for this PR, but I see that a lot of models additionally have a `set_default_attn_processor` method, usually marked `# Copied from diffusers.models.unets.unet_2d_condition.UNet2DConditionModel.set_default_attn_processor`. Do you think it makes sense to add this method to `AttentionMixin`?
IMO, not yet, since `AttentionMixin` is fairly agnostic to the model type, but `set_default_attn_processor` relies on some custom attention processor types. For `UNet2DConditionModel`, we have:
`diffusers/src/diffusers/models/unets/unet_2d_condition.py`, lines 762 to 769 in fa468c5
However, for AutoencoderKL Temporal Decoder:
I'd be down for the refactoring, though. Cc: @DN6
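For reference, the reason this method resists a model-agnostic mixin: `UNet2DConditionModel.set_default_attn_processor` picks a default processor class based on which processor family the model currently uses. The sketch below is a paraphrase under that assumption, not the actual diffusers source; the class names mirror diffusers but the bodies here are stubs.

```python
class AttnProcessor:
    """Stub standing in for diffusers' standard cross-attention processor."""

class AttnAddedKVProcessor:
    """Stub standing in for diffusers' added-KV attention processor."""

# Hypothetical registries of the processor families a model may use.
ADDED_KV_ATTENTION_PROCESSORS = {AttnAddedKVProcessor}
CROSS_ATTENTION_PROCESSORS = {AttnProcessor}

def pick_default_processor(current_processors):
    """Choose a default processor type from the model's current ones.

    Roughly mirrors the branching in set_default_attn_processor: the
    right default depends on the processor family already in use, which
    is why the method is model-specific rather than mixin material.
    """
    families = {type(p) for p in current_processors.values()}
    if families.issubset(ADDED_KV_ATTENTION_PROCESSORS):
        return AttnAddedKVProcessor()
    if families.issubset(CROSS_ATTENTION_PROCESSORS):
        return AttnProcessor()
    raise ValueError(f"Cannot infer a default processor from {families}")

# A model whose attention layers all use the standard processor:
default = pick_default_processor({"down.attn.processor": AttnProcessor()})
print(type(default).__name__)  # AttnProcessor
```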
dg845 left a comment:
Looks good to me! I think `AuraFlowTransformer2DModel` and `AudioLDM2UNet2DConditionModel` have their `attn_processor`/`set_attn_processor` methods deleted but are missing the corresponding change to inherit from `AttentionMixin`.
Thanks for those catches, @dg845. They should be fixed now.

LGTM :)

@DN6 okay to go?

@DN6 a gentle ping.
What does this PR do?
Many models use `# Copied from ...` implementations of `attn_processors` and `set_attn_processor`. They are basically the same as what we have implemented in `diffusers/src/diffusers/models/attention.py`,
line 39 in 693d8a3.
This PR makes those models inherit from `AttentionMixin` and removes the copied-over implementations. I decided to leave `fuse_qkv_projections` and `unfuse_qkv_projections` out of this PR because some models don't have attention processors implemented in a way that would make this seamless. But the methods removed in this PR should be completely harmless.
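To illustrate the pattern being deduplicated: `attn_processors` and `set_attn_processor` recursively walk `named_children()` and read or swap the processor on every attention submodule. The sketch below is a simplified, torch-free illustration of that recursion, not the actual diffusers code; `Module`, `Attention`, and `ToyTransformer` here are stand-ins.

```python
class Module:
    """Minimal stand-in for torch.nn.Module's child tracking."""
    def __init__(self):
        self._children = {}

    def add_child(self, name, child):
        self._children[name] = child

    def named_children(self):
        return self._children.items()


class Attention(Module):
    """Stand-in attention layer that owns a swappable processor."""
    def __init__(self, processor):
        super().__init__()
        self.processor = processor

    def get_processor(self):
        return self.processor

    def set_processor(self, processor):
        self.processor = processor


class AttentionMixin:
    """Recursively collect or replace processors on attention submodules."""
    @property
    def attn_processors(self):
        processors = {}

        def fn_recursive(name, module):
            if hasattr(module, "get_processor"):
                processors[f"{name}.processor"] = module.get_processor()
            for child_name, child in module.named_children():
                fn_recursive(f"{name}.{child_name}", child)

        for name, module in self.named_children():
            fn_recursive(name, module)
        return processors

    def set_attn_processor(self, processor):
        def fn_recursive(name, module):
            if hasattr(module, "set_processor"):
                module.set_processor(processor)
            for child_name, child in module.named_children():
                fn_recursive(f"{child_name}", child)

        for name, module in self.named_children():
            fn_recursive(name, module)


class ToyTransformer(Module, AttentionMixin):
    """Hypothetical model that inherits the mixin instead of copying it."""
    def __init__(self):
        super().__init__()
        block = Module()
        block.add_child("attn1", Attention(processor="default"))
        self.add_child("block", block)


model = ToyTransformer()
print(model.attn_processors)   # {'block.attn1.processor': 'default'}
model.set_attn_processor("fused")
print(model.attn_processors)   # {'block.attn1.processor': 'fused'}
```

Because the recursion only assumes `named_children()` plus `get_processor`/`set_processor`, it works unchanged across UNets, transformers, and VAEs, which is what makes the mixin reusable where a model-specific default-processor choice would not be.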