[ControlNet] Adds controlnet for SanaTransformer #11040
a-r-r-o-w merged 24 commits into huggingface:main
Conversation
@ishan-modi Sorry about the slow review here. The team is at an offsite this week and taking a break, but we'll try to merge ASAP once we're back next week. Thanks for the awesome work!

All good, man! Enjoy your offsite.

Gentle ping @a-r-r-o-w

Hi, sorry for the delay. Testing now and hopefully can merge soon 🤗
a-r-r-o-w
left a comment
Thanks for the awesome work! Looks very close to merge except for a few more changes. LMK if I can help with any 🤗
    timestep = t.expand(latent_model_input.shape[0]).to(latents.dtype)

    # controlnet(s) inference
    controlnet_block_samples = self.controlnet(
The inference example in the docstring errors out for me here. This is because the ControlNet is loaded in bf16, but latent_model_input is moved to self.transformer.dtype, which is fp16, due to the following code in encode_prompt:

    if self.transformer is not None:
        dtype = self.transformer.dtype
    elif self.text_encoder is not None:
        dtype = self.text_encoder.dtype
    else:
        dtype = None
    prompt_embeds = prompt_embeds.to(dtype=dtype, device=device)

Let's do this:
- Remove the above code occurrence for determining dtype in encode_prompt
- Return the prompt_embeds in the same dtype as the text encoder
- Perform any dtype casting within the __call__ method based on controlnet_dtype and transformer_dtype (create these variables similar to how it's done in Wan)
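The failure and the proposed fix can be sketched in plain torch. The dtypes come from this thread; the tensor shapes and the weight matrix are arbitrary stand-ins, not the real Sana internals:

```python
import torch

# dtypes reported in the thread: transformer in fp16, controlnet in bf16
transformer_dtype = torch.float16
controlnet_dtype = torch.bfloat16

# Arbitrary stand-ins; not the real Sana tensor shapes.
latents = torch.randn(1, 8, 8)
controlnet_weight = torch.randn(8, 8, dtype=controlnet_dtype)

# Reproduce the bug: fp16 activations hitting a bf16 weight raise a
# dtype-mismatch RuntimeError.
try:
    _ = latents.to(transformer_dtype) @ controlnet_weight
    mixed_dtypes_ok = True
except RuntimeError:
    mixed_dtypes_ok = False

# Proposed fix: cast inputs to each component's dtype right before the call.
controlnet_out = latents.to(controlnet_dtype) @ controlnet_weight
```

In the pipeline itself, `controlnet_dtype` and `transformer_dtype` would be read off the loaded modules inside `__call__`, as in the Wan pipeline.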
Also, the following are the dtypes of each component:
- vae: bf16
- text_encoder: bf16
- transformer: fp16
- controlnet: bf16
Just so that I'm up to speed, is this expected?
Can you point me to the docstring that loads the controlnet in bf16?
I think I overlooked the following:
- controlnet for SANA_600M is supposed to use fp16 here
- controlnet for SANA_1600M is supposed to use bf16 here
Most of the docs load the controlnet in fp16, but I guess it needs to be more generic, as you mentioned.
I'm referring to the example code that's at the top of the file:
    import torch
    from diffusers import SanaControlNetModel, SanaControlNetPipeline
    from diffusers.utils import load_image

    controlnet = SanaControlNetModel.from_pretrained(
        "ishan24/Sana_600M_1024px_ControlNet_diffusers", torch_dtype=torch.bfloat16
    )
    pipe = SanaControlNetPipeline.from_pretrained(
        "Efficient-Large-Model/Sana_600M_1024px_diffusers",
        variant="fp16",
        torch_dtype=torch.float16,
        controlnet=controlnet,
    )
    pipe.to("cuda")
    pipe.vae.to(torch.bfloat16)
    pipe.text_encoder.to(torch.bfloat16)

    cond_image = load_image(
        "https://huggingface.co/ishan24/Sana_600M_1024px_ControlNet_diffusers/resolve/main/hed_example.png"
    )
    prompt = 'a cat with a neon sign that says "Sana"'
    image = pipe(
        prompt,
        control_image=cond_image,
    ).images[0]
    image.save("output.png")
I think:
- we should update the docstring example to use fp16 for both the controlnet and the transformer, unless there is a special reason to do it this way, i.e. controlnet in bf16 while the transformer is in fp16
- the changes @a-r-r-o-w proposed in [ControlNet] Adds controlnet for SanaTransformer #11040 (comment) sound good. I think all the Sana pipelines should have the same encode_prompt method, no? If so, let's not remove the # Copied from in encode_prompt; update the one in pipeline_sana.py and make sure the changes are applied to all Sana pipelines.
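The first point above can be illustrated with stub modules: keeping both components in one dtype means no cross-dtype activations are possible. The nn.Linear layers here are hypothetical stand-ins, not the real diffusers components:

```python
import torch
import torch.nn as nn

# One shared dtype for both components (bf16 here; fp16 behaves the same on GPU).
shared_dtype = torch.bfloat16

# Hypothetical stand-ins for the transformer and controlnet.
transformer = nn.Linear(8, 8).to(shared_dtype)
controlnet = nn.Linear(8, 8).to(shared_dtype)

x = torch.randn(1, 8, dtype=shared_dtype)
# With matching dtypes, the output of one module feeds the other directly.
out = transformer(controlnet(x))
```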
    if self.transformer is not None:
        dtype = self.transformer.dtype
    elif self.text_encoder is not None:
        dtype = self.text_encoder.dtype
    else:
        dtype = None
text_encoder cannot be None. transformer can be None since we should be able to run encode_prompt without loading the transformer.
Let's make sure this method returns embeds in the same dtype as text encoder and do casting in __call__
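A minimal sketch of that contract, using a hypothetical stub in place of the real text encoder: embeddings come back in the text encoder's dtype, and the transformer is never consulted.

```python
import torch

# Hypothetical stand-in for the real text encoder module.
class StubTextEncoder:
    dtype = torch.bfloat16

def embeds_dtype(text_encoder):
    # The transformer is deliberately not consulted; it may not even be
    # loaded when encode_prompt runs.
    return text_encoder.dtype if text_encoder is not None else None

prompt_embeds = torch.randn(1, 16, 32).to(embeds_dtype(StubTextEncoder()))
```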
Ohh, I think we need to make sure the prompt_embeds is not None code path can work without the text encoder loaded too, no?
In modular, we started to make it so that you only run encode_prompt when you actually need to encode a prompt; that's not the case here yet.
@a-r-r-o-w let me know if I should change the current version to

    if self.text_encoder is not None:
        dtype = self.text_encoder.dtype
    else:
        dtype = None

Once confirmed, I will make similar changes to pipeline_sana.py.
Oh okay, based on YiYi's comment, let's do this for both pipelines
    >>> from diffusers.utils import load_image

    >>> controlnet = SanaControlNetModel.from_pretrained(
    ...     "ishan24/Sana_600M_1024px_ControlNet_diffusers", torch_dtype=torch.bfloat16
@lawrence-cj Could we host the controlnet checkpoint in the Efficient-Large-Model org? We generally don't merge without officially hosted weights unless it's necessary for a quick release (which we then update later anyway).
@ishan-modi Please feel free to mention your hosted controlnet model in the docs 🤗
Yah, I would like to do it and test the PR at the same time. @a-r-r-o-w
Awesome, thank you @lawrence-cj! I'll run some final tests and merge the PR in a few hours
@lawrence-cj The checkpoint does not seem accessible yet and I get a 404. I'll go ahead and merge this PR for now, and we can update the docs/examples with the official checkpoint in a follow up
a-r-r-o-w
left a comment
Thanks, LGTM! Going to try testing out the model again and we can merge once we have the official hosted checkpoint by Junsong 🤗
LMK if you need any help with the failing tests. They seem to be because of a misplaced …
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
a-r-r-o-w
left a comment
Awesome work @ishan-modi, thanks a lot!
Just some final changes
Congrats! Thank you so much @ishan-modi
What does this PR do?
Fixes #10772, #11019, #11116
Who can review?
@yiyixuxu