Conversation

@a-r-r-o-w
Contributor

What does this PR do?

Original issue: huggingface/diffusers#10470

Previous PRs handled sequential offloading by wrapping each AffineQuantizedTensor (AQT) in a new nn.Parameter.
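
For context, a minimal sketch of that earlier pattern (illustrative only; the helper and its arguments are hypothetical, not the actual accelerate/diffusers code):

import torch
import torch.nn as nn

def move_quantized_param(module: nn.Module, name: str, device: torch.device) -> None:
    # Hypothetical helper sketching the old workaround. The parameter's data is
    # a torchao AffineQuantizedTensor, and in-place moves of tensor subclasses
    # are awkward, so the parameter object is rebuilt from scratch.
    param = module._parameters[name]
    # .to() on the subclass returns another AQT on the target device; wrapping
    # it in a fresh nn.Parameter keeps nn.Module's parameter registry consistent.
    module._parameters[name] = nn.Parameter(param.data.to(device), requires_grad=param.requires_grad)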

The suggestion for improving maintainability comes from bdhirsh: #3332 (comment)
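
One shape such a refactor can take (my assumption for illustration, not necessarily what this PR implements) is PyTorch's torch.utils.swap_tensors, which swaps two tensors in place while preserving the tensor's Python class and any outstanding references:

import torch
import torch.nn as nn

def move_param_in_place(param: nn.Parameter, device: torch.device) -> None:
    # Hypothetical helper (a sketch, not this PR's code): materialize the moved
    # copy first, then swap payloads in place. Because swap_tensors preserves
    # the Python class, a subclass such as AQT survives the move, and any
    # references other code holds to `param` remain valid.
    moved = nn.Parameter(param.to(device), requires_grad=param.requires_grad)
    torch.utils.swap_tensors(param, moved)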

Reproducer
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

quantization_config = TorchAoConfig("int8wo")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=dtype,
)
pipe = FluxPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=dtype,
)
# pipe.to("cuda")  # intentionally disabled; sequential offload manages device placement

pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()

prompt = "A cat holding a sign that says hello world"
image = pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
image.save("output.png")

@SunMarc @jerryzh168

@a-r-r-o-w a-r-r-o-w requested a review from SunMarc March 14, 2025 23:40
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc merged commit fb90996 into huggingface:main Apr 8, 2025
25 checks passed
@a-r-r-o-w a-r-r-o-w deleted the torchao-sequential-offload-changes branch April 8, 2025 10:30