Add Photon model and pipeline support #12456
| print("✓ Created scheduler config") | ||
| def download_and_save_vae(vae_type: str, output_path: str): |
I'm not sure on this one: I'm saving the VAE weights while they are already available on the Hub (Flux VAE and DC-AE).
Is there a way to avoid storing them and instead look directly for the original ones?
For now, it's okay to keep this as is. This way, everything is under the same model repo.
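To make the two options discussed here concrete, a minimal sketch follows. The `vae_type` keys (`"flux"`, `"dc-ae"`), the `"vae"` subfolder name, and the helper names `resolve_vae_source` and `load_vae` are illustrative assumptions, not taken from the PR; only the class names `AutoencoderKL` and `AutoencoderDC` come from the model description.

```python
from typing import Optional, Tuple

# Class names as given in the Photon docs; the dictionary keys are assumed.
VAE_CLASSES = {"flux": "AutoencoderKL", "dc-ae": "AutoencoderDC"}


def resolve_vae_source(
    vae_type: str,
    model_repo: str,
    original_repo: Optional[str] = None,
    original_subfolder: Optional[str] = None,
) -> Tuple[str, str, Optional[str]]:
    """Return (class_name, repo_id, subfolder) describing where to load the VAE from.

    Without `original_repo`, the copy stored in the model repo is used.
    With it, the original weights on the Hub are referenced instead.
    """
    if vae_type not in VAE_CLASSES:
        raise ValueError(
            f"Unknown vae_type {vae_type!r}; expected one of {sorted(VAE_CLASSES)}"
        )
    class_name = VAE_CLASSES[vae_type]
    if original_repo is None:
        return class_name, model_repo, "vae"  # assumed subfolder name
    return class_name, original_repo, original_subfolder


def load_vae(
    vae_type: str,
    model_repo: str,
    original_repo: Optional[str] = None,
    original_subfolder: Optional[str] = None,
):
    """Load the VAE with diffusers (imported lazily; needs network access)."""
    import diffusers

    class_name, repo_id, subfolder = resolve_vae_source(
        vae_type, model_repo, original_repo, original_subfolder
    )
    vae_cls = getattr(diffusers, class_name)
    return vae_cls.from_pretrained(repo_id, subfolder=subfolder)
```

Keeping the copy in the model repo corresponds to calling `resolve_vae_source` without `original_repo`, which matches the "everything under the same model repo" rationale above.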
| print(f"✓ Saved VAE to {vae_path}") | ||
| def download_and_save_text_encoder(output_path: str): |
Same here for the Text Encoder.
sayakpaul left a comment
Thanks for the clean PR! I left some initial feedback for you. LMK if that makes sense.
Also, it would be great to see some samples of Photon!
sayakpaul left a comment
Thanks! Left a couple more comments. Let's also add the pipeline-level tests.
| <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/> | ||
| </div> | ||
| Photon is a text-to-image diffusion model using simplified MMDIT architecture with flow matching for efficient high-quality image generation. The model uses T5Gemma as the text encoder and supports either Flux VAE (AutoencoderKL) or DC-AE (AutoencoderDC) for latent compression. |
| return xq_out.reshape(*xq.shape).type_as(xq) | ||
| class PhotonAttnProcessor2_0: |
Could we write it in a fashion similar to
I second this suggestion - in particular, I think it would be more in line with other diffusers models implementations to reuse the layers defined in Attention, such as to_q/to_k/to_v, etc. instead of defining them in PhotonBlock (e.g. PhotonBlock.img_qkv_proj), and to keep the entire attention implementation in the PhotonAttnProcessor2_0 class.
Attention supports stuff like QK norms and fusing projections, so that could potentially be reused as well. If you need some custom logic not found in Attention, you could potentially add it in there or create a new Attention-style class like Flux does:
I made the change and updated both the conversion script and the checkpoints on the hub.
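For readers unfamiliar with the pattern being requested, here is a toy, dependency-free sketch of it: the attention module owns the projections (`to_q`/`to_k`/`to_v`/`to_out`) and the processor holds the whole computation. `ToyAttention` and `ToyAttnProcessor` are hypothetical names, and plain Python lists stand in for tensors; this is not the diffusers `Attention` class or the Photon implementation.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


class ToyAttnProcessor:
    """Holds the entire attention computation; owns no weights itself."""

    def __call__(self, attn: "ToyAttention", hidden_states: Matrix) -> Matrix:
        # Projections are read from the attention module, not from the block.
        query = matmul(hidden_states, attn.to_q)
        key = matmul(hidden_states, attn.to_k)
        value = matmul(hidden_states, attn.to_v)
        scale = 1.0 / math.sqrt(len(query[0]))
        scores = [
            [scale * sum(q * k for q, k in zip(q_row, k_row)) for k_row in key]
            for q_row in query
        ]
        weights = [softmax(row) for row in scores]
        return matmul(matmul(weights, value), attn.to_out)


class ToyAttention:
    """Owns to_q/to_k/to_v/to_out and delegates the math to its processor."""

    def __init__(
        self,
        to_q: Matrix,
        to_k: Matrix,
        to_v: Matrix,
        to_out: Matrix,
        processor: Optional[ToyAttnProcessor] = None,
    ):
        self.to_q, self.to_k, self.to_v, self.to_out = to_q, to_k, to_v, to_out
        self.processor = processor or ToyAttnProcessor()

    def __call__(self, hidden_states: Matrix) -> Matrix:
        return self.processor(self, hidden_states)
```

With this split, swapping the attention backend only means swapping the processor object; the weights and their checkpoint keys stay on the attention module.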
| def __call__( | ||
| self, | ||
| prompt: Union[str, List[str]] = None, | ||
| height: Optional[int] = None, |
We support passing prompt embeddings too in case users want to supply them precomputed:
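A minimal sketch of what accepting either form could look like at the input-validation step. The helper name `check_prompt_inputs` and its return values are hypothetical and only illustrate the "exactly one of `prompt` or `prompt_embeds`" rule; this is not the pipeline's actual code.

```python
from typing import Any, List, Optional, Union


def check_prompt_inputs(
    prompt: Optional[Union[str, List[str]]] = None,
    prompt_embeds: Optional[Any] = None,
) -> str:
    """Validate that exactly one prompt source is given and say which path to take."""
    if prompt is not None and prompt_embeds is not None:
        raise ValueError("Pass either `prompt` or `prompt_embeds`, not both.")
    if prompt is None and prompt_embeds is None:
        raise ValueError("Provide either `prompt` or `prompt_embeds`.")
    if prompt is not None and not isinstance(prompt, (str, list)):
        raise ValueError(
            f"`prompt` must be a str or a list of str, got {type(prompt).__name__}."
        )
    # Text prompts go through the text encoder; embeddings skip that step.
    return "encode_prompt" if prompt is not None else "use_precomputed"
```

For example, `check_prompt_inputs(prompt="a cat")` selects the encoding path, while passing only `prompt_embeds` skips the text encoder.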
stevhliu left a comment
Thanks for the docs, remember to add it to the toctree as well!
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: dg845 <[email protected]>
Thanks @sayakpaul! Sorry for all this back and forth.
No, all good. I will look into the typing thing further. We should get rid of [...]. Hopefully the CI passes 🤞
Thanks @sayakpaul!
This commit adds support for the Photon image generation model:
Some examples below, generated with the 512 model fine-tuned on the Alchemist dataset and distilled with PAG: