

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Sep 15, 2025

What does this PR do?

Adds any-to-any as a pipeline and to the auto classes, so that we can have a single mapping for all multimodal models. The model mapping is almost the same as image-text-to-text, with the addition of audio-LLMs and omni-LLMs. I hope I added all the audio models, but let me know if any recent ones are missing.

Fixes #40302 and fixes #37794
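To make the mapping description above concrete, here is a minimal sketch of composing one any-to-any mapping from image-text-to-text entries plus audio-LLM and omni-LLM entries. It assumes a plain model-type-to-class-name dictionary; the entries are an illustrative subset chosen for this example, not the actual mapping added in this PR, and `build_any_to_any_mapping` is a hypothetical helper.

```python
from collections import OrderedDict

# Illustrative subset only: NOT the real transformers mapping from this PR.
# Keys are model types, values are model class names stored as strings.
IMAGE_TEXT_TO_TEXT = OrderedDict(
    [
        ("llava", "LlavaForConditionalGeneration"),
        ("qwen2_vl", "Qwen2VLForConditionalGeneration"),
    ]
)

# Hypothetical extra entries standing in for audio-LLMs and omni-LLMs.
AUDIO_LLMS = OrderedDict([("qwen2_audio", "Qwen2AudioForConditionalGeneration")])
OMNI_LLMS = OrderedDict([("qwen2_5_omni", "Qwen2_5OmniForConditionalGeneration")])


def build_any_to_any_mapping(*mappings):
    """Merge several model-type -> class-name mappings into one.

    Mappings are merged in the order given; on a duplicate model type the
    earlier entry wins, so image-text-to-text entries are never overridden.
    """
    merged = OrderedDict()
    for mapping in mappings:
        for model_type, class_name in mapping.items():
            merged.setdefault(model_type, class_name)
    return merged


ANY_TO_ANY = build_any_to_any_mapping(IMAGE_TEXT_TO_TEXT, AUDIO_LLMS, OMNI_LLMS)

if __name__ == "__main__":
    # Print one line per model type in the merged mapping.
    for model_type, class_name in ANY_TO_ANY.items():
        print(f"{model_type:>14} -> {class_name}")
```

Using `setdefault` keeps the first entry seen for a model type, so a later audio or omni entry cannot silently replace an image-text-to-text one; whether the real mapping resolves duplicates this way is an assumption of this sketch.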

Contributor

@merveenoyan merveenoyan left a comment


From what I understand of the code, this lets us load an any-to-any model and still use it for everything we currently do with image-text-to-text tasks. It's a bit confusing to me, but if we write the docs well it should be OK!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member Author

@merveenoyan if you have time to look at the docs section, your advice would be appreciated. Do you think there is anything we should add or highlight? I added basic functionality with examples for now.

@zucchini-nlp
Member Author

Okay, I think this one is ready now, as long as CI turns green.

@zucchini-nlp zucchini-nlp changed the title "Any to any pipeline" to "Any to any pipeline and auto-mapping" Sep 16, 2025
@jackzhxng
Contributor

Thank you! Looking forward to this getting merged 🙏🏻

Member

@Cyrilvallez Cyrilvallez left a comment


Nice! Left a few comments! Tagging @ArthurZucker as well, as the names we choose for the pipelines and mappings are important here: we will likely be stuck with them for some time, so let's make sure we like them and that they are descriptive enough!

Collaborator

@ArthurZucker ArthurZucker left a comment


Overall very nice; not sure about the naming yet!

Contributor

@jackzhxng jackzhxng left a comment


Solves our use case perfectly, and output_modalities is also very useful to have. Thanks @zucchini-nlp 🙏🏻

Leaving approval to @ArthurZucker and @Cyrilvallez.

@zucchini-nlp
Member Author

Test failures are unrelated!

@zucchini-nlp
Member Author

Test failures are unrelated. Kind ping @ArthurZucker, whenever you have time.

Collaborator

@ArthurZucker ArthurZucker left a comment


Very very nice, sorry that it took so long to come back to it!
Fan of in/out modalities! Shaping well!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, align, altclip, aria, audioflamingo3, auto, autoformer, aya_vision, bark, beit, bit, blip, blip_2, blt, bridgetower, chameleon

@zucchini-nlp zucchini-nlp merged commit 55b1400 into huggingface:main Nov 27, 2025
23 checks passed


Development

Successfully merging this pull request may close these issues.

Add audio-text-to-text task
AutoModel can't load Qwen/Qwen2.5-Omni-7B

6 participants