Any to any pipeline and auto-mapping #40884
Conversation
merveenoyan
left a comment
From what I understand of the code, this lets us load an any-to-any model and still do everything we can do with image-text-to-text tasks. It's a bit confusing to me, but if we write the docs well it should be ok!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@merveenoyan if you have time to look at the docs section, your advice would be appreciated. Do you think there is anything we should add or highlight? I added the basic functionality with examples for now.
Okay, I think this one is ready now, as long as CI turns green.
Thank you! Looking forward to this getting merged 🙏🏻 |
Cyrilvallez
left a comment
Nice! Left a few comments! Tagging @ArthurZucker as well, since the names we choose for the pipelines and mappings are important here - we will likely be stuck with them for some time, so let's make sure we like them and that they are descriptive enough!
ArthurZucker
left a comment
Overall very nice; not sure about the naming yet!
jackzhxng
left a comment
Solves our use case perfectly, and output_modalities is very useful to have. Thanks @zucchini-nlp 🙏🏻
Leaving to @ArthurZucker and @Cyrilvallez for approval
Test failures not related!
Test failures not related; kind ping @ArthurZucker whenever you have time.
ArthurZucker
left a comment
Very, very nice - sorry that it took so long to come back to it!
Fan of in/out modalities! Shaping up well!
[For maintainers] Suggested jobs to run (before merge) run-slow: aimv2, align, altclip, aria, audioflamingo3, auto, autoformer, aya_vision, bark, beit, bit, blip, blip_2, blt, bridgetower, chameleon
What does this PR do?
Adds any-to-any as a pipeline and to the auto classes, so that we can have a single mapping for all multimodal models. The model mapping is almost the same as image-text-to-text, with the addition of audio-LLMs and omni-LLMs. I hope I added all the audio models, but let me know if anything recent is missing.
Fixes #40302 and fixes #37794
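To illustrate the "single mapping" idea the PR describes, here is a minimal, self-contained sketch of how one shared model-type registry can serve as the lookup behind both an auto class and a pipeline task. All names here (the mapping name, the resolver function, the example model types) are hypothetical stand-ins, not the actual transformers implementation:

```python
# Hypothetical sketch of an any-to-any auto-mapping: one registry keyed by
# model type, shared by the auto class and the pipeline, regardless of which
# modalities each model consumes or produces. Names are illustrative only.
ANY_TO_ANY_MAPPING = {
    # model_type -> model class name (examples, not the real registry)
    "chameleon": "ChameleonForConditionalGeneration",
    "qwen2_5_omni": "Qwen2_5OmniForConditionalGeneration",  # omni-LLM example
    "audio_llm": "AudioLLMForConditionalGeneration",        # audio-LLM example
}

def resolve_any_to_any(model_type: str) -> str:
    """Resolve a model type to its class name via the shared mapping.

    Both the auto class and the pipeline would consult the same table,
    which is what keeps the any-to-any task to a single mapping.
    """
    try:
        return ANY_TO_ANY_MAPPING[model_type]
    except KeyError:
        raise ValueError(
            f"Model type {model_type!r} is not registered for any-to-any."
        )
```

The point of the pattern is that adding a new multimodal model means one registry entry, after which both `AutoModel`-style resolution and the pipeline pick it up for free.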