Add LFM2-VL support #40259
Conversation
zucchini-nlp left a comment
Yay, happy to see a VLM release from LiquidAI! I left a few comments to refine and clean up the PR. It would be nice to use modular, because the model arch is very similar to existing VLMs and it makes the review process easier/faster.
```python
def _smart_resize(
    self,
    image: Image.Image,
    downsample_factor: int,
    min_image_tokens: int,
    max_image_tokens: int,
    encoder_patch_size: int,
) -> Image.Image:
```
maybe we can use modular and copy LfmV2ImageProcessor from Qwen-VL with minor changes, since that looks to be the closest processor
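For reference, a minimal sketch of how such a modular file might look if the copy-from-Qwen-VL route were taken (the Lfm2Vl class name and the overridden method are assumptions, not code from this PR):

```python
# modular_lfm2_vl.py -- hypothetical sketch of the modular approach.
# The modular converter (utils/modular_model_converter.py in transformers)
# expands files like this into a standalone image-processing module.
from PIL import Image

from transformers import Qwen2VLImageProcessor


class Lfm2VlImageProcessor(Qwen2VLImageProcessor):
    # Override only the resize logic; everything else is inherited
    # from the Qwen-VL image processor.
    def _smart_resize(
        self,
        image: Image.Image,
        downsample_factor: int,
        min_image_tokens: int,
        max_image_tokens: int,
        encoder_patch_size: int,
    ) -> Image.Image:
        ...
```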
I'm afraid there will be many changes to the Qwen-VL implementation, as we treat images up to 512x512 pixels differently from larger ones
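A rough illustration of that split (the 512-pixel threshold comes from the comment above; the helper body is an assumption, not the PR's actual code):

```python
from PIL import Image

# Hypothetical sketch: images at or below 512x512 take a simple
# single-tile resize path, while larger images would go through a
# separate tiling / token-budget path.
SMALL_IMAGE_THRESHOLD = 512


def resize_for_encoder(image: Image.Image, encoder_patch_size: int) -> Image.Image:
    width, height = image.size
    if width <= SMALL_IMAGE_THRESHOLD and height <= SMALL_IMAGE_THRESHOLD:
        # Small images: snap both dimensions to a multiple of the patch size.
        new_w = max(encoder_patch_size, round(width / encoder_patch_size) * encoder_patch_size)
        new_h = max(encoder_patch_size, round(height / encoder_patch_size) * encoder_patch_size)
        return image.resize((new_w, new_h))
    # Large images: tiling / token-budget resize would go here; omitted.
    raise NotImplementedError("large-image path not sketched")
```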
okay, maybe Qwen-VL isn't that close to LFM-VL. It's nice to copy from a similar processor when one exists, but we can make a separate class if there isn't any. In that case, we don't need modular and it's easier to just keep it as is in processing_xxx.py
```python
        return list(dict.fromkeys(image_processor_input_names + tokenizer_input_names))


__all__ = ["Lfm2VlProcessor"]
```
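(Side note on the idiom above: `dict.fromkeys` deduplicates while preserving first-occurrence order, unlike `set`. The name lists below are illustrative only:)

```python
# dict.fromkeys keeps the first occurrence of each key, in order,
# so the merged processor input names stay deterministic.
image_processor_input_names = ["pixel_values", "image_sizes"]
tokenizer_input_names = ["input_ids", "attention_mask", "image_sizes"]

merged = list(dict.fromkeys(image_processor_input_names + tokenizer_input_names))
print(merged)  # ['pixel_values', 'image_sizes', 'input_ids', 'attention_mask']
```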
it would be nice if you could add a few helpers here to make the model vLLM compatible out of the box. We have a doc page on which helpers are needed here
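As one hedged illustration of what such a helper does (the name, signature, and formula below are hypothetical, not taken from the referenced doc page): serving engines like vLLM need to compute how many placeholder tokens an image expands to without running the full processor.

```python
# Hypothetical helper -- name, signature, and formula are assumptions.
# An engine needs the placeholder-token count per image up front so it
# can pre-allocate slots for the multimodal embeddings.
def get_num_image_tokens(
    image_height: int,
    image_width: int,
    encoder_patch_size: int,
    downsample_factor: int,
) -> int:
    patches_h = -(-image_height // encoder_patch_size)  # ceil division
    patches_w = -(-image_width // encoder_patch_size)
    return (patches_h * patches_w) // (downsample_factor**2)
```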
Given that our backbone is a hybrid model and we're not sure this functionality is supported, could we postpone the vLLM integration until the next update?
@zucchini-nlp thank you for the review! Sorry, the PR was still a draft and wasn't quite ready. I have addressed most of your comments. Some model and processor tests are failing due to failing language-backbone tests and some kwargs-merging tests that I'm not sure how to resolve.
Ah yeah, I just wanted to do a preliminary review for general format. No worries, ping me when you need another review :) The tests seem to be failing due to typing
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, lfm2_vl
@zucchini-nlp Hi, let me know if any more changes are required. I'd appreciate your help with resolving some of the failing CI.
Merged in #40624
Add support for LFM2-VL models.
LFM2-VL is Liquid AI's first series of multimodal models, designed to process text and images at variable resolutions. Built on the LFM2 backbone, it is optimized for low-latency and edge AI applications.
Checkpoints are available here.
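A minimal usage sketch once the model is in transformers (the checkpoint ID and chat-template fields are assumptions based on how comparable VLMs on the Hub are loaded):

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Checkpoint ID is an assumption; see the LiquidAI Hub page for actual names.
model_id = "LiquidAI/LFM2-VL-1.6B"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},
            {"type": "text", "text": "Describe this image."},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```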