Closed
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Qwen just released Qwen2-VL 2B & 7B under the Apache 2.0 License.
Motivation
State-of-the-art understanding of images of various resolutions and aspect ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
Understanding videos of 20+ minutes: Qwen2-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, content creation, etc.
Possible Implementation
No response