train

LLaVA (the initial UGround V1, introduced in the original paper)

Models based on Qwen2-VL, a stronger backbone for grounding and GUI tasks

We trained a model based on Qwen-VL, using only the Web-Hybrid dataset, for a controlled comparison with SeeClick.