- LLaVA: the original UGround (V1) model introduced in the initial paper.
- Qwen2-VL: models built on Qwen2-VL, a stronger backbone for grounding and GUI tasks.
- Qwen-VL: a model trained on Qwen-VL using only the Web-Hybrid dataset, for a controlled comparison with SeeClick.