GUI-Libra: Training Native GUI Agents to Reason and Act
with Action-aware Supervision and Partially Verifiable RL
Rui Yang1,
Qianhui Wu2,
Zhaoyang Wang3,
Hanyang Chen1,
Ke Yang1,
Hao Cheng2,
Huaxiu Yao3,
Baolin Peng2,
Huan Zhang1,
Jianfeng Gao2,
Tong Zhang1
1UIUC 2Microsoft 3UNC-Chapel Hill
GUI-Libra is a post-training framework that turns open-source VLMs into strong native GUI agents — models that see a screenshot, think step-by-step, and output an executable action, all within a single forward pass.
We find that naively adding chain-of-thought (CoT) to GUI agents hurts grounding accuracy, and that standard RLVR-style training cannot achieve stable offline-to-online performance because GUI rewards are only partially verifiable. GUI-Libra solves both:
| Component | What it does |
|---|---|
| GUI-Libra-81K | 81K-step reasoning dataset with action re-prediction filtering and bounding-box coordinate verification |
| Action-Aware SFT | Mixes reasoning and direct-action data; reweights tokens so the model doesn't forget where to click while learning why to click |
| Conservative RL | KL-regularized GRPO that stays stable under ambiguous rewards, with success-adaptive scaling to tame noisy negative gradients |
The result: GUI-Libra-4B/8B match or outperform GPT-4o/GPT-4.1/GPT-5-mini and 72/32B native models on AndroidWorld, WebArena-Lite-v2, and Online-Mind2Web — without any online data collection.
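The bounding-box coordinate verification used to filter GUI-Libra-81K can be illustrated with a small filter that keeps a reasoning sample only if the re-predicted click point lands inside the labeled element box. This is a minimal sketch of the idea, not the released pipeline; the function and sample field names (`repredicted_point`, `gt_bbox`) are hypothetical:

```python
def point_in_bbox(point, bbox):
    """Check whether a predicted (x, y) point lies inside an [x1, y1, x2, y2] box."""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2

def filter_samples(samples):
    """Keep samples whose re-predicted click point hits the ground-truth element box."""
    return [s for s in samples
            if point_in_bbox(s["repredicted_point"], s["gt_bbox"])]

samples = [
    {"repredicted_point": [120, 48], "gt_bbox": [100, 30, 160, 60]},  # inside -> kept
    {"repredicted_point": [300, 48], "gt_bbox": [100, 30, 160, 60]},  # outside -> dropped
]
print(len(filter_samples(samples)))  # 1
```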
- Release training code (SFT + RL, supporting both Qwen2.5-VL and Qwen3-VL models)
- Release evaluation code (WebArena-Lite-v2, Online-Mind2Web)
- Release GUI-Libra-81K dataset
- Release model checkpoints (GUI-Libra-3B/4B/7B/8B)
- Release AndroidWorld evaluation code
- Release offline evaluation code (MM-Mind2Web, AndroidControl)
GUI-Libra follows a two-stage post-training pipeline:
Base VLM ──► Action-Aware SFT (ASFT) ──► Conservative RL (GRPO) ──► GUI-Libra
See SFT/README.md for full training instructions.
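Conceptually, the token reweighting in Action-Aware SFT can be sketched as a per-token weight on the cross-entropy loss that upweights action tokens (the `<answer>` span, including coordinates) relative to reasoning tokens, so grounding supervision is not diluted by long chains of thought. The weighting scheme below is an illustrative guess, not the released implementation; `action_weight` and the mask construction are our assumptions:

```python
import torch
import torch.nn.functional as F

def weighted_ce_loss(logits, labels, action_mask, action_weight=2.0):
    """Per-token cross-entropy where tokens in the action span get a higher weight.

    logits:      (B, T, V) model outputs
    labels:      (B, T)    target token ids, -100 for ignored positions
    action_mask: (B, T)    1.0 for action tokens, 0.0 for reasoning tokens
    """
    per_token = F.cross_entropy(
        logits.transpose(1, 2), labels, ignore_index=-100, reduction="none"
    )  # (B, T)
    weights = 1.0 + (action_weight - 1.0) * action_mask  # reasoning=1, action=action_weight
    valid = (labels != -100).float()
    return (per_token * weights * valid).sum() / (weights * valid).sum()
```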
See EasyR1/README.md for full RL training instructions.
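The "success-adaptive scaling" idea in Conservative RL can be sketched as: compute standard GRPO group-normalized advantages, then damp negative advantages when the group's success rate is low, since failure signals under partially verifiable GUI rewards are noisy. The specific scaling rule below is a hypothetical illustration of that intuition, not the paper's exact formula:

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages as in GRPO: (r - mean) / std within a rollout group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def success_adaptive_advantages(rewards, eps=1e-6):
    """Shrink negative advantages in proportion to the group's success rate."""
    adv = grpo_advantages(rewards, eps)
    success_rate = float(np.mean(np.asarray(rewards) > 0))
    scale = np.where(adv < 0, success_rate, 1.0)  # low success -> damp noisy negatives
    return adv * scale
```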
GUI-Libra/
├── SFT/ # Supervised fine-tuning
│ ├── train.py # Main training script
│ ├── src/aguvis/ # Dataset, trainer, constants
│ ├── scripts/ # Training shell scripts
│ │ ├── train_qwen2_5.sh # Qwen2.5-VL (3B/7B)
│ │ └── train_qwen3.sh # Qwen3-VL (4B/8B)
│ ├── data/ # Data config YAMLs
│ └── README.md # SFT documentation
│
├── EasyR1/ # Reinforcement learning (based on EasyR1/veRL)
│ ├── verl/ # RL training framework
│ ├── examples/
│ │ ├── gui_grpo.sh # Qwen2.5-VL GRPO training
│ │ ├── gui_grpo_qwen3.sh # Qwen3-VL GRPO training
│ │ ├── reward_function/ # GUI reward functions
│ │ └── README.md # RL documentation
│ └── README.md # EasyR1 framework docs
│
├── evaluation/ # Evaluation benchmarks
│ ├── WebArenaLiteV2/ # WebArena-Lite-v2 evaluation
│ ├── online-mind2web-eval/ # Online-Mind2Web evaluation
│ ├── android_world_seeact_v/ # AndroidWorld evaluation
│ └── offline_evaluation/ # Offline (MM-Mind2Web, AndroidControl)
│
└── images/ # Project assets
git clone https://github.com/GUI-Libra/GUI-Libra.git
cd GUI-Libra

# SFT training
cd SFT
bash setup.sh # install dependencies
export DATA_ROOT=/path/to/your/datasets # set data root
bash scripts/train_qwen2_5.sh # train Qwen2.5-VL
# or
bash scripts/train_qwen3.sh              # train Qwen3-VL

# RL training
cd EasyR1
pip install -e .
# Edit examples/gui_grpo.sh to set MODEL_PATH, TRAIN_FILES, VAL_FILES
bash examples/gui_grpo.sh # Qwen2.5-VL GRPO
# or
bash examples/gui_grpo_qwen3.sh          # Qwen3-VL GRPO

# WebArena-Lite-v2 evaluation
cd evaluation/WebArenaLiteV2
bash setup_env.sh # set up Docker environments
python launcher/start.py # start web environments
# Serve model via vLLM, then:
python agent_run.py \
--platform web \
--env_config_path config/env/web.yaml \
--agent_config_path config/agent/<agent_config>.yaml \
--task_config_path tasks/ \
--num_workers 8 \
--max_steps 15

# Online-Mind2Web evaluation
cd evaluation/online-mind2web-eval
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -e .
# Serve model via vLLM, then:
python run.py \
--tasks_path configs/mind2web.300.jsonl \
--gpt.model <model_name> \
--gpt.openai_api_base http://localhost:20001/v1 \
--num_processes 4

# AndroidWorld evaluation
cd evaluation/android_world_seeact_v
# Launch 15 Android emulators via Docker
docker compose up -d
# Serve model via vLLM, then edit run.sh and run:
bash run.sh

See evaluation/android_world_seeact_v/README.md for detailed setup instructions.
cd evaluation/offline_evaluation
# See README for data download and plan_gen_guilibra.sh usage

See evaluation/offline_evaluation/README.md for data setup, planning scripts, and evaluation pipelines.
Each training sample follows a unified structured format:
Input: system prompt + user instruction + interaction history + screenshot
Output:
<think>
Reasoning about the current UI state, reflecting on progress,
and planning the next action...
</think>
<answer>
{
"action_description": "brief description of the action",
"action_type": "Click",
"value": "None",
"point_2d": [x, y]
}
</answer>
Note
We use <thinking></thinking> for Qwen3-based models instead of <think></think>.
Supported action types: Click, Write, Terminate, Swipe, Scroll, NavigateHome, Answer, Wait, OpenAPP, NavigateBack, KeyboardPress, LongPress, Select.
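A minimal parser for this output format might look like the following. The tag names and JSON fields match the format above (with `think_tag="thinking"` covering Qwen3-based models); the helper name is ours:

```python
import json
import re

def parse_agent_output(text, think_tag="think"):
    """Extract the reasoning span and the executable action from a model response.

    Pass think_tag="thinking" for Qwen3-based models, which use <thinking> tags.
    """
    think = re.search(rf"<{think_tag}>(.*?)</{think_tag}>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if answer is None:
        raise ValueError("no <answer> block found")
    action = json.loads(answer.group(1))
    return (think.group(1).strip() if think else ""), action

response = """<think>
The search field is filled; I should click the submit button at (512, 88).
</think>
<answer>
{
  "action_description": "click the submit button",
  "action_type": "Click",
  "value": "None",
  "point_2d": [512, 88]
}
</answer>"""

reasoning, action = parse_agent_output(response)
print(action["action_type"], action["point_2d"])  # Click [512, 88]
```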
@misc{yang2026guilibratrainingnativegui,
title={GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL},
  author={Rui Yang and Qianhui Wu and Zhaoyang Wang and Hanyang Chen and Ke Yang and Hao Cheng and Huaxiu Yao and Baolin Peng and Huan Zhang and Jianfeng Gao and Tong Zhang},
year={2026},
eprint={2602.22190},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2602.22190},
}

This project builds upon the following excellent work:
- EasyR1 — RL training framework
- AGUVIS — GUI agent framework and data
- ScaleCUA — WebArena-Lite-v2 evaluation
- WebArena — Web environment
- Online-Mind2Web — Online evaluation benchmark
- UGround — Evaluation on MM-Mind2Web, AndroidControl, and AndroidWorld
This project is released under the MIT License.
