
GUI-Libra Logo

GUI-Libra: Training Native GUI Agents to Reason and Act
with Action-aware Supervision and Partially Verifiable RL

Project Page · Paper · Code · Models & Datasets

Rui Yang1, Qianhui Wu2, Zhaoyang Wang3, Hanyang Chen1, Ke Yang1, Hao Cheng2,
Huaxiu Yao3, Baolin Peng2, Huan Zhang1, Jianfeng Gao2, Tong Zhang1
1UIUC   2Microsoft   3UNC-Chapel Hill


Overview

GUI-Libra is a post-training framework that turns open-source VLMs into strong native GUI agents — models that see a screenshot, think step-by-step, and output an executable action, all within a single forward pass.

We find that naively adding chain-of-thought (CoT) to GUI agents hurts grounding accuracy, and that standard RLVR-style training cannot achieve stable offline-to-online performance because GUI rewards are only partially verifiable. GUI-Libra solves both:

  • GUI-Libra-81K: an 81K-step reasoning dataset with action re-prediction filtering and bounding-box coordinate verification
  • Action-Aware SFT: mixes reasoning and direct-action data, reweighting tokens so the model doesn't forget where to click while learning why to click
  • Conservative RL: KL-regularized GRPO that stays stable under ambiguous rewards, with success-adaptive scaling to tame noisy negative gradients

The result: GUI-Libra-4B/8B match or outperform GPT-4o, GPT-4.1, GPT-5-mini, and 32B/72B native models on AndroidWorld, WebArena-Lite-v2, and Online-Mind2Web, without any online data collection.

To-Do List

  • Release training code (SFT + RL, supporting both Qwen2.5-VL and Qwen3-VL models)
  • Release evaluation code (WebArena-Lite-v2, Online-Mind2Web)
  • Release GUI-Libra-81K dataset
  • Release model checkpoints (GUI-Libra-3B/4B/7B/8B)
  • AndroidWorld evaluation code
  • Offline evaluation code (MM-Mind2Web, AndroidControl)

Training Pipeline

GUI-Libra follows a two-stage post-training pipeline:

Base VLM ──► Action-Aware SFT (ASFT) ──► Conservative RL (GRPO) ──► GUI-Libra

Stage 1: Action-Aware Supervised Fine-Tuning

See SFT/README.md for full training instructions.
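For intuition, here is a minimal sketch of action-aware token reweighting: tokens belonging to the grounded action (e.g. the output coordinates) are upweighted relative to reasoning tokens when averaging the per-token loss. The tag names and the 2.0 weight below are illustrative assumptions, not the repository's exact scheme; see SFT/README.md and SFT/src/aguvis/ for the actual implementation.

```python
def asft_token_loss(token_nlls, token_tags, action_weight=2.0):
    """Weighted average of per-token negative log-likelihoods.

    token_nlls: per-token NLL values from the model.
    token_tags: "action" for tokens in the executable action (e.g. the
                coordinate fields), anything else for reasoning tokens.
    action_weight: how much more the action tokens count (assumed value).
    """
    weights = [action_weight if tag == "action" else 1.0 for tag in token_tags]
    total = sum(w * nll for w, nll in zip(weights, token_nlls))
    return total / sum(weights)
```

The key property is that reasoning tokens cannot dominate the gradient by sheer count: a long chain-of-thought still contributes, but the few action tokens keep a fixed share of the loss.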

Stage 2: Reinforcement Learning with Partially Verifiable Rewards

See EasyR1/README.md for full RL training instructions.
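For intuition only, here is a minimal sketch of a group-relative advantage computation in the spirit of conservative GRPO: rewards are normalized within each rollout group, and negative advantages are scaled by the group's success rate so that failure signals from a partially verifiable reward, which may be false negatives, are dampened. The scaling rule shown is an illustrative assumption, not the paper's exact formula; see examples/reward_function/ and the EasyR1 framework for the real thing.

```python
import math

def grpo_advantages(rewards, success_scale=True):
    """Group-relative advantages for one rollout group.

    rewards: scalar rewards for each rollout of the same task.
    success_scale: if True, multiply negative advantages by the group
        success rate, so penalties shrink when almost nothing succeeded
        (an assumed form of "success-adaptive scaling").
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n) or 1.0
    adv = [(r - mean) / std for r in rewards]
    if success_scale:
        p = sum(1 for r in rewards if r > 0) / n  # group success rate
        adv = [a if a >= 0 else a * p for a in adv]
    return adv
```

With two successes out of four rollouts, `grpo_advantages([1, 0, 0, 1])` halves the negative advantages while leaving the positive ones intact, which is the "conservative" behavior the component table describes.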

Project Structure

GUI-Libra/
├── SFT/                          # Supervised fine-tuning
│   ├── train.py                  # Main training script
│   ├── src/aguvis/               # Dataset, trainer, constants
│   ├── scripts/                  # Training shell scripts
│   │   ├── train_qwen2_5.sh      # Qwen2.5-VL (3B/7B)
│   │   └── train_qwen3.sh        # Qwen3-VL (4B/8B)
│   ├── data/                     # Data config YAMLs
│   └── README.md                 # SFT documentation
│
├── EasyR1/                       # Reinforcement learning (based on EasyR1/veRL)
│   ├── verl/                     # RL training framework
│   ├── examples/
│   │   ├── gui_grpo.sh           # Qwen2.5-VL GRPO training
│   │   ├── gui_grpo_qwen3.sh     # Qwen3-VL GRPO training
│   │   ├── reward_function/      # GUI reward functions
│   │   └── README.md             # RL documentation
│   └── README.md                 # EasyR1 framework docs
│
├── evaluation/                   # Evaluation benchmarks
│   ├── WebArenaLiteV2/           # WebArena-Lite-v2 evaluation
│   ├── online-mind2web-eval/     # Online-Mind2Web evaluation
│   ├── android_world_seeact_v/   # AndroidWorld evaluation
│   └── offline_evaluation/       # Offline (MM-Mind2Web, AndroidControl)
│
└── images/                       # Project assets

Getting Started

1. Clone the Repository

git clone https://github.com/GUI-Libra/GUI-Libra.git
cd GUI-Libra

2. SFT Training

cd SFT
bash setup.sh                                    # install dependencies
export DATA_ROOT=/path/to/your/datasets           # set data root
bash scripts/train_qwen2_5.sh                     # train Qwen2.5-VL
# or
bash scripts/train_qwen3.sh                       # train Qwen3-VL

3. RL Training

cd EasyR1
pip install -e .
# Edit examples/gui_grpo.sh to set MODEL_PATH, TRAIN_FILES, VAL_FILES
bash examples/gui_grpo.sh                         # Qwen2.5-VL GRPO
# or
bash examples/gui_grpo_qwen3.sh                   # Qwen3-VL GRPO

4. Evaluation

WebArena-Lite-v2

cd evaluation/WebArenaLiteV2
bash setup_env.sh                                 # set up Docker environments
python launcher/start.py                          # start web environments
# Serve model via vLLM, then:
python agent_run.py \
    --platform web \
    --env_config_path config/env/web.yaml \
    --agent_config_path config/agent/<agent_config>.yaml \
    --task_config_path tasks/ \
    --num_workers 8 \
    --max_steps 15

Online-Mind2Web

cd evaluation/online-mind2web-eval
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -e .
# Serve model via vLLM, then:
python run.py \
    --tasks_path configs/mind2web.300.jsonl \
    --gpt.model <model_name> \
    --gpt.openai_api_base http://localhost:20001/v1 \
    --num_processes 4

AndroidWorld

cd evaluation/android_world_seeact_v
# Launch 15 Android emulators via Docker
docker compose up -d
# Serve model via vLLM, then edit run.sh and run:
bash run.sh

See evaluation/android_world_seeact_v/README.md for detailed setup instructions.

Offline evaluation (MM-Mind2Web, AndroidControl)

cd evaluation/offline_evaluation
# See README for data download and plan_gen_guilibra.sh usage

See evaluation/offline_evaluation/README.md for data setup, planning scripts, and evaluation pipelines.

Data Format

Each training sample follows a unified structured format:

Input: system prompt + user instruction + interaction history + screenshot

Output:

<think>
Reasoning about the current UI state, reflecting on progress,
and planning the next action...
</think>
<answer>
{
  "action_description": "brief description of the action",
  "action_type": "Click",
  "value": "None",
  "point_2d": [x, y]
}
</answer>

Note

We use <thinking></thinking> for Qwen3-based models instead of <think></think>.

Supported action types: Click, Write, Terminate, Swipe, Scroll, NavigateHome, Answer, Wait, OpenAPP, NavigateBack, KeyboardPress, LongPress, Select.
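As a sketch of how a response in this format might be consumed, the helper below splits a completion into its reasoning and a validated action dict, and accepts the Qwen3-style <thinking> tag via a parameter. The function name and regex-based parsing are illustrative, not part of the repository's API.

```python
import json
import re

# Action types listed above; anything else is rejected.
SUPPORTED_ACTIONS = {
    "Click", "Write", "Terminate", "Swipe", "Scroll", "NavigateHome",
    "Answer", "Wait", "OpenAPP", "NavigateBack", "KeyboardPress",
    "LongPress", "Select",
}

def parse_agent_output(text, think_tag="think"):
    """Return (reasoning, action) from a model completion.

    think_tag: "think" for Qwen2.5-based models, "thinking" for Qwen3-based
    models, matching the note above.
    """
    think = re.search(rf"<{think_tag}>(.*?)</{think_tag}>", text, re.S)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.S)
    if answer is None:
        raise ValueError("no <answer> block found")
    action = json.loads(answer.group(1))
    if action["action_type"] not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action['action_type']}")
    return (think.group(1).strip() if think else ""), action
```

For example, a completion containing `<think>Locate the button.</think>` followed by an `<answer>` block with `"action_type": "Click"` parses into the reasoning string and a dict whose `point_2d` can be executed directly.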

Citation

@misc{yang2026guilibratrainingnativegui,
      title={GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL}, 
      author={Rui Yang and Qianhui Wu and Zhaoyang Wang and Hanyang Chen and Ke Yang and Hao Cheng and Huaxiu Yao and Baolin Peng and Huan Zhang and Jianfeng Gao and Tong Zhang},
      year={2026},
      eprint={2602.22190},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.22190}, 
}

Acknowledgements

This project builds upon the following excellent work:

  • EasyR1 — RL training framework
  • AGUVIS — GUI agent framework and data
  • ScaleCUA — WebArena-Lite-v2 evaluation
  • WebArena — Web environment
  • Online-Mind2Web — Online evaluation benchmark
  • UGround — Evaluation on MM-Mind2Web, AndroidControl, and AndroidWorld

License

This project is released under the MIT License.

About

Official code for paper "GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL"
