GUI-Rise is an agent designed for GUI navigation with enhanced reasoning capabilities. It employs a three-stage sub-task framework that mimics the human "think-act-summarize" decision-making process, so that the agent can base each decision on sufficient historical information.
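As a rough illustration of the "think-act-summarize" loop described above, the sketch below threads a running summary through successive steps. It is not the GUI-Rise implementation: `run_episode`, `StepResult`, and the three callables `think`, `act`, and `summarize` are hypothetical names introduced only for this example, and in the real agent each stage would be produced by the model rather than by plain Python functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    thought: str   # reasoning produced in the "think" stage
    action: str    # action chosen in the "act" stage
    summary: str   # updated history after the "summarize" stage


def run_episode(
    think: Callable[[str, str, str], str],
    act: Callable[[str], str],
    summarize: Callable[[str, str, str], str],
    task: str,
    observations: List[str],
) -> List[StepResult]:
    """Run one think-act-summarize cycle per observation (hypothetical sketch)."""
    history = ""  # running summary carried from one step to the next
    results: List[StepResult] = []
    for obs in observations:
        thought = think(task, obs, history)             # reason over task, screen, history
        action = act(thought)                           # pick an action from the reasoning
        history = summarize(history, thought, action)   # compress the step into the history
        results.append(StepResult(thought, action, history))
    return results
```

The point of the design is that each step sees only the current observation plus a compact summary, instead of the full list of earlier screens and actions.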
This project has two separate environments for SFT/Evaluation and RL.
1. SFT (Supervised Fine-Tuning) & Eval Environment
The environment for SFT and Evaluation is self-contained. Please navigate to the sft directory to set it up:
# Enter the sft directory
cd sft
# Create a conda environment
conda create -n gui-rise-sft python=3.11
conda activate gui-rise-sft
# Install dependencies
pip install -r requirements.txt
2. Reinforcement Learning (RL) Environment
For Reinforcement Learning, we utilize the verl framework. Please navigate to the rl directory and follow the instructions in its dedicated README.md to create the environment.
# Enter the rl directory
cd rl
# Follow the setup instructions in rl/README.md
- GUIAct: Download from Hugging Face and use our prepare/hf_guiact.ipynb to create metadata for each split (i.e., web, mobile).
- Other Datasets: Set up Mind2Web, AITW, and MiniWoB by following SeeClick's instructions. Then, use our provided scripts (prepare/hf_mind2web.py, prepare/hf_aitw.py, prepare/hf_miniwob.py) to process them and generate the metadata.
After completing these steps, your dataset directory should be organized as follows:
$_DATA_DIR/
├── GUI_Course/
│   └── GUIAct/
│       ├── images/
│       └── metadata/
├── Mind2Web/
│   ├── images/
│   └── metadata/
├── AITW/
│   ├── images/
│   └── metadata/
└── MiniWob/
    ├── images/
    └── metadata/
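Before starting training, it can help to confirm that the directory tree above is in place. The following is a minimal sketch, not part of the repository: the function name `missing_dataset_dirs` and the two constants are assumptions made for this example, and the folder names are copied from the layout shown above.

```python
from pathlib import Path
from typing import List

# Dataset folders relative to the data root, taken from the layout above.
EXPECTED_DATASETS = [
    "GUI_Course/GUIAct",
    "Mind2Web",
    "AITW",
    "MiniWob",
]
# Each dataset folder is expected to contain these two subdirectories.
EXPECTED_SUBDIRS = ["images", "metadata"]


def missing_dataset_dirs(data_dir: str) -> List[str]:
    """Return the expected subdirectories (relative paths) that do not exist."""
    root = Path(data_dir)
    missing: List[str] = []
    for dataset in EXPECTED_DATASETS:
        for sub in EXPECTED_SUBDIRS:
            rel = f"{dataset}/{sub}"
            if not (root / rel).is_dir():
                missing.append(rel)
    return missing
```

Calling `missing_dataset_dirs` with the path you use as the data root returns an empty list when every `images/` and `metadata/` folder is present, and otherwise lists what still needs to be prepared.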
The SFT stage teaches the model basic reasoning and history summarization skills. For detailed instructions on data generation and the training process, please refer to the dedicated guide.
In the RL stage, we use Group Relative Policy Optimization (GRPO) to further optimize the model. For detailed instructions on data generation and the training process, please refer to the dedicated guide.
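Separately from that guide, the core idea behind GRPO's "group relative" signal can be sketched in a few lines: several responses are sampled for the same prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group. The snippet below is an illustrative sketch only. It is not the verl implementation, the function name `group_relative_advantages` and the `eps` parameter are assumptions for this example, the population standard deviation is a choice made here for simplicity, and the reward functions used by GUI-Rise are not shown.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own sampling group (illustrative sketch)."""
    mu = mean(rewards)       # group mean reward
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every response receives the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Responses that score above their group average receive a positive advantage and those below receive a negative one, so no separate value network is needed to provide a baseline.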
The evaluation stage tests the model's performance on various benchmark test sets.
- Start Evaluation:
# Activate the SFT/Eval environment
conda activate gui-rise-sft
# Navigate to the sft directory and run the script
cd sft
bash scripts/eval.sh
We would like to express our sincere gratitude to the contributors of the open-source projects and datasets used in this project, especially:
- ShowUI for their foundational work.
- verl for the reinforcement learning framework.
- SeeClick for their clear dataset instructions.
If you use GUI-Rise in your research, please cite our paper:
@article{liu2025gui,
title={GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation},
author={Liu, Tao and Wang, Chongyu and Li, Rongjie and Yu, Yingchen and He, Xuming and Song, Bai},
journal={arXiv preprint arXiv:2510.27210},
year={2025}
}