
[NeurIPS 2025] GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

📄 Read the Paper on arXiv


GUI-Rise is a GUI navigation agent with enhanced reasoning capabilities. It employs a three-stage sub-task framework that mirrors the human "think-act-summarize" decision-making process, so that at each step the agent makes well-informed decisions grounded in a compact summary of its interaction history.
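As a rough illustration of this loop (a hypothetical sketch only; the object and method names below are illustrative, not this repository's actual API):

# Hypothetical sketch of the think-act-summarize loop; "env", "agent" and
# their methods are illustrative names, not functions from this repository.
def navigate(task, env, agent, max_steps=15):
    summary = ""                                  # compact running summary instead of raw history
    for _ in range(max_steps):
        screenshot = env.observe()                # perceive the current GUI state
        reasoning, action = agent.think(task, screenshot, summary)   # structured reasoning -> action
        env.act(action)                           # execute the action on the GUI
        summary = agent.summarize(summary, reasoning, action)        # refresh the history summary
        if action == "STOP":                      # the agent decides the task is complete
            break
    return summary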


🚀 Quick Start

Environment Setup

This project uses two separate environments: one for SFT and Evaluation, and one for RL.

1. SFT (Supervised Fine-Tuning) & Eval Environment

The environment for SFT and Evaluation is self-contained. Please navigate to the sft directory to set it up:

# Enter the sft directory
cd sft

# Create a conda environment
conda create -n gui-rise-sft python=3.11
conda activate gui-rise-sft

# Install dependencies
pip install -r requirements.txt
2. Reinforcement Learning (RL) Environment

For Reinforcement Learning, we utilize the verl framework. Please navigate to the rl directory and follow the instructions in its dedicated README.md to create the environment.

# Enter the rl directory
cd rl

# Follow the setup instructions in rl/README.md

💾 Data Preparation

Navigation Datasets

  1. GUIAct: Download from Hugging Face (see the sketch after this list) and use our prepare/hf_guiact.ipynb to create metadata for each split (i.e., web, mobile).
  2. Other Datasets: Set up Mind2Web, AITW, and MiniWoB by following SeeClick's Instructions. Then, use our provided scripts (prepare/hf_mind2web.py, prepare/hf_aitw.py, prepare/hf_miniwob.py) to process them and generate the metadata.
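For step 1, the download can be scripted with huggingface_hub (a minimal sketch; the dataset repo id below is a placeholder, so substitute the actual GUIAct repository, and _DATA_DIR is assumed to be set in your environment):

# Hypothetical download sketch; "ORG/GUIAct" is a placeholder repo id, not the
# verified dataset name -- check the GUIAct page on Hugging Face.
import os
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="ORG/GUIAct",    # placeholder: replace with the actual GUIAct dataset repo
    repo_type="dataset",
    local_dir=os.path.join(os.environ["_DATA_DIR"], "GUI_Course", "GUIAct"),
)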

Directory Structure

After completing these steps, your dataset directory should be organized as follows:

$_DATA_DIR/
    ├── GUI_Course/
    │   └── GUIAct/
    │       ├── images/
    │       └── metadata/
    ├── Mind2Web/
    │   ├── images/
    │   └── metadata/
    ├── AITW/
    │   ├── images/
    │   └── metadata/
    └── MiniWob/
        ├── images/
        └── metadata/
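A quick sanity check for this layout (a minimal sketch, assuming the _DATA_DIR environment variable points at your dataset root):

# Verify that each dataset has the expected images/ and metadata/ folders.
# Assumes _DATA_DIR is set in the environment, matching the tree above.
import os

DATA_DIR = os.environ["_DATA_DIR"]
EXPECTED = ["GUI_Course/GUIAct", "Mind2Web", "AITW", "MiniWob"]

for dataset in EXPECTED:
    for sub in ("images", "metadata"):
        path = os.path.join(DATA_DIR, dataset, sub)
        print(("ok     " if os.path.isdir(path) else "MISSING"), path)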

🏋️ Training

1. Supervised Fine-Tuning (SFT)

The SFT stage teaches the model basic reasoning and history summarization skills. For detailed instructions on data generation and the training process, please refer to the dedicated guide:

➡️ SFT Training README

2. Reinforcement Learning (RL)

In the RL stage, we use Group Relative Policy Optimization (GRPO) to further optimize the model. For detailed instructions on data generation and the training process, please refer to the dedicated guide:

➡️ RL Training README
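GRPO scores each sampled rollout against the other rollouts drawn from the same prompt; a minimal sketch of this group-relative advantage (illustrative only, not the verl implementation used in this repo):

# Group-relative advantage as used in GRPO: each rollout's reward is normalized
# by the mean and std of its own sampling group. Illustrative sketch only.
import numpy as np

def grpo_advantages(group_rewards, eps=1e-6):
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: four rollouts for one navigation step, rewarded 1 for a correct action.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))   # correct rollouts get positive advantage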


🧪 Evaluation

Evaluation measures the trained model's performance on the test splits of the benchmarks prepared above.

  • Start Evaluation:
    # Activate the SFT/Eval environment
    conda activate gui-rise-sft
    
    # Navigate to the sft directory and run the script
    cd sft
    bash scripts/eval.sh
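Metrics and output formats are benchmark-specific; as a purely hypothetical illustration (the predictions.jsonl path and the "episode_id"/"correct" fields below are made up, not the repository's actual output), step- and episode-level success rates could be aggregated like this:

# Hypothetical aggregation of per-step evaluation records; the field names and
# the file path are illustrative, not this repo's actual output format.
import json
from collections import defaultdict

episodes = defaultdict(list)
with open("predictions.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        episodes[rec["episode_id"]].append(bool(rec["correct"]))

steps = [ok for ep in episodes.values() for ok in ep]
print(f"step success rate:    {sum(steps) / len(steps):.3f}")
print(f"episode success rate: {sum(all(ep) for ep in episodes.values()) / len(episodes):.3f}")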

🙏 Acknowledgement

We would like to express our sincere gratitude to the contributors of the open-source projects and datasets used in this project, especially:

  • ShowUI for their foundational work.
  • verl for the reinforcement learning framework.
  • SeeClick for their clear dataset instructions.

✒️ Citation

If you use GUI-Rise in your research, please cite our paper:

@article{liu2025gui,
  title={GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation},
  author={Liu, Tao and Wang, Chongyu and Li, Rongjie and Yu, Yingchen and He, Xuming and Song, Bai},
  journal={arXiv preprint arXiv:2510.27210},
  year={2025}
}
