numbmelon/OS-Oracle

OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models

Overview

OS-Oracle is a comprehensive framework designed for developing cross-platform GUI critic models that span mobile, desktop, and web environments. The framework integrates three key components — data synthesis, model training, and evaluation — to enable consistent and scalable critic model development across diverse GUI platforms.

To facilitate systematic evaluation, we introduce OS-Critic Bench, a unified benchmark for assessing GUI critic models across all three platforms. Models trained under the OS-Oracle framework demonstrate strong generalization and reasoning ability, with OS-Oracle-7B achieving state-of-the-art performance among open-source VLMs on OS-Critic Bench.

[Figure: OS-Oracle overview]

📝TODO List

  • Release data synthesis pipeline
  • Release training datasets
  • Release model checkpoints

OS-Critic Bench

Follow the steps below to use OS-Critic Bench.

1. Download the Benchmark

Clone the dataset from Hugging Face and rename it:

cd os-critic-bench
git clone https://huggingface.co/datasets/OS-Copilot/OS-Critic-Bench
mv OS-Critic-Bench test_jsonl
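Before running inference, it can help to confirm that the download produced readable files. The sketch below is not part of the repository: the helper name `count_records` is hypothetical, and it assumes the benchmark is stored as `.jsonl` files (one JSON object per line) somewhere under `test_jsonl`, which the directory name suggests but this README does not confirm.

```python
import json
from pathlib import Path


def count_records(root):
    """Return {relative_path: record_count} for every .jsonl file under root.

    Hypothetical helper for a sanity check; assumes one JSON object per line.
    """
    counts = {}
    for path in sorted(Path(root).rglob("*.jsonl")):
        n = 0
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    json.loads(line)  # raises ValueError on a malformed line
                    n += 1
        counts[str(path.relative_to(root))] = n
    return counts
```

Calling `count_records("test_jsonl")` from inside `os-critic-bench` returns a dictionary that maps each file to its number of records. An empty dictionary would suggest that the clone or the rename did not land where expected, or that the files use a different extension.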

2. Run the Inference Script

Execute the following command to run inference across all three platforms (Mobile, Desktop, and Web).

Before running the evaluation, make sure that all dependencies for the target model are properly installed and that the script has been correctly configured.

bash run_eval.sh

3. Get the Results

After inference completes, compute the final metrics:

python cal_acc.py --jsonl <your_output_file_path>
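For readers who want to check a result file by hand, the following is a minimal sketch of an accuracy computation over a JSONL file. It is not the logic of `cal_acc.py`, which is not shown in this README. The function name `accuracy_from_jsonl` and the field names `prediction` and `label` are assumptions; adjust them to match the keys that your output file actually contains.

```python
import json


def accuracy_from_jsonl(path, pred_key="prediction", label_key="label"):
    """Fraction of records whose prediction equals the label.

    Both key names are assumed for illustration, not taken from the repository.
    """
    correct = 0
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            total += 1
            if record[pred_key] == record[label_key]:
                correct += 1
    # Return 0.0 for an empty file instead of dividing by zero
    return correct / total if total else 0.0
```

This treats every record equally and reports a single number. If the official script reports per-platform metrics, the same loop could be grouped by a platform field, again assuming such a field exists in the output.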

Citation

If you find this repository helpful, feel free to cite our paper:

@article{wu2025osoracle,
        title={OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models},
        author={Zhenyu Wu and Jingjing Xie and Zehao Li and Bowen Yang and Qiushi Sun and Zhaoyang Liu and Zhoumianze Liu and Yu Qiao and Xiangyu Yue and Zun Wang and Zichen Ding},
        journal={arXiv preprint arXiv:2512.16295},
        year={2025}
      }
