🏠 Homepage • 📄 Paper • 🤗 Data • 🛠️ Code
To use SyncMind and SyncBench for agent out-of-sync recovery:

```bash
git clone https://github.com/xhguo7/SyncMind.git
```
Setup environment for SyncMind:

- We use OpenHands to implement interactive codebase environments for agent out-of-sync recovery.
- Miniconda env setup (refer to Development.md for further details):

  ```bash
  # Download and install Mamba (a faster drop-in replacement for conda)
  curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
  bash Miniforge3-$(uname)-$(uname -m).sh

  # Install Python 3.12, Node.js, and Poetry
  mamba install python=3.12
  mamba install conda-forge::nodejs
  mamba install conda-forge::poetry
  ```
Our experiments in the paper are conducted on OpenHands.

- You can use SyncMind directly on OpenHands 0.10.0:
  - Quick use: use the entire framework bundled with this repo:

    ```bash
    cd SyncMind/syncmind/framework/OpenHands
    ```

  - Or: clone OpenHands 0.10.0 to your desired local path:

    ```bash
    git clone https://github.com/xhguo7/OpenHands10.git
    cp -rp SyncMind/syncmind/framework/syncmind OpenHands10/evaluation/
    ```

- If running SyncMind on the latest OpenHands:
  - You may need to modify several relative import paths.
  - We will do our best to maintain a synchronized version of SyncMind that is compatible with the latest OpenHands.
    - Our latest version syncs with OpenHands 0.27.0.
    - Check our recent updates in SyncMind.md.
  - Updated versions of SyncMind are saved to the following directory:

    ```bash
    cd SyncMind/syncmind/updates
    ```
Quick Install:

```bash
conda env create -f environment.yml
conda activate syncmind
```

Env Setup:

```bash
cd your_desired_root_dir
git clone https://github.com/xhguo7/SyncMind.git
cd SyncMind
python -m pip install -e .
```
Prepare data:

- SyncBench
  - In our current version, SyncBench is built upon 21 popular GitHub repositories.
  - For computational efficiency, we also downsampled an evaluation subset comprising 300 instances for agent out-of-sync evaluation.
- Load SyncBench:

  ```python
  from datasets import load_dataset

  dataset = load_dataset("xuehang/SyncBench")
  ```
Run SyncMind:

```bash
cd SyncMind/syncmind/framework/OpenHands
bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh [llm configuration] [git version] [agent] [evaluation limit] [out-of-sync recovery method] [if using remote run] [max-turn limit] [num-workers] [evaluation data path] [resource-budget] [resource-coding cost] [resource-asking cost]
```

For example, to run SyncMind with GPT-4o as the agent tackling out-of-sync:

- [llm configuration]: `llm.gpt_4o`
- [git version]: `HEAD`
- [agent]: `CodeActAgent`
- [evaluation limit]: `10`
- [out-of-sync recovery method]: `independent`
- [if using remote run]: `false`
- [max-turn limit]: `30`
- [num-workers]: `1`
- [evaluation data path]: set this field only if you have downloaded SyncBench locally

If loading SyncBench directly from Hugging Face, skip [evaluation data path]:

```bash
bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 30 1
```

Or, if you have already downloaded SyncBench locally, run SyncMind on the local dataset, e.g. `./data/callee_11_whisper_instance.csv`:

```bash
bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 30 1 ./data/callee_11_whisper_instance.csv
```

Resource-aware agent out-of-sync recovery uses the following defaults:

- [max-turn limit]: `30`
- [resource-budget]: `1000` (default)
- [resource-coding cost]: `100` (default)
- [resource-asking cost]: `100` (default)

Continuing our example, if you would like to define a different resource setting:

- [max-turn limit]: `20`
- [resource-budget]: `3000`
- [resource-coding cost]: `50`
- [resource-asking cost]: `200`

If loading SyncBench directly from Hugging Face, skip [evaluation data path]:

```bash
bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 20 1 3000 50 200
```

Or, if you have already downloaded SyncBench locally, e.g. `./data/callee_11_whisper_instance.csv`:

```bash
bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 20 1 3000 50 200 ./data/callee_11_whisper_instance.csv
```
Evaluation:

```bash
cd ./SyncMind/syncmind/framework/OpenHands
bash ./evaluation/benchmarks/syncmind/scripts/run_eval.sh [path to eval data]
```

The evaluation result will be saved to the same directory as your eval data, with the file name `eval_summary_{timestamp}.json`.
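As a quick sanity check, you can load the latest summary with Python's standard `json` module. This is a minimal sketch: the output directory below is a placeholder for wherever your eval data lives, and the keys inside the summary depend on the SyncMind version:

```python
import glob
import json
import os

# Placeholder: point this at the directory containing your eval data
eval_dir = "./evaluation_output"

# Pick the most recent eval summary by the timestamp in the file name
summaries = sorted(glob.glob(os.path.join(eval_dir, "eval_summary_*.json")))
with open(summaries[-1]) as f:
    summary = json.load(f)

# Top-level keys vary by SyncMind version, so just inspect what is there
print(list(summary.keys()))
```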
(1) Load SyncBench

```python
from datasets import load_dataset

dataset = load_dataset("xuehang/SyncBench")
```

You can now access all SyncBench datasets:

- Evaluation dataset consisting of 300 instances: `dataset['syncbench_300']`
  - Callee: `dataset['syncbench_300_callee']`
  - Caller: `dataset['syncbench_300_caller']`
- SyncBench consisting of 24,332 instances: `dataset['syncbench_24k']`
  - Callee: `dataset['syncbench_24k_callee']`
  - Caller: `dataset['syncbench_24k_caller']`
(2) Load a Specific SyncBench Dataset

```python
from datasets import load_dataset

dataset = load_dataset("xuehang/SyncBench", data_files=<dataset_name>)
```

Fill in `<dataset_name>` with a specific dataset name:

- `syncbench_300`
- `syncbench_300_callee`
- `syncbench_300_caller`
- `syncbench_24k`
- `syncbench_24k_callee`
- `syncbench_24k_caller`

For example:

```python
from datasets import load_dataset

dataset = load_dataset("xuehang/SyncBench", data_files="syncbench/syncbench_300.csv")
```
Alternatively:

```python
from datasets import load_dataset

# Load datasets
dataset = load_dataset("xuehang/SyncBench", data_files={
    "syncbench_300": "syncbench/syncbench_300.csv",
    "syncbench_300_caller": "syncbench/syncbench_300_caller.csv",
    "syncbench_300_callee": "syncbench/syncbench_300_callee.csv"
})

# Access the data
eval_data = dataset["syncbench_300"]
caller_data = dataset["syncbench_300_caller"]
callee_data = dataset["syncbench_300_callee"]
```
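Once loaded, each split behaves like a standard Hugging Face `datasets` table, so you can inspect instances directly. A minimal sketch (no SyncBench column names are assumed; the script just prints whatever the dataset ships):

```python
from datasets import load_dataset

dataset = load_dataset("xuehang/SyncBench", data_files={
    "syncbench_300": "syncbench/syncbench_300.csv",
})
eval_data = dataset["syncbench_300"]

print(len(eval_data))           # should be 300 for the evaluation subset
print(eval_data.column_names)   # instance attributes shipped by SyncBench
print(eval_data[0])             # first out-of-sync instance
```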
Run unit test:

```bash
cd SyncMind
pytest ./tests/test_syncbench.py -v
```
In our current version, SyncBench is built upon 21 popular GitHub repositories. SyncBench can be readily scaled up by applying our construction pipeline to further qualified Python repositories, and can also be quickly downsampled to smaller evaluation subsets.

SyncBench can be extended to Python repositories that meet the following prerequisites:

- Have Python as the primary language
- Possess well-developed unit tests
- (Optional) Easy environment setup is a plus, but not required
  - Repositories with env setup files, such as `setup.py`, `.toml`, `.yml`, etc., help quickly build up the Docker environment
  - Rest assured that you can also manually specify packages to install if your selected repositories do not include these env setup files
Source Repository

Edit the source repo dictionary at `./source/my_repo_dict.json`:

- Append new source repositories to this dictionary (see the sketch below)
- You may preset environment dependencies in this dictionary if a source repository does not ship its own environment setup files
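For illustration, the sketch below appends an entry to `./source/my_repo_dict.json` programmatically. The entry layout (a repo URL plus an optional dependency list) is hypothetical; mirror the schema of the existing entries in your copy of the file:

```python
import json

DICT_PATH = "./source/my_repo_dict.json"

with open(DICT_PATH) as f:
    repo_dict = json.load(f)

# Hypothetical entry: match the schema used by existing entries in the file
repo_dict["example_repo"] = {
    "url": "https://github.com/example/example-repo.git",
    # Optional preset dependencies for repos without env setup files
    # (key name assumed for illustration)
    "dependencies": ["numpy", "pytest"],
}

with open(DICT_PATH, "w") as f:
    json.dump(repo_dict, f, indent=2)
```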
Set the source repository range in the SyncBench construction command to specify which source repositories to use.
Set params: directly modify `construction.sh`, located at:

```bash
SyncMind/scripts/construction.sh
```

- Set `ROOT_PATH` to a dir with enough space to save the generated benchmark instances:

  ```bash
  ROOT_PATH="/home/xuehangg/"
  ```

- Set the path to source repositories:

  ```bash
  DATA_PATH="./source/my_repo_dict.json"
  ```

- Set dataset type: `caller` or `callee`:

  ```bash
  DATASET='caller'
  ```

- Define function and method filtering strictness (see more filtering details in SyncBench.md):

  ```bash
  STRICT_FILTERING=0  # 0: not strict | 1: strict (may result in no filtered data being collected)
  ```

- Define execution test timeout:

  ```bash
  TIMEOUT=600
  ```

- Define the maximum length of data to be filtered:

  ```bash
  MAX_LENGTH=1000
  ```

- Set the source repository range `[CONSTRUCT_START, CONSTRUCT_END)`, starting from 0. For example, to construct SyncBench from the first three source repositories:

  ```bash
  CONSTRUCT_START=0
  CONSTRUCT_END=3
  ```

- Set out-of-sync mode (execution test filtering mode):
  - `fp`: fail-to-pass only
  - `pp`: pass-to-pass only
  - `both`: fail-to-pass and pass-to-pass

  ```bash
  TEST_MODE="fp"
  ```

- Set commit tracing mode:
  - Trace all commits that satisfy `TEST_MODE`: `TRACE_MODE=0`
  - Trace only the oldest commit that satisfies `TEST_MODE`: `TRACE_MODE=1`

  ```bash
  TRACE_MODE=0
  ```
(1) (Optional) Check Gits

If you would like to check git commits before constructing SyncBench:

```bash
cd SyncMind
bash ./scripts/git.sh
```
(2) SyncBench Construction

Construct SyncBench:

```bash
cd SyncMind
bash ./scripts/construction.sh
```

This will save both the structured data in `.json` format and the instantiated data in `.csv` format:

- `JSON` data: saved to `./syncbench_build/dataset` in `.json` format
- `CSV` data: saved to `./syncbench_build/syncbench` in `.csv` format

where `syncbench_build` shares the same parent directory as `SyncMind`.
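To sanity-check the build output, you can load one of the emitted CSV files with pandas. A minimal sketch, assuming you run it from inside `SyncMind` (so `syncbench_build` sits one level up); the glob pattern is a placeholder for whatever files `construction.sh` actually produced:

```python
import glob

import pandas as pd

# Placeholder pattern: any instantiated CSV produced by construction.sh
csv_files = sorted(glob.glob("../syncbench_build/syncbench/*.csv"))

df = pd.read_csv(csv_files[0])
print(f"{csv_files[0]}: {len(df)} instances")
print(df.columns.tolist())  # instance attributes emitted by the build
```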
(3) (Optional) SyncBench Instantiation

Instantiate `JSON` data into `CSV` instances (after running the construction step `bash ./scripts/construction.sh` to generate `JSON` data):

```bash
cd SyncMind
bash ./scripts/syncbench.sh
```

This will convert structured `.json` data into instantiated datasets in `.csv` format:

- `CSV` data: saved to `./syncbench_build/syncbench` in `.csv` format

where `syncbench_build` shares the same parent directory as `SyncMind`.

Note that this step is entirely optional, in case you would like to change instance attributes:

- Running `construction.sh` already includes this instantiation step with default attributes for agent out-of-sync recovery evaluation.
For small-scale evaluation, SyncBench can be readily downsampled to fewer instances:

(1) 300 Instances

- We sampled a small evaluation dataset through weighted downsampling: 300 instances
  - 300 out-of-sync instances derived from 21 GitHub repositories
    - 150 Caller instances
    - 150 Callee instances
(2) Custom Subset

- Choose a suitable method to downsample a custom SyncBench subset; one possible approach is sketched below.
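As one possible approach (not the exact script behind the official 300-instance subset), you could downsample with pandas, stratifying by source repository so the subset stays balanced. The file path and the `repo` column name are assumptions; substitute the actual repository-identifier column in your SyncBench CSV:

```python
import pandas as pd

# Assumed path to a locally downloaded full SyncBench CSV
df = pd.read_csv("syncbench/syncbench_24k.csv")

# Stratified downsampling: keep up to N instances per source repository.
# "repo" is an assumed column name; use the real repository-ID column.
N_PER_REPO = 15
subset = (
    df.groupby("repo", group_keys=False)
      .apply(lambda g: g.sample(min(len(g), N_PER_REPO), random_state=42))
)

subset.to_csv("syncbench_custom_subset.csv", index=False)
print(f"Sampled {len(subset)} of {len(df)} instances")
```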
If you find our method or benchmark useful, please kindly cite our paper:
```bibtex
@article{guo2025syncmind,
  title={SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering},
  author={Guo, Xuehang and Wang, Xingyao and Chen, Yangyi and Li, Sha and Han, Chi and Li, Manling and Ji, Heng},
  journal={arXiv preprint arXiv:2502.06994},
  year={2025}
}
```

