This repository contains the official data and code for the paper: DEPO: Dual-Efficiency Preference Optimization for LLM Agents
Project Page: Link
Before training, update both of the following:

- **Dataset registry** (`DEPO/data/dataset_info.json`): point each dataset entry to your local files (a hedged example entry is sketched right after this list).
- **Experiment configs** (`DEPO/efficient_agent/*.yaml`): edit any fields that contain file paths (output dirs, model checkpoints, etc.).
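For reference, a registry entry in `dataset_info.json` maps a dataset name to a local file. The snippet below is a minimal sketch following LLaMA-Factory's usual conventions; the dataset name, file name, and column mapping are illustrative assumptions, so check the existing entries in the file for the exact schema DEPO expects.

```json
{
  "depo_kto_example": {
    "file_name": "kto_data/your_local_file.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "kto_tag": "label"
    }
  }
}
```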
Create and activate a Python environment that satisfies LLaMA-Factory's requirements (see `requirements.txt`).
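For example, with conda (the environment name and Python version below are illustrative choices, not requirements of the repo):

```bash
# Create and activate an isolated environment for training
conda create -n depo python=3.10 -y
conda activate depo

# Install the dependencies (requirements.txt ships with this repo as an example)
pip install -r requirements.txt
```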
Kick off training with the provided script:

```bash
bash train_depo.sh
```

Common things to customize (see the config sketch after this list):
- Which YAML config to load (inside `train_depo.sh`)
- Output directory, logging/ckpt intervals
- LoRA settings, batch size, learning rate
- Which datasets (as defined in `dataset_info.json`) to use
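As a rough map of where these knobs usually live, the sketch below uses standard LLaMA-Factory YAML field names with placeholder values; the actual keys and defaults in `efficient_agent/*.yaml` may differ, so treat it as an assumption-laden example rather than the shipped config.

```yaml
# Illustrative LLaMA-Factory-style fields; values are placeholders, not the shipped config
model_name_or_path: /path/to/base-model     # model checkpoint to fine-tune
stage: kto                                  # preference-optimization stage
finetuning_type: lora                       # LoRA settings
lora_rank: 8
dataset: your_dataset_name                  # must match a key in data/dataset_info.json
output_dir: /path/to/output                 # output directory
per_device_train_batch_size: 1              # batch size
learning_rate: 5.0e-6                       # learning rate
logging_steps: 10                           # logging interval
save_steps: 500                             # checkpoint interval
```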
For model evaluation, we use the testing data in `data/test`.
All evaluations are conducted within the AgentGym framework, which provides the necessary environment server.
DEPO/
├─ data/
│ ├─ dataset_info.json # dataset path registry
│ ├─ kto_data # training data
│ └─ test # testing data
├─ efficient_agent/
│ ├─ *.yaml # experiment configs
├─ src/
│ └─ llamafactory/
│ └─ train/
│ └─ kto/
├─ train_depo.sh # entry script to start training
├─ requirements.txt # env deps (example)
└─ ......
That’s it—edit paths, install env, run the script. Happy training! 🚀
🤝 Feel free to cite our paper if you find this repository useful for your work.
@misc{chen2025depodualefficiencypreferenceoptimization,
title={DEPO: Dual-Efficiency Preference Optimization for LLM Agents},
author={Sirui Chen and Mengshi Zhao and Lei Xu and Yuying Zhao and Beier Zhu and Hanwang Zhang and Shengjie Zhao and Chaochao Lu},
year={2025},
eprint={2511.15392},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2511.15392},
}