DEPO

This repository contains the official data and code for the AAAI 2026 paper: DEPO: Dual-Efficiency Preference Optimization for LLM Agents

Project Page: Link

1) Configure Paths

Before training, update both of the following:

  • Dataset registry

    DEPO/data/dataset_info.json
    

    Point each dataset entry to your local files (a registry sketch follows this list).

  • Experiment configs

    DEPO/efficient_agent/*.yaml
    

    Edit any fields that contain file paths (output dirs, model checkpoints, etc.); a config sketch also follows this list.
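
For orientation, here is a minimal sketch of a dataset_info.json entry for KTO-style data. The entry name, file path, and column names are placeholders; the schema follows LLaMA-Factory's dataset_info.json conventions, so match them against your local files:

    {
      "depo_kto_data": {
        "file_name": "kto_data/train.json",
        "formatting": "sharegpt",
        "columns": {
          "messages": "messages",
          "kto_tag": "label"
        }
      }
    }

Likewise, a few of the path-like fields you will typically need to touch in the YAML configs (these field names mirror common LLaMA-Factory options; verify against the actual files in efficient_agent/):

    model_name_or_path: /path/to/base/model   # local checkpoint or HF model ID
    dataset: depo_kto_data                    # must match a key in dataset_info.json
    output_dir: /path/to/save/checkpoints     # where training outputs are written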

2) Install LLaMA-Factory Environment

Create and activate a Python environment that satisfies LLaMA-Factory's requirements.
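
One possible setup, assuming conda and the bundled requirements.txt (the environment name and Python version here are placeholders):

# Sketch only; follow LLaMA-Factory's own install guide if it differs.
conda create -n depo python=3.10 -y
conda activate depo
pip install -r requirements.txt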

3) Train

Kick off training with the provided script:

bash train_depo.sh

Common things to customize:

  • Which YAML config to load (inside train_depo.sh)
  • Output directory, logging/ckpt intervals
  • LoRA settings, batch size, learning rate
  • Which datasets (as defined in dataset_info.json) to use
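
For reference, train_depo.sh most likely wraps a LLaMA-Factory training invocation along these lines. The config filename below is a placeholder, and since this repo bundles llamafactory under src/, the exact entry point may differ, so check the script itself:

# Hypothetical command; the real one lives in train_depo.sh.
llamafactory-cli train efficient_agent/your_experiment.yaml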

4) Evaluation

For model evaluation, we use the testing data from data/test. All evaluations are conducted within the AgentGym framework, which provides the necessary environment server.

Repo Layout

DEPO/
├─ data/
│  ├─ dataset_info.json         # dataset path registry
│  ├─ kto_data                  # training data
│  └─ test                      # testing data
├─ efficient_agent/
│  └─ *.yaml                    # experiment configs
├─ src/
│  └─ llamafactory/
│     └─ train/
│        └─ kto/
├─ train_depo.sh                # entry script to start training
├─ requirements.txt             # env deps (example)
└─ ...... 

That’s it: edit the paths, install the environment, and run the script. Happy training! 🚀

🖇️ Citation

🤝 Feel free to cite our paper if you find this repository useful for your work.

@misc{chen2025depodualefficiencypreferenceoptimization,
      title={DEPO: Dual-Efficiency Preference Optimization for LLM Agents}, 
      author={Sirui Chen and Mengshi Zhao and Lei Xu and Yuying Zhao and Beier Zhu and Hanwang Zhang and Shengjie Zhao and Chaochao Lu},
      year={2025},
      eprint={2511.15392},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2511.15392}, 
}
