Official implementation of
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL by
Yu Luo, Tianying Ji, Fuchun Sun*, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan
Offline-Boosted Actor-Critic (OBAC) is a model-free online RL framework that identifies the outperforming offline policy through value comparison and uses it as an adaptive constraint to guarantee stronger policy learning performance.
We evaluate our method across 53 diverse continuous control tasks spanning 6 domains: MuJoCo, DMControl, Meta-World, Adroit, MyoSuite, and ManiSkill2, comparing it with BAC, TD-MPC2, SAC, and TD3.
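For intuition, below is a minimal PyTorch-style sketch of the value-comparison idea behind OBAC's adaptive constraint. This is not the repository's implementation: the `actor`, `critic_online`, and `critic_offline` modules, the batch layout, and the squared-error regularizer toward replay-buffer actions are illustrative assumptions; see the paper and the code in this repo for the actual objective.

```python
# Illustrative sketch only -- module and tensor names are assumptions, not the repo's API.
import torch

def actor_loss(actor, critic_online, critic_offline, batch, alpha=0.2, beta=1.0):
    """Blend a standard entropy-regularized actor loss with an offline-policy
    constraint that is switched on only where the offline-optimal value is higher."""
    obs, buffer_actions = batch["obs"], batch["actions"]

    # Sample actions from the current online policy.
    actions, log_probs = actor.sample(obs)

    # Value comparison: Q of the online policy vs. Q of the offline-optimal policy.
    q_online = critic_online(obs, actions)
    q_offline = critic_offline(obs, buffer_actions)

    # Standard soft policy-improvement term.
    loss = (alpha * log_probs - q_online).mean()

    # Adaptive constraint: regularize toward replay-buffer actions only on states
    # where the offline-optimal policy currently outperforms the online policy.
    mask = (q_offline > q_online).float().detach()
    bc_term = ((actions - buffer_actions).pow(2).sum(-1) * mask.squeeze(-1)).mean()

    return loss + beta * bc_term
```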
We provide examples of how to train and evaluate an OBAC agent.

See the example below for training OBAC on a single task:

python main.py --env_name YOUR_TASK

We recommend using the default hyperparameters. See utilis/default_config.py for a full list of arguments.

See the example below for evaluating OBAC checkpoints:

python evaluate.py YOUR_TASK_PATH/../checkpoint best

If you find our work useful, please consider citing our paper as follows:
@inproceedings{Luo2024obac,
title={Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL},
author={Yu Luo and Tianying Ji and Fuchun Sun and Jianwei Zhang and Huazhe Xu and Xianyuan Zhan},
booktitle={International Conference on Machine Learning},
year={2024}
}
Please feel free to participate in our project by opening issues or sending pull requests for any enhancements or bug reports you might have. We’re striving to develop a codebase that’s easily expandable to different settings and tasks, and your feedback on how it’s working is greatly appreciated!
This project is licensed under the MIT License - see the LICENSE file for details. Note that the repository relies on third-party code, which is subject to its respective licenses.