SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling

Overview

SAC Flow is a stable, sample-efficient, and high-performance off-policy RL algorithm for flow-based policies. SAC Flow treats the flow-based model as a sequential model and reparameterizes its velocity network as a GRU or a Transformer.
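
The sequential view can be sketched in a few lines. An Euler rollout of a flow policy repeatedly applies a residual update to the intermediate action, which has the same shape as a recurrent cell, and a GRU-style gate on the velocity is one way to reparameterize that update. The sketch below is only an illustration under stated assumptions: `toy_candidate`, the gate function, and the use of plain Python lists are hypothetical stand-ins, not the repository's JAX implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def euler_rollout(velocity, state, a0, num_steps):
    """Integrate da/dt = velocity(t, state, a) from t=0 to t=1 with Euler steps.

    Each step is a_{k+1} = a_k + dt * v(t_k, state, a_k): a residual recurrent
    update whose hidden state is the intermediate action.
    """
    dt = 1.0 / num_steps
    a = list(a0)
    for k in range(num_steps):
        v = velocity(k * dt, state, a)
        a = [ai + dt * vi for ai, vi in zip(a, v)]
    return a


def toy_candidate(t, state, a):
    # Hypothetical stand-in for a learned network: proposes the state as the target action.
    return list(state)


def plain_velocity(t, state, a):
    # Ungated velocity: move straight toward the candidate.
    cand = toy_candidate(t, state, a)
    return [c - ai for c, ai in zip(cand, a)]


def gated_velocity(t, state, a):
    # GRU-style velocity (illustrative): a gate in (0, 1) scales the step toward the candidate.
    cand = toy_candidate(t, state, a)
    gates = [sigmoid(ai + t) for ai in a]  # hypothetical gate; a learned network in practice
    return [g * (c - ai) for g, c, ai in zip(gates, cand, a)]


state = [1.0, -1.0]
a0 = [0.0, 0.0]
plain = euler_rollout(plain_velocity, state, a0, num_steps=4)
gated = euler_rollout(gated_velocity, state, a0, num_steps=4)
print("plain:", plain)
print("gated:", gated)
```

Because the gate stays strictly between 0 and 1, each gated step moves only part of the way the ungated step would, which is the kind of control over the recurrent update that a gated cell provides.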

Getting Started

All dependencies and environment setup steps are described in the installation guide, sacflow-setup.md. Follow it to prepare your environment before running any of the commands below.

From-scratch training

Run the corresponding file directly:

cd from_scratch_code

Run SAC Flow-T:

python SAC_flow_transformer_jax.py

Run SAC Flow-G:

python SAC_flow_gru_jax.py

Run Naive SAC Flow:

python Naive_sac_flow_jax.py

Offline-to-online training

cd offline-to-online

Run SAC Flow-T:

MUJOCO_GL=egl python main_action_reg_three_phase.py --run_group=reproduce --agent=agents/acfql_transformer_ablation_online_sac.py --agent.alpha=100 --env_name=cube-triple-play-singletask-task4-v0 --sparse=False --horizon_length=5

Run SAC Flow-G:

MUJOCO_GL=egl python main_action_reg_three_phase.py --run_group=reproduce --agent=agents/acfql_gru_ablation_online_sac.py --agent.alpha=100 --env_name=cube-triple-play-singletask-task4-v0 --sparse=False --horizon_length=5

Run Naive SAC Flow:

MUJOCO_GL=egl python main_action_reg_three_phase.py --run_group=reproduce --agent=agents/acfql_ablation_online.py --agent.alpha=100 --env_name=cube-triple-play-singletask-task4-v0 --sparse=False --horizon_length=5

Run QC-FQL:

MUJOCO_GL=egl python main.py --run_group=reproduce --agent.alpha=100 --env_name=cube-triple-play-singletask-task4-v0 --sparse=False --horizon_length=5

Run FQL:

MUJOCO_GL=egl python main.py --run_group=reproduce --agent.alpha=100 --env_name=cube-triple-play-singletask-task4-v0 --sparse=False --horizon_length=1
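
The five commands above share most of their flags and differ only in the entry script, the agent config, and `--horizon_length`. As a convenience, a small helper along these lines could assemble them from one table. This is a sketch: `METHODS`, `COMMON_FLAGS`, and `build_command` are hypothetical names that are not part of the repository, and the helper only builds the command strings without running them.

```python
# Flags shared by every offline-to-online command listed above.
COMMON_FLAGS = [
    "--agent.alpha=100",
    "--env_name=cube-triple-play-singletask-task4-v0",
    "--sparse=False",
]

# method -> (entry script, agent config or None, horizon length)
METHODS = {
    "SAC Flow-T": ("main_action_reg_three_phase.py",
                   "agents/acfql_transformer_ablation_online_sac.py", 5),
    "SAC Flow-G": ("main_action_reg_three_phase.py",
                   "agents/acfql_gru_ablation_online_sac.py", 5),
    "Naive SAC Flow": ("main_action_reg_three_phase.py",
                       "agents/acfql_ablation_online.py", 5),
    "QC-FQL": ("main.py", None, 5),
    "FQL": ("main.py", None, 1),
}


def build_command(method):
    """Return the shell command string for one method (hypothetical helper)."""
    script, agent, horizon = METHODS[method]
    parts = ["MUJOCO_GL=egl", "python", script, "--run_group=reproduce"]
    if agent is not None:
        # QC-FQL and FQL are listed without an explicit --agent flag.
        parts.append(f"--agent={agent}")
    parts += COMMON_FLAGS
    parts.append(f"--horizon_length={horizon}")
    return " ".join(parts)


for name in METHODS:
    print(f"# {name}")
    print(build_command(name))
```

Run it from any directory to print the commands, then execute the one you need from inside the offline-to-online directory.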

Cite our work

@article{sacflow,
      title={SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling}, 
      author={Yixian Zhang and Shu'ang Yu and Tonghe Zhang and Mo Guang and Haojia Hui and Kaiwen Long and Yu Wang and Chao Yu and Wenbo Ding},
      year={2025},
      eprint={2509.25756},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2509.25756}, 
}

Acknowledgments

This code is built mainly on cleanrl (from-scratch training) and QC-FQL (offline-to-online training). If you find our work useful, please consider citing those works as well.
