
Improving Value Estimation Critically Enhances Vanilla Policy Gradient

This repository contains the code and implementation details for the paper Improving Value Estimation Critically Enhances Vanilla Policy Gradient [arXiv]. In this work, we challenge the common belief that the steady policy improvement achieved by modern policy gradient methods such as TRPO and PPO comes from enforcing approximate trust regions. Instead, we show that the more critical factor is the improved value estimation accuracy obtained by taking more value update steps in each iteration. To demonstrate this, we show that by simply increasing the number of value steps per iteration, Vanilla Policy Gradient can achieve performance comparable to or even better than PPO.
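The core change above can be sketched as a single training iteration in which the policy takes one vanilla policy-gradient step while the value function is regressed for many steps. This is a minimal illustrative sketch, not the repository's actual code: the function name `vpg_update` and the `batch` dictionary layout are assumptions, and the real implementation lives in VPG_single_file.py.

```python
import torch


def vpg_update(policy, value_fn, policy_opt, value_opt, batch, num_value_steps=50):
    """One VPG iteration with extra value-update steps (illustrative sketch).

    Assumes `batch` holds tensors `obs`, `actions`, and `returns` collected
    under the current policy, `policy(obs)` returns an action distribution,
    and `value_fn` maps observations to scalar value estimates.
    """
    # Advantage estimate from the current value function (no trust region).
    with torch.no_grad():
        adv = batch["returns"] - value_fn(batch["obs"]).squeeze(-1)
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)

    # A single vanilla policy-gradient step: maximize log pi(a|s) * advantage.
    log_probs = policy(batch["obs"]).log_prob(batch["actions"])
    policy_loss = -(log_probs * adv).mean()
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()

    # The key change: many value-regression steps per iteration instead of
    # the usual one, improving value-estimation accuracy.
    for _ in range(num_value_steps):
        value_pred = value_fn(batch["obs"]).squeeze(-1)
        value_loss = ((value_pred - batch["returns"]) ** 2).mean()
        value_opt.zero_grad()
        value_loss.backward()
        value_opt.step()
```

In this sketch, `num_value_steps` plays the same role as the `--num_value_step` flag in the example command below: the single knob that is varied to trade a little extra compute for a much more accurate critic.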

The empirical results presented in the paper were obtained using examples/mujoco/run_experiments.sh, which is adapted from Tianshou. However, since the Tianshou library is somewhat complex, we also provide a simplified, single-file implementation, VPG_single_file.py, adapted from CleanRL, which removes some non-essential components of the original implementation for easier reproduction.

For prerequisites, please refer to requirements.txt in CleanRL. Note that since we only run experiments on robotics environments in Gymnasium, not all dependencies required by CleanRL need to be installed to run our experiments.

An example use of VPG_single_file.py:

python VPG_single_file.py --env_id Hopper-v4 --seed 1 --num_value_step 50

If you find our work helpful, please cite:

@inproceedings{wang2025improving,
  title={Improving Value Estimation Critically Enhances Vanilla Policy Gradient},
  author={Tao Wang and Ruipeng Zhang and Sicun Gao},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}

About

Implementation of the paper "Improving Value Estimation Critically Enhances Vanilla Policy Gradient", ICML '25
