Value-Based Deep RL Scales Predictably

A workflow for evaluating trade-offs between data efficiency, compute efficiency, and performance for online off-policy RL, validated across multiple algorithms and environments.

Oleh Rybkin<sup>1</sup>, Michal Nauman<sup>1,2</sup>, Preston Fu<sup>1</sup>, Charlie Snell<sup>1</sup>, Pieter Abbeel<sup>1</sup>, Sergey Levine<sup>1</sup>, Aviral Kumar<sup>3</sup>

<sup>1</sup>UC Berkeley, <sup>2</sup>University of Warsaw, <sup>3</sup>Carnegie Mellon University

Installation

QScaled can be easily installed in any environment with Python >= 3.10.

pip install -e .

Overview

We collect run data from the Wandb API using the BaseCollector subclasses defined in qscaled/wandb_utils, and format this data into zip files, saved to ~/.qscaled/zip by default. Analyses in experiments then reference these zip files by name.

Reproduce results from our paper

First, download the zip files from our experiments.

bash qscaled/scripts/download_zipdata.sh

Using data from our hyperparameter grid search, we can compute the "best" batch size $B^* (\sigma)$ and learning rate $\eta^* (\sigma)$ according to our method. For example, to run this analysis on OpenAI Gym:

cd experiments/1_grid_search
python gym_compute_params.py

For a closer look at our method, see e.g. gym_explore.ipynb.
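At its core, this analysis fits each hyperparameter against UTD with a log-linear (power-law) relationship. A minimal illustration of that kind of fit on synthetic data (not the repository's actual grid-search output or fitting code):

```python
import numpy as np

def fit_power_law(sigma, values):
    """Fit values ~ c * sigma**m by least squares in log-log space; return (c, m)."""
    m, log_c = np.polyfit(np.log(sigma), np.log(values), 1)
    return float(np.exp(log_c)), float(m)

# Synthetic grid-search output: the best batch size halves each time UTD doubles.
sigma = np.array([1.0, 2.0, 4.0, 8.0])
best_batch = 512.0 / sigma
c, m = fit_power_law(sigma, best_batch)  # c ~ 512, m ~ -1
```

A straight line in log-log space corresponds exactly to a power law, which is why the fitted slope and intercept fully determine $B^*(\sigma)$ at unseen UTDs.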

Then, using data from runs with these fitted hyperparameters, we follow the rest of the procedure described in the paper:

  • Fitting the minimum amount of data $\mathcal{D}_J(\sigma)$ needed to achieve performance level $J$ at UTD $\sigma$.
  • Using these fits at different performance levels $J$ to fit $\sigma^*(\mathcal F_0)$. This procedure is detailed in the notebooks in experiments/2_fitted.
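The first ingredient of the $\mathcal{D}_J(\sigma)$ fit is reading off, for each run, the smallest amount of data at which the learning curve reaches a performance level $J$. A sketch of that step on a synthetic learning curve (the repository's actual fitting code lives in the notebooks):

```python
import numpy as np

def data_to_reach(steps, returns, J):
    """Smallest step count at which the (monotone-smoothed) return first
    reaches level J, or None if the run never gets there."""
    returns = np.maximum.accumulate(np.asarray(returns, dtype=float))
    hit = np.flatnonzero(returns >= J)
    return None if hit.size == 0 else int(steps[hit[0]])

# Synthetic learning curve: return approaches 100 as data grows.
steps = np.arange(1, 11) * 10_000
returns = 100.0 * (1 - np.exp(-steps / 40_000.0))
print(data_to_reach(steps, returns, J=80.0))  # -> 70000
```

Repeating this at several UTDs $\sigma$ gives the points $(\sigma, \mathcal{D}_J(\sigma))$ that the fit is run on.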

Reproduce our method on your own experiments

The full workflow is as follows:

  1. Run a hyperparameter grid search over UTD $\sigma$, batch size $B$, and learning rate $\eta$, with logging to Wandb. In our experiments, we run 8-10 seeds per configuration.

     Using the results of this sweep, we will fit the "best" batch size $B^* (\sigma)$ and learning rate $\eta^* (\sigma)$.

  2. Depending on whether you run one or multiple seeds in a single Wandb run, implement a subclass of OneSeedPerRunCollector or MultipleSeedsPerRunCollector; these have example subclass implementations ExampleOneSeedPerRunCollector and ExampleMultipleSeedsPerRunCollector, respectively.
  3. Make a copy of a notebook in experiments/1_grid_search.
  4. Label your Wandb runs with tags (or, if you don't have many runs, skip this step and leave wandb_tags as []). You can add tags by selecting runs in the Wandb UI and clicking "Tag".
  5. Update the SweepConfig.

     This setup (steps 2-5) takes ~10 minutes!

  6. Run the notebook and inspect the fits to determine whether a fit with shared or separate log-log slopes works better for your use case (see the following section for more details). Hyperparameters are saved to experiments/outputs/grid_proposed_hparams.
  7. Run experiments using these fits on a larger range of UTDs.
  8. Make a copy of a notebook in experiments/2_fitted, follow the same setup as in steps 4 and 5, and run!

Actually, I just want your hyperparameters.

See experiments/outputs/grid_proposed_hparams.

  • shared (recommended): Our batch size $B^*(\sigma)$ and learning rate $\eta^*(\sigma)$ log-linear fits use a shared slope across all tasks within the same benchmark.
  • separate: We fit $B^*(\sigma)$ and $\eta^*(\sigma)$ separately for each task.
  • baseline_utd{sigma}: We compare our approach against taking the best $B$ and $\eta$ for some given UTD $\sigma$, and reusing the same $B$ and $\eta$ for all other UTDs.
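To make "shared slope" concrete, here is a sketch of one way such a fit can be computed (synthetic data and a centering-based least-squares estimator, not the repository's implementation): centering each task's points in log-log space removes the per-task intercept, so one pooled regression recovers the slope shared across tasks.

```python
import numpy as np

def fit_shared_slope(log_sigma, log_vals_per_task):
    """Fit log v_t = a_t + m * log(sigma) with one slope m shared across tasks.

    Centering each task's data eliminates its intercept a_t, so the pooled
    centered points determine the shared slope by ordinary least squares.
    """
    xs, ys = [], []
    for log_vals in log_vals_per_task:
        xs.append(log_sigma - log_sigma.mean())
        ys.append(log_vals - log_vals.mean())
    x, y = np.concatenate(xs), np.concatenate(ys)
    m = float(x @ y / (x @ x))
    intercepts = [float(v.mean() - m * log_sigma.mean()) for v in log_vals_per_task]
    return m, intercepts

# Two synthetic tasks with a common slope of -0.5 but different intercepts.
sigma = np.array([1.0, 2.0, 4.0, 8.0])
log_sigma = np.log(sigma)
batches = [np.log(512.0 / np.sqrt(sigma)), np.log(128.0 / np.sqrt(sigma))]
m, (a1, a2) = fit_shared_slope(log_sigma, batches)  # m ~ -0.5
```

Pooling the slope across tasks uses all of a benchmark's data to estimate the trend, which is why the shared fits are the recommended default when per-task data is limited.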

Citation

@inproceedings{rybkin2025valuebased,
  title={Value-Based Deep {RL} Scales Predictably},
  author={Oleh Rybkin and Michal Nauman and Preston Fu and Charlie Victor Snell and Pieter Abbeel and Sergey Levine and Aviral Kumar},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  eprint={2502.04327},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2502.04327}
}
