score_po

This repository contains code for "Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching".

Uncertainty-Penalized Optimal Control

We solve uncertainty-penalized optimal control problems of the following form:

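$$
\begin{aligned}
\max_\theta \quad & \mathbb{E}_{x_i \sim \rho} \left[ \sum_t r(x_t, u_t) + \beta \sum_t \log p(x_t, u_t) \right] \\
\text{s.t.} \quad & x_{t+1} = f(x_t, u_t) \quad \forall t, \\
& u_t = \pi_\theta(x_t) \quad \forall t, \\
& x_0 = x_i
\end{aligned}
$$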

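where θ are the policy parameters, r is the reward, f is the dynamics, ρ is the distribution of initial states, β weights the log-likelihood term, and p is the perturbed empirical distribution of the data. The log-likelihood term encourages rollout trajectories to stay close to the data.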
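As a rough illustration of the objective above, the sketch below evaluates the penalized return of a single rollout for a scalar system. It assumes that the perturbed empirical distribution is the dataset convolved with isotropic Gaussian noise of standard deviation `sigma`; that choice, and the names `log_perturbed_density` and `penalized_return`, are illustrative assumptions and not part of this codebase.

```python
import math


def log_perturbed_density(z, data, sigma):
    """Log-density of the empirical data distribution convolved with N(0, sigma^2 I).

    Assumption: Gaussian perturbation. `z` is a tuple and `data` a list of tuples.
    """
    d = len(z)
    exponents = [
        -sum((a - b) ** 2 for a, b in zip(z, z_i)) / (2.0 * sigma ** 2)
        for z_i in data
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(exponents)
    lse = m + math.log(sum(math.exp(e - m) for e in exponents))
    return lse - math.log(len(data)) - 0.5 * d * math.log(2.0 * math.pi * sigma ** 2)


def penalized_return(x0, policy, dynamics, reward, data, beta, sigma, horizon):
    """Sum of r(x_t, u_t) + beta * log p(x_t, u_t) along one rollout from x0."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = policy(x)
        total += reward(x, u) + beta * log_perturbed_density((x, u), data, sigma)
        x = dynamics(x, u)
    return total
```

With `beta = 0` this reduces to the plain sum of rewards. With `beta > 0`, state-input pairs far from the dataset receive a lower log-density and therefore a lower objective value.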

Score-Guided Planning (SGP)

Our codebase supports general feedback policy optimization, but the examples mainly revolve around open-loop planning. This is a special case of uncertainty-penalized optimal control in which the policy is parametrized as an open-loop sequence of inputs.
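To make the open-loop special case concrete, here is a minimal sketch in which the policy parameters are simply the input sequence and the policy ignores the state. The class `OpenLoopPolicy` and the function `rollout` are hypothetical names for illustration and do not refer to classes in this repository.

```python
class OpenLoopPolicy:
    """Policy whose parameters are the input sequence u_0, ..., u_{T-1}."""

    def __init__(self, inputs):
        self.params = list(inputs)

    def __call__(self, x, t):
        # The state x is ignored: the input depends only on the time index.
        return self.params[t]


def rollout(x0, policy, dynamics, horizon):
    """Simulate x_{t+1} = f(x_t, u_t) and return the list of visited states."""
    xs = [x0]
    for t in range(horizon):
        u = policy(xs[-1], t)
        xs.append(dynamics(xs[-1], u))
    return xs
```

For example, with single-integrator dynamics `lambda x, u: x + u`, calling `rollout(0.0, OpenLoopPolicy([1.0, 1.0, -2.0]), lambda x, u: x + u, 3)` visits the states 0, 1, 2, 0.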

How to Run

Installation

This repo is mainly written in torch, and heavily uses wandb and hydra-core. For some robotic examples, drake might be required as a dependency.

To install these dependencies and set the path, simply run

python -m pip install -r requirements.txt

after cloning the repo, then add the repository to the Python path in ~/.bashrc. Because the code relies on imports such as import examples, we recommend putting the following line at the end of your ~/.bashrc file:

export PYTHONPATH=${HOME}/score_po:${PYTHONPATH}
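One way to check that the path is set correctly is to ask Python whether the top-level packages can be located. The sketch below assumes that `score_po` and `examples` are the importable package names (based on the folder names in the repository); the helper `missing_modules` is a hypothetical name.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["score_po", "examples"])
    if missing:
        print("Not found on the Python path:", ", ".join(missing))
    else:
        print("All modules found.")
```

If either name is reported as missing, open a new shell (so that ~/.bashrc is re-read) and confirm that PYTHONPATH points at the cloned directory.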

Running with Hydra

We use hydra for our examples, and each user must have a user config file. Add your own user config file under config/user for each example, and modify that example's config files to reference your user name.

For example, to run examples/cartpole/learn_model.py:

1. Add a profile to examples/cartpole/config/user as new_user.yaml, following the pattern of terry.yaml.
2. In examples/cartpole/config/learn_model.yaml, set

   defaults:
     - user: new_user

3. Run python examples/cartpole/learn_model.py from the cloned directory.
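The steps above rely on Hydra's config-group convention: a defaults entry of the form `group: option` selects the file `group/option.yaml` inside the config directory. The helper below is only an illustration of that mapping (it is a hypothetical function, not part of Hydra or of this repository).

```python
from pathlib import PurePosixPath


def resolve_group_option(config_dir, group, option):
    """Path of the YAML file that a defaults entry `group: option` refers to."""
    return PurePosixPath(config_dir) / group / f"{option}.yaml"


# The entry `user: new_user` in examples/cartpole/config/learn_model.yaml
# refers to examples/cartpole/config/user/new_user.yaml.
path = resolve_group_option("examples/cartpole/config", "user", "new_user")
```

This is why the file name of your profile (without the .yaml extension) must match the value set under defaults.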

Running Tests

We use pytest for testing and CI. To run tests, do

pytest .

from the cloned directory.

Examples

All the examples can be found in the examples folder with instructions on how to run.

Simple1D
Cart-pole system
The pixel-space single integrator: use branch pixels_glen.
D4RL Mujoco Benchmark
Box-Keypoint Pushing Example: for hardware code, use branch lcm_hardware.
