Code for paper "RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning"

Code Setup Documentation

Libraries (Python >= 3.12.4)

For more information on version specifics, see the environment.yaml file. To set up the environment, run the following commands:

[mamba | conda | micromamba] create -n env python=3.12
[mamba | conda | micromamba] activate env
[mamba | conda | micromamba | pip] install numpy scipy pandas seaborn matplotlib jupyter gymnasium pytest

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

For the last step, we strongly recommend following the official PyTorch installation tutorial to match your CUDA version. The command above is the one we used for our project.

  • Seaborn
  • Matplotlib
  • Jupyter
  • SciPy
  • NumPy
  • pandas
  • PyTorch (see the PyTorch installation website)
  • Gymnasium by Farama (previously OpenAI Gym)
  • pytest (tests included)
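
After installing, a quick import check can catch version problems early. This is a minimal sketch of such a check; the CartPole-v1 environment is an arbitrary choice we assume here to smoke-test Gymnasium, not one of our experiment environments:

```python
# Sanity check: verify the core libraries import and CUDA is visible to PyTorch.
import numpy, scipy, pandas, seaborn, matplotlib
import gymnasium as gym
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")

# Smoke-test a Gymnasium environment (CartPole-v1 is an arbitrary choice).
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
print(f"Initial observation shape: {obs.shape}")
env.close()
```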

Script Usage

Our codebase implements and trains the RA-PbRL algorithm. To run the experiments from the paper, execute:

python3 src/paper_experiments/pbrl_experiment.py

Implemented Algorithms

Below is a description and the pseudo-code used to implement the algorithms with the experiment API.

Risk-Aware Preference-based Reinforcement Learning (RA-PbRL)

RA-PbRL is a policy-iteration, confidence-bound reinforcement learning algorithm designed for preference-based reinforcement learning while maintaining risk-awareness through Value-at-Risk penalties. The intuition behind the algorithm rests on confidence bounds. The algorithm begins by computing confidence sets that contain the transition function ($\hat{P}_k$), the reward function ($\hat{r}_k(\cdot)$), and the policies ($\pi$) with probability 1-δ. The first two sets are computed by finding a center and then using theoretical bounds to limit the set's radius (see the UCB algorithm in bandit theory). For the policy optimization, we optimize for two policies: we compute a policy confidence set that contains "almost-optimal" policies with a small sub-optimality gap, and then choose the two "most exploratory" policies from it (the most and least optimal policies in the set).
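
To make the risk-awareness concrete, the snippet below computes an empirical Value-at-Risk over episode returns. This is an illustrative sketch of the general VaR notion, not the exact estimator used in the paper:

```python
import numpy as np

def empirical_var(returns: np.ndarray, alpha: float = 0.1) -> float:
    """Empirical Value-at-Risk at level alpha: the alpha-quantile of returns.

    A low VaR means poor tail (worst-case) outcomes; a risk-aware objective
    can penalize policies whose return distribution has a low VaR.
    """
    return float(np.quantile(returns, alpha))

# Hypothetical example: returns from 1000 episodes of some policy.
rng = np.random.default_rng(0)
returns = rng.normal(loc=10.0, scale=3.0, size=1000)
print(f"VaR_0.1 = {empirical_var(returns, alpha=0.1):.2f}")
```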

========================================================================
ALGORITHM - Risk-Aware Preference-based Reinforcement Learning (RA-PbRL)
========================================================================
INPUT:      τ1:  list[tuple[State, Action, Reward]],
            τ2:  list[tuple[State, Action, Reward]],

PARAMETERS: K:   int - Number of episodes,
            H:   int - Horizon,
            S:   int - State space cardinality,
            A:   int - Action space cardinality,
            δ:   float - Theoretical probabilistic guarantee (1-δ),
            n_k: Callable[[State, Action], int] - # times (s,a) visited

OUTPUT:     π1_k, π2_k - the two most exploratory policies in the
            policy confidence set
------------------------------------------------------------------------
P_k   <= argmin_P Σ_{s,a} |P[s, a] - I[s, a]|^2           # center of the set
B^P_k <= {P' : |P'[s, a] - P_k[s, a]| <= β(n_k(s, a), δ)} # confidence set
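
A minimal Python sketch of the two steps above, under assumptions made for illustration: the confidence radius uses a generic Hoeffding-style sqrt(log(.)/n) bound (the paper's exact constants differ), and `value_estimate` is a hypothetical callable returning a policy's estimated risk-aware value:

```python
import numpy as np

def transition_confidence_radius(n_visits: np.ndarray, delta: float, S: int) -> np.ndarray:
    """Per-(s, a) radius of the transition confidence set B^P_k.

    Illustrative Hoeffding-style bound; n_visits[s, a] = n_k(s, a).
    """
    n = np.maximum(n_visits, 1)  # guard against unvisited (s, a) pairs
    return np.sqrt(2 * S * np.log(2 / delta) / n)

def select_exploratory_pair(policy_set, value_estimate):
    """Choose the two "most exploratory" policies from the confidence set:
    the most and least optimal policies it contains."""
    ranked = sorted(policy_set, key=value_estimate)
    return ranked[-1], ranked[0]  # (most optimal, least optimal)
```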

Oregon State University - Corvallis, OR.
