Flexible Environment
1 Overview
In cpprb version 8 and newer, you can store any number of
environments (e.g. observation, action, etc.).
For example, you can add your own environments such as
next_next_obs, second_reward, and so on.
Each environment can have a multi-dimensional shape (e.g. 3,
(4,4), (84,84,4)) and any numpy data type.
1.1 __init__
To construct a replay buffer, you need to specify the second
parameter of its constructor, env_dict.
The env_dict is a dict whose keys are environment names and whose
values are dicts describing their properties.
The following table lists the supported properties and their default values.
| key | description | type | default value |
|---|---|---|---|
| shape | shape (size of each dimension) | int or array-like of int | 1 |
| dtype | data type | numpy.dtype | default_dtype in constructor or numpy.single |
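
To make the defaults in the table concrete, here is a minimal sketch of how an env_dict could be resolved into explicit shapes and dtypes. The helper `resolve_env_dict` is hypothetical and written only for illustration. It is not part of cpprb and does not claim to mirror its internal code. It assumes that an integer shape is treated as a one-dimensional shape.

```python
import numpy as np

def resolve_env_dict(env_dict, default_dtype=None):
    """Fill in the documented defaults (hypothetical helper, not cpprb API)."""
    # Fall back to numpy.single when no default_dtype is given.
    fallback = np.dtype(default_dtype) if default_dtype is not None else np.dtype(np.single)
    resolved = {}
    for name, props in env_dict.items():
        shape = props.get("shape", 1)
        # Accept a bare int as well as an array-like of ints.
        shape = (int(shape),) if np.isscalar(shape) else tuple(int(s) for s in shape)
        dtype = np.dtype(props.get("dtype", fallback))
        resolved[name] = {"shape": shape, "dtype": dtype}
    return resolved

spec = resolve_env_dict({
    "obs": {"shape": (84, 84, 4), "dtype": np.uint8},  # explicit shape and dtype
    "act": {"shape": 3},                               # dtype falls back to the default
    "rew": {},                                         # both properties fall back
})
for name, props in spec.items():
    print(name, props["shape"], props["dtype"])
```

An empty dict such as the one for `rew` is enough when the defaults fit, which is why the usage example can declare `"rew": {}` and `"done": {}`.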
1.2 add
When adding environments to the replay buffer, you have to pass
them as keyword arguments (i.e. key=value style). If an
environment name is not a syntactically valid identifier, you can
still build a dictionary first and then unpack it with the **
operator (e.g. rb.add(**kwargs)).
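
The unpacking pattern can be illustrated without a replay buffer. In the sketch below, `add` is a stand-in function that accepts arbitrary keyword arguments, and the environment names `next-obs` and `2nd_reward` are made up for the example. The same `**` unpacking would apply to `rb.add`.

```python
def add(**kwargs):
    """Stand-in for a keyword-only add method; returns the names it received."""
    return sorted(kwargs)

# "next-obs" and "2nd_reward" are not valid Python identifiers, so writing
# add(next-obs=...) would be a syntax error. Build a dict and unpack it instead.
transition = {
    "obs": [0.0, 0.0],
    "next-obs": [0.0, 1.0],
    "2nd_reward": 0.5,
}
received = add(**transition)
print(received)
```

Python only requires the unpacked keys to be strings, so names that could never be typed as `key=value` still reach the callee unchanged.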
1.3 sample
sample returns a dict whose keys are the environment names and whose
values are the sampled data.
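
The structure of the returned dict can be sketched with plain numpy. The toy `storage` and `sample` below are illustrative stand-ins, not cpprb code, and they assume uniform sampling with replacement and a leading batch axis on every returned array.

```python
import numpy as np

rng = np.random.default_rng(0)
stored = 32  # number of transitions currently held (illustrative)

# Toy storage: one array per environment, first axis indexes transitions.
storage = {
    "obs": np.zeros((stored, 4, 4), dtype=np.single),
    "rew": np.zeros((stored, 1), dtype=np.single),
    "done": np.zeros((stored, 1), dtype=np.single),
}

def sample(batch_size):
    """Draw indices uniformly with replacement and gather every environment."""
    idx = rng.integers(0, stored, size=batch_size)
    return {name: values[idx] for name, values in storage.items()}

batch = sample(64)
for name, values in batch.items():
    print(name, values.shape)
```

Every environment is gathered with the same indices, so row i of each array in the batch belongs to the same stored transition.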
2 Example Usage
from cpprb import ReplayBuffer
import numpy as np

buffer_size = 32

# Each key names an environment; an empty dict takes the default shape and dtype.
rb = ReplayBuffer(buffer_size,
                  {"obs": {"shape": (4,4)},
                   "act": {"shape": 1},
                   "rew": {},
                   "next_obs": {"shape": (4,4)},
                   "next_next_obs": {"shape": (4,4)},
                   "done": {},
                   "my_important_info": {"dtype": np.short}})

# Environments are passed to add by keyword.
for _ in range(100):
    rb.add(obs=np.zeros((4,4)),
           act=1.5,
           rew=0.0,
           next_obs=np.zeros((4,4)),
           next_next_obs=np.zeros((4,4)),
           done=0,
           my_important_info=2)

# sample returns a dict keyed by environment name.
rb.sample(64)

3 Notes
For PrioritizedReplayBuffer, priorities, weights, and indexes
are special environments that are set automatically.
4 Technical Detail
Internally, these flexible environments are implemented with (the Cython
version of) numpy.ndarray. In versions older than 8, they were implemented
in C++ code, which limited flexibility in both the data types and the
number of environments. (There was a dirty hack of putting all
extra environments into act, which was not treated specially.)