Usage
1 Basic Usage
Basic usage consists of the following steps:
- Create replay buffer (ReplayBuffer.__init__)
- Add transitions (ReplayBuffer.add)
  - Reset at episode end (ReplayBuffer.on_episode_end)
- Sample transitions (ReplayBuffer.sample)
2 Example Code
Here is a simple example that stores standard environment transitions (i.e., obs, act, rew, next_obs, and done).
import numpy as np
from cpprb import ReplayBuffer

buffer_size = 256
obs_shape = 3
act_dim = 1

# Declare every value stored per transition; "rew" and "done" use the defaults.
rb = ReplayBuffer(buffer_size,
                  env_dict={"obs": {"shape": obs_shape},
                            "act": {"shape": act_dim},
                            "rew": {},
                            "next_obs": {"shape": obs_shape},
                            "done": {}})

# Dummy transition used for illustration.
obs = np.ones(shape=(obs_shape))
act = np.ones(shape=(act_dim))
rew = 0
next_obs = np.ones(shape=(obs_shape))
done = 0

for i in range(500):
    rb.add(obs=obs, act=act, rew=rew, next_obs=next_obs, done=done)

    if done:
        # Together with resetting environment, call ReplayBuffer.on_episode_end()
        rb.on_episode_end()

batch_size = 32
sample = rb.sample(batch_size)
# sample is a dictionary whose keys are 'obs', 'act', 'rew', 'next_obs', and 'done'

3 Construction Parameters
(See also API reference)
| Name | Type | Optional | Description |
|---|---|---|---|
| size | int | No | Buffer size |
| env_dict | dict | Yes (but unusable) | Environment definition (See here) |
| next_of | str or array-like of str | Yes | Memory compression (See here) |
| stack_compress | str or array-like of str | Yes | Memory compression (See here) |
| default_dtype | numpy.dtype | Yes | Fallback data type |
| Nstep | dict | Yes | Nstep configuration (See here) |
| mmap_prefix | str | Yes | mmap file prefix (See here) |
4 Notes
Flexible environment values are defined by env_dict at buffer creation. The details are described in a separate document.
Since stored values have flexible names, you have to pass them to the ReplayBuffer.add method as keyword arguments.
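
To see why keyword arguments are required, the following is a minimal pure-Python sketch of a buffer whose columns are defined by env_dict. `TinyBuffer` is a hypothetical toy class written for this illustration. It is not cpprb's implementation and ignores shapes, dtypes, and the other construction parameters. It also assumes uniform sampling with replacement.

```python
import random


class TinyBuffer:
    """Hypothetical toy ring buffer for illustration; NOT cpprb's implementation."""

    def __init__(self, size, env_dict):
        self.size = size
        # One pre-allocated column per name declared in env_dict.
        self.columns = {name: [None] * size for name in env_dict}
        self.next_index = 0  # slot the next transition is written to
        self.stored = 0      # number of valid transitions (at most size)

    def add(self, **transition):
        # Values are matched to columns by keyword, so the names must
        # agree exactly with the keys of env_dict.
        if set(transition) != set(self.columns):
            raise KeyError(
                f"expected keys {sorted(self.columns)}, got {sorted(transition)}"
            )
        for name, value in transition.items():
            self.columns[name][self.next_index] = value
        # Wrap around so the oldest transition is overwritten once full.
        self.next_index = (self.next_index + 1) % self.size
        self.stored = min(self.stored + 1, self.size)

    def sample(self, batch_size):
        # Assumption of this sketch: uniform sampling with replacement.
        indices = [random.randrange(self.stored) for _ in range(batch_size)]
        return {
            name: [column[i] for i in indices]
            for name, column in self.columns.items()
        }


tiny = TinyBuffer(4, env_dict={"obs": {}, "act": {}, "rew": {}})
for step in range(6):  # 6 adds into 4 slots: the 2 oldest are overwritten
    tiny.add(obs=step, act=-step, rew=0.5)
batch = tiny.sample(3)  # dictionary with keys 'obs', 'act', and 'rew'
```

Because the column names come from env_dict and are not fixed in advance, a positional call such as `tiny.add(0, 0, 0.5)` has no well-defined meaning in this sketch. Matching by keyword is what lets the same `add` method serve any environment definition.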