[Feature] Allow multiple (nested) action, reward, done keys in env,vec_env and collectors#1462
Conversation
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
# Conflicts: # test/mocking_classes.py # test/test_env.py
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
env module
Signed-off-by: Matteo Bettini <[email protected]>
|
Up to here benchmark runs have shown all performance has been maintained |
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
env moduleenv,vec_env and collectors
|
This looks great! I have a few questions though:
|
Basically the idea is that at every step when some reset flag will be true, The other possibility is that you only put the dones you catually need in your done spec. i.e. if your agent can be done, but you want to reset only when some groups are done, than your done spec should have group granularity and not agent granularity.
That snippet i posted is just an example pushing the flexibility of a composite done spec to the extreme. I do not see a use for that in multiagent. In multiagent we suggest to stick to what is outlined in #1463 |
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
|
Benchmarks up to here
|
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
|
@vmoens PR is ready for review, here the last benchmark results (i dunno how attendibili they are ahaha)
|
This reverts commit e8e410e.
Signed-off-by: Matteo Bettini <[email protected]>
# Conflicts: # torchrl/data/utils.py
vmoens
left a comment
There was a problem hiding this comment.
As part of this PR, I think we should rename the "_action_spec" in "full_action_spec" etc.
For the rest I just have some minor comments, nice work!
Signed-off-by: Matteo Bettini <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
…`vec_env` and `collectors` (pytorch#1462) Signed-off-by: Matteo Bettini <[email protected]> Co-authored-by: vmoens <[email protected]>
…`vec_env` and `collectors` (pytorch#1462) Signed-off-by: Matteo Bettini <[email protected]> Co-authored-by: vmoens <[email protected]>
Depends on #512
This PR is an important milestone in the journey to #1463.
It will also allow single-agent users to have more than one action, reward, and done keys.
This is very important when your agents has multiple action (e.g., some discrete, some continuous ).
Main design problem: multiple dones
One major design choice here though is that if we allow multiple done keys, we need a "_reset" key for each (same for "truncated" , "is_init" and all the other done-based keys).
This is because we need to independently reset each done (e.g., in multiagent we want to reset only some done agents).
So the question is, what is the best way to do this? because we can imagine having a done_spec like
With a spec of this type, we place a "_reset" key for each done present (and that has any trues).
Same will be valid for "truncated" and "is_init".
A _reset td will look like
This solution is already implemented in the PR
cc @hyerra