Noise in the environment 🐛 #385
Description
Currently, when noise is enabled in the environment configuration and a seed is fixed, the data generation process is fully determined by that seed. As a result, it always generates the same "environment trajectory" (i.e., the same order distribution, the same vessel speeds, and the same vessel parking noise).
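To make the behavior concrete, here is a minimal sketch of the pattern I believe is responsible. It is not MARO code: `ToyEnv`, `sample_vessel_speeds`, and the noise parameter are hypothetical names used only for illustration, and I am assuming the generator is effectively re-seeded with the same value on every reset.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a simulator; this is NOT MARO's actual API."""

    def __init__(self, seed: int, noise: float = 0.1):
        self._seed = seed
        self._noise = noise
        self._rng = random.Random(seed)

    def reset(self) -> None:
        # Assumed behavior: the generator is rebuilt from the same seed,
        # so the pseudo-random sequence restarts from the beginning.
        self._rng = random.Random(self._seed)

    def sample_vessel_speeds(self, n: int = 5, base: float = 10.0) -> list:
        # Apply multiplicative noise in [-noise, +noise] around a base speed.
        return [
            base * (1 + self._rng.uniform(-self._noise, self._noise))
            for _ in range(n)
        ]


env = ToyEnv(seed=42)

env.reset()
first = env.sample_vessel_speeds()

env.reset()
second = env.sample_vessel_speeds()

# Both episodes see exactly the same "noisy" speeds.
print(first == second)  # True
```

With this pattern the noise is only nominally random: every episode replays the same values.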
Expected Behavior
I would expect each "environment trajectory" to differ from the previous one after a reset, i.e., different noise should be applied each time the environment is reset.
This also matters for the reasons mentioned in both of your papers: without it, the environment is fully deterministic, and one of the main reasons to apply RL-based methods is their ability to handle uncertainty, as found in real-world scenarios.
Without this, the performance that any RL-based method achieves is flawed, because the method is overfitting the "noise" of that specific configuration. At this point it is even misleading to call it noise, since each trajectory generates exactly the same data.
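One possible way to get varied episodes while keeping runs reproducible is to derive a fresh seed for each episode from the base seed and an episode counter. The sketch below only illustrates that idea under my own assumptions: `ToyEnvVaried` and its methods are hypothetical names, not MARO's API, and it uses nothing beyond Python's standard `random` module.

```python
import random


class ToyEnvVaried:
    """Hypothetical sketch; names and structure are NOT taken from MARO."""

    def __init__(self, seed: int, noise: float = 0.1):
        self._seed = seed
        self._noise = noise
        self._episode = 0
        self._rng = random.Random(seed)

    def reset(self) -> None:
        # Derive a per-episode seed from the base seed and the episode index.
        # Episodes differ from each other, but the whole sequence of episodes
        # is still reproducible for a given base seed.
        self._episode += 1
        self._rng = random.Random(f"{self._seed}:{self._episode}")

    def sample_vessel_speeds(self, n: int = 5, base: float = 10.0) -> list:
        # Apply multiplicative noise in [-noise, +noise] around a base speed.
        return [
            base * (1 + self._rng.uniform(-self._noise, self._noise))
            for _ in range(n)
        ]


def run_two_episodes(seed: int) -> tuple:
    env = ToyEnvVaried(seed=seed)
    env.reset()
    episode_1 = env.sample_vessel_speeds()
    env.reset()
    episode_2 = env.sample_vessel_speeds()
    return episode_1, episode_2


run_a = run_two_episodes(seed=42)
run_b = run_two_episodes(seed=42)

# Within a run, consecutive episodes draw different noise.
episodes_differ = run_a[0] != run_a[1]
# Across runs with the same base seed, results are reproduced exactly.
runs_match = run_a == run_b
```

This keeps experiments repeatable (same base seed, same sequence of episodes) while ensuring an agent cannot memorize a single noise realization.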
Environment
- MARO version (e.g., v0.1.1a1): master
- MARO scenario (CIM, Citi Bike): CIM
- MARO component (Simulation, RL, Distributed Training): Simulation
- Orchestration platform (GraSS on Azure, AKS on Azure):
- How you installed MARO (pip, source): source
- OS (Linux, Windows, macOS): Linux
- Python version (3.6, 3.7): 3.7
- Docker image (e.g., maro2020/maro:latest):
- CPU/GPU:
- Any other relevant information: