This repository supports compute-efficient, parallel evaluation of environment shaping operations, i.e., evaluating RL performance under different
- reward functions
- observation spaces
- action spaces
- initial/goal states
- terminal conditions
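Concretely, each of these shaping operations can be viewed as a function hook on the environment. Below is a minimal sketch; compute_reward and compute_observation match the shaper types used in the commands further down, while the state keys are illustrative assumptions:

```python
import torch

# Hedged sketch of shaping hooks operating on batched simulator state.
# The state keys ("base_lin_vel", "dof_pos", "dof_vel") are assumptions.

def compute_reward(state: dict) -> torch.Tensor:
    # e.g., reward forward base velocity
    return state["base_lin_vel"][:, 0]

def compute_observation(state: dict) -> torch.Tensor:
    # e.g., observe joint positions and velocities
    return torch.cat([state["dof_pos"], state["dof_vel"]], dim=-1)
```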
Let's consider a scenario: we are training a quadruped with RL and want to test different reward functions to find the best locomotion behavior. Let's say we have n candidate reward functions.
The most common way to evaluate these reward functions is to run an independent Isaac Gym/Sim RL training process for each reward function, either sequentially or in parallel:
```bash
python train.py --reward_idx 0 --num_envs 256;
python train.py --reward_idx 1 --num_envs 256;
...
python train.py --reward_idx n --num_envs 256;
```

Now imagine launching all these scripts in parallel on a single GPU, e.g., an RTX 4090 with 24GB of VRAM. With 256 environments for each reward function, launching more than 8 processes is impossible -- each process carries its own copy of the simulator, so training is bottlenecked by VRAM. Training speed per process also drops rapidly as more processes are launched. This clearly does not scale if we want to run the evaluation for hundreds of reward candidates.
Isaac Gym/Sim is designed to support massively parallel environments with independent agents. It is therefore possible to assign a different reward to each subgroup of environments and evaluate all rewards within a single process:
```bash
python parallel_train.py --reward_idx 0,1,2,...,n --num_envs $((256 * n))
```

In fact, the latter approach has significant benefits over the former in terms of (a) computational speed and (b) VRAM usage. On an RTX 4090 with 24GB of VRAM, the latter approach allows evaluation of up to 250 rewards, while the former allows only 8. The decrease in training speed as more rewards are added is also much less significant.
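The core trick: because a vectorized simulator steps all n × 256 environments in one batched call, per-candidate rewards reduce to indexing into the batch. Here is a minimal sketch of the idea, assuming a torch-based, Isaac Gym-style batched state; the names and state keys are illustrative, not the repo's actual internals:

```python
import torch

num_rewards = 2           # two candidate reward functions, for illustration
envs_per_reward = 256
num_envs = num_rewards * envs_per_reward

# Environment i is assigned to candidate i // envs_per_reward.
reward_id = torch.arange(num_envs) // envs_per_reward

def reward_fast(state):        # candidate 0: reward forward base velocity
    return state["base_lin_vel"][:, 0]

def reward_efficient(state):   # candidate 1: penalize joint torques
    return -state["torques"].abs().sum(dim=-1)

reward_fns = [reward_fast, reward_efficient]

def compute_batched_reward(state):
    # Each candidate is evaluated only on its own slice of the batch, so
    # n candidates cost roughly as much as one full-batch reward.
    rewards = torch.empty(num_envs)
    for i, fn in enumerate(reward_fns):
        idx = reward_id == i
        rewards[idx] = fn({k: v[idx] for k, v in state.items()})
    return rewards
```

All candidates share a single simulation and one set of physics buffers, which is why VRAM grows with the number of environments rather than the number of processes.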
If you want to test your own shaping operations, add all your shaping candidates under ./envcoder_bench/user_functions following the instructions there. Then run:
```bash
cd envcoder_bench
python eval_shapers.py --task Anymal --shaper_types compute_reward compute_observation
```

This will evaluate all your shaping candidates in a single Isaac Gym process.
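The exact candidate format is described in ./envcoder_bench/user_functions; as a rough sketch, a reward candidate is a self-contained function over batched tensors. The file name, signature, and state keys below are assumptions for illustration, not the repo's verified interface:

```python
# user_functions/reward_run_fast.py  (hypothetical file name)
import torch

def compute_reward(state: dict) -> torch.Tensor:
    # Reward forward velocity while lightly penalizing lateral drift.
    forward = state["base_lin_vel"][:, 0]
    lateral = state["base_lin_vel"][:, 1].abs()
    return forward - 0.1 * lateral
```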
To let LLMs figure out the best shaping candidates, run:
```bash
cd envcoder_bench
python auto_shape.py --task Anymal --task_description "Make anymal run as fast as possible" --shaper_types compute_reward compute_observation --llm gpt-4-turbo-preview
```

It supports both OpenAI GPT models and Anthropic Claude models as the LLM agent.
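Conceptually, this kind of auto-shaping is an LLM-in-the-loop search: the model proposes candidate functions, all candidates are scored in one parallel evaluation, and the scores are fed back for the next round. A rough sketch of that loop; propose_candidates and evaluate_in_parallel are hypothetical helpers, not the repo's API:

```python
# Hedged sketch of an LLM-driven shaping search; helpers are hypothetical.
def auto_shape(task_description, num_rounds=5, num_candidates=16):
    history = []  # (candidate source code, score) pairs from past rounds
    for _ in range(num_rounds):
        # Ask the LLM (GPT or Claude) for candidate shaping functions,
        # conditioned on the task description and previous scores.
        candidates = propose_candidates(task_description, history, n=num_candidates)
        # Score every candidate in a single vectorized Isaac Gym process.
        scores = evaluate_in_parallel(candidates)
        history.extend(zip(candidates, scores))
    best_candidate, _ = max(history, key=lambda pair: pair[1])
    return best_candidate
```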