Paper (RLC 2025)
Multi-task Reinforcement Learning (MTRL) has emerged as a critical training paradigm for applying reinforcement learning (RL) to a set of complex real-world robotic tasks, which together demand a generalizable and robust policy. At the same time, massively parallelized training has gained popularity, not only for significantly accelerating data collection through GPU-accelerated simulation but also for enabling diverse data collection across multiple tasks by simulating heterogeneous scenes in parallel. However, existing MTRL research has largely been limited to off-policy methods like SAC in the low-parallelization regime. Yet MTRL could capitalize on the higher asymptotic performance of on-policy algorithms, which require fresh data from the current policy for every batch and can therefore take full advantage of the massive parallelization offered by GPU-accelerated simulation. To bridge this gap, we introduce a massively parallelized Multi-Task Benchmark for robotics (MTBench), an open-source benchmark featuring a broad distribution of 50 manipulation tasks and 20 locomotion tasks, implemented using the GPU-accelerated simulator IsaacGym. MTBench also includes four base RL algorithms combined with seven state-of-the-art MTRL algorithms and architectures, providing a unified framework for evaluating their performance. Our extensive experiments highlight the superior speed of evaluating MTRL approaches using MTBench, while also uncovering unique challenges that arise from combining massive parallelism with MTRL.
- [Aug 15th, 2025] Add FastTD3 + SimbaV2, BRO, Asymmetric Critic
Download the Isaac Gym Preview 4 release from the website, then follow the installation instructions in the documentation.
Ensure that Isaac Gym works on your system by running one of the examples from the `python/examples` directory, like `joint_monkey.py`. Follow the troubleshooting steps described in the Isaac Gym Preview 4 install instructions if you have any trouble running the samples.
Once Isaac Gym is installed and samples work within your current Python environment, install this repo:

```bash
pip install -e .
pip install skrl
pip install moviepy
```

The repository is organized as follows:

```
MTBench/
├── assets/
│   ├── assets_v2/
│   │   ├── unified_objects/         # Meta-World assets
│   │   └── urdf/
│   ├── franka_description/          # Franka robot assets
│   └── go1/                         # Go1 assets
├── exec/                            # Bash scripts to run experiments
├── isaacgymenvs/
│   ├── cfg/                         # Hydra configs for tasks and training
│   ├── learning/                    # rl_games MTRL algorithms and approaches
│   │   └── networks/                # MTRL architectures
│   ├── tasks/
│   │   ├── franka/
│   │   │   └── vec_task/
│   │   │       ├── franka_base.py   # Meta-World base class
│   │   │       └── task_fns/        # Meta-World task functions
│   │   └── locomotion/
│   │       ├── legged_base.py       # Parkour base class
│   │       └── set_terrains/        # Parkour curriculum
│   └── train.py                     # Entry point for training and evaluation
└── scripts/                         # Scripts for visualization
```
In this repo, we provide two task sets, Meta-World and Parkour, whose base classes are `franka_base.py` and `legged_base.py`, respectively. All tasks are defined on top of these base classes and live in the `isaacgymenvs/tasks/franka` and `isaacgymenvs/tasks/locomotion` folders.
To add a new task to Meta-World, register it in the task-ID-to-task map in `isaacgymenvs/tasks/franka/vec_task/franka_base.py` and implement the task in a new file in the `task_fns` folder, providing the required functions (`create_envs`, `compute_observations`, `compute_reward`, and `reset_env`), as sketched below.
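A minimal skeleton of such a task file might look like the following. The exact function signatures are an assumption for illustration; mirror an existing file in `task_fns/` when implementing a real task:

```python
# task_fns/my_new_task.py -- hypothetical skeleton of a new Meta-World task.
# Signatures are illustrative; copy them from an existing task in task_fns/.

def create_envs(env):
    """Create per-env actors/assets for this task (table, objects, goals)."""
    ...

def compute_observations(env):
    """Write this task's observation into the env's observation buffer."""
    ...

def compute_reward(env):
    """Compute reward and success indicators from the current sim state."""
    ...

def reset_env(env, env_ids):
    """Reset the given env indices to randomized initial states."""
    ...
```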
SAC-based approaches:

| Approach | Reference | Location |
|---|---|---|
| Vanilla | - | `MTSACAgent` |
| Soft-Modularization | https://arxiv.org/abs/2003.13661 | `SoftModularizedSACBuilder` |
PQN-based approaches:

| Approach | Reference | Location |
|---|---|---|
| PQN | https://arxiv.org/abs/2407.04811 | Algorithm; Architecture |
| PQN + Soft-Modularization | - | `SoftModularizedPQBuilder` |
FastTD3-based approaches:

| Approach | Reference | Location |
|---|---|---|
| FastTD3 | https://arxiv.org/abs/2505.22642 | `FastTD3Agent` |
GRPO-based approaches:

| Approach | Reference | Location |
|---|---|---|
| GRPO | https://arxiv.org/abs/2402.03300 | `MTGRPOAgent` |
The benchmark is easy to extend: to add a new MTRL approach, add an agent in the `learning` folder; to add a new MTRL architecture, add a network in the `learning/networks` folder.
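Since training is built on rl_games, new agents and network builders are exposed through its factory registries. A minimal sketch of the registration step, assuming hypothetical `MyMTRLAgent` and `MyNetBuilder` classes (the registration calls are standard rl_games API; the class names and module paths are placeholders):

```python
from rl_games.torch_runner import Runner
from rl_games.algos_torch import model_builder

# Hypothetical classes implementing a new MTRL approach / architecture.
from isaacgymenvs.learning.my_agent import MyMTRLAgent
from isaacgymenvs.learning.networks.my_net import MyNetBuilder

runner = Runner()
# Makes the agent selectable via the train config's algo name.
runner.algo_factory.register_builder(
    "my_mtrl_agent", lambda **kwargs: MyMTRLAgent(**kwargs)
)
# Makes the architecture selectable via the train config's network name.
model_builder.register_network("my_net", lambda **kwargs: MyNetBuilder())
```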
All experimental results are reproducible via the bash executables in the `exec` folder. The scripts are organized by environment and MTRL approach, so you can easily find the one you need. They all call `train.py`, whose key arguments are documented below. To reproduce a PPO MTRL approach `x` in evaluation setting `y`, simply run:

```bash
exec/ppo_exps/y/x.sh
```
Instantiate the domain of your choice with the Gym-style wrapper below and use your own RL code. The code is adapted from FastTD3; it assumes `MTBENCH_MW2_CONFIG`, the base Meta-World task config dict, is available in scope.

```python
import isaacgymenvs
import torch
from omegaconf import OmegaConf


class MTBenchEnv:
    def __init__(
        self,
        task_name: str,
        device_id: int,
        num_envs: int,
        seed: int,
    ):
        def get_task_counts(task_name):
            # Spread num_envs as evenly as possible across the tasks.
            num_tasks = 10 if task_name == "meta-world-v2-mt10" else 50
            base_count = num_envs // num_tasks
            remainder = num_envs % num_tasks
            subgroup_counts = [base_count + 1] * remainder + [base_count] * (
                num_tasks - remainder
            )
            return subgroup_counts

        task_config = MTBENCH_MW2_CONFIG.copy()
        if task_name == "meta-world-v2-mt10":
            self.num_tasks = 10
            task_config["env"]["tasks"] = [4, 16, 17, 18, 28, 31, 38, 40, 48, 49]
            task_config["env"]["taskEnvCount"] = get_task_counts(task_name)
        elif task_name == "meta-world-v2-mt50":
            self.num_tasks = 50
            task_config["env"]["tasks"] = list(range(self.num_tasks))
            task_config["env"]["taskEnvCount"] = get_task_counts(task_name)
        else:
            raise ValueError(f"Unsupported task name: {task_name}")
        task_config["env"]["numEnvs"] = num_envs
        # Observation is the base 39-dim state plus a one-hot task ID.
        task_config["env"]["numObservations"] = 39 + self.num_tasks
        task_config["env"]["seed"] = seed

        env_cfg = {"task": task_config}
        env_cfg = OmegaConf.create(env_cfg)
        self.env = isaacgymenvs.make(
            task=env_cfg.task.name,
            num_envs=num_envs,
            sim_device=f"cuda:{device_id}",
            rl_device=f"cuda:{device_id}",
            seed=seed,
            headless=True,
            cfg=env_cfg,
        )
        self.num_envs = num_envs
        self.num_obs = self.env.observation_space.shape[0]
        self.num_actions = self.env.action_space.shape[0]
        self.max_episode_steps = self.env.max_episode_length

    def reset(self) -> torch.Tensor:
        """Reset all environments and clear cumulative reward/success stats."""
        with torch.no_grad():
            self.env.reset_idx(torch.arange(self.num_envs, device=self.env.device))
            self.env.cumulatives["rewards"][:] = 0
            self.env.cumulatives["success"][:] = 0
            obs_dict = self.env.reset()
        return obs_dict["obs"].detach()

    def step(
        self, actions: torch.Tensor
    ) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, dict]:
        """Step all environments with a batch of actions."""
        with torch.no_grad():
            obs_dict, rew, dones, infos = self.env.step(actions.detach())
        return obs_dict["obs"].detach(), rew.detach(), dones.detach(), infos
```
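Continuing from the snippet above, a minimal random-action rollout with the wrapper looks like this (the task name and sizes are just an example):

```python
env = MTBenchEnv("meta-world-v2-mt10", device_id=0, num_envs=4096, seed=0)
obs = env.reset()
for _ in range(env.max_episode_steps):
    # Replace the random actions below with your policy's output.
    actions = torch.rand(env.num_envs, env.num_actions, device=obs.device) * 2 - 1
    obs, rew, dones, infos = env.step(actions)
```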
Checkpoints are saved in the folder `runs/EXPERIMENT_NAME/nn`, where `EXPERIMENT_NAME` defaults to the task name but can also be overridden via the `experiment` argument.

To load a trained checkpoint and continue training, use the `checkpoint` argument:

```bash
python train.py task=meta-world-v2 checkpoint=runs/x/nn/x.pth
```

To load a trained checkpoint and only perform inference (no training), pass `test=True` along with the checkpoint name. To avoid rendering overhead, you may also want to run with fewer environments using `num_envs=64`:

```bash
python train.py task=meta-world-v2 checkpoint=runs/x/nn/x.pth test=True num_envs=64
```

We use Hydra to manage the config. Note that this has some differences from previous incarnations in older versions of Isaac Gym.
Key arguments to the `train.py` script are:

- `task=TASK` - selects which task to use. Any of `meta-world-v2`, `go1-benchmark` (these correspond to the config for each environment in the folder `isaacgymenvs/cfg/task`).
- `train=TRAIN` - selects which training config to use.
- `task_id` - selects which task IDs to use for training. This is a comma-separated list of integers, e.g. `task_id=[5,8,1]` will lay out tasks 5, 8, and 1 in that order. The number of task IDs must match the number of task counts specified in `task_counts`.
- `task_counts` - an array of integers that specifies how many environments to use for each task. For example, `task_counts=[512,512,512]` will use 512 environments for each of three tasks. The number of task counts must match the number of tasks specified in `task_id`.
- `num_envs=NUM_ENVS` - the total number of parallel environments; must equal the sum of the task counts specified in `task_counts`. For example, if you have 10 tasks with 512 environments each, you would set `num_envs=5120`.
- `seed=SEED` - sets a seed value for randomizations, and overrides the default seed set up in the task config.
- `sim_device=SIM_DEVICE_TYPE` - device used for physics simulation. Set to `cuda:0` (default) to use GPU and to `cpu` for CPU. Follows PyTorch-like device syntax.
- `rl_device=RL_DEVICE` - which device/ID to use for the RL algorithm. Defaults to `cuda:0`, and also follows PyTorch-like device syntax.
- `graphics_device_id=GRAPHICS_DEVICE_ID` - which Vulkan graphics device ID to use for rendering. Defaults to 0. Note that this may be different from the CUDA device ID, and does not follow PyTorch-like device syntax.
- `pipeline=PIPELINE` - which API pipeline to use. Defaults to `gpu`, can also be set to `cpu`. When using the `gpu` pipeline, all data stays on the GPU and everything runs as fast as possible. When using the `cpu` pipeline, simulation can run on either CPU or GPU, depending on the `sim_device` setting, but a copy of the data is always made on the CPU at every step.
- `test=TEST` - if set to `True`, only runs inference on the policy and does not do any training.
- `checkpoint=CHECKPOINT_PATH` - path to the checkpoint to load for training or testing.
- `headless=HEADLESS` - whether to run in headless mode.
- `experiment=EXPERIMENT` - sets the name of the experiment.
- `max_iterations=MAX_ITERATIONS` - sets how many iterations to run for.
- `exempted_tasks=[]` - an optional list of task IDs for which the starting step count is not randomized per environment. We find that randomizing the starting step count per environment improves performance, so it is enabled for all tasks by default.
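For example, a launch that keeps `task_id`, `task_counts`, and `num_envs` consistent (the specific task IDs and seed here are illustrative):

```bash
# 3 tasks x 512 envs each = 1536 total parallel environments.
python train.py task=meta-world-v2 task_id=[5,8,1] task_counts=[512,512,512] num_envs=1536 seed=42
```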
Hydra also allows setting variables inside config files directly as command-line arguments. For example, to set the discount factor for an rl_games training run, you can use `train.params.config.gamma=0.999`. Variables in task configs can be set similarly, e.g. `task.env.enableDebugVis=True`.
Default values for each of these are found in the `isaacgymenvs/cfg/config.yaml` file.
The `task` and `train` portions of the config work through Hydra config groups; you can learn more about how these work here. The actual configs for `task` are in `isaacgymenvs/cfg/task/<TASK>.yaml` and for `train` in `isaacgymenvs/cfg/train/<TASK>PPO.yaml`.
In some places in the config you will find other variables referenced (for example, `num_actors: ${....task.env.numEnvs}`). Each `.` represents going one level up in the config hierarchy. This is documented fully here.
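As a quick sanity check of that resolution rule, here is a minimal, self-contained OmegaConf sketch (the values are made up):

```python
from omegaconf import OmegaConf

# From train.params.config, the extra dots in ${....task.env.numEnvs}
# walk up three levels to the config root, where task.env.numEnvs lives.
cfg = OmegaConf.create({
    "task": {"env": {"numEnvs": 4096}},
    "train": {"params": {"config": {"num_actors": "${....task.env.numEnvs}"}}},
})
print(cfg.train.params.config.num_actors)  # -> 4096
```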
You can run WandB with Isaac Gym Envs by setting the `wandb_activate=True` flag on the command line. You can set the group, name, entity, and project for the run via the `wandb_group`, `wandb_name`, `wandb_entity`, and `wandb_project` arguments. Make sure you have WandB installed (`pip install wandb`) before activating.
Videos of the agent's behavior during training can be toggled with the `record_videos=True` flag and are uploaded to WandB automatically.
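For example, using the flags above (the entity and project names are placeholders):

```bash
python train.py task=meta-world-v2 wandb_activate=True wandb_entity=YOUR_ENTITY wandb_project=mtbench record_videos=True
```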
Please cite this work as:
```bibtex
@inproceedings{
joshi2025benchmarking,
title={Benchmarking Massively Parallelized Multi-Task Reinforcement Learning for Robotics Tasks},
author={Viraj Joshi and Zifan Xu and Bo Liu and Peter Stone and Amy Zhang},
booktitle={Reinforcement Learning Conference},
year={2025},
url={https://openreview.net/forum?id=z0MM0y20I2}
}
```