Skip to content

brave-eai/dexjoco

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

22 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

dexjoco logo

arXiv Project Homepage HF Dataset HF Models HF Paper License


DexJoCo is a MuJoCo-based simulation benchmark and toolkit for task-oriented dexterous manipulation. It provides 11 functionally grounded tasks covering πŸ› οΈ tool use, 🀝 bimanual coordination, ⏱️ long-horizon execution, and βš™οΈ reasoning, together with a low-cost teleoperation data collection system, replayable demonstrations, domain randomization, and OpenPI Ο€0.5 policy training/evaluation support.

Bimanual photograph
Bimanual photograph
Bimanual unlock iPad
Bimanual unlock iPad
Bimanual assembly
Bimanual assembly
Bimanual hanoi
Bimanual hanoi
Bimanual microwave
Bimanual microwave
Water plant
Water plant
Hammer nail
Hammer nail
Fold glasses
Fold glasses
Pinch tongs
Pinch tongs
Pick bucket
Pick bucket
Click mouse
Click mouse

Table of Contents

πŸš€ Installation

Create and activate the DexJoCo environment:

conda env create -f environment-dexjoco.yaml
conda activate dexjoco

OpenPI training and serving environment

cd openpi
bash install.bash
conda activate openpi

πŸ€– Policy Evaluation

Download model checkpoints from DexJoCo-Pi05 or train them yourself before evaluation.

Start an OpenPI policy server from the openpi environment:

cd openpi
conda activate openpi
python scripts/serve_policy.py --port=8000 policy:checkpoint \
  --policy.config water_plant \
  --policy.dir ../checkpoints/pi05_ckpts/water_plant/<exp_name>/<step>

Run evaluation from the repository root in the dexjoco environment:

conda activate dexjoco
dexjoco-openpi-eval \
  --config=./configs/rand_obj/water_plant.yaml \
  --seed=0 \
  --port=8000

For rand_full evaluation, use a config under configs/rand_full/ and pass --rand-full:

dexjoco-openpi-eval \
  --config=./configs/rand_full/water_plant.yaml \
  --seed=0 \
  --port=8000 \
  --rand-full

Convenience launch templates are available at scripts/serve_pi05.bash and scripts/evaluate_pi05.bash.

dexjoco-openpi-eval options:

Option Default Description
--config PATH Required Evaluation YAML under configs/rand_obj/ or configs/rand_full/.
--seed INT 0 Random seed for NumPy and Python random state.
--rand-full False Enables the rand_full evaluation regime.
--randomize-dynamics False Enables dynamics randomization.
--port INT 8000 OpenPI websocket policy server port.
--output PATH Auto-generated Output directory for videos and success-rate marker files.
--render-mode {rgb_array,human} rgb_array DexJoCo rendering mode. rgb_array is headless.
--episodes INT 50 Number of evaluation episodes to run.

See dexjoco/dexjoco_openpi_client/eval_dexjoco_openpi.py for the complete option set.

πŸ”Œ Custom Policy Integration

DexJoCo supports custom policy evaluation through the same environment contract used by the OpenPI client. Observations are collected from the simulator and passed to a policy for action inference. The resulting actions are executed in the environment.

Custom integrations should follow the protocol described in docs/custom_policy_integration.md, including:

  • observation fields for camera images, proprioceptive state, and prompts
  • action layout conversion from rotation-vector policy actions to quaternion environment actions
  • chunked action execution and replanning for latency-tolerant inference
  • optional multi-frame observation history
  • LeRobot-style async_inference integration patterns

πŸ“¦ Data Collection

Please refer to the teleoperation/ directory for the hardware and software configuration required for teleoperation:

Component Documentation
teleoperation overview teleoperation/README.md
hardware setup teleoperation/Teleoperation_System_Tutorial.pdf
Vive tracker bridge teleoperation/vive_bridge
Rokoko hand-keypoint bridge teleoperation/rokoko
GeoRT hand retargeting teleoperation/GeoRT

Supported tasks:

Task Setup Task Name
Unlock iPad Bimanual bimanual_unlock_ipad
Hanoi Bimanual bimanual_hanoi
Assembly Bimanual bimanual_assembly
Microwave cook Bimanual bimanual_microwave_cook
Photograph Bimanual bimanual_photograph
Hammer nail Single-arm hammer_nail
Click mouse Single-arm click_mouse
Pick bucket Single-arm pick_bucket
Pinch tongs Single-arm pinch_tongs
Fold glasses Single-arm fold_glasses
Water plant Single-arm water_plant

TODO: how to add new tasks?

Start demonstration recording from the repository root with scripts/record_demos_zarr.py:

conda activate dexjoco
python scripts/record_demos_zarr.py \
  --exp_name=water_plant \
  --successes_needed=20 \
  --randomize=True \
  --out_dir=./demos

The script opens the selected DexJoCo task, records successful teleoperation episodes, and writes each success as a replayable Zarr episode with camera videos.

Common record_demos_zarr.py options:

Flag Purpose
--exp_name Selects one of the task names.
--successes_needed Stops collection after the requested number of successful demos.
--randomize Enables the rand_full visual randomization regime used for data collection.
--show_sim_cameras Displays camera streams during interactive collection.
--save_depth Saves depth arrays and depth videos alongside RGB videos.
--out_dir Selects the output directory for collected demos.

scripts/record_demos_zarr.py supports --camera_screen_effect to display a camera viewfinder overlay, defaults to False.

🎬 Demonstration Replay

Raw DexJoCo datasets for replay are available from DexJoCo/DexJoCo-Datasets-Raw.

Replay recorded demonstrations with scripts/replay_demos_zarr.py:

conda activate dexjoco
python scripts/replay_demos_zarr.py \
  --exp_name=water_plant \
  --input_dir=./demos \
  --out_dir=./replay_output \
  --randomize=True \
  --restore_state=True

Replay runs through the policy interface, can restore recorded initial object poses and table height, and can generate rand_full visual variants through camera, lighting, and table texture randomization.

Common replay_demos_zarr.py options:

Flag Default Purpose
--exp_name water_plant Selects the task used to replay the demonstrations.
--input_dir ./ Directory containing recorded demo folders with replay.zarr.
--out_dir ./replay_output Output directory for replayed Zarr episodes and videos.
--randomize True Enables replay-time rand_full visual randomization with preset camera, lighting, and texture changes.
--seed 0 Base replay seed; the demo index is added for each input demo.
--save_depth False Saves depth arrays and depth videos alongside RGB replay videos.

See scripts/replay_demos_zarr.py for the complete option set.

scripts/replay_demos_zarr.py supports --camera_screen_effect to display a camera viewfinder overlay, defaults to False.

πŸ—‚οΈ Data Format

Each successful demonstration is written as:

<out_dir>/<exp_name>_demo_<index>_<timestamp>/
  replay.zarr/
  videos/<camera_key>.mp4
  videos/<camera_key>_depth.npz
  videos/<camera_key>_depth.mp4

Depth outputs are present when --save_depth=True.

The Zarr replay buffer stores low-dimensional episode data:

Field Description
action Recorded policy or teleoperation actions.
action_rotvec Action representation with orientation stored as rotation vectors when conversion is available.
timestamp Per-step timestamps derived from --data_fps.
state Proprioceptive and task state used by replay and state restoration when available.

Policy Training Action and State Layout

For bimanual demonstrations, the recorded action layout is:

[r_pose7, r_hand16, l_pose7, l_hand16]

The policy-mode DexJoCo environment expects the flat action layout:

[r_pose7, l_pose7, r_hand16, l_hand16]

During OpenPI evaluation, dexjoco/dexjoco_openpi_client/dexjoco_openpi_env.py handles the action order conversion automatically. It also converts rotation-vector actions into the quaternion pose representation used by the DexJoCo environment.

OpenPI training uses action_rotvec as the action target. The raw action field stores quaternion poses with 23 dimensions for single-arm tasks and 46 dimensions for bimanual tasks. action_rotvec stores the action layout with 22 dimensions for single-arm tasks and 44 dimensions for bimanual tasks.

The recorded state field includes privileged environment state for replay, such as object poses and table height. Policy training should use only robot proprioception:

Setup Policy State
Single-arm First 23 dimensions: TCP pose and hand joints
Bimanual First 46 dimensions: right TCP pose, left TCP pose, right hand, left hand

Privileged environment fields should be filtered out before training policy models.

πŸ”„ Data Conversion

Use dexjoco-data-converter/ to convert raw DexJoCo datasets into LeRobot datasets or Zarr replay buffers.

bash dexjoco-data-converter/install.bash
conda activate dexjoco-dc

dexjoco-dc-single-lerobot \
  --input ./datasets/raw/dexjoco_raw_datasets/water_plant \
  --output ./converted_datasets/lerobot/single/water_plant \
  --language-instruction "Grasp the watering can and apply water to the plant." \
  --selected-data-yaml "{action: action_rotvec, state: state, cameras: {front: front, wrist: wrist}}" \
  --slice-yaml "{state: [null, 23]}"

See dexjoco-data-converter/README.md for batch conversion, multi-task merge and configuration files.

βš™οΈ Policy Training

DexJoCo LeRobot datasets are available from DexJoCo/DexJoCo-Datasets-LeRobot.

OpenPI Ο€0.5 training support lives under openpi/. The OpenPI setup covers two DexJoCo data regimes:

Regime Randomization
rand_obj Object placement and table height randomization
rand_full rand_obj plus third-person camera, lighting, and table texture randomization

Training workflow:

  1. Install the OpenPI environment with openpi/install.bash.
  2. Place checkpoints and LeRobot datasets according to openpi/config.yaml.
  3. Convert the Ο€0.5 base checkpoint for 44-dimensional bimanual actions with openpi/scripts/convert_to_action_dim_44_model.py when training bimanual tasks.
  4. Compute normalization statistics with openpi/scripts/compute_norm_stats.py or openpi/scripts/compute_norm_stats.bash.
  5. Launch multiple tmux training jobs with openpi/scripts/launch_tmux_train.py, or train a single policy with openpi/scripts/train.py.

See openpi/README.md for command examples and checkpoint layout details.

πŸ–₯️ Headless Rendering

Headless environments use policy_mode=True and render_mode="rgb_array":

TaskConfig.get_environment(
    policy_mode=True,
    render_mode="rgb_array",
    ...
)

policy_mode=True exposes the policy action interface and disables the teleoperation wrapper. render_mode="rgb_array" selects offscreen rendering for policy evaluation and automated environment usage.

Interactive teleoperation collection uses policy_mode=False and the MuJoCo viewer, so it does not require the headless configuration.

πŸ“„ License

DexJoCo-owned code in this repository is released under the MIT License.

Bundled third-party components and assets retain their separate license terms:

Component License Scope
teleoperation/GeoRT Upstream non-commercial GeoRT license
franka_emika_panda Apache-2.0
wonik_allegro BSD-2-Clause
openpi/ Apache License, Version 2.0, plus Gemma model terms where applicable

πŸ“š Citation

@misc{wang2026dexjocobenchmarktoolkittaskoriented,
      title={DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo},
      author={Hanwen Wang and Weizhi Zhao and Xiangyu Wang and Siyuan Huang and He Lin and Boyuan Zheng and Rongtao Xu and Gang Wang and Yao Mu and He Wang and Lue Fan and Hongsheng Li and Zhaoxiang Zhang and Tieniu Tan},
      year={2026},
      eprint={2605.16257},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2605.16257},
}

Releases

No releases published

Packages

 
 
 

Contributors

Languages