This repository contains the EgoHOI codebase. It packages the release entrypoints, the egohoi modules, and a vendored diffsynth dependency required by the current code.
egohoi_release/
├── infer.py
├── egohoi/
│ ├── __init__.py
│ ├── camera.py
│ ├── dataset.py
│ ├── inference.py
│ └── model.py
└── diffsynth/
- infer.py: batch inference over an entire dataset split or dataset root.
- egohoi/inference.py: single-clip inference utility and shared inference helpers.
- egohoi/dataset.py: dataset loading, pose/object frame lookup, and camera embedding preparation.
- egohoi/model.py: inference-time model classes and conditioning modules.
- diffsynth/: local dependency used by the EgoHOI pipeline.
Recommended:
- Python 3.10+
- CUDA-enabled PyTorch environment
- A GPU with enough memory for Wan I2V inference
Install dependencies from the repository root:
pip install -r requirements.txt

Note: cupy-cuda12x is CUDA-version-specific. If your environment uses a different CUDA version, replace it with the matching CuPy package, or remove it if your runtime does not need it.
This repository does not include model weights or datasets. You must provide paths for:
- Wan DiT weights via --dit_path
- Text encoder weights via --text_encoder_path
- VAE weights via --vae_path
- Image encoder weights via --image_encoder_path
- Fine-tuned EgoHOI checkpoint via --checkpoint
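Since the same weight paths are passed to both entry points, it can be convenient to collect them in shell variables first. A minimal sketch (all paths below are placeholders; substitute the locations of your own downloads):

```shell
# Placeholder weight locations -- replace with your actual paths.
DIT_PATH=/path/to/dit_01.safetensors,/path/to/dit_02.safetensors
TEXT_ENCODER_PATH=/path/to/text_encoder.pth
VAE_PATH=/path/to/vae.pth
IMAGE_ENCODER_PATH=/path/to/image_encoder.pth
CHECKPOINT=/path/to/egohoi_checkpoint.pt

# Sanity-check: echo the sharded DiT path (comma-separated shards).
echo "dit: $DIT_PATH"
```

The variables can then be passed as `--dit_path "$DIT_PATH"` and so on in the commands below.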
infer.py accepts either a dataset root with split subdirectories or a single split directory directly.
Expected split layout:
SPLIT_ROOT/
├── videos/
├── saved_pose/
├── obj_mask/
└── camera_traj1/
If you pass a dataset root, it should look like:
DATA_ROOT/
├── train/
│ ├── videos/
│ ├── saved_pose/
│ ├── obj_mask/
│ └── camera_traj1/
└── val/
├── videos/
├── saved_pose/
├── obj_mask/
└── camera_traj1/
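The layout above can be checked (or scaffolded for a dry run) with a short shell loop. A sketch that builds the expected structure under a scratch directory and verifies every split has the four required subdirectories:

```shell
# Build the expected dataset-root layout under a scratch directory.
ROOT=$(mktemp -d)
for split in train val; do
  for d in videos saved_pose obj_mask camera_traj1; do
    mkdir -p "$ROOT/$split/$d"
  done
done

# Verify each split contains all four required subdirectories.
for split in train val; do
  for d in videos saved_pose obj_mask camera_traj1; do
    [ -d "$ROOT/$split/$d" ] && echo "ok: $split/$d"
  done
done
```

Pointing the verification loop at your real DATA_ROOT instead of the scratch directory is a quick way to catch a missing subdirectory before a long batch run.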
Optional hand-pose override for batch inference:
HAND_POSE_ROOT/
└── <clip_id>/
├── 000000.png
├── 000001.png
└── ...
Batch inference over a split or dataset root:
python infer.py \
--dataset_path /path/to/data_root \
--output_root outputs/batch \
--dit_path /path/to/dit_01.safetensors,/path/to/dit_02.safetensors \
--text_encoder_path /path/to/text_encoder.pth \
--vae_path /path/to/vae.pth \
--image_encoder_path /path/to/image_encoder.pth \
--checkpoint /path/to/egohoi_checkpoint.pt \
--num_frames 81 \
--height 480 \
--width 480 \
--output_fps 24 \
--torch_dtype bf16

Outputs are written under:
<output_root>/<split>/<clip_id>.mp4
Single-clip inference:
python egohoi/inference.py \
--dataset_path /path/to/data_root \
--split train \
--clip_id clip-001947 \
--output_path outputs/single/clip-001947.mp4 \
--dit_path /path/to/dit_01.safetensors,/path/to/dit_02.safetensors \
--text_encoder_path /path/to/text_encoder.pth \
--vae_path /path/to/vae.pth \
--image_encoder_path /path/to/image_encoder.pth \
--checkpoint /path/to/egohoi_checkpoint.pt \
--num_frames 81 \
--height 480 \
--width 480 \
--output_fps 24 \
--torch_dtype bf16

Useful options:
- --skip_existing: skip clips that already have output videos.
- --splits train val: restrict batch inference to selected splits.
- --max_clips_per_split N: run a smoke test on a subset.
- --hand_pose_root /path/to/override_pose_root: override the default pose directory for batch inference.
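These options can be combined for a quick smoke test before a full batch run. A sketch of such an invocation (all weight and data paths are placeholders, as above):

```shell
# Smoke test: two clips from the val split only, skipping any clips
# that already have output videos. Paths are placeholders.
python infer.py \
  --dataset_path /path/to/data_root \
  --output_root outputs/smoke \
  --dit_path /path/to/dit_01.safetensors,/path/to/dit_02.safetensors \
  --text_encoder_path /path/to/text_encoder.pth \
  --vae_path /path/to/vae.pth \
  --image_encoder_path /path/to/image_encoder.pth \
  --checkpoint /path/to/egohoi_checkpoint.pt \
  --splits val \
  --max_clips_per_split 2 \
  --skip_existing
```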