[Feature] VITA Policy + FM Policy using DiT + minor bug fixes #580

Merged
zcyqyq merged 7 commits into RoboVerseOrg:main from gaodechen:vita on Nov 6, 2025
Conversation

@gaodechen
Contributor

@gaodechen gaodechen commented Oct 23, 2025

Description

New Features: two new IL policies (VITA, FM + DiT)
Bug Fixes

Type of change

1. VITA Policy [0]

VITA (Vision-to-Action flow matching) [0] is a new fast, high-performing policy learning algorithm. It flows directly from latent images to latent actions, without sampling from a Gaussian source or injecting conditions during denoising.

  • The PR provides a VITA implementation with all the losses and network components used in the paper.
  • Add flow matcher wrappers, including conditional FM, Optimal Transport FM [3], Schrödinger Bridge FM [4], etc., so that users can switch between FM algorithms by modifying the configs. The flow matcher wrappers provide an Euler solver, plus loss computation and sampling functions that support custom source distributions.
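
To make the flow matcher bullet concrete, here is a minimal sketch of conditional flow matching with a straight-line path and an Euler solver. It is not the wrapper code from this PR: the function names (`interpolate`, `target_velocity`, `cfm_loss`, `euler_sample`) are hypothetical, and plain Python lists stand in for latent tensors. The source `x0` is passed in explicitly, which is how a custom (non-Gaussian) source distribution such as a latent image could be plugged in.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from source x0 to target x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; constant in t and equal to x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def cfm_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted and the target velocity at x_t."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, tgt)) / len(tgt)


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical 2-D example: x0 plays the role of a latent image,
# x1 the role of a latent action.
x0 = [0.0, 1.0]
x1 = [2.0, -1.0]
oracle = lambda x, t: target_velocity(x0, x1)  # a perfect velocity model
sample = euler_sample(oracle, x0, num_steps=10)
```

With a perfect velocity model the loss is zero and the Euler solver lands on `x1`; a learned network would replace `oracle`.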

2. Flow Matching Policy

RoboVerse currently supports FM + UNet. This PR adds a DiT implementation that has been well tested in our previous papers; it can outperform FM + UNet and match the RoboVerse DP implementation.

  • Add Diffusion Transformer [1] (DiT, as in Vision Transformer) using AdaLN blocks
  • Add RoPE [2] to the positional embedders for DiT
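
As an illustration of the RoPE bullet, the sketch below applies a rotary position embedding to a single vector in plain Python. It follows the standard formulation from [2] (rotate each consecutive pair of features by an angle proportional to the position) and is not the embedder added in this PR; `rope` and `dot` are hypothetical helper names, and `base=10000.0` is the value used in the RoFormer paper.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle position * base**(-i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even feature dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Attention scores depend only on the relative offset (here 2 in both cases).
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 7), rope(k, 5))
```

Because each pair is only rotated, the vector norm is unchanged, and the query-key dot product depends on the position difference rather than on absolute positions.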

Local performance tests on CloseBox L0. We generate the datasets with the default collect_demos scripts, use the default DP runner configuration for all models, and conduct two random runs locally. FM + DiT outperforms FM + UNet, and VITA outperforms both DP and FM.

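| Policy    | Success rates (SR), two runs |
|-----------|------------------------------|
| DP        | 0.86, 0.78                   |
| FM + UNet | 0.78, 0.79                   |
| FM + DiT  | 0.85, 0.85                   |
| VITA      | 0.95, 0.95                   |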
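
For a quick comparison, the two runs per policy can be averaged. The snippet below is only arithmetic on the numbers reported above; the dictionary name `success_rates` is an illustrative choice.

```python
from statistics import mean

# Success rates from the two local runs on CloseBox L0, as reported above.
success_rates = {
    "DP": [0.86, 0.78],
    "FM + UNet": [0.78, 0.79],
    "FM + DiT": [0.85, 0.85],
    "VITA": [0.95, 0.95],
}

mean_sr = {name: mean(runs) for name, runs in success_rates.items()}
ranking = sorted(mean_sr, key=mean_sr.get, reverse=True)

for name in ranking:
    print(f"{name}: {mean_sr[name]:.3f}")
```

Two runs per policy is a small sample, so the means indicate a trend rather than a statistically firm ordering.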

[0] Gao, Dechen, et al. "VITA: Vision-to-Action Flow Matching Policy." arXiv preprint arXiv:2507.13231 (2025).
[1] Peebles, William, and Saining Xie. "Scalable diffusion models with transformers." Proceedings of the IEEE/CVF international conference on computer vision. 2023.
[2] Su, Jianlin, et al. "Roformer: Enhanced transformer with rotary position embedding." Neurocomputing 568 (2024): 127063.
[3] Tong, Alexander, et al. "Improving and generalizing flow-based generative models with minibatch optimal transport." arXiv preprint arXiv:2302.00482 (2023).
[4] Tong, Alexander, et al. "Simulation-free Schrödinger bridges via score and flow matching." arXiv preprint arXiv:2307.03672 (2023).

3. Minor fixes.

  • Remove repeated BaseImagePolicy declarations in the DP policies and use the base class from DP utils.
  • roboverse_learn/il/data2zarr_dp.py could be interrupted when the meta JSON was not successfully generated. Added a try/except block.
  • Use consistent all-lowercase file names: DDPM_model -> ddpm_model.
  • Fix a PosixPath type error in dp_runner.py: ckpt_name = args.checkpoint_path.split("/")[-1] + "_" + time_str is not compatible with newer pathlib versions.
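
The path fix in the last bullet can be sketched as follows. This is an assumption about the shape of the fix, not the code merged in dp_runner.py: `checkpoint_name` is a hypothetical helper, and the example paths are made up. The point is that `str.split("/")` only exists on strings, while `Path(...).name` works for both `str` and `Path` inputs.

```python
from pathlib import Path, PurePosixPath


def checkpoint_name(checkpoint_path, time_str):
    """Build '<last path component>_<time_str>' for a str or Path input."""
    return Path(checkpoint_path).name + "_" + time_str


# Works whether the argument arrives as a string or as a Path object.
from_str = checkpoint_name("ckpts/close_box", "20251023")
from_path = checkpoint_name(Path("ckpts/close_box"), "20251023")

# The original expression fails on Path objects, which have no .split method.
try:
    PurePosixPath("ckpts/close_box").split("/")
    split_failed = False
except AttributeError:
    split_failed = True
```

One behavioral difference to keep in mind: for a path with a trailing slash, `"ckpts/close_box/".split("/")[-1]` is an empty string, whereas `Path("ckpts/close_box/").name` is `"close_box"`.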

How to test

  • VITA Policy:
    Set algo_choose to vita_model (dp_run.sh)

  • FM Policy + DiT:
    Set algo_choose to fm_dit_model (dp_run.sh)

Checklist

  • I have run the pre-commit checks with pre-commit run --color=always --all-files
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@gaodechen gaodechen changed the title VITA Policy + FM Policy using DiT + minor bug fixes [Feature] VITA Policy + FM Policy using DiT + minor bug fixes Oct 23, 2025
@zcyqyq zcyqyq added this pull request to the merge queue Nov 6, 2025
Merged via the queue into RoboVerseOrg:main with commit 29eacb2 Nov 6, 2025
1 check passed
Morpheus-An pushed a commit to Morpheus-Antuo/RoboVerse that referenced this pull request Nov 6, 2025
…rseOrg#580)

* Remove noise scheduler from fm. Refactor dp and fm. Fix posix path.

* add DiT and FM DiT; lower cased file names

* Init VITA

* Remove repeated base policy

* pre-commit and update contrib.md

---------

Co-authored-by: Murphy <[email protected]>
myuansun pushed a commit to yongce-liu/RoboVerse that referenced this pull request Nov 27, 2025
…rseOrg#580)