JustDubit: Video Dubbing via Joint Audio-Visual Diffusion

📰 News

[2026/02/10] 🔥 Code, checkpoints, and data released
[2026/01/29] 🔥 Tech report released

📄 Abstract

Audio-Visual Foundation Models, which are pretrained to jointly generate sound and visual content, have recently shown an unprecedented ability to model multi-modal generation and editing, opening new opportunities for downstream tasks.

Among these tasks, video dubbing could greatly benefit from such priors, yet most existing solutions still rely on complex, task-specific pipelines that struggle in real-world settings.

In this work, we introduce a single-model approach that adapts a foundational audio-video diffusion model for video-to-video dubbing via a lightweight LoRA. The LoRA enables the model to condition on an input audio-video clip while jointly generating translated audio and synchronized facial motion.
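
For readers unfamiliar with LoRA, the following is a minimal sketch of the general low-rank adaptation idea, not the JustDubit implementation. It assumes a single linear layer with a frozen weight `W` and two small trainable matrices `A` and `B`. The function names, the rank, and the `alpha` scaling are illustrative assumptions; the README does not say which layers are adapted or which hyperparameters are used.

```python
# Generic LoRA sketch (illustrative only, not the JustDubit code).
# A frozen weight W (d_out x d_in) is augmented with a low-rank update
# B @ A, where A is (r x d_in) and B is (d_out x r). Only A and B would
# be trained; W stays fixed.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha):
    """Return W x + (alpha / r) * B (A x), where r is the rank of the update."""
    r = len(A)  # rank = number of rows in A
    base = matvec(W, x)               # output of the frozen layer
    update = matvec(B, matvec(A, x))  # low-rank correction
    return [b + (alpha / r) * u for b, u in zip(base, update)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (hypothetical values)
    A = [[1.0, 1.0]]              # rank-1 down-projection
    B_zero = [[0.0], [0.0]]       # B starts at zero, so the layer is unchanged
    B_trained = [[1.0], [0.0]]    # a hypothetical B after training
    x = [2.0, 3.0]

    print(lora_forward(x, W, A, B_zero, alpha=2.0))     # [2.0, 3.0]
    print(lora_forward(x, W, A, B_trained, alpha=2.0))  # [12.0, 3.0]
```

Initializing `B` to zero is a common choice because the adapted layer then starts out identical to the pretrained one, so training begins from the original model's behavior.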

To train this LoRA, we leverage the generative model itself to synthesize paired multilingual videos of the same speaker. Specifically, we generate multilingual videos with language switches within a single clip, and then inpaint the face and audio in each half to match the language of the other half.
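
The pairing scheme described above can be sketched as bookkeeping over clip halves. This is an illustration under stated assumptions, not the released data pipeline: the `Segment` class, the `build_pairs` function, and the single switch point are hypothetical, and the generation and inpainting steps themselves (performed by the audio-visual model) are not shown. The sketch also does not assume which member of each pair serves as the training input and which as the target.

```python
# Bookkeeping sketch for paired multilingual training data (hypothetical names).
# A generated clip speaks language `lang_first` before `switch_time` and
# `lang_second` after it. Each half is then inpainted (face and audio) to
# speak the language of the other half, giving two same-speaker pairs.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    clip_id: str
    start: float      # seconds
    end: float        # seconds
    language: str
    inpainted: bool   # True if face and audio were regenerated


def build_pairs(clip_id, duration, switch_time, lang_first, lang_second):
    """Return [(original_half, inpainted_half), ...] for one generated clip."""
    if not 0.0 < switch_time < duration:
        raise ValueError("switch_time must fall strictly inside the clip")

    first = Segment(clip_id, 0.0, switch_time, lang_first, inpainted=False)
    second = Segment(clip_id, switch_time, duration, lang_second, inpainted=False)

    # Each half is re-rendered in the language spoken in the other half.
    first_other = replace(first, language=lang_second, inpainted=True)
    second_other = replace(second, language=lang_first, inpainted=True)

    return [(first, first_other), (second, second_other)]


if __name__ == "__main__":
    for original, dubbed in build_pairs("clip0", 10.0, 5.0, "en", "fr"):
        print(original.language, "->", dubbed.language,
              f"[{original.start}s, {original.end}s]")
```

Each pair covers the same time span of the same speaker in two languages, which is the property the training data needs.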

By leveraging the rich generative prior of the audio-visual model, our approach preserves speaker identity and lip synchronization while remaining robust to complex motion and real-world dynamics. We demonstrate that our approach produces high-quality dubbed videos with improved visual fidelity, lip synchronization, and robustness compared to existing dubbing pipelines.

🚀 Quick Links

Resource	Description
Inference Pipeline	Run video dubbing with the JustDubit pipeline
Training Guide	Train your own JustDubit LoRA

📦 Repository Structure

just-dub-it/
├── packages/
│   ├── ltx-pipelines/     # Inference pipeline for video dubbing
│   │   └── README.md      # Pipeline usage guide
│   ├── ltx-trainer/       # Training tools for JustDubit LoRA
│   │   └── README.md      # Training guide
│   └── ltx-core/          # Core model components
└── README.md              # This file

🎬 Inference

See the Pipeline README (packages/ltx-pipelines/README.md) for:

Installation instructions
Model checkpoint downloads
Prompt format guide
CLI arguments reference

🏋️ Training

See the Trainer README (packages/ltx-trainer/README.md) for:

Dataset download and preparation
Preprocessing pipeline
Training configuration
Multi-GPU training setup
