|
Research
My research focuses on understanding motion and interactions in video, including multi-modal video understanding, dense motion understanding, and object-centric learning.
|
|
Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time
Chuhan Zhang,
Guillaume Le Moing,
Skanda Koppula,
Ignacio Rocco,
Liliane Momeni,
Junyu Xie,
Shuyang Sun,
Rahul Sukthankar,
Joëlle K. Barral,
Raia Hadsell,
Zoubin Ghahramani,
Andrew Zisserman,
Junlin Zhang,
Mehdi S. M. Sajjadi
ArXiv, 2025
ArXiv /
Bibtex /
Project page
@article{zhang2025d4rt,
title={Efficiently Reconstructing Dynamic Scenes One D4RT at a Time},
author={Zhang, Chuhan and Le Moing, Guillaume and Koppula, Skanda and Rocco, Ignacio and Momeni, Liliane and Xie, Junyu and Sun, Shuyang and Sukthankar, Rahul and Barral, Jo{\"e}lle K. and Hadsell, Raia and Ghahramani, Zoubin and Zisserman, Andrew and Zhang, Junlin and Sajjadi, Mehdi S. M.},
journal={arXiv preprint},
year={2025}
}
D4RT is a feedforward model that jointly infers depth, spatio-temporal correspondence, and camera parameters from a single video, using a unified transformer architecture and a novel querying mechanism. It achieves state-of-the-art performance on 4D reconstruction tasks.
|
|
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie,
Tengda Han, Max Bain, Arsha Nagrani, Eshika Khandelwal, Gül Varol, Weidi Xie, Andrew Zisserman
In ICCV, 2025
ArXiv /
Bibtex /
Project page /
Code /
Metric (Action Score)
@InProceedings{xie2025shotbyshot,
title={Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation},
author={Junyu Xie and Tengda Han and Max Bain and Arsha Nagrani and Eshika Khandelwal and G\"ul Varol and Weidi Xie and Andrew Zisserman},
booktitle={ICCV},
year={2025}
}
In this work, we introduce an enhanced two-stage training-free framework for Audio Description (AD) generation. We treat the "shot" as the fundamental unit of movies and TV series, incorporating shot-based temporal context and film grammar information into VideoLLM perception. Additionally, we formulate a new metric (Action Score) that assesses whether the predicted ADs capture the correct action information.
|
|
Character-Centric Understanding of Animated Movies
Zhongrui Gui,
Junyu Xie,
Tengda Han, Weidi Xie, Andrew Zisserman
In ACMMM, 2025
ArXiv /
Bibtex /
Project page /
Code
@InProceedings{gui2025character,
title={Character-Centric Understanding of Animated Movies},
author={Zhongrui Gui and Junyu Xie and Tengda Han and Weidi Xie and Andrew Zisserman},
booktitle={ACMMM},
year={2025}
}
To address the challenge of recognising highly variable animated characters, this work introduces a novel audio-visual pipeline and the CMD-AM dataset. Built around a multi-modal character bank, the pipeline generates audio descriptions and character-aware subtitles, significantly improving the accessibility of animated movies.
|
|
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
Junyu Xie,
Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
In ACCV, 2024  
ArXiv /
Bibtex /
Project page /
Code /
Dataset (TV-AD)
@InProceedings{xie2024autoad0,
title={AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description},
author={Junyu Xie and Tengda Han and Max Bain and Arsha Nagrani and G\"ul Varol and Weidi Xie and Andrew Zisserman},
booktitle={ACCV},
year={2024}
}
In this paper, we propose AutoAD-Zero, a training-free framework for zero-shot Audio Description (AD) generation for movies and TV series. The overall framework features two stages (dense description + AD summary), with character information injected via visual-textual prompting.
|
|
Moving Object Segmentation: All You Need Is SAM (and Flow)
Junyu Xie,
Charig Yang, Weidi Xie, Andrew Zisserman
In ACCV (Oral), 2024  
ArXiv /
Bibtex /
Project page /
Code
@InProceedings{xie2024flowsam,
title={Moving Object Segmentation: All You Need Is SAM (and Flow)},
author={Junyu Xie and Charig Yang and Weidi Xie and Andrew Zisserman},
booktitle={ACCV},
year={2024}
}
This paper focuses on motion segmentation by incorporating optical flow into the Segment Anything Model (SAM), applying flow information either as direct input (FlowISAM) or as prompts (FlowPSAM).
|
|
Appearance-Based Refinement for Object-Centric Motion Segmentation
Junyu Xie,
Weidi Xie, Andrew Zisserman
In ECCV, 2024  
ArXiv /
Bibtex /
Project page
@InProceedings{xie2024appearrefine,
title={Appearance-Based Refinement for Object-Centric Motion Segmentation},
author={Junyu Xie and Weidi Xie and Andrew Zisserman},
booktitle={ECCV},
year={2024}
}
This paper aims to improve flow-only motion segmentation (e.g. OCLR predictions) by leveraging appearance information across video frames. A selection-correction pipeline is developed, along with a test-time model adaptation scheme that further alleviates the Sim2Real disparity.
|
|
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Minghao Chen,
Junyu Xie,
Iro Laina, Andrea Vedaldi
In CVPR, 2024
ArXiv /
Bibtex /
Project page /
Code /
Demo
@InProceedings{chen2024shap,
title={SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds},
author={Chen, Minghao and Xie, Junyu and Laina, Iro and Vedaldi, Andrea},
booktitle={CVPR},
year={2024}
}
This paper presents SHAP-EDITOR, a method for fast 3D editing (within one second). To achieve this, we propose to learn a universal editing function that can be applied to different objects in a feed-forward manner.
|
|
Segmenting Moving Objects via an Object-Centric Layered Representation
Junyu Xie,
Weidi Xie, Andrew Zisserman
In NeurIPS, 2022  
ArXiv /
Bibtex /
Project page /
Code
@InProceedings{xie2022segmenting,
title = {Segmenting Moving Objects via an Object-Centric Layered Representation},
author = {Junyu Xie and Weidi Xie and Andrew Zisserman},
booktitle = {NeurIPS},
year = {2022}
}
In this paper, we propose the OCLR model for discovering, tracking and segmenting multiple moving objects in a video without relying on human annotations. This object-centric segmentation model utilises depth-ordered layered representations and is trained following a Sim2Real procedure.
|
This website template was originally designed by Jon Barron.
|
|