UniTime

This repository provides the official PyTorch implementation of "Universal Video Temporal Grounding with Generative Multi-modal Large Language Models" (NeurIPS 2025).

🌐 Project Page · 📄 Paper · 🤗 Model

🔥 News

  • [2025.10] Released the code for data construction, training, and evaluation.
  • [2025.09] UniTime accepted to NeurIPS 2025!
  • [2025.06] Released the inference code.
  • [2025.06] Preprint available on arXiv.

⚙️ Installation

conda create -n UniTime python=3.10
conda activate UniTime
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
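
To verify the installation, a quick sanity check (a minimal sketch; the CUDA status printed depends on your local driver setup):

# Minimal sanity check for the environment above.
import torch, torchvision

print("torch:", torch.__version__)              # expect 2.1.2
print("torchvision:", torchvision.__version__)  # expect 0.16.2
print("cuda available:", torch.cuda.is_available())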

🚀 Quick Start

  1. Download Model Checkpoints

    • Obtain the pretrained checkpoints from Qwen2-VL-7B and UniTime.
    • Set model_local_path to your local path for Qwen2-VL-7B, and model_finetune_path to your UniTime checkpoint.
  2. Prepare Input Data

    • Create a JSON file for inference (e.g., data/test.json) following the format described in Data Preparation below, and pass its path via the data_path argument.
  3. Run Inference

    • Execute the following command to perform inference. Results are saved in the results/ directory (a minimal sketch for inspecting them follows this list).
    export CUDA_VISIBLE_DEVICES=0
    python inference.py --model_local_path path_to_qwen2vl7B \
         --model_finetune_path ckpt/unitime \
         --data_path data/test.json \
         --output_dir ./results/test \
         --nf_short 128
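
To take a quick look at the predictions, a minimal sketch, assuming the output file is named results.json under the chosen --output_dir (as in the eval_metrics.py command below); the per-entry fields are defined by inference.py:

# Peek at the saved inference output (minimal sketch).
import json
from pprint import pprint

with open("results/test/results.json") as f:  # path assumes --output_dir ./results/test
    preds = json.load(f)

print(f"loaded {len(preds)} entries")
pprint(preds[0] if isinstance(preds, list) else preds)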

Data Preparation

  1. Download the video and annotation files for each dataset from the corresponding source links.

  2. Create the input file following the format below:

    [
        {
            "qid": 0, 
            "id": "3MSZA", 
            "annos": [
                {
                    "query": "person turn a light on.",
                    "window": [[24.3, 30.4]]
                }
            ],
            "duration": 30.96,
            "video_path": "./videos/3MSZA.mp4",
            "mode": "mr",
        }
    ]

    Example construction code for Ego4D-NLQ can be found in datasets/data_ego4d.py (see the load_data_to_dict() function); modify it as needed for other datasets. A minimal conversion sketch is given after this list.

  3. (Optional) You may also download preprocessed annotations for each dataset from UniTime-Data.
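
For datasets without a ready-made converter, the input file can be assembled along these lines (a minimal sketch; the raw record below reuses the example from step 2, and the loading logic must be adapted per dataset, as datasets/data_ego4d.py does for Ego4D-NLQ):

# Build an input JSON in the format above (minimal sketch).
import json

# (video_id, video_path, duration_sec, query, [start_sec, end_sec])
raw = [
    ("3MSZA", "./videos/3MSZA.mp4", 30.96, "person turn a light on.", [24.3, 30.4]),
]

entries = [
    {
        "qid": qid,
        "id": vid,
        "annos": [{"query": query, "window": [window]}],
        "duration": duration,
        "video_path": path,
        "mode": "mr",  # as in the example above
    }
    for qid, (vid, path, duration, query, window) in enumerate(raw)
]

with open("data/test.json", "w") as f:
    json.dump(entries, f, indent=4)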

Training and Evaluation

Execute the following commands in sequence:

# Feature Extraction
bash scripts/feature.sh

# Training
bash scripts/train.sh

# Evaluation
bash scripts/eval.sh

# Metrics
python eval_metrics.py --res ./results/RUN_NAME/results.json

Note: Modify the arguments marked with ToModify in the code according to the following definitions:

| Argument | Description |
| --- | --- |
| path_to_qwen2vl7B | Path to the Qwen2-VL-7B model directory |
| path_to_feature_root | Root directory containing features for all datasets |
| path_to_video_root | Root directory containing all video files |
| path_to_train_data | Path to the training-set annotation file generated by datasets/data_ego4d.py |
| path_to_val_data | Path to the validation-set annotation file generated by datasets/data_ego4d.py |
| path_to_test_data | Path to the test-set annotation file generated by datasets/data_ego4d.py |
| path_to_feature_folder | Subfolder under path_to_feature_root for a specific dataset |
| RUN_NAME | Experiment identifier/name for this training run |
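
For reference, the standard moment-retrieval metric, Recall@1 at a temporal-IoU threshold, can be sketched as follows (an illustration of the metric, not the exact logic of eval_metrics.py):

# Temporal IoU and Recall@1 at an IoU threshold (illustrative sketch;
# eval_metrics.py is the authoritative implementation for this repo).
def temporal_iou(pred, gt):
    """IoU between two [start, end] windows in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_1(preds, gts, threshold=0.5):
    """Fraction of queries whose top-1 window reaches the tIoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)

print(recall_at_1([[24.0, 31.0]], [[24.3, 30.4]]))  # -> 1.0 (tIoU ~= 0.87)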

Citation

If you use this code and data for your research or project, please cite:

@inproceedings{unitime2025,
    title={Universal Video Temporal Grounding with Generative Multi-modal Large Language Models},
    author={Li, Zeqian and Di, Shangzhe and Zhai, Zhonghua and Huang, Weilin and Wang, Yanfeng and Xie, Weidi},
    booktitle={NeurIPS},
    year={2025}
}

Acknowledgements

This project builds upon several excellent open-source efforts.

Contact

For questions, please contact: [email protected].
