# TeaserGen: Generating Teasers for Long Documentaries

Weihan Xu, Paul Pu Liang, Haven Kim, Julian McAuley, Taylor Berg-Kirkpatrick, Hao-Wen Dong

*The International Conference on Learning Representations (ICLR), 2025*

[Paper] [Demo Page] [Pretrained Model]

This repository contains the official implementation of "TeaserGen: Generating Teasers for Long Documentaries". (The codebase is still under construction.)
- Overview
- Prerequisites
- Dataset Annotation and Processing
- Narration Generation
- TeaserGen-PT
- TeaserGen-LR
- Evaluation
- Reproducibility
- Interactive Demo on Gradio: Coming Soon!
- Acknowledgement
- Final Note
- Citation
## Prerequisites

```sh
conda env create -f newgpt.yml
```

## Dataset Annotation and Processing

We annotate the point that separates the teaser from the main documentary and save the annotations in `annotation/annotation.csv`.
You can find the detailed data processing code under `./data_preprocessing`.

The general data processing pipeline is:

- Download raw data from YouTube: `./data_preprocessing/video_download.py`
- Preprocess the raw video by separating the audio track from the video: `./data_preprocessing/video_preprocessing.py`
- Audio separation: `./data_preprocessing/audio_preprocess.py`
- Transcription with timestamped Whisper or WhisperX: `./data_preprocessing/whisperx.py`
- Prepare your CLIP features: extract frames from the video with `./data_preprocessing/frame_extraction.py`, then run `./data_preprocessing/clip_frame_feat_extractor.py` and `./data_preprocessing/clip_text_feat_extractor.py` (a sketch of this step follows the list). We also provide additional scene-level processing in `scene_process.py`.
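For orientation, here is a minimal sketch of what the frame-feature extraction step might look like, using the Hugging Face `transformers` CLIP implementation. The checkpoint, frame directory, and output format below are assumptions for illustration; the repository's scripts may use a different CLIP backend and layout.

```python
# Minimal sketch of CLIP frame-feature extraction. Assumption: frames were
# already extracted to a directory of JPEGs by frame_extraction.py; the
# repo's scripts may differ in checkpoint and output format.
from pathlib import Path

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

frame_dir = Path("frames")  # hypothetical output directory of frame_extraction.py
features = []
with torch.no_grad():
    for frame_path in sorted(frame_dir.glob("*.jpg")):
        image = Image.open(frame_path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        feat = model.get_image_features(**inputs)                # shape (1, 512)
        features.append(feat / feat.norm(dim=-1, keepdim=True))  # L2-normalize

frame_feats = torch.cat(features)  # shape (num_frames, 512)
torch.save(frame_feats, "clip_frame_feats.pt")
```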
## Narration Generation

- Feed your transcribed text to `narr_gen/text_gpt4.py` to generate narrations (a sketch of this step follows the list).
- Get the audio track and corresponding audio length with `narr_gen/text2speech.py`.
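As a rough illustration of the narration step, the sketch below prompts GPT-4 with a transcript using the official `openai` Python client (v1 style). The prompt, model name, and temperature are illustrative assumptions, not the exact settings used in `narr_gen/text_gpt4.py`.

```python
# Illustrative only: the actual prompt and parameters live in narr_gen/text_gpt4.py.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_narration(transcript: str) -> str:
    """Ask GPT-4 to condense a documentary transcript into a teaser narration."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You write short, engaging teaser narrations for documentaries."},
            {"role": "user",
             "content": f"Write a teaser narration for this transcript:\n\n{transcript}"},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content


narration = generate_narration(open("transcript.txt").read())
print(narration)
```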
## TeaserGen-PT

- To use the pretrained model from UniVTG, see `./teasergen-pt/gpt_tf_queue.py`.
- To use the model finetuned on DocumentaryNet, see `./teasergen-pt/gpt_ft_queue.py`.
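For intuition, the sketch below shows the general pattern of matching each narration sentence to a video interval while avoiding reuse of overlapping intervals. The `ground` callable is a hypothetical stand-in for a UniVTG-style temporal grounding model, and the overlap bookkeeping is a simplification; the actual selection logic lives in the two scripts above.

```python
# Conceptual sketch of narration-to-interval matching. `ground` is a
# hypothetical stand-in for a temporal grounding model that returns
# candidate (start_sec, end_sec, score) intervals for a text query.
from typing import Callable

Interval = tuple[float, float, float]  # (start_sec, end_sec, score)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def select_intervals(sentences: list[str],
                     ground: Callable[[str], list[Interval]]) -> list[tuple[float, float]]:
    """Pick one interval per narration sentence, skipping intervals already used."""
    used: list[tuple[float, float]] = []
    picks: list[tuple[float, float]] = []
    for sentence in sentences:
        # Try candidates from highest to lowest grounding score.
        for start, end, _ in sorted(ground(sentence), key=lambda x: -x[2]):
            if not any(overlaps((start, end), u) for u in used):
                used.append((start, end))
                picks.append((start, end))
                break
    return picks
```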
## TeaserGen-LR

- Prepare your training dataset: `./teasergen_lr/prepare_dataset.py`
- Training: `./teasergen_lr/train_epoch.py` or `./teasergen_lr/train_step.py`
- Decoding: `./teasergen_lr/inference.py` (a sketch of typical post-processing follows the list)
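As an illustration only, assuming the decoder yields a one-dimensional per-second score array (the real output format of `inference.py` may differ), a common post-processing step thresholds the array and merges consecutive selections into time intervals:

```python
# Assumption: a 1-D per-second score array; this shows one generic way to
# turn such an array into contiguous time intervals, not the repo's exact code.
import numpy as np


def scores_to_intervals(scores: np.ndarray, threshold: float = 0.5,
                        min_len: int = 2) -> list[tuple[int, int]]:
    """Threshold the score array and merge consecutive selected seconds."""
    selected = scores >= threshold
    intervals, start = [], None
    for t, on in enumerate(selected):
        if on and start is None:
            start = t
        elif not on and start is not None:
            if t - start >= min_len:
                intervals.append((start, t))
            start = None
    if start is not None and len(selected) - start >= min_len:
        intervals.append((start, len(selected)))
    return intervals


print(scores_to_intervals(np.array([0.1, 0.9, 0.8, 0.2, 0.7, 0.9, 0.9])))
# -> [(1, 3), (4, 7)]
```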
## Evaluation

- Generate the VTG score array: `./eval/vtgscore.py`
- Evaluate the finetuned highlight detection model: `./eval/highlight_eval.py`
- Run the overall evaluation: `./eval/evaluation.py` (an illustrative metric sketch follows the list)
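As a rough illustration of interval-based evaluation, the sketch below computes frame-level precision/recall/F1 between predicted and ground-truth intervals. This is a generic metric for orientation, not necessarily the exact metrics implemented in `./eval`.

```python
# Illustrative frame-level precision/recall/F1 between predicted and
# ground-truth (start_sec, end_sec) intervals; the repo's metrics may differ.
import math


def rasterize(intervals, duration, fps=1.0):
    """Mark each frame covered by any (start_sec, end_sec) interval."""
    n = math.ceil(duration * fps)
    mask = [False] * n
    for start, end in intervals:
        for i in range(int(start * fps), min(n, math.ceil(end * fps))):
            mask[i] = True
    return mask


def frame_f1(pred, gold, duration, fps=1.0):
    p, g = rasterize(pred, duration, fps), rasterize(gold, duration, fps)
    tp = sum(a and b for a, b in zip(p, g))
    precision = tp / max(1, sum(p))
    recall = tp / max(1, sum(g))
    return 2 * precision * recall / max(1e-9, precision + recall)


print(frame_f1([(0, 10), (30, 40)], [(5, 15), (30, 35)], duration=60))
```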
## Reproducibility

We provide all decoded arrays (TeaserGen-LR) and selected time intervals (TeaserGen-PT) in the `./reproducibility` folder. We also provide the pretrained models on the shared Google Drive.
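To inspect these artifacts, something like the following should work, assuming the decoded arrays are stored as NumPy `.npy` files (the filename below is hypothetical; list `./reproducibility` to see the actual contents):

```python
# Assumption: arrays are stored as .npy files; the filename is hypothetical.
import numpy as np

decoded = np.load("reproducibility/example_video.npy")
print(decoded.shape, decoded.dtype)
```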
## Interactive Demo on Gradio: Coming Soon!

We will release an interactive demonstration shortly.
## Acknowledgement

- Transcription is based on WhisperX.
- Audio separation is based on CDX.
- TeaserGen-PT is based on UniVTG.

We thank the authors for their open-source contributions.
## Final Note

Due to copyright concerns, we are unable to release the raw data. If you encounter any issues with your data or have any questions, feel free to reach out to Weihan at [email protected].
## Citation

```bibtex
@inproceedings{xu2025teasergen,
  title={TeaserGen: Generating Teasers for Long Documentaries},
  author={Weihan Xu and Paul Pu Liang and Haven Kim and Julian McAuley and Taylor Berg-Kirkpatrick and Hao-Wen Dong},
  booktitle={International Conference on Learning Representations},
  year={2025}
}
```

