
TeaserGen: Generating Teasers for Long Documentaries

This repository contains the official implementation of "TeaserGen: Generating Teasers for Long Documentaries". (The codebase is still under construction.)

TeaserGen: Generating Teasers for Long Documentaries

Weihan Xu, Paul Pu Liang, Haven Kim, Julian McAuley, Taylor Berg-Kirkpatrick, Hao-Wen Dong

The International Conference on Learning Representations (ICLR), 2025

[Paper] [Demo Page] [Pretrained Model]

Contents

  1. Overview
  2. Prerequisites
  3. Dataset Annotation and Processing
  4. Narration Generation
  5. TeaserGen-PT
  6. TeaserGen-LR
  7. Evaluation
  8. Reproducibility
  9. Interactive Demo
  10. Acknowledgement
  11. Final Note
  12. Citation

Overview

Dataset Annotation Overview (figure)

TeaserGen-PT (figure)

TeaserGen-LR (figure)

Prerequisites

conda env create -f newgpt.yml

Dataset Annotation and Processing

We annotate the point separating the teaser from the main documentary and save the annotations in annotation/annotation.csv.

You can find the detailed data processing code under ./data_preprocessing.

The general pipeline for data processing:

  1. Download raw data from YouTube: ./data_preprocessing/video_download.py

  2. Preprocess raw video by separating the audio track from the video: ./data_preprocessing/video_preprocessing.py

  3. Audio Separation: ./data_preprocessing/audio_preprocess.py

  4. Transcription with timestamped Whisper or WhisperX: ./data_preprocessing/whisperx.py

  5. Prepare your CLIP features: extract frames from the video with ./data_preprocessing/frame_extraction.py, then run ./data_preprocessing/clip_frame_feat_extractor.py and ./data_preprocessing/clip_text_feat_extractor.py (see the sketch after this list)
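
As a reference for step 5, here is a minimal sketch of one-frame-per-second extraction and CLIP encoding, assuming OpenCV and the Hugging Face transformers CLIP ViT-B/32 checkpoint; the repo's extractors may use a different CLIP variant, sampling rate, and file layout.

```python
# Minimal sketch: sample frames at ~1 fps and encode them with CLIP.
# Assumes OpenCV + Hugging Face transformers (ViT-B/32); the repo's
# scripts may differ in model variant, sampling rate, and outputs.
import cv2
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

cap = cv2.VideoCapture("video.mp4")
step = max(1, int(round(cap.get(cv2.CAP_PROP_FPS))))  # ~1 frame per second
frames, idx = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % step == 0:
        frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    idx += 1
cap.release()

with torch.no_grad():
    img_in = processor(images=frames, return_tensors="pt")
    frame_feats = model.get_image_features(**img_in)      # (N, 512)
    txt_in = processor(text=["a sample narration sentence"],
                       return_tensors="pt", padding=True, truncation=True)
    text_feats = model.get_text_features(**txt_in)        # (1, 512)

# Normalize so dot products are cosine similarities, then save.
frame_feats = frame_feats / frame_feats.norm(dim=-1, keepdim=True)
text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
torch.save({"frames": frame_feats, "text": text_feats}, "clip_feats.pt")
```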

We also provide additional scene-level processing in scene_process.py.

Narration Generation

  1. Input your transcribed text into narr_gen/text_gpt4.py to generate narrations

  2. Synthesize the audio track and get the corresponding audio length with narr_gen/text2speech.py (a sketch of both steps follows)
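
A minimal sketch of both steps, assuming the OpenAI Python SDK (v1) for GPT-4 and text-to-speech, plus pydub for measuring duration; the actual prompt and TTS backend live in narr_gen/text_gpt4.py and narr_gen/text2speech.py, and the prompt here is only a placeholder.

```python
# Minimal sketch: GPT-4 narration + TTS + audio length, assuming the
# OpenAI Python SDK (v1) and pydub. The real prompt and TTS backend
# are in narr_gen/text_gpt4.py and narr_gen/text2speech.py.
from openai import OpenAI
from pydub import AudioSegment

client = OpenAI()  # expects OPENAI_API_KEY in the environment

transcript = open("transcript.txt").read()
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You write concise teaser narrations."},
        {"role": "user",
         "content": f"Write a teaser narration for this documentary:\n{transcript}"},
    ],
)
narration = resp.choices[0].message.content

# Synthesize speech, then measure the length needed for pairing the
# narration with retrieved video intervals.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=narration)
speech.write_to_file("narration.mp3")
print(AudioSegment.from_file("narration.mp3").duration_seconds, "seconds")
```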

TeaserGen-PT

If you want to use the pretrained model from UniVTG, see ./teasergen-pt/gpt_tf_queue.py.

If you want to use the model finetuned on DocumentaryNet, see ./teasergen-pt/gpt_ft_queue.py.
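
The script names suggest a queue is used when pairing narration sentences with retrieved intervals. Purely as an illustration of that idea (not the repo's exact logic), the toy below picks the best-scoring segment for each narration sentence from a (sentence x segment) relevance matrix, e.g. UniVTG scores, while skipping segments chosen in the last few picks.

```python
# Toy queue-based selection, illustrative only: the actual logic
# lives in ./teasergen-pt/gpt_tf_queue.py and gpt_ft_queue.py.
from collections import deque

import numpy as np

def select_segments(scores: np.ndarray, queue_len: int = 5) -> list[int]:
    recent = deque(maxlen=queue_len)   # recently used segments to avoid repeats
    picks = []
    for row in scores:                 # one row of segment scores per sentence
        order = np.argsort(row)[::-1]  # candidate segments, best first
        choice = next((s for s in order if s not in recent), order[0])
        recent.append(choice)
        picks.append(int(choice))
    return picks

# Example: 3 narration sentences scored over 6 candidate segments.
rng = np.random.default_rng(0)
print(select_segments(rng.random((3, 6))))
```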

TeaserGen-LR

  1. Prepare your training dataset: ./teasergen_lr/prepare_dataset.py

  2. Training: ./teasergen_lr/train_epoch.py or ./teasergen_lr/train_step.py

  3. Decoding: ./teasergen_lr/inference.py (a sketch of turning a decoded array into time intervals follows)
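
As a hypothetical illustration of decoding, the helper below turns a 0/1 per-frame selection array into (start, end) intervals in seconds at a given sampling rate; the actual decoded format is defined by ./teasergen_lr/inference.py.

```python
# Hypothetical decoder: 0/1 per-frame selection -> (start, end) seconds.
# The real decoded format comes from ./teasergen_lr/inference.py.
import numpy as np

def array_to_intervals(selection: np.ndarray, fps: float = 1.0):
    intervals, start = [], None
    for i, keep in enumerate(selection):
        if keep and start is None:            # interval opens
            start = i
        elif not keep and start is not None:  # interval closes
            intervals.append((start / fps, i / fps))
            start = None
    if start is not None:                     # close a trailing interval
        intervals.append((start / fps, len(selection) / fps))
    return intervals

print(array_to_intervals(np.array([0, 1, 1, 0, 1])))  # [(1.0, 3.0), (4.0, 5.0)]
```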

Evaluation

  1. Generate the vtgscore array: ./eval/vtgscore.py

  2. Evaluate the finetuned highlight detection model: ./eval/highlight_eval.py

  3. Run the overall evaluation: ./eval/evaluation.py (a generic metric sketch follows)
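
For reference, a generic overlap-based precision/recall/F1 between predicted and annotated teaser intervals can be computed as below, assuming the intervals within each list do not overlap one another; the metrics actually reported by ./eval/evaluation.py are described in the paper.

```python
# Generic sketch: overlap precision/recall/F1 over (start, end) second
# intervals, assuming non-overlapping intervals within each list. Not
# necessarily the metrics used by ./eval/evaluation.py.
def interval_f1(pred, gt):
    total = lambda iv: sum(e - s for s, e in iv)
    inter = sum(max(0.0, min(e1, e2) - max(s1, s2))
                for s1, e1 in pred for s2, e2 in gt)
    p = inter / total(pred) if pred else 0.0
    r = inter / total(gt) if gt else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(interval_f1([(0, 10), (20, 30)], [(5, 25)]))  # (0.5, 0.5, 0.5)
```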

Reproducibility

We put all decoded arrays (TeaserGen-LR) and time intervals (TeaserGen-PT) in the ./reproducibility folder. We also provide the pretrained models on the shared Google Drive.
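
To sanity-check the provided artifacts, the arrays can be loaded with NumPy; the filename below is hypothetical, so check ./reproducibility for the actual names.

```python
# Hypothetical filename -- check ./reproducibility for the real names.
import numpy as np

decoded = np.load("reproducibility/teasergen_lr_decoded.npy")
print(decoded.shape, decoded.dtype)
```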

Interactive Demo

We will release an interactive demo on Gradio shortly.

Acknowledgement

The transcription code is based on WhisperX.

The audio separation is based on CDX.

TeaserGen-PT is based on UniVTG.

We thank the authors for their open-source contributions.

Final Note

Due to copyright concerns, we are unable to release the raw data. If you encounter any issues with your data or have any questions, feel free to reach out to Weihan at [email protected].

Citation

@inproceedings{xu2025teasergen,
    title={TeaserGen: Generating Teasers for Long Documentaries},
    author={Weihan Xu and Paul Pu Liang and Haven Kim and Julian McAuley and Taylor Berg-Kirkpatrick and Hao-Wen Dong},
    booktitle={International Conference on Learning Representations},
    year={2025}
}
