
VAD-LLM


👋👋👋 A collection of resources related to Large Language Models in video anomaly detection 🚨.

📌 For more details, please refer to our paper.

🛠️ If you find a mistake or have any suggestions, please let us know by e-mail: [email protected]

📑 Citation

If you find our work useful for your research, please cite the following paper:

@article{ding2024quo,
  title={Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight},
  author={Ding, Xi and Wang, Lei},
  journal={arXiv preprint arXiv:2412.18298},
  year={2024}
}

🚀 News

  • [27/12/2024] 🎁The GitHub repository for our paper has been released.
  • [25/12/2024] 🎄Our paper has been published on arXiv.

🔦 Table of Contents

  • (a) Temporal modeling
  • (b) Interpretability
  • (c) Training-free
  • (d) Open-world

We present a systematic evaluation of 13 closely related works from 2024 that use large language models (LLMs) and vision-language models (VLMs) for video anomaly detection (VAD). The analysis is organized around four key perspectives, each represented by a subfigure: (a) temporal modeling, (b) interpretability, (c) training-free operation, and (d) open-world detection. For each perspective, we highlight the strategies used, summarize key strengths and limitations, and outline promising directions for future research. The video frames used in the analysis are sourced from the MSAD dataset.

⚡⚡⚡ Comparison of methods released in 2024 for video anomaly detection (VAD)

We compare recent approaches in VAD, highlighting key aspects such as interpretability, temporal modeling, few-shot learning, and open-world detection. Performance is evaluated across six benchmark datasets: UCSD Ped2 (Ped2), CUHK Avenue (CUHK), ShanghaiTech (ShT), UCF-Crime (UCF), XD-Violence (XD), and UBnormal (UB). Datasets evaluated using Area Under the Curve (AUC) include Ped2, CUHK, ShT, UCF, and UB, while the XD dataset is evaluated using Average Precision (AP).

| Method | Code | LLM/VLM | Ped2 | CUHK | ShT | UCF | XD | UB |
|--------|------|---------|------|------|-----|-----|-----|-----|
| VLAVAD | - | Fine-tuning | 99.0 | 87.6 | 87.2 | -- | -- | -- |
| VADor | - | Fine-tuning | -- | -- | -- | 88.1 | -- | -- |
| OVVAD | - | Fine-tuning | -- | -- | -- | 86.4 | 66.5 | 62.9 |
| LAVAD | GitHub | Training-free | -- | -- | -- | 80.3 | 62.0 | -- |
| TPWNG | - | Fine-tuning | -- | -- | -- | 87.8 | 83.7 | -- |
| Holmes-VAD | GitHub | Fine-tuning | -- | -- | -- | 89.5 | 90.7 | -- |
| AnomalyRuler | GitHub | Fine-tuning | 97.9 | 89.7 | 85.2 | -- | -- | 71.9 |
| STPrompt | - | Fine-tuning | -- | -- | 97.8 | 88.1 | -- | 64.0 |
| Holmes-VAU | GitHub | Fine-tuning | -- | -- | -- | 89.0 | 87.7 | -- |
| VERA | - | Training-free | -- | -- | -- | 86.6 | 88.2 | -- |
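To make the two metrics above concrete, frame-level AUC and AP can be computed from per-frame anomaly scores and ground-truth labels. The sketch below is our own illustration using made-up scores (not results from any method in the table); it implements both metrics directly with the standard library:

```python
def frame_auc(labels, scores):
    """AUC: probability a random anomalous frame scores above a normal one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def frame_ap(labels, scores):
    """AP: precision at each recalled anomalous frame, averaged over positives."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos, hits, ap = sum(labels), 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank
    return ap / n_pos

# Hypothetical per-frame labels (1 = anomalous) and anomaly scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(frame_auc(labels, scores))  # 0.75
print(frame_ap(labels, scores))   # 0.8333...
```

In practice, AUC here corresponds to the Ped2/CUHK/ShT/UCF/UB columns and AP to the XD column of the table above.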

🕹️🕹️🕹️ Comparison of Different Sampling Strategies for Temporal Reasoning

Sampling Strategies Illustration

The figure presents the most popular sampling strategies for video tasks.

Sampling Strategies Table

Comparison of different sampling strategies for temporal reasoning.

| Sampling | Interval | Frame Count | Redundancy | Target Use Case | Cost |
|----------|----------|-------------|------------|-----------------|------|
| Uniform | Fixed | Medium | Medium | Global trend | High |
| Random | Random | Medium | Low | Data augmentation | High |
| Key frame | Adaptive | Low to Med. | Low | Key event extraction | Medium |
| Dense | One | High | High | Fine-grained modeling | Low |
| Sliding window | Adaptive | Medium | Medium | Local temporal details | Medium |
| Adaptive | Dynamic | High | Low | Comprehensive modeling | Medium |
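As a minimal sketch of three of these strategies (our own illustration, not code from any surveyed method), uniform, random, and sliding-window sampling can each be written as a small function that returns frame indices:

```python
import random

def uniform_sample(n_frames, k):
    """Uniform: k indices evenly spread over the video, one per equal segment."""
    step = n_frames / k
    return [int(step * i + step / 2) for i in range(k)]

def random_sample(n_frames, k, seed=None):
    """Random: k distinct indices drawn at random (useful for augmentation)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_frames), k))

def sliding_windows(n_frames, window, stride):
    """Sliding window: overlapping runs of consecutive indices for local detail."""
    return [list(range(s, s + window))
            for s in range(0, n_frames - window + 1, stride)]

print(uniform_sample(100, 4))        # [12, 37, 62, 87]
print(sliding_windows(10, 4, 2)[0])  # [0, 1, 2, 3]
```

Key-frame and adaptive sampling additionally depend on video content (e.g. scene changes or motion), so they cannot be expressed as index arithmetic alone.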

❤️‍🔥❤️‍🔥❤️‍🔥 Contribution

We warmly invite everyone to contribute to this repository and help enhance its quality and scope. Feel free to submit pull requests to add new papers, projects, or other useful resources, as well as to correct any errors you discover. To ensure consistency, please format your pull requests to match the structure of the tables above. We greatly appreciate your valuable contributions and support!
