
Video-Action-Recognition

🔥🔥🔥 The Journey of Action Recognition ✈️

👋👋👋 A collection of methods and datasets in the journey of action recognition.

📌 For more details, please refer to our paper.

🛠️ Please let us know by e-mail if you find a mistake or have any suggestions: [email protected]

📑 Citation

If you find our work useful for your research, please cite the following paper:

@inproceedings{10.1145/3701716.3717746,
author = {Ding, Xi and Wang, Lei},
title = {The Journey of Action Recognition},
year = {2025},
isbn = {9798400713316},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3701716.3717746},
doi = {10.1145/3701716.3717746},
booktitle = {Companion Proceedings of the ACM on Web Conference 2025},
pages = {1869–1884},
numpages = {16},
keywords = {action recognition, data, learning paradigm, model architectures},
location = {Sydney NSW, Australia},
series = {WWW '25}
}

🚀 News

  • [10/02/2025] 🎁 The GitHub repository for our paper has been released.
  • [27/01/2025] 🎈 Our paper has been accepted as an oral presentation at the Companion Proceedings of The Web Conference 2025 (WWW 2025)

🔦 Table of Contents

🧰 Methods Used in The Journey of Action Recognition

Handcrafted Methods

Table 1

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| HL-STIP | IJCV 2005 | Supervised | Outdoor scenes | RGB | - |
| Spatio-temporal Cuboids | VS-PETS 2005 | Supervised | Human Action Dataset | RGB | - |
| 3D-SURF | ECCV 2006 | Supervised | Mikolajczyk | RGB | - |
| 3D-SIFT | ACM MM 2007 | Supervised | Weizmann | RGB | - |
| NNMF Detector | ICCV 2007 | Supervised | KTH | RGB | - |
| HOG3D | BMVC 2008 | Supervised | KTH, Weizmann, Hollywood | RGB | - |
| Laptev et al. | CVPR 2008 | Supervised | KTH | RGB + Optical flow | - |
| Action MACH | CVPR 2008 | Supervised | KTH, Weizmann | RGB + Optical flow | - |
| Extended SURF | ECCV 2008 | Supervised | KTH, TRECVID 2006 | RGB | - |
| LTP | ICCV 2009 | Supervised | KTH, Hollywood, Kissing and slapping dataset, UCF Sports | RGB | - |
| Messing et al. | ICCV 2009 | Supervised | KTH | RGB | - |
| Bregonzio et al. | CVPR 2009 | Supervised | KTH, Weizmann | RGB | - |
| Tracklet Descriptors | ECCV 2010 | Supervised | KTH, ADL, Hollywood | RGB + Optical flow | - |
| Dense Long-Duration Trajectories | ICME 2010 | Supervised | KTH | RGB + Optical flow | - |
| Dense Trajectories | IJCV 2013 | Supervised | KTH, YouTube, Hollywood2, UCF Sports, IXMAS, Olympic Sports, UCF50, UIUC, HMDB51 | RGB + Optical flow | - |
| iDT | ICCV 2013 | Supervised | Hollywood2, HMDB51, Olympic Sports, UCF50 | RGB + Optical flow | - |
| Taylor videos | ICML 2024 | Supervised | HMDB51, CATER, MPII Cooking, Kinetics-400, -600, Something-Something V2, NTU RGB+D, Kinetics-skeleton | RGB + Skeleton | GitHub |

2D-based Methods

Table 2

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| Slow fusion | CVPR 2014 | Supervised | Sports-1M, UCF101 | RGB | GitHub |
| CNN-LSTM | CVPR 2015 | Supervised | Sports-1M, UCF101 | RGB + Optical flow | GitHub |
| LRCN | CVPR 2015 | Supervised | UCF101 | RGB + Optical flow | GitHub |
| Composite LSTM | ICML 2015 | Unsupervised | UCF101, HMDB51 | RGB | GitHub |
| Rank Pooling | TPAMI 2016 | Supervised | HMDB51, Hollywood2, MPII Cooking | RGB + Optical flow | - |
| LENN | CVPR 2016 | Supervised | UCF101 | RGB | - |
| Bilen et al. | TPAMI 2017 | Supervised | UCF101, HMDB51 | RGB | - |
| TSN | TPAMI 2018 | Supervised | HMDB51, UCF101, Kinetics-400, ActivityNet, THUMOS14 | RGB + RGB differences + Optical flow + Audio | GitHub |
| Attention-LSTM | CVPR 2018 | Supervised | UCF101, HMDB51, Kinetics-400 | RGB + Optical flow + Audio | GitHub |
| PEAR | ICME 2019 | Reinforcement | UCF101, Sports-1M | RGB + Optical flow | - |
| TSM | ICCV 2019 | Supervised | Something-Something V1, V2, Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| VINCE | arXiv 2020 | Self-supervised | Kinetics-400 | RGB | GitHub |
| C²LSTM | Neurocomputing 2020 | Supervised | UCF101, HMDB51 | RGB | - |
| MoCo | CVPR 2021 | Self-supervised | Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| TCL | CVPR 2021 | Semi-supervised + Contrastive | Mini-Something-V2, Kinetics-400, Charades-Ego | RGB | GitHub |
| TDN | CVPR 2021 | Supervised | Something-Something V1, V2, Kinetics-400 | RGB | GitHub |
| DB-LSTM | Neurocomputing 2021 | Supervised | UCF101, HMDB51 | RGB + Optical flow | - |
| SeCo | AAAI 2021 | Self-supervised | Kinetics-400, UCF101, HMDB51, ActivityNet | RGB | GitHub |
| Xiao et al. | CVPR 2022 | Semi-supervised + Contrastive | Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| GCSM | ACM MM 2023 | Few-shot | UCF101, HMDB51, Kinetics-400 | RGB | - |
| GgHM | ICCV 2023 | Few-shot | HMDB51, UCF101, Kinetics-400, Something-Something V2 | RGB | GitHub |

3D-based Methods

Table 3

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| C3D | ICCV 2015 | Supervised | UCF101 | RGB | GitHub |
| I3D | CVPR 2017 | Supervised | Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| P3D | ICCV 2017 | Supervised | Sports-1M, UCF101, ActivityNet | RGB | GitHub |
| ResNet3D | CVPR 2018 | Supervised | Kinetics-400, UCF101, HMDB51, ActivityNet | RGB | GitHub |
| S3D | ECCV 2018 | Supervised | Kinetics-400, Something-Something V1, UCF101, HMDB51 | RGB + Optical flow | GitHub |
| CSN | ICCV 2019 | Supervised | Sports-1M, Kinetics-400, Something-Something V1 | RGB | GitHub |
| SlowFast | ICCV 2019 | Supervised | Kinetics-400, Kinetics-600, Charades, AVA | RGB | GitHub |
| STM | ICCV 2019 | Supervised | Something-Something V1, Something-Something V2, Kinetics-400, UCF101, HMDB51 | RGB | - |
| DEEP-HAL | ICCV 2019 | Self-supervised | HMDB51, Charades, MPII Cooking | RGB + Optical flow | - |
| Xv et al. | CVPR 2019 | Self-supervised | UCF101, HMDB51 | RGB | - |
| X3D | CVPR 2020 | Supervised | Kinetics-400, Kinetics-600, Charades, AVA | RGB | GitHub |
| TPN | CVPR 2020 | Supervised | Kinetics-400, Something-Something V1, Something-Something V2, Epic-Kitchens | RGB | GitHub |
| SpeedNet | CVPR 2020 | Self-supervised | Kinetics-400, UCF101, HMDB51, NfS | RGB | GitHub |
| CoCLR | NeurIPS 2020 | Self-supervised | UCF101, HMDB51, Kinetics-400 | RGB + Optical flow | GitHub |
| VTHCL | arXiv 2020 | Self-supervised | Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| MvPL | ICCV 2021 | Semi-supervised | Kinetics-400, UCF101, HMDB51 | RGB + Optical flow | - |
| CVRL | CVPR 2021 | Self-supervised | Kinetics-400, Kinetics-600, UCF101, HMDB51 | RGB | GitHub |
| Yang et al. | CVPR 2021 | Supervised | Kinetics-400, Kinetics-700, Charades, Something-Something V1, AVA | RGB | - |
| 3DResNet+ATFR | CVPR 2021 | Supervised | Kinetics-400, Kinetics-600, UCF101, HMDB51, Something-Something V2 | RGB | - |
| MoViNet | CVPR 2021 | Supervised | Kinetics-400, Kinetics-600, Kinetics-700, Something-Something V2, Epic-Kitchens-100, MiT, Charades | RGB | GitHub |
| ODF+SDF | ACM MM 2021 | Self-supervised | HMDB51, Charades, MPII Cooking, EPIC-Kitchen | RGB + Optical flow + object/saliency detectors | - |
| CLASTER | ECCV 2022 | Reinforcement+Zero-shot | UCF101, HMDB51, Olympic Sports | RGB + Optical flow + Semantic embeddings | - |
| TFCNet | arXiv 2022 | Supervised | Diving48, CATER | RGB | - |
| Multi-Transforms | ICMEW 2024 | Self-supervised | UCF101, HMDB51 | RGB | - |
| HoT | ICASSP 2024 | Supervised | HMDB51, MPII Cooking | RGB + Optical flow | - |
| Flow corr. | ICASSP 2024 | Supervised | HMDB51, Charades, MPII Cooking | RGB + Optical flow | - |

Two-stream Methods

Table 4

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| Two-Stream ConvNet | NeurIPS 2014 | Supervised | UCF101, HMDB51 | RGB + Optical flow | GitHub |
| P-CNN | ICCV 2015 | Supervised | JHMDB, MPII Cooking | RGB + Optical Flow + Joint | - |
| TDD | CVPR 2015 | Supervised | HMDB51, UCF101 | RGB + Optical flow | GitHub |
| Two-Stream Fusion | CVPR 2016 | Supervised | UCF101, HMDB51 | RGB + Optical flow | GitHub |
| TSN-Two-Stream | ECCV 2016 | Supervised | HMDB51, UCF101 | RGB + RGB differences + Optical flow + Warped optical flow | GitHub |
| DOVF | CVPR 2017 | Supervised | UCF101, HMDB51 | RGB + Optical flow | GitHub |
| TLE | CVPR 2017 | Supervised | UCF101, HMDB51 | RGB + Optical flow | GitHub |
| ActionVLAD | CVPR 2017 | Supervised | HMDB51, UCF101, Charades | RGB + Optical flow | - |
| TRN-Two-Stream | ECCV 2018 | Supervised | Something-Something V1, Something-Something V2, Charades | RGB | GitHub |
| TSM-Two-Stream | ICCV 2019 | Supervised | Something-Something V1, Something-Something V2, Kinetics-400, UCF101, HMDB51 | RGB + Optical flow | GitHub |
| KTSN | arXiv 2020 | Supervised | FSD-10 | RGB + Optical flow + Skeleton | - |
| MSM-ResNets | IVC 2021 | Supervised | UCF101, HMDB51 | RGB + Optical Flow + Motion Saliency | - |
| MAT-EffNet | MMSys 2023 | Supervised | UCF101, HMDB51, Kinetics-400 | RGB + Optical flow | - |
| TTFA | SPL 2024 | Few-shot | Something-Something V2, Kinetics-400 | RGB + Optical flow | - |

(2+1)D-based Methods

Table 5

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| R(2+1)D | CVPR 2018 | Supervised | Kinetics-400, Sports-1M, UCF101, HMDB51 | RGB + Optical flow | GitHub |
| R(2+1)D+BERT | ECCVW 2020 | Supervised | HMDB51, UCF101 | RGB | GitHub |
| XDC | NeurIPS 2020 | Self-supervised | HMDB51, UCF101 | RGB + Audio | GitHub |
| ELo | CVPR 2020 | Self-supervised | Kinetics-400, UCF101, HMDB51 | RGB + Optical flow + Audio | - |
| Jin et al. | ICICSP 2021 | Supervised | UCF101 | RGB | - |
| GDT | arXiv 2021 | Self-supervised | Kinetics-400, UCF101, HMDB51 | RGB + Audio | - |
| AVID | CVPR 2021 | Self-supervised | Kinetics-400, UCF101, HMDB51 | RGB + Audio | GitHub |

Transformer-based Methods

Table 6

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| VTN | ICCV 2021 | Supervised | Kinetics-400, MiT | RGB | GitHub |
| TimeSformer | ICML 2021 | Supervised | Kinetics-400, Kinetics-600 | RGB | GitHub |
| STAM | arXiv 2021 | Supervised | Kinetics-400, UCF101, Charades | RGB | GitHub |
| ViViT | ICCV 2021 | Supervised | Kinetics-400, Kinetics-600, Epic-Kitchens-100, MiT, Something-Something V2 | RGB | GitHub |
| MViT | ICCV 2021 | Supervised | Kinetics-400, Kinetics-600, Something-Something V2, Charades, AVA | RGB | GitHub |
| Motionformer | NeurIPS 2021 | Supervised | Kinetics-400, Kinetics-600, Something-Something V2, Epic-Kitchens-100 | RGB | GitHub |
| X-ViT | NeurIPS 2021 | Supervised | Kinetics-400, Kinetics-600, Something-Something V2, Epic-Kitchens-100 | RGB | GitHub |
| TallFormer | ECCV 2022 | Supervised | THUMOS14, ActivityNet | RGB | GitHub |
| VideoSwin | CVPR 2022 | Supervised | Kinetics-400, Kinetics-600, Something-Something V2 | RGB | GitHub |
| ORViT | CVPR 2022 | Supervised | Something-Something V2, SomethingElse, Diving48, AVA, Epic-Kitchens-100 | RGB | GitHub |
| BEVT | CVPR 2022 | Self-supervised | Kinetics-400, Something-Something V2, Diving-48 | RGB | GitHub |
| MaskFeat | CVPR 2022 | Self-supervised | Kinetics-400, Kinetics-600, Kinetics-700 | RGB | GitHub |
| UniFormer | arXiv 2022 | Supervised | Kinetics-400, Kinetics-600, Something-Something V1, V2 | RGB | GitHub |
| VideoMAE | NeurIPS 2022 | Self-supervised | Kinetics-400, Something-Something V2, UCF101, HMDB51, AVA | RGB | GitHub |
| MTV | CVPR 2022 | Supervised | Kinetics-400, Kinetics-600, Kinetics-700, Something-Something V2, Epic-Kitchens-100, MiT | RGB | GitHub |
| MAE-ST | arXiv 2022 | Self-supervised | Kinetics-400, Something-Something V2, AVA | RGB | GitHub |
| CAST | NeurIPS 2023 | Supervised | Kinetics-400, Something-Something V2, Epic-Kitchens-100 | RGB | GitHub |
| UniFormerV2 | ICCV 2023 | Supervised+Contrastive | Kinetics-400, Kinetics-600, Kinetics-700, MiT, Something-Something V1, V2, ActivityNet, HACS | RGB | - |
| OmniMAE | CVPR 2023 | Self-supervised | Something-Something V2, Epic-Kitchens-100, Kinetics-400 | RGB | GitHub |
| MVD | CVPR 2023 | Self-supervised | Kinetics-400, Something-Something V2, UCF101, HMDB51 | RGB | GitHub |
| Hiera | ICML 2023 | Self-supervised | Kinetics-400, Kinetics-600, Kinetics-700, Something-Something V2, AVA | RGB | GitHub |
| VideoMAE V2 | CVPR 2023 | Self-supervised | Kinetics-400, Something-Something V2, UCF101, HMDB51 | RGB | GitHub |
| SOAP | ACM MM 2024 | Few-shot | Something-Something V2, Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| C2C | ECCV 2024 | Zero-shot | Sth-com | RGB | GitHub |
| VMPs | ACML 2024 | Supervised | HMDB51, MPII Cooking 2, FineGym | RGB + Motion prompts | GitHub |
| TIME Layer | arXiv 2024 | Self-supervised | UCF101, HMDB51, UWA3D Multiview Activity II, NTU RGB+D, NTU RGB+D 120 | RGB + Depth | - |

Skeletons-based Methods

Table 7

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| Dynamic Skeletons | CVPR 2015 | Supervised | MSRDailyActivity, CAD-60, SYSU 3D HOI | Depth + Joint | - |
| HBRNN-L | CVPR 2015 | Supervised | MSRAction3D, Berkeley MHAD, HDM05 | Joint | - |
| Part-aware LSTM | CVPR 2016 | Supervised | NTU RGB+D | RGB + Depth + Joint + Infrared | GitHub |
| LARP-SO | CVPR 2016 | Supervised | Florence3D-Action, MSRActionPairs3D, G3D-Gaming | Joint | - |
| STA-LSTM | AAAI 2017 | Supervised | NTU RGB+D | Joint | - |
| LieNet | CVPR 2017 | Supervised | NTU RGB+D, HDM05, G3D-Gaming | Joint + Bone | - |
| Two-Stream RNN | CVPR 2017 | Supervised | NTU RGB+D | Joint | - |
| Ke et al. | CVPR 2017 | Supervised | NTU RGB+D | Joint | - |
| VA-LSTM | ICCV 2017 | Supervised | NTU RGB+D, SYSU 3D HOI | Joint | GitHub |
| View Invariant | Pattern Recognit. 2017 | Supervised | NTU RGB+D, Northwestern-UCLA, UWA3D Multiview Activity II, MSRC-12 | Joint | - |
| Two-Stream CNN | ICMEW 2017 | Supervised | NTU RGB+D, PKU-MMD I | Joint + Skeleton motion | GitHub |
| LSTM-CNN | ICMEW 2017 | Supervised | NTU RGB+D | Joint | - |
| ST-LSTM+Trust Gate | TPAMI 2018 | Supervised | NTU RGB+D, MSRAction3D, SYSU 3D HOI, Berkeley MHAD | Joint | - |
| ST-GCN | AAAI 2018 | Supervised | Kinetics-400, NTU RGB+D | Joint | GitHub |
| Tang et al. | CVPR 2018 | Reinforcement | NTU RGB+D, SYSU 3D HOI, UTKinect-Action3D | Joint + Bone | - |
| AS-GCN | CVPR 2019 | Supervised | NTU RGB+D, Kinetics-400 | Joint + Bone | GitHub |
| 2s-AGCN | CVPR 2019 | Fully-supervised | NTU RGB+D, Kinetics-skeleton | Joint + Bone | GitHub |
| DGNN | CVPR 2019 | Supervised | NTU RGB+D, Kinetics-skeleton | Joint + Bone | GitHub |
| EfficientGCN | ACM MM 2020 | Supervised | NTU RGB+D, NTU RGB+D 120 | Joint + Velocity + Bone | - |
| RA-GCN | TCSVT 2020 | Supervised | NTU RGB+D, NTU RGB+D 120 | Joint + Bone | gitee |
| Shift-GCN | CVPR 2020 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | GitHub |
| MS-G3D | CVPR 2020 | Supervised | NTU RGB+D 60, NTU RGB+D 120, Kinetics-skeleton | Joint + Bone | GitHub |
| DSTA-Net | ACCV 2020 | Supervised | NTU RGB+D, NTU RGB+D 120 | Joint + Bone | - |
| SCK+DCK / SCK$\oplus$+DCK$\oplus$ | TPAMI 2020 | Supervised | UTKinect-Action3D, Florence3D-Action, MSRAction3D, NTU RGB+D 60, Kinetics-400, HMDB51, MPII Cooking | Joint | - |
| CTR-GCN | ICCV 2021 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | - |
| FGCN | TIP 2022 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | - |
| AGE-Ens | TNNLS 2022 | Supervised | NTU RGB+D, NTU RGB+D 120 | Joint + Bone | GitHub |
| PoseConv3D | CVPR 2022 | Supervised | Kinetics-400, UCF101, HMDB51 | Joint + Bone + RGB | GitHub |
| InfoGCN | CVPR 2022 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | GitHub |
| DASTM | ECCV 2022 | Few-shot | NTU RGB+D 120, Kinetics-skeleton | Joint + Bone | - |
| Uncertainty-DTW | ECCV 2022 | Supervised/Unsupervised few-shot | NTU RGB+D, NTU RGB+D 120, Kinetics-skeleton | Skeleton sequences | GitHub |
| TranSkeleton | TCSVT 2023 | Supervised | NTU RGB+D, NTU RGB+D 120 | Joint + Bone | - |
| HiCo | AAAI 2023 | Unsupervised + Contrastive | NTU RGB+D, NTU RGB+D 120, PKU-MMD I, PKU-MMD II | Joint | GitHub |
| FR-Head | CVPR 2023 | Supervised + Contrastive | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | GitHub |
| 3Mformer | CVPR 2023 | Supervised | NTU RGB+D, NTU RGB+D 120, Kinetics-400, Northwestern-UCLA | Joint + Hyper-edge | - |
| HYSP | ICLR 2023 | Self-supervised | NTU RGB+D, NTU RGB+D 120, PKU-MMD I | Joint | GitHub |
| PAINet | ICCV 2023 | Few-shot | NTU RGB+D 120, Kinetics-skeleton | Joint + Bone | - |
| PCM3 | ACM MM 2023 | Self-supervised | NTU RGB+D, NTU RGB+D 120, PKU-MMD I | Joint + Bone + Motion | GitHub |
| Stream-GCN | IJCAI 2023 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | - |
| SkeletonGCL | arXiv 2023 | Self-supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | GitHub |
| DSCNet | ESWA 2024 | Supervised + Multimodal | NTU RGB+D, NTU RGB+D 120, PKU-MMD I, UAV-Human, IKEA ASM, Northwestern-UCLA | RGB + Joint + Bone | - |
| Skeleton-OOD | Neurocomputing 2024 | Supervised | NTU RGB+D, NTU RGB+D 120, Kinetics-400 | Joint | GitHub |
| ViA | IJCV 2024 | Self-supervised | Posetics, NTU RGB+D, NTU RGB+D 120, Toyota Smarthome, UAV-Human, Penn Action | Joint + Motion | GitHub |
| DeGCN | TIP 2024 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | GitHub |
| Js-SaPR-GCN | TCSVT 2024 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone + Motion | - |
| BlockGCN | CVPR 2024 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone + Motion | GitHub |
| JEANIE | IJCV 2024 | Supervised/Unsupervised few-shot | NTU RGB+D, NTU RGB+D 120, Kinetics-skeleton, MSRAction3D, UWA3D Multiview Activity | Skeleton sequences | - |
| SA-DVAE | arXiv 2024 | Zero-shot | NTU RGB+D, NTU RGB+D 120, PKU-MMD I | Joint | GitHub |
| ProtoGCN | arXiv 2024 | Self-supervised + Prototype | NTU RGB+D, NTU RGB+D 120, Kinetics-skeleton, FineGYM | Joint | GitHub |
| HSIC-based | arXiv 2024 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | - |
| USDRL | AAAI 2025 | Self-supervised | NTU RGB+D, NTU RGB+D 120, PKU-MMD I, PKU-MMD II | Joint + Bone + Motion | GitHub |

Depth-based Methods

Table 8

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| HON4D | CVPR 2013 | Supervised | MSRAction3D, MSRDailyActivity3D, MSRActionPairs3D | Depth | - |
| HOPC | ECCV 2014 | Supervised | MSRAction3D, MSRActionPairs3D, UWA3D Multiview Activity | Depth + Point cloud | - |
| Wang et al. | Trans. Human-Mach. Syst. 2016 | Supervised | MSRAction3D, MSRDailyActivity3D, UTKinect-Action3D | Depth | - |
| Rahmani et al. | CVPR 2016 | Supervised | Northwestern-UCLA, UWA3D Multiview Activity II | Depth | - |
| S2DDI | ICCVW 2017 | Supervised | MSRAction3D, G3D-Gaming, MSRDailyActivity3D, SYSU 3D HOI, UTD-MHAD | Depth | - |
| Wang et al. | TMM 2018 | Supervised | NTU RGB+D | Depth | - |
| MVDI | Inf. Sci. 2018 | Supervised | NTU RGB+D, Northwestern-UCLA, UWA3D Multiview Activity II | Depth | GitHub |
| 3DFCNN | Multimed. Tools Appl. 2020 | Supervised | NTU RGB+D, Northwestern-UCLA, UWA3D Multiview Activity II | Depth | - |
| Liu et al. | ICASSP 2017 | Supervised | MSRAction3D, DHA | Depth | - |
| Dhiman et al. | TIP 2020 | Supervised | NTU RGB+D, UWA3D Multiview Activity II, Northwestern-UCLA | RGB + Depth | - |
| Stateful ConvLSTM | arXiv 2020 | Supervised | NTU RGB+D | Depth | - |
| DEAR | arXiv 2024 | Supervised | Something-Something V2 | RGB + Depth | GitHub |

Infrared-based Methods

Table 9

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| Gao et al. | Neurocomputing 2016 | Supervised | InfAR | Infrared + Optical flow | - |
| Jiang et al. | CVPRW 2017 | Supervised | InfAR | Infrared + Optical flow | - |
| Kawashima et al. | AVSS 2017 | Supervised | Custom Dataset | Infrared | - |
| Shah et al. | SPIE 2018 | Supervised | Custom IR Dataset | Infrared | - |
| TSTDDs | SPL 2018 | Supervised | InfAR, NTU RGB+D | Infrared + Optical flow | - |
| Akula et al. | CSR 2018 | Supervised | Custom IR Dataset | Infrared | - |
| Imran et al. | Infrared Phys. Technol. 2019 | Supervised | InfAR, IITR-IAR | Infrared + Optical flow | - |
| Meglouli et al. | CEAI 2019 | Supervised | InfAR | Infrared + Optical flow | - |
| Mehta et al. | ICPR 2020 | Adversarial | TSF | Infrared + Optical flow | GitHub |

Point Cloud Methods

Table 10

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| MeteorNet | ICCV 2019 | Supervised | MSRAction3D | Point cloud | GitHub |
| PointLSTM | CVPR 2020 | Supervised | MSRAction3D | Point cloud | GitHub |
| 3DV-PointNet++ | CVPR 2020 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA, UWA3D Multiview Activity II | Depth | GitHub |
| ASTA3DConv | Trans. Instrum. Meas. 2020 | Supervised | MSRAction3D | Point cloud | - |
| Wang et al. | WACV 2021 | Self-supervised | NTU RGB+D, NTU-PCL, MSRAction3D | Point cloud | - |
| P4Transformer | CVPR 2021 | Supervised | MSRAction3D, NTU RGB+D, NTU RGB+D 120 | Point cloud | GitHub |
| PSTNet | ICLR 2021 | Supervised | MSRAction3D, NTU RGB+D, NTU RGB+D 120 | Point cloud | GitHub |
| PST2 | WACV 2022 | Supervised | MSRAction3D | Point cloud | - |
| MaST-Pre | ICCV 2023 | Self-supervised | MSRAction3D, NTU RGB+D | Point cloud | GitHub |
| PointCPSC | ICCV 2023 | Self-supervised | MSRAction3D, NTU RGB+D | Point cloud | - |
| 3DInAction | CVPR 2024 | Supervised | MSRAction3D | Point cloud | GitHub |
| KAN-HyperpointNet | arXiv 2024 | Supervised | NTU RGB+D, MSRAction3D | Point cloud | - |

Text/Audio Methods

Table 11

| Model | Venue | Learning | Dataset | Modality | Code |
| --- | --- | --- | --- | --- | --- |
| CPD | arXiv 2020 | Self-supervised | Kinetics-400, HMDB51, UCF101 | RGB + Text | GitHub |
| G-Blend | CVPR 2020 | Multi-task | Kinetics-400, Mini-Sports, EPIC-Kitchen | RGB + Optical flow + Audio | - |
| MIL-NCE | CVPR 2020 | Self-supervised | HowTo100M, HMDB51, UCF101 | RGB + Text | GitHub |
| MMV | NeurIPS 2020 | Self-supervised | UCF101, HMDB51, Kinetics-600 | RGB + Audio + Text | GitHub |
| VIMPAC | arXiv 2021 | Self-supervised | Something-Something V2, Diving48, UCF101, HMDB51 | RGB + Text | GitHub |
| InternVideo | CVPR 2023 | Self-supervised | Kinetics-400, Kinetics-600, Kinetics-700, Something-Something V1, V2, ActivityNet, HACS, HMDB51 | RGB + Text | GitHub |
| Side4Video | arXiv 2023 | Self-supervised | Something-Something V1, Something-Something V2, Kinetics-400 | RGB + Text | GitHub |
| EZ-CLIP | arXiv 2024 | Zero-shot | Kinetics-400, HMDB51, UCF101, Something-Something V2 | RGB + Text | GitHub |
| SATA | arXiv 2024 | Zero-shot | UCF101, HMDB51 | RGB + Text | GitHub |
| TC-CLIP | ECCV 2024 | Zero-shot/Few-shot/Fully-supervised | HMDB51, UCF101, Kinetics-400, Something-Something V2 | RGB + Text | - |
| InternVideo2 | arXiv 2024 | Self-supervised + Multimodal | Kinetics-400, Kinetics-600, Kinetics-700, MiT, Something-Something V2, ActivityNet, HACS, Charades, HMDB51 | RGB + Audio + Text | GitHub |
| OmniViD | CVPR 2024 | Supervised | Kinetics-400, Something-Something V2, UCF101, HMDB51 | RGB + Text | GitHub |
| LoCATe-GAT | TETCI 2024 | Zero-shot | UCF101, HMDB51, ActivityNet, Kinetics-400 | RGB + Text | GitHub |
| STDD | arXiv 2024 | Zero-shot | Kinetics-600, UCF101, HMDB51 | RGB + Text | GitHub |

💻 Datasets Used in The Journey of Action Recognition

Table 12

| Datasets | Year | # Classes | # Subjects | # Views | # Video clips | Sensor | Modalities | Dataset type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KTH | 2004 | 6 | 25 | 1 | 2,391 | Static camera | RGB | Human actions (e.g., walking, jogging) |
| Weizmann | 2005 | 10 | 9 | 1 | 90 | - | RGB | Human actions (e.g., jumping, running) |
| IXMAS | 2006 | 11 | 10 | 5 | 330 | - | RGB | Movie Scenes (e.g., kissing, running) |
| Hollywood | 2008 | 8 | - | - | 1,422 | - | RGB | Movie Scenes (e.g., eating, driving) |
| Hollywood2 | 2009 | 12 | - | - | 1,709 | - | RGB | Movie Scenes (e.g., running, kissing) |
| ADL | 2009 | 10 | 5 | - | 150 | Static camera | RGB | Daily Activities (e.g., brushing teeth, reading) |
| Olympic Sports | 2010 | 16 | - | - | 783 | - | RGB | Sports (e.g., high jumping, diving) |
| MSRAction3D | 2010 | 20 | 10 | 1 | 567 | Kinect v1 | Depth+3DJoints | Daily Activities (e.g., drinking, walking) |
| CAD-60 | 2011 | 14 | 4 | - | 68 | Kinect v1 | RGB+Depth+3DJoints | Human performing activities (e.g., cleaning objects) |
| HMDB51 | 2011 | 51 | - | - | 6,766 | - | RGB | Human actions (e.g., jumping, running) |
| MSRDailyActivity3D | 2012 | 16 | 10 | 1 | 320 | Kinect v1 | RGB+Depth+3DJoints | Daily Activities (e.g., calling, playing game) |
| UCF101 | 2012 | 101 | - | - | 13,320 | - | RGB | Body motion, Human-object interactions, sports etc. |
| UTKinect-Action3D | 2012 | 10 | 10 | 1 | 199 | Kinect v1 | RGB+Depth+3DJoints | Human actions (e.g., waving hands, pushing) |
| MPII Cooking | 2012 | 64 | 12 | 1 | 3,748 | - | RGB | Cooking |
| G3D-Gaming | 2012 | 20 | 10 | 1 | - | Kinect v1 | RGB+Depth+3DJoints | Gaming scenario (e.g., defending, climbing) |
| Berkeley MHAD | 2013 | 11 | 12 | 4 | 660 | Multi-baseline stereo cameras | RGB+Depth+3DJoints+Accelerometer+Audio | Human actions (e.g., throwing, clapping hands) |
| CAD-120 | 2013 | 10 | 4 | - | 120 | Kinect v1 | RGB+Depth+3DJoints | Human performing activities (e.g., picking objects) |
| UCF50 | 2013 | 50 | - | - | 6,676 | - | RGB | Body motion, Human-object interactions, sports etc. |
| Florence3D-Action | 2013 | 9 | 10 | 1 | 215 | Kinect v1 | RGB+Depth+3DJoints | Human actions (e.g., bowing, drinking) |
| MSRActionPairs3D | 2013 | 12 | 10 | 1 | 360 | Kinect v1 | RGB+Depth+3DJoints | Human actions (e.g., picking up, putting down) |
| Sports-1M | 2014 | 487 | - | - | 1,000,000 | - | RGB | Sports (e.g., swimming, skiing) |
| THUMOS14 | 2014 | 101 | - | - | 5,613 | - | RGB | Human Actions (e.g., making up, archery) |
| Northwestern-UCLA | 2014 | 10 | 10 | 3 | 1,494 | Kinect v1 | RGB+Depth+3DJoints | Human actions (e.g., dropping trash) |
| UWA3D Multiview Activity | 2014 | 30 | 10 | 1 | 701 | Kinect v1 | RGB+Depth+3DJoints | Daily Activities (e.g., holding head, walking) |
| ActivityNet | 2015 | 203 | - | - | 27,801 | - | RGB | Human actions (e.g., drawing, washing) |
| MPII Cooking 2 | 2015 | 67 | 30 | 1 | 273 | Static camera | RGB | Cooking |
| UWA3D Multiview Activity II | 2015 | 30 | 9 | 4 | 1,070 | Kinect v1 | RGB+Depth+3DJoints | Daily Activities (e.g., waving head, jumping) |
| SYSU 3D HOI | 2015 | 12 | 40 | - | 480 | Kinect v1 | RGB+Depth+3DJoints | Human-Object Interactions (e.g., sweeping the floor) |
| NTU RGB+D | 2016 | 60 | 40 | 80 | 56,880 | Kinect v2 | RGB+Depth+3DJoints | Daily actions, health-related actions etc. |
| InfAR | 2016 | 12 | 40 | - | 600 | Infrared camera | Infrared | Human actions (e.g., jogging) |
| TSF | 2016 | 2 | - | 1 | 44 | FLIR ONE | Infrared | Falls and normal activities |
| Charades | 2016 | 157 | - | - | 66,500 | - | RGB+Flow | Indoor activities (e.g., cleaning) |
| PKU-MMD I | 2017 | 51 | 66 | 3 | 1,076 | Kinect v2 | RGB+Depth+Infrared+3DJoints | Human actions (e.g., walking) |
| NfS | 2017 | - | - | - | 100 | 240 FPS camera | RGB | Visual object tracking |
| Kinetics-400 | 2017 | 400 | - | - | 306,245 | - | RGB | Human-centered actions (e.g., playing instruments) |
| Something-Something V1 | 2017 | 174 | - | - | 108,499 | - | RGB | Human performing actions with everyday objects |
| Kinetics-skeleton | 2017 | 400 | - | - | 260,232 | - | 2DJoints | Human-centered actions |
| HACS | 2017 | 200 | - | - | 1,500,000 | - | RGB+Flow | Human actions (e.g., dancing) |
| Charades-Ego | 2018 | 157 | 112 | 2 | 68,536 | Head-mounted+standard camera | RGB | Egocentric indoor activities |
| AVA | 2018 | 80 | - | - | 211,000 | - | RGB+Flow | Human actions (e.g., talking, sitting) |
| Diving48 | 2018 | 48 | - | - | 18,404 | - | RGB+Flow | Diving actions |
| Epic-Kitchens | 2018 | 149 | 32 | - | 39,594 | - | RGB+Flow | Cooking |
| Something-Something V2 | 2018 | 174 | - | - | 220,847 | - | RGB | Human performing actions with everyday objects |
| MiT | 2018 | 339 | - | - | 1,000,000+ | - | RGB+Audio+Flow | Dynamic actions (e.g., human, animals) |
| Kinetics-600 | 2018 | 600 | - | - | 495,547 | - | RGB | Human-centered actions (e.g., playing instruments) |
| NTU RGB+D 120 | 2019 | 120 | 106 | 155 | 114,480 | Kinect v2 | RGB+Depth+3DJoints+Infrared | Daily actions, health-related actions etc. |
| IITR-IAR | 2019 | 21 | 35 | - | 1,470 | FLIR T1020 | Infrared | Human actions (e.g., hugging, fighting) |
| Kinetics-700 | 2019 | 700 | - | - | 650,317 | - | RGB | Human-centered actions (e.g., playing instruments) |
| HowTo100M | 2019 | 23,611 | - | - | 136,000,000 | - | RGB | Instructional videos (e.g., cooking) |
| CATER | 2019 | 301 | - | - | 5,500 | - | RGB | Compositional actions and temporal reasoning |
| FineGym | 2020 | 530 | - | - | 32,697 | - | RGB | Gymnasium videos (e.g., balance beam) |
| PKU-MMD II | 2020 | 41 | 13 | 3 | 1,009 | Kinect v2 | RGB+Depth+Infrared+3DJoints | Human actions (e.g., standing) |
| EPIC-KITCHENS-100 | 2020 | 4,053 | 37 | - | 89,977 | GoPro Hero7 Black | RGB+Flow | Cooking |
| UAV-Human | 2021 | 155 | 119 | - | 22,476 | UAV Camera | RGB+3DJoints | Human Actions (e.g., walking, jogging) |

❤️‍🔥❤️‍🔥❤️‍🔥 Contribution

We warmly invite everyone to contribute to this repository and help enhance its quality and scope. Feel free to submit pull requests to add new methods, datasets or other useful resources, as well as to correct any errors you discover. To ensure consistency, please format your pull requests using our tables' structures. We greatly appreciate your valuable contributions and support!
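
As a rough illustration of matching the table structure, a new entry for the method tables could be assembled as below. This is only a sketch: `format_row` is a hypothetical helper, not part of this repository, and it assumes the six columns used in the method tables (Model, Venue, Learning, Dataset, Modality, Code) written as a Markdown pipe-table row.

```python
def format_row(model, venue, learning, datasets, modalities, code="-"):
    """Build one pipe-delimited row for a method table.

    Hypothetical helper for illustration only. Datasets are joined with
    ", " and modalities with " + ", following the existing rows; `code`
    defaults to "-" when no implementation link is available.
    """
    cells = [
        model,
        venue,
        learning,
        ", ".join(datasets),
        " + ".join(modalities),
        code,
    ]
    return "| " + " | ".join(cells) + " |"


# Example using an entry that already appears in the 3D-based table.
print(format_row("C3D", "ICCV 2015", "Supervised", ["UCF101"], ["RGB"], "GitHub"))
```

In an actual pull request, the Code cell would normally hold a link to the implementation rather than plain text.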
