The Journey of Action Recognition
A collection of methods and datasets in the journey of action recognition.
For more details, please refer to our paper.
If you find a mistake or have any suggestions, please let us know by e-mail: [email protected]
If you find our work useful for your research, please cite the following paper:
@inproceedings{10.1145/3701716.3717746,
author = {Ding, Xi and Wang, Lei},
title = {The Journey of Action Recognition},
year = {2025},
isbn = {9798400713316},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3701716.3717746},
doi = {10.1145/3701716.3717746},
booktitle = {Companion Proceedings of the ACM on Web Conference 2025},
pages = {1869--1884},
numpages = {16},
keywords = {action recognition, data, learning paradigm, model architectures},
location = {Sydney NSW, Australia},
series = {WWW '25}
}
- [10/02/2025] The GitHub repository for our paper has been released.
- [27/01/2025] Our paper has been accepted as an oral presentation in the Companion Proceedings of The Web Conference 2025 (WWW 2025).
- Video-Action-Recognition
Table 1
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| HL-STIP | IJCV 2005 | Supervised | Outdoor scenes | RGB | - |
| Spatio-temporal Cuboids | VS-PETS 2005 | Supervised | Human Action Dataset | RGB | - |
| 3D-SURF | ECCV 2006 | Supervised | Mikolajczyk | RGB | - |
| 3D-SIFT | ACM MM 2007 | Supervised | Weizmann | RGB | - |
| NNMF Detector | ICCV 2007 | Supervised | KTH | RGB | - |
| HOG3D | BMVC 2008 | Supervised | KTH, Weizmann, Hollywood | RGB | - |
| Laptev et al. | CVPR 2008 | Supervised | KTH | RGB + Optical flow | - |
| Action MACH | CVPR 2008 | Supervised | KTH, Weizmann | RGB + Optical flow | - |
| Extended SURF | ECCV 2008 | Supervised | KTH, TRECVID 2006 | RGB | - |
| LTP | ICCV 2009 | Supervised | KTH, Hollywood, Kissing and slapping dataset, UCF Sports | RGB | - |
| Messing et al. | ICCV 2009 | Supervised | KTH | RGB | - |
| Bregonzio et al. | CVPR 2009 | Supervised | KTH, Weizmann | RGB | - |
| Tracklet Descriptors | ECCV 2010 | Supervised | KTH, ADL, Hollywood | RGB + Optical flow | - |
| Dense Long-Duration Trajectories | ICME 2010 | Supervised | KTH | RGB + Optical flow | - |
| Dense Trajectories | IJCV 2013 | Supervised | KTH, YouTube, Hollywood2, UCF Sports, IXMAS, Olympic Sports, UCF50, UIUC, HMDB51 | RGB + Optical flow | - |
| iDT | ICCV 2013 | Supervised | Hollywood2, HMDB51, Olympic Sports, UCF50 | RGB + Optical flow | - |
| Taylor videos | ICML 2024 | Supervised | HMDB51, CATER, MPII Cooking, Kinetics-400, -600, Something-Something V2, NTU RGB+D, Kinetics-skeleton | RGB + Skeleton | GitHub |
Table 2
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| Slow fusion | CVPR 2014 | Supervised | Sports-1M, UCF101 | RGB | GitHub |
| CNN-LSTM | CVPR 2015 | Supervised | Sports-1M, UCF101 | RGB + Optical flow | GitHub |
| LRCN | CVPR 2015 | Supervised | UCF101 | RGB + Optical flow | GitHub |
| Composite LSTM | ICML 2015 | Unsupervised | UCF101, HMDB51 | RGB | GitHub |
| Rank Pooling | TPAMI 2016 | Supervised | HMDB51, Hollywood2, MPII Cooking | RGB + Optical flow | - |
| LENN | CVPR 2016 | Supervised | UCF101 | RGB | - |
| Bilen et al. | TPAMI 2017 | Supervised | UCF101, HMDB51 | RGB | - |
| TSN | TPAMI 2018 | Supervised | HMDB51, UCF101, Kinetics-400, ActivityNet, THUMOS14 | RGB + RGB differences + Optical flow + Audio | GitHub |
| Attention-LSTM | CVPR 2018 | Supervised | UCF101, HMDB51, Kinetics-400 | RGB + Optical flow + Audio | GitHub |
| PEAR | ICME 2019 | Reinforcement | UCF101, Sports-1M | RGB + Optical flow | - |
| TSM | ICCV 2019 | Supervised | Something-Something V1, V2, Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| VINCE | arXiv 2020 | Self-supervised | Kinetics-400 | RGB | GitHub |
| C²LSTM | Neurocomputing 2020 | Supervised | UCF101, HMDB51 | RGB | - |
| MoCo | CVPR 2021 | Self-supervised | Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| TCL | CVPR 2021 | Semi-supervised + Contrastive | Mini-Something-V2, Kinetics-400, Charades-Ego | RGB | GitHub |
| TDN | CVPR 2021 | Supervised | Something-Something V1, V2, Kinetics-400 | RGB | GitHub |
| DB-LSTM | Neurocomputing 2021 | Supervised | UCF101, HMDB51 | RGB + Optical flow | - |
| SeCo | AAAI 2021 | Self-supervised | Kinetics-400, UCF101, HMDB51, ActivityNet | RGB | GitHub |
| Xiao et al. | CVPR 2022 | Semi-supervised + Contrastive | Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| GCSM | ACM MM 2023 | Few-shot | UCF101, HMDB51, Kinetics-400 | RGB | - |
| GgHM | ICCV 2023 | Few-shot | HMDB51, UCF101, Kinetics-400, Something-Something V2 | RGB | GitHub |
Table 3
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| C3D | ICCV 2015 | Supervised | UCF101 | RGB | GitHub |
| I3D | CVPR 2017 | Supervised | Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| P3D | ICCV 2017 | Supervised | Sports-1M, UCF101, ActivityNet | RGB | GitHub |
| ResNet3D | CVPR 2018 | Supervised | Kinetics-400, UCF101, HMDB51, ActivityNet | RGB | GitHub |
| S3D | ECCV 2018 | Supervised | Kinetics-400, Something-Something V1, UCF101, HMDB51 | RGB + Optical flow | GitHub |
| CSN | ICCV 2019 | Supervised | Sports-1M, Kinetics-400, Something-Something V1 | RGB | GitHub |
| SlowFast | ICCV 2019 | Supervised | Kinetics-400, Kinetics-600, Charades, AVA | RGB | GitHub |
| STM | ICCV 2019 | Supervised | Something-Something V1, Something-Something V2, Kinetics-400, UCF101, HMDB51 | RGB | - |
| DEEP-HAL | ICCV 2019 | Self-supervised | HMDB51, Charades, MPII Cooking | RGB + Optical flow | - |
| Xu et al. | CVPR 2019 | Self-supervised | UCF101, HMDB51 | RGB | - |
| X3D | CVPR 2020 | Supervised | Kinetics-400, Kinetics-600, Charades, AVA | RGB | GitHub |
| TPN | CVPR 2020 | Supervised | Kinetics-400, Something-Something V1, Something-Something V2, Epic-Kitchens | RGB | GitHub |
| SpeedNet | CVPR 2020 | Self-supervised | Kinetics-400, UCF101, HMDB51, NfS | RGB | GitHub |
| CoCLR | NeurIPS 2020 | Self-supervised | UCF101, HMDB51, Kinetics-400 | RGB + Optical flow | GitHub |
| VTHCL | arXiv 2020 | Self-supervised | Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| MvPL | ICCV 2021 | Semi-supervised | Kinetics-400, UCF101, HMDB51 | RGB + Optical flow | - |
| CVRL | CVPR 2021 | Self-supervised | Kinetics-400, Kinetics-600, UCF101, HMDB51 | RGB | GitHub |
| Yang et al. | CVPR 2021 | Supervised | Kinetics-400, Kinetics-700, Charades, Something-Something V1, AVA | RGB | - |
| 3DResNet+ATFR | CVPR 2021 | Supervised | Kinetics-400, Kinetics-600, UCF101, HMDB51, Something-Something V2 | RGB | - |
| MoViNet | CVPR 2021 | Supervised | Kinetics-400, Kinetics-600, Kinetics-700, Something-Something V2, Epic-Kitchens-100, MiT, Charades | RGB | GitHub |
| ODF+SDF | ACM MM 2021 | Self-supervised | HMDB51, Charades, MPII Cooking, Epic-Kitchens | RGB + Optical flow + object/saliency detectors | - |
| CLASTER | ECCV 2022 | Reinforcement+Zero-shot | UCF101, HMDB51, Olympic Sports | RGB + Optical flow + Semantic embeddings | - |
| TFCNet | arXiv 2022 | Supervised | Diving48, CATER | RGB | - |
| Multi-Transforms | ICMEW 2024 | Self-supervised | UCF101, HMDB51 | RGB | - |
| HoT | ICASSP 2024 | Supervised | HMDB51, MPII Cooking | RGB + Optical flow | - |
| Flow corr. | ICASSP 2024 | Supervised | HMDB51, Charades, MPII Cooking | RGB + Optical flow | - |
Table 4
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| Two-Stream ConvNet | NeurIPS 2014 | Supervised | UCF101, HMDB51 | RGB + Optical flow | GitHub |
| P-CNN | ICCV 2015 | Supervised | JHMDB, MPII Cooking | RGB + Optical flow + Joint | - |
| TDD | CVPR 2015 | Supervised | HMDB51, UCF101 | RGB + Optical flow | GitHub |
| Two-Stream Fusion | CVPR 2016 | Supervised | UCF101, HMDB51 | RGB + Optical flow | GitHub |
| TSN-Two-Stream | ECCV 2016 | Supervised | HMDB51, UCF101 | RGB + RGB differences + Optical flow + Warped optical flow | GitHub |
| DOVF | CVPR 2017 | Supervised | UCF101, HMDB51 | RGB + Optical flow | GitHub |
| TLE | CVPR 2017 | Supervised | UCF101, HMDB51 | RGB + Optical flow | GitHub |
| ActionVLAD | CVPR 2017 | Supervised | HMDB51, UCF101, Charades | RGB + Optical flow | - |
| TRN-Two-Stream | ECCV 2018 | Supervised | Something-Something V1, Something-Something V2, Charades | RGB | GitHub |
| TSM-Two-Stream | ICCV 2019 | Supervised | Something-Something V1, Something-Something V2, Kinetics-400, UCF101, HMDB51 | RGB + Optical flow | GitHub |
| KTSN | arXiv 2020 | Supervised | FSD-10 | RGB + Optical flow + Skeleton | - |
| MSM-ResNets | IVC 2021 | Supervised | UCF101, HMDB51 | RGB + Optical flow + Motion saliency | - |
| MAT-EffNet | MMSys 2023 | Supervised | UCF101, HMDB51, Kinetics-400 | RGB + Optical flow | - |
| TTFA | SPL 2024 | Few-shot | Something-Something V2, Kinetics-400 | RGB + Optical flow | - |
Table 5
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| R(2+1)D | CVPR 2018 | Supervised | Kinetics-400, Sports-1M, UCF101, HMDB51 | RGB + Optical flow | GitHub |
| R(2+1)D+BERT | ECCVW 2020 | Supervised | HMDB51, UCF101 | RGB | GitHub |
| XDC | NeurIPS 2020 | Self-supervised | HMDB51, UCF101 | RGB + Audio | GitHub |
| ELo | CVPR 2020 | Self-supervised | Kinetics-400, UCF101, HMDB51 | RGB + Optical flow + Audio | - |
| Jin et al. | ICICSP 2021 | Supervised | UCF101 | RGB | - |
| GDT | arXiv 2021 | Self-supervised | Kinetics-400, UCF101, HMDB51 | RGB + Audio | - |
| AVID | CVPR 2021 | Self-supervised | Kinetics-400, UCF101, HMDB51 | RGB + Audio | GitHub |
Table 6
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| VTN | ICCV 2021 | Supervised | Kinetics-400, MiT | RGB | GitHub |
| TimeSformer | ICML 2021 | Supervised | Kinetics-400, Kinetics-600 | RGB | GitHub |
| STAM | arXiv 2021 | Supervised | Kinetics-400, UCF101, Charades | RGB | GitHub |
| ViViT | ICCV 2021 | Supervised | Kinetics-400, Kinetics-600, Epic-Kitchens-100, MiT, Something-Something V2 | RGB | GitHub |
| MViT | ICCV 2021 | Supervised | Kinetics-400, Kinetics-600, Something-Something V2, Charades, AVA | RGB | GitHub |
| Motionformer | NeurIPS 2021 | Supervised | Kinetics-400, Kinetics-600, Something-Something V2, Epic-Kitchens-100 | RGB | GitHub |
| X-ViT | NeurIPS 2021 | Supervised | Kinetics-400, Kinetics-600, Something-Something V2, Epic-Kitchens-100 | RGB | GitHub |
| TallFormer | ECCV 2022 | Supervised | THUMOS14, ActivityNet | RGB | GitHub |
| VideoSwin | CVPR 2022 | Supervised | Kinetics-400, Kinetics-600, Something-Something V2 | RGB | GitHub |
| ORViT | CVPR 2022 | Supervised | Something-Something V2, SomethingElse, Diving48, AVA, Epic-Kitchens-100 | RGB | GitHub |
| BEVT | CVPR 2022 | Self-supervised | Kinetics-400, Something-Something V2, Diving-48 | RGB | GitHub |
| MaskFeat | CVPR 2022 | Self-supervised | Kinetics-400, Kinetics-600, Kinetics-700 | RGB | GitHub |
| UniFormer | arXiv 2022 | Supervised | Kinetics-400, Kinetics-600, Something-Something V1, V2 | RGB | GitHub |
| VideoMAE | NeurIPS 2022 | Self-supervised | Kinetics-400, Something-Something V2, UCF101, HMDB51, AVA | RGB | GitHub |
| MTV | CVPR 2022 | Supervised | Kinetics-400, Kinetics-600, Kinetics-700, Something-Something V2, Epic-Kitchens-100, MiT | RGB | GitHub |
| MAE-ST | arXiv 2022 | Self-supervised | Kinetics-400, Something-Something V2, AVA | RGB | GitHub |
| CAST | NeurIPS 2023 | Supervised | Kinetics-400, Something-Something V2, Epic-Kitchens-100 | RGB | GitHub |
| UniFormerV2 | ICCV 2023 | Supervised+Contrastive | Kinetics-400, Kinetics-600, Kinetics-700, MiT, Something-Something V1, V2, ActivityNet, HACS | RGB | - |
| OmniMAE | CVPR 2023 | Self-supervised | Something-Something V2, Epic-Kitchens-100, Kinetics-400 | RGB | GitHub |
| MVD | CVPR 2023 | Self-supervised | Kinetics-400, Something-Something V2, UCF101, HMDB51 | RGB | GitHub |
| Hiera | ICML 2023 | Self-supervised | Kinetics-400, Kinetics-600, Kinetics-700, Something-Something V2, AVA | RGB | GitHub |
| VideoMAE V2 | CVPR 2023 | Self-supervised | Kinetics-400, Something-Something V2, UCF101, HMDB51 | RGB | GitHub |
| SOAP | ACM MM 2024 | Few-shot | Something-Something V2, Kinetics-400, UCF101, HMDB51 | RGB | GitHub |
| C2C | ECCV 2024 | Zero-shot | Sth-com | RGB | GitHub |
| VMPs | ACML 2024 | Supervised | HMDB51, MPII Cooking 2, FineGym | RGB + Motion prompts | GitHub |
| TIME Layer | arXiv 2024 | Self-supervised | UCF101, HMDB51, UWA3D Multiview Activity II, NTU RGB+D, NTU RGB+D 120 | RGB + Depth | - |
Table 7
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| Dynamic Skeletons | CVPR 2015 | Supervised | MSRDailyActivity, CAD-60, SYSU 3D HOI | Depth + Joint | - |
| HBRNN-L | CVPR 2015 | Supervised | MSRAction3D, Berkeley MHAD, HDM05 | Joint | - |
| Part-aware LSTM | CVPR 2016 | Supervised | NTU RGB+D | RGB + Depth + Joint + Infrared | GitHub |
| LARP-SO | CVPR 2016 | Supervised | Florence3D-Action, MSRActionPairs3D, G3D-Gaming | Joint | - |
| STA-LSTM | AAAI 2017 | Supervised | NTU RGB+D | Joint | - |
| LieNet | CVPR 2017 | Supervised | NTU RGB+D, HDM05, G3D-Gaming | Joint + Bone | - |
| Two-Stream RNN | CVPR 2017 | Supervised | NTU RGB+D | Joint | - |
| Ke et al. | CVPR 2017 | Supervised | NTU RGB+D | Joint | - |
| VA-LSTM | ICCV 2017 | Supervised | NTU RGB+D, SYSU 3D HOI | Joint | GitHub |
| View Invariant | Pattern Recognit. 2017 | Supervised | NTU RGB+D, Northwestern-UCLA, UWA3D Multiview Activity II, MSRC-12 | Joint | - |
| Two-Stream CNN | ICMEW 2017 | Supervised | NTU RGB+D, PKU-MMD I | Joint + Skeleton motion | GitHub |
| LSTM-CNN | ICMEW 2017 | Supervised | NTU RGB+D | Joint | - |
| ST-LSTM+Trust Gate | TPAMI 2018 | Supervised | NTU RGB+D, MSRAction3D, SYSU 3D HOI, Berkeley MHAD | Joint | - |
| ST-GCN | AAAI 2018 | Supervised | Kinetics-400, NTU RGB+D | Joint | GitHub |
| Tang et al. | CVPR 2018 | Reinforcement | NTU RGB+D, SYSU 3D HOI, UTKinect-Action3D | Joint + Bone | - |
| AS-GCN | CVPR 2019 | Supervised | NTU RGB+D, Kinetics-400 | Joint + Bone | GitHub |
| 2s-AGCN | CVPR 2019 | Supervised | NTU RGB+D, Kinetics-skeleton | Joint + Bone | GitHub |
| DGNN | CVPR 2019 | Supervised | NTU RGB+D, Kinetics-skeleton | Joint + Bone | GitHub |
| EfficientGCN | ACM MM 2020 | Supervised | NTU RGB+D, NTU RGB+D 120 | Joint + Velocity + Bone | - |
| RA-GCN | TCSVT 2020 | Supervised | NTU RGB+D, NTU RGB+D 120 | Joint + Bone | gitee |
| Shift-GCN | CVPR 2020 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | GitHub |
| MS-G3D | CVPR 2020 | Supervised | NTU RGB+D 60, NTU RGB+D 120, Kinetics-skeleton | Joint + Bone | GitHub |
| DSTA-Net | ACCV 2020 | Supervised | NTU RGB+D, NTU RGB+D 120 | Joint + Bone | - |
| SCK+DCK / SCK$\oplus$+DCK$\oplus$ | TPAMI 2020 | Supervised | UTKinect-Action3D, Florence3D-Action, MSRAction3D, NTU RGB+D 60, Kinetics-400, HMDB51, MPII Cooking | Joint | - |
| CTR-GCN | ICCV 2021 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | - |
| FGCN | TIP 2022 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | - |
| AGE-Ens | TNNLS 2022 | Supervised | NTU RGB+D, NTU RGB+D 120 | Joint + Bone | GitHub |
| PoseConv3D | CVPR 2022 | Supervised | Kinetics-400, UCF101, HMDB51 | Joint + Bone + RGB | GitHub |
| InfoGCN | CVPR 2022 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | GitHub |
| DASTM | ECCV 2022 | Few-shot | NTU RGB+D 120, Kinetics-skeleton | Joint + Bone | - |
| Uncertainty-DTW | ECCV 2022 | Supervised/Unsupervised few-shot | NTU RGB+D, NTU RGB+D 120, Kinetics-skeleton | Skeleton sequences | GitHub |
| TranSkeleton | TCSVT 2023 | Supervised | NTU RGB+D, NTU RGB+D 120 | Joint + Bone | - |
| HiCo | AAAI 2023 | Unsupervised + Contrastive | NTU RGB+D, NTU RGB+D 120, PKU-MMD I, PKU-MMD II | Joint | GitHub |
| FR-Head | CVPR 2023 | Supervised + Contrastive | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | GitHub |
| 3Mformer | CVPR 2023 | Supervised | NTU RGB+D, NTU RGB+D 120, Kinetics-400, Northwestern-UCLA | Joint + Hyper-edge | - |
| HYSP | ICLR 2023 | Self-supervised | NTU RGB+D, NTU RGB+D 120, PKU-MMD I | Joint | GitHub |
| PAINet | ICCV 2023 | Few-shot | NTU RGB+D 120, Kinetics-skeleton | Joint + Bone | - |
| PCM3 | ACM MM 2023 | Self-supervised | NTU RGB+D, NTU RGB+D 120, PKU-MMD I | Joint + Bone + Motion | GitHub |
| Stream-GCN | IJCAI 2023 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | - |
| SkeletonGCL | arXiv 2023 | Self-supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | GitHub |
| DSCNet | ESWA 2024 | Supervised + Multimodal | NTU RGB+D, NTU RGB+D 120, PKU-MMD I, UAV-Human, IKEA ASM, Northwestern-UCLA | RGB + Joint + Bone | - |
| Skeleton-OOD | Neurocomputing 2024 | Supervised | NTU RGB+D, NTU RGB+D 120, Kinetics-400 | Joint | GitHub |
| ViA | IJCV 2024 | Self-supervised | Posetics, NTU RGB+D, NTU RGB+D 120, Toyota Smarthome, UAV-Human, Penn Action | Joint + Motion | GitHub |
| DeGCN | TIP 2024 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | GitHub |
| Js-SaPR-GCN | TCSVT 2024 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone + Motion | - |
| BlockGCN | CVPR 2024 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone + Motion | GitHub |
| JEANIE | IJCV 2024 | Supervised/Unsupervised few-shot | NTU RGB+D, NTU RGB+D 120, Kinetics-skeleton, MSRAction3D, UWA3D Multiview Activity | Skeleton sequences | - |
| SA-DVAE | arXiv 2024 | Zero-shot | NTU RGB+D, NTU RGB+D 120, PKU-MMD I | Joint | GitHub |
| ProtoGCN | arXiv 2024 | Self-supervised + Prototype | NTU RGB+D, NTU RGB+D 120, Kinetics-skeleton, FineGym | Joint | GitHub |
| HSIC-based | arXiv 2024 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA | Joint + Bone | - |
| USDRL | AAAI 2025 | Self-supervised | NTU RGB+D, NTU RGB+D 120, PKU-MMD I, PKU-MMD II | Joint + Bone + Motion | GitHub |
Table 8
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| HON4D | CVPR 2013 | Supervised | MSRAction3D, MSRDailyActivity3D, MSRActionPairs3D | Depth | - |
| HOPC | ECCV 2014 | Supervised | MSRAction3D, MSRActionPairs3D, UWA3D Multiview Activity | Depth + Point cloud | - |
| Wang et al. | Trans. Human-Mach. Syst. 2016 | Supervised | MSRAction3D, MSRDailyActivity3D, UTKinect-Action3D | Depth | - |
| Rahmani et al. | CVPR 2016 | Supervised | Northwestern-UCLA, UWA3D Multiview Activity II | Depth | - |
| S2DDI | ICCVW 2017 | Supervised | MSRAction3D, G3D-Gaming, MSRDailyActivity3D, SYSU 3D HOI, UTD-MHAD | Depth | - |
| Wang et al. | TMM 2018 | Supervised | NTU RGB+D | Depth | - |
| MVDI | Inf. Sci. 2018 | Supervised | NTU RGB+D, Northwestern-UCLA, UWA3D Multiview Activity II | Depth | GitHub |
| 3DFCNN | Multimed. Tools Appl. 2020 | Supervised | NTU RGB+D, Northwestern-UCLA, UWA3D Multiview Activity II | Depth | - |
| Liu et al. | ICASSP 2017 | Supervised | MSRAction3D, DHA | Depth | - |
| Dhiman et al. | TIP 2020 | Supervised | NTU RGB+D, UWA3D Multiview Activity II, Northwestern-UCLA | RGB + Depth | - |
| Stateful ConvLSTM | arXiv 2020 | Supervised | NTU RGB+D | Depth | - |
| DEAR | arXiv 2024 | Supervised | Something-Something V2 | RGB + Depth | GitHub |
Table 9
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| Gao et al. | Neurocomputing 2016 | Supervised | InfAR | Infrared + Optical flow | - |
| Jiang et al. | CVPRW 2017 | Supervised | InfAR | Infrared + Optical flow | - |
| Kawashima et al. | AVSS 2017 | Supervised | Custom Dataset | Infrared | - |
| Shah et al. | SPIE 2018 | Supervised | Custom IR Dataset | Infrared | - |
| TSTDDs | SPL 2018 | Supervised | InfAR, NTU RGB+D | Infrared + Optical flow | - |
| Akula et al. | CSR 2018 | Supervised | Custom IR Dataset | Infrared | - |
| Imran et al. | Infrared Phys. Technol. 2019 | Supervised | InfAR, IITR-IAR | Infrared + Optical flow | - |
| Meglouli et al. | CEAI 2019 | Supervised | InfAR | Infrared + Optical flow | - |
| Mehta et al. | ICPR 2020 | Adversarial | TSF | Infrared + Optical flow | GitHub |
Table 10
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| MeteorNet | ICCV 2019 | Supervised | MSRAction3D | Point cloud | GitHub |
| PointLSTM | CVPR 2020 | Supervised | MSRAction3D | Point cloud | GitHub |
| 3DV-PointNet++ | CVPR 2020 | Supervised | NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA, UWA3D Multiview Activity II | Depth | GitHub |
| ASTA3DConv | Trans. Instrum. Meas. 2020 | Supervised | MSRAction3D | Point cloud | - |
| Wang et al. | WACV 2021 | Self-supervised | NTU RGB+D, NTU-PCL, MSRAction3D | Point cloud | - |
| P4Transformer | CVPR 2021 | Supervised | MSRAction3D, NTU RGB+D, NTU RGB+D 120 | Point cloud | GitHub |
| PSTNet | ICLR 2021 | Supervised | MSRAction3D, NTU RGB+D, NTU RGB+D 120 | Point cloud | GitHub |
| PST2 | WACV 2022 | Supervised | MSRAction3D | Point cloud | - |
| MaST-Pre | ICCV 2023 | Self-supervised | MSRAction3D, NTU RGB+D | Point cloud | GitHub |
| PointCPSC | ICCV 2023 | Self-supervised | MSRAction3D, NTU RGB+D | Point cloud | - |
| 3DInAction | CVPR 2024 | Supervised | MSRAction3D | Point cloud | GitHub |
| KAN-HyperpointNet | arXiv 2024 | Supervised | NTU RGB+D, MSRAction3D | Point cloud | - |
Table 11
| Model | Venue | Learning | Dataset | Modality | Code |
|---|---|---|---|---|---|
| CPD | arXiv 2020 | Self-supervised | Kinetics-400, HMDB51, UCF101 | RGB + Text | GitHub |
| G-Blend | CVPR 2020 | Multi-task | Kinetics-400, Mini-Sports, Epic-Kitchens | RGB + Optical flow + Audio | - |
| MIL-NCE | CVPR 2020 | Self-supervised | HowTo100M, HMDB51, UCF101 | RGB + Text | GitHub |
| MMV | NeurIPS 2020 | Self-supervised | UCF101, HMDB51, Kinetics-600 | RGB + Audio + Text | GitHub |
| VIMPAC | arXiv 2021 | Self-supervised | Something-Something V2, Diving48, UCF101, HMDB51 | RGB + Text | GitHub |
| InternVideo | CVPR 2023 | Self-supervised | Kinetics-400, Kinetics-600, Kinetics-700, Something-Something V1, V2, ActivityNet, HACS, HMDB51 | RGB + Text | GitHub |
| Side4Video | arXiv 2023 | Self-supervised | Something-Something V1, Something-Something V2, Kinetics-400 | RGB + Text | GitHub |
| EZ-CLIP | arXiv 2024 | Zero-shot | Kinetics-400, HMDB51, UCF101, Something-Something V2 | RGB + Text | GitHub |
| SATA | arXiv 2024 | Zero-shot | UCF101, HMDB51 | RGB + Text | GitHub |
| TC-CLIP | ECCV 2024 | Zero-shot/Few-shot/Fully-supervised | HMDB51, UCF101, Kinetics-400, Something-Something V2 | RGB + Text | - |
| InternVideo2 | arXiv 2024 | Self-supervised + Multimodal | Kinetics-400, Kinetics-600, Kinetics-700, MiT, Something-Something V2, ActivityNet, HACS, Charades, HMDB51 | RGB + Audio + Text | GitHub |
| OmniViD | CVPR 2024 | Supervised | Kinetics-400, Something-Something V2, UCF101, HMDB51 | RGB + Text | GitHub |
| LoCATe-GAT | TETCI 2024 | Zero-shot | UCF101, HMDB51, ActivityNet, Kinetics-400 | RGB + Text | GitHub |
| STDD | arXiv 2024 | Zero-shot | Kinetics-600, UCF101, HMDB51 | RGB + Text | GitHub |
Table 12
| Datasets | Year | # Classes | # Subjects | # Views | # Video clips | Sensor | Modalities | Dataset type |
|---|---|---|---|---|---|---|---|---|
| KTH | 2004 | 6 | 25 | 1 | 2,391 | Static camera | RGB | Human actions (e.g., walking, jogging) |
| Weizmann | 2005 | 10 | 9 | 1 | 90 | - | RGB | Human actions (e.g., jumping, running) |
| IXMAS | 2006 | 11 | 10 | 5 | 330 | - | RGB | Human actions (e.g., walking, waving) |
| Hollywood | 2008 | 8 | - | - | 1422 | - | RGB | Movie Scenes (e.g., eating, driving) |
| Hollywood2 | 2009 | 12 | - | - | 1709 | - | RGB | Movie Scenes (e.g., running, kissing) |
| ADL | 2009 | 10 | 5 | - | 150 | Static camera | RGB | Daily Activities (e.g., brushing teeth, reading) |
| Olympic Sports | 2010 | 16 | - | - | 783 | - | RGB | Sports (e.g., high jumping, diving) |
| MSRAction3D | 2010 | 20 | 10 | 1 | 567 | Kinect v1 | Depth+3DJoints | Daily Activities (e.g., drinking, walking) |
| CAD-60 | 2011 | 14 | 4 | - | 68 | Kinect v1 | RGB+Depth+3DJoints | Human performing activities (e.g., cleaning objects) |
| HMDB51 | 2011 | 51 | - | - | 6,766 | - | RGB | Human actions (e.g., jumping, running) |
| MSRDailyActivity3D | 2012 | 16 | 10 | 1 | 320 | Kinect v1 | RGB+Depth+3DJoints | Daily Activities (e.g., calling, playing game) |
| UCF101 | 2012 | 101 | - | - | 13,320 | - | RGB | Body motion, Human-object interactions, sports etc. |
| UTKinect-Action3D | 2012 | 10 | 10 | 1 | 199 | Kinect v1 | RGB+Depth+3DJoints | Human actions (e.g., waving hands, pushing) |
| MPII Cooking | 2012 | 64 | 12 | 1 | 3,748 | - | RGB | Cooking |
| G3D-Gaming | 2012 | 20 | 10 | 1 | - | Kinect v1 | RGB+Depth+3DJoints | Gaming scenario (e.g., defending, climbing) |
| Berkeley MHAD | 2013 | 11 | 12 | 4 | 660 | Multi-baseline stereo cameras | RGB+Depth+3DJoints+Accelerometer+Audio | Human actions (e.g., throwing, clapping hands) |
| CAD-120 | 2013 | 10 | 4 | - | 120 | Kinect v1 | RGB+Depth+3DJoints | Human performing activities (e.g., picking objects) |
| UCF50 | 2013 | 50 | - | - | 6,676 | - | RGB | Body motion, Human-object interactions, sports etc. |
| Florence3D-Action | 2013 | 9 | 10 | 1 | 215 | Kinect v1 | RGB+Depth+3DJoints | Human actions (e.g., bowing, drinking) |
| MSRActionPairs3D | 2013 | 12 | 10 | 1 | 360 | Kinect v1 | RGB+Depth+3DJoints | Human actions (e.g., picking up, putting down) |
| Sports-1M | 2014 | 487 | - | - | 1,000,000 | - | RGB | Sports (e.g., swimming, skiing) |
| THUMOS14 | 2014 | 101 | - | - | 5,613 | - | RGB | Human actions (e.g., applying eye makeup, archery) |
| Northwestern-UCLA | 2014 | 10 | 10 | 3 | 1,494 | Kinect v1 | RGB+Depth+3DJoints | Human actions (e.g., dropping trash) |
| UWA3D Multiview Activity | 2014 | 30 | 10 | 1 | 701 | Kinect v1 | RGB+Depth+3DJoints | Daily Activities (e.g., holding head, walking) |
| ActivityNet | 2015 | 203 | - | - | 27,801 | - | RGB | Human actions (e.g., drawing, washing) |
| MPII Cooking 2 | 2015 | 67 | 30 | 1 | 273 | Static camera | RGB | Cooking |
| UWA3D Multiview Activity II | 2015 | 30 | 9 | 4 | 1,070 | Kinect v1 | RGB+Depth+3DJoints | Daily Activities (e.g., waving head, jumping) |
| SYSU 3D HOI | 2015 | 12 | 40 | - | 480 | Kinect v1 | RGB+Depth+3DJoints | Human-Object Interactions (e.g., sweeping the floor) |
| NTU RGB+D | 2016 | 60 | 40 | 80 | 56,880 | Kinect v2 | RGB+Depth+3DJoints | Daily actions, health-related actions etc. |
| InfAR | 2016 | 12 | 40 | - | 600 | Infrared camera | Infrared | Human actions (e.g., jogging) |
| TSF | 2016 | 2 | - | 1 | 44 | FLIR ONE | Infrared | Falls and normal activities |
| Charades | 2016 | 157 | - | - | 66,500 | - | RGB+Flow | Indoor activities (e.g., cleaning) |
| PKU-MMD I | 2017 | 51 | 66 | 3 | 1,076 | Kinect v2 | RGB+Depth+Infrared+3DJoints | Human actions (e.g., walking) |
| NfS | 2017 | - | - | - | 100 | 240 FPS camera | RGB | Visual object tracking |
| Kinetics-400 | 2017 | 400 | - | - | 306,245 | - | RGB | Human-centered actions (e.g., playing instruments) |
| Something-Something V1 | 2017 | 174 | - | - | 108,499 | - | RGB | Human performing actions with everyday objects |
| Kinetics-skeleton | 2017 | 400 | - | - | 260,232 | - | 2DJoints | Human-centered actions |
| HACS | 2017 | 200 | - | - | 1,500,000 | - | RGB+Flow | Human actions (e.g., dancing) |
| Charades-Ego | 2018 | 157 | 112 | 2 | 68,536 | Head-mounted+standard camera | RGB | Egocentric indoor activities |
| AVA | 2018 | 80 | - | - | 211,000 | - | RGB+Flow | Human actions (e.g., talking, sitting) |
| Diving48 | 2018 | 48 | - | - | 18,404 | - | RGB+Flow | Diving actions |
| Epic-Kitchens | 2018 | 149 | 32 | - | 39,594 | - | RGB+Flow | Cooking |
| Something-Something V2 | 2018 | 174 | - | - | 220,847 | - | RGB | Human performing actions with everyday objects |
| MiT | 2018 | 339 | - | - | 1,000,000+ | - | RGB+Audio+Flow | Dynamic actions (e.g., human, animals) |
| Kinetics-600 | 2018 | 600 | - | - | 495,547 | - | RGB | Human-centered actions (e.g., playing instruments) |
| NTU RGB+D 120 | 2019 | 120 | 106 | 155 | 114,480 | Kinect v2 | RGB+Depth+3DJoints+Infrared | Daily actions, health-related actions etc. |
| IITR-IAR | 2019 | 21 | 35 | - | 1,470 | FLIR T1020 | Infrared | Human actions (e.g., hugging, fighting) |
| Kinetics-700 | 2019 | 700 | - | - | 650,317 | - | RGB | Human-centered actions (e.g., playing instruments) |
| HowTo100M | 2019 | 23,611 | - | - | 136,000,000 | - | RGB | Instructional videos (e.g., cooking) |
| CATER | 2019 | 301 | - | - | 5,500 | - | RGB | Compositional actions and temporal reasoning |
| FineGym | 2020 | 530 | - | - | 32,697 | - | RGB | Gymnastics videos (e.g., balance beam) |
| PKU-MMD II | 2020 | 41 | 13 | 3 | 1,009 | Kinect v2 | RGB+Depth+Infrared+3DJoints | Human actions (e.g., standing) |
| EPIC-KITCHENS-100 | 2020 | 4,053 | 37 | - | 89,977 | GoPro Hero7 Black | RGB+Flow | Cooking |
| UAV-Human | 2021 | 155 | 119 | - | 22,476 | UAV Camera | RGB+3DJoints | Human Actions (e.g., walking, jogging) |
We warmly invite everyone to contribute to this repository and help improve its quality and scope. Feel free to submit pull requests that add new methods, datasets or other useful resources, or that correct any errors you find. To keep the tables consistent, please format new entries following the column structure of the existing tables. We greatly appreciate your contributions and support!
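If you want to sanity-check a new row before opening a pull request, here is a minimal sketch in Python. It is not part of this repository: the helper names `split_row` and `check_method_row`, and the rule that a venue cell should end with a four-digit year, are our own assumptions inferred from the existing rows. It only covers the six-column method tables, not the dataset table.

```python
# Hypothetical helper (not shipped with this repository): checks that a proposed
# row matches the six-column layout of the method tables.
METHOD_HEADER = ["Model", "Venue", "Learning", "Dataset", "Modality", "Code"]


def split_row(line: str) -> list[str]:
    """Split a Markdown table row into trimmed cell strings."""
    stripped = line.strip()
    if not (stripped.startswith("|") and stripped.endswith("|")):
        raise ValueError("row must start and end with '|'")
    # Drop the outer pipes, then split on the inner ones.
    return [cell.strip() for cell in stripped[1:-1].split("|")]


def check_method_row(line: str) -> list[str]:
    """Return a list of problems found in a method-table row (empty if none)."""
    try:
        cells = split_row(line)
    except ValueError as err:
        return [str(err)]
    if len(cells) != len(METHOD_HEADER):
        return [f"expected {len(METHOD_HEADER)} cells, got {len(cells)}"]
    problems = []
    for name, cell in zip(METHOD_HEADER, cells):
        if not cell:
            problems.append(f"empty '{name}' cell (use '-' if not applicable)")
    # Assumed convention: the venue ends with a year, e.g. "CVPR 2019".
    venue = cells[1]
    if not (len(venue) >= 4 and venue[-4:].isdigit()):
        problems.append(f"venue '{venue}' should end with a year")
    return problems


if __name__ == "__main__":
    row = "| ST-GCN | AAAI 2018 | Supervised | Kinetics-400, NTU RGB+D | Joint | GitHub |"
    print(check_method_row(row))  # []
```

A row with a missing column, such as `| NewNet | CVPR | Supervised | UCF101 | RGB |`, is reported as `expected 6 cells, got 5`.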