(ECCV 2024) Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Requirements:
- PyTorch
- DGL (https://www.dgl.ai/pages/start.html)
- decord
- timm
- einops
- scikit-learn
- pandas
Example:
conda create -n evi-mae python=3.11
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu118/repo.html
pip install decord timm==0.4.5 einops scikit-learn pandas
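After installation, a quick sanity check such as the following (an illustrative snippet, not part of the repository) confirms that all required packages import correctly:

# check_env.py -- illustrative sanity check for the required packages
import torch, torchvision, dgl, decord, timm, einops, sklearn, pandas

print("torch      :", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("dgl        :", dgl.__version__)
print("decord     :", decord.__version__)
print("timm       :", timm.__version__)
print("einops     :", einops.__version__)
print("sklearn    :", sklearn.__version__)
print("pandas     :", pandas.__version__)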
We use the CMU-MMAC dataset and the WEAR dataset.
For the CMU-MMAC dataset, we follow mmac_captions to preprocess the data and leverage the action labels from EgoProceL.
We provide repacked versions of both datasets, which can be downloaded at CMU-MMAC and WEAR.
Before pretraining or fine-tuning, our model is initialized from an adapted VideoMAE checkpoint, which can be downloaded at small-video-mae. The adaptation method follows cav-mae.
We also provide our pretrained and fine-tuned checkpoints at share_link.
Please finish the data and checkpoint preparation first, and then organize the data and checkpoints in the following structure (a quick layout check is sketched below):
data_release/
    cmu-mmac-release/
    wear-release/
    videomae_adapt_ckpt/
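Before launching a run, you can verify the layout with a short script like the one below (illustrative only; adjust the base path to your local setup):

from pathlib import Path

# Path to the data_release folder; adjust to your local setup.
base = Path("/path/to/data_release")

# Expected subfolders, matching the layout shown above.
for name in ["cmu-mmac-release", "wear-release", "videomae_adapt_ckpt"]:
    path = base / name
    print(("ok     " if path.is_dir() else "MISSING"), path)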
Then, pretraining can be conducted by:
cd egs/release; bash pretrain_cmummac.sh
cd egs/release; bash pretrain_wear.sh
In the shell scripts, set $dataset_base_path to the path of the data_release folder.
Fine-tuning can then be conducted by:
cd egs/release; bash finetune_cmummac.sh
cd egs/release; bash finetune_wear.sh
In the shell scripts, please modify dataset_base_path and pretrain_path (the latter should point to a pretrained checkpoint).
Our model can also be trained without MAE pretraining; you can either use the adapted VideoMAE checkpoint to initialize the model or train from scratch:
cd egs/release; bash train_cmummac.sh
cd egs/release; bash train_wear.sh
Our model can also be trained without video, i.e., using IMU input only:
cd egs/release; bash train_cmummac_imuonly.sh
cd egs/release; bash train_wear_imuonly.sh
The easiest way to adapt our model to a one-IMU setting is to start from train_cmummac_imuonly.sh, set imu_enable_graph to False, and then make 4 copies of the single input IMU in the dataloader (see the sketch below).
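A minimal sketch of that dataloader change is shown below (illustrative only: the variable name, the (channels, time) layout per IMU, and the number of channels are assumptions, not the repository's actual code):

import torch

# Inside the dataset's __getitem__ (illustrative): `imu` is the single
# available IMU stream, e.g. a (C, T) tensor of accelerometer/gyroscope samples.
imu = torch.randn(6, 200)  # placeholder for the one real IMU stream

# Replicate the single stream 4 times so the model still receives the
# 4-IMU input it expects; with imu_enable_graph set to False, no graph is
# built over the IMU nodes, so the copies are processed independently.
imu_4 = torch.stack([imu.clone() for _ in range(4)], dim=0)  # shape: (4, C, T)

print(imu_4.shape)  # torch.Size([4, 6, 200])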
Our code implementation is based on the following repositories:
https://github.com/THUDM/GraphMAE
https://github.com/MCG-NJU/VideoMAE
If you find our work useful, please cite:
@inproceedings{zhang2025masked,
  title={Masked video and body-worn IMU autoencoder for egocentric action recognition},
  author={Zhang, Mingfang and Huang, Yifei and Liu, Ruicong and Sato, Yoichi},
  booktitle={European Conference on Computer Vision},
  pages={312--330},
  year={2025},
  organization={Springer}
}
