(ECCV 2024) Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Requirements:
- PyTorch
- DGL (https://www.dgl.ai/pages/start.html)
- decord
- timm
- einops
- scikit-learn
- pandas
Example:
conda create -n evi-mae python=3.11
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu118/repo.html
pip install decord timm==0.4.5 einops scikit-learn pandas
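After installation, a quick sanity check such as the following (an illustrative snippet, not part of the repository) confirms that all required packages import correctly:

# check_env.py -- illustrative sanity check for the required packages
import torch, torchvision, dgl, decord, timm, einops, sklearn, pandas

print("torch      :", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("dgl        :", dgl.__version__)
print("decord     :", decord.__version__)
print("timm       :", timm.__version__)
print("einops     :", einops.__version__)
print("sklearn    :", sklearn.__version__)
print("pandas     :", pandas.__version__)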
We use the CMU-MMAC dataset and the WEAR dataset.
For the CMU-MMAC dataset, we follow mmac_captions to preprocess the data and leverage the action labels from EgoProceL.
We provide repacked versions of both datasets, which can be downloaded at CMU-MMAC and WEAR.
Before pretraining or fine-tuning, our model is initialized from an adapted VideoMAE checkpoint, which can be downloaded at small-video-mae. The adaptation method follows cav-mae.
We also provide our pretrained and fine-tuned checkpoints at share_link.
Please finish the data and checkpoint preparation first, and then organize the data and checkpoints in the following structure (a quick layout check is sketched below):
data_release/
    cmu-mmac-release/
    wear-release/
    videomae_adapt_ckpt/
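Before launching a run, you can verify the layout with a short script like the one below (illustrative only; adjust the base path to your local setup):

from pathlib import Path

# Path to the data_release folder; adjust to your local setup.
base = Path("/path/to/data_release")

# Expected subfolders, matching the layout shown above.
for name in ["cmu-mmac-release", "wear-release", "videomae_adapt_ckpt"]:
    path = base / name
    print(("ok     " if path.is_dir() else "MISSING"), path)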
Then, pretraining can be conducted by:
cd egs/release; bash pretrain_cmummac.sh
cd egs/release; bash pretrain_wear.sh
In the shell scripts, set $dataset_base_path to the path of the data_release folder.
Fine-tuning can then be conducted by:
cd egs/release; bash finetune_cmummac.sh
cd egs/release; bash finetune_wear.sh
In the shell scripts, please modify dataset_base_path and pretrain_path (the latter should point to a pretrained checkpoint).
Our model can also be trained without MAE pretraining; you can either use the adapted VideoMAE checkpoint to initialize the model or train from scratch:
cd egs/release; bash train_cmummac.sh
cd egs/release; bash train_wear.sh
Our model can also be trained without video, i.e., using IMU input only:
cd egs/release; bash train_cmummac_imuonly.sh
cd egs/release; bash train_wear_imuonly.sh
The easiest way to adapt our model to a one-IMU setting is to start from train_cmummac_imuonly.sh, set imu_enable_graph to False, and then make 4 copies of the single input IMU in the dataloader (see the sketch below).
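A minimal sketch of that dataloader change is shown below (illustrative only: the variable name, the (channels, time) layout per IMU, and the number of channels are assumptions, not the repository's actual code):

import torch

# Inside the dataset's __getitem__ (illustrative): `imu` is the single
# available IMU stream, e.g. a (C, T) tensor of accelerometer/gyroscope samples.
imu = torch.randn(6, 200)  # placeholder for the one real IMU stream

# Replicate the single stream 4 times so the model still receives the
# 4-IMU input it expects; with imu_enable_graph set to False, no graph is
# built over the IMU nodes, so the copies are processed independently.
imu_4 = torch.stack([imu.clone() for _ in range(4)], dim=0)  # shape: (4, C, T)

print(imu_4.shape)  # torch.Size([4, 6, 200])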
Our code implementation is based on the following repositories:
https://github.com/THUDM/GraphMAE
https://github.com/MCG-NJU/VideoMAE
If you find our work useful, please cite:
@inproceedings{zhang2025masked,
  title={Masked video and body-worn IMU autoencoder for egocentric action recognition},
  author={Zhang, Mingfang and Huang, Yifei and Liu, Ruicong and Sato, Yoichi},
  booktitle={European Conference on Computer Vision},
  pages={312--330},
  year={2025},
  organization={Springer}
}
