Amir Bar, Arya Bakhtiar, Antonio Loquercio, Jathushan Rajasegaran, Danny Tran, Yann LeCun, Amir Globerson, Trevor Darrell
This repository contains the implementation of the pretraining and linear probing experiments from the paper.
Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction. Current video datasets separately contain egomotion and interaction examples, but rarely both at the same time. In addition, EgoPet offers a radically distinct perspective from existing egocentric datasets of humans or vehicles. We define two in-domain benchmark tasks that capture animal behavior, and a third benchmark to assess the utility of EgoPet as a pretraining resource for robotic quadruped locomotion, showing that models trained on EgoPet outperform those trained on prior datasets. This work provides evidence that today's pets could be a valuable resource for training future AI systems and robotic assistants.
Download the data here.
Please follow the instructions in INSTALL.md.
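As a rough orientation, the setup is a standard PyTorch video-pretraining stack; the package names below are only an illustrative sketch, and the versions listed in INSTALL.md are authoritative.

```bash
# Illustrative sketch only — follow INSTALL.md for the exact, tested packages and versions.
conda create -n egopet python=3.8 -y
conda activate egopet
pip install torch torchvision timm
```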
To pretrain MVD (ViT-B) on EgoPet, run the following command:
OUTPUT_DIR='./logs_dir/mvd_vit_base_with_vit_base_teacher_egopet'
IMAGE_TEACHER="path/to/mae/checkpoint"      # MAE image teacher checkpoint
VIDEO_TEACHER="path/to/kinetics/checkpoint" # VideoMAE video teacher checkpoint
DATA_PATH='egopet_pretrain.csv'
DATA_ROOT='path/to/egopet'                  # dataset root passed to --data_root below (placeholder path)
GPUS=8
NODE_COUNT=4
RANK=0                                      # node rank: 0 on the first node (see note below)
MASTER_ADDR='ip.of.node.0'                  # IP address of node 0 (placeholder)
MASTER_PORT=29500
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch --nproc_per_node=${GPUS} --use_env \
--master_port ${MASTER_PORT} --nnodes=${NODE_COUNT} \
--node_rank=${RANK} --master_addr=${MASTER_ADDR} \
run_mvd_pretraining.py \
--data_path ${DATA_PATH} \
--data_root ${DATA_ROOT} \
--model pretrain_masked_video_student_base_patch16_224 \
--opt adamw --opt_betas 0.9 0.95 \
--log_dir ${OUTPUT_DIR} \
--output_dir ${OUTPUT_DIR} \
--image_teacher_model mae_teacher_vit_base_patch16 \
--distillation_target_dim 768 \
--distill_loss_func SmoothL1 \
--image_teacher_model_ckpt_path ${IMAGE_TEACHER} \
--video_teacher_model pretrain_videomae_teacher_base_patch16_224 \
--video_distillation_target_dim 768 \
--video_distill_loss_func SmoothL1 \
--video_teacher_model_ckpt_path ${VIDEO_TEACHER} \
--mask_type tube --mask_ratio 0.9 --decoder_depth 2 \
--batch_size 16 --update_freq 2 --save_ckpt_freq 10 \
--num_frames 16 --sampling_rate 4 \
--lr 1.5e-4 --min_lr 1e-4 --drop_path 0.1 --warmup_epochs 268 --epochs 2680 \
--auto_resume
Set `RANK` (`--node_rank`) to 0 on the first node. On the other nodes, run the same command with `RANK=1`, ..., `RANK=3`, respectively. `MASTER_ADDR` (`--master_addr`) should be set to the IP address of node 0 on every node.
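For example, on the second of the four nodes only the rank and master address change; the IP address below is a placeholder for the actual address of node 0.

```bash
# Node 1 of 4: identical command, only RANK and MASTER_ADDR differ from node 0.
RANK=1
MASTER_ADDR='10.0.0.1'   # placeholder — use the real IP of node 0
# ...then launch the same torch.distributed.launch command shown above.
```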
| Model | Pretraining | Epochs | Link |
|---|---|---|---|
| MVD (ViT-B) | EgoPet | 2670 | link |
The fine-tuning instructions for the VIP task are in VIP.md.
The fine-tuning instructions for the LP task are in LP.md.
The fine-tuning instructions for the VPP task are in VPP.md.
This project is built upon MVD, MAE_ST, and DPVO. Thank you to the contributors of these codebases!
