
TMR++

A Cross-Dataset Study for Text-based 3D Human Motion Retrieval

Léore Bensabath · Mathis Petrovich · Gül Varol


Description

Official PyTorch implementation of the paper TMR++: A Cross-Dataset Study for Text-based 3D Human Motion Retrieval.

This repo is based on the implementation of TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis.

Please visit our webpage for more details.

Bibtex

If you find this code useful in your research, please cite:

@inproceedings{lbensabath2024,
    title={TMR++: A Cross-Dataset Study for Text-based 3D Human Motion Retrieval},
    author={Bensabath, Léore and Petrovich, Mathis and Varol, G{\"u}l},
    booktitle={CVPRW HuMoGen},
    year={2024}
}

and

@inproceedings{petrovich23tmr,
    title     = {{TMR}: Text-to-Motion Retrieval Using Contrastive {3D} Human Motion Synthesis},
    author    = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l},
    booktitle = {International Conference on Computer Vision ({ICCV})},
    year      = 2023
}

If the code is useful to you, you can also give the repo a star ⭐.

Installation 👷

Create environment

Create a Python virtual environment:

python -m venv ~/.venv/TMR
source ~/.venv/TMR/bin/activate

Install PyTorch

python -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

Then install the remaining packages:

python -m pip install -r requirements.txt

This installs the following packages: pytorch_lightning, einops, hydra-core, hydra-colorlog, orjson, tqdm, and scipy. The code was tested with Python 3.10.12 and PyTorch 2.0.1.

Set up the datasets

Please first set up the datasets as explained in the corresponding section of the TMR README: https://github.com/Mathux/TMR/tree/master.

In this repo, we provide augmented versions of the humanml3d, kitml and babel datasets. For a given dataset ($DATASET), up to three new annotation files have been created (a loading sketch follows below):

  • dataset/annotations/$DATASET/annotations_paraphrases.json: Includes all the paraphrases generated by an LLM
  • dataset/annotations/$DATASET/annotations_actions.json: For humanml3d and kitml only, includes the action type labels generated by an LLM
  • dataset/annotations/$DATASET/annotations_all.json: Includes a concatenation, by key id, of all the annotations (original and LLM-generated)

Copy the data into your repo from here.
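
Once the data is in place, you can sanity-check one of these files directly. A minimal sketch, assuming each file maps key ids to annotation entries (the exact structure of an entry is not documented here, so inspect one yourself):

import json

# Any of humanml3d, kitml or babel works the same way.
dataset = "humanml3d"
path = f"dataset/annotations/{dataset}/annotations_paraphrases.json"

with open(path) as f:
    annotations = json.load(f)

print(f"{len(annotations)} entries")
first_key = next(iter(annotations))  # peek at one entry to see its structure
print(first_key, annotations[first_key])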

Compute the text embeddings for the data with text augmentation

Run this command to compute the sentence embeddings and token embeddings for the annotations with text augmentation:

python -m prepare.text_embeddings --config-name=text_embeddings_with_augmentation data=$DATASET
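
Example, for the humanml3d dataset:

python -m prepare.text_embeddings --config-name=text_embeddings_with_augmentation data=humanml3d
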
Combine datasets

To create a combination of any of the datasets, run:

python -m prepare.combine_datasets datasets=$DATASETS test_sets=$TEST_DATASETS split_suffix=$SPLIT_SUFFIX [OPTIONS]

Where:

  • datasets: The list of datasets to combine
  • test_sets: The list of datasets the combination is intended to be tested on. When generating the split files, this filters out of the training set any samples from the training datasets that overlap with samples from one of the given test datasets. Note that you can create different splits for different intended test sets by leveraging the split_suffix parameter. The annotation file for a given combination stays the same regardless of the test_sets value.
  • split_suffix: The split file suffix for this given combination of test sets. Training and validation split files will be saved under datasets/annotations/splits/train{split_suffix}.txt and datasets/annotations/splits/val{split_suffix}.txt

The new dataset will be created inside the folder datasets/annotations/{dataset1}_{dataset2}(_{dataset3}).

Example:

python -m prepare.combine_datasets datasets=["humanml3d","kitml"] test_sets=["babel"] split_suffix="_wo_hkb"

Then run the python -m prepare.text_embeddings command, with or without text augmentation, on your new dataset combination.

Example:

python -m prepare.text_embeddings --config-name=text_embeddings_with_augmentation data=humanml3d_kitml

Training 🚀

Training with a combination of datasets

To train with a combination of datasets without any text augmentation, run the same command as in TMR with the relevant dataset name:

Example:

python train.py data=humanml3d_kitml

Training with text augmentation

python train.py --config-name=train_with_augmentation data=$DATASET
Details

Relevant parameters you can modify, in addition to the ones in TMR, are the text augmentation picking probabilities detailed in the paper.

Example:

python train.py --config-name=train_with_augmentation data=humanml3d data.paraphrase_prob=0.2 data.summary_prob=0.2 data.averaging_prob=0.3 run_dir=outputs/tmr_humanml3d_w_textAugmentation_0.2_0.2_0.3
Extracting weights

After training, run the following command to extract the weights from the checkpoint:

python extract.py run_dir=$RUN_DIR

It will take the last checkpoint by default. This should create the folder RUN_DIR/last_weights and populate it with the files motion_decoder.pt, motion_encoder.pt and text_encoder.pt. This process makes loading models faster, no longer depends on the file structure, and lets each module be loaded independently. This has already been done for the pretrained models.
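
As an illustration of that independent loading, here is a minimal sketch. It assumes the extracted .pt files are ordinary PyTorch checkpoints readable with torch.load (see extract.py for the exact format); the run directory name is hypothetical:

import torch

run_dir = "outputs/tmr_humanml3d"  # hypothetical run directory

# Each module was extracted to its own file, so it can be loaded on its own,
# without reconstructing the full training checkpoint.
text_encoder_weights = torch.load(f"{run_dir}/last_weights/text_encoder.pt", map_location="cpu")
print(type(text_encoder_weights))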

Pretrained models 📀

You can find the different models used in the paper here: pre-trained models

Evaluation 📊

Motion to text / Text to motion retrieval

python retrieval.py run_dir=$RUN_DIR data=$DATA
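
Example, with a hypothetical run directory, evaluating on kitml:

python retrieval.py run_dir=outputs/tmr_humanml3d data=kitml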

Action recognition

For action recognition on datasets babel_actions_60 and babel_actions_120, run:

python retrieval_action_multi_labels.py run_dir=$RUN_DIR data=$DATA

It will compute the metrics, display them, and save them in the folder RUN_DIR/contrastive_metrics_$DATA/. You can change the name of the output file with the save_file_name argument.
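
Example, again with a hypothetical run directory:

python retrieval_action_multi_labels.py run_dir=outputs/tmr_babel data=babel_actions_60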

Usage 💻

Encode a motion

Note that the .npy file should correspond to HumanML3D Guo features.

python encode_motion.py run_dir=RUN_DIR npy=/path/to/motion.npy
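
If you want to check that a motion file looks like Guo features before encoding it, HumanML3D Guo features have 263 dimensions per frame. A small sketch (not part of the repo; the path is a placeholder):

import numpy as np

motion = np.load("/path/to/motion.npy")
print(motion.shape)  # expected (num_frames, 263) for HumanML3D Guo features
assert motion.ndim == 2 and motion.shape[-1] == 263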

Encode a text

python encode_text.py run_dir=RUN_DIR text="A person is walking forward."

Compute similarity between text and motion

python text_motion_sim.py run_dir=RUN_DIR text=TEXT npy=/path/to/motion.npy

For example with text="a man sets to do a backflips then fails back flip and falls to the ground" and npy=HumanML3D/HumanML3D/new_joint_vecs/001034.npy you should get around 0.96.
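
Conceptually, this similarity is a cosine similarity between the text embedding and the motion embedding in the shared latent space. A standalone numpy sketch of that computation (not the repo's API; the 256-d size and the random embeddings are placeholders):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings; in practice they come from the text and motion
# encoders (see encode_text.py and encode_motion.py above).
text_emb = np.random.randn(256)
motion_emb = np.random.randn(256)
print(cosine_similarity(text_emb, motion_emb))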

Launch the demo

Encode the whole motion dataset

python encode_dataset.py run_dir=RUN_DIR

Text-to-motion retrieval demo

Run this command:

python app.py

and then open your web browser at the address: http://localhost:7860.

License 📚

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including PyTorch, PyTorch3D, Hugging Face, and Hydra, and uses datasets, each of which has its own license that must also be followed.
