Reconstructing Animals and the Wild

Peter Kulits, Michael J. Black, Silvia Zuffi

[Project Page]

Data and code coming soon.

Summary

We train an LLM to decode a frozen CLIP embedding of a natural image into a structured compositional scene representation encompassing both animals and their habitats.

Data

Data can be found at https://raw.is.tue.mpg.de/download.php after registering on the project page.

Setup

The environment can be configured with conda/micromamba from environment.yml or using the Dockerfile.
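Concretely, setup might look like the following sketch. The environment name `raw` and the Docker image tag are assumptions for illustration, not names taken from the repository; use whatever name environment.yml actually declares:

```shell
# Option 1: conda, creating the environment from the provided YAML file
conda env create -f environment.yml
conda activate raw        # assumes environment.yml names the env "raw"

# Option 2: micromamba, a lightweight drop-in alternative
micromamba create -f environment.yml
micromamba activate raw

# Option 3: Docker, building from the provided Dockerfile
# (the image tag "raw" is arbitrary)
docker build -t raw .
```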

Training

After the data has been downloaded, training can be initiated with the following, where X is the per-device batch size and RAW-Y names the run:

```shell
python train.py \
    --images_tar data/train.tar \
    --data_path data/train.gz.feather \
    --images_val_tar data/val.tar \
    --data_path_val data/val.gz.feather \
    --per_device_train_batch_size X \
    --output_dir ./checkpoints/RAW-Y \
    --max_steps 40000 \
    --image_aspect_ratio pad
```

Inference

```shell
python inference.py \
    --model-path ./checkpoints/RAW-Y \
    --images_tar data/val.tar \
    --out_path ./out/RAW-Y.json.gz \
    --image_aspect_ratio pad
```

License

We build on the LLaVA codebase for our experiments; inherited code therefore remains under the original Apache 2.0 license. Additions and modifications are released under a separate license, in accordance with institute requirements, which has been prepended to LICENSE.
