Reconstructing Animals and the Wild (RAW)
Peter Kulits, Michael J. Black, Silvia Zuffi
We train an LLM to decode a frozen CLIP embedding of a natural image into a structured compositional scene representation encompassing both animals and their habitats.

Data can be found at https://raw.is.tue.mpg.de/download.php after registering on the project page. The environment can be configured with conda/micromamba from environment.yml or using the Dockerfile.
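
For example (a sketch: the activation name and image tag below are assumptions, not taken from the repository):

# Create the environment from environment.yml (conda shown; for micromamba: micromamba create -f environment.yml)
conda env create -f environment.yml
conda activate <name-declared-in-environment.yml>

# Or build a container image from the Dockerfile ("raw" is an arbitrary tag)
docker build -t raw .
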
After the data has been downloaded, training can be initiated with the following (X is a placeholder for the per-device batch size, and Y for a run name of your choice):
python train.py \
--images_tar data/train.tar \
--data_path data/train.gz.feather \
--images_val_tar data/val.tar \
--data_path_val data/val.gz.feather \
--per_device_train_batch_size X \
--output_dir ./checkpoints/RAW-Y \
--max_steps 40000 \
--image_aspect_ratio pad
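
The flag names (--per_device_train_batch_size, --max_steps, --output_dir) follow the Hugging Face transformers TrainingArguments convention. If train.py forwards other TrainingArguments as well (an assumption; this README does not confirm it), memory-constrained GPUs could trade batch size for gradient accumulation, e.g.:

# Assumption: train.py also accepts --gradient_accumulation_steps (a standard
# transformers TrainingArguments flag). 4 x 8 keeps an effective batch of 32.
python train.py \
--images_tar data/train.tar \
--data_path data/train.gz.feather \
--images_val_tar data/val.tar \
--data_path_val data/val.gz.feather \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 8 \
--output_dir ./checkpoints/RAW-Y \
--max_steps 40000 \
--image_aspect_ratio pad
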
Inference can then be run on the validation set with:

python inference.py \
--model-path ./checkpoints/RAW-Y \
--images_tar data/val.tar \
--out_path ./out/RAW-Y.json.gz \
--image_aspect_ratio pad
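
The exact output schema is not documented here, but the gzipped JSON predictions can be sanity-checked from the shell:

# Decompress and preview the first 500 bytes of the predictions
gunzip -c ./out/RAW-Y.json.gz | head -c 500
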
See LICENSE for the terms of use.