Abstract-to-Concrete Translator (ACT)

[Paper][Project Page]
Official PyTorch implementation of the CVPR 2025 paper "Seeing the Abstract: Translating the Abstract Language for Vision Language Models"
Davide Talon*, Federico Girella*, Ziyue Liu, Marco Cristani, Yiming Wang

*Equal Contribution

1. Installation

Clone the repo and install the environment:

git clone https://github.com/davidetalon/fashionact.git
cd fashionact
conda env create -f environment.yml

Then, activate the environment and install our modules:

conda activate fashionact
pip install -e .
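
As a quick sanity check (a suggestion on our part, assuming PyTorch is part of the environment, as expected for the official PyTorch implementation), you can verify that the GPU is visible:

python -c "import torch; print(torch.cuda.is_available())"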

2. Data setup

First, download the DeepFashion data for the In-shop Clothes Retrieval task.

You can then arrange the data with:

mkdir data/in-shop_clothes_retrieval/
mkdir data/in-shop_clothes_retrieval/tmp
cd data/in-shop_clothes_retrieval/tmp
unzip "/path/to/In-Shop Clothes Retrieval Benchmark*"
mv In-shop\ Clothes\ Retrieval\ Benchmark/* ../
cd ../

NOTE: Our experiments were run on the high-resolution version of the dataset. You will need to request authorization and the archive password from the original authors:

unzip path/to/img_highres_seg*.zip -d img
rm -r Img
rm -r tmp

Alternatively, you can use the low-resolution version:

unzip Img/img.zip
rm -r Img
rm -r tmp

2a. Pre-computed data

You can download pre-computed data from here and store it under the data/ folder.
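
For example, assuming the pre-computed data comes as a single zip archive (the exact packaging and file name may differ; the path below is a placeholder), you can place it with:

mkdir -p data
unzip /path/to/precomputed-data.zip -d data/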

2b. OPTIONAL: Compute data locally

NOTE: This step is only necessary if you want to compute the data locally, without using the pre-computed data from the previous section.

From the root folder, you can generate the input JSON files with:

# Train data
python -m scripts.generate_data --data_path data/in-shop_clothes_retrieval/ --deepfashion --split train --out_file data/deepfashion-train.json
# Test data
python -m scripts.generate_data --data_path data/in-shop_clothes_retrieval/ --deepfashion --split eval --out_file data/deepfashion-eval.json

You can then caption the training data with:

# Train data
python -m scripts.captioning --data-file data/deepfashion-train.json --vlm-type qwen2-vl --out-file data/deepfashion-train-captioned.json

And apply language rewriting to the evaluation data:

python -m scripts.language_rewrite --prompt-type dssp --data-file data/deepfashion-eval.json --llm-type llama3-8B --out-file data/deepfashion-eval-rewritten.json

You can then merge the generated descriptions back into the data files:

# Train
python -m scripts.add_description --input-file data/deepfashion-train.json --extra-info data/deepfashion-train-captioned.json --info-name qwen2-vl --info-type 'other' --out-file data/deepfashion_train_database.json
# Eval
python -m scripts.add_description --input-file data/deepfashion-eval.json --extra-info data/deepfashion-eval-rewritten.json --info-name llama-3-8B --info-type 'llama-3' --out-file data/deepfashion_eval_noics.json

Finally, generate the queries from the language-rewritten descriptions:

python -m scripts.generate_queries -i data/deepfashion_eval_noics.json -o data/deepfashion_eval_noics_queries_llama3-8B.json --query_type llama-3-8B

3. Evaluation

You can evaluate with:

python -m scripts.evaluate \
--queries_file data/deepfashion_eval_noics_queries_llama3-8B.json \
--images_file data/deepfashion_eval_noics.json \
--store_encodings siglip-deepfashion.pt --out_file out/results.json \
--backbone siglip --use_textual_prompts \
--concrete_cache data/deepfashion_train_database.json \
--concrete_type qwen2-vl \
--abstract_type description \
--notes siglip-act-df-df
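
The metrics are written to the file passed via --out_file. Assuming it is plain JSON, you can pretty-print it with Python's built-in json.tool module:

python -m json.tool out/results.json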

Notebooks

In the notebooks/ folder you can find some useful Jupyter notebooks.

  • query.ipynb lets you query the DeepFashion evaluation set with your own descriptions. Note that you first need to run the inference script to save the necessary evaluation embeddings and the shift representation.
  • attribute-categorization.ipynb showcases the attribute categorization pipeline used for preliminary experiments. You can download the required SpaCy model with python -m spacy download en_core_web_sm.
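
Jupyter itself may not be included in the provided environment; assuming it is not, you can install and launch it with:

pip install jupyterlab
jupyter lab notebooks/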

Docker

Alternatively, you can work inside a Docker container. Build the image with:

docker build -t fashionact:latest -f docker/Dockerfile .

And then run it:

docker run --shm-size=64g --gpus '"device=0"' --rm -it -v $(pwd)/data:/app/data/ -v /path-to-huggingface-cache:/root/.cache fashionact /bin/bash

Inside the container, you can use the same scripts as before.
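
For example, the evaluation-data generation step from Section 2b runs unchanged inside the container (the data/ volume mounted above maps to the same relative path):

python -m scripts.generate_data --data_path data/in-shop_clothes_retrieval/ --deepfashion --split eval --out_file data/deepfashion-eval.json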

Demo

We release a minimal Gradio demo of the ACT model.
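
Gradio is assumed not to be included in the provided environment; a plain PyPI install should be sufficient:

pip install gradio

You can then launch the demo with: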

python demo.py

Note that you should run the inference script first to save the necessary image embeddings and the shift characterization.

Acknowledgements

If you find this repo useful, please don't forget to cite:

@inproceedings{talon2025seeing,
  title={Seeing the Abstract: Translating the Abstract Language for Vision Language Models},
  author={Talon, Davide and Girella, Federico and Liu, Ziyue and Cristani, Marco and Wang, Yiming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
