[Paper][Project Page]
Official PyTorch implementation of the CVPR 2025 paper "Seeing the Abstract: Translating the Abstract Language for Vision Language Models"
Davide Talon*, Federico Girella*, Ziyue Liu, Marco Cristani, Yiming Wang
*Equal Contribution
Clone the repo and install the environment:
git clone https://github.com/davidetalon/fashionact.git
cd fashionact
conda env create -f environment.yml
Then, activate the environment and install our modules:
conda activate fashionact
pip install -e .
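As an optional sanity check (not part of the original setup), you can run a couple of lines of Python to verify that PyTorch is installed and sees your GPU:
# Optional sanity check: confirm PyTorch is installed and a GPU is visible.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())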
First, download the DeepFashion data from the In-shop Clothes Retrieval task.
You can then format the data using:
mkdir data/in-shop_clothes_retrieval/
mkdir data/in-shop_clothes_retrieval/tmp
cd data/in-shop_clothes_retrieval/tmp
unzip "/path/to/In-Shop Clothes Retrieval Benchmark*"
mv In-shop\ Clothes\ Retrieval\ Benchmark/* ../
cd ../
NOTE: Our experiments were run on the high-resolution version of the dataset. You will need to request authorization and the password from the original authors:
unzip path/to/img_highres_seg*.zip -o img
rm -r Img
rm -r tmp
Alternatively, you can use the low-resolution version:
unzip Img/img.zip
rm -r Img
rm -r tmp
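To check that the extraction worked, you can count the images with a short Python snippet. The img/ subfolder below is an assumption about where the archive extracts; adjust the path if your layout differs.
# Rough check of the extracted dataset; run from the repository root.
# The img/ subfolder is an assumed extraction path, adjust if needed.
from pathlib import Path

img_root = Path("data/in-shop_clothes_retrieval/img")
n_images = sum(1 for _ in img_root.rglob("*.jpg"))
print(f"Found {n_images} images under {img_root}")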
You can download pre-computed data from here and store it under the data/ folder.
NOTE: This step is only necessary if you want to compute the data locally, without using the pre-computed data from the previous section.
From the root folder, you can generate the JSON input files with:
# Train data
python -m scripts.generate_data --data_path data/in-shop_clothes_retrieval/ --deepfashion --split train --out_file data/deepfashion-train.json
# Test data
python -m scripts.generate_data --data_path data/in-shop_clothes_retrieval/ --deepfashion --split eval --out_file data/deepfashion-eval.json
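If you want to sanity-check the generated files, a minimal Python snippet such as the following loads them and prints a summary; it does not assume any particular schema and only reports what is stored:
# Minimal inspection of the generated JSON files.
import json

for split in ("train", "eval"):
    with open(f"data/deepfashion-{split}.json") as f:
        data = json.load(f)
    print(f"{split}: {len(data)} entries")
    first = data[0] if isinstance(data, list) else next(iter(data.values()))
    if isinstance(first, dict):
        # Show the available fields of one example entry.
        print("  example fields:", sorted(first.keys()))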
You can then caption the available data with:
# Train data
python -m scripts.captioning --data-file data/deepfashion-train.json --vlm-type qwen2-vl --out-file data/deepfashion-train-captioned.json
And apply language rewriting:
python -m scripts.language_rewrite --prompt-type dssp --data-file data/deepfashion-eval.json --llm-type llama3-8B --out-file data/deepfashion-eval-rewritten.json
You can then merge the two files as:
# Train
python -m scripts.add_description --input-file data/deepfashion-train.json --extra-info data/deepfashion-train-captioned.json --info-name qwen2-vl --info-type 'other' --out-file data/deepfashion_train_database.json
# Eval
python -m scripts.add_description --input-file data/deepfashion-eval.json --extra-info data/deepfashion-eval-rewritten.json --info-name llama-3-8B --info-type 'llama-3' --out-file data/deepfashion_eval_noics.json
You can then generate queries from the data using the language-rewritten descriptions:
python -m scripts.generate_queries -i data/deepfashion_eval_noics.json -o data/deepfashion_eval_noics_queries_llama3-8B.json --query_type llama-3-8B
You can evaluate with:
python -m scripts.evaluate \
--queries_file data/deepfashion_eval_noics_queries_llama3-8B.json \
--images_file data/deepfashion_eval_noics.json \
--store_encodings siglip-deepfashion.pt --out_file out/results.json \
--backbone siglip --use_textual_prompts \
--concrete_cache data/deepfashion_train_database.json \
--concrete_type qwen2-vl \
--abstract_type description \
--notes siglip-act-df-df
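Once evaluation completes, you can inspect the output with a small Python snippet like the one below; the exact structure of results.json depends on the evaluation script, so it simply pretty-prints whatever was stored:
# Pretty-print the stored evaluation results (whatever their structure is).
import json

with open("out/results.json") as f:
    results = json.load(f)
print(json.dumps(results, indent=2))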
In the notebooks/ folder you can find some useful Jupyter Notebooks.
query.ipynb allows you to query the DeepFashion evaluation set using your own descriptions. Note that you need to first run the inference script to save the necessary evaluation embeddings and the shift representation.
attribute-categorization.ipynb showcases the attribute categorization pipeline used for the preliminary experiments. You can download the needed SpaCy model using:
python -m spacy download en_core_web_sm
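To confirm the SpaCy model is available before opening the notebook, a short Python snippet like this one loads it and tags a sample sentence (the sentence is just illustrative):
# Quick check that the SpaCy English model loads correctly.
import spacy

nlp = spacy.load("en_core_web_sm")
print([(token.text, token.pos_) for token in nlp("a sleek minimalist cotton dress")])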
Alternatively, you can use Docker containers. Build the container with:
docker build -t fashionact:latest -f docker/Dockerfile .
And then run it:
docker run --shm-size=64g --gpus '"device=0"' --rm -it -v $(pwd)/data:/app/data/ -v /path-to-huggingface-cache:/root/.cache fashionact /bin/bash
Inside the container, you can use the same scripts as before.
We release a minimal Gradio demo with the ACT model. Install gradio and then run the demo using:
python demo.py
Note that you should run the inference script first to save the necessary image embeddings and the shift characterization.
If you find this repo useful, please don't forget to cite:
@inproceedings{talon2025seeing,
title={Seeing the Abstract: Translating the Abstract Language for Vision Language Models},
author={Talon, Davide and Girella, Federico and Liu, Ziyue and Cristani, Marco and Wang, Yiming},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
