# PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models

Official GitHub repo for PrismLayers.
In this work, we release the first open, ultra-high-fidelity dataset of multi-layer transparent images, PrismLayers (200K), together with its high-quality subset PrismLayersPro (20K). Both come with accurate alpha mattes and are synthesized via our bottom-up approach, MultilayerFlux.
This project provides three core components:

- **ART+**, a new-generation multi-layer image generation model trained on our synthesized data.
- **TIPS**, a Transparent Image Quality Assessment model trained on over 100K transparent image pairs.
- An **artifact filter** for multi-layer transparent images that automatically removes low-quality generations.
## Table of Contents

- Environment Setup
- Checkpoints
- Demo
  - ART+
  - Artifact Classifier
  - TIPS
## Environment Setup

```bash
conda create -n multilayer python=3.10 -y
conda activate multilayer
pip3 install torch==2.4.0 torchvision==0.19.0
pip install diffusers==0.31.0 transformers==4.44.0 accelerate==0.34.2 peft==0.12.0 datasets==2.20.0
pip install wandb==0.17.7 einops==0.8.0 sentencepiece==0.2.0 mmengine==0.10.4 prodigyopt==1.0
huggingface-cli login
```

## Checkpoints

Create a directory named `checkpoints` and download the following checkpoints into it.
| Variable | Description | Action Required |
|---|---|---|
| `base_ckpt` | Pretrained ART model | Download from Google Drive |
| `artplus_PrismLayersPro_ckpt` | ART2 model fine-tuned at 1024 resolution with high-quality data from the PrismLayersPro dataset | Download from Google Drive |
| `artplus_PrismLayersPro100k_ckpt` | ART2 model fine-tuned at 1024 resolution with ultra-high-quality data from the PrismLayersPro100k dataset | Download from Google Drive |
| `transp_vae_ckpt` | Multi-layer transparency decoder checkpoint | Download from Google Drive |
| `pre_fuse_lora_dir` | LoRA weights to be fused initially | Download from Google Drive |
| `classifier_ckpt` | Artifact filter for multi-layer transparent images | Download from Google Drive |
| `TIPS_ckpt` | Transparent Image Quality Assessment model | Download from Google Drive |
You can choose either `artplus_PrismLayersPro_ckpt` or `artplus_PrismLayersPro100k_ckpt` as `artplus_ckpt`.

The downloaded checkpoints for ART+'s inference should be organized as follows:
```text
checkpoints/
├── artplus_ckpt/
│   ├── layer_pe.pth
│   └── pytorch_lora_weights.safetensors
├── base_ckpt/
│   └── pytorch_lora_weights.safetensors
├── pre_fuse_lora/
│   └── pytorch_lora_weights.safetensors
└── transparent_decoder_ckpt.pt
```
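Before running inference, it can help to verify that the directory matches the tree above. The following is a minimal sketch (not part of the official codebase) that reports any expected file that is missing:

```python
from pathlib import Path

# Expected files for ART+ inference, mirroring the checkpoint tree above.
EXPECTED = [
    "artplus_ckpt/layer_pe.pth",
    "artplus_ckpt/pytorch_lora_weights.safetensors",
    "base_ckpt/pytorch_lora_weights.safetensors",
    "pre_fuse_lora/pytorch_lora_weights.safetensors",
    "transparent_decoder_ckpt.pt",
]

def missing_checkpoints(root="checkpoints"):
    """Return the expected checkpoint files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint files:", *missing, sep="\n  ")
    else:
        print("All checkpoints in place.")
```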
## Demo

### ART+

Use our prepared layout with style-guidance caption in `inference_layout/demo/test_1.json`:

```bash
python multi_layer_gen/demo.py -c multi_layer_gen/demo.yaml -i inference_layout/test_1.json
```

- Change the `save_dir` in `multi_layer_gen/demo.yaml`.
#### `inference_layout` format

```json
[
  {
    "whole_image": {
      "caption": "This is a [STYLE_NAME] style image. xxx",
      "height": 1024,
      "width": 1024
    },
    "layout": [
      [0, 0, 1023, 1023],
      [0, 0, 1023, 1023],
      [x11, y11, x12, y12],
      [x21, y21, x22, y22],
      ...
    ]
  },
  ...
]
```
- Box coordinates range from 0 to `image_size - 1`.
- `STYLE_NAME` options: 3D, Japanese anime, cartoon, watercolor painting, steampunk, pixel art, melting gold, furry, metal-textured, pop art, wood carving, melting silver, toy, Pokemon, sand painting, line draw, kid crayon drawing, neon graffiti, doodle art, papercut art, ink
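A layout file following the format above can be assembled programmatically. The sketch below builds a minimal example; the caption text and the per-layer boxes are illustrative values chosen by us, not shipped with the repo:

```python
import json

# Build a minimal inference_layout file following the format above.
# By convention the first two boxes cover the whole canvas; the rest are
# per-layer bounding boxes (here, arbitrary example coordinates).
size = 1024
sample = {
    "whole_image": {
        "caption": "This is a watercolor painting style image. A pink shiny star over a calm lake.",
        "height": size,
        "width": size,
    },
    "layout": [
        [0, 0, size - 1, size - 1],  # whole-canvas box
        [0, 0, size - 1, size - 1],  # whole-canvas box
        [128, 128, 511, 511],        # example per-layer box
        [512, 256, 895, 767],        # example per-layer box
    ],
}

# Sanity check: every box must stay inside [0, image_size - 1].
for x1, y1, x2, y2 in sample["layout"]:
    assert 0 <= x1 <= x2 <= size - 1 and 0 <= y1 <= y2 <= size - 1

with open("my_layout.json", "w") as f:
    json.dump([sample], f, indent=2)
print("wrote my_layout.json")
```

Pass the resulting file to `multi_layer_gen/demo.py` via `-i` as shown above.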
### Artifact Classifier

**Env**

```bash
pip install hpsv2
wget https://dl.fbaipublicfiles.com/mmf/clip/bpe_simple_vocab_16e6.txt.gz
mv bpe_simple_vocab_16e6.txt.gz [site-packages_path]/hpsv2/src/open_clip/bpe_simple_vocab_16e6.txt.gz
```

The classifier outputs values between 0 and 1, where higher scores indicate more severe artifacts. We use 0.18 as the threshold.
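The 0.18 threshold can be applied to the classifier's output when filtering generations. This is a hypothetical sketch: we assume the inference script writes a JSON result with a `"score"` field, which is not confirmed by the repo docs, so check the actual output schema first:

```python
import json

# Hypothetical assumption: the classifier's output JSON looks like
# {"score": 0.42}; verify the real key name in the produced file.
ARTIFACT_THRESHOLD = 0.18  # threshold stated above

def is_artifact(result_path, threshold=ARTIFACT_THRESHOLD):
    """True if the classifier score (higher = more severe) exceeds the threshold."""
    with open(result_path) as f:
        return json.load(f)["score"] > threshold
```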
**Inference**

```bash
cd artifact_classifier
python inference.py --image_path merged.png --classifier_weights [ckpt_file_path] --output_file merged.json
```

### TIPS

The TIPS score ranges from 0 to 1, with higher scores indicating better transparent image quality.
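TIPS scores can likewise be used to keep only sufficiently high-quality layers. As with the artifact classifier, the `"score"` key and the 0.5 cutoff below are our assumptions for illustration, not values documented by the repo:

```python
import json

# Hypothetical assumption: TIPS/inference.py writes JSON like
# {"score": 0.9}; verify the key against the real result.json.
def filter_by_tips(result_paths, min_score=0.5):
    """Return paths whose TIPS score (higher = better) is at least min_score."""
    kept = []
    for path in result_paths:
        with open(path) as f:
            if json.load(f)["score"] >= min_score:
                kept.append(path)
    return kept
```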
**Inference**

```bash
cd TIPS
python inference.py --img_path test.png --caption "A pink shiny star" --checkpoint [ckpt_file_path] --output_file result.json
```

## Citation

Please kindly cite our paper if you use our code, data, models, or results:
```bibtex
@article{chen2025prismlayers,
  title={PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models},
  author={Chen, Junwen and Jiang, Heyang and Wang, Yanbin and Wu, Keming and Li, Ji and Zhang, Chao and Yanai, Keiji and Chen, Dong and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2505.22523},
  year={2025}
}
```
