PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models

Official GitHub repository for PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models

In this work, we release the first open, ultra-high-fidelity dataset of multi-layer transparent images: PrismLayers (200K) and its high-quality subset PrismLayersPro (20K), both with accurate alpha mattes, synthesized using our bottom-up approach, MultilayerFlux.

This project provides three core components:

  • ART+, a new-generation multi-layer image generation model trained on our synthesized data.

  • The Transparent Image Quality Assessment model (TIPS), trained on over 100K transparent image pairs.

  • An artifact filter for multi-layer transparent images that automatically removes low-quality generations.

Environment Setup

1. Create Conda Environment

conda create -n multilayer python=3.10 -y
conda activate multilayer

2. Install Dependencies

pip3 install torch==2.4.0 torchvision==0.19.0
pip install diffusers==0.31.0 transformers==4.44.0 accelerate==0.34.2 peft==0.12.0 datasets==2.20.0
pip install wandb==0.17.7 einops==0.8.0 sentencepiece==0.2.0 mmengine==0.10.4 prodigyopt==1.0
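
Optionally, you can run a quick sanity check (a minimal sketch, not part of the official setup) to confirm the pinned versions and that CUDA is visible:

# Optional sanity check for the environment above.
import torch
import torchvision
import diffusers
import transformers

print("torch:", torch.__version__)                # expected: 2.4.0
print("torchvision:", torchvision.__version__)    # expected: 0.19.0
print("diffusers:", diffusers.__version__)        # expected: 0.31.0
print("transformers:", transformers.__version__)  # expected: 4.44.0
print("CUDA available:", torch.cuda.is_available())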

3. Login to Hugging Face

huggingface-cli login
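
If you prefer to authenticate from Python (for example inside a notebook), the equivalent call from the huggingface_hub library is shown below; the token value is yours to supply:

# Programmatic alternative to `huggingface-cli login`.
from huggingface_hub import login

login()  # prompts for your Hugging Face access token
# or pass it directly (keep tokens out of version control):
# login(token="hf_...")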

Checkpoints

Create a directory named checkpoints and download the following checkpoints into it.

  • base_ckpt: pretrained ART model. Download from Google Drive.
  • artplus_PrismLayersPro_ckpt: ART+ model fine-tuned at 1024 resolution with high-quality data from the PrismLayersPro dataset. Download from Google Drive.
  • artplus_PrismLayersPro100k_ckpt: ART+ model fine-tuned at 1024 resolution with ultra-high-quality data from the PrismLayersPro100k dataset. Download from Google Drive.
  • transp_vae_ckpt: multi-layer transparency decoder checkpoint. Download from Google Drive.
  • pre_fuse_lora_dir: LoRA weights to be fused initially. Download from Google Drive.
  • classifier_ckpt: artifact filter for multi-layer transparent images. Download from Google Drive.
  • TIPS_ckpt: Transparent Image Quality Assessment model. Download from Google Drive.

You can choose either artplus_PrismLayersPro_ckpt or artplus_PrismLayersPro100k_ckpt as artplus_ckpt.

The downloaded checkpoints for ART+'s inference should be organized as follows:

checkpoints/
├── artplus_ckpt/
│   ├── layer_pe.pth
│   └── pytorch_lora_weights.safetensors
├── base_ckpt/
│   └── pytorch_lora_weights.safetensors
├── pre_fuse_lora/
│   └── pytorch_lora_weights.safetensors
└── transparent_decoder_ckpt.pt
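
Before running inference, you can verify the layout with a short check (a sketch assuming the tree above, with your chosen ART+ checkpoint placed under artplus_ckpt/):

# Check that every checkpoint expected by ART+ inference is present.
from pathlib import Path

ckpt_root = Path("checkpoints")
expected = [
    "artplus_ckpt/layer_pe.pth",
    "artplus_ckpt/pytorch_lora_weights.safetensors",
    "base_ckpt/pytorch_lora_weights.safetensors",
    "pre_fuse_lora/pytorch_lora_weights.safetensors",
    "transparent_decoder_ckpt.pt",
]
missing = [p for p in expected if not (ckpt_root / p).is_file()]
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All ART+ inference checkpoints found.")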

Demo

ART+

Use our prepared layout with a style-guidance caption in inference_layout/demo/test_1.json:

python multi_layer_gen/demo.py -c multi_layer_gen/demo.yaml -i inference_layout/test_1.json
  • Change save_dir in multi_layer_gen/demo.yaml to the directory where outputs should be saved.

inference_layout format

[
    {
        "whole_image": {
            "caption": "This is a [STYLE_NAME] style image. xxx",
            "height": 1024,
            "width": 1024
        },
        "layout": [
            [0, 0, 1023, 1023],
            [0, 0, 1023, 1023],
            [x11, y11, x12, y12],
            [x21, y21, x22, y22],
            ...
        ]
    },
    ...
]
  • box: each coordinate takes a value from 0 to image_size - 1
  • STYLE_NAME: 3D, Japanese anime, cartoon, watercolor painting, steampunk, pixel art, melting gold, furry, metal-textured, pop art, wood carving, melting silver, toy, Pokemon, sand painting, line draw, kid crayon drawing, neon graffiti, doodle art, papercut art, ink
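
To build your own layout file in this format, a minimal sketch is shown below; the output path, caption, and boxes are hypothetical, and only the schema follows the spec above:

# Write a single-entry layout file matching the inference_layout schema.
import json

entry = {
    "whole_image": {
        # The caption must follow the "This is a [STYLE_NAME] style image. ..." pattern.
        "caption": "This is a cartoon style image. A pink shiny star over a night sky.",
        "height": 1024,
        "width": 1024,
    },
    "layout": [
        # The schema above lists two full-canvas boxes first, then per-layer boxes.
        [0, 0, 1023, 1023],
        [0, 0, 1023, 1023],
        [128, 128, 511, 511],
        [512, 256, 895, 639],
    ],
}

with open("inference_layout/my_layout.json", "w") as f:
    json.dump([entry], f, indent=4)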

Artifact Classifier

Environment

pip install hpsv2
wget https://dl.fbaipublicfiles.com/mmf/clip/bpe_simple_vocab_16e6.txt.gz
mv bpe_simple_vocab_16e6.txt.gz [site-packages_path]/hpsv2/src/open_clip/bpe_simple_vocab_16e6.txt.gz
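
If you are unsure what [site-packages_path] is on your machine, a quick way to locate the installed hpsv2 package (without importing it, since the import may fail while the vocab file is still missing):

# Print the directory of the installed hpsv2 package.
import importlib.util
import os

spec = importlib.util.find_spec("hpsv2")
print(os.path.dirname(spec.origin))  # the mv destination is <this path>/src/open_clip/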

The classifier outputs values between 0 and 1, where higher scores indicate more severe artifacts. We use 0.18 as the threshold.

Inference

cd artifact_classifier
python inference.py --image_path merged.png --classifier_weights [ckpt_file_path] --output_file merged.json
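
For batch filtering, a hedged post-processing sketch is shown below; it assumes the output JSON stores the artifact score under a "score" key, so check the actual output of inference.py for the real field name:

# Decide whether an image should be filtered out, based on the classifier output.
import json

THRESHOLD = 0.18  # scores above this indicate noticeable artifacts

def has_artifacts(result_path: str, threshold: float = THRESHOLD) -> bool:
    with open(result_path) as f:
        result = json.load(f)
    # Assumed key name; adjust to match the real output schema.
    return float(result["score"]) > threshold

print(has_artifacts("merged.json"))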

TIPS

The TIPS score ranges from 0 to 1, with higher scores indicating better transparent image quality.

Inference

cd TIPS
python inference.py --img_path test.png --caption "A pink shiny star" --checkpoint [ckpt_file_path] --output_file result.json
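
To score several images in one go, you can wrap the command above in a simple batch loop (a sketch; the image paths, captions, and checkpoint path are placeholders):

# Run TIPS inference over a list of (image, caption) pairs.
import subprocess

samples = [
    ("star.png", "A pink shiny star"),
    ("moon.png", "A glowing crescent moon"),
]

for img, caption in samples:
    subprocess.run(
        [
            "python", "inference.py",
            "--img_path", img,
            "--caption", caption,
            "--checkpoint", "path/to/TIPS_ckpt",  # placeholder checkpoint path
            "--output_file", f"{img}.json",
        ],
        check=True,
    )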

📚 Citation

Please cite our paper if you use our code, data, models, or results:

@article{chen2025prismlayers,
  title={PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models},
  author={Chen, Junwen and Jiang, Heyang and Wang, Yanbin and Wu, Keming and Li, Ji and Zhang, Chao and Yanai, Keiji and Chen, Dong and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2505.22523},
  year={2025}
}
