✨ Lumina-Accessory directly leverages the self-attention mechanism in DiT to perform interaction between condition and target image tokens, consistent with approaches such as OminiControl, DSD, and VisualCloze.
✨ Built on top of Lumina-Image-2.0, Lumina-Accessory introduces an additional condition processor, initialized with the weights of the latent processor.
✨ We pass TA-Tok's discrete tokens to Lumina-Accessory, which decodes the text-aligned representation into high-quality images in pixel space. We made minor modifications to Lumina-Accessory, such as iterative parquet dataset loading and TA-Tok condition support.
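As a rough illustration of the first point above, the sketch below runs single-head self-attention over one joint sequence of condition and target tokens, so each target token can attend to the condition tokens. It is a toy example in plain Python and not the Lumina-Accessory implementation: the function name `joint_self_attention`, the identity query/key/value projections, and the 2-dimensional tokens are all assumptions made for readability.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(cond_tokens, target_tokens):
    """Toy single-head attention over the sequence [condition; target].

    Assumption: identity Q/K/V projections, used only to keep the sketch short.
    Returns the updated target tokens and their attention weights.
    """
    seq = cond_tokens + target_tokens  # one joint token sequence
    dim = len(seq[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in seq:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in seq]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append(
            [sum(attn[j] * seq[j][d] for j in range(len(seq))) for d in range(dim)]
        )
    n_cond = len(cond_tokens)
    return outputs[n_cond:], weights[n_cond:]


# Hypothetical toy inputs: two condition tokens and one target token.
cond = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 1.0]]
updated, attn = joint_self_attention(cond, target)
```

Each row of `attn` has one weight per token in the joint sequence, so the first `len(cond)` entries show how much a target token draws from the condition.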
conda create -n Lumina2 -y
conda activate Lumina2
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

We suggest using a parquet dataset for loading large-scale training data. See csuhan/ImageNet1K-T2I-QwenVL-QwenImage for an example.
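To give a sense of what iterative parquet loading can look like, here is a minimal sketch that streams rows from a list of parquet shards batch by batch instead of reading whole files into memory. It is not the loader used in this repository: the names `read_parquet_batches` and `iter_rows` are hypothetical, the default reader assumes `pyarrow` is installed, and no column names are assumed.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def read_parquet_batches(path: str, batch_size: int = 256) -> Iterator[List[Row]]:
    """Yield lists of row dicts from one parquet file, one record batch at a time."""
    # Imported lazily so the module loads without pyarrow; needs `pip install pyarrow`.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    for batch in parquet_file.iter_batches(batch_size=batch_size):
        yield batch.to_pylist()


def iter_rows(
    paths: Iterable[str],
    read_batches: Callable[[str], Iterable[List[Row]]] = read_parquet_batches,
) -> Iterator[Row]:
    """Stream rows shard by shard; only one batch is held in memory at a time."""
    for path in paths:
        for batch in read_batches(path):
            yield from batch
```

Passing the reader as an argument keeps the streaming logic independent of the storage backend, which also makes it easy to swap in a stub reader when testing.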
bash scripts/run_1024_finetune_tatok.sh

For inference, please check the inference script in the main branch.
