
InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation

Jinqi Xiao¹,² · Qing Yan¹ · Liming Jiang¹ · Zichuan Liu¹ · Hao Kang¹ · Shen Sang¹ · Tiancheng Zhi¹ · Jing Liu¹ · Cheng Yang¹ · Xin Lu¹ · Bo Yuan²

¹ByteDance Inc.  ²Rutgers University

Paper PDF

InstructMoLE (Instruction-Guided Mixture of Low-rank Experts) addresses task interference in multi-conditional image generation by aligning expert selection with global user intent. Standard per-token routing can produce semantic and spatial artifacts; InstructMoLE instead uses a unified routing strategy that enforces consistent expert choices across the entire image. This enables effective handling of diverse conditional generation tasks, including single image editing, multi-subject generation, and spatial alignment.


Key Contributions

InstructMoLE solves task interference in multi-conditional image generation through two key innovations:

  • Instruction-Guided Routing (IGR): Replaces standard per-token routing with a single, global signal derived from the user's instruction. This enforces a consistent expert choice across the entire image, preventing the semantic and spatial artifacts that arise from inconsistent routing decisions (a minimal routing sketch follows this list).

  • Output-Space Orthogonality Loss: A novel regularizer that forces experts to be functionally distinct. It directly prevents expert collapse by penalizing redundant outputs, ensuring effective specialization across different conditional generation tasks (see the loss sketch below).
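
As a rough illustration of IGR, the sketch below computes one top-k routing decision per sample from a pooled instruction embedding and applies it to every image token. This is a minimal sketch, not the repository's implementation: the class name InstructionGuidedMoLE, the pooled instruction_emb input, and the initialization details are assumptions; the defaults mirror the MoE configuration listed under Training (8 experts, 4 active, rank 32, alpha 32). See train_kontext.py for the actual code.

# Minimal sketch of Instruction-Guided Routing (IGR) for a mixture of
# low-rank experts. Names are illustrative, not the repository's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstructionGuidedMoLE(nn.Module):
    def __init__(self, dim=1024, num_experts=8, top_k=4, rank=32, alpha=32):
        super().__init__()
        self.top_k = top_k
        self.scale = alpha / rank
        # One low-rank (LoRA-style) expert = a down-projection A and an
        # up-projection B, so each expert adds only 2 * dim * rank params.
        self.A = nn.Parameter(torch.randn(num_experts, dim, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, dim))
        # The router consumes a single pooled instruction embedding, not
        # per-token features -- the key difference from per-token MoE routing.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x, instruction_emb):
        # x: (batch, seq, dim) image tokens
        # instruction_emb: (batch, dim) pooled embedding of the instruction
        logits = self.router(instruction_emb)           # (batch, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # one top-k per sample
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            for w, e in zip(weights[b], idx[b]):
                # The SAME experts, with the SAME weights, are applied to
                # every token of sample b, so routing stays consistent
                # across the whole image.
                out[b] += w * (x[b] @ self.A[e] @ self.B[e]) * self.scale
        return out

Because the gate sees only the instruction, two tokens in the same image can never disagree about which experts to use.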
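
The orthogonality loss can be sketched in the same hedged spirit: compare each expert's output on the same tokens via a normalized Gram matrix and penalize off-diagonal similarity. The function name and exact formulation are assumptions and may differ from the paper's loss; in training, this term would be scaled by a weight such as orthogonal_reg_alpha and added to the main objective.

# Hedged sketch of an output-space orthogonality regularizer: expert
# outputs on the same input are pushed toward mutual orthogonality by
# penalizing off-diagonal entries of their normalized Gram matrix.
import torch

def output_orthogonality_loss(expert_outputs):
    # expert_outputs: (num_experts, batch * seq, dim) -- each expert's
    # response to the same tokens, flattened over batch and sequence.
    flat = expert_outputs.flatten(start_dim=1)          # (E, N * dim)
    flat = torch.nn.functional.normalize(flat, dim=-1)  # unit-norm rows
    gram = flat @ flat.T                                # (E, E) cosine sims
    eye = torch.eye(gram.size(0), device=gram.device)
    # Redundant experts produce near-identical outputs -> large
    # off-diagonal entries -> large penalty.
    return ((gram - eye) ** 2).sum() / (gram.numel() - gram.size(0))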


Installation

conda create -n instruct_mole python=3.11
conda activate instruct_mole
bash install_env.sh

The installation script installs all required dependencies, including PyTorch, Diffusers, Transformers, and the other packages needed for training and evaluation.
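
As an optional sanity check (not part of the repository), the core dependencies can be verified from Python:

# Quick import check that the core dependencies installed correctly.
import torch
import diffusers
import transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)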


Dataset Preparation

For training InstructMoLE, we support multiple open-source datasets covering the conditional generation scenarios described above (single image editing, multi-subject generation, and spatial alignment).

Please prepare your datasets in the expected format and place them in the appropriate directories. You can also add your own data for model training.


Training

To train InstructMoLE, use the provided training script:

bash run.sh

The training script uses accelerate launch for distributed training. You can customize training parameters by modifying train_config.json, which includes the following (an illustrative config sketch follows the lists):

Configuration Parameters

MoE Configuration:

  • num_experts: Number of experts in the mixture (default: 8)
  • num_experts_per_tok: Number of experts activated per token (default: 4)
  • rank: Low-rank decomposition rank for experts (default: 32)
  • alpha: Scaling factor for expert outputs (default: 32)
  • type_aux_loss_alpha: Weight for type-based auxiliary loss (default: 0.1)
  • token_aux_loss_alpha: Weight for token-based auxiliary loss (default: 0.01)
  • orthogonal_reg_alpha: Weight for orthogonality regularization (default: 0.01)
  • use_type_embedding: Whether to use instruction-guided routing (default: true)

LoRA Configuration:

  • r: LoRA rank (default: 256)
  • lora_alpha: LoRA alpha scaling factor (default: 256)
  • target_modules: List of modules to apply LoRA
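
For orientation, here is a sketch of how the defaults listed above could be assembled into a config file. The flat layout and key names are assumptions (the real train_config.json may group the MoE and LoRA options differently), and the target_modules entries are hypothetical placeholders:

# Illustrative construction of a training config mirroring the README
# defaults. Key names, nesting, and target_modules are assumptions;
# consult the shipped train_config.json for the authoritative layout.
import json

config = {
    "num_experts": 8,
    "num_experts_per_tok": 4,
    "rank": 32,
    "alpha": 32,
    "type_aux_loss_alpha": 0.1,
    "token_aux_loss_alpha": 0.01,
    "orthogonal_reg_alpha": 0.01,
    "use_type_embedding": True,  # enables instruction-guided routing
    "r": 256,
    "lora_alpha": 256,
    "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],  # hypothetical
}

with open("train_config.json", "w") as f:
    json.dump(config, f, indent=2)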

For more details on training, refer to train_kontext.py and train_config.json.


Evaluation

InstructMoLE supports evaluation on multiple benchmarks:

  • XVerseBench: Multi-subject conditional generation benchmark
  • OmniContext: Image editing benchmark
  • Spatial Alignment: Pose, depth, and Canny edge evaluation

Evaluation scripts are provided in the eval/ directory. Please refer to the respective evaluation scripts for detailed usage instructions.


BibTeX

If you find InstructMoLE useful for your research and applications, please cite it using the following BibTeX entry:

@misc{xiao2025instructmoleinstructionguidedmixturelowrank,
      title={InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation}, 
      author={Jinqi Xiao and Qing Yan and Liming Jiang and Zichuan Liu and Hao Kang and Shen Sang and Tiancheng Zhi and Jing Liu and Cheng Yang and Xin Lu and Bo Yuan},
      year={2025},
      eprint={2512.21788},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.21788}, 
}
