Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity

Eric Tillmann Bill, Enis Simsar, Thomas Hofmann
Abstract. Text-to-image (T2I) models excel on single-entity prompts but struggle with multi-subject descriptions, often showing attribute leakage, identity entanglement, and subject omissions. We introduce the first theoretical framework with a principled, optimizable objective for steering sampling dynamics toward multi-subject fidelity. Viewing flow matching (FM) through stochastic optimal control (SOC), we formulate subject disentanglement as control over a trained FM sampler. This yields two architecture-agnostic algorithms: (i) a training-free test‑time controller that perturbs the base velocity with a single-pass update, and (ii) Adjoint Matching, a lightweight fine-tuning rule that regresses a control network to a backward adjoint signal while preserving base-model capabilities. The same formulation unifies prior attention heuristics, extends to diffusion models via a flow–diffusion correspondence, and provides the first fine-tuning route explicitly designed for multi-subject fidelity. Empirically, on Stable Diffusion 3.5, FLUX, and Stable Diffusion XL, both algorithms consistently improve multi-subject alignment while maintaining base-model style. Test-time control runs efficiently on commodity GPUs, and fine-tuned controllers trained on limited prompts generalize to unseen ones. We further highlight FOCUS (Flow Optimal Control for Unentangled Subjects), which achieves state-of-the-art multi-subject fidelity across models.
*Qualitative comparison: Base Models vs. Base Models + FOCUS (Ours).*
- Two ways to use FOCUS:
  - Test‑time controller: training‑free; perturbs the base sampler to encourage disentanglement.
  - Adjoint Matching: efficient fine‑tuning that learns a control network without degrading base capabilities.
- Model‑agnostic: demonstrated with SD3.5, FLUX, and SDXL.
- Plug‑and‑play: minimal changes to official pipelines; FOCUS controller is passed into the sampler.
- Efficient: test‑time control runs on commodity GPUs (≈12 GB VRAM works with memory‑saving modes).
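Conceptually, the test‑time controller nudges each sampler step: the base flow‑matching velocity is perturbed by a scaled control term. The sketch below is only a schematic of that idea under our own naming, not the repository's actual sampler:

```python
def euler_step(x, t, dt, base_velocity, control, lambda_scale):
    """One Euler step of a controlled flow-matching sampler (schematic only).
    `base_velocity` and `control` stand in for the frozen base model and the
    FOCUS controller; with lambda_scale = 0 this reduces to the base sampler."""
    v = base_velocity(x, t)                   # velocity of the frozen base model
    if lambda_scale != 0:
        v = v + lambda_scale * control(x, t)  # single-pass controller perturbation
    return x + dt * v
```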
```bash
# Clone the repository
git clone https://github.com/ericbill21/FOCUS.git
cd FOCUS

# Install dependencies
pip install -r requirements.txt
```

Note: The example notebooks rely on the official SD3 / FLUX pipelines with a small wrapper that accepts the `Controller` object.
We provide two simple notebooks to run FOCUS without changing model weights:
- `sample_sd3_otf.ipynb` (Stable Diffusion 3.5)
- `sample_flux_otf.ipynb` (FLUX.1 [dev])
Each pipeline is a lightly modified copy of the official implementation that accepts a `Controller` specifying the heuristic, λ (lambda), and the subject indices.
```python
from focus.controller import Controller

controller = Controller(
    t5_ids=[[1], [5]],    # indices of the prompt subjects in the T5 encoder
    clip_ids=[[2], [5]],  # indices of the prompt subjects in the CLIP encoder
    lambda_scale=4.0,
    heuristic="focus",    # one of: "focus", "conform", "attend_and_excite", "divide_and_bind"
    model="SD3",          # "SD3" or "FLUX"
)

# Pass `controller` to the sampling pipeline in the notebook or your script.
```

Tips:
- Set `lambda_scale=0` to disable the controller (baseline).
- Start with a moderate λ (e.g., 1–5) and tune per prompt.
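Working out the subject token indices by hand can be error‑prone. Below is a small, self‑contained sketch of one way to locate them with the Hugging Face tokenizers; the checkpoint names and the `subject_indices` helper are our own illustration, not part of the FOCUS API, so verify the encoders against your pipeline:

```python
from transformers import AutoTokenizer

# Standard text encoders used by SD3-family models (verify against your pipeline).
t5_tok = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")
clip_tok = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")

def subject_indices(tokenizer, prompt, subject):
    """Return the positions of `subject`'s tokens within the tokenized prompt
    (illustrative helper; inspect tokenizer.tokenize(prompt) if no match is found)."""
    prompt_ids = tokenizer(prompt, add_special_tokens=True).input_ids
    subject_ids = tokenizer(subject, add_special_tokens=False).input_ids
    for start in range(len(prompt_ids) - len(subject_ids) + 1):
        if prompt_ids[start : start + len(subject_ids)] == subject_ids:
            return list(range(start, start + len(subject_ids)))
    return []

prompt = "a cat and a dog"
t5_ids = [subject_indices(t5_tok, prompt, s) for s in ["cat", "dog"]]
clip_ids = [subject_indices(clip_tok, prompt, s) for s in ["cat", "dog"]]
print(t5_ids, clip_ids)  # pass these to Controller(t5_ids=..., clip_ids=...)
```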
We provide two training scripts (one per base model). Both follow the same logic.
Let `num_traj` be the number of trajectories per iteration and `k` the number of time steps per trajectory; the effective batch size is `num_traj × k`. We further sub‑sample time steps via `--sub-start`, `--sub-end`, and `--sub-extra` to reduce memory and avoid overfitting, as sketched below.
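For intuition, here is one plausible reading of the sub‑sampling flags as a minimal sketch; the helper name and the exact selection rule are ours, so consult the training scripts for the authoritative behavior:

```python
import random

def subsample_steps(k, sub_start, sub_end, sub_extra, seed=0):
    """Pick a subset of the k time steps per trajectory (illustrative reading:
    keep the first `sub_start` and last `sub_end` steps, plus `sub_extra`
    randomly drawn interior steps)."""
    rng = random.Random(seed)
    head = list(range(sub_start))       # earliest steps
    tail = list(range(k - sub_end, k))  # latest steps
    middle = [i for i in range(k) if i not in head and i not in tail]
    extra = rng.sample(middle, min(sub_extra, len(middle)))
    return sorted(set(head + tail + extra))

# With the defaults (k=28, sub_start=0, sub_end=0, sub_extra=5), each update
# uses only 5 of the 28 steps per trajectory, shrinking the effective batch
# from num_traj * 28 to num_traj * 5.
print(subsample_steps(k=28, sub_start=0, sub_end=0, sub_extra=5))
```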
| Group | Flag | Default | Description |
|---|---|---|---|
| Setup | `--res-dir` | – | Directory to save results/checkpoints |
| | `--dataset` | – | Path to a pre‑encoded dataset; examples in `datasets/data_finetuning/` |
| | `--ckpt-every` | 100 | Save a checkpoint every N iterations |
| | `--verbose` | 1 | 0: warning, 1: info, 2: debug |
| Training | `--num-iterations` | 400 | Number of training iterations |
| | `--learning-rate` | 1e-5 | Optimizer learning rate |
| | `--dtype` | bfloat16 | Compute dtype (e.g., bfloat16, float32) |
| Scheduler / Solver | `--num-traj` | 5 | Trajectories sampled per iteration |
| | `--k` | 28 | Time steps per trajectory |
| | `--lambda-value` | 1.0 | λ in the reward/objective |
| Sub‑sampling | `--sub-start` | 0 | Start index for sub‑sampling time steps |
| | `--sub-end` | 0 | End index for sub‑sampling time steps |
| | `--sub-extra` | 5 | Additional random time steps |
| LoRA | `--lora-rank` | 4 | LoRA rank |
| | `--lora-alpha` | 16 | LoRA α |
| | `--lora-dropout` | 0.05 | LoRA dropout |
| Misc | `--seed` | 42 | Random seed |
| | `--image-size` | 256 | Training image size |
```bash
python finetune_sd3.py \
    --dataset datasets/data_finetuning/example.yaml \
    --res-dir runs/sd3_focus \
    --num-iterations 200 --learning-rate 5e-5 \
    --num-traj 5 --k 28 --lambda-value 1.0 \
    --sub-start 0 --sub-end 0 --sub-extra 16 \
    --lora-rank 4 --lora-alpha 16 --lora-dropout 0.05 \
    --dtype bfloat16 --image-size 256 --seed 42
```

`sample.py` can generate images using test‑time control, fine‑tuned weights, or both. To disable the test‑time controller, set `--lambda-scale 0`.
| Group | Flag | Default | Description |
|---|---|---|---|
| Paths | `--exp-dir` | images | Output directory |
| | `--dataset` | (required) | Path to a dataset YAML of prompts; examples in `datasets/` |
| Weights | `--path` | – | Path to fine‑tuned weights (optional) |
| Sampler | `--image-size` | 512 | Output resolution |
| | `--num-steps` | model default | ODE/diffusion steps |
| | `--guidance-scale` | model default | Classifier‑free guidance |
| Seeds | `--seed` | 0 | Single seed |
| | `--seed-range` | – | Range `a b` to sweep multiple seeds |
| Controller | `--lambda-scale` | 1.0 | λ for the controller; set to 0 to disable |
| | `--heuristic` | focus | One of {focus, conform, attend_and_excite, divide_and_bind} |
| | `--model` | SD3 | Base model: {SD3, FLUX} |
| Memory | `--save-memory` | 0 | 0: none, 1: offload, 2: sequential offload, 3: + grad checkpointing |
```bash
python sample.py \
    --dataset datasets/natural_prompts.yaml \
    --exp-dir outputs/sd3_focus \
    --model SD3 --heuristic focus --lambda-scale 4 \
    --image-size 512 --seed-range 0 10
```

Use `eval_metrics.py` to compute all metrics reported in the paper.
- Generate a set of images with `sample.py`.
- Run `eval_metrics.py` on the produced images.
- The script writes a CSV next to the image folder with per‑image metrics.
```bash
python eval_metrics.py \
    --data-dir outputs/sd3_focus \
    --batch-size 16
```

All datasets used in the paper are provided in `datasets/` as simple YAML files that list, for each prompt, the subjects and their corresponding T5 and CLIP token indices. For fine‑tuning, we additionally provide pre‑encoded versions of `datasets/two_objects.yaml` and `datasets/horse_bear.yaml` in `datasets/data_finetuning/`.
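To inspect one of these files programmatically, a PyYAML one‑liner suffices; note that the schema sketched in the comment below is only our reading of the description above, so treat the actual files in `datasets/` as ground truth:

```python
import yaml  # pip install pyyaml

# Load a prompt dataset and inspect its structure.
with open("datasets/two_objects.yaml") as f:
    data = yaml.safe_load(f)

# Expected per-prompt fields (our reading, not an official schema): the prompt
# text, its subjects, and each subject's token indices in the T5 and CLIP
# encoders, roughly:
#   prompt: "a cat and a dog"
#   subjects: ["cat", "dog"]
#   t5_ids: [[1], [5]]
#   clip_ids: [[2], [5]]
print(data)
```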
We released our best‑performing checkpoints for FLUX and SD3 on Hugging Face.
- OOM / CUDA out of memory: reduce `--image-size`, use `--save-memory 2` or `3`, or reduce `--k` / `--num-traj`.
- No improvement from the controller: tune `--lambda-scale`, verify the subject indices (`t5_ids`, `clip_ids`), and try an alternative heuristic (`conform`, `attend_and_excite`, `divide_and_bind`).
- Slow sampling: decrease the number of steps, enable memory‑saving modes, or disable gradient‑heavy options in the notebooks.
If you find our work useful, please consider citing:
```bibtex
@misc{bill2025focus,
    title         = {Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity},
    author        = {Eric Tillmann Bill and Enis Simsar and Thomas Hofmann},
    year          = {2025},
    eprint        = {2505.19166},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}
```

Questions or issues? Please open a GitHub issue or reach out via the project website.