Scope plugin providing the LTX 2.3 audio-video generation pipeline from Lightricks.
LTX 2.3 is a 22B-parameter DiT (Diffusion Transformer) that generates synchronized video and audio from text prompts. This plugin uses ComfyUI-derived model loading and inference code with Kijai's separated FP8 checkpoints, enabling it to run on a single 24GB GPU.
> **Important**
> Plugin support is a preview feature in Scope; APIs may change before a stable release. Use a Scope build that includes plugin support (manual installation).
- Audio–video generation — synchronized video and audio from text prompts
- Modes — text (default) and video; video mode supports IC-LoRA guide conditioning using a control video (e.g. depth / canny / pose) on the graph video input
- Image-to-video (I2V) — optional reference image (`i2v_image`) with adjustable I2V strength (`i2v_strength`) to condition the first frame; set strength to `0` for pure text-to-video
- LoRA — permanent merge at load into the FP8 transformer (dequantize → merge → requantize); zero runtime LoRA overhead; compatible with block streaming. Additional LoRA files go under your Scope models directory (e.g. `models/lora/`) and are selected via the pipeline LoRA UI or `loras` in `/load`
- IC-LoRA Union Control — weights from `Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control` are downloaded with the pipeline; they are merged when you use video-mode guide conditioning (or you can list the safetensors explicitly in `loras`). Use control strength (`control_strength`) to blend the guide
- Sampling — default 8-step distilled Euler schedule; optional schedules: `linear`, `cosine`, `linear_quadratic`, `beta`; configurable denoising steps (`num_steps`). Advanced: a custom `sigmas` list (API) overrides step count and schedule
- Output constraints — height/width snapped to 32-pixel multiples; frame count snapped to 8×K+1 (minimum 9)
- Runs on 24GB GPUs — FP8 weights in checkpoints, CPU-resident transformer blocks with double-buffered streaming to GPU
- Configurable output — resolution, frame count, frame rate, seed / randomize seed per chunk, FFN chunk size for memory tuning
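The output constraints above can be sketched as small snapping helpers. These are hypothetical illustrations, not the plugin's actual functions, and nearest-value rounding is an assumption (the source only says values are "snapped"):

```python
def snap_dim(px: int) -> int:
    # Height/width must be a multiple of 32 (minimum 32).
    return max(32, round(px / 32) * 32)

def snap_num_frames(n: int) -> int:
    # Frame count must satisfy 8*K + 1 with a minimum of 9: 9, 17, ..., 129.
    k = max(1, round((n - 1) / 8))
    return 8 * k + 1
```

For example, a requested 100×100 output at 130 frames would become 96×96 at 129 frames under these rules.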
- VRAM: ~22GB (24GB GPU recommended, e.g. RTX 4090 / A5000)
- Gemma 3 12B FP8 text encoder: ~13GB (offloaded after encoding)
- Transformer 22B FP8: ~23GB total, CPU-resident with block streaming
- Video VAE + Audio VAE + vocoder: ~1GB (GPU-resident)
- Python: 3.12+
- CUDA: 12.8+
Weights are pulled from these Hugging Face repositories:
| Repository | Contents |
|---|---|
| Kijai/LTX2.3_comfy | Transformer (22B distilled v3 FP8), text projection, video VAE, audio VAE (includes vocoder weights used at decode) |
| Comfy-Org/ltx-2 | Gemma 3 12B FP8 text encoder (includes embedded SentencePiece tokenizer) |
| Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control | IC-LoRA Union Control safetensors for video-mode guide conditioning |
The Gemma model architecture config is bundled with this plugin — no separate download from google/gemma-3-12b-it is needed. The tokenizer is extracted at runtime from the FP8 checkpoint's embedded spiece_model tensor.
Follow the manual installation instructions for Scope (plugin support for the desktop app is not available yet).
Install the plugin within the scope directory:
```shell
DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope install git+https://github.com/daydreamlive/scope-ltx-2.git
```

Confirm that the plugin is installed:

```shell
DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope plugins
```

Confirm that the ltx2 pipeline is available:

```shell
DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope pipelines
```

To upgrade to the latest version of the plugin:

```shell
DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope install --upgrade git+https://github.com/daydreamlive/scope-ltx-2.git
```

Create a HuggingFace access token with read permissions at huggingface.co/settings/tokens, then set:
Windows Command Prompt:

```shell
set HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Windows PowerShell:

```shell
$env:HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

Unix/Linux/macOS:

```shell
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

> **Tip**
> Add the export to your shell profile (`~/.bashrc`, `~/.zshrc`, etc.) to persist the token.
Start Scope:

```shell
uv run daydream-scope
```

The web UI defaults to http://localhost:8000. Select pipeline `ltx2` in Settings, or load it via the API (load pipeline).

Prefetch weights without the UI:

```shell
uv run download_models --pipeline ltx2
```

| Parameter | Default | Description |
|---|---|---|
| `height` / `width` | 384 / 320 | Output size in pixels (snapped to multiples of 32) |
| `base_seed` | 42 | Base seed when not randomizing |
| `randomize_seed` | `true` | New random seed each inference chunk |
| `num_frames` | 129 | Frame count (snapped to 8×K+1; e.g. 9, 17, …, 129) |
| `num_steps` | 8 | Euler denoising steps (1–20) |
| `schedule` | `"distilled"` | Sigma schedule: `distilled`, `linear`, `cosine`, `linear_quadratic`, `beta` |
| `frame_rate` | 24.0 | Metadata / output frame rate |
| `lora_merge_strategy` | `"permanent_merge"` | Only `permanent_merge` is supported for this FP8 model |
| `i2v_image` | — | Optional path or asset for image-to-video first-frame conditioning |
| `i2v_strength` | 1.0 | 0 = no I2V conditioning, 1 = full first-frame conditioning |
| `control_strength` | 1.0 | Video mode: IC-LoRA guide strength (0 = off, 1 = full) |
| `ffn_chunk_size` | 4096 | FFN chunking for memory (smaller = less VRAM, more overhead; `null` disables) |
| `sigmas` | — | API: custom descending sigma list; overrides `num_steps` and `schedule` |
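The `sigmas` override takes a descending list. A minimal sketch of building a linear one — whether the pipeline expects `num_steps` or `num_steps + 1` values is an assumption here; n+1 values ending at 0 is the common Euler convention:

```python
def linear_sigmas(num_steps: int, sigma_max: float = 1.0) -> list[float]:
    # num_steps denoising steps -> num_steps + 1 sigmas,
    # descending linearly from sigma_max to 0.
    return [sigma_max * (1 - i / num_steps) for i in range(num_steps + 1)]

# linear_sigmas(4) -> [1.0, 0.75, 0.5, 0.25, 0.0]
```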
LoRAs: Pass loras as a list of { "path": "...", "scale": 1.0 } in the /load body (paths are typically under your models tree). The IC-LoRA file is downloaded automatically; it is merged when guide video conditioning is used, unless you already included that file in loras.
> **Note**
> Which parameters appear in the Scope UI depends on your Scope version. Anything exposed in the pipeline JSON schema can be set via `/load` if needed.
```shell
curl -X POST http://localhost:8000/load \
  -H "Content-Type: application/json" \
  -d '{
    "pipeline_id": "ltx2",
    "params": {
      "height": 384,
      "width": 320,
      "num_frames": 129,
      "num_steps": 8,
      "schedule": "distilled",
      "randomize_seed": true,
      "frame_rate": 24.0,
      "ffn_chunk_size": 4096
    }
  }'
```

Valid frame counts follow 8×K+1 (minimum 9): 9, 17, 25, 33, … Other values are snapped to the nearest valid count.
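For API clients, the same `/load` body — including a `loras` list — can be assembled programmatically. A sketch, where the LoRA filename is a hypothetical example and the body shape mirrors the curl call and the `loras` format described earlier:

```python
import json

def build_load_body(loras=None, **params) -> str:
    # Assemble a JSON /load request body for the ltx2 pipeline.
    # `loras` is a list of (path, scale) pairs, serialized to the
    # {"path": ..., "scale": ...} shape the endpoint expects.
    body = {"pipeline_id": "ltx2", "params": dict(params)}
    if loras:
        body["params"]["loras"] = [{"path": p, "scale": s} for p, s in loras]
    return json.dumps(body)

# Example: shorter clip plus one LoRA at 80% strength.
payload = build_load_body(
    loras=[("models/lora/my_style.safetensors", 0.8)],  # hypothetical file
    num_frames=33,
    num_steps=8,
)
```

The resulting string can be POSTed to `http://localhost:8000/load` with any HTTP client.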
- Lower resolution and/or fewer frames
- Lower `ffn_chunk_size` (e.g. 2048 or 1024)
- Set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to reduce fragmentation
- No frame-by-frame streaming — each run produces a full clip; latency is batch generation time, not interactive streaming
- LoRA strategy — permanent merge only for FP8 (no runtime PEFT path for this pipeline)
- Quantization — transformer FP8 is fixed by the published checkpoint; there is no separate “pick your dtype” mode in-plugin
Exact Scope UI coverage for every parameter can lag the schema; use /load when a control is not in the UI yet.
- Gemma 3 12B FP8 text encoder, aggregate embedding projection
- 22B transformer for joint audio–video denoising (FP8 scaled matmul where applicable)
- Video VAE — 32× spatial, 8× temporal downsampling
- Audio VAE + vocoder — mel decode to waveform (aligned with ComfyUI-style audio stack)
- Euler sampling with configurable sigma schedules; the default distilled 8-step schedule is used when `num_steps` is 8 and `schedule` is `distilled`
- CPU→GPU block streaming — transformer blocks in pinned host memory, double-buffered async copies during denoising
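The double-buffered streaming loop can be illustrated with a toy Python sketch — no CUDA involved; `load` stands in for the async host→device copy and `compute` for running a transformer block, and a single-worker executor plays the role of the copy stream:

```python
from concurrent.futures import ThreadPoolExecutor

def run_blocks(blocks, load, compute):
    # Double buffering: while block i is computing, block i+1
    # is already being "copied" by the background worker.
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(load, blocks[0])  # prefetch first block
        outputs = []
        for i in range(len(blocks)):
            current = pending.result()            # wait for the copy
            if i + 1 < len(blocks):
                pending = copier.submit(load, blocks[i + 1])  # prefetch next
            outputs.append(compute(current))      # overlaps with the copy
        return outputs
```

In the real pipeline the copy and compute run on separate CUDA streams, so each block's transfer is hidden behind the previous block's matmuls.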
Details: Lightricks LTX-2.
- Reduce `num_frames` (e.g. 33 instead of 129)
- Reduce resolution
- Reduce `ffn_chunk_size`
- Close other GPU workloads
- Invalid token: set `HF_TOKEN` correctly; the token needs at least read access.
- Repository not found: confirm HF account email verification if required.
- General: check network connectivity; run `huggingface-cli login` to verify the token.
Generation time scales with frames, resolution, and GPU/PCIe throughput (weight streaming).
This plugin is licensed under the same terms as the LTX-2 model.
- Lightricks for LTX-2
- Kijai for separated Comfy-format FP8 checkpoints
- ComfyUI for patterns this plugin adapts
- Daydream for Scope