scope-ltx-2

Available on Daydream

Discord

Scope plugin providing the LTX 2.3 audio-video generation pipeline from Lightricks.

LTX 2.3 is a 22B-parameter DiT (Diffusion Transformer) that generates synchronized video and audio from text prompts. This plugin uses ComfyUI-derived model loading and inference code with Kijai's separated FP8 checkpoints, enabling it to run on a single 24GB GPU.

Important

Plugin support is a preview feature in Scope; APIs may change before a stable release. Use a Scope build that includes plugin support (manual installation).

Features

  • Audio–video generation — synchronized video and audio from text prompts
  • Modes — text (default) and video; video mode supports IC-LoRA guide conditioning using a control video (e.g. depth / canny / pose) on the graph video input
  • Image-to-video (I2V) — optional reference image (i2v_image) with adjustable I2V strength (i2v_strength) to condition the first frame; set strength to 0 for pure text-to-video
  • LoRA — permanent merge at load into the FP8 transformer (dequantize → merge → requantize); zero runtime LoRA overhead; compatible with block streaming. Additional LoRA files go under your Scope models directory (e.g. models/lora/) and are selected via the pipeline LoRA UI or loras in /load
  • IC-LoRA Union Control — weights from Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control are downloaded with the pipeline; they are merged when you use video-mode guide conditioning (or you can list the safetensors explicitly in loras). Use control strength (control_strength) to blend the guide
  • Sampling — default 8-step distilled Euler schedule; optional schedules: linear, cosine, linear_quadratic, beta; configurable denoising steps (num_steps). Advanced: custom sigmas list (API) overrides step count and schedule
  • Output constraints — height/width snapped to 32-pixel multiples; frame count snapped to 8×K+1 (minimum 9)
  • Runs on 24GB GPUs — FP8 weights in checkpoints, CPU-resident transformer blocks with double-buffered streaming to GPU
  • Configurable output — resolution, frame count, frame rate, seed / randomize seed per chunk, FFN chunk size for memory tuning
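
To make the permanent-merge LoRA path concrete, here is a toy sketch of the dequantize → merge → requantize sequence described above. This is an illustrative stand-in, not the plugin's actual FP8 code: the helper names are hypothetical, and a single per-tensor scale stands in for real FP8 scaling, which is finer-grained.

```python
def matmul(A, B):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def merge_lora(W_q, w_scale, lora_A, lora_B, lora_scale=1.0):
    """Toy permanent LoRA merge: dequantize -> add low-rank delta -> requantize.

    W_q        : quantized weight (here plain numbers scaled by w_scale)
    w_scale    : per-tensor dequantization scale (real FP8 uses finer scales)
    lora_A     : (rank x in)  LoRA down-projection
    lora_B     : (out x rank) LoRA up-projection
    lora_scale : blend strength for the LoRA update
    """
    # 1. dequantize the base weight
    W = [[w * w_scale for w in row] for row in W_q]
    # 2. merge the low-rank update: W += lora_scale * (B @ A)
    delta = matmul(lora_B, lora_A)
    W = [[w + lora_scale * d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]
    # 3. requantize with the same scale (real code would recompute scales)
    return [[w / w_scale for w in row] for row in W]
```

Because the merge happens once at load, inference afterwards pays no per-step LoRA cost, which is why it composes cleanly with block streaming.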

Requirements

  • VRAM: ~22GB (24GB GPU recommended, e.g. RTX 4090 / A5000)
    • Gemma 3 12B FP8 text encoder: ~13GB (offloaded after encoding)
    • Transformer 22B FP8: ~23GB total, CPU-resident with block streaming
    • Video VAE + Audio VAE + vocoder: ~1GB (GPU-resident)
  • Python: 3.12+
  • CUDA: 12.8+

Supported Models

Weights are pulled from these Hugging Face repositories:

  • Kijai/LTX2.3_comfy — Transformer (22B distilled v3 FP8), text projection, video VAE, audio VAE (includes vocoder weights used at decode)
  • Comfy-Org/ltx-2 — Gemma 3 12B FP8 text encoder (includes embedded SentencePiece tokenizer)
  • Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control — IC-LoRA Union Control safetensors for video-mode guide conditioning

The Gemma model architecture config is bundled with this plugin — no separate download from google/gemma-3-12b-it is needed. The tokenizer is extracted at runtime from the FP8 checkpoint's embedded spiece_model tensor.

Install

Follow the manual installation instructions for Scope (plugin support for the desktop app is not available yet).

Install the plugin within the scope directory:

DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope install git+https://github.com/daydreamlive/scope-ltx-2.git

Confirm that the plugin is installed:

DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope plugins

Confirm that the ltx2 pipeline is available:

DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope pipelines

Upgrade

DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope install --upgrade git+https://github.com/daydreamlive/scope-ltx-2.git

Usage

Step 1: Configure HuggingFace Token

Create a HuggingFace access token with read permissions at huggingface.co/settings/tokens, then set:

Windows Command Prompt:

set HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Windows PowerShell:

$env:HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Unix/Linux/macOS:

export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Tip

Add the export to your shell profile (~/.bashrc, ~/.zshrc, etc.) to persist the token.

Step 2: Run Scope

uv run daydream-scope

The web UI defaults to http://localhost:8000. Select pipeline ltx2 in Settings, or load it via the API (load pipeline).

Prefetch weights without the UI:

uv run download_models --pipeline ltx2

Configuration Options

  • height / width (default 384 / 320) — Output size in pixels (snapped to multiples of 32)
  • base_seed (default 42) — Base seed when not randomizing
  • randomize_seed (default true) — New random seed each inference chunk
  • num_frames (default 129) — Frame count (snapped to 8×K+1; e.g. 9, 17, …, 129)
  • num_steps (default 8) — Euler denoising steps (1–20)
  • schedule (default "distilled") — Sigma schedule: distilled, linear, cosine, linear_quadratic, beta
  • frame_rate (default 24.0) — Metadata / output frame rate
  • lora_merge_strategy (default "permanent_merge") — Only permanent_merge is supported for this FP8 model
  • i2v_image (unset by default) — Optional path or asset for image-to-video first-frame conditioning
  • i2v_strength (default 1.0) — 0 = no I2V conditioning, 1 = full first-frame conditioning
  • control_strength (default 1.0) — Video mode: IC-LoRA guide strength (0 = off, 1 = full)
  • ffn_chunk_size (default 4096) — FFN chunking for memory (smaller = less VRAM, more overhead; null disables)
  • sigmas (API only; unset by default) — Custom descending sigma list; overrides num_steps and schedule

LoRAs: Pass loras as a list of { "path": "...", "scale": 1.0 } in the /load body (paths are typically under your models tree). The IC-LoRA file is downloaded automatically; it is merged when guide video conditioning is used, unless you already included that file in loras.
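
As a sketch, a /load body with a LoRA list can be assembled like this. The endpoint, pipeline_id, and field names come from this README; the file path and helper name are hypothetical placeholders.

```python
import json

def build_load_body(lora_paths, scale=1.0, **params):
    """Assemble a /load request body with a loras list (hypothetical helper)."""
    return {
        "pipeline_id": "ltx2",
        "params": {
            **params,
            "loras": [{"path": p, "scale": scale} for p in lora_paths],
        },
    }

# Hypothetical LoRA path under the Scope models tree:
body = build_load_body(["models/lora/my_style.safetensors"], scale=0.8, num_steps=8)
print(json.dumps(body, indent=2))
# POST this JSON to http://localhost:8000/load, e.g. with curl as in the example below.
```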

Note

Which parameters appear in the Scope UI depends on your Scope version. Anything exposed in the pipeline JSON schema can be set via /load if needed.

Example /load body

curl -X POST http://localhost:8000/load \
  -H "Content-Type: application/json" \
  -d '{
    "pipeline_id": "ltx2",
    "params": {
      "height": 384,
      "width": 320,
      "num_frames": 129,
      "num_steps": 8,
      "schedule": "distilled",
      "randomize_seed": true,
      "frame_rate": 24.0,
      "ffn_chunk_size": 4096
    }
  }'

Frame count

Valid counts follow 8×K+1 (minimum 9): 9, 17, 25, 33, … Other values are snapped to the nearest valid count.
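
The snapping rules can be sketched as follows. Nearest-value rounding is an assumption here; the pipeline's exact rounding direction may differ.

```python
def snap_frames(n, min_frames=9):
    """Snap a frame count to the 8*K + 1 grid (9, 17, 25, ...)."""
    k = round((n - 1) / 8)
    return max(min_frames, 8 * k + 1)

def snap_dim(px, multiple=32):
    """Snap a pixel dimension to a multiple of 32 (minimum one tile)."""
    return max(multiple, round(px / multiple) * multiple)

# e.g. snap_frames(129) -> 129 (already valid), snap_frames(100) -> 97
```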

Memory optimization

  1. Lower resolution and/or fewer frames
  2. Lower ffn_chunk_size (e.g. 2048 or 1024)
  3. PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce fragmentation

Limitations

  • No frame-by-frame streaming — each run produces a full clip; latency is batch generation time, not interactive streaming
  • LoRA strategy — permanent merge only for FP8 (no runtime PEFT path for this pipeline)
  • Quantization — transformer FP8 is fixed by the published checkpoint; there is no separate “pick your dtype” mode in-plugin

Exact Scope UI coverage for every parameter can lag the schema; use /load when a control is not in the UI yet.

Architecture

  • Gemma 3 12B FP8 text encoder, aggregate embedding projection
  • 22B transformer for joint audio–video denoising (FP8 scaled matmul where applicable)
  • Video VAE — 32× spatial, 8× temporal downsampling
  • Audio VAE + vocoder — mel decode to waveform (aligned with ComfyUI-style audio stack)
  • Euler sampling with configurable sigma schedules; default distilled 8-step schedule when num_steps is 8 and schedule is distilled
  • CPU→GPU block streaming — transformer blocks in pinned host memory, double-buffered async copies during denoising
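
The double-buffered streaming loop can be illustrated with a toy simulation: while one block is being "computed", a background thread "copies" the next one, standing in for the async pinned-memory H2D copies described above. None of these names come from the plugin.

```python
import threading
from queue import Queue

def stream_blocks(blocks, copy_to_gpu, compute, x):
    """Toy double-buffered streaming: overlap each block's compute with
    the prefetch ("copy") of the next block on a background thread."""
    if not blocks:
        return x
    staged = copy_to_gpu(blocks[0])            # stage the first block up front
    for i in range(len(blocks)):
        prefetched = Queue(maxsize=1)
        worker = None
        if i + 1 < len(blocks):                # overlap: prefetch block i+1
            worker = threading.Thread(
                target=lambda b=blocks[i + 1]: prefetched.put(copy_to_gpu(b)))
            worker.start()
        x = compute(staged, x)                 # "denoise" with the staged block
        if worker is not None:
            worker.join()
            staged = prefetched.get()
    return x

# Stand-ins: identity "copy" and multiplicative "transformer blocks".
result = stream_blocks([2, 3, 4], lambda b: b, lambda w, x: w * x, 1)  # -> 24
```

In the real pipeline the compute side is the denoising step and the copy side is a CUDA stream; the point is that only two blocks' worth of GPU memory is ever resident.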

Details: Lightricks LTX-2.

Troubleshooting

Out of memory (OOM)

  1. Reduce num_frames (e.g. 33 instead of 129)
  2. Reduce resolution
  3. Reduce ffn_chunk_size
  4. Close other GPU workloads

Model download fails

Invalid token: set HF_TOKEN correctly; token needs at least read access.

Repository not found: confirm HF account email verification if required.

General: check network; huggingface-cli login to verify the token.

Slow generation

Generation time scales with frames, resolution, and GPU/PCIe throughput (weight streaming).

License

This plugin is licensed under the same terms as the LTX-2 model.

Acknowledgments
