scope-ltx-2

Available on Daydream

Discord

Scope plugin providing the LTX 2.3 audio-video generation pipeline from Lightricks.

LTX 2.3 is a 22B-parameter DiT (Diffusion Transformer) that generates synchronized video and audio from text prompts. This plugin uses ComfyUI-derived model loading and inference code with Kijai's separated FP8 checkpoints, enabling it to run on a single 24GB GPU.

Important

Plugin support is a preview feature in Scope; APIs may change before a stable release. Use a Scope build that includes plugin support (manual installation).

Features

  • Audio–video generation — synchronized video and audio from text prompts
  • Modes — text (default) and video; video mode supports IC-LoRA guide conditioning using a control video (e.g. depth / canny / pose) on the graph video input
  • Image-to-video (I2V) — optional reference image (i2v_image) with adjustable I2V strength (i2v_strength) to condition the first frame; set strength to 0 for pure text-to-video
  • LoRA — permanent merge at load into the FP8 transformer (dequantize → merge → requantize); zero runtime LoRA overhead; compatible with block streaming. Additional LoRA files go under your Scope models directory (e.g. models/lora/) and are selected via the pipeline LoRA UI or loras in /load
  • IC-LoRA Union Control — weights from Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control are downloaded with the pipeline; they are merged when you use video-mode guide conditioning (or you can list the safetensors explicitly in loras). Use control strength (control_strength) to blend the guide
  • Sampling — default 8-step distilled Euler schedule; optional schedules: linear, cosine, linear_quadratic, beta; configurable denoising steps (num_steps). Advanced: custom sigmas list (API) overrides step count and schedule
  • Output constraints — height/width snapped to 32-pixel multiples; frame count snapped to 8×K+1 (minimum 9)
  • Runs on 24GB GPUs — FP8 weights in checkpoints, CPU-resident transformer blocks with double-buffered streaming to GPU
  • Configurable output — resolution, frame count, frame rate, seed / randomize seed per chunk, FFN chunk size for memory tuning
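
To make the permanent-merge LoRA path concrete, here is a toy sketch of the dequantize → merge → requantize sequence described above. This is an illustrative stand-in, not the plugin's actual FP8 code: the helper names are hypothetical, and a single per-tensor scale stands in for real FP8 scaling, which is finer-grained.

```python
def matmul(A, B):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def merge_lora(W_q, w_scale, lora_A, lora_B, lora_scale=1.0):
    """Toy permanent LoRA merge: dequantize -> add low-rank delta -> requantize.

    W_q        : quantized weight (here plain numbers scaled by w_scale)
    w_scale    : per-tensor dequantization scale (real FP8 uses finer scales)
    lora_A     : (rank x in)  LoRA down-projection
    lora_B     : (out x rank) LoRA up-projection
    lora_scale : blend strength for the LoRA update
    """
    # 1. dequantize the base weight
    W = [[w * w_scale for w in row] for row in W_q]
    # 2. merge the low-rank update: W += lora_scale * (B @ A)
    delta = matmul(lora_B, lora_A)
    W = [[w + lora_scale * d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]
    # 3. requantize with the same scale (real code would recompute scales)
    return [[w / w_scale for w in row] for row in W]
```

Because the merge happens once at load, inference afterwards pays no per-step LoRA cost, which is why it composes cleanly with block streaming.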

Requirements

  • VRAM: ~22GB (24GB GPU recommended, e.g. RTX 4090 / A5000)
    • Gemma 3 12B FP8 text encoder: ~13GB (offloaded after encoding)
    • Transformer 22B FP8: ~23GB total, CPU-resident with block streaming
    • Video VAE + Audio VAE + vocoder: ~1GB (GPU-resident)
  • Python: 3.12+
  • CUDA: 12.8+

Supported Models

Weights are pulled from these Hugging Face repositories:

  • Kijai/LTX2.3_comfy — Transformer (22B distilled v3 FP8), text projection, video VAE, audio VAE (includes vocoder weights used at decode)
  • Comfy-Org/ltx-2 — Gemma 3 12B FP8 text encoder (includes embedded SentencePiece tokenizer)
  • Lightricks/LTX-2.3-22b-IC-LoRA-Union-Control — IC-LoRA Union Control safetensors for video-mode guide conditioning

The Gemma model architecture config is bundled with this plugin — no separate download from google/gemma-3-12b-it is needed. The tokenizer is extracted at runtime from the FP8 checkpoint's embedded spiece_model tensor.

Install

Follow the manual installation instructions for Scope (plugin support for the desktop app is not available yet).

Install the plugin within the scope directory:

DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope install git+https://github.com/daydreamlive/scope-ltx-2.git

Confirm that the plugin is installed:

DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope plugins

Confirm that the ltx2 pipeline is available:

DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope pipelines

Upgrade

DAYDREAM_SCOPE_PREVIEW=1 uv run daydream-scope install --upgrade git+https://github.com/daydreamlive/scope-ltx-2.git

Usage

Step 1: Configure HuggingFace Token

Create a HuggingFace access token with read permissions at huggingface.co/settings/tokens, then set:

Windows Command Prompt:

set HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Windows PowerShell:

$env:HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Unix/Linux/macOS:

export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Tip

Add the export to your shell profile (~/.bashrc, ~/.zshrc, etc.) to persist the token.

Step 2: Run Scope

uv run daydream-scope

The web UI defaults to http://localhost:8000. Select pipeline ltx2 in Settings, or load it via the API (load pipeline).

Prefetch weights without the UI:

uv run download_models --pipeline ltx2

Configuration Options

  • height / width (default 384 / 320) — Output size in pixels (snapped to multiples of 32)
  • base_seed (default 42) — Base seed when not randomizing
  • randomize_seed (default true) — New random seed each inference chunk
  • num_frames (default 129) — Frame count (snapped to 8×K+1; e.g. 9, 17, …, 129)
  • num_steps (default 8) — Euler denoising steps (1–20)
  • schedule (default "distilled") — Sigma schedule: distilled, linear, cosine, linear_quadratic, beta
  • frame_rate (default 24.0) — Metadata / output frame rate
  • lora_merge_strategy (default "permanent_merge") — Only permanent_merge is supported for this FP8 model
  • i2v_image (unset by default) — Optional path or asset for image-to-video first-frame conditioning
  • i2v_strength (default 1.0) — 0 = no I2V conditioning, 1 = full first-frame conditioning
  • control_strength (default 1.0) — Video mode: IC-LoRA guide strength (0 = off, 1 = full)
  • ffn_chunk_size (default 4096) — FFN chunking for memory (smaller = less VRAM, more overhead; null disables)
  • sigmas (API only; unset by default) — Custom descending sigma list; overrides num_steps and schedule

LoRAs: Pass loras as a list of { "path": "...", "scale": 1.0 } in the /load body (paths are typically under your models tree). The IC-LoRA file is downloaded automatically; it is merged when guide video conditioning is used, unless you already included that file in loras.
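
As a sketch, a /load body with a LoRA list can be assembled like this. The endpoint, pipeline_id, and field names come from this README; the file path and helper name are hypothetical placeholders.

```python
import json

def build_load_body(lora_paths, scale=1.0, **params):
    """Assemble a /load request body with a loras list (hypothetical helper)."""
    return {
        "pipeline_id": "ltx2",
        "params": {
            **params,
            "loras": [{"path": p, "scale": scale} for p in lora_paths],
        },
    }

# Hypothetical LoRA path under the Scope models tree:
body = build_load_body(["models/lora/my_style.safetensors"], scale=0.8, num_steps=8)
print(json.dumps(body, indent=2))
# POST this JSON to http://localhost:8000/load, e.g. with curl as in the example below.
```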

Note

Which parameters appear in the Scope UI depends on your Scope version. Anything exposed in the pipeline JSON schema can be set via /load if needed.

Example /load body

curl -X POST http://localhost:8000/load \
  -H "Content-Type: application/json" \
  -d '{
    "pipeline_id": "ltx2",
    "params": {
      "height": 384,
      "width": 320,
      "num_frames": 129,
      "num_steps": 8,
      "schedule": "distilled",
      "randomize_seed": true,
      "frame_rate": 24.0,
      "ffn_chunk_size": 4096
    }
  }'

Frame count

Valid counts follow 8×K+1 (minimum 9): 9, 17, 25, 33, … Other values are snapped to the nearest valid count.
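
The snapping rules can be sketched as follows. Nearest-value rounding is an assumption here; the pipeline's exact rounding direction may differ.

```python
def snap_frames(n, min_frames=9):
    """Snap a frame count to the 8*K + 1 grid (9, 17, 25, ...)."""
    k = round((n - 1) / 8)
    return max(min_frames, 8 * k + 1)

def snap_dim(px, multiple=32):
    """Snap a pixel dimension to a multiple of 32 (minimum one tile)."""
    return max(multiple, round(px / multiple) * multiple)

# e.g. snap_frames(129) -> 129 (already valid), snap_frames(100) -> 97
```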

Memory optimization

  1. Lower resolution and/or fewer frames
  2. Lower ffn_chunk_size (e.g. 2048 or 1024)
  3. PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce fragmentation

Limitations

  • No frame-by-frame streaming — each run produces a full clip; latency is batch generation time, not interactive streaming
  • LoRA strategy — permanent merge only for FP8 (no runtime PEFT path for this pipeline)
  • Quantization — transformer FP8 is fixed by the published checkpoint; there is no separate “pick your dtype” mode in-plugin

Exact Scope UI coverage for every parameter can lag the schema; use /load when a control is not in the UI yet.

Architecture

  • Gemma 3 12B FP8 text encoder, aggregate embedding projection
  • 22B transformer for joint audio–video denoising (FP8 scaled matmul where applicable)
  • Video VAE — 32× spatial, 8× temporal downsampling
  • Audio VAE + vocoder — mel decode to waveform (aligned with ComfyUI-style audio stack)
  • Euler sampling with configurable sigma schedules; default distilled 8-step schedule when num_steps is 8 and schedule is distilled
  • CPU→GPU block streaming — transformer blocks in pinned host memory, double-buffered async copies during denoising
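
The double-buffered streaming loop can be illustrated with a toy simulation: while one block is being "computed", a background thread "copies" the next one, standing in for the async pinned-memory H2D copies described above. None of these names come from the plugin.

```python
import threading
from queue import Queue

def stream_blocks(blocks, copy_to_gpu, compute, x):
    """Toy double-buffered streaming: overlap each block's compute with
    the prefetch ("copy") of the next block on a background thread."""
    if not blocks:
        return x
    staged = copy_to_gpu(blocks[0])            # stage the first block up front
    for i in range(len(blocks)):
        prefetched = Queue(maxsize=1)
        worker = None
        if i + 1 < len(blocks):                # overlap: prefetch block i+1
            worker = threading.Thread(
                target=lambda b=blocks[i + 1]: prefetched.put(copy_to_gpu(b)))
            worker.start()
        x = compute(staged, x)                 # "denoise" with the staged block
        if worker is not None:
            worker.join()
            staged = prefetched.get()
    return x

# Stand-ins: identity "copy" and multiplicative "transformer blocks".
result = stream_blocks([2, 3, 4], lambda b: b, lambda w, x: w * x, 1)  # -> 24
```

In the real pipeline the compute side is the denoising step and the copy side is a CUDA stream; the point is that only two blocks' worth of GPU memory is ever resident.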

Details: Lightricks LTX-2.

Troubleshooting

Out of memory (OOM)

  1. Reduce num_frames (e.g. 33 instead of 129)
  2. Reduce resolution
  3. Reduce ffn_chunk_size
  4. Close other GPU workloads

Model download fails

Invalid token: set HF_TOKEN correctly; token needs at least read access.

Repository not found: confirm HF account email verification if required.

General: check network; huggingface-cli login to verify the token.

Slow generation

Generation time scales with frames, resolution, and GPU/PCIe throughput (weight streaming).

License

This plugin is licensed under the same terms as the LTX-2 model.

Acknowledgments
