llm-scaler-omni


Table of Contents

  1. Getting Started with Omni Docker Image
  2. ComfyUI
  3. SGLang Diffusion
  4. XInference
  5. Stand-alone Examples
  6. ComfyUI for Windows (experimental)

Getting Started with Omni Docker Image

Pull the Docker image from Docker Hub:

docker pull intel/llm-scaler-omni:0.1.0-b6

Or build the image locally:

bash build.sh

Run the container:

export DOCKER_IMAGE=intel/llm-scaler-omni:0.1.0-b6
export CONTAINER_NAME=comfyui
export MODEL_DIR=<your_model_dir>
export COMFYUI_MODEL_DIR=<your_comfyui_model_dir>
sudo docker run -itd \
        --privileged \
        --net=host \
        --device=/dev/dri \
        -e no_proxy=localhost,127.0.0.1 \
        --name=$CONTAINER_NAME \
        -v $MODEL_DIR:/llm/models/ \
        -v $COMFYUI_MODEL_DIR:/llm/ComfyUI/models \
        --shm-size="64g" \
        --entrypoint=/bin/bash \
        $DOCKER_IMAGE

docker exec -it comfyui bash

ComfyUI

πŸ“– Detailed Documentation: See the ComfyUI Detailed Guide for complete model configuration, directory structure, and official reference links. A Chinese version is also available: δΈ­ζ–‡ζ–‡ζ‘£.

Starting ComfyUI

cd /llm/ComfyUI

export http_proxy=<your_proxy>
export https_proxy=<your_proxy>
export no_proxy=localhost,127.0.0.1

python3 main.py --listen 0.0.0.0

Then you can access the webUI at http://<your_local_ip>:8188/.

(Optional) Preview settings for ComfyUI

Click the button in the top-right corner to launch ComfyUI Manager. comfyui_manager_logo

Modify the Preview method to show the preview image during sampling iterations.

comfyui_manager_preview

Supported Models

The following models are supported in ComfyUI workflows. For detailed model files and directory structure, see the ComfyUI Guide.

| Model Category | Model Name | Type | Workflow Files |
|---|---|---|---|
| Image Generation | Qwen-Image, Qwen-Image-Edit, Qwen-Image-Edit-2511 | Text-to-Image, Image Editing | image_qwen_image.json, image_qwen_image_2512.json, image_qwen_image_distill.json, image_qwen_image_edit.json, image_qwen_image_edit_2509.json, image_qwen_image_edit_2511.json, image_qwen_image_layered.json |
| Image Generation | Stable Diffusion 3.5 | Text-to-Image, ControlNet | image_sd3.5_simple_example.json, image_sd3.5_midium.json, image_sd3.5_large_canny_controlnet_example.json |
| Image Generation | Z-Image-Turbo | Text-to-Image | image_z_image_turbo.json |
| Image Generation | Flux.1, Flux.1 Kontext dev | Text-to-Image, Multi-Image Reference, ControlNet | image_flux_kontext_dev_basic.json, image_flux_controlnet_example.json |
| Image Generation | FireRed-Image-Edit-1.1 | Image Editing | image_firered_image_edit_1.1.json |
| Video Generation | Wan2.2 TI2V 5B, Wan2.2 T2V 14B, Wan2.2 I2V 14B | Text-to-Video, Image-to-Video | video_wan2_2_5B_ti2v.json, video_wan2_2_14B_t2v.json, video_wan2_2_14B_i2v.json, video_wan2_2_14B_t2v_rapid_aio_multi_xpu.json, video_wan2.2_14B_i2v_rapid_aio_multi_xpu.json |
| Video Generation | Wan2.2 Animate 14B | Video Animation | video_wan2_2_animate_basic.json |
| Video Generation | HunyuanVideo 1.5 8.3B | Text-to-Video, Image-to-Video | video_hunyuan_video_1.5_t2v.json, video_hunyuan_video_1.5_i2v.json, video_hunyuan_video_1.5_i2v_multi_xpu.json |
| Video Generation | LTX-2 T2V 19B, LTX-2 I2V 19B | Text-to-Video, Image-to-Video | video_ltx2_19B_t2v.json, video_ltx2_19B_i2v.json, video_ltx_2_19B_t2v_distilled.json, video_ltx_2_19B_i2v_distilled.json |
| 3D Generation | Hunyuan3D 2.1 | Text/Image-to-3D | 3d_hunyuan3d.json |
| Audio Generation | VoxCPM1.5, IndexTTS 2 | Text-to-Speech, Voice Cloning | audio_VoxCPM_example.json, audio_indextts2.json |
| Video Upscaling | SeedVR2, FlashVSR-v1.1 | Video Restoration and Upscaling | video_upscale_SeedVR2.json, video_upscale_FlashVSR.json |

Enabling Optional Nodes

Some nodes are disabled by default to save resources. To use SeedVR2, FlashVSR, Hunyuan3D, VoxCPM, IndexTTS, or HY-Motion1, enable them using ComfyUI Manager:

  1. Click the Manager button in the ComfyUI menu.
  2. In the Manager window, use the Filter dropdown to select Disabled.
  3. Locate the node you want to enable (e.g., IndexTTS, VoxCPM) and click Enable.
  4. Restart ComfyUI and refresh the page to apply changes.

comfyui_manager_enable

Cache-DiT & torch.compile Acceleration

Cache-DiT accelerates diffusion model inference by caching and reusing intermediate DiT block outputs across denoising steps, skipping redundant computation without retraining. Combined with torch.compile, it provides further speedup through graph-level kernel fusion. The ComfyUI integration is powered by ComfyUI-CacheDiT.

The table below shows a comparison on Z-Image-Turbo across three configurations:

| No Acceleration | Cache-DiT | torch.compile | Cache-DiT + torch.compile |
|---|---|---|---|
| Baseline | ~1.5x speedup | ~1.45x speedup | ~2.2x speedup |
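The caching idea behind these numbers can be illustrated with a toy Python sketch. This is not the actual Cache-DiT implementation; `expensive_block`, `denoise`, and `cache_interval` are made-up names for illustration only:

```python
# Toy illustration of the Cache-DiT idea: cache a DiT block's residual and
# reuse it on intermediate denoising steps instead of recomputing the block.
# NOTE: a made-up sketch, not the real Cache-DiT implementation.

def expensive_block(x, step):
    """Stand-in for a costly DiT transformer block."""
    return x * 0.9 + 0.01 * step

def denoise(steps, cache_interval=2):
    x = 1.0
    cached_residual = None
    block_calls = 0
    for step in range(steps):
        if cached_residual is None or step % cache_interval == 0:
            out = expensive_block(x, step)
            block_calls += 1
            cached_residual = out - x  # cache the block's contribution
        x = x + cached_residual        # skipped steps reuse the cached residual
    return x, block_calls

_, full_calls = denoise(steps=8, cache_interval=1)    # recompute every step
_, cached_calls = denoise(steps=8, cache_interval=2)  # reuse every other step
print(full_calls, cached_calls)  # 8 4 -> half the block evaluations
```

Skipping block evaluations whose output barely changes between adjacent steps is why Cache-DiT pays off most on high step-count workflows, as noted below.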

Usage in ComfyUI

Insert the acceleration node(s) between the model loader and the sampler.

Cache-DiT only β€” add ⚑ CacheDit Accelerator after the model loader:

cachedit_workflow

torch.compile only β€” add TorchCompileModel after the model loader:

compile_workflow

Cache-DiT + torch.compile β€” chain TorchCompileModel after ⚑ CacheDit Accelerator:

cachedit_compile_workflow

Note: Cache-DiT is best suited for high step-count workflows (β‰₯ 8 steps). torch.compile is supported in the intel/llm-scaler-omni Linux Docker image only and incurs a one-time warm-up cost on the first run.

Cache-DiT Supported Models

| Category | Models |
|---|---|
| Image | Z-Image, Z-Image-Turbo, Qwen-Image-2512, Flux.2 Klein 4B / 9B |
| Video | LTX-2 T2V / I2V, Wan2.2 14B T2V / I2V |

ComfyUI Workflows

On the left side of the web UI, click the workflows icon to load and manage workflows. workflow image

All workflow files are available in the workflows/ directory. Below are detailed descriptions of supported workflows organized by category.

Image Generation Workflows

πŸ“– Detailed Documentation: For model files, directory structure and download links, see Image Generation Models.

Qwen-Image

ComfyUI tutorial: https://docs.comfy.org/tutorials/image/qwen/qwen-image

Available Workflows:

  • image_qwen_image.json (official): Native Qwen-Image workflow for text-to-image generation
  • image_qwen_image_2512.json (official): Updated workflow with significant improvements in image quality and realism
  • image_qwen_image_distill.json (official): Distilled version with better performance (recommended)
  • image_qwen_image_layered.json (official): Layered image generation workflow

Note: Use the fp8 format for all diffusion models to reduce memory usage and improve performance. The distilled version is recommended for faster generation.

Q: What should I do if I encounter Out of Memory (OOM) errors?

A: You can try the following solutions:

  1. Add the --disable-smart-memory flag when starting ComfyUI.
  2. If the OOM issue persists, also add the --reserve-vram 4 flag to reserve more VRAM.
Qwen-Image-Edit

ComfyUI tutorial: https://docs.comfy.org/tutorials/image/qwen/qwen-image-edit

Available Workflows:

  • image_qwen_image_edit.json (official): Standard image editing workflow

  • image_qwen_image_edit_2511.json (official): Multi-image reference editing workflow (Edit Plus)

These workflows enable image editing based on text prompts, allowing you to modify existing images. The 2511 version supports multi-image reference for advanced editing scenarios like material transfer.

Note: Use fp8 format for all diffusion models to optimize memory usage and performance.

Stable Diffusion 3.5

ComfyUI tutorial: https://comfyanonymous.github.io/ComfyUI_examples/sd3/

Available Workflows:

  • image_sd3.5_simple_example.json: Simple text-to-image workflow
  • image_sd3.5_midium.json: Medium model variant
  • image_sd3.5_large_canny_controlnet_example.json: Large model with Canny edge ControlNet for precise control

Stable Diffusion 3.5 provides high-quality text-to-image generation with optional ControlNet support for guided generation.

Z-Image-Turbo

ComfyUI tutorial: https://docs.comfy.org/tutorials/image/z-image/z-image-turbo

Available Workflows:

  • image_z_image_turbo.json (official): Basic workflow for text-to-image generation
Flux.1 Kontext Dev

ComfyUI tutorial: https://docs.comfy.org/tutorials/flux/flux-1-kontext-dev

Available Workflows:

  • image_flux_kontext_dev_basic.json: Basic workflow with multi-image reference support
FireRed-Image-Edit-1.1

HuggingFace: https://huggingface.co/FireRedTeam/FireRed-Image-Edit-1.1-ComfyUI

Available Workflows:

  • image_firered_image_edit_1.1.json: Multi-image reference image editing workflow with optional Lightning LoRA acceleration

Video Generation Workflows

πŸ“– Detailed Documentation: For model files, directory structure and download links, see Video Generation Models.

Wan2.2

ComfyUI tutorial: https://docs.comfy.org/tutorials/video/wan/wan2_2

Available Workflows:

  • video_wan2_2_5B_ti2v.json (official): Text+Image-to-Video with 5B model
  • video_wan2_2_14B_t2v.json (official): Text-to-Video with 14B model
  • video_wan2_2_14B_i2v.json (official): Image-to-Video with 14B model
  • video_wan2_2_14B_t2v_rapid_aio_multi_xpu.json: 14B Text-to-Video with multi-XPU support (using raylight)
  • video_wan2.2_14B_i2v_rapid_aio_multi_xpu.json: 14B Image-to-Video with multi-XPU support

Multi-XPU Support with Raylight:

The rapid_aio workflows use WAN2.2-14B-Rapid-AllInOne with raylight for faster inference across multiple XPUs:

wan_raylight

Steps to Complete Multi-XPU Workflows:

  1. Model Loading

    • Ensure the Load Diffusion Model (Ray) node loads the diffusion model part from WAN2.2-14B-Rapid-AllInOne
    • Ensure the Load VAE node loads the VAE part from WAN2.2-14B-Rapid-AllInOne
    • Ensure the Load CLIP node loads umt5_xxl_fp8_e4m3fn_scaled.safetensors
  2. Ray Configuration

    • Set the GPU and ulysses_degree in Ray Init Actor node to the number of GPUs you want to use
  3. Run the Workflow

    • Click the Run button or use the shortcut Ctrl(cmd) + Enter to run the workflow

Note: Model weights can be obtained from ModelScope. You may need to extract the unet and VAE parts separately using tools/extract.py.

Wan2.2 Animate 14B

Available Workflows:

  • video_wan2_2_animate_basic.json: Video animation workflow with control video support

This is a separate model from the standard Wan2.2 T2V/I2V models, designed specifically for video animation with control video inputs.

HunyuanVideo 1.5 8.3B

ComfyUI tutorial: https://docs.comfy.org/tutorials/video/hunyuan/hunyuan-video-1-5

Available Workflows:

  • video_hunyuan_video_1.5_t2v.json: Basic workflow for Text-to-Video generation

  • video_hunyuan_video_1.5_i2v.json: Basic workflow for Image-to-Video generation

  • video_hunyuan_video_1.5_i2v_multi_xpu.json: 8.3B Image-to-Video multi-XPU support with raylight

The default parameter configurations of these workflows are optimized for 480p FP8 Image-to-Video.

LTX-2

ComfyUI tutorial: https://blog.comfy.org/p/ltx-2-open-source-audio-video-ai

Available Workflows:

  • video_ltx2_19B_t2v.json (official): Text to Video with motion, dialogue, SFX, and music

  • video_ltx2_19B_i2v.json (official): Image to Video with motion, dialogue, SFX, and music

  • video_ltx_2_19B_t2v_distilled.json: Distilled Text-to-Video workflow

  • video_ltx_2_19B_i2v_distilled.json: Distilled Image-to-Video workflow

Note: Model weights for the distilled workflows can be obtained from Kijai/LTXV2_comfy.

3D Generation Workflows

πŸ“– Detailed Documentation: For model configuration details, see 3D Generation Models.

Hunyuan3D

Available Workflows:

  • 3d_hunyuan3d.json: Text/Image-to-3D mesh generation

This workflow generates 3D models from text descriptions or images using the Hunyuan3D model.

Video Upscaling Workflows

πŸ“– Detailed Documentation: For model configuration details, see Video Upscale Models.

SeedVR2

GitHub: https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler

Available Workflows:

  • video_upscale_SeedVR2.json: Video restoration and upscaling workflow

This workflow uses SeedVR2, a diffusion-based video super-resolution model, to upscale and restore video quality.

FlashVSR

GitHub: https://github.com/lihaoyun6/ComfyUI-FlashVSR_Ultra_Fast

Available Workflows:

  • video_upscale_FlashVSR.json: Video restoration and upscaling workflow

This workflow uses FlashVSR-v1.1, a diffusion-based video super-resolution model, to upscale and restore video quality.

Audio Generation Workflows

πŸ“– Detailed Documentation: For model files and setup instructions, see Audio Generation Models.

VoxCPM1.5

Available Workflows:

  • audio_VoxCPM_example.json: Text-to-Speech synthesis

This workflow generates speech audio from text input using the VoxCPM1.5 or VoxCPM model.

IndexTTS 2

Available Workflows:

  • audio_indextts2.json: Voice cloning

This workflow synthesizes new speech using a single reference audio file for voice cloning.

Usage Steps:

  1. Prepare Models

    Download the following models and place them in the <your comfyui model path>/TTS directory:

    • IndexTeam/IndexTTS-2
    • nvidia/bigvgan_v2_22khz_80band_256x
    • funasr/campplus
    • amphion/MaskGCT
    • facebook/w2v-bert-2.0

    Ensure your file structure matches the following hierarchy:

    TTS/
    β”œβ”€β”€ bigvgan_v2_22khz_80band_256x/
    β”‚   β”œβ”€β”€ bigvgan_generator.pt
    β”‚   └── config.json
    β”œβ”€β”€ campplus/
    β”‚   └── campplus_cn_common.bin
    β”œβ”€β”€ IndexTTS-2/
    β”‚   β”œβ”€β”€ .gitattributes
    β”‚   β”œβ”€β”€ bpe.model
    β”‚   β”œβ”€β”€ config.yaml
    β”‚   β”œβ”€β”€ feat1.pt
    β”‚   β”œβ”€β”€ feat2.pt
    β”‚   β”œβ”€β”€ gpt.pth
    β”‚   β”œβ”€β”€ README.md
    β”‚   β”œβ”€β”€ s2mel.pth
    β”‚   β”œβ”€β”€ wav2vec2bert_stats.pt
    β”‚   └── qwen0.6bemo4-merge/
    β”‚       β”œβ”€β”€ added_tokens.json
    β”‚       β”œβ”€β”€ chat_template.jinja
    β”‚       β”œβ”€β”€ config.json
    β”‚       β”œβ”€β”€ generation_config.json
    β”‚       β”œβ”€β”€ merges.txt
    β”‚       β”œβ”€β”€ model.safetensors
    β”‚       β”œβ”€β”€ Modelfile
    β”‚       β”œβ”€β”€ special_tokens_map.json
    β”‚       β”œβ”€β”€ tokenizer.json
    β”‚       β”œβ”€β”€ tokenizer_config.json
    β”‚       └── vocab.json
    β”œβ”€β”€ MaskGCT/
    β”‚   └── semantic_codec/
    β”‚       └── model.safetensors
    └── w2v-bert-2.0/
        β”œβ”€β”€ .gitattributes
        β”œβ”€β”€ config.json
        β”œβ”€β”€ conformer_shaw.pt
        β”œβ”€β”€ model.safetensors
        β”œβ”€β”€ preprocessor_config.json
        └── README.md
    
  2. Configure Workflow

    • Load the reference audio file.
    • Set the desired input text.
  3. Run the Workflow

    • Execute the workflow to generate the speech.

SGLang Diffusion (experimental)

πŸ“– Detailed Documentation: See SGLang Diffusion Guide for complete server configuration, API reference, and multi-GPU setup. For ComfyUI integration, see SGLang Diffusion ComfyUI Guide.

SGLang Diffusion provides an OpenAI-compatible API for image and video generation models.

1. CLI Generation

sglang generate --model-path /llm/models/Wan2.1-T2V-1.3B-Diffusers \
    --text-encoder-cpu-offload --pin-cpu-memory \
    --prompt "A curious raccoon" \
    --save-output

2. OpenAI API Server

Start the server:

# Configure proxy if needed
export http_proxy=<your_http_proxy>
export https_proxy=<your_https_proxy>
export no_proxy=localhost,127.0.0.1

# Start server
sglang serve --model-path /llm/models/Z-Image-Turbo/ \
    --vae-cpu-offload --pin-cpu-memory \
    --num-gpus 1 --port 30010

Or use the provided script:

bash /llm/entrypoints/start_sgl_diffusion.sh

cURL example:

curl http://localhost:30010/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Z-Image-Turbo",
    "prompt": "A beautiful sunset over the ocean",
    "size": "1024x1024"
  }'

Python example (OpenAI SDK):

from openai import OpenAI
import base64

client = OpenAI(base_url="http://localhost:30010/v1", api_key="EMPTY")

response = client.images.generate(
    model="Z-Image-Turbo",
    prompt="A beautiful sunset over the ocean",
    size="1024x1024",
)

# Save image from base64 response
with open("output.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))

XInference

xinference-local --host 0.0.0.0 --port 9997

Supported models:

  • Stable Diffusion 3.5 Medium
  • Kokoro 82M
  • whisper large v3

WebUI Usage

1. Access Xinference Web UI

xinference_launch

2. Select model and configure model_path

xinference_model

3. Find running model and launch Gradio UI for this model

xinference_gradio

4. Generate within Gradio UI

xinference_example

OpenAI API Usage

Visit http://127.0.0.1:9997/docs to inspect the API docs.

1. Launch API service

You can select a model and launch the service via the WebUI (see WebUI Usage above) or from the command line:

xinference-local --host 0.0.0.0 --port 9997

xinference launch --model-name sd3.5-medium --model-type image --model-path /llm/models/stable-diffusion-3.5-medium/ --gpu-idx 0

2. Post request in OpenAI API format

For TTS model (Kokoro 82M for example):

curl http://localhost:9997/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Kokoro-82M",
    "input": "kokoro, hello, I am kokoro."
  }' \
  --output output.wav

For STT models (whisper large v3 for example):

AUDIO_FILE_PATH=<your_audio_file_path>

curl -X 'POST' \
  "http://localhost:9997/v1/audio/translations" \
  -H 'accept: application/json' \
  -F "model=whisper-large-v3" \
  -F "file=@${AUDIO_FILE_PATH}"

Example response:

{"text":" Cacaro's hello, I am Cacaro."}

For text-to-image models (Stable Diffusion 3.5 Medium for example):

curl http://localhost:9997/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sd3.5-medium",
    "prompt": "A Shiba Inu chasing butterflies on a sunny grassy field, cartoon style, with vibrant colors.",
    "n": 1,
    "size": "1024x1024",
    "quality": "standard",
    "response_format": "url"
  }'

Stand-alone Examples

Note: Stand-alone examples are excluded from the intel/llm-scaler-omni image.

Supported models:

  • Hunyuan3D 2.1
  • Qwen Image
  • Wan 2.1 / 2.2

ComfyUI for Windows (experimental)

We provide a conda-based installation method for running the llm-scaler-omni version of ComfyUI on Windows.

git clone https://github.com/intel/llm-scaler.git
cd llm-scaler\omni\
.\init_conda_env.bat

After installation, enter the ComfyUI directory and start the ComfyUI server:

cd ComfyUI
conda activate omni_env
$env:HTTP_PROXY = "<your_proxy>"
$env:HTTPS_PROXY = "<your_proxy>"
python .\main.py --listen 0.0.0.0