llm-scaler-omni


Table of Contents

  1. Getting Started with Omni Docker Image
  2. ComfyUI
  3. SGLang Diffusion
  4. XInference
  5. Stand-alone Examples
  6. ComfyUI for Windows (experimental)

Getting Started with Omni Docker Image

Pull the Docker image from Docker Hub:

docker pull intel/llm-scaler-omni:0.1.0-b6

Or build the image locally:

bash build.sh

Run the container:

export DOCKER_IMAGE=intel/llm-scaler-omni:0.1.0-b6
export CONTAINER_NAME=comfyui
export MODEL_DIR=<your_model_dir>
export COMFYUI_MODEL_DIR=<your_comfyui_model_dir>
sudo docker run -itd \
        --privileged \
        --net=host \
        --device=/dev/dri \
        -e no_proxy=localhost,127.0.0.1 \
        --name=$CONTAINER_NAME \
        -v $MODEL_DIR:/llm/models/ \
        -v $COMFYUI_MODEL_DIR:/llm/ComfyUI/models \
        --shm-size="64g" \
        --entrypoint=/bin/bash \
        $DOCKER_IMAGE

docker exec -it comfyui bash

ComfyUI

πŸ“– Detailed Documentation: See the ComfyUI Detailed Guide for complete model configuration, directory structure, and official reference links. A Chinese version is also available: δΈ­ζ–‡ζ–‡ζ‘£.

Starting ComfyUI

cd /llm/ComfyUI

export http_proxy=<your_proxy>
export https_proxy=<your_proxy>
export no_proxy=localhost,127.0.0.1

python3 main.py --listen 0.0.0.0

Then you can access the webUI at http://<your_local_ip>:8188/.

(Optional) Preview settings for ComfyUI

Click the button in the top-right corner to launch ComfyUI Manager. comfyui_manager_logo

Modify the Preview method to show the preview image during sampling iterations.

comfyui_manager_preview

Supported Models

The following models are supported in ComfyUI workflows. For detailed model files and directory structure, see the ComfyUI Guide.

| Model Category | Model Name | Type | Workflow Files |
|---|---|---|---|
| Image Generation | Qwen-Image, Qwen-Image-Edit, Qwen-Image-Edit-2511 | Text-to-Image, Image Editing | image_qwen_image.json, image_qwen_image_2512.json, image_qwen_image_distill.json, image_qwen_image_edit.json, image_qwen_image_edit_2509.json, image_qwen_image_edit_2511.json, image_qwen_image_layered.json |
| Image Generation | Stable Diffusion 3.5 | Text-to-Image, ControlNet | image_sd3.5_simple_example.json, image_sd3.5_midium.json, image_sd3.5_large_canny_controlnet_example.json |
| Image Generation | Z-Image-Turbo | Text-to-Image | image_z_image_turbo.json |
| Image Generation | Flux.1, Flux.1 Kontext dev | Text-to-Image, Multi-Image Reference, ControlNet | image_flux_kontext_dev_basic.json, image_flux_controlnet_example.json |
| Image Generation | FireRed-Image-Edit-1.1 | Image Editing | image_firered_image_edit_1.1.json |
| Video Generation | Wan2.2 TI2V 5B, Wan2.2 T2V 14B, Wan2.2 I2V 14B | Text-to-Video, Image-to-Video | video_wan2_2_5B_ti2v.json, video_wan2_2_14B_t2v.json, video_wan2_2_14B_i2v.json, video_wan2_2_14B_t2v_rapid_aio_multi_xpu.json, video_wan2.2_14B_i2v_rapid_aio_multi_xpu.json |
| Video Generation | Wan2.2 Animate 14B | Video Animation | video_wan2_2_animate_basic.json |
| Video Generation | HunyuanVideo 1.5 8.3B | Text-to-Video, Image-to-Video | video_hunyuan_video_1.5_t2v.json, video_hunyuan_video_1.5_i2v.json, video_hunyuan_video_1.5_i2v_multi_xpu.json |
| Video Generation | LTX-2 T2V 19B, LTX-2 I2V 19B | Text-to-Video, Image-to-Video | video_ltx2_19B_t2v.json, video_ltx2_19B_i2v.json, video_ltx_2_19B_t2v_distilled.json, video_ltx_2_19B_i2v_distilled.json |
| 3D Generation | Hunyuan3D 2.1 | Text/Image-to-3D | 3d_hunyuan3d.json |
| Audio Generation | VoxCPM1.5, IndexTTS 2 | Text-to-Speech, Voice Cloning | audio_VoxCPM_example.json, audio_indextts2.json |
| Video Upscaling | SeedVR2, FlashVSR-v1.1 | Video Restoration and Upscaling | video_upscale_SeedVR2.json, video_upscale_FlashVSR.json |

Enabling Optional Nodes

Some nodes are disabled by default to save resources. To use SeedVR2, FlashVSR, Hunyuan3D, VoxCPM, IndexTTS, or HY-Motion1, enable them using ComfyUI Manager:

  1. Click the Manager button in the ComfyUI menu.
  2. In the Manager window, use the Filter dropdown to select Disabled.
  3. Locate the node you want to enable (e.g., IndexTTS, VoxCPM) and click Enable.
  4. Restart ComfyUI and refresh the page to apply changes.

comfyui_manager_enable

Cache-DiT & torch.compile Acceleration

Cache-DiT accelerates diffusion model inference by caching and reusing intermediate DiT block outputs across denoising steps, skipping redundant computation without retraining. Combined with torch.compile, it provides further speedup through graph-level kernel fusion. The ComfyUI integration is powered by ComfyUI-CacheDiT.

The table below shows a comparison on Z-Image-Turbo across three configurations:

| No Acceleration | Cache-DiT | torch.compile | Cache-DiT + torch.compile |
|---|---|---|---|
| Baseline | ~1.5x speedup | ~1.45x speedup | ~2.2x speedup |
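The caching idea behind these numbers can be illustrated with a toy Python sketch. This is not the actual Cache-DiT implementation; `expensive_block`, `denoise`, and `cache_interval` are made-up names for illustration only:

```python
# Toy illustration of the Cache-DiT idea: cache a DiT block's residual and
# reuse it on intermediate denoising steps instead of recomputing the block.
# NOTE: a made-up sketch, not the real Cache-DiT implementation.

def expensive_block(x, step):
    """Stand-in for a costly DiT transformer block."""
    return x * 0.9 + 0.01 * step

def denoise(steps, cache_interval=2):
    x = 1.0
    cached_residual = None
    block_calls = 0
    for step in range(steps):
        if cached_residual is None or step % cache_interval == 0:
            out = expensive_block(x, step)
            block_calls += 1
            cached_residual = out - x  # cache the block's contribution
        x = x + cached_residual        # skipped steps reuse the cached residual
    return x, block_calls

_, full_calls = denoise(steps=8, cache_interval=1)    # recompute every step
_, cached_calls = denoise(steps=8, cache_interval=2)  # reuse every other step
print(full_calls, cached_calls)  # 8 4 -> half the block evaluations
```

Skipping block evaluations whose output barely changes between adjacent steps is why Cache-DiT pays off most on high step-count workflows, as noted below.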

Usage in ComfyUI

Insert the acceleration node(s) between the model loader and the sampler.

Cache-DiT only β€” add ⚑ CacheDit Accelerator after the model loader:

cachedit_workflow

torch.compile only β€” add TorchCompileModel after the model loader:

compile_workflow

Cache-DiT + torch.compile β€” chain TorchCompileModel after ⚑ CacheDit Accelerator:

cachedit_compile_workflow

Note: Cache-DiT is best suited for high step-count workflows (β‰₯ 8 steps). torch.compile is supported in the intel/llm-scaler-omni Linux Docker image only and incurs a one-time warm-up cost on the first run.

Cache-DiT Supported Models

| Category | Models |
|---|---|
| Image | Z-Image, Z-Image-Turbo, Qwen-Image-2512, Flux.2 Klein 4B / 9B |
| Video | LTX-2 T2V / I2V, Wan2.2 14B T2V / I2V |

ComfyUI Workflows

On the left side of the web UI, click the workflows icon to load and manage workflows. workflow image

All workflow files are available in the workflows/ directory. Below are detailed descriptions of supported workflows organized by category.

Image Generation Workflows

πŸ“– Detailed Documentation: For model files, directory structure and download links, see Image Generation Models.

Qwen-Image

ComfyUI tutorial: https://docs.comfy.org/tutorials/image/qwen/qwen-image

Available Workflows:

  • image_qwen_image.json (official): Native Qwen-Image workflow for text-to-image generation
  • image_qwen_image_2512.json (official): Updated workflow with significant improvements in image quality and realism
  • image_qwen_image_distill.json (official): Distilled version with better performance (recommended)
  • image_qwen_image_layered.json (official): Layered image generation workflow

Note: Use the fp8 format for all diffusion models to reduce memory usage and improve performance. The distilled version is recommended for faster generation.

Q: What should I do if I encounter Out of Memory (OOM) errors?

A: You can try the following solutions:

  1. Add the --disable-smart-memory flag when starting ComfyUI.
  2. If the OOM issue persists, also add the --reserve-vram 4 flag to reserve more VRAM.
Qwen-Image-Edit

ComfyUI tutorial: https://docs.comfy.org/tutorials/image/qwen/qwen-image-edit

Available Workflows:

  • image_qwen_image_edit.json (official): Standard image editing workflow

  • image_qwen_image_edit_2511.json (official): Multi-image reference editing workflow (Edit Plus)

These workflows enable image editing based on text prompts, allowing you to modify existing images. The 2511 version supports multi-image reference for advanced editing scenarios like material transfer.

Note: Use fp8 format for all diffusion models to optimize memory usage and performance.

Stable Diffusion 3.5

ComfyUI tutorial: https://comfyanonymous.github.io/ComfyUI_examples/sd3/

Available Workflows:

  • image_sd3.5_simple_example.json: Simple text-to-image workflow
  • image_sd3.5_midium.json: Medium model variant
  • image_sd3.5_large_canny_controlnet_example.json: Large model with Canny edge ControlNet for precise control

Stable Diffusion 3.5 provides high-quality text-to-image generation with optional ControlNet support for guided generation.

Z-Image-Turbo

ComfyUI tutorial: https://docs.comfy.org/tutorials/image/z-image/z-image-turbo

Available Workflows:

  • image_z_image_turbo.json (official): Basic workflow for text-to-image generation
Flux.1 Kontext Dev

ComfyUI tutorial: https://docs.comfy.org/tutorials/flux/flux-1-kontext-dev

Available Workflows:

  • image_flux_kontext_dev_basic.json: Basic workflow with multi-image reference support
FireRed-Image-Edit-1.1

HuggingFace: https://huggingface.co/FireRedTeam/FireRed-Image-Edit-1.1-ComfyUI

Available Workflows:

  • image_firered_image_edit_1.1.json: Multi-image reference image editing workflow with optional Lightning LoRA acceleration

Video Generation Workflows

πŸ“– Detailed Documentation: For model files, directory structure and download links, see Video Generation Models.

Wan2.2

ComfyUI tutorial: https://docs.comfy.org/tutorials/video/wan/wan2_2

Available Workflows:

  • video_wan2_2_5B_ti2v.json (official): Text+Image-to-Video with 5B model
  • video_wan2_2_14B_t2v.json (official): Text-to-Video with 14B model
  • video_wan2_2_14B_i2v.json (official): Image-to-Video with 14B model
  • video_wan2_2_14B_t2v_rapid_aio_multi_xpu.json: 14B Text-to-Video with multi-XPU support (using raylight)
  • video_wan2.2_14B_i2v_rapid_aio_multi_xpu.json: 14B Image-to-Video with multi-XPU support

Multi-XPU Support with Raylight:

The rapid_aio workflows use WAN2.2-14B-Rapid-AllInOne with raylight for faster inference across multiple XPUs:

wan_raylight

Steps to Complete Multi-XPU Workflows:

  1. Model Loading

    • Ensure the Load Diffusion Model (Ray) node loads the diffusion model part from WAN2.2-14B-Rapid-AllInOne
    • Ensure the Load VAE node loads the VAE part from WAN2.2-14B-Rapid-AllInOne
    • Ensure the Load CLIP node loads umt5_xxl_fp8_e4m3fn_scaled.safetensors
  2. Ray Configuration

    • Set the GPU and ulysses_degree in Ray Init Actor node to the number of GPUs you want to use
  3. Run the Workflow

    • Click the Run button or use the shortcut Ctrl(cmd) + Enter to run the workflow

Note: Model weights can be obtained from ModelScope. You may need to extract the unet and VAE parts separately using tools/extract.py.

Wan2.2 Animate 14B

Available Workflows:

  • video_wan2_2_animate_basic.json: Video animation workflow with control video support

This is a separate model from the standard Wan2.2 T2V/I2V models, designed specifically for video animation with control video inputs.

HunyuanVideo 1.5 8.3B

ComfyUI tutorial: https://docs.comfy.org/tutorials/video/hunyuan/hunyuan-video-1-5

Available Workflows:

  • video_hunyuan_video_1.5_t2v.json: Basic workflow for Text-to-Video generation

  • video_hunyuan_video_1.5_i2v.json: Basic workflow for Image-to-Video generation

  • video_hunyuan_video_1.5_i2v_multi_xpu.json: 8.3B Image-to-Video multi-XPU support with raylight

The default parameter configurations of these workflows are optimized for 480p FP8 Image-to-Video.

LTX-2

ComfyUI tutorial: https://blog.comfy.org/p/ltx-2-open-source-audio-video-ai

Available Workflows:

  • video_ltx2_19B_t2v.json (official): Text to Video with motion, dialogue, SFX, and music

  • video_ltx2_19B_i2v.json (official): Image to Video with motion, dialogue, SFX, and music

  • video_ltx_2_19B_t2v_distilled.json: Distilled Text-to-Video workflow

  • video_ltx_2_19B_i2v_distilled.json: Distilled Image-to-Video workflow

Note: Model weights for the distilled workflows can be obtained from Kijai/LTXV2_comfy.

3D Generation Workflows

πŸ“– Detailed Documentation: For model configuration details, see 3D Generation Models.

Hunyuan3D

Available Workflows:

  • 3d_hunyuan3d.json: Text/Image-to-3D mesh generation

This workflow generates 3D models from text descriptions or images using the Hunyuan3D model.

Video Upscaling Workflows

πŸ“– Detailed Documentation: For model configuration details, see Video Upscale Models.

SeedVR2

GitHub: https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler

Available Workflows:

  • video_upscale_SeedVR2.json: Video restoration and upscaling workflow

This workflow uses SeedVR2, a diffusion-based video super-resolution model, to upscale and restore video quality.

FlashVSR

GitHub: https://github.com/lihaoyun6/ComfyUI-FlashVSR_Ultra_Fast

Available Workflows:

  • video_upscale_FlashVSR.json: Video restoration and upscaling workflow

This workflow uses FlashVSR-v1.1, a diffusion-based video super-resolution model, to upscale and restore video quality.

Audio Generation Workflows

πŸ“– Detailed Documentation: For model files and setup instructions, see Audio Generation Models.

VoxCPM1.5

Available Workflows:

  • audio_VoxCPM_example.json: Text-to-Speech synthesis

This workflow generates speech audio from text input using the VoxCPM1.5 or VoxCPM model.

IndexTTS 2

Available Workflows:

  • audio_indextts2.json: Voice cloning

This workflow synthesizes new speech using a single reference audio file for voice cloning.

Usage Steps:

  1. Prepare Models

    Download the following models and place them in the <your comfyui model path>/TTS directory:

    • IndexTeam/IndexTTS-2
    • nvidia/bigvgan_v2_22khz_80band_256x
    • funasr/campplus
    • amphion/MaskGCT
    • facebook/w2v-bert-2.0

    Ensure your file structure matches the following hierarchy:

    TTS/
    β”œβ”€β”€ bigvgan_v2_22khz_80band_256x/
    β”‚   β”œβ”€β”€ bigvgan_generator.pt
    β”‚   └── config.json
    β”œβ”€β”€ campplus/
    β”‚   └── campplus_cn_common.bin
    β”œβ”€β”€ IndexTTS-2/
    β”‚   β”œβ”€β”€ .gitattributes
    β”‚   β”œβ”€β”€ bpe.model
    β”‚   β”œβ”€β”€ config.yaml
    β”‚   β”œβ”€β”€ feat1.pt
    β”‚   β”œβ”€β”€ feat2.pt
    β”‚   β”œβ”€β”€ gpt.pth
    β”‚   β”œβ”€β”€ README.md
    β”‚   β”œβ”€β”€ s2mel.pth
    β”‚   β”œβ”€β”€ wav2vec2bert_stats.pt
    β”‚   └── qwen0.6bemo4-merge/
    β”‚       β”œβ”€β”€ added_tokens.json
    β”‚       β”œβ”€β”€ chat_template.jinja
    β”‚       β”œβ”€β”€ config.json
    β”‚       β”œβ”€β”€ generation_config.json
    β”‚       β”œβ”€β”€ merges.txt
    β”‚       β”œβ”€β”€ model.safetensors
    β”‚       β”œβ”€β”€ Modelfile
    β”‚       β”œβ”€β”€ special_tokens_map.json
    β”‚       β”œβ”€β”€ tokenizer.json
    β”‚       β”œβ”€β”€ tokenizer_config.json
    β”‚       └── vocab.json
    β”œβ”€β”€ MaskGCT/
    β”‚   └── semantic_codec/
    β”‚       └── model.safetensors
    └── w2v-bert-2.0/
        β”œβ”€β”€ .gitattributes
        β”œβ”€β”€ config.json
        β”œβ”€β”€ conformer_shaw.pt
        β”œβ”€β”€ model.safetensors
        β”œβ”€β”€ preprocessor_config.json
        └── README.md
    
  2. Configure Workflow

    • Load the reference audio file.
    • Set the desired input text.
  3. Run the Workflow

    • Execute the workflow to generate the speech.

SGLang Diffusion (experimental)

πŸ“– Detailed Documentation: See SGLang Diffusion Guide for complete server configuration, API reference, and multi-GPU setup. For ComfyUI integration, see SGLang Diffusion ComfyUI Guide.

SGLang Diffusion provides an OpenAI-compatible API for image and video generation models.

1. CLI Generation

sglang generate --model-path /llm/models/Wan2.1-T2V-1.3B-Diffusers \
    --text-encoder-cpu-offload --pin-cpu-memory \
    --prompt "A curious raccoon" \
    --save-output

2. OpenAI API Server

Start the server:

# Configure proxy if needed
export http_proxy=<your_http_proxy>
export https_proxy=<your_https_proxy>
export no_proxy=localhost,127.0.0.1

# Start server
sglang serve --model-path /llm/models/Z-Image-Turbo/ \
    --vae-cpu-offload --pin-cpu-memory \
    --num-gpus 1 --port 30010

Or use the provided script:

bash /llm/entrypoints/start_sgl_diffusion.sh

cURL example:

curl http://localhost:30010/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Z-Image-Turbo",
    "prompt": "A beautiful sunset over the ocean",
    "size": "1024x1024"
  }'

Python example (OpenAI SDK):

from openai import OpenAI
import base64

client = OpenAI(base_url="http://localhost:30010/v1", api_key="EMPTY")

response = client.images.generate(
    model="Z-Image-Turbo",
    prompt="A beautiful sunset over the ocean",
    size="1024x1024",
)

# Save image from base64 response
with open("output.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))

XInference

xinference-local --host 0.0.0.0 --port 9997

Supported models:

  • Stable Diffusion 3.5 Medium
  • Kokoro 82M
  • whisper large v3

WebUI Usage

1. Access Xinference Web UI

xinference_launch

2. Select model and configure model_path

xinference_model

3. Find running model and launch Gradio UI for this model

xinference_gradio

4. Generate within Gradio UI

xinference_example

OpenAI API Usage

Visit http://127.0.0.1:9997/docs to inspect the API docs.

1. Launch API service

You can select a model and launch the service via the WebUI (see WebUI Usage above) or from the command line:

xinference-local --host 0.0.0.0 --port 9997

xinference launch --model-name sd3.5-medium --model-type image --model-path /llm/models/stable-diffusion-3.5-medium/ --gpu-idx 0

2. Post request in OpenAI API format

For TTS model (Kokoro 82M for example):

curl http://localhost:9997/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Kokoro-82M",
    "input": "kokoro, hello, I am kokoro."
  }' \
  --output output.wav

For STT models (whisper large v3 for example):

AUDIO_FILE_PATH=<your_audio_file_path>

curl -X 'POST' \
  "http://localhost:9997/v1/audio/translations" \
  -H 'accept: application/json' \
  -F "model=whisper-large-v3" \
  -F "file=@${AUDIO_FILE_PATH}"

Example response:

{"text":" Cacaro's hello, I am Cacaro."}

For text-to-image models (Stable Diffusion 3.5 Medium for example):

curl http://localhost:9997/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sd3.5-medium",
    "prompt": "A Shiba Inu chasing butterflies on a sunny grassy field, cartoon style, with vibrant colors.",
    "n": 1,
    "size": "1024x1024",
    "quality": "standard",
    "response_format": "url"
  }'

Stand-alone Examples

Note: Stand-alone examples are excluded from the intel/llm-scaler-omni image.

Supported models:

  • Hunyuan3D 2.1
  • Qwen Image
  • Wan 2.1 / 2.2

ComfyUI for Windows (experimental)

We provide a conda-based installation method for running the llm-scaler-omni version of ComfyUI on Windows.

git clone https://github.com/intel/llm-scaler.git
cd llm-scaler\omni\
.\init_conda_env.bat

After installation, enter the ComfyUI directory and start the ComfyUI server:

cd ComfyUI
conda activate omni_env
$env:HTTP_PROXY = "<your_proxy>"
$env:HTTPS_PROXY = "<your_proxy>"
python .\main.py --listen 0.0.0.0