- Getting Started with Omni Docker Image
- ComfyUI
- SGLang Diffusion
- XInference
- Stand-alone Examples
- ComfyUI for Windows (experimental)
Pull the docker image from Docker Hub:

```bash
docker pull intel/llm-scaler-omni:0.1.0-b6
```

Or build the docker image:

```bash
bash build.sh
```

Run the docker image:

```bash
export DOCKER_IMAGE=intel/llm-scaler-omni:0.1.0-b6
export CONTAINER_NAME=comfyui
export MODEL_DIR=<your_model_dir>
export COMFYUI_MODEL_DIR=<your_comfyui_model_dir>

sudo docker run -itd \
  --privileged \
  --net=host \
  --device=/dev/dri \
  -e no_proxy=localhost,127.0.0.1 \
  --name=$CONTAINER_NAME \
  -v $MODEL_DIR:/llm/models/ \
  -v $COMFYUI_MODEL_DIR:/llm/ComfyUI/models \
  --shm-size="64g" \
  --entrypoint=/bin/bash \
  $DOCKER_IMAGE

docker exec -it comfyui bash
```

📖 Detailed Documentation: See the ComfyUI Detailed Guide for complete model configuration, directory structure, and official reference links. Chinese documentation is also available.

Start ComfyUI inside the container:

```bash
cd /llm/ComfyUI
export http_proxy=<your_proxy>
export https_proxy=<your_proxy>
export no_proxy=localhost,127.0.0.1
python3 main.py --listen 0.0.0.0
```

Then you can access the web UI at http://<your_local_ip>:8188/.
Click the button on the top-right corner to launch ComfyUI Manager.

Modify the Preview method to show the preview image during sampling iterations.
The following models are supported in ComfyUI workflows. For detailed model files and directory structure, see the ComfyUI Guide.
| Model Category | Model Name | Type | Workflow Files |
|---|---|---|---|
| Image Generation | Qwen-Image, Qwen-Image-Edit, Qwen-Image-Edit-2511 | Text-to-Image, Image Editing | image_qwen_image.json, image_qwen_image_2512.json, image_qwen_image_distill.json, image_qwen_image_edit.json, image_qwen_image_edit_2509.json, image_qwen_image_edit_2511.json, image_qwen_image_layered.json |
| Image Generation | Stable Diffusion 3.5 | Text-to-Image, ControlNet | image_sd3.5_simple_example.json, image_sd3.5_midium.json, image_sd3.5_large_canny_controlnet_example.json |
| Image Generation | Z-Image-Turbo | Text-to-Image | image_z_image_turbo.json |
| Image Generation | Flux.1, Flux.1 Kontext dev | Text-to-Image, Multi-Image Reference, ControlNet | image_flux_kontext_dev_basic.json, image_flux_controlnet_example.json |
| Image Generation | FireRed-Image-Edit-1.1 | Image Editing | image_firered_image_edit_1.1.json |
| Video Generation | Wan2.2 TI2V 5B, Wan2.2 T2V 14B, Wan2.2 I2V 14B | Text-to-Video, Image-to-Video | video_wan2_2_5B_ti2v.json, video_wan2_2_14B_t2v.json, video_wan2_2_14B_t2v_rapid_aio_multi_xpu.json, video_wan2.2_14B_i2v_rapid_aio_multi_xpu.json |
| Video Generation | Wan2.2 Animate 14B | Video Animation | video_wan2_2_animate_basic.json |
| Video Generation | HunyuanVideo 1.5 8.3B | Text-to-Video, Image-to-Video | video_hunyuan_video_1.5_t2v.json, video_hunyuan_video_1.5_i2v.json, video_hunyuan_video_1.5_i2v_multi_xpu.json |
| Video Generation | LTX-2 T2V 19B, LTX-2 I2V 19B | Text-to-Video, Image-to-Video | video_ltx2_19B_t2v.json, video_ltx2_19B_i2v.json, video_ltx_2_19B_t2v_distilled.json, video_ltx_2_19B_i2v_distilled.json |
| 3D Generation | Hunyuan3D 2.1 | Text/Image-to-3D | 3d_hunyuan3d.json |
| Audio Generation | VoxCPM1.5, IndexTTS 2 | Text-to-Speech, Voice Cloning | audio_VoxCPM_example.json, audio_indextts2.json |
| Video Upscaling | SeedVR2, FlashVSR-v1.1 | Video Restoration and Upscaling | video_upscale_SeedVR2.json, video_upscale_FlashVSR.json |
Some nodes are disabled by default to save resources. To use SeedVR2, FlashVSR, Hunyuan3D, VoxCPM, IndexTTS, or HY-Motion1, enable them using ComfyUI Manager:
- Click the Manager button in the ComfyUI menu.
- In the Manager window, use the Filter dropdown to select Disabled.
- Locate the node you want to enable (e.g., IndexTTS, VoxCPM) and click Enable.
- Restart ComfyUI and refresh the page to apply changes.
Cache-DiT accelerates diffusion model inference by caching and reusing intermediate DiT block outputs across denoising steps, skipping redundant computation without retraining. Combined with torch.compile, it provides further speedup through graph-level kernel fusion. The ComfyUI integration is powered by ComfyUI-CacheDiT.
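To make the caching idea concrete, here is a minimal, self-contained sketch of the mechanism described above. This is NOT the actual ComfyUI-CacheDiT code: the toy block, the threshold value, and the drift model are all illustrative assumptions; the real integration caches transformer block residuals inside the DiT forward pass.

```python
# Illustrative sketch of the Cache-DiT idea (not the real ComfyUI-CacheDiT code):
# if a block's input barely changed since the last recomputation, reuse the
# cached output instead of recomputing the block.

def expensive_block(x):
    """Stand-in for a DiT transformer block (here: a toy transform)."""
    return [v * 2 + 1 for v in x]

class CachedBlock:
    def __init__(self, threshold=0.05):
        self.threshold = threshold   # max mean |delta| that still allows reuse
        self.last_input = None
        self.last_output = None
        self.hits = 0                # steps where recomputation was skipped

    def __call__(self, x):
        if self.last_input is not None:
            delta = sum(abs(a - b) for a, b in zip(x, self.last_input)) / len(x)
            if delta < self.threshold:
                self.hits += 1
                return self.last_output   # reuse the cached output
        self.last_input = list(x)
        self.last_output = expensive_block(x)
        return self.last_output

block = CachedBlock()
x = [0.5, -0.2, 0.1]
outputs = []
for step in range(10):
    outputs.append(block(x))
    x = [v + 0.001 for v in x]   # inputs drift slowly across denoising steps

print(f"cache hits: {block.hits} / 10 steps")
```

Because adjacent denoising steps produce similar intermediate activations, most steps fall under the reuse threshold, which is where the speedup comes from.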
The table below compares Z-Image-Turbo across four configurations (baseline plus three acceleration modes):

| No Acceleration | Cache-DiT | torch.compile | Cache-DiT + torch.compile |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
| Baseline | ~1.5x speedup | ~1.45x speedup | ~2.2x speedup |
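A quick sanity check on those numbers: if we assume the two optimizations attack largely independent costs (skipped block computation vs. kernel fusion), the combined speedup should be roughly the product of the individual speedups. This multiplicativity is an approximation, not a guarantee.

```python
# Back-of-the-envelope: combined speedup ~ product of individual speedups,
# assuming the two optimizations target independent costs.
cache_dit = 1.5
torch_compile = 1.45
combined_estimate = cache_dit * torch_compile
print(round(combined_estimate, 3))  # close to the measured ~2.2x
```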
Insert the acceleration node(s) between the model loader and the sampler:

- Cache-DiT only: add `⚡ CacheDit Accelerator` after the model loader.
- torch.compile only: add `TorchCompileModel` after the model loader.
- Cache-DiT + torch.compile: chain `TorchCompileModel` after `⚡ CacheDit Accelerator`.

Note: Cache-DiT is best suited for high step-count workflows (≥ 8 steps). `torch.compile` is supported in the `intel/llm-scaler-omni` Linux Docker image only and incurs a one-time warm-up cost on the first run.
| Category | Models |
|---|---|
| Image | Z-Image, Z-Image-Turbo, Qwen-Image-2512, Flux.2 Klein 4B / 9B |
| Video | LTX-2 T2V / I2V, Wan2.2 14B T2V / I2V |
On the left side of the web UI, click the Workflows icon to load and manage workflows.

All workflow files are available in the workflows/ directory. Below are detailed descriptions of supported workflows organized by category.
📖 Detailed Documentation: For model files, directory structure and download links, see Image Generation Models.
ComfyUI tutorial: https://docs.comfy.org/tutorials/image/qwen/qwen-image
Available Workflows:
- image_qwen_image.json (official): Native Qwen-Image workflow for text-to-image generation
- image_qwen_image_2512.json (official): Significant improvements in image quality and realism
- image_qwen_image_distill.json (official): Distilled version with better performance (recommended)
- image_qwen_image_layered.json (official): Layered image generation workflow
Note: Use fp8 format for all diffusion models to optimize memory usage and performance. It's recommended to use the distilled version for better performance.
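To see why fp8 helps, consider weight-resident memory alone: fp8 stores one byte per parameter versus two for fp16/bf16, so checkpoint weights take roughly half the VRAM. The 20B parameter count below is a hypothetical example, not a specific model's size.

```python
# Rough weight-memory estimate: fp8 uses 1 byte/parameter, fp16/bf16 use 2.
def weight_gib(num_params, bytes_per_param):
    """Weight memory in GiB for a model with the given parameter count."""
    return num_params * bytes_per_param / 1024**3

params = 20e9  # hypothetical 20B-parameter diffusion model
print(f"fp16: {weight_gib(params, 2):.1f} GiB, fp8: {weight_gib(params, 1):.1f} GiB")
```

Activation and KV-style buffers add more on top, but halving the weights is typically the dominant saving for large diffusion checkpoints.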
Q: What should I do if I encounter Out of Memory (OOM) errors?

A: You can try the following solutions:

- Add the `--disable-smart-memory` parameter when starting ComfyUI.
- If the OOM issue persists, try adding the `--reserve-vram 4` parameter to reserve more VRAM.
ComfyUI tutorial: https://docs.comfy.org/tutorials/image/qwen/qwen-image-edit
Available Workflows:
- image_qwen_image_edit.json (official): Standard image editing workflow
- image_qwen_image_edit_2511.json (official): Multi-image reference editing workflow (Edit Plus)
These workflows enable image editing based on text prompts, allowing you to modify existing images. The 2511 version supports multi-image reference for advanced editing scenarios like material transfer.
Note: Use fp8 format for all diffusion models to optimize memory usage and performance.
ComfyUI tutorial: https://comfyanonymous.github.io/ComfyUI_examples/sd3/
Available Workflows:
- image_sd3.5_simple_example.json: Simple text-to-image workflow
- image_sd3.5_midium.json: Medium model variant
- image_sd3.5_large_canny_controlnet_example.json: Large model with Canny edge ControlNet for precise control
Stable Diffusion 3.5 provides high-quality text-to-image generation with optional ControlNet support for guided generation.
ComfyUI tutorial: https://docs.comfy.org/tutorials/image/z-image/z-image-turbo
Available Workflows:
- image_z_image_turbo.json (official): Basic workflow for text-to-image generation
ComfyUI tutorial: https://docs.comfy.org/tutorials/flux/flux-1-kontext-dev
Available Workflows:
- image_flux_kontext_dev_basic.json: Basic workflow with multi-image reference support
Available Workflows:
- image_firered_image_edit_1.1.json: Multi-image reference image editing workflow with optional Lightning LoRA acceleration
📖 Detailed Documentation: For model files, directory structure and download links, see Video Generation Models.
ComfyUI tutorial: https://docs.comfy.org/tutorials/video/wan/wan2_2
Available Workflows:
- video_wan2_2_5B_ti2v.json (official): Text+Image-to-Video with 5B model
- video_wan2_2_14B_t2v.json (official): Text-to-Video with 14B model
- video_wan2_2_14B_i2v.json (official): Image-to-Video with 14B model
- video_wan2_2_14B_t2v_rapid_aio_multi_xpu.json: 14B Text-to-Video with multi-XPU support (using raylight)
- video_wan2.2_14B_i2v_rapid_aio_multi_xpu.json: 14B Image-to-Video with multi-XPU support
Multi-XPU Support with Raylight:

The following applies to workflows that use WAN2.2-14B-Rapid-AllInOne with raylight for faster multi-XPU inference.
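Conceptually, ulysses-style sequence parallelism splits the latent token sequence across GPUs so each device processes one shard, then gathers the shards back in order. The sketch below illustrates only that partitioning idea; it is an assumption about what `ulysses_degree` controls, not raylight's actual implementation.

```python
# Conceptual sketch of ulysses-style sequence sharding (illustrative only).
def split_sequence(tokens, ulysses_degree):
    """Partition tokens into `ulysses_degree` contiguous, near-equal shards."""
    n = len(tokens)
    base, rem = divmod(n, ulysses_degree)
    shards, start = [], 0
    for rank in range(ulysses_degree):
        size = base + (1 if rank < rem else 0)  # earlier ranks absorb remainder
        shards.append(tokens[start:start + size])
        start += size
    return shards

tokens = list(range(10))             # toy latent sequence of 10 tokens
shards = split_sequence(tokens, 4)   # e.g. 4 GPUs -> shard sizes 3, 3, 2, 2
gathered = [t for shard in shards for t in shard]
print([len(s) for s in shards], gathered == tokens)
```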
Steps to Complete Multi-XPU Workflows:

1. Model Loading
   - Ensure the `Load Diffusion Model (Ray)` node loads the diffusion model part from WAN2.2-14B-Rapid-AllInOne.
   - Ensure the `Load VAE` node loads the VAE part from WAN2.2-14B-Rapid-AllInOne.
   - Ensure the `Load CLIP` node loads `umt5_xxl_fp8_e4m3fn_scaled.safetensors`.
2. Ray Configuration
   - Set `GPU` and `ulysses_degree` in the `Ray Init Actor` node to the number of GPUs you want to use.
3. Run the Workflow
   - Click the `Run` button or use the shortcut `Ctrl (Cmd) + Enter` to run the workflow.

Note: Model weights can be obtained from ModelScope. You may need to extract the unet and VAE parts separately using `tools/extract.py`.
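The real interface of `tools/extract.py` is not documented here, but the underlying idea is partitioning an all-in-one checkpoint's state dict by key prefix. This hedged sketch shows that idea on a plain dict standing in for a loaded state dict; the prefixes and key names are illustrative assumptions, not the checkpoint's actual layout.

```python
# Hedged sketch: split an all-in-one checkpoint into parts by key prefix.
# The prefixes below are illustrative; real checkpoints may use other names.
def split_checkpoint(state_dict, prefixes=("model.diffusion_model.", "vae.")):
    parts = {p: {} for p in prefixes}
    rest = {}
    for key, tensor in state_dict.items():
        for p in prefixes:
            if key.startswith(p):
                parts[p][key[len(p):]] = tensor  # store with prefix stripped
                break
        else:
            rest[key] = tensor                   # everything else (e.g. text encoder)
    return parts, rest

ckpt = {  # toy state dict; strings stand in for tensors
    "model.diffusion_model.blocks.0.attn.qkv.weight": "t0",
    "model.diffusion_model.blocks.0.mlp.fc1.weight": "t1",
    "vae.decoder.conv_in.weight": "t2",
    "text_encoder.embed.weight": "t3",
}
parts, rest = split_checkpoint(ckpt)
print(len(parts["model.diffusion_model."]), len(parts["vae."]), len(rest))
```

Each part could then be saved as its own file (e.g. via safetensors) for the separate `Load Diffusion Model (Ray)` and `Load VAE` nodes.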
Available Workflows:
- video_wan2_2_animate_basic.json: Video animation workflow with control video support
This is a separate model from the standard Wan2.2 T2V/I2V models, designed specifically for video animation with control video inputs.
ComfyUI tutorial: https://docs.comfy.org/tutorials/video/hunyuan/hunyuan-video-1-5
Available Workflows:
- video_hunyuan_video_1.5_t2v.json: Basic workflow for Text-to-Video generation
- video_hunyuan_video_1.5_i2v.json: Basic workflow for Image-to-Video generation
- video_hunyuan_video_1.5_i2v_multi_xpu.json: 8.3B Image-to-Video workflow with multi-XPU support via raylight
The default parameter configurations of these workflows are optimized for 480p FP8 Image-to-Video.
ComfyUI tutorial: https://blog.comfy.org/p/ltx-2-open-source-audio-video-ai
Available Workflows:
- video_ltx2_19B_t2v.json (official): Text-to-Video with motion, dialogue, SFX, and music
- video_ltx2_19B_i2v.json (official): Image-to-Video with motion, dialogue, SFX, and music
- video_ltx_2_19B_t2v_distilled.json: Distilled Text-to-Video workflow
- video_ltx_2_19B_i2v_distilled.json: Distilled Image-to-Video workflow
Note: Model weights for the distilled workflows can be obtained from Kijai/LTXV2_comfy.
📖 Detailed Documentation: For model configuration details, see 3D Generation Models.
Available Workflows:
- 3d_hunyuan3d.json: Text/Image-to-3D mesh generation
This workflow generates 3D models from text descriptions or images using the Hunyuan3D model.
📖 Detailed Documentation: For model configuration details, see Video Upscale Models.
Available Workflows:
- video_upscale_SeedVR2.json: Video restoration and upscaling workflow
This workflow uses SeedVR2, a diffusion-based video super-resolution model, to upscale and restore video quality.
Available Workflows:
- video_upscale_FlashVSR.json: Video restoration and upscaling workflow
This workflow uses FlashVSR-v1.1, a diffusion-based video super-resolution model, to upscale and restore video quality.
📖 Detailed Documentation: For model files and setup instructions, see Audio Generation Models.
Available Workflows:
- audio_VoxCPM_example.json: Text-to-Speech synthesis
This workflow generates speech audio from text input using the VoxCPM1.5 or VoxCPM model.
Available Workflows:
- audio_indextts2.json: Voice cloning
This workflow synthesizes new speech using a single reference audio file for voice cloning.
Usage Steps:
1. Prepare Models

   Download the following models and place them in the `<your comfyui model path>/TTS` directory:

   - IndexTeam/IndexTTS-2
   - nvidia/bigvgan_v2_22khz_80band_256x
   - funasr/campplus
   - amphion/MaskGCT
   - facebook/w2v-bert-2.0

   Ensure your file structure matches the following hierarchy:

   ```
   TTS/
   ├── bigvgan_v2_22khz_80band_256x/
   │   ├── bigvgan_generator.pt
   │   └── config.json
   ├── campplus/
   │   └── campplus_cn_common.bin
   ├── IndexTTS-2/
   │   ├── .gitattributes
   │   ├── bpe.model
   │   ├── config.yaml
   │   ├── feat1.pt
   │   ├── feat2.pt
   │   ├── gpt.pth
   │   ├── README.md
   │   ├── s2mel.pth
   │   ├── wav2vec2bert_stats.pt
   │   └── qwen0.6bemo4-merge/
   │       ├── added_tokens.json
   │       ├── chat_template.jinja
   │       ├── config.json
   │       ├── generation_config.json
   │       ├── merges.txt
   │       ├── model.safetensors
   │       ├── Modelfile
   │       ├── special_tokens_map.json
   │       ├── tokenizer.json
   │       ├── tokenizer_config.json
   │       └── vocab.json
   ├── MaskGCT/
   │   └── semantic_codec/
   │       └── model.safetensors
   └── w2v-bert-2.0/
       ├── .gitattributes
       ├── config.json
       ├── conformer_shaw.pt
       ├── model.safetensors
       ├── preprocessor_config.json
       └── README.md
   ```

2. Configure Workflow

   - Load the reference audio file.
   - Set the desired input text.

3. Run the Workflow

   - Execute the workflow to generate the speech.
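Missing files in this layout are a common cause of load failures, so it can help to sanity-check the directory before launching the workflow. The snippet below is a small hedged helper, not part of the repo; the expected-path list is a partial sample of the hierarchy above and the default path is an assumption based on the container's mount point.

```python
# Sanity-check the TTS/ model layout before launching the workflow.
# EXPECTED is a partial sample of the hierarchy; extend it as needed.
from pathlib import Path

EXPECTED = [
    "bigvgan_v2_22khz_80band_256x/bigvgan_generator.pt",
    "campplus/campplus_cn_common.bin",
    "IndexTTS-2/gpt.pth",
    "MaskGCT/semantic_codec/model.safetensors",
    "w2v-bert-2.0/conformer_shaw.pt",
]

def find_missing(tts_root, expected=EXPECTED):
    """Return the expected files that are absent under `tts_root`."""
    root = Path(tts_root)
    return [rel for rel in expected if not (root / rel).is_file()]

# Assumed default path inside the container; adjust to your mount.
missing = find_missing("/llm/ComfyUI/models/TTS")
if missing:
    print("missing model files:", missing)
else:
    print("TTS layout looks complete")
```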
📖 Detailed Documentation: See the SGLang Diffusion Guide for complete server configuration, API reference, and multi-GPU setup. For ComfyUI integration, see the SGLang Diffusion ComfyUI Guide.
SGLang Diffusion provides an OpenAI-compatible API for image/video generation models.
```bash
sglang generate --model-path /llm/models/Wan2.1-T2V-1.3B-Diffusers \
  --text-encoder-cpu-offload --pin-cpu-memory \
  --prompt "A curious raccoon" \
  --save-output
```

Start the server:
```bash
# Configure proxy if needed
export http_proxy=<your_http_proxy>
export https_proxy=<your_https_proxy>
export no_proxy=localhost,127.0.0.1

# Start server
sglang serve --model-path /llm/models/Z-Image-Turbo/ \
  --vae-cpu-offload --pin-cpu-memory \
  --num-gpus 1 --port 30010
```

Or use the provided script:

```bash
bash /llm/entrypoints/start_sgl_diffusion.sh
```

cURL example:
```bash
curl http://localhost:30010/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Z-Image-Turbo",
    "prompt": "A beautiful sunset over the ocean",
    "size": "1024x1024"
  }'
```

Python example (OpenAI SDK):
```python
from openai import OpenAI
import base64

client = OpenAI(base_url="http://localhost:30010/v1", api_key="EMPTY")

response = client.images.generate(
    model="Z-Image-Turbo",
    prompt="A beautiful sunset over the ocean",
    size="1024x1024",
)

# Save image from base64 response
with open("output.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))
```

Start a local XInference server:

```bash
xinference-local --host 0.0.0.0 --port 9997
```

Supported models:
- Stable Diffusion 3.5 Medium
- Kokoro 82M
- whisper large v3
Visit http://127.0.0.1:9997/docs to inspect the API docs.
You can select a model and launch the service via the WebUI (refer to here) or by command:

```bash
xinference-local --host 0.0.0.0 --port 9997
xinference launch --model-name sd3.5-medium --model-type image --model-path /llm/models/stable-diffusion-3.5-medium/ --gpu-idx 0
```

For TTS models (Kokoro 82M for example):
```bash
curl http://localhost:9997/v1/audio/speech -H "Content-Type: application/json" -d '{
  "model": "Kokoro-82M",
  "input": "kokoro, hello, I am kokoro."
}' --output output.wav
```

For STT models (whisper large v3 for example):
```bash
AUDIO_FILE_PATH=<your_audio_file_path>
curl -X 'POST' \
  "http://localhost:9997/v1/audio/translations" \
  -H 'accept: application/json' \
  -F "model=whisper-large-v3" \
  -F "file=@${AUDIO_FILE_PATH}"
```

Example response:

```json
{"text":" Cacaro's hello, I am Cacaro."}
```

For text-to-image models (Stable Diffusion 3.5 Medium for example):
```bash
curl http://localhost:9997/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sd3.5-medium",
    "prompt": "A Shiba Inu chasing butterflies on a sunny grassy field, cartoon style, with vibrant colors.",
    "n": 1,
    "size": "1024x1024",
    "quality": "standard",
    "response_format": "url"
  }'
```

Note: Stand-alone examples are excluded from the `intel/llm-scaler-omni` image.
Supported models:
- Hunyuan3D 2.1
- Qwen Image
- Wan 2.1 / 2.2
A conda-based installation method is provided for running the llm-scaler-omni version of ComfyUI on Windows.

```bat
git clone https://github.com/intel/llm-scaler.git
cd llm-scaler\omni\
.\init_conda_env.bat
```

After installation, enter the ComfyUI directory and start the ComfyUI server:

```powershell
cd ComfyUI
conda activate omni_env
$env:HTTP_PROXY = <your_proxy>
$env:HTTPS_PROXY = <your_proxy>
python .\main.py --listen 0.0.0.0
```












