1× to 4.79× speedup for diffusion sampling. Unofficial ComfyUI implementation of Spectrum (CVPR 2026). Training-free and plug-and-play.
Spectrum is a training-free diffusion sampling acceleration technique. It treats the denoiser's internal features as functions of time and approximates them with Chebyshev polynomials in the spectral domain, which lets it predict those features and skip redundant network forward passes. Unlike prior methods (e.g., local Taylor expansion), Spectrum's approximation error does not compound with skip distance, so sample quality holds up even at high speedup ratios.
Currently supported models:
| Model | Detection Type | Quality |
|---|---|---|
| Klein 9b | Flux-like | Excellent |
| Longcat Image | Flux-like | Excellent |
| FLUX.1 | Flux-like | Excellent |
| Qwen Image (T2I) | MMDiT | Good |
| Z Image Turbo | Lumina2 | Excellent |
| ErnieImage | Ernie | Normal |
| Wan2.2 | Wan | Modest speedup (dual sampling, fewer steps per round) |
| HunyuanVideo 1.5 | Hunyuan | Normal |
| Qwen Image Edit | MMDiT | Poor (60 layers with split modulation; not recommended) |
| LTX2.3 | LTX | Untested (hardware-limited) |
The node first runs `warmup_steps` full forward passes (default 3) to build an initial feature cache, then skips progressively more steps. The more total sampling steps, the larger the speedup. For lightweight models such as Z Image Turbo or Klein, `warmup_steps` can be set to 1.
Text-to-Image (Qwen Image):
Image Editing (Klein Base 9b):
All tests were run on an RTX 4090 with default parameters (w=0.5, M=4, window_size=2, flex_window=0.75).
[Figure: sample outputs for Klein 9b, Z Image Turbo, Qwen Image, and ErnieImage]
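View each feature channel $i$ at the output of the denoiser's last attention block as a scalar function of time, $h_i(t)$.
Approximate each channel with a degree-$M$ Chebyshev expansion:
$h_i(\tau) \approx \sum_{m=0}^{M} c_{m,i} \cdot T_m(\tau)$
where $\tau \in [-1, 1]$ is the sampling time rescaled to the Chebyshev interval, $T_m$ is the $m$-th Chebyshev polynomial of the first kind ($T_0(\tau) = 1$, $T_1(\tau) = \tau$, $T_{m+1}(\tau) = 2\tau T_m(\tau) - T_{m-1}(\tau)$), and $c_{m,i}$ are the fitted coefficients.
Why Chebyshev? Its approximation error bound depends only on the degree $M$, not on the skip distance, so unlike local Taylor expansion the error does not compound as the sampler skips further ahead.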
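A minimal sketch of the Chebyshev basis $T_0, \dots, T_M$, built with the standard three-term recurrence $T_{m+1}(\tau) = 2\tau T_m(\tau) - T_{m-1}(\tau)$. It assumes the sampling time is mapped linearly onto $[-1, 1]$; the node's exact normalization is not documented here. `to_tau` and `chebyshev_basis` are hypothetical helper names, not the node's API.

```python
import math


def to_tau(t, t_start, t_end):
    """Map sampling time t linearly so t_start -> -1 and t_end -> +1 (assumed normalization)."""
    return 2.0 * (t - t_start) / (t_end - t_start) - 1.0


def chebyshev_basis(tau, M):
    """Return [T_0(tau), ..., T_M(tau)] using T_{m+1} = 2*tau*T_m - T_{m-1}."""
    T = [1.0, tau]
    for _ in range(2, M + 1):
        T.append(2.0 * tau * T[-1] - T[-2])
    return T[: M + 1]  # also handles M = 0


# Sanity check: Chebyshev polynomials of the first kind satisfy T_m(cos(theta)) == cos(m * theta).
theta = 0.7
basis = chebyshev_basis(math.cos(theta), 6)
print(max(abs(v - math.cos(m * theta)) for m, v in enumerate(basis)))  # floating-point rounding level
```

The mapping works for either time direction: if the sampler counts time down from 1 to 0, passing `t_start=1.0, t_end=0.0` still sends the first step to $\tau = -1$.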
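At each actual forward pass, collect the block output features $\mathbf{h}(\tau_k)$ into a cache. Stacking the $K$ cached feature vectors as rows gives $\mathbf{H}$, and evaluating the basis at the cached times gives the design matrix $\Phi$ with $\Phi_{k,m} = T_m(\tau_k)$. The coefficient matrix $C = [c_{m,i}]$ is fitted by ridge regression with regularization strength $\lambda$ and solved in closed form as
$C = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T\mathbf{H}$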
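The ridge fit $C = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T\mathbf{H}$ with $\Phi_{k,m} = T_m(\tau_k)$ can be sketched in plain Python as below. This illustrates the formula only and is not the node's code: `fit_ridge`, `predict`, and `solve` are hypothetical helpers, features are lists of floats rather than tensors, and a PyTorch implementation would more likely call `torch.linalg.solve`.

```python
def chebyshev_basis(tau, M):
    """[T_0(tau), ..., T_M(tau)] via T_{m+1} = 2*tau*T_m - T_{m-1}."""
    T = [1.0, tau]
    for _ in range(2, M + 1):
        T.append(2.0 * tau * T[-1] - T[-2])
    return T[: M + 1]


def solve(A, B):
    """Solve A X = B (A: n x n, B: n x d) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[r]) + list(B[r]) for r in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ridge(taus, H, M, lam):
    """C = (Phi^T Phi + lam*I)^{-1} Phi^T H, returned as (M+1) rows of d coefficients."""
    Phi = [chebyshev_basis(t, M) for t in taus]  # K x (M+1) design matrix
    K, P, d = len(taus), M + 1, len(H[0])
    A = [[sum(Phi[k][i] * Phi[k][j] for k in range(K)) + (lam if i == j else 0.0)
          for j in range(P)] for i in range(P)]
    B = [[sum(Phi[k][i] * H[k][c] for k in range(K)) for c in range(d)]
         for i in range(P)]
    return solve(A, B)


def predict(C, tau):
    """Evaluate the fitted Chebyshev series at a (possibly skipped) time tau."""
    basis = chebyshev_basis(tau, len(C) - 1)
    return [sum(basis[m] * C[m][c] for m in range(len(C))) for c in range(len(C[0]))]


# Usage: two channels cached at four times, extrapolated to tau = 1.0.
taus = [-1.0, -0.5, 0.0, 0.5]
H = [[1 + 2 * t + 3 * t * t, -t] for t in taus]
C = fit_ridge(taus, H, M=2, lam=1e-9)
print(predict(C, 1.0))  # close to [6.0, -1.0]
```

Because $\Phi^T\Phi + \lambda I$ is positive definite for any $\lambda > 0$, the system stays solvable even when fewer than $M+1$ points are cached. Raising `lam` shrinks the coefficients toward zero, which is the underfitting side of the λ trade-off.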
Final features are a convex combination of two predictors:
$h_{\text{mix}} = (1-w) \cdot h_{\text{taylor}} + w \cdot h_{\text{cheb}}$
- Taylor term: discrete forward-difference extrapolation from the nearest cached points; captures high-frequency details
- Chebyshev term: global spectral fit over all cached points; captures long-range trends
- $w$ controls the blend: larger skips favor Chebyshev, smaller skips favor Taylor
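A sketch of the Taylor predictor and the blend $h_{\text{mix}} = (1-w) \cdot h_{\text{taylor}} + w \cdot h_{\text{cheb}}$, under stated assumptions. The Taylor term is implemented here as Newton divided-difference extrapolation through the last `order + 1` cached points; divided rather than plain forward differences are used because cached forward passes stop being evenly spaced once the skip window grows. `order=1` is an assumed default, the node's actual order and spacing handling may differ, and all names are hypothetical.

```python
def newton_extrapolate(xs, ys, x):
    """Evaluate the Newton divided-difference polynomial through (xs, ys) at x."""
    n = len(xs)
    coef = list(ys)
    for j in range(1, n):  # build divided differences in place
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    result = coef[-1]
    for i in range(n - 2, -1, -1):  # nested (Horner-style) evaluation
        result = result * (x - xs[i]) + coef[i]
    return result


def taylor_predict(times, feats, t_query, order=1):
    """Extrapolate each channel from the last order+1 cached points (local, Taylor-like)."""
    xs = times[-(order + 1):]
    rows = feats[-(order + 1):]
    return [newton_extrapolate(xs, [r[c] for r in rows], t_query)
            for c in range(len(rows[0]))]


def blend(h_taylor, h_cheb, w):
    """Convex combination h_mix = (1 - w) * h_taylor + w * h_cheb; assumes 0 <= w <= 1."""
    return [(1.0 - w) * a + w * b for a, b in zip(h_taylor, h_cheb)]


# Usage: channel 0 grows linearly, channel 1 is constant.
times = [0.0, 1.0, 2.0]
feats = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
h_taylor = taylor_predict(times, feats, 5.0)
h_cheb = [9.0, 1.0]  # stand-in for a Chebyshev prediction, for illustration only
print(h_taylor, blend(h_taylor, h_cheb, w=0.5))
```

Setting `w=0` returns the Taylor prediction unchanged and `w=1` returns the Chebyshev prediction, matching the pure-Taylor and pure-Chebyshev extremes of the `w` parameter.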
Analogy: Taylor prediction is like judging a car's next position from how far away its taillights are: accurate up close, wildly wrong at range. Chebyshev prediction is like learning the car's driving rhythm, so you can predict 5 steps ahead almost as well as 1.
`w` (Chebyshev/Taylor blend weight)
- Formula: $h_{\text{mix}} = (1-w) \cdot h_{\text{taylor}} + w \cdot h_{\text{cheb}}$
- Range: 0.0 ~ 1.0, default 0.5, recommended 0.3 ~ 0.8
- w=0: pure Taylor (good for short skips); w=1: pure Chebyshev (stable for long skips)
- Dynamically adjusted: larger windows → higher w, capped at `max_w`
`M` (Chebyshev polynomial degree)
- Formula: $h_{\text{cheb},i}(\tau) = \sum_{m=0}^{M} c_{m,i} \cdot T_m(\tau)$
- Range: 1 ~ 10, default 4, recommended 3 ~ 6
- M=2 is too coarse; M=4 is the sweet spot; M=6 and above give diminishing returns
λ (ridge regularization strength)
- Formula: $C = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T\mathbf{H}$
- Range: 0.001 ~ 10.0, default 0.1, recommended 0.01 ~ 1.0
- Too small → numerical instability; too large → underfitting. 0.1 is the optimal value reported in the paper.
`warmup_steps`
- Range: 0 ~ 20, default 3, recommended 2 ~ 5
- The first N steps always run the full network forward pass to build the initial cache
- Set to 1 for lightweight models (Klein, Z Image Turbo)
- Set to the total number of steps to disable acceleration entirely
`window_size`
- Formula: $\mathcal{N}$ (the paper's initial window size)
- Range: 1.0 ~ 16.0, default 2.0, recommended 1.5 ~ 4.0
- 1 = no skipping; 2 = forward pass on every other step; higher = more aggressive from the start
`flex_window`
- Formula: $\alpha$ (the paper's adaptive scheduling slope)
- Range: 0.0 ~ 4.0, default 0.75, recommended 0.3 ~ 2.0
- Interval sequence: window, window+α, window+2α, window+3α, ...
- α=0: fixed schedule; α=0.75: gradual; α=3.0: aggressive
- Why grow? Early steps determine layout (error-sensitive); later steps refine details (error-tolerant)
- Step size 0.01 for precise tuning
Analogy: flex_window is your throttle. α=0 is cruise control, α=0.75 is gradual acceleration, α=3.0 is pedal-to-the-metal. "Slow first, fast later" is optimal.
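The interplay of `warmup_steps`, `window_size`, and `flex_window` can be simulated with the sketch below. It assumes the warmup steps are all full forward passes (step 0 always is, since there is nothing to predict from yet) and that the k-th interval after warmup is `window_size + k * flex_window` rounded down, with a minimum of 1. The node's exact rounding and bookkeeping are not documented here, so treat the output as illustrative; `spectrum_schedule` is a hypothetical helper.

```python
import math


def spectrum_schedule(total_steps, warmup_steps=3, window_size=2.0, flex_window=0.75):
    """Simulate per-step FWD/SKIP decisions (hypothetical reconstruction of the schedule)."""
    decisions = ["SKIP"] * total_steps
    warm = min(max(1, warmup_steps), total_steps)  # step 0 always runs fully
    for step in range(warm):
        decisions[step] = "FWD"
    last = warm - 1  # index of the most recent forward pass
    k = 0
    while True:
        # k-th interval: window, window+a, window+2a, ... rounded down (assumed), at least 1
        interval = max(1, math.floor(window_size + k * flex_window))
        last += interval
        if last >= total_steps:
            break
        decisions[last] = "FWD"
        k += 1
    return decisions


for steps in (10, 20, 50):
    plan = spectrum_schedule(steps)
    n_fwd = plan.count("FWD")
    print(f"{steps} steps -> {n_fwd} forward passes (ratio {steps / n_fwd:.2f})")
```

Counting only forward passes, the simulated ratio grows with the total step count, consistent with the note that longer runs benefit more; real wall-clock gains also depend on prediction overhead.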
`max_w`
- Range: 0.0 ~ 1.0, default 0.8, recommended 0.6 ~ 0.9
- Upper bound for the dynamic w. Raise to 0.9 for extreme speedups; otherwise leave at 0.8.

The node can also print per-step FWD/SKIP decisions and window sizes, which helps with parameter tuning.
| Scenario | Parameters | Expected Speedup |
|---|---|---|
| Conservative (quality-first) | w=0.3, M=4, warmup=4, window=2, flex=0.3, max_w=0.6 | ≈2× |
| Balanced (default) | w=0.5, M=4, warmup=3, window=2, flex=0.75, max_w=0.8 | ≈3× |
| Aggressive (speed-first) | w=0.7, M=6, warmup=2, window=2, flex=2.0, max_w=0.9 | ≈4–5× |
| Image Editing | w=0.5, M=4, warmup=4, window=2, flex=0.5, max_w=0.8 | ≈2× |
- Klein / Longcat Edit: Single blocks apply uniform modulation, smoothing main/ref token differences. Acceleration quality matches T2I.
- Qwen Image Edit: All 60 layers use split timestep_zero modulation. Main token step-to-step variation is 3× that of T2I, causing severe quality degradation. Use conservative parameters or disable acceleration.
This node was developed with assistance from Claude Code and DeepSeek. Licensed under the MIT License, same as the original project. Feel free to use and contribute.
@article{han2026adaptive,
title={Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration},
author={Han, Jiaqi and Shi, Juntong and Li, Puheng and Ye, Haotian and Guo, Qiushan and Ermon, Stefano},
journal={arXiv preprint arXiv:2603.01623},
year={2026}
}





