Continuous batching fails with broadcast shape error when Turbo Quant is enabled (Qwen3.5-27B 4-bit) #559
Description
Describe the bug
When running performance benchmarks on the Qwen3.5-27B 4-bit model with the Turbo Quant feature enabled, continuous batching fails with a shape broadcasting error. The error message shown is:
Cannot broadcast array of shape (2,4,1,256) into shape (1,4,1,256).
Single request benchmarks complete successfully, but the issue consistently appears when continuous batching tests are executed.
To Reproduce
- Select model: Qwen3.5-27B-4bit (15.7 GB)
- Enable Turbo Quant (in the model settings)
- Open the Performance Benchmark interface
- (Likely can be skipped) Under Single Request Tests, select multiple prompt sizes (e.g., pp1024 through pp65536)
- Under Continuous Batching Tests, enable batch sizes (e.g., 2x, 4x, 8x batch)
- Click Run Benchmark
- Observe error during continuous batching phase
Expected behavior
Continuous batching benchmarks should execute successfully and return throughput and latency metrics, similar to single request benchmarks, without any tensor shape mismatch errors.
Actual behavior
The benchmark fails during continuous batching with the error:
Cannot broadcast array of shape (2,4,1,256) into shape (1,4,1,256)
This prevents completion of batching performance tests.
Desktop
- OS: macOS Tahoe
- Browser: Chrome
- Version: 26.4
Additional context
- Issue only occurs when Turbo Quant is enabled
- Single request benchmarks complete without issue
- Continuous batching appears to trigger a tensor shape mismatch: the source array has a leading dimension of 2 while the destination has 1, which suggests the problem lies in batch dimension handling or KV cache allocation under quantized inference
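To illustrate the kind of mismatch suspected above, here is a minimal sketch using NumPy as a stand-in. It is not the project's code: the helper names `write_step` and `try_write`, the sequence length of 8, and the reading of the shape as (batch, heads, step, head_dim) are all assumptions made for illustration. It only shows that writing a batch of 2 into a buffer allocated for a batch of 1 produces a broadcast error of the same form as the one reported.

```python
import numpy as np


def write_step(cache, offset, new_values):
    """Copy one decode step into a preallocated cache at sequence position `offset`.

    Hypothetical helper: assumes layout (batch, heads, seq, head_dim).
    """
    step = new_values.shape[2]
    cache[:, :, offset:offset + step, :] = new_values


def try_write(cache_batch, request_batch):
    """Allocate a cache for `cache_batch` sequences and write `request_batch` into it."""
    # 4 heads and head_dim 256 mirror the shapes in the error message;
    # the sequence length of 8 is arbitrary.
    cache = np.zeros((cache_batch, 4, 8, 256), dtype=np.float32)
    new_values = np.ones((request_batch, 4, 1, 256), dtype=np.float32)
    try:
        write_step(cache, 0, new_values)
        return "ok"
    except ValueError as exc:
        return f"failed: {exc}"


print(try_write(1, 1))  # single request: cache and input agree on batch size
print(try_write(1, 2))  # batch of 2 into a cache sized for 1: broadcast error
print(try_write(2, 2))  # cache allocated with the batch size: succeeds
```

If the quantized cache path allocates its buffers with a fixed batch size of 1 while the non-quantized path sizes them from the incoming batch, that would be consistent with the failure appearing only when Turbo Quant and continuous batching are combined. This is a guess based on the error shapes, not something confirmed in the code.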