Continuous batching fails with broadcast shape error when Turbo Quant is enabled (Qwen3.5-27B 4-bit) #559

**Describe the bug**
When running performance benchmarks on the Qwen3.5-27B 4-bit model with Turbo Quant enabled, continuous batching fails with a shape broadcasting error:  
`Cannot broadcast array of shape (2,4,1,256) into shape (1,4,1,256)`  

Single request benchmarks complete successfully; the error appears consistently whenever the continuous batching tests run.

---

**To Reproduce**
1. Select model: **Qwen3.5-27B-4bit (15.7 GB)**  
2. Enable **Turbo Quant** (in the model settings)  
3. Open the Performance Benchmark interface  
4. (Optional; this step can likely be skipped) Under *Single Request Tests*, select multiple prompt sizes (e.g., pp1024 through pp65536)  
5. Under *Continuous Batching Tests*, enable batch sizes (e.g., 2x, 4x, 8x batch)  
6. Click **Run Benchmark**  
7. Observe the error during the continuous batching phase  

---

**Expected behavior** 
Continuous batching benchmarks should execute successfully and return throughput and latency metrics, similar to single request benchmarks, without any tensor shape mismatch errors.

---

**Actual behavior** 
The benchmark fails during continuous batching with the error:  
`Cannot broadcast array of shape (2,4,1,256) into shape (1,4,1,256)`  

This prevents completion of batching performance tests.

---

**Screenshots**

<img width="1466" height="1019" alt="Image" src="https://github.com/user-attachments/assets/3a925b52-94d6-499c-9e9a-653d63a5e58c" />

---

**Desktop**
- OS: macOS Tahoe
- Browser: Chrome 
- Version: 26.4

---

**Additional context**  
- Issue only occurs when **Turbo Quant is enabled**  
- Single request benchmarks complete without issue  
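- The two shapes in the error differ only in the leading dimension (2 vs. 1), which would be consistent with a batch-dimension mismatch: possibly a buffer such as the quantized KV cache is allocated for a single request and then written with a batched update. This is a guess from the error message, not a confirmed root cause.  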
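
To illustrate the kind of mismatch suspected above, here is a minimal sketch in plain NumPy. It is not the application's code: the `write_step` helper, the axis layout `(batch, kv_heads, seq, head_dim)`, and the cache length of 8 are all assumptions made for illustration, and NumPy's error wording differs slightly from the message in this report.

```python
import numpy as np


def write_step(cache, update, pos):
    """Write one decode step into a preallocated cache at sequence index pos.

    Hypothetical helper for illustration only; not taken from the project.
    """
    cache[:, :, pos:pos + 1, :] = update
    return cache


# Assumed layout: (batch, kv_heads, seq, head_dim).
single_cache = np.zeros((1, 4, 8, 256), dtype=np.float32)    # sized for one request
batched_update = np.ones((2, 4, 1, 256), dtype=np.float32)   # update for two requests

try:
    write_step(single_cache, batched_update, 0)
    error = None
except ValueError as exc:
    # A batch-2 update cannot be broadcast into a batch-1 slot.
    error = str(exc)

print(error)

# Allocating the cache with the active batch size avoids the mismatch.
batched_cache = np.zeros((2, 4, 8, 256), dtype=np.float32)
write_step(batched_cache, batched_update, 0)
```

If the real failure follows this pattern, the place to look would be wherever the Turbo Quant path allocates or resizes its buffers when a second request joins the batch.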
