Context Overflow Crash on Qwen 3.5-122BA10B #421
Describe the bug
Environment:
Hardware: Mac Studio / Mac Pro (M2 Ultra, 192GB Unified Memory)
Model: Qwen 3.5 122B (Versions: 122ba10b Q4 and Q8)
Clients: OpenCode or Cline
Settings: Official Qwen recommended parameters

Description
A consistent crash occurs when the input context length exceeds approximately 42k tokens.
Behavior under 42k: The model performs normally with fast inference and stable memory usage.
Behavior above 42k: The model runner crashes immediately upon reaching the 42k threshold.
Consistency: The crash occurs with both quantization levels (Q4 and Q8) and both clients (OpenCode and Cline), which suggests a problem in KV cache management or the context window implementation for this model version on Apple Silicon rather than in a particular quantization or client.
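To pin down the threshold more precisely than "approximately 42k", the failing context length can be narrowed with a binary search. The sketch below is only an illustration: `find_crash_threshold` and the `probe` callable are hypothetical names, not part of any tool mentioned here, and the search assumes that once a given length crashes, every longer length crashes too.

```python
from typing import Callable, Tuple


def find_crash_threshold(
    probe: Callable[[int], bool],
    low: int,
    high: int,
    tolerance: int = 256,
) -> Tuple[int, int]:
    """Narrow down the context length at which requests start failing.

    `probe(n)` is a hypothetical callable supplied by the caller: it should
    send a prompt of roughly n tokens and return True if the model answered,
    False if the runner crashed or returned an error.

    Assumes failures are monotonic in n. Returns (largest_ok, smallest_failing)
    with a gap of at most `tolerance` tokens.
    """
    if not probe(low):
        raise ValueError(f"probe already fails at low={low}")
    if probe(high):
        raise ValueError(f"probe still succeeds at high={high}")

    # Invariant: probe(low) succeeded and probe(high) failed.
    while high - low > tolerance:
        mid = (low + high) // 2
        if probe(mid):
            low = mid
        else:
            high = mid
    return low, high
```

In practice, `probe` would wrap whatever client call reproduces the crash and would need to restart the model runner after each failure. The resulting pair of lengths gives a tighter bound to report alongside the crash logs.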