Context Overflow Crash on Qwen 3.5-122BA10B #421

@sewall2025

Describe the bug
Environment:
Hardware: Mac Studio / Mac Pro (M2 Ultra, 192GB Unified Memory)
Model: Qwen 3.5 122B (Versions: 122ba10b Q4 and Q8)
Clients: OpenCode or Cline

Settings: Official Qwen recommended parameters

Description
A consistent crash occurs when the input context length exceeds approximately 42k tokens.

Behavior under 42k: The model performs normally with fast inference and stable memory usage.

Behavior above 42k: The model runner crashes immediately upon reaching the 42k threshold.

Consistency: The crash occurs with both quantization levels (Q4 and Q8) and with both clients (OpenCode and Cline). This points to KV cache management or the context window implementation for this model on Apple Silicon, rather than to a specific quantization or client.
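
As a rough sanity check on the KV cache hypothesis, the sketch below estimates how much memory the cache would need at a given context length. It uses the standard formula for full attention (keys plus values, per layer, per KV head). The layer count, KV head count, and head dimension are placeholder values for illustration only, not the actual configuration of this model; substitute the numbers from the model's config file. If the model mixes in layers that do not keep a per-token KV cache, the real figure would be lower.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size in bytes for seq_len tokens.

    Assumes standard full attention in every layer and an unquantized
    cache (bytes_per_elem=2 corresponds to fp16/bf16).
    """
    # Factor 2: one tensor for keys and one for values in each layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Placeholder architecture values (assumptions, NOT the real model config).
EXAMPLE_CONFIG = dict(n_layers=48, n_kv_heads=8, head_dim=128)

for tokens in (32_000, 42_000, 64_000):
    gib = kv_cache_bytes(tokens, **EXAMPLE_CONFIG) / 2**30
    print(f"{tokens:>6} tokens -> {gib:.2f} GiB KV cache")
```

If the estimate at 42k tokens, added to the size of the model weights, stays well below the available unified memory, plain memory exhaustion becomes less likely and a fixed buffer or context limit in the runner is the more plausible cause.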
