
Kernel panic (IOGPUMemory completeMemory() prepare count underflow) after upgrading v0.2.20 → v0.2.23 #435

@gizmax

Description

Summary

Upgrading oMLX from v0.2.20 to v0.2.23 causes repeated kernel panics on a Mac Studio M2 Ultra (64GB). Rolling back to v0.2.20 immediately resolves the issue. Four identical panics occurred within roughly one hour of running v0.2.23.

Environment

  • Hardware: Mac Studio M2 Ultra, 64GB unified memory
  • macOS: 26.4 (Build 25E5233c), Darwin 25.4.0
  • oMLX: v0.2.23 (panics) / v0.2.20 (stable)
  • Models loaded: Qwen3.5-35B-A3B-4bit (pinned, ~20GB) + Qwen3.5-0.8B-4bit (SpecPrefill draft)
  • Config: --max-process-memory 80% --hot-cache-max-size 8GB --paged-ssd-cache-dir ~/.omlx/cache
  • SSD cache: ~92GB (1163 files)

Panic details

All 4 panics are identical:

panic(cpu 18 caller 0xfffffe00427725d8): "completeMemory() prepare count underflow" @IOGPUMemory.cpp:550

Timestamps: 21:11, 21:29, 21:40, 21:48 (2026-03-27, ~10-20 min intervals)

Reproduction steps

  1. Run oMLX v0.2.20 stably for days (no panics)
  2. `pip install "omlx @ git+https://github.com/jundot/[email protected]"`
  3. `launchctl kickstart -k gui/502/ai.jarvis.omlx`
  4. oMLX starts normally, models load, inference works
  5. Within 10-20 minutes: kernel panic
  6. After reboot, the panic recurs on each boot until rollback

What v0.2.23 changed (from source diff)

Comparing v0.2.20 and v0.2.23, the relevant changes are:

  1. New files: patches/turboquant_attention.py, turboquant_kv.py (TurboQuant KV cache, disabled via hardcoded turboquant_kv_enabled = False)
  2. cache/prefix_cache.py: Changed offset calculation to always use tensor shape instead of meta_state. Added TurboQuantKVCache handling with block slicing.
  3. cache/paged_ssd_cache.py: Added TurboQuant tensor serialization, disk pressure handling (ENOSPC/EDQUOT)
  4. cache/type_handlers.py: Changed cache offset logic: cache.offset = keys.shape[2] instead of using meta_state
  5. engine/batched.py: Added TurboQuant initialization code (checks turboquant_kv_enabled, patches attention)
  6. model_settings.py: Added turboquant_kv_enabled and turboquant_kv_bits fields

Even though TurboQuant is disabled, the new cache handling code (prefix_cache offset fix, type_handlers change) runs unconditionally and likely changes Metal buffer allocation/deallocation patterns.
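The offset change can be illustrated with a minimal sketch. All class and function names below (`FakeTensor`, `KVCacheEntry`, the two `offset_*` helpers) are hypothetical stand-ins for illustration, not oMLX's actual API; the point is only that deriving the offset from the tensor shape instead of `meta_state` changes the value when the two disagree:

```python
from dataclasses import dataclass, field


class FakeTensor:
    """Stand-in for an MLX array; only exposes .shape."""

    def __init__(self, shape: tuple) -> None:
        self.shape = shape


@dataclass
class KVCacheEntry:
    """Hypothetical KV-cache entry (illustrative, not oMLX's real API)."""

    keys: FakeTensor                       # (batch, heads, seq_len, head_dim)
    meta_state: dict = field(default_factory=dict)


def offset_v0_2_20(cache: KVCacheEntry) -> int:
    # Old behavior (sketch): trust the serialized metadata.
    return cache.meta_state.get("offset", 0)


def offset_v0_2_23(cache: KVCacheEntry) -> int:
    # New behavior (sketch): always derive the offset from the tensor shape,
    # i.e. cache.offset = keys.shape[2]. If metadata and tensor shape disagree
    # (e.g. after SSD-cache reconstruction), the two versions return different
    # offsets, and downstream code slices and frees different buffer regions.
    return cache.keys.shape[2]


entry = KVCacheEntry(FakeTensor((1, 8, 128, 64)),      # seq_len = 128
                     meta_state={"offset": 96})        # stale metadata
print(offset_v0_2_20(entry))  # 96
print(offset_v0_2_23(entry))  # 128
```

With a stale `meta_state` (as might survive in a 92GB on-disk cache written by v0.2.20), the two versions disagree, which is consistent with the buffer-lifecycle change hypothesized below.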

Analysis

The panic is a reference-count underflow in the Metal GPU driver: completeMemory() is called more times than there were matching prepare calls, driving the prepare count negative. This is likely triggered by:

  • Changed buffer lifecycle in the new prefix cache offset logic
  • Metal buffer allocation patterns from TurboQuant infrastructure code (type registry, cache type detection) even when TurboQuant is disabled
  • Interaction with SSD cache reconstruction on startup (92GB cache, 1163 files)
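The underflow mechanism itself is easy to model. This is a toy prepare/complete counter, not Apple's driver code; it only shows why a single unbalanced complete (e.g. a buffer released twice by a changed cache lifecycle) is immediately fatal:

```python
class GPUMemoryRegion:
    """Toy model of a driver-side prepare count (not Apple's implementation)."""

    def __init__(self) -> None:
        self.prepare_count = 0

    def prepare(self) -> None:
        # Called when the GPU is about to use the buffer.
        self.prepare_count += 1

    def complete(self) -> None:
        # Called when GPU work on the buffer finishes. One extra complete()
        # drives the count negative -- the condition the kernel panics on.
        self.prepare_count -= 1
        if self.prepare_count < 0:
            raise RuntimeError("completeMemory() prepare count underflow")


region = GPUMemoryRegion()
region.prepare()
region.complete()      # balanced: fine
try:
    region.complete()  # unbalanced: underflow
except RuntimeError as e:
    print(e)           # completeMemory() prepare count underflow
```

In the real driver the panic takes down the whole machine rather than raising an exception, which is why a userspace bookkeeping bug in buffer alloc/free ordering can surface as a kernel panic.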

Workaround

Rolling back to v0.2.20 resolves the issue completely:

`pip install "omlx @ git+https://github.com/jundot/[email protected]"`
