Crashes during large-context inference; server stops with Metal/MLX SIGABRT #520
Description
Bug: oMLX 0.3.0 server stops during large-context inference with MiniMax-M2.5-MLX-4bit, crashing in the Metal/MLX backend on macOS Tahoe 26.3.1
Summary
oMLX 0.3.0 crashed at 2026-04-01 18:56:56 -0500 while Hermes was sending a large-context request to MiniMax-M2.5-MLX-4bit through the oMLX API. During the request, oMLX went offline, stopped answering API calls, and the oMLX app showed “server stopped.”
The macOS crash report shows this was a native SIGABRT in the Metal/MLX backend, not a normal Python exception.
Environment
- oMLX version: 0.3.0
- App: oMLX.app
- Process: python3
- OS: macOS Tahoe 26.3.1
- Hardware: Mac Studio M3 Ultra, 256 GB unified memory
- Crash time: 2026-04-01 18:56:56 -0500
- Client: Hermes
- Model in use at crash time: MiniMax-M2.5-MLX-4bit
What I was doing
Hermes was processing a large-context request against the oMLX API using MiniMax-M2.5-MLX-4bit.
At the time of the crash:
- oMLX was actively serving inference
- Hermes was waiting on the API response
- oMLX suddenly went offline
- subsequent API calls failed
- the oMLX app reported that the server stopped
Expected behavior
oMLX should continue serving the request or fail gracefully with an error response, without terminating the server.
Actual behavior
The oMLX server stopped during inference and became unavailable to API clients. The process crashed rather than returning a handled error.
Crash details
The macOS crash report shows:
- Signal: SIGABRT (Abort trap: 6)
- Relevant stack frames:
  - __assert_rtn
  - MTLReportFailure
  - -[IOGPUMetalCommandBuffer encodeSignalEvent:value:]
  - -[AGXG15XFamilyCommandBuffer encodeSignalEvent:value:]
  - mlx::core::Event::signal(...)
  - mlx::core::eval_impl(...)
This appears to be an assertion failure in Apple Metal triggered during MLX event signaling / synchronization.
Reproduction
I do not yet have a minimal deterministic repro, but the observed scenario was:
- Start oMLX 0.3.0
- Serve MiniMax-M2.5-MLX-4bit over the API
- Send a large-context request from Hermes
- During inference, oMLX stops and the app reports server stopped
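To make the "large-context request" step concrete, here is a minimal sketch of the kind of payload Hermes was sending. The endpoint path and payload fields are assumptions based on a typical OpenAI-compatible chat API, not oMLX's documented interface; the filler prompt simply stands in for a genuinely large context.

```python
import json

def build_large_context_request(model: str, context_chars: int) -> bytes:
    """Build a chat-completion payload with a deliberately large prompt.
    Field names assume an OpenAI-compatible API (an assumption, not
    confirmed against oMLX's actual schema)."""
    payload = {
        "model": model,
        "messages": [
            # Filler text standing in for a real large context
            {"role": "user", "content": "x" * context_chars},
        ],
        "stream": False,
    }
    return json.dumps(payload).encode("utf-8")

body = build_large_context_request("MiniMax-M2.5-MLX-4bit", 200_000)
print(len(body) > 200_000)  # payload carries well over 200k chars of context
```

POSTing a body like this to the serving endpoint while the model is loaded reproduces the conditions under which the crash was observed.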
Frequency
Observed several times on earlier oMLX versions as well.
Why this seems important
This is not just a bad model response or API timeout. The server process itself terminated during inference, which makes oMLX unavailable to clients until manually restarted.
Possible area to investigate
The crash stack suggests a backend issue related to MLX/Metal synchronization during evaluation of a large request, possibly around:
- mlx::core::Event::signal(...)
- mlx::core::eval_impl(...)
- Metal command buffer event signaling via encodeSignalEvent:value:
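Since a native SIGABRT in Metal cannot be caught as a Python exception, one possible mitigation (a hedged sketch, not oMLX's actual architecture) is to run the MLX evaluation in a child process so a backend crash kills only that worker and the API server can return an error instead of dying. The worker below simulates the native abort with os.abort():

```python
import multiprocessing as mp
import os

def _inference_worker() -> None:
    """Stand-in for the MLX evaluation; simulates the native Metal abort."""
    os.abort()  # raises SIGABRT in the child, as the real crash does

def run_isolated(worker) -> str:
    """Run `worker` in a child process; a native crash only kills the child."""
    proc = mp.Process(target=worker)
    proc.start()
    proc.join()
    if proc.exitcode is not None and proc.exitcode < 0:
        # A negative exitcode means the child was killed by signal
        # abs(exitcode); 6 is SIGABRT.
        return f"error: backend crashed with signal {-proc.exitcode}"
    return "ok"

if __name__ == "__main__":
    print(run_isolated(_inference_worker))  # prints the error instead of dying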