Crashes during large-context inference; server stops with Metal/MLX SIGABRT #520

@zipzagster

Description

Bug: oMLX 0.3.0 server stops during large-context inference with MiniMax-M2.5-MLX-4bit, crashing in the Metal/MLX backend on macOS Tahoe 26.3.1

Summary

oMLX 0.3.0 crashed at 2026-04-01 18:56:56 -0500 while Hermes was sending a large-context request to MiniMax-M2.5-MLX-4bit through the oMLX API. During the request, oMLX went offline, stopped answering API calls, and the oMLX app showed “server stopped.”

The macOS crash report shows this was a native SIGABRT in the Metal/MLX backend, not a normal Python exception.

Environment

  • oMLX version: 0.3.0
  • App: oMLX.app
  • Process: python3
  • OS: macOS Tahoe 26.3.1
  • Hardware: Mac Studio M3 Ultra, 256 GB unified memory
  • Crash time: 2026-04-01 18:56:56 -0500
  • Client: Hermes
  • Model in use at crash time: MiniMax-M2.5-MLX-4bit

What I was doing

Hermes was processing a large-context request against the oMLX API using MiniMax-M2.5-MLX-4bit.

At the time of the crash:

  • oMLX was actively serving inference
  • Hermes was waiting on the API response
  • oMLX suddenly went offline
  • subsequent API calls failed
  • the oMLX app reported that the server stopped

Expected behavior

oMLX should continue serving the request or fail gracefully with an error response, without terminating the server.
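Because the failure is a native SIGABRT, a Python-level try/except inside the server cannot intercept it; the only reliable recovery point is a supervising parent process. A minimal sketch of that pattern (hypothetical, not oMLX's actual architecture; the child simulates the crash with `os.abort()`):

```python
import subprocess
import sys


def run_worker_once() -> int:
    """Run a (hypothetical) inference worker as a child process.

    A native abort in Metal/MLX kills the whole interpreter, so the
    crash is only observable from outside. Here the child simulates
    the SIGABRT with os.abort().
    """
    proc = subprocess.run([sys.executable, "-c", "import os; os.abort()"])
    return proc.returncode


def supervise(max_restarts: int = 3) -> int:
    """Restart the worker on abnormal exit, up to a limit."""
    restarts = 0
    while restarts < max_restarts:
        code = run_worker_once()
        if code == 0:
            break  # clean exit, nothing to do
        # On POSIX, a negative return code encodes the fatal signal number.
        print(f"worker died with signal {-code}, restarting")
        restarts += 1
    return restarts


if __name__ == "__main__":
    supervise()
```

This would not fix the underlying Metal assertion, but it would keep the API reachable instead of requiring a manual restart.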

Actual behavior

The oMLX server stopped during inference and became unavailable to API clients. The process crashed rather than returning a handled error.

Crash details

The macOS crash report shows:

  • Signal: SIGABRT / Abort trap: 6
  • Relevant stack frames:
    • __assert_rtn
    • MTLReportFailure
    • -[IOGPUMetalCommandBuffer encodeSignalEvent:value:]
    • -[AGXG15XFamilyCommandBuffer encodeSignalEvent:value:]
    • mlx::core::Event::signal(...)
    • mlx::core::eval_impl(...)

This appears to be an assertion failure in Apple Metal triggered during MLX event signaling / synchronization.

Reproduction

I do not yet have a minimal deterministic repro, but the observed scenario was:

  1. Start oMLX 0.3.0
  2. Serve MiniMax-M2.5-MLX-4bit over the API
  3. Send a large-context request from Hermes
  4. During inference, oMLX stops and the app reports server stopped

Frequency

Observed several times on previous oMLX versions as well.

Why this seems important

This is not just a bad model response or API timeout. The server process itself terminated during inference, which makes oMLX unavailable to clients until manually restarted.
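Until the crash is fixed, clients can at least detect the outage promptly. A small polling sketch (the URL is a placeholder; any endpoint that answers while the server is alive would do):

```python
import time
import urllib.error
import urllib.request


def is_server_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers at all (any HTTP status)."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused / reset / timed out


def watch(url: str, interval: float = 30.0) -> None:
    """Poll the server and report when it goes offline."""
    while True:
        if not is_server_up(url):
            print("oMLX appears to be down; restart it or alert an operator")
        time.sleep(interval)
```

This mirrors what Hermes observed: once the process aborts, every subsequent connection attempt fails outright.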

Possible area to investigate

The crash stack suggests a backend issue related to MLX/Metal synchronization during evaluation of a large request, possibly around:

  • mlx::core::Event::signal(...)
  • mlx::core::eval_impl(...)
  • Metal command buffer event signaling via encodeSignalEvent:value:

Metadata

Assignees

No one assigned

    Labels

needs time (not a quick fix - requires time to dig into)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests