Background mempalace mine pins 400–500 % CPU — ORT intra_op pool ignores OMP env vars #1068

@sha2fiddy

What happened?

On a multi-core host, every mempalace mine run pins 4–5 cores (peak sustained 372–463 % in ps). Stacked Stop-hook fires from concurrent Claude Code sessions make the machine unusable.

The cost isn't in mempalace's Python — miner.py and convo_miner.py are sequential for loops (no multiprocessing, no torch, no manual hnswlib calls). The CPU sits inside a single collection.upsert(...).

ChromaDB's default ONNXMiniLM_L6_V2 builds its ONNX Runtime InferenceSession with no SessionOptions, so ORT's intra_op pool defaults to ≈physical-core-count workers. OMP_NUM_THREADS / MKL_NUM_THREADS / OPENBLAS_NUM_THREADS are inert against this pool — ORT owns its own. Same class of silent no-op as ORT_DISABLE_COREML (#397 / #653).

What did you expect?

A user-settable cap on the threads a background mine is allowed to use, with a safe default so a fresh install on a 10-core host doesn't turn every Stop-hook fire into a thermal event.

How to reproduce:

  1. mempalace 3.3.0 (any 3.x with chromadb 1.5.x default embedder) on a multi-core host.
  2. Mine ≥500 JSONL transcripts (~1360 drawers):
    nice -n 10 ~/.local/bin/mempalace mine <project-dir>
  3. Sample CPU + threads:
    ps -o pid,%cpu,command -p "$(pgrep -f 'mempalace mine')"
    ps -M "$(pgrep -f 'mempalace mine')"   # count onnxruntime::ThreadPo workers
  4. Setting OMP_NUM_THREADS=2 MKL_NUM_THREADS=2 OPENBLAS_NUM_THREADS=2 before step 2 changes nothing.
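Step 3 can be scripted so the peak is captured over a whole run rather than read off by eye. The following is a minimal sketch, not part of mempalace: the function names are illustrative, and it assumes `pgrep -f` and `ps -o %cpu= -p <pid,...>` behave on your platform as they do on stock macOS and Linux.

```python
import subprocess
import time


def parse_cpu(ps_output: str) -> float:
    """Sum the %CPU values printed by `ps -o %cpu=` (one value per process)."""
    return sum(float(tok) for tok in ps_output.split())


def sample_peak_cpu(pattern: str = "mempalace mine",
                    interval: float = 1.0,
                    samples: int = 60) -> float:
    """Poll ps for processes matching `pattern`; return the peak summed %CPU."""
    peak = 0.0
    for _ in range(samples):
        # pgrep prints one PID per line; empty output means nothing is running.
        pids = subprocess.run(["pgrep", "-f", pattern],
                              capture_output=True, text=True).stdout.split()
        if pids:
            out = subprocess.run(["ps", "-o", "%cpu=", "-p", ",".join(pids)],
                                 capture_output=True, text=True).stdout
            peak = max(peak, parse_cpu(out))
        time.sleep(interval)
    return peak
```

Start the mine in one terminal and call `sample_peak_cpu()` in another. Running it once with and once without the OMP/MKL/OPENBLAS variables should show the same peak if the variables are inert, as described in step 4.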

Benchmarks (M-series Mac, 10 cores, 500 files / 1360 drawers):

| Run | Peak %CPU | Hot ORT workers | Wall |
|---|---|---|---|
| Uncapped (stock) | 372–463 | 8 | 54 s |
| Isolated upsert() + SessionOptions(intra_op=2, inter_op=1) | 62.9 | 1 | 47.5 s |
| Full mine with cap, fresh palace | 188 | 1 | 107 s |
| Full mine with cap, live Stop-hook fire, real palace | 201 (sustained 197–201 / ~12 s) | 1 | |

Proposed fix:

Build the session with explicit SessionOptions (intra_op_num_threads=cap, inter_op_num_threads=1) and a pinned CPUExecutionProvider. Controlled by an env var (default 2; sentinel value to disable). Also set hnsw:num_threads=<cap> on collection create — covers #974's hnswlib side.
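A rough sketch of what the proposal could look like, ahead of the actual PR. The env var name `MEMPALACE_MINE_THREADS`, the sentinel `"0"`, and all three function names are hypothetical placeholders chosen for illustration; only the `onnxruntime` calls (`SessionOptions`, `intra_op_num_threads`, `inter_op_num_threads`, `InferenceSession(..., providers=[...])`) are real API. The `hnsw:num_threads` metadata key is taken from the proposal text and should be checked against the installed chromadb version.

```python
import os

DEFAULT_CAP = 2                      # default proposed in this issue
ENV_VAR = "MEMPALACE_MINE_THREADS"   # hypothetical name, not an existing setting
DISABLE_SENTINEL = "0"               # hypothetical sentinel meaning "no cap"


def resolve_thread_cap(environ=os.environ):
    """Return the thread cap as an int, or None when capping is disabled."""
    raw = environ.get(ENV_VAR, "").strip()
    if not raw:
        return DEFAULT_CAP
    if raw == DISABLE_SENTINEL:
        return None
    try:
        cap = int(raw)
    except ValueError:
        return DEFAULT_CAP           # unparseable value: fall back to the default
    return cap if cap > 0 else DEFAULT_CAP


def build_capped_session(model_path, cap):
    """Build an ORT session whose intra-op pool is limited to `cap` threads."""
    import onnxruntime as ort        # imported lazily; only needed here

    opts = ort.SessionOptions()
    if cap is not None:
        opts.intra_op_num_threads = cap
        opts.inter_op_num_threads = 1
    return ort.InferenceSession(model_path, sess_options=opts,
                                providers=["CPUExecutionProvider"])


def hnsw_metadata(cap):
    """Collection metadata carrying the same cap for the hnswlib side."""
    return {} if cap is None else {"hnsw:num_threads": cap}
```

With this shape, an unset variable yields the safe default of 2, and the same resolved value feeds both the ORT session and the collection metadata, so the two thread pools cannot drift apart.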

Correct override point: the @cached_property model on ONNXMiniLM_L6_V2. Overriding _init_model_and_tokenizer is silently a no-op (no such method on 3.3.0), so a subclass that patches it never changes the session that gets built.
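To illustrate why the choice of override point matters, here is a toy reproduction of the mechanism. `StandInEmbedder` is a stand-in written for this example, not chromadb's real class; it only assumes, as stated above, that the session is exposed through a `@cached_property` named `model` and that nothing in the base class calls `_init_model_and_tokenizer`.

```python
from functools import cached_property


class StandInEmbedder:
    """Stand-in for the embedder; NOT chromadb's actual implementation."""

    @cached_property
    def model(self):
        return "uncapped-session"

    def embed(self):
        # The base class only ever reads self.model.
        return self.model


class WrongOverride(StandInEmbedder):
    # Nothing in the base class calls this method, so it never runs.
    def _init_model_and_tokenizer(self):
        self.model = "capped-session"


class RightOverride(StandInEmbedder):
    # Replacing the cached_property itself changes what embed() sees.
    @cached_property
    def model(self):
        return "capped-session"
```

`WrongOverride().embed()` still returns the uncapped value without raising or warning, which is the silent no-op; `RightOverride().embed()` returns the capped one.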

PR to follow.

Related:

Environment:

  • OS: macOS 15.x (Darwin 25.4.0), Apple Silicon, 10 cores
  • Python: 3.12
  • MemPal version: 3.3.0 (develop HEAD 1b00f93)
  • ChromaDB: 1.5.8 (reproducible 1.5.4+)
  • ONNX Runtime: 1.24.4
