What happened?
On a multi-core host, every mempalace mine run pins 4–5 cores (peak sustained 372–463 % in ps). Stacked Stop-hook fires from concurrent Claude Code sessions make the machine unusable.
The cost isn't in mempalace's Python — miner.py and convo_miner.py are sequential for loops (no multiprocessing, no torch, no manual hnswlib calls). The CPU sits inside a single collection.upsert(...).
ChromaDB's default ONNXMiniLM_L6_V2 builds its ONNX Runtime InferenceSession with no SessionOptions, so ORT's intra_op pool defaults to ≈physical-core-count workers. OMP_NUM_THREADS / MKL_NUM_THREADS / OPENBLAS_NUM_THREADS are inert against this pool — ORT owns its own. Same class of silent no-op as ORT_DISABLE_COREML (#397 / #653).
What did you expect?
A user-settable cap on the threads a background mine is allowed to use, with a safe default so a fresh install on a 10-core host doesn't turn every Stop-hook fire into a thermal event.
How to reproduce:
1. mempalace 3.3.0 (any 3.x with chromadb 1.5.x default embedder) on a multi-core host.
2. Mine ≥500 JSONL transcripts (~1360 drawers):
   nice -n 10 ~/.local/bin/mempalace mine <project-dir>
3. Sample CPU + threads:
   ps -o pid,%cpu,command -p "$(pgrep -f 'mempalace mine')"
   ps -M "$(pgrep -f 'mempalace mine')" # count onnxruntime::ThreadPo workers
4. Setting OMP_NUM_THREADS=2 MKL_NUM_THREADS=2 OPENBLAS_NUM_THREADS=2 before step 2 changes nothing.
Benchmarks (M-series Mac, 10 cores, 500 files / 1360 drawers):
| Run | Peak %CPU | Hot ORT workers | Wall |
| --- | --- | --- | --- |
| Uncapped (stock) | 372–463 | 8 | 54 s |
| Isolated upsert() + SessionOptions(intra_op=2, inter_op=1) | 62.9 | 1 | 47.5 s |
| Full mine with cap, fresh palace | 188 | 1 | 107 s |
| Full mine with cap, live Stop-hook fire, real palace | 201 (sustained 197–201 / ~12 s) | 1 | n/a |
Proposed fix:
Build the session with explicit SessionOptions (intra_op_num_threads=cap, inter_op_num_threads=1) and a pinned CPUExecutionProvider. Controlled by an env var (default 2; sentinel value to disable). Also set hnsw:num_threads=<cap> on collection create — covers #974's hnswlib side.
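A minimal sketch of how the cap could be resolved and applied, under stated assumptions: the variable name MEMPALACE_ORT_THREADS, the sentinel "0", and the helper names are all hypothetical, since the issue names none of them. Only the SessionOptions fields, the CPUExecutionProvider name, and the hnsw:num_threads metadata key come from the proposal itself.

```python
import os

# Hypothetical names: the issue does not specify the variable or the sentinel.
ENV_VAR = "MEMPALACE_ORT_THREADS"
DEFAULT_CAP = 2
DISABLE_SENTINEL = "0"


def resolve_thread_cap(environ=os.environ):
    """Return the thread cap as an int, or None when capping is disabled."""
    raw = environ.get(ENV_VAR, "").strip()
    if not raw:
        return DEFAULT_CAP
    if raw == DISABLE_SENTINEL:
        return None
    try:
        cap = int(raw)
    except ValueError:
        # Unparseable values fall back to the safe default.
        return DEFAULT_CAP
    return cap if cap > 0 else DEFAULT_CAP


def hnsw_metadata(cap):
    """Collection metadata carrying the hnswlib thread cap, or None if uncapped."""
    return None if cap is None else {"hnsw:num_threads": cap}


def build_session(model_path, cap):
    """Create an ORT session pinned to the CPU provider, capped when cap is set."""
    import onnxruntime as ort  # lazy import keeps the helpers above stdlib-only

    options = ort.SessionOptions()
    if cap is not None:
        options.intra_op_num_threads = cap
        options.inter_op_num_threads = 1
    return ort.InferenceSession(
        model_path, sess_options=options, providers=["CPUExecutionProvider"]
    )
```

With these helpers, a caller would pass hnsw_metadata(resolve_thread_cap()) as the metadata argument on collection create and use build_session(...) wherever the embedder constructs its session.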
Correct override point: the @cached_property model on ONNXMiniLM_L6_V2. Overriding _init_model_and_tokenizer is silently a no-op, because no such method exists on 3.3.0 and nothing ever calls it.
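The difference between the two override points can be illustrated with a stand-in class. This is a sketch of the Python mechanics only: StandInEmbedder, WrongHook, and CappedEmbedder are hypothetical, and the dict stands in for a real InferenceSession (0 mirrors ORT's "let the runtime decide" default for intra_op_num_threads).

```python
from functools import cached_property


class StandInEmbedder:
    """Stand-in for an embedder that exposes its session as a cached property."""

    @cached_property
    def model(self):
        return {"intra_op_num_threads": 0}  # 0 stands for "library default"

    def embed(self, texts):
        # The session is only ever reached through self.model.
        return [(text, self.model["intra_op_num_threads"]) for text in texts]


class WrongHook(StandInEmbedder):
    # Nothing in the base class calls this name, so defining it has no effect
    # and raises no error.
    def _init_model_and_tokenizer(self):
        self.model = {"intra_op_num_threads": 2}


class CappedEmbedder(StandInEmbedder):
    # Overriding the property itself changes the session that embed() sees.
    @cached_property
    def model(self):
        return {"intra_op_num_threads": 2}
```

WrongHook().embed(...) still runs with the default session, while CappedEmbedder().embed(...) picks up the capped one, which is the failure mode and the fix described above.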
PR to follow.
Related:
- hnsw:num_threads on create. Same fix on the hnswlib side.
Environment:
- OS: macOS 15.x (Darwin 25.4.0), Apple Silicon, 10 cores
- Python: 3.12
- MemPal version: 3.3.0 (develop HEAD 1b00f93)
- ChromaDB: 1.5.8 (reproducible 1.5.4+)
- ONNX Runtime: 1.24.4