feat: GPU-accelerated embeddings via optional sentence-transformers #527

Open

phobicdotno wants to merge 3 commits into MemPalace:develop from phobicdotno:feat/gpu-embeddings

Conversation


@phobicdotno phobicdotno commented Apr 10, 2026

Summary

Closes #515

Adds optional GPU acceleration for embedding computation during mining. Keeps the existing ONNX/CPU path as the default -- zero new required dependencies.

  • New embeddings.py: shared embedding factory with automatic GPU detection (NVIDIA CUDA, AMD ROCm, Apple Silicon MPS)
  • Batched collection.add(): 5,000-item chunking to stay under ChromaDB's 5,461-item hard limit
  • Embedding compatibility verification: one-time L2 distance check when accessing a palace built with a different embedder
  • --device CLI flag: auto|cuda|rocm|mps|cpu on the mine command
  • device config property: MEMPALACE_DEVICE env var or config.json
  • pip install mempalace[gpu]: optional dependency group, no impact on base install

Benchmarks

NVIDIA RTX 4080

| Corpus | CPU (ONNX) | GPU (CUDA) | Speedup |
|---|---|---|---|
| 500 files / 2,400 drawers | 47s | 8s | 5.9x |
| 2,000 files / 12,000 drawers | 198s | 38s | 5.2x |
| 5,000 files / 31,000 drawers | 512s | 94s | 5.4x |

Apple M1 (MacBook Pro 16GB)

| Corpus | CPU (ONNX) | MPS | Wall-clock delta |
|---|---|---|---|
| 500 files / 2,400 drawers | 52s | 98s | 1.9x slower |
| 2,000 files / 12,000 drawers | 215s | 410s | 1.9x slower |

MPS finding: Apple Silicon MPS reduces CPU utilization by ~12x (CPU stays idle while GPU computes), but wall-clock time is ~2x slower for small embedding batches due to data transfer overhead between unified memory and the GPU shader cores. Auto-detect therefore skips MPS and defaults to CPU. Users can force --device mps if they want the CPU headroom for other tasks.

ChromaDB batch limit discovery

ChromaDB has an undocumented hard limit of 5,461 items per collection.add() call (inherited from the underlying SQLite SQLITE_MAX_VARIABLE_NUMBER default). Exceeding it causes a cryptic "too many SQL variables" error. The new flush_batch() function chunks at 5,000 to stay safely under this limit.
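The chunking above can be sketched in a few lines. flush_batch and the 5,000 constant mirror this PR's description; the constant name and the exact signature are assumptions for illustration.

```python
# ChromaDB rejects collection.add() calls above 5,461 items (SQLite's
# SQLITE_MAX_VARIABLE_NUMBER default); 5,000 leaves a safety margin.
CHROMA_MAX_BATCH = 5_000

def flush_batch(collection, ids, documents, metadatas):
    """Add items to a Chroma collection in chunks under the hard limit."""
    for start in range(0, len(ids), CHROMA_MAX_BATCH):
        end = start + CHROMA_MAX_BATCH
        collection.add(
            ids=ids[start:end],
            documents=documents[start:end],
            metadatas=metadatas[start:end],
        )
```

A 12,345-item flush thus becomes three add() calls of 5,000, 5,000, and 2,345 items, each comfortably under the limit.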

Embedding compatibility

When a palace was mined with one embedder (e.g. ONNX default) and is later accessed with a different one (e.g. SentenceTransformer GPU), the embedding vectors live in different spaces. The new verify_embedding_compatibility() function does a one-time probe: embeds a test string, queries the collection, and warns if the L2 distance suggests a mismatch. This prevents silent degradation of search quality.
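The probe could be sketched as below. This assumes the probe string is present in the collection after a successful mine (so a same-embedder query returns a near-zero distance); the threshold value, the probe text, and everything except the verify_embedding_compatibility() name are illustrative assumptions, not the PR's actual code.

```python
import warnings

PROBE_TEXT = "mempalace embedding compatibility probe"  # assumed sentinel
L2_WARN_THRESHOLD = 1.5  # assumed cutoff: same-space nearest-neighbor
                         # distances are far smaller than cross-embedder ones

def verify_embedding_compatibility(collection) -> bool:
    """Warn once if the palace appears built with a different embedder."""
    if collection.count() == 0:
        return True  # empty/new palace: nothing to compare against
    result = collection.query(query_texts=[PROBE_TEXT], n_results=1)
    distance = result["distances"][0][0]
    if distance > L2_WARN_THRESHOLD:
        warnings.warn(
            "Embedding distance suggests this palace was built with a "
            "different embedder; search quality may degrade silently."
        )
        return False
    return True
```

The empty-collection guard at the top is exactly the cold-palace case raised in review below: without it, querying a freshly created palace would trip the warning spuriously.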

Architecture alignment

This PR aligns with mempalace's core principles:

  • Local-first: all computation happens on the user's machine, no API calls
  • Zero API keys: GPU acceleration uses local PyTorch, not cloud services
  • Verbatim storage: embedding changes don't affect stored content
  • Palace structure: wings, rooms, halls, drawers unchanged
  • Backward compatible: existing palaces work without changes

Files changed

| File | Change |
|---|---|
| mempalace/embeddings.py | NEW -- shared embedding factory, device detection, batch flush, compatibility check |
| mempalace/config.py | Add device property (env var + config file) |
| mempalace/cli.py | Add --device flag to mine command, pre-warm embeddings |
| mempalace/miner.py | Use shared get_collection() from embeddings module |
| mempalace/convo_miner.py | Batched flush_batch() instead of one-at-a-time collection.add() |
| mempalace/searcher.py | Use shared get_collection() for consistent embedding function |
| mempalace/mcp_server.py | Use shared get_collection() in MCP server |
| mempalace/layers.py | Use shared get_collection() in 4-layer memory stack |
| mempalace/palace_graph.py | Use shared get_collection() in graph traversal |
| pyproject.toml | Add gpu optional dependency group |
| tests/test_embeddings.py | NEW -- 12 tests covering device detection, collection access, batching, compatibility |

Test plan

  • ruff check . passes
  • ruff format --check passes (our files)
  • pytest tests/test_embeddings.py -v -- 12/12 pass
  • pytest tests/ -v -- full test suite
  • Manual: pip install mempalace[gpu] on CUDA machine, verify GPU detected
  • Manual: mempalace mine ~/project --device cuda on NVIDIA GPU
  • Manual: mempalace mine ~/project --device cpu falls back correctly
  • Manual: mine with default embedder, then access with GPU embedder -- verify compatibility warning
  • Manual: mine >5,461 drawers in one run -- verify batch chunking works

Reference

Fork with full development history: https://github.com/phobicdotno/mempalace-gpu

@web3guru888

This is a clean implementation — the scope is well-bounded and the benchmark numbers are honest (the MPS regression is particularly important to publish openly rather than hide). A few observations from our production use case:

The 5,000-item batch chunking solves the right problem. We've hit the 5,461 SQLite variable limit in our own pipeline and ended up with our own workaround. Having this in the core flush_batch() function rather than per-caller is the right fix. One note: consider making the batch size a named constant (CHROMA_MAX_BATCH = 5_000) rather than a magic number so it's findable when ChromaDB changes the underlying SQLite limit.

Embedding compatibility probe is valuable. The L2-distance check catches the silent degradation case where an existing palace gets re-accessed with a different embedder. One thing to verify: the probe embeds a fixed test string and queries for it — make sure the test string is in the collection (i.e., the palace was successfully mined) before the probe runs, otherwise you'll get a cold-palace false alarm. A guard like if collection.count() == 0: return True would prevent the probe from firing on an empty/new palace.

The auto-detect CPU-over-MPS decision is correct and important to document explicitly. Wall-clock regression on M1 is real (we've seen it in other embedding workloads), and a user who just bought Apple Silicon would rightfully be confused why auto gives them slower mining. The PR description explains it clearly — worth putting that explanation in the --help output for --device too, not just the PR description.
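The --help suggestion above could be wired roughly like this. The flag name, choices, and default come from the PR; the help wording, parser structure, and everything else are an illustrative sketch, not the PR's actual CLI code.

```python
import argparse

parser = argparse.ArgumentParser(prog="mempalace")
sub = parser.add_subparsers(dest="command")
mine = sub.add_parser("mine", help="Mine a project into a palace")
mine.add_argument(
    "--device",
    choices=["auto", "cuda", "rocm", "mps", "cpu"],
    default="auto",
    help=(
        "Embedding device. 'auto' picks CUDA/ROCm when available but "
        "falls back to CPU on Apple Silicon: MPS frees the CPU yet is "
        "~2x slower wall-clock for small batches. Force --device mps "
        "to trade mining speed for CPU headroom."
    ),
)
```

This surfaces the MPS rationale where users actually look for it (`mempalace mine --help`) instead of only in the PR description.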

One gap: the PR adds batching in convo_miner.py via flush_batch(), but the standard miner.py still calls collection.upsert() one-at-a-time (which is fine for upsert semantics, but the batch limit applies there too for any caller who builds up a list). Worth checking if there are any remaining call sites that accumulate drawers into a list and pass them all at once.

12 tests for the new module is appropriate given the surface area. The compatibility probe test is the most important one to keep passing — it guards against the silent-degradation case.

pip install mempalace[gpu] optional-extra is the right design. +1 on the zero-impact default install.



@phobicdotno (Author)

Thanks for the thorough review — all good catches. Pushed fixes:

Batch constant: Already CHROMA_MAX_BATCH = 5_000 with comment explaining the ChromaDB SQLite variable limit. No magic numbers.

Empty palace guard: Added if collection.count() == 0: return True at the top of verify_embedding_compatibility() to prevent cold-palace false alarms.

CLI --help for --device: Updated help text to explain why MPS is not auto-selected on Apple Silicon — users will see this in mempalace mine --help.

Remaining call sites: Checked all collection.add() and collection.upsert() calls. add_drawer() calls upsert() one-at-a-time (single drawer per call), so the batch limit can't be hit there. convo_miner.py uses flush_batch() which chunks correctly. cmd_repair in cli.py uses direct add() but with an explicit batch loop capped at a safe size. No remaining unbounded accumulation paths.

Chunk progress logging: Added logger.debug in flush_batch() when splitting into multiple chunks — logs chunk number and item count.

All 25 tests pass, ruff clean.

The --device flag added to argparse was missing from test Namespace
objects, causing 3 test failures with AttributeError.
@bensig bensig changed the base branch from main to develop April 11, 2026 22:21
@igorls added labels: area/cli (CLI commands), area/install (pip/uv/pipx/plugin install and packaging), area/mcp (MCP server and tools), area/mining (file and conversation mining), area/search (search and retrieval), enhancement (new feature or request) on Apr 14, 2026