
feat: GPU-accelerated embeddings, batch processing, and incremental update #351

Closed
phobicdotno wants to merge 11 commits into MemPalace:main from phobicdotno:main

Conversation

@phobicdotno

Summary

  • GPU acceleration: New embeddings.py module provides CUDA-aware embedding via sentence-transformers when available, with graceful fallback to ChromaDB's default ONNX model (CPU); see the sketch after this list
  • Batch processing: collection.add() calls batched (100 docs per call instead of 1), dramatically reducing overhead for large directories
  • Incremental update: New mempalace update command detects new/changed/deleted files via content hashing and syncs the palace without full re-mine
  • Device selection: --device auto|cuda|cpu CLI flag, MEMPALACE_DEVICE env var, and config.json device property
  • Zero new required dependencies: GPU support is an optional extra (pip install mempalace[gpu]), base install unchanged
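
To make the device handling concrete, here is a minimal sketch of the selection and fallback flow, assuming only what the bullets above describe. The helper names (`_resolve_device`, `get_embedding_function`) and the exact structure of embeddings.py are illustrative, not the PR's actual code:

```python
import os


def _resolve_device(preference: str = "auto") -> str:
    """Resolve 'auto' to 'cuda' when a usable GPU is present, else 'cpu'."""
    if preference in ("cuda", "cpu"):
        return preference
    try:
        import torch  # brought in transitively by the optional [gpu] extra
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def get_embedding_function(device: str | None = None):
    """Prefer sentence-transformers on the resolved device; return None so
    ChromaDB falls back to its default ONNX model (CPU)."""
    resolved = _resolve_device(device or os.environ.get("MEMPALACE_DEVICE", "auto"))
    try:
        import sentence_transformers  # noqa: F401  (optional [gpu] extra)
    except ImportError:
        return None  # ChromaDB substitutes its default ONNX embedder
    from chromadb.utils.embedding_functions import (
        SentenceTransformerEmbeddingFunction,
    )
    return SentenceTransformerEmbeddingFunction(
        model_name="all-MiniLM-L6-v2", device=resolved
    )
```

Returning None and letting ChromaDB supply its default embedder is what keeps the base install free of new required dependencies.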

Changes

New files

  • mempalace/embeddings.py — Shared embedding function factory with device detection, collection wrapper, and batch flush (batching sketched after this list)
  • tests/test_embeddings.py — 6 tests for embeddings module
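
For context on the collection wrapper and batch flush, here is what a 100-document buffer in front of collection.add() could look like. The `BatchedCollection` name and its details are assumptions, not the module's real API:

```python
BATCH_SIZE = 100


class BatchedCollection:
    """Buffer documents and flush them to ChromaDB in batches."""

    def __init__(self, collection):
        self._collection = collection
        self._ids, self._docs, self._metas = [], [], []

    def add(self, id_: str, document: str, metadata: dict) -> None:
        """Queue one document; flush automatically at BATCH_SIZE."""
        self._ids.append(id_)
        self._docs.append(document)
        self._metas.append(metadata)
        if len(self._ids) >= BATCH_SIZE:
            self.flush_batch()

    def flush_batch(self) -> int:
        """Write all buffered documents in one collection.add() call
        and return how many were flushed."""
        count = len(self._ids)
        if count:
            self._collection.add(
                ids=self._ids, documents=self._docs, metadatas=self._metas
            )
            self._ids, self._docs, self._metas = [], [], []
        return count
```

One add() per hundred documents amortizes ChromaDB's per-call overhead, which is where the speedup for large directories comes from.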

Modified files

  • mempalace/miner.py — Batch processing in mine(), content hashing, new update() function (change detection sketched after this list)
  • mempalace/convo_miner.py — Batch processing in mine_convos()
  • mempalace/config.py — device property (auto/cuda/cpu)
  • mempalace/cli.py — --device flag, update subcommand
  • mempalace/searcher.py — Shared embedding function for vector compatibility
  • mempalace/mcp_server.py — Shared embedding function
  • mempalace/layers.py — Shared embedding function (5 sites)
  • mempalace/palace_graph.py — Shared embedding function
  • pyproject.toml — gpu optional dependency group
  • tests/test_config.py — Device config tests
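
In outline, the change detection behind mempalace update is a content-hash diff. This sketch assumes a stored {path: hash} manifest from the previous mine/update; the helper names are hypothetical:

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """Hash a file's bytes so any content change is detectable."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_files(root: Path, previous: dict[str, str]):
    """Return (new, changed, deleted) paths relative to the stored
    {path: hash} manifest. Hashes every file under root for the sketch;
    the real code may filter by extension or ignore rules."""
    current = {
        str(p): content_hash(p) for p in root.rglob("*") if p.is_file()
    }
    new = current.keys() - previous.keys()
    deleted = previous.keys() - current.keys()
    changed = {
        p for p in current.keys() & previous.keys()
        if current[p] != previous[p]
    }
    return new, changed, deleted
```

update() can then add drawers for new files, re-embed changed ones, and remove drawers for deleted ones, with no full re-mine.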

Architecture notes

  • All ChromaDB collection access goes through embeddings.get_collection() to ensure embedding vector compatibility across mine/search/MCP
  • sentence-transformers all-MiniLM-L6-v2 produces identical vectors to ChromaDB's default ONNX model — existing palaces remain compatible (a quick sanity check follows this list)
  • Follows project principles: local-first, zero API by default, verbatim storage
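
The vector-compatibility claim is easy to check empirically: embed the same text with both backends and compare. This snippet is not part of the PR and assumes both chromadb and sentence-transformers are installed:

```python
import numpy as np
from chromadb.utils.embedding_functions import (
    DefaultEmbeddingFunction,
    SentenceTransformerEmbeddingFunction,
)

text = ["the quick brown fox"]
# ChromaDB's default ONNX MiniLM embedder vs. the sentence-transformers model
onnx_vec = np.asarray(DefaultEmbeddingFunction()(text)[0])
st_vec = np.asarray(
    SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")(text)[0]
)
cos = onnx_vec @ st_vec / (np.linalg.norm(onnx_vec) * np.linalg.norm(st_vec))
print(f"cosine similarity: {cos:.6f}")  # ~1.0 if the models really match
```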

Test plan

  • All 19 tests pass (pytest tests/ -v)
  • Ruff format and lint clean
  • mempalace mine <dir> --device cuda uses GPU
  • mempalace mine <dir> --device cpu falls back to batched CPU
  • mempalace update <dir> detects new/changed/deleted files
  • mempalace search works with GPU-embedded vectors
  • Uninstall sentence-transformers → graceful fallback to ONNX default
  • No new required dependencies — base install unchanged

Follow-up fixes

  • Run ruff format across mempalace/ and tests/
  • Fix multi-imports in test_config.py (split to separate lines)
  • Fix unused variable in test_embeddings.py (add tautological assert)
  • Add docstrings to all public functions in embeddings.py
  • Use flush_batch() return value for total_drawers count in mine()
  • Extract room from drawer metadata instead of calling detect_room() twice
  • Skip collection creation during dry-run in update()
  • Remove dead add_drawer() function from miner.py
  • Cache the resolved device instead of the preference string in embeddings.py
