mempalace-gpu

GPU-accelerated fork of milla-jovovich/mempalace

This fork adds GPU-accelerated embeddings and batch processing to MemPalace. Supports NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (MPS). For documentation on MemPalace itself (palace structure, AAAK dialect, MCP tools, benchmarks), see the upstream README.


What this fork adds

GPU-accelerated embeddings

Embeddings are computed via sentence-transformers on GPU when available, falling back to ChromaDB's default CPU/ONNX model when not.

mempalace mine ~/myproject --device auto    # auto-detect best GPU
mempalace mine ~/myproject --device cuda    # NVIDIA
mempalace mine ~/myproject --device rocm    # AMD
mempalace mine ~/myproject --device mps     # Apple Silicon (M1-M5)
mempalace mine ~/myproject --device cpu     # force CPU

Also configurable via MEMPALACE_DEVICE env var or "device" in ~/.mempalace/config.json.
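
For reference, auto-detection follows the usual PyTorch pattern; the sketch below is illustrative (the function name detect_device is made up here), not the exact code in mempalace/embeddings.py:

import torch  # only needed when a GPU device is requested

def detect_device(requested: str = "auto") -> str:
    # Explicit choices pass through; ROCm builds of PyTorch expose AMD GPUs
    # under the "cuda" device name.
    if requested != "auto":
        return "cuda" if requested == "rocm" else requested
    if torch.cuda.is_available():  # NVIDIA CUDA or AMD ROCm build
        return "cuda"
    # Auto-detection deliberately prefers CPU over MPS on Apple Silicon
    # (see the Apple Silicon benchmarks below); pass --device mps to opt in.
    return "cpu"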

Batch processing

collection.add() calls are batched (100 documents per call instead of 1), reducing ChromaDB overhead regardless of CPU or GPU mode.
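
As a minimal sketch of the batching pattern, assuming a ChromaDB collection (flush_batch and the drawer dict keys are illustrative, not the fork's actual helpers):

BATCH_SIZE = 100  # drawers per collection.add() call

def flush_batch(collection, batch):
    # One add() for the whole batch instead of one call per drawer.
    if not batch:
        return
    collection.add(
        ids=[d["id"] for d in batch],
        documents=[d["text"] for d in batch],
        metadatas=[d["meta"] for d in batch],
    )
    batch.clear()

def add_drawers(collection, drawers):
    batch = []
    for drawer in drawers:
        batch.append(drawer)
        if len(batch) >= BATCH_SIZE:
            flush_batch(collection, batch)
    flush_batch(collection, batch)  # flush the final partial batch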

Self-update MCP tool

The MCP server includes a mempalace_self_update tool that pulls the latest version from PyPI, callable directly from your AI assistant.
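
Conceptually, the tool shells out to pip inside the server's own environment; a hedged sketch of that idea, not the exact implementation:

import subprocess
import sys

def self_update() -> str:
    # Upgrade mempalace-gpu from PyPI in the running interpreter's environment.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", "--upgrade", "mempalace-gpu"],
        capture_output=True,
        text=True,
    )
    return result.stdout if result.returncode == 0 else result.stderr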


Performance

Tested on two real-world codebases. GPU: NVIDIA GeForce RTX 4080 SUPER. Same files, same drawers — only the device changes.

| Test | Files | Drawers | Size | CPU | RTX 4080 SUPER | Speedup |
|---|---|---|---|---|---|---|
| Large mixed codebase (JS/TS/Dart/Python/HTML) | 118 | 13,673 | ~1.7 GB | 156.7s | 26.3s | 6.0x |
| Medium Flutter app (Dart/YAML/JSON) | 145 | 2,906 | ~85 MB | 37.3s | 10.7s | 3.5x |

Speedup scales with drawer count: more chunks means more embedding work and a bigger GPU advantage. Results vary by GPU; expect similar gains on any modern NVIDIA or AMD card (see the Apple Silicon section below for MPS caveats).

Apple Silicon (M1)

Tested on a MacBook with an M1 chip. Key finding: for mining, CPU generally outperforms MPS due to data-transfer overhead with small embedding batches.

| Test | Files | Drawers | MPS (GPU) | CPU | Winner |
|---|---|---|---|---|---|
| ~/Documents (mixed files) | 500 | 1,239 | 5:48 | 6:09 | MPS 1.06x |
| ~/phobic (mixed files) | 500 | 8,886 | 17:16 | 8:28 | CPU 2.0x |

MPS uses 12x less CPU (5-6% vs 60-69%), freeing the processor for other work — but wall-clock time is worse. The fork defaults to CPU on Apple Silicon for this reason. Use --device mps to override.

Full results: benchmarks/apple_m1_results.md


Installation

pip install mempalace-gpu
claude mcp add mempalace-gpu -- python -m mempalace.mcp_server

Restart Claude Code — mempalace-gpu appears in /plugin with all tools. Works on NVIDIA, AMD, and Apple Silicon — GPU is auto-detected.

AMD (ROCm) note

AMD GPUs need the ROCm version of PyTorch installed first:

pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
pip install mempalace-gpu

Your data is safe

Installing or upgrading mempalace-gpu only replaces the Python code. Your mined data lives in ~/.mempalace/palace/ (ChromaDB files) and is never touched. Existing palaces remain fully compatible.

Development install

git clone https://github.com/phobicdotno/mempalace-gpu.git
cd mempalace-gpu
pip install -e .

Staying in sync with upstream

git remote add upstream https://github.com/milla-jovovich/mempalace.git
git fetch upstream
git merge upstream/main

Remote GPU server

Run your palace on a GPU machine and access it from any device over the network.

Server

pip install mempalace-gpu[serve]
mempalace serve --port 8420 --token <your-token> --device cuda

Client (MCP proxy)

On your local machine, add a proxy that forwards MCP calls to the remote server:

claude mcp add mempalace-remote \
  -e MEMPALACE_REMOTE_URL=http://<gpu-host>:8420 \
  -e MEMPALACE_TOKEN=<your-token> \
  -- python -m mempalace.mcp_proxy

The proxy speaks MCP stdio to Claude Code and HTTP to the server. All tool calls are forwarded transparently.

Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /health | No | Server status |
| GET | /tools | Yes | List available tools |
| POST | /tool/{name} | Yes | Call a tool with JSON body |
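
For example, calling the server from Python might look like the snippet below; the Authorization: Bearer header format is an assumption based on the --token flag, so check the server code for the exact auth scheme:

import requests

BASE = "http://<gpu-host>:8420"
HEADERS = {"Authorization": "Bearer <your-token>"}  # assumed header format

print(requests.get(f"{BASE}/health").json())                  # no auth required
print(requests.get(f"{BASE}/tools", headers=HEADERS).json())  # list available tools

# Call a tool with a JSON body (arguments depend on the tool)
resp = requests.post(f"{BASE}/tool/mempalace_self_update", headers=HEADERS, json={})
print(resp.json())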

Changes from upstream

| File | Change |
|---|---|
| mempalace/embeddings.py | New -- GPU detection (NVIDIA/AMD/Apple), embedding factory, batch flush |
| mempalace/miner.py | Batched collection.add(), content hashing, update() command |
| mempalace/convo_miner.py | Batched collection.add() |
| mempalace/config.py | device property (auto/cuda/rocm/mps/cpu) |
| mempalace/cli.py | --device flag, update subcommand |
| mempalace/mcp_server.py | mempalace_self_update tool, shared embeddings |
| mempalace/searcher.py | Shared embedding function for vector compatibility |
| mempalace/layers.py | Shared embedding function |
| mempalace/palace_graph.py | Shared embedding function |
| mempalace/http_server.py | New -- FastAPI HTTP server for remote GPU access |
| mempalace/mcp_proxy.py | New -- MCP-to-HTTP proxy for remote palace access |
| pyproject.toml | gpu optional dependency group |

All other files are unmodified from upstream. Existing palaces remain compatible.


License

MIT -- same as upstream.
