GPU-accelerated fork of milla-jovovich/mempalace
This fork adds GPU-accelerated embeddings and batch processing to MemPalace. Supports NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (MPS). For documentation on MemPalace itself (palace structure, AAAK dialect, MCP tools, benchmarks), see the upstream README.
Embeddings are computed via sentence-transformers on GPU when available, falling back to ChromaDB's default CPU/ONNX model when not.
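The GPU-or-CPU fallback described above can be sketched roughly as follows. This is an illustrative sketch, not the fork's actual code: the function name and detection order are assumptions; only the `torch` calls are real APIs (ROCm builds of PyTorch report through `torch.cuda` as well).

```python
def detect_device() -> str:
    """Pick the best available embedding device, falling back to CPU.

    Illustrative sketch only -- the name and ordering are assumptions,
    not the fork's actual implementation.
    """
    try:
        import torch  # optional dependency: absent on CPU-only installs
    except ImportError:
        return "cpu"  # ChromaDB's default CPU/ONNX model handles this case
    if torch.cuda.is_available():  # True for both CUDA and ROCm builds
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon
    return "cpu"
```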
```bash
mempalace mine ~/myproject --device auto   # auto-detect best GPU
mempalace mine ~/myproject --device cuda   # NVIDIA
mempalace mine ~/myproject --device rocm   # AMD
mempalace mine ~/myproject --device mps    # Apple Silicon (M1-M5)
mempalace mine ~/myproject --device cpu    # force CPU
```

Also configurable via the `MEMPALACE_DEVICE` env var or `"device"` in `~/.mempalace/config.json`.
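A minimal `~/.mempalace/config.json` setting the device (illustrative — only the `device` key is documented above; any other keys in the real config are omitted here):

```json
{
  "device": "cuda"
}
```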
`collection.add()` calls are batched (100 documents per call instead of one), reducing ChromaDB overhead in both CPU and GPU modes.
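The batching above can be sketched as follows. This is a minimal illustration, not the fork's miner code; `collection` stands for any object with a ChromaDB-style `add(ids=..., documents=...)` method.

```python
def add_in_batches(collection, ids, documents, batch_size=100):
    """Write documents to a ChromaDB-style collection in fixed-size
    batches, rather than one add() call per document."""
    for start in range(0, len(ids), batch_size):
        collection.add(
            ids=ids[start:start + batch_size],
            documents=documents[start:start + batch_size],
        )
```

For 250 drawers this issues three `add()` calls (100, 100, 50) instead of 250.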
The MCP server includes a mempalace_self_update tool that pulls the latest version from PyPI, callable directly from your AI assistant.
Tested on two real-world codebases. GPU: NVIDIA GeForce RTX 4080 SUPER. Same files, same drawers — only the device changes.
| Test | Files | Drawers | Size | CPU | RTX 4080 SUPER | Speedup |
|---|---|---|---|---|---|---|
| Large mixed codebase (JS/TS/Dart/Python/HTML) | 118 | 13,673 | ~1.7 GB | 156.7s | 26.3s | 6.0x |
| Medium Flutter app (Dart/YAML/JSON) | 145 | 2,906 | ~85 MB | 37.3s | 10.7s | 3.5x |
Speedup scales with drawer count. More chunks = more embedding work = bigger GPU advantage. Results will vary by GPU — expect similar gains on any modern NVIDIA/AMD/Apple Silicon GPU.
Tested on a MacBook M1. Key finding: for mining, CPU generally beats MPS on wall-clock time, because small embedding batches incur data-transfer overhead that outweighs the GPU's compute advantage.
| Test | Files | Drawers | MPS (GPU) | CPU | Winner |
|---|---|---|---|---|---|
| ~/Documents (mixed files) | 500 | 1,239 | 5:48 | 6:09 | MPS 1.06x |
| ~/phobic (mixed files) | 500 | 8,886 | 17:16 | 8:28 | CPU 2.0x |
MPS uses 12x less CPU (5-6% vs 60-69%), freeing the processor for other work — but wall-clock time is worse. The fork defaults to CPU on Apple Silicon for this reason. Use --device mps to override.
Full results: benchmarks/apple_m1_results.md
```bash
pip install mempalace-gpu
claude mcp add mempalace-gpu -- python -m mempalace.mcp_server
```

Restart Claude Code and mempalace-gpu appears in `/plugin` with all tools. Works on NVIDIA, AMD, and Apple Silicon; the GPU is auto-detected.
AMD GPUs need the ROCm version of PyTorch installed first:
```bash
pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
pip install mempalace-gpu
```

Installing or upgrading mempalace-gpu only replaces the Python code. Your mined data lives in `~/.mempalace/palace/` (ChromaDB files) and is never touched. Existing palaces remain fully compatible.
```bash
git clone https://github.com/phobicdotno/mempalace-gpu.git
cd mempalace-gpu
pip install -e .
```

To pull in upstream MemPalace changes:

```bash
git remote add upstream https://github.com/milla-jovovich/mempalace.git
git fetch upstream
git merge upstream/main
```

Run your palace on a GPU machine and access it from any device over the network.
```bash
pip install mempalace-gpu[serve]
mempalace serve --port 8420 --token <your-token> --device cuda
```

On your local machine, add a proxy that forwards MCP calls to the remote server:
```bash
claude mcp add mempalace-remote \
  -e MEMPALACE_REMOTE_URL=http://<gpu-host>:8420 \
  -e MEMPALACE_TOKEN=<your-token> \
  -- python -m mempalace.mcp_proxy
```

The proxy speaks MCP stdio to Claude Code and HTTP to the server. All tool calls are forwarded transparently.
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /health | No | Server status |
| GET | /tools | Yes | List available tools |
| POST | /tool/{name} | Yes | Call a tool with JSON body |
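A direct call against the HTTP API can be sketched with only the standard library. The endpoint path comes from the table above; the `Authorization: Bearer` header scheme and the tool name used in the comment are assumptions, not confirmed by the fork's docs.

```python
import json
import urllib.request


def build_tool_request(base_url, token, name, arguments):
    """Build a POST /tool/{name} request for the HTTP API above.

    The bearer-token header is an assumption about the auth scheme.
    """
    return urllib.request.Request(
        f"{base_url}/tool/{name}",
        data=json.dumps(arguments).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Send with urllib.request.urlopen(req) and json.load() the response body.
```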
| File | Change |
|---|---|
| `mempalace/embeddings.py` | New -- GPU detection (NVIDIA/AMD/Apple), embedding factory, batch flush |
| `mempalace/miner.py` | Batched `collection.add()`, content hashing, `update()` command |
| `mempalace/convo_miner.py` | Batched `collection.add()` |
| `mempalace/config.py` | `device` property (auto/cuda/rocm/mps/cpu) |
| `mempalace/cli.py` | `--device` flag, `update` subcommand |
| `mempalace/mcp_server.py` | `mempalace_self_update` tool, shared embeddings |
| `mempalace/searcher.py` | Shared embedding function for vector compatibility |
| `mempalace/layers.py` | Shared embedding function |
| `mempalace/palace_graph.py` | Shared embedding function |
| `mempalace/http_server.py` | New -- FastAPI HTTP server for remote GPU access |
| `mempalace/mcp_proxy.py` | New -- MCP-to-HTTP proxy for remote palace access |
| `pyproject.toml` | `gpu` optional dependency group |
All other files are unmodified from upstream. Existing palaces remain compatible.
MIT -- same as upstream.