
Mesh LLM

Mesh LLM lets you pool spare GPU capacity across machines and expose the result as one OpenAI-compatible API.

If a model fits on one machine, it runs there. If it does not, Mesh LLM automatically spreads the work across the mesh:

  • Dense models use pipeline parallelism.
  • MoE models use expert sharding with zero cross-node inference traffic.
  • Every node gets the same local API at http://localhost:9337/v1.

Why people use it

  • Run models larger than a single machine can hold.
  • Turn a few uneven boxes into one shared inference pool.
  • Give agents a local OpenAI-compatible endpoint instead of wiring each tool by hand.
  • Keep the setup simple: start one node, add more later.

Quick start

Install the latest release:

curl -fsSL https://raw.githubusercontent.com/michaelneale/mesh-llm/main/install.sh | bash

Then start a node:

mesh-llm --auto

That command:

  • picks a suitable bundled backend for your machine
  • downloads a model if needed
  • joins the best public mesh
  • exposes an OpenAI-compatible API at http://localhost:9337/v1
  • starts the web console at http://localhost:3131
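Startup is asynchronous (the node may still be downloading a model), so scripts should wait until the API answers before sending requests. A minimal readiness-wait sketch, assuming only that http://localhost:9337/v1/models returns 200 once the node is serving:

```python
import time
import urllib.request
import urllib.error

def wait_ready(probe, timeout=120, interval=2):
    """Poll `probe` (a zero-arg callable returning True once the API
    answers) until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

def api_up(url="http://localhost:9337/v1/models"):
    """True if the local node's OpenAI-compatible API responds."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# wait_ready(api_up)  # blocks until the node is serving, or times out
```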

Check what is available:

curl -s http://localhost:9337/v1/models | jq '.data[].id'

Send a request:

curl http://localhost:9337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"GLM-4.7-Flash-Q4_K_M","messages":[{"role":"user","content":"hello"}]}'
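The same request from Python, using only the standard library. This is a sketch that assumes a node is already serving the model locally:

```python
import json
import urllib.request

def build_payload(prompt, model="GLM-4.7-Flash-Q4_K_M"):
    """Assemble an OpenAI-style chat completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt, model="GLM-4.7-Flash-Q4_K_M",
         base="http://localhost:9337/v1"):
    """POST to the local mesh node and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries should also work when pointed at http://localhost:9337/v1.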

Common workflows

1. Try the public mesh

mesh-llm --auto

This is the easiest way to see the system working end to end.

2. Start a private mesh

mesh-llm --model Qwen2.5-32B

This starts serving a model, opens the local API and console, and prints an invite token for other machines.

If you want the mesh to be discoverable via --auto, publish it:

mesh-llm --model Qwen2.5-32B --publish

3. Add another machine

mesh-llm --join <token>

Use --client if the machine should join without serving a model:

mesh-llm --client --join <token>

4. Create a named mesh for a group

mesh-llm --auto --model GLM-4.7-Flash-Q4_K_M --mesh-name "poker-night"

Everyone runs the same command. The first node creates the mesh, the rest discover and join it automatically.

5. Serve more than one model

mesh-llm --model Qwen2.5-32B --model GLM-4.7-Flash

Requests are routed by the model field:

curl localhost:9337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"GLM-4.7-Flash-Q4_K_M","messages":[{"role":"user","content":"hello"}]}'
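When several models are served, a client can discover what is available from /v1/models and choose among them. A small illustrative helper for picking a preferred model with a fallback (the model ids here are examples; use whatever your mesh actually serves):

```python
def pick_model(available, preferred):
    """Return the first id in `preferred` that the mesh serves,
    falling back to the first available model."""
    ids = set(available)
    for model in preferred:
        if model in ids:
            return model
    if not available:
        raise RuntimeError("mesh is serving no models")
    return available[0]

# `available` would come from GET /v1/models, e.g.
# [m["id"] for m in json.load(resp)["data"]]
```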

How it works

Mesh LLM keeps the user-facing surface simple: talk to localhost:9337, pick a model, and let the mesh decide how to serve it.

  • If a model fits on one machine, it runs there with no network overhead.
  • If a dense model does not fit, layers are split across low-latency peers.
  • If an MoE model does not fit, experts are split across nodes and requests are hash-routed for cache locality.
  • Different nodes can serve different models at the same time.
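The hash-routing idea behind the MoE case can be sketched in a few lines. This is not mesh-llm's internal implementation, just an illustration of why hashing a request key gives cache locality: the same key always lands on the same node, so repeated requests reuse that node's warm caches.

```python
import hashlib

def route(key, nodes):
    """Deterministically map a request key to one node, so repeated
    requests for the same key always hit the same node."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]
```

Note that plain modulo reshuffles most keys when the node list changes; schemes like consistent hashing exist to limit that churn.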

Each node also exposes a management API and web console on port 3131.

Install notes

The installer currently targets macOS and Linux release bundles. Windows support is coming soon.

To force a specific bundled flavor during install:

curl -fsSL https://raw.githubusercontent.com/michaelneale/mesh-llm/main/install.sh | MESH_LLM_INSTALL_FLAVOR=vulkan bash

Installed release bundles use flavor-specific llama.cpp binaries:

  • macOS: metal
  • Linux: cpu, cuda, rocm, vulkan

To update a bundle install to the latest release:

mesh-llm update

If you build from source, always use the just command runner:

git clone https://github.com/michaelneale/mesh-llm
cd mesh-llm
just build

Requirements and backend-specific build notes are in CONTRIBUTING.md.

Web console

When a node is running, open:

http://localhost:3131

The console shows live topology, VRAM usage, loaded models, and built-in chat. It is backed by /api/status and /api/events.

You can also try the hosted demo:

mesh-llm-console.fly.dev

Community

Join the #mesh-llm channel on the Goose Discord for discussion and support.

About

A reference implementation of distributed inference across machines using llama.cpp, with a real end-to-end demo.
