Local AI compute control plane for Claude Code and coding agents.
subagent-fleet turns your Macs, GPUs, and Ollama backends into one intelligent compute fleet for coding agents — routing subagents to the right model and machine by role, with real-time health monitoring, model warmup, and execution tracing.
Quickstart • Configuration • Examples • Generated Files • Security • Roadmap
Local model users often have more than one useful machine: a laptop, a Mac mini, a workstation, a home server, or a spare GPU box. Most coding harnesses still point at one model endpoint.
subagent-fleet sits above Ollama and LiteLLM as the control plane — routing, monitoring, warming, and tracing your local subagent fleet:
Claude Code / coding harness
|
v
subagent-fleet control plane
(routing · health · warmup · traces)
|
+-- Ollama node: laptop -> planner, summarizer
+-- Ollama node: Mac mini 64GB -> implementer, reviewer
+-- Ollama node: workstation -> implementer, reviewer
subagent-fleet is a compute control plane for local LLMs. It doesn't replace Ollama or LiteLLM — it sits above them as an intelligent layer that routes, monitors, warms, and traces agent work across your fleet:
Layer 1: Topology
Your fleet.yaml defines the topology: nodes (machines), models (LLMs), and agents (roles). This is your single source of truth.
- nodes: macbook-pro, mac-mini-64gb, gpu-workstation
- models: small-coder, heavy-coder, batch-summarizer
- agents: planner -> fast, planning; implementer -> large, coding; reviewer -> review, safety; summarizer -> summary, docs
Layer 2: Generation
subagent-fleet generate produces:
- litellm_config.yaml: config for a proxy that routes requests to the right node
- .claude/agents/*.md: per-agent definitions Claude Code uses for tools
- .env.subagent-fleet: env vars pointing your harness at the local gateway
Layer 3: Runtime
While Claude Code runs, subagent-fleet tracks everything:
- Node health: who's online, who dropped
- Agent routes: which model each agent is hitting
- Execution traces: full LiteLLM log tail, colored by severity
- Warmup progress: preload status for models before a session starts
All of this streams live to the SSE dashboard at http://localhost:8080
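Node health in the runtime layer comes down to asking each Ollama node which models it serves. The following is a minimal sketch of such a probe, not the project's actual discovery code. It assumes Ollama's /api/tags endpoint, which returns a JSON object with a "models" list; the function names parse_tags and check_node are hypothetical.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def parse_tags(body: str) -> list[str]:
    """Extract model names from an Ollama /api/tags JSON response body."""
    payload = json.loads(body)
    return [m["name"] for m in payload.get("models", [])]


def check_node(endpoint: str, timeout: float = 2.0) -> dict:
    """Probe one node. An unreachable node is reported as offline, not raised."""
    try:
        with urlopen(f"{endpoint}/api/tags", timeout=timeout) as resp:
            models = parse_tags(resp.read().decode("utf-8"))
        return {"online": True, "models": models}
    except (URLError, OSError, ValueError) as exc:
        # Connection errors, timeouts, bad URLs, and invalid JSON all land here.
        return {"online": False, "models": [], "error": str(exc)}
```

Returning a record instead of raising keeps one offline node from aborting a sweep over the whole fleet.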
A real-time screenshot from the Fleet Dashboard (running against a 3-node example fleet):
Most local model setups have one machine, one model, one problem. When you hit limits — slow coding on a small model, GPU underutilized for batch tasks, no visibility into agent routing — you either buy bigger hardware or accept slower output.
subagent-fleet solves this with role-based routing:
| Problem | How subagent-fleet fixes it |
|---|---|
| Single point of failure | Multiple nodes, automatic failover when one drops offline |
| One model for all tasks | Fast model for planning (cheap), large model for coding (capable) |
| No visibility into agent work | SSE dashboard shows real-time node health, routing, and traces |
| Cold models slow startup | subagent-fleet warmup preloads models so they're ready before your session starts |
| Wasted GPU capacity on batch tasks | Offload summarization and docs to the biggest model, keep planning models lightweight |
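Role-based routing with failover can be pictured as a lookup from agent to model to node that skips nodes reported offline. This is a sketch under stated assumptions: the "fallbacks" key is hypothetical and is not part of the fleet.yaml schema shown in this README, where each model is pinned to one node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    agent: str
    model: str
    node: str


def resolve_route(agent, agents, models, online_nodes):
    """Return the first candidate model whose node is online, else None."""
    spec = agents[agent]
    # Hypothetical: an optional ordered list of fallback models per agent.
    candidates = [spec["model"], *spec.get("fallbacks", [])]
    for model_name in candidates:
        node = models[model_name]["node"]
        if node in online_nodes:
            return Route(agent, model_name, node)
    return None


agents = {
    "planner": {"model": "small-coder", "fallbacks": ["heavy-coder"]},
    "implementer": {"model": "heavy-coder"},
}
models = {
    "small-coder": {"node": "m4-mini-16gb"},
    "heavy-coder": {"node": "m4-mini-64gb"},
}
# With the 16GB node offline, the planner falls back to the heavy model.
print(resolve_route("planner", agents, models, {"m4-mini-64gb"}))
```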
The fleet dashboard (subagent-fleet ui) is your operational center — one browser tab to monitor everything:
And it adapts to any screen size:
Every release ships with 482 tests (270 unit + 212 live cluster evals). The live evals run against a real 3-node fleet on the developer's machines: they hit actual Ollama endpoints and validate LLM responses from real hardware rather than mocks.
| Category | What It Proves | Tests |
|---|---|---|
| Node Discovery | Healthy nodes report models; offline nodes don't crash the fleet | 15 |
| LiteLLM Routing | heavy-coder routes to mac-mini, small-planner routes to laptop | 30 |
| Agent Config Generation | Frontmatter, tool lists, model aliases — all match fleet.yaml exactly | 40 |
| Claude Agents Config | Planners get read-only tools; implementers get Edit, Bash, MultiEdit | 40 |
| Aider Config | Model strings and API base point to the correct local gateway | 15 |
| Model Warmup | Preload works for empty responses and minimal prompts on all nodes | 20 |
| Fleet Validation | Bad YAML, missing fields, invalid ports, unsafe agent names — all rejected | 30 |
| Security & Edge Cases | Malformed JSON, oversized models, unknown fields — gracefully handled | 25 |
| Dashboard / SSE | HTTP endpoints return correct JSON; static files load; node status streams live | 35 |
| Prompt Quality | Math addition, code review, classification, multi-step reasoning across all 3 nodes | 137 |
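The Fleet Validation row covers checks such as invalid ports, dangling references, and unsafe agent names. The sketch below shows what that kind of check might look like on an already-parsed config dict. The naming rule is an assumption chosen because agent names become file names under .claude/agents/; the real validator may apply different rules.

```python
import re

# Assumed rule: lowercase letters, digits, "-" and "_" only, so a name can
# never contain path separators or "..".
SAFE_AGENT_NAME = re.compile(r"[a-z0-9][a-z0-9_-]*")


def validate_fleet(fleet: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config passed."""
    errors = []
    port = fleet.get("gateway", {}).get("port")
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        errors.append(f"gateway.port is not a valid TCP port: {port!r}")

    nodes = fleet.get("nodes", {})
    models = fleet.get("models", {})
    for name, model in models.items():
        if model.get("node") not in nodes:
            errors.append(f"model {name!r} references unknown node {model.get('node')!r}")
    for name, agent in fleet.get("agents", {}).items():
        if not SAFE_AGENT_NAME.fullmatch(name):
            errors.append(f"agent name {name!r} is not safe to use as a file name")
        if agent.get("model") not in models:
            errors.append(f"agent {name!r} references unknown model {agent.get('model')!r}")
    return errors
```

Collecting every problem in one pass, rather than stopping at the first, lets a user fix a config in a single edit.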
We run these evals every release against our own production-like fleet:
| Node | Model | Role |
|---|---|---|
| laptop | qwen3.6:35b-mlx | planner |
| mac-mini-64b | qwen3-coder:latest | coder |
| mac-mini-16g | gemma4:latest | planner |
The prompt evals alone verify that each node in your fleet can actually do its assigned job — math on the small model, code review on the heavy one. If a node goes offline mid-test, it fails. No mocks. No stubs.
Run the eval suite locally:
cd src
python -m pytest tests/evals/ --tb=short
We ran a head-to-head coding eval: 8 real coding tasks (bug fixes, an LRU cache, a rate limiter, N+1 query fixes, FastAPI endpoints, and more), sent to the local fleet (routed through the LiteLLM gateway) and to Claude Sonnet 5 and GPT-4o-mini (via OpenRouter), then scored blind by an LLM judge on a 0-10 rubric (correctness, code quality, completeness).
| System | Mean Score | Mean Latency (s) | Total Cost (USD) |
|-------------|-----------:|------------------:|------------------:|
| fleet | 8.38 | 17.47 | $0.0000 |
| sonnet-5 | 8.88 | 5.22 | $0.0077 |
| gpt-4o-mini | 7.50 | 3.47 | $0.0004 |
The fleet scored 94% of Sonnet 5's quality at $0 marginal cost, and beat GPT-4o-mini's mean score outright. It passed 7 of 8 individual prompts against the "within 80% of best frontier score" bar — the one miss (pytest_unit_tests, generating edge-case test coverage) is a genuine, specific gap the eval surfaced rather than a fluke.
Full per-prompt results (score, latency, cost per model per prompt): docs/evals/frontier-comparison-2026-06-30.json.
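The headline numbers follow directly from the table. The short sketch below reproduces the 94% figure and shows one reading of the "within 80% of best frontier score" bar, where "best" is taken as the higher of the two frontier scores on a prompt. That reading is an assumption, and the per-prompt scores in the last line are made up for illustration.

```python
mean_scores = {"fleet": 8.38, "sonnet-5": 8.88, "gpt-4o-mini": 7.50}

# Fleet quality relative to Sonnet 5: 8.38 / 8.88
ratio = mean_scores["fleet"] / mean_scores["sonnet-5"]
print(f"{ratio:.0%}")  # 94%


def passes_bar(fleet_score: float, frontier_scores: list[float], bar: float = 0.8) -> bool:
    """True when the fleet score reaches `bar` times the best frontier score."""
    return fleet_score >= bar * max(frontier_scores)


# Applied to the means: 8.38 >= 0.8 * 8.88 (7.104)
print(passes_bar(mean_scores["fleet"], [mean_scores["sonnet-5"], mean_scores["gpt-4o-mini"]]))
# Hypothetical per-prompt miss: 6.0 < 0.8 * 9.0 (7.2)
print(passes_bar(6.0, [9.0, 8.0]))
```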
Run it yourself against your own fleet:
export OPENROUTER_API_KEY=<your-key>
export LITELLM_MASTER_KEY=<your-fleet-master-key>
litellm --config ./litellm_config.yaml & # start your fleet's gateway
python -m pytest tests/evals/test_frontier_comparison_live.py --run-live -v -s
- Monitor node health in real time; unreachable nodes are isolated automatically.
- Route subagents by role: planner → fast model, implementer → large coding model.
- Warm models before workflows start, with live dashboard progress.
- Stream and trace LiteLLM execution logs in real time.
- Generate LiteLLM and Claude Code agent configuration from fleet.yaml.
- Validate, discover, and inspect your fleet with a single command.
Run subagent-fleet ui to open a live-updating dashboard that monitors your fleet in real time via SSE:
The dashboard shows four things at a glance:
| Panel | What it does |
|---|---|
| Node Health | Real-time online/offline status per node, with discovered Ollama models and endpoints. |
| Agent Routing | Which agent maps to which model on which node — instantly visible. |
| Live Trace Stream | Tail LiteLLM logs as they happen, colored by severity (routing → success → error). |
| Model Warmup Progress | Track preload status when you run subagent-fleet warmup before a coding session. |
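Because the dashboard is fed over SSE, any client can follow the same stream by parsing the standard text/event-stream framing: "event:" and "data:" fields, with a blank line ending each event. The parser below is a simplified sketch of that framing. The README does not document the stream's URL path or event names, so the "node_status" event in the example is hypothetical.

```python
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"event", "data"} dicts from the lines of an SSE stream."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


stream = [
    "event: node_status\n",
    'data: {"node": "laptop", "online": true}\n',
    "\n",
    ": keep-alive\n",
    "data: a\n",
    "data: b\n",
    "\n",
]
events = list(iter_sse_events(stream))
```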
Responsive layout adapts to any screen size:
Start the dashboard in one command — point it at your fleet config:
subagent-fleet ui --config fleet.yaml
# Opens http://localhost:8080
v0.2.0: control-plane release.
Available commands:
subagent-fleet init
subagent-fleet validate
subagent-fleet discover
subagent-fleet generate
subagent-fleet warmup
subagent-fleet status
subagent-fleet doctor
subagent-fleet clean
subagent-fleet ui
subagent-fleet trace
subagent-fleet skills list
subagent-fleet skills install
subagent-fleet plugins install
Choose one of the install paths below.
Install the CLI directly from PyPI:
python -m pip install subagent-fleet
Or install it as an isolated command with pipx:
pipx install subagent-fleet
Verify:
subagent-fleet --help
Use this when contributing to the project:
git clone https://github.com/adityak74/subagent-fleet.git
cd subagent-fleet
python -m pip install -e ".[dev]"
Run tests:
python -m pytest
Install the plugin first from Claude Code, then let the bundled bootstrap skill install the CLI:
/plugin marketplace add https://github.com/adityak74/subagent-fleet
/plugin install subagent-fleet
After install, ask Claude Code:
Use the subagent-fleet bootstrap skill to install the CLI and set up this repo.
The bootstrap skill will run or recommend:
python -m pip install subagent-fleet
subagent-fleet skills install
Install this repository as a local Codex marketplace:
codex plugin marketplace add .
codex plugin add subagent-fleet@subagent-fleet
Then ask Codex:
Use the subagent-fleet bootstrap skill to install the CLI and set up this repo.
Create a starter config:
subagent-fleet init
Edit fleet.yaml with your Ollama node endpoints and model names, then validate it:
subagent-fleet validate
Check which nodes are reachable:
subagent-fleet discover
Generate LiteLLM, Claude agent, and environment files:
subagent-fleet generate
Start LiteLLM:
export LITELLM_MASTER_KEY="sk-local-dev"
litellm \
--config ./litellm_config.yaml \
--host 127.0.0.1 \
--port 4000
Point Claude Code at the local gateway:
source .env.subagent-fleet
claude
subagent-fleet is driven by fleet.yaml.
project:
  name: local-dev
gateway:
  provider: litellm
  host: 127.0.0.1
  port: 4000
  master_key_env: LITELLM_MASTER_KEY
nodes:
  m5-local:
    endpoint: http://localhost:11434
    tags: [controller, local, fast]
  m4-mini-64gb:
    endpoint: http://192.168.1.50:11434
    tags: [heavy, coder, reviewer]
  m4-mini-16gb:
    endpoint: http://192.168.1.51:11434
    tags: [small, planner, summarizer]
models:
  heavy-coder:
    node: m4-mini-64gb
    ollama_model: qwen2.5-coder:32b
    litellm_alias: claude-sonnet-local
    context: 32768
    timeout: 600
    max_parallel: 1
  small-coder:
    node: m4-mini-16gb
    ollama_model: qwen2.5-coder:7b
    litellm_alias: claude-haiku-local
    context: 8192
    timeout: 300
    max_parallel: 1
agents:
  planner:
    model: small-coder
    description: Use for planning, file discovery, task decomposition, and summarization.
    tools: [Read, Grep, Glob]
    prompt: |
      You are a fast local planning agent.
      Do not edit files.
      Return a concise response with:
      - plan
      - relevant files
      - risks
      - next recommended agent
  implementer:
    model: heavy-coder
    description: Use for implementation, bug fixes, refactors, and patch creation.
    tools: [Read, Grep, Glob, Edit, MultiEdit, Bash]
  reviewer:
    model: heavy-coder
    description: Use after implementation to review diffs, tests, regressions, and maintainability.
    tools: [Read, Grep, Glob, Bash]
Running:
subagent-fleet generate
creates:
litellm_config.yaml
.claude/agents/planner.md
.claude/agents/implementer.md
.claude/agents/reviewer.md
.env.subagent-fleet
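Each entry under models in fleet.yaml corresponds to one route in litellm_config.yaml. As a rough sketch of that mapping, and not the generator's actual code, the hypothetical litellm_entry function below builds one model_list item from a model entry and the nodes table, using the field names from the config above.

```python
def litellm_entry(model: dict, nodes: dict) -> dict:
    """Build one LiteLLM model_list item from a fleet.yaml model entry."""
    return {
        "model_name": model["litellm_alias"],
        "litellm_params": {
            # The ollama_chat/ prefix selects LiteLLM's Ollama chat provider.
            "model": f"ollama_chat/{model['ollama_model']}",
            "api_base": nodes[model["node"]]["endpoint"],
            "api_key": "ollama",
            "timeout": model["timeout"],
        },
        "model_info": {"max_input_tokens": model["context"]},
    }


nodes = {"m4-mini-64gb": {"endpoint": "http://192.168.1.50:11434"}}
heavy_coder = {
    "node": "m4-mini-64gb",
    "ollama_model": "qwen2.5-coder:32b",
    "litellm_alias": "claude-sonnet-local",
    "context": 32768,
    "timeout": 600,
}
entry = litellm_entry(heavy_coder, nodes)
```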
Example LiteLLM route:
model_list:
  - model_name: claude-sonnet-local
    litellm_params:
      model: ollama_chat/qwen2.5-coder:32b
      api_base: http://192.168.1.50:11434
      api_key: ollama
      timeout: 600
    model_info:
      max_input_tokens: 32768
Example Claude agent:
---
name: planner
description: Use for planning, file discovery, task decomposition, and summarization.
model: claude-haiku-local
tools: Read, Grep, Glob
---
You are a fast local planning agent.
Do not edit files.
Return a concise response with:
- plan
- relevant files
- risks
- next recommended agent
Ready-to-use fleet configurations in examples/:
| Directory | What it shows |
|---|---|
| ollama-laptop-only/ | Single-machine setup: everything on one laptop running Ollama |
| multi-node-cluster/ | Three-node fleet: laptop + Mac mini (64GB) + GPU workstation |
| litellm-proxy/ | Generated LiteLLM config showing model-to-node routing |
| claude-agents/ | Claude Code agent definitions generated from a fleet YAML |
Quick start with the multi-node example:
subagent-fleet validate --config examples/multi-node-cluster/fleet.yaml
subagent-fleet generate --config examples/multi-node-cluster/fleet.yaml
subagent-fleet ui --config examples/multi-node-cluster/fleet.yaml
| Command | Purpose |
|---|---|
| subagent-fleet init | Create a starter fleet.yaml. |
| subagent-fleet validate | Validate schema, references, URLs, aliases, and agent names. |
| subagent-fleet discover | Query configured Ollama nodes for available models. |
| subagent-fleet generate | Generate LiteLLM config, Claude agents, and env file. |
| subagent-fleet warmup | Preload configured Ollama models with keep_alive. |
| subagent-fleet status | Show node health and agent routing. |
| subagent-fleet doctor | Show validation and local-network safety guidance. |
| subagent-fleet clean | List or remove generated files. |
| subagent-fleet skills list | List bundled assistant skills and supported targets. |
| subagent-fleet skills install | Install assistant-facing setup and operations skills. |
| subagent-fleet plugins install | Install Claude Code and Codex plugin marketplace bundles. |
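Preloading with keep_alive, as the warmup row describes, maps onto Ollama's /api/generate endpoint: a request that names a model but carries no prompt loads the model into memory, and keep_alive controls how long it stays loaded. The sketch below illustrates that call. It assumes this is roughly what the warmup command does, and the function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def warmup_request(endpoint: str, ollama_model: str, keep_alive=-1) -> Request:
    """Build a prompt-less /api/generate request that only loads the model.

    A negative keep_alive asks Ollama to keep the model loaded indefinitely.
    """
    body = json.dumps({"model": ollama_model, "keep_alive": keep_alive}).encode("utf-8")
    return Request(
        f"{endpoint}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warm(endpoint: str, ollama_model: str, timeout: float = 600.0) -> bool:
    """Send the warmup request; loading a large model can take minutes."""
    with urlopen(warmup_request(endpoint, ollama_model), timeout=timeout) as resp:
        return resp.status == 200


req = warmup_request("http://192.168.1.50:11434", "qwen2.5-coder:32b")
```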
JSON output is available for discovery and status:
subagent-fleet discover --json
subagent-fleet status --json
subagent-fleet ships assistant-facing skills that teach Claude Code, Codex, OpenCode, and similar tools how to set up and operate the fleet from inside a repository.
List bundled skills and supported targets:
subagent-fleet skills list
Install all bundled skills for all supported targets:
subagent-fleet skills install
This writes:
.claude/skills/subagent-fleet-setup/SKILL.md
.claude/skills/subagent-fleet-operations/SKILL.md
.codex/skills/subagent-fleet-setup/SKILL.md
.codex/skills/subagent-fleet-operations/SKILL.md
.opencode/skills/subagent-fleet-setup/SKILL.md
.opencode/skills/subagent-fleet-operations/SKILL.md
Install for a specific assistant:
subagent-fleet skills install --target codex
subagent-fleet skills install --target claude-code
subagent-fleet skills install --target opencode
Install one bundled skill:
subagent-fleet skills install --skill subagent-fleet-setup
Existing skill files are not overwritten unless you pass --force.
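The no-overwrite behaviour can be expressed as a small guard around the file write. The sketch below is illustrative only: write_skill is a hypothetical helper, not the installer's real function, and the force parameter stands in for the --force flag.

```python
from pathlib import Path


def write_skill(path: Path, content: str, force: bool = False) -> bool:
    """Write a skill file; return False if it exists and force is not set."""
    if path.exists() and not force:
        return False  # leave the user's existing file untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True
```

Returning a boolean lets the caller report which files were written and which were skipped.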
This repository also ships plugin marketplace metadata so users can install the assistant skill first, then let that skill install and verify the Python CLI.
Included plugin artifacts:
.claude-plugin/marketplace.json
.agents/plugins/marketplace.json
plugins/subagent-fleet/.claude-plugin/plugin.json
plugins/subagent-fleet/.codex-plugin/plugin.json
plugins/subagent-fleet/skills/subagent-fleet-bootstrap/SKILL.md
plugins/subagent-fleet/skills/subagent-fleet-setup/SKILL.md
plugins/subagent-fleet/skills/subagent-fleet-operations/SKILL.md
The bootstrap skill teaches Claude Code or Codex how to install the CLI:
python -m pip install subagent-fleet
and then install repo-local assistant skills:
subagent-fleet skills install
Claude Code plugin install flow:
/plugin marketplace add https://github.com/adityak74/subagent-fleet
/plugin install subagent-fleet
Codex local marketplace flow:
codex plugin marketplace add .
codex plugin add subagent-fleet@subagent-fleet
To generate the same marketplace/plugin bundle into another directory:
subagent-fleet plugins install --out /path/to/marketplace-root
Install only one target:
subagent-fleet plugins install --target claude-code
subagent-fleet plugins install --target codex
Existing plugin marketplace files are not overwritten unless you pass --force.
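On each worker machine, make Ollama reachable from your controller over the private network. Setting OLLAMA_HOST to 0.0.0.0 binds every interface, so rely on a firewall or private subnet to keep the port off the public internet: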
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
launchctl setenv OLLAMA_KEEP_ALIVE "-1"
launchctl setenv OLLAMA_NUM_PARALLEL "1"
launchctl setenv OLLAMA_MAX_LOADED_MODELS "1"
killall Ollama
open -a Ollama
From the controller:
curl http://NODE_IP:11434/api/tags
subagent-fleet assumes private local networking.
Do:
- Use LAN, firewall rules, Tailscale, WireGuard, or a private subnet.
- Keep LITELLM_MASTER_KEY set for LiteLLM access.
- Treat generated .env.subagent-fleet files as local developer configuration.
Do not:
- Expose Ollama directly to the public internet.
- Expose LiteLLM without authentication.
- Commit real API keys, LAN secrets, or machine-specific private .env files.
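One way to catch an accidentally public endpoint is to check each node URL against the loopback and private address ranges. The sketch below uses Python's standard ipaddress module and is only an illustration of the idea; is_private_endpoint is a hypothetical helper, and whether subagent-fleet doctor performs a check like this is not stated here.

```python
import ipaddress
from urllib.parse import urlparse


def is_private_endpoint(endpoint: str) -> bool:
    """True if the endpoint's host is localhost or a loopback/private IP."""
    host = urlparse(endpoint).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Other hostnames would need DNS resolution, which this sketch skips.
        return False
    return addr.is_loopback or addr.is_private
```

Tailscale addresses in 100.64.0.0/10 are not reported as private by is_private, so a real check would need to allow that range explicitly.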
Run:
subagent-fleet doctor
for local setup and safety reminders.
Install dev dependencies:
python -m pip install -e ".[dev]"
Run tests:
python -m pytest
Run a focused test:
python -m pytest tests/test_config.py
Check CLI wiring:
python -m subagent_fleet.cli --help
src/subagent_fleet/
  cli.py
  config.py
  discovery.py
  plugins.py
  warmup.py
  status.py
  skills.py
  generators/
  skill_templates/
  templates/
examples/
plugins/
tests/
v0.1 — config-first fleet manager:
- fleet.yaml schema and validation
- Ollama node health checks and isolation
- Ollama model discovery via /api/tags
- LiteLLM config generation
- Claude Code agent generation
- Environment file generation
- Model warmup with keep_alive
- Status and routing tables
v0.2 — control-plane release:
- Real-time SSE dashboard (node health, routing, warmup, trace stream)
- Execution trace viewer (subagent-fleet trace)
- Generative UI dashboard (subagent-fleet ui)
- Aider target support
v0.3 — live router:
- Latency benchmarking and node ranking
- Recommended agent-to-node assignment
- Tailscale-aware node discovery
- Fallback model generation
v0.4+ — scheduler and integrations:
- Queue-aware scheduling
- Dynamic routing by task type
- vLLM, LM Studio, llama.cpp, OpenRouter support
Issues and pull requests are welcome.
Good first areas:
- More generator tests
- Additional example fleets
- Better status formatting
- More robust Ollama error reporting
- Documentation for real multi-machine setups
Before opening a PR:
python -m pytest
subagent-fleet is not:
- an inference engine
- a replacement for Ollama
- a replacement for LiteLLM
- a model sharding framework
- Kubernetes for local LLMs
- a public model hosting platform
It is a small workflow layer for private local subagent orchestration.
MIT. See LICENSE.


