Skip to content

adityak74/subagent-fleet

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

subagent-fleet

Local AI compute control plane for Claude Code and coding agents.

subagent-fleet turns your Macs, GPUs, and Ollama backends into one intelligent compute fleet for coding agents — routing subagents to the right model and machine by role, with real-time health monitoring, model warmup, and execution tracing.

GitHub Repo stars License: MIT Python CLI Ollama LiteLLM GitHub last commit GitHub issues

QuickstartConfigurationExamplesGenerated FilesSecurityRoadmap

Overview

Local model users often have more than one useful machine: a laptop, a Mac mini, a workstation, a home server, or a spare GPU box. Most coding harnesses still point at one model endpoint.

subagent-fleet sits above Ollama and LiteLLM as the control plane — routing, monitoring, warming, and tracing your local subagent fleet:

Claude Code / coding harness
            |
        v
  subagent-fleet control plane
       (routing · health · warmup · traces)
            |
            +-- Ollama node: laptop             -> planner, summarizer
            +-- Ollama node: Mac mini 64GB -> implementer, reviewer
            +-- Ollama node: workstation        -> implementer, reviewer

How It Works

subagent-fleet is a compute control plane for local LLMs. It doesn't replace Ollama or LiteLLM — it sits above them as an intelligent layer that routes, monitors, warms, and traces agent work across your fleet:

┌───────────────────────────────  Layer 1: Topology  ───────────────────────────────┐
│                                                                                     │
│  Your fleet.yaml defines the topology — nodes (machines), models (LLMs),          │
│  and agents (roles). This is your single source of truth.                          │
│                                                                                     │
│  nodes:                models:               agents:                               │
│    macbook-pro         small-coder           planner     -> fast, planning          │
│    mac-mini-64gb       heavy-coder           implementer -> large, coding           │
│    gpu-workstation     batch-summarizer      reviewer    -> review, safety         │
│                                      summarizer   -> summary, docs                 │
│                                                                                     │
└───────────────────────────────  Layer 2: Generation  ─────────────────────────────┘
│                                                                                     │
│  subagent-fleet generate produces:                                                  │
│    • litellm_config.yaml   — a proxy that routes requests to the right node        │
│    • .claude/agents/*.md — per-agent definitions Claude Code uses for tools        │
│    • .env.subagent-fleet — env vars pointing your harness at the local gateway     │
│                                                                                     │
└───────────────────────────────  Layer 3: Runtime     ─────────────────────────────┘
│                                                                                     │
│  While Claude Code runs, subagent-fleet tracks everything:                         │
│    • Node health — who's online, who dropped                                     │
│    • Agent routes  — which model each agent is hitting                            │
│    • Execution traces — full LiteLLM log tail, colored by severity              │
│    • Warmup progress — preload status for models before a session starts          │
│                                                                                     │
│  All of this streams live to the SSE dashboard at http://localhost:8080             │
│                                                                                     │
└──────────────────────────────────────────────────────────────────────────────────────┘

A real-time screenshot from the Fleet Dashboard (running against a 3-node example fleet):

Fleet Dashboard

Why This Matters

Most local model setups have one machine, one model, one problem. When you hit limits — slow coding on a small model, GPU underutilized for batch tasks, no visibility into agent routing — you either buy bigger hardware or accept slower output.

subagent-fleet solves this with role-based routing:

Problem How subagent-fleet fixes it
Single point of failure Multiple nodes, automatic failover when one drops offline
One model for all tasks Fast model for planning (cheap), large model for coding (capable)
No visibility into agent work SSE dashboard shows real-time node health, routing, and traces
Cold models slow startup subagent-fleet warmup preloads models so they're ready before your session starts
Wasted GPU capacity on batch tasks Offload summarization and docs to the biggest model, keep planning models lightweight

The fleet dashboard (subagent-fleet ui) is your operational center — one browser tab to monitor everything:

Fleet Dashboard

And it adapts to any screen size:

Mobile view

Verified With Evals

Every release ships with 482 tests (270 unit + 212 live cluster evals) that run against a real 3-node fleet on the developer's machines. This isn't toy testing — every test hits actual Ollama endpoints and validates LLM responses from your own hardware.

Eval Suite Breakdown

Category What It Proves Tests
Node Discovery Healthy nodes report models; offline nodes don't crash the fleet 15
LiteLLM Routing heavy-coder routes to mac-mini, small-planner routes to laptop 30
Agent Config Generation Frontmatter, tool lists, model aliases — all match fleet.yaml exactly 40
Claude Agents Config Planners get read-only tools; implementers get Edit, Bash, MultiEdit 40
Aider Config Model strings and API base point to the correct local gateway 15
Model Warmup Preload works for empty responses and minimal prompts on all nodes 20
Fleet Validation Bad YAML, missing fields, invalid ports, unsafe agent names — all rejected 30
Security & Edge Cases Malformed JSON, oversized models, unknown fields — gracefully handled 25
Dashboard / SSE HTTP endpoints return correct JSON; static files load; node status streams live 35
Prompt Quality Math addition, code review, classification, multi-step reasoning across all 3 nodes 137

Real-World Validation Against Our Fleet

We run these evals every release against our own production-like fleet:

┌──────────────┬─────────────────────┬──────────┐
│ Node         │ Model               │ Role     │
├──────────────┼─────────────────────┼──────────┤
│ laptop       │ qwen3.6:35b-mlx     │ planner  │
│ mac-mini-64b │ qwen3-coder:latest  │ coder    │
│ mac-mini-16g │ gemma4:latest       │ planner  │
└──────────────┴─────────────────────┴──────────┘

The prompt evals alone verify that each node in your fleet can actually do its assigned job — math on the small model, code review on the heavy one. If a node goes offline mid-test, it fails. No mocks. No stubs.

Run the eval suite locally:

cd src
python -m pytest tests/evals/ --tb=short

Fleet vs. Frontier Models

We ran a head-to-head coding eval: 8 real coding tasks (bug fixes, an LRU cache, a rate limiter, N+1 query fixes, FastAPI endpoints, and more), sent to the local fleet (routed through the LiteLLM gateway) and to Claude Sonnet 5 and GPT-4o-mini (via OpenRouter), then scored blind by an LLM judge on a 0-10 rubric (correctness, code quality, completeness).

| System      | Mean Score | Mean Latency (s) | Total Cost (USD) |
|-------------|-----------:|------------------:|------------------:|
| fleet       |       8.38 |             17.47 |            $0.0000 |
| sonnet-5    |       8.88 |              5.22 |            $0.0077 |
| gpt-4o-mini |       7.50 |              3.47 |            $0.0004 |

The fleet scored 94% of Sonnet 5's quality at $0 marginal cost, and beat GPT-4o-mini's mean score outright. It passed 7 of 8 individual prompts against the "within 80% of best frontier score" bar — the one miss (pytest_unit_tests, generating edge-case test coverage) is a genuine, specific gap the eval surfaced rather than a fluke.

Full per-prompt results (score, latency, cost per model per prompt): docs/evals/frontier-comparison-2026-06-30.json.

Run it yourself against your own fleet:

export OPENROUTER_API_KEY=<your-key>
export LITELLM_MASTER_KEY=<your-fleet-master-key>
litellm --config ./litellm_config.yaml &   # start your fleet's gateway
python -m pytest tests/evals/test_frontier_comparison_live.py --run-live -v -s

Features

  • Monitor node health in real time — unreachable nodes are isolated automatically.
  • Route subagents by role: planner → fast model, implementer → large coding model.
  • Warm models before workflows start, with live dashboard progress.
  • Stream and trace LiteLLM execution logs in real time.
  • Generate LiteLLM and Claude Code agent configuration from fleet.yaml.
  • Validate, discover, and inspect your fleet with a single command.

Interactive Fleet Dashboard

Run subagent-fleet ui to open a live-updating dashboard that monitors your fleet in real time via SSE:

Fleet Dashboard

The dashboard shows three things at a glance:

Panel What it does
Node Health Real-time online/offline status per node, with discovered Ollama models and endpoints.
Agent Routing Which agent maps to which model on which node — instantly visible.
Live Trace Stream Tail LiteLLM logs as they happen, colored by severity (routing → success → error).
Model Warmup Progress Track preload status when you run subagent-fleet warmup before a coding session.

Responsive layout adapts to any screen size:

Mobile view

Start the dashboard in one command — point it at your fleet config:

subagent-fleet ui --config fleet.yaml
# Opens http://localhost:8080

Status

v0.2.0 — control-plane release.

Available commands:

subagent-fleet init
subagent-fleet validate
subagent-fleet discover
subagent-fleet generate
subagent-fleet warmup
subagent-fleet status
subagent-fleet doctor
subagent-fleet clean
subagent-fleet ui
subagent-fleet trace
subagent-fleet skills list
subagent-fleet skills install
subagent-fleet plugins install

Install

Choose one of the install paths below.

CLI from GitHub

Install the CLI directly from PyPI:

python -m pip install subagent-fleet

Or install it as an isolated command with pipx:

pipx install subagent-fleet

Verify:

subagent-fleet --help

Development Checkout

Use this when contributing to the project:

git clone https://github.com/adityak74/subagent-fleet.git
cd subagent-fleet
python -m pip install -e ".[dev]"

Run tests:

python -m pytest

Claude Code Plugin First

Install the plugin first from Claude Code, then let the bundled bootstrap skill install the CLI:

/plugin marketplace add https://github.com/adityak74/subagent-fleet
/plugin install subagent-fleet

After install, ask Claude Code:

Use the subagent-fleet bootstrap skill to install the CLI and set up this repo.

The bootstrap skill will run or recommend:

python -m pip install subagent-fleet
subagent-fleet skills install

Codex Plugin First

Install this repository as a local Codex marketplace:

codex plugin marketplace add .
codex plugin add subagent-fleet@subagent-fleet

Then ask Codex:

Use the subagent-fleet bootstrap skill to install the CLI and set up this repo.

Quickstart

Create a starter config:

subagent-fleet init

Edit fleet.yaml with your Ollama node endpoints and model names, then validate it:

subagent-fleet validate

Check which nodes are reachable:

subagent-fleet discover

Generate LiteLLM, Claude agent, and environment files:

subagent-fleet generate

Start LiteLLM:

export LITELLM_MASTER_KEY="sk-local-dev"

litellm \
  --config ./litellm_config.yaml \
  --host 127.0.0.1 \
  --port 4000

Point Claude Code at the local gateway:

source .env.subagent-fleet
claude

Configuration

subagent-fleet is driven by fleet.yaml.

project:
  name: local-dev
  gateway:
    provider: litellm
    host: 127.0.0.1
    port: 4000
    master_key_env: LITELLM_MASTER_KEY

nodes:
  m5-local:
    endpoint: http://localhost:11434
    tags: [controller, local, fast]

  m4-mini-64gb:
    endpoint: http://192.168.1.50:11434
    tags: [heavy, coder, reviewer]

  m4-mini-16gb:
    endpoint: http://192.168.1.51:11434
    tags: [small, planner, summarizer]

models:
  heavy-coder:
    node: m4-mini-64gb
    ollama_model: qwen2.5-coder:32b
    litellm_alias: claude-sonnet-local
    context: 32768
    timeout: 600
    max_parallel: 1

  small-coder:
    node: m4-mini-16gb
    ollama_model: qwen2.5-coder:7b
    litellm_alias: claude-haiku-local
    context: 8192
    timeout: 300
    max_parallel: 1

agents:
  planner:
    model: small-coder
    description: Use for planning, file discovery, task decomposition, and summarization.
    tools: [Read, Grep, Glob]
    prompt: |
      You are a fast local planning agent.
      Do not edit files.
      Return a concise response with:
      - plan
      - relevant files
      - risks
      - next recommended agent

  implementer:
    model: heavy-coder
    description: Use for implementation, bug fixes, refactors, and patch creation.
    tools: [Read, Grep, Glob, Edit, MultiEdit, Bash]

  reviewer:
    model: heavy-coder
    description: Use after implementation to review diffs, tests, regressions, and maintainability.
    tools: [Read, Grep, Glob, Bash]

Generated Files

Running:

subagent-fleet generate

creates:

litellm_config.yaml
.claude/agents/planner.md
.claude/agents/implementer.md
.claude/agents/reviewer.md
.env.subagent-fleet

Example LiteLLM route:

model_list:
  - model_name: claude-sonnet-local
    litellm_params:
      model: ollama_chat/qwen2.5-coder:32b
      api_base: http://192.168.1.50:11434
      api_key: ollama
      timeout: 600
    model_info:
      max_input_tokens: 32768

Example Claude agent:

---
name: planner
description: Use for planning, file discovery, task decomposition, and summarization.
model: claude-haiku-local
tools: Read, Grep, Glob
---

You are a fast local planning agent.
Do not edit files.
Return a concise response with:
- plan
- relevant files
- risks
- next recommended agent

Examples

Ready-to-use fleet configurations in examples/:

Directory What it shows
ollama-laptop-only/ Single-machine setup — everything on one laptop running Ollama
multi-node-cluster/ Three-node fleet — laptop + Mac mini (64GB) + GPU workstation
litellm-proxy/ Generated LiteLLM config showing model-to-node routing
claude-agents/ Claude Code agent definitions generated from a fleet YAML

Quick start with the multi-node example:

subagent-fleet validate --config examples/multi-node-cluster/fleet.yaml
subagent-fleet generate --config examples/multi-node-cluster/fleet.yaml
subagent-fleet ui --config examples/multi-node-cluster/fleet.yaml

Commands

Command Purpose
subagent-fleet init Create a starter fleet.yaml.
subagent-fleet validate Validate schema, references, URLs, aliases, and agent names.
subagent-fleet discover Query configured Ollama nodes for available models.
subagent-fleet generate Generate LiteLLM config, Claude agents, and env file.
subagent-fleet warmup Preload configured Ollama models with keep_alive.
subagent-fleet status Show node health and agent routing.
subagent-fleet doctor Show validation and local-network safety guidance.
subagent-fleet clean List or remove generated files.
subagent-fleet skills list List bundled assistant skills and supported targets.
subagent-fleet skills install Install assistant-facing setup and operations skills.
subagent-fleet plugins install Install Claude Code and Codex plugin marketplace bundles.

JSON output is available for discovery and status:

subagent-fleet discover --json
subagent-fleet status --json

Assistant Skills

subagent-fleet ships assistant-facing skills that teach Claude Code, Codex, OpenCode, and similar tools how to set up and operate the fleet from inside a repository.

List bundled skills and supported targets:

subagent-fleet skills list

Install all bundled skills for all supported targets:

subagent-fleet skills install

This writes:

.claude/skills/subagent-fleet-setup/SKILL.md
.claude/skills/subagent-fleet-operations/SKILL.md
.codex/skills/subagent-fleet-setup/SKILL.md
.codex/skills/subagent-fleet-operations/SKILL.md
.opencode/skills/subagent-fleet-setup/SKILL.md
.opencode/skills/subagent-fleet-operations/SKILL.md

Install for a specific assistant:

subagent-fleet skills install --target codex
subagent-fleet skills install --target claude-code
subagent-fleet skills install --target opencode

Install one bundled skill:

subagent-fleet skills install --skill subagent-fleet-setup

Existing skill files are not overwritten unless you pass --force.

Plugin Marketplaces

This repository also ships plugin marketplace metadata so users can install the assistant skill first, then let that skill install and verify the Python CLI.

Included plugin artifacts:

.claude-plugin/marketplace.json
.agents/plugins/marketplace.json
plugins/subagent-fleet/.claude-plugin/plugin.json
plugins/subagent-fleet/.codex-plugin/plugin.json
plugins/subagent-fleet/skills/subagent-fleet-bootstrap/SKILL.md
plugins/subagent-fleet/skills/subagent-fleet-setup/SKILL.md
plugins/subagent-fleet/skills/subagent-fleet-operations/SKILL.md

The bootstrap skill teaches Claude Code or Codex how to install the CLI:

python -m pip install subagent-fleet

and then install repo-local assistant skills:

subagent-fleet skills install

Claude Code plugin install flow:

/plugin marketplace add https://github.com/adityak74/subagent-fleet
/plugin install subagent-fleet

Codex local marketplace flow:

codex plugin marketplace add .
codex plugin add subagent-fleet@subagent-fleet

To generate the same marketplace/plugin bundle into another directory:

subagent-fleet plugins install --out /path/to/marketplace-root

Install only one target:

subagent-fleet plugins install --target claude-code
subagent-fleet plugins install --target codex

Existing plugin marketplace files are not overwritten unless you pass --force.

Ollama Worker Setup

On each worker machine, run Ollama on a private interface reachable from your controller:

launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
launchctl setenv OLLAMA_KEEP_ALIVE "-1"
launchctl setenv OLLAMA_NUM_PARALLEL "1"
launchctl setenv OLLAMA_MAX_LOADED_MODELS "1"

killall Ollama
open -a Ollama

From the controller:

curl http://NODE_IP:11434/api/tags

Security

subagent-fleet assumes private local networking.

Do:

  • Use LAN, firewall rules, Tailscale, WireGuard, or a private subnet.
  • Keep LITELLM_MASTER_KEY set for LiteLLM access.
  • Treat generated .env.subagent-fleet files as local developer configuration.

Do not:

  • Expose Ollama directly to the public internet.
  • Expose LiteLLM without authentication.
  • Commit real API keys, LAN secrets, or machine-specific private .env files.

Run:

subagent-fleet doctor

for local setup and safety reminders.

Development

Install dev dependencies:

python -m pip install -e ".[dev]"

Run tests:

python -m pytest

Run a focused test:

python -m pytest tests/test_config.py

Check CLI wiring:

python -m subagent_fleet.cli --help

Project Layout

src/subagent_fleet/
  cli.py
  config.py
  discovery.py
  plugins.py
  warmup.py
  status.py
  skills.py
  generators/
  skill_templates/
  templates/

examples/
plugins/
tests/

Roadmap

v0.1 — config-first fleet manager:

  • fleet.yaml schema and validation
  • Ollama node health checks and isolation
  • Ollama model discovery via /api/tags
  • LiteLLM config generation
  • Claude Code agent generation
  • Environment file generation
  • Model warmup with keep_alive
  • Status and routing tables

v0.2 — control-plane release:

  • Real-time SSE dashboard (node health, routing, warmup, trace stream)
  • Execution trace viewer (subagent-fleet trace)
  • Generative UI dashboard (subagent-fleet ui)
  • Aider target support

v0.3 — live router:

  • Latency benchmarking and node ranking
  • Recommended agent-to-node assignment
  • Tailscale-aware node discovery
  • Fallback model generation

v0.4+ — scheduler and integrations:

  • Queue-aware scheduling
  • Dynamic routing by task type
  • vLLM, LM Studio, llama.cpp, OpenRouter support

Star History

Star History Chart

Contributing

Issues and pull requests are welcome.

Good first areas:

  • More generator tests
  • Additional example fleets
  • Better status formatting
  • More robust Ollama error reporting
  • Documentation for real multi-machine setups

Before opening a PR:

python -m pytest

What This Is Not

subagent-fleet is not:

  • an inference engine
  • a replacement for Ollama
  • a replacement for LiteLLM
  • a model sharding framework
  • Kubernetes for local LLMs
  • a public model hosting platform

It is a small workflow layer for private local subagent orchestration.

License

MIT. See LICENSE.

Packages

 
 
 

Contributors