# agentgateway

> nginx for AI agents.
You have a fleet of AI agents -- researchers, coders, reviewers, planners -- all making API calls to Claude, GPT-4, and other models. Without a gateway, you have:
- No visibility into which agent is spending how much
- No budget controls -- a runaway agent can burn through your API credits in minutes
- No rate limiting per agent -- one noisy agent starves the others
- No circuit breakers -- one failing provider cascades errors everywhere
- No audit trail -- good luck debugging what happened at 3am
agentgateway sits between your agents and the LLM providers. It proxies every request, tracks every token, enforces budgets, and gives you a real-time dashboard to see exactly what your fleet is doing.
## Installation

```bash
npm install -g agentgateway
```

## Quick Start

Create a `gateway.yaml`:

```yaml
gateway:
  port: 8080

agents:
  researcher:
    model: claude-sonnet-4-20250514
    provider: anthropic
    budget:
      max_cost_per_hour: 5.00
    rate_limit: 100/min

  writer:
    model: claude-haiku-4-5-20251001
    provider: anthropic
    budget:
      max_cost_per_hour: 1.00
    rate_limit: 50/min

dashboard:
  port: 4040
```

Start the gateway:

```bash
agentgateway start -c gateway.yaml
```

Point your agents at the gateway instead of the provider API directly:
```bash
# Before (direct to Anthropic)
curl https://api.anthropic.com/v1/messages ...

# After (through agentgateway)
curl http://localhost:8080/v1/messages \
  -H "x-agent-id: researcher" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -d '{"model": "claude-sonnet-4-20250514", "messages": [...]}'
```

## CLI Commands

| Command | Description |
|---|---|
| `agentgateway start` | Start the proxy server |
| `agentgateway status` | Show status of all agents |
| `agentgateway logs` | Stream audit logs |
| `agentgateway logs -f` | Follow logs in real time |
| `agentgateway pause <agent>` | Pause an agent |
| `agentgateway pause <agent> -r` | Resume a paused agent |
| `agentgateway budget <agent>` | View agent budget |
| `agentgateway budget <agent> --hourly 10` | Update hourly budget |
| `agentgateway dashboard` | Open the web dashboard |
## Architecture

```
                                  agentgateway
                 ┌────────────────────────────────────────────┐
┌──────────┐     │                                            │
│ Agent 1  │────▶│  ┌───────────┐  ┌──────────┐  ┌─────────┐  │     ┌──────────┐
│researcher│     │  │   Rate    │  │  Budget  │  │ Circuit │  │────▶│ Anthropic│
└──────────┘     │  │  Limiter  │─▶│ Tracker  │─▶│ Breaker │  │     └──────────┘
                 │  └───────────┘  └──────────┘  └─────────┘  │
┌──────────┐     │                                            │
│ Agent 2  │────▶│  ┌───────────┐  ┌──────────┐  ┌─────────┐  │     ┌──────────┐
│  coder   │     │  │  Policy   │  │  Audit   │  │  Proxy  │  │────▶│  OpenAI  │
└──────────┘     │  │  Engine   │  │  Logger  │  │  Layer  │  │     └──────────┘
                 │  └───────────┘  └──────────┘  └─────────┘  │
┌──────────┐     │                                            │
│ Agent 3  │────▶│            ┌──────────────────┐            │
│  writer  │     │            │    Dashboard     │            │
└──────────┘     │            │   :4040 (WS)     │            │
                 │            └──────────────────┘            │
                 └────────────────────────────────────────────┘
```
## Features

### Per-Agent Budgets

Set hourly, daily, and total cost limits per agent. Agents are automatically paused when they exceed their budget.

### Rate Limiting

Each agent gets its own rate limiter. Configure requests per second, minute, hour, or day. Limits use a burst-friendly token bucket algorithm.
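The token bucket idea can be sketched in a few lines of TypeScript. This is an illustrative model of the algorithm, not the gateway's internal implementation:

```typescript
// Token bucket: holds up to `capacity` tokens, refilled continuously at
// `ratePerSec`. Each request consumes one token, so short bursts up to
// `capacity` pass immediately while sustained traffic is held to the rate.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private ratePerSec: number, now = 0) {
    this.tokens = capacity; // start full so initial bursts are allowed
    this.lastRefill = now;
  }

  // `now` is a timestamp in milliseconds; returns true if the request passes.
  tryRemove(now: number): boolean {
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.ratePerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// A "100/min" limit corresponds roughly to capacity 100 at 100/60 tokens/sec.
const bucket = new TokenBucket(5, 1); // tiny bucket for demonstration
const results = [0, 1, 2, 3, 4, 5].map((i) => bucket.tryRemove(i * 10));
console.log(results); // [ true, true, true, true, true, false ]
```

The burst-friendliness comes from starting full: a briefly idle agent can fire several requests at once, then settles to the configured average rate.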
### Circuit Breakers

Full state machine (CLOSED -> OPEN -> HALF_OPEN -> CLOSED) with configurable error thresholds, reset timeouts, and sliding-window error tracking.
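The state machine can be sketched as follows. This simplified model uses a plain consecutive-error counter rather than the sliding window described above, and is not the gateway's actual code:

```typescript
type State = "CLOSED" | "OPEN" | "HALF_OPEN";

// Trips OPEN after `errorThreshold` consecutive failures, waits
// `resetTimeoutMs`, then probes in HALF_OPEN; `halfOpenSuccesses`
// consecutive successes close the circuit again.
class CircuitBreaker {
  state: State = "CLOSED";
  private errors = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(
    private errorThreshold = 5,
    private resetTimeoutMs = 30_000,
    private halfOpenSuccesses = 3,
  ) {}

  // Should this request be forwarded to the provider?
  allows(now: number): boolean {
    if (this.state === "OPEN" && now - this.openedAt >= this.resetTimeoutMs) {
      this.state = "HALF_OPEN"; // timeout elapsed: let trial requests through
      this.successes = 0;
    }
    return this.state !== "OPEN";
  }

  onSuccess(): void {
    if (this.state === "HALF_OPEN" && ++this.successes >= this.halfOpenSuccesses) {
      this.state = "CLOSED";
    }
    this.errors = 0;
  }

  onFailure(now: number): void {
    // Any failure while probing, or too many consecutive failures, trips it.
    if (this.state === "HALF_OPEN" || ++this.errors >= this.errorThreshold) {
      this.state = "OPEN";
      this.openedAt = now;
      this.errors = 0;
    }
  }
}
```

While a provider's circuit is OPEN, requests fail fast instead of piling up behind a dead upstream, which is what stops one failing provider from cascading.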
### Policy Engine

Define rules in YAML that evaluate against agent state:

```yaml
policies:
  - name: high-error-rate
    condition: "error_rate > 0.5"
    action: pause
  - name: runaway-spend
    condition: "cost_this_hour > 50.0"
    action: deny
```

### Streaming Support

Full passthrough of Server-Sent Events with mid-stream token counting. Your agents get streaming responses with negligible added latency.
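Mid-stream counting can work by inspecting SSE `data:` frames as they pass through. A rough sketch follows; the field names mirror Anthropic-style streaming events (`message_start`, `message_delta`), but treat the whole thing as illustrative rather than the proxy's real code path:

```typescript
// Scans SSE "data:" frames as they stream through and accumulates token
// usage, without buffering the whole response.
function countTokens(sseChunks: string[]): { input: number; output: number } {
  let input = 0;
  let output = 0;
  for (const chunk of sseChunks) {
    for (const line of chunk.split("\n")) {
      if (!line.startsWith("data:")) continue; // non-data lines pass through
      try {
        const event = JSON.parse(line.slice(5).trim());
        // message_start carries input tokens; message_delta carries the
        // running output token count.
        input += event.message?.usage?.input_tokens ?? 0;
        if (event.usage?.output_tokens != null) output = event.usage.output_tokens;
      } catch {
        // "[DONE]" sentinels and partial frames are forwarded uncounted
      }
    }
  }
  return { input, output };
}

const usage = countTokens([
  'data: {"type":"message_start","message":{"usage":{"input_tokens":12}}}\n',
  'data: {"type":"message_delta","usage":{"output_tokens":40}}\n',
  "data: [DONE]\n",
]);
console.log(usage); // { input: 12, output: 40 }
```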
### Live Dashboard

Dark-themed web dashboard showing agent status, costs, call rates, errors, and circuit breaker states. Updates via WebSocket every second.
### Audit Logging

Every request is logged with agent ID, model, provider, tokens (in/out), cost, latency, and HTTP status. Query by agent or time range, or export as JSON.
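A single audit entry might look like this. The shape and the numbers are illustrative, not the exact on-disk schema:

```json
{"ts": "2025-01-01T03:00:00Z", "agent": "researcher", "model": "claude-sonnet-4-20250514", "provider": "anthropic", "tokens_in": 1200, "tokens_out": 340, "cost": 0.0087, "latency_ms": 1430, "status": 200}
```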
### Multi-Provider Routing

Route different agents to different providers. Run your planner on Claude Opus, your coder on GPT-4o, and your summarizer on Haiku -- all through one gateway.
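A mixed fleet in `gateway.yaml` might look like this (agent names and model IDs here are illustrative; budgets and rate limits omitted for brevity):

```yaml
agents:
  planner:
    model: claude-opus-4-20250514
    provider: anthropic
  coder:
    model: gpt-4o
    provider: openai
  summarizer:
    model: claude-haiku-4-5-20251001
    provider: anthropic
```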
## Configuration Reference

```yaml
gateway:
  port: 8080                      # Proxy server port
  host: 0.0.0.0                   # Bind address

agents:
  <agent-id>:
    model: string                 # Default model for this agent
    provider: string              # "anthropic" or "openai"
    api_key: string               # Optional per-agent API key
    budget:
      max_cost_per_hour: number
      max_cost_per_day: number    # Optional
      max_cost_total: number      # Optional lifetime cap
    rate_limit: string            # Format: "<count>/<unit>" (e.g. "100/min")
    circuit_breaker:
      error_threshold: number     # Errors before tripping (default: 5)
      reset_timeout: number       # ms before retrying (default: 30000)
      half_open_requests: number  # Successes needed to close (default: 3)
    tags: string[]                # Optional labels

policies:
  - name: string
    condition: string             # e.g. "error_rate > 0.5"
    action: string                # "pause", "deny", "alert", "throttle"
    params: object                # Action-specific parameters

dashboard:
  port: 4040
  host: 0.0.0.0

audit:
  log_file: string                # Path to the JSONL audit log
  max_entries: number             # Max in-memory entries
```

### Policy Condition Fields

| Field | Type | Description |
|---|---|---|
| `error_rate` | number | Error ratio in the current hour (0 to 1) |
| `cost_this_hour` | number | Dollar cost in the current hour |
| `cost_today` | number | Dollar cost today |
| `total_cost` | number | Lifetime dollar cost |
| `calls_this_hour` | number | API calls in the current hour |
| `call_count` | number | Lifetime API calls |
| `error_count` | number | Lifetime errors |
| `status` | string | Agent status (`active`/`paused`/`idle`) |
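A condition such as `cost_this_hour > 50.0` is a comparison of one of these fields against a literal. A minimal evaluator sketch in TypeScript; the gateway's actual expression grammar may be richer, so this is only a model of the idea:

```typescript
// Mirrors the policy condition fields documented above.
interface AgentState {
  error_rate: number;
  cost_this_hour: number;
  cost_today: number;
  total_cost: number;
  calls_this_hour: number;
  call_count: number;
  error_count: number;
  status: string;
}

// Evaluates simple "<field> <op> <value>" conditions against agent state.
function evaluate(condition: string, state: AgentState): boolean {
  const m = condition.match(/^(\w+)\s*(>=|<=|==|>|<)\s*(\S+)$/);
  if (!m) throw new Error(`unparseable condition: ${condition}`);
  const [, field, op, raw] = m;
  const left = state[field as keyof AgentState];
  if (op === "==") return String(left) === raw; // string fields like `status`
  const l = Number(left);
  const r = Number(raw);
  switch (op) {
    case ">":  return l > r;
    case ">=": return l >= r;
    case "<":  return l < r;
    default:   return l <= r;
  }
}

const state: AgentState = {
  error_rate: 0.6, cost_this_hour: 12, cost_today: 40, total_cost: 300,
  calls_this_hour: 80, call_count: 5000, error_count: 48, status: "active",
};
console.log(evaluate("error_rate > 0.5", state));      // true
console.log(evaluate("cost_this_hour > 50.0", state)); // false
```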
## HTTP API

The gateway exposes management endpoints alongside the proxy:

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/api/status` | GET | All agent statuses |
| `/api/agents/:id/pause` | POST | Pause an agent |
| `/api/agents/:id/resume` | POST | Resume an agent |
| `/api/budget/:id` | PUT | Update agent budget |
| `/api/logs` | GET | Recent audit entries |
## Development

```bash
# Clone and install
git clone https://github.com/tobySolutions/agentgateway.git
cd agentgateway
npm install

# Build all packages
npm run build

# Development mode (watch)
npm run dev
```

## License

MIT