feat: Agent Registry channel plugin (Pilot Protocol P2P)#16

Open
guiramos wants to merge 67 commits into work from feat/agent-registry-channel

guiramos commented Mar 29, 2026

Agent Registry Channel Plugin

Full OpenClaw channel plugin for P2P agent discovery via Agent Registry.

Architecture

  • discovery.ts — AgentRegistryClient: register, deregister, search, heartbeat, listAgents
  • monitor.ts — Pilot Protocol subscriber (readline JSON stream, exponential backoff)
  • send.ts — Pilot Protocol sender (pilotctl send)
  • daemon.ts — Pilot daemon lifecycle management
  • config.ts — Channel config adapter (hostname, registryUrl, registryApiKey, pilotPort)
  • index.ts — Full ChannelPlugin wiring: outbound, gateway methods, heartbeat, status

Code Review Fixes (10 issues)

  1. ✅ Capabilities query param: append() loop for repeated params
  2. isConfigured: checks hostname instead of registryUrl
  3. ✅ Deregister on stopAccount via registryClients Map
  4. ✅ Heartbeat re-registration after 3 failures
  5. pilotPort read from config
  6. ✅ Removed unused execFile import from send.ts
  7. ✅ ENOENT race condition: event-driven spawn detection
  8. checkReady only checks Pilot if pilotPort is configured
  9. ✅ Client reuse via registryClients Map in gateway methods
  10. registryApiKey support (X-Registry-Key header)
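Fixes 1 and 10 both affect how the discovery client builds a registry request. The sketch below shows both together. It assumes a hypothetical /api/v1/agents/search path and uses the q parameter name from a later review fix. buildSearchUrl and buildHeaders are illustrative names, not the plugin's actual functions.

```typescript
// Hypothetical search input; the real client's parameter names may differ.
interface SearchParams {
  query?: string;
  capabilities?: string[];
  limit?: number;
}

function buildSearchUrl(registryUrl: string, params: SearchParams): string {
  // ASSUMPTION: the search endpoint path is illustrative, not taken from the PR.
  const url = new URL("/api/v1/agents/search", registryUrl);
  if (params.query) url.searchParams.set("q", params.query);
  // Fix 1: one repeated key per capability (?capabilities=a&capabilities=b)
  // instead of a single joined "a,b" value.
  for (const cap of params.capabilities ?? []) {
    url.searchParams.append("capabilities", cap);
  }
  if (params.limit !== undefined) url.searchParams.set("limit", String(params.limit));
  return url.toString();
}

function buildHeaders(registryApiKey?: string): Record<string, string> {
  const headers: Record<string, string> = { Accept: "application/json" };
  // Fix 10: send the key only when one is configured.
  if (registryApiKey) headers["X-Registry-Key"] = registryApiKey;
  return headers;
}
```

With repeated keys, the server receives a list. FastAPI, for example, collects repeated keys into a list-typed query parameter. A joined string arrives as one value.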

Entrypoint

  • Auto-enables plugin + channel on container startup
  • Injects registryApiKey from AGENT_REGISTRY_API_KEY env var
  • Handles existing configs (won't overwrite)

Platform Extension

  • platform-config.json, package.json, README.md for deployment

Testing

Deployed to staging. Agent 'teste-workspace' registered, heartbeat active, search by text/capability works, auth enforced.

Create extensions/agent-registry/ with:
- ChannelPlugin implementation (id: agent-registry)
- Types: PilotPeer, AgentRegistryConfig, Inbound/OutboundMessage
- Config adapter with placeholder resolveAllowFrom/resolveDefaultTo
- Daemon management stubs (start/stop/status)
- Discovery API client stubs (search/register/heartbeat)
- Monitor stub (inbound Pilot listener)
- Send stub (outbound Pilot messaging)
- Plugin manifest (openclaw.plugin.json)

No existing files modified; all implementations are placeholders.

daemon.ts:
- startDaemon: spawn pilot-daemon with hostname/port, parse address from stdout
- stopDaemon: SIGTERM → 5s timeout → SIGKILL, cleanup refs
- getDaemonStatus: parse pilotctl info --json output
- getDaemonAddress: convenience wrapper for address
- isPilotInstalled: check pilot-daemon in PATH
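The stopDaemon escalation (SIGTERM, wait 5 s, then SIGKILL) can be written against a minimal process interface, which keeps the sketch independent of any runtime. A Node ChildProcess exposes exitCode, signalCode, kill() and once("exit"), so it should fit this interface. KillableProcess and stopGracefully are hypothetical names.

```typescript
// Minimal surface of a child process needed for a graceful stop.
interface KillableProcess {
  exitCode: number | null;
  signalCode: string | null;
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  once(event: "exit", listener: () => void): unknown;
}

type StopOutcome = "already-exited" | "SIGTERM" | "SIGKILL";

function stopGracefully(proc: KillableProcess, graceMs = 5_000): Promise<StopOutcome> {
  // Nothing to do if the process already exited normally or via a signal.
  if (proc.exitCode !== null || proc.signalCode !== null) {
    return Promise.resolve("already-exited");
  }
  return new Promise((resolve) => {
    let escalated = false;
    // If the grace period elapses, force the process down.
    const timer = setTimeout(() => {
      escalated = true;
      proc.kill("SIGKILL");
    }, graceMs);
    // Register the exit listener before signalling so a fast exit is not missed.
    proc.once("exit", () => {
      clearTimeout(timer);
      resolve(escalated ? "SIGKILL" : "SIGTERM");
    });
    proc.kill("SIGTERM");
  });
}
```

The monitor's clean stop (SIGTERM, 3 s, SIGKILL) follows the same pattern with graceMs set to 3000.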
monitor.ts:
- Subscribe to pilotctl JSON stream for inbound messages
- Newline-delimited JSON parsing via readline
- Exponential backoff reconnection (configurable)
- Lifecycle callbacks: onConnected, onDisconnected, onError
- Clean stop with SIGTERM → 3s → SIGKILL
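In monitor.ts, each line from readline (for example, the "line" events of readline.createInterface({ input: child.stdout })) can be parsed independently. Reconnects wait for an exponentially growing delay. The option names below (initialDelayMs, maxDelayMs, factor) are hypothetical because the real configuration keys are not shown here.

```typescript
// Hypothetical backoff options.
interface BackoffOptions {
  initialDelayMs: number;
  maxDelayMs: number;
  factor: number;
}

// Delay before reconnect attempt `attempt` (0-based): initial * factor^attempt, capped at max.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.maxDelayMs, opts.initialDelayMs * opts.factor ** attempt);
}

// Parses one line of newline-delimited JSON. Blank or malformed lines return null,
// so a single bad line does not tear down the subscription.
function parseJsonLine(line: string): unknown {
  const trimmed = line.trim();
  if (trimmed === "") return null;
  try {
    return JSON.parse(trimmed);
  } catch {
    return null;
  }
}
```

After a successful reconnect, the attempt counter would normally reset to 0 (onConnected is a natural place for this). A stream that drops once then starts again from the initial delay instead of the capped one.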

send.ts:
- sendMessage: send via pilotctl to hostname
- sendMessageToAddress: send directly to Pilot address
- broadcastMessage: parallel sends via Promise.allSettled
- Configurable timeout (15s default)
- JSON response parsing with UUID fallback
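broadcastMessage's parallel sends via Promise.allSettled, combined with the 15 s default timeout, might look like the sketch below. SendFn stands in for the pilotctl-backed sender. SendFn, withTimeout, and BroadcastResult are all hypothetical names.

```typescript
// Hypothetical sender: resolves with a message id, rejects on failure.
type SendFn = (hostname: string, text: string) => Promise<string>;

// Rejects if `promise` does not settle within `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

interface BroadcastResult {
  hostname: string;
  ok: boolean;
  messageId?: string;
  error?: string;
}

// Sends to every host in parallel; one failure or timeout does not affect the others.
async function broadcast(
  send: SendFn,
  hostnames: string[],
  text: string,
  timeoutMs = 15_000,
): Promise<BroadcastResult[]> {
  const settled = await Promise.allSettled(
    hostnames.map((h) => withTimeout(send(h, text), timeoutMs)),
  );
  return settled.map((r, i): BroadcastResult =>
    r.status === "fulfilled"
      ? { hostname: hostnames[i], ok: true, messageId: r.value }
      : {
          hostname: hostnames[i],
          ok: false,
          error: r.reason instanceof Error ? r.reason.message : String(r.reason),
        },
  );
}
```

Using allSettled keeps one unreachable peer from rejecting the whole broadcast. withTimeout only stops waiting for a result. A real sender would also need to stop the pilotctl child when the timeout fires.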
discovery.ts:
- AgentRegistryClient class with full CRUD (register, deregister, search, heartbeat, getAgent, listAgents)
- Native fetch(), structured ok/error results
- agentToPeer converter, backward-compat legacy exports
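"Native fetch(), structured ok/error results" means callers branch on a result object instead of catching exceptions. The sketch below passes the fetch implementation in as a parameter so tests can substitute a fake. requestJson, Result, and FetchLike are illustrative names, not the client's actual API.

```typescript
// Minimal subset of a fetch Response used here.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
  text(): Promise<string>;
}

type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<ResponseLike>;

// Callers check `ok` instead of wrapping every call in try/catch.
type Result<T> = { ok: true; data: T } | { ok: false; error: string; status?: number };

async function requestJson<T>(
  fetchImpl: FetchLike,
  url: string,
  init?: Parameters<FetchLike>[1],
): Promise<Result<T>> {
  try {
    const res = await fetchImpl(url, init);
    if (!res.ok) {
      // Non-2xx response: keep the status and body text for logging.
      return { ok: false, status: res.status, error: `HTTP ${res.status}: ${await res.text()}` };
    }
    return { ok: true, data: (await res.json()) as T };
  } catch (err) {
    // Network-level failures (refused connection, DNS, "fetch failed") end up here.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```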

index.ts:
- Outbound adapter wired to send.ts
- Gateway startAccount: register in registry, start monitor, 60s heartbeat
- Gateway stopAccount: cleanup monitor + heartbeat interval
- 3 gateway HTTP methods (status, peers, send)
- Status adapter with live daemon check
- Heartbeat adapter validates config + daemon state
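Combining the 60 s heartbeat with review fix 4 (re-register after 3 failures) requires only a small counter. The client interface and HeartbeatTracker below are hypothetical. In the plugin, tick() would be driven by setInterval(..., 60_000), and the interval would be cleared in stopAccount.

```typescript
// Hypothetical client surface; the real AgentRegistryClient may differ.
interface HeartbeatClient {
  heartbeat(agentId: string): Promise<{ ok: boolean }>;
}

// Counts consecutive heartbeat failures and re-registers after `maxFailures` in a row.
class HeartbeatTracker {
  private failures = 0;
  private readonly client: HeartbeatClient;
  private readonly reRegister: () => Promise<void>;
  private readonly maxFailures: number;

  constructor(client: HeartbeatClient, reRegister: () => Promise<void>, maxFailures = 3) {
    this.client = client;
    this.reRegister = reRegister;
    this.maxFailures = maxFailures;
  }

  async tick(agentId: string): Promise<"ok" | "failed" | "re-registered"> {
    let ok: boolean;
    try {
      ok = (await this.client.heartbeat(agentId)).ok;
    } catch {
      ok = false; // a thrown error counts as a failed heartbeat
    }
    if (ok) {
      this.failures = 0;
      return "ok";
    }
    this.failures += 1;
    if (this.failures < this.maxFailures) return "failed";
    this.failures = 0;
    await this.reRegister(); // e.g. deregister, then registerAgent() again
    return "re-registered";
  }
}
```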
guiramos added 25 commits March 28, 2026 20:50
…d README

- platform-config.json: enables plugin + sets channel via dot-path patches
- package.json: version 0.1.0 for manifest generation
- README.md: architecture diagram, module docs, deploy instructions, status checklist

Documents the three extensions (butley-api, agent-registry, lossless-claw),
their delivery methods (baked vs platform sync), and when to use each.

Copy extensions/agent-registry to dist/extensions/ alongside butley-api
so it gets loaded by OpenClaw at runtime.

Injects plugin config into openclaw.json on container startup.
Uses host.docker.internal:8001 as the default registry URL.

- Config adapter now reads from channels.entries.agent-registry.accounts
- Entrypoint creates both the plugin entry and the channel account
- Account config includes registryUrl for API access

- startMonitor now rejects immediately with ENOENT if pilotctl is not found
- startAccount catches the error and logs info about discovery-only mode
- Agent still registers in registry and heartbeats work
- P2P messaging is disabled but discovery works

- #1: Fix capabilities query param, using append() for repeated params instead of join()
- #4: Fix isConfigured — require hostname instead of registryUrl (default localhost OK)
- #5: Add deregister on stopAccount — store registryClient+agentId per account
- #7: Heartbeat re-registration — after 3 failures, deregister + re-register
- #10: Read pilotPort from config in resolveAccount
- #11: Remove unused execFile import from send.ts
- #12: Fix ENOENT race condition — use spawn event instead of setTimeout(500ms)
- #13: heartbeat checkReady — only require Pilot daemon if pilotPort configured
- #17: Verified daemon.ts imports — both spawn and execFile are used (no change)
- #18: Client reuse (store registryClient per account, reuse in gateway methods)
- Also: Add API key support (registryApiKey config + X-Registry-Key header)
- Also: Add ok/error fields to SearchResult type

…try URL

- resolve_registry_url() replaces localhost/127.0.0.1 with 172.17.0.1
- Skips agent-registry setup entirely when AGENT_REGISTRY_URL is not set
- Fixes existing configs with empty/localhost URLs on restart
- Prevents 'fetch failed' errors in containers without registry access

… hacks

- Pre-check isPilotInstalled() before attempting startMonitor
- No more ENOENT errors or auto-restart loops when Pilot is missing
- setStatus connected:true after registry registration (not after monitor)
- Remove localhost→Docker bridge URL rewriting from entrypoint
- .env.staging must use the correct Docker-reachable URL directly

The gateway auto-restarts the channel whenever startAccount resolves. The channel must
block on abortSignal (like all other channels do) to stay alive. Previously
startAccount returned immediately after setup, causing 10 restart attempts.
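After this fix, startAccount should settle only when the gateway aborts it. The sketch below assumes a hypothetical startAccount(abortSignal, setup) shape, where setup performs registration and returns a cleanup function. The real gateway context carries more fields than this.

```typescript
// Resolves once `signal` aborts (immediately if it already has).
function waitForAbort(signal: AbortSignal): Promise<void> {
  if (signal.aborted) return Promise.resolve();
  return new Promise((resolve) => {
    signal.addEventListener("abort", () => resolve(), { once: true });
  });
}

// Hypothetical shape: `setup` registers the agent, starts the heartbeat, and returns a
// cleanup function. The promise stays pending until abort, so the gateway does not
// read an early return as an exit and restart the channel.
async function startAccount(
  abortSignal: AbortSignal,
  setup: () => Promise<() => Promise<void>>,
): Promise<void> {
  const cleanup = await setup();
  try {
    await waitForAbort(abortSignal);
  } finally {
    await cleanup(); // e.g. clear the heartbeat interval and deregister
  }
}
```

Putting cleanup in finally ties teardown to the abort itself, the same moment the gateway would otherwise call stopAccount.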
- Add convex-client.ts to read networkDiscoverable from Convex metadata
- startAccount checks Convex before registering (idle when private)
- Add agent-registry/refresh gateway method for real-time toggle
- Extract registerAgent/deregisterAgent/startHeartbeat helpers
- Frontend POST /network triggers refresh via orchestrator proxy

- search: find assistants by query or capabilities
- get_agent: get details about a specific assistant
- contact: placeholder for Pilot Protocol messaging (not yet available)

The tool is registered via api.registerTool in plugin.register().

- get_agent: increase search limit to 25 to avoid missing an exact hostname match
- search: clamp limit to 1-50 range, cap capabilities to 20 max
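The search action's limit and capability caps can be applied before the request is built. SearchInput and sanitizeSearchInput are hypothetical names. The default limit of 10 is an assumption because the commit does not state one.

```typescript
// Hypothetical input shape for the agent_network tool's search action.
interface SearchInput {
  query?: string;
  capabilities?: string[];
  limit?: number;
}

const MIN_LIMIT = 1;
const MAX_LIMIT = 50;
const MAX_CAPABILITIES = 20;
const DEFAULT_LIMIT = 10; // ASSUMPTION: the commit does not state a default

function sanitizeSearchInput(input: SearchInput): Required<SearchInput> {
  // Missing or non-numeric limits fall back to the default; fractions are truncated.
  const requested = Number.isFinite(input.limit) ? Math.trunc(input.limit as number) : DEFAULT_LIMIT;
  return {
    query: input.query ?? "",
    // Clamp into [MIN_LIMIT, MAX_LIMIT] so a tool call cannot request an unbounded page.
    limit: Math.min(MAX_LIMIT, Math.max(MIN_LIMIT, requested)),
    // Keep only the first MAX_CAPABILITIES entries; the rest are dropped.
    capabilities: (input.capabilities ?? []).slice(0, MAX_CAPABILITIES),
  };
}
```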

Per Codex review (issues #1 and #2)
guiramos added 30 commits March 30, 2026 00:11
- Add pilot-daemon symlink in Dockerfile
- Replace pilotctl daemon start with direct pilot-daemon invocation
- Listen on PILOT_PORT to match Docker port mapping (fixes port mismatch bug)
- Add identity file verification before daemon start
- Support fixed endpoint mode via PILOT_PUBLIC_IP env var
- Fallback to STUN mode if PUBLIC_IP not set

Fixes Docker NAT issues where STUN fails and containers register with localhost.

The Pilot Protocol Registry cannot resolve hostnames via external lookup, so the
contact action was changed to:
1. Look up target's pilot_node_id from Agent Registry
2. Use 'pilotctl send-message <node_id>' instead of 'pilotctl send <hostname>'

This fixes 'cannot resolve hostname' errors when sending messages.

FastAPI returns a 307 redirect for the trailing slash, and the redirected request
was not reaching the endpoint. Remove the trailing slash to hit the endpoint directly.

Allows assistants to poll their inbox for messages from other agents.
An optional 'clear' parameter removes messages after reading.

Removed the line that deleted /root/.pilot/* on startup.
Identity is now persisted via a volume mount from the orchestrator.

The .pilot directory is mounted as a volume for identity persistence,
so binaries placed there during build get hidden by the mount.

- Add pollIntervalSeconds config (default 300s / 5 min)
- Poll loop checks pilotctl pending + inbox
- Inject handshake notifications as normal messages
- Inject first message with system event (introduction)
- Add reject/untrust actions to agent_network tool
- Make introduction required for handshake action
- Add Convex helpers for handshake persistence

Fixes from review:
- notificationSent flag prevents duplicate notifications
- deliver callback is no-op for handshake notifications
- Single Convex lookup for inbox messages
- Clear inbox only after successful processing
- Log Convex sync errors instead of swallowing
- Explicit node_id guards in tool actions
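The poll loop combines with review fixes #5 (tracking processed messages) and #18 (the isPolling mutex) as sketched below. Injected dependencies stand in for pilotctl and Convex. The string key is a simple stand-in for a message hash. All names other than executePollCycle and isPolling are hypothetical.

```typescript
// Hypothetical inbox message; `from` is a Pilot address string.
interface InboxMessage {
  from: string;
  data: string;
  timestamp: string;
}

interface PollDeps {
  readInbox(): Promise<InboxMessage[]>;
  deliver(msg: InboxMessage): Promise<void>;
  clearInbox(): Promise<void>;
}

class InboxPoller {
  private isPolling = false;
  private readonly processed = new Set<string>();
  private readonly deps: PollDeps;

  constructor(deps: PollDeps) {
    this.deps = deps;
  }

  // Identifies a message so it is skipped if an earlier clear failed.
  private key(m: InboxMessage): string {
    return `${m.from}|${m.timestamp}|${m.data}`;
  }

  // Returns the number of newly delivered messages, or -1 if a cycle was already running.
  async executePollCycle(): Promise<number> {
    if (this.isPolling) return -1; // overlapping interval tick: skip it
    this.isPolling = true;
    try {
      const messages = await this.deps.readInbox();
      let delivered = 0;
      for (const m of messages) {
        const k = this.key(m);
        if (this.processed.has(k)) continue;
        await this.deps.deliver(m); // if this throws, the inbox is not cleared
        this.processed.add(k);
        delivered += 1;
      }
      // Clear only after every message was handled.
      if (messages.length > 0) await this.deps.clearInbox();
      return delivered;
    } finally {
      this.isPolling = false;
    }
  }
}
```

In this sketch the processed set grows without bound. A long-running implementation would need to cap it or age entries out.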
…ocol

Bug 1: Session routing now uses real hostnames
- Added resolveHostnameForNode() to resolve hostname from node_id
- Checks: from_hostname → Convex handshake → pilotctl peers → fallback
- Sessions now created as 'agent:main:agent-registry:default:direct:teste-workspace'
  instead of 'node-0:0000.0000.37D8'

Bug 2: Outbound adapter sends replies via Pilot Protocol
- Added hostnameToNodeId cache, populated when inbox messages arrive
- sendText now uses 'pilotctl send-message <node_id>' for known peers
- Falls back to hostname-based send for unknown peers

…API for hostname

- Parse inbox msg.from (Pilot address like 0:0000.0000.37D8) to extract node_id
- Use Agent Registry API (/api/v1/agents/?pilot_node_id=N) to resolve hostname
- Falls back to Convex handshake, then node-<id>
- Now sessions are properly named (e.g., teste-workspace instead of node-0:...)
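Parsing msg.from and resolving a hostname might look like the sketch below. The regex follows the 0:xxxx.xxxx.xxxx format from review fix #8. Treating the last hex group as an integer node_id is an assumption. So is the HostnameLookup signature. You would pass in the lookup functions in order: the Agent Registry query by pilot_node_id, then the Convex handshake record.

```typescript
// Matches a Pilot address such as "0:0000.0000.37D8".
const PILOT_ADDRESS = /^(\d+):([0-9A-Fa-f]{4})\.([0-9A-Fa-f]{4})\.([0-9A-Fa-f]{4})$/;

// ASSUMPTION: node_id is the integer value of the last hex group (review fix #2
// says only the last group is used). Returns null for anything that is not an address.
function parseNodeIdFromPilotAddress(address: string): number | null {
  const m = PILOT_ADDRESS.exec(address.trim());
  return m ? parseInt(m[4], 16) : null;
}

// Hypothetical lookup: resolves a hostname, or null if this source does not know the node.
type HostnameLookup = (nodeId: number) => Promise<string | null>;

// Tries each source in order and falls back to node-<id>.
async function resolveHostnameForNode(nodeId: number, lookups: HostnameLookup[]): Promise<string> {
  for (const lookup of lookups) {
    try {
      const hostname = await lookup(nodeId);
      if (hostname) return hostname;
    } catch {
      // A failing source should not block the next one.
    }
  }
  return `node-${nodeId}`;
}
```

For example, "0:0000.0000.37D8" parses to 14296 (0x37D8). If no lookup succeeds, the session name falls back to node-14296.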
…ions

Without this, all agent-registry messages were going to the main session.
Now each peer agent gets its own isolated session.

CRITICAL:
- #1: Fix InboxMessage.from type from number to string (Pilot address)
- #2: Fix parseNodeIdFromPilotAddress to extract only last hex group
- #3: Rebuild hostnameToNodeId cache from Convex on startup
- #4: Fix AgentNetworkConfig → AgentRegistryConfig type reference
- #5: Track processed message hashes to prevent re-processing on clear failure

HIGH:
- #7: Change default pollIntervalSeconds from 300 to 15
- #8: Fix startDaemon regex to match Pilot address format (0:xxxx.xxxx.xxxx)
- #9: Use registerAgent() for re-registration to preserve enriched metadata
- #10: Fix search query param from 'query' to 'q'
- #12: handleRefresh supports optional accountId, refreshes all if omitted
- #13: Move spawn import to top-level, remove 7 inline dynamic imports
- #14: Mark inbox tool action as debug-only (poll loop handles processing)

MEDIUM:
- #15: Add installation_id to PilotPeer interface and agentToPeer
- #16: Use exact /by-hostname/ endpoint for get_agent action
- #17: Add LRU-style eviction to hostnameToNodeId (max 1000 entries)
- #18: Add mutex (isPolling flag) to executePollCycle
- #19: Add lifecycle comments to startDaemon/stopDaemon (container vs local)
- #20: Pass gatewayToken in fetchNetworkMetadata Convex query
- #21: Wrap sendText fallback in try/catch with error handling
- #23: Document sendMessage JSON vs raw text design intent
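Review fix #17 caps hostnameToNodeId at 1000 entries. One way to get LRU behaviour is to rely on Map's insertion order. LruMap is a hypothetical name, not the plugin's class.

```typescript
// Bounded map that evicts the least recently used entry once `maxEntries` is exceeded.
// Map iterates in insertion order, so deleting and re-inserting a key marks it as most recent.
class LruMap<K, V> {
  private readonly map = new Map<K, V>();
  private readonly maxEntries: number;

  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key) as V;
    this.map.delete(key); // refresh recency
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }

  get size(): number {
    return this.map.size;
  }
}
```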
The installations:get query is public and doesn't accept gatewayToken.
Adding it caused ArgumentValidationError: Object contains extra field.

The Pilot Protocol daemon causes excessive CPU consumption, so agent-to-agent
communication uses an HTTP relay via the Agent Registry until the issue is
resolved upstream.

The openclaw cron list --json command may print config warnings to
stdout before the JSON payload. This caused json.loads() to fail,
which made bootstrap.py skip the duplicate check and create multiple
system-context-harvester crons on each container restart.

Now extracts the JSON portion by finding the first { or [ character.
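The fix lives in bootstrap.py (Python). To keep all examples in one language, here is the same idea in TypeScript. This variant is slightly more defensive than stopping at the first { or [: if parsing from one bracket fails (say, on a warning line that starts with [WARN]), it retries from the next one. extractJson is a hypothetical name.

```typescript
// Returns the first JSON value found in `stdout`, trying each "{" or "[" as a start point.
// Returns null if no candidate parses.
function extractJson(stdout: string): unknown {
  for (let i = 0; i < stdout.length; i++) {
    const ch = stdout[i];
    if (ch !== "{" && ch !== "[") continue;
    try {
      return JSON.parse(stdout.slice(i));
    } catch {
      // Not valid JSON from this position (e.g. a "[WARN]" prefix); try the next bracket.
    }
  }
  return null;
}
```

This still assumes the JSON payload runs to the end of stdout. Trailing non-JSON output would make every candidate fail, and the function would return null.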
