feat: Agent Registry channel plugin (Pilot Protocol P2P) #16
Open
Create extensions/agent-registry/ with:
- ChannelPlugin implementation (id: agent-registry)
- Types: PilotPeer, AgentRegistryConfig, Inbound/OutboundMessage
- Config adapter with placeholder resolveAllowFrom/resolveDefaultTo
- Daemon management stubs (start/stop/status)
- Discovery API client stubs (search/register/heartbeat)
- Monitor stub (inbound Pilot listener)
- Send stub (outbound Pilot messaging)
- Plugin manifest (openclaw.plugin.json)
No existing files modified. All placeholder implementations.
- startDaemon: spawn pilot-daemon with hostname/port, parse address from stdout
- stopDaemon: SIGTERM → 5s timeout → SIGKILL, cleanup refs
- getDaemonStatus: parse pilotctl info --json output
- getDaemonAddress: convenience wrapper for address
- isPilotInstalled: check pilot-daemon in PATH
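The SIGTERM → timeout → SIGKILL sequence described for stopDaemon can be sketched as follows. This is a minimal illustration, not the plugin's code: `StoppableProcess` and `stopGracefully` are hypothetical names, and the interface lists only the two methods the sketch needs (a Node child process handle exposes both).

```typescript
// Hypothetical minimal shape of a process handle; only what the sketch uses.
interface StoppableProcess {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  once(event: "exit", listener: () => void): unknown;
}

// Ask the process to exit, then force-kill it if it is still alive after
// graceMs. Resolves with the last signal that was sent.
function stopGracefully(
  proc: StoppableProcess,
  graceMs: number = 5000,
): Promise<"SIGTERM" | "SIGKILL"> {
  return new Promise((resolve) => {
    let lastSignal: "SIGTERM" | "SIGKILL" = "SIGTERM";

    // Escalate only if the grace period elapses without an exit event.
    const timer = setTimeout(() => {
      lastSignal = "SIGKILL";
      proc.kill("SIGKILL");
    }, graceMs);

    // Register the exit listener before sending any signal.
    proc.once("exit", () => {
      clearTimeout(timer);
      resolve(lastSignal);
    });

    proc.kill("SIGTERM");
  });
}
```

Registering the exit listener before sending SIGTERM avoids missing a very fast exit, and clearing the timer on exit keeps the escalation from firing against a process that has already gone.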
monitor.ts:
- Subscribe to pilotctl JSON stream for inbound messages
- Newline-delimited JSON parsing via readline
- Exponential backoff reconnection (configurable)
- Lifecycle callbacks: onConnected, onDisconnected, onError
- Clean stop with SIGTERM → 3s → SIGKILL
send.ts:
- sendMessage: send via pilotctl to hostname
- sendMessageToAddress: send directly to Pilot address
- broadcastMessage: parallel sends via Promise.allSettled
- Configurable timeout (15s default)
- JSON response parsing with UUID fallback
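As an illustration of the configurable exponential backoff mentioned for monitor.ts, the delay before each reconnection attempt could be computed like this. The option names and default values below are assumptions chosen for the example; the commit message does not state the real ones.

```typescript
// Hypothetical option names and defaults; not taken from the plugin.
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  maxMs: number; // upper bound on any single delay
  factor: number; // multiplier applied per failed attempt
}

const defaultBackoff: BackoffOptions = { baseMs: 1000, maxMs: 30000, factor: 2 };

// attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
function backoffDelay(attempt: number, opts: BackoffOptions = defaultBackoff): number {
  const uncapped = opts.baseMs * Math.pow(opts.factor, attempt);
  return Math.min(opts.maxMs, uncapped);
}
```

With these assumed defaults the delays grow as 1s, 2s, 4s, and so on, until they reach the 30s cap. A reconnect loop would reset the attempt counter once a connection succeeds.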
discovery.ts:
- AgentRegistryClient class with full CRUD (register, deregister, search, heartbeat, getAgent, listAgents)
- Native fetch(), structured ok/error results
- agentToPeer converter, backward-compat legacy exports
index.ts:
- Outbound adapter wired to send.ts
- Gateway startAccount: register in registry, start monitor, 60s heartbeat
- Gateway stopAccount: cleanup monitor + heartbeat interval
- 3 gateway HTTP methods (status, peers, send)
- Status adapter with live daemon check
- Heartbeat adapter validates config + daemon state
…d README
- platform-config.json: enables plugin + sets channel via dot-path patches
- package.json: version 0.1.0 for manifest generation
- README.md: architecture diagram, module docs, deploy instructions, status checklist
Documents the three extensions (butley-api, agent-registry, lossless-claw), their delivery methods (baked vs platform sync), and when to use each.
Copy extensions/agent-registry to dist/extensions/ alongside butley-api so it gets loaded by OpenClaw at runtime.
Injects plugin config into openclaw.json on container startup. Uses host.docker.internal:8001 as default registry URL.
- Config adapter now reads from channels.entries.agent-registry.accounts
- Entrypoint creates both plugin entry and channel account
- Account config includes registryUrl for API access
- startMonitor now rejects immediately with ENOENT if pilotctl not found
- startAccount catches the error and logs info about discovery-only mode
- Agent still registers in registry and heartbeats work
- P2P messaging is disabled but discovery works
- #1: Fix capabilities query param: use append() for repeated params instead of join()
- #4: Fix isConfigured: require hostname instead of registryUrl (default localhost OK)
- #5: Add deregister on stopAccount: store registryClient+agentId per account
- #7: Heartbeat re-registration: after 3 failures, deregister + re-register
- #10: Read pilotPort from config in resolveAccount
- #11: Remove unused execFile import from send.ts
- #12: Fix ENOENT race condition: use spawn event instead of setTimeout(500ms)
- #13: heartbeat checkReady: only require Pilot daemon if pilotPort configured
- #17: Verified daemon.ts imports: both spawn and execFile are used (no change)
- #18: Client reuse: store registryClient per account, reuse in gateway methods
- Also: Add API key support (registryApiKey config + X-Registry-Key header)
- Also: Add ok/error fields to SearchResult type
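The difference between append() and join() in fix #1 can be seen with the standard URLSearchParams API. This is a small sketch; the function names are hypothetical, and only the `capabilities` parameter name comes from the commit message.

```typescript
// Repeated params: one "capabilities" entry per value.
function buildCapabilitiesQuery(capabilities: string[]): string {
  const params = new URLSearchParams();
  for (const capability of capabilities) {
    params.append("capabilities", capability);
  }
  return params.toString();
}

// The join() form, shown for contrast: one entry holding a comma-separated
// string, which a server expecting a repeated list param sees as one value.
function buildJoinedQuery(capabilities: string[]): string {
  const params = new URLSearchParams();
  params.set("capabilities", capabilities.join(","));
  return params.toString();
}

console.log(buildCapabilitiesQuery(["chat", "search"]));
// prints "capabilities=chat&capabilities=search"
console.log(buildJoinedQuery(["chat", "search"]));
// prints "capabilities=chat%2Csearch"
```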
…try URL
- resolve_registry_url() replaces localhost/127.0.0.1 with 172.17.0.1
- Skips agent-registry setup entirely when no AGENT_REGISTRY_URL set
- Fixes existing configs with empty/localhost URLs on restart
- Prevents 'fetch failed' errors in containers without registry access
… hacks
- Pre-check isPilotInstalled() before attempting startMonitor
- No more ENOENT errors or auto-restart loops when Pilot is missing
- setStatus connected:true after registry registration (not after monitor)
- Remove localhost→Docker bridge URL rewriting from entrypoint
- .env.staging must use correct Docker-reachable URL directly
Gateway auto-restarts when startAccount resolves/returns. The channel must block on abortSignal (like all other channels do) to stay alive. Previously startAccount returned immediately after setup, causing 10 restart attempts.
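Blocking on the abort signal can be sketched as below. The names `untilAborted` and `runAccount`, and the setup/teardown callbacks, are hypothetical stand-ins for whatever the gateway passes to startAccount; only the idea of awaiting the signal comes from the commit message.

```typescript
// Resolves once the signal is aborted (immediately if it already is).
function untilAborted(signal: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal.aborted) {
      resolve();
      return;
    }
    signal.addEventListener("abort", () => resolve(), { once: true });
  });
}

// Hypothetical account runner: do setup, stay pending until shutdown is
// requested, then clean up. The returned promise settles only after abort,
// so a supervisor that restarts on resolution does not loop.
async function runAccount(
  signal: AbortSignal,
  setup: () => Promise<void>,
  teardown: () => Promise<void>,
): Promise<void> {
  await setup();
  try {
    await untilAborted(signal);
  } finally {
    await teardown();
  }
}
```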
- Add convex-client.ts to read networkDiscoverable from Convex metadata
- startAccount checks Convex before registering (idle when private)
- Add agent-registry/refresh gateway method for real-time toggle
- Extract registerAgent/deregisterAgent/startHeartbeat helpers
- Frontend POST /network triggers refresh via orchestrator proxy
- search: find assistants by query or capabilities
- get_agent: get details about a specific assistant
- contact: placeholder for Pilot Protocol messaging (not yet available)
Tool registered via api.registerTool in plugin.register()
- Add pilot-daemon symlink in Dockerfile
- Replace pilotctl daemon start with direct pilot-daemon invocation
- Listen on PILOT_PORT to match Docker port mapping (fixes port mismatch bug)
- Add identity file verification before daemon start
- Support fixed endpoint mode via PILOT_PUBLIC_IP env var
- Fallback to STUN mode if PUBLIC_IP not set
Fixes Docker NAT issues where STUN fails and containers register with localhost.
Pilot Protocol Registry cannot resolve hostnames via external lookup. Changed contact action to:
1. Look up target's pilot_node_id from Agent Registry
2. Use 'pilotctl send-message <node_id>' instead of 'pilotctl send <hostname>'
This fixes 'cannot resolve hostname' errors when sending messages.
FastAPI returns 307 redirect for trailing slash, and fetch() doesn't follow redirects by default. Remove trailing slash to hit endpoint directly.
Allows assistants to poll their inbox for messages from other agents. Optional 'clear' parameter to remove messages after reading.
Removed the line that deleted /root/.pilot/* on startup. Identity is now persisted via volume mount from orchestrator.
The .pilot directory is mounted as a volume for identity persistence, so binaries placed there during build get hidden by the mount.
…tallation_id unknown
- Add pollIntervalSeconds config (default 300s / 5 min)
- Poll loop checks pilotctl pending + inbox
- Inject handshake notifications as normal messages
- Inject first message with system event (introduction)
- Add reject/untrust actions to agent_network tool
- Make introduction required for handshake action
- Add Convex helpers for handshake persistence
Fixes from review:
- notificationSent flag prevents duplicate notifications
- deliver callback is no-op for handshake notifications
- Single Convex lookup for inbox messages
- Clear inbox only after successful processing
- Log Convex sync errors instead of swallowing
- Explicit node_id guards in tool actions
…ocol
Bug 1: Session routing now uses real hostnames
- Added resolveHostnameForNode() to resolve hostname from node_id
- Checks: from_hostname → Convex handshake → pilotctl peers → fallback
- Sessions now created as 'agent:main:agent-registry:default:direct:teste-workspace' instead of 'node-0:0000.0000.37D8'
Bug 2: Outbound adapter sends replies via Pilot Protocol
- Added hostnameToNodeId cache, populated when inbox messages arrive
- sendText now uses 'pilotctl send-message <node_id>' for known peers
- Falls back to hostname-based send for unknown peers
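The ordered lookup described for resolveHostnameForNode() (try each source in turn, then fall back) can be sketched generically. Everything named below is hypothetical: the real resolvers query the inbox message, Convex, and pilotctl, which this example replaces with plain async functions supplied by the caller.

```typescript
// A resolver returns a hostname, or undefined when it has no answer.
type HostnameResolver = (nodeId: number) => Promise<string | undefined>;

// Try resolvers in order and return the first non-empty hostname.
// A resolver that throws is treated as "no answer" so that one failing
// source does not block the ones after it.
async function resolveHostnameSketch(
  nodeId: number,
  resolvers: HostnameResolver[],
  fallback: string,
): Promise<string> {
  for (const resolver of resolvers) {
    try {
      const hostname = await resolver(nodeId);
      if (hostname) return hostname;
    } catch {
      // Ignore and continue with the next source.
    }
  }
  return fallback;
}
```

A caller would pass the sources in priority order, for example the hostname carried on the message first, then the persisted handshake, then the peer list.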
…API for hostname
- Parse inbox msg.from (Pilot address like 0:0000.0000.37D8) to extract node_id
- Use Agent Registry API (/api/v1/agents/?pilot_node_id=N) to resolve hostname
- Falls back to Convex handshake, then node-<id>
- Now sessions are properly named (e.g., teste-workspace instead of node-0:...)
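Extracting a node_id from a Pilot address could look like the sketch below. It rests on two assumptions that the commit message does not confirm: that the node_id is the final dot-separated group of the address, and that the group is hexadecimal. The function name is hypothetical and this is not the plugin's parser.

```typescript
// Assumed address shape: "<network>:<hex4>.<hex4>.<hex4>", for example
// "0:0000.0000.37D8". Returns undefined for anything that does not match.
function parseNodeIdSketch(address: string): number | undefined {
  const match = /^\d+:(?:[0-9A-Fa-f]{4}\.)*([0-9A-Fa-f]{4})$/.exec(address.trim());
  const lastGroup = match?.[1];
  if (lastGroup === undefined) return undefined;
  // Assumption: the last group is the node_id written in hexadecimal.
  return parseInt(lastGroup, 16);
}

console.log(parseNodeIdSketch("0:0000.0000.37D8")); // prints 14296
console.log(parseNodeIdSketch("teste-workspace")); // prints undefined
```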
…ions
Without this, all agent-registry messages were going to the main session. Now each peer agent gets its own isolated session.
CRITICAL:
- #1: Fix InboxMessage.from type from number to string (Pilot address)
- #2: Fix parseNodeIdFromPilotAddress to extract only last hex group
- #3: Rebuild hostnameToNodeId cache from Convex on startup
- #4: Fix AgentNetworkConfig → AgentRegistryConfig type reference
- #5: Track processed message hashes to prevent re-processing on clear failure
HIGH:
- #7: Change default pollIntervalSeconds from 300 to 15
- #8: Fix startDaemon regex to match Pilot address format (0:xxxx.xxxx.xxxx)
- #9: Use registerAgent() for re-registration to preserve enriched metadata
- #10: Fix search query param from 'query' to 'q'
- #12: handleRefresh supports optional accountId, refreshes all if omitted
- #13: Move spawn import to top-level, remove 7 inline dynamic imports
- #14: Mark inbox tool action as debug-only (poll loop handles processing)
MEDIUM:
- #15: Add installation_id to PilotPeer interface and agentToPeer
- #16: Use exact /by-hostname/ endpoint for get_agent action
- #17: Add LRU-style eviction to hostnameToNodeId (max 1000 entries)
- #18: Add mutex (isPolling flag) to executePollCycle
- #19: Add lifecycle comments to startDaemon/stopDaemon (container vs local)
- #20: Pass gatewayToken in fetchNetworkMetadata Convex query
- #21: Wrap sendText fallback in try/catch with error handling
- #23: Document sendMessage JSON vs raw text design intent
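One common way to get LRU-style eviction like that in #17 is to rely on the insertion order of a JavaScript Map. The class below is a sketch of that approach under the stated 1000-entry cap; the class name and methods are hypothetical, and the plugin may implement its eviction differently.

```typescript
// Bounded cache that evicts the least recently used entry when full.
// Map iterates keys in insertion order, so the first key is the oldest.
class BoundedCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = 1000) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    // Delete first so an update also refreshes the key's position.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A hostname-to-node-id cache would then be `new BoundedCache<number>()`, with reads through get() keeping active peers from being evicted.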
The installations:get query is public and doesn't accept gatewayToken. Adding it caused ArgumentValidationError: Object contains extra field.
Pilot Protocol daemon causes excessive CPU consumption. Using HTTP relay via Agent Registry for agent-to-agent communication until the issue is resolved upstream.
The openclaw cron list --json command may print config warnings to
stdout before the JSON payload. This caused json.loads() to fail,
which made bootstrap.py skip the duplicate check and create multiple
system-context-harvester crons on each container restart.
Now extracts the JSON portion by finding the first { or [ character.
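The extraction step can be sketched as follows. The actual fix lives in bootstrap.py; this is the same idea expressed in TypeScript, the plugin's language, with a hypothetical function name.

```typescript
// Skip any text printed before the JSON payload by starting at the first
// "{" or "[" in the output, then parse from there.
function extractJsonPayload(stdout: string): unknown {
  const starts = [stdout.indexOf("{"), stdout.indexOf("[")].filter((i) => i >= 0);
  if (starts.length === 0) {
    throw new Error("no JSON payload found in command output");
  }
  return JSON.parse(stdout.slice(Math.min(...starts)));
}

const noisy = 'Config warning: unknown key "foo"\n[{"name":"system-context-harvester"}]';
console.log(extractJsonPayload(noisy));
```

One limitation of this approach: a warning line that itself contains "{" or "[" would be mistaken for the start of the payload, so parsing would still fail in that case.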
Agent Registry Channel Plugin
Full OpenClaw channel plugin for P2P agent discovery via Agent Registry.
Architecture
Code Review Fixes (10 issues)
- append() loop for repeated params
- isConfigured: checks hostname not registryUrl
- Deregister on stopAccount via registryClients Map
- pilotPort read from config
- Remove unused execFile import from send.ts
- checkReady only checks Pilot if pilotPort configured
- Reuse registryClients Map in gateway methods
- registryApiKey support (X-Registry-Key header)
Entrypoint
- registryApiKey from AGENT_REGISTRY_API_KEY env var
Platform Extension
- platform-config.json, package.json, README.md for deployment
Testing
Deployed to staging. Agent 'teste-workspace' registered, heartbeat active, search by text/capability works, auth enforced.