feat: add context-harvester platform tool with cron integration #15
Merged
- Add context-harvester-setup.md with complete installation guide
- Include reference implementation files in docker/agent/context-harvester-files/
  - harvester.ts, package.json, tsconfig.json, run.sh
  - .gitignore and README
- Add Dockerfile.snippet showing integration pattern
- Context Harvester runs as cron job from agent workspace (~/clawd/tools/context-harvester/)
- Cron config goes in agent's openclaw.json (not in repo)
- Script generates live CONTEXT.md snapshot every 30 minutes
- No manifest needed; runs via cron, not as extension
7b09fd2 to 5cba556
- Add context-harvester tool installation to Dockerfile
- Files copied to /root/workspace/tools/context-harvester/
- Dependencies (openai, dotenv) installed at build time
- run.sh made executable
- Add .env.example template for API key
- Tool ready to run via cron at container startup
- Reduces setup overhead: just set DEEPSEEK_API_KEY at runtime
Add idempotent cron provisioning that runs on every container start:
- Lists existing cron jobs via 'openclaw cron list --json'
- Creates missing system-* crons via 'openclaw cron add' CLI flags
- Skips if context-harvester tool not installed in workspace
- Waits for gateway readiness before attempting provisioning

This replaces the invalid cron.jobs config approach (cron is not a recognized key in openclaw.json schema; jobs are managed via API).
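The idempotency described above comes down to comparing the jobs that already exist with the jobs that should exist and creating only the difference. A minimal sketch of that comparison follows. The `CronJob` shape, the `findMissingJobs` helper, and the job name `system-context-harvester` are assumptions for illustration; the real output of `openclaw cron list --json` and the flags of `openclaw cron add` are not shown in this PR.

```typescript
// Hypothetical shape of one listed job; the real JSON output may differ.
interface CronJob {
  name: string;
}

// Return the desired jobs that are not present yet, compared by name.
// Running provisioning again after the missing jobs were created yields
// an empty list, which is what makes the step safe on every container start.
function findMissingJobs(existing: CronJob[], desired: CronJob[]): CronJob[] {
  return desired.filter(
    (want) => !existing.some((have) => have.name === want.name),
  );
}

// Hypothetical job name following the "system-*" convention from the commit.
const desiredJobs: CronJob[] = [{ name: "system-context-harvester" }];

const firstStart = findMissingJobs([], desiredJobs);
const secondStart = findMissingJobs(firstStart, desiredJobs);
console.log(firstStart.length, secondStart.length);
```

Matching by name keeps the check cheap and avoids duplicating a job when the container restarts.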
- DeepSeek returns JSON with contextMd + followUps array
- FollowUps are optional, based on DeepSeek judgment
- 2h cooldown per session to prevent spam
- Session keys with cooldown info included in prompt
- Cron agent processes followUps via sessions_send
- .followup-state.json tracks last followUp per session
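The per-session cooldown can be sketched as a small pure check over the tracked state. The state layout below (session key mapped to the epoch milliseconds of the last followUp) and the helper names are assumptions; the actual contents of .followup-state.json are not shown in this PR.

```typescript
// 2h cooldown, as stated in the commit message.
const COOLDOWN_MS = 2 * 60 * 60 * 1000;

// Hypothetical state layout: session key -> epoch ms of the last followUp.
type FollowUpState = Record<string, number>;

// A session may receive a followUp if it never had one,
// or if at least COOLDOWN_MS have passed since the last one.
function canFollowUp(state: FollowUpState, sessionKey: string, nowMs: number): boolean {
  const last: number | undefined = state[sessionKey];
  return last === undefined || nowMs - last >= COOLDOWN_MS;
}

// Return a new state with the followUp time recorded for the session.
function recordFollowUp(state: FollowUpState, sessionKey: string, nowMs: number): FollowUpState {
  return { ...state, [sessionKey]: nowMs };
}
```

Passing `nowMs` in explicitly, rather than calling `Date.now()` inside, keeps the check deterministic and easy to test.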
The sync-platform.sh script was doing an early exit when the stamp version matched, skipping BOTH file sync AND config patches. But the config may have been reset/overwritten by the backend provisioner during redeploy, losing plugin entries (e.g. lossless-claw).

Split into two phases:
1. File sync: still controlled by stamp, skipped when version matches
2. Config patches: ALWAYS run, idempotent, ensures openclaw.json has required plugin entries regardless of provisioner overwrites
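The two phases can be sketched as two independent decisions. The actual script is a shell script; the TypeScript below only illustrates the logic. The `PlatformConfig` shape with a flat `plugins` list is an assumption, since the real openclaw.json layout is not shown in this PR, and both function names are hypothetical.

```typescript
// Hypothetical config shape; the real openclaw.json layout may differ.
interface PlatformConfig {
  plugins: string[];
}

// Phase 1: file sync is needed only when the stamp differs from the target version.
function needsFileSync(stampVersion: string | null, targetVersion: string): boolean {
  return stampVersion !== targetVersion;
}

// Phase 2: always runs. Adds required plugin entries that are missing and
// leaves existing entries untouched, so repeated runs change nothing.
function patchConfig(config: PlatformConfig, required: string[]): PlatformConfig {
  const missing = required.filter((name) => config.plugins.indexOf(name) === -1);
  return { ...config, plugins: [...config.plugins, ...missing] };
}

// A provisioner overwrite that dropped the plugin entry is repaired even
// though the stamp matches and the file sync is skipped.
const overwritten: PlatformConfig = { plugins: [] };
const skipFiles = !needsFileSync("1.0.0", "1.0.0");
const repaired = patchConfig(overwritten, ["lossless-claw"]);
console.log(skipFiles, repaired.plugins);
```

Keeping the patch step independent of the stamp is what lets it recover from overwrites that happen between releases.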
FollowUps injected via sessions_send appear as user messages in the target session, which is confusing. Disable until a proper delivery mechanism (system event or clearly marked automation) is implemented. DeepSeek still generates followUp suggestions (for logging), but they are not acted upon.
The openclaw sessions CLI crashes in some container builds due to duplicate qmd command registration. Add fallback: scan session JSONL files by mtime (last 60min) to discover active sessions directly. CLI is still tried first; fallback only activates when CLI fails.
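A sketch of the CLI-first, mtime-fallback pattern is shown below. It is written as pure functions so it can be read without the surrounding tool: the `FileInfo` entries stand in for what a directory scan with Node's `fs.statSync` (which exposes `mtimeMs`) would produce, and `listViaCli` stands in for the CLI call. All names here are hypothetical, and for simplicity both paths return plain strings.

```typescript
// One scanned file; mtimeMs mirrors the field on Node's fs.Stats.
interface FileInfo {
  path: string;
  mtimeMs: number;
}

// 60 minutes, as stated in the commit message.
const ACTIVE_WINDOW_MS = 60 * 60 * 1000;

// Keep only .jsonl files modified within the window.
function recentSessionFiles(files: FileInfo[], nowMs: number): string[] {
  return files
    .filter((f) => /\.jsonl$/.test(f.path) && nowMs - f.mtimeMs <= ACTIVE_WINDOW_MS)
    .map((f) => f.path);
}

// Try the CLI first; fall back to the file scan only when the CLI throws.
function discoverSessions(
  listViaCli: () => string[],
  files: FileInfo[],
  nowMs: number,
): string[] {
  try {
    return listViaCli();
  } catch {
    return recentSessionFiles(files, nowMs);
  }
}
```

Because the fallback is reached only through the `catch`, builds where the CLI works behave exactly as before.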
…ory dir exists Container workspace is ~/workspace not ~/clawd. Also mkdir -p the memory directory before writing CONTEXT.md.
OpenClaw 2026.3.14 registers the qmd subcommand twice in register.subclis-*.js, causing all CLI commands to fail with "cannot add command qmd as already have command qmd". Added a build-time Python patch script that removes the duplicate entry. Safe to run on future versions without the bug (no-ops when only 1 entry found).
guiramos added a commit that referenced this pull request Mar 31, 2026
CRITICAL:
- #1: Fix InboxMessage.from type from number to string (Pilot address)
- #2: Fix parseNodeIdFromPilotAddress to extract only last hex group
- #3: Rebuild hostnameToNodeId cache from Convex on startup
- #4: Fix AgentNetworkConfig → AgentRegistryConfig type reference
- #5: Track processed message hashes to prevent re-processing on clear failure

HIGH:
- #7: Change default pollIntervalSeconds from 300 to 15
- #8: Fix startDaemon regex to match Pilot address format (0:xxxx.xxxx.xxxx)
- #9: Use registerAgent() for re-registration to preserve enriched metadata
- #10: Fix search query param from 'query' to 'q'
- #12: handleRefresh supports optional accountId, refreshes all if omitted
- #13: Move spawn import to top-level, remove 7 inline dynamic imports
- #14: Mark inbox tool action as debug-only (poll loop handles processing)

MEDIUM:
- #15: Add installation_id to PilotPeer interface and agentToPeer
- #16: Use exact /by-hostname/ endpoint for get_agent action
- #17: Add LRU-style eviction to hostnameToNodeId (max 1000 entries)
- openclaw#18: Add mutex (isPolling flag) to executePollCycle
- openclaw#19: Add lifecycle comments to startDaemon/stopDaemon (container vs local)
- openclaw#20: Pass gatewayToken in fetchNetworkMetadata Convex query
- openclaw#21: Wrap sendText fallback in try/catch with error handling
- openclaw#23: Document sendMessage JSON vs raw text design intent
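For the LRU-style eviction on hostnameToNodeId (max 1000 entries), one common way to get this behavior in TypeScript is to rely on the insertion order of `Map`. The sketch below is an illustration of that technique, not the code from the referenced commit; the `BoundedCache` class and its generic value type are assumptions.

```typescript
// A small cache that evicts the least recently used key once it exceeds maxEntries.
class BoundedCache<V> {
  private entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Hypothetical usage mirroring the limit named in the commit message.
const hostnameCache = new BoundedCache<string>(1000);
hostnameCache.set("example-host", "example-node-id");
```

A plain `Map` grows without bound in a long-running daemon; capping it trades an occasional cache miss for predictable memory use.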
Overview
This PR adds the Context Harvester tool — a TypeScript-based cron job that runs every 30 minutes to:
What's included
platform-tools/context-harvester/ — Complete tool with:
patches/context-harvester-cron-config.patch — Cron job config for Butley agents
CONTEXT-HARVESTER-CRON-CONFIG.md — Complete deployment documentation
How it works
The tool reads the JSONL files in ~/.openclaw/agents/main/sessions/, extracts messages from the last 60 minutes, and sends them to the DeepSeek Chat API along with the current CONTEXT.md. DeepSeek rewrites the entire file with only active topics, so completed items naturally disappear on the next run.
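The extraction step can be sketched as a filter over JSONL lines. The message shape below (`timestamp`, `role`, `text`) is an assumption made for illustration; the real session schema is not shown in this PR, and `recentMessages` is a hypothetical helper rather than a function from harvester.ts.

```typescript
// Hypothetical message shape; the real session JSONL schema may differ.
interface SessionMessage {
  timestamp: string; // ISO 8601
  role: string;
  text: string;
}

// Parse JSONL content and keep messages whose timestamp falls within
// the last windowMs milliseconds. Malformed lines are skipped.
function recentMessages(jsonl: string, nowMs: number, windowMs: number): SessionMessage[] {
  const result: SessionMessage[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // not valid JSON, ignore this line
    }
    const msg = parsed as Partial<SessionMessage> | null;
    if (
      !msg ||
      typeof msg.timestamp !== "string" ||
      typeof msg.role !== "string" ||
      typeof msg.text !== "string"
    ) {
      continue;
    }
    const sentAt = Date.parse(msg.timestamp);
    if (isNaN(sentAt)) continue;
    if (sentAt <= nowMs && nowMs - sentAt <= windowMs) {
      result.push({ timestamp: msg.timestamp, role: msg.role, text: msg.text });
    }
  }
  return result;
}
```

Calling it with `windowMs` set to `60 * 60 * 1000` matches the 60-minute window described above; the resulting messages would then be sent to the model together with the current CONTEXT.md.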
Benefits:
Testing
Currently running in production on the main Gee agent (~/clawd/), where it runs successfully every 30 minutes and generates accurate context snapshots.
Integration