feat: add context-harvester platform tool with cron integration #15

Merged
guiramos merged 10 commits into work from feat/context-harvester on Mar 28, 2026
@guiramos

Overview

This PR adds the Context Harvester tool — a TypeScript-based cron job that runs every 30 minutes to:

  1. Scan active OpenClaw sessions
  2. Extract recent messages
  3. Send them to DeepSeek for summarization
  4. Generate a live CONTEXT.md snapshot briefing

What's included

  • platform-tools/context-harvester/ — Complete tool with:

    • harvester.ts: Main script (session scanning, DeepSeek integration, retries)
    • run.sh: Executable wrapper
    • .env: API key configuration
    • Full README and tsconfig
  • patches/context-harvester-cron-config.patch — Cron job config for Butley agents

  • CONTEXT-HARVESTER-CRON-CONFIG.md — Complete deployment documentation

How it works

The tool reads the JSONL files in ~/.openclaw/agents/main/sessions/, extracts messages from the last 60 minutes, and sends them to the DeepSeek Chat API along with the current CONTEXT.md. DeepSeek rewrites the entire file with only active topics, so completed items naturally disappear on the next run.
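
The extraction step can be sketched roughly as follows. This is a minimal illustration, not the code in harvester.ts: the `SessionMessage` shape (`role`, `content`, ISO `timestamp`) and the function names are assumptions, since the PR does not show the session JSONL schema.

```typescript
// Hypothetical message shape; the real session JSONL schema is not shown in this PR.
interface SessionMessage {
  role: string;
  content: string;
  timestamp: string; // assumed to be an ISO 8601 string
}

const WINDOW_MS = 60 * 60 * 1000; // the 60-minute window from the description

// Type guard so malformed or unrelated JSON objects are ignored.
function isSessionMessage(value: unknown): value is SessionMessage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.role === "string" &&
    typeof v.content === "string" &&
    typeof v.timestamp === "string"
  );
}

// Parse JSONL text and keep only messages that fall inside the window.
function extractRecentMessages(jsonl: string, now: number = Date.now()): SessionMessage[] {
  const recent: SessionMessage[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // skip malformed lines instead of aborting the whole run
    }
    if (!isSessionMessage(parsed)) continue;
    const ts = Date.parse(parsed.timestamp);
    if (!Number.isNaN(ts) && ts <= now && now - ts <= WINDOW_MS) {
      recent.push(parsed);
    }
  }
  return recent;
}
```

Passing `now` as a parameter keeps the filter deterministic, which makes it easy to check against fixed timestamps.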

Benefits:

  • Live briefing: agents always know what's happening
  • Concise: ~2-5KB snapshot, not an accumulating log
  • Privacy: filters out private sessions
  • Resilient: retries 3x on API failures, handles missing/empty state
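
The retry behaviour in the last bullet could look something like the sketch below. It reads "3x" as three total attempts and uses a linearly growing delay; both are assumptions, and `withRetries` is a hypothetical helper name, not necessarily what harvester.ts uses.

```typescript
// Retry an async operation up to `attempts` times, waiting longer after each failure.
// Assumption: "retries 3x" means three total attempts.
async function withRetries<T>(
  operation: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Wait baseDelayMs, then 2x baseDelayMs, and so on, before the next attempt.
        await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

The API call would be wrapped as `withRetries(() => callDeepSeek(prompt))`, where `callDeepSeek` stands in for whatever function performs the request.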

Testing

Currently running in production on the main Gee agent (~/clawd/), where it has run successfully every 30 minutes and generated accurate context snapshots.

Integration

  1. Add cron config from patch to agent's openclaw.json
  2. Set DEEPSEEK_API_KEY env var
  3. Deploy

- Add context-harvester-setup.md with complete installation guide
- Include reference implementation files in docker/agent/context-harvester-files/
  - harvester.ts, package.json, tsconfig.json, run.sh
  - .gitignore and README
- Add Dockerfile.snippet showing integration pattern
- Context Harvester runs as cron job from agent workspace (~/clawd/tools/context-harvester/)
- Cron config goes in agent's openclaw.json (not in repo)
- Script generates live CONTEXT.md snapshot every 30 minutes
- No manifest needed — runs via cron, not as extension

guiramos force-pushed the feat/context-harvester branch from 7b09fd2 to 5cba556 on March 27, 2026 02:35

- Add context-harvester tool installation to Dockerfile
- Files copied to /root/workspace/tools/context-harvester/
- Dependencies (openai, dotenv) installed at build time
- run.sh made executable
- Add .env.example template for API key
- Tool ready to run via cron at container startup
- Reduces setup overhead: just set DEEPSEEK_API_KEY at runtime

Add idempotent cron provisioning that runs on every container start:
- Lists existing cron jobs via 'openclaw cron list --json'
- Creates missing system-* crons via 'openclaw cron add' CLI flags
- Skips if context-harvester tool not installed in workspace
- Waits for gateway readiness before attempting provisioning

This replaces the invalid cron.jobs config approach (cron is not a
recognized key in openclaw.json schema — jobs are managed via API).
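
The idempotency of this provisioning step comes down to a set difference between the jobs that exist and the jobs that should exist. A minimal sketch, assuming the names of existing jobs have already been parsed from the `openclaw cron list --json` output (its format and the `openclaw cron add` flags are not shown in this PR, so they are left out); `DesiredCron`, `findMissingCrons`, and the job name used below are hypothetical:

```typescript
// Hypothetical description of a job the container expects to exist.
interface DesiredCron {
  name: string;     // e.g. a "system-*" job name
  schedule: string; // standard five-field cron expression
}

// Return only the desired jobs that are not provisioned yet.
// Running provisioning again after these are created yields an empty list.
function findMissingCrons(existingNames: string[], desired: DesiredCron[]): DesiredCron[] {
  const existing = new Set(existingNames);
  return desired.filter((job) => !existing.has(job.name));
}

// Illustrative job: every 30 minutes, matching the harvester's schedule.
const desiredCrons: DesiredCron[] = [
  { name: "system-context-harvester", schedule: "*/30 * * * *" },
];
```

On a fresh container `findMissingCrons([], desiredCrons)` returns the one job to create; on a restart where the job already exists it returns an empty list, so nothing is added twice.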

- DeepSeek returns JSON with contextMd + followUps array
- FollowUps are optional, based on DeepSeek's judgment
- 2h cooldown per session to prevent spam
- Session keys with cooldown info included in prompt
- Cron agent processes followUps via sessions_send
- .followup-state.json tracks last followUp per session
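
The 2h per-session cooldown can be sketched as a pure function over the tracked state. The `FollowUp` fields and the state layout (session key mapped to the epoch milliseconds of the last follow-up) are assumptions; the PR does not show the actual format of .followup-state.json.

```typescript
const COOLDOWN_MS = 2 * 60 * 60 * 1000; // 2h cooldown per session

// Assumed state layout: session key -> epoch ms of the last follow-up sent.
type FollowUpState = Record<string, number>;

// Hypothetical follow-up shape; the real followUps schema is not shown in this PR.
interface FollowUp {
  sessionKey: string;
  message: string;
}

// Keep only follow-ups whose session is outside the cooldown window and
// return the updated state without mutating the input.
function applyCooldown(
  followUps: FollowUp[],
  state: FollowUpState,
  now: number = Date.now(),
): { allowed: FollowUp[]; nextState: FollowUpState } {
  const nextState: FollowUpState = { ...state };
  const allowed: FollowUp[] = [];
  for (const followUp of followUps) {
    const last = nextState[followUp.sessionKey];
    if (last !== undefined && now - last < COOLDOWN_MS) continue; // still cooling down
    allowed.push(followUp);
    nextState[followUp.sessionKey] = now; // also blocks duplicates within one batch
  }
  return { allowed, nextState };
}
```

Persisting `nextState` after each run is what carries the cooldown across cron invocations.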

The sync-platform.sh script exited early when the stamp version
matched, skipping BOTH file sync AND config patches. But the
config may have been reset or overwritten by the backend provisioner
during redeploy, losing plugin entries (e.g. lossless-claw).

Split into two phases:
1. File sync — still controlled by stamp, skipped when version matches
2. Config patches — ALWAYS run, idempotent, ensures openclaw.json
   has required plugin entries regardless of provisioner overwrites
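
The two-phase split can be illustrated with the sketch below. The real logic lives in the sync-platform.sh shell script; this TypeScript version only models the control flow, and the `plugins` array in `AgentConfig` is a hypothetical stand-in for however openclaw.json actually stores plugin entries.

```typescript
// Hypothetical config shape; the real openclaw.json schema is not shown in this PR.
interface AgentConfig {
  plugins?: string[];
  [key: string]: unknown;
}

// Idempotent patch: add any missing plugin entries, leave everything else alone.
function ensurePlugins(config: AgentConfig, required: string[]): AgentConfig {
  const current = config.plugins ?? [];
  const missing = required.filter((name) => !current.includes(name));
  return missing.length === 0 ? config : { ...config, plugins: [...current, ...missing] };
}

// Phase 1 is gated by the version stamp; phase 2 always runs.
function runSync(
  installedStamp: string | null,
  targetVersion: string,
  syncFiles: () => void,
  patchConfig: () => void,
): { filesSynced: boolean; configPatched: boolean } {
  let filesSynced = false;
  if (installedStamp !== targetVersion) {
    syncFiles(); // file sync, skipped when the stamp already matches
    filesSynced = true;
  }
  patchConfig(); // config patches, run on every invocation
  return { filesSynced, configPatched: true };
}
```

Because `ensurePlugins` returns the config unchanged when nothing is missing, running the patch phase on every start is safe even when the provisioner has not touched the file.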

FollowUps injected via sessions_send appear as user messages in the
target session, which is confusing. Disable them until a proper delivery
mechanism (a system event or clearly marked automation) is implemented.

DeepSeek still generates followUp suggestions (for logging), but they
are not acted upon.

The openclaw sessions CLI crashes in some container builds due to
duplicate qmd command registration. Add a fallback: scan session JSONL
files by mtime (last 60 min) to discover active sessions directly.
CLI is still tried first; fallback only activates when CLI fails.
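
A rough sketch of this CLI-first, mtime-fallback discovery is shown below. The function names and the `FileInfo` shape are hypothetical, and the two listing callbacks are injected so the selection logic stays independent of how the CLI or the filesystem is actually queried.

```typescript
// Minimal file metadata needed for the fallback (hypothetical shape).
interface FileInfo {
  path: string;
  mtimeMs: number; // last-modified time in epoch milliseconds
}

// Fallback selection: .jsonl files modified within the window (default 60 minutes).
function recentSessionFiles(
  files: FileInfo[],
  now: number = Date.now(),
  windowMs: number = 60 * 60 * 1000,
): string[] {
  return files
    .filter((file) => file.path.endsWith(".jsonl") && now - file.mtimeMs <= windowMs)
    .map((file) => file.path);
}

// Try the CLI first; only scan files by mtime when the CLI throws.
function discoverSessions(
  listViaCli: () => string[],
  listFiles: () => FileInfo[],
  now: number = Date.now(),
): string[] {
  try {
    return listViaCli();
  } catch {
    return recentSessionFiles(listFiles(), now);
  }
}
```

In a Node.js script, `listFiles` could be backed by `fs.readdirSync` plus `fs.statSync`, whose `Stats` result exposes `mtimeMs`.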

…ory dir exists

The container workspace is ~/workspace, not ~/clawd. Also mkdir -p the
memory directory before writing CONTEXT.md.

OpenClaw 2026.3.14 registers the qmd subcommand twice in
register.subclis-*.js, causing all CLI commands to fail with
"cannot add command qmd as already have command qmd".

Added a build-time Python patch script that removes the duplicate
entry. Safe to run on future versions without the bug (no-ops
when only 1 entry found).

guiramos merged commit 68d4ebd into work on Mar 28, 2026
2 of 9 checks passed

guiramos deleted the feat/context-harvester branch on March 28, 2026 04:05

guiramos added a commit that referenced this pull request on Mar 31, 2026
CRITICAL:
- #1: Fix InboxMessage.from type from number to string (Pilot address)
- #2: Fix parseNodeIdFromPilotAddress to extract only last hex group
- #3: Rebuild hostnameToNodeId cache from Convex on startup
- #4: Fix AgentNetworkConfig → AgentRegistryConfig type reference
- #5: Track processed message hashes to prevent re-processing on clear failure

HIGH:
- #7: Change default pollIntervalSeconds from 300 to 15
- #8: Fix startDaemon regex to match Pilot address format (0:xxxx.xxxx.xxxx)
- #9: Use registerAgent() for re-registration to preserve enriched metadata
- #10: Fix search query param from 'query' to 'q'
- #12: handleRefresh supports optional accountId, refreshes all if omitted
- #13: Move spawn import to top-level, remove 7 inline dynamic imports
- #14: Mark inbox tool action as debug-only (poll loop handles processing)

MEDIUM:
- #15: Add installation_id to PilotPeer interface and agentToPeer
- #16: Use exact /by-hostname/ endpoint for get_agent action
- #17: Add LRU-style eviction to hostnameToNodeId (max 1000 entries)
- openclaw#18: Add mutex (isPolling flag) to executePollCycle
- openclaw#19: Add lifecycle comments to startDaemon/stopDaemon (container vs local)
- openclaw#20: Pass gatewayToken in fetchNetworkMetadata Convex query
- openclaw#21: Wrap sendText fallback in try/catch with error handling
- openclaw#23: Document sendMessage JSON vs raw text design intent