feat(hooks): fixed and adding various agent, messaging, system and compaction hooks#9761

Closed
vincentkoc wants to merge 70 commits into openclaw:main from vincentkoc:vk/otel-plugin-hooks

vincentkoc (Contributor) commented Feb 5, 2026

This PR adds and wires missing lifecycle hooks across OpenClaw (internal hooks and plugin hooks) to unlock observability, guardrails, and monitoring. It also patches CI/CD so the formatter check passes.

New/expanded internal hooks (HOOK.md)

  • session:compact:before / session:compact:after (compaction lifecycle)
  • session:prune (context pruning of tool outputs)
  • gateway:shutdown / gateway:pre-restart (gateway lifecycle)
  • agent:thinking:start / agent:thinking:end (model call boundary)
  • agent:response:start / agent:response:end (response generation boundary)
  • agent:tool:start / agent:tool:end (tool execution boundary)

Plugin hooks wired

  • before_compaction / after_compaction now called in the compaction pipeline
  • message_received / message_sending / message_sent wired into outbound delivery
  • before_tool_call / after_tool_call now receive toolCallId when available
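
One thing a toolCallId makes possible is pairing a before_tool_call with its matching after_tool_call. The sketch below measures tool duration that way. The context shape (toolName, toolCallId, timestampMs) and the ToolCallTimer class are hypothetical and only illustrate the correlation; the real plugin hook payloads may look different.

```typescript
// Hypothetical context shape; the real plugin hook payloads may differ.
interface ToolCallContext {
  toolName: string;
  toolCallId?: string; // only "when available", so it can be missing
  timestampMs: number;
}

class ToolCallTimer {
  private readonly started = new Map<string, number>();

  // Call from a before_tool_call handler.
  onBefore(ctx: ToolCallContext): void {
    if (ctx.toolCallId !== undefined) {
      this.started.set(ctx.toolCallId, ctx.timestampMs);
    }
  }

  // Call from an after_tool_call handler. Returns the duration in ms,
  // or undefined when the call cannot be correlated.
  onAfter(ctx: ToolCallContext): number | undefined {
    if (ctx.toolCallId === undefined) return undefined;
    const start = this.started.get(ctx.toolCallId);
    if (start === undefined) return undefined;
    this.started.delete(ctx.toolCallId);
    return ctx.timestampMs - start;
  }
}
```

Without the id, two overlapping calls to the same tool could not be told apart, which is why the timer gives up rather than guessing when toolCallId is absent.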

Fixes and other context improvements

  • The local formatter output did not match the CI/CD pnpm format check
  • Session keys are treated as case-insensitive identifiers across routing/storage
  • Message hook context now includes sessionKey/sessionId
  • Tool hook context now includes toolCallId
  • Compaction hook context now includes counts and token metadata
  • Upstream test failures in session-write-lock, plus an unhandled process.exit, caused by ec0728b35 (fix: release session locks on process termination [AI-assisted] #1962)
  • Upstream test failures in the dispatch-from-config onToolResult expectations, caused by 05b28c147 + ee1ec3fab (fix: wire onToolResult callback for verbose tool summaries #2022)
  • Upstream test drift in the Telegram native slash command count, from the same onToolResult behavior change in 05b28c147 + ee1ec3fab

Issues addressed

FYI

  • Compaction hooks currently report metadata only (not full message snapshots).
  • Pruning hook reports counts/tool names only (not pruned content).

Greptile Overview

Greptile Summary

This PR expands OpenClaw’s lifecycle hook surface area and wires plugin hooks into core workflows. Key changes include:

  • Adds internal hook emissions for session compaction (session:compact:before/after), context pruning (session:prune), gateway shutdown/restart (gateway:shutdown, gateway:pre-restart), and agent lifecycle boundaries (agent:thinking:*, agent:response:*, agent:tool:*).
  • Wires plugin hooks for compaction and outbound messaging (message_sending, message_sent) and extends tool hook context to include toolCallId.
  • Normalizes session keys as case-insensitive identifiers via a shared normalizeSessionKey helper and propagates sessionKey/sessionId through outbound delivery paths.
  • Adds test coverage around new hook lifecycles (compaction hooks, agent response/thinking hooks, gateway close hooks) and adjusts existing tests for async pruning.

Overall, the PR is centered on improving observability/guardrails by ensuring lifecycle hooks fire consistently across internal and plugin APIs, while also tightening session key handling for routing/storage consistency.
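
To make the case-insensitive session key idea concrete, here is a minimal sketch. It assumes normalization means trimming and lowercasing; the PR's actual normalizeSessionKey helper is not shown in this conversation and may do more. Both normalizeSessionKeySketch and SessionStore are hypothetical names.

```typescript
// Assumed behavior: trim and lowercase. The PR's real helper may differ.
function normalizeSessionKeySketch(key: string): string {
  return key.trim().toLowerCase();
}

// A store that normalizes on every read and write, so "Agent:Main" and
// "agent:main" resolve to the same entry.
class SessionStore<T> {
  private readonly entries = new Map<string, T>();

  set(key: string, value: T): void {
    this.entries.set(normalizeSessionKeySketch(key), value);
  }

  get(key: string): T | undefined {
    return this.entries.get(normalizeSessionKeySketch(key));
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Normalizing at the storage boundary, rather than at each call site, is one way to keep routing and storage from disagreeing about which keys are equal.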

Confidence Score: 3/5

  • This PR is moderately safe to merge, but has a couple of correctness risks around hook attribution and async extension handler semantics.
  • Most changes are additive hook wiring and test updates, but there are two areas that can cause real runtime behavioral issues: (1) internal hook events fabricating sessionKey-like values (can break attribution/routing assumptions for consumers), and (2) context pruning now relies on an async extension handler, which will break pruning if the host does not await Promises. The rest of the hook wiring appears guarded and best-effort with tests covering key lifecycles.
  • src/agents/pi-extensions/context-pruning/extension.ts, src/agents/pi-tools.before-tool-call.ts, src/agents/pi-embedded-runner/compact.ts

openclaw-barnacle bot added the docs, gateway, commands, and agents labels Feb 5, 2026

greptile-apps bot (Contributor) left a comment: 2 files reviewed, 2 comments

openclaw-barnacle bot removed the channel: telegram label Feb 6, 2026

greptile-apps bot (Contributor) left a comment: 2 files reviewed, 2 comments

vincentkoc (Contributor Author)

@greptileai

greptile-apps bot (Contributor) left a comment: 3 files reviewed, 3 comments

greptile-apps bot (Contributor) commented Feb 6, 2026

Additional Comments (2)

src/infra/outbound/deliver.ts
message_sent not fired

deliverOutboundPayloads() emits message_sent after sendTextChunks()/sendSignalTextChunks() and inside the media loop, but the early-return path for handler.sendPayload && payload.channelData only calls message_sent on success. If handler.sendPayload(payload) throws, the catch will run runMessageSentHook(attemptedSendContent, false, ...), but attemptedSendContent is initialized from the pre-hook payloadSummary.text and doesn’t track per-channelData payload content after message_sending mutation. This makes message_sent report stale content for failures on sendPayload channels.

Set attemptedSendContent after applying message_sending (and/or right before each send attempt) so failure hooks report the same content that was actually attempted.
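
A minimal sketch of that ordering follows. The names mirror the review comment, but Payload, deliverOne, and the hook signatures are hypothetical, and the code is synchronous for brevity where the real delivery path is presumably async.

```typescript
// Hypothetical types; names mirror the review comment, not the real code.
interface Payload {
  text: string;
  channelData?: unknown;
}
type SendingHook = (payload: Payload) => Payload;
type SentHook = (content: string, success: boolean) => void;

function deliverOne(
  payload: Payload,
  runMessageSending: SendingHook,
  send: (payload: Payload) => void,
  runMessageSent: SentHook,
): void {
  // Apply message_sending first, then record what will actually be sent.
  const finalPayload = runMessageSending(payload);
  const attemptedSendContent = finalPayload.text;
  try {
    send(finalPayload);
    runMessageSent(attemptedSendContent, true);
  } catch (err) {
    // The failure report carries the mutated content, not the pre-hook text.
    runMessageSent(attemptedSendContent, false);
    throw err;
  }
}
```

Because attemptedSendContent is captured after the hook runs, the success and failure reports always describe the same content that was handed to the sender.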

src/agents/pi-extensions/context-pruning/pruner.ts
Prune hook await changes

pruneContextMessages() is now async solely to await triggerInternalHook via emitPruneHook(). This makes every pruning pass wait on hook handlers, which can slow the agent loop and (if hooks hang) block prompt execution. Since the hook is best-effort telemetry, consider firing it without awaiting (e.g., void triggerInternalHook(...) with .catch(...)) or using a timeout so pruning remains deterministic.

This is especially relevant because pruning runs in the context extension on every turn when enabled.
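
The fire-and-forget option could look roughly like the sketch below. HookHandler, emitBestEffort, and pruneSync are hypothetical stand-ins for triggerInternalHook and pruneContextMessages, and the pruning rule (keep the last N messages) is only a placeholder.

```typescript
type PruneEvent = { pruned: number };
type HookHandler = (event: PruneEvent) => Promise<void>;

// Start the hook without awaiting it; report failures instead of propagating.
function emitBestEffort(
  handler: HookHandler,
  event: PruneEvent,
  onError: (err: unknown) => void,
): void {
  try {
    void handler(event).catch(onError);
  } catch (err) {
    // The handler threw synchronously before returning a Promise.
    onError(err);
  }
}

// Pruning stays synchronous: it never waits on hook handlers.
function pruneSync(
  messages: string[],
  keep: number,
  handler: HookHandler,
  onError: (err: unknown) => void,
): string[] {
  const kept = messages.slice(Math.max(0, messages.length - keep));
  emitBestEffort(handler, { pruned: messages.length - kept.length }, onError);
  return kept;
}
```

The trade-off is ordering: a consumer can no longer assume its handler finished before the next turn starts, which is usually acceptable for telemetry.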

vincentkoc (Contributor Author)

@greptileai

greptile-apps bot (Contributor) left a comment: 3 files reviewed, 3 comments

greptile-apps bot (Contributor) commented Feb 6, 2026

Additional Comments (2)

src/agents/pi-tools.before-tool-call.ts
Tool hook sessionKey misuse

resolveHookSessionKey() fabricates a sessionKey like tool:<agentId>:<toolCallId> when ctx.sessionKey is missing (src/agents/pi-tools.before-tool-call.ts:1704-1718). This makes internal agent:tool:* hooks look like they’re attached to a real session key and can collide across runs if toolCallIds repeat, breaking routing/attribution for any consumer that treats sessionKey as a persisted identifier. Consider using a clearly non-session key (e.g. run:<runId> like attempt.ts does) or leaving sessionKey empty and putting the fallback into context only.
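
As a rough illustration of the first suggestion, the sketch below falls back to a run-scoped key and flags the fallback in context. ToolHookCtx, ToolHookEvent, buildToolHookEvent, and the sessionKeyIsFallback flag are all hypothetical; only the run:<runId> form comes from the comment above.

```typescript
// Hypothetical shapes; the real hook context is not shown in this thread.
interface ToolHookCtx {
  sessionKey?: string;
  agentId: string;
  toolCallId: string;
  runId: string;
}

interface ToolHookEvent {
  sessionKey: string;
  context: { toolCallId: string; runId: string; sessionKeyIsFallback: boolean };
}

function buildToolHookEvent(ctx: ToolHookCtx): ToolHookEvent {
  const realKey = ctx.sessionKey;
  const hasRealKey = realKey !== undefined && realKey !== "";
  return {
    // A run-scoped fallback cannot be mistaken for a persisted session key,
    // and it does not depend on toolCallId being unique across runs.
    sessionKey: hasRealKey ? realKey : `run:${ctx.runId}`,
    context: {
      toolCallId: ctx.toolCallId,
      runId: ctx.runId,
      sessionKeyIsFallback: !hasRealKey,
    },
  };
}
```

Consumers that persist or route by sessionKey can then skip events where sessionKeyIsFallback is true.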

src/agents/pi-extensions/context-pruning/extension.ts
Async handler may be ignored

contextPruningExtension changed the api.on("context", ...) handler to async (src/agents/pi-extensions/context-pruning/extension.ts:1413-1416). If the underlying extension host expects a synchronous return value (common for hook-style APIs), returning a Promise can cause the pruned messages to be ignored (or applied too late), meaning pruning silently stops working in production. This needs confirmation against the ExtensionAPI.on contract; if it’s sync-only, keep the handler synchronous and avoid awaiting inside it. Does the pi-coding-agent ExtensionAPI.on("context") contract support async handlers (i.e., does it await Promises)? If not, we need to keep pruning synchronous and remove async/await here.
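
The failure mode is easy to reproduce in isolation. The sketch below models a hypothetical sync-only host that reads messages off the handler's return value immediately. It is not the pi-coding-agent implementation, whose behavior is exactly the open question here.

```typescript
type Message = { role: string; content: string };
type ContextResult = { messages: Message[] };
// The handler type allows a Promise so both cases can be shown side by side.
type ContextHandler = (messages: Message[]) => ContextResult | Promise<ContextResult>;

function hasMessages(value: unknown): value is ContextResult {
  return (
    typeof value === "object" &&
    value !== null &&
    Array.isArray((value as { messages?: unknown }).messages)
  );
}

// Hypothetical sync-only host: it inspects the return value right away and
// falls back to the original messages when no `messages` array is present.
function dispatchContextSync(handler: ContextHandler, messages: Message[]): Message[] {
  const result: unknown = handler(messages);
  return hasMessages(result) ? result.messages : messages;
}

const dropOldest = (m: Message[]): ContextResult => ({ messages: m.slice(1) });
const syncHandler: ContextHandler = (m) => dropOldest(m);
const asyncHandler: ContextHandler = async (m) => dropOldest(m);
```

With such a host, syncHandler prunes as expected, while asyncHandler returns a Promise that has no messages property, so the host silently keeps the unpruned list. A host that awaits the result would not have this problem.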

cpojer (Member) commented Feb 6, 2026

This is too big. Make small PRs. Stop spamming me.

vincentkoc (Contributor Author)

Moved to #16788

Labels

agents Agent runtime and tooling commands Command implementations docs Improvements or additions to documentation gateway Gateway runtime
