Skip to content

hooks: unify internal + plugin hook registries to fix 5 systemic bugs #30784

@mcaxtr

Description

@mcaxtr

Summary

The codebase has two independent hook systems with overlapping responsibilities and duplicated module-level singleton state. This architectural split has been the root cause of 30+ bugs over the past 2 months.

Problem to solve

Two hook systems, no shared infrastructure

  1. Internal hooks (src/hooks/internal-hooks.ts): module-level handlers Map, file-based HOOK.md discovery, single-arg handlers, 8 event types
  2. Plugin typed hooks (src/plugins/hooks.ts + hook-runner-global.ts): module-level globalHookRunner, typed 2-arg handlers with return values, 24 event types

Both singletons are broken by the Rolldown bundler (duplicated across output chunks). Three overlapping events (message_received, message_sent, gateway_start) are dispatched through both systems via manual dual-dispatch calls scattered across the codebase.

Five systemic bugs from the same root cause

  1. Bundler duplication: Module-level let globalHookRunner and const handlers = new Map() get duplicated across Rolldown output chunks. When code in chunk A registers a hook and code in chunk B dispatches, the hook silently doesn't fire.

  2. clearInternalHooks() wipes plugin hooks: clearInternalHooks() calls handlers.clear() which destroys everything — including hooks registered by plugins via registerInternalHook() in src/plugins/registry.ts:264. After gateway hot-reload, plugin hooks vanish.

  3. gateway:startup fires before hooks are loaded: In server-startup.ts:142-151, the startup hook fires via setTimeout(() => triggerInternalHook(...), 250) — a race condition that fires before or after hooks are loaded depending on disk speed.

  4. sessions.reset never fires before_reset: The gateway RPC handler at server-methods/sessions.ts:436-447 fires triggerInternalHook("command", ...) but never calls hookRunner.runBeforeReset(). The /new and /reset command path in commands-core.ts correctly fires both, but the gateway RPC path doesn't — so plugins lose memory extraction opportunities on gateway-initiated resets.

  5. after_compaction missing sessionKey: In pi-embedded-subscribe.handlers.compaction.ts:78, the after_compaction hook fires with {} as the context object, losing the sessionKey that before_compaction correctly passes.

Dual-dispatch maintenance burden

Every message event requires 15-30 lines of manual dual dispatch at each call site. There are currently 3 call sites for message_received and 3 for message_sent, each independently constructing the same event objects with slightly different field mappings. Adding a new overlapping event requires updating every call site — and missing one creates a silent bug.

Proposed solution

Unified hook registry

A single Map<string, HookRegistryEntry[]> backed by globalThis[Symbol.for("openclaw:hookRegistry")]. All handlers — file-based, plugin-typed, bundled — register here with source metadata.

Source-scoped clearing: clearHooksBySource(["bundled", "workspace", "managed", "config"]) preserves "plugin" handlers. This replaces handlers.clear() and prevents the wipe-everything bug.

Dispatch helpers for overlapping events

Co-located dispatch functions (emitMessageReceived, emitMessageSent, emitGatewayStartup, emitSessionResetHooks) that call both systems in one place. Each call site replaces 15-30 lines of manual dual dispatch with one function call. The "bridge" lives in one file (src/hooks/dispatch-unified.ts), not scattered across the codebase.

Fix hook-runner-global bundler duplication

Replace let globalHookRunner: HookRunner | null = null with globalThis[Symbol.for("openclaw:hookRunner")] — same lazy-init pattern already used in src/plugins/runtime.ts.

Alternatives considered

  • Merge into one hook system: Would require rewriting all 24 typed hook handlers and 4 bundled HOOK.md handlers. Too much churn for the same result.
  • Keep two systems, just fix the 5 bugs individually: Each fix is shallow and doesn't prevent the next dual-dispatch bug. The last 2 months show that independent fixes don't work when the architecture is the problem.
  • Event bus pattern: Over-engineered for the actual need. The two systems have different calling conventions (1-arg vs 2-arg, void vs returning) that an event bus would paper over with type erasure.

Impact

  • Affected: All hook users — plugins, HOOK.md workspace hooks, bundled hooks
  • Severity: High — hooks silently not firing, plugin state lost on reload
  • Frequency: Every gateway restart (bundler duplication), every hot-reload (clear wipes plugins), intermittent (startup race)
  • Consequence: Lost memory extraction (before_reset missing), hooks not firing after reload, startup hooks unreliable

Related issues and PRs

This supersedes/fixes the following competing approaches:

Root-cause fixes for:

Evidence

The Symbol.for() + globalThis pattern is already proven in this codebase:

  • src/plugins/runtime.ts:3-21 uses Symbol.for("openclaw.pluginRegistryState") on globalThis
  • This exact pattern survives Rolldown chunk splitting

Additional information

Implementation plan: unified registry (~80 LOC new), dispatch helpers (~80 LOC new), 9 modified files, no changes to HookRunner facade or existing hook handler signatures. All 24 typed hook type definitions remain unchanged. All 4 bundled handlers remain unchanged. Backward compatible.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions