Expose assembly provenance/cache policy and treat openai-codex cache windows as mutation-sensitive #534

@100yenadmin

Description

Summary

lossless-claw/LCM appears to assemble prompt-ready compact context correctly, but the current context-engine result surface is too coarse for the host to reason safely about assembly provenance, fallback mode, raw-history debt, and prompt-cache sensitivity. In the observed OpenClaw integration, this contributed to ambiguity around whether the assembled context should be prompt-authoritative and whether Codex prompt-cache windows should delay prompt-mutating deferred compaction.

This is a companion hardening issue to the OpenClaw host-side bug filed as openclaw/openclaw#74229. The primary live reset loop appears OpenClaw-side, but LCM can make the contract harder to misuse.

Environment observed

  • Installed plugin: @martian-engineering/[email protected]
  • Active OpenClaw model: openai-codex/gpt-5.5
  • Active context window: 258000
  • LCM config highlights:
    • freshTailCount=64
    • freshTailMaxTokens=16000
    • proactiveThresholdCompactionMode=deferred
    • cacheAwareCompaction.enabled=true
    • cacheAwareCompaction.cacheTTLSeconds=300
    • promptAwareEviction=false

Confidence matrix

  • 0.88 - issue: LCM assembly is intended to produce prompt-ready compact context, but the returned SDK surface does not expose enough metadata for hosts to distinguish authoritative assembly from fallback live context or emergency recovery.
  • 0.84 - issue: prompt-cache mutation sensitivity is still Anthropic/Claude-shaped in the current source lineage, while live telemetry shows openai-codex/gpt-5.5 has active hot cache signals.
  • 0.84 - live evidence: DB telemetry shows active openai-codex/gpt-5.5, cache_state=hot, retention=long, last_observed_cache_read=59904, and prompt token telemetry.
  • 0.75 - contributing cause: session reset/rotation in the host makes exact at-failure LCM attribution harder because the DB tracks the active sessionKey through reset rather than preserving a clean per-session assembly snapshot.
  • 0.70 - possible cause: the installed plugin is bundled/minified and does not preserve source-level provenance, making live debugging harder when multiple local PR worktrees exist.
  • 0.65 - possible cause: after a host reset, continuity depends on LCM summaries and fresh tail; if exact artifacts are not preserved in summaries or recall misses, the user experiences partial forgetfulness even though raw data exists in the DB.

Source evidence

LCM assembly is intended to build model-ready context:

  • src/assembler.ts defines assembled messages as ordered messages ready for the model.
  • The assembler builds context under tokenBudget, protects fresh tail, drops older items to fit, and returns chronological messages.
  • Summaries are converted into synthetic user messages with structured XML wrappers before being returned.
  • OpenClaw's SDK contract also states ContextEngine.assemble() returns ordered messages ready for the model.

The current result surface is coarse:

  • LCM returns only { messages, estimatedTokens } through the context-engine interface.
  • Richer internal information such as summary/raw counts, selection mode, fallback reason, cache-aware state, fresh-tail size, and raw-history debt is logged/debug-only rather than returned to the host.
  • ContextEngineInfo only exposes id/name/version plus broad capabilities like ownsCompaction and turnMaintenanceMode; there is no capability indicating whether assembly is prompt-authoritative, whether the result is a fallback, or what host precheck behavior is safe.
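To make the coarseness concrete, here is a hedged sketch of roughly what the current result surface gives a host to work with. The names (`ContextEngineResult`, `AssembledMessage`, `fitsWindow`) are illustrative, not the actual SDK identifiers:

```typescript
// Approximate shape of the current coarse result surface (illustrative names).
type AssembledMessage = { role: "user" | "assistant" | "system"; content: string };

type ContextEngineResult = {
  messages: AssembledMessage[];
  estimatedTokens: number;
  // Nothing here tells the host whether this is authoritative assembly,
  // a fallback to live context, or emergency recovery.
};

// With only a token estimate, a host precheck can do nothing smarter than
// size math against the context window:
function fitsWindow(result: ContextEngineResult, contextWindow: number): boolean {
  return result.estimatedTokens <= contextWindow;
}
```

With only this surface, a fallback-live result and an authoritative compact assembly are indistinguishable to the host, which is the root of the misclassification risk described below.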

Prompt-cache sensitivity appears too provider-specific:

  • Current source lineage has generic cache telemetry and hot-cache handling, but prompt-mutating deferred compaction delay is Anthropic/Claude-family-specific.
  • Live telemetry shows openai-codex/gpt-5.5 with active hot cache and retention=long, so Codex should likely be treated as mutation-sensitive when runtime telemetry says a cache window is active.
  • Cache-write-only usage should probably count as an active cache touch/window rather than defaulting to cold, when lastCacheTouchAt/retention are available.
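A minimal sketch of the classification suggested above, assuming telemetry fields shaped like the ones in this report (`cacheRead`, `cacheWrite`, `lastCacheTouchAt`, `retention` are assumptions, not LCM's actual schema):

```typescript
// Hedged sketch: classify whether telemetry indicates an active,
// mutation-sensitive prompt-cache window. Field names are assumptions.
type CacheTelemetry = {
  cacheRead: number;         // tokens read from the prompt cache
  cacheWrite: number;        // tokens written to the prompt cache
  lastCacheTouchAt?: number; // epoch ms of last observed cache activity
  retention: "long" | "short" | "none";
};

function isActiveCacheWindow(t: CacheTelemetry, nowMs: number, ttlMs: number): boolean {
  if (t.retention === "none") return false;
  // Cache-write-only usage still counts as an active touch, not "cold".
  const touched = t.cacheRead > 0 || t.cacheWrite > 0;
  const recent = t.lastCacheTouchAt !== undefined && nowMs - t.lastCacheTouchAt <= ttlMs;
  return touched && recent;
}
```

Under this sketch the observed telemetry (read=59904, write=0, retention=long, recent touch) classifies as an active window, which is the behavior this issue argues for.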

Live evidence

Read-only DB inspection showed:

conversations: 755
messages: 275264
summaries: 3558
context_items: 35596
conversation_compaction_telemetry: 201

Active main telemetry included:

provider=openai-codex
model=gpt-5.5
cache_state=hot
retention=long
last_observed_cache_read=59904
last_observed_cache_write=0
last_observed_prompt_token_count=60817
last_cache_touch_at=2026-04-29T07:48:04.137Z

Active main context after recovery was compact relative to raw history:

  • raw stored history: tens of thousands of messages and about 20M stored message tokens for the active main conversation
  • current context items: compact summary + recent message set
  • maintenance row: reason=cold-cache-catchup, token_budget=258000, current_token_count=87767, completed successfully

OpenClaw logs in the same incident window show repeated openai-codex/gpt-5.5 overflow prechecks and eventual session reset, so the LCM side needs enough metadata to help the host avoid misclassifying raw-history debt as prompt-admission failure.

Possible causes to investigate

  1. LCM does not expose result-level provenance: assembled, fallback-live, fallback-empty, fallback-no-user-turn, emergency, etc.
  2. LCM does not expose whether the assembled result should be treated as prompt-authoritative by host prechecks.
  3. LCM does not expose raw-history debt separately from prompt-ready assembled size.
  4. LCM does not expose enough cache-policy metadata for hosts to avoid cache-breaking prompt mutations.
  5. Prompt-cache mutation delay is Anthropic/Claude-family-specific even though Codex cache telemetry is observed.
  6. Cache-write-only or cache-touch-only telemetry may be classified as cold when it should represent an active, mutation-sensitive cache window.
  7. Session-key continuity through host resets makes it hard to reconstruct exact per-session assembly state after incidents.
  8. Summary/fresh-tail continuity can be partial after host reset if exact artifacts are not in the assembled summaries or if recall tools miss.

Suggested fixes

Assembly result metadata

Extend the context-engine result or add an optional LCM-specific metadata field with values like:

type LcmAssemblyMetadata = {
  source: "assembled" | "fallback-live" | "fallback-empty" | "fallback-no-user-turn" | "emergency";
  promptAuthoritative: boolean;
  estimatedTokens: number;
  rawHistoryEstimatedTokens?: number;
  rawHistoryDebtTokens?: number;
  contextItemCount?: number;
  summaryCount?: number;
  rawMessageCount?: number;
  freshTailCount?: number;
  freshTailTokens?: number;
  cacheState?: "hot" | "cold" | "unknown";
  cacheMutationSensitive?: boolean;
  fallbackReason?: string;
};

The exact shape can differ, but the host needs to know whether it should treat LCM assembly as the prompt-admission authority.
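As a usage sketch, this is roughly how a host precheck might consume such metadata. The decision logic and names (`precheck`, `PrecheckDecision`) are illustrative, not a prescribed host contract:

```typescript
// Hedged sketch: host-side admission decision driven by assembly metadata.
// Only the fields needed for the decision are reproduced here.
type LcmAssemblyMetadata = {
  source: "assembled" | "fallback-live" | "fallback-empty" | "fallback-no-user-turn" | "emergency";
  promptAuthoritative: boolean;
  estimatedTokens: number;
  rawHistoryDebtTokens?: number;
};

type PrecheckDecision = "admit" | "retry-compaction" | "host-manages";

function precheck(meta: LcmAssemblyMetadata, contextWindow: number): PrecheckDecision {
  if (meta.promptAuthoritative && meta.source === "assembled") {
    // Trust LCM's compact assembly; raw-history debt is bookkeeping,
    // not part of prompt-admission math.
    return meta.estimatedTokens <= contextWindow ? "admit" : "retry-compaction";
  }
  // Fallback/emergency results: host falls back to its own overflow handling.
  return "host-manages";
}
```

The key property is that a large `rawHistoryDebtTokens` (tens of millions in the observed DB) never influences admission when the assembly is authoritative, which directly addresses the misclassification described in the live evidence.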

Cache policy

Replace the Anthropic-only prompt-mutation sensitivity check with a provider/cache-policy helper that includes openai-codex when telemetry shows an active prompt-cache window.

Suggested test cases:

  • openai-codex + cache read + retention long => mutation-sensitive hot window
  • openai-codex + cache write / last cache touch + retention long => mutation-sensitive hot window
  • explicit cache break observation => cold / no delay
  • retention none => no delay
  • Anthropic/Claude behavior remains covered
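The helper and the test cases above can be sketched together as follows. Provider ids, field names, and the `explicitCacheBreak` flag are assumptions mirroring the telemetry in this report, not LCM's actual API:

```typescript
// Hedged sketch of the provider/cache-policy helper suggested above.
type CacheObservation = {
  provider: string;
  cacheRead: number;
  cacheWrite: number;
  lastCacheTouchAt?: number;
  retention: "long" | "short" | "none";
  explicitCacheBreak?: boolean; // an observed cache break ends the window
};

// Providers whose prompt caches warrant delaying prompt-mutating compaction.
const MUTATION_SENSITIVE_PROVIDERS = new Set(["anthropic", "openai-codex"]);

function shouldDelayPromptMutation(o: CacheObservation): boolean {
  if (!MUTATION_SENSITIVE_PROVIDERS.has(o.provider)) return false;
  if (o.explicitCacheBreak) return false;   // explicit break => cold, no delay
  if (o.retention === "none") return false; // no retention => no delay
  // Read, write, or a recorded cache touch all count as an active window.
  return o.cacheRead > 0 || o.cacheWrite > 0 || o.lastCacheTouchAt !== undefined;
}
```

Each suggested test case maps to one branch: Codex read or write with long retention is mutation-sensitive, an explicit break or `retention=none` is not, and Anthropic/Claude behavior is preserved by keeping it in the provider set.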

Observability

Persist enough assembly snapshots or telemetry to reconstruct incident-time state after a host session reset:

  • last assembly source/provenance
  • last assembled token estimate
  • last raw-history estimate/debt
  • last fallback reason
  • last selected context item counts
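A minimal sketch of what such a snapshot record could look like, assuming an in-memory ring per session; column names are illustrative, and a real implementation would persist rows next to `conversation_compaction_telemetry` rather than in memory:

```typescript
// Hedged sketch: incident-survivable assembly snapshot (illustrative schema).
type AssemblySnapshot = {
  sessionKey: string;
  at: string;                   // ISO timestamp of the assembly
  source: string;               // last assembly source/provenance
  estimatedTokens: number;      // last assembled token estimate
  rawHistoryDebtTokens: number; // last raw-history debt estimate
  fallbackReason?: string;
  contextItemCount: number;
};

// In-memory ring keyed by session key; persistence is out of scope here.
const snapshots = new Map<string, AssemblySnapshot[]>();

function recordSnapshot(s: AssemblySnapshot, keep = 10): void {
  const list = snapshots.get(s.sessionKey) ?? [];
  list.push(s);
  snapshots.set(s.sessionKey, list.slice(-keep)); // retain only the newest rows
}
```

Because the ring is keyed by `sessionKey` and bounded, post-incident inspection can recover the last few assemblies even after the host rotates or resets the session.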

Impact

Without richer metadata, hosts can accidentally treat LCM's raw transcript debt as prompt-admission failure, retry compaction unnecessarily, break prompt-cache locality, or reset the session even when LCM has already assembled a compact prompt-ready context. The user-visible result is context blowup, cache disruption, and partial forgetfulness after reset.
