
feat: add temporal lcm_recent rollups (1/10) #516

Closed

100yenadmin wants to merge 55 commits into Martian-Engineering:main from electricsheephq:feat/lcm-temporal-memory-consolidated

Conversation

100yenadmin (Contributor) commented Apr 28, 2026

PR #516: Temporal spine: lcm_recent rollups and bounded temporal recall

Maintainer Quick Read

| Field | Value |
| --- | --- |
| Stack position | 1/10 |
| Logical parent for review | main |
| Current head | db62164626e680d35b5ef4cadc89d45ea8b4a3f9 |
| Current PR state | Ready for review/merge after normal maintainer sign-off |
| Merge note | Base of the production path. |
| Last body refresh | 2026-05-02 |

Why Maintainers Should Care

This is the time-native memory foundation. It lets an agent ask normal questions like "what happened yesterday?" or "what happened yesterday 4-8pm?" and get a bounded recap with coverage/provenance, while staying honest that the recap is not proof.

Stack Map

```mermaid
flowchart LR
  P516["#516 Temporal spine"] --> P517["#517 Observed work read model"]
  P517 --> P518["#518 Suggestion ledger"]
  P518 --> P537["#537 Deterministic extraction"]
  P537 --> P538["#538 Event observations"]
  P518 --> P539["#539 Opt-in suggestion tools"]
  P538 --> P540["#540 Maintenance wiring"]
  P540 --> P530["#530 Tracker/open state"]
  P538 --> P531["#531 Event episodes"]
  P530 --> P532["#532 Topic/vocabulary"]
  P526["#526 Draft reference"] -. "not the merge path" .-> P516
```

How It Works

```mermaid
flowchart TD
  A["Agent asks a time-bounded question"] --> B["Parse period in configured timezone"]
  B --> C{"Fresh stored rollup exists?"}
  C -- "yes: day/week/month" --> D["Read rollup summary and coverage"]
  C -- "no: missing, stale, current day, sub-day" --> E["Bounded fallback over leaf summaries/messages"]
  D --> F["Return recap, coverage, degraded/missing flags, optional sources"]
  E --> F
  F --> G["Use describe/expand for exact proof"]
```

What Ships In This PR

  • lcm_recent for day/week/month and sub-day window entry points
  • daily, weekly, and monthly rollup storage plus source/provenance tables
  • bounded source fallback for missing, stale, current-day, or precise sub-day windows
  • operator/debug visibility through lcm_rollup_debug with source redaction by default
  • strict date/time behavior including invalid date rejection, DST-gap rejection, and UTC+13 local date preservation
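
The last bullet's timezone rule is easiest to see with a tiny illustration (zone and timestamps are examples, not code from the PR):

```ts
// Day keys come from the configured zone, not the UTC date, so a UTC+13
// zone like Pacific/Auckland (during its DST) keeps its local date.
function dayKey(at: Date, timeZone: string): string {
  return new Intl.DateTimeFormat("en-CA", { timeZone }).format(at); // "YYYY-MM-DD"
}

// 13:30 UTC on Jan 10 is already Jan 11 in Auckland (UTC+13):
console.log(dayKey(new Date("2026-01-10T13:30:00Z"), "Pacific/Auckland")); // "2026-01-11"
console.log(dayKey(new Date("2026-01-10T13:30:00Z"), "UTC"));              // "2026-01-10"
```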

What This PR Intentionally Does Not Do

  • does not prove exact commands, SHAs, file paths, timestamps, root cause, or causal chains
  • does not write Cortex memory or OpenClaw task state
  • does not build or rebuild rollups from the lcm_recent read path

How An Agent Uses It

  • "What happened yesterday?" -> lcm_recent({ period: "yesterday" }) reads a fresh day rollup or returns an explicit degraded/missing result with bounded fallback.
  • "What happened yesterday 4-8pm?" -> precise windows use bounded source fallback, not a whole-day rollup pretending to have sub-day precision.
  • "What exactly shipped?" -> use lcm_recent only to find candidate windows/sources, then verify with describe/expand/GitHub.

Engineering Boundaries

  • LCM is an evidence layer. It does not become Cortex, GBrain, OpenClaw Tasks, or a fuzzy semantic authority.
  • Recaps and observed states are entry points for investigation, not proof of exact commands, SHAs, paths, timestamps, root cause, or task completion.
  • Exact claims still require source drilldown through lcm_describe, lcm_expand, lcm_expand_query, raw messages, GitHub, logs, tickets, or the relevant authority.
  • Source identifiers stay hidden by default unless a tool has an explicit includeSources or debug affordance.
  • Reads stay read-only. Writes happen through explicit storage, maintenance, extraction, or review-ledger paths for the slice that owns them.

Primary Review Files

  • src/tools/lcm-recent-tool.ts
  • src/rollup-builder.ts
  • src/store/rollup-store.ts
  • src/tools/lcm-rollup-debug-tool.ts
  • test/rollup-store-builder.test.ts
  • test/lcm-tools.test.ts

Size And Review Surface

These are intentionally large LCM changes, so the body separates the GitHub-visible diff from the intended logical review slice. Later PRs target main, which makes the GitHub Files tab include parent-stack code; reviewers should use the logical parent listed above when judging the slice.

| View | Compare base | Files | Additions | Deletions |
| --- | --- | --- | --- | --- |
| GitHub-visible PR page | main | 23 | 7785 | 44 |
| Intended reviewer slice | main | 45 | 7907 | 2140 |

Logical slice breakdown:

| Category | Files | Additions | Deletions |
| --- | --- | --- | --- |
| runtime code | 16 | 3936 | 927 |
| tests | 10 | 2758 | 1153 |
| docs/release notes | 17 | 1205 | 52 |
| metadata/package | 2 | 8 | 8 |

Review Concerns Already Folded Into The Code

  • Fallback message timestamp scans preserve correctness for mixed-format timestamps via julianday(), and now use matching expression indexes so bounded range/order scans run on indexed plans (see the sketch after this list).
  • Read-path mutation is guarded: lcm_recent reads and reports missing/stale state rather than rebuilding.
  • Stale/pending rollups are bypassed so the tool does not serve a stale day as fresh.
  • Rollup rebuild state uses source summary fingerprints/watermarks so content edits and summary changes invalidate correctly.
  • Debug/source surfaces redact source IDs by default.
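
A sketch of the expression-index pattern from the first bullet, assuming better-sqlite3 and illustrative table/column names (the PR's real schema differs). SQLite only uses an expression index when the query repeats the exact same expression, so index and query must both say julianday(created_at):

```ts
import Database from "better-sqlite3";

const db = new Database("lcm.db");

// Index the normalized timestamp expression, not the raw text column.
db.exec(`
  CREATE INDEX IF NOT EXISTS idx_messages_julianday
    ON messages (julianday(created_at));
`);

// The bounded fallback query repeats the identical expression so the
// planner can use the index for both the range and the ordering, with a
// primary-key tiebreaker for deterministic order.
const rows = db
  .prepare(`
    SELECT message_id, content
    FROM messages
    WHERE julianday(created_at) >= julianday(?)
      AND julianday(created_at) <  julianday(?)
    ORDER BY julianday(created_at) DESC, message_id ASC
    LIMIT 1001
  `)
  .all("2026-05-01T16:00:00Z", "2026-05-01T20:00:00Z");
```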

Validation Evidence

  • npm test -- --run test/rollup-store-builder.test.ts test/lcm-tools.test.ts test/plugin-prompt-hook.test.ts
  • npm test
  • npm run build
  • git diff --check
  • GitHub test check: success

Suggested Maintainer Review Path

  1. Start with the Why Maintainers Should Care and How It Works sections.
  2. Use Primary Review Files for the logical slice; the GitHub Files tab can look much larger because the public PRs target main while the series is logically stacked.
  3. Check the non-goals before asking whether LCM should create tasks, close tasks, write Cortex memory, or make authoritative project claims.
  4. Review and merge in the order shown in the stack diagram. feat: implement LCM architecture follow-up (draft) #526 stays draft/reference unless maintainers explicitly choose a single large integration branch.

Copilot AI left a comment

Pull request overview

This PR adds a temporal rollup layer to lossless-claw and exposes it via a new lcm_recent tool (plus an opt-in lcm_rollup_debug tool), enabling deterministic day/week/month recaps and bounded sub-day window recall with provenance.

Changes:

  • Introduces SQLite schema + store + builder for day/week/month rollups with provenance (lcm_rollups, lcm_rollup_sources, lcm_rollup_state).
  • Adds lcm_recent for deterministic period/window parsing and bounded leaf-summary SQL fallback when rollups are not appropriate/available.
  • Wires rollup rebuilds into ingest/maintenance, adds operator-only lcm_rollup_debug, updates docs/changesets, and adds comprehensive test coverage.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/db/migration.ts | Adds rollup tables/indexes/views and rollup-state columns. |
| src/store/rollup-store.ts | Implements rollup/state CRUD plus provenance source replacement and leaf-summary lookups. |
| src/rollup-builder.ts | Builds daily rollups from leaf summaries and aggregates them into week/month rollups. |
| src/tools/lcm-recent-tool.ts | Implements lcm_recent period/window parsing, rollup selection, and bounded SQL fallback rendering/budgeting. |
| src/tools/lcm-rollup-debug-tool.ts | Implements lcm_rollup_debug inspection with source redaction unless includeSources=true. |
| src/engine.ts | Instantiates rollup services; marks conversations dirty on ingest; triggers rebuilds after maintenance. |
| src/plugin/index.ts | Registers lcm_recent by default and lcm_rollup_debug only when enabled. |
| src/db/config.ts | Adds rollupDebugEnabled config and env var resolver. |
| openclaw.plugin.json | Documents/configures rollupDebugEnabled in the plugin settings schema. |
| docs/configuration.md | Documents LCM_ROLLUP_DEBUG_ENABLED / rollupDebugEnabled. |
| README.md | Adds LCM_ROLLUP_DEBUG_ENABLED to the env var table. |
| skills/lossless-claw/references/recall-tools.md | Updates recall-tool guidance with availability gating and "recap vs proof" guardrails. |
| skills/lossless-claw/SKILL.md | Updates skill overview and guidance to use lcm_recent only when available. |
| specs/lcm-temporal-memory-plan.md | Adds/records the consolidated implementation plan and tool contract. |
| test/rollup-store-builder.test.ts | Adds extensive coverage for schema, DST windows, fallback behavior, aggregation, and debug redaction/caps. |
| .changeset/lcm-temporal-rollups.md | Declares a minor release for temporal rollups/tools. |
| .changeset/lcm-temporal-skill-docs.md | Declares a patch release for skill/doc guidance updates. |


Comment thread src/tools/lcm-recent-tool.ts Outdated
Comment thread src/engine.ts Outdated
Comment thread src/rollup-builder.ts Outdated
100yenadmin (Contributor, Author) commented:


title: "LCM upstream PR stack scenario flows"
type: handoff-test
created: 2026-04-29
status: active
tags: [lossless-claw, lcm, lcm_recent, observed-work, task-bridge, scenarios, agent-flows]
related:


LCM upstream PR stack scenario flows

This document pressure-tests the planned LCM stack against realistic agent workflows.

The goal is not just “does the schema exist?” The goal is: when a user asks a natural operational question, which tool does the agent call, what layer owns the answer, when does it escalate, and what must never happen silently?

Layer model being tested

  1. LCM temporal spine — PR feat: add temporal lcm_recent rollups (1/10) #516

    • Tool: lcm_recent
    • Optional explicit/admin writer: lcm_rollup_build or equivalent maintenance path
    • Purpose: answer bounded temporal recall questions cheaply and conservatively.
  2. Observed work density — PR feat: add LCM observed work density (2/10) #517

    • Tool: lcm_work_density
    • Purpose: summarize work signals from conversation evidence.
    • Authority: advisory only; not tasks.
  3. Task bridge suggestions — PR feat: add experimental LCM task bridge suggestions (3/10) #518

    • Possible future tool: lcm_task_suggestions
    • Purpose: store/review suggestions derived from observed work.
    • Authority: inert suggestion ledger only; no automatic task writes.
  4. Deep recall / proof layer — existing LCM tools

    • Tools: lcm_grep, lcm_describe, lcm_expand_query
    • Purpose: when rollups/density are missing, ambiguous, or need exact evidence.
  5. External authority tools

    • GitHub skill / gh: PR/issue truth.
    • OpenClaw task system / TaskFlow: operational commitments.
    • Cortex: authoritative memory/identity-like semantic memory.
    • These remain separate from LCM observations.

Scenario 1 — “What did we get done yesterday?”

User asks

What did we get done yesterday?

Ideal agent flow

  1. Call lcm_recent({ period: "yesterday" }).

  2. Call lcm_work_density({ period: "yesterday", detailLevel: 1 }).

  3. If lcm_work_density flags ambiguous items, call lcm_expand_query for the top 1–3 uncertain claims.

    • Purpose: proof before making strong claims.

Expected answer shape

Observed work yesterday:
- Completed/likely completed: 11 items
- Still unfinished: 5
- Ambiguous: 2

Highlights:
1. Fleet rollout completed on six healthy VMs.
2. Image generation default moved to openai/gpt-image-2 fleet-wide.
3. David was isolated as a transport/runtime outlier and ticketed.

Still open:
1. David image POST transport issue — ticketed, not solved.
2. LCM upstream PR stack cleanup/consolidation.

Confidence: medium-high.
Note: This is observed from LCM summaries, not authoritative task state.

Ownership boundary

  • LCM may say “appears completed” or “observed completed.”
  • LCM must not mark OpenClaw tasks done.
  • If the user asks for authoritative task status, use OpenClaw task/TaskFlow/GitHub state, not LCM alone.

Implementation implications

#516 must make yesterday’s rollup cheap and stable. #517 must expose density without pretending to be a task system.


Scenario 2 — “Where are we on the upstream LCM PRs?”

User asks

Where are we on the upstream LCM PRs?

Ideal agent flow

  1. Call GitHub via gh pr view/list for canonical truth.

  2. Call lcm_recent({ period: "today", topic: "LCM upstream PRs" }) if topic filtering exists; otherwise lcm_grep("LCM upstream PRs").

    • Purpose: recover recent planning/context.
  3. Call lcm_work_density({ period: "today", topic: "LCM upstream PRs", statuses: ["observed_unfinished", "observed_ambiguous"] }).

    • Purpose: identify remaining observed work.
  4. If user wants handoff-quality detail, call lcm_expand_query against the handoff/audit summaries or read the artifact files.

Expected answer shape

GitHub truth:
- #516 temporal lcm_recent rollups: open/draft, backbone.
- #517 observed work density: open/draft, depends on #516.
- #518 task bridge suggestions: open/draft, depends on #517.

Observed plan:
1. Finish #516 daily/weekly/monthly correctness first.
2. Keep #517 advisory/read-only.
3. Keep #518 inert/suggestion-only.

Local cleanup remains: close/supersede fork PRs once upstream links are verified.

Ownership boundary

  • GitHub owns PR state.
  • LCM owns “what we discussed/planned.”
  • Observed work density may say “cleanup appears unfinished,” but GitHub confirms actual open/closed status.

Implementation implications

Observed work should include source/period/topic filters, but should not replace live GitHub checks.


Scenario 3 — “What’s blocked?”

User asks

What’s blocked right now?

Ideal agent flow

  1. Call lcm_work_density({ period: "7d", statuses: ["observed_unfinished", "observed_ambiguous"], kinds: ["blocker", "risk", "follow_up"], detailLevel: 2 }).

  2. If a blocker is high-impact, call lcm_expand_query for evidence:

    • “What evidence says this is still blocked? Was it later resolved?”
  3. If the blocker maps to GitHub PR/issue, call GitHub for live state.

Expected answer shape

Observed blockers:
1. David image generation transport issue — unresolved, ticketed in Intercom.
   - LCM evidence: fleet rollout smoke failed only on David after provider routing was fixed.
   - Authority: Intercom ticket exists; not an LCM task.

2. LCM #516 merge readiness — still needs daily/weekly/monthly correctness validation.
   - Authority: GitHub PR state and tests.

3. Local fork PR cleanup — not yet done.
   - Authority: GitHub fork PR state.

Ownership boundary

  • LCM can surface “observed blocker.”
  • It must not create a task or reminder unless user explicitly asks.
  • If the user asks to track it, bridge/task system owns that commitment.

Implementation implications

This is the strongest justification for #517. Users ask “what’s blocked?” more often than “show me a rollup row.”


Scenario 4 — “What happened yesterday between 4 and 8pm?”

User asks

What happened yesterday from 4 to 8pm?

Ideal agent flow

  1. Call lcm_recent({ period: "yesterday 4-8pm" }) once feat: add temporal lcm_recent rollups (1/10) #516 supports sub-day windows.

  2. If unsupported or rollup missing, fallback explicitly:

    • lcm_grep or bounded summary query with since/before timestamps.
    • Convert local time to UTC safely.
  3. If exact causal chain is needed, call lcm_expand_query on candidate summaries.

Expected answer shape

For yesterday 16:00–20:00 local time:
- We investigated Golden/David image-generation config differences.
- We confirmed Golden needed Codex OAuth auth state, not just config.
- Golden smoke passed after auth profile copy.
- David remained an outlier later.

Coverage: bounded to summaries/messages in that local-time window.
Confidence: medium; exact command output available via expansion.

Ownership boundary

  • If the local time is invalid due to DST, reject with a clear error.
  • Do not “round” nonexistent local times.
  • Do not answer from a whole-day rollup as if it proves a sub-day window.

Implementation implications

Sub-day windows are Patch 4, not a day-one dependency. They require careful timezone/DST tests.
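
A minimal sketch of the strict local-to-UTC conversion such tests would pin down, using only Intl and an IANA zone name. A real implementation would likely use a tz library, and ambiguous fall-back times would need an explicit policy; the point here is only the contract that nonexistent local times are rejected, never rounded:

```ts
// Format a UTC instant as its wall time in the target zone, "YYYY-MM-DDTHH:mm".
function wallTimeInZone(utcMs: number, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-CA", {
    timeZone,
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", hourCycle: "h23",
  }).formatToParts(new Date(utcMs));
  const get = (type: string) => parts.find((p) => p.type === type)!.value;
  return `${get("year")}-${get("month")}-${get("day")}T${get("hour")}:${get("minute")}`;
}

// Convert "YYYY-MM-DDTHH:mm" local wall time to UTC, rejecting DST gaps.
function localToUtcStrict(local: string, timeZone: string): Date {
  let guessMs = Date.parse(`${local}:00Z`); // naive first guess
  for (let i = 0; i < 3; i++) {
    const seen = wallTimeInZone(guessMs, timeZone);
    if (seen === local) return new Date(guessMs);
    // Shift by the wall-time error; this converges unless the time doesn't exist.
    guessMs += Date.parse(`${local}:00Z`) - Date.parse(`${seen}:00Z`);
  }
  throw new Error(`Local time ${local} does not exist in ${timeZone} (DST gap)`);
}

// localToUtcStrict("2026-03-08T02:30", "America/Los_Angeles") throws:
// 02:30 never occurs on the US spring-forward date.
```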


Scenario 5 — “Make me a handoff for the next agent”

User asks

Make me a handoff for the agent taking over this work.

Ideal agent flow

  1. Call lcm_recent({ period: "today", topic: "LCM PR stack" }) for compact current context.

  2. Call lcm_work_density({ period: "today", topic: "LCM PR stack", detailLevel: 2 }) to identify completed/open/ambiguous work.

  3. Call live GitHub checks for PR/issue truth.

  4. Read existing artifacts from disk if known:

    • handoff doc
    • audit doc
    • architecture docs
  5. Produce a handoff artifact, not just chat prose.

Expected artifact sections

  • Canonical upstream PRs and order.
  • What is already done.
  • What is still open.
  • What must not be changed.
  • Patch sequence.
  • Validation checklist.
  • Local cleanup plan.
  • Copy-paste agent prompt.

Ownership boundary

  • LCM provides continuity.
  • GitHub provides upstream truth.
  • Files/docs become the handoff artifact.

Implementation implications

This validates that LCM recall plus observed work can produce operationally useful handoffs while still verifying mutable external state live.


Scenario 6 — “Should we create tasks for these unfinished things?”

User asks

Should we make tasks from the unfinished items?

Ideal agent flow

  1. Call lcm_work_density({ period: "7d", statuses: ["observed_unfinished"], includeSources: true, minConfidence: 0.8 }).

  2. If Option C is enabled, call future lcm_task_suggestions({ dryRun: true, suggestionKinds: ["create_task"], minConfidence: 0.85 }).

  3. Present suggestions for review.

  4. Only after explicit user approval, use the real task system to create tasks.

Expected answer shape

I found three good task candidates from observed evidence:

1. Close/supersede local fork PRs #12–#18.
   - Confidence: high.
   - Evidence: upstream #516/#517/#518 now canonical.

2. Finish #516 daily/weekly/monthly rollup correctness.
   - Confidence: high.
   - Evidence: R-515 gap report.

3. Revisit David image-generation transport issue.
   - Confidence: medium-high.
   - Evidence: Intercom ticket #96638636.

Want me to create these as OpenClaw tasks?

Ownership boundary

  • LCM suggests.
  • User approves.
  • Task system creates.
  • LCM never silently writes tasks.

Implementation implications

This is #518’s entire point. It must prove restraint.


Scenario 7 — Missing rollup / cold start

User asks

What happened this week?

Bad behavior to avoid

  • Silently fall back to raw grep and present it as a full week rollup.
  • Build a rollup during the read call.
  • Return a confident recap when source coverage is partial.

Ideal agent flow

  1. Call lcm_recent({ period: "week" }).

  2. If response says week rollup missing:

    • Tell the user coverage is degraded.
    • Optionally call bounded lcm_grep/lcm_expand_query for a best-effort answer.
    • Optionally suggest explicit rollup build if the user/operator wants it.
  3. If explicit build is available and user/admin context allows:

    • call lcm_rollup_build({ period: "week" }) or schedule maintenance.
    • then retry lcm_recent.

Expected answer shape

I don’t have a built week rollup yet, so this is a best-effort answer from bounded summaries, not a complete weekly rollup.

Best-effort themes:
- Fleet rollout completed.
- Image-gen default rollout mostly completed; David ticketed.
- LCM upstream handoff and scenario docs created.

If you want a stronger answer, I can run/build the week rollup and retry.

Ownership boundary

Reads do not mutate. Build is explicit.

Implementation implications

#516 needs structured missing/degraded responses and a separate build path.
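
A sketch of what that structured miss could look like to a caller; the type and tool names here are illustrative, not #516's actual response shape:

```ts
// Discriminated union: the read path reports state, never builds.
type WeekRecall =
  | { status: "fresh"; recap: string; coverage: string }
  | { status: "stale" | "missing"; bestEffort: string; note: string };

async function answerWeek(deps: {
  lcmRecent: (args: { period: "week" }) => Promise<WeekRecall>;
  lcmRollupBuild?: (args: { period: "week" }) => Promise<void>; // explicit/admin path only
}): Promise<string> {
  const result = await deps.lcmRecent({ period: "week" });
  if (result.status === "fresh") {
    return result.recap;
  }
  // Reads never mutate. The build path is a separate, explicit call that
  // only runs after the user or operator approves it.
  return (
    `${result.note}\n\nBest-effort themes:\n${result.bestEffort}` +
    (deps.lcmRollupBuild
      ? "\n\nI can build the week rollup and retry for a stronger answer."
      : "")
  );
}
```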


Scenario 8 — “Did we already file this upstream?”

User asks

Did we already file this upstream, or is it only local?

Ideal agent flow

  1. Search GitHub upstream issues/PRs live.
  2. Search local/fork PRs live.
  3. Call lcm_grep for recent discussion if needed.
  4. Use lcm_work_density only to identify observed cleanup items, not truth.

Expected answer shape

Upstream truth:
- The active PRs are #516/#517/#518 in Martian-Engineering/lossless-claw.
- The 100yenadmin PRs are local mirrors/historical branches.

Local cleanup needed:
- Close/supersede #16/#17/#18 with upstream links.
- Close/supersede #12–#15 after verifying all review notes are captured.
- Defer/cherry-pick #4–#11 ideas as noted.

Ownership boundary

GitHub owns “filed upstream.” LCM only remembers the plan.

Implementation implications

This scenario proves agents must combine LCM with live external checks.


Scenario 9 — “What should I read first?”

User/agent asks

I’m taking over. What should I read first?

Ideal agent flow

  1. Read handoff doc.
  2. Read R-515 gap report.
  3. Read Option B/C docs.
  4. Read upstream PR descriptions/diffs.
  5. Run tests/build only after understanding the target.

Expected answer shape

Read in this order:
1. HANDOFF-lcm-upstream-pr-stack-2026-04-29.md
2. R-515-lcm-rollup-pr-stack-gap-report-2026-04-28.md
3. lcm-observed-work-density-option-b-2026-04-28.md
4. lcm-task-bridge-option-c-experimental-2026-04-28.md
5. Upstream PRs #516/#517/#518 and their diffs

Ownership boundary

Docs guide work; PR diffs and tests verify implementation.


Scenario 10 — “How does this avoid becoming a second memory/task brain?”

User asks

How do these layers not collapse into a mess?

Expected answer

The stack separates three questions:

1. When did something happen?
   - LCM temporal spine: lcm_recent.

2. What does conversation evidence suggest is done/unfinished?
   - Observed work density: lcm_work_density.
   - Advisory only.

3. Should this become an operational task?
   - OpenClaw task system / TaskFlow, optionally assisted by inert suggestions.
   - Requires explicit acceptance.

LCM never becomes the authority for tasks. It provides evidence and summaries.

Implementation implications

The words matter. Tool outputs and docs must consistently say “observed,” “appears,” “evidence,” and “confidence,” not “task is done.”


Scenario matrix

| Scenario | First tool | Escalation | External truth | Must not do |
| --- | --- | --- | --- | --- |
| What happened yesterday? | lcm_recent | lcm_work_density, lcm_expand_query | none unless PR/task claims | claim authority from observation |
| What got done? | lcm_work_density | lcm_expand_query | GitHub/tasks if named | mark tasks done |
| What is blocked? | lcm_work_density | lcm_expand_query | GitHub/Intercom/tasks | create tasks silently |
| Where are PRs? | GitHub skill / gh | lcm_recent, lcm_grep | GitHub | trust stale rollup over live PR |
| Sub-day window | lcm_recent | bounded grep/expand | none | fake precision from day rollup |
| Missing rollup | lcm_recent | grep/expand, explicit build | none | mutate on read |
| Create task candidates | lcm_work_density | lcm_task_suggestions dry-run | task system | direct write from LCM |
| Local cleanup | GitHub skill / gh | LCM handoff docs | GitHub | delete branches before salvage |

Resulting design pressure on PRs

#516 must provide

  • read-only lcm_recent
  • explicit degraded/missing rollup response
  • explicit/admin rollup build path or maintenance path
  • daily/weekly/monthly real builders
  • provenance and coverage accounting
  • timezone/DST-safe windows before sub-day support ships

#517 must provide

  • observed-work storage and read tool
  • status vocabulary that avoids task authority
  • confidence/rationale/provenance
  • compact density response
  • no read mutation
  • no sync to Cortex or tasks

#518 must provide

  • suggestion storage only
  • review states: pending/accepted/rejected/dismissed/expired
  • source IDs and confidence
  • tests proving no OpenClaw task writes
  • no registered write tool by default
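
An illustrative type sketch of that vocabulary (names assumed, not #518's actual schema):

```ts
// Review states for the inert suggestion ledger; LCM records these but
// never writes OpenClaw tasks itself.
type SuggestionState =
  | "pending"
  | "accepted"
  | "rejected"
  | "dismissed"
  | "expired";

interface TaskSuggestion {
  id: string;
  kind: "create_task";
  state: SuggestionState; // review state only, not task authority
  confidence: number;     // e.g. 0.85
  sourceIds: string[];    // provenance back to LCM evidence
}
```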

Final takeaway

The stack works if each layer answers one narrow question:

  • LCM recent: “What happened in this time window?”
  • Observed work density: “What does the evidence suggest is done/unfinished?”
  • Task bridge suggestions: “Would you like to turn this evidence into an explicit task action?”

If any layer starts silently mutating the next layer, the architecture fails.

100yenadmin (Contributor, Author) commented:

Final adversarial pass - merge order 1/10

Score: mergeability 9.8/10, functionality 9.4/10, description 9.7/10.

Functionality audit: Works as the temporal spine. lcm_recent covers today/yesterday/date/week/month/7d/30d plus sub-day bounded windows; missing/stale/degraded coverage is explicit; debug source IDs stay hidden unless requested; exact proof still routes to source expansion. The final hardening fixed stale daily rollups being promoted into ready aggregate recaps and added raw-message timestamp indexes for bounded fallback.

Scenario coverage: Supports temporal recap and “yesterday 4-8pm” style entry flows. It intentionally does not prove exact commands, SHAs, root cause, shipped status, or task status.

Final state: MERGEABLE, GitHub test pass, GraphQL review threads: 0 unresolved. Local focused suite, full npm test, npm run build, and git diff --check passed on head 076e476.

Residual caveat: lcm_recent is a recap/window-entry tool, not proof. Merge this first.

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.



Comment thread src/tools/lcm-recent-tool.ts Outdated
Comment thread specs/lcm-temporal-memory-plan.md
100yenadmin (Contributor, Author) commented:

Final May 1 hardening/status update:

  • Merge order: 1/10 — Temporal spine
  • Head: f51042f
  • Review state: 0 unresolved GraphQL review threads after the latest push
  • GitHub state: MERGEABLE / CLEAN, test check green
  • Validation: 47 focused tests; build; diff-check
  • Score: Functionality 10/10, mergeability 10/10, description 10/10
  • Note: Ready to merge first in the stack. Latest hardening prevents pending/stale rollups from hiding fresh fallback evidence and keeps rebuild pending when leaf summaries change during a sweep.

The PR description top note has been refreshed with current head SHA, visible-vs-logical LOC breakdown, scope, caveats, and validation status. LCM remains an evidence layer: recaps, observed work, events, episodes, and suggestions are entry points, not proof or task authority.

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.



Comment thread src/tools/lcm-recent-tool.ts
100yenadmin requested a review from Copilot May 2, 2026 19:17
Eva added 16 commits May 4, 2026 02:04
…ve ints

Round 1's P2-2 clamped the env path via `parseFinitePositiveInt`, but
the plugin-config path still resolved through plain `toNumber` and
silently accepted 0 / negative — manifest's `minimum:1` invariant
held only when the env var was the source of truth. Add
`toPositiveNumber` and route the three rollup token caps through it
so both config sources enforce the same floor.

Closes R2-FIX-5 from audit/pr516/round2-A-fix-correctness.md (P2-2 caveat).
R3-FIX-1: both queries `ORDER BY julianday(...) DESC LIMIT 1001` lacked a
PK tiebreaker. Deterministic within a single SQLite instance via rowid
stability, but breaks across `.dump | sqlite3 newdb` reimport — two rows
with identical timestamps flip ordering after the round-trip.

Adds `, summary_id ASC` to the summaries fallback query and
`, m.message_id ASC` to the messages fallback query. Mirrors the
c192ee8 fix for listConversationsBySessionKey (P1-9).

Exposes getRecentSummaryFallback via __lcmRecentTestInternals so the
regression test can drive it directly.

Adds a regression test that inserts two summaries with identical
latest_at/created_at in reverse alphabetical insertion order and
asserts the ASC tiebreak puts the lexicographically smaller summary_id
first across repeated invocations.
…TE tx

R3-FIX-2: P1-8 closed the FK-consistency hole in buildAggregateRollup
and the non-empty buildDayRollup path, but the empty-day cleanup
branches still ran `getRollup` + `deleteRollup` outside any transaction.
A concurrent rebuild that upserts a fresh rollup at the same
(conversation, kind, key) slot had its work destroyed by a
stale-empty-snapshot delete from a racing caller. Self-healing via
pending_rebuild=1, but transient data loss until next maintain.

Wraps both empty-day cleanup sites — one in buildDailyRollups' main
loop, one in buildDayRollup itself — in
`withDatabaseTransaction(this.store.db, "BEGIN IMMEDIATE", …)` and
reads getRollup INSIDE the transaction so the decision to delete is
made against the row that's actually there at lock time.

Adds a structural regression test (mirrors P1-8's source-text scan)
asserting that in both empty-day branches `getRollup(` and
`deleteRollup(` appear AFTER `withDatabaseTransaction(`. Per Round-1
spec, deterministic concurrency repros are hard; the structural
assertion is the accepted approach.
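
A sketch of the pattern this commit describes. withDatabaseTransaction, getRollup, and deleteRollup are the PR's own helpers; their signatures here are assumptions reconstructed from the commit text:

```ts
declare function withDatabaseTransaction<T>(
  db: unknown, mode: "BEGIN IMMEDIATE", body: () => T,
): T;
declare function getRollup(
  db: unknown, key: { conversationId: string; kind: string; periodKey: string },
): { sourceCount: number } | null;
declare function deleteRollup(db: unknown, key: object): void;

function cleanupEmptyDay(
  db: unknown,
  key: { conversationId: string; kind: string; periodKey: string },
): void {
  withDatabaseTransaction(db, "BEGIN IMMEDIATE", () => {
    // Read INSIDE the lock: the delete decision is made against the row
    // that actually exists at lock time, so a racing rebuild's fresh
    // upsert can no longer be destroyed by a stale-empty snapshot.
    const row = getRollup(db, key);
    if (row !== null && row.sourceCount === 0) {
      deleteRollup(db, key);
    }
  });
}
```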
R3-FIX-3: the original assertion `details.tokenCount <= 500` would also
pass on the pre-R2-FIX-2 code because the outer enforceResponseBudget
at lcm-recent-tool.ts:2133 unconditionally clamps responses to budget.
The fix is correct; the test was tautological (R3-H ⚠ from
audit/pr516/SUMMARY_R3.md).

Strengthens the test in three ways:

1. Bulks the fixture to six prior-day digests + four live-fallback
   leaves so combinedContent (the inner index assembly) genuinely
   overflows the 500-token budget — the post-R2-FIX-2 re-truncation
   path at lcm-recent-tool.ts:1944 actually fires, rather than the
   test silently passing because the synthetic content fit.
2. Adds structural assertions that the rollup-index header is
   well-formed, the period count in the header matches the number of
   `#### ` entries (R2-FIX-3 invariant cross-check), and the
   live-fallback section header is present — so a future refactor
   that drops the live-fallback merge or breaks the count-matching
   would now fail this test.
3. Asserts `details.truncated` is true, which is the externally
   observable signal callers act on; under-reporting was R2-H's
   concern about the pre-fix path.

A truly black-box discriminator (response text differs pre- vs
post-fix) is not achievable here because the outer enforceResponseBudget
clamps both pipelines down to the same length whenever combinedContent
overflows. The test now defends the structural integrity of the
combined-output pipeline rather than a tautological budget check.

No production change.
P0-2: Wire a synchronous, replay-clean resolver that walks
runtimeConfig.models.providers[*].models[*] for {id, contextWindow}
and returns a closure mapped to the registration-time snapshot.
Optionally seeds gaps from the pi-ai catalog (mod.getModels) at
lower precedence. Exposes via createLcmDependencies as
getModelContextWindow so lcm-recent's auto-detail-level can drop
its live-RPC tiers (Tier 2/3) in a follow-up commit.

Replay-determinism analysis: result is a pure function over the
snapshot; rebuild-per-process matches replay-machine guarantees;
no wall-clock or live RPC reads.

Finding ID: P0-2 (round2-C-design.md Item 1)
P0-2: Delete the unreachable Tier 2 (status RPC) and Tier 3
(models.list RPC) ladders from computeAutoDetailLevel. The
investigation confirmed both have never returned non-null:

- Tier 2's pickFirstNumber walked response paths
  (sessions.byAgent[].entries[].usage.{promptTokens,...}) that
  don't match openclaw's actual StatusSummary shape (which uses
  recent[].totalTokens / recent[].contextTokens keyed by key).
- Tier 3 is unreachable because plugin-sdk doesn't expose
  gateway-RPC invocation; callGateway throws on models.list.

Removed: runtimeStatusCache, modelContextWindowCache,
modelCatalogFetchPromise, runtimeStatusFetchPromise,
RUNTIME_STATUS_CACHE_TTL_MS, fetchRuntimeStatus,
ensureModelCatalogLoaded, pickFirstNumber. computeAutoDetailLevel
loses its callGateway and sessionKey parameters and is now
synchronous. Updated the single call site in the tool body.

Net diff: -215 / +21. Output is a pure function of (telemetry-row,
registry-snapshot) — replay-clean by construction. See
audit/pr516/INVESTIGATION-design-calls.md Question 1 for the full
reachability analysis.

Finding ID: P0-2 (round2-C-design.md Item 1)
P1-3: The post-build state-update block at the end of
buildDailyRollups read latestState / latestSummaryCreatedAt /
latestSummaryFingerprint OUTSIDE any transaction, decided
shouldClearPending, then wrote pending_rebuild=0 — a concurrent
ingest writer that flipped pending_rebuild=1 between our read and
our write was silently clobbered, leaving rollup state stale until
the next ingest re-armed.

Fix: wrap the read-decide-write in
withDatabaseTransaction(this.store.db, "BEGIN IMMEDIATE", ...) and
re-read all three fields INSIDE the txn body. With BEGIN IMMEDIATE,
SQLite blocks any concurrent writer until our COMMIT; our reads
observe the freshest committed value.

Tests: structural pin (P1-8 / R3-FIX-2 pattern) asserting all
three reads + the upsertState write appear AFTER
withDatabaseTransaction("BEGIN IMMEDIATE", ...), plus a runtime
race test that hooks getState to simulate a concurrent
last_message_at advance and asserts the builder re-arms
pending_rebuild=1 instead of clobbering to 0.

Finding ID: P1-3 (round2-C-design.md Item 3)
Adds `Clock { now(): Date }` interface to `LcmDependencies` as a REQUIRED
field. Plugin's `createLcmDependencies` populates it with a live-clock
impl (`{ now: () => new Date() }`). Tests inject a frozen clock for
deterministic windows.

This is the contract that read-path wall-clock reads will switch to in
follow-up commits — replacing direct `new Date()` / `Date.now()` calls
in `engine.ts` and `lcm-recent-tool.ts`. Object-with-method shape
matches the `subagent` convention in `PluginRuntime`, future-extensible
to `monotonic()` without breaking callers.

Updated `createTestDeps` in 4 test files (engine, bootstrap-flood,
session-operation-queues, lcm-command) so the typed return matches the
new required field. `*.test.ts` factories that cast `as unknown as
LcmDependencies` are unaffected.

Finding ID: P0-3 (B1, B2)
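
A minimal sketch of that seam under the shapes the commit describes; frozenClock is an illustrative test helper, not code from the PR:

```ts
// The required Clock dependency: object-with-method, so it can later grow
// monotonic() without breaking callers.
interface Clock {
  now(): Date;
}

// Live implementation, as wired by createLcmDependencies.
const liveClock: Clock = { now: () => new Date() };

// Illustrative test helper: a frozen clock for deterministic windows.
function frozenClock(isoInstant: string): Clock {
  const fixed = new Date(isoInstant);
  return { now: () => fixed };
}

// e.g. in a test: deps.clock = frozenClock("2026-05-01T12:00:00Z");
```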
Replaces three direct `new Date()` reads in `lcm-recent-tool.ts` with
`deps.clock.now()`:

- `resolveWindowPeriod` (line ~424): `last Nh` / `last Nm` window end
- `resolvePeriod` (line ~482): `today` anchor day-key resolution
- `assembleAcrossConversations` (line ~1466): `currentDayKey`

`resolvePeriod` and `resolveWindowPeriod` gain a required `clock: Clock`
parameter, threaded from the tool's main `execute` via `input.deps.clock`.
The new test "anchors 'today' off the injected clock — B3 regression" pins
the contract: with a frozen clock, `resolvePeriod("today", ...)` produces
the expected day-key independent of the wall clock.

`new Date(value)` calls that wrap a recorded value (`row.built_at`,
`rollup.period_start`, etc.) are deterministic by construction and not
changed.

Existing `__lcmRecentTestInternals.resolvePeriod` callers updated to pass
a live-clock third arg. `makeRecentDeps()` factory in
`rollup-store-builder.test.ts` populated `clock` so the tool can find it
at runtime.
Finding ID: P0-3 (B3)
…sBack

Removes the `now = new Date()` default arg on
`computeRollupMaintenanceDaysBack` and threads the clock through the
single caller in `maintain()` via `this.deps.clock.now()`. This is the
last read-path wall-clock site in `engine.ts` related to rollup
maintenance, bundling R3-D-1.

Adds the regression test "maintain() uses deps.clock.now() to compute
rollup-maintenance daysBack — B4 regression" that injects a frozen
clock at engine construction (not via vi.setSystemTime) and asserts the
computed daysBack uses the injected time, not the wall clock. With
last_rollup_check_at at frozenAt - 2d 2h, daysBack must be 4 (ceil 2.083
+ 1).
Finding ID: P0-3 (B4) / R3-D-1
Removes `Intl.DateTimeFormat().resolvedOptions().timeZone` from the
`engine.timezone` getter. The fallback caused read-path output to vary
by host machine, defeating reproducibility — a workstation in Pacific
time and a server in UTC produced different day-keys for the same
recorded conversation.

Behavior now:
- If `config.timezone` is non-empty, return it as-is.
- Otherwise return 'UTC' and emit a one-time `log.warn` at engine
  construction telling the operator to set the config. Subsequent
  getter calls do NOT re-warn (debounced by emitting only in the
  constructor).

Adds two regression tests (B5):
- empty config.timezone returns 'UTC' and warns exactly once across
  repeated getter calls
- non-empty config.timezone returns it without warning
Finding ID: P0-4 (B5)
Adds `RollupStore.getTimezone(conversationId): string | null` that reads
the `lcm_rollup_state.timezone` column for one conversation (returning
null if no row exists). `upsertState` already wrote this column;
this is just the read path.

`lcm_recent` now resolves timezone as
`store.getTimezone(conversationId) ?? engine.timezone`, with the rollup
state row winning over the engine-wide default. This means a conversation
recorded in `America/Los_Angeles` keeps its day-keys / sub-day windows
even if the operator later changes the plugin-level config or runs the
engine on a host with a different default. Brand-new conversations
without state rows still fall back to the engine default.

Adds two B6 regression tests in `rollup-store-builder.test.ts`:
- `RollupStore.getTimezone` returns null pre-state and the persisted tz
  post-`upsertState`
- `lcm_recent` reads the persisted LA tz over engine UTC default —
  proven by a sub-day window that only intersects the seed under LA tz
Finding ID: P0-4 (B6)
The strict-UTC getter and construction-time warning gated on
`config.timezone.length === 0`, so a whitespace-only string
("   ", a tab, etc.) bypassed BOTH the warning and the UTC
fallback. The getter returned the whitespace verbatim and
RollupBuilder threaded it into Intl.DateTimeFormat, which
throws RangeError on the first rollup-maintenance pass.

Switch both gates to `!config.timezone?.trim()` so empty,
whitespace, null and undefined all fall through to UTC and
emit the one-time warning.

Finding: RD-FIX-1 (P2) from audit/pr516/round-design-B-new-bugs.md.
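
Pulling the three timezone commits above into one sketch; class and helper names here are illustrative, the PR's real wiring differs:

```ts
// Resolution order: per-conversation persisted tz, then trimmed config tz,
// then UTC with a one-time construction-side warning.
class EngineTz {
  constructor(
    private config: { timezone?: string | null },
    log: { warn: (msg: string) => void },
  ) {
    if (!config.timezone?.trim()) {
      // Warn once, at construction, so repeated getter calls stay silent.
      log.warn("config.timezone is unset or blank; falling back to UTC");
    }
  }

  get timezone(): string {
    // Whitespace-only strings fall through to UTC (RD-FIX-1).
    const tz = this.config.timezone;
    return tz && tz.trim() ? tz : "UTC";
  }
}

function resolveTimezone(
  store: { getTimezone(conversationId: string): string | null },
  engine: EngineTz,
  conversationId: string,
): string {
  // The persisted rollup-state row wins over the engine-wide default (B6).
  return store.getTimezone(conversationId) ?? engine.timezone;
}
```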
…olver

The optional pi-ai catalog seeding via mod.getModels was wired into
the resolver signature but the production call site always passed
undefined — pi-ai catalog seeding cannot run synchronously inside
createLcmDependencies. The mod-only branch is dead code that adds
function-surface noise.

Remove the parameter from the signature, the seeding branch from
the body, and the undefined arg from the call site. Drop the two
mod-specific tests (they were testing dead behavior). Add a
regression test pinning last-write-wins precedence across
cross-provider model-id collisions and document it in the JSDoc.

Finding: RD-FIX-2 (P3) from audit/pr516/round-design-B-new-bugs.md.
Also addresses RD-FIX (P3-2) precedence-undocumented gap.
Batch A added \`export\` to computeAutoDetailLevel so the test file
could import it directly. That widened the public API surface for
what is genuinely internal logic — a downstream consumer importing
it would lock in the signature and turn future changes into
breaking ones.

Drop the export keyword and add the function to the existing
__lcmRecentTestInternals namespace alongside the other test hooks
(resolvePeriod, extractRollupDigest, ...). Update the test to
import via that namespace. Behavior is unchanged.

Finding: RD-FIX-3 (P3) from audit/pr516/round-design-B-new-bugs.md.
…y drift

execute() observed wall-clock at two independent sites: resolvePeriod
(via deps.clock.now() → today day-key) and the currentDayKey gate at
the rollup-state-vs-current-day check. With a live clock, two reads a
few ms apart could land on different sides of local midnight,
producing an inconsistent view of "today" within a single tool call.

Capture deps.clock.now() exactly once at execute() entry as callTime
and thread the captured Date through both sites. To keep
resolvePeriod a pure function of its inputs, its signature changes
from (period, timezone, clock: Clock) to (period, timezone, now: Date)
— resolveWindowPeriod likewise. Tests in test/rollup-store-builder.test.ts
update accordingly.

Add a regression test pinning the contract: a stepping clock that
crosses local midnight between calls is consumed exactly ONCE per
lcm_recent invocation.

Finding: RD-FIX-4 (P3) from audit/pr516/round-design-B-new-bugs.md.
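
A sketch of the single-read contract this commit pins, with illustrative helper names:

```ts
interface Clock {
  now(): Date;
}

declare function resolvePeriod(period: string, timezone: string, now: Date): unknown;
declare function dayKeyFor(now: Date, timezone: string): string;

async function execute(
  deps: { clock: Clock },
  period: string,
  timezone: string,
) {
  // The ONLY wall-clock read in this invocation; both consumers below see
  // the same instant even if the call straddles local midnight.
  const callTime = deps.clock.now();
  const resolved = resolvePeriod(period, timezone, callTime);
  const currentDayKey = dayKeyFor(callTime, timezone);
  return { resolved, currentDayKey };
}
```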
100yenadmin force-pushed the feat/lcm-temporal-memory-consolidated branch from 1552e50 to 5902c47 on May 3, 2026 19:07
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 5, 2026
…ent rejection + 5 user scenarios

Per reviewer/operator feedback: the prior PR description was an architecture
dump that didn't explain why v4.1 is positively better than the rollup
approach in Martian-Engineering#516, didn't walk through user scenarios, and didn't reference
the lcm_recent rejection history.

This rewrite:

- Leads with "Why we threw out lcm_recent" explaining the three failure
  modes we hit: repetition, compression-of-compression, and the inability
  to ask sideways (topic-not-time) questions.
- Walks through 5 concrete user scenarios with before/after comparison:
  yesterday's work / paraphrastic rebase question / operator hard-forget /
  entity tracking ("all the work I've done with Voyage") / opt-in themes.
- Adds a cost discipline table (per-tier model dispatch is the lever).
- Adds "What v4.1 is NOT" (intentional non-goals: not RAG, not auto-rollups,
  not auto-tied to themes).
- Operator setup walkthrough with expected log lines.
- Architecture diagrams collapsed into <details> for reviewers who want
  the technical depth but skippable for first read.
- Final.review (ec99fd0) findings documented in adversarial review history.

Live-DB harness output (still PASSED) preserved as the smoking gun.
100yenadmin closed this May 6, 2026