feat(atlas): persist preferred task session reuse by HaD0Yun · Pull Request #2640 · code-yeongyu/oh-my-openagent

HaD0Yun · 2026-03-17T08:27:36Z

Summary

persist and prefer the last reusable subagent session for the current top-level Atlas plan task
keep Atlas continuation task-scoped by avoiding cross-task session bleed from plan advancement or explicit old-session follow-ups
surface the preferred reuse session back into Atlas continuation / verification guidance while preserving fresh-session fallback

Changes

extend persisted boulder state with optional task_sessions so Atlas can remember a preferred reusable subagent session per current top-level plan task
add a conservative top-level task parser that tracks the first unchecked top-level TODO / final-wave task without treating nested checkboxes as task boundaries
capture the current task key before delegation and reuse that stable key when persisting task-session state after task() completes
skip updating the current task mapping for explicit task(session_id=...) continuations, so old-session follow-ups do not poison the new task's preferred session
make Atlas prefer the tracked task session in completion / continuation reminders, while still falling back to a fresh session when no reusable session exists or reuse would cross task boundaries
tighten session-id extraction so Atlas prefers the authoritative trailing <task_metadata> block instead of incidental body text
add regression coverage for:
- missing task_sessions backward compatibility
- current top-level task parsing
- stable task-key persistence across checkbox advancement
- explicit old-session continuation not overwriting the current task mapping
- preferred reuse session hints in Atlas continuation output

Testing

Targeted local verification run before opening this PR:

bun run typecheck
bun run build
bun test src/features/boulder-state/storage.test.ts src/hooks/atlas/subagent-session-id.test.ts src/hooks/atlas/index.test.ts src/hooks/start-work/index.test.ts src/tools/delegate-task/sync-continuation.test.ts src/tools/delegate-task/background-continuation.test.ts src/tools/delegate-task/tools.test.ts src/agents/atlas/atlas-prompt.test.ts

Results:

bun run typecheck ✅
bun run build ✅
targeted bun test ... command above → 242 pass, 0 fail

Related Issues

This PR is intentionally scoped as a narrow Atlas continuity improvement rather than a broad session-architecture redesign.

Continuation metadata lineage

Boulder / session storage concurrency lineage

Atlas handoff / contract / structured-plan lineage

Atlas execution-quality background

Why this change is needed

Atlas already has the primitive needed for continuation: task(session_id=...) can resume an existing delegated session with preserved context. The missing piece is persistence and preference at the current top-level plan-task boundary. Today, persisted Atlas / boulder state is plan-wide, while task-scoped reuse depends too heavily on transient session output, in-memory bookkeeping, and the orchestrator reusing the right session_id at the right moment.

That gap shows up in two concrete ways:

the last good subagent session for the current task is not durably attached to that task in persisted state
once the plan advances or Atlas explicitly follows up on an older session, the current task can accidentally inherit the wrong reusable session unless the runtime is careful

This PR addresses that gap without trying to redesign Atlas into a general per-plan / per-session state machine.

Scope / non-goals

Persist and prefer the last subagent session for the current top-level plan task; fall back to a fresh session when no reusable session exists or reuse would cross task boundaries; explicitly not a broad per-plan/per-session redesign.

Here, “top-level plan task” is kept conservative: the current task unit Atlas is executing from its todo-list / structured-plan flow, not a broader undocumented global task graph.

Summary by cubic

Persist and prefer the last reusable subagent session for the current top‑level Atlas task, now hardened with lineage validation and ambiguity guards to improve continuity without cross‑task bleed. Preferred sessions are surfaced in continuation, completion, and verification prompts, with a safe fresh‑session fallback.

New Features
- Persist task_sessions in boulder.json and expose getTaskSessionState/upsertTaskSessionState.
- Capture the delegated task key at task() start and persist on completion, even if the plan advances.
- Skip persistence when task(session_id=...) is used or when multiple in‑flight calls claim the same task.
- Prefer the tracked session in completion gates, orchestrator reminders, and idle continuation prompts; add a “Preferred reuse session” hint.
- Validate extracted session IDs against the active boulder lineage before suggesting or persisting them.
Bug Fixes
- Tightened session reuse parsing: prefer the trailing <task_metadata> block, ignore body text, and handle missing IDs with a placeholder.
- Do not reuse explicitly resumed session IDs in follow‑up reminders for the current task.
- Clean up pending task refs on background launches to avoid stale state.

^{Written for commit 8859da5. Summary will update on new commits.}

HaD0Yun · 2026-03-17T08:27:45Z

I have read the CLA Document and I hereby sign the CLA

github-actions · 2026-03-17T08:27:47Z

All contributors have signed the CLA. Thank you! ✅
_{Posted by the CLA Assistant Lite bot.}

cubic-dev-ai

2 issues found across 14 files

Confidence score: 3/5

There is some merge risk here: both reported issues are medium severity (6/10) with high confidence (9/10), and each can cause concrete runtime misbehavior rather than cosmetic problems.
In src/hooks/atlas/subagent-session-id.ts, the task_metadata regex may span closing tags and pick up an incidental session_id, which can cause the true metadata session ID to be missed and lead to incorrect session association.
In src/hooks/atlas/tool-execute-after.ts, early return on background task launches can skip pendingTaskRefs.delete(callID), creating a memory-growth risk from stale map entries over time.
Pay close attention to src/hooks/atlas/subagent-session-id.ts and src/hooks/atlas/tool-execute-after.ts - session ID parsing correctness and pendingTaskRefs cleanup on background launches.

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/hooks/atlas/subagent-session-id.ts">

<violation number="1" location="src/hooks/atlas/subagent-session-id.ts:2">
P2: `task_metadata` regex can cross closing tags, causing extraction of incidental `session_id` outside metadata and skipping the real metadata session ID.</violation>
</file>

<file name="src/hooks/atlas/tool-execute-after.ts">

<violation number="1" location="src/hooks/atlas/tool-execute-after.ts:84">
P2: `pendingTaskRefs` entries are not cleaned up on background task launches because the function returns before `pendingTaskRefs.delete(callID)` executes, causing unbounded map growth over time.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

src/hooks/atlas/subagent-session-id.ts

src/hooks/atlas/tool-execute-after.ts

HaD0Yun · 2026-03-17T09:14:58Z

Thanks — I addressed the 2 issues identified by cubic and pushed a follow-up commit (8adf6a2c).

What changed:

tightened task_metadata session extraction so Atlas now parses the last explicit <task_metadata>...</task_metadata> block first, then only falls back to Session ID: output
added a regression test covering incidental session_id text outside the metadata block
cleaned up pendingTaskRefs before the background-launch early return path so background task launches do not leave stale entries behind
added a regression test covering pending task-ref cleanup on background launches

Fresh verification after the fix:

bun run typecheck
bun run build
bun test src/features/boulder-state/storage.test.ts src/hooks/atlas/subagent-session-id.test.ts src/hooks/atlas/index.test.ts src/hooks/start-work/index.test.ts src/tools/delegate-task/sync-continuation.test.ts src/tools/delegate-task/background-continuation.test.ts src/tools/delegate-task/tools.test.ts src/agents/atlas/atlas-prompt.test.ts

Result: 244 pass, 0 fail

@cubic-dev-ai re-run review please.

cubic-dev-ai · 2026-03-17T09:15:21Z

Thanks — I addressed the 2 issues identified by cubic and pushed a follow-up commit (8adf6a2c).

What changed:

tightened task_metadata session extraction so Atlas now parses the last explicit <task_metadata>...</task_metadata> block first, then only falls back to Session ID: output

added a regression test covering incidental session_id text outside the metadata block
...

@HaD0Yun I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

No issues found across 14 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

_{Auto-approved: Defensive implementation with backward compatibility for boulder state, extensive regression tests, and targeted logic that improves session continuity without breaking existing flows.}

code-yeongyu

Hey @HaD0Yun! 👋 Thanks for putting this together — the core idea of persisting task-scoped session reuse for Atlas is really solid, and I can see you've thought carefully about the edge cases (plan advancement, explicit resume guard, backward compat). Tests are comprehensive and all green (116 pass).

That said, I found a few issues that I think we should address before merging. I've left inline comments on each file — please take a look!

TL;DR — 3 blocking issues:

Explicit resume path leaks the wrong preferred session into reminders (tool-execute-after.ts)
Parallel task() calls all bind to the same task key (tool-execute-before.ts)
Session ID parsed from model output is trusted without validation (subagent-session-id.ts → storage.ts)

Everything else looks great. The type design, backward compat handling, and test coverage are all well done. Let's get these three sorted and this is good to go! 🚀

code-yeongyu · 2026-03-18T08:15:10Z

📁 `src/hooks/atlas/tool-execute-after.ts` — Blocking

Hey! This is the main one I wanted to flag. Really nice work on the resolvePreferredSessionId helper and the pendingTaskRef flow — the sequential happy path is clean.

The issue: When Atlas explicitly resumes an older session via task(session_id="ses_old"), the before-hook correctly sets pendingTaskRef = null to skip persistence. But in the after-hook, resolvePreferredSessionId(currentSessionId, trackedSessionId) still picks up currentSessionId (which is the old resumed session), and that gets fed into buildCompletionGate() / buildOrchestratorReminder().

So the immediate reminder after an explicit resume tells Atlas: "reuse ses_old" — which is exactly the cross-task bleed we're trying to prevent!

Suggested fix:

// When pendingTaskRef === null (explicit resume), don't let the resumed
// session become the preferred session for the CURRENT task.
const preferredSessionId = pendingTaskRef === null
  ? resolvePreferredSessionId(undefined, trackedTaskSession?.session_id)
  : resolvePreferredSessionId(subagentSessionId, trackedTaskSession?.session_id)

This way, explicit resumes still work fine, but the reminder points to the correct tracked session (or falls back to "<session_id>").

Would love to see a regression test for this too — something like: "after explicit resume of old session, reminder should NOT contain the old session ID as preferred".

code-yeongyu · 2026-03-18T08:15:19Z

📁 `src/hooks/atlas/tool-execute-before.ts` — Blocking (but scopeable)

The task-ref capture logic here is clever — snapshotting the current top-level task before delegation so it stays stable even when the plan advances. Nice!

The concern: readCurrentTopLevelTask() always returns the first unchecked top-level task. Atlas's agent instructions explicitly allow multiple task() calls in one message (parallel delegation). If Atlas fires off tasks for TODO items 3, 4, and 5 simultaneously, all three capture todo:3 as their task key. When they complete, they overwrite each other's task_sessions entry.

Options (pick what feels right for this PR's scope):

Scope it out explicitly — Add a code comment acknowledging this as a known limitation, create a follow-up issue. Since Atlas's primary flow is sequential task delegation, this is pragmatic.

Skip persistence when ambiguous — If there are already pending task refs for the same key, skip persistence for subsequent ones. Something like:

const existingRefForKey = [...pendingTaskRefs.values()].some(
  ref => ref?.key === currentTask.key
)
if (existingRefForKey) {
  // Another task() call already claimed this key — skip to avoid collision
  pendingTaskRefs.set(toolInput.callID, null)
}

I'd be fine with option 1 honestly — it keeps this PR focused. Just want to make sure we're conscious of it.

code-yeongyu · 2026-03-18T08:15:29Z

📁 `src/hooks/atlas/subagent-session-id.ts` — Blocking (security)

The tightened parsing here is a big improvement over the old version — preferring <task_metadata> over incidental body text is the right call, and the test coverage for edge cases (body text vs metadata, post-tag bleed) is excellent.

The security concern: The session ID is ultimately parsed from model-generated text. A prompt-injected or misbehaving subagent could emit:

<task_metadata>
session_id: ses_some_unrelated_session
</task_metadata>

This gets persisted to boulder.json via upsertTaskSessionState(), then surfaced in reuse hints — potentially steering Atlas to resume an unrelated session.

Possible mitigations (in order of effort):

Cheapest: Validate the extracted session ID actually exists and belongs to the current orchestrator's session tree before persisting. Something like:
```
const session = await ctx.client.session.get({ path: { id: extractedId } })
if (!session?.data) { /* skip persistence */ }
```
Better: Source the session ID from the tool's runtime metadata (if available) rather than parsing output text. The background-agent spawner already knows the session ID it created.
Best (but bigger scope): Have the task tool include session_id in its structured metadata output, not just in the text body. Then tool-execute-after reads toolOutput.metadata.session_id directly.

For this PR, option 1 would be sufficient I think. Or if you'd rather scope it out, at minimum add a // TODO: validate session ownership before persisting comment so we don't forget.

code-yeongyu · 2026-03-18T08:15:37Z

📁 `src/features/boulder-state/top-level-task.ts` — Non-blocking, suggestion

Really clean implementation! The regex patterns are well-named, the section state machine is easy to follow, and skipping nested checkboxes via indentation check is smart.

Two small things:

Parser duplication: There's an existing final-wave-plan-state.ts that also parses ## TODOs / ## Final Verification Wave sections with slightly different semantics. Right now it's fine, but if the plan format ever changes, these two parsers could drift apart silently. Maybe worth a comment like // See also: final-wave-plan-state.ts for related parsing so future contributors know to update both?
Regex for session IDs: The new extractSessionIdFromOutput uses ses_[a-zA-Z0-9_]+, but src/hooks/task-resume-info/hook.ts uses ses_[a-zA-Z0-9_-]+ (includes hyphens). If any session IDs contain hyphens, they'd be truncated here. Might want to align the regex patterns — easy one-char fix: ses_[a-zA-Z0-9_-]+

Neither of these is blocking, just worth keeping in mind! 🙂

code-yeongyu · 2026-03-18T08:15:40Z

📁 `src/features/boulder-state/types.ts` — Looks great ✅

The type design is clean:

TaskSessionState captures exactly what's needed for session reuse
TopLevelTaskRef is a nice abstraction for the plan parser output
task_sessions being optional on BoulderState ensures backward compat
All persisted keys are snake_case ✅

No notes here — well done!

code-yeongyu · 2026-03-18T08:15:44Z

📁 `src/features/boulder-state/storage.ts` — Non-blocking note

The upsertTaskSessionState follows the existing appendSessionId pattern perfectly — read-modify-write with fallback to null on failure. Consistent and easy to follow.

Minor note: Both this function and appendSessionId do read-modify-write without any file locking. This isn't a regression (same pattern already exists), but concurrent task completions could theoretically lose entries. Just flagging for awareness — probably worth a follow-up issue alongside the per-plan boulder redesign (#2233 / #2050).

The task_sessions defaulting to {} in readBoulderState is the right call for backward compat. 👍

code-yeongyu · 2026-03-18T08:15:47Z

📁 `src/hooks/atlas/verification-reminders.ts` — Looks good ✅

buildReuseHint() is a clean addition. The hint text is clear and actionable for the orchestrator. Injecting it into both the completion gate and standalone verification reminder makes sense.

One tiny thought: once the blocking issue in tool-execute-after.ts is fixed (explicit resume path), the hint will always show the correct session — so no changes needed here, just making sure the upstream data is right.

code-yeongyu · 2026-03-18T08:15:54Z

📁 Tests (`index.test.ts`, `subagent-session-id.test.ts`, `storage.test.ts`) — Great coverage ✅

Really impressed with the test coverage here:

Stable task key test (plan advances before task completes) — this is exactly the kind of race condition test that catches real bugs
Cross-task resume guard (explicit session_id doesn't poison current task) — perfect
Session ID extraction (metadata vs body text, post-tag bleed) — thorough
Backward compat (missing task_sessions defaults to {}) — necessary and correct

One small thing: I noticed a couple of lines in the existing test had their indentation changed from 6 spaces to 5 spaces (the PHASE 1 / PHASE 2 assertions). Looks accidental — might want to revert those to keep the diff clean.

Missing test suggestion: After fixing the explicit-resume reminder issue, add a test that asserts: "when Atlas explicitly resumes an old session, the verification reminder should NOT contain the old session ID as the preferred reuse session."

Overall, really solid test work! 🎯

HaD0Yun · 2026-03-18T08:31:36Z

Thanks for the careful review — I pushed a follow-up that addresses the blocking points you called out.

Changes in this update:

explicit task(session_id=...) resumes no longer leak the resumed session into the current task's reuse hint
parallel in-flight task() calls that would claim the same top-level task key now skip persistence instead of colliding
extracted subagent session_ids are now validated against the active session lineage before being persisted or surfaced in reminders

I also added regressions for each case and re-ran:

bun test src/hooks/atlas
bun run typecheck
bun run build

cubic-dev-ai

0 issues found across 8 files (changes from recent commits).

_{Requires human review: Large PR (1k+ lines) involving complex state persistence and regex-based markdown parsing; cannot guarantee 100% zero regressions without manual review.}

feat(atlas): persist preferred task session reuse

5c61943

github-actions bot added a commit that referenced this pull request Mar 17, 2026

@HaD0Yun has signed the CLA in #2640

d808338

cubic-dev-ai bot reviewed Mar 17, 2026

View reviewed changes

src/hooks/atlas/subagent-session-id.ts Outdated Show resolved Hide resolved

src/hooks/atlas/tool-execute-after.ts Outdated Show resolved Hide resolved

fix(atlas): tighten session reuse metadata parsing

8adf6a2

cubic-dev-ai bot approved these changes Mar 17, 2026

View reviewed changes

code-yeongyu requested changes Mar 18, 2026

View reviewed changes

fix(atlas): harden task session reuse

8859da5

cubic-dev-ai bot reviewed Mar 18, 2026

View reviewed changes

code-yeongyu merged commit 33ef4db into code-yeongyu:dev Mar 18, 2026
3 checks passed

code-yeongyu mentioned this pull request Mar 18, 2026

fix(atlas): address review findings for task session reuse #2676

Merged

Conversation

HaD0Yun commented Mar 17, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Testing

Related Issues

Continuation metadata lineage

Boulder / session storage concurrency lineage

Atlas handoff / contract / structured-plan lineage

Atlas execution-quality background

Why this change is needed

Scope / non-goals

Summary by cubic

Uh oh!

HaD0Yun commented Mar 17, 2026

Uh oh!

github-actions bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

HaD0Yun commented Mar 17, 2026

Uh oh!

cubic-dev-ai bot commented Mar 17, 2026

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

code-yeongyu left a comment

Choose a reason for hiding this comment

Uh oh!

code-yeongyu commented Mar 18, 2026

📁 src/hooks/atlas/tool-execute-after.ts — Blocking

Uh oh!

code-yeongyu commented Mar 18, 2026

📁 src/hooks/atlas/tool-execute-before.ts — Blocking (but scopeable)

Uh oh!

code-yeongyu commented Mar 18, 2026

📁 src/hooks/atlas/subagent-session-id.ts — Blocking (security)

Uh oh!

code-yeongyu commented Mar 18, 2026

📁 src/features/boulder-state/top-level-task.ts — Non-blocking, suggestion

Uh oh!

code-yeongyu commented Mar 18, 2026

📁 src/features/boulder-state/types.ts — Looks great ✅

Uh oh!

code-yeongyu commented Mar 18, 2026

📁 src/features/boulder-state/storage.ts — Non-blocking note

Uh oh!

code-yeongyu commented Mar 18, 2026

📁 src/hooks/atlas/verification-reminders.ts — Looks good ✅

Uh oh!

code-yeongyu commented Mar 18, 2026

📁 Tests (index.test.ts, subagent-session-id.test.ts, storage.test.ts) — Great coverage ✅

Uh oh!

HaD0Yun commented Mar 18, 2026

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

HaD0Yun commented Mar 17, 2026 •

edited by cubic-dev-ai bot

Loading

github-actions bot commented Mar 17, 2026 •

edited

Loading

📁 `src/hooks/atlas/tool-execute-after.ts` — Blocking

📁 `src/hooks/atlas/tool-execute-before.ts` — Blocking (but scopeable)

📁 `src/hooks/atlas/subagent-session-id.ts` — Blocking (security)

📁 `src/features/boulder-state/top-level-task.ts` — Non-blocking, suggestion

📁 `src/features/boulder-state/types.ts` — Looks great ✅

📁 `src/features/boulder-state/storage.ts` — Non-blocking note

📁 `src/hooks/atlas/verification-reminders.ts` — Looks good ✅

📁 Tests (`index.test.ts`, `subagent-session-id.test.ts`, `storage.test.ts`) — Great coverage ✅