Skip to content

feat(atlas): persist preferred task session reuse#2640

Merged
code-yeongyu merged 3 commits intocode-yeongyu:devfrom
HaD0Yun:had0yun/atlas-task-session-reuse
Mar 18, 2026
Merged

feat(atlas): persist preferred task session reuse#2640
code-yeongyu merged 3 commits intocode-yeongyu:devfrom
HaD0Yun:had0yun/atlas-task-session-reuse

Conversation

@HaD0Yun
Copy link
Copy Markdown
Contributor

@HaD0Yun HaD0Yun commented Mar 17, 2026

Summary

  • persist and prefer the last reusable subagent session for the current top-level Atlas plan task
  • keep Atlas continuation task-scoped by avoiding cross-task session bleed from plan advancement or explicit old-session follow-ups
  • surface the preferred reuse session back into Atlas continuation / verification guidance while preserving fresh-session fallback

Changes

  • extend persisted boulder state with optional task_sessions so Atlas can remember a preferred reusable subagent session per current top-level plan task
  • add a conservative top-level task parser that tracks the first unchecked top-level TODO / final-wave task without treating nested checkboxes as task boundaries
  • capture the current task key before delegation and reuse that stable key when persisting task-session state after task() completes
  • skip updating the current task mapping for explicit task(session_id=...) continuations, so old-session follow-ups do not poison the new task's preferred session
  • make Atlas prefer the tracked task session in completion / continuation reminders, while still falling back to a fresh session when no reusable session exists or reuse would cross task boundaries
  • tighten session-id extraction so Atlas prefers the authoritative trailing <task_metadata> block instead of incidental body text
  • add regression coverage for:
    • missing task_sessions backward compatibility
    • current top-level task parsing
    • stable task-key persistence across checkbox advancement
    • explicit old-session continuation not overwriting the current task mapping
    • preferred reuse session hints in Atlas continuation output

Testing

Targeted local verification run before opening this PR:

bun run typecheck
bun run build
bun test src/features/boulder-state/storage.test.ts src/hooks/atlas/subagent-session-id.test.ts src/hooks/atlas/index.test.ts src/hooks/start-work/index.test.ts src/tools/delegate-task/sync-continuation.test.ts src/tools/delegate-task/background-continuation.test.ts src/tools/delegate-task/tools.test.ts src/agents/atlas/atlas-prompt.test.ts

Results:

  • bun run typecheck
  • bun run build
  • targeted bun test ... command above → 242 pass, 0 fail

Related Issues

This PR is intentionally scoped as a narrow Atlas continuity improvement rather than a broad session-architecture redesign.

Continuation metadata lineage

Boulder / session storage concurrency lineage

Atlas handoff / contract / structured-plan lineage

Atlas execution-quality background

Why this change is needed

Atlas already has the primitive needed for continuation: task(session_id=...) can resume an existing delegated session with preserved context. The missing piece is persistence and preference at the current top-level plan-task boundary. Today, persisted Atlas / boulder state is plan-wide, while task-scoped reuse depends too heavily on transient session output, in-memory bookkeeping, and the orchestrator reusing the right session_id at the right moment.

That gap shows up in two concrete ways:

  1. the last good subagent session for the current task is not durably attached to that task in persisted state
  2. once the plan advances or Atlas explicitly follows up on an older session, the current task can accidentally inherit the wrong reusable session unless the runtime is careful

This PR addresses that gap without trying to redesign Atlas into a general per-plan / per-session state machine.

Scope / non-goals

Persist and prefer the last subagent session for the current top-level plan task; fall back to a fresh session when no reusable session exists or reuse would cross task boundaries; explicitly not a broad per-plan/per-session redesign.

Here, “top-level plan task” is kept conservative: the current task unit Atlas is executing from its todo-list / structured-plan flow, not a broader undocumented global task graph.


Summary by cubic

Persist and prefer the last reusable subagent session for the current top‑level Atlas task, now hardened with lineage validation and ambiguity guards to improve continuity without cross‑task bleed. Preferred sessions are surfaced in continuation, completion, and verification prompts, with a safe fresh‑session fallback.

  • New Features

    • Persist task_sessions in boulder.json and expose getTaskSessionState/upsertTaskSessionState.
    • Capture the delegated task key at task() start and persist on completion, even if the plan advances.
    • Skip persistence when task(session_id=...) is used or when multiple in‑flight calls claim the same task.
    • Prefer the tracked session in completion gates, orchestrator reminders, and idle continuation prompts; add a “Preferred reuse session” hint.
    • Validate extracted session IDs against the active boulder lineage before suggesting or persisting them.
  • Bug Fixes

    • Tightened session reuse parsing: prefer the trailing <task_metadata> block, ignore body text, and handle missing IDs with a placeholder.
    • Do not reuse explicitly resumed session IDs in follow‑up reminders for the current task.
    • Clean up pending task refs on background launches to avoid stale state.

Written for commit 8859da5. Summary will update on new commits.

@HaD0Yun
Copy link
Copy Markdown
Contributor Author

HaD0Yun commented Mar 17, 2026

I have read the CLA Document and I hereby sign the CLA

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 17, 2026

All contributors have signed the CLA. Thank you! ✅
Posted by the CLA Assistant Lite bot.

github-actions bot added a commit that referenced this pull request Mar 17, 2026
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 14 files

Confidence score: 3/5

  • There is some merge risk here: both reported issues are medium severity (6/10) with high confidence (9/10), and each can cause concrete runtime misbehavior rather than cosmetic problems.
  • In src/hooks/atlas/subagent-session-id.ts, the task_metadata regex may span closing tags and pick up an incidental session_id, which can cause the true metadata session ID to be missed and lead to incorrect session association.
  • In src/hooks/atlas/tool-execute-after.ts, early return on background task launches can skip pendingTaskRefs.delete(callID), creating a memory-growth risk from stale map entries over time.
  • Pay close attention to src/hooks/atlas/subagent-session-id.ts and src/hooks/atlas/tool-execute-after.ts - session ID parsing correctness and pendingTaskRefs cleanup on background launches.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/hooks/atlas/subagent-session-id.ts">

<violation number="1" location="src/hooks/atlas/subagent-session-id.ts:2">
P2: `task_metadata` regex can cross closing tags, causing extraction of incidental `session_id` outside metadata and skipping the real metadata session ID.</violation>
</file>

<file name="src/hooks/atlas/tool-execute-after.ts">

<violation number="1" location="src/hooks/atlas/tool-execute-after.ts:84">
P2: `pendingTaskRefs` entries are not cleaned up on background task launches because the function returns before `pendingTaskRefs.delete(callID)` executes, causing unbounded map growth over time.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@HaD0Yun
Copy link
Copy Markdown
Contributor Author

HaD0Yun commented Mar 17, 2026

Thanks — I addressed the 2 issues identified by cubic and pushed a follow-up commit (8adf6a2c).

What changed:

  • tightened task_metadata session extraction so Atlas now parses the last explicit <task_metadata>...</task_metadata> block first, then only falls back to Session ID: output
  • added a regression test covering incidental session_id text outside the metadata block
  • cleaned up pendingTaskRefs before the background-launch early return path so background task launches do not leave stale entries behind
  • added a regression test covering pending task-ref cleanup on background launches

Fresh verification after the fix:

bun run typecheck
bun run build
bun test src/features/boulder-state/storage.test.ts src/hooks/atlas/subagent-session-id.test.ts src/hooks/atlas/index.test.ts src/hooks/start-work/index.test.ts src/tools/delegate-task/sync-continuation.test.ts src/tools/delegate-task/background-continuation.test.ts src/tools/delegate-task/tools.test.ts src/agents/atlas/atlas-prompt.test.ts

Result: 244 pass, 0 fail

@cubic-dev-ai re-run review please.

@cubic-dev-ai
Copy link
Copy Markdown

cubic-dev-ai bot commented Mar 17, 2026

Thanks — I addressed the 2 issues identified by cubic and pushed a follow-up commit (8adf6a2c).

What changed:

  • tightened task_metadata session extraction so Atlas now parses the last explicit <task_metadata>...</task_metadata> block first, then only falls back to Session ID: output
  • added a regression test covering incidental session_id text outside the metadata block
    ...

@HaD0Yun I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 14 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Auto-approved: Defensive implementation with backward compatibility for boulder state, extensive regression tests, and targeted logic that improves session continuity without breaking existing flows.

Copy link
Copy Markdown
Owner

@code-yeongyu code-yeongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @HaD0Yun! 👋 Thanks for putting this together — the core idea of persisting task-scoped session reuse for Atlas is really solid, and I can see you've thought carefully about the edge cases (plan advancement, explicit resume guard, backward compat). Tests are comprehensive and all green (116 pass).

That said, I found a few issues that I think we should address before merging. I've left inline comments on each file — please take a look!

TL;DR — 3 blocking issues:

  1. Explicit resume path leaks the wrong preferred session into reminders (tool-execute-after.ts)
  2. Parallel task() calls all bind to the same task key (tool-execute-before.ts)
  3. Session ID parsed from model output is trusted without validation (subagent-session-id.tsstorage.ts)

Everything else looks great. The type design, backward compat handling, and test coverage are all well done. Let's get these three sorted and this is good to go! 🚀

@code-yeongyu
Copy link
Copy Markdown
Owner

📁 src/hooks/atlas/tool-execute-after.ts — Blocking

Hey! This is the main one I wanted to flag. Really nice work on the resolvePreferredSessionId helper and the pendingTaskRef flow — the sequential happy path is clean.

The issue: When Atlas explicitly resumes an older session via task(session_id="ses_old"), the before-hook correctly sets pendingTaskRef = null to skip persistence. But in the after-hook, resolvePreferredSessionId(currentSessionId, trackedSessionId) still picks up currentSessionId (which is the old resumed session), and that gets fed into buildCompletionGate() / buildOrchestratorReminder().

So the immediate reminder after an explicit resume tells Atlas: "reuse ses_old" — which is exactly the cross-task bleed we're trying to prevent!

Suggested fix:

// When pendingTaskRef === null (explicit resume), don't let the resumed
// session become the preferred session for the CURRENT task.
const preferredSessionId = pendingTaskRef === null
  ? resolvePreferredSessionId(undefined, trackedTaskSession?.session_id)
  : resolvePreferredSessionId(subagentSessionId, trackedTaskSession?.session_id)

This way, explicit resumes still work fine, but the reminder points to the correct tracked session (or falls back to "<session_id>").

Would love to see a regression test for this too — something like: "after explicit resume of old session, reminder should NOT contain the old session ID as preferred".

@code-yeongyu
Copy link
Copy Markdown
Owner

📁 src/hooks/atlas/tool-execute-before.ts — Blocking (but scopeable)

The task-ref capture logic here is clever — snapshotting the current top-level task before delegation so it stays stable even when the plan advances. Nice!

The concern: readCurrentTopLevelTask() always returns the first unchecked top-level task. Atlas's agent instructions explicitly allow multiple task() calls in one message (parallel delegation). If Atlas fires off tasks for TODO items 3, 4, and 5 simultaneously, all three capture todo:3 as their task key. When they complete, they overwrite each other's task_sessions entry.

Options (pick what feels right for this PR's scope):

  1. Scope it out explicitly — Add a code comment acknowledging this as a known limitation, create a follow-up issue. Since Atlas's primary flow is sequential task delegation, this is pragmatic.

  2. Skip persistence when ambiguous — If there are already pending task refs for the same key, skip persistence for subsequent ones. Something like:

    const existingRefForKey = [...pendingTaskRefs.values()].some(
      ref => ref?.key === currentTask.key
    )
    if (existingRefForKey) {
      // Another task() call already claimed this key — skip to avoid collision
      pendingTaskRefs.set(toolInput.callID, null)
    }

I'd be fine with option 1 honestly — it keeps this PR focused. Just want to make sure we're conscious of it.

@code-yeongyu
Copy link
Copy Markdown
Owner

📁 src/hooks/atlas/subagent-session-id.ts — Blocking (security)

The tightened parsing here is a big improvement over the old version — preferring <task_metadata> over incidental body text is the right call, and the test coverage for edge cases (body text vs metadata, post-tag bleed) is excellent.

The security concern: The session ID is ultimately parsed from model-generated text. A prompt-injected or misbehaving subagent could emit:

<task_metadata>
session_id: ses_some_unrelated_session
</task_metadata>

This gets persisted to boulder.json via upsertTaskSessionState(), then surfaced in reuse hints — potentially steering Atlas to resume an unrelated session.

Possible mitigations (in order of effort):

  1. Cheapest: Validate the extracted session ID actually exists and belongs to the current orchestrator's session tree before persisting. Something like:

    const session = await ctx.client.session.get({ path: { id: extractedId } })
    if (!session?.data) { /* skip persistence */ }
  2. Better: Source the session ID from the tool's runtime metadata (if available) rather than parsing output text. The background-agent spawner already knows the session ID it created.

  3. Best (but bigger scope): Have the task tool include session_id in its structured metadata output, not just in the text body. Then tool-execute-after reads toolOutput.metadata.session_id directly.

For this PR, option 1 would be sufficient I think. Or if you'd rather scope it out, at minimum add a // TODO: validate session ownership before persisting comment so we don't forget.

@code-yeongyu
Copy link
Copy Markdown
Owner

📁 src/features/boulder-state/top-level-task.ts — Non-blocking, suggestion

Really clean implementation! The regex patterns are well-named, the section state machine is easy to follow, and skipping nested checkboxes via indentation check is smart.

Two small things:

  1. Parser duplication: There's an existing final-wave-plan-state.ts that also parses ## TODOs / ## Final Verification Wave sections with slightly different semantics. Right now it's fine, but if the plan format ever changes, these two parsers could drift apart silently. Maybe worth a comment like // See also: final-wave-plan-state.ts for related parsing so future contributors know to update both?

  2. Regex for session IDs: The new extractSessionIdFromOutput uses ses_[a-zA-Z0-9_]+, but src/hooks/task-resume-info/hook.ts uses ses_[a-zA-Z0-9_-]+ (includes hyphens). If any session IDs contain hyphens, they'd be truncated here. Might want to align the regex patterns — easy one-char fix: ses_[a-zA-Z0-9_-]+

Neither of these is blocking, just worth keeping in mind! 🙂

@code-yeongyu
Copy link
Copy Markdown
Owner

📁 src/features/boulder-state/types.ts — Looks great ✅

The type design is clean:

  • TaskSessionState captures exactly what's needed for session reuse
  • TopLevelTaskRef is a nice abstraction for the plan parser output
  • task_sessions being optional on BoulderState ensures backward compat
  • All persisted keys are snake_case

No notes here — well done!

@code-yeongyu
Copy link
Copy Markdown
Owner

📁 src/features/boulder-state/storage.ts — Non-blocking note

The upsertTaskSessionState follows the existing appendSessionId pattern perfectly — read-modify-write with fallback to null on failure. Consistent and easy to follow.

Minor note: Both this function and appendSessionId do read-modify-write without any file locking. This isn't a regression (same pattern already exists), but concurrent task completions could theoretically lose entries. Just flagging for awareness — probably worth a follow-up issue alongside the per-plan boulder redesign (#2233 / #2050).

The task_sessions defaulting to {} in readBoulderState is the right call for backward compat. 👍

@code-yeongyu
Copy link
Copy Markdown
Owner

📁 src/hooks/atlas/verification-reminders.ts — Looks good ✅

buildReuseHint() is a clean addition. The hint text is clear and actionable for the orchestrator. Injecting it into both the completion gate and standalone verification reminder makes sense.

One tiny thought: once the blocking issue in tool-execute-after.ts is fixed (explicit resume path), the hint will always show the correct session — so no changes needed here, just making sure the upstream data is right.

@code-yeongyu
Copy link
Copy Markdown
Owner

📁 Tests (index.test.ts, subagent-session-id.test.ts, storage.test.ts) — Great coverage ✅

Really impressed with the test coverage here:

  • Stable task key test (plan advances before task completes) — this is exactly the kind of race condition test that catches real bugs
  • Cross-task resume guard (explicit session_id doesn't poison current task) — perfect
  • Session ID extraction (metadata vs body text, post-tag bleed) — thorough
  • Backward compat (missing task_sessions defaults to {}) — necessary and correct

One small thing: I noticed a couple of lines in the existing test had their indentation changed from 6 spaces to 5 spaces (the PHASE 1 / PHASE 2 assertions). Looks accidental — might want to revert those to keep the diff clean.

Missing test suggestion: After fixing the explicit-resume reminder issue, add a test that asserts: "when Atlas explicitly resumes an old session, the verification reminder should NOT contain the old session ID as the preferred reuse session."

Overall, really solid test work! 🎯

@HaD0Yun
Copy link
Copy Markdown
Contributor Author

HaD0Yun commented Mar 18, 2026

Thanks for the careful review — I pushed a follow-up that addresses the blocking points you called out.

Changes in this update:

  • explicit task(session_id=...) resumes no longer leak the resumed session into the current task's reuse hint
  • parallel in-flight task() calls that would claim the same top-level task key now skip persistence instead of colliding
  • extracted subagent session_ids are now validated against the active session lineage before being persisted or surfaced in reminders

I also added regressions for each case and re-ran:

  • bun test src/hooks/atlas
  • bun run typecheck
  • bun run build

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 8 files (changes from recent commits).

Requires human review: Large PR (1k+ lines) involving complex state persistence and regex-based markdown parsing; cannot guarantee 100% zero regressions without manual review.

@code-yeongyu code-yeongyu merged commit 33ef4db into code-yeongyu:dev Mar 18, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants