Skip to content

feat: sub agents#1685

Merged
yottahmd merged 26 commits intomainfrom
feat-subagent
Feb 18, 2026
Merged

feat: sub agents#1685
yottahmd merged 26 commits intomainfrom
feat-subagent

Conversation

@yottahmd
Copy link
Copy Markdown
Collaborator

@yottahmd yottahmd commented Feb 17, 2026

Summary by CodeRabbit

  • New Features

    • Added comprehensive agent session management with create, list, retrieve, and chat endpoints.
    • Introduced delegate tool enabling agents to spawn and manage multiple parallel sub-agents for concurrent task execution.
    • Added agent tool discovery and listing via new settings endpoint.
    • Implemented session cancellation and user prompt response capabilities.
    • Added UI panels for monitoring and interacting with delegate agents.
  • Tests

    • Expanded test coverage for agent sessions, delegation workflows, and tool registry.
    • Added testing infrastructure for UI components.

@yottahmd yottahmd changed the title feat: sub-agent feat: sub agents Feb 17, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 17, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This pull request introduces a delegate tool enabling agents to spawn parallel sub-agents, reorganizes the agent API from per-endpoint handlers to high-level session/tool methods, establishes a tool registry system for dynamic tool initialization, extends agent types to support parent-child session relationships, and adds frontend UI components to manage delegate sessions alongside chat interactions.

Changes

Cohort / File(s) Summary
Agent OpenAPI Schema
api/v1/api.yaml, ui/src/api/v1/schema.ts
Added 7 new public endpoints for agent session management (create, list, get, chat, cancel, respond) and tools listing; introduced 20+ new schema types for agent sessions, tool calls, delegate tracking, and messaging.
Agent Core API & Session Management
internal/agent/api.go, internal/agent/api_test.go, internal/agent/session.go, internal/agent/session_test.go
Reworked API surface from per-endpoint handlers to public methods (CreateSession, ListSessions, GetSessionDetail, SendMessage, CancelSession, SubmitUserResponse); added session delegation fields and sub-session registration hooks; extended SessionManager with delegation context and external message recording.
Agent Delegation & Tool Execution
internal/agent/delegate.go, internal/agent/delegate_test.go, internal/agent/loop.go, internal/agent/loop_test.go, internal/agent/types.go
Implemented delegate tool for parallel sub-agent spawning with task validation, concurrent execution capping, sub-session lifecycle management, and cost aggregation; extended Loop with delegation fields (SessionStore, RegisterSubSession, NotifyParent, AddCost); added DelegateContext and delegate-related types to ToolContext and StreamResponse.
Agent Tool Registry & Initialization
internal/agent/tool_registry.go, internal/agent/tool_registry_test.go, internal/agent/tools.go, internal/agent/tools_test.go, internal/agent/{ask_user,bash,navigate,patch,read,read_schema,think,web_search}.go
Established centralized tool registry with metadata (Name, Label, Description, DefaultEnabled) and factory-based tool creation; converted all tools to register via init functions; refactored CreateTools to dynamically construct tools from registry.
Agent Configuration & Policy
internal/agent/model_config.go, internal/agent/policy.go, internal/agent/policy_test.go, internal/agent/errors.go, internal/agent/system_prompt.txt
Converted hardcoded tool lists to registry-based queries; delegated tool name/policy lookups to registry; added 6 new error variables for validation and state management; documented delegate tool in system prompt.
Agent Session Persistence
internal/persis/filesession/store.go, internal/persis/filesession/store_test.go
Extended SessionForStorage with parent-child relationship fields (ParentSessionID, DelegateTask); updated conversion methods and added tests for sub-session round-trip persistence and filtering.
Frontend API Service Layer
internal/service/frontend/api/v1/agent_config.go, internal/service/frontend/api/v1/agent_sessions.go, internal/service/frontend/api/v1/api.go, internal/service/frontend/server.go
Added ListAgentTools endpoint and 6 session-management endpoints (CreateAgentSession, ListAgentSessions, GetAgentSession, ChatAgentSession, CancelAgentSession, RespondAgentSession) with user context extraction, error mapping, and API model conversion helpers; integrated agent API via dependency injection.
Frontend UI Components
ui/src/features/agent/components/AgentChatModal.tsx, ui/src/features/agent/components/ChatMessages.tsx, ui/src/features/agent/components/DelegatePanel.tsx
Extended AgentChatModal with delegate panel rendering and lifecycle management; enhanced ChatMessages to display sub-agent task chips and delegate interaction handlers; introduced DelegatePanel component with draggable/resizable positioning, status indicators, and auto-close animation.
Frontend State Management
ui/src/features/agent/hooks/useAgentChat.ts, ui/src/features/agent/context/AgentChatContext.tsx, ui/src/features/agent/types.ts
Added delegate state tracking (delegates array, delegateStatuses, delegateMessages); implemented delegate operations (bringToFront, reopenDelegate, removeDelegate); integrated SSE parsing for delegate events and messages; added closing animation state; extended types with delegate_ids, parent_session_id, and DelegateInfo/DelegateEvent/DelegateMessages structures.
Frontend Configuration & Testing Setup
ui/src/pages/agent-settings/index.tsx, ui/package.json, ui/vitest.config.ts, ui/src/test/setup.ts, ui/src/styles/global.css
Replaced static tool configuration with API-driven tool fetching; added Vitest/testing-library dependencies; introduced test environment setup (jsdom, scrollIntoView mock); added CSS variables and animations (agent-modal-in/out, delegate-panel-in).
Documentation & Error Handling
rfcs/017-sub-agent.md
Added RFC documenting delegate tool architecture, sub-session lifecycle, concurrent execution limits, cost tracking, and observability patterns.

Sequence Diagram

sequenceDiagram
    participant User
    participant MainAgent as Main Agent
    participant Delegate as Delegate Tool
    participant TaskMgr as Sub-Session<br/>Manager
    participant SubAgent as Sub-Agent<br/>Loop
    participant Store as Session<br/>Store

    User->>MainAgent: Chat: "Do these in parallel"
    MainAgent->>MainAgent: Execute tools<br/>(includes delegate)
    MainAgent->>Delegate: Execute delegate tool<br/>(tasks array)
    
    par Parallel Task Execution
        Delegate->>TaskMgr: Create sub-session 1
        TaskMgr->>Store: Persist sub-session
        TaskMgr->>SubAgent: Initialize Loop
        SubAgent->>SubAgent: Execute task 1
        SubAgent->>Store: Update session history
        SubAgent-->>TaskMgr: Return result + cost
        TaskMgr-->>Delegate: Task 1 complete
    and
        Delegate->>TaskMgr: Create sub-session 2
        TaskMgr->>Store: Persist sub-session
        TaskMgr->>SubAgent: Initialize Loop
        SubAgent->>SubAgent: Execute task 2
        SubAgent->>Store: Update session history
        SubAgent-->>TaskMgr: Return result + cost
        TaskMgr-->>Delegate: Task 2 complete
    end
    
    Delegate->>Delegate: Aggregate results<br/>+ costs
    Delegate-->>MainAgent: Return summary<br/>+ delegate_ids
    MainAgent->>MainAgent: Continue execution
    MainAgent-->>User: Final response<br/>(with delegate refs)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • feat: agent step type #1681: Modifies agent subsystem core (tool/loop contexts, tool execution primitives) with overlapping changes to internal/agent/loop.go, types.go, and agent execution wiring.
  • chore: Upgrade to go 1.26 #1655: Changes internal/agent/api.go structure and agent API surface design in the same file affected by this PR's API rework.
  • feat: tool calling support for chat step #1591: Implements foundational LLM tool-calling support (ToolCall/ToolDefinition types, tool registries, tool-calls on messages) that forms the basis for delegate tool execution in this PR.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.89% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: sub agents' accurately summarizes the main changeset, which introduces a comprehensive sub-agent delegation feature with new API endpoints, session management, delegate tool, and related infrastructure across backend and frontend.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat-subagent

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/agent/session.go (1)

347-376: ⚠️ Potential issue | 🟡 Minor

SubscribeWithSnapshot omits ParentSessionID and DelegateTask from the Session snapshot.

GetSession() (lines 240–248) includes ParentSessionID and DelegateTask, but the snapshot session built here at lines 357–362 does not. Sub-session SSE clients would receive an incomplete initial snapshot.

Proposed fix
 	sess := Session{
-		ID:        id,
-		UserID:    sm.userID,
-		CreatedAt: sm.createdAt,
-		UpdatedAt: sm.lastActivity,
+		ID:              id,
+		UserID:          sm.userID,
+		DAGName:         sm.dagName,
+		ParentSessionID: sm.parentSessionID,
+		DelegateTask:    sm.delegateTask,
+		CreatedAt:       sm.createdAt,
+		UpdatedAt:       sm.lastActivity,
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/session.go` around lines 347 - 376, SubscribeWithSnapshot
builds a Session snapshot but omits ParentSessionID and DelegateTask, causing
incomplete snapshots for sub-session SSE clients; update the sess construction
in SubscribeWithSnapshot to set ParentSessionID and DelegateTask from sm (the
same fields included by GetSession) so the returned StreamResponse Session
includes those two fields (i.e., populate sess.ParentSessionID and
sess.DelegateTask using sm.parentSessionID and sm.delegateTask or their actual
field names).
🧹 Nitpick comments (26)
internal/agent/delegate_test.go (1)

489-513: callCount in TestDelegateTool_MaxIterationsDefault lacks synchronization.

Unlike other tests (e.g., TestDelegateTool_PartialFailure), callCount at line 492 is incremented without a mutex. While currently safe with a single task and stop-on-first-call provider, this is fragile if the test is ever modified to use multiple tasks.

🛡️ Suggested defensive fix
+	var mu sync.Mutex
 	var callCount int
 	provider := &mockLLMProvider{
 		chatFunc: func(_ context.Context, _ *llm.ChatRequest) (*llm.ChatResponse, error) {
+			mu.Lock()
 			callCount++
+			mu.Unlock()
 			return &llm.ChatResponse{Content: fmt.Sprintf("iter %d", callCount), FinishReason: "stop"}, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/delegate_test.go` around lines 489 - 513, The test
TestDelegateTool_MaxIterationsDefault increments shared variable callCount
inside the mockLLMProvider.chatFunc without synchronization; make the counter
concurrency-safe by replacing the plain int with a synchronized counter (e.g.,
use a sync.Mutex around increments and reads or use an atomic integer via
sync/atomic with atomic.AddInt32/LoadInt32), update references to callCount in
the assertion to read the synchronized value, and keep the
mockLLMProvider.chatFunc behavior otherwise unchanged.
internal/agent/policy_test.go (1)

16-24: Consider asserting on the total number of registered tools to catch drift.

If a new tool is registered in the future but not added to this test, the omission would go unnoticed. Adding a length check makes the test self-updating:

💡 Proposed addition
+	assert.Len(t, resolved.Tools, len(RegisteredTools()), "tool count should match registry")
 	assert.True(t, resolved.Tools["bash"])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/policy_test.go` around lines 16 - 24, The test currently
asserts individual tools via resolved.Tools["..."] but misses a check on the
total number of registered tools, so add an assertion that len(resolved.Tools)
equals the expected count to catch future drift; update the test (in
policy_test.go) to assert the exact length of resolved.Tools alongside the
existing assertions for
"bash","read","patch","think","navigate","read_schema","ask_user","web_search","delegate"
so any added/removed tool will fail the test.
internal/agent/tools_test.go (1)

13-16: Hardcoded tool count is fragile but acceptable as a change-detection guard.

The assert.Len(t, tools, 9) on line 14 will break whenever a tool is added or removed. Consider deriving the count from the expected list to keep them in sync:

♻️ Optional: derive count from expected list
-	tools := CreateTools(ToolConfig{})
-	assert.Len(t, tools, 9)
-
 	expectedTools := []string{"bash", "read", "patch", "think", "navigate", "read_schema", "ask_user", "web_search", "delegate"}
+	tools := CreateTools(ToolConfig{})
+	assert.Len(t, tools, len(expectedTools))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/tools_test.go` around lines 13 - 16, The test uses a hardcoded
length (assert.Len(t, tools, 9)) which will drift when tools change; update the
assertion to derive the expected count from the expectedTools slice instead—use
the length of expectedTools (the expectedTools variable defined in the test)
when calling assert.Len so CreateTools(ToolConfig{}) is validated against the
same source of truth.
ui/src/features/agent/context/AgentChatContext.tsx (1)

63-71: Missing timer cleanup on unmount.

If AgentChatProvider unmounts during the 250ms closing window, setTimeout will fire and attempt state updates on an unmounted component. While React 19 won't throw, it's still a stale update. Consider cleaning up in a useEffect:

♻️ Proposed fix

Add a cleanup effect after the closeTimerRef declaration:

useEffect(() => {
  return () => {
    if (closeTimerRef.current) {
      clearTimeout(closeTimerRef.current);
    }
  };
}, []);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/features/agent/context/AgentChatContext.tsx` around lines 63 - 71, The
closeChat logic schedules a 250ms timeout via closeTimerRef which isn't cleared
on unmount, risking stale state updates; add a cleanup useEffect inside
AgentChatProvider (near closeTimerRef and closeChat) that on unmount checks
closeTimerRef.current and calls clearTimeout and nulls the ref to prevent the
timeout from firing after unmount.
ui/src/styles/global.css (1)

4-4: Static analysis tools flag @custom-variant — configure parsers to recognize Tailwind v4 directives.

Biome and Stylelint both report this as an unknown at-rule. The directive itself is valid Tailwind CSS v4 syntax, but your linter configs need updating (e.g., enable tailwindDirectives in Biome's CSS parser options, and add @custom-variant to Stylelint's ignoreAtRules).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/styles/global.css` at line 4, Static analyzers flag the Tailwind v4
directive "@custom-variant" in global.css as an unknown at-rule; update the
linter configs to recognize Tailwind directives by enabling the Biome CSS parser
option "tailwindDirectives": true and add "@custom-variant" to Stylelint's
"ignoreAtRules" (or use a pattern that includes custom-variant) so the
`@custom-variant` rule is accepted by both Biome and Stylelint.
internal/agent/session_test.go (1)

631-653: Consider require.Eventually instead of fixed sleep for robustness.

Line 647 uses time.Sleep(200 * time.Millisecond) to wait for async processing, while other tests in this same file (e.g., lines 610-614) use require.Eventually. The fixed sleep could cause flakiness on slow CI runners.

Suggested fix
-	// Wait for processing
-	time.Sleep(200 * time.Millisecond)
-
-	msgs := sm.GetMessages()
-	assert.GreaterOrEqual(t, len(msgs), 3, "should have at least 3 user messages")
+	// Wait for processing
+	require.Eventually(t, func() bool {
+		return len(sm.GetMessages()) >= 3
+	}, 5*time.Second, 50*time.Millisecond, "should have at least 3 user messages")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/session_test.go` around lines 631 - 653, The test
TestSessionManager_ConcurrentMessages currently uses a fixed time.Sleep(200 *
time.Millisecond) to wait for async processing which can flake; replace that
sleep with require.Eventually to poll sm.GetMessages and assert len(msgs) >= 3
within a reasonable timeout (e.g., 1-2s) and a small tick (e.g., 10-50ms).
Update the test to call sm.AcceptUserMessage as before, then use
require.Eventually with a closure that calls sm.GetMessages and returns
len(msgs) >= 3 so the test reliably waits for asynchronous processing instead of
relying on a fixed sleep. Ensure you reference
TestSessionManager_ConcurrentMessages, sm.AcceptUserMessage, and sm.GetMessages
when locating the change.
internal/agent/tool_registry.go (2)

29-33: No duplicate name detection in RegisterTool.

If two tools accidentally register with the same name (e.g., during refactoring), the registry silently contains duplicates. CreateTools would produce two tools with the same name, and GetToolByName would find the first one. Consider adding a panic or dedup check since this runs only during init().

Suggested guard
 func RegisterTool(reg ToolRegistration) {
+	for _, existing := range toolRegistry {
+		if existing.Name == reg.Name {
+			panic("duplicate tool registration: " + reg.Name)
+		}
+	}
 	toolRegistry = append(toolRegistry, reg)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/tool_registry.go` around lines 29 - 33, RegisterTool currently
appends registrations blindly, so add a duplicate-name guard: before appending
the incoming ToolRegistration (reg) to toolRegistry, iterate the existing
toolRegistry and compare reg.Name (or reg.Name normalized if needed) to detect
an existing registration with the same name; if found, panic with a clear
message indicating the duplicate tool name and both registrations (this runs
during init so a panic is acceptable); otherwise append as before. Also mention
related symbols CreateTools and GetToolByName in the comment so reviewers know
this prevents duplicate lookups later.

35-38: RegisteredTools returns the backing slice — callers can mutate the registry.

Returning the slice directly means any caller doing append or element mutation would affect the global registry. Since this is an internal package and callers are trusted, this is low risk, but returning a copy would be more defensive.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/tool_registry.go` around lines 35 - 38, RegisteredTools
currently returns the backing slice toolRegistry allowing callers to mutate
global state; update RegisteredTools to return a defensive copy (e.g., allocate
a new []ToolRegistration and copy or use append to copy elements from
toolRegistry) so callers cannot modify the original registry while preserving
the same contents and signature.
ui/src/features/agent/components/ChatMessages.tsx (1)

44-57: Dual-field access for tool result IDs is a code smell.

The (tr as unknown as Record<string, unknown>)['tool_call_id'] cast works around SSE vs REST field naming inconsistency. Consider normalizing the field at the SSE ingestion point (in useAgentChat.ts when parsing SSE messages) so downstream components don't need this workaround.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/features/agent/components/ChatMessages.tsx` around lines 44 - 57, The
code in ChatMessages.tsx builds completedToolCallIds by reading either
tr.tool_use_id or the SSE-only tr["tool_call_id"]; move this normalization into
the SSE parsing in useAgentChat.ts so downstream components can always rely on
tool_use_id. Update the SSE parsing logic that constructs messages to
copy/rename tool_call_id -> tool_use_id (and any nested tool_results entries)
when present, keep the existing handling for REST messages, and then remove the
fallback cast/lookup from completedToolCallIds (the const completedToolCallIds /
loop over messages in ChatMessages.tsx) so it simply reads tr.tool_use_id.
Ensure the normalization preserves string typing and covers all places where SSE
message shapes are converted (e.g., convertApiMessage or the SSE event handler).
internal/agent/loop_test.go (2)

809-869: 30-second timeout is very generous for a unit test.

TestLoop_BatchedDelegateExceedsMax uses a 30s timeout. If something goes wrong, this will stall the test suite. Consider reducing to 10-15s, which should still be ample for spawning capped sub-sessions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/loop_test.go` around lines 809 - 869, Reduce the overly long
30s test timeout in TestLoop_BatchedDelegateExceedsMax to a shorter value (e.g.,
10s or 15s) to avoid long stalls; locate the call to waitForLoopDone(ctx,
cancel, loop, 30*time.Second) and replace the timeout argument with a shorter
duration such as 10*time.Second so the test still validates spawning capped
sub-sessions but fails faster on issues.

871-932: Race condition risk: callback flags are written without synchronization.

registerCalled, notifyCalled, and addCostCalled are written from the loop's goroutine (inside callbacks) and read from the test goroutine after waitForLoopDone returns. While waitForLoopDone does establish a happens-before via the channel receive, the callbacks themselves run inside the loop goroutine that writes to done — so this should be safe in practice. However, OnWorking calls cancel() which races with the timeout path in waitForLoopDone that also calls cancel. Since cancel is idempotent, this is fine.

Consider using atomic.Bool for the flags to make the intent explicit and satisfy the race detector if the callback scheduling ever changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/loop_test.go` around lines 871 - 932, The three boolean flags
(registerCalled, notifyCalled, addCostCalled) in
TestLoop_ExecuteTool_PassesSubSessionCallbacks are subject to a race because
they are written by callbacks running in the loop goroutine; replace them with
sync/atomic's atomic.Bool (or equivalent) to make the intent explicit and
race-detector-safe: change declarations to atomic.Bool types, update each
callback to call
registerCalled.Store(true)/notifyCalled.Store(true)/addCostCalled.Store(true),
and update the final assertions to check
registerCalled.Load()/notifyCalled.Load()/addCostCalled.Load() (ensuring you add
the necessary import for sync/atomic). This touches symbols:
TestLoop_ExecuteTool_PassesSubSessionCallbacks, registerCalled, notifyCalled,
addCostCalled, and the callback lambdas passed to NewLoop.
ui/src/features/agent/components/AgentChatModal.tsx (1)

216-228: index prop shifts when a delegate is removed, causing position jumps.

When a delegate panel is removed from the delegates array, the remaining panels get new index values, which changes their computed grid positions (used as defaults in useResizableDraggable). Since the hook persists bounds via storageKey, this only affects panels that haven't been manually moved. Still, this can cause a brief visual jump for newly opened panels after one is closed.

A stable positional index (e.g., based on insertion order rather than array position) would avoid this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/features/agent/components/AgentChatModal.tsx` around lines 216 - 228,
The visual jump is caused by using the array index (i) as the positional key;
stop passing i to DelegatePanel and instead pass a stable insertion-based index
or ID (e.g., d.insertOrder, d.createdAt, or a new d.positionIndex assigned when
the delegate is created) so positions don't change when the array reorders.
Update the mapping to use that stable field for the index prop and ensure
DelegatePanel (and its useResizableDraggable/storageKey usage) uses that stable
value for default grid positions and persistent storage keys; add assignment of
the stable position field where delegates are created/added and keep
removeDelegate/bringToFront logic unchanged.
ui/src/features/agent/components/DelegatePanel.tsx (2)

32-51: Grid position is computed from window.innerWidth/innerHeight at render time — not responsive.

If the browser window is resized after the panel mounts, the initial grid position won't recalculate. Since useResizableDraggable stores user-adjusted bounds, this only affects initial placement. A minor UX gap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/features/agent/components/DelegatePanel.tsx` around lines 32 - 51, The
grid position uses window.innerWidth/innerHeight once at render so it doesn't
update on resize; make the PANEL_W, PANEL_H, cols, row, left and top
calculations reactive by moving them into state/memo and updating on window
resize (or container resize) via a useEffect that listens to window 'resize' (or
ResizeObserver) and recalculates cols/row/left/top, then pass the updated
defaults into useResizableDraggable (still using delegateId for storageKey) so
the initial placement updates when the viewport changes; ensure you debounce the
resize handler to avoid thrashing and clean up the listener on unmount.

57-71: handleClose dependency instability may cause unnecessary effect re-runs.

onClose is an inline arrow function from the parent (() => removeDelegate(d.id)), which means handleClose gets a new identity every render, causing the auto-close useEffect to re-subscribe on each render cycle. The wasRunningRef guard prevents double-triggering, so this is not a bug — just unnecessary overhead.

Memoizing onClose in the parent or using a ref for the callback would stabilize it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/features/agent/components/DelegatePanel.tsx` around lines 57 - 71,
handleClose is recreated every render because it depends on the inline onClose
from the parent (e.g. () => removeDelegate(d.id)), causing the useEffect that
watches [status, handleClose] to re-run unnecessarily; to fix this either
stabilize the parent callback (memoize the onClose passed in, e.g. make
removeDelegate wrapper stable) or make handleClose stable by moving onClose into
a ref and reading that ref inside a useCallback/timeout (use onCloseRef.current
inside handleClose) so that handleClose identity doesn't change on each render;
update references to handleClose, onClose, wasRunningRef and the useEffect
accordingly.
ui/src/pages/agent-settings/index.tsx (1)

146-157: Silently swallowing the tools fetch error could hide configuration issues.

If /settings/agent/tools fails, the user sees no tool toggles with no error message. Consider setting an error state or at least logging to console so misconfiguration is discoverable.

Suggested improvement
     } catch {
-      return [];
+      setError('Failed to load tool definitions');
+      return [];
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/pages/agent-settings/index.tsx` around lines 146 - 157, The fetchTools
useCallback currently swallows errors; update its catch block to surface
failures by logging the caught error (e.g., console.error or processLogger) and
setting a component error state (add a useState like toolFetchError and setter
setToolFetchError) so the UI can show an error message; keep returning [] on
failure but ensure fetchTools, setToolMetas and the new error state are
referenced/updated so callers and the component can render an error indicator
when the GET to '/settings/agent/tools' fails.
ui/src/features/agent/hooks/useAgentChat.ts (1)

206-223: Delegate message deduplication via findIndex is O(n²) in the worst case.

For each incoming message, findIndex scans the existing array. For large delegate conversations this could become a bottleneck on the SSE hot path. An id → index map or a Set for known IDs would be O(1) lookup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/features/agent/hooks/useAgentChat.ts` around lines 206 - 223, The
current loop inside setDelegateMessages (handling data.delegate_messages with
delegate_id and msgs) uses updated.findIndex per incoming msg, causing O(n²)
behavior; fix by building an id→index map or a Set of existing message IDs for
the existing array (existing = prev[delegate_id] || []) before iterating msgs,
then do O(1) lookups to update or append messages into updated, and finally set
delegateMessagesRef.current = next and return next as before (update logic lives
in useAgentChat's delegate message handler).
internal/agent/types.go (1)

166-176: Consider a typed constant for DelegateEvent.Type.

Type is a bare string accepting "started" / "completed". A dedicated string type with named constants (similar to MessageType and PromptType already in this file) would prevent typos and make the API self-documenting.

♻️ Suggested approach
// DelegateEventType identifies the lifecycle phase of a delegate.
type DelegateEventType string

const (
	DelegateEventStarted   DelegateEventType = "started"
	DelegateEventCompleted DelegateEventType = "completed"
)

Then change DelegateEvent.Type from string to DelegateEventType.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/types.go` around lines 166 - 176, The DelegateEvent.Type field
is an untyped string that should use a dedicated enum-like type to avoid typos;
add a new type DelegateEventType (string) and define constants
DelegateEventStarted and DelegateEventCompleted, then change
DelegateEvent.Type's type from string to DelegateEventType and update any
usages/constructors/serializations that set or check the field (look for
DelegateEvent and places that create or compare its Type).
internal/agent/delegate.go (3)

115-119: Silently dropping excess tasks may confuse the LLM.

When the LLM submits more than maxConcurrentDelegates tasks, the excess are silently discarded. The LLM (and user) will never see results for tasks beyond index 8, which could lead to incomplete or confusing behavior. Consider returning an error or at least including a note in the output indicating that tasks were truncated.

♻️ Suggested approach
 	tasks := args.Tasks
 	if len(tasks) > maxConcurrentDelegates {
+		// Inform the caller about truncation so results aren't silently lost.
+		slog.Warn("Delegate tasks truncated", "requested", len(tasks), "max", maxConcurrentDelegates)
 		tasks = tasks[:maxConcurrentDelegates]
 	}

Also consider appending a note to the final output content indicating how many tasks were dropped.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/delegate.go` around lines 115 - 119, The code silently
truncates args.Tasks to maxConcurrentDelegates which drops work without
feedback; update the handler that uses args.Tasks (the delegate function around
maxConcurrentDelegates) to detect when len(args.Tasks) > maxConcurrentDelegates
and either return an explicit error to the caller or continue processing only
the first maxConcurrentDelegates while appending a clear note to the final
output indicating how many tasks were dropped; reference the variables
maxConcurrentDelegates and tasks (sliced from args.Tasks) so you locate the
truncation, and ensure the returned response or output content includes the
dropped count and context for the LLM/user.

234-234: Ignored error from RecordExternalMessage.

If persisting the initial user message fails, the sub-agent will proceed without its task being recorded, which could lead to an inconsistent session history. At minimum, log the error.

🔧 Suggested fix
-	_ = subMgr.RecordExternalMessage(ctx.Context, userMsg)
+	if err := subMgr.RecordExternalMessage(ctx.Context, userMsg); err != nil {
+		logger.Warn("Failed to record initial delegate message", "error", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/delegate.go` at line 234, The call to
subMgr.RecordExternalMessage(ctx.Context, userMsg) currently ignores its error;
capture the returned error (e.g., err := subMgr.RecordExternalMessage(...)) and
at minimum log it (using the existing logger/context logger available in this
file — e.g., ctx.Logger or a package logger); if failure should abort the flow,
return or propagate the error instead of proceeding. Ensure you reference
subMgr.RecordExternalMessage, userMsg and ctx.Context in your change so the fix
is applied at the exact call site in delegate.go.

378-383: truncate operates on bytes, not runes — may split multi-byte characters.

len(s) and s[:maxLen] work on byte offsets, so a multi-byte UTF-8 character at the boundary could be cut in half, producing invalid UTF-8 in log output and summaries returned to the LLM.

♻️ Rune-safe alternative
 func truncate(s string, maxLen int) string {
-	if len(s) <= maxLen {
+	runes := []rune(s)
+	if len(runes) <= maxLen {
 		return s
 	}
-	return s[:maxLen] + "..."
+	return string(runes[:maxLen]) + "..."
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/delegate.go` around lines 378 - 383, The truncate function
currently slices by bytes which can split multi-byte UTF-8 characters; update
truncate to be rune-safe by converting the string to a rune slice (e.g., runes
:= []rune(s)), compare len(runes) to maxLen, and return either s or
string(runes[:maxLen]) + "..." so you never cut a rune in half; reference the
truncate function to locate and replace the byte-based logic.
internal/agent/session.go (1)

251-276: RecordExternalMessage always returns nil — the error return type is misleading.

The method logs persistence failures but never returns a non-nil error. This makes the error return type misleading to callers. If the intent is to propagate persistence errors in the future, consider adding a TODO. Otherwise, the signature could return nothing or the persistence error could be propagated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agent/session.go` around lines 251 - 276,
SessionManager.RecordExternalMessage currently declares an error return but
always returns nil; either propagate persistence errors from sm.onMessage or
remove the misleading error return. To fix, update RecordExternalMessage: if you
want callers to see persistence failures, capture the error from
sm.onMessage(ctx, msg), log it with sm.logger.Warn as now and then return that
error (ensuring callers handle it); otherwise, change the method signature to
not return error and update callers to remove error handling. Refer to
SessionManager.RecordExternalMessage, sm.onMessage, and the existing logging to
implement the chosen approach and add a TODO comment if you intend to change
behavior later.
api/v1/api.yaml (5)

7762-7824: Constrain and document totalCost.

Consider adding minimum: 0 and a short unit/currency note (e.g., “USD”) to prevent negative values and ambiguity in client displays.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/api.yaml` around lines 7762 - 7824, Update the
AgentSessionState.totalCost and AgentSessionWithState.totalCost schema entries
to constrain values and clarify units: add "minimum: 0" to prevent negative
costs and augment the property description to mention the expected unit/currency
(e.g., "Total cost in USD") or a generic unit note; make these changes on the
totalCost properties inside the AgentSessionState and AgentSessionWithState
definitions so clients know values are non-negative and what currency/units to
display.

4576-4651: Consider pagination for listing agent sessions.

GET /agent/sessions returns a raw array with no paging. As session volume grows, this can become expensive for both server and UI. Consider adding page/perPage (or a server-side limit) and returning a wrapper with pagination metadata, similar to other list endpoints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/api.yaml` around lines 4576 - 4651, The GET /agent/sessions operation
(operationId: listAgentSessions) currently returns a raw array of
AgentSessionWithState; change it to support pagination by adding query
parameters (e.g., page and perPage or limit and offset) to the parameters list,
enforce a sensible server-side default/max for perPage, and update the 200
response to return a wrapper object (e.g., { items: AgentSessionWithState[],
page: number, perPage: number, total: number }) so clients receive pagination
metadata and the server can limit load.

7825-7885: Add validation for token counts and UI actions.

Recommend adding minimum: 0 to AgentTokenUsage fields and requiring at least AgentUIAction.type (and possibly path) to avoid emitting empty UI actions to clients.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/api.yaml` around lines 7825 - 7885, Add numeric validation to the
AgentTokenUsage schema by adding minimum: 0 to promptTokens, completionTokens,
and totalTokens to prevent negative values; and enforce that AgentUIAction
defines a required "type" property (add required: [type] under AgentUIAction) so
empty UI actions cannot be emitted (optionally also add "path" to required if
your callers expect it). Reference schemas: AgentTokenUsage (promptTokens,
completionTokens, totalTokens) and AgentUIAction (type, path).

7715-7761: Clarify/validate AgentDAGContext.dagFile format.

If this field is intended to accept only DAG file IDs, consider reusing #/components/schemas/DAGFileName (or add a pattern/minLength). If full paths are allowed, call that out explicitly so clients can validate correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/api.yaml` around lines 7715 - 7761, The AgentDAGContext.dagFile
description is ambiguous about whether it accepts DAG file IDs or full paths;
update AgentDAGContext.dagFile to enforce and document the expected format by
either referencing the existing DAGFileName schema (use $ref:
"#/components/schemas/DAGFileName") if only IDs are allowed, or add explicit
validation (e.g., pattern and/or minLength) and a clear description stating that
full filesystem paths are accepted; ensure the change is applied where
AgentDAGContext is defined and consider noting the expectation in
AgentChatRequest.dagContexts documentation so clients can validate inputs
consistently.

7886-8023: Tighten response schema requirements and status typing.

If messages, session, and sessionState are always returned, mark them required to keep generated client types accurate. Also consider enumerating AgentStatusResponse.status (or referencing an existing status schema) so consumers can rely on known values.

🔧 Suggested schema tightening
     AgentSessionDetailResponse:
       type: object
       description: "Session details including messages and current state"
+      required:
+        - messages
+        - session
+        - sessionState
       properties:
         messages:
           type: array
           items:
             $ref: "#/components/schemas/AgentMessage"
         session:
           $ref: "#/components/schemas/AgentSession"
         sessionState:
           $ref: "#/components/schemas/AgentSessionState"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1/api.yaml` around lines 7886 - 8023, Mark AgentSessionDetailResponse
properties messages, session, and sessionState as required in the schema (update
AgentSessionDetailResponse.required to include "messages", "session",
"sessionState") so generated clients treat them as always present; ensure
AgentMessage, AgentSession and AgentSessionState schemas remain compatible with
being required. Also tighten AgentStatusResponse by enumerating allowed status
values (or replacing the free-form string with a $ref to an existing status
enum) in AgentStatusResponse.properties.status so consumers get a typed set of
known statuses.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/agent/api_test.go`:
- Around line 1011-1077: The test TestAPI_CreateSession_MultipleDelegates
reveals a mismatch between RFC 017 and the implementation: the
mockLLMProvider.chatFunc returns a batched delegate payload {"tasks":[...]} but
RFC 017 documents a single-task payload {"task":..., "max_iterations":...};
either update RFC 017 to document the batched schema or change the delegate
handling to accept the RFC single-task shape. Concretely, either (A) update the
RFC and any schema/validation code to reflect the batched format used by
CreateSession and the delegate ToolCall (search for CreateSession,
mockLLMProvider.chatFunc, and delegate ToolCall handling), or (B) refactor the
delegate handling in CreateSession and the code that parses
llm.ToolCall.Function.Arguments so it expects and processes a single {"task":
"...", "max_iterations": ...} per call (splitting batched inputs into separate
session creations at the call site). Ensure tests and parsing/validation logic
are consistent with the chosen approach.

In `@internal/agent/api.go`:
- Around line 546-602: CreateSession currently persists the session
(persistNewSession) and adds it to the in-memory map (a.sessions.Store) before
calling mgr.AcceptUserMessage, which can fail and leave a persisted orphan;
either move the persistNewSession and a.sessions.Store calls to after
AcceptUserMessage succeeds, or keep them in place but on AcceptUserMessage error
remove both the in-memory entry (a.sessions.Delete(id)) and the persisted record
(call the session store delete/remove method) to ensure no orphaned persisted
session remains; update code references in CreateSession around
NewSessionManager, persistNewSession, a.sessions.Store, and
mgr.AcceptUserMessage accordingly.

In `@internal/agent/delegate.go`:
- Line 332: Replace the direct equality check against context.Canceled with
errors.Is to handle wrapped errors: update the condition that currently reads
`if err != nil && err != context.Canceled` to use `errors.Is(err,
context.Canceled)` (i.e., `if err != nil && !errors.Is(err, context.Canceled)`)
and add the "errors" package to the import block; ensure you reference the same
`err` variable and `context.Canceled` symbol in the `internal/agent/delegate.go`
location.
- Around line 211-214: The registered sub-session (via
dc.RegisterSubSession(delegateID, subMgr)) isn't removed on delegate completion,
leaking memory; add a cleanup call after the delegate finishes (around where the
delegate ends, ~lines 315–330) to deregister the sub-session. Implement this by
checking for a deregister callback and invoking it, e.g. if
dc.DeregisterSubSession != nil { dc.DeregisterSubSession(delegateID) } (or the
corresponding UnregisterSubSession API on dc) so the subMgr entry is removed
from the session map; ensure this runs in the delegate completion/cleanup path
and does not swallow errors.
- Around line 291-299: In the OnWorking callback (the anonymous func handling
working state that calls subMgr.SetWorking), stop calling cancelChild()
unconditionally on every !working transition; instead, when working becomes
false increment iteration and only call cancelChild() if iteration >=
maxIterations (and log via logger.Info about max iterations reached). In other
words, move the cancelChild() call into the branch that checks iteration >=
maxIterations so the sub-agent is only canceled when the iteration limit is
reached.

In `@internal/service/frontend/api/v1/agent_sessions.go`:
- Around line 82-101: mapAgentError currently returns raw errors in the default
branch; change the default to return a wrapped *Error instance (same wrapper
type used by errAgentBadRequest/errAgentNotConfigured/etc.) instead of returning
err directly — for example replace "return err" with constructing and returning
&Error{Message: err.Error(), Internal: err} (or the project’s canonical *Error
constructor) so all unmapped agent errors are consistently wrapped by
mapAgentError.

In `@rfcs/017-sub-agent.md`:
- Around line 183-188: The "Out-of-scope" list includes "Frontend implementation
for expandable sub-session blocks" but the PR and code clearly add UI components
(e.g., DelegatePanel.tsx, ChatMessages.tsx) related to delegate panels; update
rfcs/017-sub-agent.md to either remove that item or explicitly mark it as
partially in-scope (and document which frontend pieces are included), and adjust
the list of out-of-scope items and any summary text to reference the concrete
components (DelegatePanel.tsx, ChatMessages.tsx, sub-session blocks) so the RFC
matches the implemented scope.
- Around line 67-86: Update the two fenced code blocks that currently lack a
language tag by adding a language specifier (use "text" or "plaintext");
specifically locate the block that begins with "Parent Loop" and the lifecycle
block that begins with "1. Parent LLM returns delegate tool call(s)" in
rfcs/017-sub-agent.md and change their opening ``` fences to ```text so
markdownlint MD040 is satisfied.

In `@ui/src/features/agent/components/ChatMessages.tsx`:
- Around line 288-318: The current lookup
Object.values(delegateStatuses).find((d) => d.task === task) in the tasks.map
block is fragile because it matches by task text; change the UI to use an
explicit delegate ID mapping instead: ensure the producer/tool result supplies a
delegateId for each task (or maintain a task->delegateId map), then in the
rendering use that delegateId to lookup delegateStatuses[delegateId] (or pass
delegateId directly) and call onOpenDelegate(delegateId). Update the
canClick/isRunning logic to derive state from the delegate by ID (e.g., delegate
= delegateStatuses[delegateId]) and stop relying on string equality for
matching.

In `@ui/src/features/agent/hooks/useAgentChat.ts`:
- Around line 371-386: selectSession currently doesn't clear delegate-related
state, so stale delegates, delegateStatuses, and delegateMessages can persist
between sessions; update the selectSession function to reset these pieces of
state (call setDelegates with an empty array and setDelegateStatuses and
setDelegateMessages with their empty initial values) when loading a new session
(e.g., alongside setMessages/setAnsweredPrompts), and add those setter functions
to the useCallback dependency array so React tracks them (reference the
selectSession function and the setters setDelegates, setDelegateStatuses,
setDelegateMessages).

---

Outside diff comments:
In `@internal/agent/session.go`:
- Around line 347-376: SubscribeWithSnapshot builds a Session snapshot but omits
ParentSessionID and DelegateTask, causing incomplete snapshots for sub-session
SSE clients; update the sess construction in SubscribeWithSnapshot to set
ParentSessionID and DelegateTask from sm (the same fields included by
GetSession) so the returned StreamResponse Session includes those two fields
(i.e., populate sess.ParentSessionID and sess.DelegateTask using
sm.parentSessionID and sm.delegateTask or their actual field names).

---

Nitpick comments:
In `@api/v1/api.yaml`:
- Around line 7762-7824: Update the AgentSessionState.totalCost and
AgentSessionWithState.totalCost schema entries to constrain values and clarify
units: add "minimum: 0" to prevent negative costs and augment the property
description to mention the expected unit/currency (e.g., "Total cost in USD") or
a generic unit note; make these changes on the totalCost properties inside the
AgentSessionState and AgentSessionWithState definitions so clients know values
are non-negative and what currency/units to display.
- Around line 4576-4651: The GET /agent/sessions operation (operationId:
listAgentSessions) currently returns a raw array of AgentSessionWithState;
change it to support pagination by adding query parameters (e.g., page and
perPage or limit and offset) to the parameters list, enforce a sensible
server-side default/max for perPage, and update the 200 response to return a
wrapper object (e.g., { items: AgentSessionWithState[], page: number, perPage:
number, total: number }) so clients receive pagination metadata and the server
can limit load.
- Around line 7825-7885: Add numeric validation to the AgentTokenUsage schema by
adding minimum: 0 to promptTokens, completionTokens, and totalTokens to prevent
negative values; and enforce that AgentUIAction defines a required "type"
property (add required: [type] under AgentUIAction) so empty UI actions cannot
be emitted (optionally also add "path" to required if your callers expect it).
Reference schemas: AgentTokenUsage (promptTokens, completionTokens, totalTokens)
and AgentUIAction (type, path).
- Around line 7715-7761: The AgentDAGContext.dagFile description is ambiguous
about whether it accepts DAG file IDs or full paths; update
AgentDAGContext.dagFile to enforce and document the expected format by either
referencing the existing DAGFileName schema (use $ref:
"#/components/schemas/DAGFileName") if only IDs are allowed, or add explicit
validation (e.g., pattern and/or minLength) and a clear description stating that
full filesystem paths are accepted; ensure the change is applied where
AgentDAGContext is defined and consider noting the expectation in
AgentChatRequest.dagContexts documentation so clients can validate inputs
consistently.
- Around line 7886-8023: Mark AgentSessionDetailResponse properties messages,
session, and sessionState as required in the schema (update
AgentSessionDetailResponse.required to include "messages", "session",
"sessionState") so generated clients treat them as always present; ensure
AgentMessage, AgentSession and AgentSessionState schemas remain compatible with
being required. Also tighten AgentStatusResponse by enumerating allowed status
values (or replacing the free-form string with a $ref to an existing status
enum) in AgentStatusResponse.properties.status so consumers get a typed set of
known statuses.

In `@internal/agent/delegate_test.go`:
- Around line 489-513: The test TestDelegateTool_MaxIterationsDefault increments
shared variable callCount inside the mockLLMProvider.chatFunc without
synchronization; make the counter concurrency-safe by replacing the plain int
with a synchronized counter (e.g., use a sync.Mutex around increments and reads
or use an atomic integer via sync/atomic with atomic.AddInt32/LoadInt32), update
references to callCount in the assertion to read the synchronized value, and
keep the mockLLMProvider.chatFunc behavior otherwise unchanged.

In `@internal/agent/delegate.go`:
- Around line 115-119: The code silently truncates args.Tasks to
maxConcurrentDelegates which drops work without feedback; update the handler
that uses args.Tasks (the delegate function around maxConcurrentDelegates) to
detect when len(args.Tasks) > maxConcurrentDelegates and either return an
explicit error to the caller or continue processing only the first
maxConcurrentDelegates while appending a clear note to the final output
indicating how many tasks were dropped; reference the variables
maxConcurrentDelegates and tasks (sliced from args.Tasks) so you locate the
truncation, and ensure the returned response or output content includes the
dropped count and context for the LLM/user.
- Line 234: The call to subMgr.RecordExternalMessage(ctx.Context, userMsg)
currently ignores its error; capture the returned error (e.g., err :=
subMgr.RecordExternalMessage(...)) and at minimum log it (using the existing
logger/context logger available in this file — e.g., ctx.Logger or a package
logger); if failure should abort the flow, return or propagate the error instead
of proceeding. Ensure you reference subMgr.RecordExternalMessage, userMsg and
ctx.Context in your change so the fix is applied at the exact call site in
delegate.go.
- Around line 378-383: The truncate function currently slices by bytes which can
split multi-byte UTF-8 characters; update truncate to be rune-safe by converting
the string to a rune slice (e.g., runes := []rune(s)), compare len(runes) to
maxLen, and return either s or string(runes[:maxLen]) + "..." so you never cut a
rune in half; reference the truncate function to locate and replace the
byte-based logic.

In `@internal/agent/loop_test.go`:
- Around line 809-869: Reduce the overly long 30s test timeout in
TestLoop_BatchedDelegateExceedsMax to a shorter value (e.g., 10s or 15s) to
avoid long stalls; locate the call to waitForLoopDone(ctx, cancel, loop,
30*time.Second) and replace the timeout argument with a shorter duration such as
10*time.Second so the test still validates spawning capped sub-sessions but
fails faster on issues.
- Around line 871-932: The three boolean flags (registerCalled, notifyCalled,
addCostCalled) in TestLoop_ExecuteTool_PassesSubSessionCallbacks are subject to
a race because they are written by callbacks running in the loop goroutine;
replace them with sync/atomic's atomic.Bool (or equivalent) to make the intent
explicit and race-detector-safe: change declarations to atomic.Bool types,
update each callback to call
registerCalled.Store(true)/notifyCalled.Store(true)/addCostCalled.Store(true),
and update the final assertions to check
registerCalled.Load()/notifyCalled.Load()/addCostCalled.Load() (ensuring you add
the necessary import for sync/atomic). This touches symbols:
TestLoop_ExecuteTool_PassesSubSessionCallbacks, registerCalled, notifyCalled,
addCostCalled, and the callback lambdas passed to NewLoop.

In `@internal/agent/policy_test.go`:
- Around line 16-24: The test currently asserts individual tools via
resolved.Tools["..."] but misses a check on the total number of registered
tools, so add an assertion that len(resolved.Tools) equals the expected count to
catch future drift; update the test (in policy_test.go) to assert the exact
length of resolved.Tools alongside the existing assertions for
"bash","read","patch","think","navigate","read_schema","ask_user","web_search","delegate"
so any added/removed tool will fail the test.

In `@internal/agent/session_test.go`:
- Around line 631-653: The test TestSessionManager_ConcurrentMessages currently
uses a fixed time.Sleep(200 * time.Millisecond) to wait for async processing
which can flake; replace that sleep with require.Eventually to poll
sm.GetMessages and assert len(msgs) >= 3 within a reasonable timeout (e.g.,
1-2s) and a small tick (e.g., 10-50ms). Update the test to call
sm.AcceptUserMessage as before, then use require.Eventually with a closure that
calls sm.GetMessages and returns len(msgs) >= 3 so the test reliably waits for
asynchronous processing instead of relying on a fixed sleep. Ensure you
reference TestSessionManager_ConcurrentMessages, sm.AcceptUserMessage, and
sm.GetMessages when locating the change.

In `@internal/agent/session.go`:
- Around line 251-276: SessionManager.RecordExternalMessage currently declares
an error return but always returns nil; either propagate persistence errors from
sm.onMessage or remove the misleading error return. To fix, update
RecordExternalMessage: if you want callers to see persistence failures, capture
the error from sm.onMessage(ctx, msg), log it with sm.logger.Warn as now and
then return that error (ensuring callers handle it); otherwise, change the
method signature to not return error and update callers to remove error
handling. Refer to SessionManager.RecordExternalMessage, sm.onMessage, and the
existing logging to implement the chosen approach and add a TODO comment if you
intend to change behavior later.

In `@internal/agent/tool_registry.go`:
- Around line 29-33: RegisterTool currently appends registrations blindly, so
add a duplicate-name guard: before appending the incoming ToolRegistration (reg)
to toolRegistry, iterate the existing toolRegistry and compare reg.Name (or
reg.Name normalized if needed) to detect an existing registration with the same
name; if found, panic with a clear message indicating the duplicate tool name
and both registrations (this runs during init so a panic is acceptable);
otherwise append as before. Also mention related symbols CreateTools and
GetToolByName in the comment so reviewers know this prevents duplicate lookups
later.
- Around line 35-38: RegisteredTools currently returns the backing slice
toolRegistry allowing callers to mutate global state; update RegisteredTools to
return a defensive copy (e.g., allocate a new []ToolRegistration and copy or use
append to copy elements from toolRegistry) so callers cannot modify the original
registry while preserving the same contents and signature.

In `@internal/agent/tools_test.go`:
- Around line 13-16: The test uses a hardcoded length (assert.Len(t, tools, 9))
which will drift when tools change; update the assertion to derive the expected
count from the expectedTools slice instead—use the length of expectedTools (the
expectedTools variable defined in the test) when calling assert.Len so
CreateTools(ToolConfig{}) is validated against the same source of truth.

In `@internal/agent/types.go`:
- Around line 166-176: The DelegateEvent.Type field is an untyped string that
should use a dedicated enum-like type to avoid typos; add a new type
DelegateEventType (string) and define constants DelegateEventStarted and
DelegateEventCompleted, then change DelegateEvent.Type's type from string to
DelegateEventType and update any usages/constructors/serializations that set or
check the field (look for DelegateEvent and places that create or compare its
Type).

In `@ui/src/features/agent/components/AgentChatModal.tsx`:
- Around line 216-228: The visual jump is caused by using the array index (i) as
the positional key; stop passing i to DelegatePanel and instead pass a stable
insertion-based index or ID (e.g., d.insertOrder, d.createdAt, or a new
d.positionIndex assigned when the delegate is created) so positions don't change
when the array reorders. Update the mapping to use that stable field for the
index prop and ensure DelegatePanel (and its useResizableDraggable/storageKey
usage) uses that stable value for default grid positions and persistent storage
keys; add assignment of the stable position field where delegates are
created/added and keep removeDelegate/bringToFront logic unchanged.

In `@ui/src/features/agent/components/ChatMessages.tsx`:
- Around line 44-57: The code in ChatMessages.tsx builds completedToolCallIds by
reading either tr.tool_use_id or the SSE-only tr["tool_call_id"]; move this
normalization into the SSE parsing in useAgentChat.ts so downstream components
can always rely on tool_use_id. Update the SSE parsing logic that constructs
messages to copy/rename tool_call_id -> tool_use_id (and any nested tool_results
entries) when present, keep the existing handling for REST messages, and then
remove the fallback cast/lookup from completedToolCallIds (the const
completedToolCallIds / loop over messages in ChatMessages.tsx) so it simply
reads tr.tool_use_id. Ensure the normalization preserves string typing and
covers all places where SSE message shapes are converted (e.g.,
convertApiMessage or the SSE event handler).

In `@ui/src/features/agent/components/DelegatePanel.tsx`:
- Around line 32-51: The grid position uses window.innerWidth/innerHeight once
at render so it doesn't update on resize; make the PANEL_W, PANEL_H, cols, row,
left and top calculations reactive by moving them into state/memo and updating
on window resize (or container resize) via a useEffect that listens to window
'resize' (or ResizeObserver) and recalculates cols/row/left/top, then pass the
updated defaults into useResizableDraggable (still using delegateId for
storageKey) so the initial placement updates when the viewport changes; ensure
you debounce the resize handler to avoid thrashing and clean up the listener on
unmount.
- Around line 57-71: handleClose is recreated every render because it depends on
the inline onClose from the parent (e.g. () => removeDelegate(d.id)), causing
the useEffect that watches [status, handleClose] to re-run unnecessarily; to fix
this either stabilize the parent callback (memoize the onClose passed in, e.g.
make removeDelegate wrapper stable) or make handleClose stable by moving onClose
into a ref and reading that ref inside a useCallback/timeout (use
onCloseRef.current inside handleClose) so that handleClose identity doesn't
change on each render; update references to handleClose, onClose, wasRunningRef
and the useEffect accordingly.

In `@ui/src/features/agent/context/AgentChatContext.tsx`:
- Around line 63-71: The closeChat logic schedules a 250ms timeout via
closeTimerRef which isn't cleared on unmount, risking stale state updates; add a
cleanup useEffect inside AgentChatProvider (near closeTimerRef and closeChat)
that on unmount checks closeTimerRef.current and calls clearTimeout and nulls
the ref to prevent the timeout from firing after unmount.

In `@ui/src/features/agent/hooks/useAgentChat.ts`:
- Around line 206-223: The current loop inside setDelegateMessages (handling
data.delegate_messages with delegate_id and msgs) uses updated.findIndex per
incoming msg, causing O(n²) behavior; fix by building an id→index map or a Set
of existing message IDs for the existing array (existing = prev[delegate_id] ||
[]) before iterating msgs, then do O(1) lookups to update or append messages
into updated, and finally set delegateMessagesRef.current = next and return next
as before (update logic lives in useAgentChat's delegate message handler).

In `@ui/src/pages/agent-settings/index.tsx`:
- Around line 146-157: The fetchTools useCallback currently swallows errors;
update its catch block to surface failures by logging the caught error (e.g.,
console.error or processLogger) and setting a component error state (add a
useState like toolFetchError and setter setToolFetchError) so the UI can show an
error message; keep returning [] on failure but ensure fetchTools, setToolMetas
and the new error state are referenced/updated so callers and the component can
render an error indicator when the GET to '/settings/agent/tools' fails.

In `@ui/src/styles/global.css`:
- Line 4: Static analyzers flag the Tailwind v4 directive "@custom-variant" in
global.css as an unknown at-rule; update the linter configs to recognize
Tailwind directives by enabling the Biome CSS parser option
"tailwindDirectives": true and add "@custom-variant" to Stylelint's
"ignoreAtRules" (or use a pattern that includes custom-variant) so the
`@custom-variant` rule is accepted by both Biome and Stylelint.

Comment thread internal/agent/api_test.go
Comment thread internal/agent/api.go
Comment thread internal/agent/delegate.go
Comment thread internal/agent/delegate.go
Comment thread internal/agent/delegate.go Outdated
Comment thread internal/service/frontend/api/v1/agent_sessions.go
Comment thread rfcs/017-sub-agent.md Outdated
Comment thread rfcs/017-sub-agent.md Outdated
Comment thread ui/src/features/agent/components/ChatMessages.tsx
Comment thread ui/src/features/agent/hooks/useAgentChat.ts
@yottahmd yottahmd merged commit 4d640a5 into main Feb 18, 2026
6 checks passed
@yottahmd yottahmd deleted the feat-subagent branch February 18, 2026 11:24
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 18, 2026

Codecov Report

❌ Patch coverage is 78.69986% with 154 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.25%. Comparing base (ae13102) to head (94a2ad4).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/agent/delegate.go 78.92% 19 Missing and 32 partials ⚠️
internal/agent/api.go 69.42% 23 Missing and 25 partials ⚠️
internal/agent/web_search.go 56.52% 11 Missing and 9 partials ⚠️
internal/agent/session.go 89.53% 4 Missing and 5 partials ⚠️
internal/agent/loop.go 88.52% 0 Missing and 7 partials ⚠️
internal/agent/contextkeys.go 0.00% 6 Missing ⚠️
internal/agent/tool_registry.go 75.00% 1 Missing and 4 partials ⚠️
internal/agent/schema/registry.go 42.85% 4 Missing ⚠️
internal/agent/read.go 86.66% 1 Missing and 1 partial ⚠️
internal/agent/patch.go 88.88% 0 Missing and 1 partial ⚠️
... and 1 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1685      +/-   ##
==========================================
- Coverage   69.71%   69.25%   -0.46%     
==========================================
  Files         359      361       +2     
  Lines       39941    40301     +360     
==========================================
+ Hits        27845    27911      +66     
+ Misses       9865     9861       -4     
- Partials     2231     2529     +298     
Files with missing lines Coverage Δ
internal/agent/ask_user.go 95.83% <100.00%> (+0.32%) ⬆️
internal/agent/bash.go 100.00% <100.00%> (ø)
internal/agent/model_config.go 95.83% <100.00%> (-0.40%) ⬇️
internal/agent/navigate.go 100.00% <100.00%> (ø)
internal/agent/presets.go 80.00% <ø> (-20.00%) ⬇️
internal/agent/read_schema.go 76.74% <100.00%> (+4.52%) ⬆️
internal/agent/system_prompt.go 51.72% <ø> (-32.90%) ⬇️
internal/agent/think.go 100.00% <100.00%> (ø)
internal/agent/tools.go 100.00% <100.00%> (ø)
internal/agent/types.go 75.00% <ø> (-25.00%) ⬇️
... and 12 more

... and 11 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae13102...94a2ad4. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant