
feat: CLI-via-Goosed unified agent architecture with multi-agent routing#7238

Closed
bioinfornatics wants to merge 525 commits into block:main from bioinfornatics:feature/cli-via-goosed

Conversation

@bioinfornatics
Contributor

CLI-via-Goosed: Unified Agent Architecture

Summary

This PR introduces a unified architecture where the CLI communicates with agents through goosed (the server binary), aligning desktop and CLI on a single communication path. It also adds multi-agent orchestration with an intent router, ACP/A2A protocol compatibility, and comprehensive UI improvements.

Key Changes

🏗️ Architecture: CLI-via-Goosed

  • CLI now communicates through goosed server instead of directly instantiating agents
  • GoosedClient manages server lifecycle (spawn, health check, graceful shutdown)
  • Process discovery & reuse via PID state file (~/.config/goose/goosed.state)
  • goose service install|uninstall|status|logs for managed daemon lifecycle (systemd/launchd)
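The process-reuse step depends on reading and validating that state file. The actual on-disk format of `~/.config/goose/goosed.state` is not shown in this PR, so the JSON shape below (a `pid` and `port` pair) is purely illustrative; a minimal sketch of the discovery parse:

```typescript
// Hypothetical state-file shape; the real goosed.state format may differ.
interface GoosedState {
  pid: number;
  port: number;
}

function parseGoosedState(text: string): GoosedState | null {
  try {
    const data = JSON.parse(text);
    if (typeof data.pid === "number" && typeof data.port === "number") {
      return { pid: data.pid, port: data.port };
    }
    return null;
  } catch {
    // Corrupt or empty state file: caller should spawn a fresh goosed.
    return null;
  }
}
```

A caller would follow this with a liveness check on the PID (and a health check on the port) before reusing the server.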

🤖 Multi-Agent System

  • GooseAgent: 7 behavioral modes (assistant, specialist, recipe_maker, app_maker, app_iterator, judge, planner)
  • CodingAgent: 8 SDLC modes (pm, architect, backend, frontend, qa, security, sre, devsecops)
  • IntentRouter: Keyword-based routing with fuzzy prefix matching and configurable confidence thresholds
  • OrchestratorAgent: LLM-based meta-coordinator with fallback to IntentRouter
  • Internal modes (judge, planner, recipe_maker) filtered from public discovery
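The IntentRouter's keyword layer can be sketched roughly as below. The slot shape, scoring scheme, and prefix rule are illustrative, not the PR's exact Rust code; they only show the idea of fuzzy prefix matching gated by a confidence threshold:

```typescript
// Illustrative sketch: not the PR's actual IntentRouter implementation.
interface Slot {
  agent: string;
  keywords: string[];
}

function routeByKeywords(
  message: string,
  slots: Slot[],
  threshold = 0.2,
): { agent: string; score: number } | null {
  const tokens = message.toLowerCase().split(/\W+/).filter(Boolean);
  let best: { agent: string; score: number } | null = null;
  for (const slot of slots) {
    let hits = 0;
    for (const kw of slot.keywords) {
      // Fuzzy prefix match: "deploying" matches the keyword "deploy".
      if (tokens.some((t) => t.startsWith(kw) || (kw.startsWith(t) && t.length >= 4))) {
        hits++;
      }
    }
    const score = hits / slot.keywords.length;
    if (score > 0 && (!best || score > best.score)) best = { agent: slot.agent, score };
  }
  // Below the confidence threshold, defer to the next routing layer.
  return best && best.score >= threshold ? best : null;
}
```

Returning `null` below the threshold is what lets the router hand off to a fallback layer instead of guessing.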

📡 Protocol Compatibility

  • ACP (Agent Communication Protocol): Full run lifecycle (create → stream → complete/cancel), elicitation, await flows
  • A2A (Agent-to-Agent): Dynamic agent card generation from IntentRouter slots
  • RunStore: Single-mutex design with LRU eviction (MAX_COMPLETED_RUNS=1000), TOCTOU-safe resume
  • ACP-IDE WebSocket: Session mode state with available/current mode tracking, notification forwarding
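The bounded retention in RunStore can be sketched with a `Map`, whose insertion order stands in for the PR's LRU bookkeeping (the real implementation is Rust behind a single mutex; names and shapes here are illustrative):

```typescript
// Sketch of MAX_COMPLETED_RUNS-bounded retention, assuming Map insertion
// order as the eviction order. Not the PR's actual Rust RunStore.
const MAX_COMPLETED_RUNS = 1000;

class RunStore<T> {
  private completed = new Map<string, T>();

  complete(id: string, result: T): void {
    this.completed.delete(id); // refresh position if re-completed
    this.completed.set(id, result);
    if (this.completed.size > MAX_COMPLETED_RUNS) {
      // Evict the least recently inserted completed run.
      const oldest = this.completed.keys().next().value as string;
      this.completed.delete(oldest);
    }
  }

  get(id: string): T | undefined {
    return this.completed.get(id);
  }

  size(): number {
    return this.completed.size;
  }
}
```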

📊 Analytics & Observability

  • Routing analytics endpoints: POST /analytics/routing/inspect, POST /analytics/routing/eval, GET /analytics/routing/catalog
  • Routing evaluation framework: YAML-based test sets (29 cases), per-agent/mode accuracy metrics, confusion matrix
  • OpenTelemetry spans: orchestrator.route, orchestrator.llm_classify, intent_router.route
  • AgentEvent::PlanCreated: New event variant for orchestration plan tracking
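The per-agent accuracy metric from the evaluation framework reduces to grouping test cases by expected agent. The case shape below is assumed for illustration (the real test sets are YAML):

```typescript
// Illustrative sketch of per-agent accuracy over routing eval cases.
interface EvalCase {
  expected: string;  // agent the test case should route to
  predicted: string; // agent the router actually chose
}

function perAgentAccuracy(cases: EvalCase[]): Map<string, number> {
  const totals = new Map<string, { correct: number; total: number }>();
  for (const c of cases) {
    const t = totals.get(c.expected) ?? { correct: 0, total: 0 };
    t.total++;
    if (c.predicted === c.expected) t.correct++;
    totals.set(c.expected, t);
  }
  const acc = new Map<string, number>();
  for (const [agent, t] of totals) acc.set(agent, t.correct / t.total);
  return acc;
}
```

The confusion matrix mentioned above is the same grouping kept two-dimensional (expected × predicted counts) instead of collapsed to a ratio.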

🖥️ UI Improvements

  • WorkBlockIndicator: Collapsible tool-call chains with auto-open streaming, live-update panel
  • Progressive message rendering: Two-tier final answer detection, suppress transient tool call flash
  • Agent management: Dedup agents by ID, mode switching
  • ReasoningDetailPanel: Enhanced for work blocks with streaming support
  • Refactored hooks: useChatStream split into streamReducer.ts + streamDecoder.ts (860→576 lines)

🔒 Security & Reliability

  • Concurrency limit (10) on all /runs endpoints via ServiceBuilder
  • Structured ErrorResponse on all 11 bare StatusCode returns in runs.rs
  • ErrorResponse::bad_request() and conflict() constructors added
  • AcpIdeSessions LRU eviction (MAX_IDE_SESSIONS=100) with idle timeout

Quality Gates

| Gate | Status |
| --- | --- |
| `cargo build --all-targets` | ✅ |
| `cargo fmt --check` | ✅ |
| `cargo clippy --all-targets -- -D warnings` | ✅ |
| `cargo test -p goose --lib` (789 tests) | ✅ |
| `cargo test -p goose-server` (40 tests) | ✅ |
| `npx tsc --noEmit` | ✅ |
| `npx vitest run` (325/326, 1 pre-existing failure) | ✅ |
| `npx eslint` | ✅ |
| Merge conflict check | ✅ Clean |

New Test Coverage

  • 14 RunStore lifecycle tests: create/get, status transitions, await/elicitation, cancellation, events, output, errors, pagination, eviction
  • 23 WorkBlock non-regression tests: streaming, completed, tool chains, final answer detection, dual indicator prevention
  • 7 routing evaluation tests: YAML parsing, accuracy thresholds, metrics computation, report generation
  • 6 SSE parser tests: event boundary handling, multi-event buffers, partial data
  • 6 IntentRouter tests: keyword routing, fallback, disabled agents
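The SSE parser's event-boundary handling tested above hinges on one invariant: events end at a blank line, and a trailing partial event stays buffered until more bytes arrive. A hedged sketch of that buffering (simplified to `data:` fields only; the real parser also handles `event:`, `id:`, and CR/LF variants):

```typescript
// Simplified SSE chunk parser: returns complete events plus the
// unfinished tail, which the caller feeds back in as `buffer`.
function parseSseChunk(
  buffer: string,
  chunk: string,
): { events: string[]; rest: string } {
  const combined = buffer + chunk;
  const parts = combined.split("\n\n");
  const rest = parts.pop() ?? ""; // incomplete tail, kept for next chunk
  const events = parts
    .map((p) =>
      p
        .split("\n")
        .filter((l) => l.startsWith("data:"))
        .map((l) => l.slice(5).trim())
        .join("\n"),
    )
    .filter((e) => e.length > 0);
  return { events, rest };
}
```

Multi-event buffers fall out naturally: one chunk containing two blank-line separators yields two complete events plus whatever partial data follows.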

Routing Evaluation Baseline

Overall: 41.4% (keyword router — LLM router pending)
Goose Agent:  100% (5/5)
Coding Agent:  33% (8/24)
Best modes:  architect 100%, qa 100%
Worst modes: frontend 0%, devsecops 0%, backend 20%

Files Changed

  • Rust: ~30 new/modified files across goose, goose-server, goose-cli, goose-mcp
  • TypeScript/React: ~15 new/modified files in ui/desktop
  • Tests: 50+ new tests (Rust + Vitest)
  • Docs: Architecture review, analytics backlog, protocol analysis

Follow-up Work

  • BL-2: Analytics UI dashboard (3-tab React page)
  • BL-3: Live user feedback (👍/👎 on routing)
  • Agent extraction: QA, PM, Security as standalone agents
  • LLM-based router (replace keyword matching)
  • Full OTel dashboard integration

@michaelneale
Collaborator

thanks @bioinfornatics, I like that general idea. Looks like a lot of work to tidy up the conflicts, but I would like to see what it looks like if you could show it here.

@bioinfornatics force-pushed the feature/cli-via-goosed branch 6 times, most recently from e8dc59e to 4b68302 on February 17, 2026 at 12:59
@DOsinga
Collaborator

DOsinga commented Feb 17, 2026

this is a massive change! I like the ideas, but I think we should discuss some of them separately (where do we want to go with clients?). Also, it seems to introduce 5 big ideas; shouldn't we split those up?

…o-scroll

The auto-scroll useEffect depended on `children` prop, which is a new
React element reference on every render. This caused:

1. useEffect firing every render (not just on content changes)
2. scrollTo() triggering Radix ScrollArea's internal setRef callback
3. setRef calling setState → re-render → new children ref → loop
4. 'Maximum update depth exceeded' crash

Fix: Replace children dependency with ResizeObserver on the viewport's
scroll container. ResizeObserver fires only when content size actually
changes, breaking the render loop while preserving auto-scroll behavior.

Before: useEffect([children, autoScroll]) → fires every render
After:  ResizeObserver on scrollContent → fires on resize only
…vent infinite loop

The useRegisterSession hook had a single useEffect that both registered
(setter({...})) and unregistered (cleanup setter(null)) the session.
Its dependency array included unstable references (functions, arrays, objects)
that changed identity every render, causing:

  render → effect fires → setter({...}) → provider re-renders →
  BaseChat re-renders → new refs → cleanup setter(null) → re-render → loop

Split into two effects:
1. Register/unregister: depends only on sessionId + stableSubmit.
   The cleanup setter(null) runs ONLY here — on session change or unmount.
2. Update fields: depends on primitive/length values only.
   Uses functional updater setter(prev => ...) with no cleanup.

Also widened setSessionState type to accept functional updater pattern.
Dependency array uses .length for arrays to avoid identity-based re-runs.
Three changes to eliminate re-render churn in the work block side panel:

1. Replace smooth scrollIntoView with rAF-throttled auto scroll
   - smooth scroll triggers Radix ScrollArea reflow → setState → re-render
   - rAF loop with behavior: 'auto' avoids the feedback cascade

2. Stabilize prop identity with module-level constants
   - new Map() → EMPTY_TOOL_NOTIFICATIONS (module const)
   - () => {} → NOOP (module const)
   - Prevents GooseMessage from seeing new prop refs every render

3. Add rafId ref for proper cleanup on unmount/stream-end
… render loops

ToolCallWithResponse:
- Replace useState/useEffect for startTime with useRef (no re-render)
- Memoize toolResults, logs, progressEntries with React.useMemo
- Remove unused useMemo named import (use React.useMemo instead)

TooltipWrapper:
- Remove per-instance TooltipProvider — uses app-level one from AppLayout
- Prevents creating new Radix context on every render during streaming
- Reduces component tree depth and re-render cascade

These changes eliminate the 'Maximum update depth exceeded' crash
in ReasoningDetailPanel → GooseMessage → ToolCallWithResponse path
during streaming by preventing unstable prop identity and unnecessary
state updates.
Root cause: during streaming, WorkBlockIndicator called updateWorkBlock with
a new object every render (messages array recreated by .map() in parent).
The context provider created a new value object every render, causing all
consumers to re-render in an infinite loop.

ReasoningDetailContext:
- Memoize Provider value with useMemo
- Use refs (panelDetailRef, detailRef) in toggle callbacks to remove
  state from useCallback deps, making callbacks fully stable
- Add shallow value comparison in updateWorkBlock (messages.length +
  toolCount + isStreaming) to skip state updates when nothing changed

WorkBlockIndicator:
- Consolidate 4 individual refs into single latestRef object
- Wrap buildDetail in useCallback with stable deps
- Add proper dependency arrays to all useEffect hooks
…treaming

During streaming with multiple assistant messages, the LLM often outputs
text before adding tool calls. The previous logic would prematurely select
this streaming text as the 'final answer' and render it outside the work
block, causing content to flash in and out as tool requests arrive.

Fix: skip final answer detection for multi-message streaming runs
(assistantIndices.length > 1). Single-message streaming runs still use
normal detection since the rendering layer handles tool-call suppression
via suppressToolCalls prop.

Also fixes the one-liner summary: since all messages stay as intermediates
during streaming, extractOneLiner now correctly picks up the latest
assistant text for the work block indicator description.

Add 2 regression tests for multi-message streaming scenarios.
…within TooltipProvider'

Add TooltipProvider at the App root level so all Radix Tooltip consumers
are guaranteed coverage, regardless of where they render in the component
tree. Previously, only the SidebarProvider in AppLayout wrapped a
TooltipProvider, leaving edge cases (error boundaries, modals, race
conditions during render loop recovery) uncovered.
…ool results in side panel

WorkBlockIndicator: extractOneLiner now prefers the last tool call
description (e.g. 'editing src/App.tsx', 'running ls -la') over
raw LLM reasoning text. Uses describeToolCall() with human-readable
summaries for common tools (text_editor, shell, analyze, etc.) and
a generic fallback for unknown tools.

ReasoningDetailPanel: pass suppressToolCalls to GooseMessage so the
side panel doesn't render full tool response content (file contents,
command outputs). Also add type='button' to close button.
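The `describeToolCall()` helper mentioned above maps tool calls to one-line summaries. The tool names follow the commit message, but the exact wording and argument keys below are assumed for illustration:

```typescript
// Illustrative sketch of describeToolCall(); argument keys ("path",
// "command") and phrasings are assumptions, not the PR's exact code.
function describeToolCall(name: string, args: Record<string, unknown>): string {
  switch (name) {
    case "text_editor":
      return `editing ${args.path ?? "a file"}`;
    case "shell":
      return `running ${args.command ?? "a command"}`;
    default:
      // Generic fallback for unknown tools.
      return `using ${name}`;
  }
}
```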
extractOneLiner now prioritizes the latest assistant text's first
sentence (e.g. 'Let me fix the render loop' / 'Now I'll check the
scroll area') over tool call descriptions. This captures the LLM's
*intent* — what it's thinking about — rather than mechanical details
like file paths.

Priority order:
  1. First sentence of latest assistant text (≥10 chars)
  2. Last tool call description (fallback)

firstSentence() strips markdown/HTML/code-blocks and splits on
sentence boundaries (. ! ? : —) for clean, readable summaries.
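A minimal sketch of that `firstSentence()` behavior, with the stripping rules simplified relative to whatever the PR actually ships:

```typescript
// Sketch of firstSentence(): strip code fences, markdown markers, and
// inline HTML, then cut at the first sentence-ending punctuation mark.
function firstSentence(text: string): string {
  const cleaned = text
    .replace(/```[\s\S]*?```/g, " ") // drop fenced code blocks
    .replace(/[*_`#>]/g, "")         // drop markdown emphasis/heading markers
    .replace(/<[^>]+>/g, "")         // drop inline HTML tags
    .trim();
  const match = cleaned.match(/^[^.!?:—]*[.!?:—]/);
  return (match ? match[0] : cleaned).trim();
}
```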
During streaming, the one-liner flickered on every token because
extractOneLiner recomputed from messages on every render.

Replace with useStableOneLiner hook that:
- Shows tool call descriptions immediately (they appear all at once)
- Only updates thinking sentences when they look complete (end with
  punctuation) and differ from what's currently shown
- Holds the current value for a minimum 1.5s to prevent flicker
- Falls back to tool descriptions when no assistant text exists
- Shows final value immediately when streaming ends

Split extractOneLiner into focused helpers:
- extractToolDescription: last tool call as human-readable text
- extractThinkingSentence: first complete sentence from assistant text
- useStableOneLiner: debounced hook combining both with hold logic
The previous approach tried to extract assistant thinking sentences,
but these stream word-by-word causing constant flickering. The debounce
hook (useStableOneLiner) added complexity but still showed partial text.

Simplify to tool-call-only descriptions:
- Tool calls are discrete events (appear all at once, never stream)
- They tell you exactly what Goose is doing: 'reading src/App.tsx',
  'running npm test', 'editing WorkBlockIndicator.tsx'
- Only updates when a NEW tool call starts (stable between calls)
- Remove: useStableOneLiner, firstSentence, extractThinkingSentence
- Remove: useState import (no longer needed)

121 lines removed, 8 added — dramatically simpler.
- Add Badge atom: variant-based (default/secondary/accent/muted/outline), two sizes
- Add StatusDot atom: active (blue pulse), completed (green), idle (gray)
- Export both from atoms barrel (index.ts)

WorkBlockIndicator:
- Use Badge for agent/mode info (only shown when non-default agent)
- Use StatusDot for streaming/completed status
- Cleaner layout: status line + one-liner instead of inline text
- Remove redundant agent/mode display when it's the default 'Goose'

ReasoningDetailPanel:
- Rename title 'Work Block' → 'Activity'
- Use Badge for agent/mode badge in header (only when non-default)
- Use StatusDot in tool count status bar
- suppressToolCalls on GooseMessage (hides raw tool outputs)
- Extract _routingInfo (agentName, modeSlug) from first message of each block
- Track previous agent/mode across blocks in ProgressiveMessageList
- Pass showAgentBadge prop to WorkBlockIndicator — true only when
  agent/mode differs from previous block
- Pass showAgentBadge through to ReasoningDetailPanel via WorkBlockDetail
- Filter out default agents (Goose, Goose Agent) from badge display
- Reduces visual noise: badge only appears when 'who is talking' changes
GooseMessage renders agent/mode badges on each message. When rendered
inside ReasoningDetailPanel (activity side panel), this creates a
stack of repeated badges. Add hideRoutingBadges prop to GooseMessage
and pass it from ReasoningDetailPanel to suppress all three badge
locations (tool-only early return, main badge, timestamp area).
Replace full GooseMessage rendering in the activity side panel with a
compact tool-call list view:

- New ActivityStep molecule: icon + description + status indicator
  - Tool-specific icons (Terminal for shell, FileText for editor, etc.)
  - Spinning loader for active tools, green check for completed
- ReasoningDetailPanel now extracts ActivityEntry[] from messages:
  - Shows tool calls as compact steps ("reading src/App.tsx")
  - Shows thinking text as italic summaries (first sentence only)
  - Skips tool-result user messages entirely
- Much cleaner than rendering full GooseMessage with raw tool outputs
- Exported from molecules barrel
During streaming, the last assistant message has partial text that builds
up token by token. Previously extractActivityEntries extracted the first
sentence from this partial text, causing thinking entries to flicker.

Now:
- Completed messages: extract first complete sentence via firstSentence()
  which uses sentence-ending punctuation (. ! ? : —) as boundaries
- Streaming message: skip thinking text entirely (it's partial), the
  active tool spinner already indicates activity
- Result: stable, non-flickering activity log during streaming
…able tool details

- Add scripts/categorize_diagnostic_logs.py: standalone Python tool to parse/categorize
  diagnostic session logs into UI rendering zones (main panel, work block, hidden)
  with --timeline, --json, --validate modes

- Add ui/desktop/src/utils/diagnosticLogParser.ts: TypeScript utility mirroring the
  Python categorization for use in the UI. Supports parseLogLines(), parseSession(),
  toMessages() to reconstruct sessions from JSONL diagnostic logs

- Enhance ActivityStep: now expandable with tool arguments (key-value), result text
  (with truncation/expand), and error messages. Collapsed view unchanged.

- Add ThinkingEntry component: renders chain-of-thought text as italic plain text,
  visually distinct from tool activity blocks

- Enhance ReasoningDetailPanel: pairs tool requests with their responses by matching
  IDs across messages, builds rich timeline of interleaved thinking + tool activity

- Add comprehensive tests:
  - 22 diagnosticLogParser tests (zone mapping, parsing, sessions, toMessages)
  - 16 ActivityStep + ThinkingEntry component tests (rendering, expand/collapse, args)
  - 20 ReasoningDetailPanel tests (buildToolResponseMap, extractActivityEntries)

Categories mapped to UI zones:
  Main Panel: USER_INPUT, ASSISTANT_TEXT, STREAMING_CHUNK
  Work Block: TOOL_REQUEST, TOOL_RESULT, INTERNAL_WORK
  Reasoning:  THINKING
  Hidden:     SYSTEM_INFO, TITLE_GENERATION, USAGE_STATS
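The category-to-zone table above is effectively a static lookup. The category names come from the commit message; the `Zone` union and the unknown-category default are illustrative:

```typescript
// The category → UI-zone mapping as a lookup table.
type Zone = "main" | "workBlock" | "reasoning" | "hidden";

const ZONE_BY_CATEGORY: Record<string, Zone> = {
  USER_INPUT: "main",
  ASSISTANT_TEXT: "main",
  STREAMING_CHUNK: "main",
  TOOL_REQUEST: "workBlock",
  TOOL_RESULT: "workBlock",
  INTERNAL_WORK: "workBlock",
  THINKING: "reasoning",
  SYSTEM_INFO: "hidden",
  TITLE_GENERATION: "hidden",
  USAGE_STATS: "hidden",
};

function zoneFor(category: string): Zone {
  // Assumed default: unknown categories stay out of the visible panels.
  return ZONE_BY_CATEGORY[category] ?? "hidden";
}
```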
Major fix: pushMessage now concatenates streaming text deltas instead of
replacing the last message. The server sends each streaming chunk as a
separate Message with the same ID containing only the delta text (a few
tokens), not the full accumulated text. The old code replaced the entire
last message with each incoming chunk, causing only the final chunk to
be displayed (the 'single dot' bug where 843 chunks of a response
resulted in just '.' being shown).

Minor fix: WorkBlockIndicator one-liner now prefers showing the latest
assistant thinking text (e.g. 'I'll start by analyzing...') over tool
call descriptions (e.g. 'running command...'), giving better context
about what the agent is doing.

Includes 11 tests for pushMessage covering accumulation, interleaved
messages, metadata preservation, and the exact 'single dot' regression
scenario.
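The delta-accumulation fix described above reduces to one branch: chunks sharing the last message's ID carry only delta text and must be appended, not substituted. A sketch with a deliberately simplified message shape:

```typescript
// Simplified sketch of the pushMessage fix; the real Message type
// carries more fields (metadata, role, tool content, etc.).
interface Msg {
  id: string;
  text: string;
}

function pushMessage(messages: Msg[], incoming: Msg): Msg[] {
  const last = messages[messages.length - 1];
  if (last && last.id === incoming.id) {
    // Same message ID: the chunk is a streaming delta, so concatenate.
    last.text += incoming.text;
  } else {
    // New message: append as-is.
    messages.push(incoming);
  }
  return messages;
}
```

With the old replace-the-last-message logic, only the final few-token chunk survived, which is exactly the "single dot" bug.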
…ing work blocks

During multi-message streaming (tool calls + text), the pure-text final answer
message is now shown outside the work block immediately, so users can read the
response as it streams token-by-token while tool calls remain collapsed above.

Previously, ALL messages were hidden behind the WorkBlockIndicator during
streaming, and text only appeared after the entire response finished.

Key changes:
- identifyWorkBlocks: During streaming, pure-text messages are identified as
  final answers for progressive rendering. Text+tool messages stay collapsed.
- If the LLM later adds tool calls to a pure-text message, identifyWorkBlocks
  re-runs and absorbs it back into the work block automatically.
- Updated tests to verify progressive rendering behavior.
…ioning

- MainPanelLayout: change h-dvh to h-full so the panel respects flex
  constraints and doesn't extend behind GlobalChatInput
- ProgressiveMessageList: broaden showPendingIndicator to remain visible
  whenever streaming is active (not just before first assistant message),
  so the activity indicator stays visible under the last user message
- useSandboxBridge: prefix unused resourceUri with underscore
- resultsCache: add LRU eviction (max 5 sessions) and only cache when
  idle to prevent caching on every streaming chunk
- pushMessage: mutate array in-place instead of copying on every chunk,
  reducing O(n²) allocations during streaming
- maybeUpdateUI: spread currentMessages when dispatching to React so
  state changes are detected, and fix reduced-motion branch to actually
  batch updates instead of dispatching immediately (defeating batching)
@lifeizhou-ap
Collaborator

Hey @bioinfornatics 👋 Just checking in on this draft. Are you still actively working on it? If not, would you mind closing it? We can always reopen later if you pick it back up. Thanks!

SOTA-aligned improvements based on multi-agent AI research (2025):

Orchestrator Prompts (Anthropic 2025 best practices):
- Rewrite system.md, routing.md, splitting.md with XML-structured tags
- XML agent catalog in build_catalog_text() (<agent>, <mode>, <use_when>)
- Filter internal modes from routing prompts and user-facing catalogs
- Add explicit mode validation ('Do NOT invent new modes')

AGENTS.md (Linux Foundation standard):
- Enhance root AGENTS.md with architecture docs, routing flow, A2A, design decisions
- Add nested crates/goose/src/agents/AGENTS.md (agent system docs)
- Add nested crates/goose/src/prompts/AGENTS.md (template conventions)

Tests: 42 pass (24 orchestrator + 13 intent_router + 5 dispatch)
Clippy: 0 warnings
Add embedding-based semantic routing between keyword matching (<10ms)
and LLM-as-Judge (~1-5s), providing ~100ms routing that's 50x faster
than LLM and far more robust than keywords.

Architecture (3-tier hybrid routing):
  Layer 1: Keyword matching (IntentRouter, <10ms)
  Layer 2: TF-IDF cosine similarity (SemanticRouter, ~1ms)  [NEW]
  Layer 3: LLM-as-Judge (OrchestratorAgent, ~1-5s)

SemanticRouter implementation:
- TF-IDF vectorization with smoothed IDF weighting
- Cosine similarity matching against pre-computed route vectors
- Minimal English suffix stemmer (no external dependencies)
- Stop word filtering, configurable similarity threshold (0.15)
- Top matching terms for explainability
- Zero external dependencies — pure Rust

IntentRouter integration:
- SemanticRouter field auto-built from agent slots
- Rebuilt on slot add/remove/enable changes
- Semantic layer activated when keyword score < 0.2 threshold
- Tracing spans record routing strategy for observability

Files:
- NEW: crates/goose/src/agents/semantic_router.rs (607 lines)
- MOD: crates/goose/src/agents/intent_router.rs (+150 lines)
- MOD: crates/goose/src/agents/mod.rs (module registration)

Tests: 51 pass (12 semantic_router + 15 intent_router + 24 orchestrator_agent)
SOTA ref: Semantic Router pattern (Aurelio Labs), hybrid routing architecture
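The TF-IDF-plus-cosine layer above is implemented in Rust in the PR; the sketch below restates the core math in TypeScript, with the smoothing formula and tokenization simplified (no stemming or stop words):

```typescript
// Sketch of TF-IDF vectorization with smoothed IDF, plus cosine
// similarity. Simplified relative to the PR's semantic_router.rs.
function tfidfVectors(docs: string[][]): Map<string, number>[] {
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  const n = docs.length;
  return docs.map((doc) => {
    const vec = new Map<string, number>();
    for (const term of doc) {
      // Smoothed IDF keeps weights finite for terms in every document.
      const idf = Math.log((1 + n) / (1 + (df.get(term) ?? 0))) + 1;
      vec.set(term, (vec.get(term) ?? 0) + idf);
    }
    return vec;
  });
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) {
    dot += w * (b.get(t) ?? 0);
    na += w * w;
  }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}
```

Routing then picks the pre-computed route vector with the highest cosine score against the query vector, subject to the similarity threshold.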
…ntrast

Root cause: hover:bg-background-danger-muted was undefined (no CSS variable),
causing the button to lose its background on hover and expose the card bg
(#3f434b), which fails WCAG AA contrast with text-danger (#ff6b6b) at 3.58:1.

Fix:
- Add --background-danger-muted, --background-success-muted, --background-warning-muted
  tokens to both light and dark themes in main.css
- Add corresponding @theme inline aliases for Tailwind class generation
- Change Delete button border from border-default to border-danger for
  better visual affordance
- Light mode tokens use near-white tints (#fff5f5, #e6f4ea, #fff8e1)
- Dark mode tokens use solid dark-tinted colors (#3d2222, #1e2e1a, #302a18)

Contrast audit (all PASS WCAG AA ≥4.5:1):
- Dark default:  #ff6b6b on #22252a = 5.54:1
- Dark hover:    #ff6b6b on #3d2222 = 5.22:1
- Light default: #d32f2f on #ffffff = 4.98:1
- Light hover:   #d32f2f on #fff5f5 = 4.65:1

Fixes: goose4-ntlm
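The audit ratios above follow the WCAG 2.x relative-luminance formula; this sketch reproduces the computation so the listed pairs can be checked:

```typescript
// WCAG 2.x relative luminance from a #rrggbb hex color, and the
// (L1 + 0.05) / (L2 + 0.05) contrast ratio built on it.
function luminance(hex: string): number {
  const channel = (i: number): number => {
    const c = parseInt(hex.slice(1 + i * 2, 3 + i * 2), 16) / 255;
    // sRGB linearization per WCAG.
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  };
  return 0.2126 * channel(0) + 0.7152 * channel(1) + 0.0722 * channel(2);
}

function contrast(fg: string, bg: string): number {
  const [l1, l2] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (l1 + 0.05) / (l2 + 0.05);
}
```

For example, `contrast("#ff6b6b", "#22252a")` lands at the 5.54:1 figure quoted in the dark-default audit row, comfortably above the 4.5:1 AA floor.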
…ix noExplicitAny/noNonNullAssertion

- Auto-fix import organization across 112 files (biome organizeImports)
- Add type='button' to mock buttons in ProviderGrid.test.tsx (useButtonType)
- Replace 'as any' with typed cast in ModelAndProviderContext.test.tsx (noExplicitAny)
- Replace non-null assertion with null guard in ModelAndProviderContext.test.tsx (noNonNullAssertion)
- Add QA and Security debug prompt templates
- Result: 0 biome errors, 0 biome warnings across 484 files
- All 540 UI tests pass
…s routes

- universal_mode: richer when_to_use descriptions for all 5 modes
- goose/review.md, pm/review.md, pm/write.md: enhanced prompt templates
- security/ask.md, security/write.md: improved security agent prompts
- goose_agent, qa_agent, security_agent: agent refinements
- analytics.rs: new analytics route endpoints
- agent_management.rs: agent management improvements
- tool_analytics.rs: analytics tracking updates
- prompt_template.rs: template rendering improvements
…mprovements

- EvalRunner, AgentCatalog, RoutingInspector: enhanced analytics UI
- ModelAndProviderContext: improved provider state management
- useChatStream: streaming enhancements
- AppSidebar, SessionListView, RecipesView: navigation refinements
- SettingsView, ConfigSettings, TelemetrySettings: settings improvements
- SwitchModelModal, ModelsBottomBar: model switching updates
- ChatInput, BaseChat: chat UX improvements
- AppLayout: layout refinements
- main.ts: electron main process updates
- Various component and toast improvements
…sion guards

Routing quality improvements:
- Enrich PM Agent description with RICE, MoSCoW, sprint planning, acceptance criteria,
  phased rollout vocabulary — PM accuracy 28.6% → 71.4%
- Enrich Research Agent description with literature review, benchmarking, RFC summaries,
  concept explanation — Research accuracy 0% → 50%
- Agent-level accuracy: 58% → 70%

New eval regression guards:
- test_agent_level_accuracy_baseline: ≥60% agent accuracy
- test_pm_routing_baseline: ≥50% PM routing
- test_research_routing_baseline: ≥30% Research routing
- test_semantic_layer_used: ≥3 cases routed via semantic layer

Total: 62 routing tests pass (12 semantic + 15 intent_router + 11 eval + 24 orchestrator)
…aggregation prompt

SOTA E1 improvements:

1. GenUI Cross-Agent Binding
   - Add genui to Developer Agent recommended_extensions (ask, write, debug modes)
   - Add genui to QA Agent recommended_extensions
   - Add genui to Research Agent recommended_extensions
   - Any agent can now produce data visualizations via genui tools

2. Adaptive Thinking Debug Prompts (Anthropic 2025)
   - Developer debug: hypothesis matrix, 5 Whys, fault tree, interleaved thinking
   - QA debug: flaky test decision tree, test isolation, failure classification
   - Security debug: attack vector tree, incident timeline, blast radius assessment
   - All 3 include anti-overthinking guard and effort calibration

3. Compound Task Result Aggregation
   - New orchestrator/aggregation.md prompt template
   - XML-structured with synthesis instructions
   - Produces unified responses instead of concatenated parts

All 1029 tests pass, clippy clean, fmt clean.
…dispatch

- Add aggregate_results_with_llm() to orchestrator_agent.rs
  Uses orchestrator/aggregation.md prompt template to synthesize
  multiple sub-task results into a coherent unified response via LLM
  Falls back to simple string concatenation on any error

- Register orchestrator/aggregation.md in prompt_template.rs
  Template with {{task_count}}, {{user_message}}, {{results}} variables

- Wire LLM aggregation into reply.rs compound dispatch flow
  When provider available, uses LLM synthesis; otherwise falls back
  to aggregate_results() simple concatenation

Quality: 1029 tests pass, clippy clean, fmt clean
Add ProjectAgentConfig system for per-project agent customization:

agent_config.rs (418 lines):
- Load .goose/agents.yaml with serde deserialization
- Enable/disable agents, override descriptions, add extensions
- Custom mode creation (with slug, name, description, tool_groups)
- Routing feedback persistence (.goose/routing_feedback.json)
- Mode override (description, when_to_use per mode per agent)
- 6 tests covering parsing, loading, applying, feedback, custom modes

intent_router.rs:
- apply_project_config() integration method
- Project-level default_agent/default_mode fallback in route()
- 3 new tests for config integration (disable, default, custom mode)

YAML schema supports:
  default_agent: 'Developer Agent'
  default_mode: 'write'
  agents:
    'Developer Agent':
      enabled: true
      description: 'Custom description'
      extra_extensions: ['flutter-tools']
      modes:
        write:
          when_to_use: 'When creating Flutter widgets'
  custom_modes:
    - slug: 'data-pipeline'
      name: 'Data Pipeline'
      description: 'Build and debug data pipelines'
      agents: ['Developer Agent']
      tool_groups: ['read', 'edit', 'command']
Layer 0 in the 4-tier hybrid routing architecture:
  [0] Feedback corrections (learned, 0.95 confidence)
  [1] Keyword match (<10ms)
  [2] TF-IDF semantic (~1ms)
  [3] Default fallback

- Add routing_feedback field to IntentRouter
- check_feedback() uses keyword overlap (≥50%) to find matching corrections
- record_routing_feedback() stores user corrections for similar future queries
- Feedback takes highest priority (Layer 0) — if a user previously corrected
  routing for a similar message, we trust that correction
- Integrates with .goose/agents.yaml routing_feedback persistence
- 2 new tests: feedback override + unrelated message non-match
- All 1040 tests pass, clippy clean
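The Layer 0 check can be sketched as a keyword-overlap test against stored corrections. The shapes below are illustrative (the real corrections persist in `.goose/agents.yaml` / `routing_feedback.json`):

```typescript
// Sketch of check_feedback(): a stored correction fires when at least
// half of its keywords appear in the new message.
interface Correction {
  keywords: string[];
  agent: string;
}

function checkFeedback(message: string, corrections: Correction[]): string | null {
  const tokens = new Set(message.toLowerCase().split(/\W+/).filter(Boolean));
  for (const c of corrections) {
    const hits = c.keywords.filter((k) => tokens.has(k.toLowerCase())).length;
    if (c.keywords.length > 0 && hits / c.keywords.length >= 0.5) {
      // Learned correction wins at Layer 0 (0.95 confidence).
      return c.agent;
    }
  }
  return null;
}
```

An unrelated message shares no keywords with any correction, so it falls through to the keyword and semantic layers.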
- GET /agent-config — load current project agent config
- PUT /agent-config — save updated config to .goose/agents.yaml
- GET /agent-config/routing-feedback — list routing corrections
- POST /agent-config/routing-feedback — record new routing correction

Server backbone for the Agent Config UI feature.
Closes goose4-j6mp.
- knowledge_extraction.rs: Extract structured KG entities and relations from
  conversation text via LLM-powered prompt
- Entity types: Concept, Component, Decision, Finding, Risk, RepoPath
- Relation types: depends_on, implements, affects, derived_from, etc.
- Features: merge/dedup, confidence filtering, JSON fence parsing, cap at 20
- knowledge_extraction.md: Prompt template for entity/relation extraction
- Registered in TEMPLATE_REGISTRY
- 7 tests covering parsing, merging, filtering, caps, JSON extraction

Part of SOTA E6: GraphRAG-style knowledge management
@jamadeo closed this Mar 6, 2026
@bioinfornatics
Contributor Author

bioinfornatics commented Mar 6, 2026

Yes, sorry, I pushed my test branch to the wrong repo. It was for testing purposes. Maybe I will split it up and push the features later.
