feat: CLI-via-Goosed unified agent architecture with multi-agent routing by bioinfornatics · Pull Request #7238 · block/goose

bioinfornatics · 2026-02-16T00:15:04Z

CLI-via-Goosed: Unified Agent Architecture

Summary

This PR introduces a unified architecture where the CLI communicates with agents through goosed (the server binary), aligning desktop and CLI on a single communication path. It also adds multi-agent orchestration with an intent router, ACP/A2A protocol compatibility, and comprehensive UI improvements.

Key Changes

🏗️ Architecture: CLI-via-Goosed

CLI now communicates through goosed server instead of directly instantiating agents
GoosedClient manages server lifecycle (spawn, health check, graceful shutdown)
Process discovery & reuse via PID state file (~/.config/goose/goosed.state)
goose service install|uninstall|status|logs for managed daemon lifecycle (systemd/launchd)

🤖 Multi-Agent System

GooseAgent: 7 behavioral modes (assistant, specialist, recipe_maker, app_maker, app_iterator, judge, planner)
CodingAgent: 8 SDLC modes (pm, architect, backend, frontend, qa, security, sre, devsecops)
IntentRouter: Keyword-based routing with fuzzy prefix matching and configurable confidence thresholds
OrchestratorAgent: LLM-based meta-coordinator with fallback to IntentRouter
Internal modes (judge, planner, recipe_maker) filtered from public discovery

📡 Protocol Compatibility

ACP (Agent Communication Protocol): Full run lifecycle (create → stream → complete/cancel), elicitation, await flows
A2A (Agent-to-Agent): Dynamic agent card generation from IntentRouter slots
RunStore: Single-mutex design with LRU eviction (MAX_COMPLETED_RUNS=1000), TOCTOU-safe resume
ACP-IDE WebSocket: Session mode state with available/current mode tracking, notification forwarding

📊 Analytics & Observability

Routing analytics endpoints: POST /analytics/routing/inspect, POST /analytics/routing/eval, GET /analytics/routing/catalog
Routing evaluation framework: YAML-based test sets (29 cases), per-agent/mode accuracy metrics, confusion matrix
OpenTelemetry spans: orchestrator.route, orchestrator.llm_classify, intent_router.route
AgentEvent::PlanCreated: New event variant for orchestration plan tracking

🖥️ UI Improvements

WorkBlockIndicator: Collapsible tool-call chains with auto-open streaming, live-update panel
Progressive message rendering: Two-tier final answer detection, suppress transient tool call flash
Agent management: Dedup agents by ID, mode switching
ReasoningDetailPanel: Enhanced for work blocks with streaming support
Refactored hooks: useChatStream split into streamReducer.ts + streamDecoder.ts (860→576 lines)

🔒 Security & Reliability

Concurrency limit (10) on all /runs endpoints via ServiceBuilder
Structured ErrorResponse on all 11 bare StatusCode returns in runs.rs
ErrorResponse::bad_request() and conflict() constructors added
AcpIdeSessions LRU eviction (MAX_IDE_SESSIONS=100) with idle timeout

Quality Gates

Gate	Status
`cargo build --all-targets`	✅
`cargo fmt --check`	✅
`cargo clippy --all-targets -- -D warnings`	✅
`cargo test -p goose --lib` (789 tests)	✅
`cargo test -p goose-server` (40 tests)	✅
`npx tsc --noEmit`	✅
`npx vitest run` (325/326, 1 pre-existing)	✅
`npx eslint`	✅
Merge conflict check	✅ Clean

New Test Coverage

14 RunStore lifecycle tests: create/get, status transitions, await/elicitation, cancellation, events, output, errors, pagination, eviction
23 WorkBlock non-regression tests: streaming, completed, tool chains, final answer detection, dual indicator prevention
7 routing evaluation tests: YAML parsing, accuracy thresholds, metrics computation, report generation
6 SSE parser tests: event boundary handling, multi-event buffers, partial data
6 IntentRouter tests: keyword routing, fallback, disabled agents

Routing Evaluation Baseline

Overall: 41.4% (keyword router — LLM router pending)
Goose Agent:  100% (5/5)
Coding Agent:  33% (8/24)
Best modes:  architect 100%, qa 100%
Worst modes: frontend 0%, devsecops 0%, backend 20%

Files Changed

Rust: ~30 new/modified files across goose, goose-server, goose-cli, goose-mcp
TypeScript/React: ~15 new/modified files in ui/desktop
Tests: 50+ new tests (Rust + Vitest)
Docs: Architecture review, analytics backlog, protocol analysis

Follow-up Work

BL-2: Analytics UI dashboard (3-tab React page)
BL-3: Live user feedback (👍/👎 on routing)
Agent extraction: QA, PM, Security as standalone agents
LLM-based router (replace keyword matching)
Full OTel dashboard integration

michaelneale · 2026-02-16T00:50:12Z

thanks @bioinfornatics I like that general idea - looks like a lot of work to tidy up conflicts but would like to see what it looks like if you could show it here.

DOsinga · 2026-02-17T14:32:53Z

this is a massive change! I like the ideas, but I think we should discuss some of them separately (where we do we want to go with clients?) but also, it seems to introduce 5 big ideas, shouldn't we split those up?

…o-scroll The auto-scroll useEffect depended on `children` prop, which is a new React element reference on every render. This caused: 1. useEffect firing every render (not just on content changes) 2. scrollTo() triggering Radix ScrollArea's internal setRef callback 3. setRef calling setState → re-render → new children ref → loop 4. 'Maximum update depth exceeded' crash Fix: Replace children dependency with ResizeObserver on the viewport's scroll container. ResizeObserver fires only when content size actually changes, breaking the render loop while preserving auto-scroll behavior. Before: useEffect([children, autoScroll]) → fires every render After: ResizeObserver on scrollContent → fires on resize only

…vent infinite loop The useRegisterSession hook had a single useEffect that both registered (setter({...})) and unregistered (cleanup setter(null)) the session. Its dependency array included unstable references (functions, arrays, objects) that changed identity every render, causing: render → effect fires → setter({...}) → provider re-renders → BaseChat re-renders → new refs → cleanup setter(null) → re-render → loop Split into two effects: 1. Register/unregister: depends only on sessionId + stableSubmit. The cleanup setter(null) runs ONLY here — on session change or unmount. 2. Update fields: depends on primitive/length values only. Uses functional updater setter(prev => ...) with no cleanup. Also widened setSessionState type to accept functional updater pattern. Dependency array uses .length for arrays to avoid identity-based re-runs.

Three changes to eliminate re-render churn in the work block side panel: 1. Replace smooth scrollIntoView with rAF-throttled auto scroll - smooth scroll triggers Radix ScrollArea reflow → setState → re-render - rAF loop with behavior: 'auto' avoids the feedback cascade 2. Stabilize prop identity with module-level constants - new Map() → EMPTY_TOOL_NOTIFICATIONS (module const) - () => {} → NOOP (module const) - Prevents GooseMessage from seeing new prop refs every render 3. Add rafId ref for proper cleanup on unmount/stream-end

… render loops ToolCallWithResponse: - Replace useState/useEffect for startTime with useRef (no re-render) - Memoize toolResults, logs, progressEntries with React.useMemo - Remove unused useMemo named import (use React.useMemo instead) TooltipWrapper: - Remove per-instance TooltipProvider — uses app-level one from AppLayout - Prevents creating new Radix context on every render during streaming - Reduces component tree depth and re-render cascade These changes eliminate the 'Maximum update depth exceeded' crash in ReasoningDetailPanel → GooseMessage → ToolCallWithResponse path during streaming by preventing unstable prop identity and unnecessary state updates.

Root cause: during streaming, WorkBlockIndicator called updateWorkBlock with a new object every render (messages array recreated by .map() in parent). The context provider created a new value object every render, causing all consumers to re-render in an infinite loop. ReasoningDetailContext: - Memoize Provider value with useMemo - Use refs (panelDetailRef, detailRef) in toggle callbacks to remove state from useCallback deps, making callbacks fully stable - Add shallow value comparison in updateWorkBlock (messages.length + toolCount + isStreaming) to skip state updates when nothing changed WorkBlockIndicator: - Consolidate 4 individual refs into single latestRef object - Wrap buildDetail in useCallback with stable deps - Add proper dependency arrays to all useEffect hooks

…treaming During streaming with multiple assistant messages, the LLM often outputs text before adding tool calls. The previous logic would prematurely select this streaming text as the 'final answer' and render it outside the work block, causing content to flash in and out as tool requests arrive. Fix: skip final answer detection for multi-message streaming runs (assistantIndices.length > 1). Single-message streaming runs still use normal detection since the rendering layer handles tool-call suppression via suppressToolCalls prop. Also fixes the one-liner summary: since all messages stay as intermediates during streaming, extractOneLiner now correctly picks up the latest assistant text for the work block indicator description. Add 2 regression tests for multi-message streaming scenarios.

…within TooltipProvider' Add TooltipProvider at the App root level so all Radix Tooltip consumers are guaranteed coverage, regardless of where they render in the component tree. Previously, only the SidebarProvider in AppLayout wrapped a TooltipProvider, leaving edge cases (error boundaries, modals, race conditions during render loop recovery) uncovered.

…ool results in side panel WorkBlockIndicator: extractOneLiner now prefers the last tool call description (e.g. 'editing src/App.tsx', 'running ls -la') over raw LLM reasoning text. Uses describeToolCall() with human-readable summaries for common tools (text_editor, shell, analyze, etc.) and a generic fallback for unknown tools. ReasoningDetailPanel: pass suppressToolCalls to GooseMessage so the side panel doesn't render full tool response content (file contents, command outputs). Also add type='button' to close button.

extractOneLiner now prioritizes the latest assistant text's first sentence (e.g. 'Let me fix the render loop' / 'Now I'll check the scroll area') over tool call descriptions. This captures the LLM's *intent* — what it's thinking about — rather than mechanical details like file paths. Priority order: 1. First sentence of latest assistant text (≥10 chars) 2. Last tool call description (fallback) firstSentence() strips markdown/HTML/code-blocks and splits on sentence boundaries (. ! ? : —) for clean, readable summaries.

During streaming, the one-liner flickered on every token because extractOneLiner recomputed from messages on every render. Replace with useStableOneLiner hook that: - Shows tool call descriptions immediately (they appear all at once) - Only updates thinking sentences when they look complete (end with punctuation) and differ from what's currently shown - Holds the current value for a minimum 1.5s to prevent flicker - Falls back to tool descriptions when no assistant text exists - Shows final value immediately when streaming ends Split extractOneLiner into focused helpers: - extractToolDescription: last tool call as human-readable text - extractThinkingSentence: first complete sentence from assistant text - useStableOneLiner: debounced hook combining both with hold logic

The previous approach tried to extract assistant thinking sentences, but these stream word-by-word causing constant flickering. The debounce hook (useStableOneLiner) added complexity but still showed partial text. Simplify to tool-call-only descriptions: - Tool calls are discrete events (appear all at once, never stream) - They tell you exactly what Goose is doing: 'reading src/App.tsx', 'running npm test', 'editing WorkBlockIndicator.tsx' - Only updates when a NEW tool call starts (stable between calls) - Remove: useStableOneLiner, firstSentence, extractThinkingSentence - Remove: useState import (no longer needed) 121 lines removed, 8 added — dramatically simpler.

- Add Badge atom: variant-based (default/secondary/accent/muted/outline), two sizes - Add StatusDot atom: active (blue pulse), completed (green), idle (gray) - Export both from atoms barrel (index.ts) WorkBlockIndicator: - Use Badge for agent/mode info (only shown when non-default agent) - Use StatusDot for streaming/completed status - Cleaner layout: status line + one-liner instead of inline text - Remove redundant agent/mode display when it's the default 'Goose' ReasoningDetailPanel: - Rename title 'Work Block' → 'Activity' - Use Badge for agent/mode badge in header (only when non-default) - Use StatusDot in tool count status bar - suppressToolCalls on GooseMessage (hides raw tool outputs)

- Extract _routingInfo (agentName, modeSlug) from first message of each block - Track previous agent/mode across blocks in ProgressiveMessageList - Pass showAgentBadge prop to WorkBlockIndicator — true only when agent/mode differs from previous block - Pass showAgentBadge through to ReasoningDetailPanel via WorkBlockDetail - Filter out default agents (Goose, Goose Agent) from badge display - Reduces visual noise: badge only appears when 'who is talking' changes

GooseMessage renders agent/mode badges on each message. When rendered inside ReasoningDetailPanel (activity side panel), this creates a stack of repeated badges. Add hideRoutingBadges prop to GooseMessage and pass it from ReasoningDetailPanel to suppress all three badge locations (tool-only early return, main badge, timestamp area).

Replace full GooseMessage rendering in the activity side panel with a compact tool-call list view: - New ActivityStep molecule: icon + description + status indicator - Tool-specific icons (Terminal for shell, FileText for editor, etc.) - Spinning loader for active tools, green check for completed - ReasoningDetailPanel now extracts ActivityEntry[] from messages: - Shows tool calls as compact steps ("reading src/App.tsx") - Shows thinking text as italic summaries (first sentence only) - Skips tool-result user messages entirely - Much cleaner than rendering full GooseMessage with raw tool outputs - Exported from molecules barrel

During streaming, the last assistant message has partial text that builds up token by token. Previously extractActivityEntries extracted the first sentence from this partial text, causing thinking entries to flicker. Now: - Completed messages: extract first complete sentence via firstSentence() which uses sentence-ending punctuation (. ! ? : —) as boundaries - Streaming message: skip thinking text entirely (it's partial), the active tool spinner already indicates activity - Result: stable, non-flickering activity log during streaming

…able tool details - Add scripts/categorize_diagnostic_logs.py: standalone Python tool to parse/categorize diagnostic session logs into UI rendering zones (main panel, work block, hidden) with --timeline, --json, --validate modes - Add ui/desktop/src/utils/diagnosticLogParser.ts: TypeScript utility mirroring the Python categorization for use in the UI. Supports parseLogLines(), parseSession(), toMessages() to reconstruct sessions from JSONL diagnostic logs - Enhance ActivityStep: now expandable with tool arguments (key-value), result text (with truncation/expand), and error messages. Collapsed view unchanged. - Add ThinkingEntry component: renders chain-of-thought text as italic plain text, visually distinct from tool activity blocks - Enhance ReasoningDetailPanel: pairs tool requests with their responses by matching IDs across messages, builds rich timeline of interleaved thinking + tool activity - Add comprehensive tests: - 22 diagnosticLogParser tests (zone mapping, parsing, sessions, toMessages) - 16 ActivityStep + ThinkingEntry component tests (rendering, expand/collapse, args) - 20 ReasoningDetailPanel tests (buildToolResponseMap, extractActivityEntries) Categories mapped to UI zones: Main Panel: USER_INPUT, ASSISTANT_TEXT, STREAMING_CHUNK Work Block: TOOL_REQUEST, TOOL_RESULT, INTERNAL_WORK Reasoning: THINKING Hidden: SYSTEM_INFO, TITLE_GENERATION, USAGE_STATS

Major fix: pushMessage now concatenates streaming text deltas instead of replacing the last message. The server sends each streaming chunk as a separate Message with the same ID containing only the delta text (a few tokens), not the full accumulated text. The old code replaced the entire last message with each incoming chunk, causing only the final chunk to be displayed (the 'single dot' bug where 843 chunks of a response resulted in just '.' being shown). Minor fix: WorkBlockIndicator one-liner now prefers showing the latest assistant thinking text (e.g. 'I'll start by analyzing...') over tool call descriptions (e.g. 'running command...'), giving better context about what the agent is doing. Includes 11 tests for pushMessage covering accumulation, interleaved messages, metadata preservation, and the exact 'single dot' regression scenario.

…ing work blocks During multi-message streaming (tool calls + text), the pure-text final answer message is now shown outside the work block immediately, so users can read the response as it streams token-by-token while tool calls remain collapsed above. Previously, ALL messages were hidden behind the WorkBlockIndicator during streaming, and text only appeared after the entire response finished. Key changes: - identifyWorkBlocks: During streaming, pure-text messages are identified as final answers for progressive rendering. Text+tool messages stay collapsed. - If the LLM later adds tool calls to a pure-text message, identifyWorkBlocks re-runs and absorbs it back into the work block automatically. - Updated tests to verify progressive rendering behavior.

…ioning - MainPanelLayout: change h-dvh to h-full so the panel respects flex constraints and doesn't extend behind GlobalChatInput - ProgressiveMessageList: broaden showPendingIndicator to remain visible whenever streaming is active (not just before first assistant message), so the activity indicator stays visible under the last user message - useSandboxBridge: prefix unused resourceUri with underscore

- resultsCache: add LRU eviction (max 5 sessions) and only cache when idle to prevent caching on every streaming chunk - pushMessage: mutate array in-place instead of copying on every chunk, reducing O(n²) allocations during streaming - maybeUpdateUI: spread currentMessages when dispatching to React so state changes are detected, and fix reduced-motion branch to actually batch updates instead of dispatching immediately (defeating batching)

lifeizhou-ap · 2026-03-04T22:27:29Z

Hey @bioinfornatics 👋 Just checking in on this draft. Are you still actively working on it? If not, would you mind closing it? We can always reopen later if you pick it back up. Thanks!

SOTA-aligned improvements based on multi-agent AI research (2025): Orchestrator Prompts (Anthropic 2025 best practices): - Rewrite system.md, routing.md, splitting.md with XML-structured tags - XML agent catalog in build_catalog_text() (<agent>, <mode>, <use_when>) - Filter internal modes from routing prompts and user-facing catalogs - Add explicit mode validation ('Do NOT invent new modes') AGENTS.md (Linux Foundation standard): - Enhance root AGENTS.md with architecture docs, routing flow, A2A, design decisions - Add nested crates/goose/src/agents/AGENTS.md (agent system docs) - Add nested crates/goose/src/prompts/AGENTS.md (template conventions) Tests: 42 pass (24 orchestrator + 13 intent_router + 5 dispatch) Clippy: 0 warnings

Add embedding-based semantic routing between keyword matching (<10ms) and LLM-as-Judge (~1-5s), providing ~100ms routing that's 50x faster than LLM and far more robust than keywords. Architecture (3-tier hybrid routing): Layer 1: Keyword matching (IntentRouter, <10ms) Layer 2: TF-IDF cosine similarity (SemanticRouter, ~1ms) [NEW] Layer 3: LLM-as-Judge (OrchestratorAgent, ~1-5s) SemanticRouter implementation: - TF-IDF vectorization with smoothed IDF weighting - Cosine similarity matching against pre-computed route vectors - Minimal English suffix stemmer (no external dependencies) - Stop word filtering, configurable similarity threshold (0.15) - Top matching terms for explainability - Zero external dependencies — pure Rust IntentRouter integration: - SemanticRouter field auto-built from agent slots - Rebuilt on slot add/remove/enable changes - Semantic layer activated when keyword score < 0.2 threshold - Tracing spans record routing strategy for observability Files: - NEW: crates/goose/src/agents/semantic_router.rs (607 lines) - MOD: crates/goose/src/agents/intent_router.rs (+150 lines) - MOD: crates/goose/src/agents/mod.rs (module registration) Tests: 51 pass (12 semantic_router + 15 intent_router + 24 orchestrator_agent) SOTA ref: Semantic Router pattern (Aurelio Labs), hybrid routing architecture

@theme

…ntrast Root cause: hover:bg-background-danger-muted was undefined (no CSS variable), causing the button to lose its background on hover and expose the card bg (#3f434b), which fails WCAG AA contrast with text-danger (#ff6b6b) at 3.58:1. Fix: - Add --background-danger-muted, --background-success-muted, --background-warning-muted tokens to both light and dark themes in main.css - Add corresponding @theme inline aliases for Tailwind class generation - Change Delete button border from border-default to border-danger for better visual affordance - Light mode tokens use near-white tints (#fff5f5, #e6f4ea, #fff8e1) - Dark mode tokens use solid dark-tinted colors (#3d2222, #1e2e1a, #302a18) Contrast audit (all PASS WCAG AA ≥4.5:1): - Dark default: #ff6b6b on #22252a = 5.54:1 - Dark hover: #ff6b6b on #3d2222 = 5.22:1 - Light default: #d32f2f on #ffffff = 4.98:1 - Light hover: #d32f2f on #fff5f5 = 4.65:1 Fixes: goose4-ntlm

…ix noExplicitAny/noNonNullAssertion - Auto-fix import organization across 112 files (biome organizeImports) - Add type='button' to mock buttons in ProviderGrid.test.tsx (useButtonType) - Replace 'as any' with typed cast in ModelAndProviderContext.test.tsx (noExplicitAny) - Replace non-null assertion with null guard in ModelAndProviderContext.test.tsx (noNonNullAssertion) - Add QA and Security debug prompt templates - Result: 0 biome errors, 0 biome warnings across 484 files - All 540 UI tests pass

…s routes - universal_mode: richer when_to_use descriptions for all 5 modes - goose/review.md, pm/review.md, pm/write.md: enhanced prompt templates - security/ask.md, security/write.md: improved security agent prompts - goose_agent, qa_agent, security_agent: agent refinements - analytics.rs: new analytics route endpoints - agent_management.rs: agent management improvements - tool_analytics.rs: analytics tracking updates - prompt_template.rs: template rendering improvements

…mprovements - EvalRunner, AgentCatalog, RoutingInspector: enhanced analytics UI - ModelAndProviderContext: improved provider state management - useChatStream: streaming enhancements - AppSidebar, SessionListView, RecipesView: navigation refinements - SettingsView, ConfigSettings, TelemetrySettings: settings improvements - SwitchModelModal, ModelsBottomBar: model switching updates - ChatInput, BaseChat: chat UX improvements - AppLayout: layout refinements - main.ts: electron main process updates - Various component and toast improvements

…sion guards Routing quality improvements: - Enrich PM Agent description with RICE, MoSCoW, sprint planning, acceptance criteria, phased rollout vocabulary — PM accuracy 28.6% → 71.4% - Enrich Research Agent description with literature review, benchmarking, RFC summaries, concept explanation — Research accuracy 0% → 50% - Agent-level accuracy: 58% → 70% New eval regression guards: - test_agent_level_accuracy_baseline: ≥60% agent accuracy - test_pm_routing_baseline: ≥50% PM routing - test_research_routing_baseline: ≥30% Research routing - test_semantic_layer_used: ≥3 cases routed via semantic layer Total: 62 routing tests pass (12 semantic + 15 intent_router + 11 eval + 24 orchestrator)

…aggregation prompt SOTA E1 improvements: 1. GenUI Cross-Agent Binding - Add genui to Developer Agent recommended_extensions (ask, write, debug modes) - Add genui to QA Agent recommended_extensions - Add genui to Research Agent recommended_extensions - Any agent can now produce data visualizations via genui tools 2. Adaptive Thinking Debug Prompts (Anthropic 2025) - Developer debug: hypothesis matrix, 5 Whys, fault tree, interleaved thinking - QA debug: flaky test decision tree, test isolation, failure classification - Security debug: attack vector tree, incident timeline, blast radius assessment - All 3 include anti-overthinking guard and effort calibration 3. Compound Task Result Aggregation - New orchestrator/aggregation.md prompt template - XML-structured with synthesis instructions - Produces unified responses instead of concatenated parts All 1029 tests pass, clippy clean, fmt clean.

…dispatch - Add aggregate_results_with_llm() to orchestrator_agent.rs Uses orchestrator/aggregation.md prompt template to synthesize multiple sub-task results into a coherent unified response via LLM Falls back to simple string concatenation on any error - Register orchestrator/aggregation.md in prompt_template.rs Template with {{task_count}}, {{user_message}}, {{results}} variables - Wire LLM aggregation into reply.rs compound dispatch flow When provider available, uses LLM synthesis; otherwise falls back to aggregate_results() simple concatenation Quality: 1029 tests pass, clippy clean, fmt clean

Add ProjectAgentConfig system for per-project agent customization: agent_config.rs (418 lines): - Load .goose/agents.yaml with serde deserialization - Enable/disable agents, override descriptions, add extensions - Custom mode creation (with slug, name, description, tool_groups) - Routing feedback persistence (.goose/routing_feedback.json) - Mode override (description, when_to_use per mode per agent) - 6 tests covering parsing, loading, applying, feedback, custom modes intent_router.rs: - apply_project_config() integration method - Project-level default_agent/default_mode fallback in route() - 3 new tests for config integration (disable, default, custom mode) YAML schema supports: default_agent: 'Developer Agent' default_mode: 'write' agents: 'Developer Agent': enabled: true description: 'Custom description' extra_extensions: ['flutter-tools'] modes: write: when_to_use: 'When creating Flutter widgets' custom_modes: - slug: 'data-pipeline' name: 'Data Pipeline' description: 'Build and debug data pipelines' agents: ['Developer Agent'] tool_groups: ['read', 'edit', 'command']

Layer 0 in the 4-tier hybrid routing architecture: [0] Feedback corrections (learned, 0.95 confidence) [1] Keyword match (<10ms) [2] TF-IDF semantic (~1ms) [3] Default fallback - Add routing_feedback field to IntentRouter - check_feedback() uses keyword overlap (≥50%) to find matching corrections - record_routing_feedback() stores user corrections for similar future queries - Feedback takes highest priority (Layer 0) — if a user previously corrected routing for a similar message, we trust that correction - Integrates with .goose/agents.yaml routing_feedback persistence - 2 new tests: feedback override + unrelated message non-match - All 1040 tests pass, clippy clean

- GET /agent-config — load current project agent config - PUT /agent-config — save updated config to .goose/agents.yaml - GET /agent-config/routing-feedback — list routing corrections - POST /agent-config/routing-feedback — record new routing correction Server backbone for the Agent Config UI feature. Closes goose4-j6mp.

- knowledge_extraction.rs: Extract structured KG entities and relations from conversation text via LLM-powered prompt - Entity types: Concept, Component, Decision, Finding, Risk, RepoPath - Relation types: depends_on, implements, affects, derived_from, etc. - Features: merge/dedup, confidence filtering, JSON fence parsing, cap at 20 - knowledge_extraction.md: Prompt template for entity/relation extraction - Registered in TEMPLATE_REGISTRY - 7 tests covering parsing, merging, filtering, caps, JSON extraction Part of SOTA E6: GraphRAG-style knowledge management

bioinfornatics · 2026-03-06T11:30:23Z

yes sorry I pushed my test on the wrong repo . It is for testing purpose. maybe I will split and push features later.

bioinfornatics force-pushed the feature/cli-via-goosed branch 6 times, most recently from e8dc59e to 4b68302 Compare February 17, 2026 12:59

bioinfornatics force-pushed the feature/cli-via-goosed branch from 059ea64 to 6604f1c Compare February 17, 2026 16:41

bioinfornatics added 21 commits February 19, 2026 15:08

bioinfornatics added 14 commits March 3, 2026 19:55

desktop: overlay project hover actions

c9c33c9

desktop: show user name in sidebar footer

62a4fe0

desktop: live-update activity panel during streaming

6097de6

desktop: populate activity panel from streaming tool calls

9addc56

desktop: decouple activity panel detail messages from inline preview

eb3edbd

desktop: avoid marking completed tools as active in activity panel

ecb4f1d

desktop: add backend activity events to reasoning panel

7c14e88

server: stream routing/orchestrator activity events

b7efde6

server: stream per-subtask dispatch activity events

e2b0ebc

desktop: persist goose/activity events and use for workblock summary

c7b7be5

desktop: add activity observability and tests

2ad8c04

desktop: robustly parse goose/activity notifications

6a0f743

desktop: fix activityEvents wiring and test typing

0cf69ec

server: emit routing activity notifications for non-compound runs

a612fec

bioinfornatics added 13 commits March 5, 2026 01:15

jamadeo closed this Mar 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: CLI-via-Goosed unified agent architecture with multi-agent routing#7238

feat: CLI-via-Goosed unified agent architecture with multi-agent routing#7238
bioinfornatics wants to merge 525 commits intoblock:mainfrom
bioinfornatics:feature/cli-via-goosed

bioinfornatics commented Feb 16, 2026

Uh oh!

michaelneale commented Feb 16, 2026

Uh oh!

DOsinga commented Feb 17, 2026

Uh oh!

lifeizhou-ap commented Mar 4, 2026

Uh oh!

bioinfornatics commented Mar 6, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

bioinfornatics commented Feb 16, 2026

CLI-via-Goosed: Unified Agent Architecture

Summary

Key Changes

🏗️ Architecture: CLI-via-Goosed

🤖 Multi-Agent System

📡 Protocol Compatibility

📊 Analytics & Observability

🖥️ UI Improvements

🔒 Security & Reliability

Quality Gates

New Test Coverage

Routing Evaluation Baseline

Files Changed

Follow-up Work

Uh oh!

michaelneale commented Feb 16, 2026

Uh oh!

DOsinga commented Feb 17, 2026

Uh oh!

lifeizhou-ap commented Mar 4, 2026

Uh oh!

bioinfornatics commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

bioinfornatics commented Mar 6, 2026 •

edited

Loading