Commit c459273
feat(orchestration): typed ConcurrencyLimit error, edge-case tests, cancel token (#1515 #1516 #1457)
#1515: Add SubAgentError::ConcurrencyLimit { active, max } to replace fragile string matching in record_spawn_failure(). Both spawn() and resume() in SubAgentManager emit the typed variant. DagScheduler's record_spawn_failure now accepts &SubAgentError and uses matches!.

#1516: Add three DagScheduler edge-case tests for concurrency deferral:
- test_concurrency_deferral_does_not_affect_running_task
- test_max_concurrent_zero_no_infinite_loop
- test_all_tasks_deferred_graph_stays_running

#1457: Add plan_cancel_token: Option<CancellationToken> to Agent. run_scheduler_loop gains a tokio::select! on the token at wait_event() and around RunInline execution. handle_plan_cancel fires the token for in-flight plans. The token is always cleared on both Ok and Err paths (CRIT-07). Known limitation: the delivery path requires an agent-loop restructure (SEC-M34-002).
1 parent fd07cd6 commit c459273

File tree

5 files changed, +261 -34 lines changed


CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -37,6 +37,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ### Added
 
+- **#1515**: Add `SubAgentError::ConcurrencyLimit { active: usize, max: usize }` variant to replace the fragile `Spawn(String)` concurrency message. `record_spawn_failure()` now accepts `&SubAgentError` and uses a typed `matches!` check instead of string matching. Both `spawn()` and `resume()` in `SubAgentManager` emit the new variant. Callers pass `&e` instead of `&e.to_string()`.
+- **#1516**: Add three edge-case tests for `DagScheduler` concurrency-deferral: running task is unaffected when a concurrent task defers (`test_concurrency_deferral_does_not_affect_running_task`), `max_parallel=0` stalls the scheduler without triggering deadlock detection (`test_max_concurrent_zero_no_infinite_loop`), and all tasks deferring with `ConcurrencyLimit` keep the graph in `Running` and retry on the next tick (`test_all_tasks_deferred_graph_stays_running`).
+- **#1457**: Add `plan_cancel_token: Option<CancellationToken>` to `Agent`. A fresh token is created in `handle_plan_confirm()` and passed into `run_scheduler_loop()`. The tick loop adds a `tokio::select!` branch on `cancel_token.cancelled()` at `wait_event()` (calls `cancel_all()` and breaks) and wraps `RunInline` execution so it can be interrupted. `handle_plan_cancel()` fires the token if a plan is in flight. `plan_cancel_token` is always cleared in both `Ok` and `Err` paths to prevent stale-token bugs. **Known limitation**: the delivery path for `/plan cancel` during active execution requires restructuring the agent message loop (SEC-M34-002; currently only reachable from concurrent-reader channels such as Telegram).
+
 - **#1551**: Remove the `index` feature flag — `zeph-index` and `tree-sitter` are now always-on base dependencies. All `#[cfg(feature = "index")]` guards are removed from `zeph-core`, `zeph` binary, and `lsp_hooks/hover.rs`. The `index` entry is removed from root `Cargo.toml` `[features]` and `full` feature list, and from `zeph-core/Cargo.toml`. Tree-sitter and code index functionality is always compiled; no feature gating required.
 - **#1554**: Decouple repo map injection from Qdrant retriever. `IndexState` now populates `repo_map_tokens`/`repo_map_ttl` independently via `AgentBuilder::with_repo_map()`. The repo map is injected into the system prompt whenever `repo_map_tokens > 0`, regardless of whether a Qdrant-backed `CodeRetriever` is available. Semantic code RAG via Qdrant is unaffected and still requires the retriever. The `apply_code_index()` bootstrap function now configures repo map for all providers (including Claude/OpenAI with native `tool_use`), then skips only the Qdrant retriever setup for tool-use providers. `apply_config()` hot-reload now correctly refreshes both `repo_map_tokens` and `repo_map_ttl`. Fixes silent repo map omission for the most common provider configurations.
 - **#1552**: Replace heuristic AST walking in `generate_repo_map()` with tree-sitter ts-query extraction. New public types in `zeph-index`: `SymbolInfo`, `SymbolKind`, `Visibility`, and `extract_symbols()`. `Lang::symbol_query()` and `Lang::method_query()` provide lazily-compiled `LazyLock<Query>` per language (Rust, Python, JS, TS, Go). Visibility is parsed from `visibility_modifier` node text: `pub`→Public, `pub(crate)`→Crate, `pub(super|in …)`→Restricted, absent→Private. Query compilation failures log a warning and return `None` (no panics); heuristic extraction serves as fallback. Repo map output now includes visibility and 1-based line numbers per symbol (e.g. `pub fn:hello(1)`, `impl:Foo(5){pub fn:bar}`). Token budget behaviour is preserved with the new format. `zeph-index::languages` is now a public module.
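The #1515 entry replaces string matching with a typed variant. A minimal sketch of that pattern, with a simplified stand-in enum (only the two variants relevant here; the real `SubAgentError` and scheduler internals differ):

```rust
use std::fmt;

// Simplified stand-in for the SubAgentError described in the changelog entry.
#[derive(Debug)]
enum SubAgentError {
    Spawn(String),
    ConcurrencyLimit { active: usize, max: usize },
}

impl fmt::Display for SubAgentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Spawn(msg) => write!(f, "spawn failed: {msg}"),
            Self::ConcurrencyLimit { active, max } => {
                write!(f, "concurrency limit reached ({active}/{max} agents active)")
            }
        }
    }
}

/// Typed check: defer the task on ConcurrencyLimit, fail it otherwise.
/// Before the change this matched on the error *string*, which breaks
/// silently whenever the message wording changes.
fn is_deferrable(err: &SubAgentError) -> bool {
    matches!(err, SubAgentError::ConcurrencyLimit { .. })
}

fn main() {
    let limit = SubAgentError::ConcurrencyLimit { active: 4, max: 4 };
    let spawn = SubAgentError::Spawn("binary not found".into());
    assert!(is_deferrable(&limit));
    assert!(!is_deferrable(&spawn));
    println!("{limit}");
}
```

This is also why callers now pass `&e` instead of `&e.to_string()`: the check needs the enum, not its rendered message.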

crates/zeph-core/src/agent/mod.rs

Lines changed: 83 additions & 15 deletions
@@ -247,6 +247,15 @@ pub struct Agent<C: Channel> {
     pending_image_parts: Vec<zeph_llm::provider::MessagePart>,
     /// Graph waiting for `/plan confirm` before execution starts.
     pub(super) pending_graph: Option<crate::orchestration::TaskGraph>,
+    /// Cancellation token for the currently executing plan. `None` when no plan is running.
+    /// Created fresh in `handle_plan_confirm()`, cancelled in `handle_plan_cancel()`.
+    ///
+    /// # Known limitation
+    ///
+    /// Token plumbing is ready; the delivery path requires the agent message loop to be
+    /// restructured so `/plan cancel` can be received while `run_scheduler_loop` holds
+    /// `&mut self`. See follow-up issue (SEC-M34-002).
+    plan_cancel_token: Option<CancellationToken>,
 
     /// LSP context injection hooks. Fires after native tool execution, injects
     /// diagnostics/hover notes as `Role::System` messages before the next LLM call.
@@ -452,6 +461,7 @@ impl<C: Channel> Agent<C> {
             },
             pending_image_parts: Vec::new(),
             pending_graph: None,
+            plan_cancel_token: None,
 
             #[cfg(feature = "lsp-context")]
             lsp_hooks: None,
@@ -664,7 +674,18 @@ impl<C: Channel> Agent<C> {
         ))
         .await?;
 
-        let final_status = self.run_scheduler_loop(&mut scheduler, task_count).await?;
+        let plan_token = CancellationToken::new();
+        self.plan_cancel_token = Some(plan_token.clone());
+
+        // Use match instead of ? so plan_cancel_token is always cleared (CRIT-07).
+        let scheduler_result = self
+            .run_scheduler_loop(&mut scheduler, task_count, plan_token)
+            .await;
+        self.plan_cancel_token = None;
+        let final_status = match scheduler_result {
+            Ok(s) => s,
+            Err(e) => return Err(e),
+        };
 
         let completed_graph = scheduler.into_graph();
 
@@ -693,14 +714,17 @@ impl<C: Channel> Agent<C> {
     /// # Known limitations
     ///
     /// The agent is single-threaded; this loop blocks all message processing while
-    /// running. `/plan cancel` cannot interrupt an active execution. A future phase
-    /// will add a `CancellationToken` field to `Agent` and wire it into this loop.
-    /// (SEC-M34-001, tracked in GitHub issue.)
+    /// running. The `cancel_token` parameter wires cancellation into the tick loop at
+    /// `wait_event()` and `RunInline` boundaries. However, `/plan cancel` cannot deliver
+    /// the token signal while `run_scheduler_loop` holds `&mut self` — the agent command
+    /// dispatch is paused. The token plumbing is in place for a follow-up that restructures
+    /// the delivery path (SEC-M34-002).
     #[allow(clippy::too_many_lines)]
     async fn run_scheduler_loop(
         &mut self,
         scheduler: &mut crate::orchestration::DagScheduler,
         task_count: usize,
+        cancel_token: CancellationToken,
     ) -> Result<crate::orchestration::GraphStatus, error::AgentError> {
         use crate::orchestration::SchedulerAction;
 
@@ -762,7 +786,7 @@ impl<C: Channel> Agent<C> {
                     }
                     Err(e) => {
                         tracing::error!(error = %e, %task_id, "spawn_for_task failed");
-                        let extra = scheduler.record_spawn_failure(task_id, &e.to_string());
+                        let extra = scheduler.record_spawn_failure(task_id, &e);
                         for a in extra {
                             match a {
                                 SchedulerAction::Cancel { agent_handle_id } => {
@@ -822,14 +846,23 @@ impl<C: Channel> Agent<C> {
 
                     let event_tx = scheduler.event_sender();
                     let max_iter = self.tool_orchestrator.max_iterations;
-                    let outcome = match self.run_inline_tool_loop(&prompt, max_iter).await {
-                        Ok(output) => crate::orchestration::TaskOutcome::Completed {
-                            output,
-                            artifacts: vec![],
-                        },
-                        Err(e) => crate::orchestration::TaskOutcome::Failed {
-                            error: e.to_string(),
-                        },
+                    let outcome = tokio::select! {
+                        result = self.run_inline_tool_loop(&prompt, max_iter) => {
+                            match result {
+                                Ok(output) => crate::orchestration::TaskOutcome::Completed {
+                                    output,
+                                    artifacts: vec![],
+                                },
+                                Err(e) => crate::orchestration::TaskOutcome::Failed {
+                                    error: e.to_string(),
+                                },
+                            }
+                        }
+                        () = cancel_token.cancelled() => {
+                            crate::orchestration::TaskOutcome::Failed {
+                                error: "canceled".to_string(),
+                            }
+                        }
                     };
                     let event = crate::orchestration::TaskEvent {
                         task_id,
@@ -860,7 +893,33 @@ impl<C: Channel> Agent<C> {
                 m.orchestration_graph = Some(snapshot);
             });
 
-            scheduler.wait_event().await;
+            tokio::select! {
+                () = cancel_token.cancelled() => {
+                    let cancel_actions = scheduler.cancel_all();
+                    for action in cancel_actions {
+                        match action {
+                            SchedulerAction::Cancel { agent_handle_id } => {
+                                if let Some(mgr) = self.subagent_manager.as_mut() {
+                                    let _ = mgr.cancel(&agent_handle_id).inspect_err(|e| {
+                                        tracing::trace!(
+                                            error = %e,
+                                            "cancel during plan cancellation: agent already gone"
+                                        );
+                                    });
+                                }
+                            }
+                            SchedulerAction::Done { status } => {
+                                break 'tick status;
+                            }
+                            SchedulerAction::Spawn { .. } | SchedulerAction::RunInline { .. } => {}
+                        }
+                    }
+                    // Defensive fallback: cancel_all always emits Done, but guard against
+                    // future changes.
+                    break 'tick crate::orchestration::GraphStatus::Canceled;
+                }
+                () = scheduler.wait_event() => {}
+            }
         };
 
         // Final drain: if the loop exited via Done on the first tick, secret
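The hunk above races `cancel_token.cancelled()` against `scheduler.wait_event()` in a `tokio::select!`. That exact form needs the tokio and tokio-util crates; as a std-only analogue of the same idea, a shared flag checked at the top of each tick (the same boundary the `select!` arm guards) lets a concurrent reader interrupt the loop. All names here are illustrative, not zeph's:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum GraphStatus {
    Completed,
    Canceled,
}

// Std-only analogue of the cancellation check: the flag is polled at the
// tick boundary, before blocking on the next event, mirroring the
// `() = cancel_token.cancelled()` select! arm at wait_event().
fn run_tick_loop(cancel: &AtomicBool, total_ticks: usize) -> GraphStatus {
    for _ in 0..total_ticks {
        if cancel.load(Ordering::SeqCst) {
            // Real code would call scheduler.cancel_all() and drain its
            // Cancel actions here before breaking out.
            return GraphStatus::Canceled;
        }
        thread::sleep(Duration::from_millis(10)); // stand-in for wait_event().await
    }
    GraphStatus::Completed
}

fn main() {
    let cancel = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancel);
    // A concurrent reader (the role Telegram plays per SEC-M34-002)
    // fires cancellation mid-run from another thread.
    let canceller = thread::spawn(move || {
        thread::sleep(Duration::from_millis(25));
        flag.store(true, Ordering::SeqCst);
    });
    let status = run_tick_loop(&cancel, 100);
    canceller.join().unwrap();
    assert_eq!(status, GraphStatus::Canceled);
}
```

This also makes the known limitation concrete: with no second thread (no concurrent reader), nothing can ever set the flag while the loop runs, which is why CLI-style synchronous channels cannot deliver `/plan cancel` mid-execution.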
@@ -1178,7 +1237,16 @@ impl<C: Channel> Agent<C> {
         &mut self,
         _graph_id: Option<&str>,
     ) -> Result<(), error::AgentError> {
-        if self.pending_graph.take().is_some() {
+        if let Some(token) = self.plan_cancel_token.take() {
+            // In-flight plan: signal cancellation. The scheduler loop will pick this up
+            // in the next tokio::select! iteration at wait_event().
+            // NOTE: Due to &mut self being held by run_scheduler_loop, this branch is only
+            // reachable if the channel has a concurrent reader (e.g. Telegram, TUI events).
+            // CLI and synchronous channels cannot deliver this while the loop is active
+            // (SEC-M34-002).
+            token.cancel();
+            self.channel.send("Canceling plan execution...").await?;
+        } else if self.pending_graph.take().is_some() {
            let now = std::time::Instant::now();
            self.update_metrics(|m| {
                if let Some(ref mut s) = m.orchestration_graph {
