Skip to content

fix(context-compression): make extract_task_goal fire-and-forget (#1909)#1922

Merged
bug-ops merged 1 commit intomainfrom
1909-extract-task-goal
Mar 16, 2026
Merged

fix(context-compression): make extract_task_goal fire-and-forget (#1909)#1922
bug-ops merged 1 commit intomainfrom
1909-extract-task-goal

Conversation

@bug-ops
Copy link
Copy Markdown
Owner

@bug-ops bug-ops commented Mar 16, 2026

Summary

Root cause

extract_task_goal() called summary_or_primary_provider().chat() with a hardcoded 5s timeout inside the Soft compaction hot path. Cloud API latency regularly exceeds 5s under concurrent inference, so it always timed out — leaving current_task_goal = None and falling back to prune_tool_outputs_oldest_first().

Test plan

  • cargo +nightly fmt --check — clean
  • cargo clippy --features full --workspace -- -D warnings — clean
  • cargo nextest run --config-file .github/nextest.toml --workspace --features full --lib --bins — 6047 passed, 12 skipped
  • Live session test with pruning_strategy = "task_aware" and a cloud provider to verify no timeout log entries and goal is populated after the second compaction

@github-actions github-actions bot added documentation Improvements or additions to documentation rust Rust code changes core zeph-core crate bug Something isn't working size/M Medium PR (51-200 lines) labels Mar 16, 2026
@bug-ops bug-ops force-pushed the 1909-extract-task-goal branch from aed7116 to 8ba1749 Compare March 16, 2026 17:21
@bug-ops bug-ops enabled auto-merge (squash) March 16, 2026 17:21
@github-actions github-actions bot added the llm zeph-llm crate (Ollama, Claude) label Mar 16, 2026
Previously, `maybe_refresh_task_goal()` called `extract_task_goal()` with a
blocking 5-second `.await` during every Soft tier compaction. On cloud providers
(OpenAI, Claude) where inference latency exceeds 5s under concurrent load, this
always timed out — adding 5s of user-visible latency and leaving
`current_task_goal = None`, making `task_aware`/`mig`/`task_aware_mig` strategies
functionally identical to `reactive`.

Fix: two-phase non-blocking design (mirrors `maybe_sidequest_eviction`):

- Phase 1 (apply): if the background task spawned last compaction has finished,
  apply its result to `current_task_goal` via `JoinHandle::is_finished()` +
  `now_or_never()`.
- Phase 2 (schedule): if the user message hash changed and no task is in-flight,
  spawn a new background `tokio::spawn`. Current compaction uses the cached goal
  from the previous turn — zero blocking.

Also raises the background task timeout from 5s to 30s, giving cloud providers
a realistic window to respond without ever blocking the agent loop.
@bug-ops bug-ops force-pushed the 1909-extract-task-goal branch from b9a8f57 to f2b5c94 Compare March 16, 2026 17:29
@github-actions github-actions bot removed the llm zeph-llm crate (Ollama, Claude) label Mar 16, 2026
@bug-ops bug-ops merged commit e8bfda3 into main Mar 16, 2026
20 checks passed
@bug-ops bug-ops deleted the 1909-extract-task-goal branch March 16, 2026 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working core zeph-core crate documentation Improvements or additions to documentation rust Rust code changes size/M Medium PR (51-200 lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(context-compression): extract_task_goal blocks Soft tier for 5s on every compaction — task_aware/mig strategies effectively broken for cloud LLMs

1 participant