Skip to content

feat(tools): tool invocation phase taxonomy and reasoning model hallucination detection#2354

Merged
bug-ops merged 1 commit intomainfrom
tool-hallucination-detection
Mar 28, 2026
Merged

feat(tools): tool invocation phase taxonomy and reasoning model hallucination detection#2354
bug-ops merged 1 commit intomainfrom
tool-hallucination-detection

Conversation

@bug-ops
Copy link
Copy Markdown
Owner

@bug-ops bug-ops commented Mar 28, 2026

Summary

  • Adds ToolInvocationPhase enum (Setup/ParamHandling/Execution/ResultInterpretation) to zeph-tools, mapping all ToolErrorCategory variants to their diagnostic phase per arXiv:2601.16280
  • Adds error_phase: Option<String> to AuditEntry for phase-level failure clustering in [tools.audit] output
  • Adds tool_provider field to OrchestrationConfig for routing tool-heavy tasks to reliability-optimized models (prefer qwen2.5:14b-equivalent)
  • Adds is_reasoning_model() helper detecting o1/o3/o4-mini, QwQ, DeepSeek-R1, and Claude extended-thinking model families per arXiv:2510.22977
  • Adds AnomalyDetector::record_reasoning_quality_failure() — records quality failures from reasoning models in the anomaly window and emits a reasoning_amplification WARN
  • Adds reasoning_model_warning: bool flag to AnomalyConfig (default: true)

Closes #2234, #2284

Test plan

  • cargo nextest run --workspace --features full --lib --bins — 7049/7049 pass
  • cargo clippy --workspace --features full -- -D warnings — clean
  • cargo +nightly fmt --check — clean

…cination detection

Add ToolInvocationPhase enum (Setup/ParamHandling/Execution/ResultInterpretation)
mapping all ToolErrorCategory variants to their diagnostic phase per arXiv:2601.16280.
Adds error_phase field to AuditEntry for phase-level failure clustering in audit output.
Adds tool_provider to OrchestrationConfig for routing tool-heavy tasks to reliable models.

Add is_reasoning_model() helper detecting o1/o3/o4, QwQ, DeepSeek-R1, and Claude
extended-thinking models per arXiv:2510.22977. AnomalyDetector gains
record_reasoning_quality_failure() which counts quality failures from reasoning models
in the anomaly window and emits a reasoning_amplification WARN. Adds
reasoning_model_warning flag to AnomalyConfig (default: true).

Closes #2234, #2284
@github-actions github-actions bot added documentation Improvements or additions to documentation rust Rust code changes core zeph-core crate enhancement New feature or request size/L Large PR (201-500 lines) labels Mar 28, 2026
@bug-ops bug-ops enabled auto-merge (squash) March 28, 2026 17:01
@bug-ops bug-ops merged commit 73444ee into main Mar 28, 2026
25 checks passed
@bug-ops bug-ops deleted the tool-hallucination-detection branch March 28, 2026 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core zeph-core crate documentation Improvements or additions to documentation enhancement New feature or request rust Rust code changes size/L Large PR (201-500 lines)

Projects

None yet

1 participant