This was linked to issues on Mar 28, 2026.
…entinel, TurnCausalAnalyzer (#2193, #2208, #2335)
Add three complementary layers to Zeph's indirect prompt injection (IPI) defense:
- InjectionEnforcementMode (Warn/Block) for the DeBERTa classifier: the default Warn mode returns Suspicious instead of Blocked, so the high false-positive rate (FPR) of off-the-shelf models does not disrupt legitimate tool operations. All fallback paths (regex, error, timeout) respect the enforcement mode via the regex_verdict() helper.
- CandleThreeClassClassifier: a two-stage pipeline that runs binary detection first, then refines positive hits with a three-class model (misaligned-instruction / aligned-instruction / no-instruction). Aligned instructions are downgraded to Clean, substantially reducing FPR. id2label is read dynamically from the model's config.json; load failures are not permanently cached, so loading can be retried.
- TurnCausalAnalyzer: per-batch LLM probes at tool-return boundaries compute behavioral deviation via normalized Levenshtein + Jaccard. Probe responses are bounded by probe_max_chars. The analyzer never blocks; it emits the ipi.causal_deviation metric and a SecurityEvent on threshold crossing. Config: [security.causal_ipi] enabled=false, threshold=0.7, provider="fast".
Bootstrap wiring in runner.rs, daemon.rs, and acp.rs via apply_enforcement_mode(), apply_three_class_classifier(), and apply_causal_analyzer(). Pre-existing zeph-sanitizer/guardrail feature forwarding bug fixed as a side effect.
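The verdict logic from the first two bullets can be sketched roughly as follows. This is a minimal illustration, not Zeph's code: InjectionEnforcementMode and regex_verdict are named in the commit message, but their signatures here are guesses, and Verdict, InstructionClass, positive_verdict, and classify are hypothetical stand-ins. Only an aligned-instruction result is downgraded to Clean, as stated above; keeping the binary verdict for no-instruction hits is an assumption.

```rust
/// Enforcement mode for positive detections (name from the commit message; shape assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum InjectionEnforcementMode {
    /// Default: positive detections are reported as Suspicious, not blocked.
    #[default]
    Warn,
    /// Opt-in: positive detections are Blocked.
    Block,
}

/// Hypothetical verdict type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Clean,
    Suspicious,
    Blocked,
}

/// Hypothetical labels produced by the three-class refinement model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstructionClass {
    Misaligned,
    Aligned,
    NoInstruction,
}

/// Map a positive detection to a verdict according to the enforcement mode.
fn positive_verdict(mode: InjectionEnforcementMode) -> Verdict {
    match mode {
        InjectionEnforcementMode::Warn => Verdict::Suspicious,
        InjectionEnforcementMode::Block => Verdict::Blocked,
    }
}

/// Fallback path (regex, classifier error, timeout) that still honors the mode.
/// Hypothetical signature.
fn regex_verdict(regex_hit: bool, mode: InjectionEnforcementMode) -> Verdict {
    if regex_hit {
        positive_verdict(mode)
    } else {
        Verdict::Clean
    }
}

/// Two-stage pipeline: the binary score gates whether the refinement model runs at all.
fn classify(
    binary_score: f32,
    threshold: f32,
    refine: impl Fn() -> Option<InstructionClass>,
    mode: InjectionEnforcementMode,
) -> Verdict {
    if binary_score < threshold {
        return Verdict::Clean; // refinement is skipped for negatives
    }
    match refine() {
        // Aligned instructions (e.g. a tool's own usage notes) are downgraded to Clean.
        Some(InstructionClass::Aligned) => Verdict::Clean,
        // Misaligned, NoInstruction, or refinement unavailable: keep the binary hit.
        _ => positive_verdict(mode),
    }
}
```

Passing the refinement step as a closure keeps the expensive three-class model off the hot path: it only runs when the binary stage fires.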
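The TurnCausalAnalyzer deviation score could look roughly like the sketch below, assuming the probe responses from before and after a tool return are compared as plain strings. The commit message does not say how the two measures are combined, so the equal-weight mean in causal_deviation is a hypothetical choice, as are the lowercase whitespace tokenization for Jaccard and the >= comparison against threshold (0.7 by default per the config above). All function names are illustrative.

```rust
use std::collections::HashSet;

/// Truncate to at most `max_chars` characters (the probe_max_chars bound).
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((idx, _)) => &s[..idx],
        None => s,
    }
}

/// Two-row dynamic-programming Levenshtein distance over Unicode chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1).min(curr[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut curr); // prev now holds row i
    }
    prev[b.len()]
}

/// Levenshtein distance divided by the longer length, in [0, 1].
fn normalized_levenshtein(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 0.0;
    }
    levenshtein(a, b) as f64 / max_len as f64
}

/// Jaccard distance (1 - |A ∩ B| / |A ∪ B|) over lowercase whitespace tokens.
fn jaccard_distance(a: &str, b: &str) -> f64 {
    let ta: HashSet<String> = a.split_whitespace().map(|t| t.to_lowercase()).collect();
    let tb: HashSet<String> = b.split_whitespace().map(|t| t.to_lowercase()).collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 0.0;
    }
    1.0 - ta.intersection(&tb).count() as f64 / union as f64
}

/// Hypothetical combination: equal-weight mean of both distances on truncated probes.
fn causal_deviation(before: &str, after: &str, probe_max_chars: usize) -> f64 {
    let a = truncate_chars(before, probe_max_chars);
    let b = truncate_chars(after, probe_max_chars);
    0.5 * normalized_levenshtein(a, b) + 0.5 * jaccard_distance(a, b)
}

/// Whether a SecurityEvent should be emitted (the analyzer never blocks either way).
fn crosses_threshold(deviation: f64, threshold: f64) -> bool {
    deviation >= threshold
}
```

Pairing a character-level and a token-level measure means a small rewording and a wholesale topic change score differently, and both measures stay cheap and local, with no extra model call beyond the probes themselves.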
a6d9992 to d432809
… unused-variable warning in bundle checks
bug-ops added a commit that referenced this pull request on Mar 30, 2026
…PI duplication
- Populate InitializeResponse.auth_methods with [{type: agent, id: zeph}] using
the typed builder; it previously returned authMethods: [], which blocked ACP Registry
inclusion (#2422)
- Serve GET /agent.json with agent identity manifest (id, name, version, description,
distribution) for ACP Registry discovery; gated on discovery_enabled (#2422)
- Extract apply_three_class_classifier_with_cfg and apply_causal_analyzer_with_cfg
helpers in agent_setup.rs; acp.rs now delegates to them instead of inlining construction,
eliminating the DRY gap from #2369 (#2370)
- discovery.rs has already reflected ProtocolVersion::LATEST since PR #2423 (#2412)
Closes #2422, closes #2370
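A dependency-free sketch of the discovery gating: the manifest is produced only when discovery_enabled is true. AgentIdentity, json_escape, and agent_manifest are hypothetical names; the distribution field listed above is omitted because its shape is not described here, and returning None to mean "route not served" is an assumption about how the gate behaves. A real implementation would more likely serialize with a JSON library such as serde_json than escape strings by hand.

```rust
/// Hypothetical identity fields for the /agent.json manifest (distribution omitted).
struct AgentIdentity<'a> {
    id: &'a str,
    name: &'a str,
    version: &'a str,
    description: &'a str,
}

/// Minimal JSON string escaping for quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Body for GET /agent.json, or None when discovery is disabled (route not served).
fn agent_manifest(identity: &AgentIdentity, discovery_enabled: bool) -> Option<String> {
    if !discovery_enabled {
        return None;
    }
    Some(format!(
        "{{\"id\":\"{}\",\"name\":\"{}\",\"version\":\"{}\",\"description\":\"{}\"}}",
        json_escape(identity.id),
        json_escape(identity.name),
        json_escape(identity.version),
        json_escape(identity.description),
    ))
}
```

Gating at the point where the body is built keeps the manifest from leaking identity details on deployments that have not opted into registry discovery.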
bug-ops added a commit that referenced this pull request on Mar 30, 2026
…PI duplication (#2431)
Summary
Implements three complementary IPI defense enhancements from research issues #2193, #2208, #2335:
- InjectionEnforcementMode (Warn/Block): the default Warn returns Suspicious instead of Blocked, preventing the high DeBERTa FPR from blocking legitimate tool operations. All fallback paths (regex, error, timeout) respect the mode via the regex_verdict() helper.
- CandleThreeClassClassifier two-stage pipeline: binary detection first, then three-class refinement (misaligned/aligned/no-instruction) on positive hits. Aligned instructions are downgraded to Clean. Dynamic id2label from the model's config.json; load failures allow retry.
- TurnCausalAnalyzer: per-batch LLM probes at tool-return boundaries, with local Levenshtein+Jaccard deviation scoring. Never blocks; emits the ipi.causal_deviation metric plus a SecurityEvent on threshold crossing. Config: [security.causal_ipi].
Bootstrap wiring added to runner.rs, daemon.rs, and acp.rs. Pre-existing zeph-sanitizer/guardrail feature forwarding bug fixed as a side effect.
Test plan
- cargo +nightly fmt --check passes
- cargo clippy --workspace --features full -- -D warnings: 0 warnings
- cargo nextest run --workspace --features full --lib --bins: 7073/7073 passed
- classify_injection_warn_mode_above_threshold_returns_suspicious
Follow-up issues to file
- acp.rs inlines wiring instead of shared helpers (DRY gap)
- NoInstruction downgrade confidence threshold (IM-1)
- SNIPPET_MAX_BYTES doc says "chars" not "bytes" (IM-2)
- Suspicious verdict not in SecurityEvent enum