feat(classifiers): implement ClassifierMetrics with p50/p95 latency ring buffer #2291
Merged
…ing buffer (#2249, #2250)

Add ClassifierMetrics struct in zeph-llm with a per-ClassifierTask ring buffer (default capacity 100); p50/p95 are computed on demand via the nearest-rank algorithm. Metrics are emitted as tracing::debug! with structured fields after each classifier invocation and surfaced in the TUI resources panel.

Wire Arc<ClassifierMetrics> through AgentBuilder into ContentSanitizer (injection, PII tasks) and LlmClassifier (feedback task). MetricsSnapshot gains a classifier: ClassifierMetricsSnapshot field, pushed eagerly via push_classifier_metrics() at three sites in the agent loop. The TUI resources panel renders a compact classifier section (calls, p50ms, p95ms per task) when at least one task has been invoked.

Document FeedbackVerdict/JudgeVerdict coupling: add NOTE comments in both structs explaining the circular-dep mirror relationship (#2250). Add a cfg(test) serde round-trip test that deserializes JudgeVerdict JSON as FeedbackVerdict, breaking CI if the fields diverge.

Closes #2249
Closes #2250
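The ring buffer and nearest-rank computation described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `LatencyRing`, `record`, `percentile`, and `demo` are hypothetical names, and the integer rounding used for the rank is an assumption about how nearest-rank (rank = ceil(p/100 · n), 1-based) might be implemented.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Fixed-capacity ring buffer of latency samples.
/// Hypothetical type for illustration; not taken from the PR.
struct LatencyRing {
    samples: VecDeque<Duration>,
    capacity: usize,
}

impl LatencyRing {
    fn new(capacity: usize) -> Self {
        Self {
            samples: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Push a sample, evicting the oldest one once the buffer is full.
    fn record(&mut self, sample: Duration) {
        if self.capacity == 0 {
            return;
        }
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(sample);
    }

    /// Nearest-rank percentile, computed on demand:
    /// sort a copy, take the element at 1-based rank ceil(p * n / 100).
    /// Returns None for an empty buffer.
    fn percentile(&self, p: usize) -> Option<Duration> {
        if self.samples.is_empty() {
            return None;
        }
        let mut sorted: Vec<Duration> = self.samples.iter().copied().collect();
        sorted.sort_unstable();
        let n = sorted.len();
        // Integer ceiling of p * n / 100, clamped into [1, n].
        let rank = ((p * n + 99) / 100).clamp(1, n);
        Some(sorted[rank - 1])
    }
}

/// Usage sketch: 150 samples go into a capacity-100 buffer,
/// so only the newest 100 (51 ms..=150 ms) remain.
fn demo() -> (Option<Duration>, Option<Duration>) {
    let mut ring = LatencyRing::new(100);
    for ms in 1..=150u64 {
        ring.record(Duration::from_millis(ms));
    }
    (ring.percentile(50), ring.percentile(95))
}
```

Sorting a copy on each query keeps `record` cheap, which seems a reasonable trade-off when the buffer holds at most 100 samples and percentiles are read far less often than samples are written.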
Summary
- Add `ClassifierMetrics` struct in `zeph-llm` with per-`ClassifierTask` `VecDeque<Duration>` ring buffer (capacity 100), p50/p95 computed on demand via nearest-rank formula
- Wire `Arc<ClassifierMetrics>` through `AgentBuilder` into `ContentSanitizer` (injection, PII) and `LlmClassifier` (feedback); metrics pushed eagerly to `MetricsSnapshot` at 3 sites
- TUI resources panel renders a compact classifier section (`calls / p50ms / p95ms` per task) when at least one task has been invoked
- Document `FeedbackVerdict`/`JudgeVerdict` coupling with `// NOTE:` comments; add `#[cfg(test)]` serde round-trip test that breaks CI if fields diverge

Closes #2249
Closes #2250
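As a rough illustration of the wiring summarized above, the sketch below shares one metrics object between two consumers through `Arc`, takes a snapshot, and renders one compact line per invoked task. Every name here (`Task`, `SharedMetrics`, `TaskSnapshot`, `render`) is hypothetical and stands in for the real `ClassifierTask`, `ClassifierMetrics`, and `ClassifierMetricsSnapshot`, whose actual fields and locking strategy are not shown in this PR description. Treating `calls` as a running total is also an assumption.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Stand-in for the PR's ClassifierTask (hypothetical variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Task {
    Injection,
    Pii,
    Feedback,
}

const CAPACITY: usize = 100;

/// Shared metrics store: per task, a total call count plus a latency ring.
#[derive(Default)]
struct SharedMetrics {
    inner: Mutex<BTreeMap<Task, (u64, VecDeque<Duration>)>>,
}

/// Plain-data view handed to the UI layer.
#[derive(Debug, Clone, PartialEq)]
struct TaskSnapshot {
    task: Task,
    calls: u64,
    p50_ms: u128,
    p95_ms: u128,
}

/// Nearest-rank percentile in milliseconds; 0 for an empty ring.
fn nearest_rank_ms(ring: &VecDeque<Duration>, p: usize) -> u128 {
    let mut sorted: Vec<Duration> = ring.iter().copied().collect();
    sorted.sort_unstable();
    let n = sorted.len();
    if n == 0 {
        return 0;
    }
    let rank = ((p * n + 99) / 100).clamp(1, n);
    sorted[rank - 1].as_millis()
}

impl SharedMetrics {
    /// Called by each classifier after an invocation.
    fn record(&self, task: Task, latency: Duration) {
        let mut map = self.inner.lock().unwrap();
        let (calls, ring) = map.entry(task).or_default();
        *calls += 1;
        if ring.len() == CAPACITY {
            ring.pop_front();
        }
        ring.push_back(latency);
    }

    /// Only tasks that have been invoked at least once appear here.
    fn snapshot(&self) -> Vec<TaskSnapshot> {
        let map = self.inner.lock().unwrap();
        map.iter()
            .map(|(task, (calls, ring))| TaskSnapshot {
                task: *task,
                calls: *calls,
                p50_ms: nearest_rank_ms(ring, 50),
                p95_ms: nearest_rank_ms(ring, 95),
            })
            .collect()
    }
}

/// One line per invoked task; an empty result means the section is hidden.
fn render(snapshot: &[TaskSnapshot]) -> Vec<String> {
    snapshot
        .iter()
        .map(|s| {
            format!(
                "{:?}: {} calls / p50 {}ms / p95 {}ms",
                s.task, s.calls, s.p50_ms, s.p95_ms
            )
        })
        .collect()
}
```

In this sketch, each consumer would hold its own `Arc::clone` of a single `SharedMetrics` and call `record` after a classifier run, while the agent loop would call `snapshot` and hand the result to the UI.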
Test plan
- Unit tests in `crates/zeph-llm/src/classifier/metrics.rs`: ring buffer eviction, p50/p95 correctness, empty buffer, single sample, identical values
- Serde round-trip test in `feedback_detector.rs` for `JudgeVerdict` → `FeedbackVerdict` field sync
- `cargo nextest run --workspace --features full --lib --bins`: 6837 passed, 22 skipped
- `cargo +nightly fmt --check`: PASS
- `cargo clippy --workspace --features full -- -D warnings`: PASS