test(candle): add integration tests for Candle-backed classifier models #2190
Closed
Labels: P2 (High value, medium complexity), enhancement (New feature or request)
Description
Context
Issue #2185 proposes replacing regex heuristics with Candle-backed lightweight classifiers. The research (CI-178/CI-180) confirmed:
- `candle_transformers::models::deberta_v2` supports both sequence classification and NER
- `EmbedModel` in `candle_provider/embed.rs` already covers 90% of the boilerplate
- parry-guard (vaporif/parry-guard) proves the pipeline works in Rust today
Required Tests
1. ClassifierModel unit tests (once implemented per #2185)
- Load `protectai/deberta-v3-small-prompt-injection-v2` via the Candle `deberta_v2` module
- Score a known injection string (score should exceed the threshold)
- Score a benign string (score should be below the threshold)
- Verify the tokenizer handles SentencePiece tokenization correctly
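The injection/benign checks reduce to "softmax the classifier logits and compare P(injection) to a threshold". A minimal std-only sketch, assuming a two-label head ordered `[benign, injection]` and a 0.5 threshold; both are assumptions, and the real label order should be read from the model's `id2label` in `config.json`. `InjectionScorer`, `StubScorer`, `injection_probability`, `is_injection` and `THRESHOLD` are hypothetical names, not the #2185 API; the stub stands in for the Candle model so the test shape is visible without downloading weights.

```rust
/// Hypothetical interface a ClassifierModel (#2185) might expose.
pub trait InjectionScorer {
    /// Raw logits for `text`, assumed ordered as [benign, injection].
    fn logits(&self, text: &str) -> [f32; 2];
}

/// Assumed decision threshold on P(injection).
pub const THRESHOLD: f32 = 0.5;

/// Numerically stable two-class softmax; returns P(injection).
pub fn injection_probability(logits: [f32; 2]) -> f32 {
    let max = logits[0].max(logits[1]);
    let e_benign = (logits[0] - max).exp();
    let e_injection = (logits[1] - max).exp();
    e_injection / (e_benign + e_injection)
}

/// True when the scorer's P(injection) exceeds THRESHOLD.
pub fn is_injection<S: InjectionScorer>(scorer: &S, text: &str) -> bool {
    injection_probability(scorer.logits(text)) > THRESHOLD
}

/// Keyword stub standing in for the real Candle-backed model in this sketch.
pub struct StubScorer;

impl InjectionScorer for StubScorer {
    fn logits(&self, text: &str) -> [f32; 2] {
        if text.to_lowercase().contains("ignore all previous instructions") {
            [-2.0, 3.0] // strongly "injection"
        } else {
            [2.5, -1.5] // strongly "benign"
        }
    }
}
```

In the real `#[ignore]`d test the stub would be replaced by the loaded model, while the two assertions (injection above threshold, benign below) stay the same.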
2. PII/NER model tests
- Load `iiiorg/piiranha-v1-detect-personal-information` via the Candle `deberta_v2` NER mode
- Token-level label output: verify EMAIL/PHONE/SSN entities are tagged correctly
- Multi-language test: one English, one German PII phrase
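Verifying "EMAIL/PHONE/SSN entities are tagged correctly" is easier against grouped entity spans than against raw per-token labels. A minimal sketch, assuming BIO-style tags such as `B-EMAIL`/`I-EMAIL`/`O`; the actual piiranha label set may differ and should be taken from its `config.json`. `Entity` and `group_entities` are hypothetical helpers. Spans here are token indices; mapping them back to character offsets would use the tokenizer's offsets (e.g. `Encoding::get_offsets` in the `tokenizers` crate).

```rust
/// One contiguous entity over token indices [start, end).
#[derive(Debug, PartialEq)]
pub struct Entity {
    pub label: String,
    pub start: usize,
    pub end: usize,
}

/// Groups per-token BIO tags into entity spans.
/// An `I-X` tag extends the previous span only if that span is also `X`
/// and ends at the current token; otherwise it starts a new span.
pub fn group_entities(tags: &[&str]) -> Vec<Entity> {
    let mut out: Vec<Entity> = Vec::new();
    for (i, tag) in tags.iter().enumerate() {
        if *tag == "O" {
            continue;
        }
        let Some((prefix, kind)) = tag.split_once('-') else {
            continue; // skip malformed tags without a B-/I- prefix
        };
        let continues = prefix == "I"
            && matches!(out.last(), Some(e) if e.end == i && e.label == kind);
        if continues {
            out.last_mut().unwrap().end = i + 1;
        } else {
            out.push(Entity { label: kind.to_string(), start: i, end: i + 1 });
        }
    }
    out
}
```

The English and German phrases can then share one assertion style: run the model, group the tags, and compare the resulting labels against the expected entity kinds.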
3. Performance benchmarks (criterion)
- Candle vs. ort (ONNX Runtime) path latency for injection detection
- Cold start (model load) vs warm inference latency
- CPU-only target (no Metal/CUDA for CI)
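The cold-start vs. warm-inference split can be made explicit by timing the load step separately from repeated inference calls. A dependency-free sketch: `measure` and `LatencyReport` are hypothetical, and the `load`/`infer` closures stand in for the Candle (or ort) model. In the actual criterion bench, `Criterion::bench_function` would own the warm loop and statistics; this only shows what is counted as "cold" and what as "warm".

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Cold-start vs warm-inference timings for one model.
#[derive(Debug)]
pub struct LatencyReport {
    /// Time spent in `load` (e.g. weights + tokenizer).
    pub cold: Duration,
    /// Median of `iters` timed calls after one untimed warm-up call.
    pub warm_median: Duration,
}

pub fn measure<M, L, I>(load: L, infer: I, input: &str, iters: usize) -> LatencyReport
where
    L: FnOnce() -> M,
    I: Fn(&M, &str) -> f32,
{
    assert!(iters > 0, "need at least one warm iteration");

    let start = Instant::now();
    let model = load();
    let cold = start.elapsed();

    // Untimed warm-up so one-time allocations are not counted as warm latency.
    black_box(infer(&model, input));

    let mut samples: Vec<Duration> = (0..iters)
        .map(|_| {
            let t = Instant::now();
            black_box(infer(&model, black_box(input)));
            t.elapsed()
        })
        .collect();
    samples.sort();
    LatencyReport { cold, warm_median: samples[iters / 2] }
}
```

Running the same `measure` call once per backend gives comparable Candle vs. ort numbers on the CPU-only CI target.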
4. Integration with FeedbackDetector
- `detector_mode = "model"` routes through the zero-shot provider path
- Fallback to regex on provider timeout
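The routing and timeout-fallback behavior to test can be sketched with std threads and a channel timeout. Everything here is hypothetical: `DetectorMode`, `Verdict`, `detect` and `regex_heuristic` are illustrative names, not the `FeedbackDetector` API, and the "regex" is a substring check to stay std-only. If the provider call is async in the real code, `tokio::time::timeout` would play the role of `recv_timeout`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DetectorMode {
    Regex,
    Model,
}

/// Which path produced the verdict, so tests can assert on routing.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Regex(bool),
    Model(bool),
    RegexFallback(bool),
}

/// Stand-in for the regex heuristic (substring check to keep this std-only).
fn regex_heuristic(text: &str) -> bool {
    text.to_lowercase().contains("ignore previous instructions")
}

/// Routes to the provider in Model mode; falls back to the heuristic if the
/// provider does not answer within `timeout` (or its thread dies).
pub fn detect<F>(mode: DetectorMode, text: &str, timeout: Duration, provider: F) -> Verdict
where
    F: FnOnce(String) -> bool + Send + 'static,
{
    match mode {
        DetectorMode::Regex => Verdict::Regex(regex_heuristic(text)),
        DetectorMode::Model => {
            let (tx, rx) = mpsc::channel();
            let owned = text.to_string();
            thread::spawn(move || {
                // Receiver may already be gone after a timeout; ignore send errors.
                let _ = tx.send(provider(owned));
            });
            match rx.recv_timeout(timeout) {
                Ok(v) => Verdict::Model(v),
                // Timeout or disconnected provider: use the heuristic instead.
                Err(_) => Verdict::RegexFallback(regex_heuristic(text)),
            }
        }
    }
}
```

Returning the path taken, rather than a bare `bool`, lets the integration test assert both the routing and the fallback explicitly.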
5. Regression tests for #2189
- Verify `cargo nextest run -p zeph-llm --features candle --lib` compiles and passes after the fix for #2189 (bug(candle): `--features candle` alone fails to compile, `MessageMetadata` unresolved)
Notes
- Tests requiring model downloads are marked `#[ignore]`; run them manually or in a separate CI job with a Hugging Face cache
- CPU-only inference must complete in < 500 ms per call in CI
- Future: many specialized lightweight models per task (strategic direction CI-178)
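The two CI rules above (model-download tests are `#[ignore]`d, and each CPU call stays under 500 ms) can share one small helper. A sketch under assumptions: `CI_BUDGET`, `within_budget` and `timed` are hypothetical names, and the ignored test's closure is a placeholder for the real ClassifierModel load and classify calls once #2185 lands. Ignored tests run with `cargo test -- --ignored` or nextest's equivalent option.

```rust
use std::time::{Duration, Instant};

/// Per-call CPU inference budget for CI, from the issue notes (< 500 ms).
pub const CI_BUDGET: Duration = Duration::from_millis(500);

/// True when a single call stayed strictly under the CI budget.
pub fn within_budget(elapsed: Duration) -> bool {
    elapsed < CI_BUDGET
}

/// Times one call to `f`, returning its result and the elapsed wall-clock time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    #[ignore = "needs Hugging Face model weights; run manually or in the HF-cache CI job"]
    fn injection_classifier_meets_cpu_budget() {
        // Hypothetical placeholder: swap in the real model call from #2185.
        let (_score, elapsed) = timed(|| 0.0_f32);
        assert!(within_budget(elapsed), "took {elapsed:?}");
    }
}
```

Measuring only the call inside `timed` keeps model loading (cold start) out of the per-call budget, matching the cold vs. warm split in the benchmarks.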