perf(classifiers): candle classifier hardcoded to CPU — Metal/CUDA not used, model load takes >15s #2396
Closed
Labels
P2 (High value, medium complexity), bug (Something isn't working), performance (Performance improvements)
Description
Summary
Both Candle-based classifiers (DeBERTa injection, piiranha NER) run on a hardcoded Device::Cpu (candle.rs:265). On Apple Silicon or CUDA hardware, Metal/GPU acceleration is never used, and model load time exceeds 15s on production hardware.
Root Cause
// crates/zeph-llm/src/classifier/candle.rs:265
let device = Device::Cpu; // hardcoded: Metal/CUDA ignored
Observed Impact (CI-274, 2026-03-30, v0.18.0)
- piiranha NER model (1.1 GB): timed out with timeout_ms = 15000 (>15s load time on CPU)
- Effectively non-functional in CLI testing on macOS Apple Silicon
- Falls back to regex-only PII detection on every session
- DeBERTa similarly would be CPU-only once HF auth is resolved
Expected Behavior
When the metal or cuda feature is enabled, use Device::new_metal(0) or Device::new_cuda(0) respectively, falling back to CPU on failure:
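// Prefer Metal when that feature is enabled; fall back to CPU if init fails.
#[cfg(feature = "metal")]
let device = Device::new_metal(0).unwrap_or(Device::Cpu);
// Otherwise prefer CUDA when that feature is enabled.
#[cfg(all(feature = "cuda", not(feature = "metal")))]
let device = Device::new_cuda(0).unwrap_or(Device::Cpu);
// No accelerator feature enabled: CPU only.
#[cfg(not(any(feature = "metal", feature = "cuda")))]
let device = Device::Cpu;
Priority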
P2 — ML classifiers are non-functional in default CPU builds with large models.
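The fallback described under Expected Behavior can also be expressed as an ordered list of backend probes. The following is a minimal sketch, not the crate's actual code: `Device` and `DeviceError` here are local stand-ins for candle's types so the example compiles on its own, and `Probe`, `select_device`, `metal_unavailable`, and `cuda_available` are hypothetical names introduced for illustration.

```rust
use std::fmt;

/// Stand-in for candle's `Device` (assumption: lets the sketch build without candle).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Metal(usize),
    Cuda(usize),
}

/// Stand-in for the error a backend constructor might return.
#[derive(Debug)]
pub struct DeviceError(pub String);

impl fmt::Display for DeviceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// A backend probe: attempts to initialize one accelerator.
pub type Probe = fn() -> Result<Device, DeviceError>;

/// Run the probes in order and return the first device that initializes.
/// If every probe fails (or none are given), fall back to `Device::Cpu`.
pub fn select_device(probes: &[(&str, Probe)]) -> Device {
    for (name, probe) in probes {
        match probe() {
            Ok(device) => return device,
            // Log and continue so a missing GPU never aborts classifier startup.
            Err(err) => eprintln!("{name} backend unavailable ({err}); trying next"),
        }
    }
    Device::Cpu
}

/// Simulated probe: a machine without Metal support.
pub fn metal_unavailable() -> Result<Device, DeviceError> {
    Err(DeviceError("no Metal device found".to_string()))
}

/// Simulated probe: a machine with one CUDA device.
pub fn cuda_available() -> Result<Device, DeviceError> {
    Ok(Device::Cuda(0))
}
```

In the real crate, each probe would presumably wrap `Device::new_metal(0)` or `Device::new_cuda(0)` and be compiled in only when the matching Cargo feature is enabled. Logging the failure before falling back keeps a silent CPU downgrade visible in the logs.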