
perf(classifiers): candle classifier hardcoded to CPU — Metal/CUDA not used, model load takes >15s #2396

@bug-ops


Summary

Both Candle-based classifiers (DeBERTa injection and piiranha NER) hardcode Device::Cpu in candle.rs:265. On Apple Silicon or CUDA hardware, Metal/CUDA acceleration is therefore never used, and model load takes more than 15s on production hardware.

Root Cause

// crates/zeph-llm/src/classifier/candle.rs:265
let device = Device::Cpu;  // hardcoded — Metal/CUDA ignored

Observed Impact (CI-274, 2026-03-30, v0.18.0)

  • piiranha NER model (1.1 GB): loading on CPU took longer than 15s and hit the timeout (timeout_ms = 15000)
  • Effectively non-functional in CLI testing on macOS Apple Silicon
  • Falls back to regex-only PII detection in every session
  • DeBERTa would likewise run CPU-only once Hugging Face (HF) auth is resolved
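The regex fallback described above can be illustrated with a minimal sketch. The issue does not show how zeph-llm enforces timeout_ms, so the mechanism here (a background loader thread plus std::sync::mpsc::Receiver::recv_timeout) and the names PiiDetector and load_with_timeout are hypothetical. It only shows why a CPU-bound load that exceeds the deadline leaves a session with regex-only detection.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the PII detector chosen for a session.
#[derive(Debug, PartialEq)]
enum PiiDetector {
    /// NER model finished loading before the deadline.
    Ner(String),
    /// Regex-only fallback used when the model is not ready in time.
    RegexOnly,
}

/// Hypothetical loader: runs `load` on a background thread and waits at most
/// `timeout_ms`. On a load error or a missed deadline it falls back to regex.
fn load_with_timeout<F>(load: F, timeout_ms: u64) -> PiiDetector
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the receiver may already have given up waiting.
        let _ = tx.send(load());
    });
    match rx.recv_timeout(Duration::from_millis(timeout_ms)) {
        Ok(Ok(model)) => PiiDetector::Ner(model),
        // Load error, timeout, or disconnected sender: degrade to regex-only.
        _ => PiiDetector::RegexOnly,
    }
}
```

A slow CPU load is abandoned rather than awaited, so with the model pinned to CPU every session that hits the deadline ends up on the regex path.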

Expected Behavior

When the metal or cuda feature is enabled, use Device::new_metal(0) or Device::new_cuda(0) respectively, falling back to CPU on failure:

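// Prefer Metal, then CUDA; fall back to CPU if no feature is on or device init fails.
#[cfg(feature = "metal")]
let device = Device::new_metal(0).unwrap_or(Device::Cpu);
#[cfg(all(feature = "cuda", not(feature = "metal")))]
let device = Device::new_cuda(0).unwrap_or(Device::Cpu);
#[cfg(not(any(feature = "metal", feature = "cuda")))]
let device = Device::Cpu;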
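The same Metal, then CUDA, then CPU fallback order can be sketched in plain Rust that runs without candle. This is a sketch under stated assumptions: Device is a hypothetical stand-in for candle_core::Device, Backends stands in for the crate's metal/cuda cargo features, and probe stands in for Device::new_metal(0) / Device::new_cuda(0). None of these names come from zeph-llm.

```rust
/// Hypothetical stand-in for `candle_core::Device`, so the sketch runs without candle.
#[derive(Debug, Clone, PartialEq)]
enum Device {
    Cpu,
    Metal(usize),
    Cuda(usize),
}

/// Hypothetical: which accelerator backends this build was compiled with
/// (stands in for the crate's `metal` / `cuda` cargo features).
struct Backends {
    metal: bool,
    cuda: bool,
}

/// Tries each enabled accelerator in order (Metal, then CUDA) and falls back to CPU.
/// `probe` stands in for the fallible `Device::new_metal(0)` / `Device::new_cuda(0)` calls.
fn select_device<P>(enabled: &Backends, probe: P) -> Device
where
    P: Fn(&Device) -> Result<(), String>,
{
    let mut candidates = Vec::new();
    if enabled.metal {
        candidates.push(Device::Metal(0));
    }
    if enabled.cuda {
        candidates.push(Device::Cuda(0));
    }
    for candidate in candidates {
        match probe(&candidate) {
            Ok(()) => return candidate,
            // Log and try the next backend so a missing GPU never breaks the classifier.
            Err(e) => eprintln!("classifier: {candidate:?} unavailable ({e}), trying next"),
        }
    }
    Device::Cpu
}
```

In the real crate the Backends flags could plausibly come from cfg!(feature = "metal") and cfg!(feature = "cuda"), which evaluate to plain booleans and avoid the mutually exclusive #[cfg] combinations needed when both features are enabled. Logging the skipped device also makes a silent CPU fallback visible in runs like CI-274.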

Priority

P2 — ML classifiers are non-functional in default CPU builds with large models.

Metadata

Labels

P2 (High value, medium complexity), bug (Something isn't working), performance (Performance improvements)
