perf(classifiers): candle classifier hardcoded to CPU — Metal/CUDA not used, model load takes >15s

## Summary

Both Candle-based classifiers (DeBERTa injection, piiranha NER) run on a `Device::Cpu` that is hardcoded at `candle.rs:265`. On Apple Silicon or CUDA hardware, Metal/CUDA acceleration is therefore never used, and model load times exceed 15s on production hardware.

## Root Cause

```rust
// crates/zeph-llm/src/classifier/candle.rs:265
let device = Device::Cpu;  // hardcoded — Metal/CUDA ignored
```

## Observed Impact (CI-274, 2026-03-30, v0.18.0)

- piiranha NER model (1.1 GB): timed out with `timeout_ms = 15000` (>15s load time on CPU)
- Effectively non-functional in CLI testing on macOS Apple Silicon
- Falls back to regex-only PII detection on every session
- DeBERTa would similarly be CPU-only once Hugging Face (HF) auth is resolved
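
To illustrate why a slow CPU load turns into regex-only detection on every session, here is a minimal std-only sketch of a load-with-timeout pattern. It is not the actual zeph code: `PiiBackend` and `load_with_timeout` are hypothetical names, and the real loader and timeout handling may be structured differently.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical outcome of classifier start-up: either the loaded model
/// or the regex-only fallback described in this issue.
#[derive(Debug, PartialEq)]
enum PiiBackend<M> {
    Model(M),
    RegexOnly,
}

/// Runs `load` on a worker thread and waits at most `timeout` for it.
/// If the loader does not finish in time (or its thread dies), the caller
/// gets `RegexOnly`; the worker thread is left to finish in the background.
fn load_with_timeout<M, F>(timeout: Duration, load: F) -> PiiBackend<M>
where
    M: Send + 'static,
    F: FnOnce() -> M + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(load());
    });
    match rx.recv_timeout(timeout) {
        Ok(model) => PiiBackend::Model(model),
        Err(_) => PiiBackend::RegexOnly,
    }
}
```

Under this pattern, any load that takes longer than `timeout_ms` is indistinguishable from a missing model, so a faster device (or a larger timeout) is the only way to get the classifier back.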

## Expected Behavior

When the `metal` or `cuda` feature is enabled, use `Device::new_metal(0)` or `Device::new_cuda(0)` respectively, falling back to CPU on failure:

```rust
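// Prefer Metal when that feature is enabled, then CUDA, then CPU.
#[cfg(feature = "metal")]
let device = Device::new_metal(0).unwrap_or(Device::Cpu);
#[cfg(all(feature = "cuda", not(feature = "metal")))]
let device = Device::new_cuda(0).unwrap_or(Device::Cpu);
#[cfg(not(any(feature = "metal", feature = "cuda")))]
let device = Device::Cpu;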
```
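
One caveat with `unwrap_or(Device::Cpu)` is that the reason for the fallback is silently dropped, which would make a repeat of this issue hard to spot. A possible variant logs the error first. The sketch below uses a stand-in `Device` enum so that it compiles without `candle_core`; `or_cpu` is a hypothetical helper, and `eprintln!` stands in for whatever logging the crate actually uses.

```rust
use std::fmt::Display;

/// Stand-in for `candle_core::Device`, so the sketch compiles without the crate.
#[derive(Debug, Clone, PartialEq)]
enum Device {
    Cpu,
    Accelerator { backend: &'static str, ordinal: usize },
}

/// Returns the accelerator device if its initialisation succeeded; otherwise
/// logs why it failed and returns the CPU device.
fn or_cpu<E: Display>(backend: &str, attempt: Result<Device, E>) -> Device {
    match attempt {
        Ok(device) => device,
        Err(err) => {
            eprintln!("{backend} device unavailable ({err}); falling back to CPU");
            Device::Cpu
        }
    }
}
```

With the real crate, the call site would presumably look like `or_cpu("metal", Device::new_metal(0))` behind the same `#[cfg(feature = "metal")]` gate.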

## Priority

P2 — ML classifiers are non-functional in default CPU builds with large models.
