Run AI models on-device. CoreML and MLX inference with one API, automatic Neural Engine benchmarking, streaming generation, and OTA model updates.
A Swift SDK that handles everything between your app and an on-device ML model. Load a CoreML or MLX model, get automatic Neural Engine vs CPU benchmarking, streaming token generation, telemetry, OTA updates, and smart cloud/device routing -- without writing any of that infrastructure yourself. 100% Swift, zero dependencies in the core module.
Swift Package Manager -- add to your Package.swift:
```swift
dependencies: [
    .package(url: "https://github.com/octomil/octomil-ios.git", from: "1.1.0")
]
```

Or in Xcode: File > Add Package Dependencies > paste `https://github.com/octomil/octomil-ios.git`.
Four library products available:
| Product | What it adds |
|---|---|
| `Octomil` | Core SDK -- CoreML inference, telemetry, model management. Zero dependencies. |
| `OctomilMLX` | MLX LLM inference via mlx-swift. |
| `OctomilTimeSeries` | Time series forecasting engine. |
| `OctomilClip` | App Clip pairing flow. |
```swift
import Octomil

let model = try await Deploy.model(
    at: Bundle.main.url(forResource: "MobileNet", withExtension: "mlmodelc")!
)
let result = try model.predict(inputs: ["image": pixelBuffer])
print(model.activeDelegate) // "neural_engine" or "cpu" (auto-selected)
print(model.warmupResult!)  // cold: 45ms, warm: 3ms, cpu: 12ms
```

`Deploy.model` loads the model, runs a warmup benchmark comparing Neural Engine vs CPU, and picks the fastest delegate. No configuration needed.
```swift
import OctomilMLX

let llm = try await Deploy.mlxModel(at: modelDirectory)
let (stream, getResult) = llm.predictStream(prompt: "Explain quicksort in Swift")
for try await chunk in stream {
    print(String(data: chunk.data, encoding: .utf8)!, terminator: "")
}
let metrics = getResult() // ttfcMs, throughput, totalChunks
```

Already using `MLModel` directly? Wrap it in one line to get telemetry, input validation, and OTA updates with zero call-site changes:
```swift
// Before
let model = try MLModel(contentsOf: modelURL)
let output = try model.prediction(from: input)

// After
let model = try Octomil.wrap(MLModel(contentsOf: modelURL), modelId: "classifier")
let output = try model.predict(input: input) // same result, now with telemetry + OTA
```

Automatically switches compute units (ANE/GPU/CPU) based on battery, thermal pressure, and memory:
```swift
let model = try await Deploy.adaptiveModel(from: modelURL)
let result = try await model.predict(input: features)
// Downgrades from Neural Engine to CPU under thermal pressure
// Throttles inference rate in Low Power Mode
```

Text, image, audio, and video generation with per-chunk latency tracking:
```swift
let (stream, getResult) = model.predictStream(input: prompt, modality: .text)
for try await chunk in stream {
    print(chunk.data, chunk.latencyMs)
}
let result = getResult()
// result.ttfcMs, result.throughput, result.avgChunkLatencyMs
```

Route inference on-device or to the cloud based on device capabilities:
```swift
let model = try Octomil.wrap(MLModel(contentsOf: url), modelId: "classifier")
model.configureRouting(RoutingConfig(serverURL: apiURL, apiKey: key))
// Automatically routes to cloud when device is constrained
// Falls back to local CoreML on any cloud failure
```

Deterministic experiment assignment with metric tracking:
```swift
let experiments = ExperimentsClient(apiClient: client.apiClient)
let variant = experiments.getVariant(experiment: exp, deviceId: deviceId)
try await experiments.trackMetric(
    experimentId: exp.id, metricName: "accuracy", metricValue: 0.94
)
```

On-device training with weight extraction, battery/thermal gating, and offline gradient caching:
```swift
let result = try await trainer.trainIfEligible(
    model: model,
    dataProvider: { trainingBatch },
    config: TrainingConfig(epochs: 3, learningRate: 0.001),
    deviceState: await monitor.currentState
)
// Skips training if battery < 20% or thermal state is serious
// Caches gradients offline when network is unavailable
```

Full lifecycle: register device, download models, run inference, upload training results:
```swift
let client = OctomilClient(
    deviceAccessToken: "<device-token>",
    orgId: "org_123",
    serverURL: URL(string: "https://api.octomil.com")!
)
let registration = try await client.register()
let model = try await client.downloadModel(modelId: "fraud_detection")
```

| Format | Engine | Use case |
|---|---|---|
| `.mlmodelc` / `.mlmodel` / `.mlpackage` | CoreML (Neural Engine, GPU, CPU) | Vision, classification, regression |
| MLX safetensors | MLX via mlx-swift | LLM text generation |
| Time series | MLX-Swift-TS | Forecasting |
HuggingFace Hub models can be loaded directly for development:
```swift
let llm = try await Deploy.mlxModelFromHub(modelId: "mlx-community/Llama-3.2-1B-Instruct-4bit")
```

- iOS 17.0+ / macOS 14.0+
- Swift 5.9+
- Xcode 15.0+
```
Sources/
  Octomil/             Core SDK (zero dependencies)
    Deploy/            Model loading, benchmarking, adaptive inference
    Wrapper/           Drop-in MLModel wrapper with telemetry + OTA
    Inference/         Streaming engines (text, image, audio, video)
    Client/            Server API client, routing, certificate pinning
    Training/          Federated trainer, weight extraction, gradient cache
    Runtime/           Device state monitor, adaptive model loader
    Experiments/       A/B experiment client
    Telemetry/         Batched event reporting
    Security/          Secure aggregation (SecAgg+, Shamir)
    Privacy/           Differential privacy configuration
  OctomilMLX/          MLX LLM inference (mlx-swift dependency)
  OctomilTimeSeries/   Time series forecasting
  OctomilClip/         App Clip pairing UI
```
| | Raw CoreML | Octomil |
|---|---|---|
| Load + predict | ~15 lines | 3 lines |
| Neural Engine benchmarking | Manual | Automatic |
| Streaming generation | Build it yourself | Built in |
| OTA model updates | Build it yourself | One config flag |
| Telemetry / latency tracking | Build it yourself | Automatic |
| Adaptive compute switching | Build it yourself | `Deploy.adaptiveModel` |
| A/B model experiments | Build it yourself | `ExperimentsClient` |
| Smart device/cloud routing | Build it yourself | `configureRouting()` |
The Octomil Manifest (`octomil.yaml`) is a declarative config file that describes which models your app needs, their delivery mode (bundled, managed, or cloud), and which capability each model serves. Generate it with the CLI:

```bash
octomil manifest init
```

The iOS SDK reads `octomil.yaml` via `AppManifest` and `ModelCatalogService` to handle model downloads and runtime resolution automatically.
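As a rough illustration of the shape such a manifest could take, here is a sketch: the field names (`models`, `delivery`, `capability`) and the model ids are assumptions for this example, not the authoritative schema -- run `octomil manifest init` to get the real one:

```yaml
# Hypothetical octomil.yaml sketch -- field names and ids are illustrative
models:
  - id: fraud_detection
    delivery: managed        # bundled | managed | cloud
    capability: classification
  - id: chat_llm
    delivery: cloud
    capability: text_generation
```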
Minimal examples for the three main mobile SDK capabilities:
| Sample | Capability | Key API |
|---|---|---|
| ChatSample | Text generation | OctomilChat.stream() |
| TranscriptionSample | Speech-to-text | client.audio.transcriptions.create() |
| PredictionSample | Next-word prediction | client.text.predictions.create() |
Each sample is a standalone SwiftUI app focused on one capability. Open its `Package.swift` in Xcode and run it on a real device. A simulator is fine for UI smoke testing, but deployed-model flows should be validated on hardware.
Prerequisites: org API credentials, one deployed model per capability, and a bundled `test_audio.wav` for the transcription sample. See `Examples/README.md` for setup.
Need the full device app? The Octomil iOS App is the broader evaluation app for model testing, pairing, recovery, and golden-path automation. These samples are intentionally narrower: one feature, minimal setup, copyable code.
See CONTRIBUTING.md.