Run AI models on-device. CoreML and MLX inference with one API, automatic Neural Engine benchmarking, streaming generation, and OTA model updates.
A Swift SDK that handles everything between your app and an on-device ML model. Load a CoreML or MLX model, get automatic Neural Engine vs CPU benchmarking, streaming token generation, telemetry, OTA updates, and smart cloud/device routing -- without writing any of that infrastructure yourself. 100% Swift, zero dependencies in the core module.
Swift Package Manager -- add to your Package.swift:
```swift
dependencies: [
    .package(url: "https://github.com/octomil/octomil-ios.git", from: "1.1.0")
]
```

Or in Xcode: File > Add Package Dependencies > paste `https://github.com/octomil/octomil-ios.git`.
Four library products available:
| Product | What it adds |
|---|---|
| `Octomil` | Core SDK -- CoreML inference, telemetry, model management. Zero dependencies. |
| `OctomilMLX` | MLX LLM inference via mlx-swift. |
| `OctomilTimeSeries` | Time series forecasting engine. |
| `OctomilClip` | App Clip pairing flow. |
```swift
import Octomil

let model = try await Deploy.model(
    at: Bundle.main.url(forResource: "MobileNet", withExtension: "mlmodelc")!
)
let result = try model.predict(inputs: ["image": pixelBuffer])
print(model.activeDelegate) // "neural_engine" or "cpu" (auto-selected)
print(model.warmupResult!)  // cold: 45ms, warm: 3ms, cpu: 12ms
```

`Deploy.model` loads the model, runs a warmup benchmark comparing Neural Engine vs CPU, and picks the fastest delegate. No configuration needed.
```swift
import OctomilMLX

let llm = try await Deploy.mlxModel(at: modelDirectory)
let (stream, getResult) = llm.predictStream(prompt: "Explain quicksort in Swift")
for try await chunk in stream {
    print(String(data: chunk.data, encoding: .utf8)!, terminator: "")
}
let metrics = getResult() // ttfcMs, throughput, totalChunks
```

Already using `MLModel` directly? Wrap it in one line to get telemetry, input validation, and OTA updates with zero call-site changes:
```swift
// Before
let model = try MLModel(contentsOf: modelURL)
let output = try model.prediction(from: input)

// After
let model = try Octomil.wrap(MLModel(contentsOf: modelURL), modelId: "classifier")
let output = try model.predict(input: input) // same result, now with telemetry + OTA
```

Automatically switches compute units (ANE/GPU/CPU) based on battery, thermal pressure, and memory:
```swift
let model = try await Deploy.adaptiveModel(from: modelURL)
let result = try await model.predict(input: features)
// Downgrades from Neural Engine to CPU under thermal pressure
// Throttles inference rate in Low Power Mode
```

Text, image, audio, and video generation with per-chunk latency tracking:
```swift
let (stream, getResult) = model.predictStream(input: prompt, modality: .text)
for try await chunk in stream {
    print(chunk.data, chunk.latencyMs)
}
let result = getResult()
// result.ttfcMs, result.throughput, result.avgChunkLatencyMs
```

Route inference on-device or to the cloud based on device capabilities:
```swift
let model = try Octomil.wrap(MLModel(contentsOf: url), modelId: "classifier")
model.configureRouting(RoutingConfig(serverURL: apiURL, apiKey: key))
// Automatically routes to cloud when device is constrained
// Falls back to local CoreML on any cloud failure
```

Deterministic experiment assignment with metric tracking:
```swift
let experiments = ExperimentsClient(apiClient: client.apiClient)
let variant = experiments.getVariant(experiment: exp, deviceId: deviceId)
try await experiments.trackMetric(
    experimentId: exp.id, metricName: "accuracy", metricValue: 0.94
)
```

On-device training with weight extraction, battery/thermal gating, and offline gradient caching:
```swift
let result = try await trainer.trainIfEligible(
    model: model,
    dataProvider: { trainingBatch },
    config: TrainingConfig(epochs: 3, learningRate: 0.001),
    deviceState: await monitor.currentState
)
// Skips training if battery < 20% or thermal state is serious
// Caches gradients offline when network is unavailable
```

Full lifecycle: register device, download models, run inference, upload training results:
```swift
let client = OctomilClient(
    deviceAccessToken: "<device-token>",
    orgId: "org_123",
    serverURL: URL(string: "https://api.octomil.com")!
)
let registration = try await client.register()
let model = try await client.downloadModel(modelId: "fraud_detection")
```

| Format | Engine | Use case |
|---|---|---|
| `.mlmodelc` / `.mlmodel` / `.mlpackage` | CoreML (Neural Engine, GPU, CPU) | Vision, classification, regression |
| MLX safetensors | MLX via mlx-swift | LLM text generation |
| Time series | MLX-Swift-TS | Forecasting |
HuggingFace Hub models can be loaded directly for development:
```swift
let llm = try await Deploy.mlxModelFromHub(modelId: "mlx-community/Llama-3.2-1B-Instruct-4bit")
```

- iOS 17.0+ / macOS 14.0+
- Swift 5.9+
- Xcode 15.0+
```
Sources/
  Octomil/             Core SDK (zero dependencies)
    Deploy/            Model loading, benchmarking, adaptive inference
    Wrapper/           Drop-in MLModel wrapper with telemetry + OTA
    Inference/         Streaming engines (text, image, audio, video)
    Client/            Server API client, routing, certificate pinning
    Training/          Federated trainer, weight extraction, gradient cache
    Runtime/           Device state monitor, adaptive model loader
    Experiments/       A/B experiment client
    Telemetry/         Batched event reporting
    Security/          Secure aggregation (SecAgg+, Shamir)
    Privacy/           Differential privacy configuration
  OctomilMLX/          MLX LLM inference (mlx-swift dependency)
  OctomilTimeSeries/   Time series forecasting
  OctomilClip/         App Clip pairing UI
```
| | Raw CoreML | Octomil |
|---|---|---|
| Load + predict | ~15 lines | 3 lines |
| Neural Engine benchmarking | Manual | Automatic |
| Streaming generation | Build it yourself | Built in |
| OTA model updates | Build it yourself | One config flag |
| Telemetry / latency tracking | Build it yourself | Automatic |
| Adaptive compute switching | Build it yourself | `Deploy.adaptiveModel` |
| A/B model experiments | Build it yourself | `ExperimentsClient` |
| Smart device/cloud routing | Build it yourself | `configureRouting()` |
The Octomil Manifest (`octomil.yaml`) is a declarative config file that describes which models your app needs, their delivery mode (bundled, managed, or cloud), and which capability each model serves. Generate it with the CLI:

```bash
octomil manifest init
```

The iOS SDK reads `octomil.yaml` via `AppManifest` and `ModelCatalogService` to handle model downloads and runtime resolution automatically.
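As a rough illustration of the shape such a manifest could take, here is a sketch: the field names (`models`, `delivery`, `capability`) and the model ids are assumptions for this example, not the authoritative schema -- run `octomil manifest init` to get the real one:

```yaml
# Hypothetical octomil.yaml sketch -- field names and ids are illustrative
models:
  - id: fraud_detection
    delivery: managed        # bundled | managed | cloud
    capability: classification
  - id: chat_llm
    delivery: cloud
    capability: text_generation
```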
Minimal examples for the three main mobile SDK capabilities:
| Sample | Capability | Key API |
|---|---|---|
| ChatSample | Text generation | OctomilChat.stream() |
| TranscriptionSample | Speech-to-text | client.audio.transcriptions.create() |
| PredictionSample | Next-word prediction | client.text.predictions.create() |
Each sample is a standalone SwiftUI app focused on one capability. Open its `Package.swift` in Xcode and run it on a real device. A simulator is fine for UI smoke testing, but deployed-model flows should be validated on hardware.
Prerequisites: org API credentials, one deployed model per capability, and a bundled `test_audio.wav` for the transcription sample. See `Examples/README.md` for setup.
Need the full device app? The Octomil iOS App is the broader evaluation app for model testing, pairing, recovery, and golden-path automation. These samples are intentionally narrower: one feature, minimal setup, copyable code.
See CONTRIBUTING.md.