Run ML models on-device with one function call. TFLite + NNAPI + GPU, automatic hardware benchmarking, telemetry, and OTA updates.
A Kotlin SDK that deploys TFLite models to Android devices and handles the hard parts: hardware acceleration selection, delegate benchmarking, inference telemetry, and model updates. Drop a `.tflite` file in your assets, call `Octomil.deploy()`, and get GPU/NPU-accelerated inference with zero boilerplate. Optionally, connect to the Octomil platform for federated learning, A/B experiments, and fleet-wide model management.
```kotlin
// build.gradle.kts
repositories {
    maven("https://maven.pkg.github.com/octomil/octomil-android")
}
dependencies {
    implementation("ai.octomil:octomil:1.2.0")
}
```

```kotlin
import ai.octomil.Octomil

// Load from assets/ and auto-benchmark hardware delegates
val model = Octomil.deploy(context, "mobilenet.tflite")
val result = model.predict(inputData).getOrThrow()
println(result.predictions)   // [0.02, 0.91, 0.07, ...]
println(model.activeDelegate) // "gpu"
println(model.warmupResult)   // cold: 62ms, warm: 4ms
```

That's it. The SDK copies the asset to cache, loads the TFLite interpreter, benchmarks GPU vs CPU vs vendor NPU, picks the fastest delegate, and warms up the model.
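Conceptually, the delegate-selection step amounts to timing one cold run plus a few warm runs per backend and keeping the fastest. A simplified pure-Kotlin sketch of that logic (the runner lambdas stand in for real `Interpreter.run()` calls on each delegate; `pickFastestDelegate` is illustrative, not an SDK API):

```kotlin
import kotlin.system.measureNanoTime

// Illustrative only: time each candidate backend and keep the fastest.
// The lambdas stand in for Interpreter.run() on CPU, GPU, or a vendor NPU.
fun pickFastestDelegate(runners: Map<String, () -> Unit>, warmRuns: Int = 5): String {
    val bestWarmNanos = runners.mapValues { (_, run) ->
        run() // cold run: triggers kernel compilation and cache population
        (1..warmRuns).minOf { measureNanoTime { run() } } // best warm latency
    }
    return bestWarmNanos.minByOrNull { it.value }!!.key
}
```

Taking the best warm latency rather than the cold one matters: GPU delegates often pay a large one-time shader-compilation cost that would otherwise mask their steady-state advantage.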
```kotlin
val options = LocalModelOptions(
    enableGpu = true,       // TFLite GPU delegate
    enableVendorNpu = true, // Qualcomm QNN / Samsung Eden / MediaTek NeuroPilot
    enableFloat16 = true,   // ~2x throughput on supported GPUs
    preferBigCores = true,  // Pin to performance cores on ARM big.LITTLE
)
val model = Octomil.deploy(context, "classifier.tflite", options = options)
```

```kotlin
val topK = model.classify(inputData, topK = 5).getOrThrow()
topK.forEach { (classIndex, confidence) ->
    println("Class $classIndex: ${confidence * 100}%")
}
```

```kotlin
val results = model.predictBatch(listOf(input1, input2, input3)).getOrThrow()
```

Already using `Interpreter` directly? Wrap it to get telemetry and OTA updates with no call-site changes:
```kotlin
import ai.octomil.wrapper.Octomil

// Before
val interpreter = Interpreter(modelFile)
interpreter.run(input, output)

// After — same predict API, plus telemetry + OTA
val interpreter = Octomil.wrap(Interpreter(modelFile), modelId = "classifier")
interpreter.predict(input, output)
```

```kotlin
client.stream(prompt, modality = Modality.TEXT).collect { chunk ->
    print(String(chunk.data)) // token-by-token output
}
```

```kotlin
val outcome = client.train(
    dataProvider = InMemoryTrainingDataProvider(trainingData),
    config = TrainingConfig(batchSize = 32, epochs = 5, learningRate = 0.001f),
)
println("Loss: ${outcome.trainingResult.loss}")
// Weight updates are uploaded to the server automatically (or manually via UploadPolicy.MANUAL)
```

Connect to the Octomil platform for fleet-wide observability:
```kotlin
val config = OctomilConfig.Builder()
    .serverUrl("https://api.octomil.com")
    .deviceAccessToken("<device-token>")
    .orgId("org_123")
    .modelId("fraud_detection")
    .build()

val client = OctomilClient.Builder(context).config(config).build()
client.initialize().getOrThrow()

// Inference, training, experiments, telemetry — all through client
val result = client.runInference(inputData)
```

| | Octomil SDK | Raw TFLite |
|---|---|---|
| Hardware delegate selection | Automatic benchmarking (GPU/NPU/CPU) | Manual setup per device |
| Model loading | `Octomil.deploy(ctx, "model.tflite")` | Copy asset, create Interpreter, configure delegates, handle errors |
| Telemetry | Built-in latency/error tracking | Build your own |
| OTA model updates | Config flag | Build your own |
| Batch inference | `.predictBatch()` | Manual loop |
| Vendor NPU support | Auto-detected via reflection | Integrate each vendor SDK |
| Federated learning | `.train()` with secure aggregation | Not available |
- Format: TensorFlow Lite (`.tflite`)
- Delegates: CPU, GPU (OpenGL/OpenCL), NNAPI, Qualcomm QNN, Samsung Eden, MediaTek NeuroPilot
- Modalities: Classification, regression, text generation, image, audio, video
- Training: On-device gradient updates (requires TFLite training signatures)
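For reference, "training signatures" is the standard TFLite mechanism where a model is exported with a named signature (commonly `"train"`) that the raw `Interpreter` can step via `runSignature`. A hedged sketch of what the SDK builds on; the tensor names `x`/`y`/`loss` are illustrative and depend entirely on how the model was exported:

```kotlin
import org.tensorflow.lite.Interpreter

// Sketch of a raw-TFLite training step (names are export-specific assumptions).
fun trainStep(interpreter: Interpreter, x: FloatArray, y: FloatArray): Float {
    val loss = FloatArray(1)
    interpreter.runSignature(
        mapOf("x" to x, "y" to y), // inputs, keyed by signature input names
        mapOf("loss" to loss),     // outputs, filled in place by the runtime
        "train",                   // signature key baked in at export time
    )
    return loss[0]
}
```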
- Android API 24+ (Android 7.0)
- Kotlin 1.9+
- TFLite models (`.tflite` format)
```
ai.octomil
├── Octomil           # Local-first entry point: deploy, loadModel
├── wrapper/Octomil   # Drop-in TFLite Interpreter wrapper
├── client/           # OctomilClient — server-connected client
├── inference/        # Streaming engines (text, image, audio, video)
├── training/         # On-device training, weight extraction
├── models/           # Model manager, caching, versioning
├── runtime/          # Adaptive interpreter, device state monitoring
├── secagg/           # Secure aggregation (ECDH, Shamir, SecAgg+)
├── experiments/      # A/B testing client
├── analytics/        # Federated analytics
├── privacy/          # Differential privacy
├── discovery/        # NSD/mDNS for `octomil deploy --phone`
├── pairing/          # Device pairing UI (Compose)
└── storage/          # Android Keystore-backed secure storage
```
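The heavy machinery in `secagg/` (ECDH key agreement, Shamir shares for dropout recovery) exists to establish and protect pairwise masks; the cancellation trick at its core is simple. A toy pure-Kotlin illustration using a shared PRNG seed in place of ECDH-derived secrets (not the SDK's actual implementation):

```kotlin
import kotlin.random.Random

// Toy illustration: each client pair (i, j) shares a random mask; client i
// adds it and client j subtracts it, so individual updates look random but
// the server-side sum reveals only the aggregate.
fun maskClientUpdates(updates: List<Long>, sharedSeed: Long): List<Long> {
    val masked = updates.toMutableList()
    for (i in updates.indices) {
        for (j in i + 1 until updates.size) {
            val mask = Random(sharedSeed * 31 + i * 1_000L + j).nextLong(1_000_000)
            masked[i] += mask // client i adds the pairwise mask
            masked[j] -= mask // client j subtracts the same mask
        }
    }
    return masked
}
```

Each masked value on its own is uninformative, yet summing all masked updates gives exactly the sum of the raw updates, since every mask appears once with each sign.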
The Octomil Manifest (`octomil.yaml`) is a declarative config file that describes which models your app needs, their delivery mode (bundled, managed, or cloud), and which capability each model serves. Generate it with the CLI:

```bash
octomil manifest init
```

The Android SDK reads `octomil.yaml` via `AppManifest` and `ModelCatalogService` to handle model downloads and runtime resolution automatically.
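For orientation, a manifest covering the three delivery modes might look roughly like the following. The field names here are illustrative assumptions, not the authoritative schema; `octomil manifest init` generates the real one:

```yaml
# octomil.yaml (illustrative sketch; field names are assumptions)
models:
  - id: classifier
    capability: classification
    delivery: bundled   # shipped in app assets
  - id: chat
    capability: text-generation
    delivery: managed   # downloaded and updated OTA by the SDK
  - id: fraud_detection
    capability: regression
    delivery: cloud     # resolved server-side at runtime
```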
Minimal examples for the three main Android SDK capabilities:
| Sample | Capability | Key API |
|---|---|---|
| `ChatSampleActivity` | Text generation | `Octomil.responses.stream()` |
| `TranscriptionSampleActivity` | Speech-to-text | `Octomil.audio.transcribe()` |
| `PredictionSampleActivity` | Next-word prediction | `Octomil.text.predict()` |
Each sample is a single Activity in the `samples/` module and shows the shortest useful local integration path for one capability. Build and install on a physical Android device:

```bash
./gradlew :samples:installDebug
```

Prerequisites: one deployed model per capability on the device. These samples use `Octomil.init(...)` and local model resolution only; they do not cover auth, pairing, or control-plane setup. See `samples/README.md` for setup.
Need the full device app? The Octomil Android App is the broader evaluation app for model testing, pairing, recovery, and golden-path automation. These samples are intentionally narrower: one feature, local assumptions, copyable code.
See CONTRIBUTING.md.