Run ML models on-device with one function call. TFLite + NNAPI + GPU, automatic hardware benchmarking, telemetry, and OTA updates.
A Kotlin SDK that deploys TFLite models to Android devices and handles the hard parts: hardware acceleration selection, delegate benchmarking, inference telemetry, and model updates. Drop a `.tflite` file in your assets, call `Octomil.deploy()`, and get GPU/NPU-accelerated inference with zero boilerplate. Optionally, connect to the Octomil platform for federated learning, A/B experiments, and fleet-wide model management.
```kotlin
// build.gradle.kts
repositories {
    maven("https://maven.pkg.github.com/octomil/octomil-android")
}
dependencies {
    implementation("ai.octomil:octomil:1.2.0")
}
```

```kotlin
import ai.octomil.Octomil

// Load from assets/ and auto-benchmark hardware delegates
val model = Octomil.deploy(context, "mobilenet.tflite")
val result = model.predict(inputData).getOrThrow()
println(result.predictions)   // [0.02, 0.91, 0.07, ...]
println(model.activeDelegate) // "gpu"
println(model.warmupResult)   // cold: 62ms, warm: 4ms
```

That's it. The SDK copies the asset to cache, loads the TFLite interpreter, benchmarks GPU vs CPU vs vendor NPU, picks the fastest delegate, and warms up the model.
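Conceptually, the delegate-selection step amounts to timing one cold run plus a few warm runs per backend and keeping the fastest. A simplified pure-Kotlin sketch of that logic (the runner lambdas stand in for real `Interpreter.run()` calls on each delegate; `pickFastestDelegate` is illustrative, not an SDK API):

```kotlin
import kotlin.system.measureNanoTime

// Illustrative only: time each candidate backend and keep the fastest.
// The lambdas stand in for Interpreter.run() on CPU, GPU, or a vendor NPU.
fun pickFastestDelegate(runners: Map<String, () -> Unit>, warmRuns: Int = 5): String {
    val bestWarmNanos = runners.mapValues { (_, run) ->
        run() // cold run: triggers kernel compilation and cache population
        (1..warmRuns).minOf { measureNanoTime { run() } } // best warm latency
    }
    return bestWarmNanos.minByOrNull { it.value }!!.key
}
```

Taking the best warm latency rather than the cold one matters: GPU delegates often pay a large one-time shader-compilation cost that would otherwise mask their steady-state advantage.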
```kotlin
val options = LocalModelOptions(
    enableGpu = true,       // TFLite GPU delegate
    enableVendorNpu = true, // Qualcomm QNN / Samsung Eden / MediaTek NeuroPilot
    enableFloat16 = true,   // ~2x throughput on supported GPUs
    preferBigCores = true,  // Pin to performance cores on ARM big.LITTLE
)
val model = Octomil.deploy(context, "classifier.tflite", options = options)
```

```kotlin
val topK = model.classify(inputData, topK = 5).getOrThrow()
topK.forEach { (classIndex, confidence) ->
    println("Class $classIndex: ${confidence * 100}%")
}
```

```kotlin
val results = model.predictBatch(listOf(input1, input2, input3)).getOrThrow()
```

Already using `Interpreter` directly? Wrap it to get telemetry and OTA updates with no call-site changes:
```kotlin
import ai.octomil.wrapper.Octomil

// Before
val interpreter = Interpreter(modelFile)
interpreter.run(input, output)

// After — same predict API, plus telemetry + OTA
val interpreter = Octomil.wrap(Interpreter(modelFile), modelId = "classifier")
interpreter.predict(input, output)
```

```kotlin
client.stream(prompt, modality = Modality.TEXT).collect { chunk ->
    print(String(chunk.data)) // token-by-token output
}
```

```kotlin
val outcome = client.train(
    dataProvider = InMemoryTrainingDataProvider(trainingData),
    config = TrainingConfig(batchSize = 32, epochs = 5, learningRate = 0.001f),
)
println("Loss: ${outcome.trainingResult.loss}")
// Weight updates are uploaded to the server automatically (or manually via UploadPolicy.MANUAL)
```

Connect to the Octomil platform for fleet-wide observability:
```kotlin
val config = OctomilConfig.Builder()
    .serverUrl("https://api.octomil.com")
    .deviceAccessToken("<device-token>")
    .orgId("org_123")
    .modelId("fraud_detection")
    .build()

val client = OctomilClient.Builder(context).config(config).build()
client.initialize().getOrThrow()

// Inference, training, experiments, telemetry — all through client
val result = client.runInference(inputData)
```

| | Octomil SDK | Raw TFLite |
|---|---|---|
| Hardware delegate selection | Automatic benchmarking (GPU/NPU/CPU) | Manual setup per device |
| Model loading | `Octomil.deploy(ctx, "model.tflite")` | Copy asset, create Interpreter, configure delegates, handle errors |
| Telemetry | Built-in latency/error tracking | Build your own |
| OTA model updates | Config flag | Build your own |
| Batch inference | `.predictBatch()` | Manual loop |
| Vendor NPU support | Auto-detected via reflection | Integrate each vendor SDK |
| Federated learning | `.train()` with secure aggregation | Not available |
- Format: TensorFlow Lite (`.tflite`)
- Delegates: CPU, GPU (OpenGL/OpenCL), NNAPI, Qualcomm QNN, Samsung Eden, MediaTek NeuroPilot
- Modalities: Classification, regression, text generation, image, audio, video
- Training: On-device gradient updates (requires TFLite training signatures)
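For reference, "training signatures" is the standard TFLite mechanism where a model is exported with a named signature (commonly `"train"`) that the raw `Interpreter` can step via `runSignature`. A hedged sketch of what the SDK builds on; the tensor names `x`/`y`/`loss` are illustrative and depend entirely on how the model was exported:

```kotlin
import org.tensorflow.lite.Interpreter

// Sketch of a raw-TFLite training step (names are export-specific assumptions).
fun trainStep(interpreter: Interpreter, x: FloatArray, y: FloatArray): Float {
    val loss = FloatArray(1)
    interpreter.runSignature(
        mapOf("x" to x, "y" to y), // inputs, keyed by signature input names
        mapOf("loss" to loss),     // outputs, filled in place by the runtime
        "train",                   // signature key baked in at export time
    )
    return loss[0]
}
```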
- Android API 24+ (Android 7.0)
- Kotlin 1.9+
- TFLite models (`.tflite` format)
```
ai.octomil
├── Octomil           # Local-first entry point: deploy, loadModel
├── wrapper/Octomil   # Drop-in TFLite Interpreter wrapper
├── client/           # OctomilClient — server-connected client
├── inference/        # Streaming engines (text, image, audio, video)
├── training/         # On-device training, weight extraction
├── models/           # Model manager, caching, versioning
├── runtime/          # Adaptive interpreter, device state monitoring
├── secagg/           # Secure aggregation (ECDH, Shamir, SecAgg+)
├── experiments/      # A/B testing client
├── analytics/        # Federated analytics
├── privacy/          # Differential privacy
├── discovery/        # NSD/mDNS for `octomil deploy --phone`
├── pairing/          # Device pairing UI (Compose)
└── storage/          # Android Keystore-backed secure storage
```
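The heavy machinery in `secagg/` (ECDH key agreement, Shamir shares for dropout recovery) exists to establish and protect pairwise masks; the cancellation trick at its core is simple. A toy pure-Kotlin illustration using a shared PRNG seed in place of ECDH-derived secrets (not the SDK's actual implementation):

```kotlin
import kotlin.random.Random

// Toy illustration: each client pair (i, j) shares a random mask; client i
// adds it and client j subtracts it, so individual updates look random but
// the server-side sum reveals only the aggregate.
fun maskClientUpdates(updates: List<Long>, sharedSeed: Long): List<Long> {
    val masked = updates.toMutableList()
    for (i in updates.indices) {
        for (j in i + 1 until updates.size) {
            val mask = Random(sharedSeed * 31 + i * 1_000L + j).nextLong(1_000_000)
            masked[i] += mask // client i adds the pairwise mask
            masked[j] -= mask // client j subtracts the same mask
        }
    }
    return masked
}
```

Each masked value on its own is uninformative, yet summing all masked updates gives exactly the sum of the raw updates, since every mask appears once with each sign.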
The Octomil Manifest (`octomil.yaml`) is a declarative config file that describes which models your app needs, their delivery mode (bundled, managed, or cloud), and which capability each model serves. Generate it with the CLI:

```bash
octomil manifest init
```

The Android SDK reads `octomil.yaml` via `AppManifest` and `ModelCatalogService` to handle model downloads and runtime resolution automatically.
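For orientation, a manifest covering the three delivery modes might look roughly like the following. The field names here are illustrative assumptions, not the authoritative schema; `octomil manifest init` generates the real one:

```yaml
# octomil.yaml (illustrative sketch; field names are assumptions)
models:
  - id: classifier
    capability: classification
    delivery: bundled   # shipped in app assets
  - id: chat
    capability: text-generation
    delivery: managed   # downloaded and updated OTA by the SDK
  - id: fraud_detection
    capability: regression
    delivery: cloud     # resolved server-side at runtime
```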
Minimal examples for the three main Android SDK capabilities:
| Sample | Capability | Key API |
|---|---|---|
| `ChatSampleActivity` | Text generation | `Octomil.responses.stream()` |
| `TranscriptionSampleActivity` | Speech-to-text | `Octomil.audio.transcribe()` |
| `PredictionSampleActivity` | Next-word prediction | `Octomil.text.predict()` |
Each sample is a single Activity in the `samples/` module and shows the shortest useful local integration path for one capability. Build and install on a physical Android device:

```bash
./gradlew :samples:installDebug
```

Prerequisites: one deployed model per capability on the device. These samples use `Octomil.init(...)` and local model resolution only; they do not cover auth, pairing, or control-plane setup. See `samples/README.md` for setup.
Need the full device app? The Octomil Android App is the broader evaluation app for model testing, pairing, recovery, and golden-path automation. These samples are intentionally narrower: one feature, local assumptions, copyable code.
See CONTRIBUTING.md.