
A high-performance inference engine for AI models on Apple Silicon. Key features:
- Simple, high-level API
- Unified model configurations, making it easy to add support for new models
- Traceable computations to ensure correctness against the source-of-truth implementation
- Utilizes unified memory on Apple devices
For a detailed explanation of the architecture, please refer to the documentation.
uzu uses its own model format. You can download a test model using the sample script:

```bash
./scripts/download_test_model.sh
```

Or you can download any supported model that has already been converted, using:

```bash
cd ./tools/helpers/
uv run main.py list-models               # show the list of supported models
uv run main.py download-model {REPO_ID}  # download a specific model by repo_id
```

After that, you can find the downloaded model at `./models/{VERSION}/`.
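Once a model is downloaded, the engine takes its directory as a plain filesystem path. A minimal sketch of assembling that path with `std::path` — the `model_dir` helper and the concrete version/model values are illustrative assumptions, not part of uzu's API:

```rust
use std::path::PathBuf;

// Hypothetical helper: builds ./models/{VERSION}/{MODEL_NAME} from its parts.
fn model_dir(root: &str, version: &str, model_name: &str) -> PathBuf {
    [root, version, model_name].iter().collect()
}

fn main() {
    // Example values only; use the version and model name you actually downloaded.
    let path = model_dir("./models", "0.1.0", "Llama-3.2-1B-Instruct");
    // This PathBuf is what you would later hand to the engine.
    println!("{}", path.display()); // ./models/0.1.0/Llama-3.2-1B-Instruct
}
```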
Alternatively, you can export a specific model yourself with lalamo:

```bash
git clone https://github.com/trymirai/lalamo.git
cd lalamo
```

After that, you can retrieve the list of supported models:

```bash
uv run lalamo list-models
```

Then, export the specific one:

```bash
uv run lalamo convert meta-llama/Llama-3.2-1B-Instruct
```

uzu is also available through language bindings:
- uzu-swift - a prebuilt Swift framework, ready to use with SPM
- uzu-ts - a prebuilt TypeScript framework for the Node.js ecosystem
You can run uzu in CLI mode:

```bash
cargo run --release -p cli -- help
```

```text
Usage: cli [COMMAND]

Commands:
  run    Run a model with the specified path
  serve  Start a server with the specified model path
  bench  Run benchmarks for the specified model
  help   Print this message or the help of the given subcommand(s)
```

For now, we only support the Metal backend, so to compile the corresponding kernels you'll need to install Xcode and run the following commands:

```bash
xcodebuild -runFirstLaunch
xcodebuild -downloadComponent MetalToolchain
```

To use uzu as a Rust library, first add the uzu dependency to your `Cargo.toml`:
```toml
[dependencies]
uzu = { git = "https://github.com/trymirai/uzu", branch = "main", package = "uzu" }
```

Then, create an inference `Session` with a specific model and configuration:
```rust
use std::path::PathBuf;
use uzu::session::{
    Session,
    config::{DecodingConfig, RunConfig},
    types::{Input, Output},
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model_path = PathBuf::from("MODEL_PATH");
    let mut session = Session::new(model_path, DecodingConfig::default())?;

    let input = Input::Text(String::from("Tell about London"));
    let output = session.run(
        input,
        RunConfig::default().tokens_limit(128),
        Some(|_: Output| true),
    )?;
    println!("{}", output.text.original);

    Ok(())
}
```

To run benchmarks, you can use the following command:
```bash
cargo run --release -p cli -- bench ./models/{ENGINE_VERSION}/{MODEL_NAME} ./models/{ENGINE_VERSION}/{MODEL_NAME}/benchmark_task.json ./models/{ENGINE_VERSION}/{MODEL_NAME}/benchmark_result.json
```

`benchmark_task.json` is generated automatically when the model is downloaded via `./tools/helpers/`, as described earlier.
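The `Session::run` example above passes a callback that always returns `true`. Assuming the boolean return value tells the engine whether to keep generating (an inference on my part, not documented here), early stopping can be sketched with a self-contained mock — `generate` below stands in for the engine's token loop and is not uzu's API:

```rust
// Mock generation loop: feeds tokens to a callback until it asks to stop.
// This imitates the Session::run progress-callback pattern; `generate`
// is an illustrative stand-in, not part of uzu.
fn generate(tokens: &[&str], mut on_token: impl FnMut(&str) -> bool) -> Vec<String> {
    let mut produced = Vec::new();
    for t in tokens {
        produced.push(t.to_string());
        if !on_token(t) {
            break; // callback returned false: stop early
        }
    }
    produced
}

fn main() {
    let mut count = 0;
    // Stop after four tokens, e.g. to cap output from the caller's side.
    let out = generate(&["London", "is", "the", "capital", "of", "England"], |_| {
        count += 1;
        count < 4
    });
    assert_eq!(out, vec!["London", "is", "the", "capital"]);
}
```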
This project is licensed under the MIT License. See the LICENSE file for details.