llmkube-metal-agent \
--omlx-bin /path/to/omlx # Auto-detected from Homebrew if not set
```
## Ollama Runtime
The Metal Agent also supports [Ollama](https://ollama.com) as a runtime backend. Since Ollama 0.19 uses MLX natively on Apple Silicon, this gives you fast inference with the tool most Mac users already have installed.
### Prerequisites
Install Ollama if you haven't already:
```bash
brew install ollama
```
### Usage
Start Ollama (if not already running as a menu bar app):
```bash
ollama serve
```
Start the Metal Agent with the Ollama runtime:
```bash
llmkube-metal-agent --runtime ollama
```
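
If the agent starts before Ollama is listening, its first requests to the runtime can fail. A minimal sketch (an assumption for illustration, not part of the agent: it only assumes `curl` is available and that Ollama serves its API on the default `http://localhost:11434`) that waits for the endpoint to respond before launching the agent:

```bash
# Hypothetical helper: poll a URL until it responds, up to `tries` attempts.
# Ollama listens on http://localhost:11434 by default.
wait_for_url() {
  local url="$1" tries="${2:-30}" i=1
  while [ "$i" -le "$tries" ]; do
    curl -fsS "$url" >/dev/null 2>&1 && return 0
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   wait_for_url http://localhost:11434/ && llmkube-metal-agent --runtime ollama
```

This is only needed in scripted setups (e.g. launchd or CI); interactively, starting `ollama serve` first is enough.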
Deploy a model. The agent will pull the model through Ollama automatically:
```bash
llmkube deploy llama-3.2-3b --gpu --accelerator metal
```
The agent maps LLMKube catalog names to Ollama model tags (e.g., `llama-3.2-3b` becomes `llama3.2:3b`). If the model isn't already downloaded, Ollama pulls it from the Ollama registry.
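
The mapping table itself lives in the agent; purely as an illustration, the documented `llama-3.2-3b` → `llama3.2:3b` example can be approximated by treating the text after the last hyphen as the size tag and removing the remaining hyphens. This is a hypothetical sketch, not the agent's actual code:

```bash
# Hypothetical sketch of a catalog-name -> Ollama-tag conversion
# (illustrative only; the agent's real mapping may differ).
# Reproduces the documented example: llama-3.2-3b -> llama3.2:3b.
to_ollama_tag() {
  local name="$1"
  local size="${name##*-}"     # text after the last hyphen, e.g. "3b"
  local family="${name%-*}"    # everything before it, e.g. "llama-3.2"
  printf '%s:%s\n' "${family//-/}" "$size"
}

to_ollama_tag llama-3.2-3b   # prints: llama3.2:3b
```

If you want to skip the agent's automatic pull, you can also pre-download a model yourself with `ollama pull llama3.2:3b`.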
### Differences from llama-server and oMLX
| | llama-server | oMLX | Ollama |
|---|---|---|---|
| Model format | GGUF | MLX | GGUF (via Ollama registry) |