AI-powered image processing pipeline written in Rust
Analyze, embed, and tag images locally using SigLIP — no cloud required.
Photon takes images as input and outputs structured JSON: 768-dim vector embeddings, semantic tags, EXIF metadata, content hashes, and thumbnails. It's a pure processing pipeline — no database, no server, no cloud dependency. Process locally, store wherever you want.
image.jpg ──▶ Photon ──▶ { embedding, tags, metadata, hash, thumbnail }
- SigLIP Embeddings — 768-dimensional vectors for semantic similarity search, powered by ONNX Runtime
- Zero-Shot Tagging — 68,000+ term vocabulary (WordNet + curated visual terms) scored locally via SigLIP
- EXIF Extraction — Camera, GPS coordinates, datetime, ISO, aperture, focal length
- Content Hashing — BLAKE3 cryptographic hash + perceptual hash for deduplication and similarity
- Thumbnails — WebP generation with configurable size and quality
- LLM Descriptions — BYOK enrichment via Ollama, Anthropic, OpenAI, Hyperbolic
- Batch Processing — Parallel workers with progress bar and skip-existing support
- Single Binary — No Python, no Docker, no runtime dependencies
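The perceptual hash listed above is typically compared by Hamming distance: the fewer bits that differ, the more alike two images look. The sketch below is not taken from Photon's source. It assumes the `perceptual_hash` field is a hex string (as the sample output suggests), and the helper names and the `max_bits` threshold are hypothetical.

```rust
/// Parse a hex string into bytes. Returns None for odd-length or non-hex input.
/// Assumption: Photon's perceptual hash is hex-encoded.
fn hex_to_bytes(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.is_ascii() {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Number of differing bits between two equal-length hex hashes.
fn hamming_distance(a: &str, b: &str) -> Option<u32> {
    let (a, b) = (hex_to_bytes(a)?, hex_to_bytes(b)?);
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(&b).map(|(x, y)| (x ^ y).count_ones()).sum())
}

/// Treat two images as near-duplicates when at most `max_bits` bits differ.
/// `max_bits` is a hypothetical tuning knob, not a Photon setting.
fn is_near_duplicate(a: &str, b: &str, max_bits: u32) -> bool {
    matches!(hamming_distance(a, b), Some(d) if d <= max_bits)
}
```

Exact duplicates are better caught with the BLAKE3 content hash, which only needs a string equality check; the Hamming comparison is for resized or re-encoded copies.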
pip install photon-imager
# Download the SigLIP model (~350 MB, one-time)
photon models download
# Process a single image
photon process photo.jpg
Available for macOS (Apple Silicon) and Linux (x86_64, aarch64).
git clone https://github.com/kaminocorp/photon.git
cd photon
cargo build --release
# Download the SigLIP model (~350 MB, one-time)
cargo run --release -- models download
# Process a single image
cargo run --release -- process photo.jpg
# Process an entire directory
cargo run --release -- process ./photos/ --format jsonl --output results.jsonl
# Single image → JSON to stdout
photon process image.jpg
# Directory → JSONL file (one JSON object per line)
photon process ./photos/ --format jsonl --output results.jsonl
# Parallel processing with 8 workers
photon process ./photos/ --parallel 8 --output results.jsonl
# Skip already-processed images on re-runs
photon process ./photos/ --output results.jsonl --skip-existing
# Higher quality embeddings (384px model, slower but more detailed)
photon process image.jpg --quality high
# Local via Ollama
photon process image.jpg --llm ollama --llm-model llama3.2-vision
# Anthropic API
photon process image.jpg --llm anthropic --llm-model claude-sonnet-4-5-20250929
# OpenAI API
photon process image.jpg --llm openai --llm-model gpt-4o-mini
# Batch with LLM enrichment
photon process ./photos/ --format jsonl --output results.jsonl --llm anthropic
# Metadata and hashes only (no AI)
photon process image.jpg --no-embedding --no-tagging
# Skip thumbnail generation
photon process image.jpg --no-thumbnail
# Custom thumbnail size
photon process image.jpg --thumbnail-size 128
photon models download # Download SigLIP models from HuggingFace
photon models list # Show installed models and status
photon models path # Show model storage directory
photon config init # Create config file with defaults
photon config show # Display current settings
photon config path # Show config file location
Photon runs a sequential pipeline where each stage is independent and optional:
Input        ┌──────────┐  ┌────────┐  ┌──────┐  ┌──────────────┐  ┌───────────┐  ┌────────┐
image.jpg ──▶│ Validate │─▶│ Decode │─▶│ EXIF │─▶│     Hash     │─▶│ Thumbnail │─▶│ Embed  │──▶ ...
             │          │  │        │  │      │  │ BLAKE3+pHash │  │           │  │ SigLIP │
             └──────────┘  └────────┘  └──────┘  └──────────────┘  └───────────┘  └────────┘
       ┌───────────┐  ┌────────────┐    Output
... ──▶│ Zero-Shot │─▶│ LLM Enrich │──▶ Structured JSON
       │   Tags    │  │   (BYOK)   │    { embedding, tags,
       │ (SigLIP)  │  │            │      metadata, hash, ... }
       └───────────┘  └────────────┘
| Stage | What it does | Speed |
|---|---|---|
| Validate | Check file exists, size limits, format detection via magic bytes | <1ms |
| Decode | Load image pixels (JPEG, PNG, WebP, GIF, TIFF, BMP, AVIF) | ~5ms |
| EXIF | Extract camera, GPS, datetime, shooting parameters | ~2ms |
| Hash | BLAKE3 content hash (dedup) + perceptual hash (similarity) | ~3ms |
| Thumbnail | Aspect-preserving resize to WebP, base64 encoded | ~5ms |
| Embed | SigLIP vision encoder → 768-dim L2-normalized vector | ~200ms |
| Tag | Dot product against 68K vocabulary, SigLIP sigmoid scoring | ~2ms |
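The Tag stage in the table can be sketched as follows. SigLIP turns an image-text similarity into an independent per-term probability with a sigmoid over a scaled, shifted dot product. This is an illustration rather than Photon's code: `score_tags`, the tuple-based vocabulary layout, and the `logit_scale` and `logit_bias` parameters are hypothetical stand-ins for the model's learned values.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Logistic sigmoid, mapping any logit to (0, 1).
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Score every vocabulary term against one image embedding and keep the
/// `max_tags` most confident ones, highest first.
/// Assumptions: both embeddings are L2-normalized; `logit_scale` and
/// `logit_bias` come from the model (the caller supplies them here).
fn score_tags<'a>(
    image: &[f32],
    vocab: &'a [(&'a str, Vec<f32>)],
    logit_scale: f32,
    logit_bias: f32,
    max_tags: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = vocab
        .iter()
        .map(|(name, emb)| (*name, sigmoid(logit_scale * dot(image, emb) + logit_bias)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(max_tags);
    scored
}
```

Because each term is scored independently, the confidences do not sum to 1 as softmax outputs would, so several tags can be highly confident at once.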
Each processed image produces a JSON object:
{
  "file_path": "/photos/beach.jpg",
  "file_name": "beach.jpg",
  "content_hash": "a7f3b2c1d4e5...",
  "width": 4032,
  "height": 3024,
  "format": "jpeg",
  "file_size": 2458624,
  "embedding": [0.023, -0.156, 0.089, "... 768 floats"],
  "tags": [
    { "name": "beach", "confidence": 0.94, "category": "scene" },
    { "name": "ocean", "confidence": 0.87, "category": "scene" },
    { "name": "tropical", "confidence": 0.76, "category": "style" }
  ],
  "exif": {
    "captured_at": "2024-07-15T14:32:00",
    "camera_model": "iPhone 15 Pro",
    "gps_latitude": 25.7617,
    "gps_longitude": -80.1918
  },
  "thumbnail": "base64-encoded-webp...",
  "perceptual_hash": "d4c3b2a1..."
}
Use --format jsonl for batch processing: one JSON object per line, streamed as each image completes.
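Once the `embedding` arrays have been parsed out of the output, semantic similarity comes down to comparing vectors. The sketch below shows a brute-force nearest-neighbour lookup using cosine similarity; for L2-normalized vectors like Photon's this equals the plain dot product. The function names are hypothetical and JSON parsing is left out, so the inputs are assumed to be already-loaded `f32` slices.

```rust
/// Euclidean (L2) length of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Cosine similarity in [-1, 1]. Returns None for empty, mismatched,
/// or zero-length vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let denom = l2_norm(a) * l2_norm(b);
    if denom == 0.0 {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>() / denom)
}

/// Index and score of the corpus embedding most similar to `query`.
/// A linear scan: fine for small collections, while larger ones usually
/// call for a vector index in your own storage layer.
fn nearest(query: &[f32], corpus: &[Vec<f32>]) -> Option<(usize, f32)> {
    corpus
        .iter()
        .enumerate()
        .filter_map(|(i, e)| cosine_similarity(query, e).map(|s| (i, s)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```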
Photon uses a layered configuration system: code defaults < config file < CLI flags.
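One common way to implement this kind of precedence is to start from the built-in defaults and apply each layer's explicitly set values in order, so later layers win. The sketch below illustrates the idea only; `Settings`, `Overrides`, and `resolve` are hypothetical names, not types from photon-core, and the three fields are a small subset chosen for brevity (their defaults match the sample config.toml).

```rust
/// Fully resolved settings (hypothetical subset of Photon's config).
#[derive(Debug, Clone, PartialEq)]
struct Settings {
    parallel_workers: usize,
    thumbnail_size: u32,
    max_tags: usize,
}

/// One configuration layer: only the values that layer sets explicitly.
#[derive(Debug, Default, Clone)]
struct Overrides {
    parallel_workers: Option<usize>,
    thumbnail_size: Option<u32>,
    max_tags: Option<usize>,
}

impl Settings {
    /// Built-in code defaults, the lowest-priority layer.
    fn defaults() -> Self {
        Settings { parallel_workers: 4, thumbnail_size: 256, max_tags: 15 }
    }

    /// Overwrite only the fields that `layer` sets.
    fn apply(mut self, layer: &Overrides) -> Self {
        if let Some(v) = layer.parallel_workers {
            self.parallel_workers = v;
        }
        if let Some(v) = layer.thumbnail_size {
            self.thumbnail_size = v;
        }
        if let Some(v) = layer.max_tags {
            self.max_tags = v;
        }
        self
    }
}

/// code defaults < config file < CLI flags
fn resolve(file: &Overrides, cli: &Overrides) -> Settings {
    Settings::defaults().apply(file).apply(cli)
}
```

With a config file that sets `parallel_workers = 2` and a `--parallel 8` flag on the command line, this resolves to 8 workers, while any setting that neither layer touches keeps its default.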
photon config init # Creates ~/.photon/config.toml (or platform-appropriate path)
Key settings in config.toml:
[processing]
parallel_workers = 4
supported_formats = ["jpg", "jpeg", "png", "webp", "heic", "raw", "cr2", "nef", "arw"]
[limits]
max_file_size_mb = 100
max_image_dimension = 10000
embed_timeout_ms = 30000
[embedding]
model = "siglip-base-patch16" # or "siglip-base-patch16-384" for higher quality
[thumbnail]
enabled = true
size = 256
[tagging]
enabled = true
max_tags = 15
[logging]
level = "info" # error, warn, info, debug, trace
Photon's processing engine lives in the photon-core crate and can be embedded directly in Rust applications:
use photon_core::{Config, ImageProcessor};
use std::path::Path;
#[tokio::main]
async fn main() -> photon_core::Result<()> {
    let config = Config::load()?;
    let mut processor = ImageProcessor::new(&config);
    // Load AI components (optional; the pipeline works without them)
    processor.load_embedding(&config)?;
    processor.load_tagging(&config)?;
    let result = processor.process(Path::new("photo.jpg")).await?;
    println!("Hash: {}", result.content_hash);
    println!("Embedding: {} dimensions", result.embedding.len());
    println!("Tags: {:?}", result.tags.iter().map(|t| &t.name).collect::<Vec<_>>());
    Ok(())
}
Add to your Cargo.toml:
[dependencies]
photon-core = { git = "https://github.com/kaminocorp/photon.git" }
tokio = { version = "1", features = ["full"] }
Photon is designed to feed into your own storage and search infrastructure. Pipe the output to your ingestion scripts:
# Stream results into your backend
photon process ./photos/ --format jsonl | your-ingestion-script
# Or process to file, then ingest
photon process ./photos/ --format jsonl --output results.jsonl
python ingest.py results.jsonl
For example, to store embeddings in PostgreSQL with pgvector:
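import json
import subprocess

# Run Photon on one image and parse the JSON it prints to stdout
result = subprocess.run(
    ["photon", "process", "photo.jpg"],
    capture_output=True, text=True, check=True
)
data = json.loads(result.stdout)

# `db` is assumed to be an open database cursor (for example from psycopg); it is not defined here
db.execute(
    "INSERT INTO images (path, hash, embedding, tags) VALUES (%s, %s, %s, %s)",
    [data["file_path"], data["content_hash"], data["embedding"], json.dumps(data["tags"])]
)
photon/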
├── crates/
│ ├── photon/ # CLI binary (thin clap wrapper)
│ └── photon-core/ # Embeddable library
│ └── src/
│ ├── pipeline/ # Processing stages (decode, metadata, hash, thumbnail)
│ ├── embedding/ # SigLIP vision encoder (ONNX Runtime)
│ ├── tagging/ # Zero-shot classification (68K vocabulary)
│ ├── llm/ # LLM provider abstraction (Ollama, Anthropic, OpenAI, Hyperbolic)
│ └── output.rs # JSON/JSONL serialization
├── data/vocabulary/ # WordNet nouns + supplemental visual terms
├── tests/fixtures/ # Test images
└── docs/ # Phase plans and changelogs
Two-crate design: photon-core contains all processing logic and can be used as a library. photon is a thin CLI that calls into it. This means you can embed Photon's pipeline directly in your Rust application without pulling in CLI dependencies.
| Phase | Status |
|---|---|
| Foundation (CLI, config, logging) | Complete |
| Image pipeline (decode, EXIF, hashing, thumbnails) | Complete |
| SigLIP embedding (768-dim vectors via ONNX) | Complete |
| Zero-shot tagging (68K vocabulary, self-organizing pools) | Complete |
| LLM enrichment (BYOK descriptions) | Complete |
| Polish & release (progress bar, skip-existing, benchmarks) | Complete |
- Rust 2021 edition (stable)
- ~350 MB disk for SigLIP model (downloaded the first time you run photon models download)
- Tested on macOS (Apple Silicon) and Linux (aarch64/x86_64)
Contributions are welcome. Please open an issue to discuss significant changes before submitting a PR.
cargo test # Run all tests (226 across workspace)
cargo clippy # Lint
cargo fmt # Format
cargo bench -p photon-core # Run benchmarks
Dual-licensed under MIT or Apache 2.0, at your option.