Photon

AI-powered image processing pipeline written in Rust
Analyze, embed, and tag images locally using SigLIP — no cloud required.

Quick Start  •  Usage  •  How It Works  •  Configuration  •  Library Usage

Photon takes images as input and outputs structured JSON: 768-dim vector embeddings, semantic tags, EXIF metadata, content hashes, and thumbnails. It's a pure processing pipeline — no database, no server, no cloud dependency. Process locally, store wherever you want.

image.jpg ──▶ Photon ──▶ { embedding, tags, metadata, hash, thumbnail }

Features

  • SigLIP Embeddings — 768-dimensional vectors for semantic similarity search, powered by ONNX Runtime
  • Zero-Shot Tagging — 68,000+ term vocabulary (WordNet + curated visual terms) scored locally via SigLIP
  • EXIF Extraction — Camera, GPS coordinates, datetime, ISO, aperture, focal length
  • Content Hashing — BLAKE3 cryptographic hash + perceptual hash for deduplication and similarity
  • Thumbnails — WebP generation with configurable size and quality
  • LLM Descriptions — BYOK enrichment via Ollama, Anthropic, OpenAI, Hyperbolic
  • Batch Processing — Parallel workers with progress bar and skip-existing support
  • Single Binary — No Python, no Docker, no runtime dependencies

Quick Start

Install via PyPI (easiest)

pip install photon-imager

# Download the SigLIP model (~350 MB, one-time)
photon models download

# Process a single image
photon process photo.jpg

Available for macOS (Apple Silicon) and Linux (x86_64, aarch64).

Build from source

git clone https://github.com/kaminocorp/photon.git
cd photon
cargo build --release

# Download the SigLIP model (~350 MB, one-time)
cargo run --release -- models download

# Process a single image
cargo run --release -- process photo.jpg

# Process an entire directory
cargo run --release -- process ./photos/ --format jsonl --output results.jsonl

Usage

Process Images

# Single image → JSON to stdout
photon process image.jpg

# Directory → JSONL file (one JSON object per line)
photon process ./photos/ --format jsonl --output results.jsonl

# Parallel processing with 8 workers
photon process ./photos/ --parallel 8 --output results.jsonl

# Skip already-processed images on re-runs
photon process ./photos/ --output results.jsonl --skip-existing

# Higher quality embeddings (384px model, slower but more detailed)
photon process image.jpg --quality high

LLM Descriptions (BYOK)

# Local via Ollama
photon process image.jpg --llm ollama --llm-model llama3.2-vision

# Anthropic API
photon process image.jpg --llm anthropic --llm-model claude-sonnet-4-5-20250929

# OpenAI API
photon process image.jpg --llm openai --llm-model gpt-4o-mini

# Batch with LLM enrichment
photon process ./photos/ --format jsonl --output results.jsonl --llm anthropic

Control What Gets Generated

# Metadata and hashes only (no AI)
photon process image.jpg --no-embedding --no-tagging

# Skip thumbnail generation
photon process image.jpg --no-thumbnail

# Custom thumbnail size
photon process image.jpg --thumbnail-size 128

Manage Models

photon models download    # Download SigLIP models from HuggingFace
photon models list        # Show installed models and status
photon models path        # Show model storage directory

Configuration

photon config init        # Create config file with defaults
photon config show        # Display current settings
photon config path        # Show config file location

How It Works

Photon runs a sequential pipeline where each stage is independent and optional:

 Input        ┌──────────┐ ┌──────┐ ┌──────┐ ┌────────────┐ ┌───────┐ ┌───────┐
 image.jpg ──▶│ Validate │▶│Decode│▶│ EXIF │▶│    Hash    │▶│Thumb- │▶│ Embed │──▶ ...
              │          │ │      │ │      │ │BLAKE3+pHash│ │ nail  │ │SigLIP │
              └──────────┘ └──────┘ └──────┘ └────────────┘ └───────┘ └───────┘

 ... ──▶ ┌──────────┐ ┌─────────────┐        Output
         │Zero-Shot │▶│  LLM Enrich │──▶  Structured JSON
         │  Tags    │ │  (BYOK)     │     { embedding, tags,
         │ (SigLIP) │ │             │       metadata, hash, ... }
         └──────────┘ └─────────────┘
| Stage     | What it does                                                                   | Speed  |
|-----------|--------------------------------------------------------------------------------|--------|
| Validate  | Check file exists, size limits, format detection via magic bytes                | <1ms   |
| Decode    | Load image pixels (JPEG, PNG, WebP, GIF, TIFF, BMP, AVIF)                       | ~5ms   |
| EXIF      | Extract camera, GPS, datetime, shooting parameters                              | ~2ms   |
| Hash      | BLAKE3 content hash (dedup) + perceptual hash (similarity)                      | ~3ms   |
| Thumbnail | Aspect-preserving resize to WebP, base64 encoded                                | ~5ms   |
| Embed     | SigLIP vision encoder → 768-dim L2-normalized vector                            | ~200ms |
| Tag       | Dot product against 68K vocabulary, SigLIP sigmoid scoring (see sketch below)   | ~2ms   |
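
The Embed and Tag stages are where the AI work happens: the image embedding is L2-normalized, compared against precomputed text embeddings for every vocabulary term, and each similarity is pushed through SigLIP's sigmoid to produce an independent confidence per tag. A minimal sketch of that scoring math in Python with NumPy (the embeddings below are random stand-ins and the scale/bias values are illustrative, not Photon's actual weights):

import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Illustrative inputs: one 768-dim image embedding and a few candidate terms.
# In Photon the text side comes from the precomputed 68K-term vocabulary.
image_emb = l2_normalize(np.random.randn(768))
vocab = ["beach", "ocean", "skyscraper"]
text_embs = l2_normalize(np.random.randn(len(vocab), 768))

# SigLIP scores each pair independently: sigmoid(scale * dot + bias).
# scale and bias are learned model parameters; these values are placeholders.
scale, bias = 100.0, -10.0
confidences = 1.0 / (1.0 + np.exp(-(scale * (text_embs @ image_emb) + bias)))

for term, conf in sorted(zip(vocab, confidences), key=lambda t: -t[1]):
    print(f"{term}: {conf:.2f}")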

Output Format

Each processed image produces a JSON object:

{
  "file_path": "/photos/beach.jpg",
  "file_name": "beach.jpg",
  "content_hash": "a7f3b2c1d4e5...",
  "width": 4032,
  "height": 3024,
  "format": "jpeg",
  "file_size": 2458624,
  "embedding": [0.023, -0.156, 0.089, "... 768 floats"],
  "tags": [
    { "name": "beach", "confidence": 0.94, "category": "scene" },
    { "name": "ocean", "confidence": 0.87, "category": "scene" },
    { "name": "tropical", "confidence": 0.76, "category": "style" }
  ],
  "exif": {
    "captured_at": "2024-07-15T14:32:00",
    "camera_model": "iPhone 15 Pro",
    "gps_latitude": 25.7617,
    "gps_longitude": -80.1918
  },
  "thumbnail": "base64-encoded-webp...",
  "perceptual_hash": "d4c3b2a1..."
}

Use --format jsonl for batch processing — one JSON object per line, streamed as each image completes.
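
Because each line of the JSONL output is a self-contained JSON object, downstream scripts can consume results with a few lines of code. A rough sketch in Python (assuming the results.jsonl produced above, and treating perceptual_hash as a hex string) that flags visually similar pairs by Hamming distance:

import json

# One JSON object per line from a batch run.
with open("results.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

def hamming(hex_a, hex_b):
    # Bit-level distance between two equal-length hex-encoded hashes.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

# A threshold of 8 differing bits is a starting point, not a rule.
for i, a in enumerate(records):
    for b in records[i + 1:]:
        if hamming(a["perceptual_hash"], b["perceptual_hash"]) <= 8:
            print(f"near-duplicate: {a['file_name']} ~ {b['file_name']}")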

Configuration

Photon uses a layered configuration system: code defaults < config file < CLI flags.

photon config init    # Creates ~/.photon/config.toml (or platform-appropriate path)

Key settings in config.toml:

[processing]
parallel_workers = 4
supported_formats = ["jpg", "jpeg", "png", "webp", "heic", "raw", "cr2", "nef", "arw"]

[limits]
max_file_size_mb = 100
max_image_dimension = 10000
embed_timeout_ms = 30000

[embedding]
model = "siglip-base-patch16"         # or "siglip-base-patch16-384" for higher quality

[thumbnail]
enabled = true
size = 256

[tagging]
enabled = true
max_tags = 15

[logging]
level = "info"                        # error, warn, info, debug, trace

Library Usage

Photon's processing engine lives in the photon-core crate and can be embedded directly in Rust applications:

use photon_core::{Config, ImageProcessor};
use std::path::Path;

#[tokio::main]
async fn main() -> photon_core::Result<()> {
    let config = Config::load()?;
    let mut processor = ImageProcessor::new(&config);

    // Load AI components (optional — pipeline works without them)
    processor.load_embedding(&config)?;
    processor.load_tagging(&config)?;

    let result = processor.process(Path::new("photo.jpg")).await?;

    println!("Hash:      {}", result.content_hash);
    println!("Embedding: {} dimensions", result.embedding.len());
    println!("Tags:      {:?}", result.tags.iter().map(|t| &t.name).collect::<Vec<_>>());

    Ok(())
}

Add to your Cargo.toml:

[dependencies]
photon-core = { git = "https://github.com/kaminocorp/photon.git" }
tokio = { version = "1", features = ["full"] }

Integrating with Your Backend

Photon is designed to feed into your own storage and search infrastructure. Pipe the output to your ingestion scripts:

# Stream results into your backend
photon process ./photos/ --format jsonl | your-ingestion-script

# Or process to file, then ingest
photon process ./photos/ --format jsonl --output results.jsonl
python ingest.py results.jsonl
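
The ingestion script itself is whatever your backend needs; as a minimal sketch, an ingest.py along these lines works with either invocation above, reading JSONL from a file argument or from stdin when piped:

import fileinput, json

# fileinput reads the files named on the command line, or stdin if none are given.
for line in fileinput.input():
    if not line.strip():
        continue
    record = json.loads(line)
    # Replace this print with your own storage call (database insert, queue publish, ...).
    print(record["content_hash"], record["file_name"], len(record.get("embedding", [])))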

Example — storing embeddings in PostgreSQL with pgvector (here via psycopg2; adapt the connection string and schema to your own setup):

import subprocess, json
import psycopg2

result = subprocess.run(
    ["photon", "process", "photo.jpg"],
    capture_output=True, text=True, check=True
)
data = json.loads(result.stdout)

# Assumes an `images` table with a pgvector `embedding` column;
# pgvector accepts the embedding in its '[x, y, ...]' text form.
conn = psycopg2.connect("dbname=photos")
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO images (path, hash, embedding, tags) VALUES (%s, %s, %s, %s)",
        [data["file_path"], data["content_hash"], str(data["embedding"]), json.dumps(data["tags"])],
    )

Architecture

photon/
├── crates/
│   ├── photon/              # CLI binary (thin clap wrapper)
│   └── photon-core/         # Embeddable library
│       └── src/
│           ├── pipeline/    # Processing stages (decode, metadata, hash, thumbnail)
│           ├── embedding/   # SigLIP vision encoder (ONNX Runtime)
│           ├── tagging/     # Zero-shot classification (68K vocabulary)
│           ├── llm/         # LLM provider abstraction (Ollama, Anthropic, OpenAI, Hyperbolic)
│           └── output.rs    # JSON/JSONL serialization
├── data/vocabulary/         # WordNet nouns + supplemental visual terms
├── tests/fixtures/          # Test images
└── docs/                    # Phase plans and changelogs

Two-crate design: photon-core contains all processing logic and can be used as a library. photon is a thin CLI that calls into it. This means you can embed Photon's pipeline directly in your Rust application without pulling in CLI dependencies.

Project Status

| Phase                                                      | Status   |
|------------------------------------------------------------|----------|
| Foundation (CLI, config, logging)                          | Complete |
| Image pipeline (decode, EXIF, hashing, thumbnails)         | Complete |
| SigLIP embedding (768-dim vectors via ONNX)                | Complete |
| Zero-shot tagging (68K vocabulary, self-organizing pools)  | Complete |
| LLM enrichment (BYOK descriptions)                         | Complete |
| Polish & release (progress bar, skip-existing, benchmarks) | Complete |

Requirements

  • Rust 2021 edition (stable)
  • ~350 MB disk for SigLIP model (downloaded on first models download)
  • Tested on macOS (Apple Silicon) and Linux (aarch64/x86_64)

Contributing

Contributions are welcome. Please open an issue to discuss significant changes before submitting a PR.

cargo test              # Run all tests (226 across workspace)
cargo clippy            # Lint
cargo fmt               # Format
cargo bench -p photon-core  # Run benchmarks

License

Dual-licensed under MIT or Apache 2.0, at your option.

About

Photon is a perception engine for image pipelines: ingest, analyze, extract, and semantically index visual data. Built for high-volume processing and designed to serve as the core visual intelligence layer within larger data mining systems.
