High-performance log normalization pipeline written in Go.
Raw logs go in — from any provider, any format.
Structured, canonical, token-efficient events come out.
Quickstart · How It Works · Library API · Connectors · Taxonomy · Integration Guide · Changelog
Every log provider has a different API, auth mechanism, and response format. Every application logs differently. Lumber normalizes all of it into a single schema using a local embedding model and semantic classification — no cloud API calls, no LLM dependency.
This matters most for AI agent workflows that consume logs. Raw log dumps waste tokens, break on inconsistent formats, and require per-source integration code. Lumber solves that.
```
Raw logs (Vercel, Fly.io, Supabase, …)
        ↓  connectors
Embed → Classify → Canonicalize → Compact
        ↓  engine
Structured canonical events (JSON)
```
- Connectors ingest raw logs from providers via a unified interface (stream or query)
- Embedder converts each log line into a 1024-dim vector using a local ONNX model (~23MB, CPU-only)
- Classifier compares the vector against 42 pre-embedded taxonomy labels via cosine similarity
- Compactor strips noise, truncates stack traces, and deduplicates repeated events
A raw log like this:
```
ERROR [2026-02-19 12:00:00] UserService — connection refused (host=db-primary, port=5432)
```
Becomes:
```json
{
  "type": "ERROR",
  "category": "connection_failure",
  "severity": "error",
  "timestamp": "2026-02-19T12:00:00Z",
  "summary": "UserService — connection refused (host=db-primary)",
  "confidence": 0.91
}
```

## Quickstart

### Prerequisites

- Go 1.23+
- `curl` (for model download)
### Build

```sh
git clone https://github.com/kaminocorp/lumber.git
cd lumber

# Download the embedding model (~23MB) and ONNX runtime library
make download-model

# Build
make build
```

### Run

Stream logs (the default mode):

```sh
export LUMBER_CONNECTOR=vercel
export LUMBER_API_KEY=your-token-here
export LUMBER_VERCEL_PROJECT_ID=prj_your-project-id
./bin/lumber
```

Query a time range:

```sh
./bin/lumber -mode query \
  -from 2026-02-24T00:00:00Z \
  -to 2026-02-24T01:00:00Z
```

Print the version:

```sh
./bin/lumber -version
```

Flags override environment variables when set explicitly.
```
lumber [flags]

  -mode string        Pipeline mode: stream or query
  -connector string   Connector: vercel, flyio, supabase
  -from string        Query start time (RFC3339)
  -to string          Query end time (RFC3339)
  -limit int          Query result limit
  -verbosity string   Verbosity: minimal, standard, full
  -pretty             Pretty-print JSON output
  -log-level string   Log level: debug, info, warn, error
  -version            Print version and exit
```
Examples:

```sh
# Stream from Fly.io with debug logging
./bin/lumber -connector flyio -log-level debug

# Query last hour, pretty-printed
./bin/lumber -mode query -from 2026-02-24T07:00:00Z -to 2026-02-24T08:00:00Z -pretty

# Minimal verbosity for token-efficient output
./bin/lumber -verbosity minimal
```

## Connectors

Three connectors are implemented. Each handles auth, pagination, and rate limiting, and produces RawLog entries that feed into the classification engine.
### Vercel

Connects to the Vercel REST API for project logs.

```sh
export LUMBER_CONNECTOR=vercel
export LUMBER_API_KEY=your-vercel-token
export LUMBER_VERCEL_PROJECT_ID=prj_xxx
export LUMBER_VERCEL_TEAM_ID=team_xxx  # optional
```

### Fly.io

Connects to the Fly.io HTTP logs API.
```sh
export LUMBER_CONNECTOR=flyio
export LUMBER_API_KEY=your-fly-token
export LUMBER_FLY_APP_NAME=your-app-name
```

### Supabase

Connects to the Supabase Analytics API. Queries across multiple log tables.
```sh
export LUMBER_CONNECTOR=supabase
export LUMBER_API_KEY=your-supabase-service-key
export LUMBER_SUPABASE_PROJECT_REF=your-project-ref
export LUMBER_SUPABASE_TABLES=edge_logs,postgres_logs  # optional, defaults to all
```

## Configuration

**Core**

| Variable | Default | Description |
|---|---|---|
| `LUMBER_CONNECTOR` | `vercel` | Log provider: `vercel`, `flyio`, `supabase` |
| `LUMBER_API_KEY` | — | Provider API key/token |
| `LUMBER_ENDPOINT` | — | Provider API endpoint URL override |
| `LUMBER_MODE` | `stream` | Pipeline mode: `stream` or `query` |
| `LUMBER_VERBOSITY` | `standard` | Output verbosity: `minimal`, `standard`, `full` |
| `LUMBER_OUTPUT` | `stdout` | Output destination |
| `LUMBER_OUTPUT_PRETTY` | `false` | Pretty-print JSON output |
**Engine**

| Variable | Default | Description |
|---|---|---|
| `LUMBER_MODEL_PATH` | `models/model_quantized.onnx` | Path to ONNX model file |
| `LUMBER_VOCAB_PATH` | `models/vocab.txt` | Path to tokenizer vocabulary |
| `LUMBER_PROJECTION_PATH` | `models/2_Dense/model.safetensors` | Path to projection weights |
| `LUMBER_CONFIDENCE_THRESHOLD` | `0.5` | Min confidence to classify (0–1) |
| `LUMBER_DEDUP_WINDOW` | `5s` | Dedup window duration (`0` disables) |
| `LUMBER_MAX_BUFFER_SIZE` | `1000` | Max events buffered before force flush |
**Runtime**

| Variable | Default | Description |
|---|---|---|
| `LUMBER_LOG_LEVEL` | `info` | Internal log level: `debug`, `info`, `warn`, `error` |
| `LUMBER_SHUTDOWN_TIMEOUT` | `10s` | Max drain time on shutdown |
| `LUMBER_POLL_INTERVAL` | provider default | Polling interval for stream mode |
**Provider-specific**

| Variable | Provider | Description |
|---|---|---|
| `LUMBER_VERCEL_PROJECT_ID` | Vercel | Vercel project ID |
| `LUMBER_VERCEL_TEAM_ID` | Vercel | Vercel team ID (optional) |
| `LUMBER_FLY_APP_NAME` | Fly.io | Fly.io application name |
| `LUMBER_SUPABASE_PROJECT_REF` | Supabase | Supabase project reference |
| `LUMBER_SUPABASE_TABLES` | Supabase | Comma-separated log table list |
**Verbosity levels**

| Level | Behavior |
|---|---|
| `minimal` | Raw logs truncated to 200 characters |
| `standard` | Raw logs truncated to 2000 characters |
| `full` | Complete raw logs preserved |
## Taxonomy

Lumber ships with 42 leaf labels organized under 8 top-level categories. Every log is classified into exactly one leaf. The taxonomy is opinionated by design — a finite label set makes downstream consumption predictable.
| Category | Labels |
|---|---|
| ERROR | connection_failure, auth_failure, authorization_failure, timeout, runtime_exception, validation_error, out_of_memory, rate_limited, dependency_error |
| REQUEST | success, client_error, server_error, redirect, slow_request |
| DEPLOY | build_started, build_succeeded, build_failed, deploy_started, deploy_succeeded, deploy_failed, rollback |
| SYSTEM | health_check, scaling_event, resource_alert, process_lifecycle, config_change |
| ACCESS | login_success, login_failure, session_expired, permission_change, api_key_event |
| PERFORMANCE | latency_spike, throughput_drop, queue_backlog, cache_event, db_slow_query |
| DATA | query_executed, migration, replication |
| SCHEDULED | cron_started, cron_completed, cron_failed |
Classification uses cosine similarity between the log's embedding vector and the pre-embedded taxonomy label descriptions. Logs whose best match scores below the confidence threshold (default 0.5) are marked UNCLASSIFIED.
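The nearest-label step reduces to an argmax over cosine similarities with an UNCLASSIFIED fallback under the threshold. A minimal sketch of that logic, with toy 3-dim vectors standing in for the real 1024-dim embeddings; `classify` here is an illustration, not the `internal/engine/classifier` implementation:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// classify picks the best-scoring label, or UNCLASSIFIED if no label
// reaches the confidence threshold.
func classify(vec []float32, labels map[string][]float32, threshold float64) (string, float64) {
	best, bestScore := "UNCLASSIFIED", threshold
	for name, lv := range labels {
		if s := cosine(vec, lv); s >= bestScore {
			best, bestScore = name, s
		}
	}
	if best == "UNCLASSIFIED" {
		return best, 0
	}
	return best, bestScore
}

func main() {
	labels := map[string][]float32{
		"connection_failure": {1, 0, 0},
		"timeout":            {0, 1, 0},
	}
	label, score := classify([]float32{0.9, 0.1, 0}, labels, 0.5)
	fmt.Printf("%s %.2f\n", label, score) // connection_failure 0.99
}
```

Because every label is pre-embedded once at startup, classifying a log costs one embedding pass plus 42 dot products.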
Lumber uses MongoDB LEAF (mdbr-leaf-mt), a 23M-parameter text embedding model. It runs locally via ONNX Runtime — no external API calls, no GPU required.
| Property | Value |
|---|---|
| Size | ~23MB (int8 quantized) |
| Output dimension | 1024 (384-dim transformer + learned projection) |
| Tokenizer | WordPiece (30,522 tokens, lowercase) |
| Max sequence length | 128 tokens |
| Runtime | ONNX Runtime via onnxruntime-go |
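The 1024-dim output comes from mean-pooling the transformer's 384-dim token states and passing the result through a learned dense projection. A toy sketch of those two steps, using tiny dimensions (2-dim hidden states, 4-dim output) in place of the real 384 and 1024; this illustrates the math, not the model code:

```go
package main

import "fmt"

// meanPool averages per-token hidden states into one sentence vector.
func meanPool(tokens [][]float32) []float32 {
	out := make([]float32, len(tokens[0]))
	for _, t := range tokens {
		for i, v := range t {
			out[i] += v
		}
	}
	for i := range out {
		out[i] /= float32(len(tokens))
	}
	return out
}

// project applies a dense layer y = Wx + b, lifting the pooled vector
// into the final embedding dimension (384 -> 1024 in the real model).
func project(x []float32, w [][]float32, b []float32) []float32 {
	out := make([]float32, len(w))
	for j, row := range w {
		sum := b[j]
		for i, v := range row {
			sum += v * x[i]
		}
		out[j] = sum
	}
	return out
}

func main() {
	// Two "tokens" with 2-dim hidden states; toy 4x2 projection matrix.
	pooled := meanPool([][]float32{{1, 3}, {3, 1}}) // -> [2, 2]
	w := [][]float32{{1, 0}, {0, 1}, {1, 1}, {0.5, 0.5}}
	b := []float32{0, 0, 0, 0}
	fmt.Println(project(pooled, w, b)) // [2 2 4 2]
}
```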
## Library API

Lumber can be imported as a Go library. Classify log text directly in your application — no subprocess, no stdout parsing.

```sh
go get github.com/kaminocorp/lumber
```

```go
import "github.com/kaminocorp/lumber/pkg/lumber"
```

```go
// Load once at startup (~100-300ms)
l, err := lumber.New(lumber.WithModelDir("models/"))
if err != nil {
	log.Fatal(err)
}
defer l.Close()

event, _ := l.Classify("ERROR: connection refused to db-primary:5432")
fmt.Println(event.Type, event.Category) // ERROR connection_failure
```

```go
// Single batched ONNX inference call — ~10x faster than looping Classify
events, _ := l.ClassifyBatch([]string{
	"ERROR: connection refused",
	"GET /api/users 200 OK 12ms",
	"Build succeeded in 45s",
})
```

```go
// Attach source metadata to a classification
event, _ := l.ClassifyLog(lumber.Log{
	Text:      "ERROR: connection refused",
	Timestamp: time.Now(),
	Source:    "vercel",
	Metadata:  map[string]any{"project": "api-prod"},
})
```

```go
// Inspect the taxonomy
for _, cat := range l.Taxonomy() {
	fmt.Printf("%s: %d labels\n", cat.Name, len(cat.Labels))
}
```

The Lumber instance is safe for concurrent use. Create once, reuse across requests.
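Since the instance is concurrency-safe, the usual pattern is one shared instance feeding many goroutines. A generic sketch of that pattern; the `classify` function below is a self-contained stand-in for a shared Lumber instance, not the real API:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// classify stands in for a shared, concurrency-safe classifier
// (e.g. one Lumber instance created at startup).
func classify(line string) string {
	if strings.HasPrefix(line, "ERROR") {
		return "ERROR"
	}
	return "REQUEST"
}

func main() {
	lines := []string{"ERROR: connection refused", "GET /api/users 200", "ERROR: timeout"}
	results := make([]string, len(lines))

	var wg sync.WaitGroup
	for i, l := range lines {
		wg.Add(1)
		go func(i int, l string) { // one goroutine per request, one shared classifier
			defer wg.Done()
			results[i] = classify(l)
		}(i, l)
	}
	wg.Wait()
	fmt.Println(results) // [ERROR REQUEST ERROR]
}
```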
For complete API reference, integration patterns (monitoring agents, HTTP middleware, batch workers), performance tuning, and troubleshooting, see the Integration Guide.
Lumber supports multiple simultaneous output destinations.
| Destination | Env Var | CLI Flag | Behavior |
|---|---|---|---|
| stdout | (always on) | — | NDJSON to stdout (synchronous) |
| File | `LUMBER_OUTPUT_FILE` | `-output-file` | NDJSON to file with optional rotation |
| Webhook | `LUMBER_WEBHOOK_URL` | `-webhook-url` | Batched HTTP POST with retry |
File and webhook outputs run asynchronously — they don't stall the pipeline. Webhook uses drop-on-full semantics (lossy by design for non-critical destinations).
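Drop-on-full can be expressed as a non-blocking channel send. A sketch of the semantics, not Lumber's `internal/output/webhook` code; `offer` is an illustrative name:

```go
package main

import "fmt"

// offer tries a non-blocking send: if the buffer is full the event is
// dropped rather than stalling the pipeline (lossy by design).
func offer(buf chan string, ev string) bool {
	select {
	case buf <- ev:
		return true
	default:
		return false // buffer full: drop
	}
}

func main() {
	buf := make(chan string, 2) // nothing draining, so only 2 sends can land
	sent, dropped := 0, 0
	for _, ev := range []string{"e1", "e2", "e3", "e4"} {
		if offer(buf, ev) {
			sent++
		} else {
			dropped++
		}
	}
	fmt.Println(sent, dropped) // 2 2
}
```

The `select`/`default` idiom is what makes the destination lossy instead of backpressuring the classification pipeline.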
```sh
# Stream to stdout + file + webhook simultaneously
export LUMBER_OUTPUT_FILE=/var/log/lumber/events.jsonl
export LUMBER_OUTPUT_FILE_MAX_SIZE=104857600  # 100MB rotation
export LUMBER_WEBHOOK_URL=https://hooks.example.com/lumber
./bin/lumber
```

## Project Layout

```
cmd/lumber/        CLI entrypoint
pkg/lumber/        Public library API (Classify, ClassifyBatch, Taxonomy)
internal/
  config/          Environment + CLI flag configuration
  connector/       Connector interface, registry
    vercel/        Vercel REST API connector
    flyio/         Fly.io HTTP logs connector
    supabase/      Supabase Analytics connector
    httpclient/    Shared HTTP client (auth, retry, rate limits)
  engine/          Classification engine orchestration
    embedder/      ONNX Runtime embedding (tokenizer, projection)
    classifier/    Cosine similarity classification
    compactor/     Token-aware log compaction
    dedup/         Event deduplication
    taxonomy/      Taxonomy tree and default labels
    testdata/      153-entry labeled test corpus
  logging/         Structured internal logging (slog)
  model/           Domain types (RawLog, CanonicalEvent, TaxonomyNode)
  output/          Output formatting and writers
    stdout/        NDJSON stdout writer
    file/          NDJSON file writer with rotation
    webhook/       Batched HTTP POST with retry
    multi/         Fan-out to multiple outputs
    async/         Channel-based async wrapper
  pipeline/        Stream and Query orchestration, buffering
models/            ONNX model files (downloaded via make)
docs/              Plans, completion notes, changelog
```
```sh
make build            # Build binary to bin/lumber
make test             # Run all tests
make lint             # Run golangci-lint
make clean            # Remove build artifacts
make download-model   # Fetch ONNX model + tokenizer from HuggingFace
```

Lumber is in beta.
- Project scaffolding and pipeline skeleton
- ONNX Runtime integration and model download
- Pure-Go WordPiece tokenizer
- Mean pooling and dense projection (1024-dim embeddings)
- Taxonomy pre-embedding (42 leaves, 8 roots)
- Classification pipeline — 100% accuracy on 153-entry test corpus
- Log connectors (Vercel, Fly.io, Supabase)
- Shared HTTP client with retry and rate limit handling
- Pipeline integration (stream + query modes)
- Structured internal logging, config validation
- Per-log error resilience, bounded dedup buffer
- Graceful shutdown with timeout
- CLI flags and query mode access
- Multi-output architecture (file, webhook, async fan-out)
- Public library API (`pkg/lumber`)

Planned:

- Additional connectors (AWS CloudWatch, Datadog, Grafana Loki)
- HTTP server mode
- Adaptive taxonomy (self-growing/trimming)
See docs/changelog.md for detailed release notes.