The TensorZero Gateway exposes runtime metrics through a Prometheus-compatible endpoint. This allows you to monitor gateway performance, track usage patterns, and set up alerting using standard Prometheus tooling. This endpoint provides operational metrics about the gateway itself; it's not meant to replace TensorZero's observability features. You can access the metrics by scraping the /metrics endpoint.
tensorzero_inference_latency_overhead_seconds
This metric tracks the latency overhead introduced by TensorZero on inference requests.
It measures the total request duration minus the time spent waiting for external model provider HTTP requests.
This is useful for understanding how much latency TensorZero adds to your inference requests, independent of model provider latency.
This metric is reported as a histogram with configurable buckets (default: [0.001, 0.01, 0.1]).
You can customize the buckets in your configuration file using gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets:
tensorzero.toml
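A sketch of the corresponding configuration, using the bucket values given as the defaults above (in seconds):

```toml
[gateway.metrics]
# Histogram bucket boundaries, in seconds (defaults shown)
tensorzero_inference_latency_overhead_seconds_buckets = [0.001, 0.01, 0.1]
```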
GET /metrics
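Scraping the endpoint returns this metric in standard Prometheus histogram exposition format: cumulative _bucket samples per configured boundary (plus +Inf), along with _sum and _count. The values below are illustrative:

```
# TYPE tensorzero_inference_latency_overhead_seconds histogram
tensorzero_inference_latency_overhead_seconds_bucket{le="0.001"} 812
tensorzero_inference_latency_overhead_seconds_bucket{le="0.01"} 1034
tensorzero_inference_latency_overhead_seconds_bucket{le="0.1"} 1062
tensorzero_inference_latency_overhead_seconds_bucket{le="+Inf"} 1064
tensorzero_inference_latency_overhead_seconds_sum 3.5
tensorzero_inference_latency_overhead_seconds_count 1064
```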
tensorzero_inferences_total
This metric counts the total number of inferences performed by TensorZero.
GET /metrics
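In the scraped output, this appears as a standard Prometheus counter sample. The value below is illustrative, and any labels the gateway attaches are omitted here:

```
# TYPE tensorzero_inferences_total counter
tensorzero_inferences_total 1048
```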
tensorzero_requests_total
This metric counts the total number of requests handled by TensorZero.
GET /metrics
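For dashboards or alerting on this counter, a standard PromQL expression (a sketch, assuming a Prometheus server is already scraping the gateway) computes the per-second request rate:

```
# Per-second request rate over the last 5 minutes
rate(tensorzero_requests_total[5m])
```

Because rate() handles counter resets, this expression remains correct across gateway restarts.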