{"id":164764,"date":"2026-03-26T22:09:13","date_gmt":"2026-03-26T19:09:13","guid":{"rendered":"https:\/\/computingforgeeks.com\/deploy-tempo-kubernetes\/"},"modified":"2026-03-27T01:52:28","modified_gmt":"2026-03-26T22:52:28","slug":"deploy-tempo-kubernetes","status":"publish","type":"post","link":"https:\/\/computingforgeeks.com\/deploy-tempo-kubernetes\/","title":{"rendered":"Deploy Grafana Tempo for Distributed Tracing in Kubernetes"},"content":{"rendered":"\n<p>Something goes wrong in production. A user&#8217;s checkout takes 12 seconds instead of the usual 200ms. Metrics tell you latency spiked. Logs tell you which pod threw an error. But neither tells you <em>which service in the chain<\/em> caused the slowdown. That gap is exactly what distributed tracing fills.<\/p>\n\n\n\n<p><a href=\"https:\/\/grafana.com\/docs\/tempo\/latest\/\" target=\"_blank\" rel=\"noreferrer noopener\">Grafana Tempo<\/a> is a distributed tracing backend that stores traces with minimal resource overhead. It plugs directly into Grafana alongside Prometheus (metrics) and Loki (logs), completing the three pillars of observability in one UI. This guide deploys Tempo and an <a href=\"https:\/\/opentelemetry.io\/docs\/collector\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenTelemetry Collector<\/a> on Kubernetes using Helm, sends test traces from a simulated <code>order-service<\/code>, and queries them in Grafana with TraceQL. 
If you followed the <a href=\"https:\/\/computingforgeeks.com\/install-prometheus-grafana-kubernetes\/\" target=\"_blank\" rel=\"noreferrer noopener\">Prometheus and Grafana deployment guide<\/a> (Article 1) and the <a href=\"https:\/\/computingforgeeks.com\/deploy-loki-kubernetes\/\" target=\"_blank\" rel=\"noreferrer noopener\">Loki log aggregation guide<\/a> (Article 2), this is the natural next step.<\/p>\n\n\n\n<p><em>Tested <strong>March 2026<\/strong> | Tempo 2.9.0 (chart 1.24.4), OTel Collector 0.120.0, k3s v1.34.5, Grafana 11.x<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How Distributed Tracing Works<\/h2>\n\n\n\n<p>A <strong>trace<\/strong> represents the full journey of a single request through your system. It is a tree of <strong>spans<\/strong>, where each span captures one discrete operation: an HTTP call, a database query, a message publish, a cache lookup. Every span carries a <code>traceID<\/code> (shared across the entire request), a <code>spanID<\/code> (unique to this operation), a <code>parentSpanID<\/code> (which span triggered it), plus the service name, operation name, start time, duration, and arbitrary key-value attributes.<\/p>\n\n\n\n<p>When a user hits <code>\/api\/checkout<\/code> and that request touches five microservices, tracing shows each hop as a span in a waterfall diagram. You see exactly where the 12 seconds went: 10ms in the API gateway, 40ms in inventory, 11.8 seconds waiting on the payment provider.<\/p>\n\n\n\n<p><strong>OpenTelemetry<\/strong> (OTel) is the vendor-neutral standard for instrumenting applications and collecting telemetry data. 
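<\/p>\n\n\n\n<p>To make the span fields described above concrete, here is a minimal sketch in plain Python (no OTel SDK; the service names, IDs, and durations are illustrative) of the span tree behind the 12-second checkout example. Every span shares the <code>traceID<\/code>, and each child points back at its caller via <code>parentSpanID<\/code>:<\/p>\n\n\n\n

```python
# Hypothetical in-memory representation of one trace (illustrative values).
# All spans share trace_id; parent_span_id links each child to its caller.
spans = [
    {"trace_id": "a1b2", "span_id": "01", "parent_span_id": None,
     "service": "api-gateway", "name": "POST /api/checkout", "duration_ms": 12000},
    {"trace_id": "a1b2", "span_id": "02", "parent_span_id": "01",
     "service": "inventory", "name": "reserve-items", "duration_ms": 40},
    {"trace_id": "a1b2", "span_id": "03", "parent_span_id": "01",
     "service": "payment", "name": "charge-card", "duration_ms": 11800},
]

# A waterfall view answers "where did the time go": find the slowest child.
children = [s for s in spans if s["parent_span_id"] is not None]
slowest = max(children, key=lambda s: s["duration_ms"])
print(f"{slowest['service']}/{slowest['name']} took {slowest['duration_ms']} ms")
# -> payment/charge-card took 11800 ms
```

\n\n\n\n<p>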
The data flow looks like this:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Application code (instrumented with OTel SDK) generates spans<\/li>\n\n<li>Spans are sent to the <strong>OTel Collector<\/strong>, which batches, processes, and forwards them<\/li>\n\n<li>The Collector exports spans to <strong>Tempo<\/strong> for storage<\/li>\n\n<li><strong>Grafana<\/strong> queries Tempo via TraceQL and renders waterfall diagrams<\/li>\n<\/ol>\n\n\n\n<p>The OTel Collector sits between your apps and Tempo so that applications never need to know the backend storage details. Swapping Tempo for Jaeger or another backend later means reconfiguring one Collector, not every microservice.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Prerequisites<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A running Kubernetes cluster with <code>kubectl<\/code> and <code>helm<\/code> configured (tested on <a href=\"https:\/\/computingforgeeks.com\/run-kubernetes-cluster-on-rocky-linux-using-k3s\/\" target=\"_blank\" rel=\"noreferrer noopener\">k3s v1.34.5<\/a>)<\/li>\n\n<li>Grafana already deployed from <a href=\"https:\/\/computingforgeeks.com\/install-prometheus-grafana-kubernetes\/\" target=\"_blank\" rel=\"noreferrer noopener\">kube-prometheus-stack<\/a> (Article 1)<\/li>\n\n<li>Optionally, Loki deployed from <a href=\"https:\/\/computingforgeeks.com\/deploy-loki-kubernetes\/\" target=\"_blank\" rel=\"noreferrer noopener\">the Loki guide<\/a> (Article 2) for log correlation<\/li>\n\n<li>The Grafana Helm repo already added: <code>helm repo add grafana https:\/\/grafana.github.io\/helm-charts<\/code><\/li>\n\n<li>A <code>monitoring<\/code> namespace where Prometheus, Grafana, and Loki are running<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Create the Tempo Values File<\/h2>\n\n\n\n<p>Tempo ships as a Helm chart with sensible defaults, but a few settings need explicit configuration: the OTLP receivers (so the Collector can send traces), persistent storage (so traces survive pod 
restarts), and the metrics generator (which derives RED metrics from traces and pushes them to Prometheus).<\/p>\n\n\n\n<p>Create a values file for the Tempo Helm chart:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>vi tempo-values.yaml<\/code><\/pre>\n\n\n\n<p>Add the following configuration:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>tempo:\n  storage:\n    trace:\n      backend: local\n      local:\n        path: \/var\/tempo\/traces\n      wal:\n        path: \/var\/tempo\/wal\n  receivers:\n    otlp:\n      protocols:\n        grpc:\n          endpoint: \"0.0.0.0:4317\"\n        http:\n          endpoint: \"0.0.0.0:4318\"\n  metricsGenerator:\n    enabled: true\n    remoteWriteUrl: \"http:\/\/prometheus-kube-prometheus-prometheus.monitoring.svc:9090\/api\/v1\/write\"\npersistence:\n  enabled: true\n  storageClassName: local-path\n  size: 5Gi<\/code><\/pre>\n\n\n\n<p>A few things worth noting in this configuration:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OTLP receivers<\/strong> on gRPC (port 4317) and HTTP (port 4318) accept traces from the OTel Collector or directly from instrumented applications<\/li>\n\n<li><strong>metricsGenerator<\/strong> automatically derives rate, error, and duration (RED) metrics from incoming traces and writes them to Prometheus via remote write. This means you get service-level metrics without adding a single Prometheus scrape target<\/li>\n\n<li><strong>local backend<\/strong> stores traces on a persistent volume. For production clusters with S3 or MinIO, change <code>backend: s3<\/code> and add the bucket configuration (same pattern as the Loki article)<\/li>\n\n<li><strong>5Gi PVC<\/strong> on <code>local-path<\/code> is enough for development and small clusters. 
Production workloads generating thousands of traces per second will need more<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Deploy Tempo<\/h2>\n\n\n\n<p>Install the Tempo Helm chart into the <code>monitoring<\/code> namespace using the values file:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>helm install tempo grafana\/tempo \\\n  --namespace monitoring \\\n  --values tempo-values.yaml \\\n  --wait --timeout 5m<\/code><\/pre>\n\n\n\n<p>Helm pulls chart version 1.24.4 (app version 2.9.0) and deploys a StatefulSet with a single replica:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>NAME: tempo\nLAST DEPLOYED: Wed Mar 25 2026 14:22:31\nNAMESPACE: monitoring\nSTATUS: deployed\nCHART: tempo-1.24.4\nAPP VERSION: 2.9.0<\/code><\/pre>\n\n\n\n<p>Verify the pod is running:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get pods -n monitoring -l app.kubernetes.io\/name=tempo<\/code><\/pre>\n\n\n\n<p>You should see the Tempo pod in <code>Running<\/code> state with all containers ready:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>NAME      READY   STATUS    RESTARTS   AGE\ntempo-0   1\/1     Running   0          47s<\/code><\/pre>\n\n\n\n<p>Confirm the service exposes the expected ports:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get svc tempo -n monitoring<\/code><\/pre>\n\n\n\n<p>The output lists five ports. The ones used in this guide are OTLP gRPC 4317 and OTLP HTTP 4318 (both for trace ingestion) plus API 3200 (which Grafana uses to query traces); 9095 is Tempo&#8217;s internal gRPC port and 9411 accepts Zipkin-format spans:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>NAME    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                        AGE\ntempo   ClusterIP   10.43.87.214   &lt;none&gt;        3200\/TCP,9095\/TCP,4317\/TCP,4318\/TCP,9411\/TCP   52s<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Deploy the OpenTelemetry Collector<\/h2>\n\n\n\n<p>The OTel Collector acts as a trace pipeline between your applications and Tempo. Applications send OTLP data to the Collector, which batches spans and forwards them to Tempo. 
This decouples your app instrumentation from the storage backend.<\/p>\n\n\n\n<p>Create the manifest file:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>vi otel-collector.yaml<\/code><\/pre>\n\n\n\n<p>Add the full ConfigMap, Deployment, and Service:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: otel-collector-config\n  namespace: monitoring\ndata:\n  otel-collector-config.yaml: |\n    receivers:\n      otlp:\n        protocols:\n          grpc:\n            endpoint: \"0.0.0.0:4317\"\n          http:\n            endpoint: \"0.0.0.0:4318\"\n    processors:\n      batch:\n        timeout: 5s\n        send_batch_size: 1024\n    exporters:\n      otlp\/tempo:\n        endpoint: \"tempo.monitoring.svc.cluster.local:4317\"\n        tls:\n          insecure: true\n    service:\n      pipelines:\n        traces:\n          receivers: [otlp]\n          processors: [batch]\n          exporters: [otlp\/tempo]\n---\napiVersion: apps\/v1\nkind: Deployment\nmetadata:\n  name: otel-collector\n  namespace: monitoring\n  labels:\n    app: otel-collector\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: otel-collector\n  template:\n    metadata:\n      labels:\n        app: otel-collector\n    spec:\n      containers:\n      - name: otel-collector\n        image: otel\/opentelemetry-collector-contrib:0.120.0\n        args:\n        - \"--config=\/etc\/otel\/otel-collector-config.yaml\"\n        ports:\n        - containerPort: 4317\n          name: otlp-grpc\n        - containerPort: 4318\n          name: otlp-http\n        volumeMounts:\n        - name: config\n          mountPath: \/etc\/otel\n      volumes:\n      - name: config\n        configMap:\n          name: otel-collector-config\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: otel-collector\n  namespace: monitoring\nspec:\n  selector:\n    app: otel-collector\n  ports:\n  - name: otlp-grpc\n    port: 4317\n    targetPort: 4317\n  - name: 
otlp-http\n    port: 4318\n    targetPort: 4318<\/code><\/pre>\n\n\n\n<p>The pipeline is straightforward: the <code>otlp<\/code> receiver accepts traces on both gRPC and HTTP, the <code>batch<\/code> processor groups spans into batches of 1024 (or flushes every 5 seconds), and the <code>otlp\/tempo<\/code> exporter forwards everything to Tempo&#8217;s gRPC endpoint inside the cluster. The <code>tls.insecure: true<\/code> setting is fine for in-cluster communication where traffic stays on the pod network.<\/p>\n\n\n\n<p>Apply the manifest:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl apply -f otel-collector.yaml<\/code><\/pre>\n\n\n\n<p>All three resources should be created:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>configmap\/otel-collector-config created\ndeployment.apps\/otel-collector created\nservice\/otel-collector created<\/code><\/pre>\n\n\n\n<p>Confirm the Collector pod is running:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get pods -n monitoring -l app=otel-collector<\/code><\/pre>\n\n\n\n<p>The pod should reach <code>Running<\/code> status within a few seconds:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>NAME                              READY   STATUS    RESTARTS   AGE\notel-collector-6b8f4d7c9a-xk2mf   1\/1     Running   0          18s<\/code><\/pre>\n\n\n\n<p>Check the Collector logs to verify it connected to Tempo successfully:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl logs -n monitoring -l app=otel-collector --tail=10<\/code><\/pre>\n\n\n\n<p>Look for a line confirming the exporter started without errors. If you see connection refused messages, confirm the Tempo service is reachable on port 4317.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Add Tempo as a Grafana Data Source<\/h2>\n\n\n\n<p>Grafana needs a Tempo data source to query and visualize traces. You can add it through the Grafana UI or via the HTTP API. 
The API approach is reproducible and works well in automated setups.<\/p>\n\n\n\n<p>First, get the Grafana service URL (if using a NodePort setup from Article 1):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get svc -n monitoring prometheus-grafana<\/code><\/pre>\n\n\n\n<p>Create the Tempo data source via the API:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>curl -s -X POST \"http:\/\/10.0.1.10:30080\/api\/datasources\" \\\n  -H \"Content-Type: application\/json\" \\\n  -u \"admin:password\" \\\n  -d '{\n    \"name\": \"Tempo\",\n    \"type\": \"tempo\",\n    \"url\": \"http:\/\/tempo.monitoring.svc.cluster.local:3200\",\n    \"access\": \"proxy\",\n    \"jsonData\": {\n      \"nodeGraph\": {\"enabled\": true},\n      \"tracesToLogs\": {\n        \"datasourceUid\": \"loki\",\n        \"filterByTraceID\": true,\n        \"filterBySpanID\": false\n      }\n    }\n  }'<\/code><\/pre>\n\n\n\n<p>Note that Tempo&#8217;s API port is <strong>3200<\/strong>, not 3100 (which is Loki&#8217;s). The <code>tracesToLogs<\/code> configuration links trace spans to their corresponding log entries in Loki, which becomes useful when debugging issues that span multiple services. The <code>nodeGraph<\/code> option enables the service dependency graph visualization.<\/p>\n\n\n\n<p>A successful response returns the datasource ID:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{\"datasource\":{\"id\":4,\"uid\":\"tempo\",\"name\":\"Tempo\",\"type\":\"tempo\"},\"id\":4,\"message\":\"Datasource added\",\"name\":\"Tempo\"}<\/code><\/pre>\n\n\n\n<p>Open Grafana and navigate to <strong>Connections > Data sources<\/strong>. 
You should see all four data sources listed: Prometheus (default), Loki, Alertmanager, and the newly added Tempo.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"941\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/04-datasources-all.png\" alt=\"Grafana data sources showing Prometheus, Loki, Alertmanager, and Tempo configured\" class=\"wp-image-164763\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/04-datasources-all.png 1920w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/04-datasources-all-300x147.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/04-datasources-all-1024x502.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/04-datasources-all-768x376.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/04-datasources-all-1536x753.png 1536w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<p>Click into the Tempo data source to verify the connection settings. 
The URL should point to <code>http:\/\/tempo.monitoring.svc.cluster.local:3200<\/code> and the &#8220;Save &#038; test&#8221; button should return a green success message.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"941\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/05-tempo-datasource.png\" alt=\"Tempo data source configuration in Grafana showing connection URL and settings\" class=\"wp-image-164762\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/05-tempo-datasource.png 1920w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/05-tempo-datasource-300x147.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/05-tempo-datasource-1024x502.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/05-tempo-datasource-768x376.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/05-tempo-datasource-1536x753.png 1536w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Send Test Traces<\/h2>\n\n\n\n<p>Before instrumenting a real application, you can verify the entire pipeline by sending OTLP traces directly to Tempo via its HTTP receiver. This confirms that Tempo accepts, stores, and serves traces to Grafana without any application-side complexity.<\/p>\n\n\n\n<p>Get the Tempo ClusterIP:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>TEMPO_IP=$(kubectl get svc tempo -n monitoring -o jsonpath='{.spec.clusterIP}')\necho $TEMPO_IP<\/code><\/pre>\n\n\n\n<p>This returns the internal service IP that accepts OTLP data on port 4318:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>10.43.87.214<\/code><\/pre>\n\n\n\n<p>Send a test trace simulating an <code>order-service<\/code> handling a <code>process-order<\/code> request. 
Run this from any pod with <code>curl<\/code> available, or from the node if using k3s with host networking:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>TRACE_ID=$(cat \/proc\/sys\/kernel\/random\/uuid | tr -d '-' | head -c 32)\nSPAN_ID=$(cat \/proc\/sys\/kernel\/random\/uuid | tr -d '-' | head -c 16)\nSTART=$(date +%s%N)\nEND=$(( $(date +%s) + 1 ))$(date +%N)\n\ncurl -X POST \"http:\/\/$TEMPO_IP:4318\/v1\/traces\" \\\n  -H 'Content-Type: application\/json' \\\n  -d '{\n    \"resourceSpans\": [{\n      \"resource\": {\n        \"attributes\": [\n          {\"key\": \"service.name\", \"value\": {\"stringValue\": \"order-service\"}}\n        ]\n      },\n      \"scopeSpans\": [{\n        \"scope\": {\"name\": \"demo\"},\n        \"spans\": [{\n          \"traceId\": \"'\"$TRACE_ID\"'\",\n          \"spanId\": \"'\"$SPAN_ID\"'\",\n          \"name\": \"process-order\",\n          \"kind\": 2,\n          \"startTimeUnixNano\": \"'\"$START\"'\",\n          \"endTimeUnixNano\": \"'\"$END\"'\",\n          \"status\": {\"code\": 1},\n          \"attributes\": [\n            {\"key\": \"http.method\", \"value\": {\"stringValue\": \"POST\"}},\n            {\"key\": \"http.url\", \"value\": {\"stringValue\": \"\/api\/orders\"}},\n            {\"key\": \"http.status_code\", \"value\": {\"intValue\": \"200\"}}\n          ]\n        }]\n      }]\n    }]\n  }'<\/code><\/pre>\n\n\n\n<p>A successful ingestion returns an empty partial success object, which means all spans were accepted:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{\"partialSuccess\":{}}<\/code><\/pre>\n\n\n\n<p>Send a batch of traces to populate Grafana with enough data for meaningful exploration. 
This loop creates 10 traces with ~1 second duration each:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for i in $(seq 1 10); do\n  TRACE_ID=$(cat \/proc\/sys\/kernel\/random\/uuid | tr -d '-' | head -c 32)\n  SPAN_ID=$(cat \/proc\/sys\/kernel\/random\/uuid | tr -d '-' | head -c 16)\n  START=$(date +%s%N)\n  sleep 0.1\n  END=$(( $(date +%s) + 1 ))$(date +%N)\n  curl -s -X POST \"http:\/\/$TEMPO_IP:4318\/v1\/traces\" \\\n    -H 'Content-Type: application\/json' \\\n    -d '{\"resourceSpans\":[{\"resource\":{\"attributes\":[{\"key\":\"service.name\",\"value\":{\"stringValue\":\"order-service\"}}]},\"scopeSpans\":[{\"scope\":{\"name\":\"demo\"},\"spans\":[{\"traceId\":\"'\"$TRACE_ID\"'\",\"spanId\":\"'\"$SPAN_ID\"'\",\"name\":\"process-order\",\"kind\":2,\"startTimeUnixNano\":\"'\"$START\"'\",\"endTimeUnixNano\":\"'\"$END\"'\",\"status\":{\"code\":1},\"attributes\":[{\"key\":\"http.method\",\"value\":{\"stringValue\":\"POST\"}},{\"key\":\"http.url\",\"value\":{\"stringValue\":\"\/api\/orders\"}},{\"key\":\"http.status_code\",\"value\":{\"intValue\":\"200\"}}]}]}]}]}'\n  echo \" trace $i sent\"\ndone<\/code><\/pre>\n\n\n\n<p>Each iteration should print the success response followed by the trace number:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{\"partialSuccess\":{}} trace 1 sent\n{\"partialSuccess\":{}} trace 2 sent\n{\"partialSuccess\":{}} trace 3 sent\n...\n{\"partialSuccess\":{}} trace 10 sent<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Query Traces in Grafana<\/h2>\n\n\n\n<p>Open Grafana and navigate to <strong>Explore<\/strong>. Select <strong>Tempo<\/strong> from the data source dropdown at the top.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Search by Service Name<\/h3>\n\n\n\n<p>Switch to the <strong>TraceQL<\/strong> query type and enter:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{resource.service.name=\"order-service\"}<\/code><\/pre>\n\n\n\n<p>Click <strong>Run query<\/strong>. 
Grafana displays a list of matching traces with their trace IDs, duration, and span count. The test traces from the <code>order-service<\/code> should appear with durations of roughly one second each: the end timestamp is built from the start time plus one second, and the batch traces pick up an extra ~100ms from the <code>sleep 0.1<\/code> in the loop.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"941\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/01-tempo-trace-search.png\" alt=\"Grafana Explore view showing TraceQL search results for order-service traces\" class=\"wp-image-164759\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/01-tempo-trace-search.png 1920w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/01-tempo-trace-search-300x147.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/01-tempo-trace-search-1024x502.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/01-tempo-trace-search-768x376.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/01-tempo-trace-search-1536x753.png 1536w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<p>The results table shows each trace with its ID, root service, root span name, start time, and duration. 
All 10+ traces from the batch send should be visible.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"941\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/02-tempo-traceql-results.png\" alt=\"TraceQL query results showing order-service traces with process-order spans and durations\" class=\"wp-image-164760\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/02-tempo-traceql-results.png 1920w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/02-tempo-traceql-results-300x147.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/02-tempo-traceql-results-1024x502.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/02-tempo-traceql-results-768x376.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/02-tempo-traceql-results-1536x753.png 1536w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">View the Trace Waterfall<\/h3>\n\n\n\n<p>Click any trace ID to open the detailed waterfall view. Each span appears as a horizontal bar showing its duration relative to the total trace. For the test data, you will see a single <code>process-order<\/code> span from <code>order-service<\/code>. 
In a real application with multiple microservices, this waterfall would show the full call chain with parent-child relationships between spans.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"941\" src=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/03-tempo-trace-detail.png\" alt=\"Tempo trace waterfall diagram showing process-order span details with HTTP attributes\" class=\"wp-image-164761\" title=\"\" srcset=\"https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/03-tempo-trace-detail.png 1920w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/03-tempo-trace-detail-300x147.png 300w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/03-tempo-trace-detail-1024x502.png 1024w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/03-tempo-trace-detail-768x376.png 768w, https:\/\/computingforgeeks.com\/wp-content\/uploads\/2026\/03\/03-tempo-trace-detail-1536x753.png 1536w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<p>The span detail panel shows all attributes attached to the span: <code>http.method=POST<\/code>, <code>http.url=\/api\/orders<\/code>, <code>http.status_code=200<\/code>. These attributes are what make traces searchable and filterable in TraceQL.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">TraceQL Quick Reference<\/h2>\n\n\n\n<p>TraceQL is Tempo&#8217;s query language, similar in spirit to PromQL and LogQL. 
Here are the most useful queries for everyday debugging:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Query<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td><code>{resource.service.name=\"order-service\"}<\/code><\/td><td>All traces from a specific service<\/td><\/tr><tr><td><code>{span.http.status_code >= 500}<\/code><\/td><td>Traces containing server errors<\/td><\/tr><tr><td><code>{name=\"process-order\"}<\/code><\/td><td>Spans matching a specific operation name<\/td><\/tr><tr><td><code>{duration > 1s}<\/code><\/td><td>Slow spans exceeding 1 second<\/td><\/tr><tr><td><code>{resource.service.name=\"order-service\" &amp;&amp; duration > 500ms}<\/code><\/td><td>Slow operations in a specific service<\/td><\/tr><tr><td><code>{span.http.method=\"POST\" &amp;&amp; span.http.status_code=200}<\/code><\/td><td>Successful POST requests<\/td><\/tr><tr><td><code>{rootServiceName=\"api-gateway\"}<\/code><\/td><td>Traces originating from a specific service<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The <code>resource.*<\/code> prefix queries attributes on the resource (service-level metadata), while <code>span.*<\/code> queries attributes on individual spans. The <code>duration<\/code> and <code>name<\/code> fields are built-in span properties that don&#8217;t need a prefix.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Instrument a Real Application<\/h2>\n\n\n\n<p>Sending manual traces proves the pipeline works, but real value comes from auto-instrumenting applications. Here is a Python Flask example using the OpenTelemetry SDK. 
The OTel Flask instrumentation automatically creates spans for every incoming HTTP request without modifying your route handlers.<\/p>\n\n\n\n<p>The required Python packages:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>flask\nopentelemetry-api\nopentelemetry-sdk\nopentelemetry-exporter-otlp-proto-grpc\nopentelemetry-instrumentation-flask<\/code><\/pre>\n\n\n\n<p>The application code with OTel instrumentation:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from flask import Flask\nfrom opentelemetry import trace\nfrom opentelemetry.sdk.trace import TracerProvider\nfrom opentelemetry.sdk.trace.export import BatchSpanProcessor\nfrom opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter\nfrom opentelemetry.instrumentation.flask import FlaskInstrumentor\n\n# Configure the OTLP exporter pointing to the OTel Collector\nprovider = TracerProvider()\nexporter = OTLPSpanExporter(\n    endpoint=\"otel-collector.monitoring.svc.cluster.local:4317\",\n    insecure=True\n)\nprovider.add_span_processor(BatchSpanProcessor(exporter))\ntrace.set_tracer_provider(provider)\n\napp = Flask(__name__)\nFlaskInstrumentor().instrument_app(app)\n\n@app.route(\"\/order\")\ndef create_order():\n    tracer = trace.get_tracer(__name__)\n    with tracer.start_as_current_span(\"validate-order\"):\n        pass  # validation logic\n    with tracer.start_as_current_span(\"charge-payment\"):\n        pass  # payment logic\n    return {\"status\": \"created\"}<\/code><\/pre>\n\n\n\n<p>The <code>FlaskInstrumentor<\/code> automatically creates a root span for each HTTP request. 
The manual <code>start_as_current_span<\/code> calls create child spans within that request, giving you visibility into individual operations like order validation and payment processing.<\/p>\n\n\n\n<p>In your Kubernetes Deployment manifest, set the OTel environment variables so the SDK knows where to send traces:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>env:\n- name: OTEL_EXPORTER_OTLP_ENDPOINT\n  value: \"http:\/\/otel-collector.monitoring.svc.cluster.local:4317\"\n- name: OTEL_SERVICE_NAME\n  value: \"order-service\"<\/code><\/pre>\n\n\n\n<p>The <code>OTEL_SERVICE_NAME<\/code> variable sets the <code>service.name<\/code> resource attribute, which is the primary identifier you use in TraceQL queries. Every microservice should have a unique value here.<\/p>\n\n\n\n<p>Other languages have equivalent OTel SDKs. Java, Go, Node.js, .NET, and Ruby all support auto-instrumentation that generates spans with zero code changes beyond adding the SDK dependency and setting the endpoint environment variable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Correlate Traces with Logs and Metrics<\/h2>\n\n\n\n<p>Each pillar of observability answers a different question. <strong>Metrics<\/strong> from Prometheus detect that something is wrong (latency spike, error rate increase). <strong>Traces<\/strong> from Tempo pinpoint which service and operation caused it. <strong>Logs<\/strong> from Loki show the exact error messages and stack traces from that service at that moment.<\/p>\n\n\n\n<p>Grafana ties all three together. When you added the Tempo data source earlier with the <code>tracesToLogs<\/code> configuration, you enabled a direct link from trace spans to Loki log queries filtered by the same time window and service labels. 
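<\/p>\n\n\n\n<p>The span-to-logs link relies on your services actually writing the active trace ID into their log lines, because Grafana builds a Loki query that line-filters on that ID. A sketch of the generated query (the <code>app<\/code> label and the trace ID shown are illustrative and depend on your Promtail labels):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{app=\"order-service\"} |= \"4bf92f3577b34da6a3ce929d0e0e4736\"<\/code><\/pre>\n\n\n\n<p>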
In practice, the workflow looks like this:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>A Prometheus alert fires because <code>order-service<\/code> p99 latency exceeded 2 seconds<\/li>\n\n<li>You open Tempo in Grafana and query <code>{resource.service.name=\"order-service\" &amp;&amp; duration > 2s}<\/code><\/li>\n\n<li>The waterfall shows that the <code>charge-payment<\/code> span took 1.8 seconds (normally 50ms)<\/li>\n\n<li>You click &#8220;Logs for this span&#8221;, which opens a Loki query filtered to that service and time range<\/li>\n\n<li>The logs reveal a payment gateway timeout with the exact error message<\/li>\n<\/ol>\n\n\n\n<p>The Tempo metrics generator (configured earlier) also closes the loop in the other direction. RED metrics derived from traces appear as Prometheus metrics, so you can create <a href=\"https:\/\/computingforgeeks.com\/grafana-dashboards-alerting-kubernetes\/\" target=\"_blank\" rel=\"noreferrer noopener\">Grafana dashboards and alerts<\/a> based on trace-derived data without writing any PromQL recording rules yourself.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Complete Observability Stack<\/h2>\n\n\n\n<p>With Tempo deployed, the full LGTM stack (Loki, Grafana, Tempo, Metrics) is now running in the <code>monitoring<\/code> namespace. 
List all Helm releases to confirm:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>helm list -n monitoring<\/code><\/pre>\n\n\n\n<p>All four releases should show <code>deployed<\/code> status:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>NAME        NAMESPACE    REVISION  STATUS    CHART                          APP VERSION\nloki        monitoring   1         deployed  loki-6.55.0                    3.6.7\nprometheus  monitoring   1         deployed  kube-prometheus-stack-82.14.1  v0.89.0\npromtail    monitoring   1         deployed  promtail-6.17.1                3.5.1\ntempo       monitoring   1         deployed  tempo-1.24.4                   2.9.0<\/code><\/pre>\n\n\n\n<p>Here is how each component fits together:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Component<\/th><th>Tool<\/th><th>Purpose<\/th><th>Port<\/th><\/tr><\/thead><tbody><tr><td>Logs<\/td><td>Loki<\/td><td>Log aggregation and search via LogQL<\/td><td>3100<\/td><\/tr><tr><td>Grafana<\/td><td>Grafana<\/td><td>Visualization, dashboards, alerting<\/td><td>30080 (NodePort)<\/td><\/tr><tr><td>Traces<\/td><td>Tempo<\/td><td>Distributed tracing via TraceQL<\/td><td>3200<\/td><\/tr><tr><td>Metrics<\/td><td>Prometheus<\/td><td>Metrics collection and PromQL queries<\/td><td>9090<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This completes the LGTM observability stack on Kubernetes. Every metric, log line, and trace from your cluster is now queryable from a single Grafana instance. For clusters generating large volumes of metrics that need long-term storage and global querying, <strong>Grafana Mimir<\/strong> is the natural next addition, handling the same role for metrics that Loki handles for logs and Tempo handles for traces.<\/p>\n\n","protected":false},"excerpt":{"rendered":"<p>Something goes wrong in production. A user&#8217;s checkout takes 12 seconds instead of the usual 200ms. Metrics tell you latency spiked. 
Logs tell you which pod threw an error. But neither tells you which service in the chain caused the slowdown. That gap is exactly what distributed tracing fills. Grafana Tempo is a distributed tracing &#8230; <a title=\"Deploy Grafana Tempo for Distributed Tracing in Kubernetes\" class=\"read-more\" href=\"https:\/\/computingforgeeks.com\/deploy-tempo-kubernetes\/\" aria-label=\"Read more about Deploy Grafana Tempo for Distributed Tracing in Kubernetes\">Read more<\/a><\/p>\n","protected":false},"author":3,"featured_media":164765,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[299,317,165],"tags":[],"class_list":["post-164764","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-how-to","category-kubernetes","category-monitoring"],"_links":{"self":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/164764","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/comments?post=164764"}],"version-history":[{"count":1,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/164764\/revisions"}],"predecessor-version":[{"id":164837,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/posts\/164764\/revisions\/164837"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/media\/164765"}],"wp:attachment":[{"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/media?parent=164764"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/categories?post=164764"},{"taxonomy":"post_tag","embeddable":true,"
href":"https:\/\/computingforgeeks.com\/wp-json\/wp\/v2\/tags?post=164764"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}