Tempo (OpenTelemetry)

Tempo is PerfShop's distributed tracing sink. It receives OTLP spans exported by the OpenTelemetry Java agent embedded in the backend, stores them in local blocks, and backs three Grafana features: traces ↔ logs correlation, the service map, and derived span-metrics pushed to Prometheus.

Source of truth

This page is derived from tempo/tempo-config.yml, the backend JAVA_OPTS in the compose files, and the Tempo datasource in grafana/provisioning/datasources/tempo.yml.

Pinned version

PerfShop pins Tempo to version 2.4.2 (image grafana/tempo:2.4.2). Tempo and Grafana are the only observability images pinned to a specific version; the others use latest. The reason: the metrics_generator configuration and the span-metrics format have changed between major versions, and the integration with Grafana 12.0.0 has been validated specifically against Tempo 2.4.2.

Architecture

flowchart LR
  subgraph BE["perfshop-app (JVM)"]
    direction TB
    APP["Spring Boot 3.2"]
    OA["OpenTelemetry agent<br/>opentelemetry-javaagent.jar"]
    APP -.injected.-> OA
  end

  TEMPO["perfshop-tempo<br/>(2.4.2)"]
  PROM[("Prometheus")]
  GRAF["Grafana"]

  OA -->|"OTLP gRPC :4317<br/>(spans)"| TEMPO
  TEMPO -->|"metrics_generator<br/>service-graphs<br/>span-metrics<br/>remote_write"| PROM
  TEMPO -->|"Tempo datasource<br/>(query)"| GRAF
  PROM -->|"Prometheus datasource<br/>(spanmetrics + serviceMap)"| GRAF

Tempo configuration

OTLP receivers

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

Tempo exposes two OTLP endpoints simultaneously:

| Port | Protocol | Usage |
|---|---|---|
| 4317 | gRPC | Used by the OpenTelemetry Java agent of perfshop-app |
| 4318 | HTTP | Available for SDKs that do not support gRPC (curl, Python without grpcio, etc.) |

Both ports are exposed to the host via the variables TEMPO_OTLP_GRPC_PORT and TEMPO_OTLP_HTTP_PORT.
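
As a way to exercise the HTTP receiver without any SDK, the sketch below builds a single-span OTLP/JSON export request for the standard /v1/traces path. It assumes the host mapping leaves port 4318 on localhost; the service name smoke-test, the span name, and the helper names are illustrative and not part of PerfShop.

```python
import json
import secrets
import time
import urllib.request


def build_otlp_payload(service_name: str, span_name: str) -> dict:
    """Build a one-span OTLP/JSON export request body."""
    end_ns = time.time_ns()
    start_ns = end_ns - 50_000_000  # pretend the operation lasted 50 ms
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "manual-test"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes -> 32 hex chars
                    "spanId": secrets.token_hex(8),    # 8 bytes -> 16 hex chars
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


def build_request(payload: dict, base_url: str = "http://localhost:4318") -> urllib.request.Request:
    """Wrap the payload in a POST to the OTLP/HTTP traces path."""
    return urllib.request.Request(
        url=f"{base_url}/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status (needs a running Tempo)."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling send(build_request(build_otlp_payload("smoke-test", "manual-span"))) against a running stack should make the span searchable in Grafana shortly after the idle period elapses.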

Ingester and storage

ingester:
  trace_idle_period: 10s
  max_block_bytes: 1_000_000
  max_block_duration: 5m

storage:
  trace:
    backend: local
    block:
      bloom_filter_false_positive: .05
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks
    pool:
      max_workers: 100
      queue_depth: 10000

| Parameter | Value | Effect |
|---|---|---|
| trace_idle_period | 10s | A trace inactive for 10 s is considered finished and flushed |
| max_block_bytes | 1 MB | WAL blocks are cut at 1 MB |
| max_block_duration | 5m | Hard duration cap per block |
| backend | local | Filesystem storage (named volume tempo-data) |
| wal.path | /var/tempo/wal | Write-Ahead Log before flush |
| local.path | /var/tempo/blocks | Flushed blocks (immutable, compacted) |
| bloom_filter_false_positive | 0.05 | 5% false positives on the index bloom filters |
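
The interplay between the three ingester thresholds can be pictured with a small model. This is a simplified sketch of the decision rules described above, not Tempo's actual implementation; the class and function names are invented for illustration.

```python
from dataclasses import dataclass

# Values copied from the ingester section of the configuration.
TRACE_IDLE_PERIOD_S = 10.0
MAX_BLOCK_BYTES = 1_000_000
MAX_BLOCK_DURATION_S = 5 * 60.0


def trace_is_idle(last_span_at: float, now: float) -> bool:
    """A trace with no new span for trace_idle_period counts as finished."""
    return now - last_span_at >= TRACE_IDLE_PERIOD_S


@dataclass
class HeadBlock:
    """Hypothetical stand-in for the block currently being written."""
    opened_at: float
    size_bytes: int = 0

    def should_cut(self, now: float) -> bool:
        """Cut the block when either the size or the age limit is reached."""
        too_big = self.size_bytes >= MAX_BLOCK_BYTES
        too_old = now - self.opened_at >= MAX_BLOCK_DURATION_S
        return too_big or too_old
```

With these values, a quiet lab session still produces a new block at least every five minutes, while a load test cuts blocks on size long before the duration cap applies.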

Server

server:
  http_listen_port: 3200
  log_level: warn

stream_over_http_enabled: true

| Parameter | Value | Effect |
|---|---|---|
| http_listen_port | 3200 | HTTP query API (used by the Grafana Tempo datasource) |
| log_level | warn | Log level for Tempo itself (not verbose) |
| stream_over_http_enabled | true | Enables HTTP streaming for long searches |

The default host port is 19200 (variable TEMPO_HTTP_PORT). The container internal port remains 3200 — only the host mapping changes.
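
From the host, the query API can be reached on the mapped port. The sketch below builds the URLs for Tempo's readiness check (/ready) and trace lookup (/api/traces/<traceID>), assuming the default host port 19200; the helper names are illustrative.

```python
import json
import string
import urllib.request

TEMPO_BASE_URL = "http://localhost:19200"  # default host mapping of container port 3200


def ready_url(base_url: str = TEMPO_BASE_URL) -> str:
    """URL of Tempo's readiness endpoint."""
    return f"{base_url}/ready"


def trace_url(trace_id: str, base_url: str = TEMPO_BASE_URL) -> str:
    """URL that returns a whole trace by its hexadecimal ID."""
    is_hex = bool(trace_id) and all(c in string.hexdigits for c in trace_id)
    if not is_hex or len(trace_id) > 32:
        raise ValueError(f"not a valid trace ID: {trace_id!r}")
    return f"{base_url}/api/traces/{trace_id}"


def fetch_trace(trace_id: str) -> dict:
    """Download and decode a trace (needs a running Tempo)."""
    with urllib.request.urlopen(trace_url(trace_id), timeout=5) as response:
        return json.loads(response.read())
```

Inside the Docker network the same paths are served on port 3200, which is the address the Grafana datasource uses.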

metrics_generator — the game changer

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: perfshop
      job: perfshop-backend
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true

overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - span-metrics

This is where Tempo becomes much more than a simple trace store. The metrics_generator analyzes spans in real time and generates two families of derived metrics that it pushes into Prometheus via remote-write:

Family 1 — service-graphs

For each pair of services (caller → callee), the processor generates the following metrics:

| Metric | Type | Description |
|---|---|---|
| traces_service_graph_request_total | counter | Number of calls |
| traces_service_graph_request_failed_total | counter | Number of failed calls |
| traces_service_graph_request_server_seconds_* | histogram | Server-side latency |
| traces_service_graph_request_client_seconds_* | histogram | Client-side latency |

These metrics feed Grafana's Service Map (Tempo datasource → Service Graph tab) which displays a directed graph of dependencies between services.
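
The same series can be queried directly in Prometheus. The sketch below composes two PromQL expressions per edge (call rate and failure ratio) using the client and server labels that Tempo attaches to service-graph metrics, then wraps one in an instant-query URL. The Prometheus address http://localhost:9090 is an assumption about the host mapping, and the helper names are illustrative.

```python
from urllib.parse import urlencode


def edge_request_rate(window: str = "5m") -> str:
    """Calls per second for each caller -> callee edge."""
    return f"sum by (client, server) (rate(traces_service_graph_request_total[{window}]))"


def edge_failure_ratio(window: str = "5m") -> str:
    """Share of failed calls for each caller -> callee edge."""
    failed = f"sum by (client, server) (rate(traces_service_graph_request_failed_total[{window}]))"
    return f"{failed} / {edge_request_rate(window)}"


def instant_query_url(promql: str, base_url: str = "http://localhost:9090") -> str:
    """URL for the Prometheus instant-query API (base_url is an assumption)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    print(instant_query_url(edge_failure_ratio()))
```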

Family 2 — span-metrics

For each instrumented operation (HTTP URI, method, status), the processor generates:

| Metric | Type | Description |
|---|---|---|
| traces_spanmetrics_calls_total{service, span_name, status_code, ...} | counter | Number of invocations |
| traces_spanmetrics_latency_*{service, span_name, ...} | histogram | Latency distribution |

These metrics power the "P50 / P95 / P99 latency — all routes" panel of the Instructor APM dashboard. The panel queries Prometheus, not Tempo, on series derived from spans.
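
To make the percentile panels less of a black box, here is a minimal re-implementation of the linear interpolation that PromQL's histogram_quantile applies to cumulative buckets such as traces_spanmetrics_latency_bucket. It is a teaching sketch under simplifying assumptions (classic buckets, lowest bucket starting at 0), and the bucket values in the usage note are made up.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one being +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0
    for index, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in +Inf: report the highest finite bound.
                return buckets[index - 1][0]
            # Interpolate linearly inside the bucket that contains the rank.
            share = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * share
        prev_bound, prev_count = bound, count
    return math.nan  # not reached: the +Inf bucket always holds the rank
```

For buckets [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)], the 95th percentile lands halfway through the 0.5 to 1.0 bucket, so the estimate is about 0.75 s. The result is only as precise as the bucket boundaries allow.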

External labels

external_labels:
  source: tempo
  cluster: perfshop
  job: perfshop-backend

These three labels are added to all the metrics pushed by Tempo into Prometheus. The label job: perfshop-backend aligns the span-metrics with the other Spring Boot metrics, which makes it possible to mix the two in a single PromQL query.
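
As an illustration of mixing the two metric sources, the sketch below composes a PromQL expression that divides the span-metrics call rate by the HTTP request rate of the application, both selected through the shared job label. The metric http_server_requests_seconds_count is Micrometer's usual name for Spring Boot request counts; that PerfShop exposes it under this job is an assumption, as are the helper names.

```python
JOB = "perfshop-backend"  # value of the job external label set by Tempo


def rate_sum(metric: str, window: str = "5m", job: str = JOB) -> str:
    """Summed per-second rate of a counter, filtered on the shared job label."""
    return f'sum(rate({metric}{{job="{job}"}}[{window}]))'


def spans_per_http_request(window: str = "5m") -> str:
    """Average number of spans recorded per HTTP request served."""
    spans = rate_sum("traces_spanmetrics_calls_total", window)
    # Assumed metric name: the Micrometer default for Spring Boot HTTP requests.
    requests = rate_sum("http_server_requests_seconds_count", window)
    return f"{spans} / {requests}"


if __name__ == "__main__":
    print(spans_per_http_request())
```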

Exemplars

remote_write:
  - url: http://prometheus:9090/api/v1/write
    send_exemplars: true

Exemplars are links from a metric to a specific trace_id. With send_exemplars: true, each histogram sample can carry a trace_id that flows up to Grafana — the user then sees clickable points on latency graphs, and a click opens the corresponding trace. It is one of the three metrics ↔ traces correlation mechanisms in the stack.
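
Exemplars can also be read outside Grafana through the Prometheus query_exemplars endpoint. The sketch below builds such a request and pulls the trace IDs out of a response. The Prometheus address and the sample response are illustrative, and the exemplar label name is passed as a parameter because it is an assumption here: check the actual label on a live series.

```python
from urllib.parse import urlencode


def exemplars_url(promql: str, start: int, end: int,
                  base_url: str = "http://localhost:9090") -> str:
    """URL for the Prometheus exemplar query API (base_url is an assumption)."""
    params = urlencode({"query": promql, "start": start, "end": end})
    return f"{base_url}/api/v1/query_exemplars?{params}"


def extract_trace_ids(response: dict, label: str = "trace_id") -> list:
    """Collect distinct trace IDs from an exemplar query response."""
    trace_ids = []
    for series in response.get("data", []):
        for exemplar in series.get("exemplars", []):
            trace_id = exemplar.get("labels", {}).get(label)
            if trace_id and trace_id not in trace_ids:
                trace_ids.append(trace_id)
    return trace_ids


# Hand-written sample in the shape of a query_exemplars response.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": [{
        "seriesLabels": {"__name__": "traces_spanmetrics_latency_bucket", "le": "0.5"},
        "exemplars": [
            {"labels": {"trace_id": "aaaa"}, "value": "0.31", "timestamp": 1700000000.0},
            {"labels": {"trace_id": "bbbb"}, "value": "0.42", "timestamp": 1700000015.0},
            {"labels": {"trace_id": "aaaa"}, "value": "0.33", "timestamp": 1700000030.0},
        ],
    }],
}
```

Each extracted ID can then be opened in Tempo, which is what Grafana does when a point on a latency graph is clicked.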

OpenTelemetry agent on the backend side

The OpenTelemetry Java agent is embedded in the perfshop-backend image (path /agents/opentelemetry-javaagent.jar) and activated via JAVA_OPTS in the compose files:

-javaagent:/agents/opentelemetry-javaagent.jar
-Dotel.service.name=perfshop
-Dotel.exporter.otlp.endpoint=http://perfshop-tempo:4317
-Dotel.exporter.otlp.protocol=grpc
-Dotel.traces.exporter=otlp
-Dotel.metrics.exporter=none
-Dotel.logs.exporter=none
-Dotel.instrumentation.http.capture-headers.server.request=X-Admin-Token,Content-Type
-Dotel.instrumentation.jdbc.captured-statements.enabled=true
-Dotel.span.attribute.count.limit=256

Property breakdown

| Property | Pedagogical effect |
|---|---|
| otel.service.name=perfshop | All spans carry service.name=perfshop; it appears in the Tempo attributes and enables the TraceQL filter {resource.service.name="perfshop"} |
| otel.exporter.otlp.endpoint=http://perfshop-tempo:4317 | Targets Tempo's OTLP gRPC receiver via Docker DNS |
| otel.exporter.otlp.protocol=grpc | Forces gRPC explicitly. The agent's default protocol depends on its version (grpc in 1.x, http/protobuf since 2.0), so pinning it keeps the exporter matched to port 4317 |
| otel.traces.exporter=otlp | OTLP export of traces only |
| otel.metrics.exporter=none | No OTel export of metrics (Prometheus is the preferred sink) |
| otel.logs.exporter=none | No OTel export of logs (Loki/OpenSearch are the preferred sinks) |
| otel.instrumentation.http.capture-headers.server.request=X-Admin-Token,Content-Type | Explicit capture of HTTP request headers on the server side; this is what enables the instructor panel "Traces with admin trigger — X-Admin-Token captured" |
| otel.instrumentation.jdbc.captured-statements.enabled=true | Captures JDBC SQL statements, which are displayed in the Tempo span details and used by the "Instrumented SQL traces" panel of the Instructor APM dashboard (useful for the Security S1 — SQL injection scenario) |
| otel.span.attribute.count.limit=256 | Cap of 256 attributes per span (the SDK default is 128); keeps individual spans from growing without bound |
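
A mismatch between the exporter protocol and the endpoint port is a common reason for spans never reaching Tempo. The sketch below parses the -D system properties out of a JAVA_OPTS string and checks that pairing; it is a hypothetical lint helper, not something shipped with PerfShop.

```python
import shlex
from urllib.parse import urlparse

# Conventional OTLP ports, matching the two receivers Tempo exposes.
EXPECTED_PORT = {"grpc": 4317, "http/protobuf": 4318}


def parse_system_properties(java_opts: str) -> dict:
    """Return the -Dkey=value pairs found in a JAVA_OPTS string."""
    props = {}
    for token in shlex.split(java_opts):
        if token.startswith("-D") and "=" in token:
            key, value = token[2:].split("=", 1)
            props[key] = value
    return props


def check_exporter(props: dict) -> list:
    """List the problems found in the OTLP exporter settings."""
    problems = []
    protocol = props.get("otel.exporter.otlp.protocol")
    port = urlparse(props.get("otel.exporter.otlp.endpoint", "")).port
    expected = EXPECTED_PORT.get(protocol)
    if expected is None:
        problems.append(f"protocol not set explicitly or unknown: {protocol!r}")
    elif port != expected:
        problems.append(f"protocol {protocol} usually targets port {expected}, got {port}")
    if props.get("otel.traces.exporter") != "otlp":
        problems.append("otel.traces.exporter is not otlp")
    return problems


JAVA_OPTS = """
-javaagent:/agents/opentelemetry-javaagent.jar
-Dotel.service.name=perfshop
-Dotel.exporter.otlp.endpoint=http://perfshop-tempo:4317
-Dotel.exporter.otlp.protocol=grpc
-Dotel.traces.exporter=otlp
"""

if __name__ == "__main__":
    print(check_exporter(parse_system_properties(JAVA_OPTS)) or "exporter settings look consistent")
```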

Auto-instrumentation

The OpenTelemetry Java agent auto-instruments more than 100 libraries and frameworks. For PerfShop, the most useful are:

| Instrumentation | Generated spans |
|---|---|
| Spring Web MVC | HTTP GET /api/products, HTTP POST /api/orders, etc. |
| HikariCP / JDBC | SELECT users.*, INSERT orders, with SQL statement capture |
| Hibernate ORM | Hibernate Session.flush, etc. |
| Tomcat HTTP server | Root server spans for each request |
| java.net.http / OkHttp | Client spans for outgoing HTTP calls |
| Logback / SLF4J MDC | Automatic injection of trace_id and span_id into logs (for Loki correlation) |

Not a single line of Spring code needs to change to benefit from this instrumentation: the agent does everything at JVM startup.

Grafana datasource — correlations

The Tempo datasource in grafana/provisioning/datasources/tempo.yml enables three correlation mechanisms already described in grafana.md:

flowchart LR
  TEMPO["Tempo datasource"]

  TEMPO -->|tracesToLogsV2| L["Loki<br/>±1 min filter<br/>around the span"]
  TEMPO -->|tracesToMetrics| P["Prometheus<br/>operation metrics"]
  TEMPO -->|serviceMap| SM["Service Map<br/>(via traces_service_graph_*)"]
  TEMPO -->|nodeGraph| NG["Node Graph<br/>(graph view)"]
  TEMPO -->|lokiSearch| LS["Loki Search<br/>(free-form search)"]

The typical pedagogical flow:

  1. The instructor activates a chaos scenario that causes NullPointerExceptions.
  2. They open the Instructor APM dashboard and look at the "F1 — NullPointerException traces" panel (TraceQL {span.exception.type="NullPointerException"}).
  3. They click on a trace to see the span details.
  4. They click on a span → tracesToLogsV2 opens Loki with the time filter and the traceID → they see the Spring Boot logs around the exception.
  5. Optionally, they click on tracesToMetrics to see the Prometheus latency of the operation over the same window.
  6. To understand the global context, they open the Service Map of the Tempo datasource which displays the client → perfshop-app → mysql graph with average latencies on each edge.
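
Steps 2 and 4 of this flow can be reproduced outside Grafana. The sketch below builds a TraceQL search request for Tempo's /api/search endpoint and derives the ±1 minute window and a LogQL filter for the matching logs. The Tempo address assumes the default host port; the Loki stream selector {app="perfshop"} and the helper names are hypothetical.

```python
from urllib.parse import urlencode

NPE_QUERY = '{span.exception.type="NullPointerException"}'  # TraceQL from the panel


def traceql_search_url(query: str, start_s: int, end_s: int, limit: int = 20,
                       base_url: str = "http://localhost:19200") -> str:
    """URL for a TraceQL search; start and end are Unix seconds."""
    params = urlencode({"q": query, "start": start_s, "end": end_s, "limit": limit})
    return f"{base_url}/api/search?{params}"


def loki_window(span_start_ns: int, span_end_ns: int, pad_s: int = 60) -> tuple:
    """Widen the span interval by pad_s seconds on each side (nanoseconds)."""
    pad_ns = pad_s * 1_000_000_000
    return span_start_ns - pad_ns, span_end_ns + pad_ns


def logql_for_trace(trace_id: str, selector: str = '{app="perfshop"}') -> str:
    """LogQL line filter on the trace ID (the selector is a hypothetical label set)."""
    return f'{selector} |= "{trace_id}"'


if __name__ == "__main__":
    print(traceql_search_url(NPE_QUERY, start_s=1_700_000_000, end_s=1_700_000_600))
    print(loki_window(1_700_000_100_000_000_000, 1_700_000_100_250_000_000))
    print(logql_for_trace("0af7651916cd43dd8448eb211c80319c"))
```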

Volumes and persistence

| Volume | Mount | Content |
|---|---|---|
| tempo-data (named volume) | /var/tempo | WAL, blocks, generator WAL |
| ./tempo/tempo-config.yml (bind mount) | /etc/tempo/tempo-config.yml | Tempo configuration (read-only) |

Ports

| Service | Host port | Container port | Env variable | Usage |
|---|---|---|---|---|
| perfshop-tempo | 19200 | 3200 | TEMPO_HTTP_PORT | HTTP query API (Grafana datasource) |
| perfshop-tempo | 4317 | 4317 | TEMPO_OTLP_GRPC_PORT | OTLP gRPC (export from the Java agent) |
| perfshop-tempo | 4318 | 4318 | TEMPO_OTLP_HTTP_PORT | OTLP HTTP (alternative) |
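
Scripts that talk to Tempo from the host can resolve the mapped ports the same way the compose files do. The sketch below reads the three variables and falls back to the host ports listed in the table, treating them as defaults (an assumption for the two OTLP ports); the function name is illustrative.

```python
import os
from typing import Mapping

# Host ports from the table above, used when a variable is not set.
DEFAULT_PORTS = {
    "TEMPO_HTTP_PORT": 19200,
    "TEMPO_OTLP_GRPC_PORT": 4317,
    "TEMPO_OTLP_HTTP_PORT": 4318,
}


def resolve_ports(env: Mapping[str, str] = os.environ) -> dict:
    """Return the host port for each Tempo variable, validating overrides."""
    ports = {}
    for name, default in DEFAULT_PORTS.items():
        port = int(env.get(name, default))
        if not 1 <= port <= 65535:
            raise ValueError(f"{name}={port} is not a valid TCP port")
        ports[name] = port
    return ports


if __name__ == "__main__":
    for name, port in resolve_ports().items():
        print(f"{name} -> localhost:{port}")
```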

To go further

  • Overview — four signals and correlations
  • Grafana — Tempo datasource and tracesToLogsV2, tracesToMetrics correlations
  • Shipped dashboards — details of the TraceQL panels of the Instructor APM dashboard
  • Prometheus — reception of span-metrics via remote-write
  • Pyroscope — the other agent embedded in the backend image