Tempo (OpenTelemetry)
Tempo is PerfShop's distributed tracing sink. It receives the OTLP spans exported by the OpenTelemetry Java agent embedded in the backend and stores them in local blocks. It also backs several Grafana features: traces ↔ logs correlation, the service map, and derived span metrics pushed to Prometheus.
Source of truth
This page is derived from three sources: `tempo/tempo-config.yml`, the backend `JAVA_OPTS` in the compose files, and the Tempo datasource in `grafana/provisioning/datasources/tempo.yml`.
Pinned version
PerfShop pins Tempo to version 2.4.2 (image `grafana/tempo:2.4.2`). Tempo and Grafana are the only observability images pinned to a specific version; the others use `latest`. The reason is that the `metrics_generator` configuration and the span-metrics format have changed across Tempo releases. In addition, the integration with Grafana 12.0.0 has been validated specifically against Tempo 2.4.2.
Architecture
```mermaid
flowchart LR
    subgraph BE["perfshop-app (JVM)"]
        direction TB
        APP["Spring Boot 3.2"]
        OA["OpenTelemetry agent<br/>opentelemetry-javaagent.jar"]
        APP -.injected.-> OA
    end
    TEMPO["perfshop-tempo<br/>(2.4.2)"]
    PROM[("Prometheus")]
    GRAF["Grafana"]
    OA -->|"OTLP gRPC :4317<br/>(spans)"| TEMPO
    TEMPO -->|"metrics_generator<br/>service-graphs<br/>span-metrics<br/>remote_write"| PROM
    TEMPO -->|"Tempo datasource<br/>(query)"| GRAF
    PROM -->|"Prometheus datasource<br/>(spanmetrics + serviceMap)"| GRAF
```
Tempo configuration
OTLP receivers
Tempo exposes two OTLP endpoints simultaneously:
| Port | Protocol | Usage |
|---|---|---|
| 4317 | gRPC | Used by the OpenTelemetry Java agent of perfshop-app |
| 4318 | HTTP | Available for SDKs that do not support gRPC (curl, Python without grpcio, etc.) |
Both ports are exposed to the host via the variables TEMPO_OTLP_GRPC_PORT and TEMPO_OTLP_HTTP_PORT.
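The HTTP endpoint makes it possible to push a test span without any gRPC tooling. The Python sketch below uses only the standard library. It builds a single-span OTLP/JSON payload, in which trace and span IDs are hex strings, and posts it to the standard OTLP/HTTP path `/v1/traces`. It assumes that Tempo is reachable from the host at `localhost:4318`, the default `TEMPO_OTLP_HTTP_PORT`. The service name, span name and scope name are hypothetical placeholders.

```python
import json
import os
import time
import urllib.request

# Assumption: Tempo's OTLP/HTTP receiver is published on the host at the
# default TEMPO_OTLP_HTTP_PORT (4318); /v1/traces is the standard OTLP/HTTP path.
OTLP_HTTP_URL = "http://localhost:4318/v1/traces"


def build_payload(service_name: str, span_name: str, duration_ms: int = 50) -> dict:
    """Build a one-span OTLP/JSON payload (IDs are hex strings in OTLP/JSON)."""
    trace_id = os.urandom(16).hex()  # 32 hex characters
    span_id = os.urandom(8).hex()    # 16 hex characters
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-test"},  # hypothetical scope name
                "spans": [{
                    "traceId": trace_id,
                    "spanId": span_id,
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER (enums are integers in OTLP/JSON)
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


def send(payload: dict, url: str = OTLP_HTTP_URL) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

With the stack running, `send(build_payload("otlp-http-smoke-test", "GET /smoke"))` should return 200. After that, the span should be searchable in Grafana under that service name.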
Ingester and storage
```yaml
ingester:
  trace_idle_period: 10s
  max_block_bytes: 1_000_000
  max_block_duration: 5m

storage:
  trace:
    backend: local
    block:
      bloom_filter_false_positive: .05
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks
    pool:
      max_workers: 100
      queue_depth: 10000
```
| Parameter | Value | Effect |
|---|---|---|
| `trace_idle_period` | 10s | A trace inactive for 10 s is considered finished and flushed |
| `max_block_bytes` | 1 MB | WAL blocks are cut at 1 MB |
| `max_block_duration` | 5m | Hard duration cap per block |
| `backend: local` | — | Filesystem storage (named volume `tempo-data`) |
| `wal.path` | `/var/tempo/wal` | Write-Ahead Log before flush |
| `local.path` | `/var/tempo/blocks` | Flushed blocks (immutable, compacted) |
| `bloom_filter_false_positive` | 0.05 | 5% false-positive rate on the index bloom filters |
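The first three ingester parameters drive two decisions. The first is when a trace counts as finished. The second is when the current block is cut. The Python sketch below is a simplified model of the semantics described in the table, not Tempo's Go implementation. The function and class names are hypothetical, and using `>=` in the comparisons is an assumption.

```python
from dataclasses import dataclass

# Values copied from the ingester block of tempo-config.yml.
TRACE_IDLE_PERIOD_S = 10
MAX_BLOCK_BYTES = 1_000_000
MAX_BLOCK_DURATION_S = 5 * 60


def trace_is_complete(last_span_at_s: float, now_s: float) -> bool:
    """A trace that received no new span for trace_idle_period counts as finished."""
    return now_s - last_span_at_s >= TRACE_IDLE_PERIOD_S


@dataclass
class HeadBlock:
    """Hypothetical model of the block currently being written."""
    created_at_s: float
    size_bytes: int = 0

    def should_cut(self, now_s: float) -> bool:
        """Cut when the block is too large OR too old, whichever happens first."""
        return (self.size_bytes >= MAX_BLOCK_BYTES
                or now_s - self.created_at_s >= MAX_BLOCK_DURATION_S)
```

In this model, a lightly loaded instance cuts its blocks when they reach the 5-minute cap, well before they reach 1 MB.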
Server
| Parameter | Value | Effect |
|---|---|---|
| `http_listen_port` | 3200 | HTTP query API (used by the Grafana Tempo datasource) |
| `log_level` | warn | Log level for Tempo itself (not verbose) |
| `stream_over_http_enabled` | true | Enables HTTP streaming for long searches |
The default host port is 19200 (variable TEMPO_HTTP_PORT). The container internal port remains 3200 — only the host mapping changes.
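Because only the host mapping changes, the base URL of the query API depends on where the caller runs. The Python sketch below builds URLs for two endpoints of Tempo's HTTP API: trace by ID (`/api/traces/<traceID>`) and TraceQL search (`/api/search` with the `q` and `limit` parameters). It assumes two things. Containers on the compose network reach Tempo as `perfshop-tempo:3200`, the same Docker DNS name the Java agent uses for port 4317. The host uses `localhost:19200`. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumptions: compose-network callers use the container port, host callers
# use TEMPO_HTTP_PORT (default 19200).
CONTAINER_PORT = 3200
DEFAULT_HOST_PORT = 19200


def tempo_base_url(from_host: bool, host_port: int = DEFAULT_HOST_PORT) -> str:
    """Pick the host mapping or the in-network address of the query API."""
    if from_host:
        return f"http://localhost:{host_port}"
    return f"http://perfshop-tempo:{CONTAINER_PORT}"


def trace_url(base: str, trace_id: str) -> str:
    """Trace-by-ID endpoint of the Tempo HTTP API."""
    return f"{base}/api/traces/{trace_id}"


def search_url(base: str, traceql: str, limit: int = 20) -> str:
    """TraceQL search endpoint; the query is URL-encoded into the q parameter."""
    return f"{base}/api/search?{urlencode({'q': traceql, 'limit': limit})}"
```

For example, `search_url(tempo_base_url(True), '{resource.service.name="perfshop"}')` targets the host port 19200. Grafana, which runs inside the compose network, talks to port 3200.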
metrics_generator — the game changer
```yaml
metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: perfshop
      job: perfshop-backend
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true

overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - span-metrics
```
This is where Tempo becomes much more than a simple trace store. The metrics_generator analyzes spans in real time and generates two families of derived metrics that it pushes into Prometheus via remote-write:
Family 1 — service-graphs
For each pair of entities (calling service → called service), the processor generates:
| Metric | Type | Description |
|---|---|---|
| `traces_service_graph_request_total` | counter | Number of calls |
| `traces_service_graph_request_failed_total` | counter | Number of failed calls |
| `traces_service_graph_request_server_seconds_*` | histogram | Server-side latency |
| `traces_service_graph_request_client_seconds_*` | histogram | Client-side latency |
These metrics feed Grafana's Service Map (Tempo datasource → Service Graph tab) which displays a directed graph of dependencies between services.
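The edges of the graph come from pairing a client span in one service with the server span it caused in another. The Python sketch below is a simplified model of that pairing, not Tempo's implementation. A server span is matched to the client span whose ID is its parent span ID. Each matched pair increments the request counter of the edge. The sketch marks a call as failed when either side is in error, which is an assumption. Tempo can also derive nodes for uninstrumented peers such as the database from client-span attributes, and this sketch leaves that out. The `Span` type and the `frontend` service name are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical minimal span: only the fields the pairing needs."""
    span_id: str
    parent_span_id: Optional[str]
    service: str
    kind: str           # "client" or "server" (simplified span kinds)
    error: bool = False


def service_graph_edges(spans):
    """Pair server spans with their parent client spans and count calls per edge."""
    clients = {s.span_id: s for s in spans if s.kind == "client"}
    total, failed = Counter(), Counter()
    for server in (s for s in spans if s.kind == "server"):
        client = clients.get(server.parent_span_id)
        if client is None:
            continue  # no matching client span: skipped in this sketch
        edge = (client.service, server.service)
        total[edge] += 1   # ~ traces_service_graph_request_total
        if client.error or server.error:
            failed[edge] += 1  # ~ traces_service_graph_request_failed_total
    return total, failed
```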
Family 2 — span-metrics
For each instrumented operation (HTTP URI, method, status), the processor generates:
| Metric | Type | Description |
|---|---|---|
| `traces_spanmetrics_calls_total{service, span_name, status_code, ...}` | counter | Number of invocations |
| `traces_spanmetrics_latency_*{service, span_name, ...}` | histogram | Latency distribution |
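Each span lands in one calls counter and one cumulative latency histogram, both keyed by its label set. The Python sketch below models that aggregation. The bucket boundaries are hypothetical, and Tempo's own defaults may differ. The span names and status values are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical "le" boundaries in seconds; Tempo's defaults may differ.
BUCKETS = [0.01, 0.05, 0.1, 0.5, 1.0, float("inf")]


def aggregate(spans):
    """spans: iterable of (service, span_name, status_code, duration_seconds).

    Returns a calls counter and cumulative bucket counts per label set,
    mimicking traces_spanmetrics_calls_total and the latency _bucket series.
    """
    calls = Counter()
    buckets = defaultdict(lambda: [0] * len(BUCKETS))
    for service, name, status, duration in spans:
        key = (service, name, status)
        calls[key] += 1
        # Cumulative histogram: a sample counts in every bucket it fits under.
        for i, upper in enumerate(BUCKETS):
            if duration <= upper:
                buckets[key][i] += 1
    return calls, dict(buckets)
```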
The span-metrics series are what make the "P50 / P95 / P99 latency — all routes" panel of the Instructor APM dashboard work. The panel queries Prometheus, not Tempo, but the series it reads are derived from spans.
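A percentile panel on these series usually relies on a query of the form `histogram_quantile(0.95, sum by (le) (rate(traces_spanmetrics_latency_bucket[5m])))`. That exact query is an assumption and is not copied from the dashboard. The Python sketch below reimplements the classic linear interpolation that `histogram_quantile` applies to cumulative buckets. It shows why the estimated P95 can only be as precise as the bucket boundaries.

```python
import math


def histogram_quantile(q, buckets):
    """Prometheus-style quantile estimate from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by bound, the
    last one being +Inf, as in traces_spanmetrics_latency_bucket series.
    """
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = next((i for i in range(len(buckets) - 1) if buckets[i][1] >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank lands in +Inf: use the highest finite bound
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    start, end, count = 0.0, buckets[b][0], buckets[b][1]
    if b > 0:
        start = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    if count == 0:
        return math.nan
    # Linear interpolation inside the selected bucket.
    return start + (end - start) * rank / count
```

Take buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (inf, 100)]`. The 95th observation falls in the 0.5 to 1.0 s bucket, halfway through its 10 samples, so the estimate is 0.75 s whatever the true latencies inside that bucket are.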
External labels¶
These three labels are added to all the metrics pushed by Tempo into Prometheus. The label job: perfshop-backend aligns the span-metrics with the other Spring Boot metrics, which makes it possible to mix the two in a single PromQL query.
Exemplars¶
Exemplars are links from a metric to a specific trace_id. With send_exemplars: true, each histogram sample can carry a trace_id that flows up to Grafana — the user then sees clickable points on latency graphs, and a click opens the corresponding trace. It is one of the three metrics ↔ traces correlation mechanisms in the stack.
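Following an exemplar comes down to reading the trace ID attached to a histogram sample and looking that trace up in Tempo. The Python sketch below picks the slowest exemplar that carries a trace ID and resolves it to the trace-by-ID endpoint of Tempo's HTTP API. Grafana performs the equivalent lookup through the Tempo datasource. The `Exemplar` type is hypothetical. The trace ID label name `traceID` is also an assumption, so it is kept as a parameter.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Exemplar:
    """Hypothetical exemplar: a histogram observation plus its labels."""
    value: float          # observed latency in seconds
    timestamp_ms: int
    labels: dict = field(default_factory=dict)


# Assumption: the trace ID travels under this exemplar label name.
TRACE_ID_LABEL = "traceID"


def slowest_trace_url(exemplars, base_url: str,
                      label: str = TRACE_ID_LABEL) -> Optional[str]:
    """Return the Tempo trace URL of the slowest exemplar carrying a trace ID."""
    candidates = [e for e in exemplars if label in e.labels]
    if not candidates:
        return None
    slowest = max(candidates, key=lambda e: e.value)
    return f"{base_url}/api/traces/{slowest.labels[label]}"
```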
OpenTelemetry agent on the backend side
The OpenTelemetry Java agent is embedded in the `perfshop-backend` image (path `/agents/opentelemetry-javaagent.jar`) and activated via `JAVA_OPTS` in the compose files:
```text
-javaagent:/agents/opentelemetry-javaagent.jar
-Dotel.service.name=perfshop
-Dotel.exporter.otlp.endpoint=http://perfshop-tempo:4317
-Dotel.exporter.otlp.protocol=grpc
-Dotel.traces.exporter=otlp
-Dotel.metrics.exporter=none
-Dotel.logs.exporter=none
-Dotel.instrumentation.http.capture-headers.server.request=X-Admin-Token,Content-Type
-Dotel.instrumentation.jdbc.captured-statements.enabled=true
-Dotel.span.attribute.count.limit=256
```
Property breakdown
| Property | Pedagogical effect |
|---|---|
| `otel.service.name=perfshop` | All spans carry `service.name=perfshop`. It appears in Tempo attributes and enables the TraceQL filter `{resource.service.name="perfshop"}` |
| `otel.exporter.otlp.endpoint=http://perfshop-tempo:4317` | Targets the gRPC endpoint of the Tempo collector via Docker DNS |
| `otel.exporter.otlp.protocol=grpc` | Forces gRPC explicitly. The agent's default protocol depends on its version (`grpc` in 1.x, `http/protobuf` since 2.0), and only gRPC matches port 4317 |
| `otel.traces.exporter=otlp` | OTLP export of traces only |
| `otel.metrics.exporter=none` | No OTel export of metrics (Prometheus is the preferred sink) |
| `otel.logs.exporter=none` | No OTel export of logs (Loki/OpenSearch are the preferred sinks) |
| `otel.instrumentation.http.capture-headers.server.request=X-Admin-Token,Content-Type` | Explicitly captures these request headers on the server side. This is what enables the instructor panel "Traces with admin trigger — X-Admin-Token captured" |
| `otel.instrumentation.jdbc.captured-statements.enabled=true` | Captures JDBC SQL statements. They are displayed in the Tempo span details and used by the "Instrumented SQL traces" panel of the Instructor APM dashboard (useful for the Security S1 — SQL injection scenario) |
| `otel.span.attribute.count.limit=256` | Caps each span at 256 attributes, double the OpenTelemetry default of 128. Attributes beyond the cap are dropped, which keeps span size bounded |
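The OpenTelemetry specification defines what happens at the attribute cap. Once the limit is reached, new keys are discarded, while existing keys can still be updated, and the number of dropped attributes is recorded. The Python sketch below models that behaviour. It is not the Java SDK's code, and the class name and attribute keys are illustrative.

```python
class LimitedAttributes:
    """Span attribute map enforcing a count limit as the OTel spec describes:
    once full, new keys are dropped (and counted); existing keys may be updated."""

    def __init__(self, limit: int = 256):
        self.limit = limit
        self.attrs = {}
        self.dropped = 0  # analogous to a span's dropped-attributes count

    def set(self, key: str, value) -> None:
        if key in self.attrs or len(self.attrs) < self.limit:
            self.attrs[key] = value
        else:
            self.dropped += 1
```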
Auto-instrumentation
The OpenTelemetry Java agent automatically instruments more than 100 libraries and frameworks. For PerfShop, the most useful are:
| Instrumentation | Generated spans |
|---|---|
| Spring Web MVC | HTTP GET /api/products, HTTP POST /api/orders, etc. |
| HikariCP / JDBC | SELECT users.*, INSERT orders, with SQL statement capture |
| Hibernate ORM | Hibernate Session.flush, etc. |
| Tomcat HTTP server | Root server spans for each request |
| java.net.http / OkHttp | Client spans for outgoing HTTP calls |
| Logback / SLF4J MDC | Automatic injection of trace_id and span_id into logs (for Loki correlation) |
No line of Spring code needs to be modified to benefit from this instrumentation — the agent does everything at JVM startup.
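The agent's Logback MDC instrumentation exposes `trace_id` and `span_id` as MDC keys, so a log pattern can print them on every line. Logs and traces are correlated by reading those IDs back out of the log line. The Python sketch below assumes a hypothetical layout where the pattern writes `trace_id=... span_id=...`, and it extracts both IDs with a regular expression. When no span is active, the MDC values are empty and nothing matches.

```python
import re
from typing import Optional, Tuple

# Hypothetical log layout: assumes the Logback pattern prints
# "trace_id=%X{trace_id} span_id=%X{span_id}" somewhere in each line.
TRACE_RE = re.compile(r"\btrace_id=([0-9a-f]{32})\b")
SPAN_RE = re.compile(r"\bspan_id=([0-9a-f]{16})\b")


def extract_ids(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (trace_id, span_id) found in a log line, None for missing ones."""
    t = TRACE_RE.search(line)
    s = SPAN_RE.search(line)
    return (t.group(1) if t else None, s.group(1) if s else None)
```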
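Grafana datasource — correlations
The Tempo datasource in `grafana/provisioning/datasources/tempo.yml` enables the correlation mechanisms already described in grafana.md (`tracesToLogsV2`, `tracesToMetrics`, `serviceMap`). It also enables the node graph and Loki search views: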
```mermaid
flowchart LR
    TEMPO["Tempo datasource"]
    TEMPO -->|tracesToLogsV2| L["Loki<br/>±1 min filter<br/>around the span"]
    TEMPO -->|tracesToMetrics| P["Prometheus<br/>operation metrics"]
    TEMPO -->|serviceMap| SM["Service Map<br/>(via traces_service_graph_*)"]
    TEMPO -->|nodeGraph| NG["Node Graph<br/>(graph view)"]
    TEMPO -->|lokiSearch| LS["Loki Search<br/>(free-form search)"]
```
The typical pedagogical flow:
- The instructor activates a chaos scenario that causes `NullPointerException`s.
- They open the Instructor APM dashboard and look at the "F1 — NullPointerException traces" panel (TraceQL `{span.exception.type="NullPointerException"}`).
- They click on a trace to see the span details.
- They click on a span → `tracesToLogsV2` opens Loki with the time filter and the `traceID` → they see the Spring Boot logs around the exception.
- Optionally, they click on `tracesToMetrics` to see the Prometheus latency of the operation over the same window.
- To understand the global context, they open the Service Map of the Tempo datasource, which displays the `client → perfshop-app → mysql` graph with average latencies on each edge.
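The jump from a span to Loki amounts to a time window and a query. Grafana expresses the window through the `spanStartTimeShift` and `spanEndTimeShift` options of `tracesToLogsV2`, and the ±1 minute values below are assumed from the diagram. The Python sketch below widens a span by those shifts and builds a LogQL query with a line filter on the trace ID. The stream selector `{app="perfshop"}` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: -1 min / +1 min shifts, matching the ±1 min window of the datasource.
START_SHIFT = timedelta(minutes=-1)
END_SHIFT = timedelta(minutes=1)


def loki_window(span_start: datetime, span_end: datetime):
    """Time range opened in Loki for a span: the span widened by both shifts."""
    return span_start + START_SHIFT, span_end + END_SHIFT


def loki_query(trace_id: str, selector: str = '{app="perfshop"}') -> str:
    """LogQL: hypothetical stream selector plus a line filter on the trace ID."""
    return f'{selector} |= "{trace_id}"'
```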
Volumes and persistence
| Volume | Mount | Content |
|---|---|---|
| `tempo-data` (named volume) | `/var/tempo` | WAL, blocks, generator WAL |
| `./tempo/tempo-config.yml` (bind mount) | `/etc/tempo/tempo-config.yml` | Tempo configuration (read-only) |
Ports
| Service | Host port | Container port | Env variable | Usage |
|---|---|---|---|---|
| `perfshop-tempo` | 19200 | 3200 | `TEMPO_HTTP_PORT` | HTTP query API (Grafana datasource) |
| `perfshop-tempo` | 4317 | 4317 | `TEMPO_OTLP_GRPC_PORT` | OTLP gRPC (export from the Java agent) |
| `perfshop-tempo` | 4318 | 4318 | `TEMPO_OTLP_HTTP_PORT` | OTLP HTTP (alternative) |
To go further
- Overview — four signals and correlations
- Grafana — Tempo datasource and `tracesToLogsV2`, `tracesToMetrics` correlations
- Shipped dashboards — details of the TraceQL panels of the Instructor APM dashboard
- Prometheus — reception of span-metrics via remote-write
- Pyroscope — the other agent embedded in the backend image