
Shipped dashboards

PerfShop ships 10 Grafana dashboards organized into two folders — Students (5 dashboards with anonymous access) and Instructors (5 Admin-only dashboards). This page lists each dashboard, its UID, target audience, primary datasource, and all of its panels, read directly from the source JSON files.

Source of truth

All panel titles, UIDs, and datasources on this page are extracted programmatically from the grafana/dashboards/eleves/*.json and grafana/dashboards/formateurs/*.json files. No title is paraphrased: it is the exact label as displayed to the user.
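As an illustration of this kind of extraction, the sketch below walks a dashboard folder and lists each file's UID and panel titles. It is not the script actually used to build this page: the function names are hypothetical, and it assumes the standard Grafana dashboard JSON layout (a top-level uid and title, a panels list, and collapsed rows that nest their children under their own panels key). Row panels are reported like any other panel, which may or may not match how the counts on this page were obtained.

```python
import json
from pathlib import Path


def iter_panels(panels):
    """Yield every panel, descending into rows that nest their children."""
    for panel in panels:
        yield panel
        # A collapsed row keeps its child panels in its own "panels" list.
        yield from iter_panels(panel.get("panels", []))


def summarize_dashboard(path):
    """Return the UID, title and (panel title, panel type) pairs of one file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    panels = [
        (p.get("title", ""), p.get("type", ""))
        for p in iter_panels(data.get("panels", []))
    ]
    return {"uid": data.get("uid"), "title": data.get("title"), "panels": panels}


def summarize_folder(folder):
    """Summarize every *.json dashboard in a folder, keyed by file name."""
    return {p.name: summarize_dashboard(p) for p in sorted(Path(folder).glob("*.json"))}
```

For example, summarize_folder("grafana/dashboards/eleves") would return one entry per Student dashboard file.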

Summary table

| Folder | File | UID | Primary datasource(s) | Panels |
|---|---|---|---|---|
| Students | dashboard-apm-eleve.json | perfshop-apm-eleve | Prometheus + Tempo + Pyroscope | 22 |
| Students | dashboard-backend-eleve.json | perfshop-backend-eleve | Prometheus | 19 |
| Students | dashboard-frontend-eleve.json | perfshop-frontend-eleve | Prometheus | 23 |
| Students | dashboard-jmeter.json | perfshop-jmeter-live | Prometheus + Loki | 22 |
| Students | dashboard-logs-eleve.json | perfshop-logs-eleve | Loki | 9 |
| Instructors | dashboard-apm-formateur.json | perfshop-apm-formateur | Prometheus + Tempo + Pyroscope | 23 |
| Instructors | dashboard-backend-formateur.json | perfshop-backend-formateur | Prometheus | 27 |
| Instructors | dashboard-frontend-formateur.json | perfshop-frontend-formateur | Prometheus | 28 |
| Instructors | dashboard-logs-formateur.json | perfshop-logs-formateur | Loki | 12 |
| Instructors | perfshop-general-v1.json | perfshop-general-v1 | Prometheus (docker job) | 16 |

Total: 201 panels across 10 dashboards. The perfshop-general-v1 dashboard is configured as the home dashboard by the Grafana seed — this is what anonymous users see when landing on the Grafana URL without specifying a path.
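The total can be re-checked from the summary table. The counts below are copied from that table; the variable names are illustrative.

```python
# Panel counts per dashboard UID, copied from the summary table.
STUDENTS = {
    "perfshop-apm-eleve": 22,
    "perfshop-backend-eleve": 19,
    "perfshop-frontend-eleve": 23,
    "perfshop-jmeter-live": 22,
    "perfshop-logs-eleve": 9,
}
INSTRUCTORS = {
    "perfshop-apm-formateur": 23,
    "perfshop-backend-formateur": 27,
    "perfshop-frontend-formateur": 28,
    "perfshop-logs-formateur": 12,
    "perfshop-general-v1": 16,
}

students_total = sum(STUDENTS.values())        # 95
instructors_total = sum(INSTRUCTORS.values())  # 106
total = students_total + instructors_total     # 201
dashboards = len(STUDENTS) + len(INSTRUCTORS)  # 10

print(f"{total} panels across {dashboards} dashboards")
```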


Students folder

perfshop-apm-eleve — Student APM (Traces + Latencies)

Target audience: a student who wants to understand global backend latencies, observe recent traces, and explore a CPU flamegraph.

Sections and panels:

| # | Section | Panels |
|---|---|---|
| 1 | Overview — Application health | Traces received / min (stat) · P95 latency — all routes (stat) · HTTP 5xx error rate (stat) · JVM heap used (stat) |
| 2 | Latencies — Spotting degradations | P50 / P95 / P99 latency — all routes (timeseries) |
| 3 | Traces — Searching for anomalies | Pedagogical guide (text) · HTTP 5xx errors (last hour) (stat) · Instrumented operations (stat) · Max observed latency (stat) · "TraceQL toolbox" (text) · Recent traces — click a Trace ID to explore (TraceQL {} table) |
| 4 | JVM — Memory & Threads | JVM heap — Used / Max (timeseries) · JVM threads — live / daemon (timeseries) |
| 5 | CPU profiling — Pyroscope flamegraphs | Pedagogical guide (text) · CPU Flamegraph — Linux (perf_event) (flamegraph) · CPU Flamegraph — Docker Desktop / Windows / macOS (itimer) (flamegraph) · Lock Contention Flamegraph — JVM locks (A8 Race Condition) (flamegraph) |

Pedagogical specificity: two CPU flamegraph panels coexist — one uses perf_event (which works on native Linux hosts) and the other uses itimer (which works on Docker Desktop). One will be empty depending on the environment, but the student can see both and understand the difference.

Pyroscope datasource is used for the flamegraphs; Tempo for the TraceQL table; Prometheus for everything else.


perfshop-backend-eleve — Student Backend (Analysis)

Target audience: a student who wants to understand the health of the Spring Boot backend — CPU, JVM, threads, database, HTTP latencies.

Sections and panels:

| # | Section | Panels |
|---|---|---|
| 1 | Real-time indicators | Application CPU (stat) · JVM memory (stat) · JVM threads (stat) · Active DB connections (stat) · HTTP threads busy (stat) · p99 response time (stat) |
| 2 | Application metrics | CPU — Application container (timeseries) · JVM heap — used vs max (timeseries) · Tomcat HTTP threads — busy vs max (timeseries) · GC — Pause time (timeseries) · DB connections — HikariCP pool state (timeseries) · DB connection acquisition time (ms) (timeseries) · HTTP request rate — 2xx / 5xx (timeseries) · HTTP response time — p50 / p95 / p99 (timeseries) · /api/products latency — p50 / p95 / p99 (timeseries) · /api/products HTTP errors (timeseries) · GC — Collection frequency (timeseries) · JVM threads — states (timeseries) |

Single datasource: Prometheus.

Metrics used (all listed in prometheus.md): docker_container_cpu_percent, jvm_memory_used_bytes, jvm_memory_max_bytes, jvm_threads_live_threads, jvm_threads_states_threads, hikaricp_connections_*, tomcat_threads_*, http_server_requests_seconds_*, jvm_gc_pause_seconds_*.
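Percentile panels such as "HTTP response time — p50 / p95 / p99" are usually built on PromQL's histogram_quantile. The sketch below only generates such a query string. It assumes that the http_server_requests_seconds_bucket series is exposed (percentile histograms enabled on the backend), a 5-minute rate window, and an aggregation by le only; the function name is hypothetical and the actual expressions in the dashboard JSON may differ.

```python
def latency_quantile_query(quantile, uri=None, window="5m"):
    """Build a PromQL latency quantile over the Spring Boot HTTP histogram."""
    # Optional exact-match filter on the uri label, e.g. {uri="/api/products"}.
    matcher = f'{{uri="{uri}"}}' if uri else ""
    return (
        f"histogram_quantile({quantile}, "
        f"sum by (le) (rate(http_server_requests_seconds_bucket{matcher}[{window}])))"
    )


for q in (0.5, 0.95, 0.99):
    print(latency_quantile_query(q, uri="/api/products"))
```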


perfshop-frontend-eleve — Student Frontend (Analysis)

Target audience: a student who wants to observe the impact of a chaos scenario on the frontend, both on the nginx container side and on the browser side (Web Vitals).

Sections and panels:

| # | Section | Panels |
|---|---|---|
| 1 | Nginx container | Container CPU · Container RAM · RAM % · Network IN · Network OUT · Processes (6 stats) |
| 2 | Client browser — JS performance | FPS · JS heap · Long Tasks/s · Fetch requests/s · DOM nodes · Fetch requests/s (6 stats — Web Vitals pushed by chaos-agent.js) |
| 3 | Worker CPU & metric freshness | Active Worker CPU (stat) · Client metrics age (s) (stat) · Fetch requests/s — trend (timeseries) |
| 4 | Time series — Container | CPU — Nginx container · RAM — usage vs limit · Network IN / OUT (bytes/s) · Disk I/O (bytes/s) (4 timeseries) |
| 5 | Time series — Browser JS | FPS (frames per second) · JS heap used (MB) · Long Tasks per second (tasks >50ms) · Fetch requests issued (req/s) · DOM nodes · Container OUT network (bytes/s) (6 timeseries) |

Specificity: this dashboard mixes two very different metric families — docker_container_* (job perfshop-docker, periodic scrape) and perfshop_client_* (pushed in real time by chaos-agent.js from the student's browser). See dashboard-html.md for the details of the client → backend → Prometheus flow.

Single datasource: Prometheus.


perfshop-jmeter-live — JMeter Load Test

Target audience: a student who launches a JMeter run via perfshop-jmeter-ui and wants to track the run in real time as well as the impact on the backend.

Sections and panels:

| # | Section | Panels |
|---|---|---|
| 1 | JMeter — Current run overview | Active vUsers · Transactions/s · Error rate % · P95 (ms) · Total samples · Mean time (ms) (6 stats) |
| 2 | Response times | P50 / P95 / P99 / Avg latency (ms) · P95 / P99 latency by label (sampler) (2 timeseries) |
| 3 | Throughput & errors | Transactions/s (success + errors) · Success / error rate % (by label) · Transactions/s by label (sampler) · Cumulative transactions by label (3 timeseries + 1 bargauge) |
| 4 | Virtual users (threads) | Active / started / finished threads · Cumulative samples (total) (2 timeseries) |
| 5 | PerfShop backend correlation (JVM + Spring Boot) | Backend JVM heap (used / max) · Backend HTTP error rate (4xx / 5xx) · Backend P50 / P95 / P99 latency (Spring Boot) · JVM threads (daemon / non-daemon) · Backend CPU (process) · Backend latency by endpoint (P95) (6 timeseries) |
| 6 | Logs — JMeter & JMeter UI | Logs — perfshop-jmeter (JMeter runs) (Loki logs) · Logs — perfshop-jmeter-ui (Node.js API) (Loki logs) |

Specificity: this is the only Student dashboard that combines Prometheus (jmeter_* metrics and backend metrics) with Loki (JMeter and JMeter UI logs). It is used to correlate in real time: "my run applies 200 vUsers → what is the impact on the JVM heap and the HikariCP pool?"

Loki datasource is used on the last two panels via the selectors {container="perfshop-jmeter"} and {container="perfshop-jmeter-ui"}.


perfshop-logs-eleve — Student Logs (Filtered)

Target audience: a student who wants to read backend logs, errors, and frontend nginx and database logs — with filtering that excludes the chaos engine internal logs (so as not to spoil what is happening under the hood).

Sections and panels:

| # | Section | Panels |
|---|---|---|
| 1 | Spring Boot backend logs | Application logs — Spring Boot backend (Loki logs, exclusion filter [BusinessChaos], [BackendChaos], [SecurityChaos], [ChaosInterceptor], [FrontendChaos], [ChaosScripting], chaos_intensity) |
| 2 | Errors & exceptions | Backend errors only (ERROR / Exception) (Loki logs, same exclusions + \|= "ERROR") |
| 3 | Frontend & database logs | Nginx logs — Frontend HTTP access (Loki logs) · MySQL logs — Database (Loki logs, exclusion filter [note]) |
| 4 | Analysis — Volume & trends | Log volume by level (excluding chaos) (timeseries — count_over_time with chaos exclusion for ERROR/WARN/INFO) · Nginx HTTP errors (4xx / 5xx) (timeseries — count_over_time \| " 4" and \| " 5") |

Exclusion strategy: the LogQL query uses several != to hide all internal logs from the chaos engine. This is intentional — the student must be able to observe the impact (exception, latency) without seeing the implementation details. Instructors have their own logs dashboard without these exclusions.
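A minimal sketch of how such a query can be assembled by chaining one != line filter per marker. The markers are the ones listed in the table above; the {container="perfshop-app"} selector is borrowed from the Instructor logs dashboard and is an assumption for the Student one, as is the function name.

```python
# Markers excluded from the Student view, as listed in the panel description.
CHAOS_MARKERS = [
    "[BusinessChaos]",
    "[BackendChaos]",
    "[SecurityChaos]",
    "[ChaosInterceptor]",
    "[FrontendChaos]",
    "[ChaosScripting]",
    "chaos_intensity",
]


def student_log_query(selector='{container="perfshop-app"}', markers=CHAOS_MARKERS,
                      errors_only=False):
    """Build a LogQL query that hides every line containing a chaos marker."""
    parts = [selector]
    # Each != filter drops the lines that contain the given substring.
    parts += [f'!= "{marker}"' for marker in markers]
    if errors_only:
        # Keep only the lines that contain ERROR, as in the errors panel.
        parts.append('|= "ERROR"')
    return " ".join(parts)


print(student_log_query())
print(student_log_query(errors_only=True))
```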

Single datasource: Loki.


Instructors folder

perfshop-apm-formateur — Instructor APM (Tempo + Pyroscope)

Target audience: an instructor who wants to investigate in depth — advanced TraceQL, panels by exception type, capture of admin headers.

Sections and panels:

| # | Section | Panels |
|---|---|---|
| 1 | Distributed traces — Tempo | Traces / min · P95 latency — /api/orders · HTTP 5xx errors (traces) · Functional Chaos level · JVM heap used · HTTP 5xx errors (5 min) · F4 — Silent corruptions (5 min) (7 stats) |
| 2 | Latencies & error rates — Spanmetrics (Tempo → Prometheus) | P50 / P95 / P99 latency — all routes (timeseries) · Error rate per operation (timeseries) |
| 3 | Trace explorer — Tempo | Traces with admin trigger — X-Admin-Token captured (TraceQL {span.http.request.header.x-admin-token != ""} table) · Instrumented SQL traces — captured JDBC queries (S1 SQLi) (TraceQL {span.db.statement != ""} table) · F1 — NullPointerException traces (level 1+) (TraceQL {span.exception.type="NullPointerException"}) · F2 — StackOverflowError traces (level 2+) (TraceQL {span.exception.type="StackOverflowError"}) · F3 — OutOfMemoryError traces (level 3+) (TraceQL {span.exception.type="OutOfMemoryError"}) · Recent traces — all (filterable by service/operation) (TraceQL {}) |
| 4 | Continuous profiling — Pyroscope (flamegraphs) | CPU Flamegraph — Linux (perf_event) · CPU Flamegraph — Docker Desktop / Windows / macOS (itimer) · Lock Contention Flamegraph — JVM locks (A8 Race Condition) · Heap Flamegraph — memory allocations (F3-OOM) (4 flamegraphs) |
| 5 | JVM details — Memory / Threads / GC | JVM heap — Used / Committed / Max (timeseries) · JVM threads — live / daemon / peak (timeseries) |

Specificity: this is the most powerful dashboard — it combines advanced TraceQL (filtering by exception type, by HTTP header, by SQL statement), Pyroscope flamegraphs (4 different views including a heap one to track leaks), and the spanmetrics generated by Tempo (metrics_generator which pushes aggregated latencies into Prometheus).

The Traces with admin trigger panel deserves special attention: thanks to the JVM option -Dotel.instrumentation.http.capture-headers.server.request=X-Admin-Token, the OpenTelemetry agent captures the value of the X-Admin-Token header in each span. The instructor can therefore see which requests were triggered by an authenticated admin call, which is useful for auditing a demonstration.


perfshop-backend-formateur — Instructor Backend (Analysis)

Target audience: extended version of the Student Backend dashboard, with additional sections on chaos state.

Sections and panels:

| # | Section | Panels |
|---|---|---|
| 1 | Real-time indicators | Identical to the Student dashboard (6 stats) |
| 2 | Application metrics | Identical to the Student dashboard (12 timeseries) |
| 3 | Chaos state — Instructor | Chaos anomaly intensity (bargauge) · Chaos anomaly evolution over time (multi-series timeseries chaos_intensity{type="cpu\|memory\|thread_pool\|db_pool\|slow_query\|deadlock\|network"}) |
| 4 | Scripting Chaos — HTTP activity | (marker section, no dedicated panel) |
| 5 | Security Chaos — OWASP activity | Security Chaos requests by type (timeseries rate(...uri=~".*/api/security.*"), rate(...uri=~".*/api/chaos.*")) · Checkout & Auth activity — Endpoints subject to Scripting Chaos (timeseries rate(...uri=~".*/api/auth.*"), rate(...uri=~".*/api/checkout.*\|.*/api/orders.*"), rate(...status=~"4..")) · 4xx error rate (rejected tokens) (stat) · Checkout p99 latency (stat histogram_quantile(0.99, ...uri=~".*/api/checkout.*\|.*/api/orders.*")) · HTTP success rate (%) (stat) |

Single datasource: Prometheus.

Specificity: the "Chaos state" section exposes the custom gauge chaos_intensity{type=...} declared by the backend ChaosService. The instructor sees at a glance which chaos types are active and at what intensity.


perfshop-frontend-formateur — Instructor Frontend (Analysis)

Target audience: extended version of the Student Frontend dashboard, with additional sections on Frontend Chaos correlation.

Sections and panels:

| # | Section | Panels |
|---|---|---|
| 1 to 5 | (Identical to the Student dashboard) | 23 panels reproduced identically |
| 6 | Frontend Chaos — State & impact | Active Worker CPU (stat) · Fetch flood — client req/s (stat) · Browser FPS (stat) · Client metrics age (s) (stat) · Correlation — FPS, Fetch flood, Worker CPU, Long Tasks (multi-series timeseries) |
| 7 | Detailed impact — Observed degradation | FPS vs Long Tasks — UI degradation (CPU chaos) (timeseries) · Fetch flood vs JS heap — Correlation (fetch + memory chaos) (timeseries) |

Specificity: the two final panels in section 7 are correlation panels that overlay several browser metrics to show the effect of a frontend chaos on Web Vitals. When the CPU Worker chaos is active, the instructor sees FPS drop and Long Tasks explode on the same graph instantly.


perfshop-logs-formateur — Instructor Logs (Complete)

Target audience: an instructor who wants to see everything — the chaos engine internal logs are included here, unlike the Student dashboard.

Sections and panels:

| # | Section | Panels |
|---|---|---|
| 1 | Search & filters | Spring Boot backend logs — Complete (instructor) (Loki logs {container="perfshop-app"} \| logfmt) |
| 2 | Chaos logs — Backend | BusinessChaos logs (Business A1-A16) only (\|= "[BusinessChaos]") · BackendChaos logs (Infra: CPU/Memory/Pool/Network) only (\|= "[BackendChaos]") · SecurityChaos logs only (\|= "[SecurityChaos]") · ChaosInterceptor logs — Sessions (\|= "[ChaosInterceptor]") |
| 3 | Frontend & database logs | Nginx logs — Frontend · MySQL logs — Database |
| 4 | Volume & error rates | Log volume by level (backend) (timeseries — sum by (level)(count_over_time({container="perfshop-app"} \| logfmt \| level="ERROR" [1m])), same for WARN and INFO) · Chaos log volume (BusinessChaos + BackendChaos + SecurityChaos) (timeseries — count_over_time separated by prefix [BusinessChaos], [BackendChaos], [SecurityChaos], [ChaosInterceptor]) |

Single datasource: Loki.

LogQL strategy: each chaos family prefixes its logs with a tag in brackets. The filter |= "[BusinessChaos]" only surfaces logs emitted by BusinessChaosService. For the breakdown by level (ERROR/WARN/INFO), the logfmt parser extracts the level label directly from the key=value pairs of the Spring Boot log line.
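To make the |= semantics concrete, the sketch below reproduces the per-family count locally. The LogQL |= filter is a case-sensitive substring match, which Python's in operator mirrors. The sample lines are invented for the example and do not come from PerfShop; the function name is hypothetical.

```python
# Bracketed tags used by the chaos log volume panel.
CHAOS_TAGS = ["[BusinessChaos]", "[BackendChaos]", "[SecurityChaos]", "[ChaosInterceptor]"]


def count_by_chaos_tag(lines, tags=CHAOS_TAGS):
    """Count, for each tag, the lines that contain it (like one |= filter per tag)."""
    counts = {tag: 0 for tag in tags}
    for line in lines:
        for tag in tags:
            if tag in line:  # case-sensitive substring test
                counts[tag] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample = [
    "INFO [BusinessChaos] anomaly enabled",
    "WARN [BackendChaos] cpu intensity raised",
    "INFO order created",
    "ERROR [BusinessChaos] anomaly failed",
]
print(count_by_chaos_tag(sample))
# {'[BusinessChaos]': 2, '[BackendChaos]': 1, '[SecurityChaos]': 0, '[ChaosInterceptor]': 0}
```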


perfshop-general-v1 — Container Overview (home dashboard)

Target audience: everyone — this is the home dashboard configured by the Grafana seed via PATCH /api/org/preferences {"homeDashboardUID":"perfshop-general-v1"}. Any visitor landing on the Grafana URL without specifying a path arrives here.
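The call can be sketched with the Python standard library. The path and payload are the ones quoted above; the base URL, the admin/admin credentials, and the use of basic authentication are assumptions, and the request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_home_dashboard_request(base_url, user, password, uid="perfshop-general-v1"):
    """Build (without sending) the PATCH request that sets the org home dashboard."""
    body = json.dumps({"homeDashboardUID": uid}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/org/preferences",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Assumed local Grafana URL and default credentials; adapt to the actual stack.
request = build_home_dashboard_request("http://localhost:3000", "admin", "admin")
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```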

Sections and panels:

| # | Section | Panels |
|---|---|---|
| 1 | Global overview | Total CPU (stat sum(docker_container_cpu_percent)) · Total RAM (stat sum(docker_container_mem_usage_bytes)) · Total IN network (stat sum(rate(docker_container_net_rx_bytes[1m]))) · Total OUT network (stat sum(rate(docker_container_net_tx_bytes[1m]))) |
| 2 | All-container comparison | CPU % — all containers (timeseries docker_container_cpu_percent) · RAM — all containers (timeseries docker_container_mem_usage_bytes) · Network IN — all containers · Network OUT — all containers · Disk I/O Read — all · Disk I/O Write — all (6 timeseries) |
| 3 | Comparative bargauges | CPU % — instant comparison (bargauge) · RAM % — instant comparison (bargauge) |
| 4 | Recap table | State of all containers (table with cpu_percent, mem_percent, mem_usage_bytes) |

Single datasource: Prometheus, perfshop-docker job exclusively.

Specificity: this is a dashboard placed in the Instructors folder, but it uses only the docker_container_* metrics (so no JVM panels, no HTTP panels). This is intentional — the "general" version focuses on infrastructure and container state, without going into application-level detail.

Visible containers

As explained in prometheus.md, only four containers are monitored by the perfshop-monitoring service that produces the docker_container_* metrics: perfshop-frontend, perfshop-app, perfshop-db, perfshop-monitoring. The other services in the stack (Grafana, Loki, Tempo, Squash TM, etc.) do not appear on this dashboard. This is consistent with the pedagogical use case: the front → back → DB chain is monitored during chaos demos, not the observability stack itself.


How to add or modify a dashboard

The shipped dashboards are reloaded every 10 seconds by Grafana (updateIntervalSeconds: 10 in dashboards.yml). Three ways to modify them:

| Approach | When? | Drawback |
|---|---|---|
| Direct edit in the Grafana UI | Quick test, exploration | Overwritten every 10 s by the file content |
| Edit the JSON file then copy into the bind mount | Development | Requires knowing the Grafana JSON structure |
| Export from the UI then replace the file | Lasting modification | The exported JSON contains useless UI keys to clean up |

To add a new shipped dashboard, drop a JSON file into grafana/dashboards/eleves/ or grafana/dashboards/formateurs/ depending on the target. Grafana will detect it within the next 10 seconds, with no container restart.
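Before dropping a file into one of these folders, a quick sanity check can avoid a provisioning error. The sketch below is a hypothetical helper, not part of PerfShop: it checks that the candidate parses as a JSON object, carries a uid and a title, and does not reuse the UID of an already shipped dashboard (Grafana requires dashboard UIDs to be unique).

```python
import json
from pathlib import Path


def load_dashboard(path):
    """Parse one dashboard file; return None when it is not a JSON object."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def shipped_uids(dashboards_root):
    """Map each UID found under the dashboards root to the file declaring it."""
    uids = {}
    for path in sorted(Path(dashboards_root).rglob("*.json")):
        data = load_dashboard(path)
        if data and data.get("uid"):
            uids[data["uid"]] = path
    return uids


def check_new_dashboard(candidate, dashboards_root):
    """Return a list of problems; an empty list means no issue was detected."""
    data = load_dashboard(candidate)
    if data is None:
        return ["file is not a valid JSON object"]
    problems = [f"missing '{key}'" for key in ("uid", "title") if not data.get(key)]
    owner = shipped_uids(dashboards_root).get(data.get("uid"))
    # A file already sitting in the tree must not be reported as clashing with itself.
    if owner is not None and owner.resolve() != Path(candidate).resolve():
        problems.append(f"uid already used by {owner.name}")
    return problems
```

A typical call would be check_new_dashboard("my-dashboard.json", "grafana/dashboards"), where my-dashboard.json is a placeholder name.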

To go further

  • Overview — global observability flow and correlations
  • Grafana — datasources, ACL, anonymous access
  • Prometheus — scraped metrics and PromQL examples
  • Loki — log pipelines and LogQL examples
  • Tempo — traces and metrics_generator
  • Pyroscope — CPU / heap / lock flamegraphs