Real-time HTML dashboard¶

perfshop-monitoring is a custom Node.js service that provides a real-time HTML dashboard alongside Grafana. It plays three distinct roles in the stack:

Docker metrics producer — it queries the Docker socket to compute CPU/RAM/network/I/O statistics for the main containers and exposes them in Prometheus format on its own /metrics route. This is the source that feeds the Prometheus perfshop-docker job and therefore all the "Containers" panels in the Grafana dashboards.
Browser metrics receiver — it receives, via POST, the Web Vitals sent by chaos-agent.js from the student's browser (FPS, JS heap, long tasks, fetch/s, DOM nodes, active worker CPU) and re-exposes them in /metrics under the perfshop_client_* prefix.
Standalone HTML dashboard — it serves a static HTML page that directly queries the Spring Boot Actuator API (via /api/prometheus-raw, which is a proxy) and the Docker socket, to offer a real-time view with no Grafana dependency.

Source of truth

This page is derived from monitoring/src/server.js (~440 lines), the bind mounts declared for the perfshop-monitoring service in the compose files, and monitoring/public/index.html for the served HTML dashboard.

Architecture¶

flowchart LR
  subgraph BROWSER["Student browser"]
    direction TB
    HTML["index.html<br/>(real-time dashboard)"]
    JS["chaos-agent.js<br/>(injected by the frontend)"]
  end

  subgraph MON["perfshop-monitoring (Node + Express)"]
    direction TB
    SVR["server.js"]
    CACHE[("statsCache<br/>TTL 5s")]
    CLIENT[("lastClientMetrics<br/>(Web Vitals)")]
  end

  SOCK["/var/run/docker.sock<br/>(bind mount RO)"]
  BE["perfshop-app:9090<br/>/actuator/prometheus<br/>/actuator/heapdump"]
  PROM["Prometheus<br/>(scrape /metrics 5s)"]

  HTML -->|"polling 2s<br/>GET /api/docker/all<br/>GET /api/prometheus-raw"| SVR
  JS -->|"POST /api/chaos/client-metrics<br/>(every 2s)"| SVR
  SVR -->|"docker API"| SOCK
  SVR -->|"fetch /actuator/prometheus<br/>(timeout 5s)"| BE
  SVR -->|"GET /api/heapdump<br/>(proxy timeout 60s)"| BE

  PROM -->|"scrape GET /metrics"| SVR
  SVR -.cache 5s.-> CACHE
  SVR -.last value.-> CLIENT

Runtime configuration¶

Environment variables¶

environment:
  APP_METRICS_URL: http://perfshop-app:9090/actuator/prometheus
  POLL_INTERVAL: 2000
  DOCKER_SOCKET: /var/run/docker.sock
  PUBLIC_API_URL: ${PUBLIC_API_URL:-http://localhost:8080}
  PUBLIC_MONITORING_URL: ${PUBLIC_MONITORING_URL:-http://localhost:3001}
  PUBLIC_GRAFANA_URL: ${PUBLIC_GRAFANA_URL:-http://localhost:3002}
  PUBLIC_CHAOS_URL: ${PUBLIC_CHAOS_URL:-http://localhost:3003}
  PERFSHOP_API_INTERNAL: ${PERFSHOP_API_INTERNAL:-http://perfshop-app:8080}
  PERFSHOP_LANG: ${PERFSHOP_LANG:-fr}

Variable	Effect
`APP_METRICS_URL`	Backend `/actuator/prometheus` endpoint (management port 9090, internal)
`POLL_INTERVAL=2000`	Backend polling every 2 seconds on the Node side
`DOCKER_SOCKET=/var/run/docker.sock`	Docker socket path (mounted as bind mount)
`PUBLIC_*_URL`	Injected into `window.__CONFIG__` on the browser side to generate UI links
`PERFSHOP_API_INTERNAL`	Backend endpoint for the `/api/admin/login` proxy
`PERFSHOP_LANG`	UI language (`fr` or `en`)

Bind mounts¶

volumes:
  - ./monitoring/public:/app/public
  - /var/run/docker.sock:/var/run/docker.sock:ro

Mount	Effect
`./monitoring/public:/app/public`	Static HTML/CSS/JS sources for the dashboard
`/var/run/docker.sock:/var/run/docker.sock:ro`	Read-only access to the Docker socket — enables querying the Docker API for container stats

Docker socket in read-only

The socket is mounted as :ro, but this protection is cosmetic: the flag applies to the socket file, not to the requests sent through it, so a client that talks to /var/run/docker.sock can potentially list, inspect, and stop containers (the Docker API has no GET/POST granularity on the socket side). This is acceptable because perfshop-monitoring is not publicly exposed and does not execute user code, but it should be kept in mind during a security audit.

Express routes¶

The service exposes about a dozen routes, organized into five families.

Family 1 — Static HTML pages¶

Route	Method	Description
`/`	GET	Serves `index.html` injecting `window.__CONFIG__ = {API_URL, MONITORING_URL, GRAFANA_URL, CHAOS_URL, LANG}` into `<head>`
`/config.js`	GET	Serves `window.__CONFIG__ = ...;` in JS format — used by HTML pages that load their config via `<script src="...">`
`/admin/`, `/css/`, `/js/`, `/i18n/`, `/fonts/*`	GET	Static assets served by `express.static` from `/app/public`
`/heapdump-widget.html`	GET	Mini HTML widget for the heapdump button
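The injection performed by the `/` route can be pictured with the minimal sketch below. It is not the code from server.js: `buildClientConfig` and `injectConfig` are hypothetical helper names, the mapping from `PUBLIC_*_URL` variables to `__CONFIG__` keys is inferred from the tables on this page, and the defaults are the ones from the compose excerpt.

```javascript
// Hypothetical helper: derives the browser-side config from environment variables.
// The key names follow the __CONFIG__ shape listed in the route table.
function buildClientConfig(env) {
  return {
    API_URL: env.PUBLIC_API_URL || 'http://localhost:8080',
    MONITORING_URL: env.PUBLIC_MONITORING_URL || 'http://localhost:3001',
    GRAFANA_URL: env.PUBLIC_GRAFANA_URL || 'http://localhost:3002',
    CHAOS_URL: env.PUBLIC_CHAOS_URL || 'http://localhost:3003',
    LANG: env.PERFSHOP_LANG || 'fr',
  };
}

// Hypothetical helper: inserts a <script> right after the opening <head> tag.
function injectConfig(html, config) {
  // Escape "<" so that a value containing "</script>" cannot close the tag early.
  const json = JSON.stringify(config).replace(/</g, '\\u003c');
  const script = `<script>window.__CONFIG__ = ${json};</script>`;
  // A replacer function avoids "$" in the JSON being read as a replacement pattern.
  // The (\s[^>]*)? group keeps <header> from being mistaken for <head>.
  return html.replace(/<head(\s[^>]*)?>/i, (openTag) => openTag + script);
}

const page = '<html><head><title>PerfShop</title></head><body></body></html>';
console.log(injectConfig(page, buildClientConfig(process.env)));
```

Injecting the values at request time keeps the static files under `/app/public` free of environment-specific URLs.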

Family 2 — Docker API¶

Route	Method	Description
`/api/docker/all`	GET	Consolidated JSON with stats for the 4 monitored containers (5 s cache)
`/api/docker/stats?container=<n>`	GET	Detailed stats for a single container

The stats computation code is what makes this service useful:

async function fetchContainerStats(name) {
  const realName = resolvedNames[name] || name;
  const s = await dockerRequest(`/containers/${realName}/stats?stream=false`);

  const cpuDelta = s.cpu_stats.cpu_usage.total_usage - s.precpu_stats.cpu_usage.total_usage;
  const sysDelta = s.cpu_stats.system_cpu_usage - s.precpu_stats.system_cpu_usage;
  const numCpus = s.cpu_stats.online_cpus || s.cpu_stats.cpu_usage.percpu_usage?.length || 1;
  const cpuPercent = sysDelta > 0 ? (cpuDelta / sysDelta) * numCpus * 100 : 0;

  const memUsage = s.memory_stats.usage || 0;
  const memCache = s.memory_stats.stats?.cache || s.memory_stats.stats?.inactive_file || 0;
  const memActual = Math.max(0, memUsage - memCache);
  // ...
}

Key points:

stream=false on the Docker API — otherwise Docker streams the stats continuously (chunked HTTP), which does not fit a REST endpoint.
CPU% computed from the delta between the current call and precpu_stats (the Docker API provides both), multiplied by the number of online CPUs.
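Useful memory = memory_stats.usage - memory_stats.stats.cache (with memory_stats.stats.inactive_file as the fallback when cache is absent, as in the code above). The raw API counts the page cache as "used", which artificially inflates the value; PerfShop subtracts it to get a figure closer to what the container's processes actually hold.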
Network: aggregation of all container interfaces (s.networks).
Disk I/O: extraction of Read and Write operations in blkio_stats.io_service_bytes_recursive.
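The last two points correspond to the part elided by `// ...` in the excerpt. A minimal sketch of what such aggregation could look like follows; `sumNetwork` and `sumBlockIo` are hypothetical names, not functions taken from server.js. The field names (`rx_bytes`, `tx_bytes`, `op`, `value`) are those of the Docker stats JSON; comparing `op` case-insensitively is a defensive assumption, since its casing can differ between hosts.

```javascript
// Sums received/transmitted bytes over every interface listed in stats.networks.
function sumNetwork(networks) {
  let rxBytes = 0;
  let txBytes = 0;
  for (const iface of Object.values(networks || {})) {
    rxBytes += iface.rx_bytes || 0;
    txBytes += iface.tx_bytes || 0;
  }
  return { rxBytes, txBytes };
}

// Sums the Read and Write entries of blkio_stats.io_service_bytes_recursive.
// The array can be null or missing, hence the fallback to an empty list.
function sumBlockIo(blkioStats) {
  const entries = (blkioStats && blkioStats.io_service_bytes_recursive) || [];
  let readBytes = 0;
  let writeBytes = 0;
  for (const { op, value } of entries) {
    const kind = String(op).toLowerCase();
    if (kind === 'read') readBytes += value || 0;
    else if (kind === 'write') writeBytes += value || 0;
  }
  return { readBytes, writeBytes };
}

// Sample payload fragment (made-up numbers, shaped like a Docker stats response).
const sampleStats = {
  networks: {
    eth0: { rx_bytes: 1000, tx_bytes: 400 },
    eth1: { rx_bytes: 250, tx_bytes: 100 },
  },
  blkio_stats: {
    io_service_bytes_recursive: [
      { major: 8, minor: 0, op: 'Read', value: 4096 },
      { major: 8, minor: 0, op: 'Write', value: 8192 },
      { major: 8, minor: 16, op: 'read', value: 1024 },
    ],
  },
};
console.log(sumNetwork(sampleStats.networks), sumBlockIo(sampleStats.blkio_stats));
```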

Dynamic name resolution¶

async function resolveContainerNames() {
  const list = await dockerRequest('/containers/json?all=false');
  for (const c of list) {
    const names = (c.Names || []).map(n => n.replace(/^\//, ''));
    for (const logical of CONTAINERS_TO_WATCH) {
      for (const real of names) {
        if (real === logical || real.endsWith('-' + logical) || real.endsWith('_' + logical)) {
          resolvedNames[logical] = real;
        }
      }
    }
  }
}
setInterval(resolveContainerNames, 60000);

Depending on the Docker Compose project_name used (the default is the folder name), Docker can prefix containers: perfshop-perfshop-app, myproject_perfshop-app, etc. The code dynamically resolves the real names every 60 seconds to remain compatible with any project name.
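To see which real names the matching rule accepts, the sketch below isolates the predicate from `resolveContainerNames()` and runs it on a made-up container list. `matchesLogicalName` and `resolveNames` are hypothetical names introduced for illustration; the sample entries only mimic the shape of a `/containers/json` response.

```javascript
// Same three-way test as in resolveContainerNames(): exact name, or the logical
// name as a suffix after "-" or "_".
function matchesLogicalName(real, logical) {
  return real === logical || real.endsWith('-' + logical) || real.endsWith('_' + logical);
}

// Pure variant of the resolution loop: takes the container list as an argument
// instead of calling the Docker API, which makes it easy to test.
function resolveNames(containers, logicalNames) {
  const resolved = {};
  for (const c of containers) {
    // Docker reports names with a leading slash, e.g. "/perfshop-app".
    const names = (c.Names || []).map((n) => n.replace(/^\//, ''));
    for (const logical of logicalNames) {
      const real = names.find((n) => matchesLogicalName(n, logical));
      if (real) resolved[logical] = real;
    }
  }
  return resolved;
}

const sampleContainers = [
  { Names: ['/perfshop-perfshop-app'] },
  { Names: ['/myproject_perfshop-db'] },
  { Names: ['/grafana'] },
];
console.log(resolveNames(sampleContainers, ['perfshop-app', 'perfshop-db', 'perfshop-frontend']));
```

Note that the rule only covers prefixes: a name carrying a trailing index, such as `perfshop-app-1`, would not match.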

Family 3 — Prometheus `/metrics`¶

app.get('/metrics', async (req, res) => {
  const all = await refreshStats();
  const lines = [ /* ... */ ];
  // For each monitored container
  for (const [name, s] of Object.entries(all)) {
    const l = `{container="${name}"}`;
    lines.push(`docker_container_cpu_percent${l} ${s.cpu_percent}`);
    lines.push(`docker_container_mem_usage_bytes${l} ${s.mem_usage}`);
    lines.push(`docker_container_mem_limit_bytes${l} ${s.mem_limit}`);
    // ... 9 metrics per container
  }
  // Browser metrics (if fresh)
  const m = lastClientMetrics;
  const stale = !m.receivedAt || (Date.now() - m.receivedAt) > 10000;
  if (!stale) {
    lines.push(`perfshop_client_fps ${m.fps ?? 0}`);
    lines.push(`perfshop_client_heap_used_mb ${m.heapUsedMB ?? 0}`);
    // ...
  }
  res.set('Content-Type', 'text/plain; version=0.0.4');
  res.send(lines.join('\n') + '\n');
});

This is the endpoint scraped by the Prometheus perfshop-docker job every 5 seconds. See prometheus.md for the complete list of metrics produced.

Browser metrics — staleness¶

The perfshop_client_* metrics are emitted conditionally: if more than 10 seconds have elapsed since the last POST was received, they are omitted from /metrics. This prevents Prometheus from continuing to scrape frozen values after the student closes the browser tab.

Family 4 — Browser metrics (push from the frontend)¶

Route	Method	Description
`/api/chaos/client-metrics`	POST	Reception of Web Vitals from `chaos-agent.js` (every 2 seconds)
`/api/chaos/client-metrics`	GET	Read of the last value (with `stale: true/false` flag)

The POST expects a JSON body:

{
  "fps": 60,
  "longTasksPerSec": 0.5,
  "heapUsedMB": 42.3,
  "heapLimitMB": 2048,
  "pendingFetches": 1.2,
  "domNodeCount": 543,
  "cpuWorkerActive": false,
  "timestamp": 1712345678901
}

All values are optional: if a key is missing, the server retains the last known value. Each field goes through explicit type validation.
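The merge behaviour described above could be implemented along the following lines. This is a sketch under assumptions, not the handler from server.js: `FIELD_TYPES` and `mergeClientMetrics` are hypothetical names, and silently ignoring a field of the wrong type (rather than rejecting the whole request) is one possible reading of "explicit type validation".

```javascript
// Expected JavaScript type of each field in the POST body.
const FIELD_TYPES = {
  fps: 'number',
  longTasksPerSec: 'number',
  heapUsedMB: 'number',
  heapLimitMB: 'number',
  pendingFetches: 'number',
  domNodeCount: 'number',
  cpuWorkerActive: 'boolean',
  timestamp: 'number',
};

// Returns a new metrics object: valid fields from `body` overwrite the previous
// values, missing or invalid fields keep the last known value.
function mergeClientMetrics(previous, body, now = Date.now()) {
  const next = { ...previous };
  for (const [key, type] of Object.entries(FIELD_TYPES)) {
    const value = body ? body[key] : undefined;
    if (typeof value !== type) continue;                       // missing or wrong type
    if (type === 'number' && !Number.isFinite(value)) continue; // NaN, Infinity
    next[key] = value;
  }
  next.receivedAt = now; // used by the staleness check on /metrics
  return next;
}

let lastMetrics = {};
lastMetrics = mergeClientMetrics(lastMetrics, { fps: 60, heapUsedMB: 42.3 }, 1000);
lastMetrics = mergeClientMetrics(lastMetrics, { fps: '58', cpuWorkerActive: true }, 3000);
console.log(lastMetrics);
```

In the second call the string `'58'` is discarded, so `fps` stays at 60 while `cpuWorkerActive` is updated.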

Family 5 — Proxies to the backend¶

`/api/prometheus-raw`¶

app.get('/api/prometheus-raw', async (req, res) => {
  const response = await fetchWithTimeout(APP_METRICS_URL);
  const text = await response.text();
  res.set('Content-Type', 'text/plain; version=0.0.4');
  res.send(text);
});

Direct proxy to http://perfshop-app:9090/actuator/prometheus, with a 5-second timeout (AbortController). It is used by the HTML dashboard, which parses the metrics itself on the client side (with its own parsePrometheus() function); this is what lets the dashboard work without Grafana.
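To give an idea of what client-side parsing of this text involves, here is a minimal sketch of a Prometheus text-format parser. It is not the dashboard's `parsePrometheus()`; the name `parsePrometheusText` and the returned shape are assumptions made for illustration, and escape sequences inside label values are left undecoded.

```javascript
// Parses Prometheus text exposition format into [{ name, labels, value }].
// Comment lines (# HELP, # TYPE) and blank lines are skipped.
function parsePrometheusText(text) {
  const samples = [];
  // metric_name, optional {labels}, then the value (a trailing timestamp is ignored).
  const lineRe = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/;
  // key="value" pairs; the value may contain escaped characters such as \".
  const labelRe = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;

  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const match = lineRe.exec(line);
    if (!match) continue;
    const [, name, labelText, valueText] = match;
    const labels = {};
    for (const [, key, value] of (labelText || '').matchAll(labelRe)) {
      labels[key] = value;
    }
    samples.push({ name, labels, value: parseSampleValue(valueText) });
  }
  return samples;
}

// Number() already handles "1.0E7" and "NaN"; the infinities need a special case.
function parseSampleValue(text) {
  if (text === '+Inf') return Infinity;
  if (text === '-Inf') return -Infinity;
  return Number(text);
}

// Sample input shaped like an Actuator response (values are made up).
const sampleText = [
  '# HELP jvm_memory_used_bytes The amount of used memory',
  '# TYPE jvm_memory_used_bytes gauge',
  'jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.048576E7',
  'process_uptime_seconds 42.5',
].join('\n');
console.log(parsePrometheusText(sampleText));
```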

`/api/heapdump`¶

const HEAPDUMP_URL = process.env.HEAPDUMP_URL || 'http://perfshop-app:9090/actuator/heapdump';
app.get('/api/heapdump', async (req, res) => {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 60000);
  const response = await fetch(HEAPDUMP_URL, { signal: controller.signal });
  // ...
  const filename = `heapdump-${new Date().toISOString().replace(/[:.]/g, '-')}.hprof`;
  res.set('Content-Type', 'application/octet-stream');
  res.set('Content-Disposition', `attachment; filename="${filename}"`);
  response.body.pipe(res);
});

Proxy to /actuator/heapdump of the backend, with:

A generous 60-second timeout (a heap dump can take ~30 s on a loaded JVM)
Renaming to heapdump-<ISO>.hprof so the browser offers a download with a timestamped name
Streaming via pipe() — the .hprof file can weigh several hundred MB, it is not loaded in RAM on the Node side
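One caveat on the streaming point: `response.body.pipe(res)` works when `fetch` comes from a package such as node-fetch, whose response body is a Node.js stream. Node's built-in `fetch` returns a WHATWG `ReadableStream`, which has no `pipe()` method. Which of the two server.js uses is not stated on this page, so the following is only a sketch of a helper that tolerates both; `toNodeReadable` is a hypothetical name.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper: returns a Node.js Readable whatever fetch implementation
// produced the body, so that .pipe(res) can be used in both cases.
function toNodeReadable(body) {
  // node-fetch style: already a Node.js stream.
  if (body && typeof body.pipe === 'function') return body;
  // Built-in fetch style: WHATWG ReadableStream, converted without buffering.
  return Readable.fromWeb(body);
}
```

Inside the route, the last line of the excerpt would then read `toNodeReadable(response.body).pipe(res);`, which still streams the dump chunk by chunk.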

This is the entry point of the pedagogical memory chaos: the student clicks a "Download heap dump" button, waits about thirty seconds, and receives a .hprof file that they open in Eclipse MAT or VisualVM to analyze a memory leak. See ../architecture/multi-session.md for the coupling with the optional memory cache of pedagogical sessions.

`/api/admin/login`¶

const BACKEND_INTERNAL = process.env.PERFSHOP_API_INTERNAL || 'http://perfshop-app:8080';
app.post('/api/admin/login', async (req, res) => {
  const resp = await fetch(`${BACKEND_INTERNAL}/api/admin/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req.body || {}),
  });
  if (resp.status === 402) return res.status(402).json({ error: 'PerfShop license missing' });
  const data = await resp.json();
  res.status(resp.status).json(data);
});

Proxy to the backend admin login. Why a proxy? To avoid having to handle cross-origin issues from the browser — if the perfshop-monitoring dashboard wants to authenticate as admin, it calls its own Node backend which relays to Spring Boot, thus staying on the same origin.

Cache and freshness¶

let statsCache = {};
let lastFetch = 0;
const CACHE_TTL = 5000;

async function refreshStats() {
  const now = Date.now();
  if (now - lastFetch < CACHE_TTL) return statsCache;
  lastFetch = now;
  const results = await Promise.all(CONTAINERS_TO_WATCH.map(async name =>
    ({ name, stats: await fetchContainerStats(name) })
  ));
  statsCache = {};
  for (const { name, stats } of results) if (stats) statsCache[name] = stats;
  return statsCache;
}

The Docker stats cache has a 5-second TTL. Without it, dozens of clients (Prometheus, the HTML dashboard, and others) refreshing in parallel would each trigger their own round of Docker API calls. With it, one round of calls every 5 seconds is enough, and all consumers read from the cache.

This is aligned with Prometheus's scrape_interval: 5s.
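One detail of `refreshStats()` as shown: `lastFetch` is updated before the Docker calls complete, so a caller arriving while a refresh is in flight receives the previous contents of `statsCache` (an empty object during the very first refresh). If that matters, a possible variant shares the in-flight promise between concurrent callers. The sketch below is such a variant, not the code of server.js; `createCachedFetcher` and its parameters are hypothetical.

```javascript
// Wraps an async fetcher with a TTL cache. Concurrent callers share one
// in-flight request instead of reading a half-refreshed cache.
function createCachedFetcher(fetchAll, ttlMs = 5000, now = Date.now) {
  let cache = null;
  let fetchedAt = 0;
  let inFlight = null;

  return function refresh() {
    // Fresh enough: serve from the cache.
    if (cache !== null && now() - fetchedAt < ttlMs) return Promise.resolve(cache);
    // A refresh is already running: wait for it rather than starting another.
    if (inFlight) return inFlight;

    inFlight = new Promise((resolve) => resolve(fetchAll()))
      .then((value) => {
        cache = value;
        fetchedAt = now();
        return value;
      })
      .finally(() => {
        inFlight = null; // allow a new attempt after success or failure
      });
    return inFlight;
  };
}

// Usage with a stand-in fetcher; in the service this role would be played by the
// Promise.all over CONTAINERS_TO_WATCH.
let dockerCalls = 0;
const refreshDemo = createCachedFetcher(async () => {
  dockerCalls += 1;
  return { 'perfshop-app': { cpu_percent: 12.5 } };
});
const first = refreshDemo();
const second = refreshDemo(); // issued while the first is still pending
first.then((stats) => console.log(stats, 'docker calls:', dockerCalls));
```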

Monitored containers¶

const CONTAINERS_TO_WATCH = ['perfshop-frontend', 'perfshop-app', 'perfshop-db', 'perfshop-monitoring'];

Four containers, no more. This is intentional:

perfshop-frontend — the end-to-end chain (the front to observe)
perfshop-app — the backend (heart of the chaos)
perfshop-db — the database (DB impacts of the chaos)
perfshop-monitoring — itself (useful to verify that the service does not consume too many resources itself)

The other services (Grafana, Loki, Tempo, Squash TM, Forgejo, etc.) are not monitored by this job. They have their own logs (Loki / OpenSearch) and their own Grafana dashboards; their health in terms of Docker resources is less of a priority for pedagogical demos.

Docker socket dependency — architectural implication¶

The fact that perfshop-monitoring consumes the Docker socket in read mode is what makes it non-relocatable: it cannot be deployed on a machine different from the one hosting the other containers. This is consistent with the PerfShop model (single-host), but it is a constraint to be aware of.

To go further¶

Overview — three parallel visualization interfaces
Prometheus — perfshop-docker job that scrapes the /metrics of this service
Shipped dashboards — perfshop-frontend-eleve dashboard that consumes the perfshop_client_* metrics
Pedagogical multi-session — memory chaos ↔ heap dump coupling via /api/heapdump
Interfaces section — details of the monitoring HTML dashboard UI served by this service