Metrics reference¶
This document lists all Prometheus metrics exposed by PerfShop, organized by family. All are scraped by Prometheus every 15 seconds through the Spring Boot backend's /actuator/prometheus endpoint.
The names below are the names exposed to Prometheus: Micrometer automatically converts the dots (.) of Java Gauges into underscores (_) at export time. A Gauge declared chaos.business.a1.tva on the Java side therefore appears as chaos_business_a1_tva in Prometheus.
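The renaming can be pictured with a minimal sketch. It covers only the dot-to-underscore replacement described above; Micrometer's actual naming convention handles more cases, and the helper name `to_prometheus_name` is hypothetical.

```python
def to_prometheus_name(meter_name: str) -> str:
    """Approximate the export-time renaming: every dot becomes an underscore."""
    return meter_name.replace(".", "_")


print(to_prometheus_name("chaos.business.a1.tva"))  # prints chaos_business_a1_tva
```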
Infrastructure chaos metrics¶
All are exposed by ChaosService.java with a type tag that identifies the lever:
| Metric | Range | Description |
|---|---|---|
| chaos_intensity{type="memory"} | 0–105 | Memory target (105 = OOM) |
| chaos_guardrail{type="memory"} | 0–100 | Memory guardrail (% of max heap) |
| chaos_intensity{type="gc_pressure"} | 0–100 | Sawtooth GC pressure intensity |
| chaos_intensity{type="db_pool"} | 0–100 | HikariCP saturation |
| chaos_intensity{type="thread_pool"} | 0–100 | Tomcat pool saturation |
| chaos_intensity{type="cpu"} | 0–100 | Backend CPU load |
| chaos_intensity{type="slow_query"} | 0–100 | Injected SQL delay |
| chaos_intensity{type="deadlock"} | 0–100 | Deadlock probability |
| chaos_intensity{type="network"} | 0–100 | Network delay + random 503s |
The type tag lets you aggregate or filter per lever in PromQL queries: chaos_intensity{type="cpu"} returns only the CPU slider.
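The same per-lever filter can be applied outside Prometheus, directly on the text returned by the scrape endpoint. The sketch below assumes the standard Prometheus text exposition format; the helper names and the sample values in the usage lines are hypothetical, and the parser ignores timestamps and unusual escaping.

```python
import re

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value) for a sample line, or None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        return None
    name, raw_labels, raw_value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(raw_value)


def intensity_for(lines, lever):
    """Return the chaos_intensity value whose type label equals `lever`, if present."""
    for line in lines:
        sample = parse_sample(line)
        if sample is None:
            continue
        name, labels, value = sample
        if name == "chaos_intensity" and labels.get("type") == lever:
            return value
    return None


# Hypothetical scrape excerpt, for illustration only
scrape = [
    "# TYPE chaos_intensity gauge",
    'chaos_intensity{type="memory"} 105.0',
    'chaos_intensity{type="cpu"} 40.0',
]
print(intensity_for(scrape, "cpu"))
```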
Business chaos metrics¶
Exposed by BusinessChaosService.java. Each anomaly A1 to A16 has its own counter, incremented on every trigger.
| Metric | Type | Description |
|---|---|---|
| chaos_business_level | Gauge | Current level (0–4) |
| chaos_business_a1_tva | Gauge | A1 counter: VAT 19.6 % |
| chaos_business_a2_arrondi | Gauge | A2 counter: Floor price rounding |
| chaos_business_a3_stock | Gauge | A3 counter: Stock not decremented |
| chaos_business_a4_email | Gauge | A4 counter: Email missing shipping fees |
| chaos_business_a5_doublon | Gauge | A5 counter: Double order |
| chaos_business_a6_promo | Gauge | A6 counter: Invalid promo code |
| chaos_business_a7_livraison | Gauge | A7 counter: Calendar-day delivery |
| chaos_business_a8_race | Gauge | A8 counter: Stock race condition |
| chaos_business_a9_inject | Gauge | A9 counter: Log injection |
| chaos_business_a10_total | Gauge | A10 counter: Wrong history total |
| chaos_business_a11_token | Gauge | A11 counter: Logout token |
| chaos_business_a12_loyalty | Gauge | A12 counter: Loyalty discount |
| chaos_business_a13_currency | Gauge | A13 counter: USD currency |
| chaos_business_a14_shipping | Gauge | A14 counter: Doubled shipping fees |
| chaos_business_a15_history | Gauge | A15 counter: History corruption |
| chaos_business_a16_cancel | Gauge | A16 counter: Cancellation without restock |
The counters are implemented as Gauges (rather than Counters) because they can be reset by BusinessChaosService.reset(), which a Prometheus Counter does not semantically support.
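The reset matters when these series are read with counter-style functions: Prometheus treats any decrease in a counter as a restart from zero. The sketch below illustrates that rule on a list of raw values. It is a simplification (no extrapolation to the window boundaries, unlike the real increase() function), and the function name is hypothetical.

```python
def increase_with_resets(samples):
    """Total increase over a series of values, treating any drop as a reset to zero."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The value went down: assume a reset and count what accumulated since.
            total += current
    return total


# Hypothetical series: the counter is reset after reaching 5
print(increase_with_resets([0, 3, 5, 0, 2]))
```

With this reading, a call to reset() does not erase the triggers that happened before it; they are still counted in the total increase.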
Functional chaos metrics¶
Exposed by FunctionalChaosService.java. Four counters for the four F1 – F4 anomalies:
| Metric | Type | Description |
|---|---|---|
| chaos_functional_level | Gauge | Current level (0–4) |
| chaos_functional_f1_npe | Gauge | F1: Payment NullPointerException |
| chaos_functional_f2_stackoverflow | Gauge | F2: Calculation StackOverflowError |
| chaos_functional_f3_oom | Gauge | F3: Catalog OutOfMemoryError |
| chaos_functional_f4_corruption | Gauge | F4: Silent corruption |
Security chaos metrics¶
Exposed by SecurityChaosService.java. Twelve counters for the twelve S1 – S12 flaws:
| Metric | Type | Description |
|---|---|---|
| chaos_security_level | Gauge | Current level (0–4) |
| chaos_security_s1_sqli | Gauge | S1: SQL Injection |
| chaos_security_s2_idor | Gauge | S2: Order IDOR |
| chaos_security_s3_hash | Gauge | S3: Exposed password hash |
| chaos_security_s4_xss | Gauge | S4: Stored XSS |
| chaos_security_s5_price | Gauge | S5: Price tampering |
| chaos_security_s6_timing | Gauge | S6: Login timing attack |
| chaos_security_s7_token | Gauge | S7: Weak HMAC token |
| chaos_security_s8_path | Gauge | S8: Path Traversal |
| chaos_security_s9_mass | Gauge | S9: Mass Assignment |
| chaos_security_s10_portal | Gauge | S10: Unauthenticated portal stats |
| chaos_security_s11_sqli | Gauge | S11: Portal login SQLi |
| chaos_security_s12_idor | Gauge | S12: Privilege escalation IDOR |
Scripting chaos metrics¶
Exposed by ChaosScriptingService.java. No per-event counter — only the level and the number of active bundles (which reflects the memory pressure from logged-in sessions):
| Metric | Type | Description |
|---|---|---|
| chaos_scripting_level | Gauge | Current level (0–4) |
| chaos_scripting_bundles_active | Gauge | Number of TokenBundles in memory |
System metrics¶
Exposed by ContainerCpuMetrics.java:
| Metric | Type | Description |
|---|---|---|
| container_cpu_usage | Gauge | Container CPU load (0.0 – 1.0) |
The metric is obtained through reflection over com.sun.management.OperatingSystemMXBean.getCpuLoad(). It may return 0.0 on JVMs that do not expose this method, for example under the strong module encapsulation of JDK 16+ when no --add-opens flag is set.
Spring Boot / JVM metrics¶
PerfShop exposes all the standard Micrometer metrics through Spring Boot Actuator autoconfiguration. The main ones useful for chaos diagnosis:
JVM memory¶
| Metric | Type | Description |
|---|---|---|
| jvm_memory_used_bytes{area="heap"} | Gauge | Used heap |
| jvm_memory_max_bytes{area="heap"} | Gauge | Configured -Xmx |
| jvm_memory_used_bytes{area="nonheap"} | Gauge | Metaspace, code cache |
| jvm_memory_committed_bytes{area="heap"} | Gauge | Currently committed heap |
Garbage collector¶
| Metric | Type | Description |
|---|---|---|
| jvm_gc_pause_seconds_count | Counter | Number of GC pauses |
| jvm_gc_pause_seconds_sum | Counter | Cumulative time spent in pause |
| jvm_gc_pause_seconds_max | Gauge | Longest recent GC pause |
| jvm_gc_memory_promoted_bytes_total | Counter | Bytes promoted to old gen |
JVM threads¶
| Metric | Type | Description |
|---|---|---|
| jvm_threads_states_threads{state="runnable"} | Gauge | Active threads |
| jvm_threads_states_threads{state="blocked"} | Gauge | Threads blocked on a monitor |
| jvm_threads_states_threads{state="waiting"} | Gauge | Waiting threads |
| jvm_threads_live_threads | Gauge | Total live threads |
Tomcat¶
| Metric | Type | Description |
|---|---|---|
| tomcat_threads_busy_threads | Gauge | Tomcat busy threads |
| tomcat_threads_current_threads | Gauge | Current Tomcat threads |
| tomcat_threads_config_max_threads | Gauge | Configured max (200 by default) |
HikariCP (DB pool)¶
| Metric | Type | Description |
|---|---|---|
| hikaricp_connections_active | Gauge | Connections currently in use |
| hikaricp_connections_idle | Gauge | Idle connections in the pool |
| hikaricp_connections_pending | Gauge | Threads waiting for a connection |
| hikaricp_connections_max | Gauge | Max pool size |
| hikaricp_connections_min | Gauge | Min pool size |
| hikaricp_connections_acquire_seconds_max | Gauge | Recent max acquisition time |
HTTP Server (Spring MVC)¶
| Metric | Type | Description |
|---|---|---|
| http_server_requests_seconds_count{uri,method,status} | Counter | Number of requests |
| http_server_requests_seconds_sum{uri,method,status} | Counter | Cumulative total time |
| http_server_requests_seconds_max{uri,method,status} | Gauge | Recent max latency |
| http_server_requests_seconds_bucket{uri,le} | Histogram | Histogram for quantiles |
The _bucket histogram lets you compute arbitrary quantiles through histogram_quantile() in PromQL — the standard Prometheus mechanism for latencies.
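To make the quantile estimate less opaque, here is a minimal sketch of the idea behind histogram_quantile(): find the bucket that contains the target rank, then interpolate linearly inside it. It follows the behaviour documented for classic histograms but is not Prometheus's own code; the function signature and the bucket values in the usage lines are assumptions made for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    `buckets` must be sorted by upper bound and end with a +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # The quantile lies above the last finite bound: report that bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    lower = buckets[index - 1][0] if index > 0 else 0.0
    below = buckets[index - 1][1] if index > 0 else 0.0
    if count == below:
        return upper
    # Linear interpolation inside the selected bucket
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical latency buckets in seconds
latency = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, latency))
```

Because of the interpolation, the result is only as precise as the bucket boundaries allow: a p99 that falls in a wide bucket is an estimate, not a measured value.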
A few useful PromQL queries¶
Global p99 latency correlated with CPU¶
# p99 latency over 1 minute
histogram_quantile(0.99,
sum(rate(http_server_requests_seconds_bucket[1m])) by (le))
# To be compared with
container_cpu_usage
Plot both series in the same Grafana panel to observe directly how p99 latency degrades as CPU load rises.
Business anomaly rate over 5 minutes¶
# All business anomalies aggregated
sum(rate(chaos_business_a1_tva[5m]))
+ sum(rate(chaos_business_a2_arrondi[5m]))
+ sum(rate(chaos_business_a3_stock[5m]))
# … add up the 16 counters
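More practical: select all 16 counters at once with a regex matcher on the metric name, {__name__=~"chaos_business_a.*"}. Whether this selector can be wrapped directly in rate() and sum() depends on the Prometheus version, because rate() drops the metric name and the resulting series can then collide on identical label sets.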
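When the aggregation is awkward to express in PromQL, it can also be computed from two successive scrapes. The sketch below assumes each scrape has already been reduced to a name-to-value mapping; the function name, the stricter regex and the sample values are hypothetical.

```python
import re

# Matches chaos_business_a1_tva ... chaos_business_a16_cancel, not chaos_business_level
BUSINESS_COUNTER = re.compile(r"^chaos_business_a\d+_")


def business_anomaly_rate(before, after, interval_seconds):
    """Per-second rate of all business anomalies between two scrapes."""
    total = 0.0
    for name, current in after.items():
        if not BUSINESS_COUNTER.match(name):
            continue
        previous = before.get(name, 0.0)
        # A lower value means the gauge was reset in between: count from zero.
        total += (current - previous) if current >= previous else current
    return total / interval_seconds


# Hypothetical values taken 5 minutes apart
first = {"chaos_business_a1_tva": 2, "chaos_business_a2_arrondi": 0, "chaos_business_level": 3}
second = {"chaos_business_a1_tva": 5, "chaos_business_a2_arrondi": 3, "chaos_business_level": 4}
print(business_anomaly_rate(first, second, 300))
```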
HikariCP pool saturation¶
# Threads waiting for a DB connection
hikaricp_connections_pending > 0
# Saturation ratio (0 to 1)
hikaricp_connections_active / hikaricp_connections_max
Good alert trigger: hikaricp_connections_pending > 0 for more than 30 seconds almost always signals either an active DB Pool Chaos or a real pool saturation in production.
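The "for more than 30 seconds" condition is what a `for:` clause expresses in a Prometheus alerting rule: the expression must stay true across consecutive evaluations. A minimal sketch of that logic over timestamped samples, with a hypothetical function name and sample data:

```python
def sustained_above(samples, threshold, duration_seconds):
    """True if the value stays above `threshold` for at least `duration_seconds`.

    `samples` is a list of (timestamp_seconds, value) pairs in chronological order.
    """
    streak_start = None
    for timestamp, value in samples:
        if value > threshold:
            if streak_start is None:
                streak_start = timestamp
            if timestamp - streak_start >= duration_seconds:
                return True
        else:
            # The condition was interrupted: the streak starts over.
            streak_start = None
    return False


# Hypothetical hikaricp_connections_pending samples, one every 15 seconds
pending = [(0, 0), (15, 2), (30, 3), (45, 1)]
print(sustained_above(pending, threshold=0, duration_seconds=30))
```

Requiring a sustained condition filters out the short spikes that a single scrape can catch during a normal burst of traffic.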
Heap exceeding 80 % of max¶
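The ratio to watch is used heap over max heap, built from the JVM memory metrics listed earlier: jvm_memory_used_bytes{area="heap"} divided by jvm_memory_max_bytes{area="heap"}. The 0.8 threshold is the default memoryGuardrail of Memory Chaos. When the ratio exceeds 0.8, either memory chaos is active or the application has a real memory leak.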
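As an illustration of the heap ratio, the sketch below computes it from already parsed samples. It assumes that the heap metrics are split per memory pool through an id label and that a pool without a defined maximum reports a non-positive value; both points should be checked against a real scrape. The function name and the sample values are hypothetical.

```python
def heap_usage_ratio(samples):
    """Used heap divided by max heap, from (name, labels, value) samples."""
    used = sum(
        value for name, labels, value in samples
        if name == "jvm_memory_used_bytes" and labels.get("area") == "heap"
    )
    maximum = sum(
        value for name, labels, value in samples
        if name == "jvm_memory_max_bytes" and labels.get("area") == "heap" and value > 0
    )
    if maximum == 0:
        return None  # no usable maximum reported
    return used / maximum


# Hypothetical values in bytes
samples = [
    ("jvm_memory_used_bytes", {"area": "heap", "id": "G1 Eden Space"}, 100e6),
    ("jvm_memory_used_bytes", {"area": "heap", "id": "G1 Old Gen"}, 300e6),
    ("jvm_memory_max_bytes", {"area": "heap", "id": "G1 Eden Space"}, -1.0),
    ("jvm_memory_max_bytes", {"area": "heap", "id": "G1 Old Gen"}, 500e6),
]
ratio = heap_usage_ratio(samples)
print(ratio, ratio is not None and ratio >= 0.8)
```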
Excessive GC pressure¶
# Fraction of wall-clock time spent in GC pauses, averaged over 1 minute
rate(jvm_gc_pause_seconds_sum[1m])
# > 0.1 (10 % of the time spent in GC) = problem
This is the typical signature of GC Pressure Chaos: the ratio easily exceeds 0.2 at intensity 100 %, whereas a healthy application stays below 0.05.
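The value returned by rate() over the pause-time sum is simply the growth of that sum divided by the elapsed time. A short sketch of the arithmetic from two readings of jvm_gc_pause_seconds_sum, with a hypothetical function name and made-up readings:

```python
def gc_time_fraction(t_start, sum_start, t_end, sum_end):
    """Fraction of elapsed time spent in GC pauses between two readings."""
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("readings must be in chronological order")
    return (sum_end - sum_start) / elapsed


# Hypothetical readings one minute apart: 12 s of pauses in 60 s
fraction = gc_time_fraction(0, 10.0, 60, 22.0)
print(fraction, "problem" if fraction > 0.1 else "ok")
```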
Active security flaws¶
A single selector on the metric name, for example {__name__=~"chaos_security_s.*"}, gives an aggregated view of Security Chaos activity without having to list the 12 counters individually.
Client metrics (Frontend Chaos)¶
Frontend Chaos metrics are not exposed to Prometheus directly — they are collected by chaos-agent.js on the browser side and POSTed to /api/chaos/client-metrics every 2 seconds. The monitoring service consumes and displays them in real time, without Prometheus persistence.
The collected fields are documented in the Frontend Chaos page.
Prometheus endpoint¶
All the backend metrics above are available at the backend's /actuator/prometheus endpoint.
This endpoint is excluded from ChaosInterceptor — it remains reachable even when the backend is under 100 % chaos, which guarantees that Prometheus can keep scraping. Scraping happens every 15 seconds by default (configurable in prometheus/prometheus.yml).
Going further¶
- Performance Chaos — infrastructure levers and their metrics
- Business Chaos — detail of the 16 business anomalies
- Functional Chaos — detail of the 4 F1 – F4 anomalies
- Security Chaos — detail of the 12 S1 – S12 flaws
- Frontend Chaos — browser client metrics
- Observability — Prometheus — scraping configuration
- Observability — Grafana — shipped dashboards