Metrics reference

This document lists all Prometheus metrics exposed by PerfShop, organized by family. All are scraped by Prometheus every 15 seconds through the Spring Boot backend's /actuator/prometheus endpoint.

The names below are the names exposed to Prometheus: Micrometer automatically converts the dots (.) of Java Gauges into underscores (_) at export time. A Gauge declared chaos.business.a1.tva on the Java side therefore appears as chaos_business_a1_tva in Prometheus.
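As a rough illustration of the renaming described above, the following Java sketch reproduces only the dot-to-underscore step. The class and method names are hypothetical, and Micrometer's real naming convention applies further rules (for example type-specific suffixes) that are deliberately left out.

```java
// Hypothetical helper: mimics only the dot-to-underscore step of the export.
final class MetricNames {
    private MetricNames() {}

    /** Converts a dotted meter name into its underscore form. */
    static String toPrometheusName(String dottedName) {
        return dottedName.replace('.', '_');
    }

    public static void main(String[] args) {
        // Name taken from the example in the paragraph above.
        System.out.println(toPrometheusName("chaos.business.a1.tva"));
    }
}
```

Running it prints chaos_business_a1_tva, the name to use in PromQL.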

Infrastructure chaos metrics

All exposed by ChaosService.java with a type= tag that identifies the lever:

| Metric | Range | Description |
| --- | --- | --- |
| chaos_intensity{type="memory"} | 0–105 | Memory target (105 = OOM) |
| chaos_guardrail{type="memory"} | 0–100 | Memory guardrail (% of max heap) |
| chaos_intensity{type="gc_pressure"} | 0–100 | Sawtooth GC pressure intensity |
| chaos_intensity{type="db_pool"} | 0–100 | HikariCP saturation |
| chaos_intensity{type="thread_pool"} | 0–100 | Tomcat pool saturation |
| chaos_intensity{type="cpu"} | 0–100 | Backend CPU load |
| chaos_intensity{type="slow_query"} | 0–100 | Injected SQL delay |
| chaos_intensity{type="deadlock"} | 0–100 | Deadlock probability |
| chaos_intensity{type="network"} | 0–100 | Network delay + random 503s |

The type tag lets you aggregate or filter per lever in PromQL queries: chaos_intensity{type="cpu"} returns only the CPU slider.

Business chaos metrics

Exposed by BusinessChaosService.java. Each A1 – A16 anomaly has its own counter — incremented on every trigger.

| Metric | Type | Description |
| --- | --- | --- |
| chaos_business_level | Gauge | Current level (0–4) |
| chaos_business_a1_tva | Gauge | A1 counter: VAT 19.6 % |
| chaos_business_a2_arrondi | Gauge | A2 counter: Floor price rounding |
| chaos_business_a3_stock | Gauge | A3 counter: Stock not decremented |
| chaos_business_a4_email | Gauge | A4 counter: Email missing shipping fees |
| chaos_business_a5_doublon | Gauge | A5 counter: Double order |
| chaos_business_a6_promo | Gauge | A6 counter: Invalid promo code |
| chaos_business_a7_livraison | Gauge | A7 counter: Calendar-day delivery |
| chaos_business_a8_race | Gauge | A8 counter: Stock race condition |
| chaos_business_a9_inject | Gauge | A9 counter: Log injection |
| chaos_business_a10_total | Gauge | A10 counter: Wrong history total |
| chaos_business_a11_token | Gauge | A11 counter: Logout token |
| chaos_business_a12_loyalty | Gauge | A12 counter: Loyalty discount |
| chaos_business_a13_currency | Gauge | A13 counter: USD currency |
| chaos_business_a14_shipping | Gauge | A14 counter: Doubled shipping fees |
| chaos_business_a15_history | Gauge | A15 counter: History corruption |
| chaos_business_a16_cancel | Gauge | A16 counter: Cancellation without restock |

The counters are implemented as Gauges (rather than Counters) because they can be reset by BusinessChaosService.reset(), which a Prometheus Counter does not semantically support.
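The reasoning above can be sketched with a small value holder. This is not the actual BusinessChaosService code: the class name ResettableCount and its methods are hypothetical, and the Micrometer registration (typically done through Gauge.builder) is omitted to keep the example dependency-free.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for one anomaly counter, for illustration only.
final class ResettableCount {
    private final AtomicLong triggers = new AtomicLong();

    /** Called each time the anomaly fires. */
    void increment() {
        triggers.incrementAndGet();
    }

    /** Drops the value back to zero, which a monotonic counter is not meant to do. */
    void reset() {
        triggers.set(0);
    }

    /** Value a gauge would sample at scrape time. */
    double value() {
        return triggers.get();
    }
}
```

Because a gauge only reports whatever value() returns when Prometheus scrapes, the reset shows up as an ordinary drop in the series rather than as a violation of counter semantics.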

Functional chaos metrics

Exposed by FunctionalChaosService.java. Four counters for the four F1 – F4 anomalies:

| Metric | Type | Description |
| --- | --- | --- |
| chaos_functional_level | Gauge | Current level (0–4) |
| chaos_functional_f1_npe | Gauge | F1: Payment NullPointerException |
| chaos_functional_f2_stackoverflow | Gauge | F2: Calculation StackOverflowError |
| chaos_functional_f3_oom | Gauge | F3: Catalog OutOfMemoryError |
| chaos_functional_f4_corruption | Gauge | F4: Silent corruption |

Security chaos metrics

Exposed by SecurityChaosService.java. Twelve counters for the twelve S1 – S12 flaws:

| Metric | Type | Description |
| --- | --- | --- |
| chaos_security_level | Gauge | Current level (0–4) |
| chaos_security_s1_sqli | Gauge | S1: SQL Injection |
| chaos_security_s2_idor | Gauge | S2: Order IDOR |
| chaos_security_s3_hash | Gauge | S3: Exposed password hash |
| chaos_security_s4_xss | Gauge | S4: Stored XSS |
| chaos_security_s5_price | Gauge | S5: Price tampering |
| chaos_security_s6_timing | Gauge | S6: Login timing attack |
| chaos_security_s7_token | Gauge | S7: Weak HMAC token |
| chaos_security_s8_path | Gauge | S8: Path Traversal |
| chaos_security_s9_mass | Gauge | S9: Mass Assignment |
| chaos_security_s10_portal | Gauge | S10: Unauthenticated portal stats |
| chaos_security_s11_sqli | Gauge | S11: Portal login SQLi |
| chaos_security_s12_idor | Gauge | S12: Privilege escalation IDOR |

Scripting chaos metrics

Exposed by ChaosScriptingService.java. No per-event counter — only the level and the number of active bundles (which reflects the memory pressure from logged-in sessions):

| Metric | Type | Description |
| --- | --- | --- |
| chaos_scripting_level | Gauge | Current level (0–4) |
| chaos_scripting_bundles_active | Gauge | Number of TokenBundles in memory |

System metrics

Exposed by ContainerCpuMetrics.java:

| Metric | Type | Description |
| --- | --- | --- |
| container_cpu_usage | Gauge | Container CPU load (0.0 – 1.0) |

The metric is obtained through reflection over com.sun.management.OperatingSystemMXBean.getCpuLoad(). It may read 0.0 on JVMs where this method cannot be reached, which can happen with the strong module encapsulation of JDK 16+ when --add-opens is not set.
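A minimal sketch of such a reflective read is shown below. It is an assumption about how the lookup could be written, not the actual ContainerCpuMetrics code: the class name CpuLoadProbe is hypothetical, and the method is resolved on the public com.sun.management interface, falling back to 0.0 whenever the call is unavailable or returns an out-of-range value.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

// Hypothetical probe, for illustration only.
final class CpuLoadProbe {
    private CpuLoadProbe() {}

    /** Returns the CPU load in [0.0, 1.0], or 0.0 when it cannot be read. */
    static double readCpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        try {
            // Resolve the method on the exported interface, not on the internal implementation class.
            Class<?> extended = Class.forName("com.sun.management.OperatingSystemMXBean");
            Method getCpuLoad = extended.getMethod("getCpuLoad");
            double load = ((Number) getCpuLoad.invoke(os)).doubleValue();
            // The JVM reports a negative value when the load is not available yet.
            return (load >= 0.0 && load <= 1.0) ? load : 0.0;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Method missing, bean of another type, or access denied by the module system.
            return 0.0;
        }
    }

    public static void main(String[] args) {
        System.out.println("cpu load = " + readCpuLoad());
    }
}
```

A constant 0.0 in Grafana is therefore ambiguous: it can mean an idle container or a JVM on which the lookup failed.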

Spring Boot / JVM metrics

PerfShop exposes all the standard Micrometer metrics through Spring Boot Actuator autoconfiguration. The main ones useful for chaos diagnosis:

JVM memory

| Metric | Type | Description |
| --- | --- | --- |
| jvm_memory_used_bytes{area="heap"} | Gauge | Used heap |
| jvm_memory_max_bytes{area="heap"} | Gauge | Configured -Xmx |
| jvm_memory_used_bytes{area="nonheap"} | Gauge | Metaspace, code cache |
| jvm_memory_committed_bytes{area="heap"} | Gauge | Currently committed heap |

Garbage collector

| Metric | Type | Description |
| --- | --- | --- |
| jvm_gc_pause_seconds_count | Counter | Number of GC pauses |
| jvm_gc_pause_seconds_sum | Counter | Cumulative time spent in pause |
| jvm_gc_pause_seconds_max | Gauge | Longest recent GC pause |
| jvm_gc_memory_promoted_bytes_total | Counter | Bytes promoted to old gen |

JVM threads

| Metric | Type | Description |
| --- | --- | --- |
| jvm_threads_states_threads{state="runnable"} | Gauge | Active threads |
| jvm_threads_states_threads{state="blocked"} | Gauge | Threads blocked on a monitor |
| jvm_threads_states_threads{state="waiting"} | Gauge | Waiting threads |
| jvm_threads_live_threads | Gauge | Total live threads |

Tomcat

| Metric | Type | Description |
| --- | --- | --- |
| tomcat_threads_busy_threads | Gauge | Tomcat busy threads |
| tomcat_threads_current_threads | Gauge | Current Tomcat threads |
| tomcat_threads_config_max_threads | Gauge | Configured max (200 by default) |

HikariCP (DB pool)

| Metric | Type | Description |
| --- | --- | --- |
| hikaricp_connections_active | Gauge | Connections currently in use |
| hikaricp_connections_idle | Gauge | Idle connections in the pool |
| hikaricp_connections_pending | Gauge | Threads waiting for a connection |
| hikaricp_connections_max | Gauge | Max pool size |
| hikaricp_connections_min | Gauge | Min pool size |
| hikaricp_connections_acquire_seconds_max | Gauge | Recent max acquisition time |

HTTP Server (Spring MVC)

| Metric | Type | Description |
| --- | --- | --- |
| http_server_requests_seconds_count{uri,method,status} | Counter | Number of requests |
| http_server_requests_seconds_sum{uri,method,status} | Counter | Cumulative total time |
| http_server_requests_seconds_max{uri,method,status} | Gauge | Recent max latency |
| http_server_requests_seconds_bucket{uri,le} | Histogram | Histogram for quantiles |

The _bucket histogram lets you compute arbitrary quantiles through histogram_quantile() in PromQL — the standard Prometheus mechanism for latencies.
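To make the mechanism concrete, here is a simplified Java sketch of the interpolation that histogram_quantile() performs over cumulative buckets. It is an illustration under stated assumptions, not Prometheus code: the class name is hypothetical, the input must contain at least two buckets with +Infinity last, and observations are assumed to be spread evenly inside each bucket.

```java
// Simplified re-implementation of bucket interpolation, for illustration only.
final class HistogramQuantile {
    private HistogramQuantile() {}

    /**
     * @param q           quantile in [0, 1]
     * @param upperBounds ascending bucket upper bounds (the "le" label), last one +Infinity
     * @param cumulative  cumulative observation counts, one per bucket
     */
    static double estimate(double q, double[] upperBounds, double[] cumulative) {
        if (upperBounds.length < 2 || upperBounds.length != cumulative.length) {
            return Double.NaN;
        }
        int last = upperBounds.length - 1;
        double total = cumulative[last];
        if (total == 0) {
            return Double.NaN; // no observations at all
        }
        double rank = q * total;
        int i = 0;
        while (i < last && cumulative[i] < rank) {
            i++; // find the first bucket whose cumulative count reaches the rank
        }
        if (i == last) {
            return upperBounds[last - 1]; // rank falls in +Inf: report the highest finite bound
        }
        double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
        double below = (i == 0) ? 0.0 : cumulative[i - 1];
        double inBucket = cumulative[i] - below;
        // Linear interpolation between the bucket's lower and upper bound.
        return lower + (upperBounds[i] - lower) * ((rank - below) / inBucket);
    }
}
```

The precision of the result is bounded by the bucket layout: a p99 that falls in a wide bucket is only an interpolated estimate within that bucket's range.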

A few useful PromQL queries

Global p99 latency correlated with CPU

# p99 latency over 1 minute
histogram_quantile(0.99,
  sum(rate(http_server_requests_seconds_bucket[1m])) by (le))

# To be compared with
container_cpu_usage

Plot both queries in the same Grafana panel to see directly how p99 latency degrades as CPU load rises.

Business anomaly rate over 5 minutes

# All business anomalies aggregated
sum(rate(chaos_business_a1_tva[5m]))
+ sum(rate(chaos_business_a2_arrondi[5m]))
+ sum(rate(chaos_business_a3_stock[5m]))
# … add up the 16 counters

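More practical: match the metric name with a regex, as in sum(rate({__name__=~"chaos_business_a.*"}[5m])). Whether this evaluates depends on the Prometheus version: rate() drops the metric name, so series that differ only by name can collide on an identical label set.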

HikariCP pool saturation

# Threads waiting for a DB connection
hikaricp_connections_pending > 0

# Saturation ratio (0 to 1)
hikaricp_connections_active / hikaricp_connections_max

Good alert trigger: hikaricp_connections_pending > 0 for more than 30 seconds almost always signals either an active DB Pool Chaos or a real pool saturation in production.
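The "for more than 30 seconds" condition can be sketched as a small state machine fed with one sample per scrape. This is a hypothetical illustration of the alerting logic, not a Prometheus or PerfShop component: the class name and the choice to fire once the condition has held for at least 30 seconds are assumptions.

```java
// Hypothetical evaluator mimicking a "condition held for 30 s" alert.
final class PendingConnectionsAlert {
    private static final double HOLD_SECONDS = 30.0;

    // NaN means "not currently breaching".
    private double breachStart = Double.NaN;

    /** Feeds one sample; returns true once pending > 0 has held for at least 30 s. */
    boolean observe(double timestampSeconds, double pending) {
        if (pending <= 0) {
            breachStart = Double.NaN; // condition cleared: restart the clock
            return false;
        }
        if (Double.isNaN(breachStart)) {
            breachStart = timestampSeconds; // first breaching sample
        }
        return timestampSeconds - breachStart >= HOLD_SECONDS;
    }
}
```

With a 15-second scrape interval, the alert fires on the third consecutive breaching sample, which filters out a single short burst of waiting threads.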

Heap exceeding 80 % of max

# Summed over all heap pools so the ratio covers the whole heap
sum(jvm_memory_used_bytes{area="heap"})
  / sum(jvm_memory_max_bytes{area="heap"}) > 0.8

The 0.8 threshold matches the default memoryGuardrail of Memory Chaos. When the ratio exceeds it, either Memory Chaos is active or the application has a real memory leak.
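For a quick check outside Prometheus, roughly the same ratio can be read from inside the JVM. The sketch below is an assumption-laden illustration: the class name HeapRatio is hypothetical, the 0.8 constant is simply the threshold quoted above, and the standard MemoryMXBean is used rather than the Micrometer metrics.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only.
final class HeapRatio {
    static final double GUARDRAIL = 0.8; // threshold quoted in the text above

    private HeapRatio() {}

    /** Used heap divided by max heap, or NaN when the JVM reports no maximum. */
    static double current() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        if (max <= 0) {
            return Double.NaN; // getMax() is -1 when the maximum is undefined
        }
        return (double) heap.getUsed() / max;
    }

    /** True when the ratio is above the guardrail; NaN never triggers. */
    static boolean aboveGuardrail(double ratio) {
        return ratio > GUARDRAIL;
    }

    public static void main(String[] args) {
        double ratio = current();
        System.out.println("heap ratio = " + ratio + ", above guardrail = " + aboveGuardrail(ratio));
    }
}
```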

Excessive GC pressure

# Fraction of wall-clock time spent in GC pauses, averaged over 1 minute
# (summed across the per-collector series)
sum(rate(jvm_gc_pause_seconds_sum[1m]))

# > 0.1 (10 % of the time spent in GC) = problem

This is the typical signature of GC Pressure Chaos: the ratio easily exceeds 0.2 at intensity 100 %, whereas a healthy application stays below 0.05.
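The ratio discussed above is simply the growth of the cumulative pause time divided by the elapsed wall-clock time. The Java sketch below works that arithmetic out on two samples; the class name and the sample values are hypothetical, and the reset handling is a simplification of what rate() does.

```java
// Hypothetical helper showing the arithmetic behind the GC ratio.
final class GcPressure {
    private GcPressure() {}

    /** Seconds of GC pause per second of wall-clock time between two samples. */
    static double pauseFraction(double sumBefore, double sumAfter, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        double delta = sumAfter - sumBefore;
        if (delta < 0) {
            // A drop means the counter restarted from zero, for example after a JVM restart.
            delta = sumAfter;
        }
        return delta / elapsedSeconds;
    }

    public static void main(String[] args) {
        // 6 s of additional pause time over a 60 s window
        System.out.println(pauseFraction(10.0, 16.0, 60.0));
    }
}
```

Six seconds of pauses accumulated over a 60-second window gives a fraction of 0.1, which is the problem threshold mentioned above.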

Active security flaws

# All flaws triggered over 1 minute
sum(rate({__name__=~"chaos_security_s.*"}[1m]))

This gives an aggregated view of Security Chaos activity without listing the 12 counters individually.

Client metrics (Frontend Chaos)

Frontend Chaos metrics are not exposed to Prometheus directly — they are collected by chaos-agent.js on the browser side and POSTed to /api/chaos/client-metrics every 2 seconds. The monitoring service consumes and displays them in real time, without Prometheus persistence.

The collected fields are documented in the Frontend Chaos page.

Prometheus endpoint

All the backend metrics above are available at:

GET /actuator/prometheus

This endpoint is excluded from ChaosInterceptor — it remains reachable even when the backend is under 100 % chaos, which guarantees that Prometheus can keep scraping. Scraping happens every 15 seconds by default (configurable in prometheus/prometheus.yml).
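The endpoint returns the Prometheus text exposition format, which is easy to inspect by hand or in a test. The sketch below parses such a payload into a map. It is a simplified reader under stated assumptions: the class name is hypothetical, samples are assumed to carry no trailing timestamp, and the label block is kept as part of the key rather than parsed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal reader for the text exposition format.
final class ExpositionParser {
    private ExpositionParser() {}

    /** Maps each sample (name plus label block) to its value; comment lines are skipped. */
    static Map<String, Double> parse(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String raw : body.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // HELP and TYPE lines start with '#'
            }
            int split = line.lastIndexOf(' ');
            if (split < 0) {
                continue; // not a "name value" line
            }
            samples.put(line.substring(0, split).trim(), toDouble(line.substring(split + 1)));
        }
        return samples;
    }

    private static double toDouble(String text) {
        switch (text) {
            case "+Inf": return Double.POSITIVE_INFINITY;
            case "-Inf": return Double.NEGATIVE_INFINITY;
            default: return Double.parseDouble(text); // also accepts "NaN"
        }
    }
}
```

Given a payload containing the line container_cpu_usage 0.37, parse(body).get("container_cpu_usage") returns 0.37. The value here is made up for the example.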

Going further