Metrics reference

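This document lists the Prometheus metrics exposed by PerfShop, organized by family: every custom chaos metric, plus the standard Spring Boot and JVM metrics most useful for chaos diagnosis. Prometheus scrapes all of them every 15 seconds from the Spring Boot backend's /actuator/prometheus endpoint.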

The names below are the names exposed to Prometheus. At export time, Micrometer converts the dots (.) in meter names into underscores (_). A Gauge declared as chaos.business.a1.tva on the Java side therefore appears as chaos_business_a1_tva in Prometheus.
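For illustration, here is a simplified sketch of that renaming in plain Java. PrometheusNameSketch and exportName are hypothetical names, and this is not Micrometer's PrometheusNamingConvention. It only replaces characters that are invalid in Prometheus names and appends the _total suffix that Micrometer adds to Counters. That suffix rule is also why the Gauge-based counters on this page have no _total suffix.

```java
import java.util.regex.Pattern;

// Simplified, hypothetical sketch of Micrometer's export-time renaming.
// The real PrometheusNamingConvention also handles base units (the _seconds
// suffix of timers), leading digits and other edge cases.
class PrometheusNameSketch {

    // Characters outside [a-zA-Z0-9_:] are not allowed in Prometheus metric names.
    private static final Pattern INVALID_CHARS = Pattern.compile("[^a-zA-Z0-9_:]");

    static String exportName(String meterName, boolean isCounter) {
        // Dots (and any other invalid character) become underscores.
        String name = INVALID_CHARS.matcher(meterName).replaceAll("_");
        // Counters get a _total suffix; Gauges keep their converted name.
        if (isCounter && !name.endsWith("_total")) {
            name = name + "_total";
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(exportName("chaos.business.a1.tva", false)); // chaos_business_a1_tva
        System.out.println(exportName("chaos.business.a1.tva", true));  // chaos_business_a1_tva_total
    }
}
```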

Infrastructure chaos metrics

All of them are exposed by ChaosService.java and carry a type tag that identifies the chaos lever:

| Metric | Range | Description |
|---|---|---|
| `chaos_intensity{type="memory"}` | 0–105 | Memory target (105 = OOM) |
| `chaos_guardrail{type="memory"}` | 0–100 | Memory guardrail (% of max heap) |
| `chaos_intensity{type="gc_pressure"}` | 0–100 | Sawtooth GC pressure intensity |
| `chaos_intensity{type="db_pool"}` | 0–100 | HikariCP saturation |
| `chaos_intensity{type="thread_pool"}` | 0–100 | Tomcat pool saturation |
| `chaos_intensity{type="cpu"}` | 0–100 | Backend CPU load |
| `chaos_intensity{type="slow_query"}` | 0–100 | Injected SQL delay |
| `chaos_intensity{type="deadlock"}` | 0–100 | Deadlock probability |
| `chaos_intensity{type="network"}` | 0–100 | Network delay + random 503s |

The type tag lets you aggregate or filter per lever in PromQL queries: chaos_intensity{type="cpu"} returns only the CPU slider.

Business chaos metrics

Exposed by BusinessChaosService.java. Each of the A1–A16 anomalies has its own counter, incremented every time the anomaly is triggered.

| Metric | Type | Description |
|---|---|---|
| `chaos_business_level` | Gauge | Current level (0–4) |
| `chaos_business_a1_tva` | Gauge | A1 counter: VAT 19.6 % |
| `chaos_business_a2_arrondi` | Gauge | A2 counter: Floor price rounding |
| `chaos_business_a3_stock` | Gauge | A3 counter: Stock not decremented |
| `chaos_business_a4_email` | Gauge | A4 counter: Email missing shipping fees |
| `chaos_business_a5_doublon` | Gauge | A5 counter: Double order |
| `chaos_business_a6_promo` | Gauge | A6 counter: Invalid promo code |
| `chaos_business_a7_livraison` | Gauge | A7 counter: Calendar-day delivery |
| `chaos_business_a8_race` | Gauge | A8 counter: Stock race condition |
| `chaos_business_a9_inject` | Gauge | A9 counter: Log injection |
| `chaos_business_a10_total` | Gauge | A10 counter: Wrong history total |
| `chaos_business_a11_token` | Gauge | A11 counter: Logout token |
| `chaos_business_a12_loyalty` | Gauge | A12 counter: Loyalty discount |
| `chaos_business_a13_currency` | Gauge | A13 counter: USD currency |
| `chaos_business_a14_shipping` | Gauge | A14 counter: Doubled shipping fees |
| `chaos_business_a15_history` | Gauge | A15 counter: History corruption |
| `chaos_business_a16_cancel` | Gauge | A16 counter: Cancellation without restock |

The counters are implemented as Gauges rather than Counters because BusinessChaosService.reset() can set them back to zero. A Micrometer Counter has no reset operation, and a Prometheus counter is expected to only increase; any drop is interpreted as a process restart.
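A minimal sketch of what such a resettable counter can look like, assuming one AtomicLong per anomaly that a Gauge samples at each scrape. The class name is hypothetical and this is not PerfShop's actual code. The Micrometer registration (Gauge.builder(name, obj, function).register(registry)) appears only as a comment, so the example runs without dependencies.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical resettable anomaly counter, sampled by a Gauge.
// With Micrometer it could be registered roughly as:
//   Gauge.builder("chaos.business.a1.tva", counter, c -> c.value()).register(registry);
class ResettableAnomalyCounter {

    private final AtomicLong count = new AtomicLong();

    // Called on every trigger of the anomaly; safe from concurrent requests.
    void increment() {
        count.incrementAndGet();
    }

    // What a reset() on the service needs: back to zero.
    // A Micrometer Counter offers no such operation.
    void reset() {
        count.set(0);
    }

    // Value read by the Gauge at each scrape.
    double value() {
        return count.get();
    }

    public static void main(String[] args) {
        ResettableAnomalyCounter a1 = new ResettableAnomalyCounter();
        a1.increment();
        a1.increment();
        System.out.println(a1.value()); // 2.0
        a1.reset();
        System.out.println(a1.value()); // 0.0
    }
}
```

After a reset, the exported value drops back to 0. PromQL's rate() and increase() treat any drop as a counter reset and count the new value as an increase from zero. The rate() queries further down this page therefore keep returning non-negative rates across a reset.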

Functional chaos metrics

Exposed by FunctionalChaosService.java: the level plus one counter for each of the four F1–F4 anomalies:

| Metric | Type | Description |
|---|---|---|
| `chaos_functional_level` | Gauge | Current level (0–4) |
| `chaos_functional_f1_npe` | Gauge | F1: Payment NullPointerException |
| `chaos_functional_f2_stackoverflow` | Gauge | F2: Calculation StackOverflowError |
| `chaos_functional_f3_oom` | Gauge | F3: Catalog OutOfMemoryError |
| `chaos_functional_f4_corruption` | Gauge | F4: Silent corruption |

Security chaos metrics

Exposed by SecurityChaosService.java: the level plus one counter for each of the twelve S1–S12 flaws:

| Metric | Type | Description |
|---|---|---|
| `chaos_security_level` | Gauge | Current level (0–4) |
| `chaos_security_s1_sqli` | Gauge | S1: SQL Injection |
| `chaos_security_s2_idor` | Gauge | S2: Order IDOR |
| `chaos_security_s3_hash` | Gauge | S3: Exposed password hash |
| `chaos_security_s4_xss` | Gauge | S4: Stored XSS |
| `chaos_security_s5_price` | Gauge | S5: Price tampering |
| `chaos_security_s6_timing` | Gauge | S6: Login timing attack |
| `chaos_security_s7_token` | Gauge | S7: Weak HMAC token |
| `chaos_security_s8_path` | Gauge | S8: Path Traversal |
| `chaos_security_s9_mass` | Gauge | S9: Mass Assignment |
| `chaos_security_s10_portal` | Gauge | S10: Unauthenticated portal stats |
| `chaos_security_s11_sqli` | Gauge | S11: Portal login SQLi |
| `chaos_security_s12_idor` | Gauge | S12: Privilege escalation IDOR |

Scripting chaos metrics

Exposed by ChaosScriptingService.java. There is no per-event counter. Only two values are exposed: the level, and the number of active TokenBundles, which reflects the memory pressure created by logged-in sessions:

| Metric | Type | Description |
|---|---|---|
| `chaos_scripting_level` | Gauge | Current level (0–4) |
| `chaos_scripting_bundles_active` | Gauge | Number of TokenBundles in memory |

System metrics

Exposed by ContainerCpuMetrics.java:

| Metric | Type | Description |
|---|---|---|
| `container_cpu_usage` | Gauge | Container CPU load (0.0–1.0) |

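The metric is obtained through reflection on com.sun.management.OperatingSystemMXBean.getCpuLoad(). It may report 0.0 when that call is impossible. One case is a JDK older than 14, where getCpuLoad() does not exist. Another is denied reflective access: on JDK 16+, strong encapsulation blocks the call when the method is resolved on the internal implementation class and the JVM is not started with --add-opens.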
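A sketch of such a reflective call with a 0.0 fallback, assuming this is roughly the approach of ContainerCpuMetrics.java. CpuLoadSketch and cpuLoad are hypothetical names. The sketch resolves getCpuLoad() on the exported com.sun.management.OperatingSystemMXBean interface rather than on the internal implementation class. This avoids the JDK 16+ access errors without needing --add-opens.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

// Hypothetical sketch of a reflective CPU load reading with a 0.0 fallback.
class CpuLoadSketch {

    static double cpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        try {
            // Exported interface from the jdk.management module.
            Class<?> iface = Class.forName("com.sun.management.OperatingSystemMXBean");
            if (!iface.isInstance(os)) {
                return 0.0; // JVM without the com.sun.management extension
            }
            Method getCpuLoad = iface.getMethod("getCpuLoad"); // added in JDK 14
            double load = ((Number) getCpuLoad.invoke(os)).doubleValue();
            // The JDK reports a negative value when the load is not available.
            return (load < 0 || Double.isNaN(load)) ? 0.0 : load;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Method missing (JDK < 14) or reflective access denied.
            return 0.0;
        }
    }

    public static void main(String[] args) {
        System.out.println("container_cpu_usage = " + cpuLoad());
    }
}
```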

Spring Boot / JVM metrics

PerfShop exposes all the standard Micrometer metrics through Spring Boot Actuator auto-configuration. The ones most useful for chaos diagnosis are listed below.

JVM memory

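| Metric | Type | Description |
|---|---|---|
| `jvm_memory_used_bytes{area="heap"}` | Gauge | Used heap, one series per heap pool (`id` label) |
| `jvm_memory_max_bytes{area="heap"}` | Gauge | Max size per heap pool; summed over pools, approximately the configured `-Xmx` |
| `jvm_memory_used_bytes{area="nonheap"}` | Gauge | Used non-heap memory (Metaspace, code cache) |
| `jvm_memory_committed_bytes{area="heap"}` | Gauge | Heap currently committed by the JVM |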

Garbage collector

| Metric | Type | Description |
|---|---|---|
| `jvm_gc_pause_seconds_count` | Counter | Number of GC pauses |
| `jvm_gc_pause_seconds_sum` | Counter | Cumulative time spent in pause |
| `jvm_gc_pause_seconds_max` | Gauge | Longest recent GC pause |
| `jvm_gc_memory_promoted_bytes_total` | Counter | Bytes promoted to old gen |

JVM threads

| Metric | Type | Description |
|---|---|---|
| `jvm_threads_states_threads{state="runnable"}` | Gauge | Threads in the RUNNABLE state |
| `jvm_threads_states_threads{state="blocked"}` | Gauge | Threads blocked waiting for a monitor lock |
| `jvm_threads_states_threads{state="waiting"}` | Gauge | Threads in the WAITING state |
| `jvm_threads_live_threads` | Gauge | Total live threads (daemon and non-daemon) |

Tomcat

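| Metric | Type | Description |
|---|---|---|
| `tomcat_threads_busy_threads` | Gauge | Tomcat busy threads |
| `tomcat_threads_current_threads` | Gauge | Current Tomcat threads |
| `tomcat_threads_config_max_threads` | Gauge | Configured max (200 by default) |

The Tomcat metrics are only registered when Tomcat's MBean registry is enabled (server.tomcat.mbeanregistry.enabled=true). Spring Boot disables that registry by default since version 2.2.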

HikariCP (DB pool)

| Metric | Type | Description |
|---|---|---|
| `hikaricp_connections_active` | Gauge | Connections currently in use |
| `hikaricp_connections_idle` | Gauge | Idle connections in the pool |
| `hikaricp_connections_pending` | Gauge | Threads waiting for a connection |
| `hikaricp_connections_max` | Gauge | Max pool size |
| `hikaricp_connections_min` | Gauge | Min pool size |
| `hikaricp_connections_acquire_seconds_max` | Gauge | Recent max acquisition time |

HTTP Server (Spring MVC)

| Metric | Type | Description |
|---|---|---|
| `http_server_requests_seconds_count{uri,method,status}` | Counter | Number of requests |
| `http_server_requests_seconds_sum{uri,method,status}` | Counter | Cumulative total time (seconds) |
| `http_server_requests_seconds_max{uri,method,status}` | Gauge | Recent max latency |
| `http_server_requests_seconds_bucket{uri,method,status,le}` | Histogram | Cumulative bucket counts, used to compute quantiles |

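The _bucket series let you estimate any quantile with histogram_quantile() in PromQL, the standard Prometheus mechanism for latencies. The estimate is only as precise as the bucket boundaries allow. These series are only published when histogram publishing is enabled for this timer, through the Spring Boot property management.metrics.distribution.percentiles-histogram.http.server.requests=true. Without them, histogram_quantile() returns nothing.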
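To see why the result is an estimate, here is a simplified sketch of the linear interpolation histogram_quantile() performs on classic cumulative buckets. The class name is hypothetical and the bucket values are made up. The sketch assumes 0 ≤ q ≤ 1 and ascending upper bounds ending with +Inf; real Prometheus handles more edge cases.

```java
// Hypothetical sketch of histogram_quantile() on classic cumulative buckets.
class HistogramQuantileSketch {

    // upperBounds: the "le" values, ascending, +Inf last.
    // cumulative: matching cumulative counts (or rates) per bucket.
    static double quantile(double q, double[] upperBounds, double[] cumulative) {
        int n = upperBounds.length;
        if (n < 2 || upperBounds[n - 1] != Double.POSITIVE_INFINITY) {
            return Double.NaN; // need at least one finite bucket plus +Inf
        }
        double total = cumulative[n - 1];
        if (total == 0) {
            return Double.NaN;
        }
        double rank = q * total;
        // First finite bucket whose cumulative count reaches the rank.
        int b = 0;
        while (b < n - 1 && cumulative[b] < rank) {
            b++;
        }
        if (b == n - 1) {
            // Rank falls in the +Inf bucket: return the highest finite bound.
            return upperBounds[n - 2];
        }
        if (b == 0 && upperBounds[0] <= 0) {
            return upperBounds[0];
        }
        double bucketStart = 0;
        double bucketEnd = upperBounds[b];
        double count = cumulative[b];
        if (b > 0) {
            bucketStart = upperBounds[b - 1];
            count -= cumulative[b - 1];
            rank -= cumulative[b - 1];
        }
        // Assume observations are spread uniformly inside the bucket.
        return bucketStart + (bucketEnd - bucketStart) * (rank / count);
    }

    public static void main(String[] args) {
        double[] le = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 90, 99, 100};
        System.out.println(quantile(0.99, le, counts)); // 1.0
        System.out.println(quantile(0.5, le, counts));  // 0.1
    }
}
```

The p99 here lands exactly on a bucket boundary. Any quantile that falls inside a bucket is interpolated, so finer buckets around the latencies you care about give more accurate results.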

A few useful PromQL queries

Global p99 latency correlated with CPU

```promql
# p99 latency over 1 minute
histogram_quantile(0.99,
  sum(rate(http_server_requests_seconds_bucket[1m])) by (le))

# To be compared with
container_cpu_usage
```

Plot both in the same Grafana panel to see directly how p99 latency degrades as CPU load rises. The two series have different units (seconds vs. a 0–1 ratio), so a second Y axis helps.

Business anomaly rate over 5 minutes

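```promql
# All business anomalies aggregated (per-second rate)
sum(rate(chaos_business_a1_tva[5m]))
+ sum(rate(chaos_business_a2_arrondi[5m]))
+ sum(rate(chaos_business_a3_stock[5m]))
# … add up the 16 counters
```

More practical: select all 16 counters at once with a regex on the metric name, such as sum(rate({__name__=~"chaos_business_a[0-9]+_.+"}[5m])). The pattern matches the A1–A16 counters but not chaos_business_level. Unlike the + chain, it still returns a result if one of the counters is missing.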
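PromQL regex matchers are fully anchored: the pattern must match the whole metric name, exactly like Java's Matcher.matches(). PromQL uses RE2 syntax, which behaves the same as java.util.regex for simple patterns like this one. A small sketch (hypothetical class, sample metric names) shows which names the pattern chaos_business_a[0-9]+_.+ selects.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of __name__ regex selection with PromQL-style anchoring.
class NameRegexSketch {

    // Keeps the names the regex matches in full, not as a substring.
    static List<String> select(String regex, List<String> names) {
        Pattern pattern = Pattern.compile(regex);
        return names.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "chaos_business_level",
                "chaos_business_a1_tva",
                "chaos_business_a16_cancel",
                "chaos_security_s1_sqli");
        // [chaos_business_a1_tva, chaos_business_a16_cancel]
        System.out.println(select("chaos_business_a[0-9]+_.+", names));
        // [] : a bare substring does not match under full anchoring
        System.out.println(select("business", names));
    }
}
```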

HikariCP pool saturation

```promql
# Threads waiting for a DB connection
hikaricp_connections_pending > 0

# Saturation ratio (0 to 1)
hikaricp_connections_active / hikaricp_connections_max
```

A good alert condition: hikaricp_connections_pending > 0 sustained for more than 30 seconds almost always signals either an active DB Pool Chaos or a genuine pool saturation in production.
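In a Prometheus rule file, this condition is written as expr: hikaricp_connections_pending > 0 with for: 30s. The sketch below is a hypothetical class with no Prometheus dependency. It mimics how the for clause moves an alert from inactive to pending to firing across rule evaluations, assuming one evaluation every 15 seconds.

```java
// Hypothetical sketch of "pending > 0 for 30 s" alert semantics.
class PendingConnectionsAlert {

    enum State { INACTIVE, PENDING, FIRING }

    private final long forSeconds;
    // Evaluation time at which the condition became true; -1 while it is false.
    private long activeSince = -1;

    PendingConnectionsAlert(long forSeconds) {
        this.forSeconds = forSeconds;
    }

    // One rule evaluation with the current hikaricp_connections_pending value.
    State evaluate(long nowSeconds, double pendingThreads) {
        if (pendingThreads <= 0) {
            activeSince = -1; // condition cleared: the wait starts over
            return State.INACTIVE;
        }
        if (activeSince < 0) {
            activeSince = nowSeconds;
        }
        // Fires only once the condition has held for the whole duration.
        return nowSeconds - activeSince >= forSeconds ? State.FIRING : State.PENDING;
    }

    public static void main(String[] args) {
        PendingConnectionsAlert alert = new PendingConnectionsAlert(30);
        System.out.println(alert.evaluate(0, 3));  // PENDING
        System.out.println(alert.evaluate(15, 5)); // PENDING
        System.out.println(alert.evaluate(30, 2)); // FIRING
        System.out.println(alert.evaluate(45, 0)); // INACTIVE
    }
}
```

Requiring the condition to hold over several evaluations filters out short bursts of pending threads that resolve on their own.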

Heap exceeding 80 % of max

```promql
# Heap series are per memory pool (id label): sum them before dividing
sum(jvm_memory_used_bytes{area="heap"})
  / sum(jvm_memory_max_bytes{area="heap"}) > 0.8
```

The 0.8 threshold matches the default memoryGuardrail of Memory Chaos (80 % of max heap). If the ratio stays above 0.8, either Memory Chaos is active or the application has a genuine memory leak.
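The same ratio can be computed inside the JVM with the standard java.lang.management API. This also shows why the query has to sum series: heap usage is reported per memory pool. A sketch with a hypothetical class name; the 0.8 threshold is the default guardrail mentioned above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical sketch: heap used / heap max, summed over heap memory pools.
class HeapRatioSketch {

    static double heapUsageRatio() {
        long used = 0;
        long max = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools (Metaspace, code cache)
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool no longer valid
            }
            used += usage.getUsed();
            // A pool with an undefined max reports -1 (typical for G1 young-gen pools).
            if (usage.getMax() > 0) {
                max += usage.getMax();
            }
        }
        return max > 0 ? (double) used / max : Double.NaN;
    }

    static boolean exceedsGuardrail(double ratio, double threshold) {
        return ratio > threshold;
    }

    public static void main(String[] args) {
        double ratio = heapUsageRatio();
        System.out.printf("heap ratio = %.3f, above 0.8: %b%n", ratio, exceedsGuardrail(ratio, 0.8));
    }
}
```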

Excessive GC pressure

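```promql
# Fraction of time spent in GC pauses over the last minute
# (seconds of pause per second, summed over all GC actions and causes)
sum(rate(jvm_gc_pause_seconds_sum[1m]))

# > 0.1 (10 % of the time spent in GC) = problem
```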

This is the typical signature of GC Pressure Chaos: the ratio easily exceeds 0.2 at intensity 100 %, whereas a healthy application stays below 0.05.
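rate() on jvm_gc_pause_seconds_sum returns seconds of pause per second of wall-clock time, that is, the fraction of time spent in GC pauses. Below is a simplified sketch of that arithmetic on raw samples, with a hypothetical class and made-up sample values. Unlike real rate(), it does not extrapolate to the window boundaries. It does handle counter resets the way PromQL does.

```java
// Hypothetical sketch of what rate() computes on a cumulative counter.
class GcTimeFractionSketch {

    // Increase of a cumulative counter divided by the elapsed time between samples.
    static double gcTimeFraction(long[] timestampsSeconds, double[] cumulativePauseSeconds) {
        int n = timestampsSeconds.length;
        if (n < 2 || cumulativePauseSeconds.length != n) {
            return Double.NaN;
        }
        double increase = 0;
        for (int i = 1; i < n; i++) {
            double delta = cumulativePauseSeconds[i] - cumulativePauseSeconds[i - 1];
            // A drop means the counter restarted from zero (e.g. JVM restart):
            // count the new value as the increase since the reset.
            increase += delta >= 0 ? delta : cumulativePauseSeconds[i];
        }
        long elapsed = timestampsSeconds[n - 1] - timestampsSeconds[0];
        return elapsed > 0 ? increase / elapsed : Double.NaN;
    }

    public static void main(String[] args) {
        long[] t = {0, 15, 30, 45, 60};                   // one sample per 15 s scrape
        double[] pauses = {10.0, 11.5, 13.0, 14.5, 16.0}; // cumulative GC pause seconds
        double fraction = gcTimeFraction(t, pauses);
        System.out.println(fraction);                     // 0.1
        System.out.println(fraction > 0.05 ? "above healthy level" : "healthy");
    }
}
```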

Active security flaws

```promql
# Per-second rate of all triggered flaws, averaged over 1 minute
sum(rate({__name__=~"chaos_security_s.*"}[1m]))
```

This gives an aggregated view of Security Chaos activity without listing the 12 counters individually.

Client metrics (Frontend Chaos)

Frontend Chaos metrics are not exposed to Prometheus directly — they are collected by chaos-agent.js on the browser side and POSTed to /api/chaos/client-metrics every 2 seconds. The monitoring service consumes and displays them in real time, without Prometheus persistence.

The collected fields are documented in the Frontend Chaos page.

Prometheus endpoint

All the backend metrics above are available at:

```text
GET /actuator/prometheus
```

This endpoint is excluded from ChaosInterceptor. It therefore stays reachable even when the backend is under 100 % chaos, so Prometheus can keep scraping. PerfShop's Prometheus scrapes it every 15 seconds by default, configurable via scrape_interval in prometheus/prometheus.yml.
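To check the endpoint by hand or from a small diagnostic tool, here is a sketch that fetches it and reads samples from the Prometheus text format. The base URL http://localhost:8080 is an assumption, so adapt it to your deployment. The class name is hypothetical. The parser is deliberately simplified: it ignores timestamps and assumes label values contain no closing brace.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: scrape /actuator/prometheus and parse the text format.
class ScrapeSketch {

    // Maps each series ("name{labels}") to its value; # HELP / # TYPE lines are skipped.
    static Map<String, Double> parse(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String raw : body.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // The series key ends at '}' if there are labels, otherwise at the first space.
            int end = line.indexOf('}');
            int split = end >= 0 ? end + 1 : line.indexOf(' ');
            if (split <= 0 || split >= line.length()) {
                continue;
            }
            String key = line.substring(0, split);
            String value = line.substring(split).trim().split("\\s+")[0]; // drop optional timestamp
            try {
                samples.put(key, parseValue(value));
            } catch (NumberFormatException e) {
                // malformed value: skip the line
            }
        }
        return samples;
    }

    // The text format writes infinities as +Inf / -Inf, which parseDouble rejects.
    static double parseValue(String token) {
        switch (token) {
            case "+Inf": return Double.POSITIVE_INFINITY;
            case "-Inf": return Double.NEGATIVE_INFINITY;
            default: return Double.parseDouble(token); // also accepts "NaN"
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/actuator/prometheus")).GET().build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            parse(response.body()).forEach((series, v) -> {
                if (series.startsWith("chaos_intensity")) {
                    System.out.println(series + " = " + v);
                }
            });
        } catch (IOException e) {
            System.out.println("Scrape failed: " + e.getMessage());
        }
    }
}
```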

Going further

Performance Chaos — infrastructure levers and their metrics
Business Chaos — detail of the 16 business anomalies
Functional Chaos — detail of the 4 F1 – F4 anomalies
Security Chaos — detail of the 12 S1 – S12 flaws
Frontend Chaos — browser client metrics
Observability — Prometheus — scraping configuration
Observability — Grafana — shipped dashboards