Performance Chaos¶
Performance Chaos groups eight backend infrastructure levers exposed by ChaosService and executed by five technical components: ChaosInterceptor (thread pool, slow query, network), CpuChaosScheduler (CPU), MemoryLeakSimulator (memory leak), GcPressureSimulator (GC pressure), and DbPoolChaosScheduler (HikariCP saturation). OrderService additionally carries the deadlock injection specific to checkout.
Each lever is an independent 0 – 100 slider driven by POST /api/admin/chaos/<lever>. Two levers take a second parameter: memory uses two coupled sliders (target and guardrail), and CPU accepts a ratio from 1 to 5.
Lever overview¶
| Lever | Range | Technical class | Admin endpoint |
|---|---|---|---|
| CPU | 0 – 100 | CpuChaosScheduler | POST /api/admin/chaos/cpu |
| CPU ratio | 1 – 5 | CpuChaosScheduler | POST /api/admin/chaos/cpu |
| Memory leak | 0 – 105 | MemoryLeakSimulator | POST /api/admin/chaos/memory |
| Memory guardrail | 0 – 100 | MemoryLeakSimulator | POST /api/admin/chaos/memory |
| GC pressure | 0 – 100 | GcPressureSimulator | POST /api/admin/chaos/gc-pressure |
| DB pool | 0 – 100 | DbPoolChaosScheduler | POST /api/admin/chaos/db-pool |
| Thread pool | 0 – 100 | ChaosInterceptor | POST /api/admin/chaos/thread-pool |
| Slow query | 0 – 100 | ChaosInterceptor | POST /api/admin/chaos/slow-queries |
| Deadlock | 0 – 100 | OrderService | POST /api/admin/chaos/deadlock |
| Network timeout | 0 – 100 | ChaosInterceptor | POST /api/admin/chaos/network |
All lever endpoints require the X-Admin-Token header or a valid admin session. The response body follows the { success, message, status } schema, where status is the full state from ChaosService.getStatus().
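For callers that prefer Java over curl, the request shape can be sketched with the JDK's built-in HTTP client. This is a minimal sketch, not PerfShop code: the class and method names are hypothetical, and the base URL is the one used in the curl examples on this page.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ChaosRequestSketch {
    // Base URL taken from the curl examples on this page.
    static final String BASE = "https://perfshop-api.perfshop.io/api/admin/chaos/";

    // Builds (but does not send) a POST for one lever with a JSON body.
    // leverRequest is a hypothetical helper, not part of ChaosService.
    static HttpRequest leverRequest(String lever, String jsonBody, String adminToken) {
        return HttpRequest.newBuilder(URI.create(BASE + lever))
                .header("X-Admin-Token", adminToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

Sending the request with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) should return the { success, message, status } JSON described above.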
CPU — CpuChaosScheduler¶
Class: CpuChaosScheduler.java
Metrics: chaos_intensity{type="cpu"}, container_cpu_usage
The scheduler runs on @Scheduled(fixedRate = 100) and submits to a dedicated thread pool (Executors.newFixedThreadPool(5)) a SHA-256 hashing loop calibrated to saturate CPU in a controlled way. The formula:
Reference calibration: on an Intel i7-8700T, intensity = 100 with ratio = 1 produces ≈ 100 % load on a single core. The ratio parameter multiplies the number of parallel threads submitted to the pool — it lets you adapt the chaos to more powerful machines (Ryzen 5800X and above).
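The idea of a calibrated hashing loop can be illustrated with a duty-cycle sketch. Everything here is an assumption made for illustration, not the scheduler's actual formula: the sketch burns CPU for intensity % of each 100 ms tick and submits one task per unit of ratio, which is consistent with the reference calibration (intensity = 100, ratio = 1, one saturated core). The class and method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CpuBurnSketch {
    static final long TICK_MS = 100; // matches @Scheduled(fixedRate = 100)

    // ASSUMPTION: busy time per tick is linear in intensity (0..100).
    static long burnMillis(int intensity) {
        int clamped = Math.max(0, Math.min(100, intensity));
        return TICK_MS * clamped / 100;
    }

    // ASSUMPTION: one hashing task per unit of ratio (1..5).
    static int tasksPerTick(int ratio) {
        return Math.max(1, Math.min(5, ratio));
    }

    // Hashes in a tight loop until the deadline; returns the number of rounds.
    static long burn(long millis) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] block = new byte[64];
            long deadline = System.nanoTime() + millis * 1_000_000L;
            long rounds = 0;
            while (System.nanoTime() < deadline) {
                block = sha.digest(block); // feed each digest back as input
                rounds++;
            }
            return rounds;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Under these assumptions, intensity = 80 with ratio = 2 keeps two pool threads busy for 80 ms out of every 100 ms.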
Activation¶
curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/cpu \
-H "X-Admin-Token: $TOKEN" \
-H "Content-Type: application/json" \
-d '{"intensity": 80, "ratio": 2}'
Observation¶
- Main metric: container_cpu_usage (Gauge, 0.0 – 1.0)
- Side effect: http_server_requests_seconds{quantile="0.99"} rises proportionally
- Logs: [BackendChaos] CPU intensity set to: 80 on every change
Memory — MemoryLeakSimulator¶
Class: MemoryLeakSimulator.java (version v3-bidirectional)
Metrics: chaos_intensity{type="memory"}, chaos_guardrail{type="memory"}, jvm_memory_used_bytes{area="heap"}
The memory simulator uses two coupled sliders to allow a progressive, bounded memory leak:
| Slider | Range | Meaning |
|---|---|---|
| memoryLeakTarget | 0 – 100 | Percentage of the guardrail capacity to fill |
| memoryLeakTarget | 105 | Special value: intentional OOM with no cap |
| memoryGuardrail | 0 – 100 | Safety cap as % of max heap (-Xmx), default 80 |
Effective formula¶
effective_target_pct = target × guardrail / 100
effective_target_bytes = -Xmx × effective_target_pct / 100
The cap applies to the used heap (totalMemory - freeMemory), not only to memory allocated by the simulator. As long as the used heap is below the target, the simulator allocates 5 % of -Xmx per second. As soon as it exceeds the target, it releases 2 blocks per tick and triggers System.gc().
Intentional OOM mode¶
The combination target = 105 with guardrail = 100 disables the guardrail and allocates until an OutOfMemoryError is thrown. This is the only configuration that can crash the JVM; all other combinations are bounded by the guardrail.
Examples on -Xmx 1g¶
| target | guardrail | Target heap | Behavior |
|---|---|---|---|
| 0 | 80 | 0 MB | Disabled — releases all memory |
| 50 | 80 | 410 MB | Plateau at ~40 % of heap |
| 100 | 80 | 820 MB | Plateau at 80 % — guardrail active |
| 100 | 50 | 512 MB | Plateau at 50 % |
| 105 | 80 | 820 MB | OOM mode capped by 80 % guardrail |
| 105 | 100 | ∞ | Intentional OOM — JVM crash |
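The rows above can be reproduced with a small function. This is a sketch under stated assumptions: the class, method, and NO_CAP sentinel are hypothetical, the special handling of target = 105 is inferred from the table rather than taken from MemoryLeakSimulator, and integer division truncates the result.

```java
final class MemoryTargetSketch {
    // Sentinel meaning "no cap" (intentional OOM mode); hypothetical.
    static final long NO_CAP = Long.MAX_VALUE;

    // Target for the used heap, in bytes, for a given -Xmx and slider pair.
    static long targetBytes(long maxHeapBytes, int target, int guardrail) {
        if (target == 105) {
            // Inferred from the table: only guardrail = 100 removes the cap;
            // any lower guardrail still bounds the heap at guardrail %.
            return guardrail >= 100 ? NO_CAP : maxHeapBytes * guardrail / 100;
        }
        // effective_target_pct = target x guardrail / 100, applied to -Xmx.
        return maxHeapBytes * target * guardrail / 10_000;
    }
}
```

With -Xmx 1g (1 073 741 824 bytes), target = 50 and guardrail = 80 give 429 496 729 bytes, which is the ~410 MB shown in the table.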
Bidirectional behavior¶
The slider is reactive in both directions. When the instructor lowers memoryLeakTarget from 100 to 50 %, the simulator progressively releases the allocated blocks (2 per second) until it drops below the new target. At 0, all leaked memory is released at once and System.gc() is called.
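The per-tick decision described in this section can be sketched as a pure function over the used heap. The names are hypothetical, and treating "used heap equal to the target" as a release case is an assumption; the real simulator may draw that boundary differently.

```java
final class LeakTickSketch {
    enum Action { ALLOCATE, RELEASE_TWO_BLOCKS, RELEASE_ALL }

    // Used heap as defined in the text: totalMemory - freeMemory.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // One allocation step is 5 % of -Xmx.
    static long allocationStepBytes(long maxHeapBytes) {
        return maxHeapBytes * 5 / 100;
    }

    // Decides what a single tick does for the current slider position.
    static Action decide(long usedHeapBytes, long targetBytes, int memoryLeakTarget) {
        if (memoryLeakTarget == 0) {
            return Action.RELEASE_ALL; // slider at 0: free everything at once
        }
        // ASSUMPTION: used == target counts as "at or above target".
        return usedHeapBytes < targetBytes ? Action.ALLOCATE : Action.RELEASE_TWO_BLOCKS;
    }
}
```

Because the comparison uses the whole used heap, lowering the slider flips the decision from ALLOCATE to RELEASE_TWO_BLOCKS on the next tick, which is what makes the lever bidirectional.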
Activation¶
curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/memory \
-H "X-Admin-Token: $TOKEN" \
-H "Content-Type: application/json" \
-d '{"intensity": 100, "guardrail": 80}'
The intensity field maps to memoryLeakTarget, guardrail to memoryGuardrail. The latter is optional — if omitted, the current value is preserved (default 80).
GC Pressure — GcPressureSimulator¶
Class: GcPressureSimulator.java (version v2-sawtooth-cycle)
Metric: chaos_intensity{type="gc_pressure"}
Unlike MemoryLeakSimulator which produces a plateau (leaked memory is not released as long as the slider stays high), the GC pressure simulator cyclically allocates then releases, producing a characteristic sawtooth pattern on the Grafana heap graph.
FILL → RELEASE → PAUSE cycle¶
gantt
title GC pressure cycle (intensity = 50%, -Xmx 1g)
dateFormat s
axisFormat %Ss
section Phase
FILL +75 MB :a1, 0, 1s
FILL +75 MB :a2, 1, 1s
FILL +75 MB :a3, 2, 1s
FILL +75 MB :a4, 3, 1s
RELEASE all + GC :crit, a5, 4, 1s
PAUSE :a6, 5, 1s
PAUSE :a7, 6, 1s
| Phase | Duration | Action |
|---|---|---|
| FILL | 4 ticks | Allocate one block per tick (20 % × intensity / 100) |
| RELEASE | 1 tick | clear() the buffer + System.gc() |
| PAUSE | 2 ticks | No action — let the GC finish |
The full cycle lasts 7 seconds regardless of -Xmx. At intensity = 50 on -Xmx 1g, each cycle allocates ≈ 300 MB in 4 seconds then releases abruptly. At intensity = 100, the teeth climb until they saturate the heap — the GC becomes extremely aggressive.
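The 7-tick cycle can be sketched as a mapping from a tick counter to a phase. The class and enum names are hypothetical, and the sketch covers only the phase schedule from the table, not the block sizing.

```java
final class GcCycleSketch {
    enum Phase { FILL, RELEASE, PAUSE }

    static final int FILL_TICKS = 4;
    static final int RELEASE_TICKS = 1;
    static final int PAUSE_TICKS = 2;
    static final int CYCLE_TICKS = FILL_TICKS + RELEASE_TICKS + PAUSE_TICKS; // 7

    // Maps a monotonically increasing tick counter to its phase in the cycle.
    static Phase phaseAt(long tick) {
        long pos = Math.floorMod(tick, (long) CYCLE_TICKS);
        if (pos < FILL_TICKS) {
            return Phase.FILL;
        }
        if (pos < FILL_TICKS + RELEASE_TICKS) {
            return Phase.RELEASE;
        }
        return Phase.PAUSE;
    }
}
```

Ticks 0 to 3 fill, tick 4 releases, ticks 5 and 6 pause, and tick 7 starts the next tooth.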
Activation¶
curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/gc-pressure \
-H "X-Admin-Token: $TOKEN" \
-H "Content-Type: application/json" \
-d '{"intensity": 50}'
Observation¶
- Main metric: jvm_memory_used_bytes{area="heap"} (sawtooth pattern)
- GC metrics: jvm_gc_pause_seconds_count rises, jvm_gc_pause_seconds_sum as well; jvm_gc_pause_seconds_max can reach several hundred ms
- Side effect: parasitic CPU spikes visible in container_cpu_usage
- Logs: [BackendChaos] GcPressure: FILL +75 MB — heap=... on every tick
DB Pool — DbPoolChaosScheduler¶
Class: DbPoolChaosScheduler.java
Metrics: chaos_intensity{type="db_pool"}, hikaricp_connections_active, hikaricp_connections_pending
The scheduler steals a proportional number of HikariCP connections by keeping them open with setAutoCommit(false) (uncommitted transaction). The pool's maximum size is read from spring.datasource.hikari.maximum-pool-size (default 20).
Formula¶
At least 1 connection remains free so that the scheduler itself does not deadlock. At intensity = 100 with a pool of 20, the scheduler blocks 19 connections — all new requests wait until the HikariCP timeout (default 30 s).
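One way to express the rule "proportional, but always leave one connection free" is sketched below. The rounding mode (truncation) and the names are assumptions; only the endpoint behavior (19 of 20 at intensity = 100) comes from the text.

```java
final class DbPoolSketch {
    // Number of connections the scheduler would hold open.
    static int connectionsToHold(int intensity, int maxPoolSize) {
        int clamped = Math.max(0, Math.min(100, intensity));
        // ASSUMPTION: proportional share is truncated toward zero.
        int proportional = maxPoolSize * clamped / 100;
        // Always leave at least one connection free for the scheduler itself.
        return Math.max(0, Math.min(proportional, maxPoolSize - 1));
    }
}
```

With the default pool of 20, intensity = 50 holds 10 connections and intensity = 100 holds 19.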
Tomcat thread pool — ChaosInterceptor¶
Class: ChaosInterceptor.applyThreadPoolChaos()
Metrics: chaos_intensity{type="thread_pool"}, tomcat_threads_busy_threads
For every non-excluded HTTP request, the interceptor applies a Thread.sleep proportional to the intensity — calibrated to progressively saturate the Tomcat pool (default 200 threads). The formula:
| Intensity | Applied delay |
|---|---|
| 1 – 24 % | intensity × 20 ms |
| 25 – 49 % | intensity × 30 ms |
| 50 – 74 % | 1 500 ms fixed |
| 75 – 99 % | 3 000 ms fixed |
| 100 % | 5 000 ms fixed |
At 25 % the delay is 750 ms, and at 50 % it reaches 1 500 ms: the formula is deliberately progressive at the lower steps and uses fixed plateaus beyond, to produce a stepwise degradation visible in Grafana rather than a single abrupt jump. The exact shape is documented in the ChaosInterceptor source code.
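The delay table translates directly into a piecewise function. This is a sketch of the table, not the interceptor's source; the class name is hypothetical, and returning 0 at intensity 0 is an assumption since the table starts at 1 %.

```java
final class ThreadPoolDelaySketch {
    // Delay in milliseconds applied to each non-excluded request.
    static long delayMs(int intensity) {
        if (intensity <= 0) {
            return 0; // ASSUMPTION: lever off means no delay
        }
        if (intensity < 25) {
            return intensity * 20L;
        }
        if (intensity < 50) {
            return intensity * 30L;
        }
        if (intensity < 75) {
            return 1_500;
        }
        if (intensity < 100) {
            return 3_000;
        }
        return 5_000;
    }
}
```

As a rough order of magnitude, if the sleep dominates request time, 200 Tomcat threads held for 1 500 ms each saturate at about 200 / 1.5 ≈ 133 requests per second.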
Slow Query — ChaosInterceptor¶
Class: ChaosInterceptor.applySlowQueryChaos()
Metrics: chaos_intensity{type="slow_query"}, http_server_requests_seconds{quantile="0.99"}
Identical in principle to the thread pool (similar formula) but applied only to endpoints outside /api/products — the latter is handled by ProductService.applySlowQueryChaos() to avoid double-counting.
| Intensity | Applied delay |
|---|---|
| 1 – 24 % | intensity × 15 ms |
| 25 – 49 % | intensity × 25 ms |
| 50 – 74 % | 2 000 ms fixed |
| 75 – 99 % | 4 000 ms fixed |
| 100 % | 6 000 ms fixed |
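The routing rule and the delay table can be combined in one sketch. The class and method names are hypothetical, and matching the excluded endpoints by the /api/products path prefix is an assumption about how the interceptor recognizes them.

```java
final class SlowQuerySketch {
    // ASSUMPTION: product endpoints are recognized by path prefix; they are
    // left to ProductService.applySlowQueryChaos() to avoid double-counting.
    static boolean handledByInterceptor(String path) {
        return !path.startsWith("/api/products");
    }

    // Delay table for the slow query lever, in milliseconds.
    static long delayMs(int intensity) {
        if (intensity <= 0) {
            return 0;
        }
        if (intensity < 25) {
            return intensity * 15L;
        }
        if (intensity < 50) {
            return intensity * 25L;
        }
        if (intensity < 75) {
            return 2_000;
        }
        if (intensity < 100) {
            return 4_000;
        }
        return 6_000;
    }

    // Delay the interceptor itself would add for a given request path.
    static long interceptorDelayMs(String path, int intensity) {
        return handledByInterceptor(path) ? delayMs(intensity) : 0;
    }
}
```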
Deadlock — OrderService¶
Metric: chaos_intensity{type="deadlock"}
Endpoint: POST /api/admin/chaos/deadlock
The deadlock injection is carried by OrderService at checkout time. It simulates two concurrent transactions acquiring locks in reverse order (classic anti-pattern: SELECT … FOR UPDATE on two products in opposite orders). At high intensity, the transaction is aborted by MySQL with the error Deadlock found when trying to get lock and the client receives an HTTP 500 with the i18n message order.error.deadlock.
Since deadlocks are inherently probabilistic, the slider level tunes the trigger probability per checkout, not a fixed duration.
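The opposite-order anti-pattern can be reproduced inside a single JVM. The sketch below is only an analogue: ReentrantLock stands in for the row locks taken by SELECT … FOR UPDATE, a timed tryLock stands in for the database's deadlock detection, and none of the names come from OrderService.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderSketch {
    // Runs two workers that lock two "products" in opposite orders and
    // returns how many of them failed to obtain their second lock.
    static int runOppositeOrder() {
        ReentrantLock productA = new ReentrantLock();
        ReentrantLock productB = new ReentrantLock();
        CyclicBarrier bothHoldFirst = new CyclicBarrier(2);
        CyclicBarrier bothTried = new CyclicBarrier(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> t1 = pool.submit(worker(productA, productB, bothHoldFirst, bothTried));
            Future<Boolean> t2 = pool.submit(worker(productB, productA, bothHoldFirst, bothTried));
            int failures = 0;
            if (!t1.get()) failures++;
            if (!t2.get()) failures++;
            return failures;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    static Callable<Boolean> worker(ReentrantLock first, ReentrantLock second,
                                    CyclicBarrier held, CyclicBarrier tried) {
        return () -> {
            first.lock();
            try {
                held.await(); // wait until both workers hold their first lock
                boolean got = second.tryLock(100, TimeUnit.MILLISECONDS);
                tried.await(); // keep the first lock until both have tried
                if (got) {
                    second.unlock();
                }
                return got;
            } finally {
                first.unlock();
            }
        };
    }
}
```

Both workers time out here because each holds the lock the other needs. Acquiring the locks in one consistent order (for example, by ascending product id) removes the cycle.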
Network timeout — ChaosInterceptor¶
Class: ChaosInterceptor.applyNetworkChaos()
Metrics: chaos_intensity{type="network"}, HTTP 503 count
Network Chaos covers every endpoint in the user journey: /api/orders, /api/auth, /api/products, /api/cart, /api/checkout. The trigger probability is proportional to the intensity (at intensity = 30, 30 % of targeted requests are impacted).
| Intensity | Applied delay |
|---|---|
| 1 – 49 % | intensity × 20 ms |
| 50 – 74 % | 1 500 ms fixed |
| 75 – 99 % | 3 000 ms fixed |
| 100 % | 6 000 ms fixed |
From 75 % upward, the interceptor additionally has a 20 % chance of sending an HTTP 503 with a Retry-After: 5 header, simulating an unavailable upstream service. The max delay is capped at 6 seconds to stay compatible with the delays added by OrderService.processPaymentPublic() (+ 4 s max), within the usual 15 s client timeout.
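The network rules can be sketched as a pure decision function that takes the random draws as inputs, which keeps it testable. The names are hypothetical, and two points are assumptions: that the 503 roll only applies to requests already selected by the trigger probability, and that a 503 response is still preceded by the delay.

```java
final class NetworkChaosSketch {
    static final class Outcome {
        final boolean impacted;
        final long delayMs;
        final boolean respond503; // would carry a Retry-After: 5 header

        Outcome(boolean impacted, long delayMs, boolean respond503) {
            this.impacted = impacted;
            this.delayMs = delayMs;
            this.respond503 = respond503;
        }
    }

    // Delay table for the network lever, in milliseconds.
    static long delayMs(int intensity) {
        if (intensity <= 0) {
            return 0;
        }
        if (intensity < 50) {
            return intensity * 20L;
        }
        if (intensity < 75) {
            return 1_500;
        }
        if (intensity < 100) {
            return 3_000;
        }
        return 6_000;
    }

    // triggerRoll and unavailableRoll are uniform draws in [0, 1).
    static Outcome decide(int intensity, double triggerRoll, double unavailableRoll) {
        boolean impacted = triggerRoll < intensity / 100.0;
        if (!impacted) {
            return new Outcome(false, 0, false);
        }
        // From 75 % upward, 20 % of impacted requests get an HTTP 503.
        boolean respond503 = intensity >= 75 && unavailableRoll < 0.20;
        return new Outcome(true, delayMs(intensity), respond503);
    }
}
```

In the worst case the 6 000 ms network delay plus the 4 s payment delay adds up to 10 s, which leaves headroom under the 15 s client timeout.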
Global reset¶
POST /api/admin/chaos/reset returns all Performance intensities to zero (memory, GC, DB pool, threads, CPU, slow query, deadlock, network), immediately releases the leaked memory and the GC buffer, and additionally resets Frontend Chaos, Scripting Chaos, Business, Functional, and Security. This is the equivalent of the instructor's "Reset all chaos" button.
Default values after reset:
| Lever | Post-reset value |
|---|---|
| cpuRatio | 1 |
| memoryGuardrail | 80 % |
| All others | 0 |
Pedagogical relevance¶
Each lever is modeled on a real cause of production incident:
| Lever | Real-world cause illustrated |
|---|---|
| CPU | Heavy synchronous computation not offloaded (hashing, cryptography) |
| Memory | Cache without eviction, undetached JPA listeners, growing statics |
| GC pressure | Excessive per-request allocations, no object pooling |
| DB pool | FetchType.EAGER on large collections, transactions kept too long |
| Thread pool | Synchronous calls to a slow external service, no timeout |
| Slow query | Missing indexes discovered under load |
| Deadlock | Transactions acquiring locks in opposite orders |
| Network | Degraded downstream payment service with no circuit breaker |
Status endpoint¶
Returns the full state (backend + frontend + business + functional) without authentication. This is the reference endpoint for real-time monitoring.