
Performance Chaos

Performance Chaos groups eight backend infrastructure levers exposed by ChaosService and executed by five technical components: ChaosInterceptor (thread pool, slow query, network), CpuChaosScheduler (CPU), MemoryLeakSimulator (memory leak), GcPressureSimulator (GC pressure), and DbPoolChaosScheduler (HikariCP saturation). OrderService additionally carries the deadlock injection specific to checkout.

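Each lever is an independent 0 – 100 slider driven by POST /api/admin/chaos/<lever>. Two levers take a second parameter on the same endpoint: CPU adds a 1 – 5 ratio, and memory uses two coupled sliders, one of which also accepts the special value 105.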

Lever overview

| Lever | Range | Technical class | Admin endpoint |
| --- | --- | --- | --- |
| CPU | 0 – 100 | CpuChaosScheduler | POST /api/admin/chaos/cpu |
| CPU ratio | 1 – 5 | CpuChaosScheduler | POST /api/admin/chaos/cpu |
| Memory leak | 0 – 105 | MemoryLeakSimulator | POST /api/admin/chaos/memory |
| Memory guardrail | 0 – 100 | MemoryLeakSimulator | POST /api/admin/chaos/memory |
| GC pressure | 0 – 100 | GcPressureSimulator | POST /api/admin/chaos/gc-pressure |
| DB pool | 0 – 100 | DbPoolChaosScheduler | POST /api/admin/chaos/db-pool |
| Thread pool | 0 – 100 | ChaosInterceptor | POST /api/admin/chaos/thread-pool |
| Slow query | 0 – 100 | ChaosInterceptor | POST /api/admin/chaos/slow-queries |
| Deadlock | 0 – 100 | OrderService | POST /api/admin/chaos/deadlock |
| Network timeout | 0 – 100 | ChaosInterceptor | POST /api/admin/chaos/network |

All admin endpoints listed above require the X-Admin-Token header or a valid admin session. The response body follows the { success, message, status } schema, where status is the full state returned by ChaosService.getStatus().

CPU — CpuChaosScheduler

Class: CpuChaosScheduler.java Metrics: chaos_intensity{type="cpu"}, container_cpu_usage

The scheduler runs on @Scheduled(fixedRate = 100), i.e. every 100 ms, and submits a SHA-256 hashing loop to a dedicated thread pool (Executors.newFixedThreadPool(5)). The loop is calibrated to load the CPU in a controlled way, using the following formula:

iterations = intensity × 3 200
parallel threads = ratio (1 – 5)

Reference calibration: on an Intel i7-8700T, intensity = 100 with ratio = 1 produces ≈ 100 % load on a single core. The ratio parameter multiplies the number of parallel threads submitted to the pool — it lets you adapt the chaos to more powerful machines (Ryzen 5800X and above).
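The arithmetic above can be sketched as follows. This is an illustrative sketch only, not the actual CpuChaosScheduler code: the class name CpuBurnSketch, its method names, the seed string, and the clamping of out-of-range values are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of the CPU burn arithmetic; not the real CpuChaosScheduler. */
final class CpuBurnSketch {
    // Constant taken from the documented formula: iterations = intensity x 3 200.
    static final int ITERATIONS_PER_INTENSITY_POINT = 3_200;

    /** SHA-256 rounds performed by one task (clamping to 0-100 is an assumption). */
    static int iterationsFor(int intensity) {
        int clamped = Math.max(0, Math.min(100, intensity));
        return clamped * ITERATIONS_PER_INTENSITY_POINT;
    }

    /** Parallel tasks submitted per tick, clamped to the documented 1-5 range. */
    static int parallelTasksFor(int ratio) {
        return Math.max(1, Math.min(5, ratio));
    }

    /** Chains SHA-256 digests; the result is returned so the loop cannot be optimized away. */
    static byte[] burn(int iterations) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] data = "perfshop-chaos".getBytes(StandardCharsets.UTF_8);
            for (int i = 0; i < iterations; i++) {
                data = sha256.digest(data);
            }
            return data;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

With the activation payload shown below (intensity = 80, ratio = 2), this sketch would run two parallel tasks of 256 000 hashing rounds each.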

Activation

curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/cpu \
  -H "X-Admin-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"intensity": 80, "ratio": 2}'

Observation

  • Main metric: container_cpu_usage (Gauge 0.0 – 1.0)
  • Side effect: http_server_requests_seconds{quantile="0.99"} rises proportionally
  • Logs: [BackendChaos] CPU intensity set to: 80 on every change

Memory — MemoryLeakSimulator

Class: MemoryLeakSimulator.java (version v3-bidirectional) Metrics: chaos_intensity{type="memory"}, chaos_guardrail{type="memory"}, jvm_memory_used_bytes{area="heap"}

The memory simulator uses two coupled sliders to allow a progressive, bounded memory leak:

| Slider | Range | Meaning |
| --- | --- | --- |
| memoryLeakTarget | 0 – 100 | Percentage of the guardrail capacity to fill |
| memoryLeakTarget | 105 | Special value: intentional OOM with no cap |
| memoryGuardrail | 0 – 100 | Safety cap as % of max heap (-Xmx), default 80 |

Effective formula

effective_target_pct = target × guardrail / 100
effective_target_bytes = -Xmx × effective_target_pct / 100

The cap applies to the used heap (totalMemory - freeMemory), not only to memory allocated by the simulator. As long as the used heap is below the target, the simulator allocates 5 % of -Xmx per second. As soon as it exceeds the target, it releases 2 blocks per tick and triggers System.gc().
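The formula and the allocate/release decision can be sketched as below. This is a hedged illustration, not the MemoryLeakSimulator source: the class MemoryTargetSketch, its method names, and the HOLD state for the exact-equality case are assumptions, and the special value 105 is deliberately left out.

```java
/** Hypothetical sketch of the memory target arithmetic; not the real MemoryLeakSimulator. */
final class MemoryTargetSketch {
    enum Action { ALLOCATE, RELEASE, HOLD }

    /** effective_target_pct = target x guardrail / 100 */
    static double effectiveTargetPct(int target, int guardrail) {
        return target * guardrail / 100.0;
    }

    /** effective_target_bytes = -Xmx x effective_target_pct / 100 (truncated to whole bytes). */
    static long effectiveTargetBytes(long maxHeapBytes, int target, int guardrail) {
        return (long) (maxHeapBytes * effectiveTargetPct(target, guardrail) / 100.0);
    }

    /**
     * Decision taken on each tick: allocate while the used heap is below the target,
     * release once it exceeds it. HOLD on exact equality is an assumption of this sketch.
     */
    static Action nextAction(long usedHeapBytes, long targetBytes) {
        if (usedHeapBytes < targetBytes) return Action.ALLOCATE;
        if (usedHeapBytes > targetBytes) return Action.RELEASE;
        return Action.HOLD;
    }
}
```

In a running JVM, the used heap would be read as Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory() and the maximum heap as Runtime.getRuntime().maxMemory().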

Intentional OOM mode

Setting target = 105 together with guardrail = 100 disables the guardrail: the simulator allocates until it triggers an OutOfMemoryError. This is the only configuration that can crash the JVM; all other combinations are bounded by the guardrail.

Examples on -Xmx 1g

| target | guardrail | Target heap | Behavior |
| --- | --- | --- | --- |
| 0 | 80 | 0 MB | Disabled: releases all memory |
| 50 | 80 | 410 MB | Plateau at ~40 % of heap |
| 100 | 80 | 820 MB | Plateau at 80 %, guardrail active |
| 100 | 50 | 512 MB | Plateau at 50 % |
| 105 | 80 | 820 MB | OOM mode capped by 80 % guardrail |
| 105 | 100 | no cap | Intentional OOM, JVM crash |

Bidirectional behavior

The slider is reactive in both directions. When the instructor lowers memoryLeakTarget from 100 to 50 %, the simulator progressively releases the allocated blocks (2 per second) until it drops below the new target. At 0, all leaked memory is released at once and System.gc() is called.

Activation

curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/memory \
  -H "X-Admin-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"intensity": 100, "guardrail": 80}'

The intensity field maps to memoryLeakTarget, guardrail to memoryGuardrail. The latter is optional — if omitted, the current value is preserved (default 80).

GC Pressure — GcPressureSimulator

Class: GcPressureSimulator.java (version v2-sawtooth-cycle) Metric: chaos_intensity{type="gc_pressure"}

Unlike MemoryLeakSimulator which produces a plateau (leaked memory is not released as long as the slider stays high), the GC pressure simulator cyclically allocates then releases, producing a characteristic sawtooth pattern on the Grafana heap graph.

FILL → RELEASE → PAUSE cycle

gantt
    title GC pressure cycle (intensity = 50%, -Xmx 1g)
    dateFormat  s
    axisFormat  %Ss
    section Phase
    FILL +75 MB        :a1, 0, 1s
    FILL +75 MB        :a2, 1, 1s
    FILL +75 MB        :a3, 2, 1s
    FILL +75 MB        :a4, 3, 1s
    RELEASE all + GC   :crit, a5, 4, 1s
    PAUSE              :a6, 5, 1s
    PAUSE              :a7, 6, 1s

| Phase | Duration | Action |
| --- | --- | --- |
| FILL | 4 ticks | Allocate one block per tick (20 % × intensity / 100) |
| RELEASE | 1 tick | clear() the buffer + System.gc() |
| PAUSE | 2 ticks | No action, let the GC finish |

The full cycle lasts 7 seconds regardless of -Xmx. At intensity = 50 on -Xmx 1g, each cycle allocates ≈ 300 MB in 4 seconds then releases abruptly. At intensity = 100, the teeth climb until they saturate the heap — the GC becomes extremely aggressive.
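The 7-tick cycle can be modeled as a small state function of the tick counter. The sketch below is illustrative and not taken from GcPressureSimulator: the class GcCycleSketch, the Phase enum, and the convention that tick 0 starts a FILL phase are assumptions. Block sizing is left out on purpose.

```java
/** Hypothetical sketch of the FILL / RELEASE / PAUSE cycle; not the real GcPressureSimulator. */
final class GcCycleSketch {
    enum Phase { FILL, RELEASE, PAUSE }

    // Durations from the phase table: 4 + 1 + 2 ticks, one tick per second.
    static final int FILL_TICKS = 4;
    static final int RELEASE_TICKS = 1;
    static final int PAUSE_TICKS = 2;
    static final int CYCLE_TICKS = FILL_TICKS + RELEASE_TICKS + PAUSE_TICKS;

    /** Phase active at a given tick, assuming tick 0 is the first FILL tick. */
    static Phase phaseAt(long tick) {
        long position = Math.floorMod(tick, (long) CYCLE_TICKS);
        if (position < FILL_TICKS) return Phase.FILL;
        if (position < FILL_TICKS + RELEASE_TICKS) return Phase.RELEASE;
        return Phase.PAUSE;
    }
}
```

Because the phase depends only on the tick counter modulo 7, the cycle length stays constant whatever the heap size or intensity, which matches the fixed 7-second period described above.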

Activation

curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/gc-pressure \
  -H "X-Admin-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"intensity": 50}'

Observation

  • Main metric: jvm_memory_used_bytes{area="heap"} — sawtooth pattern
  • GC metrics: jvm_gc_pause_seconds_count rises, jvm_gc_pause_seconds_sum as well; jvm_gc_pause_seconds_max can reach several hundred ms
  • Side effect: parasitic CPU spikes visible in container_cpu_usage
  • Logs: [BackendChaos] GcPressure: FILL +75 MB — heap=... on every tick

DB Pool — DbPoolChaosScheduler

Class: DbPoolChaosScheduler.java Metrics: chaos_intensity{type="db_pool"}, hikaricp_connections_active, hikaricp_connections_pending

The scheduler steals a proportional number of HikariCP connections by keeping them open with setAutoCommit(false) (uncommitted transaction). The pool's maximum size is read from spring.datasource.hikari.maximum-pool-size (default 20).

Formula

target_blocked = floor(intensity / 100 × (hikari_max_pool − 1))

At least 1 connection remains free so that the scheduler itself does not deadlock. At intensity = 100 with a pool of 20, the scheduler blocks 19 connections: all application traffic then contends for the single remaining connection, and requests that cannot obtain it within the HikariCP connection timeout (default 30 s) fail.
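The formula can be checked with a few lines of integer arithmetic. This is a minimal sketch, assuming a hypothetical DbPoolTargetSketch class. The clamping of the intensity and the guard against pool sizes below 1 are additions for the example, not documented behavior.

```java
/** Hypothetical sketch of the blocked-connection formula; not the real DbPoolChaosScheduler. */
final class DbPoolTargetSketch {
    /**
     * target_blocked = floor(intensity / 100 x (maxPoolSize - 1)).
     * Integer division is a floor here because both operands are non-negative.
     */
    static int targetBlocked(int intensity, int maxPoolSize) {
        int clamped = Math.max(0, Math.min(100, intensity));
        int stealable = Math.max(0, maxPoolSize - 1); // always leave one connection free
        return clamped * stealable / 100;
    }
}
```

With the default pool of 20, intensity 50 gives 9 blocked connections (9.5 rounded down) and intensity 100 gives 19.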

Tomcat thread pool — ChaosInterceptor

Class: ChaosInterceptor.applyThreadPoolChaos() Metrics: chaos_intensity{type="thread_pool"}, tomcat_threads_busy_threads

For every non-excluded HTTP request, the interceptor applies a Thread.sleep proportional to the intensity — calibrated to progressively saturate the Tomcat pool (default 200 threads). The formula:

| Intensity | Applied delay |
| --- | --- |
| 1 – 24 % | intensity × 20 ms |
| 25 – 49 % | intensity × 30 ms |
| 50 – 74 % | 1 500 ms fixed |
| 75 – 99 % | 3 000 ms fixed |
| 100 % | 5 000 ms fixed |

At 25 % the delay is 750 ms, at 49 % it reaches 1 470 ms, and at 50 % it settles on the 1 500 ms plateau. The formula is deliberately proportional at the lower steps and fixed beyond 50 %, so that the degradation visible in Grafana is gradual rather than abrupt. The exact shape is documented in the ChaosInterceptor source code.
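The delay table translates directly into a step function. The following is a sketch under assumptions, not the body of ChaosInterceptor.applyThreadPoolChaos(): the class ThreadPoolDelaySketch and the choice to return 0 for intensity 0 or below are illustrative.

```java
/** Hypothetical sketch of the thread pool delay table; not the real ChaosInterceptor. */
final class ThreadPoolDelaySketch {
    /** Delay in milliseconds applied to each non-excluded request. */
    static long delayMillis(int intensity) {
        if (intensity <= 0) return 0;                // lever disabled (assumption)
        if (intensity < 25) return intensity * 20L;  // 1-24 %: proportional
        if (intensity < 50) return intensity * 30L;  // 25-49 %: steeper slope
        if (intensity < 75) return 1_500;            // 50-74 %: fixed plateau
        if (intensity < 100) return 3_000;           // 75-99 %: fixed plateau
        return 5_000;                                // 100 %
    }
}
```

In the interceptor, the returned value would be passed to Thread.sleep, which holds the Tomcat worker thread for that duration.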

Slow Query — ChaosInterceptor

Class: ChaosInterceptor.applySlowQueryChaos() Metrics: chaos_intensity{type="slow_query"}, http_server_requests_seconds{quantile="0.99"}

Identical in principle to the thread pool (similar formula) but applied only to endpoints outside /api/products — the latter is handled by ProductService.applySlowQueryChaos() to avoid double-counting.

| Intensity | Applied delay |
| --- | --- |
| 1 – 24 % | intensity × 15 ms |
| 25 – 49 % | intensity × 25 ms |
| 50 – 74 % | 2 000 ms fixed |
| 75 – 99 % | 4 000 ms fixed |
| 100 % | 6 000 ms fixed |
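The routing rule and the delay table can be combined in one sketch. It is hypothetical and not the ChaosInterceptor.applySlowQueryChaos() source: the class SlowQueryDelaySketch and the prefix match on "/api/products" used to model the exclusion are assumptions.

```java
/** Hypothetical sketch of the slow query routing and delay; not the real ChaosInterceptor. */
final class SlowQueryDelaySketch {
    /** True when the interceptor, rather than ProductService, applies the delay (prefix match assumed). */
    static boolean handledByInterceptor(String path) {
        return !path.startsWith("/api/products");
    }

    /** Delay in milliseconds from the slow query table. */
    static long delayMillis(int intensity) {
        if (intensity <= 0) return 0;                // lever disabled (assumption)
        if (intensity < 25) return intensity * 15L;  // 1-24 %
        if (intensity < 50) return intensity * 25L;  // 25-49 %
        if (intensity < 75) return 2_000;            // 50-74 %
        if (intensity < 100) return 4_000;           // 75-99 %
        return 6_000;                                // 100 %
    }
}
```

Keeping the path check separate from the delay computation makes it explicit that each request is delayed by exactly one component, which is the double-counting concern mentioned above.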

Deadlock — OrderService

Metric: chaos_intensity{type="deadlock"} Endpoint: POST /api/admin/chaos/deadlock

The deadlock injection is carried by OrderService at checkout time. It simulates two concurrent transactions acquiring locks in reverse order (classic anti-pattern: SELECT … FOR UPDATE on two products in opposite orders). At high intensity, the transaction is aborted by MySQL with the error Deadlock found when trying to get lock and the client receives an HTTP 500 with the i18n message order.error.deadlock.

Since deadlocks are inherently probabilistic, the slider level tunes the trigger probability per checkout, not a fixed duration.

Network timeout — ChaosInterceptor

Class: ChaosInterceptor.applyNetworkChaos() Metrics: chaos_intensity{type="network"}, HTTP 503 count

Network Chaos covers every endpoint in the user journey: /api/orders, /api/auth, /api/products, /api/cart, /api/checkout. The trigger probability is proportional to the intensity (at intensity = 30, 30 % of targeted requests are impacted).

| Intensity | Applied delay |
| --- | --- |
| 1 – 49 % | intensity × 20 ms |
| 50 – 74 % | 1 500 ms fixed |
| 75 – 99 % | 3 000 ms fixed |
| 100 % | 6 000 ms fixed |

From 75 % upward, the interceptor additionally has a 20 % chance of sending an HTTP 503 with a Retry-After: 5 header, simulating an unavailable upstream service. The max delay is capped at 6 seconds to stay compatible with the delays added by OrderService.processPaymentPublic() (+ 4 s max), within the usual 15 s client timeout.
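The three rules (trigger probability, delay table, 503 chance) can be sketched as pure functions. This is an illustration under assumptions, not the ChaosInterceptor.applyNetworkChaos() source: the class NetworkChaosSketch and its method names are hypothetical, and the random draws are passed in as parameters so that the logic stays deterministic.

```java
/** Hypothetical sketch of the network chaos rules; not the real ChaosInterceptor. */
final class NetworkChaosSketch {
    /** A request is impacted when a uniform draw in [0, 1) falls below intensity / 100. */
    static boolean isTriggered(int intensity, double roll) {
        return roll < intensity / 100.0;
    }

    /** Delay in milliseconds from the network table. */
    static long delayMillis(int intensity) {
        if (intensity <= 0) return 0;                // lever disabled (assumption)
        if (intensity < 50) return intensity * 20L;  // 1-49 %
        if (intensity < 75) return 1_500;            // 50-74 %
        if (intensity < 100) return 3_000;           // 75-99 %
        return 6_000;                                // 100 %, documented cap
    }

    /** From 75 % upward, a second uniform draw below 0.20 yields an HTTP 503. */
    static boolean returns503(int intensity, double roll) {
        return intensity >= 75 && roll < 0.20;
    }

    /** Worst case on checkout: 6 s network cap plus the 4 s maximum added by payment. */
    static long worstCaseCheckoutMillis() {
        return delayMillis(100) + 4_000;
    }
}
```

In production code the draws would typically come from ThreadLocalRandom.current().nextDouble(). The worst case of 10 s leaves a 5 s margin under the 15 s client timeout.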

Global reset

curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/reset \
  -H "X-Admin-Token: $TOKEN"

POST /api/admin/chaos/reset returns all Performance intensities to zero (memory, GC, DB pool, threads, CPU, slow query, deadlock, network), immediately releases the leaked memory and the GC buffer, and additionally resets Frontend Chaos, Scripting Chaos, Business, Functional, and Security. This is the equivalent of the instructor's "Reset all chaos" button.

Default values after reset:

| Lever | Post-reset value |
| --- | --- |
| cpuRatio | 1 |
| memoryGuardrail | 80 % |
| All others | 0 |

Pedagogical relevance

Each lever is modeled on a real cause of production incidents:

| Lever | Real-world cause illustrated |
| --- | --- |
| CPU | Heavy synchronous computation not offloaded (hashing, cryptography) |
| Memory | Cache without eviction, undetached JPA listeners, growing statics |
| GC pressure | Excessive per-request allocations, no object pooling |
| DB pool | FetchType.EAGER on large collections, transactions kept too long |
| Thread pool | Synchronous calls to a slow external service, no timeout |
| Slow query | Missing indexes discovered under load |
| Deadlock | Transactions acquiring locks in opposite orders |
| Network | Degraded downstream payment service with no circuit breaker |

Status endpoint

curl https://perfshop-api.perfshop.io/api/chaos/public/status

Returns the full state (backend + frontend + business + functional) without authentication. This is the reference endpoint for real-time monitoring.