Performance Chaos¶

Performance Chaos groups eight backend infrastructure levers exposed by ChaosService and executed by five technical components: ChaosInterceptor (thread pool, slow query, network), CpuChaosScheduler (CPU), MemoryLeakSimulator (memory leak), GcPressureSimulator (GC pressure), and DbPoolChaosScheduler (HikariCP saturation). OrderService additionally carries the deadlock injection specific to checkout.

Each lever is an independent 0 – 100 slider driven by POST /api/admin/chaos/<lever>. Two levers take a second parameter: CPU adds a thread ratio (1 – 5), and memory uses two coupled sliders (target and guardrail).

Lever overview¶

Lever	Range	Technical class	Admin endpoint
CPU	0 – 100	`CpuChaosScheduler`	`POST /api/admin/chaos/cpu`
CPU ratio	1 – 5	`CpuChaosScheduler`	`POST /api/admin/chaos/cpu`
Memory leak	0 – 100, or 105 for OOM mode	`MemoryLeakSimulator`	`POST /api/admin/chaos/memory`
Memory guardrail	0 – 100	`MemoryLeakSimulator`	`POST /api/admin/chaos/memory`
GC pressure	0 – 100	`GcPressureSimulator`	`POST /api/admin/chaos/gc-pressure`
DB pool	0 – 100	`DbPoolChaosScheduler`	`POST /api/admin/chaos/db-pool`
Thread pool	0 – 100	`ChaosInterceptor`	`POST /api/admin/chaos/thread-pool`
Slow query	0 – 100	`ChaosInterceptor`	`POST /api/admin/chaos/slow-queries`
Deadlock	0 – 100	`OrderService`	`POST /api/admin/chaos/deadlock`
Network timeout	0 – 100	`ChaosInterceptor`	`POST /api/admin/chaos/network`

All endpoints require the X-Admin-Token header or a valid admin session. The returned value follows the { success, message, status } schema where status is the full state from ChaosService.getStatus().
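For callers that script the levers from Java rather than curl, the request shape can be sketched with the JDK's java.net.http API. The class and method names (ChaosRequestSketch, build) are hypothetical; the base URL is the one used in this page's curl examples, and the header names come from the text above. The sketch only builds the request: sending it and parsing the { success, message, status } response is left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: builds (but does not send) an authenticated admin chaos request. */
class ChaosRequestSketch {

    // Base URL copied from the curl examples in this page.
    static final String BASE = "https://perfshop-api.perfshop.io/api/admin/chaos/";

    /**
     * @param lever    path segment such as "cpu" or "db-pool"
     * @param token    value sent in the X-Admin-Token header
     * @param jsonBody request payload, e.g. {"intensity": 80, "ratio": 2}
     */
    static HttpRequest build(String lever, String token, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(BASE + lever))
                .header("X-Admin-Token", token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

The resulting HttpRequest can be passed to HttpClient.send(...) with a string body handler when an actual call is wanted.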

CPU — `CpuChaosScheduler`¶

Class: CpuChaosScheduler.java
Metrics: chaos_intensity{type="cpu"}, container_cpu_usage

The scheduler runs on @Scheduled(fixedRate = 100), that is, every 100 ms. On each tick it submits a SHA-256 hashing loop to a dedicated thread pool (Executors.newFixedThreadPool(5)), sized to saturate the CPU in a controlled way. The formula:

iterations = intensity × 3 200
parallel threads = ratio (1 – 5)

Reference calibration: on an Intel i7-8700T, intensity = 100 with ratio = 1 produces ≈ 100 % load on a single core. The ratio parameter multiplies the number of parallel threads submitted to the pool — it lets you adapt the chaos to more powerful machines (Ryzen 5800X and above).
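As an illustration of the formula above, here is a minimal sketch of the iteration count and of a SHA-256 loop. It is not the CpuChaosScheduler source: the class name, the method names, the seed string, and the clamping of out-of-range inputs are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of the CPU lever arithmetic; not the actual scheduler. */
class CpuBurnSketch {

    // From the formula: iterations = intensity × 3 200.
    static final int ITERATIONS_PER_POINT = 3_200;

    /** Iterations per tick. Clamping to 0..100 is an assumption. */
    static int iterationsFor(int intensity) {
        int clamped = Math.max(0, Math.min(100, intensity));
        return clamped * ITERATIONS_PER_POINT;
    }

    /** Parallel tasks per tick. Clamping to 1..5 is an assumption. */
    static int threadsFor(int ratio) {
        return Math.max(1, Math.min(5, ratio));
    }

    /** Hashes a seed repeatedly on the calling thread and returns the last value. */
    static byte[] burn(int iterations) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] data = "chaos".getBytes(StandardCharsets.UTF_8); // arbitrary seed
            for (int i = 0; i < iterations; i++) {
                data = sha256.digest(data); // digest() also resets the instance for reuse
            }
            return data;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

With intensity = 80 and ratio = 2, this arithmetic gives 256 000 iterations on each of 2 parallel tasks per tick.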

Activation¶

curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/cpu \
  -H "X-Admin-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"intensity": 80, "ratio": 2}'

Observation¶

Main metric: container_cpu_usage (Gauge 0.0 – 1.0)
Side effect: http_server_requests_seconds{quantile="0.99"} rises proportionally
Logs: [BackendChaos] CPU intensity set to: 80 on every change

Memory — `MemoryLeakSimulator`¶

Class: MemoryLeakSimulator.java (version v3-bidirectional)
Metrics: chaos_intensity{type="memory"}, chaos_guardrail{type="memory"}, jvm_memory_used_bytes{area="heap"}

The memory simulator uses two coupled sliders to allow a progressive, bounded memory leak:

Slider	Range	Meaning
`memoryLeakTarget`	0 – 100	Percentage of the guardrail capacity to fill
`memoryLeakTarget`	105	Special value — intentional OOM with no cap
`memoryGuardrail`	0 – 100	Safety cap as % of max heap (`-Xmx`), default 80

Effective formula¶

effective_target_pct = target × guardrail / 100
effective_target_bytes = -Xmx × effective_target_pct / 100

The cap applies to the used heap (totalMemory - freeMemory), not only to the memory allocated by the simulator. The simulator ticks once per second: while the used heap is below the target, it allocates 5 % of -Xmx; once the used heap exceeds the target, it releases 2 blocks per tick and triggers System.gc().

Intentional OOM mode¶

The combination target = 105 with guardrail = 100 disables the guardrail and allocates until an OutOfMemoryError is thrown. This is the only configuration that can crash the JVM; all other combinations are bounded by the guardrail.

Examples on `-Xmx 1g`¶

`target`	`guardrail`	Target heap	Behavior
0	80	0 MB	Disabled — releases all memory
50	80	410 MB	Plateau at ~40 % of heap
100	80	820 MB	Plateau at 80 % — guardrail active
100	50	512 MB	Plateau at 50 %
105	80	820 MB	OOM mode capped by 80 % guardrail
105	100	∞	Intentional OOM — JVM crash
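The rows of this table can be reproduced with a few lines of arithmetic. The sketch below is an assumption-laden illustration, not the MemoryLeakSimulator code: the class and method names are hypothetical, and treating target = 105 as 100 whenever the guardrail is below 100 is inferred from the "105 / 80" row rather than confirmed by the source.

```java
import java.util.OptionalLong;

/** Hypothetical sketch of the effective-target formula. */
class MemoryTargetSketch {

    static final int OOM_TARGET = 105;

    /**
     * Returns the used-heap target in bytes, or an empty value when the
     * configuration is uncapped (target = 105 with guardrail = 100).
     */
    static OptionalLong effectiveTargetBytes(int target, int guardrail, long maxHeapBytes) {
        if (target == OOM_TARGET && guardrail == 100) {
            return OptionalLong.empty(); // intentional OOM: no cap at all
        }
        // Assumption: under a guardrail below 100, 105 behaves like 100.
        int cappedTarget = Math.min(target, 100);
        // effective_target_pct = target × guardrail / 100, then applied to -Xmx.
        return OptionalLong.of(maxHeapBytes * cappedTarget * guardrail / 10_000);
    }
}
```

For -Xmx 1g (1 073 741 824 bytes), target = 50 and guardrail = 80 give about 409.6 MiB, which the table rounds to 410 MB.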

Bidirectional behavior¶

The slider is reactive in both directions. When the instructor lowers memoryLeakTarget from 100 to 50 %, the simulator progressively releases the allocated blocks (2 per second) until it drops below the new target. At 0, all leaked memory is released at once and System.gc() is called.
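The allocate-or-release decision described above can be sketched as a single tick function. This is a simplified model under stated assumptions: the class name, the caller-supplied usedHeapBytes and targetBytes, and the block size are hypothetical, a used heap exactly equal to the target is treated as "release", and the System.gc() call made by the real simulator is omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of one simulator tick; System.gc() calls are left out. */
class LeakTickSketch {

    final Deque<byte[]> leaked = new ArrayDeque<>();
    final int blockBytes;

    LeakTickSketch(int blockBytes) {
        this.blockBytes = blockBytes;
    }

    /** Applies one tick given the current used heap and the effective target. */
    void tick(long usedHeapBytes, long targetBytes) {
        if (targetBytes == 0) {
            leaked.clear(); // target 0: release everything at once
        } else if (usedHeapBytes < targetBytes) {
            leaked.push(new byte[blockBytes]); // below target: leak one more block
        } else {
            // At or above target: release up to 2 blocks per tick.
            for (int i = 0; i < 2 && !leaked.isEmpty(); i++) {
                leaked.pop();
            }
        }
    }
}
```

Because the decision depends on the used heap rather than on the simulator's own allocations, lowering the target makes the same function start releasing without any extra state.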

Activation¶

curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/memory \
  -H "X-Admin-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"intensity": 100, "guardrail": 80}'

The intensity field maps to memoryLeakTarget, guardrail to memoryGuardrail. The latter is optional — if omitted, the current value is preserved (default 80).

GC Pressure — `GcPressureSimulator`¶

Class: GcPressureSimulator.java (version v2-sawtooth-cycle)
Metric: chaos_intensity{type="gc_pressure"}

Unlike MemoryLeakSimulator which produces a plateau (leaked memory is not released as long as the slider stays high), the GC pressure simulator cyclically allocates then releases, producing a characteristic sawtooth pattern on the Grafana heap graph.

FILL → RELEASE → PAUSE cycle¶

gantt
    title GC pressure cycle (intensity = 50%, -Xmx 1g)
    dateFormat  s
    axisFormat  %Ss
    section Phase
    FILL +75 MB        :a1, 0, 1s
    FILL +75 MB        :a2, 1, 1s
    FILL +75 MB        :a3, 2, 1s
    FILL +75 MB        :a4, 3, 1s
    RELEASE all + GC   :crit, a5, 4, 1s
    PAUSE              :a6, 5, 1s
    PAUSE              :a7, 6, 1s

Phase	Duration	Action
FILL	4 ticks	Allocate one block per tick (20 % × intensity / 100)
RELEASE	1 tick	`clear()` the buffer + `System.gc()`
PAUSE	2 ticks	No action — let the GC finish

The full cycle lasts 7 seconds regardless of -Xmx. At intensity = 50 on -Xmx 1g, each cycle allocates ≈ 300 MB in 4 seconds then releases abruptly. At intensity = 100, the teeth climb until they saturate the heap — the GC becomes extremely aggressive.
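The 4 + 1 + 2 tick cycle can be modeled as a pure function of the tick counter. The sketch below only illustrates the phase sequence; the class, enum, and method names are hypothetical, and it assumes ticks are numbered from 0 at the start of a FILL phase.

```java
/** Hypothetical model of the FILL → RELEASE → PAUSE sequence. */
class GcCycleSketch {

    enum Phase { FILL, RELEASE, PAUSE }

    static final int FILL_TICKS = 4;
    static final int RELEASE_TICKS = 1;
    static final int PAUSE_TICKS = 2;
    static final int CYCLE_TICKS = FILL_TICKS + RELEASE_TICKS + PAUSE_TICKS; // 7 ticks

    /** Returns the phase for a given tick number (tick 0 starts a FILL). */
    static Phase phaseAt(long tick) {
        long pos = Math.floorMod(tick, (long) CYCLE_TICKS);
        if (pos < FILL_TICKS) {
            return Phase.FILL;
        }
        if (pos < FILL_TICKS + RELEASE_TICKS) {
            return Phase.RELEASE;
        }
        return Phase.PAUSE;
    }
}
```

Ticks 0 to 3 fill, tick 4 releases, ticks 5 and 6 pause, and tick 7 starts the next cycle.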

Activation¶

curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/gc-pressure \
  -H "X-Admin-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"intensity": 50}'

Observation¶

Main metric: jvm_memory_used_bytes{area="heap"} — sawtooth pattern
GC metrics: jvm_gc_pause_seconds_count rises, jvm_gc_pause_seconds_sum as well; jvm_gc_pause_seconds_max can reach several hundred ms
Side effect: parasitic CPU spikes visible in container_cpu_usage
Logs: [BackendChaos] GcPressure: FILL +75 MB — heap=... on every tick

DB Pool — `DbPoolChaosScheduler`¶

Class: DbPoolChaosScheduler.java
Metrics: chaos_intensity{type="db_pool"}, hikaricp_connections_active, hikaricp_connections_pending

The scheduler holds a number of HikariCP connections proportional to the intensity, keeping each one open with setAutoCommit(false) (an uncommitted transaction). The pool's maximum size is read from spring.datasource.hikari.maximum-pool-size (default 20).

Formula¶

target_blocked = floor(intensity / 100 × (hikari_max_pool − 1))

At least 1 connection remains free so that the scheduler itself does not deadlock. At intensity = 100 with a pool of 20, the scheduler blocks 19 connections: all new requests then queue for the single remaining connection, and those that cannot obtain it in time fail at the HikariCP connection timeout (default 30 s).
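The formula above reduces to integer arithmetic. The following sketch shows it with hypothetical class and method names; clamping the intensity to 0 – 100 is an assumption, not something stated for DbPoolChaosScheduler.

```java
/** Hypothetical sketch of the DB pool formula. */
class DbPoolTargetSketch {

    /** target_blocked = floor(intensity / 100 × (hikari_max_pool − 1)). */
    static int targetBlocked(int intensity, int maxPoolSize) {
        int clamped = Math.max(0, Math.min(100, intensity)); // assumption: clamp input
        int stealable = Math.max(0, maxPoolSize - 1);        // always leave 1 connection free
        return clamped * stealable / 100; // integer division floors non-negative values
    }
}
```

With the default pool of 20, intensity = 50 blocks 9 connections (9.5 rounded down) and intensity = 100 blocks 19.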

Tomcat thread pool — `ChaosInterceptor`¶

Class: ChaosInterceptor.applyThreadPoolChaos()
Metrics: chaos_intensity{type="thread_pool"}, tomcat_threads_busy_threads

For every non-excluded HTTP request, the interceptor applies a Thread.sleep proportional to the intensity — calibrated to progressively saturate the Tomcat pool (default 200 threads). The formula:

Intensity	Applied delay
1 – 24 %	`intensity × 20 ms`
25 – 49 %	`intensity × 30 ms`
50 – 74 %	1 500 ms fixed
75 – 99 %	3 000 ms fixed
100 %	5 000 ms fixed

Within each of the two lower bands the delay grows linearly with the intensity: 480 ms at 24 %, 750 ms at 25 %, 1 470 ms at 49 %. From 50 % upward the delay is a fixed plateau per band (1 500 ms, 3 000 ms, 5 000 ms). The shape is deliberate: progressive at the lower steps and fixed-plateau beyond, to produce a gradual degradation visible in Grafana. The exact shape is documented in the ChaosInterceptor source code.
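The delay table above can be written as a small piecewise function. This is a sketch with hypothetical names, not the body of applyThreadPoolChaos(); returning 0 for an intensity of 0 or below is an assumption.

```java
/** Hypothetical sketch of the thread pool delay table. */
class ThreadPoolDelaySketch {

    /** Returns the sleep duration in milliseconds for a given intensity. */
    static long delayMillis(int intensity) {
        if (intensity <= 0) return 0;               // assumption: lever off
        if (intensity < 25) return intensity * 20L; // 1 – 24 %
        if (intensity < 50) return intensity * 30L; // 25 – 49 %
        if (intensity < 75) return 1_500;           // 50 – 74 %
        if (intensity < 100) return 3_000;          // 75 – 99 %
        return 5_000;                               // 100 %
    }
}
```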

Slow Query — `ChaosInterceptor`¶

Class: ChaosInterceptor.applySlowQueryChaos()
Metrics: chaos_intensity{type="slow_query"}, http_server_requests_seconds{quantile="0.99"}

Identical in principle to the thread pool (similar formula) but applied only to endpoints outside /api/products — the latter is handled by ProductService.applySlowQueryChaos() to avoid double-counting.

Intensity	Applied delay
1 – 24 %	`intensity × 15 ms`
25 – 49 %	`intensity × 25 ms`
50 – 74 %	2 000 ms fixed
75 – 99 %	4 000 ms fixed
100 %	6 000 ms fixed

Deadlock — `OrderService`¶

Metric: chaos_intensity{type="deadlock"}
Endpoint: POST /api/admin/chaos/deadlock

The deadlock injection is carried by OrderService at checkout time. It simulates two concurrent transactions acquiring locks in reverse order (classic anti-pattern: SELECT … FOR UPDATE on two products in opposite orders). At high intensity, the transaction is aborted by MySQL with the error Deadlock found when trying to get lock and the client receives an HTTP 500 with the i18n message order.error.deadlock.

Since deadlocks are inherently probabilistic, the slider level tunes the trigger probability per checkout, not a fixed duration.
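A per-checkout trigger of this kind can be sketched as a single random draw. The mapping below (probability = intensity / 100) is an assumption: the text only says that the slider tunes the probability, not that the relation is linear. The class and method names are hypothetical and do not come from OrderService.

```java
import java.util.Random;

/** Hypothetical sketch of a probabilistic per-checkout trigger. */
class DeadlockTriggerSketch {

    /**
     * Returns true when the deadlock scenario should be injected.
     * Assumes a linear mapping: intensity 30 means roughly 30 % of checkouts.
     */
    static boolean shouldInject(int intensity, Random random) {
        // nextInt(100) is uniform over 0..99, so 0 never fires and 100 always fires.
        return random.nextInt(100) < intensity;
    }
}
```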

Network timeout — `ChaosInterceptor`¶

Class: ChaosInterceptor.applyNetworkChaos()
Metrics: chaos_intensity{type="network"}, HTTP 503 count

Network Chaos covers every endpoint in the user journey: /api/orders, /api/auth, /api/products, /api/cart, /api/checkout. The trigger probability is proportional to the intensity (at intensity = 30, 30 % of targeted requests are impacted).

Intensity	Applied delay
1 – 49 %	`intensity × 20 ms`
50 – 74 %	1 500 ms fixed
75 – 99 %	3 000 ms fixed
100 %	6 000 ms fixed

From 75 % upward, the interceptor additionally has a 20 % chance of sending an HTTP 503 with a Retry-After: 5 header, simulating an unavailable upstream service. The max delay is capped at 6 seconds so that, combined with the delays added by OrderService.processPaymentPublic() (+ 4 s max), the worst case of 10 s stays within the usual 15 s client timeout.
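The three rules above (probabilistic trigger, tiered delay, optional 503) can be combined in one decision function. The sketch is illustrative only: the names are hypothetical, the random draws are passed in as parameters to keep the function deterministic, and it assumes the 503 is sent after the delay rather than instead of it, which the text does not specify.

```java
/** Hypothetical sketch of the network chaos decision for one request. */
class NetworkChaosSketch {

    /** Result of the decision for a single targeted request. */
    static final class Outcome {
        final boolean impacted;
        final long delayMillis;
        final boolean respond503;

        Outcome(boolean impacted, long delayMillis, boolean respond503) {
            this.impacted = impacted;
            this.delayMillis = delayMillis;
            this.respond503 = respond503;
        }
    }

    /** Delay table: linear below 50 %, fixed plateaus above. */
    static long delayMillis(int intensity) {
        if (intensity <= 0) return 0;
        if (intensity < 50) return intensity * 20L;
        if (intensity < 75) return 1_500;
        if (intensity < 100) return 3_000;
        return 6_000;
    }

    /**
     * @param triggerDraw uniform draw in [0, 100): is this request impacted?
     * @param errorDraw   uniform draw in [0, 100): is a 503 sent (75 % and up)?
     */
    static Outcome decide(int intensity, int triggerDraw, int errorDraw) {
        if (triggerDraw >= intensity) {
            return new Outcome(false, 0, false); // request passes through untouched
        }
        boolean respond503 = intensity >= 75 && errorDraw < 20; // 20 % chance
        return new Outcome(true, delayMillis(intensity), respond503);
    }
}
```

In a real interceptor the two draws would come from a random source such as ThreadLocalRandom; passing them in makes the tiers easy to check in isolation.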

Global reset¶

curl -X POST https://perfshop-api.perfshop.io/api/admin/chaos/reset \
  -H "X-Admin-Token: $TOKEN"

POST /api/admin/chaos/reset returns all Performance intensities to zero (memory, GC, DB pool, threads, CPU, slow query, deadlock, network), immediately releases the leaked memory and the GC buffer, and additionally resets Frontend Chaos, Scripting Chaos, Business, Functional, and Security. This is the equivalent of the instructor's "Reset all chaos" button.

Default values after reset:

Lever	Post-reset value
`cpuRatio`	1
`memoryGuardrail`	80 %
All others	0

Pedagogical relevance¶

Each lever is modeled on a real cause of production incident:

Lever	Real-world cause illustrated
CPU	Heavy synchronous computation not offloaded (hashing, cryptography)
Memory	Cache without eviction, undetached JPA listeners, growing statics
GC pressure	Excessive per-request allocations, no object pooling
DB pool	`FetchType.EAGER` on large collections, transactions kept too long
Thread pool	Synchronous calls to a slow external service, no timeout
Slow query	Missing indexes discovered under load
Deadlock	Transactions acquiring locks in opposite orders
Network	Degraded downstream payment service with no circuit breaker

Status endpoint¶

curl https://perfshop-api.perfshop.io/api/chaos/public/status

Returns the full state (backend + frontend + business + functional) without authentication. This is the reference endpoint for real-time monitoring.
