OpenSearch and Vector
OpenSearch is PerfShop's second log sink, in parallel with Loki. Where Loki indexes only on labels and stores content as text (lightweight model, query by filtering), OpenSearch indexes all fields in full-text and enables rich aggregations and facets. Vector plays the role of collection and transformation agent between Docker logs and OpenSearch.
Source of truth
This page is taken from vector/vector.toml, opensearch/opensearch.yml, opensearch-seed/seed.py, and the perfshop-opensearch, perfshop-opensearch-dashboards, perfshop-vector, perfshop-opensearch-seed blocks of the compose files.
Why two log sinks?
This is a legitimate question — collecting the same logs twice does not seem natural. The answer comes down to three points:
- Pedagogical demonstration: students must be able to concretely compare the "label-based index" model (Loki) and the "full-text index" model (OpenSearch / Elasticsearch). Having both in parallel makes it possible to illustrate live, on the same real logs, the strengths and limitations of each approach.
- Different use cases: Loki is unbeatable for fast filtering by container and level during a lab. OpenSearch is unbeatable for exploratory full-text search ("find all exceptions where the word connection appears, regardless of the service").
- Grafana / OpenSearch Dashboards coupling: Loki is natively integrated into Grafana; OpenSearch has its own UI (OpenSearch Dashboards, a Kibana fork). Two UIs, two paradigms: the student sees both worlds.
Pinned versions
PerfShop pins OpenSearch and OpenSearch Dashboards to 2.13.0, and Vector to 0.38.0-alpine. The three components are linked: Vector 0.38 uses a stable VRL syntax, and OpenSearch and OpenSearch Dashboards 2.13 expose the APIs used by the Python seed (_index_template on OpenSearch; saved_objects and _import on Dashboards).
Architecture
flowchart LR
SOCK["/var/run/docker.sock"]
VEC["perfshop-vector<br/>(timberio/vector:0.38.0-alpine)"]
OS["perfshop-opensearch<br/>(2.13.0)<br/>full-text indexing"]
OSD["perfshop-opensearch-dashboards<br/>(2.13.0)<br/>Kibana-compatible UI"]
SEED["perfshop-opensearch-seed<br/>(python:3.11-slim)<br/>one-shot"]
SOCK -->|"docker_logs source"| VEC
VEC -->|"VRL transform<br/>JSON parse +<br/>service_family routing"| VEC
VEC -->|"elasticsearch sink<br/>bulk.index = perfshop-{family}"| OS
OS --> OSD
SEED -.->|"index templates +<br/>index patterns +<br/>dashboard import"| OS
SEED -.->|"GET /api/status"| OSD
Vector: collection and transformation
Vector is the most technically interesting component of this stack. It works as a declarative TOML pipeline: sources → transforms → sinks.
Source: docker_logs
[sources.docker_logs]
type = "docker_logs"
docker_host = "unix:///var/run/docker.sock"
include_containers = [
"perfshop-app",
"perfshop-frontend",
"perfshop-db",
"perfshop-monitoring",
"perfshop-chaos-admin",
"perfshop-admin",
"perfshop-jmeter",
"perfshop-jmeter-ui",
"perfshop-loki",
"perfshop-promtail",
"perfshop-tempo",
"perfshop-pyroscope",
"perfshop-prometheus",
"perfshop-grafana",
"perfshop-testmgmt",
"perfshop-squash-db",
"perfshop-selenium",
"perfshop-test-runner",
"perfshop-orchestrator",
"perfshop-forgejo",
"perfshop-scripts-ui",
"perfshop-welcome",
"perfshop-docs",
]
Vector reads logs through the Docker socket (bind-mounted), exactly like Promtail. But unlike Promtail, which covers only 4 containers, Vector collects from 23: all the application, observability and QA services. The one-shot services (*-seed) are excluded because they only emit a few lines at startup.
Pedagogical games hub container
The pedagogical games hub is included in the Vector sources because it is technically an nginx container like any other. No information about its URL, its port, or its Docker service name appears in the user documentation — only the technical log collection is mentioned here.
Transform: VRL (Vector Remap Language)
[transforms.enrich]
type = "remap"
inputs = ["docker_logs"]
source = '''
.container = replace(string!(.container_name), "/", "")
if exists(.timestamp) {
."@timestamp" = .timestamp
} else if exists(.time) {
."@timestamp" = .time
} else {
."@timestamp" = now()
}
parsed, err = parse_json(.message)
if err == null {
if exists(parsed.level) { .level = string!(parsed.level) }
if exists(parsed.logger_name) { .logger = string!(parsed.logger_name) }
if exists(parsed.message) { .msg = string!(parsed.message) }
if exists(parsed.chaos_family) { .chaos_family = string!(parsed.chaos_family) }
if exists(parsed.chaos_level) { .chaos_level = string!(parsed.chaos_level) }
if exists(parsed.scenario_id) { .scenario_id = string!(parsed.scenario_id) }
} else {
.msg = string!(.message)
.level = "INFO"
}
del(.label)
del(.labels)
del(.host)
del(.source_type)
c = .container
.service_family = if c == "perfshop-app" {
"spring"
} else if c == "perfshop-frontend" || c == "perfshop-admin" || c == "perfshop-chaos-admin" || c == "perfshop-monitoring" || c == "perfshop-scripts-ui" || c == "perfshop-welcome" || c == "perfshop-docs" {
"nginx"
} else if c == "perfshop-db" || c == "perfshop-squash-db" {
"mysql"
} else if c == "perfshop-jmeter" || c == "perfshop-jmeter-ui" {
"jmeter"
} else if c == "perfshop-testmgmt" || c == "perfshop-orchestrator" || c == "perfshop-selenium" || c == "perfshop-test-runner" {
"qa"
} else if c == "perfshop-forgejo" {
"forgejo"
} else {
"observability"
}
'''
This short VRL program does five things to each event:
1. Container name cleanup
Docker adds a / prefix to container names (/perfshop-app). The replace strips this prefix to expose a clean container=perfshop-app field.
2. Timestamp mapping
OpenSearch Dashboards requires an @timestamp field for the time-based index pattern. Vector maps timestamp → @timestamp (or time → @timestamp, or now() as a last resort).
3. Conditional JSON parsing
Spring Boot with logstash-logback-encoder produces logs in JSON format:
{"@timestamp":"...","level":"ERROR","logger_name":"com.perfshop.controller.AuthController","message":"Login failed","chaos_family":"security","chaos_level":2,"scenario_id":"S6"}
Vector tries to parse the message field as JSON. If it succeeds, it extracts six specific fields (level, logger, msg, chaos_family, chaos_level, scenario_id) and promotes them as top-level fields indexed by OpenSearch. If parsing fails (raw-text nginx, MySQL, etc. logs), the raw message is placed in msg and level is forced to INFO.
This is where the added value of OpenSearch over Loki becomes visible: the chaos_family, chaos_level, and scenario_id fields are indexed as keyword, which enables aggregations such as "how many events with scenario_id=S6 in the last hour?" — impossible to do efficiently in LogQL.
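As an illustration of such an aggregation, here is a minimal sketch of a search body one might POST to /perfshop-*/_search. The function name and the extra terms aggregation on chaos_family are assumptions made for the example; only the field names come from the transform.

```python
import json


def scenario_count_query(scenario_id, window="1h"):
    """Build a search body counting events for one scenario in a time window.

    Hypothetical helper: field names follow the VRL transform, everything
    else (name, window argument, aggregation choice) is illustrative.
    """
    return {
        "size": 0,                 # no hits returned, only counts and buckets
        "track_total_hits": True,  # exact total instead of a capped estimate
        "query": {
            "bool": {
                "filter": [
                    {"term": {"scenario_id": scenario_id}},
                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                ]
            }
        },
        # Break the matching events down by chaos family (keyword field).
        "aggs": {"by_family": {"terms": {"field": "chaos_family"}}},
    }
```

Calling `json.dumps(scenario_count_query("S6"))` yields the JSON to send; the total appears under `hits.total` and the per-family buckets under `aggregations.by_family` in the response.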
4. Removal of noisy Docker fields
Docker Compose labels contain dots (com.docker.compose.project) which are incompatible with OpenSearch mappings (dots are interpreted as nesting). Vector removes them before indexing.
5. Routing by service family
Each container is mapped to a family (spring, nginx, mysql, jmeter, qa, forgejo, observability). The family becomes the suffix of the target OpenSearch index, and several containers can share one family: the seven nginx containers, for example, all write to perfshop-nginx. Any container not named in the chain falls into observability.
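For readers more comfortable with Python than VRL, the five steps can be approximated as follows. This is a sketch for reading purposes only, not code that PerfShop runs: the names `enrich`, `service_family`, `FAMILIES` and `PROMOTED` are hypothetical, and `str()` is more lenient than VRL's `string!`, which raises on non-string values.

```python
import json
from datetime import datetime, timezone

# Family lookup mirroring the if/else chain of the VRL program.
FAMILIES = {
    "spring": {"perfshop-app"},
    "nginx": {"perfshop-frontend", "perfshop-admin", "perfshop-chaos-admin",
              "perfshop-monitoring", "perfshop-scripts-ui", "perfshop-welcome",
              "perfshop-docs"},
    "mysql": {"perfshop-db", "perfshop-squash-db"},
    "jmeter": {"perfshop-jmeter", "perfshop-jmeter-ui"},
    "qa": {"perfshop-testmgmt", "perfshop-orchestrator", "perfshop-selenium",
           "perfshop-test-runner"},
    "forgejo": {"perfshop-forgejo"},
}

# JSON key in the application log -> top-level field on the event.
PROMOTED = {"level": "level", "logger_name": "logger", "message": "msg",
            "chaos_family": "chaos_family", "chaos_level": "chaos_level",
            "scenario_id": "scenario_id"}


def service_family(container):
    for family, names in FAMILIES.items():
        if container in names:
            return family
    return "observability"  # default branch of the VRL chain


def enrich(event):
    out = dict(event)
    # 1. Container name cleanup
    out["container"] = str(out["container_name"]).replace("/", "")
    # 2. Timestamp mapping
    if "timestamp" in out:
        out["@timestamp"] = out["timestamp"]
    elif "time" in out:
        out["@timestamp"] = out["time"]
    else:
        out["@timestamp"] = datetime.now(timezone.utc).isoformat()
    # 3. Conditional JSON parsing
    try:
        parsed = json.loads(out["message"])
    except (TypeError, ValueError):
        out["msg"] = str(out["message"])
        out["level"] = "INFO"
    else:
        if isinstance(parsed, dict):
            for src, dst in PROMOTED.items():
                if src in parsed:
                    out[dst] = str(parsed[src])
    # 4. Removal of noisy Docker fields
    for key in ("label", "labels", "host", "source_type"):
        out.pop(key, None)
    # 5. Routing by service family
    out["service_family"] = service_family(out["container"])
    return out
```

Feeding it `{"container_name": "/perfshop-db", "message": "plain text"}` produces an event with `level` set to `INFO` and `service_family` set to `mysql`, which is the path taken by raw-text logs.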
Sink: elasticsearch (ES compatibility)
[sinks.opensearch]
type = "elasticsearch"
inputs = ["enrich"]
endpoints = ["http://perfshop-opensearch:9200"]
mode = "bulk"
suppress_type_name = true
bulk.index = "perfshop-{{ service_family }}"
compression = "gzip"
request.retry_attempts = 10
healthcheck.enabled = true
Vector 0.38 has no dedicated opensearch sink. It uses the standard elasticsearch sink, which is compatible with the OpenSearch REST API (OpenSearch is an Elasticsearch fork).
| Parameter | Effect |
|---|---|
| mode = "bulk" | Bulk insert to reduce the number of HTTP requests |
| suppress_type_name = true | Removes the _type field (deprecated since ES 7+) |
| bulk.index = "perfshop-{{ service_family }}" | Templating: the target index is computed dynamically from the service_family field set by the transform. A Spring container goes to perfshop-spring, an nginx container goes to perfshop-nginx, etc. |
| compression = "gzip" | gzip network compression |
| request.retry_attempts = 10 | 10 retries before giving up |
| healthcheck.enabled = true | Verifies at startup that the OpenSearch endpoint responds |
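To make bulk mode and index templating concrete, the sketch below builds the kind of NDJSON body that the _bulk endpoint accepts: one action line followed by one document line per event. It assumes the `index` bulk action, and the function name `bulk_body` is hypothetical; Vector's actual serialization may differ in detail.

```python
import json


def bulk_body(events):
    """Render events as a _bulk NDJSON payload, one index per service family."""
    lines = []
    for event in events:
        # Same templating idea as bulk.index = "perfshop-{{ service_family }}".
        index = f"perfshop-{event['service_family']}"
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                         # document line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

A batch mixing Spring and nginx events therefore reaches OpenSearch in a single HTTP request while still landing in two different indices.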
OpenSearch: configuration
In the compose file, the service sets the following environment variables and ulimits:
environment:
- cluster.name=perfshop-logs
- node.name=perfshop-opensearch-node1
- discovery.type=single-node
- OPENSEARCH_JAVA_OPTS=${OPENSEARCH_JAVA_OPTS:--Xms512m -Xmx512m}
- DISABLE_SECURITY_PLUGIN=true
- DISABLE_PERFORMANCE_ANALYZER_AGENT_CLI=true
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
| Parameter | Effect |
|---|---|
| cluster.name=perfshop-logs | Cluster name (a single node) |
| discovery.type=single-node | Disables the multi-node bootstrap (otherwise OpenSearch refuses to start in single-node without explicit config) |
| OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m | 512 MB JVM heap (configurable via env) |
| DISABLE_SECURITY_PLUGIN=true | Security plugin disabled: no TLS, no auth, freely accessible from the internal Docker network. This is intentional in the pedagogical context; in real production it would have to be enabled. |
| ulimits memlock=-1 | Removes the locked-memory limit at the ulimit level (-1 means unlimited), so the container would be allowed to lock memory |
| bootstrap.memory_lock: false | Memory locking itself is disabled on the config side (opensearch.yml), so the heap is not pinned in RAM |
| nofile=65536 | High open-file cap (OpenSearch consumes many of them for Lucene segments) |
Healthcheck:
healthcheck:
test: ["CMD-SHELL", "curl -sf http://localhost:9200/_cluster/health | grep -qE '\"status\":\"(green|yellow)\"'"]
interval: 15s
timeout: 10s
retries: 12
start_period: 60s
The healthcheck waits for the cluster status to be yellow or green. On a single node, replica shards can never be assigned, so the cluster stays yellow as soon as any index asks for a replica; green is reached only when every index has zero replicas.
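The same readiness rule can be expressed in Python, which is roughly what a client has to check before talking to the cluster. This is a minimal sketch; `cluster_is_ready` is a hypothetical name, and it parses the JSON instead of grepping it.

```python
import json


def cluster_is_ready(health_body):
    """Return True when a /_cluster/health response reports green or yellow."""
    try:
        status = json.loads(health_body).get("status")
    except (TypeError, ValueError, AttributeError):
        # Empty body, invalid JSON, or a JSON value that is not an object.
        return False
    return status in ("green", "yellow")
```

A red status, an empty body, or a truncated response all count as "not ready", matching the behaviour of the curl and grep pipeline.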
OpenSearch Dashboards
environment:
- OPENSEARCH_HOSTS=["http://perfshop-opensearch:9200"]
- DISABLE_SECURITY_DASHBOARDS_PLUGIN=true
volumes:
- ./opensearch/dashboards.yml:/usr/share/opensearch-dashboards/config/opensearch_dashboards.yml
OpenSearch Dashboards is a Kibana fork — the UI is familiar to anyone who has used the ELK stack. Minimal configuration: points to the OpenSearch cluster and disables its own security plugin.
Default host port: 5601 (variable OPENSEARCH_HTTP_PORT).
The opensearch-seed/seed.py seed
The perfshop-opensearch-seed service (one-shot, restart: "no") performs three steps at first startup.
sequenceDiagram
autonumber
participant S as opensearch-seed
participant OS as perfshop-opensearch
participant OSD as perfshop-opensearch-dashboards
S->>OS: GET /_cluster/health<br/>(loop 5s × 60)
OS-->>S: {"status":"yellow"} ✓
loop for each family (7)
S->>OS: PUT /_index_template/perfshop-{family}-template<br/>{ "index_patterns":["perfshop-{family}*"],<br/> "template":{"mappings":{...}} }
OS-->>S: 200 OK
end
S->>OSD: GET /api/status<br/>(loop 5s × 90)
OSD-->>S: 200 OK
loop for each pattern (8)
S->>OSD: POST /api/saved_objects/index-pattern/{pid}<br/>{"attributes":{"title":"perfshop-...*","timeFieldName":"@timestamp"}}
OSD-->>S: 200 or 409 (already existing)
end
S->>OSD: POST /api/opensearch-dashboards/settings<br/>{"changes":{"defaultIndex":"perfshop-all"}}
OSD-->>S: 200 OK
S->>OSD: POST /api/saved_objects/_import?overwrite=true<br/>file: perfshop-all-logs.ndjson
OSD-->>S: 200 OK + successCount
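The two wait loops in the diagram (5 s × 60 for OpenSearch, 5 s × 90 for Dashboards) follow a common polling pattern. The sketch below shows one way to write it; `wait_for` and its parameters are hypothetical and are not taken from seed.py.

```python
import time


def wait_for(probe, attempts, delay=5.0, sleep=time.sleep):
    """Call probe() until it returns a truthy value; return the attempt number.

    probe is any zero-argument callable, for example one that fetches
    /_cluster/health and checks the status. Connection errors are treated
    as "not ready yet".
    """
    for attempt in range(1, attempts + 1):
        try:
            if probe():
                return attempt
        except OSError:
            pass  # service not accepting connections yet
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"service not ready after {attempts} attempts")
```

With `attempts=60` and the default delay this gives the service about five minutes to come up, and `attempts=90` about seven and a half. Injecting `sleep` keeps the function testable without real waiting.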
Step 1: 7 index templates
For each family (spring, nginx, mysql, jmeter, qa, forgejo, observability), the seed creates a template that:
- Matches the perfshop-{family}* indices
- Sets number_of_shards: 1, number_of_replicas: 0, index.refresh_interval: 5s
- Defines an explicit mapping on the fields: @timestamp (date), ts (date), container (keyword), service_family (keyword), level (keyword), logger (keyword), msg (text + sub-field raw keyword), message (text), stream (keyword), chaos_family (keyword), chaos_level (keyword), scenario_id (keyword), host (keyword)
The mapping ensures that aggregations on chaos_family, scenario_id, etc. are efficient: keyword fields are stored with doc_values, the columnar on-disk structure that aggregations read.
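A template body matching this description could be built as follows. This is a sketch reconstructed from the list above rather than a copy of seed.py; `index_template`, `template_name` and the constant names are hypothetical.

```python
FAMILIES = ["spring", "nginx", "mysql", "jmeter", "qa", "forgejo", "observability"]

KEYWORD_FIELDS = ["container", "service_family", "level", "logger", "stream",
                  "chaos_family", "chaos_level", "scenario_id", "host"]


def template_name(family):
    return f"perfshop-{family}-template"


def index_template(family):
    """Body for PUT /_index_template/perfshop-{family}-template."""
    properties = {
        "@timestamp": {"type": "date"},
        "ts": {"type": "date"},
        # Full-text search on msg, exact match and aggregations on msg.raw.
        "msg": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
        "message": {"type": "text"},
    }
    properties.update({name: {"type": "keyword"} for name in KEYWORD_FIELDS})
    return {
        "index_patterns": [f"perfshop-{family}*"],
        "template": {
            "settings": {
                "index": {
                    "number_of_shards": 1,
                    "number_of_replicas": 0,
                    "refresh_interval": "5s",
                }
            },
            "mappings": {"properties": properties},
        },
    }
```

Looping over `FAMILIES` and sending each body with a PUT request reproduces the seven calls shown in the sequence diagram.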
Step 2: 8 index patterns in Dashboards
The seed creates 8 index patterns in OpenSearch Dashboards:
| Pattern | Target |
|---|---|
| perfshop-all | perfshop-* |
| perfshop-spring | perfshop-spring* |
| perfshop-nginx | perfshop-nginx* |
| perfshop-mysql | perfshop-mysql* |
| perfshop-jmeter | perfshop-jmeter* |
| perfshop-qa | perfshop-qa* |
| perfshop-forgejo | perfshop-forgejo* |
| perfshop-observability | perfshop-observability* |
It then sets perfshop-all as the default index pattern (Discover view).
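The eight patterns follow directly from the seven families plus one catch-all, which the sketch below makes explicit. The helper names are hypothetical; the payload shape is the one shown in the sequence diagram. Note that OpenSearch Dashboards normally expects an `osd-xsrf: true` header on such POST requests.

```python
FAMILIES = ["spring", "nginx", "mysql", "jmeter", "qa", "forgejo", "observability"]


def index_patterns(families):
    """Map each saved-object id to the index pattern title it targets."""
    patterns = {"perfshop-all": "perfshop-*"}  # catch-all, used as default
    for family in families:
        patterns[f"perfshop-{family}"] = f"perfshop-{family}*"
    return patterns


def saved_object_body(title):
    """Body for POST /api/saved_objects/index-pattern/{pid}."""
    return {"attributes": {"title": title, "timeFieldName": "@timestamp"}}
```

Iterating over `index_patterns(FAMILIES).items()` gives the `{pid}` and title for each of the eight requests; a 409 response simply means the pattern already exists.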
Step 3: Import of the PerfShop — All Logs dashboard
The seed imports a pre-built NDJSON dashboard (opensearch/dashboards/perfshop-all-logs.ndjson) via the POST /api/saved_objects/_import?overwrite=true API. If the file does not exist, the step is silently skipped.
Volumes
| Volume | Mount | Content |
|---|---|---|
| opensearch-data (named volume) | /usr/share/opensearch/data | Data indexed by OpenSearch (Lucene segments, translog) |
| ./opensearch/opensearch.yml (bind mount) | /usr/share/opensearch/config/opensearch.yml | OpenSearch config (read-only) |
| ./opensearch/dashboards.yml (bind mount) | /usr/share/opensearch-dashboards/config/opensearch_dashboards.yml | OpenSearch Dashboards config |
| ./vector/vector.toml (bind mount) | /etc/vector/vector.toml | Vector pipeline (read-only) |
| /var/run/docker.sock (bind mount) | /var/run/docker.sock | Docker socket for Vector's docker_logs source |
| ./opensearch-seed/seed.py (bind mount) | /app/seed.py | Seed Python script (read-only) |
| ./opensearch/dashboards (bind mount) | /app/dashboards | Pre-built NDJSON dashboards |
Ports
| Service | Host port | Container port | Env variable |
|---|---|---|---|
| perfshop-opensearch | 9201 | 9200 | OPENSEARCH_API_PORT |
| perfshop-opensearch-dashboards | 5601 | 5601 | OPENSEARCH_HTTP_PORT |
| perfshop-vector | (none) | (internal only) | (none) |
To go further
- Overview — Loki vs OpenSearch comparison
- Loki — the other log sink (label-based index model)
- Docker Compose: details of the perfshop-opensearch* and perfshop-vector services