OpenSearch and Vector

OpenSearch is PerfShop's second log sink, running in parallel with Loki. Loki indexes only labels and stores log content as plain text (a lightweight model, queried by filtering). OpenSearch builds a full-text index over every field, which enables rich aggregations and facets. Vector acts as the collection and transformation agent between the Docker logs and OpenSearch.

Source of truth

This page is derived from vector/vector.toml, opensearch/opensearch.yml, opensearch-seed/seed.py, and the perfshop-opensearch, perfshop-opensearch-dashboards, perfshop-vector, and perfshop-opensearch-seed blocks of the compose files.

Why two log sinks?

This is a legitimate question — collecting the same logs twice does not seem natural. The answer comes down to three points:

  1. Pedagogical demonstration: students must be able to concretely compare the "label-based index" model (Loki) and the "full-text index" model (OpenSearch / Elasticsearch). Having both in parallel makes it possible to illustrate live, on the same real logs, the strengths and limitations of each approach.
  2. Different use cases: Loki is unbeatable for fast filtering by container and level during a lab. OpenSearch is unbeatable for exploratory full-text search ("find all exceptions where the word connection appears, regardless of the service").
  3. Grafana / OpenSearch Dashboards coupling: Loki is natively integrated into Grafana; OpenSearch has its own UI (OpenSearch Dashboards, a Kibana fork). Two UIs, two paradigms — the student sees both worlds.

Pinned versions

PerfShop pins OpenSearch and OpenSearch Dashboards to 2.13.0, and Vector to 0.38.0-alpine. The three components are linked: Vector 0.38 uses a stable VRL syntax, OpenSearch 2.13 supports the _index_template API, and OpenSearch Dashboards 2.13 supports the saved_objects and _import APIs, all of which the Python seed calls.

Architecture

flowchart LR
  SOCK["/var/run/docker.sock"]

  VEC["perfshop-vector<br/>(timberio/vector:0.38.0-alpine)"]

  OS["perfshop-opensearch<br/>(2.13.0)<br/>full-text indexing"]

  OSD["perfshop-opensearch-dashboards<br/>(2.13.0)<br/>Kibana-compatible UI"]

  SEED["perfshop-opensearch-seed<br/>(python:3.11-slim)<br/>one-shot"]

  SOCK -->|"docker_logs source"| VEC
  VEC -->|"VRL transform<br/>JSON parse +<br/>service_family routing"| VEC
  VEC -->|"elasticsearch sink<br/>bulk.index = perfshop-{family}"| OS

  OS --> OSD

  SEED -.->|"index templates +<br/>index patterns +<br/>dashboard import"| OS
  SEED -.->|"GET /api/status"| OSD

Vector — collection and transformation

Vector is the most technically interesting component of this stack. It works as a declarative TOML pipeline: sources → transforms → sinks.

Source — docker_logs

[sources.docker_logs]
type = "docker_logs"
docker_host = "unix:///var/run/docker.sock"
include_containers = [
  "perfshop-app",
  "perfshop-frontend",
  "perfshop-db",
  "perfshop-monitoring",
  "perfshop-chaos-admin",
  "perfshop-admin",
  "perfshop-jmeter",
  "perfshop-jmeter-ui",
  "perfshop-loki",
  "perfshop-promtail",
  "perfshop-tempo",
  "perfshop-pyroscope",
  "perfshop-prometheus",
  "perfshop-grafana",
  "perfshop-testmgmt",
  "perfshop-squash-db",
  "perfshop-selenium",
  "perfshop-test-runner",
  "perfshop-orchestrator",
  "perfshop-forgejo",
  "perfshop-scripts-ui",
  "perfshop-welcome",
  "perfshop-docs",
]

Vector reads logs via the Docker socket (mounted as a bind mount), exactly like Promtail. But unlike Promtail, which only covers 4 containers, Vector collects 23 containers: all the application, observability and QA services. The one-shot services (*-seed) are excluded because they only emit a few lines at startup.

Pedagogical games hub container

The pedagogical games hub is included in the Vector sources because it is technically an nginx container like any other. No information about its URL, its port, or its Docker service name appears in the user documentation — only the technical log collection is mentioned here.

Transform — VRL (Vector Remap Language)

[transforms.enrich]
type = "remap"
inputs = ["docker_logs"]
source = '''
.container = replace(string!(.container_name), "/", "")

if exists(.timestamp) {
  ."@timestamp" = .timestamp
} else if exists(.time) {
  ."@timestamp" = .time
} else {
  ."@timestamp" = now()
}

parsed, err = parse_json(.message)
if err == null {
  if exists(parsed.level)        { .level        = string!(parsed.level) }
  if exists(parsed.logger_name)  { .logger       = string!(parsed.logger_name) }
  if exists(parsed.message)      { .msg          = string!(parsed.message) }
  if exists(parsed.chaos_family) { .chaos_family = string!(parsed.chaos_family) }
  if exists(parsed.chaos_level)  { .chaos_level  = string!(parsed.chaos_level) }
  if exists(parsed.scenario_id)  { .scenario_id  = string!(parsed.scenario_id) }
} else {
  .msg   = string!(.message)
  .level = "INFO"
}

del(.label)
del(.labels)
del(.host)
del(.source_type)

c = .container

.service_family = if c == "perfshop-app" {
  "spring"
} else if c == "perfshop-frontend" || c == "perfshop-admin" || c == "perfshop-chaos-admin" || c == "perfshop-monitoring" || c == "perfshop-scripts-ui" || c == "perfshop-welcome" || c == "perfshop-docs" {
  "nginx"
} else if c == "perfshop-db" || c == "perfshop-squash-db" {
  "mysql"
} else if c == "perfshop-jmeter" || c == "perfshop-jmeter-ui" {
  "jmeter"
} else if c == "perfshop-testmgmt" || c == "perfshop-orchestrator" || c == "perfshop-selenium" || c == "perfshop-test-runner" {
  "qa"
} else if c == "perfshop-forgejo" {
  "forgejo"
} else {
  "observability"
}
'''

This short VRL program does five things to each event:

1. Container name cleanup

Docker adds a / prefix to container names (/perfshop-app). The replace strips this prefix to expose a clean container=perfshop-app field.

2. Timestamp mapping

OpenSearch Dashboards requires an @timestamp field for the time-based index pattern. Vector maps timestamp → @timestamp (or time → @timestamp, falling back to now() as a last resort).
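
Steps 1 and 2 can be sketched in Python for readers who are more at ease with it than with VRL. This is an illustration on a plain dict, not the code Vector runs; the function name normalize is hypothetical, and str() stands in for VRL's string! assertion.

```python
from datetime import datetime, timezone


def normalize(event: dict) -> dict:
    """Mimic steps 1 and 2 of the VRL program on a plain dict (illustrative only)."""
    out = dict(event)
    # Step 1: like VRL replace(), this removes every "/" from the container name.
    out["container"] = str(event["container_name"]).replace("/", "")
    # Step 2: prefer "timestamp", then "time", then the current time.
    if "timestamp" in event:
        out["@timestamp"] = event["timestamp"]
    elif "time" in event:
        out["@timestamp"] = event["time"]
    else:
        out["@timestamp"] = datetime.now(timezone.utc).isoformat()
    return out
```

Calling normalize({"container_name": "/perfshop-app", "time": "2024-01-01T00:00:00Z"}) yields an event whose container field is perfshop-app and whose @timestamp is copied from time.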

3. Conditional JSON parsing

Spring Boot with logstash-logback-encoder produces logs in JSON format:

{"@timestamp":"...","level":"ERROR","logger_name":"com.perfshop.controller.AuthController","message":"Login failed","chaos_family":"security","chaos_level":2,"scenario_id":"S6"}

Vector tries to parse the message field as JSON. If it succeeds, it extracts six specific fields (level, logger, msg, chaos_family, chaos_level, scenario_id) and promotes them to top-level fields indexed by OpenSearch. If parsing fails (plain-text logs from nginx, MySQL, etc.), the raw message is placed in msg and level is forced to INFO.
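
The promotion logic can be approximated in Python as follows. This is a sketch under stated assumptions: promote_fields and PROMOTED are hypothetical names, and values are coerced with str(), which is a simplification because VRL's string! raises an error on a non-string value instead of converting it.

```python
import json

# Source key in the parsed JSON -> top-level field name on the event.
PROMOTED = {
    "level": "level",
    "logger_name": "logger",
    "message": "msg",
    "chaos_family": "chaos_family",
    "chaos_level": "chaos_level",
    "scenario_id": "scenario_id",
}


def promote_fields(event: dict) -> dict:
    """Approximate step 3: promote known JSON keys, or fall back to raw text."""
    out = dict(event)
    try:
        parsed = json.loads(event["message"])
    except (TypeError, ValueError):
        # Not JSON (nginx, MySQL, ...): keep the raw line and force the level.
        out["msg"] = str(event["message"])
        out["level"] = "INFO"
        return out
    if isinstance(parsed, dict):
        for source_key, target_key in PROMOTED.items():
            if source_key in parsed:
                out[target_key] = str(parsed[source_key])
    return out
```

As in the VRL program, the original message field is left on the event; only the promoted copies are added.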

This is where the added value of OpenSearch over Loki becomes visible: the chaos_family, chaos_level, and scenario_id fields are indexed as keyword, which enables aggregations such as "how many events with scenario_id=S6 in the last hour?" — impossible to do efficiently in LogQL.
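
To make the aggregation example concrete, here is one way the "scenario_id=S6 in the last hour" question could be phrased as a search body. It is a sketch: the helper name scenario_count_query and the by_family aggregation are illustrative choices, and the body would typically be sent to the _search endpoint of the perfshop-* indices.

```python
import json


def scenario_count_query(scenario_id: str, window: str = "1h") -> dict:
    """Build a search body counting events for one scenario in a time window."""
    return {
        "size": 0,  # no documents returned, only totals and aggregations
        "track_total_hits": True,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"scenario_id": scenario_id}},
                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                ]
            }
        },
        # Break the matching events down by chaos family (keyword field).
        "aggs": {"by_family": {"terms": {"field": "chaos_family"}}},
    }


if __name__ == "__main__":
    print(json.dumps(scenario_count_query("S6"), indent=2))
```

The term filter works as an exact match because scenario_id is mapped as keyword rather than analyzed text.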

4. Removal of noisy Docker fields

del(.label)
del(.labels)
del(.host)
del(.source_type)

Docker Compose labels contain dots (com.docker.compose.project) which are incompatible with OpenSearch mappings (dots are interpreted as nesting). Vector removes them before indexing.
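
The nesting problem can be reproduced without OpenSearch. The sketch below expands dotted keys into nested objects the way a dotted field name is interpreted, and fails when the same path is needed both as a value and as an object. The helper name expand_dotted is hypothetical, and the second label name in the usage example is an illustrative assumption.

```python
def expand_dotted(fields: dict) -> dict:
    """Expand dotted keys into nested dicts, raising on a value/object clash."""
    root: dict = {}
    for key, value in fields.items():
        node = root
        parts = key.split(".")
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{key!r}: {part!r} already holds a plain value")
            node = child
        leaf = parts[-1]
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{key!r} is already an object")
        node[leaf] = value
    return root


if __name__ == "__main__":
    labels = {
        "com.docker.compose.project": "perfshop",
        "com.docker.compose.project.working_dir": "/srv/perfshop",
    }
    try:
        expand_dotted(labels)
    except ValueError as exc:
        print("mapping conflict:", exc)
```

Here project would have to be a string and an object at the same time, which is the kind of conflict that deleting the label fields avoids.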

5. Routing by service family

Each container is mapped to a family (spring, nginx, mysql, jmeter, qa, forgejo, observability). The family becomes the suffix of the target OpenSearch index. A family may group several containers: the seven nginx-based containers, for example, all land in perfshop-nginx.
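
The routing table is easier to scan as a lookup than as an if/else chain. The following Python sketch mirrors the mapping of the VRL transform; the names FAMILY_BY_CONTAINER, service_family and target_index are hypothetical.

```python
# Mirrors the if/else chain of the VRL transform. Any container not listed
# here falls through to "observability", like the final else branch.
FAMILY_BY_CONTAINER = {
    "perfshop-app": "spring",
    "perfshop-frontend": "nginx",
    "perfshop-admin": "nginx",
    "perfshop-chaos-admin": "nginx",
    "perfshop-monitoring": "nginx",
    "perfshop-scripts-ui": "nginx",
    "perfshop-welcome": "nginx",
    "perfshop-docs": "nginx",
    "perfshop-db": "mysql",
    "perfshop-squash-db": "mysql",
    "perfshop-jmeter": "jmeter",
    "perfshop-jmeter-ui": "jmeter",
    "perfshop-testmgmt": "qa",
    "perfshop-orchestrator": "qa",
    "perfshop-selenium": "qa",
    "perfshop-test-runner": "qa",
    "perfshop-forgejo": "forgejo",
}


def service_family(container: str) -> str:
    """Return the family of a container, defaulting to observability."""
    return FAMILY_BY_CONTAINER.get(container, "observability")


def target_index(container: str) -> str:
    """Return the index the sink template resolves to for this container."""
    return f"perfshop-{service_family(container)}"
```

For example, target_index("perfshop-squash-db") resolves to perfshop-mysql, while perfshop-loki and perfshop-grafana both resolve to perfshop-observability.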

Sink — elasticsearch (ES compatibility)

[sinks.opensearch]
type = "elasticsearch"
inputs = ["enrich"]
endpoints = ["http://perfshop-opensearch:9200"]
mode = "bulk"
suppress_type_name = true

bulk.index = "perfshop-{{ service_family }}"

compression = "gzip"
request.retry_attempts = 10
healthcheck.enabled = true

Vector does not ship a separate native opensearch sink. It uses the standard elasticsearch sink, which is compatible with the OpenSearch REST API because OpenSearch is a fork of Elasticsearch.

| Parameter | Effect |
| --- | --- |
| mode = "bulk" | Bulk insert to reduce the number of HTTP requests |
| suppress_type_name = true | Omits the _type field from requests (mapping types are deprecated since Elasticsearch 7) |
| bulk.index = "perfshop-{{ service_family }}" | Templating: the target index is computed dynamically from the service_family field set by the transform. A Spring container goes to perfshop-spring, an nginx container goes to perfshop-nginx, etc. |
| compression = "gzip" | gzip network compression |
| request.retry_attempts = 10 | 10 retries before giving up |
| healthcheck.enabled = true | Verifies at startup that the OpenSearch endpoint responds |
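
To show what bulk mode and the index template amount to on the wire, this Python sketch builds a request body in the newline-delimited format of the _bulk API. It is an approximation of what the sink sends, not Vector's implementation; bulk_body is a hypothetical helper.

```python
import gzip
import json


def bulk_body(events: list[dict]) -> bytes:
    """Build a _bulk payload: one action line, then one document line, per event."""
    lines = []
    for event in events:
        # Same idea as bulk.index = "perfshop-{{ service_family }}".
        action = {"index": {"_index": f"perfshop-{event['service_family']}"}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    # The _bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


if __name__ == "__main__":
    body = bulk_body([
        {"service_family": "spring", "msg": "Login failed"},
        {"service_family": "nginx", "msg": "GET / 200"},
    ])
    compressed = gzip.compress(body)  # what compression = "gzip" implies
    print(len(body), "bytes raw,", len(compressed), "bytes gzipped")
```

A single request can therefore carry events for several indices, since the target index is stated per action line.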

OpenSearch — configuration

# opensearch.yml
network.host: 0.0.0.0
plugins.security.disabled: true
bootstrap.memory_lock: false

And on the environment variables side in compose:

environment:
  - cluster.name=perfshop-logs
  - node.name=perfshop-opensearch-node1
  - discovery.type=single-node
  - OPENSEARCH_JAVA_OPTS=${OPENSEARCH_JAVA_OPTS:--Xms512m -Xmx512m}
  - DISABLE_SECURITY_PLUGIN=true
  - DISABLE_PERFORMANCE_ANALYZER_AGENT_CLI=true
ulimits:
  memlock:
    soft: -1
    hard: -1
  nofile:
    soft: 65536
    hard: 65536

| Parameter | Effect |
| --- | --- |
| cluster.name=perfshop-logs | Cluster name (a single node) |
| discovery.type=single-node | Single-node mode: the node elects itself and skips multi-node cluster bootstrapping (without it, OpenSearch refuses to start unless discovery is configured explicitly) |
| OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m | 512 MB JVM heap (configurable via env) |
| DISABLE_SECURITY_PLUGIN=true | Security plugin disabled: no TLS, no auth, freely accessible from the internal Docker network. This is intentional in the pedagogical context; in real production it would have to be enabled. |
| ulimits memlock=-1 | Removes the locked-memory limit (unlimited), so the container would be allowed to lock memory |
| bootstrap.memory_lock: false | OpenSearch does not try to lock its heap in RAM, so the unlimited memlock is not actually used |
| nofile=65536 | High open-file cap (OpenSearch consumes many file descriptors for Lucene segments) |

Healthcheck:

healthcheck:
  test: ["CMD-SHELL", "curl -sf http://localhost:9200/_cluster/health | grep -qE '\"status\":\"(green|yellow)\"'"]
  interval: 15s
  timeout: 10s
  retries: 12
  start_period: 60s

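The healthcheck waits for the cluster status to be yellow or green. Yellow is accepted because a single node cannot host replica shards: any index created with one or more replicas keeps them unassigned, which holds the cluster at yellow. Indices created from the PerfShop templates use number_of_replicas: 0 and do not cause this.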
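
The same readiness test can be written in Python, parsing the JSON instead of matching it with grep. This is a sketch: cluster_is_ready and wait_until_ready are hypothetical names, and the fetch callable is an assumption that lets the caller decide how the /_cluster/health body is retrieved.

```python
import json
import time
from typing import Callable


def cluster_is_ready(health_body: str) -> bool:
    """Return True when a /_cluster/health response reports green or yellow."""
    try:
        status = json.loads(health_body).get("status")
    except (TypeError, ValueError, AttributeError):
        return False
    return status in ("green", "yellow")


def wait_until_ready(fetch: Callable[[], str], attempts: int = 60, delay: float = 5.0) -> bool:
    """Poll until the cluster is ready or the attempts are exhausted."""
    for _ in range(attempts):
        try:
            if cluster_is_ready(fetch()):
                return True
        except OSError:
            pass  # endpoint not reachable yet, try again
        time.sleep(delay)
    return False
```

With urllib, fetch could be a small function that opens http://localhost:9200/_cluster/health and returns the decoded body; connection failures raise a subclass of OSError and are simply retried.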

OpenSearch Dashboards

environment:
  - OPENSEARCH_HOSTS=["http://perfshop-opensearch:9200"]
  - DISABLE_SECURITY_DASHBOARDS_PLUGIN=true
volumes:
  - ./opensearch/dashboards.yml:/usr/share/opensearch-dashboards/config/opensearch_dashboards.yml

OpenSearch Dashboards is a Kibana fork, so the UI is familiar to anyone who has used the ELK stack. The configuration is minimal: it points to the OpenSearch cluster and disables the Dashboards security plugin.

Default host port: 5601 (variable OPENSEARCH_HTTP_PORT).

The opensearch-seed/seed.py seed

The perfshop-opensearch-seed service (one-shot, restart: "no") performs three steps at first startup.

sequenceDiagram
  autonumber
  participant S as opensearch-seed
  participant OS as perfshop-opensearch
  participant OSD as perfshop-opensearch-dashboards

  S->>OS: GET /_cluster/health<br/>(loop 5s × 60)
  OS-->>S: {"status":"yellow"} ✓

  loop for each family (7)
    S->>OS: PUT /_index_template/perfshop-{family}-template<br/>{ "index_patterns":["perfshop-{family}*"],<br/>  "template":{"mappings":{...}} }
    OS-->>S: 200 OK
  end

  S->>OSD: GET /api/status<br/>(loop 5s × 90)
  OSD-->>S: 200 OK

  loop for each pattern (8)
    S->>OSD: POST /api/saved_objects/index-pattern/{pid}<br/>{"attributes":{"title":"perfshop-...*","timeFieldName":"@timestamp"}}
    OSD-->>S: 200 or 409 (already existing)
  end

  S->>OSD: POST /api/opensearch-dashboards/settings<br/>{"changes":{"defaultIndex":"perfshop-all"}}
  OSD-->>S: 200 OK

  S->>OSD: POST /api/saved_objects/_import?overwrite=true<br/>file: perfshop-all-logs.ndjson
  OSD-->>S: 200 OK + successCount

Step 1 — 7 index templates

For each family (spring, nginx, mysql, jmeter, qa, forgejo, observability), the seed creates a template that:

  • Matches the perfshop-{family}* indices
  • Sets number_of_shards: 1, number_of_replicas: 0, index.refresh_interval: 5s
  • Defines an explicit mapping on the fields: @timestamp (date), ts (date), container (keyword), service_family (keyword), level (keyword), logger (keyword), msg (text + sub-field raw keyword), message (text), stream (keyword), chaos_family (keyword), chaos_level (keyword), scenario_id (keyword), host (keyword)

The explicit mapping keeps aggregations on chaos_family, scenario_id, etc. efficient: keyword fields are stored with doc_values, the columnar on-disk structure that aggregations and sorting read.
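
As an illustration of Step 1, the template body for one family could be assembled like this. It is a sketch based only on the settings and fields listed above, not a copy of seed.py; the names FAMILIES, KEYWORD_FIELDS, index_template and template_path are hypothetical.

```python
FAMILIES = ["spring", "nginx", "mysql", "jmeter", "qa", "forgejo", "observability"]

KEYWORD_FIELDS = [
    "container", "service_family", "level", "logger", "stream",
    "chaos_family", "chaos_level", "scenario_id", "host",
]


def template_path(family: str) -> str:
    """Path of the PUT request that creates the template for one family."""
    return f"/_index_template/perfshop-{family}-template"


def index_template(family: str) -> dict:
    """Build the index template body for one service family."""
    properties = {name: {"type": "keyword"} for name in KEYWORD_FIELDS}
    properties["@timestamp"] = {"type": "date"}
    properties["ts"] = {"type": "date"}
    # Full-text field with an exact-match sub-field named "raw".
    properties["msg"] = {"type": "text", "fields": {"raw": {"type": "keyword"}}}
    properties["message"] = {"type": "text"}
    return {
        "index_patterns": [f"perfshop-{family}*"],
        "template": {
            "settings": {
                "index": {
                    "number_of_shards": 1,
                    "number_of_replicas": 0,
                    "refresh_interval": "5s",
                }
            },
            "mappings": {"properties": properties},
        },
    }
```

Looping over FAMILIES and sending each body to template_path(family) corresponds to the seven PUT calls of the sequence diagram.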

Step 2 — 8 index patterns in Dashboards

The seed creates 8 index patterns in OpenSearch Dashboards:

| Pattern | Target |
| --- | --- |
| perfshop-all | perfshop-* |
| perfshop-spring | perfshop-spring* |
| perfshop-nginx | perfshop-nginx* |
| perfshop-mysql | perfshop-mysql* |
| perfshop-jmeter | perfshop-jmeter* |
| perfshop-qa | perfshop-qa* |
| perfshop-forgejo | perfshop-forgejo* |
| perfshop-observability | perfshop-observability* |

And sets perfshop-all as the default index pattern (Discover view).
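
A possible shape for the Step 2 calls, using only the standard library, is sketched below. It is not the content of seed.py: the base URL (service name plus container port 5601), the function names and the error handling are assumptions. The osd-xsrf header is included because OpenSearch Dashboards rejects write requests that lack it.

```python
import json
import urllib.error
import urllib.request

# Assumed base URL: Dashboards service name and container port on the Docker network.
DASHBOARDS_URL = "http://perfshop-opensearch-dashboards:5601"

PATTERNS = {
    "perfshop-all": "perfshop-*",
    "perfshop-spring": "perfshop-spring*",
    "perfshop-nginx": "perfshop-nginx*",
    "perfshop-mysql": "perfshop-mysql*",
    "perfshop-jmeter": "perfshop-jmeter*",
    "perfshop-qa": "perfshop-qa*",
    "perfshop-forgejo": "perfshop-forgejo*",
    "perfshop-observability": "perfshop-observability*",
}


def index_pattern_request(pid: str, title: str, base_url: str = DASHBOARDS_URL) -> urllib.request.Request:
    """Build the POST request that creates one index pattern."""
    body = {"attributes": {"title": title, "timeFieldName": "@timestamp"}}
    return urllib.request.Request(
        f"{base_url}/api/saved_objects/index-pattern/{pid}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "osd-xsrf": "true"},
    )


def create_pattern(request: urllib.request.Request) -> bool:
    """Send the request; True if created, False if it already existed (409)."""
    try:
        with urllib.request.urlopen(request, timeout=10):
            return True
    except urllib.error.HTTPError as exc:
        if exc.code == 409:
            return False
        raise
```

Treating 409 as a normal outcome is what makes the seed safe to run again on an already initialized stack.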

Step 3 — Import of the PerfShop — All Logs dashboard

ndjson_path = "/app/dashboards/perfshop-all-logs.ndjson"

The seed imports a pre-built NDJSON dashboard (opensearch/dashboards/perfshop-all-logs.ndjson) via the POST /api/saved_objects/_import?overwrite=true API. If the file does not exist, the step is silently skipped.
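
The import endpoint expects the NDJSON file as a multipart form upload in a field named file. The sketch below prepares such a payload with the standard library and returns None when the file is missing, matching the skip behaviour described above. The helper name import_payload is hypothetical, and this is one way to build the request, not necessarily how seed.py does it.

```python
import os
import uuid


def import_payload(ndjson_path: str):
    """Return (content_type, body) for the import call, or None if the file is absent."""
    if not os.path.isfile(ndjson_path):
        return None  # nothing to import: the step is skipped
    with open(ndjson_path, "rb") as handle:
        content = handle.read()
    boundary = uuid.uuid4().hex
    filename = os.path.basename(ndjson_path)
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/ndjson\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", head + content + tail
```

The returned content type goes into the Content-Type header of the POST to /api/saved_objects/_import?overwrite=true, alongside the osd-xsrf header that Dashboards requires on write requests.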

Volumes

| Volume | Mount | Content |
| --- | --- | --- |
| opensearch-data (named volume) | /usr/share/opensearch/data | Data indexed by OpenSearch (Lucene segments, translog) |
| ./opensearch/opensearch.yml (bind mount) | /usr/share/opensearch/config/opensearch.yml | OpenSearch config (read-only) |
| ./opensearch/dashboards.yml (bind mount) | /usr/share/opensearch-dashboards/config/opensearch_dashboards.yml | OpenSearch Dashboards config |
| ./vector/vector.toml (bind mount) | /etc/vector/vector.toml | Vector pipeline (read-only) |
| /var/run/docker.sock (bind mount) | /var/run/docker.sock | Docker socket for Vector's docker_logs source |
| ./opensearch-seed/seed.py (bind mount) | /app/seed.py | Seed Python script (read-only) |
| ./opensearch/dashboards (bind mount) | /app/dashboards | Pre-built NDJSON dashboards |

Ports

| Service | Host port | Container port | Env variable |
| --- | --- | --- | --- |
| perfshop-opensearch | 9201 | 9200 | OPENSEARCH_API_PORT |
| perfshop-opensearch-dashboards | 5601 | 5601 | OPENSEARCH_HTTP_PORT |
| perfshop-vector | (none) | (internal only) | |

To go further

  • Overview — Loki vs OpenSearch comparison
  • Loki — the other log sink (label-based index model)
  • Docker Compose — details of the perfshop-opensearch* and perfshop-vector services