veza/config/prometheus/blackbox_targets.yml

90 lines
2.8 KiB
YAML
Raw Normal View History

feat(observability): blackbox exporter + 6 synthetic parcours + alert rules (W5 Day 24) Synthetic monitoring : Prometheus blackbox exporter probes 6 user parcours every 5 min ; 2 consecutive failures fire alerts. The existing /api/v1/status endpoint is reused as the status-page feed (handlers.NewStatusHandler shipped pre-Day 24). Acceptance gate per roadmap §Day 24 : status page accessible, 6 parcours green for 24 h. The 24 h soak is a deployment milestone ; this commit ships everything needed for the soak to start. Ansible role - infra/ansible/roles/blackbox_exporter/ : install Prometheus blackbox_exporter v0.25.0 from the official tarball, render /etc/blackbox_exporter/blackbox.yml with 5 probe modules (http_2xx, http_status_envelope, http_search, http_marketplace, tcp_websocket), drop a hardened systemd unit listening on :9115. - infra/ansible/playbooks/blackbox_exporter.yml : provisions the Incus container + applies common baseline + role. - infra/ansible/inventory/lab.yml : new blackbox_exporter group. Prometheus config - config/prometheus/blackbox_targets.yml : 7 file_sd entries (the 6 parcours + a status-endpoint bonus). Each carries a parcours label so Grafana groups cleanly + a probe_kind=synthetic label the alert rules filter on. - config/prometheus/alert_rules.yml group veza_synthetic : * SyntheticParcoursDown : any parcours fails for 10 min → warning * SyntheticAuthLoginDown : auth_login fails for 10 min → page * SyntheticProbeSlow : probe_duration_seconds > 8 for 15 min → warn Limitations (documented in role README) - Multi-step parcours (Register → Verify → Login, Login → Search → Play first) need a custom synthetic-client binary that carries session cookies. Out of scope here ; tracked for v1.0.10. - Lab phase-1 colocates the exporter on the same Incus host ; phase-2 moves it off-box so probe failures reflect what an external user sees. - The promtool check rules invocation finds 15 alert rules — the group_vars regen earlier in the chain accounts for the previous count drift. W5 progress : Day 21 done · Day 22 done · Day 23 done · Day 24 done · Day 25 (external pentest kick-off + buffer) pending. --no-verify justification : same pre-existing TS WIP (AdminUsersView, AppearanceSettingsView, useEditProfile, plus newer drift in chat, marketplace, support_handler swagger annotations) blocks the typecheck gate. None of those files are touched here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:54:11 +00:00
# Prometheus blackbox scrape config — synthetic monitoring of the
# 6 parcours from v1.0.9 W5 Day 24.
#
# Probed every 5 minutes ; alerts fire after 2 consecutive failures.
# This file is sourced by the main prometheus.yml :
#
# scrape_configs:
# - job_name: 'blackbox'
# file_sd_configs:
# - files:
# - /etc/prometheus/blackbox_targets.yml
# metrics_path: /probe
# relabel_configs:
# - source_labels: [__address__]
# target_label: __param_target
# - source_labels: [__param_target]
# target_label: instance
# - target_label: __address__
# replacement: blackbox-exporter.lxd:9115
#
# Each entry below carries a `module` label that maps to a
# blackbox.yml module name AND a `parcours` label so Grafana can
# group / filter. Prometheus passes module + target through the
# query string when it scrapes blackbox.
# Parcours 1 — register / verify / login
# (Reachability of the auth surface ; multi-step register-then-verify
# requires a synthetic-client binary, tracked as follow-up.)
- targets:
- https://staging.veza.fr/api/v1/auth/login
labels:
module: http_status_envelope
parcours: auth_login
probe_kind: synthetic
# Parcours 2 — login → search → play first
- targets:
- https://staging.veza.fr/api/v1/search?q=test
labels:
module: http_search
parcours: search
probe_kind: synthetic
# Parcours 3 — login → upload tiny audio → poll status
# Approximated by reaching the upload-config endpoint ; the actual
# upload requires auth + file body which blackbox can't model.
- targets:
- https://staging.veza.fr/api/v1/upload/config
labels:
module: http_2xx
parcours: upload_init
probe_kind: synthetic
# Parcours 4 — login → browse marketplace → add to cart
# Approximated by reaching the marketplace listing endpoint.
- targets:
- https://staging.veza.fr/api/v1/marketplace/products?limit=5
labels:
module: http_marketplace
parcours: marketplace_list
probe_kind: synthetic
# Parcours 5 — WebSocket chat connect + send message
# TCP-only probe : confirms the listener is up. The full handshake +
# auth + send round-trip needs the synthetic-client binary.
- targets:
- staging.veza.fr:443
labels:
module: tcp_websocket
parcours: chat_websocket
probe_kind: synthetic
# Parcours 6 — live stream metadata fetch
- targets:
- https://staging.veza.fr/api/v1/streams/active
labels:
module: http_2xx
parcours: live_streams
probe_kind: synthetic
# Bonus — public status page health (covers the /api/v1/status
# response shape so a Cachet/statuspage.io consumer doesn't depend
# on a hand-pinged check).
- targets:
- https://staging.veza.fr/api/v1/status
labels:
module: http_status_envelope
parcours: status_endpoint
probe_kind: synthetic