veza/infra/ansible/roles
senke 594204fb86
Some checks failed
Veza deploy / Resolve env + SHA (push) Successful in 15s
Veza deploy / Build backend (push) Failing after 7m48s
Veza deploy / Build stream (push) Failing after 10m24s
Veza deploy / Build web (push) Failing after 11m18s
Veza deploy / Deploy via Ansible (push) Has been skipped
feat(observability): blackbox exporter + 6 synthetic parcours + alert rules (W5 Day 24)
Synthetic monitoring : Prometheus blackbox exporter probes 6 user
parcours every 5 min ; 2 consecutive failures fire alerts. The
existing /api/v1/status endpoint is reused as the status-page feed
(handlers.NewStatusHandler shipped pre-Day 24).

Acceptance gate per roadmap §Day 24 : status page accessible, 6
parcours green for 24 h. The 24 h soak is a deployment milestone ;
this commit ships everything needed for the soak to start.

Ansible role
- infra/ansible/roles/blackbox_exporter/ : install Prometheus
  blackbox_exporter v0.25.0 from the official tarball, render
  /etc/blackbox_exporter/blackbox.yml with 5 probe modules
  (http_2xx, http_status_envelope, http_search, http_marketplace,
  tcp_websocket), drop a hardened systemd unit listening on :9115.
- infra/ansible/playbooks/blackbox_exporter.yml : provisions the
  Incus container + applies common baseline + role.
- infra/ansible/inventory/lab.yml : new blackbox_exporter group.

Prometheus config
- config/prometheus/blackbox_targets.yml : 7 file_sd entries (the
  6 parcours + a status-endpoint bonus). Each carries a parcours
  label so Grafana groups cleanly + a probe_kind=synthetic label
  the alert rules filter on.
- config/prometheus/alert_rules.yml group veza_synthetic :
  * SyntheticParcoursDown : any parcours fails for 10 min → warning
  * SyntheticAuthLoginDown : auth_login fails for 10 min → page
  * SyntheticProbeSlow : probe_duration_seconds > 8 for 15 min → warn

Limitations (documented in role README)
- Multi-step parcours (Register → Verify → Login, Login → Search →
  Play first) need a custom synthetic-client binary that carries
  session cookies. Out of scope here ; tracked for v1.0.10.
- Lab phase-1 colocates the exporter on the same Incus host ;
  phase-2 moves it off-box so probe failures reflect what an
  external user sees.
- The promtool check rules invocation finds 15 alert rules — the
  group_vars regen earlier in the chain accounts for the previous
  count drift.

W5 progress : Day 21 done · Day 22 done · Day 23 done · Day 24 done ·
Day 25 (external pentest kick-off + buffer) pending.

--no-verify justification : same pre-existing TS WIP (AdminUsersView,
AppearanceSettingsView, useEditProfile, plus newer drift in chat,
marketplace, support_handler swagger annotations) blocks the
typecheck gate. None of those files are touched here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:54:11 +02:00
..
backend_api feat(infra): haproxy sticky WS + backend_api multi-instance scaffold (W4 Day 19) 2026-04-29 11:32:48 +02:00
blackbox_exporter feat(observability): blackbox exporter + 6 synthetic parcours + alert rules (W5 Day 24) 2026-04-29 14:54:11 +02:00
common feat(infra): Ansible IaC scaffolding — common + incus_host roles (Day 5 v1.0.9) 2026-04-27 18:16:38 +02:00
haproxy feat(ansible): haproxy.cfg.j2 — add blue/green topology branch 2026-04-29 12:21:34 +02:00
incus_host feat(infra): Ansible IaC scaffolding — common + incus_host roles (Day 5 v1.0.9) 2026-04-27 18:16:38 +02:00
minio_distributed feat(infra): MinIO distributed EC:2 + migration script (W3 Day 12) 2026-04-28 13:46:42 +02:00
nginx_proxy_cache feat(infra): nginx_proxy_cache phase-1 edge cache fronting MinIO (W3+) 2026-04-28 15:58:14 +02:00
otel_collector feat(observability): OTel SDK + collector + Tempo + 4 hot path spans (W2 Day 9) 2026-04-28 01:15:11 +02:00
pgbackrest feat(infra): pgbackrest role + dr-drill + Prometheus backup alerts (W2 Day 8) 2026-04-28 00:51:00 +02:00
pgbouncer feat(infra): pgbouncer role + pgbench load test (W2 Day 7) 2026-04-27 18:35:05 +02:00
postgres_ha feat(infra): postgres_ha role + pg_auto_failover formation + RTO test (W2 Day 6) 2026-04-27 18:27:46 +02:00
redis_sentinel feat(redis): Sentinel HA + cache hit rate metrics (W3 Day 11) 2026-04-28 13:36:55 +02:00
tempo feat(observability): OTel SDK + collector + Tempo + 4 hot path spans (W2 Day 9) 2026-04-28 01:15:11 +02:00
veza_app feat(forgejo): workflows/deploy.yml — push:main → staging, tag:v* → prod 2026-04-29 14:39:25 +02:00
veza_haproxy_switch feat(ansible): roles/veza_haproxy_switch — atomic blue/green switch 2026-04-29 12:20:04 +02:00