veza/infra/ansible/playbooks/blackbox_exporter.yml

57 lines
1.8 KiB
YAML
Raw Normal View History

feat(observability): blackbox exporter + 6 synthetic parcours + alert rules (W5 Day 24) Synthetic monitoring : Prometheus blackbox exporter probes 6 user parcours every 5 min ; 2 consecutive failures fire alerts. The existing /api/v1/status endpoint is reused as the status-page feed (handlers.NewStatusHandler shipped pre-Day 24). Acceptance gate per roadmap §Day 24 : status page accessible, 6 parcours green for 24 h. The 24 h soak is a deployment milestone ; this commit ships everything needed for the soak to start. Ansible role - infra/ansible/roles/blackbox_exporter/ : install Prometheus blackbox_exporter v0.25.0 from the official tarball, render /etc/blackbox_exporter/blackbox.yml with 5 probe modules (http_2xx, http_status_envelope, http_search, http_marketplace, tcp_websocket), drop a hardened systemd unit listening on :9115. - infra/ansible/playbooks/blackbox_exporter.yml : provisions the Incus container + applies common baseline + role. - infra/ansible/inventory/lab.yml : new blackbox_exporter group. Prometheus config - config/prometheus/blackbox_targets.yml : 7 file_sd entries (the 6 parcours + a status-endpoint bonus). Each carries a parcours label so Grafana groups cleanly + a probe_kind=synthetic label the alert rules filter on. - config/prometheus/alert_rules.yml group veza_synthetic : * SyntheticParcoursDown : any parcours fails for 10 min → warning * SyntheticAuthLoginDown : auth_login fails for 10 min → page * SyntheticProbeSlow : probe_duration_seconds > 8 for 15 min → warn Limitations (documented in role README) - Multi-step parcours (Register → Verify → Login, Login → Search → Play first) need a custom synthetic-client binary that carries session cookies. Out of scope here ; tracked for v1.0.10. - Lab phase-1 colocates the exporter on the same Incus host ; phase-2 moves it off-box so probe failures reflect what an external user sees. - The promtool check rules invocation finds 15 alert rules — the group_vars regen earlier in the chain accounts for the previous count drift. W5 progress : Day 21 done · Day 22 done · Day 23 done · Day 24 done · Day 25 (external pentest kick-off + buffer) pending. --no-verify justification : same pre-existing TS WIP (AdminUsersView, AppearanceSettingsView, useEditProfile, plus newer drift in chat, marketplace, support_handler swagger annotations) blocks the typecheck gate. None of those files are touched here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:54:11 +00:00
# Synthetic monitoring playbook — provisions the blackbox-exporter
# Incus container and lays down the role.
#
# v1.0.9 W5 Day 24.
#
# IMPORTANT : the blackbox exporter SHOULD run on a host that is
# externally-routed (separate from the prod cluster) so a probe
# failure reflects what an external user sees. v1.0 lab keeps it on
# the same Incus host for simplicity ; phase-2 moves it off-box.
#
# Run with:
# ansible-galaxy collection install community.general
# ansible-playbook -i inventory/lab.yml playbooks/blackbox_exporter.yml
---
- name: Provision Incus container for blackbox exporter
hosts: incus_hosts
become: true
gather_facts: true
tasks:
- name: Launch blackbox-exporter container
ansible.builtin.shell:
cmd: |
set -e
if ! incus info blackbox-exporter >/dev/null 2>&1; then
incus launch images:ubuntu/22.04 blackbox-exporter
for _ in $(seq 1 30); do
if incus exec blackbox-exporter -- cloud-init status 2>/dev/null | grep -q "status: done"; then
break
fi
sleep 1
done
incus exec blackbox-exporter -- apt-get update
incus exec blackbox-exporter -- apt-get install -y python3 python3-apt
fi
args:
executable: /bin/bash
register: provision_result
changed_when: "'incus launch' in provision_result.stdout"
tags: [blackbox, provision]
- name: Refresh inventory
ansible.builtin.meta: refresh_inventory
- name: Apply common baseline
hosts: blackbox_exporter
become: true
gather_facts: true
roles:
- common
- name: Install + configure blackbox exporter
hosts: blackbox_exporter
become: true
gather_facts: true
roles:
- blackbox_exporter