# `blackbox_exporter` role — synthetic monitoring runner
A single Incus container running the Prometheus `blackbox_exporter`. Prometheus scrapes it every 5 minutes, and each scrape probes one of the 6 user parcours (user journeys) from v1.0.9 W5 Day 24. Alerts fire after 2 consecutive failures (`for: 10m` ÷ 5-minute scrape interval = 2 cycles).
## Topology
```
              Prometheus :9090
                      │ scrape every 5m
                      ▼
         ┌─────────────────────────────┐
         │ blackbox-exporter.lxd:9115  │
         │ (this role)                 │
         └────────────┬────────────────┘
                      │ probes (HTTP / TCP)
┌─────────────────────┼─────────────────────┐
▼                     ▼                     ▼
staging.veza.fr/api/v1/auth/login  /api/v1/search?q=test  /api/v1/marketplace/products
...                                                       ...
```
The exporter SHOULD run on a host **external** to the prod cluster, so that probe failures reflect what an external user sees rather than an internal view that can hide an outage. The v1.0 lab phase-1 colocates it for simplicity; phase-2 moves the container off-box.
## Probe modules (defined in `templates/blackbox.yml.j2`)
| Module                 | Used by parcours            | What it asserts                                       |
| ---------------------- | --------------------------- | ----------------------------------------------------- |
| `http_2xx`             | upload_init, live_streams   | Status code 200 or 204, TLS valid                     |
| `http_status_envelope` | auth_login, status_endpoint | Body matches `"success":\s*true`                      |
| `http_search`          | search                      | Body matches `"tracks"` (seed data must include hits) |
| `http_marketplace`     | marketplace_list            | 200 (no body assertion; an empty array is valid)      |
| `tcp_websocket`        | chat_websocket              | TLS-wrapped TCP handshake completes                   |

Multi-step parcours that need session state (Register → Verify → Login, Login → Search → Play first result) are **out of scope** for blackbox. They are tracked as a follow-up: a small Go binary that runs as a CronJob, walks the steps, and writes textfile-collector metrics to `/var/lib/node_exporter/textfile_collector/veza_synthetic.prom`.
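
To illustrate how the single-request modules in the table could be expressed, here is a minimal sketch of a `blackbox.yml` covering three of them. The option names are standard `blackbox_exporter` settings, but the timeout and the accepted status code for `http_status_envelope` are assumptions; the role's actual template may differ.

```yaml
# Sketch only: values marked "assumed" are not taken from this README.
modules:
  http_2xx:
    prober: http
    timeout: 10s                        # assumed
    http:
      valid_status_codes: [200, 204]    # per the table: 200 or 204
      fail_if_not_ssl: true             # fail when the response is not served over TLS
  http_status_envelope:
    prober: http
    http:
      valid_status_codes: [200]         # assumed
      fail_if_body_not_matches_regexp:
        - '"success":\s*true'           # regex from the table
  tcp_websocket:
    prober: tcp
    tcp:
      tls: true                         # wrap the TCP connection in TLS
```

Certificate verification is on by default in the exporter, so a module only needs extra TLS settings when verification has to be relaxed.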
## Defaults
| Variable                   | Default                   | Meaning                               |
| -------------------------- | ------------------------- | ------------------------------------- |
| `blackbox_version`         | `0.25.0`                  | Prometheus blackbox_exporter release  |
| `blackbox_listen_port`     | `9115`                    | Prometheus default                    |
| `blackbox_target_base_url` | `https://staging.veza.fr` | Base URL the probes hit               |
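
Assuming the role is applied from an Ansible playbook (the `templates/*.j2` layout suggests this, but the README does not state it), the defaults could be overridden per play roughly as follows. The host group `monitoring` and the role name used here are hypothetical.

```yaml
# Hypothetical playbook: host group and role name are assumptions.
- hosts: monitoring
  become: true
  roles:
    - role: blackbox_exporter
      vars:
        blackbox_version: "0.25.0"                          # pin the release explicitly
        blackbox_listen_port: 9115
        blackbox_target_base_url: https://staging.veza.fr   # base URL the probes hit
```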
## Prometheus scrape config
`config/prometheus/blackbox_targets.yml` carries the 7 file-SD entries (6 parcours + status-endpoint bonus). Wire it into `prometheus.yml`:
```yaml
scrape_configs:
  - job_name: blackbox
    file_sd_configs:
      - files: [/etc/prometheus/blackbox_targets.yml]
    metrics_path: /probe
    relabel_configs:
      # Pass the file-SD target URL to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Pass the target's `module` label as ?module=...
      - source_labels: [module]
        target_label: __param_module
      # Send the actual scrape to the exporter, not to the target.
      - target_label: __address__
        replacement: blackbox-exporter.lxd:9115
```
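
For reference, a hypothetical excerpt of `blackbox_targets.yml` that would work with the relabeling above. The URLs come from the topology diagram and the module names from the probe-module table; `module` is the label the relabel rules read, while the `parcours` label name is an assumption.

```yaml
# Hypothetical file-SD entries: the real file may use different labels.
- targets:
    - https://staging.veza.fr/api/v1/auth/login
  labels:
    module: http_status_envelope
    parcours: auth_login          # assumed label name
- targets:
    - https://staging.veza.fr/api/v1/search?q=test
  labels:
    module: http_search
    parcours: search
- targets:
    - https://staging.veza.fr/api/v1/marketplace/products
  labels:
    module: http_marketplace
    parcours: marketplace_list
```

With entries like these, the first target is scraped as `http://blackbox-exporter.lxd:9115/probe` with `target=https://staging.veza.fr/api/v1/auth/login` and `module=http_status_envelope` as query parameters, and its `instance` label holds the probed URL rather than the exporter address.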
## Alert rules
`config/prometheus/alert_rules.yml` defines the group `veza_synthetic`:
- `SyntheticParcoursDown`: any parcours fails for 10 m → warning.
- `SyntheticAuthLoginDown`: auth_login fails for 10 m → critical (page).
- `SyntheticProbeSlow`: probe duration > 8 s for 15 m → warning.
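
A minimal sketch of what the three rules might look like, not a copy of the actual rule file. `probe_success` and `probe_duration_seconds` are the standard metrics returned by `blackbox_exporter`; the `parcours` target label and the `severity` label values are assumptions.

```yaml
# Sketch only: label names and expressions are assumptions.
groups:
  - name: veza_synthetic
    rules:
      - alert: SyntheticParcoursDown
        expr: probe_success{job="blackbox"} == 0
        for: 10m
        labels:
          severity: warning
      - alert: SyntheticAuthLoginDown
        expr: probe_success{job="blackbox", parcours="auth_login"} == 0   # assumed label
        for: 10m
        labels:
          severity: critical
      - alert: SyntheticProbeSlow
        expr: probe_duration_seconds{job="blackbox"} > 8
        for: 15m
        labels:
          severity: warning
```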
## Operations
```bash
# Service status:
sudo systemctl status blackbox_exporter

# One-off probe (dev / debug):
curl 'http://blackbox-exporter.lxd:9115/probe?target=https://staging.veza.fr/api/v1/health&module=http_status_envelope'

# Probe latency for one target (probe_* metrics are returned by /probe;
# /metrics only exposes the exporter's own process metrics):
curl -s 'http://blackbox-exporter.lxd:9115/probe?target=https://staging.veza.fr/api/v1/health&module=http_status_envelope' | grep probe_duration

# Tail the exporter log:
sudo journalctl -u blackbox_exporter -f
```
## What this role does NOT cover
- **Multi-step parcours.** Blackbox can't carry session cookies across probes; the Register-then-Verify-then-Login flow needs a custom synthetic client. Tracked for v1.0.10.
- **Status page.** Choosing Cachet or statuspage.io is a separate operator decision per the roadmap. The `/api/v1/status` endpoint is consumable by both.
- **Off-box deploy.** Lab phase-1 runs the container on the same Incus host as the services it probes. Phase-2 moves it off-cluster.