veza/scripts
senke bf31a91ae6
Some checks failed
Veza CI / Frontend (Web) (push) Failing after 16m6s
Veza CI / Notify on failure (push) Successful in 11s
E2E Playwright / e2e (full) (push) Successful in 19m59s
Veza CI / Rust (Stream Server) (push) Successful in 4m57s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 49s
Veza CI / Backend (Go) (push) Successful in 6m4s
feat(infra): pgbackrest role + dr-drill + Prometheus backup alerts (W2 Day 8)
ROADMAP_V1.0_LAUNCH.md §Semaine 2 day 8 deliverable:
  - Postgres backups land in MinIO via pgbackrest
  - dr-drill restores them weekly into an ephemeral Incus container
    and asserts the data round-trips
  - Prometheus alerts fire when the drill fails OR when the timer
    has stopped firing for >8 days

Cadence:
  full   — weekly  (Sun 02:00 UTC, systemd timer)
  diff   — daily   (Mon-Sat 02:00 UTC, systemd timer)
  WAL    — continuous (postgres archive_command, archive_timeout=60s)
  drill  — weekly  (Sun 04:00 UTC — runs 2h after the Sun full so
           the restore exercises fresh data)

RPO ≈ 1 min (archive_timeout). RTO ≤ 30 min (drill measures actual
restore wall-clock).
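The RPO figure follows directly from the archive settings. A sketch of the effective postgresql.conf values (applied via ALTER SYSTEM by the role; the stanza name `veza` is illustrative):

```ini
# WAL shipping drives the ~1 min RPO: a segment is pushed at most 60s
# after the previous one, even on a quiet cluster.
archive_mode = on
archive_timeout = 60                                   # force a segment switch every 60s
archive_command = 'pgbackrest --stanza=veza archive-push %p'
```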

Files:
  infra/ansible/roles/pgbackrest/
    defaults/main.yml — repo1-* config (MinIO/S3, path-style,
      aes-256-cbc encryption, vault-backed creds), retention 4 full
      / 7 diff / 4 archive cycles, zstd@3 compression. The role's
      first task asserts the placeholder secrets are gone — refuses
      to apply until the vault carries real keys.
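The refuse-to-apply guard can be a plain assert as the role's first task; a hypothetical sketch (variable names and the `CHANGEME` placeholder are assumptions):

```yaml
# Bail out while the vault still carries placeholder secrets.
- name: Refuse to run with placeholder credentials
  ansible.builtin.assert:
    that:
      - pgbackrest_s3_key != 'CHANGEME'
      - pgbackrest_s3_key_secret != 'CHANGEME'
      - pgbackrest_cipher_pass != 'CHANGEME'
    fail_msg: "pgbackrest vault secrets are still placeholders; populate the vault first"
```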
    tasks/main.yml — install pgbackrest, render
      /etc/pgbackrest/pgbackrest.conf, set archive_command on the
      postgres instance via ALTER SYSTEM, detect role at runtime
      via `pg_autoctl show state --json`, stanza-create from primary
      only, render + enable systemd timers (full + diff + drill).
    templates/pgbackrest.conf.j2 — global + per-stanza sections;
      pg1-path defaults to the pg_auto_failover state dir so the
      role plugs straight into the Day 6 formation.
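The rendered config looks roughly like this (stanza name, bucket, endpoint, and pg1-path are illustrative, not the template's actual values):

```ini
[global]
repo1-type=s3
repo1-s3-endpoint=minio.internal:9000
repo1-s3-bucket=veza-backups
repo1-s3-uri-style=path              # MinIO needs path-style addressing
repo1-cipher-type=aes-256-cbc        # repo1-cipher-pass comes from vault
repo1-retention-full=4
repo1-retention-diff=7
repo1-retention-archive=4
compress-type=zst
compress-level=3

[veza]
pg1-path=/var/lib/postgresql/16/main # pg_auto_failover state dir in practice
```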
    templates/pgbackrest-{full,diff,drill}.{service,timer}.j2 —
      systemd units. Backup services run as `postgres`,
      drill service runs as `root` (needs `incus`).
      RandomizedDelaySec on every timer to absorb clock skew + node
      collision risk.
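A sketch of the weekly full timer's shape (the jitter value is an assumption; the template's actual numbers may differ):

```ini
# pgbackrest-full.timer: weekly full, Sun 02:00 UTC, with jitter to
# absorb clock skew and avoid two nodes hitting the repo at once.
[Unit]
Description=Weekly pgBackRest full backup

[Timer]
OnCalendar=Sun *-*-* 02:00:00 UTC
RandomizedDelaySec=15min
Persistent=true

[Install]
WantedBy=timers.target
```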
    README.md — RPO/RTO guarantees, vault setup, repo wiring,
      operational cheatsheet (info / check / manual backup),
      restore procedure documented separately as the dr-drill.

  scripts/dr-drill.sh
    Acceptance script for the day. Sequence:
      0. pre-flight: required tools, latest backup metadata visible
      1. launch ephemeral `pg-restore-drill` Incus container
      2. install postgres + pgbackrest inside, push the SAME
         pgbackrest.conf as the host (restore is read-only against the
         bucket under pgbackrest semantics; the same S3 keys are reused
         so the drill exercises the production credential path)
      3. `pgbackrest restore` — full + WAL replay
      4. start postgres, wait for pg_isready
      5. smoke query: SELECT count(*) FROM users — must be ≥ MIN_USERS_EXPECTED
      6. write veza_backup_drill_* metrics to the textfile-collector
      7. teardown (or --keep for postmortem inspection)
    Exit codes 0/1/2 (pass / drill failure / env problem) so a
    Prometheus runner can plug in directly.
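A minimal sketch of the exit-code contract and metric publication, assuming bash; function names, metric names, and defaults below are illustrative, not the script's actual internals:

```shell
#!/usr/bin/env bash
# Exit-code contract: 0 = pass, 1 = drill failure, 2 = environment problem.
EXIT_PASS=0 EXIT_DRILL_FAIL=1 EXIT_ENV_FAIL=2

preflight() {
  # Missing tooling is an environment problem, not a drill failure.
  local tool
  for tool in incus pgbackrest psql; do
    command -v "$tool" >/dev/null 2>&1 || return "$EXIT_ENV_FAIL"
  done
  return "$EXIT_PASS"
}

smoke_check() {
  # The restored row count must meet the floor, else the drill failed.
  local count=$1 floor=${MIN_USERS_EXPECTED:-1}
  if [ "$count" -ge "$floor" ]; then
    return "$EXIT_PASS"
  else
    return "$EXIT_DRILL_FAIL"
  fi
}

write_metrics() {
  # Publish atomically for node_exporter's textfile collector:
  # write to a temp file, then rename into place.
  local success=$1 dir=${TEXTFILE_DIR:-/var/lib/node_exporter/textfile_collector}
  printf 'veza_backup_drill_success %s\nveza_backup_drill_last_run_timestamp %s\n' \
    "$success" "$(date +%s)" > "$dir/veza_backup_drill.prom.tmp" &&
    mv "$dir/veza_backup_drill.prom.tmp" "$dir/veza_backup_drill.prom"
}
```

The rename-into-place pattern matters: node_exporter scrapes the directory asynchronously, and a half-written .prom file would yield a parse error instead of a stale-but-valid sample.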

  config/prometheus/alert_rules.yml — new `veza_backup` group:
    - BackupRestoreDrillFailed (critical, 5m): the last drill
      reported success=0. Pages because a backup we haven't proved
      restorable is technical debt waiting for a disaster.
    - BackupRestoreDrillStale (warning, pending 1h once the last run
      is >8 days old): the drill timer has stopped firing. Catches a
      broken cron / unit / runner before the failure-mode alert above
      ever sees data.
    Both annotations include a runbook_url stub
    (veza.fr/runbooks/...) — those land alongside W2 day 10's
    SLO runbook batch.
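The rule group's shape, assuming the drill publishes `veza_backup_drill_success` and `veza_backup_drill_last_run_timestamp` (metric names and runbook slugs are illustrative):

```yaml
groups:
  - name: veza_backup
    rules:
      - alert: BackupRestoreDrillFailed
        expr: veza_backup_drill_success == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Last restore drill failed"
          runbook_url: https://veza.fr/runbooks/backup-drill-failed
      - alert: BackupRestoreDrillStale
        # time() minus the unix timestamp of the last run, in seconds
        expr: time() - veza_backup_drill_last_run_timestamp > 8 * 86400
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Restore drill has not reported in over 8 days"
          runbook_url: https://veza.fr/runbooks/backup-drill-stale
```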

  infra/ansible/playbooks/postgres_ha.yml
    Two new plays:
      6. apply pgbackrest role to postgres_ha_nodes (install +
         config + full/diff timers on every data node;
         pgbackrest's repo lock arbitrates collision)
      7. install dr-drill on the incus_hosts group (push
         /usr/local/bin/dr-drill.sh + render drill timer + ensure
         /var/lib/node_exporter/textfile_collector exists)
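Play 7 reduces to a few tasks; a sketch (the copy src path is an assumption about the repo layout):

```yaml
- name: Install restore drill on container hosts
  hosts: incus_hosts
  become: true
  tasks:
    - name: Ensure textfile-collector directory exists
      ansible.builtin.file:
        path: /var/lib/node_exporter/textfile_collector
        state: directory
        owner: root
        mode: "0755"
    - name: Push drill script
      ansible.builtin.copy:
        src: ../../scripts/dr-drill.sh
        dest: /usr/local/bin/dr-drill.sh
        mode: "0755"
```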

Acceptance verified locally:
  $ ansible-playbook -i inventory/lab.yml playbooks/postgres_ha.yml \
      --syntax-check
  playbook: playbooks/postgres_ha.yml          ← clean
  $ python3 -c "import yaml; yaml.safe_load(open('config/prometheus/alert_rules.yml'))"
  YAML OK
  $ bash -n scripts/dr-drill.sh
  syntax OK

Real apply + drill needs the lab R720 + a populated MinIO bucket
+ the secrets in vault — operator's call.

Out of scope (deferred per ROADMAP §2):
  - Off-site backup replica (B2 / Bunny.net) — v1.1+
  - Logical export pipeline for RGPD per-user dumps — separate
    feature track, not a backup-system concern
  - PITR admin UI — CLI-only via `--type=time` for v1.0
  - pgbackrest_exporter Prometheus integration — W2 day 9
    alongside the OTel collector

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 00:51:00 +02:00
archive refonte: backend-api go first; phase 1 2025-12-12 21:34:34 -05:00
probes chore(release): v1.0.6.2 — subscription payment-gate bypass hotfix 2026-04-17 12:21:53 +02:00
align-8px-grid.py aesthetic-improvements: align spacing to 8px grid (Action 11.2.1.3) 2026-01-16 11:50:46 +01:00
audit_backend_endpoints.py [INT-005] int: Verify all backend endpoints have frontend usage 2025-12-25 15:08:30 +01:00
auto_migrate_tailwind_colors.py feat: add automated scripts for Tailwind color migration with batch processing and verification 2026-01-16 01:54:57 +01:00
auto_migrate_tailwind_colors_batch.py feat: add automated scripts for Tailwind color migration with batch processing and verification 2026-01-16 01:54:57 +01:00
bfg-cleanup.sh chore(cleanup): add scripts/bfg-cleanup.sh for history rewrite 2026-04-20 18:55:17 +02:00
coverage-trend.mjs chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp 2026-04-20 20:33:40 +02:00
deploy-blue-green.sh feat(infra): blue-green deployment via HAProxy 2026-02-23 19:52:19 +01:00
deploy-staging.sh stabilisation commit A 2026-01-07 19:39:21 +01:00
diagnose-register.sh [FIX] Added TokenVersion field to user creation 2026-01-04 01:44:13 +01:00
dr-drill.sh feat(infra): pgbackrest role + dr-drill + Prometheus backup alerts (W2 Day 8) 2026-04-28 00:51:00 +02:00
flaky-detection.mjs chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp 2026-04-20 20:33:40 +02:00
generate-bug-report.sh [TEST] MVP integration tests executed - 2/28 API passed, 0/20 E2E passed, 3 bugs found 2026-01-04 01:44:13 +01:00
generate-jwt-keys.sh v0.9.1 2026-03-05 19:22:31 +01:00
generate-ssl-cert.sh fix(infra): HAProxy HTTPS and stats security 2026-02-15 15:58:51 +01:00
generate_full_schema.sh chore(release): v0.942 — Compress (migration consolidation procedure, mark script) 2026-03-02 19:05:54 +01:00
generate_tailwind_list.py docs: generate comprehensive list of all remaining Tailwind default color instances 2026-01-16 01:51:32 +01:00
mark_consolidated.sql chore(release): v0.942 — Compress (migration consolidation procedure, mark script) 2026-03-02 19:05:54 +01:00
README_TAILWIND_MIGRATION.md feat: add automated scripts for Tailwind color migration with batch processing and verification 2026-01-16 01:54:57 +01:00
replace-decorative-cyan.py aesthetic-improvements: automated replacement of decorative cyan with steel (80/20 rule, Action 11.3.1.3) 2026-01-16 11:40:13 +01:00
rotate_logs.sh feat: centralize all logs in /var/log/veza with rotation 2026-01-04 01:44:23 +01:00
run-all-mvp-tests.sh [TEST] MVP integration tests executed - 2/28 API passed, 0/20 E2E passed, 3 bugs found 2026-01-04 01:44:13 +01:00
run-e2e-local.sh fix(e2e): align local E2E setup with CI or document CI-only validation 2026-02-19 19:10:15 +01:00
setup-mvp-test-env.sh [TEST] MVP integration tests executed - 2/28 API passed, 0/20 E2E passed, 3 bugs found 2026-01-04 01:44:13 +01:00
setup_logs.sh feat: centralize all logs in /var/log/veza with rotation 2026-01-04 01:44:23 +01:00
smoke_test.go P0 UUID Phase A: migrations + backend Go UUID refactor 2025-12-04 02:15:48 +01:00
squash_migrations.sh chore(release): v0.602 — Payout, Technical Debt & E2E Tests 2026-02-23 22:32:01 +01:00
staging-stability-check.sh feat(v0.14.0): validation runtime & staging pipeline 2026-03-13 16:09:43 +01:00
start-backend.sh chore(audit 2.4, 2.5): supprimer code mort Education et cmd/modern-server 2026-02-15 14:39:40 +01:00
start_boot.sh chore(audit 2.4, 2.5): supprimer code mort Education et cmd/modern-server 2026-02-15 14:39:40 +01:00
start_minimal.sh chore(audit 2.4, 2.5): supprimer code mort Education et cmd/modern-server 2026-02-15 14:39:40 +01:00
stop_minimal.sh feat: global update including storybook setup and backend fixes 2026-02-02 19:34:14 +01:00
sync-cursor.py BASE: completing the initial repo state 2025-12-03 22:56:50 +01:00
test-endpoint-formats.sh api-contracts: identify endpoint response formats 2026-01-11 16:36:13 +01:00
test-mvp-api.sh fix: resolve stream server compilation errors and integrate chat stability fixes 2026-01-04 01:44:22 +01:00
validate-env.sh v0.9.3 2026-03-05 19:35:57 +01:00
validate-full.sh docs: align FEATURE_STATUS and validation scripts with v0.101 state 2026-02-17 15:35:58 +01:00
validate-light.sh chore: consolidate CI, E2E, backend and frontend updates 2026-02-17 16:43:21 +01:00
verify-rust-build.sh fix(rust): ensure chat-server and stream-server compile in release mode 2026-02-15 15:54:03 +01:00
verify_minimal_journey.sh feat: global update including storybook setup and backend fixes 2026-02-02 19:34:14 +01:00
view_logs.sh release(v0.903): Vault - ORDER BY whitelist, rate limiter, VERSION sync, chat-server cleanup, Go 1.24 2026-02-27 09:43:25 +01:00
visual-update-baselines.sh chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp 2026-04-20 20:33:40 +02:00