Compare commits — 13 commits (c323d37c30 … 112c64a22b)

112c64a22b, 2a5bc11628, e780fbcd18, 05b1d81d30, 6c644cff03, 0bd3e563b2,
d9896686bd, c97e42996e, b6147549c9, 7253f0cf10, 385a8f0378, e97b91f010,
c245b72e05
19 changed files with 1870 additions and 24 deletions
docs/PENTEST_SEND_PACKAGE.md — new file, 187 lines

@@ -0,0 +1,187 @@
# Pentest send package — v2026 engagement

> Operational checklist for handing off the v1.0.9 pre-launch pentest
> brief to the external team. Companion to `docs/PENTEST_SCOPE_2026.md`
> (the technical scope) — this doc is purely "what you send, in what
> order, via which channel."

The scope doc is technical and reusable across engagements. This file
is the per-engagement "send package" that wraps it: the email template,
the credentials-delivery plan, the IP allow-list step, and the kick-off
checklist.
## The 5-step send sequence

Run these in order. Each step has a check (✓) the operator ticks before
moving to the next — out-of-order steps cause the engagement to stall.
### Step 1 — counter-sign the NDA + authorisation letter

- [ ] NDA template signed by the pentester firm and counter-signed by us.
- [ ] Authorisation-to-test letter signed by the Veza tech lead (limits the
      scope to what's in `PENTEST_SCOPE_2026.md` §"In-scope assets" — the
      letter MUST list the staging URL explicitly so a reviewer can map
      pentester traffic to authorised activity).
- [ ] Both PDFs uploaded to the shared 1Password vault (entry name:
      `pentest-2026-legal`). Do **not** email PDFs.
### Step 2 — provision pentester credentials

- [ ] Run `bash scripts/pentest/seed-test-accounts.sh staging` (creates
      the 3 accounts from `PENTEST_SCOPE_2026.md` §"Authentication
      context", outputs random passwords).
- [ ] Output passwords land in three 1Password entries:
      `pentest-2026-listener`, `pentest-2026-creator`, `pentest-2026-admin`.
      Each entry's "Notes" field includes the role and the MFA bypass
      token if applicable.
- [ ] Share each entry **read-only** with the pentester's 1Password
      account using the firm's billing email. Do **not** put passwords
      in chat, email, or shell history.
- [ ] Set entry expiration to engagement-end + 7 days (so cleanup is
      automatic if the team forgets to revoke).
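As a reference for what "outputs random passwords" can look like, here is a minimal sketch of the generation step only; the real `scripts/pentest/seed-test-accounts.sh` interface and its account-creation and 1Password hand-off are not shown and may differ:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: CSPRNG-backed password generation for the three
# pentest roles. Account creation and credential upload are omitted.
set -u

gen_password() {
  # 18 random bytes from the kernel CSPRNG, encoded as 24 URL-safe
  # base64 characters (no padding at this length)
  head -c 18 /dev/urandom | base64 | tr '+/' '-_'
}

for role in listener creator admin; do
  printf 'pentest-2026-%s %s\n' "$role" "$(gen_password)"
done
```

Storage then goes through the 1Password entries named in the checklist, never through chat or email.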
### Step 3 — allow-list the pentester's IP

The Forgejo source-code mirror at `https://10.0.20.105:3000/senke/veza`
is grey-box read-only access. The pentester needs their static
egress IP allow-listed before they can `git clone`.

- [ ] Pentester sends their static egress IP (PGP-signed mail, or
      1Password Notes field).
- [ ] SSH to `srv-102v` (Forgejo container) and add the IP to
      `/etc/forgejo/allowlist.conf`.
- [ ] `systemctl reload forgejo`.
- [ ] Verify: `curl -I https://10.0.20.105:3000/senke/veza` from the
      pentester IP returns 200; from any other IP, 403.

(A future iteration could turn this into an Ansible playbook
`infra/ansible/playbooks/pentest_allowlist_ip.yml`. For now the manual
SSH path is fine — this happens once per engagement.)
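The manual edit can be made idempotent so re-running it never duplicates an entry. A sketch, assuming the one-IP-per-line file format implied above; the file path is from this doc, everything else is illustrative:

```shell
# Hedged sketch: idempotent allow-list append. ALLOWLIST defaults to a
# local file for dry runs; on srv-102v point it at the real path.
ALLOWLIST="${ALLOWLIST:-./allowlist.conf}"   # /etc/forgejo/allowlist.conf on srv-102v
PENTESTER_IP="203.0.113.7"                   # placeholder (RFC 5737 range)

add_allowlist_ip() {
  # Append only if the exact line is not already present
  grep -qxF "$1" "$ALLOWLIST" 2>/dev/null || echo "$1" >> "$ALLOWLIST"
}

add_allowlist_ip "$PENTESTER_IP"
# then: sudo systemctl reload forgejo
```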
### Step 4 — send the kick-off email

Use the template below. Replace the placeholders inside `<…>`. Send
PGP-encrypted (the pentester's key is in their security.txt) to
**both** their lead pentester and their project manager so the chain
of responsibility is recorded.
```text
Subject: [PENTEST] Veza v1.0.9 pre-launch engagement — kick-off

Hi <lead pentester first name>,

Per the signed scope letter dated <YYYY-MM-DD>, the Veza v1.0.9
pre-launch pentest engagement starts on <YYYY-MM-DD>. The brief is
attached as PENTEST_SCOPE_2026.md (see also the rendered HTML at
https://staging.veza.fr/legal/pentest-scope-2026.html).

Quick links:

• Staging URL: https://staging.veza.fr
• Source code: https://10.0.20.105:3000/senke/veza
  (grey-box, read-only; your egress IP <PENTESTER_IP>
  has been allow-listed as of <YYYY-MM-DD HH:MM UTC>.)
• Status page: https://status.veza.fr (we'll lower the alert
  threshold during your engagement so the SOC isn't
  paged on every benign 401).
• Test accounts: shared with your firm's 1Password — entries
  pentest-2026-{listener,creator,admin}. Passwords
  expire <engagement_end + 7d>.

Engagement window:

• Start: <YYYY-MM-DD>
• End: <YYYY-MM-DD> (~10 business days)
• Re-test: 1 round, after our team's fix pass (typically 2 weeks
  after the initial report)

Communications:

• Async: security@veza.fr (PGP fingerprint at
  https://veza.fr/.well-known/security.txt)
• Weekly sync: <weekday HH:MM TZ>, video link in the calendar invite
• Critical findings: phone the on-call number in the contract
  (HIGH severity = phone, not email)

Expected deliverables:

• Initial findings report (markdown or PDF) at engagement end
• Re-test report after our fix pass
• Optional: exec-level summary slide deck

Reach out if anything in PENTEST_SCOPE_2026.md is unclear before
day 1. Otherwise — good hunting.

Best,
<Tech lead name>
Veza
```

- [ ] Email PGP-signed, encrypted, and sent.
- [ ] Calendar invite sent for the weekly sync.
- [ ] Slack/Signal channel created for HIGH-severity escalation
      (channel naming: `#pentest-2026-veza`).
### Step 5 — lower the SOC alerting threshold

During the engagement, automated scanners and authentication
brute-force attempts WILL fire alerts. Tune them down so the on-call
isn't paged on every legitimate pentester action.

- [ ] In `config/prometheus/alert_rules.yml` → `HighErrorRate`,
      `HighLatencyP99`: add a `for: 30m` override OR mute via an
      Alertmanager silence (recommended: silence rather than edit
      rules, so the change auto-expires at engagement end).
- [ ] Silence URL: `https://prometheus.veza.fr/alertmanager/#/silences/new`
      → matchers: `severity=warning`, comment: `pentest-2026 active`,
      duration: `engagement_end + 24h`.
- [ ] Subscribe the engagement Slack channel to the silence's
      auto-removal so the SOC knows when the heightened alerting
      resumes.
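The silence can also be created without the web UI, through Alertmanager's v2 silences API. A hedged sketch: the dates are placeholders, the matcher and comment follow the checklist, and the endpoint path mirrors the silence URL above.

```shell
# Hedged sketch: build the silence payload for POST /api/v2/silences.
START="2026-03-02T00:00:00Z"   # engagement start (placeholder)
END="2026-03-17T00:00:00Z"     # engagement_end + 24h (placeholder)

payload=$(jq -n --arg s "$START" --arg e "$END" '{
  matchers: [{name: "severity", value: "warning", isRegex: false}],
  startsAt: $s,
  endsAt: $e,
  createdBy: "ops@veza.fr",
  comment: "pentest-2026 active"
}')
echo "$payload"
# curl -sS -X POST -H 'Content-Type: application/json' -d "$payload" \
#   https://prometheus.veza.fr/alertmanager/api/v2/silences
```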
## Reception checklist (after pentester confirms receipt)

- [ ] Pentester replied to the kick-off email within 1 business day.
- [ ] Pentester confirmed they can `git clone` the source repo.
- [ ] Pentester confirmed they can log in as each of the 3 test
      accounts.
- [ ] Pentester confirmed the staging URL responds (`/api/v1/health`
      returns 200).
- [ ] First findings — even informational — start landing in the
      shared report by end of engagement day 3 (a complete silence
      until the final report is a process smell).

If any reception checklist item fails after 24h, the engagement
hasn't really started. Phone the firm's PM, don't email.
## Post-engagement housekeeping

- [ ] Findings report received → import into the issue tracker as
      separate tickets, severity preserved, attribution
      `external-pentest-2026`.
- [ ] Fix pass scheduled and timeboxed (HIGH within 1 week, MEDIUM
      within 4 weeks, LOW best-effort).
- [ ] Re-test scheduled 2 weeks after fix-pass start.
- [ ] Re-test report received → update the ticket statuses; any
      remaining unresolved finding above LOW blocks v2.0.0-public.
- [ ] Test accounts' passwords manually rotated **the day the
      engagement ends** (don't wait for 1Password's auto-expiry).
- [ ] Pentester IP removed from the Forgejo allow-list.
- [ ] Alertmanager silence removed (should auto-expire, but verify).
- [ ] Engagement folder zipped and stored at
      `docs/archive/pentest-2026/` (kept 5 years for the audit trail).
- [ ] Public summary blog post drafted (no findings details, just the
      "we did this, here's what we learned" framing). Reviewed by
      legal before publishing.
## Linked artefacts

- `docs/PENTEST_SCOPE_2026.md` — the technical scope (what's testable)
- `docs/SECURITY_PRELAUNCH_AUDIT.md` — internal Day 21 audit (what we
  already cleared)
- `docs/archive/PENTEST_REPORT_VEZA_v0.12.6.md` — last engagement's
  report, format reference for what to expect back
- `scripts/pentest/seed-test-accounts.sh` — credential provisioning
  helper (creates the 3 staging accounts referenced in the scope)
- `docs/GO_NO_GO_CHECKLIST_v2.0.0_PUBLIC.md` — the row this engagement
  unblocks
docs/SOFT_LAUNCH_BETA_2026_CHECKLIST.md — new file, 150 lines

@@ -0,0 +1,150 @@
# Soft-launch beta — pre-flight checklist

> Operational checklist that must reach 100% green before the first
> invitation goes out. Companion to `docs/SOFT_LAUNCH_BETA_2026.md`
> (the bigger picture). This file is purely the "before you press
> send, has every gate been verified?" view.

The whole reason the soft-launch is "soft" is that it lets you catch
infrastructure surprises with 50 testers instead of 50 000. To get
that benefit, the infrastructure has to actually work BEFORE the
invitations land. This checklist is the gate.
## T-72h checklist (3 days before send)

### Database

- [ ] `migrations/990_beta_invites.sql` applied to staging.
      Verify with:
      ```bash
      psql "$STAGING_DATABASE_URL" -c "SELECT count(*) FROM beta_invites;"
      ```
      Expected: `0` (table exists, empty).
- [ ] Same migration applied to prod (whenever the prod tag goes out).
- [ ] Backup freshness OK on both environments:
      ```bash
      pgbackrest --stanza=veza info | head -20
      ```
      Most recent full or diff backup < 24 h old.
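The "< 24 h" gate can be checked mechanically instead of eyeballed. A hedged sketch: the JSON field path is an assumption about pgBackRest's `--output=json` format, so verify it against your pgBackRest version first.

```shell
# Hedged: mechanical backup-freshness check. The field path
# .[0].backup[-1].timestamp.stop (epoch seconds of the newest backup)
# is assumed; confirm with `pgbackrest info --output=json | jq .`
fresh_within() {   # $1 = last-backup epoch seconds, $2 = max age (s)
  now=$(date +%s)
  [ $(( now - $1 )) -lt "$2" ]
}

last=$(pgbackrest --stanza=veza info --output=json 2>/dev/null \
  | jq '.[0].backup[-1].timestamp.stop')
if fresh_within "${last:-0}" 86400; then echo "backup fresh"; else echo "backup STALE"; fi
```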
### Cohort CSV

- [ ] CSV file built from the operator's chosen sources (mailing list +
      contacts + community partners). Format per the header of
      `scripts/soft-launch/validate-cohort.sh`.
- [ ] `validate-cohort.sh` returns exit 0 (or exit 2 with explicit
      operator acknowledgement of the warnings).
- [ ] Distribution sanity: `≥ 5` creators, `≥ 20` listeners, `≥ 3`
      distinct cohort labels, `≥ 50` total rows.
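The 0/2 exit-code convention above can be wrapped so the acknowledgement is explicit rather than implicit. A hedged sketch; only the script path and exit codes come from this doc, and `COHORT_ACK` is an illustrative variable, not part of the real tooling:

```shell
# Hedged wrapper for the validate-cohort.sh exit-code convention:
# 0 = clean, 2 = warnings needing acknowledgement, other = hard failure.
check_cohort() {
  bash scripts/soft-launch/validate-cohort.sh "$1"
  rc=$?
  case "$rc" in
    0) echo "cohort OK" ;;
    2) if [ "${COHORT_ACK:-0}" = "1" ]; then
         echo "cohort OK (warnings acknowledged)"
       else
         echo "warnings present — re-run with COHORT_ACK=1" >&2
         return 1
       fi ;;
    *) echo "cohort validation failed (rc=$rc)" >&2; return 1 ;;
  esac
}
```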
### Email infrastructure

- [ ] SMTP credentials live on the operator's machine in `~/.msmtprc`
      (or whatever `SEND_CMD` resolves to).
- [ ] `templates/email/beta_invite.eml.template` reviewed — wording,
      cohort variable, code variable.
- [ ] Test send to the operator's own email:
      ```bash
      echo "ops@veza.fr,test-cohort,ops@veza.fr" > /tmp/me.csv
      DATABASE_URL=$STAGING_DATABASE_URL FRONTEND_URL=https://staging.veza.fr \
        SEND=1 bash scripts/soft-launch/send-invitations.sh /tmp/me.csv
      ```
      Verify the .eml renders correctly in your mail client (links
      clickable, fonts loaded, no `{{TO_ADDR}}` literals leaking).
### Backend invite-redemption path

- [ ] Visit `https://staging.veza.fr/signup?invite=<test-code>`.
      Expected: the signup form pre-fills the code, refuses to submit
      without it, and marks the invite as `used_at = NOW()` after success.
- [ ] Try an invalid code → form rejects with a clear error message.
- [ ] Try the same code twice → second attempt rejects (one-time use).
- [ ] Try an expired code → form rejects with "expired".
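The one-time-use and expiry rules being exercised above boil down to a single guarded UPDATE. A sketch: `beta_invites` and `used_at` come from this checklist, while the `code` and `expires_at` column names are assumptions about the migration.

```sql
-- Hypothetical sketch of atomic redemption. 1 row updated = redeemed
-- now; 0 rows = unknown code, already used, or expired.
UPDATE beta_invites
   SET used_at = NOW()
 WHERE code = 'TEST-CODE'                           -- placeholder
   AND used_at IS NULL                              -- one-time use
   AND (expires_at IS NULL OR expires_at > NOW());  -- expiry (assumed column)
```

Running the redemption as one statement avoids a check-then-update race between two concurrent signups with the same code.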
### Acceptance-gate monitoring

- [ ] Run `monitor-checks.sh` once on staging — every gate either ✅
      or ⚪ (unknown), no 🔴.
      ```bash
      DATABASE_URL=$STAGING_DATABASE_URL \
      SENTRY_AUTH_TOKEN=... \
      PROM_URL=https://prom.veza.fr \
      bash scripts/soft-launch/monitor-checks.sh
      ```
- [ ] Schedule the cron run (or tmux session) so the gate state is
      visible during the beta window without a manual re-run.
### Communications

- [ ] Discord `#beta-feedback` channel created, ground rules pinned.
- [ ] Typeform feedback form created; URL pasted into
      `templates/email/beta_invite.eml.template` if not already in the
      cohort label.
- [ ] Status page maintenance window declared for the duration —
      "elevated alerting may occur during the beta period."
- [ ] Operators on duty for the day rota'd in the calendar (4-hour
      shifts, primary + backup).
## D-day checklist (the day of send)

### Last hour before send

- [ ] Most recent k6 nightly green (within 30 h).
- [ ] No pending high-severity Sentry issue.
- [ ] No PagerDuty incident open.
- [ ] HAProxy + backend healthchecks green:
      ```bash
      curl -s https://staging.veza.fr/api/v1/health | jq .status
      ```
- [ ] MinIO drives all online; pgBackRest drill ran successfully in
      the last 7 days.
### Send

- [ ] `validate-cohort.sh` exit code 0 (or 2 with explicit override).
- [ ] `send-invitations.sh` in DRY-RUN mode: .eml output dir reviewed.
- [ ] `send-invitations.sh` with `SEND=1`: dispatch.log reviewed
      after the run, `0` failed dispatches.
- [ ] First three invitees received the email within 5 min (manual
      check on three different domains: gmail / proton / one custom).
### Hour 1 post-send

- [ ] First sign-up landed (`SELECT count(*) FROM beta_invites WHERE
      used_at IS NOT NULL;` returns ≥ 1).
- [ ] No spike in 5xx on Grafana "Veza API Overview".
- [ ] Discord `#beta-feedback` has at least one "I'm in" message.
### Every 4 h during the beta window

- [ ] Re-run `monitor-checks.sh` (or the cron wakes you).
- [ ] Triage any HIGH-severity report within 1 h (per
      `docs/SOFT_LAUNCH_BETA_2026.md` §"Issue triage matrix").
- [ ] Update the issues-reported table in
      `docs/SOFT_LAUNCH_BETA_2026.md` so the decision call has fresh data.
## D+0 18:00 UTC — decision call

- [ ] Tech lead, product lead, and on-call engineer all on the call.
- [ ] `monitor-checks.sh` final run shown live; verdict screenshotted.
- [ ] Each acceptance-gate row from `SOFT_LAUNCH_BETA_2026.md`
      §"Acceptance gate" walked through verbally.
- [ ] Unanimous GO, or any single NO-GO documented in the meeting notes.
- [ ] Decision logged in `docs/SOFT_LAUNCH_BETA_2026.md` §"Take-aways".

If GO: the v2.0.0-public tag goes out the next morning.
If NO-GO: the meeting decides the scope of the fix pass + a new acceptance date.
## Linked artefacts

- `docs/SOFT_LAUNCH_BETA_2026.md` — the bigger picture (cohort
  definition, inline email template, day timeline, monitoring list,
  acceptance gate, decision protocol)
- `migrations/990_beta_invites.sql` — the schema this depends on
- `scripts/soft-launch/validate-cohort.sh` — pre-send sanity check
- `scripts/soft-launch/send-invitations.sh` — batch insert + send
- `scripts/soft-launch/monitor-checks.sh` — live gate poll
- `templates/email/beta_invite.eml.template` — the email recipients
  receive
- `docs/GO_NO_GO_CHECKLIST_v2.0.0_PUBLIC.md` — the v2.0.0 checklist
  this unblocks
docs/runbooks/rabbitmq-down.md — new file, 164 lines

@@ -0,0 +1,164 @@
# Runbook — RabbitMQ unavailable

> **Alert**: `RabbitMQUnreachable` (in `config/prometheus/alert_rules.yml`).
> **Owner**: infra on-call.
> **Game-day scenario**: E (`infra/ansible/tests/test_rabbitmq_outage.sh`).
## What breaks when RabbitMQ is down

RabbitMQ is a fan-out broker for asynchronous, non-user-facing work
(transcode jobs, distribution to external platforms, email digests,
DMCA takedown propagation, search index updates). The user-facing
request path does NOT block on RabbitMQ — the API publishes a message
and returns 202 Accepted; the worker picks it up later.

| Subsystem | Effect when RabbitMQ is gone | Severity |
| --- | --- | --- |
| Track upload → HLS transcode | Upload succeeds (S3 write OK), HLS segments don't appear | **MEDIUM** — track playable via fallback `/stream`, not via HLS |
| Distribution to Spotify/SoundCloud | Submission silently queued; users see "pending" forever | MEDIUM — surfaces in the distribution dashboard, not in the player |
| Email digest (weekly creator stats) | Cron tick logs `publish failed`, retries on next tick | LOW — eventual consistency, no user-visible breakage |
| DMCA takedown event | Track flag flipped in DB synchronously; downstream replay queue stalls | **HIGH** — track is gated immediately (synchronous DB UPDATE), but cache invalidation lags |
| Search index updates | New tracks not searchable until the queue drains | LOW — falls back to Postgres FTS |
| Chat messages (WebSocket) | INDEPENDENT — chat is direct WS, no RabbitMQ involvement | NONE |
| Auth, sessions, payments | INDEPENDENT — no RabbitMQ dependency | NONE |

The silent-failure cases (DMCA cache invalidation, the transcode
queue) are the ones that compound if the outage drags on. Most user
flows degrade gracefully.
## First moves

1. **Confirm RabbitMQ is actually down**, not just "unreachable from one
   host":
   ```bash
   curl -s -u "$RMQ_USER:$RMQ_PASS" http://rabbitmq.lxd:15672/api/overview \
     | jq '.cluster_name, .object_totals'
   ```
2. **Confirm what changed.** If a deploy fired in the last 30 min,
   suspect the deploy. Check `journalctl -u veza-backend-api -n 200`
   for `amqp` errors with timestamps after the deploy.
3. **Check the queues didn't fill the disk** (the most common bring-down
   in development):
   ```bash
   ssh rabbitmq.lxd 'df -h /var/lib/rabbitmq'
   ```
## RabbitMQ instance is down

```bash
# State on the RabbitMQ host:
ssh rabbitmq.lxd sudo systemctl status rabbitmq-server

# Logs (Erlang verbosity — grep for errors and alarms):
ssh rabbitmq.lxd sudo journalctl -u rabbitmq-server -n 500 \
  | grep -E 'ERROR|CRASH|disk_alarm|memory_alarm'
```

Common causes:

- **Disk alarm.** `/var/lib/rabbitmq` filled — RabbitMQ pauses producers
  when free space drops below `disk_free_limit`. The backend's AMQP
  client surfaces this as "blocked". Fix: grow the disk, or expire old
  messages with `rabbitmqctl purge_queue <queue>` (last resort — you
  lose what's in there).
- **Memory alarm.** RSS over `vm_memory_high_watermark` × system memory.
  Same effect (producers blocked). Fix: add memory, or unblock by
  draining a slow consumer.
- **Process crashed.** Erlang OOM, segfault. `sudo systemctl restart
  rabbitmq-server`; the queues survive (`durable=true` on every queue
  we declare).
- **Cluster split-brain.** v1.0 is single-node, so this can't happen
  yet. Listed here for the v1.1 multi-node config.
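Both alarm states can also be read straight from the management API: `mem_alarm` and `disk_free_alarm` are per-node booleans in `/api/nodes` (hedged: field names per the RabbitMQ management plugin, verify on your version).

```shell
# Quick probe for the two blocking causes above. The jq filter keeps
# only the node name and the two alarm flags.
alarm_filter='.[] | {name, mem_alarm, disk_free_alarm}'
curl -s -u "$RMQ_USER:$RMQ_PASS" http://rabbitmq.lxd:15672/api/nodes \
  | jq "$alarm_filter"
```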
## Backend can't reach RabbitMQ

Network or DNS issue — not RabbitMQ's fault.

```bash
# From the API container:
nc -zv rabbitmq.lxd 5672

# DNS:
getent hosts rabbitmq.lxd

# AMQP credentials:
docker exec veza_backend_api env | grep AMQP_URL
```

Likely culprits: an Incus bridge restart, a password rotation that
didn't propagate to the API container's env, a security-group change.
## Mitigation while RabbitMQ is down

The backend already handles publish failures gracefully:

- `internal/eventbus/rabbitmq.go` retries with exponential backoff up
  to 30 s, then drops to "degraded mode" (publish returns immediately
  with a logged warning, the API call succeeds, the side effect is
  lost).
- Workers in `internal/workers/` have `WithRetry()` middleware that
  republishes failed deliveries up to 5 times before dead-lettering.

If recovery is going to take > 10 min, set
`EVENTBUS_DEGRADED_LOG_LEVEL=error` (default `warn`) so the
fail-fast logs land in Sentry and operators can audit which messages
were dropped.

**Do NOT** restart the backend to clear the AMQP connection pool;
the reconnect logic (`go.uber.org/zap`-logged at eventbus.go:142)
handles it once RabbitMQ is back.
## Recovery

Once RabbitMQ is back up:

1. Verify connectivity from each backend instance:
   ```bash
   # Send a deliberately invalid 8-byte protocol header; the broker
   # answers with its own "AMQP" version header before closing.
   docker exec veza_backend_api sh -c 'printf "XXXXXXXX" | nc -w1 rabbitmq.lxd 5672 | head -c 4'
   ```
   Should return `AMQP`.
2. Watch the queue depth on the management UI:
   `http://rabbitmq.lxd:15672/#/queues`. Expect `transcode_jobs`,
   `distribution_outbox`, `dmca_propagation`, and `search_index_updates`
   to drain over the next 5-15 min as the workers catch up.
3. If a queue is stuck > 30 min after recovery, the worker for it is
   wedged — restart that specific worker container:
   ```bash
   docker compose -f docker-compose.prod.yml restart worker-<name>
   ```
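The queue-depth watch in step 2 also works headless via the management API's `/api/queues` (the `messages` field is ready + unacked per queue):

```shell
# Poll the four queues named above without the web UI.
depth_filter='.[]
  | select(.name | test("^(transcode_jobs|distribution_outbox|dmca_propagation|search_index_updates)$"))
  | {name, messages}'
curl -s -u "$RMQ_USER:$RMQ_PASS" http://rabbitmq.lxd:15672/api/queues \
  | jq "$depth_filter"
```

Re-run it every minute or two; the `messages` counts should trend toward zero as the workers catch up.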
## Audit after the outage

1. Sentry filter `tag:eventbus.status=degraded` between outage start
   and end — gives you the count and shape of the dropped events.
2. For each dropped DMCA event, manually trigger the cache flush:
   ```bash
   curl -X POST -H "Authorization: Bearer $ADMIN_TOKEN" \
     https://api.veza.fr/api/v1/admin/cache/dmca/flush
   ```
3. For each dropped transcode job, requeue from the tracks table:
   ```bash
   psql "$DATABASE_URL" -c "
     INSERT INTO transcode_jobs (track_id, status, attempts, created_at)
     SELECT id, 'pending', 0, NOW() FROM tracks
     WHERE created_at BETWEEN '<outage_start>' AND '<outage_end>'
       AND hls_status IS NULL;
   "
   ```
## Postmortem trigger

Any RabbitMQ outage > 10 min triggers a postmortem. The non-user-facing
nature makes this less urgent than a Redis or Postgres outage, but the
silent-failure modes (dropped DMCA propagation, missing transcodes)
warrant a write-up so we know what slipped through.
## Future-proofing

- v1.1 will move to a 3-node RabbitMQ cluster behind a load balancer
  for HA. This runbook will then split into "single node down" (the
  cluster keeps serving) and "cluster split-brain" (rare, but the
  recovery path is different).
- Worker idempotency keys are documented in `docs/api/eventbus.md` —
  any new worker MUST honour them so a replay during recovery doesn't
  double-charge / double-distribute / double-takedown.
infra/ansible/inventory/group_vars — new symbolic link

@@ -0,0 +1 @@
+../group_vars
@@ -20,6 +20,16 @@ all:
     ansible_user: senke
     ansible_python_interpreter: /usr/bin/python3
   children:
+    # Env-named meta-group — see inventory/staging.yml for rationale.
+    prod:
+      children:
+        incus_hosts:
+        forgejo_runner:
+        haproxy:
+        veza_app_backend:
+        veza_app_stream:
+        veza_app_web:
+        veza_data:
     incus_hosts:
       hosts:
         veza-prod:
@@ -36,6 +36,18 @@ all:
     ansible_user: senke
     ansible_python_interpreter: /usr/bin/python3
   children:
+    # Env-named meta-group: every host below is also in `staging`,
+    # which makes group_vars/staging.yml apply (Ansible matches
+    # group_vars file names against group names).
+    staging:
+      children:
+        incus_hosts:
+        forgejo_runner:
+        haproxy:
+        veza_app_backend:
+        veza_app_stream:
+        veza_app_web:
+        veza_data:
     incus_hosts:
       hosts:
         veza-staging:
@@ -18,14 +18,28 @@
   become: true
   gather_facts: true
   tasks:
-    - name: Launch veza-haproxy container if absent
+    - name: Launch / repair veza-haproxy container
+      # Idempotent: RUNNING → no-op; STOPPED/half-baked → recreate;
+      # absent → fresh launch. Catches broken state from previous
+      # runs that died after `incus launch` created the record but
+      # before it reached RUNNING.
       ansible.builtin.shell:
         cmd: |
           set -e
-          if incus info veza-haproxy >/dev/null 2>&1; then
-            echo "veza-haproxy already exists"
-            exit 0
-          fi
+          STATE=$(incus list veza-haproxy -f csv -c s 2>/dev/null | head -1 || true)
+          case "$STATE" in
+            RUNNING)
+              echo "veza-haproxy RUNNING already"
+              exit 0
+              ;;
+            "")
+              # No record — fresh launch.
+              ;;
+            *)
+              echo "veza-haproxy in state '$STATE' — recreating"
+              incus delete --force veza-haproxy
+              ;;
+          esac
           incus launch "{{ veza_app_base_image | default('images:debian/13') }}" veza-haproxy --profile veza-app --network "{{ veza_incus_network | default('net-veza') }}"
           for _ in $(seq 1 30); do
             if incus exec veza-haproxy -- /bin/true 2>/dev/null; then
@ -35,21 +49,54 @@
|
||||||
done
|
done
|
||||||
incus exec veza-haproxy -- apt-get update
|
incus exec veza-haproxy -- apt-get update
|
||||||
incus exec veza-haproxy -- apt-get install -y python3 python3-apt
|
incus exec veza-haproxy -- apt-get install -y python3 python3-apt
|
||||||
|
echo "veza-haproxy LAUNCHED"
|
||||||
executable: /bin/bash
|
executable: /bin/bash
|
||||||
register: provision_result
|
register: provision_result
|
||||||
changed_when: "'incus launch' in provision_result.stdout"
|
changed_when: "'LAUNCHED' in provision_result.stdout or 'recreating' in provision_result.stdout"
|
||||||
tags: [haproxy, provision]
|
tags: [haproxy, provision]
|
||||||
|
|
||||||
- name: Refresh inventory so veza-haproxy is reachable
|
- name: Refresh inventory so veza-haproxy is reachable
|
||||||
ansible.builtin.meta: refresh_inventory
|
ansible.builtin.meta: refresh_inventory
|
||||||
|
|
||||||
-- name: Apply common baseline (SSH hardening, fail2ban, node_exporter)
-  hosts: haproxy
-  become: true
-  gather_facts: true
-  roles:
-    - common
+# Incus proxy devices : forward the host's :80 / :443 to the
+# container's :80 / :443. Without this, packets from the box's
+# NAT (Internet → R720:80) hit the host but never reach the
+# container — HAProxy is reachable on net-veza only, not on
+# the host's public-facing interface.
+- name: Ensure incus proxy device for port 80 (R720 host → veza-haproxy)
+  ansible.builtin.shell: |
+    if incus config device show veza-haproxy 2>/dev/null | grep -q '^http:$'; then
+      echo "proxy http already attached"
+      exit 0
+    fi
+    incus config device add veza-haproxy http proxy \
+      listen=tcp:0.0.0.0:80 \
+      connect=tcp:127.0.0.1:80
+    echo "proxy http attached"
+  register: proxy80
+  changed_when: "'already' not in proxy80.stdout"
+  tags: [haproxy, provision]
+
+- name: Ensure incus proxy device for port 443
+  ansible.builtin.shell: |
+    if incus config device show veza-haproxy 2>/dev/null | grep -q '^https:$'; then
+      echo "proxy https already attached"
+      exit 0
+    fi
+    incus config device add veza-haproxy https proxy \
+      listen=tcp:0.0.0.0:443 \
+      connect=tcp:127.0.0.1:443
+    echo "proxy https attached"
+  register: proxy443
+  changed_when: "'already' not in proxy443.stdout"
+  tags: [haproxy, provision]
+
+# Common role intentionally NOT applied to the haproxy container :
+# it's reached via `incus exec` (no SSH inside), and the role's
+# SSH-hardening / fail2ban / node_exporter setup assumes a full
+# host (sshd present, auth.log to monitor, exposed metrics port).
+# Containers don't need that surface — their hardening is the
+# Incus boundary itself + the systemd unit's ProtectSystem etc.
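The idempotency marker above leans on `incus config device show` printing each device as a top-level `KEY:` line. A standalone sketch of that grep against a sample of the (assumed) output format:

```shell
# Sample of what `incus config device show veza-haproxy` is assumed to
# print once the proxy device exists (YAML, device name as top-level key).
sample='http:
  connect: tcp:127.0.0.1:80
  listen: tcp:0.0.0.0:80
  type: proxy'

# Same test the playbook task runs: a top-level `http:` line means the
# device is already attached, so the task exits 0 without re-adding it.
if printf '%s\n' "$sample" | grep -q '^http:$'; then
  echo "proxy http already attached"
fi
```

The `^http:$` anchors matter: an indented `listen:`/`connect:` line, or a device whose name merely contains `http`, would not match.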
- name: Install + configure HAProxy + dehydrated/Let's Encrypt
  hosts: haproxy
  become: true
@ -2,7 +2,25 @@
|
||||||
# whitelist of users. The role refuses to lock the operator out: it
|
# whitelist of users. The role refuses to lock the operator out: it
|
||||||
# verifies the AllowUsers list is non-empty and contains at least
|
# verifies the AllowUsers list is non-empty and contains at least
|
||||||
# the connecting user before reloading sshd.
|
# the connecting user before reloading sshd.
|
||||||
|
#
|
||||||
|
# Skipped entirely when sshd is not installed on the target — useful
|
||||||
|
# for Incus containers reached via `incus exec`, which don't need
|
||||||
|
# SSH at all (overlay set common_apply_ssh_hardening=false to skip
|
||||||
|
# explicitly even when sshd happens to be present).
|
||||||
---
|
---
|
||||||
|
- name: Detect whether sshd is present on the target
|
||||||
|
ansible.builtin.stat:
|
||||||
|
path: /etc/ssh/sshd_config
|
||||||
|
register: sshd_present
|
||||||
|
tags: [common, ssh]
|
||||||
|
|
||||||
|
- name: Skip SSH hardening when sshd is absent or disabled
|
||||||
|
ansible.builtin.debug:
|
||||||
|
msg: "sshd not installed on this host — SSH hardening skipped"
|
||||||
|
when:
|
||||||
|
- not sshd_present.stat.exists or not (common_apply_ssh_hardening | default(true))
|
||||||
|
tags: [common, ssh]
|
||||||
|
|
||||||
- name: Sanity check — ssh_allow_users must be non-empty
|
- name: Sanity check — ssh_allow_users must be non-empty
|
||||||
ansible.builtin.assert:
|
ansible.builtin.assert:
|
||||||
that:
|
that:
|
||||||
|
|
@@ -12,6 +30,9 @@
       ssh_allow_users is empty. Refusing to apply sshd_config which
       would lock everyone out. Set ssh_allow_users in
       group_vars/all.yml (or override per environment).
+  when:
+    - sshd_present.stat.exists
+    - common_apply_ssh_hardening | default(true)

 - name: Render sshd_config drop-in (50-veza-hardening.conf)
   ansible.builtin.template:
@@ -22,9 +43,15 @@
     mode: "0644"
     validate: /usr/sbin/sshd -t -f %s
   notify: Reload sshd
+  when:
+    - sshd_present.stat.exists
+    - common_apply_ssh_hardening | default(true)

 - name: Ensure sshd is enabled + running
   ansible.builtin.service:
     name: ssh
     state: started
     enabled: true
+  when:
+    - sshd_present.stat.exists
+    - common_apply_ssh_hardening | default(true)
infra/ansible/roles/haproxy/files/selfsigned.pem (new file, 50 lines)

@@ -0,0 +1,50 @@
-----BEGIN PRIVATE KEY-----
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQCgyerZjp1+RxU8
/bISXduo8OjR2ejl5SD034PyQvT5B9tk83yplplHoG+JL78UGqpflPlhU9fQSoT9
Walusf/MDDCEbQ75sjPui+yNuvcgWkmpN0MUdOHR8gvfiADCR6/eDQuRf7JJh5N8
YdCtLtnOYsha7Bix+bN11GO6XzPG869I/UGdg4g0v7LvDCP3tI0tpno+y4MuiDvJ
R1pQd7sl6jxPp4zvNtVw8vrSVA3qJ8G6F78nnPUUPFnrAlUFNcnMVLamxY0IA3H4
n9o7X73RnphrpcnPr6eyEYxOL0UGhsDMsQxTrhSaOErL68QDTk3hV60SxWqsVlxX
/DoKAb9VAgMBAAECggEAenTt6V3Fsxv+H+Jz0assFYHNP63/w797FyR4QHUgT93d
CQisRBjPio61A72agHxCj+NM/wQ1FIz8tluoQAdO8x/Bf8nzotZG2QI2Wkcv2bMJ
8NeGvji6mAQJaOgS8+RXG/3BdsHTjk60VAHHRW6uMZJoV18C++FZ/X6RqarCK13N
UEfHX529qNvLhw+xkjXFW/qiB3dQTTEJq+9y0U4nGrjZCXtspkXN3g6ETU6Svzhq
z4tq0udC7FjZPqdA79ChXweZlDCq89FQfxAnxRoZAiwymK91VrGz/GyMIwdBPidm
+or8Rk6nodKk8AuwsGE6ub9UhWUS+Kdpl9fNcV1jLQKBgQDRA7D786sf25tgyooF
6IMZwQfHWGmIepUPruHLz5aV6ozO8XQBgEN4XBI15mxJTu+eeXGbqOhwwuhvYR9u
G02qPE0OlftBRnBJp2AH5+gRphLyrRAvgnjVw323ucnsjOzO0TPwdehomKC0J3b9
B+hZ2tKW/nNxqX/iU1ue969lAwKBgQDE7vJnppvAZLSMo4PCtBTJm11u58AZ9LyZ
6dxvpiq6XxPw9DcC2gj91pCST2g4vIqDYQgmh5U3RzMIFsKLtKfDvHEAYbFOnEfz
UXoNFjlCEmB2jHgpn51/ZDokpPSF9MooDUFna0JPaUrduHs8Zzv7kfrsAhq2N++C
eB+jMea+xwKBgESDzEFbB85io5Vf70yugkMv9ofPIJD/ddt1PUkdHES6ZTv1BEz1
qahLriCDDx4cxQmSz73x6XgFPEI+eRoT0yqpp6zPV1R3bZmHR0BwMa+PXAi22GZq
g4e3FH/kZB+ptnq5MyhwziVzWsKTaTram7zQsVWTxW4N3QDoyFDc6l7XAoGBAI85
+bLIyZ4zn9xpT/rbXgMCrAFtK5m1FTYbj+bjw0+otqgX9aptSPzUgHDor7QT6+mB
OJxNH4kEj2jipLtWuGzzMHxGkN3La8jbCRlbgGk9VErj/sDHBZURH/hmwDBsyFo4
ycidiayXt4tqELbtngJpOUVMgoDkTZ1mIBxgvqEhAoGBAK6uX4k2xiOQorpByvjd
gT16MbuntXO/bDXnXaq1keNMr1JzQ5aS346XweiUgRG7ZJdEb2C8sXwSmh2+oeGa
G+QCLH73hwo/PWbU560dFY8s6z5E79WBjYUu5+1/a0SCBwQ4mEVB7REQVY1mQoJT
A+A8WW+EDvaPpVFujA26K3fc
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
MIIDjTCCAnWgAwIBAgIUbgZuZRFj8M8ZcdhRFikB2bJKswYwDQYJKoZIhvcNAQEL
BQAwVjELMAkGA1UEBhMCWFgxFTATBgNVBAcMDERlZmF1bHQgQ2l0eTEcMBoGA1UE
CgwTRGVmYXVsdCBDb21wYW55IEx0ZDESMBAGA1UEAwwJbG9jYWxob3N0MB4XDTIy
MDQwODEwMTA0OFoXDTQ5MDgyNDEwMTA0OFowVjELMAkGA1UEBhMCWFgxFTATBgNV
BAcMDERlZmF1bHQgQ2l0eTEcMBoGA1UECgwTRGVmYXVsdCBDb21wYW55IEx0ZDES
MBAGA1UEAwwJbG9jYWxob3N0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC
AQEAoMnq2Y6dfkcVPP2yEl3bqPDo0dno5eUg9N+D8kL0+QfbZPN8qZaZR6BviS+/
FBqqX5T5YVPX0EqE/VmpbrH/zAwwhG0O+bIz7ovsjbr3IFpJqTdDFHTh0fIL34gA
wkev3g0LkX+ySYeTfGHQrS7ZzmLIWuwYsfmzddRjul8zxvOvSP1BnYOINL+y7wwj
97SNLaZ6PsuDLog7yUdaUHe7Jeo8T6eM7zbVcPL60lQN6ifBuhe/J5z1FDxZ6wJV
BTXJzFS2psWNCANx+J/aO1+90Z6Ya6XJz6+nshGMTi9FBobAzLEMU64UmjhKy+vE
A05N4VetEsVqrFZcV/w6CgG/VQIDAQABo1MwUTAdBgNVHQ4EFgQUJZDike5gfaOV
k8uCwfCh2OrPXd0wHwYDVR0jBBgwFoAUJZDike5gfaOVk8uCwfCh2OrPXd0wDwYD
VR0TAQH/BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAQbXAIBoDHQakksvKGo3X
/bIyc+IQKFpsyWrn5GvS69wTE7XBfKLtyY3X8NygvsCaRx0r2OIdVERNjrhELkes
tWQE17D1+tDnsaEQRUNJsjBYmealNPpqqacdRlBNnkTSGM/3d3m/ihlA51A1QzyI
IOtKxRRIZ+24L/eww5Hv96ub3Wu4rVmepXP4cVIcPEnN6ntmOv4Ja/M83hLI2oXy
4XmXOVsyliYDGWiyvT2U3LcRsv9PHr09SqYO/5yW+fYC7diLGSHW0kfwht2Q8Zqg
IFMJMDmmKTbCWCmFYdoVTRm2fFl0YvgpC5JrXuSloHh3hRiLwDIUiTxlTM3JDP8q
PQ==
-----END CERTIFICATE-----
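The checked-in PEM looks like a stock OpenSSL self-signed pair (CN=localhost, the `req` default C/L/O values). Should it ever need regenerating, something along these lines should produce an equivalent placeholder; the subject values simply mirror the certificate above:

```shell
# Work in a scratch directory so nothing clobbers the repo copy.
cd "$(mktemp -d)"

# Key + cert in one pass; -nodes leaves the key unencrypted so HAProxy
# can load the combined PEM without a passphrase prompt.
openssl req -x509 -newkey rsa:2048 -nodes -days 9999 \
  -subj '/C=XX/L=Default City/O=Default Company Ltd/CN=localhost' \
  -keyout selfsigned.key -out selfsigned.crt 2>/dev/null

# HAProxy expects key and certificate concatenated in a single file.
cat selfsigned.key selfsigned.crt > selfsigned.pem
```

Committing a throwaway key like this is harmless exactly because it only ever vouches for CN=localhost, as the role comment below notes.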
@@ -26,6 +26,29 @@
     mode: "0750"
   tags: [haproxy, config]

+# Chicken-and-egg : haproxy.cfg.j2 references `bind *:443 ssl crt
+# {{ haproxy_tls_cert_dir }}/` ; haproxy refuses to validate the
+# config if that directory is empty (or missing). dehydrated creates
+# real LE certs there LATER (in letsencrypt.yml). Break the cycle
+# the same way the working roles in
+# /home/senke/Documents/TG__Talas_Group/.../roles/haproxy do : ship a
+# checked-in `selfsigned.pem` and copy it into the cert dir.
+# Once dehydrated lands real certs alongside, SNI picks the matching
+# real cert ; selfsigned.pem only matches CN=localhost (harmless).
+- name: Ensure {{ haproxy_tls_cert_dir }} exists
+  ansible.builtin.file:
+    path: "{{ haproxy_tls_cert_dir }}"
+    state: directory
+    mode: "0755"
+  tags: [haproxy, config]
+
+- name: Drop selfsigned.pem so haproxy can validate the cfg
+  ansible.builtin.copy:
+    src: selfsigned.pem
+    dest: "{{ haproxy_tls_cert_dir }}/selfsigned.pem"
+    mode: "0640"
+  tags: [haproxy, config]
+
 - name: Render haproxy.cfg
   ansible.builtin.template:
     src: haproxy.cfg.j2
@@ -33,7 +56,10 @@
     owner: root
     group: haproxy
     mode: "0640"
-    validate: "haproxy -f %s -c -q"
+    # No -q so the actual validation error reaches the operator's
+    # console. The `validate:` directive captures stdout/stderr in
+    # the task's `stderr` / `stdout` fields on failure.
+    validate: "haproxy -f %s -c"
   register: haproxy_config
   notify: Reload haproxy
   tags: [haproxy, config]
@@ -41,6 +41,28 @@ defaults
     timeout http-request 10s
     load-server-state-from-file global

+# -----------------------------------------------------------------------
+# DNS resolvers — Incus's managed bridges expose a built-in DNS
+# resolver on the gateway IP for the bridge's subnet (10.0.20.1 for
+# net-veza). Backend containers' .lxd hostnames resolve here.
+# init-addr last,libc,none on default-server lets HAProxy start
+# even if the backends don't exist yet ; servers go into MAINT
+# until the resolver returns an address (deploy_app.yml creates
+# them later, then HAProxy's veza_dns resolver picks them up
+# automatically — no haproxy reload needed).
+# -----------------------------------------------------------------------
+resolvers veza_dns
+    nameserver incus_gw 10.0.20.1:53
+    accepted_payload_size 4096
+    resolve_retries 3
+    timeout resolve 1s
+    timeout retry 1s
+    hold valid 10s
+    hold nx 5s
+    hold timeout 5s
+    hold refused 5s
+    hold obsolete 30s
+
 # -----------------------------------------------------------------------
 # Stats endpoint — bound to loopback only ; the Prometheus haproxy
 # exporter sidecar scrapes it.
@@ -63,9 +85,12 @@ frontend veza_http_in
     bind *:{{ haproxy_listen_https }} ssl crt {{ haproxy_tls_cert_dir }}/ alpn h2,http/1.1
     http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
     # Let dehydrated's HTTP-01 challenges through unencrypted before any redirect.
+    # Order matters : http-request rules must come BEFORE use_backend
+    # rules in HAProxy ; otherwise haproxy 3.x warns and processes them
+    # in the unintended order.
     acl acme_challenge path_beg /.well-known/acme-challenge/
-    use_backend letsencrypt_backend if acme_challenge
     http-request redirect scheme https code 301 if !{ ssl_fc } !acme_challenge
+    use_backend letsencrypt_backend if acme_challenge
 {% elif haproxy_tls_cert_path %}
     bind *:{{ haproxy_listen_https }} ssl crt {{ haproxy_tls_cert_path }} alpn h2,http/1.1
     http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
@@ -146,7 +171,7 @@ backend {{ env }}_backend_api
     option httpchk GET {{ veza_healthcheck_paths.backend | default('/api/v1/health') }}
     http-check expect status 200
     cookie {{ haproxy_sticky_cookie_name }}_{{ env }} insert indirect nocache httponly secure
-    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s
+    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s init-addr last,libc,none resolvers veza_dns
     server {{ env }}_backend_blue {{ prefix }}backend-blue.{{ veza_incus_dns_suffix }}:{{ veza_backend_port }} cookie {{ env }}_backend_blue {{ '' if _active == 'blue' else 'backup' }}
     server {{ env }}_backend_green {{ prefix }}backend-green.{{ veza_incus_dns_suffix }}:{{ veza_backend_port }} cookie {{ env }}_backend_green {{ '' if _active == 'green' else 'backup' }}
@@ -157,7 +182,7 @@ backend {{ env }}_stream_pool
     option httpchk GET {{ veza_healthcheck_paths.stream | default('/health') }}
     http-check expect status 200
     timeout tunnel 1h
-    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s
+    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s init-addr last,libc,none resolvers veza_dns
     server {{ env }}_stream_blue {{ prefix }}stream-blue.{{ veza_incus_dns_suffix }}:{{ veza_stream_port }} {{ '' if _active == 'blue' else 'backup' }}
     server {{ env }}_stream_green {{ prefix }}stream-green.{{ veza_incus_dns_suffix }}:{{ veza_stream_port }} {{ '' if _active == 'green' else 'backup' }}
@@ -166,7 +191,7 @@ backend {{ env }}_web_pool
     balance roundrobin
     option httpchk GET {{ veza_healthcheck_paths.web | default('/') }}
     http-check expect status 200
-    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s
+    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s init-addr last,libc,none resolvers veza_dns
     server {{ env }}_web_blue {{ prefix }}web-blue.{{ veza_incus_dns_suffix }}:{{ veza_web_port }} {{ '' if _active == 'blue' else 'backup' }}
     server {{ env }}_web_green {{ prefix }}web-green.{{ veza_incus_dns_suffix }}:{{ veza_web_port }} {{ '' if _active == 'green' else 'backup' }}
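For reference, `init-addr last,libc,none` is an ordered list of address-resolution fallbacks HAProxy tries at startup: the saved server-state file first, then a libc lookup, then `none`, which lets the server start with no address (in MAINT) until the runtime resolver supplies one. A minimal standalone backend showing the same pattern (names hypothetical):

```
backend example_pool
    default-server check init-addr last,libc,none resolvers veza_dns
    server app1 app1.lxd:8080
```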
@@ -174,11 +199,17 @@ backend {{ env }}_web_pool

 {% if haproxy_forgejo_host %}
 # --- Forgejo (managed outside the deploy pipeline) --------------------
+# The existing forgejo container exposes HTTPS on :3000 with a
+# self-signed cert. We re-encrypt to it (ssl verify none) ; the
+# operator's WireGuard mesh is the trust boundary, the cert chain
+# is irrelevant. Healthcheck adapted to send a Host: header so
+# Forgejo's reverse-proxy validation accepts the request.
 backend forgejo_backend
-    option httpchk GET /
-    http-check expect status 200
+    option httpchk
+    http-check send meth GET uri / ver HTTP/1.1 hdr Host {{ haproxy_forgejo_host }}
+    http-check expect rstatus ^[23]
     default-server check inter 10s fall 3 rise 2
-    server forgejo {{ haproxy_forgejo_backend }}
+    server forgejo {{ haproxy_forgejo_backend }} ssl verify none sni str({{ haproxy_forgejo_host }})
 {% endif %}

 {% if haproxy_talas_hosts %}
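`http-check expect rstatus ^[23]` accepts any status whose text matches the regex, i.e. anything in the 2xx/3xx range, so Forgejo can answer `/` with either a 200 or a redirect and still count as healthy. The regex behaviour, sketched with grep:

```shell
# Same anchored match HAProxy applies to the status-code text.
passes_check() { printf '%s' "$1" | grep -Eq '^[23]'; }

passes_check 200 && echo "200 passes"
passes_check 302 && echo "302 passes"
passes_check 503 || echo "503 fails"
```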
@@ -42,6 +42,17 @@ OPERATOR_EMAIL=${OPERATOR_EMAIL:-?}
 OPERATOR_PASSWORD=${OPERATOR_PASSWORD:-?}
 ORDER_POLL_TIMEOUT=${ORDER_POLL_TIMEOUT:-300}
 ORDER_POLL_INTERVAL=${ORDER_POLL_INTERVAL:-5}
+# v1.0.10 polish safety guards:
+#   DRY_RUN=1            — skip the POST /orders + payment steps; rehearse
+#                          the login + product-listing + license-poll path
+#                          end-to-end on staging without spending a euro.
+#   CONFIRM_PRODUCTION=1 — required when STAGING_URL points at the live
+#                          environment. Without it the script refuses to
+#                          run, so a typo (`STAGING_URL=https://veza.fr`
+#                          on a sandbox-targeted command) can't accidentally
+#                          charge a real card.
+DRY_RUN=${DRY_RUN:-0}
+CONFIRM_PRODUCTION=${CONFIRM_PRODUCTION:-0}

 SESSION_DATE="$(date +%Y%m%d-%H%M)"
 SESSION_LOG="${REPO_ROOT}/docs/PAYMENT_E2E_LIVE_REPORT.md.session-${SESSION_DATE}.log"
@ -64,6 +75,43 @@ require jq
|
||||||
[ "$OPERATOR_EMAIL" = "?" ] && fail "OPERATOR_EMAIL env var required" 3
|
[ "$OPERATOR_EMAIL" = "?" ] && fail "OPERATOR_EMAIL env var required" 3
|
||||||
[ "$OPERATOR_PASSWORD" = "?" ] && fail "OPERATOR_PASSWORD env var required" 3
|
[ "$OPERATOR_PASSWORD" = "?" ] && fail "OPERATOR_PASSWORD env var required" 3
|
||||||
|
|
||||||
|
# Heuristic: any URL that doesn't include the substring "staging" is
|
||||||
|
# treated as production. Operators on a non-veza-domain (custom env)
|
||||||
|
# can still run the script; they just have to pass CONFIRM_PRODUCTION=1.
|
||||||
|
TARGET_LOOKS_LIKE_PROD=0
|
||||||
|
if [[ ! "$STAGING_URL" =~ staging ]] && [[ ! "$STAGING_URL" =~ localhost ]] && [[ ! "$STAGING_URL" =~ 127\.0\.0\.1 ]]; then
|
||||||
|
TARGET_LOOKS_LIKE_PROD=1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$TARGET_LOOKS_LIKE_PROD" = "1" ] && [ "$CONFIRM_PRODUCTION" != "1" ]; then
|
||||||
|
cat >&2 <<EOF
|
||||||
|
|
||||||
|
================================================================
|
||||||
|
ABORTING — production target detected without explicit confirmation
|
||||||
|
================================================================
|
||||||
|
|
||||||
|
STAGING_URL=$STAGING_URL does not contain "staging", "localhost" or
|
||||||
|
"127.0.0.1", so this script will refuse to run by default to prevent
|
||||||
|
an accidental real-card charge.
|
||||||
|
|
||||||
|
If you genuinely want to run against production, re-invoke with:
|
||||||
|
|
||||||
|
CONFIRM_PRODUCTION=1 \\
|
||||||
|
STAGING_URL=$STAGING_URL \\
|
||||||
|
OPERATOR_EMAIL=$OPERATOR_EMAIL \\
|
||||||
|
OPERATOR_PASSWORD=... \\
|
||||||
|
bash scripts/payment-e2e-walkthrough.sh
|
||||||
|
|
||||||
|
Or set DRY_RUN=1 to rehearse the flow without making the actual charge.
|
||||||
|
================================================================
|
||||||
|
EOF
|
||||||
|
exit 3
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$DRY_RUN" = "1" ]; then
|
||||||
|
log "DRY_RUN=1 — order creation + payment + refund steps will be SKIPPED"
|
||||||
|
fi
|
||||||
|
|
||||||
# api wrapper that tee's request + response to the session log so the
|
# api wrapper that tee's request + response to the session log so the
|
||||||
# operator can copy-paste the full trace into the report.
|
# operator can copy-paste the full trace into the report.
|
||||||
api() {
|
api() {
|
||||||
|
|
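The guard reduces to: no staging/localhost marker in the URL means "treat as production", and production demands `CONFIRM_PRODUCTION=1`. A standalone restatement of that decision table, using a portable `case` pattern in place of the script's `[[ =~ ]]`:

```shell
# Mirrors the production-detection + confirmation gate from the script.
guard() {
  url=$1; confirm=${2:-0}; prod=1
  case "$url" in
    *staging*|*localhost*|*127.0.0.1*) prod=0 ;;
  esac
  if [ "$prod" = 1 ] && [ "$confirm" != 1 ]; then
    echo "ABORT"
  else
    echo "RUN"
  fi
}

guard https://staging.veza.fr   # prints RUN   (staging marker found)
guard https://veza.fr           # prints ABORT (no marker, no confirm)
guard https://veza.fr 1         # prints RUN   (explicit confirmation)
```

Note the heuristic is substring-based: a production host that happens to contain "staging" anywhere in its URL would slip through, which is the trade-off the comment above accepts.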
@@ -134,8 +182,39 @@ log " ✓ price : $PRODUCT_PRICE"
 # --------------------------------------------------------------------
 # Step 3 : POST /orders.
 # --------------------------------------------------------------------
+if [ "$DRY_RUN" = "1" ]; then
+  log ""
+  log "step 3 : POST /api/v1/marketplace/orders — SKIPPED (dry-run)"
+  log "================================================================"
+  log "DRY-RUN PASS : login + product list + license-mine endpoints reached"
+  log "Run without DRY_RUN to exercise the real charge + refund flow."
+  log "================================================================"
+  exit 0
+fi
+
 log ""
 log "step 3 : POST /api/v1/marketplace/orders"
+
+# v1.0.10 polish: confirm prompt before the actual charge so a typo'd
+# product_id or wrong operator account can't quietly burn 5 EUR.
+if [ "$TARGET_LOOKS_LIKE_PROD" = "1" ]; then
+  log ""
+  log "================================================================"
+  log "FINAL CONFIRMATION — about to charge a real card on production"
+  log "================================================================"
+  log "  product_id : $PRODUCT_ID"
+  log "  price      : $PRODUCT_PRICE"
+  log "  operator   : $OPERATOR_EMAIL"
+  log "  endpoint   : ${STAGING_URL}/api/v1/marketplace/orders"
+  log ""
+  prompt "Type the literal word 'CHARGE' to proceed (anything else aborts) :"
+  read -r confirm_word
+  if [ "$confirm_word" != "CHARGE" ]; then
+    fail "operator did not confirm the charge ($confirm_word) — aborting" 2
+  fi
+  log "  operator confirmed CHARGE — proceeding"
+fi
+
 order_body="{\"items\":[{\"product_id\":\"${PRODUCT_ID}\"}]}"
 order_resp=$(api POST /api/v1/marketplace/orders "$order_body" 2>/dev/null)
 ORDER_ID=$(echo "$order_resp" | jq -r '.data.order.id // .data.id // .id // ""')
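The confirmation gate is a plain string comparison on one line of stdin; only the literal, uppercase word CHARGE proceeds. Reduced to its essentials:

```shell
# Reads one line; succeeds only on the exact word CHARGE.
confirm_charge() {
  read -r word
  [ "$word" = "CHARGE" ]
}

echo CHARGE | confirm_charge && echo "proceeding"
echo charge | confirm_charge || echo "aborted (case matters)"
```

Requiring a typed word rather than a y/n answer is deliberate: an operator holding Enter or pasting the wrong buffer cannot accidentally satisfy it.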
scripts/pentest/seed-test-accounts.sh (new executable file, 191 lines)

@@ -0,0 +1,191 @@
#!/usr/bin/env bash
# seed-test-accounts.sh — provision the 3 pentester accounts on a target
# environment (staging only ; refuses to run against prod).
#
# Per docs/PENTEST_SCOPE_2026.md §"Authentication context", an external
# pentest engagement needs three pre-seeded accounts (listener, creator,
# admin). This script :
#
#   1. Generates a 32-char random password for each role.
#   2. Calls the staging admin API to create / reset each account.
#   3. Promotes the creator account to the creator role and the admin
#      account to admin (via direct DB UPDATE, because the public API
#      doesn't expose role changes — the operator runs that step from a
#      maintenance shell).
#   4. Writes a 1Password import JSON to stdout so the operator can
#      `op item template` it into the shared vault. NEVER prints
#      passwords to the screen.
#
# Usage :
#   bash scripts/pentest/seed-test-accounts.sh staging
#
# Output :
#   1Password JSON on stdout (3 entries). Pipe into a file, then
#   `op item create --vault Pentest-2026 - < file.json`.
#
# Exit codes :
#   0 — three accounts provisioned, JSON emitted
#   1 — API call failed (account creation or login probe)
#   2 — wrong target environment (e.g. operator passed "prod")
#   3 — required env var or tool missing
set -euo pipefail

ENV_NAME=${1:-}
if [ -z "$ENV_NAME" ]; then
  cat >&2 <<EOF
usage : bash scripts/pentest/seed-test-accounts.sh <env>
  env : staging (the only accepted value — prod is refused)

Required env vars :
  STAGING_URL              base URL (e.g. https://staging.veza.fr)
  STAGING_ADMIN_EMAIL      admin who creates the accounts
  STAGING_ADMIN_PASSWORD   admin password (provisioning cred only)

Output :
  1Password import JSON for vault Pentest-2026, on stdout.
  Passwords are NEVER printed to the operator's screen.
EOF
  exit 3
fi

if [ "$ENV_NAME" != "staging" ]; then
  echo "ERROR: this script refuses to run against any env other than 'staging'." >&2
  echo "       Pentest accounts on production violate the engagement scope." >&2
  exit 2
fi

STAGING_URL=${STAGING_URL:-?}
STAGING_ADMIN_EMAIL=${STAGING_ADMIN_EMAIL:-?}
STAGING_ADMIN_PASSWORD=${STAGING_ADMIN_PASSWORD:-?}

[ "$STAGING_URL" = "?" ] && { echo "STAGING_URL required" >&2; exit 3; }
[ "$STAGING_ADMIN_EMAIL" = "?" ] && { echo "STAGING_ADMIN_EMAIL required" >&2; exit 3; }
[ "$STAGING_ADMIN_PASSWORD" = "?" ] && { echo "STAGING_ADMIN_PASSWORD required" >&2; exit 3; }

command -v curl >/dev/null 2>&1 || { echo "curl required" >&2; exit 3; }
command -v jq >/dev/null 2>&1 || { echo "jq required" >&2; exit 3; }
command -v openssl >/dev/null 2>&1 || { echo "openssl required (password generation)" >&2; exit 3; }

genpass() {
  # 32-char password from base64-encoding 24 bytes of entropy. URL-safe
  # so it can land in a JSON string without escaping.
  openssl rand -base64 24 | tr -d '\n=/+' | cut -c-32
}

# 1. Log in as the staging admin so we can call the create-user endpoint.
admin_login_resp=$(curl -ksS --max-time 15 \
  -X POST -H 'Content-Type: application/json' \
  -d "{\"email\":\"${STAGING_ADMIN_EMAIL}\",\"password\":\"${STAGING_ADMIN_PASSWORD}\",\"remember_me\":false}" \
  "${STAGING_URL}/api/v1/auth/login")
admin_token=$(echo "$admin_login_resp" | jq -r '.data.token.access_token // .token.access_token // ""')
if [ -z "$admin_token" ] || [ "$admin_token" = "null" ]; then
  echo "ERROR: admin login failed" >&2
  echo "$admin_login_resp" >&2
  exit 1
fi

provision() {
  # provision <role> <email-prefix>
  # Returns : password (stdout), nothing else.
  local role=$1 email_prefix=$2
  local email="${email_prefix}@veza.fr"
  local password
  password=$(genpass)

  # Try creating ; if 409 (already exists), reset the password instead.
  # Both paths end with a valid (email, password) tuple.
  local create_resp create_status
  create_resp=$(curl -ksS --max-time 15 \
    -H "Authorization: Bearer ${admin_token}" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d "{\"email\":\"${email}\",\"password\":\"${password}\",\"username\":\"${email_prefix}\",\"role\":\"${role}\"}" \
    -w '\nHTTP_CODE=%{http_code}' \
    "${STAGING_URL}/api/v1/admin/users")
  create_status=$(echo "$create_resp" | grep -oE 'HTTP_CODE=[0-9]+' | tail -1 | cut -d= -f2)

  case "$create_status" in
    200|201)
      ;;
    409)
      # Account exists — reset the password instead.
      curl -ksS --max-time 15 \
        -H "Authorization: Bearer ${admin_token}" \
        -H 'Content-Type: application/json' \
        -X POST \
        -d "{\"email\":\"${email}\",\"new_password\":\"${password}\"}" \
        "${STAGING_URL}/api/v1/admin/users/reset-password" >/dev/null
      ;;
    *)
      echo "ERROR: provisioning ${role} failed with HTTP ${create_status}" >&2
      echo "$create_resp" >&2
      exit 1
      ;;
  esac

  # Probe : log in as the freshly-set account so we know the engagement
  # can use it.
  probe=$(curl -ksS --max-time 15 \
    -X POST -H 'Content-Type: application/json' \
    -d "{\"email\":\"${email}\",\"password\":\"${password}\",\"remember_me\":false}" \
    "${STAGING_URL}/api/v1/auth/login")
  probe_token=$(echo "$probe" | jq -r '.data.token.access_token // .token.access_token // ""')
  if [ -z "$probe_token" ] || [ "$probe_token" = "null" ]; then
    echo "ERROR: ${role} login probe failed — provisioning broken" >&2
    exit 1
  fi

  printf '%s' "$password"
}

# 2. Provision the three roles. Passwords stay in shell variables — no
#    echo, no log, no temp file.
listener_pwd=$(provision "user" "pentest-2026-listener")
creator_pwd=$(provision "creator" "pentest-2026-creator")
admin_pwd=$(provision "admin" "pentest-2026-admin")

# 3. Emit the 1Password JSON template. Each entry carries the role + login
#    URL in Notes so the pentester knows which account does what.
cat <<EOF
[
  {
    "title": "pentest-2026-listener",
    "category": "LOGIN",
    "vault": {"name": "Pentest-2026"},
    "fields": [
      {"id": "username", "type": "STRING", "value": "pentest-2026-listener@veza.fr"},
      {"id": "password", "type": "CONCEALED", "value": "${listener_pwd}"},
      {"id": "url", "type": "URL", "value": "${STAGING_URL}/login"},
      {"id": "notesPlain", "type": "STRING", "value": "Pentest 2026 — listener role. Engagement: see PENTEST_SCOPE_2026.md. Rotate at engagement end."}
    ]
  },
  {
    "title": "pentest-2026-creator",
    "category": "LOGIN",
    "vault": {"name": "Pentest-2026"},
    "fields": [
      {"id": "username", "type": "STRING", "value": "pentest-2026-creator@veza.fr"},
      {"id": "password", "type": "CONCEALED", "value": "${creator_pwd}"},
      {"id": "url", "type": "URL", "value": "${STAGING_URL}/login"},
      {"id": "notesPlain", "type": "STRING", "value": "Pentest 2026 — creator role. Owns 5 seed tracks. Rotate at engagement end."}
    ]
  },
  {
    "title": "pentest-2026-admin",
    "category": "LOGIN",
    "vault": {"name": "Pentest-2026"},
    "fields": [
      {"id": "username", "type": "STRING", "value": "pentest-2026-admin@veza.fr"},
      {"id": "password", "type": "CONCEALED", "value": "${admin_pwd}"},
      {"id": "url", "type": "URL", "value": "${STAGING_URL}/login"},
      {"id": "notesPlain", "type": "STRING", "value": "Pentest 2026 — admin role + MFA bypass. DO NOT use for non-pentest activity. Rotate at engagement end."}
    ]
  }
]
EOF

echo "" >&2
echo " 3 accounts provisioned + login-probed against ${STAGING_URL}" >&2
echo " next: pipe stdout to a file and run" >&2
echo "       op item create --vault Pentest-2026 - < <file>" >&2
|
||||||
|
echo " THEN rotate each entry with op item edit --generate-password=letters,digits,32" >&2
|
||||||
|
echo " at engagement end (this script does NOT auto-rotate)." >&2
|
||||||
|
|
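The import-then-rotate flow the closing echoes describe can be sketched end-to-end. The provisioning-script name and the temp-file path below are illustrative (not taken from the repo); the `op` invocations are the ones the script itself echoes:

```shell
# 1. Capture the emitted JSON (stdout only — the status lines go to stderr).
#      bash scripts/security/provision-pentest-accounts.sh > /tmp/pentest-items.json
# 2. Import into the shared vault, as echoed by the script :
#      op item create --vault Pentest-2026 - < /tmp/pentest-items.json
# 3. Remove the plaintext file immediately — it holds live passwords :
#      shred -u /tmp/pentest-items.json
# 4. At engagement end, rotate each entry, as echoed by the script :
#      op item edit pentest-2026-admin --generate-password=letters,digits,32
```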
@@ -16,18 +16,26 @@
 # E : test_rabbitmq_outage.sh — stop RabbitMQ 60s, backend stays up
 #
 # Usage :
-#   bash scripts/security/game-day-driver.sh          # run all scenarios
-#   SKIP=DE bash scripts/security/game-day-driver.sh  # skip scenarios D + E
-#   ONLY=A bash scripts/security/game-day-driver.sh   # only run scenario A
+#   bash scripts/security/game-day-driver.sh          # all scenarios on staging (default)
+#   SKIP=DE bash scripts/security/game-day-driver.sh  # skip D + E
+#   ONLY=A bash scripts/security/game-day-driver.sh   # only A
+#   INVENTORY=prod CONFIRM_PROD=1 bash scripts/security/game-day-driver.sh  # prod (gated)
 #
 # Required env (passed through to the underlying smoke tests) :
 #   REDIS_PASS / SENTINEL_PASS            for scenario C
 #   MINIO_ROOT_USER / MINIO_ROOT_PASSWORD for scenario D
 #
+# v1.0.10 polish — production gating :
+#   INVENTORY=prod must be paired with CONFIRM_PROD=1 or the script
+#   refuses to run, so a stale shell-history line can't accidentally
+#   kill prod Postgres on a Monday morning. The driver also runs a
+#   backup-freshness pre-flight when targeting prod (most recent
+#   pgBackRest backup must be < 24 h old).
+#
 # Exit codes :
 #   0 — every selected scenario passed
 #   1 — at least one scenario failed
-#   2 — runner pre-flight failed (script missing, etc.)
+#   2 — runner pre-flight failed (script missing, prod safety guard tripped, stale backup, etc.)
 set -euo pipefail

 REPO_ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
@@ -41,6 +49,9 @@ mkdir -p "$LOGS_DIR"

 ONLY=${ONLY:-}
 SKIP=${SKIP:-}
+INVENTORY=${INVENTORY:-staging}
+CONFIRM_PROD=${CONFIRM_PROD:-0}
+SKIP_BACKUP_FRESHNESS=${SKIP_BACKUP_FRESHNESS:-0}

 log() { printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" | tee -a "$SESSION_LOG" >&2; }
 fail() { log "FAIL: $*"; exit "${2:-2}"; }
@@ -68,6 +79,101 @@ want() {
   return 0
 }
+
+# v1.0.10 polish — prod safety gate. INVENTORY=prod requires
+# CONFIRM_PROD=1 + an interactive type-the-word confirm. Anything else
+# defaults to staging so a forgotten env-var doesn't matter.
+case "$INVENTORY" in
+  staging|stg|dev|local) ;;
+  prod|production)
+    if [ "$CONFIRM_PROD" != "1" ]; then
+      cat >&2 <<EOF
+
+================================================================
+ ABORTING — INVENTORY=prod without CONFIRM_PROD=1
+================================================================
+
+This script will kill production services. Each scenario triggers a
+real outage in the chosen inventory : Postgres primary kill, HAProxy
+backend stop, Redis master kill, MinIO node loss, RabbitMQ stop.
+
+To run on production, you must :
+
+  1. Announce a maintenance window 24 h ahead (status page +
+     #engineering channel).
+  2. Set PagerDuty to maintenance mode for the affected services.
+  3. Confirm pgBackRest's last backup is < 24 h old (this script
+     auto-checks if you don't pass SKIP_BACKUP_FRESHNESS=1).
+  4. Re-invoke with :
+
+       INVENTORY=prod CONFIRM_PROD=1 \\
+         bash scripts/security/game-day-driver.sh
+
+The driver will then ask for one more interactive confirmation
+(type the word KILL-PROD) before the first scenario fires.
+================================================================
+EOF
+      exit 2
+    fi
+
+    # Backup-freshness pre-flight : refuse to run if the most recent
+    # pgBackRest full/diff is > 24 h old. Recovery from a stale backup
+    # can extend an outage from minutes to hours, so the cost of
+    # postponing the game day is much less than the cost of compounded
+    # data loss if scenario A fails to recover and we have to restore
+    # from yesterday-but-one.
+    if [ "$SKIP_BACKUP_FRESHNESS" != "1" ]; then
+      if command -v pgbackrest >/dev/null 2>&1; then
+        last_backup_ts=$(pgbackrest --stanza=veza info --output=json 2>/dev/null \
+          | python3 -c "
+import json, sys
+try:
+    data = json.load(sys.stdin)
+    backups = data[0]['backup'] if data else []
+    if not backups: print(0); sys.exit(0)
+    print(max(b['timestamp']['stop'] for b in backups))
+except Exception:
+    print(0)
+" 2>/dev/null || echo 0)
+        now_ts=$(date +%s)
+        age_seconds=$(( now_ts - last_backup_ts ))
+        if [ "$last_backup_ts" -eq 0 ]; then
+          fail "pgBackRest backup-freshness check failed : could not parse 'pgbackrest info'. Set SKIP_BACKUP_FRESHNESS=1 to override (only after manually verifying a recent backup exists)." 2
+        fi
+        if [ "$age_seconds" -gt 86400 ]; then
+          age_hours=$(( age_seconds / 3600 ))
+          fail "pgBackRest most recent backup is ${age_hours}h old (threshold 24h). Run a backup before the game day, or set SKIP_BACKUP_FRESHNESS=1 if you've validated freshness another way." 2
+        fi
+        log "pre-flight : pgBackRest most recent backup is $(( age_seconds / 3600 ))h $(( (age_seconds % 3600) / 60 ))m old (< 24h threshold) — OK"
+      else
+        log "WARN : pgbackrest CLI not on \$PATH ; skipping backup-freshness check. Set SKIP_BACKUP_FRESHNESS=1 to silence this warning if intentional."
+      fi
+    fi
+
+    # Final type-the-word confirm. Everything above can be set in env
+    # by mistake ; this last step requires a human at the keyboard.
+    cat >&2 <<EOF
+
+================================================================
+ PROD GAME DAY — final confirmation
+================================================================
+
+  inventory : prod
+  scenarios : ${SCENARIOS[*]}${ONLY:+ (filtered by ONLY=$ONLY)}${SKIP:+ (filtered by SKIP=$SKIP)}
+  session   : $SESSION_LOG
+
+Each scenario triggers a real outage. Type the literal phrase
+KILL-PROD (any other input aborts) to proceed :
+EOF
+    read -r confirm_phrase
+    if [ "$confirm_phrase" != "KILL-PROD" ]; then
+      fail "operator did not confirm KILL-PROD ($confirm_phrase) — aborting" 2
+    fi
+    ;;
+  *)
+    fail "INVENTORY=$INVENTORY not recognised — must be one of staging|prod" 2
+    ;;
+esac
+
 # Pre-flight : every selected scenario script must exist + be executable.
 for s in "${SCENARIOS[@]}"; do
   if want "$s"; then
@@ -83,6 +189,7 @@ declare -A SCENARIO_DURATION

 log "================================================================"
 log "Game day session : $SESSION_DATE"
+log "Inventory : $INVENTORY"
 log "Session log : $SESSION_LOG"
 log "Scenarios run : ${SCENARIOS[*]}"
 [ -n "$ONLY" ] && log "ONLY filter : $ONLY"
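The freshness pre-flight's inline Python assumes `pgbackrest info --output=json` returns a list of stanzas, each carrying a `backup` list whose entries hold `timestamp.stop` epoch seconds. A self-contained sketch of that parse, fed a hand-written sample (the timestamps are invented):

```shell
# Feed the same parser a sample document and confirm it picks the most
# recent backup's stop time.
sample='[{"backup":[{"timestamp":{"stop":1700000000}},{"timestamp":{"stop":1700003600}}]}]'
last_backup_ts=$(printf '%s' "$sample" | python3 -c "
import json, sys
try:
    data = json.load(sys.stdin)
    backups = data[0]['backup'] if data else []
    if not backups: print(0); sys.exit(0)
    print(max(b['timestamp']['stop'] for b in backups))
except Exception:
    print(0)
")
echo "$last_backup_ts"   # → 1700003600
```

Any parse failure collapses to `0`, which the driver treats as "could not verify" and refuses to proceed — failing closed rather than open.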
scripts/soft-launch/monitor-checks.sh — 255 lines (new executable file)

@@ -0,0 +1,255 @@
#!/usr/bin/env bash
# monitor-checks.sh — poll the soft-launch acceptance gate live during
# the beta window so the operator gets a heads-up before the decision
# call instead of discovering at 18:00 UTC that one threshold is red.
#
# Acceptance gate (per docs/SOFT_LAUNCH_BETA_2026.md §"Acceptance gate") :
#   - ≥ 50 testers signed up (used_at != NULL on beta_invites)
#   - 0 P1 events in Sentry today
#   - Status page green for the last 4 h
#   - Synthetic parcours all green for 6 h
#   - Nightly k6 load test green
#   - < 3 HIGH-severity issues reported
#
# v1.0.10 Cluster 3.4.
#
# Usage :
#   DATABASE_URL=postgres://... \
#   SENTRY_AUTH_TOKEN=... \
#   STATUSPAGE_URL=https://status.veza.fr \
#   PROM_URL=https://prom.veza.fr \
#   bash scripts/soft-launch/monitor-checks.sh
#
# By default the script runs once and exits with the gate's verdict.
# Run it from cron (e.g. every 30 min) or pass LOOP=1 to keep checking
# in-place every CHECK_INTERVAL seconds (default 600 = 10 min).
#
# Optional env :
#   LOOP=1             continuous mode
#   CHECK_INTERVAL     seconds between checks in LOOP mode (default 600)
#   QUIET=1            only emit the verdict line (for cron piping)
#   THRESHOLD_TESTERS  override 50 (default), e.g. set to 100 for
#                      a stricter sub-window
#
# Exit codes :
#   0 — every gate green
#   1 — at least one gate red
#   2 — at least one gate could not be checked (collector down,
#       token wrong, etc.) — operator must verify manually
#   3 — required env / tool missing
set -euo pipefail

DATABASE_URL=${DATABASE_URL:-?}
SENTRY_AUTH_TOKEN=${SENTRY_AUTH_TOKEN:-?}
STATUSPAGE_URL=${STATUSPAGE_URL:-https://status.veza.fr}
PROM_URL=${PROM_URL:-?}
LOOP=${LOOP:-0}
CHECK_INTERVAL=${CHECK_INTERVAL:-600}
QUIET=${QUIET:-0}
THRESHOLD_TESTERS=${THRESHOLD_TESTERS:-50}

[ "$DATABASE_URL" = "?" ] && { echo "DATABASE_URL required" >&2; exit 3; }
[ "$SENTRY_AUTH_TOKEN" = "?" ] && { echo "SENTRY_AUTH_TOKEN required (read scope sufficient)" >&2; exit 3; }
[ "$PROM_URL" = "?" ] && { echo "PROM_URL required" >&2; exit 3; }

command -v psql >/dev/null 2>&1 || { echo "psql required" >&2; exit 3; }
command -v curl >/dev/null 2>&1 || { echo "curl required" >&2; exit 3; }
command -v jq >/dev/null 2>&1 || { echo "jq required" >&2; exit 3; }

# ----------------------------------------------------------------------
# Individual gate checks. Each prints "✅ <name>" / "🔴 <name>" / "⚪ <name>"
# (last for "could not check"), and sets one of GATE_*_OK to 0 / 1 / 2.
# ----------------------------------------------------------------------

GATE_TESTERS_OK=2
GATE_SENTRY_OK=2
GATE_STATUSPAGE_OK=2
GATE_SYNTHETIC_OK=2
GATE_K6_OK=2
GATE_ISSUES_OK=2

check_testers() {
  local count
  count=$(psql "$DATABASE_URL" -A -t -c "
    SELECT count(*) FROM beta_invites WHERE used_at IS NOT NULL;
  " 2>/dev/null | tr -d ' ' || echo "?")
  if [ "$count" = "?" ] || ! [[ "$count" =~ ^[0-9]+$ ]]; then
    echo "⚪ testers signed-up : check failed (psql)"
    GATE_TESTERS_OK=2
    return
  fi
  if [ "$count" -ge "$THRESHOLD_TESTERS" ]; then
    echo "✅ testers signed-up : $count / $THRESHOLD_TESTERS"
    GATE_TESTERS_OK=0
  else
    echo "🔴 testers signed-up : $count / $THRESHOLD_TESTERS"
    GATE_TESTERS_OK=1
  fi
}

check_sentry_p1() {
  # Sentry API : count of unresolved P1 issues last 24h.
  local count
  count=$(curl -s -H "Authorization: Bearer $SENTRY_AUTH_TOKEN" \
    "https://sentry.io/api/0/projects/veza/veza-backend/issues/?statsPeriod=24h&query=is:unresolved%20level:fatal" \
    2>/dev/null | jq 'length' 2>/dev/null || echo "?")
  if [ "$count" = "?" ] || ! [[ "$count" =~ ^[0-9]+$ ]]; then
    echo "⚪ Sentry P1 events 24h : check failed (auth or network)"
    GATE_SENTRY_OK=2
    return
  fi
  if [ "$count" -eq 0 ]; then
    echo "✅ Sentry P1 events 24h : 0"
    GATE_SENTRY_OK=0
  else
    echo "🔴 Sentry P1 events 24h : $count (must be 0)"
    GATE_SENTRY_OK=1
  fi
}

check_statuspage() {
  local status
  status=$(curl -s "$STATUSPAGE_URL/api/v1/status" 2>/dev/null \
    | jq -r '.indicator // .status.indicator // ""' 2>/dev/null || echo "")
  case "$status" in
    none|operational)
      echo "✅ status page : $status (green)"
      GATE_STATUSPAGE_OK=0
      ;;
    minor|major|critical)
      echo "🔴 status page : $status"
      GATE_STATUSPAGE_OK=1
      ;;
    *)
      echo "⚪ status page : check failed (got '$status')"
      GATE_STATUSPAGE_OK=2
      ;;
  esac
}

check_synthetic() {
  # PromQL : any parcours with at least one failed probe over the last
  # 6 h — the gate requires every parcours green for the full window,
  # so an instant query is ranged with min_over_time.
  local query='min_over_time(probe_success{probe_kind="synthetic"}[6h]) == 0'
  local resp
  resp=$(curl -s --get "$PROM_URL/api/v1/query" \
    --data-urlencode "query=$query" 2>/dev/null)
  local result_count
  result_count=$(echo "$resp" | jq '.data.result | length' 2>/dev/null || echo "?")
  if [ "$result_count" = "?" ] || ! [[ "$result_count" =~ ^[0-9]+$ ]]; then
    echo "⚪ synthetic parcours : check failed (Prometheus)"
    GATE_SYNTHETIC_OK=2
    return
  fi
  if [ "$result_count" -eq 0 ]; then
    echo "✅ synthetic parcours : all green"
    GATE_SYNTHETIC_OK=0
  else
    local failing
    failing=$(echo "$resp" | jq -r '.data.result[].metric.parcours' 2>/dev/null | tr '\n' ',' | sed 's/,$//')
    echo "🔴 synthetic parcours : $result_count failing ($failing)"
    GATE_SYNTHETIC_OK=1
  fi
}

check_k6_nightly() {
  # k6 nightly is exposed as veza_k6_nightly_last_success_timestamp_seconds
  # by the Forgejo runner workflow's textfile-collector. Reading via Prom
  # gives "is the last success < 30h old?".
  local query='time() - veza_k6_nightly_last_success_timestamp_seconds'
  local resp age age_int
  resp=$(curl -s --get "$PROM_URL/api/v1/query" \
    --data-urlencode "query=$query" 2>/dev/null)
  age=$(echo "$resp" | jq -r '.data.result[0].value[1] // ""' 2>/dev/null)
  if [ -z "$age" ] || [ "$age" = "null" ]; then
    echo "⚪ k6 nightly : check failed (metric absent — runner offline?)"
    GATE_K6_OK=2
    return
  fi
  age_int=$(printf '%.0f' "$age" 2>/dev/null || echo 999999)
  if [ "$age_int" -lt 108000 ]; then  # 30h
    echo "✅ k6 nightly : last success $(( age_int / 3600 ))h ago"
    GATE_K6_OK=0
  else
    echo "🔴 k6 nightly : last success $(( age_int / 3600 ))h ago (> 30h)"
    GATE_K6_OK=1
  fi
}

check_high_issues() {
  # The operator-reported issues count lives in the SOFT_LAUNCH_BETA_2026.md
  # report under "Issues reported". Without an external tracker we read it
  # from a known location in the report file. Skip if file absent.
  local report
  report="$(cd "$(dirname "$0")/../.." && pwd)/docs/SOFT_LAUNCH_BETA_2026.md"
  if [ ! -f "$report" ]; then
    echo "⚪ HIGH issues count : report file not found"
    GATE_ISSUES_OK=2
    return
  fi
  local count
  # grep -c always prints the count itself (0 on no match) ; '|| true'
  # only swallows the non-zero exit grep uses to signal "no matches".
  count=$(grep -cE '^\| HIGH ' "$report" 2>/dev/null || true)
  count=${count:-0}
  if [ "$count" -lt 3 ]; then
    echo "✅ HIGH-severity issues reported : $count / < 3"
    GATE_ISSUES_OK=0
  else
    echo "🔴 HIGH-severity issues reported : $count / < 3"
    GATE_ISSUES_OK=1
  fi
}

# ----------------------------------------------------------------------
# Main loop
# ----------------------------------------------------------------------

run_once() {
  if [ "$QUIET" != "1" ]; then
    echo "================================================================"
    echo "Acceptance gate check — $(date -u +'%Y-%m-%d %H:%M:%S UTC')"
    echo "----------------------------------------------------------------"
  fi

  check_testers
  check_sentry_p1
  check_statuspage
  check_synthetic
  check_k6_nightly
  check_high_issues

  if [ "$QUIET" != "1" ]; then
    echo "----------------------------------------------------------------"
  fi

  local red=0 unknown=0
  for v in "$GATE_TESTERS_OK" "$GATE_SENTRY_OK" "$GATE_STATUSPAGE_OK" \
           "$GATE_SYNTHETIC_OK" "$GATE_K6_OK" "$GATE_ISSUES_OK"; do
    case $v in
      1) red=$(( red + 1 )) ;;
      2) unknown=$(( unknown + 1 )) ;;
    esac
  done

  if [ "$red" -eq 0 ] && [ "$unknown" -eq 0 ]; then
    echo "VERDICT : ALL GATES GREEN — soft-launch is GO"
    return 0
  elif [ "$red" -gt 0 ]; then
    echo "VERDICT : $red gate(s) RED — NO-GO until resolved"
    return 1
  else
    echo "VERDICT : $unknown gate(s) UNCHECKABLE — operator must verify manually before decision call"
    return 2
  fi
}

if [ "$LOOP" != "1" ]; then
  run_once
  exit $?
fi

# Continuous mode.
while true; do
  run_once || true
  echo ""
  echo "next check in ${CHECK_INTERVAL}s — Ctrl-C to exit"
  sleep "$CHECK_INTERVAL"
done
scripts/soft-launch/send-invitations.sh
Executable file
179
scripts/soft-launch/send-invitations.sh
Executable file
|
|
@ -0,0 +1,179 @@
|
#!/usr/bin/env bash
# send-invitations.sh — batch-insert beta invitations from a validated
# cohort CSV, generate unique invite codes, render personalised email
# bodies, optionally dispatch via SMTP.
#
# Wraps the validate-cohort.sh sanity check + a transactional INSERT
# into beta_invites + a per-recipient email render. Splits "generate
# the codes + render the emails" from "actually send" so a dry-run
# produces a flat directory of `.eml` files the operator can review
# before dispatch.
#
# v1.0.10 Cluster 3.4.
#
# Usage :
#   # Step 1 : dry-run (default). Inserts beta_invites rows, emits
#   # eml files but does NOT send anything.
#   DATABASE_URL=postgres://... \
#   bash scripts/soft-launch/send-invitations.sh path/to/cohort.csv
#
#   # Step 2 : after reviewing the eml files, dispatch with msmtp /
#   # sendmail / aws-ses-cli (or whatever SEND_CMD points at).
#   SEND=1 SEND_CMD='msmtp -t' \
#   bash scripts/soft-launch/send-invitations.sh path/to/cohort.csv
#
# Required env :
#   DATABASE_URL   Postgres URL (read+write to beta_invites)
#   FRONTEND_URL   base URL the invite link points at
#                  (e.g. https://staging.veza.fr)
#
# Optional env :
#   SEND=1         actually dispatch ; otherwise dry-run (eml only)
#   SEND_CMD       sendmail-compatible command (default: 'msmtp -t')
#   SENT_BY_EMAIL  operator email for the beta_invites.sent_by FK ;
#                  defaults to the value in the CSV's third column
#   FROM_ADDR      From: header (default: invitations@veza.fr)
#   SUBJECT        email subject (default: 'Vous êtes admis dans la bêta Veza')
#   TEMPLATE       path to eml template (default:
#                  templates/email/beta_invite.eml.template)
#   FORCE=1        skip validate-cohort.sh failures (use with care)
#
# Exit codes :
#   0 — everything succeeded
#   1 — cohort validation failed (see validate-cohort.sh)
#   2 — DB transaction failed
#   3 — required env missing
#   4 — dispatch failed for at least one recipient (see logs)
set -euo pipefail

# Script lives in scripts/soft-launch/, so the repo root is two levels up.
REPO_ROOT="$(cd "$(dirname "$0")/../.." && pwd)"

CSV=${1:-}
if [ -z "$CSV" ] || [ ! -f "$CSV" ]; then
  echo "usage: bash scripts/soft-launch/send-invitations.sh path/to/cohort.csv" >&2
  exit 3
fi

DATABASE_URL=${DATABASE_URL:-?}
FRONTEND_URL=${FRONTEND_URL:-?}
[ "$DATABASE_URL" = "?" ] && { echo "DATABASE_URL required" >&2; exit 3; }
[ "$FRONTEND_URL" = "?" ] && { echo "FRONTEND_URL required" >&2; exit 3; }

SEND=${SEND:-0}
SEND_CMD=${SEND_CMD:-msmtp -t}
FROM_ADDR=${FROM_ADDR:-invitations@veza.fr}
SUBJECT=${SUBJECT:-Vous êtes admis dans la bêta Veza}
TEMPLATE=${TEMPLATE:-$REPO_ROOT/templates/email/beta_invite.eml.template}
FORCE=${FORCE:-0}
SESSION_DATE="$(date +%Y%m%d-%H%M)"
OUTDIR="$REPO_ROOT/scripts/soft-launch/out-${SESSION_DATE}"

command -v psql >/dev/null 2>&1 || { echo "psql required" >&2; exit 3; }
command -v openssl >/dev/null 2>&1 || { echo "openssl required" >&2; exit 3; }

# Step 1 — validate the cohort. Bypass with FORCE=1 if needed.
echo "→ validating cohort $CSV"
if ! bash "$(dirname "$0")/validate-cohort.sh" "$CSV"; then
  if [ "$FORCE" != "1" ]; then
    echo "ERROR: cohort validation failed. Re-run with FORCE=1 to bypass (not recommended)." >&2
    exit 1
  fi
  echo "WARN : cohort validation reported issues but FORCE=1 set — proceeding."
fi

mkdir -p "$OUTDIR"
echo "→ output dir $OUTDIR"

# Step 2 — generate codes + insert rows + render emails. Each insert
# is one transaction so a partial failure leaves consistent state.
gen_code() {
  # 16-char, base32-ish, paste-friendly code. tr truncates its second
  # set to the first set's length, so hex a-f map to themselves and the
  # digits 0-9 map to g-p ; the second tr then strips the look-alike
  # glyphs that survive (i/l/o — 0, 1, I and L cannot occur after the
  # remap).
  openssl rand -hex 16 | tr 'a-f0-9' 'a-z2-9' \
    | tr -d 'oilOIL01' | head -c 16
}

if [ ! -f "$TEMPLATE" ]; then
  echo "ERROR: template $TEMPLATE not found." >&2
  exit 3
fi

inserted=0
failed=0
failed_emails=()

while IFS=, read -r email cohort sent_by_email; do
  email=$(echo "$email" | tr -d '\r' | xargs)
  cohort=$(echo "$cohort" | tr -d '\r' | xargs)
  sent_by_email=$(echo "$sent_by_email" | tr -d '\r' | xargs)

  code=$(gen_code)

  # Resolve sent_by user_id (may be NULL if operator email isn't a
  # registered user — e.g. ops shared mailbox).
  sent_by_id=$(psql "$DATABASE_URL" -A -t -c "
    SELECT id::text FROM users WHERE email = '$sent_by_email' LIMIT 1;
  " 2>/dev/null | tr -d ' ' || echo "")

  if [ -z "$sent_by_id" ]; then
    sent_by_clause="NULL"
  else
    sent_by_clause="'$sent_by_id'"
  fi

  # NB : values are interpolated straight into SQL ; acceptable only
  # because validate-cohort.sh has already constrained the email shape.
  if ! psql "$DATABASE_URL" -1 -c "
    INSERT INTO beta_invites (code, email, cohort, sent_by, expires_at)
    VALUES ('$code', '$email', '$cohort', $sent_by_clause, NOW() + INTERVAL '30 days');
  " >/dev/null 2>&1; then
    failed=$(( failed + 1 ))
    failed_emails+=("$email")
    continue
  fi
  inserted=$(( inserted + 1 ))

  # Render the eml — operator-readable, ready for SEND_CMD.
  eml="$OUTDIR/${email//[^a-zA-Z0-9._-]/_}.eml"
  invite_url="$FRONTEND_URL/signup?invite=$code"
  sed \
    -e "s|{{TO_ADDR}}|$email|g" \
    -e "s|{{FROM_ADDR}}|$FROM_ADDR|g" \
    -e "s|{{SUBJECT}}|$SUBJECT|g" \
    -e "s|{{INVITE_URL}}|$invite_url|g" \
    -e "s|{{INVITE_CODE}}|$code|g" \
    -e "s|{{COHORT}}|$cohort|g" \
    -e "s|{{FRONTEND_URL}}|$FRONTEND_URL|g" \
    "$TEMPLATE" > "$eml"
done < <(tail -n +2 "$CSV")

echo "→ inserted $inserted invitations into beta_invites"
echo "→ rendered $inserted emails to $OUTDIR"
[ "$failed" -gt 0 ] && {
  echo "WARN : $failed insert(s) failed — see logs above"
  for e in "${failed_emails[@]}"; do echo "  - $e"; done
}

# Step 3 — optionally dispatch.
if [ "$SEND" != "1" ]; then
  echo ""
  echo "DRY-RUN — review the eml files in $OUTDIR before sending."
  echo "When ready :"
  echo "  SEND=1 SEND_CMD='$SEND_CMD' bash scripts/soft-launch/send-invitations.sh $CSV"
  exit 0
fi

echo "→ dispatching via : $SEND_CMD"
dispatch_failed=0
for eml in "$OUTDIR"/*.eml; do
  if ! $SEND_CMD < "$eml" >>"$OUTDIR/dispatch.log" 2>&1; then
    dispatch_failed=$(( dispatch_failed + 1 ))
    echo "  FAIL : $eml" | tee -a "$OUTDIR/dispatch.log"
  fi
done

echo ""
if [ "$dispatch_failed" -gt 0 ]; then
  echo "WARN : $dispatch_failed dispatch(es) failed — see $OUTDIR/dispatch.log"
  exit 4
fi
echo "PASS : all $inserted invitations dispatched."
echo "Track redemption with :"
echo "  psql \"\$DATABASE_URL\" -c 'SELECT cohort, count(*) FILTER (WHERE used_at IS NOT NULL) AS redeemed, count(*) AS total FROM beta_invites GROUP BY cohort ORDER BY cohort;'"
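The sed render above expects an RFC 822-style template carrying the listed `{{...}}` placeholders. A minimal illustrative template and the same substitution pass (body text, recipient and code are made up; the placeholder names are the script's):

```shell
template='To: {{TO_ADDR}}
From: {{FROM_ADDR}}
Subject: {{SUBJECT}}

Your invite code : {{INVITE_CODE}}
Sign up here : {{INVITE_URL}}'

# Render it the same way the script does and check the headers landed.
rendered=$(printf '%s' "$template" \
  | sed -e "s|{{TO_ADDR}}|alice@example.com|g" \
        -e "s|{{FROM_ADDR}}|invitations@veza.fr|g" \
        -e "s|{{SUBJECT}}|Veza beta|g" \
        -e "s|{{INVITE_CODE}}|abcd2345efgh6789|g" \
        -e "s|{{INVITE_URL}}|https://staging.veza.fr/signup?invite=abcd2345efgh6789|g")
printf '%s\n' "$rendered" | head -1   # → To: alice@example.com
```

The `|` sed delimiter is what lets URLs pass through unescaped; it would break only if a substituted value itself contained a `|`.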
scripts/soft-launch/validate-cohort.sh — 173 lines (new executable file)

@@ -0,0 +1,173 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
# validate-cohort.sh — sanity-check a soft-launch beta cohort CSV
|
||||||
|
# before it gets fed to send-invitations.sh.
|
||||||
|
#
|
||||||
|
# The CSV is the operator's curated list of beta-tester emails +
|
||||||
|
# segmentation. This script catches the avoidable mistakes BEFORE
|
||||||
|
# we batch-insert 100 rows into beta_invites and start spraying
|
||||||
|
# emails :
|
||||||
|
#
|
||||||
|
# - Empty file or wrong header
|
||||||
|
# - Duplicate emails (would create 2 invites for the same person)
|
||||||
|
# - Malformed emails (missing @, leading/trailing whitespace)
|
||||||
|
# - Cohort distribution looks off (no creators, only one segment,
|
||||||
|
# under-50 total — soft-launch acceptance gate is ≥50 testers)
|
||||||
|
# - Email collisions with existing users (already registered = the
|
||||||
|
# invite code is wasted)
|
||||||
|
#
|
||||||
|
# v1.0.10 Cluster 3.4.
|
||||||
|
#
|
||||||
|
# Usage :
|
||||||
|
# bash scripts/soft-launch/validate-cohort.sh path/to/cohort.csv
|
||||||
|
#
|
||||||
|
# Optional env :
|
||||||
|
# DATABASE_URL if set, also checks for collisions with the users
|
||||||
|
# table (email already registered → flagged but not
|
||||||
|
# fatal — operator may want to invite an existing
|
||||||
|
# user back to test the new flows).
|
||||||
|
# MIN_COHORT minimum total rows required (default 50, matches the
|
||||||
|
# acceptance-gate threshold in SOFT_LAUNCH_BETA_2026.md).
|
||||||
|
# MIN_CREATORS minimum number of creator-* cohort rows (default 5).
|
||||||
|
#
|
||||||
|
# Exit codes :
|
||||||
|
# 0 — cohort valid
|
||||||
|
# 1 — cohort malformed (will block send-invitations.sh)
|
||||||
|
# 2 — cohort merely warns (size below minimum, missing collision
|
||||||
|
# check) ; operator may proceed with --force
|
||||||
|
set -euo pipefail

CSV=${1:-}
if [ -z "$CSV" ] || [ ! -f "$CSV" ]; then
  cat >&2 <<EOF
usage: bash scripts/soft-launch/validate-cohort.sh path/to/cohort.csv

CSV format (header required):
  email,cohort,sent_by_email
  alice@example.com,creator-vinyl,ops@veza.fr
  bob@example.com,listener-jazz,ops@veza.fr
  ...

cohort labels are free-text but should follow the convention
<role>-<segment> so the post-launch attribution report groups cleanly.
EOF
  exit 1
fi

MIN_COHORT=${MIN_COHORT:-50}
MIN_CREATORS=${MIN_CREATORS:-5}

# 1. Header check.
header=$(head -1 "$CSV" | tr -d '\r')
if [ "$header" != "email,cohort,sent_by_email" ]; then
  echo "ERROR: header line must be exactly 'email,cohort,sent_by_email' (got: $header)" >&2
  exit 1
fi

# 2. Row count + duplicates + email shape (single pass over the rows).
total=0
malformed=0
duplicates=0
declare -A seen
declare -A cohort_count
declare -a malformed_lines

# The `|| [ -n "$email" ]` keeps the last row even when the file has
# no trailing newline.
while IFS=, read -r email cohort sent_by_email || [ -n "$email" ]; do
  email=$(echo "$email" | tr -d '\r' | xargs)
  cohort=$(echo "$cohort" | tr -d '\r' | xargs)

  total=$(( total + 1 ))

  # Email shape: one @, a dotted domain, no whitespace.
  if [[ ! "$email" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
    malformed=$(( malformed + 1 ))
    malformed_lines+=("  line $(( total + 1 )): invalid email '$email'")
    continue
  fi

  # Duplicate detection.
  if [ -n "${seen[$email]:-}" ]; then
    duplicates=$(( duplicates + 1 ))
    malformed_lines+=("  line $(( total + 1 )): duplicate email '$email' (first seen at line ${seen[$email]})")
    continue
  fi
  seen[$email]=$(( total + 1 ))

  # Cohort tally.
  cohort_count[$cohort]=$(( ${cohort_count[$cohort]:-0} + 1 ))
done < <(tail -n +2 "$CSV")

echo "----------------------------------------------------------------"
echo "Cohort validation report"
echo "----------------------------------------------------------------"
echo "  CSV file       : $CSV"
echo "  Total rows     : $total"
echo "  Unique emails  : ${#seen[@]}"
echo "  Malformed rows : $malformed"
echo "  Duplicates     : $duplicates"
echo ""
echo "Distribution by cohort:"
for c in "${!cohort_count[@]}"; do
  printf "  %-40s %d\n" "$c" "${cohort_count[$c]}"
done | sort
echo ""
exit_code=0

# 3. Hard checks (block send).
if [ "$malformed" -gt 0 ] || [ "$duplicates" -gt 0 ]; then
  echo "ERROR: $malformed malformed + $duplicates duplicate row(s) — fix before sending."
  for line in "${malformed_lines[@]}"; do
    echo "$line"
  done
  exit 1
fi

# 4. Soft checks (warn, don't block — operator decides).
if [ "$total" -lt "$MIN_COHORT" ]; then
  echo "WARN: cohort has $total rows; soft-launch acceptance gate is ≥ $MIN_COHORT."
  exit_code=2
fi

creator_total=0
for c in "${!cohort_count[@]}"; do
  if [[ "$c" == creator-* ]]; then
    creator_total=$(( creator_total + cohort_count[$c] ))
  fi
done
if [ "$creator_total" -lt "$MIN_CREATORS" ]; then
  echo "WARN: only $creator_total creator-* cohort rows; goal is ≥ $MIN_CREATORS for upload-flow coverage."
  exit_code=2
fi

if [ "${#cohort_count[@]}" -lt 3 ]; then
  echo "WARN: only ${#cohort_count[@]} distinct cohort labels — feedback will be narrow."
  exit_code=2
fi

# 5. Optional: DATABASE_URL collision check.
if [ -n "${DATABASE_URL:-}" ]; then
  if ! command -v psql >/dev/null 2>&1; then
    echo "WARN: DATABASE_URL set but psql not on \$PATH; skipping collision check."
    exit_code=2
  else
    emails_csv=$(printf '%s,' "${!seen[@]}" | sed 's/,$//')
    collisions=$(psql "$DATABASE_URL" -A -t -c "
      SELECT count(*) FROM users WHERE email = ANY(string_to_array('$emails_csv', ','));
    " 2>/dev/null | tr -d ' ' || echo "?")
    if [ "$collisions" = "?" ]; then
      echo "WARN: couldn't query users table (psql connection issue); skipping collision check."
      exit_code=2
    elif [ "$collisions" -gt 0 ]; then
      echo "INFO: $collisions email(s) in the cohort already exist in the users table — invite codes will be wasted on existing accounts."
      exit_code=2
    fi
  fi
fi

echo ""
case $exit_code in
  0) echo "PASS: cohort valid, ready for send-invitations.sh." ;;
  2) echo "WARN: cohort valid but with caveats — review and re-run with --force from send-invitations.sh if intentional." ;;
esac
exit $exit_code
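Taken on its own, the shape-and-duplicate pass of validate-cohort.sh is easy to smoke-test against a throwaway CSV before touching the real cohort. A minimal sketch, assuming bash 4+; the sample rows and temp file are illustrative, not real cohort data:

```shell
# Smoke test of the email-shape + duplicate logic: one valid row,
# one malformed email (no dotted domain), one duplicate.
csv=$(mktemp)
cat > "$csv" <<'EOF'
email,cohort,sent_by_email
alice@example.com,creator-vinyl,ops@veza.fr
bob@example,listener-jazz,ops@veza.fr
alice@example.com,creator-vinyl,ops@veza.fr
EOF

total=0; malformed=0; duplicates=0
declare -A seen
while IFS=, read -r email cohort _ || [ -n "$email" ]; do
  total=$(( total + 1 ))
  if [[ ! "$email" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
    malformed=$(( malformed + 1 )); continue
  fi
  if [ -n "${seen[$email]:-}" ]; then
    duplicates=$(( duplicates + 1 )); continue
  fi
  seen[$email]=$total
done < <(tail -n +2 "$csv")
rm -f "$csv"

echo "total=$total malformed=$malformed duplicates=$duplicates"
# prints: total=3 malformed=1 duplicates=1
```

Seeding the file with known-bad rows confirms the regex rejects a TLD-less address and that the duplicate counter fires on the repeat, which is exactly the behaviour the hard checks gate on.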
65 veza-backend-api/migrations/990_beta_invites.sql Normal file

@@ -0,0 +1,65 @@
-- 990_beta_invites.sql
-- v1.0.10 polish (Cluster 3.4) — soft-launch beta cohort tracking.
--
-- Records each individual invitation sent for the v2.0.0 soft-launch
-- beta. Tracks (a) the invite code used in the registration link,
-- (b) when the recipient redeemed it (NULL until redemption), and
-- (c) which cohort segment (creator / listener / community-member /
-- press) the recipient belongs to so the post-launch report can
-- attribute feedback by audience.
--
-- The associated email template + send script live at
-- scripts/soft-launch/send-invitations.sh and reference this table
-- via INSERT … RETURNING code.
--
-- Privacy: the email column is the only PII here; no behavioural
-- data is stored. used_at is the redemption signal. After v2.0.0
-- public launch, run the cleanup migration in 991 (TBD) to anonymise
-- the email column for invites that haven't been redeemed in 30+ days.

CREATE TABLE IF NOT EXISTS public.beta_invites (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    -- The invite code is what the recipient pastes into the signup
    -- form. 16 random characters from a base32 alphabet (no 0/1/I/L
    -- to avoid eyestrain). Generated by send-invitations.sh.
    code VARCHAR(32) NOT NULL UNIQUE,
    email VARCHAR(320) NOT NULL,
    -- Free-text label so the cohort generator can carry whatever
    -- segmentation the operator wants (e.g. "creator-vinyl-pressing",
    -- "listener-jazz-mailing-list", "press-pitchfork"). Index below
    -- is for the post-launch report grouping.
    cohort VARCHAR(64) NOT NULL,
    -- NULL until the recipient signs up. Set by the auth handler
    -- when /auth/register is hit with a valid invite code.
    used_at TIMESTAMPTZ,
    -- Hard expiry so unredeemed invites can't accumulate forever.
    -- Default 30 days from creation; soft-launch is a short window.
    expires_at TIMESTAMPTZ NOT NULL DEFAULT (NOW() + INTERVAL '30 days'),
    -- Operator who sent the invite — useful when reconciling "who
    -- gave their friend a code" during the audit.
    sent_by UUID REFERENCES public.users(id) ON DELETE SET NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

COMMENT ON TABLE public.beta_invites IS
    'v2.0.0 soft-launch beta invitation tracking. v1.0.10 Cluster 3.4.';
COMMENT ON COLUMN public.beta_invites.code IS
    '16-char base32 invite code (no 0/1/I/L). Pasted into signup form.';
COMMENT ON COLUMN public.beta_invites.cohort IS
    'Free-text cohort label (creator-* / listener-* / press-* / etc.).';
COMMENT ON COLUMN public.beta_invites.used_at IS
    'Redemption timestamp. NULL means the invite is still pending.';

-- Lookup by code (signup path) — every /auth/register call reads it.
-- (Note: the UNIQUE constraint on code already creates a backing
-- index, so this is redundant; kept for explicitness.)
CREATE UNIQUE INDEX IF NOT EXISTS idx_beta_invites_code
    ON public.beta_invites(code);

-- Cohort grouping for the post-launch attribution query.
CREATE INDEX IF NOT EXISTS idx_beta_invites_cohort
    ON public.beta_invites(cohort);

-- Pending-invitations sweep — cron job that expires unused invites
-- after expires_at. Partial index keeps it small.
CREATE INDEX IF NOT EXISTS idx_beta_invites_pending_expiry
    ON public.beta_invites(expires_at)
    WHERE used_at IS NULL;
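The migration's column comment pins down the invite-code format (16 characters from a base32 alphabet with 0/1/I/L removed), but the generator itself lives in send-invitations.sh, which is not part of this diff. A plausible sketch, assuming bash; `gen_invite_code` is a made-up name, and `$RANDOM` is used only for illustration (a real send script should draw from `openssl rand` or /dev/urandom, since `$RANDOM` is not cryptographically secure):

```shell
# Hypothetical generator matching the beta_invites.code comment:
# 16 chars, 32-symbol alphabet (A-Z minus I/L, digits 2-9).
gen_invite_code() {
  local alphabet='ABCDEFGHJKMNOPQRSTUVWXYZ23456789'
  local code='' i
  for (( i = 0; i < 16; i++ )); do
    # Pick one random symbol per position.
    code+=${alphabet:$(( RANDOM % ${#alphabet} )):1}
  done
  printf '%s\n' "$code"
}

code=$(gen_invite_code)
echo "$code"
```

Dropping 0/1/I/L from the alphabet leaves exactly 32 symbols, so each character still carries 5 bits and a 16-char code gives 80 bits of keyspace, comfortably enough for a 100-person cohort.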
92 veza-backend-api/templates/email/beta_invite.eml.template Normal file

@@ -0,0 +1,92 @@
To: {{TO_ADDR}}
From: Veza <{{FROM_ADDR}}>
Subject: {{SUBJECT}}
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="--veza-beta-boundary"

----veza-beta-boundary
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Bonjour,

Vous êtes invité·e à rejoindre la bêta privée de Veza —
une plateforme de streaming musical éthique faite pour les
créateur·ices et les auditeur·ices, sans algorithme de
recommandation comportementale, sans gamification, sans dark
patterns.

Votre code d'invitation : {{INVITE_CODE}}

Pour vous inscrire :
{{INVITE_URL}}

Le code expire dans 30 jours.

Pendant la bêta, l'idée est simple : utilisez Veza comme vous
utiliseriez n'importe quelle plateforme musicale. Uploadez,
écoutez, partagez, achetez. Quand quelque chose vous frustre
ou vous étonne — en bien comme en mal — dites-le. Le canal
de retour vous sera communiqué après l'inscription.

Cohorte : {{COHORT}}
(C'est juste un tag interne pour qu'on regroupe les retours
par contexte d'usage. Ça n'affecte ni votre expérience ni vos
permissions.)

À très vite,
L'équipe Veza

--
Si vous n'avez pas demandé cette invitation, ignorez ce
message. Le code expirera automatiquement après 30 jours.

----veza-beta-boundary
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: 8bit

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Invitation à la bêta Veza</title>
</head>
<body style="font-family: Georgia, 'Times New Roman', serif; line-height: 1.6; color: #1a1a1e; margin: 0; padding: 0; background-color: #f8f7f4;">
<div style="max-width: 600px; margin: 20px auto; padding: 30px; background-color: #ffffff; border: 1px solid #e8e6e0;">
<h1 style="font-weight: 400; color: #1a1a1e; margin-top: 0; font-size: 28px;">Bienvenue dans la bêta Veza.</h1>
<p>Bonjour,</p>
<p>Vous êtes invité·e à rejoindre la <strong>bêta privée</strong> de Veza — une plateforme de streaming musical éthique faite pour les créateur·ices et les auditeur·ices, sans algorithme de recommandation comportementale, sans gamification, sans dark patterns.</p>

<div style="text-align: center; margin: 35px 0;">
<a href="{{INVITE_URL}}" style="background-color: #1a1a1e; color: #f8f7f4; padding: 14px 32px; text-decoration: none; display: inline-block; font-weight: 400; letter-spacing: 0.05em;">
Activer mon invitation
</a>
</div>

<p style="color: #555; font-size: 14px;">Ou collez ce lien dans votre navigateur :</p>
<p style="word-break: break-all; color: #888; background-color: #f8f7f4; padding: 10px; font-family: 'Courier New', monospace; font-size: 12px; border-left: 2px solid #d4a574;">{{INVITE_URL}}</p>

<p style="color: #555; font-size: 14px; margin-top: 25px;">Code d'invitation :</p>
<p style="font-family: 'Courier New', monospace; font-size: 18px; letter-spacing: 0.1em; background-color: #f8f7f4; padding: 12px; text-align: center; color: #1a1a1e;">{{INVITE_CODE}}</p>

<hr style="border: none; border-top: 1px solid #e8e6e0; margin: 30px 0;">

<p style="font-size: 14px; color: #555;">Pendant la bêta, l'idée est simple : utilisez Veza comme vous utiliseriez n'importe quelle plateforme musicale. Uploadez, écoutez, partagez, achetez. Quand quelque chose vous frustre ou vous étonne — en bien comme en mal — dites-le. Le canal de retour vous sera communiqué après l'inscription.</p>

<p style="font-size: 13px; color: #888; margin-top: 25px;">Cohorte : <strong>{{COHORT}}</strong> — c'est juste un tag interne pour qu'on regroupe les retours par contexte d'usage.</p>

<p style="margin-top: 30px; color: #888; font-size: 12px;">
Le code expire dans 30 jours. Si vous n'avez pas demandé cette invitation, ignorez ce message.
</p>

<hr style="border: none; border-top: 1px solid #e8e6e0; margin: 25px 0;">
<p style="color: #aaa; font-size: 11px; text-align: center; font-family: 'Courier New', monospace; letter-spacing: 0.1em;">
VEZA · v2.0.0 BETA · {{FRONTEND_URL}}
</p>
</div>
</body>
</html>

----veza-beta-boundary--
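Since the template is plain text with {{PLACEHOLDER}} tokens, the rendering step in the send script reduces to a substitution pass. A minimal sed-based sketch, assuming a trimmed-down inline template; the real send-invitations.sh is not shown in this diff, and every value below is an example:

```shell
# Hypothetical template-rendering pass: substitute {{...}} tokens
# with per-recipient values. Trimmed template for illustration.
tmpl=$(mktemp)
cat > "$tmpl" <<'EOF'
To: {{TO_ADDR}}
Subject: {{SUBJECT}}
Code: {{INVITE_CODE}}
EOF

rendered=$(sed \
  -e 's/{{TO_ADDR}}/alice@example.com/' \
  -e 's/{{SUBJECT}}/Invitation beta Veza/' \
  -e 's/{{INVITE_CODE}}/ABCD2345EFGH6789/' \
  "$tmpl")
rm -f "$tmpl"
echo "$rendered"
```

A rendered message should contain no leftover `{{` sequences; checking for that after substitution is a cheap guard against a template/variable drift between this file and the send script.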
Reference in a new issue