Compare commits

...

13 commits

Author SHA1 Message Date
senke
112c64a22b feat(soft-launch): cohort tooling + email template + monitor + checklist
The soft-launch report doc (SOFT_LAUNCH_BETA_2026.md) had the
narrative — cohort table, email body inline, monitoring list,
acceptance gate. But the operational pieces were notes-to-self :
"add migration if missing", "Typeform to-do", "schema TBD". The
operator was supposed to assemble them on the day, which on a soft-
launch day is the worst possible time.

Added the missing 6 pieces so the day-of work is "tick boxes",
not "build the tooling" :

  * migrations/990_beta_invites.sql — schema with code (16-char
    base32-ish), email, cohort label, used_at, expires_at + 30d
    default, sent_by FK with ON DELETE SET NULL. Three indexes :
    unique on code (signup-path lookup), cohort (post-launch
    attribution report), partial expires_at WHERE used_at IS NULL
    (cleanup cron). A sketch of this schema follows the list.

  * scripts/soft-launch/validate-cohort.sh — sanity check on the
    operator's CSV : header form, malformed emails, duplicates,
    cohort distribution (≥50 total / ≥5 creators / ≥3 distinct
    labels), optional collision check against existing users.
    Exit codes 0 / 1 (block) / 2 (warn-but-proceed). Hard checks
    block, soft checks let the operator override with FORCE=1.

  * scripts/soft-launch/send-invitations.sh — split-phase :
      step 1 (default) inserts beta_invites rows + renders one .eml
        per recipient under scripts/soft-launch/out-<date>/
      step 2 (SEND=1) dispatches via $SEND_CMD (msmtp by default)
    so the operator can review the rendered emls before sending
    100 emails. Per-recipient transactional INSERT so a partial
    failure doesn't poison the table. Failed inserts logged with
    the offending email so the operator can rerun on the subset.

  * templates/email/beta_invite.eml.template — proper MIME multipart
    (text + HTML) eml ready for sendmail-compatible piping. French
    copy aligned with the ethical brand positioning (no FOMO, no urgency
    manipulation, no "limited spots" framing).

  * scripts/soft-launch/monitor-checks.sh — polls the 6 acceptance-
    gate signals defined in SOFT_LAUNCH_BETA_2026.md §"Acceptance
    gate" : testers signed up, Sentry P1 events, status page,
    synthetic user journey, k6 nightly age, HIGH issues. Each gate
    independently emits ✅ / 🔴 / ⚪ (the last for "couldn't check").
    Verdict on stdout. LOOP=1 keeps polling every CHECK_INTERVAL
    seconds. Designed for cron + tmux, not for an interactive UI.

  * docs/SOFT_LAUNCH_BETA_2026_CHECKLIST.md — pre-flight gate that
    must reach 100% green before the first invitation goes out.
    T-72h section (database, cohort, email infra, redemption path,
    monitoring, comms), D-day section (last-hour, send, hour-1,
    every-4h), 18:00 UTC decision call section. Linked back to the
    bigger SOFT_LAUNCH_BETA_2026.md so the operator can navigate
    between the "what" (report) and the "how / has-everything-
    been-checked" (this checklist) without losing context.
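
A hedged sketch of that schema, reconstructed from this message rather
than copied from the shipped migration (column names and the users(id)
FK target are assumptions) :

    psql "$DATABASE_URL" -c "
      CREATE TABLE IF NOT EXISTS beta_invites (
          id         BIGSERIAL PRIMARY KEY,
          code       TEXT        NOT NULL,
          email      TEXT        NOT NULL,
          cohort     TEXT        NOT NULL,
          used_at    TIMESTAMPTZ,
          expires_at TIMESTAMPTZ NOT NULL DEFAULT now() + INTERVAL '30 days',
          sent_by    BIGINT REFERENCES users(id) ON DELETE SET NULL
      );
      CREATE UNIQUE INDEX IF NOT EXISTS beta_invites_code_key ON beta_invites (code);
      CREATE INDEX IF NOT EXISTS beta_invites_cohort_idx ON beta_invites (cohort);
      CREATE INDEX IF NOT EXISTS beta_invites_unused_idx ON beta_invites (expires_at)
          WHERE used_at IS NULL;
    "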

What still requires the operator on the day :
  - Build the cohort CSV (curate emails from real sources)
  - Create the Typeform feedback form ; paste its URL into the
    eml template once known
  - Configure msmtp / sendmail ($SEND_CMD)
  - Press the send button
  - Show up at 18:00 UTC for the decision call

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:38:12 +02:00
senke
2a5bc11628 fix(scripts,docs): game-day prod safety guards + rabbitmq-down runbook
The game-day driver had no notion of inventory — it would happily
execute the 5 destructive scenarios (Postgres kill, HAProxy stop,
Redis kill, MinIO node loss, RabbitMQ stop) against whatever the
underlying scripts pointed at, with the operator's only protection
being "don't typo a host." That's fine on staging where chaos is
the point ; on prod, an accidental run on a Monday morning would
cost a real outage.

Added :

  scripts/security/game-day-driver.sh
    * INVENTORY env var — defaults to 'staging' so silence stays
      safe. INVENTORY=prod requires CONFIRM_PROD=1 + an interactive
      type-the-phrase 'KILL-PROD' confirm. Anything other than
      staging|prod aborts (guard sketched after this list).
    * Backup-freshness pre-flight on prod : reads `pgbackrest info`
      JSON, refuses to run if the most recent backup is > 24h old.
      SKIP_BACKUP_FRESHNESS=1 escape hatch, documented inline.
    * Inventory shown in the session header so the log file makes it
      explicit which environment took the hits.

  docs/runbooks/rabbitmq-down.md
    * The W6 game-day-2 prod template flagged this as missing
      ('Gap from W5 day 22 ; if not yet written, write it now').
      Mirrors the structure of redis-down.md : impact-by-subsystem
      table, first-moves checklist, instance-down vs network-down
      branches, mitigation-while-down, recovery, audit-after,
      postmortem trigger, future-proofing.
    * Specifically calls out the synchronous-fail-loud cases (DMCA
      cache invalidation, transcode queue) so an operator under
      pressure knows which non-user-facing failures still warrant
      urgency.
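
A minimal sketch of the prod guard described above (variable names as in
this message ; the pgbackrest stanza and jq path are assumptions, the
rest of the driver is elided) :

    INVENTORY="${INVENTORY:-staging}"              # silence stays safe
    case "$INVENTORY" in
      staging) ;;                                  # chaos is the point here
      prod)
        [ "${CONFIRM_PROD:-0}" = "1" ] \
          || { echo "refusing: set CONFIRM_PROD=1 to target prod" >&2; exit 1; }
        read -r -p "Type KILL-PROD to run destructive scenarios on prod: " phrase
        [ "$phrase" = "KILL-PROD" ] || { echo "confirmation mismatch, aborting" >&2; exit 1; }
        if [ "${SKIP_BACKUP_FRESHNESS:-0}" != "1" ]; then
          last_stop=$(pgbackrest --stanza=veza info --output=json \
                        | jq -r '.[0].backup[-1].timestamp.stop')
          age=$(( $(date +%s) - last_stop ))
          [ "$age" -le $(( 24 * 3600 )) ] \
            || { echo "most recent backup is > 24h old, aborting" >&2; exit 1; }
        fi
        ;;
      *) echo "unknown INVENTORY '$INVENTORY', aborting" >&2; exit 1 ;;
    esac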

Together these mean the W6 Day 28 prod game day can be run by an
operator who's never run it before, without a senior looking over
their shoulder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:32:05 +02:00
senke
e780fbcd18 docs(pentest): add send-package SOP + seed-test-accounts helper
The pentest scope doc (PENTEST_SCOPE_2026.md) is the technical brief —
what's testable, what's out, what to focus on. But it doesn't tell
the operator HOW to send the engagement off : credentials delivery
plan, IP allow-list step, kick-off email template, alert-tuning
during the engagement window. So historically each engagement has
been a one-off that depends on whoever was on duty remembering the
last time.

Added :

  * docs/PENTEST_SEND_PACKAGE.md — 5-step send sequence (NDA →
    credentials → IP allow-list → kick-off email → alert tuning),
    reception checklist, and post-engagement housekeeping. Email
    template inline so it's grep-able and version-controlled.

  * scripts/pentest/seed-test-accounts.sh — provisions the 3 staging
    accounts (listener/creator/admin) referenced by §"Authentication
    context" of the scope doc. Generates 32-char random passwords,
    probes each by login, emits 1Password import JSON to stdout
    (passwords NEVER printed to the screen). Refuses to run against
    any env that isn't "staging".
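
A hedged sketch of the core behaviour (the login endpoint, account email
scheme and JSON shape are assumptions, not the shipped script) :

    [ "${1:-}" = "staging" ] || { echo "refusing: staging only" >&2; exit 1; }
    for role in listener creator admin; do
      pass="$(openssl rand -base64 48 | tr -dc 'A-Za-z0-9' | head -c 32)"
      # ... create-or-reset the account with $pass here (elided) ...
      curl -sf -o /dev/null -X POST "$STAGING_URL/api/v1/auth/login" \
           -H 'Content-Type: application/json' \
           -d "{\"email\":\"pentest-2026-${role}@veza.fr\",\"password\":\"${pass}\"}" \
        || echo "login probe failed for ${role}" >&2
      # JSON to stdout, meant to be redirected straight into the 1Password import
      printf '{"title":"pentest-2026-%s","password":"%s"}\n' "$role" "$pass"
    done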

The send-package doc references one helper that doesn't exist yet :
  * infra/ansible/playbooks/pentest_allowlist_ip.yml — Forgejo IP
    allow-list automation. Punted to a follow-up because the manual
    SSH path is fine for once-per-engagement use and Ansible
    formalisation deserves its own commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:29:35 +02:00
senke
05b1d81d30 fix(scripts): payment-e2e walkthrough safety guards (DRY_RUN + prod confirm)
Three holes in the v1.0.9 W6 Day 27 walkthrough that an operator under
stress could fall into :

1. Typo'd STAGING_URL pointing at production. The script accepted any
   URL with no sanity check, so `STAGING_URL=https://veza.fr ...` would
   happily POST /orders and charge a real card on the first run.
   Fix: heuristic detection (a URL that doesn't contain "staging",
   "localhost" or "127.0.0.1" is treated as prod); the script refuses
   to run unless CONFIRM_PRODUCTION=1 is explicitly set.

2. No way to rehearse the flow without spending money. Added DRY_RUN=1
   that exits cleanly after step 2 (product listing) — exercises auth,
   API plumbing, and the staging product fixture without creating an
   order.

3. No final confirm before the actual charge. On a prod target, after
   the product is picked and before the POST /orders fires, the script
   now prints the {product_id, price, operator, endpoint} block and
   demands the operator type the literal word `CHARGE`. Any other
   answer aborts with exit code 2.

Together these turn "STAGING_URL typo = burnt 5 EUR" into "STAGING_URL
typo = exit code 3 with explanation". The wrapper docs in
docs/PAYMENT_E2E_LIVE_REPORT.md already mention card-charge risk in
prose; these guards enforce it at exec time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:27:14 +02:00
senke
6c644cff03 fix(haproxy): forgejo backend uses HTTPS re-encrypt + Host header on healthcheck
Forgejo at 10.0.20.105:3000 serves HTTPS only (self-signed cert).
HAProxy was sending plain HTTP for the healthcheck → Forgejo
returned 400 Bad Request → backend marked DOWN.

Two coupled fixes :

1. `server forgejo ... ssl verify none sni str(forgejo.talas.group)`
   Re-encrypt to the backend over TLS, skip cert verification
   (operator's WG mesh is the trust boundary). SNI set to the
   public hostname so Forgejo serves the right vhost.

2. Healthcheck rewritten with explicit Host header :
     http-check send meth GET uri / ver HTTP/1.1 hdr Host forgejo.talas.group
     http-check expect rstatus ^[23]
   Without the Host header, Forgejo's
   `Forwarded`-header / proxy-validation may reject the request.
   Accept any 2xx/3xx (Forgejo redirects to /login → 302).

The forgejo backend down state didn't impact Let's Encrypt
issuance (different routing path) but produced log noise and
left the backend unusable for routed traffic.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:31:29 +02:00
senke
0bd3e563b2 fix(haproxy): incus proxy devices forward R720:80/443 → container
The Orange box NAT correctly forwards :80/:443 → R720 LAN IP, but
the R720 host has nothing listening there — haproxy lives in the
veza-haproxy container, reachable only on the net-veza bridge
(10.0.20.X). Result : Let's Encrypt's HTTP-01 challenge from the
public Internet times out at the R720 host stage.

Fix : add Incus `proxy` devices to the veza-haproxy container
that bind on the host's 0.0.0.0:80 / 0.0.0.0:443 and forward into
the container's local ports. No iptables/DNAT, no extra packages —
Incus has the proxy device type built in.

  incus config device add veza-haproxy http  proxy \
      listen=tcp:0.0.0.0:80  connect=tcp:127.0.0.1:80
  incus config device add veza-haproxy https proxy \
      listen=tcp:0.0.0.0:443 connect=tcp:127.0.0.1:443

Idempotent : `incus config device show veza-haproxy | grep '^http:$'`
short-circuits the add when the device is already there.

Operator setup unchanged : box NAT 80/443 → R720 LAN IP. Ansible
now bridges the rest of the path automatically.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:27:37 +02:00
senke
d9896686bd fix(haproxy): runtime DNS resolution + init-addr none for absent backends
HAProxy was rejecting the cfg at parse time because every
`server backend-{blue,green}.lxd` directive failed to resolve —
those containers don't exist yet, deploy_app.yml creates them
later. The validate said :
  could not resolve address 'veza-staging-backend-blue.lxd'
  Failed to initialize server(s) addr.

Two complementary fixes :

1. Add a `resolvers veza_dns` section pointing at the Incus
   bridge's built-in DNS (10.0.20.1:53 — gateway of net-veza).
   `*.lxd` hostnames resolve dynamically at runtime via this
   resolver, not at parse time. Containers spun up later by
   deploy_app.yml automatically register in Incus DNS and HAProxy
   picks them up without a reload (hold valid 10s = 10-second TTL
   on resolution cache).

2. `default-server ... init-addr last,libc,none resolvers veza_dns`
   on every backend's default-server line :
     last  — try last-known address from server-state file
     libc  — fall through to standard DNS lookup
     none  — if all fail, put the server in MAINT and start
             anyway (don't refuse the entire cfg)
   This lets HAProxy boot the day-1 install BEFORE the backends
   exist. Once deploy_app.yml lands them, the resolver picks them
   up within 10s.

Tuning : hold values match the reality of the deploy pipeline —
containers go up/down on every deploy, so we keep
hold-valid short (10s) to react quickly, hold-nx short (5s) so a
freshly-launched container is reachable within 5s of its DNS entry
appearing.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:17:39 +02:00
senke
c97e42996e fix(haproxy): use shipped selfsigned.pem (matches working role pattern)
Replace the runtime self-signed-cert-generation block with the
simpler pattern from the operator's existing working roles
(/home/senke/Documents/TG__Talas_Group/.../roles/haproxy/files/selfsigned.pem) :
ship a CN=localhost selfsigned.pem in roles/haproxy/files/, copy
it into the cert dir before haproxy.cfg renders.

Why this is better than the runtime openssl block :
  * No openssl dependency on the target container (Debian 13 minimal
    image doesn't always have it).
  * No timing issue if /tmp is on a slow tmpfs.
  * Predictable cert content — same selfsigned.pem across all
    deploys, no per-host noise.
  * Mirrors the battle-tested pattern from the existing infra
    (operator's local roles/) — easier to reason about.

Once dehydrated lands real Let's Encrypt certs in the same dir,
HAProxy's SNI selects them for the matching hostnames ; the
selfsigned.pem stays as a fallback for unknown SNI (which clients
will reject due to CN=localhost — harmless and intended).

selfsigned.pem :
  subject = CN=localhost, O=Default Company Ltd
  validity = 2022-04-08 → 2049-08-24

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:12:35 +02:00
senke
b6147549c9 fix(haproxy): pre-create cert dir + placeholder cert ; reorder ACL rules
Two issues caught by the now-verbose haproxy validate :

1. `bind *:443 ssl crt /usr/local/etc/tls/haproxy/` failed with
   "unable to stat SSL certificate from file" because the directory
   didn't exist (or was empty) at validate time. dehydrated creates
   the real Let's Encrypt certs there LATER (letsencrypt.yml runs
   after the role's main render-and-restart). Chicken-and-egg.

   Fix : roles/haproxy/tasks/main.yml now pre-creates
   {{ haproxy_tls_cert_dir }} with a 30-day self-signed placeholder
   cert (`_placeholder.pem`) BEFORE haproxy.cfg renders. haproxy
   accepts the dir, validates the config. dehydrated later drops
   real *.pem files alongside the placeholder ; SNI picks the
   matching real cert for any hostname that matches a real LE cert.
   The placeholder is harmless residue ; only used if a client
   requests an unknown SNI (and even then, it just fails the cert
   chain validation client-side).

   Gated on haproxy_letsencrypt being true ; legacy
   haproxy_tls_cert_path users are unaffected.

2. haproxy 3.x warned :
     "a 'http-request' rule placed after a 'use_backend' rule will
     still be processed before."
   Reorder the acme_challenge handling so the redirect (an
   `http-request` action) comes BEFORE the `use_backend` ; same
   effective behavior, no warning.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:10:27 +02:00
senke
7253f0cf10 fix(ansible): haproxy validate without -q so the error message reaches operator
`haproxy -f %s -c -q` (quiet) suppresses the actual validation error
on stderr+stdout, leaving the operator with a useless
"failed to validate" with empty output. Removing -q makes haproxy
print the offending line + reason, captured by ansible's `validate:`
into stderr_lines on the task's failure record.

Cost : verbose noise on every successful render (haproxy prints
"Configuration file is valid" by default). Acceptable trade-off
for the once-in-a-while debugging value.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:06:50 +02:00
senke
385a8f0378 fix(ansible): add staging/prod meta-groups so group_vars/<env>.yml applies
group_vars/staging.yml + group_vars/prod.yml were never loaded :
Ansible matches `group_vars/<NAME>.yml` against the inventory's
group NAMED `<NAME>`. Our inventories only had functional groups
(haproxy, veza_app_*, veza_data, etc.) — no `staging` or `prod`
parent group. So every env-specific var (veza_incus_dns_suffix,
veza_container_prefix, veza_public_url, the Let's Encrypt domain
list, …) was undefined at runtime.

Symptom : haproxy.cfg.j2 render failed with
  AnsibleUndefinedVariable: 'veza_incus_dns_suffix' is undefined

Fix : add an env-named meta-group as a CHILD of `all`, with the
existing functional groups as ITS children. Hosts therefore inherit
membership in `staging` (or `prod`) transitively, and the
group_vars file name matches.

  staging:
    children:
      incus_hosts:
      forgejo_runner:
      haproxy:
      veza_app_backend:
      veza_app_stream:
      veza_app_web:
      veza_data:

Verified with :
  ansible-inventory -i inventory/staging.yml --host veza-haproxy \
      --vault-password-file .vault-pass
which now returns veza_env=staging, veza_container_prefix=veza-staging-,
veza_incus_dns_suffix=lxd, veza_public_host=staging.veza.fr — all the
vars the playbook templates rely on.

Same shape applied to prod.yml.

inventory/local.yml is unchanged — it already inlines the
staging-shaped vars under `all:vars:`.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:01:44 +02:00
senke
e97b91f010 fix(ansible): don't apply common role to haproxy container + gate ssh.yml on sshd
Two fixes for "haproxy container doesn't have sshd" :

1. playbooks/haproxy.yml — drop the `common` role play.
   The role's purpose is to harden a full HOST (SSH + fail2ban
   monitoring auth.log + node_exporter metrics surface). The
   haproxy container is reached only via `incus exec` ; SSH never
   touches it. Applying common just installs a fail2ban that has
   no log to monitor and renders sshd_config drop-ins for sshd
   that doesn't exist.
   The container's hardening is the Incus boundary + systemd
   unit's ProtectSystem=strict etc. (already in the templates).

2. roles/common/tasks/ssh.yml — gate every task on sshd presence.
   `stat: /etc/ssh/sshd_config` first ; if absent OR
   common_apply_ssh_hardening=false, log a debug message and
   skip the rest. Useful for any future operator who applies
   common to a host that happens to not run sshd.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:57:16 +02:00
senke
c245b72e05 fix(ansible): symlink inventory/group_vars → ../group_vars so vars load
Ansible looks for group_vars/ relative to either the inventory file
or the playbook file. Our group_vars/ lived at infra/ansible/group_vars/,
sibling to inventory/ and playbooks/, which matches neither location,
so ansible silently treated all the env vars as undefined.

Symptom : the haproxy.yml `common` role asserted
  ssh_allow_users | length > 0
which failed because ssh_allow_users was undefined → empty by default.

Fix : symlink inventory/group_vars → ../group_vars. Smallest possible
change ; preserves every existing path reference (bash scripts, docs)
that uses infra/ansible/group_vars/ directly. Ansible now finds the
group_vars when invoked with -i inventory/staging.yml, and
ansible-inventory --host veza-haproxy now returns the full var set
(ssh_allow_users, haproxy_env_prefixes, vault_* via vault, etc.).
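
For reference, the symlink itself (relative target, so it resolves no
matter where the repo is checked out) :

    ln -s ../group_vars infra/ansible/inventory/group_vars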

Verified with :
  ansible-inventory -i inventory/staging.yml --host veza-haproxy \
      --vault-password-file .vault-pass

Same symlink applies for inventory/lab.yml, prod.yml, local.yml —
they all live in the same directory.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:48:12 +02:00
19 changed files with 1870 additions and 24 deletions

View file

@ -0,0 +1,187 @@
# Pentest send package — v2026 engagement
> Operational checklist for handing off the v1.0.9 pre-launch pentest
> brief to the external team. Companion to `docs/PENTEST_SCOPE_2026.md`
> (the technical scope) — this doc is purely "what you send, in what
> order, via which channel."
The scope doc is technical and reusable across engagements. This file
is the per-engagement "send package" that wraps it: the email template,
the credentials-delivery plan, the IP allow-list step, and the kick-off
checklist.
## The 5-step send sequence
Run these in order. Each step has a check (✓) the operator ticks before
moving to the next — out-of-order steps cause the engagement to stall.
### Step 1 — counter-sign the NDA + authorisation letter
- [ ] NDA template signed by the pentester firm and counter-signed by us.
- [ ] Authorisation-to-test letter signed by Veza tech lead (limits the
scope to what's in `PENTEST_SCOPE_2026.md` §"In-scope assets" — the
letter MUST list the staging URL explicitly so a reviewer can map
pentester traffic to authorised activity).
- [ ] Both PDFs uploaded to the shared 1Password vault (entry name :
`pentest-2026-legal`). Do **not** email PDFs.
### Step 2 — provision pentester credentials
- [ ] Run `bash scripts/pentest/seed-test-accounts.sh staging` (creates
the 3 accounts from `PENTEST_SCOPE_2026.md` §"Authentication
context", outputs random passwords).
- [ ] Output passwords land in three 1Password entries :
`pentest-2026-listener`, `pentest-2026-creator`, `pentest-2026-admin`.
Each entry's "Notes" field includes the role and the MFA bypass
token if applicable.
- [ ] Share each entry **read-only** with the pentester's 1Password
account using the firm's billing email. Do **not** put passwords
in chat, email, or shell history.
- [ ] Set entry expiration to engagement-end + 7 days (so cleanup is
automatic if the team forgets to revoke).
### Step 3 — allow-list the pentester's IP
The Forgejo source-code mirror at `https://10.0.20.105:3000/senke/veza`
provides grey-box read-only access. The pentester needs their static
egress IP allow-listed before they can `git clone`.
- [ ] Pentester sends their static egress IP (PGP-signed mail, or
1Password Notes field).
- [ ] SSH to `srv-102v` (Forgejo container) and add the IP to
`/etc/forgejo/allowlist.conf`.
- [ ] `systemctl reload forgejo`.
- [ ] Verify : `curl -I https://10.0.20.105:3000/senke/veza` from the
pentester IP returns 200 ; from any other IP, 403.
(A future iteration could turn this into an Ansible playbook
`infra/ansible/playbooks/pentest_allowlist_ip.yml`. For now the manual
SSH path is fine — this happens once per engagement.)
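The manual path above, consolidated into one copy-paste sketch (host alias,
config path and reload unit as listed in this step ; adapt before running) :
```bash
# Append the pentester's egress IP and reload Forgejo :
ssh srv-102v "echo '<PENTESTER_IP>' | sudo tee -a /etc/forgejo/allowlist.conf \
  && sudo systemctl reload forgejo"
# Verify (self-signed cert on the mirror, hence -k) : expect 200 from the
# pentester's IP, 403 from anywhere else.
curl -skI https://10.0.20.105:3000/senke/veza | head -1
```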
### Step 4 — send the kick-off email
Use the template below. Replace the placeholders inside `<…>`. Send
PGP-encrypted (the pentester's key is in their security.txt) to
**both** their lead pentester and their project manager so the chain
of responsibility is recorded.
```text
Subject : [PENTEST] Veza v1.0.9 pre-launch engagement — kick-off
Hi <lead pentester first name>,
Per the signed scope letter dated <YYYY-MM-DD>, the Veza v1.0.9
pre-launch pentest engagement starts on <YYYY-MM-DD>. The brief is
attached as PENTEST_SCOPE_2026.md (see also the rendered HTML at
https://staging.veza.fr/legal/pentest-scope-2026.html).
Quick links :
• Staging URL : https://staging.veza.fr
• Source code : https://10.0.20.105:3000/senke/veza
(grey-box, read-only ; your egress IP <PENTESTER_IP>
has been allow-listed as of <YYYY-MM-DD HH:MM UTC>.)
• Status page : https://status.veza.fr (we'll lower the alert
threshold during your engagement so the SOC isn't
paged on every benign 401).
• Test accounts: shared with your firm's 1Password — entries
pentest-2026-{listener,creator,admin}. Passwords
expire <engagement_end + 7d>.
Engagement window :
• Start : <YYYY-MM-DD>
• End : <YYYY-MM-DD> (~10 business days)
• Re-test: 1 round, after our team's fix pass (typically 2 weeks
after the initial report)
Communications :
• Async : security@veza.fr (PGP fingerprint at
https://veza.fr/.well-known/security.txt)
• Weekly sync : <weekday HH:MM TZ>, video link in the calendar invite
• Critical findings : phone the on-call number in the contract
(HIGH severity = phone, not email)
Expected deliverables :
• Initial findings report (markdown or PDF) at engagement end
• Re-test report after our fix pass
• Optional : exec-level summary slide deck
Reach out if anything in PENTEST_SCOPE_2026.md is unclear before
day 1. Otherwise — good hunting.
Best,
<Tech lead name>
Veza
```
- [ ] Email PGP-signed and sent.
- [ ] Calendar invite sent for the weekly sync.
- [ ] Slack/Signal channel created for HIGH-severity escalation
(channel naming : `#pentest-2026-veza`).
### Step 5 — lower the SOC alerting threshold
During the engagement, automated scanners and authentication
brute-force attempts WILL fire alerts. Tune them down so the on-call
isn't paged on every legitimate pentester action.
- [ ] In `config/prometheus/alert_rules.yml`, for `HighErrorRate` and
`HighLatencyP99` : add a `for: 30m` override OR mute via
Alertmanager silence (recommended: silence rather than edit
rules so the change auto-expires at engagement end).
- [ ] Silence URL : `https://prometheus.veza.fr/alertmanager/#/silences/new`
→ matchers: `severity=warning`, comment: `pentest-2026 active`,
duration: `engagement_end + 24h`.
- [ ] Subscribe the engagement Slack channel to the silence's
auto-removal so the SOC knows when normal alerting resumes.
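If the operator prefers the CLI over the web form, the same silence can be
created with `amtool` (a sketch ; the Alertmanager URL and duration are
assumptions to adjust to the real engagement dates) :
```bash
amtool silence add severity=warning \
  --alertmanager.url=https://prometheus.veza.fr/alertmanager \
  --author="$(whoami)" \
  --comment="pentest-2026 active" \
  --duration="264h"   # engagement window + 24 h
```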
## Reception checklist (after pentester confirms receipt)
- [ ] Pentester replied to the kick-off email within 1 business day.
- [ ] Pentester confirmed they can `git clone` the source repo.
- [ ] Pentester confirmed they can log in as each of the 3 test
accounts.
- [ ] Pentester confirmed the staging URL responds (`/api/v1/health`
returns 200).
- [ ] First findings — even informational — start landing in the
shared report by end of engagement day 3 (a complete silence
until the final report is a process smell).
If any reception checklist item fails after 24h, the engagement
hasn't really started. Phone the firm's PM, don't email.
## Post-engagement housekeeping
- [ ] Findings report received → import into the issue tracker as
separate tickets, severity preserved, attribution
`external-pentest-2026`.
- [ ] Fix pass scheduled and timeboxed (HIGH within 1 week, MEDIUM
within 4 weeks, LOW best-effort).
- [ ] Re-test scheduled 2 weeks after fix-pass start.
- [ ] Re-test report received → update the ticket statuses ; any
remaining unresolved finding above LOW blocks v2.0.0-public.
- [ ] Test accounts' passwords manually rotated **the day the
engagement ends** (don't wait for 1Password's auto-expiry).
- [ ] Pentester IP removed from Forgejo allow-list.
- [ ] Alertmanager silence removed (should auto-remove, but verify).
- [ ] Engagement folder zipped and stored at
`docs/archive/pentest-2026/` (kept 5 years for audit trail).
- [ ] Public summary blog post drafted (no findings details, just the
"we did this, here's what we learned" framing). Reviewed by
legal before publish.
## Linked artefacts
- `docs/PENTEST_SCOPE_2026.md` — the technical scope (what's testable)
- `docs/SECURITY_PRELAUNCH_AUDIT.md` — internal Day 21 audit (what we
already cleared)
- `docs/archive/PENTEST_REPORT_VEZA_v0.12.6.md` — last engagement's
report, format reference for what to expect back
- `scripts/pentest/seed-test-accounts.sh` — credential provisioning
helper (creates the 3 staging accounts referenced in the scope)
- `docs/GO_NO_GO_CHECKLIST_v2.0.0_PUBLIC.md` — the row this engagement
unblocks

View file

@ -0,0 +1,150 @@
# Soft-launch beta — pre-flight checklist
> Operational checklist that must reach 100% green before the first
> invitation goes out. Companion to `docs/SOFT_LAUNCH_BETA_2026.md`
> (the bigger picture). This file is purely the "before you press
> send, has every gate been verified?" view.
The whole reason the soft-launch is "soft" is that it lets you catch
infrastructure surprises with 50 testers instead of 50 000. To get
that benefit, the infrastructure has to actually work BEFORE the
invitations land. This checklist is the gate.
## T-72h checklist (3 days before send)
### Database
- [ ] `migrations/990_beta_invites.sql` applied to staging.
Verify with :
```bash
psql "$STAGING_DATABASE_URL" -c "SELECT count(*) FROM beta_invites;"
```
Expected : `0` (table exists, empty).
- [ ] Same migration applied to prod (whenever prod tag goes out).
- [ ] Backup-freshness OK on both environments :
```bash
pgbackrest --stanza=veza info | head -20
```
Most recent full or diff < 24 h old.
### Cohort CSV
- [ ] CSV file built from the operator's chosen sources (mailing list +
contacts + community partners). Format per
`scripts/soft-launch/validate-cohort.sh` header.
- [ ] `validate-cohort.sh` returns exit 0 (or exit 2 with explicit
operator acknowledgement of the warnings).
- [ ] Distribution sanity : `≥ 5` creators, `≥ 20` listeners, `≥ 3`
distinct cohort labels, `≥ 50` total rows.
### Email infrastructure
- [ ] SMTP credentials live in the operator's machine `~/.msmtprc`
(or whatever `SEND_CMD` resolves to).
- [ ] `templates/email/beta_invite.eml.template` reviewed — wording,
cohort variable, code variable.
- [ ] Test send to operator's own email :
```bash
echo "ops@veza.fr,test-cohort,ops@veza.fr" > /tmp/me.csv
DATABASE_URL=$STAGING_DATABASE_URL FRONTEND_URL=https://staging.veza.fr \
SEND=1 bash scripts/soft-launch/send-invitations.sh /tmp/me.csv
```
Verify the eml renders correctly in your mail client (links
clickable, fonts loaded, no `{{TO_ADDR}}` literals leaking).
### Backend invite-redemption path
- [ ] Visit `https://staging.veza.fr/signup?invite=<test-code>`.
Expected : signup form pre-fills the code, refuses to submit
without it, marks the invite as `used_at = NOW()` after success.
- [ ] Try an invalid code → form rejects with a clear error message.
- [ ] Try the same code twice → second attempt rejects (one-time use).
- [ ] Try an expired code → form rejects with "expired".
### Acceptance-gate monitoring
- [ ] Run `monitor-checks.sh` once on staging — every gate either ✅
or ⚪ (unknown), no 🔴.
```bash
DATABASE_URL=$STAGING_DATABASE_URL \
SENTRY_AUTH_TOKEN=... \
PROM_URL=https://prom.veza.fr \
bash scripts/soft-launch/monitor-checks.sh
```
- [ ] Schedule the cron run (or tmux session) so the gate state is
visible during the beta window without manual re-runs.
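A hedged sketch of the tmux variant, using the script's own `LOOP=1` /
`CHECK_INTERVAL` knobs (the interval value and log path are assumptions) :
```bash
tmux new-session -d -s beta-gates \
  "DATABASE_URL=$STAGING_DATABASE_URL LOOP=1 CHECK_INTERVAL=300 \
   bash scripts/soft-launch/monitor-checks.sh | tee -a /tmp/soft-launch-gates.log"
```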
### Communications
- [ ] Discord `#beta-feedback` channel created, ground rules pinned.
- [ ] Typeform feedback form created ; URL pasted into
`templates/email/beta_invite.eml.template` if not already in the
cohort label.
- [ ] Status page maintenance window declared for the duration —
"elevated alerting may occur during beta period."
- [ ] Operators on duty for the day rota'd in the calendar (every 4 h
shift, primary + backup).
## D-day checklist (the day of send)
### Last hour before send
- [ ] Most recent k6 nightly green (within 30 h).
- [ ] No pending high-severity Sentry issue.
- [ ] No PagerDuty incident open.
- [ ] HAProxy + backend healthchecks green :
```bash
curl -s https://staging.veza.fr/api/v1/health | jq .status
```
- [ ] MinIO drives all online ; pgBackRest drill ran successfully in
the last 7 days.
### Send
- [ ] `validate-cohort.sh` exit code 0 (or 2 with explicit override).
- [ ] `send-invitations.sh` in render-only mode (default, no `SEND=1`) :
eml output dir reviewed (invocation sketched after this list).
- [ ] `send-invitations.sh` with `SEND=1` : dispatch.log reviewed
after run, `0` failed dispatches.
- [ ] First three invitees received the email within 5 min (manual
check on three different domains : gmail / proton / one custom).
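The two send checkboxes above map to the script's two phases ; a hedged
invocation sketch (same environment variables as the test send in the
T-72h section, `cohort.csv` being the validated cohort file) :
```bash
bash scripts/soft-launch/send-invitations.sh cohort.csv          # phase 1 : render .eml only
less scripts/soft-launch/out-*/*.eml                             # review before anything is sent
SEND=1 bash scripts/soft-launch/send-invitations.sh cohort.csv   # phase 2 : dispatch via $SEND_CMD
```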
### Hour 1 post-send
- [ ] First sign-up landed (`SELECT count(*) FROM beta_invites WHERE
used_at IS NOT NULL;` returns ≥ 1).
- [ ] No spike in 5xx on Grafana "Veza API Overview".
- [ ] Discord `#beta-feedback` has at least one "I'm in" message.
### Every 4 h during the beta window
- [ ] Re-run `monitor-checks.sh` (or the cron wakes you).
- [ ] Triage any HIGH-severity report within 1 h (per
`docs/SOFT_LAUNCH_BETA_2026.md` §"Issue triage matrix").
- [ ] Update the issues-reported table in
`docs/SOFT_LAUNCH_BETA_2026.md` so the decision call has fresh data.
## D+0 18:00 UTC — decision call
- [ ] Tech lead, product lead, on-call engineer all on the call.
- [ ] `monitor-checks.sh` final run shown live ; verdict screenshotted.
- [ ] Each acceptance-gate row from `SOFT_LAUNCH_BETA_2026.md`
§"Acceptance gate" walked through verbally.
- [ ] Unanimous GO or any one NO-GO documented in the meeting notes.
- [ ] Decision logged in `docs/SOFT_LAUNCH_BETA_2026.md` §"Take-aways".
If GO : the v2.0.0-public tag goes out the next morning.
If NO-GO : the meeting decides scope of fix-pass + new acceptance date.
## Linked artefacts
- `docs/SOFT_LAUNCH_BETA_2026.md` — the bigger picture (cohort
definition, email template inline, day timeline, monitoring list,
acceptance gate, decision protocol)
- `migrations/990_beta_invites.sql` — schema this depends on
- `scripts/soft-launch/validate-cohort.sh` — pre-send sanity check
- `scripts/soft-launch/send-invitations.sh` — batch insert + send
- `scripts/soft-launch/monitor-checks.sh` — live gate poll
- `templates/email/beta_invite.eml.template` — the email recipients
receive
- `docs/GO_NO_GO_CHECKLIST_v2.0.0_PUBLIC.md` — the v2.0.0 checklist
this unblocks

View file

@ -0,0 +1,164 @@
# Runbook — RabbitMQ unavailable
> **Alert** : `RabbitMQUnreachable` (in `config/prometheus/alert_rules.yml`).
> **Owner** : infra on-call.
> **Game-day scenario** : E (`infra/ansible/tests/test_rabbitmq_outage.sh`).
## What breaks when RabbitMQ is down
RabbitMQ is a fan-out broker for asynchronous, non-user-facing work
(transcode jobs, distribution to external platforms, email digests,
DMCA takedown propagation, search index updates). The user-facing
request path does NOT block on RabbitMQ — the API publishes a message
and returns 202 Accepted ; the worker picks it up later.
| Subsystem | Effect when RabbitMQ is gone | Severity |
| ------------------------------------ | ------------------------------------------------------------------ | -------- |
| Track upload → HLS transcode | Upload succeeds (S3 write OK), HLS segments don't appear | **MEDIUM** — track playable via fallback `/stream`, not via HLS |
| Distribution to Spotify/SoundCloud | Submission silently queued ; users see "pending" forever | MEDIUM — surfaces in distribution dashboard, not in player |
| Email digest (weekly creator stats) | Cron tick logs `publish failed`, retries on next tick | LOW — eventual consistency, no user-visible breakage |
| DMCA takedown event | Track flag flipped in DB synchronously ; downstream replay queue stalls | **HIGH** — track is gated immediately (synchronous DB UPDATE), but cache invalidation lags |
| Search index updates | New tracks not searchable until queue drains | LOW — falls back to Postgres FTS |
| Chat messages (WebSocket) | INDEPENDENT — chat is direct WS, no RabbitMQ involvement | NONE |
| Auth, sessions, payments | INDEPENDENT — no RabbitMQ dependency | NONE |
The synchronous-fail-loud cases (DMCA cache invalidation, transcode
queue) are the ones that compound if the outage drags on. Most user
flows degrade gracefully.
## First moves
1. **Confirm RabbitMQ is actually down**, not "unreachable from one
host" :
```bash
curl -s -u "$RMQ_USER:$RMQ_PASS" http://rabbitmq.lxd:15672/api/overview \
| jq '.cluster_name, .object_totals'
```
2. **Confirm what changed.** If a deploy fired in the last 30 min,
suspect the deploy. Check `journalctl -u veza-backend-api -n 200`
for `amqp` errors with timestamps after the deploy.
3. **Check the queues didn't fill the disk** (most common bring-down
in development) :
```bash
ssh rabbitmq.lxd 'df -h /var/lib/rabbitmq'
```
## RabbitMQ instance is down
```bash
# State on the RabbitMQ host :
ssh rabbitmq.lxd sudo systemctl status rabbitmq-server
# Logs (Erlang verbosity, grep for ERROR/CRASH) :
ssh rabbitmq.lxd sudo journalctl -u rabbitmq-server -n 500 \
| grep -E 'ERROR|CRASH|disk_alarm|memory_alarm'
```
Common causes :
- **Disk alarm.** `/var/lib/rabbitmq` filled — RabbitMQ pauses producers
when free space drops below `disk_free_limit`. The backend's amqp
client surfaces this as "blocked". Fix : grow the disk or expire old
messages with `rabbitmqctl purge_queue <queue>` (last resort, you
lose what's in there).
- **Memory alarm.** RSS over `vm_memory_high_watermark` × system mem.
Same effect (producers blocked). Fix : add memory or unblock by
draining a slow consumer.
- **Process crashed.** Erlang OOM, segfault. `sudo systemctl restart
rabbitmq-server` ; the queues survive (durable=true on every queue
we declare).
- **Cluster split-brain.** v1.0 is single-node, so this can't happen
yet. Listed for the v1.1 multi-node config.
## Backend can't reach RabbitMQ
Network or DNS issue, not RabbitMQ's fault.
```bash
# From the API container :
nc -zv rabbitmq.lxd 5672
# DNS :
getent hosts rabbitmq.lxd
# AMQP credentials :
docker exec veza_backend_api env | grep AMQP_URL
```
Likely culprits : Incus bridge restart, password rotation didn't
propagate to the API container's env, security-group change.
## Mitigation while RabbitMQ is down
The backend already handles publish failures gracefully :
- `internal/eventbus/rabbitmq.go` retries with exponential backoff up
to 30s, then drops to "degraded mode" (publish returns immediately
with a logged warning, the API call succeeds, the side-effect is
lost).
- Workers in `internal/workers/` have `WithRetry()` middleware that
republishes failed deliveries up to 5 times before dead-lettering.
If recovery is going to take > 10 min, set
`EVENTBUS_DEGRADED_LOG_LEVEL=error` (default `warn`) so the
fail-fast logs land in Sentry and operators can audit which messages
were dropped.
**Do NOT** restart the backend to clear the AMQP connection pool ;
the reconnect logic (`go.uber.org/zap`-logged in eventbus.go:142)
handles it once RabbitMQ is back.
## Recovery
Once RabbitMQ is back up :
1. Verify connectivity from each backend instance :
```bash
docker exec veza_backend_api sh -c 'echo -e "AMQP\x00\x00\x09\x01" | nc -w1 rabbitmq.lxd 5672 | head -c 4'
```
Should return `AMQP`.
2. Watch the queue depth on the management UI :
`http://rabbitmq.lxd:15672/#/queues`. Expect `transcode_jobs`,
`distribution_outbox`, `dmca_propagation`, `search_index_updates`
to drain over the next 5-15 min as the workers catch up (a CLI
alternative is sketched after this list).
3. If a queue is stuck > 30 min after recovery, the worker for it is
wedged — restart that specific worker container :
```bash
docker compose -f docker-compose.prod.yml restart worker-<name>
```
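A CLI alternative to the management UI for the same queue-depth check
(queue names as in step 2 above) :
```bash
ssh rabbitmq.lxd 'sudo rabbitmqctl list_queues name messages consumers' \
  | grep -E 'transcode_jobs|distribution_outbox|dmca_propagation|search_index_updates'
```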
## Audit after the outage
1. Sentry filter `tag:eventbus.status=degraded` between outage start
and end — gives you the count and shape of dropped events.
2. For each dropped DMCA event, manually trigger the cache flush :
```bash
curl -X POST -H "Authorization: Bearer $ADMIN_TOKEN" \
https://api.veza.fr/api/v1/admin/cache/dmca/flush
```
3. For each dropped transcode job, requeue from the orders table :
```bash
psql "$DATABASE_URL" -c "
INSERT INTO transcode_jobs (track_id, status, attempts, created_at)
SELECT id, 'pending', 0, NOW() FROM tracks
WHERE created_at BETWEEN '<outage_start>' AND '<outage_end>'
AND hls_status IS NULL;
"
```
## Postmortem trigger
Any RabbitMQ outage > 10 min triggers a postmortem. The non-user-facing
nature makes this less urgent than Redis or Postgres, but the
silent-failure modes (dropped DMCA propagation, missing transcodes)
warrant a write-up so we know what slipped through.
## Future-proofing
- v1.1 will move to a 3-node RabbitMQ cluster behind a load balancer
for HA. This runbook will then split into "single-node down" (the
cluster keeps serving) and "cluster split-brain" (rare, but the
recovery path is different).
- Worker idempotency keys are documented in `docs/api/eventbus.md` ;
any new worker MUST honour them so a replay during recovery doesn't
double-charge / double-distribute / double-takedown.

View file

@ -0,0 +1 @@
../group_vars

View file

@ -20,6 +20,16 @@ all:
ansible_user: senke
ansible_python_interpreter: /usr/bin/python3
children:
# Env-named meta-group — see inventory/staging.yml for rationale.
prod:
children:
incus_hosts:
forgejo_runner:
haproxy:
veza_app_backend:
veza_app_stream:
veza_app_web:
veza_data:
incus_hosts:
hosts:
veza-prod:

View file

@ -36,6 +36,18 @@ all:
ansible_user: senke
ansible_python_interpreter: /usr/bin/python3
children:
# Env-named meta-group : every host below is also in `staging`,
# which makes group_vars/staging.yml apply (Ansible matches
# group_vars file names against group names).
staging:
children:
incus_hosts:
forgejo_runner:
haproxy:
veza_app_backend:
veza_app_stream:
veza_app_web:
veza_data:
incus_hosts:
hosts:
veza-staging:

View file

@ -18,14 +18,28 @@
become: true
gather_facts: true
tasks:
- name: Launch veza-haproxy container if absent
- name: Launch / repair veza-haproxy container
# Idempotent : RUNNING → no-op ; STOPPED/half-baked → recreate ;
# absent → fresh launch. Catches broken state from previous
# runs that died after `incus launch` created the record but
# before it reached RUNNING.
ansible.builtin.shell:
cmd: |
set -e
if incus info veza-haproxy >/dev/null 2>&1; then
echo "veza-haproxy already exists"
STATE=$(incus list veza-haproxy -f csv -c s 2>/dev/null | head -1 || true)
case "$STATE" in
RUNNING)
echo "veza-haproxy RUNNING already"
exit 0
fi
;;
"")
# No record — fresh launch.
;;
*)
echo "veza-haproxy in state '$STATE' — recreating"
incus delete --force veza-haproxy
;;
esac
incus launch "{{ veza_app_base_image | default('images:debian/13') }}" veza-haproxy --profile veza-app --network "{{ veza_incus_network | default('net-veza') }}"
for _ in $(seq 1 30); do
if incus exec veza-haproxy -- /bin/true 2>/dev/null; then
@ -35,21 +49,54 @@
done
incus exec veza-haproxy -- apt-get update
incus exec veza-haproxy -- apt-get install -y python3 python3-apt
echo "veza-haproxy LAUNCHED"
executable: /bin/bash
register: provision_result
changed_when: "'incus launch' in provision_result.stdout"
changed_when: "'LAUNCHED' in provision_result.stdout or 'recreating' in provision_result.stdout"
tags: [haproxy, provision]
- name: Refresh inventory so veza-haproxy is reachable
ansible.builtin.meta: refresh_inventory
- name: Apply common baseline (SSH hardening, fail2ban, node_exporter)
hosts: haproxy
become: true
gather_facts: true
roles:
- common
# Incus proxy devices : forward the host's :80 / :443 to the
# container's :80 / :443. Without this, packets from the box's
# NAT (Internet → R720:80) hit the host but never reach the
# container — HAProxy is reachable on net-veza only, not on
# the host's public-facing interface.
- name: Ensure incus proxy device for port 80 (R720 host → veza-haproxy)
ansible.builtin.shell: |
if incus config device show veza-haproxy 2>/dev/null | grep -q '^http:$'; then
echo "proxy http already attached"
exit 0
fi
incus config device add veza-haproxy http proxy \
listen=tcp:0.0.0.0:80 \
connect=tcp:127.0.0.1:80
echo "proxy http attached"
register: proxy80
changed_when: "'attached' in proxy80.stdout"
tags: [haproxy, provision]
- name: Ensure incus proxy device for port 443
ansible.builtin.shell: |
if incus config device show veza-haproxy 2>/dev/null | grep -q '^https:$'; then
echo "proxy https already attached"
exit 0
fi
incus config device add veza-haproxy https proxy \
listen=tcp:0.0.0.0:443 \
connect=tcp:127.0.0.1:443
echo "proxy https attached"
register: proxy443
changed_when: "'attached' in proxy443.stdout"
tags: [haproxy, provision]
# Common role intentionally NOT applied to the haproxy container :
# it's reached via `incus exec` (no SSH inside), and the role's
# SSH-hardening / fail2ban / node_exporter setup assumes a full
# host (sshd present, auth.log to monitor, exposed metrics port).
# Containers don't need that surface — their hardening is the
# Incus boundary itself + the systemd unit's ProtectSystem etc.
- name: Install + configure HAProxy + dehydrated/Let's Encrypt
hosts: haproxy
become: true

View file

@ -2,7 +2,25 @@
# whitelist of users. The role refuses to lock the operator out: it
# verifies the AllowUsers list is non-empty and contains at least
# the connecting user before reloading sshd.
#
# Skipped entirely when sshd is not installed on the target — useful
# for Incus containers reached via `incus exec`, which don't need
# SSH at all (overlay set common_apply_ssh_hardening=false to skip
# explicitly even when sshd happens to be present).
---
- name: Detect whether sshd is present on the target
ansible.builtin.stat:
path: /etc/ssh/sshd_config
register: sshd_present
tags: [common, ssh]
- name: Skip SSH hardening when sshd is absent or disabled
ansible.builtin.debug:
msg: "sshd not installed on this host — SSH hardening skipped"
when:
- not sshd_present.stat.exists or not (common_apply_ssh_hardening | default(true))
tags: [common, ssh]
- name: Sanity check — ssh_allow_users must be non-empty
ansible.builtin.assert:
that:
@ -12,6 +30,9 @@
ssh_allow_users is empty. Refusing to apply sshd_config which
would lock everyone out. Set ssh_allow_users in
group_vars/all.yml (or override per environment).
when:
- sshd_present.stat.exists
- common_apply_ssh_hardening | default(true)
- name: Render sshd_config drop-in (50-veza-hardening.conf)
ansible.builtin.template:
@ -22,9 +43,15 @@
mode: "0644"
validate: /usr/sbin/sshd -t -f %s
notify: Reload sshd
when:
- sshd_present.stat.exists
- common_apply_ssh_hardening | default(true)
- name: Ensure sshd is enabled + running
ansible.builtin.service:
name: ssh
state: started
enabled: true
when:
- sshd_present.stat.exists
- common_apply_ssh_hardening | default(true)

View file

@ -0,0 +1,50 @@
-----BEGIN PRIVATE KEY-----
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQCgyerZjp1+RxU8
/bISXduo8OjR2ejl5SD034PyQvT5B9tk83yplplHoG+JL78UGqpflPlhU9fQSoT9
Walusf/MDDCEbQ75sjPui+yNuvcgWkmpN0MUdOHR8gvfiADCR6/eDQuRf7JJh5N8
YdCtLtnOYsha7Bix+bN11GO6XzPG869I/UGdg4g0v7LvDCP3tI0tpno+y4MuiDvJ
R1pQd7sl6jxPp4zvNtVw8vrSVA3qJ8G6F78nnPUUPFnrAlUFNcnMVLamxY0IA3H4
n9o7X73RnphrpcnPr6eyEYxOL0UGhsDMsQxTrhSaOErL68QDTk3hV60SxWqsVlxX
/DoKAb9VAgMBAAECggEAenTt6V3Fsxv+H+Jz0assFYHNP63/w797FyR4QHUgT93d
CQisRBjPio61A72agHxCj+NM/wQ1FIz8tluoQAdO8x/Bf8nzotZG2QI2Wkcv2bMJ
8NeGvji6mAQJaOgS8+RXG/3BdsHTjk60VAHHRW6uMZJoV18C++FZ/X6RqarCK13N
UEfHX529qNvLhw+xkjXFW/qiB3dQTTEJq+9y0U4nGrjZCXtspkXN3g6ETU6Svzhq
z4tq0udC7FjZPqdA79ChXweZlDCq89FQfxAnxRoZAiwymK91VrGz/GyMIwdBPidm
+or8Rk6nodKk8AuwsGE6ub9UhWUS+Kdpl9fNcV1jLQKBgQDRA7D786sf25tgyooF
6IMZwQfHWGmIepUPruHLz5aV6ozO8XQBgEN4XBI15mxJTu+eeXGbqOhwwuhvYR9u
G02qPE0OlftBRnBJp2AH5+gRphLyrRAvgnjVw323ucnsjOzO0TPwdehomKC0J3b9
B+hZ2tKW/nNxqX/iU1ue969lAwKBgQDE7vJnppvAZLSMo4PCtBTJm11u58AZ9LyZ
6dxvpiq6XxPw9DcC2gj91pCST2g4vIqDYQgmh5U3RzMIFsKLtKfDvHEAYbFOnEfz
UXoNFjlCEmB2jHgpn51/ZDokpPSF9MooDUFna0JPaUrduHs8Zzv7kfrsAhq2N++C
eB+jMea+xwKBgESDzEFbB85io5Vf70yugkMv9ofPIJD/ddt1PUkdHES6ZTv1BEz1
qahLriCDDx4cxQmSz73x6XgFPEI+eRoT0yqpp6zPV1R3bZmHR0BwMa+PXAi22GZq
g4e3FH/kZB+ptnq5MyhwziVzWsKTaTram7zQsVWTxW4N3QDoyFDc6l7XAoGBAI85
+bLIyZ4zn9xpT/rbXgMCrAFtK5m1FTYbj+bjw0+otqgX9aptSPzUgHDor7QT6+mB
OJxNH4kEj2jipLtWuGzzMHxGkN3La8jbCRlbgGk9VErj/sDHBZURH/hmwDBsyFo4
ycidiayXt4tqELbtngJpOUVMgoDkTZ1mIBxgvqEhAoGBAK6uX4k2xiOQorpByvjd
gT16MbuntXO/bDXnXaq1keNMr1JzQ5aS346XweiUgRG7ZJdEb2C8sXwSmh2+oeGa
G+QCLH73hwo/PWbU560dFY8s6z5E79WBjYUu5+1/a0SCBwQ4mEVB7REQVY1mQoJT
A+A8WW+EDvaPpVFujA26K3fc
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
MIIDjTCCAnWgAwIBAgIUbgZuZRFj8M8ZcdhRFikB2bJKswYwDQYJKoZIhvcNAQEL
BQAwVjELMAkGA1UEBhMCWFgxFTATBgNVBAcMDERlZmF1bHQgQ2l0eTEcMBoGA1UE
CgwTRGVmYXVsdCBDb21wYW55IEx0ZDESMBAGA1UEAwwJbG9jYWxob3N0MB4XDTIy
MDQwODEwMTA0OFoXDTQ5MDgyNDEwMTA0OFowVjELMAkGA1UEBhMCWFgxFTATBgNV
BAcMDERlZmF1bHQgQ2l0eTEcMBoGA1UECgwTRGVmYXVsdCBDb21wYW55IEx0ZDES
MBAGA1UEAwwJbG9jYWxob3N0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC
AQEAoMnq2Y6dfkcVPP2yEl3bqPDo0dno5eUg9N+D8kL0+QfbZPN8qZaZR6BviS+/
FBqqX5T5YVPX0EqE/VmpbrH/zAwwhG0O+bIz7ovsjbr3IFpJqTdDFHTh0fIL34gA
wkev3g0LkX+ySYeTfGHQrS7ZzmLIWuwYsfmzddRjul8zxvOvSP1BnYOINL+y7wwj
97SNLaZ6PsuDLog7yUdaUHe7Jeo8T6eM7zbVcPL60lQN6ifBuhe/J5z1FDxZ6wJV
BTXJzFS2psWNCANx+J/aO1+90Z6Ya6XJz6+nshGMTi9FBobAzLEMU64UmjhKy+vE
A05N4VetEsVqrFZcV/w6CgG/VQIDAQABo1MwUTAdBgNVHQ4EFgQUJZDike5gfaOV
k8uCwfCh2OrPXd0wHwYDVR0jBBgwFoAUJZDike5gfaOVk8uCwfCh2OrPXd0wDwYD
VR0TAQH/BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAQbXAIBoDHQakksvKGo3X
/bIyc+IQKFpsyWrn5GvS69wTE7XBfKLtyY3X8NygvsCaRx0r2OIdVERNjrhELkes
tWQE17D1+tDnsaEQRUNJsjBYmealNPpqqacdRlBNnkTSGM/3d3m/ihlA51A1QzyI
IOtKxRRIZ+24L/eww5Hv96ub3Wu4rVmepXP4cVIcPEnN6ntmOv4Ja/M83hLI2oXy
4XmXOVsyliYDGWiyvT2U3LcRsv9PHr09SqYO/5yW+fYC7diLGSHW0kfwht2Q8Zqg
IFMJMDmmKTbCWCmFYdoVTRm2fFl0YvgpC5JrXuSloHh3hRiLwDIUiTxlTM3JDP8q
PQ==
-----END CERTIFICATE-----

View file

@ -26,6 +26,29 @@
mode: "0750"
tags: [haproxy, config]
# Chicken-and-egg : haproxy.cfg.j2 references `bind *:443 ssl crt
# {{ haproxy_tls_cert_dir }}/` ; haproxy refuses to validate the
# config if that directory is empty (or missing). dehydrated creates
# real LE certs there LATER (in letsencrypt.yml). Break the cycle
# the same way the working roles in
# /home/senke/Documents/TG__Talas_Group/.../roles/haproxy do : ship a
# checked-in `selfsigned.pem` and copy it into the cert dir.
# Once dehydrated lands real certs alongside, SNI picks the matching
# real cert ; selfsigned.pem only matches CN=localhost (harmless).
- name: Ensure {{ haproxy_tls_cert_dir }} exists
ansible.builtin.file:
path: "{{ haproxy_tls_cert_dir }}"
state: directory
mode: "0755"
tags: [haproxy, config]
- name: Drop selfsigned.pem so haproxy can validate the cfg
ansible.builtin.copy:
src: selfsigned.pem
dest: "{{ haproxy_tls_cert_dir }}/selfsigned.pem"
mode: "0640"
tags: [haproxy, config]
- name: Render haproxy.cfg
ansible.builtin.template:
src: haproxy.cfg.j2
@ -33,7 +56,10 @@
owner: root
group: haproxy
mode: "0640"
validate: "haproxy -f %s -c -q"
# No -q so the actual validation error reaches the operator's
# console. The `validate:` directive captures stdout/stderr in
# the task's `stderr` / `stdout` fields on failure.
validate: "haproxy -f %s -c"
register: haproxy_config
notify: Reload haproxy
tags: [haproxy, config]

View file

@ -41,6 +41,28 @@ defaults
timeout http-request 10s
load-server-state-from-file global
# -----------------------------------------------------------------------
# DNS resolvers — Incus's managed bridges expose a built-in DNS
# resolver on the gateway IP for the bridge's subnet (10.0.20.1 for
# net-veza). Backend containers' .lxd hostnames resolve here.
# init-addr last,libc,none on default-server lets HAProxy start
# even if the backends don't exist yet ; servers go into MAINT
# until the resolver returns an address (deploy_app.yml creates
# them later, then `incus-resolver` task in HAProxy picks them up
# automatically — no haproxy reload needed).
# -----------------------------------------------------------------------
resolvers veza_dns
nameserver incus_gw 10.0.20.1:53
accepted_payload_size 4096
resolve_retries 3
timeout resolve 1s
timeout retry 1s
hold valid 10s
hold nx 5s
hold timeout 5s
hold refused 5s
hold obsolete 30s
# -----------------------------------------------------------------------
# Stats endpoint — bound to loopback only ; the Prometheus haproxy
# exporter sidecar scrapes it.
@ -63,9 +85,12 @@ frontend veza_http_in
bind *:{{ haproxy_listen_https }} ssl crt {{ haproxy_tls_cert_dir }}/ alpn h2,http/1.1
http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
# Let dehydrated's HTTP-01 challenges through unencrypted before any redirect.
# Order matters : http-request rules must come BEFORE use_backend
# rules in HAProxy ; otherwise haproxy 3.x warns and processes them
# in the unintended order.
acl acme_challenge path_beg /.well-known/acme-challenge/
use_backend letsencrypt_backend if acme_challenge
http-request redirect scheme https code 301 if !{ ssl_fc } !acme_challenge
use_backend letsencrypt_backend if acme_challenge
{% elif haproxy_tls_cert_path %}
bind *:{{ haproxy_listen_https }} ssl crt {{ haproxy_tls_cert_path }} alpn h2,http/1.1
http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
@ -146,7 +171,7 @@ backend {{ env }}_backend_api
option httpchk GET {{ veza_healthcheck_paths.backend | default('/api/v1/health') }}
http-check expect status 200
cookie {{ haproxy_sticky_cookie_name }}_{{ env }} insert indirect nocache httponly secure
default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s
default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s init-addr last,libc,none resolvers veza_dns
server {{ env }}_backend_blue {{ prefix }}backend-blue.{{ veza_incus_dns_suffix }}:{{ veza_backend_port }} cookie {{ env }}_backend_blue {{ '' if _active == 'blue' else 'backup' }}
server {{ env }}_backend_green {{ prefix }}backend-green.{{ veza_incus_dns_suffix }}:{{ veza_backend_port }} cookie {{ env }}_backend_green {{ '' if _active == 'green' else 'backup' }}
@ -157,7 +182,7 @@ backend {{ env }}_stream_pool
option httpchk GET {{ veza_healthcheck_paths.stream | default('/health') }}
http-check expect status 200
timeout tunnel 1h
default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s
default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s init-addr last,libc,none resolvers veza_dns
server {{ env }}_stream_blue {{ prefix }}stream-blue.{{ veza_incus_dns_suffix }}:{{ veza_stream_port }} {{ '' if _active == 'blue' else 'backup' }}
server {{ env }}_stream_green {{ prefix }}stream-green.{{ veza_incus_dns_suffix }}:{{ veza_stream_port }} {{ '' if _active == 'green' else 'backup' }}
@ -166,7 +191,7 @@ backend {{ env }}_web_pool
balance roundrobin
option httpchk GET {{ veza_healthcheck_paths.web | default('/') }}
http-check expect status 200
default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s
default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s init-addr last,libc,none resolvers veza_dns
server {{ env }}_web_blue {{ prefix }}web-blue.{{ veza_incus_dns_suffix }}:{{ veza_web_port }} {{ '' if _active == 'blue' else 'backup' }}
server {{ env }}_web_green {{ prefix }}web-green.{{ veza_incus_dns_suffix }}:{{ veza_web_port }} {{ '' if _active == 'green' else 'backup' }}
@ -174,11 +199,17 @@ backend {{ env }}_web_pool
{% if haproxy_forgejo_host %}
# --- Forgejo (managed outside the deploy pipeline) --------------------
# The existing forgejo container exposes HTTPS on :3000 with a
# self-signed cert. We re-encrypt to it (ssl verify none) ; the
# operator's WireGuard mesh is the trust boundary, the cert chain
# is irrelevant. Healthcheck adapted to send a Host: header so
# Forgejo's reverse-proxy validation accepts the request.
backend forgejo_backend
option httpchk GET /
http-check expect status 200
option httpchk
http-check send meth GET uri / ver HTTP/1.1 hdr Host {{ haproxy_forgejo_host }}
http-check expect rstatus ^[23]
default-server check inter 10s fall 3 rise 2
server forgejo {{ haproxy_forgejo_backend }}
server forgejo {{ haproxy_forgejo_backend }} ssl verify none sni str({{ haproxy_forgejo_host }})
{% endif %}
{% if haproxy_talas_hosts %}

View file

@ -42,6 +42,17 @@ OPERATOR_EMAIL=${OPERATOR_EMAIL:-?}
OPERATOR_PASSWORD=${OPERATOR_PASSWORD:-?}
ORDER_POLL_TIMEOUT=${ORDER_POLL_TIMEOUT:-300}
ORDER_POLL_INTERVAL=${ORDER_POLL_INTERVAL:-5}
# v1.0.10 polish safety guards:
# DRY_RUN=1 — skip the POST /orders + payment steps; rehearse
# the login + product-listing + license-poll path
# end-to-end on staging without spending a euro.
# CONFIRM_PRODUCTION=1 — required when STAGING_URL points at the live
# environment. Without it the script refuses to
# run, so a typo (e.g. `STAGING_URL=https://veza.fr`
# on a command meant for the sandbox) can't
# accidentally charge a real card.
DRY_RUN=${DRY_RUN:-0}
CONFIRM_PRODUCTION=${CONFIRM_PRODUCTION:-0}
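# Example of a staging rehearsal that exercises the read-only path without
# any charge (values illustrative) :
# DRY_RUN=1 STAGING_URL=https://staging.veza.fr \
# OPERATOR_EMAIL=ops@veza.fr OPERATOR_PASSWORD=... \
# bash scripts/payment-e2e-walkthrough.sh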
SESSION_DATE="$(date +%Y%m%d-%H%M)"
SESSION_LOG="${REPO_ROOT}/docs/PAYMENT_E2E_LIVE_REPORT.md.session-${SESSION_DATE}.log"
@ -64,6 +75,43 @@ require jq
[ "$OPERATOR_EMAIL" = "?" ] && fail "OPERATOR_EMAIL env var required" 3
[ "$OPERATOR_PASSWORD" = "?" ] && fail "OPERATOR_PASSWORD env var required" 3
# Heuristic: any URL that doesn't contain "staging", "localhost" or
# "127.0.0.1" is treated as production. Operators on a non-veza domain
# (custom env) can still run the script; they just have to pass
# CONFIRM_PRODUCTION=1.
TARGET_LOOKS_LIKE_PROD=0
if [[ ! "$STAGING_URL" =~ staging ]] && [[ ! "$STAGING_URL" =~ localhost ]] && [[ ! "$STAGING_URL" =~ 127\.0\.0\.1 ]]; then
TARGET_LOOKS_LIKE_PROD=1
fi
if [ "$TARGET_LOOKS_LIKE_PROD" = "1" ] && [ "$CONFIRM_PRODUCTION" != "1" ]; then
cat >&2 <<EOF
================================================================
ABORTING — production target detected without explicit confirmation
================================================================
STAGING_URL=$STAGING_URL does not contain "staging", "localhost" or
"127.0.0.1", so this script will refuse to run by default to prevent
an accidental real-card charge.
If you genuinely want to run against production, re-invoke with:
CONFIRM_PRODUCTION=1 \\
STAGING_URL=$STAGING_URL \\
OPERATOR_EMAIL=$OPERATOR_EMAIL \\
OPERATOR_PASSWORD=... \\
bash scripts/payment-e2e-walkthrough.sh
Or set DRY_RUN=1 to rehearse the flow without making the actual charge.
================================================================
EOF
exit 3
fi
if [ "$DRY_RUN" = "1" ]; then
log "DRY_RUN=1 — order creation + payment + refund steps will be SKIPPED"
fi
# api wrapper that tee's request + response to the session log so the
# operator can copy-paste the full trace into the report.
api() {
@ -134,8 +182,39 @@ log " ✓ price : $PRODUCT_PRICE"
# --------------------------------------------------------------------
# Step 3 : POST /orders.
# --------------------------------------------------------------------
if [ "$DRY_RUN" = "1" ]; then
log ""
log "step 3 : POST /api/v1/marketplace/orders — SKIPPED (dry-run)"
log "================================================================"
log "DRY-RUN PASS : login + product list + license-mine endpoints reached"
log "Run without DRY_RUN to exercise the real charge + refund flow."
log "================================================================"
exit 0
fi
log ""
log "step 3 : POST /api/v1/marketplace/orders"
# v1.0.10 polish: confirm prompt before the actual charge so a typo'd
# product_id or wrong operator account can't quietly burn 5 EUR.
if [ "$TARGET_LOOKS_LIKE_PROD" = "1" ]; then
log ""
log "================================================================"
log "FINAL CONFIRMATION — about to charge a real card on production"
log "================================================================"
log " product_id : $PRODUCT_ID"
log " price : $PRODUCT_PRICE"
log " operator : $OPERATOR_EMAIL"
log " endpoint : ${STAGING_URL}/api/v1/marketplace/orders"
log ""
prompt "Type the literal word 'CHARGE' to proceed (anything else aborts) :"
read -r confirm_word
if [ "$confirm_word" != "CHARGE" ]; then
fail "operator did not confirm the charge ($confirm_word) — aborting" 2
fi
log " operator confirmed CHARGE — proceeding"
fi
order_body="{\"items\":[{\"product_id\":\"${PRODUCT_ID}\"}]}"
order_resp=$(api POST /api/v1/marketplace/orders "$order_body" 2>/dev/null)
ORDER_ID=$(echo "$order_resp" | jq -r '.data.order.id // .data.id // .id // ""')

View file

@ -0,0 +1,191 @@
#!/usr/bin/env bash
# seed-test-accounts.sh — provision the 3 pentester accounts on a target
# environment (staging only ; refuses to run against prod).
#
# Per docs/PENTEST_SCOPE_2026.md §"Authentication context", an external
# pentest engagement needs three pre-seeded accounts (listener, creator,
# admin). This script :
#
# 1. Generates a 32-char random password for each role.
# 2. Calls the staging admin API to create / reset each account.
# 3. Promotes the creator account to the creator role and the admin
# account to the admin role via a direct DB UPDATE, because the
# public API doesn't expose role changes ; the operator runs that
# step from a maintenance shell (sketch at the end of this header).
# 4. Writes a 1Password import JSON to stdout so the operator can
# `op item template` it into the shared vault. NEVER prints
# passwords to the screen.
#
# Usage :
# bash scripts/pentest/seed-test-accounts.sh staging
#
# Output :
# 1Password JSON on stdout (3 entries). Pipe into a file, then
# `op item create --vault Pentest-2026 - < file.json`.
#
# Exit codes :
# 0 — three accounts provisioned, JSON emitted
# 1 — API call failed (account creation or login probe)
# 2 — wrong target environment (e.g. operator passed "prod")
# 3 — required env var or tool missing
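#
# A sketch of that maintenance-shell promotion step. The role column and
# values are assumptions ; adapt to the real users schema before running :
#
# psql "$DATABASE_URL" -c \
# "UPDATE users SET role = 'creator' WHERE email = 'pentest-2026-creator@veza.fr';"
# psql "$DATABASE_URL" -c \
# "UPDATE users SET role = 'admin' WHERE email = 'pentest-2026-admin@veza.fr';"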
set -euo pipefail
ENV_NAME=${1:-}
if [ -z "$ENV_NAME" ]; then
cat >&2 <<EOF
usage : bash scripts/pentest/seed-test-accounts.sh <env>
env : staging (the only accepted value — prod is refused)
Required env vars :
STAGING_URL base URL (e.g. https://staging.veza.fr)
STAGING_ADMIN_EMAIL admin who creates the accounts
STAGING_ADMIN_PASSWORD admin password (provisioning cred only)
Output :
1Password import JSON for vault Pentest-2026, on stdout.
Passwords are NEVER printed to the operator's screen.
EOF
exit 3
fi
if [ "$ENV_NAME" != "staging" ]; then
echo "ERROR: this script refuses to run against any env other than 'staging'." >&2
echo " Pentest accounts on production violate the engagement scope." >&2
exit 2
fi
STAGING_URL=${STAGING_URL:-?}
STAGING_ADMIN_EMAIL=${STAGING_ADMIN_EMAIL:-?}
STAGING_ADMIN_PASSWORD=${STAGING_ADMIN_PASSWORD:-?}
[ "$STAGING_URL" = "?" ] && { echo "STAGING_URL required" >&2; exit 3; }
[ "$STAGING_ADMIN_EMAIL" = "?" ] && { echo "STAGING_ADMIN_EMAIL required" >&2; exit 3; }
[ "$STAGING_ADMIN_PASSWORD" = "?" ] && { echo "STAGING_ADMIN_PASSWORD required" >&2; exit 3; }
command -v curl >/dev/null 2>&1 || { echo "curl required" >&2; exit 3; }
command -v jq >/dev/null 2>&1 || { echo "jq required" >&2; exit 3; }
command -v openssl >/dev/null 2>&1 || { echo "openssl required (password generation)" >&2; exit 3; }
genpass() {
# 32-char password cut from the base64 of 32 bytes of entropy. '=', '/'
# and '+' are stripped so the result can sit in a JSON string or URL
# without escaping, while still leaving at least 32 usable characters.
openssl rand -base64 32 | tr -d '\n=/+' | cut -c-32
}
# 1. login as the staging admin so we can call the create-user endpoint.
admin_login_resp=$(curl -ksS --max-time 15 \
-X POST -H 'Content-Type: application/json' \
-d "{\"email\":\"${STAGING_ADMIN_EMAIL}\",\"password\":\"${STAGING_ADMIN_PASSWORD}\",\"remember_me\":false}" \
"${STAGING_URL}/api/v1/auth/login")
admin_token=$(echo "$admin_login_resp" | jq -r '.data.token.access_token // .token.access_token // ""')
if [ -z "$admin_token" ] || [ "$admin_token" = "null" ]; then
echo "ERROR: admin login failed" >&2
echo "$admin_login_resp" >&2
exit 1
fi
provision() {
# provision <role> <email-prefix>
# Returns : password (stdout), nothing else.
local role=$1 email_prefix=$2
local email="${email_prefix}@veza.fr"
local password
password=$(genpass)
# Try creating ; if 409 (already exists), reset password instead. Both
# paths return a valid (email, password) tuple at the end.
local create_resp create_status
create_resp=$(curl -ksS --max-time 15 \
-H "Authorization: Bearer ${admin_token}" \
-H 'Content-Type: application/json' \
-X POST \
-d "{\"email\":\"${email}\",\"password\":\"${password}\",\"username\":\"${email_prefix}\",\"role\":\"${role}\"}" \
-w '\nHTTP_CODE=%{http_code}' \
"${STAGING_URL}/api/v1/admin/users")
create_status=$(echo "$create_resp" | grep -oE 'HTTP_CODE=[0-9]+' | tail -1 | cut -d= -f2)
case "$create_status" in
200|201)
;;
409)
# Account exists — reset password instead.
curl -ksS --max-time 15 \
-H "Authorization: Bearer ${admin_token}" \
-H 'Content-Type: application/json' \
-X POST \
-d "{\"email\":\"${email}\",\"new_password\":\"${password}\"}" \
"${STAGING_URL}/api/v1/admin/users/reset-password" >/dev/null
;;
*)
echo "ERROR: provisioning ${role} failed with HTTP ${create_status}" >&2
echo "$create_resp" >&2
exit 1
;;
esac
# Probe : login as the freshly-set account so we know the engagement
# can use it.
probe=$(curl -ksS --max-time 15 \
-X POST -H 'Content-Type: application/json' \
-d "{\"email\":\"${email}\",\"password\":\"${password}\",\"remember_me\":false}" \
"${STAGING_URL}/api/v1/auth/login")
probe_token=$(echo "$probe" | jq -r '.data.token.access_token // .token.access_token // ""')
if [ -z "$probe_token" ] || [ "$probe_token" = "null" ]; then
echo "ERROR: ${role} login probe failed — provisioning broken" >&2
exit 1
fi
printf '%s' "$password"
}
# 2. provision the three roles. Passwords stay in shell variables — no
# echo, no log, no temp file.
listener_pwd=$(provision "user" "pentest-2026-listener")
creator_pwd=$(provision "creator" "pentest-2026-creator")
admin_pwd=$(provision "admin" "pentest-2026-admin")
# 3. emit 1Password JSON template. Each entry has the role + login URL
# in Notes so the pentester knows which account does what.
cat <<EOF
[
{
"title": "pentest-2026-listener",
"category": "LOGIN",
"vault": {"name": "Pentest-2026"},
"fields": [
{"id": "username", "type": "STRING", "value": "pentest-2026-listener@veza.fr"},
{"id": "password", "type": "CONCEALED", "value": "${listener_pwd}"},
{"id": "url", "type": "URL", "value": "${STAGING_URL}/login"},
{"id": "notesPlain", "type": "STRING", "value": "Pentest 2026 — listener role. Engagement: see PENTEST_SCOPE_2026.md. Rotate at engagement end."}
]
},
{
"title": "pentest-2026-creator",
"category": "LOGIN",
"vault": {"name": "Pentest-2026"},
"fields": [
{"id": "username", "type": "STRING", "value": "pentest-2026-creator@veza.fr"},
{"id": "password", "type": "CONCEALED", "value": "${creator_pwd}"},
{"id": "url", "type": "URL", "value": "${STAGING_URL}/login"},
{"id": "notesPlain", "type": "STRING", "value": "Pentest 2026 — creator role. Owns 5 seed tracks. Rotate at engagement end."}
]
},
{
"title": "pentest-2026-admin",
"category": "LOGIN",
"vault": {"name": "Pentest-2026"},
"fields": [
{"id": "username", "type": "STRING", "value": "pentest-2026-admin@veza.fr"},
{"id": "password", "type": "CONCEALED", "value": "${admin_pwd}"},
{"id": "url", "type": "URL", "value": "${STAGING_URL}/login"},
{"id": "notesPlain", "type": "STRING", "value": "Pentest 2026 — admin role + MFA bypass. DO NOT use for non-pentest activity. Rotate at engagement end."}
]
}
]
EOF
echo "" >&2
echo " 3 accounts provisioned + login-probed against ${STAGING_URL}" >&2
echo " next: pipe stdout to a file and run" >&2
echo " op item create --vault Pentest-2026 - < <file>" >&2
echo " THEN rotate each entry with op item edit --generate-password=letters,digits,32" >&2
echo " at engagement end (this script does NOT auto-rotate)." >&2

View file

@ -16,18 +16,26 @@
# E : test_rabbitmq_outage.sh — stop RabbitMQ 60s, backend stays up
#
# Usage :
# bash scripts/security/game-day-driver.sh # run all scenarios
# SKIP=DE bash scripts/security/game-day-driver.sh # skip scenarios D + E
# ONLY=A bash scripts/security/game-day-driver.sh # only run scenario A
# bash scripts/security/game-day-driver.sh # all scenarios on staging (default)
# SKIP=DE bash scripts/security/game-day-driver.sh # skip D + E
# ONLY=A bash scripts/security/game-day-driver.sh # only A
# INVENTORY=prod CONFIRM_PROD=1 bash scripts/security/game-day-driver.sh # prod (gated)
#
# Required env (passed through to the underlying smoke tests) :
# REDIS_PASS / SENTINEL_PASS for scenario C
# MINIO_ROOT_USER / MINIO_ROOT_PASSWORD for scenario D
#
# v1.0.10 polish — production gating :
# INVENTORY=prod must be paired with CONFIRM_PROD=1 or the script
# refuses to run, so a stale shell-history line can't accidentally
# kill prod Postgres on a Monday morning. The driver also runs a
# backup-freshness pre-flight when targeting prod (most recent
# pgBackRest backup must be < 24 h old).
#
# Exit codes :
# 0 — every selected scenario passed
# 1 — at least one scenario failed
# 2 — runner pre-flight failed (script missing, etc.)
# 2 — runner pre-flight failed (script missing, prod safety guard tripped, stale backup, etc.)
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
@ -41,6 +49,9 @@ mkdir -p "$LOGS_DIR"
ONLY=${ONLY:-}
SKIP=${SKIP:-}
INVENTORY=${INVENTORY:-staging}
CONFIRM_PROD=${CONFIRM_PROD:-0}
SKIP_BACKUP_FRESHNESS=${SKIP_BACKUP_FRESHNESS:-0}
log() { printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" | tee -a "$SESSION_LOG" >&2; }
fail() { log "FAIL: $*"; exit "${2:-2}"; }
@ -68,6 +79,101 @@ want() {
return 0
}
# v1.0.10 polish — prod safety gate. INVENTORY=prod requires
# CONFIRM_PROD=1 + an interactive type-the-word confirm. Anything else
# defaults to staging so a forgotten env-var doesn't matter.
case "$INVENTORY" in
staging|stg|dev|local) ;;
prod|production)
if [ "$CONFIRM_PROD" != "1" ]; then
cat >&2 <<EOF
================================================================
ABORTING — INVENTORY=prod without CONFIRM_PROD=1
================================================================
This script will kill production services. Each scenario triggers a
real outage in the chosen inventory : Postgres primary kill, HAProxy
backend stop, Redis master kill, MinIO node loss, RabbitMQ stop.
To run on production, you must :
1. Announce a maintenance window 24 h ahead (status page +
#engineering channel).
2. Set PagerDuty to maintenance mode for the affected services.
3. Confirm pgBackRest's last backup is < 24 h old (this script
auto-checks if you don't pass SKIP_BACKUP_FRESHNESS=1).
4. Re-invoke with :
INVENTORY=prod CONFIRM_PROD=1 \\
bash scripts/security/game-day-driver.sh
The driver will then ask for one more interactive confirmation
(type the word KILL-PROD) before the first scenario fires.
================================================================
EOF
exit 2
fi
# Backup-freshness pre-flight : refuse to run if the most recent
# pgBackRest full/diff is > 24 h old. Recovery from a stale backup
# can extend an outage from minutes to hours, so the cost of
# postponing the game day is much less than the cost of compounded
# data loss if scenario A fails to recover and we have to restore
# from yesterday-but-one.
if [ "$SKIP_BACKUP_FRESHNESS" != "1" ]; then
if command -v pgbackrest >/dev/null 2>&1; then
last_backup_ts=$(pgbackrest --stanza=veza info --output=json 2>/dev/null \
| python3 -c "
import json, sys
try:
    data = json.load(sys.stdin)
    backups = data[0]['backup'] if data else []
    if not backups: print(0); sys.exit(0)
    print(max(b['timestamp']['stop'] for b in backups))
except Exception:
    print(0)
" 2>/dev/null || echo 0)
now_ts=$(date +%s)
age_seconds=$(( now_ts - last_backup_ts ))
if [ "$last_backup_ts" -eq 0 ]; then
fail "pgBackRest backup-freshness check failed : could not parse 'pgbackrest info'. Set SKIP_BACKUP_FRESHNESS=1 to override (only after manually verifying a recent backup exists)." 2
fi
if [ "$age_seconds" -gt 86400 ]; then
age_hours=$(( age_seconds / 3600 ))
fail "pgBackRest most recent backup is ${age_hours}h old (threshold 24h). Run a backup before the game day, or set SKIP_BACKUP_FRESHNESS=1 if you've validated freshness another way." 2
fi
log "pre-flight : pgBackRest most recent backup is $(( age_seconds / 3600 ))h $(( (age_seconds % 3600) / 60 ))m old (< 24h threshold) — OK"
else
log "WARN : pgbackrest CLI not on \$PATH ; skipping backup-freshness check. Set SKIP_BACKUP_FRESHNESS=1 to silence this warning if intentional."
fi
fi
# Final type-the-word confirm. Everything above can be set in env
# by mistake ; this last step requires a human at the keyboard.
cat >&2 <<EOF
================================================================
PROD GAME DAY — final confirmation
================================================================
inventory : prod
scenarios : ${SCENARIOS[*]}${ONLY:+ (filtered by ONLY=$ONLY)}${SKIP:+ (filtered by SKIP=$SKIP)}
session : $SESSION_LOG
Each scenario triggers a real outage. Type the literal phrase
KILL-PROD (any other input aborts) to proceed :
EOF
read -r confirm_phrase
if [ "$confirm_phrase" != "KILL-PROD" ]; then
fail "operator did not confirm KILL-PROD ($confirm_phrase) — aborting" 2
fi
;;
*)
fail "INVENTORY=$INVENTORY not recognised — must be one of staging|prod" 2
;;
esac
# Pre-flight : every selected scenario script must exist + be executable.
for s in "${SCENARIOS[@]}"; do
if want "$s"; then
@ -83,6 +189,7 @@ declare -A SCENARIO_DURATION
log "================================================================"
log "Game day session : $SESSION_DATE"
log "Inventory : $INVENTORY"
log "Session log : $SESSION_LOG"
log "Scenarios run : ${SCENARIOS[*]}"
[ -n "$ONLY" ] && log "ONLY filter : $ONLY"

View file

@ -0,0 +1,255 @@
#!/usr/bin/env bash
# monitor-checks.sh — poll the soft-launch acceptance gate live during
# the bêta window so the operator gets a heads-up before the decision
# call instead of discovering at 18:00 UTC that one threshold is red.
#
# Acceptance gate (per docs/SOFT_LAUNCH_BETA_2026.md §"Acceptance gate") :
# - ≥ 50 testers signed up (used_at IS NOT NULL on beta_invites)
# - 0 P1 events in Sentry today
# - Status page green for the last 4 h
# - Synthetic parcours all green for 6 h
# - Nightly k6 load test green
# - < 3 HIGH-severity issues reported
#
# v1.0.10 Cluster 3.4.
#
# Usage :
# DATABASE_URL=postgres://... \
# SENTRY_AUTH_TOKEN=... \
# STATUSPAGE_URL=https://status.veza.fr \
# PROM_URL=https://prom.veza.fr \
# bash scripts/soft-launch/monitor-checks.sh
#
# By default the script runs once and exits with the gate's verdict.
# Run it from cron (e.g. every 30 min) or pass LOOP=1 to keep checking
# in-place every CHECK_INTERVAL seconds (default 600 = 10 min).
#
# Optional env :
# LOOP=1 continuous mode
# CHECK_INTERVAL seconds between checks in LOOP mode (default 600)
# QUIET=1 only emit the verdict line (for cron piping)
# THRESHOLD_TESTERS override 50 (default), e.g. set to 100 for
# a stricter sub-window
#
# Exit codes :
# 0 — every gate green
# 1 — at least one gate red
# 2 — at least one gate could not be checked (collector down,
# token wrong, etc.) — operator must verify manually
# 3 — required env / tool missing
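#
# Example cron entry (paths and log destination are illustrative, not
# prescribed by this repo) :
# */30 * * * * DATABASE_URL=postgres://... SENTRY_AUTH_TOKEN=... PROM_URL=... QUIET=1 \
# bash /opt/veza/scripts/soft-launch/monitor-checks.sh >> /var/log/veza-soft-launch-gate.log 2>&1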
set -euo pipefail
DATABASE_URL=${DATABASE_URL:-?}
SENTRY_AUTH_TOKEN=${SENTRY_AUTH_TOKEN:-?}
STATUSPAGE_URL=${STATUSPAGE_URL:-https://status.veza.fr}
PROM_URL=${PROM_URL:-?}
LOOP=${LOOP:-0}
CHECK_INTERVAL=${CHECK_INTERVAL:-600}
QUIET=${QUIET:-0}
THRESHOLD_TESTERS=${THRESHOLD_TESTERS:-50}
[ "$DATABASE_URL" = "?" ] && { echo "DATABASE_URL required" >&2; exit 3; }
[ "$SENTRY_AUTH_TOKEN" = "?" ] && { echo "SENTRY_AUTH_TOKEN required (read scope sufficient)" >&2; exit 3; }
[ "$PROM_URL" = "?" ] && { echo "PROM_URL required" >&2; exit 3; }
command -v psql >/dev/null 2>&1 || { echo "psql required" >&2; exit 3; }
command -v curl >/dev/null 2>&1 || { echo "curl required" >&2; exit 3; }
command -v jq >/dev/null 2>&1 || { echo "jq required" >&2; exit 3; }
# ----------------------------------------------------------------------
# Individual gate checks. Each prints "✅ <name>" / "🔴 <name>" / "⚪ <name>"
# (last for "could not check"), and sets one of GATE_*_OK to 0 / 1 / 2.
# ----------------------------------------------------------------------
GATE_TESTERS_OK=2
GATE_SENTRY_OK=2
GATE_STATUSPAGE_OK=2
GATE_SYNTHETIC_OK=2
GATE_K6_OK=2
GATE_ISSUES_OK=2
check_testers() {
local count
count=$(psql "$DATABASE_URL" -A -t -c "
SELECT count(*) FROM beta_invites WHERE used_at IS NOT NULL;
" 2>/dev/null | tr -d ' ' || echo "?")
if [ "$count" = "?" ] || ! [[ "$count" =~ ^[0-9]+$ ]]; then
echo "⚪ testers signed-up : check failed (psql)"
GATE_TESTERS_OK=2
return
fi
if [ "$count" -ge "$THRESHOLD_TESTERS" ]; then
echo "✅ testers signed-up : $count / $THRESHOLD_TESTERS"
GATE_TESTERS_OK=0
else
echo "🔴 testers signed-up : $count / $THRESHOLD_TESTERS"
GATE_TESTERS_OK=1
fi
}
check_sentry_p1() {
# Sentry API : count of unresolved P1 issues last 24h.
local count
count=$(curl -s -H "Authorization: Bearer $SENTRY_AUTH_TOKEN" \
"https://sentry.io/api/0/projects/veza/veza-backend/issues/?statsPeriod=24h&query=is:unresolved%20level:fatal" \
2>/dev/null | jq 'length' 2>/dev/null || echo "?")
if [ "$count" = "?" ] || ! [[ "$count" =~ ^[0-9]+$ ]]; then
echo "⚪ Sentry P1 events 24h : check failed (auth or network)"
GATE_SENTRY_OK=2
return
fi
if [ "$count" -eq 0 ]; then
echo "✅ Sentry P1 events 24h : 0"
GATE_SENTRY_OK=0
else
echo "🔴 Sentry P1 events 24h : $count (must be 0)"
GATE_SENTRY_OK=1
fi
}
check_statuspage() {
local status
status=$(curl -s "$STATUSPAGE_URL/api/v1/status" 2>/dev/null \
| jq -r '.indicator // .status.indicator // ""' 2>/dev/null || echo "")
case "$status" in
none|operational)
echo "✅ status page : $status (green)"
GATE_STATUSPAGE_OK=0
;;
minor|major|critical)
echo "🔴 status page : $status"
GATE_STATUSPAGE_OK=1
;;
*)
echo "⚪ status page : check failed (got '$status')"
GATE_STATUSPAGE_OK=2
;;
esac
}
check_synthetic() {
# PromQL : any synthetic parcours that failed at least once over the
# last 6 h (the acceptance gate wants every parcours green for 6 h).
local query='min_over_time(probe_success{probe_kind="synthetic"}[6h]) == 0'
local resp
resp=$(curl -s --get "$PROM_URL/api/v1/query" \
--data-urlencode "query=$query" 2>/dev/null)
local result_count
result_count=$(echo "$resp" | jq '.data.result | length' 2>/dev/null || echo "?")
if [ "$result_count" = "?" ] || ! [[ "$result_count" =~ ^[0-9]+$ ]]; then
echo "⚪ synthetic parcours : check failed (Prometheus)"
GATE_SYNTHETIC_OK=2
return
fi
if [ "$result_count" -eq 0 ]; then
echo "✅ synthetic parcours : all green"
GATE_SYNTHETIC_OK=0
else
local failing
failing=$(echo "$resp" | jq -r '.data.result[].metric.parcours' 2>/dev/null | tr '\n' ',' | sed 's/,$//')
echo "🔴 synthetic parcours : $result_count failing ($failing)"
GATE_SYNTHETIC_OK=1
fi
}
check_k6_nightly() {
# k6 nightly is exposed as veza_k6_nightly_last_success_timestamp_seconds
# by the Forgejo runner workflow's textfile-collector. Reading via Prom
# gives "is the last success < 30h old?".
local query='time() - veza_k6_nightly_last_success_timestamp_seconds'
local resp age
resp=$(curl -s --get "$PROM_URL/api/v1/query" \
--data-urlencode "query=$query" 2>/dev/null)
age=$(echo "$resp" | jq -r '.data.result[0].value[1] // ""' 2>/dev/null)
if [ -z "$age" ] || [ "$age" = "null" ]; then
echo "⚪ k6 nightly : check failed (metric absent — runner offline?)"
GATE_K6_OK=2
return
fi
age_int=$(printf '%.0f' "$age" 2>/dev/null || echo 999999)
if [ "$age_int" -lt 108000 ]; then # 30h
echo "✅ k6 nightly : last success $(( age_int / 3600 ))h ago"
GATE_K6_OK=0
else
echo "🔴 k6 nightly : last success $(( age_int / 3600 ))h ago (> 30h)"
GATE_K6_OK=1
fi
}
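# For reference, a minimal sketch of how the nightly workflow could publish
# that metric through the node_exporter textfile collector (the directory
# path is an assumption ; match it to the runner's collector config) :
# printf 'veza_k6_nightly_last_success_timestamp_seconds %s\n' "$(date +%s)" \
# > /var/lib/node_exporter/textfile_collector/k6_nightly.prom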
check_high_issues() {
# The operator-reported issues count lives in the SOFT_LAUNCH_BETA_2026.md
# report under "Issues reported". Without an external tracker we read it
# from a known location in the report file. Skip if file absent.
local report
report="$(cd "$(dirname "$0")/../.." && pwd)/docs/SOFT_LAUNCH_BETA_2026.md"
if [ ! -f "$report" ]; then
echo "⚪ HIGH issues count : report file not found"
GATE_ISSUES_OK=2
return
fi
local count
count=$(grep -cE '^\| HIGH ' "$report" 2>/dev/null || true)
count=${count:-0}
if [ "$count" -lt 3 ]; then
echo "✅ HIGH-severity issues reported : $count / < 3"
GATE_ISSUES_OK=0
else
echo "🔴 HIGH-severity issues reported : $count / < 3"
GATE_ISSUES_OK=1
fi
}
# ----------------------------------------------------------------------
# Main loop
# ----------------------------------------------------------------------
run_once() {
if [ "$QUIET" != "1" ]; then
echo "================================================================"
echo "Acceptance gate check — $(date -u +'%Y-%m-%d %H:%M:%S UTC')"
echo "----------------------------------------------------------------"
fi
check_testers
check_sentry_p1
check_statuspage
check_synthetic
check_k6_nightly
check_high_issues
if [ "$QUIET" != "1" ]; then
echo "----------------------------------------------------------------"
fi
local red=0 unknown=0
for v in "$GATE_TESTERS_OK" "$GATE_SENTRY_OK" "$GATE_STATUSPAGE_OK" \
"$GATE_SYNTHETIC_OK" "$GATE_K6_OK" "$GATE_ISSUES_OK"; do
case $v in
1) red=$(( red + 1 )) ;;
2) unknown=$(( unknown + 1 )) ;;
esac
done
if [ "$red" -eq 0 ] && [ "$unknown" -eq 0 ]; then
echo "VERDICT : ALL GATES GREEN — soft-launch is GO"
return 0
elif [ "$red" -gt 0 ]; then
echo "VERDICT : $red gate(s) RED — NO-GO until resolved"
return 1
else
echo "VERDICT : $unknown gate(s) UNCHECKABLE — operator must verify manually before decision call"
return 2
fi
}
if [ "$LOOP" != "1" ]; then
run_once
exit $?
fi
# Continuous mode.
while true; do
run_once || true
echo ""
echo "next check in ${CHECK_INTERVAL}s — Ctrl-C to exit"
sleep "$CHECK_INTERVAL"
done

View file

@ -0,0 +1,179 @@
#!/usr/bin/env bash
# send-invitations.sh — batch-insert beta invitations from a validated
# cohort CSV, generate unique invite codes, render personalised email
# bodies, optionally dispatch via SMTP.
#
# Wraps the validate-cohort.sh sanity check + a transactional INSERT
# into beta_invites + a per-recipient email render. Splits "generate
# the codes + render the emails" from "actually send" so a dry-run
# produces a flat directory of `.eml` files the operator can review
# before dispatch.
#
# v1.0.10 Cluster 3.4.
#
# Usage :
# # Step 1 : dry-run (default). Inserts beta_invites rows, emits
# # eml files but does NOT send anything.
# DATABASE_URL=postgres://... \
# bash scripts/soft-launch/send-invitations.sh path/to/cohort.csv
#
# # Step 2 : after reviewing the eml files, dispatch with msmtp /
# # sendmail / aws-ses-cli (or whatever SEND_CMD points at).
# SEND=1 SEND_CMD='msmtp -t' \
# bash scripts/soft-launch/send-invitations.sh path/to/cohort.csv
#
# Required env :
# DATABASE_URL Postgres URL (read+write to beta_invites)
# FRONTEND_URL base URL the invite link points at
# (e.g. https://staging.veza.fr)
#
# Optional env :
# SEND=1 actually dispatch ; otherwise dry-run (eml only)
# SEND_CMD sendmail-compatible command (default: 'msmtp -t')
# SENT_BY_EMAIL operator email for the beta_invites.sent_by FK ;
# defaults to the value in the CSV's third column
# FROM_ADDR From: header (default: invitations@veza.fr)
# SUBJECT email subject (default: 'Vous êtes admis dans la bêta Veza')
# TEMPLATE path to eml template (default:
# templates/email/beta_invite.eml.template)
# FORCE=1 skip validate-cohort.sh failures (use with care)
#
# Exit codes :
# 0 — everything succeeded
# 1 — cohort validation failed (see validate-cohort.sh)
# 2 — DB transaction failed
# 3 — required env missing
# 4 — dispatch failed for at least one recipient (see logs)
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
CSV=${1:-}
if [ -z "$CSV" ] || [ ! -f "$CSV" ]; then
echo "usage: bash scripts/soft-launch/send-invitations.sh path/to/cohort.csv" >&2
exit 3
fi
DATABASE_URL=${DATABASE_URL:-?}
FRONTEND_URL=${FRONTEND_URL:-?}
[ "$DATABASE_URL" = "?" ] && { echo "DATABASE_URL required" >&2; exit 3; }
[ "$FRONTEND_URL" = "?" ] && { echo "FRONTEND_URL required" >&2; exit 3; }
SEND=${SEND:-0}
SEND_CMD=${SEND_CMD:-msmtp -t}
FROM_ADDR=${FROM_ADDR:-invitations@veza.fr}
SUBJECT=${SUBJECT:-Vous êtes admis dans la bêta Veza}
TEMPLATE=${TEMPLATE:-$REPO_ROOT/templates/email/beta_invite.eml.template}
FORCE=${FORCE:-0}
SESSION_DATE="$(date +%Y%m%d-%H%M)"
OUTDIR="$REPO_ROOT/scripts/soft-launch/out-${SESSION_DATE}"
command -v psql >/dev/null 2>&1 || { echo "psql required" >&2; exit 3; }
command -v openssl >/dev/null 2>&1 || { echo "openssl required" >&2; exit 3; }
# Step 1 — validate the cohort. Bypass with FORCE=1 if needed.
echo "→ validating cohort $CSV"
if ! bash "$(dirname "$0")/validate-cohort.sh" "$CSV"; then
if [ "$FORCE" != "1" ]; then
echo "ERROR: cohort validation failed. Re-run with FORCE=1 to bypass (not recommended)." >&2
exit 1
fi
echo "WARN : cohort validation reported issues but FORCE=1 set — proceeding."
fi
mkdir -p "$OUTDIR"
echo "→ output dir $OUTDIR"
# Step 2 — generate codes + insert rows + render emails. Each insert
# is one transaction so a partial failure leaves consistent state.
gen_code() {
# 16-char base32-ish (no 0/1/I/L) so codes are paste-friendly.
openssl rand -hex 16 | tr 'a-f0-9' 'a-z2-9' \
| tr -d 'oilOIL01' | head -c 16
}
if [ ! -f "$TEMPLATE" ]; then
echo "ERROR: template $TEMPLATE not found." >&2
exit 3
fi
inserted=0
failed=0
failed_emails=()
while IFS=, read -r email cohort sent_by_email; do
email=$(echo "$email" | tr -d '\r' | xargs)
cohort=$(echo "$cohort" | tr -d '\r' | xargs)
sent_by_email=$(echo "$sent_by_email" | tr -d '\r' | xargs)
code=$(gen_code)
# Resolve sent_by user_id (may be NULL if operator email isn't a
# registered user — e.g. ops shared mailbox).
sent_by_id=$(psql "$DATABASE_URL" -A -t -c "
SELECT id::text FROM users WHERE email = '$sent_by_email' LIMIT 1;
" 2>/dev/null | tr -d ' ' || echo "")
if [ -z "$sent_by_id" ]; then
sent_by_clause="NULL"
else
sent_by_clause="'$sent_by_id'"
fi
if ! psql "$DATABASE_URL" -1 -c "
INSERT INTO beta_invites (code, email, cohort, sent_by, expires_at)
VALUES ('$code', '$email', '$cohort', $sent_by_clause, NOW() + INTERVAL '30 days');
" >/dev/null 2>&1; then
failed=$(( failed + 1 ))
failed_emails+=("$email")
continue
fi
inserted=$(( inserted + 1 ))
# Render the eml — operator-readable, ready for SEND_CMD.
eml="$OUTDIR/${email//[^a-zA-Z0-9._-]/_}.eml"
invite_url="$FRONTEND_URL/signup?invite=$code"
sed \
-e "s|{{TO_ADDR}}|$email|g" \
-e "s|{{FROM_ADDR}}|$FROM_ADDR|g" \
-e "s|{{SUBJECT}}|$SUBJECT|g" \
-e "s|{{INVITE_URL}}|$invite_url|g" \
-e "s|{{INVITE_CODE}}|$code|g" \
-e "s|{{COHORT}}|$cohort|g" \
-e "s|{{FRONTEND_URL}}|$FRONTEND_URL|g" \
"$TEMPLATE" > "$eml"
done < <(tail -n +2 "$CSV")
echo "→ inserted $inserted invitations into beta_invites"
echo "→ rendered $inserted emails to $OUTDIR"
[ "$failed" -gt 0 ] && {
echo "WARN : $failed insert(s) failed — see logs above"
for e in "${failed_emails[@]}"; do echo " - $e"; done
}
# Step 3 — optionally dispatch.
if [ "$SEND" != "1" ]; then
echo ""
echo "DRY-RUN — review the eml files in $OUTDIR before sending."
echo "When ready :"
echo " SEND=1 SEND_CMD='$SEND_CMD' bash scripts/soft-launch/send-invitations.sh $CSV"
exit 0
fi
echo "→ dispatching via : $SEND_CMD"
dispatch_failed=0
for eml in "$OUTDIR"/*.eml; do
if ! $SEND_CMD < "$eml" >>"$OUTDIR/dispatch.log" 2>&1; then
dispatch_failed=$(( dispatch_failed + 1 ))
echo " FAIL : $eml" | tee -a "$OUTDIR/dispatch.log"
fi
done
echo ""
if [ "$dispatch_failed" -gt 0 ]; then
echo "WARN : $dispatch_failed dispatch(es) failed — see $OUTDIR/dispatch.log"
exit 4
fi
echo "PASS : all $inserted invitations dispatched."
echo "Track redemption with :"
echo " psql \"\$DATABASE_URL\" -c 'SELECT cohort, count(*) FILTER (WHERE used_at IS NOT NULL) AS redeemed, count(*) AS total FROM beta_invites GROUP BY cohort ORDER BY cohort;'"

View file

@ -0,0 +1,173 @@
#!/usr/bin/env bash
# validate-cohort.sh — sanity-check a soft-launch beta cohort CSV
# before it gets fed to send-invitations.sh.
#
# The CSV is the operator's curated list of beta-tester emails +
# segmentation. This script catches the avoidable mistakes BEFORE
# we batch-insert 100 rows into beta_invites and start spraying
# emails :
#
# - Empty file or wrong header
# - Duplicate emails (would create 2 invites for the same person)
# - Malformed emails (missing @, leading/trailing whitespace)
# - Cohort distribution looks off (no creators, only one segment,
# under-50 total — soft-launch acceptance gate is ≥50 testers)
# - Email collisions with existing users (already registered = the
# invite code is wasted)
#
# v1.0.10 Cluster 3.4.
#
# Usage :
# bash scripts/soft-launch/validate-cohort.sh path/to/cohort.csv
#
# Optional env :
# DATABASE_URL if set, also checks for collisions with the users
# table (email already registered → flagged but not
# fatal — operator may want to invite an existing
# user back to test the new flows).
# MIN_COHORT minimum total rows required (default 50, matches the
# acceptance-gate threshold in SOFT_LAUNCH_BETA_2026.md).
# MIN_CREATORS minimum number of creator-* cohort rows (default 5).
#
# Exit codes :
# 0 — cohort valid
# 1 — cohort malformed (will block send-invitations.sh)
# 2 — cohort merely warns (size below minimum, missing collision
# check) ; operator may proceed via FORCE=1 in send-invitations.sh
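#
# Example of a full check including the users-table collision pass
# (the CSV path is illustrative) :
# DATABASE_URL=postgres://... MIN_COHORT=50 MIN_CREATORS=5 \
# bash scripts/soft-launch/validate-cohort.sh cohorts/beta-2026.csv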
set -euo pipefail
CSV=${1:-}
if [ -z "$CSV" ] || [ ! -f "$CSV" ]; then
cat >&2 <<EOF
usage : bash scripts/soft-launch/validate-cohort.sh path/to/cohort.csv
CSV format (header required) :
email,cohort,sent_by_email
alice@example.com,creator-vinyl,ops@veza.fr
bob@example.com,listener-jazz,ops@veza.fr
...
cohort labels are free-text but should follow the convention
<role>-<segment> so the post-launch attribution report groups cleanly.
EOF
exit 1
fi
MIN_COHORT=${MIN_COHORT:-50}
MIN_CREATORS=${MIN_CREATORS:-5}
# 1. Header check.
header=$(head -1 "$CSV" | tr -d '\r')
if [ "$header" != "email,cohort,sent_by_email" ]; then
echo "ERROR: header line must be exactly 'email,cohort,sent_by_email' (got: $header)" >&2
exit 1
fi
# 2. Row count + duplicates + email shape (single pass over the rows).
total=0
malformed=0
duplicates=0
declare -A seen
declare -A cohort_count
declare -a malformed_lines
while IFS=, read -r email cohort sent_by_email; do
email=$(echo "$email" | tr -d '\r' | xargs)
cohort=$(echo "$cohort" | tr -d '\r' | xargs)
total=$(( total + 1 ))
# Email shape : must contain exactly one @, no whitespace, > 5 chars.
if [[ ! "$email" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
malformed=$(( malformed + 1 ))
malformed_lines+=(" line $(( total + 1 )) : invalid email '$email'")
continue
fi
# Duplicate detection.
if [ -n "${seen[$email]:-}" ]; then
duplicates=$(( duplicates + 1 ))
malformed_lines+=(" line $(( total + 1 )) : duplicate email '$email' (first seen at line ${seen[$email]})")
continue
fi
seen[$email]=$(( total + 1 ))
# Cohort tally.
cohort_count[$cohort]=$(( ${cohort_count[$cohort]:-0} + 1 ))
done < <(tail -n +2 "$CSV")
echo "----------------------------------------------------------------"
echo "Cohort validation report"
echo "----------------------------------------------------------------"
echo " CSV file : $CSV"
echo " Total rows : $total"
echo " Unique emails : ${#seen[@]}"
echo " Malformed rows : $malformed"
echo " Duplicates : $duplicates"
echo ""
echo "Distribution by cohort :"
for c in "${!cohort_count[@]}"; do
printf " %-40s %d\n" "$c" "${cohort_count[$c]}"
done | sort
echo ""
exit_code=0
# 3. Hard checks (block send).
if [ "$malformed" -gt 0 ] || [ "$duplicates" -gt 0 ]; then
echo "ERROR: $malformed malformed + $duplicates duplicate row(s) — fix before sending."
for line in "${malformed_lines[@]}"; do
echo "$line"
done
exit 1
fi
# 4. Soft checks (warn, don't block — operator decides).
if [ "$total" -lt "$MIN_COHORT" ]; then
echo "WARN : cohort has $total rows ; soft-launch acceptance gate is ≥ $MIN_COHORT."
exit_code=2
fi
creator_total=0
for c in "${!cohort_count[@]}"; do
if [[ "$c" == creator-* ]]; then
creator_total=$(( creator_total + cohort_count[$c] ))
fi
done
if [ "$creator_total" -lt "$MIN_CREATORS" ]; then
echo "WARN : only $creator_total creator-* cohort rows ; goal is ≥ $MIN_CREATORS for upload-flow coverage."
exit_code=2
fi
if [ "${#cohort_count[@]}" -lt 3 ]; then
echo "WARN : only ${#cohort_count[@]} distinct cohort labels — feedback will be narrow."
exit_code=2
fi
# 5. Optional : DATABASE_URL collision check.
if [ -n "${DATABASE_URL:-}" ]; then
command -v psql >/dev/null 2>&1 || {
echo "WARN : DATABASE_URL set but psql not on \$PATH ; skipping collision check."
exit_code=2
}
if command -v psql >/dev/null 2>&1; then
emails_csv=$(printf '%s,' "${!seen[@]}" | sed 's/,$//')
collisions=$(psql "$DATABASE_URL" -A -t -c "
SELECT count(*) FROM users WHERE email = ANY(string_to_array('$emails_csv', ','));
" 2>/dev/null | tr -d ' ' || echo "?")
if [ "$collisions" = "?" ]; then
echo "WARN : couldn't query users table (psql connection issue) ; skipping collision check."
exit_code=2
elif [ "$collisions" -gt 0 ]; then
echo "INFO : $collisions email(s) in the cohort already exist in the users table — invite codes will be wasted on existing accounts."
exit_code=2
fi
fi
fi
echo ""
case $exit_code in
0) echo "PASS : cohort valid, ready for send-invitations.sh." ;;
2) echo "WARN : cohort valid but with caveats — review and re-run with --force from send-invitations.sh if intentional." ;;
esac
exit $exit_code

View file

@ -0,0 +1,65 @@
-- 990_beta_invites.sql
-- v1.0.10 polish (Cluster 3.4) — soft-launch beta cohort tracking.
--
-- Records each individual invitation sent for the v2.0.0 soft-launch
-- bêta. Tracks (a) the invite code used in the registration link,
-- (b) when the recipient redeemed it (NULL until redemption), and
-- (c) which cohort segment (creator / listener / community-member /
-- press) the recipient belongs to so the post-launch report can
-- attribute feedback by audience.
--
-- The associated email template + send script live at
-- templates/email/beta_invite.eml.template and
-- scripts/soft-launch/send-invitations.sh ; the send script writes one
-- row per recipient with a transactional INSERT.
--
-- Privacy : the email column is the only PII here ; no behavioural
-- data is stored. used_at is the redemption signal. After v2.0.0
-- public launch, run the cleanup migration in 991 (TBD) to anonymise
-- the email column for invites that haven't been redeemed in 30+ days.
CREATE TABLE IF NOT EXISTS public.beta_invites (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- The invite code is what the recipient pastes into the signup
-- form. 16 random characters from a base32 alphabet (no 0/1/I/L
-- to avoid eyestrain). Generated by send-invitations.sh.
code VARCHAR(32) NOT NULL, -- uniqueness enforced by idx_beta_invites_code below
email VARCHAR(320) NOT NULL,
-- Free-text label so the cohort generator can carry whatever
-- segmentation the operator wants (e.g. "creator-vinyl-pressing",
-- "listener-jazz-mailing-list", "press-pitchfork"). Index below
-- is for the post-launch report grouping.
cohort VARCHAR(64) NOT NULL,
-- NULL until the recipient signs up. Set by the auth handler
-- when /auth/register is hit with a valid invite code.
used_at TIMESTAMPTZ,
-- Hard expiry so unredeemed invites can't accumulate forever.
-- Default 30 days from creation ; soft-launch is short-window.
expires_at TIMESTAMPTZ NOT NULL DEFAULT (NOW() + INTERVAL '30 days'),
-- Operator who sent the invite — useful when reconciling "who
-- gave their friend a code" during the audit.
sent_by UUID REFERENCES public.users(id) ON DELETE SET NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
COMMENT ON TABLE public.beta_invites IS
'v2.0.0 soft-launch beta invitation tracking. v1.0.10 Cluster 3.4.';
COMMENT ON COLUMN public.beta_invites.code IS
'16-char base32 invite code (no 0/1/I/L). Pasted into signup form.';
COMMENT ON COLUMN public.beta_invites.cohort IS
'Free-text cohort label (creator-* / listener-* / press-* / etc.).';
COMMENT ON COLUMN public.beta_invites.used_at IS
'Redemption timestamp. NULL means the invite is still pending.';
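-- A sketch of the redemption update the auth handler is expected to run
-- (illustrative only ; the real handler lives in the Go backend) :
-- UPDATE public.beta_invites
-- SET used_at = NOW()
-- WHERE code = $1 AND used_at IS NULL AND expires_at > NOW();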
-- Lookup by code (signup path) — every /auth/register call reads it.
CREATE UNIQUE INDEX IF NOT EXISTS idx_beta_invites_code
ON public.beta_invites(code);
-- Cohort grouping for the post-launch attribution query.
CREATE INDEX IF NOT EXISTS idx_beta_invites_cohort
ON public.beta_invites(cohort);
-- Pending-invitations sweep — cron job that expires unused invites
-- after expires_at. Partial index keeps it small.
CREATE INDEX IF NOT EXISTS idx_beta_invites_pending_expiry
ON public.beta_invites(expires_at)
WHERE used_at IS NULL;
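-- A sketch of the sweep this partial index serves (whether the cron job
-- deletes or merely anonymises expired invites is settled in migration 991) :
-- DELETE FROM public.beta_invites
-- WHERE used_at IS NULL AND expires_at < NOW();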

View file

@ -0,0 +1,92 @@
To: {{TO_ADDR}}
From: Veza <{{FROM_ADDR}}>
Subject: {{SUBJECT}}
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="--veza-beta-boundary"
----veza-beta-boundary
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Bonjour,
Vous êtes invité·e à rejoindre la bêta privée de Veza —
une plateforme de streaming musical éthique faite pour les
créateur·ices et les auditeur·ices, sans algorithme de
recommandation comportementale, sans gamification, sans dark
patterns.
Votre code d'invitation : {{INVITE_CODE}}
Pour vous inscrire :
{{INVITE_URL}}
Le code expire dans 30 jours.
Pendant la bêta, l'idée est simple : utilisez Veza comme vous
utiliseriez n'importe quelle plateforme musicale. Uploadez,
écoutez, partagez, achetez. Quand quelque chose vous frustre
ou vous étonne — en bien comme en mal — dites-le. Le canal
de retour vous sera communiqué après l'inscription.
Cohorte : {{COHORT}}
(C'est juste un tag interne pour qu'on regroupe les retours
par contexte d'usage. Ça n'affecte ni votre expérience ni vos
permissions.)
À très vite,
L'équipe Veza
--
Si vous n'avez pas demandé cette invitation, ignorez ce
message. Le code expirera automatiquement après 30 jours.
----veza-beta-boundary
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: 8bit
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Invitation à la bêta Veza</title>
</head>
<body style="font-family: Georgia, 'Times New Roman', serif; line-height: 1.6; color: #1a1a1e; margin: 0; padding: 0; background-color: #f8f7f4;">
<div style="max-width: 600px; margin: 20px auto; padding: 30px; background-color: #ffffff; border: 1px solid #e8e6e0;">
<h1 style="font-weight: 400; color: #1a1a1e; margin-top: 0; font-size: 28px;">Bienvenue dans la bêta Veza.</h1>
<p>Bonjour,</p>
<p>Vous êtes invité·e à rejoindre la <strong>bêta privée</strong> de Veza — une plateforme de streaming musical éthique faite pour les créateur·ices et les auditeur·ices, sans algorithme de recommandation comportementale, sans gamification, sans dark patterns.</p>
<div style="text-align: center; margin: 35px 0;">
<a href="{{INVITE_URL}}" style="background-color: #1a1a1e; color: #f8f7f4; padding: 14px 32px; text-decoration: none; display: inline-block; font-weight: 400; letter-spacing: 0.05em;">
Activer mon invitation
</a>
</div>
<p style="color: #555; font-size: 14px;">Ou collez ce lien dans votre navigateur :</p>
<p style="word-break: break-all; color: #888; background-color: #f8f7f4; padding: 10px; font-family: 'Courier New', monospace; font-size: 12px; border-left: 2px solid #d4a574;">{{INVITE_URL}}</p>
<p style="color: #555; font-size: 14px; margin-top: 25px;">Code d'invitation :</p>
<p style="font-family: 'Courier New', monospace; font-size: 18px; letter-spacing: 0.1em; background-color: #f8f7f4; padding: 12px; text-align: center; color: #1a1a1e;">{{INVITE_CODE}}</p>
<hr style="border: none; border-top: 1px solid #e8e6e0; margin: 30px 0;">
<p style="font-size: 14px; color: #555;">Pendant la bêta, l'idée est simple : utilisez Veza comme vous utiliseriez n'importe quelle plateforme musicale. Uploadez, écoutez, partagez, achetez. Quand quelque chose vous frustre ou vous étonne — en bien comme en mal — dites-le. Le canal de retour vous sera communiqué après l'inscription.</p>
<p style="font-size: 13px; color: #888; margin-top: 25px;">Cohorte : <strong>{{COHORT}}</strong> — c'est juste un tag interne pour qu'on regroupe les retours par contexte d'usage.</p>
<p style="margin-top: 30px; color: #888; font-size: 12px;">
Le code expire dans 30 jours. Si vous n'avez pas demandé cette invitation, ignorez ce message.
</p>
<hr style="border: none; border-top: 1px solid #e8e6e0; margin: 25px 0;">
<p style="color: #aaa; font-size: 11px; text-align: center; font-family: 'Courier New', monospace; letter-spacing: 0.1em;">
VEZA · v2.0.0 BETA · {{FRONTEND_URL}}
</p>
</div>
</body>
</html>
----veza-beta-boundary--