Compare commits — 13 commits (c323d37c30 … 112c64a22b)

112c64a22b, 2a5bc11628, e780fbcd18, 05b1d81d30, 6c644cff03, 0bd3e563b2,
d9896686bd, c97e42996e, b6147549c9, 7253f0cf10, 385a8f0378, e97b91f010,
c245b72e05
19 changed files with 1870 additions and 24 deletions
docs/PENTEST_SEND_PACKAGE.md — new file, 187 lines

@@ -0,0 +1,187 @@
# Pentest send package — v2026 engagement

> Operational checklist for handing off the v1.0.9 pre-launch pentest
> brief to the external team. Companion to `docs/PENTEST_SCOPE_2026.md`
> (the technical scope) — this doc is purely "what you send, in what
> order, via which channel."

The scope doc is technical and reusable across engagements. This file
is the per-engagement "send package" that wraps it: the email template,
the credentials-delivery plan, the IP allow-list step, and the kick-off
checklist.
## The 5-step send sequence

Run these in order. Each step has a check (✓) the operator ticks before
moving to the next — out-of-order steps cause the engagement to stall.
### Step 1 — counter-sign the NDA + authorisation letter

- [ ] NDA template signed by the pentester firm and counter-signed by us.
- [ ] Authorisation-to-test letter signed by the Veza tech lead (limits the
      scope to what's in `PENTEST_SCOPE_2026.md` §"In-scope assets" — the
      letter MUST list the staging URL explicitly so a reviewer can map
      pentester traffic to authorised activity).
- [ ] Both PDFs uploaded to the shared 1Password vault (entry name:
      `pentest-2026-legal`). Do **not** email PDFs.
### Step 2 — provision pentester credentials

- [ ] Run `bash scripts/pentest/seed-test-accounts.sh staging` (creates
      the 3 accounts from `PENTEST_SCOPE_2026.md` §"Authentication
      context", outputs random passwords).
- [ ] Output passwords land in three 1Password entries:
      `pentest-2026-listener`, `pentest-2026-creator`, `pentest-2026-admin`.
      Each entry's "Notes" field includes the role and the MFA bypass
      token if applicable.
- [ ] Share each entry **read-only** with the pentester's 1Password
      account using the firm's billing email. Do **not** put passwords
      in chat, email, or shell history.
- [ ] Set entry expiration to engagement-end + 7 days (so cleanup is
      automatic if the team forgets to revoke).
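As a reference for what "outputs random passwords" can look like, here is a minimal sketch of the generation step only; the real `scripts/pentest/seed-test-accounts.sh` interface and its account-creation and 1Password hand-off are not shown and may differ:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: CSPRNG-backed password generation for the three
# pentest roles. Account creation and credential upload are omitted.
set -u

gen_password() {
  # 18 random bytes from the kernel CSPRNG, encoded as 24 URL-safe
  # base64 characters (no padding at this length)
  head -c 18 /dev/urandom | base64 | tr '+/' '-_'
}

for role in listener creator admin; do
  printf 'pentest-2026-%s %s\n' "$role" "$(gen_password)"
done
```

Storage then goes through the 1Password entries named in the checklist, never through chat or email.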
### Step 3 — allow-list the pentester's IP

The Forgejo source-code mirror at `https://10.0.20.105:3000/senke/veza`
is grey-box read-only access. The pentester needs their static
egress IP allow-listed before they can `git clone`.

- [ ] Pentester sends their static egress IP (PGP-signed mail, or
      1Password Notes field).
- [ ] SSH to `srv-102v` (Forgejo container) and add the IP to
      `/etc/forgejo/allowlist.conf`.
- [ ] `systemctl reload forgejo`.
- [ ] Verify: `curl -I https://10.0.20.105:3000/senke/veza` from the
      pentester IP returns 200; from any other IP, 403.

(A future iteration could turn this into an Ansible playbook
`infra/ansible/playbooks/pentest_allowlist_ip.yml`. For now the manual
SSH path is fine — this happens once per engagement.)
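The manual edit can be made idempotent so re-running it never duplicates an entry. A sketch, assuming the one-IP-per-line file format implied above; the file path is from this doc, everything else is illustrative:

```shell
# Hedged sketch: idempotent allow-list append. ALLOWLIST defaults to a
# local file for dry runs; on srv-102v point it at the real path.
ALLOWLIST="${ALLOWLIST:-./allowlist.conf}"   # /etc/forgejo/allowlist.conf on srv-102v
PENTESTER_IP="203.0.113.7"                   # placeholder (RFC 5737 range)

add_allowlist_ip() {
  # Append only if the exact line is not already present
  grep -qxF "$1" "$ALLOWLIST" 2>/dev/null || echo "$1" >> "$ALLOWLIST"
}

add_allowlist_ip "$PENTESTER_IP"
# then: sudo systemctl reload forgejo
```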
### Step 4 — send the kick-off email

Use the template below. Replace the placeholders inside `<…>`. Send
PGP-encrypted (the pentester's key is in their security.txt) to
**both** their lead pentester and their project manager so the chain
of responsibility is recorded.
```text
Subject: [PENTEST] Veza v1.0.9 pre-launch engagement — kick-off

Hi <lead pentester first name>,

Per the signed scope letter dated <YYYY-MM-DD>, the Veza v1.0.9
pre-launch pentest engagement starts on <YYYY-MM-DD>. The brief is
attached as PENTEST_SCOPE_2026.md (see also the rendered HTML at
https://staging.veza.fr/legal/pentest-scope-2026.html).

Quick links:

• Staging URL: https://staging.veza.fr
• Source code: https://10.0.20.105:3000/senke/veza
  (grey-box, read-only; your egress IP <PENTESTER_IP>
  has been allow-listed as of <YYYY-MM-DD HH:MM UTC>.)
• Status page: https://status.veza.fr (we'll lower the alert
  threshold during your engagement so the SOC isn't
  paged on every benign 401).
• Test accounts: shared with your firm's 1Password — entries
  pentest-2026-{listener,creator,admin}. Passwords
  expire <engagement_end + 7d>.

Engagement window:

• Start: <YYYY-MM-DD>
• End: <YYYY-MM-DD> (~10 business days)
• Re-test: 1 round, after our team's fix pass (typically 2 weeks
  after the initial report)

Communications:

• Async: security@veza.fr (PGP fingerprint at
  https://veza.fr/.well-known/security.txt)
• Weekly sync: <weekday HH:MM TZ>, video link in the calendar invite
• Critical findings: phone the on-call number in the contract
  (HIGH severity = phone, not email)

Expected deliverables:

• Initial findings report (markdown or PDF) at engagement end
• Re-test report after our fix pass
• Optional: exec-level summary slide deck

Reach out if anything in PENTEST_SCOPE_2026.md is unclear before
day 1. Otherwise — good hunting.

Best,
<Tech lead name>
Veza
```

- [ ] Email PGP-signed, encrypted, and sent.
- [ ] Calendar invite sent for the weekly sync.
- [ ] Slack/Signal channel created for HIGH-severity escalation
      (channel naming: `#pentest-2026-veza`).
### Step 5 — lower the SOC alerting threshold

During the engagement, automated scanners and authentication
brute-force attempts WILL fire alerts. Tune them down so the on-call
isn't paged on every legitimate pentester action.

- [ ] In `config/prometheus/alert_rules.yml` → `HighErrorRate`,
      `HighLatencyP99`: add a `for: 30m` override OR mute via an
      Alertmanager silence (recommended: silence rather than edit
      rules, so the change auto-expires at engagement end).
- [ ] Silence URL: `https://prometheus.veza.fr/alertmanager/#/silences/new`
      → matchers: `severity=warning`, comment: `pentest-2026 active`,
      duration: `engagement_end + 24h`.
- [ ] Subscribe the engagement Slack channel to the silence's
      auto-removal so the SOC knows when the heightened alerting
      resumes.
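The silence can also be created without the web UI, through Alertmanager's v2 silences API. A hedged sketch: the dates are placeholders, the matcher and comment follow the checklist, and the endpoint path mirrors the silence URL above.

```shell
# Hedged sketch: build the silence payload for POST /api/v2/silences.
START="2026-03-02T00:00:00Z"   # engagement start (placeholder)
END="2026-03-17T00:00:00Z"     # engagement_end + 24h (placeholder)

payload=$(jq -n --arg s "$START" --arg e "$END" '{
  matchers: [{name: "severity", value: "warning", isRegex: false}],
  startsAt: $s,
  endsAt: $e,
  createdBy: "ops@veza.fr",
  comment: "pentest-2026 active"
}')
echo "$payload"
# curl -sS -X POST -H 'Content-Type: application/json' -d "$payload" \
#   https://prometheus.veza.fr/alertmanager/api/v2/silences
```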
## Reception checklist (after pentester confirms receipt)

- [ ] Pentester replied to the kick-off email within 1 business day.
- [ ] Pentester confirmed they can `git clone` the source repo.
- [ ] Pentester confirmed they can log in as each of the 3 test
      accounts.
- [ ] Pentester confirmed the staging URL responds (`/api/v1/health`
      returns 200).
- [ ] First findings — even informational — start landing in the
      shared report by end of engagement day 3 (a complete silence
      until the final report is a process smell).

If any reception checklist item fails after 24h, the engagement
hasn't really started. Phone the firm's PM, don't email.
## Post-engagement housekeeping

- [ ] Findings report received → import into the issue tracker as
      separate tickets, severity preserved, attribution
      `external-pentest-2026`.
- [ ] Fix pass scheduled and timeboxed (HIGH within 1 week, MEDIUM
      within 4 weeks, LOW best-effort).
- [ ] Re-test scheduled 2 weeks after fix-pass start.
- [ ] Re-test report received → update the ticket statuses; any
      remaining unresolved finding above LOW blocks v2.0.0-public.
- [ ] Test accounts' passwords manually rotated **the day the
      engagement ends** (don't wait for 1Password's auto-expiry).
- [ ] Pentester IP removed from the Forgejo allow-list.
- [ ] Alertmanager silence removed (should auto-expire, but verify).
- [ ] Engagement folder zipped and stored at
      `docs/archive/pentest-2026/` (kept 5 years for the audit trail).
- [ ] Public summary blog post drafted (no findings details, just the
      "we did this, here's what we learned" framing). Reviewed by
      legal before publishing.
## Linked artefacts

- `docs/PENTEST_SCOPE_2026.md` — the technical scope (what's testable)
- `docs/SECURITY_PRELAUNCH_AUDIT.md` — internal Day 21 audit (what we
  already cleared)
- `docs/archive/PENTEST_REPORT_VEZA_v0.12.6.md` — last engagement's
  report, format reference for what to expect back
- `scripts/pentest/seed-test-accounts.sh` — credential provisioning
  helper (creates the 3 staging accounts referenced in the scope)
- `docs/GO_NO_GO_CHECKLIST_v2.0.0_PUBLIC.md` — the row this engagement
  unblocks
docs/SOFT_LAUNCH_BETA_2026_CHECKLIST.md — new file, 150 lines

@@ -0,0 +1,150 @@
# Soft-launch beta — pre-flight checklist

> Operational checklist that must reach 100% green before the first
> invitation goes out. Companion to `docs/SOFT_LAUNCH_BETA_2026.md`
> (the bigger picture). This file is purely the "before you press
> send, has every gate been verified?" view.

The whole reason the soft-launch is "soft" is that it lets you catch
infrastructure surprises with 50 testers instead of 50 000. To get
that benefit, the infrastructure has to actually work BEFORE the
invitations land. This checklist is the gate.
## T-72h checklist (3 days before send)

### Database

- [ ] `migrations/990_beta_invites.sql` applied to staging.
      Verify with:
      ```bash
      psql "$STAGING_DATABASE_URL" -c "SELECT count(*) FROM beta_invites;"
      ```
      Expected: `0` (table exists, empty).
- [ ] Same migration applied to prod (whenever the prod tag goes out).
- [ ] Backup freshness OK on both environments:
      ```bash
      pgbackrest --stanza=veza info | head -20
      ```
      Most recent full or diff backup < 24 h old.
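The "< 24 h" gate can be checked mechanically instead of eyeballed. A hedged sketch: the JSON field path is an assumption about pgBackRest's `--output=json` format, so verify it against your pgBackRest version first.

```shell
# Hedged: mechanical backup-freshness check. The field path
# .[0].backup[-1].timestamp.stop (epoch seconds of the newest backup)
# is assumed; confirm with `pgbackrest info --output=json | jq .`
fresh_within() {   # $1 = last-backup epoch seconds, $2 = max age (s)
  now=$(date +%s)
  [ $(( now - $1 )) -lt "$2" ]
}

last=$(pgbackrest --stanza=veza info --output=json 2>/dev/null \
  | jq '.[0].backup[-1].timestamp.stop')
if fresh_within "${last:-0}" 86400; then echo "backup fresh"; else echo "backup STALE"; fi
```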
### Cohort CSV

- [ ] CSV file built from the operator's chosen sources (mailing list +
      contacts + community partners). Format per the header of
      `scripts/soft-launch/validate-cohort.sh`.
- [ ] `validate-cohort.sh` returns exit 0 (or exit 2 with explicit
      operator acknowledgement of the warnings).
- [ ] Distribution sanity: `≥ 5` creators, `≥ 20` listeners, `≥ 3`
      distinct cohort labels, `≥ 50` total rows.
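The 0/2 exit-code convention above can be wrapped so the acknowledgement is explicit rather than implicit. A hedged sketch; only the script path and exit codes come from this doc, and `COHORT_ACK` is an illustrative variable, not part of the real tooling:

```shell
# Hedged wrapper for the validate-cohort.sh exit-code convention:
# 0 = clean, 2 = warnings needing acknowledgement, other = hard failure.
check_cohort() {
  bash scripts/soft-launch/validate-cohort.sh "$1"
  rc=$?
  case "$rc" in
    0) echo "cohort OK" ;;
    2) if [ "${COHORT_ACK:-0}" = "1" ]; then
         echo "cohort OK (warnings acknowledged)"
       else
         echo "warnings present — re-run with COHORT_ACK=1" >&2
         return 1
       fi ;;
    *) echo "cohort validation failed (rc=$rc)" >&2; return 1 ;;
  esac
}
```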
### Email infrastructure

- [ ] SMTP credentials live on the operator's machine in `~/.msmtprc`
      (or whatever `SEND_CMD` resolves to).
- [ ] `templates/email/beta_invite.eml.template` reviewed — wording,
      cohort variable, code variable.
- [ ] Test send to the operator's own email:
      ```bash
      echo "ops@veza.fr,test-cohort,ops@veza.fr" > /tmp/me.csv
      DATABASE_URL=$STAGING_DATABASE_URL FRONTEND_URL=https://staging.veza.fr \
        SEND=1 bash scripts/soft-launch/send-invitations.sh /tmp/me.csv
      ```
      Verify the .eml renders correctly in your mail client (links
      clickable, fonts loaded, no `{{TO_ADDR}}` literals leaking).
### Backend invite-redemption path

- [ ] Visit `https://staging.veza.fr/signup?invite=<test-code>`.
      Expected: the signup form pre-fills the code, refuses to submit
      without it, and marks the invite as `used_at = NOW()` after success.
- [ ] Try an invalid code → form rejects with a clear error message.
- [ ] Try the same code twice → second attempt rejects (one-time use).
- [ ] Try an expired code → form rejects with "expired".
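The one-time-use and expiry rules being exercised above boil down to a single guarded UPDATE. A sketch: `beta_invites` and `used_at` come from this checklist, while the `code` and `expires_at` column names are assumptions about the migration.

```sql
-- Hypothetical sketch of atomic redemption. 1 row updated = redeemed
-- now; 0 rows = unknown code, already used, or expired.
UPDATE beta_invites
   SET used_at = NOW()
 WHERE code = 'TEST-CODE'                           -- placeholder
   AND used_at IS NULL                              -- one-time use
   AND (expires_at IS NULL OR expires_at > NOW());  -- expiry (assumed column)
```

Running the redemption as one statement avoids a check-then-update race between two concurrent signups with the same code.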
### Acceptance-gate monitoring

- [ ] Run `monitor-checks.sh` once on staging — every gate either ✅
      or ⚪ (unknown), no 🔴.
      ```bash
      DATABASE_URL=$STAGING_DATABASE_URL \
      SENTRY_AUTH_TOKEN=... \
      PROM_URL=https://prom.veza.fr \
      bash scripts/soft-launch/monitor-checks.sh
      ```
- [ ] Schedule the cron run (or tmux session) so the gate state is
      visible during the beta window without a manual re-run.
### Communications

- [ ] Discord `#beta-feedback` channel created, ground rules pinned.
- [ ] Typeform feedback form created; URL pasted into
      `templates/email/beta_invite.eml.template` if not already in the
      cohort label.
- [ ] Status page maintenance window declared for the duration —
      "elevated alerting may occur during the beta period."
- [ ] Operators on duty for the day rota'd in the calendar (4-hour
      shifts, primary + backup).
## D-day checklist (the day of send)

### Last hour before send

- [ ] Most recent k6 nightly green (within 30 h).
- [ ] No pending high-severity Sentry issue.
- [ ] No PagerDuty incident open.
- [ ] HAProxy + backend healthchecks green:
      ```bash
      curl -s https://staging.veza.fr/api/v1/health | jq .status
      ```
- [ ] MinIO drives all online; pgBackRest drill ran successfully in
      the last 7 days.
### Send

- [ ] `validate-cohort.sh` exit code 0 (or 2 with explicit override).
- [ ] `send-invitations.sh` in DRY-RUN mode: .eml output dir reviewed.
- [ ] `send-invitations.sh` with `SEND=1`: dispatch.log reviewed
      after the run, `0` failed dispatches.
- [ ] First three invitees received the email within 5 min (manual
      check on three different domains: gmail / proton / one custom).
### Hour 1 post-send

- [ ] First sign-up landed (`SELECT count(*) FROM beta_invites WHERE
      used_at IS NOT NULL;` returns ≥ 1).
- [ ] No spike in 5xx on Grafana "Veza API Overview".
- [ ] Discord `#beta-feedback` has at least one "I'm in" message.
### Every 4 h during the beta window

- [ ] Re-run `monitor-checks.sh` (or the cron wakes you).
- [ ] Triage any HIGH-severity report within 1 h (per
      `docs/SOFT_LAUNCH_BETA_2026.md` §"Issue triage matrix").
- [ ] Update the issues-reported table in
      `docs/SOFT_LAUNCH_BETA_2026.md` so the decision call has fresh data.
## D+0 18:00 UTC — decision call

- [ ] Tech lead, product lead, and on-call engineer all on the call.
- [ ] `monitor-checks.sh` final run shown live; verdict screenshotted.
- [ ] Each acceptance-gate row from `SOFT_LAUNCH_BETA_2026.md`
      §"Acceptance gate" walked through verbally.
- [ ] Unanimous GO, or any single NO-GO documented in the meeting notes.
- [ ] Decision logged in `docs/SOFT_LAUNCH_BETA_2026.md` §"Take-aways".

If GO: the v2.0.0-public tag goes out the next morning.
If NO-GO: the meeting decides the scope of the fix pass + a new acceptance date.
## Linked artefacts

- `docs/SOFT_LAUNCH_BETA_2026.md` — the bigger picture (cohort
  definition, inline email template, day timeline, monitoring list,
  acceptance gate, decision protocol)
- `migrations/990_beta_invites.sql` — the schema this depends on
- `scripts/soft-launch/validate-cohort.sh` — pre-send sanity check
- `scripts/soft-launch/send-invitations.sh` — batch insert + send
- `scripts/soft-launch/monitor-checks.sh` — live gate poll
- `templates/email/beta_invite.eml.template` — the email recipients
  receive
- `docs/GO_NO_GO_CHECKLIST_v2.0.0_PUBLIC.md` — the v2.0.0 checklist
  this unblocks
docs/runbooks/rabbitmq-down.md — new file, 164 lines

@@ -0,0 +1,164 @@
# Runbook — RabbitMQ unavailable

> **Alert**: `RabbitMQUnreachable` (in `config/prometheus/alert_rules.yml`).
> **Owner**: infra on-call.
> **Game-day scenario**: E (`infra/ansible/tests/test_rabbitmq_outage.sh`).
## What breaks when RabbitMQ is down

RabbitMQ is a fan-out broker for asynchronous, non-user-facing work
(transcode jobs, distribution to external platforms, email digests,
DMCA takedown propagation, search index updates). The user-facing
request path does NOT block on RabbitMQ — the API publishes a message
and returns 202 Accepted; the worker picks it up later.

| Subsystem | Effect when RabbitMQ is gone | Severity |
| --- | --- | --- |
| Track upload → HLS transcode | Upload succeeds (S3 write OK), HLS segments don't appear | **MEDIUM** — track playable via fallback `/stream`, not via HLS |
| Distribution to Spotify/SoundCloud | Submission silently queued; users see "pending" forever | MEDIUM — surfaces in the distribution dashboard, not in the player |
| Email digest (weekly creator stats) | Cron tick logs `publish failed`, retries on next tick | LOW — eventual consistency, no user-visible breakage |
| DMCA takedown event | Track flag flipped in DB synchronously; downstream replay queue stalls | **HIGH** — track is gated immediately (synchronous DB UPDATE), but cache invalidation lags |
| Search index updates | New tracks not searchable until the queue drains | LOW — falls back to Postgres FTS |
| Chat messages (WebSocket) | INDEPENDENT — chat is direct WS, no RabbitMQ involvement | NONE |
| Auth, sessions, payments | INDEPENDENT — no RabbitMQ dependency | NONE |

The silent-failure cases (DMCA cache invalidation, the transcode
queue) are the ones that compound if the outage drags on. Most user
flows degrade gracefully.
## First moves

1. **Confirm RabbitMQ is actually down**, not just "unreachable from one
   host":
   ```bash
   curl -s -u "$RMQ_USER:$RMQ_PASS" http://rabbitmq.lxd:15672/api/overview \
     | jq '.cluster_name, .object_totals'
   ```
2. **Confirm what changed.** If a deploy fired in the last 30 min,
   suspect the deploy. Check `journalctl -u veza-backend-api -n 200`
   for `amqp` errors with timestamps after the deploy.
3. **Check the queues didn't fill the disk** (the most common bring-down
   in development):
   ```bash
   ssh rabbitmq.lxd 'df -h /var/lib/rabbitmq'
   ```
## RabbitMQ instance is down

```bash
# State on the RabbitMQ host:
ssh rabbitmq.lxd sudo systemctl status rabbitmq-server

# Logs (Erlang verbosity — grep for errors and alarms):
ssh rabbitmq.lxd sudo journalctl -u rabbitmq-server -n 500 \
  | grep -E 'ERROR|CRASH|disk_alarm|memory_alarm'
```

Common causes:

- **Disk alarm.** `/var/lib/rabbitmq` filled — RabbitMQ pauses producers
  when free space drops below `disk_free_limit`. The backend's AMQP
  client surfaces this as "blocked". Fix: grow the disk, or expire old
  messages with `rabbitmqctl purge_queue <queue>` (last resort — you
  lose what's in there).
- **Memory alarm.** RSS over `vm_memory_high_watermark` × system memory.
  Same effect (producers blocked). Fix: add memory, or unblock by
  draining a slow consumer.
- **Process crashed.** Erlang OOM, segfault. `sudo systemctl restart
  rabbitmq-server`; the queues survive (`durable=true` on every queue
  we declare).
- **Cluster split-brain.** v1.0 is single-node, so this can't happen
  yet. Listed here for the v1.1 multi-node config.
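Both alarm states can also be read straight from the management API: `mem_alarm` and `disk_free_alarm` are per-node booleans in `/api/nodes` (hedged: field names per the RabbitMQ management plugin, verify on your version).

```shell
# Quick probe for the two blocking causes above. The jq filter keeps
# only the node name and the two alarm flags.
alarm_filter='.[] | {name, mem_alarm, disk_free_alarm}'
curl -s -u "$RMQ_USER:$RMQ_PASS" http://rabbitmq.lxd:15672/api/nodes \
  | jq "$alarm_filter"
```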
## Backend can't reach RabbitMQ

Network or DNS issue — not RabbitMQ's fault.

```bash
# From the API container:
nc -zv rabbitmq.lxd 5672

# DNS:
getent hosts rabbitmq.lxd

# AMQP credentials:
docker exec veza_backend_api env | grep AMQP_URL
```

Likely culprits: an Incus bridge restart, a password rotation that
didn't propagate to the API container's env, a security-group change.
## Mitigation while RabbitMQ is down

The backend already handles publish failures gracefully:

- `internal/eventbus/rabbitmq.go` retries with exponential backoff up
  to 30 s, then drops to "degraded mode" (publish returns immediately
  with a logged warning, the API call succeeds, the side effect is
  lost).
- Workers in `internal/workers/` have `WithRetry()` middleware that
  republishes failed deliveries up to 5 times before dead-lettering.

If recovery is going to take > 10 min, set
`EVENTBUS_DEGRADED_LOG_LEVEL=error` (default `warn`) so the
fail-fast logs land in Sentry and operators can audit which messages
were dropped.

**Do NOT** restart the backend to clear the AMQP connection pool;
the reconnect logic (`go.uber.org/zap`-logged at eventbus.go:142)
handles it once RabbitMQ is back.
## Recovery

Once RabbitMQ is back up:

1. Verify connectivity from each backend instance:
   ```bash
   # Send a deliberately invalid 8-byte protocol header; the broker
   # answers with its own "AMQP" version header before closing.
   docker exec veza_backend_api sh -c 'printf "XXXXXXXX" | nc -w1 rabbitmq.lxd 5672 | head -c 4'
   ```
   Should return `AMQP`.
2. Watch the queue depth on the management UI:
   `http://rabbitmq.lxd:15672/#/queues`. Expect `transcode_jobs`,
   `distribution_outbox`, `dmca_propagation`, and `search_index_updates`
   to drain over the next 5-15 min as the workers catch up.
3. If a queue is stuck > 30 min after recovery, the worker for it is
   wedged — restart that specific worker container:
   ```bash
   docker compose -f docker-compose.prod.yml restart worker-<name>
   ```
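The queue-depth watch in step 2 also works headless via the management API's `/api/queues` (the `messages` field is ready + unacked per queue):

```shell
# Poll the four queues named above without the web UI.
depth_filter='.[]
  | select(.name | test("^(transcode_jobs|distribution_outbox|dmca_propagation|search_index_updates)$"))
  | {name, messages}'
curl -s -u "$RMQ_USER:$RMQ_PASS" http://rabbitmq.lxd:15672/api/queues \
  | jq "$depth_filter"
```

Re-run it every minute or two; the `messages` counts should trend toward zero as the workers catch up.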
## Audit after the outage

1. Sentry filter `tag:eventbus.status=degraded` between outage start
   and end — gives you the count and shape of the dropped events.
2. For each dropped DMCA event, manually trigger the cache flush:
   ```bash
   curl -X POST -H "Authorization: Bearer $ADMIN_TOKEN" \
     https://api.veza.fr/api/v1/admin/cache/dmca/flush
   ```
3. For each dropped transcode job, requeue from the tracks table:
   ```bash
   psql "$DATABASE_URL" -c "
     INSERT INTO transcode_jobs (track_id, status, attempts, created_at)
     SELECT id, 'pending', 0, NOW() FROM tracks
     WHERE created_at BETWEEN '<outage_start>' AND '<outage_end>'
       AND hls_status IS NULL;
   "
   ```
## Postmortem trigger

Any RabbitMQ outage > 10 min triggers a postmortem. The non-user-facing
nature makes this less urgent than a Redis or Postgres outage, but the
silent-failure modes (dropped DMCA propagation, missing transcodes)
warrant a write-up so we know what slipped through.
## Future-proofing

- v1.1 will move to a 3-node RabbitMQ cluster behind a load balancer
  for HA. This runbook will then split into "single node down" (the
  cluster keeps serving) and "cluster split-brain" (rare, but the
  recovery path is different).
- Worker idempotency keys are documented in `docs/api/eventbus.md` —
  any new worker MUST honour them so a replay during recovery doesn't
  double-charge / double-distribute / double-takedown.
infra/ansible/inventory/group_vars — new symbolic link

@@ -0,0 +1 @@
+../group_vars
@@ -20,6 +20,16 @@ all:
     ansible_user: senke
     ansible_python_interpreter: /usr/bin/python3
   children:
+    # Env-named meta-group — see inventory/staging.yml for rationale.
+    prod:
+      children:
+        incus_hosts:
+        forgejo_runner:
+        haproxy:
+        veza_app_backend:
+        veza_app_stream:
+        veza_app_web:
+        veza_data:
     incus_hosts:
       hosts:
         veza-prod:
@@ -36,6 +36,18 @@ all:
     ansible_user: senke
     ansible_python_interpreter: /usr/bin/python3
   children:
+    # Env-named meta-group: every host below is also in `staging`,
+    # which makes group_vars/staging.yml apply (Ansible matches
+    # group_vars file names against group names).
+    staging:
+      children:
+        incus_hosts:
+        forgejo_runner:
+        haproxy:
+        veza_app_backend:
+        veza_app_stream:
+        veza_app_web:
+        veza_data:
     incus_hosts:
       hosts:
         veza-staging:
@@ -18,14 +18,28 @@
   become: true
   gather_facts: true
   tasks:
-    - name: Launch veza-haproxy container if absent
+    - name: Launch / repair veza-haproxy container
+      # Idempotent: RUNNING → no-op; STOPPED/half-baked → recreate;
+      # absent → fresh launch. Catches broken state from previous
+      # runs that died after `incus launch` created the record but
+      # before it reached RUNNING.
       ansible.builtin.shell:
         cmd: |
           set -e
-          if incus info veza-haproxy >/dev/null 2>&1; then
-            echo "veza-haproxy already exists"
-            exit 0
-          fi
+          STATE=$(incus list veza-haproxy -f csv -c s 2>/dev/null | head -1 || true)
+          case "$STATE" in
+            RUNNING)
+              echo "veza-haproxy RUNNING already"
+              exit 0
+              ;;
+            "")
+              # No record — fresh launch.
+              ;;
+            *)
+              echo "veza-haproxy in state '$STATE' — recreating"
+              incus delete --force veza-haproxy
+              ;;
+          esac
           incus launch "{{ veza_app_base_image | default('images:debian/13') }}" veza-haproxy --profile veza-app --network "{{ veza_incus_network | default('net-veza') }}"
           for _ in $(seq 1 30); do
             if incus exec veza-haproxy -- /bin/true 2>/dev/null; then
@ -35,21 +49,54 @@
|
||||||
done
|
done
|
||||||
incus exec veza-haproxy -- apt-get update
|
incus exec veza-haproxy -- apt-get update
|
||||||
incus exec veza-haproxy -- apt-get install -y python3 python3-apt
|
incus exec veza-haproxy -- apt-get install -y python3 python3-apt
|
||||||
|
echo "veza-haproxy LAUNCHED"
|
||||||
executable: /bin/bash
|
executable: /bin/bash
|
||||||
register: provision_result
|
register: provision_result
|
||||||
changed_when: "'incus launch' in provision_result.stdout"
|
changed_when: "'LAUNCHED' in provision_result.stdout or 'recreating' in provision_result.stdout"
|
||||||
tags: [haproxy, provision]
|
tags: [haproxy, provision]
|
||||||
|
|
||||||
- name: Refresh inventory so veza-haproxy is reachable
|
- name: Refresh inventory so veza-haproxy is reachable
|
||||||
ansible.builtin.meta: refresh_inventory
|
ansible.builtin.meta: refresh_inventory
|
||||||
|
|
||||||
-- name: Apply common baseline (SSH hardening, fail2ban, node_exporter)
-  hosts: haproxy
-  become: true
-  gather_facts: true
-  roles:
-    - common
+# Incus proxy devices : forward the host's :80 / :443 to the
+# container's :80 / :443. Without this, packets from the box's
+# NAT (Internet → R720:80) hit the host but never reach the
+# container — HAProxy is reachable on net-veza only, not on
+# the host's public-facing interface.
+- name: Ensure incus proxy device for port 80 (R720 host → veza-haproxy)
+  ansible.builtin.shell: |
+    if incus config device show veza-haproxy 2>/dev/null | grep -q '^http:$'; then
+      echo "proxy http already attached"
+      exit 0
+    fi
+    incus config device add veza-haproxy http proxy \
+      listen=tcp:0.0.0.0:80 \
+      connect=tcp:127.0.0.1:80
+    echo "proxy http attached"
+  register: proxy80
+  changed_when: "'already' not in proxy80.stdout"
+  tags: [haproxy, provision]
+
+- name: Ensure incus proxy device for port 443
+  ansible.builtin.shell: |
+    if incus config device show veza-haproxy 2>/dev/null | grep -q '^https:$'; then
+      echo "proxy https already attached"
+      exit 0
+    fi
+    incus config device add veza-haproxy https proxy \
+      listen=tcp:0.0.0.0:443 \
+      connect=tcp:127.0.0.1:443
+    echo "proxy https attached"
+  register: proxy443
+  changed_when: "'already' not in proxy443.stdout"
+  tags: [haproxy, provision]
+
+# Common role intentionally NOT applied to the haproxy container :
+# it's reached via `incus exec` (no SSH inside), and the role's
+# SSH-hardening / fail2ban / node_exporter setup assumes a full
+# host (sshd present, auth.log to monitor, exposed metrics port).
+# Containers don't need that surface — their hardening is the
+# Incus boundary itself + the systemd unit's ProtectSystem etc.
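The idempotency marker above leans on `incus config device show` printing each device as a top-level `KEY:` line. A standalone sketch of that grep against a sample of the (assumed) output format:

```shell
# Sample of what `incus config device show veza-haproxy` is assumed to
# print once the proxy device exists (YAML, device name as top-level key).
sample='http:
  connect: tcp:127.0.0.1:80
  listen: tcp:0.0.0.0:80
  type: proxy'

# Same test the playbook task runs: a top-level `http:` line means the
# device is already attached, so the task exits 0 without re-adding it.
if printf '%s\n' "$sample" | grep -q '^http:$'; then
  echo "proxy http already attached"
fi
```

The `^http:$` anchors matter: an indented `listen:`/`connect:` line, or a device whose name merely contains `http`, would not match.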
- name: Install + configure HAProxy + dehydrated/Let's Encrypt
  hosts: haproxy
  become: true
@ -2,7 +2,25 @@
|
||||||
# whitelist of users. The role refuses to lock the operator out: it
|
# whitelist of users. The role refuses to lock the operator out: it
|
||||||
# verifies the AllowUsers list is non-empty and contains at least
|
# verifies the AllowUsers list is non-empty and contains at least
|
||||||
# the connecting user before reloading sshd.
|
# the connecting user before reloading sshd.
|
||||||
|
#
|
||||||
|
# Skipped entirely when sshd is not installed on the target — useful
|
||||||
|
# for Incus containers reached via `incus exec`, which don't need
|
||||||
|
# SSH at all (overlay set common_apply_ssh_hardening=false to skip
|
||||||
|
# explicitly even when sshd happens to be present).
|
||||||
---
|
---
|
||||||
|
- name: Detect whether sshd is present on the target
|
||||||
|
ansible.builtin.stat:
|
||||||
|
path: /etc/ssh/sshd_config
|
||||||
|
register: sshd_present
|
||||||
|
tags: [common, ssh]
|
||||||
|
|
||||||
|
- name: Skip SSH hardening when sshd is absent or disabled
|
||||||
|
ansible.builtin.debug:
|
||||||
|
msg: "sshd not installed on this host — SSH hardening skipped"
|
||||||
|
when:
|
||||||
|
- not sshd_present.stat.exists or not (common_apply_ssh_hardening | default(true))
|
||||||
|
tags: [common, ssh]
|
||||||
|
|
||||||
- name: Sanity check — ssh_allow_users must be non-empty
|
- name: Sanity check — ssh_allow_users must be non-empty
|
||||||
ansible.builtin.assert:
|
ansible.builtin.assert:
|
||||||
that:
|
that:
|
||||||
|
|
@@ -12,6 +30,9 @@
       ssh_allow_users is empty. Refusing to apply sshd_config which
       would lock everyone out. Set ssh_allow_users in
       group_vars/all.yml (or override per environment).
+  when:
+    - sshd_present.stat.exists
+    - common_apply_ssh_hardening | default(true)

 - name: Render sshd_config drop-in (50-veza-hardening.conf)
   ansible.builtin.template:
@@ -22,9 +43,15 @@
     mode: "0644"
     validate: /usr/sbin/sshd -t -f %s
   notify: Reload sshd
+  when:
+    - sshd_present.stat.exists
+    - common_apply_ssh_hardening | default(true)

 - name: Ensure sshd is enabled + running
   ansible.builtin.service:
     name: ssh
     state: started
     enabled: true
+  when:
+    - sshd_present.stat.exists
+    - common_apply_ssh_hardening | default(true)
infra/ansible/roles/haproxy/files/selfsigned.pem (new file, 50 lines)

@@ -0,0 +1,50 @@
-----BEGIN PRIVATE KEY-----
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQCgyerZjp1+RxU8
/bISXduo8OjR2ejl5SD034PyQvT5B9tk83yplplHoG+JL78UGqpflPlhU9fQSoT9
Walusf/MDDCEbQ75sjPui+yNuvcgWkmpN0MUdOHR8gvfiADCR6/eDQuRf7JJh5N8
YdCtLtnOYsha7Bix+bN11GO6XzPG869I/UGdg4g0v7LvDCP3tI0tpno+y4MuiDvJ
R1pQd7sl6jxPp4zvNtVw8vrSVA3qJ8G6F78nnPUUPFnrAlUFNcnMVLamxY0IA3H4
n9o7X73RnphrpcnPr6eyEYxOL0UGhsDMsQxTrhSaOErL68QDTk3hV60SxWqsVlxX
/DoKAb9VAgMBAAECggEAenTt6V3Fsxv+H+Jz0assFYHNP63/w797FyR4QHUgT93d
CQisRBjPio61A72agHxCj+NM/wQ1FIz8tluoQAdO8x/Bf8nzotZG2QI2Wkcv2bMJ
8NeGvji6mAQJaOgS8+RXG/3BdsHTjk60VAHHRW6uMZJoV18C++FZ/X6RqarCK13N
UEfHX529qNvLhw+xkjXFW/qiB3dQTTEJq+9y0U4nGrjZCXtspkXN3g6ETU6Svzhq
z4tq0udC7FjZPqdA79ChXweZlDCq89FQfxAnxRoZAiwymK91VrGz/GyMIwdBPidm
+or8Rk6nodKk8AuwsGE6ub9UhWUS+Kdpl9fNcV1jLQKBgQDRA7D786sf25tgyooF
6IMZwQfHWGmIepUPruHLz5aV6ozO8XQBgEN4XBI15mxJTu+eeXGbqOhwwuhvYR9u
G02qPE0OlftBRnBJp2AH5+gRphLyrRAvgnjVw323ucnsjOzO0TPwdehomKC0J3b9
B+hZ2tKW/nNxqX/iU1ue969lAwKBgQDE7vJnppvAZLSMo4PCtBTJm11u58AZ9LyZ
6dxvpiq6XxPw9DcC2gj91pCST2g4vIqDYQgmh5U3RzMIFsKLtKfDvHEAYbFOnEfz
UXoNFjlCEmB2jHgpn51/ZDokpPSF9MooDUFna0JPaUrduHs8Zzv7kfrsAhq2N++C
eB+jMea+xwKBgESDzEFbB85io5Vf70yugkMv9ofPIJD/ddt1PUkdHES6ZTv1BEz1
qahLriCDDx4cxQmSz73x6XgFPEI+eRoT0yqpp6zPV1R3bZmHR0BwMa+PXAi22GZq
g4e3FH/kZB+ptnq5MyhwziVzWsKTaTram7zQsVWTxW4N3QDoyFDc6l7XAoGBAI85
+bLIyZ4zn9xpT/rbXgMCrAFtK5m1FTYbj+bjw0+otqgX9aptSPzUgHDor7QT6+mB
OJxNH4kEj2jipLtWuGzzMHxGkN3La8jbCRlbgGk9VErj/sDHBZURH/hmwDBsyFo4
ycidiayXt4tqELbtngJpOUVMgoDkTZ1mIBxgvqEhAoGBAK6uX4k2xiOQorpByvjd
gT16MbuntXO/bDXnXaq1keNMr1JzQ5aS346XweiUgRG7ZJdEb2C8sXwSmh2+oeGa
G+QCLH73hwo/PWbU560dFY8s6z5E79WBjYUu5+1/a0SCBwQ4mEVB7REQVY1mQoJT
A+A8WW+EDvaPpVFujA26K3fc
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
MIIDjTCCAnWgAwIBAgIUbgZuZRFj8M8ZcdhRFikB2bJKswYwDQYJKoZIhvcNAQEL
BQAwVjELMAkGA1UEBhMCWFgxFTATBgNVBAcMDERlZmF1bHQgQ2l0eTEcMBoGA1UE
CgwTRGVmYXVsdCBDb21wYW55IEx0ZDESMBAGA1UEAwwJbG9jYWxob3N0MB4XDTIy
MDQwODEwMTA0OFoXDTQ5MDgyNDEwMTA0OFowVjELMAkGA1UEBhMCWFgxFTATBgNV
BAcMDERlZmF1bHQgQ2l0eTEcMBoGA1UECgwTRGVmYXVsdCBDb21wYW55IEx0ZDES
MBAGA1UEAwwJbG9jYWxob3N0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC
AQEAoMnq2Y6dfkcVPP2yEl3bqPDo0dno5eUg9N+D8kL0+QfbZPN8qZaZR6BviS+/
FBqqX5T5YVPX0EqE/VmpbrH/zAwwhG0O+bIz7ovsjbr3IFpJqTdDFHTh0fIL34gA
wkev3g0LkX+ySYeTfGHQrS7ZzmLIWuwYsfmzddRjul8zxvOvSP1BnYOINL+y7wwj
97SNLaZ6PsuDLog7yUdaUHe7Jeo8T6eM7zbVcPL60lQN6ifBuhe/J5z1FDxZ6wJV
BTXJzFS2psWNCANx+J/aO1+90Z6Ya6XJz6+nshGMTi9FBobAzLEMU64UmjhKy+vE
A05N4VetEsVqrFZcV/w6CgG/VQIDAQABo1MwUTAdBgNVHQ4EFgQUJZDike5gfaOV
k8uCwfCh2OrPXd0wHwYDVR0jBBgwFoAUJZDike5gfaOVk8uCwfCh2OrPXd0wDwYD
VR0TAQH/BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAQbXAIBoDHQakksvKGo3X
/bIyc+IQKFpsyWrn5GvS69wTE7XBfKLtyY3X8NygvsCaRx0r2OIdVERNjrhELkes
tWQE17D1+tDnsaEQRUNJsjBYmealNPpqqacdRlBNnkTSGM/3d3m/ihlA51A1QzyI
IOtKxRRIZ+24L/eww5Hv96ub3Wu4rVmepXP4cVIcPEnN6ntmOv4Ja/M83hLI2oXy
4XmXOVsyliYDGWiyvT2U3LcRsv9PHr09SqYO/5yW+fYC7diLGSHW0kfwht2Q8Zqg
IFMJMDmmKTbCWCmFYdoVTRm2fFl0YvgpC5JrXuSloHh3hRiLwDIUiTxlTM3JDP8q
PQ==
-----END CERTIFICATE-----
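The checked-in PEM looks like a stock OpenSSL self-signed pair (CN=localhost, the `req` default C/L/O values). Should it ever need regenerating, something along these lines should produce an equivalent placeholder; the subject values simply mirror the certificate above:

```shell
# Work in a scratch directory so nothing clobbers the repo copy.
cd "$(mktemp -d)"

# Key + cert in one pass; -nodes leaves the key unencrypted so HAProxy
# can load the combined PEM without a passphrase prompt.
openssl req -x509 -newkey rsa:2048 -nodes -days 9999 \
  -subj '/C=XX/L=Default City/O=Default Company Ltd/CN=localhost' \
  -keyout selfsigned.key -out selfsigned.crt 2>/dev/null

# HAProxy expects key and certificate concatenated in a single file.
cat selfsigned.key selfsigned.crt > selfsigned.pem
```

Committing a throwaway key like this is harmless exactly because it only ever vouches for CN=localhost, as the role comment below notes.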
@@ -26,6 +26,29 @@
     mode: "0750"
   tags: [haproxy, config]

+# Chicken-and-egg : haproxy.cfg.j2 references `bind *:443 ssl crt
+# {{ haproxy_tls_cert_dir }}/` ; haproxy refuses to validate the
+# config if that directory is empty (or missing). dehydrated creates
+# real LE certs there LATER (in letsencrypt.yml). Break the cycle
+# the same way the working roles in
+# /home/senke/Documents/TG__Talas_Group/.../roles/haproxy do : ship a
+# checked-in `selfsigned.pem` and copy it into the cert dir.
+# Once dehydrated lands real certs alongside, SNI picks the matching
+# real cert ; selfsigned.pem only matches CN=localhost (harmless).
+- name: Ensure {{ haproxy_tls_cert_dir }} exists
+  ansible.builtin.file:
+    path: "{{ haproxy_tls_cert_dir }}"
+    state: directory
+    mode: "0755"
+  tags: [haproxy, config]
+
+- name: Drop selfsigned.pem so haproxy can validate the cfg
+  ansible.builtin.copy:
+    src: selfsigned.pem
+    dest: "{{ haproxy_tls_cert_dir }}/selfsigned.pem"
+    mode: "0640"
+  tags: [haproxy, config]
+
 - name: Render haproxy.cfg
   ansible.builtin.template:
     src: haproxy.cfg.j2
@@ -33,7 +56,10 @@
     owner: root
     group: haproxy
     mode: "0640"
-    validate: "haproxy -f %s -c -q"
+    # No -q so the actual validation error reaches the operator's
+    # console. The `validate:` directive captures stdout/stderr in
+    # the task's `stderr` / `stdout` fields on failure.
+    validate: "haproxy -f %s -c"
   register: haproxy_config
   notify: Reload haproxy
   tags: [haproxy, config]
@@ -41,6 +41,28 @@ defaults
     timeout http-request 10s
     load-server-state-from-file global

+# -----------------------------------------------------------------------
+# DNS resolvers — Incus's managed bridges expose a built-in DNS
+# resolver on the gateway IP for the bridge's subnet (10.0.20.1 for
+# net-veza). Backend containers' .lxd hostnames resolve here.
+# init-addr last,libc,none on default-server lets HAProxy start
+# even if the backends don't exist yet ; servers go into MAINT
+# until the resolver returns an address (deploy_app.yml creates
+# them later, then HAProxy's veza_dns resolver picks them up
+# automatically — no haproxy reload needed).
+# -----------------------------------------------------------------------
+resolvers veza_dns
+    nameserver incus_gw 10.0.20.1:53
+    accepted_payload_size 4096
+    resolve_retries 3
+    timeout resolve 1s
+    timeout retry 1s
+    hold valid 10s
+    hold nx 5s
+    hold timeout 5s
+    hold refused 5s
+    hold obsolete 30s
+
 # -----------------------------------------------------------------------
 # Stats endpoint — bound to loopback only ; the Prometheus haproxy
 # exporter sidecar scrapes it.
@@ -63,9 +85,12 @@ frontend veza_http_in
     bind *:{{ haproxy_listen_https }} ssl crt {{ haproxy_tls_cert_dir }}/ alpn h2,http/1.1
     http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
     # Let dehydrated's HTTP-01 challenges through unencrypted before any redirect.
+    # Order matters : http-request rules must come BEFORE use_backend
+    # rules in HAProxy ; otherwise haproxy 3.x warns and processes them
+    # in the unintended order.
     acl acme_challenge path_beg /.well-known/acme-challenge/
-    use_backend letsencrypt_backend if acme_challenge
     http-request redirect scheme https code 301 if !{ ssl_fc } !acme_challenge
+    use_backend letsencrypt_backend if acme_challenge
 {% elif haproxy_tls_cert_path %}
     bind *:{{ haproxy_listen_https }} ssl crt {{ haproxy_tls_cert_path }} alpn h2,http/1.1
     http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
@@ -146,7 +171,7 @@ backend {{ env }}_backend_api
     option httpchk GET {{ veza_healthcheck_paths.backend | default('/api/v1/health') }}
     http-check expect status 200
     cookie {{ haproxy_sticky_cookie_name }}_{{ env }} insert indirect nocache httponly secure
-    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s
+    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s init-addr last,libc,none resolvers veza_dns
     server {{ env }}_backend_blue {{ prefix }}backend-blue.{{ veza_incus_dns_suffix }}:{{ veza_backend_port }} cookie {{ env }}_backend_blue {{ '' if _active == 'blue' else 'backup' }}
     server {{ env }}_backend_green {{ prefix }}backend-green.{{ veza_incus_dns_suffix }}:{{ veza_backend_port }} cookie {{ env }}_backend_green {{ '' if _active == 'green' else 'backup' }}
@@ -157,7 +182,7 @@ backend {{ env }}_stream_pool
     option httpchk GET {{ veza_healthcheck_paths.stream | default('/health') }}
     http-check expect status 200
     timeout tunnel 1h
-    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s
+    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s init-addr last,libc,none resolvers veza_dns
     server {{ env }}_stream_blue {{ prefix }}stream-blue.{{ veza_incus_dns_suffix }}:{{ veza_stream_port }} {{ '' if _active == 'blue' else 'backup' }}
     server {{ env }}_stream_green {{ prefix }}stream-green.{{ veza_incus_dns_suffix }}:{{ veza_stream_port }} {{ '' if _active == 'green' else 'backup' }}
@@ -166,7 +191,7 @@ backend {{ env }}_web_pool
     balance roundrobin
     option httpchk GET {{ veza_healthcheck_paths.web | default('/') }}
     http-check expect status 200
-    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s
+    default-server check inter {{ haproxy_health_check_interval_ms }} fall {{ haproxy_health_check_fall }} rise {{ haproxy_health_check_rise }} on-marked-down shutdown-sessions slowstart {{ haproxy_graceful_drain_seconds }}s init-addr last,libc,none resolvers veza_dns
     server {{ env }}_web_blue {{ prefix }}web-blue.{{ veza_incus_dns_suffix }}:{{ veza_web_port }} {{ '' if _active == 'blue' else 'backup' }}
     server {{ env }}_web_green {{ prefix }}web-green.{{ veza_incus_dns_suffix }}:{{ veza_web_port }} {{ '' if _active == 'green' else 'backup' }}
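For reference, `init-addr last,libc,none` is an ordered list of address-resolution fallbacks HAProxy tries at startup: the saved server-state file first, then a libc lookup, then `none`, which lets the server start with no address (in MAINT) until the runtime resolver supplies one. A minimal standalone backend showing the same pattern (names hypothetical):

```
backend example_pool
    default-server check init-addr last,libc,none resolvers veza_dns
    server app1 app1.lxd:8080
```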
@@ -174,11 +199,17 @@ backend {{ env }}_web_pool

 {% if haproxy_forgejo_host %}
 # --- Forgejo (managed outside the deploy pipeline) --------------------
+# The existing forgejo container exposes HTTPS on :3000 with a
+# self-signed cert. We re-encrypt to it (ssl verify none) ; the
+# operator's WireGuard mesh is the trust boundary, the cert chain
+# is irrelevant. Healthcheck adapted to send a Host: header so
+# Forgejo's reverse-proxy validation accepts the request.
 backend forgejo_backend
-    option httpchk GET /
-    http-check expect status 200
+    option httpchk
+    http-check send meth GET uri / ver HTTP/1.1 hdr Host {{ haproxy_forgejo_host }}
+    http-check expect rstatus ^[23]
     default-server check inter 10s fall 3 rise 2
-    server forgejo {{ haproxy_forgejo_backend }}
+    server forgejo {{ haproxy_forgejo_backend }} ssl verify none sni str({{ haproxy_forgejo_host }})
 {% endif %}

 {% if haproxy_talas_hosts %}
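`http-check expect rstatus ^[23]` accepts any status whose text matches the regex, i.e. anything in the 2xx/3xx range, so Forgejo can answer `/` with either a 200 or a redirect and still count as healthy. The regex behaviour, sketched with grep:

```shell
# Same anchored match HAProxy applies to the status-code text.
passes_check() { printf '%s' "$1" | grep -Eq '^[23]'; }

passes_check 200 && echo "200 passes"
passes_check 302 && echo "302 passes"
passes_check 503 || echo "503 fails"
```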
@@ -42,6 +42,17 @@ OPERATOR_EMAIL=${OPERATOR_EMAIL:-?}
 OPERATOR_PASSWORD=${OPERATOR_PASSWORD:-?}
 ORDER_POLL_TIMEOUT=${ORDER_POLL_TIMEOUT:-300}
 ORDER_POLL_INTERVAL=${ORDER_POLL_INTERVAL:-5}
+# v1.0.10 polish safety guards:
+#   DRY_RUN=1            — skip the POST /orders + payment steps; rehearse
+#                          the login + product-listing + license-poll path
+#                          end-to-end on staging without spending a euro.
+#   CONFIRM_PRODUCTION=1 — required when STAGING_URL points at the live
+#                          environment. Without it the script refuses to
+#                          run, so a typo (`STAGING_URL=https://veza.fr`
+#                          on a sandbox-targeted command) can't accidentally
+#                          charge a real card.
+DRY_RUN=${DRY_RUN:-0}
+CONFIRM_PRODUCTION=${CONFIRM_PRODUCTION:-0}

 SESSION_DATE="$(date +%Y%m%d-%H%M)"
 SESSION_LOG="${REPO_ROOT}/docs/PAYMENT_E2E_LIVE_REPORT.md.session-${SESSION_DATE}.log"
@ -64,6 +75,43 @@ require jq
|
||||||
[ "$OPERATOR_EMAIL" = "?" ] && fail "OPERATOR_EMAIL env var required" 3
|
[ "$OPERATOR_EMAIL" = "?" ] && fail "OPERATOR_EMAIL env var required" 3
|
||||||
[ "$OPERATOR_PASSWORD" = "?" ] && fail "OPERATOR_PASSWORD env var required" 3
|
[ "$OPERATOR_PASSWORD" = "?" ] && fail "OPERATOR_PASSWORD env var required" 3
|
||||||
|
|
||||||
|
# Heuristic: any URL that doesn't include the substring "staging" is
|
||||||
|
# treated as production. Operators on a non-veza-domain (custom env)
|
||||||
|
# can still run the script; they just have to pass CONFIRM_PRODUCTION=1.
|
||||||
|
TARGET_LOOKS_LIKE_PROD=0
|
||||||
|
if [[ ! "$STAGING_URL" =~ staging ]] && [[ ! "$STAGING_URL" =~ localhost ]] && [[ ! "$STAGING_URL" =~ 127\.0\.0\.1 ]]; then
|
||||||
|
TARGET_LOOKS_LIKE_PROD=1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$TARGET_LOOKS_LIKE_PROD" = "1" ] && [ "$CONFIRM_PRODUCTION" != "1" ]; then
|
||||||
|
cat >&2 <<EOF
|
||||||
|
|
||||||
|
================================================================
|
||||||
|
ABORTING — production target detected without explicit confirmation
|
||||||
|
================================================================
|
||||||
|
|
||||||
|
STAGING_URL=$STAGING_URL does not contain "staging", "localhost" or
|
||||||
|
"127.0.0.1", so this script will refuse to run by default to prevent
|
||||||
|
an accidental real-card charge.
|
||||||
|
|
||||||
|
If you genuinely want to run against production, re-invoke with:
|
||||||
|
|
||||||
|
CONFIRM_PRODUCTION=1 \\
|
||||||
|
STAGING_URL=$STAGING_URL \\
|
||||||
|
OPERATOR_EMAIL=$OPERATOR_EMAIL \\
|
||||||
|
OPERATOR_PASSWORD=... \\
|
||||||
|
bash scripts/payment-e2e-walkthrough.sh
|
||||||
|
|
||||||
|
Or set DRY_RUN=1 to rehearse the flow without making the actual charge.
|
||||||
|
================================================================
|
||||||
|
EOF
|
||||||
|
exit 3
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$DRY_RUN" = "1" ]; then
|
||||||
|
log "DRY_RUN=1 — order creation + payment + refund steps will be SKIPPED"
|
||||||
|
fi
|
||||||
|
|
||||||
# api wrapper that tee's request + response to the session log so the
|
# api wrapper that tee's request + response to the session log so the
|
||||||
# operator can copy-paste the full trace into the report.
|
# operator can copy-paste the full trace into the report.
|
||||||
api() {
|
api() {
|
||||||
|
|
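The guard reduces to: no staging/localhost marker in the URL means "treat as production", and production demands `CONFIRM_PRODUCTION=1`. A standalone restatement of that decision table, using a portable `case` pattern in place of the script's `[[ =~ ]]`:

```shell
# Mirrors the production-detection + confirmation gate from the script.
guard() {
  url=$1; confirm=${2:-0}; prod=1
  case "$url" in
    *staging*|*localhost*|*127.0.0.1*) prod=0 ;;
  esac
  if [ "$prod" = 1 ] && [ "$confirm" != 1 ]; then
    echo "ABORT"
  else
    echo "RUN"
  fi
}

guard https://staging.veza.fr   # prints RUN   (staging marker found)
guard https://veza.fr           # prints ABORT (no marker, no confirm)
guard https://veza.fr 1         # prints RUN   (explicit confirmation)
```

Note the heuristic is substring-based: a production host that happens to contain "staging" anywhere in its URL would slip through, which is the trade-off the comment above accepts.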
@@ -134,8 +182,39 @@ log " ✓ price : $PRODUCT_PRICE"
 # --------------------------------------------------------------------
 # Step 3 : POST /orders.
 # --------------------------------------------------------------------
+if [ "$DRY_RUN" = "1" ]; then
+  log ""
+  log "step 3 : POST /api/v1/marketplace/orders — SKIPPED (dry-run)"
+  log "================================================================"
+  log "DRY-RUN PASS : login + product list + license-mine endpoints reached"
+  log "Run without DRY_RUN to exercise the real charge + refund flow."
+  log "================================================================"
+  exit 0
+fi
+
 log ""
 log "step 3 : POST /api/v1/marketplace/orders"
+
+# v1.0.10 polish: confirm prompt before the actual charge so a typo'd
+# product_id or wrong operator account can't quietly burn 5 EUR.
+if [ "$TARGET_LOOKS_LIKE_PROD" = "1" ]; then
+  log ""
+  log "================================================================"
+  log "FINAL CONFIRMATION — about to charge a real card on production"
+  log "================================================================"
+  log "  product_id : $PRODUCT_ID"
+  log "  price      : $PRODUCT_PRICE"
+  log "  operator   : $OPERATOR_EMAIL"
+  log "  endpoint   : ${STAGING_URL}/api/v1/marketplace/orders"
+  log ""
+  prompt "Type the literal word 'CHARGE' to proceed (anything else aborts) :"
+  read -r confirm_word
+  if [ "$confirm_word" != "CHARGE" ]; then
+    fail "operator did not confirm the charge ($confirm_word) — aborting" 2
+  fi
+  log "  operator confirmed CHARGE — proceeding"
+fi
+
 order_body="{\"items\":[{\"product_id\":\"${PRODUCT_ID}\"}]}"
 order_resp=$(api POST /api/v1/marketplace/orders "$order_body" 2>/dev/null)
 ORDER_ID=$(echo "$order_resp" | jq -r '.data.order.id // .data.id // .id // ""')
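The confirmation gate is a plain string comparison on one line of stdin; only the literal, uppercase word CHARGE proceeds. Reduced to its essentials:

```shell
# Reads one line; succeeds only on the exact word CHARGE.
confirm_charge() {
  read -r word
  [ "$word" = "CHARGE" ]
}

echo CHARGE | confirm_charge && echo "proceeding"
echo charge | confirm_charge || echo "aborted (case matters)"
```

Requiring a typed word rather than a y/n answer is deliberate: an operator holding Enter or pasting the wrong buffer cannot accidentally satisfy it.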
scripts/pentest/seed-test-accounts.sh (new executable file, 191 lines)

@@ -0,0 +1,191 @@
#!/usr/bin/env bash
# seed-test-accounts.sh — provision the 3 pentester accounts on a target
# environment (staging only ; refuses to run against prod).
#
# Per docs/PENTEST_SCOPE_2026.md §"Authentication context", an external
# pentest engagement needs three pre-seeded accounts (listener, creator,
# admin). This script :
#
#   1. Generates a 32-char random password for each role.
#   2. Calls the staging admin API to create / reset each account.
#   3. Promotes the creator account to the creator role and the admin
#      account to admin (via direct DB UPDATE, because the public API
#      doesn't expose role changes — the operator runs that step from a
#      maintenance shell).
#   4. Writes a 1Password import JSON to stdout so the operator can
#      `op item template` it into the shared vault. NEVER prints
#      passwords to the screen.
#
# Usage :
#   bash scripts/pentest/seed-test-accounts.sh staging
#
# Output :
#   1Password JSON on stdout (3 entries). Pipe into a file, then
#   `op item create --vault Pentest-2026 - < file.json`.
#
# Exit codes :
#   0 — three accounts provisioned, JSON emitted
#   1 — API call failed (account creation or login probe)
#   2 — wrong target environment (e.g. operator passed "prod")
#   3 — required env var or tool missing
set -euo pipefail

ENV_NAME=${1:-}
if [ -z "$ENV_NAME" ]; then
  cat >&2 <<EOF
usage : bash scripts/pentest/seed-test-accounts.sh <env>
  env : staging (the only accepted value — prod is refused)

Required env vars :
  STAGING_URL              base URL (e.g. https://staging.veza.fr)
  STAGING_ADMIN_EMAIL      admin who creates the accounts
  STAGING_ADMIN_PASSWORD   admin password (provisioning cred only)

Output :
  1Password import JSON for vault Pentest-2026, on stdout.
  Passwords are NEVER printed to the operator's screen.
EOF
  exit 3
fi

if [ "$ENV_NAME" != "staging" ]; then
  echo "ERROR: this script refuses to run against any env other than 'staging'." >&2
  echo "       Pentest accounts on production violate the engagement scope." >&2
  exit 2
fi

STAGING_URL=${STAGING_URL:-?}
STAGING_ADMIN_EMAIL=${STAGING_ADMIN_EMAIL:-?}
STAGING_ADMIN_PASSWORD=${STAGING_ADMIN_PASSWORD:-?}

[ "$STAGING_URL" = "?" ] && { echo "STAGING_URL required" >&2; exit 3; }
[ "$STAGING_ADMIN_EMAIL" = "?" ] && { echo "STAGING_ADMIN_EMAIL required" >&2; exit 3; }
[ "$STAGING_ADMIN_PASSWORD" = "?" ] && { echo "STAGING_ADMIN_PASSWORD required" >&2; exit 3; }

command -v curl >/dev/null 2>&1 || { echo "curl required" >&2; exit 3; }
command -v jq >/dev/null 2>&1 || { echo "jq required" >&2; exit 3; }
command -v openssl >/dev/null 2>&1 || { echo "openssl required (password generation)" >&2; exit 3; }

genpass() {
  # 32-char password from base64-encoding 24 bytes of entropy. URL-safe
  # so it can land in a JSON string without escaping.
  openssl rand -base64 24 | tr -d '\n=/+' | cut -c-32
}

# 1. Log in as the staging admin so we can call the create-user endpoint.
admin_login_resp=$(curl -ksS --max-time 15 \
  -X POST -H 'Content-Type: application/json' \
  -d "{\"email\":\"${STAGING_ADMIN_EMAIL}\",\"password\":\"${STAGING_ADMIN_PASSWORD}\",\"remember_me\":false}" \
  "${STAGING_URL}/api/v1/auth/login")
admin_token=$(echo "$admin_login_resp" | jq -r '.data.token.access_token // .token.access_token // ""')
if [ -z "$admin_token" ] || [ "$admin_token" = "null" ]; then
  echo "ERROR: admin login failed" >&2
  echo "$admin_login_resp" >&2
  exit 1
fi

provision() {
  # provision <role> <email-prefix>
  # Returns : password (stdout), nothing else.
  local role=$1 email_prefix=$2
  local email="${email_prefix}@veza.fr"
  local password
  password=$(genpass)

  # Try creating ; if 409 (already exists), reset the password instead.
  # Both paths end with a valid (email, password) tuple.
  local create_resp create_status
  create_resp=$(curl -ksS --max-time 15 \
    -H "Authorization: Bearer ${admin_token}" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d "{\"email\":\"${email}\",\"password\":\"${password}\",\"username\":\"${email_prefix}\",\"role\":\"${role}\"}" \
    -w '\nHTTP_CODE=%{http_code}' \
    "${STAGING_URL}/api/v1/admin/users")
  create_status=$(echo "$create_resp" | grep -oE 'HTTP_CODE=[0-9]+' | tail -1 | cut -d= -f2)

  case "$create_status" in
    200|201)
      ;;
    409)
      # Account exists — reset the password instead.
      curl -ksS --max-time 15 \
        -H "Authorization: Bearer ${admin_token}" \
        -H 'Content-Type: application/json' \
        -X POST \
        -d "{\"email\":\"${email}\",\"new_password\":\"${password}\"}" \
        "${STAGING_URL}/api/v1/admin/users/reset-password" >/dev/null
      ;;
    *)
      echo "ERROR: provisioning ${role} failed with HTTP ${create_status}" >&2
      echo "$create_resp" >&2
      exit 1
      ;;
  esac

  # Probe : log in as the freshly-set account so we know the engagement
  # can use it.
  probe=$(curl -ksS --max-time 15 \
    -X POST -H 'Content-Type: application/json' \
    -d "{\"email\":\"${email}\",\"password\":\"${password}\",\"remember_me\":false}" \
    "${STAGING_URL}/api/v1/auth/login")
  probe_token=$(echo "$probe" | jq -r '.data.token.access_token // .token.access_token // ""')
  if [ -z "$probe_token" ] || [ "$probe_token" = "null" ]; then
    echo "ERROR: ${role} login probe failed — provisioning broken" >&2
    exit 1
  fi

  printf '%s' "$password"
}

# 2. Provision the three roles. Passwords stay in shell variables — no
#    echo, no log, no temp file.
listener_pwd=$(provision "user" "pentest-2026-listener")
creator_pwd=$(provision "creator" "pentest-2026-creator")
admin_pwd=$(provision "admin" "pentest-2026-admin")

# 3. Emit the 1Password JSON template. Each entry carries the role + login
#    URL in Notes so the pentester knows which account does what.
cat <<EOF
[
  {
    "title": "pentest-2026-listener",
    "category": "LOGIN",
    "vault": {"name": "Pentest-2026"},
    "fields": [
      {"id": "username", "type": "STRING", "value": "pentest-2026-listener@veza.fr"},
      {"id": "password", "type": "CONCEALED", "value": "${listener_pwd}"},
      {"id": "url", "type": "URL", "value": "${STAGING_URL}/login"},
      {"id": "notesPlain", "type": "STRING", "value": "Pentest 2026 — listener role. Engagement: see PENTEST_SCOPE_2026.md. Rotate at engagement end."}
    ]
  },
  {
    "title": "pentest-2026-creator",
    "category": "LOGIN",
    "vault": {"name": "Pentest-2026"},
    "fields": [
      {"id": "username", "type": "STRING", "value": "pentest-2026-creator@veza.fr"},
      {"id": "password", "type": "CONCEALED", "value": "${creator_pwd}"},
      {"id": "url", "type": "URL", "value": "${STAGING_URL}/login"},
      {"id": "notesPlain", "type": "STRING", "value": "Pentest 2026 — creator role. Owns 5 seed tracks. Rotate at engagement end."}
    ]
  },
  {
    "title": "pentest-2026-admin",
    "category": "LOGIN",
    "vault": {"name": "Pentest-2026"},
    "fields": [
      {"id": "username", "type": "STRING", "value": "pentest-2026-admin@veza.fr"},
      {"id": "password", "type": "CONCEALED", "value": "${admin_pwd}"},
      {"id": "url", "type": "URL", "value": "${STAGING_URL}/login"},
      {"id": "notesPlain", "type": "STRING", "value": "Pentest 2026 — admin role + MFA bypass. DO NOT use for non-pentest activity. Rotate at engagement end."}
    ]
  }
]
EOF

echo "" >&2
echo " 3 accounts provisioned + login-probed against ${STAGING_URL}" >&2
echo " next: pipe stdout to a file and run" >&2
echo "       op item create --vault Pentest-2026 - < <file>" >&2
|
||||||
|
echo " THEN rotate each entry with op item edit --generate-password=letters,digits,32" >&2
|
||||||
|
echo " at engagement end (this script does NOT auto-rotate)." >&2
|
||||||
|
|
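The import-then-rotate flow the closing echoes describe can be sketched end-to-end. The provisioning-script name and the temp-file path below are illustrative (not taken from the repo); the `op` invocations are the ones the script itself echoes:

```shell
# 1. Capture the emitted JSON (stdout only — the status lines go to stderr).
#      bash scripts/security/provision-pentest-accounts.sh > /tmp/pentest-items.json
# 2. Import into the shared vault, as echoed by the script :
#      op item create --vault Pentest-2026 - < /tmp/pentest-items.json
# 3. Remove the plaintext file immediately — it holds live passwords :
#      shred -u /tmp/pentest-items.json
# 4. At engagement end, rotate each entry, as echoed by the script :
#      op item edit pentest-2026-admin --generate-password=letters,digits,32
```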
@@ -16,18 +16,26 @@
 # E : test_rabbitmq_outage.sh — stop RabbitMQ 60s, backend stays up
 #
 # Usage :
-#   bash scripts/security/game-day-driver.sh          # run all scenarios
-#   SKIP=DE bash scripts/security/game-day-driver.sh  # skip scenarios D + E
-#   ONLY=A bash scripts/security/game-day-driver.sh   # only run scenario A
+#   bash scripts/security/game-day-driver.sh          # all scenarios on staging (default)
+#   SKIP=DE bash scripts/security/game-day-driver.sh  # skip D + E
+#   ONLY=A bash scripts/security/game-day-driver.sh   # only A
+#   INVENTORY=prod CONFIRM_PROD=1 bash scripts/security/game-day-driver.sh  # prod (gated)
 #
 # Required env (passed through to the underlying smoke tests) :
 #   REDIS_PASS / SENTINEL_PASS            for scenario C
 #   MINIO_ROOT_USER / MINIO_ROOT_PASSWORD for scenario D
 #
+# v1.0.10 polish — production gating :
+#   INVENTORY=prod must be paired with CONFIRM_PROD=1 or the script
+#   refuses to run, so a stale shell-history line can't accidentally
+#   kill prod Postgres on a Monday morning. The driver also runs a
+#   backup-freshness pre-flight when targeting prod (most recent
+#   pgBackRest backup must be < 24 h old).
+#
 # Exit codes :
 #   0 — every selected scenario passed
 #   1 — at least one scenario failed
-#   2 — runner pre-flight failed (script missing, etc.)
+#   2 — runner pre-flight failed (script missing, prod safety guard tripped, stale backup, etc.)
 set -euo pipefail

 REPO_ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
@@ -41,6 +49,9 @@ mkdir -p "$LOGS_DIR"

 ONLY=${ONLY:-}
 SKIP=${SKIP:-}
+INVENTORY=${INVENTORY:-staging}
+CONFIRM_PROD=${CONFIRM_PROD:-0}
+SKIP_BACKUP_FRESHNESS=${SKIP_BACKUP_FRESHNESS:-0}

 log() { printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" | tee -a "$SESSION_LOG" >&2; }
 fail() { log "FAIL: $*"; exit "${2:-2}"; }
@@ -68,6 +79,101 @@ want() {
   return 0
 }
+
+# v1.0.10 polish — prod safety gate. INVENTORY=prod requires
+# CONFIRM_PROD=1 + an interactive type-the-word confirm. Anything else
+# defaults to staging so a forgotten env-var doesn't matter.
+case "$INVENTORY" in
+  staging|stg|dev|local) ;;
+  prod|production)
+    if [ "$CONFIRM_PROD" != "1" ]; then
+      cat >&2 <<EOF
+
+================================================================
+ ABORTING — INVENTORY=prod without CONFIRM_PROD=1
+================================================================
+
+This script will kill production services. Each scenario triggers a
+real outage in the chosen inventory : Postgres primary kill, HAProxy
+backend stop, Redis master kill, MinIO node loss, RabbitMQ stop.
+
+To run on production, you must :
+
+  1. Announce a maintenance window 24 h ahead (status page +
+     #engineering channel).
+  2. Set PagerDuty to maintenance mode for the affected services.
+  3. Confirm pgBackRest's last backup is < 24 h old (this script
+     auto-checks if you don't pass SKIP_BACKUP_FRESHNESS=1).
+  4. Re-invoke with :
+
+       INVENTORY=prod CONFIRM_PROD=1 \\
+         bash scripts/security/game-day-driver.sh
+
+The driver will then ask for one more interactive confirmation
+(type the word KILL-PROD) before the first scenario fires.
+================================================================
+EOF
+      exit 2
+    fi
+
+    # Backup-freshness pre-flight : refuse to run if the most recent
+    # pgBackRest full/diff is > 24 h old. Recovery from a stale backup
+    # can extend an outage from minutes to hours, so the cost of
+    # postponing the game day is much less than the cost of compounded
+    # data loss if scenario A fails to recover and we have to restore
+    # from yesterday-but-one.
+    if [ "$SKIP_BACKUP_FRESHNESS" != "1" ]; then
+      if command -v pgbackrest >/dev/null 2>&1; then
+        last_backup_ts=$(pgbackrest --stanza=veza info --output=json 2>/dev/null \
+          | python3 -c "
+import json, sys
+try:
+    data = json.load(sys.stdin)
+    backups = data[0]['backup'] if data else []
+    if not backups: print(0); sys.exit(0)
+    print(max(b['timestamp']['stop'] for b in backups))
+except Exception:
+    print(0)
+" 2>/dev/null || echo 0)
+        now_ts=$(date +%s)
+        age_seconds=$(( now_ts - last_backup_ts ))
+        if [ "$last_backup_ts" -eq 0 ]; then
+          fail "pgBackRest backup-freshness check failed : could not parse 'pgbackrest info'. Set SKIP_BACKUP_FRESHNESS=1 to override (only after manually verifying a recent backup exists)." 2
+        fi
+        if [ "$age_seconds" -gt 86400 ]; then
+          age_hours=$(( age_seconds / 3600 ))
+          fail "pgBackRest most recent backup is ${age_hours}h old (threshold 24h). Run a backup before the game day, or set SKIP_BACKUP_FRESHNESS=1 if you've validated freshness another way." 2
+        fi
+        log "pre-flight : pgBackRest most recent backup is $(( age_seconds / 3600 ))h $(( (age_seconds % 3600) / 60 ))m old (< 24h threshold) — OK"
+      else
+        log "WARN : pgbackrest CLI not on \$PATH ; skipping backup-freshness check. Set SKIP_BACKUP_FRESHNESS=1 to silence this warning if intentional."
+      fi
+    fi
+
+    # Final type-the-word confirm. Everything above can be set in env
+    # by mistake ; this last step requires a human at the keyboard.
+    cat >&2 <<EOF
+
+================================================================
+ PROD GAME DAY — final confirmation
+================================================================
+
+  inventory : prod
+  scenarios : ${SCENARIOS[*]}${ONLY:+ (filtered by ONLY=$ONLY)}${SKIP:+ (filtered by SKIP=$SKIP)}
+  session   : $SESSION_LOG
+
+Each scenario triggers a real outage. Type the literal phrase
+KILL-PROD (any other input aborts) to proceed :
+EOF
+    read -r confirm_phrase
+    if [ "$confirm_phrase" != "KILL-PROD" ]; then
+      fail "operator did not confirm KILL-PROD ($confirm_phrase) — aborting" 2
+    fi
+    ;;
+  *)
+    fail "INVENTORY=$INVENTORY not recognised — must be one of staging|prod" 2
+    ;;
+esac
+
 # Pre-flight : every selected scenario script must exist + be executable.
 for s in "${SCENARIOS[@]}"; do
   if want "$s"; then
@@ -83,6 +189,7 @@ declare -A SCENARIO_DURATION

 log "================================================================"
 log "Game day session : $SESSION_DATE"
+log "Inventory : $INVENTORY"
 log "Session log : $SESSION_LOG"
 log "Scenarios run : ${SCENARIOS[*]}"
 [ -n "$ONLY" ] && log "ONLY filter : $ONLY"
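The freshness pre-flight's inline Python assumes `pgbackrest info --output=json` returns a list of stanzas, each carrying a `backup` list whose entries hold `timestamp.stop` epoch seconds. A self-contained sketch of that parse, fed a hand-written sample (the timestamps are invented):

```shell
# Feed the same parser a sample document and confirm it picks the most
# recent backup's stop time.
sample='[{"backup":[{"timestamp":{"stop":1700000000}},{"timestamp":{"stop":1700003600}}]}]'
last_backup_ts=$(printf '%s' "$sample" | python3 -c "
import json, sys
try:
    data = json.load(sys.stdin)
    backups = data[0]['backup'] if data else []
    if not backups: print(0); sys.exit(0)
    print(max(b['timestamp']['stop'] for b in backups))
except Exception:
    print(0)
")
echo "$last_backup_ts"   # → 1700003600
```

Any parse failure collapses to `0`, which the driver treats as "could not verify" and refuses to proceed — failing closed rather than open.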
scripts/soft-launch/monitor-checks.sh — 255 lines (new executable file)

@@ -0,0 +1,255 @@
#!/usr/bin/env bash
# monitor-checks.sh — poll the soft-launch acceptance gate live during
# the beta window so the operator gets a heads-up before the decision
# call instead of discovering at 18:00 UTC that one threshold is red.
#
# Acceptance gate (per docs/SOFT_LAUNCH_BETA_2026.md §"Acceptance gate") :
#   - ≥ 50 testers signed up (used_at != NULL on beta_invites)
#   - 0 P1 events in Sentry today
#   - Status page green for the last 4 h
#   - Synthetic parcours all green for 6 h
#   - Nightly k6 load test green
#   - < 3 HIGH-severity issues reported
#
# v1.0.10 Cluster 3.4.
#
# Usage :
#   DATABASE_URL=postgres://... \
#   SENTRY_AUTH_TOKEN=... \
#   STATUSPAGE_URL=https://status.veza.fr \
#   PROM_URL=https://prom.veza.fr \
#   bash scripts/soft-launch/monitor-checks.sh
#
# By default the script runs once and exits with the gate's verdict.
# Run it from cron (e.g. every 30 min) or pass LOOP=1 to keep checking
# in-place every CHECK_INTERVAL seconds (default 600 = 10 min).
#
# Optional env :
#   LOOP=1             continuous mode
#   CHECK_INTERVAL     seconds between checks in LOOP mode (default 600)
#   QUIET=1            only emit the verdict line (for cron piping)
#   THRESHOLD_TESTERS  override 50 (default), e.g. set to 100 for
#                      a stricter sub-window
#
# Exit codes :
#   0 — every gate green
#   1 — at least one gate red
#   2 — at least one gate could not be checked (collector down,
#       token wrong, etc.) — operator must verify manually
#   3 — required env / tool missing
set -euo pipefail

DATABASE_URL=${DATABASE_URL:-?}
SENTRY_AUTH_TOKEN=${SENTRY_AUTH_TOKEN:-?}
STATUSPAGE_URL=${STATUSPAGE_URL:-https://status.veza.fr}
PROM_URL=${PROM_URL:-?}
LOOP=${LOOP:-0}
CHECK_INTERVAL=${CHECK_INTERVAL:-600}
QUIET=${QUIET:-0}
THRESHOLD_TESTERS=${THRESHOLD_TESTERS:-50}

[ "$DATABASE_URL" = "?" ] && { echo "DATABASE_URL required" >&2; exit 3; }
[ "$SENTRY_AUTH_TOKEN" = "?" ] && { echo "SENTRY_AUTH_TOKEN required (read scope sufficient)" >&2; exit 3; }
[ "$PROM_URL" = "?" ] && { echo "PROM_URL required" >&2; exit 3; }

command -v psql >/dev/null 2>&1 || { echo "psql required" >&2; exit 3; }
command -v curl >/dev/null 2>&1 || { echo "curl required" >&2; exit 3; }
command -v jq >/dev/null 2>&1 || { echo "jq required" >&2; exit 3; }

# ----------------------------------------------------------------------
# Individual gate checks. Each prints "✅ <name>" / "🔴 <name>" / "⚪ <name>"
# (last for "could not check"), and sets one of GATE_*_OK to 0 / 1 / 2.
# ----------------------------------------------------------------------

GATE_TESTERS_OK=2
GATE_SENTRY_OK=2
GATE_STATUSPAGE_OK=2
GATE_SYNTHETIC_OK=2
GATE_K6_OK=2
GATE_ISSUES_OK=2

check_testers() {
  local count
  count=$(psql "$DATABASE_URL" -A -t -c "
    SELECT count(*) FROM beta_invites WHERE used_at IS NOT NULL;
  " 2>/dev/null | tr -d ' ' || echo "?")
  if [ "$count" = "?" ] || ! [[ "$count" =~ ^[0-9]+$ ]]; then
    echo "⚪ testers signed-up : check failed (psql)"
    GATE_TESTERS_OK=2
    return
  fi
  if [ "$count" -ge "$THRESHOLD_TESTERS" ]; then
    echo "✅ testers signed-up : $count / $THRESHOLD_TESTERS"
    GATE_TESTERS_OK=0
  else
    echo "🔴 testers signed-up : $count / $THRESHOLD_TESTERS"
    GATE_TESTERS_OK=1
  fi
}

check_sentry_p1() {
  # Sentry API : count of unresolved P1 issues last 24h.
  local count
  count=$(curl -s -H "Authorization: Bearer $SENTRY_AUTH_TOKEN" \
    "https://sentry.io/api/0/projects/veza/veza-backend/issues/?statsPeriod=24h&query=is:unresolved%20level:fatal" \
    2>/dev/null | jq 'length' 2>/dev/null || echo "?")
  if [ "$count" = "?" ] || ! [[ "$count" =~ ^[0-9]+$ ]]; then
    echo "⚪ Sentry P1 events 24h : check failed (auth or network)"
    GATE_SENTRY_OK=2
    return
  fi
  if [ "$count" -eq 0 ]; then
    echo "✅ Sentry P1 events 24h : 0"
    GATE_SENTRY_OK=0
  else
    echo "🔴 Sentry P1 events 24h : $count (must be 0)"
    GATE_SENTRY_OK=1
  fi
}

check_statuspage() {
  local status
  status=$(curl -s "$STATUSPAGE_URL/api/v1/status" 2>/dev/null \
    | jq -r '.indicator // .status.indicator // ""' 2>/dev/null || echo "")
  case "$status" in
    none|operational)
      echo "✅ status page : $status (green)"
      GATE_STATUSPAGE_OK=0
      ;;
    minor|major|critical)
      echo "🔴 status page : $status"
      GATE_STATUSPAGE_OK=1
      ;;
    *)
      echo "⚪ status page : check failed (got '$status')"
      GATE_STATUSPAGE_OK=2
      ;;
  esac
}

check_synthetic() {
  # PromQL : any parcours with at least one failed probe over the last
  # 6 h — the gate requires every parcours green for the full window,
  # so an instant query is ranged with min_over_time.
  local query='min_over_time(probe_success{probe_kind="synthetic"}[6h]) == 0'
  local resp
  resp=$(curl -s --get "$PROM_URL/api/v1/query" \
    --data-urlencode "query=$query" 2>/dev/null)
  local result_count
  result_count=$(echo "$resp" | jq '.data.result | length' 2>/dev/null || echo "?")
  if [ "$result_count" = "?" ] || ! [[ "$result_count" =~ ^[0-9]+$ ]]; then
    echo "⚪ synthetic parcours : check failed (Prometheus)"
    GATE_SYNTHETIC_OK=2
    return
  fi
  if [ "$result_count" -eq 0 ]; then
    echo "✅ synthetic parcours : all green"
    GATE_SYNTHETIC_OK=0
  else
    local failing
    failing=$(echo "$resp" | jq -r '.data.result[].metric.parcours' 2>/dev/null | tr '\n' ',' | sed 's/,$//')
    echo "🔴 synthetic parcours : $result_count failing ($failing)"
    GATE_SYNTHETIC_OK=1
  fi
}

check_k6_nightly() {
  # k6 nightly is exposed as veza_k6_nightly_last_success_timestamp_seconds
  # by the Forgejo runner workflow's textfile-collector. Reading via Prom
  # gives "is the last success < 30h old?".
  local query='time() - veza_k6_nightly_last_success_timestamp_seconds'
  local resp age age_int
  resp=$(curl -s --get "$PROM_URL/api/v1/query" \
    --data-urlencode "query=$query" 2>/dev/null)
  age=$(echo "$resp" | jq -r '.data.result[0].value[1] // ""' 2>/dev/null)
  if [ -z "$age" ] || [ "$age" = "null" ]; then
    echo "⚪ k6 nightly : check failed (metric absent — runner offline?)"
    GATE_K6_OK=2
    return
  fi
  age_int=$(printf '%.0f' "$age" 2>/dev/null || echo 999999)
  if [ "$age_int" -lt 108000 ]; then  # 30h
    echo "✅ k6 nightly : last success $(( age_int / 3600 ))h ago"
    GATE_K6_OK=0
  else
    echo "🔴 k6 nightly : last success $(( age_int / 3600 ))h ago (> 30h)"
    GATE_K6_OK=1
  fi
}

check_high_issues() {
  # The operator-reported issues count lives in the SOFT_LAUNCH_BETA_2026.md
  # report under "Issues reported". Without an external tracker we read it
  # from a known location in the report file. Skip if file absent.
  local report
  report="$(cd "$(dirname "$0")/../.." && pwd)/docs/SOFT_LAUNCH_BETA_2026.md"
  if [ ! -f "$report" ]; then
    echo "⚪ HIGH issues count : report file not found"
    GATE_ISSUES_OK=2
    return
  fi
  local count
  # grep -c always prints the count itself (0 on no match) ; '|| true'
  # only swallows the non-zero exit grep uses to signal "no matches".
  count=$(grep -cE '^\| HIGH ' "$report" 2>/dev/null || true)
  count=${count:-0}
  if [ "$count" -lt 3 ]; then
    echo "✅ HIGH-severity issues reported : $count / < 3"
    GATE_ISSUES_OK=0
  else
    echo "🔴 HIGH-severity issues reported : $count / < 3"
    GATE_ISSUES_OK=1
  fi
}

# ----------------------------------------------------------------------
# Main loop
# ----------------------------------------------------------------------

run_once() {
  if [ "$QUIET" != "1" ]; then
    echo "================================================================"
    echo "Acceptance gate check — $(date -u +'%Y-%m-%d %H:%M:%S UTC')"
    echo "----------------------------------------------------------------"
  fi

  check_testers
  check_sentry_p1
  check_statuspage
  check_synthetic
  check_k6_nightly
  check_high_issues

  if [ "$QUIET" != "1" ]; then
    echo "----------------------------------------------------------------"
  fi

  local red=0 unknown=0
  for v in "$GATE_TESTERS_OK" "$GATE_SENTRY_OK" "$GATE_STATUSPAGE_OK" \
           "$GATE_SYNTHETIC_OK" "$GATE_K6_OK" "$GATE_ISSUES_OK"; do
    case $v in
      1) red=$(( red + 1 )) ;;
      2) unknown=$(( unknown + 1 )) ;;
    esac
  done

  if [ "$red" -eq 0 ] && [ "$unknown" -eq 0 ]; then
    echo "VERDICT : ALL GATES GREEN — soft-launch is GO"
    return 0
  elif [ "$red" -gt 0 ]; then
    echo "VERDICT : $red gate(s) RED — NO-GO until resolved"
    return 1
  else
    echo "VERDICT : $unknown gate(s) UNCHECKABLE — operator must verify manually before decision call"
    return 2
  fi
}

if [ "$LOOP" != "1" ]; then
  run_once
  exit $?
fi

# Continuous mode.
while true; do
  run_once || true
  echo ""
  echo "next check in ${CHECK_INTERVAL}s — Ctrl-C to exit"
  sleep "$CHECK_INTERVAL"
done
scripts/soft-launch/send-invitations.sh
Executable file
179
scripts/soft-launch/send-invitations.sh
Executable file
|
|
@ -0,0 +1,179 @@
|
#!/usr/bin/env bash
# send-invitations.sh — batch-insert beta invitations from a validated
# cohort CSV, generate unique invite codes, render personalised email
# bodies, optionally dispatch via SMTP.
#
# Wraps the validate-cohort.sh sanity check + a transactional INSERT
# into beta_invites + a per-recipient email render. Splits "generate
# the codes + render the emails" from "actually send" so a dry-run
# produces a flat directory of `.eml` files the operator can review
# before dispatch.
#
# v1.0.10 Cluster 3.4.
#
# Usage :
#   # Step 1 : dry-run (default). Inserts beta_invites rows, emits
#   # eml files but does NOT send anything.
#   DATABASE_URL=postgres://... \
#   bash scripts/soft-launch/send-invitations.sh path/to/cohort.csv
#
#   # Step 2 : after reviewing the eml files, dispatch with msmtp /
#   # sendmail / aws-ses-cli (or whatever SEND_CMD points at).
#   SEND=1 SEND_CMD='msmtp -t' \
#   bash scripts/soft-launch/send-invitations.sh path/to/cohort.csv
#
# Required env :
#   DATABASE_URL   Postgres URL (read+write to beta_invites)
#   FRONTEND_URL   base URL the invite link points at
#                  (e.g. https://staging.veza.fr)
#
# Optional env :
#   SEND=1         actually dispatch ; otherwise dry-run (eml only)
#   SEND_CMD       sendmail-compatible command (default: 'msmtp -t')
#   SENT_BY_EMAIL  operator email for the beta_invites.sent_by FK ;
#                  defaults to the value in the CSV's third column
#   FROM_ADDR      From: header (default: invitations@veza.fr)
#   SUBJECT        email subject (default: 'Vous êtes admis dans la bêta Veza')
#   TEMPLATE       path to eml template (default:
#                  templates/email/beta_invite.eml.template)
#   FORCE=1        skip validate-cohort.sh failures (use with care)
#
# Exit codes :
#   0 — everything succeeded
#   1 — cohort validation failed (see validate-cohort.sh)
#   2 — DB transaction failed
#   3 — required env missing
#   4 — dispatch failed for at least one recipient (see logs)
set -euo pipefail

# Script lives in scripts/soft-launch/, so the repo root is two levels up.
REPO_ROOT="$(cd "$(dirname "$0")/../.." && pwd)"

CSV=${1:-}
if [ -z "$CSV" ] || [ ! -f "$CSV" ]; then
  echo "usage: bash scripts/soft-launch/send-invitations.sh path/to/cohort.csv" >&2
  exit 3
fi

DATABASE_URL=${DATABASE_URL:-?}
FRONTEND_URL=${FRONTEND_URL:-?}
[ "$DATABASE_URL" = "?" ] && { echo "DATABASE_URL required" >&2; exit 3; }
[ "$FRONTEND_URL" = "?" ] && { echo "FRONTEND_URL required" >&2; exit 3; }

SEND=${SEND:-0}
SEND_CMD=${SEND_CMD:-msmtp -t}
FROM_ADDR=${FROM_ADDR:-invitations@veza.fr}
SUBJECT=${SUBJECT:-Vous êtes admis dans la bêta Veza}
TEMPLATE=${TEMPLATE:-$REPO_ROOT/templates/email/beta_invite.eml.template}
FORCE=${FORCE:-0}
SESSION_DATE="$(date +%Y%m%d-%H%M)"
OUTDIR="$REPO_ROOT/scripts/soft-launch/out-${SESSION_DATE}"

command -v psql >/dev/null 2>&1 || { echo "psql required" >&2; exit 3; }
command -v openssl >/dev/null 2>&1 || { echo "openssl required" >&2; exit 3; }

# Step 1 — validate the cohort. Bypass with FORCE=1 if needed.
echo "→ validating cohort $CSV"
if ! bash "$(dirname "$0")/validate-cohort.sh" "$CSV"; then
  if [ "$FORCE" != "1" ]; then
    echo "ERROR: cohort validation failed. Re-run with FORCE=1 to bypass (not recommended)." >&2
    exit 1
  fi
  echo "WARN : cohort validation reported issues but FORCE=1 set — proceeding."
fi

mkdir -p "$OUTDIR"
echo "→ output dir $OUTDIR"

# Step 2 — generate codes + insert rows + render emails. Each insert
# is one transaction so a partial failure leaves consistent state.
gen_code() {
  # 16-char, base32-ish, paste-friendly code. tr truncates its second
  # set to the first set's length, so hex a-f map to themselves and the
  # digits 0-9 map to g-p ; the second tr then strips the look-alike
  # glyphs that survive (i/l/o — 0, 1, I and L cannot occur after the
  # remap).
  openssl rand -hex 16 | tr 'a-f0-9' 'a-z2-9' \
    | tr -d 'oilOIL01' | head -c 16
}

if [ ! -f "$TEMPLATE" ]; then
  echo "ERROR: template $TEMPLATE not found." >&2
  exit 3
fi

inserted=0
failed=0
failed_emails=()

while IFS=, read -r email cohort sent_by_email; do
  email=$(echo "$email" | tr -d '\r' | xargs)
  cohort=$(echo "$cohort" | tr -d '\r' | xargs)
  sent_by_email=$(echo "$sent_by_email" | tr -d '\r' | xargs)

  code=$(gen_code)

  # Resolve sent_by user_id (may be NULL if operator email isn't a
  # registered user — e.g. ops shared mailbox).
  sent_by_id=$(psql "$DATABASE_URL" -A -t -c "
    SELECT id::text FROM users WHERE email = '$sent_by_email' LIMIT 1;
  " 2>/dev/null | tr -d ' ' || echo "")

  if [ -z "$sent_by_id" ]; then
    sent_by_clause="NULL"
  else
    sent_by_clause="'$sent_by_id'"
  fi

  # NB : values are interpolated straight into SQL ; acceptable only
  # because validate-cohort.sh has already constrained the email shape.
  if ! psql "$DATABASE_URL" -1 -c "
    INSERT INTO beta_invites (code, email, cohort, sent_by, expires_at)
    VALUES ('$code', '$email', '$cohort', $sent_by_clause, NOW() + INTERVAL '30 days');
  " >/dev/null 2>&1; then
    failed=$(( failed + 1 ))
    failed_emails+=("$email")
    continue
  fi
  inserted=$(( inserted + 1 ))

  # Render the eml — operator-readable, ready for SEND_CMD.
  eml="$OUTDIR/${email//[^a-zA-Z0-9._-]/_}.eml"
  invite_url="$FRONTEND_URL/signup?invite=$code"
  sed \
    -e "s|{{TO_ADDR}}|$email|g" \
    -e "s|{{FROM_ADDR}}|$FROM_ADDR|g" \
    -e "s|{{SUBJECT}}|$SUBJECT|g" \
    -e "s|{{INVITE_URL}}|$invite_url|g" \
    -e "s|{{INVITE_CODE}}|$code|g" \
    -e "s|{{COHORT}}|$cohort|g" \
    -e "s|{{FRONTEND_URL}}|$FRONTEND_URL|g" \
    "$TEMPLATE" > "$eml"
done < <(tail -n +2 "$CSV")

echo "→ inserted $inserted invitations into beta_invites"
echo "→ rendered $inserted emails to $OUTDIR"
[ "$failed" -gt 0 ] && {
  echo "WARN : $failed insert(s) failed — see logs above"
  for e in "${failed_emails[@]}"; do echo "  - $e"; done
}

# Step 3 — optionally dispatch.
if [ "$SEND" != "1" ]; then
  echo ""
  echo "DRY-RUN — review the eml files in $OUTDIR before sending."
  echo "When ready :"
  echo "  SEND=1 SEND_CMD='$SEND_CMD' bash scripts/soft-launch/send-invitations.sh $CSV"
  exit 0
fi

echo "→ dispatching via : $SEND_CMD"
dispatch_failed=0
for eml in "$OUTDIR"/*.eml; do
  if ! $SEND_CMD < "$eml" >>"$OUTDIR/dispatch.log" 2>&1; then
    dispatch_failed=$(( dispatch_failed + 1 ))
    echo "  FAIL : $eml" | tee -a "$OUTDIR/dispatch.log"
  fi
done

echo ""
if [ "$dispatch_failed" -gt 0 ]; then
  echo "WARN : $dispatch_failed dispatch(es) failed — see $OUTDIR/dispatch.log"
  exit 4
fi
echo "PASS : all $inserted invitations dispatched."
echo "Track redemption with :"
echo "  psql \"\$DATABASE_URL\" -c 'SELECT cohort, count(*) FILTER (WHERE used_at IS NOT NULL) AS redeemed, count(*) AS total FROM beta_invites GROUP BY cohort ORDER BY cohort;'"
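The sed render above expects an RFC 822-style template carrying the listed `{{...}}` placeholders. A minimal illustrative template and the same substitution pass (body text, recipient and code are made up; the placeholder names are the script's):

```shell
template='To: {{TO_ADDR}}
From: {{FROM_ADDR}}
Subject: {{SUBJECT}}

Your invite code : {{INVITE_CODE}}
Sign up here : {{INVITE_URL}}'

# Render it the same way the script does and check the headers landed.
rendered=$(printf '%s' "$template" \
  | sed -e "s|{{TO_ADDR}}|alice@example.com|g" \
        -e "s|{{FROM_ADDR}}|invitations@veza.fr|g" \
        -e "s|{{SUBJECT}}|Veza beta|g" \
        -e "s|{{INVITE_CODE}}|abcd2345efgh6789|g" \
        -e "s|{{INVITE_URL}}|https://staging.veza.fr/signup?invite=abcd2345efgh6789|g")
printf '%s\n' "$rendered" | head -1   # → To: alice@example.com
```

The `|` sed delimiter is what lets URLs pass through unescaped; it would break only if a substituted value itself contained a `|`.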
scripts/soft-launch/validate-cohort.sh — 173 lines (new executable file)

@@ -0,0 +1,173 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
# validate-cohort.sh — sanity-check a soft-launch beta cohort CSV
|
||||||
|
# before it gets fed to send-invitations.sh.
|
||||||
|
#
|
||||||
|
# The CSV is the operator's curated list of beta-tester emails +
|
||||||
|
# segmentation. This script catches the avoidable mistakes BEFORE
|
||||||
|
# we batch-insert 100 rows into beta_invites and start spraying
|
||||||
|
# emails :
|
||||||
|
#
|
||||||
|
# - Empty file or wrong header
|
||||||
|
# - Duplicate emails (would create 2 invites for the same person)
|
||||||
|
# - Malformed emails (missing @, leading/trailing whitespace)
|
||||||
|
# - Cohort distribution looks off (no creators, only one segment,
|
||||||
|
# under-50 total — soft-launch acceptance gate is ≥50 testers)
|
||||||
|
# - Email collisions with existing users (already registered = the
|
||||||
|
# invite code is wasted)
|
||||||
|
#
|
||||||
|
# v1.0.10 Cluster 3.4.
|
||||||
|
#
|
||||||
|
# Usage :
|
||||||
|
# bash scripts/soft-launch/validate-cohort.sh path/to/cohort.csv
|
||||||
|
#
|
||||||
|
# Optional env :
|
||||||
|
# DATABASE_URL if set, also checks for collisions with the users
|
||||||
|
# table (email already registered → flagged but not
|
||||||
|
# fatal — operator may want to invite an existing
|
||||||
|
# user back to test the new flows).
|
||||||
|
# MIN_COHORT minimum total rows required (default 50, matches the
|
||||||
|
# acceptance-gate threshold in SOFT_LAUNCH_BETA_2026.md).
|
||||||
|
# MIN_CREATORS minimum number of creator-* cohort rows (default 5).
|
||||||
|
#
|
||||||
|
# Exit codes :
|
||||||
|
# 0 — cohort valid
|
||||||
|
# 1 — cohort malformed (will block send-invitations.sh)
|
||||||
|
# 2 — cohort merely warns (size below minimum, missing collision
|
||||||
|
# check) ; operator may proceed with --force
|
||||||
|
set -euo pipefail

CSV=${1:-}
if [ -z "$CSV" ] || [ ! -f "$CSV" ]; then
  cat >&2 <<EOF
usage: bash scripts/soft-launch/validate-cohort.sh path/to/cohort.csv

CSV format (header required):
  email,cohort,sent_by_email
  alice@example.com,creator-vinyl,ops@veza.fr
  bob@example.com,listener-jazz,ops@veza.fr
  ...

cohort labels are free-text but should follow the convention
<role>-<segment> so the post-launch attribution report groups cleanly.
EOF
  exit 1
fi

MIN_COHORT=${MIN_COHORT:-50}
MIN_CREATORS=${MIN_CREATORS:-5}

# 1. Header check.
header=$(head -1 "$CSV" | tr -d '\r')
if [ "$header" != "email,cohort,sent_by_email" ]; then
  echo "ERROR: header line must be exactly 'email,cohort,sent_by_email' (got: $header)" >&2
  exit 1
fi

# 2. Row count + duplicates + email shape (single pass over the rows).
total=0
malformed=0
duplicates=0
declare -A seen
declare -A cohort_count
declare -a malformed_lines

# The `|| [ -n "$email" ]` keeps the last row even when the file has
# no trailing newline.
while IFS=, read -r email cohort sent_by_email || [ -n "$email" ]; do
  email=$(echo "$email" | tr -d '\r' | xargs)
  cohort=$(echo "$cohort" | tr -d '\r' | xargs)

  total=$(( total + 1 ))

  # Email shape: one @, a dotted domain, no whitespace.
  if [[ ! "$email" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
    malformed=$(( malformed + 1 ))
    malformed_lines+=("  line $(( total + 1 )): invalid email '$email'")
    continue
  fi

  # Duplicate detection.
  if [ -n "${seen[$email]:-}" ]; then
    duplicates=$(( duplicates + 1 ))
    malformed_lines+=("  line $(( total + 1 )): duplicate email '$email' (first seen at line ${seen[$email]})")
    continue
  fi
  seen[$email]=$(( total + 1 ))

  # Cohort tally.
  cohort_count[$cohort]=$(( ${cohort_count[$cohort]:-0} + 1 ))
done < <(tail -n +2 "$CSV")

echo "----------------------------------------------------------------"
echo "Cohort validation report"
echo "----------------------------------------------------------------"
echo "  CSV file       : $CSV"
echo "  Total rows     : $total"
echo "  Unique emails  : ${#seen[@]}"
echo "  Malformed rows : $malformed"
echo "  Duplicates     : $duplicates"
echo ""
echo "Distribution by cohort:"
for c in "${!cohort_count[@]}"; do
  printf "  %-40s %d\n" "$c" "${cohort_count[$c]}"
done | sort
echo ""
exit_code=0

# 3. Hard checks (block send).
if [ "$malformed" -gt 0 ] || [ "$duplicates" -gt 0 ]; then
  echo "ERROR: $malformed malformed + $duplicates duplicate row(s) — fix before sending."
  for line in "${malformed_lines[@]}"; do
    echo "$line"
  done
  exit 1
fi

# 4. Soft checks (warn, don't block — operator decides).
if [ "$total" -lt "$MIN_COHORT" ]; then
  echo "WARN: cohort has $total rows; soft-launch acceptance gate is ≥ $MIN_COHORT."
  exit_code=2
fi

creator_total=0
for c in "${!cohort_count[@]}"; do
  if [[ "$c" == creator-* ]]; then
    creator_total=$(( creator_total + cohort_count[$c] ))
  fi
done
if [ "$creator_total" -lt "$MIN_CREATORS" ]; then
  echo "WARN: only $creator_total creator-* cohort rows; goal is ≥ $MIN_CREATORS for upload-flow coverage."
  exit_code=2
fi

if [ "${#cohort_count[@]}" -lt 3 ]; then
  echo "WARN: only ${#cohort_count[@]} distinct cohort labels — feedback will be narrow."
  exit_code=2
fi

# 5. Optional: DATABASE_URL collision check.
if [ -n "${DATABASE_URL:-}" ]; then
  if ! command -v psql >/dev/null 2>&1; then
    echo "WARN: DATABASE_URL set but psql not on \$PATH; skipping collision check."
    exit_code=2
  else
    emails_csv=$(printf '%s,' "${!seen[@]}" | sed 's/,$//')
    collisions=$(psql "$DATABASE_URL" -A -t -c "
      SELECT count(*) FROM users WHERE email = ANY(string_to_array('$emails_csv', ','));
    " 2>/dev/null | tr -d ' ' || echo "?")
    if [ "$collisions" = "?" ]; then
      echo "WARN: couldn't query users table (psql connection issue); skipping collision check."
      exit_code=2
    elif [ "$collisions" -gt 0 ]; then
      echo "INFO: $collisions email(s) in the cohort already exist in the users table — invite codes will be wasted on existing accounts."
      exit_code=2
    fi
  fi
fi

echo ""
case $exit_code in
  0) echo "PASS: cohort valid, ready for send-invitations.sh." ;;
  2) echo "WARN: cohort valid but with caveats — review and re-run with --force from send-invitations.sh if intentional." ;;
esac
exit $exit_code
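Taken on its own, the shape-and-duplicate pass of validate-cohort.sh is easy to smoke-test against a throwaway CSV before touching the real cohort. A minimal sketch, assuming bash 4+; the sample rows and temp file are illustrative, not real cohort data:

```shell
# Smoke test of the email-shape + duplicate logic: one valid row,
# one malformed email (no dotted domain), one duplicate.
csv=$(mktemp)
cat > "$csv" <<'EOF'
email,cohort,sent_by_email
alice@example.com,creator-vinyl,ops@veza.fr
bob@example,listener-jazz,ops@veza.fr
alice@example.com,creator-vinyl,ops@veza.fr
EOF

total=0; malformed=0; duplicates=0
declare -A seen
while IFS=, read -r email cohort _ || [ -n "$email" ]; do
  total=$(( total + 1 ))
  if [[ ! "$email" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
    malformed=$(( malformed + 1 )); continue
  fi
  if [ -n "${seen[$email]:-}" ]; then
    duplicates=$(( duplicates + 1 )); continue
  fi
  seen[$email]=$total
done < <(tail -n +2 "$csv")
rm -f "$csv"

echo "total=$total malformed=$malformed duplicates=$duplicates"
# prints: total=3 malformed=1 duplicates=1
```

Seeding the file with known-bad rows confirms the regex rejects a TLD-less address and that the duplicate counter fires on the repeat, which is exactly the behaviour the hard checks gate on.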
65 veza-backend-api/migrations/990_beta_invites.sql Normal file

@@ -0,0 +1,65 @@
-- 990_beta_invites.sql
-- v1.0.10 polish (Cluster 3.4) — soft-launch beta cohort tracking.
--
-- Records each individual invitation sent for the v2.0.0 soft-launch
-- beta. Tracks (a) the invite code used in the registration link,
-- (b) when the recipient redeemed it (NULL until redemption), and
-- (c) which cohort segment (creator / listener / community-member /
-- press) the recipient belongs to so the post-launch report can
-- attribute feedback by audience.
--
-- The associated email template + send script live at
-- scripts/soft-launch/send-invitations.sh and reference this table
-- via INSERT … RETURNING code.
--
-- Privacy: the email column is the only PII here; no behavioural
-- data is stored. used_at is the redemption signal. After v2.0.0
-- public launch, run the cleanup migration in 991 (TBD) to anonymise
-- the email column for invites that haven't been redeemed in 30+ days.

CREATE TABLE IF NOT EXISTS public.beta_invites (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    -- The invite code is what the recipient pastes into the signup
    -- form. 16 random characters from a base32 alphabet (no 0/1/I/L
    -- to avoid eyestrain). Generated by send-invitations.sh.
    code VARCHAR(32) NOT NULL UNIQUE,
    email VARCHAR(320) NOT NULL,
    -- Free-text label so the cohort generator can carry whatever
    -- segmentation the operator wants (e.g. "creator-vinyl-pressing",
    -- "listener-jazz-mailing-list", "press-pitchfork"). Index below
    -- is for the post-launch report grouping.
    cohort VARCHAR(64) NOT NULL,
    -- NULL until the recipient signs up. Set by the auth handler
    -- when /auth/register is hit with a valid invite code.
    used_at TIMESTAMPTZ,
    -- Hard expiry so unredeemed invites can't accumulate forever.
    -- Default 30 days from creation; soft-launch is a short window.
    expires_at TIMESTAMPTZ NOT NULL DEFAULT (NOW() + INTERVAL '30 days'),
    -- Operator who sent the invite — useful when reconciling "who
    -- gave their friend a code" during the audit.
    sent_by UUID REFERENCES public.users(id) ON DELETE SET NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

COMMENT ON TABLE public.beta_invites IS
    'v2.0.0 soft-launch beta invitation tracking. v1.0.10 Cluster 3.4.';
COMMENT ON COLUMN public.beta_invites.code IS
    '16-char base32 invite code (no 0/1/I/L). Pasted into signup form.';
COMMENT ON COLUMN public.beta_invites.cohort IS
    'Free-text cohort label (creator-* / listener-* / press-* / etc.).';
COMMENT ON COLUMN public.beta_invites.used_at IS
    'Redemption timestamp. NULL means the invite is still pending.';

-- Lookup by code (signup path) — every /auth/register call reads it.
-- (Note: the UNIQUE constraint on code already creates a backing
-- index, so this is redundant; kept for explicitness.)
CREATE UNIQUE INDEX IF NOT EXISTS idx_beta_invites_code
    ON public.beta_invites(code);

-- Cohort grouping for the post-launch attribution query.
CREATE INDEX IF NOT EXISTS idx_beta_invites_cohort
    ON public.beta_invites(cohort);

-- Pending-invitations sweep — cron job that expires unused invites
-- after expires_at. Partial index keeps it small.
CREATE INDEX IF NOT EXISTS idx_beta_invites_pending_expiry
    ON public.beta_invites(expires_at)
    WHERE used_at IS NULL;
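The migration's column comment pins down the invite-code format (16 characters from a base32 alphabet with 0/1/I/L removed), but the generator itself lives in send-invitations.sh, which is not part of this diff. A plausible sketch, assuming bash; `gen_invite_code` is a made-up name, and `$RANDOM` is used only for illustration (a real send script should draw from `openssl rand` or /dev/urandom, since `$RANDOM` is not cryptographically secure):

```shell
# Hypothetical generator matching the beta_invites.code comment:
# 16 chars, 32-symbol alphabet (A-Z minus I/L, digits 2-9).
gen_invite_code() {
  local alphabet='ABCDEFGHJKMNOPQRSTUVWXYZ23456789'
  local code='' i
  for (( i = 0; i < 16; i++ )); do
    # Pick one random symbol per position.
    code+=${alphabet:$(( RANDOM % ${#alphabet} )):1}
  done
  printf '%s\n' "$code"
}

code=$(gen_invite_code)
echo "$code"
```

Dropping 0/1/I/L from the alphabet leaves exactly 32 symbols, so each character still carries 5 bits and a 16-char code gives 80 bits of keyspace, comfortably enough for a 100-person cohort.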
92 veza-backend-api/templates/email/beta_invite.eml.template Normal file

@@ -0,0 +1,92 @@
To: {{TO_ADDR}}
From: Veza <{{FROM_ADDR}}>
Subject: {{SUBJECT}}
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="--veza-beta-boundary"

----veza-beta-boundary
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Bonjour,

Vous êtes invité·e à rejoindre la bêta privée de Veza —
une plateforme de streaming musical éthique faite pour les
créateur·ices et les auditeur·ices, sans algorithme de
recommandation comportementale, sans gamification, sans dark
patterns.

Votre code d'invitation : {{INVITE_CODE}}

Pour vous inscrire :
{{INVITE_URL}}

Le code expire dans 30 jours.

Pendant la bêta, l'idée est simple : utilisez Veza comme vous
utiliseriez n'importe quelle plateforme musicale. Uploadez,
écoutez, partagez, achetez. Quand quelque chose vous frustre
ou vous étonne — en bien comme en mal — dites-le. Le canal
de retour vous sera communiqué après l'inscription.

Cohorte : {{COHORT}}
(C'est juste un tag interne pour qu'on regroupe les retours
par contexte d'usage. Ça n'affecte ni votre expérience ni vos
permissions.)

À très vite,
L'équipe Veza

--
Si vous n'avez pas demandé cette invitation, ignorez ce
message. Le code expirera automatiquement après 30 jours.

----veza-beta-boundary
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: 8bit

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Invitation à la bêta Veza</title>
</head>
<body style="font-family: Georgia, 'Times New Roman', serif; line-height: 1.6; color: #1a1a1e; margin: 0; padding: 0; background-color: #f8f7f4;">
<div style="max-width: 600px; margin: 20px auto; padding: 30px; background-color: #ffffff; border: 1px solid #e8e6e0;">
<h1 style="font-weight: 400; color: #1a1a1e; margin-top: 0; font-size: 28px;">Bienvenue dans la bêta Veza.</h1>
<p>Bonjour,</p>
<p>Vous êtes invité·e à rejoindre la <strong>bêta privée</strong> de Veza — une plateforme de streaming musical éthique faite pour les créateur·ices et les auditeur·ices, sans algorithme de recommandation comportementale, sans gamification, sans dark patterns.</p>

<div style="text-align: center; margin: 35px 0;">
<a href="{{INVITE_URL}}" style="background-color: #1a1a1e; color: #f8f7f4; padding: 14px 32px; text-decoration: none; display: inline-block; font-weight: 400; letter-spacing: 0.05em;">
Activer mon invitation
</a>
</div>

<p style="color: #555; font-size: 14px;">Ou collez ce lien dans votre navigateur :</p>
<p style="word-break: break-all; color: #888; background-color: #f8f7f4; padding: 10px; font-family: 'Courier New', monospace; font-size: 12px; border-left: 2px solid #d4a574;">{{INVITE_URL}}</p>

<p style="color: #555; font-size: 14px; margin-top: 25px;">Code d'invitation :</p>
<p style="font-family: 'Courier New', monospace; font-size: 18px; letter-spacing: 0.1em; background-color: #f8f7f4; padding: 12px; text-align: center; color: #1a1a1e;">{{INVITE_CODE}}</p>

<hr style="border: none; border-top: 1px solid #e8e6e0; margin: 30px 0;">

<p style="font-size: 14px; color: #555;">Pendant la bêta, l'idée est simple : utilisez Veza comme vous utiliseriez n'importe quelle plateforme musicale. Uploadez, écoutez, partagez, achetez. Quand quelque chose vous frustre ou vous étonne — en bien comme en mal — dites-le. Le canal de retour vous sera communiqué après l'inscription.</p>

<p style="font-size: 13px; color: #888; margin-top: 25px;">Cohorte : <strong>{{COHORT}}</strong> — c'est juste un tag interne pour qu'on regroupe les retours par contexte d'usage.</p>

<p style="margin-top: 30px; color: #888; font-size: 12px;">
Le code expire dans 30 jours. Si vous n'avez pas demandé cette invitation, ignorez ce message.
</p>

<hr style="border: none; border-top: 1px solid #e8e6e0; margin: 25px 0;">
<p style="color: #aaa; font-size: 11px; text-align: center; font-family: 'Courier New', monospace; letter-spacing: 0.1em;">
VEZA · v2.0.0 BETA · {{FRONTEND_URL}}
</p>
</div>
</body>
</html>

----veza-beta-boundary--
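Since the template is plain text with {{PLACEHOLDER}} tokens, the rendering step in the send script reduces to a substitution pass. A minimal sed-based sketch, assuming a trimmed-down inline template; the real send-invitations.sh is not shown in this diff, and every value below is an example:

```shell
# Hypothetical template-rendering pass: substitute {{...}} tokens
# with per-recipient values. Trimmed template for illustration.
tmpl=$(mktemp)
cat > "$tmpl" <<'EOF'
To: {{TO_ADDR}}
Subject: {{SUBJECT}}
Code: {{INVITE_CODE}}
EOF

rendered=$(sed \
  -e 's/{{TO_ADDR}}/alice@example.com/' \
  -e 's/{{SUBJECT}}/Invitation beta Veza/' \
  -e 's/{{INVITE_CODE}}/ABCD2345EFGH6789/' \
  "$tmpl")
rm -f "$tmpl"
echo "$rendered"
```

A rendered message should contain no leftover `{{` sequences; checking for that after substitution is a cheap guard against a template/variable drift between this file and the send script.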
Reference in a new issue