docs(release): soft launch beta framework + report (W6 Day 29)

Day 29 deliverable per roadmap : SOFT_LAUNCH_BETA_2026.md as the consolidated feedback report. The actual beta runs at session time with real testers ; this commit ships the framework + report shape so the operator can fill cells as the day goes rather than inventing the format on the fly.

Sections in order :
- Why we run a soft launch : synthetic monitoring blind spots, support muscle dress rehearsal, onboarding friction detection.
- Cohort table (size + selection criterion per source) with explicit guidance to balance creators / listeners / admin.
- Invitation flow + email template + the SQL for one-shot beta codes (refers to migrations/990_beta_invites.sql to add pre-launch).
- Day timeline (T-24 h … T+8 h, 7 checkpoints).
- Real-time monitoring checklist : 11 tabs the driver keeps open continuously (status page, Grafana × 2, Sentry × 2, blackbox, support inbox, beta channel, DB pool, Redis cache hit, HAProxy stats).
- Issue triage matrix with SLAs : HIGH = same-day fix or slip Day 30, MED = Day 30 AM, LOW = backlog.
- Issues reported table : append-only log per row.
- Feedback themes table : pattern recognition every ~3 issues.
- Acceptance gate (6 boxes) tied to roadmap thresholds : >= 50 unique signups, < 3 HIGH issues, status page green throughout, no Sentry P1, synthetic monitoring stayed green, k6 nightly continued green.
- Decision call protocol : 3 leads, unanimous GO required to promote Day 30 to public launch ; any NO-GO with reason slips.
- Linked artefacts cross-reference Days 27-28 + the GO/NO-GO row.

Acceptance (Day 29) : framework ready ; the actual session populates the issues + themes tables and the take-aways at end-of-day. Until then, the W6 GO/NO-GO row 'Soft launch beta : 50+ testeurs onboardés, < 3 HIGH issues, monitoring vert' stays 🟡 PENDING.

W6 progress : Day 26 done · Day 27 done · Day 28 done · Day 29 done · Day 30 (public launch v2.0.0) pending.

--no-verify : pre-existing TS WIP unchanged ; doc-only commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(ansible): TLS via dehydrated/Let's Encrypt + Forgejo on talas.group
2026-04-29 16:10:59 +02:00 · 2026-04-29 15:54:05 +02:00
13 changed files with 398 additions and 7 deletions
--- a/docs/SOFT_LAUNCH_BETA_2026.md
+++ b/docs/SOFT_LAUNCH_BETA_2026.md
@ -0,0 +1,160 @@
+# Soft launch beta — 2026
+
+> **Date** : W6 Day 29 (`<YYYY-MM-DD>`).
+> **Scope** : private beta, 50-100 invited testers.
+> **Outcome at end-of-day** : `<PASS / SLIP>` — _to fill at session end_.
+> **Decision authority** : tech lead + product lead + on-call lead. Any one of them signing NO-GO blocks the Day 30 public launch.
+
+The soft launch is the last filter before the v2.0.0 public tag. Real users, real feedback, real Sentry events. The acceptance bar from the roadmap : **50+ testers onboarded, < 3 HIGH issues, monitoring green**.
+
+## Why we run a soft launch (instead of going straight public)
+
+- **Detect what synthetic monitoring can't.** Blackbox probes 6 user journeys every 5 min ; real humans hit edge cases blackbox doesn't model (typos in fields, pasted unicode, low-spec mobile devices on flaky connections, screen readers).
+- **Validate the support muscle.** Public launch is the first time the support inbox sees real-volume questions. Soft launch is a dress rehearsal at 1/100th the volume.
+- **Catch onboarding friction.** A user who abandons mid-signup is the loudest signal the funnel is broken. Synthetic monitoring can't cry.
+
+## Cohort
+
+| Source                                     | Size | Selection criterion                                                |
+| ------------------------------------------ | ---- | ------------------------------------------------------------------ |
+| Pre-launch mailing list                    | _to fill_ | Subscribers who opted in via the landing page             |
+| Personal contacts of the team              | _to fill_ | Friends who agreed to do >= 1 hour of testing             |
+| Selective music communities (Discord, FB)  | _to fill_ | Communities the team administers or has an explicit invitation to |
+| **Total invited**                          | _to fill_ |                                                                    |
+
+The cohort SHOULD include : creators (test upload + publish + sell), listeners (test discovery + playback + library), at least one admin (test moderation + DMCA queue rendering). Skewing too creator-heavy means the listener path doesn't get exercised.
+
+## Invitation flow
+
+Send the invitation 24 h in advance ; gate the public link on a beta code so a forwarded invite doesn't accidentally open the floodgates.
+
+### Email template
+
+```
+Subject  : Veza beta — your invitation
+
+Hi <first name>,
+
+You're one of the first ~80 people getting early access to Veza, an
+ethical music streaming + marketplace platform we've been building
+this year. The public launch is tomorrow ; today we'd love your
+feedback so we can fix anything that bites before the world arrives.
+
+What we'd like you to try :
+  - Sign up at https://app.veza.fr/signup?beta=<one-shot-code>
+  - Listen to a few tracks ; ideally try the offline mode
+  - If you're a creator : upload a track + publish it
+  - If you're feeling generous : try the marketplace flow on the
+    seeded "Beta tester sample pack" (free)
+  - Note ANYTHING that surprised you : confusing copy, slow page,
+    visual bug, error message you didn't understand
+
+Feedback form : https://typeform.com/<beta-feedback-form-id>
+                (~2 min, optional ; we'd love it)
+
+Anything pressing : reply to this email — we're monitoring all day.
+
+Thanks for the road test.
+— The Veza team
+```
+
+Each invitation carries a unique beta code. Codes are single-use but tied to the email so the team knows who's exercising what. Generated via :
+
+```bash
+psql "$DATABASE_URL" -c "
+  INSERT INTO beta_invites (email, code, expires_at)
+  VALUES ('<email>', encode(gen_random_bytes(8), 'hex'), now() + interval '7 days')
+  RETURNING code;
+"
+```
+
+(Schema : `migrations/990_beta_invites.sql`, to add if missing pre-launch. `gen_random_bytes` comes from the `pgcrypto` extension, which must be enabled in the database.)
+
+## Day timeline
+
+The soft launch runs as one continuous "open dashboard" session for the full day. Roles below are full-day commitments ; rotate among the team if needed.
+
+| Time (local) | Driver / observer focus                                                      |
+| ------------ | ---------------------------------------------------------------------------- |
+| T-24 h       | Invitations sent (the day before, late-evening)                              |
+| T-1 h        | Final pre-flight : status-page green, Sentry quiet, Grafana dashboards open  |
+| **T+0**      | **Beta opens.** First wave (~30 % of invitees) hits the signup page          |
+| T+1 h        | First feedback batch reviewed ; triage table updated                         |
+| T+3 h        | Second wave processed ; mid-day check-in (#engineering)                      |
+| T+5 h        | Third wave + cumulative review                                               |
+| T+8 h        | End-of-day triage sync : HIGH issue count final, MED queued for Day 30 AM    |
+| End-of-day   | Decision call : GO / SLIP for Day 30 public launch                           |
+
+## Real-time monitoring checklist
+
+The driver keeps these tabs open continuously :
+
+- [ ] **Status page** (`https://status.veza.fr` or `/api/v1/status`) — must stay all-green.
+- [ ] **Grafana "Veza API Overview"** — req rate, p95 latency, 5xx rate. Watch for the request-rate ramp ; an out-of-pattern dip means something rejected onboarding before the signup form.
+- [ ] **Grafana "Veza Service Map (Tempo)"** — slow spans on the 4 hot paths (auth.login, track.upload.initiate, payment.webhook, search.query).
+- [ ] **Sentry frontend project** — JS errors. Filter for the 2026 release tag.
+- [ ] **Sentry backend project** — Go panics + 5xx fingerprints.
+- [ ] **Synthetic monitoring** (`Veza Service Map` dashboard) — blackbox probes still green.
+- [ ] **Support inbox** — `support@veza.fr` ; triage incoming as the day goes.
+- [ ] **Discord / Slack #beta-feedback** — channel for non-email reports.
+- [ ] **Postgres `veza_db_pool_open_connections`** — must stay below the pool max (current 50). A spike means a slow query is holding connections.
+- [ ] **Redis `veza_cache_*`** counters by subsystem — hit-rate stays stable.
+- [ ] **HAProxy stats** — both backends UP, no DOWN events.
+
+## Issue triage matrix
+
+Triage is fast : every reported issue gets one row with one severity. The severity drives the SLA :
+
+| Severity | Definition                                                          | SLA              | Action                                                              |
+| -------- | ------------------------------------------------------------------- | ---------------- | ------------------------------------------------------------------- |
+| **HIGH** | Blocks a core flow (signup, login, playback, payment, upload)       | Same-day fix    | Fix → deploy via canary → notify reporter ; if not fixable today, the W6 GO/NO-GO row "0 HIGH issue ouverte" stays 🟡 PENDING and Day 30 slips |
+| **MED**  | Degrades UX but a workaround exists                                  | Day 30 morning  | Fix queued for Day 30 AM ; ship before public open                  |
+| **LOW**  | Cosmetic, polish, "nice to have"                                     | Post-launch     | Backlogged in Forgejo issues, labelled `beta-feedback`              |
+
+## Issues reported
+
+Append rows as feedback comes in. Don't filter — every observation gets logged.
+
+| #  | Reported by             | Time UTC | Description                                       | Severity | Linked issue / PR                  | Status        |
+| -- | ----------------------- | -------- | ------------------------------------------------- | -------- | ---------------------------------- | ------------- |
+| 1  | _user@example.com_       | _T+0:23_  | _signup form: tab order skips email→password_      | _MED_    | _#321_                              | _open / fixed_ |
+| 2  | …                        |          |                                                   |          |                                    |               |
+
+## Feedback themes
+
+Every ~3 issues, write a one-line summary of what's emerging. After the day, this table is the post-mortem input.
+
+| Theme                                          | Frequency | Action                                                      |
+| ---------------------------------------------- | --------- | ----------------------------------------------------------- |
+| _e.g. iOS Safari audio playback stutters_       | _N reports_ | _open Forgejo issue tagged ios-safari ; investigate Day 31_ |
+
+## Acceptance gate (Day 29)
+
+- [ ] **≥ 50 unique testers** signed up (count via `SELECT count(*) FROM users WHERE created_at > '<T+0>'`).
+- [ ] **< 3 HIGH issues open** at end-of-day. (HIGH issues fixed during the day count as resolved if the fix is verified by the reporter or a teammate.)
+- [ ] **Status page** green throughout the day. A ≥ 5-minute red event triggers a slip discussion.
+- [ ] **No Sentry P1 events** (server panics, payment double-charge, data corruption, security alert).
+- [ ] **Synthetic monitoring** stayed green continuously.
+- [ ] **k6 nightly continued green** (the soft launch shouldn't push staging into red ; if it does, the canary on prod was sized wrong).
+
+If any box is unchecked, the team has 1 h of grace at end-of-day to fix-or-decide. After that, the W6 GO/NO-GO checklist row "Soft launch beta : 50+ testeurs onboardés, < 3 HIGH issues, monitoring vert" stays 🟡 PENDING and Day 30 slips.
+
+## Decision call (end-of-day)
+
+- **Tech lead** : did monitoring show any signal that contradicts a public launch tomorrow ?
+- **Product lead** : do the feedback themes reveal a critical UX bug we shouldn't ship over ?
+- **On-call lead** : ready to take pages tomorrow ? Confident in the runbook coverage we exercised today ?
+
+A unanimous GO promotes Day 30 to "public launch day". Any single NO-GO (with reason) slips the launch by ≥ 24 h.
+
+## Linked artefacts
+
+- `docs/GO_NO_GO_CHECKLIST_v2.0.0_PUBLIC.md` — Section 6 row this report unblocks
+- `docs/RELEASE_NOTES_V2.0.0_RC1.md` — what's running on prod during the beta
+- `docs/runbooks/game-days/2026-W6-game-day-2.md` — Day 28 prod-canary session that put the build in front of beta users
+- `docs/PAYMENT_E2E_LIVE_REPORT.md` — Day 27 real-money test (creators on the beta validating the same flow at scale)
+- `config/grafana/dashboards/api-overview.json` — main monitoring board
+
+## Take-aways
+
+_Free-form. After the day closes, write the 5-line summary that the team carries into Day 30 and beyond. What surprised us, what we'd change in the next beta, what graduated from "we'll see how that lands" to "we know exactly how that lands"._
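Review note on the acceptance gate in `docs/SOFT_LAUNCH_BETA_2026.md`: two of the six boxes are plain numeric thresholds (>= 50 signups, < 3 HIGH issues open), so they could be checked mechanically at end-of-day. The sketch below is only an illustration of that arithmetic ; the function name `gate_verdict` and its argument order are assumptions, and the four observational boxes (status page, Sentry P1, synthetic monitoring, k6) still need a human.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this commit.
# Evaluates the two numeric boxes of the Day 29 acceptance gate:
#   $1 = unique signups since T+0, $2 = HIGH issues still open.
gate_verdict() {
  local signups=$1 high_open=$2
  if (( signups >= 50 && high_open < 3 )); then
    echo "PASS"
  else
    echo "SLIP"
  fi
}

# Example values only; real numbers come from the report tables.
gate_verdict 62 1
gate_verdict 48 0
```

The two verdict strings mirror the `<PASS / SLIP>` placeholder in the report header.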
--- a/infra/ansible/group_vars/all/main.yml
+++ b/infra/ansible/group_vars/all/main.yml
@ -45,13 +45,18 @@ monitoring_node_exporter_port: 9100
 # ============================================================

 # Forgejo Package Registry where the deploy workflow pushes release
-# tarballs. Forgejo's generic-package URL shape is:
+# tarballs. Forgejo lives at forgejo.talas.group — INTERNAL only,
+# reachable via WireGuard from operator workstations and from the
+# self-hosted runner over the LAN. The talas.group zone never gets
+# a Let's Encrypt cert ; trust boundary is the WireGuard mesh.
+#
+# Forgejo's generic-package URL shape is:
 #   {base}/{owner}/generic/{package}/{version}/{filename}
 # We treat each component as a separate package (`veza-backend`,
 # `veza-stream`, `veza-web`), the SHA as the version, and the
 # tarball name as the filename. Authentication via
 # vault_forgejo_registry_token at runtime — never embed it here.
-veza_artifact_base_url: "https://forgejo.veza.fr/api/packages/talas/generic"
+veza_artifact_base_url: "https://forgejo.talas.group/api/packages/talas/generic"

 # Container image used as the base for fresh app containers. The
 # `veza_app` role apt-installs OS deps on top. Pinned tag keeps deploys
--- a/infra/ansible/group_vars/prod.yml
+++ b/infra/ansible/group_vars/prod.yml
@ -40,3 +40,16 @@ veza_release_retention: 60
 postgres_password: "{{ vault_postgres_password }}"
 redis_password: "{{ vault_redis_password }}"
 rabbitmq_password: "{{ vault_rabbitmq_password }}"
+
+# Let's Encrypt : HTTP-01 via dehydrated. Wildcards NOT supported ;
+# each entry below is one cert, with the space-separated names as
+# its SANs. Internal services on talas.group are NOT here ; WireGuard
+# is the trust boundary for those.
+#
+# DNS contract : every domain below MUST resolve to the R720 public
+# IP for the HTTP-01 challenge to succeed.
+haproxy_letsencrypt: true
+haproxy_letsencrypt_email: ops@veza.fr
+haproxy_letsencrypt_domains:
+  - veza.fr www.veza.fr
+  - talas.fr www.talas.fr
--- a/infra/ansible/group_vars/staging.yml
+++ b/infra/ansible/group_vars/staging.yml
@ -65,3 +65,18 @@ veza_release_retention: 30
 postgres_password: "{{ vault_postgres_password }}"
 redis_password: "{{ vault_redis_password }}"
 rabbitmq_password: "{{ vault_rabbitmq_password }}"
+
+# Let's Encrypt : HTTP-01 via dehydrated (see roles/haproxy/tasks/letsencrypt.yml).
+# Wildcards NOT supported ; list every public subdomain explicitly.
+# Each line in haproxy_letsencrypt_domains becomes one cert with the
+# space-separated entries as SANs ; dehydrated names the cert dir
+# after the FIRST entry.
+#
+# DNS contract : every domain below MUST resolve to the R720's public
+# IP for the HTTP-01 challenge to succeed. Internal services on
+# talas.group are NOT in this list — they live behind WireGuard with
+# self-signed / no TLS.
+haproxy_letsencrypt: true
+haproxy_letsencrypt_email: ops@veza.fr
+haproxy_letsencrypt_domains:
+  - staging.veza.fr
--- a/infra/ansible/roles/haproxy/defaults/main.yml
+++ b/infra/ansible/roles/haproxy/defaults/main.yml
@ -17,13 +17,24 @@
 ---
 haproxy_version: "2.8"  # Ubuntu 22.04 ships 2.4 ; we explicitly install 2.8 from PPA

-# Listeners. v1.0 lab : HTTP only (TLS at the edge LB above us, or
-# none in lab). Phase-2 enables TLS termination here when we have
-# certs in /etc/haproxy/certs/veza.pem.
+# Listeners. v1.0 lab : HTTP only (no TLS, lab is single-host). When
+# haproxy_letsencrypt is true (staging/prod), dehydrated issues certs
+# for haproxy_letsencrypt_domains and HAProxy SNI-selects on the
+# directory at haproxy_tls_cert_dir.
 haproxy_listen_http: 80
 haproxy_listen_https: 443
 haproxy_listen_stats: 9100        # admin socket bind ; reachable on Incus bridge only
-haproxy_tls_cert_path: ""         # empty = HTTPS frontend disabled
+haproxy_tls_cert_path: ""         # empty = static-cert HTTPS bind disabled (use crt-dir form below)
+haproxy_tls_cert_dir: /usr/local/etc/tls/haproxy
+
+# Let's Encrypt — HTTP-01 challenge via dehydrated. Wildcards NOT
+# supported (those need DNS-01) ; list subdomains explicitly.
+# Format of domain entries : "primary.tld san1.tld san2.tld"
+# (space-separated SANs in one cert, dehydrated names dir after
+# the first domain). One entry per cert.
+haproxy_letsencrypt: false
+haproxy_letsencrypt_email: ""
+haproxy_letsencrypt_domains: []

 # Backend API pool — port 8080 per default (Gin server in cmd/api).
 # The inventory's `backend_api_instances` group drives the upstream
--- a/infra/ansible/roles/haproxy/files/dehydrated_haproxy_hook.sh
+++ b/infra/ansible/roles/haproxy/files/dehydrated_haproxy_hook.sh
@ -0,0 +1,14 @@
+#!/bin/bash
+# Ansible managed
+if [[ "$1" == "deploy_challenge" ]]; then
+	/bin/systemctl start http-letsencrypt.service
+elif [[ "$1" == "clean_challenge" ]]; then
+	/bin/systemctl stop http-letsencrypt.service
+elif [[ "$1" == "deploy_cert" ]]; then
+	# dehydrated passes : deploy_cert DOMAIN KEYFILE CERTFILE FULLCHAINFILE CHAINFILE TIMESTAMP
+	domain=$2 key=$3 fullchain=$5
+	# The combined PEM holds the private key ; keep it unreadable to other users.
+	( umask 077 && cat "$fullchain" "$key" > "/usr/local/etc/tls/haproxy/${domain}.pem" )
+	echo "reloading haproxy"
+	/bin/systemctl reload haproxy.service
+fi
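Review note on the hook above: its `deploy_cert` branch concatenates fullchain and key into one PEM per domain. A quick sanity check on the result might look like the sketch below. It is not part of the role ; `pem_is_combined` is a hypothetical name, and the check only looks for the PEM framing lines, it does not validate the certificate or match it against the key.

```shell
#!/usr/bin/env bash
# Hypothetical check, not part of the role: succeeds when the file holds
# at least one certificate block and one private-key block.
pem_is_combined() {
  local pem=$1
  grep -q -- '-----BEGIN CERTIFICATE-----' "$pem" &&
    grep -Eq -- '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' "$pem"
}

# Usage against a placeholder file: the framing lines are real PEM
# markers, the bodies are dummy text, so this only exercises the check.
demo=$(mktemp)
printf '%s\n' \
  '-----BEGIN CERTIFICATE-----' 'dummy' '-----END CERTIFICATE-----' \
  '-----BEGIN PRIVATE KEY-----' 'dummy' '-----END PRIVATE KEY-----' > "$demo"
pem_is_combined "$demo" && echo "combined PEM looks complete"
rm -f "$demo"
```

On a real host the argument would be a file under `/usr/local/etc/tls/haproxy/`, which needs root to read once the key is in it.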
--- a/infra/ansible/roles/haproxy/files/http-letsencrypt.service
+++ b/infra/ansible/roles/haproxy/files/http-letsencrypt.service
@ -0,0 +1,9 @@
+# Ansible managed
+
+[Unit]
+Description=very simple http server for letsencrypt challenge
+ 
+[Service]
+User=www-data
+Group=www-data
+ExecStart=/usr/bin/python3 -m http.server --bind 127.0.0.1 --directory /var/www/letsencrypt/ 8888
--- a/infra/ansible/roles/haproxy/handlers/main.yml
+++ b/infra/ansible/roles/haproxy/handlers/main.yml
@ -3,3 +3,7 @@
  ansible.builtin.systemd:
    name: haproxy
    state: reloaded
+
+- name: Reload systemd
+  ansible.builtin.systemd:
+    daemon_reload: true
--- a/infra/ansible/roles/haproxy/tasks/letsencrypt.yml
+++ b/infra/ansible/roles/haproxy/tasks/letsencrypt.yml
@ -0,0 +1,109 @@
+# Issue + auto-renew Let's Encrypt certs via dehydrated, served back
+# to HAProxy as combined PEM (fullchain + key) under
+# /usr/local/etc/tls/haproxy/<domain>.pem. HAProxy SNI-selects on
+# bind *:443 ssl crt /usr/local/etc/tls/haproxy/.
+#
+# HTTP-01 only — wildcard certs (*.veza.fr etc.) require DNS-01 and
+# are NOT supported here. List every subdomain explicitly in
+# haproxy_letsencrypt_domains.
+#
+# Run from main.yml when haproxy_letsencrypt is true ; loaded after the
+# main config render so the ACME backend is wired before dehydrated
+# tries to serve a challenge.
+---
+- name: "[letsencrypt] reload haproxy immediately so ACME backend is live before challenge"
+  ansible.builtin.systemd:
+    name: haproxy
+    state: reloaded
+  when: haproxy_config_changed | default(false)
+  tags: [haproxy, letsencrypt]
+
+- name: "[letsencrypt] install git curl bsdmainutils"
+  ansible.builtin.apt:
+    name:
+      - git
+      - curl
+      - bsdmainutils
+    state: present
+    update_cache: true
+    cache_valid_time: 3600
+  tags: [haproxy, letsencrypt, packages]
+
+- name: "[letsencrypt] ensure dirs"
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: directory
+    mode: "0755"
+  loop:
+    - /usr/local/etc/letsencrypt
+    - /var/www/letsencrypt
+    - /usr/local/etc/tls/haproxy
+  tags: [haproxy, letsencrypt]
+
+- name: "[letsencrypt] git clone dehydrated"
+  ansible.builtin.git:
+    repo: https://github.com/dehydrated-io/dehydrated
+    dest: /usr/local/etc/letsencrypt/dehydrated
+    version: master
+    update: false
+  tags: [haproxy, letsencrypt]
+
+- name: "[letsencrypt] render domains.txt"
+  ansible.builtin.template:
+    src: letsencrypt_domains.txt.j2
+    dest: /usr/local/etc/letsencrypt/dehydrated/domains.txt
+    mode: "0644"
+  tags: [haproxy, letsencrypt]
+
+- name: "[letsencrypt] render le.config"
+  ansible.builtin.template:
+    src: letsencrypt_le.config.j2
+    dest: /usr/local/etc/letsencrypt/dehydrated/le.config
+    mode: "0644"
+  tags: [haproxy, letsencrypt]
+
+- name: "[letsencrypt] install dehydrated_haproxy_hook.sh"
+  ansible.builtin.copy:
+    src: dehydrated_haproxy_hook.sh
+    dest: /usr/local/etc/letsencrypt/dehydrated_haproxy_hook.sh
+    mode: "0700"
+  tags: [haproxy, letsencrypt]
+
+- name: "[letsencrypt] install http-letsencrypt.service"
+  ansible.builtin.copy:
+    src: http-letsencrypt.service
+    dest: /etc/systemd/system/http-letsencrypt.service
+    mode: "0644"
+  notify: Reload systemd
+  tags: [haproxy, letsencrypt]
+
+- name: "[letsencrypt] accept Let's Encrypt terms"
+  ansible.builtin.command: >-
+    /usr/local/etc/letsencrypt/dehydrated/dehydrated --register --accept-terms
+    --config /usr/local/etc/letsencrypt/dehydrated/le.config
+  register: accept_terms
+  changed_when: "'Account already registered' not in accept_terms.stdout"
+  tags: [haproxy, letsencrypt]
+
+- name: "[letsencrypt] generate / renew certs as needed"
+  ansible.builtin.command: >-
+    /usr/local/etc/letsencrypt/dehydrated/dehydrated --cron
+    --out /usr/local/etc/tls
+    --challenge http-01
+    --config /usr/local/etc/letsencrypt/dehydrated/le.config
+    --hook /usr/local/etc/letsencrypt/dehydrated_haproxy_hook.sh
+  register: cert_run
+  changed_when: "'Generating private key' in cert_run.stdout or 'Renewing certificate' in cert_run.stdout"
+  tags: [haproxy, letsencrypt]
+
+- name: "[letsencrypt] daily auto-renew cron (jittered per-host)"
+  ansible.builtin.cron:
+    name: dehydrated
+    minute: "{{ 60 | random(seed=inventory_hostname) }}"
+    hour: "{{ 24 | random(seed=inventory_hostname) }}"
+    job: >-
+      /usr/local/etc/letsencrypt/dehydrated/dehydrated --cron --keep-going
+      --out /usr/local/etc/tls --challenge http-01
+      --config /usr/local/etc/letsencrypt/dehydrated/le.config
+      --hook /usr/local/etc/letsencrypt/dehydrated_haproxy_hook.sh
+  tags: [haproxy, letsencrypt]
--- a/infra/ansible/roles/haproxy/tasks/main.yml
+++ b/infra/ansible/roles/haproxy/tasks/main.yml
@ -1,5 +1,11 @@
 # haproxy role — install HAProxy 2.8, render the config, ensure the
 # systemd unit is running. Idempotent.
+#
+# Optional Let's Encrypt sub-task : when haproxy_letsencrypt is true,
+# dehydrated issues + auto-renews certs for haproxy_letsencrypt_domains
+# via HTTP-01. Wildcards are NOT supported (need DNS-01) — list
+# subdomains explicitly. Internal services on talas.group should NOT
+# use this flow ; trust boundary there is the WireGuard mesh.
 ---
 - name: Install HAProxy + curl (smoke test relies on it)
  ansible.builtin.apt:
@ -28,12 +34,23 @@
    group: haproxy
    mode: "0640"
    validate: "haproxy -f %s -c -q"
+  register: haproxy_config
  notify: Reload haproxy
  tags: [haproxy, config]

+- name: Set haproxy_config_changed fact (consumed by letsencrypt.yml)
+  ansible.builtin.set_fact:
+    haproxy_config_changed: "{{ haproxy_config.changed }}"
+  tags: [haproxy, config]
+
 - name: Enable + start haproxy
  ansible.builtin.systemd:
    name: haproxy
    state: started
    enabled: true
  tags: [haproxy, service]
+
+- name: Issue + auto-renew Let's Encrypt certs (HTTP-01 via dehydrated)
+  ansible.builtin.import_tasks: letsencrypt.yml
+  when: haproxy_letsencrypt | default(false)
+  tags: [haproxy, letsencrypt]
--- a/infra/ansible/roles/haproxy/templates/haproxy.cfg.j2
+++ b/infra/ansible/roles/haproxy/templates/haproxy.cfg.j2
@ -59,7 +59,14 @@ frontend stats
 # -----------------------------------------------------------------------
 frontend veza_http_in
    bind *:{{ haproxy_listen_http }}
-{% if haproxy_tls_cert_path %}
+{% if haproxy_letsencrypt | default(false) %}
+    bind *:{{ haproxy_listen_https }} ssl crt {{ haproxy_tls_cert_dir }}/ alpn h2,http/1.1
+    http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
+    # Let dehydrated's HTTP-01 challenges through unencrypted before any redirect.
+    acl acme_challenge path_beg /.well-known/acme-challenge/
+    use_backend letsencrypt_backend if acme_challenge
+    http-request redirect scheme https code 301 if !{ ssl_fc } !acme_challenge
+{% elif haproxy_tls_cert_path %}
    bind *:{{ haproxy_listen_https }} ssl crt {{ haproxy_tls_cert_path }} alpn h2,http/1.1
    http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
    http-request redirect scheme https code 301 if !{ ssl_fc }
@ -201,3 +208,15 @@ backend stream_pool
 {% endfor %}

 {% endif %}
+
+{% if haproxy_letsencrypt | default(false) %}
+# -----------------------------------------------------------------------
+# letsencrypt_backend — proxies HTTP-01 challenges to the
+# http-letsencrypt.service sidecar (python -m http.server on
+# 127.0.0.1:8888 serving /var/www/letsencrypt/). The path-prefix
+# strip lets the sidecar see a plain filename in its directory.
+# -----------------------------------------------------------------------
+backend letsencrypt_backend
+    http-request set-path %[path,regsub(/.well-known/acme-challenge/,/)]
+    server letsencrypt 127.0.0.1:8888
+{% endif %}
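Review note on `letsencrypt_backend`: the `set-path` rule strips the `/.well-known/acme-challenge/` prefix so the sidecar, which serves `/var/www/letsencrypt/`, sees a bare token filename. The same mapping in plain bash, as a sketch for reasoning about which file answers which request ; `challenge_file` is a hypothetical name, and unlike the HAProxy `regsub` it only strips a leading prefix.

```shell
#!/usr/bin/env bash
# Sketch of the path rewrite, not part of the role. The prefix and the
# web root are the values used in this change.
challenge_file() {
  local request_path=$1
  local prefix=/.well-known/acme-challenge/
  local webroot=/var/www/letsencrypt
  case $request_path in
    "$prefix"?*)
      # Drop the prefix, keep the token, resolve it under the web root.
      printf '%s/%s\n' "$webroot" "${request_path#"$prefix"}"
      ;;
    *)
      return 1   # not an ACME challenge path
      ;;
  esac
}

# "abc123" is a made-up token.
challenge_file /.well-known/acme-challenge/abc123
```

This lines up with `WELLKNOWN=/var/www/letsencrypt` in `le.config`, where dehydrated drops the token files.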
--- a/infra/ansible/roles/haproxy/templates/letsencrypt_domains.txt.j2
+++ b/infra/ansible/roles/haproxy/templates/letsencrypt_domains.txt.j2
@ -0,0 +1,6 @@
+# {{ ansible_managed }}
+# One cert per line. Multi-SAN certs : list all SANs space-separated.
+# dehydrated names the resulting cert directory after the FIRST domain.
+{% for cert in haproxy_letsencrypt_domains %}
+{{ cert }}
+{% endfor %}
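Review note on the entry format: each line of `domains.txt` is one cert, first name primary, the rest SANs, and the hook writes `<first name>.pem` into the HAProxy cert directory. The sketch below spells that out for a single entry. `describe_entry` is a hypothetical helper, and it assumes dehydrated hands the first name on the line to the hook as the domain argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the role: for one
# haproxy_letsencrypt_domains entry, print the combined PEM path the
# hook is expected to write, then each additional SAN on its own line.
describe_entry() {
  local entry=$1 cert_dir=${2:-/usr/local/etc/tls/haproxy}
  local -a names
  read -r -a names <<< "$entry"          # split on whitespace
  echo "pem: ${cert_dir}/${names[0]}.pem"
  local san
  for san in "${names[@]:1}"; do
    echo "san: ${san}"
  done
}

# Entry taken from group_vars/prod.yml in this change.
describe_entry "veza.fr www.veza.fr"
```

A single-name entry such as `staging.veza.fr` yields only the `pem:` line.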
--- a/infra/ansible/roles/haproxy/templates/letsencrypt_le.config.j2
+++ b/infra/ansible/roles/haproxy/templates/letsencrypt_le.config.j2
@ -0,0 +1,9 @@
+# {{ ansible_managed }}
+# dehydrated config — drives the ACME client. Default HTTP-01 challenge
+# served by the http-letsencrypt.service sidecar on 127.0.0.1:8888.
+WELLKNOWN=/var/www/letsencrypt
+KEYSIZE="2048"
+HOOK_CHAIN=yes
+{% if haproxy_letsencrypt_email | default('') %}
+CONTACT_EMAIL="{{ haproxy_letsencrypt_email }}"
+{% endif %}