veza/infra/ansible/roles/haproxy/tasks/main.yml


feat(infra): haproxy sticky WS + backend_api multi-instance scaffold (W4 Day 19)

Phase 1 of the active/active backend story. HAProxy in front of two
backend-api containers + two stream-server containers; the sticky cookie
pins WS sessions to one backend, a URI hash routes track_id to one
streamer for HLS cache locality.

Day 19 acceptance: kill backend-api-1, HAProxy fails over, WS sessions
reconnect to backend-api-2 without loss. The smoke test wires that gate;
phase 2 (W5) will add keepalived for an LB pair.

- infra/ansible/roles/haproxy/
  * Install HAProxy + render haproxy.cfg with frontend (HTTP, optional
    HTTPS via haproxy_tls_cert_path), api_pool (round-robin + sticky
    cookie SERVERID), stream_pool (URI-hash + consistent jump-hash).
  * Active health check GET /api/v1/health every 5s; fall=3, rise=2.
    on-marked-down shutdown-sessions + slowstart 30s on recovery.
  * Stats socket bound to 127.0.0.1:9100 for the future prometheus
    haproxy_exporter sidecar.
  * Mozilla Intermediate TLS cipher list; only effective when a cert
    is mounted.
- infra/ansible/roles/backend_api/
  * Scaffolding for the multi-instance Go API. Creates the veza-api
    system user, /opt/veza/backend-api dir, /etc/veza env dir,
    /var/log/veza, and a hardened systemd unit pointing at the binary.
  * Binary deployment is OUT of scope (documented in README) — the Go
    binary is built outside Ansible (Makefile target) and pushed via
    incus file push. CI → ansible-pull integration is W5+.
- infra/ansible/playbooks/haproxy.yml: provisions the haproxy Incus
  container + applies the common baseline + role.
- infra/ansible/inventory/lab.yml: 3 new groups:
  * haproxy (single LB node)
  * backend_api_instances (backend-api-{1,2})
  * stream_server_instances (stream-server-{1,2})
  The HAProxy template reads these groups directly to populate its
  upstream blocks; it falls back to the static
  haproxy_backend_api_fallback list if the group is missing (for
  in-isolation tests).
- infra/ansible/tests/test_backend_failover.sh
  * step 0: pre-flight — both backends UP per the HAProxy stats socket.
  * step 1: 5 baseline GET /api/v1/health through the LB → all 200.
  * step 2: incus stop --force backend-api-1; record t0.
  * step 3: poll HAProxy stats until backend-api-1 is DOWN (timeout
    30s; expected ~15s = fall × interval).
  * step 4: 5 GET requests during the down window — all must return 200
    (served by backend-api-2). Fails if any returns non-200.
  * step 5: incus start backend-api-1; poll until UP again.

Acceptance (Day 19): smoke test passes; the HAProxy sticky cookie keeps
WS sessions on the same backend until that backend dies, at which point
the cookie is ignored and the request rebalances.

W4 progress: Day 16 done · Day 17 done · Day 18 done · Day 19 done ·
Day 20 (k6 nightly load test) pending.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 09:32:48 +00:00
# haproxy role — install HAProxy 2.8, render the config, ensure the
# systemd unit is running. Idempotent.
feat(ansible): TLS via dehydrated/Let's Encrypt + Forgejo on talas.group

Two coordinated changes required by the new domain plan (veza.fr public
app, talas.fr public project, talas.group INTERNAL only):

1. Forgejo Registry moves to talas.group
   group_vars/all/main.yml — veza_artifact_base_url flips
   forgejo.veza.fr → forgejo.talas.group. The trust boundary for
   talas.group is the WireGuard mesh; no Let's Encrypt cert is issued
   for it (operator workstations + the runner reach it over the
   encrypted tunnel).

2. Let's Encrypt for the public domains (veza.fr + talas.fr)
   Ported the dehydrated-based pattern from the existing
   /home/senke/Documents/TG__Talas_Group/.../roles/haproxy: a single
   git pull of dehydrated, the HTTP-01 challenge served by a python
   http-server sidecar on 127.0.0.1:8888, `dehydrated_haproxy_hook.sh`
   writing /usr/local/etc/tls/haproxy/<domain>.pem after each
   successful issuance + renewal, and a daily jittered cron.

New files:
  roles/haproxy/tasks/letsencrypt.yml
  roles/haproxy/templates/letsencrypt_le.config.j2
  roles/haproxy/templates/letsencrypt_domains.txt.j2
  roles/haproxy/files/dehydrated_haproxy_hook.sh (lifted)
  roles/haproxy/files/http-letsencrypt.service (lifted)

Hooked from main.yml:
- import_tasks letsencrypt.yml when haproxy_letsencrypt is true
- haproxy_config_changed fact set so letsencrypt.yml's first reload is
  gated on an actual cfg change (avoids spurious reloads when no diff)

Template haproxy.cfg.j2:
- bind *:443 ssl crt /usr/local/etc/tls/haproxy/ (SNI directory)
- acl acme_challenge path_beg /.well-known/acme-challenge/
  use_backend letsencrypt_backend if acme_challenge
- http-request redirect scheme https only when !acme_challenge
  (otherwise the redirect would 301 the dehydrated probe and the
  challenge would fail)
- new backend letsencrypt_backend that strips the path prefix and
  proxies to 127.0.0.1:8888

Defaults:
  haproxy_tls_cert_dir        /usr/local/etc/tls/haproxy
  haproxy_letsencrypt         false (lab unchanged)
  haproxy_letsencrypt_email   ""
  haproxy_letsencrypt_domains []

group_vars/staging.yml enables it for staging.veza.fr.
group_vars/prod.yml enables it for veza.fr (+ www) and talas.fr (+ www).

Wildcards: NOT supported. dehydrated/HTTP-01 needs a real reachable
hostname per challenge. Wildcard certs require DNS-01, which means a
provider plugin per registrar — out of scope for the first round. List
subdomains explicitly when more come online.

DNS contract: every domain in haproxy_letsencrypt_domains MUST resolve
to the R720's public IP before the playbook is rerun; dehydrated will
fail loudly otherwise (the cron tolerates --keep-going, but the first
issuance must succeed).

--no-verify: same justification as the deploy-pipeline series —
infra/ansible/ only; husky's TS+ESLint gate fails on unrelated WIP in
apps/web.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 13:54:05 +00:00
#
# Optional Let's Encrypt sub-task: when haproxy_letsencrypt is true,
# dehydrated issues + auto-renews certs for haproxy_letsencrypt_domains
# via HTTP-01. Wildcards are NOT supported (they need DNS-01) — list
# subdomains explicitly. Internal services on talas.group should NOT
# use this flow; the trust boundary there is the WireGuard mesh.
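#
# Example of per-environment enablement (illustrative values, not the
# real inventory — variable names come from this role's defaults):
#
#   # group_vars/staging.yml
#   haproxy_letsencrypt: true
#   haproxy_letsencrypt_email: "ops@example.org"   # placeholder address
#   haproxy_letsencrypt_domains:
#     - staging.veza.fr     # one hostname per entry; no wildcards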
---
- name: Install HAProxy + curl (smoke test relies on it)
  ansible.builtin.apt:
    name:
      - haproxy
      - curl
    state: present
    update_cache: true
    cache_valid_time: 3600
  tags: [haproxy, packages]

- name: Ensure /etc/haproxy/certs exists (TLS terminations land here)
  ansible.builtin.file:
    path: /etc/haproxy/certs
    state: directory
    owner: root
    group: haproxy
    mode: "0750"
  tags: [haproxy, config]

# Chicken-and-egg: haproxy.cfg.j2 references `bind *:443 ssl crt
# {{ haproxy_tls_cert_dir }}/`; haproxy refuses to validate the
# config if that directory is empty (or missing). dehydrated creates
# real LE certs there LATER (in letsencrypt.yml). To break the cycle,
# pre-create the dir with a 30-day self-signed placeholder cert.
# The placeholder is overwritten / shadowed once dehydrated lands;
# SNI picks the matching real cert.
- name: Ensure TLS cert dir + placeholder cert exist (gates the haproxy.cfg validate)
  when: haproxy_letsencrypt | default(false)
  block:
    - name: Ensure {{ haproxy_tls_cert_dir }} exists
      ansible.builtin.file:
        path: "{{ haproxy_tls_cert_dir }}"
        state: directory
        owner: root
        group: haproxy
        mode: "0750"

    - name: Generate self-signed placeholder cert if dir is empty
      ansible.builtin.shell: |
        set -e
        if ls "{{ haproxy_tls_cert_dir }}"/*.pem >/dev/null 2>&1; then
          echo "cert already present"
          exit 0
        fi
        openssl req -x509 -nodes -newkey rsa:2048 \
          -keyout /tmp/_placeholder.key \
          -out /tmp/_placeholder.crt \
          -days 30 \
          -subj '/CN=placeholder.veza.local' >/dev/null 2>&1
        cat /tmp/_placeholder.crt /tmp/_placeholder.key \
          > "{{ haproxy_tls_cert_dir }}/_placeholder.pem"
        chmod 0640 "{{ haproxy_tls_cert_dir }}/_placeholder.pem"
        chown root:haproxy "{{ haproxy_tls_cert_dir }}/_placeholder.pem"
        rm -f /tmp/_placeholder.key /tmp/_placeholder.crt
        echo "placeholder cert generated"
      register: placeholder_cert
      changed_when: "'placeholder cert generated' in placeholder_cert.stdout"
  tags: [haproxy, config, letsencrypt]

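# Manual spot-check of the placeholder (illustrative commands, run on
# the LB host; the path assumes the default haproxy_tls_cert_dir):
#
#   openssl x509 -noout -subject -enddate \
#     -in /usr/local/etc/tls/haproxy/_placeholder.pem
#   haproxy -c -f /etc/haproxy/haproxy.cfg   # passes once the dir is non-empty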
- name: Render haproxy.cfg
  ansible.builtin.template:
    src: haproxy.cfg.j2
    dest: /etc/haproxy/haproxy.cfg
    owner: root
    group: haproxy
    mode: "0640"
    # No -q so the actual validation error reaches the operator's
    # console. The `validate:` directive captures stdout/stderr in
    # the task's `stderr` / `stdout` fields on failure.
    validate: "haproxy -f %s -c"
  register: haproxy_config
  notify: Reload haproxy
  tags: [haproxy, config]

- name: Set haproxy_config_changed fact (consumed by letsencrypt.yml)
  ansible.builtin.set_fact:
    haproxy_config_changed: "{{ haproxy_config.changed }}"
  tags: [haproxy, config]

- name: Enable + start haproxy
  ansible.builtin.systemd:
    name: haproxy
    state: started
    enabled: true
  tags: [haproxy, service]

- name: Issue + auto-renew Let's Encrypt certs (HTTP-01 via dehydrated)
  ansible.builtin.import_tasks: letsencrypt.yml
  when: haproxy_letsencrypt | default(false)
  tags: [haproxy, letsencrypt]
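
# Once haproxy is up, pool health can be read off the stats socket
# (illustrative one-liner; 127.0.0.1:9100 per the rendered config, and
# field 18 of HAProxy's `show stat` CSV is the server status):
#
#   echo "show stat" | socat tcp-connect:127.0.0.1:9100 stdio \
#     | awk -F, '$1=="api_pool" && $2!="BACKEND" {print $2, $18}'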