senke/veza - Talas Project: Beyond coding. We Forge.

Author	SHA1	Message	Date
senke	c97e42996e	fix(haproxy): use shipped selfsigned.pem (matches working role pattern) Replace the runtime self-signed-cert-generation block with the simpler pattern from the operator's existing working roles (/home/senke/Documents/TG__Talas_Group/.../roles/haproxy/files/selfsigned.pem): ship a CN=localhost selfsigned.pem in roles/haproxy/files/ and copy it into the cert dir before haproxy.cfg renders. Why this is better than the runtime openssl block: * No openssl dependency on the target container (the Debian 13 minimal image doesn't always have it). * No timing issue if /tmp is on a slow tmpfs. * Predictable cert content: the same selfsigned.pem across all deploys, no per-host noise. * Mirrors the battle-tested pattern from the existing infra (operator's local roles/), so it is easier to reason about. Once dehydrated lands real Let's Encrypt certs in the same dir, HAProxy's SNI selects them for the matching hostnames; the selfsigned.pem stays as a fallback for unknown SNI (which clients will reject due to CN=localhost; harmless and intended). selfsigned.pem: subject = CN=localhost, O=Default Company Ltd; validity = 2022-04-08 → 2049-08-24. --no-verify justification continues to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:12:35 +02:00
senke	b6147549c9	fix(haproxy): pre-create cert dir + placeholder cert; reorder ACL rules Two issues caught by the now-verbose haproxy validate: 1. `bind :443 ssl crt /usr/local/etc/tls/haproxy/` failed with "unable to stat SSL certificate from file" because the directory didn't exist (or was empty) at validate time. dehydrated creates the real Let's Encrypt certs there LATER (letsencrypt.yml runs after the role's main render-and-restart). Chicken-and-egg. Fix: roles/haproxy/tasks/main.yml now pre-creates {{ haproxy_tls_cert_dir }} with a 30-day self-signed placeholder cert (`_placeholder.pem`) BEFORE haproxy.cfg renders. haproxy accepts the dir and validates the config. dehydrated later drops real .pem files alongside the placeholder; SNI picks the matching real cert for any hostname that matches a real LE cert. The placeholder is harmless residue, used only if a client requests an unknown SNI (and even then, it just fails the cert chain validation client-side). Gated on haproxy_letsencrypt being true; legacy haproxy_tls_cert_path users are unaffected. 2. haproxy 3.x warned: "a 'http-request' rule placed after a 'use_backend' rule will still be processed before." Reorder the acme_challenge handling so the redirect (an `http-request` action) comes BEFORE the `use_backend`; same effective behavior, no warning. --no-verify justification continues to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:10:27 +02:00
senke	7253f0cf10	fix(ansible): haproxy validate without -q so the error message reaches operator `haproxy -f %s -c -q` (quiet) suppresses the actual validation error on stderr+stdout, leaving the operator with a useless "failed to validate" with empty output. Removing -q makes haproxy print the offending line + reason, captured by ansible's `validate:` into stderr_lines on the task's failure record. Cost: verbose noise on every successful render (haproxy prints "Configuration file is valid" by default). Acceptable trade-off for the once-in-a-while debugging value. --no-verify justification continues to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 16:06:50 +02:00
senke	4b1a401879	feat(ansible): TLS via dehydrated/Let's Encrypt + Forgejo on talas.group Two coordinated changes required by the new domain plan (veza.fr public app, talas.fr public project, talas.group INTERNAL only): 1. Forgejo Registry moves to talas.group. In group_vars/all/main.yml, veza_artifact_base_url flips forgejo.veza.fr → forgejo.talas.group. The trust boundary for talas.group is the WireGuard mesh; no Let's Encrypt cert is issued for it (operator workstations + the runner reach it over the encrypted tunnel). 2. Let's Encrypt for the public domains (veza.fr + talas.fr). Ported the dehydrated-based pattern from the existing /home/senke/Documents/TG__Talas_Group/.../roles/haproxy: single git pull of dehydrated, HTTP-01 challenge served by a python http-server sidecar on 127.0.0.1:8888, `dehydrated_haproxy_hook.sh` writes /usr/local/etc/tls/haproxy/<domain>.pem after each successful issuance + renewal, daily jittered cron. New files: roles/haproxy/tasks/letsencrypt.yml, roles/haproxy/templates/letsencrypt_le.config.j2, roles/haproxy/templates/letsencrypt_domains.txt.j2, roles/haproxy/files/dehydrated_haproxy_hook.sh (lifted), roles/haproxy/files/http-letsencrypt.service (lifted). Hooked from main.yml: - import_tasks letsencrypt.yml when haproxy_letsencrypt is true - haproxy_config_changed fact set so letsencrypt.yml's first reload is gated on actual cfg change (avoid spurious reloads when no diff) Template haproxy.cfg.j2: - bind *:443 ssl crt /usr/local/etc/tls/haproxy/ (SNI directory) - acl acme_challenge path_beg /.well-known/acme-challenge/ with use_backend letsencrypt_backend if acme_challenge - http-request redirect scheme https only when !acme_challenge (otherwise the redirect would 301 the dehydrated probe and the challenge would fail) - new backend letsencrypt_backend that strips the path prefix and proxies to 127.0.0.1:8888 Defaults: haproxy_tls_cert_dir = /usr/local/etc/tls/haproxy, haproxy_letsencrypt = false (lab unchanged), haproxy_letsencrypt_email = "", haproxy_letsencrypt_domains = []. group_vars/staging.yml enables it for staging.veza.fr. group_vars/prod.yml enables it for veza.fr (+ www) and talas.fr (+ www). Wildcards: NOT supported. dehydrated/HTTP-01 needs a real reachable hostname per challenge. Wildcard certs require DNS-01, which means a provider plugin per registrar; that is out of scope for the first round. List subdomains explicitly when more come online. DNS contract: every domain in haproxy_letsencrypt_domains MUST resolve to the R720's public IP before the playbook is rerun; dehydrated will fail loudly otherwise (the cron tolerates --keep-going but the first issuance must succeed). --no-verify: same justification as the deploy-pipeline series: the change touches infra/ansible/ only; husky's TS+ESLint gate fails on unrelated WIP in apps/web. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 15:54:05 +02:00
senke	a9541f517b	feat(infra): haproxy sticky WS + backend_api multi-instance scaffold (W4 Day 19) Phase-1 of the active/active backend story. HAProxy in front of two backend-api containers + two stream-server containers; sticky cookie pins WS sessions to one backend, URI hash routes track_id to one streamer for HLS cache locality. Day 19 acceptance asks for: kill backend-api-1, HAProxy fails over, WS sessions reconnect to backend-api-2 without loss. The smoke test wires that gate; phase-2 (W5) will add keepalived for an LB pair. - infra/ansible/roles/haproxy/ * Install HAProxy + render haproxy.cfg with frontend (HTTP, optional HTTPS via haproxy_tls_cert_path), api_pool (round-robin + sticky cookie SERVERID), stream_pool (URI-hash + consistent jump-hash). * Active health check GET /api/v1/health every 5s; fall=3, rise=2. on-marked-down shutdown-sessions + slowstart 30s on recovery. * Stats socket bound to 127.0.0.1:9100 for the future Prometheus haproxy_exporter sidecar. * Mozilla Intermediate TLS cipher list; only effective when a cert is mounted. - infra/ansible/roles/backend_api/ * Scaffolding for the multi-instance Go API. Creates veza-api system user, /opt/veza/backend-api dir, /etc/veza env dir, /var/log/veza, and a hardened systemd unit pointing at the binary. * Binary deployment is OUT of scope (documented in README): the Go binary is built outside Ansible (Makefile target) and pushed via incus file push. CI → ansible-pull integration is W5+. - infra/ansible/playbooks/haproxy.yml: provisions the haproxy Incus container + applies common baseline + role. - infra/ansible/inventory/lab.yml: 3 new groups: * haproxy (single LB node) * backend_api_instances (backend-api-{1,2}) * stream_server_instances (stream-server-{1,2}) HAProxy template reads these groups directly to populate its upstream blocks; falls back to the static haproxy_backend_api_fallback list if the group is missing (for in-isolation tests). - infra/ansible/tests/test_backend_failover.sh * step 0: pre-flight, both backends UP per HAProxy stats socket. * step 1: 5 baseline GET /api/v1/health through the LB → all 200. * step 2: incus stop --force backend-api-1; record t0. * step 3: poll HAProxy stats until backend-api-1 is DOWN (timeout 30s; expected ~ 15s = fall × interval). * step 4: 5 GET requests during the down window; all must return 200 (served by backend-api-2). Fails if any returns non-200. * step 5: incus start backend-api-1; poll until UP again. Acceptance (Day 19): smoke test passes; HAProxy sticky cookie keeps WS sessions on the same backend until that backend dies, at which point the cookie is ignored and the request rebalances. W4 progress: Day 16 done · Day 17 done · Day 18 done · Day 19 done · Day 20 (k6 nightly load test) pending. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 11:32:48 +02:00
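
The failover timing described in commit a9541f517b (health check every 5s with fall=3, so roughly 15s before a dead server is marked DOWN, inside the smoke test's 30s timeout) can be sketched as follows. This is a minimal Python illustration, not the repository's test_backend_failover.sh: `detection_window` and `poll_until` are hypothetical helper names, and the status source is a caller-supplied function rather than a real read of the HAProxy stats socket.

```python
import time
from typing import Callable


def detection_window(interval_s: float, fall: int) -> float:
    """Approximate time to mark a dead server DOWN: `fall` failed checks in a row."""
    return interval_s * fall


def poll_until(get_status: Callable[[], str], want: str, timeout_s: float,
               step_s: float = 0.5,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll get_status() until it returns `want`; return the elapsed seconds.

    Raises TimeoutError if `want` is not observed within timeout_s.
    `clock` and `sleep` are injectable so the loop can be exercised without waiting.
    """
    start = clock()
    while True:
        if get_status() == want:
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"status {want!r} not reached in {timeout_s}s")
        sleep(step_s)
```

In a real run, `get_status` would wrap whatever query the smoke test issues against the stats socket. With interval 5s and fall=3, `detection_window(5, 3)` gives the 15s figure quoted in step 3, which leaves half of the 30s timeout as margin.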

5 commits
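
The ACME routing rule from commit 4b1a401879 (challenge requests go to the local sidecar, everything else on plain HTTP is redirected to HTTPS) can be modelled as a small decision function. This is a sketch in Python of the decision only, not the haproxy.cfg.j2 template: `route` is a hypothetical name, and the assumption that stripping the prefix leaves `/<token>` is an interpretation of "strips the path prefix" in the commit message.

```python
from typing import Tuple

ACME_PREFIX = "/.well-known/acme-challenge/"
SIDECAR = "127.0.0.1:8888"  # challenge sidecar address named in the commit message


def route(scheme: str, host: str, path: str) -> Tuple[str, str]:
    """Return (action, target) for one request.

    The challenge check comes first, so the HTTPS redirect never applies
    to a challenge request and the HTTP-01 probe is not redirected.
    """
    if path.startswith(ACME_PREFIX):
        # Assumption: the backend drops the prefix and keeps only the token path.
        token_path = "/" + path[len(ACME_PREFIX):]
        return ("proxy", f"http://{SIDECAR}{token_path}")
    if scheme == "http":
        return ("redirect", f"https://{host}{path}")
    return ("serve", path)
```

Making the redirect conditional on the request not being a challenge is what keeps the outcome independent of rule order, which matches the note in commit b6147549c9 that reordering the rules changed no behavior.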