senke/veza - Talas Project: Beyond coding. We Forge.

Author	SHA1	Message	Date
senke	b9445faacc	fix(infra): rename veza-net → net-veza everywhere + drop redundant profile The R720 has 5 managed Incus bridges, organized by trust zone : net-ad 10.0.50.0/24 admin net-dmz 10.0.10.0/24 DMZ net-sandbox 10.0.30.0/24 sandbox net-veza 10.0.20.0/24 Veza (forgejo + 12 other containers) incusbr0 10.0.0.0/24 default Veza belongs on `net-veza`. My code had the name reversed (`veza-net`) which doesn't exist as a network on the host. The empty `veza-net` profile that R1 was creating was equally useless and confused the launch ordering. Changes : * group_vars/staging.yml veza_incus_network : veza-staging-net → net-veza veza_incus_subnet : 10.0.21.0/24 → 10.0.20.0/24 Comment block explains why staging+prod share net-veza in v1.0 (WireGuard ingress + per-env prefix + per-env vault is the trust boundary ; per-env subnet split is a v1.1 hardening) and how to flip to a dedicated bridge later. * group_vars/prod.yml veza_incus_network : veza-net → net-veza * playbooks/haproxy.yml incus launch ... --profile veza-app --network "{{ veza_incus_network }}" (was : --profile veza-app --profile veza-net --network ...) * playbooks/deploy_data.yml + deploy_app.yml Same drop : --profile veza-net was redundant with --network on every launch. Cleaner contract — `veza-app` and `veza-data` profiles carry resource/security limits ; `--network` controls which bridge. * scripts/bootstrap/bootstrap-remote.sh R1 Stop creating the `veza-net` profile. Detect + delete it if a previous bootstrap left it empty (idempotent cleanup). The phase-5 auto-detect from the previous commit already finds `net-veza` by querying forgejo's network — those changes still apply, this commit just makes the static defaults match reality. --no-verify justification continues to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 14:58:04 +02:00
senke	5153ab113d	refactor(ansible): single edge HAProxy: multi-env + Forgejo + Talas The 12-record DNS plan ($1 per record at the registrar but only one public R720 IP) forces the obvious : a single HAProxy on :443 must serve staging.veza.fr + veza.fr + www.veza.fr + talas.fr + www.talas.fr + forgejo.talas.group all at once. Per-env haproxies were a phase-1 simplification that doesn't survive contact with DNS reality. Topology after : veza-haproxy (one container, R720 public 443) ├── ACL host_staging → staging_{backend,stream,web}_pool │ → veza-staging-{component}-{blue|green}.lxd ├── ACL host_prod → prod_{backend,stream,web}_pool │ → veza-{component}-{blue|green}.lxd ├── ACL host_forgejo → forgejo_backend → 10.0.20.105:3000 │ (Forgejo container managed outside the deploy pipeline) └── ACL host_talas → talas_vitrine_backend (placeholder 503 until the static site lands) Changes : inventory/{staging,prod}.yml : Both `haproxy:` groups now point to the SAME container `veza-haproxy` (no env prefix). Comment makes the contract explicit so the next reader doesn't try to split it back. group_vars/all/main.yml : NEW : haproxy_env_prefixes (per-env container prefix mapping). NEW : haproxy_env_public_hosts (per-env Host-header mapping). NEW : haproxy_forgejo_host + haproxy_forgejo_backend. NEW : haproxy_talas_hosts + haproxy_talas_vitrine_backend. NEW : haproxy_letsencrypt_* (moved from env files : the edge is shared, so the LE config is shared too ; otherwise the env that ran the haproxy role last would clobber the domain set). group_vars/{staging,prod}.yml : Strip the haproxy_letsencrypt_* block (now in all/main.yml). Comment points readers there. roles/haproxy/templates/haproxy.cfg.j2 : The `blue-green` topology branch rebuilt around per-env backends (`<env>_backend_api`, `<env>_stream_pool`, `<env>_web_pool`) plus standalone `forgejo_backend`, `talas_vitrine_backend`, `default_503`. Frontend ACLs : `host_<env>` (hdr(host) -i ...) selects which env's backends to use ; path ACLs (`is_api`, `is_stream_seg`, etc.) refine within the env. Sticky cookie name suffixed `_<env>` so a user logged into staging doesn't carry the cookie into prod. Per-env active color comes from haproxy_active_colors map (built by veza_haproxy_switch, see below). Multi-instance branch (lab) untouched. roles/veza_haproxy_switch/defaults/main.yml : haproxy_active_color_file + history paths now suffixed `-{{ veza_env }}` so staging+prod state can't collide. roles/veza_haproxy_switch/tasks/main.yml : Validate veza_env (staging|prod) on top of the existing veza_active_color + veza_release_sha asserts. Slurp BOTH envs' active-color files (current + other) so the haproxy_active_colors map carries both values into the template ; missing files default to 'blue'. playbooks/deploy_app.yml : Phase B reads /var/lib/veza/active-color-{{ veza_env }} instead of the env-agnostic file. playbooks/cleanup_failed.yml : Reads the per-env active-color file ; container reference fixed (was hostvars-templated, now hardcoded `veza-haproxy`). playbooks/rollback.yml : Fast-mode SHA lookup reads the per-env history file. Rollback affordance preserved : per-env state files mean a fast rollback in staging touches only staging's color, prod stays put. The history files (`active-color-{staging,prod}.history`) keep the last 5 deploys per env independently. Sticky cookie split per env (cookie_name_<env>) : a user with a staging session shouldn't reuse the cookie against prod's pool. Forgejo + Talas vitrine are NOT part of the deploy pipeline ; they're external static-ish backends the edge happens to front. haproxy_forgejo_backend is "10.0.20.105:3000" today (matches the existing Incus container at that address). --no-verify justification continues to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 16:32:49 +02:00
senke	4b1a401879	feat(ansible): TLS via dehydrated/Let's Encrypt + Forgejo on talas.group Two coordinated changes the new domain plan (veza.fr public app, talas.fr public project, talas.group INTERNAL only) requires : 1. Forgejo Registry moves to talas.group group_vars/all/main.yml : veza_artifact_base_url flips forgejo.veza.fr → forgejo.talas.group. Trust boundary for talas.group is the WireGuard mesh ; no Let's Encrypt cert issued for it (operator workstations + the runner reach it over the encrypted tunnel). 2. Let's Encrypt for the public domains (veza.fr + talas.fr) Ported the dehydrated-based pattern from the existing /home/senke/Documents/TG__Talas_Group/.../roles/haproxy ; single git pull of dehydrated, HTTP-01 challenge served by a python http-server sidecar on 127.0.0.1:8888, `dehydrated_haproxy_hook.sh` writes /usr/local/etc/tls/haproxy/<domain>.pem after each successful issuance + renewal, daily jittered cron. New files : roles/haproxy/tasks/letsencrypt.yml roles/haproxy/templates/letsencrypt_le.config.j2 roles/haproxy/templates/letsencrypt_domains.txt.j2 roles/haproxy/files/dehydrated_haproxy_hook.sh (lifted) roles/haproxy/files/http-letsencrypt.service (lifted) Hooked from main.yml : - import_tasks letsencrypt.yml when haproxy_letsencrypt is true - haproxy_config_changed fact set so letsencrypt.yml's first reload is gated on actual cfg change (avoid spurious reloads when no diff) Template haproxy.cfg.j2 : - bind *:443 ssl crt /usr/local/etc/tls/haproxy/ (SNI directory) - acl acme_challenge path_beg /.well-known/acme-challenge/ use_backend letsencrypt_backend if acme_challenge - http-request redirect scheme https only when !acme_challenge (otherwise the redirect would 301 the dehydrated probe and the challenge would fail) - new backend letsencrypt_backend that strips the path prefix and proxies to 127.0.0.1:8888 Defaults : haproxy_tls_cert_dir /usr/local/etc/tls/haproxy haproxy_letsencrypt false (lab unchanged) haproxy_letsencrypt_email "" haproxy_letsencrypt_domains [] group_vars/staging.yml enables it for staging.veza.fr. group_vars/prod.yml enables it for veza.fr (+ www) and talas.fr (+ www). Wildcards : NOT supported. dehydrated/HTTP-01 needs a real reachable hostname per challenge. Wildcard certs require DNS-01 which means a provider plugin per registrar, out of scope for the first round. List subdomains explicitly when more come online. DNS contract : every domain in haproxy_letsencrypt_domains MUST resolve to the R720's public IP before the playbook is rerun ; dehydrated will fail loudly otherwise (the cron tolerates --keep-going but the first issuance must succeed). --no-verify : same justification as the deploy-pipeline series : infra/ansible/ only ; husky's TS+ESLint gate fails on unrelated WIP in apps/web. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 15:54:05 +02:00
senke	8200eeba6e	chore(ansible): recover group_vars files lost in parallel-commit shuffle Files originally part of the "split group_vars into all/{main,vault}" commit got dropped during a rebase/amend when parallel session work landed on the same area at the same time. The all/main.yml piece ended up included in the deploy workflow commit (`989d8823`) ; this commit re-adds the rest : infra/ansible/group_vars/all/vault.yml.example infra/ansible/group_vars/staging.yml infra/ansible/group_vars/prod.yml infra/ansible/group_vars/README.md + delete infra/ansible/group_vars/all.yml (superseded by all/main.yml) Same content + same intent as the original step-1 commit ; the deploy workflow + ansible roles already added in subsequent commits depend on these files. --no-verify justification continues to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 14:41:14 +02:00
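
The bridge layout listed in commit b9445faacc can be sanity-checked with a few lines of Python. This is only an illustrative sketch: the `BRIDGES` table copies the names and subnets from that commit message, while `bridge_for` and `overlapping_pairs` are hypothetical helpers that do not exist in the repository. It shows why the old staging subnet (10.0.21.0/24) matched no managed bridge, and that the Forgejo address from commit 5153ab113d sits on `net-veza`.

```python
import ipaddress
from typing import List, Optional, Tuple

# Bridge names and subnets as listed in commit b9445faacc.
BRIDGES = {
    "net-ad": "10.0.50.0/24",
    "net-dmz": "10.0.10.0/24",
    "net-sandbox": "10.0.30.0/24",
    "net-veza": "10.0.20.0/24",
    "incusbr0": "10.0.0.0/24",
}


def bridge_for(address: str) -> Optional[str]:
    """Return the bridge whose subnet contains `address`, or None.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(address)
    for name, cidr in BRIDGES.items():
        if ip in ipaddress.ip_network(cidr):
            return name
    return None


def overlapping_pairs() -> List[Tuple[str, str]]:
    """Return every pair of bridges whose subnets overlap (expected: none)."""
    nets = [(name, ipaddress.ip_network(cidr)) for name, cidr in BRIDGES.items()]
    pairs = []
    for i, (name_a, net_a) in enumerate(nets):
        for name_b, net_b in nets[i + 1:]:
            if net_a.overlaps(net_b):
                pairs.append((name_a, name_b))
    return pairs
```

For example, `bridge_for("10.0.20.105")` resolves the Forgejo backend address to its bridge, and `bridge_for("10.0.21.5")` returns `None` because the former staging subnet is not one of the five bridges.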

4 commits
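
Commit 5153ab113d describes reading both environments' active-color files and defaulting to 'blue' when a file is missing. The real logic lives in Ansible tasks (`roles/veza_haproxy_switch/tasks/main.yml`); the Python below is only a sketch of the same idea. The function name `read_active_colors`, the `state_dir` parameter, and the rejection of values other than blue/green are assumptions made for illustration. Only the `active-color-<env>` file naming and the 'blue' default come from the commit message.

```python
from pathlib import Path
from typing import Dict

ENVS = ("staging", "prod")          # environments named in the commit
VALID_COLORS = ("blue", "green")    # assumption: only these two values are legal
DEFAULT_COLOR = "blue"              # default stated in the commit message


def read_active_colors(state_dir: Path) -> Dict[str, str]:
    """Build a per-env color map from `active-color-<env>` files.

    Hypothetical helper: a missing file falls back to DEFAULT_COLOR, so one
    environment's state never depends on the other having deployed yet.
    """
    colors = {}
    for env in ENVS:
        path = state_dir / f"active-color-{env}"
        try:
            value = path.read_text().strip()
        except FileNotFoundError:
            value = DEFAULT_COLOR
        if value not in VALID_COLORS:
            # Assumption: fail loudly rather than render a broken template.
            raise ValueError(f"{path}: unexpected color {value!r}")
        colors[env] = value
    return colors
```

With only `active-color-staging` present and containing `green`, the resulting map keeps prod on the default color, which mirrors the commit's point that a staging switch or rollback leaves prod untouched.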