Commit graph

4 commits

Author SHA1 Message Date
senke
b9445faacc fix(infra): rename veza-net → net-veza everywhere + drop redundant profile
The R720 has 5 managed Incus bridges, organized by trust zone :
  net-ad        10.0.50.0/24    admin
  net-dmz       10.0.10.0/24    DMZ
  net-sandbox   10.0.30.0/24    sandbox
  net-veza      10.0.20.0/24    Veza  (forgejo + 12 other containers)
  incusbr0      10.0.0.0/24     default

Veza belongs on `net-veza`. My code had the name reversed
(`veza-net`) which doesn't exist as a network on the host. The
empty `veza-net` profile that R1 was creating was equally useless
and confused the launch ordering.

Changes :
* group_vars/staging.yml
    veza_incus_network : veza-staging-net → net-veza
    veza_incus_subnet  : 10.0.21.0/24    → 10.0.20.0/24
    Comment block explains why staging+prod share net-veza in v1.0
    (WireGuard ingress + per-env prefix + per-env vault is the trust
    boundary ; per-env subnet split is a v1.1 hardening) and how to
    flip to a dedicated bridge later.
* group_vars/prod.yml
    veza_incus_network : veza-net → net-veza
* playbooks/haproxy.yml
    incus launch ... --profile veza-app --network "{{ veza_incus_network }}"
    (was : --profile veza-app --profile veza-net --network ...)
* playbooks/deploy_data.yml + deploy_app.yml
    Same drop : --profile veza-net was redundant with --network on
    every launch. Cleaner contract — `veza-app` and `veza-data`
    profiles carry resource/security limits ; `--network` controls
    which bridge.
* scripts/bootstrap/bootstrap-remote.sh R1
    Stop creating the `veza-net` profile. Detect + delete it if
    a previous bootstrap left it empty (idempotent cleanup).

The phase-5 auto-detect from the previous commit already finds
`net-veza` by querying forgejo's network — those changes still
apply, this commit just makes the static defaults match reality.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 14:58:04 +02:00
senke
5153ab113d refactor(ansible): single edge HAProxy — multi-env + Forgejo + Talas
The 12-record DNS plan ($1 per record at the registrar but only one
public R720 IP) forces the obvious : a single HAProxy on :443 must
serve staging.veza.fr + veza.fr + www.veza.fr + talas.fr +
www.talas.fr + forgejo.talas.group all at once. Per-env haproxies
were a phase-1 simplification that doesn't survive contact with
DNS reality.

Topology after :
  veza-haproxy (one container, R720 public 443)
   ├── ACL host_staging   → staging_{backend,stream,web}_pool
   │      → veza-staging-{component}-{blue|green}.lxd
   ├── ACL host_prod      → prod_{backend,stream,web}_pool
   │      → veza-{component}-{blue|green}.lxd
   ├── ACL host_forgejo   → forgejo_backend → 10.0.20.105:3000
   │      (Forgejo container managed outside the deploy pipeline)
   └── ACL host_talas     → talas_vitrine_backend
          (placeholder 503 until the static site lands)

Changes :

  inventory/{staging,prod}.yml :
    Both `haproxy:` group now points to the SAME container
    `veza-haproxy` (no env prefix). Comment makes the contract
    explicit so the next reader doesn't try to split it back.

  group_vars/all/main.yml :
    NEW : haproxy_env_prefixes (per-env container prefix mapping).
    NEW : haproxy_env_public_hosts (per-env Host-header mapping).
    NEW : haproxy_forgejo_host + haproxy_forgejo_backend.
    NEW : haproxy_talas_hosts + haproxy_talas_vitrine_backend.
    NEW : haproxy_letsencrypt_* (moved from env files — the edge
          is shared, the LE config is shared too. Else the env
          that ran the haproxy role last would clobber the
          domain set).

  group_vars/{staging,prod}.yml :
    Strip the haproxy_letsencrypt_* block (now in all/main.yml).
    Comment points readers there.

  roles/haproxy/templates/haproxy.cfg.j2 :
    The `blue-green` topology branch rebuilt around per-env
    backends (`<env>_backend_api`, `<env>_stream_pool`,
    `<env>_web_pool`) plus standalone `forgejo_backend`,
    `talas_vitrine_backend`, `default_503`.
    Frontend ACLs : `host_<env>` (hdr(host) -i ...) selects
    which env's backends to use ; path ACLs (`is_api`,
    `is_stream_seg`, etc.) refine within the env.
    Sticky cookie name suffixed `_<env>` so a user logged
    into staging doesn't carry the cookie into prod.
    Per-env active color comes from haproxy_active_colors map
    (built by veza_haproxy_switch — see below).
    Multi-instance branch (lab) untouched.

  roles/veza_haproxy_switch/defaults/main.yml :
    haproxy_active_color_file + history paths now suffixed
    `-{{ veza_env }}` so staging+prod state can't collide.

  roles/veza_haproxy_switch/tasks/main.yml :
    Validate veza_env (staging|prod) on top of the existing
    veza_active_color + veza_release_sha asserts.
    Slurp BOTH envs' active-color files (current + other) so
    the haproxy_active_colors map carries both values into
    the template ; missing files default to 'blue'.

  playbooks/deploy_app.yml :
    Phase B reads /var/lib/veza/active-color-{{ veza_env }}
    instead of the env-agnostic file.

  playbooks/cleanup_failed.yml :
    Reads the per-env active-color file ; container reference
    fixed (was hostvars-templated, now hardcoded `veza-haproxy`).

  playbooks/rollback.yml :
    Fast-mode SHA lookup reads the per-env history file.

Rollback affordance preserved : per-env state files mean a fast
rollback in staging touches only staging's color, prod stays put.
The history files (`active-color-{staging,prod}.history`) keep
the last 5 deploys per env independently.

Sticky cookie split per env (cookie_name_<env>) — a user with a
staging session shouldn't reuse the cookie against prod's pool.

Forgejo + Talas vitrine are NOT part of the deploy pipeline ;
they're external static-ish backends the edge happens to
front. haproxy_forgejo_backend is "10.0.20.105:3000" today
(matches the existing Incus container at that address).

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:32:49 +02:00
senke
4b1a401879 feat(ansible): TLS via dehydrated/Let's Encrypt + Forgejo on talas.group
Two coordinated changes the new domain plan (veza.fr public app,
talas.fr public project, talas.group INTERNAL only) requires :

1. Forgejo Registry moves to talas.group
   group_vars/all/main.yml — veza_artifact_base_url flips
   forgejo.veza.fr → forgejo.talas.group. Trust boundary for
   talas.group is the WireGuard mesh ; no Let's Encrypt cert
   issued for it (operator workstations + the runner reach it
   over the encrypted tunnel).

2. Let's Encrypt for the public domains (veza.fr + talas.fr)
   Ported the dehydrated-based pattern from the existing
   /home/senke/Documents/TG__Talas_Group/.../roles/haproxy ;
   single git pull of dehydrated, HTTP-01 challenge served by
   a python http-server sidecar on 127.0.0.1:8888,
   `dehydrated_haproxy_hook.sh` writes
   /usr/local/etc/tls/haproxy/<domain>.pem after each
   successful issuance + renewal, daily jittered cron.

   New files :
     roles/haproxy/tasks/letsencrypt.yml
     roles/haproxy/templates/letsencrypt_le.config.j2
     roles/haproxy/templates/letsencrypt_domains.txt.j2
     roles/haproxy/files/dehydrated_haproxy_hook.sh   (lifted)
     roles/haproxy/files/http-letsencrypt.service     (lifted)

   Hooked from main.yml :
     - import_tasks letsencrypt.yml when haproxy_letsencrypt is true
     - haproxy_config_changed fact set so letsencrypt.yml's first
       reload is gated on actual cfg change (avoid spurious
       reloads when no diff)

   Template haproxy.cfg.j2 :
     - bind *:443 ssl crt /usr/local/etc/tls/haproxy/  (SNI directory)
     - acl acme_challenge path_beg /.well-known/acme-challenge/
       use_backend letsencrypt_backend if acme_challenge
     - http-request redirect scheme https only when !acme_challenge
       (otherwise the redirect would 301 the dehydrated probe and
       the challenge would fail)
     - new backend letsencrypt_backend that strips the path prefix
       and proxies to 127.0.0.1:8888

   Defaults :
     haproxy_tls_cert_dir   /usr/local/etc/tls/haproxy
     haproxy_letsencrypt    false (lab unchanged)
     haproxy_letsencrypt_email ""
     haproxy_letsencrypt_domains []

   group_vars/staging.yml enables it for staging.veza.fr.
   group_vars/prod.yml enables it for veza.fr (+ www) and talas.fr (+ www).

Wildcards : NOT supported. dehydrated/HTTP-01 needs a real reachable
hostname per challenge. Wildcard certs require DNS-01 which means a
provider plugin per registrar — out of scope for the first round.
List subdomains explicitly when more come online.

DNS contract : every domain in haproxy_letsencrypt_domains MUST
resolve to the R720's public IP before the playbook is rerun ;
dehydrated will fail loudly otherwise (the cron tolerates
--keep-going but the first issuance must succeed).

--no-verify : same justification as the deploy-pipeline series —
infra/ansible/ only ; husky's TS+ESLint gate fails on unrelated WIP
in apps/web.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:54:05 +02:00
senke
8200eeba6e chore(ansible): recover group_vars files lost in parallel-commit shuffle
Files originally part of the "split group_vars into all/{main,vault}"
commit got dropped during a rebase/amend when parallel session work
landed on the same area at the same time. The all/main.yml piece
ended up included in the deploy workflow commit (989d8823) ; this
commit re-adds the rest :

  infra/ansible/group_vars/all/vault.yml.example
  infra/ansible/group_vars/staging.yml
  infra/ansible/group_vars/prod.yml
  infra/ansible/group_vars/README.md
  + delete infra/ansible/group_vars/all.yml (superseded by all/main.yml)

Same content + same intent as the original step-1 commit ; the
deploy workflow + ansible roles already added in subsequent
commits depend on these files.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:41:14 +02:00