veza/infra/ansible/roles/veza_haproxy_switch/tasks/main.yml
senke 5153ab113d refactor(ansible): single edge HAProxy — multi-env + Forgejo + Talas
The 12-record DNS plan ($1 per record at the registrar, but only one
public R720 IP) forces the obvious: a single HAProxy on :443 must
serve staging.veza.fr + veza.fr + www.veza.fr + talas.fr +
www.talas.fr + forgejo.talas.group all at once. Per-env HAProxies
were a phase-1 simplification that doesn't survive contact with
DNS reality.

Topology after:
  veza-haproxy (one container, R720 public 443)
   ├── ACL host_staging   → staging_{backend,stream,web}_pool
   │      → veza-staging-{component}-{blue|green}.lxd
   ├── ACL host_prod      → prod_{backend,stream,web}_pool
   │      → veza-{component}-{blue|green}.lxd
   ├── ACL host_forgejo   → forgejo_backend → 10.0.20.105:3000
   │      (Forgejo container managed outside the deploy pipeline)
   └── ACL host_talas     → talas_vitrine_backend
          (placeholder 503 until the static site lands)
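Rendered into haproxy.cfg terms, the routing above comes out roughly
like this (a sketch only; the bind/cert line, the port, and the path
ACL are illustrative, not the template's exact output):

    frontend fe_edge
        bind :443 ssl crt /etc/haproxy/certs/
        acl host_staging hdr(host) -i staging.veza.fr
        acl host_prod    hdr(host) -i veza.fr www.veza.fr
        acl host_forgejo hdr(host) -i forgejo.talas.group
        acl host_talas   hdr(host) -i talas.fr www.talas.fr
        acl is_api       path_beg /api
        use_backend forgejo_backend       if host_forgejo
        use_backend talas_vitrine_backend if host_talas
        use_backend staging_backend_api   if host_staging is_api
        use_backend prod_backend_api      if host_prod is_api
        default_backend default_503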

Changes:

  inventory/{staging,prod}.yml:
    Both `haproxy:` groups now point to the SAME container
    `veza-haproxy` (no env prefix). A comment makes the contract
    explicit so the next reader doesn't try to split it back.

  group_vars/all/main.yml:
    NEW: haproxy_env_prefixes (per-env container prefix mapping).
    NEW: haproxy_env_public_hosts (per-env Host-header mapping).
    NEW: haproxy_forgejo_host + haproxy_forgejo_backend.
    NEW: haproxy_talas_hosts + haproxy_talas_vitrine_backend.
    NEW: haproxy_letsencrypt_* (moved from env files — the edge
         is shared, so the LE config is shared too. Otherwise the
         env that ran the haproxy role last would clobber the
         domain set).

  group_vars/{staging,prod}.yml:
    Strip the haproxy_letsencrypt_* block (now in all/main.yml).
    A comment points readers there.

  roles/haproxy/templates/haproxy.cfg.j2:
    The `blue-green` topology branch is rebuilt around per-env
    backends (`<env>_backend_api`, `<env>_stream_pool`,
    `<env>_web_pool`) plus standalone `forgejo_backend`,
    `talas_vitrine_backend`, and `default_503`.
    Frontend ACLs: `host_<env>` (hdr(host) -i ...) selects
    which env's backends to use; path ACLs (`is_api`,
    `is_stream_seg`, etc.) refine within the env.
    The sticky cookie name is suffixed `_<env>` so a user logged
    into staging doesn't carry the cookie into prod.
    Per-env active color comes from the haproxy_active_colors map
    (built by veza_haproxy_switch — see below).
    The multi-instance branch (lab) is untouched.
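A per-env backend in the rebuilt template then reads roughly like this
(a sketch under assumptions: the port, the exact use of
haproxy_env_prefixes, and the cookie name are illustrative; the
haproxy_active_colors map is the one veza_haproxy_switch builds):

    {% for env in ['staging', 'prod'] %}
    backend {{ env }}_backend_api
        cookie veza_srv_{{ env }} insert indirect nocache
    {% for color in ['blue', 'green'] %}
        server {{ color }} {{ haproxy_env_prefixes[env] }}backend-{{ color }}.lxd:8080 cookie {{ color }} check{{ '' if color == haproxy_active_colors[env] else ' backup' }}
    {% endfor %}
    {% endfor %}

The inactive color carries `backup`, so traffic only shifts when the
map flips — the same mechanism the single-env template already used.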

  roles/veza_haproxy_switch/defaults/main.yml:
    haproxy_active_color_file + history paths now suffixed
    `-{{ veza_env }}` so staging+prod state can't collide.
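Concretely, the suffixed defaults come out roughly like this (a sketch;
only the active-color-<env> naming and the keep limit of 5 are fixed by
the playbooks, the rest is illustrative):

    haproxy_active_color_file: "/var/lib/veza/active-color-{{ veza_env }}"
    haproxy_active_color_history: "/var/lib/veza/active-color-{{ veza_env }}.history"
    haproxy_active_color_history_keep: 5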

  roles/veza_haproxy_switch/tasks/main.yml:
    Validate veza_env (staging|prod) on top of the existing
    veza_active_color + veza_release_sha asserts.
    Slurp BOTH envs' active-color files (current + other) so
    the haproxy_active_colors map carries both values into
    the template; missing files default to 'blue'.

  playbooks/deploy_app.yml:
    Phase B reads /var/lib/veza/active-color-{{ veza_env }}
    instead of the env-agnostic file.

  playbooks/cleanup_failed.yml:
    Reads the per-env active-color file; the container reference
    is fixed (was hostvars-templated, now hardcoded `veza-haproxy`).

  playbooks/rollback.yml:
    Fast-mode SHA lookup reads the per-env history file.

Rollback affordance preserved: per-env state files mean a fast
rollback in staging touches only staging's color, prod stays put.
The history files (`active-color-{staging,prod}.history`) keep
the last 5 deploys per env independently.

Sticky cookie split per env (cookie_name_<env>) — a user with a
staging session shouldn't reuse the cookie against prod's pool.

Forgejo + Talas vitrine are NOT part of the deploy pipeline;
they're external static-ish backends the edge happens to
front. haproxy_forgejo_backend is "10.0.20.105:3000" today
(matches the existing Incus container at that address).

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:32:49 +02:00


# Atomic blue/green switch. The HAProxy template lives in
# roles/haproxy/templates/haproxy.cfg.j2 — it reads veza_active_color
# to render the right `backup` directives. We re-template, validate,
# atomic-swap, HUP.
#
# Block/rescue: any failure in the four-step sequence restores
# haproxy.cfg from the backup we capture before touching anything.
# That way, an invalid template or a HUP error never leaves HAProxy
# serving from a stale or broken cfg — it stays on whatever was
# active when the role started.
---
- name: Validate inputs
  ansible.builtin.assert:
    that:
      - veza_active_color in ['blue', 'green']
      - veza_release_sha | length == 40
      - veza_env in ['staging', 'prod']
    fail_msg: >-
      veza_haproxy_switch role requires veza_active_color (blue|green),
      veza_release_sha (40-char git SHA), and veza_env (staging|prod).
      Got: color={{ veza_active_color }} sha={{ veza_release_sha }}
      env={{ veza_env | default('UNSET') }}.
    quiet: true
  tags: [veza_haproxy_switch, always]

- name: Ensure veza state dir exists in HAProxy container
  ansible.builtin.file:
    path: "{{ haproxy_state_dir }}"
    state: directory
    owner: root
    group: root
    mode: "0755"
  tags: [veza_haproxy_switch]

- name: Read currently-active color for THIS env (if any)
  ansible.builtin.slurp:
    src: "{{ haproxy_active_color_file }}"
  register: prior_color_raw
  failed_when: false
  changed_when: false
  tags: [veza_haproxy_switch]

- name: Resolve prior_active_color (default blue if no history)
  ansible.builtin.set_fact:
    prior_active_color: >-
      {{ (prior_color_raw.content | b64decode | trim)
         if prior_color_raw.content is defined else 'blue' }}
  tags: [veza_haproxy_switch]

# Read the OTHER env's active color too — the haproxy template renders
# both staging+prod simultaneously, so we need both values in scope.
- name: Read OTHER env's active color
  ansible.builtin.slurp:
    src: "/var/lib/veza/active-color-{{ 'prod' if veza_env == 'staging' else 'staging' }}"
  register: other_color_raw
  failed_when: false
  changed_when: false
  tags: [veza_haproxy_switch]

- name: Build haproxy_active_colors map (current state of every env)
  ansible.builtin.set_fact:
    haproxy_active_colors:
      staging: >-
        {%- if veza_env == 'staging' -%}
        {{ veza_active_color }}
        {%- elif other_color_raw.content is defined -%}
        {{ other_color_raw.content | b64decode | trim }}
        {%- else -%}
        blue
        {%- endif -%}
      prod: >-
        {%- if veza_env == 'prod' -%}
        {{ veza_active_color }}
        {%- elif other_color_raw.content is defined -%}
        {{ other_color_raw.content | b64decode | trim }}
        {%- else -%}
        blue
        {%- endif -%}
  tags: [veza_haproxy_switch]

- name: Switch sequence (block/rescue — restores cfg on any failure)
  block:
    - name: Backup current haproxy.cfg
      ansible.builtin.copy:
        src: "{{ haproxy_cfg_path }}"
        dest: "{{ haproxy_cfg_backup_path }}"
        remote_src: true
        mode: "0640"
      tags: [veza_haproxy_switch]

    - name: Render fresh haproxy.cfg with new active_color
      ansible.builtin.template:
        src: "{{ playbook_dir }}/../roles/haproxy/templates/haproxy.cfg.j2"
        dest: "{{ haproxy_cfg_new_path }}"
        owner: root
        group: haproxy
        mode: "0640"
        validate: "haproxy -f %s -c -q"
      vars:
        # Make absolutely sure the template sees the new color we are
        # switching to — set both names because the older template
        # used `veza_active_color` and a future revision might use
        # `haproxy_active_color`.
        haproxy_active_color: "{{ veza_active_color }}"
      tags: [veza_haproxy_switch]

    - name: Atomic swap — mv haproxy.cfg.new → haproxy.cfg
      ansible.builtin.command: mv -f "{{ haproxy_cfg_new_path }}" "{{ haproxy_cfg_path }}"
      changed_when: true
      tags: [veza_haproxy_switch]

    - name: HUP haproxy (graceful reload, no connection drop)
      ansible.builtin.systemd:
        name: haproxy
        state: reloaded
      tags: [veza_haproxy_switch]

  rescue:
    - name: Restore haproxy.cfg from backup
      ansible.builtin.command: mv -f "{{ haproxy_cfg_backup_path }}" "{{ haproxy_cfg_path }}"
      # Always attempt the restore; if the backup was never written,
      # the mv fails harmlessly and the rescue path keeps moving.
      failed_when: false
      changed_when: true
      tags: [veza_haproxy_switch]

    - name: HUP haproxy back to the prior config
      ansible.builtin.systemd:
        name: haproxy
        state: reloaded
      failed_when: false
      tags: [veza_haproxy_switch]

    - name: Report the failure
      ansible.builtin.fail:
        msg: >-
          HAProxy switch to color {{ veza_active_color }} (sha
          {{ veza_release_sha[:12] }}) failed — config rolled back
          to the prior state. HAProxy continues serving from
          {{ prior_active_color }}. Inspect the validate step's
          stderr in the playbook output above.

# Success path: persist new active color + history.
- name: Write new active color
  ansible.builtin.copy:
    dest: "{{ haproxy_active_color_file }}"
    content: "{{ veza_active_color }}\n"
    owner: root
    group: root
    mode: "0644"
  tags: [veza_haproxy_switch]

- name: Append to active-color history
  ansible.builtin.lineinfile:
    path: "{{ haproxy_active_color_history }}"
    line: "{{ ansible_date_time.iso8601 }} sha={{ veza_release_sha }} color={{ veza_active_color }} prior={{ prior_active_color }}"
    create: true
    insertbefore: BOF
    mode: "0644"
  tags: [veza_haproxy_switch]

- name: Prune history beyond keep limit
  ansible.builtin.shell: |
    set -e
    if [ -f "{{ haproxy_active_color_history }}" ]; then
      head -n {{ haproxy_active_color_history_keep }} "{{ haproxy_active_color_history }}" > "{{ haproxy_active_color_history }}.tmp"
      mv -f "{{ haproxy_active_color_history }}.tmp" "{{ haproxy_active_color_history }}"
    fi
  args:
    executable: /bin/bash
  changed_when: false
  tags: [veza_haproxy_switch]

- name: Drop the now-stale backup
  ansible.builtin.file:
    path: "{{ haproxy_cfg_backup_path }}"
    state: absent
  tags: [veza_haproxy_switch]