veza/infra/ansible/roles/veza_haproxy_switch/README.md
senke 4acbcc170a feat(ansible): roles/veza_haproxy_switch — atomic blue/green switch
Per-deploy delta on top of roles/haproxy: re-template the cfg
referencing the freshly-deployed color, validate, atomic-swap, HUP.
Runs once at the end of every successful deploy after veza_app has
landed and health-probed all three components in the inactive color.

Layout:
  defaults/main.yml — paths (haproxy.cfg + .new + .bak), state dir
                      (/var/lib/veza/active-color + history), keep
                      window (5 deploys for instant rollback).
  tasks/main.yml    — input validation, prior color readout,
                      block(backup → render → mv → HUP) /
                      rescue(restore → HUP-back), persist new color
                      + history line, prune history.
  handlers/main.yml — Reload haproxy listen handler.
  meta/main.yml     — Debian 13, no role deps.
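
A plausible shape for defaults/main.yml given the layout above (variable
names and exact values are illustrative, not necessarily the role's
actual ones):

```yaml
# Illustrative defaults: names are assumptions, not the role's real vars.
veza_haproxy_cfg: /etc/haproxy/haproxy.cfg
veza_haproxy_cfg_new: "{{ veza_haproxy_cfg }}.new"
veza_haproxy_cfg_bak: "{{ veza_haproxy_cfg }}.bak"
veza_state_dir: /var/lib/veza
veza_active_color_file: "{{ veza_state_dir }}/active-color"
veza_active_color_history: "{{ veza_active_color_file }}.history"
veza_color_keep_window: 5   # deploys retained for instant rollback
```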

Why a separate role from `roles/haproxy`?
  * `roles/haproxy` is the *bootstrap*: install package, lay down
    the initial config, enable systemd. Run once per env when the
    HAProxy container is first created (or when the global config
    shape changes).
  * `roles/veza_haproxy_switch` is the *per-deploy delta*. No apt,
    no service-create — just template + validate + swap + HUP.
    Keeps the per-deploy path narrow.

Rescue semantics:
  * Capture haproxy.cfg → haproxy.cfg.bak as the FIRST action in
    the block, so the rescue branch always has something to
    restore.
  * Render new cfg with `validate: "haproxy -f %s -c -q"` — Ansible
    refuses to write the file at all if haproxy doesn't accept it.
    A typoed template never reaches even haproxy.cfg.new.
  * mv .new → main is the atomic point; before this, the prior
    config is intact; after this, the new config is in place.
  * HUP via systemctl reload — graceful, drains old workers.
  * On ANY failure in the four-step block, rescue restores from
    .bak and HUPs back. HAProxy ends the deploy serving exactly
    what it served at the start.

State file:
  /var/lib/veza/active-color           one-liner with current color
  /var/lib/veza/active-color.history   last 5 deploys, newest first

The history file is what the rollback playbook reads to do an
instant point-in-time switch (no artefact re-fetch) when the prior
color's containers are still alive.
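
The persist-and-prune step could look like this (a sketch only; the
history line format and exact task shapes are assumptions):

```yaml
# Sketch: history line format and paths are assumptions.
- name: Persist new active color (one-liner state file)
  ansible.builtin.copy:
    content: "{{ veza_active_color }}\n"
    dest: /var/lib/veza/active-color

- name: Prepend history line (newest first) and prune to 5 entries
  ansible.builtin.shell: |
    hist=/var/lib/veza/active-color.history
    { echo "$(date -u +%FT%TZ) {{ veza_active_color }} {{ veza_release_sha }}"
      cat "$hist" 2>/dev/null
    } > "$hist.new"
    head -n 5 "$hist.new" > "$hist.tmp"
    mv "$hist.tmp" "$hist" && rm -f "$hist.new"
```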

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:20:04 +02:00

# `veza_haproxy_switch` role
Atomically swap HAProxy's active color. Runs against the
`{{ veza_container_prefix }}haproxy` container after `veza_app` has
recreated + health-probed all three components in the inactive color.
## Why a separate role from `haproxy`?
- `roles/haproxy` provisions a fresh HAProxy container — install
the package, lay down the *initial* config, enable the systemd
unit. It runs once when the staging/prod env is bootstrapped and
occasionally when the global config shape changes.
- `roles/veza_haproxy_switch` performs the *per-deploy* delta —
re-template the cfg with a new `veza_active_color`, validate,
swap, HUP. It runs once at the end of every successful deploy.

Splitting them keeps the per-deploy path narrow (no apt, no service
install) and lets `roles/haproxy` remain idempotent when the global
shape hasn't changed.
## Inputs
| variable | required | meaning |
| ----------------------- | -------- | -------------------------------------------------------------------- |
| `veza_active_color` | yes | Color to switch TO (`blue` or `green`). Becomes the new active. |
| `veza_release_sha` | yes | SHA being deployed. Logged in the active-color history file. |
| `veza_container_prefix` | inherit | From group_vars/<env>.yml. |
| `haproxy_topology` | inherit | Should be `blue-green` for this role to make sense. |
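
A minimal invocation from the deploy play might look like this
(illustrative only; the host pattern and `deploy_sha` are stand-ins,
not names the repo is known to use):

```yaml
# Illustrative play snippet: host pattern and var sources are assumptions.
- hosts: haproxy
  roles:
    - role: veza_haproxy_switch
      vars:
        veza_active_color: green              # color to switch TO
        veza_release_sha: "{{ deploy_sha }}"  # SHA logged in the history file
```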
## Failure semantics
The render → validate → atomic-swap → HUP sequence runs in an
Ansible `block:` with a `rescue:` that restores `haproxy.cfg.bak`
(captured before the swap) and re-HUPs. So an invalid config or a
HUP failure leaves HAProxy serving the *previous* active color
exactly as before — the deploy as a whole then fails at the
playbook level.
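
As a sketch (task names, module choices, and paths are assumptions;
the real `tasks/main.yml` is authoritative), the block/rescue shape is:

```yaml
# Sketch of the block/rescue shape; paths and names are assumptions.
- name: Switch active color (all-or-nothing)
  block:
    - name: Back up current config first, so rescue always has a restore point
      ansible.builtin.copy:
        src: /etc/haproxy/haproxy.cfg
        dest: /etc/haproxy/haproxy.cfg.bak
        remote_src: true
    - name: Render new config; Ansible refuses to write it if validation fails
      ansible.builtin.template:
        src: haproxy.cfg.j2
        dest: /etc/haproxy/haproxy.cfg.new
        validate: "haproxy -f %s -c -q"
    - name: Atomic point; prior config intact before, new config in place after
      ansible.builtin.command: mv /etc/haproxy/haproxy.cfg.new /etc/haproxy/haproxy.cfg
    - name: Graceful HUP; drains old workers
      ansible.builtin.systemd:
        name: haproxy
        state: reloaded
  rescue:
    - name: Restore the previous config
      ansible.builtin.command: mv /etc/haproxy/haproxy.cfg.bak /etc/haproxy/haproxy.cfg
    - name: HUP back to the previous active color
      ansible.builtin.systemd:
        name: haproxy
        state: reloaded
    - name: Re-raise so the deploy fails at the playbook level
      ansible.builtin.fail:
        msg: "haproxy color switch failed; previous config restored"
```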
## What the role does NOT do
- It does not destroy or recreate the HAProxy container. That's a
one-time operation under `roles/haproxy`.
- It does not touch app containers — by the time this role runs,
blue/green app containers are both healthy.
- It does not remove the previously-active color's containers. They
survive (intentional) so a rollback can flip back instantly. The
next deploy naturally recycles them.