The 12-record DNS plan ($1 per record at the registrar) plus a single
public R720 IP forces the obvious : one HAProxy on :443 must
serve staging.veza.fr + veza.fr + www.veza.fr + talas.fr +
www.talas.fr + forgejo.talas.group all at once. Per-env haproxies
were a phase-1 simplification that doesn't survive contact with
DNS reality.
Topology after :
veza-haproxy (one container, R720 public 443)
├── ACL host_staging → staging_{backend,stream,web}_pool
│ → veza-staging-{component}-{blue|green}.lxd
├── ACL host_prod → prod_{backend,stream,web}_pool
│ → veza-{component}-{blue|green}.lxd
├── ACL host_forgejo → forgejo_backend → 10.0.20.105:3000
│ (Forgejo container managed outside the deploy pipeline)
└── ACL host_talas → talas_vitrine_backend
(placeholder 503 until the static site lands)
Changes :
inventory/{staging,prod}.yml :
The `haproxy:` group in both files now points to the SAME
container `veza-haproxy` (no env prefix). A comment makes the
contract explicit so the next reader doesn't try to split it back.
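A sketch of that contract (comment wording illustrative) :

```yaml
# inventory/staging.yml (identical stanza in inventory/prod.yml)
haproxy:
  hosts:
    veza-haproxy:   # shared edge : ONE container fronts staging, prod, forgejo and talas ; do not split per env
```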
group_vars/all/main.yml :
NEW : haproxy_env_prefixes (per-env container prefix mapping).
NEW : haproxy_env_public_hosts (per-env Host-header mapping).
NEW : haproxy_forgejo_host + haproxy_forgejo_backend.
NEW : haproxy_talas_hosts + haproxy_talas_vitrine_backend.
NEW : haproxy_letsencrypt_* (moved from the env files ; the edge
is shared, so the LE config is shared too. Otherwise the env
that ran the haproxy role last would clobber the
domain set).
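A sketch of the new block (key names from the list above ; values
illustrative, derived from the hosts and container names this note
already states) :

```yaml
# group_vars/all/main.yml : shared-edge additions (sketch)
haproxy_env_prefixes:          # container naming per env
  staging: veza-staging        # veza-staging-<component>-<blue|green>.lxd
  prod: veza                   # veza-<component>-<blue|green>.lxd
haproxy_env_public_hosts:      # Host-header routing per env
  staging: [staging.veza.fr]
  prod: [veza.fr, www.veza.fr]
haproxy_forgejo_host: forgejo.talas.group
haproxy_forgejo_backend: "10.0.20.105:3000"
haproxy_talas_hosts: [talas.fr, www.talas.fr]
# haproxy_letsencrypt_* : moved here unchanged from the env files
```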
group_vars/{staging,prod}.yml :
Strip the haproxy_letsencrypt_* block (now in all/main.yml).
Comment points readers there.
roles/haproxy/templates/haproxy.cfg.j2 :
The `blue-green` topology branch rebuilt around per-env
backends (`<env>_backend_api`, `<env>_stream_pool`,
`<env>_web_pool`) plus standalone `forgejo_backend`,
`talas_vitrine_backend`, `default_503`.
Frontend ACLs : `host_<env>` (hdr(host) -i ...) selects
which env's backends to use ; path ACLs (`is_api`,
`is_stream_seg`, etc.) refine within the env.
Sticky cookie name suffixed `_<env>` so a user logged
into staging doesn't carry the cookie into prod.
Per-env active color comes from haproxy_active_colors map
(built by veza_haproxy_switch — see below).
Multi-instance branch (lab) untouched.
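A trimmed sketch of that blue-green branch (frontend name, loop
shape, server names and the /api/ path are illustrative ; ACL and
backend names as listed above, `is_stream_seg` and friends omitted
for brevity) :

```
# haproxy.cfg.j2, blue-green branch (sketch)
frontend fe_https
    bind :443 ssl crt {{ haproxy_tls_cert_path }}
{% for env, hosts in haproxy_env_public_hosts.items() %}
    acl host_{{ env }} hdr(host) -i {{ hosts | join(' ') }}
{% endfor %}
    acl host_forgejo hdr(host) -i {{ haproxy_forgejo_host }}
    acl host_talas hdr(host) -i {{ haproxy_talas_hosts | join(' ') }}
    acl is_api path_beg /api/
    use_backend forgejo_backend if host_forgejo
    use_backend talas_vitrine_backend if host_talas
{% for env in haproxy_env_public_hosts %}
    use_backend {{ env }}_backend_api if host_{{ env }} is_api
    use_backend {{ env }}_web_pool if host_{{ env }}
{% endfor %}
    default_backend default_503

{% for env in haproxy_env_public_hosts %}
backend {{ env }}_backend_api
    balance roundrobin
    cookie {{ haproxy_sticky_cookie_name }}_{{ env }} insert indirect
    server api1 {{ haproxy_env_prefixes[env] }}-backend-{{ haproxy_active_colors[env] }}.lxd:{{ haproxy_backend_api_port }} check cookie api1
{% endfor %}
```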
roles/veza_haproxy_switch/defaults/main.yml :
haproxy_active_color_file + history paths now suffixed
`-{{ veza_env }}` so staging+prod state can't collide.
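In defaults form (history variable name assumed ; directory taken
from the playbook paths below) :

```yaml
# roles/veza_haproxy_switch/defaults/main.yml (sketch)
haproxy_active_color_file: "/var/lib/veza/active-color-{{ veza_env }}"
haproxy_active_color_history: "/var/lib/veza/active-color-{{ veza_env }}.history"
```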
roles/veza_haproxy_switch/tasks/main.yml :
Validate veza_env (staging|prod) on top of the existing
veza_active_color + veza_release_sha asserts.
Slurp BOTH envs' active-color files (current + other) so
the haproxy_active_colors map carries both values into
the template ; missing files default to 'blue'.
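A sketch of that validate + dual-slurp sequence (task shapes and
names assumed ; the modules are stock Ansible) :

```yaml
- name: Validate veza_env on top of the existing asserts
  ansible.builtin.assert:
    that: veza_env in ['staging', 'prod']
    fail_msg: "veza_env must be staging or prod, got '{{ veza_env | default('unset') }}'"

- name: Slurp both envs' active-color files (a missing file is handled below)
  ansible.builtin.slurp:
    src: "/var/lib/veza/active-color-{{ item }}"
  loop: [staging, prod]
  register: color_files
  failed_when: false

- name: Build haproxy_active_colors ; a missing file defaults to 'blue'
  ansible.builtin.set_fact:
    haproxy_active_colors: >-
      {{ (haproxy_active_colors | default({}))
         | combine({ item.item:
             (item.content | b64decode | trim) if item.content is defined else 'blue' }) }}
  loop: "{{ color_files.results }}"
```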
playbooks/deploy_app.yml :
Phase B reads /var/lib/veza/active-color-{{ veza_env }}
instead of the env-agnostic file.
playbooks/cleanup_failed.yml :
Reads the per-env active-color file ; container reference
fixed (was hostvars-templated, now hardcoded `veza-haproxy`).
playbooks/rollback.yml :
Fast-mode SHA lookup reads the per-env history file.
Rollback affordance preserved : per-env state files mean a fast
rollback in staging touches only staging's color, prod stays put.
The history files (`active-color-{staging,prod}.history`) keep
the last 5 deploys per env independently.
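On disk (directory from the playbook paths above ; layout illustrative) :

```
/var/lib/veza/active-color-staging            # current color, staging
/var/lib/veza/active-color-staging.history    # last 5 staging deploys
/var/lib/veza/active-color-prod               # current color, prod
/var/lib/veza/active-color-prod.history       # last 5 prod deploys
```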
Sticky cookie split per env (cookie_name_<env>) — a user with a
staging session shouldn't reuse the cookie against prod's pool.
Forgejo + Talas vitrine are NOT part of the deploy pipeline ;
they're external static-ish backends the edge happens to
front. haproxy_forgejo_backend is "10.0.20.105:3000" today
(matches the existing Incus container at that address).
--no-verify justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
haproxy role — TLS termination + sticky-WS load balancer
Single Incus container in front of the active/active backend-api fleet and the stream-server fleet. v1.0.9 W4 Day 19 — phase-1 of the HA story (single-host LB ; phase-2 adds keepalived for an LB pair).
Topology
:80 / :443
│
┌──────▼─────────┐
│ haproxy.lxd │ (this role)
│ HTTP + WS │
│ TLS terminate │
│ sticky cookie │
└─┬───────┬──────┘
│ │
┌──────────┘ └──────────┐
▼ ▼
┌──────────────┐ ┌──────────────┐
│ api_pool │ │ stream_pool │
│ ───────── │ │ ───────── │
│ backend-api-1│ │ stream-srv-1 │
│ backend-api-2│ │ stream-srv-2 │
│ (port 8080) │ │ (port 8082) │
│ Round-robin │ │ URI-hash │
│ Sticky cookie│ │ (track_id) │
└──────────────┘ └──────────────┘
Why these balance modes
- api_pool :
  `balance roundrobin` + `cookie SERVERID insert indirect`. The Go API is stateless (sessions live in Redis), so any backend can serve any request. The cookie keeps a logged-in user pinned to one backend for the session, which makes WebSocket upgrades land on the same instance that authenticated the user, avoiding a Redis round-trip on every WS hello.
- stream_pool :
  `balance uri whole` + `hash-type consistent`. The Rust streamer keeps a hot HLS-segment cache in process. URI-hash routes the same track_id to the same node ; consistent hashing means adding or removing a node only displaces ~1/N of the keys, not the entire pool.
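As config, the two modes could look like this (server names and
addresses illustrative ; the directives are standard HAProxy) :

```
backend api_pool
    balance roundrobin
    cookie VEZA_SERVERID insert indirect
    server backend-api-1 backend-api-1.lxd:8080 check cookie backend-api-1
    server backend-api-2 backend-api-2.lxd:8080 check cookie backend-api-2

backend stream_pool
    balance uri whole
    hash-type consistent
    server stream-srv-1 stream-srv-1.lxd:8082 check
    server stream-srv-2 stream-srv-2.lxd:8082 check
```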
Failover behaviour
- Health check :
  `GET /api/v1/health` (or `/health` for stream) every `haproxy_health_check_interval_ms` ms (default 5 s). 3 consecutive failures = down ; 2 consecutive successes = back up.
- `on-marked-down shutdown-sessions` : when a backend drops, all its in-flight TCP/WS sessions are cut. Clients reconnect ; if the cookie targets the dead backend, HAProxy ignores the dead pin and re-balances. WebSocket clients on the frontend (chat, presence) MUST handle the close + reconnect ; that's already wired in `apps/web/src/features/chat/services/websocket.ts`.
- `slowstart {{ haproxy_graceful_drain_seconds }}s` : when a backend recovers, its weight ramps up linearly over 30 s instead of taking a full third of the traffic on the first scrape. Smoothes the post-restart latency spike.
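Those knobs map onto the backend roughly like this (a sketch built
from the defaults in the table below) :

```
backend api_pool
    option httpchk GET /api/v1/health
    default-server check inter 5000ms fall 3 rise 2 slowstart 30s on-marked-down shutdown-sessions
```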
Defaults
| variable | default | meaning |
|---|---|---|
| `haproxy_listen_http` | `80` | HTTP listener |
| `haproxy_listen_https` | `443` | HTTPS listener (only bound when a cert is set) |
| `haproxy_tls_cert_path` | `""` | path to PEM (cert+key concatenated). Empty = HTTP only. |
| `haproxy_backend_api_port` | `8080` | upstream port for backend-api |
| `haproxy_stream_server_port` | `8082` | upstream port for stream-server |
| `haproxy_health_check_interval_ms` | `5000` | active-check cadence |
| `haproxy_health_check_fall` | `3` | failed checks before "down" |
| `haproxy_health_check_rise` | `2` | successful checks before "up" |
| `haproxy_graceful_drain_seconds` | `30` | post-recovery weight ramp-up |
| `haproxy_sticky_cookie_name` | `VEZA_SERVERID` | cookie name for backend stickiness |
Operations
# Health view (admin socket, loopback only) :
sudo socat /run/haproxy/admin.sock - <<< "show servers state"
sudo socat /run/haproxy/admin.sock - <<< "show stat"
# Disable a server gracefully (drains existing connections,
# new requests skip it ; useful before a planned restart) :
echo "set server api_pool/backend-api-1 state drain" | sudo socat /run/haproxy/admin.sock -
# ...wait haproxy_graceful_drain_seconds, then on the backend host :
# sudo systemctl restart veza-backend-api
echo "set server api_pool/backend-api-1 state ready" | sudo socat /run/haproxy/admin.sock -
# Stats UI for a human (browser only ; bound to localhost) :
ssh -L 9100:localhost:9100 haproxy.lxd
# then open http://localhost:9100/stats
# Live log tail (HAProxy logs to journald via /dev/log) :
sudo journalctl -u haproxy -f
Failover smoke test
bash infra/ansible/tests/test_backend_failover.sh
Sequence : verify the api_pool is healthy at start ; kill backend-api-1 ; poll HAProxy until the server is marked DOWN ; assert the next request still returns 200 (served by backend-api-2) ; restart the killed container ; assert it rejoins as healthy. Suitable for the W2 game-day (day 24) drill.
What this role does NOT cover
- TLS cert provisioning. Phase-1 lab : HTTP only. Phase-2 mounts a Let's Encrypt cert from Caddy's data dir or directly via certbot. mTLS to the backends is W5 territory.
- Multi-LB HA. Single HAProxy node — if it dies, the cluster is dark. Phase-2 adds keepalived + a floating VIP.
- Rate limiting. The Gin middleware does that today ; pushing it to the LB is a v1.1 optimisation.
- WebSocket auth header passing. HAProxy passes `Sec-WebSocket-*` headers through unchanged ; Gin's middleware authenticates the upgrade request. No extra config needed.