feat(infra): haproxy sticky WS + backend_api multi-instance scaffold (W4 Day 19)
Phase 1 of the active/active backend story: HAProxy in front of two
backend-api containers and two stream-server containers; a sticky
cookie pins WS sessions to one backend, and a URI hash routes each
track_id to one streamer for HLS cache locality.
Day 19 acceptance: kill backend-api-1, HAProxy fails over, and WS
sessions reconnect to backend-api-2 without loss. The smoke test wires
that gate; Phase 2 (W5) will add keepalived for an LB pair.
- infra/ansible/roles/haproxy/
* Install HAProxy + render haproxy.cfg with the frontend (HTTP, optional
HTTPS via haproxy_tls_cert_path), api_pool (round-robin + sticky
cookie, VEZA_SERVERID by default), and stream_pool (URI hash +
consistent hashing) — see the sketch after this list.
* Active health check GET /api/v1/health every 5s; fall=3, rise=2;
on-marked-down shutdown-sessions plus slowstart 30s on recovery.
* Stats socket bound to 127.0.0.1:9100 for the future Prometheus
haproxy_exporter sidecar.
* Mozilla Intermediate TLS cipher list; only effective when a cert
is mounted.
- infra/ansible/roles/backend_api/
* Scaffolding for the multi-instance Go API. Creates veza-api
system user, /opt/veza/backend-api dir, /etc/veza env dir,
/var/log/veza, and a hardened systemd unit pointing at the binary.
* Binary deployment is OUT of scope (documented in README) — the
Go binary is built outside Ansible (Makefile target) and pushed
via incus file push. CI → ansible-pull integration is W5+.
- infra/ansible/playbooks/haproxy.yml: provisions the haproxy Incus
container and applies the common baseline + role.
- infra/ansible/inventory/lab.yml: 3 new groups:
* haproxy (single LB node)
* backend_api_instances (backend-api-{1,2})
* stream_server_instances (stream-server-{1,2})
The HAProxy template reads these groups directly to populate its
upstream blocks; it falls back to the static haproxy_backend_api_fallback
list if the group is missing (for in-isolation tests).
- infra/ansible/tests/test_backend_failover.sh
* step 0: pre-flight — both backends UP per the HAProxy stats socket
(poll one-liner sketched after this list).
* step 1: 5 baseline GET /api/v1/health through the LB → all 200.
* step 2: incus stop --force backend-api-1; record t0.
* step 3: poll HAProxy stats until backend-api-1 is DOWN
(timeout 30s; expected ~15s = fall × interval).
* step 4: 5 GET requests during the down window — all must return 200
(served by backend-api-2). The test fails if any returns non-200.
* step 5: incus start backend-api-1; poll until UP again.
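
For reference, a sketch of the api_pool block the role's template is
expected to render — host names from the lab inventory, cookie name
from haproxy_sticky_cookie_name; the exact option spelling lives in the
role's template, so treat this as illustrative, not authoritative:

    backend api_pool
        balance roundrobin
        # sticky cookie; name comes from haproxy_sticky_cookie_name
        cookie VEZA_SERVERID insert indirect nocache
        # active check, 5s cadence, matching the role defaults
        option httpchk GET /api/v1/health
        server backend-api-1 backend-api-1.lxd:8080 cookie backend-api-1 check inter 5s fall 3 rise 2 slowstart 30s on-marked-down shutdown-sessions
        server backend-api-2 backend-api-2.lxd:8080 cookie backend-api-2 check inter 5s fall 3 rise 2 slowstart 30s on-marked-down shutdown-sessions

And one way the failover test can poll server state over the stats
socket (socat assumed available where the test runs):

    echo "show servers state api_pool" | socat stdio tcp-connect:127.0.0.1:9100
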
Acceptance (Day 19): smoke test passes; the HAProxy sticky cookie
keeps WS sessions on the same backend until that backend dies, at
which point the cookie is ignored and the request rebalances.
W4 progress: Day 16 done · Day 17 done · Day 18 done · Day 19 done ·
Day 20 (k6 nightly load test) pending.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# haproxy defaults — TLS-terminating frontend + backend pools for the
# stateless backend-api fleet and the stream server. v1.0.9 W4 Day 19.
#
# Topology:
#
#   client → :443 HAProxy (TLS) → backend-api-1.lxd:8080
#                               → backend-api-2.lxd:8080
#                               → stream-server-1.lxd:8082 (track_id hash)
#                               → stream-server-2.lxd:8082
#
# WebSocket affinity: HAProxy sets the sticky cookie (`VEZA_SERVERID`
# by default; see haproxy_sticky_cookie_name) on the first response;
# subsequent requests (HTTP and the WS upgrade) carry the cookie back
# to the same backend. The cookie survives across page loads, so a chat
# session reconnecting after a 30s pause typically lands on the same
# instance — but if the original instance is offline, the cookie is
# ignored and the next-best healthy backend takes over.
---
haproxy_version: "2.8"  # Ubuntu 22.04 ships 2.4; we explicitly install 2.8 from a PPA

# Listeners. v1.0 lab: HTTP only (no TLS; the lab is single-host). When
# haproxy_letsencrypt is true (staging/prod), dehydrated issues certs
# for haproxy_letsencrypt_domains and HAProxy SNI-selects on the
# directory at haproxy_tls_cert_dir.
haproxy_listen_http: 80
haproxy_listen_https: 443
haproxy_listen_stats: 9100  # admin socket bind; reachable on Incus bridge only
haproxy_tls_cert_path: ""   # empty = static-cert HTTPS bind disabled (use crt-dir form below)
haproxy_tls_cert_dir: /usr/local/etc/tls/haproxy
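# For orientation, the binds this is expected to yield (sketch; the
# alpn list is illustrative, not a promise of the template's output):
#   bind :80
#   bind :443 ssl crt /usr/local/etc/tls/haproxy/ alpn h2,http/1.1
# and, in the global section, the admin socket:
#   stats socket ipv4@127.0.0.1:9100 level admin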

# Let's Encrypt — HTTP-01 challenge via dehydrated. Wildcards are NOT
# supported (those need DNS-01); list subdomains explicitly.
# Each domain entry is "primary.tld san1.tld san2.tld": space-separated
# SANs in one cert, and dehydrated names the cert directory after the
# first domain. One entry per cert.
haproxy_letsencrypt: false
haproxy_letsencrypt_email: ""
haproxy_letsencrypt_domains: []
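# Example shape (hypothetical domains) — one two-SAN cert plus one
# single-name cert:
# haproxy_letsencrypt_domains:
#   - "veza.example www.veza.example"
#   - "stream.veza.example"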

# Backend API pool — port 8080 by default (Gin server in cmd/api). The
# inventory's `backend_api_instances` group drives the upstream server
# list; if the group is absent, the role falls back to the static
# defaults below so it stays testable in isolation.
haproxy_backend_api_port: 8080
haproxy_backend_api_fallback:
  - backend-api-1
  - backend-api-2
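# With the group absent, the fallback above renders one server line per
# container name (sketch; check options elided):
#   server backend-api-1 backend-api-1.lxd:8080 check
#   server backend-api-2 backend-api-2.lxd:8080 check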

# Stream server pool — port 8082 (Rust Axum). Uses URI-hash balancing so
# the same track_id consistently lands on the same node, maximising the
# in-process HLS cache hit rate.
haproxy_stream_server_port: 8082
haproxy_stream_server_fallback:
  - stream-server-1
  - stream-server-2
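# The pool's balance directives come out roughly as (sketch):
#   balance uri
#   hash-type consistent
# so any URI embedding a track_id keeps hashing to the same node while
# pool membership is stable.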

# Health-check cadence + drain — Day 19 acceptance asks for 5s checks
# and a 30s drain before removal.
haproxy_health_check_interval_ms: 5000
haproxy_health_check_fall: 3  # 3 failed checks = down
haproxy_health_check_rise: 2  # 2 passed checks = back up
haproxy_graceful_drain_seconds: 30
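# The cadence knobs map onto the rendered check options roughly as
# (sketch):
#   default-server inter 5000 fall 3 rise 2 slowstart 30s
# i.e. ~15s to mark a server down (fall × inter); how the 30s drain is
# applied is up to the role's template.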

# Sticky cookie name. Rotating the name invalidates every client's
# existing pin and forces a rebalance — useful after a config change
# that reshapes the pool.
haproxy_sticky_cookie_name: "VEZA_SERVERID"
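# Rendered roughly as (sketch):
#   cookie VEZA_SERVERID insert indirect nocache
# insert = HAProxy sets the cookie itself; indirect = the cookie is
# stripped before the request reaches the backend; nocache = responses
# that receive the cookie are marked non-cacheable so a shared cache
# cannot hand one client's pin to everyone.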