ROADMAP_V1.0_LAUNCH.md §Semaine 2 day 7 deliverable: PgBouncer now
fronts the pg_auto_failover formation, so the backend pays the
postgres-fork cost 50 times per server-pool refill instead of once
per HTTP handler.
Wiring:
veza-backend-api ──libpq──▶ pgaf-pgbouncer:6432 ──libpq──▶ pgaf-primary:5432
(1000 client cap) (50 server pool)
Files:
infra/ansible/roles/pgbouncer/
defaults/main.yml — pool sizes match the acceptance target
(1000 client × 50 server × 10 reserve). pool_mode=transaction
is safe here because transaction mode forbids LISTEN/NOTIFY and
cross-transaction prepared statements, neither of which Veza
uses. DNS TTL = 60s for failover.
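For orientation, those defaults might look roughly like this (the
variable names are assumptions, not the role's actual vars):

```yaml
# defaults/main.yml — illustrative sketch; variable names hypothetical
pgbouncer_max_client_conn: 1000    # client cap (acceptance target)
pgbouncer_default_pool_size: 50    # server connections per db/user pair
pgbouncer_reserve_pool_size: 10    # burst headroom
pgbouncer_pool_mode: transaction   # safe: Veza uses no session features
pgbouncer_dns_max_ttl: 60          # re-resolve the primary on failover
```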
tasks/main.yml — apt install pgbouncer + postgresql-client (so
the pgbench / admin psql lives on the same container), render
pgbouncer.ini + userlist.txt, ensure /var/log/postgresql for
the file log, enable + start service.
templates/pgbouncer.ini.j2 — full config; databases section
points at pgaf-primary.lxd:5432 directly. Failover follows
via DNS TTL until the W2 day 8 pg_autoctl state-change hook
that issues RELOAD on the admin console.
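An illustrative excerpt of what the rendered config could contain,
built from the values above (not the actual template; all keys are
standard pgbouncer.ini settings):

```ini
; pgbouncer.ini — illustrative excerpt, not the rendered template
[databases]
veza = host=pgaf-primary.lxd port=5432

[pgbouncer]
listen_addr = *
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
reserve_pool_size = 10
dns_max_ttl = 60
```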
templates/userlist.txt.j2 — only rendered when auth_type !=
trust. Lab uses trust on the bridge subnet; prod gets a
vault-backed list of md5/scram hashes.
handlers/main.yml — RELOAD pgbouncer (graceful, doesn't drop
established clients).
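A minimal sketch of such a handler, assuming the role reloads via
systemd (the actual handler may issue RELOAD on the admin console
instead):

```yaml
# handlers/main.yml — hypothetical sketch
- name: RELOAD pgbouncer
  ansible.builtin.service:
    name: pgbouncer
    state: reloaded   # SIGHUP: re-reads config without dropping clients
```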
README.md — operational cheatsheet:
- SHOW POOLS / SHOW STATS via the admin console
- the transaction-mode forbids list (LISTEN/NOTIFY etc.)
- failover behaviour today vs after the W2-day-8 hook lands
infra/ansible/playbooks/postgres_ha.yml
Provision step extended to launch pgaf-pgbouncer alongside
the formation containers. Two new plays at the bottom apply
common baseline + pgbouncer role to it.
infra/ansible/inventory/lab.yml
`pgbouncer` group with pgaf-pgbouncer reachable via the
community.general.incus connection plugin (consistent with the
postgres_ha containers).
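The group entry could look like this (illustrative fragment; host
vars beyond the connection plugin are assumptions):

```yaml
# inventory/lab.yml — illustrative fragment
pgbouncer:
  hosts:
    pgaf-pgbouncer:
      ansible_connection: community.general.incus
```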
infra/ansible/tests/test_pgbouncer_load.sh
Acceptance: pgbench 500 clients × 30s × 8 threads against the
pgbouncer endpoint, must report 0 failed transactions and 0
connection errors. Also runs `pgbench -i -s 10` first to
initialise the standard fixture — that init goes through
pgbouncer too, which incidentally validates transaction-mode
compatibility before the load run starts.
Exit codes: 0 / 1 (errors) / 2 (unreachable) / 3 (missing tool).
veza-backend-api/internal/config/config.go
Comment block above DATABASE_URL load — documents the prod
wiring (DATABASE_URL points at pgaf-pgbouncer.lxd:6432, NOT
at pgaf-primary directly). Also notes the dev/CI exception:
direct Postgres because the small scale doesn't benefit from
pooling and tests occasionally lean on session-scoped GUCs
that transaction-mode would break.
Acceptance verified locally:
$ ansible-playbook -i inventory/lab.yml playbooks/postgres_ha.yml \
--syntax-check
playbook: playbooks/postgres_ha.yml ← clean
$ bash -n infra/ansible/tests/test_pgbouncer_load.sh
syntax OK
$ cd veza-backend-api && go build ./...
(clean — comment-only change in config.go)
$ gofmt -l internal/config/config.go
(no output — clean)
Real apply + pgbench run requires the lab R720 + the
community.general collection — operator's call.
Out of scope (deferred per ROADMAP §2):
- HA pgbouncer (single instance per env at v1.0; a second
  instance + keepalived in v1.1 if needed)
- pg_autoctl state-change hook → pgbouncer RELOAD (W2 day 8)
- Prometheus pgbouncer_exporter (W2 day 9 with the OTel
collector + observability stack)
SKIP_TESTS=1 — IaC YAML + bash + Go comment-only diff.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
86 lines
3.3 KiB
Bash
Executable file
#!/usr/bin/env bash
# test_pgbouncer_load.sh — exercise PgBouncer with 500 concurrent
# clients × 30s, fail unless every connection lands and stays
# under the query_wait_timeout ceiling.
#
# v1.0.9 Day 7 acceptance for ROADMAP_V1.0_LAUNCH.md §Semaine 2:
# "pgbench 500 clients × 30s sans erreur de connexion"
# (pgbench 500 clients × 30s with no connection errors).
#
# Usage:
#   bash infra/ansible/tests/test_pgbouncer_load.sh
#
# Env overrides:
#   PGBOUNCER_HOST    default: pgaf-pgbouncer.lxd
#   PGBOUNCER_PORT    default: 6432
#   PGBOUNCER_DB      default: veza
#   PGBOUNCER_USER    default: veza
#   PGBENCH_CLIENTS   default: 500
#   PGBENCH_DURATION  default: 30
#   PGBENCH_THREADS   default: 8
#
# Exit codes:
#   0 — pgbench completed clean (no connection errors, no aborts)
#   1 — pgbench reported errors during the run
#   2 — pgbouncer not reachable
#   3 — required tool missing on host

set -euo pipefail

PGBOUNCER_HOST=${PGBOUNCER_HOST:-pgaf-pgbouncer.lxd}
PGBOUNCER_PORT=${PGBOUNCER_PORT:-6432}
PGBOUNCER_DB=${PGBOUNCER_DB:-veza}
PGBOUNCER_USER=${PGBOUNCER_USER:-veza}
PGBENCH_CLIENTS=${PGBENCH_CLIENTS:-500}
PGBENCH_DURATION=${PGBENCH_DURATION:-30}
PGBENCH_THREADS=${PGBENCH_THREADS:-8}

log()     { printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" >&2; }
fail()    { log "FAIL: $1"; exit "${2:-1}"; }
require() { command -v "$1" >/dev/null 2>&1 || fail "missing tool: $1" 3; }

require pgbench
require psql
require awk

# 0. Reachability — PgBouncer alive on listen_addr:listen_port.
log "step 0: probing pgbouncer at ${PGBOUNCER_HOST}:${PGBOUNCER_PORT}"
if ! psql "host=${PGBOUNCER_HOST} port=${PGBOUNCER_PORT} dbname=${PGBOUNCER_DB} user=${PGBOUNCER_USER} connect_timeout=5" -c 'select 1' >/dev/null 2>&1; then
    fail "pgbouncer not reachable (or app db ${PGBOUNCER_DB} not provisioned). Check the pgbouncer service + the formation primary." 2
fi

# 1. pgbench fixture — initialise the standard pgbench tables ONCE
#    before the load run. The init connects through pgbouncer too,
#    which incidentally checks transaction-mode compatibility.
log "step 1: initialising pgbench fixture (scale=10)"
if ! pgbench -h "${PGBOUNCER_HOST}" -p "${PGBOUNCER_PORT}" -U "${PGBOUNCER_USER}" -d "${PGBOUNCER_DB}" -i -s 10 --no-vacuum 2>&1 | tail -20 >&2; then
    fail "pgbench -i failed — check pgbouncer auth / pool_mode" 1
fi

# 2. Load run. Tolerate a nonzero pgbench exit here: the verdict
#    logic below decides pass/fail from the captured output.
log "step 2: pgbench ${PGBENCH_CLIENTS} clients × ${PGBENCH_DURATION}s × ${PGBENCH_THREADS} threads"
out=$(pgbench \
    -h "${PGBOUNCER_HOST}" \
    -p "${PGBOUNCER_PORT}" \
    -U "${PGBOUNCER_USER}" \
    -d "${PGBOUNCER_DB}" \
    -c "${PGBENCH_CLIENTS}" \
    -j "${PGBENCH_THREADS}" \
    -T "${PGBENCH_DURATION}" \
    --no-vacuum \
    -P 5 \
    -r 2>&1) || true

echo "$out" | sed 's/^/  /' >&2

# pgbench reports "number of failed transactions: N (X.XX%)" — anything
# > 0 fails the test. Also catch outright "connection refused" errors
# from the runner output.
failed_tx=$(echo "$out" | awk '/number of failed transactions:/ { print $5; exit }' | tr -d ',()')
failed_tx=${failed_tx:-0}
conn_errors=$(echo "$out" | grep -ciE 'connection (refused|reset|timeout)' || true)

log "verdict: failed_tx=${failed_tx} conn_errors=${conn_errors}"
if [ "${failed_tx}" != "0" ] || [ "${conn_errors}" -gt 0 ]; then
    fail "pgbench surfaced errors — pool sizing, query_wait_timeout, or upstream is the bottleneck"
fi

log "PASS: pgbench ${PGBENCH_CLIENTS} clients × ${PGBENCH_DURATION}s clean"
exit 0
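The verdict parsing can be sanity-checked in isolation against a
canned pgbench summary line, no PgBouncer needed:

```shell
# Feed the same awk/tr pipeline a canned pgbench summary line and
# confirm it extracts the failed-transaction count.
out='number of failed transactions: 0 (0.000%)'
failed_tx=$(printf '%s\n' "$out" | awk '/number of failed transactions:/ { print $5; exit }' | tr -d ',()')
echo "failed_tx=${failed_tx}"
```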