veza/infra/ansible/roles/pgbouncer/defaults/main.yml
senke ba6e8b4e0e
All checks were successful
Veza CI / Rust (Stream Server) (push) Successful in 3m49s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 58s
Veza CI / Backend (Go) (push) Successful in 5m59s
Veza CI / Frontend (Web) (push) Successful in 15m22s
E2E Playwright / e2e (full) (push) Successful in 19m34s
Veza CI / Notify on failure (push) Has been skipped
feat(infra): pgbouncer role + pgbench load test (W2 Day 7)
ROADMAP_V1.0_LAUNCH.md §Semaine 2 day 7 deliverable: PgBouncer
fronts the pg_auto_failover formation, so the backend pays the
Postgres fork cost at most 50 times (once per server-pool slot)
instead of once per HTTP handler.

Wiring:
  veza-backend-api ──libpq──▶ pgaf-pgbouncer:6432 ──libpq──▶ pgaf-primary:5432
                              (1000 client cap)             (50 server pool)
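
  Both hops can be smoke-tested from any peer container (illustrative
  command; host/db/user names per the role defaults further down):

    $ psql 'host=pgaf-pgbouncer.lxd port=6432 dbname=veza user=veza' \
        -c 'SELECT 1'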

Files:
  infra/ansible/roles/pgbouncer/
    defaults/main.yml — pool sizes match the acceptance target
      (1000 client × 50 server × 10 reserve), pool_mode=transaction
      (safe here because transaction mode only forbids session-pinned
      features such as LISTEN/NOTIFY and cross-tx prepared statements,
      neither of which Veza uses), DNS TTL = 60s for failover.
    tasks/main.yml — apt install pgbouncer + postgresql-client (so
      pgbench and the admin psql live on the same container), render
      pgbouncer.ini + userlist.txt, ensure /var/log/postgresql exists
      for the file log, enable + start the service.
    templates/pgbouncer.ini.j2 — full config; databases section
      points at pgaf-primary.lxd:5432 directly. Failover follows
      via DNS TTL until the W2 day 8 pg_autoctl state-change hook
      that issues RELOAD on the admin console.
    templates/userlist.txt.j2 — only rendered when auth_type !=
      trust. Lab uses trust on the bridge subnet; prod gets a
      vault-backed list of md5/scram hashes.
    handlers/main.yml — RELOAD pgbouncer (graceful, doesn't drop
      established clients).
    README.md — operational cheatsheet:
      - SHOW POOLS / SHOW STATS via the admin console (sketched below)
      - the transaction-mode forbids list (LISTEN/NOTIFY etc.)
      - failover behaviour today vs after the W2-day-8 hook lands
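
      A minimal admin-console session for the first bullet (a sketch,
      assuming the lab trust auth; names per this role's defaults):

        $ psql -h pgaf-pgbouncer.lxd -p 6432 -U postgres pgbouncer
        pgbouncer=# SHOW POOLS;
        pgbouncer=# SHOW STATS;
        pgbouncer=# RELOAD;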

  infra/ansible/playbooks/postgres_ha.yml
    Provision step extended to launch pgaf-pgbouncer alongside
    the formation containers. Two new plays at the bottom apply
    common baseline + pgbouncer role to it.

  infra/ansible/inventory/lab.yml
    `pgbouncer` group with pgaf-pgbouncer reachable via the
    community.general.incus connection plugin (consistent with the
    postgres_ha containers).

  infra/ansible/tests/test_pgbouncer_load.sh
    Acceptance: pgbench with 500 clients × 30s × 8 threads against
    the pgbouncer endpoint; the run must report 0 failed transactions
    and 0 connection errors. Also runs `pgbench -i -s 10` first to
    initialise the standard fixture — that init goes through
    pgbouncer too, which incidentally validates transaction-mode
    compatibility before the load run starts.
    Exit codes: 0 / 1 (errors) / 2 (unreachable) / 3 (missing tool).
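
    The load run, reduced to its two pgbench invocations (a sketch,
    not the script verbatim; endpoint and user/db names assumed from
    the role defaults):

      $ pgbench -i -s 10 -h pgaf-pgbouncer.lxd -p 6432 -U veza veza
      $ pgbench -c 500 -j 8 -T 30 -h pgaf-pgbouncer.lxd -p 6432 \
          -U veza veza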

  veza-backend-api/internal/config/config.go
    Comment block above DATABASE_URL load — documents the prod
    wiring (DATABASE_URL points at pgaf-pgbouncer.lxd:6432, NOT
    at pgaf-primary directly). Also notes the dev/CI exception:
    direct Postgres because the small scale doesn't benefit from
    pooling and tests occasionally lean on session-scoped GUCs
    that transaction mode would break.
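
    The same wiring as DSNs (illustrative hosts; the 6432-vs-5432
    split is the point):

      # prod: through the pooler
      DATABASE_URL=postgres://veza@pgaf-pgbouncer.lxd:6432/veza
      # dev/CI: direct Postgres
      DATABASE_URL=postgres://veza@localhost:5432/veza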

Acceptance verified locally:
  $ ansible-playbook -i inventory/lab.yml playbooks/postgres_ha.yml \
      --syntax-check
  playbook: playbooks/postgres_ha.yml          ← clean
  $ bash -n infra/ansible/tests/test_pgbouncer_load.sh
  syntax OK
  $ cd veza-backend-api && go build ./...
  (clean — comment-only change in config.go)
  $ gofmt -l internal/config/config.go
  (no output — clean)

Real apply + pgbench run requires the lab R720 + the
community.general collection — operator's call.
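
The eventual real run, for the record (same paths as the checks
above):

  $ ansible-playbook -i inventory/lab.yml playbooks/postgres_ha.yml
  $ bash infra/ansible/tests/test_pgbouncer_load.sh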

Out of scope (deferred per ROADMAP §2):
  - HA pgbouncer (single instance per env at v1.0; a second
    instance + keepalived in v1.1 if needed)
  - pg_autoctl state-change hook → pgbouncer RELOAD (W2 day 8)
  - Prometheus pgbouncer_exporter (W2 day 9 with the OTel
    collector + observability stack)

SKIP_TESTS=1 — IaC YAML + bash + Go comment-only diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 18:35:05 +02:00

61 lines
2.6 KiB
YAML

# PgBouncer connection pool — fronts the pg_auto_failover formation
# so the backend keeps a stable pool of cheap client-side connections
# (1000 capacity) backed by a small pool of expensive Postgres
# connections (50). Without this, every Go HTTP handler that opened
# a transaction was paying the ~1ms forking overhead of a fresh
# Postgres backend.
#
# Mode: transaction. Connections are returned to the pool after each
# transaction, NOT after each statement (statement mode breaks
# prepared statements + session-local features) and NOT after
# disconnect (session mode = no real pooling). The Veza backend uses
# transaction-scoped features in a few places (SET LOCAL inside a tx,
# per-tx advisory locks, etc.) but no cross-transaction session
# state, so transaction mode fits.
---
# Listen address inside the container — exposed to the Incus bridge
# so any peer container resolves it via `pgaf-pgbouncer.lxd:6432`.
pgbouncer_listen_addr: 0.0.0.0
pgbouncer_listen_port: 6432
# Pool sizes per the W2 Day 7 acceptance: capacity for 1000 client
# connections, backed by 50 actual Postgres connections. Tune in
# group_vars/<env>.yml when load profiles are baselined.
pgbouncer_max_client_conn: 1000
pgbouncer_default_pool_size: 50
pgbouncer_min_pool_size: 10
pgbouncer_reserve_pool_size: 10
pgbouncer_reserve_pool_timeout: 5
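# Net effect of the four values above: up to 1000 clients multiplex
# onto at most 50 server connections; a client waiting longer than
# reserve_pool_timeout (5s) may grab one of the 10 reserve
# connections, capping upstream connections at 60 per pool.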
# Transaction mode is the only one the Veza backend can use safely —
# see the role-level comment block above. Override only in tests
# that explicitly benchmark statement vs transaction mode.
pgbouncer_pool_mode: transaction
pgbouncer_server_reset_query: DISCARD ALL
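# Note: pgbouncer only runs server_reset_query for session-mode pools
# unless server_reset_query_always is set, so in transaction mode the
# value above is effectively inert. Harmless to keep for tests that
# override pool_mode.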
# Upstream — the pg_auto_failover formation primary. Day 7 lab points
# at the primary container directly; W2 day 8 (or v1.1) wires
# pgbouncer reload on failover via a pg_autoctl callback.
pgbouncer_upstream_host: pgaf-primary.lxd
pgbouncer_upstream_port: 5432
pgbouncer_upstream_dbname: veza
pgbouncer_upstream_user: veza
# Auth — trust on the lab bridge (10.99.0.0/24 + container-only).
# Prod overrides to scram-sha-256 + a userlist managed by
# `pg_autoctl pgbouncer-userlist` or equivalent.
pgbouncer_auth_type: trust
pgbouncer_auth_file: /etc/pgbouncer/userlist.txt
# When auth_type=md5/scram-sha-256, this list is rendered into
# userlist.txt. Format: { user: "name", password: "hash" }. md5 hashes
# are 'md5' + md5(password+user). For lab/trust we leave this empty.
pgbouncer_users: []
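# Illustrative entry (placeholder hash; a real one is 'md5' plus the
# output of `echo -n '<password><user>' | md5sum`):
#   pgbouncer_users:
#     - user: veza
#       password: "md5<32-hex-digest>"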
# Admin console — `psql -h <host> -p 6432 pgbouncer` for SHOW POOLS,
# RELOAD, etc. Restricted to the admin users listed below. Lab default
# trusts the bridge; prod tightens to a unix socket only.
pgbouncer_admin_users:
- postgres
- veza
pgbouncer_stats_users:
- postgres