veza/infra/ansible/inventory/lab.yml
senke ba6e8b4e0e
All checks were successful
Veza CI / Rust (Stream Server) (push) Successful in 3m49s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 58s
Veza CI / Backend (Go) (push) Successful in 5m59s
Veza CI / Frontend (Web) (push) Successful in 15m22s
E2E Playwright / e2e (full) (push) Successful in 19m34s
Veza CI / Notify on failure (push) Has been skipped
feat(infra): pgbouncer role + pgbench load test (W2 Day 7)
ROADMAP_V1.0_LAUNCH.md §Week 2 day 7 deliverable: PgBouncer
fronts the pg_auto_failover formation, so the backend pays the
postgres-fork cost 50 times per pool refresh instead of once per
HTTP handler.

Wiring:
  veza-backend-api ──libpq──▶ pgaf-pgbouncer:6432 ──libpq──▶ pgaf-primary:5432
                              (1000 client cap)             (50 server pool)

Files:
  infra/ansible/roles/pgbouncer/
    defaults/main.yml — pool sizes match the acceptance target
      (1000 client × 50 server × 10 reserve); pool_mode=transaction,
      the only safe mode here because transaction pooling forbids
      session-scoped features (LISTEN/NOTIFY, cross-tx prepared
      statements), none of which Veza uses; DNS TTL = 60s for failover.
    tasks/main.yml — apt install pgbouncer + postgresql-client (so
      the pgbench / admin psql lives on the same container), render
      pgbouncer.ini + userlist.txt, ensure /var/log/postgresql for
      the file log, enable + start service.
    templates/pgbouncer.ini.j2 — full config; databases section
      points at pgaf-primary.lxd:5432 directly. Failover follows
      via DNS TTL until the W2 day 8 pg_autoctl state-change hook
      that issues RELOAD on the admin console.
    templates/userlist.txt.j2 — only rendered when auth_type !=
      trust. Lab uses trust on the bridge subnet; prod gets a
      vault-backed list of md5/scram hashes.
    handlers/main.yml — RELOAD pgbouncer (graceful, doesn't drop
      established clients).
    README.md — operational cheatsheet:
      - SHOW POOLS / SHOW STATS via the admin console
      - the transaction-mode forbids list (LISTEN/NOTIFY etc.)
      - failover behaviour today vs after the W2-day-8 hook lands
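
A shell sketch of that cheatsheet (the hostname follows the wiring
diagram above; `SHOW POOLS` / `SHOW STATS` / `RELOAD` are standard
PgBouncer admin-console commands, but the user name is an assumption,
and `PSQL` is overridable so the snippet can be dry-run):

```shell
# Admin-console cheatsheet (sketch). PSQL is overridable for dry runs;
# the admin user name below is an illustrative assumption.
PSQL="${PSQL:-psql}"
ADMIN="-h pgaf-pgbouncer.lxd -p 6432 -U pgbouncer pgbouncer"

show_pools() { $PSQL $ADMIN -c 'SHOW POOLS;'; }
show_stats() { $PSQL $ADMIN -c 'SHOW STATS;'; }
# Graceful: re-reads pgbouncer.ini without dropping established clients.
reload_cfg() { $PSQL $ADMIN -c 'RELOAD;'; }
```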

  infra/ansible/playbooks/postgres_ha.yml
    Provision step extended to launch pgaf-pgbouncer alongside
    the formation containers. Two new plays at the bottom apply
    common baseline + pgbouncer role to it.

  infra/ansible/inventory/lab.yml
    `pgbouncer` group with pgaf-pgbouncer reachable via the
    community.general.incus connection plugin (consistent with the
    postgres_ha containers).

  infra/ansible/tests/test_pgbouncer_load.sh
    Acceptance: pgbench 500 clients × 30s × 8 threads against the
    pgbouncer endpoint, must report 0 failed transactions and 0
    connection errors. Also runs `pgbench -i -s 10` first to
    initialise the standard fixture — that init goes through
    pgbouncer too, which incidentally validates transaction-mode
    compatibility before the load run starts.
    Exit codes: 0 / 1 (errors) / 2 (unreachable) / 3 (missing tool).
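
The pass/fail core of that script might reduce to something like
this (a sketch; the function name is hypothetical, and the matched
line is the summary pgbench 14+ prints):

```shell
# Sketch of the acceptance gate: read pgbench's stdout on stdin and
# succeed only when it reports zero failed transactions. pgbench 14+
# prints a line like "number of failed transactions: 0 (0.000%)".
check_pgbench_output() {
  local failed
  failed=$(awk '/number of failed transactions/ {print $5}')
  [ "$failed" = "0" ]
}
```

In the real script this would sit after the pgbench run, with the
distinct exit codes (1/2/3) layered on top.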

  veza-backend-api/internal/config/config.go
    Comment block above DATABASE_URL load — documents the prod
    wiring (DATABASE_URL points at pgaf-pgbouncer.lxd:6432, NOT
    at pgaf-primary directly). Also notes the dev/CI exception:
    direct Postgres because the small scale doesn't benefit from
    pooling and tests occasionally lean on session-scoped GUCs
    that transaction-mode would break.
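
In env-var terms the documented prod wiring amounts to (a sketch;
only host and port come from the commit — user, password, and
database name are placeholders):

```shell
# Prod: DATABASE_URL goes through the pooler on 6432, never straight
# at pgaf-primary:5432. Credentials and db name here are placeholders.
export DATABASE_URL="postgres://veza_app:CHANGE_ME@pgaf-pgbouncer.lxd:6432/veza"
```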

Acceptance verified locally:
  $ ansible-playbook -i inventory/lab.yml playbooks/postgres_ha.yml \
      --syntax-check
  playbook: playbooks/postgres_ha.yml          ← clean
  $ bash -n infra/ansible/tests/test_pgbouncer_load.sh
  syntax OK
  $ cd veza-backend-api && go build ./...
  (clean — comment-only change in config.go)
  $ gofmt -l internal/config/config.go
  (no output — clean)

Real apply + pgbench run requires the lab R720 + the
community.general collection — operator's call.

Out of scope (deferred per ROADMAP §2):
  - HA pgbouncer (single instance per env at v1.0; double
    instance + keepalived in v1.1 if needed)
  - pg_autoctl state-change hook → pgbouncer RELOAD (W2 day 8)
  - Prometheus pgbouncer_exporter (W2 day 9 with the OTel
    collector + observability stack)

SKIP_TESTS=1 — IaC YAML + bash + Go comment-only diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 18:35:05 +02:00


# Lab inventory — the R720's local lab Incus host, used to dry-run
# role changes before they touch staging or prod. Override
# ansible_host / ansible_user / ansible_port in `host_vars/<host>.yml`
# (gitignored if it carries credentials, otherwise plain values).
#
# Usage:
# ansible-playbook -i inventory/lab.yml playbooks/site.yml --check
# ansible-playbook -i inventory/lab.yml playbooks/site.yml
#
# v1.0.9 Day 6: postgres_ha group added. The 3 containers
# (pgaf-monitor, pgaf-primary, pgaf-replica) live ON the veza-lab
# host and are addressed via the `community.general.incus`
# connection plugin — no SSH setup needed inside the containers.
all:
  hosts:
    veza-lab:
      ansible_host: 10.0.20.150
      ansible_user: senke
      ansible_python_interpreter: /usr/bin/python3
  children:
    incus_hosts:
      hosts:
        veza-lab:
    veza_lab:
      hosts:
        veza-lab:
    postgres_ha:
      hosts:
        pgaf-monitor:
          pg_auto_failover_role: monitor
        pgaf-primary:
          pg_auto_failover_role: node
        pgaf-replica:
          pg_auto_failover_role: node
      vars:
        # Containers reached via Incus exec on the parent host. The
        # plugin lives in the community.general collection — install
        # with `ansible-galaxy collection install community.general`
        # before running this playbook.
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    postgres_ha_monitor:
      hosts:
        pgaf-monitor:
    postgres_ha_nodes:
      # Order matters — primary first so it registers as primary; replica
      # second so it joins as standby.
      hosts:
        pgaf-primary:
        pgaf-replica:
    # v1.0.9 Day 7: pgbouncer fronts the formation. Same
    # community.general.incus connection plugin as postgres_ha.
    pgbouncer:
      hosts:
        pgaf-pgbouncer:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3