# Runbook — Postgres failover (`pg_auto_failover`)

> **Alerts**: `PostgresPrimaryUnreachable`, `PostgresReplicationLagHigh` · also reached from `api-availability-slo-burn.md` and `api-latency-slo-burn.md`.
> **Owner**: infra on-call.

## Topology recap

```
          ┌─────────────────┐
          │  pgaf-monitor   │ ← state machine; assigns primary/standby roles
          └────────┬────────┘
                   │  pg_auto_failover protocol
                   │
             ┌─────┴─────┐
             │           │
         ┌───▼────┐  ┌───▼────┐
         │ pgaf-  │  │ pgaf-  │
         │primary │  │replica │
         └────────┘  └────────┘
```

PgBouncer (`pgaf-pgbouncer`, port 6432) sits in front of whichever node is currently primary. The backend reads `DATABASE_URL` from the environment; it already points at the bouncer.

## What "failover" looks like

- Primary disappears (crash, host reboot, manual `incus stop`).
- Monitor notices within `pgaf_health_check_interval` (~10s).
- After `pgaf_failover_timeout` (60s), the monitor promotes the replica to primary.
- PgBouncer is reconfigured by the monitor's notify hook; new connections go to the new primary.

**Expected RTO is ~60 seconds.** RPO ≈ 0 if synchronous replication was caught up; up to one transaction if async.

## Diagnose state

```bash
# From any node:
sudo -u postgres pg_autoctl show state
# Look for one node with state="primary" and one with state="secondary".
# If both are "wait_for_primary", the formation is wedged.

# Connection-level test (does the bouncer route to a live primary?):
psql "$DATABASE_URL" -c "SELECT now(), pg_is_in_recovery();"
# pg_is_in_recovery = false ⇒ you're hitting the primary
```

## Common failure modes

### A. Monitor is up, primary is down, replica didn't get promoted

Either `pgaf_failover_timeout` hasn't elapsed yet (wait 60s) **or** the replica is too far behind to be promoted safely.

```bash
# On the replica:
sudo -u postgres pg_autoctl show state
# Check the LSN distance — if it's > 1MB the monitor will refuse to promote.
```

If the monitor refused, promote manually (only if you accept the potential data loss):

```bash
sudo -u postgres pg_autoctl perform failover --formation default --group 0
```

### B. Monitor itself is down

The data nodes keep serving their last-known roles until the monitor returns. Reads keep working from the standby. **No automatic failover happens** without the monitor — start it before doing anything else.

```bash
sudo systemctl start pg_autoctl@monitor
sudo journalctl -u pg_autoctl@monitor -n 200 --no-pager
```

### C. Both data nodes are down (catastrophe)

Restore from pgBackRest. See the dr-drill runbook in `docs/archive/` (or the `pgbackrest` role README) for the manual procedure. **Estimated RTO ~30 min** with a full+diff already on MinIO.

## Connection routing

PgBouncer holds the routing decision, so during a failover:

```bash
# Confirm which Postgres backend is currently behind the bouncer:
psql -h pgaf-pgbouncer.lxd -p 6432 -U pgbouncer pgbouncer -c "SHOW SERVERS;"
```

If the bouncer is still pointing at the dead primary:

```bash
# Reload the bouncer config (the pg_auto_failover monitor's
# `host_change_hook.sh` should have done this automatically — if not,
# something is broken):
sudo systemctl reload pgbouncer
```

## Backend behavior during failover

The backend's GORM connection pool drops dead connections lazily. Expect a few hundred 5xx responses during the 30-60s window — this trips `APIAvailabilitySLOFastBurn`. The alert clears once the pool refills.
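To see that window the way the backend does, probe through the bouncer while the failover runs. A minimal sketch that reuses the backend's own `DATABASE_URL`; expect `DOWN` for roughly the 30-60s described above:

```bash
# Log availability transitions once a second, as seen through PgBouncer:
while true; do
  if psql "$DATABASE_URL" -qAt -c 'SELECT 1;' >/dev/null 2>&1; then
    echo "$(date -Is) OK"
  else
    echo "$(date -Is) DOWN"
  fi
  sleep 1
done
```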
## After recovery

1. Re-add the failed node as a standby (a hedged sketch of the full invocation follows this list):

   ```bash
   sudo -u postgres pg_autoctl create postgres ...
   ```

2. Wait for `pg_autoctl show state` to show two healthy nodes.
3. Run the next dr-drill cycle to validate backups against the new primary (a quick pre-drill sanity check is also sketched below).
4. Postmortem if downtime > 5 min.
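For step 1, the elided `pg_autoctl create postgres` invocation usually takes the shape below. Everything environment-specific here is an assumption (the `*.lxd` hostnames, the `--pgdata` path, the systemd instance name); verify against the node's previous unit file or the provisioning role before running anything:

```bash
# Hypothetical re-add of the failed node as a standby; hostnames, pgdata
# path, and unit name are assumptions, not the real deployment values.
sudo -u postgres pg_autoctl create postgres \
  --pgdata /var/lib/postgresql/16/main \
  --hostname pgaf-primary.lxd \
  --monitor 'postgres://autoctl_node@pgaf-monitor.lxd:5432/pg_auto_failover' \
  --ssl-self-signed \
  --auth trust

# Bring the keeper back under systemd (instance name is an assumption):
sudo systemctl start pg_autoctl@main

# pg_auto_failover clones from the current primary (pg_basebackup) and
# joins as a secondary; watch the formation converge (step 2):
watch -n 2 'sudo -u postgres pg_autoctl show state'
```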
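For step 3, before the full drill, it's cheap to confirm that WAL archiving and the backup repo line up with the new primary. The stanza name `main` is an assumption; take the real one from `pgbackrest.conf`:

```bash
# Post-failover backup sanity check (not a substitute for the dr-drill).
# Stanza "main" is an assumption; read the real name from pgbackrest.conf.
sudo -u postgres pgbackrest --stanza=main check   # exercises WAL archiving end-to-end
sudo -u postgres pgbackrest --stanza=main info    # lists existing full/diff backup sets
```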