# Database Migrations

## Overview

Veza uses SQL migrations stored in `veza-backend-api/migrations/`. Migrations are applied in order by filename (lexicographic sort).

## Migration Naming

- Format: `NNN_description.sql` (e.g. `101_product_reviews.sql`)
- Use snake_case for descriptions
- Down migrations (rollback): `NNN_description_down.sql` when needed

## Squash Script

The `scripts/squash_migrations.sh` script generates a baseline SQL file that concatenates all migrations into a single file. This is useful for:

- Fresh database setup
- Creating a clean baseline for new environments
- Versioned releases (e.g. `baseline_v0601.sql`)

### Usage

```bash
# From project root
./scripts/squash_migrations.sh
```

Output: `veza-backend-api/migrations/baseline_v0601.sql`

### Procedure

1. Run the script after adding new migrations
2. Update the version in the script (e.g. `baseline_v0601.sql`) for each release
3. Update the migration range comment (e.g. `001-113`) to reflect the latest migration number
4. The baseline file is auto-generated; do not edit it manually

## Recent Migrations

| # | File | Description |
|---|------|-------------|
| 116 | `116_seller_transfers_retry.sql` | v0.701: Add `retry_count`, `next_retry_at` to `seller_transfers`; index for failed retries |

## Adding New Migrations

1. Create a new file: `veza-backend-api/migrations/NNN_description.sql`
2. Use the next available number (check existing migrations)
3. Write idempotent SQL when possible (e.g. `IF NOT EXISTS`)
4. Test locally before committing
5. Run `squash_migrations.sh` to update the baseline for the release

## Expand-contract discipline (W5+ deploy pipeline contract)

> **TL;DR** — every migration must be **backward-compatible** with the
> previous deploy's binary. No `DROP COLUMN`, no `ALTER ... NOT NULL`,
> no `RENAME` in step 1. Schema evolution happens across **multiple
> deploys**, not in one.
### Why this matters

The blue/green deploy pipeline (`infra/ansible/playbooks/deploy_app.yml`) makes rollback trivial at the **app layer**: HAProxy flips back to the previous color, ~5 seconds wall-clock, no data lost.

But the **database** doesn't have colors. Migrations apply once, against the shared postgres container, and stay applied across the rollback. If a deploy adds a non-nullable column and the rolled-back binary then tries to insert a row without that column, the insert fails. The rollback button is broken — the previous binary now crashes against the post-migration schema.

The fix isn't to make the pipeline smarter. It's to make migrations forward-AND-backward compatible by construction.

### The expand-contract pattern (3 deploys per "destructive" change)

**Step 1 (deploy N) — Expand**: add the new shape **alongside** the old. Both binaries (old + new) work.

```sql
-- migration NNN_add_user_email_verified.sql
ALTER TABLE users ADD COLUMN email_verified BOOLEAN;
-- nullable, no default — the old binary doesn't know about it.
-- the new binary writes true/false on signup; reads coalesce NULL → false.
```

**Step 2 (deploy N+1) — Backfill**: once Step 1 is stable in prod (≥ 1 week, no rollbacks needed), backfill existing rows.

```sql
-- migration NNN+1_backfill_user_email_verified.sql
UPDATE users SET email_verified = false WHERE email_verified IS NULL;
```

**Step 3 (deploy N+2) — Contract**: once the backfill is in, add the constraint. The old binary (still coalescing NULL → false on reads) keeps working; the new binary can rely on `NOT NULL`.

```sql
-- migration NNN+2_user_email_verified_not_null.sql
ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
ALTER TABLE users ALTER COLUMN email_verified SET DEFAULT false;
```

After Step 3 is stable, you can roll back exactly **one** deploy without breakage. Rolling back beyond Step 1 is no longer safe — that's the expected consequence of expand-contract.

### Allowed in a single deploy

| Change | Safe in one deploy? |
| --------------------------------------- | ----------------------- |
| `CREATE TABLE` | yes |
| `CREATE INDEX CONCURRENTLY` | yes |
| Add nullable column | yes |
| Add column with constant default | yes (PG ≥ 11) |
| Backfill UPDATE (idempotent) | yes |
| `DROP INDEX CONCURRENTLY` | yes (read paths flex) |
| `DROP TABLE` (if no recent code reads it) | with caution |

### NOT allowed in a single deploy

| Change | Why |
| --------------------------------------- | -------------------------------------------- |
| `DROP COLUMN` | rollback's binary still selects it |
| `ALTER COLUMN ... NOT NULL` (no prior backfill) | rollback inserts NULL |
| `ALTER COLUMN ... TYPE` | rollback's binary expects old type |
| `RENAME COLUMN` | rollback's binary still references old name |
| `RENAME TABLE` | rollback queries old name |

### Reviewer checklist (PRs touching `veza-backend-api/migrations/`)

- [ ] Migration is **forward-only** (GORM doesn't run rollback SQL).
- [ ] Migration is **idempotent** (re-running on an already-migrated DB is a no-op — `IF NOT EXISTS`, `ON CONFLICT DO NOTHING`, etc.).
- [ ] No `DROP COLUMN`, `ALTER ... NOT NULL`, `RENAME` (or, if there is one, the PR description references the prior backfill PRs and explains why this is the contract step).
- [ ] If the migration takes a heavy lock (e.g. an `ALTER TABLE` that rewrites the table), use `CREATE INDEX CONCURRENTLY` or split the migration.
- [ ] App code changes assume both old and new schema are valid.

### When you must violate the rule (incident)

Sometimes a hot incident demands a destructive change ASAP and rollback is an acceptable risk. In that case:

1. Tag the PR with `migration:destructive`.
2. Document the rollback procedure in the PR body (manual SQL to recreate the dropped column, etc.).
3. Get a second pair of eyes on the migration before merge.
4. Block the corresponding rollback workflow for that env until you've verified the new schema is sticking.
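As a concrete illustration of the idempotency item in the checklist above, a migration written to be safely re-runnable might look like this (table, column, and index names are hypothetical; the `ON CONFLICT` clause assumes a unique constraint on `feature_flags.name`):

```sql
-- hypothetical example: safe to re-run on an already-migrated database
ALTER TABLE users ADD COLUMN IF NOT EXISTS marketing_opt_in BOOLEAN;

CREATE INDEX IF NOT EXISTS idx_users_marketing_opt_in
    ON users (marketing_opt_in);

-- seed row; no-op on re-run (assumes UNIQUE (name) on feature_flags)
INSERT INTO feature_flags (name, enabled)
VALUES ('marketing_emails', false)
ON CONFLICT (name) DO NOTHING;
```

Every statement is a no-op the second time around, so re-applying the file against an already-migrated database cannot fail.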
### Future hardening (not in v1.0.x)

A `squawk` linter step in `.forgejo/workflows/ci.yml` could scan `veza-backend-api/migrations/*.sql` and fail on `DROP COLUMN`, `ALTER ... NOT NULL`, `RENAME`. The discipline above is the v1.0 answer; tooling lands when the hand-rolled discipline starts missing things.
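Until such a linter lands, the gate could be approximated with a textual check in CI. A minimal sketch, assuming the file layout from this doc; the function name is made up, and unlike `squawk` this does not parse SQL, so expect occasional false positives:

```shell
#!/usr/bin/env sh
# Sketch: fail CI if a migration file contains a statement that would
# break rollback. Purely textual; a real linter parses the SQL instead.
set -eu

check_migration() {
  # Returns non-zero if the file matches a forbidden pattern.
  # "SET NOT NULL" targets ALTER COLUMN ... SET NOT NULL specifically;
  # NOT NULL inside a fresh CREATE TABLE is fine and won't match.
  if grep -Eiq 'DROP[[:space:]]+COLUMN|SET[[:space:]]+NOT[[:space:]]+NULL|RENAME[[:space:]]+(COLUMN|TO)' "$1"; then
    echo "forbidden statement in $1 (use expand-contract, or tag migration:destructive)" >&2
    return 1
  fi
}

# usage (placeholder filename):
#   check_migration veza-backend-api/migrations/NNN_description.sql
```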