veza/veza-backend-api/migrations/983_seller_transfers_reversal_pending_not_null.sql

-- v1.0.7 item B (day 3): CHECK constraint guaranteeing that every
-- row in status='reversal_pending' has next_retry_at set.
--
-- Failure mode this prevents: a reversal_pending row with
-- next_retry_at=NULL is invisible to the worker's select
--   WHERE status='reversal_pending' AND (next_retry_at IS NULL OR next_retry_at <= NOW())
-- (actually the `IS NULL` branch IS included in the worker query,
-- so the row would be picked up — but the pattern is fragile:
-- anyone who later writes a stricter WHERE clause expecting
-- next_retry_at NOT NULL would silently orphan these rows).
--
-- Belt-and-suspenders: the constraint makes the invariant structural.
-- reverseSellerAccounting always sets next_retry_at=NOW() on
-- transition; the worker always sets a retry timestamp on
-- same-state retry; terminal transitions (reversed,
-- permanently_failed) clear next_retry_at back to NULL. No code
-- path produces NULL in combination with reversal_pending — this
-- constraint codifies that.
--
-- The constraint is a CHECK not a NOT NULL because the column is
-- legitimately NULL in the other statuses (pending, completed,
-- failed without pending retry, reversed, permanently_failed).

-- ADD CONSTRAINT is not natively IF NOT EXISTS in Postgres — the
-- DO block catches `duplicate_object` so re-running the migration
-- (test runs, manual psql apply, re-deploys) is a no-op. Same shape
-- the runner expects from other idempotent migrations in this
-- directory.
DO $$
BEGIN
    ALTER TABLE seller_transfers
        ADD CONSTRAINT chk_reversal_pending_has_next_retry_at
        CHECK (status <> 'reversal_pending' OR next_retry_at IS NOT NULL)
        NOT VALID;
EXCEPTION
    WHEN duplicate_object THEN
        -- constraint already present (prior manual apply or prior run)
        NULL;
END $$;

-- NOT VALID creates the constraint without scanning existing rows.
-- Pre-v1.0.7.2 rows are grandfathered — item A + day 2 of B don't
-- produce any reversal_pending rows with NULL next_retry_at, but
-- if any slipped in via tests or manual ops writes, validation
-- would fail the migration. Use VALIDATE CONSTRAINT in a follow-up
-- once the table is known to be clean:
--
--   ALTER TABLE seller_transfers VALIDATE CONSTRAINT chk_reversal_pending_has_next_retry_at;
--
-- Running VALIDATE requires a SHARE UPDATE EXCLUSIVE lock (not
-- AccessExclusive), which is cheap and non-blocking for readers.
-- Left as a separate one-shot so the deploy is independent of
-- table cleanup timing.
--
-- Post-validate, future INSERTs / UPDATEs that violate the
-- invariant will fail synchronously — the constraint is enforced
-- regardless of NOT VALID / VALIDATED state.
feat(marketplace): stripe reversal error disambiguation + CHECK constraint + E2E — v1.0.7 item B day 3 Day-3 closure of item B. The three things day 2 deferred are now done: 1. Stripe error disambiguation. ReverseTransfer in StripeConnectService now parses stripe.Error.Code + HTTPStatusCode + Msg to emit the sentinels the worker routes on. Pre-day-3 the sentinels were declared but the service wrapped every error opaquely, making this the exact "temporary compromise frozen into permanent" pattern the audit was meant to prevent — flagged during review and fixed same day. Mapping: * 404 + code=resource_missing → ErrTransferNotFound * 400 + msg matches "already" + "reverse" → ErrTransferAlreadyReversed * any other → transient (wrapped raw, retry) The "already reversed" case has no machine-readable code in stripe-go (unlike ChargeAlreadyRefunded for charges — the SDK doesn't enumerate the equivalent for transfers), so it's message-parsed. Fragility documented at the call site: if Stripe changes the wording, the worker treats the response as transient and eventually surfaces the row to permanently_failed after max retries. Worst-case regression is "benign case gets noisier", not data loss. 2. Migration 983: CHECK constraint chk_reversal_pending_has_next_ retry_at CHECK (status != 'reversal_pending' OR next_retry_at IS NOT NULL). Added NOT VALID so the constraint is enforced on new writes without scanning existing rows; a follow-up VALIDATE can run once the table is known to be clean. Prevents the "invisible orphan" failure mode where a reversal_pending row with NULL next_retry_at would be skipped by any future stricter worker query. 3. End-to-end reversal flow test (reversal_e2e_test.go) chains three sub-scenarios: (a) happy path — refund.succeeded → reversal_pending → worker → reversed with stripe_reversal_id persisted; (b) invalid stripe_transfer_id → worker terminates rapidly to permanently_failed with single Stripe call, no retries (the highest-value coverage per day-3 review); (c) already-reversed out-of-band → worker flips to reversed with informative message. Architecture note — the sentinels were moved to a new leaf package `internal/core/connecterrors` because both marketplace (needs them for the worker's errors.Is checks) and services (needs them to emit) import them, and an import cycle (marketplace → monitoring → services) would form if either owned them directly. marketplace re-exports them as type aliases so the worker code reads naturally against the marketplace namespace. New tests: * services/stripe_connect_service_test.go — 7 cases on isAlreadyReversedMessage (pins Stripe's wording), 1 case on the error-classification shape. Doesn't invoke stripe.SetBackend — the translation logic is tested via a crafted stripe.Error, the emission is trusted on the read of `errors.As` + the known shape of stripe.Error. marketplace/reversal_e2e_test.go — 3 end-to-end sub-tests chaining refund → worker against a dual-role mock. The invalid-id case asserts single-call-no-retries termination. * Migration 983 applied cleanly to the local Postgres; constraint visible in \d seller_transfers as NOT VALID (behavior correct for future writes, existing rows grandfathered). Self-assessment on day-2's struct-literal refactor of processSellerTransfers (deferred from day 2): The refactor is borderline — neither clearer nor confusing than the original mutation-after-construct pattern. Logged in the v1.0.7-rc1 CHANGELOG as a post-v1.0.7 consideration: if GORM BeforeUpdate hooks prove cleaner on other state machines (axis 2), revisit the anti-mutation test approach. CHANGELOG v1.0.7-rc1 entry added documenting items A + B end-to-end. Tag not yet applied — items C, D, E, F remain on the v1.0.7 plan. The rc1 tag lands when those four items close + the smoke probe validates the full cadence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> 2026-04-18 00:12:03 +00:00			`-- v1.0.7 item B (day 3): CHECK constraint guaranteeing that every`
			`-- row in status='reversal_pending' has next_retry_at set.`
			`--`
			`-- Failure mode this prevents: a reversal_pending row with`
			`-- next_retry_at=NULL is invisible to the worker's select`
			`-- WHERE status='reversal_pending' AND (next_retry_at IS NULL OR next_retry_at <= NOW())`
			-- (actually the `IS NULL` branch IS included in the worker query,
			`-- so the row would be picked up — but the pattern is fragile:`
			`-- anyone who later writes a stricter WHERE clause expecting`
			`-- next_retry_at NOT NULL would silently orphan these rows).`
			`--`
			`-- Belt-and-suspenders: the constraint makes the invariant structural.`
			`-- reverseSellerAccounting always sets next_retry_at=NOW() on`
			`-- transition; the worker always sets a retry timestamp on`
			`-- same-state retry; terminal transitions (reversed,`
			`-- permanently_failed) clear next_retry_at back to NULL. No code`
			`-- path produces NULL in combination with reversal_pending — this`
			`-- constraint codifies that.`
			`--`
			`-- The constraint is a CHECK not a NOT NULL because the column is`
			`-- legitimately NULL in the other statuses (pending, completed,`
			`-- failed without pending retry, reversed, permanently_failed).`

fix(migrations): make 983 CHECK constraint idempotent via DO block Migration 983 was crashing backend startup on my local DB because (a) I'd manually applied it via psql during B day 3 development before the migration runner saw it, so the constraint existed but was not tracked; (b) the migration used plain ADD CONSTRAINT which Postgres doesn't support with IF NOT EXISTS for CHECK constraints. Fix: wrap the ALTER TABLE in a DO block that catches `duplicate_object` — re-running the migration becomes a no-op, matches the idempotency contract the other migrations in this directory observe. Any env where the constraint already exists (manual apply, prior successful run) now proceeds cleanly. Verified: backend starts cleanly after the fix. Pre-rc1 blocker resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> 2026-04-18 02:08:14 +00:00			`-- ADD CONSTRAINT is not natively IF NOT EXISTS in Postgres — the`
			-- DO block catches `duplicate_object` so re-running the migration
			`-- (test runs, manual psql apply, re-deploys) is a no-op. Same shape`
			`-- the runner expects from other idempotent migrations in this`
			`-- directory.`
			`DO $$`
			`BEGIN`
			`ALTER TABLE seller_transfers`
			`ADD CONSTRAINT chk_reversal_pending_has_next_retry_at`
			`CHECK (status <> 'reversal_pending' OR next_retry_at IS NOT NULL)`
			`NOT VALID;`
			`EXCEPTION`
			`WHEN duplicate_object THEN`
			`-- constraint already present (prior manual apply or prior run)`
			`NULL;`
			`END $$;`
feat(marketplace): stripe reversal error disambiguation + CHECK constraint + E2E — v1.0.7 item B day 3 Day-3 closure of item B. The three things day 2 deferred are now done: 1. Stripe error disambiguation. ReverseTransfer in StripeConnectService now parses stripe.Error.Code + HTTPStatusCode + Msg to emit the sentinels the worker routes on. Pre-day-3 the sentinels were declared but the service wrapped every error opaquely, making this the exact "temporary compromise frozen into permanent" pattern the audit was meant to prevent — flagged during review and fixed same day. Mapping: * 404 + code=resource_missing → ErrTransferNotFound * 400 + msg matches "already" + "reverse" → ErrTransferAlreadyReversed * any other → transient (wrapped raw, retry) The "already reversed" case has no machine-readable code in stripe-go (unlike ChargeAlreadyRefunded for charges — the SDK doesn't enumerate the equivalent for transfers), so it's message-parsed. Fragility documented at the call site: if Stripe changes the wording, the worker treats the response as transient and eventually surfaces the row to permanently_failed after max retries. Worst-case regression is "benign case gets noisier", not data loss. 2. Migration 983: CHECK constraint chk_reversal_pending_has_next_ retry_at CHECK (status != 'reversal_pending' OR next_retry_at IS NOT NULL). Added NOT VALID so the constraint is enforced on new writes without scanning existing rows; a follow-up VALIDATE can run once the table is known to be clean. Prevents the "invisible orphan" failure mode where a reversal_pending row with NULL next_retry_at would be skipped by any future stricter worker query. 3. End-to-end reversal flow test (reversal_e2e_test.go) chains three sub-scenarios: (a) happy path — refund.succeeded → reversal_pending → worker → reversed with stripe_reversal_id persisted; (b) invalid stripe_transfer_id → worker terminates rapidly to permanently_failed with single Stripe call, no retries (the highest-value coverage per day-3 review); (c) already-reversed out-of-band → worker flips to reversed with informative message. Architecture note — the sentinels were moved to a new leaf package `internal/core/connecterrors` because both marketplace (needs them for the worker's errors.Is checks) and services (needs them to emit) import them, and an import cycle (marketplace → monitoring → services) would form if either owned them directly. marketplace re-exports them as type aliases so the worker code reads naturally against the marketplace namespace. New tests: * services/stripe_connect_service_test.go — 7 cases on isAlreadyReversedMessage (pins Stripe's wording), 1 case on the error-classification shape. Doesn't invoke stripe.SetBackend — the translation logic is tested via a crafted stripe.Error, the emission is trusted on the read of `errors.As` + the known shape of stripe.Error. marketplace/reversal_e2e_test.go — 3 end-to-end sub-tests chaining refund → worker against a dual-role mock. The invalid-id case asserts single-call-no-retries termination. * Migration 983 applied cleanly to the local Postgres; constraint visible in \d seller_transfers as NOT VALID (behavior correct for future writes, existing rows grandfathered). Self-assessment on day-2's struct-literal refactor of processSellerTransfers (deferred from day 2): The refactor is borderline — neither clearer nor confusing than the original mutation-after-construct pattern. Logged in the v1.0.7-rc1 CHANGELOG as a post-v1.0.7 consideration: if GORM BeforeUpdate hooks prove cleaner on other state machines (axis 2), revisit the anti-mutation test approach. CHANGELOG v1.0.7-rc1 entry added documenting items A + B end-to-end. Tag not yet applied — items C, D, E, F remain on the v1.0.7 plan. The rc1 tag lands when those four items close + the smoke probe validates the full cadence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> 2026-04-18 00:12:03 +00:00
			`-- NOT VALID creates the constraint without scanning existing rows.`
			`-- Pre-v1.0.7.2 rows are grandfathered — item A + day 2 of B don't`
			`-- produce any reversal_pending rows with NULL next_retry_at, but`
			`-- if any slipped in via tests or manual ops writes, validation`
			`-- would fail the migration. Use VALIDATE CONSTRAINT in a follow-up`
			`-- once the table is known to be clean:`
			`--`
			`-- ALTER TABLE seller_transfers VALIDATE CONSTRAINT chk_reversal_pending_has_next_retry_at;`
			`--`
			`-- Running VALIDATE requires a SHARE UPDATE EXCLUSIVE lock (not`
			`-- AccessExclusive), which is cheap and non-blocking for readers.`
			`-- Left as a separate one-shot so the deploy is independent of`
			`-- table cleanup timing.`
			`--`
			`-- Post-validate, future INSERTs / UPDATEs that violate the`
			`-- invariant will fail synchronously — the constraint is enforced`
			`-- regardless of NOT VALID / VALIDATED state.`