veza/veza-backend-api/internal/core/connecterrors/sentinels.go

feat(marketplace): stripe reversal error disambiguation + CHECK constraint + E2E (v1.0.7 item B day 3)

Day-3 closure of item B. The three things day 2 deferred are now done:

1. Stripe error disambiguation. ReverseTransfer in StripeConnectService now parses stripe.Error.Code + HTTPStatusCode + Msg to emit the sentinels the worker routes on. Pre-day-3 the sentinels were declared but the service wrapped every error opaquely, making this the exact "temporary compromise frozen into permanent" pattern the audit was meant to prevent. It was flagged during review and fixed the same day. Mapping:

   * 404 + code=resource_missing → ErrTransferNotFound
   * 400 + msg matches "already" + "reverse" → ErrTransferAlreadyReversed
   * any other → transient (wrapped raw, retry)

   The "already reversed" case has no machine-readable code in stripe-go (unlike ChargeAlreadyRefunded for charges; the SDK doesn't enumerate the equivalent for transfers), so it's message-parsed. Fragility documented at the call site: if Stripe changes the wording, the worker treats the response as transient and eventually surfaces the row to permanently_failed after max retries. Worst-case regression is "benign case gets noisier", not data loss.

2. Migration 983: CHECK constraint chk_reversal_pending_has_next_retry_at CHECK (status != 'reversal_pending' OR next_retry_at IS NOT NULL). Added NOT VALID so the constraint is enforced on new writes without scanning existing rows; a follow-up VALIDATE can run once the table is known to be clean. Prevents the "invisible orphan" failure mode where a reversal_pending row with NULL next_retry_at would be skipped by any future stricter worker query.

3. End-to-end reversal flow test (reversal_e2e_test.go) chains three sub-scenarios:

   (a) happy path: refund.succeeded → reversal_pending → worker → reversed with stripe_reversal_id persisted;
   (b) invalid stripe_transfer_id → worker terminates rapidly to permanently_failed with a single Stripe call, no retries (the highest-value coverage per day-3 review);
   (c) already-reversed out-of-band → worker flips to reversed with informative message.

Architecture note: the sentinels were moved to a new leaf package `internal/core/connecterrors` because both marketplace (needs them for the worker's errors.Is checks) and services (needs them to emit) import them, and an import cycle (marketplace → monitoring → services) would form if either owned them directly. marketplace re-exports them as type aliases so the worker code reads naturally against the marketplace namespace.

New tests:

* services/stripe_connect_service_test.go: 7 cases on isAlreadyReversedMessage (pins Stripe's wording), 1 case on the error-classification shape. Doesn't invoke stripe.SetBackend; the translation logic is tested via a crafted *stripe.Error, the emission is trusted on the read of `errors.As` + the known shape of stripe.Error.
* marketplace/reversal_e2e_test.go: 3 end-to-end sub-tests chaining refund → worker against a dual-role mock. The invalid-id case asserts single-call-no-retries termination.
* Migration 983 applied cleanly to the local Postgres; constraint visible in \d seller_transfers as NOT VALID (behavior correct for future writes, existing rows grandfathered).

Self-assessment on day-2's struct-literal refactor of processSellerTransfers (deferred from day 2): the refactor is borderline, neither clearer nor more confusing than the original mutation-after-construct pattern. Logged in the v1.0.7-rc1 CHANGELOG as a post-v1.0.7 consideration: if GORM BeforeUpdate hooks prove cleaner on other state machines (axis 2), revisit the anti-mutation test approach.

CHANGELOG v1.0.7-rc1 entry added documenting items A + B end-to-end. Tag not yet applied: items C, D, E, F remain on the v1.0.7 plan. The rc1 tag lands when those four items close + the smoke probe validates the full cadence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 00:12:03 +00:00
// Package connecterrors holds error sentinels shared between the
// Stripe Connect HTTP client (internal/services) and the marketplace
// reversal worker (internal/core/marketplace). Neither of those
// packages can directly export a sentinel the other references without
// creating an import cycle (marketplace → monitoring → services), so
// the sentinels live here as a leaf package that depends on nothing
// and is depended on by both.
//
// Scope is intentionally narrow: only errors that the worker routes on
// via errors.Is live here. Generic Stripe errors remain wrapped raw
// by the service and treated as transient retry candidates by the
// worker. A new sentinel lands here only when the worker needs to
// branch on it differently from the transient case.
package connecterrors

import "errors"

// ErrTransferAlreadyReversed indicates the Stripe transfer has already
// been fully reversed out-of-band (via the Dashboard, another
// instance, or a prior worker tick that lost the response). Benign —
// the worker treats this as reversal success and flips the row to
// reversed with a distinctive error_message.
var ErrTransferAlreadyReversed = errors.New("stripe transfer already reversed")

// ErrTransferNotFound indicates the Stripe transfer id doesn't exist
// at Stripe. This is a data-integrity incident (our DB carries an id
// Stripe can't recognise), not a retry scenario. The worker
// terminates the row as permanently_failed and surfaces it for ops
// investigation — never retries, which would amplify the
// inconsistency.
var ErrTransferNotFound = errors.New("stripe transfer not found")