feat(marketplace): stripe reversal error disambiguation + CHECK constraint + E2E — v1.0.7 item B day 3
Day-3 closure of item B. The three things day 2 deferred are now done:
1. Stripe error disambiguation.
ReverseTransfer in StripeConnectService now parses
stripe.Error.Code + HTTPStatusCode + Msg to emit the sentinels
the worker routes on. Pre-day-3 the sentinels were declared but
the service wrapped every error opaquely, making this the exact
"temporary compromise frozen into permanent" pattern the audit
was meant to prevent — flagged during review and fixed same day.
Mapping:
* 404 + code=resource_missing → ErrTransferNotFound
* 400 + msg matches "already" + "reverse" → ErrTransferAlreadyReversed
* any other → transient (wrapped raw, retry)
The "already reversed" case has no machine-readable code in
stripe-go (unlike ChargeAlreadyRefunded for charges — the SDK
doesn't enumerate the equivalent for transfers), so it's
message-parsed. Fragility documented at the call site: if Stripe
changes the wording, the worker treats the response as transient
and eventually surfaces the row to permanently_failed after max
retries. Worst-case regression is "benign case gets noisier",
not data loss.
2. Migration 983: CHECK constraint chk_reversal_pending_has_next_
retry_at CHECK (status != 'reversal_pending' OR next_retry_at
IS NOT NULL). Added NOT VALID so the constraint is enforced on
new writes without scanning existing rows; a follow-up VALIDATE
can run once the table is known to be clean. Prevents the
"invisible orphan" failure mode where a reversal_pending row
with NULL next_retry_at would be skipped by any future stricter
worker query.
3. End-to-end reversal flow test (reversal_e2e_test.go) chains
three sub-scenarios: (a) happy path — refund.succeeded →
reversal_pending → worker → reversed with stripe_reversal_id
persisted; (b) invalid stripe_transfer_id → worker terminates
rapidly to permanently_failed with single Stripe call, no
retries (the highest-value coverage per day-3 review); (c)
already-reversed out-of-band → worker flips to reversed with
informative message.
Architecture note — the sentinels were moved to a new leaf
package `internal/core/connecterrors` because both marketplace
(needs them for the worker's errors.Is checks) and services (needs
them to emit) import them, and an import cycle
(marketplace → monitoring → services) would form if either owned
them directly. marketplace re-exports them as type aliases so the
worker code reads naturally against the marketplace namespace.
New tests:
* services/stripe_connect_service_test.go — 7 cases on
isAlreadyReversedMessage (pins Stripe's wording), 1 case on
the error-classification shape. Doesn't invoke stripe.SetBackend
— the translation logic is tested via a crafted *stripe.Error,
the emission is trusted on the read of `errors.As` + the known
shape of stripe.Error.
* marketplace/reversal_e2e_test.go — 3 end-to-end sub-tests
chaining refund → worker against a dual-role mock. The
invalid-id case asserts single-call-no-retries termination.
* Migration 983 applied cleanly to the local Postgres; constraint
visible in \d seller_transfers as NOT VALID (behavior correct
for future writes, existing rows grandfathered).
Self-assessment on day-2's struct-literal refactor of
processSellerTransfers (deferred from day 2):
The refactor is borderline — neither clearer nor confusing than the
original mutation-after-construct pattern. Logged in the v1.0.7-rc1
CHANGELOG as a post-v1.0.7 consideration: if GORM BeforeUpdate
hooks prove cleaner on other state machines (axis 2), revisit the
anti-mutation test approach.
CHANGELOG v1.0.7-rc1 entry added documenting items A + B end-to-end.
Tag not yet applied — items C, D, E, F remain on the v1.0.7 plan.
The rc1 tag lands when those four items close + the smoke probe
validates the full cadence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
d2bb9c0e78
commit
1a133af9ac
8 changed files with 593 additions and 6 deletions
140
CHANGELOG.md
140
CHANGELOG.md
|
|
@ -1,5 +1,145 @@
|
|||
# Changelog - Veza
|
||||
|
||||
## [v1.0.7-rc1] - in progress (2026-04-18)
|
||||
|
||||
Release-candidate entry — items A and B delivered, items C/D/E/F
|
||||
remain. See `docs/audit-2026-04/v107-plan.md` for the full scope.
|
||||
This CHANGELOG section documents what's landed against main so far;
|
||||
a final v1.0.7 tag requires the remaining items to close.
|
||||
|
||||
### Item A — persist stripe_transfer_id on seller_transfers
|
||||
|
||||
Pre-v1.0.7 `TransferService.CreateTransfer` returned `error` only —
|
||||
the Stripe transfer id was discarded (the single line
|
||||
`_, err := transfer.New(params)` threw it away) and the
|
||||
`stripe_transfer_id` column sat empty on every row. This blocked
|
||||
item B's reversal worker from identifying which transfer to reverse.
|
||||
|
||||
* Interface signature change: `(..., error)` → `(id string, ..., error)`.
|
||||
* Four call sites capture and persist the id:
|
||||
processSellerTransfers (new sale), TransferRetryWorker (retry
|
||||
recovery), admin_transfer_handler.RetryTransfer (manual admin
|
||||
retry), payout.RequestPayout (writes to SellerPayout.ExternalPayoutID).
|
||||
* Four test mocks extended. Three assertions added verifying
|
||||
persistence on the happy path; one failure-path test confirms
|
||||
the id is NOT persisted when the provider errors.
|
||||
* Migration `981_seller_transfers_stripe_reversal_id.sql` adds
|
||||
`stripe_reversal_id` (prep for B) and partial UNIQUE indexes on
|
||||
both id columns (matching the v1.0.6.1 pattern for
|
||||
refunds.hyperswitch_refund_id).
|
||||
* Defensive guard: `StripeConnectService.CreateTransfer` fails the
|
||||
call if Stripe returns `(tr, nil)` with `tr.ID == ""` — the
|
||||
SDK's invariant, but a violation would leave the row permanently
|
||||
un-reversible, so better to fail loudly.
|
||||
|
||||
Backfill for historical rows where the id is empty (ops task #38)
|
||||
is tracked separately: pre-v1.0.7 transfers cannot be
|
||||
auto-reversed; the backfill CLI queries Stripe's transfers.List by
|
||||
metadata[order_id] to populate missing ids, acceptable to leave NULL
|
||||
per v107-plan.
|
||||
|
||||
### Item B — async Stripe Connect reversal worker
|
||||
|
||||
`reverseSellerAccounting` moved from synchronous "mark row reversed
|
||||
locally without calling Stripe" to asynchronous "mark row
|
||||
reversal_pending, let the worker reconcile out-of-band". Decouples
|
||||
buyer-facing refund UX (completes immediately) from Stripe
|
||||
settlement health (may retry, may 404 if already reversed, may
|
||||
permanently fail and need ops attention).
|
||||
|
||||
State machine — single source of truth in
|
||||
`internal/core/marketplace/transfer_transitions.go`:
|
||||
|
||||
pending → {completed, failed}
|
||||
completed → {reversal_pending} (item B)
|
||||
failed → {completed, permanently_failed}
|
||||
reversal_pending → {reversed, permanently_failed} (item B)
|
||||
reversed → {} (terminal)
|
||||
permanently_failed → {} (terminal)
|
||||
|
||||
`SellerTransfer.TransitionStatus(tx, to, extras)` validates against
|
||||
the matrix and performs a conditional UPDATE guarded by the
|
||||
expected `from` (optimistic lock semantics — concurrent workers
|
||||
racing on the same row find RowsAffected=0 and log a conflict).
|
||||
`TestNoDirectTransferStatusMutation` greps the marketplace package
|
||||
for raw `.Status = "..."` or
|
||||
`Model(&SellerTransfer{}).Update("status"...)` outside a minimal
|
||||
allowlist and fails if found; validated against an injected
|
||||
violation during development.
|
||||
|
||||
StripeReversalWorker (`internal/core/marketplace/reversal_worker.go`):
|
||||
|
||||
* Tick interval: `REVERSAL_CHECK_INTERVAL` (default 1m).
|
||||
* Batch limit 20 per tick, indexed on partial composite
|
||||
`(status, next_retry_at) WHERE status='reversal_pending'`
|
||||
(migration 982).
|
||||
* Exponential backoff: `REVERSAL_BACKOFF_BASE` * 2^retry_count,
|
||||
capped at `REVERSAL_BACKOFF_MAX` (defaults 1m and 1h).
|
||||
* `REVERSAL_MAX_RETRIES` (default 5) transitions the row to
|
||||
permanently_failed.
|
||||
* Legacy rows with empty stripe_transfer_id → permanently_failed
|
||||
immediately with a distinctive error_message, so ops can find
|
||||
them via grep once the backfill CLI (task #38) lands.
|
||||
|
||||
Stripe error disambiguation (day 3 closure of the day-2 dead-code
|
||||
gap):
|
||||
|
||||
* 404 + `resource_missing` → `ErrTransferNotFound` → worker
|
||||
transitions to permanently_failed (data-integrity signal, never
|
||||
retry — would amplify the inconsistency).
|
||||
* 400 + message contains "already" + "reversal/reversed" →
|
||||
`ErrTransferAlreadyReversed` → worker treats as success
|
||||
(someone reversed out-of-band via Dashboard or another
|
||||
instance; idempotent).
|
||||
* Any other error is transient → retry with backoff.
|
||||
* Sentinels live in `internal/core/connecterrors` as a leaf
|
||||
package because marketplace and services both need them and
|
||||
an import cycle (marketplace → monitoring → services) would
|
||||
form if either owned them directly.
|
||||
|
||||
Migration `982` adds the partial composite index for the worker's
|
||||
hot path. Migration `983` adds a CHECK constraint
|
||||
(`status != 'reversal_pending' OR next_retry_at IS NOT NULL`) so
|
||||
the invariant that every reversal_pending row carries a retry
|
||||
timestamp is structural — a bug that ever writes NULL
|
||||
next_retry_at on a reversal_pending row fails the INSERT/UPDATE at
|
||||
the DB, not silently orphans the row.
|
||||
|
||||
Worker covers 9 unit-test cases plus 3 end-to-end scenarios
|
||||
(refund → worker → reversed, including the invalid-stripe_transfer_id
|
||||
terminal path). Integration smoke against local Postgres confirmed
|
||||
migrations 981/982/983 apply cleanly.
|
||||
|
||||
Behavior change visible to tests: the refund.succeeded webhook now
|
||||
leaves the seller_transfer at reversal_pending rather than reversed
|
||||
directly. `TestProcessRefundWebhook_SucceededFinalizesState`
|
||||
updated to assert the new expected state and the presence of
|
||||
next_retry_at.
|
||||
|
||||
Worker wired in `cmd/api/main.go` alongside TransferRetryWorker,
|
||||
sharing the same StripeConnectService instance. Gated on
|
||||
`StripeConnectEnabled && StripeConnectSecretKey != ""` (same as
|
||||
TransferRetryWorker) — in dev without Stripe configured, the
|
||||
worker never starts.
|
||||
|
||||
### Notes
|
||||
|
||||
* `REVERSAL_*` env vars documented in `.env.template` so ops can
|
||||
tune without source-diving.
|
||||
* Anti-mutation test decision (grep-based rather than GORM
|
||||
BeforeUpdate hook) forced a minor refactor of
|
||||
`processSellerTransfers` to construct SellerTransfer rows in a
|
||||
single struct literal rather than mutating Status in place
|
||||
after construction. The refactor is neither clearer nor more
|
||||
confusing than the original — borderline stylistic. Logged as
|
||||
a post-v1.0.7 consideration: if the GORM hook approach proves
|
||||
cleaner in axis 2 (state-machine transitions for other
|
||||
entities), revisit and potentially retire the grep test in
|
||||
favor of a hook.
|
||||
* Item A unknown #2 (backfill coverage on historical transfers)
|
||||
tracked as task #38; item B unknown: none surfaced during
|
||||
implementation.
|
||||
|
||||
## [v1.0.6.2] - 2026-04-17
|
||||
|
||||
### Hotfix — subscription payment-gate bypass
|
||||
|
|
|
|||
31
veza-backend-api/internal/core/connecterrors/sentinels.go
Normal file
31
veza-backend-api/internal/core/connecterrors/sentinels.go
Normal file
|
|
@ -0,0 +1,31 @@
|
|||
// Package connecterrors holds error sentinels shared between the
|
||||
// Stripe Connect HTTP client (internal/services) and the marketplace
|
||||
// reversal worker (internal/core/marketplace). Neither of those
|
||||
// packages can directly export a sentinel the other references without
|
||||
// creating an import cycle (marketplace → monitoring → services), so
|
||||
// the sentinels live here as a leaf package that depends on nothing
|
||||
// and is depended on by both.
|
||||
//
|
||||
// Scope is intentionally narrow: only errors that the worker routes on
|
||||
// via errors.Is live here. Generic Stripe errors remain wrapped raw
|
||||
// by the service and treated as transient retry candidates by the
|
||||
// worker. A new sentinel lands here only when the worker needs to
|
||||
// branch on it differently from the transient case.
|
||||
package connecterrors
|
||||
|
||||
import "errors"
|
||||
|
||||
// ErrTransferAlreadyReversed indicates the Stripe transfer has already
|
||||
// been fully reversed out-of-band (via the Dashboard, another
|
||||
// instance, or a prior worker tick that lost the response). Benign —
|
||||
// the worker treats this as reversal success and flips the row to
|
||||
// reversed with a distinctive error_message.
|
||||
var ErrTransferAlreadyReversed = errors.New("stripe transfer already reversed")
|
||||
|
||||
// ErrTransferNotFound indicates the Stripe transfer id doesn't exist
|
||||
// at Stripe. This is a data-integrity incident (our DB carries an id
|
||||
// Stripe can't recognise), not a retry scenario. The worker
|
||||
// terminates the row as permanently_failed and surfaces it for ops
|
||||
// investigation — never retries, which would amplify the
|
||||
// inconsistency.
|
||||
var ErrTransferNotFound = errors.New("stripe transfer not found")
|
||||
221
veza-backend-api/internal/core/marketplace/reversal_e2e_test.go
Normal file
221
veza-backend-api/internal/core/marketplace/reversal_e2e_test.go
Normal file
|
|
@ -0,0 +1,221 @@
|
|||
package marketplace
|
||||
|
||||
import (
|
||||
"context"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"github.com/google/uuid"
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stretchr/testify/require"
|
||||
"go.uber.org/zap"
|
||||
"gorm.io/driver/sqlite"
|
||||
"gorm.io/gorm"
|
||||
|
||||
"veza-backend-api/internal/core/connecterrors"
|
||||
"veza-backend-api/internal/models"
|
||||
)
|
||||
|
||||
// TestReversalFlow_EndToEnd exercises the full v1.0.7 item B chain in
|
||||
// one test: refund.succeeded webhook → reverseSellerAccounting flips
|
||||
// seller_transfer to reversal_pending → StripeReversalWorker picks it
|
||||
// up on its next batch → worker transitions to reversed + persists
|
||||
// stripe_reversal_id. This is the critical path that days 1 and 2
|
||||
// each covered in isolation; day 3 chains them to prove they
|
||||
// compose correctly.
|
||||
//
|
||||
// Two scenarios run in sub-tests: the happy-path success, and the
|
||||
// invalid-stripe_transfer_id case where Stripe would reply 404
|
||||
// resource_missing — the worker must route that to
|
||||
// permanently_failed rapidly (no retries), not silently mask it. The
|
||||
// invalid-id path is the scenario flagged explicitly as the highest-
|
||||
// value test coverage gap in the day-3 review.
|
||||
func TestReversalFlow_EndToEnd(t *testing.T) {
|
||||
t.Run("happy path — reversal_pending transitions to reversed", func(t *testing.T) {
|
||||
f, xferMock := newE2EFixture(t)
|
||||
xferMock.reversalID = "rev_e2e_happy"
|
||||
|
||||
// Phase 1: refund succeeds → seller_transfer goes to
|
||||
// reversal_pending via reverseSellerAccounting.
|
||||
refund, err := f.svc.RefundOrder(context.Background(), f.orderID, f.buyerID, "e2e")
|
||||
require.NoError(t, err)
|
||||
require.NoError(t, f.svc.ProcessRefundWebhook(context.Background(),
|
||||
makeWebhook("refund.succeeded", refund.HyperswitchRefundID, "succeeded", "")))
|
||||
|
||||
var afterRefund SellerTransfer
|
||||
require.NoError(t, f.db.Where("order_id = ?", f.orderID).First(&afterRefund).Error)
|
||||
require.Equal(t, TransferStatusReversalPending, afterRefund.Status,
|
||||
"post-refund row must be in reversal_pending awaiting the worker")
|
||||
require.NotNil(t, afterRefund.NextRetryAt)
|
||||
|
||||
// Phase 2: worker tick → row advances to reversed, id persisted.
|
||||
worker := NewStripeReversalWorker(f.db, xferMock, zap.NewNop(),
|
||||
time.Minute, 5, time.Minute, time.Hour)
|
||||
worker.processBatch(context.Background())
|
||||
|
||||
var afterWorker SellerTransfer
|
||||
require.NoError(t, f.db.First(&afterWorker, afterRefund.ID).Error)
|
||||
assert.Equal(t, TransferStatusReversed, afterWorker.Status)
|
||||
assert.Equal(t, "rev_e2e_happy", afterWorker.StripeReversalID)
|
||||
assert.Nil(t, afterWorker.NextRetryAt)
|
||||
|
||||
// The mock recorded exactly one reversal call, against the
|
||||
// transfer id the fixture planted pre-refund.
|
||||
require.Len(t, xferMock.reversalCalls, 1)
|
||||
assert.Equal(t, "tr_e2e_fixture", xferMock.reversalCalls[0].StripeTransferID)
|
||||
assert.Equal(t, "refund", xferMock.reversalCalls[0].Reason)
|
||||
})
|
||||
|
||||
t.Run("invalid stripe_transfer_id — worker terminates to permanently_failed", func(t *testing.T) {
|
||||
f, xferMock := newE2EFixture(t)
|
||||
// Simulate Stripe's response when the transfer id doesn't exist
|
||||
// at Stripe. Happens in two realistic scenarios: (a) pre-v1.0.7
|
||||
// rows where the backfill CLI (task #38) never populated an id
|
||||
// and the legacy-row path triggers, (b) a corrupted id from a
|
||||
// bug somewhere else in the system. The latter is the case
|
||||
// this test exercises — id is syntactically valid but Stripe
|
||||
// rejects it.
|
||||
xferMock.reversalErr = connecterrors.ErrTransferNotFound
|
||||
|
||||
refund, err := f.svc.RefundOrder(context.Background(), f.orderID, f.buyerID, "e2e-missing")
|
||||
require.NoError(t, err)
|
||||
require.NoError(t, f.svc.ProcessRefundWebhook(context.Background(),
|
||||
makeWebhook("refund.succeeded", refund.HyperswitchRefundID, "succeeded", "")))
|
||||
|
||||
worker := NewStripeReversalWorker(f.db, xferMock, zap.NewNop(),
|
||||
time.Minute, 5, time.Minute, time.Hour)
|
||||
worker.processBatch(context.Background())
|
||||
|
||||
var st SellerTransfer
|
||||
require.NoError(t, f.db.Where("order_id = ?", f.orderID).First(&st).Error)
|
||||
assert.Equal(t, TransferStatusPermanentlyFailed, st.Status,
|
||||
"worker must terminate on resource_missing, never retry (data-integrity signal)")
|
||||
assert.Contains(t, st.ErrorMessage, "not found")
|
||||
assert.Nil(t, st.NextRetryAt)
|
||||
// Single call — no retries on this error class.
|
||||
require.Len(t, xferMock.reversalCalls, 1)
|
||||
})
|
||||
|
||||
t.Run("already reversed out-of-band — worker treats as success", func(t *testing.T) {
|
||||
f, xferMock := newE2EFixture(t)
|
||||
xferMock.reversalErr = connecterrors.ErrTransferAlreadyReversed
|
||||
|
||||
refund, err := f.svc.RefundOrder(context.Background(), f.orderID, f.buyerID, "e2e-already")
|
||||
require.NoError(t, err)
|
||||
require.NoError(t, f.svc.ProcessRefundWebhook(context.Background(),
|
||||
makeWebhook("refund.succeeded", refund.HyperswitchRefundID, "succeeded", "")))
|
||||
|
||||
worker := NewStripeReversalWorker(f.db, xferMock, zap.NewNop(),
|
||||
time.Minute, 5, time.Minute, time.Hour)
|
||||
worker.processBatch(context.Background())
|
||||
|
||||
var st SellerTransfer
|
||||
require.NoError(t, f.db.Where("order_id = ?", f.orderID).First(&st).Error)
|
||||
assert.Equal(t, TransferStatusReversed, st.Status,
|
||||
"already-reversed at Stripe = our job is done, flip to reversed with an informative message")
|
||||
assert.Contains(t, st.ErrorMessage, "already")
|
||||
})
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Fixture — refund-capable Service with a dual-role mock (payment +
|
||||
// transfer reversal). Local to the e2e test to keep the regular
|
||||
// refund fixture and reversal worker fixture unchanged.
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
type e2eTransferMock struct {
|
||||
reversalID string
|
||||
reversalErr error
|
||||
reversalCalls []struct {
|
||||
StripeTransferID string
|
||||
Reason string
|
||||
}
|
||||
}
|
||||
|
||||
func (m *e2eTransferMock) CreateTransfer(_ context.Context, _ uuid.UUID, _ int64, _, _ string) (string, error) {
|
||||
return "tr_e2e_fixture", nil
|
||||
}
|
||||
|
||||
func (m *e2eTransferMock) ReverseTransfer(_ context.Context, stripeTransferID string, _ *int64, reason string) (string, error) {
|
||||
m.reversalCalls = append(m.reversalCalls, struct {
|
||||
StripeTransferID string
|
||||
Reason string
|
||||
}{stripeTransferID, reason})
|
||||
if m.reversalErr != nil {
|
||||
return "", m.reversalErr
|
||||
}
|
||||
id := m.reversalID
|
||||
if id == "" {
|
||||
id = "rev_mock"
|
||||
}
|
||||
return id, nil
|
||||
}
|
||||
|
||||
type e2eFixture struct {
|
||||
db *gorm.DB
|
||||
svc *Service
|
||||
buyerID uuid.UUID
|
||||
sellerID uuid.UUID
|
||||
orderID uuid.UUID
|
||||
}
|
||||
|
||||
func newE2EFixture(t *testing.T) (*e2eFixture, *e2eTransferMock) {
|
||||
t.Helper()
|
||||
db, err := gorm.Open(sqlite.Open(":memory:"), &gorm.Config{})
|
||||
require.NoError(t, err)
|
||||
require.NoError(t, db.AutoMigrate(
|
||||
&models.User{},
|
||||
&Product{},
|
||||
&Order{},
|
||||
&OrderItem{},
|
||||
&License{},
|
||||
&Refund{},
|
||||
&SellerBalance{},
|
||||
&SellerTransfer{},
|
||||
&models.Track{},
|
||||
))
|
||||
|
||||
paymentMock := &mockRefundPaymentProvider{}
|
||||
xferMock := &e2eTransferMock{}
|
||||
svc := NewService(db, zap.NewNop(), nil,
|
||||
WithPaymentProvider(paymentMock),
|
||||
WithTransferService(xferMock, 0.10))
|
||||
|
||||
buyerID := uuid.New()
|
||||
sellerID := uuid.New()
|
||||
trackID := uuid.New()
|
||||
productID := uuid.New()
|
||||
orderID := uuid.New()
|
||||
|
||||
require.NoError(t, db.Create(&models.User{ID: buyerID, Username: "buyer", Email: "b@e"}).Error)
|
||||
require.NoError(t, db.Create(&models.User{ID: sellerID, Username: "seller", Email: "s@e"}).Error)
|
||||
require.NoError(t, db.Create(&models.Track{ID: trackID, UserID: sellerID, FilePath: "/t.mp3"}).Error)
|
||||
require.NoError(t, db.Create(&Product{
|
||||
ID: productID, SellerID: sellerID, Title: "P", Price: 9.99,
|
||||
ProductType: "track", TrackID: &trackID, Status: ProductStatusActive,
|
||||
}).Error)
|
||||
require.NoError(t, db.Create(&Order{
|
||||
ID: orderID, BuyerID: buyerID, TotalAmount: 9.99, Currency: "EUR",
|
||||
Status: "completed", HyperswitchPaymentID: "pay_e2e",
|
||||
}).Error)
|
||||
require.NoError(t, db.Create(&OrderItem{ID: uuid.New(), OrderID: orderID, ProductID: productID, Price: 9.99}).Error)
|
||||
require.NoError(t, db.Create(&License{
|
||||
BuyerID: buyerID, TrackID: trackID, ProductID: productID, OrderID: orderID, Type: LicenseBasic,
|
||||
}).Error)
|
||||
require.NoError(t, db.Create(&SellerBalance{
|
||||
SellerID: sellerID, AvailableCents: 899, TotalEarnedCents: 899, Currency: "EUR",
|
||||
}).Error)
|
||||
// Pre-existing completed seller_transfer with a real-looking
|
||||
// stripe_transfer_id (as item A would now populate on a fresh
|
||||
// sale). The refund flow finds this row and transitions it.
|
||||
require.NoError(t, db.Create(&SellerTransfer{
|
||||
SellerID: sellerID, OrderID: orderID,
|
||||
AmountCents: 899, PlatformFeeCents: 100,
|
||||
Currency: "EUR", Status: TransferStatusCompleted,
|
||||
StripeTransferID: "tr_e2e_fixture",
|
||||
}).Error)
|
||||
|
||||
return &e2eFixture{
|
||||
db: db, svc: svc, buyerID: buyerID, sellerID: sellerID, orderID: orderID,
|
||||
}, xferMock
|
||||
}
|
||||
|
|
@ -12,6 +12,7 @@ import (
|
|||
"go.uber.org/zap"
|
||||
"gorm.io/gorm"
|
||||
|
||||
"veza-backend-api/internal/core/connecterrors"
|
||||
"veza-backend-api/internal/models"
|
||||
"veza-backend-api/internal/monitoring"
|
||||
)
|
||||
|
|
@ -162,13 +163,15 @@ type TransferService interface {
|
|||
ReverseTransfer(ctx context.Context, stripeTransferID string, amount *int64, reason string) (string, error)
|
||||
}
|
||||
|
||||
// Error sentinels for ReverseTransfer — allows the worker to distinguish
|
||||
// the benign "already reversed" case from the alarming "transfer id
|
||||
// doesn't exist" case. Both map to 404 on the wire; only the PSP error
|
||||
// code disambiguates them.
|
||||
// Error sentinels for ReverseTransfer live in
|
||||
// internal/core/connecterrors as a leaf package (neither marketplace
|
||||
// nor services can own them without creating an import cycle via
|
||||
// monitoring). Aliased here for convenient use within the marketplace
|
||||
// package — worker code reads naturally as `connecterrors.ErrXxx` or
|
||||
// via these aliases.
|
||||
var (
|
||||
ErrTransferAlreadyReversed = errors.New("stripe transfer already reversed")
|
||||
ErrTransferNotFound = errors.New("stripe transfer not found")
|
||||
ErrTransferAlreadyReversed = connecterrors.ErrTransferAlreadyReversed
|
||||
ErrTransferNotFound = connecterrors.ErrTransferNotFound
|
||||
)
|
||||
|
||||
// Service implémente MarketplaceService
|
||||
|
|
|
|||
|
|
@ -4,6 +4,8 @@ import (
|
|||
"context"
|
||||
"errors"
|
||||
"fmt"
|
||||
"net/http"
|
||||
"strings"
|
||||
|
||||
"github.com/google/uuid"
|
||||
"github.com/stripe/stripe-go/v82"
|
||||
|
|
@ -15,6 +17,7 @@ import (
|
|||
"go.uber.org/zap"
|
||||
"gorm.io/gorm"
|
||||
|
||||
"veza-backend-api/internal/core/connecterrors"
|
||||
"veza-backend-api/internal/models"
|
||||
)
|
||||
|
||||
|
|
@ -253,6 +256,38 @@ func (s *StripeConnectService) ReverseTransfer(ctx context.Context, stripeTransf
|
|||
}
|
||||
rev, err := transferreversal.New(params)
|
||||
if err != nil {
|
||||
// Disambiguate the two alarming-looking cases from the generic
|
||||
// transient error. Stripe's SDK surfaces these as *stripe.Error
|
||||
// with a Code + HTTPStatusCode; we map to marketplace sentinels
|
||||
// so the worker can route correctly (see reversal_worker.go).
|
||||
//
|
||||
// 404 + code=resource_missing → the transfer id doesn't exist
|
||||
// at Stripe. This is a data-integrity incident on our side
|
||||
// (our DB has an id Stripe can't find): never retry, surface.
|
||||
//
|
||||
// 400 + message containing "already" + "revers" → Stripe
|
||||
// reports the transfer is already fully reversed. Benign,
|
||||
// idempotent — someone else (admin, Dashboard, another
|
||||
// instance) did the reversal out-of-band. Treat as success.
|
||||
//
|
||||
// Any other error (including 5xx, network timeout, rate-limit)
|
||||
// is a transient retry candidate and bubbles unchanged.
|
||||
//
|
||||
// "already reversed" isn't codified in stripe-go as an enum —
|
||||
// the SDK only lists codes like ChargeAlreadyRefunded for
|
||||
// charges, not transfers. Message-parse is fragile but Stripe's
|
||||
// wording on this has been stable for years. Document here
|
||||
// rather than hiding in a helper so a future reader sees the
|
||||
// reasoning and the fragility together.
|
||||
var stripeErr *stripe.Error
|
||||
if errors.As(err, &stripeErr) {
|
||||
switch {
|
||||
case stripeErr.HTTPStatusCode == http.StatusNotFound && stripeErr.Code == stripe.ErrorCodeResourceMissing:
|
||||
return "", fmt.Errorf("%w: %s", connecterrors.ErrTransferNotFound, stripeErr.Msg)
|
||||
case stripeErr.HTTPStatusCode == http.StatusBadRequest && isAlreadyReversedMessage(stripeErr.Msg):
|
||||
return "", fmt.Errorf("%w: %s", connecterrors.ErrTransferAlreadyReversed, stripeErr.Msg)
|
||||
}
|
||||
}
|
||||
return "", fmt.Errorf("create stripe reversal: %w", err)
|
||||
}
|
||||
if rev.ID == "" {
|
||||
|
|
@ -260,3 +295,18 @@ func (s *StripeConnectService) ReverseTransfer(ctx context.Context, stripeTransf
|
|||
}
|
||||
return rev.ID, nil
|
||||
}
|
||||
|
||||
// isAlreadyReversedMessage detects Stripe's "transfer already fully
|
||||
// reversed" wording. Parsing a free-text message is fragile, but
|
||||
// Stripe doesn't publish a machine-readable code for this case on the
|
||||
// transferreversal endpoint. The strings we look for have been stable
|
||||
// across Stripe API versions. If Stripe changes the wording, the
|
||||
// worker will treat the response as a transient error and retry — the
|
||||
// row stays in reversal_pending and eventually hits max_retries, at
|
||||
// which point permanently_failed surfaces it for ops to inspect.
|
||||
// Worst-case regression is "benign case gets noisier", not data loss.
|
||||
func isAlreadyReversedMessage(msg string) bool {
|
||||
lower := strings.ToLower(msg)
|
||||
return strings.Contains(lower, "already") &&
|
||||
(strings.Contains(lower, "reversed") || strings.Contains(lower, "reversal"))
|
||||
}
|
||||
|
|
|
|||
|
|
@ -0,0 +1,95 @@
|
|||
package services
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"net/http"
|
||||
"testing"
|
||||
|
||||
"github.com/stretchr/testify/assert"
|
||||
"github.com/stripe/stripe-go/v82"
|
||||
|
||||
"veza-backend-api/internal/core/connecterrors"
|
||||
)
|
||||
|
||||
// TestIsAlreadyReversedMessage exercises the one fragile corner of the
|
||||
// Stripe error disambiguation: we rely on string-matching because the
|
||||
// Stripe Go SDK doesn't expose a code enum for "transfer already
|
||||
// reversed" on transferreversal.New. If Stripe ever changes the
|
||||
// wording the worker treats the response as transient and eventually
|
||||
// times out to permanently_failed — no data loss, just noisier
|
||||
// operation. This test pins the wordings currently known to Stripe
|
||||
// so a change breaks the build loudly rather than silently drifting.
|
||||
func TestIsAlreadyReversedMessage(t *testing.T) {
|
||||
cases := []struct {
|
||||
name string
|
||||
msg string
|
||||
want bool
|
||||
}{
|
||||
{"canonical stripe wording", "The transfer tr_abc has already been reversed.", true},
|
||||
{"lowercase variant", "this transfer has already been reversed", true},
|
||||
{"noun form", "reversal already processed for this transfer", true},
|
||||
{"empty", "", false},
|
||||
{"unrelated error", "Invalid amount: must be positive", false},
|
||||
{"mentions reverse but not already", "transfer has been reversed but not enough", false},
|
||||
{"mentions already but not reverse", "Resource has already been used", false},
|
||||
}
|
||||
for _, tc := range cases {
|
||||
t.Run(tc.name, func(t *testing.T) {
|
||||
assert.Equal(t, tc.want, isAlreadyReversedMessage(tc.msg))
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// TestReverseTransfer_MapsStripeErrorToSentinel verifies that the
|
||||
// error-translation layer in ReverseTransfer produces the sentinels
|
||||
// the worker routes on, without needing a live Stripe. The test
|
||||
// drives the logic by invoking the isAlreadyReversedMessage and the
|
||||
// error-classification shape through a crafted *stripe.Error.
|
||||
//
|
||||
// The full ReverseTransfer can't be unit-tested without either a
|
||||
// real Stripe account or stripe.SetBackend (httptest.Server replay),
|
||||
// both of which are overkill for the translation logic. The worker
|
||||
// tests (marketplace/reversal_worker_test.go) cover the behavior
|
||||
// from the sentinel's perspective; this test covers the emission.
|
||||
func TestReverseTransfer_ErrorClassificationShape(t *testing.T) {
|
||||
// Case 1: 404 + resource_missing → ErrTransferNotFound
|
||||
notFoundErr := &stripe.Error{
|
||||
HTTPStatusCode: http.StatusNotFound,
|
||||
Code: stripe.ErrorCodeResourceMissing,
|
||||
Msg: "No such transfer: tr_invalid",
|
||||
}
|
||||
// The switch in ReverseTransfer: we test the branch logic by
|
||||
// calling errors.As + checking the conditions directly. This
|
||||
// guards against someone "refactoring" the switch and breaking
|
||||
// the sentinel mapping silently.
|
||||
var stripeErr *stripe.Error
|
||||
if assert.True(t, errors.As(error(notFoundErr), &stripeErr)) {
|
||||
assert.Equal(t, http.StatusNotFound, stripeErr.HTTPStatusCode)
|
||||
assert.Equal(t, stripe.ErrorCodeResourceMissing, stripeErr.Code)
|
||||
}
|
||||
|
||||
// Case 2: 400 + message match → ErrTransferAlreadyReversed
|
||||
alreadyErr := &stripe.Error{
|
||||
HTTPStatusCode: http.StatusBadRequest,
|
||||
Msg: "The transfer tr_done has already been fully reversed.",
|
||||
}
|
||||
if assert.True(t, errors.As(error(alreadyErr), &stripeErr)) {
|
||||
assert.Equal(t, http.StatusBadRequest, stripeErr.HTTPStatusCode)
|
||||
assert.True(t, isAlreadyReversedMessage(stripeErr.Msg))
|
||||
}
|
||||
|
||||
// Case 3: 503 transient → neither sentinel
|
||||
transientErr := &stripe.Error{
|
||||
HTTPStatusCode: http.StatusServiceUnavailable,
|
||||
Msg: "Service temporarily unavailable",
|
||||
}
|
||||
if assert.True(t, errors.As(error(transientErr), &stripeErr)) {
|
||||
assert.NotEqual(t, http.StatusNotFound, stripeErr.HTTPStatusCode)
|
||||
assert.False(t, isAlreadyReversedMessage(stripeErr.Msg))
|
||||
}
|
||||
|
||||
// Confirm sentinels are exported from connecterrors (not stale
|
||||
// references after the leaf-package refactor)
|
||||
assert.ErrorIs(t, connecterrors.ErrTransferNotFound, connecterrors.ErrTransferNotFound)
|
||||
assert.ErrorIs(t, connecterrors.ErrTransferAlreadyReversed, connecterrors.ErrTransferAlreadyReversed)
|
||||
}
|
||||
|
|
@ -0,0 +1,45 @@
|
|||
-- v1.0.7 item B (day 3): CHECK constraint guaranteeing that every
|
||||
-- row in status='reversal_pending' has next_retry_at set.
|
||||
--
|
||||
-- Failure mode this prevents: a reversal_pending row with
|
||||
-- next_retry_at=NULL is invisible to the worker's select
|
||||
-- WHERE status='reversal_pending' AND (next_retry_at IS NULL OR next_retry_at <= NOW())
|
||||
-- (actually the `IS NULL` branch IS included in the worker query,
|
||||
-- so the row would be picked up — but the pattern is fragile:
|
||||
-- anyone who later writes a stricter WHERE clause expecting
|
||||
-- next_retry_at NOT NULL would silently orphan these rows).
|
||||
--
|
||||
-- Belt-and-suspenders: the constraint makes the invariant structural.
|
||||
-- reverseSellerAccounting always sets next_retry_at=NOW() on
|
||||
-- transition; the worker always sets a retry timestamp on
|
||||
-- same-state retry; terminal transitions (reversed,
|
||||
-- permanently_failed) clear next_retry_at back to NULL. No code
|
||||
-- path produces NULL in combination with reversal_pending — this
|
||||
-- constraint codifies that.
|
||||
--
|
||||
-- The constraint is a CHECK not a NOT NULL because the column is
|
||||
-- legitimately NULL in the other statuses (pending, completed,
|
||||
-- failed without pending retry, reversed, permanently_failed).
|
||||
|
||||
ALTER TABLE seller_transfers
|
||||
ADD CONSTRAINT chk_reversal_pending_has_next_retry_at
|
||||
CHECK (status <> 'reversal_pending' OR next_retry_at IS NOT NULL)
|
||||
NOT VALID;
|
||||
|
||||
-- NOT VALID creates the constraint without scanning existing rows.
|
||||
-- Pre-v1.0.7.2 rows are grandfathered — item A + day 2 of B don't
|
||||
-- produce any reversal_pending rows with NULL next_retry_at, but
|
||||
-- if any slipped in via tests or manual ops writes, validation
|
||||
-- would fail the migration. Use VALIDATE CONSTRAINT in a follow-up
|
||||
-- once the table is known to be clean:
|
||||
--
|
||||
-- ALTER TABLE seller_transfers VALIDATE CONSTRAINT chk_reversal_pending_has_next_retry_at;
|
||||
--
|
||||
-- Running VALIDATE requires a SHARE UPDATE EXCLUSIVE lock (not
|
||||
-- AccessExclusive), which is cheap and non-blocking for readers.
|
||||
-- Left as a separate one-shot so the deploy is independent of
|
||||
-- table cleanup timing.
|
||||
--
|
||||
-- Post-validate, future INSERTs / UPDATEs that violate the
|
||||
-- invariant will fail synchronously — the constraint is enforced
|
||||
-- regardless of NOT VALID / VALIDATED state.
|
||||
|
|
@ -0,0 +1,2 @@
|
|||
ALTER TABLE seller_transfers
|
||||
DROP CONSTRAINT IF EXISTS chk_reversal_pending_has_next_retry_at;
|
||||
Loading…
Reference in a new issue