Commit graph

4 commits

Author SHA1 Message Date
senke
d2bb9c0e78 feat(marketplace): async stripe connect reversal worker — v1.0.7 item B day 2
Day-2 cut of item B: the reversal path becomes async. Pre-v1.0.7
(and v1.0.7 day 1) the refund handler flipped seller_transfers
straight from completed to reversed without ever calling Stripe —
the ledger said "reversed" while the seller's Stripe balance still
showed the original transfer as settled. The new flow:

  refund.succeeded webhook
    → reverseSellerAccounting transitions row: completed → reversal_pending
    → StripeReversalWorker (every REVERSAL_CHECK_INTERVAL, default 1m)
      → calls ReverseTransfer on Stripe
      → success: row → reversed + persist stripe_reversal_id
      → 404 already-reversed (dead code until day 3): row → reversed + log
      → 404 resource_missing (dead code until day 3): row → permanently_failed
      → transient error: stay reversal_pending, bump retry_count,
        exponential backoff (base * 2^retry, capped at backoffMax)
      → retries exhausted: row → permanently_failed
    → buyer-facing refund completes immediately regardless of Stripe health

State machine enforcement:
  * New `SellerTransfer.TransitionStatus(tx, to, extras)` wraps every
    mutation: validates against AllowedTransferTransitions, guarded
    UPDATE with WHERE status=<from> (optimistic lock semantics), no
    RowsAffected = stale state / concurrent winner detected.
  * processSellerTransfers no longer mutates .Status in place —
    terminal status is decided before struct construction, so the
    row is Created with its final state.
  * transfer_retry.retryOne and admin RetryTransfer route through
    TransitionStatus. Legacy direct assignment removed.
  * TestNoDirectTransferStatusMutation greps the package for any
    `st.Status = "..."` / `t.Status = "..."` / GORM
    Model(&SellerTransfer{}).Update("status"...) outside the
    allowlist and fails if found. Verified by temporarily injecting
    a violation during development — test caught it as expected.

Configuration (v1.0.7 item B):
  * REVERSAL_WORKER_ENABLED=true (default)
  * REVERSAL_MAX_RETRIES=5 (default)
  * REVERSAL_CHECK_INTERVAL=1m (default)
  * REVERSAL_BACKOFF_BASE=1m (default)
  * REVERSAL_BACKOFF_MAX=1h (default, caps exponential growth)
  * .env.template documents TRANSFER_RETRY_* and REVERSAL_* env vars
    so an ops reader can grep them.

Interface change: TransferService.ReverseTransfer(ctx,
stripe_transfer_id, amount *int64, reason) (reversalID, error)
added. All four mocks extended (process_webhook, transfer_retry,
admin_transfer_handler, payment_flow integration). amount=nil means
full reversal; v1.0.7 always passes nil (partial reversal is future
scope per axis-1 P2).

Stripe 404 disambiguation (ErrTransferAlreadyReversed /
ErrTransferNotFound) is wired in the worker as dead code — the
sentinels are declared and the worker branches on them, but
StripeConnectService.ReverseTransfer doesn't yet emit them. Day 3
will parse stripe.Error.Code and populate the sentinels; no worker
change needed at that point. Keeping the handling skeleton in day 2
so the worker's branch shape doesn't change between days and the
tests can already cover all four paths against the mock.

Worker unit tests (9 cases, all green, sqlite :memory:):
  * happy path: reversal_pending → reversed + stripe_reversal_id set
  * already reversed (mock returns sentinel): → reversed + log
  * not found (mock returns sentinel): → permanently_failed + log
  * transient 503: retry_count++, next_retry_at set with backoff,
    stays reversal_pending
  * backoff capped at backoffMax (verified with base=1s, max=10s,
    retry_count=4 → capped at 10s not 16s)
  * max retries exhausted: → permanently_failed
  * legacy row with empty stripe_transfer_id: → permanently_failed,
    does not call Stripe
  * only picks up reversal_pending (skips all other statuses)
  * respects next_retry_at (future rows skipped)

Existing test updated: TestProcessRefundWebhook_SucceededFinalizesState
now asserts the row lands at reversal_pending with next_retry_at
set (worker's responsibility to drive to reversed), not reversed.

Worker wired in cmd/api/main.go alongside TransferRetryWorker,
sharing the same StripeConnectService instance. Shutdown path
registered for graceful stop.

Cut from day 2 scope (per agreed-upon discipline), landing in day 3:
  * Stripe 404 disambiguation implementation (parse error.Code)
  * End-to-end smoke probe (refund → reversal_pending → worker
    processes → reversed) against local Postgres + mock Stripe
  * Batch-size tuning / inter-batch sleep — batchLimit=20 today is
    safely under Stripe's 100 req/s default rate limit; revisit if
    observed load warrants

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:34:29 +02:00
senke
e0efdf8210 fix(connect): defensive empty-id guard + admin retry test asserts persistence
Post-A self-review surfaced two gaps:

1. `StripeConnectService.CreateTransfer` trusted Stripe's SDK to
   return a non-empty `tr.ID` on success (`err == nil`). The
   invariant holds in practice, but an empty id silently persisted
   on a completed transfer leaves the row permanently
   un-reversible — which defeats the entire point of item A.
   Added a belt-and-suspenders check that converts `(tr.ID="",
   err=nil)` into a failed transfer.

2. `TestRetryTransfer_Success` (admin handler) exercised the retry
   path but didn't assert that StripeTransferID was persisted after
   a successful retry. The worker path and processSellerTransfers
   both had the assertion; the admin manual-retry path was the
   third entry into the same behavior and lacked coverage. Added
   the assertion.

Decision on scope: v1.0.6.2 added a partial UNIQUE on
stripe_transfer_id (WHERE IS NOT NULL AND <> '') in migration 981,
matching the v1.0.6.1 pattern for refunds.hyperswitch_refund_id.
The combination of (a) the DB partial UNIQUE and (b) this defensive
guard means there is now no code or data path that can persist an
empty transfer id while claiming success.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:03:37 +02:00
senke
eedaad9f83 refactor(connect): persist stripe_transfer_id on create + retry — v1.0.7 item A
TransferService.CreateTransfer signature changes from (...) error to
(...) (string, error) — the caller now captures the Stripe transfer
identifier and persists it on the SellerTransfer row. Pre-v1.0.7 the
stripe_transfer_id column was declared on the model and table but
never written to, which blocked the reversal worker (v1.0.7 item B)
from identifying which transfer to reverse on refund.

Changes:
  * `TransferService` interface and `StripeConnectService.CreateTransfer`
    both return the Stripe transfer id alongside the error.
  * `processSellerTransfers` (marketplace service) persists the id on
    success before `tx.Create(&st)` so a crash between Stripe ACK and
    DB commit leaves no inconsistency.
  * `TransferRetryWorker.retryOne` persists on retry success — a row
    that failed on first attempt and succeeded via the worker is
    reversal-ready all the same.
  * `admin_transfer_handler.RetryTransfer` (manual retry) persists too.
  * `SellerPayout.ExternalPayoutID` is populated by the Connect payout
    flow (`payout.go`) — the field existed but was never written.
  * Four test mocks updated; two tests assert the id is persisted on
    the happy path, one on the failure path confirms we don't write a
    fake id when the provider errors.

Migration `981_seller_transfers_stripe_reversal_id.sql`:
  * Adds nullable `stripe_reversal_id` column for item B.
  * Partial UNIQUE indexes on both stripe_transfer_id and
    stripe_reversal_id (WHERE IS NOT NULL AND <> ''), mirroring the
    v1.0.6.1 pattern for refunds.hyperswitch_refund_id.
  * Logs a count of historical completed transfers that lack an id —
    these are candidates for the backfill CLI follow-up task.

Backfill for historical rows is a separate follow-up (cmd/tools/
backfill_stripe_transfer_ids, calling Stripe's transfers.List with
Destination + Metadata[order_id]). Pre-v1.0.7 transfers without a
backfilled id cannot be auto-reversed on refund — document in P2.9
admin-recovery when it lands. Acceptable scope per v107-plan.

Migration number bumped 980 → 981 because v1.0.6.2 used 980 for the
unpaid-subscription cleanup; v107-plan updated with the note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 13:08:39 +02:00
senke
b3a74d6740 test(admin): add admin transfer handler tests 2026-02-23 23:35:11 +01:00