Phase 1 (commit 2a96766a) opened the pending_payment status: a paid-plan
subscribe path creates a UserSubscription row in pending_payment +
subscription_invoices row carrying the Hyperswitch payment_id, then hands
the client_secret back to the SPA. Phase 2 lands the webhook side: the
PSP-driven state transition that closes the loop.
State machine:
- pending_payment + status=succeeded → invoice paid (paid_at=now), sub active
- pending_payment + status=failed → invoice failed, sub expired
- already terminal → idempotent no-op (paid_at NOT bumped)
- payment_id not in subscription_invoices → marketplace.ErrNotASubscription
(caller falls through to the order webhook flow)
The processor only flips a subscription out of pending_payment. Rows that
have already transitioned (concurrent flow, manual admin action, plan
upgrade) are left alone — the invoice still gets the terminal status
update so the audit trail stays consistent.
New surface:
- hyperswitch.SubscriptionWebhookProcessor — the actual handler. Reads
subscription_invoices by hyperswitch_payment_id, looks up the parent
user_subscriptions row, applies the transition in a single tx.
- hyperswitch.IsSubscriptionEventType — exported helper for callers
that want to skip the DB hit on clearly non-subscription events.
- marketplace.SubscriptionWebhookHandler (interface) +
marketplace.ErrNotASubscription (sentinel) — keeps marketplace from
importing the hyperswitch package while still allowing
ProcessPaymentWebhook to dispatch typed.
- marketplace.WithSubscriptionWebhookHandler (option) — wired by
routes_webhooks.getMarketplaceService so the prod webhook handler
routes subscription events instead of swallowing them as "order not
found".
Dispatcher in ProcessPaymentWebhook: try subscription first, fall through
to the order flow on ErrNotASubscription. Order events are unchanged.
Tests (4, sqlite in-memory, all green):
- Succeeded: pending_payment → active+paid, paid_at set
- Failed: pending_payment → expired+failed
- Idempotent replay: second succeeded webhook is a no-op, paid_at NOT
re-stamped (locks down Hyperswitch's at-least-once delivery contract)
- Unknown payment_id: returns marketplace.ErrNotASubscription so the
dispatcher falls through to ProcessPaymentWebhook's order flow
Removes the v1.0.6.2 "active row without PSP linkage" fantôme pattern
that hasEffectivePayment had to filter retroactively — the Phase 1 +
Phase 2 pair is now the canonical paid-plan creation path.
E2E + recovery endpoint (POST /api/v1/subscriptions/complete/:id) +
distribution gate land in Phase 3 (Day 3 of ROADMAP_V1.0_LAUNCH.md).
SKIP_TESTS=1 rationale: this commit is backend-only (Go); the husky
pre-commit hook only runs frontend typecheck/lint/vitest. Backend tests
verified manually:
$ go test -short -count=1 ./internal/services/hyperswitch/... ./internal/core/marketplace/... ./internal/core/subscription/...
ok veza-backend-api/internal/services/hyperswitch
ok veza-backend-api/internal/core/marketplace
ok veza-backend-api/internal/core/subscription
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every POST /webhooks/hyperswitch delivery now writes a row to
`hyperswitch_webhook_log` regardless of signature-valid or
processing outcome. Captures both legitimate deliveries and attack
probes — a forensics query now has the actual bytes to read, not
just a "webhook rejected" log line. Disputes (axis-1 P1.6) ride
along: the log captures dispute.* events alongside payment and
refund events, ready for when disputes get a handler.
Table shape (migration 984):
* payload TEXT — readable in psql, invalid UTF-8 replaced with
empty (forensics value is in headers + ip + timing for those
attacks, not the binary body).
* signature_valid BOOLEAN + partial index for "show me attack
attempts" being instantaneous.
* processing_result TEXT — 'ok' / 'error: <msg>' /
'signature_invalid' / 'skipped'. Matches the P1.5 action
semantic exactly.
* source_ip, user_agent, request_id — forensics essentials.
request_id is captured from Hyperswitch's X-Request-Id header
when present, else a server-side UUID so every row correlates
to VEZA's structured logs.
* event_type — best-effort extract from the JSON payload, NULL
on malformed input.
Hardening:
* 64KB body cap via io.LimitReader rejects oversize with 413
before any INSERT — prevents log-spam DoS.
* Single INSERT per delivery with final state; no two-phase
update race on signature-failure path. signature_invalid and
processing-error rows both land.
* DB persistence failures are logged but swallowed — the
endpoint's contract is to ack Hyperswitch, not perfect audit.
Retention sweep:
* CleanupHyperswitchWebhookLog in internal/jobs, daily tick,
batched DELETE (10k rows + 100ms pause) so a large backlog
doesn't lock the table.
* HYPERSWITCH_WEBHOOK_LOG_RETENTION_DAYS (default 90).
* Same goroutine-ticker pattern as ScheduleOrphanTracksCleanup.
* Wired in cmd/api/main.go alongside the existing cleanup jobs.
Tests: 5 in webhook_log_test.go (persistence, request_id auto-gen,
invalid-JSON leaves event_type empty, invalid-signature capture,
extractEventType 5 sub-cases) + 4 in cleanup_hyperswitch_webhook_
log_test.go (deletes-older-than, noop, default-on-zero,
context-cancel). Migration 984 applied cleanly to local Postgres;
all indexes present.
Also (v107-plan.md):
* Item G acceptance gains an explicit Idempotency-Key threading
requirement with an empty-key loud-fail test — "literally
copy-paste D's 4-line test skeleton". Closes the risk that
item G silently reopens the HTTP-retry duplicate-charge
exposure D closed.
Out of scope for E (noted in CHANGELOG):
* Rate limit on the endpoint — pre-existing middleware covers
it at the router level; adding a per-endpoint limit is
separate scope.
* Readable-payload SQL view — deferred, the TEXT column is
already human-readable; a convenience view is a nice-to-have
not a ship-blocker.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fourth item of the v1.0.6 backlog, and the structuring one — the pre-
v1.0.6 RefundOrder wrote `status='refunded'` to the DB and called
Hyperswitch synchronously in the same transaction, treating the API
ack as terminal confirmation. In reality Hyperswitch returns `pending`
and only finalizes via webhook. Customers could see "refunded" in the
UI while their bank was still uncredited, and the seller balance
stayed credited even on successful refunds.
v1.0.6 flow
Phase 1 — open a pending refund (short row-locked transaction):
* validate permissions + 14-day window + double-submit guard
* persist Refund{status=pending}
* flip order to `refund_pending` (not `refunded` — that's the
webhook's job)
Phase 2 — call PSP outside the transaction:
* Provider.CreateRefund returns (refund_id, status, err). The
refund_id is the unique idempotency key for the webhook.
* on PSP error: mark Refund{status=failed}, roll order back to
`completed` so the buyer can retry.
* on success: persist hyperswitch_refund_id, stay in `pending`
even if the sync status is "succeeded". The webhook is the only
authoritative signal. (Per customer guidance: "ne jamais flipper
à succeeded sur la réponse synchrone du POST".)
Phase 3 — webhook drives terminal state:
* ProcessRefundWebhook looks up by hyperswitch_refund_id (UNIQUE
constraint in the new `refunds` table guarantees idempotency).
* terminal-state short-circuit: IsTerminal() returns 200 without
mutating anything, so a Hyperswitch retry storm is safe.
* on refund.succeeded: flip refund + order to succeeded/refunded,
revoke licenses, debit seller balance, mark every SellerTransfer
for the order as `reversed`. All within a row-locked tx.
* on refund.failed: flip refund to failed, order back to
`completed`.
Seller-side reconciliation
* SellerBalance.DebitSellerBalance was using Postgres-only GREATEST,
which silently failed on SQLite tests. Ported to a portable
CASE WHEN that clamps at zero in both DBs.
* SellerTransfer.Status = "reversed" captures the refund event in
the ledger. The actual Stripe Connect Transfers:reversal call is
flagged TODO(v1.0.7) — requires wiring through TransferService
with connected-account context that the current transfer worker
doesn't expose. The internal balance is corrected here so the
buyer and seller views match as soon as the PSP confirms; the
missing piece is purely the money-movement round-trip at Stripe.
Webhook routing
* HyperswitchWebhookPayload extended with event_type + refund_id +
error_message, with flat and nested (object.*) shapes supported
(same tolerance as the existing payment fields).
* New IsRefundEvent() discriminator: matches any event_type
containing "refund" (case-insensitive) or presence of refund_id.
routes_webhooks.go peeks the payload once and dispatches to
ProcessRefundWebhook or ProcessPaymentWebhook.
* No signature-verification changes — the same HMAC-SHA512 check
protects both paths.
Handler response
* POST /marketplace/orders/:id/refund now returns
`{ refund: { id, status: "pending" }, message }` so the UI can
surface the in-flight state. A new ErrRefundAlreadyRequested maps
to 400 with a "already in progress" message instead of silently
creating a duplicate row (the double-submit guard checks order
status = `refund_pending` *before* the existing-row check so the
error is explicit).
Schema
* Migration 978_refunds_table.sql adds the `refunds` table with
UNIQUE(hyperswitch_refund_id). The uniqueness constraint is the
load-bearing idempotency guarantee — a duplicate PSP notification
lands on the same DB row, and the webhook handler's
FOR UPDATE + IsTerminal() check turns it into a no-op.
* hyperswitch_refund_id is nullable (NULL between Phase 1 and
Phase 2) so the UNIQUE index ignores rows that haven't been
assigned a PSP id yet.
Partial refunds
* The Provider.CreateRefund signature carries `amount *int64`
already (nil = full), but the service call-site passes nil. Full
refunds only for v1.0.6 — partial-refund UX needs a product
decision and is deferred to v1.0.7. Flagged in the ErrRefund*
section.
Tests (15 cases, all sqlite-in-memory + httptest-style mock provider)
* RefundOrder phase 1
- OpensPendingRefund: pending state, refund_id captured, order
→ refund_pending, licenses untouched
- PSPErrorRollsBack: failed state, order reverts to completed
- DoubleRequestRejected: second call returns
ErrRefundAlreadyRequested, not a generic ErrOrderNotRefundable
- NotCompleted / NoPaymentID / Forbidden / SellerCanRefund
- ExpiredRefundWindow / FallbackExpiredNoDeadline
* ProcessRefundWebhook
- SucceededFinalizesState: refund + order + licenses + seller
balance + seller transfer all reconciled in one tx
- FailedRollsOrderBack: order returns to completed for retry
- IsRefundEventIdempotentOnReplay: second webhook asserts
succeeded_at timestamp is *unchanged*, proving the second
invocation bailed out on IsTerminal (not re-ran)
- UnknownRefundIDReturnsOK: never-issued refund_id → 200 silent
(avoids a Hyperswitch retry storm on stale events)
- MissingRefundID: explicit 400 error
- NonTerminalStatusIgnored: pending/processing leave the row
alone
* HyperswitchWebhookPayload.IsRefundEvent: 6 dispatcher cases
(flat event_type, mixed case, payment event, refund_id alone,
empty, nested object.refund_id)
Backward compat
* hyperswitch.Provider still exposes the old Refund(ctx,...) error
method for any call-site that only cared about success/failure.
* Old mockRefundPaymentProvider replaced; external mocks need to
add CreateRefund — the interface is now (refundID, status, err).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>