veza/veza-backend-api/internal/services/hyperswitch/client.go
senke 7e180a2c08 feat(workers): hyperswitch reconciliation sweep for stuck pending states — v1.0.7 item C
New ReconcileHyperswitchWorker sweeps for pending orders and refunds
whose terminal webhook never arrived. Pulls live PSP state for each
stuck row and synthesises a webhook payload to feed the normal
ProcessPaymentWebhook / ProcessRefundWebhook dispatcher. The existing
terminal-state guards on those handlers make reconciliation
idempotent against real webhooks — a late webhook after the reconciler
resolved the row is a no-op.
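
A minimal sketch of why a terminal-state guard makes replay safe. The
names here (order, isTerminal, applyWebhook) and the status strings are
hypothetical and simplified; the real ProcessPaymentWebhook handler is
not part of this file and may differ:

```go
package main

import "fmt"

// order is a hypothetical, simplified stand-in for the real order row.
type order struct {
	ID     string
	Status string // "pending", "completed" or "failed" (assumed values)
}

// isTerminal reports whether a status can no longer change.
// The set of terminal statuses is an assumption for illustration.
func isTerminal(status string) bool {
	return status == "completed" || status == "failed"
}

// applyWebhook applies a PSP payment status to the order unless the order
// is already terminal. It returns true when the row was actually changed.
func applyWebhook(o *order, pspStatus string) bool {
	if isTerminal(o.Status) {
		return false // late or duplicate webhook: no-op
	}
	switch pspStatus {
	case "succeeded":
		o.Status = "completed"
	case "failed":
		o.Status = "failed"
	default:
		return false // non-terminal PSP status: keep waiting
	}
	return true
}

func main() {
	o := &order{ID: "ord_1", Status: "pending"}
	fmt.Println(applyWebhook(o, "succeeded"), o.Status) // synthetic webhook: true completed
	fmt.Println(applyWebhook(o, "failed"), o.Status)    // late real webhook: false completed
}
```

Because the guard runs before any write, it does not matter whether the
reconciler or the PSP delivers the terminal status first.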

Three stuck-state classes covered:
  1. Stuck orders (pending > 30m, non-empty payment_id) → GetPaymentStatus
     + synthetic payment.<status> webhook.
  2. Stuck refunds with PSP id (pending > 30m, non-empty
     hyperswitch_refund_id) → GetRefundStatus + synthetic
     refund.<status> webhook (error_message forwarded).
  3. Orphan refunds (pending > 5m, EMPTY hyperswitch_refund_id) →
     mark failed + roll order back to completed + log ERROR. This
     is the "we crashed between Phase 1 and Phase 2 of RefundOrder"
     case, operator-attention territory.

New interfaces and API surface:
  * marketplace.HyperswitchReadClient — read-only PSP surface the
    worker depends on (GetPaymentStatus, GetRefundStatus). The
    worker never calls CreatePayment / CreateRefund.
  * hyperswitch.Client.GetRefund + RefundStatus struct added.
  * hyperswitch.Provider gains GetRefundStatus + GetPaymentStatus
    pass-throughs that satisfy the marketplace interface.

Configuration (all env-var tunable with sensible defaults):
  * RECONCILE_WORKER_ENABLED=true
  * RECONCILE_INTERVAL=1h (ops can drop to 5m during incident
    response without a code change)
  * RECONCILE_ORDER_STUCK_AFTER=30m
  * RECONCILE_REFUND_STUCK_AFTER=30m
  * RECONCILE_REFUND_ORPHAN_AFTER=5m (shorter because "app crashed"
    is a different signal from "network hiccup")

Operational details:
  * Batch limit 50 rows per phase per tick so a 10k-row backlog
    doesn't hammer Hyperswitch. Next tick picks up the rest.
  * PSP read errors leave the row untouched — next tick retries.
    Reconciliation is always safe to replay.
  * Structured log on every action so `grep reconcile` tells the
    ops story: which order/refund got synced, against what status,
    how long it was stuck.
  * Worker wired in cmd/api/main.go, gated on
    HyperswitchEnabled + HyperswitchAPIKey. Graceful shutdown
    registered.
  * RunOnce exposed as public API for ad-hoc ops trigger during
    incident response.

Tests — 10 cases, all green (sqlite :memory:):
  * TestReconcile_StuckOrder_SyncsViaSyntheticWebhook
  * TestReconcile_RecentOrder_NotTouched
  * TestReconcile_CompletedOrder_NotTouched
  * TestReconcile_OrderWithEmptyPaymentID_NotTouched
  * TestReconcile_PSPReadErrorLeavesRowIntact
  * TestReconcile_OrphanRefund_AutoFails_OrderRollsBack
  * TestReconcile_RecentOrphanRefund_NotTouched
  * TestReconcile_StuckRefund_SyncsViaSyntheticWebhook
  * TestReconcile_StuckRefund_FailureStatus_PassesErrorMessage
  * TestReconcile_AllTerminalStates_NoOp

CHANGELOG v1.0.7-rc1 updated with the full item C section between D
and the existing E block, matching the order convention (ship order:
A → D → B → E → C, CHANGELOG order follows).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 03:08:15 +02:00


package hyperswitch

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Client is the Hyperswitch API client for payment operations.
type Client struct {
	baseURL    string
	apiKey     string
	httpClient *http.Client
}

// NewClient creates a new Hyperswitch client.
func NewClient(baseURL, apiKey string) *Client {
	return &Client{
		baseURL: baseURL,
		apiKey:  apiKey,
		httpClient: &http.Client{
			Timeout: 30 * time.Second,
		},
	}
}

// CreatePaymentRequest is the request body for POST /payments.
type CreatePaymentRequest struct {
	Amount    int64             `json:"amount"`   // Amount in minor units (e.g. centimes for EUR)
	Currency  string            `json:"currency"` // e.g. "EUR"
	ReturnURL string            `json:"return_url,omitempty"`
	Metadata  map[string]string `json:"metadata,omitempty"`
}

// PaymentResponse is the response from POST /payments.
type PaymentResponse struct {
	PaymentID    string `json:"payment_id"`
	ClientSecret string `json:"client_secret"`
	Status       string `json:"status"`
	Amount       int64  `json:"amount"`
	Currency     string `json:"currency"`
}

// PaymentStatus is the response from GET /payments/{payment_id}.
type PaymentStatus struct {
	PaymentID string `json:"payment_id"`
	Status    string `json:"status"`
}

// CreatePayment creates a payment in Hyperswitch and returns the
// client_secret for the frontend.
//
// idempotencyKey is REQUIRED (v1.0.7 item D) and sent as the
// `Idempotency-Key` header. Hyperswitch short-circuits subsequent
// requests carrying the same key (within the PSP-side TTL, typically
// 24h to 7d; verify against current docs) to the original response,
// so an HTTP-layer retry (TLS reconnect, proxy flap, DNS hiccup) of
// the same call creates the payment at most once. The key MUST be
// stable across retries of the same logical call; order.ID.String()
// at the site that creates orders is the canonical choice.
//
// Scope note: this header only addresses HTTP-transport retry within
// a single CreatePayment invocation. It does NOT address
// application-level replay (user double-click, form double-submit,
// retry after crash before DB write). That class of bug requires
// state-machine preconditions on the VEZA side and is addressed by the
// order state machine + checkout handler's existing guards.
func (c *Client) CreatePayment(ctx context.Context, idempotencyKey string, amount int64, currency, orderID, returnURL string, metadata map[string]string) (*PaymentResponse, error) {
	if idempotencyKey == "" {
		return nil, fmt.Errorf("create payment: idempotency key required")
	}

	// Copy the caller's metadata so that injecting order_id never
	// mutates a map the caller still owns.
	md := make(map[string]string, len(metadata)+1)
	for k, v := range metadata {
		md[k] = v
	}
	if orderID != "" {
		md["order_id"] = orderID
	}

	reqBody := CreatePaymentRequest{
		Amount:    amount,
		Currency:  currency,
		ReturnURL: returnURL,
		Metadata:  md,
	}
	body, err := json.Marshal(reqBody)
	if err != nil {
		return nil, fmt.Errorf("marshal create payment request: %w", err)
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL+"/payments", bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("create request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("api-key", c.apiKey)
	req.Header.Set("Idempotency-Key", idempotencyKey)

	resp, err := c.httpClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("http request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("hyperswitch create payment failed: status %d", resp.StatusCode)
	}

	var out PaymentResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	return &out, nil
}

// CreatePaymentSimple creates a payment and returns paymentID and clientSecret.
// Convenience wrapper for PaymentProvider interface.
func (c *Client) CreatePaymentSimple(ctx context.Context, idempotencyKey string, amount int64, currency, orderID, returnURL string, metadata map[string]string) (paymentID, clientSecret string, err error) {
	resp, err := c.CreatePayment(ctx, idempotencyKey, amount, currency, orderID, returnURL, metadata)
	if err != nil {
		return "", "", err
	}
	return resp.PaymentID, resp.ClientSecret, nil
}

// GetPaymentStatus retrieves payment status string from Hyperswitch.
func (c *Client) GetPaymentStatus(ctx context.Context, paymentID string) (string, error) {
	status, err := c.GetPayment(ctx, paymentID)
	if err != nil {
		return "", err
	}
	return status.Status, nil
}

// GetPayment retrieves payment status from Hyperswitch.
func (c *Client) GetPayment(ctx context.Context, paymentID string) (*PaymentStatus, error) {
	// Same guard as GetRefund: an empty ID would otherwise produce a
	// request to the bare /payments/ path instead of a single payment.
	if paymentID == "" {
		return nil, fmt.Errorf("get payment: empty payment_id")
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, c.baseURL+"/payments/"+paymentID, nil)
	if err != nil {
		return nil, fmt.Errorf("create request: %w", err)
	}
	req.Header.Set("api-key", c.apiKey)

	resp, err := c.httpClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("http request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("hyperswitch get payment failed: status %d", resp.StatusCode)
	}

	var out PaymentStatus
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	return &out, nil
}

// CreateRefundRequest is the request body for POST /refunds (v0.403 R2).
type CreateRefundRequest struct {
	PaymentID  string `json:"payment_id"`
	Amount     *int64 `json:"amount,omitempty"` // nil = full refund
	Reason     string `json:"reason,omitempty"`
	RefundType string `json:"refund_type,omitempty"` // "instant" or "scheduled"
}

// RefundResponse is the response from POST /refunds.
type RefundResponse struct {
	RefundID  string `json:"refund_id"`
	PaymentID string `json:"payment_id"`
	Amount    int64  `json:"amount"`
	Currency  string `json:"currency"`
	Status    string `json:"status"`
}

// RefundStatus is the response from GET /refunds/{refund_id}.
// Used by the reconciliation worker (v1.0.7 item C) to sync stuck
// pending refunds with their actual PSP state.
type RefundStatus struct {
	RefundID     string `json:"refund_id"`
	Status       string `json:"status"`
	ErrorMessage string `json:"error_message,omitempty"`
}

// GetRefund retrieves refund status from Hyperswitch (v1.0.7 item C).
// Mirror of GetPayment, used by the reconciliation sweep to
// synthesise a webhook payload when we never received one from the
// PSP but the pending refund row has been sitting around too long.
func (c *Client) GetRefund(ctx context.Context, refundID string) (*RefundStatus, error) {
	if refundID == "" {
		return nil, fmt.Errorf("get refund: empty refund_id")
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, c.baseURL+"/refunds/"+refundID, nil)
	if err != nil {
		return nil, fmt.Errorf("create request: %w", err)
	}
	req.Header.Set("api-key", c.apiKey)

	resp, err := c.httpClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("http request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("hyperswitch get refund failed: status %d", resp.StatusCode)
	}

	var out RefundStatus
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	return &out, nil
}

// CreateRefund creates a refund against a payment (v0.403 R2).
//
// idempotencyKey is REQUIRED (v1.0.7 item D) and sent as the
// `Idempotency-Key` header. Canonical choice: the pending Refund
// row's ID, which is stable across HTTP retries of the same logical
// refund and unique per refund attempt. Same scope caveats as
// CreatePayment: HTTP-transport-level retry only, not
// application-level replay. Application-level idempotency is
// guaranteed by the partial UNIQUE index on
// `refunds.hyperswitch_refund_id` landed in v1.0.6.1.
func (c *Client) CreateRefund(ctx context.Context, idempotencyKey, paymentID string, amount *int64, reason string) (*RefundResponse, error) {
	if idempotencyKey == "" {
		return nil, fmt.Errorf("create refund: idempotency key required")
	}

	reqBody := CreateRefundRequest{
		PaymentID:  paymentID,
		Amount:     amount,
		Reason:     reason,
		RefundType: "instant",
	}
	body, err := json.Marshal(reqBody)
	if err != nil {
		return nil, fmt.Errorf("marshal refund request: %w", err)
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL+"/refunds", bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("create request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("api-key", c.apiKey)
	req.Header.Set("Idempotency-Key", idempotencyKey)

	resp, err := c.httpClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("http request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("hyperswitch create refund failed: status %d", resp.StatusCode)
	}

	var out RefundResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	return &out, nil
}