feat(webhooks): persist raw hyperswitch payloads to audit log — v1.0.7 item E
Every POST /webhooks/hyperswitch delivery now writes a row to
`hyperswitch_webhook_log` regardless of signature validity or
processing outcome. Captures both legitimate deliveries and attack
probes — a forensics query now has the actual bytes to read, not
just a "webhook rejected" log line. Disputes (axis-1 P1.6) ride
along: the log captures dispute.* events alongside payment and
refund events, ready for when disputes get a handler.
Table shape (migration 984):
* payload TEXT — readable in psql; invalid UTF-8 is stored as an
  empty payload (the forensics value for those attacks is in
  headers + ip + timing, not the binary body).
* signature_valid BOOLEAN + partial index so "show me attack
  attempts" queries are instantaneous.
* processing_result TEXT — 'ok' / 'error: <msg>' /
'signature_invalid' / 'skipped'. Matches the P1.5 action
semantic exactly.
* source_ip, user_agent, request_id — forensics essentials.
request_id is captured from Hyperswitch's X-Request-Id header
when present, else a server-side UUID so every row correlates
to VEZA's structured logs.
* event_type — best-effort extract from the JSON payload, left
  empty on malformed input.
Hardening:
* 256KB body cap via io.LimitReader rejects oversize with 413
before any INSERT — prevents log-spam DoS.
* Single INSERT per delivery with final state; no two-phase
update race on signature-failure path. signature_invalid and
processing-error rows both land.
* DB persistence failures are logged but swallowed — the
endpoint's contract is to ack Hyperswitch, not perfect audit.
Retention sweep:
* CleanupHyperswitchWebhookLog in internal/jobs, daily tick,
batched DELETE (10k rows + 100ms pause) so a large backlog
doesn't lock the table.
* HYPERSWITCH_WEBHOOK_LOG_RETENTION_DAYS (default 90).
* Same goroutine-ticker pattern as ScheduleOrphanTracksCleanup.
* Wired in cmd/api/main.go alongside the existing cleanup jobs.
Tests: 5 in webhook_log_test.go (persistence, request_id auto-gen,
invalid-JSON leaves event_type empty, invalid-signature capture,
extractEventType 5 sub-cases) + 4 in
cleanup_hyperswitch_webhook_log_test.go (deletes-older-than, noop,
default-on-zero, context-cancel). Migration 984 applied cleanly to
local Postgres; all indexes present.
Also (v107-plan.md):
* Item G acceptance gains an explicit Idempotency-Key threading
requirement with an empty-key loud-fail test — "literally
copy-paste D's 4-line test skeleton". Closes the risk that
item G silently reopens the HTTP-retry duplicate-charge
exposure D closed.
Out of scope for E (noted in CHANGELOG):
* Rate limit on the endpoint — pre-existing middleware covers
it at the router level; adding a per-endpoint limit is
separate scope.
* Readable-payload SQL view — deferred, the TEXT column is
already human-readable; a convenience view is a nice-to-have
not a ship-blocker.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 00:44:58 +00:00

package hyperswitch

import (
	"context"
	"encoding/json"
	"time"

	"github.com/google/uuid"
	"gorm.io/gorm"
)

// MaxWebhookPayloadBytes caps the body size the handler accepts before
// persisting. Hyperswitch's own payloads are in the low-KB range
// (1-5 KB typical for payment/refund events); 256KB is defense in
// depth.
//
// Why 256KB and not 64KB: dispute-class events may carry metadata
// (line items, customer context, evidence references) that inflates
// beyond the typical event size. A 64KB cap created a non-zero risk
// of silently dropping a legitimate dispute webhook — that class of
// event is exactly what makes axis-1 P1.6 (disputes) a v1.0.8 item,
// and losing one to a too-aggressive cap would be the worst kind of
// self-inflicted wound. 256KB is 50x the typical payload, ~10x the
// inflated dispute-metadata ceiling we've observed in similar PSPs,
// and still tightly bounded: even at rate-limit ceiling (100 req/s
// per-IP), worst-case sustained = ~25MB/s, cleaned up daily.
//
// The rate limit is the primary DoS defense; this cap is defense in
// depth. If we ever see legitimate traffic nudging the cap, the
// correct response is raising the cap, not the rate limit — payload
// size and request frequency are orthogonal attack surfaces.
const MaxWebhookPayloadBytes = 256 * 1024

// WebhookLog mirrors the hyperswitch_webhook_log table. Written once
// per webhook delivery (even on signature failure or oversize) so the
// forensics trail captures attack attempts alongside legitimate
// traffic.
type WebhookLog struct {
	ID               uuid.UUID `gorm:"type:uuid;primaryKey" json:"id"`
	ReceivedAt       time.Time `gorm:"autoCreateTime;column:received_at" json:"received_at"`
	Payload          string    `gorm:"type:text;column:payload" json:"payload"`
	SignatureValid   bool      `gorm:"column:signature_valid" json:"signature_valid"`
	SignatureHeader  string    `gorm:"column:signature_header" json:"signature_header,omitempty"`
	ProcessingResult string    `gorm:"column:processing_result;type:text" json:"processing_result"`
	EventType        string    `gorm:"column:event_type" json:"event_type,omitempty"`
	SourceIP         string    `gorm:"column:source_ip" json:"source_ip,omitempty"`
	UserAgent        string    `gorm:"column:user_agent" json:"user_agent,omitempty"`
	RequestID        string    `gorm:"column:request_id" json:"request_id"`
}

// TableName pins the table name for GORM — the struct would otherwise
// pluralize to `webhook_logs`.
func (WebhookLog) TableName() string { return "hyperswitch_webhook_log" }

// BeforeCreate populates the UUID if the caller left it zero.
func (w *WebhookLog) BeforeCreate(tx *gorm.DB) error {
	if w.ID == uuid.Nil {
		w.ID = uuid.New()
	}
	return nil
}

// LogWebhook inserts a single audit row for a webhook delivery.
// Intended for one-shot use from the HTTP handler; any failure here
// is logged by the caller but never fails the webhook response — the
// primary job of the endpoint is to ack Hyperswitch, not to persist
// audit perfectly.
//
// event_type is extracted from the payload on a best-effort basis: if
// the JSON parses and carries an event_type field, we capture it; if
// not (malformed payload, attack probe), we leave it empty. No insert
// failure for malformed payloads — that's the entire point of the log.
func LogWebhook(ctx context.Context, db *gorm.DB, row *WebhookLog) error {
	if row.RequestID == "" {
		row.RequestID = uuid.New().String()
	}
	if row.EventType == "" {
		row.EventType = extractEventType(row.Payload)
	}
	return db.WithContext(ctx).Create(row).Error
}

// extractEventType attempts to pull the `event_type` field from a JSON
// payload. Returns empty string on any parse failure — event_type is
// informational, not a join key, so unknown is a fine default.
func extractEventType(payload string) string {
	if payload == "" {
		return ""
	}
	var probe struct {
		EventType string `json:"event_type"`
	}
	if err := json.Unmarshal([]byte(payload), &probe); err != nil {
		return ""
	}
	return probe.EventType
}