senke/veza

History

senke 3c4d0148be feat(webhooks): persist raw hyperswitch payloads to audit log — v1.0.7 item E Every POST /webhooks/hyperswitch delivery now writes a row to `hyperswitch_webhook_log` regardless of signature-valid or processing outcome. Captures both legitimate deliveries and attack probes — a forensics query now has the actual bytes to read, not just a "webhook rejected" log line. Disputes (axis-1 P1.6) ride along: the log captures dispute.* events alongside payment and refund events, ready for when disputes get a handler. Table shape (migration 984): * payload TEXT — readable in psql, invalid UTF-8 replaced with empty (forensics value is in headers + ip + timing for those attacks, not the binary body). * signature_valid BOOLEAN + partial index for "show me attack attempts" being instantaneous. * processing_result TEXT — 'ok' / 'error: <msg>' / 'signature_invalid' / 'skipped'. Matches the P1.5 action semantic exactly. * source_ip, user_agent, request_id — forensics essentials. request_id is captured from Hyperswitch's X-Request-Id header when present, else a server-side UUID so every row correlates to VEZA's structured logs. * event_type — best-effort extract from the JSON payload, NULL on malformed input. Hardening: * 64KB body cap via io.LimitReader rejects oversize with 413 before any INSERT — prevents log-spam DoS. * Single INSERT per delivery with final state; no two-phase update race on signature-failure path. signature_invalid and processing-error rows both land. * DB persistence failures are logged but swallowed — the endpoint's contract is to ack Hyperswitch, not perfect audit. Retention sweep: * CleanupHyperswitchWebhookLog in internal/jobs, daily tick, batched DELETE (10k rows + 100ms pause) so a large backlog doesn't lock the table. * HYPERSWITCH_WEBHOOK_LOG_RETENTION_DAYS (default 90). * Same goroutine-ticker pattern as ScheduleOrphanTracksCleanup. * Wired in cmd/api/main.go alongside the existing cleanup jobs. Tests: 5 in webhook_log_test.go (persistence, request_id auto-gen, invalid-JSON leaves event_type empty, invalid-signature capture, extractEventType 5 sub-cases) + 4 in cleanup_hyperswitch_webhook_ log_test.go (deletes-older-than, noop, default-on-zero, context-cancel). Migration 984 applied cleanly to local Postgres; all indexes present. Also (v107-plan.md): * Item G acceptance gains an explicit Idempotency-Key threading requirement with an empty-key loud-fail test — "literally copy-paste D's 4-line test skeleton". Closes the risk that item G silently reopens the HTTP-retry duplicate-charge exposure D closed. Out of scope for E (noted in CHANGELOG): * Rate limit on the endpoint — pre-existing middleware covers it at the router level; adding a per-endpoint limit is separate scope. * Readable-payload SQL view — deferred, the TEXT column is already human-readable; a convenience view is a nice-to-have not a ship-blocker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-04-18 02:44:58 +02:00
..
axis-1-correctness.md	fix(distribution,audit): propagate ErrSubscriptionNoPayment to handler + P0.12 closure date + E2E regression TODO	2026-04-17 12:43:21 +02:00
README.md	docs(audit): 2026-04 correctness/accounting findings (axis 1)	2026-04-17 03:21:33 +02:00
v107-plan.md	feat(webhooks): persist raw hyperswitch payloads to audit log — v1.0.7 item E	2026-04-18 02:44:58 +02:00

README.md

VEZA Audit — 2026-04

Scope — VEZA backend (Go) + web (TypeScript). TALAS software (firmware, PCB reverse-engineering pipeline) is out of scope and will be audited separately when its phase stabilises.

Source state — commits up to a57bb6f78 (v1.0.6.1, 2026-04-17).

Auditor — Claude Opus 4.7 (1M context).

Axes

#	File	Status
1	`axis-1-correctness.md` — correctness / accounting	✅ delivered
2	`axis-2-state-machines.md` — transition matrix + illegal-transition tests	🔲 pending v1.0.7
3	`axis-3-security.md` — attack surface (signatures, rate limits, authz, secrets)	🔲 pending
4	`axis-4-tests.md` — coverage vs reality, failure-injection gap	🔲 pending
5	`axis-5-debt.md` — documented debt vs hidden debt (TODO/FIXME inventory)	🔲 pending

Axis 2 is gated on v1.0.7 landing first — otherwise the transition matrix captures a v1.0.6.1 snapshot that's immediately stale. See v107-plan.md for the sequencing.

Reading conventions

Every finding cites file:line evidence. Structure:

### P{0|1|2}.N — short title
**Evidence** — concrete cites
**Consequence** — what breaks today / tomorrow
**Action** — what to do, with enough detail that an implementer can start
**Criticity** — P0 / P1 / P2 / wontfix (with justification)

P0 = fix within v1.0.7 or earlier (ledger diverges today, or a v1.0.7 commitment is structurally blocked). P1 = v1.0.7 target. Operational visibility / correctness hardening. P2 = v1.0.8+. Nice-to-have. wontfix = justified non-action.

Info needed from ops (not determinable from code)

Tracked in axis-1-correctness.md. Absence of answers becomes a finding in its own right.

Derived deliverables

v107-plan.md — sequencing, dependencies and relative effort for the axis-1 P0 findings + the CHANGELOG-parked v1.0.7 items. Read this before picking up v1.0.7 work.