veza/docs/audit-2026-04/README.md

docs(audit): 2026-04 correctness/accounting findings (axis 1)

Axis 1 of the 5-axis VEZA audit, scoped to money-movement correctness and ledger↔PSP reconciliation. Layout: one file per axis under docs/audit-2026-04/, README index, v107-plan.md derived.

P0 findings (block v1.0.7 "ready-to-show" gate):

* P0.1 — SellerTransfer.StripeTransferID declared but never populated. stripe_connect_service.CreateTransfer discards the *stripe.Transfer return value (`_, err := transfer.New(params)`), so the column in models.go:237 is dead. Structural blocker for the CHANGELOG-parked v1.0.7 "Stripe Connect reversal" item.
* P0.2 — No Stripe Connect reversal on refund.succeeded. Every refund today creates a permanent VEZA↔Stripe ledger gap. Action reworked to decouple via a new `seller_transfers.status = 'reversal_pending'` state + async worker, so Stripe flaps never block buyer-facing refund UX.
* P0.3 — No reconciliation sweep for stuck orders / refunds / refund rows with empty hyperswitch_refund_id. Hourly worker recommended, same pattern as v1.0.5 Fix 6 orphan-tracks cleaner.
* P0.4 — No Idempotency-Key on outbound Hyperswitch POST /payments and POST /refunds. Action includes an explicit scope note: the header covers HTTP-transport retry only, NOT application-level replay (for which the fix is a state-machine precondition).
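
As a rough illustration of finding P0.4 above, the following Go sketch attaches an `Idempotency-Key` header to an outbound POST using only the standard library. The URL, the `paymentKey` scheme and both function names are hypothetical and are not taken from the VEZA codebase. How the PSP treats a repeated key is also an assumption here. The only point shown is that the key is derived from a stable business identifier, so a transport-level retry of the same request carries the same key.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newIdempotentPost builds an outbound JSON POST that carries an
// Idempotency-Key header. Hypothetical helper, not VEZA's actual client.
func newIdempotentPost(url, idempotencyKey string, body []byte) (*http.Request, error) {
	if idempotencyKey == "" {
		return nil, fmt.Errorf("idempotency key must not be empty")
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Idempotency-Key", idempotencyKey)
	return req, nil
}

// paymentKey is an assumed key scheme: one key per order, so every retry
// of the same payment request reuses the same value.
func paymentKey(orderID string) string {
	return "payment:" + orderID
}

func main() {
	// The host below is a placeholder; no request is actually sent.
	req, err := newIdempotentPost(
		"https://psp.example.invalid/payments",
		paymentKey("order-42"),
		[]byte(`{"amount":1000}`),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Idempotency-Key"))
}
```

In line with the scope note in P0.4, a header like this only protects against duplicate delivery of one HTTP request. It does not stop the application from issuing the same logical operation twice, which the finding addresses with a state-machine precondition instead.
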
P1 findings:

* P1.5 — Webhook raw payloads not persisted (blocks dispute forensics)
* P1.6 — Disputes / chargebacks silently dropped (new, surfaced during review; dispute.* webhooks fall through the default case)
* P1.7 — Subscription money-movement not covered by v1.0.6 hardening
* P1.8 — No ledger-health Prometheus metrics

P2 findings:

* P2.9 — No admin API for manual override
* P2.10 — Partial refund latent compromise (amount *int64 always nil)

wontfix:

* wontfix.11 — Per-seller retry interval (re-evaluate at 10× load)

Derived deliverable: v107-plan.md sequences the 6 de-duplicated items (4 P0 + 2 P1) with a dependency graph, two parallel tracks, per-commit effort estimates (D→A→B; E→C→F), release gating and open questions (volume magnitude, Connect backfill %).

Info needed from ops (tracked in axis-1 doc, not determinable from code): last manual reconciliation date, whether subscriptions are currently sold, current order/refund volume.

Axes 2-5 deferred: README.md marks axis 2 (state machines) as gated on v1.0.7 landing first, otherwise the transition matrix captures a v1.0.6.1 snapshot that's immediately stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 01:21:33 +00:00
# VEZA Audit — 2026-04
> **Scope** — VEZA backend (Go) + web (TypeScript).
> TALAS software (firmware, PCB reverse-engineering pipeline) is **out of scope**
> and will be audited separately when its phase stabilises.
>
> **Source state** — commits up to `a57bb6f78` (v1.0.6.1, 2026-04-17).
>
> **Auditor** — Claude Opus 4.7 (1M context).
## Axes
| # | File | Status |
|---|---|---|
| 1 | [`axis-1-correctness.md`](./axis-1-correctness.md) — correctness / accounting | ✅ delivered |
| 2 | `axis-2-state-machines.md` — transition matrix + illegal-transition tests | 🔲 pending v1.0.7 |
| 3 | `axis-3-security.md` — attack surface (signatures, rate limits, authz, secrets) | 🔲 pending |
| 4 | `axis-4-tests.md` — coverage vs reality, failure-injection gap | 🔲 pending |
| 5 | `axis-5-debt.md` — documented debt vs hidden debt (TODO/FIXME inventory) | 🔲 pending |

Axis 2 is gated on v1.0.7 landing first — otherwise the transition matrix
captures a v1.0.6.1 snapshot that's immediately stale. See
[`v107-plan.md`](./v107-plan.md) for the sequencing.
## Reading conventions
Every finding cites `file:line` evidence and follows this structure:
```
### P{0|1|2}.N — short title
**Evidence** — concrete cites
**Consequence** — what breaks today / tomorrow
**Action** — what to do, with enough detail that an implementer can start
**Criticity** — P0 / P1 / P2 / wontfix (with justification)
```
- **P0** = fix within v1.0.7 or earlier (ledger diverges today, or a v1.0.7
  commitment is structurally blocked).
- **P1** = v1.0.7 target. Operational visibility / correctness hardening.
- **P2** = v1.0.8+. Nice-to-have.
- **wontfix** = justified non-action.
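
The finding structure above is regular enough to check mechanically. Below is a minimal Go sketch of such a check, assuming each finding is available as one Markdown section string. `checkFinding`, `headingRE` and `requiredLabels` are hypothetical names and are not part of the audit tooling. The check covers only the `P{0|1|2}.N` heading form from the template, so a `wontfix` entry would need its own pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// headingRE matches "### P{0|1|2}.N <em dash> short title".
// \x{2014} is the em dash used by the heading convention.
var headingRE = regexp.MustCompile(`^### P[012]\.\d+ \x{2014} \S.*$`)

// requiredLabels are the four bold labels every finding must contain.
var requiredLabels = []string{"Evidence", "Consequence", "Action", "Criticity"}

// checkFinding returns one message per violated part of the convention.
// An empty result means the section follows the template.
func checkFinding(section string) []string {
	var problems []string
	lines := strings.Split(strings.TrimSpace(section), "\n")
	if !headingRE.MatchString(lines[0]) {
		problems = append(problems, "heading does not match ### P{0|1|2}.N")
	}
	for _, label := range requiredLabels {
		if !strings.Contains(section, "**"+label+"**") {
			problems = append(problems, "missing **"+label+"**")
		}
	}
	return problems
}

func main() {
	// Illustrative section text; only the heading and labels matter here.
	section := strings.Join([]string{
		"### P0.1 \u2014 StripeTransferID never populated",
		"**Evidence** models.go:237",
		"**Consequence** the column is dead",
		"**Action** keep the returned transfer and store its ID",
	}, "\n")
	for _, p := range checkFinding(section) {
		fmt.Println(p)
	}
}
```

Run as written, the sample section is reported as missing its `**Criticity**` line, because the heading and the other three labels are present.
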
## Info needed from ops (not determinable from code)
Tracked in [`axis-1-correctness.md`](./axis-1-correctness.md#info-needed-from-ops).
Any question that stays unanswered is recorded as a finding in its own right.
## Derived deliverables
- [`v107-plan.md`](./v107-plan.md) — sequencing, dependencies and relative
effort for the axis-1 P0 findings + the CHANGELOG-parked v1.0.7 items.
Read this before picking up v1.0.7 work.