veza/docs/audit-2026-04/axis-1-correctness.md
senke 26cb523334 fix(distribution,audit): propagate ErrSubscriptionNoPayment to handler + P0.12 closure date + E2E regression TODO
Self-review of the v1.0.6.2 hotfix surfaced that
distribution.checkEligibility silently swallowed
subscription.ErrSubscriptionNoPayment as "ineligible, no extra info",
so a user with a fantôme subscription trying to submit a distribution
got "Distribution requires Creator or Premium plan" — misleading, the
user has a plan but no payment. checkEligibility now propagates the
error so the handler can surface "Your subscription is not linked to
a payment. Complete payment to enable distribution."

Security is unchanged — the gate still refuses. This is a UX clarity
fix for honest-path users who landed in the fantôme state via a
broken payment flow.

Also:
- Closure timestamp added to axis-1 P0.12 ("closed 2026-04-17 in
  v1.0.6.2 (commit 9a8d2a4e7)") so future readers know the finding's
  lifecycle without re-grepping the CHANGELOG.
- Item G in v107-plan.md gains an explicit E2E Playwright @critical
  acceptance — the shell probe + Go unit tests validate the fix
  today but don't run on every commit, so a refactor of Subscribe or
  checkEligibility could silently re-open the bypass. The E2E test
  makes regression coverage automatic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 12:43:21 +02:00

609 lines
27 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Axis 1 — Correctness / accounting
> See [`README.md`](./README.md) for scope, auditor, source state and
> reading conventions.
## Surface summary
| PSP | Package | Endpoints called | Env gate |
|---|---|---|---|
| Hyperswitch (payments + refunds) | `internal/services/hyperswitch/` | `POST /payments`, `GET /payments/:id`, `POST /refunds` | `HYPERSWITCH_ENABLED` |
| Stripe Connect (seller payouts + transfers) | `internal/services/stripe_connect_service.go` | `POST /v1/account_links`, `GET /v1/accounts/:id`, `GET /v1/balance`, `POST /v1/transfers` | `STRIPE_CONNECT_ENABLED` |
| Table | Status column | Terminal states | Non-terminal |
|---|---|---|---|
| `orders` | `status` | completed, failed, cancelled, refunded | pending, refund_pending |
| `refunds` (v1.0.6) | `status` | succeeded, failed | pending |
| `seller_transfers` | `status` | completed, permanently_failed, reversed | pending, failed |
| `seller_payouts` | `status` | completed, failed | pending, processing |
| `licenses` | `revoked_at` (nullable) | NULL (active) / timestamp (revoked) | — |
| `user_subscriptions` | `status` (enum) | canceled, expired | active, trialing, past_due |
Webhook entry: `POST /api/v1/webhooks/hyperswitch`
(`routes_webhooks.go:55`), HMAC-SHA512 verified, dispatches to
`ProcessPaymentWebhook` or `ProcessRefundWebhook` via
`HyperswitchWebhookPayload.IsRefundEvent()` (`service.go:700`).
Existing reconciliation: **only** the `TransferRetryWorker` for
`seller_transfers.status='failed'` (`transfer_retry.go:57`), interval
`TRANSFER_RETRY_INTERVAL` default `5m` (`config.go:399`).
---
## P0.1 — `SellerTransfer.StripeTransferID` declared but never populated
**Evidence**
- Column declared: `internal/core/marketplace/models.go:237`
`StripeTransferID string \`gorm:"size:255" json:"stripe_transfer_id,omitempty"\``
- Zero write-sites (`grep 'StripeTransferID\s*='` returns 0 hits in production code).
- Upstream returns the transfer but we discard it:
`internal/services/stripe_connect_service.go:205`
`_, err := transfer.New(params)` — the `*stripe.Transfer` pointer carrying
`.ID` is underscored away.
- `CreateTransfer` signature returns only `error`
(`internal/services/stripe_connect_service.go:179`), so callers physically
cannot persist the id even if they wanted to.
**Consequence**
Direct blocker for the v1.0.6 CHANGELOG-documented "parked v1.0.7 —
Stripe Connect Transfers:reversal". When anyone starts that work, there
is no `stripe_transfer_id` in the DB to pass to
`transfers.CreateReversal(...)`. The only fallbacks would be:
- Query Stripe with `transfers.List(Destination=account_id, Metadata[order_id]=...)` — fragile, rate-limited, no unique guarantee.
- Do a full upstream refactor of the transfer worker — larger scope than the TODO implies.
This is exactly the "temporary compromise frozen into permanent" pattern
the audit charter targets.
**Action (prerequisite for any v1.0.7 reversal work)**
1. Change `TransferService.CreateTransfer` signature:
`(ctx, sellerID, amount, currency, orderID string) (stripeTransferID string, err error)`.
2. Update `processSellerTransfers` (`service.go:762`) to capture the id
and set `st.StripeTransferID = stripeTransferID` before `tx.Create(&st)`.
3. Update `TransferRetryWorker.retryOne` (`transfer_retry.go:116`) to
persist the id on successful retry too.
**Criticity** — P0. Must land before the reversal commit, not with it.
---
## P0.2 — No Stripe Connect reversal on refund.succeeded
**Evidence**
- `internal/core/marketplace/service.go:1529` `reverseSellerAccounting`
does three things on refund webhook succeeded:
1. `DebitSellerBalance` (in-DB balance row decrement)
2. `seller_transfers.status = "reversed"` (in-DB ledger flag)
3. **Nothing else.**
- The CHANGELOG v1.0.6 entry explicitly flags this: "Stripe Connect
Transfers:reversal call flagged TODO(v1.0.7). [...] the missing piece
is the money-movement round-trip at Stripe."
**Consequence**
For every refunded order, the buyer is credited by Hyperswitch (verified
via webhook) and the VEZA internal seller balance is debited correctly.
But the Stripe-Connect-side transfer that moved the seller's share into
their connected account is **not** reversed. The seller's Stripe account
still holds that money. VEZA's books say "debited", Stripe says "paid
out". On every refund this creates a real reconciliation gap that a
seller's monthly Stripe statement will show.
This isn't a ledger-consistency issue in VEZA's own DB — it's a
VEZA-vs-Stripe divergence. It grows linearly with refund volume.
**Action — decoupled from the buyer-refund critical path**
The intuitive fix ("call `transfers.NewReversal` inside
`reverseSellerAccounting` and return on error to let the webhook retry")
is wrong: Hyperswitch has already credited the buyer, the buyer is
watching their order page, and a Stripe flap would leave the buyer in
`refund_pending` until Stripe comes back. That couples buyer UX to the
weakest dependency in the whole flow.
Correct pattern — split the reversal into two concerns, with an
intermediate state that says "VEZA books updated, PSP reversal still
owed":
1. **Migration** — extend `seller_transfers.status` to include a new
value `reversal_pending`. Update the state-value documentation
comment in `internal/core/marketplace/models.go:242`.
2. **Refund finalization** (`service.go:1466 finalizeSuccessfulRefund`)
— no change to its flow except: instead of transitioning
`seller_transfers.status` directly to `reversed` in
`reverseSellerAccounting`, transition to `reversal_pending` and set
`next_retry_at = NOW()`. The buyer-facing flow finalizes immediately:
order=refunded, licenses revoked, webhook returns 200.
3. **New worker** `StripeReversalWorker` — scans
`seller_transfers WHERE status='reversal_pending' AND (next_retry_at IS NULL OR next_retry_at <= NOW())`.
For each row:
- if `StripeTransferID == ""` (see P0.1), skip + log ERROR "cannot
reverse — transfer id missing", alert ops.
- else call
`transfer.NewReversal(stripeTransferID, &stripe.TransferReversalParams{Amount: amountCents})`.
- on success: `status='reversed'`, persist the reversal id on a new
`stripe_reversal_id` column.
- on Stripe 404 (transfer not found): assume already reversed
out-of-band, set `status='reversed'`, log INFO.
- on any other error: exponential backoff on `next_retry_at`,
increment `retry_count`. Hit the same `permanently_failed` ceiling
the TransferRetryWorker uses today.
This gives us three properties:
- Buyer refund never blocks on Stripe health.
- VEZA accounting is consistent with Hyperswitch refund state at all times.
- The VEZA↔Stripe gap is bounded (every reversal either completes or
ends in `permanently_failed` with a clear ops action).
**Criticity** — P0. Pre-public-opening of marketplace.
---
## P0.3 — No reconciliation sweep for stuck pending states
**Evidence**
- Inventory search returned zero scheduled jobs that compare DB state
to PSP state for payments or refunds.
- `TransferRetryWorker` exists (`transfer_retry.go`) but only reconciles
`seller_transfers` — not orders, not refunds.
- Agent inventory: "No startup sanity checks comparing DB to PSP for
orders/refunds. No admin endpoints for manual order/refund/transfer
verification. No sweeps for stuck orders in `refund_pending` or orders
exceeding 14-day deadline."
**Consequence**
If a Hyperswitch webhook is never delivered — they've been down, our
`/webhooks/hyperswitch` was down during their retry window, or their
delivery system has a bug — then:
- An order stuck in `pending` stays forever. Customer paid, got no
license. Only surfaces via customer complaint.
- A refund stuck in `refund_pending` stays forever. Customer asked for
refund, got nothing. Order stuck in `refund_pending` also blocks
re-attempts (`ErrRefundAlreadyRequested` at `service.go:1303`).
- A refund row with empty `hyperswitch_refund_id` (Phase 1 completed,
Phase 2 crashed between PSP call and DB UPDATE) has no way to
correlate to the PSP. Nothing queries them.
**Action**
Hourly worker (same pattern as `CleanupOrphanTracks` from v1.0.5 Fix 6):
```go
// internal/jobs/reconcile_hyperswitch.go
func ReconcilePendingOrders(ctx context.Context, db *gorm.DB, hs *hyperswitch.Client) error {
// Orders in 'pending' older than 30m: GET /payments/:id, sync
// Refunds in 'pending' > 30m with non-empty refund_id: GET /refunds/:id, sync
// Refunds in 'pending' > 5m with EMPTY refund_id: mark failed, roll back order
}
```
Wire in `cmd/api/main.go` next to `ScheduleOrphanTracksCleanup`.
**Criticity** — P0. The gap is open today on every deployed instance.
---
## P0.4 — `CreatePayment` / `CreateRefund` not idempotent at the HTTP layer
**Evidence**
- `internal/services/hyperswitch/client.go:73-78` and
`internal/services/hyperswitch/client.go:170-175` — only
`Content-Type` and `api-key` headers are set. No `Idempotency-Key`.
- Hyperswitch's API supports an `Idempotency-Key` header (standard PSP
convention).
**Consequence**
If the HTTP call from VEZA to Hyperswitch retries at the transport
layer (curl-level / net/http Transport-level retry, proxy retry, TLS
reconnect after a flap) during a single logical call, a second payment
or refund object can be created for the same intent. Customer
double-charged, or two refunds issued. The order row's
`hyperswitch_payment_id` column is set to the *first* payment_id we
hear back about, so the second shadow payment is orphaned in Hyperswitch
with no VEZA row pointing at it.
**Scope note — what Idempotency-Key does and does NOT fix**
This header protects against **HTTP-transport retry** only. It does
**not** fix, and should not be conflated with:
- **Application-level replay** — buyer clicks "Pay" twice, frontend
double-submits a form, VEZA application code retries after a DB
transaction aborts. These paths create distinct `order.id` values, so
each call sees a different natural key and the PSP legitimately
creates different payments. The fix there is a state-machine
precondition (order dedup by `(buyer_id, cart_hash)` within a window,
frontend debounce, CSRF single-use token), not the header.
- **Crash-between-call-and-save** — VEZA POST succeeds, Hyperswitch
returns payment_id, VEZA crashes before persisting it. A subsequent
application retry with the same `order.id` *does* recover thanks to
the Idempotency-Key (Hyperswitch returns the cached response), but
this only works if the application retry re-uses the same order row.
That requires the order row to exist in a `pending_payment_submitted`
intermediate state before the PSP call — which the current code
doesn't have. Documenting as a follow-up.
**Action**
Add `Idempotency-Key: <natural key>` header on every mutating call:
- `CreatePayment``order_id` (UUID, guaranteed unique per order row).
- `CreateRefund``refund.ID` (UUID from the pending `refunds` row —
populated before the PSP call in Phase 1, so the same key is used
across any HTTP-level retry within that call).
The `http.Client` in `hyperswitch.Client` should NOT set this
implicitly; callers must pass the key explicitly so the semantic is
auditable at each call site. Two changes, each ~3 lines.
The state-machine-level fix for application retry is out of scope here;
it goes on the v1.0.8+ roadmap (documented in `v107-plan.md`).
**Criticity** — P0. Low probability per call but unbounded blast radius
(customer double-charge).
---
## P1.5 — Webhook raw payloads not persisted
**Evidence**
- `routes_webhooks.go:59-82` reads body, verifies signature, dispatches.
- No table or log file captures the raw webhook body, signature header,
or processing result for later audit.
**Consequence**
When a customer disputes ("I never got refunded"), we cannot reproduce
what Hyperswitch told us. The only evidence is in the backend `log` at
whatever retention the host keeps — and structured logs don't carry the
raw body (we explicitly redact some fields for security).
**Action**
New table `hyperswitch_webhook_log`:
```sql
CREATE TABLE hyperswitch_webhook_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
received_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
signature_header TEXT NOT NULL,
signature_valid BOOLEAN NOT NULL,
event_type TEXT,
payment_id TEXT,
refund_id TEXT,
payload BYTEA NOT NULL,
processing_result TEXT NOT NULL, -- 'ok', 'error: <msg>', 'skipped'
processed_at TIMESTAMPTZ
);
CREATE INDEX ON hyperswitch_webhook_log (payment_id) WHERE payment_id IS NOT NULL;
CREATE INDEX ON hyperswitch_webhook_log (refund_id) WHERE refund_id IS NOT NULL;
```
Insert in `routes_webhooks.go:hyperswitchWebhookHandler` **before**
signature verification — so we capture attack attempts too. Retention
90 days via a separate sweep job.
**Criticity** — P1. Not ledger-diverging, but the first ops question
on any refund/dispute claim is "what did the webhook say", and today we
can't answer it.
---
## P1.6 — Disputes / chargebacks silently dropped
**Evidence**
- `HyperswitchWebhookPayload.IsRefundEvent()` (`service.go:700`) only
distinguishes refund-vs-non-refund. No dispute discriminator.
- Zero grep hits for `dispute|chargeback|disputed` in
`internal/core/marketplace/`, `internal/services/hyperswitch/`, or
anywhere in the webhook handler path.
- Hyperswitch emits `dispute.opened`, `dispute.evidence_required`,
`dispute.won`, `dispute.lost`, `dispute.expired` — all land on
`POST /webhooks/hyperswitch` and currently fall through
`ProcessPaymentWebhook`'s switch to the `default` case
(`service.go:718`), which logs Debug "ignoring status" and returns 200.
**Consequence**
A dispute/chargeback is the case where the ledger diverges most
violently, because:
1. It's **not initiated by VEZA** — we learn about it passively when
the buyer's bank pushes a reversal through the card network.
2. On `dispute.lost` (the common case if we don't respond in time),
Hyperswitch debits our acquirer balance for the original amount
**plus** a chargeback fee (typically 15-25€ depending on acquirer).
3. The VEZA `orders` row stays `completed`, the license stays active,
the seller stays paid. Buyer has their money back via the card
network. Every table in VEZA thinks everything is fine. The only
signal is the acquirer statement, and it's weeks delayed.
4. Chargeback rate is a seller health metric: an account with 1% CB
rate gets flagged by Visa/MC and the whole VEZA merchant account
can be put under review. Without counting disputes per-seller, we
have no early-warning system.
**Action (staged)**
Minimum (ships with P1.5 webhook log): every `dispute.*` event lands in
`hyperswitch_webhook_log` with the payload captured. Ops can query.
Recommended (v1.0.8 sprint):
1. New `disputes` table:
```sql
CREATE TABLE disputes (
id UUID PRIMARY KEY,
order_id UUID NOT NULL REFERENCES orders(id),
hyperswitch_dispute_id TEXT UNIQUE NOT NULL,
status TEXT NOT NULL, -- opened, evidence_required, won, lost, expired
amount_cents BIGINT NOT NULL,
reason TEXT,
evidence_due_at TIMESTAMPTZ,
opened_at TIMESTAMPTZ NOT NULL,
resolved_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
2. Extend `IsRefundEvent()` discriminator to a three-way
`EventKind()` that returns `payment | refund | dispute`.
3. New `ProcessDisputeWebhook` — mirrors `ProcessRefundWebhook`'s
idempotency pattern (unique `hyperswitch_dispute_id` + row lock).
4. On `dispute.lost`: same seller-side reversal machinery as refund
(P0.2), because the money effect is identical.
5. Per-seller chargeback counter on `seller_balances` or a new
`seller_risk_metrics` table, exposed to admin UI.
**Criticity** — P1 minimum (capture + log). P0 if volume justifies it
once public-opened. At current volume (pre-opening), accepting the gap
is defensible; post-opening, a single dispute we miss is worse than
the engineering cost of handling it.
---
## P1.7 → P0.12 — Subscription payment-gate bypass (escalated 2026-04-17)
> **Re-classified from P1.7 to P0 on 2026-04-17** after probing ops Q2.
> The probe did not confirm "subscriptions are sold" — it confirmed
> "subscriptions are activated server-side without payment, and that
> state satisfies a paid-feature gate". Criticity moved from "P1
> hardening if subs are live" to "P0 security — bypass is live in any
> environment where `HYPERSWITCH_ENABLED=false` (which is the current
> dev/staging default and was the prod default of every env variable
> default before v1.0.6.2)". Hotfix v1.0.6.2 ships the gate fix
> (`GetUserSubscription` filter + migration 980 cleanup). The
> subscription-refund / webhook work (original P1.7 scope) remains and
> splits out below as P1.7' for v1.0.8.
**Evidence (bypass chain, confirmed by probe)**
- `POST /api/v1/subscriptions/subscribe` with `plan_id=<Creator>` from
any authenticated user (role=`user`, no KYC, no email verification
re-check beyond initial signup) returns **HTTP 201 `status=active`**.
- `subscription/service.go:256` `sub.Status = StatusActive` is assigned
unconditionally before the payment call.
- `subscription/service.go:279` `if s.paymentProvider != nil`
payment call is conditional; when Hyperswitch is disabled (the
default for `HYPERSWITCH_ENABLED=false`), the row persists `active`
with no PSP reference written.
- `subscription/service.go:101` (pre-hotfix) `WHERE status IN
('active', 'trialing')` — the fantôme row satisfies this filter,
`GetUserSubscription` returns it to callers.
- `distribution/service.go:253` gate returns
`sub.Plan.HasDistribution || sub.Plan.CanSellOnMarketplace`. Creator
plan has `can_sell_on_marketplace=true` (migration 949 line 79) →
gate passes → user reaches `POST /distribution/submit`.
- `user_subscriptions.hyperswitch_subscription_id` declared
(`models.go:95`) but **never written anywhere** in the codebase —
no code path populates it, so it cannot be used as a post-hoc
"payment confirmed" signal either.
- Probe artefact (captured, cleaned): row
`906c585d-1afb-49e5-ac96-5eb5411cda99` created 2026-04-17 via
authenticated `POST /subscribe` from `smoke.0416@veza.local`
(`role=user`), HS IDs empty. Voided by migration 980 at v1.0.6.2
deploy.
**Consequence**
Any authenticated user gains access to `POST /distribution/submit`,
which dispatches to external distribution partners. Three harms:
1. Reputational — external partners receive distribution submissions
from non-paying VEZA accounts.
2. Financial — if the distribution workflow triggers billable API
calls to the partner platform (depends on workflow, worth auditing
as a separate finding under axis 3).
3. Contractual — VEZA's agreements with partners likely assume
"paid-subscriber submissions only".
The `role` column is not affected (user stays `user`, no bypass of
v1.0.6 creator self-service gate), and free-tier upload is unaffected
— the specific gate bypassed is the `has_distribution ||
can_sell_on_marketplace` check on paid plans.
**Action — shipped in v1.0.6.2 (commit d31f5733d)**
1. `subscription/service.go` `GetUserSubscription` now routes through
`hasEffectivePayment` — effective payment = free plan OR unexpired
trial OR invoice with non-empty `hyperswitch_payment_id`.
2. `ErrSubscriptionNoPayment` added; `distribution.checkEligibility`
and the four subscription handlers map it to "no active
subscription" (with `needs_payment=true` on `GET /me/subscription`
so honest-path users get actionable info).
3. Migration `980_void_unpaid_subscriptions.sql` sweeps pre-existing
fantôme rows to `status='expired'` with audit table
`voided_subscriptions_20260417`.
4. Probe script `scripts/probes/subscription-unpaid-activation.sh`
kept as versioned regression test.
5. `TODO(v1.0.7-item-G)` annotation on
`subscription/service.go:createNewSubscription` marks the deferred
work: replace the `if s.paymentProvider != nil` short-circuit with
a mandatory `pending_payment` state + webhook-driven activation
(item G in the v1.0.7 plan).
**Criticity** — **P0, closed 2026-04-17 in v1.0.6.2 (commit
d31f5733d).** Item G in v1.0.7 hardens the creation path end-to-end.
---
## P1.7' — Subscription money-movement not covered by v1.0.6 hardening (original scope, deferred to v1.0.8)
**Evidence**
- Subscription model has Hyperswitch keys:
`subscription/models.go:95` `HyperswitchSubscriptionID`, line 96
`HyperswitchCustomerID`, line 136 `HyperswitchPaymentID`.
- Status enum: `active, trialing, canceled, past_due, expired`
(`subscription/models.go:20-28`).
- `Service.Subscribe` (`subscription/service.go:149`) and
`Service.CancelSubscription` (line 390) exist.
- **Zero refund code in the subscription package.** Cancellation marks
the row `canceled` but does not initiate any PSP reimbursement.
- No visible webhook handler branch for `subscription.*` events — the
dispatcher only knows payment vs refund.
- No code path was found that transitions a subscription INTO
`past_due` (which implies it's driven by an external event the
webhook isn't consuming, or it's dead state).
**Consequence**
The marketplace refund hardening only covers one-shot purchases. A
subscription cancellation inside the paid period currently doesn't
refund anything, and a failed recurring charge doesn't move the
subscription to `past_due` because nothing wires it up.
**Action**
Dedicated sprint for subscription state-machine hardening. Scope for
v1.0.8. Breakdown:
1. Inventory every state in `user_subscriptions.status` with evidence
of which code path lands there.
2. Handle `subscription.payment_succeeded`, `subscription.payment_failed`,
`subscription.canceled` webhook events in a
`ProcessSubscriptionWebhook`.
3. Decide product policy on mid-period cancellation (pro-rata refund?
honour until end of period? forfeit?).
**Criticity** — P1 for the hardening. v1.0.6.2 closes the bypass half;
this half (refund / webhook completeness) is legitimately v1.0.8 scope
and does not produce a bypass on its own.
---
## P1.8 — No ledger-health metrics
**Evidence**
No grep hit for `prometheus.*counter.*order|refund|transfer|payout` in
`internal/metrics/`. Agent confirms no dashboards.
**Consequence**
Ledger state is invisible until someone complains. Specifically:
- Count of orders in `pending` older than 30 min (webhook never arrived)
- Count of orders in `refund_pending` older than 30 min
- Count of seller_transfers in `failed` at max `retry_count`
- Count of refunds with empty `hyperswitch_refund_id` older than 5 min
- Count of seller_transfers in `reversal_pending` older than 30 min (new state from P0.2)
- Oldest timestamp in each of the above
**Action**
Prometheus gauge metrics updated by a 60s sampler (cheap query each):
```
ledger_stuck_orders_pending_total{age_bucket="30m_1h|1h_1d|1d+"}
ledger_stuck_refunds_pending_total{age_bucket=...}
ledger_failed_transfers_at_max_retry_total
ledger_orphan_refund_rows_total // empty hyperswitch_refund_id
ledger_reversal_pending_transfers_total
```
Grafana panel with 5 number panels + 5 time-series. Alert rules:
- `ledger_stuck_orders_pending_total > 0 for 10m` → page
- `ledger_orphan_refund_rows_total > 0 for 5m` → page (= bug in the
two-phase commit between DB and PSP)
**Criticity** — P1. Ships with P0.3 (the reconciliation sweep) so the
metrics track the sweep's effectiveness.
---
## P2.9 — No admin API for manual refund / order override
**Evidence**
No admin endpoint found that can force an order or refund into a
specific terminal state. Ops would have to issue raw SQL in prod.
**Consequence**
When something genuinely gets stuck (Hyperswitch confirms out-of-band
that a refund succeeded but their webhook is never going to fire for
some freak reason), there's no sanctioned path to resolve without raw
DB access.
**Action**
`POST /api/v1/admin/orders/:id/force-status` +
`POST /api/v1/admin/refunds/:id/force-status`, gated by
`AuthMiddleware.RequireAdminRole`, with an audit-log entry. Body:
`{ new_status: "...", reason: "..." }`. Validates the target state is
reachable from the current one per the documented state machine
(depends on Axis 2).
**Criticity** — P2. Not urgent; raw SQL works until it doesn't.
---
## P2.10 — Partial refund latent compromise
**Evidence**
`client.go:142` `CreateRefundRequest.Amount *int64 // nil = full refund`.
`service.go:1330` passes `nil` as amount on every call.
CHANGELOG v1.0.6: "full-only for v1.0.6 — partial refund UX is deferred
to v1.0.7."
**Consequence**
The `amount *int64` parameter is a permanent half-wired compromise. When
partial refunds land, the implementer must remember:
- Multiple refunds per order must be allowed (change the double-submit
guard at `service.go:1303`).
- `SellerBalance` debit must be proportional to the partial amount, not
full transfer amount.
- The partial UNIQUE index already tolerates the multi-refund case
(v1.0.6.1 migration 979) — but only if each refund has its own
PSP-assigned refund_id, which Hyperswitch does.
- Webhook lookup already uses `hyperswitch_refund_id` so no change.
**Action**
When partial lands: extensive state-table tests (can the same order
reach (`status=refunded`, amount < total)? should it be a new
`partially_refunded` status?). Flag this item in the v1.0.7 plan
explicitly so it's scoped, not drive-by.
**Criticity** P2. No active bug; future hazard only.
---
## wontfix.11 — `TransferRetryInterval` not per-seller configurable
`config.go:399` default `5m`. A single global interval works fine at
current traffic. Re-evaluate at 10× load.
---
## Axis 1 summary
| # | Finding | Criticity | Gate |
|---|---|---|---|
| 1 | `SellerTransfer.StripeTransferID` never set | P0 | prerequisite for v1.0.7 reversal |
| 2 | No Stripe Connect reversal on refund.succeeded | P0 | v1.0.7, decoupled via `reversal_pending` state |
| 3 | No reconciliation sweep for stuck pending states | P0 | v1.0.7 (hourly job) |
| 4 | No `Idempotency-Key` on Hyperswitch calls | P0 | v1.0.7 (2 header lines) |
| 5 | Webhook raw payloads not persisted | P1 | v1.0.7 (migration + insert) |
| 6 | Disputes / chargebacks silently dropped | P1 | v1.0.8 (min: capture in P1.5 log) |
| 7 | Subscription money-movement surface | P1 | v1.0.8 sprint |
| 8 | No ledger-health metrics | P1 | ships with #3 |
| 9 | No admin API for manual override | P2 | v1.0.8+ |
| 10 | Partial refund latent compromise | P2 | flag in v1.0.7 plan |
| 11 | Per-seller retry interval | wontfix | |
**Blockers for "ready to show"** 1, 2, 3, 4 must land before the
marketplace is used by anyone external (grants demo, beta, DIY
community). After those four, VEZA's accounting is self-consistent
AND consistent with Stripe+Hyperswitch, which is the real bar.
See [`v107-plan.md`](./v107-plan.md) for sequencing and effort.
---
## Info needed from ops
Not determinable from code. Answers become part of the audit:
1. **Last manual reconciliation between VEZA DB and Stripe/Hyperswitch
dashboards.** Date + scope + outcome.
_If the answer is "jamais" that's itself a P1 process finding:
no reconciliation cadence established. Absence of documented debt
= hidden debt._
2. **Are subscriptions currently sold** (i.e., is the paid-subscription
feature turned on for end-users), or is the code live but the UI
hides the entry point? If yes, P1.7 P0. If hidden-by-UI,
additional Axis-3 check needed: is the API reachable despite the
UI hiding? Front-flag-only gating is a security finding.
3. **Current volume order of magnitude** number of orders/refunds
processed per week so criticity can be re-checked against blast
radius.