diff --git a/docs/audit-2026-04/axis-1-correctness.md b/docs/audit-2026-04/axis-1-correctness.md index 0d0845674..13cd496f1 100644 --- a/docs/audit-2026-04/axis-1-correctness.md +++ b/docs/audit-2026-04/axis-1-correctness.md @@ -348,7 +348,87 @@ the engineering cost of handling it. --- -## P1.7 — Subscription money-movement not covered by v1.0.6 hardening +## P1.7 → P0.12 — Subscription payment-gate bypass (escalated 2026-04-17) + +> **Re-classified from P1.7 to P0 on 2026-04-17** after probing ops Q2. +> The probe did not confirm "subscriptions are sold" — it confirmed +> "subscriptions are activated server-side without payment, and that +> state satisfies a paid-feature gate". Criticity moved from "P1 +> hardening if subs are live" to "P0 security — bypass is live in any +> environment where `HYPERSWITCH_ENABLED=false` (which is the current +> dev/staging default and was the prod default of every env variable +> default before v1.0.6.2)". Hotfix v1.0.6.2 ships the gate fix +> (`GetUserSubscription` filter + migration 980 cleanup). The +> subscription-refund / webhook work (original P1.7 scope) remains and +> splits out below as P1.7' for v1.0.8. + +**Evidence (bypass chain, confirmed by probe)** +- `POST /api/v1/subscriptions/subscribe` with `plan_id=` from + any authenticated user (role=`user`, no KYC, no email verification + re-check beyond initial signup) returns **HTTP 201 `status=active`**. +- `subscription/service.go:256` `sub.Status = StatusActive` is assigned + unconditionally before the payment call. +- `subscription/service.go:279` `if s.paymentProvider != nil` — + payment call is conditional; when Hyperswitch is disabled (the + default for `HYPERSWITCH_ENABLED=false`), the row persists `active` + with no PSP reference written. +- `subscription/service.go:101` (pre-hotfix) `WHERE status IN + ('active', 'trialing')` — the fantôme row satisfies this filter, + `GetUserSubscription` returns it to callers. +- `distribution/service.go:253` gate returns + `sub.Plan.HasDistribution || sub.Plan.CanSellOnMarketplace`. Creator + plan has `can_sell_on_marketplace=true` (migration 949 line 79) → + gate passes → user reaches `POST /distribution/submit`. +- `user_subscriptions.hyperswitch_subscription_id` declared + (`models.go:95`) but **never written anywhere** in the codebase — + no code path populates it, so it cannot be used as a post-hoc + "payment confirmed" signal either. +- Probe artefact (captured, cleaned): row + `906c585d-1afb-49e5-ac96-5eb5411cda99` created 2026-04-17 via + authenticated `POST /subscribe` from `smoke.0416@veza.local` + (`role=user`), HS IDs empty. Voided by migration 980 at v1.0.6.2 + deploy. + +**Consequence** +Any authenticated user gains access to `POST /distribution/submit`, +which dispatches to external distribution partners. Three harms: +1. Reputational — external partners receive distribution submissions + from non-paying VEZA accounts. +2. Financial — if the distribution workflow triggers billable API + calls to the partner platform (depends on workflow, worth auditing + as a separate finding under axis 3). +3. Contractual — VEZA's agreements with partners likely assume + "paid-subscriber submissions only". +The `role` column is not affected (user stays `user`, no bypass of +v1.0.6 creator self-service gate), and free-tier upload is unaffected +— the specific gate bypassed is the `has_distribution || +can_sell_on_marketplace` check on paid plans. + +**Action — shipped in v1.0.6.2 (commit d31f5733d)** +1. `subscription/service.go` `GetUserSubscription` now routes through + `hasEffectivePayment` — effective payment = free plan OR unexpired + trial OR invoice with non-empty `hyperswitch_payment_id`. +2. `ErrSubscriptionNoPayment` added; `distribution.checkEligibility` + and the four subscription handlers map it to "no active + subscription" (with `needs_payment=true` on `GET /me/subscription` + so honest-path users get actionable info). +3. Migration `980_void_unpaid_subscriptions.sql` sweeps pre-existing + fantôme rows to `status='expired'` with audit table + `voided_subscriptions_20260417`. +4. Probe script `scripts/probes/subscription-unpaid-activation.sh` + kept as versioned regression test. +5. `TODO(v1.0.7-item-G)` annotation on + `subscription/service.go:createNewSubscription` marks the deferred + work: replace the `if s.paymentProvider != nil` short-circuit with + a mandatory `pending_payment` state + webhook-driven activation + (item G in the v1.0.7 plan). + +**Criticity** — **P0, closed in v1.0.6.2.** Item G in v1.0.7 hardens +the creation path end-to-end. + +--- + +## P1.7' — Subscription money-movement not covered by v1.0.6 hardening (original scope, deferred to v1.0.8) **Evidence** - Subscription model has Hyperswitch keys: @@ -383,9 +463,9 @@ v1.0.8. Breakdown: 3. Decide product policy on mid-period cancellation (pro-rata refund? honour until end of period? forfeit?). -**Criticity** — P1 for the hardening, P0 only if subscriptions are -currently sold. Status: needs confirmation from ops — not determinable -from code alone. See [Info needed from ops](#info-needed-from-ops) Q2. +**Criticity** — P1 for the hardening. v1.0.6.2 closes the bypass half; +this half (refund / webhook completeness) is legitimately v1.0.8 scope +and does not produce a bypass on its own. --- diff --git a/docs/audit-2026-04/v107-plan.md b/docs/audit-2026-04/v107-plan.md index 51a86d991..9a65395c1 100644 --- a/docs/audit-2026-04/v107-plan.md +++ b/docs/audit-2026-04/v107-plan.md @@ -4,7 +4,7 @@ > The v1.0.6 CHANGELOG listed 4 "parked v1.0.7" items; the axis-1 audit > added 4 P0 findings. De-duplicated and sequenced below. -## The 8 items, de-duplicated → 6 +## The 9 items (was 8, de-duplicated → 6; +1 after the 2026-04-17 Q2 probe) | # | From | Title | Effort | |---|---|---|---| @@ -14,6 +14,7 @@ | D | audit P0.4 | `Idempotency-Key` on `CreatePayment` / `CreateRefund` | **XS** | | E | audit P1.5 | Webhook raw-payload log table + insert | **S** | | F | audit P1.8 | Ledger-health Prometheus metrics + alerts | **S** | +| **G** | **audit P0.12 follow-up (post v1.0.6.2 hotfix)** | **Subscription `pending_payment` state + webhook-driven activation; replace `if s.paymentProvider != nil` short-circuit** | **M** | Dropped from v1.0.7 scope: - Partial refunds (CHANGELOG-parked) — P2 in audit, feature-class, defer to v1.0.8 @@ -53,15 +54,24 @@ Effort legend — **XS** ≤ 2h, **S** ≤ 1 day, **M** ≤ 3 days, **L** > 3 da │ F metrics │ needs the state-shape C produces; alerts wire │ + alerts │ to the buckets the reconciler defines └──────────────────┘ + + ┌──────────────────┐ + │ G subscription │ independent of A/B/C/D/E/F; shares the + │ pending_ │ `pending_payment` pattern with B's + │ payment │ `reversal_pending`. Builds on v1.0.6.2 + └──────────────────┘ hotfix (which compensates via filter). ``` -**Two parallel tracks:** +**Three parallel tracks:** - **Track 1 (reversal correctness)** — A → B - **Track 2 (operational visibility)** — D, E, C → F +- **Track 3 (subscription creation path)** — G (single item, independent) Two developers can work in parallel without stepping on each other. A single developer sequences as ordered: D first (XS quick win, earns -trust + unblocks "pre-open" checklist), then A→B, then E, then C→F. +trust + unblocks "pre-open" checklist), then A→B, then E, then C→F, +then G. G can also run in parallel at any point after D — it shares no +data-model surface with the other items. ## Commit sequence (single-developer path) @@ -82,6 +92,17 @@ Acceptance: - CHANGELOG entry cross-references P0.4 + its scope note (HTTP retry only, not app-level replay). +**TTL caveat** — Hyperswitch (like most PSPs) honours `Idempotency-Key` +server-side only for a finite window: 24 h is common, 7 d at the high +end. Beyond the TTL, a replayed call with the same key is treated as +a new request. The header therefore protects against HTTP-layer retries +within a single request cycle, not against long-tail application +replay scenarios (for which the application-level idempotency primitives +— order.id on payments, the partial UNIQUE on `refunds.hyperswitch_refund_id` +landed in v1.0.6.1 — are the load-bearing guards). Verify the exact TTL +against current Hyperswitch docs before landing and note it in the +CHANGELOG scope sentence so anyone reading later knows the envelope. + **Ship as v1.0.7-alpha-1 for sandbox testing**, don't wait for B/C. ### 2. `refactor(connect): persist stripe_transfer_id on create + retry` — A @@ -186,10 +207,61 @@ Acceptance: - Sampler queries are cheap: `SELECT COUNT(*) WHERE ... AND created_at < NOW() - INTERVAL '30 min'` per metric, indexed by `status + created_at`. - If indexes are missing: migration `983_ledger_health_indexes.sql`. +### 7. `feat(subscription): pending_payment state + webhook-driven activation` — G + +Effort: **M**. The follow-up to v1.0.6.2's gate-filter compensation. +v1.0.6.2 closed the feature bypass at the consumption site +(`GetUserSubscription` filters fantôme rows out); G replaces the +creation path so no fantôme rows get written in the first place. + +Migrations: +- `984_subscription_pending_payment_enum.sql` adds `'pending_payment'` + to the `user_subscriptions.status` VARCHAR (no enum at DB level, so + just a documentation + backfill step for Go constants). +- `985_backfill_hs_subscription_id_from_invoice.sql` (optional) — + backpopulates `user_subscriptions.hyperswitch_subscription_id` from + the attached invoice's PSP intent where available. Documents the + join rule for the webhook to reconcile on. + +Acceptance: +- `subscription/service.go:createNewSubscription` creates paid-plan + rows in `pending_payment` state (never `active`). Transition to + `active` only via `ProcessSubscriptionWebhook` on + `payment_succeeded`. On `payment_failed`: transition to `expired` + + log INFO; no invoice charge, no DB fantôme. +- `if s.paymentProvider != nil` short-circuit deleted — paid plans + without a payment provider configured return `503 payment provider + not configured` (env misconfig is an ops issue, not a silently-free + subscription). +- `GET /me/subscription` handles `pending_payment` explicitly — + returns `status: pending_payment`, `client_secret` echoed back so + the frontend can resume a stalled flow. +- `distribution.checkEligibility` treats `pending_payment` as + ineligible (same as the v1.0.6.2 `ErrSubscriptionNoPayment` path). +- Remove the TODO(v1.0.7-item-G) annotation; remove the v1.0.6.2 + filter from `GetUserSubscription` (redundant once the creation + path is correct) — OR keep the filter as defence-in-depth and note + it in the code comment. +- Webhook dispatcher: new `subscription` event family. Reuse the + `webhook_raw_payloads` table from item E for persistence. +- Tests: + - Subscribe to paid plan with Hyperswitch enabled → + `pending_payment` row + PSP intent, no feature access yet. + - Webhook `subscription.payment_succeeded` → row transitions to + `active`, feature access granted. + - Webhook `subscription.payment_failed` → row transitions to + `expired`, no charge, no access. + - Webhook replay (same payment_id) is idempotent. + - Subscribe with provider misconfigured → 503, no row created. + - Migration of v1.0.6.2 voided rows — check `voided_subscriptions_20260417` + entries stay readable and not re-pickable by the new flow. + +Independent of A/B/C/D/E/F. Can land at any point after D. + ## Release gating **Before tagging v1.0.7:** -- All six items landed as separate commits. +- All seven items landed as separate commits. - Refund smoke test (see `docs/audit-2026-04/smoke/refund.md` — TBD) re-run against sandbox with the reversal path: assert Stripe side shows the reversal as well as the refund. @@ -206,12 +278,13 @@ the API change). ## Effort total -Worst-case single-developer sequential: **~9-10 working days** -(XS + S + M + S + M + S = 2h + 1d + 3d + 1d + 3d + 1d). +Worst-case single-developer sequential: **~12-13 working days** +(XS + S + M + S + M + S + M = 2h + 1d + 3d + 1d + 3d + 1d + 3d). -With two devs on the parallel tracks: **~5 working days** end-to-end, -because track 1 (A→B, 4d) and track 2 (D, E, C→F done serially = 5d) -overlap. +With two devs on the parallel tracks: **~7 working days** end-to-end +— track 1 (A→B, 4d), track 2 (D, E, C→F, 5d), track 3 (G, 3d, can +run at any point after D). Track 3 extends the critical path unless +a third dev picks it up; with 3 devs it lands in the track-2 window. ## Unknowns to resolve before starting