senke/veza - Talas Project: Beyond coding. We Forge.

Author	SHA1	Message	Date
senke	2a96766ae3	feat(subscription): pending_payment state machine + mandatory provider (v1.0.9 item G — Phase 1) First instalment of Item G from docs/audit-2026-04/v107-plan.md §G. This commit lands the state machine + create-flow change. Phase 2 (webhook handler + recovery endpoint + reconciler sweep) follows. What changes : - `models.go` — adds `StatusPendingPayment` to the SubscriptionStatus enum. Free-text VARCHAR(30) so no DDL needed for the value itself; Phase 2's reconciler index lives in migration 986 (additive, partial index on `created_at` WHERE status='pending_payment'). - `service.go` — `PaymentProvider.CreateSubscriptionPayment` interface gains an `idempotencyKey string` parameter, mirroring the marketplace.refundProvider contract added in v1.0.7 item D. Callers pass the new subscription row's UUID so a retried HTTP request collapses to one PSP charge instead of duplicating it. - `createNewSubscription` — refactored state machine : * Free plan → StatusActive (unchanged, in subscribeToFreePlan). * Paid plan, trial available, first-time user → StatusTrialing, no PSP call (no invoice either — Phase 2 will create the first paid invoice on trial expiry). * Paid plan, no trial / repeat user → StatusPendingPayment + invoice + PSP CreateSubscriptionPayment with idempotency key = subscription.ID.String(). Webhook subscription.payment_succeeded (Phase 2) flips to active; subscription.payment_failed flips to expired. - `if s.paymentProvider != nil` short-circuit removed. Paid plans now require a configured PaymentProvider — without one, `createNewSubscription` returns ErrPaymentProviderRequired. The handler maps this to HTTP 503 "Payment provider not configured — paid plans temporarily unavailable", surfacing env misconfig to ops instead of silently giving away paid plans (the v1.0.6.2 fantôme bug class). 
- `GetUserSubscription` query unchanged — already filters on `status IN ('active','trialing')`, so pending_payment rows correctly read as "no active subscription" for feature-gate purposes. The v1.0.6.2 hasEffectivePayment filter is kept as defence-in-depth for legacy rows. - `hyperswitch.Provider` — implements `subscription.PaymentProvider` by delegating to the existing `CreatePaymentSimple`. Compile-time interface assertion added (`var _ subscription.PaymentProvider = (*Provider)(nil)`). - `routes_subscription.go` — wires the Hyperswitch provider into `subscription.NewService` when HyperswitchEnabled + HyperswitchAPIKey + HyperswitchURL are all set. Without those, the service falls back to no-provider mode (paid subscribes return 503). - Tests : new TestSubscribe_PendingPaymentStateMachine in gate_test.go covers all five visible outcomes (free / paid+provider / paid+no-provider / first-trial / repeat-trial) with a fakePaymentProvider that records calls. Asserts on idempotency key = subscription.ID.String(), PSP call counts, and the Subscribe response shape (client_secret + payment_id surfaced). 5/5 green, sqlite :memory:. Phase 2 backlog (next session) : - `ProcessSubscriptionWebhook(ctx, payload)` — flip pending_payment → active on success / expired on failure, idempotent against replays. - Recovery endpoint `POST /api/v1/subscriptions/complete/:id` — return the existing client_secret to resume a stalled flow. - Reconciliation sweep for rows stuck in pending_payment past the webhook-arrival window (uses the new partial index from migration 986). - Distribution.checkEligibility explicit pending_payment branch (today it's already handled implicitly via the active/trialing filter). - E2E @critical : POST /subscribe → POST /distribution/submit asserts 403 with "complete payment" until webhook fires. Backward compat : clients on the previous flow that called /subscribe expecting an immediately-active row will now see status=pending_payment + a client_secret. 
They must drive the PSP confirm step before the row is granted feature access. The v1.0.6.2 voided_subscriptions cleanup migration (980) handles pre-existing fantôme rows. go build ./... clean. Subscription + handlers test suites green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:02:00 +02:00
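The create-flow branches described in the commit above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the repository's code: `subscribeInput`, `initialStatus`, `fakeProvider`, the simplified `CreateSubscriptionPayment` signature and the string values of the status constants are hypothetical, and checking for the provider before the trial branch is a guess about ordering that the message does not spell out.

```go
package main

import (
	"errors"
	"fmt"
)

type SubscriptionStatus string

// Status names follow the commit message; the string values are assumed.
const (
	StatusActive         SubscriptionStatus = "active"
	StatusTrialing       SubscriptionStatus = "trialing"
	StatusPendingPayment SubscriptionStatus = "pending_payment"
)

var ErrPaymentProviderRequired = errors.New("payment provider required for paid plans")

// PaymentProvider is a simplified, hypothetical stand-in for the real interface.
type PaymentProvider interface {
	CreateSubscriptionPayment(amountCents int64, idempotencyKey string) (clientSecret string, err error)
}

// subscribeInput is a hypothetical bundle of the facts the decision needs.
type subscribeInput struct {
	SubscriptionID string // UUID of the new subscription row, reused as idempotency key
	PriceCents     int64  // 0 means free plan (assumption)
	TrialAvailable bool
	FirstTimeUser  bool
}

// initialStatus picks the starting state and calls the PSP only on the
// pending_payment path, passing the subscription ID as idempotency key.
func initialStatus(in subscribeInput, provider PaymentProvider) (SubscriptionStatus, string, error) {
	if in.PriceCents == 0 {
		return StatusActive, "", nil // free plan: no PSP involved
	}
	if provider == nil {
		return "", "", ErrPaymentProviderRequired // handler would map this to HTTP 503
	}
	if in.TrialAvailable && in.FirstTimeUser {
		return StatusTrialing, "", nil // no PSP call, no invoice yet
	}
	secret, err := provider.CreateSubscriptionPayment(in.PriceCents, in.SubscriptionID)
	if err != nil {
		return "", "", err
	}
	return StatusPendingPayment, secret, nil
}

// fakeProvider records the idempotency keys it receives.
type fakeProvider struct{ keys []string }

func (f *fakeProvider) CreateSubscriptionPayment(amountCents int64, key string) (string, error) {
	f.keys = append(f.keys, key)
	return "secret_" + key, nil
}

func main() {
	fp := &fakeProvider{}
	status, secret, err := initialStatus(subscribeInput{SubscriptionID: "sub-1", PriceCents: 999}, fp)
	fmt.Println(status, secret, err, fp.keys)
}
```

Because the key is derived from the row ID rather than generated per request, a retried call for the same row hands the PSP the same key, which is what lets the provider collapse duplicates.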
senke	d03232c85c	feat(storage): add track storage_backend column + config prep (v1.0.8 P0) Phase 0 of the MinIO upload migration (FUNCTIONAL_AUDIT §4 item 2). Schema + config only — Phase 1 will wire TrackService.UploadTrack() to actually route writes to S3 when the flag is flipped. Schema (migration 985): - tracks.storage_backend VARCHAR(16) NOT NULL DEFAULT 'local' CHECK in ('local', 's3') - tracks.storage_key VARCHAR(512) NULL (S3 object key when backend=s3) - Partial index on storage_backend = 's3' (migration progress queries) - Rollback drops both columns + index; safe only while all rows are still 'local' (guard query in the rollback comment) Go model (internal/models/track.go): - StorageBackend string (default 'local', not null) - StorageKey *string (nullable) - Both tagged json:"-" — internal plumbing, never exposed publicly Config (internal/config/config.go): - New field Config.TrackStorageBackend - Read from TRACK_STORAGE_BACKEND env var (default 'local') - Production validation rule #11 (ValidateForEnvironment): - Must be 'local' or 's3' (reject typos like 'S3' or 'minio') - If 's3', requires AWS_S3_ENABLED=true (fail fast, do not boot with TrackStorageBackend=s3 while S3StorageService is nil) - Dev/staging warns and falls back to 'local' instead of fail — keeps iteration fast while still flagging misconfig. 
Docs: - docs/ENV_VARIABLES.md §13 restructured as "HLS + track storage backend" with a migration playbook (local → s3 → migrate-storage CLI) - docs/ENV_VARIABLES.md §28 validation rules: +2 entries for new rules - docs/ENV_VARIABLES.md §29 drift findings: TRACK_STORAGE_BACKEND added to "missing from template" list before it was fixed - veza-backend-api/.env.template: TRACK_STORAGE_BACKEND=local with comment pointing at Phase 1/2/3 plans No behavior change yet — TrackService.UploadTrack() still hardcodes the local path via copyFileAsync(). Phase 1 wires it. Refs: - AUDIT_REPORT.md §9 item (deferrals v1.0.8) - FUNCTIONAL_AUDIT.md §4 item 2 "Stockage local disque only" - /home/senke/.claude/plans/audit-fonctionnel-wild-hickey.md Item 3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:54:28 +02:00
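The validation rule for `TRACK_STORAGE_BACKEND` described above might look roughly like the following. It is a sketch only: `resolveTrackStorageBackend` and the environment name `"production"` are hypothetical, and the warning is printed to stdout here where the real code would presumably use the project's logger.

```go
package main

import "fmt"

// resolveTrackStorageBackend applies the rules from the commit message:
// only "local" or "s3" are accepted (case-sensitive), "s3" requires S3 to be
// enabled, production fails fast, other environments warn and use "local".
func resolveTrackStorageBackend(env, raw string, s3Enabled bool) (string, error) {
	if raw == "" {
		raw = "local" // documented default
	}
	var problem error
	switch raw {
	case "local":
		// always valid
	case "s3":
		if !s3Enabled {
			problem = fmt.Errorf("TRACK_STORAGE_BACKEND=s3 requires AWS_S3_ENABLED=true")
		}
	default:
		problem = fmt.Errorf("TRACK_STORAGE_BACKEND must be 'local' or 's3', got %q", raw)
	}
	if problem == nil {
		return raw, nil
	}
	if env == "production" { // assumed environment name
		return "", problem
	}
	fmt.Printf("warning: %v; falling back to 'local'\n", problem)
	return "local", nil
}

func main() {
	for _, raw := range []string{"local", "s3", "S3", "minio"} {
		backend, err := resolveTrackStorageBackend("production", raw, false)
		fmt.Printf("raw=%q backend=%q err=%v\n", raw, backend, err)
	}
}
```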
senke	3c4d0148be	feat(webhooks): persist raw hyperswitch payloads to audit log — v1.0.7 item E Every POST /webhooks/hyperswitch delivery now writes a row to `hyperswitch_webhook_log` regardless of signature-valid or processing outcome. Captures both legitimate deliveries and attack probes — a forensics query now has the actual bytes to read, not just a "webhook rejected" log line. Disputes (axis-1 P1.6) ride along: the log captures dispute.* events alongside payment and refund events, ready for when disputes get a handler. Table shape (migration 984): * payload TEXT — readable in psql, invalid UTF-8 replaced with empty (forensics value is in headers + ip + timing for those attacks, not the binary body). * signature_valid BOOLEAN + partial index for "show me attack attempts" being instantaneous. * processing_result TEXT — 'ok' / 'error: <msg>' / 'signature_invalid' / 'skipped'. Matches the P1.5 action semantic exactly. * source_ip, user_agent, request_id — forensics essentials. request_id is captured from Hyperswitch's X-Request-Id header when present, else a server-side UUID so every row correlates to VEZA's structured logs. * event_type — best-effort extract from the JSON payload, NULL on malformed input. Hardening: * 64KB body cap via io.LimitReader rejects oversize with 413 before any INSERT — prevents log-spam DoS. * Single INSERT per delivery with final state; no two-phase update race on signature-failure path. signature_invalid and processing-error rows both land. * DB persistence failures are logged but swallowed — the endpoint's contract is to ack Hyperswitch, not perfect audit. Retention sweep: * CleanupHyperswitchWebhookLog in internal/jobs, daily tick, batched DELETE (10k rows + 100ms pause) so a large backlog doesn't lock the table. * HYPERSWITCH_WEBHOOK_LOG_RETENTION_DAYS (default 90). * Same goroutine-ticker pattern as ScheduleOrphanTracksCleanup. * Wired in cmd/api/main.go alongside the existing cleanup jobs. 
Tests: 5 in webhook_log_test.go (persistence, request_id auto-gen, invalid-JSON leaves event_type empty, invalid-signature capture, extractEventType 5 sub-cases) + 4 in cleanup_hyperswitch_webhook_log_test.go (deletes-older-than, noop, default-on-zero, context-cancel). Migration 984 applied cleanly to local Postgres; all indexes present. Also (v107-plan.md): * Item G acceptance gains an explicit Idempotency-Key threading requirement with an empty-key loud-fail test — "literally copy-paste D's 4-line test skeleton". Closes the risk that item G silently reopens the HTTP-retry duplicate-charge exposure D closed. Out of scope for E (noted in CHANGELOG): * Rate limit on the endpoint — pre-existing middleware covers it at the router level; adding a per-endpoint limit is separate scope. * Readable-payload SQL view — deferred, the TEXT column is already human-readable; a convenience view is a nice-to-have not a ship-blocker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 02:44:58 +02:00
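The 64KB body cap mentioned in the hardening notes can be illustrated with `io.LimitReader`. The sketch below reads one byte past the cap so an oversize body can be told apart from one that is exactly at the limit. `readCappedBody`, `bodyOfSize` and `errBodyTooLarge` are hypothetical names, and a real handler would translate the error into HTTP 413 before any INSERT.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

const maxWebhookBody = 64 * 1024 // 64KB cap from the commit message

var errBodyTooLarge = errors.New("webhook body exceeds 64KB cap")

// readCappedBody reads at most maxWebhookBody+1 bytes. Seeing the extra byte
// means the sender exceeded the cap, so the body is rejected rather than
// silently truncated.
func readCappedBody(r io.Reader) ([]byte, error) {
	body, err := io.ReadAll(io.LimitReader(r, maxWebhookBody+1))
	if err != nil {
		return nil, err
	}
	if len(body) > maxWebhookBody {
		return nil, errBodyTooLarge
	}
	return body, nil
}

// bodyOfSize builds a dummy request body of n bytes for demonstration.
func bodyOfSize(n int) io.Reader {
	return strings.NewReader(strings.Repeat("x", n))
}

func main() {
	for _, n := range []int{1024, maxWebhookBody, maxWebhookBody + 1} {
		body, err := readCappedBody(bodyOfSize(n))
		fmt.Printf("size=%d accepted=%d err=%v\n", n, len(body), err)
	}
}
```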
senke	1a133af9ac	feat(marketplace): stripe reversal error disambiguation + CHECK constraint + E2E — v1.0.7 item B day 3 Day-3 closure of item B. The three things day 2 deferred are now done: 1. Stripe error disambiguation. ReverseTransfer in StripeConnectService now parses stripe.Error.Code + HTTPStatusCode + Msg to emit the sentinels the worker routes on. Pre-day-3 the sentinels were declared but the service wrapped every error opaquely, making this the exact "temporary compromise frozen into permanent" pattern the audit was meant to prevent — flagged during review and fixed same day. Mapping: * 404 + code=resource_missing → ErrTransferNotFound * 400 + msg matches "already" + "reverse" → ErrTransferAlreadyReversed * any other → transient (wrapped raw, retry) The "already reversed" case has no machine-readable code in stripe-go (unlike ChargeAlreadyRefunded for charges — the SDK doesn't enumerate the equivalent for transfers), so it's message-parsed. Fragility documented at the call site: if Stripe changes the wording, the worker treats the response as transient and eventually surfaces the row to permanently_failed after max retries. Worst-case regression is "benign case gets noisier", not data loss. 2. Migration 983: CHECK constraint chk_reversal_pending_has_next_retry_at CHECK (status != 'reversal_pending' OR next_retry_at IS NOT NULL). Added NOT VALID so the constraint is enforced on new writes without scanning existing rows; a follow-up VALIDATE can run once the table is known to be clean. Prevents the "invisible orphan" failure mode where a reversal_pending row with NULL next_retry_at would be skipped by any future stricter worker query. 3. 
End-to-end reversal flow test (reversal_e2e_test.go) chains three sub-scenarios: (a) happy path — refund.succeeded → reversal_pending → worker → reversed with stripe_reversal_id persisted; (b) invalid stripe_transfer_id → worker terminates rapidly to permanently_failed with single Stripe call, no retries (the highest-value coverage per day-3 review); (c) already-reversed out-of-band → worker flips to reversed with informative message. Architecture note — the sentinels were moved to a new leaf package `internal/core/connecterrors` because both marketplace (needs them for the worker's errors.Is checks) and services (needs them to emit) import them, and an import cycle (marketplace → monitoring → services) would form if either owned them directly. marketplace re-exports them as type aliases so the worker code reads naturally against the marketplace namespace. New tests: * services/stripe_connect_service_test.go — 7 cases on isAlreadyReversedMessage (pins Stripe's wording), 1 case on the error-classification shape. Doesn't invoke stripe.SetBackend — the translation logic is tested via a crafted stripe.Error, the emission is trusted on the read of `errors.As` + the known shape of stripe.Error. * marketplace/reversal_e2e_test.go — 3 end-to-end sub-tests chaining refund → worker against a dual-role mock. The invalid-id case asserts single-call-no-retries termination. * Migration 983 applied cleanly to the local Postgres; constraint visible in \d seller_transfers as NOT VALID (behavior correct for future writes, existing rows grandfathered). Self-assessment on day-2's struct-literal refactor of processSellerTransfers (deferred from day 2): The refactor is borderline — neither clearer nor more confusing than the original mutation-after-construct pattern. Logged in the v1.0.7-rc1 CHANGELOG as a post-v1.0.7 consideration: if GORM BeforeUpdate hooks prove cleaner on other state machines (axis 2), revisit the anti-mutation test approach. 
CHANGELOG v1.0.7-rc1 entry added documenting items A + B end-to-end. Tag not yet applied — items C, D, E, F remain on the v1.0.7 plan. The rc1 tag lands when those four items close + the smoke probe validates the full cadence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 02:12:03 +02:00
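The error mapping listed in the day-3 commit (404 + `resource_missing`, 400 + message match, anything else transient) can be sketched without the Stripe SDK. Here `providerError` is a hypothetical stand-in for the fields the message says are read from `stripe.Error`, and `classifyReversalError` is an illustrative name; only `isAlreadyReversedMessage` and the two sentinel names come from the commit text, and their bodies below are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	ErrTransferNotFound        = errors.New("transfer not found")
	ErrTransferAlreadyReversed = errors.New("transfer already reversed")
)

// providerError is a hypothetical stand-in carrying status, code and message.
type providerError struct {
	HTTPStatus int
	Code       string
	Msg        string
}

func (e *providerError) Error() string {
	return fmt.Sprintf("provider error %d (%s): %s", e.HTTPStatus, e.Code, e.Msg)
}

// isAlreadyReversedMessage matches on wording because, per the commit message,
// no machine-readable code exists for this case. The exact rule is assumed.
func isAlreadyReversedMessage(msg string) bool {
	lower := strings.ToLower(msg)
	return strings.Contains(lower, "already") && strings.Contains(lower, "reverse")
}

// classifyReversalError maps provider errors to sentinels; everything else is
// wrapped and treated as transient so the worker retries.
func classifyReversalError(err error) error {
	if err == nil {
		return nil
	}
	var pe *providerError
	if errors.As(err, &pe) {
		switch {
		case pe.HTTPStatus == 404 && pe.Code == "resource_missing":
			return ErrTransferNotFound
		case pe.HTTPStatus == 400 && isAlreadyReversedMessage(pe.Msg):
			return ErrTransferAlreadyReversed
		}
	}
	return fmt.Errorf("transient reversal error: %w", err)
}

func main() {
	errs := []error{
		&providerError{HTTPStatus: 404, Code: "resource_missing", Msg: "No such transfer"},
		&providerError{HTTPStatus: 400, Msg: "Transfer has already been fully reversed"},
		&providerError{HTTPStatus: 500, Msg: "upstream timeout"},
	}
	for _, e := range errs {
		fmt.Println(classifyReversalError(e))
	}
}
```

If the provider changed its wording, the second case would fall through to the transient branch, which matches the "benign case gets noisier" failure mode the commit describes.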
senke	8d6f798f2d	feat(marketplace): seller transfer state machine matrix — v1.0.7 item B day 1 Day-1 foundation for item B (async Stripe Connect reversal worker). No worker code, no runtime enforcement yet — just the authoritative state machine that day 2's code will route through. Before writing the worker we want a single place where the legal transitions are defined and tested, so the worker's behavior can be argued against the matrix rather than implicitly codified across call sites. transfer_transitions.go: * SellerTransferStatus constants (Pending, Completed, Failed, ReversalPending [new], Reversed [new], PermanentlyFailed). * AllowedTransferTransitions map: pending → {completed, failed}; completed → {reversal_pending}; failed → {completed, permanently_failed}; reversal_pending → {reversed, permanently_failed}; reversed and permanently_failed as dead ends. * CanTransitionTransferStatus(from, to) — same-state always OK (idempotent bumps of retry_count / next_retry_at); unknown from fails conservatively (typos in call sites become visible). transfer_transitions_test.go: * TestTransferStateTransitions iterates the full 6×6 matrix (36 pairs) and asserts every pair against the expected outcome. * TestTransferStateTransitions_TerminalStatesHaveNoOutgoing double-locks Reversed + PermanentlyFailed as dead ends at the map level (not just at the caller level). * TestTransferStateTransitions_MatrixKeysAreAccountedFor keeps the canonical status list in sync with the map; a new status added to one but not the other fails the test. * TestCanTransitionTransferStatus_UnknownFromIsConservative documents the "unknown from → always false" policy so a future reader sees the intent. Migration 982 adds a partial composite index on (status, next_retry_at) WHERE status='reversal_pending', sibling to the existing idx_seller_transfers_retry (scoped to failed). 
Two parallel partial indexes cost less than widening the existing one (which would need a table-level lock) and keep the worker query planner-friendly. Day 2 routes processSellerTransfers, TransferRetryWorker, reverseSellerAccounting, admin_transfer_handler through CanTransitionTransferStatus at every Status mutation, and writes StripeReversalWorker. Day 3 exercises the end-to-end flow (refund → reversal_pending → worker → reversed) in a smoke probe. Checkpoint: ping user at end of day 1 before day 2 per discipline agreed upfront. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 14:13:02 +02:00
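The transition matrix spelled out in the day-1 commit translates almost directly into a map plus a lookup. The following is a minimal sketch built from the transitions listed in the message; the string values of the constants and the choice to check "unknown from" before the same-state shortcut are assumptions, not confirmed details of `transfer_transitions.go`.

```go
package main

import "fmt"

type SellerTransferStatus string

// Constant names follow the commit message; string values are assumed.
const (
	Pending           SellerTransferStatus = "pending"
	Completed         SellerTransferStatus = "completed"
	Failed            SellerTransferStatus = "failed"
	ReversalPending   SellerTransferStatus = "reversal_pending"
	Reversed          SellerTransferStatus = "reversed"
	PermanentlyFailed SellerTransferStatus = "permanently_failed"
)

// AllowedTransferTransitions lists the legal outgoing edges per state.
// Reversed and PermanentlyFailed are dead ends.
var AllowedTransferTransitions = map[SellerTransferStatus][]SellerTransferStatus{
	Pending:           {Completed, Failed},
	Completed:         {ReversalPending},
	Failed:            {Completed, PermanentlyFailed},
	ReversalPending:   {Reversed, PermanentlyFailed},
	Reversed:          {},
	PermanentlyFailed: {},
}

// CanTransitionTransferStatus reports whether from -> to is legal.
func CanTransitionTransferStatus(from, to SellerTransferStatus) bool {
	allowed, known := AllowedTransferTransitions[from]
	if !known {
		return false // unknown source state: fail conservatively
	}
	if from == to {
		return true // idempotent bump (retry_count, next_retry_at)
	}
	for _, s := range allowed {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransitionTransferStatus(Completed, ReversalPending))
	fmt.Println(CanTransitionTransferStatus(Reversed, Pending))
	fmt.Println(CanTransitionTransferStatus("reversl_pending", Reversed))
}
```

Keeping the terminal states as explicit empty entries, rather than omitting them, lets a same-state write on a terminal row pass while a typo in a call site still fails.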
senke	eedaad9f83	refactor(connect): persist stripe_transfer_id on create + retry — v1.0.7 item A TransferService.CreateTransfer signature changes from (...) error to (...) (string, error) — the caller now captures the Stripe transfer identifier and persists it on the SellerTransfer row. Pre-v1.0.7 the stripe_transfer_id column was declared on the model and table but never written to, which blocked the reversal worker (v1.0.7 item B) from identifying which transfer to reverse on refund. Changes: * `TransferService` interface and `StripeConnectService.CreateTransfer` both return the Stripe transfer id alongside the error. * `processSellerTransfers` (marketplace service) persists the id on success before `tx.Create(&st)` so a crash between Stripe ACK and DB commit leaves no inconsistency. * `TransferRetryWorker.retryOne` persists on retry success — a row that failed on first attempt and succeeded via the worker is reversal-ready all the same. * `admin_transfer_handler.RetryTransfer` (manual retry) persists too. * `SellerPayout.ExternalPayoutID` is populated by the Connect payout flow (`payout.go`) — the field existed but was never written. * Four test mocks updated; two tests assert the id is persisted on the happy path, one on the failure path confirms we don't write a fake id when the provider errors. Migration `981_seller_transfers_stripe_reversal_id.sql`: * Adds nullable `stripe_reversal_id` column for item B. * Partial UNIQUE indexes on both stripe_transfer_id and stripe_reversal_id (WHERE IS NOT NULL AND <> ''), mirroring the v1.0.6.1 pattern for refunds.hyperswitch_refund_id. * Logs a count of historical completed transfers that lack an id — these are candidates for the backfill CLI follow-up task. Backfill for historical rows is a separate follow-up (cmd/tools/backfill_stripe_transfer_ids, calling Stripe's transfers.List with Destination + Metadata[order_id]). 
Pre-v1.0.7 transfers without a backfilled id cannot be auto-reversed on refund — document in P2.9 admin-recovery when it lands. Acceptable scope per v107-plan. Migration number bumped 980 → 981 because v1.0.6.2 used 980 for the unpaid-subscription cleanup; v107-plan updated with the note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 13:08:39 +02:00
senke	9a8d2a4e73	chore(release): v1.0.6.2 — subscription payment-gate bypass hotfix Closes a bypass surfaced by the 2026-04 audit probe (axis-1 Q2): any authenticated user could POST /api/v1/subscriptions/subscribe on a paid plan and receive 201 active without the payment provider ever being invoked. The resulting row satisfied `checkEligibility()` in the distribution service via `can_sell_on_marketplace=true` on the Creator plan — effectively free access to /api/v1/distribution/submit, which dispatches to external partners. Fix is centralised in `GetUserSubscription` so there is no code path that can grant subscription-gated access without routing through the payment check. Effective-payment = free plan OR unexpired trial OR invoice with non-empty hyperswitch_payment_id. Migration 980 sweeps pre-existing fantôme rows into `expired`, preserving the tuple in a dated audit table for support outreach. Subscribe and subscribeToFreePlan treat the new ErrSubscriptionNoPayment as equivalent to ErrNoActiveSubscription so re-subscription works cleanly post-cleanup. GET /me/subscription surfaces needs_payment=true with a support-contact message rather than a misleading "you're on free" or an opaque 500. TODO(v1.0.7-item-G) annotation marks where the `if s.paymentProvider != nil` short-circuit needs to become a mandatory pending_payment state. Probe script `scripts/probes/subscription-unpaid-activation.sh` kept as a versioned regression test — dry-run by default, --destructive logs in and attempts the exploit against a live backend with automatic cleanup. 8-case unit test matrix covers the full hasEffectivePayment predicate. Smoke validated end-to-end against local v1.0.6.2: POST /subscribe returns 201 (by design — item G closes the creation path), but GET /me/subscription returns subscription=null + needs_payment=true, distribution eligibility returns false. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 12:21:53 +02:00
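The effective-payment rule stated in the hotfix commit (free plan OR unexpired trial OR invoice with a non-empty `hyperswitch_payment_id`) reduces to a short predicate. The sketch below assumes hypothetical `subscriptionView` and `invoice` types and a fixed `epoch` reference time for the demo; only the name `hasEffectivePayment` and the three conditions come from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// invoice and subscriptionView are hypothetical, trimmed-down shapes.
type invoice struct{ HyperswitchPaymentID string }

type subscriptionView struct {
	IsFreePlan  bool
	TrialEndsAt *time.Time // nil when the subscription has no trial
	Invoices    []invoice
}

// epoch is an arbitrary fixed "now" so the example is deterministic.
var epoch = time.Date(2026, time.April, 17, 0, 0, 0, 0, time.UTC)

// hasEffectivePayment returns true when any of the three conditions holds.
func hasEffectivePayment(s subscriptionView, now time.Time) bool {
	if s.IsFreePlan {
		return true
	}
	if s.TrialEndsAt != nil && now.Before(*s.TrialEndsAt) {
		return true // trial still running
	}
	for _, inv := range s.Invoices {
		if inv.HyperswitchPaymentID != "" {
			return true // a PSP payment is attached to an invoice
		}
	}
	return false
}

func main() {
	trialEnd := epoch.AddDate(0, 0, 7)
	fmt.Println(hasEffectivePayment(subscriptionView{IsFreePlan: true}, epoch))
	fmt.Println(hasEffectivePayment(subscriptionView{TrialEndsAt: &trialEnd}, epoch))
	fmt.Println(hasEffectivePayment(subscriptionView{Invoices: []invoice{{}}}, epoch))
}
```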
senke	f047276362	chore: cleanup old e2e tests, playwright configs, reorganize down migrations - Remove old apps/web/e2e/ test suite (replaced by tests/e2e/) - Remove old playwright configs (smoke, storybook, visual, root) - Move down migrations to veza-backend-api/migrations/rollback/ - Remove stale test results and playwright report artifacts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 11:35:26 +01:00

8 commits