Commit graph

46 commits

Author SHA1 Message Date
senke
8fb07c0df8 chore: release v1.0.7
Promote v1.0.7-rc1 to final after the 2026-04-23 cleanup session:
- BFG history rewrite (2.3G → 66M, −97%)
- Marketplace transactions (b5281bec)
- UserRateLimiter wired (ebf3276d)
- 3 deprecated handlers + repository orphan + chat proto removed
- 19 disabled workflows archived
- ENV_VARIABLES.md canonicalized + HLS_STREAMING in template
- AUDIT_REPORT/FUNCTIONAL_AUDIT reconciled (10 done, 3 false-positives,
  2 deferrals v1.0.8)

VERSION: 1.0.7-rc1 → 1.0.7
CHANGELOG: full v1.0.7 entry above v1.0.7-rc1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:38:22 +02:00
senke
6d51f52aae chore: release v1.0.7-rc1
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 00:57:17 +02:00
senke
94dfc80b73 feat(metrics): ledger-health gauges + alert rules — v1.0.7 item F
Five Prometheus gauges + reconciler metrics + Grafana dashboard +
three alert rules. Closes axis-1 P1.8 and adds observability for
item C's reconciler (user review: "F should include reconciler_*
metrics, otherwise tag is blind on the worker we just shipped").

Gauges (veza_ledger_, sampled every 60s):
  * orphan_refund_rows — THE canary. Pending refunds with empty
    hyperswitch_refund_id older than 5m = Phase 2 crash in
    RefundOrder. Alert: > 0 for 5m → page.
  * stuck_orders_pending — order pending > 30m with non-empty
    payment_id. Alert: > 0 for 10m → page.
  * stuck_refunds_pending — refund pending > 30m with hs_id.
  * failed_transfers_at_max_retry — permanently_failed rows.
  * reversal_pending_transfers — item B rows stuck > 30m.

Reconciler metrics (veza_reconciler_):
  * actions_total{phase} — counter by phase.
  * orphan_refunds_total — two-phase-bug canary.
  * sweep_duration_seconds — exponential histogram.
  * last_run_timestamp — alert: stale > 2h → page (worker dead).

Implementation notes:
  * Sampler thresholds are hardcoded to the reconciler's default
    values. A mismatch with a retuned reconciler is tolerated by
    design: alerts firing while the reconciler is already working
    is correct behavior.
  * Query error sets gauge to -1 (sentinel for "sampler broken").
  * marketplace package routes through monitoring recorders so it
    doesn't import prometheus directly.
  * Sampler runs regardless of Hyperswitch enablement; gauges
    default 0 when pipeline idle.
  * Graceful shutdown wired in cmd/api/main.go.
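
The -1 sentinel noted above can be sketched as follows. This is a minimal
illustration, not the sampler's actual code: countFunc, okQuery, brokenQuery
and errQueryFailed are hypothetical names standing in for the real COUNT
queries.

```go
package main

import (
	"errors"
	"fmt"
)

// countFunc stands in for one of the sampler's COUNT queries (hypothetical type).
type countFunc func() (int64, error)

// errQueryFailed is an illustrative error used to simulate a broken query.
var errQueryFailed = errors.New("query failed")

// sampleGauge returns the value to publish on a gauge: the row count on
// success, or -1 as the "sampler broken" sentinel when the query fails.
func sampleGauge(query countFunc) float64 {
	n, err := query()
	if err != nil {
		return -1
	}
	return float64(n)
}

// okQuery simulates a healthy query that counts n rows.
func okQuery(n int64) countFunc {
	return func() (int64, error) { return n, nil }
}

// brokenQuery simulates a query that always fails.
func brokenQuery() countFunc {
	return func() (int64, error) { return 0, errQueryFailed }
}

func main() {
	fmt.Println(sampleGauge(okQuery(3)), sampleGauge(brokenQuery()))
}
```

A negative value can never come from a COUNT, so a dashboard or alert can
tell "zero stuck rows" apart from "the sampler could not read the table".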

Alert rules in config/alertmanager/ledger.yml with runbook
pointers + detailed descriptions — each alert explains WHAT
happened, WHY the reconciler may not resolve it, and WHERE to
look first.

Grafana dashboard config/grafana/dashboards/ledger-health.json —
top row = 5 stat panels (orphan first, color-coded red on > 0),
middle row = trend timeseries + reconciler action rate by phase,
bottom row = sweep duration p50/p95/p99 + seconds-since-last-tick
+ orphan cumulative.

Tests — 6 cases, all green (sqlite :memory:):
  * CountsStuckOrdersPending (includes the filter on
    non-empty payment_id)
  * StuckOrdersZeroWhenAllCompleted
  * CountsOrphanRefunds (THE canary)
  * CountsStuckRefundsWithHsID (gauge-orthogonality check)
  * CountsFailedAndReversalPendingTransfers
  * ReconcilerRecorders (counter + gauge shape)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 03:40:14 +02:00
senke
7e180a2c08 feat(workers): hyperswitch reconciliation sweep for stuck pending states — v1.0.7 item C
New ReconcileHyperswitchWorker sweeps for pending orders and refunds
whose terminal webhook never arrived. Pulls live PSP state for each
stuck row and synthesises a webhook payload to feed the normal
ProcessPaymentWebhook / ProcessRefundWebhook dispatcher. The existing
terminal-state guards on those handlers make reconciliation
idempotent against real webhooks — a late webhook after the reconciler
resolved the row is a no-op.

Three stuck-state classes covered:
  1. Stuck orders (pending > 30m, non-empty payment_id) → GetPaymentStatus
     + synthetic payment.<status> webhook.
  2. Stuck refunds with PSP id (pending > 30m, non-empty
     hyperswitch_refund_id) → GetRefundStatus + synthetic
     refund.<status> webhook (error_message forwarded).
  3. Orphan refunds (pending > 5m, EMPTY hyperswitch_refund_id) →
     mark failed + roll order back to completed + log ERROR. This
     is the "we crashed between Phase 1 and Phase 2 of RefundOrder"
     case, operator-attention territory.
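
The refund side of the classification above can be sketched as a pure
function. This is a sketch under stated assumptions: the type and function
names are hypothetical, and the thresholds are hardcoded to the defaults
quoted in this message rather than read from the RECONCILE_* variables.

```go
package main

import (
	"fmt"
	"time"
)

// refundClass is a hypothetical enum for the classes described above.
type refundClass int

const (
	refundNotStuck       refundClass = iota // leave the row alone
	refundStuckWithPSPID                    // fetch PSP status, feed a synthetic webhook
	refundOrphan                            // mark failed, roll the order back, log ERROR
)

// Defaults quoted in the commit message.
const (
	refundStuckAfter  = 30 * time.Minute
	refundOrphanAfter = 5 * time.Minute
)

// classifyPendingRefund decides what the sweep should do with a pending
// refund, given its age and its (possibly empty) hyperswitch_refund_id.
func classifyPendingRefund(age time.Duration, hyperswitchRefundID string) refundClass {
	switch {
	case hyperswitchRefundID == "" && age > refundOrphanAfter:
		return refundOrphan
	case hyperswitchRefundID != "" && age > refundStuckAfter:
		return refundStuckWithPSPID
	default:
		return refundNotStuck
	}
}

func main() {
	fmt.Println(classifyPendingRefund(10*time.Minute, ""))      // orphan candidate
	fmt.Println(classifyPendingRefund(10*time.Minute, "ref_1")) // too recent to touch
	fmt.Println(classifyPendingRefund(45*time.Minute, "ref_1")) // stuck with PSP id
}
```

Keeping the decision free of I/O makes the "recent row is not touched"
cases cheap to test without a database.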

New interfaces:
  * marketplace.HyperswitchReadClient — read-only PSP surface the
    worker depends on (GetPaymentStatus, GetRefundStatus). The
    worker never calls CreatePayment / CreateRefund.
  * hyperswitch.Client.GetRefund + RefundStatus struct added.
  * hyperswitch.Provider gains GetRefundStatus + GetPaymentStatus
    pass-throughs that satisfy the marketplace interface.

Configuration (all env-var tunable with sensible defaults):
  * RECONCILE_WORKER_ENABLED=true
  * RECONCILE_INTERVAL=1h (ops can drop to 5m during incident
    response without a code change)
  * RECONCILE_ORDER_STUCK_AFTER=30m
  * RECONCILE_REFUND_STUCK_AFTER=30m
  * RECONCILE_REFUND_ORPHAN_AFTER=5m (shorter because "app crashed"
    is a different signal from "network hiccup")

Operational details:
  * Batch limit 50 rows per phase per tick so a 10k-row backlog
    doesn't hammer Hyperswitch. Next tick picks up the rest.
  * PSP read errors leave the row untouched — next tick retries.
    Reconciliation is always safe to replay.
  * Structured log on every action so `grep reconcile` tells the
    ops story: which order/refund got synced, against what status,
    how long it was stuck.
  * Worker wired in cmd/api/main.go, gated on
    HyperswitchEnabled + HyperswitchAPIKey. Graceful shutdown
    registered.
  * RunOnce exposed as public API for ad-hoc ops trigger during
    incident response.

Tests — 10 cases, all green (sqlite :memory:):
  * TestReconcile_StuckOrder_SyncsViaSyntheticWebhook
  * TestReconcile_RecentOrder_NotTouched
  * TestReconcile_CompletedOrder_NotTouched
  * TestReconcile_OrderWithEmptyPaymentID_NotTouched
  * TestReconcile_PSPReadErrorLeavesRowIntact
  * TestReconcile_OrphanRefund_AutoFails_OrderRollsBack
  * TestReconcile_RecentOrphanRefund_NotTouched
  * TestReconcile_StuckRefund_SyncsViaSyntheticWebhook
  * TestReconcile_StuckRefund_FailureStatus_PassesErrorMessage
  * TestReconcile_AllTerminalStates_NoOp

CHANGELOG v1.0.7-rc1 updated with the full item C section between D
and the existing E block, matching the order convention (ship order:
A → D → B → E → C, CHANGELOG order follows).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 03:08:15 +02:00
senke
3c4d0148be feat(webhooks): persist raw hyperswitch payloads to audit log — v1.0.7 item E
Every POST /webhooks/hyperswitch delivery now writes a row to
`hyperswitch_webhook_log` regardless of signature-valid or
processing outcome. Captures both legitimate deliveries and attack
probes — a forensics query now has the actual bytes to read, not
just a "webhook rejected" log line. Disputes (axis-1 P1.6) ride
along: the log captures dispute.* events alongside payment and
refund events, ready for when disputes get a handler.

Table shape (migration 984):
  * payload TEXT — readable in psql, invalid UTF-8 replaced with
    empty (forensics value is in headers + ip + timing for those
    attacks, not the binary body).
  * signature_valid BOOLEAN + partial index so that a "show me
    attack attempts" query is instantaneous.
  * processing_result TEXT — 'ok' / 'error: <msg>' /
    'signature_invalid' / 'skipped'. Matches the P1.5 action
    semantic exactly.
  * source_ip, user_agent, request_id — forensics essentials.
    request_id is captured from Hyperswitch's X-Request-Id header
    when present, else a server-side UUID so every row correlates
    to VEZA's structured logs.
  * event_type — best-effort extract from the JSON payload, NULL
    on malformed input.

Hardening:
  * 64KB body cap via io.LimitReader rejects oversize with 413
    before any INSERT — prevents log-spam DoS.
  * Single INSERT per delivery with final state; no two-phase
    update race on signature-failure path. signature_invalid and
    processing-error rows both land.
  * DB persistence failures are logged but swallowed — the
    endpoint's contract is to ack Hyperswitch, not perfect audit.
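
The body cap can be sketched with the standard library alone. This is an
illustration of the technique, not the handler itself: readCappedBody,
webhookHandler and statusFor are hypothetical names, and the comment in the
default branch marks where signature checking and the audit INSERT would go.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

const maxWebhookBody = 64 << 10 // 64KB cap quoted in the commit message

// readCappedBody reads at most maxWebhookBody bytes. Asking the LimitReader
// for one extra byte distinguishes "exactly at the cap" from "over the cap".
func readCappedBody(r io.Reader) (body []byte, tooLarge bool, err error) {
	buf, err := io.ReadAll(io.LimitReader(r, maxWebhookBody+1))
	if err != nil {
		return nil, false, err
	}
	if len(buf) > maxWebhookBody {
		return nil, true, nil
	}
	return buf, false, nil
}

// webhookHandler rejects oversize bodies with 413 before doing anything else.
func webhookHandler(w http.ResponseWriter, r *http.Request) {
	body, tooLarge, err := readCappedBody(r.Body)
	switch {
	case err != nil:
		http.Error(w, "read error", http.StatusBadRequest)
	case tooLarge:
		http.Error(w, "payload too large", http.StatusRequestEntityTooLarge)
	default:
		_ = body // signature check, processing and the audit INSERT would go here
		w.WriteHeader(http.StatusOK)
	}
}

// statusFor posts a body of n bytes to the handler and returns the status code.
func statusFor(n int) int {
	payload := strings.NewReader(strings.Repeat("a", n))
	req := httptest.NewRequest(http.MethodPost, "/webhooks/hyperswitch", payload)
	rec := httptest.NewRecorder()
	webhookHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(maxWebhookBody), statusFor(maxWebhookBody+1))
}
```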

Retention sweep:
  * CleanupHyperswitchWebhookLog in internal/jobs, daily tick,
    batched DELETE (10k rows + 100ms pause) so a large backlog
    doesn't lock the table.
  * HYPERSWITCH_WEBHOOK_LOG_RETENTION_DAYS (default 90).
  * Same goroutine-ticker pattern as ScheduleOrphanTracksCleanup.
  * Wired in cmd/api/main.go alongside the existing cleanup jobs.
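
The batched-delete loop can be sketched independently of the database. In
this sketch deleteBatchFunc, sweepInBatches, fakeTable and sweepFake are
hypothetical names; the real job would issue a bounded SQL DELETE where the
fake decrements a counter.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// deleteBatchFunc deletes up to limit expired rows and reports how many it
// removed. In the real job this would be a bounded SQL DELETE (assumption).
type deleteBatchFunc func(ctx context.Context, limit int) (int, error)

var errBadBatchSize = errors.New("batch size must be positive")

// sweepInBatches keeps deleting until a batch comes back short, pausing
// between batches so a large backlog does not monopolise the table.
func sweepInBatches(ctx context.Context, del deleteBatchFunc, batchSize int, pause time.Duration) (int, error) {
	if batchSize <= 0 {
		return 0, errBadBatchSize
	}
	total := 0
	for {
		n, err := del(ctx, batchSize)
		total += n
		if err != nil {
			return total, err
		}
		if n < batchSize {
			return total, nil // backlog drained
		}
		select {
		case <-ctx.Done():
			return total, ctx.Err()
		case <-time.After(pause):
		}
	}
}

// fakeTable simulates a table holding the given number of expired rows.
func fakeTable(rows int) deleteBatchFunc {
	return func(_ context.Context, limit int) (int, error) {
		n := limit
		if rows < n {
			n = rows
		}
		rows -= n
		return n, nil
	}
}

// sweepFake runs a sweep against the fake table and returns the total
// deleted, or -1 on error.
func sweepFake(rows, batchSize int) int {
	total, err := sweepInBatches(context.Background(), fakeTable(rows), batchSize, time.Millisecond)
	if err != nil {
		return -1
	}
	return total
}

func main() {
	fmt.Println(sweepFake(25, 10))
}
```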

Tests: 5 in webhook_log_test.go (persistence, request_id auto-gen,
invalid-JSON leaves event_type empty, invalid-signature capture,
extractEventType 5 sub-cases) + 4 in
cleanup_hyperswitch_webhook_log_test.go (deletes-older-than, noop,
default-on-zero, context-cancel). Migration 984 applied cleanly to
local Postgres; all indexes present.

Also (v107-plan.md):
  * Item G acceptance gains an explicit Idempotency-Key threading
    requirement with an empty-key loud-fail test — "literally
    copy-paste D's 4-line test skeleton". Closes the risk that
    item G silently reopens the HTTP-retry duplicate-charge
    exposure D closed.

Out of scope for E (noted in CHANGELOG):
  * Rate limit on the endpoint — pre-existing middleware covers
    it at the router level; adding a per-endpoint limit is
    separate scope.
  * Readable-payload SQL view — deferred, the TEXT column is
    already human-readable; a convenience view is a nice-to-have
    not a ship-blocker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 02:44:58 +02:00
senke
3cd82ba5be fix(hyperswitch): idempotency-key on create-payment and create-refund — v1.0.7 item D
Every outbound POST /payments and POST /refunds from the Hyperswitch
client now carries an Idempotency-Key HTTP header. Key values are
explicit parameters at every call site — no context-carrier magic,
no auto-generation. An empty key is a loud error from the client
(not silent header omission) so a future new call site that forgets
to supply one fails immediately, not months later under an obscure
replay scenario.

Key choices, both stable across HTTP retries of the same logical
call:
  * CreatePayment → order.ID.String() (GORM BeforeCreate populates
    order.ID before the PSP call in ConfirmOrder).
  * CreateRefund → pendingRefund.ID.String() (populated by the
    Phase 1 tx.Create in RefundOrder, available for the Phase 2 PSP
    call).
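
The "empty key is a loud error" rule can be sketched like this. It is an
illustration only: newIdempotentPost, headerFor and the example URL are
hypothetical and do not reproduce the hyperswitch.Client signatures listed
further down.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"net/http"
)

var errEmptyIdempotencyKey = errors.New("idempotency key must not be empty")

// newIdempotentPost builds a POST request carrying an Idempotency-Key
// header. The key is an explicit parameter; an empty key is an error rather
// than a silently missing header.
func newIdempotentPost(ctx context.Context, url, idempotencyKey string, body []byte) (*http.Request, error) {
	if idempotencyKey == "" {
		return nil, errEmptyIdempotencyKey
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Idempotency-Key", idempotencyKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// headerFor builds a request against a placeholder URL and returns the
// Idempotency-Key header that would be sent.
func headerFor(key string) (string, error) {
	req, err := newIdempotentPost(context.Background(), "https://psp.example.invalid/payments", key, []byte(`{}`))
	if err != nil {
		return "", err
	}
	return req.Header.Get("Idempotency-Key"), nil
}

func main() {
	h, err := headerFor("order-123")
	fmt.Println(h, err)
	_, err = headerFor("")
	fmt.Println(err)
}
```

Passing the order or refund ID as the key keeps it stable across transport
retries of the same logical call, which is the property the key choices
above rely on.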

Scope note (reproduced here for the next reader who grep-s the
commit log for "Idempotency-Key"):

  Idempotency-Key covers HTTP-transport retry (TLS reconnect,
  proxy retry, DNS flap) within a single CreatePayment /
  CreateRefund invocation. It does NOT cover application-level
  replay (user double-click, form double-submit, retry after crash
  before DB write). That class of bug requires state-machine
  preconditions on VEZA side — already addressed by the order
  state machine + the handler-level guards on POST
  /api/v1/payments (for payments) and the partial UNIQUE on
  `refunds.hyperswitch_refund_id` landed in v1.0.6.1 (for refunds).

  Hyperswitch TTL on Idempotency-Key: typically 24h-7d server-side
  (verify against current PSP docs). Beyond TTL, a retry with the
  same key is treated as a new request. Not a concern at current
  volumes; document if retry logic ever extends beyond 1 hour.

Explicitly out of scope: item D does NOT add application-level
retry logic. The current "try once, fail loudly" behavior on PSP
errors is preserved. Adding retries is a separate design exercise
(backoff, max attempts, circuit breaker) not part of this commit.

Interfaces changed:
  * hyperswitch.Client.CreatePayment(ctx, idempotencyKey, ...)
  * hyperswitch.Client.CreatePaymentSimple(...) convenience wrapper
  * hyperswitch.Client.CreateRefund(ctx, idempotencyKey, ...)
  * hyperswitch.Provider.CreatePayment threads through
  * hyperswitch.Provider.CreateRefund threads through
  * marketplace.PaymentProvider interface — first param after ctx
  * marketplace.refundProvider interface — first param after ctx

Removed:
  * hyperswitch.Provider.Refund (zero callers, superseded by
    CreateRefund which returns (refund_id, status, err) and is the
    only method marketplace's refundProvider cares about).

Tests:
  * Two new httptest.Server-backed tests (client_test.go) pin the
    Idempotency-Key header value for CreatePayment and CreateRefund.
  * Two new empty-key tests confirm the client errors rather than
    silently sending no header.
  * TestRefundOrder_OpensPendingRefund gains an assertion that
    f.provider.lastIdempotencyKey == refund.ID.String() — if a
    future refactor threads the key from somewhere else (paymentID,
    uuid.New() per call, etc.) the test fails loudly.
  * Four pre-existing test mocks updated for the new signature
    (mockRefundPaymentProvider in marketplace, mockPaymentProvider
    in tests/integration and tests/contract,
    mockRefundPaymentProvider in tests/integration/refund_flow).

Subscription's CreateSubscriptionPayment interface declares its own
shape and has no live Hyperswitch-backed implementation today —
v1.0.6.2 noted this as the payment-gate bypass surface, v1.0.7
item G will ship the real provider. When that lands, item G's
implementation threads the idempotency key through in the same
pattern (documented in v107-plan.md item G acceptance).

CHANGELOG v1.0.7-rc1 entry updated with the full item D scope note
and the "out of scope: retries" caveat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 02:30:02 +02:00
senke
1a133af9ac feat(marketplace): stripe reversal error disambiguation + CHECK constraint + E2E — v1.0.7 item B day 3
Day-3 closure of item B. The three things day 2 deferred are now done:

1. Stripe error disambiguation.
   ReverseTransfer in StripeConnectService now parses
   stripe.Error.Code + HTTPStatusCode + Msg to emit the sentinels
   the worker routes on. Pre-day-3 the sentinels were declared but
   the service wrapped every error opaquely, making this the exact
   "temporary compromise frozen into permanent" pattern the audit
   was meant to prevent — flagged during review and fixed same day.

   Mapping:
     * 404 + code=resource_missing  → ErrTransferNotFound
     * 400 + msg matches "already" + "reverse" → ErrTransferAlreadyReversed
     * any other                    → transient (wrapped raw, retry)

   The "already reversed" case has no machine-readable code in
   stripe-go (unlike ChargeAlreadyRefunded for charges — the SDK
   doesn't enumerate the equivalent for transfers), so it's
   message-parsed. Fragility documented at the call site: if Stripe
   changes the wording, the worker treats the response as transient
   and eventually surfaces the row to permanently_failed after max
   retries. Worst-case regression is "benign case gets noisier",
   not data loss.

2. Migration 983: CHECK constraint
   chk_reversal_pending_has_next_retry_at CHECK (status !=
   'reversal_pending' OR next_retry_at IS NOT NULL). Added NOT VALID
   so the constraint is enforced on new writes without scanning
   existing rows; a follow-up VALIDATE can run once the table is
   known to be clean. Prevents the "invisible orphan" failure mode
   where a reversal_pending row with NULL next_retry_at would be
   skipped by any future stricter worker query.

3. End-to-end reversal flow test (reversal_e2e_test.go) chains
   three sub-scenarios: (a) happy path — refund.succeeded →
   reversal_pending → worker → reversed with stripe_reversal_id
   persisted; (b) invalid stripe_transfer_id → worker terminates
   rapidly to permanently_failed with single Stripe call, no
   retries (the highest-value coverage per day-3 review); (c)
   already-reversed out-of-band → worker flips to reversed with
   informative message.
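
The error mapping from point 1 can be sketched without the Stripe SDK. This
is a sketch under assumptions: apiError is a local stand-in for the three
stripe.Error fields named above, classify is a hypothetical helper, the
sentinels are redeclared locally, and the messages in main are illustrative
wording, not Stripe's.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Local redeclarations of the sentinels named in the commit message.
var (
	ErrTransferNotFound        = errors.New("transfer not found")
	ErrTransferAlreadyReversed = errors.New("transfer already reversed")
)

// apiError is a stand-in for the stripe.Error fields the mapping reads.
type apiError struct {
	HTTPStatusCode int
	Code           string
	Msg            string
}

func (e *apiError) Error() string { return e.Msg }

// isAlreadyReversedMessage matches on wording because, per the commit
// message, no machine-readable code exists for this case.
func isAlreadyReversedMessage(msg string) bool {
	m := strings.ToLower(msg)
	return strings.Contains(m, "already") && strings.Contains(m, "reverse")
}

// classifyReversalError maps a PSP error to a sentinel, or wraps it as
// transient so the worker retries.
func classifyReversalError(err error) error {
	var apiErr *apiError
	if !errors.As(err, &apiErr) {
		return err // not an API error: pass through unchanged
	}
	switch {
	case apiErr.HTTPStatusCode == 404 && apiErr.Code == "resource_missing":
		return ErrTransferNotFound
	case apiErr.HTTPStatusCode == 400 && isAlreadyReversedMessage(apiErr.Msg):
		return ErrTransferAlreadyReversed
	default:
		return fmt.Errorf("transient reversal error: %w", err)
	}
}

// classify is a convenience wrapper for building an apiError inline.
func classify(status int, code, msg string) error {
	return classifyReversalError(&apiError{HTTPStatusCode: status, Code: code, Msg: msg})
}

func main() {
	fmt.Println(classify(404, "resource_missing", "no such transfer (illustrative)"))
	fmt.Println(classify(400, "", "transfer was already reversed (illustrative)"))
	fmt.Println(classify(500, "", "upstream timeout (illustrative)"))
}
```

If the wording check stops matching, the error falls into the default
branch and is retried, which matches the "benign case gets noisier"
worst case described in point 1.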

Architecture note — the sentinels were moved to a new leaf
package `internal/core/connecterrors` because both marketplace
(needs them for the worker's errors.Is checks) and services (needs
them to emit) import them, and an import cycle
(marketplace → monitoring → services) would form if either owned
them directly. marketplace re-exports them as type aliases so the
worker code reads naturally against the marketplace namespace.

New tests:
  * services/stripe_connect_service_test.go — 7 cases on
    isAlreadyReversedMessage (pins Stripe's wording), 1 case on
    the error-classification shape. Doesn't invoke stripe.SetBackend
    — the translation logic is tested via a crafted *stripe.Error,
    the emission is trusted on the read of `errors.As` + the known
    shape of stripe.Error.
  * marketplace/reversal_e2e_test.go — 3 end-to-end sub-tests
    chaining refund → worker against a dual-role mock. The
    invalid-id case asserts single-call-no-retries termination.
  * Migration 983 applied cleanly to the local Postgres; constraint
    visible in \d seller_transfers as NOT VALID (behavior correct
    for future writes, existing rows grandfathered).

Self-assessment on day-2's struct-literal refactor of
processSellerTransfers (deferred from day 2):
The refactor is borderline: neither clearer nor more confusing than the
original mutation-after-construct pattern. Logged in the v1.0.7-rc1
CHANGELOG as a post-v1.0.7 consideration: if GORM BeforeUpdate
hooks prove cleaner on other state machines (axis 2), revisit the
anti-mutation test approach.

CHANGELOG v1.0.7-rc1 entry added documenting items A + B end-to-end.
Tag not yet applied — items C, D, E, F remain on the v1.0.7 plan.
The rc1 tag lands when those four items close + the smoke probe
validates the full cadence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 02:12:03 +02:00
senke
149f76ccc7 docs: amend v1.0.6.2 CHANGELOG + item G recovery endpoint
CHANGELOG v1.0.6.2 block now documents the distribution-handler
propagate fix as part of the release (applied in commit 26cb52333
before re-tagging). v1.0.7 item G acceptance gains a recovery
endpoint requirement so the "complete payment" error message has a
real target rather than leaving users stuck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 12:53:43 +02:00
senke
9a8d2a4e73 chore(release): v1.0.6.2 — subscription payment-gate bypass hotfix
Closes a bypass surfaced by the 2026-04 audit probe (axis-1 Q2): any
authenticated user could POST /api/v1/subscriptions/subscribe on a paid
plan and receive 201 active without the payment provider ever being
invoked. The resulting row satisfied `checkEligibility()` in the
distribution service via `can_sell_on_marketplace=true` on the Creator
plan — effectively free access to /api/v1/distribution/submit, which
dispatches to external partners.

Fix is centralised in `GetUserSubscription` so there is no code path
that can grant subscription-gated access without routing through the
payment check. Effective-payment = free plan OR unexpired trial OR
invoice with non-empty hyperswitch_payment_id. Migration 980 sweeps
pre-existing phantom rows into `expired`, preserving the tuple in a
dated audit table for support outreach.
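
The effective-payment rule above can be sketched as a single predicate.
This is an illustration, not the shipped hasEffectivePayment: the
subscription struct, its field names and the trialAt helper are
hypothetical stand-ins for the real models.

```go
package main

import (
	"fmt"
	"time"
)

// subscription is a hypothetical, flattened view of the fields the rule needs.
type subscription struct {
	PlanIsFree        bool
	TrialEndsAt       *time.Time // nil when there is no trial
	InvoicePaymentIDs []string   // hyperswitch_payment_id per invoice, may be empty strings
}

// hasEffectivePayment implements: free plan OR unexpired trial OR at least
// one invoice with a non-empty hyperswitch_payment_id.
func hasEffectivePayment(s subscription, now time.Time) bool {
	if s.PlanIsFree {
		return true
	}
	if s.TrialEndsAt != nil && s.TrialEndsAt.After(now) {
		return true
	}
	for _, id := range s.InvoicePaymentIDs {
		if id != "" {
			return true
		}
	}
	return false
}

// epoch is a fixed "now" so the examples are deterministic.
var epoch = time.Unix(0, 0)

// trialAt returns a trial end time the given number of seconds from epoch.
func trialAt(sec int64) *time.Time {
	t := time.Unix(sec, 0)
	return &t
}

func main() {
	unpaid := subscription{InvoicePaymentIDs: []string{""}}
	paid := subscription{InvoicePaymentIDs: []string{"pay_1"}}
	fmt.Println(hasEffectivePayment(unpaid, epoch), hasEffectivePayment(paid, epoch))
}
```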

Subscribe and subscribeToFreePlan treat the new ErrSubscriptionNoPayment
as equivalent to ErrNoActiveSubscription so re-subscription works
cleanly post-cleanup. GET /me/subscription surfaces needs_payment=true
with a support-contact message rather than a misleading "you're on
free" or an opaque 500. TODO(v1.0.7-item-G) annotation marks where the
`if s.paymentProvider != nil` short-circuit needs to become a mandatory
pending_payment state.

Probe script `scripts/probes/subscription-unpaid-activation.sh` kept as
a versioned regression test — dry-run by default, --destructive logs in
and attempts the exploit against a live backend with automatic cleanup.
8-case unit test matrix covers the full hasEffectivePayment predicate.

Smoke validated end-to-end against local v1.0.6.2: POST /subscribe
returns 201 (by design — item G closes the creation path), but
GET /me/subscription returns subscription=null + needs_payment=true,
distribution eligibility returns false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 12:21:53 +02:00
senke
5e3964b989 chore(release): v1.0.6.1 — partial UNIQUE on refunds.hyperswitch_refund_id
Hotfix surfaced by the v1.0.6 refund smoke test. Migration 978's plain
UNIQUE constraint on hyperswitch_refund_id collided on empty strings
— two refunds in the same post-Phase-1 / pre-Phase-2 state (or a
previous Phase-2 failure leaving '') would violate the constraint at
INSERT time on the second attempt, even though the refunds were for
different orders.

  * Migration 979_refunds_unique_partial.sql replaces the plain
    UNIQUE with a partial index excluding empty and NULL values.
    Idempotency for successful refunds is preserved — duplicate
    Hyperswitch webhooks land on the same row because the PSP-
    assigned refund_id is non-empty.
  * No Go code change. The bug was purely in the DB constraint shape.

Smoke test that caught it — 5/5 scenarios re-verified end-to-end:
happy path, idempotent replay (succeeded_at + balance strictly
invariant), PSP error rollback, webhook refund.failed, double-submit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 02:42:24 +02:00
senke
a4d2ffd123 chore(release): v1.0.6 — ergonomics + operational hardening
Follow-up to the v1.0.5 hardening sprint. That release validated the
`register → verify → play` critical path end-to-end; this one addresses
the next layer — the UX friction and operational blindspots that a
first-day public user (or a first-day on-call) would hit. Six targeted
commits, each with its own tests:

  * Fix 1 — Self-service creator role (9f4c2183a)
  * Fix 2 — Upload size limits from a single source (7974517c0)
  * Fix 3 — Unified SMTP env schema on canonical SMTP_* names (9002e91d9)
  * Fix 4 — Refund reverse-charge with idempotent webhook (92cf6d6f7)
  * Fix 5 — RTMP ingest health banner on Go Live (698859cc5)
  * Fix 6 — RabbitMQ publish failures no longer silent (4b4770f06)

Breaking changes:
  * marketplace.MarketplaceService.RefundOrder now returns
    (*Refund, error) — callers must accept the pending refund row.
  * Internal refundProvider interface changed from
    Refund(...) error to CreateRefund(...) (refundID, status, err).
  * Order status machine gains `refund_pending` as an intermediate
    state. Clients reading orders.status should not treat it as
    refunded yet.

Parked for v1.0.7:
  * Partial refunds (UX decision + call-site wiring)
  * Stripe Connect Transfers:reversal (internal accounting is
    already corrected; this is the external money-movement call)
  * CloudUploadModal.tsx unifying on /upload/limits
  * Manual smoke test of refund flow against Hyperswitch sandbox

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 02:13:45 +02:00
senke
070e31a463 chore(release): v1.0.5.1 — dev SMTP ergonomics hotfix
A fresh clone + `cp veza-backend-api/.env.template .env` + `make dev-full`
booted the backend with `SMTP_HOST=""` — `EmailService.sendEmail` short-
circuits to log-only when the host is empty, so `register` + `password
reset` produced users stuck with no way to verify (or recover) in dev,
and the smoke test caught MailHog empty despite the service being up.

- `.env.template` now ships MailHog-ready defaults (`localhost:1025`,
  UI on `:8025`, `FROM_EMAIL=no-reply@veza.local`) so a bare clone +
  copy gives a working register flow. Comment rewritten to point at
  both the dev path and the prod override.
- Also exports duplicate variable names (`SMTP_USERNAME`, `SMTP_FROM`,
  `SMTP_FROM_NAME`) read by `internal/email/sender.go`. The two email
  services in-tree disagree on env schema (`SMTP_USER` vs
  `SMTP_USERNAME`, `FROM_EMAIL` vs `SMTP_FROM`, `FROM_NAME` vs
  `SMTP_FROM_NAME`); until v1.0.6 reconciles them, both sets are
  populated so whichever path fires finds its names.

Pure config hotfix. No code change, no migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:16:54 +02:00
senke
ba45bffd9a chore(release): v1.0.5 — hardening sprint
Seven targeted fixes to the register → verify → play critical path before
public opening. Each landed in its own commit with dedicated tests; this
commit just rolls VERSION forward and captures the rationale in the
changelog.

Summary of what's in this release:
  * Fix 1 — Silent player: /stream endpoint + HLS default alignment
  * Fix 2 — Bogus email verification: real SMTP + MailHog + fail-loud in prod
  * Fix 3 — Free marketplace: HYPERSWITCH_ENABLED=true required in prod
  * Fix 4 — Redis mandatory: REDIS_URL required in prod + ERROR log
    on in-memory PubSub fallback
  * Fix 5 — Maintenance mode DB-backed via platform_settings
  * Fix 6 — Hourly cleanup of orphan tracks stuck in processing
  * Fix 7 — Response cache bypass for range-aware media endpoints
    (surfaced by the browser smoke test; prevents Range/Accept-Ranges
    strip and JSON-round-trip byte corruption on /stream, /download,
    /hls/ and any request with a Range header)
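
The Fix 7 bypass rule can be sketched as a predicate a cache middleware
would consult. This is a sketch under assumptions: shouldBypassCache and
bypassFor are hypothetical names, and the suffix/substring matching on the
path is a guess at how the three endpoints are recognised.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// shouldBypassCache reports whether the response cache must be skipped:
// any request with a Range header, plus the media endpoints named in Fix 7.
// The path matching below is an assumption, not the middleware's real rule.
func shouldBypassCache(r *http.Request) bool {
	if r.Header.Get("Range") != "" {
		return true
	}
	p := r.URL.Path
	return strings.HasSuffix(p, "/stream") ||
		strings.HasSuffix(p, "/download") ||
		strings.Contains(p, "/hls/")
}

// bypassFor builds a GET request for the path, optionally with a Range
// header, and returns the bypass decision.
func bypassFor(path, rangeHeader string) bool {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if rangeHeader != "" {
		req.Header.Set("Range", rangeHeader)
	}
	return shouldBypassCache(req)
}

func main() {
	fmt.Println(bypassFor("/api/v1/tracks/1/stream", ""))
	fmt.Println(bypassFor("/api/v1/tracks", ""))
	fmt.Println(bypassFor("/api/v1/tracks", "bytes=0-99"))
}
```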

Parked for v1.0.6 (🟠/🟡 audit items + smoke-test ergonomics):
Hyperswitch refund→PSP propagation, livestream UI feedback when
nginx-rtmp is down, upload size mismatch (front 500MB vs back 100MB),
RabbitMQ silent drop on enqueue failure, SMTP_HOST ergonomics for
`make dev` host mode, creator-role self-service onboarding for upload.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:14:54 +02:00
senke
d820c22d7d chore(release): v1.0.4 — cleanup sprint complete, CI green
7-day cleanup sprint (J1–J7) done. The codebase is unchanged
functionally but the working tree, docs, k8s runbooks, CI, and
Go dependency graph are all realigned with reality for the first
time since the v1.0.0 release.

VERSION          1.0.2 → 1.0.4 (skips v1.0.3 — that tag already
                 exists upstream, unused on this branch)
CHANGELOG.md     full v1.0.4 entry with per-day (J1–J7) breakdown
                 and the govulncheck + CI fix trail
docs/PROJECT_STATE.md   header month + version table refreshed,
                        pointer to AUDIT_REPORT.md added
docs/FEATURE_STATUS.md  header updated — no feature matrix
                        changes (no feature work in this sprint)

Key deliverables of the sprint:
  J1  0e7097ed1  purge 220 MB of debris (binaries, reports,
                 session docs, stale MVP scripts)
  J2  2aea1af36  rewrite CLAUDE.md, fix README, purge chat-server
                 refs from k8s runbooks and env examples
  J3  67f18892a  remove 3 deprecated unused handlers
  J3+ 7fa314866  2FA handler duplicate removal (bundled by parallel
                 ci-cache commit)
  J4  9cdfc6d89  GDPR-compliant hard delete with Redis SCAN cursor
                 and ES DeleteByQuery — closes TODO(HIGH-007)
  J5  0589ec9fc  defer GeoIP, rename v2-v3-types.ts to domain.ts,
                 document Storybook kill
  J5+ 7f89bebe1  fix lint-staged eslint rule (was linting the
                 whole project — root cause of earlier --no-verify)
  J6  113210734  mark 3 dormant docker-compose files deprecated
  fix 3d1f127ad  bump x/image, quic-go, testcontainers-go — drops
                 containerd + docker/docker from dep graph,
                 resolving 5 govulncheck findings without allowlist
  fix b33227a57  bump go.work to 1.25 to match veza-backend-api
  fix 73fc6e128  bump x/net v0.51.0 for GO-2026-4559
  fix 376d9adc4  retire legacy backend-ci.yml, centralize Docker
                 probe in SkipIfNoIntegration

CI status on the consolidated ci.yml workflow for 376d9adc4:
  Veza CI / Backend (Go)        OK 6m36s
  Veza CI / Frontend (Web)      OK 20m57s
  Veza CI / Rust (Stream)       OK 6m25s
  Security Scan / gitleaks      OK 4m13s
  Veza CI / Notify              skipped (fires only on failure)

First fully green CI run of the sprint and the first in a long
time overall. The tag v1.0.4 is cut on this state.

Refs: AUDIT_REPORT.md, all commits 0e7097ed1..376d9adc4
2026-04-15 16:39:30 +02:00
senke
ecf8d73e55 fix(release): v1.0.2 — Full V1_SIGNOFF compliance (21 criteria)
- Go coverage: coverage_report.sh script, 39% measured
- Frontend Vitest thresholds: 50%
- WebSocket load test: CHAT_ORIGIN→backend, WS_URL=/api/v1/ws
- Tests: chat_service (WSUrl), password_service (hash/expired)
- V1_SIGNOFF: 14 PASS, 7 N/A documented
- PERFORMANCE_BASELINE, RGPD, PWA tables v1.0.2
- Runbooks, Grafana, Secrets validated
2026-03-03 21:18:53 +01:00
senke
7cfd48a82a fix(release): v1.0.1 — Full ROADMAP checklist compliance
Some checks failed
Backend API CI / test-unit (push) Failing after 0s
Backend API CI / test-integration (push) Failing after 0s
Stream Server CI / test (push) Failing after 0s
- Security: npm 0 CRITICAL, cargo audit 0 vulnerabilities
- OpenAPI: @Param id fixed for /tracks/quota/{id}
- Tests: Payment E2E passes, OAuth DATABASE_URL fallback
- Migrations: 000_mark_consolidated.sql
- veza-stream-server: prometheus 0.14, validator 0.19
- docs: SECURITY_SCAN_RC1, V1_SIGNOFF, PROJECT_STATE
2026-03-03 20:17:54 +01:00
senke
69c6f55fb1 chore(release): bump VERSION to 1.0.0 — Commercial release
2026-03-03 19:54:04 +01:00
senke
dad5aae71c chore(release): v0.992 RC2 — Release notes, sign-off final
2026-03-03 19:53:41 +01:00
senke
0f31c11304 chore: regenerate CHANGELOG, bump VERSION to 0.991 for RC1
2026-03-03 19:52:49 +01:00
senke
1e4ed6ef87 docs: update API_REFERENCE, CHANGELOG, FEATURE_STATUS, PROJECT_STATE for v0.803
2026-03-03 09:25:20 +01:00
senke
4464f98194 chore(release): v0.981 — Beta (staging deploy, bug bash, smoke test)
2026-03-02 19:33:42 +01:00
senke
d577f8c9be chore(release): v0.971 — Phantom (gamification removal, WebRTC Beta, limits doc)
2026-03-02 19:25:37 +01:00
senke
da837fc085 chore(release): v0.951 — Loadtest (500 req/s, 1000 WS, 50 uploads, perf indexes)
2026-03-02 19:22:38 +01:00
senke
5063c95a5c docs: update documentation for v0.803 release
2026-02-25 20:04:37 +01:00
senke
7692c4b8b9 feat(v0.802): frontend Cloud/Gear, MSW, docs, scope v0.803, archive
- Cloud: CloudFileVersions, CloudShareModal, versions/share in CloudView
- Gear: GearDocumentsTab, GearRepairsTab, warranty badge, initialTab
- MSW: cloud versions/share, gear documents/repairs, tags suggest
- Stories: CloudFileVersions, CloudShareModal, GearDetailModal variants
- gearService: listDocuments, uploadDocument, deleteDocument, listRepairs, createRepair, deleteRepair
- cloudService: listVersions, restoreVersion, shareFile, getSharedFile
- gear_warranty_notifier: 24h ticker, notifications for expiring warranty
- tag_handler_test: unit tests
- docs: API_REFERENCE, CHANGELOG, PROJECT_STATE, FEATURE_STATUS v0.802
- SCOPE_CONTROL, .cursorrules: scope v0.803
- archive: V0_802_RELEASE_SCOPE, RETROSPECTIVE_V0802
2026-02-25 14:00:58 +01:00
senke
7c73af9b7f docs: update CHANGELOG, PROJECT_STATE, FEATURE_STATUS for v0.801 2026-02-25 10:00:24 +01:00
senke
63867f1d09 feat(v0.703): Go Live & Complete Streaming
Some checks failed
Backend API CI / test-unit (push) Failing after 0s
Backend API CI / test-integration (push) Failing after 0s
Frontend CI / test (push) Failing after 0s
Storybook Audit / Build & audit Storybook (push) Failing after 0s
- Backend: room creation for live streams, permissions CanJoin/CanSend/CanRead for stream rooms
- LiveViewChat: useLiveStreamChat hook, WebSocket connection, stream_id as room
- LiveViewPlayer: real-time viewer count via polling (5s)
- Media Session: seekbackward/seekforward handlers (10s step)
- GoLiveView.stories.tsx: Default, Loading, Error, StreamKeyVisible
- Docs: API_REFERENCE, CHANGELOG, PROJECT_STATE, FEATURE_STATUS, RETROSPECTIVE_V0703
- SCOPE_CONTROL, .cursorrules: update to v0.801
- Archive V0_703_RELEASE_SCOPE.md
2026-02-25 09:35:22 +01:00
senke
6293a88476 docs: update CHANGELOG, PROJECT_STATE, FEATURE_STATUS for v0.702 2026-02-24 00:21:20 +01:00
senke
c785e61e69 feat(v0.701): AdminTransfers page/route, MSW, stories, Deep Health, API ref, docs, scope v0.702
Some checks failed
Backend API CI / test-unit (push) Failing after 0s
Backend API CI / test-integration (push) Failing after 0s
Frontend CI / test (push) Failing after 0s
Storybook Audit / Build & audit Storybook (push) Failing after 0s
- Step 13: AdminTransfersPage, LazyAdminTransfers, route /admin/transfers
- Step 14: MSW handlers admin transfers
- Step 15: AdminTransfersView stories (Default, Empty, WithFailedTransfers, Error, Loading)
- Step 16-17: DeepHealth handler (disk, config), GET /health/deep
- Step 19: health_deep_test.go (4 tests)
- Step 20: docs/API_REFERENCE.md
- Step 21: Archive V0_604, MIGRATIONS.md migration 116
- Step 22: CHANGELOG, PROJECT_STATE, FEATURE_STATUS v0.701
- Step 23: RETROSPECTIVE_V0701, V0_702 placeholder, SCOPE_CONTROL, .cursorrules
- Step 24: Archive V0_701_RELEASE_SCOPE
- Fix: AdminTransfersView Select component (use options API)
2026-02-23 23:42:02 +01:00
senke
00d33a1add docs: update PROJECT_STATE, FEATURE_STATUS, CHANGELOG for v0.603 2026-02-23 22:59:38 +01:00
senke
83ed4f315b chore(release): v0.602 - Payout, Technical Debt & E2E Tests
Some checks failed
Backend API CI / test-unit (push) Failing after 0s
Backend API CI / test-integration (push) Failing after 0s
Frontend CI / test (push) Failing after 0s
Storybook Audit / Build & audit Storybook (push) Failing after 0s
- Stripe Connect: onboarding, balance, SellerDashboardView
- Interceptors: auth.ts, error.ts extracted, facade
- Grafana: dashboards enriched (p50, top endpoints, 4xx, WS, commerce)
- E2E commerce: product->order->review->invoice
- SMOKE_TEST_V0602, RETROSPECTIVE_V0602, PAYOUT_MANUAL
- Archive V0_602 scope, V0_603 placeholder, SCOPE_CONTROL v0.603
- Fix sanitizer regex (Go's regexp package does not support backreferences)
- Marketplace test schema: product_licenses, product_images, orders, licenses
2026-02-23 22:32:01 +01:00
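The sanitizer regex fix in the commit above refers to a constraint of Go's standard `regexp` package, which uses RE2 syntax and rejects backreferences such as `\1`. The sketch below shows the constraint and one common workaround (explicit alternation). The patterns are illustrative assumptions and are not the sanitizer's actual expressions.

```go
package main

import (
	"fmt"
	"regexp"
)

// quotedBackref is the kind of pattern accepted by PCRE-style engines:
// \1 must match whichever quote character group 1 captured.
// It is hypothetical and only used to show the compile error.
const quotedBackref = `(["'])[^"']*\1`

// quotedAlternation expresses the same intent without a backreference by
// spelling out one alternative per quote character.
var quotedAlternation = regexp.MustCompile(`"[^"']*"|'[^"']*'`)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// The backreference pattern is rejected at compile time.
	fmt.Println(compiles(quotedBackref))

	// The alternation form matches only properly paired quotes.
	fmt.Println(quotedAlternation.FindAllString(`a "b" 'c' "d'`, -1))
}
```

Alternation only works when the set of captured values is small and known in advance, as with quote characters; other cases usually need a second pass in code.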
senke
aee1ec18e2 docs(v0.503): finalization, documentation, changelog, tag
- Update FEATURE_STATUS.md: HLS Streaming -> Operational (v0.503)
- Update PROJECT_STATE.md: v0.503 delivered, next version v0.601
- Add CHANGELOG.md v0.503 entry with all changes
- Create SMOKE_TEST_V0503.md validation checklist
- Create RETROSPECTIVE_V0503.md
- Archive V0_503_RELEASE_SCOPE.md to docs/archive/
- Create V0_601_RELEASE_SCOPE.md placeholder
- Update SCOPE_CONTROL.md references to v0.601
- Update .cursorrules scope to v0.601
2026-02-22 21:28:46 +01:00
senke
40883aebea docs(v0.502): Sprint 6 -- finalization, docs, and tag
- Update PROJECT_STATE.md: v0.502 delivered, next version v0.503
- Update CHANGELOG.md: comprehensive v0.502 entry (Added/Changed/Removed/Infrastructure)
- Create SMOKE_TEST_V0502.md: validation checklist for chat rewrite
- Create RETROSPECTIVE_V0502.md: retrospective with metrics and action items
- Archive V0_502_RELEASE_SCOPE.md to docs/archive/
- Create V0_503_RELEASE_SCOPE.md placeholder
- Update SCOPE_CONTROL.md and .cursorrules to reference v0.503
2026-02-22 20:51:55 +01:00
senke
c416f51f25 docs(v0.501): Sprint 6 -- finalization and tag
- FIN-01: Add smoke test results (22/22 features pass)
- FIN-02: Update PROJECT_STATE.md for v0.501
- FIN-03: Update CHANGELOG.md with v0.501 entries
- FIN-04: Archive V0_501 scope, create V0_502 placeholder
- FIN-05: Add v0.501 retrospective
- FIN-06: Validate Go build passes
2026-02-22 18:45:07 +01:00
senke
03d9517f2c docs: add v0.404 CHANGELOG and retrospective
FIN-05 + FIN-06: Complete CHANGELOG for v0.404 with all security,
infrastructure, code quality, documentation, testing, and integration
changes. Retrospective includes pre/post scores (4.2 -> 6.6/10).
2026-02-22 17:57:49 +01:00
senke
fa4d141572 test(marketplace): add MSW handlers, update CHANGELOG and docs for v0.401 2026-02-22 14:23:28 +01:00
senke
f48a910d5d feat(chat): add call signaling types 2026-02-22 03:46:10 +01:00
senke
98894be01b docs: update FEATURE_STATUS, PROJECT_STATE, CHANGELOG for v0.302 2026-02-22 03:24:01 +01:00
senke
5fcd33618a docs: v0.302 preparation - V0_302_RELEASE_SCOPE, PROJECT_STATE, SCOPE_CONTROL, FEATURE_STATUS, CHANGELOG 2026-02-21 05:42:16 +01:00
senke
51581f4203 docs: update FEATURE_STATUS, PROJECT_STATE, CHANGELOG for v0.301 2026-02-21 05:32:29 +01:00
senke
03f49a2d93 docs: update FEATURE_STATUS, PROJECT_STATE, CHANGELOG for v0.203 2026-02-20 18:47:23 +01:00
senke
ede3546f4b feat(release): v0.202 — Lots G, H, F, C, D
Some checks failed
Backend API CI / test-unit (push) Failing after 0s
Backend API CI / test-integration (push) Failing after 0s
Frontend CI / test (push) Failing after 0s
Storybook Audit / Build & audit Storybook (push) Failing after 0s
- Lot G: Advanced search (musical_key, relevance sorting, autocomplete, facets, history)
- Lot H: Creator analytics (stats, charts, completion rate, CSV/JSON export)
- Lot F: Seller dashboard (GET /sell/stats, product list)
- Lot C: Player (crossfade, gapless preload, PiP)
- Lot D2: Autoplay (GET /tracks/recommendations, "Up next" section)

Backend: GetRecommendations handler, route /tracks/recommendations
Frontend: PlayerQueue recommendations, fix TS errors (GlobalPlayer, AnalyticsViewKpiGrid, etc.)
Docs: FEATURE_STATUS, PROJECT_STATE, CHANGELOG, SCOPE_CONTROL
2026-02-20 18:16:17 +01:00
senke
8961b4ba14 chore: finalize v0.201 docs (CHANGELOG, FEATURE_STATUS, PROJECT_STATE, SCOPE_CONTROL) 2026-02-20 15:44:30 +01:00
senke
ccf98983fe chore(v0.103): finalize release — CHANGELOG, FEATURE_STATUS, .cursorrules scope
Some checks failed
Backend API CI / test-unit (push) Failing after 0s
Backend API CI / test-integration (push) Failing after 0s
Frontend CI / test (push) Failing after 0s
Storybook Audit / Build & audit Storybook (push) Failing after 0s
2026-02-20 15:14:25 +01:00
senke
222fb95372 docs: add CHANGELOG v0.102 release notes; test(e2e): add queue flow tests 2026-02-20 12:57:26 +01:00
okinrev
41e554a3e1 docs(remediation): add audit report, remediation plan and changelog skeleton 2025-12-06 13:25:54 +01:00