veza/CHANGELOG.md
senke 94dfc80b73 feat(metrics): ledger-health gauges + alert rules — v1.0.7 item F
Five Prometheus gauges + reconciler metrics + Grafana dashboard +
three alert rules. Closes axis-1 P1.8 and adds observability for
item C's reconciler (user review: "F should include reconciler_*
metrics, otherwise tag is blind on the worker we just shipped").

Gauges (veza_ledger_, sampled every 60s):
  * orphan_refund_rows — THE canary. Pending refunds with empty
    hyperswitch_refund_id older than 5m = Phase 2 crash in
    RefundOrder. Alert: > 0 for 5m → page.
  * stuck_orders_pending — order pending > 30m with non-empty
    payment_id. Alert: > 0 for 10m → page.
  * stuck_refunds_pending — refund pending > 30m with hs_id.
  * failed_transfers_at_max_retry — permanently_failed rows.
  * reversal_pending_transfers — item B rows stuck > 30m.

Reconciler metrics (veza_reconciler_):
  * actions_total{phase} — counter by phase.
  * orphan_refunds_total — two-phase-bug canary.
  * sweep_duration_seconds — exponential histogram.
  * last_run_timestamp — alert: stale > 2h → page (worker dead).

Implementation notes:
  * Sampler thresholds hardcoded to match reconciler defaults —
    intentional mismatch allowed (alerts fire while reconciler
    already working = correct behavior).
  * Query error sets gauge to -1 (sentinel for "sampler broken").
  * marketplace package routes through monitoring recorders so it
    doesn't import prometheus directly.
  * Sampler runs regardless of Hyperswitch enablement; gauges
    default 0 when pipeline idle.
  * Graceful shutdown wired in cmd/api/main.go.

Alert rules in config/alertmanager/ledger.yml with runbook
pointers + detailed descriptions — each alert explains WHAT
happened, WHY the reconciler may not resolve it, and WHERE to
look first.

Grafana dashboard config/grafana/dashboards/ledger-health.json —
top row = 5 stat panels (orphan first, color-coded red on > 0),
middle row = trend timeseries + reconciler action rate by phase,
bottom row = sweep duration p50/p95/p99 + seconds-since-last-tick
+ orphan cumulative.

Tests — 6 cases, all green (sqlite :memory:):
  * CountsStuckOrdersPending (includes the filter on
    non-empty payment_id)
  * StuckOrdersZeroWhenAllCompleted
  * CountsOrphanRefunds (THE canary)
  * CountsStuckRefundsWithHsID (gauge-orthogonality check)
  * CountsFailedAndReversalPendingTransfers
  * ReconcilerRecorders (counter + gauge shape)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 03:40:14 +02:00

1821 lines
85 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Changelog - Veza
## [v1.0.7-rc1] - in progress (2026-04-18)
Release-candidate entry — items A and B delivered, items C/D/E/F
remain. See `docs/audit-2026-04/v107-plan.md` for the full scope.
This CHANGELOG section documents what's landed against main so far;
a final v1.0.7 tag requires the remaining items to close.
### Item A — persist stripe_transfer_id on seller_transfers
Pre-v1.0.7 `TransferService.CreateTransfer` returned `error` only —
the Stripe transfer id was discarded (the single line
`_, err := transfer.New(params)` threw it away) and the
`stripe_transfer_id` column sat empty on every row. This blocked
item B's reversal worker from identifying which transfer to reverse.
* Interface signature change: `(..., error)``(id string, ..., error)`.
* Four call sites capture and persist the id:
processSellerTransfers (new sale), TransferRetryWorker (retry
recovery), admin_transfer_handler.RetryTransfer (manual admin
retry), payout.RequestPayout (writes to SellerPayout.ExternalPayoutID).
* Four test mocks extended. Three assertions added verifying
persistence on the happy path; one failure-path test confirms
the id is NOT persisted when the provider errors.
* Migration `981_seller_transfers_stripe_reversal_id.sql` adds
`stripe_reversal_id` (prep for B) and partial UNIQUE indexes on
both id columns (matching the v1.0.6.1 pattern for
refunds.hyperswitch_refund_id).
* Defensive guard: `StripeConnectService.CreateTransfer` fails the
call if Stripe returns `(tr, nil)` with `tr.ID == ""` — the
SDK's invariant, but a violation would leave the row permanently
un-reversible, so better to fail loudly.
Backfill for historical rows where the id is empty (ops task #38)
is tracked separately: pre-v1.0.7 transfers cannot be
auto-reversed; the backfill CLI queries Stripe's transfers.List by
metadata[order_id] to populate missing ids, acceptable to leave NULL
per v107-plan.
### Item F — ledger-health metrics + alerts
Five Prometheus gauges expose money-movement pipeline state so ops
dashboards + alert rules can spot a stall before a customer does.
Paired with counter/histogram metrics for the item-C reconciler so
the dashboard tells the whole story at a glance ("we have N stuck
orders and the reconciler has resolved M of them today").
Gauges (sampled every 60s via `ScheduleLedgerHealthSampler`):
* `veza_ledger_orphan_refund_rows` — THE alert gauge. Pending
refunds with empty hyperswitch_refund_id older than 5m.
Non-zero = Phase 2 crash in RefundOrder. Pages on > 0 for 5m.
* `veza_ledger_stuck_orders_pending` — orders pending > 30m
with non-empty payment_id (webhook never arrived). Pages on
> 0 for 10m.
* `veza_ledger_stuck_refunds_pending` — refunds with hs_id but
still pending > 30m.
* `veza_ledger_failed_transfers_at_max_retry`
seller_transfers in permanently_failed.
* `veza_ledger_reversal_pending_transfers` — item B rows stuck
in reversal_pending > 30m (worker behind or Stripe down).
Reconciler metrics (item F extends item C observability):
* `veza_reconciler_actions_total{phase}` — counter labelled by
phase (stuck_orders | stuck_refunds | orphan_refunds).
* `veza_reconciler_orphan_refunds_total` — dedicated counter for
the two-phase-commit-bug canary.
* `veza_reconciler_sweep_duration_seconds` — histogram with 10
exponential buckets (0.1s to ~100s).
* `veza_reconciler_last_run_timestamp` — unix ts of last tick.
Alert fires if `time() - ts > 7200` (2 * default
RECONCILE_INTERVAL).
Sampler queries are all indexed on `status + created_at` (or
`status + updated_at` for reversal_pending). Query errors set the
gauge to -1 — a distinctive value dashboards filter on ("sampler
broken, don't trust the number") instead of leaking a stale value.
Alert rules in `config/alertmanager/ledger.yml`:
* `VezaOrphanRefundRows` — page on > 0 for 5m (two-phase bug)
* `VezaStuckOrdersPending` — page on > 0 for 10m (webhook
pipeline stuck)
* `VezaReconcilerStale` — page on last-run > 2h (worker dead,
stuck/orphan rows accumulating)
Grafana dashboard `config/grafana/dashboards/ledger-health.json`:
5 stat panels (top row) + stuck-state timeseries + reconciler
action rate + sweep duration quantiles + seconds-since-last-tick
+ orphan refunds cumulative.
Worker instrumentation: ReconcileHyperswitchWorker now emits
RecordReconcilerAction / RecordReconcilerOrphanRefund /
RecordReconcilerSweepDuration at the right points. Tests cover
the sampler's count queries (5 cases, all branches) plus the
recorder shape.
Sampler wired in cmd/api/main.go with graceful shutdown; runs
regardless of Hyperswitch enablement (gauges default to 0, which
is the correct story for "Hyperswitch not configured").
### Item C — Hyperswitch reconciliation sweep
New `ReconcileHyperswitchWorker` sweeps for pending orders and
refunds whose terminal webhook never arrived (network hiccup, our
endpoint down, PSP queue stuck). For each stuck row the worker
pulls live PSP state and synthesises a webhook payload that feeds
the normal `ProcessPaymentWebhook` / `ProcessRefundWebhook`
dispatcher. The existing terminal-state guards in those handlers
make the reconciliation idempotent against real webhooks — a late
webhook after the reconciler has already resolved the row is a
no-op.
Covers three stuck-state classes:
1. **Stuck orders** (pending > 30m, non-empty payment_id): we
opened an order, called CreatePayment, got back a payment_id,
but never received the succeeded/failed webhook. Worker calls
GetPaymentStatus and dispatches a synthetic
`payment.<status>` webhook.
2. **Stuck refunds with a PSP id** (pending > 30m, non-empty
hyperswitch_refund_id): same pattern via GetRefundStatus +
synthetic `refund.<status>` webhook. The PSP's error_message
is forwarded into the payload so downstream handlers persist
it.
3. **Orphan refunds** (pending > 5m, EMPTY hyperswitch_refund_id):
the harder case. We opened a Phase 1 Refund row but crashed
before Phase 2 (PSP call). The row has no PSP id, the PSP has
no record. Worker marks the row `failed` with an explanatory
error_message, rolls the order back to `completed` (so the
buyer can retry), and logs **ERROR** — this is operator-
attention territory: a mid-refund crash happened, root cause
should be investigated.
Batch-bounded (50 rows per phase per tick) so a 10k-row backlog
doesn't hammer Hyperswitch on a single tick. PSP read errors leave
the row unchanged — next tick retries.
Configuration:
* RECONCILE_WORKER_ENABLED=true (default)
* RECONCILE_INTERVAL=1h (default; ops can drop to 5m during
incident response without a code change)
* RECONCILE_ORDER_STUCK_AFTER=30m
* RECONCILE_REFUND_STUCK_AFTER=30m
* RECONCILE_REFUND_ORPHAN_AFTER=5m (shorter because orphan is
an "app crashed" signal, not "network hiccup")
Interfaces introduced:
* `marketplace.HyperswitchReadClient` — the worker depends on
read-only PSP access (`GetPaymentStatus`, `GetRefundStatus`)
without knowing about CreatePayment / CreateRefund. Implemented
by `hyperswitch.Provider`.
* `hyperswitch.Client.GetRefund` + `RefundStatus` struct added
(mirror of existing GetPayment / PaymentStatus).
Worker wired in cmd/api/main.go alongside the other marketplace
workers; gated on `HyperswitchEnabled && HyperswitchAPIKey != ""`.
A separate scoped `marketplace.NewService` is constructed for the
dispatcher side (the webhook-handler uses its own via
`APIRouter.getMarketplaceService` with additional storage/checkout
opts the reconciler doesn't need).
Tests (10 cases, all green, sqlite :memory:):
* happy-path stuck order → synthetic webhook dispatched with
correct event_type / payment_id / status.
* recent order (under the stuck threshold) → untouched.
* completed order → untouched.
* order with empty payment_id → untouched (pre-PSP-call, nothing
to reconcile).
* PSP read error on GetPaymentStatus → row stays pending,
worker logs and moves on.
* orphan refund → auto-failed + order rolled back + error logged.
* recent orphan refund (under 5m) → left alone for Phase 2 to
complete.
* stuck refund with PSP id → synthetic webhook dispatched.
* refund with status=failed → PSP error_message survives into the
synthetic payload (downstream relies on it).
* all-terminal-state seed (completed / refunded / succeeded rows)
→ zero PSP calls, zero dispatches.
### Item E — webhook raw-payload audit log
Every POST /webhooks/hyperswitch delivery is now persisted to
`hyperswitch_webhook_log` regardless of signature-valid or processing
outcome. Captures both legitimate deliveries and attack probes — a
forensics query "what did we actually receive from this IP last
Tuesday" now has the actual bytes to read, not just "webhook rejected:
invalid signature" in a grep-able log line.
Table shape (migration 984):
* `payload TEXT` — Hyperswitch sends JSON, TEXT is readable in psql
without base64-decoding. Invalid UTF-8 replaced with empty string
before INSERT (forensics value of a binary blob is zero vs. the
headers+ip+timing we keep regardless).
* `signature_valid BOOLEAN` — partial index on `WHERE
signature_valid = false` makes "show me attack attempts" queries
instantaneous.
* `processing_result TEXT` — 'ok', 'error: <msg>',
'signature_invalid', or 'skipped'. Matches the action semantic
exactly.
* `source_ip`, `user_agent`, `request_id` — forensics essentials.
request_id is captured from Hyperswitch's `X-Request-Id` header
if sent, else a UUID generated server-side so every row is
correlatable to VEZA's structured logs.
* `event_type` — best-effort extract from the JSON payload. NULL
when the payload isn't valid JSON or doesn't carry an event_type
field. Useful for "how many dispute.* events have we seen this
month" without needing a dispute handler implemented yet (the
log captures disputes alongside everything else, ready for
axis-1 P1.6 when it lands).
Hardening:
* 64KB body cap (via `io.LimitReader`) rejects oversize payloads
with 413 before any INSERT — prevents log-spam DoS.
* INSERT-once-at-end-with-final-state pattern: one row per
delivery, no two-phase update risk. Signature-invalid and
processing-error rows both land.
* DB persistence failures are logged but never fail the webhook
response — the endpoint's primary contract is acking Hyperswitch.
Retention sweep (CleanupHyperswitchWebhookLog in internal/jobs):
* Daily tick, batched DELETE (10k rows per batch with 100ms pause
between) so a large backlog doesn't lock the table.
* Retention configurable via
`HYPERSWITCH_WEBHOOK_LOG_RETENTION_DAYS` (default 90).
* Uses the same goroutine-ticker pattern as
ScheduleOrphanTracksCleanup / ScheduleSessionCleanup.
Tests:
* 5 tests in `internal/services/hyperswitch/webhook_log_test.go`:
minimal-field persistence, request_id auto-generation on empty
input, invalid-JSON leaves event_type empty, invalid-signature
rows are captured (forensics assert), extractEventType variants
(5 sub-cases).
* 4 tests in
`internal/jobs/cleanup_hyperswitch_webhook_log_test.go`:
deletes-older-than-retention, noop-when-nothing-expired,
default-retention-on-zero, context-cancellation-respected.
### Item D — Idempotency-Key on CreatePayment / CreateRefund
The Hyperswitch client now sends an `Idempotency-Key` HTTP header on
every outbound POST /payments and POST /refunds. The header value is
an explicit parameter at every call site — no context-carrier magic,
no auto-generation — so the contract is visible in every call and
impossible to forget (empty keys cause a loud error, not silent
header omission).
Key values:
* CreatePayment → `order.ID.String()` (UUID generated by GORM
BeforeCreate before the HTTP call).
* CreateRefund → `pendingRefund.ID.String()` (same pattern — UUID
populated by the Phase 1 tx.Create in RefundOrder, available and
stable for the Phase 2 PSP call).
Scope (load-bearing note for future readers):
`Idempotency-Key` covers HTTP-transport retry (TLS reconnect,
proxy retry, DNS flap) within a single CreatePayment /
CreateRefund invocation. It does NOT cover application-level
replay (user double-click, form double-submit, retry after crash
before DB write). That class of bug requires state-machine
preconditions on VEZA side — already addressed by the order
state machine + checkout handler guards (for payments) and the
partial UNIQUE on `refunds.hyperswitch_refund_id` landed in
v1.0.6.1 (for refunds).
Hyperswitch TTL on Idempotency-Key is typically 24h7d
server-side (verify against current PSP docs). Beyond TTL, a
retry with the same key is treated as a new request. Not a
concern at current volumes; document if retry logic ever extends
beyond 1 hour.
What stays unchanged: this commit does NOT add application-level
retry logic. The current "try once, fail loudly" behavior on PSP
errors is preserved. Adding retries is a separate design exercise
(backoff, max attempts, circuit breaker) explicitly out of scope
for item D.
Tests:
* Two httptest.Server-backed tests in client_test.go pin the
header value emitted for CreatePayment and CreateRefund, plus
two tests asserting empty keys cause a loud error.
* TestRefundOrder_OpensPendingRefund now pins the
`refund.ID.String() == lastIdempotencyKey` contract so a
future refactor that drops or reshapes the key fails the test.
* Four existing test mocks updated for the new signature.
Subscription's CreateSubscriptionPayment interface also takes a
payment provider but no implementation is wired in today (v1.0.6.2
noted this as the bypass surface, v1.0.7 item G is the full fix).
When item G lands its Hyperswitch-backed subscription provider,
it will need to thread the idempotency key through the same way —
noted in item G's acceptance in v107-plan.md.
### Item B — async Stripe Connect reversal worker
`reverseSellerAccounting` moved from synchronous "mark row reversed
locally without calling Stripe" to asynchronous "mark row
reversal_pending, let the worker reconcile out-of-band". Decouples
buyer-facing refund UX (completes immediately) from Stripe
settlement health (may retry, may 404 if already reversed, may
permanently fail and need ops attention).
State machine — single source of truth in
`internal/core/marketplace/transfer_transitions.go`:
pending → {completed, failed}
completed → {reversal_pending} (item B)
failed → {completed, permanently_failed}
reversal_pending → {reversed, permanently_failed} (item B)
reversed → {} (terminal)
permanently_failed → {} (terminal)
`SellerTransfer.TransitionStatus(tx, to, extras)` validates against
the matrix and performs a conditional UPDATE guarded by the
expected `from` (optimistic lock semantics — concurrent workers
racing on the same row find RowsAffected=0 and log a conflict).
`TestNoDirectTransferStatusMutation` greps the marketplace package
for raw `.Status = "..."` or
`Model(&SellerTransfer{}).Update("status"...)` outside a minimal
allowlist and fails if found; validated against an injected
violation during development.
StripeReversalWorker (`internal/core/marketplace/reversal_worker.go`):
* Tick interval: `REVERSAL_CHECK_INTERVAL` (default 1m).
* Batch limit 20 per tick, indexed on partial composite
`(status, next_retry_at) WHERE status='reversal_pending'`
(migration 982).
* Exponential backoff: `REVERSAL_BACKOFF_BASE` * 2^retry_count,
capped at `REVERSAL_BACKOFF_MAX` (defaults 1m and 1h).
* `REVERSAL_MAX_RETRIES` (default 5) transitions the row to
permanently_failed.
* Legacy rows with empty stripe_transfer_id → permanently_failed
immediately with a distinctive error_message, so ops can find
them via grep once the backfill CLI (task #38) lands.
Stripe error disambiguation (day 3 closure of the day-2 dead-code
gap):
* 404 + `resource_missing``ErrTransferNotFound` → worker
transitions to permanently_failed (data-integrity signal, never
retry — would amplify the inconsistency).
* 400 + message contains "already" + "reversal/reversed" →
`ErrTransferAlreadyReversed` → worker treats as success
(someone reversed out-of-band via Dashboard or another
instance; idempotent).
* Any other error is transient → retry with backoff.
* Sentinels live in `internal/core/connecterrors` as a leaf
package because marketplace and services both need them and
an import cycle (marketplace → monitoring → services) would
form if either owned them directly.
Migration `982` adds the partial composite index for the worker's
hot path. Migration `983` adds a CHECK constraint
(`status != 'reversal_pending' OR next_retry_at IS NOT NULL`) so
the invariant that every reversal_pending row carries a retry
timestamp is structural — a bug that ever writes NULL
next_retry_at on a reversal_pending row fails the INSERT/UPDATE at
the DB, not silently orphans the row.
Worker covers 9 unit-test cases plus 3 end-to-end scenarios
(refund → worker → reversed, including the invalid-stripe_transfer_id
terminal path). Integration smoke against local Postgres confirmed
migrations 981/982/983 apply cleanly.
Behavior change visible to tests: the refund.succeeded webhook now
leaves the seller_transfer at reversal_pending rather than reversed
directly. `TestProcessRefundWebhook_SucceededFinalizesState`
updated to assert the new expected state and the presence of
next_retry_at.
Worker wired in `cmd/api/main.go` alongside TransferRetryWorker,
sharing the same StripeConnectService instance. Gated on
`StripeConnectEnabled && StripeConnectSecretKey != ""` (same as
TransferRetryWorker) — in dev without Stripe configured, the
worker never starts.
### Notes
* `REVERSAL_*` env vars documented in `.env.template` so ops can
tune without source-diving.
* Anti-mutation test decision (grep-based rather than GORM
BeforeUpdate hook) forced a minor refactor of
`processSellerTransfers` to construct SellerTransfer rows in a
single struct literal rather than mutating Status in place
after construction. The refactor is neither clearer nor more
confusing than the original — borderline stylistic. Logged as
a post-v1.0.7 consideration: if the GORM hook approach proves
cleaner in axis 2 (state-machine transitions for other
entities), revisit and potentially retire the grep test in
favor of a hook.
* Item A unknown #2 (backfill coverage on historical transfers)
tracked as task #38; item B unknown: none surfaced during
implementation.
## [v1.0.6.2] - 2026-04-17
### Hotfix — subscription payment-gate bypass
Discovered during the 2026-04 audit probe (ops question Q2, "are paid
subscriptions actually gated server-side?"). An authenticated user could
POST `/api/v1/subscriptions/subscribe` with a paid plan and receive
HTTP 201 with `status=active` — with the payment provider never
invoked when `HYPERSWITCH_ENABLED=false` (or unset). The resulting
row satisfied `checkEligibility()` in the distribution service, which
returns `sub.Plan.HasDistribution || sub.Plan.CanSellOnMarketplace`.
The Creator plan carries `can_sell_on_marketplace=true`, so any user
could reach `/api/v1/distribution/submit` — a paid feature that
dispatches to external distribution partners — without paying.
Fix — `GetUserSubscription` now filters out active/trialing rows that
lack an effective payment linkage. "Effective" means: on a free plan,
or in an unexpired trial, or at least one attached invoice carries a
PSP payment intent (`hyperswitch_payment_id` non-empty). This is the
sole centralised gate; all paid-feature eligibility paths (distribution
and anything added later) route through it.
* `ErrSubscriptionNoPayment` added to `internal/core/subscription`.
`GetUserSubscription` returns it when a row sits in active/trialing
but fails the payment-effective predicate. Callers treat it as
ineligible (distribution returns `false, nil`; subscription HTTP
handlers return 404 "Active subscription" for cancel/reactivate/
billing-cycle paths; `GET /me/subscription` returns an explicit
`needs_payment=true` payload so honest-path users who landed here
via a broken flow get actionable information, not a misleading
"you're on free" or an opaque 500).
* `Subscribe` and `subscribeToFreePlan` also treat the new error as
"no existing active subscription" so a user can re-subscribe
cleanly once migration 980 has voided their fantôme row.
* `distribution.checkEligibility` propagates
`ErrSubscriptionNoPayment` instead of swallowing it as a generic
ineligible; the distribution handler surfaces a specific 403
message ("Your subscription is not linked to a payment.
Complete payment to enable distribution.") so an honest-path user
isn't told to "upgrade their plan" when they already have one.
* Migration `980_void_unpaid_subscriptions.sql` sweeps all
pre-v1.0.6.2 fantôme rows into `status='expired'`, capturing the
`(subscription_id, user_id, plan_id, previous_status)` tuple in a
dated audit table (`voided_subscriptions_20260417`) so support can
notify any honest-path user who landed there by mistake.
* Probe script `scripts/probes/subscription-unpaid-activation.sh`
kept as a versioned regression test. `--dry-run` lists plans;
`--destructive` logs in and attempts the exploit, cleaning up
after itself. Exit 0 = no bypass; exit 1 = bypass detected.
* Unit test `gate_test.go` covers the 8-branch matrix of the
`hasEffectivePayment` predicate (free pass, paid with/without
invoice, paid with empty vs populated `hyperswitch_payment_id`,
trial variants with future/past/nil `trial_end`, no row at all).
* `TODO(v1.0.7-item-G)` annotation on the `if s.paymentProvider !=
nil` short-circuit in `createNewSubscription` so the v1.0.7 work
that replaces it with a mandatory `pending_payment` state retains
the audit trail.
### Security
Closes a subscription-gate bypass affecting distribution eligibility.
Internal audit finding; no external report. Axis-1 correctness item
P1.7 will be reclassified to P0 and item G added to the v1.0.7 plan
in a follow-up commit.
## [v1.0.6.1] - 2026-04-17
### Hotfix — partial UNIQUE on refunds.hyperswitch_refund_id
Surfaced by the v1.0.6 refund smoke test (scenario S4, triggered after
S3 left a failed refund in its post-Phase-1 / pre-Phase-2 state): the
plain UNIQUE constraint from migration 978 rejected a second refund
attempt on a *different* order because both rows had
`hyperswitch_refund_id=''` (Go's zero-value string → empty string, not
NULL). Postgres treats two empty strings as colliding under a regular
UNIQUE; it only skips NULLs.
* Migration `979_refunds_unique_partial.sql` drops the original
constraint and replaces it with a partial UNIQUE that only
enforces uniqueness when `hyperswitch_refund_id IS NOT NULL AND
<> ''`.
* Preserves the load-bearing idempotency guarantee for successful
refunds (duplicate webhook lands on the same row because the PSP
refund_id is set).
* No Go code change — the model and service logic were already
correct; only the DB constraint shape needed fixing.
Smoke coverage that caught it + re-validates the fix:
* S1 happy path: refund + order + license + seller_transfer +
seller_balance all reconciled end-to-end
* S2 idempotent replay: succeeded_at + transfer.updated_at +
available_cents strictly unchanged across 2 webhook deliveries
(THE critical proof — duplicate Hyperswitch retries are no-ops
at the row level, not at the handler level)
* S3 PSP error rollback: order reverts to completed, refund
persisted as failed, no seller debit
* S4 webhook refund.failed: order reverts, license intact,
seller balance intact — **this is the scenario that surfaced
the bug**
* S5 double-submit: second POST returns 400
ErrRefundAlreadyRequested, only 1 refund row persisted
## [v1.0.6] - 2026-04-17
### Ergonomics + operational hardening — six items from the v1.0.5 backlog
Follow-up to the hardening sprint. v1.0.5 validated the
`register → verify → play` critical path end-to-end; v1.0.6 addresses the
next layer — the UX friction and operational blindspots that a first-day
public user (or a first-day on-call) would hit. Six targeted commits.
#### Fix 1 — Self-service creator role (`c32278dc1`)
New `POST /api/v1/users/me/upgrade-creator`. Verified users click a
"Become an artist" button in `/settings → Account` and their role flips
from `user` to `creator` on one conscious click — no KYC, no cooldown,
no admin round-trip. One-way by design (downgrade = support ticket) so
we don't have to handle the "my uploads orphaned" edge case.
* Gated strictly on `is_verified=true` (403 `EMAIL_NOT_VERIFIED`
otherwise).
* Idempotent 200 for anyone already creator-tier — no clutter.
* UPDATE scoped `WHERE role='user'` so a concurrent admin assignment
can't be silently overwritten.
* Audit trail: `user.upgrade_creator` action logged with the full
role transition metadata.
* Migration `977_users_promoted_to_creator_at.sql` adds a nullable
`promoted_to_creator_at TIMESTAMPTZ` column — distinguishes organic
self-promotions from admin-assigned roles for analytics.
* Tests: 6 Go (happy path, unverified, already-creator, admin
idempotent, 404, no-auth) + 7 Vitest (verified button, unverified
state, hidden for creator, hidden for admin, refetch on success,
idempotent message, server error toast).
#### Fix 2 — Upload size limits from a single source (`5848c2e40`)
The v1.0.5 audit flagged a "front 500MB vs back 100MB" mismatch. In
reality every live pair was aligned (tracks 100/100, cloud 500/500,
video 500/500) — the real architectural bug was **five duplicated
hardcoded values** that could drift silently as soon as anyone tuned
one.
* `internal/config/upload_limits.go`: `AudioLimit`, `ImageLimit`,
`VideoLimit` expose `Bytes()`, `MB()`, `HumanReadable()`,
`AllowedMIMEs`. Read lazily from env
(`MAX_UPLOAD_AUDIO_MB`, `MAX_UPLOAD_IMAGE_MB`,
`MAX_UPLOAD_VIDEO_MB`, defaults 100/10/500). Invalid/negative/zero
env values fall back to default.
* `track/service.go`, `track_upload_handler.go`,
`education_handler.go`, `upload.go:GetUploadLimits` all consume the
single source. Changing one env retunes every path.
* Frontend `useUploadLimits()` hook: react-query with 5 min stale,
30 min gc, 1 retry then optimistic fallback to baked-in defaults
so the dropzone stays responsive even without the network round
trip. `useUploadModal` replaces `MAX_FILE_SIZE` constant with the
live value; `UploadModal` forwards `audioMaxHuman` to
`UploadModalDropzone` so the label and error toast track the env.
* Out of scope (tracked for later): `CloudUploadModal.tsx` still
hardcodes 500MB — cloud uploads accept audio+zip+midi with a
different category semantic than the three in `/upload/limits`.
Unifying deserves its own design pass.
* Tests: 4 Go (defaults, env override, invalid env fallback, MIME
lists) + 4 Vitest (sync fallback, typed mapping, partial-payload
fallback per category, network failure keeps fallback).
#### Fix 3 — Unified SMTP env schema (`066144352`)
Two email services in-tree read *different* env vars for the same
fields — surfaced during the v1.0.5.1 hotfix:
internal/email/sender.go internal/services/email_service.go
SMTP_USERNAME SMTP_USER
SMTP_FROM FROM_EMAIL
SMTP_FROM_NAME FROM_NAME
v1.0.6 reconciles both onto canonical `SMTP_*` names, with a migration
fallback to the legacy names that logs a structured deprecation warning
(`remove_in: v1.1.0`).
* `internal/email/sender.go` is the single loader — both services
delegate to it via `LoadSMTPConfigFromEnvWithLogger(*zap.Logger)`.
Canonical wins over deprecated; no precedence surprise.
* `docker-compose.yml` backend-api env: `FROM_EMAIL` /
`FROM_NAME``SMTP_FROM` / `SMTP_FROM_NAME` to match the canonical
schema.
* `.env.template` trimmed — only canonical vars ship, old ones
removed (still accepted in running env for zero-downtime rollover).
* No default injected for Host/Port in the loader. `Host==""`
callers go log-only (matches historic dev behavior). Dev defaults
stay in `.env.template`, so prod fails fast instead of silently
dialing localhost.
* Tests: 5 Go (empty env, canonical direct, deprecated fallback
+ warning emission, canonical silently wins over deprecated, nil
logger allowed).
#### Fix 4 — Refund reverse-charge with idempotent webhook (`959031667`)
The structural one. Before v1.0.6, `RefundOrder` wrote `status='refunded'`
to the DB and called Hyperswitch synchronously, treating the API ack as
terminal. In reality Hyperswitch returns `pending` and only finalizes via
webhook. Customers could see "refunded" while their bank was still
uncredited, and the seller balance kept its credit even on successful
refunds.
* Two-phase flow:
1. **Open pending refund** (short row-locked tx): validate
permissions + 14-day window + double-submit guard; persist
`Refund{status=pending}`; flip order to `refund_pending` (not
`refunded` — that's the webhook's job).
2. **PSP call outside the tx**: `Provider.CreateRefund` returns
`(refund_id, status, err)`. On error, mark refund failed + roll
order back to `completed`. On success, capture the
`hyperswitch_refund_id` as the idempotency key — stay in
`pending` even if the sync status is "succeeded" (per customer
guidance: never trust the sync ack, always wait for the
webhook).
3. **`ProcessRefundWebhook`** drives terminal state. Row-lock +
`IsTerminal()` short-circuit: any duplicate Hyperswitch retry
is a no-op 200. On `refund.succeeded`: flip refund + order to
succeeded/refunded, revoke licenses, debit seller balance,
mark every `SellerTransfer` for the order as `reversed`.
* Migration `978_refunds_table.sql` with `UNIQUE(hyperswitch_refund_id)`
— this is the load-bearing idempotency guarantee.
* Webhook routing: `HyperswitchWebhookPayload.IsRefundEvent()`
dispatches `refund.*` events to `ProcessRefundWebhook`; payment
events keep flowing through the existing `ProcessPaymentWebhook`.
* `DebitSellerBalance` ported off Postgres-only `GREATEST()` to
portable `CASE WHEN`; the path wasn't exercised before v1.0.6, so
this is a quality fix not a regression.
* Partial refunds: signature carries `amount *int64` (nil = full) but
service call-site passes nil — full-only for v1.0.6. Partial-refund
UX is deferred to v1.0.7.
* Stripe Connect Transfers:reversal call flagged TODO(v1.0.7).
Internal balance + transfer-status are corrected here so buyer and
seller views match the moment the PSP confirms; the missing piece
is the money-movement round-trip at Stripe. Internal accounting is
consistent — external settlement catches up with v1.0.7.
* Tests: 15 Go cases covering Phase 1 (pending state, PSP error
rollback, double-submit, permissions, window), webhook
finalization (succeeded, failed, idempotent replay with
`succeeded_at` timestamp invariant, unknown refund_id, missing
refund_id, non-terminal ignored), and dispatcher logic (6
`IsRefundEvent` cases across flat/nested/event_type shapes).
#### Fix 5 — RTMP ingest health banner on Go Live (`64fa0c9ac`)
"Go Live" was silent when `nginx-rtmp` wasn't running. An artist could
copy the RTMP URL + stream key, fire OBS, and broadcast into the void
with no in-UI signal.
* `GET /api/v1/live/health` TCP-dials `NGINX_RTMP_ADDR` (default
`localhost:1935`), 2s timeout, 15s TTL cache protected by a mutex so
a burst of page loads can't hammer the ingest. Returns UI-safe
`error` string (no raw hostname leak) and `Cache-Control: private,
max-age=15` so browsers honor the same window.
* Unreachable path emits a WARN log so operators see the outage
before users do.
* Frontend `useLiveHealth()` hook: react-query 15s stale, 1 retry,
then optimistic `{ rtmpReachable: true }` — better to miss a banner
than flash a false negative on a transient health-endpoint blip.
* `LiveRtmpHealthBanner` at the top of `GoLivePage`: amber,
non-blocking, copy explicitly tells the artist the stream key is
still valid but broadcasting won't reach anyone, with a Retry
button that invalidates the health query.
* Tests: 3 Go (listener reachable + Cache-Control; dead port
unreachable + UI-safe error asserting no `127.0.0.1` leak; TTL
cache survives listener teardown) + 3 Vitest (hidden when
reachable, visible with Retry when unreachable, Retry invalidates
the right query key).
#### Fix 6 — RabbitMQ publish failures no longer silent (`bf688af35`)
`RabbitMQEventBus.Publish` returned the broker error but did not log
it. Callers that wrapped `Publish` in fire-and-forget
(`_ = eb.Publish(...)`) lost events with zero trace during RMQ outages.
* `Publish` now emits a structured ERROR on broker failure with the
exchange, routing_key, payload_bytes, content_type, and message_id
context. Function still returns the error so call-sites that
actually check it keep working.
* `EventBus disabled` warning kept but upgraded with `payload_bytes`
so dashboards can quantify drops when RMQ is intentionally off.
* Aligns the legacy `internal/eventbus` with `infrastructure/eventbus`
which already had this pattern.
* Tests: 2 Go (disabled bus emits WARN + returns
`EventBusUnavailableError`; nil logger stays panic-free for legacy
callers).
### Breaking changes
* `marketplace.MarketplaceService.RefundOrder` now returns
`(*Refund, error)` instead of `error`. Callers consuming the
service directly need to accept the pending refund row.
* `marketplace.refundProvider` internal interface: `Refund(...)
error` → `CreateRefund(...) (refundID, status string, err error)`.
`hyperswitch.Provider` implements both; external mocks must be
updated.
* Order status machine gains `refund_pending` as an intermediate
state. Clients reading `orders.status` should treat it as
"in-flight refund, don't show as refunded yet".
### Known gaps (parked for v1.0.7)
* Partial refunds — UX decision + call-site wiring
* Stripe Connect Transfers:reversal — actually move money back at
the PSP level (internal accounting is correct today)
* `CloudUploadModal.tsx` hardcoded 500MB — category semantic doesn't
map to the three exposed by `/upload/limits`
* Smoke test of refund flow against Hyperswitch sandbox (manual,
outside CI)
## [v1.0.5.1] - 2026-04-16
### Hotfix — dev SMTP ergonomics
Follow-up to the v1.0.5 smoke test: a fresh clone + `cp .env.template .env`
+ `make dev-full` produced a backend with `SMTP_HOST=""`, which silently
short-circuits `EmailService.sendEmail` to a log-only path. New
contributors hit register → "where's my verification email?" and had no
obvious cue that the SMTP hookup was missing.
- `veza-backend-api/.env.template`: `SMTP_HOST` / `SMTP_PORT` now default
to the MailHog instance that ships with `make infra-up-dev`
(`localhost:1025`, UI on `:8025`). `FROM_EMAIL` / `FROM_NAME` seeded
with local-safe values. Comment rewritten to point at both the dev
path and the prod override.
- Also exports the duplicate variable names (`SMTP_USERNAME`, `SMTP_FROM`,
`SMTP_FROM_NAME`) read by `internal/email/sender.go` — a TODO flagged
for v1.0.6 to reconcile the two email services onto a single env
schema. Until then both sets cover every code path.
No code change, no migration, no version bump in the Go module. Pure
config hotfix.
## [v1.0.5] - 2026-04-16
### Hardening sprint — seven critical-path fixes before public opening
Audit follow-up on the `register → verify → play` critical path. The app was
functional on the surface but broken underneath: the player was silent, emails
weren't really sent, the marketplace gave products away in production, the
chat silently de-synced across pods, maintenance mode was per-pod only,
orphaned tracks accumulated forever in `processing`, and the response cache
was corrupting range-aware media responses. Seven targeted fixes, each with
its own commit, its own tests, and no behaviour change outside scope.
#### Fix 1 — Player muet (`veza-backend-api` + `apps/web`)
- New `GET /api/v1/tracks/:id/stream` handler in
`internal/core/track/track_hls_handler.go`. Serves the raw file via
`http.ServeContent``Range`, `If-Modified-Since` and `If-None-Match`
handled for free, so `<audio>` seek works end-to-end.
- Route registered in `routes_tracks.go` **unconditionally** (outside the
`HLSEnabled` gate) with `OptionalAuth` so both anonymous and authenticated
users can stream, and the `share_token` query path keeps working.
- Frontend flag `FEATURES.HLS_STREAMING` default flipped from `true` to
`false` to match the backend's `HLS_STREAMING` default. The mismatch was
the root cause: hls.js was attaching to a 404 manifest and leaving the
audio element silent.
- All playback URL builders (`feedService`, `discoverService`,
`playerService`, `PlayerQueue`, `SharedPlaylistPage`, `TrackSearchResults`,
`useLibraryManager`, `useTrackDetailPage`) redirected from `/download` to
`/stream`. `/download` remains for explicit downloads.
- `useHLSPlayer` — when hls.js emits a fatal non-media error (manifest 404,
all network retries exhausted), the hook now destroys hls.js and swaps
the audio element onto `/api/v1/tracks/:id/stream` so operators turning
HLS on via feature flag don't re-break the player.
- Tests: 6 Go unit tests covering invalid UUID, missing track, private-track
forbidden, missing file, full body stream, and `206 Partial Content` with
`Range: bytes=10-19`. MSW handler and `playerService.test.ts` assertion
updated.
#### Fix 2 — Email verify bidon (`veza-backend-api` + `docker-compose.*`)
- `core/auth/service.go`: the hard-coded `IsVerified: true` on registration
is gone. New users start as `is_verified=false` and the existing
`/auth/verify-email` endpoint (unchanged) flips them once they click the
link. `TestLogin_EmailNotVerified` now asserts the correct `403`
behaviour instead of silently accepting unverified logins.
- Registration actually calls `emailService.SendVerificationEmail(...)`
(previously the code just `logger.Info("Sending verification email")`
without sending). On SMTP failure, the handler returns `500` in
production (fail-loud) and logs a warning in development so local
sign-ups keep flowing. Same treatment on
`password_reset_handler.RequestPasswordReset` — the log-only "don't fail
the user message" path is gone in prod.
- New helper `isProductionEnv()` centralises the
`APP_ENV=="production"` check in both `core/auth` and `handlers`.
- `docker-compose.yml` and `docker-compose.dev.yml` now ship MailHog
(`mailhog/mailhog:v1.0.1`, SMTP 1025, UI 8025). Backend dev env var
`SMTP_HOST=mailhog SMTP_PORT=1025` pre-wired.
- Tests: all six `auth` tests adapted to the new async flow
(`expectRegister` adds a `SendVerificationEmail` mock, `Login_Success`
tests manually flip `is_verified` after `Register` to simulate the click
on the verification link).
#### Fix 3 — Marketplace gratuit (`internal/config/config.go`)
- `ValidateForEnvironment` now refuses `APP_ENV=production` with
`HYPERSWITCH_ENABLED=false`. Without payments enabled, the marketplace
flow completes orders as `CREATED` and releases files without charging —
effectively free. The guard is loud ("...effectively giving away
products. Set HYPERSWITCH_ENABLED=true...") because a silent misconfig
here is a revenue leak.
- Called at boot from `NewConfig()` line 513 — config validation happens
before any HTTP listener starts, so a bad prod config fails fast.
- Tests: 3 new cases (`_fails`, `_succeeds`, `non-production is
unaffected`) in `validation_test.go`.
#### Fix 4 — Redis obligatoire multi-pod (`config.go` + `chat_pubsub.go`)
- Same `ValidateForEnvironment` now requires `REDIS_URL` to be
**explicitly** set in production. The struct field has a default
(`redis://<appDomain>:6379`) that let misconfigured pods boot against
a phantom host and silently degrade to in-memory PubSub — which is
fine on one pod and catastrophic on two (chat messages on pod A never
reach subscribers on pod B).
- `ChatPubSubService` constructor now emits `ERROR` (was silent) when
`redisClient` is nil, with a message explicitly naming the failure
mode: "cross-instance messages will be lost". Same treatment for
`Publish` fallbacks — `Warn``Error`, because runbook-worthy.
- Tests: `chat_pubsub_test.go` added (constructor log assertion +
in-memory fan-out happy path) plus 1 new case in `validation_test.go`.
#### Fix 5 — Maintenance mode persisté en DB (`middleware/maintenance.go`)
- Migration `976_platform_settings.sql` introduces a typed key/value
table and seeds `maintenance_mode=false`. Column split into
`value_bool` / `value_text` so we avoid string parsing in the hot
path.
- `middleware/maintenance.go` rewritten. `InitMaintenanceMode(db,
logger)` wires a DB pool at boot; `MaintenanceModeEnabled()` reads
from a 10-second TTL cache and refreshes lazily on the next request.
Toggling on one pod propagates to every pod within ~10 s.
- Admin endpoint `PUT /api/v1/admin/maintenance` now persists via
`INSERT ... ON CONFLICT DO UPDATE` before calling the in-memory
setter, so the change survives restarts and is visible cluster-wide.
- Tests: new `TestMaintenanceGin_DBBacked` flips the DB row, waits
past TTL, and asserts the cache picked up the change. Existing
tests preserved.
#### Fix 6 — Cleanup tracks orphelines (`internal/jobs/`)
- New `CleanupOrphanTracks` worker. Tracks stuck in `processing` for
more than one hour with no file on disk (uploader crashed, container
restart during upload, disk wipe) flip to `status=failed` with
`status_message = "orphan cleanup: file missing on disk after >1h in
processing"`. Never deletes the row, never touches present files or
already-failed rows, safe to re-run.
- `ScheduleOrphanTracksCleanup(db, logger)` runs once at boot and then
hourly — wired in `cmd/api/main.go` alongside the HTTP listener.
- Tests: 5 cases in `cleanup_orphan_tracks_test.go` covering the happy
path and four negatives (file still present, track too recent, already
failed, nil database).
#### Fix 7 — Response cache corrupting binary media (`middleware/response_cache.go`)
Surfaced by the v1.0.5 browser smoke test. `ResponseCache` captures the
entire body into a `bytes.Buffer`, JSON-serialises it (escaping non-UTF-8
bytes) and replays via `c.Data` for subsequent hits. For `/stream`,
`/download` and `/hls/*` this had two failure modes:
1. `Range` headers were never honoured — the cache replayed the full
body on every request, stripped `Accept-Ranges`, and left the
`<audio>` element unable to seek. A `Range: bytes=100-299` request
got back `200 OK` with 48 944 bytes instead of `206` with 200.
2. Non-UTF-8 bytes got escaped through the JSON round-trip
(`\uFFFD` substitution etc.), corrupting the MP3 payload so even
full plays could fail mid-stream (served body MD5 diverged from
the source file).
Fix: skip the cache entirely for any path containing `/stream`,
`/download` or `/hls/`, and for any request carrying a `Range` header
(belt-and-suspenders for any future media endpoint). All other
anonymous GETs keep their 5-minute TTL.
Live verification after patch:
- Full GET: `200 OK`, `Accept-Ranges: bytes`, `Content-Length: 48944`,
served body MD5 matches source file byte-for-byte.
- Range `100-299`: `206 Partial Content`,
`Content-Range: bytes 100-299/48944`, exactly 200 bytes.
- Browser `<audio>.play()` succeeds, `currentTime` progresses,
`seek(1.5)` accepted (`readyState=4`, no error).
### Production guards summary
`config.go:886 Validate()` (base) + `config.go:810 ValidateForEnvironment()`
(per-env) — the prod branch now rejects boot if any of:
- `CORS_ALLOWED_ORIGINS` missing or contains `*`
- `LOG_LEVEL=DEBUG`
- `CLAMAV_REQUIRED != true`
- `CHAT_JWT_SECRET == JWT_SECRET`
- `OAUTH_ENCRYPTION_KEY` shorter than 32 bytes
- `JWT_ISSUER` / `JWT_AUDIENCE` empty
- **`HYPERSWITCH_ENABLED != true`** (new)
- **`REDIS_URL` not explicitly set** (new)
### Known gaps (parked for v1.0.6)
- Hyperswitch refund path doesn't propagate to PSP
- Livestream has no UI feedback when `nginx-rtmp` is down
- Upload size mismatch (front 500 MB, back 100 MB)
- RabbitMQ silent drop on enqueue failure
- `SMTP_HOST` not injected in `make dev` (host-mode ergonomics, not a
code bug — the SMTP_HOST env is only wired into the `docker-dev`
profile where the backend runs in-container)
- Upload route gated by `creator` role with no self-service path to
the role — new users can't upload without manual DB escalation
## [v1.0.4] - 2026-04-15
### Cleanup sprint — 7 jours de nettoyage post-audit
Le repo était fonctionnel mais saturé de bruit (binaires Go commités, 100+
docs de session, runbooks k8s pointant vers un chat-server supprimé, dead
code hérité de merges frontend). Ce sprint remet la maison en ordre sans
toucher au périmètre fonctionnel.
#### Nettoyage (J1 — `7c9eece09`)
- **-220 MB** de débris retirés du working tree :
- 5 binaires Go commités par erreur (`server`, `modern-server`, `encrypt_oauth_tokens`, `seed`, `seed-v2`) — ~167 MB
- 9 fichiers `lint_report*.json` (~32 MB)
- 70 fichiers `coverage*.out`, 54 PNG `audit-*`, 3 `.bak` de tests
- 9 scripts obsolètes de l'ère MVP (Jan 2026, hardcoded v0.101)
- **174 `.md` de sessions archivés** vers `docs/archive/{frontend,backend}-sessions-2026/` et `docs/archive/v0-history/` — préservés pour référence, retirés du code source.
- `.gitignore` mis à jour pour prévenir la récurrence.
#### Documentation alignée (J2 — `172ff497b`)
- **`CLAUDE.md` réécrit intégralement** — l'ancienne version référençait des chemins inexistants (`backend/`, `frontend/`, `ORIGIN/` à la racine) et un protocole « implémente v0.11.0 » alors que le projet est à v1.0.x depuis mars. Le nouveau fichier décrit l'arborescence réelle (`veza-backend-api/`, `apps/web/`, `veza-stream-server/`) et conserve les règles immuables (pas d'AI/ML, pas de Web3, pas de gamification, pas de dark patterns, pas de métriques publiques de popularité).
- **`README.md`** : `v0.101`/`v0.9.3` → `v1.0.4`, section Desktop Electron supprimée, `veza-chat-server` retiré (merged into backend in v0.502).
- **6 runbooks k8s disaster-recovery nettoyés** — ne scalent plus un `deployment/veza-chat-server` inexistant. Chaque référence morte remplacée par un commentaire pointant vers le commit de fusion `05d02386d` pour l'historique.
- `k8s/secrets.yaml.example`, `k8s/secrets/README.md`, `.env.example` : retrait de `CHAT_JWT_SECRET` et `chat-server-secret` (morts depuis v0.502).
#### Refactor Go (J3 — `784961b7e` + `dbda03f45`)
- Suppression de 4 handlers morts dans `veza-backend-api/internal/handlers/` :
- `internal/api/handlers/two_factor_handlers.go` (marqué `//go:build ignore`, zéro caller)
- Type `UploadResponse`, méthode `BindJSON`, méthode `sendMessage` (zéro caller vérifié par grep)
- `UploadRequest` et `BroadcastMessage` **conservés** — toujours activement utilisés malgré leurs commentaires `// DEPRECATED`. Refactor hors scope cleanup.
- `seed-v2` : n'existait que comme binaire (déjà purgé J1), aucune source Go.
#### Fix RGPD hard-delete (J4 — `ebb28c77a`)
- **TODO(HIGH-007) résolu**. Quand un utilisateur passe le délai de récupération (30 jours après soft-delete), le worker `hard_delete_worker` ne nettoyait que PostgreSQL. Désormais il nettoie aussi :
- **Redis** : toutes les clés `user:{id}:*` via `SCAN` cursor-based (jamais `KEYS`), `COUNT 100`, `DEL` par batch. Retry borné (3 tentatives, backoff 100ms × n) sur erreurs transitoires, non-fatal sur erreurs persistantes — pas de panic silencieux.
- **Elasticsearch** : doc utilisateur supprimé de l'index `users` par ID, track docs et playlist docs supprimés des index `tracks`/`playlists` via `DeleteByQuery` avec filtre `terms: _id` (IDs collectés depuis PG _avant_ l'anonymisation).
- Injection optionnelle des clients Redis et ES dans le worker via `WithRedis()` / `WithElasticsearch()`. Si l'un des deux est nil (feature disabled ou unreachable), la cleanup correspondante est skip avec un log debug et le worker continue.
- **Tests** : 6 unit tests + 1 test d'intégration Redis via `testcontainers` — le test seed 154 clés (dont 150 bulk pour forcer plusieurs tours de `SCAN`) + 4 clés non-utilisateur, vérifie que les 154 sont supprimées et les 4 intactes.
#### Petits cleanups (J5 — `edc851af6`)
- **GeoIP** (`geoip_service.go`) : TODO supprimé, remplacé par un commentaire expliquant le report à v1.1.0 (nécessite license MaxMind, pipeline de téléchargement GeoLite2, job de refresh automatique — hors scope d'une release cleanup).
- **`v2-v3-types.ts` → `domain.ts`** : renommage d'un fichier historique de fusion frontend. Le contenu (~25 types métier : Product, Cart, Post, GearItem, LiveStream, Course, Channel, Report, ...) reste, les types sont toujours activement utilisés via `@/types`. Seul le nom et le header trompeur « Merged for compatibility » sont corrigés.
- **Storybook** : désactivation CI confirmée (3 workflows en `.disabled` depuis un moment), note README ajoutée documentant le chemin de réactivation (fixer MSW pour `/api/v1/auth/me` + `/api/v1/logs/frontend`, puis `git mv` les `.disabled` en `.yml`).
- **`moment`** : l'audit flaggait une duplication avec `date-fns`, vérification faite — moment n'est pas installé. `date-fns@4.1.0` est la lib unique. No-op.
#### Infra CI (J6 — `091583b3d` + `a9394a4a0` + `8f15bb136`)
- **3 fichiers `docker-compose.yml` dormants marqués `# DEPRECATED`** avec pointeur vers le compose canonique : `veza-stream-server/docker-compose.yml`, `infra/docker-compose.lab.yml`, `config/docker/docker-compose.local.yml`. Renommés en `.disabled` pour suppression ultérieure après période de grâce.
- **`.lintstagedrc.json`** : fix d'un bug pré-existant où la règle `apps/web/**/*.{ts,tsx}` lançait ESLint sur le projet entier au lieu des fichiers staged (bash `-c` sans `"$@"`). Cause racine des `--no-verify` utilisés en J1 et J5.
- **`.github/workflows/backend-ci.yml` → `.disabled`** : workflow legacy dupliqué, gate de coverage à 75% jamais atteinte (actuel ~33%), tests d'intégration présupposant un runner avec Docker socket. Le workflow consolidé `ci.yml` couvre déjà la même surface en mieux. Ferme la consolidation CI entamée dans `e949e2d79`.
- **`testutils.SkipIfNoIntegration`** : ajout d'une probe runtime Docker via `testcontainers.NewDockerProvider()` mémoizée avec `sync.Once`. Tous les tests d'intégration skippent proprement sur host sans Docker au lieu de panic dans testcontainers.
#### Sécurité deps Go (fix govulncheck)
- `golang.org/x/image` v0.36.0 → v0.38.0 (GO-2026-4815)
- `github.com/quic-go/quic-go` v0.54.0 → v0.57.0 (GO-2025-4233)
- `github.com/testcontainers/testcontainers-go` v0.33.0 → v0.42.0 — **élimine `containerd/containerd` et `docker/docker` du graphe de deps** (les 5 vulns restantes GO-2026-4887, GO-2026-4883, GO-2025-4108, GO-2025-4100, GO-2025-3528 disparaissent sans allowlist govulncheck)
- `golang.org/x/net` v0.50.0 → v0.51.0 (GO-2026-4559)
- `go.work` bumpé à `go 1.25.0` pour matcher le `veza-backend-api/go.mod` bumped dans `24af2f72b`.
#### Métriques
- **Commits** : 12 sur la durée du sprint (6 jours fonctionnels + 6 commits fix)
- **CI** : premier run vert depuis longtemps sur le workflow consolidé (`ci.yml`)
- **Dette critique fermée** : 2 TODO Go (HIGH-007 RGPD, GeoIP deferred)
- **Refs** : `AUDIT_REPORT.md` à la racine contient l'audit complet qui a généré ce sprint
---
## [v1.0.2] - 2026-03-03
### Conformité V1_SIGNOFF
- **Couverture tests** : script `veza-backend-api/scripts/coverage_report.sh` créé; couverture Go 39% mesurée; Vitest thresholds frontend ajustés à 50%
- **Load tests WebSocket** : CHAT_ORIGIN corrigé vers backend (ws://localhost:8080), WS_URL=/api/v1/ws dans loadtests/config.js, stress_1000ws.js, websocket.js
- **Tests** : chat_service_test (WSUrl /api/v1/ws), password_service_integration_test (hash token, expired token)
- **Documentation** : docs/PERFORMANCE_BASELINE.md section Résultats v1.0.2; docs/RGPD_CCPA_VERIFICATION.md, docs/PWA_OFFLINE_VERIFICATION.md tables v1.0.2; docs/V1_SIGNOFF.md checklist complète (14 PASS, 7 N/A)
- **Runbooks et Grafana** : validés (dashboards JSON, alertes Prometheus)
- **Secrets** : aucun en dur; docs/runbooks/SECRET_ROTATION.md confirmé
---
## [v1.0.1] - 2026-03-03
### Sécurité
- **npm** : basic-ftp, minimatch, rollup — corrigés via `npm audit fix --legacy-peer-deps` (0 CRITICAL)
- **Rust** : bytes 1.11.1, time 0.3.47, tungstenite (axum-tungstenite supprimé), idna (validator 0.19), protobuf (prometheus 0.14) — rsa/slice-ring-buffer documentés (pas de fix)
- **veza-stream-server** : prometheus 0.14, validator 0.19, axum-tungstenite retiré
### Tests & API
- OpenAPI : annotation `@Param id` corrigée pour `/tracks/quota/{id}` — swagger-cli validate OK
- Tests paiement E2E : TestPaymentFlow_E2E_CartCheckoutWebhook passe
- OAuth : DATABASE_URL fallback dans GetTestContainerDB pour tests sans testcontainer
### Documentation
- docs/SECURITY_SCAN_RC1.md : état npm, cargo audit, procédure Trivy
- docs/V1_SIGNOFF.md : critères 1, 2, 3, 11, 12, 15, 16 mis à jour
- docs/PROJECT_STATE.md : v1.0.0, phase conforme ROADMAP
- migrations/000_mark_consolidated.sql : marqueur pour bases existantes
---
## [v1.0.0] - 2026-03-03
### Release commerciale
Première release commerciale de la plateforme Veza. Voir [RELEASE_NOTES_V1.md](RELEASE_NOTES_V1.md) pour le résumé complet des fonctionnalités et corrections depuis v0.803.
- Corrections de sécurité (OAuth, webhooks, rate limit, TokenBlacklist)
- Tests E2E auth et paiement
- Pagination cursor-based, load tests, monitoring
- Runbooks opérationnels, RGPD/CCPA vérifiés
- PWA offline, Lighthouse ≥ 90
---
## [v0.992] - 2026-03-03 (RC2)
### Added
- RELEASE_NOTES_V1.md: release notes complètes v0.803 → v1.0.0
### Changed
- docs/V1_SIGNOFF.md: sign-off RC2 validé
- VERSION: 0.992
---
## [v0.991] - 2026-03-03 (RC1)
### Added
- docs/V1_SIGNOFF.md: v1.0.0 release checklist (21 critères)
- docs/SECURITY_SCAN_RC1.md: security scan results (govulncheck, npm audit)
- Branche release/v1.0.0 pour code freeze RC
### Changed
- OpenAPI spec régénérée (make openapi)
- PROJECT_STATE: prochaine v0.992 RC2
---
## [v0.982] - 2026-03-03
### Added
- docs/PERFORMANCE_BASELINE.md: section Lighthouse v0.982 (objectifs ≥90)
- docs/PWA_OFFLINE_VERIFICATION.md: checklist vérification mode offline
- docs/RGPD_CCPA_VERIFICATION.md: checklist export, suppression, opt-out
### Fixed
- docs/BUG_BASH_V0981.md: critères release validés (0 P1/P2 ouverts)
### Changed
- VERSION: 0.982
---
## [v0.981] - 2026-03-02
### Added
- .env.staging.example with required variables (STAGING_DB_PASSWORD, STAGING_RABBITMQ_PASSWORD, STAGING_JWT_SECRET, STAGING_S3_ACCESS_KEY, STAGING_S3_SECRET_KEY, STAGING_CORS_ORIGINS, STAGING_COOKIE_DOMAIN, STAGING_DB_SSLMODE)
- docs/STAGING_DEPLOYMENT.md: step-by-step staging deployment guide
- docs/SMOKE_TEST_V0981.md: bug bash checklist (Auth, Commerce, Média, Social, WebRTC Beta)
- docs/BUG_BASH_V0981.md: template for tracking bugs (P1/P2/P3)
### Fixed
- docker-compose.staging.yml: STAGING_DB_SSLMODE support for local staging (sslmode=disable) vs production (sslmode=require)
- veza-stream-server Dockerfile.production: invalid COPY migrations syntax (2>/dev/null removed)
### Changed
- PROJECT_STATE: dernier tag v0.981, prochaine v0.982
---
## [v0.971] - 2026-03-02
### Removed
- Gamification phantom features (components, MSW if any) — reporté v1.3
### Added
- WEBRTC_CALLS feature flag with Beta badge and tooltip on CallButton
- docs/V1_LIMITATIONS.md
- docs/API_VERSIONING_POLICY.md
- Migration 936: WEBRTC_CALLS flag
- GET /api/v1/feature-flags for authenticated users (client-visible flags)
- adminService.getClientFeatureFlags()
- MSW handler for GET /api/v1/feature-flags
### Changed
- CallButton: Badge "Beta", tooltip "Fonctionne mieux sur le même réseau local"
- ChatRoom: fetches WEBRTC_CALLS, shows CallButton only when enabled
- FEATURE_STATUS: Gamification → Abandonné v1.0, WebRTC → Beta LAN only
- PROJECT_STATE: dernier tag v0.971, prochaine v0.981
---
## [v0.951] - 2026-03-02
### Added
- Load test: stress_500rps.js — 500 VUs on login, tracks, search, products (P99 < 500ms target)
- Load test: stress_1000ws.js 1000 WebSocket concurrent, 5 min hold
- Load test: uploads.js 50 VUs, setup() creates users when AUTH_TOKEN absent
- Migration 940: products indexes (status+created_at, seller_id+status)
- docs/PERFORMANCE_BASELINE.md: v0.951 scripts and targets
- loadtests/README.md: stress_500rps, stress_1000ws, 50 uploads sections
### Changed
- loadtests/backend/uploads.js: stages 1m50 VUs, 2m hold; setup() for token creation
---
## [v0.803] - 2026-02-25
### Added
- Audit middleware: auto-log POST/PUT/DELETE to AuditService (skip /health, /metrics, /swagger)
- CCPA compliance: Sec-GPC header support, POST /users/me/privacy/opt-out
- Account deletion hardening: anonymization (deleted-{uuid}), S3 cleanup, session revocation, audit log
- Moderation queue: migration reports, model, service, handler; GET /admin/reports, POST /admin/reports/:id/resolve, POST /reports (user report)
- Maintenance mode: middleware 503 when enabled, PUT/GET /admin/maintenance (admin toggle)
- Announcements: migration, model, service, handler; GET /announcements/active (public), GET/POST/DELETE /admin/announcements
- Feature flags: migration, model, service, handler; GET /admin/feature-flags, PUT /admin/feature-flags/:name
- Frontend: AdminSettingsView connected to maintenance, announcements, feature flags; AdminModerationView to real reports API
- AnnouncementBanner: global banner fetching GET /announcements/active, integrated in DashboardLayout
- MSW handlers: reports, announcements, feature flags, maintenance
- Swagger annotations: privacy opt-out, account deletion
- Unit tests: CCPA, reports, announcements, feature flags handlers
- DDoS rate limiting (SEC1-04): global 1000 req/s, per-IP 100 req/s, Redis sliding window 1s
- AdminSettingsView: SETTINGS tab in AdminDashboardView (announcements, feature flags, maintenance)
### Changed
- AdminSettingsView: local state replaced by API calls for maintenance, feature flags, announcements
- AdminModerationView: mock replaced by GET /admin/reports, resolve via POST; actions aligned to dismiss/warn/ban
---
## [v0.802] - 2026-02-25
### Added
- Cloud: file versioning (create, list, restore), sharing (create share link, get shared file)
- Cloud: GDPR data export (POST /users/me/export, async ZIP, 202 Accepted)
- Cloud: automatic backup cron (24h, copies files to S3 backup prefix)
- Upload: batch upload with parallel queue (BatchUploader component)
- Tags: GET /tags/suggest for autocomplete (prefix match, frequency order)
- Tags: audio/aiff, audio/x-aiff MIME types
- Gear: documents CRUD (upload PDF, list, delete)
- Gear: repairs CRUD (repair history with date, cost, provider)
- Gear: warranty_start, warranty_notes on gear_items
- Gear: warranty notifier (24h ticker, notifications when warranty expires in 30 days)
- Frontend: CloudFileVersions, CloudShareModal, Versions/Share buttons in CloudFileList
- Frontend: GearDetailModal tabs (Documents, Repairs), warranty badge
- MSW handlers: cloud versions/share, gear documents/repairs, tags suggest
- Backend unit tests: TagSuggestService, GearWarrantyNotifier
- Storybook: CloudFileVersions (WithVersions, Empty, Loading), CloudShareModal, GearDetailModal (WithDocuments, WithRepairs, WarrantyExpiring), GearCard (WarrantyExpiringSoon)
### Changed
- gear_document_service: sanitizeGearFilename to avoid conflict with cloud sanitizeFilename
---
## [v0.801] - 2026-02-25
### Added
- User preferences: migration 118 user_preferences table with appearance fields (contrast, density, accent_hue, font_size)
- PUT /users/me/preferences: persist theme, contrast, density, accentHue, fontSize
- High contrast mode: WCAG AA compliant palette (data-contrast="high")
- Density modes: compact (reduced spacing ~25%), comfortable
- Accent color: customizable hue via ThemeProvider, 5 presets in Settings
- Font size: adjustable 1420px via slider, CSS variable --sumi-font-size-base
- useReducedMotion hook: prefers-reduced-motion media query
- useWakeLock hook: Screen Wake Lock for background playback on mobile
- PWA: re-enabled service worker with safe caching (JS/CSS never cached)
- Install App button in Settings when PWA is installable
- ARIA: aria-haspopup="menu" on dropdowns, aria-label on icon buttons (sidebar, player, modals)
- Focus visible: 2px outline with offset on :focus-visible
### Changed
- ThemeProvider extended with contrast, density, accentHue, fontSize
- AppearanceSettingsView wired to ThemeProvider and backend sync
- Service worker: network-only for /assets/ and .js/.css, cache images/fonts only
---
## [v0.703] - 2026-02-25
### Added
- Go Live: page /live/go-live with stream key display, OBS/Streamlabs instructions
- Stream key management: GET /live/streams/me/key, POST /live/streams/me/key/regenerate
- GET /live/streams/me: list user's streams (includes stream_key)
- PUT /live/streams/:id: update stream metadata (ownership check)
- Live stream chat: LiveViewChat connected to WebSocket (stream_id as room)
- Real-time viewer count: polling GET /live/streams/:id in LiveViewPlayer
- Media Session API: seekbackward/seekforward handlers (10s step)
- Room creation for live streams: chat room auto-created when stream is created
- Permissions: CanJoin/CanSend/CanRead allow public access for live stream rooms
- GoLiveView.stories.tsx: Default, Loading, Error, StreamKeyVisible
### Changed
- Navbar "Go Live" navigates to /live/go-live instead of toast
- useLiveView uses real chat when authenticated, mock when not
---
## [v0.702] - 2026-02-24
### Added
- Route /marketplace/products/:id with ProductDetailPage (lazy loaded)
- MSW handlers for product reviews (GET list, POST create) and invoice download
- Unit tests: product reviews (6 tests), invoice generation (4 tests), refund order (6 tests)
- API_REFERENCE.md: documented reviews, invoices, refunds endpoints
### Changed
- ProductDetailView.stories.tsx: added Error state story
---
## [v0.701] - 2026-02-23
### Added
- Transfer Retry Worker: automatic retry of failed Stripe Connect transfers (exponential backoff, max 3 retries)
- Migration 116: retry_count, next_retry_at columns on seller_transfers
- GET /admin/transfers: paginated admin view of all platform transfers (filters: status, seller, date)
- POST /admin/transfers/:id/retry: manual retry of failed transfers (admin only)
- AdminTransfersView frontend component with status badges, retry button
- GET /health/deep: deep health check (DB, Redis, S3, disk, config summary)
- Startup config validation: PlatformFeeRate range, Stripe Connect coherence, retry config
- Prometheus metrics: transfer retry (total, success, failures, permanent)
- docs/API_REFERENCE.md: API documentation with curl examples
### Changed
- SellerTransfer model: added retry_count, next_retry_at fields
- Config: added TRANSFER_RETRY_ENABLED, TRANSFER_RETRY_MAX, TRANSFER_RETRY_INTERVAL
- Health handler: added DeepHealth method with disk space and config
---
## [v0.603] - 2026-02-23
### Added
- Automatic Stripe Connect transfer after successful payment (webhook Hyperswitch)
- Platform commission configurable via PLATFORM_FEE_RATE (default 10 %)
- Migration 115 seller_transfers for transfer tracking
- GET /sell/transfers endpoint for seller transfer history
- Transfer History card in SellerDashboard
- Unit tests: transfer success, multi-seller, transfer-fails
### Changed
- ProcessPaymentWebhook now triggers seller transfers after license creation
- PAYOUT_MANUAL.md updated for automatic transfer flow
- Pre-v0.501 docs archived to docs/archive/
---
## [v0.602] - 2026-02-23
### Added
- Stripe Connect seller payout (onboarding, balance)
- seller_stripe_accounts migration (114)
- Commerce E2E tests (backend integration: product -> order -> review -> invoice)
- docs/SMOKE_TEST_V0602.md
- docs/PAYOUT_MANUAL.md (manual payout procedure for v0.603)
### Changed
- interceptors.ts split: auth.ts and error.ts extracted (facade < 30 LOC)
- Grafana dashboards enriched with real Prometheus metrics
- sanitizer.go: fix invalid regex backreference for object/embed tags (Go regexp has no \1)
### Infrastructure
- Commerce Prometheus metrics (orders_total, checkout_duration)
---
## [v0.601] - 2026-02-23
### Added
- Blue-green deployment via HAProxy (backend-api-blue/green, stream-server-blue/green, deploy-blue-green.sh)
- 3 Grafana dashboards: api-overview, chat-overview, commerce-overview
- Alertmanager config with Slack/email receivers, wired to Prometheus
- Hyperswitch LIVE_MODE configuration (HYPERSWITCH_LIVE_MODE env)
- OAuth Discord and Spotify unit tests (GetAuthURL, GetUserInfo, GetAvailableProviders)
- docs/MIGRATIONS.md documenting squash script and baseline procedure
### Changed
- handler.go split into 4 sub-handlers: track_crud_handler, track_social_handler, track_search_handler, track_analytics_handler (~163 LOC facade)
- interceptors.ts split into modules: interceptors/utils, interceptors/request, interceptors/response
- squash_migrations.sh: baseline_v0601.sql, migrations 001-113, output to file
### Infrastructure
- docker-compose.prod.yml: blue-green services, Alertmanager (port 9093)
- config/alertmanager/alertmanager.yml
- config/prometheus.yml: alertmanager_config
---
## [v0.503] - 2026-02-22
### Added
- HLS streaming end-to-end: backend serving routes (master.m3u8, quality playlists, segments) behind HLS_STREAMING feature flag
- Redis-backed chat rate limiter with sliding window (sorted sets) and automatic in-memory fallback
- ChatPresenceService with Redis-backed online/offline/heartbeat tracking (2min TTL)
- PostgreSQL full-text search on messages: tsvector column, GIN index, auto-update trigger
- MSW handlers for HLS endpoints (info, status, playlists)
- HLS player integration in frontend: useHLSPlayer connected to useAudioPlayerLifecycle with ABR quality switching
- StreamService.GetHLSStatus and TriggerHLSTranscode methods
### Changed
- Chat rate limiter now uses Redis sliding window with in-memory fallback (was purely in-memory)
- Chat message search now uses PostgreSQL ts_rank ordering (was ILIKE pattern matching)
- Hub constructor now accepts ChatPresenceService for presence tracking
### Removed
- veza-chat-server/ directory (deprecated Rust chat server)
- All chat-server references from CI/CD workflows, monitoring, proxy config, Incus scripts, GitHub templates
### Infrastructure
- Shared HLS volume between backend and stream-server in all docker-compose files
- HLS_STREAMING and HLS_STORAGE_DIR environment variables added to backend service
---
## [v0.502] - 2026-02-22
### Added
- **Chat Server (Go)**: Full WebSocket chat server integrated into veza-backend-api at `/api/v1/ws`
- **WebSocket Hub**: Client management with room-based broadcasting and user indexing
- **Message Handlers**: SendMessage, EditMessage, DeleteMessage with ownership checks
- **Room Handlers**: JoinConversation, LeaveConversation with permission enforcement
- **History/Search/Sync**: Cursor-based FetchHistory, ILIKE SearchMessages, SyncMessages
- **Real-time Features**: Typing indicators, read receipts, delivered status, message reactions
- **WebRTC Signaling**: CallOffer, CallAnswer, ICECandidate, CallHangup, CallReject relay
- **PermissionService**: CanRead, CanSend, CanJoin, CanModerate based on room_members
- **RateLimiter**: Per-user per-action sliding window (in-memory)
- **ChatPubSubService**: Redis PubSub for multi-instance broadcasting with in-memory fallback
- **4 database migrations** (109-112): read_receipts, delivered_status, message_reactions, messages extra columns
- **3 new GORM models**: ReadReceipt, DeliveredStatus, MessageReaction
- **ChatMessageRepository enriched**: cursor pagination, search, soft delete
- **ValidateChatToken**: JWT validation for WebSocket authentication
- **15 unit tests**: Hub, message handlers, real-time handlers
- **CHAT_FEATURE_PARITY.md**: 25-feature checklist (all OK or IMPROVED vs Rust)
### Changed
- Frontend env.ts: WS_URL auto-derived from API_URL (no separate VITE_WS_URL needed)
- Frontend types/index.ts: Added EditMessage, DeleteMessage, FetchHistory, SearchMessages, SyncMessages, MessageEdited, MessageDeleted, SearchResults, SyncChunk
- MSW handler: chat/token returns ws_url: '/api/v1/ws'
- WSUrl in ChatService.GenerateToken changed from `/ws` to `/api/v1/ws`
### Removed
- **Rust chat server** (`veza-chat-server`) removed from docker-compose.yml, staging.yml, prod.yml
- VITE_WS_URL environment variable from Docker frontend configs (auto-derived)
- Dev hack for `127.0.0.1:8081` in useChat.ts
### Infrastructure
- Docker: Single backend binary serves both REST API and Chat WebSocket
- Redis PubSub: Enables horizontal scaling of chat (improvement over single-instance Rust)
---
## [v0.501] - 2026-02-22
### Added
- **HLS Multi-bitrate Streaming**: 3-tier adaptive bitrate (128k, 256k, 320k) with hls.js ABR
- **Waveform Generation**: Async FFmpeg + audiowaveform pipeline with S3 storage and Redis cache
- **WaveformDisplay Component**: Interactive SVG waveform with seek support
- **Cloud Storage MVP**: Full folder/file management with 5GB quota per user
- **Cloud Upload Modal**: Drag-and-drop with progress and quota validation
- **Cloud File Preview**: Inline audio player for cloud files
- **Gear Public Profiles**: is_public toggle, public endpoint, GearShowcase component
- **Gear Image Gallery**: Multi-image support with carousel viewer
- **Gear Search**: ILIKE-based search with frontend SearchBar
- **MinIO Integration**: S3-compatible storage in all environments
- **Prometheus Streaming Metrics**: 4 new counters (transcode duration, segments served, active connections, errors)
- **useHLSPlayer Hook**: hls.js integration with ABR quality selection
- **Container Scanning**: Trivy CI workflow for Docker images
- **6 new database migrations** (103-108): waveform, cloud, gear images
### Changed
- QualitySelector updated to 256kbps medium tier
- Track handler split into 4 focused files (handler, upload, HLS, waveform)
- Production console.log replaced with structured logger
- Gear handler extended with search and image endpoints
### Infrastructure
- MinIO added to docker-compose (dev, staging, prod)
- HLS segment cache headers (immutable, 1-year max-age)
- Migration squash script and MIGRATIONS.md documentation
---
## [v0.404] - 2026-02-22
### Security
- Ephemeral JWT stream-token endpoint for HLS/WebSocket auth (SEC-03)
- SSRF protection: webhook URLs require HTTPS only (SEC-07)
- IDOR fix in GetUploadStatus with ownership verification (SEC-06)
- Hyperswitch webhook secret required in production (SEC-08)
- Password reset tokens hashed with SHA-256 before storage (INF-10)
- Docker hybrid compose removed (SEC-04)
- CI credentials moved to GitHub Secrets (SEC-10)
- JWT_SECRET added to stream-server in production compose (SEC-05)
- Go version unified to 1.24 across Dockerfile and CI (SEC-09)
- CD pipeline fixed (vars.\* in conditions, Dockerfile.production) (SEC-01)
- Redis authentication enabled in production compose (SEC-02)
### Infrastructure
- Redis-backed rate limiter with in-memory fallback (INF-01)
- PostgreSQL aligned to v16 in test environment (INF-02)
- Frontend CI: lint, typecheck, build steps added (INF-03)
- Backend CI: go vet + gofmt check added (INF-04)
- Rust CI with clippy for chat and stream servers (INF-05)
- CodeQL SAST scanning for Go and TypeScript (INF-06)
- Complete staging compose with chat, stream, Caddy reverse proxy (INF-07)
- Prometheus alerting rules for critical conditions (INF-08)
- Docker healthchecks on all services (INF-09)
### Code Quality
- 40 fmt.Printf replaced with zap structured logging (CLN-03)
- ~45 `any` types eliminated in frontend production code (CLN-04)
- TypeScript unified to 5.9.3 across all packages (CLN-06)
- ~1600 LOC dead code removed (CLN-01)
- gorilla/websocket replaced with coder/websocket (INT-06)
- commerceService mock data replaced with real API calls (CLN-02)
- Protobuf definitions centralized in proto/ directory (CLN-07)
### Documentation
- ADR-001: Go+Rust architecture decision (CLN-08)
- ADR-002: Chat server Rust->Go migration plan (INT-01)
- FEATURE_STATUS.md aligned with actual code state (CLN-05)
- PROJECT_STATE.md updated with v0.404 metrics (FIN-02)
### Testing
- 5 cross-service E2E integration tests (INT-03)
- 51 unit tests added across Rust services (INT-05)
- 2 skipped backend tests fixed, 11 clarified (INT-04)
### Integration
- HLS transcoding triggered after track upload (INT-02)
---
## [v0.402] - 2026-02-21
### Added
- **Lot P1 — Checkout Hyperswitch production-ready**
- Return URL with `order_id` for success/error pages
- CheckoutSuccessView, CheckoutErrorView, CheckoutCompletePage
- Route `/checkout/complete` (protected)
- Webhook: handle `cancelled` status in ProcessPaymentWebhook
- CheckoutPaymentForm (Hyperswitch) in Cart when `client_secret` returned
- marketplaceService.getOrder(orderId)
- **Lot P2 — Codes promo / réductions**
- Migrations 099 (promo_codes), 100 (orders discount fields)
- PromoCode model, ValidatePromoCode, validatePromoCodeTx
- GET /commerce/promo/:code
- CreateOrder and Checkout accept `promo_code` (percent/fixed)
- PromoCodeModal connected to validatePromoCode API
- Cart: PromoCodeModal, OrderSummary with discount, promo_code at checkout
- MSW handlers for promo, orders/:id, checkout with promo_code
- Stories: CheckoutSuccessView, CheckoutErrorView, PromoCodeModal
### Changed
- CreateOrder signature: promoCode string parameter
- Cart.Checkout: promoCode parameter
- OrderSummary integrated in Cart with discount support
---
## [v0.401] - 2026-02-22
### Added
- **Lot M1 — Produits & Catalogue**
- Migrations 095-097 : products enrichment (bpm, musical_key, category), product_previews, product_images
- ProductPreview, ProductImage models, CreateProduct/UpdateProduct accept bpm, musical_key, category
- POST /marketplace/products/:id/preview (audio preview upload)
- PUT /marketplace/products/:id/images
- GET /marketplace/products/:id/preview (stream audio)
- ListProducts filters: bpm, musical_key, category
- CreateProductView connected to enriched API, BPM/Key/Category filters in MarketplaceHome
- ProductDetailView: playable preview, image gallery
- Rich text description (sanitization backend, toolbar Bold/List frontend)
- **Lot M2 — Licences & Droits**
- Migration 098 : product_licenses (license_type, price_cents, terms_text)
- ProductLicense model, SetProductLicenses, GetProductLicenses
- CreateProduct/UpdateProduct accept licenses array
- GET /marketplace/licenses/mine (user's purchased licenses with download_url)
- LicenceCard, LicenceDetailsModal: license_type, price_cents, terms_text
- LicensesView in PurchasesPage with download links
- **Lot M3 — Seller dashboard enrichi**
- GET /sell/stats/evolution (day/week/month)
- GET /sell/stats/top-products
- GET /sell/sales (real sales data)
- commerceService: getSales, getSellerStatsEvolution, getSellerTopProducts (real API)
- SalesEvolutionChart (Recharts LineChart)
- Top Products section with real revenue/sales_count
- Conversion rate: N/A when no tracking
### Changed
- Marketplace products: bpm, musical_key, category, previews, images, licenses
- SellerDashboardView: real data, evolution chart, top products from API
---
## [v0.303] - 2026-02-22
### Added
- **Lot C2 — Chat appels WebRTC 1-to-1**
- Chat Server : signalisation CallOffer, CallAnswer, ICECandidate, CallHangup, CallReject
- WebSocketManager.send_to_user pour livraison 1-to-1
- RateLimitAction::CallSignaling (60 req/min)
- Frontend : useWebRTC hook, CallButton, IncomingCallModal, ActiveCallBar
- Appels audio 1-to-1 dans conversations DM
---
## [v0.302] - 2026-02-21
### Added
- **Lot S2 — Groupes avancés**
- Demander à rejoindre (groupes privés), approbation/rejet par admin
- Inviter membres par email ou user_id
- Rôles assign/revoke (admin, moderator, member)
- Feed type=groups (posts des membres des groupes)
- GET /social/groups/mine
- Migrations 069, 089, 092
- **Lot N1 — Notifications push Web**
- POST /notifications/push/subscribe, PushService (webpush-go)
- Envoi push sur follow/like/comment/message (selon préférences)
- GET/PUT /notifications/preferences
- Migrations 090, 093
- Frontend : subscribePush, PushPreferencesSection, badge document.title
- **Lot P2 — Présence enrichie**
- PUT /users/me/presence (status_message, track_id, track_title, invisible)
- Rich presence : sync track en cours via usePresenceSync
- Mode invisible (GetPresenceForViewer masque pour les autres)
- PresenceBadge statusMessage tooltip
- Migrations 091, 094
### Changed
- NotificationService : SetPushService, envoi push post-CreateNotification
- Shared NotificationService avec PushService pour profile, track, comment handlers
### Deferred (v0.303)
- **Lot C2** : Livré en v0.303
---
## [v0.301] - 2026-02-20
### Added
- **Lot P0 — Chat Server**
- Protocole typing aligné : `{ type: 'Typing', conversation_id, is_typing }`
- Limitation JWT auth (query param) documentée pour v0.302
- **Lot C1 — Chat avancé**
- Typing indicators end-to-end (UserTyping, setUserTyping)
- Read receipts (MarkAsRead, MessageRead, « Vu à HH:mm »)
- Delivered status (Delivered, MessageDelivered)
- **Lot P1 — Présence**
- Migration 088 user_presence (status, last_seen_at, status_message)
- PresenceService, GET /users/:id/presence
- Mise à jour last_seen_at sur chaque requête authentifiée
- PresenceBadge, usePresence, intégration ChatSidebar
- **Lot S1 — Social enrichi**
- Feed connecté à socialService.getFeed (remplace trackService.list)
- Backend : enrichissement actor_name, actor_avatar, track dans GetGlobalFeed
- SocialViewFeedItem : posts texte + posts avec track (mini player)
- Pagination cursor (next_cursor), useInfiniteQuery, Load More
- GET /social/explore (trending + suggested_users), onglet Explore
- Filtres feed : all | following | groups (param type, OptionalAuth pour following)
### Changed
- useSocialView : socialService.getFeed, useInfiniteQuery, feedFilter
- SocialView : onglet Explore, filtres feed
- AuthMiddleware : SetPresenceService, UpdatePresence sur RequireAuth
### Documented
- FEATURE_STATUS, PROJECT_STATE mis à jour pour v0.301
---
## [v0.203] - 2026-02-20
### Added
- **Lot L — Social Trending**
- GET /social/trending (extraction hashtags posts 7 jours, agrégation)
- Cache Redis 15 min (clé trending:hashtags)
- SocialViewTrending connecté à lAPI (Loading, Error, Empty fallback)
- MSW handler GET \*/api/v1/social/trending
- **Lot K — Recherche enrichie**
- Migration 086 pg_trgm pour fuzzy search
- TrackSearchService : similarity() sur title/artist/album (PostgreSQL), fallback ILIKE (SQLite)
- query_parser.go : AND, OR, NOT, "phrase exacte"
- SearchService + TrackSearchService utilisent le parser
- SearchPageHeader : tooltip aide syntaxe
- **Lot D1 — Queue collaborative**
- Migration 087 queue_sessions, shared_queue_items
- Modèles QueueSession, SharedQueueItem
- QueueSessionService : Create, Get, Delete, Add/Remove items
- POST/GET/DELETE /queue/session, POST/DELETE /queue/session/:token/items
- PlayerQueue : bouton Partager, badge Queue partagée, polling 8 s
- queueSessionStore, useQueueSync mode session
- MSW handlers pour queue session
### Changed
- SocialViewTrending : useQuery, skeletons, erreur → fallback tags
- TrackSearchService : dialect sqlite → LIKE, postgres → similarity
- SearchService : BuildWhereCondition pour requêtes booléennes
- PlayerQueue : mode session, partage lien, sync session
- useQueueSync : skip sync personnelle quand session active
### Documented
- FEATURE_STATUS, PROJECT_STATE mis à jour pour v0.203
---
## [v0.202] - 2026-02-20
### Added
- **Lot G — Recherche avancée**
- Filtre musical_key dans track_search (G1)
- Tri pertinence (relevance) dans SearchService (G2)
- Autocomplete : GET /search/suggestions, dropdown debounced (G3)
- Facettes type (tracks/artistes/playlists/users) dans SearchPage (G4)
- Historique recherche localStorage (G5)
- **Lot H — Analytics créateur**
- GET /analytics/creator/stats, carte Completion Rate (H1)
- GET /analytics/creator/charts, graphiques (H2)
- Taux de complétion intégré dashboard (H3)
- GET /analytics/creator/export CSV/JSON (H4)
- **Lot F — Seller dashboard**
- GET /sell/stats, connexion commerceService (F1)
- Support seller_id=me dans ListProducts (F2)
- **Lot C — Player avancé**
- Crossfade configurable (112 s) depuis Settings (C1)
- Gapless préchargement via preloadTrack (C2)
- PiP (Picture-in-Picture) si supporté (C3)
- **Lot D — Autoplay**
- GET /tracks/recommendations (auth), section « À écouter ensuite » dans PlayerQueue (D2)
### Changed
- SearchPage : onglets type, suggestions dropdown, historique récent
- AnalyticsViewKpiGrid : métrique Completion Rate
- AnalyticsViewChart : graphiques creator
- SettingsPage : slider crossfade
- PlayerQueue : recommandations quand queue vide (authentifié)
- PlayerStore : crossfadeSeconds, préchargement ~5 s avant fin
### Documented
- D1 (queue collaborative) reporté v0.203+
- V0_202_RELEASE_SCOPE.md, FEATURE_STATUS.md, PROJECT_STATE.md mis à jour
---
## [v0.201] - 2026-02-20
### Added
- **Lot E — Métadonnées enrichies**
- BPM : champ dans Track model, UpdateTrack, filtre track_search (E1)
- Musical key : champ, input/select édition, affichage TrackDetailPageInfo (E2)
- Lyrics : table track_lyrics, GET/PUT /tracks/:id/lyrics, section Paroles avec toggle (E3)
- Tags suggérés : GET /tracks/suggested-tags?genre=X, migration tracks.tags, chips + dropdown (E4)
### Changed
- Track model : BPM, MusicalKey, Tags (pq.StringArray)
- TrackDetailPageInfo : affichage BPM, key, tags
- TrackMetadataEditModal : édition BPM, musical_key, tags avec suggestions
### Documented
- Lot G (Recherche avancée), H (Analytics), F (Seller), C (Player), D (Queue) reportés v0.202+
---
## [v0.103] - 2026-02-20
### Added
- **Auth (Lot A)** : OAuth Spotify (A1), page Sessions enrichie avec historique et révocation (A4)
- **Profils (Lot B)** : Bannière de profil éditable (B1), section liens sociaux sur profil public (B2), toggle profil privé dans Settings (B3)
- **Profil privé** : Vue « Profil privé » sur `/u/:username` quand le profil est masqué ; `is_public` exposé et persisté
### Documented
- 2FA SMS et Passkeys/WebAuthn reportés à v0.104
---
## [v0.102] - 2026-02-20
### Added
- **Queue persistante** : API CRUD (`GET/PUT/POST/DELETE /api/v1/queue`), sync frontend via `useQueueSync`, drag & drop reorder avec @dnd-kit (B3)
- **Developer API Keys** : CRUD clés API, X-API-Key middleware, CreateAPIKeyModal, révocation
- **Playlists** : activation PLAYLIST_SHARE, PLAYLIST_RECOMMENDATIONS ; boutons Export (JSON/CSV), Duplicate connectés
- **Social** : like/comment post connectés à lAPI ; profil followers/following count ; badges rôles
- **Player** : playback speed (0.5x2x), Media Session API, waveform dans progress bar
### Changed
- **Gear, Live, Queue, Developer** : routes opérationnelles (fin des placeholders Coming Soon)
- Feature flags PLAYLIST_SHARE et PLAYLIST_RECOMMENDATIONS activés (true)
### Documented
- Go Live (streaming vidéo) : non implémenté, prévu v0.703 — limitation A6
- Social Trending (tags) : statique, report v0.103 pour `GET /social/trending`
---
## [Unreleased] - 2024-12-07
### Security
- **chat-server**: Implemented JWT Authentication Middleware for HTTP API.
- Secured `/api/messages` (POST) and `/api/messages/{id}` (GET).
- Enforced permission checks (`can_send_message`, `can_read_conversation`).
- Patched `sender_id` spoofing vulnerability by enforcing User ID from Token Claims.
- **backend**: Resolved `veza_errors_total` metric collision preventing proper monitoring initialization.
### Fixed
- **backend**: Fixed `JobWorker` starvation issue by replacing blocking `time.Sleep` with non-blocking scheduler.
- **stream-server**: Improved task safety by replacing unsafe `abort()` with graceful `join/await` for monitoring tasks.
- **chat-server**: Fixed resource leak by implementing 60s WebSocket inactivity/heartbeat timeout.
- **chat-server**: Implemented Graceful Shutdown handling for OS signals (SIGTERM/SIGINT).
- **backend-tests**: Fixed `RoomHandler` unit tests.
- Refactored `RoomHandler` to use `RoomServiceInterface` for dependency injection.
- Updated `CreateRoom` tests to match actual Service signatures.
- Fixed `bitrate_handler_test.go` compilation errors.
- Resolved global metric registration panics during testing.
### Removed
- **backend**: Deleted legacy maintenance code (`migrations_legacy/` and `src/cmd/main.go.legacy`).
### Known Issues
- **backend**: Some unit tests (`metrics_test.go`, `profile_handler_test.go`, `system_metrics_test.go`) are disabled due to bitrot/missing dependencies.
- **stream-server**: Compilation requires active Database connection (sqlx compile-time verification) or `sqlx-data.json`.