chore(release): v1.0.6 — ergonomics + operational hardening

Follow-up to the v1.0.5 hardening sprint. That release validated the
`register → verify → play` critical path end-to-end; this one addresses
the next layer: the UX friction and operational blind spots that a
first-day public user (or a first-day on-call) would hit. Six targeted
commits, each with its own tests:

  * Fix 1 — Self-service creator role (9f4c2183a)
  * Fix 2 — Upload size limits from a single source (7974517c0)
  * Fix 3 — Unified SMTP env schema on canonical SMTP_* names (9002e91d9)
  * Fix 4 — Refund reverse-charge with idempotent webhook (92cf6d6f7)
  * Fix 5 — RTMP ingest health banner on Go Live (698859cc5)
  * Fix 6 — RabbitMQ publish failures no longer silent (4b4770f06)

Breaking changes:
  * marketplace.MarketplaceService.RefundOrder now returns
    (*Refund, error) — callers must accept the pending refund row.
  * Internal refundProvider interface changed from
    Refund(...) error to CreateRefund(...) (refundID, status, err).
  * Order status machine gains `refund_pending` as an intermediate
    state. Clients reading orders.status should not treat it as
    refunded yet.

Parked for v1.0.7:
  * Partial refunds (UX decision + call-site wiring)
  * Stripe Connect Transfers:reversal (internal accounting is
    already corrected; this is the external money-movement call)
  * CloudUploadModal.tsx unifying on /upload/limits
  * Manual smoke test of refund flow against Hyperswitch sandbox

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
senke 2026-04-17 02:13:45 +02:00
parent 92cf6d6f76
commit a4d2ffd123
2 changed files with 210 additions and 1 deletions

@@ -1,5 +1,214 @@
# Changelog - Veza
## [v1.0.6] - 2026-04-17
### Ergonomics + operational hardening — six items from the v1.0.5 backlog
Follow-up to the hardening sprint. v1.0.5 validated the
`register → verify → play` critical path end-to-end; v1.0.6 addresses the
next layer: the UX friction and operational blind spots that a first-day
public user (or a first-day on-call) would hit. Six targeted commits.
#### Fix 1 — Self-service creator role (`c32278dc1`)
New `POST /api/v1/users/me/upgrade-creator`. Verified users click a
"Become an artist" button in `/settings → Account` and their role flips
from `user` to `creator` on one conscious click — no KYC, no cooldown,
no admin round-trip. One-way by design (downgrade = support ticket) so
we don't have to handle the "my uploads orphaned" edge case.
* Gated strictly on `is_verified=true` (403 `EMAIL_NOT_VERIFIED`
otherwise).
* Idempotent 200 for anyone already creator-tier — no clutter.
* UPDATE scoped `WHERE role='user'` so a concurrent admin assignment
can't be silently overwritten.
* Audit trail: `user.upgrade_creator` action logged with the full
role transition metadata.
* Migration `977_users_promoted_to_creator_at.sql` adds a nullable
`promoted_to_creator_at TIMESTAMPTZ` column — distinguishes organic
self-promotions from admin-assigned roles for analytics.
* Tests: 6 Go (happy path, unverified, already-creator, admin
idempotent, 404, no-auth) + 7 Vitest (verified button, unverified
state, hidden for creator, hidden for admin, refetch on success,
idempotent message, server error toast).
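The guard logic above can be sketched in memory. This is a minimal illustration, not the shipped handler: the `User` type, `UpgradeToCreator`, and the order of the checks (verification first, then role) are assumptions. The role comparison stands in for the `WHERE role='user'` scope on the real UPDATE.

```go
package main

import "errors"

// ErrEmailNotVerified stands in for the 403 EMAIL_NOT_VERIFIED response.
var ErrEmailNotVerified = errors.New("EMAIL_NOT_VERIFIED")

// User is a hypothetical in-memory stand-in for the users row.
type User struct {
	Role       string
	IsVerified bool
}

// UpgradeToCreator reports whether the role actually changed.
// Only a row whose role is still "user" is rewritten, mirroring
// UPDATE ... WHERE role='user', so an admin-assigned role survives.
func UpgradeToCreator(u *User) (changed bool, err error) {
	if !u.IsVerified {
		return false, ErrEmailNotVerified
	}
	if u.Role != "user" {
		// Already creator-tier (or admin): idempotent success, nothing written.
		return false, nil
	}
	u.Role = "creator"
	return true, nil
}
```

Calling `UpgradeToCreator` twice on the same verified user changes the role once and then reports `changed == false` without an error, which is the idempotent 200 case.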
#### Fix 2 — Upload size limits from a single source (`5848c2e40`)
The v1.0.5 audit flagged a "front 500MB vs back 100MB" mismatch. In
reality every live pair was aligned (tracks 100/100, cloud 500/500,
video 500/500) — the real architectural bug was **five duplicated
hardcoded values** that could drift silently as soon as anyone tuned
one.
* `internal/config/upload_limits.go`: `AudioLimit`, `ImageLimit`,
`VideoLimit` expose `Bytes()`, `MB()`, `HumanReadable()`,
`AllowedMIMEs`. Read lazily from env
(`MAX_UPLOAD_AUDIO_MB`, `MAX_UPLOAD_IMAGE_MB`,
`MAX_UPLOAD_VIDEO_MB`, defaults 100/10/500). Invalid/negative/zero
env values fall back to default.
* `track/service.go`, `track_upload_handler.go`,
`education_handler.go`, `upload.go:GetUploadLimits` all consume the
single source. Changing one env retunes every path.
* Frontend `useUploadLimits()` hook: react-query with 5 min stale,
30 min gc, 1 retry then optimistic fallback to baked-in defaults
so the dropzone stays responsive even without the network round
trip. `useUploadModal` replaces `MAX_FILE_SIZE` constant with the
live value; `UploadModal` forwards `audioMaxHuman` to
`UploadModalDropzone` so the label and error toast track the env.
* Out of scope (tracked for later): `CloudUploadModal.tsx` still
hardcodes 500MB — cloud uploads accept audio+zip+midi with a
different category semantic than the three in `/upload/limits`.
Unifying deserves its own design pass.
* Tests: 4 Go (defaults, env override, invalid env fallback, MIME
lists) + 4 Vitest (sync fallback, typed mapping, partial-payload
fallback per category, network failure keeps fallback).
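The env fallback rule (invalid, negative, or zero values revert to the default) could look roughly like the sketch below. `resolveLimitMB` and its `lookup` parameter are hypothetical names; the real code lives in `internal/config/upload_limits.go` and may be shaped differently.

```go
package main

import "strconv"

// resolveLimitMB reads one upload limit (in MB) through lookup and
// falls back to def when the value is missing, non-numeric, zero,
// or negative. lookup is injected so the rule is testable; in a
// real process it would typically be os.Getenv.
func resolveLimitMB(lookup func(string) string, key string, def int64) int64 {
	v, err := strconv.ParseInt(lookup(key), 10, 64)
	if err != nil || v <= 0 {
		return def
	}
	return v
}
```

With `lookup` set to `os.Getenv`, `resolveLimitMB(os.Getenv, "MAX_UPLOAD_AUDIO_MB", 100)` would return the configured value, or 100 when the variable is unset or unusable.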
#### Fix 3 — Unified SMTP env schema (`066144352`)
Two email services in-tree read *different* env vars for the same
fields — surfaced during the v1.0.5.1 hotfix:

| `internal/email/sender.go` | `internal/services/email_service.go` |
| --- | --- |
| `SMTP_USERNAME` | `SMTP_USER` |
| `SMTP_FROM` | `FROM_EMAIL` |
| `SMTP_FROM_NAME` | `FROM_NAME` |

v1.0.6 reconciles both onto canonical `SMTP_*` names, with a migration
fallback to the legacy names that logs a structured deprecation warning
(`remove_in: v1.1.0`).
* `internal/email/sender.go` is the single loader — both services
delegate to it via `LoadSMTPConfigFromEnvWithLogger(*zap.Logger)`.
Canonical wins over deprecated; no precedence surprise.
* `docker-compose.yml` backend-api env: `FROM_EMAIL` /
`FROM_NAME` → `SMTP_FROM` / `SMTP_FROM_NAME` to match the canonical
schema.
* `.env.template` trimmed — only canonical vars ship, old ones
removed (still accepted in running env for zero-downtime rollover).
* No default injected for Host/Port in the loader. `Host==""`
callers go log-only (matches historic dev behavior). Dev defaults
stay in `.env.template`, so prod fails fast instead of silently
dialing localhost.
* Tests: 5 Go (empty env, canonical direct, deprecated fallback
+ warning emission, canonical silently wins over deprecated, nil
logger allowed).
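The precedence rule (canonical wins silently, the deprecated name is accepted with a warning) can be sketched as follows. This is an illustration under assumptions: `resolveEnv`, `loadSMTPFrom`, and the `warn` callback are hypothetical, and the real loader takes a `*zap.Logger` rather than a plain function. Only the two `FROM` pairs are shown.

```go
package main

import "fmt"

// SMTPFrom holds the two sender fields covered by this sketch.
type SMTPFrom struct {
	From     string
	FromName string
}

// resolveEnv prefers the canonical variable. The deprecated one is
// used only when the canonical is empty, and that path warns.
func resolveEnv(lookup func(string) string, warn func(string), canonical, deprecated string) string {
	if v := lookup(canonical); v != "" {
		return v // canonical wins silently, even if the legacy name is also set
	}
	if v := lookup(deprecated); v != "" {
		warn(fmt.Sprintf("env %s is deprecated, use %s (remove_in: v1.1.0)", deprecated, canonical))
		return v
	}
	return ""
}

// loadSMTPFrom tolerates a nil warn callback, like the nil-logger case.
func loadSMTPFrom(lookup func(string) string, warn func(string)) SMTPFrom {
	if warn == nil {
		warn = func(string) {}
	}
	return SMTPFrom{
		From:     resolveEnv(lookup, warn, "SMTP_FROM", "FROM_EMAIL"),
		FromName: resolveEnv(lookup, warn, "SMTP_FROM_NAME", "FROM_NAME"),
	}
}
```

No default is injected when both names are empty, so the caller sees an empty field and can decide how to react.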
#### Fix 4 — Refund reverse-charge with idempotent webhook (`959031667`)
The structural one. Before v1.0.6, `RefundOrder` wrote `status='refunded'`
to the DB and called Hyperswitch synchronously, treating the API ack as
terminal. In reality Hyperswitch returns `pending` and only finalizes via
webhook. Customers could see "refunded" while their bank was still
uncredited, and the seller balance kept its credit even on successful
refunds.
* Two-phase flow (synchronous request, then webhook finalization), in three steps:
1. **Open pending refund** (short row-locked tx): validate
permissions + 14-day window + double-submit guard; persist
`Refund{status=pending}`; flip order to `refund_pending` (not
`refunded` — that's the webhook's job).
2. **PSP call outside the tx**: `Provider.CreateRefund` returns
`(refund_id, status, err)`. On error, mark refund failed + roll
order back to `completed`. On success, capture the
`hyperswitch_refund_id` as the idempotency key — stay in
`pending` even if the sync status is "succeeded" (per customer
guidance: never trust the sync ack, always wait for the
webhook).
3. **`ProcessRefundWebhook`** drives terminal state. Row-lock +
`IsTerminal()` short-circuit: any duplicate Hyperswitch retry
is a no-op 200. On `refund.succeeded`: flip refund + order to
succeeded/refunded, revoke licenses, debit seller balance,
mark every `SellerTransfer` for the order as `reversed`.
* Migration `978_refunds_table.sql` with `UNIQUE(hyperswitch_refund_id)`
— this is the load-bearing idempotency guarantee.
* Webhook routing: `HyperswitchWebhookPayload.IsRefundEvent()`
dispatches `refund.*` events to `ProcessRefundWebhook`; payment
events keep flowing through the existing `ProcessPaymentWebhook`.
* `DebitSellerBalance` ported off Postgres-only `GREATEST()` to
portable `CASE WHEN`; the path wasn't exercised before v1.0.6, so
this is a quality fix not a regression.
* Partial refunds: signature carries `amount *int64` (nil = full) but
service call-site passes nil — full-only for v1.0.6. Partial-refund
UX is deferred to v1.0.7.
* Stripe Connect Transfers:reversal call flagged TODO(v1.0.7).
Internal balance + transfer-status are corrected here so buyer and
seller views match the moment the PSP confirms; the missing piece
is the money-movement round-trip at Stripe. Internal accounting is
consistent — external settlement catches up with v1.0.7.
* Tests: 15 Go cases covering Phase 1 (pending state, PSP error
rollback, double-submit, permissions, window), webhook
finalization (succeeded, failed, idempotent replay with
`succeeded_at` timestamp invariant, unknown refund_id, missing
refund_id, non-terminal ignored), and dispatcher logic (6
`IsRefundEvent` cases across flat/nested/event_type shapes).
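The terminal-state short-circuit that makes webhook replays harmless can be sketched in memory. Everything here is a simplified stand-in: `Store`, its fields, and the `Reversals` counter are hypothetical, the mutex plays the role of the row lock, and rolling the order back to `completed` on `refund.failed` is an assumption the changelog does not spell out.

```go
package main

import "sync"

type RefundStatus string

const (
	RefundPending   RefundStatus = "pending"
	RefundSucceeded RefundStatus = "succeeded"
	RefundFailed    RefundStatus = "failed"
)

// IsTerminal reports whether no further webhook may change the refund.
func (s RefundStatus) IsTerminal() bool {
	return s == RefundSucceeded || s == RefundFailed
}

type Refund struct {
	OrderID string
	Status  RefundStatus
}

// Store is an in-memory stand-in for the refunds and orders tables.
type Store struct {
	mu          sync.Mutex         // plays the role of the row lock
	Refunds     map[string]*Refund // keyed by hyperswitch_refund_id
	OrderStatus map[string]string
	Reversals   int // counts side effects (license revoke, balance debit)
}

// ProcessRefundWebhook returns true only when it changed state.
// Unknown IDs, replays on a terminal refund, and non-terminal
// events are all no-ops.
func (s *Store) ProcessRefundWebhook(refundID, event string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	r, ok := s.Refunds[refundID]
	if !ok || r.Status.IsTerminal() {
		return false
	}
	switch event {
	case "refund.succeeded":
		r.Status = RefundSucceeded
		s.OrderStatus[r.OrderID] = "refunded"
		s.Reversals++
	case "refund.failed":
		r.Status = RefundFailed
		s.OrderStatus[r.OrderID] = "completed" // assumption: mirror the PSP-error rollback
	default:
		return false
	}
	return true
}
```

Delivering `refund.succeeded` twice for the same refund ID applies the side effects once; the second delivery hits the `IsTerminal()` check and returns without touching anything.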
#### Fix 5 — RTMP ingest health banner on Go Live (`64fa0c9ac`)
"Go Live" was silent when `nginx-rtmp` wasn't running. An artist could
copy the RTMP URL + stream key, fire OBS, and broadcast into the void
with no in-UI signal.
* `GET /api/v1/live/health` TCP-dials `NGINX_RTMP_ADDR` (default
`localhost:1935`), 2s timeout, 15s TTL cache protected by a mutex so
a burst of page loads can't hammer the ingest. Returns UI-safe
`error` string (no raw hostname leak) and `Cache-Control: private,
max-age=15` so browsers honor the same window.
* Unreachable path emits a WARN log so operators see the outage
before users do.
* Frontend `useLiveHealth()` hook: react-query 15s stale, 1 retry,
then optimistic `{ rtmpReachable: true }` — better to miss a banner
than flash a false negative on a transient health-endpoint blip.
* `LiveRtmpHealthBanner` at the top of `GoLivePage`: amber,
non-blocking, copy explicitly tells the artist the stream key is
still valid but broadcasting won't reach anyone, with a Retry
button that invalidates the health query.
* Tests: 3 Go (listener reachable + Cache-Control; dead port
unreachable + UI-safe error asserting no `127.0.0.1` leak; TTL
cache survives listener teardown) + 3 Vitest (hidden when
reachable, visible with Retry when unreachable, Retry invalidates
the right query key).
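The TTL-cached TCP probe could be structured roughly like this. The type and constructor names are hypothetical; only the 2s dial timeout, the 15s TTL, and the mutex come from the description above. The probe is injected so the cache can be exercised without a real listener.

```go
package main

import (
	"net"
	"sync"
	"time"
)

// HealthChecker caches the last probe result for ttl.
type HealthChecker struct {
	probe func() bool
	ttl   time.Duration

	mu        sync.Mutex
	checkedAt time.Time
	reachable bool
}

func newHealthChecker(probe func() bool, ttl time.Duration) *HealthChecker {
	return &HealthChecker{probe: probe, ttl: ttl}
}

// NewRTMPHealthChecker TCP-dials addr with a 2s timeout and caches
// the answer for 15s.
func NewRTMPHealthChecker(addr string) *HealthChecker {
	probe := func() bool {
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err != nil {
			return false
		}
		conn.Close()
		return true
	}
	return newHealthChecker(probe, 15*time.Second)
}

// Reachable returns the cached result while it is fresh. The mutex
// is held across the probe, so a burst of callers triggers one dial
// and the rest read the cached value.
func (h *HealthChecker) Reachable() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if !h.checkedAt.IsZero() && time.Since(h.checkedAt) < h.ttl {
		return h.reachable
	}
	h.reachable = h.probe()
	h.checkedAt = time.Now()
	return h.reachable
}
```

Holding the lock during the dial trades a little latency for the guarantee that at most one probe is in flight, which is the behavior the burst-protection bullet describes.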
#### Fix 6 — RabbitMQ publish failures no longer silent (`bf688af35`)
`RabbitMQEventBus.Publish` returned the broker error but did not log
it. Callers that wrapped `Publish` in fire-and-forget
(`_ = eb.Publish(...)`) lost events with zero trace during RMQ outages.
* `Publish` now emits a structured ERROR on broker failure with the
exchange, routing_key, payload_bytes, content_type, and message_id
context. Function still returns the error so call-sites that
actually check it keep working.
* `EventBus disabled` warning kept but upgraded with `payload_bytes`
so dashboards can quantify drops when RMQ is intentionally off.
* Aligns the legacy `internal/eventbus` with `infrastructure/eventbus`
which already had this pattern.
* Tests: 2 Go (disabled bus emits WARN + returns
`EventBusUnavailableError`; nil logger stays panic-free for legacy
callers).
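The "log, then still return the error" pattern can be sketched as below. This is not the real `RabbitMQEventBus`: the `send` and `logError` fields and the `ErrBrokerDown` sentinel are hypothetical, and only a subset of the logged fields is shown. The logger is a plain function so that either `slog.Default().Error` or a zap `SugaredLogger`'s `Errorw` could be plugged in.

```go
package main

import "errors"

// ErrBrokerDown is a hypothetical sentinel a send function might return.
var ErrBrokerDown = errors.New("broker unavailable")

// EventBus wraps a broker send function with failure logging.
type EventBus struct {
	send     func(exchange, routingKey string, body []byte) error
	logError func(msg string, keysAndValues ...any) // may be nil for legacy callers
}

// Publish logs broker failures with structured context and still
// returns the error, so fire-and-forget callers leave a trace and
// callers that check the error keep working.
func (eb *EventBus) Publish(exchange, routingKey string, body []byte) error {
	err := eb.send(exchange, routingKey, body)
	if err != nil && eb.logError != nil {
		eb.logError("eventbus publish failed",
			"exchange", exchange,
			"routing_key", routingKey,
			"payload_bytes", len(body),
			"error", err,
		)
	}
	return err
}
```

A caller written as `_ = eb.Publish(...)` now produces an ERROR entry during an outage instead of dropping the event silently.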
### Breaking changes
* `marketplace.MarketplaceService.RefundOrder` now returns
`(*Refund, error)` instead of `error`. Callers consuming the
service directly need to accept the pending refund row.
* `marketplace.refundProvider` internal interface: `Refund(...)
error` → `CreateRefund(...) (refundID, status string, err error)`.
`hyperswitch.Provider` implements both; external mocks must be
updated.
* Order status machine gains `refund_pending` as an intermediate
state. Clients reading `orders.status` should treat it as
"in-flight refund, don't show as refunded yet".
### Known gaps (parked for v1.0.7)
* Partial refunds — UX decision + call-site wiring
* Stripe Connect Transfers:reversal — actually move money back at
the PSP level (internal accounting is correct today)
* `CloudUploadModal.tsx` hardcoded 500MB — category semantic doesn't
map to the three exposed by `/upload/limits`
* Smoke test of refund flow against Hyperswitch sandbox (manual,
outside CI)
## [v1.0.5.1] - 2026-04-16
### Hotfix — dev SMTP ergonomics

@@ -1 +1 @@
1.0.5.1
1.0.6