chore(release): v1.0.5 — hardening sprint
Some checks failed
Veza CI / Backend (Go) (push) Failing after 2s
Veza CI / Frontend (Web) (push) Failing after 3s
Veza CI / Rust (Stream Server) (push) Failing after 2s
Frontend CI / test (push) Failing after 2s
Security Scan / Secret Scanning (gitleaks) (push) Failing after 3s
Veza CI / Notify on failure (push) Failing after 0s
Seven targeted fixes to the register → verify → play critical path before
public opening. Each landed in its own commit with dedicated tests; this
commit just rolls VERSION forward and captures the rationale in the
changelog.
Summary of what's in this release:
* Fix 1 — Silent player: /stream endpoint + HLS default alignment
* Fix 2 — Fake email verification: real SMTP + MailHog + fail-loud in prod
* Fix 3 — Free marketplace: HYPERSWITCH_ENABLED=true required in prod
* Fix 4 — Redis mandatory: REDIS_URL required in prod + ERROR log
  on in-memory PubSub fallback
* Fix 5 — Maintenance mode DB-backed via platform_settings
* Fix 6 — Hourly cleanup of orphan tracks stuck in processing
* Fix 7 — Response cache bypass for range-aware media endpoints
  (surfaced by the browser smoke test; prevents Range/Accept-Ranges
  stripping and JSON round-trip byte corruption on /stream, /download,
  /hls/ and any request with a Range header)
Parked for v1.0.6 (🟠/🟡 audit items + smoke-test ergonomics):
Hyperswitch refund→PSP propagation, livestream UI feedback when
nginx-rtmp is down, upload size mismatch (front 500MB vs back 100MB),
RabbitMQ silent drop on enqueue failure, SMTP_HOST ergonomics for
`make dev` host mode, creator-role self-service onboarding for upload.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent b875efcffc
commit 7385f1e4ed
2 changed files with 177 additions and 1 deletion
176	CHANGELOG.md
@@ -1,5 +1,181 @@

# Changelog - Veza

## [v1.0.5] - 2026-04-16

### Hardening sprint — seven critical-path fixes before public opening

Audit follow-up on the `register → verify → play` critical path. The app was
functional on the surface but broken underneath: the player was silent, emails
weren't really sent, the marketplace gave products away in production, the
chat silently de-synced across pods, maintenance mode was per-pod only,
orphaned tracks accumulated forever in `processing`, and the response cache
was corrupting range-aware media responses. Seven targeted fixes, each with
its own commit, its own tests, and no behaviour change outside scope.
#### Fix 1 — Silent player (`veza-backend-api` + `apps/web`)

- New `GET /api/v1/tracks/:id/stream` handler in
  `internal/core/track/track_hls_handler.go`. Serves the raw file via
  `http.ServeContent` — `Range`, `If-Modified-Since` and `If-None-Match`
  handled for free, so `<audio>` seek works end-to-end.
- Route registered in `routes_tracks.go` **unconditionally** (outside the
  `HLSEnabled` gate) with `OptionalAuth` so both anonymous and authenticated
  users can stream, and the `share_token` query path keeps working.
- Frontend flag `FEATURES.HLS_STREAMING` default flipped from `true` to
  `false` to match the backend's `HLS_STREAMING` default. The mismatch was
  the root cause: hls.js was attaching to a 404 manifest and leaving the
  audio element silent.
- All playback URL builders (`feedService`, `discoverService`,
  `playerService`, `PlayerQueue`, `SharedPlaylistPage`, `TrackSearchResults`,
  `useLibraryManager`, `useTrackDetailPage`) redirected from `/download` to
  `/stream`. `/download` remains for explicit downloads.
- `useHLSPlayer` — when hls.js emits a fatal non-media error (manifest 404,
  all network retries exhausted), the hook now destroys hls.js and swaps
  the audio element onto `/api/v1/tracks/:id/stream`, so operators turning
  HLS on via feature flag don't re-break the player.
- Tests: 6 Go unit tests covering invalid UUID, missing track, private-track
  forbidden, missing file, full body stream, and `206 Partial Content` with
  `Range: bytes=10-19`. MSW handler and `playerService.test.ts` assertion
  updated.
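The Fix 1 approach hinges on `http.ServeContent` doing the Range/conditional-request work. A minimal sketch, not the shipped handler (the real one reads the track's file from disk after auth checks; `streamHandler` and `serveRange` are illustrative names):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// streamHandler sketches a /stream-style endpoint: it hands the bytes to
// http.ServeContent, which implements Range, If-Modified-Since and
// If-None-Match for us. A 20-byte string stands in for the MP3 file.
func streamHandler(w http.ResponseWriter, r *http.Request) {
	audio := strings.NewReader("0123456789abcdefghij")
	w.Header().Set("Content-Type", "audio/mpeg")
	http.ServeContent(w, r, "track.mp3", time.Unix(0, 0), audio)
}

// serveRange exercises the handler with an optional Range header and
// returns the status code and Content-Range the client would see.
func serveRange(rangeHeader string) (int, string) {
	req := httptest.NewRequest("GET", "/api/v1/tracks/42/stream", nil)
	if rangeHeader != "" {
		req.Header.Set("Range", rangeHeader)
	}
	rec := httptest.NewRecorder()
	streamHandler(rec, req)
	return rec.Code, rec.Header().Get("Content-Range")
}

func main() {
	code, _ := serveRange("")
	fmt.Println("full GET:", code) // 200

	code, cr := serveRange("bytes=10-19")
	fmt.Println("range GET:", code, cr) // 206 bytes 10-19/20
}
```

This is why the changelog can claim seek "works end-to-end" without custom Range parsing: the stdlib handles 206 slicing, `Accept-Ranges` and conditional GETs as long as the handler passes an `io.ReadSeeker`.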
#### Fix 2 — Fake email verification (`veza-backend-api` + `docker-compose.*`)

- `core/auth/service.go`: the hard-coded `IsVerified: true` on registration
  is gone. New users start as `is_verified=false` and the existing
  `/auth/verify-email` endpoint (unchanged) flips them once they click the
  link. `TestLogin_EmailNotVerified` now asserts the correct `403`
  behaviour instead of silently accepting unverified logins.
- Registration actually calls `emailService.SendVerificationEmail(...)`
  (previously the code just ran `logger.Info("Sending verification email")`
  without sending anything). On SMTP failure, the handler returns `500` in
  production (fail-loud) and logs a warning in development so local
  sign-ups keep flowing. Same treatment for
  `password_reset_handler.RequestPasswordReset` — the log-only "don't fail
  the user" path is gone in prod.
- New helper `isProductionEnv()` centralises the
  `APP_ENV=="production"` check in both `core/auth` and `handlers`.
- `docker-compose.yml` and `docker-compose.dev.yml` now ship MailHog
  (`mailhog/mailhog:v1.0.1`, SMTP 1025, UI 8025). Backend dev env vars
  `SMTP_HOST=mailhog SMTP_PORT=1025` come pre-wired.
- Tests: all six `auth` tests adapted to the new async flow
  (`expectRegister` adds a `SendVerificationEmail` mock; the `Login_Success`
  tests manually flip `is_verified` after `Register` to simulate the click
  on the verification link).
#### Fix 3 — Free marketplace (`internal/config/config.go`)

- `ValidateForEnvironment` now refuses `APP_ENV=production` with
  `HYPERSWITCH_ENABLED=false`. Without payments enabled, the marketplace
  flow completes orders as `CREATED` and releases files without charging —
  effectively free. The guard is loud ("...effectively giving away
  products. Set HYPERSWITCH_ENABLED=true...") because a silent misconfig
  here is a revenue leak.
- Called at boot from `NewConfig()` (line 513) — config validation happens
  before any HTTP listener starts, so a bad prod config fails fast.
- Tests: 3 new cases (`_fails`, `_succeeds`, `non-production is
  unaffected`) in `validation_test.go`.
#### Fix 4 — Redis mandatory for multi-pod (`config.go` + `chat_pubsub.go`)

- The same `ValidateForEnvironment` now requires `REDIS_URL` to be
  **explicitly** set in production. The struct field has a default
  (`redis://<appDomain>:6379`) that let misconfigured pods boot against
  a phantom host and silently degrade to in-memory PubSub — which is
  fine on one pod and catastrophic on two (chat messages on pod A never
  reach subscribers on pod B).
- The `ChatPubSubService` constructor now emits `ERROR` (was silent) when
  `redisClient` is nil, with a message explicitly naming the failure
  mode: "cross-instance messages will be lost". Same treatment for the
  `Publish` fallbacks — `Warn` → `Error`, because this is runbook-worthy.
- Tests: `chat_pubsub_test.go` added (constructor log assertion +
  in-memory fan-out happy path) plus 1 new case in `validation_test.go`.
#### Fix 5 — Maintenance mode persisted in DB (`middleware/maintenance.go`)

- Migration `976_platform_settings.sql` introduces a typed key/value
  table and seeds `maintenance_mode=false`. The value column is split into
  `value_bool` / `value_text` so the hot path avoids string parsing.
- `middleware/maintenance.go` rewritten. `InitMaintenanceMode(db,
  logger)` wires a DB pool at boot; `MaintenanceModeEnabled()` reads
  from a 10-second TTL cache and refreshes lazily on the next request.
  Toggling on one pod propagates to every pod within ~10 s.
- Admin endpoint `PUT /api/v1/admin/maintenance` now persists via
  `INSERT ... ON CONFLICT DO UPDATE` before calling the in-memory
  setter, so the change survives restarts and is visible cluster-wide.
- Tests: new `TestMaintenanceGin_DBBacked` flips the DB row, waits
  past the TTL, and asserts the cache picked up the change. Existing
  tests preserved.
#### Fix 6 — Orphan track cleanup (`internal/jobs/`)

- New `CleanupOrphanTracks` worker. Tracks stuck in `processing` for
  more than one hour with no file on disk (uploader crashed, container
  restart during upload, disk wipe) flip to `status=failed` with
  `status_message = "orphan cleanup: file missing on disk after >1h in
  processing"`. It never deletes the row, never touches present files or
  already-failed rows, and is safe to re-run.
- `ScheduleOrphanTracksCleanup(db, logger)` runs once at boot and then
  hourly — wired in `cmd/api/main.go` alongside the HTTP listener.
- Tests: 5 cases in `cleanup_orphan_tracks_test.go` covering the happy
  path and four negatives (file still present, track too recent, already
  failed, nil database).
#### Fix 7 — Response cache corrupting binary media (`middleware/response_cache.go`)

Surfaced by the v1.0.5 browser smoke test. `ResponseCache` captures the
entire body into a `bytes.Buffer`, JSON-serialises it (escaping non-UTF-8
bytes) and replays via `c.Data` for subsequent hits. For `/stream`,
`/download` and `/hls/*` this had two failure modes:

1. `Range` headers were never honoured — the cache replayed the full
   body on every request, stripped `Accept-Ranges`, and left the
   `<audio>` element unable to seek. A `Range: bytes=100-299` request
   got back `200 OK` with 48,944 bytes instead of `206` with 200.
2. Non-UTF-8 bytes got escaped through the JSON round-trip
   (`\uFFFD` substitution etc.), corrupting the MP3 payload so even
   full plays could fail mid-stream (the served body's MD5 diverged
   from the source file).

Fix: skip the cache entirely for any path containing `/stream`,
`/download` or `/hls/`, and for any request carrying a `Range` header
(belt-and-suspenders for any future media endpoint). All other
anonymous GETs keep their 5-minute TTL.
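The skip rule is a one-function predicate; a sketch under assumed naming (`shouldBypassCache` is illustrative, the real middleware sits in `response_cache.go`):

```go
package main

import (
	"fmt"
	"strings"
)

// shouldBypassCache: any media-ish path or any request carrying a Range
// header goes straight to the handler; everything else stays eligible
// for the 5-minute anonymous-GET cache.
func shouldBypassCache(path, rangeHeader string) bool {
	if rangeHeader != "" {
		return true // belt-and-suspenders for future media endpoints
	}
	for _, marker := range []string{"/stream", "/download", "/hls/"} {
		if strings.Contains(path, marker) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldBypassCache("/api/v1/tracks/42/stream", "")) // true: media path
	fmt.Println(shouldBypassCache("/api/v1/feed", "bytes=0-99"))   // true: Range header
	fmt.Println(shouldBypassCache("/api/v1/feed", ""))             // false: cacheable
}
```

Checking the Range header independently of the path is what makes this future-proof: a new media endpoint added under a different prefix still bypasses the cache the moment a client seeks.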
Live verification after patch:

- Full GET: `200 OK`, `Accept-Ranges: bytes`, `Content-Length: 48944`,
  served body MD5 matches the source file byte-for-byte.
- Range `100-299`: `206 Partial Content`,
  `Content-Range: bytes 100-299/48944`, exactly 200 bytes.
- Browser `<audio>.play()` succeeds, `currentTime` progresses,
  `seek(1.5)` accepted (`readyState=4`, no error).
### Production guards summary

`config.go:886 Validate()` (base) + `config.go:810 ValidateForEnvironment()`
(per-env) — the prod branch now rejects boot if any of:

- `CORS_ALLOWED_ORIGINS` missing or contains `*`
- `LOG_LEVEL=DEBUG`
- `CLAMAV_REQUIRED != true`
- `CHAT_JWT_SECRET == JWT_SECRET`
- `OAUTH_ENCRYPTION_KEY` shorter than 32 bytes
- `JWT_ISSUER` / `JWT_AUDIENCE` empty
- **`HYPERSWITCH_ENABLED != true`** (new)
- **`REDIS_URL` not explicitly set** (new)
### Known gaps (parked for v1.0.6)

- Hyperswitch refund path doesn't propagate to the PSP
- Livestream has no UI feedback when `nginx-rtmp` is down
- Upload size mismatch (front 500 MB, back 100 MB)
- RabbitMQ silently drops messages on enqueue failure
- `SMTP_HOST` not injected in `make dev` (host-mode ergonomics, not a
  code bug — the SMTP_HOST env var is only wired into the `docker-dev`
  profile, where the backend runs in-container)
- Upload route gated by the `creator` role with no self-service path to
  the role — new users can't upload without manual DB escalation
## [v1.0.4] - 2026-04-15

### Cleanup sprint — 7 days of post-audit cleanup
2	VERSION
@@ -1 +1 @@
-1.0.4
+1.0.5