12 commits

Author SHA1 Message Date
senke
15e591305e feat(cdn): Bunny.net signed URLs + HLS cache headers + metric collision fix (W3 Day 13)
CDN edge in front of S3/MinIO via origin-pull. Backend signs URLs
with Bunny.net token-auth (SHA-256 over security_key + path + expires)
so edges verify the token before serving cached objects ; the origin is
only contacted on a cache miss, never to validate a token. Cloudflare
CDN / R2 / CloudFront stubs kept.
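
The signing scheme described above can be sketched as follows. This is a
minimal illustration, not the repository's generateBunnySignedURL : the
function name, its parameters and the example host are assumptions, and
optional inputs to Bunny token-auth (client IP, signed query parameters)
are left out. The path is assumed to be already URL-safe.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"strconv"
)

// signBunnyURL is a hypothetical helper. baseURL is the Pull Zone origin
// without a trailing slash, path starts with "/", and expiresUnix is a
// Unix timestamp in seconds.
func signBunnyURL(baseURL, securityKey, path string, expiresUnix int64) (string, error) {
	if securityKey == "" {
		// Fail fast rather than silently fall back to an unsigned URL.
		return "", errors.New("cdn: security key is empty")
	}
	expires := strconv.FormatInt(expiresUnix, 10)
	// SHA-256 over security_key + path + expires, as the text describes.
	sum := sha256.Sum256([]byte(securityKey + path + expires))
	// RawURLEncoding is the URL-safe base64 alphabet without '=' padding.
	token := base64.RawURLEncoding.EncodeToString(sum[:])
	return baseURL + path + "?token=" + token + "&expires=" + expires, nil
}
```

Calling signBunnyURL("https://cdn.example.com", key, "/hls/a/seg0.ts", exp)
returns the edge URL with token and expires appended ; an empty key
returns an error instead of an unsigned URL.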

- internal/services/cdn_service.go : new providers CDNProviderBunny +
  CDNProviderCloudflareR2. SecurityKey added to CDNConfig.
  generateBunnySignedURL implements the documented Bunny scheme
  (url-safe base64, no padding, expires query). HLSSegmentCacheHeaders
  + HLSPlaylistCacheHeaders helpers exported for handlers.
- internal/services/cdn_service_test.go : pin Bunny URL shape +
  base64-url charset ; assert empty SecurityKey fails fast (no
  silent fallback to unsigned URLs).
- internal/core/track/service.go : new CDNURLSigner interface +
  SetCDNService(cdn). GetStorageURL prefers CDN signed URL when
  cdnService.IsEnabled, falls back to direct S3 presign on signing
  error so a CDN partial outage doesn't block playback.
- internal/api/routes_tracks.go + routes_core.go : wire SetCDNService
  on the two TrackService construction sites that serve stream/download.
- internal/config/config.go : 4 new env vars (CDN_ENABLED, CDN_PROVIDER,
  CDN_BASE_URL, CDN_SECURITY_KEY). config.CDNService always non-nil
  after init ; IsEnabled gates the actual usage.
- internal/handlers/hls_handler.go : segments now return
  Cache-Control: public, max-age=86400, immutable (content-addressed
  filenames make this safe). Playlists at max-age=60.
- veza-backend-api/.env.template : 4 placeholder env vars.
- docs/ENV_VARIABLES.md §12 : provider matrix + Bunny vs Cloudflare
  vs R2 trade-offs.
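
The HLS cache policy listed above could look roughly like this. The
function name, the extension matching, the "public" directive on
playlists and the "no-store" default are assumptions for illustration ;
only the segment header value and the 60s playlist max-age come from the
commit message.

```go
package main

import "strings"

// cacheControlFor is a hypothetical stand-in for the exported
// HLSSegmentCacheHeaders / HLSPlaylistCacheHeaders helpers.
func cacheControlFor(filename string) string {
	switch {
	case strings.HasSuffix(filename, ".m3u8"):
		// Playlists change as renditions are added, so keep them short-lived.
		return "public, max-age=60"
	case strings.HasSuffix(filename, ".ts"), strings.HasSuffix(filename, ".m4s"):
		// Content-addressed segment names never change content, so a long
		// immutable lifetime is safe.
		return "public, max-age=86400, immutable"
	default:
		// Assumption: anything else is not cached at all.
		return "no-store"
	}
}
```

A handler would set the returned value as the Cache-Control response
header before writing the body.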

Bug fix collateral : v1.0.9 Day 11 introduced veza_cache_hits_total
which collided in name with monitoring.CacheHitsTotal (different
label set ⇒ promauto MustRegister panic at process init). Day 13
deletes the monitoring duplicate and restores the metrics-package
counter as the single source of truth (label: subsystem). All 8
affected packages green : services, core/track, handlers, middleware,
websocket/chat, metrics, monitoring, config.

Acceptance (Day 13) : code path is wired ; verifying via real Bunny
edge requires a Pull Zone provisioned by the user (EX-? in roadmap).
On the user side : create Pull Zone w/ origin = MinIO, copy token
auth key into CDN_SECURITY_KEY, set CDN_ENABLED=true.

W3 progress : Redis Sentinel ✓ · distributed MinIO ✓ · CDN ✓ ·
DMCA (Day 14) · embed (Day 15).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 14:07:20 +02:00
senke
d86815561c feat(infra): MinIO distributed EC:2 + migration script (W3 Day 12)
Four-node distributed MinIO cluster, single erasure set EC:2, tolerates
2 simultaneous node losses. 50% storage efficiency. Pinned to
RELEASE.2025-09-07T16-13-09Z to match docker-compose so dev/prod
parity is preserved.
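
The capacity and tolerance figures above follow from simple shard
arithmetic, sketched here. The erasureSet type and its methods are
illustrative only. The write-quorum rule (one extra shard when parity is
exactly half the set) is how MinIO's erasure-coding documentation is
understood here and should be treated as an assumption to verify.

```go
package main

import "fmt"

// erasureSet describes one erasure set; it is not part of MinIO or the repo.
type erasureSet struct {
	Drives int // drives in the set (here: one per node)
	Parity int // parity shards per object, the N in EC:N
}

func (e erasureSet) validate() error {
	if e.Drives < 2 || e.Parity < 0 || e.Parity > e.Drives/2 {
		return fmt.Errorf("invalid erasure set: %d drives, EC:%d", e.Drives, e.Parity)
	}
	return nil
}

// dataShards is the number of shards that carry object data.
func (e erasureSet) dataShards() int { return e.Drives - e.Parity }

// efficiency is usable capacity divided by raw capacity.
func (e erasureSet) efficiency() float64 {
	return float64(e.dataShards()) / float64(e.Drives)
}

// readQuorum is the number of drives needed to reconstruct an object.
func (e erasureSet) readQuorum() int { return e.dataShards() }

// writeQuorum assumes MinIO's rule that a set with parity == drives/2
// needs one drive more than the data shard count to accept writes.
func (e erasureSet) writeQuorum() int {
	if e.Parity*2 == e.Drives {
		return e.dataShards() + 1
	}
	return e.dataShards()
}
```

For erasureSet{Drives: 4, Parity: 2} this gives an efficiency of 0.5 and
a read quorum of 2, so reads survive two lost nodes. Under the stated
write-quorum assumption, writes would need three of the four nodes.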

- infra/ansible/roles/minio_distributed/ : install pinned binary,
  systemd unit pointed at MINIO_VOLUMES with bracket-expansion form,
  EC:2 forced via MINIO_STORAGE_CLASS_STANDARD. Vault assertion
  blocks shipping placeholder credentials to staging/prod.
- bucket init : creates veza-prod-tracks, enables versioning, applies
  lifecycle.json (30d noncurrent expiry + 7d abort-multipart). Cold-tier
  transition ready but inert until minio_remote_tier_name is set.
- infra/ansible/playbooks/minio_distributed.yml : provisions the 4
  containers, applies common baseline + role.
- infra/ansible/inventory/lab.yml : new minio_nodes group.
- infra/ansible/tests/test_minio_resilience.sh : kill 2 nodes,
  verify EC:2 reconstruction (read OK + checksum matches), restart,
  wait for self-heal.
- scripts/minio-migrate-from-single.sh : mc mirror --preserve from
  the single-node bucket to the new cluster, count-verifies, prints
  rollout next-steps.
- config/prometheus/alert_rules.yml : MinIODriveOffline (warn) +
  MinIONodesUnreachable (page) — page fires at >= 2 nodes unreachable
  because that's the redundancy ceiling for EC:2.
- docs/ENV_VARIABLES.md §12 : MinIO migration cross-ref.

Acceptance (Day 12) : EC:2 survives 2 concurrent kills + self-heals.
Lab apply pending. No backend code change — interface stays AWS S3.

W3 progress : Redis Sentinel ✓ (Day 11), distributed MinIO ✓ (this commit),
CDN (Day 13), DMCA (Day 14), embed (Day 15).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:46:42 +02:00
senke
a36d9b2d59 feat(redis): Sentinel HA + cache hit rate metrics (W3 Day 11)
Three Incus containers, each running redis-server + redis-sentinel
(co-located). redis-1 = master at first boot, redis-2/3 = replicas.
Sentinel quorum=2 of 3 ; failover-timeout=30s satisfies the W3
acceptance criterion.

- internal/config/redis_init.go : initRedis branches on
  REDIS_SENTINEL_ADDRS ; non-empty -> redis.NewFailoverClient with
  MasterName + SentinelAddrs + SentinelPassword. Empty -> existing
  single-instance NewClient (dev/local stays parametric).
- internal/config/config.go : 3 new fields (RedisSentinelAddrs,
  RedisSentinelMasterName, RedisSentinelPassword) read from env.
  parseRedisSentinelAddrs trims+filters CSV.
- internal/metrics/cache_hit_rate.go : new RecordCacheHit / Miss
  counters, labelled by subsystem. Cardinality bounded.
- internal/middleware/rate_limiter.go : instrument 3 Eval call sites
  (DDoS, frontend log throttle, upload throttle). Hit = Redis answered,
  Miss = error -> in-memory fallback.
- internal/services/chat_pubsub.go : instrument Publish + PublishPresence.
- internal/websocket/chat/presence_service.go : instrument SetOnline /
  SetOffline / Heartbeat / GetPresence. redis.Nil counts as a hit
  (legitimate empty result).
- infra/ansible/roles/redis_sentinel/ : install Redis 7 + Sentinel,
  render redis.conf + sentinel.conf, systemd units. Vault assertion
  prevents shipping placeholder passwords to staging/prod.
- infra/ansible/playbooks/redis_sentinel.yml : provisions the 3
  containers + applies common baseline + role.
- infra/ansible/inventory/lab.yml : new groups redis_ha + redis_ha_master.
- infra/ansible/tests/test_redis_failover.sh : kills the master
  container, polls Sentinel for the new master, asserts elapsed < 30s.
- config/grafana/dashboards/redis-cache-overview.json : 3 hit-rate
  stats (rate_limiter / chat_pubsub / presence) + ops/s breakdown.
- docs/ENV_VARIABLES.md §3 : 3 new REDIS_SENTINEL_* env vars.
- veza-backend-api/.env.template : 3 placeholders (empty default).
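
The CSV handling and the Sentinel/single-instance branch described above
can be sketched like this. The function names are hypothetical (the
commit's parseRedisSentinelAddrs is not shown in the log), and the actual
client construction via go-redis is omitted so the sketch stays
standard-library only.

```go
package main

import "strings"

// parseSentinelAddrs trims each comma-separated entry and drops empty ones.
func parseSentinelAddrs(raw string) []string {
	var addrs []string
	for _, part := range strings.Split(raw, ",") {
		if addr := strings.TrimSpace(part); addr != "" {
			addrs = append(addrs, addr)
		}
	}
	return addrs
}

// useSentinel mirrors the branch in the text: a non-empty address list
// selects the failover client, an empty one keeps the single-instance client.
func useSentinel(rawAddrs string) bool {
	return len(parseSentinelAddrs(rawAddrs)) > 0
}
```

With REDIS_SENTINEL_ADDRS set to " redis-1:26379, ,redis-2:26379," the
parser yields two addresses ; an empty or whitespace-only value keeps the
existing single-instance path.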

Acceptance (Day 11) : Sentinel failover < 30s ; cache hit-rate
dashboard populated. Lab test pending Sentinel deployment.
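
The hit/miss rule behind that dashboard (Redis answered = hit, including
an empty reply ; any other error = miss) can be sketched without
Prometheus as below. The cacheStats type and both sentinel errors are
stand-ins invented for the example ; the real code uses labelled counters
and redis.Nil.

```go
package main

import (
	"errors"
	"sync"
)

var (
	errNil         = errors.New("nil reply")   // stand-in for redis.Nil
	errUnavailable = errors.New("unavailable") // stand-in for a connection error
)

// cacheStats counts hits and misses per subsystem label.
type cacheStats struct {
	mu     sync.Mutex
	hits   map[string]int
	misses map[string]int
}

func newCacheStats() *cacheStats {
	return &cacheStats{hits: map[string]int{}, misses: map[string]int{}}
}

// record classifies one Redis call: no error or an empty reply is a hit,
// anything else is a miss (the caller then uses its in-memory fallback).
func (s *cacheStats) record(subsystem string, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if err == nil || errors.Is(err, errNil) {
		s.hits[subsystem]++
		return
	}
	s.misses[subsystem]++
}

// hitRate returns hits / (hits + misses), or 0 when nothing was recorded.
func (s *cacheStats) hitRate(subsystem string) float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	total := s.hits[subsystem] + s.misses[subsystem]
	if total == 0 {
		return 0
	}
	return float64(s.hits[subsystem]) / float64(total)
}
```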

W3 verification gate progress : Redis Sentinel ✓ (this commit),
MinIO EC4+2 (Day 12), CDN (Day 13), DMCA (Day 14), embed (Day 15).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:36:55 +02:00
senke
84e92a75e2 feat(observability): OTel SDK + collector + Tempo + 4 hot path spans (W2 Day 9)
Wires distributed tracing end-to-end. Backend exports OTLP/gRPC to a
collector, which tail-samples (errors + slow always, 10% rest) and
ships to Tempo. Grafana service-map dashboard pivots on the 4
instrumented hot paths.

- internal/tracing/otlp_exporter.go : InitOTLPTracer + Provider.Shutdown,
  BatchSpanProcessor (5s/512 batch), ParentBased(TraceIDRatio) sampler,
  W3C trace-context + baggage propagators. OTEL_SDK_DISABLED=true
  short-circuits to a no-op. Failure to dial collector is non-fatal.
- cmd/api/main.go : init at boot, defer Shutdown(5s) on exit. appVersion
  ldflag-overridable for resource attributes.
- 4 hot paths instrumented :
    * handlers/auth.go::Login           → "auth.login"
    * core/track/track_upload_handler.go::InitiateChunkedUpload → "track.upload.initiate"
    * core/marketplace/service.go::ProcessPaymentWebhook → "payment.webhook"
    * handlers/search_handlers.go::Search → "search.query"
  PII guarded — email masked, query content not recorded (length only).
- infra/ansible/roles/otel_collector : pin v0.116.1 contrib build,
  systemd unit, tail-sampling config (errors + > 500ms always kept).
- infra/ansible/roles/tempo : pin v2.7.1 monolithic, local-disk backend
  (S3 deferred to v1.1), 14d retention.
- infra/ansible/playbooks/observability.yml : provisions both Incus
  containers + applies common baseline + roles in order.
- inventory/lab.yml : new groups observability, otel_collectors, tempo.
- config/grafana/dashboards/service-map.json : node graph + 4 hot-path
  span tables + collector throughput/queue panels.
- docs/ENV_VARIABLES.md §30 : 4 OTEL_* env vars documented.
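
The PII guard mentioned for the hot-path spans (email masked, query length
only) might be implemented along these lines. The masking format and both
function names are assumptions ; the commit does not say how the email is
masked.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// maskEmail keeps the first character of the local part and the domain,
// and hides the rest. Anything without a usable "@" is fully masked.
func maskEmail(email string) string {
	at := strings.LastIndex(email, "@")
	if at <= 0 {
		return "***"
	}
	_, size := utf8.DecodeRuneInString(email)
	return email[:size] + "***" + email[at:]
}

// searchQueryLength is what would be recorded on the span instead of the
// query text itself.
func searchQueryLength(query string) int {
	return utf8.RuneCountInString(query)
}
```

The login span would then carry maskEmail(email) and the search span
searchQueryLength(q) as attributes, never the raw values.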

Acceptance criterion (Day 9) : login → span visible in Tempo UI. Lab
deployment to validate with `ansible-playbook -i inventory/lab.yml
playbooks/observability.yml` once roles/postgres_ha is up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 01:15:11 +02:00
senke
b8eed72f96 feat(webrtc): coturn ICE config endpoint + frontend wiring + ops template (v1.0.9 item 1.2)
Closes FUNCTIONAL_AUDIT.md §4 #1: WebRTC 1:1 calls had working
signaling but no NAT traversal, so calls between two peers behind
symmetric NAT (corporate firewalls, mobile carrier CGNAT, Incus
container default networking) failed silently after the SDP exchange.

Backend:
  - GET /api/v1/config/webrtc (public) returns {iceServers: [...]}
    built from WEBRTC_STUN_URLS / WEBRTC_TURN_URLS / *_USERNAME /
    *_CREDENTIAL env vars. Half-config (URLs without creds, or vice
    versa) deliberately omits the TURN block — a half-configured TURN
    surfaces auth errors at call time instead of falling back cleanly
    to STUN-only.
  - 4 handler tests cover the matrix.
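
A minimal sketch of the half-config rule described above, assuming the
URL env vars hold comma-separated lists (the log does not state the
format). The function takes already-read values rather than env var
names, since the full names of the username/credential variables are
abbreviated in this message. All identifiers are hypothetical.

```go
package main

import "strings"

// iceServer matches the shape of an RTCIceServer entry in the browser API.
type iceServer struct {
	URLs       []string `json:"urls"`
	Username   string   `json:"username,omitempty"`
	Credential string   `json:"credential,omitempty"`
}

// splitCSV trims entries and drops empty ones.
func splitCSV(raw string) []string {
	var out []string
	for _, part := range strings.Split(raw, ",") {
		if v := strings.TrimSpace(part); v != "" {
			out = append(out, v)
		}
	}
	return out
}

// buildICEServers emits the TURN block only when URLs, username and
// credential are all present; a half-configured TURN is dropped so the
// client falls back cleanly to STUN-only.
func buildICEServers(stunURLs, turnURLs, turnUser, turnCred string) []iceServer {
	var servers []iceServer
	if stun := splitCSV(stunURLs); len(stun) > 0 {
		servers = append(servers, iceServer{URLs: stun})
	}
	turn := splitCSV(turnURLs)
	if len(turn) > 0 && turnUser != "" && turnCred != "" {
		servers = append(servers, iceServer{URLs: turn, Username: turnUser, Credential: turnCred})
	}
	return servers
}
```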

Frontend:
  - services/api/webrtcConfig.ts caches the config for the page
    lifetime and falls back to the historical hardcoded Google STUN
    if the fetch fails.
  - useWebRTC fetches at mount, hands iceServers synchronously to
    every RTCPeerConnection, exposes a {hasTurn, loaded} hint.
  - CallButton tooltip warns up-front when TURN isn't configured
    instead of letting calls time out silently.

Ops:
  - infra/coturn/turnserver.conf — annotated template with the SSRF-
    safe denied-peer-ip ranges, prometheus exporter, TLS for TURNS,
    static lt-cred-mech (REST-secret rotation deferred to v1.1).
  - infra/coturn/README.md — Incus deploy walkthrough, smoke test
    via turnutils_uclient, capacity rules of thumb.
  - docs/ENV_VARIABLES.md gains a "13bis. WebRTC ICE servers" section.

Coturn deployment itself is a separate ops action — this commit lands
the plumbing so the deploy can light up the path with zero code
changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:38:42 +02:00
senke
d03232c85c feat(storage): add track storage_backend column + config prep (v1.0.8 P0)
Phase 0 of the MinIO upload migration (FUNCTIONAL_AUDIT §4 item 2).
Schema + config only — Phase 1 will wire TrackService.UploadTrack()
to actually route writes to S3 when the flag is flipped.

Schema (migration 985):
- tracks.storage_backend VARCHAR(16) NOT NULL DEFAULT 'local'
  CHECK in ('local', 's3')
- tracks.storage_key VARCHAR(512) NULL (S3 object key when backend=s3)
- Partial index on storage_backend = 's3' (migration progress queries)
- Rollback drops both columns + index; safe only while all rows are
  still 'local' (guard query in the rollback comment)

Go model (internal/models/track.go):
- StorageBackend string (default 'local', not null)
- StorageKey *string (nullable)
- Both tagged json:"-" — internal plumbing, never exposed publicly

Config (internal/config/config.go):
- New field Config.TrackStorageBackend
- Read from TRACK_STORAGE_BACKEND env var (default 'local')
- Production validation rule #11 (ValidateForEnvironment):
  - Must be 'local' or 's3' (reject typos like 'S3' or 'minio')
  - If 's3', requires AWS_S3_ENABLED=true (fail fast, do not boot with
    TrackStorageBackend=s3 while S3StorageService is nil)
- Dev/staging warns and falls back to 'local' instead of failing, which
  keeps iteration fast while still flagging misconfig.
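
The validation rule above can be sketched as one function. Its name,
signature and the literal "production" environment value are assumptions ;
the real check lives in ValidateForEnvironment and is not shown in this
log.

```go
package main

import "fmt"

// resolveTrackStorageBackend returns the backend to use, an optional
// warning for non-production environments, or an error in production.
func resolveTrackStorageBackend(env, backend string, s3Enabled bool) (effective, warning string, err error) {
	if backend == "" {
		backend = "local" // default when TRACK_STORAGE_BACKEND is unset
	}
	var problem string
	switch {
	case backend != "local" && backend != "s3":
		// Case-sensitive on purpose: 'S3' and 'minio' are rejected.
		problem = fmt.Sprintf("TRACK_STORAGE_BACKEND=%q must be 'local' or 's3'", backend)
	case backend == "s3" && !s3Enabled:
		problem = "TRACK_STORAGE_BACKEND=s3 requires AWS_S3_ENABLED=true"
	default:
		return backend, "", nil
	}
	if env == "production" {
		// Fail fast: do not boot with a backend that cannot work.
		return "", "", fmt.Errorf("config: %s", problem)
	}
	// Dev/staging: warn and fall back instead of refusing to start.
	return "local", problem + "; falling back to 'local'", nil
}
```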

Docs:
- docs/ENV_VARIABLES.md §13 restructured as "HLS + track storage backend"
  with a migration playbook (local → s3 → migrate-storage CLI)
- docs/ENV_VARIABLES.md §28 validation rules: +2 entries for new rules
- docs/ENV_VARIABLES.md §29 drift findings: TRACK_STORAGE_BACKEND added
  to "missing from template" list before it was fixed
- veza-backend-api/.env.template: TRACK_STORAGE_BACKEND=local with
  comment pointing at Phase 1/2/3 plans

No behavior change yet — TrackService.UploadTrack() still hardcodes the
local path via copyFileAsync(). Phase 1 wires it.

Refs:
- AUDIT_REPORT.md §9 item (deferrals v1.0.8)
- FUNCTIONAL_AUDIT.md §4 item 2 "Stockage local disque only" (local-disk-only storage)
- /home/senke/.claude/plans/audit-fonctionnel-wild-hickey.md Item 3

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:54:28 +02:00
senke
7d03ee6686 docs(env): canonicalize ENV_VARIABLES.md + add HLS_STREAMING template
Resolves AUDIT_REPORT §9 item #15 (last real item before v1.0.7 final)
and FUNCTIONAL_AUDIT §4 stability item 5.

docs/ENV_VARIABLES.md:
- Complete rewrite from 172 → ~600 lines covering all ~180 env vars
  surveyed directly from code (os.Getenv in Go, std::env::var in Rust,
  import.meta.env in React).
- 30 sections: core, DB, Redis, JWT, OAuth, CORS, rate-limit, SMTP,
  Hyperswitch, Stripe Connect, RabbitMQ, S3/MinIO, HLS, stream server,
  Elasticsearch, ClamAV, Sentry, logging, metrics, frontend Vite,
  feature flags, password policy, build info, RTMP/misc, Rust stream
  schema, security headers recap, deprecated vars, prod validation
  rules, drift findings, startup checklist.
- Documents 8 production-critical validation rules (validation.go:869-1018).
- Flags 14 deprecated vars with canonical replacements for v1.1.0 cleanup.
- Catalogs 11 vars used by code but missing from template (HLS_STREAMING,
  SLOW_REQUEST_THRESHOLD_MS, CONFIG_WATCH, HANDLER_TIMEOUT, VAPID_*, etc).

veza-backend-api/.env.template:
- Add HLS_STREAMING=false with documentation of fallback behavior
  (/tracks/:id/stream with Range support when off).
- Add HLS_STORAGE_DIR=/tmp/veza-hls.

Closes last blocker before v1.0.7 final tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:36:44 +02:00
senke
eb2862092d feat(v0.10.6): Basic livestreaming F471-F476
- Backend: callbacks on_publish/on_publish_done, UpdateStreamURL, GetByStreamKey
- Nginx-RTMP: infra config, docker-compose service ("live" profile)
- Frontend: stream_url in LiveStream, HLS.js in LiveViewPlayer, "Stream ended" state
- Chat: send_live_message rate limit of 1 msg/3s for live_streams rooms
- Env: RTMP_CALLBACK_SECRET, STREAM_HLS_BASE_URL, NGINX_RTMP_HOST
- Roadmap v0.10.6 marked DONE
2026-03-10 10:21:57 +01:00
senke
171a154763 feat(v0.10.2): Elasticsearch full-text search - F361-F365
- Elasticsearch 8.x in docker-compose.dev
- Package internal/elasticsearch: client, config, mappings, indices
- PG→ES sync: reindex tracks/users/playlists, IndexTrack/DeleteTrack
- ES SearchService: multi_match + fuzziness (typo tolerance), highlighting
- Graceful fallback: PostgreSQL if ELASTICSEARCH_URL is unset
- Routes: GET /search, GET /search/suggestions, POST /admin/search/reindex
- Frontend: searchApi cursor/limit params (extensibility)
- docs/ENV_VARIABLES: ELASTICSEARCH_URL, ELASTICSEARCH_INDEX, ELASTICSEARCH_AUTO_INDEX
- Roadmap v0.10.2 → DONE
2026-03-09 10:13:18 +01:00
senke
5197bd24ee v0.9.3
2026-03-05 19:35:57 +01:00
senke
b6c004319c v0.9.2
2026-03-05 19:27:34 +01:00
senke
2df921abd5 v0.9.1
2026-03-05 19:22:31 +01:00