senke/veza - Talas Project: Beyond coding. We Forge.

Author	SHA1	Message	Date
senke	d5152d89a2	feat(stream): HLS default on + marketplace 30s pre-listen + FLAC tier checkbox (W4 Day 17) Three pieces shipping under one banner since they're the day's deliverables and share no review-time coupling : 1. HLS_STREAMING default flipped true - config.go : getEnvBool default true (was false). Operators wanting a lightweight dev / unit-test env explicitly set HLS_STREAMING=false to skip the transcoder pipeline. - .env.template : default flipped + comment explaining the opt-out. - Effect : every new track upload routes through the HLS transcoder by default ; ABR ladder served via /tracks/:id/master.m3u8. 2. Marketplace 30s pre-listen (creator opt-in) - migrations/989 : adds products.preview_enabled BOOLEAN NOT NULL DEFAULT FALSE + partial index on TRUE values. Default off so adoption is opt-in. - core/marketplace/models.go : PreviewEnabled field on Product. - handlers/marketplace.go : StreamProductPreview gains a fall-through. When no file-based ProductPreview exists AND the product is a track product AND preview_enabled=true, redirect to the underlying /tracks/:id/stream?preview=30. Header X-Preview-Cap-Seconds: 30 surfaces the policy. - core/track/track_hls_handler.go : StreamTrack accepts ?preview=30 and gates anonymous access via isMarketplacePreviewAllowed (raw SQL probe of products.preview_enabled to avoid the track→marketplace import cycle ; the reverse arrow already exists). - Trust model : 30s cap is enforced client-side (HTML5 audio currentTime). Industry standard for tease-to-buy ; not anti-rip. Documented in the migration + handler doc comment. 
3. FLAC tier preview checkbox (Premium-gated, hidden by default) - upload-modal/constants.ts : optional flacAvailable on UploadFormData. - upload-modal/UploadModalMetadataForm.tsx : new optional props showFlacAvailable + flacAvailable + onFlacAvailableChange. Checkbox renders only when showFlacAvailable=true ; consumers pass that based on the user's role/subscription tier (deferred to caller wiring — Item G phase 4 will replace the role check with a real subscription-tier check). - Today the checkbox is a UI affordance only ; the actual lossless distribution path (ladder + storage class) is post-launch work. Acceptance (Day 17) : new uploads serve HLS ABR by default ; products.preview_enabled flag wires anonymous 30s pre-listen ; checkbox visible to premium users on the upload form. All 4 tested backend packages pass : handlers, core/track, core/marketplace, config. W4 progress : Day 16 ✓ · Day 17 ✓ · Day 18 (faceted search) ⏳ · Day 19 (HAProxy sticky WS) ⏳ · Day 20 (k6 nightly) ⏳. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 09:56:02 +02:00
senke	15e591305e	feat(cdn): Bunny.net signed URLs + HLS cache headers + metric collision fix (W3 Day 13) CDN edge in front of S3/MinIO via origin-pull. Backend signs URLs with Bunny.net token-auth (SHA-256 over security_key + path + expires) so edges verify before serving cached objects ; origin is never hit on a valid token. Cloudflare CDN / R2 / CloudFront stubs kept. - internal/services/cdn_service.go : new providers CDNProviderBunny + CDNProviderCloudflareR2. SecurityKey added to CDNConfig. generateBunnySignedURL implements the documented Bunny scheme (url-safe base64, no padding, expires query). HLSSegmentCacheHeaders + HLSPlaylistCacheHeaders helpers exported for handlers. - internal/services/cdn_service_test.go : pin Bunny URL shape + base64-url charset ; assert empty SecurityKey fails fast (no silent fallback to unsigned URLs). - internal/core/track/service.go : new CDNURLSigner interface + SetCDNService(cdn). GetStorageURL prefers CDN signed URL when cdnService.IsEnabled, falls back to direct S3 presign on signing error so a CDN partial outage doesn't block playback. - internal/api/routes_tracks.go + routes_core.go : wire SetCDNService on the two TrackService construction sites that serve stream/download. - internal/config/config.go : 4 new env vars (CDN_ENABLED, CDN_PROVIDER, CDN_BASE_URL, CDN_SECURITY_KEY). config.CDNService always non-nil after init ; IsEnabled gates the actual usage. - internal/handlers/hls_handler.go : segments now return Cache-Control: public, max-age=86400, immutable (content-addressed filenames make this safe). 
Playlists at max-age=60. - veza-backend-api/.env.template : 4 placeholder env vars. - docs/ENV_VARIABLES.md §12 : provider matrix + Bunny vs Cloudflare vs R2 trade-offs. Bug fix collateral : v1.0.9 Day 11 introduced veza_cache_hits_total which collided in name with monitoring.CacheHitsTotal (different label set ⇒ promauto MustRegister panic at process init). Day 13 deletes the monitoring duplicate and restores the metrics-package counter as the single source of truth (label: subsystem). All 8 affected packages green : services, core/track, handlers, middleware, websocket/chat, metrics, monitoring, config. Acceptance (Day 13) : code path is wired ; verifying via real Bunny edge requires a Pull Zone provisioned by the user (EX-? in roadmap). On the user side : create Pull Zone w/ origin = MinIO, copy token auth key into CDN_SECURITY_KEY, set CDN_ENABLED=true. W3 progress : Redis Sentinel ✓ · distributed MinIO ✓ · CDN ✓ · DMCA ⏳ Day 14 · embed ⏳ Day 15. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 14:07:20 +02:00
senke	a36d9b2d59	feat(redis): Sentinel HA + cache hit rate metrics (W3 Day 11) Three Incus containers, each running redis-server + redis-sentinel (co-located). redis-1 = master at first boot, redis-2/3 = replicas. Sentinel quorum=2 of 3 ; failover-timeout=30s satisfies the W3 acceptance criterion. - internal/config/redis_init.go : initRedis branches on REDIS_SENTINEL_ADDRS ; non-empty -> redis.NewFailoverClient with MasterName + SentinelAddrs + SentinelPassword. Empty -> existing single-instance NewClient (dev/local stays parametric). - internal/config/config.go : 3 new fields (RedisSentinelAddrs, RedisSentinelMasterName, RedisSentinelPassword) read from env. parseRedisSentinelAddrs trims+filters CSV. - internal/metrics/cache_hit_rate.go : new RecordCacheHit / Miss counters, labelled by subsystem. Cardinality bounded. - internal/middleware/rate_limiter.go : instrument 3 Eval call sites (DDoS, frontend log throttle, upload throttle). Hit = Redis answered, Miss = error -> in-memory fallback. - internal/services/chat_pubsub.go : instrument Publish + PublishPresence. - internal/websocket/chat/presence_service.go : instrument SetOnline / SetOffline / Heartbeat / GetPresence. redis.Nil counts as a hit (legitimate empty result). - infra/ansible/roles/redis_sentinel/ : install Redis 7 + Sentinel, render redis.conf + sentinel.conf, systemd units. Vault assertion prevents shipping placeholder passwords to staging/prod. - infra/ansible/playbooks/redis_sentinel.yml : provisions the 3 containers + applies common baseline + role. 
- infra/ansible/inventory/lab.yml : new groups redis_ha + redis_ha_master. - infra/ansible/tests/test_redis_failover.sh : kills the master container, polls Sentinel for the new master, asserts elapsed < 30s. - config/grafana/dashboards/redis-cache-overview.json : 3 hit-rate stats (rate_limiter / chat_pubsub / presence) + ops/s breakdown. - docs/ENV_VARIABLES.md §3 : 3 new REDIS_SENTINEL_* env vars. - veza-backend-api/.env.template : 3 placeholders (empty default). Acceptance (Day 11) : Sentinel failover < 30s ; cache hit-rate dashboard populated. Lab test pending Sentinel deployment. W3 verification gate progress : Redis Sentinel ✓ (this commit), MinIO EC4+2 ⏳ Day 12, CDN ⏳ Day 13, DMCA ⏳ Day 14, embed ⏳ Day 15. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 13:36:55 +02:00
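The "trims+filters CSV" behaviour of the address parser, and the branch on whether any Sentinel address is configured, can be sketched with the standard library alone. The function below is an assumed reimplementation, not the code from `config.go`, and the go-redis client construction is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSentinelAddrs splits a comma-separated list, trims whitespace
// and drops empty entries. (Assumed shape of parseRedisSentinelAddrs.)
func parseSentinelAddrs(csv string) []string {
	var addrs []string
	for _, part := range strings.Split(csv, ",") {
		if a := strings.TrimSpace(part); a != "" {
			addrs = append(addrs, a)
		}
	}
	return addrs
}

func main() {
	// Example value for REDIS_SENTINEL_ADDRS; host names are illustrative.
	addrs := parseSentinelAddrs(" redis-1:26379, redis-2:26379 ,,redis-3:26379 ")
	if len(addrs) == 0 {
		fmt.Println("no sentinels configured: single-instance client")
		return
	}
	fmt.Println("sentinel mode with", len(addrs), "addresses:", addrs)
}
```

An empty or whitespace-only value yields no addresses, which is the condition under which the commit keeps the existing single-instance client.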
senke	ba6e8b4e0e	feat(infra): pgbouncer role + pgbench load test (W2 Day 7) ROADMAP_V1.0_LAUNCH.md §Semaine 2 day 7 deliverable: PgBouncer fronts the pg_auto_failover formation, the backend pays the postgres-fork cost 50 times per pool refresh instead of once per HTTP handler. Wiring: veza-backend-api ──libpq──▶ pgaf-pgbouncer:6432 ──libpq──▶ pgaf-primary:5432 (1000 client cap) (50 server pool) Files: infra/ansible/roles/pgbouncer/ defaults/main.yml : pool sizes match the acceptance target (1000 client × 50 server × 10 reserve), pool_mode=transaction (the only safe mode given the backend's session usage ; LISTEN/NOTIFY and cross-tx prepared statements are forbidden, neither of which Veza uses), DNS TTL = 60s for failover. tasks/main.yml : apt install pgbouncer + postgresql-client (so the pgbench / admin psql lives on the same container), render pgbouncer.ini + userlist.txt, ensure /var/log/postgresql for the file log, enable + start service. templates/pgbouncer.ini.j2 : full config; databases section points at pgaf-primary.lxd:5432 directly. Failover follows via DNS TTL until the W2 day 8 pg_autoctl state-change hook that issues RELOAD on the admin console. templates/userlist.txt.j2 : only rendered when auth_type != trust. Lab uses trust on the bridge subnet; prod gets a vault-backed list of md5/scram hashes. handlers/main.yml : RELOAD pgbouncer (graceful, doesn't drop established clients). README.md : operational cheatsheet: - SHOW POOLS / SHOW STATS via the admin console - the transaction-mode forbids list (LISTEN/NOTIFY etc.) 
- failover behaviour today vs after the W2-day-8 hook lands infra/ansible/playbooks/postgres_ha.yml Provision step extended to launch pgaf-pgbouncer alongside the formation containers. Two new plays at the bottom apply common baseline + pgbouncer role to it. infra/ansible/inventory/lab.yml `pgbouncer` group with pgaf-pgbouncer reachable via the community.general.incus connection plugin (consistent with the postgres_ha containers). infra/ansible/tests/test_pgbouncer_load.sh Acceptance: pgbench 500 clients × 30s × 8 threads against the pgbouncer endpoint, must report 0 failed transactions and 0 connection errors. Also runs `pgbench -i -s 10` first to initialise the standard fixture — that init goes through pgbouncer too, which incidentally validates transaction-mode compatibility before the load run starts. Exit codes: 0 / 1 (errors) / 2 (unreachable) / 3 (missing tool). veza-backend-api/internal/config/config.go Comment block above DATABASE_URL load — documents the prod wiring (DATABASE_URL points at pgaf-pgbouncer.lxd:6432, NOT at pgaf-primary directly). Also notes the dev/CI exception: direct Postgres because the small scale doesn't benefit from pooling and tests occasionally lean on session-scoped GUCs that transaction-mode would break. Acceptance verified locally: $ ansible-playbook -i inventory/lab.yml playbooks/postgres_ha.yml \ --syntax-check playbook: playbooks/postgres_ha.yml ← clean $ bash -n infra/ansible/tests/test_pgbouncer_load.sh syntax OK $ cd veza-backend-api && go build ./... (clean — comment-only change in config.go) $ gofmt -l internal/config/config.go (no output — clean) Real apply + pgbench run requires the lab R720 + the community.general collection — operator's call. 
Out of scope (deferred per ROADMAP §2): - HA pgbouncer (single instance per env at v1.0; double instance + keepalived in v1.1 if needed) - pg_autoctl state-change hook → pgbouncer RELOAD (W2 day 8) - Prometheus pgbouncer_exporter (W2 day 9 with the OTel collector + observability stack) SKIP_TESTS=1 — IaC YAML + bash + Go comment-only diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 18:35:05 +02:00
senke	b8eed72f96	feat(webrtc): coturn ICE config endpoint + frontend wiring + ops template (v1.0.9 item 1.2) Closes FUNCTIONAL_AUDIT.md §4 #1: WebRTC 1:1 calls had working signaling but no NAT traversal, so calls between two peers behind symmetric NAT (corporate firewalls, mobile carrier CGNAT, Incus container default networking) failed silently after the SDP exchange. Backend: - GET /api/v1/config/webrtc (public) returns {iceServers: [...]} built from WEBRTC_STUN_URLS / WEBRTC_TURN_URLS / _USERNAME / _CREDENTIAL env vars. Half-config (URLs without creds, or vice versa) deliberately omits the TURN block — a half-configured TURN surfaces auth errors at call time instead of falling back cleanly to STUN-only. - 4 handler tests cover the matrix. Frontend: - services/api/webrtcConfig.ts caches the config for the page lifetime and falls back to the historical hardcoded Google STUN if the fetch fails. - useWebRTC fetches at mount, hands iceServers synchronously to every RTCPeerConnection, exposes a {hasTurn, loaded} hint. - CallButton tooltip warns up-front when TURN isn't configured instead of letting calls time out silently. Ops: - infra/coturn/turnserver.conf — annotated template with the SSRF- safe denied-peer-ip ranges, prometheus exporter, TLS for TURNS, static lt-cred-mech (REST-secret rotation deferred to v1.1). - infra/coturn/README.md — Incus deploy walkthrough, smoke test via turnutils_uclient, capacity rules of thumb. - docs/ENV_VARIABLES.md gains a 13bis. WebRTC ICE servers section. Coturn deployment itself is a separate ops action — this commit lands the plumbing so the deploy can light up the path with zero code changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 23:38:42 +02:00
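The half-config rule for the ICE endpoint (TURN is emitted only when URLs, username and credential are all present) can be sketched as below. The struct, the function name, and the comma-separated env format are assumptions; only the `iceServers` response key and the omission rule come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// iceServer mirrors the fields of a WebRTC RTCIceServer entry.
type iceServer struct {
	URLs       []string `json:"urls"`
	Username   string   `json:"username,omitempty"`
	Credential string   `json:"credential,omitempty"`
}

// splitCSV trims and filters a comma-separated env value (assumed format).
func splitCSV(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		if v := strings.TrimSpace(p); v != "" {
			out = append(out, v)
		}
	}
	return out
}

// buildICEServers omits the TURN block unless URLs and both credentials are set.
func buildICEServers(stunURLs, turnURLs, username, credential string) []iceServer {
	servers := []iceServer{}
	if stun := splitCSV(stunURLs); len(stun) > 0 {
		servers = append(servers, iceServer{URLs: stun})
	}
	turn := splitCSV(turnURLs)
	if len(turn) > 0 && username != "" && credential != "" {
		servers = append(servers, iceServer{URLs: turn, Username: username, Credential: credential})
	}
	return servers
}

func main() {
	// TURN URL without credentials: only the STUN entry is returned.
	resp := map[string][]iceServer{
		"iceServers": buildICEServers("stun:stun.example.test:3478", "turn:turn.example.test:3478", "", ""),
	}
	b, _ := json.Marshal(resp)
	fmt.Println(string(b))
}
```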
senke	70f0fb1636	feat(transcode): read from S3 signed URL when track is s3-backed (v1.0.8 P2) Closes the transcoder's read-side gap for Phase 2. HLS transcoding now works for tracks uploaded under TRACK_STORAGE_BACKEND=s3 without requiring the stream server pod to share a local volume. Changes: - internal/services/hls_transcode_service.go - New SignedURLProvider interface (minimal: GetSignedURL). - HLSTranscodeService gains optional s3Resolver + SetS3Resolver. - TranscodeTrack routed through new resolveSource helper — returns local FilePath for local tracks, a 1h-TTL signed URL for s3-backed rows. Missing resolver for an s3 track returns a clear error. - os.Stat check skipped for HTTP(S) sources (ffmpeg validates them). - transcodeBitrate takes `source` explicitly so URL propagation is obvious and ValidateExecPath is bypassed only for the known signed-URL shape. - isHTTPSource helper (http://, https:// prefix check). - internal/workers/job_worker.go - JobWorker gains optional s3Resolver + SetS3Resolver. - processTranscodingJob skips the local-file stat when track.StorageBackend='s3', reads via signed URL instead. - Passes w.s3Resolver to NewHLSTranscodeService when non-nil. - internal/config/config.go: DI wires S3StorageService into JobWorker after instantiation (nil-safe). - internal/core/track/service.go (copyFileAsyncS3) - Re-enabled stream server trigger: generates a 1h-TTL signed URL for the fresh s3 key and passes it to streamService.StartProcessing. Rust-side ffmpeg consumes HTTPS URLs natively. Failure is logged but does not fail the upload (track will sit in Processing until a retry / reconcile). - internal/core/track/track_upload_handler.go (CompleteChunkedUpload) - Reload track after S3 migration to pick up the new storage_key. - Compute transcodeSource = signed URL (s3 path) or finalPath (local). 
- Pass transcodeSource to both streamService.StartProcessing and jobEnqueuer.EnqueueTranscodingJob — dual-trigger preserved per plan D2 (consolidation deferred v1.0.9). - internal/services/hls_transcode_service_test.go - TestHLSTranscodeService_TranscodeTrack_EmptyFilePath updated for the expanded error message ("empty FilePath" vs "file path is empty"). Known limitation (v1.0.9): HLS segment OUTPUT still writes to the local outputDir; only the INPUT side is S3-aware. Multi-pod HLS serving needs the worker to upload segments to MinIO post-transcode. Acceptable for v1.0.8 target — single-pod staging supports both local + s3 tracks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:34:51 +02:00
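The source-resolution step described for the transcoder can be sketched as follows. The `track` struct, the `GetSignedURL` signature, and the fake resolver are hypothetical stand-ins; the commit only states that the interface is minimal, that s3-backed rows get a 1h-TTL signed URL, and that a missing resolver is an error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// track holds only the fields this sketch needs (assumed names).
type track struct {
	FilePath       string
	StorageBackend string
	StorageKey     string
}

// signedURLProvider is an assumed signature for the minimal interface.
type signedURLProvider interface {
	GetSignedURL(key string, ttl time.Duration) (string, error)
}

// fakeResolver stands in for the real S3 storage service.
type fakeResolver struct{}

func (fakeResolver) GetSignedURL(key string, ttl time.Duration) (string, error) {
	return "https://minio.example.test/" + key + "?sig=demo", nil
}

// isHTTPSource reports whether ffmpeg should read the input over HTTP(S).
func isHTTPSource(s string) bool {
	return strings.HasPrefix(s, "http://") || strings.HasPrefix(s, "https://")
}

// resolveSource returns the local path for local tracks and a signed URL
// for s3-backed ones.
func resolveSource(t track, r signedURLProvider) (string, error) {
	if t.StorageBackend != "s3" {
		if t.FilePath == "" {
			return "", errors.New("empty FilePath")
		}
		return t.FilePath, nil
	}
	if r == nil {
		return "", errors.New("track is s3-backed but no S3 resolver is configured")
	}
	return r.GetSignedURL(t.StorageKey, time.Hour)
}

func main() {
	src, err := resolveSource(track{StorageBackend: "s3", StorageKey: "tracks/42.flac"}, fakeResolver{})
	fmt.Println(src, isHTTPSource(src), err)
}
```

The caller would skip its local `os.Stat` check whenever `isHTTPSource` reports true, since a signed URL has no local file to stat.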
senke	d03232c85c	feat(storage): add track storage_backend column + config prep (v1.0.8 P0) Phase 0 of the MinIO upload migration (FUNCTIONAL_AUDIT §4 item 2). Schema + config only ; Phase 1 will wire TrackService.UploadTrack() to actually route writes to S3 when the flag is flipped. Schema (migration 985): - tracks.storage_backend VARCHAR(16) NOT NULL DEFAULT 'local' CHECK in ('local', 's3') - tracks.storage_key VARCHAR(512) NULL (S3 object key when backend=s3) - Partial index on storage_backend = 's3' (migration progress queries) - Rollback drops both columns + index; safe only while all rows are still 'local' (guard query in the rollback comment) Go model (internal/models/track.go): - StorageBackend string (default 'local', not null) - StorageKey *string (nullable) - Both tagged json:"-" (internal plumbing, never exposed publicly) Config (internal/config/config.go): - New field Config.TrackStorageBackend - Read from TRACK_STORAGE_BACKEND env var (default 'local') - Production validation rule #11 (ValidateForEnvironment): - Must be 'local' or 's3' (reject typos like 'S3' or 'minio') - If 's3', requires AWS_S3_ENABLED=true (fail fast, do not boot with TrackStorageBackend=s3 while S3StorageService is nil) - Dev/staging warns and falls back to 'local' instead of failing, which keeps iteration fast while still flagging misconfig. 
Docs: - docs/ENV_VARIABLES.md §13 restructured as "HLS + track storage backend" with a migration playbook (local → s3 → migrate-storage CLI) - docs/ENV_VARIABLES.md §28 validation rules: +2 entries for new rules - docs/ENV_VARIABLES.md §29 drift findings: TRACK_STORAGE_BACKEND added to "missing from template" list before it was fixed - veza-backend-api/.env.template: TRACK_STORAGE_BACKEND=local with comment pointing at Phase 1/2/3 plans No behavior change yet — TrackService.UploadTrack() still hardcodes the local path via copyFileAsync(). Phase 1 wires it. Refs: - AUDIT_REPORT.md §9 item (deferrals v1.0.8) - FUNCTIONAL_AUDIT.md §4 item 2 "Stockage local disque only" - /home/senke/.claude/plans/audit-fonctionnel-wild-hickey.md Item 3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:54:28 +02:00
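Validation rule #11 can be read as a small decision function: reject unknown values, require S3 to be enabled for 's3', fail in production and fall back to 'local' elsewhere. The sketch below assumes the function name, the `"production"` environment string, and the error wording; none of these are quoted from `ValidateForEnvironment`.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// resolveTrackStorageBackend returns the backend to use, or an error when
// a production config is invalid. Non-production environments log a warning
// and fall back to "local". (Hypothetical helper.)
func resolveTrackStorageBackend(appEnv, backend string, s3Enabled bool) (string, error) {
	var problem string
	switch {
	case backend != "local" && backend != "s3":
		// Case-sensitive on purpose: "S3" and "minio" are rejected.
		problem = fmt.Sprintf("TRACK_STORAGE_BACKEND must be 'local' or 's3', got %q", backend)
	case backend == "s3" && !s3Enabled:
		problem = "TRACK_STORAGE_BACKEND=s3 requires AWS_S3_ENABLED=true"
	default:
		return backend, nil
	}
	if appEnv == "production" {
		return "", errors.New(problem)
	}
	log.Printf("warning: %s; falling back to 'local'", problem)
	return "local", nil
}

func main() {
	backend, err := resolveTrackStorageBackend("development", "minio", true)
	fmt.Println(backend, err)
	_, err = resolveTrackStorageBackend("production", "s3", false)
	fmt.Println(err)
}
```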
senke	7e180a2c08	feat(workers): hyperswitch reconciliation sweep for stuck pending states — v1.0.7 item C New ReconcileHyperswitchWorker sweeps for pending orders and refunds whose terminal webhook never arrived. Pulls live PSP state for each stuck row and synthesises a webhook payload to feed the normal ProcessPaymentWebhook / ProcessRefundWebhook dispatcher. The existing terminal-state guards on those handlers make reconciliation idempotent against real webhooks — a late webhook after the reconciler resolved the row is a no-op. Three stuck-state classes covered: 1. Stuck orders (pending > 30m, non-empty payment_id) → GetPaymentStatus + synthetic payment.<status> webhook. 2. Stuck refunds with PSP id (pending > 30m, non-empty hyperswitch_refund_id) → GetRefundStatus + synthetic refund.<status> webhook (error_message forwarded). 3. Orphan refunds (pending > 5m, EMPTY hyperswitch_refund_id) → mark failed + roll order back to completed + log ERROR. This is the "we crashed between Phase 1 and Phase 2 of RefundOrder" case, operator-attention territory. New interfaces: * marketplace.HyperswitchReadClient — read-only PSP surface the worker depends on (GetPaymentStatus, GetRefundStatus). The worker never calls CreatePayment / CreateRefund. * hyperswitch.Client.GetRefund + RefundStatus struct added. * hyperswitch.Provider gains GetRefundStatus + GetPaymentStatus pass-throughs that satisfy the marketplace interface. Configuration (all env-var tunable with sensible defaults): * RECONCILE_WORKER_ENABLED=true * RECONCILE_INTERVAL=1h (ops can drop to 5m during incident response without a code change) * RECONCILE_ORDER_STUCK_AFTER=30m * RECONCILE_REFUND_STUCK_AFTER=30m * RECONCILE_REFUND_ORPHAN_AFTER=5m (shorter because "app crashed" is a different signal from "network hiccup") Operational details: * Batch limit 50 rows per phase per tick so a 10k-row backlog doesn't hammer Hyperswitch. Next tick picks up the rest. * PSP read errors leave the row untouched — next tick retries. 
Reconciliation is always safe to replay. * Structured log on every action so `grep reconcile` tells the ops story: which order/refund got synced, against what status, how long it was stuck. * Worker wired in cmd/api/main.go, gated on HyperswitchEnabled + HyperswitchAPIKey. Graceful shutdown registered. * RunOnce exposed as public API for ad-hoc ops trigger during incident response. Tests — 10 cases, all green (sqlite :memory:): * TestReconcile_StuckOrder_SyncsViaSyntheticWebhook * TestReconcile_RecentOrder_NotTouched * TestReconcile_CompletedOrder_NotTouched * TestReconcile_OrderWithEmptyPaymentID_NotTouched * TestReconcile_PSPReadErrorLeavesRowIntact * TestReconcile_OrphanRefund_AutoFails_OrderRollsBack * TestReconcile_RecentOrphanRefund_NotTouched * TestReconcile_StuckRefund_SyncsViaSyntheticWebhook * TestReconcile_StuckRefund_FailureStatus_PassesErrorMessage * TestReconcile_AllTerminalStates_NoOp CHANGELOG v1.0.7-rc1 updated with the full item C section between D and the existing E block, matching the order convention (ship order: A → D → B → E → C, CHANGELOG order follows). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 03:08:15 +02:00
senke	3c4d0148be	feat(webhooks): persist raw hyperswitch payloads to audit log — v1.0.7 item E Every POST /webhooks/hyperswitch delivery now writes a row to `hyperswitch_webhook_log` regardless of signature-valid or processing outcome. Captures both legitimate deliveries and attack probes — a forensics query now has the actual bytes to read, not just a "webhook rejected" log line. Disputes (axis-1 P1.6) ride along: the log captures dispute.* events alongside payment and refund events, ready for when disputes get a handler. Table shape (migration 984): * payload TEXT — readable in psql, invalid UTF-8 replaced with empty (forensics value is in headers + ip + timing for those attacks, not the binary body). * signature_valid BOOLEAN + partial index for "show me attack attempts" being instantaneous. * processing_result TEXT — 'ok' / 'error: <msg>' / 'signature_invalid' / 'skipped'. Matches the P1.5 action semantic exactly. * source_ip, user_agent, request_id — forensics essentials. request_id is captured from Hyperswitch's X-Request-Id header when present, else a server-side UUID so every row correlates to VEZA's structured logs. * event_type — best-effort extract from the JSON payload, NULL on malformed input. Hardening: * 64KB body cap via io.LimitReader rejects oversize with 413 before any INSERT — prevents log-spam DoS. * Single INSERT per delivery with final state; no two-phase update race on signature-failure path. signature_invalid and processing-error rows both land. * DB persistence failures are logged but swallowed — the endpoint's contract is to ack Hyperswitch, not perfect audit. Retention sweep: * CleanupHyperswitchWebhookLog in internal/jobs, daily tick, batched DELETE (10k rows + 100ms pause) so a large backlog doesn't lock the table. * HYPERSWITCH_WEBHOOK_LOG_RETENTION_DAYS (default 90). * Same goroutine-ticker pattern as ScheduleOrphanTracksCleanup. * Wired in cmd/api/main.go alongside the existing cleanup jobs. 
Tests: 5 in webhook_log_test.go (persistence, request_id auto-gen, invalid-JSON leaves event_type empty, invalid-signature capture, extractEventType 5 sub-cases) + 4 in cleanup_hyperswitch_webhook_ log_test.go (deletes-older-than, noop, default-on-zero, context-cancel). Migration 984 applied cleanly to local Postgres; all indexes present. Also (v107-plan.md): * Item G acceptance gains an explicit Idempotency-Key threading requirement with an empty-key loud-fail test — "literally copy-paste D's 4-line test skeleton". Closes the risk that item G silently reopens the HTTP-retry duplicate-charge exposure D closed. Out of scope for E (noted in CHANGELOG): * Rate limit on the endpoint — pre-existing middleware covers it at the router level; adding a per-endpoint limit is separate scope. * Readable-payload SQL view — deferred, the TEXT column is already human-readable; a convenience view is a nice-to-have not a ship-blocker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 02:44:58 +02:00
senke	d2bb9c0e78	feat(marketplace): async stripe connect reversal worker — v1.0.7 item B day 2 Day-2 cut of item B: the reversal path becomes async. Pre-v1.0.7 (and v1.0.7 day 1) the refund handler flipped seller_transfers straight from completed to reversed without ever calling Stripe — the ledger said "reversed" while the seller's Stripe balance still showed the original transfer as settled. The new flow: refund.succeeded webhook → reverseSellerAccounting transitions row: completed → reversal_pending → StripeReversalWorker (every REVERSAL_CHECK_INTERVAL, default 1m) → calls ReverseTransfer on Stripe → success: row → reversed + persist stripe_reversal_id → 404 already-reversed (dead code until day 3): row → reversed + log → 404 resource_missing (dead code until day 3): row → permanently_failed → transient error: stay reversal_pending, bump retry_count, exponential backoff (base * 2^retry, capped at backoffMax) → retries exhausted: row → permanently_failed → buyer-facing refund completes immediately regardless of Stripe health State machine enforcement: * New `SellerTransfer.TransitionStatus(tx, to, extras)` wraps every mutation: validates against AllowedTransferTransitions, guarded UPDATE with WHERE status=<from> (optimistic lock semantics), no RowsAffected = stale state / concurrent winner detected. * processSellerTransfers no longer mutates .Status in place — terminal status is decided before struct construction, so the row is Created with its final state. * transfer_retry.retryOne and admin RetryTransfer route through TransitionStatus. Legacy direct assignment removed. * TestNoDirectTransferStatusMutation greps the package for any `st.Status = "..."` / `t.Status = "..."` / GORM Model(&SellerTransfer{}).Update("status"...) outside the allowlist and fails if found. Verified by temporarily injecting a violation during development — test caught it as expected. 
Configuration (v1.0.7 item B): * REVERSAL_WORKER_ENABLED=true (default) * REVERSAL_MAX_RETRIES=5 (default) * REVERSAL_CHECK_INTERVAL=1m (default) * REVERSAL_BACKOFF_BASE=1m (default) * REVERSAL_BACKOFF_MAX=1h (default, caps exponential growth) * .env.template documents TRANSFER_RETRY_* and REVERSAL_* env vars so an ops reader can grep them. Interface change: TransferService.ReverseTransfer(ctx, stripe_transfer_id, amount *int64, reason) (reversalID, error) added. All four mocks extended (process_webhook, transfer_retry, admin_transfer_handler, payment_flow integration). amount=nil means full reversal; v1.0.7 always passes nil (partial reversal is future scope per axis-1 P2). Stripe 404 disambiguation (ErrTransferAlreadyReversed / ErrTransferNotFound) is wired in the worker as dead code: the sentinels are declared and the worker branches on them, but StripeConnectService.ReverseTransfer doesn't yet emit them. Day 3 will parse stripe.Error.Code and populate the sentinels; no worker change needed at that point. Keeping the handling skeleton in day 2 so the worker's branch shape doesn't change between days and the tests can already cover all four paths against the mock. 
Worker unit tests (9 cases, all green, sqlite :memory:): happy path: reversal_pending → reversed + stripe_reversal_id set * already reversed (mock returns sentinel): → reversed + log * not found (mock returns sentinel): → permanently_failed + log * transient 503: retry_count++, next_retry_at set with backoff, stays reversal_pending * backoff capped at backoffMax (verified with base=1s, max=10s, retry_count=4 → capped at 10s not 16s) * max retries exhausted: → permanently_failed * legacy row with empty stripe_transfer_id: → permanently_failed, does not call Stripe * only picks up reversal_pending (skips all other statuses) * respects next_retry_at (future rows skipped) Existing test updated: TestProcessRefundWebhook_SucceededFinalizesState now asserts the row lands at reversal_pending with next_retry_at set (worker's responsibility to drive to reversed), not reversed. Worker wired in cmd/api/main.go alongside TransferRetryWorker, sharing the same StripeConnectService instance. Shutdown path registered for graceful stop. Cut from day 2 scope (per agreed-upon discipline), landing in day 3: * Stripe 404 disambiguation implementation (parse error.Code) * End-to-end smoke probe (refund → reversal_pending → worker processes → reversed) against local Postgres + mock Stripe * Batch-size tuning / inter-batch sleep — batchLimit=20 today is safely under Stripe's 100 req/s default rate limit; revisit if observed load warrants Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:34:29 +02:00
senke	97ca5209a1	fix(chat,config): require REDIS_URL in prod + error on in-memory fallback Two connected failure modes that silently break multi-pod deployments: 1. `RedisURL` has a struct-level default (`redis://<appDomain>:6379`) that makes `c.RedisURL == ""` always false. An operator forgetting to set `REDIS_URL` booted against a phantom host — every Redis call would then fail, and `ChatPubSubService` would quietly fall back to an in-memory map. On a single-pod deploy that "works"; on two pods it silently partitions chat (messages on pod A never reach subscribers on pod B). 2. The fallback itself was logged at `Warn` level, buried under normal traffic. Operators only noticed when users reported stuck chats. Changes: * `config.go` (`ValidateForEnvironment` prod branch): new check that `os.Getenv("REDIS_URL")` is non-empty. The struct field is left alone (dev + test still use the default); we inspect the raw env so the check is "explicitly set" rather than "non-empty after defaults". * `chat_pubsub.go` `NewChatPubSubService`: if `redisClient == nil`, emit an `ERROR` at construction time naming the failure mode ("cross-instance messages will be lost"). Same `Warn`→`Error` promotion for the `Publish` fallback path — runbook-worthy. Tests: new `chat_pubsub_test.go` with a `zaptest/observer` that asserts the ERROR-level log fires exactly once when Redis is nil, plus an in-memory fan-out happy-path so single-pod dev behaviour stays covered. New `TestValidateForEnvironment_RedisURLRequiredInProduction` mirrors the Hyperswitch guard test shape. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:56:47 +02:00
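The distinction this fix relies on, "explicitly set" versus "non-empty after defaults", can be shown with a sketch in which the struct field already carries a default and the production check inspects the raw environment instead. The struct, function name, and injected `getenv` parameter are assumptions made so the example is testable.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// config holds a RedisURL that always has a default, so checking the
// field for emptiness can never detect a missing REDIS_URL.
type config struct {
	AppEnv   string
	RedisURL string
}

// validateRedisURL checks the raw environment in production only.
// getenv is injected (os.Getenv in real use) to keep the sketch testable.
func validateRedisURL(c config, getenv func(string) string) error {
	if c.AppEnv != "production" {
		return nil // dev and test keep using the struct default
	}
	if getenv("REDIS_URL") == "" {
		return errors.New("REDIS_URL must be set explicitly in production")
	}
	return nil
}

func main() {
	c := config{AppEnv: "production", RedisURL: "redis://app.example.test:6379"}
	fmt.Println("field empty?", c.RedisURL == "") // false: the default masks the problem
	fmt.Println("validation:", validateRedisURL(c, os.Getenv))
}
```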
senke	03b30c0c29	fix(config): refuse boot in production when HYPERSWITCH_ENABLED=false With payments disabled, the marketplace flow still completes: orders are created with status `CREATED`, the download URL is released, and no PSP call is ever made. In other words: on a misconfigured prod instance, every purchase is free. The only signal was a silent `hyperswitch_enabled=false` at boot. `ValidateForEnvironment()` (already wired at `NewConfig` line 513, before the HTTP listener binds) now rejects `APP_ENV=production` with `HyperswitchEnabled=false`. The error message names the failure mode explicitly ("effectively giving away products") rather than a terse "config invalid" — this is a revenue leak, not a typo. Dev and staging are unaffected. Tests: 3 new cases in `validation_test.go` (`TestValidateForEnvironment_HyperswitchRequiredInProduction`) + `TestLoadConfig_ProdValid` updated to set `HyperswitchEnabled: true`. `TestValidateForEnvironment_ClamAVRequiredInProduction` fixture also includes the new field so its "succeeds" sub-test still runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:55:18 +02:00
senke	a1000ce7fb	style(backend): gofmt -w on 85 files (whitespace only) backend-ci.yml's `test -z "$(gofmt -l .)"` strict gate (added in `13c21ac11`) failed on a backlog of unformatted files. None of the 85 files in this commit had been edited since the gate was added because no push touched veza-backend-api/** in between, so the gate never fired until today's CI fixes triggered it. The diff is exclusively whitespace alignment in struct literals and trailing-space comments. `go build ./...` and the full test suite (with VEZA_SKIP_INTEGRATION=1 -short) pass identically.	2026-04-14 12:22:14 +02:00
senke	23487d8723	feat: backend — config, handlers, services, logging, migration Update RabbitMQ config and eventbus. Improve secret filter logging. Refine presence, cloud, and social services. Update announcement and feature flag handlers. Add track_likes updated_at migration. Rebuild seed binary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 15:46:57 +01:00
senke	73eca4f6ad	feat: backend, stream server & infra improvements Backend (Go): - Config: CORS, RabbitMQ, rate limit, main config updates - Routes: core, distribution, tracks routing changes - Middleware: rate limiter, endpoint limiter, response cache hardening - Handlers: distribution, search handler fixes - Workers: job worker improvements - Upload validator and logging config additions - New migrations: products, orders, performance indexes - Seed tooling and data Stream Server (Rust): - Audio processing, config, routes, simple stream server updates - Dockerfile improvements Infrastructure: - docker-compose.yml updates - nginx-rtmp config changes - Makefile improvements (config, dev, high, infra) - Root package.json and lock file updates - .env.example updates Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 11:36:06 +01:00
senke	2a4de3ce21	v0.9.8	2026-03-06 19:13:16 +01:00
senke	2ed2bb9dcf	v0.9.4	2026-03-05 23:03:43 +01:00
senke	2df921abd5	v0.9.1	2026-03-05 19:22:31 +01:00
senke	7cb4ef56e1	feat(v0.912): Cashflow - payment E2E integration tests - Add MarketplaceServiceOverride and AuthMiddlewareOverride to config for tests - Wire overrides in routes_webhooks and routes_marketplace (authForMarketplaceInterface) - payment_flow_test: cart -> checkout -> webhook -> order completed, license, transfer - webhook_idempotency_test: 3 identical webhooks -> 1 order, 1 license - webhook_security_test: empty secret 500, invalid sig 401, valid sig 200 - refund_flow_test: completed order -> refund -> order refunded, license revoked - Shared computeWebhookSignature helper in webhook_test_helpers.go - SetMaxOpenConns(1) for sqlite :memory: in idempotency test to avoid flakiness Ref: docs/ROADMAP_V09XX_TO_V1.md v0.912 Cashflow	2026-02-27 20:00:51 +01:00
senke	6823e5a30d	release(v0.902): Sentinel - PKCE OAuth, token encryption, redirect validation, CHAT_JWT_SECRET - PKCE (S256) in OAuth flow: code_verifier in oauth_states, code_challenge in auth URL - CryptoService: AES-256-GCM encryption for OAuth provider tokens at rest - OAuth redirect URL validated against OAUTH_ALLOWED_REDIRECT_DOMAINS - CHAT_JWT_SECRET must differ from JWT_SECRET in production - Migration script: cmd/tools/encrypt_oauth_tokens for existing tokens - Fixes: VEZA-SEC-003, VEZA-SEC-004, VEZA-SEC-009, VEZA-SEC-010	2026-02-26 19:49:15 +01:00
senke	51984e9a1f	feat(security): v0.901 Ironclad - fix 5 critical/high vulnerabilities - OAuth: use JWTService+SessionService, httpOnly cookies (VEZA-SEC-001) - Remove PasswordService.GenerateJWT (VEZA-SEC-002) - Hyperswitch webhook: mandatory verification, 500 if secret empty (VEZA-SEC-005) - Auth middleware: TokenBlacklist.IsBlacklisted check (VEZA-SEC-006) - Waveform: ValidateExecPath before exec (VEZA-SEC-007)	2026-02-26 19:34:45 +01:00
senke	42764110f0	feat(config): add transfer retry configuration (v0.701)	2026-02-23 23:31:09 +01:00
senke	535e76adfe	feat(commerce): add PLATFORM_FEE_RATE config (default 10%)	2026-02-23 22:54:50 +01:00
senke	ae81e171c7	feat(seller): add Stripe Connect config	2026-02-23 22:09:23 +01:00
senke	cc9fbf4f24	feat(commerce): Hyperswitch LIVE_MODE configuration - config: HyperswitchLiveMode (HYPERSWITCH_LIVE_MODE) - routes_marketplace: warn when production + LiveMode=false - docker-compose.prod: HYPERSWITCH_LIVE_MODE env var	2026-02-23 19:56:52 +01:00
senke	218b4b33d6	feat(streaming): wire HLS pipeline end-to-end with serving routes - Add HLSEnabled and HLSStorageDir to backend config (HLS_STREAMING env) - Register HLS serving routes (master.m3u8, quality playlist, segments) behind HLSEnabled feature flag on existing track routes - Add GetHLSStatus and TriggerHLSTranscode methods to StreamService for stream server communication - Update docker-compose (dev, staging, prod) with HLS env vars and shared hls-data volume between backend and stream-server - Stream callback already correctly updates stream_manifest_url	2026-02-22 21:20:35 +01:00
senke	368c78c102	fix(security): require Hyperswitch webhook secret in production when payments enabled SEC-08: If HYPERSWITCH_ENABLED=true in production, startup now fails unless HYPERSWITCH_WEBHOOK_SECRET is set. This prevents webhook signature verification from being silently bypassed.	2026-02-22 17:31:52 +01:00
senke	182b28011f	feat(presence): PresenceService and GET /users/:id/presence (P1.2)	2026-02-21 05:22:43 +01:00
senke	32348bebce	feat(developer): add API keys backend (Lot C) - Migration 082: api_keys table (user_id, name, prefix, hashed_key, scopes, last_used_at, expires_at) - APIKey model, APIKeyService (Create, List, Delete, ValidateAPIKey) - APIKeyHandler: GET/POST/DELETE /api/v1/developer/api-keys - AuthMiddleware: X-API-Key and Bearer vza_* accepted as alternative to JWT - CSRF: skip for API key auth (stateless) - Key format: vza_ prefix, SHA-256 hashed storage	2026-02-20 00:18:36 +01:00
senke	06d56dd298	feat(backend): OAuth FRONTEND_URL from config, docs update - Add FrontendURL to config (FRONTEND_URL or VITE_FRONTEND_URL) - OAuth handlers use config instead of os.Getenv - Update TODOS_AUDIT: mark UUID migration items as resolved - Add ISSUES_P2_BACKLOG.md for GitHub issues - Add ROUTES_ORPHANES.md for routes without UI - Document FRONTEND_URL in .env.example	2026-02-17 16:42:23 +01:00
senke	0f1e416679	refactor(backend): split config into domain modules (P2)	2026-02-16 11:12:21 +01:00
senke	eea88d80bf	fix(security): reject DISABLE_RATE_LIMIT_FOR_TESTS in production (A04)	2026-02-16 10:16:35 +01:00
senke	62f4ae2c82	fix(backend): require ClamAV in production environment Add validation in ValidateForEnvironment() to fail startup when CLAMAV_REQUIRED=false in production. Virus scanning is mandatory for all file uploads in production. Phase 1 audit - P1.4	2026-02-15 15:54:58 +01:00
senke	bbd8ed54de	refactor(config): split config.go by domain (audit 2.7) - env_helpers.go: getEnv*, parseLogAggregationLabels - db_init.go: initDatabaseWithRetry - redis_init.go: initRedis, filteredRedisLogger - rabbitmq.go: getRabbitMQURL - cors.go: CORS, cookies - rate_limit.go: rate limit defaults - services_init.go: initServices - middlewares_init.go: initMiddlewares, SetupMiddleware - config.go reduced from ~1487 to ~550 LOC	2026-02-15 14:44:33 +01:00
senke	22e5e21757	chore(audit 2.4, 2.5): remove dead Education code and cmd/modern-server - Remove Education routes/handlers/core (backend) - Remove education MSW handler, Sidebar/locales refs - Switch Makefile, make/dev.mk, scripts to cmd/api/main.go - Remove veza-backend-api/cmd/modern-server/	2026-02-15 14:39:40 +01:00
senke	2e04d45a14	fix(audit-1.6,1.7): remove hardcoded test secrets, block bypass flags in prod - 1.6: Replace hardcoded JWT secrets in chat server tests with runtime-generated values (env TEST_JWT_SECRET or uuid-based fallback) - 1.7: Add validateNoBypassFlagsInProduction() in config; fail startup if BYPASS_CONTENT_CREATOR_ROLE or CSRF_DISABLED is set in production Refs: AUDIT_TECHNIQUE_INTEGRAL_2026_02_15.md items 1.6, 1.7	2026-02-15 14:18:23 +01:00
senke	b73387af3c	feat(api): add PostgreSQL read replica support (3.7) - Add DATABASE_READ_URL config and InitReadReplica in database package - Add ForRead() helper for read-only handler routing - Update TrackService and TrackSearchService to use read replica for reads - Document setup in DEPLOYMENT_GUIDE.md and .env.template	2026-02-14 22:50:23 +01:00
senke	92f432fb9e	chore: consolidate pending changes (Hyperswitch, PostCard, dashboard, stream server, etc.)	2026-02-14 21:45:15 +01:00
senke	afea976f57	chore: add go.work and optional monorepo orchestrator	2026-02-14 18:21:39 +01:00
senke	ae586f6134	Phase 2 stabilization: dead code, Modal→Dialog, feature flags, tests, router split, Rust legacy Block A - Dead code: - Remove Studio (components, views, features) - Remove gamification + mock services (projectService, storageService, gamificationService) - Update Sidebar, Navbar, locales Block B - Frontend: - Remove deprecated modal.tsx, Modal.stories (duplicate of Dialog) - Feature flags: PLAYLIST_SEARCH, PLAYLIST_RECOMMENDATIONS, ROLE_MANAGEMENT = true - Remove 19 orphaned tests, drop vitest.config exclusions Block C - Backend: - Extract routes_auth.go from router.go Block D - Rust: - Remove security_legacy.rs (dead code, patterns already in security/)	2026-02-14 17:23:32 +01:00
senke	30f17dfc2a	chore(backend): config, router, auth, stream service, sanitizer, tests Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-11 22:19:09 +01:00
senke	b1ed46b142	small fixes : cors + login loop	2026-02-07 20:36:48 +01:00
senke	f0ba7de543	state-ownership: delete unused optimisticStoreUpdates.ts file - Deleted apps/web/src/utils/optimisticStoreUpdates.ts (unused file) - File was unused - no imports found in codebase - Mutations already use React Query's onMutate pattern - No TypeScript errors after deletion - Actions 4.4.1.2 and 4.4.1.3 complete	2026-01-15 19:26:53 +01:00
senke	76d95ecfb4	incus deployment fully implemented, Makefile updated and make fmt run	2026-01-13 19:47:57 +01:00
senke	8efbb97e6f	stabilisation commit A	2026-01-07 19:39:21 +01:00
senke	cdf7da36d1	[FIX] PROD-003: Fix use-toast → useToast imports	2026-01-04 01:44:17 +01:00
senke	a31726cfe8	[LOGGING] Fix #27: Fix compilation error (unused variable)	2026-01-04 01:44:17 +01:00
senke	1b747a2c29	[LOGGING] Fix #27: Use optimized (asynchronous) logger in production/staging	2026-01-04 01:44:17 +01:00
senke	90d4011070	[LOGGING] Fix #4: Sync() guaranteed at shutdown via ShutdownManager - Improved documentation	2026-01-04 01:44:17 +01:00
senke	9cd76a512f	[LOGGING] Fix #10: Silent errors - Add logs with context for all errors in core/auth and core/track	2026-01-04 01:44:15 +01:00


67 commits
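
Note on commit 6823e5a30d: it stores a code_verifier in oauth_states and sends a code_challenge in the auth URL using PKCE (S256). The sketch below shows how an S256 challenge is derived from a verifier under RFC 7636 (base64url without padding over the SHA-256 digest). The function name `s256Challenge` is hypothetical and the verifier is the RFC's own example value, not anything from the repository.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// s256Challenge is a hypothetical helper: it derives the PKCE code_challenge
// for the S256 method, i.e. BASE64URL(SHA256(code_verifier)) with no padding.
func s256Challenge(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	// Example verifier from RFC 7636, Appendix B.
	verifier := "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
	fmt.Println(s256Challenge(verifier))
}
```

On the token exchange, the server recomputes the challenge from the verifier it receives and compares it with the one sent in the authorization request.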