Commit graph

16 commits

Author SHA1 Message Date
senke
d5152d89a2 feat(stream): HLS default on + marketplace 30s pre-listen + FLAC tier checkbox (W4 Day 17)
Some checks failed
Veza CI / Rust (Stream Server) (push) Successful in 5m28s
Security Scan / Secret Scanning (gitleaks) (push) Failing after 53s
Veza CI / Backend (Go) (push) Failing after 7m59s
Veza CI / Frontend (Web) (push) Failing after 17m43s
Veza CI / Notify on failure (push) Successful in 4s
E2E Playwright / e2e (full) (push) Failing after 20m55s
Three pieces shipping under one banner since they're the day's
deliverables and share no review-time coupling :

1. HLS_STREAMING default flipped true
   - config.go : getEnvBool default true (was false). Operators wanting
     a lightweight dev / unit-test env explicitly set HLS_STREAMING=false
     to skip the transcoder pipeline.
   - .env.template : default flipped + comment explaining the opt-out.
   - Effect : every new track upload routes through the HLS transcoder
     by default ; ABR ladder served via /tracks/:id/master.m3u8.

2. Marketplace 30s pre-listen (creator opt-in)
   - migrations/989 : adds products.preview_enabled BOOLEAN NOT NULL
     DEFAULT FALSE + partial index on TRUE values. Default off so
     adoption is opt-in.
   - core/marketplace/models.go : PreviewEnabled field on Product.
   - handlers/marketplace.go : StreamProductPreview gains a fall-through.
     When no file-based ProductPreview exists AND the product is a
     track product AND preview_enabled=true, redirect to the underlying
     /tracks/:id/stream?preview=30. Header X-Preview-Cap-Seconds: 30
     surfaces the policy.
   - core/track/track_hls_handler.go : StreamTrack accepts ?preview=30
     and gates anonymous access via isMarketplacePreviewAllowed (raw
     SQL probe of products.preview_enabled to avoid the
     track→marketplace import cycle ; the reverse arrow already exists).
   - Trust model : 30s cap is enforced client-side (HTML5 audio
     currentTime). Industry standard for tease-to-buy ; not anti-rip.
     Documented in the migration + handler doc comment.

3. FLAC tier preview checkbox (Premium-gated, hidden by default)
   - upload-modal/constants.ts : optional flacAvailable on UploadFormData.
   - upload-modal/UploadModalMetadataForm.tsx : new optional props
     showFlacAvailable + flacAvailable + onFlacAvailableChange.
     Checkbox renders only when showFlacAvailable=true ; consumers
     pass that based on the user's role/subscription tier (deferred
     to caller wiring — Item G phase 4 will replace the role check
     with a real subscription-tier check).
   - Today the checkbox is a UI affordance only ; the actual lossless
     distribution path (ladder + storage class) is post-launch work.

Acceptance (Day 17) : new uploads serve HLS ABR by default ;
products.preview_enabled flag wires anonymous 30s pre-listen ;
checkbox visible to premium users on the upload form. All 4 tested
backend packages pass : handlers, core/track, core/marketplace, config.

W4 progress : Day 16 ✓ · Day 17 ✓ · Day 18 (faceted search)  ·
Day 19 (HAProxy sticky WS)  · Day 20 (k6 nightly) .

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 09:56:02 +02:00
senke
15e591305e feat(cdn): Bunny.net signed URLs + HLS cache headers + metric collision fix (W3 Day 13)
Some checks failed
Veza CI / Rust (Stream Server) (push) Successful in 5m12s
Security Scan / Secret Scanning (gitleaks) (push) Failing after 54s
Veza CI / Backend (Go) (push) Failing after 8m38s
Veza CI / Frontend (Web) (push) Failing after 16m44s
Veza CI / Notify on failure (push) Successful in 15s
E2E Playwright / e2e (full) (push) Successful in 20m28s
CDN edge in front of S3/MinIO via origin-pull. Backend signs URLs
with Bunny.net token-auth (SHA-256 over security_key + path + expires)
so edges verify before serving cached objects ; origin is never hit
on a valid token. Cloudflare CDN / R2 / CloudFront stubs kept.

- internal/services/cdn_service.go : new providers CDNProviderBunny +
  CDNProviderCloudflareR2. SecurityKey added to CDNConfig.
  generateBunnySignedURL implements the documented Bunny scheme
  (url-safe base64, no padding, expires query). HLSSegmentCacheHeaders
  + HLSPlaylistCacheHeaders helpers exported for handlers.
- internal/services/cdn_service_test.go : pin Bunny URL shape +
  base64-url charset ; assert empty SecurityKey fails fast (no
  silent fallback to unsigned URLs).
- internal/core/track/service.go : new CDNURLSigner interface +
  SetCDNService(cdn). GetStorageURL prefers CDN signed URL when
  cdnService.IsEnabled, falls back to direct S3 presign on signing
  error so a CDN partial outage doesn't block playback.
- internal/api/routes_tracks.go + routes_core.go : wire SetCDNService
  on the two TrackService construction sites that serve stream/download.
- internal/config/config.go : 4 new env vars (CDN_ENABLED, CDN_PROVIDER,
  CDN_BASE_URL, CDN_SECURITY_KEY). config.CDNService always non-nil
  after init ; IsEnabled gates the actual usage.
- internal/handlers/hls_handler.go : segments now return
  Cache-Control: public, max-age=86400, immutable (content-addressed
  filenames make this safe). Playlists at max-age=60.
- veza-backend-api/.env.template : 4 placeholder env vars.
- docs/ENV_VARIABLES.md §12 : provider matrix + Bunny vs Cloudflare
  vs R2 trade-offs.

Bug fix collateral : v1.0.9 Day 11 introduced veza_cache_hits_total
which collided in name with monitoring.CacheHitsTotal (different
label set ⇒ promauto MustRegister panic at process init). Day 13
deletes the monitoring duplicate and restores the metrics-package
counter as the single source of truth (label: subsystem). All 8
affected packages green : services, core/track, handlers, middleware,
websocket/chat, metrics, monitoring, config.

Acceptance (Day 13) : code path is wired ; verifying via real Bunny
edge requires a Pull Zone provisioned by the user (EX-? in roadmap).
On the user side : create Pull Zone w/ origin = MinIO, copy token
auth key into CDN_SECURITY_KEY, set CDN_ENABLED=true.

W3 progress : Redis Sentinel ✓ · MinIO distribué ✓ · CDN ✓ ·
DMCA  Day 14 · embed  Day 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 14:07:20 +02:00
senke
a36d9b2d59 feat(redis): Sentinel HA + cache hit rate metrics (W3 Day 11)
Some checks failed
Veza CI / Backend (Go) (push) Failing after 8m56s
Veza CI / Frontend (Web) (push) Has been cancelled
E2E Playwright / e2e (full) (push) Has been cancelled
Veza CI / Notify on failure (push) Blocked by required conditions
Veza CI / Rust (Stream Server) (push) Successful in 5m3s
Security Scan / Secret Scanning (gitleaks) (push) Failing after 53s
Three Incus containers, each running redis-server + redis-sentinel
(co-located). redis-1 = master at first boot, redis-2/3 = replicas.
Sentinel quorum=2 of 3 ; failover-timeout=30s satisfies the W3
acceptance criterion.

- internal/config/redis_init.go : initRedis branches on
  REDIS_SENTINEL_ADDRS ; non-empty -> redis.NewFailoverClient with
  MasterName + SentinelAddrs + SentinelPassword. Empty -> existing
  single-instance NewClient (dev/local stays parametric).
- internal/config/config.go : 3 new fields (RedisSentinelAddrs,
  RedisSentinelMasterName, RedisSentinelPassword) read from env.
  parseRedisSentinelAddrs trims+filters CSV.
- internal/metrics/cache_hit_rate.go : new RecordCacheHit / Miss
  counters, labelled by subsystem. Cardinality bounded.
- internal/middleware/rate_limiter.go : instrument 3 Eval call sites
  (DDoS, frontend log throttle, upload throttle). Hit = Redis answered,
  Miss = error -> in-memory fallback.
- internal/services/chat_pubsub.go : instrument Publish + PublishPresence.
- internal/websocket/chat/presence_service.go : instrument SetOnline /
  SetOffline / Heartbeat / GetPresence. redis.Nil counts as a hit
  (legitimate empty result).
- infra/ansible/roles/redis_sentinel/ : install Redis 7 + Sentinel,
  render redis.conf + sentinel.conf, systemd units. Vault assertion
  prevents shipping placeholder passwords to staging/prod.
- infra/ansible/playbooks/redis_sentinel.yml : provisions the 3
  containers + applies common baseline + role.
- infra/ansible/inventory/lab.yml : new groups redis_ha + redis_ha_master.
- infra/ansible/tests/test_redis_failover.sh : kills the master
  container, polls Sentinel for the new master, asserts elapsed < 30s.
- config/grafana/dashboards/redis-cache-overview.json : 3 hit-rate
  stats (rate_limiter / chat_pubsub / presence) + ops/s breakdown.
- docs/ENV_VARIABLES.md §3 : 3 new REDIS_SENTINEL_* env vars.
- veza-backend-api/.env.template : 3 placeholders (empty default).

Acceptance (Day 11) : Sentinel failover < 30s ; cache hit-rate
dashboard populated. Lab test pending Sentinel deployment.

W3 verification gate progress : Redis Sentinel ✓ (this commit),
MinIO EC4+2  Day 12, CDN  Day 13, DMCA  Day 14, embed  Day 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:36:55 +02:00
senke
d03232c85c feat(storage): add track storage_backend column + config prep (v1.0.8 P0)
Some checks failed
Veza CI / Backend (Go) (push) Failing after 0s
Veza CI / Frontend (Web) (push) Failing after 0s
Veza CI / Rust (Stream Server) (push) Failing after 0s
Security Scan / Secret Scanning (gitleaks) (push) Failing after 0s
Veza CI / Notify on failure (push) Failing after 0s
Phase 0 of the MinIO upload migration (FUNCTIONAL_AUDIT §4 item 2).
Schema + config only — Phase 1 will wire TrackService.UploadTrack()
to actually route writes to S3 when the flag is flipped.

Schema (migration 985):
- tracks.storage_backend VARCHAR(16) NOT NULL DEFAULT 'local'
  CHECK in ('local', 's3')
- tracks.storage_key VARCHAR(512) NULL (S3 object key when backend=s3)
- Partial index on storage_backend = 's3' (migration progress queries)
- Rollback drops both columns + index; safe only while all rows are
  still 'local' (guard query in the rollback comment)

Go model (internal/models/track.go):
- StorageBackend string (default 'local', not null)
- StorageKey *string (nullable)
- Both tagged json:"-" — internal plumbing, never exposed publicly

Config (internal/config/config.go):
- New field Config.TrackStorageBackend
- Read from TRACK_STORAGE_BACKEND env var (default 'local')
- Production validation rule #11 (ValidateForEnvironment):
  - Must be 'local' or 's3' (reject typos like 'S3' or 'minio')
  - If 's3', requires AWS_S3_ENABLED=true (fail fast, do not boot with
    TrackStorageBackend=s3 while S3StorageService is nil)
- Dev/staging warns and falls back to 'local' instead of fail — keeps
  iteration fast while still flagging misconfig.

Docs:
- docs/ENV_VARIABLES.md §13 restructured as "HLS + track storage backend"
  with a migration playbook (local → s3 → migrate-storage CLI)
- docs/ENV_VARIABLES.md §28 validation rules: +2 entries for new rules
- docs/ENV_VARIABLES.md §29 drift findings: TRACK_STORAGE_BACKEND added
  to "missing from template" list before it was fixed
- veza-backend-api/.env.template: TRACK_STORAGE_BACKEND=local with
  comment pointing at Phase 1/2/3 plans

No behavior change yet — TrackService.UploadTrack() still hardcodes the
local path via copyFileAsync(). Phase 1 wires it.

Refs:
- AUDIT_REPORT.md §9 item (deferrals v1.0.8)
- FUNCTIONAL_AUDIT.md §4 item 2 "Stockage local disque only"
- /home/senke/.claude/plans/audit-fonctionnel-wild-hickey.md Item 3

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:54:28 +02:00
senke
7d03ee6686 docs(env): canonicalize ENV_VARIABLES.md + add HLS_STREAMING template
Some checks failed
Veza CI / Backend (Go) (push) Failing after 0s
Veza CI / Frontend (Web) (push) Failing after 0s
Veza CI / Rust (Stream Server) (push) Failing after 0s
Security Scan / Secret Scanning (gitleaks) (push) Failing after 0s
Veza CI / Notify on failure (push) Failing after 0s
Resolves AUDIT_REPORT §9 item #15 (last real item before v1.0.7 final)
and FUNCTIONAL_AUDIT §4 stability item 5.

docs/ENV_VARIABLES.md:
- Complete rewrite from 172 → ~600 lines covering all ~180 env vars
  surveyed directly from code (os.Getenv in Go, std::env::var in Rust,
  import.meta.env in React).
- 30 sections: core, DB, Redis, JWT, OAuth, CORS, rate-limit, SMTP,
  Hyperswitch, Stripe Connect, RabbitMQ, S3/MinIO, HLS, stream server,
  Elasticsearch, ClamAV, Sentry, logging, metrics, frontend Vite,
  feature flags, password policy, build info, RTMP/misc, Rust stream
  schema, security headers recap, deprecated vars, prod validation
  rules, drift findings, startup checklist.
- Documents 8 production-critical validation rules (validation.go:869-1018).
- Flags 14 deprecated vars with canonical replacements for v1.1.0 cleanup.
- Catalogs 11 vars used by code but missing from template (HLS_STREAMING,
  SLOW_REQUEST_THRESHOLD_MS, CONFIG_WATCH, HANDLER_TIMEOUT, VAPID_*, etc).

veza-backend-api/.env.template:
- Add HLS_STREAMING=false with documentation of fallback behavior
  (/tracks/:id/stream with Range support when off).
- Add HLS_STORAGE_DIR=/tmp/veza-hls.

Closes last blocker before v1.0.7 final tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:36:44 +02:00
senke
7e180a2c08 feat(workers): hyperswitch reconciliation sweep for stuck pending states — v1.0.7 item C
New ReconcileHyperswitchWorker sweeps for pending orders and refunds
whose terminal webhook never arrived. Pulls live PSP state for each
stuck row and synthesises a webhook payload to feed the normal
ProcessPaymentWebhook / ProcessRefundWebhook dispatcher. The existing
terminal-state guards on those handlers make reconciliation
idempotent against real webhooks — a late webhook after the reconciler
resolved the row is a no-op.

Three stuck-state classes covered:
  1. Stuck orders (pending > 30m, non-empty payment_id) → GetPaymentStatus
     + synthetic payment.<status> webhook.
  2. Stuck refunds with PSP id (pending > 30m, non-empty
     hyperswitch_refund_id) → GetRefundStatus + synthetic
     refund.<status> webhook (error_message forwarded).
  3. Orphan refunds (pending > 5m, EMPTY hyperswitch_refund_id) →
     mark failed + roll order back to completed + log ERROR. This
     is the "we crashed between Phase 1 and Phase 2 of RefundOrder"
     case, operator-attention territory.

New interfaces:
  * marketplace.HyperswitchReadClient — read-only PSP surface the
    worker depends on (GetPaymentStatus, GetRefundStatus). The
    worker never calls CreatePayment / CreateRefund.
  * hyperswitch.Client.GetRefund + RefundStatus struct added.
  * hyperswitch.Provider gains GetRefundStatus + GetPaymentStatus
    pass-throughs that satisfy the marketplace interface.

Configuration (all env-var tunable with sensible defaults):
  * RECONCILE_WORKER_ENABLED=true
  * RECONCILE_INTERVAL=1h (ops can drop to 5m during incident
    response without a code change)
  * RECONCILE_ORDER_STUCK_AFTER=30m
  * RECONCILE_REFUND_STUCK_AFTER=30m
  * RECONCILE_REFUND_ORPHAN_AFTER=5m (shorter because "app crashed"
    is a different signal from "network hiccup")

Operational details:
  * Batch limit 50 rows per phase per tick so a 10k-row backlog
    doesn't hammer Hyperswitch. Next tick picks up the rest.
  * PSP read errors leave the row untouched — next tick retries.
    Reconciliation is always safe to replay.
  * Structured log on every action so `grep reconcile` tells the
    ops story: which order/refund got synced, against what status,
    how long it was stuck.
  * Worker wired in cmd/api/main.go, gated on
    HyperswitchEnabled + HyperswitchAPIKey. Graceful shutdown
    registered.
  * RunOnce exposed as public API for ad-hoc ops trigger during
    incident response.

Tests — 10 cases, all green (sqlite :memory:):
  * TestReconcile_StuckOrder_SyncsViaSyntheticWebhook
  * TestReconcile_RecentOrder_NotTouched
  * TestReconcile_CompletedOrder_NotTouched
  * TestReconcile_OrderWithEmptyPaymentID_NotTouched
  * TestReconcile_PSPReadErrorLeavesRowIntact
  * TestReconcile_OrphanRefund_AutoFails_OrderRollsBack
  * TestReconcile_RecentOrphanRefund_NotTouched
  * TestReconcile_StuckRefund_SyncsViaSyntheticWebhook
  * TestReconcile_StuckRefund_FailureStatus_PassesErrorMessage
  * TestReconcile_AllTerminalStates_NoOp

CHANGELOG v1.0.7-rc1 updated with the full item C section between D
and the existing E block, matching the order convention (ship order:
A → D → B → E → C, CHANGELOG order follows).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 03:08:15 +02:00
senke
3c4d0148be feat(webhooks): persist raw hyperswitch payloads to audit log — v1.0.7 item E
Every POST /webhooks/hyperswitch delivery now writes a row to
`hyperswitch_webhook_log` regardless of signature-valid or
processing outcome. Captures both legitimate deliveries and attack
probes — a forensics query now has the actual bytes to read, not
just a "webhook rejected" log line. Disputes (axis-1 P1.6) ride
along: the log captures dispute.* events alongside payment and
refund events, ready for when disputes get a handler.

Table shape (migration 984):
  * payload TEXT — readable in psql, invalid UTF-8 replaced with
    empty (forensics value is in headers + ip + timing for those
    attacks, not the binary body).
  * signature_valid BOOLEAN + partial index for "show me attack
    attempts" being instantaneous.
  * processing_result TEXT — 'ok' / 'error: <msg>' /
    'signature_invalid' / 'skipped'. Matches the P1.5 action
    semantic exactly.
  * source_ip, user_agent, request_id — forensics essentials.
    request_id is captured from Hyperswitch's X-Request-Id header
    when present, else a server-side UUID so every row correlates
    to VEZA's structured logs.
  * event_type — best-effort extract from the JSON payload, NULL
    on malformed input.

Hardening:
  * 64KB body cap via io.LimitReader rejects oversize with 413
    before any INSERT — prevents log-spam DoS.
  * Single INSERT per delivery with final state; no two-phase
    update race on signature-failure path. signature_invalid and
    processing-error rows both land.
  * DB persistence failures are logged but swallowed — the
    endpoint's contract is to ack Hyperswitch, not perfect audit.

Retention sweep:
  * CleanupHyperswitchWebhookLog in internal/jobs, daily tick,
    batched DELETE (10k rows + 100ms pause) so a large backlog
    doesn't lock the table.
  * HYPERSWITCH_WEBHOOK_LOG_RETENTION_DAYS (default 90).
  * Same goroutine-ticker pattern as ScheduleOrphanTracksCleanup.
  * Wired in cmd/api/main.go alongside the existing cleanup jobs.

Tests: 5 in webhook_log_test.go (persistence, request_id auto-gen,
invalid-JSON leaves event_type empty, invalid-signature capture,
extractEventType 5 sub-cases) + 4 in cleanup_hyperswitch_webhook_
log_test.go (deletes-older-than, noop, default-on-zero,
context-cancel). Migration 984 applied cleanly to local Postgres;
all indexes present.

Also (v107-plan.md):
  * Item G acceptance gains an explicit Idempotency-Key threading
    requirement with an empty-key loud-fail test — "literally
    copy-paste D's 4-line test skeleton". Closes the risk that
    item G silently reopens the HTTP-retry duplicate-charge
    exposure D closed.

Out of scope for E (noted in CHANGELOG):
  * Rate limit on the endpoint — pre-existing middleware covers
    it at the router level; adding a per-endpoint limit is
    separate scope.
  * Readable-payload SQL view — deferred, the TEXT column is
    already human-readable; a convenience view is a nice-to-have
    not a ship-blocker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 02:44:58 +02:00
senke
d2bb9c0e78 feat(marketplace): async stripe connect reversal worker — v1.0.7 item B day 2
Day-2 cut of item B: the reversal path becomes async. Pre-v1.0.7
(and v1.0.7 day 1) the refund handler flipped seller_transfers
straight from completed to reversed without ever calling Stripe —
the ledger said "reversed" while the seller's Stripe balance still
showed the original transfer as settled. The new flow:

  refund.succeeded webhook
    → reverseSellerAccounting transitions row: completed → reversal_pending
    → StripeReversalWorker (every REVERSAL_CHECK_INTERVAL, default 1m)
      → calls ReverseTransfer on Stripe
      → success: row → reversed + persist stripe_reversal_id
      → 404 already-reversed (dead code until day 3): row → reversed + log
      → 404 resource_missing (dead code until day 3): row → permanently_failed
      → transient error: stay reversal_pending, bump retry_count,
        exponential backoff (base * 2^retry, capped at backoffMax)
      → retries exhausted: row → permanently_failed
    → buyer-facing refund completes immediately regardless of Stripe health

State machine enforcement:
  * New `SellerTransfer.TransitionStatus(tx, to, extras)` wraps every
    mutation: validates against AllowedTransferTransitions, guarded
    UPDATE with WHERE status=<from> (optimistic lock semantics), no
    RowsAffected = stale state / concurrent winner detected.
  * processSellerTransfers no longer mutates .Status in place —
    terminal status is decided before struct construction, so the
    row is Created with its final state.
  * transfer_retry.retryOne and admin RetryTransfer route through
    TransitionStatus. Legacy direct assignment removed.
  * TestNoDirectTransferStatusMutation greps the package for any
    `st.Status = "..."` / `t.Status = "..."` / GORM
    Model(&SellerTransfer{}).Update("status"...) outside the
    allowlist and fails if found. Verified by temporarily injecting
    a violation during development — test caught it as expected.

Configuration (v1.0.7 item B):
  * REVERSAL_WORKER_ENABLED=true (default)
  * REVERSAL_MAX_RETRIES=5 (default)
  * REVERSAL_CHECK_INTERVAL=1m (default)
  * REVERSAL_BACKOFF_BASE=1m (default)
  * REVERSAL_BACKOFF_MAX=1h (default, caps exponential growth)
  * .env.template documents TRANSFER_RETRY_* and REVERSAL_* env vars
    so an ops reader can grep them.

Interface change: TransferService.ReverseTransfer(ctx,
stripe_transfer_id, amount *int64, reason) (reversalID, error)
added. All four mocks extended (process_webhook, transfer_retry,
admin_transfer_handler, payment_flow integration). amount=nil means
full reversal; v1.0.7 always passes nil (partial reversal is future
scope per axis-1 P2).

Stripe 404 disambiguation (ErrTransferAlreadyReversed /
ErrTransferNotFound) is wired in the worker as dead code — the
sentinels are declared and the worker branches on them, but
StripeConnectService.ReverseTransfer doesn't yet emit them. Day 3
will parse stripe.Error.Code and populate the sentinels; no worker
change needed at that point. Keeping the handling skeleton in day 2
so the worker's branch shape doesn't change between days and the
tests can already cover all four paths against the mock.

Worker unit tests (9 cases, all green, sqlite :memory:):
  * happy path: reversal_pending → reversed + stripe_reversal_id set
  * already reversed (mock returns sentinel): → reversed + log
  * not found (mock returns sentinel): → permanently_failed + log
  * transient 503: retry_count++, next_retry_at set with backoff,
    stays reversal_pending
  * backoff capped at backoffMax (verified with base=1s, max=10s,
    retry_count=4 → capped at 10s not 16s)
  * max retries exhausted: → permanently_failed
  * legacy row with empty stripe_transfer_id: → permanently_failed,
    does not call Stripe
  * only picks up reversal_pending (skips all other statuses)
  * respects next_retry_at (future rows skipped)

Existing test updated: TestProcessRefundWebhook_SucceededFinalizesState
now asserts the row lands at reversal_pending with next_retry_at
set (worker's responsibility to drive to reversed), not reversed.

Worker wired in cmd/api/main.go alongside TransferRetryWorker,
sharing the same StripeConnectService instance. Shutdown path
registered for graceful stop.

Cut from day 2 scope (per agreed-upon discipline), landing in day 3:
  * Stripe 404 disambiguation implementation (parse error.Code)
  * End-to-end smoke probe (refund → reversal_pending → worker
    processes → reversed) against local Postgres + mock Stripe
  * Batch-size tuning / inter-batch sleep — batchLimit=20 today is
    safely under Stripe's 100 req/s default rate limit; revisit if
    observed load warrants

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:34:29 +02:00
senke
9002e91d91 refactor(backend,infra): unify SMTP env schema on canonical SMTP_* names
Third item of the v1.0.6 backlog. The v1.0.5.1 hotfix surfaced that two
email paths in-tree read *different* env vars for the same configuration:

    internal/email/sender.go         internal/services/email_service.go
    SMTP_USERNAME                    SMTP_USER
    SMTP_FROM                        FROM_EMAIL
    SMTP_FROM_NAME                   FROM_NAME

The hotfix worked around it by exporting both sets in `.env.template`.
This commit reconciles them onto a single schema so the workaround can
go away.

Changes
  * `internal/email/sender.go` is now the single loader. The canonical
    names (`SMTP_USERNAME`, `SMTP_FROM`, `SMTP_FROM_NAME`) are read
    first; the legacy names (`SMTP_USER`, `FROM_EMAIL`, `FROM_NAME`)
    stay supported as a migration fallback that logs a structured
    deprecation warning ("remove_in: v1.1.0"). Canonical always wins
    over deprecated — no silent precedence flip.
  * `NewSMTPEmailSender` callers keep working unchanged; a new
    `LoadSMTPConfigFromEnvWithLogger(*zap.Logger)` variant lets callers
    opt into the warning stream.
  * `internal/services/email_service.go` drops its six inline
    `os.Getenv` reads and delegates to the shared loader, so
    `AuthService.Register` and `RequestPasswordReset` now see exactly
    the same config as the async job worker.
  * `.env.template`: the duplicate (SMTP_USER + FROM_EMAIL + FROM_NAME)
    block added in v1.0.5.1 is removed — only the canonical SMTP_*
    names ship for new contributors.
  * `docker-compose.yml` (backend-api service): FROM_EMAIL / FROM_NAME
    renamed to SMTP_FROM / SMTP_FROM_NAME to match the canonical schema.
  * No Host/Port default injected in the loader. If SMTP_HOST is
    empty, callers see Host=="" and log-only (historic dev behavior).
    Dev defaults (MailHog localhost:1025) live in `.env.template`, so
    a fresh clone still works; a misconfigured prod pod fails loud
    instead of silently dialing localhost.

Tests
  * 5 new Go tests in `internal/email/smtp_env_test.go`: empty-env
    returns empty config; canonical names read directly; deprecated
    names fall back (one warning per var); canonical wins over
    deprecated silently; nil logger is allowed.
  * Existing `TestLoadSMTPConfigFromEnv`, `TestSMTPEmailSender_Send`,
    and every auth/services package remained green (40+ packages).

Import-cycle note: the loader deliberately lives in `internal/email`,
not `internal/config`, because `internal/config` already depends on
`internal/email` (wiring `EmailSender` at boot). Putting the loader in
`email` keeps the dependency flow one-way.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 20:44:09 +02:00
senke
070e31a463 chore(release): v1.0.5.1 — dev SMTP ergonomics hotfix
A fresh clone + `cp veza-backend-api/.env.template .env` + `make dev-full`
booted the backend with `SMTP_HOST=""` — `EmailService.sendEmail` short-
circuits to log-only when the host is empty, so `register` + `password
reset` produced users stuck with no way to verify (or recover) in dev,
and the smoke test caught MailHog empty despite the service being up.

- `.env.template` now ships MailHog-ready defaults (`localhost:1025`,
  UI on `:8025`, `FROM_EMAIL=no-reply@veza.local`) so a bare clone +
  copy gives a working register flow. Comment rewritten to point at
  both the dev path and the prod override.
- Also exports duplicate variable names (`SMTP_USERNAME`, `SMTP_FROM`,
  `SMTP_FROM_NAME`) read by `internal/email/sender.go`. The two email
  services in-tree disagree on env schema (`SMTP_USER` vs
  `SMTP_USERNAME`, `FROM_EMAIL` vs `SMTP_FROM`, `FROM_NAME` vs
  `SMTP_FROM_NAME`); until v1.0.6 reconciles them, both sets are
  populated so whichever path fires finds its names.

Pure config hotfix. No code change, no migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:16:54 +02:00
senke
2df921abd5 v0.9.1 2026-03-05 19:22:31 +01:00
senke
ae81e171c7 feat(seller): add Stripe Connect config 2026-02-23 22:09:23 +01:00
senke
b73387af3c feat(api): add PostgreSQL read replica support (3.7)
- Add DATABASE_READ_URL config and InitReadReplica in database package
- Add ForRead() helper for read-only handler routing
- Update TrackService and TrackSearchService to use read replica for reads
- Document setup in DEPLOYMENT_GUIDE.md and .env.template
2026-02-14 22:50:23 +01:00
senke
92f432fb9e chore: consolidate pending changes (Hyperswitch, PostCard, dashboard, stream server, etc.) 2026-02-14 21:45:15 +01:00
senke
30f17dfc2a chore(backend): config, router, auth, stream service, sanitizer, tests
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-11 22:19:09 +01:00
senke
712bfb6b8c config(template): add comprehensive .env.template
Created centralized environment template with all configuration
variables documented and categorized.

Categories:
- REQUIRED: DATABASE_URL, JWT_SECRET (min 32 chars), REDIS
- RECOMMENDED: SENTRY_DSN, COOKIE_SECURE, CORS_ALLOWED_ORIGINS
- OPTIONAL: RABBITMQ, SMTP, CLAMAV, S3

Features:
- Clear documentation for each variable
- Default values specified
- Validation rules documented
- Environment-specific guidance (dev vs prod)
- Security notes for sensitive values

Impact: Single source of truth for configuration, reduces config drift.

Fixes: P3.4 (part 1) from audit AUDIT_TEMP_29_01_2026.md
2026-01-29 23:32:18 +01:00