Two connected failure modes that silently break multi-pod deployments:
1. `RedisURL` has a struct-level default (`redis://<appDomain>:6379`)
that makes `c.RedisURL == ""` always false. An operator forgetting
to set `REDIS_URL` booted against a phantom host — every Redis call
would then fail, and `ChatPubSubService` would quietly fall back to
an in-memory map. On a single-pod deploy that "works"; on two pods
it silently partitions chat (messages on pod A never reach
subscribers on pod B).
2. The fallback itself was logged at `Warn` level, buried under normal
traffic. Operators only noticed when users reported stuck chats.
Changes:
* `config.go` (`ValidateForEnvironment` prod branch): new check that
`os.Getenv("REDIS_URL")` is non-empty. The struct field is left
alone (dev + test still use the default); we inspect the raw env so
the check is "explicitly set" rather than "non-empty after defaults".
* `chat_pubsub.go` `NewChatPubSubService`: if `redisClient == nil`,
emit an `ERROR` at construction time naming the failure mode
("cross-instance messages will be lost"). Same `Warn`→`Error`
promotion for the `Publish` fallback path — runbook-worthy.
Tests: new `chat_pubsub_test.go` with a `zaptest/observer` that asserts
the ERROR-level log fires exactly once when Redis is nil, plus an
in-memory fan-out happy-path so single-pod dev behaviour stays covered.
New `TestValidateForEnvironment_RedisURLRequiredInProduction` mirrors
the Hyperswitch guard test shape.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With payments disabled, the marketplace flow still completes: orders are
created with status `CREATED`, the download URL is released, and no PSP
call is ever made. In other words: on a misconfigured prod instance, every
purchase is free. The only signal was a silent `hyperswitch_enabled=false`
at boot.
`ValidateForEnvironment()` (already wired at `NewConfig` line 513, before
the HTTP listener binds) now rejects `APP_ENV=production` with
`HyperswitchEnabled=false`. The error message names the failure mode
explicitly ("effectively giving away products") rather than a terse
"config invalid" — this is a revenue leak, not a typo.
Dev and staging are unaffected.
Tests: 3 new cases in `validation_test.go`
(`TestValidateForEnvironment_HyperswitchRequiredInProduction`) +
`TestLoadConfig_ProdValid` updated to set `HyperswitchEnabled: true`.
`TestValidateForEnvironment_ClamAVRequiredInProduction` fixture also
includes the new field so its "succeeds" sub-test still runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
backend-ci.yml's `test -z "$(gofmt -l .)"` strict gate (added in
13c21ac11) failed on a backlog of unformatted files. None of the
85 files in this commit had been edited since the gate was added
because no push touched veza-backend-api/** in between, so the
gate never fired until today's CI fixes triggered it.
The diff is exclusively whitespace alignment in struct literals
and trailing-space comments. `go build ./...` and the full test
suite (with VEZA_SKIP_INTEGRATION=1 -short) pass identically.
Three test failures triggered by changes in 73eca4f6a:
1. TestGetCORSOrigins_EnvironmentDefaults expected dev/staging origins
on :8080 but cors.go now generates :18080 (matching the actual
backend port from Dockerfile EXPOSE). Test was the stale side.
2. TestLoadConfig_ProdValid and TestValidateForEnvironment_ClamAVRequiredInProduction
built a Config literal missing fields that ValidateForEnvironment now
requires in production: ChatJWTSecret (must differ from JWTSecret),
OAuthEncryptionKey (≥32 bytes), JWTIssuer, JWTAudience. Also
explicitly set CLAMAV_REQUIRED=true so validation order is deterministic.
Update RabbitMQ config and eventbus. Improve secret filter logging.
Refine presence, cloud, and social services. Update announcement and
feature flag handlers. Add track_likes updated_at migration. Rebuild
seed binary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ORDER BY dynamiques : whitelist explicite, fallback created_at DESC
- Login/register soumis au rate limiter global
- VERSION sync + check CI
- Nettoyage références veza-chat-server
- Go 1.24 partout (Dockerfile, workflows)
- TODO/FIXME/HACK convertis en issues ou résolus
- PKCE (S256) in OAuth flow: code_verifier in oauth_states, code_challenge in auth URL
- CryptoService: AES-256-GCM encryption for OAuth provider tokens at rest
- OAuth redirect URL validated against OAUTH_ALLOWED_REDIRECT_DOMAINS
- CHAT_JWT_SECRET must differ from JWT_SECRET in production
- Migration script: cmd/tools/encrypt_oauth_tokens for existing tokens
- Fixes: VEZA-SEC-003, VEZA-SEC-004, VEZA-SEC-009, VEZA-SEC-010
- Add HLSEnabled and HLSStorageDir to backend config (HLS_STREAMING env)
- Register HLS serving routes (master.m3u8, quality playlist, segments)
behind HLSEnabled feature flag on existing track routes
- Add GetHLSStatus and TriggerHLSTranscode methods to StreamService
for stream server communication
- Update docker-compose (dev, staging, prod) with HLS env vars and
shared hls-data volume between backend and stream-server
- Stream callback already correctly updates stream_manifest_url
INT-03: Tests for health endpoint, auth flow, track upload auth,
webhook HTTPS-only, and rate limit headers. Build-tagged
'integration' to avoid running in regular test suite.
SEC-08: If HYPERSWITCH_ENABLED=true in production, startup now fails
unless HYPERSWITCH_WEBHOOK_SECRET is set. This prevents webhook
signature verification from being silently bypassed.
- Add FrontendURL to config (FRONTEND_URL or VITE_FRONTEND_URL)
- OAuth handlers use config instead of os.Getenv
- Update TODOS_AUDIT: mark UUID migration items as resolved
- Add ISSUES_P2_BACKLOG.md for GitHub issues
- Add ROUTES_ORPHANES.md for routes without UI
- Document FRONTEND_URL in .env.example
Replace NODE_ENV/APP_ENV bypass with DISABLE_RATE_LIMIT_FOR_TESTS=true.
Only test runners should set this. Prevents rate limiting bypass when
APP_ENV=development is mistakenly used in production.
Phase 1 audit - P1.6
Add validation in ValidateForEnvironment() to fail startup when
CLAMAV_REQUIRED=false in production. Virus scanning is mandatory
for all file uploads in production.
Phase 1 audit - P1.4
- 1.6: Replace hardcoded JWT secrets in chat server tests with runtime-generated
values (env TEST_JWT_SECRET or uuid-based fallback)
- 1.7: Add validateNoBypassFlagsInProduction() in config; fail startup if
BYPASS_CONTENT_CREATOR_ROLE or CSRF_DISABLED is set in production
Refs: AUDIT_TECHNIQUE_INTEGRAL_2026_02_15.md items 1.6, 1.7
- Add DATABASE_READ_URL config and InitReadReplica in database package
- Add ForRead() helper for read-only handler routing
- Update TrackService and TrackSearchService to use read replica for reads
- Document setup in DEPLOYMENT_GUIDE.md and .env.template
- Deleted apps/web/src/utils/optimisticStoreUpdates.ts (unused file)
- File was unused - no imports found in codebase
- Mutations already use React Query's onMutate pattern
- No TypeScript errors after deletion
- Actions 4.4.1.2 and 4.4.1.3 complete
- Conflit SQLx résolu (alignement sur version 0.7)
- build.rs configurés pour protoc dans chat/stream servers
- API Prometheus migrée vers HistogramOpts
- Traits Display/Debug corrigés (String au lieu de &dyn Display)
- API TOTP corrigée (totp-rs 5.4 avec Secret::Encoded)
- Layers tracing-subscriber corrigés (types conditionnels)
- VezaError/VezaResult exportés dans lib.rs
- TransactionProvider simplifié (retour void au lieu de Box<dyn>)
- VezaConfig contraint Serialize pour to_json()
Files: veza-common/Cargo.toml, veza-common/src/*.rs, veza-chat-server/Cargo.toml, veza-chat-server/build.rs, veza-stream-server/Cargo.toml, veza-stream-server/build.rs, VEZA_ROADMAP.json
Hours: 8 estimated, 3 actual
- Increase IP rate limit from 100 to 200 requests per minute
- Increase IP burst from 10 to 20
- Increase SimpleRateLimiter limit from 100 to 200
- Allows frontend to make multiple requests during initial load (CSRF, state hydration, etc.)
- Can be overridden via RATE_LIMIT_IP_PER_MINUTE and RATE_LIMIT_LIMIT env vars