Commit graph

1 commit

Author SHA1 Message Date
senke
97ca5209a1 fix(chat,config): require REDIS_URL in prod + error on in-memory fallback
Two connected failure modes that silently break multi-pod deployments:

  1. `RedisURL` has a struct-level default (`redis://<appDomain>:6379`)
     that makes `c.RedisURL == ""` always false. An operator forgetting
     to set `REDIS_URL` booted against a phantom host — every Redis call
     would then fail, and `ChatPubSubService` would quietly fall back to
     an in-memory map. On a single-pod deploy that "works"; on two pods
     it silently partitions chat (messages on pod A never reach
     subscribers on pod B).
  2. The fallback itself was logged at `Warn` level, buried under normal
     traffic. Operators only noticed when users reported stuck chats.

Changes:

  * `config.go` (`ValidateForEnvironment` prod branch): new check that
    `os.Getenv("REDIS_URL")` is non-empty. The struct field is left
    alone (dev + test still use the default); we inspect the raw env so
    the check is "explicitly set" rather than "non-empty after defaults".
  * `chat_pubsub.go` `NewChatPubSubService`: if `redisClient == nil`,
    emit an `ERROR` at construction time naming the failure mode
    ("cross-instance messages will be lost"). Same `Warn`→`Error`
    promotion for the `Publish` fallback path — runbook-worthy.

Tests: new `chat_pubsub_test.go` with a `zaptest/observer` that asserts
the ERROR-level log fires exactly once when Redis is nil, plus an
in-memory fan-out happy-path so single-pod dev behaviour stays covered.
New `TestValidateForEnvironment_RedisURLRequiredInProduction` mirrors
the Hyperswitch guard test shape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:56:47 +02:00