Three Incus containers, each running redis-server + redis-sentinel (co-located). redis-1 = master at first boot, redis-2/3 = replicas. Sentinel quorum=2 of 3 ; failover-timeout=30s satisfies the W3 acceptance criterion.

- internal/config/redis_init.go : initRedis branches on REDIS_SENTINEL_ADDRS ; non-empty -> redis.NewFailoverClient with MasterName + SentinelAddrs + SentinelPassword. Empty -> existing single-instance NewClient (dev/local stays parametric).
- internal/config/config.go : 3 new fields (RedisSentinelAddrs, RedisSentinelMasterName, RedisSentinelPassword) read from env. parseRedisSentinelAddrs trims+filters CSV.
- internal/metrics/cache_hit_rate.go : new RecordCacheHit / Miss counters, labelled by subsystem. Cardinality bounded.
- internal/middleware/rate_limiter.go : instrument 3 Eval call sites (DDoS, frontend log throttle, upload throttle). Hit = Redis answered, Miss = error -> in-memory fallback.
- internal/services/chat_pubsub.go : instrument Publish + PublishPresence.
- internal/websocket/chat/presence_service.go : instrument SetOnline / SetOffline / Heartbeat / GetPresence. redis.Nil counts as a hit (legitimate empty result).
- infra/ansible/roles/redis_sentinel/ : install Redis 7 + Sentinel, render redis.conf + sentinel.conf, systemd units. Vault assertion prevents shipping placeholder passwords to staging/prod.
- infra/ansible/playbooks/redis_sentinel.yml : provisions the 3 containers + applies common baseline + role.
- infra/ansible/inventory/lab.yml : new groups redis_ha + redis_ha_master.
- infra/ansible/tests/test_redis_failover.sh : kills the master container, polls Sentinel for the new master, asserts elapsed < 30s.
- config/grafana/dashboards/redis-cache-overview.json : 3 hit-rate stats (rate_limiter / chat_pubsub / presence) + ops/s breakdown.
- docs/ENV_VARIABLES.md §3 : 3 new REDIS_SENTINEL_* env vars.
- veza-backend-api/.env.template : 3 placeholders (empty default).

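The CSV handling and the Sentinel/single-instance branch described for config.go and redis_init.go can be illustrated with a minimal sketch. This is not the repository's code: `parseSentinelAddrs` and `redisMode` are hypothetical names, the port 26379 is only the conventional Sentinel default, and the sketch returns a mode string instead of calling `redis.NewFailoverClient` so that it stays standard-library only.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSentinelAddrs splits a comma-separated list of host:port pairs,
// trims surrounding whitespace and drops empty entries.
// Hypothetical stand-in for the parseRedisSentinelAddrs helper named in the commit.
func parseSentinelAddrs(raw string) []string {
	var addrs []string
	for _, part := range strings.Split(raw, ",") {
		if addr := strings.TrimSpace(part); addr != "" {
			addrs = append(addrs, addr)
		}
	}
	return addrs
}

// redisMode reports which client the branch would build:
// "failover" when at least one Sentinel address is configured,
// "single" otherwise (the dev/local path).
func redisMode(addrs []string) string {
	if len(addrs) > 0 {
		return "failover"
	}
	return "single"
}

func main() {
	// Assumed example value for REDIS_SENTINEL_ADDRS, with stray spaces and empty items.
	addrs := parseSentinelAddrs(" redis-1:26379, ,redis-2:26379 ,redis-3:26379,")
	fmt.Println(addrs, redisMode(addrs))

	// An unset or empty variable falls back to the single-instance client.
	fmt.Println(redisMode(parseSentinelAddrs("")))
}
```

Filtering empty entries before branching means a value such as `","` or a trailing comma does not accidentally select the failover client with a blank address.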
Acceptance (Day 11) : Sentinel failover < 30s ; cache hit-rate dashboard populated. Lab test pending Sentinel deployment.

W3 verification gate progress : Redis Sentinel ✓ (this commit), MinIO EC4+2 ⏳ Day 12, CDN ⏳ Day 13, DMCA ⏳ Day 14, embed ⏳ Day 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
90 lines
3 KiB
YAML
# Lab inventory — the R720's local lab Incus container used to dry-run
# role changes before they touch staging or prod. Override
# ansible_host / ansible_user / ansible_port in `host_vars/<host>.yml`
# (gitignored if it carries credentials, otherwise plain values).
#
# Usage:
#   ansible-playbook -i inventory/lab.yml playbooks/site.yml --check
#   ansible-playbook -i inventory/lab.yml playbooks/site.yml
#
# v1.0.9 Day 6: postgres_ha group added. The 3 containers
# (pgaf-monitor, pgaf-primary, pgaf-replica) live ON the veza-lab
# host and are addressed via the `community.general.incus`
# connection plugin — no SSH setup needed inside the containers.
all:
  hosts:
    veza-lab:
      ansible_host: 10.0.20.150
      ansible_user: senke
      ansible_python_interpreter: /usr/bin/python3
  children:
    incus_hosts:
      hosts:
        veza-lab:
    veza_lab:
      hosts:
        veza-lab:
    postgres_ha:
      hosts:
        pgaf-monitor:
          pg_auto_failover_role: monitor
        pgaf-primary:
          pg_auto_failover_role: node
        pgaf-replica:
          pg_auto_failover_role: node
      vars:
        # Containers reached via Incus exec on the parent host. The
        # plugin lives in the community.general collection — install
        # with `ansible-galaxy collection install community.general`
        # before running this playbook.
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    postgres_ha_monitor:
      hosts:
        pgaf-monitor:
    postgres_ha_nodes:
      # Order matters — primary first so it registers as primary; replica
      # second so it joins as standby.
      hosts:
        pgaf-primary:
        pgaf-replica:
    # v1.0.9 Day 7: pgbouncer fronts the formation. Same
    # community.general.incus connection plugin as postgres_ha.
    pgbouncer:
      hosts:
        pgaf-pgbouncer:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    # v1.0.9 W3 Day 11: Redis Sentinel HA. 3 Incus containers each
    # running a redis-server + redis-sentinel; redis-1 boots as master,
    # the other two as replicas. Sentinel quorum = 2 across the 3.
    redis_ha:
      hosts:
        redis-1:
        redis-2:
        redis-3:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    redis_ha_master:
      # First in this list is the bootstrap master ; sentinel.conf.j2
      # references this group to point each sentinel at it.
      hosts:
        redis-1:
    # v1.0.9 Day 9: otel-collector + Tempo for distributed tracing.
    # Each runs in its own Incus container; the API on the host points
    # at otel-collector.lxd:4317 via OTEL_EXPORTER_OTLP_ENDPOINT.
    observability:
      hosts:
        otel-collector:
        tempo:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    otel_collectors:
      hosts:
        otel-collector:
    tempo:
      hosts:
        tempo: