Cluster Failover Runbook

This runbook describes the procedure for failing over to a disaster recovery (DR) region when the primary cluster is completely unavailable.

Prerequisites

  • DR cluster provisioned and ready
  • Backups available in DR region
  • DNS access for failover
  • Access to both primary and DR clusters
  • Disaster declared and approved

Pre-Failover Checklist

  • Disaster declared and documented
  • Stakeholders notified
  • DR cluster resources verified (see the check script after this list)
  • Latest backups available in DR
  • DNS access confirmed
  • Team assembled and ready
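
Most of the technical items above can be verified from a terminal before anyone touches DNS. A minimal sketch, assuming the context and bucket names used throughout this runbook:

# Pre-failover sanity check (context and bucket names assumed from this runbook)
kubectl --context veza-dr-cluster get nodes || echo "DR cluster unreachable"
kubectl --context veza-dr-cluster get namespace veza-production || echo "Namespace missing"

# Confirm a recent PostgreSQL backup exists in the DR region
aws s3 ls s3://veza-backups/postgres/ | sort | tail -n 1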

Failover Procedure

Step 1: Verify DR Cluster Status

# Switch kubectl context to DR cluster
kubectl config use-context veza-dr-cluster

# Verify cluster is healthy
kubectl cluster-info
kubectl get nodes

# Verify namespaces exist
kubectl get namespaces | grep veza

Step 2: Restore Secrets

# Restore secrets from Vault or backup
# Option A: From Vault
kubectl create secret generic veza-secrets \
  --from-literal=database-url="$(vault kv get -field=database-url secret/veza/production)" \
  --from-literal=jwt-secret="$(vault kv get -field=jwt-secret secret/veza/production)" \
  --from-literal=redis-url="$(vault kv get -field=redis-url secret/veza/production)" \
  -n veza-production \
  --dry-run=client -o yaml | kubectl apply -f -

# Option B: From backup file
kubectl create secret generic veza-secrets \
  --from-env-file=secrets-backup.env \
  -n veza-production
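
Whichever option is used, confirm the keys landed before deploying anything that mounts them (key names below assume Option A):

# List the secret's keys without printing values
kubectl describe secret veza-secrets -n veza-production

# Spot-check that one value decodes cleanly (this prints the secret — mind your terminal)
kubectl get secret veza-secrets -n veza-production \
  -o jsonpath='{.data.database-url}' | base64 -d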

Step 3: Restore Database

# 1. Deploy PostgreSQL in DR cluster
kubectl apply -f k8s/database/postgres-deployment.yaml -n veza-production

# 2. Wait for PostgreSQL to be ready
kubectl wait --for=condition=ready pod \
  -l app=postgres \
  -n veza-production \
  --timeout=300s

# 3. Restore from latest backup
# Get backup from S3 or backup storage
# Note: the restore pod below mounts the node's /tmp via hostPath, so the dump
# must end up on the node that runs the pod (or use the kubectl cp alternative below)
aws s3 cp s3://veza-backups/postgres/latest.dump /tmp/latest.dump

# Restore database
kubectl run postgres-restore --rm -it --image=postgres:15-alpine \
  --restart=Never \
  --env="PGPASSWORD=..." \
  --env="POSTGRES_HOST=postgres-service" \
  --env="POSTGRES_USER=veza_user" \
  --env="POSTGRES_DB=veza_db" \
  --overrides='
{
  "spec": {
    "containers": [{
      "name": "postgres-restore",
      "image": "postgres:15-alpine",
      "command": ["/bin/sh", "-c", "pg_restore -h $POSTGRES_HOST -U $POSTGRES_USER -d $POSTGRES_DB -F c /backups/latest.dump --clean --if-exists"],
      "volumeMounts": [{
        "name": "backup",
        "mountPath": "/backups"
      }]
    }],
    "volumes": [{
      "name": "backup",
      "hostPath": {
        "path": "/tmp"
      }
    }]
  }
}' \
  -n veza-production
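
# Alternative to the hostPath mount above — a sketch that copies the dump
# straight into a throwaway helper pod (pod name and paths are illustrative)
kubectl run postgres-restore-helper --image=postgres:15-alpine \
  --restart=Never -n veza-production -- sleep 3600
kubectl wait --for=condition=ready pod/postgres-restore-helper \
  -n veza-production --timeout=120s
kubectl cp /tmp/latest.dump veza-production/postgres-restore-helper:/tmp/latest.dump
kubectl exec postgres-restore-helper -n veza-production -- \
  env PGPASSWORD=... pg_restore -h postgres-service -U veza_user -d veza_db \
  -F c --clean --if-exists /tmp/latest.dump
kubectl delete pod postgres-restore-helper -n veza-production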

# 4. Verify database restore
kubectl exec -it <postgres-pod> -n veza-production -- \
  psql -U veza_user -d veza_db -c "SELECT COUNT(*) FROM users;"

Step 4: Deploy Applications

# Deploy backend API (includes chat since v0.502 merge)
kubectl apply -f k8s/backend-api/deployment.yaml -n veza-production
kubectl apply -f k8s/backend-api/service.yaml -n veza-production

# Deploy frontend
kubectl apply -f k8s/frontend/deployment.yaml -n veza-production
kubectl apply -f k8s/frontend/service.yaml -n veza-production

# Deploy stream server
kubectl apply -f k8s/stream-server/deployment.yaml -n veza-production
kubectl apply -f k8s/stream-server/service.yaml -n veza-production

# Wait for deployments
kubectl rollout status deployment/veza-backend-api -n veza-production
kubectl rollout status deployment/veza-frontend -n veza-production
kubectl rollout status deployment/veza-stream-server -n veza-production

Step 5: Configure Ingress

# Deploy ingress
kubectl apply -f k8s/ingress.yaml -n veza-production

# Verify ingress
kubectl get ingress -n veza-production
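
Freshly created ingresses can take a few minutes to be assigned an address, and Step 6 depends on it. A small wait loop using the names from the manifests above:

# Poll until the ingress has a load-balancer IP (up to ~5 minutes)
for i in $(seq 1 30); do
  IP=$(kubectl get ingress veza-ingress -n veza-production \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  [ -n "$IP" ] && echo "Ingress ready: $IP" && break
  sleep 10
done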

Step 6: Update DNS

# Get DR cluster ingress IP
DR_INGRESS_IP=$(kubectl get ingress veza-ingress -n veza-production -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Update DNS records
# Option A: Using AWS Route53
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.veza.com",
        "Type": "A",
        "TTL": 300,
        "ResourceRecords": [{"Value": "'$DR_INGRESS_IP'"}]
      }
    }]
  }'

# Option B: Using Cloudflare
curl -X PATCH "https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/RECORD_ID" \
  -H "Authorization: Bearer $CLOUDFLARE_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"content":"'$DR_INGRESS_IP'"}'

# Wait for DNS propagation
dig api.veza.com +short
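
A single dig returning the old address is expected until the 300-second TTL above expires. A sketch that polls a public resolver until the record matches the DR ingress:

# Poll until DNS resolves to the DR ingress IP
while [ "$(dig +short api.veza.com @1.1.1.1 | head -n1)" != "$DR_INGRESS_IP" ]; do
  echo "Waiting for DNS propagation..."
  sleep 30
done
echo "api.veza.com now resolves to $DR_INGRESS_IP"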

Step 7: Verify Services

# Check all pods are running
kubectl get pods -n veza-production

# Test health endpoints
curl https://api.veza.com/health
curl https://app.veza.com/health

# Run smoke tests
# (Use your application's test suite)
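
# A minimal smoke-test sketch until the real suite is wired in
# (only the health endpoints above are assumed to exist)
for ep in https://api.veza.com/health https://app.veza.com/health; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$ep")
  [ "$code" = "200" ] && echo "OK   $ep" || echo "FAIL $ep ($code)"
done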

# Check application logs
kubectl logs -f deployment/veza-backend-api -n veza-production

Step 8: Restore Redis (if needed)

# Deploy Redis
kubectl apply -f k8s/redis/deployment.yaml -n veza-production

# Restore Redis backup if available
# Note: Redis may rewrite dump.rdb on shutdown; disable RDB saves first so the
# copied file survives the restart
kubectl exec <redis-pod> -n veza-production -- redis-cli config set save ""
kubectl cp redis-backup.rdb <redis-pod>:/data/dump.rdb -n veza-production
kubectl delete pod <redis-pod> -n veza-production  # Restart to load backup
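
After the pod restarts, check that the data actually loaded:

# Verify Redis is up and repopulated
kubectl exec -it <redis-pod> -n veza-production -- redis-cli ping
kubectl exec -it <redis-pod> -n veza-production -- redis-cli dbsize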

Verification Checklist

  • DR cluster is healthy
  • Secrets restored
  • Database restored and verified
  • All applications deployed
  • Ingress configured
  • DNS updated
  • Health checks passing
  • Smoke tests passing
  • Users can access platform
  • Monitoring configured

Post-Failover Tasks

Immediate (First Hour)

  1. Monitor Platform

    • Watch application logs
    • Monitor error rates
    • Check performance metrics (see the log sketch after this list)

  2. Notify Stakeholders

    • Send status update
    • Update status page
    • Communicate expected timeline
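
A minimal way to watch for error spikes from a terminal while dashboards come up (the grep pattern is an assumption about the backend's log format):

# Surface recent errors from the backend
kubectl logs deployment/veza-backend-api -n veza-production \
  --since=10m --all-containers | grep -iE 'error|panic|fatal'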

Short Term (First Day)

  1. Investigate Primary Cluster

    • Assess damage
    • Identify root cause
    • Estimate recovery time

  2. Optimize DR Cluster

    • Scale resources if needed
    • Optimize configurations
    • Monitor performance

Long Term (Recovery Phase)

  1. Restore Primary Cluster

    • Fix issues in primary
    • Restore from backups
    • Verify functionality

  2. Plan Failback

    • Schedule maintenance window
    • Prepare failback procedure
    • Test failback process

Failback Procedure

Once primary cluster is restored:

# 1. Sync data from DR to primary
# (Use database replication or restore from DR backup)
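
# A minimal sketch, assuming pg_dump/pg_restore access to both clusters
# (<postgres-pod> placeholders as elsewhere in this runbook; credentials elided)
kubectl --context=veza-dr-cluster exec <postgres-pod> -n veza-production -- \
  env PGPASSWORD=... pg_dump -h localhost -U veza_user -F c veza_db > /tmp/failback.dump
kubectl --context=veza-primary-cluster exec -i <postgres-pod> -n veza-production -- \
  env PGPASSWORD=... pg_restore -h localhost -U veza_user -d veza_db \
    --clean --if-exists < /tmp/failback.dump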

# 2. Verify primary cluster
kubectl config use-context veza-primary-cluster
kubectl get pods -n veza-production

# 3. Update DNS back to primary
# (Reverse of Step 6 in failover)

# 4. Monitor both clusters during transition

# 5. Once verified, scale down DR cluster
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production --context=veza-dr-cluster

Troubleshooting

Database Restore Fails

# Check backup file integrity
pg_restore --list /backups/latest.dump

# Try restoring specific tables
pg_restore -h postgres-service -U veza_user -d veza_db \
  -t users -t tracks /backups/latest.dump

# Check PostgreSQL logs
kubectl logs -l app=postgres -n veza-production

Applications Not Starting

# Check pod status
kubectl describe pod <pod-name> -n veza-production

# Check logs
kubectl logs <pod-name> -n veza-production

# Verify secrets
kubectl get secret veza-secrets -n veza-production -o yaml

# Check resource constraints
kubectl top nodes
kubectl top pods -n veza-production

DNS Not Propagating

# Check DNS records
dig api.veza.com +short
nslookup api.veza.com

# Verify ingress IP
kubectl get ingress veza-ingress -n veza-production

# Check DNS provider status
# (AWS Route53, Cloudflare, etc.)

References