# Application Rollback Runbook

This runbook describes the procedure for rolling back a failed application deployment.

## Prerequisites

- Access to Kubernetes cluster
- kubectl configured
- Previous deployment version available
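
These prerequisites can be checked before starting. The following is a minimal sketch, not part of the existing tooling: the function name `check_prerequisites` and the `NAMESPACE` and `DEPLOYMENT` variables are assumptions, and `patch` is only one reasonable permission to probe for.

```bash
# Hypothetical preflight helper; names and defaults are assumptions.
NAMESPACE="${NAMESPACE:-veza-production}"
DEPLOYMENT="${DEPLOYMENT:-veza-backend-api}"

check_prerequisites() {
  # kubectl must be installed and on PATH.
  command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found"; return 1; }

  # A cluster context must be selected.
  kubectl config current-context >/dev/null 2>&1 \
    || { echo "no kubectl context configured"; return 1; }

  # The current user must be allowed to modify deployments in the namespace.
  kubectl auth can-i patch deployments -n "$NAMESPACE" >/dev/null 2>&1 \
    || { echo "not allowed to patch deployments in $NAMESPACE"; return 1; }

  # A previous revision must exist: count the numbered rows of the history table.
  revisions=$(kubectl rollout history "deployment/$DEPLOYMENT" -n "$NAMESPACE" 2>/dev/null \
    | grep -cE '^[0-9]+')
  [ "$revisions" -ge 2 ] || { echo "no previous revision to roll back to"; return 1; }

  echo "prerequisites OK"
}
```

Source the snippet and run `check_prerequisites`; it stops at the first failed check and prints the reason.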

## Detection

### Automatic Detection

Health checks will automatically detect:

- Application crashes
- High error rates
- Slow response times
- Failed readiness probes

### Manual Detection

```bash
# Check pod status
kubectl get pods -n veza-production -l app=veza-backend-api

# Check deployment status
kubectl rollout status deployment/veza-backend-api -n veza-production

# Check application logs
kubectl logs -f deployment/veza-backend-api -n veza-production

# Check metrics
kubectl top pods -n veza-production
```
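
When many pods are listed, it can help to print only the unhealthy ones. This sketch filters the default `kubectl get pods` table (columns `NAME READY STATUS ...`); the helper name `unready_pods` is an assumption.

```bash
# Hypothetical helper: print pods that are not fully ready or not Running.
# Arguments: <namespace> <label-selector>
unready_pods() {
  kubectl get pods -n "$1" -l "$2" --no-headers \
    | awk '{
        # READY has the form "ready/total"; flag the pod when the two differ.
        split($2, ready, "/")
        if (ready[1] != ready[2] || $3 != "Running") print $1, $2, $3
      }'
}
```

For example, `unready_pods veza-production app=veza-backend-api` prints nothing when every pod is ready.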

## Rollback Procedure

### Step 1: Verify Issue

```bash
# Check current deployment
kubectl get deployment veza-backend-api -n veza-production -o yaml

# Check recent events
kubectl get events -n veza-production --sort-by='.lastTimestamp' | tail -20

# Verify health endpoint
curl https://api.veza.com/health
```

### Step 2: Check Rollback History

```bash
# View deployment history
kubectl rollout history deployment/veza-backend-api -n veza-production

# View details of previous revision
kubectl rollout history deployment/veza-backend-api -n veza-production --revision=<N>
```
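
The value for `<N>` can be read from the history table. This sketch assumes that the previous revision is the second-highest number listed, which is the one `kubectl rollout undo` targets when no revision is given. The helper name `previous_revision` is an assumption.

```bash
# Hypothetical helper: print the second-highest revision number.
# Arguments: <deployment> <namespace>
previous_revision() {
  # Keep only the rows whose first column is a revision number.
  revs=$(kubectl rollout history "deployment/$1" -n "$2" \
    | awk '$1 ~ /^[0-9]+$/ { print $1 }' | sort -n)
  count=$(printf '%s\n' "$revs" | grep -c .)
  # With fewer than two revisions there is nothing to roll back to.
  [ "$count" -ge 2 ] || return 1
  printf '%s\n' "$revs" | tail -n 2 | head -n 1
}
```

Revision numbers are not always contiguous: after a rollback, the restored revision is renumbered as the newest one, so the helper sorts the numbers rather than subtracting one.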

### Step 3: Execute Rollback

#### Option A: Roll Back to Previous Version

```bash
# Roll back to the previous version
kubectl rollout undo deployment/veza-backend-api -n veza-production

# Monitor rollback progress
kubectl rollout status deployment/veza-backend-api -n veza-production
```

#### Option B: Roll Back to Specific Revision

```bash
# Roll back to a specific revision
kubectl rollout undo deployment/veza-backend-api -n veza-production --to-revision=<N>

# Monitor rollback progress
kubectl rollout status deployment/veza-backend-api -n veza-production
```
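
Both options can be wrapped in one function that also bounds the wait, so a stuck rollback fails rather than blocking the operator. This is a sketch only: `rollback_deployment` and `ROLLBACK_TIMEOUT` are assumed names, and the 300s default is borrowed from the `kubectl wait` timeout used in Step 4.

```bash
# Hypothetical wrapper around Option A and Option B.
# Arguments: <deployment> <namespace> [revision]
rollback_deployment() {
  name=$1
  namespace=$2
  revision=${3:-}

  if [ -n "$revision" ]; then
    # Option B: explicit revision.
    kubectl rollout undo "deployment/$name" -n "$namespace" --to-revision="$revision" || return 1
  else
    # Option A: previous revision.
    kubectl rollout undo "deployment/$name" -n "$namespace" || return 1
  fi

  # Exit non-zero if the rollout has not converged within the timeout.
  kubectl rollout status "deployment/$name" -n "$namespace" \
    --timeout="${ROLLBACK_TIMEOUT:-300s}"
}
```

Call it as `rollback_deployment veza-backend-api veza-production` for Option A, or append a revision number for Option B.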

### Step 4: Verify Rollback

```bash
# Check pod status
kubectl get pods -n veza-production -l app=veza-backend-api

# Check deployment status
kubectl get deployment veza-backend-api -n veza-production

# Verify pods are ready
kubectl wait --for=condition=ready pod \
  -l app=veza-backend-api \
  -n veza-production \
  --timeout=300s

# Check application logs
kubectl logs -f deployment/veza-backend-api -n veza-production

# Test health endpoint
curl https://api.veza.com/health

# Test critical endpoints
curl https://api.veza.com/api/v1/tracks
```

### Step 5: Verify Application Functionality

```bash
# Run smoke tests
# (Use your application's test suite)

# Check metrics
kubectl top pods -n veza-production

# Monitor error rates
# (Check monitoring dashboard)
```
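
If no dedicated smoke-test suite is available, a status-code check over the endpoints from Step 4 is a reasonable stand-in. This is a sketch under assumptions: the function name `smoke_test` is hypothetical, and treating any 2xx response as healthy may be too lenient for some endpoints.

```bash
# Hypothetical smoke test: every path must answer with a 2xx status.
# Arguments: <base-url> <path>...
smoke_test() {
  base_url=$1
  shift
  failures=0
  for path in "$@"; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$base_url$path")
    case "$code" in
      2??) echo "OK   $code $path" ;;
      *)   echo "FAIL $code $path"; failures=$((failures + 1)) ;;
    esac
  done
  # Succeed only when no endpoint failed.
  [ "$failures" -eq 0 ]
}
```

For example: `smoke_test https://api.veza.com /health /api/v1/tracks`. A connection failure is reported by curl as status `000` and counts as a failure.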

## Multi-Service Rollback

If multiple services need to be rolled back:

```bash
# Roll back backend API (handles chat since the v0.502 merge)
kubectl rollout undo deployment/veza-backend-api -n veza-production

# Roll back frontend
kubectl rollout undo deployment/veza-frontend -n veza-production

# Roll back stream server (if media layer affected)
kubectl rollout undo deployment/veza-stream-server -n veza-production

# Monitor all rollbacks
kubectl rollout status deployment/veza-backend-api -n veza-production
kubectl rollout status deployment/veza-frontend -n veza-production
kubectl rollout status deployment/veza-stream-server -n veza-production
```
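
The same sequence can be written as a loop that starts every rollback first and then waits for each one. This is a sketch: `rollback_all` is an assumed name and the 300s timeout is an assumption, not a documented limit.

```bash
# Hypothetical helper: roll back several deployments, then wait for all of them.
# Arguments: <namespace> <deployment>...
rollback_all() {
  namespace=$1
  shift

  # Trigger every rollback before waiting, so the services revert in parallel.
  for name in "$@"; do
    kubectl rollout undo "deployment/$name" -n "$namespace" || return 1
  done

  # Wait for each rollout and remember the ones that did not converge.
  failed=""
  for name in "$@"; do
    kubectl rollout status "deployment/$name" -n "$namespace" --timeout=300s \
      || failed="$failed $name"
  done

  [ -z "$failed" ] || { echo "rollback did not complete for:$failed"; return 1; }
}
```

For example: `rollback_all veza-production veza-backend-api veza-frontend veza-stream-server`.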

## Database Migration Rollback

If the rollback includes database changes:

```bash
# 1. Stop application
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production

# 2. Roll back database migration
# (Use your migration tool)
# Example with migrate tool:
kubectl run migrate-rollback --rm -it --image=veza-backend-api:previous \
  --restart=Never \
  --env="DATABASE_URL=$DATABASE_URL" \
  -- migrate -path /migrations -database "$DATABASE_URL" down 1

# 3. Roll back application
kubectl rollout undo deployment/veza-backend-api -n veza-production

# 4. Restart application
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
```
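
Step 4 hardcodes three replicas, which may not match what was running before step 1. One option is to record the replica count before scaling down and restore that value afterwards. This is a sketch; the function names and the `SAVED_REPLICAS` variable are assumptions.

```bash
# Hypothetical helpers: remember the replica count across the migration rollback.
# Arguments for both functions: <deployment> <namespace>
scale_down_for_migration() {
  # Read the desired replica count from the deployment spec before changing it.
  SAVED_REPLICAS=$(kubectl get deployment "$1" -n "$2" -o jsonpath='{.spec.replicas}') \
    || return 1
  kubectl scale deployment "$1" --replicas=0 -n "$2"
}

restore_replicas() {
  # Refuse to guess a count if nothing was saved in this shell session.
  [ -n "${SAVED_REPLICAS:-}" ] || { echo "no saved replica count"; return 1; }
  kubectl scale deployment "$1" --replicas="$SAVED_REPLICAS" -n "$2"
}
```

Both functions must run in the same shell session, because the count is kept in a shell variable rather than persisted anywhere.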

## Verification Checklist

- [ ] Previous version identified
- [ ] Rollback executed
- [ ] Pods are running and ready
- [ ] Health checks passing
- [ ] Application logs show no errors
- [ ] Critical endpoints responding
- [ ] Metrics normalized
- [ ] Users can access platform
- [ ] Monitoring alerts cleared

## Troubleshooting

### Rollback Fails

```bash
# Check deployment status
kubectl describe deployment veza-backend-api -n veza-production

# Check pod events
kubectl describe pod <pod-name> -n veza-production

# Check image availability
kubectl get pod <pod-name> -n veza-production -o jsonpath='{.spec.containers[0].image}'

# If the image is missing, you may need to rebuild it or use a different image
```
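
To find which pods are blocked on a missing image without describing each one, the container waiting reason can be read directly. This sketch assumes the usual `ErrImagePull` and `ImagePullBackOff` reasons; the helper name `image_pull_failures` is hypothetical.

```bash
# Hypothetical helper: list pods whose containers are waiting on an image pull.
# Arguments: <namespace> <label-selector>
image_pull_failures() {
  # Emit "<pod-name><TAB><waiting reasons>" per pod, then keep the pull failures.
  kubectl get pods -n "$1" -l "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
    | awk -F '\t' '$2 ~ /ErrImagePull|ImagePullBackOff/ { print $1 }'
}
```

For example: `image_pull_failures veza-production app=veza-backend-api`.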

### Pods Not Starting

```bash
# Check pod logs
kubectl logs <pod-name> -n veza-production

# Check resource constraints
kubectl describe pod <pod-name> -n veza-production | grep -A 5 "Limits\|Requests"

# Check node resources
kubectl top nodes
```

### Application Still Failing After Rollback

```bash
# Verify correct version is deployed
kubectl get deployment veza-backend-api -n veza-production -o jsonpath='{.spec.template.spec.containers[0].image}'

# Check if issue is in previous version too
kubectl logs <pod-name> -n veza-production

# You may need to roll back further or investigate the root cause
```

## Post-Rollback Tasks

1. **Investigate Root Cause**
   - Review deployment logs
   - Check application logs
   - Identify what caused failure

2. **Fix Issue**
   - Address root cause
   - Test fix in staging
   - Prepare new deployment

3. **Document Incident**
   - Document rollback procedure
   - Note any issues encountered
   - Update deployment process if needed

4. **Notify Stakeholders**
   - Send incident report
   - Update status page
   - Schedule post-mortem if needed

## Prevention

To prevent future rollbacks:

- **Automated Testing**: Run full test suite before deployment
- **Staged Rollouts**: Use canary or blue-green deployments
- **Health Checks**: Comprehensive health check endpoints
- **Monitoring**: Real-time monitoring and alerting
- **Gradual Rollout**: Deploy to small percentage first

## References

- [Kubernetes Rollout Documentation](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment)
- [Deployment Best Practices](../README.md)