Completes Day 2 of the v1.0.3 → v1.0.4 cleanup sprint. The documentation now describes the actual repo layout instead of a fictional one.

CLAUDE.md: complete rewrite

The old version referenced paths that don't exist and a protocol aimed at implementing v0.11.0 (current tag: v1.0.3). The agent was following a map for a city that had been rebuilt.

- backend/ → veza-backend-api/
- frontend/ → apps/web/
- ORIGIN/ (root) → veza-docs/ORIGIN/
- veza-chat-server → merged into backend-api (v0.502, commit 279a10d31)
- apps/desktop/ → never existed

Also refreshed: stack versions (Go 1.25, Vite 5, React 18.2, Axum 0.8), commands, conventions, hook bypasses (SKIP_TYPES/SKIP_TESTS/SKIP_E2E), and the scope rules, which are kept as immutable (no AI/ML, no Web3, no gamification, no dark patterns, no public popularity metrics).

README.md: targeted fixes

- "Version cible: v0.101" → "Version courante: v1.0.4"
- "Development Setup (v0.9.3)" → "Development Setup"
- Removed Desktop (Electron) section: never implemented
- Removed veza-chat-server from structure: merged into backend
- Removed deprecated compose files section (nothing is DEPRECATED now)

k8s runbooks: remove stale chat-server references

The disaster-recovery runbooks still scaled/restarted a deployment that no longer exists. In a real failover these commands would have failed silently and blocked the procedure.

Files patched:

- k8s/disaster-recovery/runbooks/cluster-failover.md
- k8s/disaster-recovery/runbooks/data-restore.md
- k8s/disaster-recovery/runbooks/database-failover.md
- k8s/disaster-recovery/runbooks/rollback-procedure.md
- k8s/network-policies/README.md
- k8s/secrets/README.md
- k8s/secrets.yaml.example

Each reference is replaced by a short inline note pointing to v0.502 (commit 279a10d31) so future readers understand the history.

.env.example: remove CHAT_JWT_SECRET

Legacy env var for the deleted chat server. Replaced by an explanatory comment.

Not in this commit (user handles on Forgejo):

- Closing the 5 open dependabot PRs on veza-chat-server/* branches
- Deleting those 5 remote branches after the PRs are closed

Refs: AUDIT_REPORT.md §5.1, §7.1, §10 P1, §10 P4
Application Rollback Runbook
This runbook describes the procedure for rolling back a failed application deployment.
Prerequisites
- Access to Kubernetes cluster
- kubectl configured
- Previous deployment version available
Detection
Automatic Detection
Health checks will automatically detect:
- Application crashes
- High error rates
- Slow response times
- Failed readiness probes
Manual Detection
# Check pod status
kubectl get pods -n veza-production -l app=veza-backend-api
# Check deployment status
kubectl rollout status deployment/veza-backend-api -n veza-production
# Check application logs
kubectl logs -f deployment/veza-backend-api -n veza-production
# Check metrics
kubectl top pods -n veza-production
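Crash loops are often easiest to spot from container restart counts. The following is a minimal sketch, not part of the original runbook: the helper name `restarting_pods`, the default threshold of 3 restarts, and the choice to look only at the first container of each pod are all assumptions.

```shell
# Hypothetical helper: list pods whose first container has restarted more
# than THRESHOLD times. The default threshold (3) is an assumption.
restarting_pods() {
  local namespace="$1" selector="$2" threshold="${3:-3}"
  kubectl get pods -n "$namespace" -l "$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}' |
    awk -F '\t' -v max="$threshold" '$2 + 0 > max { print $1 " restarts=" $2 }'
}
```

For example, `restarting_pods veza-production app=veza-backend-api` prints one line per pod above the threshold and nothing when all pods are stable.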
Rollback Procedure
Step 1: Verify Issue
# Check current deployment
kubectl get deployment veza-backend-api -n veza-production -o yaml
# Check recent events
kubectl get events -n veza-production --sort-by='.lastTimestamp' | tail -20
# Verify health endpoint
curl https://api.veza.com/health
Step 2: Check Rollback History
# View deployment history
kubectl rollout history deployment/veza-backend-api -n veza-production
# View details of previous revision
kubectl rollout history deployment/veza-backend-api -n veza-production --revision=<N>
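Before rolling back, it can help to record which revision is currently live. A minimal sketch, assuming a helper named `current_revision` (hypothetical) that reads the standard `deployment.kubernetes.io/revision` annotation:

```shell
# Hypothetical helper: print the revision number the deployment is on now,
# read from the deployment.kubernetes.io/revision annotation.
current_revision() {
  local deployment="$1" namespace="$2"
  kubectl get deployment "$deployment" -n "$namespace" \
    -o jsonpath='{.metadata.annotations.deployment\.kubernetes\.io/revision}'
}
```

Usage: `current_revision veza-backend-api veza-production`. Note that after `kubectl rollout undo`, the restored revision is re-recorded under a new, higher revision number, so the value noted here identifies the failed release in the history rather than the rollback target.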
Step 3: Execute Rollback
Option A: Rollback to Previous Version
# Rollback to previous version
kubectl rollout undo deployment/veza-backend-api -n veza-production
# Monitor rollback progress
kubectl rollout status deployment/veza-backend-api -n veza-production
Option B: Rollback to Specific Revision
# Rollback to specific revision
kubectl rollout undo deployment/veza-backend-api -n veza-production --to-revision=<N>
# Monitor rollback progress
kubectl rollout status deployment/veza-backend-api -n veza-production
Step 4: Verify Rollback
# Check pod status
kubectl get pods -n veza-production -l app=veza-backend-api
# Check deployment status
kubectl get deployment veza-backend-api -n veza-production
# Verify pods are ready
kubectl wait --for=condition=ready pod \
  -l app=veza-backend-api \
  -n veza-production \
  --timeout=300s
# Check application logs
kubectl logs -f deployment/veza-backend-api -n veza-production
# Test health endpoint
curl https://api.veza.com/health
# Test critical endpoints
curl https://api.veza.com/api/v1/tracks
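A single `curl` right after a rollback can race with pods that are still warming up. The sketch below polls the health endpoint instead; the helper name `wait_for_healthy` and its defaults (30 attempts, 10 seconds apart, 5 second request timeout) are assumptions, not values from this runbook.

```shell
# Hypothetical helper: poll a health URL until it answers without an HTTP
# error, or give up after a fixed number of attempts.
wait_for_healthy() {
  local url="$1" attempts="${2:-30}" delay="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Only pause when another attempt will follow
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempts" >&2
  return 1
}
```

Usage: `wait_for_healthy https://api.veza.com/health`. The non-zero exit code on failure lets the call gate later steps in a script.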
Step 5: Verify Application Functionality
# Run smoke tests
# (Use your application's test suite)
# Check metrics
kubectl top pods -n veza-production
# Monitor error rates
# (Check monitoring dashboard)
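Where no dedicated smoke-test suite exists yet, a status-code sweep over a few endpoints is a reasonable stopgap. This is a sketch under assumptions: the helper name `smoke_test` is hypothetical, and it treats only HTTP 200 as a pass, which may not suit endpoints that require authentication.

```shell
# Hypothetical helper: request each path under BASE_URL and report any that
# do not answer with HTTP 200. Returns non-zero if at least one check fails.
smoke_test() {
  local base_url="$1" path status failed=0
  shift
  for path in "$@"; do
    # -w '%{http_code}' prints only the status code (000 if the request failed)
    status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$base_url$path")"
    if [ "$status" = "200" ]; then
      echo "ok   $path"
    else
      echo "FAIL $path (HTTP $status)"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `smoke_test https://api.veza.com /health /api/v1/tracks`.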
Multi-Service Rollback
If multiple services need rollback:
# Rollback backend API (handles chat since v0.502 merge)
kubectl rollout undo deployment/veza-backend-api -n veza-production
# Rollback frontend
kubectl rollout undo deployment/veza-frontend -n veza-production
# Rollback stream server (if media layer affected)
kubectl rollout undo deployment/veza-stream-server -n veza-production
# Monitor all rollbacks
kubectl rollout status deployment/veza-backend-api -n veza-production
kubectl rollout status deployment/veza-frontend -n veza-production
kubectl rollout status deployment/veza-stream-server -n veza-production
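The same sequence can be expressed as a loop so that a failed `undo` or a stalled rollout stops the procedure instead of scrolling past. A minimal sketch, assuming a hypothetical helper `rollback_all` and a 300 second wait per deployment (borrowed from the `kubectl wait` timeout used in Step 4):

```shell
# Hypothetical helper: undo each named deployment, then wait for every
# rollout to settle. Stops at the first failure so a partial rollback is noticed.
rollback_all() {
  local namespace="$1" deployment
  shift
  for deployment in "$@"; do
    kubectl rollout undo "deployment/$deployment" -n "$namespace" || return 1
  done
  for deployment in "$@"; do
    kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=300s || return 1
  done
}
```

Usage: `rollback_all veza-production veza-backend-api veza-frontend veza-stream-server`.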
Database Migration Rollback
If rollback includes database changes:
# 1. Stop application
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production
# 2. Rollback database migration
# (Use your migration tool)
# Example with the migrate CLI. $DATABASE_URL is expanded by your local
# shell, so it must be set before running this command.
kubectl run migrate-rollback --rm -it --image=veza-backend-api:previous \
  -n veza-production \
  --restart=Never \
  --env="DATABASE_URL=$DATABASE_URL" \
  -- migrate -path /migrations -database "$DATABASE_URL" down 1
# 3. Rollback application
kubectl rollout undo deployment/veza-backend-api -n veza-production
# 4. Restart application
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
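The restart step above hard-codes three replicas. If the deployment was running a different count, it may be safer to capture the value before scaling down. A sketch, assuming a hypothetical helper `scale_down_saving_replicas`:

```shell
# Hypothetical helper: scale a deployment to zero and print the replica
# count it had beforehand, so the same count can be restored afterwards.
scale_down_saving_replicas() {
  local deployment="$1" namespace="$2" replicas
  replicas="$(kubectl get deployment "$deployment" -n "$namespace" \
    -o jsonpath='{.spec.replicas}')" || return 1
  # kubectl's own output goes to stderr so stdout carries only the count
  kubectl scale deployment "$deployment" --replicas=0 -n "$namespace" >&2 || return 1
  echo "$replicas"
}
```

Usage: `saved="$(scale_down_saving_replicas veza-backend-api veza-production)"` before the migration rollback, then `kubectl scale deployment veza-backend-api --replicas="$saved" -n veza-production` to restart.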
Verification Checklist
- Previous version identified
- Rollback executed
- Pods are running and ready
- Health checks passing
- Application logs show no errors
- Critical endpoints responding
- Metrics normalized
- Users can access platform
- Monitoring alerts cleared
Troubleshooting
Rollback Fails
# Check deployment status
kubectl describe deployment veza-backend-api -n veza-production
# Check pod events
kubectl describe pod <pod-name> -n veza-production
# Check image availability
kubectl get pod <pod-name> -n veza-production -o jsonpath='{.spec.containers[0].image}'
# If image is missing, may need to rebuild or use different image
Pods Not Starting
# Check pod logs
kubectl logs <pod-name> -n veza-production
# Check resource constraints
kubectl describe pod <pod-name> -n veza-production | grep -E -A 5 "Limits|Requests"
# Check node resources
kubectl top nodes
Application Still Failing After Rollback
# Verify correct version is deployed
kubectl get deployment veza-backend-api -n veza-production -o jsonpath='{.spec.template.spec.containers[0].image}'
# Check if issue is in previous version too
kubectl logs <pod-name> -n veza-production
# May need to rollback further or investigate root cause
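The image check can also be turned into a pass/fail assertion, which is handy when the expected image reference is known from the release notes. A minimal sketch, assuming a hypothetical helper `deployed_image_is` that inspects only the first container in the pod template:

```shell
# Hypothetical helper: succeed only if the deployment's first container
# is configured with the expected image reference.
deployed_image_is() {
  local deployment="$1" namespace="$2" expected="$3" actual
  actual="$(kubectl get deployment "$deployment" -n "$namespace" \
    -o jsonpath='{.spec.template.spec.containers[0].image}')" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "expected $expected but found $actual" >&2
    return 1
  fi
}
```

Usage: `deployed_image_is veza-backend-api veza-production <expected-image>`, where `<expected-image>` is the full image reference of the version you intended to restore.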
Post-Rollback Tasks
- Investigate Root Cause
  - Review deployment logs
  - Check application logs
  - Identify what caused failure
- Fix Issue
  - Address root cause
  - Test fix in staging
  - Prepare new deployment
- Document Incident
  - Document rollback procedure
  - Note any issues encountered
  - Update deployment process if needed
- Notify Stakeholders
  - Send incident report
  - Update status page
  - Schedule post-mortem if needed
Prevention
To prevent future rollbacks:
- Automated Testing: Run full test suite before deployment
- Staged Rollouts: Use canary or blue-green deployments
- Health Checks: Comprehensive health check endpoints
- Monitoring: Real-time monitoring and alerting
- Gradual Rollout: Deploy to small percentage first