Application Rollback Runbook

This runbook describes the procedure for rolling back a failed application deployment.

Prerequisites

  • Access to the Kubernetes cluster
  • kubectl configured for the production context
  • A previous deployment revision available in the rollout history

Detection

Automatic Detection

Health checks will automatically detect:

  • Application crashes
  • High error rates
  • Slow response times
  • Failed readiness probes
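These signals come from the probes defined on the deployment. A quick read-only way to see what is actually configured (a sketch; the helper name is hypothetical and it assumes the app runs in the first container of the pod spec):

```shell
# show_probes DEPLOY: print the liveness/readiness probes configured on DEPLOY.
# Assumes the app is the first container; adjust the index otherwise.
show_probes() {
  kubectl get deployment "$1" -n veza-production -o \
    jsonpath='{.spec.template.spec.containers[0].livenessProbe}{"\n"}{.spec.template.spec.containers[0].readinessProbe}{"\n"}'
}
# Usage: show_probes veza-backend-api
```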

Manual Detection

# Check pod status
kubectl get pods -n veza-production -l app=veza-backend-api

# Check deployment status
kubectl rollout status deployment/veza-backend-api -n veza-production

# Check application logs
kubectl logs -f deployment/veza-backend-api -n veza-production

# Check metrics
kubectl top pods -n veza-production

Rollback Procedure

Step 1: Verify Issue

# Check current deployment
kubectl get deployment veza-backend-api -n veza-production -o yaml

# Check recent events
kubectl get events -n veza-production --sort-by='.lastTimestamp' | tail -20

# Verify health endpoint
curl https://api.veza.com/health
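The health check above can be wrapped in a small helper that fails loudly on anything other than HTTP 200 (a sketch; the helper name, URL, and 5-second timeout are assumptions):

```shell
# check_health URL: succeed only if URL answers HTTP 200 within 5 seconds.
check_health() {
  local url="$1" code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")" || code=000
  if [ "$code" = "200" ]; then
    echo "healthy ($code)"
  else
    echo "unhealthy ($code)"
    return 1
  fi
}
# Usage: check_health https://api.veza.com/health
```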

Step 2: Check Rollback History

# View deployment history
kubectl rollout history deployment/veza-backend-api -n veza-production

# View details of previous revision
kubectl rollout history deployment/veza-backend-api -n veza-production --revision=<N>

Step 3: Execute Rollback

Option A: Rollback to Previous Version

# Rollback to previous version
kubectl rollout undo deployment/veza-backend-api -n veza-production

# Monitor rollback progress
kubectl rollout status deployment/veza-backend-api -n veza-production

Option B: Rollback to Specific Revision

# Rollback to specific revision
kubectl rollout undo deployment/veza-backend-api -n veza-production --to-revision=<N>

# Monitor rollback progress
kubectl rollout status deployment/veza-backend-api -n veza-production

Step 4: Verify Rollback

# Check pod status
kubectl get pods -n veza-production -l app=veza-backend-api

# Check deployment status
kubectl get deployment veza-backend-api -n veza-production

# Verify pods are ready
kubectl wait --for=condition=ready pod \
  -l app=veza-backend-api \
  -n veza-production \
  --timeout=300s

# Check application logs
kubectl logs -f deployment/veza-backend-api -n veza-production

# Test health endpoint
curl https://api.veza.com/health

# Test critical endpoints
curl https://api.veza.com/api/v1/tracks

Step 5: Verify Application Functionality

# Run smoke tests
# (Use your application's test suite)

# Check metrics
kubectl top pods -n veza-production

# Monitor error rates
# (Check monitoring dashboard)
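The smoke-test placeholder above can be sketched as a short script; the endpoint list and base URL are illustrative only, not a substitute for the application's real test suite:

```shell
# smoke_test BASE_URL: hit a few critical endpoints and report failures.
smoke_test() {
  local base="$1" failed=0 path code
  for path in /health /api/v1/tracks; do
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$base$path")" || code=000
    echo "$path -> $code"
    [ "$code" = "200" ] || failed=1
  done
  return "$failed"
}
# Usage: smoke_test https://api.veza.com
```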

Multi-Service Rollback

If multiple services need to be rolled back:

# Rollback backend API (handles chat since v0.502 merge)
kubectl rollout undo deployment/veza-backend-api -n veza-production

# Rollback frontend
kubectl rollout undo deployment/veza-frontend -n veza-production

# Rollback stream server (if media layer affected)
kubectl rollout undo deployment/veza-stream-server -n veza-production

# Monitor all rollbacks
kubectl rollout status deployment/veza-backend-api -n veza-production
kubectl rollout status deployment/veza-frontend -n veza-production
kubectl rollout status deployment/veza-stream-server -n veza-production
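The per-service pairs above can be collapsed into one loop. This sketch defaults to a dry run that only prints the commands; set DRY_RUN=0 to execute for real:

```shell
# Roll back and watch several deployments in one pass.
# Dry-run by default: each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

for d in veza-backend-api veza-frontend veza-stream-server; do
  run kubectl rollout undo "deployment/$d" -n veza-production
  run kubectl rollout status "deployment/$d" -n veza-production --timeout=300s
done
```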

Database Migration Rollback

If the rollback includes database changes:

# 1. Stop application
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production

# 2. Rollback database migration
# (Use your migration tool)
# Example with migrate tool:
kubectl run migrate-rollback --rm -it --image=veza-backend-api:previous \
  --restart=Never \
  --env="DATABASE_URL=$DATABASE_URL" \
  -- migrate -path /migrations -database "$DATABASE_URL" down 1

# 3. Rollback application
kubectl rollout undo deployment/veza-backend-api -n veza-production

# 4. Restart application
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
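Steps 1–4 above can be sequenced in one guarded script so that a failed step stops the procedure instead of leaving the app half rolled back. Dry-run by default; export DATABASE_URL and set DRY_RUN=0 for a real run (image tag and migration tool as in the example above):

```shell
# Migration rollback, steps 1-4, with a guard between each step.
# Dry-run by default: set DRY_RUN=0 (and export DATABASE_URL) to execute.
DRY_RUN="${DRY_RUN:-1}"
DATABASE_URL="${DATABASE_URL:-}"
step() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"
  else "$@" || { echo "FAILED: $*" >&2; exit 1; }
  fi
}

step kubectl scale deployment veza-backend-api --replicas=0 -n veza-production
step kubectl run migrate-rollback --rm -i --image=veza-backend-api:previous \
  --restart=Never --env="DATABASE_URL=$DATABASE_URL" \
  -- migrate -path /migrations -database "$DATABASE_URL" down 1
step kubectl rollout undo deployment/veza-backend-api -n veza-production
step kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
```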

Verification Checklist

  • Previous version identified
  • Rollback executed
  • Pods are running and ready
  • Health checks passing
  • Application logs show no errors
  • Critical endpoints responding
  • Metrics normalized
  • Users can access platform
  • Monitoring alerts cleared

Troubleshooting

Rollback Fails

# Check deployment status
kubectl describe deployment veza-backend-api -n veza-production

# Check pod events
kubectl describe pod <pod-name> -n veza-production

# Check image availability
kubectl get pod <pod-name> -n veza-production -o jsonpath='{.spec.containers[0].image}'

# If the image is missing, you may need to rebuild it or point the deployment at a different tag

Pods Not Starting

# Check pod logs
kubectl logs <pod-name> -n veza-production

# Check resource constraints
kubectl describe pod <pod-name> -n veza-production | grep -A 5 "Limits\|Requests"

# Check node resources
kubectl top nodes

Application Still Failing After Rollback

# Verify correct version is deployed
kubectl get deployment veza-backend-api -n veza-production -o jsonpath='{.spec.template.spec.containers[0].image}'

# Check if issue is in previous version too
kubectl logs <pod-name> -n veza-production

# You may need to roll back further (--to-revision=<N>) or investigate the root cause

Post-Rollback Tasks

  1. Investigate Root Cause

    • Review deployment logs
    • Check application logs
    • Identify what caused failure
  2. Fix Issue

    • Address root cause
    • Test fix in staging
    • Prepare new deployment
  3. Document Incident

    • Document rollback procedure
    • Note any issues encountered
    • Update deployment process if needed
  4. Notify Stakeholders

    • Send incident report
    • Update status page
    • Schedule post-mortem if needed

Prevention

To reduce the need for future rollbacks:

  • Automated Testing: Run full test suite before deployment
  • Staged Rollouts: Use canary or blue-green deployments
  • Health Checks: Comprehensive health check endpoints
  • Monitoring: Real-time monitoring and alerting
  • Gradual Rollout: Deploy to small percentage first
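For the staged/gradual-rollout bullets, one low-tech sketch with plain kubectl is pause/observe/resume: pausing shortly after the image change freezes the rollout part-way so the new pods can be watched before continuing. The container name (api) and tag are assumptions, and the sketch prints commands only by default (set DRY_RUN=0 to execute):

```shell
# Staged rollout via pause/resume: freeze the rollout part-way, observe the
# new pods, then resume. Dry-run by default: commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run kubectl set image deployment/veza-backend-api api=veza-backend-api:v1.0.4 -n veza-production
run kubectl rollout pause deployment/veza-backend-api -n veza-production
# ...observe error rates and logs for the new pods before continuing...
run kubectl rollout resume deployment/veza-backend-api -n veza-production
```

A service mesh or a dedicated tool (Argo Rollouts, Flagger) gives finer-grained traffic splitting than this pause-based approach.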

References