# Data Restore Runbook

This runbook describes the procedure for restoring data from backups after data loss or corruption.

## Prerequisites

- Access to backup storage
- Database credentials
- kubectl access to cluster
- Backup file identified

## Pre-Restore Checklist

- [ ] Backup file identified and verified
- [ ] Backup integrity checked
- [ ] Restore point confirmed
- [ ] Applications stopped (to prevent writes)
- [ ] Current data backed up (if possible)

## Restore Procedure

### Step 1: Identify Backup

```bash
# Confirm the backup volume claim exists and is bound
kubectl get pvc postgres-backup-storage -n veza-production

# List backups in storage
kubectl run backup-lister --rm -it --image=postgres:15-alpine \
  --restart=Never \
  --overrides='
{
  "spec": {
    "containers": [{
      "name": "backup-lister",
      "image": "postgres:15-alpine",
      "command": ["/bin/sh", "-c", "ls -lh /backups/postgres/"],
      "volumeMounts": [{
        "name": "backup-storage",
        "mountPath": "/backups"
      }]
    }],
    "volumes": [{
      "name": "backup-storage",
      "persistentVolumeClaim": {
        "claimName": "postgres-backup-storage"
      }
    }]
  }
}' \
  -n veza-production

# Or from S3
aws s3 ls s3://veza-backups/postgres/ --recursive | sort
```

### Step 2: Stop Applications

```bash
# Scale down applications to prevent writes
# (backend-api handles chat since the v0.502 merge; there is no separate chat-server deployment)
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production

# Verify pods are stopped
kubectl get pods -n veza-production -l app=veza-backend-api
```

### Step 3: Backup Current State (Optional)

```bash
# Create backup of current state before restore.
# Keep the job name in a variable: kubectl wait does not accept wildcards.
PRE_RESTORE_JOB="postgres-backup-pre-restore-$(date +%s)"
kubectl create job --from=cronjob/postgres-backup \
  "$PRE_RESTORE_JOB" \
  -n veza-production

# Wait for backup to complete
kubectl wait --for=condition=complete "job/$PRE_RESTORE_JOB" \
  -n veza-production \
  --timeout=600s
```

### Step 4: Restore Database

#### Full Database Restore

```bash
# Get database credentials.
# Assumes database-url carries the password as a password=... parameter;
# grep -P requires GNU grep.
DB_PASSWORD=$(kubectl get secret veza-secrets -n veza-production \
  -o jsonpath='{.data.database-url}' | \
  base64 -d | grep -oP 'password=\K[^&]+')

# Restore database (replace YYYYMMDD_HHMMSS with the backup chosen in Step 1)
kubectl run postgres-restore --rm -it --image=postgres:15-alpine \
  --restart=Never \
  --env="PGPASSWORD=$DB_PASSWORD" \
  --env="POSTGRES_HOST=postgres-service" \
  --env="POSTGRES_USER=veza_user" \
  --env="POSTGRES_DB=veza_db" \
  --overrides='
{
  "spec": {
    "containers": [{
      "name": "postgres-restore",
      "image": "postgres:15-alpine",
      "command": ["/bin/sh", "-c", "pg_restore -h $POSTGRES_HOST -U $POSTGRES_USER -d $POSTGRES_DB -F c --clean --if-exists --verbose /backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump"],
      "env": [
        {"name": "PGPASSWORD", "value": "'"$DB_PASSWORD"'"},
        {"name": "POSTGRES_HOST", "value": "postgres-service"},
        {"name": "POSTGRES_USER", "value": "veza_user"},
        {"name": "POSTGRES_DB", "value": "veza_db"}
      ],
      "volumeMounts": [{
        "name": "backup-storage",
        "mountPath": "/backups"
      }]
    }],
    "volumes": [{
      "name": "backup-storage",
      "persistentVolumeClaim": {
        "claimName": "postgres-backup-storage"
      }
    }]
  }
}' \
  -n veza-production
```

#### Restore from S3

```bash
# Download backup from S3
aws s3 cp s3://veza-backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump /tmp/backup.dump

# Restore by streaming the local file to pg_restore over stdin.
# (A hostPath mount of /tmp would expose the node's /tmp, not the directory
# the file was downloaded to.) Assumes postgres-pod accepts local connections
# for veza_user, as the verification commands in Step 5 do.
kubectl exec -i postgres-pod -n veza-production -- \
  pg_restore -U veza_user -d veza_db -F c --clean --if-exists \
  < /tmp/backup.dump
```

#### Point-in-Time Recovery

`pg_restore` has no `--recovery-target-time` option, and a logical `.dump` file cannot be replayed to a timestamp. Point-in-time recovery needs a physical base backup plus archived WAL; the commands below assume both exist and that the base backup has already been restored into `$PGDATA`.

```bash
# Restore to specific timestamp using WAL archives (PostgreSQL 12+).
# <wal-archive-path> is a placeholder for wherever WAL segments are archived.
cat >> "$PGDATA/postgresql.conf" <<'EOF'
restore_command = 'cp <wal-archive-path>/%f %p'
recovery_target_time = '2025-01-01 12:00:00'
EOF
touch "$PGDATA/recovery.signal"  # then start PostgreSQL to begin recovery
```

### Step 5: Verify Data Integrity

```bash
# Check table counts
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT 'users' as table_name, COUNT(*) as count FROM users
    UNION ALL
    SELECT 'tracks', COUNT(*) FROM tracks
    UNION ALL
    SELECT 'playlists', COUNT(*) FROM playlists;
  "

# Verify specific data
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT id, username, email, created_at
    FROM users
    ORDER BY created_at DESC
    LIMIT 10;
  "

# Check database size
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT pg_size_pretty(pg_database_size('veza_db'));
  "
```

### Step 6: Restart Applications

```bash
# Scale up applications (backend-api handles chat since v0.502)
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production

# Wait for pods to be ready
kubectl rollout status deployment/veza-backend-api -n veza-production
```

### Step 7: Verify Application Functionality

```bash
# Check application logs
kubectl logs -f deployment/veza-backend-api -n veza-production

# Test health endpoint
curl https://api.veza.com/health

# Test API endpoints
curl https://api.veza.com/api/v1/tracks
curl https://api.veza.com/api/v1/users/me

# Run smoke tests
# (Use your application's test suite)
```

## Partial Restore

### Restore Specific Tables

```bash
# Restore only specific tables
pg_restore -h postgres-service -U veza_user -d veza_db \
  -t users -t tracks \
  /backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump
```

### Restore Specific Schema

```bash
# Restore only specific schema
pg_restore -h postgres-service -U veza_user -d veza_db \
  -n public \
  /backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump
```

## Verification Checklist

- [ ] Backup file identified and verified
- [ ] Applications stopped
- [ ] Current state backed up (if possible)
- [ ] Database restored successfully
- [ ] Data integrity verified
- [ ] Applications restarted
- [ ] Health checks passing
- [ ] API endpoints responding
- [ ] Smoke tests passing
- [ ] Users can access platform

## Troubleshooting

### Restore Fails with Permission Error

```bash
# Check database user permissions
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U postgres -c "\du veza_user"

# Grant necessary permissions
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE veza_db TO veza_user;"
```

### Restore Fails with Connection Error

```bash
# Verify database is accessible
kubectl exec -it postgres-pod -n veza-production -- \
  pg_isready -U veza_user -d veza_db

# Check service endpoint (the restore commands connect to postgres-service)
kubectl get svc postgres-service -n veza-production

# Test connection
kubectl run test-connection --rm -it --image=postgres:15-alpine \
  --restart=Never \
  --env="PGPASSWORD=$DB_PASSWORD" \
  -n veza-production \
  -- psql -h postgres-service -U veza_user -d veza_db -c "SELECT 1;"
```

### Data Inconsistencies After Restore

```bash
# Compare record counts with expected values

# Check application logs for errors
kubectl logs -f deployment/veza-backend-api -n veza-production

# List foreign key constraints to confirm they were restored
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT conname, conrelid::regclass, confrelid::regclass
    FROM pg_constraint
    WHERE contype = 'f';
  "
```

## Post-Restore Tasks

1. **Monitor Platform**
   - Watch application logs
   - Monitor error rates
   - Check performance metrics

2. **Verify Data**
   - Run data integrity checks
   - Compare with expected values
   - Test critical user flows

3. **Document Incident**
   - Document restore procedure
   - Note any issues encountered
   - Update runbook if needed

4. **Investigate Root Cause**
   - Review logs and events
   - Identify what caused data loss
   - Implement prevention measures

## References

- [Backup Strategy](../backups/README.md)
- [PostgreSQL Restore Documentation](https://www.postgresql.org/docs/current/app-pgrestore.html)
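
## Appendix: Backup Pre-Check Sketch

The Pre-Restore Checklist asks for the backup file to be identified and its integrity checked, but the runbook gives no command for either. The following is a minimal sketch, not part of the existing tooling: the function names `is_custom_dump` and `latest_backup` are hypothetical, and `latest_backup` assumes the `veza_db_YYYYMMDD_HHMMSS.dump` naming seen in the restore commands. The header check relies on custom-format `pg_dump` archives beginning with the bytes `PGDMP`. It only catches empty or wrong-format files, so running `pg_restore --list` on the file remains the more thorough check.

```shell
#!/bin/sh
# Hypothetical helpers for the pre-restore checklist (illustrative names only).

# Return 0 if the file starts with the "PGDMP" magic bytes that pg_dump
# writes at the start of a custom-format (-F c) archive; 1 otherwise.
is_custom_dump() {
    [ -f "$1" ] || return 1
    [ "$(head -c 5 "$1" 2>/dev/null)" = "PGDMP" ]
}

# Read backup file names on stdin (one per line) and print the newest one.
# Assumes names follow veza_db_YYYYMMDD_HHMMSS.dump, so that lexical order
# equals chronological order; lines that do not match are ignored.
latest_backup() {
    grep -E '^veza_db_[0-9]{8}_[0-9]{6}\.dump$' | sort | tail -n 1
}
```

For example, `aws s3 ls s3://veza-backups/postgres/ | awk '{print $4}' | latest_backup` would print the newest matching dump name, assuming the dumps sit directly under that prefix. After downloading it, `is_custom_dump /tmp/backup.dump || echo "not a custom-format dump"` gives a quick sanity check before any restore is attempted.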