# Database Backup Configuration
This directory contains Kubernetes CronJobs for automated database backups with retention policies.
## Components
### PostgreSQL Backup
- **Schedule**: Daily at 3:00 AM (in the cluster's CronJob time zone, usually UTC unless `timeZone` is set)
- **Format**: PostgreSQL custom format (compressed)
- **Retention**: 30 days (configurable)
- **Storage**: 100Gi PVC
### Redis Backup
- **Schedule**: Daily at 3:30 AM (in the cluster's CronJob time zone, usually UTC unless `timeZone` is set)
- **Format**: RDB file
- **Retention**: 30 days (configurable)
- **Storage**: 20Gi PVC
## Prerequisites
### Secrets Required
The backup jobs read the following keys from the `veza-secrets` Secret:
```yaml
# PostgreSQL
postgres-host: "postgres-service-name"
postgres-user: "postgres_user"
postgres-password: "postgres_password"
postgres-db: "veza_db"
# Redis (optional password)
redis-host: "redis-service-name"
redis-password: "redis_password" # Optional
# S3 Backup (optional)
s3-backup-bucket: "veza-backups"
aws-access-key-id: "AWS_ACCESS_KEY"
aws-secret-access-key: "AWS_SECRET_KEY"
```
### Create Secrets
```bash
kubectl create secret generic veza-secrets \
--from-literal=postgres-host=postgres \
--from-literal=postgres-user=veza_user \
--from-literal=postgres-password=your_password \
--from-literal=postgres-db=veza_db \
--from-literal=redis-host=redis \
--from-literal=redis-password=your_redis_password \
-n veza-production
```
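After creating the secret, it can help to confirm that every key the jobs expect is actually present. The helper below is a minimal sketch: `missing_secret_keys` is a hypothetical name, and the required list is taken from the section above with the optional Redis password and S3 keys left out.

```bash
# Hypothetical helper: reads secret key names on stdin (one per line) and
# prints every required key that is absent. Returns non-zero if any is missing.
missing_secret_keys() {
  local present key status=0
  present=$(cat)
  for key in postgres-host postgres-user postgres-password postgres-db redis-host; do
    if ! printf '%s\n' "$present" | grep -qxF -- "$key"; then
      echo "missing: $key"
      status=1
    fi
  done
  return "$status"
}

# Usage against a live cluster (lists only the key names, not the values):
#   kubectl get secret veza-secrets -n veza-production \
#     -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' | missing_secret_keys
```

The check only looks at key names, so secret values never reach the terminal.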
## Deployment
### 1. Deploy PostgreSQL Backup
```bash
kubectl apply -f k8s/backups/postgres-backup-cronjob.yaml
```
### 2. Deploy Redis Backup
```bash
kubectl apply -f k8s/backups/redis-backup-cronjob.yaml
```
## Verification
### Check CronJob Status
```bash
# List all cronjobs
kubectl get cronjobs -n veza-production
# Check PostgreSQL backup cronjob
kubectl get cronjob postgres-backup -n veza-production
# Check Redis backup cronjob
kubectl get cronjob redis-backup -n veza-production
```
### Check Backup Jobs
```bash
# List recent jobs
kubectl get jobs -n veza-production -l app=postgres-backup
# View job logs
kubectl logs -l app=postgres-backup -n veza-production --tail=100
# Check Redis backup jobs
kubectl get jobs -n veza-production -l app=redis-backup
kubectl logs -l app=redis-backup -n veza-production --tail=100
```
### Verify Backups
```bash
# Create a test pod to access backup storage
kubectl run backup-checker --rm -it --image=postgres:15-alpine \
  --restart=Never \
  --overrides='
{
  "spec": {
    "containers": [{
      "name": "backup-checker",
      "image": "postgres:15-alpine",
      "command": ["/bin/sh"],
      "stdin": true,
      "tty": true,
      "volumeMounts": [{
        "name": "backup-storage",
        "mountPath": "/backups"
      }]
    }],
    "volumes": [{
      "name": "backup-storage",
      "persistentVolumeClaim": {
        "claimName": "postgres-backup-storage"
      }
    }]
  }
}' \
  -n veza-production
# Inside the pod, list backups
ls -lh /backups/postgres/
```
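Listing the files shows that backups exist, but not that they are readable. As a rough sketch, the function below runs `pg_restore --list` on each dump to check that its table of contents can be parsed. `verify_dumps` is a hypothetical helper meant to be pasted into the checker pod's shell. A passing check does not guarantee a full restore will succeed.

```bash
# Hypothetical helper: checks that each custom-format dump has a readable
# table of contents. pg_restore --list only reads the archive file; it does
# not connect to a database or restore anything.
verify_dumps() {
  local dir="$1" failed=0 f
  for f in "$dir"/*.dump; do
    [ -e "$f" ] || { echo "no dumps found in $dir"; return 1; }
    if pg_restore --list "$f" > /dev/null 2>&1; then
      echo "OK   $f"
    else
      echo "FAIL $f"
      failed=1
    fi
  done
  return "$failed"
}

# Inside the backup-checker pod:
#   verify_dumps /backups/postgres
```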
## Manual Backup
### Trigger PostgreSQL Backup Manually
```bash
kubectl create job --from=cronjob/postgres-backup postgres-backup-manual-$(date +%s) -n veza-production
```
### Trigger Redis Backup Manually
```bash
kubectl create job --from=cronjob/redis-backup redis-backup-manual-$(date +%s) -n veza-production
```
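A manually triggered job runs in the background, so it is often useful to wait for it and look at its output. The wrapper below is a sketch under assumptions: `run_manual_backup` is a hypothetical name and the 30-minute timeout is an arbitrary placeholder to adjust to your backup duration.

```bash
# Hypothetical helper: creates a one-off Job from a CronJob, waits for it to
# complete, then prints the tail of its logs.
run_manual_backup() {
  local cronjob="$1" namespace="${2:-veza-production}"
  local job="${cronjob}-manual-$(date +%s)"
  kubectl create job --from="cronjob/${cronjob}" "$job" -n "$namespace" || return 1
  # Blocks until the Job reports the Complete condition or the timeout expires
  kubectl wait --for=condition=complete "job/${job}" -n "$namespace" --timeout=30m || return 1
  kubectl logs "job/${job}" -n "$namespace" --tail=20
}

# Usage:
#   run_manual_backup postgres-backup
#   run_manual_backup redis-backup
```

Note that if the Job fails, `kubectl wait` keeps waiting until the timeout, because the `complete` condition never becomes true.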
## Restore from Backup
### Restore PostgreSQL Backup
```bash
# Create a restore pod (replace the placeholder password, host, user, and database)
kubectl run postgres-restore --rm -it --image=postgres:15-alpine \
  --restart=Never \
  --overrides='
{
  "spec": {
    "containers": [{
      "name": "postgres-restore",
      "image": "postgres:15-alpine",
      "command": ["/bin/sh"],
      "stdin": true,
      "tty": true,
      "env": [
        {"name": "PGPASSWORD", "value": "your_password"},
        {"name": "POSTGRES_HOST", "value": "postgres-service"},
        {"name": "POSTGRES_USER", "value": "veza_user"},
        {"name": "POSTGRES_DB", "value": "veza_db"}
      ],
      "volumeMounts": [{
        "name": "backup-storage",
        "mountPath": "/backups"
      }]
    }],
    "volumes": [{
      "name": "backup-storage",
      "persistentVolumeClaim": {
        "claimName": "postgres-backup-storage"
      }
    }]
  }
}' \
  -n veza-production
# Inside the pod, restore the backup (replace the timestamp with a real file name).
# Add --clean --if-exists to drop existing objects before recreating them.
pg_restore -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -d "$POSTGRES_DB" -F c /backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump
```
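When the goal is to restore the most recent backup, the file name can be picked automatically. This sketch relies on the `veza_db_YYYYMMDD_HHMMSS.dump` naming shown above, where lexical order matches chronological order. `latest_dump` is a hypothetical helper, not part of the backup jobs.

```bash
# Hypothetical helper: prints the newest dump in a directory, based on the
# timestamp embedded in the file name. Returns non-zero if none is found.
latest_dump() {
  local dir="$1" f latest=""
  for f in "$dir"/veza_db_*.dump; do
    # Glob results are sorted, so the last existing match is the newest
    [ -e "$f" ] && latest="$f"
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Inside the restore pod:
#   pg_restore -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -d "$POSTGRES_DB" -F c \
#     "$(latest_dump /backups/postgres)"
```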
### Restore Redis Backup
```bash
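# Copy the backup file into the Redis pod
kubectl cp <backup-file> <redis-pod>:/data/dump.rdb -n veza-production
# Restart Redis to load the backup.
# This assumes /data is on a persistent volume. Redis may also overwrite
# dump.rdb with its own snapshot on shutdown if RDB save points are enabled.
kubectl delete pod <redis-pod> -n veza-production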
```
## Configuration
### Change Backup Schedule
Edit the `schedule` field in the CronJob manifest:
```yaml
spec:
  schedule: "0 3 * * *" # Cron format: minute hour day month weekday
```
Examples:
- `"0 3 * * *"` - Daily at 3:00 AM
- `"0 */6 * * *"` - Every 6 hours
- `"0 2 * * 0"` - Weekly on Sunday at 2:00 AM
### Change Retention Period
Set the `BACKUP_RETENTION_DAYS` environment variable:
```yaml
env:
  - name: BACKUP_RETENTION_DAYS
    value: "60" # Keep backups for 60 days
```
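The CronJob manifests are not reproduced in this README, so how they consume `BACKUP_RETENTION_DAYS` is an assumption. A cleanup step of roughly the following shape would match the `find ... -mtime` cleanup used under Troubleshooting. `prune_old_backups` is a hypothetical name.

```bash
# Sketch only: deletes dumps older than BACKUP_RETENTION_DAYS (default 30)
# and prints each file it removes.
prune_old_backups() {
  local dir="$1" days="${BACKUP_RETENTION_DAYS:-30}"
  # -mtime +N matches files whose age in whole days is greater than N
  find "$dir" -type f -name '*.dump' -mtime +"$days" -print -delete
}

# Usage:
#   BACKUP_RETENTION_DAYS=60 prune_old_backups /backups/postgres
```

Because `-mtime` counts whole days, a retention of 30 removes files that are at least 31 days old.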
### Enable S3 Upload
Add S3 credentials to secrets:
```bash
kubectl create secret generic veza-secrets \
--from-literal=s3-backup-bucket=veza-backups \
--from-literal=aws-access-key-id=YOUR_KEY \
--from-literal=aws-secret-access-key=YOUR_SECRET \
-n veza-production \
--dry-run=client -o yaml | kubectl apply -f -
```
## Monitoring
### Check Backup Success Rate
```bash
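# Count successful jobs still present in the cluster (no time filter is applied;
# a CronJob keeps only its last 3 successful jobs unless successfulJobsHistoryLimit is raised)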
kubectl get jobs -n veza-production -l app=postgres-backup \
--field-selector status.successful=1 \
-o json | jq '.items | length'
```
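A count of successes is more useful next to the total number of retained jobs. The sketch below summarizes both without `jq`; `summarize_jobs` is a hypothetical helper that expects one `name<TAB>succeeded` line per Job, as produced by the `jsonpath` expression in the usage comment.

```bash
# Hypothetical helper: reads "name<TAB>succeeded" lines on stdin and prints
# how many Jobs succeeded out of the total. Jobs that have not succeeded have
# an empty second field.
summarize_jobs() {
  awk -F '\t' '
    NF { total++; if ($2 == 1) ok++ }
    END { printf "%d/%d succeeded\n", ok, total }
  '
}

# Usage:
#   kubectl get jobs -n veza-production -l app=postgres-backup \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.succeeded}{"\n"}{end}' \
#     | summarize_jobs
```

The result only covers Jobs the cluster still retains, so it reflects the CronJob history limits rather than a fixed time window.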
### Monitor Backup Sizes
Backup sizes are logged in job output. Check logs to monitor size trends.
### Set Up Alerts
Configure Prometheus alerts for:
- Failed backup jobs
- Backup size anomalies
- Storage capacity warnings
## Troubleshooting
### Backup Job Fails
1. Check job logs:
```bash
kubectl logs <job-name> -n veza-production
```
2. Verify secrets are correct (values in the output are base64-encoded):
```bash
kubectl get secret veza-secrets -n veza-production -o yaml
```
3. Test database connectivity:
```bash
kubectl run test-db-connection --rm -it --image=postgres:15-alpine \
--restart=Never \
--env="PGPASSWORD=your_password" \
-- psql -h postgres-service -U veza_user -d veza_db -c "SELECT 1"
```
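For step 2, the YAML output shows base64-encoded values, which are hard to compare by eye. A small sketch for decoding a single key follows; `secret_value` is a hypothetical helper, and it prints the secret in clear text, so avoid it in shared or logged terminals.

```bash
# Hypothetical helper: prints the decoded value of one key in veza-secrets.
secret_value() {
  local key="$1" namespace="${2:-veza-production}"
  kubectl get secret veza-secrets -n "$namespace" \
    -o "jsonpath={.data.${key}}" | base64 -d
}

# Usage:
#   secret_value postgres-host
#   secret_value postgres-db
```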
### Storage Full
1. Check PVC capacity and status (`describe` shows the provisioned size, not the used space):
```bash
kubectl describe pvc postgres-backup-storage -n veza-production
```
2. Manually clean up old backups:
```bash
kubectl run cleanup --rm -it --image=postgres:15-alpine \
--restart=Never \
--overrides='{"spec":{"containers":[{"name":"cleanup","image":"postgres:15-alpine","command":["/bin/sh","-c","find /backups -name \"*.dump\" -mtime +30 -delete"],"volumeMounts":[{"name":"backup-storage","mountPath":"/backups"}],"stdin":true,"tty":true}],"volumes":[{"name":"backup-storage","persistentVolumeClaim":{"claimName":"postgres-backup-storage"}}]}}' \
-n veza-production
```
3. Increase PVC size if needed (requires a StorageClass with `allowVolumeExpansion: true`)
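
To see how full the volume actually is, used space has to be measured from inside a pod that mounts the PVC. The sketch below assumes such a pod (for example the `backup-checker` pod from the Verification section). `usage_percent`, the 80% threshold, and the `200Gi` target size are illustrative assumptions.

```bash
# Hypothetical helper: prints the used percentage of the filesystem that
# holds the given path, as a bare integer.
usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Inside a pod with the PVC mounted at /backups:
#   [ "$(usage_percent /backups)" -lt 80 ] || echo "backup volume is over 80% full"
#
# To request a larger volume (example size only):
#   kubectl patch pvc postgres-backup-storage -n veza-production \
#     -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
```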