Database Backup Configuration

This directory contains Kubernetes CronJobs for automated database backups with retention policies.

Components

PostgreSQL Backup

  • Schedule: Daily at 3:00 AM
  • Format: PostgreSQL custom format (compressed)
  • Retention: 30 days (configurable)
  • Storage: 100Gi PVC

Redis Backup

  • Schedule: Daily at 3:30 AM
  • Format: RDB file
  • Retention: 30 days (configurable)
  • Storage: 20Gi PVC

Prerequisites

Secrets Required

The backup jobs read the following keys from the veza-secrets Secret:

# PostgreSQL
postgres-host: "postgres-service-name"
postgres-user: "postgres_user"
postgres-password: "postgres_password"
postgres-db: "veza_db"

# Redis (optional password)
redis-host: "redis-service-name"
redis-password: "redis_password"  # Optional

# S3 Backup (optional)
s3-backup-bucket: "veza-backups"
aws-access-key-id: "AWS_ACCESS_KEY"
aws-secret-access-key: "AWS_SECRET_KEY"

Create Secrets

kubectl create secret generic veza-secrets \
  --from-literal=postgres-host=postgres \
  --from-literal=postgres-user=veza_user \
  --from-literal=postgres-password=your_password \
  --from-literal=postgres-db=veza_db \
  --from-literal=redis-host=redis \
  --from-literal=redis-password=your_redis_password \
  -n veza-production
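
Before deploying, it can help to confirm that every required key is present in the Secret. The following is a minimal sketch, assuming kubectl already points at the right cluster; missing_secret_keys is a hypothetical helper name and is not part of this repository.

```shell
# Hypothetical helper: print each requested key that is missing (or empty) in veza-secrets.
# Usage: missing_secret_keys <namespace> <key>...
missing_secret_keys() {
  ns="$1"
  shift
  for key in "$@"; do
    # jsonpath prints nothing when the key is absent; the value itself stays base64-encoded.
    value="$(kubectl get secret veza-secrets -n "$ns" -o "jsonpath={.data.$key}" 2>/dev/null)"
    [ -n "$value" ] || echo "$key"
  done
}
```

For example, `missing_secret_keys veza-production postgres-host postgres-user postgres-password postgres-db redis-host` prints nothing when all five keys exist.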

Deployment

1. Deploy PostgreSQL Backup

kubectl apply -f k8s/backups/postgres-backup-cronjob.yaml

2. Deploy Redis Backup

kubectl apply -f k8s/backups/redis-backup-cronjob.yaml

Verification

Check CronJob Status

# List all cronjobs
kubectl get cronjobs -n veza-production

# Check PostgreSQL backup cronjob
kubectl get cronjob postgres-backup -n veza-production

# Check Redis backup cronjob
kubectl get cronjob redis-backup -n veza-production

Check Backup Jobs

# List recent jobs
kubectl get jobs -n veza-production -l app=postgres-backup

# View job logs
kubectl logs -l app=postgres-backup -n veza-production --tail=100

# Check Redis backup jobs
kubectl get jobs -n veza-production -l app=redis-backup
kubectl logs -l app=redis-backup -n veza-production --tail=100

Verify Backups

# Create a test pod to access backup storage
kubectl run backup-checker --rm -it --image=postgres:15-alpine \
  --restart=Never \
  --overrides='
{
  "spec": {
    "containers": [{
      "name": "backup-checker",
      "image": "postgres:15-alpine",
      "command": ["/bin/sh"],
      "stdin": true,
      "tty": true,
      "volumeMounts": [{
        "name": "backup-storage",
        "mountPath": "/backups"
      }]
    }],
    "volumes": [{
      "name": "backup-storage",
      "persistentVolumeClaim": {
        "claimName": "postgres-backup-storage"
      }
    }]
  }
}' \
  -n veza-production

# Inside the pod, list backups
ls -lh /backups/postgres/
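
Listing the directory shows that files exist, but not whether the most recent one is usable. A small sketch of a sanity check that could be pasted into the checker pod's shell; check_latest_backup is a hypothetical helper, and it assumes the timestamped file naming used in the restore section (veza_db_YYYYMMDD_HHMMSS.dump).

```shell
# Hypothetical helper: print the newest *.dump file in a directory.
# Fails if there is none or if the newest one is empty.
check_latest_backup() {
  dir="$1"
  # Timestamped names (veza_db_YYYYMMDD_HHMMSS.dump) sort chronologically as plain text.
  latest="$(ls -1 "$dir"/*.dump 2>/dev/null | sort | tail -n 1)"
  if [ -z "$latest" ]; then
    echo "no backups found in $dir" >&2
    return 1
  fi
  if [ ! -s "$latest" ]; then
    echo "latest backup is empty: $latest" >&2
    return 1
  fi
  echo "$latest"
}
```

Inside the pod this would be called as `check_latest_backup /backups/postgres`.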

Manual Backup

Trigger PostgreSQL Backup Manually

kubectl create job --from=cronjob/postgres-backup postgres-backup-manual-$(date +%s) -n veza-production

Trigger Redis Backup Manually

kubectl create job --from=cronjob/redis-backup redis-backup-manual-$(date +%s) -n veza-production
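
A manually created Job runs in the background, so it is easy to miss a failure. The sketch below wraps the same command and waits for the result; run_manual_backup is a hypothetical helper, and the 30 minute timeout is an assumed value that should be adjusted to the size of the database.

```shell
# Hypothetical helper: start a one-off Job from a CronJob, wait for it, then show its log tail.
# Usage: run_manual_backup <cronjob-name> <namespace>
run_manual_backup() {
  cronjob="$1"
  ns="$2"
  job="${cronjob}-manual-$(date +%s)"
  kubectl create job --from="cronjob/${cronjob}" "$job" -n "$ns" || return 1
  # Blocks until the Job reports Complete. A failed Job never does, so this
  # returns non-zero only once the timeout expires.
  kubectl wait --for=condition=complete "job/${job}" -n "$ns" --timeout=30m || return 1
  kubectl logs "job/${job}" -n "$ns" --tail=20
}
```

Usage: `run_manual_backup postgres-backup veza-production`.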

Restore from Backup

Restore PostgreSQL Backup

# Create a restore pod
kubectl run postgres-restore --rm -it --image=postgres:15-alpine \
  --restart=Never \
  --overrides='
{
  "spec": {
    "containers": [{
      "name": "postgres-restore",
      "image": "postgres:15-alpine",
      "command": ["/bin/sh"],
      "stdin": true,
      "tty": true,
      "env": [
        {"name": "PGPASSWORD", "value": "your_password"},
        {"name": "POSTGRES_HOST", "value": "postgres-service"},
        {"name": "POSTGRES_USER", "value": "veza_user"},
        {"name": "POSTGRES_DB", "value": "veza_db"}
      ],
      "volumeMounts": [{
        "name": "backup-storage",
        "mountPath": "/backups"
      }]
    }],
    "volumes": [{
      "name": "backup-storage",
      "persistentVolumeClaim": {
        "claimName": "postgres-backup-storage"
      }
    }]
  }
}' \
  -n veza-production

# Inside the pod, restore the backup (replace the timestamp with a real file name)
pg_restore -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -d "$POSTGRES_DB" -F c /backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump
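
It may be worth validating the archive before touching the target database. The sketch below is one way to do that, not necessarily how this project restores in practice: restore_dump is a hypothetical helper, and the use of --clean --if-exists is an assumption that only suits restores where existing objects may be dropped and recreated.

```shell
# Hypothetical helper: validate a custom-format dump, then restore it.
# Expects POSTGRES_HOST, POSTGRES_USER, POSTGRES_DB and PGPASSWORD in the environment.
restore_dump() {
  dump="$1"
  # Listing the archive's table of contents fails on a truncated or corrupt file.
  if ! pg_restore --list "$dump" > /dev/null; then
    echo "unreadable archive: $dump" >&2
    return 1
  fi
  # Assumption: --clean --if-exists is acceptable, i.e. existing objects may be
  # dropped and recreated. Remove both flags when restoring into an empty database.
  pg_restore -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -d "$POSTGRES_DB" \
    --clean --if-exists "$dump"
}
```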

Restore Redis Backup

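# Copy backup file to the Redis pod (/data must be on a persistent volume to survive the restart)
kubectl cp <backup-file> <redis-pod>:/data/dump.rdb -n veza-production

# Restart Redis to load the backup.
# Caution: if RDB save points are configured, Redis writes a fresh dump.rdb on shutdown
# and overwrites the copied file; if AOF is enabled, Redis loads the AOF instead.
kubectl delete pod <redis-pod> -n veza-production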

Configuration

Change Backup Schedule

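Edit the schedule field in the CronJob manifest. Times are interpreted in the kube-controller-manager's time zone (usually UTC) unless spec.timeZone is set: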

spec:
  schedule: "0 3 * * *"  # Cron format: minute hour day month weekday

Examples:

  • "0 3 * * *" - Daily at 3:00 AM
  • "0 */6 * * *" - Every 6 hours
  • "0 2 * * 0" - Weekly on Sunday at 2:00 AM
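
For a quick experiment, the schedule of a deployed CronJob can also be patched in place. This is a sketch only: set_backup_schedule is a hypothetical helper, its five-field check deliberately rejects macros such as @daily, and a live patch is overwritten the next time the manifest is applied, so the manifest remains the source of truth.

```shell
# Hypothetical helper: change a CronJob's schedule in place.
# Usage: set_backup_schedule <cronjob-name> <namespace> "<cron expression>"
set_backup_schedule() {
  cronjob="$1"
  ns="$2"
  schedule="$3"
  # A standard cron expression has exactly five whitespace-separated fields.
  fields="$(printf '%s\n' "$schedule" | awk '{ print NF }')"
  if [ "$fields" -ne 5 ]; then
    echo "expected 5 cron fields, got $fields: $schedule" >&2
    return 1
  fi
  kubectl patch cronjob "$cronjob" -n "$ns" --type merge \
    -p "{\"spec\":{\"schedule\":\"$schedule\"}}"
}
```

Usage: `set_backup_schedule postgres-backup veza-production "0 */6 * * *"`.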

Change Retention Period

Set the BACKUP_RETENTION_DAYS environment variable on the backup container in the CronJob manifest:

env:
- name: BACKUP_RETENTION_DAYS
  value: "60"  # Keep backups for 60 days
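
To illustrate what a retention setting like this typically does, here is a sketch of age-based pruning with find. It is modeled on the manual cleanup command in the Troubleshooting section and is not copied from the CronJob's actual script; prune_backups is a hypothetical helper name.

```shell
# Hypothetical helper: delete *.dump files older than the retention period and print them.
# Usage: prune_backups <directory> [days]
prune_backups() {
  dir="$1"
  # Fall back to BACKUP_RETENTION_DAYS, then to 30 days.
  days="${2:-${BACKUP_RETENTION_DAYS:-30}}"
  # -mtime +N counts whole 24-hour periods, so +30 matches files at least 31 days old.
  find "$dir" -type f -name '*.dump' -mtime +"$days" -print -delete
}
```

Running it with `-delete` removed first gives a dry run that only lists the files that would go.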

Enable S3 Upload

Add the S3 credentials to the existing veza-secrets Secret. Patching adds the new keys without touching the PostgreSQL and Redis entries:

kubectl patch secret veza-secrets -n veza-production --type merge \
  -p '{"stringData":{"s3-backup-bucket":"veza-backups","aws-access-key-id":"YOUR_KEY","aws-secret-access-key":"YOUR_SECRET"}}'

Monitoring

Check Backup Success Rate

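# Count successful jobs completed in the last 7 days
# (only jobs still retained by successfulJobsHistoryLimit are visible)
kubectl get jobs -n veza-production -l app=postgres-backup \
  --field-selector status.successful=1 \
  -o json | jq '[.items[] | select((.status.completionTime | fromdateiso8601) > (now - 7*86400))] | length'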
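
Because finished Jobs are garbage-collected, a freshness check against the CronJob's own status can be more reliable than counting Jobs. The sketch below assumes a cluster recent enough to populate status.lastSuccessfulTime and a workstation with GNU date; backup_is_fresh is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the CronJob's last successful run is recent enough.
# Usage: backup_is_fresh <cronjob-name> <namespace> <max-age-hours>
backup_is_fresh() {
  cronjob="$1"
  ns="$2"
  max_hours="$3"
  last="$(kubectl get cronjob "$cronjob" -n "$ns" -o 'jsonpath={.status.lastSuccessfulTime}')"
  if [ -z "$last" ]; then
    echo "$cronjob has no recorded successful run" >&2
    return 1
  fi
  # Assumes GNU date, which parses RFC 3339 timestamps such as 2024-01-01T03:00:00Z.
  last_epoch="$(date -u -d "$last" +%s)" || return 1
  age_hours=$(( ($(date -u +%s) - last_epoch) / 3600 ))
  echo "last success: $last (${age_hours}h ago)"
  [ "$age_hours" -le "$max_hours" ]
}
```

For a daily schedule, `backup_is_fresh postgres-backup veza-production 26` leaves a two hour margin for the job's runtime.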

Monitor Backup Sizes

Backup sizes are logged in job output. Check logs to monitor size trends.

Set Up Alerts

Configure Prometheus alerts for:

  • Failed backup jobs
  • Backup size anomalies
  • Storage capacity warnings

Troubleshooting

Backup Job Fails

  1. Check job logs:

    kubectl logs <job-name> -n veza-production
    
  2. Verify the secret values (they are shown base64-encoded in the output):

    kubectl get secret veza-secrets -n veza-production -o yaml
    
  3. Test database connectivity:

    kubectl run test-db-connection --rm -it --image=postgres:15-alpine \
      --restart=Never \
      --env="PGPASSWORD=your_password" \
      -n veza-production \
      -- psql -h postgres-service -U veza_user -d veza_db -c "SELECT 1"
    

Storage Full

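  1. Check the PVC's capacity and status (this does not show how full the volume is; run df -h /backups in a pod that mounts it to see actual usage):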

    kubectl describe pvc postgres-backup-storage -n veza-production
    
  2. Manually clean up old backups:

    kubectl run cleanup --rm -it --image=postgres:15-alpine \
      --restart=Never \
      --overrides='{"spec":{"containers":[{"name":"cleanup","image":"postgres:15-alpine","command":["/bin/sh","-c","find /backups -name \"*.dump\" -mtime +30 -delete"],"volumeMounts":[{"name":"backup-storage","mountPath":"/backups"}],"stdin":true,"tty":true}],"volumes":[{"name":"backup-storage","persistentVolumeClaim":{"claimName":"postgres-backup-storage"}}]}}' \
      -n veza-production
    
  3. Increase the PVC size if needed (requires a StorageClass with allowVolumeExpansion: true)
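
To see how full the backup volume actually is, something like the following can be run in any pod that mounts it, for example the backup-checker pod from the Verification section. Both function names are hypothetical, and the 80% default threshold is an assumed value.

```shell
# Hypothetical helper: print how full the filesystem holding a path is, as a whole percent.
volume_usage_percent() {
  # -P forces the POSIX single-line format, so the value is always column 5 of line 2.
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: warn when usage reaches a threshold (default 80%).
# Usage: warn_if_nearly_full <path> [threshold-percent]
warn_if_nearly_full() {
  used="$(volume_usage_percent "$1")"
  limit="${2:-80}"
  if [ "$used" -ge "$limit" ]; then
    echo "WARNING: $1 is ${used}% full (threshold ${limit}%)"
    return 1
  fi
  echo "$1 is ${used}% full"
}
```

Usage inside the pod: `warn_if_nearly_full /backups 80`.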