Security Incident Response Runbook

This runbook describes the procedure for responding to security incidents, including breaches, unauthorized access, and data exfiltration.

Prerequisites

  • Security team contact information
  • Incident response team assembled
  • Access to logs and monitoring
  • Backup and restore procedures ready

Incident Severity Levels

  • P0 (Critical): Active breach, data exfiltration, ransomware
  • P1 (High): Unauthorized access, privilege escalation
  • P2 (Medium): Suspicious activity, potential vulnerability
  • P3 (Low): Security alerts, false positives

Immediate Response (First 15 Minutes)

Step 1: Containment

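# Isolate affected systems
# Option A: Scale down the affected deployment
# Note: scaling to zero deletes the pods, so their logs are no longer available
# to kubectl logs. Capture evidence (Step 2) first, or prefer Option B.
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production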

# Option B: Block network access
kubectl apply -f k8s/network-policies/block-all.yaml -n veza-production

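# Option C: Revoke credentials
# Rotate the credentials in Vault first; otherwise External Secrets may
# recreate the Secret with the same compromised values.
kubectl delete secret veza-secrets -n veza-production
# The Secret is then restored from Vault with the new credentials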
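
The contents of `k8s/network-policies/block-all.yaml` are not shown in this runbook. As a minimal sketch, assuming Option B refers to a default-deny policy covering every pod in the namespace, the hypothetical helper `deny_all_policy` below prints such a manifest so it can be reviewed before it is applied. The policy name `block-all` is an assumption taken from the file name.

```shell
# Print a default-deny NetworkPolicy for the given namespace.
# Assumption: the policy name "block-all" mirrors the file name used in Option B.
deny_all_policy() {
  namespace="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-all
  namespace: ${namespace}
spec:
  podSelector: {}   # an empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
EOF
}

# Usage (not executed here):
#   deny_all_policy veza-production | kubectl apply -f -
```

Two caveats apply to this sketch: NetworkPolicy objects only take effect when the cluster's network plugin enforces them, and policies are additive, so any existing policy that allows traffic to the same pods still applies alongside this one.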

Step 2: Preserve Evidence

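# Export logs (kubectl picks a single pod for a deployment; repeat per pod if several are running)
kubectl logs deployment/veza-backend-api -n veza-production --all-containers --timestamps > /tmp/incident-logs-$(date +%s).log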

# Export events
kubectl get events -n veza-production --sort-by='.lastTimestamp' > /tmp/incident-events-$(date +%s).log

# Export pod configurations
kubectl get pods -n veza-production -o yaml > /tmp/incident-pods-$(date +%s).yaml

# Export network policies
kubectl get networkpolicies -n veza-production -o yaml > /tmp/incident-netpol-$(date +%s).yaml
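
The exports above land in separate files under `/tmp`, which is commonly cleared on reboot. A minimal sketch of one way to keep them together and support the chain of custody mentioned later in this runbook: collect everything into a single directory and record SHA-256 checksums. The function names and file layout are illustrative assumptions, and `sha256sum` is assumed to be installed on the responder's machine.

```shell
# Collect the same exports as above into one directory.
# Assumption: function names and file names are illustrative, not part of the runbook.
collect_evidence() {
  ns="$1"
  dir="$2"
  mkdir -p "$dir" || return 1
  kubectl logs deployment/veza-backend-api -n "$ns" --all-containers --timestamps > "$dir/logs.log"
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' > "$dir/events.log"
  kubectl get pods -n "$ns" -o yaml > "$dir/pods.yaml"
  kubectl get networkpolicies -n "$ns" -o yaml > "$dir/netpol.yaml"
}

# Record checksums so later reviewers can confirm the files were not altered.
seal_evidence() {
  ( cd "$1" && sha256sum -- *.log *.yaml > SHA256SUMS )
}

# Re-check the recorded checksums; returns non-zero if any file changed.
verify_evidence() {
  ( cd "$1" && sha256sum -c SHA256SUMS > /dev/null )
}

# Usage (not executed here):
#   dir="/tmp/incident-$(date +%s)"
#   collect_evidence veza-production "$dir" && seal_evidence "$dir"
```

Copying the sealed directory to durable, access-controlled storage afterwards is advisable, since a compromised host is not a trustworthy place to keep evidence.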

Step 3: Notify Team

# Send immediate notification
# (Use your notification system)
# PagerDuty, Slack, Email, etc.

# Document incident
echo "INCIDENT: $(date)" >> /tmp/incident-log.txt
echo "Severity: P0" >> /tmp/incident-log.txt
echo "Description: [Description]" >> /tmp/incident-log.txt
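
Under pressure it is easy to mistype a severity or forget a field. As a sketch, the hypothetical helper `log_incident` below writes the same three fields as the commands above, rejects severities outside P0 to P3, and uses a UTC timestamp so entries from different responders line up. The function name and argument order are assumptions.

```shell
# Append one incident entry to a log file.
# Assumptions: function name, argument order, and UTC timestamp format are illustrative.
log_incident() {
  severity="$1"
  description="$2"
  logfile="${3:-/tmp/incident-log.txt}"
  case "$severity" in
    P0|P1|P2|P3) ;;
    *) echo "unknown severity: $severity" >&2; return 1 ;;
  esac
  {
    echo "INCIDENT: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "Severity: $severity"
    echo "Description: $description"
  } >> "$logfile"
}

# Usage (not executed here):
#   log_incident P0 "Active data exfiltration from veza-backend-api"
```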

Investigation Phase

Step 1: Identify Scope

# Check for unauthorized pods (across all namespaces)
kubectl get pods --all-namespaces -o wide

# Check for suspicious services
kubectl get svc -n veza-production

# Check for unauthorized ingress
kubectl get ingress -n veza-production

# Check network policies
kubectl get networkpolicies -n veza-production

Step 2: Review Access Logs

# Check API access logs
kubectl logs deployment/veza-backend-api -n veza-production | \
  grep -i "unauthorized\|forbidden\|failed\|error"

# Check authentication logs
kubectl logs deployment/veza-backend-api -n veza-production | \
  grep -i "login\|auth\|token\|jwt"

# Check database access
kubectl logs postgres-pod -n veza-production | \
  grep -i "connection\|login\|failed"
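
Raw `grep` output can run to thousands of lines during an incident. A minimal sketch of a triage step: count how many log lines mention each keyword before reading any of them in detail. The helper name `summarize_log_hits` is hypothetical, and the keyword list simply reuses the terms from the first `grep` above.

```shell
# Read log text on stdin and print "<keyword> <count>" for each keyword.
# Matching is case-insensitive; a line is counted once per keyword it contains.
# Assumption: the function name and output format are illustrative.
summarize_log_hits() {
  awk '
    BEGIN { n = split("unauthorized forbidden failed error", kw, " ") }
    {
      line = tolower($0)
      for (i = 1; i <= n; i++)
        if (index(line, kw[i])) count[kw[i]]++
    }
    END { for (i = 1; i <= n; i++) printf "%s %d\n", kw[i], count[kw[i]] }
  '
}

# Usage (not executed here):
#   kubectl logs deployment/veza-backend-api -n veza-production --since=24h | summarize_log_hits
```

A sudden jump in one count compared with a normal day is a hint about where to look first, not proof of compromise.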

Step 3: Check for Data Exfiltration

# Check database access patterns
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT * FROM pg_stat_activity 
    WHERE state = 'active' 
    ORDER BY query_start DESC;
  "

# Check for large data exports (tables with unusually high read counts)
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT schemaname, relname, seq_scan, seq_tup_read, idx_tup_fetch
    FROM pg_stat_user_tables
    ORDER BY seq_tup_read DESC;
  "

Remediation Phase

Step 1: Revoke Compromised Credentials

# Rotate the JWT secret
# Update in Vault
vault kv put secret/veza/production/jwt-secret value="$(openssl rand -base64 32)"

# Force External Secrets to sync
kubectl annotate externalsecret veza-secrets \
  force-sync=$(date +%s) \
  -n veza-production \
  --overwrite

# Restart applications
kubectl rollout restart deployment/veza-backend-api -n veza-production
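
`kubectl rollout restart` returns as soon as the restart is requested, so pods holding the old secret may still be serving traffic for a while. As a sketch, the hypothetical helper `restart_and_wait` below restarts the deployment and then blocks on `kubectl rollout status` until the new pods are ready or a timeout expires. The function name and the 5-minute default are assumptions.

```shell
# Restart a deployment and wait until the rollout finishes.
# Assumptions: function name and the 5m default timeout are illustrative.
restart_and_wait() {
  deployment="$1"
  ns="$2"
  timeout="${3:-5m}"
  kubectl rollout restart "deployment/$deployment" -n "$ns" || return 1
  # Returns non-zero if the rollout does not complete within the timeout.
  kubectl rollout status "deployment/$deployment" -n "$ns" --timeout="$timeout"
}

# Usage (not executed here):
#   restart_and_wait veza-backend-api veza-production
```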

Step 2: Patch Vulnerabilities

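# Update vulnerable images
# Set PATCHED_TAG to the fixed image version. A mutable tag such as :latest is a
# no-op if the deployment already uses it and leaves no record of what was deployed.
kubectl set image deployment/veza-backend-api \
  veza-backend-api=veza-backend-api:"${PATCHED_TAG:?set to the patched image tag}" \
  -n veza-production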

# Apply security patches
kubectl apply -f k8s/security-patches/ -n veza-production

Step 3: Restore from Clean Backup

If data was compromised:

# Follow data restore procedure
# See runbooks/data-restore.md

# Stop the application so nothing writes to the database during the restore
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production

# Restore database
# (Follow data-restore.md procedure)

# Restart applications
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production

Step 4: Strengthen Security

# Apply network policies
kubectl apply -f k8s/network-policies/ -n veza-production

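# Enable audit logging
# Note: an audit Policy (audit.k8s.io/v1) is normally a file passed to
# kube-apiserver via --audit-policy-file rather than an object applied with kubectl.
kubectl apply -f k8s/audit/audit-policy.yaml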

# Update RBAC
kubectl apply -f k8s/rbac/ -n veza-production

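# Apply pod security configuration
# (PodSecurityPolicy was removed in Kubernetes 1.25; newer clusters use Pod Security Admission)
kubectl apply -f k8s/pod-security/ -n veza-production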
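
The contents of `k8s/pod-security/` are not shown here. On clusters that use Pod Security Admission, the enforcement level is set with a namespace label, and a server-side dry run reports which existing pods would violate it. The sketch below assumes that mechanism is in use; the function names and the `restricted` default are illustrative assumptions.

```shell
# Build the enforce label for a Pod Security Standard level.
# Assumption: function names and the "restricted" default are illustrative.
pod_security_label() {
  level="${1:-restricted}"
  case "$level" in
    privileged|baseline|restricted)
      echo "pod-security.kubernetes.io/enforce=$level" ;;
    *)
      echo "invalid Pod Security level: $level" >&2
      return 1 ;;
  esac
}

# Server-side dry run: nothing is changed, but warnings list pods that would violate the level.
preview_pod_security() {
  label="$(pod_security_label "$2")" || return 1
  kubectl label --dry-run=server --overwrite namespace "$1" "$label"
}

# Apply the label once the preview output has been reviewed.
enforce_pod_security() {
  label="$(pod_security_label "$2")" || return 1
  kubectl label --overwrite namespace "$1" "$label"
}

# Usage (not executed here):
#   preview_pod_security veza-production restricted
#   enforce_pod_security veza-production restricted
```

Enforcement is checked when pods are created, so pods that are already running are not evicted by the label; they are only rejected the next time they are recreated.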

Recovery Phase

Step 1: Verify System Integrity

# Check all pods are running
kubectl get pods -n veza-production

# Verify health checks (-f makes curl exit non-zero on an HTTP error status)
curl -fsS https://api.veza.com/health

# Check for anomalies
kubectl top pods -n veza-production
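
A single `curl` can fail while pods are still starting. As a minimal sketch, the hypothetical helper `wait_healthy` below polls the health endpoint a fixed number of times and returns non-zero if it never answers without an HTTP error. The function name, the default of 10 attempts, and the 6-second delay are assumptions.

```shell
# Poll a URL until curl succeeds or the attempts run out.
# Assumptions: function name, 10 attempts, and 6 s delay are illustrative defaults.
wait_healthy() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-6}"
  i=1
  while :; do
    # -f: non-zero exit on HTTP 4xx/5xx; --max-time: give up on a hung request.
    curl -fsS --max-time 5 -o /dev/null "$url" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage (not executed here):
#   wait_healthy https://api.veza.com/health || echo "health check still failing"
```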

Step 2: Monitor for Recurrence

# Set up enhanced monitoring
# (Configure additional alerts)

# Review logs continuously
kubectl logs -f deployment/veza-backend-api -n veza-production

Step 3: Gradual Re-enablement

# Gradually scale up services
kubectl scale deployment veza-backend-api --replicas=1 -n veza-production

# Monitor for issues
# Wait 15 minutes

# Scale to full capacity
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
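
The two scale commands and the wait between them can be sketched as one function that stops if the first replica never becomes ready. The function name `staged_scale_up`, the 5-minute rollout timeout, and the 900-second soak default (matching the 15-minute wait above) are assumptions.

```shell
# Scale to one replica, wait, then scale to full capacity.
# Assumptions: function name, 5m rollout timeout, and 900 s soak default are illustrative.
staged_scale_up() {
  deployment="$1"
  ns="$2"
  full="${3:-3}"
  soak_seconds="${4:-900}"
  kubectl scale deployment "$deployment" --replicas=1 -n "$ns" || return 1
  # Abort here if the single replica does not become ready.
  kubectl rollout status "deployment/$deployment" -n "$ns" --timeout=5m || return 1
  # Soak period: watch dashboards and logs before admitting full traffic.
  sleep "$soak_seconds"
  kubectl scale deployment "$deployment" --replicas="$full" -n "$ns" || return 1
  kubectl rollout status "deployment/$deployment" -n "$ns" --timeout=5m
}

# Usage (not executed here):
#   staged_scale_up veza-backend-api veza-production 3 900
```

The soak is a plain `sleep`, so a human still needs to watch for anomalies during that window and interrupt the script if something looks wrong.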

Post-Incident Tasks

Immediate (First 24 Hours)

  1. Document Incident

    • Timeline of events
    • Actions taken
    • Systems affected
    • Data compromised (if any)
  2. Notify Stakeholders

    • Internal team
    • Management
    • Legal (if required)
    • Customers (if data breach)
  3. Preserve Evidence

    • Secure all logs
    • Document all actions
    • Maintain chain of custody

Short Term (First Week)

  1. Root Cause Analysis

    • Identify vulnerability
    • Determine attack vector
    • Assess impact
  2. Remediation

    • Patch vulnerabilities
    • Update security policies
    • Implement additional controls
  3. Communication

    • Internal post-mortem
    • External communication (if needed)
    • Regulatory notifications (if required)

Long Term (Ongoing)

  1. Prevention

    • Security training
    • Regular security audits
    • Penetration testing
    • Security monitoring improvements
  2. Documentation

    • Update security procedures
    • Update incident response plan
    • Document lessons learned

Verification Checklist

  • Incident contained
  • Evidence preserved
  • Compromised credentials revoked
  • Vulnerabilities patched
  • Systems restored
  • Monitoring enhanced
  • Documentation updated
  • Stakeholders notified
  • Post-mortem scheduled

Communication Templates

Internal Notification

Subject: [SECURITY INCIDENT] Veza Platform - <Description>

Severity: P0/P1/P2/P3
Status: Contained/Investigating/Resolved
Impact: <Description>
Actions Taken: <List>
Next Steps: <List>

Incident response team is actively working on resolution.

External Notification (if required)

Subject: Security Incident Notification

We are writing to inform you of a security incident that may have affected your account.

What happened: <Description>
What we're doing: <Actions>
What you should do: <Recommendations>
Timeline: <Dates>

We take security seriously and are committed to protecting your data.
