senke/veza

senke 83dfdcd642 [INFRA-010] infra: Set up disaster recovery plan

2025-12-25 21:40:31 +01:00


Security Incident Response Runbook

This runbook describes the procedure for responding to security incidents, including breaches, unauthorized access, and data exfiltration.

Prerequisites

- Security team contact information
- Incident response team assembled
- Access to logs and monitoring
- Backup and restore procedures ready

Incident Severity Levels

- P0 (Critical): Active breach, data exfiltration, ransomware
- P1 (High): Unauthorized access, privilege escalation
- P2 (Medium): Suspicious activity, potential vulnerability
- P3 (Low): Security alerts, false positives

Immediate Response (First 15 Minutes)

Step 1: Containment

# Isolate affected systems
# Option A: Scale down affected deployment
# (terminated pods lose their logs; export them first, see Step 2)
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production

# Option B: Block network access
kubectl apply -f k8s/network-policies/block-all.yaml -n veza-production

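# Option C: Revoke credentials
# Update secrets immediately
# (deleting the Secret does not invalidate values that already leaked;
# rotate them at the source, see Remediation Phase, Step 1)
kubectl delete secret veza-secrets -n veza-production
# Restore from Vault with new credentials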
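
The contents of k8s/network-policies/block-all.yaml are not shown in this runbook. As a rough sketch of what Option B could apply, the helper below prints a default-deny NetworkPolicy for one namespace. The function name `render_block_all_policy` and the policy name `block-all` are assumptions made for illustration, and the policy only takes effect if the cluster's network plugin enforces NetworkPolicy.

```shell
# Sketch only: prints a default-deny NetworkPolicy for one namespace.
# The policy name "block-all" and its contents are assumptions; the real
# k8s/network-policies/block-all.yaml is not shown in this runbook.
render_block_all_policy() {
  local namespace="${1:?usage: render_block_all_policy <namespace>}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-all
  namespace: ${namespace}
spec:
  podSelector: {}   # empty selector: applies to every pod in the namespace
  policyTypes:      # both types listed with no rules: all traffic is denied
    - Ingress
    - Egress
EOF
}
```

Possible usage: `render_block_all_policy veza-production | kubectl apply -f -`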

Step 2: Preserve Evidence

# Export logs
kubectl logs deployment/veza-backend-api -n veza-production > /tmp/incident-logs-$(date +%s).log

# Export events
kubectl get events -n veza-production --sort-by='.lastTimestamp' > /tmp/incident-events-$(date +%s).log

# Export pod configurations
kubectl get pods -n veza-production -o yaml > /tmp/incident-pods-$(date +%s).yaml

# Export network policies
kubectl get networkpolicies -n veza-production -o yaml > /tmp/incident-netpol-$(date +%s).yaml
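
The exports above each call `date` separately and write to /tmp, so the files end up with different timestamps and no integrity record. One possible way to tighten this, assuming GNU coreutils (`sha256sum`) is available, is to collect everything into a single timestamped directory and checksum it. The function name `collect_evidence`, the `EVIDENCE_ROOT` variable, and the file names are hypothetical.

```shell
# Sketch only: gathers the exports above into one timestamped directory and
# records SHA-256 checksums. EVIDENCE_ROOT and the file names are assumptions.
collect_evidence() {
  local namespace="${1:-veza-production}"
  local deployment="${2:-veza-backend-api}"
  local root="${EVIDENCE_ROOT:-/tmp}"
  local dir
  # One shared UTC timestamp keeps all artifacts of a capture together.
  dir="${root}/incident-$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$dir" || return 1

  kubectl logs "deployment/${deployment}" -n "$namespace" \
    --all-containers --timestamps > "${dir}/logs.log"
  kubectl get events -n "$namespace" --sort-by='.lastTimestamp' > "${dir}/events.log"
  kubectl get pods -n "$namespace" -o yaml > "${dir}/pods.yaml"
  kubectl get networkpolicies -n "$namespace" -o yaml > "${dir}/netpol.yaml"

  # Checksums let later reviewers confirm the files were not altered.
  ( cd "$dir" && sha256sum -- *.log *.yaml > SHA256SUMS ) || return 1
  echo "$dir"
}
```

Possible usage: `EVIDENCE_ROOT=/secure/evidence collect_evidence veza-production veza-backend-api`. Copying the resulting directory off the cluster host soon after capture helps with the chain-of-custody item listed under Post-Incident Tasks.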

Step 3: Notify Team

# Send immediate notification
# (Use your notification system)
# PagerDuty, Slack, Email, etc.

# Document incident
echo "INCIDENT: $(date)" >> /tmp/incident-log.txt
echo "Severity: P0" >> /tmp/incident-log.txt
echo "Description: [Description]" >> /tmp/incident-log.txt
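
The three `echo` lines above can be wrapped in a small helper so that every entry has the same shape and only the severities defined in this runbook are accepted. This is a minimal sketch; the function name `log_incident` and the `INCIDENT_LOG` override variable are assumptions.

```shell
# Sketch only: appends a structured entry to the incident log and rejects
# severities outside P0-P3. INCIDENT_LOG is an assumed override variable.
log_incident() {
  local severity="$1" description="$2"
  local log_file="${INCIDENT_LOG:-/tmp/incident-log.txt}"
  case "$severity" in
    P0|P1|P2|P3) ;;
    *) echo "invalid severity: ${severity} (expected P0-P3)" >&2; return 1 ;;
  esac
  {
    echo "INCIDENT: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "Severity: ${severity}"
    echo "Description: ${description}"
  } >> "$log_file"
}
```

Possible usage: `log_incident P0 "Active breach on veza-backend-api"`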

Investigation Phase

Step 1: Identify Scope

# Check for unauthorized pods (in the namespace, then cluster-wide)
kubectl get pods -n veza-production -o wide
kubectl get pods --all-namespaces

# Check for suspicious services
kubectl get svc -n veza-production

# Check for unauthorized ingress
kubectl get ingress -n veza-production

# Check network policies
kubectl get networkpolicies -n veza-production
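
Reading pod lists by eye can miss a foreign workload. As a rough sketch, the helpers below list each pod with its container images and flag any image that does not start with an expected registry prefix. Both function names are hypothetical, and the prefix in the usage line is a placeholder, since this runbook does not name the project's registry.

```shell
# Sketch only: lists "pod<TAB>image..." pairs and flags images that do not
# come from an expected registry. The prefix passed in is a hypothetical value.
list_pod_images() {
  local namespace="${1:-veza-production}"
  kubectl get pods -n "$namespace" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.containers[*]}{.image}{" "}{end}{"\n"}{end}'
}

flag_unexpected_images() {
  local expected_prefix="$1"
  # Reads "pod<TAB>image [image...]" lines on stdin and prints one line per
  # image that does not start with the expected prefix.
  awk -F '\t' -v prefix="$expected_prefix" '{
    n = split($2, images, " ")
    for (i = 1; i <= n; i++)
      if (index(images[i], prefix) != 1) { print $1 "\t" images[i] }
  }'
}
```

Possible usage: `list_pod_images veza-production | flag_unexpected_images 'registry.example.com/veza/'`. Init containers are not covered by this sketch.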

Step 2: Review Access Logs

# Check API access logs
kubectl logs deployment/veza-backend-api -n veza-production | \
  grep -i "unauthorized\|forbidden\|failed\|error"

# Check authentication logs
kubectl logs deployment/veza-backend-api -n veza-production | \
  grep -i "login\|auth\|token\|jwt"

# Check database access
kubectl logs postgres-pod -n veza-production | \
  grep -i "connection\|login\|failed"
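
Before reading raw matches, a per-keyword count can show where to look first. The sketch below reuses a few of the keywords from the grep patterns above; the function name `count_log_keywords` and the choice of keywords are assumptions.

```shell
# Sketch only: counts log lines per keyword (case-insensitive) so spikes stand
# out before reading raw output. Keywords mirror the grep patterns above.
count_log_keywords() {
  local input keyword
  input="$(cat)"   # buffer stdin so each keyword scans the same data
  for keyword in unauthorized forbidden failed login token; do
    # grep -c prints 0 and exits non-zero when nothing matches; ignore the status
    printf '%s\t%s\n' "$keyword" \
      "$(printf '%s\n' "$input" | grep -ic -- "$keyword" || true)"
  done
}
```

Possible usage: `kubectl logs deployment/veza-backend-api -n veza-production --since=24h | count_log_keywords`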

Step 3: Check for Data Exfiltration

# Check database access patterns
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT * FROM pg_stat_activity 
    WHERE state = 'active' 
    ORDER BY query_start DESC;
  "

# Check for large data exports (unusually high read counts per table)
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT schemaname, relname, seq_tup_read, idx_tup_fetch,
           n_tup_ins, n_tup_upd, n_tup_del
    FROM pg_stat_user_tables
    ORDER BY seq_tup_read DESC;
  "

Remediation Phase

Step 1: Revoke Compromised Credentials

# Rotate the JWT secret
# Update in Vault
vault kv put secret/veza/production/jwt-secret value="$(openssl rand -base64 32)"

# Force External Secrets to sync
kubectl annotate externalsecret veza-secrets \
  force-sync=$(date +%s) \
  -n veza-production \
  --overwrite

# Restart applications
kubectl rollout restart deployment/veza-backend-api -n veza-production
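
`kubectl rollout restart` returns as soon as the restart is requested, not when the new pods are ready. A possible follow-up, sketched below, is to wait on `kubectl rollout status` so the rotation step only counts as done once the rollout completes. The function name `restart_and_wait` and the 300s default timeout are assumptions.

```shell
# Sketch only: restarts a deployment and blocks until the rollout finishes,
# so pods using the rotated secret are confirmed ready. Timeout is assumed.
restart_and_wait() {
  local deployment="${1:-veza-backend-api}"
  local namespace="${2:-veza-production}"
  local timeout="${3:-300s}"
  kubectl rollout restart "deployment/${deployment}" -n "$namespace" || return 1
  if ! kubectl rollout status "deployment/${deployment}" -n "$namespace" --timeout="$timeout"; then
    echo "rollout of ${deployment} did not complete within ${timeout}" >&2
    return 1
  fi
}
```

Possible usage: `restart_and_wait veza-backend-api veza-production 300s`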

Step 2: Patch Vulnerabilities

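# Update vulnerable images
# (set PATCHED_TAG to the fixed version; a mutable tag such as :latest is hard
# to audit and triggers no rollout if the deployment already references it)
kubectl set image deployment/veza-backend-api \
  "veza-backend-api=veza-backend-api:${PATCHED_TAG}" \
  -n veza-production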

# Apply security patches
kubectl apply -f k8s/security-patches/ -n veza-production

Step 3: Restore from Clean Backup

If data was compromised:

# Follow data restore procedure
# See runbooks/data-restore.md

# Restore from backup before incident
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production

# Restore database
# (Follow data-restore.md procedure)

# Restart applications
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production

Step 4: Strengthen Security

# Apply network policies
kubectl apply -f k8s/network-policies/ -n veza-production

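# Enable audit logging
# (the audit policy is loaded by kube-apiserver via --audit-policy-file, not with
# kubectl apply; deploy k8s/audit/audit-policy.yaml through the control-plane
# configuration, or enable audit logs with the provider on managed clusters)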

# Update RBAC
kubectl apply -f k8s/rbac/ -n veza-production

# Enforce pod security standards
# (PodSecurityPolicy was removed in Kubernetes 1.25; newer clusters use Pod Security Admission)
kubectl apply -f k8s/pod-security/ -n veza-production

Recovery Phase

Step 1: Verify System Integrity

# Check all pods are running
kubectl get pods -n veza-production

# Verify health checks (-f makes curl exit non-zero on HTTP errors)
curl -fsS https://api.veza.com/health

# Check for anomalies
kubectl top pods -n veza-production
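
A single health request can pass or fail by chance while pods are still starting. As a minimal sketch, the helper below polls the endpoint a bounded number of times and reports the outcome. The function name `wait_for_health` and the default attempt count and delay are assumptions.

```shell
# Sketch only: polls a health endpoint until curl succeeds or attempts run out.
# The attempt count and delay are assumed defaults.
wait_for_health() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP >= 400, -sS: quiet but show errors,
    # --max-time: bound each individual try to 10 seconds
    if curl -fsS --max-time 10 -o /dev/null "$url"; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still unhealthy after ${attempts} attempt(s): ${url}" >&2
  return 1
}
```

Possible usage: `wait_for_health https://api.veza.com/health 10 5`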

Step 2: Monitor for Recurrence

# Set up enhanced monitoring
# (Configure additional alerts)

# Review logs continuously
kubectl logs -f deployment/veza-backend-api -n veza-production

Step 3: Gradual Re-enablement

# Gradually scale up services
kubectl scale deployment veza-backend-api --replicas=1 -n veza-production

# Monitor for issues
# Wait 15 minutes

# Scale to full capacity
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
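
The scale, wait, scale sequence above can be expressed as a loop that stops at the first step that fails to become ready. This is a sketch under assumptions: the function name `scale_gradually` and the `SOAK_SECONDS` variable are hypothetical, and the 900-second default simply mirrors the 15-minute wait above.

```shell
# Sketch only: scales through a list of replica counts, waits for each rollout
# to become ready, then soaks after each step. SOAK_SECONDS is an assumed
# variable; its 900-second default matches the 15-minute wait above.
scale_gradually() {
  local deployment="$1" namespace="$2"
  shift 2
  local soak="${SOAK_SECONDS:-900}" replicas
  for replicas in "$@"; do
    kubectl scale "deployment/${deployment}" --replicas="$replicas" -n "$namespace" || return 1
    kubectl rollout status "deployment/${deployment}" -n "$namespace" --timeout=300s || {
      echo "stopped at ${replicas} replica(s): rollout did not become ready" >&2
      return 1
    }
    sleep "$soak"
  done
}
```

Possible usage: `scale_gradually veza-backend-api veza-production 1 3`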

Post-Incident Tasks

Immediate (First 24 Hours)

Document Incident
- Timeline of events
- Actions taken
- Systems affected
- Data compromised (if any)
Notify Stakeholders
- Internal team
- Management
- Legal (if required)
- Customers (if data breach)
Preserve Evidence
- Secure all logs
- Document all actions
- Maintain chain of custody

Short Term (First Week)

Root Cause Analysis
- Identify vulnerability
- Determine attack vector
- Assess impact
Remediation
- Patch vulnerabilities
- Update security policies
- Implement additional controls
Communication
- Internal post-mortem
- External communication (if needed)
- Regulatory notifications (if required)

Long Term (Ongoing)

Prevention
- Security training
- Regular security audits
- Penetration testing
- Security monitoring improvements
Documentation
- Update security procedures
- Update incident response plan
- Document lessons learned

Verification Checklist

- Incident contained
- Evidence preserved
- Compromised credentials revoked
- Vulnerabilities patched
- Systems restored
- Monitoring enhanced
- Documentation updated
- Stakeholders notified
- Post-mortem scheduled

Communication Templates

Internal Notification

Subject: [SECURITY INCIDENT] Veza Platform - <Description>

Severity: P0/P1/P2/P3
Status: Contained/Investigating/Resolved
Impact: <Description>
Actions Taken: <List>
Next Steps: <List>

Incident response team is actively working on resolution.

External Notification (if required)

Subject: Security Incident Notification

We are writing to inform you of a security incident that may have affected your account.

What happened: <Description>
What we're doing: <Actions>
What you should do: <Recommendations>
Timeline: <Dates>

We take security seriously and are committed to protecting your data.
