# Security Incident Response Runbook
This runbook describes the procedure for responding to security incidents, including breaches, unauthorized access, and data exfiltration.
## Prerequisites
- Security team contact information
- Incident response team assembled
- Access to logs and monitoring
- Backup and restore procedures ready
## Incident Severity Levels
- **P0 (Critical)**: Active breach, data exfiltration, ransomware
- **P1 (High)**: Unauthorized access, privilege escalation
- **P2 (Medium)**: Suspicious activity, potential vulnerability
- **P3 (Low)**: Security alerts, false positives
## Immediate Response (First 15 Minutes)
### Step 1: Containment
```bash
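# Isolate affected systems. Choose the least destructive option that stops the attack.
# Option A: Scale down affected deployment
# Note: this terminates the pods, so capture logs first (see Step 2) if the situation allows
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production
# Option B: Block network access
kubectl apply -f k8s/network-policies/block-all.yaml -n veza-production
# Option C: Revoke credentials
# Note: deleting the Kubernetes secret does not invalidate credentials already issued;
# rotate them at the source as well (see Remediation Phase, Step 1)
kubectl delete secret veza-secrets -n veza-production
# Restore from Vault with new credentials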
```
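The contents of `k8s/network-policies/block-all.yaml` are not shown in this runbook. As a minimal sketch, assuming that file is a default-deny policy, the hypothetical helper `write_block_all_policy` below writes a manifest that selects every pod in the namespace and allows no ingress or egress. The policy name `block-all` is an assumption, and the policy only takes effect if the cluster's network plugin enforces NetworkPolicy.

```bash
# Hypothetical helper: writes a default-deny NetworkPolicy manifest to the given path.
# The policy name "block-all" is an assumption, not taken from the repository.
write_block_all_policy() {
  local out="$1"
  cat > "$out" <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}
```

For example, `write_block_all_policy /tmp/block-all.yaml` followed by `kubectl apply -f /tmp/block-all.yaml -n veza-production` would apply it. An empty `podSelector` with both policy types and no rules isolates all pods in the namespace, which also blocks DNS and health probes from other namespaces.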
### Step 2: Preserve Evidence
```bash
# Use one timestamp so every artifact from this collection shares the same suffix
TS=$(date +%s)
# Export logs (kubectl logs deployment/... reads from a single pod; repeat per pod if several are running)
kubectl logs deployment/veza-backend-api -n veza-production --all-containers --timestamps > "/tmp/incident-logs-${TS}.log"
# Export events
kubectl get events -n veza-production --sort-by='.lastTimestamp' > "/tmp/incident-events-${TS}.log"
# Export pod configurations
kubectl get pods -n veza-production -o yaml > "/tmp/incident-pods-${TS}.yaml"
# Export network policies
kubectl get networkpolicies -n veza-production -o yaml > "/tmp/incident-netpol-${TS}.yaml"
```
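Exported files are only useful as evidence if later changes to them can be detected. The sketch below is one way to support the chain of custody: it records a SHA-256 checksum for every file in an evidence directory and re-checks them on demand. The function names and the `SHA256SUMS` manifest name are hypothetical, and the code assumes `sha256sum`, `find`, and `xargs` from GNU coreutils/findutils are available on the responder's workstation.

```bash
# Hypothetical helper: writes a SHA256SUMS manifest covering every file in the
# given evidence directory (the manifest itself is excluded).
seal_evidence() {
  local dir="$1"
  (
    cd "$dir" || exit 1
    find . -type f ! -name 'SHA256SUMS' -print0 | xargs -0 -r sha256sum > SHA256SUMS
  )
}

# Hypothetical helper: re-checks the manifest. A non-zero exit status means at
# least one file was modified or removed after sealing.
verify_evidence() {
  local dir="$1"
  ( cd "$dir" && sha256sum -c SHA256SUMS > /dev/null 2>&1 )
}
```

A typical use would be to move the exported files from `/tmp` into a dedicated directory, run `seal_evidence` on it, and copy the directory to durable storage, since `/tmp` may be cleared on reboot.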
### Step 3: Notify Team
```bash
# Send immediate notification
# (Use your notification system)
# PagerDuty, Slack, Email, etc.
# Document incident
echo "INCIDENT: $(date)" >> /tmp/incident-log.txt
echo "Severity: P0" >> /tmp/incident-log.txt
echo "Description: [Description]" >> /tmp/incident-log.txt
```
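The `echo` lines above hard-code the severity. As a small sketch, the hypothetical function `log_incident` below appends the same three fields but takes the severity and description as arguments and rejects any severity outside the P0 to P3 levels defined in this runbook. The default log path mirrors the one used above; the UTC timestamp format is an assumption.

```bash
# Hypothetical helper: appends a structured entry to the incident log.
# Usage: log_incident <P0|P1|P2|P3> <description> [logfile]
log_incident() {
  local severity="$1" description="$2" logfile="${3:-/tmp/incident-log.txt}"
  case "$severity" in
    P0|P1|P2|P3) ;;
    *) echo "unknown severity: $severity" >&2; return 1 ;;
  esac
  {
    echo "INCIDENT: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "Severity: $severity"
    echo "Description: $description"
  } >> "$logfile"
}
```

For example, `log_incident P1 "Unauthorized access to admin API"` appends one entry to `/tmp/incident-log.txt`.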
## Investigation Phase
### Step 1: Identify Scope
```bash
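# Check for unauthorized pods across all namespaces
# (-n and --all-namespaces conflict; --all-namespaces takes precedence, so -n is dropped here)
kubectl get pods --all-namespaces -o wide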
# Check for suspicious services
kubectl get svc -n veza-production
# Check for unauthorized ingress
kubectl get ingress -n veza-production
# Check network policies
kubectl get networkpolicies -n veza-production
```
### Step 2: Review Access Logs
```bash
# Check API access logs
kubectl logs deployment/veza-backend-api -n veza-production | \
grep -i "unauthorized\|forbidden\|failed\|error"
# Check authentication logs
kubectl logs deployment/veza-backend-api -n veza-production | \
grep -i "login\|auth\|token\|jwt"
# Check database access
kubectl logs postgres-pod -n veza-production | \
grep -i "connection\|login\|failed"
```
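The `grep` pipelines above print every matching line, which can be a lot of output during triage. As a rough sketch, the hypothetical function `summarize_log_keywords` below counts how many lines of a saved log file match each keyword (case-insensitive, fixed-string match), so it can run against a file exported during evidence preservation instead of querying the cluster again.

```bash
# Hypothetical helper: prints "<keyword><TAB><matching line count>" for each keyword.
# Usage: summarize_log_keywords <logfile> <keyword>...
summarize_log_keywords() {
  local logfile="$1"; shift
  local kw
  for kw in "$@"; do
    # grep -c prints 0 when nothing matches, so every keyword gets a row.
    printf '%s\t%s\n' "$kw" "$(grep -ciF -- "$kw" "$logfile")"
  done
}
```

For example, `summarize_log_keywords /tmp/incident-logs.log unauthorized forbidden failed` gives a quick view of which patterns deserve a closer look; the file name here is illustrative.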
### Step 3: Check for Data Exfiltration
```bash
# Check database access patterns
kubectl exec -it postgres-pod -n veza-production -- \
psql -U veza_user -d veza_db -c "
SELECT * FROM pg_stat_activity
WHERE state = 'active'
ORDER BY query_start DESC;
"
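# Check for unusually heavy reads (possible bulk exports)
# Counters are cumulative since the last statistics reset, so compare against a known baseline
kubectl exec -it postgres-pod -n veza-production -- \
psql -U veza_user -d veza_db -c "
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_tup_fetch
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC;
"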
```
## Remediation Phase
### Step 1: Revoke Compromised Credentials
```bash
# Revoke JWT secrets
# Update in Vault
vault kv put secret/veza/production/jwt-secret value="$(openssl rand -base64 32)"
# Force External Secrets to sync
kubectl annotate externalsecret veza-secrets \
force-sync=$(date +%s) \
-n veza-production \
--overwrite
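# Restart applications and wait for the rollout to finish
kubectl rollout restart deployment/veza-backend-api -n veza-production
kubectl rollout status deployment/veza-backend-api -n veza-production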
```
### Step 2: Patch Vulnerabilities
```bash
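# Update vulnerable images
# Pin the patched version: re-applying an unchanged tag such as :latest does not trigger a rollout
# PATCHED_TAG must be set to the image tag that contains the fix
kubectl set image deployment/veza-backend-api \
"veza-backend-api=veza-backend-api:${PATCHED_TAG:?set PATCHED_TAG to the patched image tag}" \
-n veza-production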
# Apply security patches
kubectl apply -f k8s/security-patches/ -n veza-production
```
### Step 3: Restore from Clean Backup
If data was compromised:
```bash
# Follow data restore procedure
# See runbooks/data-restore.md
# Restore from backup before incident
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production
# Restore database
# (Follow data-restore.md procedure)
# Restart applications
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
```
### Step 4: Strengthen Security
```bash
# Apply network policies
kubectl apply -f k8s/network-policies/ -n veza-production
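# Enable audit logging
# Note: if audit-policy.yaml is an audit.k8s.io Policy, kubectl cannot install it; it must be
# passed to kube-apiserver with --audit-policy-file (or enabled through the managed-cluster settings)
kubectl apply -f k8s/audit/audit-policy.yaml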
# Update RBAC
kubectl apply -f k8s/rbac/ -n veza-production
# Enforce pod security
# Note: PodSecurityPolicy was removed in Kubernetes 1.25; on newer clusters use Pod Security Admission
kubectl apply -f k8s/pod-security/ -n veza-production
```
## Recovery Phase
### Step 1: Verify System Integrity
```bash
# Check all pods are running
kubectl get pods -n veza-production
# Verify health checks
curl -fsS https://api.veza.com/health
# Check for anomalies
kubectl top pods -n veza-production
```
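Scanning `kubectl get pods` output by eye is error-prone when many pods are listed. As a minimal sketch, the hypothetical filter `list_unhealthy_pods` below reads the default `kubectl get pods --no-headers` columns (NAME, READY, STATUS, ...) from standard input and prints only pods that are neither `Completed` nor `Running` with all containers ready.

```bash
# Hypothetical helper: filters `kubectl get pods --no-headers` output on stdin.
# Prints NAME, READY and STATUS for pods that are not Completed and are either
# not Running or have fewer ready containers than expected.
list_unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Completed" && ($3 != "Running" || ready[1] != ready[2]))
      print $1, $2, $3
  }'
}
```

For example, `kubectl get pods -n veza-production --no-headers | list_unhealthy_pods` prints nothing when every pod is healthy.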
### Step 2: Monitor for Recurrence
```bash
# Set up enhanced monitoring
# (Configure additional alerts)
# Review logs continuously
kubectl logs -f deployment/veza-backend-api -n veza-production
```
### Step 3: Gradual Re-enablement
```bash
# Gradually scale up services
kubectl scale deployment veza-backend-api --replicas=1 -n veza-production
# Monitor for issues
# Wait 15 minutes
# Scale to full capacity
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
```
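The manual sequence above can be sketched as a loop. The hypothetical function `scale_gradually` below scales a deployment through a list of replica counts, waits for each rollout to become ready, and then pauses for an observation window. `SOAK_SECONDS` is an assumed variable name; its default of 900 seconds corresponds to the 15-minute wait above, and the 300-second rollout timeout is likewise an assumption.

```bash
# Hypothetical helper: steps a deployment through the given replica counts.
# Usage: scale_gradually <deployment> <namespace> <replicas>...
scale_gradually() {
  local deployment="$1" namespace="$2"; shift 2
  local replicas
  for replicas in "$@"; do
    # Stop at the first failure so a bad step is not scaled further.
    kubectl scale "deployment/$deployment" --replicas="$replicas" -n "$namespace" || return 1
    kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=300s || return 1
    # Observation window before the next step (assumed default: 15 minutes).
    sleep "${SOAK_SECONDS:-900}"
  done
}
```

For example, `scale_gradually veza-backend-api veza-production 1 3` reproduces the two steps above. The loop does not inspect logs or alerts during the pause, so someone still needs to watch monitoring and abort if problems appear.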
## Post-Incident Tasks
### Immediate (First 24 Hours)
1. **Document Incident**
   - Timeline of events
   - Actions taken
   - Systems affected
   - Data compromised (if any)
2. **Notify Stakeholders**
   - Internal team
   - Management
   - Legal (if required)
   - Customers (if data breach)
3. **Preserve Evidence**
   - Secure all logs
   - Document all actions
   - Maintain chain of custody
### Short Term (First Week)
1. **Root Cause Analysis**
   - Identify vulnerability
   - Determine attack vector
   - Assess impact
2. **Remediation**
   - Patch vulnerabilities
   - Update security policies
   - Implement additional controls
3. **Communication**
   - Internal post-mortem
   - External communication (if needed)
   - Regulatory notifications (if required)
### Long Term (Ongoing)
1. **Prevention**
   - Security training
   - Regular security audits
   - Penetration testing
   - Security monitoring improvements
2. **Documentation**
   - Update security procedures
   - Update incident response plan
   - Document lessons learned
## Verification Checklist
- [ ] Incident contained
- [ ] Evidence preserved
- [ ] Compromised credentials revoked
- [ ] Vulnerabilities patched
- [ ] Systems restored
- [ ] Monitoring enhanced
- [ ] Documentation updated
- [ ] Stakeholders notified
- [ ] Post-mortem scheduled
## Communication Templates
### Internal Notification
```
Subject: [SECURITY INCIDENT] Veza Platform - <Description>
Severity: P0/P1/P2/P3
Status: Contained/Investigating/Resolved
Impact: <Description>
Actions Taken: <List>
Next Steps: <List>
Incident response team is actively working on resolution.
```
### External Notification (if required)
```
Subject: Security Incident Notification
We are writing to inform you of a security incident that may have affected your account.
What happened: <Description>
What we're doing: <Actions>
What you should do: <Recommendations>
Timeline: <Dates>
We take security seriously and are committed to protecting your data.
```
## References
- [Data Restore Runbook](./data-restore.md)
- [Kubernetes Security Best Practices](https://kubernetes.io/docs/concepts/security/)
- [Incident Response Framework](../README.md)