# Security Incident Response Runbook

This runbook describes the procedure for responding to security incidents, including breaches, unauthorized access, and data exfiltration.

## Prerequisites

- Security team contact information
- Incident response team assembled
- Access to logs and monitoring
- Backup and restore procedures ready

## Incident Severity Levels

- **P0 (Critical)**: Active breach, data exfiltration, ransomware
- **P1 (High)**: Unauthorized access, privilege escalation
- **P2 (Medium)**: Suspicious activity, potential vulnerability
- **P3 (Low)**: Security alerts, false positives

## Immediate Response (First 15 Minutes)

### Step 1: Containment

```bash
# Isolate affected systems using the least disruptive option that stops the attack

# Option A: Scale down the affected deployment
# Capture evidence (Step 2) first: kubectl cannot read logs of deleted pods.
# Record the current replica count so it can be restored later.
kubectl get deployment veza-backend-api -n veza-production -o jsonpath='{.spec.replicas}'
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production

# Option B: Block network access
# (enforced only if the cluster's CNI plugin supports NetworkPolicy)
kubectl apply -f k8s/network-policies/block-all.yaml -n veza-production

# Option C: Revoke credentials
# Rotate the values in Vault first (see Remediation Phase, Step 1); otherwise
# the Secret is recreated from the same, compromised values.
kubectl delete secret veza-secrets -n veza-production
# External Secrets then recreates veza-secrets from the new Vault values
```
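The contents of `k8s/network-policies/block-all.yaml` are not shown in this runbook. Assuming its intent is to cut all ingress and egress for every pod in the namespace, a minimal sketch of such a default-deny policy could look like the following; the function name `deny_all_policy` and the policy name `incident-deny-all` are hypothetical.

```bash
# Hypothetical helper: print a default-deny NetworkPolicy for one namespace.
# An empty podSelector matches every pod in the namespace; listing both
# policy types with no rules denies all ingress and all egress (including DNS).
deny_all_policy() {
  local namespace="${1:?usage: deny_all_policy NAMESPACE}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: incident-deny-all
  namespace: ${namespace}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}
```

For example, `deny_all_policy veza-production | kubectl apply -f -`, and later `kubectl delete networkpolicy incident-deny-all -n veza-production` to lift it. NetworkPolicies are additive, so this isolates pods completely only if no other policy in the namespace allows traffic to them.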
### Step 2: Preserve Evidence

```bash
# Use one timestamp and one directory so every artifact from this capture stays together
TS=$(date -u +%Y%m%dT%H%M%SZ)
EVIDENCE_DIR="/tmp/incident-$TS"
mkdir -p "$EVIDENCE_DIR"

# Export logs (deployment/<name> reads from only one of its pods)
kubectl logs deployment/veza-backend-api -n veza-production --all-containers=true > "$EVIDENCE_DIR/logs.log"

# Export events
kubectl get events -n veza-production --sort-by='.lastTimestamp' > "$EVIDENCE_DIR/events.log"

# Export pod configurations
kubectl get pods -n veza-production -o yaml > "$EVIDENCE_DIR/pods.yaml"

# Export network policies
kubectl get networkpolicies -n veza-production -o yaml > "$EVIDENCE_DIR/netpol.yaml"

# /tmp is not durable: copy $EVIDENCE_DIR to secure, access-controlled storage
```
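`kubectl logs deployment/...` reads from only one pod, and plain files give reviewers no way to confirm later that they were not altered. A sketch that captures logs from every pod in the namespace, including the previous container instance where one exists, and records SHA-256 checksums to support chain of custody; the function name `capture_pod_logs` and the file layout are assumptions.

```bash
# Hypothetical helper: save logs of every pod in NAMESPACE into DIR, then
# write SHA-256 checksums of the captured files to DIR/SHA256SUMS.
capture_pod_logs() {
  local namespace="${1:?usage: capture_pod_logs NAMESPACE DIR}"
  local dir="${2:?usage: capture_pod_logs NAMESPACE DIR}"
  local pod name
  mkdir -p "$dir" || return 1

  # "kubectl get pods -o name" prints one "pod/<name>" per line
  for pod in $(kubectl get pods -n "$namespace" -o name); do
    name="${pod#pod/}"
    kubectl logs "$pod" -n "$namespace" --all-containers=true --timestamps \
      > "$dir/$name.log"
    # Previous-instance logs exist only if a container has restarted;
    # when the call fails, delete the empty file it left behind
    kubectl logs "$pod" -n "$namespace" --all-containers=true --timestamps --previous \
      > "$dir/$name.previous.log" 2>/dev/null || rm -f "$dir/$name.previous.log"
  done

  # Checksums let reviewers verify later that the files were not modified
  (cd "$dir" && sha256sum -- *.log > SHA256SUMS)
}
```

For example, `capture_pod_logs veza-production "/tmp/incident-$(date -u +%Y%m%dT%H%M%SZ)"`. Anyone can later re-check the files by running `sha256sum -c SHA256SUMS` inside that directory.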
### Step 3: Notify Team

```bash
# Send an immediate notification through your alerting channels
# (PagerDuty, Slack, email, etc.)

# Start the incident log; UTC timestamps keep the timeline unambiguous
cat >> /tmp/incident-log.txt <<EOF
INCIDENT: $(date -u +%Y-%m-%dT%H:%M:%SZ)
Severity: P0
Description: [Description]
EOF
```
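Free-form `echo` lines work, but stamping each entry with a UTC time and the operator's user name makes the timeline required under Post-Incident Tasks easier to reconstruct. A minimal sketch; `log_incident` and the `INCIDENT_LOG` variable are hypothetical names.

```bash
# Hypothetical helper: append one entry, stamped with UTC time and the
# operator's user name, to the incident log. INCIDENT_LOG defaults to the
# same file used above.
log_incident() {
  local log_file="${INCIDENT_LOG:-/tmp/incident-log.txt}"
  printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$*" >> "$log_file"
}
```

For example, `log_incident "Containment: scaled veza-backend-api to 0 replicas"` after each action taken.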
## Investigation Phase

### Step 1: Identify Scope

```bash
# Check for unauthorized pods in every namespace, newest last
# (-n is ignored when --all-namespaces is set, so it is omitted here)
kubectl get pods --all-namespaces --sort-by=.metadata.creationTimestamp

# Check for suspicious services
kubectl get svc -n veza-production

# Check for unauthorized ingress
kubectl get ingress -n veza-production

# Check network policies
kubectl get networkpolicies -n veza-production
```
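Pods running images the team did not build are a common sign of compromise. A sketch that lists every container image in the cluster and prints the ones not pulled from an expected registry; the function name and the registry prefix you pass (for example `registry.example.com/veza/`) are assumptions, since this runbook does not name the registry.

```bash
# Hypothetical helper: print "namespace/pod image" for every regular container
# whose image does not start with ALLOWED_PREFIX (init containers are skipped).
find_unexpected_images() {
  local allowed_prefix="${1:?usage: find_unexpected_images ALLOWED_PREFIX}"
  # One line per pod: "namespace/pod<TAB>space-separated images"
  kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' |
    awk -F'\t' -v prefix="$allowed_prefix" '{
      n = split($2, images, " ")
      for (i = 1; i <= n; i++)
        if (index(images[i], prefix) != 1) print $1, images[i]
    }'
}
```

System namespaces such as `kube-system` usually pull from other registries, so expect entries there and review them separately.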
### Step 2: Review Access Logs

```bash
# Check API access logs
kubectl logs deployment/veza-backend-api -n veza-production | \
  grep -iE "unauthorized|forbidden|failed|error"

# Check authentication logs
kubectl logs deployment/veza-backend-api -n veza-production | \
  grep -iE "login|auth|token|jwt"

# Check database access (replace postgres-pod with the actual pod name;
# successful connections appear only if log_connections is enabled)
kubectl logs postgres-pod -n veza-production | \
  grep -iE "connection|login|failed"
```
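Raw grep output is hard to scan during an incident. A sketch that buckets matching log lines by hour so spikes in failed authentication stand out; it relies on `kubectl logs --timestamps`, which puts an RFC 3339 timestamp at the start of each line. The function name and the label selector you pass (for example `app=veza-backend-api`) are assumptions about how the pods are labelled. With a selector, `kubectl logs` shows only the last 10 lines per container by default, hence `--tail=-1`.

```bash
# Hypothetical helper: count log lines matching PATTERN (case-insensitive
# extended regex) per hour, across all containers of pods matching SELECTOR.
# Each output line is a right-aligned count followed by "YYYY-MM-DDTHH".
matches_per_hour() {
  local namespace="${1:?usage: matches_per_hour NAMESPACE SELECTOR SINCE PATTERN}"
  local selector="${2:?}" since="${3:?}" pattern="${4:?}"
  # --timestamps puts an RFC 3339 timestamp first on every line;
  # --tail=-1 overrides the 10-line default that applies with a selector
  kubectl logs -n "$namespace" -l "$selector" --all-containers=true \
      --timestamps --since="$since" --tail=-1 |
    grep -iE -- "$pattern" |
    cut -c1-13 |
    sort |
    uniq -c
}
```

For example, `matches_per_hour veza-production app=veza-backend-api 24h 'unauthorized|forbidden|failed'`.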
### Step 3: Check for Data Exfiltration

```bash
# Check active sessions: who is connected, from where, and what is running.
# Query text of other users' sessions is visible only to superusers and
# roles with pg_read_all_stats.
kubectl exec postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT pid, usename, client_addr, application_name, query_start, query
    FROM pg_stat_activity
    WHERE state = 'active'
    ORDER BY query_start DESC;
  "

# Check for large data exports: tables with heavy read activity.
# Counters are cumulative since the last statistics reset; compare with a baseline.
kubectl exec postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT schemaname, relname, seq_scan, seq_tup_read, idx_tup_fetch,
           n_tup_ins, n_tup_upd, n_tup_del
    FROM pg_stat_user_tables
    ORDER BY seq_tup_read DESC;
  "
```
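If the `pg_stat_statements` extension is available (it must be listed in `shared_preload_libraries` and created in the database with `CREATE EXTENSION`), it shows which statements returned the most rows, a more direct signal of bulk reads than per-table counters. A sketch, assuming the same pod name and credentials as above; the function name is hypothetical, and the same visibility rule applies: other roles' query text needs superuser or `pg_read_all_stats`.

```bash
# Hypothetical helper: list the statements that returned the most rows,
# using pg_stat_statements. POD and NAMESPACE mirror the placeholders above.
top_statements_by_rows() {
  local pod="${1:?usage: top_statements_by_rows POD NAMESPACE [LIMIT]}"
  local namespace="${2:?}" limit="${3:-20}"
  # Accept only digits so LIMIT cannot smuggle SQL into the query
  case "$limit" in
    *[!0-9]*) echo "LIMIT must be an integer" >&2; return 2 ;;
  esac
  kubectl exec "$pod" -n "$namespace" -- \
    psql -U veza_user -d veza_db -c "
      SELECT userid::regrole AS role, calls, rows, left(query, 200) AS query
      FROM pg_stat_statements
      ORDER BY rows DESC
      LIMIT ${limit};
    "
}
```

For example, `top_statements_by_rows postgres-pod veza-production 10`. A statement from an unfamiliar role with a very high `rows` total is worth checking against the active sessions listed above.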
## Remediation Phase

### Step 1: Revoke Compromised Credentials

```bash
# Rotate the JWT signing secret in Vault. Tokens signed with the old secret
# will no longer validate, so every user must re-authenticate.
vault kv put secret/veza/production/jwt-secret value="$(openssl rand -base64 32)"

# Repeat for any other compromised credential (database, API keys, etc.)

# Force External Secrets to sync
kubectl annotate externalsecret veza-secrets \
  force-sync="$(date +%s)" \
  -n veza-production \
  --overwrite

# Restart applications and wait for the rollout to finish
kubectl rollout restart deployment/veza-backend-api -n veza-production
kubectl rollout status deployment/veza-backend-api -n veza-production
```
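`kubectl annotate` only asks External Secrets to refresh; restarting before the new value has been written would bring pods back with the old secret. A sketch that fingerprints the Secret's data and waits for it to change before the restart; the function names, the 120-second default timeout, and the `POLL_INTERVAL` variable are assumptions.

```bash
# Hypothetical helpers for confirming that a secret rotation has landed.

# Print a SHA-256 fingerprint of the Secret's data; the (base64) values
# are piped straight into sha256sum and never printed.
secret_fingerprint() {
  kubectl get secret "$1" -n "$2" -o jsonpath='{.data}' | sha256sum | cut -d' ' -f1
}

# Wait until the fingerprint differs from BEFORE; fail after TIMEOUT seconds.
wait_for_secret_change() {
  local secret="${1:?usage: wait_for_secret_change SECRET NAMESPACE BEFORE [TIMEOUT]}"
  local namespace="${2:?}" before="${3:?}"
  local timeout="${4:-120}" interval="${POLL_INTERVAL:-5}"
  local deadline=$((SECONDS + timeout))
  while [ "$(secret_fingerprint "$secret" "$namespace")" = "$before" ]; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "secret $secret unchanged after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

Take `before=$(secret_fingerprint veza-secrets veza-production)` before running `vault kv put`; after the annotate, run `wait_for_secret_change veza-secrets veza-production "$before"` and restart only if it succeeds.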
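### Step 2: Patch Vulnerabilities

```bash
# Update vulnerable images to an explicit, patched version.
# Avoid :latest: it does not identify which build is running, and if the
# deployment already uses :latest, setting it again changes nothing and
# triggers no rollout.
kubectl set image deployment/veza-backend-api \
  veza-backend-api=veza-backend-api:<patched-tag> \
  -n veza-production

# Apply security patches
kubectl apply -f k8s/security-patches/ -n veza-production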
```
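After `kubectl set image`, it is worth confirming that every running container actually reports the patched image, identified by digest rather than by tag. A sketch that reads `.status.containerStatuses[*].imageID` from each pod; it assumes the runtime reports image IDs ending in `@sha256:<digest>` (typical, but the prefix varies by runtime), and the function name and label selector are hypothetical.

```bash
# Hypothetical helper: succeed only if every container in pods matching
# SELECTOR reports an imageID ending in "@EXPECTED_DIGEST" (e.g. "sha256:<hex>").
all_pods_on_digest() {
  local namespace="${1:?usage: all_pods_on_digest NAMESPACE SELECTOR EXPECTED_DIGEST}"
  local selector="${2:?}" expected="${3:?}"
  local ids id found=0
  ids="$(kubectl get pods -n "$namespace" -l "$selector" \
    -o jsonpath='{range .items[*]}{range .status.containerStatuses[*]}{.imageID}{"\n"}{end}{end}')" || return 1
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    found=1
    case "$id" in
      *"@$expected") ;;
      *) echo "unexpected image: $id" >&2; return 1 ;;
    esac
  done <<< "$ids"
  # Fail if no containers were found at all
  [ "$found" -eq 1 ]
}
```

Run it after `kubectl rollout status` completes, otherwise terminating pods from the old ReplicaSet are still counted. Because it checks every container, pods with sidecars built from other images will fail the check.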
### Step 3: Restore from Clean Backup

If data was altered or destroyed (a restore does not undo exfiltration):

```bash
# Follow the data restore procedure in runbooks/data-restore.md, using a
# backup taken before the compromise began (which may predate its detection)

# Stop the application so nothing writes during the restore
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production

# Restore the database
# (follow the data-restore.md procedure)

# Restart applications
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
```
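The restart above hard-codes three replicas, which may not match what was running before the incident. A sketch that records the replica count before scaling to zero and restores exactly that number afterwards; the function names and the state-file location are assumptions.

```bash
# Hypothetical helpers: remember a deployment's replica count in STATE_FILE
# before scaling it to zero, and scale back to that count later.
scale_down_saving_replicas() {
  local deploy="${1:?usage: scale_down_saving_replicas DEPLOYMENT NAMESPACE STATE_FILE}"
  local namespace="${2:?}" state_file="${3:?}" replicas
  replicas="$(kubectl get deployment "$deploy" -n "$namespace" -o jsonpath='{.spec.replicas}')" || return 1
  # Refuse to continue if the value read back is not a number
  case "$replicas" in
    ''|*[!0-9]*) echo "could not read replicas for $deploy" >&2; return 1 ;;
  esac
  echo "$replicas" > "$state_file" || return 1
  kubectl scale deployment "$deploy" --replicas=0 -n "$namespace"
}

restore_saved_replicas() {
  local deploy="${1:?usage: restore_saved_replicas DEPLOYMENT NAMESPACE STATE_FILE}"
  local namespace="${2:?}" state_file="${3:?}" replicas
  replicas="$(cat "$state_file")" || return 1
  kubectl scale deployment "$deploy" --replicas="$replicas" -n "$namespace"
}
```

For example, `scale_down_saving_replicas veza-backend-api veza-production /tmp/veza-backend-api.replicas` before the restore, and `restore_saved_replicas` with the same arguments after it.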
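### Step 4: Strengthen Security

```bash
# Apply network policies
kubectl apply -f k8s/network-policies/ -n veza-production

# Enable audit logging
# An audit policy is not a cluster resource that kubectl can apply: it is
# passed to kube-apiserver with --audit-policy-file. On managed clusters,
# enable audit logs through the provider instead.

# Update RBAC
kubectl apply -f k8s/rbac/ -n veza-production

# Enforce Pod Security Standards. PodSecurityPolicy was removed in
# Kubernetes 1.25; Pod Security Admission uses namespace labels instead.
# Check first with --dry-run=server, which warns about violating pods.
kubectl label namespace veza-production \
  pod-security.kubernetes.io/enforce=restricted --overwrite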
```
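Switching a namespace to the `restricted` Pod Security Standard can stop existing workloads from being recreated. The Kubernetes documentation suggests checking first with a server-side dry run, which returns warnings for pods that would violate the new level without changing the namespace. A sketch of that check-then-apply flow; the function name and the confirmation prompt are assumptions.

```bash
# Hypothetical helper: show which running pods would violate LEVEL
# (privileged, baseline or restricted), then enforce it only if confirmed.
enforce_pod_security() {
  local namespace="${1:?usage: enforce_pod_security NAMESPACE LEVEL}"
  local level="${2:?}" answer
  case "$level" in
    privileged|baseline|restricted) ;;
    *) echo "unknown level: $level" >&2; return 2 ;;
  esac
  # Server-side dry run: the API server evaluates existing pods against the
  # new level and prints warnings, but does not change the namespace
  kubectl label --dry-run=server --overwrite namespace "$namespace" \
    "pod-security.kubernetes.io/enforce=$level" || return 1
  read -r -p "Enforce '$level' on $namespace? [y/N] " answer
  [ "$answer" = "y" ] || { echo "not applied" >&2; return 1; }
  kubectl label --overwrite namespace "$namespace" \
    "pod-security.kubernetes.io/enforce=$level"
}
```

For example, `enforce_pod_security veza-production restricted`. If the dry run prints warnings, fix those workloads or choose `baseline` before enforcing.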
## Recovery Phase

### Step 1: Verify System Integrity

```bash
# Check all pods are running
kubectl get pods -n veza-production

# Verify health checks (-f makes curl exit non-zero on HTTP errors)
curl -fsS https://api.veza.com/health

# Check for anomalous resource usage (requires metrics-server)
kubectl top pods -n veza-production
```
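The checks above rely on reading output by eye. A sketch that turns the first two into a pass/fail gate: it waits for every pod in the namespace to report Ready and requires the health endpoint to respond without an HTTP error (`curl -f` fails on status 400 and above). The function name and default timeouts are assumptions.

```bash
# Hypothetical helper: return non-zero unless every pod in NAMESPACE becomes
# Ready within TIMEOUT and HEALTH_URL answers without an HTTP error.
verify_integrity() {
  local namespace="${1:?usage: verify_integrity NAMESPACE HEALTH_URL [TIMEOUT]}"
  local health_url="${2:?}" timeout="${3:-120s}"
  if ! kubectl wait --for=condition=Ready pod --all -n "$namespace" --timeout="$timeout"; then
    echo "pods in $namespace not Ready within $timeout" >&2
    return 1
  fi
  # -f: non-zero exit on HTTP status >= 400; -sS: no progress bar, errors shown;
  # --max-time: cap how long a hung endpoint can block the check
  if ! curl -fsS --max-time 10 "$health_url" > /dev/null; then
    echo "health check failed: $health_url" >&2
    return 1
  fi
  echo "integrity checks passed for $namespace"
}
```

For example, `verify_integrity veza-production https://api.veza.com/health`. Pods that have run to completion (for example, from Jobs) never become Ready, so `--all` will time out in namespaces that keep them around.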
### Step 2: Monitor for Recurrence

```bash
# Set up enhanced monitoring
# (configure additional alerts)

# Review logs continuously (follows a single pod of the deployment)
kubectl logs -f deployment/veza-backend-api -n veza-production
```
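`kubectl logs -f deployment/...` follows a single pod. A sketch that follows every pod matching a label selector and prints only lines matching a pattern, each tagged with its source pod and container; the function name is hypothetical. With a selector, `kubectl logs -f` follows at most five streams unless `--max-log-requests` is raised.

```bash
# Hypothetical helper: follow logs from all pods matching SELECTOR and print
# only lines matching PATTERN (case-insensitive extended regex).
watch_suspicious() {
  local namespace="${1:?usage: watch_suspicious NAMESPACE SELECTOR PATTERN}"
  local selector="${2:?}" pattern="${3:?}"
  # --prefix tags each line with [pod/<name>/<container>];
  # --tail=0 skips existing lines so only new output is shown
  kubectl logs -f -n "$namespace" -l "$selector" --all-containers=true \
      --prefix --tail=0 --max-log-requests=20 |
    grep --line-buffered -iE -- "$pattern"
}
```

For example, `watch_suspicious veza-production app=veza-backend-api 'unauthorized|forbidden|invalid token'`; the `app=veza-backend-api` label is an assumption about how the pods are labelled. The limit of 20 concurrent streams is arbitrary; raise it if the selector matches more containers.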
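### Step 3: Gradual Re-enablement

```bash
# If Option B was used during containment, lift the network block first
kubectl delete -f k8s/network-policies/block-all.yaml -n veza-production

# Gradually scale up services, starting with one replica
kubectl scale deployment veza-backend-api --replicas=1 -n veza-production
kubectl rollout status deployment/veza-backend-api -n veza-production

# Monitor for issues for 15 minutes before continuing

# Scale to full capacity
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production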
```
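The manual sequence above (one replica, wait, then full capacity) can be scripted so that each step waits for the new replicas to become available and then holds for an observation period. A sketch that adds one replica at a time; the function name and the `SOAK_SECONDS` variable are assumptions, and its 900-second default simply mirrors the 15-minute wait suggested above.

```bash
# Hypothetical helper: scale DEPLOYMENT up one replica at a time to TARGET,
# waiting for each step to roll out and then observing for SOAK_SECONDS
# (default 900, i.e. 15 minutes) before the next step.
gradual_scale_up() {
  local deploy="${1:?usage: gradual_scale_up DEPLOYMENT NAMESPACE TARGET}"
  local namespace="${2:?}" target="${3:?}" soak="${SOAK_SECONDS:-900}"
  local replicas
  for ((replicas = 1; replicas <= target; replicas++)); do
    kubectl scale deployment "$deploy" --replicas="$replicas" -n "$namespace" || return 1
    # rollout status waits until the scaled replicas are available
    kubectl rollout status deployment/"$deploy" -n "$namespace" --timeout=5m || return 1
    if [ "$replicas" -lt "$target" ]; then
      echo "$deploy at $replicas/$target replicas; observing for ${soak}s" >&2
      sleep "$soak"
    fi
  done
}
```

For example, `gradual_scale_up veza-backend-api veza-production 3`. Watch the monitoring from Step 2 during each observation window and interrupt the function (Ctrl-C) if anything looks wrong.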
## Post-Incident Tasks

### Immediate (First 24 Hours)

1. **Document Incident**
   - Timeline of events
   - Actions taken
   - Systems affected
   - Data compromised (if any)

2. **Notify Stakeholders**
   - Internal team
   - Management
   - Legal (if required)
   - Customers (if data breach)

3. **Preserve Evidence**
   - Secure all logs
   - Document all actions
   - Maintain chain of custody

### Short Term (First Week)

1. **Root Cause Analysis**
   - Identify vulnerability
   - Determine attack vector
   - Assess impact

2. **Remediation**
   - Patch vulnerabilities
   - Update security policies
   - Implement additional controls

3. **Communication**
   - Internal post-mortem
   - External communication (if needed)
   - Regulatory notifications (if required)

### Long Term (Ongoing)

1. **Prevention**
   - Security training
   - Regular security audits
   - Penetration testing
   - Security monitoring improvements

2. **Documentation**
   - Update security procedures
   - Update incident response plan
   - Document lessons learned
## Verification Checklist

- [ ] Incident contained
- [ ] Evidence preserved
- [ ] Compromised credentials revoked
- [ ] Vulnerabilities patched
- [ ] Systems restored
- [ ] Monitoring enhanced
- [ ] Documentation updated
- [ ] Stakeholders notified
- [ ] Post-mortem scheduled

## Communication Templates

### Internal Notification
```
Subject: [SECURITY INCIDENT] Veza Platform - <Description>

Severity: P0/P1/P2
Status: Contained/Investigating/Resolved
Impact: <Description>
Actions Taken: <List>
Next Steps: <List>

The incident response team is actively working on a resolution.
```

### External Notification (if required)

```
Subject: Security Incident Notification

We are writing to inform you of a security incident that may have affected your account.

What happened: <Description>
What we're doing: <Actions>
What you should do: <Recommendations>
Timeline: <Dates>

We take security seriously and are committed to protecting your data.
```
## References

- [Data Restore Runbook](./data-restore.md)
- [Kubernetes Security Best Practices](https://kubernetes.io/docs/concepts/security/)
- [Incident Response Framework](../README.md)