
# Security Incident Response Runbook

This runbook describes the procedure for responding to security incidents, including breaches, unauthorized access, and data exfiltration.

## Prerequisites

- Security team contact information
- Incident response team assembled
- Access to logs and monitoring
- Backup and restore procedures ready

## Incident Severity Levels

- **P0 (Critical)**: Active breach, data exfiltration, ransomware
- **P1 (High)**: Unauthorized access, privilege escalation
- **P2 (Medium)**: Suspicious activity, potential vulnerability
- **P3 (Low)**: Security alerts, false positives

## Immediate Response (First 15 Minutes)

### Step 1: Containment

```bash
# Isolate affected systems
# Option A: Scale down affected deployment
# Note: this deletes the pods, so capture their logs first (Step 2) if they are needed as evidence
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production

# Option B: Block network access
kubectl apply -f k8s/network-policies/block-all.yaml -n veza-production

# Option C: Revoke credentials
# Delete the compromised secret (running pods keep values they already loaded until they are restarted)
kubectl delete secret veza-secrets -n veza-production
# Restore from Vault with new credentials
```
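
The contents of `k8s/network-policies/block-all.yaml` are not shown in this runbook. As a rough sketch of what such a policy might contain, the helper below prints a default-deny NetworkPolicy for a given namespace. The function name `print_deny_all_policy` and the policy name `deny-all` are illustrative assumptions, not the project's actual manifest.

```bash
# Sketch only: prints a default-deny NetworkPolicy manifest to stdout.
# The policy name "deny-all" is an assumption; the real block-all.yaml may differ.
print_deny_all_policy() {
  local namespace="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: ${namespace}
spec:
  podSelector: {}      # empty selector matches every pod in the namespace
  policyTypes:         # no ingress or egress rules are listed, so all traffic is denied
    - Ingress
    - Egress
EOF
}
```

Piping the output into `kubectl apply -f -` would apply it, for example `print_deny_all_policy veza-production | kubectl apply -f -`. Enforcement depends on the cluster's network plugin supporting NetworkPolicy.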

### Step 2: Preserve Evidence

```bash
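# Export logs (for a deployment, kubectl reads from a single pod; repeat per pod if several replicas are running)
kubectl logs deployment/veza-backend-api -n veza-production --all-containers --timestamps > /tmp/incident-logs-$(date +%s).log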

# Export events
kubectl get events -n veza-production --sort-by='.lastTimestamp' > /tmp/incident-events-$(date +%s).log

# Export pod configurations
kubectl get pods -n veza-production -o yaml > /tmp/incident-pods-$(date +%s).yaml

# Export network policies
kubectl get networkpolicies -n veza-production -o yaml > /tmp/incident-netpol-$(date +%s).yaml
```
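
Because each command above computes its own timestamp, the exported files can end up with slightly different suffixes. The sketch below is one possible way to gather the same four exports into a single directory and record SHA-256 checksums, which can help later with chain of custody. The function name `collect_evidence`, the directory layout, and the use of `sha256sum` (GNU coreutils) are assumptions.

```bash
# Sketch: collect evidence under one shared timestamp and checksum the files.
# Directory layout and file names are assumptions, not an established convention.
collect_evidence() {
  local namespace="$1" deployment="$2" base_dir="${3:-/tmp}"
  local stamp out_dir
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  out_dir="${base_dir}/incident-${stamp}"
  mkdir -p "$out_dir" || return 1

  # Same exports as the manual steps, written into one directory.
  kubectl logs "deployment/${deployment}" -n "$namespace" > "${out_dir}/logs.log"
  kubectl get events -n "$namespace" --sort-by='.lastTimestamp' > "${out_dir}/events.log"
  kubectl get pods -n "$namespace" -o yaml > "${out_dir}/pods.yaml"
  kubectl get networkpolicies -n "$namespace" -o yaml > "${out_dir}/netpol.yaml"

  # Record checksums so later modification of any file can be detected.
  ( cd "$out_dir" && sha256sum logs.log events.log pods.yaml netpol.yaml > SHA256SUMS ) || return 1

  # Print the directory so the caller can archive it.
  echo "$out_dir"
}
```

A call such as `collect_evidence veza-production veza-backend-api` prints the evidence directory; running `sha256sum -c SHA256SUMS` inside it later checks that nothing has changed.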

### Step 3: Notify Team

```bash
# Send immediate notification
# (Use your notification system)
# PagerDuty, Slack, Email, etc.

# Document incident
echo "INCIDENT: $(date)" >> /tmp/incident-log.txt
echo "Severity: P0" >> /tmp/incident-log.txt
echo "Description: [Description]" >> /tmp/incident-log.txt
```
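
The `echo` lines above can be wrapped in a small helper so that every entry carries a UTC timestamp and a valid severity. This is a minimal sketch; the function name `log_incident`, the default log path, and the line format are assumptions.

```bash
# Sketch: append one timestamped, structured entry to an incident log.
# The default path and the line format are assumptions.
log_incident() {
  local severity="$1" description="$2" log_file="${3:-/tmp/incident-log.txt}"

  # Accept only the severity levels defined in this runbook.
  case "$severity" in
    P0|P1|P2|P3) ;;
    *) echo "unknown severity: $severity" >&2; return 1 ;;
  esac

  printf '%s severity=%s user=%s description="%s"\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$severity" "${USER:-unknown}" "$description" \
    >> "$log_file"
}
```

For example, `log_incident P0 "Active breach on veza-backend-api"` appends one line to `/tmp/incident-log.txt`. Since `/tmp` is usually cleared on reboot, the log should be copied to durable storage as soon as practical.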

## Investigation Phase

### Step 1: Identify Scope

```bash
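# Check for unauthorized pods in every namespace, not only veza-production
# (-n and --all-namespaces conflict, so only --all-namespaces is used here)
kubectl get pods --all-namespaces -o wide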

# Check for suspicious services
kubectl get svc -n veza-production

# Check for unauthorized ingress
kubectl get ingress -n veza-production

# Check network policies
kubectl get networkpolicies -n veza-production
```
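
Scanning a pod list by eye is error-prone, so one option is to list each pod's images and flag any that do not match an expected prefix. The filter below is a sketch under assumptions: the function name `find_unexpected_images` and the idea that legitimate images share a common prefix (such as `veza-`) are illustrative and may not match the real registry layout.

```bash
# Sketch: flag container images that do not start with an expected prefix.
# stdin: one line per pod, "<pod-name><TAB><image> [<image> ...]"
# $1:    allowed image prefix (an assumption about how images are named)
find_unexpected_images() {
  awk -v prefix="$1" -F '\t' '{
    n = split($2, images, " ")
    for (i = 1; i <= n; i++)
      if (index(images[i], prefix) != 1)   # image does not begin with the prefix
        print $1 "\t" images[i]
  }'
}
```

The input can be produced with `kubectl get pods -n veza-production -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'` and piped into `find_unexpected_images "veza-"`. Init containers are not covered by this jsonpath expression.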

### Step 2: Review Access Logs

```bash
# Check API access logs
kubectl logs deployment/veza-backend-api -n veza-production | \
  grep -i "unauthorized\|forbidden\|failed\|error"

# Check authentication logs
kubectl logs deployment/veza-backend-api -n veza-production | \
  grep -i "login\|auth\|token\|jwt"

# Check database access
kubectl logs postgres-pod -n veza-production | \
  grep -i "connection\|login\|failed"
```
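
Raw `grep` output can be long during an incident. As a rough first pass, the sketch below counts how many log lines mention each failure keyword, which may help decide where to look first. The function name `summarize_auth_failures` and the keyword list are assumptions based on the patterns used above, and no particular log format is assumed.

```bash
# Sketch: count log lines per failure keyword (case-insensitive).
# Reads log lines on stdin and prints "<count> <keyword>" for each keyword.
summarize_auth_failures() {
  local input keyword
  input="$(cat)"
  for keyword in unauthorized forbidden failed; do
    # grep -c prints the number of matching lines, including 0.
    printf '%s %s\n' "$(printf '%s\n' "$input" | grep -ci "$keyword")" "$keyword"
  done
}
```

Usage would look like `kubectl logs deployment/veza-backend-api -n veza-production | summarize_auth_failures`.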

### Step 3: Check for Data Exfiltration

```bash
# Check database access patterns
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT * FROM pg_stat_activity
    WHERE state = 'active'
    ORDER BY query_start DESC;
  "

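# Check for large data exports (tables with unusually high read counts)
# pg_stat_user_tables names the table column relname, and reads show up in seq_tup_read / idx_tup_fetch
kubectl exec -it postgres-pod -n veza-production -- \
  psql -U veza_user -d veza_db -c "
    SELECT schemaname, relname, seq_tup_read, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del
    FROM pg_stat_user_tables
    ORDER BY seq_tup_read DESC;
  "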
```

## Remediation Phase

### Step 1: Revoke Compromised Credentials

```bash
# Revoke JWT secrets
# Update in Vault
vault kv put secret/veza/production/jwt-secret value="$(openssl rand -base64 32)"

# Force External Secrets to sync
kubectl annotate externalsecret veza-secrets \
  force-sync=$(date +%s) \
  -n veza-production \
  --overwrite

# Restart applications
kubectl rollout restart deployment/veza-backend-api -n veza-production
```

### Step 2: Patch Vulnerabilities

```bash
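# Update vulnerable images to the patched build
# Set PATCHED_TAG to an immutable version tag; if the deployment already runs :latest,
# setting :latest again changes nothing and does not trigger a rollout
kubectl set image deployment/veza-backend-api \
  veza-backend-api=veza-backend-api:"${PATCHED_TAG:?set PATCHED_TAG to the patched image tag}" \
  -n veza-production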

# Apply security patches
kubectl apply -f k8s/security-patches/ -n veza-production
```
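
Mutable tags such as `latest` make it hard to tell which build is actually running after a patch. The check below is a minimal sketch that treats an image reference as pinned only if it carries a digest or an explicit tag other than `latest`. The function name `is_pinned_image` is hypothetical.

```bash
# Sketch: succeed (exit 0) only if an image reference is pinned.
# "Pinned" here means: has an @sha256 digest, or an explicit tag other than "latest".
is_pinned_image() {
  local ref="$1" last tag
  case "$ref" in
    *@sha256:*) return 0 ;;      # digest-pinned reference
  esac
  last="${ref##*/}"              # drop registry/repository path (may contain host:port)
  case "$last" in
    *:*) tag="${last##*:}" ;;    # text after the final colon is the tag
    *)   return 1 ;;             # no tag at all means an implicit "latest"
  esac
  [ "$tag" != "latest" ]
}
```

For instance, `is_pinned_image "veza-backend-api:latest" || echo "not pinned"` prints `not pinned`.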

### Step 3: Restore from Clean Backup

If data was compromised:

```bash
# Follow data restore procedure
# See runbooks/data-restore.md

# Restore from backup before incident
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production

# Restore database
# (Follow data-restore.md procedure)

# Restart applications
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
```

### Step 4: Strengthen Security

```bash
# Apply network policies
kubectl apply -f k8s/network-policies/ -n veza-production

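# Enable audit logging
# Note: a native Kubernetes audit policy (audit.k8s.io/v1 Policy) is not an API object; it is a file
# passed to kube-apiserver via --audit-policy-file. Use kubectl apply only if this file contains API objects.
kubectl apply -f k8s/audit/audit-policy.yaml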

# Update RBAC
kubectl apply -f k8s/rbac/ -n veza-production

# Enforce pod security
# (PodSecurityPolicy was removed in Kubernetes 1.25; on newer clusters use Pod Security Admission instead)
kubectl apply -f k8s/pod-security/ -n veza-production
```

## Recovery Phase

### Step 1: Verify System Integrity

```bash
# Check all pods are running
kubectl get pods -n veza-production

# Verify health checks
# -f makes curl exit non-zero on an HTTP error status
curl -fsS https://api.veza.com/health

# Check for anomalies
kubectl top pods -n veza-production
```
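
To turn the pod listing into a single pass/fail number, the output of `kubectl get pods --no-headers` can be filtered. The sketch below counts pods that are neither `Completed` nor fully ready and `Running`. The function name `count_unhealthy_pods` is hypothetical, and the parsing assumes the default column order (NAME, READY, STATUS, ...).

```bash
# Sketch: count pods that look unhealthy in `kubectl get pods --no-headers` output.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
count_unhealthy_pods() {
  awk '{
    split($2, ready, "/")                  # READY column looks like "1/1"
    if ($3 == "Completed") next            # finished jobs are not a problem
    if ($3 != "Running" || ready[1] != ready[2]) n++
  } END { print n + 0 }'                   # n + 0 prints 0 when nothing matched
}
```

Usage would be `kubectl get pods -n veza-production --no-headers | count_unhealthy_pods`, where a result of `0` suggests every pod is running and ready.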

### Step 2: Monitor for Recurrence

```bash
# Set up enhanced monitoring
# (Configure additional alerts)

# Review logs continuously
kubectl logs -f deployment/veza-backend-api -n veza-production
```

### Step 3: Gradual Re-enablement

```bash
# Gradually scale up services
kubectl scale deployment veza-backend-api --replicas=1 -n veza-production

# Monitor for issues
# Wait 15 minutes

# Scale to full capacity
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
```
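
The manual sequence above can be sketched as one function that only scales to full capacity if the health check still passes after the waiting period. The function name `gradual_scale_up`, the 300-second rollout timeout, and the 10-second curl timeout are assumptions; the default wait of 900 seconds mirrors the 15 minutes mentioned above.

```bash
# Sketch: scale to one replica, wait, verify health, then scale to full capacity.
# Timeouts are assumptions; the soak time defaults to 15 minutes as in the manual step.
gradual_scale_up() {
  local namespace="$1" deployment="$2" full_replicas="$3" health_url="$4"
  local soak_seconds="${5:-900}"

  kubectl scale "deployment/${deployment}" --replicas=1 -n "$namespace" || return 1
  kubectl rollout status "deployment/${deployment}" -n "$namespace" --timeout=300s || return 1

  # Let the single replica take traffic before deciding to scale further.
  sleep "$soak_seconds"

  if ! curl -fsS --max-time 10 "$health_url" > /dev/null; then
    echo "health check failed; staying at 1 replica" >&2
    return 1
  fi

  kubectl scale "deployment/${deployment}" --replicas="$full_replicas" -n "$namespace"
}
```

An invocation matching the values in this runbook would be `gradual_scale_up veza-production veza-backend-api 3 https://api.veza.com/health`. A single health probe is a weak signal, so logs and alerts should still be watched during the wait.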

## Post-Incident Tasks

### Immediate (First 24 Hours)

1. **Document Incident**
   - Timeline of events
   - Actions taken
   - Systems affected
   - Data compromised (if any)

2. **Notify Stakeholders**
   - Internal team
   - Management
   - Legal (if required)
   - Customers (if data breach)

3. **Preserve Evidence**
   - Secure all logs
   - Document all actions
   - Maintain chain of custody

### Short Term (First Week)

1. **Root Cause Analysis**
   - Identify vulnerability
   - Determine attack vector
   - Assess impact

2. **Remediation**
   - Patch vulnerabilities
   - Update security policies
   - Implement additional controls

3. **Communication**
   - Internal post-mortem
   - External communication (if needed)
   - Regulatory notifications (if required)

### Long Term (Ongoing)

1. **Prevention**
   - Security training
   - Regular security audits
   - Penetration testing
   - Security monitoring improvements

2. **Documentation**
   - Update security procedures
   - Update incident response plan
   - Document lessons learned

## Verification Checklist

- [ ] Incident contained
- [ ] Evidence preserved
- [ ] Compromised credentials revoked
- [ ] Vulnerabilities patched
- [ ] Systems restored
- [ ] Monitoring enhanced
- [ ] Documentation updated
- [ ] Stakeholders notified
- [ ] Post-mortem scheduled

## Communication Templates

### Internal Notification

```
Subject: [SECURITY INCIDENT] Veza Platform - <Description>

Severity: P0/P1/P2/P3
Status: Contained/Investigating/Resolved
Impact: <Description>
Actions Taken: <List>
Next Steps: <List>

Incident response team is actively working on resolution.
```

### External Notification (if required)

```
Subject: Security Incident Notification

We are writing to inform you of a security incident that may have affected your account.

What happened: <Description>
What we're doing: <Actions>
What you should do: <Recommendations>
Timeline: <Dates>

We take security seriously and are committed to protecting your data.
```

## References

- [Data Restore Runbook](./data-restore.md)
- [Kubernetes Security Best Practices](https://kubernetes.io/docs/concepts/security/)
- [Incident Response Framework](../README.md)