# Auto-Scaling Configuration for Veza Platform

This directory contains configurations for automatic scaling of the Veza platform based on load, including the Horizontal Pod Autoscaler (HPA), the Vertical Pod Autoscaler (VPA), and the Cluster Autoscaler.

## Overview

Veza uses multiple layers of auto-scaling:

1. **Horizontal Pod Autoscaler (HPA)**: Scales pods based on CPU, memory, and custom metrics
2. **Vertical Pod Autoscaler (VPA)**: Adjusts resource requests and limits automatically
3. **Cluster Autoscaler**: Scales cluster nodes based on pod scheduling requirements

## Architecture

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                     Metrics Sources                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────┐   │
│  │Prometheus│  │ Metrics  │  │ Custom   │  │External│   │
│  │          │  │   API    │  │ Metrics  │  │Metrics │   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └───┬────┘   │
└───────┼─────────────┼─────────────┼────────────┼────────┘
        │             │             │            │
        └─────────────┴─────────────┴────────────┘
                      │
        ┌─────────────▼─────────────┐
        │ Horizontal Pod Autoscaler │
        │           (HPA)           │
        └─────────────┬─────────────┘
                      │
        ┌─────────────▼─────────────┐
        │      Kubernetes API       │
        └─────────────┬─────────────┘
                      │
        ┌─────────────▼─────────────┐
        │        Deployment         │
        │   (Scale Pods Up/Down)    │
        └───────────────────────────┘
```

## Components

### 1. Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of pods in a deployment based on observed metrics.

**Scaling Triggers**:
- CPU utilization
- Memory utilization
- Custom metrics (request rate, queue depth, etc.)
- External metrics (from Prometheus, etc.)

**Scaling Behavior**:
- **Scale Up**: Aggressive scaling when load increases
- **Scale Down**: Conservative scaling to prevent thrashing
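
Both directions start from the same calculation: the HPA computes `desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)` per metric, and the scale-up/scale-down policies only cap how fast that result is applied. A quick sketch with illustrative numbers (the utilization values are assumptions, not Veza measurements):

```shell
#!/bin/sh
# HPA core formula: desiredReplicas = ceil(currentReplicas * currentValue / targetValue)
# Integer ceiling division: (a + b - 1) / b
current_replicas=4
current_cpu=90   # observed average utilization (%), illustrative
target_cpu=70    # averageUtilization target from the HPA spec
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"  # 6: CPU is ~29% over target, so the HPA asks for 2 more pods
```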

### 2. Vertical Pod Autoscaler (VPA)

VPA automatically adjusts resource requests and limits for pods based on historical usage.

**Modes**:
- **Off**: Compute recommendations only; do not apply them
- **Initial**: Set resources on pod creation
- **Auto**: Update resources on pod restart
- **Recreate**: Restart pods to apply new resources

### 3. Cluster Autoscaler

Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster by adding or removing nodes.

**Triggers**:
- Pods that cannot be scheduled due to insufficient resources
- Nodes that are underutilized for extended periods
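
Scale-down evicts the pods on the node being removed. For a pod that must not be moved, the standard `safe-to-evict` annotation tells the Cluster Autoscaler to leave its node alone; the pod name below is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: veza-chat-session-holder   # illustrative pod name
  annotations:
    # Cluster Autoscaler will not remove a node running this pod
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```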

## Configuration

### Prerequisites

#### Enable Metrics Server

```bash
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get deployment metrics-server -n kube-system
```

#### Enable Custom Metrics (Optional)

For custom metrics, install Prometheus Adapter:

```bash
# Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter
```
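
The adapter only exposes what its rules tell it to. A minimal sketch of a rules entry that would turn a counter such as `http_requests_total` into the `http_requests_per_second` pods metric used later in this document — the series name and label layout are assumptions about Veza's instrumentation:

```yaml
# values fragment for the prometheus-adapter Helm chart (sketch)
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```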

### Horizontal Pod Autoscaler

#### Backend API HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
        - type: Pods
          value: 1
          periodSeconds: 60
      selectPolicy: Min
```

#### Frontend HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-frontend-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

#### Chat Server HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-chat-server-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-chat-server
  minReplicas: 2
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

### Custom Metrics HPA

For scaling based on application-specific metrics:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa-custom
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    - type: Object
      object:
        metric:
          name: queue_depth
        describedObject:
          apiVersion: v1
          kind: Service
          name: veza-backend-api
        target:
          type: Value
          value: "50"
```
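
The overview also lists external metrics — values that live outside the cluster, such as a managed queue's backlog. A hedged sketch of an `External` metric entry for the `metrics:` list above; the metric name and selector are illustrative, not part of Veza's current configuration:

```yaml
# Additional metrics: entry (sketch)
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: "veza-jobs"
    target:
      type: AverageValue
      averageValue: "30"
```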

### Vertical Pod Autoscaler

#### Backend API VPA

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: veza-backend-api-vpa
  namespace: veza-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  updatePolicy:
    updateMode: "Auto"  # Options: Off, Initial, Auto, Recreate
  resourcePolicy:
    containerPolicies:
      - containerName: backend-api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```

## Scaling Strategies

### 1. CPU-Based Scaling

**Use Case**: CPU-intensive workloads

**Configuration**:
```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

**Pros**:
- Simple and reliable
- Works well for CPU-bound applications

**Cons**:
- May not reflect actual load for I/O-bound applications
- Can be slow to react to sudden spikes

### 2. Memory-Based Scaling

**Use Case**: Memory-intensive workloads

**Configuration**:
```yaml
metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```

**Pros**:
- Prevents OOM kills
- Good for memory-intensive applications

**Cons**:
- Memory usage is less predictable than CPU
- May trigger unnecessary scaling, since many runtimes hold on to memory after load drops

### 3. Custom Metrics Scaling

**Use Case**: Application-specific scaling needs

**Configuration**:
```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

**Pros**:
- Scales based on actual application load
- More responsive to traffic patterns

**Cons**:
- Requires custom metrics infrastructure
- More complex setup

### 4. Multi-Metric Scaling

**Use Case**: Complex scaling requirements

**Configuration**:
```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

**Behavior**: HPA computes a desired replica count for each metric and applies the highest one.
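
That selection rule can be sketched with the ceiling formula applied per metric; the observed values below are illustrative:

```shell
#!/bin/sh
# Per-metric desired counts via ceil(current * observed / target); the HPA takes the max.
current=4
cpu_desired=$(( (current * 90 + 69) / 70 ))     # CPU at 90% vs 70% target  -> 6
mem_desired=$(( (current * 60 + 79) / 80 ))     # memory at 60% vs 80%      -> 3
rps_desired=$(( (current * 180 + 99) / 100 ))   # 180 req/s vs 100 target   -> 8
desired=$cpu_desired
if [ "$mem_desired" -gt "$desired" ]; then desired=$mem_desired; fi
if [ "$rps_desired" -gt "$desired" ]; then desired=$rps_desired; fi
echo "$desired"  # 8: the request-rate metric wins
```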

## Best Practices

### 1. Set Appropriate Min/Max Replicas

```yaml
minReplicas: 3    # Ensure high availability
maxReplicas: 20   # Prevent runaway scaling
```

### 2. Use Stabilization Windows

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60    # Wait 60s before scaling up
  scaleDown:
    stabilizationWindowSeconds: 300   # Wait 5min before scaling down
```

### 3. Configure Scaling Policies

```yaml
behavior:
  scaleUp:
    policies:
      - type: Percent
        value: 100          # Can double the replica count
        periodSeconds: 15
      - type: Pods
        value: 4            # Or add at most 4 pods
        periodSeconds: 15
    selectPolicy: Max       # Use the more aggressive policy
```

### 4. Monitor Scaling Events

```bash
# Watch HPA status
kubectl get hpa -n veza-production -w

# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production
```

### 5. Test Scaling Behavior

```bash
# Generate load to test scaling
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://veza-backend-api:8080/api/v1/tracks; done"

# Watch scaling
kubectl get hpa veza-backend-api-hpa -n veza-production -w
```

## Cluster Autoscaler

### AWS EKS

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/veza-cluster
          env:
            - name: AWS_REGION
              value: us-east-1
            - name: AWS_STS_REGIONAL_ENDPOINTS
              value: regional
```

### GCP GKE

GKE has built-in cluster autoscaling. Enable it when creating the cluster:

```bash
gcloud container clusters create veza-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=10 \
  --zone=us-central1-a
```

### Azure AKS

```bash
az aks create \
  --resource-group veza-rg \
  --name veza-cluster \
  --node-count 3 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10
```

## Monitoring

### Metrics to Monitor

- **Current Replicas**: Number of pods currently running
- **Desired Replicas**: Target number of pods
- **CPU Utilization**: Current CPU usage
- **Memory Utilization**: Current memory usage
- **Scaling Events**: Scale-up/down events
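
One condition worth alerting on is an HPA pinned at its ceiling, since at that point further load no longer adds capacity. A sketch using kube-state-metrics series; it assumes the Prometheus Operator's `PrometheusRule` CRD is installed, and the rule name is illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: veza-autoscaling-alerts   # illustrative name
  namespace: veza-production
spec:
  groups:
    - name: autoscaling
      rules:
        - alert: HPAAtMaxReplicas
          expr: |
            kube_horizontalpodautoscaler_status_current_replicas
              == kube_horizontalpodautoscaler_spec_max_replicas
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} has been at maxReplicas for 15 minutes"
```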

### Prometheus Queries

```promql
# Current replicas
kube_horizontalpodautoscaler_status_current_replicas

# Desired replicas
kube_horizontalpodautoscaler_status_desired_replicas

# CPU utilization as seen by the HPA
kube_horizontalpodautoscaler_status_current_metrics{metric_name="cpu"}

# Memory utilization as seen by the HPA
kube_horizontalpodautoscaler_status_current_metrics{metric_name="memory"}

# HPA health (AbleToScale status condition)
kube_horizontalpodautoscaler_status_condition{condition="AbleToScale"}
```

### Grafana Dashboard

Create a dashboard to visualize:
- Replica count over time
- CPU/memory utilization
- Scaling events
- HPA status

## Troubleshooting

### HPA Not Scaling

```bash
# Check HPA status
kubectl get hpa veza-backend-api-hpa -n veza-production

# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production

# Verify metrics are available
kubectl top pods -n veza-production

# Check metrics-server
kubectl get deployment metrics-server -n kube-system
kubectl logs -n kube-system deployment/metrics-server
```

### Scaling Too Aggressively

- Increase `stabilizationWindowSeconds` in the HPA's `behavior` section.
- Add (or tighten) scaling policies with lower `Percent`/`Pods` values to cap the rate of change.

### Scaling Too Slowly

- Decrease `stabilizationWindowSeconds` for scale-up.
- Use more aggressive scale-up policies: raise the `Percent` or `Pods` values.
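
For the scale-up case, those two knobs combine into a `behavior` block like this sketch, which reacts immediately and lets the replica count triple within one period (the values are illustrative, not tuned for Veza):

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # act on the latest recommendation immediately
    policies:
      - type: Percent
        value: 200                  # allow adding up to 200% of current replicas per period
        periodSeconds: 15
```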

### VPA Not Working

```bash
# Check VPA status
kubectl get vpa -n veza-production

# Check VPA recommendations
kubectl describe vpa veza-backend-api-vpa -n veza-production

# Verify the VPA admission controller is running
kubectl get deployment vpa-admission-controller -n kube-system
```

## References

- [Kubernetes HPA Documentation](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
- [Kubernetes VPA Documentation](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler)
- [Cluster Autoscaler Documentation](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
- [Metrics Server](https://github.com/kubernetes-sigs/metrics-server)