Auto-Scaling Configuration for Veza Platform

This directory contains configurations for automatic scaling of the Veza platform based on load, including Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler.

Overview

Veza uses multiple layers of auto-scaling:

  1. Horizontal Pod Autoscaler (HPA): Scales pods based on CPU, memory, and custom metrics
  2. Vertical Pod Autoscaler (VPA): Adjusts resource requests and limits automatically
  3. Cluster Autoscaler: Scales cluster nodes based on pod scheduling requirements

Architecture

┌─────────────────────────────────────────────────────────┐
│                     Metrics Sources                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────┐   │
│  │Prometheus│  │Metrics   │  │Custom    │  │External│   │
│  │          │  │API       │  │Metrics   │  │Metrics │   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └───┬────┘   │
└───────┼─────────────┼─────────────┼────────────┼────────┘
        │             │             │            │
        └─────────────┴─────────────┴────────────┘
                      │
        ┌─────────────▼─────────────┐
        │ Horizontal Pod Autoscaler │
        │           (HPA)           │
        └─────────────┬─────────────┘
                      │
        ┌─────────────▼─────────────┐
        │      Kubernetes API       │
        └─────────────┬─────────────┘
                      │
        ┌─────────────▼─────────────┐
        │        Deployment         │
        │   (Scale Pods Up/Down)    │
        └───────────────────────────┘

Components

1. Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of pods in a deployment based on observed metrics.

Scaling Triggers:

  • CPU utilization
  • Memory utilization
  • Custom metrics (request rate, queue depth, etc.)
  • External metrics (from Prometheus, etc.)

Scaling Behavior:

  • Scale Up: Aggressive scaling when load increases
  • Scale Down: Conservative scaling to prevent thrashing

2. Vertical Pod Autoscaler (VPA)

VPA automatically adjusts resource requests and limits for pods based on historical usage.

Modes:

  • Off: Compute recommendations only; never apply them
  • Initial: Apply recommendations only when pods are first created
  • Auto: Apply recommendations automatically (currently by evicting and recreating pods)
  • Recreate: Evict pods so they restart with the recommended resources
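A low-risk way to adopt VPA is to start in recommendation-only mode and inspect what it suggests before letting it act on live pods. A minimal sketch, reusing the backend deployment targeted elsewhere in this directory:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: veza-backend-api-vpa-recommend
  namespace: veza-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  updatePolicy:
    updateMode: "Off"   # compute recommendations, never apply them
```

The recommendations then show up under `status.recommendation` in `kubectl describe vpa`, which makes it easy to sanity-check minAllowed/maxAllowed bounds before switching to Auto.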

3. Cluster Autoscaler

Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster by adding or removing nodes.

Triggers:

  • Pods that cannot be scheduled due to insufficient resources
  • Nodes that are underutilized for extended periods

Configuration

Prerequisites

Enable Metrics Server

# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get deployment metrics-server -n kube-system

Enable Custom Metrics (Optional)

For custom metrics, install Prometheus Adapter:

# Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter

Horizontal Pod Autoscaler

Backend API HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      - type: Pods
        value: 1
        periodSeconds: 60
      selectPolicy: Min

Frontend HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-frontend-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Chat Server HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-chat-server-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-chat-server
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Custom Metrics HPA

For scaling based on application-specific metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa-custom
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  - type: Object
    object:
      metric:
        name: queue_depth
      describedObject:
        apiVersion: v1
        kind: Service
        name: veza-backend-api
      target:
        type: Value
        value: "50"
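The `http_requests_per_second` pods metric above does not exist out of the box; the Prometheus Adapter must be configured to derive it from a raw Prometheus series. A sketch of an adapter rule, assuming the application exports an `http_requests_total` counter carrying `namespace` and `pod` labels (adjust the series name to whatever the services actually expose):

```yaml
# prometheus-adapter Helm values fragment
rules:
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^http_requests_total$"
      as: "http_requests_per_second"
    # per-pod request rate over the last 2 minutes
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```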

Vertical Pod Autoscaler

Backend API VPA

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: veza-backend-api-vpa
  namespace: veza-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  updatePolicy:
    updateMode: "Auto"  # Options: Off, Initial, Auto, Recreate
  resourcePolicy:
    containerPolicies:
    - containerName: backend-api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4000m
        memory: 8Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits

Scaling Strategies

1. CPU-Based Scaling

Use Case: CPU-intensive workloads

Configuration:

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70

Pros:

  • Simple and reliable
  • Works well for CPU-bound applications

Cons:

  • May not reflect actual load for I/O-bound applications
  • Can be slow to react to sudden spikes

2. Memory-Based Scaling

Use Case: Memory-intensive workloads

Configuration:

metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 80

Pros:

  • Prevents OOM kills
  • Good for memory-intensive applications

Cons:

  • Memory usage can be less predictable
  • May scale unnecessarily

3. Custom Metrics Scaling

Use Case: Application-specific scaling needs

Configuration:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"

Pros:

  • Scales based on actual application load
  • More responsive to traffic patterns

Cons:

  • Requires custom metrics infrastructure
  • More complex setup

4. Multi-Metric Scaling

Use Case: Complex scaling requirements

Configuration:

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 80
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"

Behavior: HPA scales based on the metric that requires the most replicas.

Best Practices

1. Set Appropriate Min/Max Replicas

minReplicas: 3  # Ensure high availability
maxReplicas: 20  # Prevent runaway scaling

2. Use Stabilization Windows

behavior:
  scaleUp:
    stabilizationWindowSeconds: 60  # Wait 60s before scaling up
  scaleDown:
    stabilizationWindowSeconds: 300  # Wait 5min before scaling down

3. Configure Scaling Policies

behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 100  # Can double replicas
      periodSeconds: 15
    - type: Pods
      value: 4  # Or add 4 pods max
      periodSeconds: 15
    selectPolicy: Max  # Use the more aggressive policy

4. Monitor Scaling Events

# Watch HPA status
kubectl get hpa -n veza-production -w

# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production

5. Test Scaling Behavior

# Generate load to test scaling (run in the target namespace so the service DNS name resolves)
kubectl run load-generator -n veza-production --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://veza-backend-api:8080/api/v1/tracks; done"

# Watch scaling
kubectl get hpa veza-backend-api-hpa -n veza-production -w

Cluster Autoscaler

AWS EKS

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/veza-cluster
        env:
        - name: AWS_REGION
          value: us-east-1
        - name: AWS_STS_REGIONAL_ENDPOINTS
          value: regional

GCP GKE

GKE has built-in cluster autoscaling. Enable it when creating the cluster:

gcloud container clusters create veza-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=10 \
  --zone=us-central1-a

Azure AKS

az aks create \
  --resource-group veza-rg \
  --name veza-cluster \
  --node-count 3 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10

Monitoring

Metrics to Monitor

  • Current Replicas: Number of pods currently running
  • Desired Replicas: Target number of pods
  • CPU Utilization: Current CPU usage
  • Memory Utilization: Current memory usage
  • Scaling Events: Scale up/down events

Prometheus Queries

# Current replicas
kube_horizontalpodautoscaler_status_current_replicas

# Desired replicas
kube_horizontalpodautoscaler_status_desired_replicas

# Current value of the CPU target metric
kube_horizontalpodautoscaler_status_target_metric{metric_name="cpu"}

# Current value of the memory target metric
kube_horizontalpodautoscaler_status_target_metric{metric_name="memory"}

# HPA conditions (e.g. whether the HPA is currently able to scale)
kube_horizontalpodautoscaler_status_condition{condition="AbleToScale"}

Grafana Dashboard

Create a dashboard to visualize:

  • Replica count over time
  • CPU/Memory utilization
  • Scaling events
  • HPA status
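Dashboards show trends; an alert catches the case where scaling has hit its ceiling and the HPA can no longer help. A sketch of a Prometheus Operator rule, assuming kube-state-metrics and the `PrometheusRule` CRD are installed:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: veza-hpa-alerts
  namespace: veza-production
spec:
  groups:
  - name: autoscaling
    rules:
    - alert: HPAMaxedOut
      expr: |
        kube_horizontalpodautoscaler_status_current_replicas
          == kube_horizontalpodautoscaler_spec_max_replicas
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.horizontalpodautoscaler }} has been pinned at maxReplicas for 15 minutes"
```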

Troubleshooting

HPA Not Scaling

# Check HPA status
kubectl get hpa veza-backend-api-hpa -n veza-production

# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production

# Verify metrics are available
kubectl top pods -n veza-production

# Check metrics-server
kubectl get deployment metrics-server -n kube-system
kubectl logs -n kube-system deployment/metrics-server

Scaling Too Aggressively

# Increase the scale-down stabilization window (e.g. from 300s to 600s)
kubectl patch hpa veza-backend-api-hpa -n veza-production --type merge \
  -p '{"spec":{"behavior":{"scaleDown":{"stabilizationWindowSeconds":600}}}}'

# Lower the scale-down policy rates (Percent/Pods values) under behavior.scaleDown
kubectl edit hpa veza-backend-api-hpa -n veza-production

Scaling Too Slowly

# Decrease the scale-up stabilization window (0 reacts immediately)
kubectl patch hpa veza-backend-api-hpa -n veza-production --type merge \
  -p '{"spec":{"behavior":{"scaleUp":{"stabilizationWindowSeconds":0}}}}'

# Raise the scale-up policy rates (Percent/Pods values) under behavior.scaleUp
kubectl edit hpa veza-backend-api-hpa -n veza-production

VPA Not Working

# Check VPA status
kubectl get vpa -n veza-production

# Check VPA recommendations
kubectl describe vpa veza-backend-api-vpa -n veza-production

# Verify VPA admission controller is installed
kubectl get deployment vpa-admission-controller -n kube-system

References