# Auto-Scaling Configuration for Veza Platform

This directory contains configurations for automatic scaling of the Veza platform based on load, including the Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler.

## Overview

Veza uses multiple layers of auto-scaling:

1. **Horizontal Pod Autoscaler (HPA)**: Scales pods based on CPU, memory, and custom metrics
2. **Vertical Pod Autoscaler (VPA)**: Adjusts resource requests and limits automatically
3. **Cluster Autoscaler**: Scales cluster nodes based on pod scheduling requirements

## Architecture

```
┌────────────────────────────────────────────────────┐
│                  Metrics Sources                   │
│  ┌──────────┐ ┌───────────┐ ┌────────┐ ┌────────┐  │
│  │Prometheus│ │Metrics API│ │ Custom │ │External│  │
│  │          │ │           │ │ Metrics│ │ Metrics│  │
│  └────┬─────┘ └─────┬─────┘ └───┬────┘ └───┬────┘  │
└───────┼─────────────┼───────────┼──────────┼───────┘
        │             │           │          │
        └─────────────┴───┬───────┴──────────┘
                          │
            ┌─────────────▼─────────────┐
            │ Horizontal Pod Autoscaler │
            │           (HPA)           │
            └─────────────┬─────────────┘
                          │
            ┌─────────────▼─────────────┐
            │      Kubernetes API       │
            └─────────────┬─────────────┘
                          │
            ┌─────────────▼─────────────┐
            │        Deployment         │
            │   (Scale Pods Up/Down)    │
            └───────────────────────────┘
```

## Components

### 1. Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of pods in a deployment based on observed metrics.

**Scaling Triggers**:

- CPU utilization
- Memory utilization
- Custom metrics (request rate, queue depth, etc.)
- External metrics (from Prometheus, etc.)

**Scaling Behavior**:

- **Scale Up**: Aggressive scaling when load increases
- **Scale Down**: Conservative scaling to prevent thrashing

### 2. Vertical Pod Autoscaler (VPA)

VPA automatically adjusts resource requests and limits for pods based on historical usage.
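The two pod-level autoscalers each reduce to a small calculation: HPA follows the documented formula `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, and VPA applies its recommendation clamped to the container policy's `minAllowed`/`maxAllowed` bounds. A minimal Python sketch of both (illustrative helpers, not the controllers' actual code):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Documented HPA scaling formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

def vpa_clamp(recommended: float, min_allowed: float, max_allowed: float) -> float:
    """VPA applies its recommendation clamped to the containerPolicy bounds."""
    return max(min_allowed, min(max_allowed, recommended))

# 3 pods at 90% CPU against a 70% target -> scale to 4 pods
print(hpa_desired_replicas(3, 90, 70))   # -> 4
# A 6000m CPU recommendation capped by a 4000m maxAllowed
print(vpa_clamp(6000, 100, 4000))        # -> 4000
```

Note that HPA never acts on a single sample in isolation: the `behavior` stabilization windows shown later smooth this calculation over time.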
**Modes**:

- **Off**: Compute recommendations only; never apply them
- **Initial**: Set resources on pod creation only
- **Auto**: Apply recommendations automatically (currently by recreating pods)
- **Recreate**: Evict and restart pods to apply new resources

### 3. Cluster Autoscaler

Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster by adding or removing nodes.

**Triggers**:

- Pods that cannot be scheduled due to insufficient resources
- Nodes that are underutilized for extended periods

## Configuration

### Prerequisites

#### Enable Metrics Server

```bash
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get deployment metrics-server -n kube-system
```

#### Enable Custom Metrics (Optional)

For custom metrics, install the Prometheus Adapter:

```bash
# Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter
```

### Horizontal Pod Autoscaler

#### Backend API HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
        - type: Pods
          value: 1
          periodSeconds: 60
      selectPolicy: Min
```

#### Frontend HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-frontend-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

### Custom Metrics HPA

For scaling based on application-specific metrics:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa-custom
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    - type: Object
      object:
        metric:
          name: queue_depth
        describedObject:
          apiVersion: v1
          kind: Service
          name: veza-backend-api
        target:
          type: Value
          value: "50"
```

### Vertical Pod Autoscaler

#### Backend API VPA

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: veza-backend-api-vpa
  namespace: veza-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  updatePolicy:
    updateMode: "Auto"  # Options: Off, Initial, Auto, Recreate
  resourcePolicy:
    containerPolicies:
      - containerName: backend-api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```

## Scaling Strategies

### 1. CPU-Based Scaling

**Use Case**: CPU-intensive workloads

**Configuration**:

```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

**Pros**:
- Simple and reliable
- Works well for CPU-bound applications

**Cons**:
- May not reflect actual load for I/O-bound applications
- Can be slow to react to sudden spikes

### 2. Memory-Based Scaling

**Use Case**: Memory-intensive workloads

**Configuration**:

```yaml
metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```

**Pros**:
- Prevents OOM kills
- Good for memory-intensive applications

**Cons**:
- Memory usage can be less predictable
- May scale unnecessarily

### 3. Custom Metrics Scaling

**Use Case**: Application-specific scaling needs

**Configuration**:

```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

**Pros**:
- Scales based on actual application load
- More responsive to traffic patterns

**Cons**:
- Requires custom metrics infrastructure
- More complex setup

### 4. Multi-Metric Scaling

**Use Case**: Complex scaling requirements

**Configuration**:

```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

**Behavior**: HPA evaluates every metric and scales to the highest replica count any of them requires.

## Best Practices

### 1. Set Appropriate Min/Max Replicas

```yaml
minReplicas: 3   # Ensure high availability
maxReplicas: 20  # Prevent runaway scaling
```

### 2. Use Stabilization Windows

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60   # Wait 60s before scaling up
  scaleDown:
    stabilizationWindowSeconds: 300  # Wait 5min before scaling down
```

### 3. Configure Scaling Policies

```yaml
behavior:
  scaleUp:
    policies:
      - type: Percent
        value: 100        # Can double replicas
        periodSeconds: 15
      - type: Pods
        value: 4          # Or add 4 pods max
        periodSeconds: 15
    selectPolicy: Max     # Use the more aggressive policy
```

### 4. Monitor Scaling Events

```bash
# Watch HPA status
kubectl get hpa -n veza-production -w

# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production
```

### 5. Test Scaling Behavior

```bash
# Generate load to test scaling
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://veza-backend-api:8080/api/v1/tracks; done"

# Watch scaling
kubectl get hpa veza-backend-api-hpa -n veza-production -w
```

## Cluster Autoscaler

### AWS EKS

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/veza-cluster
          env:
            - name: AWS_REGION
              value: us-east-1
            - name: AWS_STS_REGIONAL_ENDPOINTS
              value: regional
```

### GCP GKE

GKE has built-in cluster autoscaling. Enable it when creating the cluster:

```bash
gcloud container clusters create veza-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=10 \
  --zone=us-central1-a
```

### Azure AKS

```bash
az aks create \
  --resource-group veza-rg \
  --name veza-cluster \
  --node-count 3 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10
```

## Monitoring

### Metrics to Monitor

- **Current Replicas**: Number of pods currently running
- **Desired Replicas**: Target number of pods
- **CPU Utilization**: Current CPU usage
- **Memory Utilization**: Current memory usage
- **Scaling Events**: Scale-up/scale-down events

### Prometheus Queries

```promql
# Current replicas
kube_horizontalpodautoscaler_status_current_replicas

# Desired replicas
kube_horizontalpodautoscaler_status_desired_replicas

# CPU utilization
kube_horizontalpodautoscaler_status_current_metrics{metric_name="cpu"}

# Memory utilization
kube_horizontalpodautoscaler_status_current_metrics{metric_name="memory"}

# Scaling events
kube_horizontalpodautoscaler_status_condition{condition="AbleToScale"}
```

### Grafana Dashboard

Create a dashboard to visualize:

- Replica count over time
- CPU/memory utilization
- Scaling events
- HPA status

## Troubleshooting

### HPA Not Scaling

```bash
# Check HPA status
kubectl get hpa veza-backend-api-hpa -n veza-production

# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production

# Verify metrics are available
kubectl top pods -n veza-production

# Check metrics-server
kubectl get deployment metrics-server -n kube-system
kubectl logs -n kube-system deployment/metrics-server
```

### Scaling Too Aggressively

- Increase `stabilizationWindowSeconds` in the HPA `behavior` section
- Add scaling policies with lower `Percent`/`Pods` values to limit the rate

### Scaling Too Slowly

- Decrease `stabilizationWindowSeconds`
- Use more aggressive scaling policies (higher `Percent` or `Pods` values)

### VPA Not Working

```bash
# Check VPA status
kubectl get vpa -n veza-production

# Check VPA recommendations
kubectl describe vpa veza-backend-api-vpa -n veza-production

# Verify the VPA admission controller is installed
kubectl get deployment vpa-admission-controller -n kube-system
```

## References

- [Kubernetes HPA Documentation](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
- [Kubernetes VPA Documentation](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler)
- [Cluster Autoscaler Documentation](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
- [Metrics Server](https://github.com/kubernetes-sigs/metrics-server)