# Auto-Scaling Configuration for Veza Platform

This directory contains configurations for automatically scaling the Veza platform based on load, including the Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler.

## Overview

Veza uses multiple layers of auto-scaling:

- **Horizontal Pod Autoscaler (HPA)**: Scales pods based on CPU, memory, and custom metrics
- **Vertical Pod Autoscaler (VPA)**: Adjusts resource requests and limits automatically
- **Cluster Autoscaler**: Scales cluster nodes based on pod scheduling requirements
## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    Metrics Sources                      │
│ ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌────────┐      │
│ │Prometheus│ │Metrics API│ │ Custom   │ │External│      │
│ │          │ │           │ │ Metrics  │ │Metrics │      │
│ └────┬─────┘ └─────┬─────┘ └────┬─────┘ └────┬───┘      │
└──────┼─────────────┼────────────┼────────────┼──────────┘
       │             │            │            │
       └─────────────┴────────────┴────────────┘
                          │
            ┌─────────────▼─────────────┐
            │ Horizontal Pod Autoscaler │
            │           (HPA)           │
            └─────────────┬─────────────┘
                          │
            ┌─────────────▼─────────────┐
            │      Kubernetes API       │
            └─────────────┬─────────────┘
                          │
            ┌─────────────▼─────────────┐
            │        Deployment         │
            │   (Scale Pods Up/Down)    │
            └───────────────────────────┘
```
## Components

### 1. Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of pods in a deployment based on observed metrics.

**Scaling Triggers:**

- CPU utilization
- Memory utilization
- Custom metrics (request rate, queue depth, etc.)
- External metrics (from Prometheus, etc.)

**Scaling Behavior:**

- **Scale Up**: Aggressive scaling when load increases
- **Scale Down**: Conservative scaling to prevent thrashing
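Under the hood, the HPA controller computes a desired replica count from the ratio of the current metric value to its target, `desiredReplicas = ceil(currentReplicas × currentValue / targetValue)`, clamped to the min/max bounds. A minimal Python sketch of that documented formula (not the controller source):

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """HPA core formula: ceil(currentReplicas * current / target),
    clamped to [minReplicas, maxReplicas]. The controller skips
    scaling when the ratio is within the default 10% tolerance."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# 3 pods at 90% CPU against a 70% target -> ceil(3 * 90/70) = 4
print(desired_replicas(3, 90, 70, 3, 20))  # 4
```

Note the clamping: even if utilization collapses, the count never drops below `minReplicas`.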
### 2. Vertical Pod Autoscaler (VPA)

VPA automatically adjusts resource requests and limits for pods based on historical usage.

**Modes:**

- **Off**: Compute and expose recommendations only; never apply them
- **Initial**: Apply recommendations only when pods are created
- **Recreate**: Evict running pods to apply new recommendations
- **Auto**: Currently equivalent to Recreate; applies recommendations by evicting pods

### 3. Cluster Autoscaler

Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster by adding or removing nodes.

**Triggers:**

- Pods that cannot be scheduled due to insufficient resources
- Nodes that are underutilized for an extended period
## Configuration

### Prerequisites

#### Enable Metrics Server

```bash
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get deployment metrics-server -n kube-system
```

#### Enable Custom Metrics (Optional)

For custom metrics, install the Prometheus Adapter:

```bash
# Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter
```
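The adapter only serves metrics its rules discover. As an illustrative sketch (the series and label names are assumptions, not taken from the Veza deployment), a rule that derives an `http_requests_per_second` pod metric from a request counter could look like:

```yaml
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

Pass this as the adapter's `rules` configuration (for example via Helm values) so the HPA examples below can reference the metric.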
## Horizontal Pod Autoscaler

### Backend API HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
        - type: Pods
          value: 1
          periodSeconds: 60
      selectPolicy: Min
```
### Frontend HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-frontend-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
### Custom Metrics HPA

For scaling based on application-specific metrics:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa-custom
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    - type: Object
      object:
        metric:
          name: queue_depth
        describedObject:
          apiVersion: v1
          kind: Service
          name: veza-backend-api
        target:
          type: Value
          value: "50"
```
## Vertical Pod Autoscaler

### Backend API VPA

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: veza-backend-api-vpa
  namespace: veza-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  updatePolicy:
    updateMode: "Auto"  # Options: Off, Initial, Auto, Recreate
  resourcePolicy:
    containerPolicies:
      - containerName: backend-api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```
## Scaling Strategies

### 1. CPU-Based Scaling

**Use Case:** CPU-intensive workloads

**Configuration:**

```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

**Pros:**

- Simple and reliable
- Works well for CPU-bound applications

**Cons:**

- May not reflect actual load for I/O-bound applications
- Can be slow to react to sudden spikes

### 2. Memory-Based Scaling

**Use Case:** Memory-intensive workloads

**Configuration:**

```yaml
metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```

**Pros:**

- Prevents OOM kills
- Good for memory-intensive applications

**Cons:**

- Memory usage can be less predictable
- May scale unnecessarily

### 3. Custom Metrics Scaling

**Use Case:** Application-specific scaling needs

**Configuration:**

```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

**Pros:**

- Scales based on actual application load
- More responsive to traffic patterns

**Cons:**

- Requires custom metrics infrastructure
- More complex setup

### 4. Multi-Metric Scaling

**Use Case:** Complex scaling requirements

**Configuration:**

```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

**Behavior:** HPA evaluates each metric independently and scales to the one that requires the most replicas.
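With several metrics configured, each one proposes its own replica count and the controller takes the largest. A sketch of that selection, with hypothetical utilization numbers:

```python
import math

def proposal(current_replicas, current_value, target_value):
    """Desired replica count proposed by a single metric."""
    return math.ceil(current_replicas * current_value / target_value)

# 5 replicas; CPU at 85% (target 70), memory at 60% (target 80),
# request rate at 180/s per pod (target 100/s)
proposals = [proposal(5, 85, 70), proposal(5, 60, 80), proposal(5, 180, 100)]
print(max(proposals))  # 9 -- the request-rate metric wins
```

Note that memory alone would shrink the deployment to 4 replicas, but the hotter metrics override it, which is exactly the safety property you want.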
## Best Practices

### 1. Set Appropriate Min/Max Replicas

```yaml
minReplicas: 3    # Ensure high availability
maxReplicas: 20   # Prevent runaway scaling
```

### 2. Use Stabilization Windows

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60    # Consider the last 60s before scaling up
  scaleDown:
    stabilizationWindowSeconds: 300   # Consider the last 5min before scaling down
```
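The window is not a simple delay: per the Kubernetes HPA documentation, the controller keeps the recommendations computed over the window and picks the most conservative one, the lowest for scale-up and the highest for scale-down. A minimal sketch of that rule:

```python
# Stabilization picks the most conservative recommendation in the window:
# the smallest when scaling up, the largest when scaling down.
def stabilize(window_recommendations, direction):
    """window_recommendations: desired counts computed over the window."""
    if direction == "up":
        return min(window_recommendations)  # ignore a brief spike
    return max(window_recommendations)      # ignore a brief dip

# Desired counts over the last window: a spike to 9, then back down
window = [5, 9, 6, 5]
print(stabilize(window, "up"))    # 5 -- the spike alone does not scale up
print(stabilize(window, "down"))  # 9 -- capacity is held until the window drains
```

This is why a long scale-down window prevents thrashing: the fleet only shrinks once every recommendation in the window agrees it can.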
### 3. Configure Scaling Policies

```yaml
behavior:
  scaleUp:
    policies:
      - type: Percent
        value: 100        # Can double replicas
        periodSeconds: 15
      - type: Pods
        value: 4          # Or add 4 pods
        periodSeconds: 15
    selectPolicy: Max     # Use the more aggressive policy
```
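With `selectPolicy: Max`, the change allowed in one period is the larger of what each policy permits. A sketch of the arithmetic (mirroring the documented behavior, not the controller source):

```python
import math

def max_scale_up(current_replicas, percent, pods, select_policy="Max"):
    """Upper bound on replicas after one scaling period."""
    by_percent = current_replicas + math.ceil(current_replicas * percent / 100)
    by_pods = current_replicas + pods
    chooser = max if select_policy == "Max" else min
    return chooser(by_percent, by_pods)

# With 3 replicas: doubling allows 6, adding 4 pods allows 7 -> limit is 7
print(max_scale_up(3, percent=100, pods=4))  # 7
```

The `Pods` policy dominates at small fleet sizes and the `Percent` policy at large ones, which is why the two are usually combined.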
### 4. Monitor Scaling Events

```bash
# Watch HPA status
kubectl get hpa -n veza-production -w

# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production
```

### 5. Test Scaling Behavior

```bash
# Generate load to test scaling
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://veza-backend-api:8080/api/v1/tracks; done"

# Watch scaling
kubectl get hpa veza-backend-api-hpa -n veza-production -w
```
## Cluster Autoscaler

### AWS EKS

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0  # k8s.gcr.io is deprecated
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/veza-cluster
          env:
            - name: AWS_REGION
              value: us-east-1
            - name: AWS_STS_REGIONAL_ENDPOINTS
              value: regional
```
### GCP GKE

GKE has built-in cluster autoscaling. Enable it when creating the cluster:

```bash
gcloud container clusters create veza-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=10 \
  --zone=us-central1-a
```

### Azure AKS

```bash
az aks create \
  --resource-group veza-rg \
  --name veza-cluster \
  --node-count 3 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10
```
## Monitoring

### Metrics to Monitor

- **Current Replicas**: Number of pods currently running
- **Desired Replicas**: Target number of pods
- **CPU Utilization**: Current CPU usage
- **Memory Utilization**: Current memory usage
- **Scaling Events**: Scale up/down events

### Prometheus Queries

```promql
# Current replicas
kube_horizontalpodautoscaler_status_current_replicas

# Desired replicas
kube_horizontalpodautoscaler_status_desired_replicas

# CPU utilization
kube_horizontalpodautoscaler_status_current_metrics{metric_name="cpu"}

# Memory utilization
kube_horizontalpodautoscaler_status_current_metrics{metric_name="memory"}

# Scaling events
kube_horizontalpodautoscaler_status_condition{condition="AbleToScale"}
```

### Grafana Dashboard

Create a dashboard to visualize:

- Replica count over time
- CPU/Memory utilization
- Scaling events
- HPA status
## Troubleshooting

### HPA Not Scaling

```bash
# Check HPA status
kubectl get hpa veza-backend-api-hpa -n veza-production

# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production

# Verify metrics are available
kubectl top pods -n veza-production

# Check metrics-server
kubectl get deployment metrics-server -n kube-system
kubectl logs -n kube-system deployment/metrics-server
```

### Scaling Too Aggressively

- Increase `stabilizationWindowSeconds` on the HPA
- Add scaling policies with lower `Percent`/`Pods` values to limit the rate

### Scaling Too Slowly

- Decrease `stabilizationWindowSeconds` on the HPA
- Add more aggressive scaling policies (increase the `Percent` or `Pods` values)

### VPA Not Working

```bash
# Check VPA status
kubectl get vpa -n veza-production

# Check VPA recommendations
kubectl describe vpa veza-backend-api-vpa -n veza-production

# Verify the VPA admission controller is installed
kubectl get deployment vpa-admission-controller -n kube-system
```