# Auto-Scaling Configuration for Veza Platform

This directory contains configurations for automatic scaling of the Veza platform based on load, including the Horizontal Pod Autoscaler (HPA), the Vertical Pod Autoscaler (VPA), and the Cluster Autoscaler.

## Overview

Veza uses multiple layers of auto-scaling:

1. **Horizontal Pod Autoscaler (HPA)**: Scales pods based on CPU, memory, and custom metrics
2. **Vertical Pod Autoscaler (VPA)**: Adjusts resource requests and limits automatically
3. **Cluster Autoscaler**: Scales cluster nodes based on pod scheduling requirements

## Architecture

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                     Metrics Sources                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────┐   │
│  │Prometheus│  │ Metrics  │  │ Custom   │  │External│   │
│  │          │  │   API    │  │ Metrics  │  │Metrics │   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └───┬────┘   │
└───────┼─────────────┼─────────────┼────────────┼────────┘
        │             │             │            │
        └─────────────┴─────────────┴────────────┘
                      │
        ┌─────────────▼─────────────┐
        │ Horizontal Pod Autoscaler │
        │           (HPA)           │
        └─────────────┬─────────────┘
                      │
        ┌─────────────▼─────────────┐
        │      Kubernetes API       │
        └─────────────┬─────────────┘
                      │
        ┌─────────────▼─────────────┐
        │        Deployment         │
        │   (Scale Pods Up/Down)    │
        └───────────────────────────┘
```

## Components

### 1. Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of pods in a deployment based on observed metrics.

**Scaling Triggers**:
- CPU utilization
- Memory utilization
- Custom metrics (request rate, queue depth, etc.)
- External metrics (from Prometheus, etc.)

**Scaling Behavior**:
- **Scale Up**: Aggressive scaling when load increases
- **Scale Down**: Conservative scaling to prevent thrashing
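
Both directions start from the same calculation: the HPA computes `desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)` per metric, and the scale-up/scale-down policies only cap how fast that result is applied. A quick sketch with illustrative numbers (the utilization values are assumptions, not Veza measurements):

```shell
#!/bin/sh
# HPA core formula: desiredReplicas = ceil(currentReplicas * currentValue / targetValue)
# Integer ceiling division: (a + b - 1) / b
current_replicas=4
current_cpu=90   # observed average utilization (%), illustrative
target_cpu=70    # averageUtilization target from the HPA spec
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"  # 6: CPU is ~29% over target, so the HPA asks for 2 more pods
```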

### 2. Vertical Pod Autoscaler (VPA)

VPA automatically adjusts resource requests and limits for pods based on historical usage.

**Modes**:
- **Off**: Compute recommendations only; do not apply them
- **Initial**: Set resources on pod creation
- **Auto**: Update resources on pod restart
- **Recreate**: Restart pods to apply new resources

### 3. Cluster Autoscaler

Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster by adding or removing nodes.

**Triggers**:
- Pods that cannot be scheduled due to insufficient resources
- Nodes that are underutilized for extended periods
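
Scale-down evicts the pods on the node being removed. For a pod that must not be moved, the standard `safe-to-evict` annotation tells the Cluster Autoscaler to leave its node alone; the pod name below is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: veza-chat-session-holder   # illustrative pod name
  annotations:
    # Cluster Autoscaler will not remove a node running this pod
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```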

## Configuration

### Prerequisites

#### Enable Metrics Server

```bash
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get deployment metrics-server -n kube-system
```

#### Enable Custom Metrics (Optional)

For custom metrics, install Prometheus Adapter:

```bash
# Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter
```
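
The adapter only exposes what its rules tell it to. A minimal sketch of a rules entry that would turn a counter such as `http_requests_total` into the `http_requests_per_second` pods metric used later in this document — the series name and label layout are assumptions about Veza's instrumentation:

```yaml
# values fragment for the prometheus-adapter Helm chart (sketch)
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```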

### Horizontal Pod Autoscaler

#### Backend API HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
        - type: Pods
          value: 1
          periodSeconds: 60
      selectPolicy: Min
```

#### Frontend HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-frontend-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

#### Chat Server HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-chat-server-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-chat-server
  minReplicas: 2
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

### Custom Metrics HPA

For scaling based on application-specific metrics:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa-custom
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    - type: Object
      object:
        metric:
          name: queue_depth
        describedObject:
          apiVersion: v1
          kind: Service
          name: veza-backend-api
        target:
          type: Value
          value: "50"
```
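
The overview also lists external metrics — values that live outside the cluster, such as a managed queue's backlog. A hedged sketch of an `External` metric entry for the `metrics:` list above; the metric name and selector are illustrative, not part of Veza's current configuration:

```yaml
# Additional metrics: entry (sketch)
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: "veza-jobs"
    target:
      type: AverageValue
      averageValue: "30"
```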

### Vertical Pod Autoscaler

#### Backend API VPA

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: veza-backend-api-vpa
  namespace: veza-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  updatePolicy:
    updateMode: "Auto"  # Options: Off, Initial, Auto, Recreate
  resourcePolicy:
    containerPolicies:
      - containerName: backend-api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```

## Scaling Strategies

### 1. CPU-Based Scaling

**Use Case**: CPU-intensive workloads

**Configuration**:
```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

**Pros**:
- Simple and reliable
- Works well for CPU-bound applications

**Cons**:
- May not reflect actual load for I/O-bound applications
- Can be slow to react to sudden spikes

### 2. Memory-Based Scaling

**Use Case**: Memory-intensive workloads

**Configuration**:
```yaml
metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```

**Pros**:
- Prevents OOM kills
- Good for memory-intensive applications

**Cons**:
- Memory usage is less predictable than CPU
- May trigger unnecessary scaling, since many runtimes hold on to memory after load drops

### 3. Custom Metrics Scaling

**Use Case**: Application-specific scaling needs

**Configuration**:
```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

**Pros**:
- Scales based on actual application load
- More responsive to traffic patterns

**Cons**:
- Requires custom metrics infrastructure
- More complex setup

### 4. Multi-Metric Scaling

**Use Case**: Complex scaling requirements

**Configuration**:
```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

**Behavior**: HPA computes a desired replica count for each metric and applies the highest one.
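
That selection rule can be sketched with the ceiling formula applied per metric; the observed values below are illustrative:

```shell
#!/bin/sh
# Per-metric desired counts via ceil(current * observed / target); the HPA takes the max.
current=4
cpu_desired=$(( (current * 90 + 69) / 70 ))     # CPU at 90% vs 70% target  -> 6
mem_desired=$(( (current * 60 + 79) / 80 ))     # memory at 60% vs 80%      -> 3
rps_desired=$(( (current * 180 + 99) / 100 ))   # 180 req/s vs 100 target   -> 8
desired=$cpu_desired
if [ "$mem_desired" -gt "$desired" ]; then desired=$mem_desired; fi
if [ "$rps_desired" -gt "$desired" ]; then desired=$rps_desired; fi
echo "$desired"  # 8: the request-rate metric wins
```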

## Best Practices

### 1. Set Appropriate Min/Max Replicas

```yaml
minReplicas: 3    # Ensure high availability
maxReplicas: 20   # Prevent runaway scaling
```

### 2. Use Stabilization Windows

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60    # Wait 60s before scaling up
  scaleDown:
    stabilizationWindowSeconds: 300   # Wait 5min before scaling down
```

### 3. Configure Scaling Policies

```yaml
behavior:
  scaleUp:
    policies:
      - type: Percent
        value: 100          # Can double the replica count
        periodSeconds: 15
      - type: Pods
        value: 4            # Or add at most 4 pods
        periodSeconds: 15
    selectPolicy: Max       # Use the more aggressive policy
```

### 4. Monitor Scaling Events

```bash
# Watch HPA status
kubectl get hpa -n veza-production -w

# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production
```

### 5. Test Scaling Behavior

```bash
# Generate load to test scaling
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://veza-backend-api:8080/api/v1/tracks; done"

# Watch scaling
kubectl get hpa veza-backend-api-hpa -n veza-production -w
```

## Cluster Autoscaler

### AWS EKS

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/veza-cluster
          env:
            - name: AWS_REGION
              value: us-east-1
            - name: AWS_STS_REGIONAL_ENDPOINTS
              value: regional
```

### GCP GKE

GKE has built-in cluster autoscaling. Enable it when creating the cluster:

```bash
gcloud container clusters create veza-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=10 \
  --zone=us-central1-a
```

### Azure AKS

```bash
az aks create \
  --resource-group veza-rg \
  --name veza-cluster \
  --node-count 3 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10
```

## Monitoring

### Metrics to Monitor

- **Current Replicas**: Number of pods currently running
- **Desired Replicas**: Target number of pods
- **CPU Utilization**: Current CPU usage
- **Memory Utilization**: Current memory usage
- **Scaling Events**: Scale-up/down events
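
One condition worth alerting on is an HPA pinned at its ceiling, since at that point further load no longer adds capacity. A sketch using kube-state-metrics series; it assumes the Prometheus Operator's `PrometheusRule` CRD is installed, and the rule name is illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: veza-autoscaling-alerts   # illustrative name
  namespace: veza-production
spec:
  groups:
    - name: autoscaling
      rules:
        - alert: HPAAtMaxReplicas
          expr: |
            kube_horizontalpodautoscaler_status_current_replicas
              == kube_horizontalpodautoscaler_spec_max_replicas
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} has been at maxReplicas for 15 minutes"
```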

### Prometheus Queries

```promql
# Current replicas
kube_horizontalpodautoscaler_status_current_replicas

# Desired replicas
kube_horizontalpodautoscaler_status_desired_replicas

# CPU utilization as seen by the HPA
kube_horizontalpodautoscaler_status_current_metrics{metric_name="cpu"}

# Memory utilization as seen by the HPA
kube_horizontalpodautoscaler_status_current_metrics{metric_name="memory"}

# HPA health (AbleToScale status condition)
kube_horizontalpodautoscaler_status_condition{condition="AbleToScale"}
```

### Grafana Dashboard

Create a dashboard to visualize:
- Replica count over time
- CPU/memory utilization
- Scaling events
- HPA status

## Troubleshooting

### HPA Not Scaling

```bash
# Check HPA status
kubectl get hpa veza-backend-api-hpa -n veza-production

# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production

# Verify metrics are available
kubectl top pods -n veza-production

# Check metrics-server
kubectl get deployment metrics-server -n kube-system
kubectl logs -n kube-system deployment/metrics-server
```

### Scaling Too Aggressively

- Increase `stabilizationWindowSeconds` in the HPA's `behavior` section.
- Add (or tighten) scaling policies with lower `Percent`/`Pods` values to cap the rate of change.

### Scaling Too Slowly

- Decrease `stabilizationWindowSeconds` for scale-up.
- Use more aggressive scale-up policies: raise the `Percent` or `Pods` values.
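
For the scale-up case, those two knobs combine into a `behavior` block like this sketch, which reacts immediately and lets the replica count triple within one period (the values are illustrative, not tuned for Veza):

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # act on the latest recommendation immediately
    policies:
      - type: Percent
        value: 200                  # allow adding up to 200% of current replicas per period
        periodSeconds: 15
```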

### VPA Not Working

```bash
# Check VPA status
kubectl get vpa -n veza-production

# Check VPA recommendations
kubectl describe vpa veza-backend-api-vpa -n veza-production

# Verify the VPA admission controller is running
kubectl get deployment vpa-admission-controller -n kube-system
```

## References

- [Kubernetes HPA Documentation](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
- [Kubernetes VPA Documentation](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler)
- [Cluster Autoscaler Documentation](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
- [Metrics Server](https://github.com/kubernetes-sigs/metrics-server)