# Auto-Scaling Configuration for Veza Platform
This directory contains configurations for automatic scaling of the Veza platform based on load, including Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler.
## Overview
Veza uses multiple layers of auto-scaling:
1. **Horizontal Pod Autoscaler (HPA)**: Scales pods based on CPU, memory, and custom metrics
2. **Vertical Pod Autoscaler (VPA)**: Adjusts resource requests and limits automatically
3. **Cluster Autoscaler**: Scales cluster nodes based on pod scheduling requirements
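Once deployed, each layer can be inspected directly. A minimal check, assuming the object and deployment names used later in this document:
```bash
# HPA objects across all namespaces
kubectl get hpa -A

# VPA objects (requires the VPA CRDs to be installed)
kubectl get vpa -A

# Cluster Autoscaler (self-managed installs, e.g. on EKS)
kubectl get deployment cluster-autoscaler -n kube-system
```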
## Architecture
```
┌──────────────────────────────────────────────────────┐
│                   Metrics Sources                    │
│ ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐ │
│ │Prometheus│ │Metrics API│ │ Custom   │ │ External │ │
│ │          │ │           │ │ Metrics  │ │ Metrics  │ │
│ └────┬─────┘ └─────┬─────┘ └────┬─────┘ └────┬─────┘ │
└──────┼─────────────┼────────────┼────────────┼───────┘
       │             │            │            │
       └─────────────┴─────┬──────┴────────────┘
                           │
             ┌─────────────▼─────────────┐
             │ Horizontal Pod Autoscaler │
             │           (HPA)           │
             └─────────────┬─────────────┘
             ┌─────────────▼─────────────┐
             │      Kubernetes API       │
             └─────────────┬─────────────┘
             ┌─────────────▼─────────────┐
             │        Deployment         │
             │   (Scale Pods Up/Down)    │
             └───────────────────────────┘
```
## Components
### 1. Horizontal Pod Autoscaler (HPA)
HPA automatically scales the number of pods in a deployment based on observed metrics.
**Scaling Triggers**:
- CPU utilization
- Memory utilization
- Custom metrics (request rate, queue depth, etc.)
- External metrics (from Prometheus, etc.)
**Scaling Behavior**:
- **Scale Up**: Aggressive scaling when load increases
- **Scale Down**: Conservative scaling to prevent thrashing
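Internally, the HPA derives the desired replica count from the ratio of the observed metric to its target; this is the algorithm documented for `autoscaling/v2`:
```
desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)

Example: 5 replicas at 90% average CPU against a 70% target:
ceil(5 × 90 / 70) = ceil(6.43) = 7 replicas
```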
### 2. Vertical Pod Autoscaler (VPA)
VPA automatically adjusts resource requests and limits for pods based on historical usage.
**Modes**:
- **Off**: Compute and publish recommendations only; never apply them
- **Initial**: Apply the recommendation only when a pod is created
- **Auto**: Apply recommendations automatically (currently by evicting and recreating pods)
- **Recreate**: Evict and recreate pods whenever the recommendation changes significantly
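A common rollout pattern is to start in `Off` mode, review the recommendations, and only then let VPA act. A sketch, assuming the `veza-backend-api-vpa` object defined later in this README:
```bash
# Inspect what the recommender would set before letting VPA act
kubectl describe vpa veza-backend-api-vpa -n veza-production

# Switch from Off to Auto once the recommendations look sane
kubectl patch vpa veza-backend-api-vpa -n veza-production \
  --type merge -p '{"spec":{"updatePolicy":{"updateMode":"Auto"}}}'
```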
### 3. Cluster Autoscaler
Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster by adding or removing nodes.
**Triggers**:
- Pods that cannot be scheduled due to insufficient resources
- Nodes that are underutilized for extended periods
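The autoscaler records its view of every node group in a ConfigMap, which is the quickest way to see why it did or did not act:
```bash
# Per-node-group health plus recent scale-up/scale-down activity
kubectl describe configmap cluster-autoscaler-status -n kube-system
```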
## Configuration
### Prerequisites
#### Enable Metrics Server
```bash
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify installation
kubectl get deployment metrics-server -n kube-system
```
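If the deployment is healthy but HPAs still report `<unknown>` targets, confirm the metrics pipeline actually returns data:
```bash
# Both should print live usage numbers once metrics-server is serving
kubectl top nodes
kubectl top pods -n veza-production
```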
#### Enable Custom Metrics (Optional)
For custom metrics, install Prometheus Adapter:
```bash
# Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter
```
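To confirm the adapter is serving metrics, query the custom metrics API directly (piped through `jq` for readability); which metric names appear depends on the adapter's rule configuration:
```bash
# Lists every metric the adapter currently exposes
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
```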
### Horizontal Pod Autoscaler
#### Backend API HPA
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
        - type: Pods
          value: 1
          periodSeconds: 60
      selectPolicy: Min
```
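To roll this out, apply the manifest and watch the status converge; `backend-api-hpa.yaml` is a hypothetical filename for the manifest above:
```bash
kubectl apply -f backend-api-hpa.yaml
# TARGETS should show live percentages (e.g. 41%/70%) once metrics flow
kubectl get hpa veza-backend-api-hpa -n veza-production -w
```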
#### Frontend HPA
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-frontend-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
### Custom Metrics HPA
For scaling based on application-specific metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-api-hpa-custom
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    - type: Object
      object:
        metric:
          name: queue_depth
        describedObject:
          apiVersion: v1
          kind: Service
          name: veza-backend-api
        target:
          type: Value
          value: "50"
```
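Before relying on `http_requests_per_second`, verify that the Prometheus Adapter actually exposes it for the backend pods; whether it does depends on the adapter's rules:
```bash
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/veza-production/pods/*/http_requests_per_second"
```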
### Vertical Pod Autoscaler
#### Backend API VPA
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: veza-backend-api-vpa
  namespace: veza-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend-api
  updatePolicy:
    updateMode: "Auto"  # Options: Off, Initial, Auto, Recreate
  resourcePolicy:
    containerPolicies:
      - containerName: backend-api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```
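Once the VPA has observed some load, its recommendation is published on the object itself. A quick way to pull out just the numbers:
```bash
# Full recommendation (target, lowerBound, upperBound per container)
kubectl get vpa veza-backend-api-vpa -n veza-production \
  -o jsonpath='{.status.recommendation.containerRecommendations}'
```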
## Scaling Strategies
### 1. CPU-Based Scaling
**Use Case**: CPU-intensive workloads
**Configuration**:
```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
**Pros**:
- Simple and reliable
- Works well for CPU-bound applications
**Cons**:
- May not reflect actual load for I/O-bound applications
- Can be slow to react to sudden spikes
### 2. Memory-Based Scaling
**Use Case**: Memory-intensive workloads
**Configuration**:
```yaml
metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```
**Pros**:
- Prevents OOM kills
- Good for memory-intensive applications
**Cons**:
- Memory usage can be less predictable
- May scale unnecessarily
### 3. Custom Metrics Scaling
**Use Case**: Application-specific scaling needs
**Configuration**:
```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```
**Pros**:
- Scales based on actual application load
- More responsive to traffic patterns
**Cons**:
- Requires custom metrics infrastructure
- More complex setup
### 4. Multi-Metric Scaling
**Use Case**: Complex scaling requirements
**Configuration**:
```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```
**Behavior**: The HPA evaluates every metric independently and uses whichever proposal yields the most replicas. For example, if CPU calls for 5 replicas, memory for 4, and request rate for 8, the desired count becomes 8.
## Best Practices
### 1. Set Appropriate Min/Max Replicas
```yaml
minReplicas: 3   # Ensure high availability
maxReplicas: 20  # Prevent runaway scaling
```
### 2. Use Stabilization Windows
```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60   # Wait 60s before scaling up
  scaleDown:
    stabilizationWindowSeconds: 300  # Wait 5min before scaling down
```
### 3. Configure Scaling Policies
```yaml
behavior:
  scaleUp:
    policies:
      - type: Percent
        value: 100          # Can double replicas
        periodSeconds: 15
      - type: Pods
        value: 4            # Or add 4 pods max
        periodSeconds: 15
    selectPolicy: Max       # Use the more aggressive policy
```
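Worked through with 10 current replicas: the `Percent` policy permits adding up to 10 pods per 15-second period, the `Pods` policy permits 4, and `selectPolicy: Max` picks the larger, so the HPA may add up to 10 pods in that period.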
### 4. Monitor Scaling Events
```bash
# Watch HPA status
kubectl get hpa -n veza-production -w
# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production
```
### 5. Test Scaling Behavior
```bash
# Generate load to test scaling
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://veza-backend-api:8080/api/v1/tracks; done"
# Watch scaling
kubectl get hpa veza-backend-api-hpa -n veza-production -w
```
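Note that a single busybox loop issues requests sequentially and may never push average CPU past the 70% threshold on its own; if the HPA does not react, run several load-generator pods in parallel or use a dedicated load-testing tool.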
## Cluster Autoscaler
### AWS EKS
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/veza-cluster
          env:
            - name: AWS_REGION
              value: us-east-1
            - name: AWS_STS_REGIONAL_ENDPOINTS
              value: regional
```
### GCP GKE
GKE has built-in cluster autoscaling. Enable it when creating the cluster:
```bash
gcloud container clusters create veza-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=10 \
  --zone=us-central1-a
```
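Autoscaling can also be enabled on an existing cluster's default node pool with the matching `update` command:
```bash
gcloud container clusters update veza-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=10 \
  --zone=us-central1-a
```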
### Azure AKS
```bash
az aks create \
  --resource-group veza-rg \
  --name veza-cluster \
  --node-count 3 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10
```
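The same settings can be turned on for an existing AKS cluster:
```bash
az aks update \
  --resource-group veza-rg \
  --name veza-cluster \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10
```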
## Monitoring
### Metrics to Monitor
- **Current Replicas**: Number of pods currently running
- **Desired Replicas**: Target number of pods
- **CPU Utilization**: Current CPU usage
- **Memory Utilization**: Current memory usage
- **Scaling Events**: Scale up/down events
### Prometheus Queries
```promql
# Current replicas
kube_horizontalpodautoscaler_status_current_replicas
# Desired replicas
kube_horizontalpodautoscaler_status_desired_replicas
# CPU utilization
kube_horizontalpodautoscaler_status_current_metrics{metric_name="cpu"}
# Memory utilization
kube_horizontalpodautoscaler_status_current_metrics{metric_name="memory"}
# Scaling events
kube_horizontalpodautoscaler_status_condition{condition="AbleToScale"}
```
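These metrics can also back an alert for an HPA pinned at its ceiling, which usually means `maxReplicas` is too low for the current load. A sketch of a Prometheus rule, assuming kube-state-metrics label names:
```yaml
groups:
  - name: autoscaling
    rules:
      - alert: HPAAtMaxReplicas
        expr: |
          kube_horizontalpodautoscaler_status_current_replicas
            >= kube_horizontalpodautoscaler_spec_max_replicas
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.horizontalpodautoscaler }} has been at maxReplicas for 15 minutes"
```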
### Grafana Dashboard
Create a dashboard to visualize:
- Replica count over time
- CPU/Memory utilization
- Scaling events
- HPA status
## Troubleshooting
### HPA Not Scaling
```bash
# Check HPA status
kubectl get hpa veza-backend-api-hpa -n veza-production
# Check HPA events
kubectl describe hpa veza-backend-api-hpa -n veza-production
# Verify metrics are available
kubectl top pods -n veza-production
# Check metrics-server
kubectl get deployment metrics-server -n kube-system
kubectl logs -n kube-system deployment/metrics-server
```
### Scaling Too Aggressively
- Increase `stabilizationWindowSeconds` under `behavior` (especially for `scaleUp`)
- Add scaling policies with lower `Percent` or `Pods` values to cap the scaling rate
### Scaling Too Slowly
- Decrease `stabilizationWindowSeconds`
- Use more aggressive policies (higher `Percent` or `Pods` values)
### VPA Not Working
```bash
# Check VPA status
kubectl get vpa -n veza-production
# Check VPA recommendations
kubectl describe vpa veza-backend-api-vpa -n veza-production
# Verify VPA admission controller is installed
kubectl get deployment vpa-admission-controller -n kube-system
```
## References
- [Kubernetes HPA Documentation](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
- [Kubernetes VPA Documentation](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler)
- [Cluster Autoscaler Documentation](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
- [Metrics Server](https://github.com/kubernetes-sigs/metrics-server)