📖 PRODUCTION GUIDE - VEZA RUST MODULES
Complete guide to deploying and operating the Veza Rust modules in production
Version: 2.0 Production-Ready
Last updated: July 1, 2025
🎯 SYSTEM OVERVIEW
Production Architecture
┌─────────────────────────────────────────┐
│ VEZA PLATFORM │
├─────────────────┬───────────────────────┤
│ CHAT SERVER │ STREAM SERVER │
│ (Rust) │ (Rust) │
├─────────────────┼───────────────────────┤
│ • 100k+ WS │ • 10k+ Streams │
│ • <10ms latency │ • 100k+ Listeners │
│ • E2E Encryption│ • Adaptive Bitrate │
│ • AI Moderation │ • Real-time Effects │
└─────────────────┴───────────────────────┘
                  │
          ┌───────┴───────┐
          │  BACKEND GO   │
          │ (API Gateway) │
          └───────────────┘
Technical Specifications
- Performance: 100k+ simultaneous WebSocket connections
- Latency: <10ms P99 for messages, <50ms for streaming
- Throughput: 10k+ requests/second per instance
- Availability: 99.99% uptime target
- Scalability: horizontal with auto-scaling
🔧 PRODUCTION CONFIGURATION
Environment Variables
# === CORE CONFIG ===
RUST_LOG=info
ENVIRONMENT=production
SERVICE_NAME=veza-stream-server
VERSION=2.0.0
# === NETWORK ===
HOST=0.0.0.0
PORT=8080
WS_PORT=8081
GRPC_PORT=50051
# === DATABASE ===
DATABASE_URL=postgresql://veza:secure_pass@postgres:5432/veza_prod
DATABASE_POOL_SIZE=100
DATABASE_TIMEOUT_MS=5000
# === REDIS ===
REDIS_URL=redis://redis:6379
REDIS_POOL_SIZE=50
REDIS_TTL_DEFAULT=3600
# === MONITORING ===
PROMETHEUS_PORT=9090
JAEGER_ENDPOINT=http://jaeger:14268/api/traces
METRICS_ENABLED=true
TRACING_ENABLED=true
# === PERFORMANCE ===
MAX_CONNECTIONS=100000
WORKER_THREADS=16
BLOCKING_THREADS=32
MEMORY_LIMIT_MB=8192
# === SECURITY ===
JWT_SECRET=your_production_jwt_secret_here
ENCRYPTION_KEY=your_32_byte_encryption_key_here
RATE_LIMIT_PER_MINUTE=1000
ENABLE_CORS=true
ALLOWED_ORIGINS=https://veza.live,https://app.veza.live
# === AUDIO STREAMING ===
MAX_STREAMS=10000
MAX_LISTENERS_PER_STREAM=10000
ADAPTIVE_BITRATE=true
DEFAULT_BITRATE=128
CODECS_ENABLED=opus,aac,mp3
# === CHAT ===
MAX_MESSAGE_SIZE=8192
MESSAGE_HISTORY_LIMIT=1000
MODERATION_ENABLED=true
E2E_ENCRYPTION=optional
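At startup these variables can be read with typed fallbacks so a bad value fails fast instead of surfacing at runtime. A minimal std-only sketch (the `env_or` helper and the defaults wired here are illustrative, not the actual Veza loader):

```rust
use std::env;

// Illustrative sketch (not the actual Veza code): read a variable,
// fall back to its documented default when unset, and fail fast on
// unparsable values.
fn env_or<T: std::str::FromStr>(key: &str, default: T) -> Result<T, String> {
    match env::var(key) {
        Ok(v) => v.parse().map_err(|_| format!("invalid value for {key}: {v}")),
        Err(_) => Ok(default), // unset: use the documented default
    }
}

fn main() {
    // Defaults mirror the values documented above.
    let port: u16 = env_or("PORT", 8080).expect("PORT");
    let max_conns: usize = env_or("MAX_CONNECTIONS", 100_000).expect("MAX_CONNECTIONS");
    let workers: usize = env_or("WORKER_THREADS", 16).expect("WORKER_THREADS");
    println!("port={port} max_connections={max_conns} workers={workers}");
}
```

In practice a crate such as `config` or `figment` would handle layered sources; the point is simply that every variable has an explicit type, default, and failure mode.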
Recommended Limits and Quotas
production_limits:
  # CPU & Memory
  cpu_request: "2000m"    # 2 CPU cores minimum
  cpu_limit: "8000m"      # 8 CPU cores maximum
  memory_request: "4Gi"   # 4 GB RAM minimum
  memory_limit: "16Gi"    # 16 GB RAM maximum
  # Network
  max_connections: 100000
  bandwidth_limit: "1Gbps"
  # Storage
  ephemeral_storage: "10Gi"
  logs_retention: "30d"
  # Application
  max_message_rate: 100   # messages/second/user
  max_file_upload: "200MB"
  concurrent_streams: 10000
🚀 PRODUCTION DEPLOYMENT
1. Infrastructure Prerequisites
- Kubernetes 1.25+ or Docker Swarm
- PostgreSQL 14+ with high availability
- Redis 7.0+ in cluster mode
- Load balancer with SSL termination
- Monitoring: Prometheus + Grafana stack
2. Health Checks
healthcheck:
  readiness:
    path: /health/ready
    port: 8080
    timeout: 5s
    period: 10s
  liveness:
    path: /health/live
    port: 8080
    timeout: 5s
    period: 30s
    failure_threshold: 3
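Behind these endpoints, readiness is typically a shared flag the server flips off during startup, dependency outages, and graceful shutdown, so the load balancer drains traffic without the pod being killed. A minimal sketch (the `HealthState` type is hypothetical, not the actual Veza handler code):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical shared state behind /health/ready. Liveness stays true
// while the process runs; readiness is toggled so new traffic stops
// being routed without restarting the pod.
#[derive(Clone, Default)]
struct HealthState {
    ready: Arc<AtomicBool>,
}

impl HealthState {
    fn set_ready(&self, ready: bool) {
        self.ready.store(ready, Ordering::SeqCst);
    }
    // HTTP status /health/ready would return: 200 when ready, 503 otherwise.
    fn readiness_status(&self) -> u16 {
        if self.ready.load(Ordering::SeqCst) { 200 } else { 503 }
    }
}

fn main() {
    let health = HealthState::default();
    assert_eq!(health.readiness_status(), 503); // not ready until startup completes
    health.set_ready(true);
    assert_eq!(health.readiness_status(), 200);
}
```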
3. Graceful Shutdown
// Graceful shutdown with a 30-second drain window
tokio::select! {
    _ = signal::ctrl_c() => {
        info!("🛑 Graceful shutdown initiated");
        // 1. Stop accepting new connections
        server.stop_accepting().await;
        // 2. Wait for existing connections to finish (max 30s)
        let _ = timeout(Duration::from_secs(30),
            server.wait_for_connections()).await;
        // 3. Force close any remaining connections
        server.force_close().await;
        info!("✅ Graceful shutdown completed");
    }
}
📊 MONITORING & ALERTING
Key Metrics to Watch
Performance Metrics
# Latency (target: P99 < 50ms)
http_request_duration_seconds{quantile="0.99"} < 0.05
# Throughput (target: > 10k req/s)
rate(http_requests_total[1m]) > 10000
# Error Rate (target: < 0.1%)
rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) < 0.001
Resource Metrics
# CPU Usage (alert: > 80%)
cpu_usage_percent > 80
# Memory Usage (alert: > 85%)
memory_usage_percent > 85
# Connection Count (alert: > 90k)
websocket_connections_active > 90000
Business Metrics
# Active users (alert: unusual drop of more than 1k in 5m)
# (delta() rather than increase(): active_users_total is a gauge, and
# increase() only applies to counters, which never go negative)
delta(active_users_total[5m]) < -1000
# Message Success Rate (alert: < 99.9%)
message_delivery_success_rate < 0.999
# Stream Quality (alert: > 5% degraded)
stream_quality_degraded_percent > 5
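The error-rate ratio above can be mirrored in-process with a pair of counters, which is handy for local smoke checks. A simplified sketch (illustrative only; production should export raw counters and let Prometheus evaluate `rate()` server-side):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Simplified in-process mirror of the 5xx / total ratio above.
#[derive(Default)]
struct RequestStats {
    total: AtomicU64,
    errors: AtomicU64,
}

impl RequestStats {
    fn record(&self, status: u16) {
        self.total.fetch_add(1, Ordering::Relaxed);
        if (500..600).contains(&status) {
            self.errors.fetch_add(1, Ordering::Relaxed);
        }
    }
    // 5xx responses divided by all responses; the SLO above keeps this < 0.001.
    fn error_rate(&self) -> f64 {
        let total = self.total.load(Ordering::Relaxed);
        if total == 0 {
            return 0.0;
        }
        self.errors.load(Ordering::Relaxed) as f64 / total as f64
    }
}

fn main() {
    let stats = RequestStats::default();
    for _ in 0..999 {
        stats.record(200);
    }
    stats.record(503);
    println!("error rate = {}", stats.error_rate()); // 1/1000, right at the SLO
}
```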
Alerting Rules
groups:
  - name: veza-rust-modules
    rules:
      - alert: HighLatency
        # histogram_quantile() expects per-bucket rates, not the raw histogram
        expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High latency detected"
      - alert: HighErrorRate
        expr: rate(http_errors_total[5m]) / rate(http_requests_total[5m]) > 0.01
        for: 1m
        labels:
          severity: critical
      - alert: ServiceDown
        expr: up == 0
        for: 30s
        labels:
          severity: critical
🔒 PRODUCTION SECURITY
1. Network Security
- TLS 1.3 required for all connections
- Certificate pinning for inter-service communication
- Restrictive Kubernetes network policies
- DDoS protection with intelligent rate limiting
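The rate limiting mentioned above is commonly implemented as a per-client token bucket. A minimal sketch (parameters chosen to match `RATE_LIMIT_PER_MINUTE=1000` from the configuration; not the Veza implementation):

```rust
use std::time::Duration;

// Minimal per-client token bucket: tokens refill at `rate` per second
// up to `burst`; each allowed request consumes one token.
struct TokenBucket {
    tokens: f64,
    burst: f64,
    rate: f64, // tokens per second
}

impl TokenBucket {
    fn new(rate: f64, burst: f64) -> Self {
        Self { tokens: burst, burst, rate }
    }
    // `elapsed` is the time since the previous call; a real server would
    // derive it from Instant::now().
    fn allow(&mut self, elapsed: Duration) -> bool {
        self.tokens = (self.tokens + elapsed.as_secs_f64() * self.rate).min(self.burst);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // RATE_LIMIT_PER_MINUTE=1000 ≈ 16.7 requests/second, with a burst of 5
    let mut bucket = TokenBucket::new(1000.0 / 60.0, 5.0);
    let allowed = (0..6).filter(|_| bucket.allow(Duration::ZERO)).count();
    println!("allowed {allowed} of 6 back-to-back requests"); // 5 of 6
}
```

The burst parameter is what makes the limiting "intelligent" in practice: short spikes pass, sustained floods do not.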
2. Data Protection
// Encryption at rest (illustrative API; use a vetted AEAD crate such as ring or aes-gcm)
let encryption_key = load_key_from_vault().await?;
let encrypted_data = AES_256_GCM.encrypt(&data, &encryption_key)?;
// Encryption in transit
let tls_config = TlsConfig::builder()
    .cert_file("/certs/server.crt")
    .key_file("/certs/server.key")
    .min_tls_version(TlsVersion::TLSv1_3)
    .build()?;
3. Authentication & Authorization
- JWT with automatic rotation (24h)
- Granular per-resource RBAC
- API keys with limited scopes
- Mandatory 2FA for privileged accounts
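Granular per-resource RBAC ultimately reduces to checking a role's grants against a `resource:action` pair. A toy sketch (role and permission names are invented for the example; the real model lives server-side):

```rust
use std::collections::{HashMap, HashSet};

// Toy illustration of per-resource RBAC: each role maps to a set of
// "resource:action" grants.
struct Rbac {
    roles: HashMap<String, HashSet<String>>,
}

impl Rbac {
    fn grant(&mut self, role: &str, resource: &str, action: &str) {
        self.roles
            .entry(role.to_string())
            .or_default()
            .insert(format!("{resource}:{action}"));
    }
    fn is_allowed(&self, role: &str, resource: &str, action: &str) -> bool {
        self.roles
            .get(role)
            .map_or(false, |perms| perms.contains(&format!("{resource}:{action}")))
    }
}

fn main() {
    let mut rbac = Rbac { roles: HashMap::new() };
    rbac.grant("moderator", "message", "delete");
    println!("{}", rbac.is_allowed("moderator", "message", "delete")); // true
    println!("{}", rbac.is_allowed("moderator", "stream", "stop"));    // false
}
```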
🔄 MAINTENANCE & OPERATIONS
1. Rolling Updates
# 1. Update the image version
kubectl set image deployment/stream-server \
  stream-server=veza/stream-server:v2.1.0
# 2. Monitor the rollout
kubectl rollout status deployment/stream-server
# 3. Validate health
kubectl get pods -l app=stream-server
2. Backup & Recovery
- Database: point-in-time recovery (PITR) with PostgreSQL
- Configuration: GitOps with automatic validation
- Logs: 30-day retention with S3 archiving
- Metrics: 1-year retention with downsampling
3. Scaling Operations
# Horizontal scaling
kubectl scale deployment/stream-server --replicas=10
# Autoscaling (HPA — note this scales horizontally, on a CPU target)
kubectl autoscale deployment/stream-server \
  --cpu-percent=70 --min=5 --max=50
🚨 INCIDENT RUNBOOKS
Incident 1: High Latency
# 1. Quick diagnosis
kubectl top pods -l app=stream-server
kubectl logs -l app=stream-server --tail=100
# 2. Immediate scaling if CPU > 80%
kubectl scale deployment/stream-server --replicas=20
# 3. Investigation
kubectl exec -it stream-server-xxx -- /bin/bash
htop       # Check CPU/memory
ss -tulpn  # Check network connections
Incident 2: Service Down
# 1. Quick restart
kubectl rollout restart deployment/stream-server
# 2. Check dependencies
kubectl get pods -l app=postgres
kubectl get pods -l app=redis
# 3. Traffic rerouting
kubectl patch service/stream-server -p '{"spec":{"selector":{"app":"stream-server-backup"}}}'
Incident 3: Memory Leak
# 1. Memory profiling (run curl inside the pod, redirect locally)
kubectl exec stream-server-xxx -- \
  curl -s http://localhost:9090/debug/pprof/heap > heap.prof
# 2. Graceful restart in batches
for pod in $(kubectl get pods -l app=stream-server -o name); do
  kubectl delete "$pod"
  sleep 30  # Wait for the replacement pod
done
📈 PERFORMANCE TUNING
1. OS Level Optimizations
# Network tuning
echo 'net.core.somaxconn=65535' >> /etc/sysctl.conf
echo 'net.core.netdev_max_backlog=5000' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_max_syn_backlog=65535' >> /etc/sysctl.conf
# File descriptor limits
echo '* soft nofile 1048576' >> /etc/security/limits.conf
echo '* hard nofile 1048576' >> /etc/security/limits.conf
2. Application Tuning
// Tokio runtime optimization
let rt = tokio::runtime::Builder::new_multi_thread()
    .worker_threads(num_cpus::get() * 2)
    .max_blocking_threads(512)
    .thread_stack_size(2 * 1024 * 1024) // 2 MB stack per thread
    .enable_all()
    .build()?;
3. Database Optimization
-- Connection pooling (max_connections and shared_buffers require a restart)
ALTER SYSTEM SET max_connections = 500;
ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM SET work_mem = '64MB';
-- Index optimization for chat messages
CREATE INDEX CONCURRENTLY idx_messages_room_created
    ON messages (room_id, created_at DESC)
    WHERE deleted_at IS NULL;
🔍 TROUBLESHOOTING
Common Issues
1. WebSocket Connections Dropping
# Check load balancer timeout
kubectl describe ingress stream-server
# Verify heartbeat configuration
grep -r "ping_interval" src/
# Monitor connection metrics
curl http://localhost:9090/metrics | grep websocket
2. Audio Stream Latency
// Verify buffer configuration
let buffer_config = BufferConfig {
    target_latency: Duration::from_millis(50),
    max_buffer_size: 1024 * 8, // 8 KB
    adaptive: true,
};
3. Memory Usage Growth
# Check for connection leaks
ss -s | grep tcp
lsof -p $(pgrep stream-server) | wc -l
# Monitor memory pools
curl http://localhost:9090/debug/pprof/allocs
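One way to make connection leaks visible before they show up in `ss` or `lsof` is an RAII guard around each connection, so the active count always returns to baseline when guards are dropped. An illustrative sketch (not the actual Veza connection tracking):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// RAII guard for counting live connections: incremented on open,
// decremented automatically on drop. A leak shows up as a count that
// never returns to its baseline.
struct ConnectionGuard {
    counter: Arc<AtomicUsize>,
}

impl ConnectionGuard {
    fn open(counter: &Arc<AtomicUsize>) -> Self {
        counter.fetch_add(1, Ordering::SeqCst);
        Self { counter: Arc::clone(counter) }
    }
}

impl Drop for ConnectionGuard {
    fn drop(&mut self) {
        self.counter.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let active = Arc::new(AtomicUsize::new(0));
    {
        let _a = ConnectionGuard::open(&active);
        let _b = ConnectionGuard::open(&active);
        println!("active = {}", active.load(Ordering::SeqCst)); // 2
    }
    println!("active = {}", active.load(Ordering::SeqCst)); // back to 0
}
```

Exporting this counter as the `websocket_connections_active` metric mentioned above closes the loop between code and alerting.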
📋 CHECKLIST PRODUCTION
Pre-Deployment
- Load testing completed (100k+ connections)
- Security audit passed
- Monitoring configured
- Backup strategy validated
- Disaster recovery tested
Post-Deployment
- Health checks passing
- Metrics collecting properly
- Logs flowing to aggregation
- Alerts configured and tested
- Performance within targets
Weekly Maintenance
- Check resource utilization trends
- Review error logs
- Update security patches
- Validate backup integrity
- Performance regression testing
🎯 This documentation supports a robust, maintainable production deployment of the Veza Rust modules.