# ORIGIN_DEPLOYMENT_GUIDE.md

## 📋 EXECUTIVE SUMMARY

This document is the complete production deployment guide for the Veza platform. It covers Infrastructure as Code (Terraform/Ansible), containerization (Docker/Incus), orchestration (Kubernetes), CI/CD pipelines, zero-downtime strategies, disaster recovery, monitoring, and the operational procedures required for secure, automated, and reversible deployments over a 24-month horizon.

## 🎯 OBJECTIVES

### Primary Objective

Establish an automated, secure, reproducible, zero-downtime production deployment process with rollback in under 5 minutes, multiple deployments per day, and an RTO under 4 hours in case of disaster.

### Secondary Objectives

- Full automation (Infrastructure as Code)
- Zero-downtime deployments (blue-green, canary)
- Automatic rollback on failure (< 5 min)
- Operational disaster recovery plan (RTO < 4 h, RPO < 1 h)
- Real-time monitoring and alerting (Prometheus + Grafana)

## 📖 TABLE OF CONTENTS

1. [Deployment Philosophy](#1-deployment-philosophy)
2. [Infrastructure as Code](#2-infrastructure-as-code)
3. [Containerization](#3-containerization)
4. [Kubernetes Orchestration](#4-kubernetes-orchestration)
5. [CI/CD Pipelines](#5-cicd-pipelines)
6. [Zero-Downtime Strategies](#6-zero-downtime-strategies)
7. [Configuration Management](#7-configuration-management)
8. [Secrets Management](#8-secrets-management)
9. [Monitoring & Observability](#9-monitoring--observability)
10. [Backup & Disaster Recovery](#10-backup--disaster-recovery)
11. [Scaling Strategy](#11-scaling-strategy)
12. [Operational Procedures](#12-operational-procedures)
13. [Priority Security Fixes](#13-priority-security-fixes)
14. [Ethical Deployment Checklist](#14-checklist-de-déploiement-éthique)
15. [JWT HS256 → RS256 Migration Plan](#15-plan-de-migration-jwt-hs256--rs256)

## 🔒 IMMUTABLE RULES

1. **Infrastructure as Code**: 100% of infrastructure is versioned (Terraform); no manual changes
2. **Immutable Infrastructure**: never modify existing servers; always redeploy
3. **Zero Downtime**: no deployment may interrupt service (blue-green or canary mandatory)
4. **Automated Rollback**: automatic rollback if health checks fail (< 5 min)
5. **Version Control**: all configs versioned in Git, without exception
6. **Secrets in Vault**: no plaintext secrets (HashiCorp Vault or equivalent)
7. **Testing in Staging**: every deployment is tested in staging first
8. **Monitoring Required**: alerting configured before going to production
9. **Backup Verification**: backups tested monthly (restore test)
10. **Documentation**: runbooks kept up to date for all critical procedures

## 1. DEPLOYMENT PHILOSOPHY

### 1.1 Deployment Principles

**Twelve-Factor App**:
1. **Codebase**: One codebase tracked in Git, many deploys
2. **Dependencies**: Explicitly declare and isolate (go.mod, Cargo.lock, package-lock.json)
3. **Config**: Store config in environment (never in code)
4. **Backing Services**: Treat as attached resources (DB, Redis, S3)
5. **Build, Release, Run**: Strictly separate build and run stages
6. **Processes**: Execute app as stateless processes
7. **Port Binding**: Export services via port binding
8. **Concurrency**: Scale out via process model
9. **Disposability**: Fast startup and graceful shutdown
10. **Dev/Prod Parity**: Keep development, staging, production similar
11. **Logs**: Treat logs as event streams
12. **Admin Processes**: Run admin/management tasks as one-off processes
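Factor 3 (config in the environment) tends to fail silently when a variable is missing. A minimal fail-fast startup check, as a sketch; the variable names below are illustrative, not the actual Veza configuration:

```shell
# Sketch: abort startup early if required environment config is absent.
# Variable names are examples, not the real Veza configuration keys.
require_env() {
  local missing=0
  for var in "$@"; do
    # ${!var} is bash indirection: the value of the variable named $var
    if [ -z "${!var:-}" ]; then
      echo "missing required env var: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

An entrypoint would call, for example, `require_env DATABASE_URL REDIS_URL JWT_SECRET || exit 1` before starting the process.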
### 1.2 Deployment Environments

| Environment | Purpose | Update Frequency | Users |
|-------------|---------|------------------|-------|
| **Development** | Local development | Continuous | Developers |
| **Staging** | Pre-production testing | Daily | QA, Product Team |
| **Production** | Live users | Multiple/day | All users |

### 1.3 Deployment Workflow

```
┌─────────────┐
│   Develop   │ ─── git push ───> CI/CD Triggered
└─────────────┘
       │
       ▼
┌─────────────┐
│    Build    │ ─── Tests, Linting, Security Scan
└─────────────┘
       │
       ▼
┌─────────────┐
│   Staging   │ ─── Deploy to staging, E2E tests
└─────────────┘
       │
       ▼
┌─────────────┐
│ Production  │ ─── Blue-Green / Canary deployment
└─────────────┘
       │
       ▼
┌─────────────┐
│   Monitor   │ ─── Health checks, metrics, logs
└─────────────┘
       │
       ▼ (if issues)
┌─────────────┐
│  Rollback   │ ─── Automatic rollback < 5 min
└─────────────┘
```

## 2. INFRASTRUCTURE AS CODE

### 2.1 Terraform Configuration

**Project Structure**:
```
terraform/
├── environments/
│   ├── production/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars (encrypted)
│   │   └── outputs.tf
│   └── staging/
│       ├── main.tf
│       ├── variables.tf
│       ├── terraform.tfvars
│       └── outputs.tf
├── modules/
│   ├── compute/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── database/
│   ├── networking/
│   ├── storage/
│   └── kubernetes/
└── backend.tf (Terraform state in S3)
```

**Example: Compute Module**:
```hcl
# terraform/modules/compute/main.tf
resource "aws_instance" "app_server" {
  count         = var.instance_count
  ami           = var.ami_id
  instance_type = var.instance_type

  vpc_security_group_ids = [aws_security_group.app.id]
  subnet_id              = var.subnet_ids[count.index % length(var.subnet_ids)]

  user_data = templatefile("${path.module}/user_data.sh", {
    environment = var.environment
  })

  tags = {
    Name        = "veza-app-${var.environment}-${count.index + 1}"
    Environment = var.environment
    ManagedBy   = "Terraform"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group" "app" {
  name        = "veza-app-${var.environment}"
  description = "Security group for Veza application servers"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

**Database Module**:
```hcl
# terraform/modules/database/main.tf
resource "aws_db_instance" "postgres" {
  identifier     = "veza-db-${var.environment}"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = var.instance_class

  allocated_storage     = var.allocated_storage
  max_allocated_storage = var.max_allocated_storage
  storage_encrypted     = true
  kms_key_id            = var.kms_key_id

  db_name  = var.database_name
  username = var.master_username
  password = var.master_password # From Vault

  vpc_security_group_ids = [aws_security_group.database.id]
  db_subnet_group_name   = aws_db_subnet_group.database.name

  backup_retention_period = var.backup_retention_days
  backup_window           = "03:00-04:00"
  maintenance_window      = "mon:04:00-mon:05:00"

  multi_az                  = var.multi_az
  publicly_accessible       = false
  skip_final_snapshot       = false
  final_snapshot_identifier = "veza-db-${var.environment}-final-snapshot-${formatdate("YYYY-MM-DD-hhmm", timestamp())}"

  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]

  tags = {
    Name        = "veza-db-${var.environment}"
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}
```

**Terraform Workflow**:
```bash
# Initialize
cd terraform/environments/production
terraform init

# Plan (review changes)
terraform plan -out=tfplan

# Apply (execute changes)
terraform apply tfplan

# Destroy (cleanup)
terraform destroy
```
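In CI, `terraform plan -detailed-exitcode` distinguishes "no changes" (exit 0) from "changes pending" (exit 2) and errors (exit 1). A sketch of gating the apply step on that exit code; the `plan_gate` helper is illustrative:

```shell
# Sketch: map `terraform plan -detailed-exitcode` results to a CI decision.
# 0 = no changes, 2 = changes present (proceed to apply), anything else = error.
plan_gate() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "apply" ;;
    *) echo "error"; return 1 ;;
  esac
}
```

Typical use: `terraform plan -detailed-exitcode -out=tfplan; plan_gate $?`, applying `tfplan` only when the gate prints `apply`.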
### 2.2 Ansible Configuration

**Playbook Structure**:
```
ansible/
├── inventory/
│   ├── production/
│   │   ├── hosts.yml
│   │   └── group_vars/
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
├── playbooks/
│   ├── deploy-backend.yml
│   ├── deploy-chat-server.yml
│   ├── deploy-stream-server.yml
│   └── deploy-frontend.yml
├── roles/
│   ├── common/
│   ├── docker/
│   ├── nginx/
│   ├── postgres/
│   └── monitoring/
└── ansible.cfg
```

**Deployment Playbook**:
```yaml
# ansible/playbooks/deploy-backend.yml
---
- name: Deploy Veza Backend API
  hosts: backend_servers
  become: yes

  vars:
    app_name: veza-backend-api
    app_version: "{{ lookup('env', 'VERSION') | default('latest') }}"
    docker_image: "registry.veza.app/{{ app_name }}:{{ app_version }}"

  tasks:
    - name: Pull Docker image
      community.docker.docker_image:
        name: "{{ docker_image }}"
        source: pull

    - name: Stop old container
      community.docker.docker_container:
        name: "{{ app_name }}"
        state: stopped
      ignore_errors: yes

    - name: Remove old container
      community.docker.docker_container:
        name: "{{ app_name }}"
        state: absent
      ignore_errors: yes

    - name: Start new container
      community.docker.docker_container:
        name: "{{ app_name }}"
        image: "{{ docker_image }}"
        state: started
        restart_policy: unless-stopped
        ports:
          - "8080:8080"
        env:
          DATABASE_URL: "{{ database_url }}"
          REDIS_URL: "{{ redis_url }}"
          JWT_SECRET: "{{ jwt_secret }}"
        volumes:
          - "/var/log/{{ app_name }}:/var/log/app"
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 40s

    - name: Wait for application to be healthy
      uri:
        url: http://localhost:8080/health
        status_code: 200
      register: result
      until: result.status == 200
      retries: 10
      delay: 5

    - name: Verify deployment
      debug:
        msg: "{{ app_name }} version {{ app_version }} deployed successfully"
```
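The "Wait for application to be healthy" task above is a bounded retry loop. The same logic as plain shell, with the probe command injectable so it can be exercised without a live service; a sketch, not part of the playbook:

```shell
# Sketch: poll a health command up to N times with a fixed delay,
# mirroring the playbook's `until`/`retries`/`delay` task.
# The probe command is passed as arguments so tests can stub it.
wait_healthy() {
  local retries=$1 delay=$2
  shift 2
  local i=1
  while [ "$i" -le "$retries" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Real use would look like `wait_healthy 10 5 curl -fsS http://localhost:8080/health`.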
## 3. CONTAINERIZATION

### 3.1 Docker Images

**Multi-Stage Build (Go)**:
```dockerfile
# veza-backend-api/Dockerfile
# Stage 1: Builder
FROM golang:1.21.5-alpine3.18 AS builder

WORKDIR /app

# Copy dependencies
COPY go.mod go.sum ./
RUN go mod download

# Copy source
COPY . .

# Build binary
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -ldflags="-w -s" -o main ./cmd/api

# Stage 2: Runner
FROM alpine:3.18

# Install CA certificates for HTTPS
RUN apk --no-cache add ca-certificates

# Use /app, not /root: the non-root user below cannot read /root
WORKDIR /app

# Copy binary from builder
COPY --from=builder /app/main .

# Create non-root user
RUN addgroup -g 1000 appuser && \
    adduser -D -u 1000 -G appuser appuser

USER appuser

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD ["/app/main", "healthcheck"]

# Run
ENTRYPOINT ["./main"]
```

**Multi-Stage Build (Rust)**:
```dockerfile
# veza-chat-server/Dockerfile
FROM rust:1.75-alpine AS builder

WORKDIR /app

RUN apk add --no-cache musl-dev

# Cache dependencies with a dummy main.rs
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo "fn main() {}" > src/main.rs && cargo build --release && rm -rf src

# Copy source
COPY . .

# Touch main.rs so cargo does not keep the stale dummy binary
RUN touch src/main.rs && cargo build --release

# Stage 2: Runner
FROM alpine:3.18

WORKDIR /app

# Copy binary
COPY --from=builder /app/target/release/veza-chat-server .

# Create non-root user
RUN addgroup -g 1000 appuser && \
    adduser -D -u 1000 -G appuser appuser

USER appuser

EXPOSE 8081

HEALTHCHECK --interval=30s --timeout=10s --start-period=20s --retries=3 \
    CMD ["wget", "--quiet", "--tries=1", "--spider", "http://localhost:8081/health"]

ENTRYPOINT ["./veza-chat-server"]
```

**Frontend (React/Vite)**:
```dockerfile
# apps/web/Dockerfile
FROM node:20-alpine AS builder

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY . .
RUN npm run build

# Stage 2: Nginx
FROM nginx:1.25-alpine

COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf

EXPOSE 80

HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
    CMD ["wget", "--quiet", "--tries=1", "--spider", "http://localhost/health"]

CMD ["nginx", "-g", "daemon off;"]
```

### 3.2 Docker Compose (Development)

```yaml
# docker-compose.yml
version: '3.9'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: veza_db
      POSTGRES_USER: veza
      POSTGRES_PASSWORD: ${DB_PASSWORD:-password}
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U veza"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5

  backend:
    build:
      context: ./veza-backend-api
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgresql://veza:${DB_PASSWORD:-password}@postgres:5432/veza_db
      REDIS_URL: redis://redis:6379
      JWT_SECRET: ${JWT_SECRET}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  chat-server:
    build:
      context: ./veza-chat-server
      dockerfile: Dockerfile
    ports:
      - "8081:8081"
    environment:
      DATABASE_URL: postgresql://veza:${DB_PASSWORD:-password}@postgres:5432/veza_db
      REDIS_URL: redis://redis:6379
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

  frontend:
    build:
      context: ./apps/web
      dockerfile: Dockerfile
    ports:
      - "3000:80"
    depends_on:
      - backend

volumes:
  postgres_data:
  redis_data:
```
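After `docker compose up`, a quick smoke test over the mapped dev ports confirms the three services answer their health endpoints. A sketch; the `CURL` variable is injectable so the script can be dry-run, and the port list mirrors the compose file above:

```shell
# Sketch: smoke-test the dev stack's health endpoints.
# CURL is overridable (e.g. CURL=true for a dry run).
CURL="${CURL:-curl -fsS --max-time 5}"
smoke() {
  local failed=0
  for url in \
    http://localhost:8080/health \
    http://localhost:8081/health \
    http://localhost:3000/health; do
    if $CURL "$url" >/dev/null 2>&1; then
      echo "ok $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}
```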
## 4. KUBERNETES ORCHESTRATION

### 4.1 Kubernetes Manifests

**Deployment (Backend)**:
```yaml
# k8s/backend/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: veza-backend
  namespace: veza-production
  labels:
    app: veza-backend
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: veza-backend
  template:
    metadata:
      labels:
        app: veza-backend
        version: v1.0.0
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: backend
          image: registry.veza.app/veza-backend-api:v1.0.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: veza-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: veza-secrets
                  key: redis-url
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: veza-secrets
                  key: jwt-secret
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
      imagePullSecrets:
        - name: registry-credentials
```

**Service**:
```yaml
# k8s/backend/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: veza-backend
  namespace: veza-production
spec:
  type: ClusterIP
  selector:
    app: veza-backend
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
```

**Ingress**:
```yaml
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: veza-ingress
  namespace: veza-production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.veza.app
        - veza.app
      secretName: veza-tls
  rules:
    - host: api.veza.app
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: veza-backend
                port:
                  number: 80
    - host: veza.app
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: veza-frontend
                port:
                  number: 80
```

**HorizontalPodAutoscaler**:
```yaml
# k8s/backend/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60
```
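The utilization targets above drive the standard HPA formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to minReplicas/maxReplicas. A sketch of the arithmetic in integer shell math:

```shell
# Sketch of the HPA scaling formula (Kubernetes autoscaling docs):
# desired = ceil(current * metric / target). Metric and target are
# percentages, e.g. 90 for 90% average CPU utilization.
hpa_desired() {
  local current=$1 metric=$2 target=$3
  # ceil(a / b) with integers: (a + b - 1) / b
  echo $(( (current * metric + target - 1) / target ))
}
```

For example, three pods averaging 90% CPU against the 70% target give `hpa_desired 3 90 70` = 4, which the controller then clamps between minReplicas (3) and maxReplicas (10).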
## 5. CI/CD PIPELINES

### 5.1 GitHub Actions Workflow

```yaml
# .github/workflows/deploy-production.yml
name: Deploy to Production

on:
  push:
    branches:
      - main
    tags:
      - 'v*'

env:
  REGISTRY: registry.veza.app
  KUBE_NAMESPACE: veza-production

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run tests
        run: |
          make test-all

      - name: Security scan
        run: |
          make security-scan

  build-backend:
    needs: build-and-test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Login to Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_PASSWORD }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/veza-backend-api
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha,prefix={{branch}}-

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: ./veza-backend-api
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=registry,ref=${{ env.REGISTRY }}/veza-backend-api:buildcache
          cache-to: type=registry,ref=${{ env.REGISTRY }}/veza-backend-api:buildcache,mode=max

  deploy-staging:
    needs: [build-backend]
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v3

      - name: Deploy to Staging
        run: |
          kubectl set image deployment/veza-backend \
            backend=${{ env.REGISTRY }}/veza-backend-api:${{ github.sha }} \
            -n veza-staging
          kubectl rollout status deployment/veza-backend -n veza-staging --timeout=5m

      - name: Run E2E tests
        run: |
          npm run test:e2e -- --env=staging

  deploy-production:
    needs: [deploy-staging]
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v3

      - name: Setup kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'v1.28.0'

      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > kubeconfig
          # `export` would not survive past this step; persist via GITHUB_ENV
          echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"

      - name: Deploy to Production (Blue-Green)
        run: |
          # Deploy green environment
          kubectl apply -f k8s/backend/deployment-green.yaml
          kubectl rollout status deployment/veza-backend-green -n ${{ env.KUBE_NAMESPACE }} --timeout=10m

          # Run smoke tests
          make smoke-tests ENDPOINT=https://green.api.veza.app

          # Switch traffic to green
          kubectl patch service veza-backend -n ${{ env.KUBE_NAMESPACE }} \
            -p '{"spec":{"selector":{"version":"green"}}}'

          # Wait for validation
          sleep 60

          # Monitor metrics
          if ! make verify-deployment; then
            echo "Deployment verification failed, rolling back..."
            kubectl patch service veza-backend -n ${{ env.KUBE_NAMESPACE }} \
              -p '{"spec":{"selector":{"version":"blue"}}}'
            exit 1
          fi

          # Delete old blue deployment
          kubectl delete deployment veza-backend-blue -n ${{ env.KUBE_NAMESPACE }}

      - name: Notify Slack
        if: always()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "Production deployment ${{ job.status }}: ${{ github.sha }}"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
```

## 6. ZERO-DOWNTIME STRATEGIES

### 6.1 Blue-Green Deployment

**Process**:
1. **Blue** (current production) serves all traffic
2. Deploy **Green** (new version) in parallel
3. Test Green thoroughly (smoke tests, health checks)
4. Switch load balancer from Blue to Green (instant cutover)
5. Monitor Green for issues (5-10 min)
6. If issues: Rollback to Blue (instant)
7. If stable: Decommission Blue

**Kubernetes Implementation**:
```bash
# Deploy green
kubectl apply -f k8s/backend/deployment-green.yaml

# Wait for readiness
kubectl wait --for=condition=available --timeout=10m deployment/veza-backend-green

# Switch service selector
kubectl patch service veza-backend -p '{"spec":{"selector":{"version":"green"}}}'

# Monitor
watch kubectl get pods -l app=veza-backend

# Rollback if needed
kubectl patch service veza-backend -p '{"spec":{"selector":{"version":"blue"}}}'
```
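Steps 4-6 can be wrapped into a single switch-verify-revert function. A sketch with `KUBECTL` and `VERIFY` injectable; in real use they would be `kubectl` and a smoke-test command, here they are stubs:

```shell
# Sketch: switch the service selector, verify, revert on failure.
# KUBECTL and VERIFY are injectable so the flow is testable offline.
KUBECTL="${KUBECTL:-kubectl}"
VERIFY="${VERIFY:-true}"
switch_traffic() {
  local svc=$1 to=$2 back=$3
  $KUBECTL patch service "$svc" -p "{\"spec\":{\"selector\":{\"version\":\"$to\"}}}"
  if ! $VERIFY; then
    echo "verification failed, reverting to $back"
    $KUBECTL patch service "$svc" -p "{\"spec\":{\"selector\":{\"version\":\"$back\"}}}"
    return 1
  fi
  echo "switched to $to"
}
```

E.g. `switch_traffic veza-backend green blue` performs the cutover and reverts to blue automatically if verification fails.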
### 6.2 Canary Deployment

**Process**:
1. Deploy new version (canary) with 5% traffic
2. Monitor metrics (error rate, latency)
3. Gradually increase traffic: 5% → 25% → 50% → 100%
4. At each stage, verify metrics are healthy
5. If issues detected: Rollback immediately

**Kubernetes with Istio**:
```yaml
# k8s/canary/virtualservice.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: veza-backend
spec:
  hosts:
    - veza-backend
  http:
    - match:
        - headers:
            canary:
              exact: "true"
      route:
        - destination:
            host: veza-backend
            subset: canary
    - route:
        - destination:
            host: veza-backend
            subset: stable
          weight: 95
        - destination:
            host: veza-backend
            subset: canary
          weight: 5
```
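The 5% → 25% → 50% → 100% progression with a per-stage gate can be sketched as a loop. The `GATE` command is an assumption standing in for a real metrics query (e.g. error rate and latency from Prometheus):

```shell
# Sketch: stepwise canary promotion gated on a metrics check.
# GATE receives the current weight and succeeds when metrics are healthy;
# it is a stand-in for a real Prometheus query.
GATE="${GATE:-true}"
promote_canary() {
  for weight in 5 25 50 100; do
    echo "canary at ${weight}%"
    if ! $GATE "$weight"; then
      echo "metrics unhealthy at ${weight}%, rolling back to 0%"
      return 1
    fi
  done
  echo "promotion complete"
}
```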
**Automated Canary with Flagger**:
```yaml
# k8s/canary/flagger-canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: veza-backend
  namespace: veza-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -s http://veza-backend-canary/health | grep -q ok"
```

## 7. CONFIGURATION MANAGEMENT

### 7.1 ConfigMap (Non-Sensitive Config)

```yaml
# k8s/backend/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: veza-backend-config
  namespace: veza-production
data:
  APP_ENV: "production"
  LOG_LEVEL: "info"
  API_RATE_LIMIT: "300"
  MAX_UPLOAD_SIZE: "500MB"
  CORS_ORIGINS: "https://veza.app,https://www.veza.app"
```

### 7.2 Secrets (Sensitive Data)

```yaml
# k8s/backend/secret.yaml (encrypted with SOPS or sealed-secrets)
apiVersion: v1
kind: Secret
metadata:
  name: veza-secrets
  namespace: veza-production
type: Opaque
data:
  database-url: <base64-encoded>
  redis-url: <base64-encoded>
  jwt-secret: <base64-encoded>
  stripe-api-key: <base64-encoded>
```

**Create Secret from Vault**:
```bash
# Fetch from Vault and create the K8s secret
# (no manual base64 step: kubectl encodes --from-literal values itself)
kubectl create secret generic veza-secrets \
  --from-literal=database-url="$(vault kv get -field=database_url secret/veza/production)" \
  -n veza-production
```

## 8. SECRETS MANAGEMENT

### 8.1 HashiCorp Vault

**Vault Structure**:
```
secret/
└── veza/
    ├── production/
    │   ├── database_url
    │   ├── redis_url
    │   ├── jwt_secret
    │   ├── stripe_api_key
    │   ├── aws_access_key
    │   └── aws_secret_key
    └── staging/
        └── ...
```

**Store Secret**:
```bash
# Write secret
vault kv put secret/veza/production \
  database_url="postgresql://..." \
  redis_url="redis://..." \
  jwt_secret="..."

# Read secret
vault kv get secret/veza/production

# Rotate secret (new version)
vault kv put secret/veza/production jwt_secret="new-secret"
```

**Vault Agent Injector (Kubernetes)**:
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "veza-backend"
    vault.hashicorp.com/agent-inject-secret-database: "secret/data/veza/production"
    vault.hashicorp.com/agent-inject-template-database: |
      {{- with secret "secret/data/veza/production" -}}
      export DATABASE_URL="{{ .Data.data.database_url }}"
      {{- end }}
```

## 9. MONITORING & OBSERVABILITY

### 9.1 Prometheus + Grafana

**Prometheus Configuration**:
```yaml
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'veza-backend'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: veza-backend
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: $1:8080

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']
```

**Grafana Dashboard**:
- **API Latency**: p50, p95, p99 response times
- **Throughput**: Requests per second
- **Error Rate**: 4xx, 5xx errors
- **Database**: Query time, connections, slow queries
- **Cache Hit Rate**: Redis hit/miss ratio

### 9.2 Logging (ELK Stack)

**Filebeat Configuration**:
```yaml
# filebeat/filebeat.yml
filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*.log'
    processors:
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: "/var/lib/docker/containers/"

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "veza-logs-%{+yyyy.MM.dd}"
```

### 9.3 Tracing (Jaeger)

**OpenTelemetry Integration**:
```go
// Go - OpenTelemetry setup
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/sdk/resource"
    "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
)

func initTracer() (*trace.TracerProvider, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint("http://jaeger:14268/api/traces")))
    if err != nil {
        return nil, err
    }

    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("veza-backend-api"),
        )),
    )

    otel.SetTracerProvider(tp)
    return tp, nil
}
```

## 10. BACKUP & DISASTER RECOVERY

### 10.1 Database Backups

**Automated Backup Strategy**:
- **Daily**: Full backup (3 AM UTC)
- **Hourly**: Incremental backup
- **Retention**: 30 days daily, 12 weeks weekly, 2 years monthly

**Backup Script**:
```bash
#!/bin/bash
# scripts/backup-database.sh

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/postgres"
DATABASE="veza_db"

# Full backup
pg_dump -Fc -f "$BACKUP_DIR/veza_db_$DATE.dump" "$DATABASE"

# Encrypt, then remove the plaintext dump
gpg --encrypt --recipient backup@veza.app "$BACKUP_DIR/veza_db_$DATE.dump"
rm "$BACKUP_DIR/veza_db_$DATE.dump"

# Upload to S3
aws s3 cp "$BACKUP_DIR/veza_db_$DATE.dump.gpg" s3://veza-backups/postgres/

# Cleanup local backups > 7 days
find "$BACKUP_DIR" -name "*.dump.gpg" -mtime +7 -delete
```
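The immutable rules require monthly restore tests; a cheap complement is an integrity manifest written next to each dump, so a corrupted or truncated file is caught before any restore attempt. A sketch; the paths are illustrative:

```shell
# Sketch: write a sha256 manifest next to each dump, and verify it
# before restoring. Catches corruption without touching the database.
record_checksum() {
  sha256sum "$1" > "$1.sha256"
}

verify_checksum() {
  sha256sum -c "$1.sha256" >/dev/null 2>&1
}
```

The backup script would call `record_checksum` after `pg_dump`, and the restore script would refuse to proceed when `verify_checksum` fails.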
**Restore Procedure**:
```bash
#!/bin/bash
# scripts/restore-database.sh

BACKUP_FILE=$1

# Download from S3
aws s3 cp "s3://veza-backups/postgres/$BACKUP_FILE" /tmp/

# Decrypt
gpg --decrypt "/tmp/$BACKUP_FILE" > "/tmp/backup.dump"

# Restore
pg_restore -d veza_db "/tmp/backup.dump"
```

### 10.2 Disaster Recovery Plan

**RTO (Recovery Time Objective)**: < 4 hours
**RPO (Recovery Point Objective)**: < 1 hour

**Recovery Procedures**:
1. **Database Failure**: Failover to standby replica (< 5 min)
2. **Application Failure**: Rollback deployment (< 5 min)
3. **Complete Region Failure**: Failover to DR region (< 4 hours)
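The RPO target is mechanical to monitor: alert whenever the newest backup is older than one hour. A sketch in epoch seconds so the check is testable without real backups:

```shell
# Sketch: RPO compliance check. Succeeds while the newest backup is
# no older than the RPO window (default 3600 s = 1 h).
rpo_ok() {
  local last_backup=$1 now=$2 rpo_seconds=${3:-3600}
  [ $(( now - last_backup )) -le "$rpo_seconds" ]
}
```

A cron job would call it as `rpo_ok "$(stat -c %Y newest.dump.gpg)" "$(date +%s)"` and page on-call when it fails.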
## 11. SCALING STRATEGY

### 11.1 Horizontal Scaling

**Auto-Scaling Rules**:
- **CPU > 70%**: Scale up
- **CPU < 30%**: Scale down (after 5 min stability)
- **Memory > 80%**: Scale up
- **Request queue > 100**: Scale up

### 11.2 Database Scaling

**Read Replicas**:
- 2 read replicas minimum
- Route read queries to replicas
- Write queries to primary only

**Connection Pooling** (PgBouncer):
```ini
[databases]
veza_db = host=postgres port=5432 dbname=veza_db

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
```

## 12. OPERATIONAL PROCEDURES

### 12.1 Deployment Checklist

**Pre-Deployment**:

- [ ] All tests pass (unit, integration, E2E)
- [ ] Security scan completed (no critical vulnerabilities)
- [ ] Database migrations tested in staging
- [ ] Rollback plan documented
- [ ] Monitoring dashboards ready
- [ ] On-call engineer notified
- [ ] Deployment window scheduled (low-traffic period)

**During Deployment**:

- [ ] Monitor error rates in real-time
- [ ] Monitor response times (p95, p99)
- [ ] Check logs for errors
- [ ] Verify database migrations applied
- [ ] Test critical user flows

**Post-Deployment**:

- [ ] Verify all services healthy
- [ ] Run smoke tests
- [ ] Monitor for 30 minutes
- [ ] Update deployment log
- [ ] Notify stakeholders

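The "Run smoke tests" and "Test critical user flows" items are normally scripted. A small retry helper of the kind such smoke-test scripts lean on, sketched with illustrative names (the health endpoint in the example is hypothetical):

```shell
# retry: run a command up to MAX times with DELAY seconds between attempts,
# so a single slow pod start does not fail the whole deploy. The helper and
# the example endpoint are illustrative, not official tooling.
retry() {
  max="$1"; delay="$2"; shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example smoke test (hypothetical endpoint):
# retry 5 3 curl -fsS https://api.veza.app/health >/dev/null
```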
### 12.2 Rollback Procedure

**Immediate Rollback** (< 5 min):

```bash
# Kubernetes
kubectl rollout undo deployment/veza-backend -n veza-production

# Or roll back to a specific revision
kubectl rollout history deployment/veza-backend -n veza-production
kubectl rollout undo deployment/veza-backend -n veza-production --to-revision=<N>

# Verify
kubectl rollout status deployment/veza-backend -n veza-production

# Check logs
kubectl logs -f deployment/veza-backend -n veza-production
```

### 12.3 Incident Response

**Severity Levels**:

- **P0 (Critical)**: Production down, data breach
- **P1 (High)**: Major feature broken, performance degradation
- **P2 (Medium)**: Minor feature broken
- **P3 (Low)**: Cosmetic issues

**Response Procedure**:

1. Acknowledge incident (< 5 min)
2. Assess severity
3. Notify stakeholders
4. Mitigate (rollback, hotfix, scaling)
5. Root cause analysis
6. Post-mortem

## 13. PRIORITY SECURITY FIXES

> Identified during the security audit of 2026-03-04. These procedures are **blocking** for any production deployment.

### 13.1 JWT Secret Rotation

The JWT secret must be rotated regularly (at least quarterly) and immediately upon any suspected compromise.

**Rotation procedure**:

```bash
#!/bin/bash
# scripts/rotate-jwt-secret.sh
set -euo pipefail

NEW_SECRET=$(openssl rand -base64 64)

# 1. Store the new secret in Vault
vault kv put secret/veza/production jwt_secret="$NEW_SECRET"

# 2. Update the Kubernetes secret
kubectl create secret generic veza-jwt-secret \
  --from-literal=jwt-secret="$NEW_SECRET" \
  --dry-run=client -o yaml | kubectl apply -f - -n veza-production

# 3. Rolling restart so the pods load the new secret
kubectl rollout restart deployment/veza-backend -n veza-production
kubectl rollout restart deployment/veza-chat-server -n veza-production

# 4. Wait for the rollouts to complete
kubectl rollout status deployment/veza-backend -n veza-production --timeout=5m
kubectl rollout status deployment/veza-chat-server -n veza-production --timeout=5m

echo "JWT secret rotation complete."
```

**Critical points**:

- With a single shared HS256 secret, tokens signed with the old secret are rejected as soon as the pods reload the new one; either accept a forced re-login window or have services validate against both the old and the new secret during a grace period. Keep token lifetimes short (15 min access, 7 days refresh) to bound the impact.
- Test in staging before every production rotation
- Log the rotation event (without the secret) in the audit log

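Before step 1 of the rotation script, a guard can reject weak candidate secrets. A minimal sketch; the 32-byte floor is an assumed policy (the script above generates 64 random bytes, well above it):

```shell
# check_secret_strength: refuse to rotate onto a secret shorter than a
# minimum byte count. The function name and the 32-byte default are
# assumptions for illustration, not an official policy.
check_secret_strength() {
  min_bytes="${2:-32}"
  n=$(printf '%s' "$1" | wc -c | tr -d ' ')
  if [ "$n" -ge "$min_bytes" ]; then
    echo "OK: ${n}-byte secret"
  else
    echo "FAIL: secret too short (${n} bytes, minimum ${min_bytes})"
    return 1
  fi
}

# Example: check_secret_strength "$(openssl rand -base64 64)"
```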
### 13.2 JWT Issuer/Audience Alignment Go ↔ Rust

The Go backend and the Rust chat-server must use the same JWT claims (`iss`, `aud`) to avoid cross-service token rejections.

**Aligned configuration**:

```yaml
# Shared configuration - identical for Go and Rust
jwt:
  issuer: "https://api.veza.app"
  audience: "https://veza.app"
  algorithm: "HS256" # Migrate to RS256 - see section 15
  access_token_ttl: "15m"
  refresh_token_ttl: "7d"
```

**Go side**:

```go
// veza-backend-api/internal/auth/jwt.go
claims := jwt.MapClaims{
    "iss": "https://api.veza.app",
    "aud": "https://veza.app",
    "sub": userID,
    "exp": time.Now().Add(15 * time.Minute).Unix(),
    "iat": time.Now().Unix(),
}
```

**Rust side**:

```rust
// veza-chat-server/src/auth.rs
let mut validation = Validation::new(Algorithm::HS256);
validation.set_issuer(&["https://api.veza.app"]);
validation.set_audience(&["https://veza.app"]);
```

**Deployment procedure**:

1. Update the Rust config to accept the same `iss`/`aud`
2. Deploy the Rust chat-server first (it accepts existing and new tokens)
3. Update the Go config
4. Deploy the Go backend
5. Check the logs of both services: zero `invalid issuer` or `invalid audience` errors

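During the rollout, the log check in step 5 can be complemented by inspecting a live token directly. A debugging helper (sketch) that prints a JWT's payload so `iss`/`aud` can be eyeballed; it performs no signature verification and must never be used for authentication decisions:

```shell
# jwt_claims: print the payload (claims) of a JWT without verifying its
# signature, to check iss/aud while rolling out the aligned configuration.
# Debugging aid only; the name is illustrative.
jwt_claims() {
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  pad=$(( (4 - ${#payload} % 4) % 4 ))   # restore stripped base64 padding
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
  echo
}

# Example: jwt_claims "$ACCESS_TOKEN"
```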
### 13.3 Protecting the /metrics Route

The `/metrics` route (Prometheus) exposes internal metrics and must **never** be publicly accessible.

**Nginx/Ingress - block external access**:

```yaml
# k8s/ingress.yaml - added annotation
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: veza-ingress
  annotations:
    nginx.ingress.kubernetes.io/server-snippet: |
      location /metrics {
        deny all;
        return 404;
      }
```

**NetworkPolicy - restrict to the monitoring namespace**:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics-from-prometheus
  namespace: veza-production
spec:
  podSelector:
    matchLabels:
      app: veza-backend
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - port: 8080
          protocol: TCP
```

**Post-deployment verification**:

```bash
# Must return 404 (or refuse the connection) from outside
curl -s -o /dev/null -w "%{http_code}" https://api.veza.app/metrics
# Expected: 404

# Must work from the Prometheus pod
kubectl exec -n monitoring deploy/prometheus -- curl -s http://veza-backend.veza-production/metrics | head -5
# Expected: 200 with metrics
```

## 14. ETHICAL DEPLOYMENT CHECKLIST

Before every production deployment, the following points must be verified. This checklist complements the standard operational checklist (section 12.1).

### 14.1 Personal Data Protection

- [ ] **Prometheus metrics**: verify that no metric exposes personal data (emails, IPs, full user agents, user identifiers)

```bash
# Automated check - no metric may contain these patterns
kubectl exec -n monitoring deploy/prometheus -- \
  curl -s http://veza-backend.veza-production/metrics | \
  grep -iE '(email|user_agent|ip_address|@)' && \
  echo "FAIL: Personal data found in metrics" && exit 1 || \
  echo "PASS: No personal data in metrics"
```

- [ ] **Anonymized logs**: confirm that application logs contain no personal data in clear text

```bash
# Check the last 1000 log lines
kubectl logs deployment/veza-backend -n veza-production --tail=1000 | \
  grep -iE '([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})' && \
  echo "FAIL: Email addresses found in logs" && exit 1 || \
  echo "PASS: No emails in logs"
```

- [ ] **HTTP headers**: no tracking or fingerprinting header is emitted by the services

### 14.2 GDPR Compliance

- [ ] **Data export**: the `POST /api/v1/me/data-export` endpoint returns all data for the authenticated user (profile, tracks, playlists, history)
- [ ] **Account deletion**: the `DELETE /api/v1/me` endpoint deletes the user and all associated data (cascade)
- [ ] **Cookie consent**: only strictly necessary cookies (JWT session) are sent without consent

### 14.3 Ethical Integrity

- [ ] **No AI/ML dependency**: no Docker image contains an ML framework (TensorFlow, PyTorch, scikit-learn, ONNX)
- [ ] **No third-party tracking**: no Google Analytics, Facebook Pixel, or equivalent script in the frontend bundle
- [ ] **Discovery algorithm**: the bias tests (section 14 of ORIGIN_TESTING_STRATEGY) pass in CI

## 15. JWT HS256 → RS256 MIGRATION PLAN

### 15.1 Motivation

HS256 (symmetric HMAC) requires sharing the same secret across all services. RS256 (asymmetric RSA) lets the Go backend sign with a private key while the other services (Rust, frontend) verify with the public key, shrinking the attack surface.

### 15.2 Target Architecture

```
┌──────────────────┐       ┌──────────────────┐
│   Backend Go     │       │   Chat Server    │
│   (signs JWT)    │       │      Rust        │
│                  │       │  (verifies JWT)  │
│ PRIVATE RS256 key│       │   PUBLIC key     │
└──────────────────┘       └──────────────────┘
         │                          │
         │      Same public key     │
         └────────────┬─────────────┘
                      │
              ┌───────▼───────┐
              │   Frontend    │
              │ (verifies JWT │
              │   optional)   │
              │  PUBLIC key   │
              └───────────────┘
```

### 15.3 Migration Steps

**Phase 1 - Preparation (week 1)**:

```bash
# Generate the 4096-bit RSA key pair
openssl genrsa -out jwt-private.pem 4096
openssl rsa -in jwt-private.pem -pubout -out jwt-public.pem

# Store in Vault
vault kv put secret/veza/production \
  jwt_private_key=@jwt-private.pem \
  jwt_public_key=@jwt-public.pem

# Clean up the local files
shred -u jwt-private.pem jwt-public.pem
```

**Phase 2 - Dual validation (week 2)**:

Modify the services to accept **both algorithms** during the transition:

```go
// veza-backend-api - signs RS256, validates both HS256 and RS256
func (a *Auth) ValidateToken(tokenString string) (*Claims, error) {
    token, err := jwt.Parse(tokenString, func(t *jwt.Token) (interface{}, error) {
        switch t.Method.(type) {
        case *jwt.SigningMethodRSA:
            return a.rsaPublicKey, nil
        case *jwt.SigningMethodHMAC:
            return []byte(a.hmacSecret), nil
        default:
            return nil, fmt.Errorf("unexpected signing method: %v", t.Header["alg"])
        }
    })
    // ...
}
```

```rust
// veza-chat-server - validates both HS256 and RS256
fn validate_token(token: &str, config: &AuthConfig) -> Result<Claims, AuthError> {
    // Try RS256 first, fall back to HS256
    let rs256_result = decode::<Claims>(
        token,
        &DecodingKey::from_rsa_pem(config.rsa_public_key.as_bytes())?,
        &Validation::new(Algorithm::RS256),
    );

    match rs256_result {
        Ok(data) => Ok(data.claims),
        Err(_) => {
            let hs256_result = decode::<Claims>(
                token,
                &DecodingKey::from_secret(config.hmac_secret.as_bytes()),
                &Validation::new(Algorithm::HS256),
            );
            hs256_result.map(|d| d.claims).map_err(AuthError::from)
        }
    }
}
```

**Phase 3 - Cutover (week 3)**:

1. Deploy the Go backend to sign exclusively with RS256
2. Wait for all HS256 tokens to expire (max 7 days for refresh tokens)
3. Remove the HS256 fallback code from the Go and Rust services
4. Remove the HS256 secret from Vault

**Phase 4 - Cleanup (week 4)**:

```bash
# Remove the old HS256 secret from Vault
vault kv delete secret/veza/production/jwt_secret_hmac

# Verify that no service still uses HS256
kubectl logs -l app=veza-backend -n veza-production --since=24h | \
  grep -i "hs256" && echo "WARNING: HS256 still in use" || echo "CLEAN: No HS256 usage"
```

### 15.4 Rollback Plan

If problems are detected during the migration:

1. Switch the Go backend back to HS256 signing
2. The services keep accepting both formats
3. Investigate and fix before retrying

### 15.5 Success Criteria

- [ ] All services accept RS256
- [ ] No HS256 token in circulation (after natural expiry)
- [ ] The RSA private key is accessible only to the Go backend
- [ ] Cross-service integration tests pass with RS256
- [ ] Performance: RS256 verification < 1 ms (benchmark it)

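For the last criterion, a rough order-of-magnitude check is possible on any node with OpenSSL; the authoritative numbers should come from benchmarking the actual Go and Rust JWT code paths in service:

```shell
# Rough local check for the "< 1 ms verification" criterion: RSA-4096 verify
# throughput as measured by OpenSSL's built-in benchmark. Indicative only;
# it does not exercise the JWT libraries themselves.
openssl speed -seconds 1 rsa4096 2>/dev/null | grep "rsa"
```

RSA-4096 verification is typically well under a millisecond on server hardware, so this criterion usually passes with large headroom; signing is the slower side.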
## ✅ VALIDATION CHECKLIST

### Infrastructure

- [ ] Infrastructure as Code (Terraform) complete
- [ ] All resources versioned in Git
- [ ] Secrets in Vault (no plaintext)
- [ ] Automated provisioning tested

### Deployment

- [ ] CI/CD pipeline functional
- [ ] Zero-downtime deployment strategy (blue-green or canary)
- [ ] Automated rollback configured
- [ ] Health checks implemented

### Monitoring

- [ ] Prometheus + Grafana dashboards
- [ ] Alerting configured (PagerDuty/Slack)
- [ ] Logging centralized (ELK Stack)
- [ ] Tracing implemented (Jaeger)

### Disaster Recovery

- [ ] Automated backups (daily + hourly)
- [ ] Backup restoration tested
- [ ] Failover procedure documented
- [ ] RTO < 4h, RPO < 1h validated

## 📊 SUCCESS METRICS

### Deployment Metrics

- **Deployment Frequency**: Multiple per day
- **Lead Time**: < 1 hour (commit to production)
- **MTTR (Mean Time To Recovery)**: < 5 minutes
- **Change Failure Rate**: < 5%

### Operational Metrics

- **Uptime**: > 99.9%
- **RTO**: < 4 hours
- **RPO**: < 1 hour
- **Deployment Success Rate**: > 95%

## 🔄 VERSION HISTORY

| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2025-11-02 | Initial version - complete deployment guide |
| 2.0.0 | 2026-03-04 | Security audit: added priority fixes (JWT rotation, Go↔Rust issuer/audience alignment, /metrics protection), ethical deployment checklist (personal data, GDPR, integrity), JWT HS256→RS256 migration plan. |

---

## ⚠️ WARNING

**THIS GUIDE IS IMMUTABLE**

---

**Document created by**: DevOps Team + SRE
**Creation date**: 2025-11-02
**Last revision**: 2026-03-04 (security audit)
**Next revision**: Quarterly (2026-06-01)
**Owner**: DevOps Lead

**Status**: ✅ **APPROVED AND LOCKED**