veza/veza-docs/ORIGIN/ORIGIN_DEPLOYMENT_GUIDE.md
2026-03-05 19:22:31 +01:00


📋 EXECUTIVE SUMMARY

This document defines the complete production deployment guide for the Veza platform. It covers Infrastructure as Code (Terraform/Ansible), containerization (Docker/Incus), orchestration (Kubernetes), CI/CD pipelines, zero-downtime strategies, disaster recovery, monitoring, and the operational procedures for secure, automated, and reversible deployments over 24 months.

🎯 OBJECTIVES

Primary Objective

Establish an automated, secure, reproducible, zero-downtime deployment process for production, with rollback in under 5 minutes, multiple deployments per day, and an RTO under 4 hours in case of disaster.

Secondary Objectives

  • Full automation (Infrastructure as Code)
  • Zero-downtime deployments (blue-green, canary)
  • Automatic rollback on failure (< 5 min)
  • Operational disaster recovery plan (RTO < 4h, RPO < 1h)
  • Real-time monitoring and alerting (Prometheus + Grafana)

📖 TABLE OF CONTENTS

  1. Deployment Philosophy
  2. Infrastructure as Code
  3. Containerization
  4. Kubernetes Orchestration
  5. CI/CD Pipelines
  6. Zero-Downtime Strategies
  7. Configuration Management
  8. Secrets Management
  9. Monitoring & Observability
  10. Backup & Disaster Recovery
  11. Scaling Strategy
  12. Operational Procedures
  13. Priority Security Fixes
  14. Ethical Deployment Checklist
  15. JWT HS256 → RS256 Migration Plan

🔒 IMMUTABLE RULES

  1. Infrastructure as Code: 100% of infrastructure versioned (Terraform); no manual changes
  2. Immutable Infrastructure: Never modify existing servers; always redeploy
  3. Zero Downtime: No deployment may interrupt service (blue-green or canary required)
  4. Automated Rollback: Automatic rollback if health checks fail (< 5 min)
  5. Version Control: All configs versioned in Git; no exceptions
  6. Secrets in Vault: No plaintext secrets (HashiCorp Vault or equivalent)
  7. Testing in Staging: Every deployment is tested in staging first
  8. Monitoring Required: Alerting configured before anything goes to production
  9. Backup Verification: Backups tested monthly (restore test)
  10. Documentation: Runbooks kept up to date for every critical procedure

1. DEPLOYMENT PHILOSOPHY

1.1 Deployment Principles

Twelve-Factor App:

  1. Codebase: One codebase tracked in Git, many deploys
  2. Dependencies: Explicitly declare and isolate (go.mod, Cargo.lock, package-lock.json)
  3. Config: Store config in environment (never in code)
  4. Backing Services: Treat as attached resources (DB, Redis, S3)
  5. Build, Release, Run: Strictly separate build and run stages
  6. Processes: Execute app as stateless processes
  7. Port Binding: Export services via port binding
  8. Concurrency: Scale out via process model
  9. Disposability: Fast startup and graceful shutdown
  10. Dev/Prod Parity: Keep development, staging, production similar
  11. Logs: Treat logs as event streams
  12. Admin Processes: Run admin/management tasks as one-off processes

1.2 Deployment Environments

| Environment | Purpose | Update Frequency | Users |
|---|---|---|---|
| Development | Local development | Continuous | Developers |
| Staging | Pre-production testing | Daily | QA, Product Team |
| Production | Live users | Multiple/day | All users |

1.3 Deployment Workflow

┌─────────────┐
│   Develop   │ ─── git push ───> CI/CD Triggered
└─────────────┘
       │
       ▼
┌─────────────┐
│  Build      │ ─── Tests, Linting, Security Scan
└─────────────┘
       │
       ▼
┌─────────────┐
│  Staging    │ ─── Deploy to staging, E2E tests
└─────────────┘
       │
       ▼
┌─────────────┐
│ Production  │ ─── Blue-Green / Canary deployment
└─────────────┘
       │
       ▼
┌─────────────┐
│  Monitor    │ ─── Health checks, metrics, logs
└─────────────┘
       │
       ▼ (if issues)
┌─────────────┐
│  Rollback   │ ─── Automatic rollback < 5 min
└─────────────┘
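The Monitor-to-Rollback transition above hinges on retried health checks: a deploy is only declared healthy after several consecutive probes succeed. A minimal sketch of such a retry gate in shell (the `retry` helper is illustrative, not from the Veza codebase):

```shell
#!/bin/sh
# retry: run a command up to N times; succeed as soon as it succeeds.
# A real deploy gate would wrap `curl -fsS $ENDPOINT/health` and sleep between tries.
retry() {
  max=$1; shift
  attempt=0
  while [ "$attempt" -lt "$max" ]; do
    if "$@"; then
      return 0            # healthy: deployment proceeds
    fi
    attempt=$((attempt + 1))
  done
  return 1                # still failing after N tries: trigger rollback
}
```

A pipeline would call something like `retry 10 curl -fsS https://api.veza.app/health` and invoke `kubectl rollout undo` when it returns non-zero.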

2. INFRASTRUCTURE AS CODE

2.1 Terraform Configuration

Project Structure:

terraform/
├── environments/
│   ├── production/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars (encrypted)
│   │   └── outputs.tf
│   └── staging/
│       ├── main.tf
│       ├── variables.tf
│       ├── terraform.tfvars
│       └── outputs.tf
├── modules/
│   ├── compute/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── database/
│   ├── networking/
│   ├── storage/
│   └── kubernetes/
└── backend.tf (Terraform state in S3)

Example: Compute Module:

# terraform/modules/compute/main.tf
resource "aws_instance" "app_server" {
  count         = var.instance_count
  ami           = var.ami_id
  instance_type = var.instance_type
  
  vpc_security_group_ids = [aws_security_group.app.id]
  subnet_id              = var.subnet_ids[count.index % length(var.subnet_ids)]
  
  user_data = templatefile("${path.module}/user_data.sh", {
    environment = var.environment
  })
  
  tags = {
    Name        = "veza-app-${var.environment}-${count.index + 1}"
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
  
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group" "app" {
  name        = "veza-app-${var.environment}"
  description = "Security group for Veza application servers"
  vpc_id      = var.vpc_id
  
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Database Module:

# terraform/modules/database/main.tf
resource "aws_db_instance" "postgres" {
  identifier     = "veza-db-${var.environment}"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = var.instance_class
  
  allocated_storage     = var.allocated_storage
  max_allocated_storage = var.max_allocated_storage
  storage_encrypted     = true
  kms_key_id           = var.kms_key_id
  
  db_name  = var.database_name
  username = var.master_username
  password = var.master_password # From Vault
  
  vpc_security_group_ids = [aws_security_group.database.id]
  db_subnet_group_name   = aws_db_subnet_group.database.name
  
  backup_retention_period = var.backup_retention_days
  backup_window          = "03:00-04:00"
  maintenance_window     = "mon:04:00-mon:05:00"
  
  multi_az               = var.multi_az
  publicly_accessible    = false
  skip_final_snapshot    = false
  final_snapshot_identifier = "veza-db-${var.environment}-final-snapshot-${formatdate("YYYY-MM-DD-hhmm", timestamp())}"
  
  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]
  
  tags = {
    Name        = "veza-db-${var.environment}"
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}

Terraform Workflow:

# Initialize
cd terraform/environments/production
terraform init

# Plan (review changes)
terraform plan -out=tfplan

# Apply (execute changes)
terraform apply tfplan

# Destroy (cleanup)
terraform destroy
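In CI, the plan/apply split above can be automated with `terraform plan -detailed-exitcode`, which exits 0 when there are no changes, 2 when changes are pending, and 1 on error. A hedged sketch of the gating logic (the `decide` helper is illustrative):

```shell
#!/bin/sh
# Map a `terraform plan -detailed-exitcode` exit status to a CI decision.
# In a pipeline: terraform plan -detailed-exitcode -out=tfplan; decide $?
decide() {
  case "$1" in
    0) echo "skip"  ;;  # no changes: nothing to apply
    2) echo "apply" ;;  # changes pending: run `terraform apply tfplan`
    *) echo "fail"  ;;  # plan errored: stop the pipeline
  esac
}
```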

2.2 Ansible Configuration

Playbook Structure:

ansible/
├── inventory/
│   ├── production/
│   │   ├── hosts.yml
│   │   └── group_vars/
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
├── playbooks/
│   ├── deploy-backend.yml
│   ├── deploy-chat-server.yml
│   ├── deploy-stream-server.yml
│   └── deploy-frontend.yml
├── roles/
│   ├── common/
│   ├── docker/
│   ├── nginx/
│   ├── postgres/
│   └── monitoring/
└── ansible.cfg

Deployment Playbook:

# ansible/playbooks/deploy-backend.yml
---
- name: Deploy Veza Backend API
  hosts: backend_servers
  become: yes
  serial: 1  # roll one host at a time so the pool keeps serving traffic
  
  vars:
    app_name: veza-backend-api
    app_version: "{{ lookup('env', 'VERSION') | default('latest', true) }}"
    docker_image: "registry.veza.app/{{ app_name }}:{{ app_version }}"
    
  tasks:
    - name: Pull Docker image
      docker_image:
        name: "{{ docker_image }}"
        source: pull
        
    - name: Stop old container
      docker_container:
        name: "{{ app_name }}"
        state: stopped
      ignore_errors: yes
      
    - name: Remove old container
      docker_container:
        name: "{{ app_name }}"
        state: absent
      ignore_errors: yes
      
    - name: Start new container
      docker_container:
        name: "{{ app_name }}"
        image: "{{ docker_image }}"
        state: started
        restart_policy: unless-stopped
        ports:
          - "8080:8080"
        env:
          DATABASE_URL: "{{ database_url }}"
          REDIS_URL: "{{ redis_url }}"
          JWT_SECRET: "{{ jwt_secret }}"
        volumes:
          - "/var/log/{{ app_name }}:/var/log/app"
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 40s
          
    - name: Wait for application to be healthy
      uri:
        url: http://localhost:8080/health
        status_code: 200
      register: result
      until: result.status == 200
      retries: 10
      delay: 5
      
    - name: Verify deployment
      debug:
        msg: "{{ app_name }} version {{ app_version }} deployed successfully"

3. CONTAINERIZATION

3.1 Docker Images

Multi-Stage Build (Go):

# veza-backend-api/Dockerfile
# Stage 1: Builder
FROM golang:1.21.5-alpine3.18 AS builder

WORKDIR /app

# Copy dependencies
COPY go.mod go.sum ./
RUN go mod download

# Copy source
COPY . .

# Build binary
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -ldflags="-w -s" -o main ./cmd/api

# Stage 2: Runner
FROM alpine:3.18

# Install CA certificates for HTTPS
RUN apk --no-cache add ca-certificates

WORKDIR /app

# Copy binary from builder
COPY --from=builder /app/main .

# Create non-root user
RUN addgroup -g 1000 appuser && \
    adduser -D -u 1000 -G appuser appuser

USER appuser

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD ["/app/main", "healthcheck"]

# Run
ENTRYPOINT ["./main"]

Multi-Stage Build (Rust):

# veza-chat-server/Dockerfile
FROM rust:1.75-alpine AS builder

WORKDIR /app

RUN apk add --no-cache musl-dev

# Copy dependencies
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo "fn main() {}" > src/main.rs && cargo build --release && rm -rf src

# Copy source
COPY . .

# Build binary
RUN cargo build --release

# Stage 2: Runner
FROM alpine:3.18

WORKDIR /app

# Copy binary
COPY --from=builder /app/target/release/veza-chat-server .

# Create non-root user
RUN addgroup -g 1000 appuser && \
    adduser -D -u 1000 -G appuser appuser

USER appuser

EXPOSE 8081

HEALTHCHECK --interval=30s --timeout=10s --start-period=20s --retries=3 \
  CMD ["wget", "--quiet", "--tries=1", "--spider", "http://localhost:8081/health"]

ENTRYPOINT ["./veza-chat-server"]

Frontend (React/Vite):

# apps/web/Dockerfile
FROM node:20-alpine AS builder

WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY . .
RUN npm run build

# Stage 2: Nginx
FROM nginx:1.25-alpine

COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf

EXPOSE 80

HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD ["wget", "--quiet", "--tries=1", "--spider", "http://localhost/health"]

CMD ["nginx", "-g", "daemon off;"]

3.2 Docker Compose (Development)

# docker-compose.yml
version: '3.9'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: veza_db
      POSTGRES_USER: veza
      POSTGRES_PASSWORD: ${DB_PASSWORD:-password}
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U veza"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5

  backend:
    build:
      context: ./veza-backend-api
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgresql://veza:${DB_PASSWORD:-password}@postgres:5432/veza_db
      REDIS_URL: redis://redis:6379
      JWT_SECRET: ${JWT_SECRET}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  chat-server:
    build:
      context: ./veza-chat-server
      dockerfile: Dockerfile
    ports:
      - "8081:8081"
    environment:
      DATABASE_URL: postgresql://veza:${DB_PASSWORD:-password}@postgres:5432/veza_db
      REDIS_URL: redis://redis:6379
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

  frontend:
    build:
      context: ./apps/web
      dockerfile: Dockerfile
    ports:
      - "3000:80"
    depends_on:
      - backend

volumes:
  postgres_data:
  redis_data:

4. KUBERNETES ORCHESTRATION

4.1 Kubernetes Manifests

Deployment (Backend):

# k8s/backend/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: veza-backend
  namespace: veza-production
  labels:
    app: veza-backend
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: veza-backend
  template:
    metadata:
      labels:
        app: veza-backend
        version: v1.0.0
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
      - name: backend
        image: registry.veza.app/veza-backend-api:v1.0.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: veza-secrets
              key: database-url
        - name: REDIS_URL
          valueFrom:
            secretKeyRef:
              name: veza-secrets
              key: redis-url
        - name: JWT_SECRET
          valueFrom:
            secretKeyRef:
              name: veza-secrets
              key: jwt-secret
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 1Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
      imagePullSecrets:
      - name: registry-credentials

Service:

# k8s/backend/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: veza-backend
  namespace: veza-production
spec:
  type: ClusterIP
  selector:
    app: veza-backend
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP

Ingress:

# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: veza-ingress
  namespace: veza-production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.veza.app
    - veza.app
    secretName: veza-tls
  rules:
  - host: api.veza.app
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: veza-backend
            port:
              number: 80
  - host: veza.app
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: veza-frontend
            port:
              number: 80

HorizontalPodAutoscaler:

# k8s/backend/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: veza-backend-hpa
  namespace: veza-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60

5. CI/CD PIPELINES

5.1 GitHub Actions Workflow

# .github/workflows/deploy-production.yml
name: Deploy to Production

on:
  push:
    branches:
      - main
    tags:
      - 'v*'

env:
  REGISTRY: registry.veza.app
  KUBE_NAMESPACE: veza-production

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Run tests
        run: |
          make test-all          
      
      - name: Security scan
        run: |
          make security-scan          

  build-backend:
    needs: build-and-test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      
      - name: Login to Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_PASSWORD }}
      
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/veza-backend-api
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha,prefix={{branch}}-            
      
      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: ./veza-backend-api
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=registry,ref=${{ env.REGISTRY }}/veza-backend-api:buildcache
          cache-to: type=registry,ref=${{ env.REGISTRY }}/veza-backend-api:buildcache,mode=max

  deploy-staging:
    needs: [build-backend]
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v3
      
      - name: Deploy to Staging
        run: |
          kubectl set image deployment/veza-backend \
            backend=${{ env.REGISTRY }}/veza-backend-api:${{ github.sha }} \
            -n veza-staging
          kubectl rollout status deployment/veza-backend -n veza-staging --timeout=5m          
      
      - name: Run E2E tests
        run: |
          npm run test:e2e -- --env=staging          

  deploy-production:
    needs: [deploy-staging]
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v3
      
      - name: Setup kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'v1.28.0'
      
      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > kubeconfig
          echo "KUBECONFIG=$PWD/kubeconfig" >> "$GITHUB_ENV"          
      
      - name: Deploy to Production (Blue-Green)
        run: |
          # Deploy green environment
          kubectl apply -f k8s/backend/deployment-green.yaml
          kubectl rollout status deployment/veza-backend-green -n ${{ env.KUBE_NAMESPACE }} --timeout=10m
          
          # Run smoke tests
          make smoke-tests ENDPOINT=https://green.api.veza.app
          
          # Switch traffic to green
          kubectl patch service veza-backend -n ${{ env.KUBE_NAMESPACE }} \
            -p '{"spec":{"selector":{"version":"green"}}}'
          
          # Wait for validation
          sleep 60
          
          # Monitor metrics
          if ! make verify-deployment; then
            echo "Deployment verification failed, rolling back..."
            kubectl patch service veza-backend -n ${{ env.KUBE_NAMESPACE }} \
              -p '{"spec":{"selector":{"version":"blue"}}}'
            exit 1
          fi
          
          # Delete old blue deployment
          kubectl delete deployment veza-backend-blue -n ${{ env.KUBE_NAMESPACE }}          
      
      - name: Notify Slack
        if: always()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "Production deployment ${{ job.status }}: ${{ github.sha }}"
            }            
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
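The `make smoke-tests` and `make verify-deployment` gates above ultimately reduce to checking HTTP statuses against a pass/fail rule. A minimal status gate (the `status_ok` name is illustrative):

```shell
#!/bin/sh
# status_ok: treat any 2xx response as a passing smoke check.
status_ok() {
  case "$1" in
    2??) return 0 ;;
    *)   return 1 ;;
  esac
}
# e.g. status_ok "$(curl -s -o /dev/null -w '%{http_code}' https://green.api.veza.app/health)"
```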

6. ZERO-DOWNTIME STRATEGIES

6.1 Blue-Green Deployment

Process:

  1. Blue (current production) serves all traffic
  2. Deploy Green (new version) in parallel
  3. Test Green thoroughly (smoke tests, health checks)
  4. Switch load balancer from Blue to Green (instant cutover)
  5. Monitor Green for issues (5-10 min)
  6. If issues: Rollback to Blue (instant)
  7. If stable: Decommission Blue

Kubernetes Implementation:

# Deploy green
kubectl apply -f k8s/backend/deployment-green.yaml

# Wait for readiness
kubectl wait --for=condition=available --timeout=10m deployment/veza-backend-green

# Switch service selector
kubectl patch service veza-backend -p '{"spec":{"selector":{"version":"green"}}}'

# Monitor
watch kubectl get pods -l app=veza-backend

# Rollback if needed
kubectl patch service veza-backend -p '{"spec":{"selector":{"version":"blue"}}}'

6.2 Canary Deployment

Process:

  1. Deploy new version (canary) with 5% traffic
  2. Monitor metrics (error rate, latency)
  3. Gradually increase traffic: 5% → 25% → 50% → 100%
  4. At each stage, verify metrics are healthy
  5. If issues detected: Rollback immediately
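The traffic progression (5% → 25% → 50% → 100%) can be driven by a promotion step that only advances after metrics pass; a sketch of the schedule (function name is illustrative):

```shell
#!/bin/sh
# next_weight: the canary's next traffic share, advanced only after metrics pass.
next_weight() {
  case "$1" in
    0)  echo 5   ;;
    5)  echo 25  ;;
    25) echo 50  ;;
    50) echo 100 ;;
    *)  echo "$1" ;;   # already at 100 (or unknown): hold
  esac
}
```

At each step the controller would patch the route weights (canary = weight, stable = 100 minus weight) and re-check error rate and latency before advancing.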

Kubernetes with Istio:

# k8s/canary/virtualservice.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: veza-backend
spec:
  hosts:
  - veza-backend
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: veza-backend
        subset: canary
  - route:
    - destination:
        host: veza-backend
        subset: stable
      weight: 95
    - destination:
        host: veza-backend
        subset: canary
      weight: 5

Automated Canary with Flagger:

# k8s/canary/flagger-canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: veza-backend
  namespace: veza-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: veza-backend
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
  webhooks:
    - name: acceptance-test
      type: pre-rollout
      url: http://flagger-loadtester.test/
      timeout: 30s
      metadata:
        type: bash
        cmd: "curl -s http://veza-backend-canary/health | grep -q ok"

7. CONFIGURATION MANAGEMENT

7.1 ConfigMap (Non-Sensitive Config)

# k8s/backend/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: veza-backend-config
  namespace: veza-production
data:
  APP_ENV: "production"
  LOG_LEVEL: "info"
  API_RATE_LIMIT: "300"
  MAX_UPLOAD_SIZE: "500MB"
  CORS_ORIGINS: "https://veza.app,https://www.veza.app"

7.2 Secrets (Sensitive Data)

# k8s/backend/secret.yaml (encrypted with SOPS or sealed-secrets)
apiVersion: v1
kind: Secret
metadata:
  name: veza-secrets
  namespace: veza-production
type: Opaque
data:
  database-url: <base64-encoded>
  redis-url: <base64-encoded>
  jwt-secret: <base64-encoded>
  stripe-api-key: <base64-encoded>
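Note that Secret `data:` values are base64-encoded, not encrypted: anyone with read access to the manifest can decode them, which is why the file itself must be encrypted (SOPS or sealed-secrets, as noted above). The encoding round-trip, for reference:

```shell
#!/bin/sh
# Encode/decode helpers for Secret data values (base64 is an encoding, not encryption).
b64enc() { printf '%s' "$1" | base64; }
b64dec() { printf '%s' "$1" | base64 -d; }
```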

Create Secret from Vault:

# Fetch from Vault and create the K8s secret (kubectl base64-encodes values itself)
kubectl create secret generic veza-secrets \
  --from-literal=database-url="$(vault kv get -field=database_url secret/veza/production)" \
  -n veza-production

8. SECRETS MANAGEMENT

8.1 HashiCorp Vault

Vault Structure:

secret/
├── veza/
│   ├── production/
│   │   ├── database_url
│   │   ├── redis_url
│   │   ├── jwt_secret
│   │   ├── stripe_api_key
│   │   ├── aws_access_key
│   │   └── aws_secret_key
│   └── staging/
│       └── ...

Store Secret:

# Write secret
vault kv put secret/veza/production \
  database_url="postgresql://..." \
  redis_url="redis://..." \
  jwt_secret="..."

# Read secret
vault kv get secret/veza/production

# Rotate secret (new version)
vault kv put secret/veza/production jwt_secret="new-secret"

Vault Agent Injector (Kubernetes):

apiVersion: v1
kind: Pod
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "veza-backend"
    vault.hashicorp.com/agent-inject-secret-database: "secret/data/veza/production"
    vault.hashicorp.com/agent-inject-template-database: |
      {{- with secret "secret/data/veza/production" -}}
      export DATABASE_URL="{{ .Data.data.database_url }}"
      {{- end }}      

9. MONITORING & OBSERVABILITY

9.1 Prometheus + Grafana

Prometheus Configuration:

# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'veza-backend'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: veza-backend
    - source_labels: [__meta_kubernetes_pod_ip]
      target_label: __address__
      replacement: $1:8080

  - job_name: 'postgres'
    static_configs:
    - targets: ['postgres-exporter:9187']

  - job_name: 'redis'
    static_configs:
    - targets: ['redis-exporter:9121']

Grafana Dashboard:

  • API Latency: p50, p95, p99 response times
  • Throughput: Requests per second
  • Error Rate: 4xx, 5xx errors
  • Database: Query time, connections, slow queries
  • Cache Hit Rate: Redis hit/miss ratio

9.2 Logging (ELK Stack)

Filebeat Configuration:

# filebeat/filebeat.yml
filebeat.inputs:
- type: container
  paths:
    - '/var/lib/docker/containers/*/*.log'
  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        matchers:
        - logs_path:
            logs_path: "/var/lib/docker/containers/"

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "veza-logs-%{+yyyy.MM.dd}"

9.3 Tracing (Jaeger)

OpenTelemetry Integration:

// Go - OpenTelemetry setup
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/sdk/resource"
    "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

func initTracer() (*trace.TracerProvider, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint("http://jaeger:14268/api/traces")))
    if err != nil {
        return nil, err
    }
    
    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("veza-backend-api"),
        )),
    )
    
    otel.SetTracerProvider(tp)
    return tp, nil
}

10. BACKUP & DISASTER RECOVERY

10.1 Database Backups

Automated Backup Strategy:

  • Daily: Full backup (3 AM UTC)
  • Hourly: Incremental backup
  • Retention: 30 days daily, 12 weeks weekly, 2 years monthly
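The retention tiers above can be approximated as an age-based keep/expire rule, assuming one full backup per day (a simplification: real grandfather-father-son rotation also keys on day-of-week and day-of-month):

```shell
#!/bin/sh
# retention_class: which tier still keeps a backup of the given age (in days).
retention_class() {
  age=$1
  if   [ "$age" -le 30 ];  then echo daily    # 30 days of dailies
  elif [ "$age" -le 84 ];  then echo weekly   # 12 weeks of weeklies
  elif [ "$age" -le 730 ]; then echo monthly  # 2 years of monthlies
  else echo expired
  fi
}
```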

Backup Script:

#!/bin/bash
# scripts/backup-database.sh

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/postgres"
DATABASE="veza_db"

# Full backup
pg_dump -Fc -f "$BACKUP_DIR/veza_db_$DATE.dump" "$DATABASE"

# Encrypt
gpg --encrypt --recipient backup@veza.app "$BACKUP_DIR/veza_db_$DATE.dump"

# Upload to S3
aws s3 cp "$BACKUP_DIR/veza_db_$DATE.dump.gpg" s3://veza-backups/postgres/

# Cleanup local backups > 7 days
find "$BACKUP_DIR" -name "*.dump.gpg" -mtime +7 -delete

Restore Procedure:

#!/bin/bash
# scripts/restore-database.sh

BACKUP_FILE=$1

# Download from S3
aws s3 cp "s3://veza-backups/postgres/$BACKUP_FILE" /tmp/

# Decrypt
gpg --decrypt "/tmp/$BACKUP_FILE" > "/tmp/backup.dump"

# Restore
pg_restore -d veza_db "/tmp/backup.dump"
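Rule 9 requires monthly restore tests; a cheap pre-check before a full restore is verifying that the decrypted file really is a custom-format dump. `pg_dump -Fc` archives begin with the magic bytes `PGDMP`:

```shell
#!/bin/sh
# is_pg_dump: sanity-check that a file is a pg_dump custom-format archive.
is_pg_dump() {
  [ "$(head -c 5 "$1")" = "PGDMP" ]
}
```

This catches truncated downloads or decryption failures before `pg_restore` touches the database; the full monthly test should still restore into a scratch instance and compare row counts.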

10.2 Disaster Recovery Plan

RTO (Recovery Time Objective): < 4 hours
RPO (Recovery Point Objective): < 1 hour

Recovery Procedures:

  1. Database Failure: Failover to standby replica (< 5 min)
  2. Application Failure: Rollback deployment (< 5 min)
  3. Complete Region Failure: Failover to DR region (< 4 hours)

11. SCALING STRATEGY

11.1 Horizontal Scaling

Auto-Scaling Rules:

  • CPU > 70%: Scale up
  • CPU < 30%: Scale down (after 5 min stability)
  • Memory > 80%: Scale up
  • Request queue > 100: Scale up
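Those rules combine into a single decision in which any scale-up trigger wins over scale-down; a sketch (units: percent for CPU and memory, request count for the queue):

```shell
#!/bin/sh
# scale_decision: apply the rules above; scale-up triggers take precedence.
scale_decision() {
  cpu=$1; mem=$2; queue=$3
  if [ "$cpu" -gt 70 ] || [ "$mem" -gt 80 ] || [ "$queue" -gt 100 ]; then
    echo up
  elif [ "$cpu" -lt 30 ]; then
    echo down      # only after 5 min of stability, enforced by the HPA window
  else
    echo hold
  fi
}
```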

11.2 Database Scaling

Read Replicas:

  • 2 read replicas minimum
  • Route read queries to replicas
  • Write queries to primary only

Connection Pooling (PgBouncer):

[databases]
veza_db = host=postgres port=5432 dbname=veza_db

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5

12. OPERATIONAL PROCEDURES

12.1 Deployment Checklist

Pre-Deployment:

  • All tests pass (unit, integration, E2E)
  • Security scan completed (no critical vulnerabilities)
  • Database migrations tested in staging
  • Rollback plan documented
  • Monitoring dashboards ready
  • On-call engineer notified
  • Deployment window scheduled (low-traffic period)

During Deployment:

  • Monitor error rates in real-time
  • Monitor response times (p95, p99)
  • Check logs for errors
  • Verify database migrations applied
  • Test critical user flows

Post-Deployment:

  • Verify all services healthy
  • Run smoke tests
  • Monitor for 30 minutes
  • Update deployment log
  • Notify stakeholders

12.2 Rollback Procedure

Immediate Rollback (< 5 min):

# Kubernetes
kubectl rollout undo deployment/veza-backend -n veza-production

# Verify
kubectl rollout status deployment/veza-backend -n veza-production

# Check logs
kubectl logs -f deployment/veza-backend -n veza-production

12.3 Incident Response

Severity Levels:

  • P0 (Critical): Production down, data breach
  • P1 (High): Major feature broken, performance degradation
  • P2 (Medium): Minor feature broken
  • P3 (Low): Cosmetic issues

Response Procedure:

  1. Acknowledge incident (< 5 min)
  2. Assess severity
  3. Notify stakeholders
  4. Mitigate (rollback, hotfix, scaling)
  5. Root cause analysis
  6. Post-mortem

13. PRIORITY SECURITY FIXES

Identified during the security audit of 2026-03-04. These procedures are blocking for any production deployment.

13.1 JWT Secret Rotation

The JWT secret must be rotated regularly (at least quarterly) and immediately if compromise is suspected.

Rotation procedure:

#!/bin/bash
# scripts/rotate-jwt-secret.sh
set -euo pipefail

NEW_SECRET=$(openssl rand -base64 64)

# 1. Store the new secret in Vault
vault kv put secret/veza/production jwt_secret="$NEW_SECRET"

# 2. Update the Kubernetes secret
kubectl create secret generic veza-jwt-secret \
  --from-literal=jwt-secret="$NEW_SECRET" \
  --dry-run=client -o yaml | kubectl apply -f - -n veza-production

# 3. Rolling restart so pods load the new secret
kubectl rollout restart deployment/veza-backend -n veza-production
kubectl rollout restart deployment/veza-chat-server -n veza-production

# 4. Wait for the rollouts to complete
kubectl rollout status deployment/veza-backend -n veza-production --timeout=5m
kubectl rollout status deployment/veza-chat-server -n veza-production --timeout=5m

echo "JWT secret rotation complete. Old tokens will expire naturally."

**Critical points:**

- During rotation, old tokens remain valid until their natural expiry (keep token lifetimes short: 15 min access, 7 days refresh)
- Test in staging before every production rotation
- Log the rotation event (without the secret) in the audit log
### 13.2 Aligning JWT Issuer/Audience Between Go and Rust

The Go backend and the Rust chat-server must use the same JWT claims (`iss`, `aud`) to avoid cross-service token rejections.

**Aligned configuration:**

```yaml
# Shared configuration — identical for Go and Rust
jwt:
  issuer: "https://api.veza.app"
  audience: "https://veza.app"
  algorithm: "HS256"   # Migrate to RS256 — see section 15
  access_token_ttl: "15m"
  refresh_token_ttl: "7d"
```

**Go side:**

```go
// veza-backend-api/internal/auth/jwt.go
claims := jwt.MapClaims{
    "iss": "https://api.veza.app",
    "aud": "https://veza.app",
    "sub": userID,
    "exp": time.Now().Add(15 * time.Minute).Unix(),
    "iat": time.Now().Unix(),
}
```

**Rust side:**

```rust
// veza-chat-server/src/auth.rs
let mut validation = Validation::new(Algorithm::HS256);
validation.set_issuer(&["https://api.veza.app"]);
validation.set_audience(&["https://veza.app"]);
```

**Deployment procedure:**

1. Update the Rust config to accept the same `iss`/`aud`
2. Deploy the Rust chat-server first (it accepts both existing and new tokens)
3. Update the Go config
4. Deploy the Go backend
5. Check the logs of both services: zero `invalid issuer` or `invalid audience` errors
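
The last step can be spot-checked by decoding a token's payload and reading the `iss`/`aud` claims directly. A small inspection helper (the function name is ours; it performs no signature verification and handles only top-level string claims):

```shell
#!/bin/bash
# jwt_claim TOKEN CLAIM: print one string-valued claim from a JWT payload.
# Inspection only; never use this for authentication decisions.
jwt_claim() {
  local token="$1" claim="$2" b64
  # Extract the payload segment and convert base64url to base64
  b64=$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')
  # Re-pad to a multiple of 4 before decoding
  while [ $(( ${#b64} % 4 )) -ne 0 ]; do b64="${b64}="; done
  printf '%s' "$b64" | base64 -d \
    | tr '{,}' '\n' \
    | sed -n "s/^\"$claim\":\"\(.*\)\"\$/\1/p"
}

# Usage against a live token (illustrative):
# jwt_claim "$ACCESS_TOKEN" iss   # expect https://api.veza.app
# jwt_claim "$ACCESS_TOKEN" aud   # expect https://veza.app
```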

### 13.3 Protecting the /metrics Route

The `/metrics` route (Prometheus) exposes internal metrics and must never be publicly accessible.

**Nginx/Ingress — block external access:**

```yaml
# k8s/ingress.yaml — added annotation
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: veza-ingress
  annotations:
    nginx.ingress.kubernetes.io/server-snippet: |
      location /metrics {
        deny all;
        return 404;
      }
```

**NetworkPolicy — restrict access to the monitoring namespace:**

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metrics-from-prometheus
  namespace: veza-production
spec:
  podSelector:
    matchLabels:
      app: veza-backend
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: monitoring
    ports:
    - port: 8080
      protocol: TCP
```

**Post-deployment verification:**

```bash
# Must return 404 (or connection refused) from outside
curl -s -o /dev/null -w "%{http_code}" https://api.veza.app/metrics
# Expected: 404

# Must work from the Prometheus pod
kubectl exec -n monitoring deploy/prometheus -- curl -s http://veza-backend.veza-production/metrics | head -5
# Expected: 200 with metrics
```

## 14. ETHICAL DEPLOYMENT CHECKLIST

Before each production deployment, the following points must be verified. This checklist complements the standard operational checklist (section 12.1).

### 14.1 Personal Data Protection

- **Prometheus metrics:** verify that no metric exposes personal data (emails, IPs, full user agents, user identifiers)

  ```bash
  # Automated check — no metric may contain these patterns
  kubectl exec -n monitoring deploy/prometheus -- \
    curl -s http://veza-backend.veza-production/metrics | \
    grep -iE '(email|user_agent|ip_address|@)' && \
    echo "FAIL: Personal data found in metrics" && exit 1 || \
    echo "PASS: No personal data in metrics"
  ```
- **Anonymized logs:** confirm that application logs contain no plaintext personal data

  ```bash
  # Check the last 1000 log lines for email addresses
  kubectl logs deployment/veza-backend -n veza-production --tail=1000 | \
    grep -iE '([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})' && \
    echo "FAIL: Email addresses found in logs" && exit 1 || \
    echo "PASS: No emails in logs"
  ```
- **HTTP headers:** no tracking or fingerprinting header is emitted by the services
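
The header check can be scripted as a deny-list scan over response headers. A sketch, with the caveat that the header list below is our illustrative guess at common tracking headers, not a project-confirmed set:

```shell
#!/bin/bash
# no_tracking_headers: reads HTTP response headers on stdin and fails if any
# header from the (illustrative) tracking deny-list appears.
no_tracking_headers() {
  ! grep -iqE '^(x-user-id|x-device-fingerprint|x-client-trace|set-cookie: *(_ga|_fbp))'
}

# Usage (illustrative):
# curl -sI https://api.veza.app/ | no_tracking_headers \
#   && echo "PASS: no tracking headers" || echo "FAIL: tracking header found"
```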

### 14.2 GDPR Compliance

- **Data export:** the `POST /api/v1/me/data-export` endpoint returns all of the authenticated user's data (profile, tracks, playlists, history)
- **Account deletion:** the `DELETE /api/v1/me` endpoint deletes the user and all associated data (cascade)
- **Cookie consent:** only strictly necessary cookies (JWT session) are sent without consent

### 14.3 Ethical Integrity

- **No AI/ML dependencies:** no Docker image contains an ML framework (TensorFlow, PyTorch, scikit-learn, ONNX)
- **No third-party tracking:** no Google Analytics, Facebook Pixel, or equivalent script in the frontend bundle
- **Discovery algorithm:** the bias tests (section 14 of ORIGIN_TESTING_STRATEGY) pass in CI

## 15. JWT MIGRATION PLAN: HS256 → RS256

### 15.1 Motivation

HS256 (symmetric HMAC) requires sharing the same secret across all services. RS256 (asymmetric RSA) lets the Go backend sign with a private key while the other services (Rust, frontend) verify with the public key, reducing the attack surface.

### 15.2 Target Architecture

```
┌──────────────────┐         ┌──────────────────┐
│  Go Backend      │         │  Chat Server     │
│  (signs JWT)     │         │  (Rust)          │
│                  │         │  (verifies JWT)  │
│ RS256 PRIVATE KEY│         │  PUBLIC KEY      │
└──────────────────┘         └──────────────────┘
         │                            │
         │      Same public key       │
         └────────────┬───────────────┘
                      │
              ┌───────▼───────┐
              │  Frontend     │
              │  (verifies JWT│
              │  optional)    │
              │  PUBLIC KEY   │
              └───────────────┘
```

### 15.3 Migration Steps

**Phase 1 — Preparation (week 1):**

```bash
# Generate the 4096-bit RSA key pair
openssl genrsa -out jwt-private.pem 4096
openssl rsa -in jwt-private.pem -pubout -out jwt-public.pem

# Store in Vault
vault kv put secret/veza/production \
  jwt_private_key=@jwt-private.pem \
  jwt_public_key=@jwt-public.pem

# Clean up the local files
shred -u jwt-private.pem jwt-public.pem
```

**Phase 2 — Dual validation (week 2):**

Modify the services to accept both algorithms during the transition:

```go
// veza-backend-api — signs RS256, validates both HS256 and RS256
func (a *Auth) ValidateToken(tokenString string) (*Claims, error) {
    token, err := jwt.Parse(tokenString, func(t *jwt.Token) (interface{}, error) {
        switch t.Method.(type) {
        case *jwt.SigningMethodRSA:
            return a.rsaPublicKey, nil
        case *jwt.SigningMethodHMAC:
            return []byte(a.hmacSecret), nil
        default:
            return nil, fmt.Errorf("unexpected signing method: %v", t.Header["alg"])
        }
    })
    // ...
}
```
```rust
// veza-chat-server — validates both HS256 and RS256
fn validate_token(token: &str, config: &AuthConfig) -> Result<Claims, AuthError> {
    // Try RS256 first, fall back to HS256
    let rs256_result = decode::<Claims>(
        token,
        &DecodingKey::from_rsa_pem(config.rsa_public_key.as_bytes())?,
        &Validation::new(Algorithm::RS256),
    );

    match rs256_result {
        Ok(data) => Ok(data.claims),
        Err(_) => {
            let hs256_result = decode::<Claims>(
                token,
                &DecodingKey::from_secret(config.hmac_secret.as_bytes()),
                &Validation::new(Algorithm::HS256),
            );
            hs256_result.map(|d| d.claims).map_err(AuthError::from)
        }
    }
}
```

**Phase 3 — Cutover (week 3):**

1. Deploy the Go backend signing exclusively with RS256
2. Wait for all HS256 tokens to expire (max 7 days for refresh tokens)
3. Remove the HS256 fallback code from the Go and Rust services
4. Remove the HS256 secret from Vault

**Phase 4 — Cleanup (week 4):**

```bash
# Remove the old HS256 secret from Vault
vault kv delete secret/veza/production/jwt_secret_hmac

# Verify no service still uses HS256
kubectl logs -l app=veza-backend -n veza-production --since=24h | \
  grep -i "hs256" && echo "WARNING: HS256 still in use" || echo "CLEAN: No HS256 usage"
```

### 15.4 Rollback Plan

If problems are detected during the migration:

1. Switch the Go backend back to HS256 signing
2. The services keep accepting both formats
3. Investigate and fix before retrying

### 15.5 Success Criteria

- All services accept RS256
- No HS256 tokens in circulation (after natural expiry)
- The RSA private key is accessible only to the Go backend
- Cross-service integration tests pass with RS256
- Performance: RS256 verification < 1 ms (benchmark it)
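
The < 1 ms target can be sanity-checked outside the services with plain OpenSSL, since RS256 is RSA-SHA256 underneath. A rough sketch (key size and paths are illustrative; `openssl speed rsa` gives proper per-operation timings):

```shell
#!/bin/bash
# Rough RS256-style verification check: sign a payload once, then time a verify.
# This only demonstrates the primitive involved (RSA-SHA256), not the JWT path.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

openssl genrsa -out "$workdir/priv.pem" 2048 2>/dev/null
openssl rsa -in "$workdir/priv.pem" -pubout -out "$workdir/pub.pem" 2>/dev/null

printf 'header.payload' > "$workdir/msg"
openssl dgst -sha256 -sign "$workdir/priv.pem" -out "$workdir/sig" "$workdir/msg"

# Verification is the hot path in the chat-server; time a single call.
# Prints "Verified OK" on stdout, timing on stderr.
time openssl dgst -sha256 -verify "$workdir/pub.pem" \
  -signature "$workdir/sig" "$workdir/msg"
```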

## VALIDATION CHECKLIST

### Infrastructure

- Infrastructure as Code (Terraform) complete
- All resources versioned in Git
- Secrets in Vault (no plaintext)
- Automated provisioning tested

### Deployment

- CI/CD pipeline functional
- Zero-downtime deployment strategy (blue-green or canary)
- Automated rollback configured
- Health checks implemented

### Monitoring

- Prometheus + Grafana dashboards
- Alerting configured (PagerDuty/Slack)
- Logging centralized (ELK Stack)
- Tracing implemented (Jaeger)

### Disaster Recovery

- Automated backups (daily + hourly)
- Backup restoration tested
- Failover procedure documented
- RTO < 4h, RPO < 1h validated

## 📊 SUCCESS METRICS

### Deployment Metrics

- **Deployment Frequency:** multiple per day
- **Lead Time:** < 1 hour (commit to production)
- **MTTR (Mean Time To Recovery):** < 5 minutes
- **Change Failure Rate:** < 5%

### Operational Metrics

- **Uptime:** > 99.9%
- **RTO:** < 4 hours
- **RPO:** < 1 hour
- **Deployment Success Rate:** > 95%

## 🔄 VERSION HISTORY

| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2025-11-02 | Initial version - complete deployment guide |
| 2.0.0 | 2026-03-04 | Security audit: added priority fixes (JWT rotation, Go↔Rust issuer/audience alignment, /metrics protection), ethical deployment checklist (personal data, GDPR, integrity), JWT HS256→RS256 migration plan. |

## ⚠️ WARNING

**THIS GUIDE IS IMMUTABLE**


**Document created by:** DevOps Team + SRE
**Creation date:** 2025-11-02
**Last revision:** 2026-03-04 (security audit)
**Next revision:** Quarterly (2026-06-01)
**Owner:** DevOps Lead

**Status:** APPROVED AND LOCKED