talas-group/04_INFRA_DEPLOIEMENT/CI_CD/PROCEDURES_DEPLOIEMENT.md

# Veza Deployment Procedures

> From the developer workstation to production on the R720 servers.
> Prerequisite: read [[04_INFRA_DEPLOIEMENT/Architecture_Serveurs/ARCHITECTURE_INFRA]] to understand the topology.
> Source: code in `/home/senke/git/talas/veza/`, Ansible in `04_INFRA_DEPLOIEMENT/Ansible/`
> Last updated: 27 March 2026.

---

## 1. Environments

| Environment | Server | `APP_ENV` | Description |
|-------------|--------|-----------|-------------|
| **Development** | Local workstation | `development` | Docker Compose for the infrastructure, services with hot reload |
| **Staging** | R720 #2 (dedicated container) | `staging` | Replica of production with test data |
| **Production** | R720 #1 | `production` | Blue-green via HAProxy, active monitoring |

### Key differences

| Aspect | Dev | Staging | Prod |
|--------|-----|---------|------|
| CORS | Wildcard `*` | Staging domains | Strict domain allowlist |
| CSRF (Redis) | Optional | Required | Required |
| Swagger / pprof | Enabled | Enabled | **Disabled** |
| Logs | Text, DEBUG | JSON, INFO | JSON, WARN and above |
| ClamAV | Optional | Recommended | **Required** |
| Rate limiting | Lenient | Moderate | Strict (DDoS protection: 1000/s global, 100/s per IP) |
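As an illustration of how the table can translate into per-environment defaults, the sketch below maps `APP_ENV` to the log format and level listed above. The helper name `log_defaults` is hypothetical; the real services may read differently named settings.

```bash
# Sketch: map APP_ENV to the logging defaults from the table.
# log_defaults is a hypothetical helper, not part of the Veza repository.
# Prints "<format> <level>" or fails for an unknown environment.
log_defaults() {
  case "$1" in
    development) echo "text DEBUG" ;;
    staging)     echo "json INFO" ;;
    production)  echo "json WARN" ;;
    *)           echo "unknown APP_ENV: $1" >&2; return 1 ;;
  esac
}
```

Failing on an unknown value, rather than falling back to development defaults, avoids starting a server with DEBUG logging by accident.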

---

## 2. Developer setup

### Prerequisites

| Tool | Minimum version | Check |
|------|-----------------|-------|
| Go | 1.24 | `go version` |
| Rust + Cargo | Stable | `rustc --version` |
| Node.js | 20 | `node --version` |
| Docker + Compose | 24 | `docker --version`, `docker-compose --version` |
| Make | any | `make --version` |
| Git | any | `git --version` |
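The minimum versions above can be checked with a small script. This is a sketch that assumes GNU `sort -V` is available; the helper names `version_ge` and `first_version` are illustrative and not part of the repository.

```bash
# Sketch: compare an installed version against a minimum from the table.
# version_ge and first_version are hypothetical helpers.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extracts the first dotted version number from a string
# such as "go version go1.24.1 linux/amd64".
first_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}
```

For example, `version_ge "$(first_version "$(go version)")" 1.24 || echo "Go 1.24 or newer is required"` reports an outdated Go toolchain.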

### Installation

```bash
# 1. Clone the monorepo
git clone <url-gitea>/talas/veza.git
cd veza

# 2. Copy and configure the environment
cp .env.example .env
# Edit .env: adjust the ports if they conflict

# 3. Start the infrastructure (PostgreSQL, Redis, RabbitMQ, MinIO, ES, ClamAV)
docker-compose up -d

# 4. Apply the migrations
cd veza-backend-api
go run ./cmd/migrate_tool/main.go up
cd ..

# 5. Start the services
make dev          # Full stack
# OR individually:
make dev-web              # Frontend only (port 5173)
make dev-backend-api      # Go backend (port 18080)
make dev-stream-server    # Rust stream server (port 18082)
```

### Common Makefile commands

```bash
make dev              # Full stack with hot reload
make dev-full         # All services running locally
make infra            # Docker infrastructure only
make build            # Production build of all containers
make test             # All tests
make test-backend     # Go tests
make test-frontend    # Vitest tests
make test-e2e         # Playwright end-to-end tests
make migrate-up       # Apply database migrations
make migrate-down     # Roll back the last migration
```

### Frontend with mocks (no backend)

```bash
npm run dev:mocks     # Enables MSW (Mock Service Worker)
```

---

## 3. CI/CD pipeline

### Quality gates (before merge)

| Tool | Language | Threshold |
|------|----------|-----------|
| `golangci-lint` | Go | Zero errors |
| `cargo clippy` | Rust | Zero warnings |
| `ESLint` + `tsc --noEmit` | TypeScript | Zero errors |
| `npm audit` | Node.js | No critical vulnerabilities |
| `govulncheck` | Go | No known vulnerabilities |
| `Trivy` | Docker | No critical CVEs in the images |
| Go tests | Go | Coverage ≥ 60% |
| Playwright | E2E | All 17 suites pass |
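The coverage gate can be enforced by parsing the `total:` line that `go tool cover -func` prints. The following is a minimal sketch; the helper name `coverage_gate` is hypothetical and the actual pipeline may implement the check differently.

```bash
# Sketch: fail when total Go coverage is below a threshold.
# Reads the output of `go tool cover -func=<profile>` on stdin.
# coverage_gate is a hypothetical helper. Argument: minimum percentage.
coverage_gate() {
  local min="$1"
  awk -v min="$min" '
    # The summary line looks like: total: (statements) 63.2%
    $1 == "total:" { gsub("%", "", $NF); total = $NF; found = 1 }
    END {
      if (!found) { print "no total line found" > "/dev/stderr"; exit 2 }
      printf "total coverage: %.1f%% (minimum %s%%)\n", total, min
      exit (total + 0 >= min + 0) ? 0 : 1
    }'
}
```

A CI step could then run `go test ./... -coverprofile=coverage.out && go tool cover -func=coverage.out | coverage_gate 60`.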

### Typical workflow

```
Commit → Push → CI (lint + tests + security scan)
         ↓
     Review (PR)
         ↓
     Merge → staging auto-deploy
         ↓
     Manual validation
         ↓
     Tag release → production deploy
```

CI/CD is planned on **Woodpecker CI** (self-hosted, compatible with Gitea/Forgejo).
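The routing in the workflow (merge deploys to staging, a release tag deploys to production) can be sketched as a small decision function. The branch name `main`, the `v*` tag pattern, and the helper name `deploy_target` are assumptions. In Woodpecker CI the arguments would typically come from environment variables such as `CI_PIPELINE_EVENT`, `CI_COMMIT_BRANCH`, and `CI_COMMIT_TAG`; verify the names against the installed version.

```bash
# Sketch: decide the deploy target from the pipeline event.
# Arguments: event ("push" or "tag"), branch name, tag name.
# The "main" branch and the "v*" tag pattern are assumptions.
deploy_target() {
  local event="$1" branch="$2" tag="$3"
  if [ "$event" = "tag" ]; then
    case "$tag" in
      v*) echo "production"; return 0 ;;
    esac
  elif [ "$event" = "push" ] && [ "$branch" = "main" ]; then
    echo "staging"; return 0
  fi
  # Anything else (feature branches, unrecognised tags) deploys nowhere.
  echo "none"
}
```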

---

## 4. Staging deployment

```bash
# On R720 #2 (via JumpServer)

# 1. Pull the images
docker-compose -f docker-compose.staging.yml pull

# 2. Run the migrations
docker-compose -f docker-compose.staging.yml run --rm backend \
  go run ./cmd/migrate_tool/main.go up

# 3. Restart the services
docker-compose -f docker-compose.staging.yml up -d

# 4. Check health
curl -s https://staging.veza.fr/api/v1/healthz
curl -s https://staging.veza.fr/api/v1/readyz
```
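Services may need a few seconds before the health endpoints respond, so a single `curl` right after the restart can fail spuriously. The sketch below polls an endpoint until it stops returning an error; the helper name `wait_for_health` and the default retry values are assumptions.

```bash
# Sketch: poll a health endpoint until it answers without an HTTP error.
# wait_for_health is a hypothetical helper.
# Arguments: URL, maximum attempts (default 10), delay in seconds (default 3).
wait_for_health() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "healthy after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempts: $url" >&2
  return 1
}
```

For example, `wait_for_health https://staging.veza.fr/api/v1/readyz` can replace the final manual check in a scripted deployment.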

---

## 5. Production deployment (blue-green)

Production deployment follows a **blue-green** scheme behind HAProxy. Two sets of containers coexist, and HAProxy switches traffic from one to the other.

### Procedure

```bash
# On R720 #1 (via JumpServer)
# This example assumes blue is active and green is inactive; swap the names otherwise.

# 1. Identify the active slot (blue or green)
# Check the HAProxy configuration to see which backend is active

# 2. Build and start the inactive slot
docker-compose -f docker-compose.prod.yml build
docker-compose -f docker-compose.prod.yml up -d veza-backend-green veza-stream-green veza-frontend-green

# 3. Apply the migrations (if needed)
# IMPORTANT: back up PostgreSQL BEFORE any migration (see §7)
docker-compose -f docker-compose.prod.yml run --rm veza-backend-green \
  go run ./cmd/migrate_tool/main.go up

# 4. Check the health of the inactive slot
curl -s http://localhost:18081/api/v1/healthz   # port of the green slot
curl -s http://localhost:18081/api/v1/readyz

# 5. Switch HAProxy to the new slot
# Edit the HAProxy configuration to point to green
sudo systemctl reload haproxy

# 6. Verify production
curl -s https://veza.fr/api/v1/healthz
curl -s https://veza.fr/api/v1/health/deep

# 7. If OK: stop the old slot (blue)
docker-compose -f docker-compose.prod.yml stop veza-backend-blue veza-stream-blue veza-frontend-blue

# 8. If KO: roll back immediately (the old slot must still be running)
# Point HAProxy back to the old slot (blue)
sudo systemctl reload haproxy
```
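Because the slot names alternate between releases, a scripted version of the procedure has to derive the inactive slot instead of hard-coding `green`. A minimal sketch, assuming the `veza-<component>-<slot>` service naming used in the commands above; the helper names are hypothetical.

```bash
# Sketch: derive the inactive slot and its Compose services from the active slot.
# inactive_slot and slot_services are hypothetical helpers.

# Prints the slot that is NOT currently serving traffic.
inactive_slot() {
  case "$1" in
    blue)  echo "green" ;;
    green) echo "blue" ;;
    *)     echo "unknown slot: $1" >&2; return 1 ;;
  esac
}

# Prints the Compose service names for a slot, following the
# veza-<component>-<slot> pattern used in the procedure.
slot_services() {
  echo "veza-backend-$1 veza-stream-$1 veza-frontend-$1"
}
```

With blue active, `docker-compose -f docker-compose.prod.yml up -d $(slot_services "$(inactive_slot blue)")` starts the green services; the word splitting of the unquoted substitution is intentional.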

### Pre-deployment checklist

- [ ] All CI tests pass
- [ ] Staging validated manually
- [ ] PostgreSQL backup taken (see §7)
- [ ] ZFS snapshot taken
- [ ] Database migration tested on staging
- [ ] No merge freeze in progress

---

## 6. Database migrations

### Principles

- Migrations are **100% SQL** (no GORM AutoMigrate)
- Files live in `veza-backend-api/migrations/` (115+ numbered files)
- The `schema_migrations` table tracks applied migrations
- **Always back up PostgreSQL before a production migration**

### Commands

```bash
# Apply all pending migrations
go run ./cmd/migrate_tool/main.go up

# Roll back the last migration
go run ./cmd/migrate_tool/main.go down

# Check the migration status
go run ./cmd/migrate_tool/main.go status
```

### Production migration procedure

1. **Backup**: `pg_dump -Fc veza > veza_pre_migration_$(date +%Y%m%d).dump`
2. **ZFS snapshot**: `zfs snapshot pool/pg@pre_migration_$(date +%Y%m%d)`
3. **Apply** the migration on the inactive slot
4. **Verify**: test the critical endpoints
5. **On failure**: restore from the backup or the ZFS snapshot
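Steps 1 to 3 above can be chained so that the migration never runs without a fresh backup and snapshot. This is a sketch only: the helper names are hypothetical, and it assumes the commands run on a host where `pg_dump`, `zfs`, and the Go toolchain are all available, which may not match the actual container layout.

```bash
# Sketch: refuse to migrate unless the backup and the snapshot both succeed.
# pre_migration_tag and migrate_with_backup are hypothetical helpers.

# Builds the date-stamped tag shared by the dump file and the snapshot.
# Optional argument: a date string (defaults to today, YYYYMMDD).
pre_migration_tag() {
  echo "pre_migration_${1:-$(date +%Y%m%d)}"
}

migrate_with_backup() {
  local tag
  tag="$(pre_migration_tag)"
  # Each step runs only if the previous one succeeded.
  pg_dump -Fc veza > "veza_${tag}.dump" &&
    zfs snapshot "pool/pg@${tag}" &&
    go run ./cmd/migrate_tool/main.go up
}
```

Sharing one tag between the dump and the snapshot makes it easy to find the matching pair when a restore is needed.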

---

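## 7. Backup and restore

### PostgreSQL: PITR

| Component | Description |
|-----------|-------------|
| **WAL archiving** | Write-Ahead Logs are archived continuously to R720 #2 |
| **Base backup** | Weekly `pg_basebackup` (Sunday 03:00) |
| **Retention** | 7 days of WAL + 4 base backups (1 month) |
| **Ansible tooling** | `pg-wal-pull` role for WAL replication |

#### PITR restore

```bash
# 1. Stop PostgreSQL
sudo systemctl stop postgresql

# 2. Restore the base backup
# A pg_basebackup copy is physical: copy it back into the data directory
# (pg_restore only reads pg_dump archives). Plain-format backup assumed.
# Set PGDATA, BASE_BACKUP_DIR and WAL_ARCHIVE_DIR for this host first.
sudo -u postgres rsync -a --delete "$BASE_BACKUP_DIR"/ "$PGDATA"/

# 3. Configure the recovery target
# PostgreSQL 12+: settings go in postgresql.conf and recovery.signal triggers
# recovery (PostgreSQL 11 and earlier use recovery.conf instead).
echo "restore_command = 'cp $WAL_ARCHIVE_DIR/%f %p'" | sudo -u postgres tee -a "$PGDATA/postgresql.conf"
echo "recovery_target_time = '2026-03-27 14:30:00'" | sudo -u postgres tee -a "$PGDATA/postgresql.conf"
sudo -u postgres touch "$PGDATA/recovery.signal"

# 4. Start PostgreSQL in recovery mode
sudo systemctl start postgresql
# PostgreSQL replays the WAL up to the target point
```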
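With 7 days of WAL retained, a recovery target older than that window cannot be replayed. The sketch below checks the window before a restore is attempted. It requires GNU `date`, the helper name `within_wal_window` is hypothetical, and passing the check is necessary but not sufficient: a base backup taken before the target must also exist.

```bash
# Sketch: check that a recovery target falls inside the 7-day WAL window.
# within_wal_window is a hypothetical helper. Requires GNU date (-d option).
# Arguments: target timestamp, reference timestamp (defaults to now).
# Returns 0 inside the window, 1 outside, 2 on an unparsable timestamp.
within_wal_window() {
  local target_s now_s window_s
  target_s="$(date -d "$1" +%s)" || return 2
  now_s="$(date -d "${2:-now}" +%s)" || return 2
  window_s=$((7 * 24 * 3600))
  # The target must not be in the future and must be at most 7 days old.
  [ "$target_s" -le "$now_s" ] && [ $((now_s - target_s)) -le "$window_s" ]
}
```

For example, `within_wal_window '2026-03-27 14:30:00' || echo "target outside WAL retention"` guards the restore steps.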

### ZFS snapshots

```bash
# Manual snapshot before a risky operation
zfs snapshot pool/data@before_upgrade

# List snapshots
zfs list -t snapshot

# Roll back to a snapshot
zfs rollback pool/data@before_upgrade
```

### MinIO (object storage)

- Data lives on a mirrored ZFS pool, which provides native resilience to disk failure
- Cross-server synchronization via `mc mirror` (MinIO Client) for cross-site backup
- Audio files are immutable after upload, so later application writes cannot corrupt them

---

## 8. Routine maintenance

### Daily (automated)

| Task | Tool | Frequency |
|------|------|-----------|
| Health checks | Prometheus + Alertmanager | Every 30 s |
| Log rotation | logrotate | Daily |
| Disk SMART scan | smartmontools + Zabbix | Daily |
| WAL archiving | pg_wal_archive | Continuous |

### Weekly

| Task | Action |
|------|--------|
| PostgreSQL base backup | `pg_basebackup` (Sunday 03:00) |
| Disk space check | Zabbix alerts when usage > 80% |
| ClamAV signature update | `freshclam` (automatic) |
| Review of Zabbix/Grafana alerts | Manual inspection |

### Monthly

| Task | Action |
|------|--------|
| TLS certificate renewal | Automatic (certbot + HAProxy) |
| ZFS snapshot rotation | Delete snapshots older than 1 month |
| OS security updates | `dnf upgrade --security` |
| Backup restore test | Restore a dump on staging and verify it |
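The monthly snapshot rotation can be prepared by listing candidates first and deleting only after review. A minimal sketch, assuming the input comes from `zfs list -H -p -t snapshot -o name,creation`, which prints tab-separated names and creation times in epoch seconds; the helper name `expired_snapshots` is hypothetical.

```bash
# Sketch: list snapshots created before a cutoff.
# expired_snapshots is a hypothetical helper.
# Reads "name<TAB>creation-epoch" lines on stdin, as printed by
#   zfs list -H -p -t snapshot -o name,creation
# Argument: cutoff in epoch seconds. Prints the names older than the cutoff.
expired_snapshots() {
  awk -F '\t' -v cutoff="$1" '$2 + 0 < cutoff + 0 { print $1 }'
}
```

With GNU `date`, `zfs list -H -p -t snapshot -o name,creation | expired_snapshots "$(date -d '1 month ago' +%s)"` prints the snapshots due for removal. The function only lists names, so each `zfs destroy` stays an explicit, reviewed step.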

---

## 9. Quick troubleshooting

| Symptom | Diagnosis | Action |
|---------|-----------|--------|
| 502 Bad Gateway | Backend service down | `docker-compose logs backend`, then restart |
| High latency | PostgreSQL overloaded | Query `pg_stat_activity`, look for long-running queries |
| Disk full | Logs or unarchived WAL | `du -sh /var/log/*`, rotate logs, clean up WAL |
| Uploads fail | ClamAV down or MinIO unreachable | Check both services and their logs |
| 429 Too Many Requests | Rate limiting | Expected during a DDoS attack. Otherwise adjust the thresholds |
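For the high-latency case, the `pg_stat_activity` view can be filtered to show only statements that have been running for a while. The sketch below prints such a query; the helper name `long_queries_sql` and the 30-second default are assumptions.

```bash
# Sketch: print a query listing statements running longer than N seconds.
# long_queries_sql is a hypothetical helper. Argument: seconds (default 30).
long_queries_sql() {
  local seconds="${1:-30}"
  cat <<SQL
SELECT pid, now() - query_start AS duration, state, left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '${seconds} seconds'
ORDER BY duration DESC;
SQL
}
```

For example, `long_queries_sql 60 | psql -d veza` lists statements older than one minute, longest first.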

---

## See also

- [[04_INFRA_DEPLOIEMENT/Architecture_Serveurs/ARCHITECTURE_INFRA]]: Server topology
- [[03_APPS_&_SERVICES/ARCHITECTURE_VEZA]]: Application architecture
- [[03_APPS_&_SERVICES/CONFIGURATION_ENVIRONNEMENT]]: Environment variables and Docker
- [[00_META/Glossaire/GLOSSAIRE_TALAS]]: Glossary of technical terms