feat(infra): MinIO distributed EC:2 + migration script (W3 Day 12)
Four-node distributed MinIO cluster, single erasure set EC:2, tolerates
2 simultaneous node losses. 50% storage efficiency. Pinned to
RELEASE.2025-09-07T16-13-09Z to match docker-compose so dev/prod
parity is preserved.

- infra/ansible/roles/minio_distributed/ : install pinned binary,
  systemd unit pointed at MINIO_VOLUMES with bracket-expansion form,
  EC:2 forced via MINIO_STORAGE_CLASS_STANDARD. Vault assertion
  blocks shipping placeholder credentials to staging/prod.
- bucket init : creates veza-prod-tracks, enables versioning, applies
  lifecycle.json (30d noncurrent expiry + 7d abort-multipart). Cold-tier
  transition ready but inert until minio_remote_tier_name is set.
- infra/ansible/playbooks/minio_distributed.yml : provisions the 4
  containers, applies common baseline + role.
- infra/ansible/inventory/lab.yml : new minio_nodes group.
- infra/ansible/tests/test_minio_resilience.sh : kill 2 nodes,
  verify EC:2 reconstruction (read OK + checksum matches), restart,
  wait for self-heal.
- scripts/minio-migrate-from-single.sh : mc mirror --preserve from
  the single-node bucket to the new cluster, count-verifies, prints
  rollout next-steps.
- config/prometheus/alert_rules.yml : MinIODriveOffline (warn) +
  MinIONodesUnreachable (page) — page fires at >= 2 nodes unreachable
  because that's the redundancy ceiling for EC:2.
- docs/ENV_VARIABLES.md §12 : MinIO migration cross-ref.

Acceptance (Day 12) : EC:2 survives 2 concurrent kills + self-heals.
Lab apply pending. No backend code change — interface stays AWS S3.

W3 progress : Redis Sentinel ✓ (Day 11), MinIO distributed ✓ (this),
CDN (Day 13), DMCA (Day 14), embed (Day 15).


minio_distributed role — distributed MinIO with EC:2

Four Incus containers, each running one MinIO server. Single erasure set of 4 drives = 2 data + 2 parity. The cluster tolerates 2 simultaneous node failures without data loss; storage efficiency is 50% (1 GB raw → 500 MB usable).

Topology

                         S3 API on :9000
                                │
        ┌───────────────┬───────┴───────┬───────────────┐
        │               │               │               │
   ┌────▼────┐     ┌────▼────┐     ┌────▼────┐     ┌────▼────┐
   │ minio-1 │     │ minio-2 │     │ minio-3 │     │ minio-4 │
   │  /data  │     │  /data  │     │  /data  │     │  /data  │
   └─────────┘     └─────────┘     └─────────┘     └─────────┘
   └─────────────── single erasure set, EC:2 ────────────────┘

Each node also runs the web console on :9001.
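
The four servers are joined through MINIO_VOLUMES written in MinIO's {1...4} expansion syntax, with EC:2 forced via MINIO_STORAGE_CLASS_STANDARD. A minimal sketch of the per-node environment, assuming the minio-N.lxd hostnames and /var/lib/minio data path used elsewhere in this doc (the role templates the real file for the systemd unit):

# Sketch only: values are illustrative, the role renders the actual environment file.
MINIO_VOLUMES="http://minio-{1...4}.lxd:9000/var/lib/minio"
MINIO_STORAGE_CLASS_STANDARD="EC:2"
MINIO_ROOT_USER="<from vault>"
MINIO_ROOT_PASSWORD="<from vault>"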

Why EC:2 (not 4 or larger)

  • Recoverability ceiling. EC:N tolerates N drive losses. With 4 drives, EC:4 amounts to a 4-way mirror: 25% efficiency and it survives losing 3, but with no functional gain over EC:2 in the failure modes we care about (concurrent node losses).
  • Write amplification. EC:2 shards each object across the 4 nodes as 2 data + 2 parity blocks, so a 100 MB object costs ~200 MB on the wire and on disk. EC:4 would put a full copy on every node (4-way replication), ~400 MB for the same object. Doubling the write cost for marginal durability isn't worth it on a 4-node cluster.
  • Future-proofing. When we go to 6+ nodes (W3+), the natural upgrade is EC:3 across a 6-drive set, NOT growing EC on the same 4 drives.
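
To confirm the parity a running cluster actually applies (here via the veza-distributed alias defined in the migration section below), query the storage_class config subsystem:

# The STANDARD storage class should report EC:2.
mc admin config get veza-distributed storage_class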

Defaults

| variable | default | meaning |
|---|---|---|
| minio_version | RELEASE.2025-09-07T16-13-09Z | matches docker-compose.yml — keep them locked together |
| minio_port | 9000 | S3 API |
| minio_console_port | 9001 | web console |
| minio_data_path | /var/lib/minio | drive root on each node |
| minio_storage_class_standard | EC:2 | parity count for STANDARD storage class |
| minio_bucket_tracks | veza-prod-tracks | prod bucket created on first apply |
| minio_noncurrent_version_expiry_days | 30 | delete old object versions after N days |
| minio_cold_tier_after_days | 90 | only effective if minio_remote_tier_name is set |
| minio_remote_tier_name | "" (none) | future remote tier (Glacier / B2). v1.1 territory |
| minio_root_user / minio_root_password | (vault) | root credentials |
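
Any of these can be overridden per environment like a normal role default. A sketch of a lab apply with a shorter noncurrent-version expiry, assuming the playbook and inventory paths from the repo layout (infra/ansible/playbooks/minio_distributed.yml, infra/ansible/inventory/lab.yml):

cd infra/ansible
ansible-playbook -i inventory/lab.yml playbooks/minio_distributed.yml \
  -e minio_noncurrent_version_expiry_days=14 --ask-vault-pass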

Vault setup

# group_vars/minio_ha.vault.yml — encrypt with `ansible-vault encrypt`
minio_root_user: "<random 32-char access key>"
minio_root_password: "<random 32-char secret>"

The role asserts the placeholder values are gone before applying to anything other than lab.
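
One way to generate and lock down the credentials (openssl here is just an example generator; any 32-character random strings work):

openssl rand -hex 16    # 32 hex chars → minio_root_user
openssl rand -hex 16    # 32 hex chars → minio_root_password
ansible-vault encrypt group_vars/minio_ha.vault.yml
ansible-vault view group_vars/minio_ha.vault.yml    # sanity check: decrypts and prints both keys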

Backend integration

No code change. The backend's internal/services/storage/s3* layer already speaks S3 through the AWS SDK v2; pointing it at the new cluster is a config flip:

AWS_S3_ENABLED=true
AWS_S3_BUCKET=veza-prod-tracks
AWS_S3_ENDPOINT=http://minio-1.lxd:9000   # or behind HAProxy
AWS_S3_REGION=us-east-1                   # MinIO default region
AWS_ACCESS_KEY_ID=<minio_root_user>
AWS_SECRET_ACCESS_KEY=<minio_root_password>

For prod, front the 4 nodes with HAProxy (round-robin, health-checked) so the backend sees a single endpoint and tolerates any 1-node loss without DNS edits. HAProxy config lives in infra/haproxy/ (W4 Day 19 ties this in).
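
Before flipping AWS_S3_ENDPOINT (and later as HAProxy health-check targets), MinIO's built-in health endpoints give a quick go/no-go; the hostname below follows the minio-1.lxd example above:

curl -sf http://minio-1.lxd:9000/minio/health/live      # this node answers
curl -sf http://minio-1.lxd:9000/minio/health/cluster   # 200 only while the cluster has quorum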

Migration from single-node

# On the old single-node host (or via mc on a workstation):
mc alias set veza-current http://veza.fr:19000 <ACCESS> <SECRET>
mc alias set veza-distributed http://minio-1.lxd:9000 <NEW_ACCESS> <NEW_SECRET>

# Mirror: preserves versioning, ACLs, content-types.
mc mirror --preserve veza-current/veza-files veza-distributed/veza-prod-tracks

# Verify count + bytes match before flipping the AWS_S3_ENDPOINT in
# backend env:
mc ls --recursive veza-current/veza-files           | wc -l
mc ls --recursive veza-distributed/veza-prod-tracks | wc -l
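
Object counts alone won't catch truncated transfers; mc du sums object bytes under a prefix, so the byte totals should match as well:

mc du veza-current/veza-files
mc du veza-distributed/veza-prod-tracks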

The old bucket can be kept hot for ~1 week after the flip in case a rollback is needed, then mc rm --recursive --force --dangerous drops it.

Operations

# Cluster health (mc admin info reports the state of every drive):
mc admin info veza-distributed

# Per-node verbose state:
ssh minio-1 sudo journalctl -u minio -n 100 --no-pager

# Watch heal progress (after a node was offline / a drive was replaced):
mc admin heal veza-distributed --recursive

# Check the lifecycle policy:
mc ilm ls veza-distributed/veza-prod-tracks

# Console UI (per-node, pick any):
open http://minio-1.lxd:9001

Failover smoke test

MINIO_ROOT_USER=... MINIO_ROOT_PASSWORD=... \
  bash infra/ansible/tests/test_minio_resilience.sh

Sequence: upload a 100 MB random file, kill 2 nodes, assert the read still works and the checksum matches, restart the nodes, wait for self-heal, assert all 4 nodes report healthy.
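
Roughly what that looks like, as a sketch (container names, object path and alias are illustrative, not lifted from the script):

dd if=/dev/urandom of=/tmp/blob bs=1M count=100
mc cp /tmp/blob veza-distributed/veza-prod-tracks/smoke/blob
sum_before=$(sha256sum /tmp/blob | awk '{print $1}')

incus stop minio-3 minio-4                    # 2 of 4 nodes down = the EC:2 ceiling
mc cp veza-distributed/veza-prod-tracks/smoke/blob /tmp/blob.back
sum_after=$(sha256sum /tmp/blob.back | awk '{print $1}')
[ "$sum_before" = "$sum_after" ] && echo "read + checksum OK with 2 nodes down"

incus start minio-3 minio-4
mc admin heal veza-distributed --recursive    # then poll mc admin info until all drives report online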

What this role does NOT cover

  • Cross-DC replication. Single-host (lab) or single-region in v1.0. v1.1+ adds bucket replication to a remote cluster.
  • Site replication / federation. Multi-tenant federation is out of scope.
  • Cold tier transitions. minio_remote_tier_name is empty by default — no Glacier / B2 / second-cluster behind the lifecycle yet. Wire when needed.
  • mTLS. --tls-cert/key is W4. The Incus bridge is the security boundary today.