# `minio_distributed` role — distributed MinIO with EC:2
Four Incus containers, each running one MinIO server. A single erasure set of 4 drives = 2 data + 2 parity. The cluster tolerates **2 simultaneous node failures** without data loss (reads stay available with two nodes down; writes need a quorum of 3 of the 4 nodes); storage efficiency is 50% (1 GB raw → 500 MB usable).
## Topology
```
                 S3 API on :9000
                        │
      ┌───────────┬─────┴─────┬───────────┐
      │           │           │           │
 ┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
 │ minio-1 │ │ minio-2 │ │ minio-3 │ │ minio-4 │
 │  /data  │ │  /data  │ │  /data  │ │  /data  │
 └─────────┘ └─────────┘ └─────────┘ └─────────┘
      └───── single erasure set, EC:2 ────┘
```
Each node also runs the web console on `:9001`.
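For orientation, the command is identical on every node; a minimal sketch of what the role's service ends up running (not the literal template), using the `.lxd` hostnames from the diagram and placeholders in the same style as the rest of this document:

```bash
# Same invocation on all four nodes; MinIO derives the 4-drive erasure set
# from the {1...4} expansion and applies EC:2 as the STANDARD storage class.
export MINIO_ROOT_USER=<minio_root_user>
export MINIO_ROOT_PASSWORD=<minio_root_password>
export MINIO_STORAGE_CLASS_STANDARD="EC:2"

minio server --console-address ":9001" \
  http://minio-{1...4}.lxd:9000/var/lib/minio
```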
## Why EC:2 (not 4 or larger)
- **Recoverability ceiling.** EC:N tolerates N drive losses. With 4 drives, EC:4 would be a 4-way mirror: 25% efficiency and lose-3 tolerance, but no functional gain over EC:2 in the failure modes we care about (concurrent node losses).
- **Write amplification.** EC:2 shards each object across all 4 nodes (2 data + 2 parity), costing 2× the object size on the wire and on disk. A 4-way mirror costs 4× (worked numbers below). Doubling the wire cost for marginal durability isn't worth it on a 4-node cluster.
- **Future-proofing.** When we go to 6+ nodes (W3+), the natural upgrade is EC:3 across a 6-drive set, NOT growing EC on the same 4 drives.
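
Worked numbers for a 1 GB object under the two layouts above (4 nodes, one drive each; the mirror column is the hypothetical EC:4 case, not something this role configures):

```
EC:2 (2 data + 2 parity shards):
  bytes written = 1 GB × (4 / 2) = 2 GB total → 0.5 GB per node
  usable / raw  = 2 / 4          = 50%

4-way mirror (the hypothetical EC:4 case):
  bytes written = 1 GB × 4       = 4 GB total → 1 GB per node
  usable / raw  = 1 / 4          = 25%
```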
## Defaults

| variable                                  | default                        | meaning                                                |
| ----------------------------------------- | ------------------------------ | ------------------------------------------------------ |
| `minio_version`                           | `RELEASE.2025-09-07T16-13-09Z` | matches docker-compose.yml — keep them locked together |
| `minio_port`                              | `9000`                         | S3 API                                                 |
| `minio_console_port`                      | `9001`                         | web console                                            |
| `minio_data_path`                         | `/var/lib/minio`               | drive root on each node                                |
| `minio_storage_class_standard`            | `EC:2`                         | parity count for the STANDARD storage class            |
| `minio_bucket_tracks`                     | `veza-prod-tracks`             | prod bucket created on first apply                     |
| `minio_noncurrent_version_expiry_days`    | `30`                           | delete old object versions after N days                |
| `minio_cold_tier_after_days`              | `90`                           | only effective if `minio_remote_tier_name` is set      |
| `minio_remote_tier_name`                  | `""` (none)                    | future remote tier (Glacier / B2). v1.1 territory.     |
| `minio_root_user` / `minio_root_password` | (vault)                        | root credentials                                       |
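
Overrides go through normal Ansible variable precedence; a hypothetical `group_vars` snippet (file name and values are illustrative, secrets stay in the vault file):

```yaml
# group_vars/minio_ha.yml: illustrative overrides, not shipped by the role
minio_data_path: /srv/minio                # dedicated dataset instead of /var/lib/minio
minio_noncurrent_version_expiry_days: 14   # keep old object versions for two weeks only
minio_console_port: 9090                   # move the console off :9001
```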
## Vault setup
```yaml
# group_vars/minio_ha.vault.yml — encrypt with `ansible-vault encrypt`
minio_root_user: "<random 32-char access key>"
minio_root_password: "<random 32-char secret>"
```
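One way to fill and encrypt that file (`openssl rand -hex 16` yields exactly 32 characters; the path matches the comment above):

```bash
# Generate two 32-character random credentials and write the vault file.
cat > group_vars/minio_ha.vault.yml <<EOF
minio_root_user: "$(openssl rand -hex 16)"
minio_root_password: "$(openssl rand -hex 16)"
EOF

# Encrypt in place (prompts for the vault passphrase).
ansible-vault encrypt group_vars/minio_ha.vault.yml
```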
The role asserts the placeholder values are gone before applying to anything other than `lab`.
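A sketch of what that guard can look like, assuming an inventory variable (here called `env`) distinguishes `lab` from everything else; task and variable names are illustrative, not the role's literal ones:

```yaml
# Illustrative assert task, not the role's actual file.
- name: Refuse placeholder root credentials outside the lab
  ansible.builtin.assert:
    that:
      - "'<random' not in minio_root_user"
      - "'<random' not in minio_root_password"
      - minio_root_password | length >= 32
    fail_msg: "Replace the placeholder MinIO root credentials in group_vars/minio_ha.vault.yml"
  when: env != 'lab'
```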
## Backend integration
**No code change.** The backend's `internal/services/storage/s3*` already talks S3 through the AWS SDK v2; pointing it at the new cluster is a config flip:
```env
AWS_S3_ENABLED=true
AWS_S3_BUCKET=veza-prod-tracks
AWS_S3_ENDPOINT=http://minio-1.lxd:9000   # or behind HAProxy
AWS_S3_REGION=us-east-1                   # MinIO default region
AWS_ACCESS_KEY_ID=<minio_root_user>
AWS_SECRET_ACCESS_KEY=<minio_root_password>
```
For prod, front the 4 nodes with HAProxy (round-robin, health-checked) so the backend sees a single endpoint and tolerates any 1-node loss without DNS edits. HAProxy config lives in `infra/haproxy/` (W4 day 19 ties this in).
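A minimal sketch of that frontend/backend pair (bind port and section names are assumptions, not the repo's actual config; `/minio/health/live` is MinIO's liveness endpoint):

```
# Illustrative fragment, not the actual infra/haproxy/ config.
frontend s3_front
    bind *:9000
    mode http
    default_backend minio_nodes

backend minio_nodes
    mode http
    balance roundrobin
    option httpchk GET /minio/health/live
    server minio-1 minio-1.lxd:9000 check
    server minio-2 minio-2.lxd:9000 check
    server minio-3 minio-3.lxd:9000 check
    server minio-4 minio-4.lxd:9000 check
```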
## Migration from single-node
```bash
# On the old single-node host (or via mc on a workstation):
mc alias set veza-current http://veza.fr:19000 <ACCESS> <SECRET>
mc alias set veza-distributed http://minio-1.lxd:9000 <NEW_ACCESS> <NEW_SECRET>

# Mirror: --preserve keeps object metadata, ACLs and content-types.
mc mirror --preserve veza-current/veza-files veza-distributed/veza-prod-tracks

# Verify count + bytes match before flipping AWS_S3_ENDPOINT in the backend env:
mc ls --recursive veza-current/veza-files | wc -l
mc ls --recursive veza-distributed/veza-prod-tracks | wc -l
```
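For the byte side of that check, `mc du` sums object sizes per prefix; both totals should match before the flip:

```bash
# Logical bytes on each side (should be equal once the mirror is complete):
mc du veza-current/veza-files
mc du veza-distributed/veza-prod-tracks
```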
The old bucket can be kept hot for ~1 week after the flip in case a rollback is needed; after that, `mc rm --recursive --force --dangerous` drops it.
## Operations
```bash
# Cluster health (per-node and per-drive status):
mc admin info veza-distributed

# Per-node verbose state (journald logs):
ssh minio-1 sudo journalctl -u minio -n 100 --no-pager

# Watch heal progress (after a node was offline / a drive was replaced):
mc admin heal veza-distributed --recursive

# Check the lifecycle policy:
mc ilm ls veza-distributed/veza-prod-tracks

# Console UI (per node; pick any):
open http://minio-1.lxd:9001
```
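For unattended checks (cron, alerting), each node's liveness endpoint avoids parsing `mc` output; a small sketch assuming the `.lxd` hostnames above:

```bash
# Exit non-zero as soon as any node fails its liveness probe.
for n in 1 2 3 4; do
  curl -sf "http://minio-${n}.lxd:9000/minio/health/live" > /dev/null \
    || { echo "minio-${n} is down"; exit 1; }
done
echo "all 4 nodes answer /minio/health/live"
```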
## Failover smoke test
```bash
MINIO_ROOT_USER=... MINIO_ROOT_PASSWORD=... \
bash infra/ansible/tests/test_minio_resilience.sh
```
Sequence: upload a 100 MB random file, kill 2 nodes, assert reads still work, restart the nodes, wait for self-heal, assert all 4 nodes report healthy.
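Condensed, that sequence boils down to something like the following (a sketch, not the script's literal contents; Incus instance names assumed to match the node names):

```bash
# 1. Upload a 100 MB random object.
dd if=/dev/urandom of=/tmp/blob bs=1M count=100
mc cp /tmp/blob veza-distributed/veza-prod-tracks/smoke/blob

# 2. Kill two nodes (EC:2 keeps read quorum with 2 of 4 drives).
incus stop minio-3 minio-4

# 3. Reads must still succeed and match byte-for-byte.
mc cp veza-distributed/veza-prod-tracks/smoke/blob /tmp/blob.back
cmp /tmp/blob /tmp/blob.back

# 4. Bring the nodes back and let the cluster heal.
incus start minio-3 minio-4
mc admin heal veza-distributed --recursive

# 5. All four nodes must report healthy again.
mc admin info veza-distributed
```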
## What this role does NOT cover
- **Cross-DC replication.** Single-host (lab) or single-region in v1.0. v1.1+ adds bucket replication to a remote cluster.
- **Site replication / federation.** Multi-tenant federation is out of scope.
- **Cold tier transitions.** `minio_remote_tier_name` is empty by default — no Glacier / B2 / second-cluster tier behind the lifecycle yet. Wire it up when needed.
- **mTLS.** `--tls-cert/key` is W4. The Incus bridge is the security boundary today.
|