Day 5 of ROADMAP_V1.0_LAUNCH.md §Semaine 1: turn the manual
host-setup steps into an idempotent playbook so subsequent days
(W2 Postgres HA, W2 PgBouncer, W2 OTel collector, W3 Redis
Sentinel, W3 MinIO distributed, W4 HAProxy) can each land as a
self-contained role on top of this baseline.
Layout (full tree under infra/ansible/):
ansible.cfg pinned defaults — inventory path,
ControlMaster=auto so the SSH handshake
is paid once per playbook run
inventory/{lab,staging,prod}.yml
three environments. lab is the R720's
local Incus container (10.0.20.150),
staging is Hetzner (TODO until W2
provisions the box), prod is R720
(TODO until DNS at EX-5 lands).
group_vars/all.yml shared defaults — SSH whitelist,
fail2ban thresholds, unattended-upgrades
origins, node_exporter version pin.
playbooks/site.yml entry point. Two plays:
1. common (every host)
2. incus_host (incus_hosts group)
roles/common/ idempotent baseline:
ssh.yml — drop-in
/etc/ssh/sshd_config.d/50-veza-
hardening.conf, validates with
`sshd -t` before reload, asserts
ssh_allow_users non-empty before
apply (refuses to lock out the
operator).
fail2ban.yml — sshd jail tuned to
group_vars (defaults bantime=1h,
findtime=10min, maxretry=5).
unattended_upgrades.yml — security-
only origins, Automatic-Reboot
pinned to false (operator owns
reboot windows for SLO-budget
alignment, cf W2 day 10).
node_exporter.yml — pinned to
1.8.2, runs as a systemd unit
on :9100. Skips download when
--version already matches.
roles/incus_host/ zabbly upstream apt repo + incus +
incus-client install. First-time
`incus admin init --preseed` only when
`incus list` errors (i.e. the host
has never been initialised) — re-runs
on initialised hosts are no-ops.
Configures incusbr0 / 10.99.0.1/24
with NAT + default storage pool.
Acceptance verified locally (full --check needs SSH to the lab
host which is offline-only from this box, so the user runs that
step):
$ cd infra/ansible
$ ansible-playbook -i inventory/lab.yml playbooks/site.yml --syntax-check
playbook: playbooks/site.yml ← clean
$ ansible-playbook -i inventory/lab.yml playbooks/site.yml --list-tasks
21 tasks across 2 plays, all tagged. ← partial applies work
Conventions enforced from the start:
- Every task has tags so `--tags ssh,fail2ban` partial applies
are always possible.
- Sub-task files (ssh.yml, fail2ban.yml, etc.) so the role
main.yml stays a directory of concerns, not a wall of tasks.
- Validators run before reload (sshd -t for sshd_config). The
role refuses to apply changes that would lock the operator out.
- Comments answer "why" — task names + module names already
say "what".
Next role on the stack: postgres_ha (W2 day 6) — pg_auto_failover
monitor + primary + replica in 2 Incus containers.
SKIP_TESTS=1 — IaC YAML, no app code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Veza Ansible IaC

Infrastructure-as-code for the Veza self-hosted platform. Roles, inventories and playbooks that turn a fresh Debian/Ubuntu host into a running Veza node.

Scope at v1.0.9 Day 5 (this commit): scaffolding only — `common` baseline + `incus_host` install. Subsequent days add `postgres_ha` (W2), `pgbouncer` (W2), `pgbackrest` (W2), `otel_collector` (W2), `redis_sentinel` (W3), `minio_distributed` (W3), `haproxy` (W4) and `backend_api` (W4) — each as a standalone role under `roles/`.

## Layout

```
infra/ansible/
├── ansible.cfg              # pinned defaults (inventory path, ControlMaster)
├── inventory/
│   ├── lab.yml              # R720 lab Incus container — dry-run target
│   ├── staging.yml          # Hetzner staging (TODO IP — W2 provision)
│   └── prod.yml             # R720 prod (TODO IP — DNS at EX-5)
├── group_vars/
│   └── all.yml              # shared defaults (SSH, fail2ban, …)
├── host_vars/               # per-host overrides (gitignored if secret-bearing)
├── playbooks/
│   └── site.yml             # entry-point — applies common + incus_host
└── roles/
    ├── common/              # SSH hardening · fail2ban · unattended-upgrades · node_exporter
    └── incus_host/          # Incus install + first-time init
```

## Quickstart

### Lab dry-run (syntax + dry-execute, no remote changes)

```bash
cd infra/ansible
ansible-playbook -i inventory/lab.yml playbooks/site.yml --check
```

`--check` is the acceptance gate for v1.0.9 Day 5 — it must pass clean before merging any role change.

### Lab apply

```bash
ansible-playbook -i inventory/lab.yml playbooks/site.yml
```

The lab host is the R720's local `srv-101v` Incus container (or whatever IP you set under `inventory/lab.yml::veza-lab.ansible_host`). It exists specifically to absorb role changes before they reach staging or prod.
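
For orientation, `inventory/lab.yml` takes roughly this shape — the `veza-lab` alias comes from the paragraph above, the `incus_hosts` group from `site.yml`'s second play, and the IP from the lab container; `ansible_user: ansible` is an assumption based on the role's `ansible` account:

```yaml
# Sketch of inventory/lab.yml — alias, group and IP per the docs above;
# ansible_user is an assumption, adjust to your operator account.
all:
  children:
    incus_hosts:
      hosts:
        veza-lab:
          ansible_host: 10.0.20.150
          ansible_user: ansible
```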

### Staging / prod

Currently `TODO_HETZNER_IP` / `TODO_PROD_IP` — fill in once the boxes are provisioned. Don't run against an empty TODO inventory; `ansible-playbook` will fail fast with "Could not match supplied host pattern".

### Tags — apply a single concern

```bash
# Re-render only the SSH hardening drop-in
ansible-playbook -i inventory/lab.yml playbooks/site.yml --tags ssh

# Bump node_exporter to a newer pinned version (after editing group_vars/all.yml)
ansible-playbook -i inventory/lab.yml playbooks/site.yml --tags node_exporter
```

Available tags: `common`, `packages`, `users`, `ssh`, `fail2ban`, `unattended-upgrades`, `monitoring`, `node_exporter`, `incus`, `init`, `service`.

## Roles

### `common` — host baseline

- `ssh.yml` — drops `/etc/ssh/sshd_config.d/50-veza-hardening.conf` from a Jinja template. Validates the rendered config with `sshd -t` before reload, refuses to apply when `ssh_allow_users` is empty (would lock the operator out).
- `fail2ban.yml` — `/etc/fail2ban/jail.local` with the sshd jail enabled, defaults to bantime=1h / findtime=10min / maxretry=5.
- `unattended_upgrades.yml` — security-only origins; `Automatic-Reboot=false` (operator decides reboot windows).
- `node_exporter.yml` — installs Prometheus node_exporter pinned to the version in `group_vars/all.yml::monitoring_node_exporter_version`, runs as a systemd unit on `:9100`.
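
The `ssh.yml` guard-then-validate pattern described above can be sketched like this — an illustrative sketch of the technique, not the role's exact tasks (the template filename is hypothetical; the variable names come from the table below):

```yaml
# Illustrative sketch of the ssh.yml lockout guard + validate-before-reload pattern.
- name: Refuse to apply SSH hardening with an empty allow-list
  ansible.builtin.assert:
    that: ssh_allow_users | length > 0
    fail_msg: "ssh_allow_users is empty — applying would lock the operator out"
  tags: [ssh]

- name: Render the hardening drop-in, validating the result before it can load
  ansible.builtin.template:
    src: 50-veza-hardening.conf.j2          # hypothetical template name
    dest: /etc/ssh/sshd_config.d/50-veza-hardening.conf
    validate: /usr/sbin/sshd -t -f %s       # sshd -t on the rendered file
  notify: reload sshd
  tags: [ssh]
```

With `validate:`, Ansible never moves a rendered file into place unless the validator exits 0 — the same "refuses to apply" behaviour the role description promises.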

Variables in `group_vars/all.yml`:

| var | default | notes |
|---|---|---|
| `ssh_port` | `22` | bump for prod once a bastion is in place |
| `ssh_permit_root_login` | `"no"` | string, not boolean (sshd config syntax) |
| `ssh_password_authentication` | `"no"` | |
| `ssh_allow_users` | `[senke, ansible]` | role asserts non-empty |
| `fail2ban_bantime` | `3600` | seconds |
| `fail2ban_findtime` | `600` | seconds |
| `fail2ban_maxretry` | `5` | |
| `unattended_upgrades_origins` | security-only | |
| `unattended_upgrades_auto_reboot` | `false` | operator-driven |
| `monitoring_node_exporter_version` | `1.8.2` | upstream pin |
| `monitoring_node_exporter_port` | `9100` | |
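
Rendered as YAML, those defaults map onto `group_vars/all.yml` roughly as follows (values straight from the table; the `unattended_upgrades_origins` list is omitted because its exact entries aren't documented here):

```yaml
# Sketch of group_vars/all.yml from the defaults table above.
ssh_port: 22
ssh_permit_root_login: "no"          # string on purpose: sshd wants yes/no tokens
ssh_password_authentication: "no"
ssh_allow_users: [senke, ansible]    # role asserts this is non-empty

fail2ban_bantime: 3600               # seconds
fail2ban_findtime: 600               # seconds
fail2ban_maxretry: 5

unattended_upgrades_auto_reboot: false

monitoring_node_exporter_version: "1.8.2"
monitoring_node_exporter_port: 9100
```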

### `incus_host` — Incus server install

- Adds the upstream zabbly Incus apt repo.
- Installs `incus` + `incus-client`.
- Adds the `ansible` user to `incus-admin` so subsequent roles can run `incus` non-sudo.
- First-time `incus admin init` via preseed if the host has never been initialised. Re-runs on initialised hosts are a no-op (the `incus list` probe gates the init).
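
The probe-gated init can be sketched as two tasks — illustrative shapes only, and the preseed template name is hypothetical:

```yaml
# Illustrative gating: probe first, init only on never-initialised hosts.
- name: Probe whether Incus has been initialised
  ansible.builtin.command: incus list
  register: incus_probe
  failed_when: false      # a failing probe just means "not initialised yet"
  changed_when: false     # the probe itself never changes the host
  tags: [incus, init]

- name: First-time incus admin init (preseed)
  ansible.builtin.command: incus admin init --preseed
  args:
    stdin: "{{ lookup('template', 'preseed.yml.j2') }}"   # hypothetical template
  when: incus_probe.rc != 0
  tags: [incus, init]
```

Because the second task only fires when the probe fails, re-runs on an initialised host report no change — which is what makes the role idempotent.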

Bridge config:

| var | default | notes |
|---|---|---|
| `incus_bridge` | `incusbr0` | the bridge Veza app containers attach to |
| `incus_bridge_ipv4` | `10.99.0.1/24` | NAT'd via Incus by default |
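
Those bridge defaults would correspond to a preseed along these lines — a sketch in the `incus admin init --preseed` format, not the repo's actual file; the `dir` storage driver and the profile wiring are assumptions (the docs above only say "default storage pool"):

```yaml
# Sketch of an init preseed matching the bridge defaults above (driver assumed).
networks:
  - name: incusbr0
    type: bridge
    config:
      ipv4.address: 10.99.0.1/24
      ipv4.nat: "true"
storage_pools:
  - name: default
    driver: dir            # assumption — pick the driver your disks call for
profiles:
  - name: default
    devices:
      eth0:
        name: eth0
        network: incusbr0
        type: nic
      root:
        path: /
        pool: default
        type: disk
```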

## Conventions

- Roles are **idempotent** — running `site.yml` twice produces no changes. CI will eventually validate this with a `--check` run after a real apply.
- **No secrets in git.** `host_vars/<host>.yml` is fine for non-secrets; secrets go in `host_vars/<host>.vault.yml` encrypted with `ansible-vault`. The vault key lives outside the repo.
- **Tags are mandatory** on every task so a partial apply (`--tags ssh,monitoring`) is always possible. A new role missing tags fails its own commit's `--check` review.
- **Comment the why, not the what.** Role tasks should answer "why this knob, why this default, why this guard" — the task name + module name already say what.

## See also

- `ROADMAP_V1.0_LAUNCH.md` §Semaine 1 day 5 — original scope brief
- `docs/runbooks/` — once roles for production services land, each gets a runbook
- `docker-compose.dev.yml` — the dev-host equivalent of these roles (kept for now; Ansible takes over for staging/prod once W2 lands)