Some checks failed
Veza CI / Frontend (Web) (push) Has been cancelled
E2E Playwright / e2e (full) (push) Has been cancelled
Veza CI / Notify on failure (push) Blocked by required conditions
Veza CI / Rust (Stream Server) (push) Successful in 3m27s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 52s
Veza CI / Backend (Go) (push) Successful in 5m32s
Day 5 of ROADMAP_V1.0_LAUNCH.md §Semaine 1: turn the manual
host-setup steps into an idempotent playbook so subsequent days
(W2 Postgres HA, W2 PgBouncer, W2 OTel collector, W3 Redis
Sentinel, W3 MinIO distributed, W4 HAProxy) can each land as a
self-contained role on top of this baseline.
Layout (full tree under infra/ansible/):
ansible.cfg pinned defaults — inventory path,
ControlMaster=auto so the SSH handshake
is paid once per playbook run
inventory/{lab,staging,prod}.yml
three environments. lab is the R720's
local Incus container (10.0.20.150),
staging is Hetzner (TODO until W2
provisions the box), prod is R720
(TODO until DNS at EX-5 lands).
group_vars/all.yml shared defaults — SSH whitelist,
fail2ban thresholds, unattended-upgrades
origins, node_exporter version pin.
playbooks/site.yml entry point. Two plays:
1. common (every host)
2. incus_host (incus_hosts group)
roles/common/ idempotent baseline:
ssh.yml — drop-in
/etc/ssh/sshd_config.d/50-veza-
hardening.conf, validates with
`sshd -t` before reload, asserts
ssh_allow_users non-empty before
apply (refuses to lock out the
operator).
fail2ban.yml — sshd jail tuned to
group_vars (defaults bantime=1h,
findtime=10min, maxretry=5).
unattended_upgrades.yml — security-
only origins, Automatic-Reboot
pinned to false (operator owns
reboot windows for SLO-budget
alignment, cf W2 day 10).
node_exporter.yml — pinned to
1.8.2, runs as a systemd unit
on :9100. Skips download when
--version already matches.
roles/incus_host/ zabbly upstream apt repo + incus +
incus-client install. First-time
`incus admin init --preseed` only when
`incus list` errors (i.e. the host
has never been initialised) — re-runs
on initialised hosts are no-ops.
Configures incusbr0 / 10.99.0.1/24
with NAT + default storage pool.
Acceptance verified locally (full --check needs SSH to the lab
host which is offline-only from this box, so the user runs that
step):
$ cd infra/ansible
$ ansible-playbook -i inventory/lab.yml playbooks/site.yml --syntax-check
playbook: playbooks/site.yml ← clean
$ ansible-playbook -i inventory/lab.yml playbooks/site.yml --list-tasks
21 tasks across 2 plays, all tagged. ← partial applies work
Conventions enforced from the start:
- Every task has tags so `--tags ssh,fail2ban` partial applies
are always possible.
- Sub-task files (ssh.yml, fail2ban.yml, etc.) so the role
main.yml stays a directory of concerns, not a wall of tasks.
- Validators run before reload (sshd -t for sshd_config). The
role refuses to apply changes that would lock the operator out.
- Comments answer "why" — task names + module names already
say "what".
Next role on the stack: postgres_ha (W2 day 6) — pg_auto_failover
monitor + primary + replica in 2 Incus containers.
SKIP_TESTS=1 — IaC YAML, no app code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
21 lines
420 B
YAML
21 lines
420 B
YAML
---
|
|
- name: Reload sshd
|
|
ansible.builtin.service:
|
|
name: ssh
|
|
state: reloaded
|
|
|
|
- name: Restart fail2ban
|
|
ansible.builtin.service:
|
|
name: fail2ban
|
|
state: restarted
|
|
|
|
- name: Restart unattended-upgrades
|
|
ansible.builtin.service:
|
|
name: unattended-upgrades
|
|
state: restarted
|
|
|
|
- name: Restart node_exporter
|
|
ansible.builtin.systemd:
|
|
name: node_exporter
|
|
state: restarted
|
|
daemon_reload: true
|