veza/infra/ansible/group_vars/all.yml


feat(infra): Ansible IaC scaffolding — common + incus_host roles (Day 5 v1.0.9)

Day 5 of ROADMAP_V1.0_LAUNCH.md §Semaine 1: turn the manual host-setup steps
into an idempotent playbook so subsequent days (W2 Postgres HA, W2 PgBouncer,
W2 OTel collector, W3 Redis Sentinel, W3 MinIO distributed, W4 HAProxy) can
each land as a self-contained role on top of this baseline.

Layout (full tree under infra/ansible/):

- ansible.cfg: pinned defaults — inventory path, ControlMaster=auto so the
  SSH handshake is paid once per playbook run.
- inventory/{lab,staging,prod}.yml: three environments. lab is the R720's
  local Incus container (10.0.20.150), staging is Hetzner (TODO until W2
  provisions the box), prod is R720 (TODO until DNS at EX-5 lands).
- group_vars/all.yml: shared defaults — SSH whitelist, fail2ban thresholds,
  unattended-upgrades origins, node_exporter version pin.
- playbooks/site.yml: entry point. Two plays: 1. common (every host),
  2. incus_host (incus_hosts group).
- roles/common/: idempotent baseline:
  - ssh.yml — drop-in /etc/ssh/sshd_config.d/50-veza-hardening.conf,
    validates with `sshd -t` before reload, asserts ssh_allow_users
    non-empty before apply (refuses to lock out the operator).
  - fail2ban.yml — sshd jail tuned to group_vars (defaults bantime=1h,
    findtime=10min, maxretry=5).
  - unattended_upgrades.yml — security-only origins, Automatic-Reboot
    pinned to false (operator owns reboot windows for SLO-budget
    alignment, cf. W2 day 10).
  - node_exporter.yml — pinned to 1.8.2, runs as a systemd unit on :9100.
    Skips download when --version already matches.
- roles/incus_host/: zabbly upstream apt repo + incus + incus-client
  install. First-time `incus admin init --preseed` only when `incus list`
  errors (i.e. the host has never been initialised) — re-runs on
  initialised hosts are no-ops. Configures incusbr0 / 10.99.0.1/24 with
  NAT + default storage pool.

Acceptance verified locally (a full --check needs SSH to the lab host, which
is offline-only from this box, so the user runs that step):

    $ cd infra/ansible
    $ ansible-playbook -i inventory/lab.yml playbooks/site.yml --syntax-check
    playbook: playbooks/site.yml          ← clean
    $ ansible-playbook -i inventory/lab.yml playbooks/site.yml --list-tasks
    21 tasks across 2 plays, all tagged.  ← partial applies work

Conventions enforced from the start:

- Every task has tags so `--tags ssh,fail2ban` partial applies are always
  possible.
- Sub-task files (ssh.yml, fail2ban.yml, etc.) so the role main.yml stays a
  directory of concerns, not a wall of tasks.
- Validators run before reload (sshd -t for sshd_config). The role refuses
  to apply changes that would lock the operator out.
- Comments answer "why" — task names + module names already say "what".

Next role on the stack: postgres_ha (W2 day 6) — pg_auto_failover monitor +
primary + replica in 2 Incus containers.

SKIP_TESTS=1 — IaC YAML, no app code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
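The first-run guard in roles/incus_host/ can be sketched as two tasks. Task
names and the `incus_preseed` variable are assumptions made for illustration;
only the `incus list` probe / `incus admin init --preseed` pattern comes from
the commit above:

```yaml
# Probe only: `incus list` errors on a host that has never been
# initialised, and its return code drives the preseed task below.
- name: Check whether Incus is already initialised
  ansible.builtin.command: incus list
  register: incus_list
  failed_when: false
  changed_when: false

# First-time init only. On an already-initialised host incus_list.rc == 0
# and this task is skipped, so re-runs stay no-ops.
- name: Initialise Incus from preseed
  ansible.builtin.command:
    cmd: incus admin init --preseed
    stdin: "{{ incus_preseed | to_yaml }}"
  when: incus_list.rc != 0
```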
2026-04-27 16:16:38 +00:00
---
# Shared defaults across every inventory (lab/staging/prod). Override
# per-environment in `group_vars/<group>.yml` or per-host in
# `host_vars/<host>.yml`.
# Owner contact (used in some unattended-upgrades + monitoring agent configs).
veza_ops_email: ops@veza.fr
# v1.0.9 Day 5: SSH hardening surface that the `common` role enforces.
# Override these in production via group_vars/veza_prod.yml when the
# bastion's specific port / allowed users are decided. Defaults are
# safe for lab.
ssh_port: 22
ssh_permit_root_login: "no"
ssh_password_authentication: "no"
ssh_allow_users:
  - senke
  - ansible
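# A sketch of the drop-in the `common` role renders from the vars above
# into /etc/ssh/sshd_config.d/50-veza-hardening.conf (template layout is
# an assumption; the path and the `sshd -t` pre-reload validation come
# from the role itself):
#
#   Port 22
#   PermitRootLogin no
#   PasswordAuthentication no
#   AllowUsers senke ansible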
# fail2ban — per-jail thresholds. The defaults are conservative for
# a self-hosted single-machine deployment; production may want
# lower findtime / higher bantime once Forgejo + Veza traffic is
# baselined.
fail2ban_bantime: 3600 # 1h
fail2ban_findtime: 600 # 10min
fail2ban_maxretry: 5
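# With the defaults above, the rendered sshd jail would look roughly like
# this (jail file name and exact key set are assumptions; the bantime /
# findtime / maxretry values come from this file):
#
#   [sshd]
#   enabled  = true
#   port     = 22
#   bantime  = 3600
#   findtime = 600
#   maxretry = 5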
# unattended-upgrades — security updates only by default. The role
# never enables auto-reboot; ROADMAP_V1.0_LAUNCH.md §5 game day pins
# downtime windows to controlled cycles, not OS-driven reboots.
unattended_upgrades_origins:
  - "${distro_id}:${distro_codename}-security"
  - "${distro_id}ESMApps:${distro_codename}-apps-security"
  - "${distro_id}ESM:${distro_codename}-infra-security"
unattended_upgrades_auto_reboot: false
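# These vars map onto apt's unattended-upgrades config, roughly as below
# (50unattended-upgrades is the stock Debian/Ubuntu file name; the exact
# template the role uses is an assumption):
#
#   Unattended-Upgrade::Allowed-Origins {
#       "${distro_id}:${distro_codename}-security";
#       "${distro_id}ESMApps:${distro_codename}-apps-security";
#       "${distro_id}ESM:${distro_codename}-infra-security";
#   };
#   Unattended-Upgrade::Automatic-Reboot "false";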
# Monitoring agent: prometheus node_exporter is the bare-minimum
# host metrics surface (CPU / memory / disk / network). The full
# observability stack (Tempo + Loki + Grafana) lands in W2 of the roadmap.
monitoring_node_exporter_version: "1.8.2"
monitoring_node_exporter_port: 9100
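# A sketch of the systemd unit the role installs for node_exporter
# (unit, user, and binary path names are assumptions; the :9100 listen
# port and the 1.8.2 pin come from the vars above):
#
#   [Unit]
#   Description=Prometheus node_exporter
#   After=network-online.target
#
#   [Service]
#   User=node_exporter
#   ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
#   Restart=on-failure
#
#   [Install]
#   WantedBy=multi-user.target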