Day 5 of ROADMAP_V1.0_LAUNCH.md §Semaine 1 (Week 1): turn the manual
host-setup steps into an idempotent playbook, so that subsequent work
(W2 Postgres HA, W2 PgBouncer, W2 OTel collector, W3 Redis Sentinel,
W3 MinIO distributed, W4 HAProxy) can each land as a self-contained
role on top of this baseline.
Layout (full tree under infra/ansible/):
- ansible.cfg: pinned defaults, i.e. the inventory path and
  ControlMaster=auto, so the SSH handshake is paid once per host per
  playbook run instead of once per task.
- inventory/{lab,staging,prod}.yml: three environments. lab is the
  R720's local Incus container (10.0.20.150); staging is Hetzner
  (TODO until W2 provisions the box); prod is the R720 (TODO until
  DNS at EX-5 lands).
- group_vars/all.yml: shared defaults (SSH allow-list, fail2ban
  thresholds, unattended-upgrades origins, node_exporter version pin).
- playbooks/site.yml: entry point with two plays:
  1. common (every host)
  2. incus_host (incus_hosts group)
- roles/common/: idempotent baseline:
  - ssh.yml: installs the drop-in
    /etc/ssh/sshd_config.d/50-veza-hardening.conf, validates it with
    `sshd -t` before reloading, and asserts that ssh_allow_users is
    non-empty before applying (refuses to lock out the operator).
  - fail2ban.yml: sshd jail tuned from group_vars (defaults
    bantime=1h, findtime=10min, maxretry=5).
  - unattended_upgrades.yml: security-only origins, with
    Automatic-Reboot pinned to false (the operator owns reboot
    windows for SLO-budget alignment, cf. W2 day 10).
  - node_exporter.yml: pinned to 1.8.2, runs as a systemd unit on
    :9100, and skips the download when `--version` already matches.
- roles/incus_host/: Zabbly upstream apt repo plus incus and
  incus-client install. The first-time `incus admin init --preseed`
  runs only when `incus list` errors (i.e. the host has never been
  initialised), so re-runs on initialised hosts are no-ops. Also
  configures incusbr0 (10.99.0.1/24) with NAT and a default storage
  pool.
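The first-run guard in roles/incus_host/ can be expressed as a probe task plus a conditional `incus admin init --preseed`. This is a minimal sketch, not the committed role: the task file name, the `incus` tag, privilege escalation (`become`) at play level and the `dir` storage driver are assumptions; the preseed body follows the Incus init preseed format and reuses the bridge settings listed above.

```yaml
# roles/incus_host/tasks/init.yml (sketch; file name hypothetical)

# Probe only: never reports "changed" and never fails the play, so its
# return code can drive the next task.
- name: Check whether Incus has already been initialised
  ansible.builtin.command: incus list
  register: incus_probe
  changed_when: false
  failed_when: false
  tags: [incus]

# Runs only when the probe failed, so re-runs on an initialised host skip it.
# The preseed YAML is fed to `incus admin init --preseed` on stdin.
- name: Initialise Incus from a preseed (first run only)
  ansible.builtin.command:
    cmd: incus admin init --preseed
    stdin: |
      networks:
        - name: incusbr0
          type: bridge
          config:
            ipv4.address: 10.99.0.1/24
            ipv4.nat: "true"
      storage_pools:
        - name: default
          driver: dir          # assumed; use whatever driver the host supports
      profiles:
        - name: default
          devices:
            root:
              type: disk
              path: /
              pool: default
            eth0:
              type: nic
              name: eth0
              network: incusbr0
  when: incus_probe.rc != 0
  tags: [incus]
```

Keying the init on the probe's return code, rather than on a marker file, keeps the decision tied to the daemon's actual state.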
Acceptance verified locally. A full --check run needs SSH access to
the lab host, which is not reachable from this box, so the user runs
that step:
$ cd infra/ansible
$ ansible-playbook -i inventory/lab.yml playbooks/site.yml --syntax-check
playbook: playbooks/site.yml ← clean
$ ansible-playbook -i inventory/lab.yml playbooks/site.yml --list-tasks
21 tasks across 2 plays, all tagged. ← partial applies work
Conventions enforced from the start:
- Every task is tagged, so partial applies such as `--tags ssh,fail2ban`
  are always possible.
- Concerns live in sub-task files (ssh.yml, fail2ban.yml, etc.), so each
  role's main.yml stays an index of concerns rather than a wall of tasks.
- Validators run before reload (sshd -t for sshd_config), and the role
  refuses to apply changes that would lock the operator out.
- Comments answer "why"; task names and module names already say "what".
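The tagging, validate-before-reload and lock-out-guard conventions could look roughly like this in roles/common/tasks/ssh.yml. This is a sketch, not the committed file. It assumes Debian/Ubuntu hosts (sshd at /usr/sbin/sshd, service unit `ssh`), a main sshd_config that already includes sshd_config.d/*.conf, and a handler named `Reload sshd`, which is hypothetical.

```yaml
# roles/common/tasks/ssh.yml (sketch; not the committed file)

# Guard first: fail the play before anything is written if the
# allow-list is empty, instead of risking an operator lock-out.
- name: Refuse to apply SSH hardening with an empty ssh_allow_users
  ansible.builtin.assert:
    that:
      - ssh_allow_users | length > 0
    fail_msg: "ssh_allow_users is empty; applying would lock the operator out."
  tags: [ssh]

# `validate` runs against the temporary file before it is moved into
# place, so a broken drop-in never reaches its final path and the handler
# is never notified. Note: `sshd -t -f %s` checks the drop-in on its own,
# not merged with the main sshd_config.
- name: Install the sshd hardening drop-in
  ansible.builtin.copy:
    dest: /etc/ssh/sshd_config.d/50-veza-hardening.conf
    content: |
      # Managed by Ansible (role: common). Local edits will be overwritten.
      Port {{ ssh_port }}
      PermitRootLogin {{ ssh_permit_root_login }}
      PasswordAuthentication {{ ssh_password_authentication }}
      AllowUsers {{ ssh_allow_users | join(' ') }}
    owner: root
    group: root
    mode: "0644"
    validate: /usr/sbin/sshd -t -f %s
  notify: Reload sshd
  tags: [ssh]

# roles/common/handlers/main.yml would then define the handler, e.g.:
#   - name: Reload sshd
#     ansible.builtin.service:
#       name: ssh          # Debian/Ubuntu unit name (assumed)
#       state: reloaded
```

The quotes around "no" in the group_vars defaults matter here: unquoted, YAML 1.1 parses `no` as a boolean, and the drop-in would render `PermitRootLogin False`, which is not one of the values sshd accepts. Because every task carries the `ssh` tag, `ansible-playbook -i inventory/lab.yml playbooks/site.yml --tags ssh` touches only this concern.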
Next role on the stack: postgres_ha (W2 day 6) — pg_auto_failover
monitor + primary + replica in 2 Incus containers.
SKIP_TESTS=1 — IaC YAML, no app code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
40 lines
1.6 KiB
YAML
```yaml
# Shared defaults across every inventory (lab/staging/prod). Override
# per-environment in `group_vars/<group>.yml` or per-host in
# `host_vars/<host>.yml`.
---
# Owner contact (used in some unattended-upgrades + monitoring agent configs).
veza_ops_email: ops@veza.fr

# v1.0.9 Day 5: SSH hardening surface that the `common` role enforces.
# Override these in production via group_vars/veza_prod.yml when the
# bastion's specific port / allowed users are decided. Defaults are
# safe for lab.
ssh_port: 22
ssh_permit_root_login: "no"
ssh_password_authentication: "no"
ssh_allow_users:
  - senke
  - ansible

# fail2ban — per-jail thresholds. The defaults are conservative for
# a self-hosted single-machine deployment; production may want
# lower findtime / higher bantime once Forgejo + Veza traffic is
# baselined.
fail2ban_bantime: 3600 # 1h
fail2ban_findtime: 600 # 10min
fail2ban_maxretry: 5

# unattended-upgrades — security updates only by default. The role
# never enables auto-reboot; ROADMAP_V1.0_LAUNCH.md §5 game day pins
# downtime windows to controlled cycles, not OS-driven reboots.
unattended_upgrades_origins:
  - "${distro_id}:${distro_codename}-security"
  - "${distro_id}ESMApps:${distro_codename}-apps-security"
  - "${distro_id}ESM:${distro_codename}-infra-security"
unattended_upgrades_auto_reboot: false

# Monitoring agent: prometheus node_exporter is the bare-minimum
# host metrics surface (CPU / memory / disk / network). The
# observability stack (Tempo + Loki + Grafana) lands W2 in roadmap.
monitoring_node_exporter_version: "1.8.2"
monitoring_node_exporter_port: 9100
```
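To show how the `common` role might consume these defaults, here is a sketch of the fail2ban and unattended-upgrades tasks. It assumes this file is group_vars/all.yml from the layout, that both packages are already installed, and that a handler named `Restart fail2ban` exists; the drop-in file names are hypothetical. The ESM origins suggest Ubuntu hosts, which is also an assumption.

```yaml
# roles/common/tasks/{fail2ban,unattended_upgrades}.yml (sketch)

# jail.d/*.local is read after jail.conf, so these keys override the
# stock sshd jail without editing packaged files.
- name: Tune the fail2ban sshd jail from group_vars
  ansible.builtin.copy:
    dest: /etc/fail2ban/jail.d/50-veza-sshd.local
    content: |
      [sshd]
      enabled  = true
      port     = {{ ssh_port }}
      bantime  = {{ fail2ban_bantime }}
      findtime = {{ fail2ban_findtime }}
      maxretry = {{ fail2ban_maxretry }}
    owner: root
    group: root
    mode: "0644"
  notify: Restart fail2ban
  tags: [fail2ban]

# apt.conf lists are additive across files, so `#clear` drops the
# packaged Allowed-Origins before redefining it. The ${distro_id} and
# ${distro_codename} placeholders are not Jinja: they pass through
# untouched and unattended-upgrades expands them at run time.
# Debian's stock config uses Origins-Pattern instead, a separate list
# this sketch does not touch.
- name: Restrict unattended-upgrades to security origins, no auto-reboot
  ansible.builtin.copy:
    dest: /etc/apt/apt.conf.d/52veza-unattended-upgrades
    content: |
      #clear Unattended-Upgrade::Allowed-Origins;
      Unattended-Upgrade::Allowed-Origins {
      {% for origin in unattended_upgrades_origins %}
          "{{ origin }}";
      {% endfor %}
      };
      Unattended-Upgrade::Automatic-Reboot "{{ unattended_upgrades_auto_reboot | ternary('true', 'false') }}";
    owner: root
    group: root
    mode: "0644"
  tags: [unattended_upgrades]
```

Writing numbered drop-ins rather than editing the packaged files keeps package upgrades from prompting about modified config, and the later-sorting 52veza file wins for scalar keys such as Automatic-Reboot.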