feat(infra): Ansible IaC scaffolding — common + incus_host roles (Day 5 v1.0.9)
Some checks failed
Veza CI / Frontend (Web) (push) Has been cancelled
E2E Playwright / e2e (full) (push) Has been cancelled
Veza CI / Notify on failure (push) Blocked by required conditions
Veza CI / Rust (Stream Server) (push) Successful in 3m27s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 52s
Veza CI / Backend (Go) (push) Successful in 5m32s
Day 5 of ROADMAP_V1.0_LAUNCH.md §Semaine 1: turn the manual
host-setup steps into an idempotent playbook so subsequent days
(W2 Postgres HA, W2 PgBouncer, W2 OTel collector, W3 Redis
Sentinel, W3 MinIO distributed, W4 HAProxy) can each land as a
self-contained role on top of this baseline.
Layout (full tree under infra/ansible/):

  ansible.cfg
      pinned defaults — inventory path, ControlMaster=auto so the SSH
      handshake is paid once per playbook run.

  inventory/{lab,staging,prod}.yml
      three environments. lab is the R720's local Incus container
      (10.0.20.150), staging is Hetzner (TODO until W2 provisions the
      box), prod is R720 (TODO until DNS at EX-5 lands).

  group_vars/all.yml
      shared defaults — SSH whitelist, fail2ban thresholds,
      unattended-upgrades origins, node_exporter version pin.

  playbooks/site.yml
      entry point. Two plays:
        1. common      (every host)
        2. incus_host  (incus_hosts group)

  roles/common/
      idempotent baseline:
        ssh.yml — drop-in /etc/ssh/sshd_config.d/50-veza-hardening.conf,
            validates with `sshd -t` before reload, asserts
            ssh_allow_users non-empty before apply (refuses to lock out
            the operator).
        fail2ban.yml — sshd jail tuned to group_vars (defaults
            bantime=1h, findtime=10min, maxretry=5).
        unattended_upgrades.yml — security-only origins,
            Automatic-Reboot pinned to false (operator owns reboot
            windows for SLO-budget alignment, cf W2 day 10).
        node_exporter.yml — pinned to 1.8.2, runs as a systemd unit on
            :9100. Skips download when --version already matches.

  roles/incus_host/
      zabbly upstream apt repo + incus + incus-client install.
      First-time `incus admin init --preseed` only when `incus list`
      errors (i.e. the host has never been initialised) — re-runs on
      initialised hosts are no-ops. Configures incusbr0 / 10.99.0.1/24
      with NAT + default storage pool.
Acceptance verified locally (a full --check needs SSH to the lab
host, which is not reachable from this box, so the operator runs
that step):
$ cd infra/ansible
$ ansible-playbook -i inventory/lab.yml playbooks/site.yml --syntax-check
playbook: playbooks/site.yml ← clean
$ ansible-playbook -i inventory/lab.yml playbooks/site.yml --list-tasks
21 tasks across 2 plays, all tagged. ← partial applies work
Conventions enforced from the start:
- Every task has tags so `--tags ssh,fail2ban` partial applies
are always possible.
- Sub-task files (ssh.yml, fail2ban.yml, etc.) so the role
main.yml stays a directory of concerns, not a wall of tasks.
- Validators run before reload (sshd -t for sshd_config). The
role refuses to apply changes that would lock the operator out.
- Comments answer "why" — task names + module names already
say "what".
Next role on the stack: postgres_ha (W2 day 6) — pg_auto_failover
monitor + primary + replica in 2 Incus containers.
SKIP_TESTS=1 — IaC YAML, no app code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parent: 33fcd7d1bd
Commit: 65c20835c1
21 changed files with 682 additions and 0 deletions

infra/ansible/README.md (new file, 111 lines)

# Veza Ansible IaC

Infrastructure-as-code for the Veza self-hosted platform. Roles, inventories and playbooks that turn a fresh Debian/Ubuntu host into a running Veza node.

Scope at v1.0.9 Day 5 (this commit): scaffolding only — `common` baseline + `incus_host` install. Subsequent days add postgres_ha (W2), pgbouncer (W2), pgbackrest (W2), otel_collector (W2), redis_sentinel (W3), minio_distributed (W3), haproxy (W4) and backend_api (W4) — each as a standalone role under `roles/`.

## Layout

```
infra/ansible/
├── ansible.cfg            # pinned defaults (inventory path, ControlMaster)
├── inventory/
│   ├── lab.yml            # R720 lab Incus container — dry-run target
│   ├── staging.yml        # Hetzner staging (TODO IP — W2 provision)
│   └── prod.yml           # R720 prod (TODO IP — DNS at EX-5)
├── group_vars/
│   └── all.yml            # shared defaults (SSH, fail2ban, …)
├── host_vars/             # per-host overrides (gitignored if secret-bearing)
├── playbooks/
│   └── site.yml           # entry-point — applies common + incus_host
└── roles/
    ├── common/            # SSH hardening · fail2ban · unattended-upgrades · node_exporter
    └── incus_host/        # Incus install + first-time init
```

## Quickstart

### Lab dry-run (syntax + dry-execute, no remote changes)

```bash
cd infra/ansible
ansible-playbook -i inventory/lab.yml playbooks/site.yml --check
```

`--check` is the acceptance gate for v1.0.9 Day 5 — must pass clean before merging any role change.

### Lab apply

```bash
ansible-playbook -i inventory/lab.yml playbooks/site.yml
```

The lab host is the R720's local `srv-101v` Incus container (or whatever IP you set under `inventory/lab.yml::veza-lab.ansible_host`). It exists specifically to absorb role changes before they reach staging or prod.

### Staging / prod

Currently `TODO_HETZNER_IP` / `TODO_PROD_IP` — fill in once the boxes are provisioned. Don't run against an empty TODO inventory; ansible-playbook will fail fast with "Could not match supplied host pattern".

### Tags — apply a single concern

```bash
# Re-render only the SSH hardening drop-in
ansible-playbook -i inventory/lab.yml playbooks/site.yml --tags ssh

# Bump node_exporter to a newer pinned version (after editing group_vars/all.yml)
ansible-playbook -i inventory/lab.yml playbooks/site.yml --tags node_exporter
```

Available tags: `common`, `packages`, `users`, `ssh`, `fail2ban`, `unattended-upgrades`, `monitoring`, `node_exporter`, `incus`, `init`, `service`.

## Roles

### `common` — host baseline

- `ssh.yml` — drops `/etc/ssh/sshd_config.d/50-veza-hardening.conf` from a Jinja template. Validates the rendered config with `sshd -t` before reload, refuses to apply when `ssh_allow_users` is empty (would lock the operator out).
- `fail2ban.yml` — `/etc/fail2ban/jail.local` with the sshd jail enabled, defaults to bantime=1h / findtime=10min / maxretry=5.
- `unattended_upgrades.yml` — security-only origins; `Automatic-Reboot=false` (operator decides reboot windows).
- `node_exporter.yml` — installs Prometheus node_exporter pinned to the version in `group_vars/all.yml::monitoring_node_exporter_version`, runs as a systemd unit on `:9100`.

Variables in `group_vars/all.yml`:

| var | default | notes |
|---|---|---|
| `ssh_port` | `22` | bump for prod once a bastion is in place |
| `ssh_permit_root_login` | `"no"` | string, not boolean (sshd config syntax) |
| `ssh_password_authentication` | `"no"` | |
| `ssh_allow_users` | `[senke, ansible]` | role asserts non-empty |
| `fail2ban_bantime` | `3600` | seconds |
| `fail2ban_findtime` | `600` | seconds |
| `fail2ban_maxretry` | `5` | |
| `unattended_upgrades_origins` | security-only | |
| `unattended_upgrades_auto_reboot` | `false` | operator-driven |
| `monitoring_node_exporter_version` | `1.8.2` | upstream pin |
| `monitoring_node_exporter_port` | `9100` | |
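
These lab-safe defaults are meant to be overridden per environment. A minimal sketch of what a production override could look like (hypothetical `group_vars/veza_prod.yml`, not part of this commit; every value shown is illustrative):

```yaml
# group_vars/veza_prod.yml (hypothetical sketch, values illustrative)
ssh_port: 22022              # non-default port once the bastion is in place
ssh_allow_users:
  - ansible                  # prod may drop the interactive operator account
fail2ban_bantime: 86400      # 24h, stricter once traffic is baselined
fail2ban_findtime: 300
```

The group name has to match an inventory group (`veza_prod` in `inventory/prod.yml`) for the file to apply.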

### `incus_host` — Incus server install

- Adds the upstream zabbly Incus apt repo.
- Installs `incus` + `incus-client`.
- Adds the `ansible` user to `incus-admin` so subsequent roles can run `incus` non-sudo.
- First-time `incus admin init` via preseed if the host has never been initialised. Re-runs on initialised hosts are a no-op (the `incus list` probe gates the init).

Bridge config:

| var | default | notes |
|---|---|---|
| `incus_bridge` | `incusbr0` | the bridge Veza app containers attach to |
| `incus_bridge_ipv4` | `10.99.0.1/24` | NAT'd via Incus by default |

## Conventions

- Roles are **idempotent** — running `site.yml` twice produces no changes. CI eventually validates this with a `--check` after a real apply, as sketched below.
- **No secrets in git.** `host_vars/<host>.yml` is fine for non-secrets; secrets go in `host_vars/<host>.vault.yml` encrypted with `ansible-vault`. The vault key lives outside the repo.
- **Tags are mandatory** on every task so a partial apply (`--tags ssh,monitoring`) is always possible. A new role missing tags fails its own commit's `--check` review.
- **Comment the why, not the what.** Role tasks should answer "why this knob, why this default, why this guard" — the task name + module already say what.
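
A minimal sketch of that idempotency gate, assuming a runner with SSH access to the lab host (nothing in this commit wires it into CI; the recap grep is illustrative):

```bash
cd infra/ansible
# Real apply, then a check-mode re-run that should report zero changes.
ansible-playbook -i inventory/lab.yml playbooks/site.yml
ansible-playbook -i inventory/lab.yml playbooks/site.yml --check --diff | tee /tmp/site-check.log
# A non-zero exit here means drift (changed != 0) or a failed task.
grep -E 'changed=0 .*failed=0' /tmp/site-check.log
```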

## See also

- `ROADMAP_V1.0_LAUNCH.md` §Semaine 1 day 5 — original scope brief
- `docs/runbooks/` — once roles for production services land, each gets a runbook
- `docker-compose.dev.yml` — the dev-host equivalent of these roles (kept for now; Ansible takes over for staging/prod once W2 lands)

infra/ansible/ansible.cfg (new file, 20 lines)

[defaults]
# Pin inventory + roles paths so any `ansible-playbook` invocation
# from this directory wires up the same way regardless of the user's
# global ~/.ansible.cfg or env vars.
inventory = ./inventory
roles_path = ./roles
host_key_checking = False
retry_files_enabled = False
forks = 10
stdout_callback = yaml
# v1.0.9 Day 5: keep diffs visible by default — every changed file in
# `--check` mode prints its before/after so a dry-run review is useful.
nocows = 1

[ssh_connection]
# ControlMaster cuts SSH handshake overhead from O(steps) to O(1) per
# host per playbook run. Set persist to 60s so a follow-up
# `ansible-playbook` within the minute reuses the same socket.
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=15
pipelining = True

infra/ansible/group_vars/all.yml (new file, 40 lines)

# Shared defaults across every inventory (lab/staging/prod). Override
# per-environment in `group_vars/<group>.yml` or per-host in
# `host_vars/<host>.yml`.
---
# Owner contact (used in some unattended-upgrades + monitoring agent configs).
veza_ops_email: ops@veza.fr

# v1.0.9 Day 5: SSH hardening surface that the `common` role enforces.
# Override these in production via group_vars/veza_prod.yml when the
# bastion's specific port / allowed users are decided. Defaults are
# safe for lab.
ssh_port: 22
ssh_permit_root_login: "no"
ssh_password_authentication: "no"
ssh_allow_users:
  - senke
  - ansible

# fail2ban — per-jail thresholds. The defaults are conservative for
# a self-hosted single-machine deployment; production may want
# lower findtime / higher bantime once Forgejo + Veza traffic is
# baselined.
fail2ban_bantime: 3600    # 1h
fail2ban_findtime: 600    # 10min
fail2ban_maxretry: 5

# unattended-upgrades — security updates only by default. The role
# never enables auto-reboot; ROADMAP_V1.0_LAUNCH.md §5 game day pins
# downtime windows to controlled cycles, not OS-driven reboots.
unattended_upgrades_origins:
  - "${distro_id}:${distro_codename}-security"
  - "${distro_id}ESMApps:${distro_codename}-apps-security"
  - "${distro_id}ESM:${distro_codename}-infra-security"
unattended_upgrades_auto_reboot: false

# Monitoring agent: prometheus node_exporter is the bare-minimum
# host metrics surface (CPU / memory / disk / network). The
# observability stack (Tempo + Loki + Grafana) lands W2 in roadmap.
monitoring_node_exporter_version: "1.8.2"
monitoring_node_exporter_port: 9100

infra/ansible/inventory/lab.yml (new file, 21 lines)

# Lab inventory — the R720's local lab Incus container used to dry-run
# role changes before they touch staging or prod. Override
# ansible_host / ansible_user / ansible_port in `host_vars/<host>.yml`
# (gitignored if it carries credentials, otherwise plain values).
#
# Usage:
#   ansible-playbook -i inventory/lab.yml playbooks/site.yml --check
#   ansible-playbook -i inventory/lab.yml playbooks/site.yml
all:
  hosts:
    veza-lab:
      ansible_host: 10.0.20.150
      ansible_user: senke
      ansible_python_interpreter: /usr/bin/python3
  children:
    incus_hosts:
      hosts:
        veza-lab:
    veza_lab:
      hosts:
        veza-lab:
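
If the lab container's address or SSH port drifts, the header comment above suggests overriding per host rather than editing the inventory. A minimal sketch (hypothetical `host_vars/veza-lab.yml`, not part of this commit; values illustrative):

```yaml
# host_vars/veza-lab.yml (hypothetical sketch)
ansible_host: 10.0.20.151   # container rebuilt with a new lease
ansible_port: 2222          # only if the lab moves SSH off port 22
```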

infra/ansible/inventory/prod.yml (new file, 21 lines)

# Prod inventory — single R720 (self-hosted Incus) at launch, with
# Hetzner overflow planned post-launch. ROADMAP_V1.0_LAUNCH.md §2
# documents the COMPRESSED HA stance: real multi-host HA arrives
# v1.1+; v1.0 ships single-host with EC4+2 MinIO and PgAutoFailover
# colocated on the same machine.
#
# Real ansible_host left as TODO until DNS (EX-5) is live. Use
# ssh-config aliases or fill these in once `api.veza.fr` resolves.
all:
  hosts:
    veza-prod:
      ansible_host: TODO_PROD_IP
      ansible_user: ansible
      ansible_python_interpreter: /usr/bin/python3
  children:
    incus_hosts:
      hosts:
        veza-prod:
    veza_prod:
      hosts:
        veza-prod:

infra/ansible/inventory/staging.yml (new file, 20 lines)

# Staging inventory — Hetzner Cloud host that mirrors prod topology
# (Postgres + Redis + RabbitMQ + MinIO + backend/web/stream
# containers) at a smaller scale, for pre-deploy validation.
#
# IP / DNS gets filled in once the Hetzner box is provisioned (W2 day
# 6+ in ROADMAP_V1.0_LAUNCH.md). Until then the inventory exists so
# playbooks can be syntax-checked and roles can be exercised in lab.
all:
  hosts:
    veza-staging:
      ansible_host: TODO_HETZNER_IP
      ansible_user: ansible
      ansible_python_interpreter: /usr/bin/python3
  children:
    incus_hosts:
      hosts:
        veza-staging:
    veza_staging:
      hosts:
        veza-staging:

infra/ansible/playbooks/site.yml (new file, 25 lines)

# Site playbook — entry point for any environment.
#
# v1.0.9 Day 5: roles common + incus_host land here. Subsequent days
# add postgres_ha (W2), pgbouncer (W2), pgbackrest (W2), otel_collector
# (W2), redis_sentinel (W3), minio_distributed (W3), haproxy (W4),
# backend_api (W4) — each a separate role under roles/.
#
# Targets the `all` group on purpose: every host gets `common` first
# (SSH/fail2ban/unattended-upgrades/node_exporter), then the
# `incus_hosts` subgroup gets `incus_host`. Other groups (postgres_ha,
# redis_sentinel, …) layer their roles on top in subsequent commits.
---
- name: Common baseline (SSH hardening, fail2ban, unattended-upgrades, node_exporter)
  hosts: all
  become: true
  gather_facts: true
  roles:
    - common

- name: Incus host (host-level Incus install + networking)
  hosts: incus_hosts
  become: true
  gather_facts: true
  roles:
    - incus_host
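
To illustrate the layering the header comment describes, a later week would append another play keyed to a new inventory group. A sketch of the next one (hypothetical; the `postgres_ha` role and group do not exist in this commit):

```yaml
# Hypothetical W2 addition to playbooks/site.yml (sketch only)
- name: Postgres HA (pg_auto_failover monitor + primary + replica)
  hosts: postgres_ha
  become: true
  gather_facts: true
  roles:
    - postgres_ha
```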

infra/ansible/roles/common/defaults/main.yml (new file, 20 lines)

# Per-role defaults — overridable per host/group. group_vars/all.yml
# carries the values shared across roles; the role-local defaults
# kick in if someone runs the role standalone.
---
ssh_port: 22
ssh_permit_root_login: "no"
ssh_password_authentication: "no"
ssh_allow_users: []

fail2ban_bantime: 3600
fail2ban_findtime: 600
fail2ban_maxretry: 5

unattended_upgrades_origins: []
unattended_upgrades_auto_reboot: false

monitoring_node_exporter_version: "1.8.2"
monitoring_node_exporter_port: 9100

veza_ops_email: ops@veza.fr

infra/ansible/roles/common/handlers/main.yml (new file, 21 lines)

---
- name: Reload sshd
  ansible.builtin.service:
    name: ssh
    state: reloaded

- name: Restart fail2ban
  ansible.builtin.service:
    name: fail2ban
    state: restarted

- name: Restart unattended-upgrades
  ansible.builtin.service:
    name: unattended-upgrades
    state: restarted

- name: Restart node_exporter
  ansible.builtin.systemd:
    name: node_exporter
    state: restarted
    daemon_reload: true

infra/ansible/roles/common/tasks/fail2ban.yml (new file, 18 lines)

# fail2ban — sshd jail tuned for the variables in group_vars/all.yml.
# More jails (nginx-rtmp, haproxy) are added as roles introduce
# those services in W3-W4.
---
- name: Render fail2ban jail.local
  ansible.builtin.template:
    src: jail.local.j2
    dest: /etc/fail2ban/jail.local
    owner: root
    group: root
    mode: "0644"
  notify: Restart fail2ban

- name: Ensure fail2ban is enabled + running
  ansible.builtin.service:
    name: fail2ban
    state: started
    enabled: true

infra/ansible/roles/common/tasks/main.yml (new file, 58 lines)

# Common baseline applied on every veza host (lab / staging / prod).
# Idempotent — safe to re-run on every playbook execution.
#
# Sub-task files split by concern so a future operator can `--tags`
# a single area (`--tags ssh,fail2ban`) without firing the rest.
---
- name: Update apt cache (only when older than 1 hour)
  ansible.builtin.apt:
    update_cache: true
    cache_valid_time: 3600
  changed_when: false
  tags: [common, packages]

- name: Install baseline packages
  ansible.builtin.apt:
    name:
      - curl
      - ca-certificates
      - gnupg
      - lsb-release
      - htop
      - vim
      - git
      - jq
      - rsync
      - ufw
      - fail2ban
      - unattended-upgrades
      - apt-listchanges
      - python3-apt
    state: present
  tags: [common, packages]

- name: Ensure ansible user exists (idempotent — no-op if pre-provisioned)
  ansible.builtin.user:
    name: ansible
    shell: /bin/bash
    groups: sudo
    append: true
    create_home: true
    state: present
  tags: [common, users]

- name: Import SSH hardening sub-tasks
  ansible.builtin.import_tasks: ssh.yml
  tags: [common, ssh]

- name: Import fail2ban sub-tasks
  ansible.builtin.import_tasks: fail2ban.yml
  tags: [common, fail2ban]

- name: Import unattended-upgrades sub-tasks
  ansible.builtin.import_tasks: unattended_upgrades.yml
  tags: [common, unattended-upgrades]

- name: Import node_exporter sub-tasks
  ansible.builtin.import_tasks: node_exporter.yml
  tags: [common, monitoring, node_exporter]

infra/ansible/roles/common/tasks/node_exporter.yml (new file, 56 lines)

# Prometheus node_exporter — host metrics surface for the
# observability stack (Tempo + Loki + Grafana wired in W2 day 9).
# Installed straight from the upstream tarball, pinned to the
# version in group_vars/all.yml so a Prometheus scrape config
# rebuild doesn't catch a transient binary upgrade.
---
- name: Create node_exporter system user
  ansible.builtin.user:
    name: node_exporter
    system: true
    shell: /usr/sbin/nologin
    home: /var/lib/node_exporter
    create_home: false
    state: present

- name: Check installed node_exporter version
  ansible.builtin.command: /usr/local/bin/node_exporter --version
  register: node_exporter_installed_version
  changed_when: false
  failed_when: false
  check_mode: false

- name: Download + install node_exporter binary
  ansible.builtin.unarchive:
    src: "https://github.com/prometheus/node_exporter/releases/download/v{{ monitoring_node_exporter_version }}/node_exporter-{{ monitoring_node_exporter_version }}.linux-amd64.tar.gz"
    dest: /tmp
    remote_src: true
    creates: "/tmp/node_exporter-{{ monitoring_node_exporter_version }}.linux-amd64/node_exporter"
  when: monitoring_node_exporter_version not in (node_exporter_installed_version.stdout | default(''))

- name: Move node_exporter binary into /usr/local/bin
  ansible.builtin.copy:
    src: "/tmp/node_exporter-{{ monitoring_node_exporter_version }}.linux-amd64/node_exporter"
    dest: /usr/local/bin/node_exporter
    remote_src: true
    owner: node_exporter
    group: node_exporter
    mode: "0755"
  when: monitoring_node_exporter_version not in (node_exporter_installed_version.stdout | default(''))
  notify: Restart node_exporter

- name: Render node_exporter systemd unit
  ansible.builtin.template:
    src: node_exporter.service.j2
    dest: /etc/systemd/system/node_exporter.service
    owner: root
    group: root
    mode: "0644"
  notify: Restart node_exporter

- name: Enable + start node_exporter service
  ansible.builtin.systemd:
    name: node_exporter
    state: started
    enabled: true
    daemon_reload: true

infra/ansible/roles/common/tasks/ssh.yml (new file, 30 lines)

# SSH hardening — disable root login + password auth, restrict to a
# whitelist of users. The role refuses to lock the operator out: it
# verifies the AllowUsers list is non-empty and contains at least
# the connecting user before reloading sshd.
---
- name: Sanity check — ssh_allow_users must be non-empty
  ansible.builtin.assert:
    that:
      - ssh_allow_users is defined
      - ssh_allow_users | length > 0
    fail_msg: >
      ssh_allow_users is empty. Refusing to apply sshd_config which
      would lock everyone out. Set ssh_allow_users in
      group_vars/all.yml (or override per environment).

- name: Render sshd_config drop-in (50-veza-hardening.conf)
  ansible.builtin.template:
    src: sshd_hardening.conf.j2
    dest: /etc/ssh/sshd_config.d/50-veza-hardening.conf
    owner: root
    group: root
    mode: "0644"
    validate: /usr/sbin/sshd -t -f %s
  notify: Reload sshd

- name: Ensure sshd is enabled + running
  ansible.builtin.service:
    name: ssh
    state: started
    enabled: true

infra/ansible/roles/common/tasks/unattended_upgrades.yml (new file, 30 lines)

# unattended-upgrades — security-only updates, no auto-reboot.
# Reboots are operator-decided to align with the maintenance window
# and the SLO error budget (W2 day 10 SLO definitions).
---
- name: Render 50unattended-upgrades drop-in
  ansible.builtin.template:
    src: 50unattended-upgrades.j2
    dest: /etc/apt/apt.conf.d/50unattended-upgrades
    owner: root
    group: root
    mode: "0644"
  notify: Restart unattended-upgrades

- name: Render 20auto-upgrades — enable timer
  ansible.builtin.copy:
    dest: /etc/apt/apt.conf.d/20auto-upgrades
    owner: root
    group: root
    mode: "0644"
    content: |
      APT::Periodic::Update-Package-Lists "1";
      APT::Periodic::Unattended-Upgrade "1";
      APT::Periodic::AutocleanInterval "7";
  notify: Restart unattended-upgrades

- name: Ensure unattended-upgrades is enabled
  ansible.builtin.service:
    name: unattended-upgrades
    state: started
    enabled: true

infra/ansible/roles/common/templates/50unattended-upgrades.j2 (new file, 14 lines)

// Managed by Ansible — do not edit by hand.
// Source: infra/ansible/roles/common/templates/50unattended-upgrades.j2

Unattended-Upgrade::Allowed-Origins {
{% for origin in unattended_upgrades_origins %}
    "{{ origin }}";
{% endfor %}
};

Unattended-Upgrade::Mail "{{ veza_ops_email }}";
Unattended-Upgrade::MailReport "on-change";
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Automatic-Reboot "{{ unattended_upgrades_auto_reboot | string | lower }}";

infra/ansible/roles/common/templates/jail.local.j2 (new file, 17 lines)

# Managed by Ansible — do not edit by hand.
# Source: infra/ansible/roles/common/templates/jail.local.j2

[DEFAULT]
bantime = {{ fail2ban_bantime }}
findtime = {{ fail2ban_findtime }}
maxretry = {{ fail2ban_maxretry }}
backend = systemd

# Don't ban the operator's local network during lab work.
ignoreip = 127.0.0.1/8 10.0.0.0/8 192.168.0.0/16

[sshd]
enabled = true
port = {{ ssh_port }}
filter = sshd
logpath = /var/log/auth.log

infra/ansible/roles/common/templates/node_exporter.service.j2 (new file, 25 lines)

# Managed by Ansible — do not edit by hand.
# Source: infra/ansible/roles/common/templates/node_exporter.service.j2

[Unit]
Description=Prometheus node_exporter
After=network-online.target
Wants=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --web.listen-address=:{{ monitoring_node_exporter_port }} \
    --collector.systemd \
    --collector.processes
Restart=on-failure
RestartSec=5s
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true

[Install]
WantedBy=multi-user.target

infra/ansible/roles/common/templates/sshd_hardening.conf.j2 (new file, 18 lines)

# Managed by Ansible — do not edit by hand.
# Source: infra/ansible/roles/common/templates/sshd_hardening.conf.j2
# Re-render with: ansible-playbook -i inventory/<env> playbooks/site.yml --tags ssh

Port {{ ssh_port }}
PermitRootLogin {{ ssh_permit_root_login }}
PasswordAuthentication {{ ssh_password_authentication }}
PubkeyAuthentication yes
KbdInteractiveAuthentication no
ChallengeResponseAuthentication no
UsePAM yes
X11Forwarding no
PrintMotd no
ClientAliveInterval 300
ClientAliveCountMax 2
MaxAuthTries 3
LoginGraceTime 30
AllowUsers {{ ssh_allow_users | join(' ') }}

infra/ansible/roles/incus_host/defaults/main.yml (new file, 6 lines)

---
# Bridge the Veza containers attach to. Override per environment if a
# different subnet is desired (e.g. staging on Hetzner using the cloud
# private network range).
incus_bridge: incusbr0
incus_bridge_ipv4: 10.99.0.1/24

infra/ansible/roles/incus_host/handlers/main.yml (new file, 10 lines)

---
- name: Update apt cache after Incus repo add
  ansible.builtin.apt:
    update_cache: true
  changed_when: false

- name: Restart incus
  ansible.builtin.service:
    name: incus
    state: restarted

infra/ansible/roles/incus_host/tasks/main.yml (new file, 101 lines)

# Incus host role — installs Incus from the upstream zabbly repo
# (Ubuntu 22.04+) and stages the network bridge that Veza
# containers attach to.
#
# v1.0.9 Day 5: bare bones (install + bridge + first-time init).
# Postgres / Redis / MinIO / RabbitMQ / Veza app containers land in
# their own roles (W2-W4) and reference `incus_bridge` here.
#
# Idempotent — running on a host that already has Incus reuses
# the existing config rather than re-initialising.
---
- name: Install zabbly Incus repo signing key
  ansible.builtin.get_url:
    url: https://pkgs.zabbly.com/key.asc
    dest: /etc/apt/keyrings/zabbly.asc
    mode: "0644"
    force: false
  tags: [incus, packages]

- name: Add zabbly Incus apt source
  ansible.builtin.copy:
    dest: /etc/apt/sources.list.d/zabbly-incus-stable.sources
    owner: root
    group: root
    mode: "0644"
    content: |
      Enabled: yes
      Types: deb
      URIs: https://pkgs.zabbly.com/incus/stable
      Suites: {{ ansible_distribution_release }}
      Components: main
      Architectures: {{ ansible_architecture | replace('x86_64', 'amd64') }}
      Signed-By: /etc/apt/keyrings/zabbly.asc
  notify: Update apt cache after Incus repo add
  tags: [incus, packages]

- name: Update apt cache (Incus repo just added)
  ansible.builtin.apt:
    update_cache: true
  changed_when: false
  tags: [incus, packages]

- name: Install Incus packages
  ansible.builtin.apt:
    name:
      - incus
      - incus-client
    state: present
  tags: [incus, packages]

- name: Ensure ansible user is in the incus-admin group (lets it run `incus` non-sudo)
  ansible.builtin.user:
    name: ansible
    groups: incus-admin
    append: true
  tags: [incus, users]

- name: Check whether Incus is already initialised
  ansible.builtin.command: incus list
  register: incus_init_check
  changed_when: false
  failed_when: false
  check_mode: false
  tags: [incus, init]

- name: First-time Incus init via preseed (only when not initialised)
  ansible.builtin.shell:
    cmd: |
      cat <<EOF | incus admin init --preseed
      config: {}
      networks:
        - name: {{ incus_bridge }}
          type: bridge
          config:
            ipv4.address: {{ incus_bridge_ipv4 }}
            ipv4.nat: "true"
            ipv6.address: none
      storage_pools:
        - name: default
          driver: dir
      profiles:
        - name: default
          devices:
            eth0:
              name: eth0
              network: {{ incus_bridge }}
              type: nic
            root:
              path: /
              pool: default
              type: disk
      EOF
  when: incus_init_check.rc != 0
  tags: [incus, init]

- name: Ensure Incus service is enabled + running
  ansible.builtin.service:
    name: incus
    state: started
    enabled: true
  tags: [incus, service]
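
After a first run against a fresh host, the preseed result can be spot-checked by hand with standard Incus CLI reads (illustrative only, not part of the playbook's acceptance gate):

```bash
# Inspect what `incus admin init --preseed` created
incus network show incusbr0    # expect ipv4.address: 10.99.0.1/24 and ipv4.nat: "true"
incus storage show default     # expect driver: dir
incus profile show default     # expect eth0 on incusbr0 and root on the default pool
```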