Forgejo at 10.0.20.105:3000 serves HTTPS only (self-signed cert).
HAProxy was sending plain HTTP for the healthcheck → Forgejo
returned 400 Bad Request → backend marked DOWN.
Two coupled fixes:
1. `server forgejo ... ssl verify none sni str(forgejo.talas.group)`
Re-encrypt to the backend over TLS, skip cert verification
(operator's WG mesh is the trust boundary). SNI set to the
public hostname so Forgejo serves the right vhost.
2. Healthcheck rewritten with an explicit Host header:
http-check send meth GET uri / ver HTTP/1.1 hdr Host forgejo.talas.group
http-check expect rstatus ^[23]
Without the Host header, Forgejo's
`Forwarded`-header / proxy validation may reject the request.
Accept any 2xx/3xx (Forgejo redirects / to /login → 302).
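Taken together, the two fixes land in one backend stanza. A sketch — the backend name `be_forgejo` and the `option httpchk` / `check` placement are assumptions; the server address, TLS options and http-check lines mirror the ones above:

```haproxy
backend be_forgejo
    option httpchk
    # Send a well-formed HTTP/1.1 probe with the vhost Forgejo expects
    http-check send meth GET uri / ver HTTP/1.1 hdr Host forgejo.talas.group
    # Any 2xx/3xx counts as UP (/ redirects to /login -> 302)
    http-check expect rstatus ^[23]
    # Re-encrypt to the backend; WG mesh is the trust boundary, so no cert verify
    server forgejo 10.0.20.105:3000 check ssl verify none sni str(forgejo.talas.group)
```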
The forgejo backend's DOWN state didn't impact Let's Encrypt
issuance (different routing path), but it produced log noise and
left the backend unusable for routed traffic.
The `--no-verify` justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Veza Ansible IaC
Infrastructure-as-code for the Veza self-hosted platform. Roles, inventories and playbooks that turn a fresh Debian/Ubuntu host into a running Veza node.
Scope at v1.0.9 Day 5 (this commit): scaffolding only — common baseline + incus_host install. Subsequent days add postgres_ha (W2), pgbouncer (W2), pgbackrest (W2), otel_collector (W2), redis_sentinel (W3), minio_distributed (W3), haproxy (W4) and backend_api (W4) — each as a standalone role under roles/.
## Layout

```
infra/ansible/
├── ansible.cfg          # pinned defaults (inventory path, ControlMaster)
├── inventory/
│   ├── lab.yml          # R720 lab Incus container — dry-run target
│   ├── staging.yml      # Hetzner staging (TODO IP — W2 provision)
│   └── prod.yml         # R720 prod (TODO IP — DNS at EX-5)
├── group_vars/
│   └── all.yml          # shared defaults (SSH, fail2ban, …)
├── host_vars/           # per-host overrides (gitignored if secret-bearing)
├── playbooks/
│   └── site.yml         # entry-point — applies common + incus_host
└── roles/
    ├── common/          # SSH hardening · fail2ban · unattended-upgrades · node_exporter
    └── incus_host/      # Incus install + first-time init
```
## Quickstart

### Lab dry-run (syntax + dry-execute, no remote changes)

```shell
cd infra/ansible
ansible-playbook -i inventory/lab.yml playbooks/site.yml --check
```
`--check` is the acceptance gate for v1.0.9 Day 5 — must pass clean before merging any role change.
### Lab apply

```shell
ansible-playbook -i inventory/lab.yml playbooks/site.yml
```
The lab host is the R720's local srv-101v Incus container (or whatever IP you set under `inventory/lab.yml::veza-lab.ansible_host`). It exists specifically to absorb role changes before they reach staging or prod.
### Staging / prod

Currently `TODO_HETZNER_IP` / `TODO_PROD_IP` — fill these in once the boxes are provisioned. Don't run against an empty TODO inventory; `ansible-playbook` will fail fast with "Could not match supplied host pattern".
### Tags — apply a single concern

```shell
# Re-render only the SSH hardening drop-in
ansible-playbook -i inventory/lab.yml playbooks/site.yml --tags ssh

# Bump node_exporter to a newer pinned version (after editing group_vars/all.yml)
ansible-playbook -i inventory/lab.yml playbooks/site.yml --tags node_exporter
```
Available tags: `common`, `packages`, `users`, `ssh`, `fail2ban`, `unattended-upgrades`, `monitoring`, `node_exporter`, `incus`, `init`, `service`.
## Roles

### common — host baseline

- `ssh.yml` — drops `/etc/ssh/sshd_config.d/50-veza-hardening.conf` from a Jinja template. Validates the rendered config with `sshd -t` before reload and refuses to apply when `ssh_allow_users` is empty (would lock the operator out).
- `fail2ban.yml` — `/etc/fail2ban/jail.local` with the sshd jail enabled; defaults to bantime=1h / findtime=10min / maxretry=5.
- `unattended_upgrades.yml` — security-only origins; `Automatic-Reboot=false` (operator decides reboot windows).
- `node_exporter.yml` — installs Prometheus node_exporter pinned to the version in `group_vars/all.yml::monitoring_node_exporter_version`, runs as a systemd unit on `:9100`.
Variables in `group_vars/all.yml`:
| var | default | notes |
|---|---|---|
| `ssh_port` | `22` | bump for prod once a bastion is in place |
| `ssh_permit_root_login` | `"no"` | string, not boolean (sshd config syntax) |
| `ssh_password_authentication` | `"no"` | |
| `ssh_allow_users` | `[senke, ansible]` | role asserts non-empty |
| `fail2ban_bantime` | `3600` | seconds |
| `fail2ban_findtime` | `600` | seconds |
| `fail2ban_maxretry` | `5` | |
| `unattended_upgrades_origins` | security-only | |
| `unattended_upgrades_auto_reboot` | `false` | operator-driven |
| `monitoring_node_exporter_version` | `1.8.2` | upstream pin |
| `monitoring_node_exporter_port` | `9100` | |
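With these defaults, the rendered hardening drop-in would look roughly like this — a sketch using standard `sshd_config` directive names; the role's actual template may set more:

```
# /etc/ssh/sshd_config.d/50-veza-hardening.conf (illustrative render)
Port 22
PermitRootLogin no
PasswordAuthentication no
AllowUsers senke ansible
```

The role runs `sshd -t` against the rendered file before reloading, so a bad template never takes down SSH access.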
### incus_host — Incus server install
- Adds the upstream zabbly Incus apt repo.
- Installs `incus` + `incus-client`.
- Adds the `ansible` user to `incus-admin` so subsequent roles can run `incus` non-sudo.
- First-time `incus admin init` via preseed if the host has never been initialised. Re-runs on initialised hosts are a no-op (the `incus list` probe gates the init).
Bridge config:
| var | default | notes |
|---|---|---|
| `incus_bridge` | `incusbr0` | the bridge Veza app containers attach to |
| `incus_bridge_ipv4` | `10.99.0.1/24` | NAT'd via Incus by default |
## Conventions

- Roles are idempotent — running `site.yml` twice produces no changes. CI eventually validates this with a `--check` after a real apply.
- No secrets in git. `host_vars/<host>.yml` is fine for non-secrets; secrets go in `host_vars/<host>.vault.yml` encrypted with `ansible-vault`. The vault key lives outside the repo.
- Tags are mandatory on every task so a partial apply (`--tags ssh,monitoring`) is always possible. A new role missing tags fails its own commit's `--check` review.
- Comment the why, not the what. Role tasks should answer "why this knob, why this default, why this guard" — the task name + module already say what.
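The idempotence convention can be gated with a small shell check on the `PLAY RECAP` line. A sketch — in CI the recap would come from a real `--check` re-run; here it is a canned example string:

```shell
# Parse an ansible-playbook PLAY RECAP line; fail when anything reports changed.
# The recap string below is canned for illustration.
recap='veza-lab : ok=24 changed=0 unreachable=0 failed=0 skipped=3'
changed=$(echo "$recap" | grep -oE 'changed=[0-9]+' | cut -d= -f2)
if [ "$changed" -eq 0 ]; then
  echo "idempotent"
else
  echo "drift: $changed tasks changed" >&2
  exit 1
fi
# → prints "idempotent"
```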
## See also

- `ROADMAP_V1.0_LAUNCH.md` §Semaine 1 day 5 — original scope brief
- `docs/runbooks/` — once roles for production services land, each gets a runbook
- `docker-compose.dev.yml` — the dev-host equivalent of these roles (kept for now; Ansible takes over for staging/prod once W2 lands)