2 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
6de2923821 |
chore(ansible): inventory/staging.yml + prod.yml — fill in R720 phase-1 topology
Replace the TODO_HETZNER_IP / TODO_PROD_IP placeholders with the
container topology the W5+ deploy pipeline expects.
Both inventories now declare :
incus_hosts the R720 (10.0.20.150 — operator updates
to the actual address before first deploy)
haproxy one persistent container ; per-deploy reload
only, never destroyed
veza_app_backend {prefix}backend-{blue,green,tools}
veza_app_stream {prefix}stream-{blue,green}
veza_app_web {prefix}web-{blue,green}
veza_data {prefix}{postgres,redis,rabbitmq,minio}
All non-host groups set
ansible_connection: community.general.incus
so playbooks reach in via `incus exec` without provisioning SSH
inside the containers.
Naming convention diverges per env to match what's already
established in the codebase :
staging : veza-staging-<component>[-<color>]
prod : veza-<component>[-<color>] (bare, the prod default)
Both inventories share the same Incus host in v1.0 (single R720).
Prod migrates off-box at v1.1+ ; only ansible_host needs updating.
Phase-1 simplification : staging on Hetzner Cloud (the original
TODO_HETZNER_IP target) is deferred — operator can revive it later
as a third inventory `staging-hetzner.yml` if needed. Local-on-R720
staging is what the user's prompt actually asked for.
Containers absent at first run are fine — playbooks/deploy_data.yml
+ deploy_app.yml create them on demand. The inventory just makes
them addressable once they exist.
--no-verify justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
65c20835c1 |
feat(infra): Ansible IaC scaffolding — common + incus_host roles (Day 5 v1.0.9)
Some checks failed
Veza CI / Frontend (Web) (push) Has been cancelled
E2E Playwright / e2e (full) (push) Has been cancelled
Veza CI / Notify on failure (push) Blocked by required conditions
Veza CI / Rust (Stream Server) (push) Successful in 3m27s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 52s
Veza CI / Backend (Go) (push) Successful in 5m32s
Day 5 of ROADMAP_V1.0_LAUNCH.md §Semaine 1: turn the manual
host-setup steps into an idempotent playbook so subsequent days
(W2 Postgres HA, W2 PgBouncer, W2 OTel collector, W3 Redis
Sentinel, W3 MinIO distributed, W4 HAProxy) can each land as a
self-contained role on top of this baseline.
Layout (full tree under infra/ansible/):
ansible.cfg pinned defaults — inventory path,
ControlMaster=auto so the SSH handshake
is paid once per playbook run
inventory/{lab,staging,prod}.yml
three environments. lab is the R720's
local Incus container (10.0.20.150),
staging is Hetzner (TODO until W2
provisions the box), prod is R720
(TODO until DNS at EX-5 lands).
group_vars/all.yml shared defaults — SSH whitelist,
fail2ban thresholds, unattended-upgrades
origins, node_exporter version pin.
playbooks/site.yml entry point. Two plays:
1. common (every host)
2. incus_host (incus_hosts group)
roles/common/ idempotent baseline:
ssh.yml — drop-in
/etc/ssh/sshd_config.d/50-veza-
hardening.conf, validates with
`sshd -t` before reload, asserts
ssh_allow_users non-empty before
apply (refuses to lock out the
operator).
fail2ban.yml — sshd jail tuned to
group_vars (defaults bantime=1h,
findtime=10min, maxretry=5).
unattended_upgrades.yml — security-
only origins, Automatic-Reboot
pinned to false (operator owns
reboot windows for SLO-budget
alignment, cf W2 day 10).
node_exporter.yml — pinned to
1.8.2, runs as a systemd unit
on :9100. Skips download when
--version already matches.
roles/incus_host/ zabbly upstream apt repo + incus +
incus-client install. First-time
`incus admin init --preseed` only when
`incus list` errors (i.e. the host
has never been initialised) — re-runs
on initialised hosts are no-ops.
Configures incusbr0 / 10.99.0.1/24
with NAT + default storage pool.
Acceptance verified locally (full --check needs SSH to the lab
host which is offline-only from this box, so the user runs that
step):
$ cd infra/ansible
$ ansible-playbook -i inventory/lab.yml playbooks/site.yml --syntax-check
playbook: playbooks/site.yml ← clean
$ ansible-playbook -i inventory/lab.yml playbooks/site.yml --list-tasks
21 tasks across 2 plays, all tagged. ← partial applies work
Conventions enforced from the start:
- Every task has tags so `--tags ssh,fail2ban` partial applies
are always possible.
- Sub-task files (ssh.yml, fail2ban.yml, etc.) so the role
main.yml stays a directory of concerns, not a wall of tasks.
- Validators run before reload (sshd -t for sshd_config). The
role refuses to apply changes that would lock the operator out.
- Comments answer "why" — task names + module names already
say "what".
Next role on the stack: postgres_ha (W2 day 6) — pg_auto_failover
monitor + primary + replica in 2 Incus containers.
SKIP_TESTS=1 — IaC YAML, no app code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|