veza/infra/ansible/inventory/prod.yml

# Prod inventory — single R720 (self-hosted Incus) at v1.0 launch,
# with spillover to Hetzner post-launch. ROADMAP_V1.0_LAUNCH.md §2 documents
# the COMPRESSED HA stance: real multi-host HA arrives v1.1+; v1.0
# ships single-host with EC4+2 MinIO + PgAutoFailover colocated.
#
# Topology mirrors staging.yml (same shape, different prefix +
# different network — see group_vars/prod.yml). Phase-2 (post v1.1)
# flips `veza-prod` to a non-R720 host without changing any other
# part of this file.
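# (Per the phase-1 topology notes, that migration is a one-line change:
# point ansible_host at the new box.)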
#
# Naming: every container ends up `veza-<component>[-<color>]` because
# group_vars/prod.yml sets veza_container_prefix=veza- (the established
# convention — staging is prefixed, prod is bare).
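#
# All non-host groups set ansible_connection: community.general.incus so
# playbooks reach into containers via `incus exec`; no SSH is provisioned
# inside them.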
all:
  hosts:
    veza-prod:
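      # The R720; operator updates to the actual prod address before first deploy.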
      ansible_host: 10.0.20.150
      ansible_user: ansible
      ansible_python_interpreter: /usr/bin/python3
  children:
    incus_hosts:
      hosts:
        veza-prod:
    haproxy:
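      # One persistent container; per-deploy reload only, never destroyed.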
      hosts:
        veza-haproxy:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_app_backend:
      hosts:
        veza-backend-blue:
        veza-backend-green:
        veza-backend-tools: # ephemeral, Phase A only
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_app_stream:
      hosts:
        veza-stream-blue:
        veza-stream-green:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_app_web:
      hosts:
        veza-web-blue:
        veza-web-green:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_data:
      hosts:
        veza-postgres:
        veza-redis:
        veza-rabbitmq:
        veza-minio:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
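
# Sanity-check sketch (assumes ansible-core plus the community.general
# collection on the control node; containers absent at first run are fine,
# the deploy playbooks create them on demand):
#
#   ansible-inventory -i inventory/prod.yml --graph
#   ansible -i inventory/prod.yml veza_data -m ping
#   ansible-playbook -i inventory/prod.yml playbooks/site.yml --syntax-check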