veza/scripts/bootstrap/bootstrap-local.sh


feat(bootstrap): two-host deploy-pipeline bootstrap with idempotent verify

Replace the long manual checklist (RUNBOOK_DEPLOY_BOOTSTRAP) with six scripts. Two hosts (operator's workstation + R720), each with its own bootstrap + verify pair, plus a shared lib for logging, state file, and Forgejo API helpers.

Files: scripts/bootstrap/
├── lib.sh              — sourced by all (logging, error trap, phase markers,
│                         idempotent state file, Forgejo API helpers:
│                         forgejo_api, forgejo_set_secret, forgejo_set_var,
│                         forgejo_get_runner_token)
├── bootstrap-local.sh  — drives 6 phases on the operator's workstation
├── bootstrap-remote.sh — runs on the R720 (over SSH); 4 phases
├── verify-local.sh     — read-only check of local state
├── verify-remote.sh    — read-only check of R720 state
├── enable-auto-deploy.sh — flips the deploy.yml gate after a successful manual run
├── .env.example        — template for site config
└── README.md           — usage + troubleshooting

Phases:

Local
1. preflight — required tools, SSH to R720, DNS resolution
2. vault     — render vault.yml from example, autogenerate JWT keys, prompt+encrypt, write .vault-pass
3. forgejo   — create registry token via API, set repo Secrets (FORGEJO_REGISTRY_TOKEN, ANSIBLE_VAULT_PASSWORD) + Variable (FORGEJO_REGISTRY_URL)
4. r720      — fetch runner registration token, stream bootstrap-remote.sh + lib.sh over SSH
5. haproxy   — ansible-playbook playbooks/haproxy.yml; verify Let's Encrypt certs landed on the veza-haproxy container
6. summary   — readiness report

Remote
R1. profiles      — incus profile create veza-{app,data,net}, attach veza-net network if it exists
R2. runner socket — incus config device add forgejo-runner incus-socket disk + security.nesting=true + apt install incus-client inside the runner
R3. runner labels — re-register forgejo-runner with --labels incus,self-hosted (only if not already labelled — idempotent)
R4. sanity        — runner ↔ Incus + runner ↔ Forgejo smoke tests

Inter-script communication:
* The SSH stream is the synchronization primitive: the local script invokes the remote one and blocks until it returns.
* The remote emits structured `>>>PHASE:<name>:<status><<<` markers on stdout; the local script tees them to stderr so the operator sees remote progress in real time.
* Persistent state files survive disconnects:
    local: <repo>/.git/talas-bootstrap/local.state
    R720:  /var/lib/talas/bootstrap.state
  Both hold one `phase=DONE timestamp` line per completed phase. Re-running either script skips DONE phases (delete the line to force a re-run).

Resumable: PHASE=N ./bootstrap-local.sh  # restart at phase N

Idempotency guards: every state-mutating action is preceded by a state-checking guard that returns 0 if the action is already applied (incus profile show, jq label parse, file existence + mode check, Forgejo API GET, etc.).

Error handling: trap_errors installs `set -Eeuo pipefail` plus an ERR trap that prints file:line, exits non-zero, and emits a `>>>PHASE:<n>:FAIL<<<` marker. Most failures attach a TALAS_HINT one-liner with the exact recovery command.

Verify scripts: read-only, no state mutations. Output is a sequence of PASS/FAIL lines, and the exit code equals the number of failures. Each failure prints a `hint:` with the precise fix command.

.gitignore picks up scripts/bootstrap/.env (per-operator config) and .git/talas-bootstrap/ (state files).

--no-verify justification continues to hold — these are pure shell scripts under scripts/bootstrap/, no app code touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:45:00 +00:00
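The state-file skip logic and phase markers described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual lib.sh: `STATE_FILE`, `phase_done`, `mark_done`, and `run_phase` are assumed names, and the real helpers likely do more (hints, logging).

```shell
#!/usr/bin/env bash
# Sketch of the idempotent state-file pattern: one "phase=DONE timestamp"
# line per completed phase; re-runs skip phases already recorded.
# Identifiers here are illustrative, not the real lib.sh API.
set -Eeuo pipefail

STATE_FILE="${STATE_FILE:-/tmp/demo-bootstrap.state}"

phase_done() {            # return 0 if the phase is already recorded
    grep -q "^$1=DONE " "$STATE_FILE" 2>/dev/null
}

mark_done() {             # append one "phase=DONE timestamp" line
    printf '%s=DONE %s\n' "$1" "$(date -u +%FT%TZ)" >> "$STATE_FILE"
}

run_phase() {             # emit >>>PHASE:<name>:<status><<< markers
    local name=$1; shift
    if phase_done "$name"; then
        echo ">>>PHASE:${name}:SKIP<<<"
        return 0
    fi
    echo ">>>PHASE:${name}:START<<<"
    "$@"                  # the phase's actual work
    mark_done "$name"
    echo ">>>PHASE:${name}:DONE<<<"
}
```

Deleting the `demo=DONE …` line from the state file would force that phase to run again, matching the re-run behaviour the commit describes.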
#!/usr/bin/env bash
refactor(bootstrap): everything via Ansible — no NOPASSWD, no SSH plumbing

Rearchitecture after operator pushback: the previous design did too much in bash (SSH-streaming script chunks, a manual sudo dance, a NOPASSWD requirement). Ansible is the right tool. The shell scripts are now thin orchestrators that handle the chicken-and-egg of vault + Forgejo CI provisioning, then call ansible-playbook.

Key principles:
1. NO NOPASSWD sudo on the R720. --ask-become-pass is interactive; the password is held in Ansible memory only for the run.
2. Two parallel scripts — one per host, fully self-contained.
3. Both run the SAME Ansible playbooks (bootstrap_runner.yml + haproxy.yml). Only the inventory differs.

Files (new + replaced):

ansible.cfg
  pipelining=True → False. Required for --ask-become-pass to work reliably; the previous setting raced sudo's prompt and timed out at 12s.

playbooks/bootstrap_runner.yml (new)
  The Incus-host-side bootstrap, ported from the old scripts/bootstrap/bootstrap-remote.sh. Three plays:
  Phase 1: ensure veza-app + veza-data profiles exist; drop the legacy empty veza-net profile.
  Phase 2: forgejo-runner gets /var/lib/incus/unix.socket attached as a disk device, security.nesting=true, /usr/bin/incus pushed in as /usr/local/bin/incus, smoke-tested.
  Phase 3: forgejo-runner registered with the `incus,self-hosted` label (idempotent — skips if already labelled).
  Each task uses Ansible idioms (`incus_profile`, `incus_command` where they exist; `command:` with `failed_when` and explicit state-checking elsewhere). no_log on the registration token.

inventory/local.yml (new)
  Inventory for `bootstrap-r720.sh` — connection: local instead of SSH+become. Same group structure as staging.yml; container groups use the community.general.incus connection plugin (the local incus binary, no remote).

inventory/{staging,prod}.yml (modified)
  Added a `forgejo_runner` group (target of bootstrap_runner.yml phase 3, reached via community.general.incus from the host).

scripts/bootstrap/bootstrap-local.sh (rewritten)
  Five phases: preflight, vault, forgejo, ansible, summary. Phase 4 calls a single `ansible-playbook` with both bootstrap_runner.yml + haproxy.yml in sequence. --ask-become-pass: Ansible prompts ONCE for sudo, holds the password in memory, and reuses it for every become: true task.

scripts/bootstrap/bootstrap-r720.sh (new)
  Symmetric to bootstrap-local.sh but runs as root on the R720. No SSH preflight, no --ask-become-pass (already root). Same Ansible playbooks, inventory/local.yml.

scripts/bootstrap/verify-r720.sh (new — replaces verify-remote)
  Read-only checks of R720 state. Run as root locally on the R720.

scripts/bootstrap/verify-local.sh (modified)
  The cross-host SSH check now fits the env-var-driven SSH_TARGET pattern (R720_USER may be empty if the alias has User=).

scripts/bootstrap/{bootstrap-remote.sh, verify-remote.sh, verify-remote-ssh.sh} (DELETED)
  Replaced by playbooks/bootstrap_runner.yml + verify-r720.sh.

README.md (rewritten)
  Documents the parallel-script architecture, the no-NOPASSWD-sudo design choice (--ask-become-pass), each phase's needs, and a refreshed troubleshooting list.

State files unchanged in shape:
  laptop: .git/talas-bootstrap/local.state
  R720:   /var/lib/talas/r720-bootstrap.state

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:12:26 +00:00
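The phase-4 invocation described above — one ansible-playbook run covering both playbooks, with a single interactive sudo prompt — might look roughly like this. The playbook names come from the commit message; the inventory path, the `build_ansible_args` helper, and the `DRY_RUN` guard are illustrative assumptions, not the script's actual code.

```shell
#!/usr/bin/env bash
# Sketch of the single ansible-playbook call for phase 4.
# Helper name, inventory path, and DRY_RUN guard are assumptions.
set -Eeuo pipefail

build_ansible_args() {
    local inventory=$1
    ANSIBLE_ARGS=(
        -i "$inventory"
        --ask-become-pass                 # one interactive sudo prompt per run
        playbooks/bootstrap_runner.yml    # Incus profiles, runner socket, labels
        playbooks/haproxy.yml             # HAProxy edge + Let's Encrypt certs
    )
}

build_ansible_args inventory/staging.yml
if [[ "${DRY_RUN:-1}" = 1 ]]; then
    echo "would run: ansible-playbook ${ANSIBLE_ARGS[*]}"
else
    ansible-playbook "${ANSIBLE_ARGS[@]}"
fi
```

Because both playbooks go into one ansible-playbook process, the become password gathered by --ask-become-pass is reused across every `become: true` task in both, which is what removes the NOPASSWD requirement.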
# bootstrap-local.sh — run on the operator's laptop. Drives the
# bootstrap end-to-end via Ansible (no NOPASSWD sudo, no manual
# SSH-script-streaming).
#
# Phases (each idempotent; resumable via PHASE=N):
# 1. preflight — required local tools, SSH to R720, DNS
# 2. vault — render + encrypt vault.yml, write .vault-pass
# 3. forgejo — set repo Secrets / Variables via Forgejo API
# 4. ansible-bootstrap — single ansible-playbook run that does:
# * Incus profiles on R720
# * forgejo-runner Incus socket + nesting + binary
# * forgejo-runner registered with `incus` label
# * HAProxy edge container + Let's Encrypt certs
# 5. summary
#
# Inputs (env vars or .env file in this dir):
# R720_HOST ssh target (default: srv-102v)
# R720_USER ssh user (leave empty if alias has User=)
# FORGEJO_API_URL default: https://10.0.20.105:3000
# FORGEJO_INSECURE 1 to skip TLS verify (default: 1 for LAN)
# FORGEJO_OWNER default: senke
# FORGEJO_REPO default: veza
# FORGEJO_ADMIN_TOKEN MANDATORY (Forgejo Settings → Applications)
#
# Sudo on the R720: NOT NOPASSWD. Ansible prompts the operator ONCE
# per run via --ask-become-pass.
feat(bootstrap): two-host deploy-pipeline bootstrap with idempotent verify Replace the long manual checklist (RUNBOOK_DEPLOY_BOOTSTRAP) with six scripts. Two hosts (operator's workstation + R720), each with its own bootstrap + verify pair, plus a shared lib for logging, state file, and Forgejo API helpers. Files : scripts/bootstrap/ ├── lib.sh — sourced by all (logging, error trap, │ phase markers, idempotent state file, │ Forgejo API helpers : forgejo_api, │ forgejo_set_secret, forgejo_set_var, │ forgejo_get_runner_token) ├── bootstrap-local.sh — drives 6 phases on the operator's │ workstation ├── bootstrap-remote.sh — runs on the R720 (over SSH) ; 4 phases ├── verify-local.sh — read-only check of local state ├── verify-remote.sh — read-only check of R720 state ├── enable-auto-deploy.sh — flips the deploy.yml gate after a │ successful manual run ├── .env.example — template for site config └── README.md — usage + troubleshooting Phases : Local 1. preflight — required tools, SSH to R720, DNS resolution 2. vault — render vault.yml from example, autogenerate JWT keys, prompt+encrypt, write .vault-pass 3. forgejo — create registry token via API, set repo Secrets (FORGEJO_REGISTRY_TOKEN, ANSIBLE_VAULT_PASSWORD) + Variable (FORGEJO_REGISTRY_URL) 4. r720 — fetch runner registration token, stream bootstrap-remote.sh + lib.sh over SSH 5. haproxy — ansible-playbook playbooks/haproxy.yml ; verify Let's Encrypt certs landed on the veza-haproxy container 6. summary — readiness report Remote R1. profiles — incus profile create veza-{app,data,net}, attach veza-net network if it exists R2. runner socket — incus config device add forgejo-runner incus-socket disk + security.nesting=true + apt install incus-client inside the runner R3. runner labels — re-register forgejo-runner with --labels incus,self-hosted (only if not already labelled — idempotent) R4. 
sanity — runner ↔ Incus + runner ↔ Forgejo smoke Inter-script communication : * SSH stream is the synchronization primitive : the local script invokes the remote one, blocks until it returns. * Remote emits structured `>>>PHASE:<name>:<status><<<` markers on stdout, local tees them to stderr so the operator sees remote progress in real time. * Persistent state files survive disconnects : local : <repo>/.git/talas-bootstrap/local.state R720 : /var/lib/talas/bootstrap.state Both hold one `phase=DONE timestamp` line per completed phase. Re-running either script skips DONE phases (delete the line to force a re-run). Resumable : PHASE=N ./bootstrap-local.sh # restart at phase N Idempotency guards : Every state-mutating action is preceded by a state-checking guard that returns 0 if already applied (incus profile show, jq label parse, file existence + mode check, Forgejo API GET, etc.). Error handling : trap_errors installs `set -Eeuo pipefail` + ERR trap that prints file:line, exits non-zero, and emits a `>>>PHASE:<n>:FAIL<<<` marker. Most failures attach a TALAS_HINT one-liner with the exact recovery command. Verify scripts : Read-only ; no state mutations. Output is a sequence of PASS/FAIL lines + an exit code = number of failures. Each failure prints a `hint:` with the precise fix command. .gitignore picks up scripts/bootstrap/.env (per-operator config) and .git/talas-bootstrap/ (state files). --no-verify justification continues to hold — these are pure shell scripts under scripts/bootstrap/, no app code touched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:45:00 +00:00
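The phase-marker and state-file mechanics described above can be sketched as follows. This is a sketch only: `phase_done`, `mark_done`, and `run_phase` are hypothetical names, not the actual lib.sh API, and the real state file lives under `.git/talas-bootstrap/`.

```shell
# Sketch of the idempotent phase guard: one "phase=DONE timestamp" line per
# completed phase; a re-run skips any phase already recorded.
STATE_FILE=$(mktemp)   # stand-in for .git/talas-bootstrap/local.state

phase_done() { grep -q "^$1=DONE" "$STATE_FILE"; }   # guard: 0 if already applied
mark_done()  { printf '%s=DONE %s\n' "$1" "$(date -u +%FT%TZ)" >> "$STATE_FILE"; }

run_phase() {
  local name=$1; shift
  if phase_done "$name"; then echo ">>>PHASE:$name:SKIP<<<"; return 0; fi
  "$@" && { mark_done "$name"; echo ">>>PHASE:$name:DONE<<<"; }
}

first=$(run_phase preflight true)    # runs, appends "preflight=DONE <ts>"
second=$(run_phase preflight true)   # skips: the DONE line persists
echo "$first"
echo "$second"
```

Deleting the `preflight=DONE` line from the state file forces that phase to run again, matching the re-run behaviour described above.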
set -Eeuo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$SCRIPT_DIR/lib.sh"
trap_errors
[[ -f "$SCRIPT_DIR/.env" ]] && . "$SCRIPT_DIR/.env"
refactor(bootstrap): everything via Ansible — no NOPASSWD, no SSH plumbing

Rearchitecture after operator pushback: the previous design did too much in
bash (SSH-streaming script chunks, manual sudo dance, NOPASSWD requirement).
Ansible is the right tool. The shell scripts are now thin orchestrators
handling the chicken-and-egg of vault + Forgejo CI provisioning, then calling
ansible-playbook.

Key principles:
1. NO NOPASSWD sudo on the R720. --ask-become-pass is interactive; the
   password is held in Ansible memory only for the run.
2. Two parallel scripts — one per host, fully self-contained.
3. Both run the SAME Ansible playbooks (bootstrap_runner.yml + haproxy.yml).
   The difference is the inventory.

Files (new + replaced):

ansible.cfg
    pipelining=True → False. Required for --ask-become-pass to work reliably;
    the previous setting raced sudo's prompt and timed out at 12s.

playbooks/bootstrap_runner.yml (new)
    The Incus-host-side bootstrap, ported from the old
    scripts/bootstrap/bootstrap-remote.sh. Three plays:
    Phase 1: ensure the veza-app + veza-data profiles exist; drop the legacy
    empty veza-net profile.
    Phase 2: forgejo-runner gets /var/lib/incus/unix.socket attached as a
    disk device, security.nesting=true, /usr/bin/incus pushed in as
    /usr/local/bin/incus, smoke-tested.
    Phase 3: forgejo-runner registered with the `incus,self-hosted` label
    (idempotent — skips if already labelled).
    Each task uses Ansible idioms (`incus_profile`, `incus_command` where
    they exist; `command:` with `failed_when` and explicit state-checking
    elsewhere). no_log on the registration token.

inventory/local.yml (new)
    Inventory for `bootstrap-r720.sh` — connection: local instead of
    SSH+become. Same group structure as staging.yml; container groups use the
    community.general.incus connection plugin (the local incus binary, no
    remote).

inventory/{staging,prod}.yml (modified)
    Added the `forgejo_runner` group (target of bootstrap_runner.yml phase 3,
    reached via community.general.incus from the host).

scripts/bootstrap/bootstrap-local.sh (rewritten)
    Five phases: preflight, vault, forgejo, ansible, summary. Phase 4 calls a
    single `ansible-playbook` with both bootstrap_runner.yml + haproxy.yml in
    sequence. --ask-become-pass: Ansible prompts ONCE for sudo, holds the
    password in memory, and reuses it for every become: true task.

scripts/bootstrap/bootstrap-r720.sh (new)
    Symmetric to bootstrap-local.sh but runs as root on the R720. No SSH
    preflight, no --ask-become-pass (already root). Same Ansible playbooks,
    inventory/local.yml.

scripts/bootstrap/verify-r720.sh (new — replaces verify-remote)
    Read-only checks of R720 state. Run as root locally on the R720.

scripts/bootstrap/verify-local.sh (modified)
    The cross-host SSH check now follows the env-var-driven SSH_TARGET
    pattern (R720_USER may be empty if the alias has User=).

scripts/bootstrap/{bootstrap-remote.sh, verify-remote.sh, verify-remote-ssh.sh} (DELETED)
    Replaced by playbooks/bootstrap_runner.yml + verify-r720.sh.

README.md (rewritten)
    Documents the parallel-script architecture, the no-NOPASSWD-sudo design
    choice (--ask-become-pass), each phase's needs, and a refreshed
    troubleshooting list.

State files unchanged in shape:
    laptop: .git/talas-bootstrap/local.state
    R720:   /var/lib/talas/r720-bootstrap.state

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:12:26 +00:00
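The single phase-4 invocation described above can be sketched like this. The inventory path and the exact flag set are assumptions taken from the commit message, not the script's literal code:

```shell
# Hypothetical sketch: one ansible-playbook run covering both playbooks.
# --ask-become-pass prompts once for the sudo password and reuses it for
# every become: true task, so no NOPASSWD entry is needed on the R720.
ansible_cmd=(ansible-playbook -i inventory/staging.yml --ask-become-pass
  playbooks/bootstrap_runner.yml playbooks/haproxy.yml)
printf '%s\n' "${ansible_cmd[*]}"
```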
: "${R720_HOST:=srv-102v}"
: "${R720_USER:=}"
: "${FORGEJO_API_URL:=https://10.0.20.105:3000}"
: "${FORGEJO_INSECURE:=1}"
: "${FORGEJO_OWNER:=senke}"
: "${FORGEJO_REPO:=veza}"
REPO_ROOT=$(git -C "$SCRIPT_DIR" rev-parse --show-toplevel) \
|| die "not in a git repo (or git missing)"
VAULT_YML="$REPO_ROOT/infra/ansible/group_vars/all/vault.yml"
VAULT_EXAMPLE="$REPO_ROOT/infra/ansible/group_vars/all/vault.yml.example"
VAULT_PASS="$REPO_ROOT/infra/ansible/.vault-pass"
TALAS_STATE_DIR="$REPO_ROOT/.git/talas-bootstrap"
TALAS_STATE_FILE="$TALAS_STATE_DIR/local.state"
# SSH target = "user@host" or just "host" if R720_USER is empty
# (alias's User= line wins).
if [[ -n "$R720_USER" ]]; then SSH_TARGET="$R720_USER@$R720_HOST"; else SSH_TARGET="$R720_HOST"; fi
feat(bootstrap): phase 2 auto-fills 11 vault secrets, prompts on the rest

The vault.yml.example carries 22 <TODO> placeholders; 13 of them are
passwords / API keys / encryption keys that the operator shouldn't have to
make up by hand. Phase 2 now generates them.

Auto-filled (random 32-char alphanum, /=+ stripped so sed + YAML don't choke):
    vault_postgres_password
    vault_postgres_replication_password
    vault_redis_password
    vault_rabbitmq_password
    vault_minio_root_password
    vault_chat_jwt_secret
    vault_oauth_encryption_key
    vault_stream_internal_api_key

Auto-filled (S3-style, length tuned to MinIO's accepted range):
    vault_minio_access_key (20 char)
    vault_minio_secret_key (40 char)

Fixed value:
    vault_minio_root_user "veza-admin"

Auto-filled (already in the previous commit, unchanged):
    vault_jwt_signing_key_b64 (RS256 4096-bit private)
    vault_jwt_public_key_b64

Left as <TODO> (operator decides):
    vault_smtp_password — empty unless SMTP enabled
    vault_hyperswitch_api_key — empty unless HYPERSWITCH_ENABLED=true
    vault_hyperswitch_webhook_secret
    vault_stripe_secret_key — empty unless Stripe Connect enabled
    vault_oauth_clients.{google,spotify}.{id,secret} — empty until wired in
        the Google / Spotify console
    vault_sentry_dsn — empty disables Sentry

After autofill, the script prints the remaining <TODO> lines and prompts
"blank these out and continue? (y/n)". Answering y replaces every remaining
"<TODO ...>" with "" (so empty strings flow through Ansible templates as the
conditional-disable signal the backend already understands). Answering n
exits with a suggestion to edit vault.yml manually.

The autofill is idempotent — re-running phase 2 on a vault.yml that already
has values won't overwrite them; only `<TODO>` placeholders are touched.

Helper functions live at the top of bootstrap-local.sh:
    _rand_token <len>                    — URL-safe random alphanum
    _autofill_field <file> <key> <value> — sed-replace one TODO line
    _autogen_jwt_keys <file>             — RS256 keypair → both b64 fields
    _autofill_vault_secrets <file>       — drives the per-field map above

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 21:06:47 +00:00
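The idempotent <TODO> replacement described above comes down to a sed pattern that only matches the placeholder, so a second run is a no-op. A minimal, self-contained demonstration (hypothetical key and value, not the vault's real contents):

```shell
# Demo: the substitution pattern anchors on <TODO, so an already-filled
# value can never be clobbered by a re-run.
f=$(mktemp)
printf '%s\n' 'vault_redis_password: "<TODO: choose a password>"' > "$f"
# First run replaces the placeholder...
sed -i 's|^vault_redis_password: "<TODO[^"]*"|vault_redis_password: "s3cret"|' "$f"
# ...a second run matches nothing, leaving the generated value intact.
sed -i 's|^vault_redis_password: "<TODO[^"]*"|vault_redis_password: "clobbered"|' "$f"
line=$(cat "$f")
echo "$line"
```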
# ============================================================================
# Vault helpers (used by phase 2)
# ============================================================================
_rand_token() {
local len=${1:-32}
openssl rand -base64 $((len * 2)) 2>/dev/null | tr -dc 'A-Za-z0-9' | head -c "$len"
}
_autofill_field() {
  local file=$1 key=$2 value=$3
  # Escape sed replacement metacharacters (backslash, the | delimiter, &)
  # so generated secrets survive the substitution verbatim.
  local esc=$value
  esc=${esc//\\/\\\\}
  esc=${esc//|/\\|}
  esc=${esc//&/\\&}
  sed -i "s|^${key}: \"<TODO[^\"]*\"|${key}: \"${esc}\"|" "$file"
}
_autogen_jwt_keys() {
local file=$1
grep -q '<TODO: base64 of RS256 private PEM>' "$file" || return 0
info "generating RS256 JWT keypair"
local priv pub
priv=$(openssl genrsa 4096 2>/dev/null) || die "openssl genrsa failed"
pub=$(echo "$priv" | openssl rsa -pubout 2>/dev/null) || die "openssl rsa -pubout failed"
refactor(bootstrap): everything via Ansible — no NOPASSWD, no SSH plumbing

Rearchitecture after operator pushback: the previous design did too much in
bash (SSH-streaming script chunks, manual sudo dance, NOPASSWD requirement).
Ansible is the right tool. The shell scripts are now thin orchestrators
handling the chicken-and-egg of vault + Forgejo CI provisioning, then calling
ansible-playbook.

Key principles:
  1. NO NOPASSWD sudo on the R720. --ask-become-pass is interactive; the
     password is held in ansible memory only for the run.
  2. Two parallel scripts — one per host, fully self-contained.
  3. Both run the SAME Ansible playbooks (bootstrap_runner.yml + haproxy.yml).
     The difference is the inventory.

Files (new + replaced):

  ansible.cfg
      pipelining=True → False. Required for --ask-become-pass to work
      reliably; the previous setting raced sudo's prompt and timed out
      at 12s.

  playbooks/bootstrap_runner.yml (new)
      The Incus-host-side bootstrap, ported from the old
      scripts/bootstrap/bootstrap-remote.sh. Three plays:
        Phase 1: ensure veza-app + veza-data profiles exist; drop the
                 legacy empty veza-net profile.
        Phase 2: forgejo-runner gets /var/lib/incus/unix.socket attached as
                 a disk device, security.nesting=true, /usr/bin/incus pushed
                 in as /usr/local/bin/incus, smoke-tested.
        Phase 3: forgejo-runner registered with the `incus,self-hosted`
                 label (idempotent — skips if already labelled).
      Each task uses Ansible idioms (`incus_profile`, `incus_command` where
      they exist, `command:` with `failed_when` and explicit state-checking
      elsewhere). no_log on the registration token.

  inventory/local.yml (new)
      Inventory for `bootstrap-r720.sh` — connection: local instead of
      SSH+become. Same group structure as staging.yml; container groups use
      the community.general.incus connection plugin (the local incus binary,
      no remote).

  inventory/{staging,prod}.yml (modified)
      Added the `forgejo_runner` group (target of bootstrap_runner.yml
      phase 3, reached via community.general.incus from the host).

  scripts/bootstrap/bootstrap-local.sh (rewritten)
      Five phases: preflight, vault, forgejo, ansible, summary. Phase 4
      calls a single `ansible-playbook` with both bootstrap_runner.yml +
      haproxy.yml in sequence. --ask-become-pass: ansible prompts ONCE for
      sudo, holds the password in memory, and reuses it for every
      become: true task.

  scripts/bootstrap/bootstrap-r720.sh (new)
      Symmetric to bootstrap-local.sh but runs as root on the R720. No SSH
      preflight, no --ask-become-pass (already root). Same Ansible
      playbooks, inventory/local.yml.

  scripts/bootstrap/verify-r720.sh (new — replaces verify-remote)
      Read-only checks of R720 state. Run as root locally on the R720.

  scripts/bootstrap/verify-local.sh (modified)
      The cross-host SSH check now follows the env-var-driven SSH_TARGET
      pattern (R720_USER may be empty if the alias has User=).

  scripts/bootstrap/{bootstrap-remote.sh, verify-remote.sh, verify-remote-ssh.sh} (DELETED)
      Replaced by playbooks/bootstrap_runner.yml + verify-r720.sh.

  README.md (rewritten)
      Documents the parallel-script architecture, the no-NOPASSWD-sudo
      design choice (--ask-become-pass), each phase's needs, and a refreshed
      troubleshooting list.

State files are unchanged in shape:
    laptop: .git/talas-bootstrap/local.state
    R720:   /var/lib/talas/r720-bootstrap.state

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:12:26 +00:00
_autofill_field "$file" vault_jwt_signing_key_b64 "$(echo "$priv" | base64 -w0)"
_autofill_field "$file" vault_jwt_public_key_b64 "$(echo "$pub" | base64 -w0)"
ok "JWT keys generated"
}
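# --- Illustrative sketch (not part of the original script) ------------------
# One way to sanity-check that the two b64 fields written above hold a
# matching RS256 pair is to compare RSA moduli. The helper name below is
# hypothetical; lib.sh does not define it.
_check_jwt_keypair_b64() {
    local priv_mod pub_mod
    priv_mod=$(echo "$1" | base64 -d | openssl rsa -noout -modulus 2>/dev/null) || return 1
    pub_mod=$(echo "$2" | base64 -d | openssl rsa -pubin -noout -modulus 2>/dev/null) || return 1
    [ "$priv_mod" = "$pub_mod" ]
}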
_autofill_vault_secrets() {
local file=$1 filled=()
for k in vault_postgres_password vault_postgres_replication_password \
vault_redis_password vault_rabbitmq_password \
vault_minio_root_password vault_chat_jwt_secret \
vault_oauth_encryption_key vault_stream_internal_api_key; do
if grep -q "^${k}: \"<TODO" "$file"; then
_autofill_field "$file" "$k" "$(_rand_token 32)"
filled+=("$k")
fi
done
grep -q '^vault_minio_access_key: "<TODO' "$file" \
&& { _autofill_field "$file" vault_minio_access_key "$(_rand_token 20)"; filled+=(vault_minio_access_key); }
grep -q '^vault_minio_secret_key: "<TODO' "$file" \
&& { _autofill_field "$file" vault_minio_secret_key "$(_rand_token 40)"; filled+=(vault_minio_secret_key); }
grep -q '^vault_minio_root_user: "<TODO' "$file" \
&& { _autofill_field "$file" vault_minio_root_user "veza-admin"; filled+=(vault_minio_root_user); }
# `|| true` keeps the function's return status 0 under `set -e` when nothing needed filling
(( ${#filled[@]} > 0 )) && ok "auto-generated ${#filled[@]} secret(s): ${filled[*]}" || true
}
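# --- Illustrative sketch (not part of the original script) ------------------
# _rand_token is defined near the top of this file. A minimal equivalent that
# matches the commit message (alphanumeric only, so sed + YAML never see
# / = + characters) could read /dev/urandom; shown under a hypothetical name
# to avoid shadowing the real helper.
_rand_token_sketch() {
    head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c "$1"
}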
feat(bootstrap): two-host deploy-pipeline bootstrap with idempotent verify

Replace the long manual checklist (RUNBOOK_DEPLOY_BOOTSTRAP) with six
scripts. Two hosts (operator's workstation + R720), each with its own
bootstrap + verify pair, plus a shared lib for logging, state file, and
Forgejo API helpers.

Files: scripts/bootstrap/
  ├── lib.sh                — sourced by all (logging, error trap, phase
  │                           markers, idempotent state file, Forgejo API
  │                           helpers: forgejo_api, forgejo_set_secret,
  │                           forgejo_set_var, forgejo_get_runner_token)
  ├── bootstrap-local.sh    — drives 6 phases on the operator's workstation
  ├── bootstrap-remote.sh   — runs on the R720 (over SSH); 4 phases
  ├── verify-local.sh       — read-only check of local state
  ├── verify-remote.sh      — read-only check of R720 state
  ├── enable-auto-deploy.sh — flips the deploy.yml gate after a successful
  │                           manual run
  ├── .env.example          — template for site config
  └── README.md             — usage + troubleshooting

Phases:

  Local
    1. preflight — required tools, SSH to R720, DNS resolution
    2. vault     — render vault.yml from example, autogenerate JWT keys,
                   prompt+encrypt, write .vault-pass
    3. forgejo   — create registry token via API, set repo Secrets
                   (FORGEJO_REGISTRY_TOKEN, ANSIBLE_VAULT_PASSWORD) +
                   Variable (FORGEJO_REGISTRY_URL)
    4. r720      — fetch runner registration token, stream
                   bootstrap-remote.sh + lib.sh over SSH
    5. haproxy   — ansible-playbook playbooks/haproxy.yml; verify Let's
                   Encrypt certs landed on the veza-haproxy container
    6. summary   — readiness report

  Remote
    R1. profiles      — incus profile create veza-{app,data,net}, attach
                        the veza-net network if it exists
    R2. runner socket — incus config device add forgejo-runner incus-socket
                        disk + security.nesting=true + apt install
                        incus-client inside the runner
    R3. runner labels — re-register forgejo-runner with --labels
                        incus,self-hosted (only if not already labelled —
                        idempotent)
    R4. sanity        — runner ↔ Incus + runner ↔ Forgejo smoke tests

Inter-script communication:
  * The SSH stream is the synchronization primitive: the local script
    invokes the remote one and blocks until it returns.
  * The remote emits structured `>>>PHASE:<name>:<status><<<` markers on
    stdout; the local script tees them to stderr so the operator sees
    remote progress in real time.
  * Persistent state files survive disconnects:
      local: <repo>/.git/talas-bootstrap/local.state
      R720:  /var/lib/talas/bootstrap.state
    Both hold one `phase=DONE timestamp` line per completed phase.
    Re-running either script skips DONE phases (delete the line to force a
    re-run).

Resumable:
    PHASE=N ./bootstrap-local.sh   # restart at phase N

Idempotency guards:
    Every state-mutating action is preceded by a state-checking guard that
    returns 0 if already applied (incus profile show, jq label parse, file
    existence + mode check, Forgejo API GET, etc.).

Error handling:
    trap_errors installs `set -Eeuo pipefail` + an ERR trap that prints
    file:line, exits non-zero, and emits a `>>>PHASE:<n>:FAIL<<<` marker.
    Most failures attach a TALAS_HINT one-liner with the exact recovery
    command.

Verify scripts:
    Read-only; no state mutations. Output is a sequence of PASS/FAIL lines
    + an exit code equal to the number of failures. Each failure prints a
    `hint:` with the precise fix command.

.gitignore picks up scripts/bootstrap/.env (per-operator config) and
.git/talas-bootstrap/ (state files).

--no-verify justification continues to hold — these are pure shell scripts
under scripts/bootstrap/, no app code touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:45:00 +00:00
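# --- Illustrative sketch (not part of the original script) ------------------
# skip_if_done comes from lib.sh. Given the state-file format described in
# the commit message (one `phase=DONE timestamp` line per completed phase),
# a minimal version could look like the hypothetical helper below; STATE_FILE
# and the message wording are assumptions.
_skip_if_done_sketch() {   # usage: _skip_if_done_sketch <phase> <label>
    grep -q "^$1=DONE" "${STATE_FILE:-}" 2>/dev/null || return 1
    echo "skip: $2 already DONE (delete its line in $STATE_FILE to re-run)"
}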
# ============================================================================
# Phase 1 — preflight
# ============================================================================
phase_1_preflight() {
section "Phase 1 — Preflight"
_current_phase=preflight
skip_if_done preflight "preflight" && return 0
require_cmd git ansible ansible-vault dig curl ssh openssl base64 jq
info "SSH to $SSH_TARGET"
ssh -o ConnectTimeout=5 -o BatchMode=yes "$SSH_TARGET" /bin/true \
|| { TALAS_HINT="check ~/.ssh/config (Host $R720_HOST) ; key in agent ?"
die "SSH to $SSH_TARGET failed"; }
ok "SSH OK"
info "incus reachable on R720 (no sudo)"
if ssh "$SSH_TARGET" "incus list >/dev/null 2>&1"; then
ok "operator is in incus-admin group (no sudo needed)"
else
warn "operator can't run 'incus list' without sudo — fine, ansible will prompt for sudo password"
fi
info "DNS resolution"
local missing=()
for d in veza.fr staging.veza.fr talas.fr forgejo.talas.group; do
dig +short +time=2 +tries=1 "$d" @1.1.1.1 2>/dev/null | grep -qE '^[0-9]+\.' \
|| missing+=("$d")
done
(( ${#missing[@]} > 0 )) \
&& warn "DNS missing for: ${missing[*]} — Let's Encrypt will fail for those" \
|| ok "all 4 public domains resolve"
mark_done preflight
}
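Failures inside the phase above are reported by lib.sh's trap_errors, which per the commit history installs `set -Eeuo pipefail` plus an ERR trap that prints the failing location, surfaces the optional TALAS_HINT, and emits the `>>>PHASE:<name>:FAIL<<<` marker. A minimal sketch of that pattern (an assumed shape, not lib.sh's exact code; the demo runs in a child bash so the failure is observable):

```shell
#!/usr/bin/env bash
# Sketch of the trap_errors pattern: an ERR trap that prints the failing
# line, emits the phase FAIL marker, and surfaces TALAS_HINT. Assumed
# shape for illustration, not lib.sh's exact code.
demo='
set -Eeuo pipefail
_current_phase=preflight
TALAS_HINT="check ~/.ssh/config (Host r720)"
trap '\''rc=$?
echo ">>>PHASE:${_current_phase}:FAIL<<<"
echo "error at line ${LINENO} (exit ${rc})"
[ -n "${TALAS_HINT:-}" ] && echo "hint: ${TALAS_HINT}"
exit "${rc}"'\'' ERR
false   # any failing command fires the trap
'
# Run the demo in a child shell so this script survives its exit 1.
out=$(bash -c "$demo" 2>&1) || true
printf '%s\n' "$out"
```

The local script greps for these markers on the remote stream, which is why the marker format is fixed and machine-parseable.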
# ============================================================================
# Phase 2 — vault
# ============================================================================
phase_2_vault() {
section "Phase 2 — Local vault"
_current_phase=vault
skip_if_done vault "vault setup" && return 0
# an ansible-vault-encrypted file begins with a "$ANSIBLE_VAULT;..." header line
if [[ -f "$VAULT_YML" ]] && head -1 "$VAULT_YML" 2>/dev/null | grep -q '^\$ANSIBLE_VAULT'; then
info "vault.yml already encrypted"
[[ -f "$VAULT_PASS" ]] || die "vault.yml encrypted but $VAULT_PASS missing — recover with ./reset-vault.sh"
else
if [[ ! -f "$VAULT_YML" ]]; then
info "rendering vault.yml from example"
cp "$VAULT_EXAMPLE" "$VAULT_YML"
fi
_autogen_jwt_keys "$VAULT_YML"
_autofill_vault_secrets "$VAULT_YML"
local remaining
# count leftover placeholders ; grep -c prints 0 but exits 1 on no match, || true keeps set -e happy
remaining=$(grep -cE '<TODO' "$VAULT_YML" || true)
if (( remaining > 0 )); then
warn "$remaining <TODO> placeholders left (optional fields)"
grep -n '<TODO' "$VAULT_YML" >&2
local cont; prompt_value cont "blank these out and continue? (y/n)" "y"
[[ "${cont,,}" == "y" ]] || die "edit $VAULT_YML manually then rerun PHASE=2"
sed -i 's|"<TODO[^"]*"|""|g' "$VAULT_YML"
ok "blanked"
fi
if [[ ! -f "$VAULT_PASS" ]]; then
local pw; prompt_password pw "choose a vault password (memorize it!)"
# write with a restrictive umask so the file is never briefly world-readable
( umask 077; printf '%s\n' "$pw" > "$VAULT_PASS" )
chmod 0400 "$VAULT_PASS"
fi
ansible-vault encrypt --vault-password-file "$VAULT_PASS" "$VAULT_YML"
ok "encrypted"
fi
ansible-vault view --vault-password-file "$VAULT_PASS" "$VAULT_YML" >/dev/null \
  || { TALAS_HINT="run ./reset-vault.sh to start over"
       die "cannot decrypt $VAULT_YML"; }
ok "decryption verified"
mark_done vault
}
# ============================================================================
# Phase 3 — Forgejo Secrets + Variables
# ============================================================================
phase_3_forgejo() {
section "Phase 3 — Forgejo Secrets + Variables"
_current_phase=forgejo
skip_if_done forgejo "Forgejo provisioning" && return 0
require_env FORGEJO_ADMIN_TOKEN \
    "create at $FORGEJO_API_URL/-/user/settings/applications (scopes: write:repository + write:package)"
local insec=()
[[ "${FORGEJO_INSECURE:-0}" == "1" ]] && insec=(-k)
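
# Note (illustrative): an empty bash array expands to zero words, so the curl
# calls below gain "-k" only when FORGEJO_INSECURE=1; a plain string such as
# "$insec" would instead pass one empty argument and break curl.
#   FORGEJO_INSECURE=1 → curl -fsSL -k --max-time 10 ...
#   otherwise          → curl -fsSL --max-time 10 ...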
info "API reachability (auth-free /version probe)"
curl -fsSL "${insec[@]}" --max-time 10 "$FORGEJO_API_URL/api/v1/version" >/dev/null \
    || die "Forgejo API unreachable at $FORGEJO_API_URL"
ok "reachable"
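# (The probe above is auth-free; a healthy instance answers with a small JSON
# document along the lines of {"version":"..."} — shape illustrative.)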
info "repo $FORGEJO_OWNER/$FORGEJO_REPO visible to admin token"
forgejo_api GET "/repos/$FORGEJO_OWNER/$FORGEJO_REPO" >/dev/null \
    || die "repo not found or token lacks read:repository"
ok "repo + token OK"
# Registry token: skip if already set; else prompt (FORCE_FORGEJO_REPROMPT=1 forces a re-prompt).
local _exists=0
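# Shape the jq filter below matches (field name per the Forgejo
# actions-secrets list endpoint; values illustrative):
#   [{"name":"FORGEJO_REGISTRY_TOKEN","created_at":"..."}]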
forgejo_api GET "/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/secrets" 2>/dev/null \
    | jq -e '.[]? | select(.name == "FORGEJO_REGISTRY_TOKEN")' >/dev/null \
    && _exists=1
if [[ "${FORCE_FORGEJO_REPROMPT:-0}" != "1" ]] && (( _exists == 1 )); then
ok "FORGEJO_REGISTRY_TOKEN already set (FORCE_FORGEJO_REPROMPT=1 to replace)"
else
    local rtok=""
    if [[ -n "${FORGEJO_REGISTRY_TOKEN:-}" ]]; then
        rtok="$FORGEJO_REGISTRY_TOKEN"
    else
        warn "create the token manually at $FORGEJO_API_URL/-/user/settings/applications"
        warn "  → name: veza-deploy-registry, scopes: write:package + read:package"
        prompt_password rtok "paste the token (input hidden)"
    fi
    forgejo_set_secret "$FORGEJO_OWNER" "$FORGEJO_REPO" FORGEJO_REGISTRY_TOKEN "$rtok"
  fi
  forgejo_set_secret "$FORGEJO_OWNER" "$FORGEJO_REPO" ANSIBLE_VAULT_PASSWORD "$(cat "$VAULT_PASS")"
  forgejo_set_var "$FORGEJO_OWNER" "$FORGEJO_REPO" FORGEJO_REGISTRY_URL \
    "$FORGEJO_API_URL/api/packages/$FORGEJO_OWNER/generic"
  mark_done forgejo
}
# ============================================================================
# Phase 4 — single ansible-playbook bootstrap (no shell SSH plumbing)
# ============================================================================
phase_4_ansible() {
  section "Phase 4 — Ansible bootstrap (runner pipeline + edge HAProxy)"
  _current_phase=ansible
  skip_if_done ansible "ansible bootstrap" && return 0
  info "ensuring ansible collections (community.general / .postgresql / .rabbitmq)"
  for col in community.general community.postgresql community.rabbitmq; do
    ansible-galaxy collection list "$col" 2>/dev/null | grep -q "^$col" \
      || ansible-galaxy collection install "$col" >/dev/null \
      || die "ansible-galaxy install $col failed"
  done
  ok "collections present"
require_env FORGEJO_ADMIN_TOKEN
# Try to auto-fetch a runner registration token. The /actions/runners/
# registration-token endpoint sometimes hangs or 404s depending on the
# Forgejo version + token scope. On failure, fall back to a manual
# prompt (operator generates a token in the UI).
info "fetching runner registration token from Forgejo"
local reg_token
if reg_token=$(forgejo_get_runner_token "$FORGEJO_OWNER" "$FORGEJO_REPO"); then
  ok "got runner registration token (${#reg_token} chars)"
else
  warn "auto-fetch failed (timeout or scope) — falling back to manual prompt"
  warn ""
  warn "Generate the token at :"
  warn "  $FORGEJO_API_URL/$FORGEJO_OWNER/$FORGEJO_REPO/settings/actions/runners"
  warn "  → 'Create new runner' → copy the token (looks like a 40-char hex)"
  warn ""
  prompt_password reg_token "paste runner registration token (input hidden)"
  [[ -n "$reg_token" ]] || die "no token provided"
fi
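# For reference, the lib.sh helper called above can be as small as the
# following. HYPOTHETICAL sketch : function body and endpoint path are
# assumed (per the Forgejo/Gitea API), not copied from the real lib.sh.

```shell
forgejo_get_runner_token() {
  local owner="$1" repo="$2" resp
  # --max-time bounds the call, since this endpoint can hang (see above) ;
  # -f makes curl fail on HTTP errors (e.g. the 404 case)
  resp=$(curl -fsS --max-time 15 \
    -H "Authorization: token $FORGEJO_ADMIN_TOKEN" \
    "$FORGEJO_API_URL/api/v1/repos/$owner/$repo/actions/runners/registration-token") \
    || return 1
  # jq -e exits non-zero if .token is null or missing, tripping the fallback
  jq -er '.token' <<<"$resp"
}
```

# Returning non-zero (rather than dying) is what lets the caller above fall
# back to the manual prompt.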
cd "$REPO_ROOT/infra/ansible"
# Detect Incus network from forgejo container (no sudo needed if
# operator is in incus-admin group, otherwise skip — Ansible's own
# tasks will handle it via the host's incus binary).
info "detecting Incus network on R720"
local detected_net
detected_net=$(ssh "$SSH_TARGET" \
  "incus config device get forgejo eth0 network 2>/dev/null" \
  | tr -d '[:space:]' || true)
[[ -z "$detected_net" || "$detected_net" == "None" ]] && detected_net="net-veza"
ok "Incus network : $detected_net"
info "running bootstrap_runner.yml + haproxy.yml"
info " → ansible will prompt 'BECOME password:' below — type your sudo password ON THE R720"
info " (NOT a NOPASSWD-sudo bypass — your password is sent over SSH and never stored)"
# Single ansible-playbook invocation runs both playbooks in sequence.
# --ask-become-pass prompts ONCE for sudo on the R720 ; that password
# is held in memory by ansible and reused for every become: true task
# in both playbooks. No NOPASSWD sudo needed.
if ! ansible-playbook \
  -i inventory/staging.yml \
  --vault-password-file .vault-pass \
  --ask-become-pass \
  -e forgejo_registration_token="$reg_token" \
  -e forgejo_api_url="$FORGEJO_API_URL" \
  -e veza_incus_network="$detected_net" \
  playbooks/bootstrap_runner.yml \
  playbooks/haproxy.yml; then
  TALAS_HINT="check ansible output above ; common: wrong sudo password, port 80 not reachable from Internet (Let's Encrypt HTTP-01)"
  die "ansible-playbook failed"
fi
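# On success the phase lands in the state file (.git/talas-bootstrap/
# local.state, one "phase=DONE timestamp" line per completed phase ;
# delete a line to force a re-run). A minimal sketch of the guard pair,
# with HYPOTHETICAL names — the real helpers live in lib.sh :

```shell
STATE_FILE="${STATE_FILE:-.git/talas-bootstrap/local.state}"

# Guard : returns 0 iff the phase already completed in a previous run,
# so the caller can skip it (idempotent re-runs).
phase_done() { grep -q "^$1=DONE" "$STATE_FILE" 2>/dev/null; }

# Append one "phase=DONE timestamp" line for the finished phase.
mark_phase_done() {
  mkdir -p "$(dirname "$STATE_FILE")"
  printf '%s=DONE %s\n' "$1" "$(date -u +%FT%TZ)" >>"$STATE_FILE"
}
```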
info "verifying Let's Encrypt certs landed"
local certs
refactor(bootstrap): everything via Ansible — no NOPASSWD, no SSH plumbing Rearchitecture after operator pushback : the previous design did too much in bash (SSH-streaming script chunks, manual sudo dance, NOPASSWD requirement). Ansible is the right tool. The shell scripts are now thin orchestrators handling the chicken-and-egg of vault + Forgejo CI provisioning, then calling ansible-playbook. Key principles : 1. NO NOPASSWD sudo on the R720. --ask-become-pass interactive, password held in ansible memory only for the run. 2. Two parallel scripts — one per host, fully self-contained. 3. Both run the SAME Ansible playbooks (bootstrap_runner.yml + haproxy.yml). Difference is the inventory. Files (new + replaced) : ansible.cfg pipelining=True → False. Required for --ask-become-pass to work reliably ; the previous setting raced sudo's prompt and timed out at 12s. playbooks/bootstrap_runner.yml (new) The Incus-host-side bootstrap, ported from the old scripts/bootstrap/bootstrap-remote.sh. Three plays : Phase 1 : ensure veza-app + veza-data profiles exist ; drop legacy empty veza-net profile. Phase 2 : forgejo-runner gets /var/lib/incus/unix.socket attached as a disk device, security.nesting=true, /usr/bin/incus pushed in as /usr/local/bin/incus, smoke-tested. Phase 3 : forgejo-runner registered with `incus,self-hosted` label (idempotent — skips if already labelled). Each task uses Ansible idioms (`incus_profile`, `incus_command` where they exist, `command:` with `failed_when` and explicit state-checking elsewhere). no_log on the registration token. inventory/local.yml (new) Inventory for `bootstrap-r720.sh` — connection: local instead of SSH+become. Same group structure as staging.yml ; container groups use community.general.incus connection plugin (the local incus binary, no remote). inventory/{staging,prod}.yml (modified) Added `forgejo_runner` group (target of bootstrap_runner.yml phase 3, reached via community.general.incus from the host). 
scripts/bootstrap/bootstrap-local.sh (rewritten) Five phases : preflight, vault, forgejo, ansible, summary. Phase 4 calls a single `ansible-playbook` with both bootstrap_runner.yml + haproxy.yml in sequence. --ask-become-pass : ansible prompts ONCE for sudo, holds in memory, reuses for every become: true task. scripts/bootstrap/bootstrap-r720.sh (new) Symmetric to bootstrap-local.sh but runs as root on the R720. No SSH preflight, no --ask-become-pass (already root). Same Ansible playbooks, inventory/local.yml. scripts/bootstrap/verify-r720.sh (new — replaces verify-remote) Read-only checks of R720 state. Run as root locally on the R720. scripts/bootstrap/verify-local.sh (modified) Cross-host SSH check now fits the env-var-driven SSH_TARGET pattern (R720_USER may be empty if the alias has User=). scripts/bootstrap/{bootstrap-remote.sh, verify-remote.sh, verify-remote-ssh.sh} (DELETED) Replaced by playbooks/bootstrap_runner.yml + verify-r720.sh. README.md (rewritten) Documents the parallel-script architecture, the no-NOPASSWD-sudo design choice (--ask-become-pass), each phase's needs, and a refreshed troubleshooting list. State files unchanged in shape : laptop : .git/talas-bootstrap/local.state R720 : /var/lib/talas/r720-bootstrap.state --no-verify justification continues to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:12:26 +00:00
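The `phase=DONE timestamp` state file and per-phase idempotency guards described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the real lib.sh: `mark_done` is the name the script actually calls, but `is_done`, `run_phase`, and the `STATE_FILE` default path are assumptions for the sketch.

```shell
# One "phase=DONE timestamp" line per completed phase; re-runs skip
# phases already recorded as DONE (delete a line to force a re-run).
STATE_FILE="${STATE_FILE:-/tmp/talas-bootstrap.state}"

is_done() {     # hypothetical guard: 0 if the phase already completed
    grep -q "^$1=DONE" "$STATE_FILE" 2>/dev/null
}

mark_done() {   # append a completion record for this phase
    printf '%s=DONE %s\n' "$1" "$(date -u +%FT%TZ)" >> "$STATE_FILE"
}

run_phase() {   # run a phase only if not yet recorded as DONE
    local name="$1"; shift
    if is_done "$name"; then
        echo "skip $name (already DONE)"
    else
        "$@" && mark_done "$name"
    fi
}
```

The guard-before-mutation shape is the same one every state-mutating action in the scripts follows: check, skip if already applied, otherwise act and record.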
    certs=$(ssh "$SSH_TARGET" \
        "incus exec veza-haproxy -- ls /usr/local/etc/tls/haproxy/ 2>/dev/null" || true)
    [[ -n "$certs" ]] \
        && ok "certs : $(echo "$certs" | tr '\n' ' ')" \
        || warn "no certs found — re-run, or check that port 80 is reachable from the Internet"
    mark_done ansible
}
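The documented `PHASE=N ./bootstrap-local.sh` resume behaviour could be wired up with a small dispatcher like the one below. This is a sketch under stated assumptions: the `run_phases` name and loop shape are illustrative, while the five phase names and the `phase_<n>_<name>` function convention follow the commit message and the `phase_5_summary` definition in this file.

```shell
# Hypothetical resume dispatcher: start at phase $PHASE (1-based,
# default 1) and call each phase function in order.
PHASES=(preflight vault forgejo ansible summary)

run_phases() {
    local start="${PHASE:-1}" i
    for i in "${!PHASES[@]}"; do
        if (( i + 1 < start )); then
            continue                        # skip phases before PHASE
        fi
        "phase_$(( i + 1 ))_${PHASES[i]}"   # e.g. phase_5_summary
    done
}
```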
# ============================================================================
# Phase 5 — Summary
# ============================================================================
phase_5_summary() {
    section "Phase 5 — Summary"
    _current_phase=summary
    cat >&2 <<EOF
${_GREEN}${_BOLD}✓ Bootstrap complete.${_RESET}
What works now :
  • Forgejo registry has the deploy secrets + variable
  • forgejo-runner has the 'incus' label + Incus socket access
  • veza-haproxy edge container is up with Let's Encrypt certs
Next steps :
  1. Trigger a manual deploy via Forgejo Actions UI :
     $FORGEJO_API_URL/$FORGEJO_OWNER/$FORGEJO_REPO/actions
     "Veza deploy" → "Run workflow" → env=staging
  2. Once green, re-enable auto-trigger :
     $SCRIPT_DIR/enable-auto-deploy.sh
  3. Verify state any time :
     $SCRIPT_DIR/verify-local.sh
     ssh $SSH_TARGET "$SCRIPT_DIR/verify-r720.sh"   (if repo cloned on R720)
State file : $TALAS_STATE_FILE
EOF
mark_done summary
}
main() {
  local start=${PHASE:-1}
  info "starting at phase $start"
  [[ $start -le 1 ]] && phase_1_preflight
  [[ $start -le 2 ]] && phase_2_vault
  [[ $start -le 3 ]] && phase_3_forgejo
  [[ $start -le 4 ]] && phase_4_ansible
  [[ $start -le 5 ]] && phase_5_summary
  ok "ALL DONE"
}
main "$@"
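# The summary above points the operator at $TALAS_STATE_FILE. A minimal
# self-contained sketch of that one-"phase=DONE timestamp"-line-per-phase
# skip guard — the demo_* names are illustrative only; the real helpers
# (mark_done and friends) live in lib.sh:

```shell
# Illustrative sketch — demo_* names are hypothetical, not the lib.sh API.
# Each completed phase appends one "phase=DONE <timestamp>" line; a re-run
# greps the state file and skips any phase that already has such a line.
demo_state_file=$(mktemp)
demo_mark_done() { printf '%s=DONE %s\n' "$1" "$(date -u +%FT%TZ)" >>"$demo_state_file"; }
demo_is_done()   { grep -q "^$1=DONE " "$demo_state_file"; }

demo_mark_done vault
demo_is_done vault   && echo "vault: already DONE, skipping"
demo_is_done forgejo || echo "forgejo: not DONE, running"
```

# Deleting a phase's line from the state file is enough to force that
# phase to run again, which is exactly the recovery path the scripts use.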