veza/scripts/bootstrap/verify-local.sh
senke 3b33791660 refactor(bootstrap): everything via Ansible — no NOPASSWD, no SSH plumbing
Re-architecture after operator pushback: the previous design did
too much in bash (SSH-streaming script chunks, manual sudo dance,
NOPASSWD requirement). Ansible is the right tool. The shell
scripts are now thin orchestrators that handle the chicken-and-egg
problem of vault + Forgejo CI provisioning, then call ansible-playbook.

Key principles:
  1. NO NOPASSWD sudo on the R720. Ansible prompts interactively via
     --ask-become-pass and holds the password in memory only for the run.
  2. Two parallel scripts — one per host, fully self-contained.
  3. Both run the SAME Ansible playbooks (bootstrap_runner.yml +
     haproxy.yml). Difference is the inventory.

Files (new + replaced):

  ansible.cfg
    pipelining=True → False. Required for --ask-become-pass to
    work reliably; the previous setting raced sudo's prompt and
    timed out after 12 s.
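
One way to confirm the setting after the change is to read it back from the file. The helper below is only a sketch: the name `cfg_pipelining` is hypothetical and not part of the repository, and it assumes the key appears uncommented in ansible.cfg.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the value of the
# first uncommented `pipelining` key found in an ansible.cfg-style INI file.
cfg_pipelining() {
    local cfg=$1
    awk -F'=' '
        /^[[:space:]]*[#;]/ { next }        # skip comment lines
        {
            key = $1
            gsub(/[[:space:]]/, "", key)    # trim whitespace around the key
            if (tolower(key) == "pipelining") {
                val = $2
                gsub(/[[:space:]]/, "", val)
                print val
                exit
            }
        }
    ' "$cfg"
}
```

A verify script could then warn when the value is not `False`, for example `[[ "$(cfg_pipelining ansible.cfg)" == "False" ]] || echo "pipelining is not disabled" >&2`.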

  playbooks/bootstrap_runner.yml (new)
    The Incus-host-side bootstrap, ported from the old
    scripts/bootstrap/bootstrap-remote.sh. Three plays:
      Phase 1: ensure veza-app + veza-data profiles exist;
               drop legacy empty veza-net profile.
      Phase 2: forgejo-runner gets /var/lib/incus/unix.socket
               attached as a disk device, security.nesting=true,
               /usr/bin/incus pushed in as /usr/local/bin/incus,
               smoke-tested.
      Phase 3: forgejo-runner registered with `incus,self-hosted`
               label (idempotent: skips if already labelled).
    Each task uses Ansible idioms (`incus_profile`, `incus_command`
    where they exist, `command:` with `failed_when` and explicit
    state-checking elsewhere). The registration token is covered
    by no_log.

  inventory/local.yml (new)
    Inventory for `bootstrap-r720.sh`: uses connection: local instead
    of SSH+become. Same group structure as staging.yml;
    container groups use the community.general.incus connection
    plugin (the local incus binary, no remote).

  inventory/{staging,prod}.yml (modified)
    Added `forgejo_runner` group (target of bootstrap_runner.yml
    phase 3, reached via community.general.incus from the host).

  scripts/bootstrap/bootstrap-local.sh (rewritten)
    Five phases: preflight, vault, forgejo, ansible, summary.
    Phase 4 calls a single `ansible-playbook` with both
    bootstrap_runner.yml + haproxy.yml in sequence.
    --ask-become-pass: Ansible prompts ONCE for the sudo password,
    holds it in memory, and reuses it for every become: true task.
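
The phase 4 call described above could look roughly like the sketch below. The helper name `build_phase4_cmd`, the `PHASE4_CMD` array, and the relative paths are assumptions made for illustration; the actual script may assemble the command differently. The helper only builds the command line, it does not run it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of phase 4: assemble ONE ansible-playbook command line
# that runs both playbooks in sequence. Paths and names are assumptions.
build_phase4_cmd() {
    local inventory=$1 ask_become=${2:-yes}
    PHASE4_CMD=(ansible-playbook -i "$inventory")
    # A non-root operator needs Ansible to prompt once for the sudo password.
    if [[ "$ask_become" == "yes" ]]; then
        PHASE4_CMD+=(--ask-become-pass)
    fi
    # ansible-playbook accepts several playbooks and runs them in order.
    PHASE4_CMD+=(playbooks/bootstrap_runner.yml playbooks/haproxy.yml)
}

build_phase4_cmd inventory/staging.yml
printf '%q ' "${PHASE4_CMD[@]}"; echo   # print the command for inspection
```

Keeping the command in an array avoids quoting problems when an inventory path contains spaces; running it is then `"${PHASE4_CMD[@]}"`.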

  scripts/bootstrap/bootstrap-r720.sh (new)
    Symmetric to bootstrap-local.sh but runs as root on the R720.
    No SSH preflight, no --ask-become-pass (already root).
    Same Ansible playbooks, inventory/local.yml.
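
Because this script skips --ask-become-pass on the assumption that it already runs as root, a guard at the top is a natural companion. The following is a minimal sketch; `require_root` is a hypothetical name and the real script may check this differently.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not taken from the repository): fail early unless the
# effective uid is 0. The uid can be passed in to make the check testable.
require_root() {
    local uid=${1:-$(id -u)}
    if (( uid != 0 )); then
        echo "must run as root (uid 0), got uid $uid" >&2
        return 1
    fi
}
```

Typical use would be `require_root || exit 1` before any Ansible call.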

  scripts/bootstrap/verify-r720.sh (new — replaces verify-remote)
    Read-only checks of R720 state. Run as root locally on the R720.

  scripts/bootstrap/verify-local.sh (modified)
    Cross-host SSH check now fits the env-var-driven SSH_TARGET
    pattern (R720_USER may be empty if the alias has User=).
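
The SSH_TARGET pattern can be sketched as below. The function name `ssh_target` and the alias `r720` are hypothetical. One detail worth noting: `${VAR:=default}` also replaces an empty value, so a default that must tolerate an explicitly empty user needs the colon-less form `${VAR=default}`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "user@host" when a user is given, else "host"
# (so that a User= line in ~/.ssh/config can take effect).
ssh_target() {
    local host=$1 user=${2:-}
    if [[ -n "$user" ]]; then
        printf '%s@%s\n' "$user" "$host"
    else
        printf '%s\n' "$host"
    fi
}

# An explicitly empty user survives `=` but would be overwritten by `:=`.
R720_USER=""
: "${R720_USER=ansible}"
ssh_target r720 "$R720_USER"
```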

  scripts/bootstrap/{bootstrap-remote.sh, verify-remote.sh,
  verify-remote-ssh.sh} (DELETED)
    Replaced by playbooks/bootstrap_runner.yml + verify-r720.sh.

  README.md (rewritten)
    Documents the parallel-script architecture, the
    no-NOPASSWD-sudo design choice (--ask-become-pass), each
    phase's needs, and a refreshed troubleshooting list.

State files unchanged in shape:
  laptop : .git/talas-bootstrap/local.state
  R720   : /var/lib/talas/r720-bootstrap.state

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:12:26 +02:00

#!/usr/bin/env bash
# verify-local.sh — read-only checks of local state (vault, secrets, ssh).
# Exits 0 if every check passes; otherwise prints the failure count and exits 1.
set -uo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=lib.sh
. "$SCRIPT_DIR/lib.sh"
[[ -f "$SCRIPT_DIR/.env" ]] && . "$SCRIPT_DIR/.env"
: "${R720_HOST:=10.0.20.150}"
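# Plain `=` (not `:=`) so an explicitly empty R720_USER is kept, e.g. when
# the ssh alias already sets User=.
: "${R720_USER=ansible}"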
: "${FORGEJO_API_URL:=https://forgejo.talas.group}"
: "${FORGEJO_OWNER:=talas}"
: "${FORGEJO_REPO:=veza}"
REPO_ROOT=$(git -C "$SCRIPT_DIR" rev-parse --show-toplevel 2>/dev/null) || {
    err "not in a git repo"
    exit 1
}
VAULT_YML="$REPO_ROOT/infra/ansible/group_vars/all/vault.yml"
VAULT_PASS="$REPO_ROOT/infra/ansible/.vault-pass"
declare -i PASS=0 FAIL=0
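# check NAME CMD: run CMD through eval with output silenced and record the
# result under NAME. CMD is a string, so callers handle their own quoting.
check() {
    local name=$1 cmd=$2
    if eval "$cmd" >/dev/null 2>&1; then
        ok "$name"
        PASS+=1
    else
        err "$name"
        FAIL+=1
    fi
}
# check_with_hint NAME CMD HINT: like check, but prints HINT on failure.
check_with_hint() {
    local name=$1 cmd=$2 hint=$3
    if eval "$cmd" >/dev/null 2>&1; then
        ok "$name"
        PASS+=1
    else
        err "$name"
        printf >&2 ' %shint:%s %s\n' "$_YELLOW" "$_RESET" "$hint"
        FAIL+=1
    fi
}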
section "Local prerequisites"
check "git available" "command -v git"
check "ansible available" "command -v ansible"
check "ansible-vault available" "command -v ansible-vault"
check "curl available" "command -v curl"
check "jq available" "command -v jq"
check "ssh available" "command -v ssh"
check "openssl available" "command -v openssl"
check "dig available" "command -v dig"
section "Repo state"
check "in repo root" "[[ -f $REPO_ROOT/CLAUDE.md ]]"
check "infra/ansible/ exists" "[[ -d $REPO_ROOT/infra/ansible ]]"
# .forgejo/workflows/ may be active OR renamed to .disabled/ — both are
# valid states. Active = auto-trigger may fire ; disabled = manual run
# only via re-enable script.
if [[ -d "$REPO_ROOT/.forgejo/workflows.disabled" ]]; then
    check "deploy.yml present (under workflows.disabled/)" \
        "[[ -f $REPO_ROOT/.forgejo/workflows.disabled/deploy.yml ]]"
    info " → workflows are DISABLED (renamed to workflows.disabled/) ;"
    info " re-enable with scripts/bootstrap/enable-auto-deploy.sh"
elif [[ -d "$REPO_ROOT/.forgejo/workflows" ]]; then
    check "deploy.yml present" \
        "[[ -f $REPO_ROOT/.forgejo/workflows/deploy.yml ]]"
    check_with_hint "deploy.yml gated (no auto-trigger)" \
        "! grep -E '^[[:space:]]+push:$' $REPO_ROOT/.forgejo/workflows/deploy.yml" \
        "if you want auto-deploy, run scripts/bootstrap/enable-auto-deploy.sh"
else
    err "neither .forgejo/workflows/ nor .forgejo/workflows.disabled/ found"
    FAIL+=1
fi
section "Vault"
check "vault.yml.example exists" "[[ -f $REPO_ROOT/infra/ansible/group_vars/all/vault.yml.example ]]"
check "vault.yml exists" "[[ -f $VAULT_YML ]]"
check_with_hint "vault.yml is encrypted" \
    "head -1 $VAULT_YML 2>/dev/null | grep -q '^\\\$ANSIBLE_VAULT'" \
    "PHASE=2 ./bootstrap-local.sh"
check_with_hint ".vault-pass exists" \
    "[[ -f $VAULT_PASS ]]" \
    "PHASE=2 ./bootstrap-local.sh"
check_with_hint ".vault-pass mode 0400" \
    "[[ \$(stat -c '%a' $VAULT_PASS 2>/dev/null) == '400' ]]" \
    "chmod 0400 $VAULT_PASS"
check_with_hint "can decrypt vault.yml" \
    "ansible-vault view --vault-password-file $VAULT_PASS $VAULT_YML" \
    "vault password mismatch; re-encrypt with: ansible-vault rekey --new-vault-password-file $VAULT_PASS $VAULT_YML"
check_with_hint "no <TODO> placeholders left" \
    "! ansible-vault view --vault-password-file $VAULT_PASS $VAULT_YML 2>/dev/null | grep -q '<TODO'" \
    "ansible-vault edit --vault-password-file $VAULT_PASS $VAULT_YML"
section "SSH to R720 ($R720_HOST)"
SSH_TARGET="$R720_HOST"
[[ -n "${R720_USER:-}" ]] && SSH_TARGET="$R720_USER@$R720_HOST"
check_with_hint "ssh handshake" \
    "ssh -o ConnectTimeout=5 -o BatchMode=yes $SSH_TARGET /bin/true" \
    "ensure $R720_HOST is in ~/.ssh/config and your key is loaded (ssh-add -l)"
check "incus reachable on R720" \
    "ssh -o BatchMode=yes $SSH_TARGET 'incus list >/dev/null 2>&1'"
check "R720 has bootstrap state file" \
    "ssh -o BatchMode=yes $SSH_TARGET '[[ -f /var/lib/talas/r720-bootstrap.state ]]'"
section "DNS public domains"
for d in veza.fr www.veza.fr staging.veza.fr talas.fr www.talas.fr forgejo.talas.group; do
    check_with_hint "$d resolves" \
        "dig +short +time=2 +tries=1 $d @1.1.1.1 | grep -qE '^[0-9]+\\.'" \
        "set the A record at your registrar to point to your R720 public IP"
done
if [[ -n "${FORGEJO_ADMIN_TOKEN:-}" ]]; then
    section "Forgejo API + secrets/vars"
    # Build the curl options here; FORGEJO_INSECURE=1 adds -k for
    # self-signed certificates.
    _CURL_OPTS=()
    [[ "${FORGEJO_INSECURE:-0}" == "1" ]] && _CURL_OPTS+=(-k)
    # /version is auth-free → reachability only; /repos/.. tests auth + scope.
    check_with_hint "Forgejo API reachable" \
        "curl -fsSL ${_CURL_OPTS[*]} --max-time 10 $FORGEJO_API_URL/api/v1/version" \
        "set FORGEJO_API_URL ; for self-signed certs, set FORGEJO_INSECURE=1 in .env"
    check_with_hint "repo $FORGEJO_OWNER/$FORGEJO_REPO exists" \
        "curl -fsSL ${_CURL_OPTS[*]} -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO" \
        "set FORGEJO_OWNER + FORGEJO_REPO env vars"
    check_with_hint "secret FORGEJO_REGISTRY_TOKEN exists" \
        "curl -fsSL ${_CURL_OPTS[*]} -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/secrets/FORGEJO_REGISTRY_TOKEN" \
        "PHASE=3 ./bootstrap-local.sh"
    check_with_hint "secret ANSIBLE_VAULT_PASSWORD exists" \
        "curl -fsSL ${_CURL_OPTS[*]} -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/secrets/ANSIBLE_VAULT_PASSWORD" \
        "PHASE=3 ./bootstrap-local.sh"
    check_with_hint "variable FORGEJO_REGISTRY_URL exists" \
        "curl -fsSL ${_CURL_OPTS[*]} -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/variables/FORGEJO_REGISTRY_URL" \
        "PHASE=3 ./bootstrap-local.sh"
else
    warn "FORGEJO_ADMIN_TOKEN not set; skipping API checks. Set it to run those."
fi
section "Result"
if (( FAIL == 0 )); then
    ok "$PASS / $((PASS + FAIL)) checks passed"
    exit 0
else
    err "$FAIL FAIL out of $((PASS + FAIL)) ($PASS passed)"
    exit 1
fi