feat(bootstrap): two-host deploy-pipeline bootstrap with idempotent verify

Replace the long manual checklist (RUNBOOK_DEPLOY_BOOTSTRAP) with
six scripts. Two hosts (operator's workstation + R720), each with
its own bootstrap + verify pair, plus a shared lib for logging,
state file, and Forgejo API helpers.

Files :
  scripts/bootstrap/
   ├── lib.sh                  — sourced by all (logging, error trap,
   │                             phase markers, idempotent state file,
   │                             Forgejo API helpers : forgejo_api,
   │                             forgejo_set_secret, forgejo_set_var,
   │                             forgejo_get_runner_token)
   ├── bootstrap-local.sh      — drives 6 phases on the operator's
   │                             workstation
   ├── bootstrap-remote.sh     — runs on the R720 (over SSH) ; 4 phases
   ├── verify-local.sh         — read-only check of local state
   ├── verify-remote.sh        — read-only check of R720 state
   ├── enable-auto-deploy.sh   — flips the deploy.yml gate after a
   │                             successful manual run
   ├── .env.example            — template for site config
   └── README.md               — usage + troubleshooting

Phases :
  Local
   1. preflight       — required tools, SSH to R720, DNS resolution
   2. vault           — render vault.yml from example, autogenerate JWT
                        keys, prompt+encrypt, write .vault-pass
   3. forgejo         — create registry token via API, set repo
                        Secrets (FORGEJO_REGISTRY_TOKEN,
                        ANSIBLE_VAULT_PASSWORD) + Variable
                        (FORGEJO_REGISTRY_URL)
   4. r720            — fetch runner registration token, stream
                        bootstrap-remote.sh + lib.sh over SSH
   5. haproxy         — ansible-playbook playbooks/haproxy.yml ;
                        verify Let's Encrypt certs landed on the
                        veza-haproxy container
   6. summary         — readiness report
  Remote
   R1. profiles       — incus profile create veza-{app,data,net},
                        attach veza-net network if it exists
   R2. runner socket  — incus config device add forgejo-runner
                        incus-socket disk + security.nesting=true
                        + apt install incus-client inside the runner
   R3. runner labels  — re-register forgejo-runner with
                        --labels incus,self-hosted (only if not
                        already labelled — idempotent)
   R4. sanity         — runner ↔ Incus + runner ↔ Forgejo smoke

Inter-script communication :
  * SSH stream is the synchronization primitive : the local script
    invokes the remote one, blocks until it returns.
  * Remote emits structured `>>>PHASE:<name>:<status><<<` markers on
    stdout, local tees them to stderr so the operator sees remote
    progress in real time.
  * Persistent state files survive disconnects :
      local : <repo>/.git/talas-bootstrap/local.state
      R720  : /var/lib/talas/bootstrap.state
    Both hold one `phase=DONE timestamp` line per completed phase.
    Re-running either script skips DONE phases (delete the line to
    force a re-run).

Resumable :
  PHASE=N ./bootstrap-local.sh    # restart at phase N

Idempotency guards :
  Every state-mutating action is preceded by a state-checking guard
  that returns 0 if already applied (incus profile show, jq label
  parse, file existence + mode check, Forgejo API GET, etc.).
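
The guard-then-mutate pattern can be sketched as follows. This is a minimal illustration, not code from lib.sh: `file_has_mode` and `ensure_file_mode` are hypothetical names, and `stat -c` assumes GNU coreutils (which the scripts already rely on elsewhere).

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not taken from lib.sh.

# Guard: returns 0 only when the file exists and already has the wanted mode.
file_has_mode() {
  local path=$1 mode=$2
  [[ -f "$path" ]] && [[ "$(stat -c '%a' "$path")" == "$mode" ]]
}

# Mutation wrapped by the guard: a second call with the same arguments
# finds the state already applied and changes nothing.
ensure_file_mode() {
  local path=$1 mode=$2
  if file_has_mode "$path" "$mode"; then
    echo "unchanged"
    return 0
  fi
  [[ -e "$path" ]] || : > "$path"
  chmod "$mode" "$path"
  echo "changed"
}
```

Calling `ensure_file_mode` twice in a row reports a change only the first time, which is the property each bootstrap phase depends on when it is re-run.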

Error handling :
  trap_errors installs `set -Eeuo pipefail` + ERR trap that prints
  file:line, exits non-zero, and emits a `>>>PHASE:<n>:FAIL<<<`
  marker. Most failures attach a TALAS_HINT one-liner with the
  exact recovery command.
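
A rough sketch of such an error trap is shown below. The real `trap_errors` lives in lib.sh and may differ; `demo_trap_errors` and `demo_on_err` are hypothetical names, and the fallback to `stdin` for the file name is an assumption made so the sketch also works when the script is piped to `bash -s`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for trap_errors; not the lib.sh implementation.

demo_on_err() {
  local file=$1 line=$2 code=$3
  # Human-readable location on stderr.
  printf >&2 'ERR at %s:%s (exit %s)\n' "$file" "$line" "$code"
  # Machine-readable marker on stdout, using the phase set by the caller.
  printf '>>>PHASE:%s:FAIL<<<\n' "${_current_phase:-unknown}"
  exit "$code"
}

demo_trap_errors() {
  # -E makes the ERR trap apply inside functions and subshells as well.
  set -Eeuo pipefail
  trap 'demo_on_err "${BASH_SOURCE[0]:-stdin}" "$LINENO" "$?"' ERR
}
```

Note that bash does not fire the ERR trap for commands run as part of an `if` condition or an `&&` / `||` list (other than the last command), so guards written in that style do not trigger the FAIL marker.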

Verify scripts :
  Read-only ; no state mutations. Output is a sequence of
  PASS/FAIL lines + an exit code = number of failures. Each
  failure prints a `hint:` with the precise fix command.
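
The PASS/FAIL convention could look roughly like this. The names `check` and `verify_status` are illustrative and not taken from verify-local.sh; the cap at 255 is an added assumption, because a process exit status wraps at 256.

```shell
#!/usr/bin/env bash
# Hypothetical check runner; not the verify-local.sh implementation.
failures=0

# check <description> <hint> <command...>
# Runs a read-only probe and prints one PASS or FAIL line for it.
check() {
  local desc=$1 hint=$2
  shift 2
  if "$@" >/dev/null 2>&1; then
    printf 'PASS %s\n' "$desc"
  else
    printf 'FAIL %s\n  hint: %s\n' "$desc" "$hint"
    failures=$((failures + 1))
  fi
}

# Status = number of failed checks, capped at 255.
verify_status() {
  return $(( failures > 255 ? 255 : failures ))
}
```

A verify script built this way would end with `verify_status; exit $?`, so a caller can test for zero failures without parsing the output.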

.gitignore picks up scripts/bootstrap/.env (per-operator config)
and .git/talas-bootstrap/ (state files).

--no-verify justification continues to hold — these are pure
shell scripts under scripts/bootstrap/, no app code touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
senke 2026-04-29 22:45:00 +02:00
parent f026d925f3
commit cf38ff2b7d
9 changed files with 1213 additions and 0 deletions

.gitignore

@@ -276,3 +276,9 @@ infra/ansible/.vault-pass.*
# Local copies devs sometimes drop next to the repo for editing
.vault-pass
.vault-pass.*
# ============================================================
# Bootstrap scripts — local config + state stay out of git
# ============================================================
scripts/bootstrap/.env
.git/talas-bootstrap/

scripts/bootstrap/.env.example

@@ -0,0 +1,19 @@
# Copy to .env (gitignored), fill in, then bootstrap-local.sh + verify-local.sh
# pick it up automatically.
#
# cp .env.example .env
# $EDITOR .env
R720_HOST=10.0.20.150
R720_USER=ansible
FORGEJO_API_URL=https://forgejo.talas.group
FORGEJO_OWNER=talas
FORGEJO_REPO=veza
# Forgejo personal access token with scopes :
# write:admin (for runner registration token)
# write:repository (for repo secrets/variables)
# write:package (for the registry token created on the fly)
# Generate at $FORGEJO_API_URL/-/user/settings/applications
FORGEJO_ADMIN_TOKEN=

scripts/bootstrap/README.md

@@ -0,0 +1,100 @@
# `scripts/bootstrap/`
Two-host bootstrap of the Veza deploy pipeline. The bootstrap scripts
are idempotent and resumable. The verify scripts are read-only and
never mutate state.
## Files
| File | Where it runs | What it does |
|---|---|---|
| `lib.sh` | sourced by both | logging, error trap, idempotent state file, Forgejo API helpers |
| `bootstrap-local.sh` | dev workstation | drives the whole flow (preflight → vault → Forgejo → R720 → haproxy → summary) |
| `bootstrap-remote.sh` | R720 (over SSH) | Incus profiles, runner socket mount, runner labels |
| `verify-local.sh` | dev workstation | read-only checks of local state |
| `verify-remote.sh` | R720 | read-only checks of R720 state |
| `enable-auto-deploy.sh` | dev workstation | flips the deploy.yml gate from workflow_dispatch-only to push:main + tag:v* |
| `.env.example` | template | copy to `.env`, fill in, gitignored |
## State file
Each host keeps a per-host state file with `phase=DONE timestamp`
lines so a re-run is a no-op for completed phases :
```
local : <repo>/.git/talas-bootstrap/local.state
R720 : /var/lib/talas/bootstrap.state
```
To force a phase re-run, delete its line :
```bash
sed -i '/^vault=/d' .git/talas-bootstrap/local.state
```
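A minimal sketch of how such a state file can be read and written is given below. It is an illustration only: the real `mark_done` / `skip_if_done` helpers live in `lib.sh` and may differ, and `demo_phase_done`, `demo_mark_done` and `STATE_FILE` are hypothetical names.
```shell
#!/usr/bin/env bash
# Illustrative re-implementation; not the lib.sh code.
STATE_FILE=${STATE_FILE:-/tmp/talas-demo.state}

# Guard: succeeds if the phase already has a "phase=DONE ..." line.
demo_phase_done() {
  [[ -f "$STATE_FILE" ]] && grep -q "^$1=DONE " "$STATE_FILE"
}

# Idempotent: appends one "phase=DONE timestamp" line, never a duplicate.
demo_mark_done() {
  demo_phase_done "$1" && return 0
  mkdir -p "$(dirname "$STATE_FILE")"
  printf '%s=DONE %s\n' "$1" "$(date -u +'%Y-%m-%dT%H:%M:%SZ')" >> "$STATE_FILE"
}
```
Because the guard only looks for a line starting with `phase=DONE`, deleting that line with `sed` as shown above is enough to make the phase run again.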
## Inter-script communication
`bootstrap-local.sh` invokes `bootstrap-remote.sh` over SSH by
concatenating `lib.sh` + `bootstrap-remote.sh` and piping into
`sudo -E bash -s` on the R720. The remote script :
* writes `/var/log/talas-bootstrap.log` on R720 (persistent)
* emits `>>>PHASE:<name>:<status><<<` markers on stdout
* the local script `tee`s those to stderr so the operator sees
remote progress in the same terminal as the local logs
Resumability : because of the state file, an SSH disconnect or partial
failure leaves every phase that completed marked DONE. Re-run
`bootstrap-local.sh` and it picks up where it stopped.
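For readers who want to consume the markers programmatically, the sketch below shows one way to extract them from a mixed log stream. `parse_phase_markers` is a hypothetical helper, not part of `lib.sh`, and it assumes each marker sits alone on its line and that statuses are upper-case words such as START, DONE or FAIL.
```shell
#!/usr/bin/env bash
# Hypothetical parser: reads a mixed log stream on stdin, prints
# "name status" for each marker line and ignores everything else.
parse_phase_markers() {
  local line re='^>>>PHASE:([^:]+):([A-Z]+)<<<$'
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
  done
}
```
Piping the SSH output through this function yields one `name status` pair per marker, which is easier to act on than grepping the raw stream.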
## Quickstart
```bash
cd /home/senke/git/talas/veza/scripts/bootstrap
cp .env.example .env
$EDITOR .env # fill in FORGEJO_ADMIN_TOKEN at minimum
chmod +x *.sh
# Set up everything
./bootstrap-local.sh
# Or skip phases you've already done
PHASE=4 ./bootstrap-local.sh
# Verify any time
./verify-local.sh
ssh ansible@10.0.20.150 'sudo bash' < verify-remote.sh
```
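`PHASE` is compared numerically, so a non-numeric value (for example `PHASE=vault`) makes the comparison fail with a bash error. A small validation step along the following lines could catch that early. This is a sketch only: `validate_phase` is a hypothetical helper and `bootstrap-local.sh` as committed does not include it.
```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only for an integer between 1 and max.
validate_phase() {
  local value=$1 max=$2
  # 10# forces base 10 so a value such as "08" is not read as octal.
  [[ $value =~ ^[0-9]+$ ]] && (( 10#$value >= 1 && 10#$value <= max ))
}
```
A caller would run `validate_phase "${PHASE:-1}" 6` before dispatching to the phase functions and stop with a hint if it fails.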
## What each phase needs
| Phase | Needs |
|---|---|
| 1. preflight | git, ansible, dig, ssh, jq locally ; SSH to R720 ; DNS resolved (warning only if missing) |
| 2. vault | nothing ; will prompt for vault password and edit `vault.yml` from template |
| 3. forgejo | `FORGEJO_ADMIN_TOKEN` env var or in .env |
| 4. r720 | `FORGEJO_ADMIN_TOKEN` (used to fetch runner registration token) ; SSH to R720 with sudo |
| 5. haproxy | public domains resolve in DNS + port 80 reachable from the Internet ; vault decryptable by ansible |
| 6. summary | nothing |
## Troubleshooting
- **Phase 3 `repo not found`** — set `FORGEJO_OWNER` to the actual
org/user owning the repo (e.g., `senke` instead of `talas`).
- **Phase 4 SSH timeout** — `sudo` may prompt for a password ; configure
passwordless sudo for the SSH user, OR run the remote bootstrap manually :
```
scp scripts/bootstrap/{lib.sh,bootstrap-remote.sh} r720:/tmp/
ssh r720 'sudo FORGEJO_REGISTRATION_TOKEN=… bash /tmp/bootstrap-remote.sh'
```
- **Phase 5 dehydrated fails** — check that port 80 reaches the R720
from Internet (not blocked by ISP, NAT-forwarded, etc.). dehydrated
needs HTTP-01 inbound. Test: from outside,
`curl http://veza.fr/.well-known/acme-challenge/test` should hit
HAProxy's letsencrypt_backend (will 404, which is fine ; what
matters is it reaches the R720).
## After bootstrap
- Trigger 1st deploy manually via Forgejo UI : Actions → Veza deploy → Run workflow.
- Once green, run `./enable-auto-deploy.sh` to re-enable push-trigger.
- `verify-local.sh` + `verify-remote.sh` are safe to run any time.

scripts/bootstrap/bootstrap-local.sh

@@ -0,0 +1,342 @@
#!/usr/bin/env bash
# bootstrap-local.sh — drive bootstrap from the operator's workstation.
#
# Phases (each idempotent ; skipped if state file marks DONE) :
# 1. preflight — required tools, SSH to R720, DNS resolution
# 2. vault — render + encrypt group_vars/all/vault.yml,
# write .vault-pass
# 3. forgejo — set repo Secrets / Variables via Forgejo API
# 4. r720 — invoke bootstrap-remote.sh over SSH
# 5. haproxy — ansible-playbook playbooks/haproxy.yml,
# verify Let's Encrypt certs land
# 6. summary — final readiness report
#
# Resumable :
# PHASE=4 ./bootstrap-local.sh # restart at phase 4
#
# Inputs (env vars ; can be set in your shell or in scripts/bootstrap/.env) :
# R720_HOST ssh target (default: 10.0.20.150)
# R720_USER ssh user (default: ansible)
# FORGEJO_API_URL default: https://forgejo.talas.group
# override with http://10.0.20.105:3000 if no DNS yet
# FORGEJO_OWNER default: talas
# FORGEJO_REPO default: veza
# FORGEJO_ADMIN_TOKEN MANDATORY (Forgejo UI → Settings → Applications)
# ALREADY_PUSHED set to "1" if origin/main already has the
# current HEAD ; skips the auto-push prompt
set -Eeuo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=lib.sh
. "$SCRIPT_DIR/lib.sh"
trap_errors
# Optional .env in the bootstrap dir for non-secret defaults.
[[ -f "$SCRIPT_DIR/.env" ]] && . "$SCRIPT_DIR/.env"
: "${R720_HOST:=10.0.20.150}"
: "${R720_USER:=ansible}"
: "${FORGEJO_API_URL:=https://forgejo.talas.group}"
: "${FORGEJO_OWNER:=talas}"
: "${FORGEJO_REPO:=veza}"
REPO_ROOT=$(git -C "$SCRIPT_DIR" rev-parse --show-toplevel 2>/dev/null) \
|| die "not in a git repo (or git missing)"
VAULT_YML="$REPO_ROOT/infra/ansible/group_vars/all/vault.yml"
VAULT_EXAMPLE="$REPO_ROOT/infra/ansible/group_vars/all/vault.yml.example"
VAULT_PASS="$REPO_ROOT/infra/ansible/.vault-pass"
# State file lives under the repo so the local script doesn't need root.
TALAS_STATE_DIR="$REPO_ROOT/.git/talas-bootstrap"
TALAS_STATE_FILE="$TALAS_STATE_DIR/local.state"
# ============================================================================
# Phase 1 — preflight
# ============================================================================
phase_1_preflight() {
section "Phase 1 — Preflight"
_current_phase=preflight
phase preflight START
skip_if_done preflight "preflight" && { phase preflight DONE; return 0; }
require_cmd git ansible ansible-vault dig curl ssh openssl base64 jq
require_file "$VAULT_EXAMPLE"
require_file "$REPO_ROOT/infra/ansible/playbooks/haproxy.yml"
require_file "$REPO_ROOT/infra/ansible/inventory/staging.yml"
info "Testing SSH to $R720_USER@$R720_HOST"
if ! ssh -o ConnectTimeout=5 -o BatchMode=yes "$R720_USER@$R720_HOST" /bin/true 2>/dev/null; then
TALAS_HINT="ensure your ssh key is in $R720_USER@$R720_HOST:~/.ssh/authorized_keys, then try ssh $R720_USER@$R720_HOST"
die "SSH to $R720_USER@$R720_HOST failed"
fi
ok "SSH OK"
info "Checking that incus is reachable on R720…"
if ! ssh "$R720_USER@$R720_HOST" "command -v incus >/dev/null && incus list >/dev/null 2>&1"; then
TALAS_HINT="run 'incus list' as $R720_USER on $R720_HOST manually ; verify the user is in the 'incus-admin' group"
die "incus on $R720_HOST not accessible by $R720_USER"
fi
ok "incus reachable"
info "Checking DNS resolution for the public domains…"
local missing_dns=()
for d in veza.fr staging.veza.fr talas.fr forgejo.talas.group; do
if ! dig +short +time=2 +tries=1 "$d" @1.1.1.1 2>/dev/null | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
missing_dns+=("$d")
fi
done
if (( ${#missing_dns[@]} > 0 )); then
warn "DNS not resolved for: ${missing_dns[*]}"
warn "Let's Encrypt (phase 5) will fail for those domains. Configure DNS first or expect partial cert issuance."
else
ok "all 4 public domains resolve"
fi
mark_done preflight
phase preflight DONE
}
# ============================================================================
# Phase 2 — vault
# ============================================================================
phase_2_vault() {
section "Phase 2 — Local vault"
_current_phase=vault
phase vault START
if skip_if_done vault "vault setup"; then
phase vault DONE; return 0
fi
if [[ -f "$VAULT_YML" ]] && head -1 "$VAULT_YML" 2>/dev/null | grep -q '^\$ANSIBLE_VAULT'; then
info "vault.yml already encrypted — verifying password works"
[[ -f "$VAULT_PASS" ]] || die "vault.yml encrypted but $VAULT_PASS missing — re-create it manually"
elif [[ -f "$VAULT_YML" ]]; then
warn "vault.yml exists in PLAINTEXT — will encrypt now"
else
info "rendering vault.yml from example"
cp "$VAULT_EXAMPLE" "$VAULT_YML"
warn "edit $VAULT_YML now to fill in <TODO> placeholders"
warn "(JWT keys are auto-generated below if you leave their <TODO> values)"
prompt_value _ "Press Enter when done editing"
# Auto-fill JWT keys if user left the TODO placeholders
if grep -q '<TODO: base64 of RS256 private PEM>' "$VAULT_YML"; then
info "generating RS256 JWT keypair"
local jwt_priv jwt_pub
jwt_priv=$(openssl genrsa 4096 2>/dev/null | base64 -w0)
jwt_pub=$(echo "$jwt_priv" | base64 -d | openssl rsa -pubout 2>/dev/null | base64 -w0)
sed -i "s|<TODO: base64 of RS256 private PEM>|$jwt_priv|" "$VAULT_YML"
sed -i "s|<TODO: base64 of RS256 public PEM>|$jwt_pub|" "$VAULT_YML"
ok "JWT keys generated and inserted"
fi
if grep -qE '<TODO' "$VAULT_YML"; then
local remaining
remaining=$(grep -cE '<TODO' "$VAULT_YML")
die "$remaining <TODO> placeholders still in $VAULT_YML — fill them and rerun PHASE=2 ./bootstrap-local.sh"
fi
fi
if [[ ! -f "$VAULT_PASS" ]]; then
local pw=""
prompt_password pw "choose a vault password (memorize it !)"
# Tight umask so the password file is never group/world-readable, even briefly.
( umask 077; printf '%s\n' "$pw" > "$VAULT_PASS" )
chmod 0400 "$VAULT_PASS"
ok "wrote $VAULT_PASS"
fi
# Encrypt whenever vault.yml is still plaintext, including the case where
# .vault-pass already existed from an earlier run.
if ! head -1 "$VAULT_YML" | grep -q '^\$ANSIBLE_VAULT'; then
info "encrypting vault.yml"
ansible-vault encrypt --vault-password-file "$VAULT_PASS" "$VAULT_YML"
ok "encrypted"
fi
info "verifying we can decrypt"
if ! ansible-vault view --vault-password-file "$VAULT_PASS" "$VAULT_YML" >/dev/null 2>&1; then
die "cannot decrypt $VAULT_YML with $VAULT_PASS — password mismatch ?"
fi
ok "vault decryption verified"
mark_done vault
phase vault DONE
}
# ============================================================================
# Phase 3 — Forgejo Secrets + Variables
# ============================================================================
phase_3_forgejo() {
section "Phase 3 — Forgejo Secrets + Variables"
_current_phase=forgejo
phase forgejo START
if skip_if_done forgejo "Forgejo provisioning"; then
phase forgejo DONE; return 0
fi
require_env FORGEJO_ADMIN_TOKEN \
"create at $FORGEJO_API_URL/-/user/settings/applications (scopes: write:admin, write:repository, write:package)"
info "checking Forgejo API reachability"
if ! curl -fsSL --max-time 10 \
-H "Authorization: token $FORGEJO_ADMIN_TOKEN" \
"$FORGEJO_API_URL/api/v1/user" >/dev/null 2>&1; then
TALAS_HINT="check FORGEJO_API_URL ($FORGEJO_API_URL) ; if no DNS yet, try FORGEJO_API_URL=http://10.0.20.105:3000"
die "Forgejo API unreachable or token invalid"
fi
ok "Forgejo API reachable, token valid"
info "checking repo $FORGEJO_OWNER/$FORGEJO_REPO exists"
if ! forgejo_api GET "/repos/$FORGEJO_OWNER/$FORGEJO_REPO" >/dev/null 2>&1; then
TALAS_HINT="set FORGEJO_OWNER + FORGEJO_REPO env vars (currently $FORGEJO_OWNER/$FORGEJO_REPO)"
die "repo $FORGEJO_OWNER/$FORGEJO_REPO not found"
fi
# Create a long-lived registry token via the API.
info "creating a registry token (write:package)"
local registry_token
registry_token=$(forgejo_api POST "/users/$FORGEJO_OWNER/tokens" \
--data "$(jq -nc --arg n "veza-deploy-registry-$(date +%s)" \
--argjson s '["write:package", "read:package"]' \
'{name: $n, scopes: $s}')" \
| jq -er '.sha1 // empty') \
|| die "could not create registry token via API ; create one manually at $FORGEJO_API_URL/-/user/settings/applications and re-run with FORGEJO_REGISTRY_TOKEN env var set"
forgejo_set_secret "$FORGEJO_OWNER" "$FORGEJO_REPO" FORGEJO_REGISTRY_TOKEN "$registry_token"
forgejo_set_secret "$FORGEJO_OWNER" "$FORGEJO_REPO" ANSIBLE_VAULT_PASSWORD "$(cat "$VAULT_PASS")"
forgejo_set_var "$FORGEJO_OWNER" "$FORGEJO_REPO" FORGEJO_REGISTRY_URL \
"$FORGEJO_API_URL/api/packages/$FORGEJO_OWNER/generic"
mark_done forgejo
phase forgejo DONE
}
# ============================================================================
# Phase 4 — R720 remote bootstrap
# ============================================================================
phase_4_r720() {
section "Phase 4 — R720 remote bootstrap (Incus profiles + runner labels)"
_current_phase=r720
phase r720 START
if skip_if_done r720 "R720 remote bootstrap"; then
phase r720 DONE; return 0
fi
require_env FORGEJO_ADMIN_TOKEN
info "fetching a runner registration token from Forgejo"
local reg_token
reg_token=$(forgejo_get_runner_token "$FORGEJO_OWNER" "$FORGEJO_REPO") \
|| die "could not fetch runner registration token"
info "got registration token (${#reg_token} chars)"
local remote_script="$SCRIPT_DIR/bootstrap-remote.sh"
local remote_lib="$SCRIPT_DIR/lib.sh"
require_file "$remote_script"
require_file "$remote_lib"
info "streaming bootstrap-remote.sh over SSH (logs to /var/log/talas-bootstrap.log on R720)"
# Concatenate lib.sh + remote script so the remote bash sees both.
{
cat "$remote_lib"
echo
cat "$remote_script"
} | ssh "$R720_USER@$R720_HOST" \
"FORGEJO_REGISTRATION_TOKEN='$reg_token' \
FORGEJO_API_URL='$FORGEJO_API_URL' \
sudo -E bash -s" \
| tee >(grep -E '>>>PHASE:' >&2) \
|| die "remote bootstrap failed ; ssh to $R720_HOST and tail /var/log/talas-bootstrap.log"
mark_done r720
phase r720 DONE
}
# ============================================================================
# Phase 5 — Edge HAProxy + Let's Encrypt
# ============================================================================
phase_5_haproxy() {
section "Phase 5 — Edge HAProxy + Let's Encrypt certs"
_current_phase=haproxy
phase haproxy START
if skip_if_done haproxy "haproxy + LE"; then
phase haproxy DONE; return 0
fi
cd "$REPO_ROOT/infra/ansible"
info "running ansible-playbook playbooks/haproxy.yml (5-10 min)"
if ! ansible-playbook -i inventory/staging.yml playbooks/haproxy.yml \
--vault-password-file .vault-pass; then
TALAS_HINT="check the ansible output above ; common issues : Incus profile missing, port 80 blocked from Internet, DNS not yet propagated"
die "ansible-playbook haproxy.yml failed"
fi
info "verifying Let's Encrypt certs landed"
local certs
certs=$(ssh "$R720_USER@$R720_HOST" "incus exec veza-haproxy -- ls /usr/local/etc/tls/haproxy/ 2>/dev/null" || true)
if [[ -z "$certs" ]]; then
warn "no certs found in /usr/local/etc/tls/haproxy/ on veza-haproxy"
warn "check /var/log/letsencrypt or run again — dehydrated retries on next playbook run"
return 1
fi
ok "certs : $(echo "$certs" | tr '\n' ' ')"
mark_done haproxy
phase haproxy DONE
}
# ============================================================================
# Phase 6 — Summary
# ============================================================================
phase_6_summary() {
section "Phase 6 — Summary"
_current_phase=summary
phase summary START
cat <<EOF >&2
${_GREEN}${_BOLD}✓ Bootstrap complete.${_RESET}
What works now :
• Forgejo registry has the deploy secrets + variable.
• forgejo-runner has the 'incus' label and Incus socket access.
• veza-haproxy edge container is up with Let's Encrypt certs.
What you can do next :
1. Trigger a manual deploy via Forgejo Actions UI :
$FORGEJO_API_URL/$FORGEJO_OWNER/$FORGEJO_REPO/actions
"Veza deploy" → "Run workflow" → env=staging.
2. Once that run is green, re-enable auto-trigger :
$SCRIPT_DIR/enable-auto-deploy.sh
3. Verify state any time :
$SCRIPT_DIR/verify-local.sh
ssh $R720_USER@$R720_HOST 'sudo bash' < $SCRIPT_DIR/verify-remote.sh
State file : $TALAS_STATE_FILE
EOF
mark_done summary
phase summary DONE
}
# ============================================================================
# main
# ============================================================================
main() {
local start=${PHASE:-1}
info "starting at phase $start"
[[ $start -le 1 ]] && phase_1_preflight
[[ $start -le 2 ]] && phase_2_vault
[[ $start -le 3 ]] && phase_3_forgejo
[[ $start -le 4 ]] && phase_4_r720
[[ $start -le 5 ]] && phase_5_haproxy
[[ $start -le 6 ]] && phase_6_summary
ok "ALL DONE"
}
main "$@"

scripts/bootstrap/bootstrap-remote.sh

@@ -0,0 +1,238 @@
#!/usr/bin/env bash
# bootstrap-remote.sh — runs ON the R720, invoked over SSH by
# bootstrap-local.sh. Idempotent ; resumable via PHASE env var.
#
# Inputs (from SSH-passed env vars) :
# FORGEJO_REGISTRATION_TOKEN short-lived token to register runner
# FORGEJO_API_URL default: https://forgejo.talas.group
#
# Each phase logs to /var/log/talas-bootstrap.log AND emits structured
# >>>PHASE:<name>:<status><<< markers on stdout for the local script.
# lib.sh is concatenated upstream by bootstrap-local before this file is
# piped to bash. When run standalone, source it manually.
if ! declare -F info >/dev/null 2>&1; then
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=lib.sh
. "$SCRIPT_DIR/lib.sh"
fi
trap_errors
# Persistent log on R720 — useful when the SSH stream gets cut off.
exec > >(tee -a /var/log/talas-bootstrap.log) 2>&1
: "${FORGEJO_API_URL:=https://forgejo.talas.group}"
# ============================================================================
# Phase R1 — Incus profiles
# ============================================================================
remote_phase_1_profiles() {
section "R1 — Incus profiles (veza-app, veza-data, veza-net)"
_current_phase=r1_profiles
phase r1_profiles START
if skip_if_done r1_profiles "incus profiles"; then
phase r1_profiles DONE; return 0
fi
for p in veza-app veza-data veza-net; do
if incus profile show "$p" >/dev/null 2>&1; then
ok "profile $p already exists"
else
incus profile create "$p"
ok "profile $p created (empty — operator may add limits later)"
fi
done
# If there's an existing veza-net network, add it to veza-net profile
# so containers using that profile pick it up by default. Otherwise
# leave the profile empty (caller passes --network on launch).
if incus network show veza-net >/dev/null 2>&1; then
if ! incus profile device show veza-net 2>/dev/null | grep -q '^eth0:'; then
incus profile device add veza-net eth0 nic \
network=veza-net \
name=eth0 >/dev/null
ok "veza-net profile : eth0 → network veza-net"
else
ok "veza-net profile : eth0 device already configured"
fi
else
warn "incus network 'veza-net' not found — containers will need explicit --network on launch"
fi
mark_done r1_profiles
phase r1_profiles DONE
}
# ============================================================================
# Phase R2 — mount Incus socket into forgejo-runner container
# ============================================================================
remote_phase_2_runner_socket() {
section "R2 — mount /var/lib/incus/unix.socket into forgejo-runner"
_current_phase=r2_runner_socket
phase r2_runner_socket START
if skip_if_done r2_runner_socket "runner socket mount"; then
phase r2_runner_socket DONE; return 0
fi
if ! incus info forgejo-runner >/dev/null 2>&1; then
die "container 'forgejo-runner' not found ; expected at the IP shown in the design"
fi
if incus config device show forgejo-runner 2>/dev/null | grep -q '^incus-socket:'; then
ok "incus-socket device already attached"
else
info "attaching unix socket as a disk device"
incus config device add forgejo-runner incus-socket disk \
source=/var/lib/incus/unix.socket \
path=/var/lib/incus/unix.socket >/dev/null
ok "device added"
fi
if [[ "$(incus config get forgejo-runner security.nesting)" != "true" ]]; then
info "enabling security.nesting"
incus config set forgejo-runner security.nesting=true
ok "nesting=true ; restart required"
info "restarting forgejo-runner container"
incus restart forgejo-runner
sleep 3
fi
info "ensuring incus client is installed inside the runner"
# `command` is a shell builtin, so it has to run through a shell inside the container.
if ! incus exec forgejo-runner -- sh -c 'command -v incus' >/dev/null 2>&1; then
incus exec forgejo-runner -- apt-get update -qq
incus exec forgejo-runner -- apt-get install -y incus-client >/dev/null
ok "incus-client installed in runner"
else
ok "incus-client already in runner"
fi
info "smoke-test : runner can incus list"
if ! incus exec forgejo-runner -- incus list >/dev/null 2>&1; then
die "runner cannot reach Incus socket — verify nesting + permissions"
fi
ok "runner has Incus access"
mark_done r2_runner_socket
phase r2_runner_socket DONE
}
# ============================================================================
# Phase R3 — runner label = 'incus'
# ============================================================================
remote_phase_3_runner_labels() {
section "R3 — forgejo-runner labelled 'incus,self-hosted'"
_current_phase=r3_runner_labels
phase r3_runner_labels START
if skip_if_done r3_runner_labels "runner labels"; then
phase r3_runner_labels DONE; return 0
fi
require_env FORGEJO_REGISTRATION_TOKEN \
"set on the SSH command-line by bootstrap-local.sh"
# Find the runner config inside the container. Path varies by install
# method ; act_runner default is /etc/forgejo-runner/.runner.
local runner_cfg
runner_cfg=$(incus exec forgejo-runner -- bash -c '
for f in /etc/forgejo-runner/.runner /var/lib/forgejo-runner/.runner /opt/forgejo-runner/.runner; do
[[ -f "$f" ]] && echo "$f" && exit 0
done
exit 1
' 2>/dev/null) || true
local labels=""
if [[ -n "$runner_cfg" ]]; then
labels=$(incus exec forgejo-runner -- jq -r '.labels[]?' "$runner_cfg" 2>/dev/null \
|| incus exec forgejo-runner -- grep -oE '"labels":\[[^]]+' "$runner_cfg" 2>/dev/null \
|| echo "")
fi
if echo "$labels" | grep -qw incus; then
ok "runner already has 'incus' label"
mark_done r3_runner_labels
phase r3_runner_labels DONE
return 0
fi
info "re-registering runner with labels incus,self-hosted"
# Stop systemd unit, wipe old registration, re-register, start.
incus exec forgejo-runner -- systemctl stop forgejo-runner.service 2>/dev/null \
|| incus exec forgejo-runner -- systemctl stop act_runner.service 2>/dev/null \
|| warn "no systemd unit to stop ; will skip"
[[ -n "$runner_cfg" ]] && incus exec forgejo-runner -- rm -f "$runner_cfg"
# Detect runner binary name
local runner_bin
runner_bin=$(incus exec forgejo-runner -- bash -c '
for b in forgejo-runner act_runner; do
command -v "$b" >/dev/null 2>&1 && echo "$b" && exit 0
done
exit 1
' 2>/dev/null) || die "no forgejo-runner / act_runner binary found in container"
incus exec forgejo-runner -- "$runner_bin" register \
--no-interactive \
--instance "$FORGEJO_API_URL" \
--token "$FORGEJO_REGISTRATION_TOKEN" \
--name "r720-incus" \
--labels "incus,self-hosted"
incus exec forgejo-runner -- systemctl start "$runner_bin.service" \
|| incus exec forgejo-runner -- systemctl start forgejo-runner.service
ok "runner re-registered with incus label"
mark_done r3_runner_labels
phase r3_runner_labels DONE
}
# ============================================================================
# Phase R4 — sanity, summary
# ============================================================================
remote_phase_4_sanity() {
section "R4 — sanity check"
_current_phase=r4_sanity
phase r4_sanity START
info "incus profiles :"
incus profile list -f csv | grep -E '^veza-' | awk -F, '{print " " $1}'
info "forgejo-runner status :"
incus exec forgejo-runner -- systemctl is-active forgejo-runner.service 2>/dev/null \
|| incus exec forgejo-runner -- systemctl is-active act_runner.service 2>/dev/null \
|| warn "no active runner service — verify manually"
info "forgejo container reachable from runner :"
if incus exec forgejo-runner -- curl -sSf -o /dev/null --max-time 5 \
"$FORGEJO_API_URL" 2>/dev/null \
|| incus exec forgejo-runner -- curl -sSf -ko /dev/null --max-time 5 \
https://10.0.20.105:3000/ 2>/dev/null \
|| incus exec forgejo-runner -- curl -sSf -o /dev/null --max-time 5 \
http://10.0.20.105:3000/ 2>/dev/null; then
ok "runner can reach Forgejo"
else
warn "runner cannot reach Forgejo — check WireGuard / DNS / firewall"
fi
mark_done r4_sanity
phase r4_sanity DONE
}
main() {
local start=${PHASE:-1}
info "remote bootstrap starting at phase $start (log: /var/log/talas-bootstrap.log)"
[[ $start -le 1 ]] && remote_phase_1_profiles
[[ $start -le 2 ]] && remote_phase_2_runner_socket
[[ $start -le 3 ]] && remote_phase_3_runner_labels
[[ $start -le 4 ]] && remote_phase_4_sanity
ok "remote bootstrap done"
}
main "$@"

scripts/bootstrap/enable-auto-deploy.sh

@@ -0,0 +1,52 @@
#!/usr/bin/env bash
# enable-auto-deploy.sh — flip the workflow_dispatch-only gate on
# .forgejo/workflows/deploy.yml back to push:main + tag:v*. Run this
# AFTER one successful manual workflow_dispatch run has proven the
# chain end-to-end.
set -Eeuo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$SCRIPT_DIR/lib.sh"
trap_errors
REPO_ROOT=$(git -C "$SCRIPT_DIR" rev-parse --show-toplevel) || die "not in a git repo"
DEPLOY_YML="$REPO_ROOT/.forgejo/workflows/deploy.yml"
require_file "$DEPLOY_YML"
if grep -qE '^[[:space:]]+push:$' "$DEPLOY_YML"; then
ok "auto-deploy already enabled"
exit 0
fi
if ! grep -qE '^[[:space:]]+# push:' "$DEPLOY_YML"; then
die "deploy.yml has neither active push: nor commented '# push:' — manual edit required"
fi
info "uncommenting push: + branches: + tags: in $DEPLOY_YML"
# Conservative single-line replacements, indentation preserved.
sed -i \
-e 's|^ # push: # GATED — uncomment after first| push:|' \
-e 's|^ # branches: \[main\] # successful workflow_dispatch run| branches: [main]|' \
-e 's|^ # tags: \['"'"'v\*'"'"'\] # see RUNBOOK_DEPLOY_BOOTSTRAP.md| tags: ['"'"'v*'"'"']|' \
"$DEPLOY_YML"
# Verify.
if ! grep -qE '^[[:space:]]+push:$' "$DEPLOY_YML"; then
die "sed didn't apply — open $DEPLOY_YML and uncomment by hand"
fi
ok "edited $DEPLOY_YML"
info "diff:"
git -C "$REPO_ROOT" --no-pager diff -- "$DEPLOY_YML" >&2
cat >&2 <<EOF
Next step :
cd $REPO_ROOT
git add .forgejo/workflows/deploy.yml
git commit --no-verify -m "feat(forgejo): re-enable auto-deploy on push:main + tag:v*"
git push origin main
The push itself triggers the first auto-deploy. Watch :
https://forgejo.talas.group/${FORGEJO_OWNER:-talas}/${FORGEJO_REPO:-veza}/actions
EOF

scripts/bootstrap/lib.sh (executable)

@@ -0,0 +1,203 @@
# shellcheck shell=bash
# Shared helpers for the bootstrap + verify scripts. Source from each
# script ; never run directly.
#
# . "$(dirname "${BASH_SOURCE[0]}")/lib.sh"
#
# Conventions :
# * All functions log to stderr ; stdout is reserved for return values.
# * Every state-mutating action is paired with a state-checking guard
# that returns 0 if the action is already applied (idempotency).
# * Failures call `die` which exits non-zero with a hint.
# * Phase markers `>>>PHASE:<name>:<status><<<` are emitted on stdout
# so a parent script (bootstrap-local.sh streaming bootstrap-remote.sh
# over SSH) can grep + parse the progression.
# ----- ANSI + structured output -----------------------------------------------
if [[ -t 2 ]]; then
_RED=$'\033[31m'; _GREEN=$'\033[32m'; _YELLOW=$'\033[33m'
_BLUE=$'\033[34m'; _BOLD=$'\033[1m'; _RESET=$'\033[0m'
else
_RED=''; _GREEN=''; _YELLOW=''; _BLUE=''; _BOLD=''; _RESET=''
fi
_now() { date -u +'%Y-%m-%dT%H:%M:%SZ'; }
_log() { printf >&2 '%s [%s] %s\n' "$(_now)" "$1" "$2"; }
info() { _log "${_BLUE}INFO${_RESET}" "$*"; }
ok() { _log "${_GREEN}OK${_RESET}" "$*"; }
warn() { _log "${_YELLOW}WARN${_RESET}" "$*"; }
err() { _log "${_RED}ERR${_RESET}" "$*"; }
section() { printf >&2 '\n%s%s===== %s =====%s\n' "$_BOLD" "$_BLUE" "$*" "$_RESET"; }
# Phase marker emitted on stdout (parsed by parent scripts).
phase() { printf '>>>PHASE:%s:%s<<<\n' "$1" "$2"; }
# Hard fail with hint.
die() {
err "$*"
if [[ -n "${TALAS_HINT:-}" ]]; then
printf >&2 '%shint:%s %s\n' "$_YELLOW" "$_RESET" "$TALAS_HINT"
fi
exit 1
}
# ----- pre-conditions ---------------------------------------------------------
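require_cmd() {
  local c missing=()
  for c in "$@"; do
    command -v "$c" >/dev/null 2>&1 || missing+=("$c")
  done
  if (( ${#missing[@]} > 0 )); then
    # Command names do not always match package names (dig, for instance,
    # comes from a dnsutils package), so the hint is only a starting point.
    TALAS_HINT="apt install ${missing[*]} (Debian/Ubuntu ; package names may differ)"
    die "missing commands: ${missing[*]}"
  fi
}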
require_file() {
[[ -f "$1" ]] || die "missing file: $1"
}
require_env() {
local var=$1 hint=${2:-}
if [[ -z "${!var:-}" ]]; then
TALAS_HINT="$hint"
die "env var \$$var is not set"
fi
}
# ----- state file (shared across bootstrap + verify) --------------------------
# State lives at /var/lib/talas/bootstrap.state on each host. One key=value
# line per phase. mark_done is idempotent ; phase_done returns 0 if marked.
: "${TALAS_STATE_DIR:=/var/lib/talas}"
: "${TALAS_STATE_FILE:=$TALAS_STATE_DIR/bootstrap.state}"
ensure_state_dir() {
if [[ ! -d "$TALAS_STATE_DIR" ]]; then
# Try without sudo first (already root in container case).
mkdir -p "$TALAS_STATE_DIR" 2>/dev/null \
|| sudo mkdir -p "$TALAS_STATE_DIR" \
|| die "cannot create $TALAS_STATE_DIR (need root or run with sudo)"
fi
[[ -f "$TALAS_STATE_FILE" ]] || (touch "$TALAS_STATE_FILE" 2>/dev/null || sudo touch "$TALAS_STATE_FILE")
}
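mark_done() {
  local key=$1 line
  ensure_state_dir
  line="$key=DONE $(_now)"
  if ! grep -q "^$key=" "$TALAS_STATE_FILE" 2>/dev/null; then
    # Send the line to each writer separately: a tee that cannot open the file
    # still drains stdin, which would leave the sudo fallback nothing to write.
    { printf '%s\n' "$line" >>"$TALAS_STATE_FILE"; } 2>/dev/null \
      || printf '%s\n' "$line" | sudo tee -a "$TALAS_STATE_FILE" >/dev/null
  fi
}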
phase_done() {
local key=$1
[[ -f "$TALAS_STATE_FILE" ]] || return 1
grep -q "^$key=DONE" "$TALAS_STATE_FILE" 2>/dev/null
}
skip_if_done() {
local key=$1 label=$2
if phase_done "$key"; then
ok "$label — already done (skipped)"
return 0
fi
return 1
}
# ----- error trap -------------------------------------------------------------
_trap_err() {
local rc=$? line=$1
err "FAILED at $0:$line (rc=$rc)"
if [[ -n "${TALAS_HINT:-}" ]]; then
printf >&2 '%shint:%s %s\n' "$_YELLOW" "$_RESET" "$TALAS_HINT"
fi
phase "$(_current_phase)" "FAIL"
exit "$rc"
}
_current_phase=""
_current_phase() { echo "${_current_phase:-unknown}"; }
# Call once at script start.
trap_errors() {
set -Eeuo pipefail
trap '_trap_err $LINENO' ERR
}
# ----- prompts (interactive only) ---------------------------------------------
prompt_password() {
local var=$1 question=${2:-"value (input hidden):"}
local v=""
while [[ -z "$v" ]]; do
printf >&2 '%s ' "$question"
IFS= read -rs v
printf >&2 '\n'
[[ -z "$v" ]] && warn "empty — try again"
done
# printf -v assigns by name without re-parsing the value as shell code.
printf -v "$var" '%s' "$v"
}
prompt_value() {
local var=$1 question=${2:-"value:"} default=${3:-}
local v=""
if [[ -n "$default" ]]; then
printf >&2 '%s [%s] ' "$question" "$default"
else
printf >&2 '%s ' "$question"
fi
IFS= read -r v
[[ -z "$v" && -n "$default" ]] && v="$default"
# printf -v assigns by name without re-parsing the value as shell code.
printf -v "$var" '%s' "$v"
}
# ----- Forgejo API helper -----------------------------------------------------
# Requires: $FORGEJO_API_URL, $FORGEJO_ADMIN_TOKEN
forgejo_api() {
local method=$1 path=$2; shift 2
curl -fsSL --max-time 30 \
-X "$method" \
-H "Authorization: token ${FORGEJO_ADMIN_TOKEN:?FORGEJO_ADMIN_TOKEN unset}" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
"${FORGEJO_API_URL:?FORGEJO_API_URL unset}/api/v1$path" "$@"
}
forgejo_set_secret() {
local owner=$1 repo=$2 name=$3 value=$4
local body
body=$(jq -nc --arg v "$value" '{data: $v}')
if forgejo_api PUT "/repos/$owner/$repo/actions/secrets/$name" --data "$body" >/dev/null 2>&1; then
ok "secret $name set"
else
die "failed to set secret $name (token scope ? repo path ?)"
fi
}
forgejo_set_var() {
local owner=$1 repo=$2 name=$3 value=$4
local body
body=$(jq -nc --arg n "$name" --arg v "$value" '{name: $n, value: $v}')
# Try update (PUT) ; on any failure (typically 404), fall back to create (POST).
# The create endpoint takes the variable name in the path as well.
if forgejo_api PUT "/repos/$owner/$repo/actions/variables/$name" --data "$body" >/dev/null 2>&1; then
ok "variable $name updated"
elif forgejo_api POST "/repos/$owner/$repo/actions/variables/$name" --data "$body" >/dev/null 2>&1; then
ok "variable $name created"
else
die "failed to set variable $name"
fi
}
forgejo_get_runner_token() {
local owner=$1 repo=$2
forgejo_api GET "/repos/$owner/$repo/actions/runners/registration-token" \
| jq -er '.token // empty' \
|| die "failed to fetch runner registration token (admin scope ?)"
}
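
lib.sh notes that a parent script can grep and parse the `>>>PHASE:<name>:<status><<<` markers emitted on stdout. A minimal sketch of what such a parser might look like follows. `parse_phase_markers` and `fake_remote` are hypothetical names, and the START and OK statuses are assumed for illustration (only FAIL appears in lib.sh itself).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Same marker format as phase() in lib.sh.
phase() { printf '>>>PHASE:%s:%s<<<\n' "$1" "$2"; }

# Hypothetical parent-side parser: read a child's stdout line by line, print
# "<name> <status>" for each marker and forward every other line to stderr.
parse_phase_markers() {
  local line re='^>>>PHASE:([^:]+):([^<]+)<<<$'
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
      printf '%s\n' "$line" >&2
    fi
  done
}

# Stand-in for the remote script's stdout (the real stream arrives over SSH).
fake_remote() {
  phase profiles START
  echo "some ordinary output"
  phase profiles OK
  phase runner FAIL
}

fake_remote | parse_phase_markers
```

Keeping the markers on stdout and all logging on stderr is what makes this split possible: the parent can consume one stream for progress and leave the other for the operator.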

scripts/bootstrap/verify-local.sh Executable file
@ -0,0 +1,131 @@
#!/usr/bin/env bash
# verify-local.sh — read-only checks of local state (vault, secrets, ssh).
# Exit 0 if everything passes ; exit 1 otherwise, after printing the failure count.
set -uo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=lib.sh
. "$SCRIPT_DIR/lib.sh"
[[ -f "$SCRIPT_DIR/.env" ]] && . "$SCRIPT_DIR/.env"
: "${R720_HOST:=10.0.20.150}"
: "${R720_USER:=ansible}"
: "${FORGEJO_API_URL:=https://forgejo.talas.group}"
: "${FORGEJO_OWNER:=talas}"
: "${FORGEJO_REPO:=veza}"
REPO_ROOT=$(git -C "$SCRIPT_DIR" rev-parse --show-toplevel 2>/dev/null) || {
err "not in a git repo"
exit 1
}
VAULT_YML="$REPO_ROOT/infra/ansible/group_vars/all/vault.yml"
VAULT_PASS="$REPO_ROOT/infra/ansible/.vault-pass"
declare -i PASS=0 FAIL=0
check() {
local name=$1 cmd=$2
if eval "$cmd" >/dev/null 2>&1; then
ok "$name"
PASS+=1
else
err "$name"
FAIL+=1
fi
}
check_with_hint() {
local name=$1 cmd=$2 hint=$3
if eval "$cmd" >/dev/null 2>&1; then
ok "$name"
PASS+=1
else
err "$name"
printf >&2 ' %shint:%s %s\n' "$_YELLOW" "$_RESET" "$hint"
FAIL+=1
fi
}
section "Local prerequisites"
check "git available" "command -v git"
check "ansible available" "command -v ansible"
check "ansible-vault available" "command -v ansible-vault"
check "curl available" "command -v curl"
check "jq available" "command -v jq"
check "ssh available" "command -v ssh"
check "openssl available" "command -v openssl"
check "dig available" "command -v dig"
section "Repo state"
check "in repo root" "[[ -f $REPO_ROOT/CLAUDE.md ]]"
check "infra/ansible/ exists" "[[ -d $REPO_ROOT/infra/ansible ]]"
check ".forgejo/workflows/deploy.yml" "[[ -f $REPO_ROOT/.forgejo/workflows/deploy.yml ]]"
check_with_hint "deploy.yml gated (no auto-trigger)" \
"! grep -E '^[[:space:]]+push:$' $REPO_ROOT/.forgejo/workflows/deploy.yml" \
"if you want auto-deploy, run scripts/bootstrap/enable-auto-deploy.sh"
section "Vault"
check "vault.yml.example exists" "[[ -f $REPO_ROOT/infra/ansible/group_vars/all/vault.yml.example ]]"
check "vault.yml exists" "[[ -f $VAULT_YML ]]"
check_with_hint "vault.yml is encrypted" \
"head -1 $VAULT_YML 2>/dev/null | grep -q '^\\\$ANSIBLE_VAULT'" \
"PHASE=2 ./bootstrap-local.sh"
check_with_hint ".vault-pass exists" \
"[[ -f $VAULT_PASS ]]" \
"PHASE=2 ./bootstrap-local.sh"
check_with_hint ".vault-pass mode 0400" \
"[[ \$(stat -c '%a' $VAULT_PASS 2>/dev/null) == '400' ]]" \
"chmod 0400 $VAULT_PASS"
check_with_hint "can decrypt vault.yml" \
"ansible-vault view --vault-password-file $VAULT_PASS $VAULT_YML" \
"vault password mismatch — re-encrypt with: ansible-vault rekey --new-vault-password-file $VAULT_PASS $VAULT_YML"
# Decrypt first: a failed decrypt must not count as "no placeholders".
check_with_hint "no <TODO> placeholders left" \
"out=\$(ansible-vault view --vault-password-file $VAULT_PASS $VAULT_YML 2>/dev/null) && ! grep -q '<TODO' <<<\"\$out\"" \
"ansible-vault edit --vault-password-file $VAULT_PASS $VAULT_YML"
section "SSH to R720 ($R720_USER@$R720_HOST)"
check_with_hint "ssh handshake" \
"ssh -o ConnectTimeout=5 -o BatchMode=yes $R720_USER@$R720_HOST /bin/true" \
"ensure your key is in $R720_USER@$R720_HOST:~/.ssh/authorized_keys"
check "incus reachable on R720" \
"ssh -o ConnectTimeout=5 -o BatchMode=yes $R720_USER@$R720_HOST 'incus list >/dev/null 2>&1'"
section "DNS public domains"
for d in veza.fr www.veza.fr staging.veza.fr talas.fr www.talas.fr forgejo.talas.group; do
check_with_hint "$d resolves" \
"dig +short +time=2 +tries=1 $d @1.1.1.1 | grep -qE '^[0-9]+\\.'" \
"set the A record at your registrar to point to your R720 public IP"
done
if [[ -n "${FORGEJO_ADMIN_TOKEN:-}" ]]; then
section "Forgejo API + secrets/vars"
check_with_hint "Forgejo API reachable" \
"curl -fsSL --max-time 10 -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/user" \
"set FORGEJO_API_URL ; if no DNS yet, FORGEJO_API_URL=http://10.0.20.105:3000"
check_with_hint "repo $FORGEJO_OWNER/$FORGEJO_REPO exists" \
"curl -fsSL -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO" \
"set FORGEJO_OWNER + FORGEJO_REPO env vars"
# Secret values cannot be read back ; this assumes the API exposes only their
# names, through the list endpoint, so look each name up there.
check_with_hint "secret FORGEJO_REGISTRY_TOKEN exists" \
"curl -fsSL --max-time 10 -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/secrets | jq -e 'any(.[]; .name == \"FORGEJO_REGISTRY_TOKEN\")'" \
"PHASE=3 ./bootstrap-local.sh"
check_with_hint "secret ANSIBLE_VAULT_PASSWORD exists" \
"curl -fsSL --max-time 10 -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/secrets | jq -e 'any(.[]; .name == \"ANSIBLE_VAULT_PASSWORD\")'" \
"PHASE=3 ./bootstrap-local.sh"
check_with_hint "variable FORGEJO_REGISTRY_URL exists" \
"curl -fsSL -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/variables/FORGEJO_REGISTRY_URL" \
"PHASE=3 ./bootstrap-local.sh"
else
warn "FORGEJO_ADMIN_TOKEN not set — skipping API checks. Set it to run those."
fi
section "Result"
if (( FAIL == 0 )); then
ok "$PASS / $((PASS + FAIL)) checks passed"
exit 0
else
err "$FAIL FAIL out of $((PASS + FAIL)) ($PASS passed)"
exit 1
fi
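
Both verify scripts hand each check to `eval` as a single string, which is why paths and nested quotes need careful escaping. One possible alternative, shown only as a sketch, takes the command as separate arguments and runs it with `"$@"`. `check_argv` is a hypothetical name and is not part of the scripts; the logging is reduced to plain `printf` so the block stands alone.

```shell
#!/usr/bin/env bash
set -uo pipefail

declare -i PASS=0 FAIL=0

# Hypothetical eval-free variant: the label comes first, the command and its
# arguments follow as separate words, so paths with spaces need no escaping.
check_argv() {
  local name=$1; shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK   %s\n' "$name" >&2
    PASS+=1
  else
    printf 'ERR  %s\n' "$name" >&2
    FAIL+=1
  fi
}

dir=$(mktemp -d)
touch "$dir/file with spaces"

check_argv "file exists"   test -f "$dir/file with spaces"
check_argv "file missing"  test -f "$dir/nope"
# Pipelines and negation still need a shell, so wrap them in bash -c.
check_argv "negated check" bash -c '! grep -q x /dev/null'

rm -rf "$dir"
echo "$PASS passed, $FAIL failed"
```

The trade-off is that checks built from pipelines or `!` have to be wrapped in `bash -c`, so the string form used in the scripts remains the shorter option for those.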

scripts/bootstrap/verify-remote.sh
@ -0,0 +1,122 @@
#!/usr/bin/env bash
# verify-remote.sh — read-only checks of R720 state (Incus profiles,
# runner labels, container reachability, certs). Run on the R720 itself
# (locally or via `ssh r720 verify-remote.sh`).
#
# Exit 0 if everything passes ; exit 1 otherwise, after printing the failure count.
set -uo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=lib.sh
. "$SCRIPT_DIR/lib.sh"
: "${FORGEJO_API_URL:=https://forgejo.talas.group}"
declare -i PASS=0 FAIL=0
check() {
local name=$1 cmd=$2
if eval "$cmd" >/dev/null 2>&1; then
ok "$name"
PASS+=1
else
err "$name"
FAIL+=1
fi
}
check_with_hint() {
local name=$1 cmd=$2 hint=$3
if eval "$cmd" >/dev/null 2>&1; then
ok "$name"
PASS+=1
else
err "$name"
printf >&2 ' %shint:%s %s\n' "$_YELLOW" "$_RESET" "$hint"
FAIL+=1
fi
}
section "R720 prerequisites"
check "incus available" "command -v incus"
check "zfs available" "command -v zfs"
check "incus list works" "incus list"
section "Incus profiles"
for p in veza-app veza-data veza-net; do
check_with_hint "profile $p exists" \
"incus profile show $p" \
"run scripts/bootstrap/bootstrap-remote.sh as root"
done
section "Forgejo container"
check "container 'forgejo' exists" "incus info forgejo"
check "container 'forgejo' RUNNING" \
"incus list forgejo -f csv -c s 2>/dev/null | grep -q RUNNING"
check_with_hint "Forgejo HTTP responds on :3000" \
"curl -ksSf -o /dev/null --max-time 5 http://10.0.20.105:3000/ || curl -ksSf -o /dev/null --max-time 5 https://10.0.20.105:3000/" \
"incus exec forgejo -- systemctl status forgejo"
section "Forgejo runner"
check "container 'forgejo-runner' exists" "incus info forgejo-runner"
check "container 'forgejo-runner' RUNNING" \
"incus list forgejo-runner -f csv -c s 2>/dev/null | grep -q RUNNING"
check_with_hint "incus-socket device attached" \
"incus config device show forgejo-runner | grep -q '^incus-socket:'" \
"PHASE=2 sudo bash scripts/bootstrap/bootstrap-remote.sh"
check_with_hint "security.nesting=true" \
"[[ \$(incus config get forgejo-runner security.nesting) == true ]]" \
"incus config set forgejo-runner security.nesting=true && incus restart forgejo-runner"
check_with_hint "incus-client installed in runner" \
"incus exec forgejo-runner -- command -v incus" \
"incus exec forgejo-runner -- apt install -y incus-client"
check_with_hint "runner can incus list (socket reachable)" \
"incus exec forgejo-runner -- incus list" \
"verify the unix-socket disk device + nesting"
check_with_hint "runner config has 'incus' label" \
"incus exec forgejo-runner -- bash -c 'for f in /etc/forgejo-runner/.runner /var/lib/forgejo-runner/.runner /opt/forgejo-runner/.runner ; do [[ -f \$f ]] && grep -q incus \$f && exit 0 ; done ; exit 1'" \
"PHASE=3 sudo bash scripts/bootstrap/bootstrap-remote.sh"
check_with_hint "runner systemd unit active" \
"incus exec forgejo-runner -- bash -c 'systemctl is-active forgejo-runner.service 2>/dev/null || systemctl is-active act_runner.service'" \
"incus exec forgejo-runner -- journalctl -u forgejo-runner -n 50"
section "Edge HAProxy (only after running playbooks/haproxy.yml)"
if incus info veza-haproxy >/dev/null 2>&1; then
check "container 'veza-haproxy' RUNNING" \
"incus list veza-haproxy -f csv -c s | grep -q RUNNING"
check_with_hint "haproxy systemd unit active" \
"incus exec veza-haproxy -- systemctl is-active haproxy" \
"incus exec veza-haproxy -- journalctl -u haproxy -n 50"
check_with_hint "haproxy.cfg present" \
"incus exec veza-haproxy -- test -f /etc/haproxy/haproxy.cfg" \
"ansible-playbook -i inventory/staging.yml playbooks/haproxy.yml"
check_with_hint "haproxy.cfg passes self-validation" \
"incus exec veza-haproxy -- haproxy -f /etc/haproxy/haproxy.cfg -c -q" \
"config syntax error — re-run ansible-playbook to re-render"
check_with_hint "Let's Encrypt cert dir has at least 1 .pem" \
"incus exec veza-haproxy -- bash -c 'ls /usr/local/etc/tls/haproxy/*.pem 2>/dev/null | wc -l | grep -q -E \"^[1-9]\"'" \
"rerun ansible-playbook ; verify port 80 reachable from Internet for HTTP-01"
else
warn "container 'veza-haproxy' does not exist yet — run ansible-playbook playbooks/haproxy.yml"
fi
section "ZFS state (snapshots tolerated)"
check "rpool exists" \
"zpool list rpool"
section "State file"
if [[ -f "$TALAS_STATE_FILE" ]]; then
info "phases recorded :"
sed 's/^/ /' "$TALAS_STATE_FILE"
else
warn "no state file at $TALAS_STATE_FILE — bootstrap-remote.sh hasn't run yet"
fi
section "Result"
if (( FAIL == 0 )); then
ok "$PASS / $((PASS + FAIL)) checks passed"
exit 0
else
err "$FAIL FAIL out of $((PASS + FAIL)) ($PASS passed)"
exit 1
fi
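
The state file printed in the "State file" section is written by `mark_done` and read by `phase_done` in lib.sh. The sketch below exercises that contract against a throwaway directory so it can be tried without root. The two functions are trimmed-down copies written for illustration (no sudo fallback, no logging), and the phase names are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Point the state file at a temporary directory instead of /var/lib/talas.
TALAS_STATE_DIR=$(mktemp -d)
TALAS_STATE_FILE="$TALAS_STATE_DIR/bootstrap.state"
trap 'rm -rf "$TALAS_STATE_DIR"' EXIT

# Append "<key>=DONE <timestamp>" unless a line for that key already exists.
mark_done() {
  local key=$1
  grep -q "^$key=" "$TALAS_STATE_FILE" 2>/dev/null \
    || printf '%s=DONE %s\n' "$key" "$(date -u +'%Y-%m-%dT%H:%M:%SZ')" >>"$TALAS_STATE_FILE"
}

# Return 0 only if the key has been recorded as DONE.
phase_done() {
  grep -q "^$1=DONE" "$TALAS_STATE_FILE" 2>/dev/null
}

mark_done profiles
mark_done profiles   # second call is a no-op
mark_done runner

phase_done profiles && echo "profiles: done"
phase_done haproxy  || echo "haproxy: pending"
echo "lines: $(wc -l <"$TALAS_STATE_FILE" | tr -d ' ')"
```

Because the guard checks for the key before appending, re-running a bootstrap phase leaves one line per phase and keeps the original completion timestamp.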