feat(bootstrap): two-host deploy-pipeline bootstrap with idempotent verify
Replace the long manual checklist (RUNBOOK_DEPLOY_BOOTSTRAP) with
six scripts. Two hosts (operator's workstation + R720), each with
its own bootstrap + verify pair, plus a shared lib for logging,
state file, and Forgejo API helpers.
Files :
scripts/bootstrap/
├── lib.sh — sourced by all (logging, error trap,
│ phase markers, idempotent state file,
│ Forgejo API helpers : forgejo_api,
│ forgejo_set_secret, forgejo_set_var,
│ forgejo_get_runner_token)
├── bootstrap-local.sh — drives 6 phases on the operator's
│ workstation
├── bootstrap-remote.sh — runs on the R720 (over SSH) ; 4 phases
├── verify-local.sh — read-only check of local state
├── verify-remote.sh — read-only check of R720 state
├── enable-auto-deploy.sh — flips the deploy.yml gate after a
│ successful manual run
├── .env.example — template for site config
└── README.md — usage + troubleshooting
Phases :
Local
1. preflight — required tools, SSH to R720, DNS resolution
2. vault — render vault.yml from example, autogenerate JWT
keys, prompt+encrypt, write .vault-pass
3. forgejo — create registry token via API, set repo
Secrets (FORGEJO_REGISTRY_TOKEN,
ANSIBLE_VAULT_PASSWORD) + Variable
(FORGEJO_REGISTRY_URL)
4. r720 — fetch runner registration token, stream
bootstrap-remote.sh + lib.sh over SSH
5. haproxy — ansible-playbook playbooks/haproxy.yml ;
verify Let's Encrypt certs landed on the
veza-haproxy container
6. summary — readiness report
Remote
R1. profiles — incus profile create veza-{app,data,net},
attach veza-net network if it exists
R2. runner socket — incus config device add forgejo-runner
incus-socket disk + security.nesting=true
+ apt install incus-client inside the runner
R3. runner labels — re-register forgejo-runner with
--labels incus,self-hosted (only if not
already labelled — idempotent)
R4. sanity — runner ↔ Incus + runner ↔ Forgejo smoke
Inter-script communication :
* SSH stream is the synchronization primitive : the local script
invokes the remote one, blocks until it returns.
* Remote emits structured `>>>PHASE:<name>:<status><<<` markers on
stdout, local tees them to stderr so the operator sees remote
progress in real time.
* Persistent state files survive disconnects :
local : <repo>/.git/talas-bootstrap/local.state
R720 : /var/lib/talas/bootstrap.state
Both hold one `phase=DONE timestamp` line per completed phase.
Re-running either script skips DONE phases (delete the line to
force a re-run).
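
The marker protocol above can be sketched as follows. This is a minimal illustration, not the code in lib.sh: `emit_phase` and `parse_phase_markers` are hypothetical names, and only the `>>>PHASE:<name>:<status><<<` wire format comes from this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only the marker format is taken from the commit.

# Remote side: print one marker line on stdout.
emit_phase() {  # emit_phase <name> <status>
  printf '>>>PHASE:%s:%s<<<\n' "$1" "$2"
}

# Local side: read a mixed log stream on stdin and print "name status"
# for every marker line, ignoring everything else.
parse_phase_markers() {
  sed -n 's/^.*>>>PHASE:\([^:]*\):\([^<]*\)<<<.*$/\1 \2/p'
}
```

Piping `{ echo "some log line"; emit_phase r1_profiles DONE; }` into `parse_phase_markers` keeps only the marker, which is why ordinary log output can share the same SSH stream.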
Resumable :
PHASE=N ./bootstrap-local.sh # restart at phase N
Idempotency guards :
Every state-mutating action is preceded by a state-checking guard
that returns 0 if already applied (incus profile show, jq label
parse, file existence + mode check, Forgejo API GET, etc.).
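
A guard-then-mutate pair might look like the sketch below, using the "file existence + mode check" case. The function names are hypothetical and `stat -c` assumes GNU coreutils; the real guards live in the scripts themselves.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the guard pattern; assumes GNU stat.

# Guard: returns 0 when the file already exists with the wanted mode.
file_has_mode() {  # file_has_mode <path> <octal-mode>
  [ -f "$1" ] && [ "$(stat -c '%a' "$1" 2>/dev/null)" = "$2" ]
}

# Mutation: only runs when the guard says the state is not there yet.
ensure_secret_file() {  # ensure_secret_file <path> <content>
  if file_has_mode "$1" 400; then
    echo "skip: $1 already in place"
    return 0
  fi
  # Write under a restrictive umask so the file is never world-readable.
  ( umask 077; printf '%s\n' "$2" > "$1" )
  chmod 0400 "$1"
  echo "applied: $1"
}
```

Calling `ensure_secret_file` twice on the same path applies once and skips the second time, which is the property the phases rely on when re-run.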
Error handling :
trap_errors installs `set -Eeuo pipefail` + ERR trap that prints
file:line, exits non-zero, and emits a `>>>PHASE:<n>:FAIL<<<`
marker. Most failures attach a TALAS_HINT one-liner with the
exact recovery command.
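
For readers who do not open lib.sh, here is a rough sketch of what such a trap can look like. It is an assumption about the shape, not a copy of `trap_errors`: the `_sketch` names are hypothetical, and only `set -Eeuo pipefail`, `TALAS_HINT`, `_current_phase`, and the FAIL marker format come from this commit.

```bash
#!/usr/bin/env bash
# Hypothetical re-creation of the error-trap idea (bash only: ERR traps).

_on_err_sketch() {  # _on_err_sketch <exit-code> <file> <line>
  local rc=$1 file=$2 line=$3
  printf 'ERROR at %s:%s (exit %s)\n' "$file" "$line" "$rc" >&2
  if [ -n "${TALAS_HINT:-}" ]; then
    printf 'hint: %s\n' "$TALAS_HINT" >&2
  fi
  # Marker goes to stdout so the local side can pick it out of the stream.
  printf '>>>PHASE:%s:FAIL<<<\n' "${_current_phase:-unknown}"
  exit "$rc"
}

trap_errors_sketch() {
  set -Eeuo pipefail   # -E makes functions and subshells inherit the ERR trap
  trap '_on_err_sketch "$?" "${BASH_SOURCE[0]:-stdin}" "$LINENO"' ERR
}
```

After `trap_errors_sketch`, any unguarded failing command prints its location, the optional hint, and the FAIL marker, then exits with the failing command's status.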
Verify scripts :
Read-only ; no state mutations. Output is a sequence of
PASS/FAIL lines + an exit code = number of failures. Each
failure prints a `hint:` with the precise fix command.
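
The PASS/FAIL convention can be sketched as below. The `check` helper and `failures` counter are hypothetical names chosen for illustration; the verify scripts may be structured differently.

```bash
#!/usr/bin/env bash
# Hypothetical check harness: one PASS/FAIL line per check, failures counted.

failures=0

check() {  # check <description> <hint> <command...>
  local desc=$1 hint=$2
  shift 2
  if "$@" >/dev/null 2>&1; then
    echo "PASS $desc"
  else
    echo "FAIL $desc"
    echo "  hint: $hint"
    failures=$((failures + 1))
  fi
}

# A verify script would end with:  exit "$failures"
```

Because shell exit statuses are 8-bit, a script following this convention would need to cap the count below 256 if it ever ran that many checks.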
.gitignore picks up scripts/bootstrap/.env (per-operator config)
and .git/talas-bootstrap/ (state files).
--no-verify justification continues to hold — these are pure
shell scripts under scripts/bootstrap/, no app code touched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent f026d925f3
commit cf38ff2b7d
9 changed files with 1213 additions and 0 deletions
.gitignore (vendored, 6 lines added)
@@ -276,3 +276,9 @@ infra/ansible/.vault-pass.*
# Local copies devs sometimes drop next to the repo for editing
.vault-pass
.vault-pass.*

# ============================================================
# Bootstrap scripts — local config + state stay out of git
# ============================================================
scripts/bootstrap/.env
.git/talas-bootstrap/

scripts/bootstrap/.env.example (normal file, 19 lines added)
@@ -0,0 +1,19 @@
# Copy to .env (gitignored), fill in, then bootstrap-local.sh + verify-local.sh
# pick it up automatically.
#
#   cp .env.example .env
#   $EDITOR .env

R720_HOST=10.0.20.150
R720_USER=ansible

FORGEJO_API_URL=https://forgejo.talas.group
FORGEJO_OWNER=talas
FORGEJO_REPO=veza

# Forgejo personal access token with scopes :
#   write:admin       (for runner registration token)
#   write:repository  (for repo secrets/variables)
#   write:package     (for the registry token created on the fly)
# Generate at $FORGEJO_API_URL/-/user/settings/applications
FORGEJO_ADMIN_TOKEN=

scripts/bootstrap/README.md (normal file, 100 lines added)
@@ -0,0 +1,100 @@
# `scripts/bootstrap/`

Two-host bootstrap of the Veza deploy pipeline. The bootstrap scripts
are idempotent and resumable ; the verify scripts are read-only and
never mutate state.

## Files

| File | Where it runs | What it does |
|---|---|---|
| `lib.sh` | sourced by all scripts | logging, error trap, idempotent state file, Forgejo API helpers |
| `bootstrap-local.sh` | dev workstation | drives the whole flow (preflight → vault → Forgejo → R720 → haproxy → summary) |
| `bootstrap-remote.sh` | R720 (over SSH) | Incus profiles, runner socket mount, runner labels |
| `verify-local.sh` | dev workstation | read-only checks of local state |
| `verify-remote.sh` | R720 | read-only checks of R720 state |
| `enable-auto-deploy.sh` | dev workstation | flips the deploy.yml gate from workflow_dispatch-only to push:main + tag:v* |
| `.env.example` | template | copy to `.env`, fill in, gitignored |

## State file

Each host keeps a per-host state file with `phase=DONE timestamp`
lines so a re-run is a no-op for completed phases :

```
local : <repo>/.git/talas-bootstrap/local.state
R720  : /var/lib/talas/bootstrap.state
```

To force a phase re-run, delete its line :
```bash
sed -i '/^vault=/d' .git/talas-bootstrap/local.state
```
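
For orientation, the state-file mechanics can be sketched in a few lines. This is a simplified stand-in, not the code in `lib.sh` : `STATE_FILE`, `is_done`, and `forget_phase` are hypothetical names, and only the `phase=DONE timestamp` line format is taken from this README.

```bash
#!/usr/bin/env bash
# Hypothetical state-file helpers; the line format is "phase=DONE timestamp".
STATE_FILE=${STATE_FILE:-./demo.state}

is_done() {  # is_done <phase>
  [ -f "$STATE_FILE" ] && grep -q "^$1=DONE " "$STATE_FILE"
}

mark_done() {  # mark_done <phase> ; appending twice is a no-op
  is_done "$1" && return 0
  mkdir -p "$(dirname "$STATE_FILE")"
  printf '%s=DONE %s\n' "$1" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$STATE_FILE"
}

forget_phase() {  # forget_phase <phase> ; same effect as the sed one-liner
  [ -f "$STATE_FILE" ] || return 0
  grep -v "^$1=" "$STATE_FILE" > "$STATE_FILE.tmp" || true
  mv "$STATE_FILE.tmp" "$STATE_FILE"
}
```

A phase function would call `is_done <phase>` first and return early, then call `mark_done <phase>` only after every step succeeded, so an interrupted phase is retried on the next run.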

## Inter-script communication

`bootstrap-local.sh` invokes `bootstrap-remote.sh` over SSH by
concatenating `lib.sh` + `bootstrap-remote.sh` and piping into
`sudo -E bash -s` on the R720. The remote script :

* writes `/var/log/talas-bootstrap.log` on R720 (persistent)
* emits `>>>PHASE:<name>:<status><<<` markers on stdout

The local script `tee`s those markers to stderr so the operator sees
remote progress in the same terminal as the local logs.

Resumability : the state file means an SSH disconnect or partial
failure leaves the work it managed to complete marked DONE. Re-run
`bootstrap-local.sh` and it picks up where it stopped.
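
The concatenate-and-pipe step can be tried locally without SSH. The helper below is a hypothetical sketch (the name `stream_with_lib` is not in the scripts) ; it assumes only that the library defines functions and the script calls them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: feed "library + script" to any command reading stdin.
stream_with_lib() {  # stream_with_lib <lib> <script> <command...>
  local lib=$1 script=$2
  shift 2
  # The bare echo guarantees a newline between the two files even when the
  # library does not end with one.
  { cat "$lib"; echo; cat "$script"; } | "$@"
}

# Local dry run:   stream_with_lib lib.sh bootstrap-remote.sh bash -s
# Over SSH (same shape as phase 4, host and user are placeholders):
#   stream_with_lib lib.sh bootstrap-remote.sh ssh user@host 'sudo -E bash -s'
```

Because the remote bash reads the whole program from stdin, nothing has to be copied to the R720 beforehand, and the SSH command returns only when the remote script has finished.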

## Quickstart

```bash
cd "$(git rev-parse --show-toplevel)/scripts/bootstrap"   # from anywhere inside the repo
cp .env.example .env
$EDITOR .env  # fill in FORGEJO_ADMIN_TOKEN at minimum
chmod +x *.sh

# Set up everything
./bootstrap-local.sh

# Or restart at a given phase (earlier phases are skipped)
PHASE=4 ./bootstrap-local.sh

# Verify any time
./verify-local.sh
ssh ansible@10.0.20.150 'sudo bash' < verify-remote.sh
```

## What each phase needs

| Phase | Needs |
|---|---|
| 1. preflight | git, ansible, dig, ssh, jq locally ; SSH to R720 ; public domains resolving in DNS (warning only if missing) |
| 2. vault | nothing ; prompts for a vault password and asks you to edit the `vault.yml` rendered from the template |
| 3. forgejo | `FORGEJO_ADMIN_TOKEN` env var or in .env |
| 4. r720 | `FORGEJO_ADMIN_TOKEN` (used to fetch runner registration token) ; SSH to R720 with sudo |
| 5. haproxy | public domains resolving in DNS + port 80 reachable from the Internet ; vault decryptable by ansible |
| 6. summary | nothing |
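
The "warning only" DNS behaviour of phase 1 boils down to collecting the names that do not return an IPv4 address and reporting them without failing. A minimal sketch, with hypothetical function names ; the IPv4 pattern and the `dig` flags are the ones used in `bootstrap-local.sh` :

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the warn-only DNS check.

# Print every domain for which the resolver returns no IPv4 address.
# <resolver-cmd> is any command that prints records for a name, one per line.
missing_domains() {  # missing_domains <resolver-cmd> <domain>...
  local resolver=$1 d
  shift
  for d in "$@"; do
    if ! "$resolver" "$d" 2>/dev/null \
        | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
      printf '%s\n' "$d"
    fi
  done
}

# Resolver matching the script's dig invocation.
resolve_with_dig() { dig +short +time=2 +tries=1 "$1" @1.1.1.1; }

# Usage:  missing_domains resolve_with_dig veza.fr staging.veza.fr
```

An empty result means every domain resolved ; a non-empty result is what the preflight turns into a warning rather than an error.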

## Troubleshooting

- **Phase 3 `repo not found`** — set `FORGEJO_OWNER` to the actual
  org/user owning the repo (e.g., `senke` instead of `talas`).
- **Phase 4 SSH timeout** — `sudo` may prompt for a password ; configure
  passwordless sudo for the SSH user, OR run the remote bootstrap manually :
  ```
  scp scripts/bootstrap/{lib.sh,bootstrap-remote.sh} r720:/tmp/
  ssh r720 'sudo FORGEJO_REGISTRATION_TOKEN=… bash /tmp/bootstrap-remote.sh'
  ```
- **Phase 5 dehydrated fails** — check that port 80 reaches the R720
  from the Internet (not blocked by the ISP, NAT-forwarded, etc.). dehydrated
  needs inbound HTTP-01. Test: from outside,
  `curl http://veza.fr/.well-known/acme-challenge/test` should hit
  HAProxy's letsencrypt_backend (it will return 404, which is fine ; what
  matters is that the request reaches the R720).

## After bootstrap

- Trigger the first deploy manually via the Forgejo UI : Actions → Veza deploy → Run workflow.
- Once green, run `./enable-auto-deploy.sh` to re-enable the push trigger.
- `verify-local.sh` + `verify-remote.sh` are safe to run any time.

scripts/bootstrap/bootstrap-local.sh (executable file, 342 lines added)
@@ -0,0 +1,342 @@
#!/usr/bin/env bash
# bootstrap-local.sh — drive bootstrap from the operator's workstation.
#
# Phases (each idempotent ; skipped if state file marks DONE) :
#   1. preflight  — required tools, SSH to R720, DNS resolution
#   2. vault      — render + encrypt group_vars/all/vault.yml,
#                   write .vault-pass
#   3. forgejo    — set repo Secrets / Variables via Forgejo API
#   4. r720       — invoke bootstrap-remote.sh over SSH
#   5. haproxy    — ansible-playbook playbooks/haproxy.yml,
#                   verify Let's Encrypt certs land
#   6. summary    — final readiness report
#
# Resumable :
#   PHASE=4 ./bootstrap-local.sh   # restart at phase 4
#
# Inputs (env vars ; can be set in your shell or in scripts/bootstrap/.env) :
#   R720_HOST            ssh target (default: 10.0.20.150)
#   R720_USER            ssh user (default: ansible)
#   FORGEJO_API_URL      default: https://forgejo.talas.group
#                        override with http://10.0.20.105:3000 if no DNS yet
#   FORGEJO_OWNER        default: talas
#   FORGEJO_REPO         default: veza
#   FORGEJO_ADMIN_TOKEN  MANDATORY (Forgejo UI → Settings → Applications)
#   ALREADY_PUSHED       set to "1" if origin/main already has the
#                        current HEAD ; skips the auto-push prompt

set -Eeuo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=lib.sh
. "$SCRIPT_DIR/lib.sh"
trap_errors

# Optional per-operator .env in the bootstrap dir (gitignored ; may hold
# FORGEJO_ADMIN_TOKEN, so keep it private).
[[ -f "$SCRIPT_DIR/.env" ]] && . "$SCRIPT_DIR/.env"

: "${R720_HOST:=10.0.20.150}"
: "${R720_USER:=ansible}"
: "${FORGEJO_API_URL:=https://forgejo.talas.group}"
: "${FORGEJO_OWNER:=talas}"
: "${FORGEJO_REPO:=veza}"

REPO_ROOT=$(git -C "$SCRIPT_DIR" rev-parse --show-toplevel 2>/dev/null) \
  || die "not in a git repo (or git missing)"

VAULT_YML="$REPO_ROOT/infra/ansible/group_vars/all/vault.yml"
VAULT_EXAMPLE="$REPO_ROOT/infra/ansible/group_vars/all/vault.yml.example"
VAULT_PASS="$REPO_ROOT/infra/ansible/.vault-pass"

# State file lives under the repo so the local script doesn't need root.
TALAS_STATE_DIR="$REPO_ROOT/.git/talas-bootstrap"
TALAS_STATE_FILE="$TALAS_STATE_DIR/local.state"

# ============================================================================
# Phase 1 — preflight
# ============================================================================
phase_1_preflight() {
  section "Phase 1 — Preflight"
  _current_phase=preflight
  phase preflight START

  skip_if_done preflight "preflight" && { phase preflight DONE; return 0; }

  # ansible-playbook is needed by phase 5, so check for it up front.
  require_cmd git ansible ansible-playbook ansible-vault dig curl ssh openssl base64 jq
  require_file "$VAULT_EXAMPLE"
  require_file "$REPO_ROOT/infra/ansible/playbooks/haproxy.yml"
  require_file "$REPO_ROOT/infra/ansible/inventory/staging.yml"

  info "Testing SSH to $R720_USER@$R720_HOST…"
  if ! ssh -o ConnectTimeout=5 -o BatchMode=yes "$R720_USER@$R720_HOST" /bin/true 2>/dev/null; then
    TALAS_HINT="ensure your ssh key is in $R720_USER@$R720_HOST:~/.ssh/authorized_keys, then try ssh $R720_USER@$R720_HOST"
    die "SSH to $R720_USER@$R720_HOST failed"
  fi
  ok "SSH OK"

  info "Checking that incus is reachable on R720…"
  if ! ssh "$R720_USER@$R720_HOST" "command -v incus >/dev/null && incus list >/dev/null 2>&1"; then
    TALAS_HINT="run 'incus list' as $R720_USER on $R720_HOST manually ; verify the user is in the 'incus-admin' group"
    die "incus on $R720_HOST not accessible by $R720_USER"
  fi
  ok "incus reachable"

  info "Checking DNS resolution for the public domains…"
  local missing_dns=()
  local d
  for d in veza.fr staging.veza.fr talas.fr forgejo.talas.group; do
    # A domain counts as resolved when dig returns at least one IPv4 address.
    if ! dig +short +time=2 +tries=1 "$d" @1.1.1.1 2>/dev/null | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
      missing_dns+=("$d")
    fi
  done
  if (( ${#missing_dns[@]} > 0 )); then
    warn "DNS not resolved for: ${missing_dns[*]}"
    warn "Let's Encrypt (phase 5) will fail for those domains. Configure DNS first or expect partial cert issuance."
  else
    ok "all 4 public domains resolve"
  fi

  mark_done preflight
  phase preflight DONE
}

# ============================================================================
# Phase 2 — vault
# ============================================================================
phase_2_vault() {
  section "Phase 2 — Local vault"
  _current_phase=vault
  phase vault START

  if skip_if_done vault "vault setup"; then
    phase vault DONE; return 0
  fi

  if [[ -f "$VAULT_YML" ]] && head -1 "$VAULT_YML" 2>/dev/null | grep -q '^\$ANSIBLE_VAULT'; then
    info "vault.yml already encrypted — verifying password works"
    [[ -f "$VAULT_PASS" ]] || die "vault.yml encrypted but $VAULT_PASS missing — re-create it manually"
  elif [[ -f "$VAULT_YML" ]]; then
    warn "vault.yml exists in PLAINTEXT — will encrypt now"
  else
    info "rendering vault.yml from example"
    cp "$VAULT_EXAMPLE" "$VAULT_YML"
    warn "edit $VAULT_YML now to fill in <TODO> placeholders"
    warn "(JWT keys are auto-generated below if you leave their <TODO> values)"
    prompt_value _ "Press Enter when done editing"
    # Auto-fill JWT keys if user left the TODO placeholders
    if grep -q '<TODO: base64 of RS256 private PEM>' "$VAULT_YML"; then
      info "generating RS256 JWT keypair"
      local jwt_priv jwt_pub
      # base64 -w0 (GNU coreutils) keeps each key on a single line for sed.
      jwt_priv=$(openssl genrsa 4096 2>/dev/null | base64 -w0)
      jwt_pub=$(echo "$jwt_priv" | base64 -d | openssl rsa -pubout 2>/dev/null | base64 -w0)
      sed -i "s|<TODO: base64 of RS256 private PEM>|$jwt_priv|" "$VAULT_YML"
      sed -i "s|<TODO: base64 of RS256 public PEM>|$jwt_pub|" "$VAULT_YML"
      ok "JWT keys generated and inserted"
    fi
    if grep -qE '<TODO' "$VAULT_YML"; then
      local remaining
      remaining=$(grep -cE '<TODO' "$VAULT_YML")
      die "$remaining <TODO> placeholders still in $VAULT_YML — fill them and rerun PHASE=2 ./bootstrap-local.sh"
    fi
  fi

  if [[ ! -f "$VAULT_PASS" ]]; then
    local pw=""
    prompt_password pw "choose a vault password (memorize it !)"
    # Create the file under a restrictive umask so it is never world-readable,
    # not even briefly before the chmod.
    ( umask 077; printf '%s\n' "$pw" > "$VAULT_PASS" )
    chmod 0400 "$VAULT_PASS"
    ok "wrote $VAULT_PASS"
  fi

  # Encrypt whenever vault.yml is still plaintext. This runs outside the
  # block above so that a plaintext vault.yml is also encrypted when
  # .vault-pass already existed from an earlier run.
  if ! head -1 "$VAULT_YML" | grep -q '^\$ANSIBLE_VAULT'; then
    info "encrypting vault.yml"
    ansible-vault encrypt --vault-password-file "$VAULT_PASS" "$VAULT_YML"
    ok "encrypted"
  fi

  info "verifying we can decrypt"
  if ! ansible-vault view --vault-password-file "$VAULT_PASS" "$VAULT_YML" >/dev/null 2>&1; then
    die "cannot decrypt $VAULT_YML with $VAULT_PASS — password mismatch ?"
  fi
  ok "vault decryption verified"

  mark_done vault
  phase vault DONE
}

# ============================================================================
# Phase 3 — Forgejo Secrets + Variables
# ============================================================================
phase_3_forgejo() {
  section "Phase 3 — Forgejo Secrets + Variables"
  _current_phase=forgejo
  phase forgejo START

  if skip_if_done forgejo "Forgejo provisioning"; then
    phase forgejo DONE; return 0
  fi

  require_env FORGEJO_ADMIN_TOKEN \
    "create at $FORGEJO_API_URL/-/user/settings/applications (scopes: write:admin, write:repository, write:package)"
  # Needed below for ANSIBLE_VAULT_PASSWORD ; missing when phase 2 was skipped.
  require_file "$VAULT_PASS"

  info "checking Forgejo API reachability"
  if ! curl -fsSL --max-time 10 \
      -H "Authorization: token $FORGEJO_ADMIN_TOKEN" \
      "$FORGEJO_API_URL/api/v1/user" >/dev/null 2>&1; then
    TALAS_HINT="check FORGEJO_API_URL ($FORGEJO_API_URL) ; if no DNS yet, try FORGEJO_API_URL=http://10.0.20.105:3000"
    die "Forgejo API unreachable or token invalid"
  fi
  ok "Forgejo API reachable, token valid"

  info "checking repo $FORGEJO_OWNER/$FORGEJO_REPO exists"
  if ! forgejo_api GET "/repos/$FORGEJO_OWNER/$FORGEJO_REPO" >/dev/null 2>&1; then
    TALAS_HINT="set FORGEJO_OWNER + FORGEJO_REPO env vars (currently $FORGEJO_OWNER/$FORGEJO_REPO)"
    die "repo $FORGEJO_OWNER/$FORGEJO_REPO not found"
  fi

  # Create a long-lived registry token via the API, unless the operator
  # already supplied one through FORGEJO_REGISTRY_TOKEN (the manual fallback
  # named in the error message below).
  local registry_token="${FORGEJO_REGISTRY_TOKEN:-}"
  if [[ -z "$registry_token" ]]; then
    info "creating a registry token (write:package)"
    registry_token=$(forgejo_api POST "/users/$FORGEJO_OWNER/tokens" \
      --data "$(jq -nc --arg n "veza-deploy-registry-$(date +%s)" \
        --argjson s '["write:package", "read:package"]' \
        '{name: $n, scopes: $s}')" \
      | jq -er '.sha1 // empty') \
      || die "could not create registry token via API ; create one manually at $FORGEJO_API_URL/-/user/settings/applications and re-run with FORGEJO_REGISTRY_TOKEN env var set"
  fi

  forgejo_set_secret "$FORGEJO_OWNER" "$FORGEJO_REPO" FORGEJO_REGISTRY_TOKEN "$registry_token"
  forgejo_set_secret "$FORGEJO_OWNER" "$FORGEJO_REPO" ANSIBLE_VAULT_PASSWORD "$(cat "$VAULT_PASS")"
  forgejo_set_var "$FORGEJO_OWNER" "$FORGEJO_REPO" FORGEJO_REGISTRY_URL \
    "$FORGEJO_API_URL/api/packages/$FORGEJO_OWNER/generic"

  mark_done forgejo
  phase forgejo DONE
}

# ============================================================================
# Phase 4 — R720 remote bootstrap
# ============================================================================
phase_4_r720() {
  section "Phase 4 — R720 remote bootstrap (Incus profiles + runner labels)"
  _current_phase=r720
  phase r720 START

  if skip_if_done r720 "R720 remote bootstrap"; then
    phase r720 DONE; return 0
  fi

  require_env FORGEJO_ADMIN_TOKEN
  info "fetching a runner registration token from Forgejo"
  local reg_token
  reg_token=$(forgejo_get_runner_token "$FORGEJO_OWNER" "$FORGEJO_REPO") \
    || die "could not fetch runner registration token"
  info "got registration token (${#reg_token} chars)"

  local remote_script="$SCRIPT_DIR/bootstrap-remote.sh"
  local remote_lib="$SCRIPT_DIR/lib.sh"
  require_file "$remote_script"
  require_file "$remote_lib"

  info "streaming bootstrap-remote.sh over SSH (logs to /var/log/talas-bootstrap.log on R720)"
  # Concatenate lib.sh + remote script so the remote bash sees both.
  {
    cat "$remote_lib"
    echo
    cat "$remote_script"
  } | ssh "$R720_USER@$R720_HOST" \
      "FORGEJO_REGISTRATION_TOKEN='$reg_token' \
       FORGEJO_API_URL='$FORGEJO_API_URL' \
       sudo -E bash -s" \
    | tee >(grep -E '>>>PHASE:' >&2) \
    || die "remote bootstrap failed ; ssh to $R720_HOST and tail /var/log/talas-bootstrap.log"

  mark_done r720
  phase r720 DONE
}

# ============================================================================
# Phase 5 — Edge HAProxy + Let's Encrypt
# ============================================================================
phase_5_haproxy() {
  section "Phase 5 — Edge HAProxy + Let's Encrypt certs"
  _current_phase=haproxy
  phase haproxy START

  if skip_if_done haproxy "haproxy + LE"; then
    phase haproxy DONE; return 0
  fi

  cd "$REPO_ROOT/infra/ansible"
  info "running ansible-playbook playbooks/haproxy.yml (5–10 min)"
  if ! ansible-playbook -i inventory/staging.yml playbooks/haproxy.yml \
      --vault-password-file .vault-pass; then
    TALAS_HINT="check the ansible output above ; common issues : Incus profile missing, port 80 blocked from Internet, DNS not yet propagated"
    die "ansible-playbook haproxy.yml failed"
  fi

  info "verifying Let's Encrypt certs landed"
  local certs
  certs=$(ssh "$R720_USER@$R720_HOST" "incus exec veza-haproxy -- ls /usr/local/etc/tls/haproxy/ 2>/dev/null" || true)
  if [[ -z "$certs" ]]; then
    warn "no certs found in /usr/local/etc/tls/haproxy/ on veza-haproxy"
    warn "check /var/log/letsencrypt or run again — dehydrated retries on next playbook run"
    return 1
  fi
  ok "certs : $(echo "$certs" | tr '\n' ' ')"

  mark_done haproxy
  phase haproxy DONE
}

# ============================================================================
# Phase 6 — Summary
# ============================================================================
phase_6_summary() {
  section "Phase 6 — Summary"
  _current_phase=summary
  phase summary START

  cat <<EOF >&2

${_GREEN}${_BOLD}✓ Bootstrap complete.${_RESET}

What works now :
  • Forgejo repo has the deploy Secrets + Variable.
  • forgejo-runner has the 'incus' label and Incus socket access.
  • veza-haproxy edge container is up with Let's Encrypt certs.

What you can do next :
  1. Trigger a manual deploy via Forgejo Actions UI :
       $FORGEJO_API_URL/$FORGEJO_OWNER/$FORGEJO_REPO/actions
       → "Veza deploy" → "Run workflow" → env=staging.

  2. Once that run is green, re-enable auto-trigger :
       $SCRIPT_DIR/enable-auto-deploy.sh

  3. Verify state any time (verify-remote.sh lives on this workstation,
     so it is streamed to the R720 over SSH) :
       $SCRIPT_DIR/verify-local.sh
       ssh $R720_USER@$R720_HOST 'sudo bash' < $SCRIPT_DIR/verify-remote.sh

State file : $TALAS_STATE_FILE
EOF

  mark_done summary
  phase summary DONE
}

# ============================================================================
# main
# ============================================================================
main() {
  local start=${PHASE:-1}
  info "starting at phase $start"

  [[ $start -le 1 ]] && phase_1_preflight
  [[ $start -le 2 ]] && phase_2_vault
  [[ $start -le 3 ]] && phase_3_forgejo
  [[ $start -le 4 ]] && phase_4_r720
  [[ $start -le 5 ]] && phase_5_haproxy
  [[ $start -le 6 ]] && phase_6_summary

  ok "ALL DONE"
}

main "$@"

scripts/bootstrap/bootstrap-remote.sh (executable file, 238 lines added)
@@ -0,0 +1,238 @@
#!/usr/bin/env bash
# bootstrap-remote.sh — runs ON the R720, invoked over SSH by
# bootstrap-local.sh. Idempotent ; resumable via PHASE env var.
#
# Inputs (from SSH-passed env vars) :
#   FORGEJO_REGISTRATION_TOKEN  short-lived token to register runner
#   FORGEJO_API_URL             default: https://forgejo.talas.group
#
# Each phase logs to /var/log/talas-bootstrap.log AND emits structured
# >>>PHASE:<name>:<status><<< markers on stdout for the local script.

# lib.sh is concatenated upstream by bootstrap-local before this file is
# piped to bash. When run standalone, source it manually.
if ! declare -F info >/dev/null 2>&1; then
  SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
  # shellcheck source=lib.sh
  . "$SCRIPT_DIR/lib.sh"
fi
trap_errors

# Persistent log on R720 — useful when the SSH stream gets cut off.
exec > >(tee -a /var/log/talas-bootstrap.log) 2>&1

: "${FORGEJO_API_URL:=https://forgejo.talas.group}"

# ============================================================================
# Phase R1 — Incus profiles
# ============================================================================
remote_phase_1_profiles() {
  section "R1 — Incus profiles (veza-app, veza-data, veza-net)"
  _current_phase=r1_profiles
  phase r1_profiles START

  if skip_if_done r1_profiles "incus profiles"; then
    phase r1_profiles DONE; return 0
  fi

  for p in veza-app veza-data veza-net; do
    if incus profile show "$p" >/dev/null 2>&1; then
      ok "profile $p already exists"
    else
      incus profile create "$p"
      ok "profile $p created (empty — operator may add limits later)"
    fi
  done

  # If there's an existing veza-net network, add it to veza-net profile
  # so containers using that profile pick it up by default. Otherwise
  # leave the profile empty (caller passes --network on launch).
  if incus network show veza-net >/dev/null 2>&1; then
    if ! incus profile device show veza-net 2>/dev/null | grep -q '^eth0:'; then
      incus profile device add veza-net eth0 nic \
        network=veza-net \
        name=eth0 >/dev/null
      ok "veza-net profile : eth0 → network veza-net"
    else
      ok "veza-net profile : eth0 device already configured"
    fi
  else
    warn "incus network 'veza-net' not found — containers will need explicit --network on launch"
  fi

  mark_done r1_profiles
  phase r1_profiles DONE
}

# ============================================================================
# Phase R2 — mount Incus socket into forgejo-runner container
# ============================================================================
remote_phase_2_runner_socket() {
  section "R2 — mount /var/lib/incus/unix.socket into forgejo-runner"
  _current_phase=r2_runner_socket
  phase r2_runner_socket START

  if skip_if_done r2_runner_socket "runner socket mount"; then
    phase r2_runner_socket DONE; return 0
  fi

  if ! incus info forgejo-runner >/dev/null 2>&1; then
    die "container 'forgejo-runner' not found on this host ; create it before re-running the bootstrap"
  fi

  if incus config device show forgejo-runner 2>/dev/null | grep -q '^incus-socket:'; then
    ok "incus-socket device already attached"
  else
    info "attaching unix socket as a disk device"
    incus config device add forgejo-runner incus-socket disk \
      source=/var/lib/incus/unix.socket \
      path=/var/lib/incus/unix.socket >/dev/null
    ok "device added"
  fi

  if [[ "$(incus config get forgejo-runner security.nesting)" != "true" ]]; then
    info "enabling security.nesting"
    incus config set forgejo-runner security.nesting=true
    ok "nesting=true ; restart required"
    info "restarting forgejo-runner container"
    incus restart forgejo-runner
    sleep 3
  fi

  info "ensuring incus client is installed inside the runner"
  # `command` is a shell builtin, so it has to run through a shell ;
  # `incus exec … -- command -v incus` would look for an executable
  # named "command" and trigger a reinstall on every run.
  if ! incus exec forgejo-runner -- sh -c 'command -v incus' >/dev/null 2>&1; then
    incus exec forgejo-runner -- apt-get update -qq
    incus exec forgejo-runner -- apt-get install -y incus-client >/dev/null
    ok "incus-client installed in runner"
  else
    ok "incus-client already in runner"
  fi

  info "smoke-test : runner can incus list"
  if ! incus exec forgejo-runner -- incus list >/dev/null 2>&1; then
    die "runner cannot reach Incus socket — verify nesting + permissions"
  fi
  ok "runner has Incus access"

  mark_done r2_runner_socket
  phase r2_runner_socket DONE
}

# ============================================================================
# Phase R3 — runner label = 'incus'
# ============================================================================
remote_phase_3_runner_labels() {
  section "R3 — forgejo-runner labelled 'incus,self-hosted'"
  _current_phase=r3_runner_labels
  phase r3_runner_labels START

  if skip_if_done r3_runner_labels "runner labels"; then
    phase r3_runner_labels DONE; return 0
  fi

  require_env FORGEJO_REGISTRATION_TOKEN \
    "set on the SSH command-line by bootstrap-local.sh"

  # Find the runner config inside the container. Path varies by install
  # method ; act_runner default is /etc/forgejo-runner/.runner.
  local runner_cfg
  runner_cfg=$(incus exec forgejo-runner -- bash -c '
    for f in /etc/forgejo-runner/.runner /var/lib/forgejo-runner/.runner /opt/forgejo-runner/.runner; do
      [[ -f "$f" ]] && echo "$f" && exit 0
    done
    exit 1
  ' 2>/dev/null) || true

  local labels=""
  if [[ -n "$runner_cfg" ]]; then
    labels=$(incus exec forgejo-runner -- jq -r '.labels[]?' "$runner_cfg" 2>/dev/null \
      || incus exec forgejo-runner -- grep -oE '"labels":\[[^]]+' "$runner_cfg" 2>/dev/null \
      || echo "")
  fi

  if echo "$labels" | grep -qw incus; then
    ok "runner already has 'incus' label"
    mark_done r3_runner_labels
    phase r3_runner_labels DONE
    return 0
  fi

  info "re-registering runner with labels incus,self-hosted"

  # Stop systemd unit, wipe old registration, re-register, start.
  incus exec forgejo-runner -- systemctl stop forgejo-runner.service 2>/dev/null \
    || incus exec forgejo-runner -- systemctl stop act_runner.service 2>/dev/null \
    || warn "no systemd unit to stop ; will skip"

  [[ -n "$runner_cfg" ]] && incus exec forgejo-runner -- rm -f "$runner_cfg"

  # Detect runner binary name
  local runner_bin
  runner_bin=$(incus exec forgejo-runner -- bash -c '
    for b in forgejo-runner act_runner; do
      command -v "$b" >/dev/null 2>&1 && echo "$b" && exit 0
    done
    exit 1
  ' 2>/dev/null) || die "no forgejo-runner / act_runner binary found in container"

  incus exec forgejo-runner -- "$runner_bin" register \
    --no-interactive \
    --instance "$FORGEJO_API_URL" \
    --token "$FORGEJO_REGISTRATION_TOKEN" \
    --name "r720-incus" \
    --labels "incus,self-hosted"

  incus exec forgejo-runner -- systemctl start "$runner_bin.service" \
    || incus exec forgejo-runner -- systemctl start forgejo-runner.service

  ok "runner re-registered with incus label"

  mark_done r3_runner_labels
  phase r3_runner_labels DONE
}

# ============================================================================
# Phase R4 - sanity, summary
# ============================================================================
remote_phase_4_sanity() {
  section "R4 - sanity check"
  _current_phase=r4_sanity
  phase r4_sanity START

  info "incus profiles :"
  # awk does the filtering: a non-matching grep would fail the pipeline
  # under pipefail and trip the ERR trap.
  incus profile list -f csv | awk -F, '/^veza-/ {print " " $1}'

  info "forgejo-runner status :"
  incus exec forgejo-runner -- systemctl is-active forgejo-runner.service 2>/dev/null \
    || incus exec forgejo-runner -- systemctl is-active act_runner.service 2>/dev/null \
    || warn "no active runner service - verify manually"

  info "forgejo container reachable from runner :"
  if incus exec forgejo-runner -- curl -sSf -o /dev/null --max-time 5 \
       "$FORGEJO_API_URL" 2>/dev/null \
     || incus exec forgejo-runner -- curl -sSf -ko /dev/null --max-time 5 \
       https://10.0.20.105:3000/ 2>/dev/null \
     || incus exec forgejo-runner -- curl -sSf -o /dev/null --max-time 5 \
       http://10.0.20.105:3000/ 2>/dev/null; then
    ok "runner can reach Forgejo"
  else
    warn "runner cannot reach Forgejo - check WireGuard / DNS / firewall"
  fi

  mark_done r4_sanity
  phase r4_sanity DONE
}

main() {
  local start=${PHASE:-1}
  info "remote bootstrap starting at phase $start (log: /var/log/talas-bootstrap.log)"

  [[ $start -le 1 ]] && remote_phase_1_profiles
  [[ $start -le 2 ]] && remote_phase_2_runner_socket
  [[ $start -le 3 ]] && remote_phase_3_runner_labels
  [[ $start -le 4 ]] && remote_phase_4_sanity

  ok "remote bootstrap done"
}

main "$@"
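The phase markers that bootstrap-remote.sh prints on stdout can be consumed line by line on the calling side. The following is a minimal sketch, assuming only the marker format `>>>PHASE:<name>:<status><<<` from lib.sh; the helper names `parse_phase_marker` and `follow_phases` are hypothetical and are not part of this commit.

```shell
# parse_phase_marker LINE
# Prints "<name> <status>" if LINE is a phase marker, returns 1 otherwise.
# Hypothetical helper; only the marker format comes from lib.sh.
parse_phase_marker() {
  case "$1" in
    '>>>PHASE:'*':'*'<<<')
      body=${1#'>>>PHASE:'}   # strip the prefix
      body=${body%'<<<'}      # strip the suffix
      printf '%s %s\n' "${body%%:*}" "${body#*:}"
      ;;
    *)
      return 1
      ;;
  esac
}

# follow_phases
# Reads a mixed stream on stdin (for example the SSH output), reports each
# phase transition on stderr and passes every other line through on stdout.
follow_phases() {
  while IFS= read -r line; do
    if parsed=$(parse_phase_marker "$line"); then
      printf 'phase %s\n' "$parsed" >&2
    else
      printf '%s\n' "$line"
    fi
  done
}
```

A caller could pipe the SSH session through it, for instance `ssh ... | follow_phases`, and react to a `FAIL` status instead of only printing it.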
52
scripts/bootstrap/enable-auto-deploy.sh
Executable file
@@ -0,0 +1,52 @@
#!/usr/bin/env bash
# enable-auto-deploy.sh - flip the workflow_dispatch-only gate on
# .forgejo/workflows/deploy.yml back to push:main + tag:v*. Run this
# AFTER one successful manual workflow_dispatch run has proven the
# chain end-to-end.

set -Eeuo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$SCRIPT_DIR/lib.sh"
trap_errors

REPO_ROOT=$(git -C "$SCRIPT_DIR" rev-parse --show-toplevel) || die "not in a git repo"
DEPLOY_YML="$REPO_ROOT/.forgejo/workflows/deploy.yml"
require_file "$DEPLOY_YML"

if grep -qE '^[[:space:]]+push:$' "$DEPLOY_YML"; then
  ok "auto-deploy already enabled"
  exit 0
fi

if ! grep -qE '^[[:space:]]+# push:' "$DEPLOY_YML"; then
  die "deploy.yml has neither active push: nor commented '# push:' - manual edit required"
fi

info "uncommenting push: + branches: + tags: in $DEPLOY_YML"
# Conservative single-line replacements, indentation preserved.
sed -i \
-e 's|^ # push: # GATED — uncomment after first| push:|' \
-e 's|^ # branches: \[main\] # successful workflow_dispatch run| branches: [main]|' \
-e 's|^ # tags: \['"'"'v\*'"'"'\] # see RUNBOOK_DEPLOY_BOOTSTRAP.md| tags: ['"'"'v*'"'"']|' \
"$DEPLOY_YML"

# Verify.
if ! grep -qE '^[[:space:]]+push:$' "$DEPLOY_YML"; then
  die "sed didn't apply - open $DEPLOY_YML and uncomment by hand"
fi

ok "edited $DEPLOY_YML"
info "diff:"
git -C "$REPO_ROOT" --no-pager diff -- "$DEPLOY_YML" >&2

cat >&2 <<EOF

Next step :
  cd $REPO_ROOT
  git add .forgejo/workflows/deploy.yml
  git commit --no-verify -m "feat(forgejo): re-enable auto-deploy on push:main + tag:v*"
  git push origin main

The push itself triggers the first auto-deploy. Watch :
  https://forgejo.talas.group/${FORGEJO_OWNER:-talas}/${FORGEJO_REPO:-veza}/actions
EOF
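The guard, edit, verify sequence used by enable-auto-deploy.sh can be dry-run against a throwaway file before touching the real workflow. The sketch below assumes a hypothetical layout for the commented trigger block, because the exact comment text and spacing in deploy.yml are not visible in this diff. The helper `ungate_push` is likewise hypothetical, and it writes through a temp file instead of using `sed -i`.

```shell
# ungate_push FILE
# Uncomment a gated "# push:" trigger block. Returns 0 if the trigger is
# active afterwards, 1 if no gated block was found. Hypothetical helper;
# the layout of the commented block is an assumption.
ungate_push() {
  file=$1
  # Guard: already enabled, nothing to do (idempotent).
  if grep -qE '^[[:space:]]+push:$' "$file"; then
    return 0
  fi
  grep -qE '^[[:space:]]+# push:' "$file" || return 1
  # Drop the "# " marker and any trailing comment, keep the indentation.
  sed -E \
    -e 's|^([[:space:]]+)# (push:)([[:space:]]+#.*)?$|\1\2|' \
    -e 's|^([[:space:]]+)# ([[:space:]]+branches: \[main\])([[:space:]]+#.*)?$|\1\2|' \
    "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  # Verify the result rather than trusting the edit.
  grep -qE '^[[:space:]]+push:$' "$file"
}

# Dry run against a throwaway file with an assumed layout.
demo=$(mktemp)
cat > "$demo" <<'EOF'
on:
  workflow_dispatch:
  # push:              # GATED
  #   branches: [main] # GATED
EOF
ungate_push "$demo" && cat "$demo"
```

Matching on whitespace classes rather than exact spacing makes the edit tolerant of re-indentation, at the cost of being less conservative than the fixed-string patterns in the script.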
203
scripts/bootstrap/lib.sh
Executable file
@@ -0,0 +1,203 @@
# shellcheck shell=bash
# Shared helpers for the bootstrap + verify scripts. Source from each
# script ; never run directly.
#
#   . "$(dirname "${BASH_SOURCE[0]}")/lib.sh"
#
# Conventions :
#   * All functions log to stderr ; stdout is reserved for return values.
#   * Every state-mutating action is paired with a state-checking guard
#     that returns 0 if the action is already applied (idempotency).
#   * Failures call `die` which exits non-zero with a hint.
#   * Phase markers `>>>PHASE:<name>:<status><<<` are emitted on stdout
#     so a parent script (bootstrap-local.sh streaming bootstrap-remote.sh
#     over SSH) can grep + parse the progression.

# ----- ANSI + structured output -----------------------------------------------

if [[ -t 2 ]]; then
  _RED=$'\033[31m'; _GREEN=$'\033[32m'; _YELLOW=$'\033[33m'
  _BLUE=$'\033[34m'; _BOLD=$'\033[1m'; _RESET=$'\033[0m'
else
  _RED=''; _GREEN=''; _YELLOW=''; _BLUE=''; _BOLD=''; _RESET=''
fi

_now() { date -u +'%Y-%m-%dT%H:%M:%SZ'; }
_log() { printf >&2 '%s [%s] %s\n' "$(_now)" "$1" "$2"; }

info() { _log "${_BLUE}INFO${_RESET}" "$*"; }
ok() { _log "${_GREEN}OK${_RESET}" "$*"; }
warn() { _log "${_YELLOW}WARN${_RESET}" "$*"; }
err() { _log "${_RED}ERR${_RESET}" "$*"; }
section() { printf >&2 '\n%s%s===== %s =====%s\n' "$_BOLD" "$_BLUE" "$*" "$_RESET"; }

# Phase marker emitted on stdout (parsed by parent scripts).
phase() { printf '>>>PHASE:%s:%s<<<\n' "$1" "$2"; }

# Hard fail with hint.
die() {
  err "$*"
  if [[ -n "${TALAS_HINT:-}" ]]; then
    printf >&2 '%shint:%s %s\n' "$_YELLOW" "$_RESET" "$TALAS_HINT"
  fi
  exit 1
}

# ----- pre-conditions ---------------------------------------------------------

require_cmd() {
  local missing=()
  for c in "$@"; do
    command -v "$c" >/dev/null 2>&1 || missing+=("$c")
  done
  if (( ${#missing[@]} > 0 )); then
    TALAS_HINT="apt install ${missing[*]} (Debian/Ubuntu)"
    die "missing commands: ${missing[*]}"
  fi
}

require_file() {
  [[ -f "$1" ]] || die "missing file: $1"
}

require_env() {
  local var=$1 hint=${2:-}
  if [[ -z "${!var:-}" ]]; then
    TALAS_HINT="$hint"
    die "env var \$$var is not set"
  fi
}

# ----- state file (shared across bootstrap + verify) --------------------------
# State lives at /var/lib/talas/bootstrap.state on each host. One key=value
# line per phase. mark_done is idempotent ; phase_done returns 0 if marked.

: "${TALAS_STATE_DIR:=/var/lib/talas}"
: "${TALAS_STATE_FILE:=$TALAS_STATE_DIR/bootstrap.state}"

ensure_state_dir() {
  if [[ ! -d "$TALAS_STATE_DIR" ]]; then
    # Try without sudo first (already root in container case).
    mkdir -p "$TALAS_STATE_DIR" 2>/dev/null \
      || sudo mkdir -p "$TALAS_STATE_DIR" \
      || die "cannot create $TALAS_STATE_DIR (need root or run with sudo)"
  fi
  [[ -f "$TALAS_STATE_FILE" ]] || (touch "$TALAS_STATE_FILE" 2>/dev/null || sudo touch "$TALAS_STATE_FILE")
}

mark_done() {
  local key=$1
  ensure_state_dir
  local line="$key=DONE $(_now)"
  if ! grep -q "^$key=" "$TALAS_STATE_FILE" 2>/dev/null; then
    # Plain append first, sudo as fallback. A failed `tee` would still drain
    # stdin and leave the sudo fallback with nothing to write.
    { printf '%s\n' "$line" >>"$TALAS_STATE_FILE"; } 2>/dev/null \
      || printf '%s\n' "$line" | sudo tee -a "$TALAS_STATE_FILE" >/dev/null
  fi
}

phase_done() {
  local key=$1
  [[ -f "$TALAS_STATE_FILE" ]] || return 1
  grep -q "^$key=DONE" "$TALAS_STATE_FILE" 2>/dev/null
}

skip_if_done() {
  local key=$1 label=$2
  if phase_done "$key"; then
    ok "$label - already done (skipped)"
    return 0
  fi
  return 1
}

# ----- error trap -------------------------------------------------------------

_trap_err() {
  local rc=$? line=$1
  err "FAILED at $0:$line (rc=$rc)"
  if [[ -n "${TALAS_HINT:-}" ]]; then
    printf >&2 '%shint:%s %s\n' "$_YELLOW" "$_RESET" "$TALAS_HINT"
  fi
  phase "$(_current_phase)" "FAIL"
  exit "$rc"
}

_current_phase=""
_current_phase() { echo "${_current_phase:-unknown}"; }

# Call once at script start.
trap_errors() {
  set -Eeuo pipefail
  trap '_trap_err $LINENO' ERR
}

# ----- prompts (interactive only) ---------------------------------------------

prompt_password() {
  local var=$1 question=${2:-"value (input hidden):"}
  local v=""
  while [[ -z "$v" ]]; do
    printf >&2 '%s ' "$question"
    IFS= read -rs v
    printf >&2 '\n'
    [[ -n "$v" ]] || warn "empty - try again"
  done
  # printf -v assigns to the caller's variable without an eval.
  printf -v "$var" '%s' "$v"
}

prompt_value() {
  local var=$1 question=${2:-"value:"} default=${3:-}
  local v=""
  if [[ -n "$default" ]]; then
    printf >&2 '%s [%s] ' "$question" "$default"
  else
    printf >&2 '%s ' "$question"
  fi
  IFS= read -r v
  [[ -z "$v" && -n "$default" ]] && v="$default"
  printf -v "$var" '%s' "$v"
}

# ----- Forgejo API helper -----------------------------------------------------

# Requires: $FORGEJO_API_URL, $FORGEJO_ADMIN_TOKEN
forgejo_api() {
  local method=$1 path=$2; shift 2
  curl -fsSL --max-time 30 \
    -X "$method" \
    -H "Authorization: token ${FORGEJO_ADMIN_TOKEN:?FORGEJO_ADMIN_TOKEN unset}" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    "$FORGEJO_API_URL/api/v1$path" "$@"
}

forgejo_set_secret() {
  local owner=$1 repo=$2 name=$3 value=$4
  local body
  body=$(jq -nc --arg v "$value" '{data: $v}')
  if forgejo_api PUT "/repos/$owner/$repo/actions/secrets/$name" --data "$body" >/dev/null 2>&1; then
    ok "secret $name set"
  else
    die "failed to set secret $name (token scope ? repo path ?)"
  fi
}

forgejo_set_var() {
  local owner=$1 repo=$2 name=$3 value=$4
  local body
  body=$(jq -nc --arg n "$name" --arg v "$value" '{name: $n, value: $v}')
  # Try update (PUT) ; on any failure (typically 404), create (POST).
  # Both calls address the variable by name in the path.
  if forgejo_api PUT "/repos/$owner/$repo/actions/variables/$name" --data "$body" >/dev/null 2>&1; then
    ok "variable $name updated"
  elif forgejo_api POST "/repos/$owner/$repo/actions/variables/$name" --data "$body" >/dev/null 2>&1; then
    ok "variable $name created"
  else
    die "failed to set variable $name"
  fi
}

forgejo_get_runner_token() {
  local owner=$1 repo=$2
  forgejo_api GET "/repos/$owner/$repo/actions/runners/registration-token" \
    | jq -er '.token // empty' \
    || die "failed to fetch runner registration token (admin scope ?)"
}
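The state-file convention (guard first, mutate, then record) is easiest to see in isolation. What follows is a stand-alone sketch that re-states `mark_done` and `phase_done` in simplified form and points them at a temp directory so it needs neither root nor sudo. `TALAS_STATE_FILE` is the name used by lib.sh; `demo_phase` and `RUNS` are hypothetical.

```shell
# Simplified re-statement of the state-file helpers, pointed at a temp
# directory. demo_phase and RUNS are hypothetical names for illustration.
TALAS_STATE_FILE="$(mktemp -d)/bootstrap.state"

mark_done() {
  # Append "<key>=DONE <timestamp>" once; later calls are no-ops.
  grep -q "^$1=" "$TALAS_STATE_FILE" 2>/dev/null \
    || printf '%s=DONE %s\n' "$1" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$TALAS_STATE_FILE"
}

phase_done() {
  [ -f "$TALAS_STATE_FILE" ] && grep -q "^$1=DONE" "$TALAS_STATE_FILE"
}

RUNS=0
demo_phase() {
  # Guard first: a phase that is already recorded is skipped.
  if phase_done demo; then
    return 0
  fi
  RUNS=$((RUNS + 1))   # stands in for the real, state-mutating work
  mark_done demo
}

demo_phase
demo_phase   # second call is skipped by the guard
echo "work executed $RUNS time(s)"
```

Deleting the matching line from the state file is then enough to force a phase to run again.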
131
scripts/bootstrap/verify-local.sh
Executable file
@@ -0,0 +1,131 @@
#!/usr/bin/env bash
# verify-local.sh - read-only checks of local state (vault, secrets, ssh).
# Exit 0 if everything passes ; non-zero with a count of failures.

set -uo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=lib.sh
. "$SCRIPT_DIR/lib.sh"

[[ -f "$SCRIPT_DIR/.env" ]] && . "$SCRIPT_DIR/.env"

: "${R720_HOST:=10.0.20.150}"
: "${R720_USER:=ansible}"
: "${FORGEJO_API_URL:=https://forgejo.talas.group}"
: "${FORGEJO_OWNER:=talas}"
: "${FORGEJO_REPO:=veza}"

REPO_ROOT=$(git -C "$SCRIPT_DIR" rev-parse --show-toplevel 2>/dev/null) || {
  err "not in a git repo"
  exit 1
}

VAULT_YML="$REPO_ROOT/infra/ansible/group_vars/all/vault.yml"
VAULT_PASS="$REPO_ROOT/infra/ansible/.vault-pass"

declare -i PASS=0 FAIL=0

check() {
  local name=$1 cmd=$2
  if eval "$cmd" >/dev/null 2>&1; then
    ok "$name"
    PASS+=1
  else
    err "$name"
    FAIL+=1
  fi
}

check_with_hint() {
  local name=$1 cmd=$2 hint=$3
  if eval "$cmd" >/dev/null 2>&1; then
    ok "$name"
    PASS+=1
  else
    err "$name"
    printf >&2 ' %shint:%s %s\n' "$_YELLOW" "$_RESET" "$hint"
    FAIL+=1
  fi
}

section "Local prerequisites"
check "git available" "command -v git"
check "ansible available" "command -v ansible"
check "ansible-vault available" "command -v ansible-vault"
check "curl available" "command -v curl"
check "jq available" "command -v jq"
check "ssh available" "command -v ssh"
check "openssl available" "command -v openssl"
check "dig available" "command -v dig"

section "Repo state"
check "in repo root" "[[ -f $REPO_ROOT/CLAUDE.md ]]"
check "infra/ansible/ exists" "[[ -d $REPO_ROOT/infra/ansible ]]"
check ".forgejo/workflows/deploy.yml" "[[ -f $REPO_ROOT/.forgejo/workflows/deploy.yml ]]"
check_with_hint "deploy.yml gated (no auto-trigger)" \
  "! grep -E '^[[:space:]]+push:$' $REPO_ROOT/.forgejo/workflows/deploy.yml" \
  "if you want auto-deploy, run scripts/bootstrap/enable-auto-deploy.sh"

section "Vault"
check "vault.yml.example exists" "[[ -f $REPO_ROOT/infra/ansible/group_vars/all/vault.yml.example ]]"
check "vault.yml exists" "[[ -f $VAULT_YML ]]"
check_with_hint "vault.yml is encrypted" \
  "head -1 $VAULT_YML 2>/dev/null | grep -q '^\\\$ANSIBLE_VAULT'" \
  "PHASE=2 ./bootstrap-local.sh"
check_with_hint ".vault-pass exists" \
  "[[ -f $VAULT_PASS ]]" \
  "PHASE=2 ./bootstrap-local.sh"
check_with_hint ".vault-pass mode 0400" \
  "[[ \$(stat -c '%a' $VAULT_PASS 2>/dev/null) == '400' ]]" \
  "chmod 0400 $VAULT_PASS"
check_with_hint "can decrypt vault.yml" \
  "ansible-vault view --vault-password-file $VAULT_PASS $VAULT_YML" \
  "vault password mismatch - re-encrypt with: ansible-vault rekey --new-vault-password-file $VAULT_PASS $VAULT_YML"
check_with_hint "no <TODO> placeholders left" \
  "! ansible-vault view --vault-password-file $VAULT_PASS $VAULT_YML 2>/dev/null | grep -q '<TODO'" \
  "ansible-vault edit --vault-password-file $VAULT_PASS $VAULT_YML"

section "SSH to R720 ($R720_USER@$R720_HOST)"
check_with_hint "ssh handshake" \
  "ssh -o ConnectTimeout=5 -o BatchMode=yes $R720_USER@$R720_HOST /bin/true" \
  "ensure your key is in $R720_USER@$R720_HOST:~/.ssh/authorized_keys"
check "incus reachable on R720" \
  "ssh -o BatchMode=yes $R720_USER@$R720_HOST 'incus list >/dev/null 2>&1'"

section "DNS public domains"
for d in veza.fr www.veza.fr staging.veza.fr talas.fr www.talas.fr forgejo.talas.group; do
  check_with_hint "$d resolves" \
    "dig +short +time=2 +tries=1 $d @1.1.1.1 | grep -qE '^[0-9]+\\.'" \
    "set the A record at your registrar to point to your R720 public IP"
done

if [[ -n "${FORGEJO_ADMIN_TOKEN:-}" ]]; then
  section "Forgejo API + secrets/vars"
  check_with_hint "Forgejo API reachable" \
    "curl -fsSL --max-time 10 -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/user" \
    "set FORGEJO_API_URL ; if no DNS yet, FORGEJO_API_URL=http://10.0.20.105:3000"
  check_with_hint "repo $FORGEJO_OWNER/$FORGEJO_REPO exists" \
    "curl -fsSL -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO" \
    "set FORGEJO_OWNER + FORGEJO_REPO env vars"

  # Per-name secret endpoints only accept PUT/DELETE ; list the secrets
  # and match by name instead of GETting a single one.
  check_with_hint "secret FORGEJO_REGISTRY_TOKEN exists" \
    "curl -fsSL -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/secrets | jq -e 'any(.[]; .name == \"FORGEJO_REGISTRY_TOKEN\")'" \
    "PHASE=3 ./bootstrap-local.sh"
  check_with_hint "secret ANSIBLE_VAULT_PASSWORD exists" \
    "curl -fsSL -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/secrets | jq -e 'any(.[]; .name == \"ANSIBLE_VAULT_PASSWORD\")'" \
    "PHASE=3 ./bootstrap-local.sh"
  check_with_hint "variable FORGEJO_REGISTRY_URL exists" \
    "curl -fsSL -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/variables/FORGEJO_REGISTRY_URL" \
    "PHASE=3 ./bootstrap-local.sh"
else
  warn "FORGEJO_ADMIN_TOKEN not set - skipping API checks. Set it to run those."
fi

section "Result"
if (( FAIL == 0 )); then
  ok "$PASS / $((PASS + FAIL)) checks passed"
  exit 0
else
  err "$FAIL FAIL out of $((PASS + FAIL)) ($PASS passed)"
  exit 1
fi
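The `check` helpers above take the command as a single string and `eval` it, which is why several checks need escaped `\$` and nested quotes. One alternative is to pass the command as separate arguments. The sketch below illustrates that variant only; `check_argv` is a hypothetical name and is not part of this commit.

```shell
PASS=0
FAIL=0

# check_argv NAME CMD [ARGS...]
# Runs CMD with its arguments directly (no eval) and tallies the result.
# Hypothetical variant of check(); logging is reduced to plain printf.
check_argv() {
  name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK   %s\n' "$name" >&2
    PASS=$((PASS + 1))
  else
    printf 'ERR  %s\n' "$name" >&2
    FAIL=$((FAIL + 1))
  fi
}

check_argv "sh available" command -v sh
check_argv "missing file fails" test -f /nonexistent/path/for-demo
echo "$PASS passed, $FAIL failed"
```

The trade-off is that pipelines and negated commands no longer fit in a single call; those checks would need an explicit `sh -c '...'` argument or a small wrapper function.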
122
scripts/bootstrap/verify-remote.sh
Executable file
@@ -0,0 +1,122 @@
#!/usr/bin/env bash
# verify-remote.sh - read-only checks of R720 state (Incus profiles,
# runner labels, container reachability, certs). Run on the R720 itself
# (locally or via `ssh r720 verify-remote.sh`).
#
# Exit 0 if everything passes ; non-zero with a count of failures.

set -uo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=lib.sh
. "$SCRIPT_DIR/lib.sh"

: "${FORGEJO_API_URL:=https://forgejo.talas.group}"

declare -i PASS=0 FAIL=0

check() {
  local name=$1 cmd=$2
  if eval "$cmd" >/dev/null 2>&1; then
    ok "$name"
    PASS+=1
  else
    err "$name"
    FAIL+=1
  fi
}

check_with_hint() {
  local name=$1 cmd=$2 hint=$3
  if eval "$cmd" >/dev/null 2>&1; then
    ok "$name"
    PASS+=1
  else
    err "$name"
    printf >&2 ' %shint:%s %s\n' "$_YELLOW" "$_RESET" "$hint"
    FAIL+=1
  fi
}

section "R720 prerequisites"
check "incus available" "command -v incus"
check "zfs available" "command -v zfs"
check "incus list works" "incus list"

section "Incus profiles"
for p in veza-app veza-data veza-net; do
  check_with_hint "profile $p exists" \
    "incus profile show $p" \
    "run scripts/bootstrap/bootstrap-remote.sh as root"
done

section "Forgejo container"
check "container 'forgejo' exists" "incus info forgejo"
check "container 'forgejo' RUNNING" \
  "incus list forgejo -f csv -c s 2>/dev/null | grep -q RUNNING"
check_with_hint "Forgejo HTTP responds on :3000" \
  "curl -ksSf -o /dev/null --max-time 5 http://10.0.20.105:3000/ || curl -ksSf -o /dev/null --max-time 5 https://10.0.20.105:3000/" \
  "incus exec forgejo -- systemctl status forgejo"

section "Forgejo runner"
check "container 'forgejo-runner' exists" "incus info forgejo-runner"
check "container 'forgejo-runner' RUNNING" \
  "incus list forgejo-runner -f csv -c s 2>/dev/null | grep -q RUNNING"
check_with_hint "incus-socket device attached" \
  "incus config device show forgejo-runner | grep -q '^incus-socket:'" \
  "PHASE=2 sudo bash scripts/bootstrap/bootstrap-remote.sh"
check_with_hint "security.nesting=true" \
  "[[ \$(incus config get forgejo-runner security.nesting) == true ]]" \
  "incus config set forgejo-runner security.nesting=true && incus restart forgejo-runner"
# `command` is a shell builtin, so it has to run through a shell inside
# the container rather than being exec'd directly.
check_with_hint "incus-client installed in runner" \
  "incus exec forgejo-runner -- sh -c 'command -v incus'" \
  "incus exec forgejo-runner -- apt install -y incus-client"
check_with_hint "runner can incus list (socket reachable)" \
  "incus exec forgejo-runner -- incus list" \
  "verify the unix-socket disk device + nesting"
check_with_hint "runner config has 'incus' label" \
  "incus exec forgejo-runner -- bash -c 'for f in /etc/forgejo-runner/.runner /var/lib/forgejo-runner/.runner /opt/forgejo-runner/.runner ; do [[ -f \$f ]] && grep -q incus \$f && exit 0 ; done ; exit 1'" \
  "PHASE=3 sudo bash scripts/bootstrap/bootstrap-remote.sh"
check_with_hint "runner systemd unit active" \
  "incus exec forgejo-runner -- bash -c 'systemctl is-active forgejo-runner.service 2>/dev/null || systemctl is-active act_runner.service'" \
  "incus exec forgejo-runner -- journalctl -u forgejo-runner -n 50"

section "Edge HAProxy (only after running playbooks/haproxy.yml)"
if incus info veza-haproxy >/dev/null 2>&1; then
  check "container 'veza-haproxy' RUNNING" \
    "incus list veza-haproxy -f csv -c s | grep -q RUNNING"
  check_with_hint "haproxy systemd unit active" \
    "incus exec veza-haproxy -- systemctl is-active haproxy" \
    "incus exec veza-haproxy -- journalctl -u haproxy -n 50"
  check_with_hint "haproxy.cfg present" \
    "incus exec veza-haproxy -- test -f /etc/haproxy/haproxy.cfg" \
    "ansible-playbook -i inventory/staging.yml playbooks/haproxy.yml"
  check_with_hint "haproxy.cfg passes self-validation" \
    "incus exec veza-haproxy -- haproxy -f /etc/haproxy/haproxy.cfg -c -q" \
    "config syntax error - re-run ansible-playbook to re-render"
  check_with_hint "Let's Encrypt cert dir has at least 1 .pem" \
    "incus exec veza-haproxy -- bash -c 'ls /usr/local/etc/tls/haproxy/*.pem 2>/dev/null | wc -l | grep -q -E \"^[1-9]\"'" \
    "rerun ansible-playbook ; verify port 80 reachable from Internet for HTTP-01"
else
  warn "container 'veza-haproxy' does not exist yet - run ansible-playbook playbooks/haproxy.yml"
fi

section "ZFS state (snapshots tolerated)"
check "rpool exists" \
  "zpool list rpool"

section "State file"
if [[ -f "$TALAS_STATE_FILE" ]]; then
  info "phases recorded :"
  sed 's/^/ /' "$TALAS_STATE_FILE"
else
  warn "no state file at $TALAS_STATE_FILE - bootstrap-remote.sh hasn't run yet"
fi

section "Result"
if (( FAIL == 0 )); then
  ok "$PASS / $((PASS + FAIL)) checks passed"
  exit 0
else
  err "$FAIL FAIL out of $((PASS + FAIL)) ($PASS passed)"
  exit 1
fi