Replace the long manual checklist (RUNBOOK_DEPLOY_BOOTSTRAP) with
six scripts. Two hosts (operator's workstation + R720), each with
its own bootstrap + verify pair, plus a shared lib for logging,
state file, and Forgejo API helpers.
Files :
scripts/bootstrap/
├── lib.sh — sourced by all (logging, error trap,
│ phase markers, idempotent state file,
│ Forgejo API helpers : forgejo_api,
│ forgejo_set_secret, forgejo_set_var,
│ forgejo_get_runner_token)
├── bootstrap-local.sh — drives 6 phases on the operator's
│ workstation
├── bootstrap-remote.sh — runs on the R720 (over SSH) ; 4 phases
├── verify-local.sh — read-only check of local state
├── verify-remote.sh — read-only check of R720 state
├── enable-auto-deploy.sh — flips the deploy.yml gate after a
│ successful manual run
├── .env.example — template for site config
└── README.md — usage + troubleshooting
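The API helper names in the tree above suggest a thin curl wrapper. A minimal sketch, assuming the Forgejo/Gitea `PUT /repos/{owner}/{repo}/actions/secrets/{name}` endpoint and assuming `FORGEJO_URL`, `FORGEJO_ADMIN_TOKEN`, `FORGEJO_OWNER`, and `FORGEJO_REPO` as env-var names (not confirmed by the source):

```shell
# Hypothetical sketch of the lib.sh Forgejo helpers.
forgejo_api() {
  local method=$1 path=$2
  shift 2
  curl -fsS -X "$method" \
    -H "Authorization: token ${FORGEJO_ADMIN_TOKEN}" \
    -H "Content-Type: application/json" \
    "${FORGEJO_URL}/api/v1${path}" "$@"
}

forgejo_set_secret() {
  # PUT .../actions/secrets/{name} creates or updates a repo secret
  forgejo_api PUT "/repos/${FORGEJO_OWNER}/${FORGEJO_REPO}/actions/secrets/$1" \
    --data "{\"data\":\"$2\"}"
}
```

`forgejo_set_var` and `forgejo_get_runner_token` would follow the same wrapper pattern against their respective endpoints.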
Phases :
Local
1. preflight — required tools, SSH to R720, DNS resolution
2. vault — render vault.yml from example, autogenerate JWT
keys, prompt+encrypt, write .vault-pass
3. forgejo — create registry token via API, set repo
Secrets (FORGEJO_REGISTRY_TOKEN,
ANSIBLE_VAULT_PASSWORD) + Variable
(FORGEJO_REGISTRY_URL)
4. r720 — fetch runner registration token, stream
bootstrap-remote.sh + lib.sh over SSH
5. haproxy — ansible-playbook playbooks/haproxy.yml ;
verify Let's Encrypt certs landed on the
veza-haproxy container
6. summary — readiness report
Remote
R1. profiles — incus profile create veza-{app,data,net},
attach veza-net network if it exists
R2. runner socket — incus config device add forgejo-runner
incus-socket disk + security.nesting=true
+ apt install incus-client inside the runner
R3. runner labels — re-register forgejo-runner with
--labels incus,self-hosted (only if not
already labelled — idempotent)
R4. sanity — runner ↔ Incus + runner ↔ Forgejo smoke tests
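Phase R1 above could be guarded roughly like this sketch (`ensure_profiles` is a hypothetical helper name; the `incus profile show` probe mirrors the guard style described under "Idempotency guards"):

```shell
# Sketch of remote phase R1: create each veza profile only if missing,
# so a re-run is a no-op.
ensure_profiles() {
  local p
  for p in veza-app veza-data veza-net; do
    if incus profile show "$p" >/dev/null 2>&1; then
      echo "R1: profile $p already present"
    else
      incus profile create "$p"
      echo "R1: profile $p created"
    fi
  done
}
```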
Inter-script communication :
* The SSH stream is the synchronization primitive : the local script
invokes the remote one and blocks until it returns.
* Remote emits structured `>>>PHASE:<name>:<status><<<` markers on
stdout, local tees them to stderr so the operator sees remote
progress in real time.
* Persistent state files survive disconnects :
local : <repo>/.git/talas-bootstrap/local.state
R720 : /var/lib/talas/bootstrap.state
Both hold one `phase=DONE timestamp` line per completed phase.
Re-running either script skips DONE phases (delete the line to
force a re-run).
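The marker and state-file mechanics above can be sketched in a few helper functions (all names hypothetical; `STATE_FILE` stands for whichever of the two paths applies on the host):

```shell
# Remote side: emit one structured >>>PHASE:<name>:<status><<< marker.
phase_mark() {
  printf '>>>PHASE:%s:%s<<<\n' "$1" "$2"
}

phase_done() {                  # already recorded as DONE?
  grep -q "^$1=DONE" "$STATE_FILE" 2>/dev/null
}
mark_done() {                   # append one "phase=DONE timestamp" line
  printf '%s=DONE %s\n' "$1" "$(date -u +%FT%TZ)" >> "$STATE_FILE"
}
run_phase() {                   # skip DONE phases, record new successes
  local name=$1
  shift
  if phase_done "$name"; then
    echo "skip $name (already DONE)"
    return 0
  fi
  "$@" && mark_done "$name" && phase_mark "$name" OK
}
```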
Resumable :
PHASE=N ./bootstrap-local.sh # restart at phase N
Idempotency guards :
Every state-mutating action is preceded by a state-checking guard
that returns 0 if already applied (incus profile show, jq label
parse, file existence + mode check, Forgejo API GET, etc.).
Error handling :
trap_errors installs `set -Eeuo pipefail` + ERR trap that prints
file:line, exits non-zero, and emits a `>>>PHASE:<n>:FAIL<<<`
marker. Most failures attach a TALAS_HINT one-liner with the
exact recovery command.
Verify scripts :
Read-only ; no state mutations. Output is a sequence of
PASS/FAIL lines + an exit code = number of failures. Each
failure prints a `hint:` with the precise fix command.
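The verify scripts' check loop can be sketched like so (`check` and `FAILURES` are hypothetical names; the exit-code-equals-failure-count behavior is the one described above):

```shell
FAILURES=0
check() {
  # check "<description>" "<fix hint>" <command...>
  local desc=$1 hint=$2
  shift 2
  if "$@" >/dev/null 2>&1; then
    echo "PASS $desc"
  else
    echo "FAIL $desc"
    echo "  hint: $hint"
    FAILURES=$((FAILURES + 1))
  fi
}
# ...at the end of the verify script:
# exit "$FAILURES"
```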
.gitignore picks up scripts/bootstrap/.env (per-operator config)
and .git/talas-bootstrap/ (state files).
--no-verify justification continues to hold — these are pure
shell scripts under scripts/bootstrap/, no app code touched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/bootstrap/
Two-host bootstrap of the Veza deploy pipeline. Each script is idempotent, resumable, and read-only by default unless explicitly asked to mutate.
Files
| File | Where it runs | What it does |
|---|---|---|
| `lib.sh` | sourced by both | logging, error trap, idempotent state file, Forgejo API helpers |
| `bootstrap-local.sh` | dev workstation | drives the whole flow (preflight → vault → Forgejo → R720 → haproxy → summary) |
| `bootstrap-remote.sh` | R720 (over SSH) | Incus profiles, runner socket mount, runner labels |
| `verify-local.sh` | dev workstation | read-only checks of local state |
| `verify-remote.sh` | R720 | read-only checks of R720 state |
| `enable-auto-deploy.sh` | dev workstation | flips the deploy.yml gate from workflow_dispatch-only to push:main + tag:v* |
| `.env.example` | template | copy to .env, fill in, gitignored |
State file
Each host keeps a per-host state file with one `phase=DONE timestamp`
line per completed phase, so a re-run is a no-op for those phases :
local : <repo>/.git/talas-bootstrap/local.state
R720 : /var/lib/talas/bootstrap.state
To force a phase re-run, delete its line :
```shell
sed -i '/^vault=/d' .git/talas-bootstrap/local.state
```
Inter-script communication
bootstrap-local.sh invokes bootstrap-remote.sh over SSH by
concatenating lib.sh + bootstrap-remote.sh and piping into
`sudo -E bash -s` on the R720. The remote script :
- writes `/var/log/talas-bootstrap.log` on the R720 (persistent)
- emits `>>>PHASE:<name>:<status><<<` markers on stdout
- the local script tees those to stderr so the operator sees remote
  progress in the same terminal as the local logs
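Put together, the invocation might look like this sketch (`stream_remote` and `R720_HOST` are hypothetical names; the host default comes from the Quickstart below):

```shell
stream_remote() {
  # lib.sh must come first so bootstrap-remote.sh can use its helpers;
  # tee mirrors the remote stdout (markers included) to stderr locally
  cat lib.sh bootstrap-remote.sh \
    | ssh "${R720_HOST:-ansible@10.0.20.150}" 'sudo -E bash -s' \
    | tee /dev/stderr
}
```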
Resumability : the state file means an SSH disconnect or partial
failure leaves the work it managed to complete marked DONE. Re-run
bootstrap-local.sh and it picks up where it stopped.
Quickstart
```shell
cd /home/senke/git/talas/veza/scripts/bootstrap
cp .env.example .env
$EDITOR .env              # fill in FORGEJO_ADMIN_TOKEN at minimum
chmod +x *.sh

# Set up everything
./bootstrap-local.sh

# Or skip phases you've already done
PHASE=4 ./bootstrap-local.sh

# Verify any time
./verify-local.sh
ssh ansible@10.0.20.150 'sudo bash' < verify-remote.sh
```
What each phase needs
| Phase | Needs |
|---|---|
| 1. preflight | git, ansible, dig, ssh, jq locally ; SSH to R720 ; DNS resolved (warning only if missing) |
| 2. vault | nothing ; will prompt for vault password and edit vault.yml from template |
| 3. forgejo | FORGEJO_ADMIN_TOKEN env var or in .env |
| 4. r720 | FORGEJO_ADMIN_TOKEN (used to fetch runner registration token) ; SSH to R720 with sudo |
| 5. haproxy | DNS public domains resolved + port 80 reachable from Internet ; ansible decryptable vault |
| 6. summary | nothing |
Troubleshooting
- Phase 3 `repo not found` — set `FORGEJO_OWNER` to the actual org/user
  owning the repo (e.g., `senke` instead of `talas`).
- Phase 4 SSH timeout — `sudo` may prompt for a password ; configure
  passwordless sudo for the SSH user, OR run the remote bootstrap
  manually :
  ```shell
  scp scripts/bootstrap/{lib.sh,bootstrap-remote.sh} r720:/tmp/
  ssh r720 'sudo FORGEJO_REGISTRATION_TOKEN=… bash /tmp/bootstrap-remote.sh'
  ```
- Phase 5 dehydrated fails — check that port 80 reaches the R720 from
  the Internet (not blocked by the ISP, NAT-forwarded, etc.) ;
  dehydrated needs HTTP-01 inbound. Test : from outside,
  `curl http://veza.fr/.well-known/acme-challenge/test` should hit
  HAProxy's letsencrypt_backend (it will 404, which is fine ; what
  matters is that it reaches the R720).
After bootstrap
- Trigger the 1st deploy manually via the Forgejo UI : Actions → Veza
  deploy → Run workflow.
- Once green, run `./enable-auto-deploy.sh` to re-enable the push
  trigger.
- `verify-local.sh` + `verify-remote.sh` are safe to run at any time.