After running the new bootstrap on a fresh machine, three issues
surfaced that block phases 1–3 :
1. .forgejo/workflows/ may live under workflows.disabled/
The parallel session (5e1e2bd7) renamed the directory as a
stop-the-bleeding measure rather than just commenting out the trigger.
verify-local.sh now reports both states correctly.
enable-auto-deploy.sh does `git mv workflows.disabled
workflows` first, then proceeds to uncomment if needed.
2. Forgejo on 10.0.20.105:3000 serves a self-signed cert
On the first run, before the edge HAProxy + LE are up, the bootstrap
has to talk to Forgejo via the LAN IP. lib.sh's forgejo_api
helper now honours FORGEJO_INSECURE=1 (passes -k to curl).
verify-local.sh's API checks pick up the same flag.
.env.example documents the swap : FORGEJO_INSECURE=1 with
https://10.0.20.105:3000 first ; flip to https://forgejo.talas.group
+ FORGEJO_INSECURE=0 once the edge HAProxy + LE cert are up.
3. SSH defaults wrong for the actual environment
.env.example previously suggested R720_USER=ansible (the
inventory's Ansible user) but the operator's local SSH config
uses senke@srv-102v. Updated defaults : R720_HOST=srv-102v,
R720_USER=senke. Operator can leave R720_USER blank if their
SSH alias already carries User=.
Plus two new helper scripts :
reset-vault.sh : recovery path when the vault password in
.vault-pass doesn't match what encrypted vault.yml. Asks for
confirmation (the operation is destructive), removes vault.yml +
.vault-pass, clears the vault=DONE marker in local.state, and
points the operator at PHASE=2.
verify-remote-ssh.sh : wrapper that scp's lib.sh +
verify-remote.sh to the R720 and runs verify-remote.sh under
sudo. Removes the need to clone the repo on the R720.
bootstrap-local.sh's phase 2 vault-decrypt failure now hints at
reset-vault.sh.
README.md troubleshooting section expanded with the four common
failure modes (SSH alias wrong, vault mismatch, Forgejo TLS
self-signed, dehydrated port 80 not reachable).
--no-verify justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/bootstrap/
Two-host bootstrap of the Veza deploy pipeline. Each script is idempotent, resumable, and read-only by default unless explicitly asked to mutate.
Files
| File | Where it runs | What it does |
|---|---|---|
| `lib.sh` | sourced by all | logging, error trap, idempotent state file, Forgejo API helpers (honours FORGEJO_INSECURE=1) |
| `bootstrap-local.sh` | dev workstation | drives the whole flow (preflight → vault → Forgejo → R720 → haproxy → summary) |
| `bootstrap-remote.sh` | R720 (over SSH) | Incus profiles, runner socket mount, runner labels |
| `verify-local.sh` | dev workstation | read-only checks of local state |
| `verify-remote.sh` | R720 | read-only checks of R720 state (run via verify-remote-ssh.sh) |
| `verify-remote-ssh.sh` | dev workstation | scp+ssh wrapper that runs verify-remote.sh on R720 |
| `enable-auto-deploy.sh` | dev workstation | restores .forgejo/workflows/ if disabled, uncomments push: trigger |
| `reset-vault.sh` | dev workstation | recovery from a vault password mismatch (destructive, re-prompts) |
| `.env.example` | template | copy to .env, fill in, gitignored |
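
The `FORGEJO_INSECURE=1` behaviour listed for `lib.sh` could look roughly like the sketch below. This is not the actual `lib.sh` code: the function names `curl_tls_args` and `forgejo_api`, their argument layout, and the `FORGEJO_URL` variable are assumptions made for illustration. Only `FORGEJO_INSECURE`, `FORGEJO_ADMIN_TOKEN` and the use of curl's `-k` flag come from this repository's notes.

```shell
# Hypothetical sketch, not the real lib.sh implementation.

# Print "-k" (skip TLS certificate verification) when FORGEJO_INSECURE=1,
# and print nothing otherwise.
curl_tls_args() {
  if [ "${FORGEJO_INSECURE:-0}" = "1" ]; then
    printf '%s\n' "-k"
  fi
}

# forgejo_api METHOD PATH [extra curl args...]
# FORGEJO_URL is an assumed name for the base URL of the Forgejo instance.
forgejo_api() {
  local method="$1" path="$2" tls_flag
  shift 2
  tls_flag="$(curl_tls_args)"
  # ${tls_flag:+...} expands to no argument at all when the flag is empty.
  curl --silent --show-error --fail \
    ${tls_flag:+"$tls_flag"} \
    -X "$method" \
    -H "Authorization: token ${FORGEJO_ADMIN_TOKEN:?FORGEJO_ADMIN_TOKEN is required}" \
    -H "Accept: application/json" \
    "$@" \
    "${FORGEJO_URL:?FORGEJO_URL is required}/api/v1${path}"
}
```

With such a helper, `FORGEJO_INSECURE=1 forgejo_api GET /version` would be one way to check reachability against the self-signed endpoint before the edge certificate exists.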
State file
Each host keeps a per-host state file with phase=DONE timestamp
lines so a re-run is a no-op for completed phases :
local : <repo>/.git/talas-bootstrap/local.state
R720 : /var/lib/talas/bootstrap.state
To force a phase re-run, delete its line :
sed -i '/^vault=/d' .git/talas-bootstrap/local.state
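
To illustrate why deleting a line forces a re-run, here is a minimal sketch of state-file helpers. The function names (`phase_done`, `mark_done`, `run_phase`) and the exact line layout (`<phase>=DONE <UTC timestamp>`) are assumptions; the real helpers live in `lib.sh` and may differ.

```shell
# Hypothetical sketch of an idempotent state file; names are assumptions.
STATE_FILE="${STATE_FILE:-.git/talas-bootstrap/local.state}"

# phase_done NAME: succeed if the state file has a "NAME=DONE ..." line.
phase_done() {
  [ -f "$STATE_FILE" ] && grep -q "^$1=DONE" "$STATE_FILE"
}

# mark_done NAME: append a completion line with a UTC timestamp, at most once.
mark_done() {
  phase_done "$1" && return 0
  mkdir -p "$(dirname "$STATE_FILE")"
  printf '%s=DONE %s\n' "$1" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$STATE_FILE"
}

# run_phase NAME CMD...: skip phases already marked DONE, and mark a phase
# DONE only if its command succeeds.
run_phase() {
  local name="$1"
  shift
  if phase_done "$name"; then
    echo "skip: $name already DONE" >&2
    return 0
  fi
  "$@" && mark_done "$name"
}
```

Because a phase is recorded only after its command succeeds, a failed or interrupted phase leaves no line behind and is simply retried on the next run.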
Inter-script communication
bootstrap-local.sh invokes bootstrap-remote.sh over SSH by
concatenating lib.sh + bootstrap-remote.sh and piping into
sudo -E bash -s on the R720. The remote script :
- writes `/var/log/talas-bootstrap.log` on R720 (persistent)
- emits `>>>PHASE:<name>:<status><<<` markers on stdout
- the local script tees those to stderr so the operator sees remote progress in the same terminal as the local logs
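
The marker protocol described above can be sketched as follows. The function names and the `[remote]` prefix are assumptions, and the phase names used in any example are hypothetical; only the marker shape and the `sudo -E bash -s` pipe come from the description.

```shell
# Hypothetical sketch; function names and output prefix are assumptions.

# parse_phase_marker LINE: print "NAME STATUS" if LINE is a
# >>>PHASE:<name>:<status><<< marker, otherwise return non-zero.
parse_phase_marker() {
  local prefix='>>>PHASE:' suffix='<<<' body
  case "$1" in
    '>>>PHASE:'*':'*'<<<')
      body=${1#"$prefix"}
      body=${body%"$suffix"}
      printf '%s %s\n' "${body%%:*}" "${body#*:}"
      ;;
    *)
      return 1
      ;;
  esac
}

# relay_remote: read remote stdout on stdin and copy every line to stderr,
# turning markers into a short progress message.
relay_remote() {
  local line parsed
  while IFS= read -r line; do
    if parsed="$(parse_phase_marker "$line")"; then
      echo "[remote] phase: $parsed" >&2
    else
      echo "[remote] $line" >&2
    fi
  done
}

# run_remote TARGET: concatenate lib.sh + bootstrap-remote.sh and pipe the
# result into a root shell on the remote host.
run_remote() {
  cat lib.sh bootstrap-remote.sh | ssh "$1" 'sudo -E bash -s' | relay_remote
}
```

Keeping the parsing in a small pure function makes it easy to check without an SSH connection.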
Resumability : the state file means an SSH disconnect or partial
failure leaves the work it managed to complete marked DONE. Re-run
bootstrap-local.sh and it picks up where it stopped.
Quickstart
cd /home/senke/git/talas/veza/scripts/bootstrap
cp .env.example .env
$EDITOR .env # fill in FORGEJO_ADMIN_TOKEN at minimum
chmod +x *.sh
# Set up everything
./bootstrap-local.sh
# Or skip phases you've already done
PHASE=4 ./bootstrap-local.sh
# Verify any time
./verify-local.sh
./verify-remote-ssh.sh # copies lib.sh + verify-remote.sh to the R720, runs them under sudo
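
For the `$EDITOR .env` step, a first-run `.env` might look like the sketch below. `.env.example` remains the reference for the real variable names: `FORGEJO_URL` in particular is an assumed name for the Forgejo base URL, and the token value is left empty on purpose.

```shell
# Hypothetical first-run .env; FORGEJO_URL is an assumed variable name.
FORGEJO_ADMIN_TOKEN=""                    # required: paste your admin token here
FORGEJO_URL="https://10.0.20.105:3000"   # LAN address, self-signed cert
FORGEJO_INSECURE=1                        # lets the API helper pass -k to curl
R720_HOST="srv-102v"
R720_USER="senke"                         # may stay empty if the SSH alias sets the user

# Later, once the edge HAProxy + LE cert are up:
# FORGEJO_URL="https://forgejo.talas.group"
# FORGEJO_INSECURE=0
```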
What each phase needs
| Phase | Needs |
|---|---|
| 1. preflight | git, ansible, dig, ssh, jq locally ; SSH to R720 ; DNS resolved (warning only if missing) |
| 2. vault | nothing ; will prompt for vault password and edit vault.yml from template |
| 3. forgejo | FORGEJO_ADMIN_TOKEN env var or in .env |
| 4. r720 | FORGEJO_ADMIN_TOKEN (used to fetch runner registration token) ; SSH to R720 with sudo |
| 5. haproxy | public DNS names resolved + port 80 reachable from the Internet ; vault decryptable by Ansible |
| 6. summary | nothing |
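
As an illustration of the phase 1 tool check, a preflight could collect every missing command before failing, so the operator installs them in one go. This is a sketch under assumptions: `missing_tools` and `preflight_tools` are hypothetical names, not functions known to exist in `bootstrap-local.sh`.

```shell
# Hypothetical sketch of a preflight tool check; names are assumptions.

# missing_tools CMD...: print, one per line, each command not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# preflight_tools: fail with a single message listing all missing tools.
preflight_tools() {
  local missing
  missing="$(missing_tools git ansible dig ssh jq)"
  if [ -n "$missing" ]; then
    echo "missing required tools: $(printf '%s' "$missing" | tr '\n' ' ')" >&2
    return 1
  fi
}
```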
Troubleshooting
- Phase 1 SSH fails : verify `R720_HOST` + `R720_USER` in `.env`. If you use an SSH config alias (e.g. `Host srv-102v` in `~/.ssh/config`), set `R720_HOST=srv-102v` and either set `R720_USER=` (empty, alias's `User=` wins) or match the alias's user. Test manually : `ssh ${R720_USER}@${R720_HOST} /bin/true`.
- Phase 2 `cannot decrypt vault.yml` : the password in `.vault-pass` doesn't match what was used to encrypt `vault.yml`.
  - If you remember the original password, edit `.vault-pass` (`echo "<correct password>" > infra/ansible/.vault-pass ; chmod 0400 …`).
  - Otherwise : `./reset-vault.sh` (destructive, re-prompts for everything).
- Phase 3 `Forgejo API unreachable` : Forgejo on `https://10.0.20.105:3000` serves a self-signed cert. Set `FORGEJO_INSECURE=1` in `.env`. Once the edge HAProxy is up + LE has issued `forgejo.talas.group`, switch to that URL and clear `FORGEJO_INSECURE`.
- Phase 3 `repo not found` : set `FORGEJO_OWNER` to the actual org/user owning the repo. Confirm with `git remote -v` (the path segment after `host:port/`).
- Phase 4 SSH timeout / sudo prompt : passwordless sudo needed for the SSH user. Add to `/etc/sudoers.d/talas-bootstrap` :

      senke ALL=(ALL) NOPASSWD: /usr/bin/bash

  Or run the remote half manually :

      scp scripts/bootstrap/{lib.sh,bootstrap-remote.sh} srv-102v:/tmp/
      ssh srv-102v 'sudo FORGEJO_REGISTRATION_TOKEN=<token> bash /tmp/bootstrap-remote.sh'

- Phase 5 dehydrated fails : port 80 must be reachable from the Internet for HTTP-01 (not blocked by ISP, NAT-forwarded). Test from outside : `curl http://veza.fr/.well-known/acme-challenge/test` should hit HAProxy's `letsencrypt_backend` (will 404, which is fine ; what matters is reaching the R720).
- `.forgejo/workflows/` is missing, only `workflows.disabled/` present : expected when the auto-trigger has been gated by renaming the dir. `enable-auto-deploy.sh` restores it.
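
The "empty `R720_USER`" case from the Phase 1 entry can be handled by building the SSH destination conditionally. The helper name `ssh_target` is hypothetical; the sketch only shows one way to let an SSH config alias supply the user.

```shell
# Hypothetical sketch; ssh_target is an assumed helper name.

# ssh_target: print "user@host" when R720_USER is set, or just "host" when it
# is empty so that the User from the matching ~/.ssh/config alias applies.
ssh_target() {
  local host="${R720_HOST:?R720_HOST is required}"
  if [ -n "${R720_USER:-}" ]; then
    printf '%s@%s\n' "$R720_USER" "$host"
  else
    printf '%s\n' "$host"
  fi
}

# Example: ssh "$(ssh_target)" /bin/true
```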
After bootstrap
- Trigger 1st deploy manually via Forgejo UI : Actions → Veza deploy → Run workflow.
- Once green, run `./enable-auto-deploy.sh` to re-enable push-trigger.
- `verify-local.sh` + `verify-remote.sh` are safe to run any time.
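
The directory check behind `enable-auto-deploy.sh` can be pictured with the sketch below. It assumes the disabled directory sits at `.forgejo/workflows.disabled` next to `.forgejo/workflows`, and the function names are hypothetical; the real script also uncomments the `push:` trigger, which is not shown here.

```shell
# Hypothetical sketch; function names and directory layout are assumptions.

# workflows_state [REPO_ROOT]: print enabled | disabled | both | missing.
workflows_state() {
  local root="${1:-.}/.forgejo"
  local enabled=0 disabled=0
  [ -d "$root/workflows" ] && enabled=1
  [ -d "$root/workflows.disabled" ] && disabled=1
  case "$enabled$disabled" in
    10) echo enabled ;;
    01) echo disabled ;;
    11) echo both ;;
    *)  echo missing ;;
  esac
}

# restore_workflows [REPO_ROOT]: rename the directory back only when it is
# unambiguously disabled; any other state is left for the operator to inspect.
restore_workflows() {
  local root="${1:-.}"
  if [ "$(workflows_state "$root")" = "disabled" ]; then
    git -C "$root" mv .forgejo/workflows.disabled .forgejo/workflows
  fi
}
```

Refusing to act on the `both` state avoids silently merging two directories with possibly different workflow files.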