# scripts/bootstrap/

Two-host bootstrap of the Veza deploy pipeline. Each script is idempotent, resumable, and read-only by default unless explicitly asked to mutate.
## Files

| File | Where it runs | What it does |
|---|---|---|
| `lib.sh` | sourced by all | logging, error trap, idempotent state file, Forgejo API helpers (honours `FORGEJO_INSECURE=1`) |
| `bootstrap-local.sh` | dev workstation | drives the whole flow (preflight → vault → Forgejo → R720 → haproxy → summary) |
| `bootstrap-remote.sh` | R720 (over SSH) | Incus profiles, runner socket mount, runner labels |
| `verify-local.sh` | dev workstation | read-only checks of local state |
| `verify-remote.sh` | R720 | read-only checks of R720 state (run via `verify-remote-ssh.sh`) |
| `verify-remote-ssh.sh` | dev workstation | scp+ssh wrapper that runs `verify-remote.sh` on the R720 |
| `enable-auto-deploy.sh` | dev workstation | restores `.forgejo/workflows/` if disabled, uncomments the `push:` trigger |
| `reset-vault.sh` | dev workstation | recovery from a vault password mismatch (destructive — re-prompts) |
| `.env.example` | template | copy to `.env`, fill in, gitignored |
## State file

Each host keeps a per-host state file with `phase=DONE timestamp` lines, so a re-run is a no-op for completed phases:

- local: `<repo>/.git/talas-bootstrap/local.state`
- R720: `/var/lib/talas/bootstrap.state`

To force a phase re-run, delete its line:

```shell
sed -i '/^vault=/d' .git/talas-bootstrap/local.state
```
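The guard pattern this implies can be sketched as follows. The `phase_done`/`mark_done` helper names are illustrative, not `lib.sh`'s actual API, and the demo writes to a temp file instead of the real per-host state path:

```shell
# Sketch of the phase=DONE guard; helper names are hypothetical.
STATE_FILE="$(mktemp)"    # the real scripts use the per-host paths above

phase_done() { grep -q "^$1=DONE" "$STATE_FILE" 2>/dev/null; }
mark_done()  { printf '%s=DONE %s\n' "$1" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$STATE_FILE"; }

if ! phase_done vault; then
  echo "running vault phase"      # ...actual phase work...
  mark_done vault
fi
phase_done vault && echo "vault recorded as DONE"
```

Re-running the same snippet against a populated state file skips the phase body entirely, which is what makes a partial run safely resumable.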
## Inter-script communication

`bootstrap-local.sh` invokes `bootstrap-remote.sh` over SSH by concatenating `lib.sh` + `bootstrap-remote.sh` and piping the result into `sudo -E bash -s` on the R720. The remote script:

- writes `/var/log/talas-bootstrap.log` on the R720 (persistent)
- emits `>>>PHASE:<name>:<status><<<` markers on stdout
- the local script tees those to stderr so the operator sees remote progress in the same terminal as the local logs

Resumability: the state file means an SSH disconnect or partial failure leaves the work it managed to complete marked DONE. Re-run `bootstrap-local.sh` and it picks up where it stopped.
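The marker format lends itself to a small parser. A sketch, assuming only the `>>>PHASE:<name>:<status><<<` format documented above — the `parse_marker` helper is hypothetical and `bootstrap-local.sh`'s real handling may differ:

```shell
# Parse a >>>PHASE:<name>:<status><<< marker line (format from this README);
# prints "name status" on a match, returns 1 for any other line.
parse_marker() {
  case "$1" in
    '>>>PHASE:'*'<<<')
      body=${1#'>>>PHASE:'}
      body=${body%'<<<'}
      printf '%s %s\n' "${body%%:*}" "${body#*:}"
      ;;
    *) return 1 ;;
  esac
}

parse_marker '>>>PHASE:r720:DONE<<<'   # → r720 DONE
```

Non-marker lines fall through with a non-zero return, so ordinary remote log output can be passed straight to stderr unchanged.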
## Quickstart

```shell
cd /home/senke/git/talas/veza/scripts/bootstrap
cp .env.example .env
$EDITOR .env          # fill in FORGEJO_ADMIN_TOKEN at minimum
chmod +x *.sh

# Set up everything
./bootstrap-local.sh

# Or skip phases you've already done
PHASE=4 ./bootstrap-local.sh

# Verify any time
./verify-local.sh
ssh ansible@10.0.20.150 'sudo bash' < verify-remote.sh
```
## What each phase needs

| Phase | Needs |
|---|---|
| 1. preflight | `git`, `ansible`, `dig`, `ssh`, `jq` locally; SSH to R720; DNS resolved (warning only if missing) |
| 2. vault | nothing; will prompt for the vault password and edit `vault.yml` from the template |
| 3. forgejo | `FORGEJO_ADMIN_TOKEN` env var or in `.env` |
| 4. r720 | `FORGEJO_ADMIN_TOKEN` (used to fetch the runner registration token); SSH to R720 with sudo |
| 5. haproxy | public domains resolved in DNS + port 80 reachable from the Internet; ansible-decryptable vault |
| 6. summary | nothing |
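The Phase 1 tool check could look something like this; the function name and output format are illustrative, not `bootstrap-local.sh`'s actual code:

```shell
# Report any of the required commands that are missing from PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}

check_tools git ansible dig ssh jq || echo "install the missing tools before re-running"
```

Using `command -v` rather than `which` keeps the check POSIX-portable and avoids spawning a subprocess per tool.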
## Troubleshooting

- Phase 1 SSH fails — verify `R720_HOST` + `R720_USER` in `.env`. If you use an SSH config alias (e.g. `Host srv-102v` in `~/.ssh/config`), set `R720_HOST=srv-102v` and either set `R720_USER=` (empty, so the alias's `User=` wins) or match the alias's user. Test manually: `ssh ${R720_USER}@${R720_HOST} /bin/true`.
- Phase 2 `cannot decrypt vault.yml` — the password in `.vault-pass` doesn't match what was used to encrypt `vault.yml`.
  - If you remember the original password, edit `.vault-pass` (`echo "<correct password>" > infra/ansible/.vault-pass ; chmod 0400 …`).
  - Otherwise: `./reset-vault.sh` — destructive, re-prompts for everything.
- Phase 3 `Forgejo API unreachable` — Forgejo on `https://10.0.20.105:3000` serves a self-signed cert. Set `FORGEJO_INSECURE=1` in `.env`. Once the edge HAProxy is up and Let's Encrypt has issued `forgejo.talas.group`, switch to that URL and clear `FORGEJO_INSECURE`.
- Phase 3 `repo not found` — set `FORGEJO_OWNER` to the actual org/user owning the repo. Confirm with `git remote -v` (the path segment after `host:port/`).
- Phase 4 SSH timeout / sudo prompt — passwordless sudo is needed for the SSH user. Add to `/etc/sudoers.d/talas-bootstrap`:

  ```
  senke ALL=(ALL) NOPASSWD: /usr/bin/bash
  ```

  Or run the remote half manually:

  ```shell
  scp scripts/bootstrap/{lib.sh,bootstrap-remote.sh} srv-102v:/tmp/
  ssh srv-102v 'sudo FORGEJO_REGISTRATION_TOKEN=<token> bash /tmp/bootstrap-remote.sh'
  ```

- Phase 5 dehydrated fails — port 80 must be reachable from the Internet for HTTP-01 (not blocked by the ISP, NAT-forwarded). Test from outside: `curl http://veza.fr/.well-known/acme-challenge/test` should hit HAProxy's `letsencrypt_backend` (it will 404, which is fine; what matters is reaching the R720).
- `.forgejo/workflows/` is missing, only `workflows.disabled/` present — expected when the auto-trigger has been gated by renaming the dir. `enable-auto-deploy.sh` restores it.
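The directory-rename gate in that last case can be sketched like this — a demo against a temp directory, since `enable-auto-deploy.sh`'s actual logic may do more (e.g. uncommenting the `push:` trigger):

```shell
# Demo of restoring a gated workflows dir by renaming it back.
repo="$(mktemp -d)"                       # stand-in for the repo root
mkdir -p "$repo/.forgejo/workflows.disabled"

# Restore only when gated: disabled dir exists and live dir does not.
if [ -d "$repo/.forgejo/workflows.disabled" ] && [ ! -d "$repo/.forgejo/workflows" ]; then
  mv "$repo/.forgejo/workflows.disabled" "$repo/.forgejo/workflows"
  echo "workflows restored"
fi
```

The double condition keeps the operation idempotent: a second run finds nothing to rename and does nothing.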
## After bootstrap

- Trigger the first deploy manually via the Forgejo UI: Actions → Veza deploy → Run workflow.
- Once green, run `./enable-auto-deploy.sh` to re-enable the push trigger.
- `verify-local.sh` + `verify-remote.sh` are safe to run any time.