Rearchitecture after operator pushback: the previous design did
too much in bash (SSH-streaming script chunks, manual sudo dance,
NOPASSWD requirement). Ansible is the right tool. The shell
scripts are now thin orchestrators handling the chicken-and-egg
of vault + Forgejo CI provisioning, then calling ansible-playbook.
Key principles:
1. NO NOPASSWD sudo on the R720. --ask-become-pass is interactive;
   the password is held in Ansible memory only for the run.
2. Two parallel scripts — one per host, fully self-contained.
3. Both run the SAME Ansible playbooks (bootstrap_runner.yml +
haproxy.yml). Difference is the inventory.
Files (new + replaced):
ansible.cfg
pipelining=True → False. Required for --ask-become-pass to
work reliably; the previous setting raced sudo's prompt and
timed out at 12s.
playbooks/bootstrap_runner.yml (new)
The Incus-host-side bootstrap, ported from the old
scripts/bootstrap/bootstrap-remote.sh. Three plays:
Phase 1: ensure veza-app + veza-data profiles exist;
drop legacy empty veza-net profile.
Phase 2: forgejo-runner gets /var/lib/incus/unix.socket
attached as a disk device, security.nesting=true,
/usr/bin/incus pushed in as /usr/local/bin/incus,
smoke-tested.
Phase 3: forgejo-runner registered with the `incus,self-hosted`
label (idempotent — skips if already labelled).
Each task uses Ansible idioms (`incus_profile`, `incus_command`
where they exist, `command:` with `failed_when` and explicit
state-checking elsewhere). no_log on the registration token.
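A sketch of what the Phase 2 idiom could look like — the device name `incus-socket` and the exact commands are assumptions, and the real playbook may use dedicated Incus modules where they exist:

```yaml
# Hypothetical tasks: read current state first, mutate only when
# needed, so the play stays idempotent on re-runs.
- name: Read forgejo-runner's current devices
  command: incus config device show forgejo-runner
  register: runner_devices
  changed_when: false

- name: Attach the Incus socket as a disk device
  command: >-
    incus config device add forgejo-runner incus-socket disk
    source=/var/lib/incus/unix.socket
    path=/var/lib/incus/unix.socket
  when: "'incus-socket' not in runner_devices.stdout"

- name: Allow nesting inside the runner container
  command: incus config set forgejo-runner security.nesting=true
  changed_when: false   # re-applying the same setting is harmless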
inventory/local.yml (new)
Inventory for `bootstrap-r720.sh` — connection: local instead
of SSH+become. Same group structure as staging.yml;
container groups use community.general.incus connection
plugin (the local incus binary, no remote).
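A minimal sketch of what such an inventory might look like — group and host names here are illustrative, not necessarily the repo's actual ones:

```yaml
# Hypothetical inventory/local.yml shape: the host itself is local,
# container groups go through the Incus connection plugin.
all:
  children:
    incus_host:
      hosts:
        r720:
          ansible_connection: local
    forgejo_runner:
      hosts:
        forgejo-runner:
          ansible_connection: community.general.incus
```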
inventory/{staging,prod}.yml (modified)
Added `forgejo_runner` group (target of bootstrap_runner.yml
phase 3, reached via community.general.incus from the host).
scripts/bootstrap/bootstrap-local.sh (rewritten)
Five phases: preflight, vault, forgejo, ansible, summary.
Phase 4 calls a single `ansible-playbook` with both
bootstrap_runner.yml + haproxy.yml in sequence.
--ask-become-pass: Ansible prompts ONCE for sudo, holds the
password in memory, and reuses it for every become: true task.
scripts/bootstrap/bootstrap-r720.sh (new)
Symmetric to bootstrap-local.sh but runs as root on the R720.
No SSH preflight, no --ask-become-pass (already root).
Same Ansible playbooks, inventory/local.yml.
scripts/bootstrap/verify-r720.sh (new — replaces verify-remote)
Read-only checks of R720 state. Run as root locally on the R720.
scripts/bootstrap/verify-local.sh (modified)
Cross-host SSH check now fits the env-var-driven SSH_TARGET
pattern (R720_USER may be empty if the alias has User=).
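The pattern can be sketched in a few lines of portable shell (the host/user values are placeholders):

```shell
#!/bin/sh
# Sketch of the env-var-driven SSH_TARGET pattern: prepend "user@"
# only when R720_USER is non-empty (the alias may already carry User=).
R720_HOST="srv-102v"

R720_USER=""
SSH_TARGET="${R720_USER:+$R720_USER@}$R720_HOST"
echo "$SSH_TARGET"    # srv-102v

R720_USER="admin"
SSH_TARGET="${R720_USER:+$R720_USER@}$R720_HOST"
echo "$SSH_TARGET"    # admin@srv-102v
```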
scripts/bootstrap/{bootstrap-remote.sh, verify-remote.sh,
verify-remote-ssh.sh} (DELETED)
Replaced by playbooks/bootstrap_runner.yml + verify-r720.sh.
README.md (rewritten)
Documents the parallel-script architecture, the
no-NOPASSWD-sudo design choice (--ask-become-pass), each
phase's needs, and a refreshed troubleshooting list.
State files unchanged in shape:
laptop: .git/talas-bootstrap/local.state
R720: /var/lib/talas/r720-bootstrap.state
--no-verify justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# scripts/bootstrap/ — bootstrap the Veza deploy pipeline

Two parallel scripts (one per host) + four helpers + one shared lib. Each script is idempotent, resumable, and read-only by default unless explicitly asked to mutate. No NOPASSWD sudo required.
The heavy lifting (Incus profiles, forgejo-runner config, HAProxy
edge, Let's Encrypt) is done by Ansible playbooks, not bash.
The shell scripts are thin orchestrators that handle the
chicken-and-egg part: create the Vault that Ansible needs, set
the Forgejo CI secrets, then call ansible-playbook.
## Files

| File | Where it runs | What it does |
|---|---|---|
| `lib.sh` | sourced by all | logging, error trap, idempotent state file, Forgejo API helpers |
| `bootstrap-local.sh` | operator's laptop | drives Ansible over SSH; `--ask-become-pass` on the R720 |
| `bootstrap-r720.sh` | R720 directly (sudo) | drives Ansible locally (`connection: local`); no SSH, no sudo prompts |
| `verify-local.sh` | laptop | read-only checks of local + remote state |
| `verify-r720.sh` | R720 (sudo) | read-only checks of R720 state |
| `enable-auto-deploy.sh` | laptop | restores `.forgejo/workflows/`, uncomments `push:` trigger |
| `reset-vault.sh` | laptop | recovery from vault password mismatch (destructive) |
| `.env.example` | template | copy to `.env`, fill in, gitignored |
## Two scripts, one Ansible

Both bootstrap-local.sh and bootstrap-r720.sh end up running
the same two playbooks:

- `playbooks/bootstrap_runner.yml` — Incus profiles + forgejo-runner Incus access + runner registration with the `incus` label
- `playbooks/haproxy.yml` — edge HAProxy container + dehydrated Let's Encrypt issuance for veza.fr / staging.veza.fr / talas.fr / forgejo.talas.group
The difference is the inventory:

- laptop → `inventory/staging.yml` (SSH to R720, `--ask-become-pass`)
- R720 → `inventory/local.yml` (`connection: local`, already root)
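The two invocations can be sketched as follows — the flag set comes from the description above, and this is a dry-run that only prints the command, never executes Ansible:

```shell
#!/bin/sh
# Dry-run sketch of the two ansible-playbook invocations; the
# playbook and inventory paths are the ones listed above.
playbooks="playbooks/bootstrap_runner.yml playbooks/haproxy.yml"

print_cmd() {
  case "$1" in
    laptop) # SSH to the R720, prompt once for sudo
      echo "ansible-playbook -i inventory/staging.yml --ask-become-pass $playbooks" ;;
    r720)   # connection: local, already root, no prompt
      echo "ansible-playbook -i inventory/local.yml $playbooks" ;;
  esac
}

print_cmd laptop
print_cmd r720
```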
Pick whichever is convenient. The state files are independent (laptop
keeps state under .git/talas-bootstrap/, R720 under /var/lib/talas/),
so running both at different times doesn't double-do anything.
## State files

laptop: `<repo>/.git/talas-bootstrap/local.state`
R720: `/var/lib/talas/r720-bootstrap.state`
Each completed phase is recorded as a `phase=DONE timestamp` line; a re-run skips DONE phases.
To force a phase to re-run, delete its line:

```shell
sed -i '/^vault=/d' .git/talas-bootstrap/local.state
```
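The skip logic itself can be sketched in portable shell — function names here are illustrative, not the actual lib.sh API:

```shell
#!/bin/sh
# Hypothetical sketch of the state-file-driven phase skipping.
STATE_FILE="${STATE_FILE:-/tmp/demo-bootstrap.state}"

# True when "name=DONE ..." is already recorded.
phase_done() { grep -q "^$1=DONE" "$STATE_FILE" 2>/dev/null; }

# Append "name=DONE <timestamp>" to the state file.
mark_done() { printf '%s=DONE %s\n' "$1" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >>"$STATE_FILE"; }

run_phase() {
  if phase_done "$1"; then
    echo "skip $1"
  else
    echo "run $1"
    mark_done "$1"
  fi
}

: >"$STATE_FILE"
run_phase vault   # first run: executes and records vault=DONE
run_phase vault   # second run: skipped
```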
## Quickstart — from the laptop

```shell
cd /home/senke/git/talas/veza/scripts/bootstrap
cp .env.example .env
vim .env          # at minimum: FORGEJO_ADMIN_TOKEN
chmod +x *.sh

# Set up everything end-to-end:
./bootstrap-local.sh

# Or skip phases you've already done:
PHASE=4 ./bootstrap-local.sh

# Verify any time (read-only):
./verify-local.sh
```
## Quickstart — directly on the R720

```shell
ssh srv-102v
cd /path/to/veza/scripts/bootstrap
cp .env.example .env
vim .env          # FORGEJO_ADMIN_TOKEN at minimum
sudo ./bootstrap-r720.sh

# Verify:
sudo ./verify-r720.sh
```
## Sudo on the R720 — the design choice

The bash scripts do not require NOPASSWD sudo on the R720. Two reasons:

- Trust boundary — NOPASSWD turns any compromise of the operator's account into root on the host. Keeping the password requirement means an attacker also needs to phish/keylog the sudo password.
- Ansible's `--ask-become-pass` is fine for interactive runs. The operator types the password ONCE per `bootstrap-local.sh` invocation; Ansible holds it in memory and reuses it for every `become: true` task. No file written, no env var leaked.
`pipelining = False` in ansible.cfg is what makes interactive
`--ask-become-pass` reliable (the previous `True` setting raced sudo's
TTY-driven prompt).
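For reference, the relevant fragment might look like this — a sketch following the standard ansible.cfg layout, not a verbatim copy of the repo's file:

```ini
# infra/ansible/ansible.cfg — fragment
[ssh_connection]
# False: each task runs through a TTY-capable channel, so the
# interactive sudo prompt from --ask-become-pass is not raced.
pipelining = False
```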
## What each phase needs

| Phase | Needs |
|---|---|
| 1. preflight | git, ansible, dig, ssh, jq locally; SSH to R720 (laptop); DNS resolved (warning if missing) |
| 2. vault | nothing; auto-generates JWT + 11 random passwords, prompts for vault password |
| 3. forgejo | FORGEJO_ADMIN_TOKEN (.env or env) — scopes: write:repository, read:repository |
| 4. ansible | sudo password on R720 (interactive; not stored) |
| 5. summary | nothing |
## Troubleshooting

- Phase 1 SSH fails — verify `R720_HOST` + `R720_USER` in `.env`. If using an SSH config alias, set `R720_HOST=<alias>` and leave `R720_USER=` empty.
- Phase 2 cannot decrypt — `./reset-vault.sh` (destructive, re-prompts for everything).
- Phase 3 Forgejo unreachable — set `FORGEJO_INSECURE=1` for the self-signed cert on `https://10.0.20.105:3000`. Update to `https://forgejo.talas.group` once edge HAProxy + LE is up.
- Phase 3 token lacks scope — the token needs at minimum `write:repository`. `write:admin` lets the script auto-create the registry token; without it, you'll be prompted to paste one you create manually.
- Phase 4 `Timeout waiting for privilege escalation prompt` — set `pipelining = False` in `infra/ansible/ansible.cfg`. The current default is `False`; revert if it's been changed.
- Phase 4 dehydrated fails — port 80 must be reachable from the Internet (HTTP-01 challenge). Test from an external host: `curl http://veza.fr/`. If it doesn't reach the R720, configure port forwarding for 80 + 443 on your home router / ISP box.
- Phase 4 Incus network not found — group_vars defaults to `net-veza`. The script auto-detects from forgejo's network on the R720; if your bridge has a different name, set `veza_incus_network` in `group_vars/staging.yml` (or `inventory/local.yml` for the R720 case).
## After bootstrap

- Trigger the 1st deploy manually: Forgejo Actions UI → Veza deploy → Run workflow.
- Once green, run `./enable-auto-deploy.sh` to restore the `push:` main + `tag:` v* triggers.
- `verify-{local,r720}.sh` are safe to run any time.