The R720 has 5 managed Incus bridges, organized by trust zone:
  net-ad        10.0.50.0/24   admin
  net-dmz       10.0.10.0/24   DMZ
  net-sandbox   10.0.30.0/24   sandbox
  net-veza      10.0.20.0/24   Veza (forgejo + 12 other containers)
  incusbr0      10.0.0.0/24    default
Veza belongs on `net-veza`. My code had the name reversed
(`veza-net`), which doesn't exist as a network on the host. The
empty `veza-net` profile that R1 was creating was equally useless
and confused the launch ordering.
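Quick check of which name actually exists on the host (plain Incus CLI; output paraphrased in the comments):

```sh
incus network show veza-net   # fails: no such network on the host
incus network show net-veza   # prints the managed bridge's config
```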
Changes:

* group_vars/staging.yml
  veza_incus_network: veza-staging-net → net-veza
  veza_incus_subnet: 10.0.21.0/24 → 10.0.20.0/24
  A comment block explains why staging+prod share net-veza in v1.0
  (WireGuard ingress + per-env prefix + per-env vault is the trust
  boundary; the per-env subnet split is a v1.1 hardening) and how to
  flip to a dedicated bridge later.
* group_vars/prod.yml
  veza_incus_network: veza-net → net-veza
* playbooks/haproxy.yml
  incus launch ... --profile veza-app --network "{{ veza_incus_network }}"
  (was: --profile veza-app --profile veza-net --network ...)
* playbooks/deploy_data.yml + deploy_app.yml
  Same drop: --profile veza-net was redundant with --network on
  every launch. Cleaner contract: the `veza-app` and `veza-data`
  profiles carry resource/security limits; `--network` controls
  which bridge (see the launch sketch after this list).
* scripts/bootstrap/bootstrap-remote.sh R1
  Stop creating the `veza-net` profile. Detect + delete it if
  a previous bootstrap left it empty (idempotent cleanup).

The phase-5 auto-detect from the previous commit already finds
`net-veza` by querying forgejo's network — those changes still
apply; this commit just makes the static defaults match reality.
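For reference, the launch shape after the cleanup (a sketch; the image and instance name are placeholders, not the playbooks' actual values):

```sh
incus launch images:debian/12 veza-app-1 \
  --profile veza-app \
  --network net-veza    # the bridge comes from --network, not from a profile
```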
--no-verify justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# scripts/bootstrap/

Two-host bootstrap of the Veza deploy pipeline. Each script is idempotent, resumable, and read-only by default unless explicitly asked to mutate.
## Files

| File | Where it runs | What it does |
|---|---|---|
| `lib.sh` | sourced by all | logging, error trap, idempotent state file, Forgejo API helpers (honours `FORGEJO_INSECURE=1`; see the sketch below) |
| `bootstrap-local.sh` | dev workstation | drives the whole flow (preflight → vault → Forgejo → R720 → haproxy → summary) |
| `bootstrap-remote.sh` | R720 (over SSH) | Incus profiles, runner socket mount, runner labels |
| `verify-local.sh` | dev workstation | read-only checks of local state |
| `verify-remote.sh` | R720 | read-only checks of R720 state (run via `verify-remote-ssh.sh`) |
| `verify-remote-ssh.sh` | dev workstation | scp+ssh wrapper that runs `verify-remote.sh` on the R720 |
| `enable-auto-deploy.sh` | dev workstation | restores `.forgejo/workflows/` if disabled, uncomments the `push:` trigger |
| `reset-vault.sh` | dev workstation | recovery from a vault password mismatch (destructive — re-prompts) |
| `.env.example` | template | copy to `.env`, fill in, gitignored |
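The Forgejo API helpers in `lib.sh` boil down to something like the following (a hypothetical sketch: `forgejo_api` and `FORGEJO_URL` are illustrative names, not `lib.sh`'s actual interface):

```sh
# Hypothetical helper; lib.sh's real function and variable names may differ.
forgejo_api() {
  local path=$1
  curl -fsS ${FORGEJO_INSECURE:+-k} \
       -H "Authorization: token ${FORGEJO_ADMIN_TOKEN}" \
       "${FORGEJO_URL}/api/v1${path}"
}

# e.g. forgejo_api /repos/<owner>/<repo> | jq .default_branch
```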
## State file

Each host keeps a per-host state file with `phase=DONE timestamp`
lines, so a re-run is a no-op for completed phases:

- local: `<repo>/.git/talas-bootstrap/local.state`
- R720: `/var/lib/talas/bootstrap.state`

To force a phase re-run, delete its line:

```sh
sed -i '/^vault=/d' .git/talas-bootstrap/local.state
```
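The skip logic this format enables looks roughly like this (a sketch: `phase_done` and `run_phase` are illustrative names, not `lib.sh`'s actual helpers):

```sh
# Illustrative only; the real state handling lives in lib.sh.
STATE_FILE=/var/lib/talas/bootstrap.state   # local.state on the workstation

phase_done() { grep -q "^$1=DONE" "$STATE_FILE" 2>/dev/null; }

run_phase() {
  local name=$1; shift
  phase_done "$name" && return 0            # completed phase: re-run is a no-op
  "$@" || return 1                          # run the phase's actual work
  echo "$name=DONE $(date -Is)" >> "$STATE_FILE"
}
```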
## Inter-script communication

`bootstrap-local.sh` invokes `bootstrap-remote.sh` over SSH by
concatenating `lib.sh` + `bootstrap-remote.sh` and piping the result
into `sudo -E bash -s` on the R720. The remote script:

- writes `/var/log/talas-bootstrap.log` on the R720 (persistent)
- emits `>>>PHASE:<name>:<status><<<` markers on stdout
- the local script tees those markers to stderr, so the operator sees remote progress in the same terminal as the local logs

Resumability: thanks to the state file, an SSH disconnect or partial
failure leaves whatever work completed marked DONE. Re-run
`bootstrap-local.sh` and it picks up where it stopped.
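Reduced to its essentials, the pattern is (a sketch; the real script adds env handling and error trapping):

```sh
cat lib.sh bootstrap-remote.sh \
  | ssh "${R720_USER}@${R720_HOST}" sudo -E bash -s \
  | while IFS= read -r line; do
      case $line in
        '>>>PHASE:'*) printf '%s\n' "$line" >&2 ;;  # surface phase markers on stderr
        *)            printf '%s\n' "$line" ;;      # pass everything else through
      esac
    done
```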
## Quickstart

```sh
cd /home/senke/git/talas/veza/scripts/bootstrap
cp .env.example .env
$EDITOR .env          # fill in FORGEJO_ADMIN_TOKEN at minimum
chmod +x *.sh

# Set up everything
./bootstrap-local.sh

# Or skip phases you've already done
PHASE=4 ./bootstrap-local.sh

# Verify any time
./verify-local.sh
ssh ansible@10.0.20.150 'sudo bash' < verify-remote.sh
```
## What each phase needs

| Phase | Needs |
|---|---|
| 1. preflight | `git`, `ansible`, `dig`, `ssh`, `jq` locally; SSH to the R720; DNS resolved (warning only if missing; see the sketch below the table) |
| 2. vault | nothing; prompts for the vault password and edits `vault.yml` from the template |
| 3. forgejo | `FORGEJO_ADMIN_TOKEN` env var or in `.env` |
| 4. r720 | `FORGEJO_ADMIN_TOKEN` (used to fetch the runner registration token); SSH to the R720 with sudo |
| 5. haproxy | public domains resolved in DNS + port 80 reachable from the Internet; ansible-decryptable vault |
| 6. summary | nothing |
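Phase 1's checks amount to roughly the following (a sketch of the idea, not the script's exact code):

```sh
# Local tools (hard failure if missing)
for tool in git ansible dig ssh jq; do
  command -v "$tool" >/dev/null || { echo "missing: $tool" >&2; exit 1; }
done
# SSH to the R720 (hard failure)
ssh -o BatchMode=yes "${R720_USER}@${R720_HOST}" /bin/true \
  || { echo "cannot SSH to R720" >&2; exit 1; }
# DNS (warning only, matching the table above)
[ -n "$(dig +short forgejo.talas.group)" ] || echo "warn: DNS not resolved yet" >&2
```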
## Troubleshooting

- Phase 1 SSH fails — verify `R720_HOST` + `R720_USER` in `.env`. If you use an SSH config alias (e.g. `Host srv-102v` in `~/.ssh/config`), set `R720_HOST=srv-102v` and either set `R720_USER=` (empty; the alias's `User=` wins) or match the alias's user. Test manually: `ssh ${R720_USER}@${R720_HOST} /bin/true`.
- Phase 2 `cannot decrypt vault.yml` — the password in `.vault-pass` doesn't match the one used to encrypt `vault.yml`.
  - If you remember the original password, edit `.vault-pass` (`echo "<correct password>" > infra/ansible/.vault-pass ; chmod 0400 …`).
  - Otherwise: `./reset-vault.sh` — destructive, re-prompts for everything.
- Phase 3 `Forgejo API unreachable` — Forgejo on `https://10.0.20.105:3000` serves a self-signed cert. Set `FORGEJO_INSECURE=1` in `.env`. Once the edge HAProxy is up and LE has issued `forgejo.talas.group`, switch to that URL and clear `FORGEJO_INSECURE`.
- Phase 3 `repo not found` — set `FORGEJO_OWNER` to the actual org/user owning the repo. Confirm with `git remote -v` (the path segment after `host:port/`).
- Phase 4 SSH timeout / sudo prompt — passwordless sudo is needed for the SSH user. Add to `/etc/sudoers.d/talas-bootstrap`:

  ```
  senke ALL=(ALL) NOPASSWD: /usr/bin/bash
  ```

  Or run the remote half manually:

  ```sh
  scp scripts/bootstrap/{lib.sh,bootstrap-remote.sh} srv-102v:/tmp/
  ssh srv-102v 'sudo FORGEJO_REGISTRATION_TOKEN=<token> bash /tmp/bootstrap-remote.sh'
  ```

- Phase 5 dehydrated fails — port 80 must be reachable from the Internet for HTTP-01 (not blocked by the ISP, NAT-forwarded). Test from outside: `curl http://veza.fr/.well-known/acme-challenge/test` should hit HAProxy's `letsencrypt_backend` (it will 404, which is fine; what matters is reaching the R720).
- `.forgejo/workflows/` is missing, only `workflows.disabled/` is present — expected when the auto-trigger has been gated by renaming the dir. `enable-auto-deploy.sh` restores it.
## After bootstrap

- Trigger the first deploy manually via the Forgejo UI: Actions → Veza deploy → Run workflow.
- Once green, run `./enable-auto-deploy.sh` to re-enable the push trigger.
- `verify-local.sh` + `verify-remote.sh` are safe to run any time.