# `scripts/bootstrap/` — bootstrap the Veza deploy pipeline

Two parallel scripts (one per host) + four helpers + one shared lib. Each script is **idempotent**, **resumable**, and **read-only by default** unless explicitly asked to mutate. **No NOPASSWD sudo required**.

The heavy lifting (Incus profiles, forgejo-runner config, HAProxy edge, Let's Encrypt) is done by **Ansible playbooks**, not bash. The shell scripts are thin orchestrators that handle the chicken-and-egg part: create the Vault that Ansible needs, set the Forgejo CI secrets, then call `ansible-playbook`.

## Files

| File | Where it runs | What it does |
|---|---|---|
| `lib.sh` | sourced by all | logging, error trap, idempotent state file, Forgejo API helpers |
| `bootstrap-local.sh` | operator's laptop | drives Ansible over SSH; `--ask-become-pass` on the R720 |
| `bootstrap-r720.sh` | R720 directly (sudo) | drives Ansible locally (`connection: local`); no SSH, no sudo prompts |
| `verify-local.sh` | laptop | read-only checks of local + remote state |
| `verify-r720.sh` | R720 (sudo) | read-only checks of R720 state |
| `enable-auto-deploy.sh` | laptop | restores `.forgejo/workflows/`, uncomments the `push:` trigger |
| `reset-vault.sh` | laptop | recovery from vault password mismatch (destructive) |
| `.env.example` | template | copy to `.env`, fill in, gitignored |

## Two scripts, one Ansible

Both `bootstrap-local.sh` and `bootstrap-r720.sh` end up running the **same two playbooks**:

* `playbooks/bootstrap_runner.yml` — Incus profiles + forgejo-runner Incus access + runner registration with the `incus` label
* `playbooks/haproxy.yml` — edge HAProxy container + dehydrated Let's Encrypt issuance for veza.fr / staging.veza.fr / talas.fr / forgejo.talas.group

The difference is the **inventory**:

* laptop → `inventory/staging.yml` (SSH to the R720, `--ask-become-pass`)
* R720 → `inventory/local.yml` (`connection: local`, already root)

Pick whichever is convenient.
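The idempotent/resumable behaviour can be sketched as follows. This is illustrative only: `run_phase` and `STATE_FILE` are assumed names, not the real `lib.sh` API.

```shell
#!/usr/bin/env bash
# Sketch of the phase/state-file pattern: each completed phase appends a
# "phase=DONE timestamp" line; a re-run skips phases already marked DONE.
set -euo pipefail

STATE_FILE="$(mktemp)"   # the real scripts use .git/talas-bootstrap/ or /var/lib/talas/

run_phase() {            # run_phase <name> <command...>
  local name=$1; shift
  if grep -q "^${name}=DONE" "$STATE_FILE"; then
    echo "skip ${name} (already DONE)"
    return 0
  fi
  "$@"                   # the actual work, e.g. an ansible-playbook run
  echo "${name}=DONE $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$STATE_FILE"
  echo "done ${name}"
}

run_phase vault   true
run_phase ansible true
run_phase vault   true   # second call is a no-op: the phase is already DONE
```

Deleting a phase's line from the state file is all it takes to force that phase to run again, which is exactly what the `sed` recipe below does.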
The state files are independent (the laptop keeps state under `.git/talas-bootstrap/`, the R720 under `/var/lib/talas/`), so running both at different times doesn't double-do anything.

## State files

```
laptop : .git/talas-bootstrap/local.state
R720   : /var/lib/talas/r720-bootstrap.state
```

Each completed phase appends a `phase=DONE timestamp` line. Re-runs skip phases already marked DONE. To force a phase to re-run, delete its line:

```bash
sed -i '/^vault=/d' .git/talas-bootstrap/local.state
```

## Quickstart — from the laptop

```bash
cd /home/senke/git/talas/veza/scripts/bootstrap
cp .env.example .env
vim .env        # at minimum: FORGEJO_ADMIN_TOKEN
chmod +x *.sh

# Set up everything end-to-end:
./bootstrap-local.sh

# Or skip phases you've already done:
PHASE=4 ./bootstrap-local.sh

# Verify any time (read-only):
./verify-local.sh
```

## Quickstart — directly on the R720

```bash
ssh srv-102v
cd /path/to/veza/scripts/bootstrap
cp .env.example .env
vim .env        # FORGEJO_ADMIN_TOKEN at minimum
sudo ./bootstrap-r720.sh

# Verify:
sudo ./verify-r720.sh
```

## Sudo on the R720 — the design choice

The bash scripts **do not require NOPASSWD sudo** on the R720. Two reasons:

1. **Trust boundary** — NOPASSWD turns any compromise of the operator's account into root on the host. Keeping the password requirement means an attacker also needs to phish/keylog the sudo password.
2. **Ansible's `--ask-become-pass`** is fine for interactive runs. The operator types the password once per `bootstrap-local.sh` invocation; Ansible holds it in memory and reuses it for every `become: true` task. No file written, no env var leaked.

`pipelining = False` in `ansible.cfg` is what makes interactive `--ask-become-pass` reliable (the previous `True` setting raced sudo's TTY-driven prompt).

## What each phase needs

| Phase | Needs |
|---|---|
| 1. preflight | git, ansible, dig, ssh, jq locally; SSH to the R720 (laptop); DNS resolved (warning if missing) |
| 2. vault | nothing; auto-generates JWT + 11 random passwords, prompts for the vault password |
| 3. forgejo | `FORGEJO_ADMIN_TOKEN` (`.env` or env) — scopes: `write:repository`, `read:repository` |
| 4. ansible | sudo password on the R720 (interactive; not stored) |
| 5. summary | nothing |

## Troubleshooting

* **Phase 1 SSH fails** — verify `R720_HOST` + `R720_USER` in `.env`. If using an SSH config alias, set `R720_HOST` to the alias and leave `R720_USER=` empty.
* **Phase 2 cannot decrypt** — run `./reset-vault.sh` (destructive; re-prompts for everything).
* **Phase 3 Forgejo unreachable** — set `FORGEJO_INSECURE=1` for the self-signed cert on `https://10.0.20.105:3000`. Update to `https://forgejo.talas.group` once the edge HAProxy + Let's Encrypt is up.
* **Phase 3 token lacks scope** — the token needs at minimum `write:repository`. `write:admin` lets the script auto-create the registry token; without it, you'll be prompted to paste one you create manually.
* **Phase 4 `Timeout waiting for privilege escalation prompt`** — set `pipelining = False` in `infra/ansible/ansible.cfg`. The current default is `False`; revert if it has been changed.
* **Phase 4 dehydrated fails** — port 80 must be reachable from the Internet (HTTP-01 challenge). Test from an external host: `curl http://veza.fr/`. If it doesn't reach the R720, configure port forwarding for 80 + 443 on your home router / ISP box.
* **Phase 4 Incus network not found** — group_vars defaults to `net-veza`. The script auto-detects from forgejo's network on the R720; if your bridge has a different name, set `veza_incus_network` in `group_vars/staging.yml` (or `inventory/local.yml` for the R720 case).

## After bootstrap

* Trigger the first deploy manually: Forgejo Actions UI → Veza deploy → Run workflow.
* Once it's green, run `./enable-auto-deploy.sh` to restore the `push:main` + `tag:v*` triggers.
* `verify-{local,r720}.sh` are safe to run at any time.
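The "uncomment the `push:` trigger" step of `enable-auto-deploy.sh` can be sketched like this; the workflow filename and the exact shape of the commented-out block are assumptions, not the real file:

```shell
#!/usr/bin/env bash
# Sketch only: the commented-out trigger block below is assumed, not copied
# from the real .forgejo/workflows/ file.
set -euo pipefail

wf="$(mktemp)"           # stand-in for a workflow file under .forgejo/workflows/
cat > "$wf" <<'EOF'
on:
  workflow_dispatch:
  # push:
  #   branches: [main]
  #   tags: ['v*']
EOF

# Strip the leading "# " from the commented-out trigger lines (GNU sed -i),
# leaving workflow_dispatch untouched.
sed -i 's/^  # /  /' "$wf"

cat "$wf"
```

After the `sed`, the `push:` trigger with `branches: [main]` and `tags: ['v*']` is live again, matching the `push:main` + `tag:v*` triggers described above.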