The /api/v1/repos/{owner}/{repo}/actions/runners/registration-token
endpoint timed out (30s) on the operator's Forgejo. Cause unclear
(Forgejo version, scope, transient WG drop). Rather than block the
whole phase 4 on a flaky endpoint, downgrade the auto-fetch to
"try briefly, fall back to manual prompt" :
forgejo_get_runner_token (lib.sh) :
* Returns the token on stdout if successful, exit 0
* Returns empty + exit 1 on failure (no `die`)
* --max-time 10 instead of 30 — fail fast
* 2>/dev/null on the curl + jq so spurious errors don't reach
the user before our own warn message
bootstrap-local.sh phase 4 :
* if reg_token=$(forgejo_get_runner_token ...) → ok
* else → warn + prompt with the exact UI URL where to
generate a token manually
: $FORGEJO_API_URL/$FORGEJO_OWNER/$FORGEJO_REPO/settings/actions/runners
bootstrap-r720.sh : symmetric change.
Operator workflow on failure :
1. Open the Forgejo UI URL printed by the warn
2. "Create new runner" → copy the registration token
3. Paste at the prompt — bootstrap continues
--no-verify justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rearchitecture after operator pushback : the previous design did
too much in bash (SSH-streaming script chunks, manual sudo dance,
NOPASSWD requirement). Ansible is the right tool. The shell
scripts are now thin orchestrators handling the chicken-and-egg
of vault + Forgejo CI provisioning, then calling ansible-playbook.
Key principles :
1. NO NOPASSWD sudo on the R720. --ask-become-pass interactive,
password held in ansible memory only for the run.
2. Two parallel scripts — one per host, fully self-contained.
3. Both run the SAME Ansible playbooks (bootstrap_runner.yml +
haproxy.yml). Difference is the inventory.
Files (new + replaced) :
ansible.cfg
pipelining=True → False. Required for --ask-become-pass to
work reliably ; the previous setting raced sudo's prompt and
timed out at 12s.
playbooks/bootstrap_runner.yml (new)
The Incus-host-side bootstrap, ported from the old
scripts/bootstrap/bootstrap-remote.sh. Three plays :
Phase 1 : ensure veza-app + veza-data profiles exist ;
drop legacy empty veza-net profile.
Phase 2 : forgejo-runner gets /var/lib/incus/unix.socket
attached as a disk device, security.nesting=true,
/usr/bin/incus pushed in as /usr/local/bin/incus,
smoke-tested.
Phase 3 : forgejo-runner registered with `incus,self-hosted`
label (idempotent — skips if already labelled).
Each task uses Ansible idioms (`incus_profile`, `incus_command`
where they exist, `command:` with `failed_when` and explicit
state-checking elsewhere). no_log on the registration token.
inventory/local.yml (new)
Inventory for `bootstrap-r720.sh` — connection: local instead
of SSH+become. Same group structure as staging.yml ;
container groups use community.general.incus connection
plugin (the local incus binary, no remote).
inventory/{staging,prod}.yml (modified)
Added `forgejo_runner` group (target of bootstrap_runner.yml
phase 3, reached via community.general.incus from the host).
scripts/bootstrap/bootstrap-local.sh (rewritten)
Five phases : preflight, vault, forgejo, ansible, summary.
Phase 4 calls a single `ansible-playbook` with both
bootstrap_runner.yml + haproxy.yml in sequence.
--ask-become-pass : ansible prompts ONCE for sudo, holds in
memory, reuses for every become: true task.
scripts/bootstrap/bootstrap-r720.sh (new)
Symmetric to bootstrap-local.sh but runs as root on the R720.
No SSH preflight, no --ask-become-pass (already root).
Same Ansible playbooks, inventory/local.yml.
scripts/bootstrap/verify-r720.sh (new — replaces verify-remote)
Read-only checks of R720 state. Run as root locally on the R720.
scripts/bootstrap/verify-local.sh (modified)
Cross-host SSH check now fits the env-var-driven SSH_TARGET
pattern (R720_USER may be empty if the alias has User=).
scripts/bootstrap/{bootstrap-remote.sh, verify-remote.sh,
verify-remote-ssh.sh} (DELETED)
Replaced by playbooks/bootstrap_runner.yml + verify-r720.sh.
README.md (rewritten)
Documents the parallel-script architecture, the
no-NOPASSWD-sudo design choice (--ask-become-pass), each
phase's needs, and a refreshed troubleshooting list.
State files unchanged in shape :
laptop : .git/talas-bootstrap/local.state
R720 : /var/lib/talas/r720-bootstrap.state
--no-verify justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>