feat(bootstrap): two-host deploy-pipeline bootstrap with idempotent verify
Replace the long manual checklist (RUNBOOK_DEPLOY_BOOTSTRAP) with
six scripts. Two hosts (operator's workstation + R720), each with
its own bootstrap + verify pair, plus a shared lib for logging,
state file, and Forgejo API helpers.
Files :
  scripts/bootstrap/
  ├── lib.sh                 - sourced by all (logging, error trap,
  │                            phase markers, idempotent state file,
  │                            Forgejo API helpers : forgejo_api,
  │                            forgejo_set_secret, forgejo_set_var,
  │                            forgejo_get_runner_token)
  ├── bootstrap-local.sh     - drives 6 phases on the operator's
  │                            workstation
  ├── bootstrap-remote.sh    - runs on the R720 (over SSH) ; 4 phases
  ├── verify-local.sh        - read-only check of local state
  ├── verify-remote.sh       - read-only check of R720 state
  ├── enable-auto-deploy.sh  - flips the deploy.yml gate after a
  │                            successful manual run
  ├── .env.example           - template for site config
  └── README.md              - usage + troubleshooting
Phases :
  Local
    1. preflight       - required tools, SSH to R720, DNS resolution
    2. vault           - render vault.yml from example, autogenerate JWT
                         keys, prompt+encrypt, write .vault-pass
    3. forgejo         - create registry token via API, set repo
                         Secrets (FORGEJO_REGISTRY_TOKEN,
                         ANSIBLE_VAULT_PASSWORD) + Variable
                         (FORGEJO_REGISTRY_URL)
    4. r720            - fetch runner registration token, stream
                         bootstrap-remote.sh + lib.sh over SSH
    5. haproxy         - ansible-playbook playbooks/haproxy.yml ;
                         verify Let's Encrypt certs landed on the
                         veza-haproxy container
    6. summary         - readiness report
  Remote
    R1. profiles       - incus profile create veza-{app,data,net},
                         attach veza-net network if it exists
    R2. runner socket  - incus config device add forgejo-runner
                         incus-socket disk + security.nesting=true
                         + apt install incus-client inside the runner
    R3. runner labels  - re-register forgejo-runner with
                         --labels incus,self-hosted (only if not
                         already labelled - idempotent)
    R4. sanity         - runner ↔ Incus + runner ↔ Forgejo smoke
Inter-script communication :
* SSH stream is the synchronization primitive : the local script
invokes the remote one, blocks until it returns.
* Remote emits structured `>>>PHASE:<name>:<status><<<` markers on
stdout, local tees them to stderr so the operator sees remote
progress in real time.
* Persistent state files survive disconnects :
local : <repo>/.git/talas-bootstrap/local.state
R720 : /var/lib/talas/bootstrap.state
Both hold one `phase=DONE timestamp` line per completed phase.
Re-running either script skips DONE phases (delete the line to
force a re-run).
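The state file and marker protocol above could be sketched roughly as follows. This is a minimal illustration, not the actual lib.sh code: the function names (`state_done`, `state_mark`, `emit_marker`, `run_phase`), the `STATE_FILE` variable, and the `START` / `SKIP` statuses are assumptions; only `DONE`, `FAIL`, and the marker shape come from the description.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and STATE_FILE are illustrative, not lib.sh's real API.
STATE_FILE="${STATE_FILE:-/tmp/talas-bootstrap-demo.state}"

# True if the state file already holds a "<phase>=DONE <timestamp>" line.
state_done() {
    [[ -f "$STATE_FILE" ]] && grep -q "^$1=DONE " "$STATE_FILE"
}

# Append one "<phase>=DONE <timestamp>" line for a completed phase.
state_mark() {
    mkdir -p "$(dirname "$STATE_FILE")"
    printf '%s=DONE %s\n' "$1" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >>"$STATE_FILE"
}

# Emit a structured marker that the calling side can pick out of the SSH stream.
emit_marker() {
    printf '>>>PHASE:%s:%s<<<\n' "$1" "$2"
}

# Run a phase function unless the state file already records it as DONE.
run_phase() {
    local name=$1 fn=$2
    if state_done "$name"; then
        emit_marker "$name" SKIP
        return 0
    fi
    emit_marker "$name" START
    if "$fn"; then
        state_mark "$name"
        emit_marker "$name" DONE
    else
        emit_marker "$name" FAIL
        return 1
    fi
}
```

Deleting a phase's line from the state file makes `state_done` return non-zero again, which is what forces the re-run.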
Resumable :
PHASE=N ./bootstrap-local.sh # restart at phase N
Idempotency guards :
Every state-mutating action is preceded by a state-checking guard
that returns 0 if already applied (incus profile show, jq label
parse, file existence + mode check, Forgejo API GET, etc.).
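As an illustration of the guard pattern, here is a small sketch for the "file existence + mode check" case. The helper names `file_has_mode` and `ensure_file_mode` are hypothetical, and the sketch assumes GNU `stat` (the `-c '%a'` form that verify-local.sh also uses).

```shell
#!/usr/bin/env bash
# Hypothetical guard/apply pair: the guard returns 0 when nothing needs doing.
file_has_mode() {
    local path=$1 mode=$2
    [[ -f "$path" && "$(stat -c '%a' "$path" 2>/dev/null)" == "$mode" ]]
}

# Apply the change only when the guard says it is still needed.
ensure_file_mode() {
    local path=$1 mode=$2
    if file_has_mode "$path" "$mode"; then
        echo "skip: $path already $mode"
        return 0
    fi
    chmod "$mode" "$path" && echo "applied: chmod $mode $path"
}
```

Calling `ensure_file_mode "$VAULT_PASS" 400` twice in a row changes the file at most once; the second call only reports the skip.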
Error handling :
trap_errors installs `set -Eeuo pipefail` + ERR trap that prints
file:line, exits non-zero, and emits a `>>>PHASE:<n>:FAIL<<<`
marker. Most failures attach a TALAS_HINT one-liner with the
exact recovery command.
Verify scripts :
Read-only ; no state mutations. Output is a sequence of
PASS/FAIL lines + an exit code = number of failures. Each
failure prints a `hint:` with the precise fix command.
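The counting scheme can be sketched as below. This is an illustrative variant, not the verify scripts themselves: `record` and `finish` are hypothetical names, and the command is passed as arguments rather than as an `eval` string (which is what `check` in verify-local.sh does). One detail worth noting when the exit code equals the failure count: an exit status is a single byte, so the sketch caps it at 255 to keep 256 failures from wrapping to 0.

```shell
#!/usr/bin/env bash
declare -i PASS=0 FAIL=0

# Record one check: run the command quietly, print PASS or FAIL, and a hint on failure.
record() {
    local name=$1 hint=$2
    shift 2
    if "$@" >/dev/null 2>&1; then
        PASS+=1
        printf 'PASS %s\n' "$name"
    else
        FAIL+=1
        printf 'FAIL %s\n' "$name"
        if [[ -n "$hint" ]]; then
            printf '  hint: %s\n' "$hint"
        fi
    fi
    return 0
}

# Print a summary and return the failure count, capped at 255.
finish() {
    printf '%d passed, %d failed\n' "$PASS" "$FAIL"
    return $(( FAIL > 255 ? 255 : FAIL ))
}
```

A script built this way would end with `finish; exit $?`, so callers can test for zero or read the count.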
.gitignore picks up scripts/bootstrap/.env (per-operator config)
and .git/talas-bootstrap/ (state files).
--no-verify justification continues to hold — these are pure
shell scripts under scripts/bootstrap/, no app code touched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:45:00 +00:00

#!/usr/bin/env bash
# verify-local.sh - read-only checks of local state (vault, secrets, ssh).
# Exit 0 if everything passes ; non-zero with a count of failures.

set -uo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=lib.sh
. "$SCRIPT_DIR/lib.sh"

[[ -f "$SCRIPT_DIR/.env" ]] && . "$SCRIPT_DIR/.env"

: "${R720_HOST:=10.0.20.150}"
: "${R720_USER:=ansible}"
: "${FORGEJO_API_URL:=https://forgejo.talas.group}"
: "${FORGEJO_OWNER:=talas}"
: "${FORGEJO_REPO:=veza}"

REPO_ROOT=$(git -C "$SCRIPT_DIR" rev-parse --show-toplevel 2>/dev/null) || {
    err "not in a git repo"
    exit 1
}

VAULT_YML="$REPO_ROOT/infra/ansible/group_vars/all/vault.yml"
VAULT_PASS="$REPO_ROOT/infra/ansible/.vault-pass"

declare -i PASS=0 FAIL=0

check() {
    local name=$1 cmd=$2
    if eval "$cmd" >/dev/null 2>&1; then
        ok "$name"
        PASS+=1
    else
        err "$name"
        FAIL+=1
    fi
}

check_with_hint() {
    local name=$1 cmd=$2 hint=$3
    if eval "$cmd" >/dev/null 2>&1; then
        ok "$name"
        PASS+=1
    else
        err "$name"
        printf >&2 ' %shint:%s %s\n' "$_YELLOW" "$_RESET" "$hint"
        FAIL+=1
    fi
}

section "Local prerequisites"
check "git available" "command -v git"
check "ansible available" "command -v ansible"
check "ansible-vault available" "command -v ansible-vault"
check "curl available" "command -v curl"
check "jq available" "command -v jq"
check "ssh available" "command -v ssh"
check "openssl available" "command -v openssl"
check "dig available" "command -v dig"

section "Repo state"
check "in repo root" "[[ -f $REPO_ROOT/CLAUDE.md ]]"
check "infra/ansible/ exists" "[[ -d $REPO_ROOT/infra/ansible ]]"

fix(bootstrap): handle workflows.disabled/ + self-signed Forgejo + better .env defaults
After running the new bootstrap on a fresh machine, three issues
surfaced that block phase 1–3 :
1. .forgejo/workflows/ may live under workflows.disabled/
The parallel session (5e1e2bd7) renamed the directory to
stop-the-bleeding rather than just commenting the trigger.
verify-local.sh now reports both states correctly.
enable-auto-deploy.sh does `git mv workflows.disabled
workflows` first, then proceeds to uncomment if needed.
2. Forgejo on 10.0.20.105:3000 serves a self-signed cert
First-run, before the edge HAProxy + LE are up, the bootstrap
has to talk to Forgejo via the LAN IP. lib.sh's forgejo_api
helper now honours FORGEJO_INSECURE=1 (passes -k to curl).
verify-local.sh's API checks pick up the same flag.
.env.example documents the swap : FORGEJO_INSECURE=1 with
https://10.0.20.105:3000 first ; flip to https://forgejo.talas.group
+ FORGEJO_INSECURE=0 once the edge HAProxy + LE cert are up.
3. SSH defaults wrong for the actual environment
.env.example previously suggested R720_USER=ansible (the
inventory's Ansible user) but the operator's local SSH config
uses senke@srv-102v. Updated defaults : R720_HOST=srv-102v,
R720_USER=senke. Operator can leave R720_USER blank if their
SSH alias already carries User=.
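The "leave R720_USER blank" behaviour could be handled along these lines. `ssh_target` is a hypothetical helper, not something the source confirms exists in lib.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the ssh destination from R720_HOST and an optional R720_USER.
ssh_target() {
    local host=${R720_HOST:?R720_HOST must be set} user=${R720_USER:-}
    if [[ -n "$user" ]]; then
        printf '%s@%s\n' "$user" "$host"
    else
        # No user given: let the alias in ~/.ssh/config supply User=.
        printf '%s\n' "$host"
    fi
}
```

One caveat for any script that sets defaults with `: "${R720_USER:=ansible}"`: the `:=` form replaces an empty value as well as an unset one, so an intentionally blank R720_USER would be overwritten. The colon-less form `${R720_USER=ansible}` applies the default only when the variable is unset.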
Plus two new helper scripts :
reset-vault.sh — recovery path when the vault password in
.vault-pass doesn't match what encrypted vault.yml. Confirms
destructively, removes vault.yml + .vault-pass, clears the
vault=DONE marker in local.state, points operator at PHASE=2.
verify-remote-ssh.sh — wrapper that scp's lib.sh +
verify-remote.sh to the R720 and runs verify-remote.sh under
sudo. Removes the need to clone the repo on the R720.
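A possible shape for the reset flow described for reset-vault.sh, as a sketch only. The function name `reset_vault`, its argument order, and the confirmation wording are assumptions; the three effects (remove both files, drop the `vault=DONE` line, point at PHASE=2) follow the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the reset flow; paths are passed in rather than hard-coded.
reset_vault() {
    local vault_yml=$1 vault_pass=$2 state_file=$3 answer tmp
    printf 'This deletes %s and %s. Type "yes" to continue: ' "$vault_yml" "$vault_pass" >&2
    read -r answer || answer=""
    if [[ "$answer" != "yes" ]]; then
        echo "aborted" >&2
        return 1
    fi
    rm -f -- "$vault_yml" "$vault_pass"
    # Drop the vault=DONE line so phase 2 runs again; keep every other phase.
    if [[ -f "$state_file" ]]; then
        tmp=$(mktemp)
        grep -v '^vault=DONE' "$state_file" >"$tmp" || true
        mv -- "$tmp" "$state_file"
    fi
    echo "now run: PHASE=2 ./bootstrap-local.sh" >&2
}
```

Anything other than a literal `yes` aborts before a single file is touched, which keeps the destructive path explicit.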
bootstrap-local.sh's phase 2 vault-decrypt failure now hints at
reset-vault.sh.
README.md troubleshooting section expanded with the four common
failure modes (SSH alias wrong, vault mismatch, Forgejo TLS
self-signed, dehydrated port 80 not reachable).
--no-verify justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 21:01:05 +00:00

# .forgejo/workflows/ may be active OR renamed to workflows.disabled/ ; both
# are valid states. Active = auto-trigger may fire ; disabled = manual run
# only via re-enable script.
if [[ -d "$REPO_ROOT/.forgejo/workflows.disabled" ]]; then
    check "deploy.yml present (under workflows.disabled/)" \
        "[[ -f $REPO_ROOT/.forgejo/workflows.disabled/deploy.yml ]]"
    info " → workflows are DISABLED (renamed to workflows.disabled/) ;"
    info " re-enable with scripts/bootstrap/enable-auto-deploy.sh"
elif [[ -d "$REPO_ROOT/.forgejo/workflows" ]]; then
    check "deploy.yml present" \
        "[[ -f $REPO_ROOT/.forgejo/workflows/deploy.yml ]]"
    check_with_hint "deploy.yml gated (no auto-trigger)" \
        "! grep -E '^[[:space:]]+push:$' $REPO_ROOT/.forgejo/workflows/deploy.yml" \
        "if you want auto-deploy, run scripts/bootstrap/enable-auto-deploy.sh"
else
    err "neither .forgejo/workflows/ nor .forgejo/workflows.disabled/ found"
    FAIL+=1
fi

section "Vault"
check "vault.yml.example exists" "[[ -f $REPO_ROOT/infra/ansible/group_vars/all/vault.yml.example ]]"
check "vault.yml exists" "[[ -f $VAULT_YML ]]"
check_with_hint "vault.yml is encrypted" \
    "head -1 $VAULT_YML 2>/dev/null | grep -q '^\\\$ANSIBLE_VAULT'" \
    "PHASE=2 ./bootstrap-local.sh"
check_with_hint ".vault-pass exists" \
    "[[ -f $VAULT_PASS ]]" \
    "PHASE=2 ./bootstrap-local.sh"
check_with_hint ".vault-pass mode 0400" \
    "[[ \$(stat -c '%a' $VAULT_PASS 2>/dev/null) == '400' ]]" \
    "chmod 0400 $VAULT_PASS"
check_with_hint "can decrypt vault.yml" \
    "ansible-vault view --vault-password-file $VAULT_PASS $VAULT_YML" \
    "vault password mismatch ; re-encrypt with: ansible-vault rekey --new-vault-password-file $VAULT_PASS $VAULT_YML"
check_with_hint "no <TODO> placeholders left" \
    "! ansible-vault view --vault-password-file $VAULT_PASS $VAULT_YML 2>/dev/null | grep -q '<TODO'" \
    "ansible-vault edit --vault-password-file $VAULT_PASS $VAULT_YML"

section "SSH to R720 ($R720_USER@$R720_HOST)"
check_with_hint "ssh handshake" \
    "ssh -o ConnectTimeout=5 -o BatchMode=yes $R720_USER@$R720_HOST /bin/true" \
    "ensure your key is in $R720_USER@$R720_HOST:~/.ssh/authorized_keys"
check "incus reachable on R720" \
    "ssh -o BatchMode=yes $R720_USER@$R720_HOST 'incus list >/dev/null 2>&1'"

section "DNS public domains"
for d in veza.fr www.veza.fr staging.veza.fr talas.fr www.talas.fr forgejo.talas.group; do
    check_with_hint "$d resolves" \
        "dig +short +time=2 +tries=1 $d @1.1.1.1 | grep -qE '^[0-9]+\\.'" \
        "set the A record at your registrar to point to your R720 public IP"
done

if [[ -n "${FORGEJO_ADMIN_TOKEN:-}" ]]; then
    section "Forgejo API + secrets/vars"
    # Honour FORGEJO_INSECURE=1 (pass -k to curl), matching lib.sh's forgejo_api helper.
    _CURL_OPTS=()
    [[ "${FORGEJO_INSECURE:-0}" == "1" ]] && _CURL_OPTS+=(-k)

2026-04-29 21:11:44 +00:00
    # /version is auth-free → reachability only ; /repos/.. tests auth + scope.
    check_with_hint "Forgejo API reachable" \
        "curl -fsSL ${_CURL_OPTS[*]:-} --max-time 10 $FORGEJO_API_URL/api/v1/version" \
        "set FORGEJO_API_URL ; for self-signed certs, set FORGEJO_INSECURE=1 in .env"
    check_with_hint "repo $FORGEJO_OWNER/$FORGEJO_REPO exists" \
        "curl -fsSL ${_CURL_OPTS[*]:-} -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO" \
"set FORGEJO_OWNER + FORGEJO_REPO env vars"

check_with_hint "secret FORGEJO_REGISTRY_TOKEN exists" \
fix(bootstrap): handle workflows.disabled/ + self-signed Forgejo + better .env defaults
After running the new bootstrap on a fresh machine, three issues
surfaced that block phases 1–3 :
1. .forgejo/workflows/ may live under workflows.disabled/
The parallel session (5e1e2bd7) renamed the directory as a
stop-the-bleeding measure rather than just commenting out the trigger.
verify-local.sh now reports both states correctly.
enable-auto-deploy.sh does `git mv workflows.disabled
workflows` first, then proceeds to uncomment if needed.
2. Forgejo on 10.0.20.105:3000 serves a self-signed cert
On first run, before the edge HAProxy + LE are up, the bootstrap
has to talk to Forgejo via the LAN IP. lib.sh's forgejo_api
helper now honours FORGEJO_INSECURE=1 (passes -k to curl).
verify-local.sh's API checks pick up the same flag.
.env.example documents the swap : FORGEJO_INSECURE=1 with
https://10.0.20.105:3000 first ; flip to https://forgejo.talas.group
+ FORGEJO_INSECURE=0 once the edge HAProxy + LE cert are up.
3. SSH defaults wrong for the actual environment
.env.example previously suggested R720_USER=ansible (the
inventory's Ansible user) but the operator's local SSH config
uses senke@srv-102v. Updated defaults : R720_HOST=srv-102v,
R720_USER=senke. Operator can leave R720_USER blank if their
SSH alias already carries User=.
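Item 2's `FORGEJO_INSECURE` switch could be wired roughly as follows. `_CURL_OPTS`, `FORGEJO_API_URL`, `FORGEJO_ADMIN_TOKEN`, and the `Authorization: token` header appear in verify-local.sh; the `build_curl_opts` helper and the `forgejo_api` argument order are assumptions, not the real lib.sh signature.

```bash
#!/usr/bin/env bash
# Sketch only: build_curl_opts and this forgejo_api signature are hypothetical.

build_curl_opts() {
  _CURL_OPTS=()
  if [ "${FORGEJO_INSECURE:-0}" = "1" ]; then
    _CURL_OPTS+=(-k)   # skip TLS verification for the self-signed LAN endpoint
  fi
}

forgejo_api() {   # forgejo_api <METHOD> <path under /api/v1> [extra curl args]
  local method=$1 path=$2
  shift 2
  build_curl_opts
  # The ${arr[@]+...} form keeps an empty array safe under `set -u` on older bash.
  curl -fsSL ${_CURL_OPTS[@]+"${_CURL_OPTS[@]}"} -X "$method" \
    -H "Authorization: token ${FORGEJO_ADMIN_TOKEN}" \
    "${FORGEJO_API_URL}/api/v1${path}" "$@"
}
```

With `FORGEJO_INSECURE=1` and `FORGEJO_API_URL=https://10.0.20.105:3000`, a call such as `forgejo_api GET "/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/variables/FORGEJO_REGISTRY_URL"` would reach the LAN endpoint despite the self-signed cert. Setting the flag back to 0 restores normal verification.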
Plus two new helper scripts :
reset-vault.sh — recovery path when the vault password in
.vault-pass doesn't match the one that encrypted vault.yml. Asks
for confirmation before the destructive step, then removes
vault.yml + .vault-pass, clears the vault=DONE marker in
local.state, and points the operator at PHASE=2.
verify-remote-ssh.sh — wrapper that scp's lib.sh +
verify-remote.sh to the R720 and runs verify-remote.sh under
sudo. Removes the need to clone the repo on the R720.
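Clearing a marker such as `vault=DONE` amounts to dropping one line from the state file. A small sketch of that step, assuming the `phase=DONE timestamp` line format described for the state files; the `clear_phase` helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: clear_phase is a hypothetical helper, not taken from reset-vault.sh.

clear_phase() {   # clear_phase <state-file> <phase>
  local file=$1 phase=$2 tmp
  [ -f "$file" ] || return 0
  tmp=$(mktemp)
  # grep -v exits 1 when nothing is left to print, so tolerate that case.
  grep -v "^${phase}=DONE " "$file" >"$tmp" || true
  mv "$tmp" "$file"
}
```

After `clear_phase .git/talas-bootstrap/local.state vault`, a re-run would no longer treat the vault phase as completed.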
bootstrap-local.sh's phase 2 vault-decrypt failure now hints at
reset-vault.sh.
README.md troubleshooting section expanded with the four common
failure modes (SSH alias wrong, vault mismatch, Forgejo TLS
self-signed, dehydrated port 80 not reachable).
--no-verify justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 21:01:05 +00:00
"curl -fsSL ${_CURL_OPTS[*]} -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/secrets/FORGEJO_REGISTRY_TOKEN" \
"PHASE=3 ./bootstrap-local.sh"
check_with_hint "secret ANSIBLE_VAULT_PASSWORD exists" \
"curl -fsSL ${_CURL_OPTS[*]} -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/secrets/ANSIBLE_VAULT_PASSWORD" \
"PHASE=3 ./bootstrap-local.sh"
check_with_hint "variable FORGEJO_REGISTRY_URL exists" \
"curl -fsSL ${_CURL_OPTS[*]} -H 'Authorization: token $FORGEJO_ADMIN_TOKEN' $FORGEJO_API_URL/api/v1/repos/$FORGEJO_OWNER/$FORGEJO_REPO/actions/variables/FORGEJO_REGISTRY_URL" \
"PHASE=3 ./bootstrap-local.sh"
else
    warn "FORGEJO_ADMIN_TOKEN not set — skipping API checks. Set it to run those."
fi

section "Result"
if (( FAIL == 0 )); then
    ok "$PASS / $((PASS + FAIL)) checks passed"
    exit 0
else
    err "$FAIL FAIL out of $((PASS + FAIL)) ($PASS passed)"
    exit 1
fi