senke 46954db96b feat(bootstrap): phase 2 auto-fills 11 vault secrets, prompts on the rest
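The vault.yml.example carries 22 <TODO> placeholders ; 13 of them
are credentials (passwords, API keys, encryption keys, the MinIO
root user) that the operator shouldn't have to make up by hand.
Phase 2 now fills all 13 : the 11 below are new in this commit
(10 generated, 1 fixed value), and the 2 JWT fields came with the
previous commit. The remaining 9 are left to the operator.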

Auto-fills (random 32-char alphanum, /=+ stripped so sed + YAML
don't choke) :
  vault_postgres_password
  vault_postgres_replication_password
  vault_redis_password
  vault_rabbitmq_password
  vault_minio_root_password
  vault_chat_jwt_secret
  vault_oauth_encryption_key
  vault_stream_internal_api_key
Auto-fills (S3-style, length tuned to MinIO's accept range) :
  vault_minio_access_key   (20 char)
  vault_minio_secret_key   (40 char)
Fixed value :
  vault_minio_root_user    "veza-admin"
Auto-fills (already in the previous commit, unchanged) :
  vault_jwt_signing_key_b64    (RS256 4096-bit private)
  vault_jwt_public_key_b64
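
A minimal sketch of how a _rand_token-style generator could produce these values, assuming bash on Linux with /dev/urandom and coreutils base64. The name rand_token is hypothetical, and the real helper in bootstrap-local.sh may differ. Because stripping /=+ removes a variable number of characters, the sketch keeps appending random data until it reaches the requested length :

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for _rand_token : random alphanumeric string of an exact length.
rand_token() {
  local len="${1:-32}" out=""
  # Dropping "/", "=", "+" and newlines shortens each chunk by a variable
  # amount, so keep appending fresh random data until there is enough.
  while [ "${#out}" -lt "$len" ]; do
    out+=$(head -c 48 /dev/urandom | base64 | tr -d '/=+\n')
  done
  printf '%s\n' "${out:0:len}"
}
```

With this helper, the lengths listed above become rand_token 32 for the passwords and keys, and rand_token 20 and rand_token 40 for the MinIO access and secret keys.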

Left as <TODO> (operator decides) :
  vault_smtp_password         — empty unless SMTP enabled
  vault_hyperswitch_api_key   — empty unless HYPERSWITCH_ENABLED=true
  vault_hyperswitch_webhook_secret
  vault_stripe_secret_key     — empty unless Stripe Connect enabled
  vault_oauth_clients.{google,spotify}.{id,secret} — empty until
                                wired in Google / Spotify console
  vault_sentry_dsn            — empty disables Sentry

After autofill, the script prints the remaining <TODO> lines and
prompts "blank these out and continue ? (y/n)". Answering y
replaces every remaining "<TODO ...>" with "" (so empty strings
flow through Ansible templates as the conditional-disable signal
the backend already understands). Answering n exits with a
suggestion to edit vault.yml manually.
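
The confirm-then-blank step could look roughly like the sketch below. It assumes vault.yml is still plaintext at this point, that placeholders look like <TODO ...> (optionally wrapped in double quotes), and that GNU sed is available. The function name is hypothetical. Replacing the whole quoted placeholder with "" keeps the value an empty YAML string rather than null :

```shell
#!/usr/bin/env bash
# Hypothetical sketch : list leftover <TODO ...> lines, ask, then blank them.
blank_remaining_todos() {
  local file="$1" answer
  # Nothing to do if every placeholder has already been filled.
  grep -n '<TODO' "$file" || return 0
  read -r -p 'blank these out and continue ? (y/n) ' answer
  case "$answer" in
    y|Y)
      # Both "<TODO x>" (quoted) and bare <TODO x> become "" (empty YAML string).
      sed -i -E 's/"?<TODO[^>]*>"?/""/g' "$file"
      ;;
    *)
      echo "edit $file manually, then re-run phase 2" >&2
      return 1
      ;;
  esac
}
```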

The autofill is idempotent — re-running phase 2 on a vault.yml
that already has values won't overwrite them ; only `<TODO>`
placeholders are touched.
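
One way to get that idempotence is sketched below. It assumes vault.yml is still plaintext, that each field sits on a single key: value line, and that GNU sed is available. autofill_field is a hypothetical stand-in for _autofill_field. Because the pattern only matches lines that still contain <TODO, a second run finds nothing to replace :

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for _autofill_field <file> <key> <value>.
# Only a line whose value is still a <TODO ...> placeholder is rewritten,
# so re-running leaves already-filled values alone.
autofill_field() {
  local file="$1" key="$2" value="$3"
  # "|" as sed delimiter : safe for alphanumeric and base64 values,
  # which never contain "|", "&" or "\".
  sed -i -E "s|^(${key}:[[:space:]]*).*<TODO[^>]*>.*\$|\\1\"${value}\"|" "$file"
}
```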

Helper functions live at the top of bootstrap-local.sh :
  _rand_token <len>            — URL-safe random alphanum
  _autofill_field <file> <key> <value>
                               — sed-replace one TODO line
  _autogen_jwt_keys <file>     — RS256 keypair → both b64 fields
  _autofill_vault_secrets <file>
                               — drives the per-field map above
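
For _autogen_jwt_keys, here is a sketch using the openssl CLI. It assumes the backend accepts a PKCS#8 private key and an SPKI public key in PEM form, each base64-encoded onto a single line with GNU base64 -w0. gen_jwt_keypair_b64 is a hypothetical name. It prints the private key on line 1 and the public key on line 2 :

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for _autogen_jwt_keys : RS256 keypair, each PEM
# base64-encoded onto one line so it fits in a single YAML string.
gen_jwt_keypair_b64() {
  local bits="${1:-4096}" tmp
  tmp=$(mktemp -d) || return 1   # mktemp -d creates a 0700 directory
  openssl genpkey -algorithm RSA -pkeyopt "rsa_keygen_bits:${bits}" \
    -out "$tmp/private.pem" 2>/dev/null || { rm -rf "$tmp"; return 1; }
  openssl pkey -in "$tmp/private.pem" -pubout -out "$tmp/public.pem" \
    || { rm -rf "$tmp"; return 1; }
  # Line 1 : private key (PKCS#8 PEM), line 2 : public key, both unwrapped b64.
  base64 -w0 "$tmp/private.pem"; echo
  base64 -w0 "$tmp/public.pem"; echo
  rm -rf "$tmp"
}
```

A caller could read both lines with { read -r priv; read -r pub; } < <(gen_jwt_keypair_b64) and pass each one to the field filler. Generating a 4096-bit key can take a few seconds. The optional argument allows a smaller size for quick tests.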

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:06:47 +02:00
.env.example fix(bootstrap): handle workflows.disabled/ + self-signed Forgejo + better .env defaults 2026-04-29 23:01:05 +02:00
bootstrap-local.sh feat(bootstrap): phase 2 auto-fills 11 vault secrets, prompts on the rest 2026-04-29 23:06:47 +02:00
bootstrap-remote.sh feat(bootstrap): two-host deploy-pipeline bootstrap with idempotent verify 2026-04-29 22:45:00 +02:00
enable-auto-deploy.sh fix(bootstrap): handle workflows.disabled/ + self-signed Forgejo + better .env defaults 2026-04-29 23:01:05 +02:00
lib.sh fix(bootstrap): handle workflows.disabled/ + self-signed Forgejo + better .env defaults 2026-04-29 23:01:05 +02:00
README.md fix(bootstrap): handle workflows.disabled/ + self-signed Forgejo + better .env defaults 2026-04-29 23:01:05 +02:00
reset-vault.sh fix(bootstrap): handle workflows.disabled/ + self-signed Forgejo + better .env defaults 2026-04-29 23:01:05 +02:00
verify-local.sh fix(bootstrap): handle workflows.disabled/ + self-signed Forgejo + better .env defaults 2026-04-29 23:01:05 +02:00
verify-remote-ssh.sh fix(bootstrap): handle workflows.disabled/ + self-signed Forgejo + better .env defaults 2026-04-29 23:01:05 +02:00
verify-remote.sh feat(bootstrap): two-host deploy-pipeline bootstrap with idempotent verify 2026-04-29 22:45:00 +02:00

scripts/bootstrap/

Two-host bootstrap of the Veza deploy pipeline. Each script is idempotent, resumable, and read-only by default unless explicitly asked to mutate.

Files

File                   Where it runs     What it does
lib.sh                 sourced by all    logging, error trap, idempotent state file, Forgejo API helpers (honours FORGEJO_INSECURE=1)
bootstrap-local.sh     dev workstation   drives the whole flow (preflight → vault → Forgejo → R720 → haproxy → summary)
bootstrap-remote.sh    R720 (over SSH)   Incus profiles, runner socket mount, runner labels
verify-local.sh        dev workstation   read-only checks of local state
verify-remote.sh       R720              read-only checks of R720 state (run via verify-remote-ssh.sh)
verify-remote-ssh.sh   dev workstation   scp+ssh wrapper that runs verify-remote.sh on R720
enable-auto-deploy.sh  dev workstation   restores .forgejo/workflows/ if disabled, uncomments the push: trigger
reset-vault.sh         dev workstation   recovery from a vault password mismatch (destructive — re-prompts)
.env.example           template          copy to .env, fill in, gitignored
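
To show what "honours FORGEJO_INSECURE=1" can mean in practice, here is a sketch of an API helper built on curl. forgejo_api and FORGEJO_URL are hypothetical names (only FORGEJO_ADMIN_TOKEN and FORGEJO_INSECURE appear in this README). The token is sent in Forgejo's Authorization: token <value> header :

```shell
#!/usr/bin/env bash
# Hypothetical Forgejo API helper ; FORGEJO_URL is an assumed variable name.
# Usage : forgejo_api <METHOD> <path under /api/v1> [extra curl args...]
forgejo_api() {
  local method="$1" path="$2"
  shift 2
  local -a opts=(-fsS -X "$method"
    -H "Authorization: token ${FORGEJO_ADMIN_TOKEN}"
    -H 'Accept: application/json')
  # Self-signed cert on the LAN : skip TLS verification only when asked to.
  if [ "${FORGEJO_INSECURE:-0}" = 1 ]; then
    opts+=(-k)
  fi
  # Remaining args (e.g. --data) go straight to curl.
  curl "${opts[@]}" "$@" "${FORGEJO_URL%/}/api/v1/${path#/}"
}
```

For example, forgejo_api GET /version | jq -r .version is a cheap reachability check to run before phase 3.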

State file

Each host keeps a state file with one phase=DONE line (plus a timestamp) per completed phase, so a re-run is a no-op for completed phases :

local :   <repo>/.git/talas-bootstrap/local.state
R720  :   /var/lib/talas/bootstrap.state

To force a phase re-run, delete its line (the path is relative to the repo root) :

sed -i '/^vault=/d' .git/talas-bootstrap/local.state
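
A sketch of the skip-if-DONE logic, assuming each state line looks like <phase>=DONE <timestamp>. phase_done, mark_done and run_phase are hypothetical names, not the actual lib.sh API. Deleting a line with the sed command above makes run_phase execute that phase again :

```shell
#!/usr/bin/env bash
# Hypothetical state-file helpers ; the real ones live in lib.sh and may differ.
# Assumed line format : <phase>=DONE <UTC timestamp>
STATE_FILE="${STATE_FILE:-.git/talas-bootstrap/local.state}"

phase_done() {
  grep -q "^$1=DONE" "$STATE_FILE" 2>/dev/null
}

mark_done() {
  mkdir -p "$(dirname "$STATE_FILE")"
  # Drop any stale line for this phase before appending the new one.
  if [ -f "$STATE_FILE" ]; then sed -i "/^$1=/d" "$STATE_FILE"; fi
  printf '%s=DONE %s\n' "$1" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$STATE_FILE"
}

# run_phase <name> <command...> : skip if already DONE, else run and record.
run_phase() {
  local name="$1"
  shift
  if phase_done "$name"; then
    echo "phase $name already DONE, skipping"
    return 0
  fi
  "$@" && mark_done "$name"
}
```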

Inter-script communication

bootstrap-local.sh invokes bootstrap-remote.sh over SSH by concatenating lib.sh + bootstrap-remote.sh and piping into sudo -E bash -s on the R720. The remote script :

  • writes /var/log/talas-bootstrap.log on R720 (persistent)
  • emits >>>PHASE:<name>:<status><<< markers on stdout
  • the local script tees those to stderr so the operator sees remote progress in the same terminal as the local logs
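
The local half of that pipe could look like the sketch below. It assumes bash and exactly the >>>PHASE:<name>:<status><<< marker format listed above. The function names are hypothetical. run_remote also omits the user@ part when R720_USER is empty :

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the local side of the remote run.

# parse_phase_marker ">>>PHASE:<name>:<status><<<"  ->  "<name> <status>"
parse_phase_marker() {
  local re='^>>>PHASE:([^:]+):([^<]+)<<<$'
  [[ $1 =~ $re ]] || return 1
  printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}

# Echo PHASE markers to stderr for the operator ; pass every line through.
relay_markers() {
  local line parsed
  while IFS= read -r line; do
    if parsed=$(parse_phase_marker "$line"); then
      printf '[R720] phase %s\n' "$parsed" >&2
    fi
    printf '%s\n' "$line"
  done
}

# lib.sh is prepended so bootstrap-remote.sh can call its helpers.
run_remote() {
  cat lib.sh bootstrap-remote.sh \
    | ssh "${R720_USER:+${R720_USER}@}${R720_HOST}" 'sudo -E bash -s' \
    | relay_markers
}
```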

Resumability : because of the state file, an SSH disconnect or partial failure leaves the phases that did complete marked DONE. Re-run bootstrap-local.sh and it picks up where it stopped.

Quickstart

cd /home/senke/git/talas/veza/scripts/bootstrap
cp .env.example .env
$EDITOR .env             # fill in FORGEJO_ADMIN_TOKEN at minimum
chmod +x *.sh

# Set up everything
./bootstrap-local.sh

# Or skip phases you've already done
PHASE=4 ./bootstrap-local.sh

# Verify any time
./verify-local.sh
ssh ansible@10.0.20.150 'sudo bash' < verify-remote.sh
# (or ./verify-remote-ssh.sh, which does the scp+ssh for you)

What each phase needs

Phase         Needs
1. preflight  git, ansible, dig, ssh, jq locally ; SSH to R720 ; DNS resolved (warning only if missing)
2. vault      nothing ; prompts for the vault password and fills vault.yml from the template
3. forgejo    FORGEJO_ADMIN_TOKEN (env var or .env)
4. r720       FORGEJO_ADMIN_TOKEN (used to fetch the runner registration token) ; SSH to R720 with sudo
5. haproxy    public domains resolving in DNS + port 80 reachable from the Internet ; vault decryptable by ansible
6. summary    nothing
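
Phase 1's split between hard requirements and warnings can be sketched as follows. require_tools and warn_if_unresolved are hypothetical names, and the sketch assumes dig +short prints nothing for a name that doesn't resolve :

```shell
#!/usr/bin/env bash
# Hypothetical phase-1 style checks : fail on missing tools, only warn on DNS.
require_tools() {
  local missing=() t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || missing+=("$t")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing tools: ${missing[*]}" >&2
    return 1
  fi
}

# Warn (never fail) when a name does not resolve yet.
warn_if_unresolved() {
  local name
  for name in "$@"; do
    if [ -z "$(dig +short "$name" 2>/dev/null)" ]; then
      echo "WARN: $name does not resolve (yet)" >&2
    fi
  done
  return 0
}
```

For example : require_tools git ansible dig ssh jq && warn_if_unresolved veza.fr forgejo.talas.group.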

Troubleshooting

  • Phase 1 SSH fails — verify R720_HOST + R720_USER in .env. If you use an SSH config alias (e.g. Host srv-102v in ~/.ssh/config), set R720_HOST=srv-102v and either set R720_USER= (empty, so the alias's User= wins) or match the alias's user. Test manually : ssh ${R720_USER}@${R720_HOST} /bin/true (with R720_USER empty, use ssh ${R720_HOST} /bin/true instead).
  • Phase 2 cannot decrypt vault.yml — the password in .vault-pass doesn't match what was used to encrypt vault.yml.
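    • If you remember the original password, edit .vault-pass (echo "<correct password>" > infra/ansible/.vault-pass ; chmod 0400 …). If the file is already mode 0400, chmod u+w it first, or the redirect fails with "Permission denied".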
    • Otherwise : ./reset-vault.sh — destructive, re-prompts for everything.
  • Phase 3 Forgejo API unreachable — Forgejo on https://10.0.20.105:3000 serves a self-signed cert. Set FORGEJO_INSECURE=1 in .env. Once the edge HAProxy is up and Let's Encrypt has issued the certificate for forgejo.talas.group, switch to that URL and clear FORGEJO_INSECURE.
  • Phase 3 repo not found — set FORGEJO_OWNER to the actual org/user owning the repo. Confirm with git remote -v (the path segment after host:port/).
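  • Phase 4 SSH timeout / sudo prompt — the SSH user needs passwordless sudo. Add to /etc/sudoers.d/talas-bootstrap (edit it with visudo -f so a syntax error can't break sudo) :
    senke ALL=(ALL) NOPASSWD:SETENV: /usr/bin/bash
    SETENV is required because the bootstrap runs sudo -E and the manual fallback below sets a variable on the sudo command line ; sudo only implies SETENV for the command ALL.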
    
    Or run the remote half manually :
    scp scripts/bootstrap/{lib.sh,bootstrap-remote.sh} srv-102v:/tmp/
    ssh srv-102v 'sudo FORGEJO_REGISTRATION_TOKEN=<token> bash /tmp/bootstrap-remote.sh'
    
  • Phase 5 dehydrated fails — port 80 must be reachable from the Internet for the HTTP-01 challenge (not blocked by the ISP, and NAT-forwarded to the R720). Test from outside the LAN : curl http://veza.fr/.well-known/acme-challenge/test should reach HAProxy's letsencrypt_backend (a 404 is fine ; what matters is that the request reaches the R720).
  • .forgejo/workflows/ is missing, only workflows.disabled/ present — expected when the auto-trigger has been gated by renaming the dir. enable-auto-deploy.sh restores it.
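
Two of the checks above can be scripted. Both helpers in the sketch below are hypothetical. ssh_target builds user@host but falls back to the bare host when R720_USER is empty, so an SSH config alias supplies User=. remote_owner extracts the owner segment from an HTTPS, ssh:// or scp-style remote URL :

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the Phase 1 and Phase 3 checks.

# user@host, or just host when R720_USER is empty (the SSH alias decides User=).
ssh_target() {
  printf '%s\n' "${R720_USER:+${R720_USER}@}${R720_HOST}"
}

# Owner segment of a git remote URL (what FORGEJO_OWNER should hold).
remote_owner() {
  local url="$1" path
  case "$url" in
    *://*) path="${url#*://}"; path="${path#*/}" ;;  # https://host:port/owner/repo.git
    *:*)   path="${url#*:}" ;;                       # git@host:owner/repo.git
    *)     return 1 ;;
  esac
  printf '%s\n' "${path%%/*}"
}
```

Then ssh "$(ssh_target)" /bin/true and remote_owner "$(git remote get-url origin)" give values to compare against .env.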

After bootstrap

  • Trigger the first deploy manually from the Forgejo UI : Actions → Veza deploy → Run workflow.
  • Once it is green, run ./enable-auto-deploy.sh to re-enable the push trigger.
  • verify-local.sh + verify-remote.sh are safe to run any time.
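
The two steps enable-auto-deploy.sh is described as doing (restore .forgejo/workflows/ and uncomment the push: trigger) can be sketched as below. This is an assumption-laden stand-in, not the script itself. It assumes the trigger is a single commented "# push:" line in each workflow file and uses GNU sed. Running it twice is harmless :

```shell
#!/usr/bin/env bash
# Hypothetical sketch of enable-auto-deploy.sh ; assumes one "# push:" line per workflow.
enable_auto_deploy() {
  local repo="${1:-.}" f
  local wf="$repo/.forgejo/workflows" off="$repo/.forgejo/workflows.disabled"
  # Step 1 : restore the workflows dir if it was gated by renaming.
  if [ ! -d "$wf" ] && [ -d "$off" ]; then
    mv "$off" "$wf"
  fi
  # Step 2 : uncomment the push trigger (no-op when already uncommented).
  for f in "$wf"/*.yml "$wf"/*.yaml; do
    [ -f "$f" ] || continue
    sed -i -E 's/^([[:space:]]*)#[[:space:]]*(push:)/\1\2/' "$f"
  done
  return 0
}
```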