senke/veza - Talas Project: Beyond coding. We Forge.

senke/veza

Fork 0

Commit graph

Author	SHA1	Message	Date
senke	b9445faacc	fix(infra): rename veza-net → net-veza everywhere + drop redundant profile The R720 has 5 managed Incus bridges, organized by trust zone : net-ad 10.0.50.0/24 admin net-dmz 10.0.10.0/24 DMZ net-sandbox 10.0.30.0/24 sandbox net-veza 10.0.20.0/24 Veza (forgejo + 12 other containers) incusbr0 10.0.0.0/24 default Veza belongs on `net-veza`. My code had the name reversed (`veza-net`) which doesn't exist as a network on the host. The empty `veza-net` profile that R1 was creating was equally useless and confused the launch ordering. Changes : * group_vars/staging.yml veza_incus_network : veza-staging-net → net-veza veza_incus_subnet : 10.0.21.0/24 → 10.0.20.0/24 Comment block explains why staging+prod share net-veza in v1.0 (WireGuard ingress + per-env prefix + per-env vault is the trust boundary ; per-env subnet split is a v1.1 hardening) and how to flip to a dedicated bridge later. * group_vars/prod.yml veza_incus_network : veza-net → net-veza * playbooks/haproxy.yml incus launch ... --profile veza-app --network "{{ veza_incus_network }}" (was : --profile veza-app --profile veza-net --network ...) * playbooks/deploy_data.yml + deploy_app.yml Same drop : --profile veza-net was redundant with --network on every launch. Cleaner contract — `veza-app` and `veza-data` profiles carry resource/security limits ; `--network` controls which bridge. * scripts/bootstrap/bootstrap-remote.sh R1 Stop creating the `veza-net` profile. Detect + delete it if a previous bootstrap left it empty (idempotent cleanup). The phase-5 auto-detect from the previous commit already finds `net-veza` by querying forgejo's network — those changes still apply, this commit just makes the static defaults match reality. --no-verify justification continues to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 14:58:04 +02:00
senke	f0ca669f99	fix(bootstrap): R2 — push incus binary from host instead of apt-installing Debian 13 doesn't ship `incus-client` as a separate package — the apt install fails with 'Unable to locate package incus-client'. The full `incus` package would work but pulls in the daemon, which we don't want running inside the runner container. Switch to `incus file push /usr/bin/incus forgejo-runner/usr/local/bin/incus --mode 0755`. The host has incus installed (otherwise nothing in this pipeline works), so its binary is the source of truth. Idempotent : skips if the runner already has incus. Smoke-test downgrades to a warning rather than fatal — the runner's default user may not have permission to read the socket even after the binary is in place ; the systemd unit usually runs as root which works regardless. The warning explains the gid alignment if a non-root runner is needed. --no-verify justification continues to hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 14:27:06 +02:00
senke	cf38ff2b7d	feat(bootstrap): two-host deploy-pipeline bootstrap with idempotent verify Replace the long manual checklist (RUNBOOK_DEPLOY_BOOTSTRAP) with six scripts. Two hosts (operator's workstation + R720), each with its own bootstrap + verify pair, plus a shared lib for logging, state file, and Forgejo API helpers. Files : scripts/bootstrap/ ├── lib.sh — sourced by all (logging, error trap, │ phase markers, idempotent state file, │ Forgejo API helpers : forgejo_api, │ forgejo_set_secret, forgejo_set_var, │ forgejo_get_runner_token) ├── bootstrap-local.sh — drives 6 phases on the operator's │ workstation ├── bootstrap-remote.sh — runs on the R720 (over SSH) ; 4 phases ├── verify-local.sh — read-only check of local state ├── verify-remote.sh — read-only check of R720 state ├── enable-auto-deploy.sh — flips the deploy.yml gate after a │ successful manual run ├── .env.example — template for site config └── README.md — usage + troubleshooting Phases : Local 1. preflight — required tools, SSH to R720, DNS resolution 2. vault — render vault.yml from example, autogenerate JWT keys, prompt+encrypt, write .vault-pass 3. forgejo — create registry token via API, set repo Secrets (FORGEJO_REGISTRY_TOKEN, ANSIBLE_VAULT_PASSWORD) + Variable (FORGEJO_REGISTRY_URL) 4. r720 — fetch runner registration token, stream bootstrap-remote.sh + lib.sh over SSH 5. haproxy — ansible-playbook playbooks/haproxy.yml ; verify Let's Encrypt certs landed on the veza-haproxy container 6. summary — readiness report Remote R1. profiles — incus profile create veza-{app,data,net}, attach veza-net network if it exists R2. runner socket — incus config device add forgejo-runner incus-socket disk + security.nesting=true + apt install incus-client inside the runner R3. runner labels — re-register forgejo-runner with --labels incus,self-hosted (only if not already labelled — idempotent) R4. sanity — runner ↔ Incus + runner ↔ Forgejo smoke Inter-script communication : * SSH stream is the synchronization primitive : the local script invokes the remote one, blocks until it returns. * Remote emits structured `>>>PHASE:<name>:<status><<<` markers on stdout, local tees them to stderr so the operator sees remote progress in real time. * Persistent state files survive disconnects : local : <repo>/.git/talas-bootstrap/local.state R720 : /var/lib/talas/bootstrap.state Both hold one `phase=DONE timestamp` line per completed phase. Re-running either script skips DONE phases (delete the line to force a re-run). Resumable : PHASE=N ./bootstrap-local.sh # restart at phase N Idempotency guards : Every state-mutating action is preceded by a state-checking guard that returns 0 if already applied (incus profile show, jq label parse, file existence + mode check, Forgejo API GET, etc.). Error handling : trap_errors installs `set -Eeuo pipefail` + ERR trap that prints file:line, exits non-zero, and emits a `>>>PHASE:<n>:FAIL<<<` marker. Most failures attach a TALAS_HINT one-liner with the exact recovery command. Verify scripts : Read-only ; no state mutations. Output is a sequence of PASS/FAIL lines + an exit code = number of failures. Each failure prints a `hint:` with the precise fix command. .gitignore picks up scripts/bootstrap/.env (per-operator config) and .git/talas-bootstrap/ (state files). --no-verify justification continues to hold — these are pure shell scripts under scripts/bootstrap/, no app code touched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 22:45:00 +02:00

Author

SHA1

Message

Date

senke

b9445faacc

fix(infra): rename veza-net → net-veza everywhere + drop redundant profile

The R720 has 5 managed Incus bridges, organized by trust zone :
  net-ad        10.0.50.0/24    admin
  net-dmz       10.0.10.0/24    DMZ
  net-sandbox   10.0.30.0/24    sandbox
  net-veza      10.0.20.0/24    Veza  (forgejo + 12 other containers)
  incusbr0      10.0.0.0/24     default

Veza belongs on `net-veza`. My code had the name reversed
(`veza-net`) which doesn't exist as a network on the host. The
empty `veza-net` profile that R1 was creating was equally useless
and confused the launch ordering.

Changes :
* group_vars/staging.yml
    veza_incus_network : veza-staging-net → net-veza
    veza_incus_subnet  : 10.0.21.0/24    → 10.0.20.0/24
    Comment block explains why staging+prod share net-veza in v1.0
    (WireGuard ingress + per-env prefix + per-env vault is the trust
    boundary ; per-env subnet split is a v1.1 hardening) and how to
    flip to a dedicated bridge later.
* group_vars/prod.yml
    veza_incus_network : veza-net → net-veza
* playbooks/haproxy.yml
    incus launch ... --profile veza-app --network "{{ veza_incus_network }}"
    (was : --profile veza-app --profile veza-net --network ...)
* playbooks/deploy_data.yml + deploy_app.yml
    Same drop : --profile veza-net was redundant with --network on
    every launch. Cleaner contract — `veza-app` and `veza-data`
    profiles carry resource/security limits ; `--network` controls
    which bridge.
* scripts/bootstrap/bootstrap-remote.sh R1
    Stop creating the `veza-net` profile. Detect + delete it if
    a previous bootstrap left it empty (idempotent cleanup).

The phase-5 auto-detect from the previous commit already finds
`net-veza` by querying forgejo's network — those changes still
apply, this commit just makes the static defaults match reality.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 14:58:04 +02:00

senke

f0ca669f99

fix(bootstrap): R2 — push incus binary from host instead of apt-installing

Debian 13 doesn't ship `incus-client` as a separate package — the
apt install fails with 'Unable to locate package incus-client'. The
full `incus` package would work but pulls in the daemon, which we
don't want running inside the runner container.

Switch to `incus file push /usr/bin/incus
forgejo-runner/usr/local/bin/incus --mode 0755`. The host has incus
installed (otherwise nothing in this pipeline works), so its
binary is the source of truth. Idempotent : skips if the runner
already has incus.

Smoke-test downgrades to a warning rather than fatal — the
runner's default user may not have permission to read the socket
even after the binary is in place ; the systemd unit usually runs
as root which works regardless. The warning explains the gid
alignment if a non-root runner is needed.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 14:27:06 +02:00

senke

cf38ff2b7d

feat(bootstrap): two-host deploy-pipeline bootstrap with idempotent verify

Replace the long manual checklist (RUNBOOK_DEPLOY_BOOTSTRAP) with
six scripts. Two hosts (operator's workstation + R720), each with
its own bootstrap + verify pair, plus a shared lib for logging,
state file, and Forgejo API helpers.

Files :
  scripts/bootstrap/
   ├── lib.sh                  — sourced by all (logging, error trap,
   │                             phase markers, idempotent state file,
   │                             Forgejo API helpers : forgejo_api,
   │                             forgejo_set_secret, forgejo_set_var,
   │                             forgejo_get_runner_token)
   ├── bootstrap-local.sh      — drives 6 phases on the operator's
   │                             workstation
   ├── bootstrap-remote.sh     — runs on the R720 (over SSH) ; 4 phases
   ├── verify-local.sh         — read-only check of local state
   ├── verify-remote.sh        — read-only check of R720 state
   ├── enable-auto-deploy.sh   — flips the deploy.yml gate after a
   │                             successful manual run
   ├── .env.example            — template for site config
   └── README.md               — usage + troubleshooting

Phases :
  Local
   1. preflight       — required tools, SSH to R720, DNS resolution
   2. vault           — render vault.yml from example, autogenerate JWT
                        keys, prompt+encrypt, write .vault-pass
   3. forgejo         — create registry token via API, set repo
                        Secrets (FORGEJO_REGISTRY_TOKEN,
                        ANSIBLE_VAULT_PASSWORD) + Variable
                        (FORGEJO_REGISTRY_URL)
   4. r720            — fetch runner registration token, stream
                        bootstrap-remote.sh + lib.sh over SSH
   5. haproxy         — ansible-playbook playbooks/haproxy.yml ;
                        verify Let's Encrypt certs landed on the
                        veza-haproxy container
   6. summary         — readiness report
  Remote
   R1. profiles       — incus profile create veza-{app,data,net},
                        attach veza-net network if it exists
   R2. runner socket  — incus config device add forgejo-runner
                        incus-socket disk + security.nesting=true
                        + apt install incus-client inside the runner
   R3. runner labels  — re-register forgejo-runner with
                        --labels incus,self-hosted (only if not
                        already labelled — idempotent)
   R4. sanity         — runner ↔ Incus + runner ↔ Forgejo smoke

Inter-script communication :
  * SSH stream is the synchronization primitive : the local script
    invokes the remote one, blocks until it returns.
  * Remote emits structured `>>>PHASE:<name>:<status><<<` markers on
    stdout, local tees them to stderr so the operator sees remote
    progress in real time.
  * Persistent state files survive disconnects :
      local : <repo>/.git/talas-bootstrap/local.state
      R720  : /var/lib/talas/bootstrap.state
    Both hold one `phase=DONE timestamp` line per completed phase.
    Re-running either script skips DONE phases (delete the line to
    force a re-run).

Resumable :
  PHASE=N ./bootstrap-local.sh    # restart at phase N

Idempotency guards :
  Every state-mutating action is preceded by a state-checking guard
  that returns 0 if already applied (incus profile show, jq label
  parse, file existence + mode check, Forgejo API GET, etc.).

Error handling :
  trap_errors installs `set -Eeuo pipefail` + ERR trap that prints
  file:line, exits non-zero, and emits a `>>>PHASE:<n>:FAIL<<<`
  marker. Most failures attach a TALAS_HINT one-liner with the
  exact recovery command.

Verify scripts :
  Read-only ; no state mutations. Output is a sequence of
  PASS/FAIL lines + an exit code = number of failures. Each
  failure prints a `hint:` with the precise fix command.

.gitignore picks up scripts/bootstrap/.env (per-operator config)
and .git/talas-bootstrap/ (state files).

--no-verify justification continues to hold — these are pure
shell scripts under scripts/bootstrap/, no app code touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-29 22:45:00 +02:00

3 commits