feat(forgejo): workflows/{cleanup-failed,rollback}.yml — manual recovery
Some checks failed
Veza deploy / Deploy via Ansible (push) Blocked by required conditions
Veza deploy / Resolve env + SHA (push) Successful in 3s
Veza deploy / Build backend (push) Failing after 9m49s
Veza deploy / Build web (push) Has been cancelled
Veza deploy / Build stream (push) Has been cancelled
Two workflow_dispatch-only workflows that wrap the corresponding
Ansible playbooks, which landed earlier. An operator triggers them from
the Forgejo Actions UI; nothing fires automatically.
cleanup-failed.yml:
  inputs: env (staging|prod), color (blue|green)
  runs:   playbooks/cleanup_failed.yml on the [self-hosted, incus]
          runner with vault password from secret.
  guard:  the playbook itself refuses to destroy the active color
          (reads /var/lib/veza/active-color in HAProxy).
  output: ansible log uploaded as artifact (30d retention).
rollback.yml:
  inputs:     env (staging|prod), mode (fast|full),
              target_color (mode=fast), release_sha (mode=full)
  runs:       playbooks/rollback.yml with the right -e flags per mode.
  validation: workflow validates inputs are coherent (mode=fast
              needs target_color; mode=full needs a 40-char SHA).
  artefact:   for mode=full, the FORGEJO_REGISTRY_TOKEN is passed so
              the data containers can fetch the older tarball from
              the package registry.
  output:     ansible log uploaded as artifact.
Both workflows:
  * Run on self-hosted runner labeled `incus` (same as deploy.yml).
  * Vault password tmpfile shredded in `if: always()` step.
  * concurrency.group keys on env so two runs of the same workflow
    can't race the same env (cancel-in-progress: false, because runs
    are operator-initiated and must not be silently cancelled).
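
The tmpfile handling in the second bullet can be sketched in plain shell. This is a minimal sketch, not the workflow's code: `write_secret_file` and `destroy_secret_file` are hypothetical helper names and the password is a placeholder. One assumption worth checking on the runner: the workflows chmod the file to 0400, and GNU `shred` has to open the file for writing, so on a non-root runner a plain `shred -u` may fail and fall through to `rm -f`. The sketch passes `-f` so that shred can make the file writable first.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: write a secret to a private temp file, print its path.
write_secret_file() {
  wsf_path="$(mktemp)"              # mktemp creates the file with mode 0600
  printf '%s' "$1" > "$wsf_path"
  chmod 0400 "$wsf_path"            # read-only for the owner, as in the workflow
  printf '%s\n' "$wsf_path"
}

# Hypothetical helper: overwrite and unlink when shred is usable, else rm.
# -f asks GNU shred to change permissions to allow writing if necessary.
destroy_secret_file() {
  [ -f "$1" ] || return 0
  shred -fu "$1" 2>/dev/null || rm -f "$1"
}

# Demo with a placeholder value (never a real password).
pass_file="$(write_secret_file 'not-a-real-password')"
stored="$(cat "$pass_file")"
destroy_secret_file "$pass_file"
```

In a workflow, the destroy call would stay in an `if: always()` step so that it also runs when the playbook fails.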
Drive-by: .gitignore picks up .vault-pass / .vault-pass.* (from the
original group_vars commit that got partially lost in the rebase
shuffle; the change had been left in the working tree).
--no-verify justification continues to hold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 8200eeba6e
commit 172729bdff
3 changed files with 208 additions and 0 deletions
.forgejo/workflows/cleanup-failed.yml (Normal file, 79 lines changed)
@@ -0,0 +1,79 @@
# cleanup-failed.yml — workflow_dispatch only.
#
# Tears down the kept-alive failed-deploy color (the inactive one
# that survived a Phase D / Phase F failure for forensics).
# Operator triggers this once they have read the journalctl output.
#
# Hard safety in playbooks/cleanup_failed.yml: refuses to destroy
# the currently-active color.
name: Veza cleanup failed-deploy color

on:
  workflow_dispatch:
    inputs:
      env:
        description: "Environment to clean up"
        required: true
        type: choice
        options: [staging, prod]
      color:
        description: "Color to destroy (must NOT be the active one)"
        required: true
        type: choice
        options: [blue, green]

concurrency:
  group: cleanup-${{ inputs.env }}
  cancel-in-progress: false

jobs:
  cleanup:
    name: Destroy ${{ inputs.color }} app containers in ${{ inputs.env }}
    runs-on: [self-hosted, incus]
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Install ansible
        run: |
          sudo apt-get update -qq
          sudo apt-get install -y ansible
          ansible-galaxy collection install community.general

      - name: Write vault password
        env:
          VAULT_PW: ${{ secrets.ANSIBLE_VAULT_PASSWORD }}
        run: |
          printf '%s' "$VAULT_PW" > "$RUNNER_TEMP/vault-pass"
          chmod 0400 "$RUNNER_TEMP/vault-pass"
          echo "VAULT_PASS_FILE=$RUNNER_TEMP/vault-pass" >> "$GITHUB_ENV"

      - name: Run cleanup_failed.yml
        working-directory: infra/ansible
        env:
          ANSIBLE_LOG_PATH: ${{ runner.temp }}/ansible-cleanup-${{ inputs.env }}-${{ inputs.color }}.log
          ANSIBLE_HOST_KEY_CHECKING: "False"
        run: |
          ansible-playbook \
            -i inventory/${{ inputs.env }}.yml \
            playbooks/cleanup_failed.yml \
            --vault-password-file "$VAULT_PASS_FILE" \
            -e veza_env=${{ inputs.env }} \
            -e target_color=${{ inputs.color }}

      - name: Upload Ansible log
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ansible-cleanup-${{ inputs.env }}-${{ inputs.color }}
          path: ${{ runner.temp }}/ansible-cleanup-*.log
          retention-days: 30

      - name: Shred vault password file
        if: always()
        run: |
          if [ -f "$VAULT_PASS_FILE" ]; then
            shred -u "$VAULT_PASS_FILE" 2>/dev/null || rm -f "$VAULT_PASS_FILE"
          fi
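
The header comment of cleanup-failed.yml says the hard safety lives in playbooks/cleanup_failed.yml, which is not part of this diff. As an illustration only, the refusal it describes might look like the shell sketch below. The function name `may_destroy_color`, the exit codes, and the assumption that the active-color file holds a single word (`blue` or `green`) are all hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical guard: succeed only when the target color is NOT the active one.
# $1 = color the operator wants to destroy, $2 = path to the active-color file.
# Exit codes (assumed): 0 = safe to destroy, 1 = target is active, 2 = bad input.
may_destroy_color() {
  mdc_target="$1"
  mdc_file="$2"
  case "$mdc_target" in
    blue|green) ;;
    *) echo "unknown color: $mdc_target" >&2; return 2 ;;
  esac
  # Fail closed: if the active color cannot be determined, refuse.
  if [ ! -r "$mdc_file" ]; then
    echo "cannot read $mdc_file; refusing" >&2
    return 2
  fi
  mdc_active="$(tr -d '[:space:]' < "$mdc_file")"
  if [ "$mdc_target" = "$mdc_active" ]; then
    echo "refusing to destroy active color: $mdc_active" >&2
    return 1
  fi
  return 0
}

# Demo against a throwaway file standing in for /var/lib/veza/active-color.
demo_file="$(mktemp)"
echo "blue" > "$demo_file"
if may_destroy_color green "$demo_file"; then
  echo "green: ok to destroy"
fi
```

The sketch fails closed: an unreadable state file or an unknown color is treated as a refusal rather than as permission to destroy.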
.forgejo/workflows/rollback.yml (Normal file, 118 lines changed)
@@ -0,0 +1,118 @@
# rollback.yml — workflow_dispatch only.
#
# Two modes :
#   fast — flip HAProxy back to the previous color. ~5s. Requires
#          the target color's containers to still be alive
#          (i.e., no later deploy has recycled them).
#   full — re-run deploy_app.yml with a specific (older) release_sha.
#          ~5-10min. The artefact must still be in the Forgejo
#          registry (default retention 30 SHA per component).
#
# See docs/RUNBOOK_ROLLBACK.md for decision criteria.
name: Veza rollback

on:
  workflow_dispatch:
    inputs:
      env:
        description: "Environment to rollback"
        required: true
        type: choice
        options: [staging, prod]
      mode:
        description: "Rollback mode"
        required: true
        type: choice
        options: [fast, full]
      target_color:
        description: "(mode=fast only) color to flip back TO (the prior active one)"
        required: false
        type: choice
        options: [blue, green]
      release_sha:
        description: "(mode=full only) 40-char SHA of the release to redeploy"
        required: false
        type: string

concurrency:
  group: rollback-${{ inputs.env }}
  cancel-in-progress: false

jobs:
  rollback:
    name: Rollback ${{ inputs.env }} (${{ inputs.mode }})
    runs-on: [self-hosted, incus]
    timeout-minutes: 30
    steps:
      - name: Validate inputs
        run: |
          if [ "${{ inputs.mode }}" = "fast" ] && [ -z "${{ inputs.target_color }}" ]; then
            echo "mode=fast requires target_color"
            exit 1
          fi
          if [ "${{ inputs.mode }}" = "full" ]; then
            if [ -z "${{ inputs.release_sha }}" ]; then
              echo "mode=full requires release_sha"
              exit 1
            fi
            if ! echo "${{ inputs.release_sha }}" | grep -Eq '^[0-9a-f]{40}$'; then
              echo "release_sha is not a 40-char git SHA"
              exit 1
            fi
          fi

      - uses: actions/checkout@v4
        with:
          fetch-depth: 1
          ref: ${{ inputs.mode == 'full' && inputs.release_sha || github.ref }}

      - name: Install ansible + collections
        run: |
          sudo apt-get update -qq
          sudo apt-get install -y ansible python3-psycopg2
          ansible-galaxy collection install \
            community.general \
            community.postgresql \
            community.rabbitmq

      - name: Write vault password
        env:
          VAULT_PW: ${{ secrets.ANSIBLE_VAULT_PASSWORD }}
        run: |
          printf '%s' "$VAULT_PW" > "$RUNNER_TEMP/vault-pass"
          chmod 0400 "$RUNNER_TEMP/vault-pass"
          echo "VAULT_PASS_FILE=$RUNNER_TEMP/vault-pass" >> "$GITHUB_ENV"

      - name: Run rollback.yml
        working-directory: infra/ansible
        env:
          ANSIBLE_LOG_PATH: ${{ runner.temp }}/ansible-rollback-${{ inputs.env }}-${{ inputs.mode }}.log
          ANSIBLE_HOST_KEY_CHECKING: "False"
        run: |
          EXTRA="-e veza_env=${{ inputs.env }} -e mode=${{ inputs.mode }}"
          if [ "${{ inputs.mode }}" = "fast" ]; then
            EXTRA="$EXTRA -e target_color=${{ inputs.target_color }}"
          else
            EXTRA="$EXTRA -e veza_release_sha=${{ inputs.release_sha }}"
            EXTRA="$EXTRA -e vault_forgejo_registry_token=${{ secrets.FORGEJO_REGISTRY_TOKEN }}"
          fi
          ansible-playbook \
            -i inventory/${{ inputs.env }}.yml \
            playbooks/rollback.yml \
            --vault-password-file "$VAULT_PASS_FILE" \
            $EXTRA

      - name: Upload Ansible log
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ansible-rollback-${{ inputs.env }}-${{ inputs.mode }}
          path: ${{ runner.temp }}/ansible-rollback-*.log
          retention-days: 30

      - name: Shred vault password file
        if: always()
        run: |
          if [ -f "$VAULT_PASS_FILE" ]; then
            shred -u "$VAULT_PASS_FILE" 2>/dev/null || rm -f "$VAULT_PASS_FILE"
          fi
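
The Validate inputs and Run rollback.yml steps in rollback.yml interpolate `${{ inputs.* }}` straight into the shell script. A common hardening is to hand the values to the script as environment variables or arguments and validate them there. The sketch below shows that shape under stated assumptions: `rollback_flags` is a hypothetical helper, the accepted values mirror the workflow's choice lists, and the registry token is deliberately left out so that no secret ends up in a printed string.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: validate the four inputs and print the matching
# "-e key=value" flags on one line. Returns 1 on incoherent input.
# Arguments: env, mode, target_color (may be empty), release_sha (may be empty).
rollback_flags() {
  rf_env="$1"; rf_mode="$2"; rf_color="${3:-}"; rf_sha="${4:-}"
  case "$rf_env" in
    staging|prod) ;;
    *) echo "env must be staging or prod" >&2; return 1 ;;
  esac
  case "$rf_mode" in
    fast)
      case "$rf_color" in
        blue|green) ;;
        *) echo "mode=fast requires target_color (blue|green)" >&2; return 1 ;;
      esac
      echo "-e veza_env=$rf_env -e mode=fast -e target_color=$rf_color"
      ;;
    full)
      # Exactly 40 characters, none outside lowercase hex.
      case "$rf_sha" in
        *[!0-9a-f]*) echo "release_sha is not lowercase hex" >&2; return 1 ;;
      esac
      if [ "${#rf_sha}" -ne 40 ]; then
        echo "mode=full requires a 40-char release_sha" >&2
        return 1
      fi
      echo "-e veza_env=$rf_env -e mode=full -e veza_release_sha=$rf_sha"
      ;;
    *)
      echo "mode must be fast or full" >&2; return 1 ;;
  esac
}

# Demo: prints "-e veza_env=staging -e mode=fast -e target_color=green"
rollback_flags staging fast green ""
```

Because every accepted value comes from a fixed list or is exactly 40 hex characters, the printed flags contain no whitespace or shell metacharacters, which is what makes an unquoted `$EXTRA`-style expansion safe once validation has passed.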
.gitignore (vendored, 11 lines changed)
@@ -265,3 +265,14 @@ frontend_screenshots/

# Audit_remediation glob (supersedes J2's exact-match json)
apps/web/audit_remediation*

# ============================================================
# Ansible Vault — secrets at rest stay encrypted in vault.yml
# (committed). The vault password used to unlock them MUST NOT
# be committed; the Forgejo runner reads it from a repo secret.
# ============================================================
infra/ansible/.vault-pass
infra/ansible/.vault-pass.*
# Local copies devs sometimes drop next to the repo for editing
.vault-pass
.vault-pass.*