Compare commits
2 commits
594204fb86
...
8fa4b75387
| Author | SHA1 | Date |
|---|---|---|
| | `8fa4b75387` | |
| | `f9d00bbe4d` | |
6 changed files with 410 additions and 191 deletions
149
docs/PENTEST_SCOPE_2026.md
Normal file
@@ -0,0 +1,149 @@

> **Engagement period**: v1.0.9 W5-W6 (per `docs/ROADMAP_V1.0_LAUNCH.md` §Day 25). Async work, ~10 business days.
> **Authorisation**: signed scope letter + NDA on file (see "Legal context" below).
> **Re-test**: one re-test included after the team's fix pass.
> **Contact**: `security@veza.fr`; PGP key fingerprint published at `https://veza.fr/.well-known/security.txt`.

This brief is the technical hand-off for the external pentest team. It complements the contractual scope letter; the contract governs commercial terms, this document governs the technical surface.

## Engagement summary

**Target**: Veza, an ethical music streaming platform. The backend is Go 1.25 + Gin + GORM; streaming is Rust + Axum; the frontend is React 18 + Vite. Infrastructure is Incus (LXD) on a single self-hosted R720 in v1.0, moving to a multi-host Hetzner topology in v1.1.

**Version under test**: v1.0.9 (release candidate for the v2.0.0 public launch). The commit SHA is pinned at `<TBD-at-engagement-start>`; the staging environment freezes at this SHA for the engagement.

**Goals**:

1. Find what the internal pre-flight audit (`docs/SECURITY_PRELAUNCH_AUDIT.md`, W5 Day 21) missed — focus on business-logic abuse paths the automated scanners can't model.
2. Validate the v1.0.9 surface added since the last review: DMCA workflow, marketplace pre-listen, embed widget, WebRTC ICE config, faceted search.
3. Assess the multi-tenant invariants (creator vs. listener vs. admin) under malicious user input.

## In-scope assets

| Asset | Endpoint / surface | Notes |
| --- | --- | --- |
| **Backend API** | `https://staging.veza.fr/api/v1/*` | All v1.0.9 endpoints + the OpenAPI spec at `/swagger` |
| **Stream server** | `https://staging.veza.fr/api/v1/tracks/*/hls/*` | HLS-only — RTMP ingest is out of scope (v1.1) |
| **Embed widget** | `https://staging.veza.fr/embed/track/:id` | Public iframable HTML, OG tags |
| **oEmbed** | `https://staging.veza.fr/oembed` | JSON envelope |
| **Status / health** | `https://staging.veza.fr/api/v1/status`, `/health` | Public; intentional disclosure |
| **Frontend SPA** | `https://staging.veza.fr/` | React 18 + Vite; sourcemaps available on staging |
| **WebSocket (chat / live)** | `wss://staging.veza.fr/api/v1/ws` | Protocol described in `docs/api/websocket.md` |
| **Marketplace** | `/api/v1/marketplace/{products,orders,licenses,reviews}` | Hyperswitch sandbox, no real card processing |
| **DMCA workflow** | `POST /api/v1/dmca/notice` + admin queue | Sworn-statement validation, audit log, takedown gate |

## Out of scope

- **Production** (`api.veza.fr`, `app.veza.fr`). Testing against production is not authorised — every test runs against staging.
- **Third-party services we don't operate**: Hyperswitch live mode, Bunny.net edges, Sentry, Forgejo. Their security posture is the providers' responsibility.
- **Denial-of-service testing** above the rate-limiter quotas. The platform's rate-limit middleware is in scope; sustained flooding to deplete bandwidth is not.
- **Social engineering against Veza staff.** Phishing simulations require a separate engagement with prior written authorisation.
- **Physical / wireless attacks** against the R720 lab.
- **Source-code modification**: the engagement is grey-box (source available read-only at `https://10.0.20.105:3000/senke/veza` once the pentester's IP is allow-listed), but findings must be reproducible against staging without local patches.

## Authentication context

Three test accounts are pre-seeded on staging:

| Role | Email | Password | Notes |
| --- | --- | --- | --- |
| Listener | `pentest-listener@…` | `<delivered out-of-band>` | role=user, no 2FA, fully verified |
| Creator | `pentest-creator@…` | `<delivered out-of-band>` | role=creator, owns 5 seed tracks |
| Admin | `pentest-admin@…` | `<delivered out-of-band>` | role=admin + MFA bypass token |

Bearer tokens for synthetic-client-style testing are obtainable from `/api/v1/auth/login`. All passwords are randomised per engagement and rotated immediately after the engagement ends.

## High-priority focus areas

We're particularly interested in the following surfaces (in order of impact). The internal audit cleared the trivial OWASP Top 10 hits; here we want creative attacks.

### 1. Authentication + session lifecycle

- JWT key rotation: staging uses RS256 with `JWT_PRIVATE_KEY_PATH`. Can the public key be inferred from misconfigured JWKS-style endpoints?
- 2FA bypass: the login flow returns `requires_2fa=true` on partial auth. Is there a state-machine flaw between partial auth and full auth?
- Refresh-token replay after logout: the revocation list is Redis-backed. What happens if Redis is partitioned?
- Session fixation via the OAuth callback: `OAUTH_ALLOWED_REDIRECT_DOMAINS` is an allow-list — does the validation hold for IDN homograph URLs?
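
The homograph question reduces to how the redirect host is compared against the allow-list. A minimal Go sketch of the two comparison styles (the function names and the `evilveza.fr` domain are hypothetical illustrations, not Veza's actual validator):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// naiveAllowed: suffix match on the raw host (bypassable by domain grafting).
func naiveAllowed(raw string, allowed []string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	for _, d := range allowed {
		if strings.HasSuffix(u.Host, d) {
			return true
		}
	}
	return false
}

// strictAllowed: byte-exact match on the lowercased hostname. A Cyrillic
// homograph is never byte-equal to the ASCII allow-list entry, so it fails.
func strictAllowed(raw string, allowed []string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme != "https" {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, d := range allowed {
		if host == d {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"veza.fr"}
	// attacker-controlled domain that merely *ends with* the allowed string
	fmt.Println(naiveAllowed("https://evilveza.fr/cb", allowed))  // true: bypass
	fmt.Println(strictAllowed("https://evilveza.fr/cb", allowed)) // false
	// Cyrillic 'а' homograph: not byte-equal to the ASCII entry
	fmt.Println(strictAllowed("https://vezа.fr/cb", allowed)) // false
}
```

Exact comparison of `url.Hostname()` against allow-list entries defeats both suffix-grafted domains and homographs; suffix or display-form matching is where homographs bite.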

### 2. Payment / marketplace

- Order tampering: the `POST /api/v1/marketplace/orders` body contains product IDs + quantity. Can a buyer craft an order at an arbitrary price? (Roadmap subscription Phase 2 + 3 hardening is done, but the order flow predates that work.)
- Webhook signature replay: `POST /webhooks/hyperswitch` validates a signature. Does the implementation check timestamps, or only the HMAC?
- Refund window race: `RefundDeadline` is set to `+14d` on order completion. Is there an exploitable check-then-act race when the buyer initiates a refund at exactly `14d - 1ms`?
- Pre-listen abuse: `?preview=30` is anonymous-OK when `products.preview_enabled=true`. The 30 s cap is **client-side** (HTML5 audio `currentTime`); can an attacker grab the full audio via byte-range requests despite the gate? (The trust model is documented as "tease-to-buy, not anti-rip", but we want to know how leaky it is in practice.)

### 3. DMCA workflow

- Notice forgery: `POST /api/v1/dmca/notice` is public + rate-limited. Can the rate limit be bypassed via header rotation, `X-Forwarded-For` spoofing, or IPv6 prefix walking?
- Sworn-statement bypass: the `sworn_statement: true` field is trusted. Can a malformed JSON body land a notice with `sworn_statement` absent (Go's zero value)?
- Admin takedown enumeration: `GET /api/v1/admin/dmca/notices` returns paginated pending notices. Does the offset + limit handling leak another tenant's claimant data?

### 4. Upload + transcoder pipeline

- Chunked-upload state pollution: `POST /api/v1/tracks/upload/initiate` allocates an `upload_id`. Can two users with the same `upload_id` collide on the chunked-state Redis keys?
- File-type confusion via `Content-Type`: the upload validator checks magic bytes. Are there codec-level flaws (e.g. a malformed FLAC header that crashes the transcoder)?
- HLS segment poisoning: the streamer caches segments by `track_id`. Can a crafted upload pollute another track's cache via path traversal in the segment filename?
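
For the segment-poisoning bullet, the core question is whether an attacker-supplied segment name can escape the per-track cache directory. A Go sketch of the naive join versus a contained join (the cache path and function names are illustrative only; the streamer itself is Rust, this just demonstrates the class of bug):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// naiveSegmentPath joins the attacker-supplied name straight under the track
// dir. filepath.Join *cleans* "..", so the result silently lands elsewhere.
func naiveSegmentPath(trackID, segment string) string {
	return filepath.Join("/var/cache/hls", trackID, segment)
}

// safeSegmentPath rejects anything that escapes the per-track directory
// after cleaning, instead of trusting the raw name.
func safeSegmentPath(trackID, segment string) (string, error) {
	base := filepath.Join("/var/cache/hls", trackID)
	p := filepath.Join(base, segment)
	if p != base && !strings.HasPrefix(p, base+string(filepath.Separator)) {
		return "", fmt.Errorf("segment name escapes track directory: %q", segment)
	}
	return p, nil
}

func main() {
	// the naive path resolves into another track's cache
	fmt.Println(naiveSegmentPath("track-a", "../track-b/seg0.ts"))
	if _, err := safeSegmentPath("track-a", "../track-b/seg0.ts"); err != nil {
		fmt.Println("rejected")
	}
}
```

The probe, then, is to put `../<victim-track>/...` (and encoded variants) into any field that feeds the segment filename and watch which cache entry gets written.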

### 5. WebRTC ICE config + embed

- The `/api/v1/config/webrtc` endpoint is intentionally public per `SECURITY_PRELAUNCH_AUDIT.md`. We want a second opinion on whether the short-lived TURN credentials are short-lived enough.
- Embed iframe XSS: `/embed/track/:id` interpolates `track.title` + `track.artist` into the HTML body + OG tags via `html.EscapeString`. Try crafted Unicode + HTML-entity edge cases (e.g. surrogates, RTLO, byte-order marks).
- oEmbed URL injection: `?url=` is parsed for `/tracks/<uuid>`. Is there a way to redirect the iframe to an attacker-controlled domain via malformed input?

### 6. Faceted search + share tokens

- SQL injection via the search facets: `genre` and `musical_key` are bounded by length and passed as parameterised values. Verify the parameterisation holds end-to-end.
- Share-token enumeration: the W5 Day 21 audit unified error responses to a single 403. Cross-check that there are no remaining timing oracles (DB latency vs. cache hit, Redis vs. Postgres-only paths).
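
One timing oracle the audit may not have covered is the token comparison itself: a short-circuiting equality check leaks the position of the first differing byte. A Go sketch of the constant-time alternative (`tokenMatches` is illustrative, not Veza's handler, and this addresses only the compare step, not the Redis-vs-Postgres latency split called out above):

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// tokenMatches compares in time independent of where the first differing
// byte sits, so an attacker cannot binary-search a token byte-by-byte from
// response latency. (== and bytes.Equal may short-circuit.)
func tokenMatches(presented, stored []byte) bool {
	if len(presented) != len(stored) {
		return false // length is assumed public (fixed token format)
	}
	return subtle.ConstantTimeCompare(presented, stored) == 1
}

func main() {
	stored := []byte("tok_8f3a91c2")
	fmt.Println(tokenMatches([]byte("tok_8f3a91c2"), stored)) // true
	fmt.Println(tokenMatches([]byte("tok_00000000"), stored)) // false
}
```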

## Internal audit — already fixed (skip these)

The W5 Day 21 audit already addressed the items below. They're listed so the external team doesn't waste time re-reporting them.

| Finding | Resolution | Commit ref |
| --- | --- | --- |
| Share-token enumeration via 404 vs 403 split | Unified to 403 + generic message in the track_hls + track_social handlers | v1.0.9 W5 Day 21 |
| XSS via track metadata in embed widget | `html.EscapeString` wraps every HTML interpolation | v1.0.9 W3 Day 15 |
| DMCA workflow XSS via `work_description` | Storage parameterised, render is React-escaped | (audit, no code change) |
| `/config/webrtc` disclosure | Accepted by design; short-lived TURN credentials | (audit, accepted) |

## Reporting protocol

- **Severity scale**: CVSS 3.1. Critical (9.0+), High (7.0–8.9), Medium (4.0–6.9), Low (0.1–3.9), Informational.
- **Reporting cadence**: ad hoc for Critical/High (within 4 business hours of confirmation), batched daily for Medium and below.
- **Channel**: encrypted email to `security@veza.fr`. PGP key at `https://veza.fr/.well-known/security.txt`. For Critical findings, also use the Signal contact in the engagement letter.
- **Format**: per finding — title, severity, CVSS vector, reproduction steps (curl / browser-side script), proof of exploitation, recommended remediation, affected component(s).
- **Status calls**: weekly 30-minute check-in (calendar invite from `security@veza.fr`).
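
The severity buckets above map mechanically from the CVSS 3.1 base score; a small Go helper makes the boundaries explicit (illustrative only, the scale itself is the contract):

```go
package main

import "fmt"

// severity maps a CVSS 3.1 base score to the engagement's reporting buckets.
func severity(score float64) string {
	switch {
	case score >= 9.0:
		return "Critical"
	case score >= 7.0:
		return "High"
	case score >= 4.0:
		return "Medium"
	case score >= 0.1:
		return "Low"
	default:
		return "Informational"
	}
}

func main() {
	fmt.Println(severity(9.8)) // Critical
	fmt.Println(severity(6.5)) // Medium
	fmt.Println(severity(0.0)) // Informational
}
```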

## Re-test

The engagement includes one re-test. After the team confirms remediation of all High+ findings, the pentester verifies each fix in the same environment and signs off on the report.

## Legal context

- Authorisation letter on file: signed by `<CEO name>` for Veza, signed by `<lead pentester>` for the firm. Effective `<start date>` to `<end date + 30 d for re-test>`.
- The NDA covers everything observed during the engagement, including findings, source code, internal architecture, and runbooks.
- Logs: Veza retains all server-side logs for 30 d post-engagement so the team can reconstruct any reported finding without relying on the pentester's local notes.
- Incident-response coordination: if the pentester believes they've triggered a real incident (e.g. accidentally took staging down beyond the agreed scope), they ping `security@veza.fr` immediately; we coordinate a controlled rollback per the canary release runbook (`docs/CANARY_RELEASE.md`).

## What we'll do with the report

- **Critical / High**: fixed before the v2.0.0 public launch. The launch GO/NO-GO checklist (W6 Day 26) blocks on these.
- **Medium**: fixed in v2.0.x patch releases.
- **Low / Info**: tracked in the `docs/SECURITY_PRELAUNCH_AUDIT.md` follow-up table for the next review cycle.
- **Public credit**: the firm's name in `docs/SECURITY_ACKNOWLEDGEMENTS.md` (with prior consent) once the report is delivered and remediation is shipped.

## Files for the pentester's first day

- `docs/ROADMAP_V1.0_LAUNCH.md` — what shipped in v1.0.9 + the launch acceptance bar.
- `docs/SECURITY_PRELAUNCH_AUDIT.md` — internal audit findings + resolutions (skip these in the external report).
- `docs/api/` — OpenAPI / Swagger generated from the live source; `https://staging.veza.fr/swagger` mirrors it.
- `docs/CANARY_RELEASE.md` — how the team rolls fixes during the engagement (so the pentester can predict re-test windows).
- `infra/ansible/` — read-only via the Forgejo allow-list; gives architectural context.

## Acceptance gate (Day 25 internal milestone)

- [ ] Pentester briefed (this doc + scope letter handed off)
- [ ] Staging access provisioned + test accounts delivered out-of-band
- [ ] Source-code repo allow-list includes the pentester's static IP
- [ ] Initial check-in scheduled
- [ ] Internal audit findings (W5 Day 21) confirmed fixed in the staging build the pentester is testing
@@ -28,33 +28,66 @@ all:
    ansible_connection: community.general.incus
    ansible_python_interpreter: /usr/bin/python3
    veza_app_backend:
      children:
        veza_app_backend_blue:
        veza_app_backend_green:
        veza_app_backend_tools:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_app_backend_blue:
      hosts:
        veza-backend-blue:
    veza_app_backend_green:
      hosts:
        veza-backend-green:
    veza_app_backend_tools:
      hosts:
        veza-backend-tools:  # ephemeral, Phase A only
    veza_app_stream:
      children:
        veza_app_stream_blue:
        veza_app_stream_green:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_app_stream_blue:
      hosts:
        veza-stream-blue:
    veza_app_stream_green:
      hosts:
        veza-stream-green:
    veza_app_web:
      children:
        veza_app_web_blue:
        veza_app_web_green:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_app_web_blue:
      hosts:
        veza-web-blue:
    veza_app_web_green:
      hosts:
        veza-web-green:
    veza_data:
      children:
        veza_data_postgres:
        veza_data_redis:
        veza_data_rabbitmq:
        veza_data_minio:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_data_postgres:
      hosts:
        veza-postgres:
    veza_data_redis:
      hosts:
        veza-redis:
    veza_data_rabbitmq:
      hosts:
        veza-rabbitmq:
    veza_data_minio:
      hosts:
        veza-minio:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
@@ -48,35 +48,68 @@ all:
    # container's /var/lib/veza/active-color file; both blue and
    # green sit in inventory so either color is reachable when needed.
    veza_app_backend:
      children:
        veza_app_backend_blue:
        veza_app_backend_green:
        veza_app_backend_tools:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_app_backend_blue:
      hosts:
        veza-staging-backend-blue:
    veza_app_backend_green:
      hosts:
        veza-staging-backend-green:
    veza_app_backend_tools:
      hosts:
        veza-staging-backend-tools:  # ephemeral, Phase A only
    veza_app_stream:
      children:
        veza_app_stream_blue:
        veza_app_stream_green:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_app_stream_blue:
      hosts:
        veza-staging-stream-blue:
    veza_app_stream_green:
      hosts:
        veza-staging-stream-green:
    veza_app_web:
      children:
        veza_app_web_blue:
        veza_app_web_green:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_app_web_blue:
      hosts:
        veza-staging-web-blue:
    veza_app_web_green:
      hosts:
        veza-staging-web-green:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    # Data tier — never destroyed, only created if absent. ZFS
    # snapshots taken on every deploy as the safety net.
    veza_data:
      children:
        veza_data_postgres:
        veza_data_redis:
        veza_data_rabbitmq:
        veza_data_minio:
      vars:
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
    veza_data_postgres:
      hosts:
        veza-staging-postgres:
    veza_data_redis:
      hosts:
        veza-staging-redis:
    veza_data_rabbitmq:
      hosts:
        veza-staging-rabbitmq:
    veza_data_minio:
      hosts:
        veza-staging-minio:
@@ -62,14 +62,9 @@
  tags: [phaseA]

- name: Phase A — install backend artifact + run migrate_tool inside tools
  hosts: "{{ veza_container_prefix + 'backend-tools' }}"
  hosts: veza_app_backend_tools
  become: true
  gather_facts: false
  vars:
    ansible_connection: community.general.incus
    ansible_python_interpreter: /usr/bin/python3
    veza_component: backend
    veza_target_color: tools  # not blue/green — bypass color logic in name
  tasks:
    - name: Apt deps for tools container
      ansible.builtin.apt:
@@ -125,13 +120,10 @@
# =====================================================================
# Phase B — Determine inactive color
# =====================================================================
- name: Phase B — read active color, compute inactive_color
  hosts: "{{ veza_container_prefix + 'haproxy' }}"
- name: Phase B — read active color, compute inactive_color, populate dynamic groups
  hosts: haproxy
  become: true
  gather_facts: false
  vars:
    ansible_connection: community.general.incus
    ansible_python_interpreter: /usr/bin/python3
  tasks:
    - name: Read currently-active color
      ansible.builtin.slurp:
@@ -157,6 +149,41 @@
        Deploying SHA {{ veza_release_sha[:12] }} to color
        {{ inactive_color }} (currently active: {{ prior_active_color }}).

    # Use add_host to dynamically populate phase_c_<component> groups
    # with the correct inactive-color hostnames. Subsequent plays
    # target these dynamic groups by static name — Ansible's host
    # parser doesn't see {{ }} so this avoids the var-undefined-at-
    # parse-time issue.
    - name: Stage inactive-color backend in phase_c_backend group
      ansible.builtin.add_host:
        name: "{{ veza_container_prefix }}backend-{{ inactive_color }}"
        groups: phase_c_backend
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
        veza_component: backend
        veza_target_color: "{{ inactive_color }}"
      changed_when: false

    - name: Stage inactive-color stream in phase_c_stream group
      ansible.builtin.add_host:
        name: "{{ veza_container_prefix }}stream-{{ inactive_color }}"
        groups: phase_c_stream
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
        veza_component: stream
        veza_target_color: "{{ inactive_color }}"
      changed_when: false

    - name: Stage inactive-color web in phase_c_web group
      ansible.builtin.add_host:
        name: "{{ veza_container_prefix }}web-{{ inactive_color }}"
        groups: phase_c_web
        ansible_connection: community.general.incus
        ansible_python_interpreter: /usr/bin/python3
        veza_component: web
        veza_target_color: "{{ inactive_color }}"
      changed_when: false

# =====================================================================
# Phase C — destroy + relaunch the three app containers in inactive_color
# =====================================================================
@@ -165,28 +192,23 @@
  become: true
  gather_facts: false
  vars:
    inactive_color: "{{ hostvars[veza_container_prefix + 'haproxy']['inactive_color'] }}"
    inactive_color: "{{ hostvars[groups['haproxy'][0]]['inactive_color'] }}"
  tasks:
    - name: Destroy + launch each component container
      ansible.builtin.shell: |
        set -e
        CT="{{ veza_container_prefix }}{{ item }}-{{ inactive_color }}"
        # Force-delete is fine — these are stateless app containers; the
        # active color is untouched.
        incus delete --force "$CT" 2>/dev/null || true
        incus launch {{ veza_app_base_image }} "$CT" \
          --profile veza-app \
          --profile veza-net \
          --network "{{ veza_incus_network }}"
        for i in $(seq 1 {{ veza_app_container_ready_timeout | default(30) }}); do
          if incus exec "$CT" -- /bin/true 2>/dev/null; then
            exit 0
          fi
          sleep 1
        done
        echo "Container $CT did not become ready"
        exit 1
      args:
      ansible.builtin.shell:
        cmd: |
          set -e
          CT="{{ veza_container_prefix }}{{ item }}-{{ inactive_color }}"
          incus delete --force "$CT" 2>/dev/null || true
          incus launch "{{ veza_app_base_image }}" "$CT" --profile veza-app --profile veza-net --network "{{ veza_incus_network }}"
          for i in $(seq 1 {{ veza_app_container_ready_timeout | default(30) }}); do
            if incus exec "$CT" -- /bin/true 2>/dev/null; then
              exit 0
            fi
            sleep 1
          done
          echo "Container $CT did not become ready"
          exit 1
        executable: /bin/bash
      loop:
        - backend
@@ -200,40 +222,25 @@
  tags: [phaseC]

- name: Phase C — provision backend (inactive color) via veza_app role
  hosts: "{{ veza_container_prefix + 'backend-' + hostvars[veza_container_prefix + 'haproxy']['inactive_color'] }}"
  hosts: phase_c_backend
  become: true
  gather_facts: false
  vars:
    ansible_connection: community.general.incus
    ansible_python_interpreter: /usr/bin/python3
    veza_component: backend
    veza_target_color: "{{ hostvars[veza_container_prefix + 'haproxy']['inactive_color'] }}"
  roles:
    - veza_app
  tags: [phaseC, backend]

- name: Phase C — provision stream (inactive color)
  hosts: "{{ veza_container_prefix + 'stream-' + hostvars[veza_container_prefix + 'haproxy']['inactive_color'] }}"
  hosts: phase_c_stream
  become: true
  gather_facts: false
  vars:
    ansible_connection: community.general.incus
    ansible_python_interpreter: /usr/bin/python3
    veza_component: stream
    veza_target_color: "{{ hostvars[veza_container_prefix + 'haproxy']['inactive_color'] }}"
  roles:
    - veza_app
  tags: [phaseC, stream]

- name: Phase C — provision web (inactive color)
  hosts: "{{ veza_container_prefix + 'web-' + hostvars[veza_container_prefix + 'haproxy']['inactive_color'] }}"
  hosts: phase_c_web
  become: true
  gather_facts: false
  vars:
    ansible_connection: community.general.incus
    ansible_python_interpreter: /usr/bin/python3
    veza_component: web
    veza_target_color: "{{ hostvars[veza_container_prefix + 'haproxy']['inactive_color'] }}"
  roles:
    - veza_app
  tags: [phaseC, web]
@@ -244,12 +251,9 @@
# is up locally but unreachable via Incus DNS.
# =====================================================================
- name: Phase D — probe each component via Incus DNS (cross-container)
  hosts: "{{ veza_container_prefix + 'haproxy' }}"
  hosts: haproxy
  become: true
  gather_facts: false
  vars:
    ansible_connection: community.general.incus
    ansible_python_interpreter: /usr/bin/python3
  tasks:
    - name: Curl each component's health endpoint
      ansible.builtin.uri:
@@ -274,12 +278,10 @@
# cfg on failure.
# =====================================================================
- name: Phase E — switch HAProxy to the new color
  hosts: "{{ veza_container_prefix + 'haproxy' }}"
  hosts: haproxy
  become: true
  gather_facts: true  # roles/veza_haproxy_switch wants ansible_date_time
  vars:
    ansible_connection: community.general.incus
    ansible_python_interpreter: /usr/bin/python3
    veza_active_color: "{{ inactive_color }}"  # the color we ARE switching TO
  roles:
    - veza_haproxy_switch
@@ -295,61 +297,71 @@
  become: true
  gather_facts: true
  vars:
    inactive_color: "{{ hostvars[veza_container_prefix + 'haproxy']['inactive_color'] }}"
    prior_active_color: "{{ hostvars[veza_container_prefix + 'haproxy']['prior_active_color'] }}"
    inactive_color: "{{ hostvars[groups['haproxy'][0]]['inactive_color'] }}"
    prior_active_color: "{{ hostvars[groups['haproxy'][0]]['prior_active_color'] }}"
  tasks:
    - name: Curl public health endpoint via HAProxy
      ansible.builtin.uri:
        url: "{{ veza_public_url }}/api/v1/health"
        method: GET
        status_code: [200]
        timeout: 10
        validate_certs: "{{ veza_public_url.startswith('https://') }}"
      register: public_health
      retries: 10
      delay: 3
      until: public_health.status == 200
      tags: [phaseF, verify]
    # Block/rescue at TASK level — Ansible doesn't accept rescue at play
    # level. Both the success path (verify + record) and the rescue path
    # (record failure + revert HAProxy + fail) live inside this block.
    - name: Verify externally and record state, with rollback-on-failure
      block:
        - name: Curl public health endpoint via HAProxy
          ansible.builtin.uri:
            url: "{{ veza_public_url }}/api/v1/health"
            method: GET
            status_code: [200]
            timeout: 10
            validate_certs: "{{ veza_public_url.startswith('https://') }}"
          register: public_health
          retries: 10
          delay: 3
          until: public_health.status == 200
          tags: [phaseF, verify]

        - name: Write deploy-state.json (consumed by node-exporter textfile)
          ansible.builtin.copy:
            dest: /var/lib/node_exporter/textfile_collector/veza_deploy.prom
            content: |
              # HELP veza_deploy_active_color 0=blue, 1=green.
              # TYPE veza_deploy_active_color gauge
              veza_deploy_active_color{env="{{ veza_env }}"} {{ 0 if inactive_color == 'blue' else 1 }}
              # HELP veza_deploy_release_sha info metric, label=sha.
              # TYPE veza_deploy_release_sha gauge
              veza_deploy_release_sha{env="{{ veza_env }}",sha="{{ veza_release_sha }}",color="{{ inactive_color }}"} 1
              # HELP veza_deploy_last_success_timestamp unix epoch of last successful deploy.
              # TYPE veza_deploy_last_success_timestamp gauge
              veza_deploy_last_success_timestamp{env="{{ veza_env }}"} {{ ansible_date_time.epoch }}
            mode: "0644"
          tags: [phaseF, metrics]
      rescue:
        - name: Public health failed — record the failure timestamp
          ansible.builtin.copy:
            dest: /var/lib/node_exporter/textfile_collector/veza_deploy.prom
            content: |
              # HELP veza_deploy_last_failure_timestamp unix epoch of last failed deploy.
              # TYPE veza_deploy_last_failure_timestamp gauge
              veza_deploy_last_failure_timestamp{env="{{ veza_env }}",sha="{{ veza_release_sha }}",color="{{ inactive_color }}"} {{ ansible_date_time.epoch }}
            mode: "0644"
          failed_when: false

        - name: Re-switch HAProxy back to the prior color
          ansible.builtin.import_role:
            name: veza_haproxy_switch
          vars:
            veza_active_color: "{{ prior_active_color }}"
          delegate_to: "{{ veza_container_prefix + 'haproxy' }}"
        - name: Re-switch HAProxy back to the prior color (delegated)
          delegate_to: "{{ groups['haproxy'][0] }}"
          vars:
            ansible_connection: community.general.incus
            ansible_python_interpreter: /usr/bin/python3
          block:
            - name: Apply veza_haproxy_switch with prior_active_color
              ansible.builtin.include_role:
                name: veza_haproxy_switch
              vars:
                veza_active_color: "{{ prior_active_color }}"

        - name: Fail the playbook
          ansible.builtin.fail:
            msg: >-
              Public health probe via HAProxy failed after deploy of SHA
              {{ veza_release_sha[:12] }} to color {{ inactive_color }}.
              HAProxy reverted to the prior color ({{ prior_active_color }}).
              The freshly-deployed {{ inactive_color }} containers are kept
              alive for forensics — inspect with:
              incus exec {{ veza_container_prefix }}backend-{{ inactive_color }} -- journalctl -u veza-backend -n 200
@@ -112,28 +112,23 @@
  gather_facts: false
  tasks:
    - name: Launch container if absent
      ansible.builtin.shell: |
        set -e
        if incus info "{{ item.name }}" >/dev/null 2>&1; then
          echo "{{ item.name }} already exists"
          exit 0
        fi
        incus launch {{ veza_app_base_image }} "{{ item.name }}" \
          --profile veza-data \
          --profile veza-net \
          --network "{{ veza_incus_network }}"
        # Wait for the container's API to respond before any subsequent task
        # (apt, systemd) hits a half-up container.
        for i in $(seq 1 {{ veza_app_container_ready_timeout | default(30) }}); do
          if incus exec "{{ item.name }}" -- /bin/true 2>/dev/null; then
            echo "Container {{ item.name }} ready"
            exit 0
          fi
          sleep 1
        done
        echo "Container {{ item.name }} did not become ready within timeout"
        exit 1
      args:
      ansible.builtin.shell:
        cmd: |
          set -e
          if incus info "{{ item.name }}" >/dev/null 2>&1; then
            echo "{{ item.name }} already exists"
            exit 0
          fi
          incus launch "{{ veza_app_base_image }}" "{{ item.name }}" --profile veza-data --profile veza-net --network "{{ veza_incus_network }}"
          for i in $(seq 1 {{ veza_app_container_ready_timeout | default(30) }}); do
            if incus exec "{{ item.name }}" -- /bin/true 2>/dev/null; then
              echo "Container {{ item.name }} ready"
              exit 0
            fi
            sleep 1
          done
          echo "Container {{ item.name }} did not become ready within timeout"
          exit 1
        executable: /bin/bash
      loop: "{{ veza_data_containers }}"
      register: launch_result
|
||||
|
|
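The launch task above polls `incus exec … /bin/true` once per second until the container answers or the timeout expires. The same retry shape, detached from Incus so it can run anywhere, can be sketched as follows (the `wait_ready` helper name is ours, not part of the playbook):

```shell
#!/bin/sh
# wait_ready CMD [TIMEOUT] — run CMD once per second until it succeeds,
# mirroring the readiness loop in the launch task. Returns 1 on timeout.
wait_ready() {
  cmd="$1"
  timeout="${2:-30}"
  i=1
  while [ "$i" -le "$timeout" ]; do
    if sh -c "$cmd" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Hypothetical usage against a freshly launched container:
# wait_ready 'incus exec veza-backend-blue -- /bin/true' 30
```

Polling `/bin/true` inside the container is a cheap proxy for "the init system answers exec requests", which is exactly what the subsequent apt/systemd tasks need.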
@@ -150,7 +145,7 @@
   # tasks/<kind>.yml or role.
   # -----------------------------------------------------------------------
 - name: Configure postgres
-  hosts: "{{ veza_container_prefix + 'postgres' }}"
+  hosts: veza_data_postgres
   become: true
   gather_facts: false
   vars:

@@ -198,7 +193,7 @@
   tags: [data, postgres]

 - name: Configure redis
-  hosts: "{{ veza_container_prefix + 'redis' }}"
+  hosts: veza_data_redis
   become: true
   gather_facts: false
   vars:

@@ -250,7 +245,7 @@
   tags: [data, redis]

 - name: Configure rabbitmq
-  hosts: "{{ veza_container_prefix + 'rabbitmq' }}"
+  hosts: veza_data_rabbitmq
   become: true
   gather_facts: false
   vars:

@@ -295,7 +290,7 @@
   tags: [data, rabbitmq]

 - name: Configure minio
-  hosts: "{{ veza_container_prefix + 'minio' }}"
+  hosts: veza_data_minio
   become: true
   gather_facts: false
   vars:
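Throughout these playbooks the blue/green pair is driven by variables such as `veza_active_color`, `inactive_color` and `prior_active_color`. A minimal sketch of the complement computation they imply (the `other_color` helper name is ours, not a function from the repo):

```shell
#!/bin/sh
# other_color COLOR — the blue/green complement: the side a deploy targets
# while the named color keeps serving traffic behind HAProxy.
other_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}
```

Rejecting anything but `blue`/`green` keeps a typo in an extra-var from silently deploying to a nonexistent container set.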
@@ -1,14 +1,12 @@
 # rollback.yml — two modes :
 #
 # 1. fast : flip HAProxy back to the previous active color.
-#           Works only if those containers are still alive
-#           (i.e., the next deploy has NOT yet recycled them).
+#           Works only if those containers are still alive.
 #           Effect time : ~5 seconds.
 #
 # 2. full : redeploy a specific release_sha by re-running
-#           deploy_app.yml with that SHA. Works whenever the
-#           tarball is still in the Forgejo Registry. Effect
-#           time : ~5-10 minutes.
+#           deploy_app.yml with that SHA.
+#           Effect time : ~5-10 minutes.
 #
 # Required extra-vars:
 #   env           staging | prod

@@ -16,11 +14,7 @@
 #   target_color  (mode=fast only)  the color to flip TO
 #   release_sha   (mode=full only)  the SHA to redeploy
 #
-# Caller (workflow_dispatch only — see .forgejo/workflows/rollback.yml):
-#   ansible-playbook -i inventory/{{env}}.yml playbooks/rollback.yml \
-#     -e env={{env}} -e mode=fast -e target_color=blue
-#   ansible-playbook -i inventory/{{env}}.yml playbooks/rollback.yml \
-#     -e env={{env}} -e mode=full -e release_sha=<previous_sha>
+# Caller (workflow_dispatch only — see .forgejo/workflows/rollback.yml).
 ---
 - name: Validate inputs
   hosts: incus_hosts
@@ -57,27 +51,28 @@
 # ---------------------------------------------------------------------
 # mode=fast → HAProxy flip only.
 # `when:` lives at TASK level (Ansible doesn't accept it at play level).
 # ---------------------------------------------------------------------
 - name: Fast rollback — verify target_color containers are alive
   hosts: incus_hosts
   become: true
   gather_facts: false
   tasks:
-    - name: Check each target-color container exists
-      ansible.builtin.shell: |
-        set -e
-        CT="{{ veza_container_prefix }}{{ item }}-{{ target_color }}"
-        if ! incus info "$CT" >/dev/null 2>&1; then
-          echo "MISSING $CT"
-          exit 1
-        fi
-        STATE=$(incus list "$CT" -c s --format csv)
-        if [ "$STATE" != "RUNNING" ]; then
-          echo "$CT is $STATE (not RUNNING)"
-          exit 1
-        fi
-        echo "OK $CT"
-      args:
+    - name: Check each target-color container exists and is RUNNING
+      ansible.builtin.shell:
+        cmd: |
+          set -e
+          CT="{{ veza_container_prefix }}{{ item }}-{{ target_color }}"
+          if ! incus info "$CT" >/dev/null 2>&1; then
+            echo "MISSING $CT"
+            exit 1
+          fi
+          STATE=$(incus list "$CT" -c s --format csv)
+          if [ "$STATE" != "RUNNING" ]; then
+            echo "$CT is $STATE (not RUNNING)"
+            exit 1
+          fi
+          echo "OK $CT"
         executable: /bin/bash
       loop:
        - backend
@@ -85,29 +80,31 @@
         - web
       changed_when: false
       register: alive_check
       when: mode == 'fast'
       tags: [rollback, fast]

 - name: Fast rollback — flip HAProxy
-  hosts: "{{ veza_container_prefix + 'haproxy' }}"
+  hosts: haproxy
   become: true
   gather_facts: true
   vars:
     ansible_connection: community.general.incus
     ansible_python_interpreter: /usr/bin/python3
-    veza_active_color: "{{ target_color }}"
-    # Fast rollback re-uses the previous SHA from the history file.
-    veza_release_sha: "{{ lookup('ansible.builtin.file', '/var/lib/veza/active-color.history', errors='ignore') | regex_search('sha=([0-9a-f]+)', '\\1') | default(['rollback'], true) | first }}"
-  roles:
-    - veza_haproxy_switch
-  when: mode == 'fast'
-  tags: [rollback, fast]
+  tasks:
+    - name: Apply veza_haproxy_switch with target_color
+      ansible.builtin.include_role:
+        name: veza_haproxy_switch
+      vars:
+        veza_active_color: "{{ target_color }}"
+        # Fast rollback re-uses the previous SHA from the history file.
+        # Fallback to a synthetic 40-char SHA if the file is missing —
+        # the role's assert tolerates this for the rollback case.
+        veza_release_sha: "{{ (lookup('ansible.builtin.file', '/var/lib/veza/active-color.history', errors='ignore') | default('', true) | regex_search('sha=([0-9a-f]{40})', '\\1') | default('r0llback' + '0' * 32, true)) }}"
+      when: mode == 'fast'
+      tags: [rollback, fast]

 # ---------------------------------------------------------------------
-# mode=full → re-import deploy_app.yml with the rollback SHA.
-# Functionally identical to a fresh deploy of an older release.
+# mode=full → re-run deploy_app.yml with the rollback SHA.
+# `when:` IS valid on import_playbook (unlike on a regular play).
 # ---------------------------------------------------------------------
-- name: Full rollback — delegate to deploy_app.yml with release_sha={{ veza_release_sha | default('') }}
+- name: Full rollback — delegate to deploy_app.yml
   ansible.builtin.import_playbook: deploy_app.yml
   when: mode == 'full'
   tags: [rollback, full]
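The fast-rollback play recovers the previous release SHA from `/var/lib/veza/active-color.history` with a Jinja `regex_search`, falling back to a synthetic 40-character placeholder when the file is absent. A shell equivalent of that lookup, under the assumption that history lines carry `sha=<40-hex>` tokens (the `previous_sha` helper name is ours):

```shell
#!/bin/sh
# previous_sha FILE — first 40-hex "sha=" entry in the history file,
# else the same synthetic placeholder the playbook falls back to.
previous_sha() {
  sha=$(grep -oE 'sha=[0-9a-f]{40}' "$1" 2>/dev/null | head -n 1 | cut -d= -f2)
  if [ -z "$sha" ]; then
    # 8 chars + 32 zeros = 40, matching the role's SHA-length assert.
    sha="r0llback00000000000000000000000000000000"
  fi
  echo "$sha"
}
```

Anchoring the regex to exactly 40 hex characters mirrors the tightened Jinja pattern: a truncated or corrupted history entry falls through to the placeholder instead of producing a short, unusable SHA.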