Commit graph

3 commits

Author SHA1 Message Date
senke
ed1bb4084a ci(e2e): replace docker-compose with native services block
Some checks failed
Veza CI / Rust (Stream Server) (push) Successful in 3m56s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 40s
Veza CI / Backend (Go) (push) Failing after 14m15s
E2E Playwright / e2e (full) (push) Failing after 15m25s
Veza CI / Frontend (Web) (push) Successful in 26m8s
Veza CI / Notify on failure (push) Successful in 3s
Symptom: e2e.yml was bringing up Postgres/Redis/RabbitMQ via
`docker compose up -d`, which forces the runner job container to share
the host docker socket, parses the entire docker-compose.yml at every
run (so unrelated interpolations like `${JWT_SECRET:?required}` block
the step), and never auto-cleans the started containers. Concurrent e2e
runs collided on host ports 15432/16379/15672. Combined with the
already-fragile DinD setup, this is one of the top sources of flakes.

Fix: use the GHA-native `services:` block. act_runner spawns the three
service containers on the job network with healthchecks, exposes them
by service hostname on standard ports, tears them down at the end. Net
removal: docker-compose dependency, host port mapping, manual readiness
loop, leaked-container risk.

Wire-shape changes (DB/cache/MQ URLs hoisted to job-level env):
  postgres -> postgres:5432 (was localhost:15432)
  redis    -> redis:6379    (was localhost:16379, + auth required)
  rabbitmq -> rabbitmq:5672 (was localhost:5672)

REDIS_URL now carries the requirepass secret to match
docker-compose.yml's REM-023 convention; previously the runner-side
redis happened to start without auth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 10:01:28 +02:00
senke
161840e0ab fix(ci): hoist JWT_SECRET to workflow env so docker compose validates
Some checks failed
Veza CI / Notify on failure (push) Blocked by required conditions
Security Scan / Secret Scanning (gitleaks) (push) Waiting to run
Veza CI / Rust (Stream Server) (push) Successful in 3m21s
Veza CI / Frontend (Web) (push) Has been cancelled
Veza CI / Backend (Go) (push) Has been cancelled
E2E Playwright / e2e (full) (push) Has been cancelled
docker-compose.yml declares the backend-api service environment with
`${JWT_SECRET:?JWT_SECRET must be set in .env}`. docker compose
validates the WHOLE file at parse time, even when `up -d` is asked
only for `postgres redis rabbitmq` — so the missing value blocks the
"Start backend services" step before anything actually runs.

Fix: hoist JWT_SECRET to the workflow-level env block (with the same
secret/fallback resolution as the Build+start step). The "Build+start
backend API" step now inherits it instead of re-defining.

Behaviour change : none for the backend itself — JWT_SECRET reaches
the same Go process via the same fallback chain. The fix is purely a
docker-compose validation step earlier in the pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 09:43:43 +02:00
senke
f23d23cf2b feat(ci): add E2E Playwright workflow + runbook (v1.0.8 C2 + C5)
Closes the second-to-last item of Batch C (after C3 reuseExistingServer
and C4 seed --ci flag landed earlier). Wires the existing Playwright
suite (60+ spec files in tests/e2e/) into Forgejo Actions.

Workflow shape (.github/workflows/e2e.yml):
- pull_request → @critical only (5-7min target, 20min timeout)
- push to main → full suite (~25min target, 45min timeout)
- nightly cron 03:00 UTC → full suite, catches infra drift
- workflow_dispatch → full suite, manual trigger

Single job structure with conditional steps based on github.event_name.
The job:
  1. Boots Postgres / Redis / RabbitMQ via docker compose.
  2. Runs Go migrations.
  3. `go run ./cmd/tools/seed --ci` — the lean seed landed in C4
     (5 test accounts + 10 tracks + 3 playlists, ~5s).
  4. Builds + starts the backend with APP_ENV=test plus
     DISABLE_RATE_LIMIT_FOR_TESTS=true and the lockout-exempt
     emails matching the auth fixture.
  5. `playwright install --with-deps chromium`.
  6. `npm run e2e:critical` (PR) or `npm run e2e` (push/cron).
  7. Uploads the Playwright HTML report + backend log on failure
     (7-day retention, sufficient for triage).

The `CI: "true"` env var is set workflow-wide so playwright.config.ts
(line 141, 155) sees `process.env.CI` and flips reuseExistingServer
to false, guaranteeing a fresh backend + Vite per job.

Secrets fall back to dev defaults (devpassword / 38-char dev JWT /
guest:guest@localhost:5672) so a fresh repo runs without configuring
secrets first; production-style runs should set `E2E_DB_PASSWORD`,
`E2E_JWT_SECRET`, `E2E_RABBITMQ_URL` in Forgejo Actions secrets.

Runbook (docs/CI_E2E.md):
- Trigger / scope / target time table.
- Step-by-step explanation of what a CI run does.
- Required secrets + their fallbacks.
- "Reproducing a CI failure locally" — exact mirror of the workflow
  invocation so a dev can rerun without pushing.
- "Debugging a red run" — where to look in the Forgejo UI, what the
  artifacts contain, when to check SKIPPED_TESTS.md.
- "Adding a new E2E test" — fixture usage, when to tag @critical.

Action pin SHAs match the rest of the workflows (consistent supply-
chain hygiene). Go 1.25 (matches ci.yml backend job, NOT the older
1.24 used in the disabled accessibility.yml template).

Remaining Batch C item: C6 — flake stabilisation (~3-5 of the 22
SKIPPED_TESTS.md entries that look fixable). Defer to a follow-up
session — wiring the workflow first means the next push-to-main run
will tell us empirically which @critical tests are flaky in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 23:51:33 +02:00