veza/docs/CI_E2E.md

# E2E CI — runbook

> **v1.0.8 Batch C** — Playwright E2E suite running on Forgejo Actions.
> Workflow: `.github/workflows/e2e.yml`. Tests: `tests/e2e/*.spec.ts`.
> Skipped tests inventory: `tests/e2e/SKIPPED_TESTS.md`.

---

## Triggers

| Trigger | Scope | Target time | Why |
|---|---|---|---|
| PR opened / synced (against `main`) | `@critical` only | ~5–7 min | Fast feedback loop, blocks merge if red |
| Push to `main` | Full suite | ~25 min | Catches regressions that slipped past `@critical` |
| Nightly cron (03:00 UTC) | Full suite | ~25 min | Catches infra drift independent of merges |
| `workflow_dispatch` | Full suite | manual | Re-run after a flaky failure or on a feature branch |

`@critical` is a Playwright `--grep` tag — see `npm run e2e:critical`.
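
As a rough illustration of tag-based selection, the sketch below filters test titles with the kind of regular expression that `--grep` accepts. The titles are hypothetical, and the assumption that `e2e:critical` passes `--grep @critical` is inferred from the sentence above, not read from `package.json`.

```typescript
// Sketch only: models how a --grep pattern selects tests by title.
// The titles below are hypothetical examples, not tests from this repo.
const titles: string[] = [
  "login with valid credentials @critical",
  "player resumes after network drop",
  "checkout completes with saved card @critical",
];

// Playwright treats the --grep value as a regular expression and runs
// only the tests whose title matches it.
function selectByGrep(allTitles: string[], pattern: string): string[] {
  const re = new RegExp(pattern);
  return allTitles.filter((title) => re.test(title));
}

const prScope = selectByGrep(titles, "@critical");
console.log(prScope.length); // 2
```

Because the tag is part of the title, adding or removing `@critical` is all it takes to move a test in or out of the PR scope.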

---

## How a CI run works

1. `actions/checkout`, then `actions/setup-node` (Node 20) and `actions/setup-go` (Go 1.25).
2. `npm ci` from repo root.
3. Adds `127.0.0.1 veza.fr` to `/etc/hosts` so the browsers can hit
   the dev domain.
4. Generates dev JWT keys + SSL cert via the existing scripts.
5. Brings up `postgres / redis / rabbitmq` via `docker compose`.
6. Runs Go migrations.
7. **`go run ./cmd/tools/seed --ci`** — the lean seed: 5 test accounts
   + 10 tracks + 3 playlists, no chat/live/marketplace/analytics. ~5s.
8. Builds + starts the backend on `localhost:18080`, asserts
   `/api/v1/health`.
9. `playwright install --with-deps chromium`.
10. Runs `npm run e2e:critical` (PR) or `npm run e2e` (push/cron).
    `CI=true` is exported workflow-wide, so `reuseExistingServer`
    (`playwright.config.ts:141,155`) evaluates to `false` and Playwright
    starts its own Vite + backend instances instead of reusing running ones.
11. On failure: uploads the Playwright HTML report and `backend.log`
    as artifacts, retained 7 days.
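
The health assertion in step 8 can be pictured as a bounded polling loop. The sketch below is an assumption about the shape of such a check, not the workflow's actual step: the function name, attempt count and delay are hypothetical, and only the URL comes from this runbook.

```typescript
// Sketch only: poll a health endpoint until it answers, with a bounded
// number of attempts. Attempt count and delay are assumed values.
type Probe = (url: string) => Promise<{ ok: boolean }>;
type Sleep = (ms: number) => Promise<void>;

async function waitForHealth(
  probe: Probe,
  sleep: Sleep,
  url = "http://localhost:18080/api/v1/health",
  maxAttempts = 30,
  delayMs = 1000,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await probe(url);
      if (res.ok) return attempt; // healthy: report how many tries it took
    } catch {
      // Connection refused while the backend is still booting: retry.
    }
    if (attempt < maxAttempts) await sleep(delayMs);
  }
  throw new Error(`backend not healthy after ${maxAttempts} attempts: ${url}`);
}
```

On Node 18+ this could be wired up as `waitForHealth((u) => fetch(u), (ms) => new Promise((r) => setTimeout(r, ms)))`. Injecting the probe and the sleep keeps the sketch testable without a running backend.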

---

## Required secrets (Forgejo)

The workflow falls back to dev defaults so it can still run on a
fresh repo without secrets configured, but **production-style runs
should set these in Forgejo Actions secrets**:

| Secret | Default fallback | Purpose |
|---|---|---|
| `E2E_DB_PASSWORD` | `devpassword` | Postgres password (must match `docker-compose.yml`) |
| `E2E_JWT_SECRET` | `ci-dev-jwt-secret-32-chars-min-padding!!` | HS256 signing key (32+ chars) |
| `E2E_RABBITMQ_URL` | `amqp://guest:guest@localhost:5672/` | RabbitMQ AMQP URL |

Without these, the workflow still passes for everything that doesn't
exercise WebSocket / RabbitMQ paths under load.
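
The fallback behaviour in the table can be sketched as follows. This models the rule in TypeScript for illustration only; the real fallbacks live in the workflow file, and `resolveE2ESecrets` and the length check are hypothetical.

```typescript
// Sketch only: resolve E2E settings with the dev fallbacks from the table.
// An unset or empty value falls back, since an unconfigured Actions secret
// typically expands to an empty string rather than being undefined.
type Env = Record<string, string | undefined>;

interface E2ESecrets {
  dbPassword: string;
  jwtSecret: string;
  rabbitmqUrl: string;
}

function resolveE2ESecrets(env: Env): E2ESecrets {
  const jwtSecret =
    env["E2E_JWT_SECRET"] || "ci-dev-jwt-secret-32-chars-min-padding!!";
  // The table asks for 32+ characters; fail fast on a shorter key.
  if (jwtSecret.length < 32) {
    throw new Error("E2E_JWT_SECRET must be at least 32 characters");
  }
  return {
    dbPassword: env["E2E_DB_PASSWORD"] || "devpassword",
    jwtSecret,
    rabbitmqUrl: env["E2E_RABBITMQ_URL"] || "amqp://guest:guest@localhost:5672/",
  };
}
```

Using `||` instead of `??` is deliberate here: it treats an empty string the same as a missing value.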

---

## Reproducing a CI failure locally

Mirrors the workflow's steps (CI additionally exempts the auth-fixture
emails from lockout, which is not reproduced here):

```bash
# From repo root
make infra-up-dev                  # postgres + redis + rabbitmq
cd veza-backend-api
go run cmd/migrate_tool/main.go
go run ./cmd/tools/seed --ci       # lean seed: 5 accounts, 10 tracks, 3 playlists
go build -o veza-api ./cmd/api/main.go
# CI also starts the backend with rate limiting disabled
APP_ENV=test DISABLE_RATE_LIMIT_FOR_TESTS=true ./veza-api &

# In another shell, from repo root
cd apps/web && npm run dev -- --host 127.0.0.1 --port 5174 &

# Run the same tests CI ran
cd /path/to/repo
CI=true npm run e2e:critical       # PR scope
# or
CI=true npm run e2e                # full suite
```

If the failure only reproduces under `CI=true`, suspect
`reuseExistingServer` — set `CI=` (empty) to flip back to local mode
and bisect.
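
The effect of the different `CI` values on `reuseExistingServer: !process.env.CI` can be checked with a small sketch. The helper name is hypothetical; the expression is the one quoted from the Playwright config.

```typescript
// Sketch only: mirrors the `reuseExistingServer: !process.env.CI` expression.
// Environment values are strings (or undefined), so any non-empty string,
// including "false", counts as CI mode.
function reuseExistingServer(ci: string | undefined): boolean {
  return !ci;
}

const cases: Array<[string | undefined, boolean]> = [
  [undefined, true], // CI not set: reuse a server that is already running
  ["", true],        // `CI=` (empty): same as unset
  ["true", false],   // `CI=true`: always start fresh servers
  ["false", false],  // `CI=false`: still CI mode, the string is non-empty
];

cases.forEach(([value, expected]) => {
  if (reuseExistingServer(value) !== expected) {
    throw new Error(`unexpected result for CI=${String(value)}`);
  }
});
```

The last case is the one to watch: `CI=false` does not switch back to local mode, only an empty or unset `CI` does.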

---

## Debugging a red run

1. **Open the run** in Forgejo Actions UI.
2. Find the failing job's "Run E2E" step. Each test failure shows the
   selector / assertion / screenshot inline.
3. Scroll to the artifact section: download
   `playwright-report-<run-id>-<attempt>` (the HTML report — opens in
   any browser, shows trace viewer + video for retry-on-fail) and
   `backend-log-<run-id>-<attempt>` (full backend stdout + stderr).
4. If the failure looks env-related (404 on a known route, 500
   without a clear cause), check `backend-log` for panics or
   migration errors before assuming a test bug.
5. Cross-check `tests/e2e/SKIPPED_TESTS.md`: if the test is already
   listed as flaky, the right fix may be to `test.skip()` it until the
   underlying app bug is fixed.

---

## Adding a new E2E test

1. Drop a `*.spec.ts` file under `tests/e2e/`.
2. Tag it with `@critical` if it must run on every PR (be conservative
   — every `@critical` test extends the PR feedback loop).
3. Use the auth fixture from `tests/e2e/fixtures/auth.fixture.ts`
   (`listenerPage` / `creatorPage` / `adminPage` / `moderatorPage`)
   instead of writing UI login flows.
4. If the test needs DB state outside the `--ci` seed (rare), seed it
   from inside the test via `page.request.post(...)` rather than
   extending the seed tool — keeps the seed lean.
5. Run locally with `CI=true npm run e2e:critical -- --grep "your test"`
   before pushing.

---

## Scaling considerations

- The Forgejo runner pool is shared across CI workflows, so keep PR
  runs under 10 min to avoid holding a runner during peak hours.
- `docker compose up -d postgres redis rabbitmq` reuses the dev
  compose file; if that file changes, the workflow inherits the
  change automatically.
- The full suite is gated to push/cron/dispatch precisely because we
  don't want to pay 25 min on every PR push.

---

## Related

- `tests/e2e/playwright.config.ts` — base config, `reuseExistingServer:
  !process.env.CI` (committed in v1.0.8 C3, commit `46d21c5c`).
- `veza-backend-api/cmd/tools/seed/config.go` — `CIConfig()` and the
  `--ci` flag (committed in v1.0.8 C4, commit `cee850a5`).
- `tests/e2e/SKIPPED_TESTS.md` — known flakes + tickets to resolve.
- `docs/audit-2026-04/v107-plan.md` — historical context for E2E
  coverage gaps that landed in v1.0.7.

feat(ci): add E2E Playwright workflow + runbook (v1.0.8 C2 + C5)

Closes the second-to-last item of Batch C (after C3 reuseExistingServer
and C4 seed --ci flag landed earlier). Wires the existing Playwright
suite (60+ spec files in tests/e2e/) into Forgejo Actions.

Workflow shape (.github/workflows/e2e.yml):
- pull_request → @critical only (5-7min target, 20min timeout)
- push to main → full suite (~25min target, 45min timeout)
- nightly cron 03:00 UTC → full suite, catches infra drift
- workflow_dispatch → full suite, manual trigger

Single job structure with conditional steps based on github.event_name.
The job:
  1. Boots Postgres / Redis / RabbitMQ via docker compose.
  2. Runs Go migrations.
  3. `go run ./cmd/tools/seed --ci` — the lean seed landed in C4
     (5 test accounts + 10 tracks + 3 playlists, ~5s).
  4. Builds + starts the backend with APP_ENV=test plus
     DISABLE_RATE_LIMIT_FOR_TESTS=true and the lockout-exempt
     emails matching the auth fixture.
  5. `playwright install --with-deps chromium`.
  6. `npm run e2e:critical` (PR) or `npm run e2e` (push/cron).
  7. Uploads the Playwright HTML report + backend log on failure
     (7-day retention, sufficient for triage).

The `CI: "true"` env var is set workflow-wide so playwright.config.ts
(line 141, 155) sees `process.env.CI` and flips reuseExistingServer
to false, guaranteeing a fresh backend + Vite per job.

Secrets fall back to dev defaults (devpassword / 40-char dev JWT /
guest:guest@localhost:5672) so a fresh repo runs without configuring
secrets first; production-style runs should set `E2E_DB_PASSWORD`,
`E2E_JWT_SECRET`, `E2E_RABBITMQ_URL` in Forgejo Actions secrets.

Runbook (docs/CI_E2E.md):
- Trigger / scope / target time table.
- Step-by-step explanation of what a CI run does.
- Required secrets + their fallbacks.
- "Reproducing a CI failure locally" — exact mirror of the workflow
  invocation so a dev can rerun without pushing.
- "Debugging a red run" — where to look in the Forgejo UI, what the
  artifacts contain, when to check SKIPPED_TESTS.md.
- "Adding a new E2E test" — fixture usage, when to tag @critical.

Action pin SHAs match the rest of the workflows (consistent supply-
chain hygiene). Go 1.25 (matches ci.yml backend job, NOT the older
1.24 used in the disabled accessibility.yml template).

Remaining Batch C item: C6 — flake stabilisation (~3-5 of the 22
SKIPPED_TESTS.md entries that look fixable). Defer to a follow-up
session — wiring the workflow first means the next push-to-main run
will tell us empirically which @critical tests are flaky in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

											
										
										
2026-04-25 21:51:33 +00:00