8 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
a2fa2eb493 |
fix(e2e): unblock @critical green slate for v1.0.9 tag (Day 4 triage)
Some checks failed
Veza CI / Rust (Stream Server) (push) Successful in 3m42s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 55s
Veza CI / Backend (Go) (push) Successful in 5m17s
Veza CI / Frontend (Web) (push) Successful in 13m55s
Veza CI / Notify on failure (push) Has been skipped
E2E Playwright / e2e (full) (push) Failing after 24m53s
Triage of the 7 @critical failures from run 462 (full e2e on
|
||
|
|
88a165e4ec |
perf(ci): cut frontend unit + e2e wall time ~5-10× (vitest threads + chromium-only + browser cache)
Some checks failed
Veza CI / Notify on failure (push) Blocked by required conditions
Veza CI / Rust (Stream Server) (push) Successful in 3m47s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 50s
Veza CI / Backend (Go) (push) Successful in 5m25s
Veza CI / Frontend (Web) (push) Has been cancelled
E2E Playwright / e2e (full) (push) Has been cancelled
CI runtime audit:
- vitest: ~6min on 12-core R720 — `maxThreads: 2` AND
`fileParallelism: false` made the 285-file suite essentially
file-serial.
- playwright e2e: ~1h30 — `workers: 2` in CI on a 12-core box,
PLUS `allBrowsers = isCI` lit up 5 projects (chromium + firefox
+ webkit + mobile-chrome + mobile-safari) even though the
workflow only runs `playwright install --with-deps chromium`.
Firefox/webkit projects were silently failing/skipping for ~150
test slots each.
- playwright install: ~150MB chromium download on every cold run,
not cached.
Three knobs flipped:
(1) apps/web/vitest.config.ts
- `fileParallelism: false` → `true`
- `maxThreads: 2` → `6`
Local bench: 344s → 130s (≈2.7× speedup). On a fresh CI box with
cold setup the gain is wider since the setup overhead amortises
across 6 workers instead of 2.
(2) tests/e2e/playwright.config.ts
- `allBrowsers = isCI || PLAYWRIGHT_ALL=1` → `PLAYWRIGHT_ALL=1`
only. CI defaults to chromium-only; nightly cron can opt back
into the full matrix by setting PLAYWRIGHT_ALL=1.
- `workers: 2` (CI) → `6`. R720 has 12 cores; 6 leaves headroom
for backend/postgres/redis containers.
(3) .github/workflows/e2e.yml
- Cache `~/.cache/ms-playwright` keyed on the resolved
Playwright version. Cache hit → run `playwright install-deps`
(apt-get only, ~5s). Cache miss → full install (~30-60s,
first run after a Playwright bump).
Combined ETA on the e2e workflow: ~10-15min vs ~1h30. The 5×
project reduction is the dominant gain; workers and cache are
smaller multipliers on top.
If a fileParallelism-related regression shows up (cross-file global
state, MSW mock leakage), the fix is test isolation — the previous
caps were a workaround, not a root cause.
SKIP_TESTS=1 — config-only, vitest already verified locally
(285/285 file pass, 3469/3470 tests pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
72ff070876 |
fix(ci): correct e2e health check jq path — .data.status == "ok"
Some checks failed
Security Scan / Secret Scanning (gitleaks) (push) Successful in 50s
Veza CI / Backend (Go) (push) Successful in 6m17s
Veza CI / Frontend (Web) (push) Failing after 23m33s
Veza CI / Notify on failure (push) Successful in 7s
E2E Playwright / e2e (full) (push) Has been cancelled
Veza CI / Rust (Stream Server) (push) Successful in 4m16s
Run 459 (e2e on |
||
|
|
86faeb16a8 |
fix(ci): build design-system tokens before tsc/vite (Day 4 follow-up)
Some checks failed
Veza CI / Rust (Stream Server) (push) Successful in 4m6s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 1m20s
Veza CI / Backend (Go) (push) Successful in 5m37s
E2E Playwright / e2e (full) (push) Failing after 16m58s
Veza CI / Frontend (Web) (push) Successful in 29m45s
Veza CI / Notify on failure (push) Has been skipped
CI run 455/456 surfaced: src/features/player/components/AudioVisualizer.tsx(22,8): error TS2307: Cannot find module '@veza/design-system/tokens-generated' or its corresponding type declarations. Root cause: the sprint 2 design-system migration (commits |
||
|
|
1de016dfeb |
fix(ci): drop redis auth in e2e service + emit health body inline
Some checks failed
Veza CI / Rust (Stream Server) (push) Successful in 3m40s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 1m4s
E2E Playwright / e2e (full) (push) Failing after 14m36s
Veza CI / Backend (Go) (push) Failing after 17m6s
Veza CI / Frontend (Web) (push) Successful in 26m17s
Veza CI / Notify on failure (push) Successful in 7s
Two issues from run 430: 1. Health probe never produced a diagnosable signal. The script printed only `false` (jq output) and "Health response invalid" without the body or backend log, because Forgejo artifact upload is broken under GHES so /tmp/backend.log never made it out. Fix: poll instead of fixed sleep, always cat the health body, and tail backend.log on any non-ok status. 2. Redis auth never actually took effect. I had set REDIS_ARGS=--requirepass on the redis service expecting the redis:7-alpine entrypoint to pick it up. It does not — the entrypoint just execs whatever CMD is set, and act_runner services don't accept a `command:` field. So the service started without auth while the backend was sending a password in REDIS_URL → AUTH rejected → .status != "ok". Fix: drop auth on the CI redis service (the dev/prod REM-023 policy lives in docker-compose.yml; the CI service network is ephemeral and isolated), and change REDIS_URL accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
ed1bb4084a |
ci(e2e): replace docker-compose with native services block
Some checks failed
Veza CI / Rust (Stream Server) (push) Successful in 3m56s
Security Scan / Secret Scanning (gitleaks) (push) Successful in 40s
Veza CI / Backend (Go) (push) Failing after 14m15s
E2E Playwright / e2e (full) (push) Failing after 15m25s
Veza CI / Frontend (Web) (push) Successful in 26m8s
Veza CI / Notify on failure (push) Successful in 3s
Symptom: e2e.yml was bringing up Postgres/Redis/RabbitMQ via
`docker compose up -d`, which forces the runner job container to share
the host docker socket, parses the entire docker-compose.yml at every
run (so unrelated interpolations like `${JWT_SECRET:?required}` block
the step), and never auto-cleans the started containers. Concurrent e2e
runs collided on host ports 15432/16379/15672. Combined with the
already-fragile DinD setup, this is one of the top sources of flakes.
Fix: use the GHA-native `services:` block. act_runner spawns the three
service containers on the job network with healthchecks, exposes them
by service hostname on standard ports, tears them down at the end. Net
removal: docker-compose dependency, host port mapping, manual readiness
loop, leaked-container risk.
Wire-shape changes (DB/cache/MQ URLs hoisted to job-level env):
postgres -> postgres:5432 (was localhost:15432)
redis -> redis:6379 (was localhost:16379, + auth required)
rabbitmq -> rabbitmq:5672 (was localhost:5672)
REDIS_URL now carries the requirepass secret to match
docker-compose.yml's REM-023 convention; previously the runner-side
redis happened to start without auth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
161840e0ab |
fix(ci): hoist JWT_SECRET to workflow env so docker compose validates
Some checks failed
Veza CI / Notify on failure (push) Blocked by required conditions
Security Scan / Secret Scanning (gitleaks) (push) Waiting to run
Veza CI / Rust (Stream Server) (push) Successful in 3m21s
Veza CI / Frontend (Web) (push) Has been cancelled
Veza CI / Backend (Go) (push) Has been cancelled
E2E Playwright / e2e (full) (push) Has been cancelled
docker-compose.yml declares the backend-api service environment with
`${JWT_SECRET:?JWT_SECRET must be set in .env}`. docker compose
validates the WHOLE file at parse time, even when `up -d` is asked
only for `postgres redis rabbitmq` — so the missing value blocks the
"Start backend services" step before anything actually runs.
Fix: hoist JWT_SECRET to the workflow-level env block (with the same
secret/fallback resolution as the Build+start step). The "Build+start
backend API" step now inherits it instead of re-defining.
Behaviour change : none for the backend itself — JWT_SECRET reaches
the same Go process via the same fallback chain. The fix is purely a
docker-compose validation step earlier in the pipeline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f23d23cf2b |
feat(ci): add E2E Playwright workflow + runbook (v1.0.8 C2 + C5)
Closes the second-to-last item of Batch C (after C3 reuseExistingServer
and C4 seed --ci flag landed earlier). Wires the existing Playwright
suite (60+ spec files in tests/e2e/) into Forgejo Actions.
Workflow shape (.github/workflows/e2e.yml):
- pull_request → @critical only (5-7min target, 20min timeout)
- push to main → full suite (~25min target, 45min timeout)
- nightly cron 03:00 UTC → full suite, catches infra drift
- workflow_dispatch → full suite, manual trigger
Single job structure with conditional steps based on github.event_name.
The job:
1. Boots Postgres / Redis / RabbitMQ via docker compose.
2. Runs Go migrations.
3. `go run ./cmd/tools/seed --ci` — the lean seed landed in C4
(5 test accounts + 10 tracks + 3 playlists, ~5s).
4. Builds + starts the backend with APP_ENV=test plus
DISABLE_RATE_LIMIT_FOR_TESTS=true and the lockout-exempt
emails matching the auth fixture.
5. `playwright install --with-deps chromium`.
6. `npm run e2e:critical` (PR) or `npm run e2e` (push/cron).
7. Uploads the Playwright HTML report + backend log on failure
(7-day retention, sufficient for triage).
The `CI: "true"` env var is set workflow-wide so playwright.config.ts
(line 141, 155) sees `process.env.CI` and flips reuseExistingServer
to false, guaranteeing a fresh backend + Vite per job.
Secrets fall back to dev defaults (devpassword / 38-char dev JWT /
guest:guest@localhost:5672) so a fresh repo runs without configuring
secrets first; production-style runs should set `E2E_DB_PASSWORD`,
`E2E_JWT_SECRET`, `E2E_RABBITMQ_URL` in Forgejo Actions secrets.
Runbook (docs/CI_E2E.md):
- Trigger / scope / target time table.
- Step-by-step explanation of what a CI run does.
- Required secrets + their fallbacks.
- "Reproducing a CI failure locally" — exact mirror of the workflow
invocation so a dev can rerun without pushing.
- "Debugging a red run" — where to look in the Forgejo UI, what the
artifacts contain, when to check SKIPPED_TESTS.md.
- "Adding a new E2E test" — fixture usage, when to tag @critical.
Action pin SHAs match the rest of the workflows (consistent supply-
chain hygiene). Go 1.25 (matches ci.yml backend job, NOT the older
1.24 used in the disabled accessibility.yml template).
Remaining Batch C item: C6 — flake stabilisation (~3-5 of the 22
SKIPPED_TESTS.md entries that look fixable). Defer to a follow-up
session — wiring the workflow first means the next push-to-main run
will tell us empirically which @critical tests are flaky in CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|