# Performance Baseline — Veza API

Version: v0.951

Goal: document the P50/P95/P99 latencies of the critical endpoints so regressions can be detected.

## Methodology

1. Start the API in profiling mode: pprof is exposed when `ENABLE_PPROF=true`
2. Run a load test (k6 or Go) against the critical endpoints
3. Measure latencies via Prometheus (`http_request_duration_seconds`) or pprof

## Critical endpoints to monitor

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/auth/login | POST | User login |
| /api/v1/auth/register | POST | Sign-up |
| /api/v1/tracks | GET | Track listing (cursor pagination, v0.931) |
| /api/v1/tracks/search | GET | Search |
| /api/v1/users/me | GET | User profile |
| /api/v1/marketplace/orders | POST | Order creation |
| /api/v1/notifications | GET | Notifications |
| /api/v1/conversations | GET | Conversations |
| /api/v1/analytics/me | GET | Analytics |
| /health | GET | Health check |

## v1.0 targets (v0.951)

- P99 < 500 ms on every critical endpoint at 500 req/s (`stress_500rps.js`)
- 1000 WebSockets: connections stable for 5 minutes, delivery rate > 99% (`stress_1000ws.js`)
- 50 concurrent uploads: all successful, backpressure respected (`uploads.js`)
- GET /tracks: cursor-based pagination (v0.931) keeps performance constant regardless of page depth
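As an illustration of how these targets map onto k6 configuration, here is a minimal sketch. It is not the contents of the scripts listed below; the two WebSocket rates are the custom metrics named in their thresholds, and the placeholder request is only there to make the sketch runnable.

```javascript
// Sketch only — the real scripts live under loadtests/.
import http from 'k6/http';
import { Rate } from 'k6/metrics';

// Custom metrics matching the thresholds quoted for stress_1000ws.js.
export const wsConnectionFailures = new Rate('ws_connection_failures');
export const wsMessageFailures = new Rate('ws_message_failures');

export const options = {
  thresholds: {
    // API stress target: P99 < 500 ms at 500 req/s.
    http_req_duration: ['p(99)<500'],
    // WebSocket targets: < 1% failed connections / undelivered messages.
    ws_connection_failures: ['rate<0.01'],
    ws_message_failures: ['rate<0.01'],
  },
};

export default function () {
  // Placeholder request; the real scripts drive the endpoints listed above.
  http.get(`${__ENV.BASE_URL || 'http://localhost:8080'}/health`);
}
```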

## k6 scripts (v0.951)

| Script | Command | Thresholds |
|---|---|---|
| API stress, 500 VUs | `k6 run loadtests/backend/stress_500rps.js` | P99 < 500 ms (login, tracks, search, products) |
| WebSocket, 1000 connections | `k6 run loadtests/chat/stress_1000ws.js` | ws_connection_failures < 1%, ws_message_failures < 1% |
| Uploads, 50 concurrent | `k6 run loadtests/backend/uploads.js` | P95 < 5 s (simple), P95 < 8 s (chunked) |

See loadtests/README.md for the full run instructions.

## pprof command

```bash
# Profile for 30 s while a load test is running
go tool pprof -http=:8081 http://localhost:8080/debug/pprof/profile?seconds=30
```

## Prometheus metrics

The monitoring middlewares expose `http_request_duration_seconds` with the labels `method`, `path`, and `status`. Use histogram quantiles for P50/P95/P99, e.g. `histogram_quantile(0.95, sum by (le, path) (rate(http_request_duration_seconds_bucket[5m])))`.

## Lighthouse v0.982 (Frontend)

Goal: Performance ≥ 90, Accessibility ≥ 90, Best Practices ≥ 90 on the critical pages.

### Pages to audit

| Page | Route | Performance target | Accessibility target |
|---|---|---|---|
| Login | /login | ≥ 90 | ≥ 90 |
| Dashboard | /dashboard | ≥ 90 | ≥ 90 |
| Tracks | /library or /tracks | ≥ 90 | ≥ 90 |
| Marketplace | /marketplace | ≥ 90 | ≥ 90 |
| Search | /search | ≥ 90 | ≥ 90 |
| Profile | /profile | ≥ 90 | ≥ 90 |

### Audit procedure

```bash
# Prerequisite: the frontend app must be running (npm run dev, or build + preview)
npx lighthouse http://localhost:4173/ --view --output=html --output-path=./lighthouse-reports/home.html
npx lighthouse http://localhost:4173/login --view --output=html --output-path=./lighthouse-reports/login.html
# Repeat for each critical page
```

### Last audit

See config/incus/LIGHTHOUSE_AUDIT_REPORT.md for the latest report (2026-01-15). Accessibility 93, Best Practices 96: the v0.982 target is met on those criteria. Performance still has to be re-validated after the NO_LCP fixes.


## v1.0.2 results

Prerequisites: `docker compose up -d` with the backend, PostgreSQL, and Redis running.

### Load test fixes (v0.502)

- WebSocket load test: `CHAT_ORIGIN` now points at the backend (ws://localhost:8080), `WS_URL` = /api/v1/ws
- Files: loadtests/config.js, loadtests/chat/stress_1000ws.js, loadtests/chat/websocket.js

### Run commands

```bash
k6 run loadtests/backend/stress_500rps.js   # 500 req/s, P99 < 500 ms
k6 run loadtests/chat/stress_1000ws.js      # 1000 WebSockets, < 1% failures
k6 run loadtests/backend/uploads.js         # 50 uploads
```

### Results table (to be filled in after running on the target infra)

| Endpoint / Script | P50 | P95 | P99 | Failure rate |
|---|---|---|---|---|
| stress_500rps (login, tracks, search) | | | | |
| stress_1000ws | | | | |
| uploads | | | | |

## v1.0.9 W4 Day 20 — Mixed-scenarios nightly k6

Capacity gate before launch: sustain 1,650 concurrent VUs for 5 minutes on staging without breaking the global thresholds. Scheduled by .github/workflows/loadtest.yml at 02:30 UTC; the acceptance bar is three consecutive green nights before the launch goes hot.

### Scenarios

The scenarios run in parallel via the k6 `scenarios` block in scripts/loadtest/k6_mixed_scenarios.js. Each one uses `executor: 'constant-vus'` so the steady state is unambiguous (a sketch of the layout follows the table below).

| Scenario | VUs | Workload | Per-scenario p95 gate |
|---|---|---|---|
| upload | 100 | initiate + 10 × 1 MiB chunks (synthetic 10 MiB tracks) | global only |
| streaming | 500 | master.m3u8 → quality playlist → 4 segments loop | < 300 ms |
| browse | 1000 | search 60% / list 30% / detail 10% | < 400 ms |
| checkout | 50 | list products → POST orders (rejected at validation) | < 800 ms |

### Global thresholds (acceptance bar)

| Metric | Threshold | Reason |
|---|---|---|
| http_req_duration | p(95) < 500 ms | Roadmap §Day 20. |
| http_req_duration | p(99) < 1500 ms | Tail latency cap; catches one-off sync stalls. |
| http_req_failed | rate < 0.5% | Roadmap §Day 20. Looser per-scenario gates for upload + checkout (network + Hyperswitch). |
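In k6, the per-scenario gates from the Scenarios table can be layered on top of these global ones by keying thresholds on the `scenario` tag. A sketch under that assumption (combine it with the scenarios block sketched above in the same options object):

```javascript
// Sketch of the layered thresholds: a single-flow regression trips its own
// gate even when the global percentiles stay green.
import http from 'k6/http';

export const options = {
  thresholds: {
    // Global acceptance bar (roadmap §Day 20).
    http_req_duration: ['p(95)<500', 'p(99)<1500'],
    http_req_failed: ['rate<0.005'],
    // Per-scenario p95 gates, keyed on the scenario tag k6 adds to every request.
    'http_req_duration{scenario:streaming}': ['p(95)<300'],
    'http_req_duration{scenario:browse}': ['p(95)<400'],
    'http_req_duration{scenario:checkout}': ['p(95)<800'],
  },
};

export default function () {
  // Placeholder so the sketch runs standalone.
  http.get(`${__ENV.BASE_URL || 'http://localhost:8080'}/api/v1/health`);
}
```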

### How to run locally

```bash
# Against the lab haproxy (no auth required for browse/streaming):
k6 run scripts/loadtest/k6_mixed_scenarios.js \
  --env BASE_URL=http://haproxy.lxd \
  --env STREAM_TRACK_ID=<seed-uuid> \
  --env DURATION=2m \
  --env UPLOAD_VUS=10 --env STREAM_VUS=50 --env BROWSE_VUS=100 --env CHECKOUT_VUS=5

# Full nightly profile against staging:
USER_TOKEN=$(./scripts/issue-loadtest-token.sh) \
k6 run scripts/loadtest/k6_mixed_scenarios.js \
  --env BASE_URL=https://staging.veza.fr \
  --env STREAM_TRACK_ID=<seed-uuid> \
  --env USER_TOKEN="$USER_TOKEN"
```

### Operating notes

- Override the per-scenario VU counts with the UPLOAD_VUS, STREAM_VUS, BROWSE_VUS, CHECKOUT_VUS env vars to dial the load down for local runs.
- Staging-only. The workflow refuses to run against prod; the BASE_URL is set from vars.STAGING_BASE_URL (or the DEFAULT_BASE_URL env in the workflow) and never reads from a prod-shaped variable.
- Token rotation. STAGING_LOADTEST_TOKEN is a long-lived token bound to a dedicated loadtest@veza.music user with role=user (no admin powers). Rotate quarterly.
- Upload scenario approximation. The chunked endpoint expects multipart bodies; for load shaping we POST raw 1 MiB chunks with the upload-id header. The cost path (auth + rate limit + Redis state) is exercised correctly even though the resulting upload is rejected at the multipart parser. See the sketch after this list.
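A minimal sketch of that upload approximation. The endpoint paths, the header name, and the chunk count are hypothetical, chosen for illustration; they are not taken from the shipped script.

```javascript
// Sketch only: exercises auth + rate-limit + Redis state on the upload path;
// the upload itself is expected to be rejected at the multipart parser.
import http from 'k6/http';
import { check } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:8080';
const CHUNK = new Uint8Array(1024 * 1024).buffer; // raw 1 MiB body, not multipart
const AUTH = { Authorization: `Bearer ${__ENV.USER_TOKEN || ''}` };

export default function () {
  // Hypothetical paths and header name, for illustration only.
  const init = http.post(`${BASE_URL}/api/v1/tracks/upload/initiate`, null, { headers: AUTH });
  check(init, { 'initiate handled without 5xx': (r) => r.status < 500 });

  // Synthetic id; the real script would take it from the initiate response.
  const uploadId = `loadtest-${__VU}-${__ITER}`;
  for (let i = 0; i < 10; i++) {
    const res = http.post(`${BASE_URL}/api/v1/tracks/upload/chunk`, CHUNK, {
      headers: { ...AUTH, 'X-Upload-Id': uploadId, 'Content-Type': 'application/octet-stream' },
    });
    check(res, { 'chunk handled without 5xx': (r) => r.status < 500 });
  }
}
```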

### After-run dashboard

The Grafana dashboard Veza API Overview (config/grafana/dashboards/api-overview.json) carries the p95/p99 panels; set the timepicker to the k6 run window to compare. The k6 JSON summary uploaded as a workflow artifact carries the per-scenario breakdown that the dashboard can't show directly.
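If that JSON artifact is produced from inside the script rather than with k6's `--summary-export` flag, a `handleSummary` hook along these lines would emit it. This is a sketch; only the artifact file name is taken from the workflow description above.

```javascript
// Sketch: write the end-of-test summary to the k6-summary.json file that the
// nightly workflow uploads as an artifact. Note that defining handleSummary
// replaces the default stdout summary, so a real script would usually also
// return a 'stdout' entry.
export function handleSummary(data) {
  return {
    // data.metrics includes the per-scenario tagged sub-metrics that carry thresholds.
    'k6-summary.json': JSON.stringify(data, null, 2),
  };
}
```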

### Acceptance gate (W4 verification)

- 3 consecutive nightly runs green (no threshold violation).
- p95 < 500 ms on the global metric.
- Per-scenario gates met for every flow.

When the gate breaks, the workflow's "Annotate thresholds in summary" step writes the failing values to the GitHub Actions summary so the on-call can triage from a single page.