senke/veza

senke bf31a91ae6 feat(infra): pgbackrest role + dr-drill + Prometheus backup alerts (W2 Day 8)		2026-04-28 00:51:00 +02:00

ROADMAP_V1.0_LAUNCH.md §Semaine 2 day 8 deliverable:
  - Postgres backups land in MinIO via pgbackrest
  - dr-drill restores them weekly into an ephemeral Incus container and asserts the data round-trips
  - Prometheus alerts fire when the drill fails OR when the timer has stopped firing for >8 days

Cadence:
  full  — weekly (Sun 02:00 UTC, systemd timer)
  diff  — daily (Mon-Sat 02:00 UTC, systemd timer)
  WAL   — continuous (postgres archive_command, archive_timeout=60s)
  drill — weekly (Sun 04:00 UTC — runs 2h after the Sun full so the restore exercises fresh data)

RPO ≈ 1 min (archive_timeout). RTO ≤ 30 min (drill measures actual restore wall-clock).

Files:
  infra/ansible/roles/pgbackrest/
    defaults/main.yml — repo1-* config (MinIO/S3, path-style, aes-256-cbc encryption, vault-backed creds), retention 4 full / 7 diff / 4 archive cycles, zstd@3 compression. The role's first task asserts the placeholder secrets are gone — refuses to apply until the vault carries real keys.
    tasks/main.yml — install pgbackrest, render /etc/pgbackrest/pgbackrest.conf, set archive_command on the postgres instance via ALTER SYSTEM, detect role at runtime via `pg_autoctl show state --json`, stanza-create from primary only, render + enable systemd timers (full + diff + drill).
    templates/pgbackrest.conf.j2 — global + per-stanza sections; pg1-path defaults to the pg_auto_failover state dir so the role plugs straight into the Day 6 formation.
    templates/pgbackrest-{full,diff,drill}.{service,timer}.j2 — systemd units. Backup services run as `postgres`, drill service runs as `root` (needs `incus`). RandomizedDelaySec on every timer to absorb clock skew + node collision risk.
    README.md — RPO/RTO guarantees, vault setup, repo wiring, operational cheatsheet (info / check / manual backup), restore procedure documented separately as the dr-drill.

  scripts/dr-drill.sh
    Acceptance script for the day. Sequence:
      0. pre-flight: required tools, latest backup metadata visible
      1. launch ephemeral `pg-restore-drill` Incus container
      2. install postgres + pgbackrest inside, push the SAME pgbackrest.conf as the host (read-only against the bucket by pgbackrest semantics — the same s3 keys get reused so the drill exercises the production credential path)
      3. `pgbackrest restore` — full + WAL replay
      4. start postgres, wait for pg_isready
      5. smoke query: SELECT count(*) FROM users — must be ≥ MIN_USERS_EXPECTED
      6. write veza_backup_drill_* metrics to the textfile-collector
      7. teardown (or --keep for postmortem inspection)
    Exit codes 0/1/2 (pass / drill failure / env problem) so a Prometheus runner can plug in directly.

  config/prometheus/alert_rules.yml — new `veza_backup` group:
    - BackupRestoreDrillFailed (critical, 5m): the last drill reported success=0. Pages because a backup we haven't proved restorable is technical debt waiting for a disaster.
    - BackupRestoreDrillStale (warning, 1h after >8 days): the drill timer has stopped firing. Catches a broken cron / unit / runner before the failure-mode alert above ever sees data.
    Both annotations include a runbook_url stub (veza.fr/runbooks/...) — those land alongside W2 day 10's SLO runbook batch.

  infra/ansible/playbooks/postgres_ha.yml
    Two new plays:
      6. apply pgbackrest role to postgres_ha_nodes (install + config + full/diff timers on every data node; pgbackrest's repo lock arbitrates collision)
      7. install dr-drill on the incus_hosts group (push /usr/local/bin/dr-drill.sh + render drill timer + ensure /var/lib/node_exporter/textfile_collector exists)

Acceptance verified locally:
  $ ansible-playbook -i inventory/lab.yml playbooks/postgres_ha.yml \
      --syntax-check
  playbook: playbooks/postgres_ha.yml   ← clean
  $ python3 -c "import yaml; yaml.safe_load(open('config/prometheus/alert_rules.yml'))"
  YAML OK
  $ bash -n scripts/dr-drill.sh
  syntax OK

Real apply + drill needs the lab R720 + a populated MinIO bucket + the secrets in vault — operator's call.

Out of scope (deferred per ROADMAP §2):
  - Off-site backup replica (B2 / Bunny.net) — v1.1+
  - Logical export pipeline for GDPR (RGPD) per-user dumps — separate feature track, not a backup-system concern
  - PITR admin UI — CLI-only via `--type=time` for v1.0
  - pgbackrest_exporter Prometheus integration — W2 day 9 alongside the OTel collector

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
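
The smoke-query check, the metrics write, and the 0/1/2 exit-code convention described in the commit message could look roughly like the sketch below. It is not the contents of scripts/dr-drill.sh. The function names, the metric names after the `veza_backup_drill_` prefix, and the output file name `veza_backup_drill.prom` are assumptions made for illustration; only the prefix, MIN_USERS_EXPECTED, the exit-code meanings, and the textfile-collector directory come from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real scripts/dr-drill.sh may differ.

# Directory scraped by node_exporter's textfile collector (path from the commit message).
TEXTFILE_DIR="${TEXTFILE_DIR:-/var/lib/node_exporter/textfile_collector}"

# evaluate_drill USER_COUNT MIN_USERS
# Returns 0 (pass), 1 (drill failure) or 2 (environment problem: non-numeric input).
evaluate_drill() {
  local v
  for v in "$1" "$2"; do
    case "$v" in
      ''|*[!0-9]*) return 2 ;;
    esac
  done
  if [ "$1" -ge "$2" ]; then
    return 0
  fi
  return 1
}

# write_drill_metrics EXIT_CODE
# Writes to a temp file and renames it, so a scrape never sees a partial file.
# The collector only reads *.prom files, so the temp file itself is ignored.
write_drill_metrics() {
  local success=0
  local tmp
  if [ "$1" -eq 0 ]; then
    success=1
  fi
  tmp="$(mktemp "$TEXTFILE_DIR/.veza_backup_drill.XXXXXX")" || return 2
  {
    # Metric names below are assumed; only the veza_backup_drill_ prefix is documented.
    echo "veza_backup_drill_success $success"
    echo "veza_backup_drill_last_run_timestamp_seconds $(date +%s)"
  } > "$tmp"
  chmod 644 "$tmp"   # mktemp creates 0600; the exporter may run as another user
  mv "$tmp" "$TEXTFILE_DIR/veza_backup_drill.prom"
}

# Usage (hypothetical): $count would hold the result of the smoke query
# run inside the restored container.
#   evaluate_drill "$count" "${MIN_USERS_EXPECTED:-1}"; rc=$?
#   write_drill_metrics "$rc"
#   exit "$rc"
```

Recording the timestamp on every run, pass or fail, is what would let a staleness alert distinguish "the drill failed" from "the drill never ran".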
.github	fix(e2e): unblock @critical green slate for v1.0.9 tag (Day 4 triage)	2026-04-27 16:18:56 +02:00
.husky	chore(web): drop legacy openapi-generator-cli — orval is the single source (v1.0.8 B9)	2026-04-26 00:02:58 +02:00
.zap	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
apps/web	feat(branding): scaffold Logo component + Sumi icons + brand assets pipeline (Sprint 3)	2026-04-27 17:08:17 +02:00
chat_exports	report generation and future tasks selection	2025-12-08 19:57:54 +01:00
config	feat(infra): pgbackrest role + dr-drill + Prometheus backup alerts (W2 Day 8)	2026-04-28 00:51:00 +02:00
dev-environment	refactor: remove dead code (api_manager.go, unused templates)	2026-02-22 17:44:19 +01:00
docker/haproxy	chore: consolidate CI, E2E, backend and frontend updates	2026-02-17 16:43:21 +01:00
docs	docs(roadmap): add v1.0 → v2.0.0-public launch roadmap (6 weeks)	2026-04-26 23:50:07 +02:00
docs-assets/mermaid	BASE: completing the initial repo state	2025-12-03 22:56:50 +01:00
fixtures	release(v0.903): Vault - ORDER BY whitelist, rate limiter, VERSION sync, chat-server cleanup, Go 1.24	2026-02-27 09:43:25 +01:00
full_veza_audit_data	feat(v0.923): API contract tests, OpenAPI generation, CI type sync check	2026-02-27 20:23:10 +01:00
home/senke/git/talas/veza/apps/web/src	small fixes : cors + login loop	2026-02-07 20:36:48 +01:00
infra	feat(infra): pgbackrest role + dr-drill + Prometheus backup alerts (W2 Day 8)	2026-04-28 00:51:00 +02:00
k8s	docs(J2): align docs with reality — rewrite CLAUDE.md, fix README, purge chat-server refs	2026-04-14 17:23:50 +02:00
loadtests	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
make	feat(ci): enforce OpenAPI type sync — drift prevention (v1.0.8 P0)	2026-04-23 20:33:13 +02:00
packages/design-system	refactor(design-system): finish Sprint 2 — light theme + 3 viz pigments canonized	2026-04-27 16:57:12 +02:00
prompts	chore: add audit screenshots, audit scripts, and prompt templates	2026-03-31 19:17:05 +02:00
proto	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
scripts	feat(infra): pgbackrest role + dr-drill + Prometheus backup alerts (W2 Day 8)	2026-04-28 00:51:00 +02:00
sub_task_agents	Phase 2 stabilisation: dead code, Modal→Dialog, feature flags, tests, router split, Rust legacy	2026-02-14 17:23:32 +01:00
test-reports/20251226-132633	[TEST] MVP integration tests executed - 2/28 API passed, 0/20 E2E passed, 3 bugs found	2026-01-04 01:44:13 +01:00
tests	fix(e2e): triage @critical batch 2 — chat WS proxy + FeedPage debt (Day 4)	2026-04-27 16:55:15 +02:00
tmt	fix: sync E2E tests with seed data + i18n fix	2026-04-02 19:42:03 +02:00
tools	BASE: completing the initial repo state	2025-12-03 22:56:50 +01:00
veza-backend-api	feat(infra): pgbouncer role + pgbench load test (W2 Day 7)	2026-04-27 18:35:05 +02:00
veza-common	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
veza-docs	docs(origin): align brand identity with CHARTE_GRAPHIQUE_TALAS (Sprint 2 follow-up #4 )	2026-04-27 16:48:37 +02:00
veza-stream-server	chore(infra): J6 — mark 3 dormant docker-compose files as deprecated	2026-04-15 12:58:39 +02:00
.commitlintrc.json	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
.cursorrules	docs: retrospective v0.803, archive scope, update SCOPE_CONTROL	2026-03-03 09:25:34 +01:00
.editorconfig	initial: initial repo set up (README, LICENSE, CONTRIBUTORS, etc...)	2025-12-03 13:54:23 +01:00
.gitattributes	initial: initial repo set up (README, LICENSE, CONTRIBUTORS, etc...)	2025-12-03 13:54:23 +01:00
.gitignore	chore(web): install orval + mutator for OpenAPI code generation (v1.0.8 P1)	2026-04-24 00:18:14 +02:00
.gitleaks.toml	ci(security): expand gitleaks allowlist for e2e artifacts, docs, templates	2026-04-14 12:32:34 +02:00
.lighthouserc.js	feat(v0.14.0): validation runtime & staging pipeline	2026-03-13 16:09:43 +01:00
.lintstagedrc.json	fix(ci): lint-staged eslint rule was linting the whole project	2026-04-15 12:47:21 +02:00
.nvmrc	v0.9.3	2026-03-05 19:35:57 +01:00
.pa11yci.json	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
.semgrepignore	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
AUDIT_REPORT.md	chore(docs): archive obsolete v0.12.6 security docs	2026-04-23 15:32:25 +02:00
CHANGELOG.md	chore: release v1.0.8	2026-04-26 00:23:59 +02:00
CLAUDE.md	docs: update CLAUDE.md stack table + history post-v1.0.8	2026-04-26 01:46:27 +02:00
CONTRIBUTING.md	release(v0.903): Vault - ORDER BY whitelist, rate limiter, VERSION sync, chat-server cleanup, Go 1.24	2026-02-27 09:43:25 +01:00
docker-compose.dev.yml	chore(docker): pin MinIO + mc to dated release tags	2026-04-20 20:32:01 +02:00
docker-compose.env.example	feat(payments): document Hyperswitch activation and validate checkout flow	2026-02-15 16:08:49 +01:00
docker-compose.override.yml.example	BASE: completing the initial repo state	2025-12-03 22:56:50 +01:00
docker-compose.prod.yml	chore(docker): pin MinIO + mc to dated release tags	2026-04-20 20:32:01 +02:00
docker-compose.staging.yml	chore(docker): pin MinIO + mc to dated release tags	2026-04-20 20:32:01 +02:00
docker-compose.test.yml	fix(infra): align PostgreSQL to version 16 in test compose	2026-02-22 17:35:35 +01:00
docker-compose.yml	chore(docker): pin MinIO + mc to dated release tags	2026-04-20 20:32:01 +02:00
env.remote-r720.example	stabilisation commit: while implementing v0.10.5	2026-03-09 19:36:33 +01:00
FUNCTIONAL_AUDIT.md	docs(audit): reconcile top-15 priorities with tier 1-3 + BFG pass	2026-04-23 14:20:28 +02:00
go.work	fix(ci): bump go.work to 1.25 to match veza-backend-api/go.mod	2026-04-15 15:06:50 +02:00
go.work.sum	chore(release): v0.931 — Cursor (cursor-based pagination, performance baseline)	2026-03-02 12:35:49 +01:00
help	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
Makefile	release(v0.903): Vault - ORDER BY whitelist, rate limiter, VERSION sync, chat-server cleanup, Go 1.24	2026-02-27 09:43:25 +01:00
package-lock.json	feat(design-system): introduce Style Dictionary (W3C tokens) — Sprint 2 foundation	2026-04-27 04:52:15 +02:00
package.json	chore(deps): add @commitlint/cli + config-conventional dev deps	2026-04-25 21:06:38 +02:00
README.md	chore(cleanup): J5 — defer GeoIP, rename v2-v3-types, document Storybook kill	2026-04-15 12:43:57 +02:00
RELEASE_NOTES_V1.md	chore(release): v0.992 RC2 — Release notes, sign-off final	2026-03-03 19:53:41 +01:00
run-audit.sh	chore: add audit screenshots, audit scripts, and prompt templates	2026-03-31 19:17:05 +02:00
rust-toolchain.toml	BASE: completing the initial repo state	2025-12-03 22:56:50 +01:00
status.sh	docs: add project documentation, logging config, status script	2026-03-18 11:36:36 +01:00
turbo.json	feat(design-system): introduce Style Dictionary (W3C tokens) — Sprint 2 foundation	2026-04-27 04:52:15 +02:00
Untitled	chore: consolidate CI, E2E, backend and frontend updates	2026-02-17 16:43:21 +01:00
VERSION	chore: release v1.0.8	2026-04-26 00:23:59 +02:00
VEZA_VERSIONS_ROADMAP.md	docs: update VEZA_VERSIONS_ROADMAP [v1.0.0-rc1 DONE]	2026-03-13 16:24:04 +01:00

README.md

Veza Monorepo

Current version: v1.0.4 (post-audit cleanup and consolidation). See CHANGELOG.md and docs/PROJECT_STATE.md.

Project Structure

apps/web — Frontend React 18 + Vite 5 + TypeScript strict (source of truth for the UI)
veza-backend-api — Main Go 1.25 API service (Gin, GORM, Postgres, Redis, RabbitMQ, Elasticsearch). Handles REST, WebSocket, and chat (chat server was merged into this service in v0.502).
veza-stream-server — Rust streaming server (Axum 0.8, Tokio 1.35, Symphonia) — HLS, HTTP Range, WebSocket, gRPC
veza-common — Shared Rust types and logging
packages/design-system — Shared design tokens

See CLAUDE.md for the full architecture map.

Development Setup

Prerequisites: Node 20 (see .nvmrc), Go, Rust, Docker. Configure .env from .env.example.

# Verify environment
make doctor
./scripts/validate-env.sh development

# Install dependencies
make install-deps

# Option A — Backend in Docker + Web local
make dev

# Option B — All apps local with hot reload (infra from docker-compose.dev.yml)
make dev-full

# Option C — Infra only, then run services manually
docker compose -f docker-compose.dev.yml up -d
make dev-web              # or make dev-backend-api, make dev-stream-server

See docs/ENV_VARIABLES.md for required variables. make build builds all services.
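
Before running the make targets above, it can help to confirm that the prerequisites are actually on PATH. The sketch below is not part of the repository and does not replace make doctor. The helper names `missing_tools` and `node_major_matches` are hypothetical, and the binary names in the usage comment (`node`, `go`, `cargo`, `docker`, `make`) are the usual CLI names for the listed prerequisites, assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the Veza repository.

# missing_tools TOOL...
# Prints each tool that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# node_major_matches WANT [HAVE]
# Succeeds if the Node major version equals WANT.
# HAVE defaults to the output of `node --version` (e.g. "v20.11.1").
node_major_matches() {
  local want="$1"
  local have="${2:-$(node --version 2>/dev/null)}"
  have="${have#v}"              # strip the leading "v"
  [ "${have%%.*}" = "$want" ]   # compare only the major component
}

# Usage:
#   missing_tools node go cargo docker make
#   node_major_matches 20 || echo "Node 20 expected (see .nvmrc)" >&2
```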

Quick Start

Frontend only

cd apps/web
npm install
npm run dev

Docker Production

Canonical production compose file: docker-compose.prod.yml

docker compose -f docker-compose.prod.yml up -d

See make/config.mk for COMPOSE_PROD and deployment docs.

CI/CD

Badge: CI status above. Set SLACK_WEBHOOK_URL (Incoming Webhook) in repo secrets to receive Slack notifications on failure.

Disabled workflows

Storybook (chromatic.yml.disabled, storybook-audit.yml.disabled, visual-regression.yml.disabled): deferred until MSW is wired up for /api/v1/auth/me and /api/v1/logs/frontend. Without those handlers, the Storybook build currently produces ~1,400 network errors. The npm scripts (storybook, build-storybook) still work locally for one-off component inspection. To reactivate in CI, fix the MSW handlers and rename the three files back to .yml.
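
The rename step can be scripted. The sketch below is a hypothetical helper, not something shipped in the repository. The README does not say which directory holds the disabled files, so the directory is passed as an argument; `.github/workflows` in the usage comment is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the Veza repository.

# reenable_storybook_workflows DIR
# Renames the three *.yml.disabled workflow files in DIR back to *.yml.
# Files that are not present are reported on stderr and skipped.
reenable_storybook_workflows() {
  local dir="$1"
  local name
  for name in chromatic storybook-audit visual-regression; do
    if [ -f "$dir/$name.yml.disabled" ]; then
      mv "$dir/$name.yml.disabled" "$dir/$name.yml"
    else
      echo "skip: $dir/$name.yml.disabled not found" >&2
    fi
  done
}

# Usage (directory is an assumption):
#   reenable_storybook_workflows .github/workflows
```

Inside a git checkout, replacing `mv` with `git mv` would stage the renames in the same step.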

Documentation

Developer Onboarding — Setup, architecture, conventions, troubleshooting
Documentation index — Complete index of the documentation
See docs/ for detailed architecture and development guides. Older audits and reports are archived in docs/archive/.