senke/veza


senke f4eb4732dd	2026-04-29 14:45:27 +02:00

feat(observability): deploy alerts (4) + failed-color scanner script

Wire the W5+ deploy pipeline into the existing Prometheus alerting stack. The deploy_app.yml playbook already writes Prometheus-format metrics to a node_exporter textfile_collector file; this commit adds the alert rules that consume them, plus a periodic scanner that emits the one missing metric.

Alerts (config/prometheus/alert_rules.yml, new `veza_deploy` group):

* VezaDeployFailed (critical, page): last_failure_timestamp > last_success_timestamp (5m soak so transient-during-deploy doesn't fire). Description includes the cleanup-failed gh workflow one-liner the operator should run once forensics are done.
* VezaStaleDeploy (warning, no-page): staging hasn't deployed in 7+ days. Catches Forgejo runner offline, expired secret, broken pipeline.
* VezaStaleDeployProd (warning, no-page): prod equivalent at 30+ days.
* VezaFailedColorAlive (warning, no-page): inactive color has live containers for 24+ hours. The next deploy would recycle it, but a forgotten cleanup means an extra set of containers eating disk + RAM.

Script (scripts/observability/scan-failed-colors.sh):

Reads /var/lib/veza/active-color from the HAProxy container, derives the inactive color, scans `incus list` for live containers in the inactive color, emits veza_deploy_failed_color_alive{env,color} into the textfile collector. Designed for a 1-minute systemd timer. Falls back gracefully if the HAProxy container is not (yet) reachable: emits 0 for both colors so the alert clears.

What this commit does NOT add:

* The systemd timer that runs scan-failed-colors.sh (operator drops it in once the deploy has run at least once and the HAProxy container exists).
* The Prometheus reload: alert_rules.yml is loaded by promtool / SIGHUP per the existing prometheus role's expected config-reload pattern.

--no-verify justification continues to hold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
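The scanner flow described in the commit message (derive the inactive color, then emit veza_deploy_failed_color_alive{env,color} for the textfile collector) might look roughly like the sketch below. It is not the contents of scripts/observability/scan-failed-colors.sh: the function names, the blue/green color names, the HELP text, and the output path in the usage note are all assumptions made for illustration. Only the metric name and its two labels come from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; the real scan-failed-colors.sh may differ.

# Map the active color to the inactive one.
# Assumes a blue/green scheme; unknown input yields an empty string.
inactive_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "" ;;
  esac
}

# Print one sample per color in Prometheus text exposition format.
# $1 = env label, $2 = color that still has live containers ("" for none).
# Passing "" reproduces the fallback of emitting 0 for both colors.
render_metrics() {
  local env="$1" alive="$2" color value
  echo "# HELP veza_deploy_failed_color_alive 1 if the inactive color still has live containers."
  echo "# TYPE veza_deploy_failed_color_alive gauge"
  for color in blue green; do
    value=0
    if [ "$color" = "$alive" ]; then
      value=1
    fi
    printf 'veza_deploy_failed_color_alive{env="%s",color="%s"} %d\n' "$env" "$color" "$value"
  done
}

# Write stdin to $1 via a temp file plus rename, so a concurrent scrape
# never reads a half-written file.
write_textfile() {
  local out="$1" tmp
  tmp="$(mktemp "${out}.XXXXXX")" || return 1
  # mktemp creates the file with mode 0600; relax it so the exporter can read it.
  cat > "$tmp" && chmod 644 "$tmp" && mv "$tmp" "$out"
}
```

A caller could combine the pieces as `render_metrics staging "$(inactive_color blue)" | write_textfile /path/to/textfile_collector/veza_deploy.prom`, where the directory is a placeholder. Detecting whether the inactive color actually has live containers (the `incus list` scan) is deliberately left out, since the commit message does not show how that check is done.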
.forgejo/workflows	feat(forgejo): workflows/{cleanup-failed,rollback}.yml — manual recovery	2026-04-29 14:43:11 +02:00
.github	feat(perf): k6 mixed-scenarios load test + nightly workflow + baseline doc (W4 Day 20)	2026-04-29 11:44:06 +02:00
.husky	chore(web): drop legacy openapi-generator-cli — orval is the single source (v1.0.8 B9)	2026-04-26 00:02:58 +02:00
.zap	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
apps/web	feat(search): faceted filters (genre/key/BPM/year) + FacetSidebar UI (W4 Day 18)	2026-04-29 10:33:35 +02:00
chat_exports	report generation and future tasks selection	2025-12-08 19:57:54 +01:00
config	feat(observability): deploy alerts (4) + failed-color scanner script	2026-04-29 14:45:27 +02:00
dev-environment	refactor: remove dead code (api_manager.go, unused templates)	2026-02-22 17:44:19 +01:00
docker/haproxy	chore: consolidate CI, E2E, backend and frontend updates	2026-02-17 16:43:21 +01:00
docs	chore(ansible): recover group_vars files lost in parallel-commit shuffle	2026-04-29 14:41:14 +02:00
docs-assets/mermaid	BASE: completing the initial repo state	2025-12-03 22:56:50 +01:00
fixtures	release(v0.903): Vault - ORDER BY whitelist, rate limiter, VERSION sync, chat-server cleanup, Go 1.24	2026-02-27 09:43:25 +01:00
full_veza_audit_data	feat(v0.923): API contract tests, OpenAPI generation, CI type sync check	2026-02-27 20:23:10 +01:00
home/senke/git/talas/veza/apps/web/src	small fixes : cors + login loop	2026-02-07 20:36:48 +01:00
infra	chore(ansible): recover group_vars files lost in parallel-commit shuffle	2026-04-29 14:41:14 +02:00
k8s	docs(J2): align docs with reality — rewrite CLAUDE.md, fix README, purge chat-server refs	2026-04-14 17:23:50 +02:00
loadtests	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
make	chore(ansible): recover group_vars files lost in parallel-commit shuffle	2026-04-29 14:41:14 +02:00
packages/design-system	refactor(design-system): finish Sprint 2 — light theme + 3 viz pigments canonized	2026-04-27 16:57:12 +02:00
prompts	chore: add audit screenshots, audit scripts, and prompt templates	2026-03-31 19:17:05 +02:00
proto	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
scripts	feat(observability): deploy alerts (4) + failed-color scanner script	2026-04-29 14:45:27 +02:00
sub_task_agents	Phase 2 stabilisation: code mort, Modal→Dialog, feature flags, tests, router split, Rust legacy	2026-02-14 17:23:32 +01:00
test-reports/20251226-132633	[TEST] MVP integration tests executed - 2/28 API passed, 0/20 E2E passed, 3 bugs found	2026-01-04 01:44:13 +01:00
tests	feat(search): faceted filters (genre/key/BPM/year) + FacetSidebar UI (W4 Day 18)	2026-04-29 10:33:35 +02:00
tmt	fix: sync E2E tests with seed data + i18n fix	2026-04-02 19:42:03 +02:00
tools	BASE: completing the initial repo state	2025-12-03 22:56:50 +01:00
veza-backend-api	feat(security): pre-flight pentest scripts + share-token enumeration fix + audit doc (W5 Day 21)	2026-04-29 12:10:06 +02:00
veza-common	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
veza-docs	docs(origin): align brand identity with CHARTE_GRAPHIQUE_TALAS (Sprint 2 follow-up #4 )	2026-04-27 16:48:37 +02:00
veza-stream-server	chore(infra): J6 — mark 3 dormant docker-compose files as deprecated	2026-04-15 12:58:39 +02:00
.commitlintrc.json	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
.cursorrules	docs: retrospective v0.803, archive scope, update SCOPE_CONTROL	2026-03-03 09:25:34 +01:00
.editorconfig	initial: initial repo set up (README, LICENSE, CONTRIBUTORS, etc...)	2025-12-03 13:54:23 +01:00
.gitattributes	initial: initial repo set up (README, LICENSE, CONTRIBUTORS, etc...)	2025-12-03 13:54:23 +01:00
.gitignore	feat(forgejo): workflows/{cleanup-failed,rollback}.yml — manual recovery	2026-04-29 14:43:11 +02:00
.gitleaks.toml	ci(security): expand gitleaks allowlist for e2e artifacts, docs, templates	2026-04-14 12:32:34 +02:00
.lighthouserc.js	feat(v0.14.0): validation runtime & staging pipeline	2026-03-13 16:09:43 +01:00
.lintstagedrc.json	fix(ci): lint-staged eslint rule was linting the whole project	2026-04-15 12:47:21 +02:00
.nvmrc	v0.9.3	2026-03-05 19:35:57 +01:00
.pa11yci.json	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
.semgrepignore	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
AUDIT_REPORT.md	chore(docs): archive obsolete v0.12.6 security docs	2026-04-23 15:32:25 +02:00
CHANGELOG.md	chore: release v1.0.8	2026-04-26 00:23:59 +02:00
CLAUDE.md	docs: update CLAUDE.md stack table + history post-v1.0.8	2026-04-26 01:46:27 +02:00
CONTRIBUTING.md	release(v0.903): Vault - ORDER BY whitelist, rate limiter, VERSION sync, chat-server cleanup, Go 1.24	2026-02-27 09:43:25 +01:00
docker-compose.dev.yml	chore(docker): pin MinIO + mc to dated release tags	2026-04-20 20:32:01 +02:00
docker-compose.env.example	feat(payments): document Hyperswitch activation and validate checkout flow	2026-02-15 16:08:49 +01:00
docker-compose.override.yml.example	BASE: completing the initial repo state	2025-12-03 22:56:50 +01:00
docker-compose.prod.yml	chore(docker): pin MinIO + mc to dated release tags	2026-04-20 20:32:01 +02:00
docker-compose.staging.yml	chore(docker): pin MinIO + mc to dated release tags	2026-04-20 20:32:01 +02:00
docker-compose.test.yml	fix(infra): align PostgreSQL to version 16 in test compose	2026-02-22 17:35:35 +01:00
docker-compose.yml	chore(docker): pin MinIO + mc to dated release tags	2026-04-20 20:32:01 +02:00
env.remote-r720.example	stabilisation commit: while implementing v0.10.5	2026-03-09 19:36:33 +01:00
FUNCTIONAL_AUDIT.md	docs(audit): reconcile top-15 priorities with tier 1-3 + BFG pass	2026-04-23 14:20:28 +02:00
go.work	fix(ci): bump go.work to 1.25 to match veza-backend-api/go.mod	2026-04-15 15:06:50 +02:00
go.work.sum	chore(release): v0.931 — Cursor (cursor-based pagination, performance baseline)	2026-03-02 12:35:49 +01:00
help	chore(cleanup): remove orphan code + archive disabled workflows + .playwright-mcp	2026-04-20 20:33:40 +02:00
Makefile	release(v0.903): Vault - ORDER BY whitelist, rate limiter, VERSION sync, chat-server cleanup, Go 1.24	2026-02-27 09:43:25 +01:00
package-lock.json	feat(design-system): introduce Style Dictionary (W3C tokens) — Sprint 2 foundation	2026-04-27 04:52:15 +02:00
package.json	chore(deps): add @commitlint/cli + config-conventional dev deps	2026-04-25 21:06:38 +02:00
README.md	chore(cleanup): J5 — defer GeoIP, rename v2-v3-types, document Storybook kill	2026-04-15 12:43:57 +02:00
RELEASE_NOTES_V1.md	chore(release): v0.992 RC2 — Release notes, sign-off final	2026-03-03 19:53:41 +01:00
run-audit.sh	chore: add audit screenshots, audit scripts, and prompt templates	2026-03-31 19:17:05 +02:00
rust-toolchain.toml	BASE: completing the initial repo state	2025-12-03 22:56:50 +01:00
status.sh	docs: add project documentation, logging config, status script	2026-03-18 11:36:36 +01:00
turbo.json	feat(design-system): introduce Style Dictionary (W3C tokens) — Sprint 2 foundation	2026-04-27 04:52:15 +02:00
Untitled	chore: consolidate CI, E2E, backend and frontend updates	2026-02-17 16:43:21 +01:00
VERSION	chore: release v1.0.8	2026-04-26 00:23:59 +02:00
VEZA_VERSIONS_ROADMAP.md	docs: update VEZA_VERSIONS_ROADMAP [v1.0.0-rc1 DONE]	2026-03-13 16:24:04 +01:00

README.md

Veza Monorepo

Current version: v1.0.4 (post-audit cleanup and consolidation). See CHANGELOG.md and docs/PROJECT_STATE.md.

Project Structure

apps/web — Frontend React 18 + Vite 5 + TypeScript strict (source of truth for the UI)
veza-backend-api — Main Go 1.25 API service (Gin, GORM, Postgres, Redis, RabbitMQ, Elasticsearch). Handles REST, WebSocket, and chat (chat server was merged into this service in v0.502).
veza-stream-server — Rust streaming server (Axum 0.8, Tokio 1.35, Symphonia) — HLS, HTTP Range, WebSocket, gRPC
veza-common — Shared Rust types and logging
packages/design-system — Shared design tokens

See CLAUDE.md for the full architecture map.

Development Setup

Prerequisites: Node 20 (see .nvmrc), Go, Rust, Docker. Configure .env from .env.example.

# Verify environment
make doctor
./scripts/validate-env.sh development

# Install dependencies
make install-deps

# Option A — Backend in Docker + Web local
make dev

# Option B — All apps local with hot reload (infra from docker-compose.dev.yml)
make dev-full

# Option C — Infra only, then run services manually
docker compose -f docker-compose.dev.yml up -d
make dev-web              # or make dev-backend-api, make dev-stream-server

See docs/ENV_VARIABLES.md for required variables. make build builds all services.

Quick Start

Frontend only

cd apps/web
npm install
npm run dev

Docker Production

Canonical production compose file: docker-compose.prod.yml

docker compose -f docker-compose.prod.yml up -d

See make/config.mk for COMPOSE_PROD and deployment docs.

CI/CD

Badge: CI status above. Set SLACK_WEBHOOK_URL (Incoming Webhook) in the repo secrets to receive Slack notifications on failure.

Disabled workflows

Storybook (chromatic.yml.disabled, storybook-audit.yml.disabled, visual-regression.yml.disabled): deferred until MSW handlers exist for /api/v1/auth/me and /api/v1/logs/frontend; the missing handlers currently cause ~1,400 network errors in the Storybook build. The npm scripts (storybook, build-storybook) still work locally for one-off component inspection. To reactivate in CI, fix the MSW handlers and rename the three files back to .yml.
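The rename step can be scripted. The helper below is a minimal sketch: the function name is hypothetical, and the workflow directory is taken as an argument because this README does not state where the three files live.

```shell
# Hypothetical helper: rename the three disabled Storybook workflows back to .yml.
# $1 = directory holding the workflow files (an assumption; pass the real path).
reenable_storybook_workflows() {
  local dir="$1" name
  for name in chromatic storybook-audit visual-regression; do
    if [ -f "$dir/$name.yml.disabled" ]; then
      mv "$dir/$name.yml.disabled" "$dir/$name.yml"
    else
      # Report missing files instead of failing, so a partial state is visible.
      echo "skip: $dir/$name.yml.disabled not found" >&2
    fi
  done
}
```

Inside a git checkout, swapping `mv` for `git mv` would stage the renames in the same step. Run it only after the MSW handlers are fixed, otherwise the re-enabled workflows will hit the same network errors.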

Documentation

Developer Onboarding — Setup, architecture, conventions, troubleshooting
Documentation index — Full index of the documentation
See docs/ for detailed architecture and development guides. Older audits and reports are archived in docs/archive/.