veza/veza-backend-api/internal/monitoring/metrics.go

package monitoring
import (
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
// Custom Prometheus metrics for the Veza application
var (
// HTTP Requests Metrics
HTTPRequestsTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "veza_http_requests_total",
Help: "Total number of HTTP requests",
},
[]string{"method", "endpoint", "status"},
)
HTTPRequestDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "veza_http_request_duration_seconds",
Help: "HTTP request duration in seconds",
Buckets: []float64{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0},
},
[]string{"method", "endpoint"},
)
// Authentication Metrics
AuthLoginAttempts = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "veza_auth_login_attempts_total",
Help: "Total number of login attempts",
},
[]string{"success"},
)
AuthSessionActive = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "veza_auth_sessions_active",
Help: "Number of active sessions",
},
)
// Database Metrics
DatabaseQueryDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "veza_database_query_duration_seconds",
Help: "Database query duration in seconds",
Buckets: []float64{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0},
},
[]string{"operation", "table"},
)
DatabaseConnectionsActive = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "veza_database_connections_active",
Help: "Number of active database connections",
},
)
DatabaseQueryErrors = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "veza_database_query_errors_total",
Help: "Total number of database query errors",
},
[]string{"operation", "error_type"},
)
// File Upload Metrics
FileUploadsTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "veza_file_uploads_total",
Help: "Total number of file uploads",
},
[]string{"type", "status"},
)
FileUploadSize = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "veza_file_upload_size_bytes",
Help: "File upload size in bytes",
Buckets: prometheus.ExponentialBuckets(1024, 2, 15), // 15 buckets, 1 KiB to 16 MiB
},
[]string{"type"},
)
// Rate Limiting Metrics
RateLimitHitsTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "veza_rate_limit_hits_total",
Help: "Total number of rate limit hits",
},
[]string{"endpoint", "limit_type"},
)
// Active Users Metrics
ActiveUsers = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "veza_active_users",
Help: "Number of active users",
},
)
// WebSocket Metrics
WebSocketConnectionsActive = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "veza_websocket_connections_active",
Help: "Number of active WebSocket connections",
},
)
WebSocketMessagesTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "veza_websocket_messages_total",
Help: "Total number of WebSocket messages",
},
[]string{"type", "status"},
)
// Cache Metrics moved to internal/metrics/cache_hit_rate.go in
// v1.0.9 W3 Day 13. Use metrics.RecordCacheHit / RecordCacheMiss.
// Error Metrics
ErrorsTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "veza_errors_total",
Help: "Total number of errors",
},
[]string{"type", "severity"},
)
// Health Check Metrics
HealthCheckDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "veza_health_check_duration_ms",
Help: "Health check duration in milliseconds",
Buckets: []float64{1, 5, 10, 25, 50, 100, 250, 500, 1000},
},
[]string{"service"}, // database, redis, chat_server, stream_server
)
HealthCheckStatus = promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "veza_health_check_status",
Help: "Health check status (1=ok, 0.5=slow, 0=error)",
},
[]string{"service"},
)
// MOD-P2-003: Business Metrics
TracksUploadedTotal = promauto.NewCounter(
prometheus.CounterOpts{
Name: "veza_tracks_uploaded_total",
Help: "Total number of tracks uploaded",
},
)
// v0.602 INF2: Commerce Metrics
CommerceOrdersTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "veza_commerce_orders_total",
Help: "Total number of commerce orders created",
},
[]string{"status"},
)
CommerceCheckoutDuration = promauto.NewHistogram(
prometheus.HistogramOpts{
Name: "veza_commerce_checkout_duration_seconds",
Help: "Checkout (cart to order) duration in seconds",
Buckets: []float64{0.1, 0.25, 0.5, 1.0, 2.0, 5.0},
},
)
UsersRegisteredTotal = promauto.NewCounter(
prometheus.CounterOpts{
Name: "veza_users_registered_total",
Help: "Total number of users registered",
},
)
PlaylistsCreatedTotal = promauto.NewCounter(
prometheus.CounterOpts{
Name: "veza_playlists_created_total",
Help: "Total number of playlists created",
},
)
UploadsFailedTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "veza_uploads_failed_total",
Help: "Total number of failed uploads",
},
[]string{"reason"}, // ClamAV, validation, quota, etc.
)
// v0.701: Transfer Retry Metrics
TransferRetryTotal = promauto.NewCounter(prometheus.CounterOpts{
Name: "veza_transfer_retry_total",
Help: "Total number of transfer retry attempts",
})
TransferRetrySuccess = promauto.NewCounter(prometheus.CounterOpts{
Name: "veza_transfer_retry_success_total",
Help: "Total number of successful transfer retries",
})
TransferRetryFailures = promauto.NewCounter(prometheus.CounterOpts{
Name: "veza_transfer_retry_failures_total",
Help: "Total number of failed transfer retries",
})
TransferRetryPermanent = promauto.NewCounter(prometheus.CounterOpts{
Name: "veza_transfer_retry_permanent_failures_total",
Help: "Total number of permanently failed transfers (max retries exceeded)",
})
// v0.703: Live Streaming Metrics
LiveStreamsActive = promauto.NewGauge(prometheus.GaugeOpts{
Name: "veza_live_streams_active",
Help: "Number of currently active live streams",
})
LiveStreamViewersTotal = promauto.NewGauge(prometheus.GaugeOpts{
Name: "veza_live_stream_viewers_total",
Help: "Total number of viewers across all active live streams",
})
)
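/*
The FileUploadSize histogram above takes its buckets from
prometheus.ExponentialBuckets(1024, 2, 15). As a rough check of the range this
covers, the sketch below recomputes the same series with the standard library
only. exponentialBuckets is a hypothetical stand-in written for illustration,
assuming the documented behaviour (count upper bounds, each one factor times
the previous); it is not the library function itself.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds, beginning at start and
// multiplying by factor at each step. Illustrative stand-in only.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	b := exponentialBuckets(1024, 2, 15)
	last := b[len(b)-1]
	// Prints: 15 buckets, from 1024 to 16777216 bytes (16 MiB)
	fmt.Printf("%d buckets, from %.0f to %.0f bytes (%.0f MiB)\n",
		len(b), b[0], last, last/(1024*1024))
}
```

Under that assumption the largest finite bucket is 16 MiB, so larger uploads
fall into the implicit +Inf bucket.
*/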
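// HTTPMetricsMiddleware records the request count and duration of an HTTP request.
func HTTPMetricsMiddleware(endpoint string, duration time.Duration, statusCode int, method string) {
// Status class as a digit: "2", "4", "5". string(rune(n)) alone would yield
// the control character with code point n, so the value is offset from '0'.
status := string(rune('0' + statusCode/100))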
HTTPRequestsTotal.WithLabelValues(method, endpoint, status).Inc()
HTTPRequestDuration.WithLabelValues(method, endpoint).Observe(duration.Seconds())
}
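/*
The status label is meant to carry the status class digit ('2', '4', '5').
In Go, string(rune(n)) yields the character whose code point is n, so turning
a small integer into its digit needs an offset from '0' (or strconv.Itoa).
A minimal sketch of the difference; statusClass is a hypothetical helper, not
part of this package, and it assumes three-digit status codes.

```go
package main

import "fmt"

// statusClass returns the leading digit of a three-digit HTTP status code
// as a one-character string, e.g. 404 -> "4".
func statusClass(statusCode int) string {
	return string(rune('0' + statusCode/100))
}

func main() {
	// Converting the quotient directly gives a control character, not a digit.
	fmt.Printf("%q\n", string(rune(200/100))) // "\x02"
	fmt.Printf("%q\n", statusClass(200))      // "2"
	fmt.Printf("%q\n", statusClass(503))      // "5"
}
```
*/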
// RecordLoginAttempt records a login attempt.
func RecordLoginAttempt(success bool) {
status := "failure"
if success {
status = "success"
}
AuthLoginAttempts.WithLabelValues(status).Inc()
}
// UpdateActiveSessions sets the number of active sessions.
func UpdateActiveSessions(count int) {
AuthSessionActive.Set(float64(count))
}
// RecordDatabaseQuery records the duration of a database query.
func RecordDatabaseQuery(operation, table string, duration time.Duration) {
DatabaseQueryDuration.WithLabelValues(operation, table).Observe(duration.Seconds())
}
// RecordDatabaseError records a database error.
func RecordDatabaseError(operation, errorType string) {
DatabaseQueryErrors.WithLabelValues(operation, errorType).Inc()
}
// RecordFileUpload records a file upload and its size in bytes.
func RecordFileUpload(fileType, status string, sizeBytes int64) {
FileUploadsTotal.WithLabelValues(fileType, status).Inc()
FileUploadSize.WithLabelValues(fileType).Observe(float64(sizeBytes))
}
// RecordRateLimitHit records a rate limit hit.
func RecordRateLimitHit(endpoint, limitType string) {
RateLimitHitsTotal.WithLabelValues(endpoint, limitType).Inc()
}
// UpdateActiveUsers sets the number of active users.
func UpdateActiveUsers(count int) {
ActiveUsers.Set(float64(count))
}
// UpdateWebSocketConnections sets the number of active WebSocket connections.
func UpdateWebSocketConnections(count int) {
WebSocketConnectionsActive.Set(float64(count))
}
// RecordWebSocketMessage records a WebSocket message.
func RecordWebSocketMessage(messageType, status string) {
WebSocketMessagesTotal.WithLabelValues(messageType, status).Inc()
}
// RecordCacheHit / RecordCacheMiss moved to internal/metrics package
// in v1.0.9 W3 Day 13 — see internal/metrics/cache_hit_rate.go.
// RecordError records an error.
func RecordError(errorType, severity string) {
ErrorsTotal.WithLabelValues(errorType, severity).Inc()
}
// RecordHealthCheck records the duration and status of a health check.
func RecordHealthCheck(service string, durationMs float64, status string) {
HealthCheckDuration.WithLabelValues(service).Observe(durationMs)
// Convert the status to a numeric value for the gauge
var statusValue float64
switch status {
case "ok":
statusValue = 1.0
case "slow":
statusValue = 0.5
case "error":
statusValue = 0.0
default:
statusValue = 0.0
}
HealthCheckStatus.WithLabelValues(service).Set(statusValue)
}
// MOD-P2-003: Business Metrics Functions
// RecordTrackUploaded increments the counter of uploaded tracks.
func RecordTrackUploaded() {
TracksUploadedTotal.Inc()
}
// RecordUserRegistered increments the counter of registered users.
func RecordUserRegistered() {
UsersRegisteredTotal.Inc()
}
// RecordPlaylistCreated increments the counter of created playlists.
func RecordPlaylistCreated() {
PlaylistsCreatedTotal.Inc()
}
// RecordUploadFailed records a failed upload along with its reason.
func RecordUploadFailed(reason string) {
UploadsFailedTotal.WithLabelValues(reason).Inc()
}
// v0.701: Transfer Retry Metrics
func RecordTransferRetry() { TransferRetryTotal.Inc() }
func RecordTransferRetrySuccess() { TransferRetrySuccess.Inc() }
func RecordTransferRetryFailure() { TransferRetryFailures.Inc() }
func RecordTransferRetryPermanent() { TransferRetryPermanent.Inc() }