Two complementary signals: pool-side (do we have enough connections
for the load?) and per-request (does any single handler quietly run
hundreds of queries?). Both feed Prometheus, Grafana, and the alert
rules.
Pool stats exporter (internal/database/pool_stats_exporter.go):
- A background goroutine ticks every 15s and feeds the existing
veza_db_connections{state} gauges. Previously the gauges only
refreshed when /health/deep was hit, so PoolExhaustionImminent
evaluated against stale data.
- Wired into cmd/api/main.go alongside the ledger sampler, with a
shutdown hook for clean cancellation.
N+1 detector (internal/database/n1_detector.go +
internal/middleware/n1_query_counter.go):
- The gin middleware attaches a per-request *int64 counter to the
request context; a GORM after-callback (Query/Create/Update/Delete/
Row/Raw) atomically increments it.
- Cost: one pointer load plus one atomic add per query.
- Cardinality is bounded by c.FullPath() (the templated route, not
the raw URL).
- Threshold defaults to 50; override via VEZA_N1_THRESHOLD.
- Emits histogram veza_db_request_query_count and counter
veza_db_n1_suspicions_total.
Alerts in alert_rules.yml, group veza_db_pool_n1:
- PoolExhaustionImminent (in_use ≥ 90% for 5m)
- PoolStatsExporterStuck (gauges frozen for 10m despite traffic)
- N1QuerySpike (> 3% of requests over the threshold for 15m)
- SlowQuerySustained (slow-query rate > 2/min for 15m on the same op+table)
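The exact expressions in alert_rules.yml aren't reproduced here; as an illustration, the PoolExhaustionImminent rule could take roughly the following shape. The denominator label ("max"), severity, and annotation text are assumptions, not taken from the real file.

```yaml
groups:
  - name: veza_db_pool_n1
    rules:
      - alert: PoolExhaustionImminent
        # Hypothetical PromQL: assumes pool capacity is exported as
        # veza_db_connections{state="max"}; the real rule may differ.
        expr: |
          veza_db_connections{state="in_use"}
            / ignoring(state) veza_db_connections{state="max"}
          >= 0.90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "DB pool in_use at >= 90% of capacity for 5m"
```

The `for: 5m` clause matches the sustained-condition wording above, so a brief spike during a deploy does not page anyone.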
Tests: 8 detector tests + 4 middleware tests, all passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>