Two complementary signals: pool-side (do we have enough connections
for the load?) and request-side (does any single handler quietly
run hundreds of queries?). Both feed Prometheus + Grafana + alert
rules.
Pool stats exporter (internal/database/pool_stats_exporter.go):
- Background goroutine ticks every 15s and feeds the existing
veza_db_connections{state} gauges. Before this, the gauges only
refreshed when /health/deep was hit, so PoolExhaustionImminent
evaluated against stale data.
- Wired into cmd/api/main.go alongside the ledger sampler with a
shutdown hook for clean cancellation.
N+1 detector (internal/database/n1_detector.go +
internal/middleware/n1_query_counter.go):
- Per-request *int64 counter attached to ctx by the gin
middleware; the GORM after-callback (Query/Create/Update/Delete/
Row/Raw) atomic-adds.
- Cost: one pointer load + one atomic add per query.
- Cardinality bounded by c.FullPath() (templated route, not URL).
- Threshold default 50, override via VEZA_N1_THRESHOLD.
- Histogram veza_db_request_query_count + counter
veza_db_n1_suspicions_total.
Alerts in alert_rules.yml, veza_db_pool_n1 group:
- PoolExhaustionImminent (in_use ≥ 90% for 5m)
- PoolStatsExporterStuck (gauges frozen for 10m despite traffic)
- N1QuerySpike (> 3% of requests over threshold for 15m)
- SlowQuerySustained (slow query rate > 2/min for 15m on same op+table)
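As a rough sketch, the first rule might look like the following; the group name and alert name come from the list above, but the label values (`in_use`, `max_open`) and the exact expression are assumptions, not the repo's actual rule:

```yaml
groups:
  - name: veza_db_pool_n1
    rules:
      - alert: PoolExhaustionImminent
        # Assumed: the exporter publishes a max_open state alongside in_use.
        expr: >
          veza_db_connections{state="in_use"}
            / ignoring(state) veza_db_connections{state="max_open"} >= 0.90
        for: 5m
        labels:
          severity: warning
```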
Tests: 8 detector tests + 4 middleware tests, all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
87 lines
2.8 KiB
Go
package database

import (
	"context"
	"sync"
	"sync/atomic"
	"testing"

	"github.com/stretchr/testify/assert"
)

// AttachCounter / CounterFromContext are pure-Go helpers — no DB
// needed. Tests here cover the contract that the gin middleware
// + GORM callback rely on.

func TestAttachCounter_FreshCounterIsZero(t *testing.T) {
	ctx, counter := AttachCounter(context.Background())

	assert.NotNil(t, counter)
	assert.Equal(t, int64(0), atomic.LoadInt64(counter))

	got := CounterFromContext(ctx)
	assert.Same(t, counter, got, "ctx-stored counter must be the same pointer")
}

func TestCounterFromContext_NilWhenNotAttached(t *testing.T) {
	got := CounterFromContext(context.Background())
	assert.Nil(t, got, "ctx without AttachCounter must return nil")
}

func TestAttachCounter_PerRequestIsolation(t *testing.T) {
	// Each AttachCounter call returns its own pointer; one
	// request's queries must not leak into another request's
	// counter.
	ctx1, c1 := AttachCounter(context.Background())
	ctx2, c2 := AttachCounter(context.Background())

	atomic.AddInt64(c1, 5)
	atomic.AddInt64(c2, 3)

	assert.Equal(t, int64(5), *CounterFromContext(ctx1))
	assert.Equal(t, int64(3), *CounterFromContext(ctx2))
	assert.NotSame(t, c1, c2)
}

func TestAttachCounter_ConcurrentIncrement(t *testing.T) {
	// The counter is bumped atomically from GORM callbacks; in
	// real use multiple goroutines may share the same context
	// (e.g. errgroup parallel queries inside one request). Verify
	// 1000 concurrent increments land safely.
	_, counter := AttachCounter(context.Background())
	const n = 1000

	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(counter, 1)
		}()
	}
	wg.Wait()

	assert.Equal(t, int64(n), atomic.LoadInt64(counter))
}

func TestReportRequestQueryCount_DisabledIsNoop(t *testing.T) {
	// When Enabled=false the call must not panic and must not
	// touch the metrics; the histogram is asserted by the metrics
	// package's own tests, here we just verify the contract.
	ReportRequestQueryCount(N1DetectorConfig{Enabled: false}, "/api/v1/foo", 9999)
}

func TestReportRequestQueryCount_BelowThresholdSkipsLog(t *testing.T) {
	// With Logger=nil the function must not panic when count is
	// at or below threshold — the caller passes a nil logger when
	// the env-driven config doesn't supply one.
	ReportRequestQueryCount(N1DetectorConfig{Enabled: true, Threshold: 50}, "/api/v1/foo", 49)
	ReportRequestQueryCount(N1DetectorConfig{Enabled: true, Threshold: 50}, "/api/v1/foo", 50)
}

func TestReportRequestQueryCount_AboveThresholdNoLogger(t *testing.T) {
	// Above threshold but Logger=nil: must still bump the
	// suspicion counter (verified indirectly — no panic) and
	// not crash on the missing logger.
	ReportRequestQueryCount(N1DetectorConfig{Enabled: true, Threshold: 10}, "/api/v1/foo", 100)
}