Two complementary signals: pool-side (do we have enough connections
for the load?) and per-request side (does any single handler quietly
run hundreds of queries?). Both feed Prometheus + Grafana + alert
rules.
Pool stats exporter (internal/database/pool_stats_exporter.go):
- Background goroutine ticks every 15s and feeds the existing
veza_db_connections{state} gauges. Before this, the gauges only
refreshed when /health/deep was hit, so PoolExhaustionImminent
evaluated against stale data.
- Wired into cmd/api/main.go alongside the ledger sampler with a
shutdown hook for clean cancellation.
N+1 detector (internal/database/n1_detector.go +
internal/middleware/n1_query_counter.go):
- Per-request *int64 counter attached to ctx by the gin
middleware; GORM after-callbacks (Query/Create/Update/Delete/
Row/Raw) atomically increment it.
- Cost : one pointer load + one atomic add per query.
- Cardinality bounded by c.FullPath() (templated route, not URL).
- Threshold default 50, override via VEZA_N1_THRESHOLD.
- Histogram veza_db_request_query_count + counter
veza_db_n1_suspicions_total.
Alerts in alert_rules.yml, group veza_db_pool_n1:
- PoolExhaustionImminent (in_use ≥ 90% for 5m)
- PoolStatsExporterStuck (gauges frozen for 10m despite traffic)
- N1QuerySpike (> 3% of requests over threshold for 15m)
- SlowQuerySustained (slow query rate > 2/min for 15m on same op+table)
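The sketch below makes the two ratio-based thresholds concrete by evaluating their instantaneous conditions in Go. It assumes that in_use is measured against MaxOpenConnections, and that N1QuerySpike compares veza_db_n1_suspicions_total with the total request count over the same window. The function names are hypothetical. This is not the PromQL in alert_rules.yml, and the "for 5m"/"for 15m" hold durations are left to the alerting engine.

```go
package main

// poolExhaustionImminent: in_use at or above 90% of the pool cap.
// maxOpen == 0 means database/sql imposes no limit, so the ratio is
// undefined and this sketch never fires.
func poolExhaustionImminent(inUse, maxOpen int) bool {
	if maxOpen <= 0 {
		return false
	}
	// Integer form of inUse/maxOpen >= 0.9, avoiding float rounding.
	return inUse*10 >= maxOpen*9
}

// n1QuerySpike: more than 3% of requests in the window exceeded the
// N+1 threshold. An empty window never fires.
func n1QuerySpike(suspicions, requests int64) bool {
	if requests <= 0 {
		return false
	}
	// Integer form of suspicions/requests > 0.03.
	return suspicions*100 > requests*3
}
```

With a 20-connection pool, 18 in-use connections meet the 90% condition and 17 do not. Over 100 requests, 4 suspicions trip the spike condition and 3 do not.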
Tests: 8 detector tests + 4 middleware tests, all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
87 lines
2.8 KiB
Go
package middleware

// N+1 query counter middleware (v1.0.10 ops item 11).
//
// This middleware works in tandem with internal/database/n1_detector.go.
// It runs once per HTTP request:
//
//  1. Attach a fresh per-request query counter to the request ctx.
//  2. Replace c.Request.Context() with the augmented one so any
//     handler that reads from c.Request.Context() inherits the
//     counter (which is propagated to GORM via db.WithContext).
//  3. After the handler returns, read the counter and report it
//     to the metrics and, when over threshold, to the warn log.
//
// The middleware is GORM-agnostic; it just attaches a counter to
// ctx and reads it back. The DB-side counting is done by callbacks
// registered via database.RegisterN1Callbacks(db) at startup.

import (
	"strconv"
	"sync/atomic"

	"veza-backend-api/internal/database"

	"github.com/gin-gonic/gin"
	"go.uber.org/zap"
)

// N1QueryCounterConfig configures the threshold and logger. The
// threshold is read from VEZA_N1_THRESHOLD at startup (see
// N1ThresholdFromEnv); N1QueryCounter applies a default of 50 when
// a value ≤ 0 is passed.
type N1QueryCounterConfig struct {
	Logger    *zap.Logger
	Threshold int64
	Enabled   bool
}

// N1QueryCounter returns a gin middleware that counts DB queries
// per request, records every count in the histogram, and logs (plus
// updates metrics) when the count exceeds the threshold. A Threshold
// ≤ 0 falls back to the default of 50; it does not disable the warn
// log. When Enabled is false the middleware is a pass-through and
// records nothing.
func N1QueryCounter(cfg N1QueryCounterConfig) gin.HandlerFunc {
	threshold := cfg.Threshold
	if threshold <= 0 {
		threshold = 50
	}
	if !cfg.Enabled {
		// Pass-through; lets the call site keep wiring the
		// middleware unconditionally and toggle it via env without
		// re-reading the router config.
		return func(c *gin.Context) { c.Next() }
	}
	return func(c *gin.Context) {
		ctx, counter := database.AttachCounter(c.Request.Context())
		c.Request = c.Request.WithContext(ctx)

		c.Next()

		count := atomic.LoadInt64(counter)
		// Use the gin matched route (c.FullPath()): this is the
		// templated route ("/api/v1/tracks/:id"), not the literal
		// URL with the user-supplied id substituted in. That
		// gives us bounded cardinality (one label value per
		// route) instead of one per unique URL.
		route := c.FullPath()
		if route == "" {
			route = "unmatched"
		}
		database.ReportRequestQueryCount(database.N1DetectorConfig{
			Logger:    cfg.Logger,
			Threshold: threshold,
			Enabled:   cfg.Enabled,
		}, route, count)
	}
}

// N1ThresholdFromEnv parses the raw value of VEZA_N1_THRESHOLD
// (passed in as envVal) and returns defaultThreshold when the value
// is empty, not an integer, or negative. Helper used by router setup.
func N1ThresholdFromEnv(envVal string, defaultThreshold int64) int64 {
	if envVal == "" {
		return defaultThreshold
	}
	parsed, err := strconv.ParseInt(envVal, 10, 64)
	if err != nil || parsed < 0 {
		return defaultThreshold
	}
	return parsed
}