veza/veza-backend-api/internal/monitoring/business_metrics.go
senke a8a8b47b00 fix(backend): print config-init error to stderr before silent exit
main.go's config-load failure path silently os.Exit(1)s, which means
lumberjack's file-rotation buffer never flushes before exit and the
journal only sees "started → exited 1" with zero diagnostic. Last
deploy run's app log had only the "Logger initialized" line; the
actual NewConfig error never made it to disk because os.Exit doesn't
run defers.

A plain fmt.Fprintf to stderr → goes to systemd journal synchronously
→ the next probe rescue dump will show what's actually failing.

The original "don't write to stderr to avoid broken pipe with
journald\" comment cited a concern that doesn't apply at this point in
startup: there's no parent to break the pipe to, and journald accepts
arbitrary bytes on stderr. Keep the os.Exit but print first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 12:34:17 +02:00

package monitoring

// Business KPI metrics — v1.0.10 ops item 10.
//
// The platform already has technical metrics (request rate, latency,
// error rate, queue depth, etc.) wired through the http middleware,
// the SLO recording rules in config/prometheus/slo.yml, and a few
// per-feature counters (TracksUploadedTotal, UsersRegisteredTotal,
// PlaylistsCreatedTotal in metrics.go). Those answer "is the
// platform healthy ?" but not "is the business healthy ?".
//
// This file adds the business-side counters that drive the alert
// rules in alert_rules.yml `veza_business` group : login pass/fail
// rate (account-takeover signal), order lifecycle, revenue. A 50%
// drop in signups over 1h versus the same hour last week is a real
// product / signup-flow incident that the existing infra alerts
// don't catch.
//
// Naming convention : `veza_business_*_total` for counters,
// matching the existing `veza_*_total` style. Labels are bounded
// (status enum) so cardinality stays low — Prometheus tolerates
// thousands of distinct label value combinations but bills CPU
// for them.

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// BusinessLoginsTotal counts every login attempt by outcome.
	// Outcomes : "success", "failure_credentials" (wrong password),
	// "failure_unverified" (email not verified), "failure_locked"
	// (account locked), "failure_2fa" (2FA failed), "failure_other".
	// The granularity matches the auth handler's branching ; new
	// outcomes need a new label value AND a new alert if the rate
	// matters.
	BusinessLoginsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "veza_business_logins_total",
			Help: "Logins broken down by outcome. Drives the auth-failure-spike alert.",
		},
		[]string{"outcome"},
	)

	// BusinessOrdersTotal counts every marketplace order state
	// transition. Statuses : "created" (CreateOrder), "completed"
	// (payment webhook), "refunded" (refund webhook), "failed"
	// (payment provider rejected). The "created → completed" gap
	// (over time) is the funnel ; alerts page when it widens.
	BusinessOrdersTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "veza_business_orders_total",
			Help: "Marketplace orders by status transition.",
		},
		[]string{"status"},
	)

	// BusinessRevenueCentsTotal accumulates platform revenue in
	// minor currency units (cents/centimes). Labelled by currency
	// because EUR + USD orders go to the same counter and Grafana
	// needs to split for display. Integer cents keep every increment
	// whole, so the counter's float64 total stays exact (up to 2^53)
	// instead of drifting across millions of orders.
	BusinessRevenueCentsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "veza_business_revenue_cents_total",
			Help: "Cumulative platform revenue in minor units (e.g. EUR cents).",
		},
		[]string{"currency"},
	)

	// BusinessAccountDeletionsTotal counts hard-delete + GDPR-erasure
	// account closures. Spikes are a churn signal; a sustained drop
	// to zero indicates the deletion endpoint is broken (also a
	// problem: GDPR requires the surface to be reachable).
	BusinessAccountDeletionsTotal = promauto.NewCounter(
		prometheus.CounterOpts{
			Name: "veza_business_account_deletions_total",
			Help: "Hard-delete + RGPD account erasures.",
		},
	)
)

// RecordLogin increments the appropriate login outcome counter.
// outcome must be one of the pre-declared label values: every distinct
// string creates a new time series, so passing arbitrary values (e.g.
// raw error messages) makes cardinality unbounded and is a bug. The
// helpers below cover the common cases.
func RecordLogin(outcome string) {
	BusinessLoginsTotal.WithLabelValues(outcome).Inc()
}

// RecordLoginSuccess and the failure-* helpers are provided so
// call sites don't have to remember the exact label string.
func RecordLoginSuccess() { RecordLogin("success") }
func RecordLoginFailureCredentials() { RecordLogin("failure_credentials") }
func RecordLoginFailureUnverified() { RecordLogin("failure_unverified") }
func RecordLoginFailureLocked() { RecordLogin("failure_locked") }
func RecordLoginFailure2FA() { RecordLogin("failure_2fa") }
func RecordLoginFailureOther() { RecordLogin("failure_other") }

// RecordOrderEvent records a marketplace order state transition.
// status must be one of "created", "completed", "refunded",
// "failed". Idempotency is the handler / webhook flow's job:
// Prometheus counters are not transactional, so a duplicate webhook
// produces a duplicate count. That is a known approximation and
// doesn't affect the trend, which is what the alert reads.
func RecordOrderEvent(status string) {
	BusinessOrdersTotal.WithLabelValues(status).Inc()
}

// RecordRevenue adds amountCents to the running revenue total for
// the given currency. Non-positive amounts and empty currencies are
// ignored: Prometheus counters must only increase (client_golang's
// Counter.Add panics on a negative value), so refunds cannot be
// subtracted here. They are tracked as separate
// `BusinessOrdersTotal{status="refunded"}` events; net revenue in
// PromQL would need a dedicated refund-amount counter to subtract
// (TODO if needed; for now recording refunds under a
// `currency="EUR-refund"` label is the pragmatic shortcut).
//
// In practice, call this from the payment webhook on the
// `completed` transition with the amount actually settled by the
// PSP, not the order's nominal price (the two can differ on
// partial refunds, currency conversion fees, etc.).
func RecordRevenue(amountCents int64, currency string) {
	if amountCents <= 0 || currency == "" {
		return
	}
	BusinessRevenueCentsTotal.WithLabelValues(currency).Add(float64(amountCents))
}

// RecordAccountDeletion increments the deletion counter. Called
// from the GDPR erasure handler and the user-initiated
// account-delete endpoint.
func RecordAccountDeletion() {
	BusinessAccountDeletionsTotal.Inc()
}
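
The `BusinessOrdersTotal` comment describes the "created → completed" gap as the funnel that alerts watch. The following sketch shows the arithmetic behind that ratio under stated assumptions. Plain float64 values stand in for two samples of the `created` and `completed` series taken one hour apart. `counterIncrease` and `completionRatio` are hypothetical names. The real alert expression lives in alert_rules.yml and is not reproduced here.

```go
package main

import "fmt"

// counterIncrease returns how much a monotonic counter grew between two
// samples. A smaller current value means the process restarted and the
// counter reset to zero, so the current value itself is the increase (the
// same reset rule Prometheus's increase() applies, minus its extrapolation).
func counterIncrease(prev, cur float64) float64 {
	if cur < prev {
		return cur
	}
	return cur - prev
}

// completionRatio is the share of created orders that completed in the
// window; ok is false when no orders were created, to avoid dividing by zero.
func completionRatio(createdInc, completedInc float64) (ratio float64, ok bool) {
	if createdInc == 0 {
		return 0, false
	}
	return completedInc / createdInc, true
}

func main() {
	// Hypothetical samples of veza_business_orders_total, one hour apart.
	created := counterIncrease(1200, 1300)   // status="created"
	completed := counterIncrease(1150, 1190) // status="completed"
	ratio, ok := completionRatio(created, completed)
	fmt.Println(ratio, ok) // 0.4 true
}
```

A falling ratio means the gap is widening. Because both series are labels of one CounterVec in one process, they reset together on restart.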
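
The cents-as-integer rationale on `BusinessRevenueCentsTotal` can be checked with standard-library Go alone. Adding 0.10 EUR ten times in float64 euros does not land exactly on 1.00, but integer cents stay exact. The Prometheus counter itself stores a float64. Adding whole-number amounts keeps it exact as long as the running total stays below 2^53. The helper names below are illustrative only.

```go
package main

import "fmt"

// sumEuros accumulates n orders of priceEUR as float64 euros.
func sumEuros(n int, priceEUR float64) float64 {
	var total float64
	for i := 0; i < n; i++ {
		total += priceEUR
	}
	return total
}

// sumCents accumulates n orders of priceCents as integer cents.
func sumCents(n int, priceCents int64) int64 {
	var total int64
	for i := 0; i < n; i++ {
		total += priceCents
	}
	return total
}

func main() {
	euros := sumEuros(10, 0.10)
	cents := sumCents(10, 10)
	fmt.Println(euros == 1.0) // false: the float sum falls just short of 1.0
	fmt.Println(cents == 100) // true
	// Converting whole cents to float64 (as RecordRevenue does before Add)
	// is exact while the running total stays below 2^53.
	fmt.Println(float64(cents) == 100)
}
```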
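
The `RecordLogin` comment asks callers to stick to the pre-declared outcomes. The following sketch shows one possible way to enforce that. A hypothetical `normalizeLoginOutcome` helper folds any undeclared value into `failure_other`, so a stray string can never mint a new series. A plain map stands in for the `BusinessLoginsTotal` CounterVec.

```go
package main

import "fmt"

// allowedLoginOutcomes mirrors the label values documented on
// BusinessLoginsTotal.
var allowedLoginOutcomes = map[string]bool{
	"success":             true,
	"failure_credentials": true,
	"failure_unverified":  true,
	"failure_locked":      true,
	"failure_2fa":         true,
	"failure_other":       true,
}

// normalizeLoginOutcome is hypothetical: it maps any undeclared outcome to
// "failure_other", keeping the label set bounded.
func normalizeLoginOutcome(outcome string) string {
	if allowedLoginOutcomes[outcome] {
		return outcome
	}
	return "failure_other"
}

func main() {
	// A map stands in for BusinessLoginsTotal: one entry per label value.
	series := map[string]int{}
	for _, o := range []string{"success", "failure_2fa", "pq: connection refused", "oops"} {
		series[normalizeLoginOutcome(o)]++
	}
	fmt.Println(len(series), series["failure_other"]) // 3 2
}
```

Folding silently trades cardinality safety for visibility. A caller adopting this might also log the original value so the bug that produced it still gets noticed.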
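
`RecordOrderEvent` leaves idempotency to the handler / webhook flow and accepts duplicate counts as an approximation. If exact counts ever mattered, a caller could drop replayed webhooks before counting. The following sketch assumes each webhook carries a unique event ID. `orderEventDeduper` and the `evt_*` IDs are hypothetical. An in-process set is lost on restart and is not shared across replicas, so a real deployment would need a shared store with expiry.

```go
package main

import (
	"fmt"
	"sync"
)

// orderEventDeduper is a hypothetical in-memory guard against replayed
// webhook deliveries.
type orderEventDeduper struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newOrderEventDeduper() *orderEventDeduper {
	return &orderEventDeduper{seen: make(map[string]bool)}
}

// firstSeen reports whether eventID has not been processed before, and
// marks it as processed.
func (d *orderEventDeduper) firstSeen(eventID string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.seen[eventID] {
		return false
	}
	d.seen[eventID] = true
	return true
}

func main() {
	d := newOrderEventDeduper()
	counts := map[string]int{} // stands in for BusinessOrdersTotal
	deliveries := []struct{ id, status string }{
		{"evt_1", "completed"},
		{"evt_1", "completed"}, // PSP retry of the same webhook
		{"evt_2", "refunded"},
	}
	for _, ev := range deliveries {
		if d.firstSeen(ev.id) {
			counts[ev.status]++ // real code would call RecordOrderEvent(ev.status)
		}
	}
	fmt.Println(counts["completed"], counts["refunded"]) // 1 1
}
```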
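
`RecordRevenue` drops non-positive amounts because a Prometheus counter can only go up. The following sketch shows how the `currency="EUR-refund"` shortcut mentioned in its comment could work. Refunds would be added as positive amounts under a separate label, and net revenue is the difference between the two series. A map stands in for the CounterVec, and unlike the real CounterVec it is not goroutine-safe. `recordRefund` and `netRevenue` are hypothetical names.

```go
package main

import "fmt"

// revenueCents stands in for BusinessRevenueCentsTotal: label value -> total.
var revenueCents = map[string]float64{}

// recordRevenue mirrors RecordRevenue's guard: only positive amounts with a
// currency are added, so no series ever decreases.
func recordRevenue(amountCents int64, currency string) {
	if amountCents <= 0 || currency == "" {
		return
	}
	revenueCents[currency] += float64(amountCents)
}

// recordRefund is hypothetical: it applies the "<currency>-refund" label
// shortcut, adding the refunded amount as a positive value.
func recordRefund(amountCents int64, currency string) {
	if currency == "" {
		return
	}
	recordRevenue(amountCents, currency+"-refund")
}

// netRevenue subtracts the refund series from the gross series, as a PromQL
// query over the two label values would.
func netRevenue(currency string) float64 {
	return revenueCents[currency] - revenueCents[currency+"-refund"]
}

func main() {
	recordRevenue(1999, "EUR")
	recordRevenue(4999, "EUR")
	recordRevenue(-500, "EUR") // ignored: counters cannot decrease
	recordRefund(1999, "EUR")
	fmt.Println(revenueCents["EUR"], revenueCents["EUR-refund"], netRevenue("EUR")) // 6998 1999 4999
}
```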