veza/veza-backend-api/cmd/api/main.go
senke 6c1e87e52f
Some checks failed
Veza CI / Backend (Go) (push) Failing after 0s
Veza CI / Rust (Stream Server) (push) Failing after 0s
Veza CI / Frontend (Web) (push) Failing after 0s
Security Scan / Secret Scanning (gitleaks) (push) Failing after 0s
Veza CI / Notify on failure (push) Failing after 0s
feat(marketplace): async stripe connect reversal worker — v1.0.7 item B day 2
Day-2 cut of item B: the reversal path becomes async. Pre-v1.0.7
(and v1.0.7 day 1) the refund handler flipped seller_transfers
straight from completed to reversed without ever calling Stripe —
the ledger said "reversed" while the seller's Stripe balance still
showed the original transfer as settled. The new flow:

  refund.succeeded webhook
    → reverseSellerAccounting transitions row: completed → reversal_pending
    → StripeReversalWorker (every REVERSAL_CHECK_INTERVAL, default 1m)
      → calls ReverseTransfer on Stripe
      → success: row → reversed + persist stripe_reversal_id
      → 404 already-reversed (dead code until day 3): row → reversed + log
      → 404 resource_missing (dead code until day 3): row → permanently_failed
      → transient error: stay reversal_pending, bump retry_count,
        exponential backoff (base * 2^retry, capped at backoffMax)
      → retries exhausted: row → permanently_failed
    → buyer-facing refund completes immediately regardless of Stripe health

State machine enforcement:
  * New `SellerTransfer.TransitionStatus(tx, to, extras)` wraps every
    mutation: validates against AllowedTransferTransitions, guarded
    UPDATE with WHERE status=<from> (optimistic lock semantics), no
    RowsAffected = stale state / concurrent winner detected.
  * processSellerTransfers no longer mutates .Status in place —
    terminal status is decided before struct construction, so the
    row is Created with its final state.
  * transfer_retry.retryOne and admin RetryTransfer route through
    TransitionStatus. Legacy direct assignment removed.
  * TestNoDirectTransferStatusMutation greps the package for any
    `st.Status = "..."` / `t.Status = "..."` / GORM
    Model(&SellerTransfer{}).Update("status"...) outside the
    allowlist and fails if found. Verified by temporarily injecting
    a violation during development — test caught it as expected.

Configuration (v1.0.7 item B):
  * REVERSAL_WORKER_ENABLED=true (default)
  * REVERSAL_MAX_RETRIES=5 (default)
  * REVERSAL_CHECK_INTERVAL=1m (default)
  * REVERSAL_BACKOFF_BASE=1m (default)
  * REVERSAL_BACKOFF_MAX=1h (default, caps exponential growth)
  * .env.template documents TRANSFER_RETRY_* and REVERSAL_* env vars
    so an ops reader can grep them.

Interface change: TransferService.ReverseTransfer(ctx,
stripe_transfer_id, amount *int64, reason) (reversalID, error)
added. All four mocks extended (process_webhook, transfer_retry,
admin_transfer_handler, payment_flow integration). amount=nil means
full reversal; v1.0.7 always passes nil (partial reversal is future
scope per axis-1 P2).

Stripe 404 disambiguation (ErrTransferAlreadyReversed /
ErrTransferNotFound) is wired in the worker as dead code — the
sentinels are declared and the worker branches on them, but
StripeConnectService.ReverseTransfer doesn't yet emit them. Day 3
will parse stripe.Error.Code and populate the sentinels; no worker
change needed at that point. Keeping the handling skeleton in day 2
so the worker's branch shape doesn't change between days and the
tests can already cover all four paths against the mock.

Worker unit tests (9 cases, all green, sqlite :memory:):
  * happy path: reversal_pending → reversed + stripe_reversal_id set
  * already reversed (mock returns sentinel): → reversed + log
  * not found (mock returns sentinel): → permanently_failed + log
  * transient 503: retry_count++, next_retry_at set with backoff,
    stays reversal_pending
  * backoff capped at backoffMax (verified with base=1s, max=10s,
    retry_count=4 → capped at 10s not 16s)
  * max retries exhausted: → permanently_failed
  * legacy row with empty stripe_transfer_id: → permanently_failed,
    does not call Stripe
  * only picks up reversal_pending (skips all other statuses)
  * respects next_retry_at (future rows skipped)

Existing test updated: TestProcessRefundWebhook_SucceededFinalizesState
now asserts the row lands at reversal_pending with next_retry_at
set (worker's responsibility to drive to reversed), not reversed.

Worker wired in cmd/api/main.go alongside TransferRetryWorker,
sharing the same StripeConnectService instance. Shutdown path
registered for graceful stop.

Cut from day 2 scope (per agreed-upon discipline), landing in day 3:
  * Stripe 404 disambiguation implementation (parse error.Code)
  * End-to-end smoke probe (refund → reversal_pending → worker
    processes → reversed) against local Postgres + mock Stripe
  * Batch-size tuning / inter-batch sleep — batchLimit=20 today is
    safely under Stripe's 100 req/s default rate limit; revisit if
    observed load warrants

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:34:29 +02:00

373 lines
13 KiB
Go
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

package main
import (
"context"
"fmt"
"log"
"net/http"
// SECURITY(REM-027): pprof removed from production — use build tag or dedicated debug binary instead.
// To enable: go build -tags debug ./cmd/api
"os"
"os/signal"
"syscall"
"time"
"github.com/getsentry/sentry-go"
"github.com/gin-gonic/gin"
"github.com/joho/godotenv"
"go.uber.org/zap"
"veza-backend-api/internal/api"
"veza-backend-api/internal/config"
"veza-backend-api/internal/core/marketplace"
vezaes "veza-backend-api/internal/elasticsearch"
"veza-backend-api/internal/jobs"
"veza-backend-api/internal/metrics"
"veza-backend-api/internal/services"
"veza-backend-api/internal/shutdown"
"veza-backend-api/internal/workers"
_ "veza-backend-api/docs" // Import docs for swagger
)
// @title Veza Backend API
// @version 1.2.0
// @description Backend API for Veza platform.
// @termsOfService http://swagger.io/terms/
// @contact.name API Support
// @contact.url http://www.veza.app/support
// @contact.email support@veza.app
// @license.name Apache 2.0
// @license.url http://www.apache.org/licenses/LICENSE-2.0.html
// @host localhost:18080
// @BasePath /api/v1
// @securityDefinitions.apikey BearerAuth
// @in header
// @name Authorization
// @securityDefinitions.apikey ApiKeyAuth
// @in header
// @name X-API-Key
// @description Developer API key (obtain from Developer Portal). Format: vza_xxxxx
func main() {
// Charger les variables d'environnement
// NOTE: Do not write to stderr to avoid broken pipe errors with systemd journald
// The message will be logged by the logger once it's initialized
_ = godotenv.Load()
// FIX #1: Supprimer l'initialisation dupliquée du logger
// Le logger sera initialisé dans config.NewConfig() avec le bon LOG_LEVEL
// Charger la configuration (qui initialise le logger)
cfg, err := config.NewConfig()
if err != nil {
// CRITICAL: Do not write to stderr or files to avoid broken pipe errors
// Just exit silently - systemd will capture the exit code
// The error details will be in the application logs if the logger was initialized
os.Exit(1)
}
// Utiliser le logger de la config
logger := cfg.Logger
if logger == nil {
log.Fatal("❌ Logger non initialisé dans la configuration")
}
logger.Info("🚀 Démarrage de Veza Backend API")
// Valider la configuration
if err := cfg.Validate(); err != nil {
logger.Fatal("❌ Configuration invalide", zap.Error(err))
}
// Initialiser Sentry si DSN configuré
if cfg.SentryDsn != "" {
err := sentry.Init(sentry.ClientOptions{
Dsn: cfg.SentryDsn,
Environment: cfg.SentryEnvironment,
TracesSampleRate: cfg.SentrySampleRateTransactions,
SampleRate: cfg.SentrySampleRateErrors,
// AttachStacktrace pour capturer les stack traces
AttachStacktrace: true,
})
if err != nil {
logger.Warn("❌ Impossible d'initialiser Sentry", zap.Error(err))
} else {
logger.Info("✅ Sentry initialisé", zap.String("environment", cfg.SentryEnvironment))
}
// Flush les événements Sentry avant shutdown
defer sentry.Flush(2 * time.Second)
} else {
logger.Info(" Sentry non configuré (SENTRY_DSN non défini)")
}
// Initialisation de la base de données
db := cfg.Database
if db == nil {
logger.Fatal("❌ Base de données non initialisée")
}
defer db.Close()
if err := db.Initialize(); err != nil {
logger.Fatal("❌ Impossible d'initialiser la base de données", zap.Error(err))
}
// MOD-P2-004: Démarrer le collecteur de métriques DB pool
// Collecte les stats DB pool toutes les 10 secondes et les expose via Prometheus
metrics.StartDBPoolStatsCollector(db.DB, 10*time.Second)
logger.Info("✅ Collecteur de métriques DB pool démarré")
// Fail-Fast: Vérifier RabbitMQ si activé
if cfg.RabbitMQEnable {
if cfg.RabbitMQEventBus == nil {
logger.Fatal("❌ RabbitMQ activé (RABBITMQ_ENABLE=true) mais non initialisé (problème de connexion?)")
} else {
// Optionnel: Check connection status if RabbitMQEventBus exposes it
// For now, assume if initialized it's connected or retrying.
// If we want STRICT fail fast, we would need to verify connection is Open here.
logger.Info("✅ RabbitMQ actif")
}
} else {
logger.Info(" RabbitMQ désactivé")
}
// BE-SVC-017: Créer le gestionnaire de shutdown gracieux
shutdownManager := shutdown.NewShutdownManager(logger)
// Démarrer le Job Worker avec contexte pour shutdown gracieux
var workerCtx context.Context
var workerCancel context.CancelFunc
if cfg.JobWorker != nil {
workerCtx, workerCancel = context.WithCancel(context.Background())
cfg.JobWorker.Start(workerCtx)
logger.Info("✅ Job Worker démarré")
// Enregistrer le Job Worker pour shutdown gracieux
shutdownManager.Register(shutdown.NewShutdownFunc("job_worker", func(ctx context.Context) error {
if workerCancel != nil {
workerCancel()
// Attendre un peu pour que les workers se terminent
time.Sleep(2 * time.Second)
}
return nil
}))
} else {
logger.Warn("⚠️ Job Worker non initialisé")
}
// v0.701: Start Transfer Retry Worker
// v1.0.7 item B: Start Reversal Worker (shares the same
// StripeConnectService — one initialisation for both workers).
if cfg.StripeConnectEnabled && cfg.StripeConnectSecretKey != "" {
stripeConnectSvc := services.NewStripeConnectService(db.GormDB, cfg.StripeConnectSecretKey, logger)
if cfg.TransferRetryEnabled {
retryWorker := marketplace.NewTransferRetryWorker(
db.GormDB, stripeConnectSvc, logger, cfg.TransferRetryInterval, cfg.TransferRetryMaxAttempts,
)
retryCtx, retryCancel := context.WithCancel(context.Background())
go retryWorker.Start(retryCtx)
logger.Info("Transfer Retry Worker started",
zap.Duration("interval", cfg.TransferRetryInterval),
zap.Int("max_retries", cfg.TransferRetryMaxAttempts))
shutdownManager.Register(shutdown.NewShutdownFunc("transfer_retry_worker", func(ctx context.Context) error {
retryCancel()
return nil
}))
}
if cfg.ReversalWorkerEnabled {
reversalWorker := marketplace.NewStripeReversalWorker(
db.GormDB, stripeConnectSvc, logger,
cfg.ReversalCheckInterval, cfg.ReversalMaxRetries,
cfg.ReversalBackoffBase, cfg.ReversalBackoffMax,
)
reversalCtx, reversalCancel := context.WithCancel(context.Background())
go reversalWorker.Start(reversalCtx)
logger.Info("Stripe Reversal Worker started",
zap.Duration("interval", cfg.ReversalCheckInterval),
zap.Int("max_retries", cfg.ReversalMaxRetries),
zap.Duration("backoff_base", cfg.ReversalBackoffBase),
zap.Duration("backoff_max", cfg.ReversalBackoffMax))
shutdownManager.Register(shutdown.NewShutdownFunc("stripe_reversal_worker", func(ctx context.Context) error {
reversalCancel()
return nil
}))
}
} else if cfg.TransferRetryEnabled || cfg.ReversalWorkerEnabled {
logger.Info("Transfer Retry / Reversal workers skipped — Stripe Connect not enabled")
}
// v0.802: Start Cloud Backup Worker (copies cloud files to backup prefix every 24h)
if cfg.S3StorageService != nil {
backupWorker := services.NewCloudBackupWorker(db.GormDB, cfg.S3StorageService, logger)
backupCtx, backupCancel := context.WithCancel(context.Background())
go backupWorker.Start(backupCtx)
logger.Info("Cloud Backup Worker started (24h interval)")
shutdownManager.Register(shutdown.NewShutdownFunc("cloud_backup_worker", func(ctx context.Context) error {
backupCancel()
return nil
}))
}
// v0.802: Start Gear Warranty Notifier (sends notifications when warranty expires in 30 days)
notificationService := services.NewNotificationService(db, logger)
warrantyNotifier := services.NewGearWarrantyNotifier(db.GormDB, notificationService, logger)
warrantyCtx, warrantyCancel := context.WithCancel(context.Background())
go warrantyNotifier.Start(warrantyCtx)
logger.Info("Gear Warranty Notifier started (24h interval)")
shutdownManager.Register(shutdown.NewShutdownFunc("gear_warranty_notifier", func(ctx context.Context) error {
warrantyCancel()
return nil
}))
// v0.10.5 F552: Weekly notification digest (runs on Sunday)
if cfg.JobWorker != nil {
digestWorker := services.NewNotificationDigestWorker(db.GormDB, cfg.JobWorker, logger)
digestCtx, digestCancel := context.WithCancel(context.Background())
go digestWorker.Start(digestCtx)
logger.Info("Notification digest worker started (weekly on Sunday)")
shutdownManager.Register(shutdown.NewShutdownFunc("notification_digest_worker", func(ctx context.Context) error {
digestCancel()
return nil
}))
}
// v0.10.8 F065: Hard delete worker (GDPR - final anonymization after 30 days)
if os.Getenv("HARD_DELETE_CRON_ENABLED") != "false" {
// Optional ES client for the worker's RGPD cleanup (users/tracks/playlists indices).
// Non-fatal if ES is disabled or unreachable — the worker will just skip ES cleanup.
var hardDeleteESClient *vezaes.Client
if esCfg := vezaes.LoadConfig(); esCfg.Enabled {
if esc, esErr := vezaes.NewClient(esCfg, logger); esErr != nil {
logger.Warn("Elasticsearch unavailable for hard delete worker, ES cleanup disabled",
zap.Error(esErr))
} else {
hardDeleteESClient = esc
}
}
hardDeleteWorker := workers.NewHardDeleteWorker(db.GormDB, logger, 24*time.Hour).
WithRedis(cfg.RedisClient).
WithElasticsearch(hardDeleteESClient)
hardDeleteCtx, hardDeleteCancel := context.WithCancel(context.Background())
go hardDeleteWorker.Start(hardDeleteCtx)
logger.Info("Hard delete worker started (24h interval)",
zap.Bool("redis_cleanup", cfg.RedisClient != nil),
zap.Bool("es_cleanup", hardDeleteESClient != nil),
)
shutdownManager.Register(shutdown.NewShutdownFunc("hard_delete_worker", func(ctx context.Context) error {
hardDeleteWorker.Stop()
hardDeleteCancel()
return nil
}))
} else {
logger.Info("Hard delete worker disabled (HARD_DELETE_CRON_ENABLED=false)")
}
// Configuration du mode Gin
// Correction: Utilisation directe de la variable d'env car non exposée dans Config
appEnv := os.Getenv("APP_ENV")
if appEnv == "production" {
gin.SetMode(gin.ReleaseMode)
} else {
gin.SetMode(gin.DebugMode)
}
// Créer le router Gin
router := gin.New()
// SECURITY(HIGH-006): Restrict trusted proxies to prevent IP spoofing via X-Forwarded-For.
// Default: trust nothing (c.ClientIP() returns RemoteAddr only).
// Set TRUSTED_PROXIES="10.0.0.1,10.0.0.2" if behind a known reverse proxy/load balancer.
router.SetTrustedProxies(nil)
// Middleware globaux (Logger, Recovery) recommandés par ORIGIN
router.Use(gin.Logger(), gin.Recovery())
// Configuration des routes
apiRouter := api.NewAPIRouter(db, cfg) // Instantiate APIRouter
if err := apiRouter.Setup(router); err != nil {
logger.Error("Failed to setup API routes", zap.Error(err))
os.Exit(1)
}
// v1.0.4: Hourly cleanup of tracks stuck in `processing` whose upload file
// vanished (crash, SIGKILL, disk wipe). Keeps the tracks table honest.
jobs.ScheduleOrphanTracksCleanup(db, logger)
// Configuration du serveur HTTP
port := fmt.Sprintf("%d", cfg.AppPort)
if cfg.AppPort == 0 {
port = "8080"
}
server := &http.Server{
Addr: fmt.Sprintf(":%s", port),
Handler: router,
ReadTimeout: 30 * time.Second, // Standards ORIGIN
WriteTimeout: 30 * time.Second,
}
// BE-SVC-017: Enregistrer tous les services pour shutdown gracieux
// Enregistrer le serveur HTTP
shutdownManager.Register(shutdown.NewShutdownFunc("http_server", func(ctx context.Context) error {
return server.Shutdown(ctx)
}))
// Enregistrer la configuration (ferme DB, Redis, RabbitMQ, etc.)
shutdownManager.Register(shutdown.NewShutdownFunc("config", func(ctx context.Context) error {
return cfg.Close()
}))
// Enregistrer le logger pour flush final
shutdownManager.Register(shutdown.NewShutdownFunc("logger", func(ctx context.Context) error {
if logger != nil {
return logger.Sync()
}
return nil
}))
// Enregistrer Sentry pour flush final
if cfg.SentryDsn != "" {
shutdownManager.Register(shutdown.NewShutdownFunc("sentry", func(ctx context.Context) error {
sentry.Flush(2 * time.Second)
return nil
}))
}
// Gestion de l'arrêt gracieux
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
go func() {
logger.Info("🌐 Serveur HTTP démarré", zap.String("port", port))
if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
logger.Fatal("❌ Erreur du serveur HTTP", zap.Error(err))
}
}()
// Attendre le signal d'arrêt
<-quit
logger.Info("🔄 Signal d'arrêt reçu, démarrage du shutdown gracieux...")
// BE-SVC-017: Arrêt gracieux coordonné de tous les services
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer shutdownCancel()
if err := shutdownManager.Shutdown(shutdownCtx); err != nil {
logger.Error("❌ Erreur lors du shutdown gracieux", zap.Error(err))
} else {
logger.Info("✅ Shutdown gracieux terminé avec succès")
}
}