senke/veza - Talas Project: Beyond coding. We Forge.

senke/veza

Commit graph

Author	SHA1	Message	Date
senke

1cab2a1d56

fix(middleware): persist maintenance flag via platform_settings table

The maintenance toggle lived in a package-level `bool` inside
`middleware/maintenance.go`. Flipping it via
`PUT /api/v1/admin/maintenance` only updated the pod handling that
request; the other N-1 pods stayed open for traffic. In practice,
in-progress deploys and incident playbooks silently failed to put the
fleet into maintenance.

New storage:

  * Migration `976_platform_settings.sql` adds a typed key/value table
    (`value_bool` / `value_text` to avoid string parsing in the hot
    path) and seeds `maintenance_mode=false`. Idempotent on re-run.
  * `middleware/maintenance.go` rewritten around a `maintenanceState`
    with a 10s TTL cache. `InitMaintenanceMode(db, logger)` primes the
    cache at boot; `MaintenanceModeEnabled()` refreshes lazily when the
    next request lands after the TTL. Startup `MAINTENANCE_MODE` env is
    still honoured for fresh pods.
  * `router.go` calls `InitMaintenanceMode` before applying the
    `MaintenanceGin()` middleware so the first request sees DB truth.
  * `PUT /api/v1/admin/maintenance` in `routes_core.go` now does an
    `INSERT ... ON CONFLICT DO UPDATE` on the table *before* the
    in-memory setter, so the flip survives restarts and propagates to
    every pod within ~10s (one TTL window).

Tests: `TestMaintenanceGin_DBBacked` flips the DB row, waits past a
TTL shortened for the test, and asserts the cache picked up the change.
All four pre-existing tests are preserved (`Disabled`,
`Enabled_Returns503`, `HealthExempt`, `AdminExempt`).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-16 14:57:06 +02:00

1 commit