veza/veza-backend-api/internal
senke 9cdfc6d898 fix(backend): J4 — GDPR-compliant hard delete with Redis and ES cleanup
Closes TODO(HIGH-007). When the hard-delete worker anonymizes a user past
their recovery deadline, it now also cleans the user's residual data from
Redis and Elasticsearch, not just PostgreSQL. Without this, a user who
invoked their right to erasure would still appear in cached feed/profile
responses and in ES search results for up to the next reindex cycle.

Worker changes (internal/workers/hard_delete_worker.go):

  WithRedis / WithElasticsearch builder methods inject the clients. Both
  are optional: if either is nil (feature disabled or unreachable), the
  corresponding cleanup is skipped with a debug log and the worker keeps
  going. Partial progress beats panic.

  cleanRedisKeys uses SCAN with a cursor loop (COUNT 100), NEVER KEYS —
  KEYS would block the Redis server on multi-million-key deployments.
  Pattern is user:{id}:*. Transient SCAN errors retry up to 3 times with
  100ms * retry linear backoff; persistent errors return without panic.
  DEL errors on a batch are logged but non-fatal so subsequent batches
  are still attempted.

  cleanESDocs hits three indices independently:
    - users index: DELETE doc by _id (the user UUID); 404 treated as
      success (already gone = desired state)
    - tracks index: DeleteByQuery with a terms filter on _id, using the
      list of track IDs collected from PostgreSQL BEFORE anonymization
    - playlists index: same pattern as tracks
  A failure on one index does not prevent the others from being tried;
  the first error is returned so the caller can log.

  Track/playlist IDs are pre-collected (collectTrackIDs, collectPlaylistIDs)
  before the UPDATE anonymization runs, because the anonymization does NOT
  cascade (no DELETE on users), so tracks and playlists rows remain with
  their creator_id / user_id intact and resolvable at query time.

Wiring (cmd/api/main.go):

  The worker now receives cfg.RedisClient directly, and an optional ES
  client built from elasticsearch.LoadConfig() + NewClient. If ES is
  disabled or unreachable at startup, the worker logs a warning and
  proceeds with Redis-only cleanup.

Tests (internal/workers/hard_delete_worker_test.go, +260 lines):

  Pure-function unit tests:
    - TestUUIDsToStrings
    - TestEsIndexNameFor
  Nil-client safety tests:
    - TestCleanRedisKeys_NilClientIsNoop
    - TestCleanESDocs_NilClientIsNoop
  ES mock-server tests (httptest.Server mimicking /_doc and
  /_delete_by_query endpoints with valid ES 8.11 responses):
    - TestCleanESDocs_CallsAllThreeIndices — verifies the three expected
      HTTP calls land with the right paths and request bodies containing
      the provided UUIDs
    - TestCleanESDocs_SkipsEmptyIDLists — verifies no DeleteByQuery is
      issued when the ID lists are empty
  Redis testcontainer integration test (gated by VEZA_SKIP_INTEGRATION):
    - TestCleanRedisKeys_Integration — seeds 154 keys (4 fixed + 150 bulk
      to force the SCAN loop past a single batch) plus 4 unrelated keys
      from another user / global, runs cleanRedisKeys, asserts all 154
      own keys are gone and all 4 unrelated keys remain.

Verification:
  go build ./...                                                OK
  go vet ./...                                                  OK
  VEZA_SKIP_INTEGRATION=1 go test ./internal/workers/... short  OK
  go test ./internal/workers/ -run TestCleanRedisKeys_Integration
    → testcontainers spins redis:7-alpine, test passes in 1.34s

Out of J4 scope (noted for a follow-up):
  - No "activity" ES index exists in the codebase today (the audit plan
    mentioned it as a possible target). The three real indices with user
    data — users, tracks, playlists — are all now cleaned.
  - Track artist strings (free-form) may still contain the user's
    display name as a cached value in the tracks index after this
    cleanup. Actual user-owned tracks are deleted here, but if a third
    party's track referenced the removed user in its artist field, that
    reference is not touched. Strict RGPD on that edge case is a
    separate ticket.

Refs: AUDIT_REPORT.md §8.5, §10 P5, §12 item 1
2026-04-15 12:25:39 +02:00
..
api ci(cache): add save-always to persist cache on job failure 2026-04-14 18:01:40 +02:00
common v0.9.2 2026-03-05 19:27:34 +01:00
config style(backend): gofmt -w on 85 files (whitespace only) 2026-04-14 12:22:14 +02:00
core style(backend): gofmt -w on 85 files (whitespace only) 2026-04-14 12:22:14 +02:00
database v0.9.4 2026-03-05 23:03:43 +01:00
dto Phase 2 stabilisation: code mort, Modal→Dialog, feature flags, tests, router split, Rust legacy 2026-02-14 17:23:32 +01:00
elasticsearch style(backend): gofmt -w on 85 files (whitespace only) 2026-04-14 12:22:14 +02:00
email STABILISATION: phase 3–5 – API contract, tests & chat-server hardening 2025-12-06 17:21:59 +01:00
errors v0.9.8 2026-03-06 19:13:16 +01:00
eventbus feat: backend — config, handlers, services, logging, migration 2026-03-23 15:46:57 +01:00
features adding initial backend API (Go) 2025-12-03 20:29:37 +01:00
handlers refactor(backend): J3 — remove 3 deprecated unused handlers 2026-04-14 18:11:07 +02:00
infrastructure v0.9.4 2026-03-05 23:03:43 +01:00
integration style(backend): gofmt -w on 85 files (whitespace only) 2026-04-14 12:22:14 +02:00
interfaces adding initial backend API (Go) 2025-12-03 20:29:37 +01:00
jobs fix(v0.12.6.1): remediate remaining 15 MEDIUM + LOW pentest findings 2026-03-12 06:13:38 +01:00
logging style(backend): gofmt -w on 85 files (whitespace only) 2026-04-14 12:22:14 +02:00
metrics v0.9.4 2026-03-05 23:03:43 +01:00
middleware style(backend): gofmt -w on 85 files (whitespace only) 2026-04-14 12:22:14 +02:00
models style(backend): gofmt -w on 85 files (whitespace only) 2026-04-14 12:22:14 +02:00
monitoring v0.9.4 2026-03-05 23:03:43 +01:00
pagination v0.9.8 2026-03-06 19:13:16 +01:00
recovery chore(v0.102): consolidate remaining changes — docs, frontend, backend 2026-02-20 13:02:12 +01:00
repositories fix(v0.12.6.1): remediate 2 CRITICAL + 10 HIGH + 1 MEDIUM pentest findings 2026-03-12 05:40:53 +01:00
repository fix(v0.12.6.1): update in-memory UserRepositoryImpl to accept context.Context 2026-03-12 05:47:47 +01:00
resilience chore: consolidate CI, E2E, backend and frontend updates 2026-02-17 16:43:21 +01:00
response fix: stabilize builds, tests, and lint across all stacks 2026-04-05 16:48:07 +02:00
security refactor(backend): replace 40 fmt.Printf calls with zap structured logging 2026-02-22 17:44:38 +01:00
services style(backend): gofmt -w on 85 files (whitespace only) 2026-04-14 12:22:14 +02:00
shutdown incus deployement fully implemented, Makefile updated and make fmt ran 2026-01-13 19:47:57 +01:00
testutils test(backend): gate testcontainers tests behind VEZA_SKIP_INTEGRATION 2026-04-14 11:45:19 +02:00
tracing incus deployement fully implemented, Makefile updated and make fmt ran 2026-01-13 19:47:57 +01:00
types feat(profile): add profile privacy toggle (B3) 2026-02-20 15:10:02 +01:00
upload [INT-015] int: Add file upload format standardization 2025-12-25 15:40:01 +01:00
utils fix(v0.12.6): apply all pentest remediations — 36 findings across 36 files 2026-03-14 00:44:46 +01:00
validators feat(v0.13.3): complete - Polish Sécurité Avancée 2026-03-13 10:09:01 +01:00
websocket style(backend): gofmt -w on 85 files (whitespace only) 2026-04-14 12:22:14 +02:00
workers fix(backend): J4 — GDPR-compliant hard delete with Redis and ES cleanup 2026-04-15 12:25:39 +02:00