veza/veza-backend-api/internal/workers
senke 9cdfc6d898 fix(backend): J4 — GDPR-compliant hard delete with Redis and ES cleanup
Closes TODO(HIGH-007). When the hard-delete worker anonymizes a user past
their recovery deadline, it now also cleans the user's residual data from
Redis and Elasticsearch, not just PostgreSQL. Without this, a user who
invoked their right to erasure would still appear in cached feed/profile
responses and in ES search results until the next reindex cycle.

Worker changes (internal/workers/hard_delete_worker.go):

  WithRedis / WithElasticsearch builder methods inject the clients. Both
  are optional: if either is nil (feature disabled or unreachable), the
  corresponding cleanup is skipped with a debug log and the worker keeps
  going. Partial progress beats panic.
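
  A minimal sketch of the optional-client builder pattern described above.
  The client types, the constructor, and the cleanup method are hypothetical
  stand-ins, not the code in hard_delete_worker.go; only the WithRedis /
  WithElasticsearch names come from this commit.

```go
package main

import "log"

// RedisClient and ESClient are hypothetical placeholders for the real client types.
type RedisClient struct{}
type ESClient struct{}

// HardDeleteWorker is a simplified stand-in for the worker; both clients are optional.
type HardDeleteWorker struct {
	redis *RedisClient
	es    *ESClient
}

// NewHardDeleteWorker is an assumed constructor name used for illustration.
func NewHardDeleteWorker() *HardDeleteWorker { return &HardDeleteWorker{} }

// WithRedis injects the Redis client; passing nil leaves Redis cleanup disabled.
func (w *HardDeleteWorker) WithRedis(c *RedisClient) *HardDeleteWorker {
	w.redis = c
	return w
}

// WithElasticsearch injects the ES client; passing nil leaves ES cleanup disabled.
func (w *HardDeleteWorker) WithElasticsearch(c *ESClient) *HardDeleteWorker {
	w.es = c
	return w
}

// cleanup reports which cleanup steps ran. A missing client is logged at
// debug level and skipped so the remaining steps still run.
func (w *HardDeleteWorker) cleanup(userID string) []string {
	var ran []string
	if w.redis == nil {
		log.Printf("debug: no redis client, skipping cache cleanup for user %s", userID)
	} else {
		ran = append(ran, "redis") // the real worker would call cleanRedisKeys here
	}
	if w.es == nil {
		log.Printf("debug: no elasticsearch client, skipping index cleanup for user %s", userID)
	} else {
		ran = append(ran, "elasticsearch") // the real worker would call cleanESDocs here
	}
	return ran
}
```

  With this shape, NewHardDeleteWorker().WithRedis(client) alone yields a
  Redis-only worker, which matches the startup behaviour when ES is disabled.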

  cleanRedisKeys uses SCAN with a cursor loop (COUNT 100), NEVER KEYS —
  KEYS would block the Redis server on multi-million-key deployments.
  Pattern is user:{id}:*. Transient SCAN errors retry up to 3 times with
  100ms * retry linear backoff; persistent errors return without panic.
  DEL errors on a batch are logged but non-fatal so subsequent batches
  are still attempted.

  cleanESDocs hits three indices independently:
    - users index: DELETE doc by _id (the user UUID); 404 treated as
      success (already gone = desired state)
    - tracks index: DeleteByQuery with a terms filter on _id, using the
      list of track IDs collected from PostgreSQL BEFORE anonymization
    - playlists index: same pattern as tracks
  A failure on one index does not prevent the others from being tried;
  the first error is returned so the caller can log.

  Track/playlist IDs are pre-collected (collectTrackIDs, collectPlaylistIDs)
  before the UPDATE anonymization runs, because the anonymization does NOT
  cascade (no DELETE on users), so tracks and playlists rows remain with
  their creator_id / user_id intact and resolvable at query time.

Wiring (cmd/api/main.go):

  The worker now receives cfg.RedisClient directly, and an optional ES
  client built from elasticsearch.LoadConfig() + NewClient. If ES is
  disabled or unreachable at startup, the worker logs a warning and
  proceeds with Redis-only cleanup.

Tests (internal/workers/hard_delete_worker_test.go, +260 lines):

  Pure-function unit tests:
    - TestUUIDsToStrings
    - TestEsIndexNameFor
  Nil-client safety tests:
    - TestCleanRedisKeys_NilClientIsNoop
    - TestCleanESDocs_NilClientIsNoop
  ES mock-server tests (httptest.Server mimicking /_doc and
  /_delete_by_query endpoints with valid ES 8.11 responses):
    - TestCleanESDocs_CallsAllThreeIndices — verifies the three expected
      HTTP calls land with the right paths and request bodies containing
      the provided UUIDs
    - TestCleanESDocs_SkipsEmptyIDLists — verifies no DeleteByQuery is
      issued when the ID lists are empty
  Redis testcontainer integration test (gated by VEZA_SKIP_INTEGRATION):
    - TestCleanRedisKeys_Integration — seeds 154 keys (4 fixed + 150 bulk
      to force the SCAN loop past a single batch) plus 4 unrelated keys
      from another user / global, runs cleanRedisKeys, asserts all 154
      own keys are gone and all 4 unrelated keys remain.

Verification:
  go build ./...                                                OK
  go vet ./...                                                  OK
  VEZA_SKIP_INTEGRATION=1 go test ./internal/workers/... -short OK
  go test ./internal/workers/ -run TestCleanRedisKeys_Integration
    → testcontainers spins redis:7-alpine, test passes in 1.34s

Out of J4 scope (noted for a follow-up):
  - No "activity" ES index exists in the codebase today (the audit plan
    mentioned it as a possible target). The three real indices with user
    data — users, tracks, playlists — are all now cleaned.
  - Track artist strings (free-form) may still contain the user's
    display name as a cached value in the tracks index after this
    cleanup. Actual user-owned tracks are deleted here, but if a third
    party's track referenced the removed user in its artist field, that
    reference is not touched. Strict GDPR compliance on that edge case
    is a separate ticket.

Refs: AUDIT_REPORT.md §8.5, §10 P5, §12 item 1
2026-04-15 12:25:39 +02:00
analytics_job.go STABILISATION: phase 3–5 – API contract, tests & chat-server hardening 2025-12-06 17:21:59 +01:00
analytics_job_test.go STABILISATION: phase 3–5 – API contract, tests & chat-server hardening 2025-12-06 17:21:59 +01:00
email_job.go STABILISATION: phase 3–5 – API contract, tests & chat-server hardening 2025-12-06 17:21:59 +01:00
email_job_test.go STABILISATION: phase 3–5 – API contract, tests & chat-server hardening 2025-12-06 17:21:59 +01:00
hard_delete_worker.go fix(backend): J4 — GDPR-compliant hard delete with Redis and ES cleanup 2026-04-15 12:25:39 +02:00
hard_delete_worker_test.go fix(backend): J4 — GDPR-compliant hard delete with Redis and ES cleanup 2026-04-15 12:25:39 +02:00
hls_transcode_worker.go refonte: backend-api go first; phase 1 2025-12-12 21:34:34 -05:00
job_worker.go feat: backend, stream server & infra improvements 2026-03-18 11:36:06 +01:00
job_worker_test.go STABILISATION: phase 3–5 – API contract, tests & chat-server hardening 2025-12-06 17:21:59 +01:00
playback_analytics_worker.go refonte: backend-api go first; phase 1 2025-12-12 21:34:34 -05:00
playback_analytics_worker_test.go stabilizing veza-backend-api: phase 1 2025-12-16 11:23:49 -05:00
playback_retention_worker.go adding initial backend API (Go) 2025-12-03 20:29:37 +01:00
playback_retention_worker_test.go adding initial backend API (Go) 2025-12-03 20:29:37 +01:00
thumbnail_job.go STABILISATION: phase 3–5 – API contract, tests & chat-server hardening 2025-12-06 17:21:59 +01:00
thumbnail_job_test.go STABILISATION: phase 3–5 – API contract, tests & chat-server hardening 2025-12-06 17:21:59 +01:00
video_transcode_worker.go feat(v0.12.3): F276-F305 video upload, HLS transcoding, education tests 2026-03-11 19:20:48 +01:00
webhook_worker.go refonte: backend-api go first; phase 1 2025-12-12 21:34:34 -05:00