veza/infra/ansible/roles/nginx_proxy_cache/defaults/main.yml
senke 66beb8ccb1
feat(infra): nginx_proxy_cache phase-1 edge cache fronting MinIO (W3+)
Self-hosted edge cache on a dedicated Incus container, sits between
clients and the MinIO EC:2 cluster. Replaces the need for an external
CDN at v1.0 traffic levels — handles thousands of concurrent listeners
on the R720, leaks zero logs to a third party.

This is the phase-1 alternative documented in the v1.0.9 CDN synthesis:
phase-1 = self-hosted Nginx, phase-2 = 2 cache nodes + GeoDNS, phase-3
= Bunny.net via the existing CDN_* config (still inert with
CDN_ENABLED=false).

- infra/ansible/roles/nginx_proxy_cache/: install nginx + curl, render
  nginx.conf with shared zone (128 MiB keys + 20 GiB disk,
  inactive=7d), render veza-cache site that proxies to the minio_nodes
  upstream pool with keepalive=32. HLS segments cached 7d via 1 MiB
  slices; .m3u8 cached 60s; everything else 1h.
- Cache key excludes Authorization / Cookie (presigned URLs only in
  v1.0). slice_range included for segments so byte-range requests
  with arbitrary offsets all hit the same cached chunks.
- proxy_cache_use_stale error timeout updating http_500..504 +
  background_update + lock — survives MinIO partial outages without
  cold-storming the origin.
- X-Cache-Status surfaced on every response so smoke tests + operators
  can verify HIT/MISS without parsing access logs.
- stub_status bound to 127.0.0.1:81/__nginx_status for the future
  prometheus nginx_exporter sidecar.
- infra/ansible/playbooks/nginx_proxy_cache.yml: provisions the
  Incus container + applies common baseline + role.
- inventory/lab.yml: new nginx_cache group.
- infra/ansible/tests/test_nginx_cache.sh: MISS→HIT roundtrip via
  X-Cache-Status, on-disk entry verification.
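
The rendered site config could look roughly like the sketch below. It is
illustrative only: the zone name `veza_cache`, upstream name `minio_pool`,
and the MinIO hostnames are placeholders, not the role's actual template
output — only the directive values mirror the defaults in this file.

```nginx
# Sketch — names and hosts are hypothetical.
proxy_cache_path /var/cache/nginx/veza levels=1:2
                 keys_zone=veza_cache:128m max_size=20g
                 inactive=7d use_temp_path=off;

upstream minio_pool {
    server minio-1.lxd:9000;
    server minio-2.lxd:9000;
    keepalive 32;
}

server {
    listen 80;
    server_name cache.veza.lxd;

    location ~ \.(ts|m4s|mp4|aac)$ {
        slice 1m;                                       # 1 MiB chunks
        proxy_cache veza_cache;
        proxy_cache_key $uri$is_args$args$slice_range;  # no Auth/Cookie
        proxy_set_header Range $slice_range;
        proxy_cache_valid 200 206 7d;
        proxy_cache_use_stale error timeout updating
                              http_500 http_502 http_503 http_504;
        proxy_cache_background_update on;
        proxy_cache_lock on;
        add_header X-Cache-Status $upstream_cache_status;
        proxy_pass http://minio_pool;
    }
}
```

Note that `proxy_cache_valid` must include 206, since the slice module
fetches and caches partial responses per chunk.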

Acceptance: smoke test reports MISS then HIT for the same URL; cache
directory carries on-disk entries.
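
The roundtrip check can be sketched as below. The object path is a
placeholder and this is not the real test_nginx_cache.sh; the network part
only runs when a cache host is passed as an argument.

```shell
#!/usr/bin/env sh
# MISS→HIT roundtrip sketch; object path is hypothetical.
set -eu

# Pull the X-Cache-Status value out of a raw header blob.
cache_status() {
  printf '%s\n' "$1" | awk -F': ' 'tolower($1) == "x-cache-status" { print $2 }' | tr -d '\r'
}

if [ "${1:-}" != "" ]; then
  url="http://$1/veza/sample/segment_00001.ts"   # placeholder object
  first=$(cache_status "$(curl -sI "$url")")
  second=$(cache_status "$(curl -sI "$url")")
  echo "first=$first second=$second"
  [ "$first" = "MISS" ] && [ "$second" = "HIT" ]
fi
```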

No backend code change — the cache is transparent. To route through it,
flip AWS_S3_ENDPOINT=http://nginx-cache.lxd:80 in the API env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 15:58:14 +02:00


# nginx_proxy_cache defaults — phase-1 edge cache (self-hosted) in
# front of the distributed MinIO cluster.
#
# Why Nginx and not Varnish: VCL is overkill for HLS in front of S3.
# Segments are content-addressed (immutable), playlists rotate every
# 60s; a plain HTTP cache with proper Cache-Control fences is
# sufficient. Nginx integrates trivially with TLS, structured logs,
# and the existing Prometheus stack via stub_status.
#
# Phase-1 scope: single cache node colocated on the R720 host
# (Incus container `nginx-cache`). Phase-2 (W3+) adds a second
# geographically-distinct cache node + GeoDNS; phase-3 only if the
# self-hosted edges aren't enough.
---
nginx_cache_root: /var/cache/nginx/veza
nginx_cache_max_size: "20g" # disk cap. R720 has plenty of space.
nginx_cache_inactive: "7d" # purge entries unused for > 7d
nginx_cache_levels: "1:2" # 16 × 256 dir fan-out, plenty for 100k objects
# Origin pool — points at the MinIO cluster. The role reads the
# groups['minio_nodes'] inventory to populate the upstream block
# automatically; override here if testing against an external bucket.
nginx_cache_minio_port: 9000
# Cache TTLs by file extension. Segments are content-addressed
# (immutable), so 7 days is safe and matches the backend's
# Cache-Control: max-age=86400, immutable header (the values here are
# the cache's own upper bound, layered on top of the origin's TTL).
nginx_cache_ttl_segment: "7d" # .ts, .m4s, .mp4, .aac
nginx_cache_ttl_playlist: "60s" # .m3u8 (live streams may regen)
nginx_cache_ttl_other: "1h" # cover art, generic objects
# Stale-on-error: if the origin times out or returns a 5xx, serve the
# stale cached version. Bounded so we don't lock viewers into a
# permanently stale view if MinIO is genuinely gone.
# Listener config. v1.0 = HTTP only on the Incus bridge; TLS
# termination lives at the public LB (HAProxy/Caddy in prod). When
# we add direct internet exposure (phase-2), tls_cert_path /
# tls_key_path go here.
nginx_cache_listen_port: 80
nginx_cache_server_name: "cache.veza.lxd"
# Worker tuning. nginx defaults to ~1 worker per core; the stub_status
# exporter parses these, so set them explicitly for graphability.
nginx_cache_worker_processes: "auto"
nginx_cache_worker_connections: 4096
# Stub-status endpoint for the prometheus nginx exporter. Bound to
# loopback only — the exporter sidecar reads it via 127.0.0.1.
nginx_cache_stub_status_path: "/__nginx_status"
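
For illustration, the upstream block rendered from groups['minio_nodes']
could look like the fragment below. The template filename and the
hostvar lookup are assumptions, not the role's actual template.

```jinja
# templates/veza-cache.conf.j2 — illustrative fragment only
upstream minio_pool {
{% for host in groups['minio_nodes'] %}
    server {{ hostvars[host]['ansible_host'] | default(host) }}:{{ nginx_cache_minio_port }};
{% endfor %}
    keepalive 32;
}
```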