veza/infra/coturn/README.md
senke b8eed72f96 feat(webrtc): coturn ICE config endpoint + frontend wiring + ops template (v1.0.9 item 1.2)
Closes FUNCTIONAL_AUDIT.md §4 #1: WebRTC 1:1 calls had working
signaling but no NAT traversal, so calls between two peers behind
symmetric NAT (corporate firewalls, mobile carrier CGNAT, Incus
container default networking) failed silently after the SDP exchange.

Backend:
  - GET /api/v1/config/webrtc (public) returns {iceServers: [...]}
    built from WEBRTC_STUN_URLS / WEBRTC_TURN_URLS / *_USERNAME /
    *_CREDENTIAL env vars. Half-config (URLs without creds, or vice
    versa) deliberately omits the TURN block: a half-configured TURN
    entry would surface auth errors at call time instead of falling
    back cleanly to STUN-only.
  - 4 handler tests cover the matrix.

Frontend:
  - services/api/webrtcConfig.ts caches the config for the page
    lifetime and falls back to the historical hardcoded Google STUN
    if the fetch fails.
  - useWebRTC fetches at mount, hands iceServers synchronously to
    every RTCPeerConnection, exposes a {hasTurn, loaded} hint.
  - CallButton tooltip warns up-front when TURN isn't configured
    instead of letting calls time out silently.

Ops:
  - infra/coturn/turnserver.conf — annotated template with the SSRF-
    safe denied-peer-ip ranges, prometheus exporter, TLS for TURNS,
    static lt-cred-mech (REST-secret rotation deferred to v1.1).
  - infra/coturn/README.md — Incus deploy walkthrough, smoke test
    via turnutils_uclient, capacity rules of thumb.
  - docs/ENV_VARIABLES.md gains a "13bis. WebRTC ICE servers" section.

Coturn deployment itself is a separate ops action — this commit lands
the plumbing so the deploy can light up the path with zero code
changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:38:42 +02:00


coturn — Veza TURN/STUN relay (v1.0.9 item 1.2)

Deployed alongside Veza to close the NAT-traversal gap identified in FUNCTIONAL_AUDIT.md §4 #1. Signaling for WebRTC 1:1 calls works (the chat WebSocket relays the SDP offer/answer and ICE candidates), but the media stream itself fails between two peers behind symmetric NAT (corporate firewalls, mobile carrier CGNAT, Incus container default networking). coturn provides a relay on a public IP so media can still flow when peer-to-peer hole punching fails.

Topology

   Caller (browser)            Veza backend             coturn (Incus)
       │                           │                         │
       │── GET /api/v1/config/webrtc ──▶                     │  HTTPS
       │◀── { iceServers: [...] } ──                         │
       │                                                     │
       │── WS /chat ─▶ [SDP offer] ─▶ Callee                 │  WSS
       │                                                     │
       │── ICE candidate (UDP probe) ─▶ Callee               │  fail
       │                                                     │
       │── ICE candidate (TURN relay) ─────────────▶─────────┤  UDP 3478
       │◀──────────────── relay ────────────────────────────┤
       │                                                     │

The backend never proxies media itself. coturn is the only component that handles the relay path; backend-api just hands the client a list of candidate ICE servers it can try.

Deploy on Incus

# 1. Create the container with host-network UDP forwarding for the
#    listener port and the relay range.
incus launch images:debian/12 turn-veza
incus config device add turn-veza turn-udp proxy \
    listen=udp:0.0.0.0:3478 connect=udp:127.0.0.1:3478
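# This proxy covers only UDP 3478. The relay port range (and TCP/TLS on
# 5349 for the turns: URL) must also reach coturn. Proxying a whole range
# is more involved, so the cleanest option is a macvlan NIC that gives the
# container its own address on the host's physical network (eno1 here):
#   incus config set turn-veza security.privileged true
#   incus config device add turn-veza host-network nic nictype=macvlan parent=eno1
# Or give the container a dedicated public IP and skip macvlan.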

# 2. Install coturn.
incus exec turn-veza -- apt-get update
incus exec turn-veza -- apt-get install -y coturn

# 3. Render this directory's turnserver.conf with secrets — Ansible
#    template OR sops-decrypt OR raw envsubst:
#
#    WEBRTC_TURN_PUBLIC_IP=<public_ip> \
#    WEBRTC_TURN_REALM=turn.veza.fr \
#    WEBRTC_TURN_USERNAME=<from_vault> \
#    WEBRTC_TURN_CREDENTIAL=<from_vault> \
#    envsubst < turnserver.conf | incus file push - turn-veza/etc/turnserver.conf

# 4. Drop in the TLS cert+key (Let's Encrypt or a rotated wildcard).
#    The key must be readable by the user the coturn service runs as.
incus file push fullchain.pem turn-veza/etc/coturn/cert.pem
incus file push privkey.pem  turn-veza/etc/coturn/key.pem

# 5. Enable + start.
incus exec turn-veza -- systemctl enable coturn
incus exec turn-veza -- systemctl start coturn
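Step 3 pipes envsubst straight into incus file push. envsubst replaces an unset variable with an empty string, and without an argument it rewrites every $NAME in the file, so a missing secret can silently produce a config with an empty credential. A minimal bash guard follows. It assumes the template's only placeholders are the four WEBRTC_TURN_* variables shown in step 3. missing_vars and render_turnserver_conf are hypothetical names, not files in this directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight for step 3 (names are illustrative, not part of the repo).
# Assumes turnserver.conf only uses these four ${WEBRTC_TURN_*} placeholders.

REQUIRED_VARS=(WEBRTC_TURN_PUBLIC_IP WEBRTC_TURN_REALM WEBRTC_TURN_USERNAME WEBRTC_TURN_CREDENTIAL)

# Print the name of every required variable that is unset or empty, one per line.
missing_vars() {
  local v
  for v in "${REQUIRED_VARS[@]}"; do
    [ -n "${!v:-}" ] || printf '%s\n' "$v"
  done
}

# render_turnserver_conf TEMPLATE: write the rendered config to stdout,
# or fail (printing nothing on stdout) if a required variable is empty.
render_turnserver_conf() {
  local missing
  missing=$(missing_vars)
  if [ -n "$missing" ]; then
    echo "refusing to render $1, empty: ${missing//$'\n'/ }" >&2
    return 1
  fi
  export "${REQUIRED_VARS[@]}"   # envsubst only sees exported variables
  # The SHELL-FORMAT argument restricts substitution to the listed variables,
  # so any other "$" in the template is left untouched.
  envsubst '${WEBRTC_TURN_PUBLIC_IP} ${WEBRTC_TURN_REALM} ${WEBRTC_TURN_USERNAME} ${WEBRTC_TURN_CREDENTIAL}' < "$1"
}
```

Rendering to a local file first (render_turnserver_conf turnserver.conf > rendered.conf && incus file push rendered.conf turn-veza/etc/turnserver.conf) ensures that a failed render never reaches the container. Delete rendered.conf afterwards, because it holds the credential.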

Configure Veza backend

Set these env vars on the backend container so the SPA gets the right ICE servers from GET /api/v1/config/webrtc:

WEBRTC_STUN_URLS=stun:turn.veza.fr:3478           # comma-separated
WEBRTC_TURN_URLS=turn:turn.veza.fr:3478,turns:turn.veza.fr:5349
WEBRTC_TURN_USERNAME=<same as turnserver.conf>
WEBRTC_TURN_CREDENTIAL=<same as turnserver.conf>

If any of the TURN vars is empty, the handler returns STUN-only and the SPA's useWebRTC().nat.hasTurn reports false — the CallButton tooltip warns the user up-front instead of letting calls time out silently.
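The half-config rule is easy to trip over when editing an env file by hand. The following hypothetical bash sketch reproduces the behavior described above so an env file can be checked before deploying. It is not the backend handler: ice_config and json_array are illustrative names, and it assumes URLs and credentials contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented rule for GET /api/v1/config/webrtc.
# NOT the backend handler; it only mirrors the behaviour described in this README.
# Assumes URLs and credentials contain no double quotes or backslashes.

# json_array "a, b" -> ["a","b"]  (comma-separated input, spaces dropped)
json_array() {
  local IFS=','
  local -a items
  local out="" item
  read -r -a items <<< "$1"
  for item in "${items[@]}"; do
    item="${item// /}"
    [ -n "$item" ] && out+="${out:+,}\"$item\""
  done
  printf '[%s]' "$out"
}

# ice_config: print {"iceServers":[...]} built from the WEBRTC_* variables.
ice_config() {
  local servers=""
  if [ -n "${WEBRTC_STUN_URLS:-}" ]; then
    servers="{\"urls\":$(json_array "$WEBRTC_STUN_URLS")}"
  fi
  # TURN is added only when URLs, username and credential are all set;
  # any half-configuration falls back to STUN-only.
  if [ -n "${WEBRTC_TURN_URLS:-}" ] && [ -n "${WEBRTC_TURN_USERNAME:-}" ] \
     && [ -n "${WEBRTC_TURN_CREDENTIAL:-}" ]; then
    servers+="${servers:+,}{\"urls\":$(json_array "$WEBRTC_TURN_URLS"),\"username\":\"$WEBRTC_TURN_USERNAME\",\"credential\":\"$WEBRTC_TURN_CREDENTIAL\"}"
  fi
  printf '{"iceServers":[%s]}\n' "$servers"
}
```

Suppose WEBRTC_STUN_URLS=stun:turn.veza.fr:3478 is set but WEBRTC_TURN_USERNAME is empty. In that case the sketch prints {"iceServers":[{"urls":["stun:turn.veza.fr:3478"]}]}, which is the STUN-only shape that makes hasTurn report false.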

Smoke test

# From any machine outside the Incus host network:
turnutils_uclient \
    -u <WEBRTC_TURN_USERNAME> \
    -w <WEBRTC_TURN_CREDENTIAL> \
    -p 3478 turn.veza.fr

# Should succeed within ~1s. Failure modes:
#   "Cannot get a TURN allocation"  — listening-ip/port wrong, or NAT not forwarded
#   "401 Unauthorized"              — username/credential mismatch with config
#   "BAD-REQUEST"                   — realm mismatch

For an end-to-end test from the SPA, open chrome://webrtc-internals in a separate tab and start a call. Check that iceConnectionState reaches connected. On symmetric-NAT networks, at least one side of the selected candidate pair should be of type relay (relay/relay when both peers need the relay).

Operational notes

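  • Static credentials are rotated by changing user= in turnserver.conf and reloading coturn (systemctl reload coturn). The backend env vars must be updated to match in the same change window. The SPA caches the config for the page lifetime, so tabs opened before the rotation keep the old credentials and will fail TURN authentication on their next call until they are reloaded. coturn accepts several user= lines, so keeping the old line alongside the new one for a while avoids that.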
  • v1.1 will switch to shared-secret credentials (the TURN REST API IETF draft, implemented in coturn by use-auth-secret / static-auth-secret) so the backend can mint per-user, per-call ephemeral credentials without reloading coturn. See ORIGIN_SECURITY_FRAMEWORK.md (deferred).
  • Capacity rule of thumb: each TURN relay session uses ~50 KB/s for audio. A 4-vCPU coturn instance handles ~1000 concurrent audio sessions before CPU saturation. If needed, scale horizontally with a second container behind DNS round-robin. Instances share no state (an allocation lives entirely on the instance that created it), so a new one can be added without coordination.
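For the v1.1 switch, the TURN REST scheme derives each credential from a secret shared between the backend and coturn (use-auth-secret plus static-auth-secret=<secret> in turnserver.conf). The username is "<expiry unix timestamp>:<user id>", and the credential is base64(HMAC-SHA1(secret, username)). The following is a minimal bash sketch of the minting side. It assumes openssl and base64 are on PATH. mint_turn_credentials is a hypothetical name, and the backend would implement the same computation in its own language.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of TURN REST-style ephemeral credentials, i.e. what
# coturn validates when started with use-auth-secret and static-auth-secret.
# Note: -hmac puts the secret on openssl's command line (visible in the
# process list while it runs); acceptable for a sketch, not for production.

# mint_turn_credentials SECRET USER_ID TTL_SECONDS
# Prints two lines:
#   username   = "<expiry unix timestamp>:<user id>"
#   credential = base64(HMAC-SHA1(SECRET, username))
mint_turn_credentials() {
  local secret=$1 user_id=$2 ttl=$3
  local expiry username credential
  expiry=$(( $(date +%s) + ttl ))
  username="${expiry}:${user_id}"
  credential=$(printf '%s' "$username" \
    | openssl dgst -sha1 -hmac "$secret" -binary \
    | base64)
  printf '%s\n%s\n' "$username" "$credential"
}
```

The SPA would receive these values as the TURN username and credential in iceServers. coturn recomputes the same HMAC from its copy of the secret, so no reload is needed per user. The credentials stop working once the embedded expiry passes.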
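The capacity rules of thumb translate into a quick sizing estimate. The hypothetical bash helper below uses this README's ~50 KB/s per session and ~1000 sessions per 4-vCPU instance as defaults, and it treats 1 KB as 1000 bytes. This README does not say whether the 50 KB/s figure already counts both relay directions, so read the bandwidth as an order of magnitude.

```shell
#!/usr/bin/env bash
# Hypothetical sizing helper based on the rules of thumb in this README.
# turn_sizing SESSIONS [KB_PER_S_PER_SESSION] [SESSIONS_PER_INSTANCE]
turn_sizing() {
  local sessions=$1
  local kb_per_session=${2:-50}            # ~50 KB/s per audio relay session
  local sessions_per_instance=${3:-1000}   # ~1000 sessions per 4-vCPU instance
  # Round up: a partially used instance still has to exist.
  local instances=$(( (sessions + sessions_per_instance - 1) / sessions_per_instance ))
  # KB/s -> kbit/s (x8) -> Mbit/s (/1000), integer arithmetic.
  local mbit=$(( sessions * kb_per_session * 8 / 1000 ))
  printf 'instances=%d relay_bandwidth_mbit_s=%d\n' "$instances" "$mbit"
}
```

For example, turn_sizing 2500 prints instances=3 relay_bandwidth_mbit_s=1000. At that point the host uplink, not CPU, is likely to be the constraint to check.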