veza/infra/coturn/README.md
senke b8eed72f96 feat(webrtc): coturn ICE config endpoint + frontend wiring + ops template (v1.0.9 item 1.2)
Closes FUNCTIONAL_AUDIT.md §4 #1: WebRTC 1:1 calls had working
signaling but no NAT traversal, so calls between two peers behind
symmetric NAT (corporate firewalls, mobile carrier CGNAT, Incus
container default networking) failed silently after the SDP exchange.

Backend:
  - GET /api/v1/config/webrtc (public) returns {iceServers: [...]}
    built from WEBRTC_STUN_URLS / WEBRTC_TURN_URLS / *_USERNAME /
    *_CREDENTIAL env vars. Half-config (URLs without creds, or vice
    versa) deliberately omits the TURN block — a half-configured TURN
    surfaces auth errors at call time instead of falling back cleanly
    to STUN-only.
  - 4 handler tests cover the matrix.

Frontend:
  - services/api/webrtcConfig.ts caches the config for the page
    lifetime and falls back to the historical hardcoded Google STUN
    if the fetch fails.
  - useWebRTC fetches at mount, hands iceServers synchronously to
    every RTCPeerConnection, exposes a {hasTurn, loaded} hint.
  - CallButton tooltip warns up-front when TURN isn't configured
    instead of letting calls time out silently.

Ops:
  - infra/coturn/turnserver.conf — annotated template with the SSRF-
    safe denied-peer-ip ranges, prometheus exporter, TLS for TURNS,
    static lt-cred-mech (REST-secret rotation deferred to v1.1).
  - infra/coturn/README.md — Incus deploy walkthrough, smoke test
    via turnutils_uclient, capacity rules of thumb.
  - docs/ENV_VARIABLES.md gains a 13bis. WebRTC ICE servers section.

Coturn deployment itself is a separate ops action — this commit lands
the plumbing so the deploy can light up the path with zero code
changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:38:42 +02:00


# coturn — Veza TURN/STUN relay (v1.0.9 item 1.2)
Deployed alongside Veza to close the **NAT traversal gap** identified in
`FUNCTIONAL_AUDIT.md` §4 #1: signaling for WebRTC 1:1 calls works (the
chat WebSocket relays the SDP offer/answer and ICE candidates), but the
media stream fails between two peers behind symmetric NAT (corporate
firewalls, mobile carrier CGNAT, Incus container default networking).
coturn provides the relay that lets media flow through a public IP when
peer-to-peer hole-punching fails.
## Topology
```
Caller (browser)              Veza backend               coturn (Incus)
│                                   │                           │
│── GET /api/v1/config/webrtc ─────▶│                           │  HTTPS
│◀── { iceServers: [...] } ─────────│                           │
│                                                               │
│── WS POST /chat ─▶ [SDP offer] ─▶ Callee                      │  WSS
│                                                               │
│── ICE candidate (UDP probe) ─▶ Callee                         │  fail
│                                                               │
│── ICE candidate (TURN relay) ────────────────────────────────▶┤  UDP 3478
│◀─────────────────────────── relay ────────────────────────────┤
│                                                               │
```
The backend never proxies media itself. coturn is the only component
that handles the relay path; backend-api just hands the client a list
of candidate ICE servers it can try.
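
To check what the endpoint actually hands out, a response body can be inspected for TURN entries. The sketch below assumes the response follows the standard `RTCIceServer` shape (`urls`, `username`, `credential`); the helper name `has_turn_entry` and the sample payloads are hypothetical and not taken from the Veza codebase.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a /api/v1/config/webrtc response body
# contains at least one URL with the turn: or turns: scheme.
has_turn_entry() {
  local body="$1"
  # Match a JSON string that starts with turn: or turns:.
  # "stun:turn.veza.fr:3478" does not match: its opening quote is followed by "stun".
  printf '%s' "$body" | grep -Eq '"turns?:[^"]+"'
}

# Sample payloads (illustrative only, not captured from a real deployment).
stun_only='{"iceServers":[{"urls":["stun:turn.veza.fr:3478"]}]}'
with_turn='{"iceServers":[{"urls":["stun:turn.veza.fr:3478"]},{"urls":["turn:turn.veza.fr:3478"],"username":"u","credential":"c"}]}'

if has_turn_entry "$with_turn"; then echo "TURN advertised"; fi
if ! has_turn_entry "$stun_only"; then echo "STUN only"; fi
```

Against a live backend the body could come from `curl -fsS "$BASE_URL/api/v1/config/webrtc"`, where `BASE_URL` is a placeholder for the backend's public origin.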
## Deploy on Incus
```bash
# 1. Create the container with host-network UDP forwarding for the
#    listener port and the relay range.
incus launch images:debian/12 turn-veza
incus config device add turn-veza turn-udp proxy \
  listen=udp:0.0.0.0:3478 connect=udp:127.0.0.1:3478
# Range proxy is more involved; the cleanest is to put the container
# directly on the host network:
#   incus config set turn-veza security.privileged true
#   incus config device add turn-veza host-network nic nictype=macvlan parent=eno1
# Or use a dedicated public IP and skip macvlan.

# 2. Install coturn.
incus exec turn-veza -- apt-get update
incus exec turn-veza -- apt-get install -y coturn

# 3. Render this directory's turnserver.conf with secrets (Ansible
#    template, sops-decrypt, or raw envsubst):
#
#   WEBRTC_TURN_PUBLIC_IP=<public_ip> \
#   WEBRTC_TURN_REALM=turn.veza.fr \
#   WEBRTC_TURN_USERNAME=<from_vault> \
#   WEBRTC_TURN_CREDENTIAL=<from_vault> \
#   envsubst < turnserver.conf | incus file push - turn-veza/etc/turnserver.conf

# 4. Drop in the TLS cert+key (Let's Encrypt or a rotated wildcard).
incus file push fullchain.pem turn-veza/etc/coturn/cert.pem
incus file push privkey.pem turn-veza/etc/coturn/key.pem

# 5. Enable + start.
incus exec turn-veza -- systemctl enable coturn
incus exec turn-veza -- systemctl start coturn
```
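
One hazard with the raw `envsubst` route in step 3: `envsubst` replaces an unset variable with an empty string without complaining, so a missing secret produces a syntactically valid but broken `turnserver.conf`. A minimal pre-flight sketch follows; the helper name `check_vars_set` is hypothetical, and the variable list simply mirrors the four names shown in step 3.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper: prints the name of every listed variable
# that is unset or empty, and returns non-zero if any were found.
check_vars_set() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable
    # whose name is stored in $name.
    if [ -z "${!name:-}" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage would be to gate the render on it, for example `check_vars_set WEBRTC_TURN_PUBLIC_IP WEBRTC_TURN_REALM WEBRTC_TURN_USERNAME WEBRTC_TURN_CREDENTIAL && envsubst < turnserver.conf | incus file push - turn-veza/etc/turnserver.conf`, so nothing is pushed when a value is missing.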
## Configure Veza backend
Set these env vars on the backend container so the SPA gets the right
ICE servers from `GET /api/v1/config/webrtc`:
```bash
WEBRTC_STUN_URLS=stun:turn.veza.fr:3478 # comma-separated
WEBRTC_TURN_URLS=turn:turn.veza.fr:3478,turns:turn.veza.fr:5349
WEBRTC_TURN_USERNAME=<same as turnserver.conf>
WEBRTC_TURN_CREDENTIAL=<same as turnserver.conf>
```
If any of the TURN vars is empty, the handler returns STUN-only and the
SPA's `useWebRTC().nat.hasTurn` reports false — the CallButton tooltip
warns the user up-front instead of letting calls time out silently.
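
The all-or-nothing rule above can be mirrored in a few lines of shell to predict, before a deploy, which mode the handler will report. This is a sketch of the rule as described in this README, not the backend's code; the function name `ice_mode` and its output labels are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the rule described above (NOT the backend implementation):
# TURN is advertised only when URLs, username and credential are all non-empty.
# The "stun-only" label assumes WEBRTC_STUN_URLS is configured separately.
ice_mode() {
  if [ -n "${WEBRTC_TURN_URLS:-}" ] \
     && [ -n "${WEBRTC_TURN_USERNAME:-}" ] \
     && [ -n "${WEBRTC_TURN_CREDENTIAL:-}" ]; then
    echo "stun+turn"
  else
    echo "stun-only"
  fi
}
```

Running `ice_mode` in the same environment as the backend container gives an early hint of whether `hasTurn` will come back true.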
## Smoke test
```bash
# From any machine outside the Incus host network:
turnutils_uclient \
  -u <WEBRTC_TURN_USERNAME> \
  -w <WEBRTC_TURN_CREDENTIAL> \
  -p 3478 turn.veza.fr
# Should succeed within ~1s. Failure modes:
#   "Cannot get a TURN allocation": listening-ip/port wrong, or NAT not forwarded
#   "401 Unauthorized":             username/credential mismatch with config
#   "BAD-REQUEST":                  realm mismatch
```
For an end-to-end test from the SPA: open `chrome://webrtc-internals` in
a separate tab, start a call, and watch for
`iceConnectionState=connected`. On symmetric-NAT networks the selected
candidate pair should be `relay`/`relay`.
## Operational notes
- Static credentials are rotated by changing `user=` in `turnserver.conf`
and reloading coturn (`systemctl reload coturn`). The backend env
vars must be updated to match in the same change window — the SPA
caches the config for the page lifetime, so rotation is invisible to
in-flight users.
- v1.1 will switch to RFC-draft REST shared-secret credentials so the
backend can mint per-user, per-call ephemeral credentials without
reloading coturn. See `ORIGIN_SECURITY_FRAMEWORK.md` (deferred).
- Capacity rule of thumb: each TURN relay session uses ~50 KB/s for
audio. A 4-vCPU coturn handles ~1000 concurrent audio sessions before
CPU saturation. Scale horizontally with a second container behind a
DNS round-robin if needed; coturn is stateless across instances.
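
The capacity rule of thumb also implies a bandwidth figure worth checking against the host's uplink. The sketch below only restates the arithmetic from the numbers above; it assumes "KB" means 1000 bytes and counts each session's stream once (a relay both receives and forwards, so the NIC carries that volume in each direction). The function name `relay_mbit_per_s` is hypothetical.

```bash
#!/usr/bin/env bash
# Relay bandwidth implied by the rule of thumb above.
# Assumption: 1 KB = 1000 bytes, so KB/s * 8 / 1000 = Mbit/s.
# Uses integer arithmetic, so results are truncated to whole Mbit/s.
relay_mbit_per_s() {
  local sessions="$1" kb_per_session="${2:-50}"
  echo $(( sessions * kb_per_session * 8 / 1000 ))
}

relay_mbit_per_s 1000   # prints 400
```

At the quoted 1000 concurrent audio sessions that is about 400 Mbit/s, so the network link may become the limit before CPU does on smaller hosts.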