veza/infra/ansible/roles/tempo/tasks/main.yml
senke 84e92a75e2
feat(observability): OTel SDK + collector + Tempo + 4 hot path spans (W2 Day 9)
Wires distributed tracing end-to-end. Backend exports OTLP/gRPC to a
collector, which tail-samples (errors + slow always, 10% rest) and
ships to Tempo. Grafana service-map dashboard pivots on the 4
instrumented hot paths.

- internal/tracing/otlp_exporter.go : InitOTLPTracer + Provider.Shutdown,
  BatchSpanProcessor (5s/512 batch), ParentBased(TraceIDRatio) sampler,
  W3C trace-context + baggage propagators. OTEL_SDK_DISABLED=true
  short-circuits to a no-op. Failure to dial collector is non-fatal.
- cmd/api/main.go : init at boot, defer Shutdown(5s) on exit. appVersion
  ldflag-overridable for resource attributes.
- 4 hot paths instrumented :
    * handlers/auth.go::Login           → "auth.login"
    * core/track/track_upload_handler.go::InitiateChunkedUpload → "track.upload.initiate"
    * core/marketplace/service.go::ProcessPaymentWebhook → "payment.webhook"
    * handlers/search_handlers.go::Search → "search.query"
  PII guarded — email masked, query content not recorded (length only).
- infra/ansible/roles/otel_collector : pin v0.116.1 contrib build,
  systemd unit, tail-sampling config (errors + > 500ms always kept).
- infra/ansible/roles/tempo : pin v2.7.1 monolithic, local-disk backend
  (S3 deferred to v1.1), 14d retention.
- infra/ansible/playbooks/observability.yml : provisions both Incus
  containers + applies common baseline + roles in order.
- inventory/lab.yml : new groups observability, otel_collectors, tempo.
- config/grafana/dashboards/service-map.json : node graph + 4 hot-path
  span tables + collector throughput/queue panels.
- docs/ENV_VARIABLES.md §30 : 4 OTEL_* env vars documented.

Acceptance criterion (Day 9) : login → span visible in Tempo UI. Lab
deployment to validate with `ansible-playbook -i inventory/lab.yml
playbooks/observability.yml` once roles/postgres_ha is up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 01:15:11 +02:00
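
The tail-sampling rule described in the commit message (errors and slow traces always kept, roughly 10% of the rest) could be expressed with the contrib collector's `tail_sampling` processor along these lines. This is a minimal sketch, not the role's actual template: the policy names and the `decision_wait` value are assumptions, and only the 500 ms threshold and the 10% rate come from the message.

```yaml
# Hypothetical excerpt of an OpenTelemetry Collector config (contrib build).
processors:
  tail_sampling:
    # Assumed value: how long to buffer spans before deciding on a trace.
    decision_wait: 10s
    policies:
      # Keep every trace that contains a span with ERROR status.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Keep every trace slower than 500 ms.
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 500
      # Keep about 10% of all traces, which covers the remainder.
      - name: sample-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

A trace is kept if any one policy votes to sample it, so the probabilistic policy only matters for traces that are neither errored nor slow.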


# Tempo role — installs the single-binary distribution under /opt,
# renders monolithic config, sets up systemd. Idempotent.
---
- name: Ensure /opt/tempo exists
  ansible.builtin.file:
    path: /opt/tempo
    state: directory
    owner: root
    group: root
    mode: "0755"
  tags: [tempo, install]

- name: Check installed Tempo version
  ansible.builtin.stat:
    path: "/opt/tempo/tempo-{{ tempo_version }}"
  register: tempo_installed
  tags: [tempo, install]

- name: Download Tempo tarball
  ansible.builtin.get_url:
    url: "https://github.com/grafana/tempo/releases/download/v{{ tempo_version }}/tempo_{{ tempo_version }}_linux_{{ tempo_arch }}.tar.gz"
    dest: "/tmp/tempo-{{ tempo_version }}.tar.gz"
    mode: "0644"
  when: not tempo_installed.stat.exists
  tags: [tempo, install]

- name: Extract Tempo binary into versioned slot
  ansible.builtin.unarchive:
    src: "/tmp/tempo-{{ tempo_version }}.tar.gz"
    dest: /opt/tempo
    remote_src: true
    creates: "/opt/tempo/tempo-{{ tempo_version }}"
    extra_opts:
      - "--transform=s|^tempo$|tempo-{{ tempo_version }}|"
  when: not tempo_installed.stat.exists
  tags: [tempo, install]

- name: Symlink /usr/local/bin/tempo → versioned binary
  ansible.builtin.file:
    src: "/opt/tempo/tempo-{{ tempo_version }}"
    dest: /usr/local/bin/tempo
    state: link
    force: true
  notify: Restart tempo
  tags: [tempo, install]

- name: Create tempo system user
  ansible.builtin.user:
    name: tempo
    system: true
    home: "{{ tempo_storage_local_path }}"
    shell: /usr/sbin/nologin
    create_home: true
  tags: [tempo, install]

- name: Ensure storage directory ownership
  ansible.builtin.file:
    path: "{{ tempo_storage_local_path }}"
    state: directory
    owner: tempo
    group: tempo
    mode: "0755"
  tags: [tempo, install]

- name: Ensure /etc/tempo exists
  ansible.builtin.file:
    path: /etc/tempo
    state: directory
    owner: root
    group: tempo
    mode: "0750"
  tags: [tempo, config]

- name: Render tempo.yaml
  ansible.builtin.template:
    src: tempo.yaml.j2
    dest: /etc/tempo/tempo.yaml
    owner: root
    group: tempo
    mode: "0640"
  notify: Restart tempo
  tags: [tempo, config]

- name: Render systemd unit
  ansible.builtin.template:
    src: tempo.service.j2
    dest: /etc/systemd/system/tempo.service
    owner: root
    group: root
    mode: "0644"
  notify: Restart tempo
  tags: [tempo, service]

- name: Enable + start tempo
  ansible.builtin.systemd:
    name: tempo
    state: started
    enabled: true
    daemon_reload: true
  tags: [tempo, service]
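
The tasks above rely on three role variables (`tempo_version`, `tempo_arch`, `tempo_storage_local_path`) and on a `Restart tempo` handler, none of which appear in this file. A minimal sketch of what the companion files might contain follows. Only the `2.7.1` pin comes from the commit message; the architecture, the storage path, and the handler body are assumptions for illustration.

```yaml
# roles/tempo/defaults/main.yml (hypothetical values, except the version pin)
---
tempo_version: "2.7.1"                     # pinned per the commit message
tempo_arch: amd64                          # assumed; must match a release asset suffix
tempo_storage_local_path: /var/lib/tempo   # assumed local-disk backend location

# roles/tempo/handlers/main.yml (hypothetical; shown as a second YAML document)
---
- name: Restart tempo
  ansible.builtin.systemd:
    name: tempo
    state: restarted
    daemon_reload: true   # pick up a re-rendered unit file before restarting
```

Because handlers run once at the end of the play, a run that changes the symlink, the config, and the unit file together would still restart Tempo only a single time.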