# Monitoring and Logging Setup
This directory contains Kubernetes manifests for monitoring and logging infrastructure.
## Components
### Prometheus
- **Purpose**: Metrics collection and alerting
- **Port**: 9090
- **Storage**: 50Gi PVC
- **Retention**: 30 days
### Grafana
- **Purpose**: Metrics visualization and dashboards
- **Port**: 3000
- **Storage**: 10Gi PVC
- **Default User**: admin (password from secret)
### Loki
- **Purpose**: Log aggregation
- **Port**: 3100
- **Storage**: 50Gi PVC
- **Retention**: 30 days
### Promtail
- **Purpose**: Log collection agent (DaemonSet)
- **Port**: 9080
- **Collects**: Pod logs from all nodes
## Deployment
### 1. Deploy Prometheus
```bash
kubectl apply -f k8s/monitoring/prometheus-configmap.yaml
kubectl apply -f k8s/monitoring/prometheus-deployment.yaml
```
### 2. Deploy Grafana
```bash
kubectl apply -f k8s/monitoring/grafana-deployment.yaml
```
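**Note**: Set the `grafana-password` key in the `veza-secrets` secret; Grafana takes its admin password from it. The manifest generated below contains only this key, so if `veza-secrets` already holds other keys, check that applying it does not drop them: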
```bash
kubectl create secret generic veza-secrets \
  --from-literal=grafana-password=your-secure-password \
  -n veza-production \
  --dry-run=client -o yaml | kubectl apply -f -
```
### 3. Deploy Loki
```bash
kubectl apply -f k8s/monitoring/loki-deployment.yaml
```
### 4. Deploy Promtail
```bash
kubectl apply -f k8s/monitoring/promtail-deployment.yaml
```
## Access
### Prometheus
```bash
kubectl port-forward service/prometheus 9090:9090 -n veza-production
# Access at http://localhost:9090
```
### Grafana
```bash
kubectl port-forward service/grafana 3000:3000 -n veza-production
# Access at http://localhost:3000
# Log in as admin with the password stored under grafana-password in veza-secrets
```
### Loki
```bash
kubectl port-forward service/loki 3100:3100 -n veza-production
# API available at http://localhost:3100 (Loki has no web UI; /ready reports readiness)
```
## Integration with Services
All services should expose metrics at the `/metrics` endpoint. Prometheus discovers and scrapes them automatically through Kubernetes service discovery.
### Adding Metrics to Services
1. **Backend API (Go)**: Already has Prometheus metrics via `internal/metrics/prometheus.go`
2. **Chat Server (Rust)**: Already has Prometheus metrics
3. **Stream Server (Rust)**: Already has Prometheus metrics
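
To confirm that a service really serves metrics before looking at the Prometheus side, one option is to port-forward the service and count the metric families it exposes. This is a minimal sketch: `count_metric_families` is a hypothetical helper (not part of this repository), and the Service name `veza-backend-api` and port 8080 in the usage comment are assumptions to replace with the values from your manifests.

```bash
# Count metric families in Prometheus text exposition format read from stdin.
# Client libraries normally announce each family with one "# TYPE <name> <type>" line.
count_metric_families() {
  # grep -c prints 0 but exits non-zero when nothing matches; keep the exit status clean.
  grep -c '^# TYPE ' || true
}

# Usage (Service name and port 8080 are assumptions):
#   kubectl port-forward service/veza-backend-api 8080:8080 -n veza-production &
#   curl -s http://localhost:8080/metrics | count_metric_families
```

A result of 0 suggests the endpoint is missing or is not returning Prometheus-formatted output.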
### Viewing Logs in Grafana
1. Add Loki as a data source in Grafana:
   - URL: `http://loki:3100`
   - Access: Server (default)
2. Use LogQL queries:
   ```
   {namespace="veza-production", app="veza-backend-api"}
   ```
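
The same LogQL selector can also be sent to Loki's HTTP API, which is handy for checking that logs arrive before Grafana is configured. The sketch below assumes the Loki port-forward from the Access section is running on `localhost:3100`; `logql_selector` and `loki_query` are hypothetical helper names introduced here for illustration.

```bash
# Build a LogQL stream selector from a namespace and an app label value.
logql_selector() {
  printf '{namespace="%s", app="%s"}' "$1" "$2"
}

# Run a range query against Loki's query_range endpoint.
# With no start/end given, Loki queries the last hour; limit caps the returned entries.
loki_query() {
  curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "limit=20"
}

# Usage (requires the Loki port-forward):
#   loki_query "$(logql_selector veza-production veza-backend-api)"
```

An empty `result` array in the response usually means no streams match the labels, so it is worth checking the label names Promtail attaches.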
## Dashboards
Grafana will automatically provision dashboards from ConfigMaps. To add custom dashboards:
1. Create a ConfigMap with dashboard JSON
2. Mount it in the Grafana deployment
3. Grafana will auto-discover and load it
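
The steps above can be sketched as follows. The ConfigMap name `grafana-dashboard-custom`, the file name `custom.json`, and the dashboard body are placeholders, not names taken from this repository; in practice you would paste JSON exported from the Grafana UI. The sketch also assumes the Grafana deployment mounts the ConfigMap into a directory that its dashboard provider watches.

```bash
# Write a ConfigMap manifest that wraps a placeholder dashboard JSON.
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside.
cat > grafana-dashboard-custom.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-custom
  namespace: veza-production
data:
  custom.json: |
    {
      "title": "Veza Custom",
      "uid": "veza-custom",
      "panels": []
    }
EOF

# Review the file, then apply it:
#   kubectl apply -f grafana-dashboard-custom.yaml
```

Keeping the dashboard in a manifest file makes it reviewable and versioned alongside the other files in this directory.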
## Alerts
Prometheus alerting rules can be added via a ConfigMap: create rule files, mount them in the Prometheus deployment, and reference them under `rule_files` in the Prometheus configuration.
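
As an illustration, a rules ConfigMap might look like the sketch below. The ConfigMap name `prometheus-rules`, the file name, the rule group, and the 5-minute threshold are all assumptions chosen for the example; the `up == 0` expression is the common pattern for detecting a scrape target that Prometheus cannot reach.

```bash
# Write a ConfigMap manifest holding one alerting rule file.
# The quoted 'EOF' delimiter stops the shell from expanding {{ $labels... }}.
cat > prometheus-rules.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: veza-production
data:
  availability.rules.yml: |
    groups:
      - name: veza-availability
        rules:
          - alert: TargetDown
            expr: up == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
EOF

# Review the file, then apply it:
#   kubectl apply -f prometheus-rules.yaml
```

Once the file is mounted and listed under `rule_files`, the rule should appear on the Prometheus `/rules` page.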
## Troubleshooting
### Check Prometheus Targets
```bash
kubectl port-forward service/prometheus 9090:9090 -n veza-production
# Visit http://localhost:9090/targets
```
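
For a quick check without the browser, the targets are also available as JSON from the Prometheus HTTP API at `/api/v1/targets`. The sketch below counts targets reported as down; `count_down_targets` is a hypothetical helper, and its plain-text matching assumes the compact JSON that the API returns, so a JSON-aware tool such as `jq` is the safer choice if it is installed.

```bash
# Count scrape targets whose health is "down" in /api/v1/targets JSON read from stdin.
count_down_targets() {
  # grep -o prints one line per match; wc -l counts them; tr strips padding.
  grep -o '"health":"down"' | wc -l | tr -d ' '
}

# Usage (with the Prometheus port-forward running):
#   curl -s http://localhost:9090/api/v1/targets | count_down_targets
```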
### Check Promtail Logs
```bash
kubectl logs -f daemonset/promtail -n veza-production
```
### Check Loki Logs
```bash
kubectl logs -f deployment/loki -n veza-production
```
### Verify Service Discovery
```bash
kubectl get pods -n veza-production -l app=veza-backend-api
kubectl get pods -n veza-production -l app=veza-chat-server
```