# Data Restore Runbook
This runbook describes the procedure for restoring data from backups after data loss or corruption.
## Prerequisites
- Access to backup storage
- Database credentials
- kubectl access to cluster
- Backup file identified
## Pre-Restore Checklist
- [ ] Backup file identified and verified
- [ ] Backup integrity checked
- [ ] Restore point confirmed
- [ ] Applications stopped (to prevent writes)
- [ ] Current data backed up (if possible)
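
Part of the integrity check can be scripted. The following is a minimal sketch, not established project tooling: `verify_backup` is a hypothetical helper name, and it assumes the dump was written in pg_dump's custom format (`-F c`), which is what the restore commands in this runbook expect. It only confirms that the archive looks readable; it does not prove the data inside is complete.

```bash
# verify_backup FILE
# Hypothetical helper: basic sanity checks on a custom-format pg_dump archive.
verify_backup() {
  local file="$1"
  # The file must exist and be non-empty.
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # Custom-format archives begin with the magic bytes "PGDMP".
  if [ "$(head -c 5 "$file")" != "PGDMP" ]; then
    echo "not a custom-format dump: $file" >&2
    return 1
  fi
  # When pg_restore is installed, listing the table of contents confirms
  # that the archive header and TOC are readable. Nothing is restored.
  if command -v pg_restore >/dev/null 2>&1; then
    pg_restore --list "$file" >/dev/null || return 1
  fi
  echo "ok: $file"
}
```

For example, `verify_backup /backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump` could be run from a pod that mounts the backup volume before any application is stopped.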
## Restore Procedure
### Step 1: Identify Backup
```bash
# List available backups
kubectl get pvc postgres-backup-storage -n veza-production
# List backups in storage
kubectl run backup-lister --rm -it --image=postgres:15-alpine \
--restart=Never \
--overrides='
{
"spec": {
"containers": [{
"name": "backup-lister",
"image": "postgres:15-alpine",
"command": ["/bin/sh", "-c", "ls -lh /backups/postgres/"],
"volumeMounts": [{
"name": "backup-storage",
"mountPath": "/backups"
}]
}],
"volumes": [{
"name": "backup-storage",
"persistentVolumeClaim": {
"claimName": "postgres-backup-storage"
}
}]
}
}' \
-n veza-production
# Or from S3
aws s3 ls s3://veza-backups/postgres/ --recursive | sort
```
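When the listing is long, picking the newest dump by eye is error-prone. The sketch below shows one way to select it, assuming file names follow the `veza_db_YYYYMMDD_HHMMSS.dump` pattern used elsewhere in this runbook. `latest_backup` is a hypothetical helper, not an existing script in the repository.

```bash
# latest_backup: read file names (or paths) on stdin, print the newest dump.
# Assumes names follow the veza_db_YYYYMMDD_HHMMSS.dump pattern.
latest_backup() {
  # Strip any leading directory, keep only well-formed names, and rely on the
  # fixed-width timestamp so that lexical order equals chronological order.
  sed 's#.*/##' \
    | grep -E '^veza_db_[0-9]{8}_[0-9]{6}\.dump$' \
    | sort \
    | tail -n 1
}
```

It can be fed from either source, for example `ls /backups/postgres/ | latest_backup` inside the lister pod, or `aws s3 ls s3://veza-backups/postgres/ --recursive | awk '{print $4}' | latest_backup` for S3. Confirm that the newest dump is actually the restore point you want before using it.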
### Step 2: Stop Applications
```bash
# Scale down applications to prevent writes
# (backend-api handles chat since v0.502 merge — no separate chat-server deployment)
kubectl scale deployment veza-backend-api --replicas=0 -n veza-production
# Verify pods are stopped
kubectl get pods -n veza-production -l app=veza-backend-api
```
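Scaling to zero discards the information about how many replicas were running. One option, sketched here under the assumption that the replica count should be restored exactly as it was, is to record it before scaling down. The function names and the state-file approach are illustrative and not part of the existing tooling.

```bash
# KUBECTL can be overridden, for example to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"

# scale_down_saving DEPLOYMENT NAMESPACE STATE_FILE
# Hypothetical helper: records the current replica count, then scales to zero.
scale_down_saving() {
  local deploy="$1" ns="$2" state="$3" replicas
  replicas=$("$KUBECTL" get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.spec.replicas}') || return 1
  printf '%s\n' "$replicas" > "$state"
  "$KUBECTL" scale deployment "$deploy" --replicas=0 -n "$ns"
}

# scale_up_from DEPLOYMENT NAMESPACE STATE_FILE
# Restores the replica count recorded by scale_down_saving.
scale_up_from() {
  local deploy="$1" ns="$2" state="$3" replicas
  replicas=$(cat "$state") || return 1
  "$KUBECTL" scale deployment "$deploy" --replicas="$replicas" -n "$ns"
}
```

A possible usage is `scale_down_saving veza-backend-api veza-production /tmp/veza-replicas` here, followed by `scale_up_from veza-backend-api veza-production /tmp/veza-replicas` once the restore has been verified.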
### Step 3: Backup Current State (Optional)
```bash
# Create backup of current state before restore
JOB_NAME="postgres-backup-pre-restore-$(date +%s)"
kubectl create job --from=cronjob/postgres-backup "$JOB_NAME" \
  -n veza-production
# Wait for backup to complete (kubectl wait does not expand wildcards in resource names)
kubectl wait --for=condition=complete "job/$JOB_NAME" \
  -n veza-production \
  --timeout=600s
```
### Step 4: Restore Database
#### Full Database Restore
```bash
# Get database credentials
DB_PASSWORD=$(kubectl get secret veza-secrets -n veza-production \
-o jsonpath='{.data.database-url}' | \
base64 -d | grep -oP 'password=\K[^&]+')
# Restore database
kubectl run postgres-restore --rm -it --image=postgres:15-alpine \
--restart=Never \
--env="PGPASSWORD=$DB_PASSWORD" \
--env="POSTGRES_HOST=postgres-service" \
--env="POSTGRES_USER=veza_user" \
--env="POSTGRES_DB=veza_db" \
--overrides='
{
"spec": {
"containers": [{
"name": "postgres-restore",
"image": "postgres:15-alpine",
"command": ["/bin/sh", "-c", "pg_restore -h $POSTGRES_HOST -U $POSTGRES_USER -d $POSTGRES_DB -F c --clean --if-exists --verbose /backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump"],
"env": [
{"name": "PGPASSWORD", "value": "'"$DB_PASSWORD"'"},
{"name": "POSTGRES_HOST", "value": "postgres-service"},
{"name": "POSTGRES_USER", "value": "veza_user"},
{"name": "POSTGRES_DB", "value": "veza_db"}
],
"volumeMounts": [{
"name": "backup-storage",
"mountPath": "/backups"
}]
}],
"volumes": [{
"name": "backup-storage",
"persistentVolumeClaim": {
"claimName": "postgres-backup-storage"
}
}]
}
}' \
-n veza-production
```
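The credential lookup above assumes the `database-url` secret contains a `password=...` key. If the secret is instead stored in URI form (`postgres://user:password@host:port/db`), that pattern matches nothing and `DB_PASSWORD` ends up empty. The sketch below handles the URI case. `password_from_url` is a hypothetical helper, and which format the secret actually uses is an assumption to verify against your cluster.

```bash
# password_from_url URL
# Hypothetical helper: prints the password component of a URI such as
# postgres://user:password@host:5432/dbname. Prints nothing if none is found.
# Percent-encoded characters are returned as-is, not decoded.
password_from_url() {
  printf '%s\n' "$1" \
    | sed -nE 's#^[A-Za-z][A-Za-z0-9+.-]*://[^:/@]+:([^@/]*)@.*$#\1#p'
}
```

It could be combined with the secret lookup as `DB_PASSWORD=$(password_from_url "$(kubectl get secret veza-secrets -n veza-production -o jsonpath='{.data.database-url}' | base64 -d)")`. Check that the variable is non-empty before starting the restore pod.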
#### Restore from S3
```bash
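# Download backup from S3.
# Note: the restore pod below mounts /tmp through hostPath, which is /tmp on the
# node that runs the pod, not on your workstation. Run this download on that node.
aws s3 cp s3://veza-backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump /tmp/backup.dump
# Restore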
kubectl run postgres-restore --rm -it --image=postgres:15-alpine \
--restart=Never \
--env="PGPASSWORD=$DB_PASSWORD" \
--overrides='
{
"spec": {
"containers": [{
"name": "postgres-restore",
"image": "postgres:15-alpine",
"command": ["/bin/sh", "-c", "pg_restore -h postgres-service -U veza_user -d veza_db -F c --clean --if-exists /backups/backup.dump"],
"env": [{"name": "PGPASSWORD", "value": "'"$DB_PASSWORD"'"}],
"volumeMounts": [{
"name": "backup",
"mountPath": "/backups"
}]
}],
"volumes": [{
"name": "backup",
"hostPath": {
"path": "/tmp"
}
}]
}
}' \
-n veza-production
```
#### Point-in-Time Recovery
```bash
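# Restore to a specific timestamp using WAL archives. pg_restore cannot do this:
# it has no --recovery-target-time option. PITR needs a physical base backup
# restored into $PGDATA (PostgreSQL stopped). The WAL path below is an assumption.
cat >> "$PGDATA/postgresql.conf" <<'EOF'
restore_command = 'cp /backups/postgres/wal/%f %p'
recovery_target_time = '2025-01-01 12:00:00'
EOF
touch "$PGDATA/recovery.signal"  # then start PostgreSQL to replay WAL up to the target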
```
### Step 5: Verify Data Integrity
```bash
# Check table counts
kubectl exec -it postgres-pod -n veza-production -- \
psql -U veza_user -d veza_db -c "
SELECT
'users' as table_name, COUNT(*) as count FROM users
UNION ALL
SELECT 'tracks', COUNT(*) FROM tracks
UNION ALL
SELECT 'playlists', COUNT(*) FROM playlists;
"
# Verify specific data
kubectl exec -it postgres-pod -n veza-production -- \
psql -U veza_user -d veza_db -c "
SELECT id, username, email, created_at
FROM users
ORDER BY created_at DESC
LIMIT 10;
"
# Check database size
kubectl exec -it postgres-pod -n veza-production -- \
psql -U veza_user -d veza_db -c "
SELECT pg_size_pretty(pg_database_size('veza_db'));
"
```
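Row counts are only useful when compared against a known baseline. The following sketch compares two plain-text files of `table count` pairs and reports any mismatch. `compare_counts` and the file format are illustrative assumptions, and where the expected values come from (for example, a snapshot taken at backup time) is left to the operator.

```bash
# compare_counts EXPECTED_FILE ACTUAL_FILE
# Each file holds one "table count" pair per line. Prints every table whose
# count differs or is missing, and returns non-zero if any problem is found.
compare_counts() {
  awk '
    NR == FNR { expected[$1] = $2; next }   # first file: remember expectations
    { seen[$1] = 1 }                        # second file: note what was reported
    ($1 in expected) && expected[$1] != $2 {
      printf "%s expected=%s actual=%s\n", $1, expected[$1], $2
      bad = 1
    }
    END {
      for (t in expected) {
        if (!(t in seen)) {
          printf "%s expected=%s actual=missing\n", t, expected[t]
          bad = 1
        }
      }
      exit bad + 0
    }
  ' "$1" "$2"
}
```

The actual counts can be produced in the matching format by adding `-At -F ' '` to the `psql` invocation of the table-count query above and redirecting the output to a file.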
### Step 6: Restart Applications
```bash
# Scale up applications (backend-api handles chat since v0.502)
kubectl scale deployment veza-backend-api --replicas=3 -n veza-production
# Wait for pods to be ready
kubectl rollout status deployment/veza-backend-api -n veza-production
```
### Step 7: Verify Application Functionality
```bash
# Check application logs (press Ctrl-C to stop following)
kubectl logs -f deployment/veza-backend-api -n veza-production
# Test health endpoint
curl https://api.veza.com/health
# Test API endpoints
curl https://api.veza.com/api/v1/tracks
curl https://api.veza.com/api/v1/users/me
# Run smoke tests
# (Use your application's test suite)
```
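Immediately after scale-up the health endpoint may fail for a short time while pods warm up, so a single `curl` can give a misleading result. A small retry wrapper is one way to handle this. The sketch below is a generic helper, with `retry` as a hypothetical name, rather than something the project already ships.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For instance, `retry 10 5 curl -fsS https://api.veza.com/health` polls the health endpoint up to ten times, five seconds apart, and fails the step if it never returns a successful HTTP status.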
## Partial Restore
### Restore Specific Tables
```bash
# Restore only specific tables
pg_restore -h postgres-service -U veza_user -d veza_db \
-t users -t tracks \
/backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump
```
### Restore Specific Schema
```bash
# Restore only specific schema
pg_restore -h postgres-service -U veza_user -d veza_db \
-n public \
/backups/postgres/veza_db_YYYYMMDD_HHMMSS.dump
```
## Verification Checklist
- [ ] Backup file identified and verified
- [ ] Applications stopped
- [ ] Current state backed up (if possible)
- [ ] Database restored successfully
- [ ] Data integrity verified
- [ ] Applications restarted
- [ ] Health checks passing
- [ ] API endpoints responding
- [ ] Smoke tests passing
- [ ] Users can access platform
## Troubleshooting
### Restore Fails with Permission Error
```bash
# Check database user permissions
kubectl exec -it postgres-pod -n veza-production -- \
psql -U postgres -c "\du veza_user"
# Grant necessary permissions
kubectl exec -it postgres-pod -n veza-production -- \
psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE veza_db TO veza_user;"
```
### Restore Fails with Connection Error
```bash
# Verify database is accessible
kubectl exec -it postgres-pod -n veza-production -- \
pg_isready -U veza_user -d veza_db
# Check service endpoint (same service name the restore commands connect to)
kubectl get svc postgres-service -n veza-production
# Test connection
kubectl run test-connection --rm -it --image=postgres:15-alpine \
--restart=Never \
--env="PGPASSWORD=$DB_PASSWORD" \
-- psql -h postgres-service -U veza_user -d veza_db -c "SELECT 1;"
```
### Data Inconsistencies After Restore
```bash
# Compare record counts with expected values
# Check application logs for errors
kubectl logs -f deployment/veza-backend-api -n veza-production
# Verify foreign key constraints
kubectl exec -it postgres-pod -n veza-production -- \
psql -U veza_user -d veza_db -c "
SELECT conname, conrelid::regclass, confrelid::regclass
FROM pg_constraint
WHERE contype = 'f';
"
```
## Post-Restore Tasks
1. **Monitor Platform**
   - Watch application logs
   - Monitor error rates
   - Check performance metrics
2. **Verify Data**
   - Run data integrity checks
   - Compare with expected values
   - Test critical user flows
3. **Document Incident**
   - Document restore procedure
   - Note any issues encountered
   - Update runbook if needed
4. **Investigate Root Cause**
   - Review logs and events
   - Identify what caused data loss
   - Implement prevention measures
## References
- [Backup Strategy](../backups/README.md)
- [PostgreSQL Restore Documentation](https://www.postgresql.org/docs/current/app-pgrestore.html)