Infrastructure Overview
Complete CI/CD infrastructure specification for production deployment on Hetzner Cloud VPS
RippleCore Infrastructure Documentation
Overview
This documentation provides a comprehensive, production-ready CI/CD pipeline and infrastructure architecture for deploying RippleCore (and similar multi-app monorepos) on Hetzner Cloud VPS servers.
Target Audience: DevOps engineers, system administrators, technical leads
Deployment Model: Service-oriented architecture on dedicated VPS servers
Cloud Provider: Hetzner Cloud (Germany, EU data residency)
Total Monthly Cost: €35-60/month (~$38-65 USD)
Documentation Structure
Core Architecture
Architecture - Infrastructure Architecture Specification
- Complete server infrastructure design (4 VPS servers)
- Network architecture with security groups and private networking
- Technology stack and tool justification
- Scaling strategy (vertical → horizontal)
- Cost analysis and 3-year growth projections
- Time to Read: 20 minutes
CI/CD Pipeline
CI/CD Pipeline - CI/CD Setup & Workflow Configuration
- GitHub Actions workflow (testing, building, security scanning)
- Dokploy configuration (deployment automation)
- Environment strategy (dev/preview/staging/production)
- Deployment workflows with health checks and auto-rollback
- Preview environments for pull requests
- Time to Implement: 12-16 hours (Week 2 of roadmap)
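The "health checks and auto-rollback" step works as a poll-then-decide loop. Below is a minimal, generic Python sketch of that idea. It is not Dokploy's actual mechanism. The `deploy` and `rollback` callables and any health URL you pass to `http_probe` are hypothetical placeholders, and the attempt count and interval are illustrative defaults.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if the endpoint answers HTTP 200 (urlopen raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.status == 200


def wait_for_healthy(
    probe: Callable[[], bool],
    attempts: int = 10,
    interval_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            pass  # timeouts, refused connections and HTTP errors all count as unhealthy
        if attempt < attempts - 1:
            sleep(interval_s)
    return False


def deploy_with_rollback(
    deploy: Callable[[], None],
    rollback: Callable[[], None],
    probe: Callable[[], bool],
    **poll_kwargs,
) -> str:
    """Deploy, wait for the health check, and roll back if it never passes."""
    deploy()
    if wait_for_healthy(probe, **poll_kwargs):
        return "deployed"
    rollback()
    return "rolled_back"
```

For example, you might pass `lambda: http_probe("https://app.example.com/api/health")` as the probe. That URL is hypothetical. The injectable `sleep` keeps the loop testable without real delays.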
Monitoring & Alerting
Monitoring - Monitoring & Alerting Setup Guide
- Netdata setup and custom alert configuration
- UptimeRobot configuration with 6 monitors
- Sentry integration for error tracking
- Alert routing matrix (Slack, email, SMS)
- Dashboard configuration and log management
- Time to Implement: 8-10 hours (Week 3 of roadmap)
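An alert routing matrix is a lookup from alert severity to notification channels. The Python sketch below assumes three severity levels and one particular mapping onto Slack, email, and SMS. Both are illustrative; the actual matrix is defined in the Monitoring guide.

```python
from enum import Enum


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"


# Illustrative routing matrix (assumption): higher severity fans out to more channels.
ROUTING_MATRIX: dict[Severity, tuple[str, ...]] = {
    Severity.INFO: ("slack",),
    Severity.WARNING: ("slack", "email"),
    Severity.CRITICAL: ("slack", "email", "sms"),
}


def channels_for(severity: Severity) -> list[str]:
    """Return the channels an alert of this severity should be sent to."""
    return list(ROUTING_MATRIX[severity])


def route(alerts: list[tuple[str, Severity]]) -> dict[str, list[str]]:
    """Group alert messages by channel, e.g. to send one Slack post per batch."""
    by_channel: dict[str, list[str]] = {}
    for message, severity in alerts:
        for channel in channels_for(severity):
            by_channel.setdefault(channel, []).append(message)
    return by_channel
```

Keeping the matrix as data rather than branching logic makes it easy to review alongside the table in the Monitoring guide.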
Backup & Recovery
Backup & Recovery - Backup & Disaster Recovery Guide
- Automated backup system (PostgreSQL + Redis)
- Grandfather-Father-Son retention strategy
- Restore procedures (full, partial, selective)
- Disaster recovery scenarios with step-by-step runbooks
- Weekly automated backup validation
- Time to Implement: 6-8 hours (Week 4 of roadmap)
Deployment Checklist
Deployment Checklist - Pre-Launch Verification Checklist
- 7-phase deployment checklist (infrastructure → launch)
- Security and compliance verification
- Performance and load testing procedures
- User acceptance testing guidelines
- Team readiness and sign-off procedures
- Time to Complete: 4-6 hours (first-time setup)
Supporting Files
Automation Scripts
scripts/backup-db.sh - Automated PostgreSQL Backup
- Daily backups to Hetzner Object Storage (S3-compatible)
- Grandfather-Father-Son retention (7 days, 4 weeks, 12 months)
- Checksum verification and compression
- Slack notifications on success/failure
- Deploy to: Database server (`/root/scripts/backup-db.sh`)
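The Grandfather-Father-Son rule (7 daily, 4 weekly, 12 monthly) determines which backups survive each pruning run. Below is a minimal Python sketch of that selection. It assumes one dated backup per day and keeps the newest backup in each day, ISO week, and calendar-month bucket. It is an illustration of the technique, not the logic inside `backup-db.sh`, which is not reproduced here.

```python
from collections.abc import Callable, Hashable
from datetime import date


def gfs_keep(
    backups: list[date], daily: int = 7, weekly: int = 4, monthly: int = 12
) -> set[date]:
    """Return the backup dates to retain under Grandfather-Father-Son rotation."""
    newest_first = sorted(set(backups), reverse=True)
    keep: set[date] = set()

    def keep_newest_per_bucket(bucket: Callable[[date], Hashable], limit: int) -> None:
        seen: set[Hashable] = set()
        for d in newest_first:
            key = bucket(d)
            if key in seen:
                continue  # a newer backup already represents this bucket
            if len(seen) == limit:
                break  # this tier already has enough buckets
            seen.add(key)
            keep.add(d)

    keep_newest_per_bucket(lambda d: d, daily)                     # sons: last N days
    keep_newest_per_bucket(lambda d: d.isocalendar()[:2], weekly)  # fathers: ISO (year, week)
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)   # grandfathers: months
    return keep
```

Any backup not in the returned set is a candidate for deletion from Object Storage. The tiers overlap (today's backup counts as a son, a father, and a grandfather), so fewer than 23 files are typically kept.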
scripts/test-restore.sh - Weekly Backup Validation
- Non-destructive restore testing
- Data integrity verification
- Record count comparison with production
- Automated Slack notifications
- Deploy to: Database server (`/root/scripts/test-restore.sh`)
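The checksum and record-count checks described for these two scripts reduce to small pure functions. This Python sketch assumes the per-table counts have already been collected, for example with `SELECT count(*)` against production and against the restored copy. The `tolerance` parameter is a hypothetical allowance for rows written to production after the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly large) dump file through SHA-256 without loading it into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_counts(
    production: dict[str, int], restored: dict[str, int], tolerance: float = 0.0
) -> list[str]:
    """List per-table mismatches; an empty list means the restore looks consistent.

    `tolerance` is a fraction of the production count (e.g. 0.01 = 1%) that absorbs
    rows written to production after the backup was taken.
    """
    problems: list[str] = []
    for table, expected in sorted(production.items()):
        if table not in restored:
            problems.append(f"{table}: missing from restore")
            continue
        actual = restored[table]
        if abs(expected - actual) > expected * tolerance:
            problems.append(f"{table}: production={expected} restored={actual}")
    return problems
```

Comparing the checksum recorded at backup time with `sha256_file` on the downloaded copy catches corruption in transit. The count comparison catches restores that complete but silently miss data.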
Runbooks
../runbooks/disaster-recovery.mdx - Disaster Recovery Runbook
- Emergency contact information
- Step-by-step recovery procedures for 3 scenarios:
- Complete server failure (RTO: 2 hours)
- Database corruption (RTO: 1 hour)
- Accidental data deletion (RTO: 30 minutes)
- Post-recovery checklist
- Quarterly DR drill schedule
Advanced Guides
Performance Optimization
Performance Optimization - Production Performance Tuning
- PostgreSQL configuration tuning (4GB RAM optimization)
- Redis cache optimization and monitoring
- Database query optimization and indexing strategies
- Application layer optimization (Next.js, React)
- Network & CDN optimization (Cloudflare integration)
- Server resource tuning (CPU, RAM, disk I/O)
- Performance testing and profiling
- Target: <200ms API response times, 99.5% uptime
Security Hardening
Security Hardening - Advanced Security Measures
- Server hardening (kernel, SSH, fail2ban, Docker)
- Network security (firewall, DDoS protection, segmentation)
- Application security (CSP, input validation, authentication)
- Database security (SSL/TLS, audit logging, access control)
- Secrets management (1Password CLI, rotation)
- Security monitoring and incident response
- GDPR compliance checklist
Kubernetes Migration
Kubernetes Migration - Future Scaling Path
- Migration decision framework (when to migrate)
- Kubernetes platform options (self-managed vs. managed)
- Application containerization for K8s
- Kubernetes manifests (deployments, services, HPA)
- Blue-green migration strategy
- Post-migration optimization and cost reduction
- Recommended: Migrate when >10 apps OR >100K users
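The migration trigger above is an OR of two thresholds. A tiny Python sketch makes the boundary explicit: strictly more than 10 apps, or strictly more than 100K users. The `Footprint` type and the function name are illustrative and not part of any tooling in this repository.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    apps: int
    users: int


def kubernetes_migration_reasons(
    fp: Footprint, max_apps: int = 10, max_users: int = 100_000
) -> list[str]:
    """Return why a migration is recommended; an empty list means stay on VPS + Dokploy."""
    reasons: list[str] = []
    if fp.apps > max_apps:
        reasons.append(f"{fp.apps} apps > {max_apps}")
    if fp.users > max_users:
        reasons.append(f"{fp.users} users > {max_users}")
    return reasons
```

Returning the reasons, rather than a bare boolean, makes the result easy to paste into a migration decision record.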
Quick Start
For Immediate Implementation
If you're ready to deploy now, follow this sequence:
- Week 1 (8-12 hours): Infrastructure Foundation
  - Read: Architecture
  - Provision Hetzner servers (4x VPS)
  - Configure networking and firewall
  - Deploy PostgreSQL + Redis
  - Deploy applications manually
- Week 2 (12-16 hours): CI/CD Automation
  - Read: CI/CD Pipeline
  - Set up GitHub Actions workflows
  - Configure Dokploy deployments
  - Test preview environments
- Week 3 (8-10 hours): Monitoring & Observability
  - Read: Monitoring
  - Install Netdata on all servers
  - Configure UptimeRobot monitors
  - Set up Slack alerting
- Week 4 (6-8 hours): Backup & DR
  - Read: Backup & Recovery
  - Deploy backup automation scripts
  - Test restore procedures
  - Schedule weekly validation
- Pre-Launch (4-6 hours): Verification
  - Complete: Deployment Checklist
  - Run verification script
  - Conduct final UAT
  - Get sign-off from stakeholders
Total Time to Production: 38-52 hours over 4-5 weeks
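The 38-52 hour total is the sum of the per-phase ranges in the sequence above. A quick check in Python:

```python
# Per-phase effort ranges (hours) from the Quick Start sequence.
PHASES = {
    "Week 1: Infrastructure Foundation": (8, 12),
    "Week 2: CI/CD Automation": (12, 16),
    "Week 3: Monitoring & Observability": (8, 10),
    "Week 4: Backup & DR": (6, 8),
    "Pre-Launch: Verification": (4, 6),
}

low = sum(lo for lo, _ in PHASES.values())
high = sum(hi for _, hi in PHASES.values())
print(f"Total: {low}-{high} hours")  # Total: 38-52 hours
```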
For Initial Assessment
If you're evaluating this approach, start here:
- Read Architecture (20 minutes)
  - Architecture - Understand server layout and costs
- Review Deployment Checklist (15 minutes)
  - Deployment Checklist - See what's required
- Evaluate CI/CD Pipeline (20 minutes)
  - CI/CD Pipeline - Assess automation approach
Total Assessment Time: ~1 hour
Key Features
Production-Ready
- Zero-downtime deployments with rolling updates
- Automated health checks and rollback
- Comprehensive monitoring and alerting
- Disaster recovery tested procedures
Cost-Effective
- €35-60/month total infrastructure cost
- An estimated 87% cheaper than equivalent AWS infrastructure (see the cost analysis in Architecture)
- No vendor lock-in (easily migrate to other providers)
Secure by Default
- Private database network (no public exposure)
- Security headers (CSP, HSTS, X-Frame-Options)
- Rate limiting and DDoS protection
- Automated SSL/TLS certificates
Scalable
- Clear vertical scaling path (resize servers)
- Horizontal scaling strategy documented
- Load balancer integration ready
- Database read replicas support
Resilient
- Daily automated backups with validation
- 2-hour recovery time objective (RTO)
- 24-hour recovery point objective (RPO)
- Quarterly disaster recovery drills
Technology Stack
Infrastructure Layer
- Cloud Provider: Hetzner Cloud (EU data residency)
- Operating System: Ubuntu 24.04 LTS
- Container Runtime: Docker 27.x
- Reverse Proxy: Traefik 3.x
- Deployment Platform: Dokploy (self-hosted)
Application Layer
- Framework: Next.js 16 + React 19
- Database: PostgreSQL 18
- Cache: Redis 7
- ORM: Drizzle (type-safe)
- Auth: better-auth
CI/CD Layer
- CI Platform: GitHub Actions (free tier)
- CD Platform: Dokploy (self-hosted)
- Security Scanning: Trivy (container vulnerabilities)
- Artifact Registry: GitHub Container Registry
Monitoring Layer
- Infrastructure: Netdata (real-time metrics)
- Uptime: UptimeRobot (health checks)
- Errors: Sentry (application errors)
- Logs: Docker logs with rotation
Infrastructure Costs
Monthly Breakdown
| Component | Specification | Monthly | Annual |
|---|---|---|---|
| Production App | CPX32 (4 vCPU, 8GB) | €11.99 | €143.88 |
| Production DB | CPX22 (3 vCPU, 4GB) | €8.49 | €101.88 |
| CI/CD Server | CPX11 (2 vCPU, 2GB) | €4.15 | €49.80 |
| Staging Server | CPX22 (3 vCPU, 4GB) | €8.49 | €101.88 |
| Object Storage | 50GB backups | €0.25 | €3.00 |
| Floating IPs | 2x static IPs | €2.34 | €28.08 |
TOTAL: €35.71 per month, €428.52 per year
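The totals can be re-derived from the monthly column. This Python snippet uses `Decimal` so that currency sums avoid binary floating-point rounding. It uses only the figures in the table, and actual Hetzner prices may change over time.

```python
from decimal import Decimal

# Monthly prices in EUR, as listed in the table above.
MONTHLY_EUR = {
    "Production App (CPX32)": Decimal("11.99"),
    "Production DB (CPX22)": Decimal("8.49"),
    "CI/CD Server (CPX11)": Decimal("4.15"),
    "Staging Server (CPX22)": Decimal("8.49"),
    "Object Storage (50GB)": Decimal("0.25"),
    "Floating IPs (2x)": Decimal("2.34"),
}

monthly_total = sum(MONTHLY_EUR.values(), Decimal("0"))
annual_total = monthly_total * 12

print(f"€{monthly_total}/month, €{annual_total}/year")  # €35.71/month, €428.52/year
```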
External Services (optional):
- Netdata Cloud: Free (fewer than 5 nodes)
- UptimeRobot: Free (50 monitors)
- Sentry: Free tier or €26/mo
- GitHub Actions: Free (2,000 min/mo)
Grand Total: €35-60/month depending on usage
Performance Targets
Response Time SLAs
- Health Endpoints: <100ms
- API Endpoints: <200ms (PRD requirement)
- Page Load Time: <1s
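The targets above do not state which percentile they apply to. The Python sketch below assumes p95 over collected latency samples and uses the nearest-rank method. The category keys and the choice of p95 are assumptions; the millisecond limits come from the list.

```python
import math

# Targets from the Response Time SLAs above, in milliseconds (strict "<" limits).
SLA_TARGET_MS = {"health": 100.0, "api": 200.0, "page_load": 1000.0}


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def sla_breaches(measurements: dict[str, list[float]], pct: float = 95.0) -> dict[str, float]:
    """Return {category: observed percentile} for every category at or over its target."""
    breaches: dict[str, float] = {}
    for category, samples in measurements.items():
        observed = percentile(samples, pct)
        if observed >= SLA_TARGET_MS[category]:
            breaches[category] = observed
    return breaches
```

A percentile is used rather than a mean so that a single slow outlier neither hides nor dominates the result.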
Availability SLAs
- Uptime: 99.5% (3.6 hours/month downtime acceptable)
- RTO: 2 hours (complete recovery)
- RPO: 24 hours (daily backups)
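The 3.6-hour figure assumes a 30-day month: 720 hours × 0.5% = 3.6 hours. At this budget, one full recovery at the 2-hour RTO fits within a single month. As a reusable calculation:

```python
def downtime_budget_hours(uptime_pct: float, period_hours: float = 30 * 24) -> float:
    """Allowed downtime for an uptime target over a period (default: a 30-day month)."""
    return period_hours * (100 - uptime_pct) / 100


print(round(downtime_budget_hours(99.5), 2))            # 3.6  (hours per 30-day month)
print(round(downtime_budget_hours(99.5, 365 * 24), 2))  # 43.8 (hours per year)
```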
Capacity
- Concurrent Users: 1K-50K (medium scale)
- Applications: 3-10 apps
- Database Size: Up to 100GB (can scale)
Support & Maintenance
Regular Maintenance Tasks
Daily (automated):
- Database backups (3 AM UTC)
- Backup verification (6 AM UTC)
- Health check monitoring (continuous)
Weekly (automated):
- Backup restore testing (Sundays 4 AM)
- Dokploy configuration backup (Sundays 5 AM)
- Security updates review
Monthly (manual):
- Review monitoring dashboards
- Analyze error trends (Sentry)
- Review backup storage costs
- Update documentation
Quarterly (manual):
- Disaster recovery drill
- Security audit
- Performance benchmarking
- Cost optimization review
Troubleshooting
Common Issues
Issue: Deployment Failing
- Check: CI/CD Pipeline - Troubleshooting
- Verify: GitHub Actions logs, Dokploy deployment logs
Issue: High CPU/RAM Usage
- Check: Monitoring - Dashboards
- Analyze: Netdata metrics, identify bottleneck
- Action: Vertical scaling or optimization
Issue: Backup Failures
- Check: Backup & Recovery - Backup Monitoring
- Verify: S3 credentials, disk space, PostgreSQL accessibility
- Test: Manual backup execution
Issue: Database Connectivity
- Check: Backup & Recovery - Disaster Recovery
- Verify: Private network connectivity, PostgreSQL container running
- Test: `docker exec ripplecore-postgres psql -U ripplecore -c "SELECT 1"`
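The `docker exec` test runs on the database server itself. To check the private-network path from the app server, a plain TCP connect is a useful first step. The Python sketch below assumes PostgreSQL listens on its default port 5432. It only proves that the port is reachable and does not authenticate.

```python
import socket


def tcp_reachable(host: str, port: int = 5432, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s.

    Refused connections, unreachable networks, and timeouts all raise OSError
    subclasses, which are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

For example, run `tcp_reachable("10.0.0.3")` on the app server, where `10.0.0.3` is a hypothetical private IP for the database server. If it returns False while the `docker exec` check passes, the likely cause is the private network or firewall rather than PostgreSQL.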
Migration Guides
Migrating from Existing Infrastructure
From Vercel + PlanetScale/Supabase:
- Export database from PlanetScale/Supabase
- Import to PostgreSQL on Hetzner
- Update `DATABASE_URL` in applications
- Deploy applications to Dokploy
- Update DNS to point to Hetzner servers
- Verify functionality, then decommission old infrastructure
From AWS/GCP/Azure:
- Provision Hetzner infrastructure (parallel to existing)
- Set up replication from the existing database to the Hetzner database
- Deploy applications to Dokploy (blue-green deployment)
- Switch DNS to Hetzner (with rollback plan)
- Monitor for 48 hours, then decommission old infrastructure
Estimated Migration Time: 1-2 weeks depending on data size
Future Enhancements
Potential Improvements (Not Required for MVP)
Infrastructure:
- Multi-region deployment (EU + US for lower latency)
- Kubernetes migration (for >10 apps, >100K users)
- Managed database (Aiven, Neon) for hands-off scaling
- CDN integration (Cloudflare, Bunny CDN)
CI/CD:
- Canary deployments (gradual rollout)
- Feature flags integration (LaunchDarkly, Flagsmith)
- Automated performance regression testing
- Blue-green deployment strategy
Monitoring:
- Grafana + Loki for advanced log querying
- Prometheus for custom metrics
- Distributed tracing (Jaeger, Tempo)
- APM integration (DataDog, New Relic)
Backup & DR:
- Point-in-time recovery (WAL archiving)
- Encrypted backups (GPG)
- Multi-region backup replication
- Hourly backups (reduce RPO to 1 hour)
Contributing to Documentation
Document Maintenance:
- Review after each major deployment
- Update after infrastructure changes
- Incorporate lessons learned from incidents
- Keep examples and commands current
Improvement Process:
- Identify gaps or outdated information
- Create PR with proposed changes
- Review with team
- Update changelog
- Notify team of changes
Contact & Support
Documentation Owner: DevOps Team
Last Major Update: 2025-01-23
Next Review: Quarterly or after major changes
Internal Support:
- Slack: `#devops` for infrastructure questions
- Slack: `#incidents` for emergencies
- Email: devops@your-domain.com
External Support:
- Hetzner Support: https://console.hetzner.cloud (ticket system)
- Dokploy Discord: https://discord.gg/2tBnJ3jDJc
- GitHub Issues: For RippleCore-specific questions
Changelog
Version 1.1 (2025-01-23):
- Added Performance Optimization Guide (database tuning, caching, CDN)
- Added Security Hardening Guide (advanced security measures, GDPR compliance)
- Added Kubernetes Migration Guide (future scaling path with K8s)
- Enhanced README with advanced guides section
Version 1.0 (2025-01-23):
- Initial documentation release
- Complete infrastructure specification
- CI/CD pipeline implementation guide
- Monitoring and alerting setup
- Backup and disaster recovery procedures
- Deployment checklist and verification
- Automation scripts (backup, restore testing)
Next Version (TBD):
- Multi-region deployment guide (EU + US datacenters)
- Cost optimization case studies with real metrics
- Advanced monitoring with Grafana + Loki
- Service mesh implementation (Istio/Linkerd)
Happy Deploying!
For questions or feedback, reach out to the DevOps team.