Production Deployment Checklist
Pre-launch verification checklist for production deployment
Purpose: Ensure all critical systems are configured and tested before launch
Estimated Time: 4-6 hours (first-time setup)
Review Cycle: Before each major production deployment
Table of Contents
- Phase 1: Infrastructure Setup
- Phase 2: Application Configuration
- Phase 3: Security & Compliance
- Phase 4: Monitoring & Alerting
- Phase 5: Backup & Disaster Recovery
- Phase 6: Performance & Load Testing
- Phase 7: Final Pre-Launch
Phase 1: Infrastructure Setup
Hetzner Cloud Servers
Production App Server (CPX32)
- Server created in Falkenstein datacenter (fsn1)
- Ubuntu 24.04 LTS installed
- SSH key configured
- Private network assigned (10.0.1.0/24)
- Floating IP assigned
- Cloud firewall applied
- Hostname set: ripplecore-app-prod
Production DB Server (CPX22)
- Server created in Falkenstein datacenter (fsn1)
- Ubuntu 24.04 LTS installed
- SSH key configured
- Private network assigned (10.0.1.0/24)
- Cloud firewall applied (database ports private only)
- Hostname set: ripplecore-db-prod
CI/CD Server (CPX11)
- Server created in Falkenstein datacenter (fsn1)
- Ubuntu 24.04 LTS installed
- SSH key configured
- Private network assigned (10.0.1.0/24)
- Floating IP assigned (optional)
- Hostname set: ripplecore-cicd-prod
Staging Server (CPX22)
- Server created in Falkenstein datacenter (fsn1)
- Ubuntu 24.04 LTS installed
- Private network assigned (10.0.2.0/24)
- Hostname set: ripplecore-staging
Network Configuration
Private Network
- Network created: ripplecore-prod-network
- Subnet: 10.0.1.0/24 (production)
- Subnet: 10.0.2.0/24 (staging)
- All servers attached to appropriate subnets
Firewall Rules
- Firewall created: ripplecore-prod-firewall
- Inbound rules configured: 80 and 443 open, 22 restricted to authorized IPs
- Outbound rules configured (allow all for updates)
- Applied to all production servers
Floating IPs
- Production app floating IP assigned
- Floating IP DNS records configured
DNS Configuration
Domain Records Configured
- app.your-domain.com → App server floating IP
- api.your-domain.com → App server floating IP
- www.your-domain.com → App server floating IP
- staging.your-domain.com → Staging server IP
- dokploy.your-domain.com → CI/CD server IP
DNS Propagation
- All records propagated (check with dig +short)
- TTL set appropriately (300s for production, 60s for staging)
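The propagation check can be scripted so each record is compared against the IP it should point to rather than eyeballed. A minimal sketch; the function names are hypothetical, and 203.0.113.10 in the usage comment is a documentation placeholder for your floating IP:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the DNS propagation check.

# dns_points_to EXPECTED_IP RESOLVED
# RESOLVED is the text printed by `dig +short` (one record per line, CNAME
# targets included). Succeeds only if EXPECTED_IP appears as a whole line.
dns_points_to() {
  local expected=$1 resolved=$2
  grep -Fxq -- "$expected" <<< "$resolved"
}

# check_dns NAME EXPECTED_IP: resolve NAME's A record and compare.
check_dns() {
  local name=$1 expected=$2 resolved
  resolved=$(dig +short "$name" A)
  if dns_points_to "$expected" "$resolved"; then
    echo "OK   $name -> $expected"
  else
    echo "FAIL $name -> ${resolved:-<no answer>} (want $expected)"
    return 1
  fi
}

# Usage (203.0.113.10 is a placeholder for the app server floating IP):
#   check_dns app.your-domain.com 203.0.113.10
#   check_dns api.your-domain.com 203.0.113.10
```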
Object Storage
Hetzner Object Storage
- Bucket created: ripplecore-backups
- Access key generated and saved securely
- Lifecycle policies configured (7/28/365 day retention)
- Versioning enabled
- s5cmd installed and configured on DB server
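The 7/28/365-day retention tiers map onto an S3 lifecycle configuration with one expiration rule per key prefix. The sketch below only generates the JSON; the postgres/weekly/ and postgres/monthly/ prefixes, the rule IDs, and the fsn1 endpoint in the usage comment are assumptions, and it assumes Hetzner's S3-compatible API accepts lifecycle rules through the standard aws CLI call. Because versioning is enabled, an Expiration rule only places delete markers on current objects; reclaiming the space held by old versions would also need a NoncurrentVersionExpiration rule.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the 7/28/365-day retention policy.
# Only postgres/daily/ appears in this checklist; the weekly and monthly
# prefixes are assumed names.

# lifecycle_rule ID PREFIX DAYS -> one S3 lifecycle rule as compact JSON
lifecycle_rule() {
  printf '{"ID":"%s","Filter":{"Prefix":"%s"},"Status":"Enabled","Expiration":{"Days":%d}}' \
    "$1" "$2" "$3"
}

# lifecycle_config -> the full lifecycle configuration document
lifecycle_config() {
  printf '{"Rules":[%s,%s,%s]}\n' \
    "$(lifecycle_rule daily-7d postgres/daily/ 7)" \
    "$(lifecycle_rule weekly-28d postgres/weekly/ 28)" \
    "$(lifecycle_rule monthly-365d postgres/monthly/ 365)"
}

# Usage (the endpoint assumes the bucket lives in fsn1):
#   lifecycle_config > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket ripplecore-backups \
#     --lifecycle-configuration file://lifecycle.json \
#     --endpoint-url https://fsn1.your-objectstorage.com
```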
Phase 2: Application Configuration
Database Setup
PostgreSQL 18
- Docker container running (ripplecore-postgres)
- Database created: ripplecore
- User created: ripplecore with secure password
- Accessible only via private network (10.0.1.3:5432)
- Migrations applied (Drizzle schema pushed)
- Test query successful:
docker exec ripplecore-postgres psql -U ripplecore -d ripplecore -c "SELECT COUNT(*) FROM users;"
Redis 7
- Docker container running (ripplecore-redis)
- AOF persistence enabled
- RDB snapshots configured (hourly)
- Accessible only via private network (10.0.1.3:6379)
- Test connection successful (Redis is password-protected, see REDIS_URL):
docker exec -e REDISCLI_AUTH="<secret>" ripplecore-redis redis-cli ping
# Expected: PONG (without the password, Redis answers NOAUTH)
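"Accessible only via private network" can be verified from the Docker side: `docker port` lists each published mapping, and anything bound to 0.0.0.0 or [::] listens on every interface. Docker-published ports also bypass host firewalls such as ufw, so the bind address matters. A hedged sketch (function names are hypothetical) that flags such binds:

```shell
#!/usr/bin/env bash
# Hypothetical check that published container ports stay on 10.0.1.0/24.

# only_private_binds < output of `docker port CONTAINER`
# Lines look like "5432/tcp -> 10.0.1.3:5432" or "5432/tcp -> [::]:5432".
# Succeeds only if every mapping binds to an address in 10.0.1.0/24.
only_private_binds() {
  local line addr bad=0
  while IFS= read -r line; do
    [[ -z $line ]] && continue
    addr=${line##*-> }   # keep "HOST:PORT"
    addr=${addr%:*}      # strip ":PORT"
    if [[ $addr != 10.0.1.* ]]; then
      echo "PUBLIC BIND: $line"
      bad=1
    fi
  done
  return "$bad"
}

# check_container_ports NAME
check_container_ports() {
  docker port "$1" | only_private_binds
}

# Usage on the DB server:
#   check_container_ports ripplecore-postgres
#   check_container_ports ripplecore-redis
```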
Application Deployment
Dokploy Installation
- Dokploy installed on CI/CD server
- Traefik reverse proxy running
- Admin account created
- HTTPS certificates configured (Let's Encrypt)
Main Application (Next.js)
- Deployed via Dokploy
- Environment variables configured
- Health endpoint accessible: /api/health
- Domain SSL working: https://app.your-domain.com
- 2 replicas running (zero-downtime deployments)
API Server (Next.js)
- Deployed via Dokploy
- Environment variables configured
- Health endpoint accessible: /api/health
- Domain SSL working: https://api.your-domain.com
Marketing Website (Next.js)
- Deployed via Dokploy
- Environment variables configured
- Health endpoint accessible: /api/health
- Domain SSL working: https://www.your-domain.com
Environment Variables
Production Environment Variables Set
Critical Variables (verify in Dokploy for each app):
DATABASE_URL=postgresql://ripplecore:<secret>@10.0.1.3:5432/ripplecore
REDIS_URL=redis://:<secret>@10.0.1.3:6379
# BETTER_AUTH_SECRET generated with: npx @better-auth/cli secret
BETTER_AUTH_SECRET=<32-char-secret>
BETTER_AUTH_URL=https://app.your-domain.com
BETTER_AUTH_TRUST_HOST=true
NODE_ENV=production
NEXT_PUBLIC_APP_URL=https://app.your-domain.com
STRICT_HEALTH_CHECK=true
ADMIN_USER_IDS=<comma-separated-user-ids>
Optional Services (if configured):
SENTRY_DSN=<sentry-dsn>
ARCJET_KEY=<arcjet-api-key>
SLACK_WEBHOOK_URL=<slack-webhook>
Secrets Validation
- All secrets generated (not default values)
- Secrets stored in 1Password vault
- No secrets committed to Git
- .env.local files gitignored
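A small pre-promotion check catches missing variables and leftover template values before the app starts. This is a sketch under stated assumptions: the required list is the critical set above, the 32-character floor mirrors the <32-char-secret> placeholder, the https:// rule for BETTER_AUTH_URL follows the listed value, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical validation of the critical production variables.

REQUIRED_VARS=(DATABASE_URL REDIS_URL BETTER_AUTH_SECRET BETTER_AUTH_URL
  NODE_ENV NEXT_PUBLIC_APP_URL)

validate_env() {
  local var value errors=0
  for var in "${REQUIRED_VARS[@]}"; do
    value=${!var:-}   # indirect lookup of the variable named in $var
    if [[ -z $value ]]; then
      echo "MISSING: $var"; errors=$((errors + 1))
    elif [[ $value == *'<'*'>'* ]]; then
      # Still contains a template value such as <secret> from this checklist
      echo "PLACEHOLDER: $var"; errors=$((errors + 1))
    fi
  done
  local secret=${BETTER_AUTH_SECRET:-}
  if (( ${#secret} < 32 )); then
    echo "WEAK: BETTER_AUTH_SECRET has ${#secret} characters (want >= 32)"
    errors=$((errors + 1))
  fi
  if [[ ${NODE_ENV:-} != production ]]; then
    echo "WRONG: NODE_ENV=${NODE_ENV:-<unset>} (want production)"
    errors=$((errors + 1))
  fi
  if [[ ${BETTER_AUTH_URL:-} != https://* ]]; then
    echo "WRONG: BETTER_AUTH_URL must start with https://"
    errors=$((errors + 1))
  fi
  (( errors == 0 ))
}

# Usage inside the app container (or after sourcing an exported env file):
#   validate_env || exit 1
```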
Phase 3: Security & Compliance
SSH Hardening
SSH Configuration (on all servers)
# /etc/ssh/sshd_config
PasswordAuthentication no
PermitRootLogin prohibit-password
PubkeyAuthentication yes
# Validate the config, then apply it (the service is named "ssh" on Ubuntu)
sudo sshd -t && sudo systemctl restart ssh
fail2ban Installed
sudo apt install fail2ban
sudo systemctl enable fail2ban
sudo systemctl start fail2ban
SSH Keys
- Only authorized team keys in ~/.ssh/authorized_keys
- No password-based authentication
- SSH access tested from authorized IPs only
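Ubuntu's sshd_config includes the drop-in files under /etc/ssh/sshd_config.d/ near the top, and sshd keeps the first value it reads for most keywords, so a drop-in can silently override the settings above. Checking the effective configuration printed by `sshd -T` avoids that trap. A minimal sketch; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical check of the effective sshd settings.

# sshd_settings_ok < output of `sshd -T` (lowercase "key value" lines)
sshd_settings_ok() {
  local key value ok=0
  while read -r key value; do
    case $key in
      passwordauthentication)
        [[ $value == no ]] || { echo "passwordauthentication=$value (want no)"; ok=1; } ;;
      pubkeyauthentication)
        [[ $value == yes ]] || { echo "pubkeyauthentication=$value (want yes)"; ok=1; } ;;
      permitrootlogin)
        # OpenSSH may print prohibit-password under its older alias without-password
        case $value in
          no|prohibit-password|without-password) ;;
          *) echo "permitrootlogin=$value (want prohibit-password or no)"; ok=1 ;;
        esac ;;
    esac
  done
  return "$ok"
}

# Usage on each server:
#   sudo sshd -t && sudo sshd -T | sshd_settings_ok
```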
Application Security
Security Headers Verified
Test with (case-insensitive, since curl prints HTTP/2 header names in lowercase):
curl -sI https://app.your-domain.com | grep -iE "^(strict-transport-security|x-frame-options|x-content-type-options|content-security-policy|permissions-policy):"
Expected headers:
- Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
- X-Frame-Options: DENY
- X-Content-Type-Options: nosniff
- Content-Security-Policy: [configured]
- Permissions-Policy: [configured]
SSL/TLS Configuration
- Let's Encrypt certificates installed
- Auto-renewal configured (Traefik handles this)
- SSL Labs grade: A or A+
- Test at: https://www.ssllabs.com/ssltest/
- HTTP → HTTPS redirect working
- HSTS preload enabled
Rate Limiting
- Arcjet API key configured (if using)
- Rate limits tested:
- Public endpoints: 10 req/10s per IP
- Authenticated: 100 req/min per user
- Admin: 50 req/min per user
# Test rate limiting (prints the HTTP status code of each request)
for i in {1..15}; do curl -s -o /dev/null -w "%{http_code}\n" https://app.your-domain.com/api/kindness; done
# Should return 429 after the limit is exceeded
Authentication Security
- better-auth configured with secure settings
- Session expiry: 8 hours (PRD requirement)
- Secure cookies enabled (useSecureCookies: true)
- Session rotation enabled (updateAge: 3600)
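The expected-header list in the security headers check can be turned into a repeatable test. This sketch (hypothetical function names) fetches the headers of a normal GET request and compares names case-insensitively, printing each missing header:

```shell
#!/usr/bin/env bash
# Hypothetical security-header check.

EXPECTED_HEADERS=(
  Strict-Transport-Security
  X-Frame-Options
  X-Content-Type-Options
  Content-Security-Policy
  Permissions-Policy
)

# missing_headers < raw header dump; prints one missing header name per line
missing_headers() {
  local headers name
  headers=$(cat)
  for name in "${EXPECTED_HEADERS[@]}"; do
    grep -qi "^${name}:" <<< "$headers" || echo "$name"
  done
}

# check_security_headers URL: non-zero exit if anything is missing
check_security_headers() {
  local missing
  # -D - dumps the response headers to stdout; the body goes to /dev/null
  missing=$(curl -sS -D - -o /dev/null "$1" | tr -d '\r' | missing_headers)
  [[ -z $missing ]] || { echo "Missing on $1:"; echo "$missing"; return 1; }
}

# Usage:
#   check_security_headers https://app.your-domain.com
```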
Database Security
Database Access
- Database NOT exposed to public internet
- Accessible only via private network (10.0.1.0/24)
- Strong password configured (>32 characters)
- Audit logging enabled (PostgreSQL logs queries)
Multi-Tenant Isolation
- All queries filter by organizationId
- Cache keys include organization scope
- Session includes activeOrganizationId
- Test cross-tenant access (should be blocked)
# Verify organization isolation
curl https://app.your-domain.com/api/kindness \
  -H "Authorization: Bearer <org1-token>"
# Should only return org1 data, not org2
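The cross-tenant test is easier to judge when the response is checked mechanically for foreign organization IDs. A sketch, assuming the endpoint returns JSON records with an "organizationId" string field serialized without spaces around the colon (as JSON.stringify does); ORG1_ID and ORG1_TOKEN in the usage comment are hypothetical placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical tenant-isolation assertion for JSON API responses.

# org_ids < json -> unique organizationId values, one per line
org_ids() {
  grep -o '"organizationId":"[^"]*"' | cut -d'"' -f4 | sort -u
}

# only_org EXPECTED_ORG < json
# Succeeds only if the response contains organizationId values and all of
# them equal EXPECTED_ORG.
only_org() {
  local ids
  ids=$(org_ids)
  [[ -n $ids && $ids == "$1" ]]
}

# Usage:
#   curl -sS https://app.your-domain.com/api/kindness \
#     -H "Authorization: Bearer $ORG1_TOKEN" | only_org "$ORG1_ID"
```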
Compliance
GDPR Compliance (if applicable)
- Data processing agreement with Hetzner signed
- User data stored in EU (Hetzner Falkenstein, Germany)
- Privacy policy published
- Cookie consent implemented
- Data export functionality available
- Data deletion functionality available
Security Documentation
- Security policy documented
- Incident response plan created
- Security contact published (security@your-domain.com)
Phase 4: Monitoring & Alerting
Infrastructure Monitoring
Netdata Installed (on all servers)
# Verify Netdata running
curl http://localhost:19999/api/v1/info
Netdata Alerts Configured
- Custom alerts created (/etc/netdata/health.d/custom.conf)
- Slack integration configured
- Test alert sent successfully (run as the netdata user):
sudo -u netdata /usr/libexec/netdata/plugins.d/alarm-notify.sh test
Netdata Cloud (optional)
- Servers claimed to Netdata Cloud
- Team dashboard accessible
- All servers showing in dashboard
Uptime Monitoring
UptimeRobot Configured
- 4 HTTP monitors created:
- https://app.your-domain.com/api/health
- https://api.your-domain.com/api/health
- https://www.your-domain.com/api/health
- https://staging.your-domain.com/api/health
- PostgreSQL (10.0.1.3:5432) and Redis (10.0.1.3:6379) listen only on the private network, so UptimeRobot's external probes cannot reach them; cover them through the health endpoints' database and redis checks and through Netdata instead
Alert Contacts Configured
- Email: admin@your-domain.com
- Slack: #alerts channel
- SMS: (optional, paid tier)
Public Status Page (optional)
- Status page created
- Custom domain configured
- Embedded in website footer
Error Tracking
Sentry Configured
- Sentry DSN in environment variables
- Source maps uploaded (for stack traces)
- Error alerts configured
- Performance monitoring enabled
- Test error sent successfully
# Trigger test error in app
curl https://app.your-domain.com/api/sentry-test
# Verify error appears in Sentry dashboard
Sentry Alerts
- High error rate alert (>100/hour)
- New error type alert
- Performance degradation alert (>1s endpoints)
Slack Notifications
Slack Channels Created
- #deployments: Deployment notifications
- #alerts: Critical alerts
- #ops-alerts: Infrastructure alerts
- #database-alerts: Database-specific alerts
Webhooks Configured
- Netdata → #ops-alerts
- UptimeRobot → #alerts
- Sentry → #alerts
- GitHub Actions → #deployments
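The backup scripts and other automation need a shared way to post to these channels. Slack incoming webhooks accept a JSON body with a "text" field; the helper below is a hypothetical sketch that escapes the message and posts it to SLACK_WEBHOOK_URL from the optional variables in Phase 2:

```shell
#!/usr/bin/env bash
# Hypothetical Slack notification helper for backup and deploy scripts.

# json_escape STRING -> STRING escaped for use inside a JSON string literal
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslashes first
  s=${s//\"/\\\"}      # double quotes
  s=${s//$'\n'/\\n}    # newlines
  s=${s//$'\t'/\\t}    # tabs
  printf '%s' "$s"
}

# slack_notify MESSAGE: POST {"text": MESSAGE} to the incoming webhook
slack_notify() {
  local payload
  payload=$(printf '{"text":"%s"}' "$(json_escape "$1")")
  curl -fsS -X POST -H 'Content-type: application/json' \
    --data "$payload" "${SLACK_WEBHOOK_URL:?SLACK_WEBHOOK_URL is not set}"
}

# Usage:
#   slack_notify "Daily backup completed on $(hostname)"
```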
Phase 5: Backup & Disaster Recovery
Automated Backups
Backup Scripts Deployed
- /root/scripts/backup-db.sh on DB server
- /root/scripts/test-restore.sh on DB server
- Scripts executable (chmod +x)
- Slack webhook configured in scripts
Cron Jobs Configured
crontab -l
# Expected:
# 0 3 * * * /root/scripts/backup-db.sh >> /var/log/backup.log 2>&1
# 0 4 * * 0 /root/scripts/test-restore.sh >> /var/log/backup.log 2>&1
Backup Verification
- Manual backup test successful:
/root/scripts/backup-db.sh
# Verify backup in S3
s5cmd ls s3://ripplecore-backups/postgres/daily/
- Backup size reasonable (>10MB compressed)
- Slack notification received
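The size check can be automated alongside the manual listing. This sketch assumes `s5cmd ls` prints its default DATE TIME SIZE KEY columns and that backup keys embed a sortable date, so the last line is the newest object; the function name and reading ">10MB" as 10 MiB are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical size check for the newest daily backup.

MIN_BACKUP_BYTES=$((10 * 1024 * 1024))

# latest_backup_ok < output of `s5cmd ls`
latest_backup_ok() {
  local last size key
  last=$(grep -v '^[[:space:]]*$' | tail -n 1)
  [[ -n $last ]] || { echo "No backups found"; return 1; }
  read -r _ _ size key <<< "$last"   # skip DATE and TIME columns
  if [[ ! $size =~ ^[0-9]+$ ]]; then
    echo "Could not parse size from: $last"; return 1
  fi
  if (( size < MIN_BACKUP_BYTES )); then
    echo "Latest backup $key is only $size bytes"; return 1
  fi
  echo "Latest backup $key: $size bytes"
}

# Usage on the DB server (the wildcard lists objects without DIR lines):
#   s5cmd ls 's3://ripplecore-backups/postgres/daily/*' | latest_backup_ok
```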
Restore Testing
Manual Restore Test
- Download backup from S3
- Restore to test database
- Verify data integrity
- Document restore time (should be <30 minutes)
Automated Weekly Test
- test-restore.sh executed successfully
- All integrity checks passed
- Slack notification received
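The manual restore steps and the 30-minute target can be combined into one timed run. A minimal sketch, assuming the dumps are plain SQL piped through gzip and that restoring into a throwaway database named ripplecore_restore_test (a hypothetical name) is acceptable on the DB server:

```shell
#!/usr/bin/env bash
# Hypothetical timed restore into a scratch database.
set -o pipefail   # let a failing gunzip fail the restore pipeline

RESTORE_DB=ripplecore_restore_test
RESTORE_BUDGET_SECONDS=$((30 * 60))

# Run a command inside the database container (stdin attached for psql).
pg_exec() { docker exec -i ripplecore-postgres "$@"; }

# restore_and_time DUMP.sql.gz
restore_and_time() {
  local dump=$1 start elapsed
  start=$(date +%s)
  pg_exec dropdb -U ripplecore --if-exists "$RESTORE_DB" || return 1
  pg_exec createdb -U ripplecore "$RESTORE_DB" || return 1
  gunzip -c "$dump" | pg_exec psql -U ripplecore -d "$RESTORE_DB" -q -v ON_ERROR_STOP=1 || return 1
  # Basic integrity probe, reusing the users query from Phase 2
  pg_exec psql -U ripplecore -d "$RESTORE_DB" -tAc 'SELECT COUNT(*) FROM users;' || return 1
  elapsed=$(( $(date +%s) - start ))
  echo "Restore took ${elapsed}s (budget ${RESTORE_BUDGET_SECONDS}s)"
  (( elapsed <= RESTORE_BUDGET_SECONDS ))
}

# Usage, after downloading a dump with s5cmd (file name is a placeholder):
#   restore_and_time ./ripplecore-latest.sql.gz
```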
Disaster Recovery
DR Documentation Complete
- Disaster recovery runbook created
- Emergency contacts documented
- Credentials stored in 1Password
- Team trained on DR procedures
DR Drill Scheduled
- First quarterly drill scheduled
- Drill checklist prepared
- Team notified of drill schedule
Phase 6: Performance & Load Testing
Health Endpoints
Health Check Verification
All endpoints should return 200 OK with valid JSON:
curl https://app.your-domain.com/api/health | jq
curl https://api.your-domain.com/api/health | jq
curl https://www.your-domain.com/api/health | jq
Expected response:
{
  "status": "ok",
  "timestamp": "2025-01-23T12:00:00.000Z",
  "service": "ripplecore-app",
  "version": "1.0.0",
  "environment": "production",
  "checks": {
    "database": { "status": "ok" },
    "redis": { "status": "ok" }
  }
}
Response Time Testing
API Performance
- Test major endpoints with curl or Postman
- Health endpoint: <100ms
- API endpoints: <200ms
- Page load times: <1s
# Create curl-format.txt
cat > curl-format.txt <<'EOF'
time_namelookup: %{time_namelookup}\n
time_connect: %{time_connect}\n
time_appconnect: %{time_appconnect}\n
time_pretransfer: %{time_pretransfer}\n
time_starttransfer: %{time_starttransfer}\n
time_total: %{time_total}\n
EOF
# Test response time
curl -w "@curl-format.txt" -o /dev/null -s https://app.your-domain.com/api/health
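To turn the response-time targets into pass/fail results, the `%{time_total}` value that curl reports (in seconds) can be compared with a millisecond budget. A sketch with hypothetical function names; a single sample is noisy, so repeat it a few times before drawing conclusions:

```shell
#!/usr/bin/env bash
# Hypothetical response-time budget check.

# within_budget SECONDS BUDGET_MS -> success if SECONDS * 1000 <= BUDGET_MS
within_budget() {
  awk -v t="$1" -v b="$2" 'BEGIN { exit !(t * 1000 <= b) }'
}

# time_endpoint URL BUDGET_MS
time_endpoint() {
  local t
  t=$(curl -sS -o /dev/null -w '%{time_total}' "$1") || return 1
  if within_budget "$t" "$2"; then
    echo "OK   $1 ${t}s"
  else
    echo "SLOW $1 ${t}s (budget $2 ms)"
    return 1
  fi
}

# Usage (budgets from the targets above):
#   time_endpoint https://app.your-domain.com/api/health 100
```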
Load Testing
Basic Load Test (Apache Bench)
# Test health endpoint under load
ab -n 1000 -c 50 https://app.your-domain.com/api/health
# Expected: 100% success rate, <200ms avg response time
Realistic Load Test (optional - k6 or JMeter)
- Simulate 100 concurrent users
- Test critical user journeys
- Monitor server resources during test
- Verify auto-scaling triggers (if configured)
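Apache Bench's report can be checked against the "100% success, <200ms average" expectation automatically. This sketch parses the `Failed requests`, `Non-2xx responses`, and first `Time per request ... (mean)` lines of the report; the function name is hypothetical. Note that ab counts a response whose length differs from the first one as a failure, so endpoints with variable-length bodies may need `ab -l`.

```shell
#!/usr/bin/env bash
# Hypothetical pass/fail wrapper around an Apache Bench report.

# ab_passed MAX_MEAN_MS < ab report
# Passes when there are no failed or non-2xx responses and the mean time
# per request (the first "Time per request" line) is below MAX_MEAN_MS.
ab_passed() {
  awk -v max="$1" '
    /^Failed requests:/                   { failed = $3 }
    /^Non-2xx responses:/                 { non2xx = $3 }
    /^Time per request:.*\[ms\] \(mean\)/ { mean = $4 }
    END {
      if (failed == "" || mean == "") { print "could not parse ab output"; exit 1 }
      printf "failed=%d non2xx=%d mean=%.1fms\n", failed, non2xx, mean
      exit !(failed + 0 == 0 && non2xx + 0 == 0 && mean + 0 < max + 0)
    }'
}

# Usage:
#   ab -n 1000 -c 50 https://app.your-domain.com/api/health > ab.txt
#   ab_passed 200 < ab.txt
```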
Database Performance
Connection Pool Testing
- Verify max connections not exceeded under load
- Test connection timeout behavior
- Monitor query performance
# Check active connections
docker exec ripplecore-postgres psql -U ripplecore -c "SELECT count(*) FROM pg_stat_activity;"
Slow Query Monitoring
- Enable slow query logging (queries >500ms)
- Review initial slow queries
- Add missing indexes if needed
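Both database checks can be scripted against the existing container. This sketch assumes `ripplecore` is the superuser created through the postgres image's POSTGRES_USER (ALTER SYSTEM needs superuser rights) and treats 80% of max_connections as the warning threshold, which is an assumption rather than a figure from this checklist. It counts only client backends, because pg_stat_activity also lists background processes:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the connection-pool and slow-query checks.

# Run psql inside the database container with unaligned, tuples-only output.
psql_prod() {
  docker exec ripplecore-postgres psql -U ripplecore -d ripplecore -tA "$@"
}

# connection_usage ACTIVE MAX -> integer percentage of MAX in use
connection_usage() { echo $(( $1 * 100 / $2 )); }

# check_connections: fail when client connections reach 80% (assumed threshold)
check_connections() {
  local active max pct
  active=$(psql_prod -c "SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend';") || return 1
  max=$(psql_prod -c 'SHOW max_connections;') || return 1
  pct=$(connection_usage "$active" "$max")
  echo "client connections: $active/$max (${pct}%)"
  (( pct < 80 ))
}

# enable_slow_query_log: log statements slower than 500 ms.
# ALTER SYSTEM cannot run inside a transaction block, so each statement gets
# its own -c option; psql sends separate -c commands as separate requests.
enable_slow_query_log() {
  psql_prod -c "ALTER SYSTEM SET log_min_duration_statement = '500ms';" \
            -c 'SELECT pg_reload_conf();'
}
```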
Phase 7: Final Pre-Launch
User Acceptance Testing
Critical User Flows
- User registration and email verification
- User login and logout
- Create organization
- Invite team member
- Log evidence (kindness, volunteer, donation, wellbeing)
- View analytics dashboard
- Export compliance reports
- Admin license management
Cross-Browser Testing
- Chrome (latest)
- Firefox (latest)
- Safari (latest)
- Edge (latest)
- Mobile Safari (iOS)
- Chrome Mobile (Android)
Accessibility Testing
- Keyboard navigation works
- Screen reader tested (basic)
- Color contrast checked
- Focus indicators visible
Documentation
Public Documentation
- User guide published
- API documentation published (if public API)
- Help center/FAQ published
- Privacy policy published
- Terms of service published
Internal Documentation
- Infrastructure documentation complete
- Deployment procedures documented
- Troubleshooting guides created
- Runbooks finalized
Team Readiness
On-Call Rotation
- On-call schedule created
- On-call contacts documented
- Escalation procedures defined
Team Training
- Team trained on monitoring dashboards
- Team trained on deployment process
- Team trained on incident response
- Team trained on DR procedures
Launch Checklist
Pre-Launch
- All critical bugs fixed
- All smoke tests passing
- Monitoring dashboards all green
- Backup verified in last 24 hours
- DR plan tested in last 30 days
- Security scan completed (no critical issues)
- Performance baseline established
Launch Day
- On-call team available
- Incident response plan ready
- Rollback plan prepared
- Stakeholders notified of launch
- Monitoring actively watched
Post-Launch (first 24 hours)
- Monitor error rates (should be <0.1%)
- Monitor response times (should be <200ms)
- Monitor server resources (CPU <70%, RAM <80%)
- Verify backups completed successfully
- Review user feedback
- Document any issues encountered
Verification Script
Quick verification script to check critical services:
#!/bin/bash
# verify-deployment.sh
# Run from a host on the private network (e.g. the app server). The Netdata
# check queries the local agent; the backup listing needs s5cmd credentials,
# which this checklist configures on the DB server.
FAILURES=0
fail() { echo "❌ $1"; FAILURES=$((FAILURES + 1)); }
echo "=========================================="
echo "Production Deployment Verification"
echo "=========================================="
# DNS checks
echo "✓ Checking DNS records..."
for host in app.your-domain.com api.your-domain.com www.your-domain.com; do
  dig +short "$host" | grep -q . || fail "$host does not resolve"
done
# Health checks
echo "✓ Checking health endpoints..."
curl -fsS -o /dev/null https://app.your-domain.com/api/health || fail "App health failed"
curl -fsS -o /dev/null https://api.your-domain.com/api/health || fail "API health failed"
curl -fsS -o /dev/null https://www.your-domain.com/api/health || fail "Web health failed"
# SSL checks (-servername selects the right certificate behind Traefik)
echo "✓ Checking SSL certificates..."
echo | openssl s_client -connect app.your-domain.com:443 -servername app.your-domain.com 2>/dev/null | openssl x509 -noout -dates
# Database connectivity: plain TCP probe (PostgreSQL does not speak HTTP, so curl cannot test it)
echo "✓ Checking database connectivity..."
timeout 3 bash -c '</dev/tcp/10.0.1.3/5432' || fail "Database unreachable"
# Redis connectivity: plain TCP probe
echo "✓ Checking Redis connectivity..."
timeout 3 bash -c '</dev/tcp/10.0.1.3/6379' || fail "Redis unreachable"
# Monitoring
echo "✓ Checking Netdata..."
curl -fsS -o /dev/null http://localhost:19999/api/v1/info || fail "Netdata not running"
# Backups
echo "✓ Checking latest backup..."
s5cmd ls s3://ripplecore-backups/postgres/daily/ | tail -1
echo "=========================================="
echo "Verification Complete: $FAILURES check(s) failed"
echo "=========================================="
[ "$FAILURES" -eq 0 ]
Sign-Off
Deployment Authorized By:
- Technical Lead: ____________ Date: ________
- DevOps Lead: ____________ Date: ________
- Security Lead: ____________ Date: ________
- CTO/VP Engineering: ____________ Date: ________
Document Version: 1.0
Last Updated: 2025-01-23
Next Review: Before next major production deployment