Kubernetes Migration Guide
Migration path from Docker/Dokploy to Kubernetes for large-scale deployments
When to Migrate: >10 apps, >100K users, need for auto-scaling and advanced orchestration
Estimated Effort: 2-4 weeks (preparation + migration + validation)
Cost Impact: €100-200/month (managed Kubernetes) or €60-100/month (self-managed)
Table of Contents
- Migration Decision Framework
- Pre-Migration Planning
- Kubernetes Platform Options
- Application Containerization
- Kubernetes Manifests
- Migration Strategy
- Post-Migration Optimization
Migration Decision Framework
When to Migrate to Kubernetes
Migrate When:
- Managing >10 applications
- Traffic >100K concurrent users
- Need horizontal pod autoscaling (HPA)
- Multi-region deployment required
- Complex service mesh needed
- Team has Kubernetes expertise
- CI/CD pipeline needs advanced orchestration
Don't Migrate If:
- <5 applications running
- <50K users
- Simple scaling needs (vertical scaling sufficient)
- Team lacks Kubernetes experience
- Budget constraints (<€100/month for infrastructure)
Current RippleCore Status (from ARCHITECTURE.md):
- Apps: 4 (app, api, web, docs)
- Users: 1K-50K (medium scale)
- Complexity: Moderate
- Recommendation: Stay on Docker/Dokploy for now
Migrate when: >10 apps OR >100K users OR need auto-scaling
Cost Comparison
| Approach | Monthly Cost | Pros | Cons |
|---|---|---|---|
| Current (Dokploy) | €35-60 | Simple, cost-effective, sufficient | Manual scaling, limited HA |
| Self-Managed K8s | €60-100 | Full control, customizable | Requires expertise, maintenance overhead |
| Hetzner Cloud K8s | N/A | N/A | Not available (use Rancher on VPS) |
| Managed K8s (Civo) | €90-150 | Managed control plane, easy | Less control, vendor lock-in |
| GKE/EKS/AKS | €150-300 | Enterprise features, support | Expensive, complex billing |
Recommended Path: Self-managed K8s on Hetzner VPS (cost-effective, full control)
Pre-Migration Planning
Infrastructure Readiness Checklist
Current State Audit:
- Document all running services (4 apps + DB + Redis + Traefik)
- Map environment variables for all services
- Identify persistent volumes (PostgreSQL data, Redis data)
- Document network dependencies (app → DB, app → Redis)
- Export current resource usage (CPU, RAM per service); see the sketch below
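A minimal sketch for capturing this from the current Docker host (container names are illustrative; adjust them to your Compose/Dokploy setup):
# Snapshot CPU/RAM usage per container
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" > audit-resources.txt
# Export environment variables for one service (repeat per container)
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' app > audit-env-app.txt
# List named volumes (persistent data that must be migrated)
docker volume ls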
Kubernetes Requirements:
- Minimum 3 nodes for high availability
- Each node: 4 vCPU, 8GB RAM (CPX32 or larger)
- Separate network for cluster communication
- Load balancer for ingress (Hetzner Cloud Load Balancer)
- Object storage for backups (already have S3)
Estimated Infrastructure (3-node cluster):
Control Plane + Worker 1: CPX42 (8 vCPU, 16GB) - €26.99/mo
Worker 2: CPX32 (4 vCPU, 8GB) - €11.99/mo
Worker 3: CPX32 (4 vCPU, 8GB) - €11.99/mo
Load Balancer: Hetzner Cloud LB - €5.83/mo
Object Storage: 50GB (backups) - €0.25/mo
Total: ~€57/month (vs. €36 current)
Skills & Training Preparation
Required Skills:
- Kubernetes fundamentals (pods, services, deployments)
- Helm package management
- kubectl CLI proficiency
- YAML manifest creation
- Troubleshooting Kubernetes networking
Training Resources (2-4 weeks):
- Free: Kubernetes official documentation (https://kubernetes.io/docs/)
- Free: KodeKloud Kubernetes for Beginners (https://kodekloud.com)
- Paid: Linux Foundation CKA course (~$395, see resources below)
- Hands-on: Minikube or kind for local practice
Team Readiness:
- At least 2 team members should complete training
- Practice deploying simple apps to local cluster (minikube)
- Run chaos engineering experiments (kill pods, simulate failures); a local drill is sketched below
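For example, a quick drill against a disposable kind cluster (all names here are illustrative):
# Create a throwaway local cluster
kind create cluster --name ripplecore-practice
# Deploy a simple app with three replicas
kubectl create deployment hello --image=nginx --replicas=3
kubectl expose deployment hello --port=80
# Simulate a failure: delete one pod and watch Kubernetes replace it
POD=$(kubectl get pods -l app=hello -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod "$POD"
kubectl get pods -l app=hello -w
# Clean up
kind delete cluster --name ripplecore-practice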
Kubernetes Platform Options
Option 1: Self-Managed Kubernetes (Recommended)
Tool: Rancher on Hetzner Cloud VPS
Pros:
- Full control over cluster configuration
- Cost-effective (use existing Hetzner infrastructure)
- No vendor lock-in
- Rancher provides management UI
Cons:
- Requires Kubernetes expertise
- Manual upgrades and maintenance
- Team responsible for security patches
Setup Guide:
1. Provision Servers (3 nodes):
# Via Hetzner Cloud Console
# Create 3x CPX32 servers with private network
NODE_1_IP="10.0.3.10"
NODE_2_IP="10.0.3.11"
NODE_3_IP="10.0.3.12"
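The same provisioning can be scripted with the hcloud CLI if you prefer it over the console (a sketch; server type, image, SSH key name, and location are placeholders to check against current Hetzner offerings):
# Private network for cluster traffic
hcloud network create --name k8s-net --ip-range 10.0.0.0/16
hcloud network add-subnet k8s-net --network-zone eu-central --type cloud --ip-range 10.0.3.0/24
# Three nodes attached to the private network
for i in 1 2 3; do
  hcloud server create --name "k8s-node-${i}" --type cpx32 --image ubuntu-22.04 \
    --ssh-key my-key --network k8s-net --location fsn1
done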
2. Install RKE2 (Rancher Kubernetes Engine):
# On first node (control plane + worker)
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service
# Get join token
cat /var/lib/rancher/rke2/server/node-token
# On worker nodes
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
systemctl enable rke2-agent.service
# Configure agent
mkdir -p /etc/rancher/rke2/
cat > /etc/rancher/rke2/config.yaml <<EOF
server: https://10.0.3.10:9345
token: <NODE_TOKEN>
EOF
systemctl start rke2-agent.service
3. Install kubectl and Helm:
# kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Configure kubeconfig
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes
4. Install Rancher Management:
# Add Helm repo
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update
# Create namespace
kubectl create namespace cattle-system
# Install cert-manager (for SSL)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
# Install Rancher
helm install rancher rancher-latest/rancher \
--namespace cattle-system \
--set hostname=rancher.your-domain.com \
--set bootstrapPassword=admin \
--set ingress.tls.source=letsEncrypt \
--set letsEncrypt.email=admin@your-domain.com
# Access Rancher UI
# https://rancher.your-domain.com
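Before opening the UI, it is worth confirming that cert-manager and Rancher finished rolling out (namespaces and deployment names as used above):
kubectl -n cert-manager rollout status deploy/cert-manager
kubectl -n cattle-system rollout status deploy/rancher
kubectl -n cattle-system get pods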
Option 2: Managed Kubernetes (Civo)
Pros:
- Managed control plane (no maintenance)
- Quick setup (<10 minutes)
- UK-based (EU data residency)
Cons:
- €90-150/month (more expensive)
- Vendor lock-in
- Less control over cluster config
Setup:
- Sign up at https://civo.com
- Create cluster via UI or the civo CLI (3 medium nodes); see the sketch after this list
- Download kubeconfig
- Deploy apps via kubectl/Helm
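A sketch of the CLI route (flags and node size names should be checked against current Civo documentation):
# Authenticate once with your API key
civo apikey save ripplecore <YOUR_API_KEY>
# Create a 3-node cluster and merge its kubeconfig into ~/.kube/config
civo kubernetes create ripplecore-prod --nodes 3 --size g4s.kube.medium --wait --save --merge
kubectl get nodes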
Application Containerization
Dockerfile Optimization for Kubernetes
Multi-Stage Build (reduce image size):
# apps/app/Dockerfile.k8s
# ============================================================================
# STAGE 1: Dependencies
# ============================================================================
FROM node:20-alpine AS deps
WORKDIR /app
# Copy package files
COPY package.json pnpm-lock.yaml ./
COPY .npmrc ./
# Install all dependencies (dev dependencies are needed for the build;
# the standalone output copied in the final stage only ships production code)
RUN corepack enable && \
corepack prepare pnpm@latest --activate && \
pnpm install --frozen-lockfile
# ============================================================================
# STAGE 2: Builder
# ============================================================================
FROM node:20-alpine AS builder
WORKDIR /app
# Copy dependencies from deps stage
COPY --from=deps /app/node_modules ./node_modules
# Copy source code
COPY . .
# Build application (pnpm must be enabled in this stage as well)
RUN corepack enable && \
corepack prepare pnpm@latest --activate && \
pnpm build
# ============================================================================
# STAGE 3: Runner (Final Image)
# ============================================================================
FROM node:20-alpine AS runner
WORKDIR /app
# Create non-root user
RUN addgroup --system --gid 1001 nodejs && \
adduser --system --uid 1001 nextjs
# Copy necessary files only
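# NOTE: copying .next/standalone assumes `output: 'standalone'` is enabled in the Next.js config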
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
COPY --from=builder --chown=nextjs:nodejs /app/public ./public
# Switch to non-root user
USER nextjs
EXPOSE 3000
ENV NODE_ENV=production
ENV PORT=3000
# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
CMD node -e "require('http').get('http://localhost:3000/api/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"
CMD ["node", "server.js"]Build and Push:
# Build for multiple platforms (ARM + x86)
docker buildx build --platform linux/amd64,linux/arm64 \
-t ghcr.io/your-org/ripplecore-app:latest \
-t ghcr.io/your-org/ripplecore-app:v1.0.0 \
--push \
-f apps/app/Dockerfile.k8s .
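The Deployment manifest below pulls this image and references an image pull secret named ghcr-secret. One way to create it once the namespace exists (the token is a GitHub PAT with read:packages, shown as a placeholder):
kubectl create secret docker-registry ghcr-secret \
  --namespace ripplecore-production \
  --docker-server=ghcr.io \
  --docker-username=<github-username> \
  --docker-password=<github-token>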
Kubernetes Manifests
Namespace and ConfigMap
File: k8s/base/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: ripplecore-production
labels:
name: ripplecore-production
environment: production
File: k8s/base/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: ripplecore-config
namespace: ripplecore-production
data:
NODE_ENV: "production"
NEXT_PUBLIC_APP_URL: "https://app.your-domain.com"
BETTER_AUTH_URL: "https://app.your-domain.com"
BETTER_AUTH_TRUST_HOST: "true"
Secrets Management
File: k8s/base/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
name: ripplecore-secrets
namespace: ripplecore-production
type: Opaque
stringData:
DATABASE_URL: "postgresql://ripplecore:<secret>@postgres-service:5432/ripplecore"
REDIS_URL: "redis://:<secret>@redis-service:6379"
BETTER_AUTH_SECRET: "<secret>"
SENTRY_DSN: "<sentry-dsn>"
ARCJET_KEY: "<arcjet-key>"
POSTGRES_PASSWORD: "<secret>"
Better Approach: Use Sealed Secrets or External Secrets Operator
# Install Sealed Secrets
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.24.0/controller.yaml
# Create a sealed secret (use --from-literal so the key name is preserved)
kubectl create secret generic ripplecore-secrets --dry-run=client \
--from-literal=DATABASE_URL='postgresql://...' -o yaml | \
kubeseal -o yaml > k8s/base/sealed-secrets.yaml
PostgreSQL StatefulSet
File: k8s/base/postgres-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
name: postgres-service
namespace: ripplecore-production
spec:
selector:
app: postgres
ports:
- port: 5432
targetPort: 5432
clusterIP: None # Headless service for StatefulSet
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
namespace: ripplecore-production
spec:
serviceName: postgres-service
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:18-alpine
ports:
- containerPort: 5432
env:
- name: POSTGRES_USER
value: ripplecore
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: ripplecore-secrets
key: POSTGRES_PASSWORD
- name: POSTGRES_DB
value: ripplecore
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
volumeMounts:
- name: postgres-storage
mountPath: /var/lib/postgresql/data
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "4Gi"
cpu: "2000m"
livenessProbe:
exec:
command:
- pg_isready
- -U
- ripplecore
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- pg_isready
- -U
- ripplecore
initialDelaySeconds: 5
periodSeconds: 5
volumeClaimTemplates:
- metadata:
name: postgres-storage
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: hcloud-volumes # Hetzner Cloud Volumes (requires the hcloud CSI driver in the cluster)
resources:
requests:
storage: 50Gi
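Once the StatefulSet is running, the production database can be brought over with a plain dump/restore (a sketch; hosts and credentials are placeholders):
# Dump the current production database, then stream it into the postgres-0 pod
pg_dump -h <old-db-host> -U ripplecore -d ripplecore -Fc -f ripplecore.dump
kubectl -n ripplecore-production cp ripplecore.dump postgres-0:/tmp/ripplecore.dump
kubectl -n ripplecore-production exec postgres-0 -- \
  pg_restore -U ripplecore -d ripplecore --clean --if-exists /tmp/ripplecore.dump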
Application Deployment
File: k8s/base/app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: ripplecore-app
namespace: ripplecore-production
labels:
app: ripplecore-app
spec:
replicas: 3 # Horizontal scaling
selector:
matchLabels:
app: ripplecore-app
template:
metadata:
labels:
app: ripplecore-app
spec:
containers:
- name: app
image: ghcr.io/your-org/ripplecore-app:latest
imagePullPolicy: Always
ports:
- containerPort: 3000
envFrom:
- configMapRef:
name: ripplecore-config
- secretRef:
name: ripplecore-secrets
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /api/health
port: 3000
initialDelaySeconds: 60
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /api/health
port: 3000
initialDelaySeconds: 30
periodSeconds: 5
failureThreshold: 3
imagePullSecrets:
- name: ghcr-secret
---
apiVersion: v1
kind: Service
metadata:
name: app-service
namespace: ripplecore-production
spec:
selector:
app: ripplecore-app
ports:
- protocol: TCP
port: 80
targetPort: 3000
type: ClusterIP
Horizontal Pod Autoscaler (HPA)
File: k8s/base/app-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: ripplecore-app-hpa
namespace: ripplecore-production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: ripplecore-app
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # Scale up when CPU >70%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80 # Scale up when RAM >80%
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5 min before scaling down
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0 # Scale up immediately
policies:
- type: Percent
value: 100
periodSeconds: 30
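Resource-based HPA only works when the metrics API is available. RKE2 typically bundles metrics-server; on clusters without it, a sketch of installing it and checking the autoscaler:
# Install metrics-server (skip if the cluster already serves metrics.k8s.io)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify that metrics flow and the HPA reports utilization
kubectl top nodes
kubectl -n ripplecore-production get hpa ripplecore-app-hpa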
Ingress (Traefik)
File: k8s/base/ingress.yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
name: ripplecore-app
namespace: ripplecore-production
spec:
entryPoints:
- websecure
routes:
- match: Host(`app.your-domain.com`)
kind: Rule
services:
- name: app-service
port: 80
tls:
certResolver: letsencrypt
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
name: ripplecore-api
namespace: ripplecore-production
spec:
entryPoints:
- websecure
routes:
- match: Host(`api.your-domain.com`)
kind: Rule
services:
- name: api-service
port: 80
tls:
certResolver: letsencrypt
Migration Strategy
Blue-Green Deployment (Recommended)
Minimizes Risk: Run both systems in parallel, switch traffic when validated
Phase 1: Preparation (Week 1)
- Provision Kubernetes cluster (3 nodes)
- Install Rancher and configure networking
- Install Traefik ingress controller (a Helm sketch follows this list)
- Setup persistent volumes for PostgreSQL
- Configure DNS for blue environment (blue.your-domain.com)
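A minimal Helm-based sketch for the Traefik install (values are illustrative; the letsencrypt certResolver referenced by the IngressRoutes above must also be configured in Traefik's static configuration, and recent Traefik releases expect the traefik.io/v1alpha1 API group rather than traefik.containo.us/v1alpha1):
helm repo add traefik https://traefik.github.io/charts
helm repo update
helm install traefik traefik/traefik --namespace traefik --create-namespace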
Phase 2: Deploy to Blue (Week 2)
- Deploy PostgreSQL StatefulSet
- Restore database from production backup
- Deploy Redis
- Deploy applications (app, api, web)
- Configure ingress routes
- Test all functionality on blue environment
Phase 3: Data Sync (Days 1-2 of Week 3)
- Setup continuous replication: Production DB → K8s DB (see the logical-replication sketch below)
- Verify replication lag <1 second
- Monitor for 48 hours
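One way to implement the replication step is PostgreSQL logical replication (a sketch with placeholder hosts and passwords; it assumes wal_level=logical on the source and that the schema already exists on the Kubernetes side):
# On the current production database (publisher)
psql -h <old-db-host> -U ripplecore -d ripplecore \
  -c "CREATE PUBLICATION ripplecore_pub FOR ALL TABLES;"
# On the Kubernetes database (subscriber)
kubectl -n ripplecore-production exec postgres-0 -- psql -U ripplecore -d ripplecore \
  -c "CREATE SUBSCRIPTION ripplecore_sub CONNECTION 'host=<old-db-host> dbname=ripplecore user=ripplecore password=<secret>' PUBLICATION ripplecore_pub;"
# Check replication lag from the publisher side
psql -h <old-db-host> -U ripplecore -d ripplecore \
  -c "SELECT application_name, replay_lag FROM pg_stat_replication;"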
Phase 4: Cutover (Day 3 of Week 3)
- Enable maintenance mode on current production
- Final database sync (stop writes, sync, verify)
- Update DNS: app.your-domain.com → K8s load balancer IP
- Monitor traffic shift (DNS propagation ~5-60 minutes)
- Disable maintenance mode
- Monitor for 24 hours
Phase 5: Decommission (Week 4)
- Keep old infrastructure for 7 days (rollback safety)
- Archive final backup from old infrastructure
- Delete old servers from Hetzner
- Update documentation
Rollback Plan:
# If issues detected within 24 hours
# 1. Revert DNS to old infrastructure
# 2. Stop K8s applications
# 3. Sync database back to old infrastructure
# 4. Investigate issues, fix, retry
Post-Migration Optimization
Cost Optimization
Right-Size Pods:
# Monitor actual resource usage
kubectl top pods -n ripplecore-production
# Adjust resource requests/limits based on actual usage
# Reduce overprovisioning by 20-30%
Cluster Autoscaler (scale nodes automatically):
# Install cluster autoscaler
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
--set cloudProvider=hetzner \
--set autoDiscovery.clusterName=ripplecore-production
Use Spot Instances (if available on Hetzner):
- Save 40-60% on worker nodes
- Suitable for non-critical workloads
Monitoring & Observability
Prometheus + Grafana Stack:
# Install kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace
# Access Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Username: admin
# Password: prom-operator
Key Dashboards:
- Cluster overview (CPU, RAM, disk usage)
- Pod metrics (request rates, latencies)
- Node metrics (resource utilization)
- PostgreSQL metrics (connections, queries/sec)
Backup Strategy
Velero (Kubernetes-native backup):
# Install Velero
velero install \
--provider aws \
--bucket ripplecore-k8s-backups \
--backup-location-config region=eu-central,s3Url=https://fsn1.your-objectstorage.com \
--snapshot-location-config region=eu-central \
--secret-file ./credentials-velero
# Create backup schedule
velero schedule create daily-backup \
--schedule="0 3 * * *" \
--include-namespaces ripplecore-production \
--ttl 168h0m0s # Retain for 7 days
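The post-migration checklist below includes validating backup restoration; a sketch of a test cycle with Velero (backup and restore names are illustrative):
# Take an ad-hoc backup and wait for it to complete
velero backup create restore-test --include-namespaces ripplecore-production --wait
velero backup describe restore-test
# Restore it (use a scratch namespace or cluster for a full drill)
velero restore create --from-backup restore-test
velero restore get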
Migration Checklist
Pre-Migration
- Team trained on Kubernetes fundamentals
- Kubernetes cluster provisioned and tested
- All manifests created and validated
- Persistent volumes configured
- Ingress controller installed (Traefik)
- DNS records prepared (blue environment)
- Monitoring stack installed (Prometheus + Grafana)
- Backup solution configured (Velero)
Migration
- Deploy PostgreSQL StatefulSet
- Restore database from production backup
- Deploy Redis
- Deploy applications (app, api, web)
- Configure HPA for auto-scaling
- Test all functionality on blue environment
- Setup database replication (production → K8s)
- Monitor replication lag for 48 hours
- Execute DNS cutover during low-traffic window
- Monitor traffic shift and application health
Post-Migration
- Verify all services healthy (24 hours)
- Validate backup restoration
- Tune resource requests/limits
- Configure cluster autoscaler
- Update monitoring dashboards
- Update documentation
- Decommission old infrastructure (after 7 days)
- Conduct post-migration review
- Update runbooks for Kubernetes operations
Kubernetes Learning Resources
Free Resources:
- Kubernetes Official Docs: https://kubernetes.io/docs/
- Kubernetes the Hard Way: https://github.com/kelseyhightower/kubernetes-the-hard-way
- KodeKloud Free Course: https://kodekloud.com/courses/kubernetes-for-the-absolute-beginners/
Paid Training:
- Linux Foundation CKA (Certified Kubernetes Administrator): $395
- Cloud Native Computing Foundation courses: https://training.linuxfoundation.org
Hands-On Practice:
- minikube (local cluster): https://minikube.sigs.k8s.io
- kind (Kubernetes in Docker): https://kind.sigs.k8s.io
- Killercoda (browser-based Kubernetes scenarios, successor to Katacoda): https://killercoda.com
Decision Summary
Stay on Docker/Dokploy if:
- Managing <10 applications
- Serving <100K users
- Team lacks Kubernetes expertise
- Budget <€100/month for infrastructure
Migrate to Kubernetes if:
- Managing >10 applications
- Serving >100K users
- Need horizontal auto-scaling
- Multi-region deployment required
- Team has Kubernetes skills
Current Recommendation for RippleCore: Defer Kubernetes migration
Revisit this decision when:
- Application count exceeds 10
- User base exceeds 100K
- Vertical scaling becomes insufficient
- Team has completed Kubernetes training
Document Version: 1.0
Last Updated: 2025-01-23
Review Cycle: Annually or when scale requirements change
Next Review: [Schedule 1 year from now or at 50K users]