
Kubernetes Migration Guide

Migration path from Docker/Dokploy to Kubernetes for large-scale deployments

When to Migrate: >10 apps, >100K users, need for auto-scaling and advanced orchestration
Estimated Effort: 2-4 weeks (preparation + migration + validation)
Cost Impact: €100-200/month (managed Kubernetes) or €60-100/month (self-managed)

Table of Contents

  • Migration Decision Framework
  • Pre-Migration Planning
  • Kubernetes Platform Options
  • Application Containerization
  • Kubernetes Manifests
  • Migration Strategy
  • Post-Migration Optimization
  • Migration Checklist
  • Kubernetes Learning Resources
  • Decision Summary

When to Migrate to Kubernetes

Migrate When:

  • Managing >10 applications
  • Traffic >100K concurrent users
  • Need horizontal pod autoscaling (HPA)
  • Multi-region deployment required
  • Complex service mesh needed
  • Team has Kubernetes expertise
  • CI/CD pipeline needs advanced orchestration

Don't Migrate If:

  • <5 applications running
  • <50K users
  • Simple scaling needs (vertical scaling sufficient)
  • Team lacks Kubernetes experience
  • Budget constraints (<€100/month for infrastructure)

Current RippleCore Status (from ARCHITECTURE.md):

  • Apps: 4 (app, api, web, docs)
  • Users: 1K-50K (medium scale)
  • Complexity: Moderate
  • Recommendation: Stay on Docker/Dokploy for now

Migrate when: >10 apps OR >100K users OR need auto-scaling

Cost Comparison

Approach | Monthly Cost | Pros | Cons
Current (Dokploy) | €35-60 | Simple, cost-effective, sufficient | Manual scaling, limited HA
Self-Managed K8s | €60-100 | Full control, customizable | Requires expertise, maintenance overhead
Hetzner Cloud K8s | N/A | N/A | Not available (use Rancher on VPS)
Managed K8s (Civo) | €90-150 | Managed control plane, easy | Less control, vendor lock-in
GKE/EKS/AKS | €150-300 | Enterprise features, support | Expensive, complex billing

Recommended Path: Self-managed K8s on Hetzner VPS (cost-effective, full control)

Pre-Migration Planning

Infrastructure Readiness Checklist

Current State Audit:

  • Document all running services (4 apps + DB + Redis + Traefik)
  • Map environment variables for all services
  • Identify persistent volumes (PostgreSQL data, Redis data)
  • Document network dependencies (app → DB, app → Redis)
  • Export current resource usage (CPU, RAM per service)
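
Most of this audit can be captured from the current Docker host with a few commands. A minimal sketch; the container names (ripplecore-app, ripplecore-postgres) are assumptions, so adjust them to the names Dokploy actually uses:

# Resource usage per container (CPU, RAM)
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# Environment variables for one service (repeat per container)
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' ripplecore-app

# Persistent volumes and mount points
docker volume ls
docker inspect --format '{{json .Mounts}}' ripplecore-postgres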

Kubernetes Requirements:

  • Minimum 3 nodes for high availability
  • Each node: 4 vCPU, 8GB RAM (CPX32 or larger)
  • Separate network for cluster communication
  • Load balancer for ingress (Hetzner Cloud Load Balancer)
  • Object storage for backups (already have S3)

Estimated Infrastructure (3-node cluster):

Control Plane + Worker 1: CPX42 (8 vCPU, 16GB) - €26.99/mo
Worker 2: CPX32 (4 vCPU, 8GB) - €11.99/mo
Worker 3: CPX32 (4 vCPU, 8GB) - €11.99/mo
Load Balancer: Hetzner Cloud LB - €5.83/mo
Object Storage: 50GB (backups) - €0.25/mo
Total: ~€57/month (vs. €36 current)

Skills & Training Preparation

Required Skills:

  • Kubernetes fundamentals (pods, services, deployments)
  • Helm package management
  • kubectl CLI proficiency
  • YAML manifest creation
  • Troubleshooting Kubernetes networking

Training Resources (2-4 weeks):

Team Readiness:

  • At least 2 team members should complete training
  • Practice deploying simple apps to a local cluster (minikube)
  • Run chaos engineering experiments (kill pods, simulate failures)
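
A minimal practice loop on minikube (a sketch; the deployment name hello is arbitrary):

# Start a local cluster and deploy a throwaway app
minikube start
kubectl create deployment hello --image=nginx --replicas=3
kubectl rollout status deployment/hello

# Simulate a failure: delete a pod and watch Kubernetes replace it
kubectl delete pod -l app=hello
kubectl get pods -l app=hello --watch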

Kubernetes Platform Options

Option 1: Self-Managed Kubernetes (Rancher on Hetzner Cloud VPS)

Pros:

  • Full control over cluster configuration
  • Cost-effective (use existing Hetzner infrastructure)
  • No vendor lock-in
  • Rancher provides management UI

Cons:

  • Requires Kubernetes expertise
  • Manual upgrades and maintenance
  • Team responsible for security patches

Setup Guide:

1. Provision Servers (3 nodes):

# Via Hetzner Cloud Console
# Create 3x CPX32 servers with private network

NODE_1_IP="10.0.3.10"
NODE_2_IP="10.0.3.11"
NODE_3_IP="10.0.3.12"

2. Install RKE2 (Rancher Kubernetes Engine):

# On first node (control plane + worker)
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service

# Get join token
cat /var/lib/rancher/rke2/server/node-token

# On worker nodes
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
systemctl enable rke2-agent.service

# Configure agent
mkdir -p /etc/rancher/rke2/
cat > /etc/rancher/rke2/config.yaml <<EOF
server: https://10.0.3.10:9345
token: <NODE_TOKEN>
EOF

systemctl start rke2-agent.service

3. Install kubectl and Helm:

# kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Configure kubeconfig
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes

4. Install Rancher Management:

# Add Helm repo
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update

# Create namespace
kubectl create namespace cattle-system

# Install cert-manager (for SSL)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

# Install Rancher
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.your-domain.com \
  --set bootstrapPassword=admin \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=admin@your-domain.com

# Access Rancher UI
# https://rancher.your-domain.com

Option 2: Managed Kubernetes (Civo)

Pros:

  • Managed control plane (no maintenance)
  • Quick setup (<10 minutes)
  • UK-based (EU data residency)

Cons:

  • €90-150/month (more expensive)
  • Vendor lock-in
  • Less control over cluster config

Setup:

  1. Sign up at https://civo.com
  2. Create cluster via UI (3 medium nodes)
  3. Download kubeconfig
  4. Deploy apps via kubectl/Helm
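
The same steps can be scripted with the civo CLI (a sketch; node size names change over time, so verify flags and sizes with civo kubernetes create --help before running):

# Create a 3-node cluster and wait for it to become ready
civo kubernetes create ripplecore-production --nodes 3 --size g4s.kube.medium --wait

# Merge the cluster's kubeconfig into ~/.kube/config
civo kubernetes config ripplecore-production --save --merge
kubectl get nodes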

Application Containerization

Dockerfile Optimization for Kubernetes

Multi-Stage Build (reduce image size):

# apps/app/Dockerfile.k8s
# ============================================================================
# STAGE 1: Dependencies
# ============================================================================
FROM node:20-alpine AS deps

WORKDIR /app

# Copy package files
COPY package.json pnpm-lock.yaml ./
COPY .npmrc ./

# Install all dependencies (dev dependencies are required by the build stage)
RUN corepack enable && \
    corepack prepare pnpm@latest --activate && \
    pnpm install --frozen-lockfile

# ============================================================================
# STAGE 2: Builder
# ============================================================================
FROM node:20-alpine AS builder

WORKDIR /app

# Copy dependencies from deps stage
COPY --from=deps /app/node_modules ./node_modules

# Copy source code
COPY . .

# Build application (corepack shims are per-stage, so enable pnpm again)
RUN corepack enable && \
    corepack prepare pnpm@latest --activate && \
    pnpm build

# ============================================================================
# STAGE 3: Runner (Final Image)
# ============================================================================
FROM node:20-alpine AS runner

WORKDIR /app

# Create non-root user
RUN addgroup --system --gid 1001 nodejs && \
    adduser --system --uid 1001 nextjs

# Copy necessary files only
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
COPY --from=builder --chown=nextjs:nodejs /app/public ./public

# Switch to non-root user
USER nextjs

EXPOSE 3000

ENV NODE_ENV=production
ENV PORT=3000

# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/api/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"

CMD ["node", "server.js"]

Build and Push:

# Build for multiple platforms (ARM + x86)
docker buildx build --platform linux/amd64,linux/arm64 \
  -t ghcr.io/your-org/ripplecore-app:latest \
  -t ghcr.io/your-org/ripplecore-app:v1.0.0 \
  --push \
  -f apps/app/Dockerfile.k8s .
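
The Deployment in the next section pulls this image from GHCR and references an image pull secret named ghcr-secret; create it once per namespace (username and token values are placeholders):

# Requires the ripplecore-production namespace to exist (created in Kubernetes Manifests below)
kubectl create secret docker-registry ghcr-secret \
  --docker-server=ghcr.io \
  --docker-username=<github-username> \
  --docker-password=<github-token> \
  --namespace ripplecore-production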

Kubernetes Manifests

Namespace and ConfigMap

File: k8s/base/namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: ripplecore-production
  labels:
    name: ripplecore-production
    environment: production

File: k8s/base/configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: ripplecore-config
  namespace: ripplecore-production
data:
  NODE_ENV: "production"
  NEXT_PUBLIC_APP_URL: "https://app.your-domain.com"
  BETTER_AUTH_URL: "https://app.your-domain.com"
  BETTER_AUTH_TRUST_HOST: "true"

Secrets Management

File: k8s/base/secrets.yaml

apiVersion: v1
kind: Secret
metadata:
  name: ripplecore-secrets
  namespace: ripplecore-production
type: Opaque
stringData:
  POSTGRES_PASSWORD: "<secret>" # referenced by the PostgreSQL StatefulSet below
  DATABASE_URL: "postgresql://ripplecore:<secret>@postgres-service:5432/ripplecore"
  REDIS_URL: "redis://:<secret>@redis-service:6379"
  BETTER_AUTH_SECRET: "<secret>"
  SENTRY_DSN: "<sentry-dsn>"
  ARCJET_KEY: "<arcjet-key>"

Better Approach: Use Sealed Secrets or External Secrets Operator

# Install Sealed Secrets
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.24.0/controller.yaml

# Create a sealed secret (SealedSecrets are namespace-scoped, so set the namespace)
kubectl create secret generic ripplecore-secrets \
  --namespace ripplecore-production \
  --from-literal=DATABASE_URL='postgresql://...' \
  --dry-run=client -o yaml | \
  kubeseal -o yaml > k8s/base/sealed-secrets.yaml

PostgreSQL StatefulSet

File: k8s/base/postgres-statefulset.yaml

apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: ripplecore-production
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
  clusterIP: None # Headless service for StatefulSet

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: ripplecore-production
spec:
  serviceName: postgres-service
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:18-alpine
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_USER
              value: ripplecore
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: ripplecore-secrets
                  key: POSTGRES_PASSWORD
            - name: POSTGRES_DB
              value: ripplecore
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            exec:
              command:
                - pg_isready
                - -U
                - ripplecore
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - pg_isready
                - -U
                - ripplecore
            initialDelaySeconds: 5
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: postgres-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: hcloud-volumes # Hetzner Cloud Volumes (provided by the hcloud CSI driver, which must be installed)
        resources:
          requests:
            storage: 50Gi

Application Deployment

File: k8s/base/app-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ripplecore-app
  namespace: ripplecore-production
  labels:
    app: ripplecore-app
spec:
  replicas: 3 # Horizontal scaling
  selector:
    matchLabels:
      app: ripplecore-app
  template:
    metadata:
      labels:
        app: ripplecore-app
    spec:
      containers:
        - name: app
          image: ghcr.io/your-org/ripplecore-app:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 3000
          envFrom:
            - configMapRef:
                name: ripplecore-config
            - secretRef:
                name: ripplecore-secrets
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 60
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 5
            failureThreshold: 3
      imagePullSecrets:
        - name: ghcr-secret

---
apiVersion: v1
kind: Service
metadata:
  name: app-service
  namespace: ripplecore-production
spec:
  selector:
    app: ripplecore-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP
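
Redis appears throughout the migration plan but has no manifest above. A minimal single-replica sketch; the REDIS_PASSWORD key is an assumed addition to ripplecore-secrets (matching the password embedded in REDIS_URL):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: ripplecore-production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          # $(REDIS_PASSWORD) is expanded by Kubernetes from the env var below
          args: ["--requirepass", "$(REDIS_PASSWORD)"]
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: ripplecore-secrets
                  key: REDIS_PASSWORD # assumed extra key in the Secret
          ports:
            - containerPort: 6379
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "250m"

---
apiVersion: v1
kind: Service
metadata:
  name: redis-service # matches the host in REDIS_URL
  namespace: ripplecore-production
spec:
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379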

Horizontal Pod Autoscaler (HPA)

File: k8s/base/app-hpa.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ripplecore-app-hpa
  namespace: ripplecore-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ripplecore-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # Scale up when CPU >70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80 # Scale up when RAM >80%
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0 # Scale up immediately
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
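
Resource-based HPA only works when the metrics API is available. RKE2 ships metrics-server by default, but verify rather than assume:

# Confirm the metrics API is registered
kubectl get apiservices v1beta1.metrics.k8s.io

# If metrics flow, these return numbers instead of errors
kubectl top nodes
kubectl top pods -n ripplecore-production

# Watch the HPA's scaling decisions
kubectl get hpa ripplecore-app-hpa -n ripplecore-production --watch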

Ingress (Traefik)

File: k8s/base/ingress.yaml

apiVersion: traefik.io/v1alpha1 # traefik.io replaces the deprecated traefik.containo.us group (Traefik v2.10+)
kind: IngressRoute
metadata:
  name: ripplecore-app
  namespace: ripplecore-production
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.your-domain.com`)
      kind: Rule
      services:
        - name: app-service
          port: 80
  tls:
    certResolver: letsencrypt

---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: ripplecore-api
  namespace: ripplecore-production
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.your-domain.com`)
      kind: Rule
      services:
        - name: api-service
          port: 80
  tls:
    certResolver: letsencrypt
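
certResolver: letsencrypt only resolves if the Traefik installation defines a resolver with that name in its static configuration. A hedged sketch of the relevant values for the official traefik/traefik Helm chart:

# values-traefik.yaml (excerpt)
additionalArguments:
  - "--certificatesresolvers.letsencrypt.acme.email=admin@your-domain.com"
  - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
  - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
persistence:
  enabled: true # keep acme.json (issued certificates) across pod restarts

# Install/upgrade Traefik with the resolver configured
helm repo add traefik https://traefik.github.io/charts
helm upgrade --install traefik traefik/traefik \
  --namespace traefik --create-namespace \
  -f values-traefik.yaml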

Migration Strategy

Approach: blue-green migration. Run both systems in parallel and switch traffic only once the new environment is validated; this minimizes risk.

Phase 1: Preparation (Week 1)

  • Provision Kubernetes cluster (3 nodes)
  • Install Rancher and configure networking
  • Install Traefik ingress controller
  • Set up persistent volumes for PostgreSQL
  • Configure DNS for blue environment (blue.your-domain.com)

Phase 2: Deploy to Blue (Week 2)

  • Deploy PostgreSQL StatefulSet
  • Restore database from production backup (see the sketch after this list)
  • Deploy Redis
  • Deploy applications (app, api, web)
  • Configure ingress routes
  • Test all functionality on blue environment
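
A hedged sketch for the restore step, assuming a custom-format dump file backup.dump produced by pg_dump -Fc:

# Stream the dump into the Postgres pod over stdin
kubectl exec -i postgres-0 -n ripplecore-production -- \
  pg_restore -U ripplecore -d ripplecore --no-owner --clean --if-exists < backup.dump

# Sanity check: list tables
kubectl exec -it postgres-0 -n ripplecore-production -- \
  psql -U ripplecore -d ripplecore -c "\dt"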

Phase 3: Data Sync (Days 1-2 of Week 3)

  • Set up continuous replication: production DB → K8s DB (one approach sketched after this list)
  • Verify replication lag <1 second
  • Monitor for 48 hours
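
One way to implement the replication step is PostgreSQL logical replication (a sketch; it requires wal_level=logical on the production side, and $PROD_DATABASE_URL, <prod-host>, and the password are placeholders):

# On the current production database: publish all tables
psql "$PROD_DATABASE_URL" -c "CREATE PUBLICATION ripplecore_pub FOR ALL TABLES;"

# On the K8s database: subscribe; copy_data = false because the restore above already seeded the data
kubectl exec -it postgres-0 -n ripplecore-production -- psql -U ripplecore -d ripplecore -c \
  "CREATE SUBSCRIPTION ripplecore_sub CONNECTION 'host=<prod-host> dbname=ripplecore user=ripplecore password=<secret>' PUBLICATION ripplecore_pub WITH (copy_data = false);"

# Measure replication lag from the production side
psql "$PROD_DATABASE_URL" -c "SELECT application_name, replay_lag FROM pg_stat_replication;"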

Phase 4: Cutover (Day 3 of Week 3)

  • Enable maintenance mode on current production
  • Final database sync (stop writes, sync, verify)
  • Update DNS: app.your-domain.com → K8s load balancer IP
  • Monitor traffic shift (DNS propagation ~5-60 minutes)
  • Disable maintenance mode
  • Monitor for 24 hours

Phase 5: Decommission (Week 4)

  • Keep old infrastructure for 7 days (rollback safety)
  • Archive final backup from old infrastructure
  • Delete old servers from Hetzner
  • Update documentation

Rollback Plan:

# If issues detected within 24 hours
# 1. Revert DNS to old infrastructure
# 2. Stop K8s applications
# 3. Sync database back to old infrastructure
# 4. Investigate issues, fix, retry

Post-Migration Optimization

Cost Optimization

Right-Size Pods:

# Monitor actual resource usage
kubectl top pods -n ripplecore-production

# Adjust resource requests/limits based on actual usage
# Reduce overprovisioning by 20-30%
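
During tuning, adjustments can be applied without hand-editing manifests (a sketch; fold the final values back into the YAML afterwards):

# Example: lower the app's requests after observing real usage
kubectl set resources deployment ripplecore-app -n ripplecore-production \
  --requests=cpu=400m,memory=800Mi \
  --limits=cpu=1000m,memory=1600Mi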

Cluster Autoscaler (scale nodes automatically):

# Install cluster autoscaler
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --set cloudProvider=hetzner \
  --set autoDiscovery.clusterName=ripplecore-production
# Note: the Hetzner provider also needs a Hetzner Cloud API token (HCLOUD_TOKEN) exposed to the autoscaler pod

Use Spot/Preemptible Instances (Hetzner does not currently offer these; relevant only on providers that do):

  • Save 40-60% on worker nodes
  • Suitable for non-critical workloads

Monitoring & Observability

Prometheus + Grafana Stack:

# Install kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

# Access Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Username: admin
# Password: prom-operator

Key Dashboards:

  • Cluster overview (CPU, RAM, disk usage)
  • Pod metrics (request rates, latencies)
  • Node metrics (resource utilization)
  • PostgreSQL metrics (connections, queries/sec)

Backup Strategy

Velero (Kubernetes-native backup):

# Install Velero (the AWS plugin works with any S3-compatible storage; plugin version is an example)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket ripplecore-k8s-backups \
  --backup-location-config region=eu-central,s3Url=https://fsn1.your-objectstorage.com \
  --snapshot-location-config region=eu-central \
  --secret-file ./credentials-velero

# Create backup schedule
velero schedule create daily-backup \
  --schedule="0 3 * * *" \
  --include-namespaces ripplecore-production \
  --ttl 168h0m0s  # Retain for 7 days
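
A backup only counts once a restore has been proven. These are standard velero subcommands; the backup name shown is an example of the <schedule>-<timestamp> format:

# List backups, then test-restore into a scratch namespace
RESTORE_NAME="test-restore-$(date +%Y%m%d)"
velero backup get
velero restore create "$RESTORE_NAME" \
  --from-backup daily-backup-20250123030000 \
  --namespace-mappings ripplecore-production:ripplecore-restore-test
velero restore describe "$RESTORE_NAME"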

Migration Checklist

Pre-Migration

  • Team trained on Kubernetes fundamentals
  • Kubernetes cluster provisioned and tested
  • All manifests created and validated
  • Persistent volumes configured
  • Ingress controller installed (Traefik)
  • DNS records prepared (blue environment)
  • Monitoring stack installed (Prometheus + Grafana)
  • Backup solution configured (Velero)

Migration

  • Deploy PostgreSQL StatefulSet
  • Restore database from production backup
  • Deploy Redis
  • Deploy applications (app, api, web)
  • Configure HPA for auto-scaling
  • Test all functionality on blue environment
  • Set up database replication (production → K8s)
  • Monitor replication lag for 48 hours
  • Execute DNS cutover during low-traffic window
  • Monitor traffic shift and application health

Post-Migration

  • Verify all services healthy (24 hours)
  • Validate backup restoration
  • Tune resource requests/limits
  • Configure cluster autoscaler
  • Update monitoring dashboards
  • Update documentation
  • Decommission old infrastructure (after 7 days)
  • Conduct post-migration review
  • Update runbooks for Kubernetes operations

Kubernetes Learning Resources

Free Resources:

Paid Training:

Hands-On Practice:

Decision Summary

Stay on Docker/Dokploy if:

  • Managing <10 applications
  • Serving <100K users
  • Team lacks Kubernetes expertise
  • Budget <€100/month for infrastructure

Migrate to Kubernetes if:

  • Managing >10 applications
  • Serving >100K users
  • Need horizontal auto-scaling
  • Multi-region deployment required
  • Team has Kubernetes skills

Current Recommendation for RippleCore: Defer Kubernetes migration

Revisit this decision when:

  • Application count exceeds 10
  • User base exceeds 100K
  • Vertical scaling becomes insufficient
  • Team has completed Kubernetes training

Document Version: 1.0
Last Updated: 2025-01-23
Review Cycle: Annually or when scale requirements change
Next Review: [Schedule 1 year from now or at 50K users]