Kubernetes Migration Guide
Migration path from Docker/Dokploy to Kubernetes for large-scale deployments
When to Migrate: >10 apps, >100K users, need for auto-scaling and advanced orchestration
Estimated Effort: 2-4 weeks (preparation + migration + validation)
Cost Impact: €100-200/month (managed Kubernetes) or €60-100/month (self-managed)
Table of Contents
- Migration Decision Framework
- Pre-Migration Planning
- Kubernetes Platform Options
- Application Containerization
- Kubernetes Manifests
- Migration Strategy
- Post-Migration Optimization
Migration Decision Framework
When to Migrate to Kubernetes
Migrate When:
- Managing >10 applications
- Traffic >100K concurrent users
- Need horizontal pod autoscaling (HPA)
- Multi-region deployment required
- Complex service mesh needed
- Team has Kubernetes expertise
- CI/CD pipeline needs advanced orchestration
Don't Migrate If:
- <5 applications running
- <50K users
- Simple scaling needs (vertical scaling sufficient)
- Team lacks Kubernetes experience
- Budget constraints (<€100/month for infrastructure)
Current RippleCore Status (from ARCHITECTURE.md):
- Apps: 4 (app, api, web, docs)
- Users: 1K-50K (medium scale)
- Complexity: Moderate
- Recommendation: Stay on Docker/Dokploy for now
Migrate when: >10 apps OR >100K users OR need auto-scaling
Cost Comparison
| Approach | Monthly Cost | Pros | Cons |
|---|---|---|---|
| Current (Dokploy) | €35-60 | Simple, cost-effective, sufficient | Manual scaling, limited HA |
| Self-Managed K8s | €60-100 | Full control, customizable | Requires expertise, maintenance overhead |
| Hetzner Cloud K8s | N/A | N/A | Not available (use Rancher on VPS) |
| Managed K8s (Civo) | €90-150 | Managed control plane, easy | Less control, vendor lock-in |
| GKE/EKS/AKS | €150-300 | Enterprise features, support | Expensive, complex billing |
Recommended Path: Self-managed K8s on Hetzner VPS (cost-effective, full control)
Pre-Migration Planning
Infrastructure Readiness Checklist
Current State Audit:
- Document all running services (4 apps + DB + Redis + Traefik)
- Map environment variables for all services
- Identify persistent volumes (PostgreSQL data, Redis data)
- Document network dependencies (app → DB, app → Redis)
- Export current resource usage (CPU, RAM per service); see the sketch below
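A minimal sketch for capturing this from the current Docker host (container names are illustrative; adjust them to your Compose/Dokploy setup):
# Snapshot CPU/RAM usage per container
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" > audit-resources.txt
# Export environment variables for one service (repeat per container)
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' app > audit-env-app.txt
# List named volumes (persistent data that must be migrated)
docker volume ls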
Kubernetes Requirements:
- Minimum 3 nodes for high availability
- Each node: 4 vCPU, 8GB RAM (CPX32 or larger)
- Separate network for cluster communication
- Load balancer for ingress (Hetzner Cloud Load Balancer)
- Object storage for backups (already have S3)
Estimated Infrastructure (3-node cluster):
Control Plane + Worker 1: CPX42 (8 vCPU, 16GB) - €26.99/mo
Worker 2: CPX32 (4 vCPU, 8GB) - €11.99/mo
Worker 3: CPX32 (4 vCPU, 8GB) - €11.99/mo
Load Balancer: Hetzner Cloud LB - €5.83/mo
Object Storage: 50GB (backups) - €0.25/mo
Total: ~€57/month (vs. €36 current)
Skills & Training Preparation
Required Skills:
- Kubernetes fundamentals (pods, services, deployments)
- Helm package management
- kubectl CLI proficiency
- YAML manifest creation
- Troubleshooting Kubernetes networking
Training Resources (2-4 weeks):
- Free: Kubernetes official documentation (https://kubernetes.io/docs/)
- Free: KodeKloud Kubernetes for Beginners (https://kodekloud.com)
- Paid: Linux Foundation CKA course (~$395, see resources below)
- Hands-on: Minikube or kind for local practice
Team Readiness:
- At least 2 team members should complete training
- Practice deploying simple apps to local cluster (minikube)
- Run chaos engineering experiments (kill pods, simulate failures); a local drill is sketched below
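For example, a quick drill against a disposable kind cluster (all names here are illustrative):
# Create a throwaway local cluster
kind create cluster --name ripplecore-practice
# Deploy a simple app with three replicas
kubectl create deployment hello --image=nginx --replicas=3
kubectl expose deployment hello --port=80
# Simulate a failure: delete one pod and watch Kubernetes replace it
POD=$(kubectl get pods -l app=hello -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod "$POD"
kubectl get pods -l app=hello -w
# Clean up
kind delete cluster --name ripplecore-practice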
Kubernetes Platform Options
Option 1: Self-Managed Kubernetes (Recommended)
Tool: Rancher on Hetzner Cloud VPS
Pros:
- Full control over cluster configuration
- Cost-effective (use existing Hetzner infrastructure)
- No vendor lock-in
- Rancher provides management UI
Cons:
- Requires Kubernetes expertise
- Manual upgrades and maintenance
- Team responsible for security patches
Setup Guide:
1. Provision Servers (3 nodes):
# Via Hetzner Cloud Console
# Create 3x CPX32 servers with private network
NODE_1_IP="10.0.3.10"
NODE_2_IP="10.0.3.11"
NODE_3_IP="10.0.3.12"
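The same provisioning can be scripted with the hcloud CLI if you prefer it over the console (a sketch; server type, image, SSH key name, and location are placeholders to check against current Hetzner offerings):
# Private network for cluster traffic
hcloud network create --name k8s-net --ip-range 10.0.0.0/16
hcloud network add-subnet k8s-net --network-zone eu-central --type cloud --ip-range 10.0.3.0/24
# Three nodes attached to the private network
for i in 1 2 3; do
  hcloud server create --name "k8s-node-${i}" --type cpx32 --image ubuntu-22.04 \
    --ssh-key my-key --network k8s-net --location fsn1
done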
2. Install RKE2 (Rancher Kubernetes Engine):
# On first node (control plane + worker)
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service
# Get join token
cat /var/lib/rancher/rke2/server/node-token
# On worker nodes
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
systemctl enable rke2-agent.service
# Configure agent
mkdir -p /etc/rancher/rke2/
cat > /etc/rancher/rke2/config.yaml <<EOF
server: https://10.0.3.10:9345
token: <NODE_TOKEN>
EOF
systemctl start rke2-agent.service
3. Install kubectl and Helm:
# kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Configure kubeconfig
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes
4. Install Rancher Management:
# Add Helm repo
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update
# Create namespace
kubectl create namespace cattle-system
# Install cert-manager (for SSL)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
# Install Rancher
helm install rancher rancher-latest/rancher \
--namespace cattle-system \
--set hostname=rancher.your-domain.com \
--set bootstrapPassword=admin \
--set ingress.tls.source=letsEncrypt \
--set letsEncrypt.email=admin@your-domain.com
# Access Rancher UI
# https://rancher.your-domain.com
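Before opening the UI, it is worth confirming that cert-manager and Rancher finished rolling out (namespaces and deployment names as used above):
kubectl -n cert-manager rollout status deploy/cert-manager
kubectl -n cattle-system rollout status deploy/rancher
kubectl -n cattle-system get pods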
Option 2: Managed Kubernetes (Civo)
Pros:
- Managed control plane (no maintenance)
- Quick setup (<10 minutes)
- UK-based (EU data residency)
Cons:
- €90-150/month (more expensive)
- Vendor lock-in
- Less control over cluster config
Setup:
- Sign up at https://civo.com
- Create cluster via UI or the civo CLI (3 medium nodes); see the sketch after this list
- Download kubeconfig
- Deploy apps via kubectl/Helm
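A sketch of the CLI route (flags and node size names should be checked against current Civo documentation):
# Authenticate once with your API key
civo apikey save ripplecore <YOUR_API_KEY>
# Create a 3-node cluster and merge its kubeconfig into ~/.kube/config
civo kubernetes create ripplecore-prod --nodes 3 --size g4s.kube.medium --wait --save --merge
kubectl get nodes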
Application Containerization
Dockerfile Optimization for Kubernetes
Multi-Stage Build (reduce image size):
# apps/app/Dockerfile.k8s
# ============================================================================
# STAGE 1: Dependencies
# ============================================================================
FROM node:20-alpine AS deps
WORKDIR /app
# Copy package files
COPY package.json pnpm-lock.yaml ./
COPY .npmrc ./
# Install all dependencies (dev dependencies are needed for the build;
# the standalone output copied in the final stage only ships production code)
RUN corepack enable && \
corepack prepare pnpm@latest --activate && \
pnpm install --frozen-lockfile
# ============================================================================
# STAGE 2: Builder
# ============================================================================
FROM node:20-alpine AS builder
WORKDIR /app
# Copy dependencies from deps stage
COPY --from=deps /app/node_modules ./node_modules
# Copy source code
COPY . .
# Build application (pnpm must be enabled in this stage as well)
RUN corepack enable && \
corepack prepare pnpm@latest --activate && \
pnpm build
# ============================================================================
# STAGE 3: Runner (Final Image)
# ============================================================================
FROM node:20-alpine AS runner
WORKDIR /app
# Create non-root user
RUN addgroup --system --gid 1001 nodejs && \
adduser --system --uid 1001 nextjs
# Copy necessary files only
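# NOTE: copying .next/standalone assumes `output: 'standalone'` is enabled in the Next.js config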
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
COPY --from=builder --chown=nextjs:nodejs /app/public ./public
# Switch to non-root user
USER nextjs
EXPOSE 3000
ENV NODE_ENV=production
ENV PORT=3000
# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
CMD node -e "require('http').get('http://localhost:3000/api/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"
CMD ["node", "server.js"]Build and Push:
# Build for multiple platforms (ARM + x86)
docker buildx build --platform linux/amd64,linux/arm64 \
-t ghcr.io/your-org/ripplecore-app:latest \
-t ghcr.io/your-org/ripplecore-app:v1.0.0 \
--push \
-f apps/app/Dockerfile.k8s .
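The Deployment manifest below pulls this image and references an image pull secret named ghcr-secret. One way to create it once the namespace exists (the token is a GitHub PAT with read:packages, shown as a placeholder):
kubectl create secret docker-registry ghcr-secret \
  --namespace ripplecore-production \
  --docker-server=ghcr.io \
  --docker-username=<github-username> \
  --docker-password=<github-token>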
Kubernetes Manifests
Namespace and ConfigMap
File: k8s/base/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: ripplecore-production
labels:
name: ripplecore-production
environment: production
File: k8s/base/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: ripplecore-config
namespace: ripplecore-production
data:
NODE_ENV: "production"
NEXT_PUBLIC_APP_URL: "https://app.your-domain.com"
BETTER_AUTH_URL: "https://app.your-domain.com"
BETTER_AUTH_TRUST_HOST: "true"
Secrets Management
File: k8s/base/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
name: ripplecore-secrets
namespace: ripplecore-production
type: Opaque
stringData:
DATABASE_URL: "postgresql://ripplecore:<secret>@postgres-service:5432/ripplecore"
REDIS_URL: "redis://:<secret>@redis-service:6379"
BETTER_AUTH_SECRET: "<secret>"
SENTRY_DSN: "<sentry-dsn>"
ARCJET_KEY: "<arcjet-key>"
POSTGRES_PASSWORD: "<secret>"
Better Approach: Use Sealed Secrets or External Secrets Operator
# Install Sealed Secrets
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.24.0/controller.yaml
# Create a sealed secret (use --from-literal so the key name is preserved)
kubectl create secret generic ripplecore-secrets --dry-run=client \
--from-literal=DATABASE_URL='postgresql://...' -o yaml | \
kubeseal -o yaml > k8s/base/sealed-secrets.yaml
PostgreSQL StatefulSet
File: k8s/base/postgres-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
name: postgres-service
namespace: ripplecore-production
spec:
selector:
app: postgres
ports:
- port: 5432
targetPort: 5432
clusterIP: None # Headless service for StatefulSet
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
namespace: ripplecore-production
spec:
serviceName: postgres-service
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:18-alpine
ports:
- containerPort: 5432
env:
- name: POSTGRES_USER
value: ripplecore
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: ripplecore-secrets
key: POSTGRES_PASSWORD
- name: POSTGRES_DB
value: ripplecore
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
volumeMounts:
- name: postgres-storage
mountPath: /var/lib/postgresql/data
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "4Gi"
cpu: "2000m"
livenessProbe:
exec:
command:
- pg_isready
- -U
- ripplecore
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- pg_isready
- -U
- ripplecore
initialDelaySeconds: 5
periodSeconds: 5
volumeClaimTemplates:
- metadata:
name: postgres-storage
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: hcloud-volumes # Hetzner Cloud Volumes (requires the hcloud CSI driver in the cluster)
resources:
requests:
storage: 50Gi
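Once the StatefulSet is running, the production database can be brought over with a plain dump/restore (a sketch; hosts and credentials are placeholders):
# Dump the current production database, then stream it into the postgres-0 pod
pg_dump -h <old-db-host> -U ripplecore -d ripplecore -Fc -f ripplecore.dump
kubectl -n ripplecore-production cp ripplecore.dump postgres-0:/tmp/ripplecore.dump
kubectl -n ripplecore-production exec postgres-0 -- \
  pg_restore -U ripplecore -d ripplecore --clean --if-exists /tmp/ripplecore.dump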
Application Deployment
File: k8s/base/app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: ripplecore-app
namespace: ripplecore-production
labels:
app: ripplecore-app
spec:
replicas: 3 # Horizontal scaling
selector:
matchLabels:
app: ripplecore-app
template:
metadata:
labels:
app: ripplecore-app
spec:
containers:
- name: app
image: ghcr.io/your-org/ripplecore-app:latest
imagePullPolicy: Always
ports:
- containerPort: 3000
envFrom:
- configMapRef:
name: ripplecore-config
- secretRef:
name: ripplecore-secrets
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /api/health
port: 3000
initialDelaySeconds: 60
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /api/health
port: 3000
initialDelaySeconds: 30
periodSeconds: 5
failureThreshold: 3
imagePullSecrets:
- name: ghcr-secret
---
apiVersion: v1
kind: Service
metadata:
name: app-service
namespace: ripplecore-production
spec:
selector:
app: ripplecore-app
ports:
- protocol: TCP
port: 80
targetPort: 3000
type: ClusterIP
Horizontal Pod Autoscaler (HPA)
File: k8s/base/app-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: ripplecore-app-hpa
namespace: ripplecore-production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: ripplecore-app
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # Scale up when CPU >70%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80 # Scale up when RAM >80%
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5 min before scaling down
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0 # Scale up immediately
policies:
- type: Percent
value: 100
periodSeconds: 30
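Resource-based HPA only works when the metrics API is available. RKE2 typically bundles metrics-server; on clusters without it, a sketch of installing it and checking the autoscaler:
# Install metrics-server (skip if the cluster already serves metrics.k8s.io)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify that metrics flow and the HPA reports utilization
kubectl top nodes
kubectl -n ripplecore-production get hpa ripplecore-app-hpa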
Ingress (Traefik)
File: k8s/base/ingress.yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
name: ripplecore-app
namespace: ripplecore-production
spec:
entryPoints:
- websecure
routes:
- match: Host(`app.your-domain.com`)
kind: Rule
services:
- name: app-service
port: 80
tls:
certResolver: letsencrypt
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
name: ripplecore-api
namespace: ripplecore-production
spec:
entryPoints:
- websecure
routes:
- match: Host(`api.your-domain.com`)
kind: Rule
services:
- name: api-service
port: 80
tls:
certResolver: letsencrypt
Migration Strategy
Blue-Green Deployment (Recommended)
Minimizes Risk: Run both systems in parallel, switch traffic when validated
Phase 1: Preparation (Week 1)
- Provision Kubernetes cluster (3 nodes)
- Install Rancher and configure networking
- Install Traefik ingress controller (a Helm sketch follows this list)
- Setup persistent volumes for PostgreSQL
- Configure DNS for blue environment (blue.your-domain.com)
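A minimal Helm-based sketch for the Traefik install (values are illustrative; the letsencrypt certResolver referenced by the IngressRoutes above must also be configured in Traefik's static configuration, and recent Traefik releases expect the traefik.io/v1alpha1 API group rather than traefik.containo.us/v1alpha1):
helm repo add traefik https://traefik.github.io/charts
helm repo update
helm install traefik traefik/traefik --namespace traefik --create-namespace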
Phase 2: Deploy to Blue (Week 2)
- Deploy PostgreSQL StatefulSet
- Restore database from production backup
- Deploy Redis
- Deploy applications (app, api, web)
- Configure ingress routes
- Test all functionality on blue environment
Phase 3: Data Sync (Days 1-2 of Week 3)
- Setup continuous replication: Production DB → K8s DB (see the logical-replication sketch below)
- Verify replication lag <1 second
- Monitor for 48 hours
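One way to implement the replication step is PostgreSQL logical replication (a sketch with placeholder hosts and passwords; it assumes wal_level=logical on the source and that the schema already exists on the Kubernetes side):
# On the current production database (publisher)
psql -h <old-db-host> -U ripplecore -d ripplecore \
  -c "CREATE PUBLICATION ripplecore_pub FOR ALL TABLES;"
# On the Kubernetes database (subscriber)
kubectl -n ripplecore-production exec postgres-0 -- psql -U ripplecore -d ripplecore \
  -c "CREATE SUBSCRIPTION ripplecore_sub CONNECTION 'host=<old-db-host> dbname=ripplecore user=ripplecore password=<secret>' PUBLICATION ripplecore_pub;"
# Check replication lag from the publisher side
psql -h <old-db-host> -U ripplecore -d ripplecore \
  -c "SELECT application_name, replay_lag FROM pg_stat_replication;"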
Phase 4: Cutover (Day 3 of Week 3)
- Enable maintenance mode on current production
- Final database sync (stop writes, sync, verify)
- Update DNS: app.your-domain.com → K8s load balancer IP
- Monitor traffic shift (DNS propagation ~5-60 minutes)
- Disable maintenance mode
- Monitor for 24 hours
Phase 5: Decommission (Week 4)
- Keep old infrastructure for 7 days (rollback safety)
- Archive final backup from old infrastructure
- Delete old servers from Hetzner
- Update documentation
Rollback Plan:
# If issues detected within 24 hours
# 1. Revert DNS to old infrastructure
# 2. Stop K8s applications
# 3. Sync database back to old infrastructure
# 4. Investigate issues, fix, retry
Post-Migration Optimization
Cost Optimization
Right-Size Pods:
# Monitor actual resource usage
kubectl top pods -n ripplecore-production
# Adjust resource requests/limits based on actual usage
# Reduce overprovisioning by 20-30%
Cluster Autoscaler (scale nodes automatically):
# Install cluster autoscaler
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
--set cloudProvider=hetzner \
--set autoDiscovery.clusterName=ripplecore-production
Use Spot Instances (if available on Hetzner):
- Save 40-60% on worker nodes
- Suitable for non-critical workloads
Monitoring & Observability
Prometheus + Grafana Stack:
# Install kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace
# Access Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Username: admin
# Password: prom-operator
Key Dashboards:
- Cluster overview (CPU, RAM, disk usage)
- Pod metrics (request rates, latencies)
- Node metrics (resource utilization)
- PostgreSQL metrics (connections, queries/sec)
Backup Strategy
Velero (Kubernetes-native backup):
# Install Velero
velero install \
--provider aws \
--bucket ripplecore-k8s-backups \
--backup-location-config region=eu-central,s3Url=https://fsn1.your-objectstorage.com \
--snapshot-location-config region=eu-central \
--secret-file ./credentials-velero
# Create backup schedule
velero schedule create daily-backup \
--schedule="0 3 * * *" \
--include-namespaces ripplecore-production \
--ttl 168h0m0s # Retain for 7 days
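The post-migration checklist below includes validating backup restoration; a sketch of a test cycle with Velero (backup and restore names are illustrative):
# Take an ad-hoc backup and wait for it to complete
velero backup create restore-test --include-namespaces ripplecore-production --wait
velero backup describe restore-test
# Restore it (use a scratch namespace or cluster for a full drill)
velero restore create --from-backup restore-test
velero restore get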
Migration Checklist
Pre-Migration
- Team trained on Kubernetes fundamentals
- Kubernetes cluster provisioned and tested
- All manifests created and validated
- Persistent volumes configured
- Ingress controller installed (Traefik)
- DNS records prepared (blue environment)
- Monitoring stack installed (Prometheus + Grafana)
- Backup solution configured (Velero)
Migration
- Deploy PostgreSQL StatefulSet
- Restore database from production backup
- Deploy Redis
- Deploy applications (app, api, web)
- Configure HPA for auto-scaling
- Test all functionality on blue environment
- Setup database replication (production → K8s)
- Monitor replication lag for 48 hours
- Execute DNS cutover during low-traffic window
- Monitor traffic shift and application health
Post-Migration
- Verify all services healthy (24 hours)
- Validate backup restoration
- Tune resource requests/limits
- Configure cluster autoscaler
- Update monitoring dashboards
- Update documentation
- Decommission old infrastructure (after 7 days)
- Conduct post-migration review
- Update runbooks for Kubernetes operations
Kubernetes Learning Resources
Free Resources:
- Kubernetes Official Docs: https://kubernetes.io/docs/
- Kubernetes the Hard Way: https://github.com/kelseyhightower/kubernetes-the-hard-way
- KodeKloud Free Course: https://kodekloud.com/courses/kubernetes-for-the-absolute-beginners/
Paid Training:
- Linux Foundation CKA (Certified Kubernetes Administrator): $395
- Cloud Native Computing Foundation courses: https://training.linuxfoundation.org
Hands-On Practice:
- minikube (local cluster): https://minikube.sigs.k8s.io
- kind (Kubernetes in Docker): https://kind.sigs.k8s.io
- Killercoda (browser-based Kubernetes scenarios, successor to Katacoda): https://killercoda.com
Decision Summary
Stay on Docker/Dokploy if:
- Managing <10 applications
- Serving <100K users
- Team lacks Kubernetes expertise
- Budget <€100/month for infrastructure
Migrate to Kubernetes if:
- Managing >10 applications
- Serving >100K users
- Need horizontal auto-scaling
- Multi-region deployment required
- Team has Kubernetes skills
Current Recommendation for RippleCore: Defer Kubernetes migration
Revisit this decision when:
- Application count exceeds 10
- User base exceeds 100K
- Vertical scaling becomes insufficient
- Team has completed Kubernetes training
Document Version: 1.0
Last Updated: 2025-01-23
Review Cycle: Annually or when scale requirements change
Next Review: [Schedule 1 year from now or at 50K users]