Why Migrate at All?
Docker Compose is brilliant for local development and small deployments. I ran a production SaaS on it for two years with zero regrets. The pain starts when you need horizontal scaling, zero-downtime deployments, or multi-node resilience. If your app runs on a single server and you're happy with that, stop reading — Compose is fine.
But if you've hit any of these walls, Kubernetes is worth the complexity tax:
- You need to scale individual services independently
- Deployments cause downtime because containers restart sequentially
- A single node failure takes your entire stack offline
- You're manually SSHing into servers to debug or restart services
The Incremental Approach
The biggest mistake I see teams make is trying to migrate everything at once. Instead, I follow a three-phase approach that keeps production stable throughout.
Phase 1: Containerize Properly
If your Dockerfiles aren't production-ready, fix them first. Multi-stage builds, non-root users, health checks, and proper signal handling. None of this is Kubernetes-specific, and it'll improve your Compose setup too.
A good production Dockerfile follows these principles: minimal base images (Alpine or distroless), explicit version pinning, layer caching optimization, and proper ENTRYPOINT/CMD separation.
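As a sketch of those principles, here's what a production Dockerfile might look like for a hypothetical Node.js service (the app name, paths, and build script are illustrative, not from any real project):

```dockerfile
# Build stage: pinned base image; dependency install cached in its own layer
FROM node:20.11-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: minimal image, non-root user
FROM node:20.11-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
USER node
# Exec form so the Node process receives SIGTERM directly (proper signal handling)
ENTRYPOINT ["node"]
CMD ["dist/server.js"]
```

The ENTRYPOINT/CMD split keeps the interpreter fixed while letting you override the script for debugging (e.g. `docker run image dist/worker.js`).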
Phase 2: Extract Configuration
Move all environment variables into a structured config system. In Compose you might have them scattered across .env files and docker-compose.yml. In Kubernetes, they'll live in ConfigMaps and Secrets.
Create a configuration matrix that maps every environment variable to its source, whether it's sensitive, and which services consume it. This exercise alone catches configuration bugs.
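Once the matrix exists, translating a row into manifests is mechanical. A minimal sketch for a hypothetical `worker` service, splitting non-sensitive settings from secrets (all names and values are assumptions):

```yaml
# Non-sensitive settings for the hypothetical worker service
apiVersion: v1
kind: ConfigMap
metadata:
  name: worker-config
data:
  LOG_LEVEL: "info"
  QUEUE_NAME: "jobs"
---
# Sensitive values; stringData lets you write plain text, and the API
# server stores it base64-encoded as Secret data
apiVersion: v1
kind: Secret
metadata:
  name: worker-secrets
type: Opaque
stringData:
  DATABASE_URL: "postgres://user:password@db.example.com:5432/app"
```

In the Deployment, `envFrom` with `configMapRef` and `secretRef` then injects both as environment variables, mirroring how Compose read your `.env` file.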
Phase 3: Migrate Service by Service
Start with your least critical service — maybe a background worker or internal tool. Create the Kubernetes manifests: Deployment, Service, ConfigMap. Deploy it alongside your Compose stack. Once it's stable, route traffic to it and decommission the Compose version.
Repeat for each service, saving your most critical (usually your API gateway or primary database) for last.
Manifest Patterns That Work
After three migrations, I've settled on a manifest structure that scales well. Each service gets its own directory with four files: deployment.yaml, service.yaml, configmap.yaml, and optionally an hpa.yaml for autoscaling.
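If you manage those per-service directories with Kustomize, a `kustomization.yaml` ties the four files together (a sketch, assuming Kustomize rather than plain `kubectl apply -f`):

```yaml
# kustomization.yaml — one per service directory
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - configmap.yaml
  - hpa.yaml   # optional; include only for autoscaled services
```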
Resource Limits
Always set both requests and limits. Requests determine scheduling — Kubernetes uses them to decide which node gets your pod. Limits prevent runaway containers from starving neighbors. Start conservative and adjust based on actual metrics.
A pattern I like: set requests to your P50 usage and limits to 2x that. Monitor for a week, then tighten. Under-provisioning causes OOMKills; over-provisioning wastes money.
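That pattern looks like the following container-spec fragment (the numbers are placeholders; substitute your own P50 measurements):

```yaml
# Container spec fragment: requests at observed P50, limits at roughly 2x
resources:
  requests:
    cpu: "250m"      # assumed P50 from your metrics; adjust per workload
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"  # too tight causes OOMKills; monitor before tightening
```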
Health Checks
Kubernetes health checks are more nuanced than Docker's HEALTHCHECK. You get three types: startup probes (for slow-starting apps), liveness probes (should I restart this container?), and readiness probes (should I send traffic here?).
The most common mistake is making liveness probes too aggressive. If your app takes 30 seconds to start, don't give its liveness probe a 10-second initial delay; Kubernetes will kill the container before it finishes booting and trap it in a restart loop. Use startup probes for slow starters and keep liveness probes simple (a basic TCP check or lightweight HTTP endpoint).
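Here's how the three probes might fit together for a slow-starting HTTP service (ports, paths, and thresholds are illustrative):

```yaml
# Pod spec fragment: startup probe shields liveness during boot
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 12   # 12 x 5s = up to 60s allowed for startup
livenessProbe:
  tcpSocket:
    port: 8080           # keep liveness cheap; restart only if truly wedged
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5       # gate traffic on app-level readiness
```

The liveness probe only starts counting once the startup probe succeeds, which is what breaks the restart loop for slow starters.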
Rolling Updates
Configure your deployment strategy for zero-downtime updates. The key settings are maxSurge (how many extra pods during update) and maxUnavailable (how many pods can be down simultaneously). For most services, maxSurge of 1 and maxUnavailable of 0 gives you safe, sequential updates.
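In a Deployment spec, that recommendation is a few lines:

```yaml
# Deployment fragment: add one new pod at a time, never drop below desired count
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
```

With `maxUnavailable: 0`, each old pod is only terminated after its replacement passes its readiness probe, so the two settings work together.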
Networking Gotchas
The biggest surprise for Compose-to-Kubernetes migrants is networking. In Compose, services talk to each other by name on a shared Docker network. In Kubernetes, you have ClusterIP services, DNS resolution, and potentially network policies.
Service discovery works similarly — you reference services by their Kubernetes Service name — but the underlying mechanics differ. DNS resolution can add latency, and you'll want to understand the difference between ClusterIP, NodePort, and LoadBalancer service types.
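For comparison with a Compose service entry, a ClusterIP Service for the hypothetical `worker` app looks like this:

```yaml
# In-cluster DNS: worker.<namespace>.svc.cluster.local,
# or just "worker" from pods in the same namespace
apiVersion: v1
kind: Service
metadata:
  name: worker
spec:
  type: ClusterIP        # default; NodePort/LoadBalancer expose externally
  selector:
    app: worker          # must match the pod labels in the Deployment
  ports:
    - port: 80           # what clients connect to
      targetPort: 8080   # what the container listens on
```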
Ingress controllers replace your Compose reverse proxy (usually Nginx or Traefik). I recommend starting with the Nginx Ingress Controller for familiarity, then evaluating alternatives like Traefik or Istio once you're comfortable.
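A routing rule that previously lived in your reverse proxy config becomes an Ingress resource. A sketch for the NGINX Ingress Controller, with a hypothetical host and backend service:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
spec:
  ingressClassName: nginx   # selects the NGINX Ingress Controller
  rules:
    - host: api.example.com # assumed hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api   # hypothetical Service name
                port:
                  number: 80
```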
Storage Considerations
Stateless services migrate easily. Stateful services — databases, file storage, message queues — require more thought. My recommendation: don't run databases in Kubernetes unless you have a dedicated platform team. Use managed services (RDS, Cloud SQL, managed Redis) and connect to them from your cluster.
For file storage, replace Docker volumes with PersistentVolumeClaims backed by your cloud provider's storage class. For shared storage across pods, consider NFS or a cloud-native solution like EFS.
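A named Docker volume roughly translates to a claim like this (the storage class name is provider-specific; `kubectl get storageclass` lists what your cluster offers):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: uploads            # hypothetical volume for user uploads
spec:
  accessModes:
    - ReadWriteOnce        # single-node attach; shared access needs RWX (e.g. EFS/NFS)
  storageClassName: gp3    # assumed AWS EBS class; substitute your provider's
  resources:
    requests:
      storage: 20Gi
```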
Monitoring the Migration
You need observability before, during, and after migration. Set up Prometheus and Grafana (or your preferred stack) to monitor both your Compose and Kubernetes workloads simultaneously. Key metrics to track: request latency, error rates, resource utilization, and pod restart counts.
During migration, run both stacks in parallel and compare metrics. Any degradation in the Kubernetes version should be investigated before decommissioning the Compose equivalent.
Lessons Learned
After three migrations, here's what I wish I'd known from the start:
- Invest in local development tooling early: Skaffold, Tilt, or similar tools make the developer experience bearable.
- Namespaces are your friend; use them to isolate environments.
- RBAC is not optional; set it up from day one.
- GitOps (ArgoCD or Flux) pays for itself within the first month.
The migration itself typically takes 2-4 weeks for a team of two, depending on the number of services and the complexity of your stateful workloads. Plan for another month of tuning and stabilization before calling it done.