Kubernetes in production: lessons from the trenches

Hard-won lessons on running Kubernetes at scale, from resource limits to multi-tenant networking.

Author: Infinity DevOps
Published: April 20, 2026
Read time: 8 min

Kubernetes is the de facto standard for container orchestration, but moving a cluster from a staging playground to a resilient, high-volume production environment requires strict engineering controls. The defaults are designed to make application deployments easy, not secure or efficient. Operating clusters at scale requires careful tuning of resources, networking boundaries, and autoscaling. Here are the core lessons gathered from running containerized systems in production.

1. Demystifying Resource Requests and Limits

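If you fail to configure resource requests and limits, the cluster scheduler is operating blindly: it has no idea how much capacity a pod needs, and nothing stops one container from starving its neighbors. This can result in node overload, CPU contention, and application crashes. Let's look at the difference: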

  • Resource Requests (Guaranteed Resources): The amount of CPU and memory the scheduler reserves for a container. A pod is placed only on a node with enough unreserved capacity to cover its requests. If requests are set too high, you waste capacity; if too low, nodes become overcommitted and pods compete for resources under load.
  • Resource Limits (Ceiling Limits): The maximum resources a container is allowed to consume. If a container exceeds its memory limit, the Linux kernel's Out-Of-Memory (OOM) killer terminates it, and the container reports an OOMKilled status. If it hits its CPU limit, it is not killed; the kernel throttles its CPU time instead, which shows up as higher latency and slower responses.
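
One way to keep containers from running with no requests or limits at all is a LimitRange, which fills in defaults for any container that omits them. The following is a minimal sketch rather than a recommendation: the name default-resources and every value are assumptions chosen for illustration, and the finance namespace is borrowed from the Deployment example in this article.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources    # hypothetical name
  namespace: finance
spec:
  limits:
  - type: Container
    defaultRequest:          # applied when a container sets no requests
      cpu: "100m"
      memory: "128Mi"
    default:                 # applied when a container sets no limits
      cpu: "500m"
      memory: "256Mi"
```

Defaults like these are a safety net only. Values measured per workload and set explicitly in each manifest will usually fit better than namespace-wide numbers.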

2. Hardening Multi-Tenant Networking

By default, Kubernetes uses a flat network model. Any pod in any namespace can communicate with any other pod. If a public-facing frontend service is compromised, an attacker could scan and interact with databases or cache clusters in other namespaces. Enforce security boundaries:

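  • Default Deny All: Deploy a default deny-all network policy for both ingress and egress traffic in every namespace. Network policies only take effect if the cluster's network plugin enforces them (Calico and Cilium do, for example).
  • Explicit Whitelisting: Write explicit policies to allow only specific components (e.g., ingress controllers to frontend, frontend to backend APIs, and backend APIs to databases). Remember to allow DNS egress as well, because a deny-all egress policy also blocks name resolution.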
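
The two rules above could be sketched as follows. This is an illustration under stated assumptions, not a drop-in policy set: the pod labels app: frontend and app: payments-api are hypothetical, the policy names are made up, and the DNS rule assumes the cluster DNS pods run in kube-system with the conventional k8s-app: kube-dns label.

```yaml
# 1. Deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: finance
spec:
  podSelector: {}            # empty selector matches all pods
  policyTypes:
  - Ingress
  - Egress
---
# 2. Re-allow DNS lookups, which the egress deny would otherwise block.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: finance
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns  # assumed label of the cluster DNS pods
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
---
# 3. Allow only frontend pods to reach the API on its serving port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-payments-api
  namespace: finance
spec:
  podSelector:
    matchLabels:
      app: payments-api      # hypothetical label
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend      # hypothetical label
    ports:
    - protocol: TCP
      port: 8080
```

Because egress is denied too, the frontend pods would also need an egress rule that permits traffic to the API pods; the third policy only opens the receiving side.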

3. Production-Ready Kubernetes Deployment Example

Here is an example deployment manifest incorporating resource limits, liveness and readiness probes, and rolling update strategies:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: finance
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
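  # apps/v1 requires a selector that matches the pod template labels.
  # The label key and value here are illustrative.
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec: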
      containers:
      - name: api
        image: infinitydevops/payments-api:v1.4.2
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20

4. Managed Control Planes: Focus on Applications

Unless you are running a massive bare-metal infrastructure, do not attempt to self-host your Kubernetes control plane (kube-apiserver, etcd, controller-manager). Keeping etcd clusters synchronized, backed up, and upgraded is a massive operational burden. Offload this responsibility to cloud providers using managed services like Amazon EKS, Google GKE, or Azure AKS. Let the cloud provider guarantee control plane availability while your engineers focus on container workloads.

5. Horizontal Pod & Cluster Autoscaling

Manual scaling is too slow to handle traffic spikes. Deploy the Horizontal Pod Autoscaler (HPA) to scale pods based on CPU, memory, or custom business metrics (e.g., message queue depth, exposed through KEDA). Pair HPA with a node autoscaler such as Karpenter or Cluster Autoscaler, which provisions new virtual machines when pending pods no longer fit on the existing nodes.
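
As a rough sketch, a CPU-based HPA for the payments-api Deployment shown earlier might look like this. The maxReplicas value and the 70% target are assumptions for illustration, not tuned numbers, and the example presumes a metrics source such as metrics-server is installed in the cluster.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api
  namespace: finance
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 3               # matches the Deployment's replica count
  maxReplicas: 10              # assumed ceiling
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # percent of the CPU request, averaged across pods
```

Utilization is measured against each container's CPU request, so this only works for pods that set one. It is another reason to treat requests as mandatory.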

Tagged with: Cloud, DevOps, Best Practices