Scaling Like a Pro: Horizontal Pod Autoscaling in Kubernetes

“Why run 10 pods when 2 will do? And why run 2 when traffic surges to 1,000 users?”

Enter Horizontal Pod Autoscaler (HPA): Kubernetes' secret weapon for scaling smart.

🎬 Quick Story

I once launched an app demo during a DevOps webinar. The app was running in a K8s cluster — just 1 pod. Everything looked great on the outside… until the traffic hit.

As attendees flooded in to test the app, that single pod choked under pressure: CPU usage shot through the roof, and eventually the pod crashed.

⚠️ No replicas. No autoscaling. No safety net.

In seconds, my demo became a case study in what not to do with production-like environments.

That day, I learned a painful but priceless lesson:
1 pod ≠ 1,000 users.

That’s when I discovered HPA — a controller that scales pods dynamically based on CPU, memory, or custom metrics. And today, I won’t deploy a service without it.

Thanks to HPA, Kubernetes is no longer just a scheduler — it’s a smart traffic cop, scaling up during peak hours and scaling down to save resources.

What is Horizontal Pod Autoscaling (HPA)?

HPA automatically adjusts the number of pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics.

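Default metric: CPU utilization
Others supported: memory, plus custom and external metrics such as Prometheus metrics. CPU and memory readings come from metrics-server; custom and external metrics need a metrics adapter (for example, Prometheus Adapter).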

Let’s Build It – Step by Step

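We’ll deploy a simple Flask app that spikes CPU on demand to test HPA. This walkthrough assumes metrics-server is running in your cluster; without it, the HPA has no CPU data to act on.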

`app.py` (CPU Burner)

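from flask import Flask
import time

app = Flask(__name__)

@app.route('/')
def index():
    return "Hello, world!"

@app.route('/load')
def load():
    # Busy-wait so this request burns CPU for 10 seconds
    start = time.time()
    while time.time() - start < 10:
        pass
    return "CPU load generated!"

if __name__ == '__main__':
    # Start the server on all interfaces so the pod is reachable on port 5000
    app.run(host='0.0.0.0', port=5000)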

Dockerfile

FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install flask
CMD ["python", "app.py"]

Kubernetes Manifests

Deployment (`flask-deploy.yaml`)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flask
  template:
    metadata:
      labels:
        app: flask
    spec:
      containers:
      - name: flask
        image: your-dockerhub/flask-cpu-app
        ports:
        - containerPort: 5000
        resources:
          limits:
            cpu: "500m"
          requests:
            cpu: "200m"

Service (`flask-svc.yaml`)

apiVersion: v1
kind: Service
metadata:
  name: flask-svc
spec:
  type: NodePort  # exposes the app on <node-ip>:<node-port> for the load test
  selector:
    app: flask
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000

HPA (`hpa.yaml`)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
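
To make the 50% target concrete: utilization is measured against each pod's CPU request (200m here), and the Kubernetes docs describe the scaling rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The Python sketch below is a simplified model of that arithmetic, not the controller's actual code. The function names are hypothetical, the 240m usage figure is an illustrative assumption, and the 10% tolerance reflects the documented default.

```python
import math

def cpu_utilization(usage_millicores: float, request_millicores: float) -> float:
    """CPU utilization as a percentage of the pod's CPU request."""
    return 100.0 * usage_millicores / request_millicores

def desired_replicas(current_replicas: int, current_util: float, target_util: float,
                     min_replicas: int = 1, max_replicas: int = 5,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current * ratio), clamped to [min, max]."""
    ratio = current_util / target_util
    # Within the tolerance band the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# Assumed reading: one pod using 240m against its 200m request.
print(cpu_utilization(240, 200))      # 120.0
print(desired_replicas(1, 120, 50))   # 3
print(desired_replicas(3, 20, 50))    # 2
```

Note that the clamp to minReplicas and maxReplicas is applied last, so no reading can push this HPA past 5 pods.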

Deploy & Scale

kubectl apply -f flask-deploy.yaml
kubectl apply -f flask-svc.yaml
kubectl apply -f hpa.yaml

Confirm HPA is Working

kubectl get hpa

You’ll see output like:

NAME         REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS
flask-hpa    Deployment/flask-app   20%/50%   1         5         1
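
If you would rather read the same numbers from a script, kubectl get hpa flask-hpa -o json returns the HPA object, and a few lines of Python can summarize it. This is a minimal sketch: summarize_hpa and its helper are hypothetical names, and SAMPLE is a hand-trimmed stand-in for the real JSON, keeping only the fields the summary uses.

```python
import json

def _cpu_utilization(metrics, field):
    """Find averageUtilization for the cpu Resource metric under `field`."""
    for metric in metrics or []:
        resource = metric.get("resource", {})
        if metric.get("type") == "Resource" and resource.get("name") == "cpu":
            return resource.get(field, {}).get("averageUtilization")
    return None

def summarize_hpa(hpa: dict) -> str:
    """Build a one-line summary similar to the `kubectl get hpa` row."""
    spec, status = hpa["spec"], hpa.get("status", {})
    current = _cpu_utilization(status.get("currentMetrics"), "current")
    target = _cpu_utilization(spec.get("metrics"), "target")
    # Show <unknown> when no CPU reading has been reported yet.
    shown = "<unknown>" if current is None else f"{current}%"
    return (
        f"{hpa['metadata']['name']}: cpu {shown}/{target}%, "
        f"replicas {status.get('currentReplicas', 0)} "
        f"(desired {status.get('desiredReplicas', 0)}, "
        f"min {spec.get('minReplicas', 1)}, max {spec['maxReplicas']})"
    )

# Hand-trimmed sample, not real cluster output.
SAMPLE = json.loads("""
{
  "metadata": {"name": "flask-hpa"},
  "spec": {
    "minReplicas": 1,
    "maxReplicas": 5,
    "metrics": [{"type": "Resource", "resource": {
      "name": "cpu", "target": {"type": "Utilization", "averageUtilization": 50}}}]
  },
  "status": {
    "currentReplicas": 1,
    "desiredReplicas": 1,
    "currentMetrics": [{"type": "Resource", "resource": {
      "name": "cpu", "current": {"averageUtilization": 20, "averageValue": "40m"}}}]
  }
}
""")

print(summarize_hpa(SAMPLE))
# flask-hpa: cpu 20%/50%, replicas 1 (desired 1, min 1, max 5)
```

A current value of <unknown> usually means the HPA has not received CPU metrics yet, which is worth checking before you start the load test.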

Test It: Simulate CPU Load

Let’s hit the /load endpoint repeatedly using hey or curl:

hey -z 30s -c 10 http://<node-ip>:<node-port>/load
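
If hey is not installed, a rough stand-in using only the Python standard library could look like the sketch below. It mirrors the -c 10 and -z 30s settings with 10 worker threads running for 30 seconds. The names generate_load and fetch_status are hypothetical, and the URL in the final comment is the same placeholder as above, so substitute your own node IP and port.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status of a GET, or 0 if the request fails or errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except Exception:
        return 0

def generate_load(url, workers=10, duration_s=30.0, fetch=fetch_status):
    """Keep `workers` threads requesting `url` until `duration_s` has elapsed."""
    deadline = time.monotonic() + duration_s

    def worker():
        codes = []
        # Each thread issues requests back to back until the deadline.
        while time.monotonic() < deadline:
            codes.append(fetch(url))
        return codes

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        return [code for future in futures for code in future.result()]

# Example call (placeholder URL, replace before running):
# codes = generate_load("http://<node-ip>:<node-port>/load")
# print(f"{codes.count(200)} of {len(codes)} requests returned 200")
```

Because each /load request holds a worker for about 10 seconds, a 30-second run yields only a few dozen requests in total, which is still enough to keep the pod's CPU busy.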

After a short wait (the HPA re-evaluates metrics about every 15 seconds by default), check HPA again:

kubectl get hpa

You should see something like:

flask-hpa    Deployment/flask-app   160%/50%   1         5         3

📈 Result: Pods automatically scaled from 1 → 3 based on CPU!

📉 Cool Down

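Once load subsides, Kubernetes gradually scales pods back down to the minimum (1), conserving cluster resources. This is not instant: by default the HPA waits out a 5-minute stabilization window before scaling down, which prevents flapping when load is bursty.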
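
The scale-down delay can be modeled in a few lines. During the stabilization window (300 seconds by default), the HPA acts on the highest replica recommendation it has seen recently, so one quiet reading is not enough to remove pods. The Python sketch below is a simplified model under that assumption; stabilized_replicas and the sample history are hypothetical, not controller code or real cluster data.

```python
def stabilized_replicas(recommendations, now, window_s=300):
    """Pick the highest recommendation seen within the last `window_s` seconds.

    `recommendations` is a list of (timestamp_seconds, desired_replicas) pairs.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - window_s <= ts <= now]
    if not recent:
        raise ValueError("no recommendations inside the window")
    return max(recent)

# Made-up history: load stops after t=60, so later recommendations drop to 1.
history = [(0, 3), (60, 3), (120, 1), (180, 1), (240, 1),
           (300, 1), (360, 1), (420, 1)]

print(stabilized_replicas(history, now=300))  # 3
print(stabilized_replicas(history, now=360))  # 3
print(stabilized_replicas(history, now=420))  # 1
```

In this model the pods only drop to 1 once every recommendation from the busy period has aged out of the window.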

Key Takeaways

HPA keeps your app lean when idle and powerful when under load.
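CPU requests are essential for utilization-based autoscaling: the HPA measures utilization as a percentage of each pod's CPU request. Limits are optional but cap how much CPU a pod can burn.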
You can scale on memory or custom metrics using autoscaling/v2.
HPA = built-in resiliency + performance optimization.
