Understanding Kubernetes Autoscaling
Kubernetes provides three complementary autoscaling mechanisms: Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler. Together, they ensure your applications have the right amount of resources at the right time, optimizing both performance and cost.
Horizontal Pod Autoscaler (HPA)
HPA automatically adjusts the number of pod replicas based on observed CPU utilization, memory usage, or custom metrics. When the observed metric exceeds the configured target (for example, 80% average CPU utilization), HPA creates additional pods across available nodes. The controller re-evaluates roughly every 15 seconds; by default, scale-up is applied immediately, while scale-down uses a 5-minute stabilization window to prevent thrashing.
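A minimal autoscaling/v2 manifest matching this behavior might look like the following sketch; the Deployment name web and the replica bounds are illustrative, not prescriptive:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical target Deployment
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # scale out when average CPU exceeds 80%
```

With this in place, the controller compares average CPU utilization across all pods against the 80% target and adjusts the replica count between 3 and 50.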
Custom Metrics Scaling
Beyond CPU and memory, HPA can scale on custom application metrics like requests per second, queue depth, or latency percentiles using the Prometheus Adapter or KEDA. This enables event-driven autoscaling where pods scale based on business-relevant metrics.
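As a sketch, an HPA metrics block driven by a per-pod custom metric exposed through the Prometheus Adapter could look like this; the metric name http_requests_per_second and the target value are assumptions about what your adapter exposes:

```yaml
# Inside an autoscaling/v2 HorizontalPodAutoscaler spec
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # assumed adapter-exposed metric
      target:
        type: AverageValue
        averageValue: "100"              # target ~100 req/s per pod
```

Here HPA divides the total observed rate by the target per-pod value to compute the desired replica count, which is what makes scaling track traffic rather than CPU.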
Vertical Pod Autoscaler (VPA)
VPA automatically adjusts CPU and memory requests/limits for containers. It analyzes historical resource consumption patterns and recommends or applies right-sized resource configurations. VPA operates in three modes: Off (recommendations only), Initial (applies on pod creation), and Auto (applies by evicting and recreating pods).
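A hedged VPA manifest covering these modes might look like the following; the target Deployment api and the resource bounds are illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api              # hypothetical target Deployment
  updatePolicy:
    updateMode: "Off"      # "Off" = recommend only; "Initial" or "Auto" to apply
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi
```

Starting in "Off" mode lets you inspect the recommendations (visible in the VPA object's status) before letting the updater evict and resize pods.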
Cluster Autoscaler
Cluster Autoscaler adjusts the number of nodes based on pending pods that cannot be scheduled due to insufficient resources. It adds nodes when pods are unschedulable and removes underutilized nodes when their pods can be placed elsewhere. The scale-down delay is 10 minutes by default to prevent oscillation.
KEDA: Event-Driven Autoscaling
KEDA extends Kubernetes autoscaling with 60+ event sources including Azure Service Bus, Kafka, Redis, and Prometheus. It can scale deployments to zero when no events are pending, reducing costs for intermittent workloads. KEDA scales in proportion to event backlog, providing precise capacity management.
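As an illustration, a KEDA ScaledObject that scales a worker Deployment on Azure Storage Queue depth and allows scale-to-zero might look like this; the Deployment, queue, account, and threshold values are all placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    name: worker               # hypothetical worker Deployment
  minReplicaCount: 0           # scale to zero when the queue is empty
  maxReplicaCount: 100
  triggers:
  - type: azure-queue
    metadata:
      queueName: orders            # hypothetical queue
      queueLength: "5"             # target messages per replica
      accountName: mystorageaccount
    authenticationRef:
      name: azure-queue-auth       # hypothetical TriggerAuthentication
```

Because replicas scale in proportion to queue length divided by the target per-replica backlog, capacity tracks the event backlog directly.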
Best Practices
- Use HPA for stateless workloads with variable traffic patterns
- Combine HPA with Cluster Autoscaler for fully elastic infrastructure
- Set appropriate resource requests — they are the foundation of scheduling decisions
- Use Pod Disruption Budgets to maintain availability during autoscaling
- Monitor autoscaling events with Prometheus and Grafana dashboards
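For the Pod Disruption Budget recommendation above, a minimal sketch (the app: web label and minAvailable value are assumptions about your workload):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # keep at least 2 pods running during disruptions
  selector:
    matchLabels:
      app: web           # hypothetical workload label
```

Cluster Autoscaler respects this budget when draining underutilized nodes, so scale-down never takes the workload below two available replicas.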
Key Features and Capabilities
The following are the core capabilities that make Kubernetes autoscaling essential for modern cloud infrastructure:
Horizontal Pod Autoscaler
Automatically adjusts replica count based on CPU, memory, or custom metrics from Prometheus — supports scaling to zero with KEDA integration
Vertical Pod Autoscaler
Recommends and automatically adjusts CPU and memory requests based on historical usage patterns, eliminating over-provisioning waste
Cluster Autoscaler
Adds or removes nodes based on pending pod scheduling needs, supporting multiple node pools with different VM sizes for workload diversity
KEDA Event-Driven Scaling
Scale based on external event sources — Azure Service Bus queue depth, Kafka lag, HTTP request rate, or cron schedules with 60+ built-in scalers
Predictive Autoscaling
KEDA and custom metrics enable predictive scaling that pre-provisions capacity before known traffic spikes based on historical patterns
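The cron-based pre-provisioning mentioned above can be sketched with KEDA's cron scaler; the schedule, timezone, and replica count here are illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler
spec:
  scaleTargetRef:
    name: web            # hypothetical Deployment
  triggers:
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 8 * * *          # hold 10 replicas from 08:00
      end: 0 18 * * *           # ...until 18:00 on weekdays and weekends alike
      desiredReplicas: "10"
```

This reserves capacity ahead of a known daily traffic pattern rather than reacting after load arrives.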
Real-World Use Cases
Organizations across industries are leveraging Kubernetes autoscaling in production environments:
E-Commerce Traffic Spikes
HPA with custom metrics scales web frontends from 3 to 50 replicas during flash sales, while Cluster Autoscaler adds nodes in under 2 minutes
Batch Processing
KEDA scales job workers from 0 to 100 based on Azure Storage Queue depth, processing 1M messages overnight and scaling to zero during business hours
API Gateway
HPA using requests-per-second custom metric maintains consistent latency by scaling API pods proportionally to incoming traffic volume
ML Training
Cluster Autoscaler provisions GPU node pools on-demand for training jobs, deallocating expensive nodes when training completes
Best Practices and Recommendations
Based on enterprise deployments and production experience, these recommendations will help you maximize value:
- Always set resource requests accurately — HPA percentage-based scaling and VPA recommendations depend on correct baseline values
- Use Pod Disruption Budgets with autoscaling to prevent service disruption during scale-down events and node draining
- Configure stabilization windows to prevent rapid flapping during traffic fluctuations — by default HPA applies scale-up immediately but waits 5 minutes before scale-down; lengthen the scale-down window for spiky workloads
- Combine HPA with VPA cautiously — use VPA in recommendation-only mode when HPA is active to avoid conflicting scaling decisions
- Keep the Cluster Autoscaler scan interval at its 10-second default for responsive scaling, and tune --max-graceful-termination-sec for stateful workloads
- Monitor autoscaler activity through Kubernetes events and alert on scale-up failures (for example, NotTriggerScaleUp events on pending pods) to detect resource quota or capacity issues
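The stabilization guidance above is expressed through the HPA behavior block; this sketch shows the default-style asymmetry with illustrative policy values:

```yaml
# Inside an autoscaling/v2 HorizontalPodAutoscaler spec
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react immediately to load increases
      policies:
      - type: Percent
        value: 100                       # at most double the replicas...
        periodSeconds: 15                # ...per 15-second period
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min of sustained low load
      policies:
      - type: Pods
        value: 2                         # remove at most 2 pods...
        periodSeconds: 60                # ...per minute
```

The asymmetry is deliberate: under-provisioning hurts users immediately, while over-provisioning only costs money, so scale-up is aggressive and scale-down conservative.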
Frequently Asked Questions
Can HPA and VPA run together?
Not recommended for the same metric. If HPA scales on CPU, VPA should not adjust CPU requests. The best practice is HPA for replica scaling with VPA in recommendation-only mode, or use VPA for non-HPA workloads like stateful services that cannot horizontally scale.
How fast does Cluster Autoscaler add nodes?
Typically 1-3 minutes from detecting unschedulable pods to new nodes being ready. AKS and GKE can use node pool pre-provisioning (overprovisioning) with low-priority placeholder pods to achieve sub-30-second effective scaling for latency-sensitive workloads.
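The overprovisioning pattern described here is commonly implemented as a negative-priority placeholder Deployment that reserves headroom and is evicted the moment real workloads need it; names, sizes, and the pause image tag below are illustrative:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10                # lower than any real workload
globalDefault: false
description: "Placeholder pods evicted first when real workloads need capacity"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-reservation
spec:
  replicas: 2
  selector:
    matchLabels:
      app: capacity-reservation
  template:
    metadata:
      labels:
        app: capacity-reservation
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9   # does nothing; only holds resources
        resources:
          requests:
            cpu: "1"          # headroom reserved per placeholder pod
            memory: 2Gi
```

When a real pod arrives, the scheduler preempts a placeholder instantly, and the now-pending placeholder triggers Cluster Autoscaler to add a replacement node in the background.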
What is KEDA and when should I use it?
KEDA (Kubernetes Event-Driven Autoscaling) extends HPA with 60+ external event scalers. Use KEDA when scaling should respond to business metrics like queue depth, stream lag, or database query results rather than just CPU/memory.