Karpenter v1 vs Cluster Autoscaler: A Production Migration Story
I’d been running Cluster Autoscaler on our production EKS cluster for years. It worked. It wasn’t exciting, it wasn’t cheap, but it worked. Then Karpenter hit v1 (the 1.0 release shipped in August 2024), the API stopped breaking every release, and by early 2025 I had finally run out of excuses. This is the story of that migration: what I learned, what I’d do differently, and why I think Karpenter has effectively won the EKS autoscaling argument.
I’m going to be specific about numbers where I can. I’ll use rounded approximations from our cluster — they’re illustrative, not benchmarks you should quote. Your mileage will vary. But the patterns I’ll describe are real, and the gotchas are exactly the ones that bit me at 2 AM on a Tuesday.
Where Cluster Autoscaler Stops Making Sense
Cluster Autoscaler is fine. It’s been in production for nearly a decade, it’s well-understood, and the failure modes are predictable. The problem is that its model — Auto Scaling Groups as the unit of scaling — is fundamentally incompatible with how modern Kubernetes workloads actually behave.
Here’s the issue I kept hitting. We had a mix of services: latency-sensitive APIs that wanted on-demand instances, batch workers that should run on spot, ML inference pods that needed GPU nodes, and a long tail of small services that could go anywhere. To handle this with Cluster Autoscaler, I had something like nine separate node groups. Each group had its own ASG. Each ASG had its own instance type list. Each had its own scaling policy. The amount of Terraform that existed purely to manage these groups was embarrassing.
When traffic shifted unpredictably, and it always does, Cluster Autoscaler couldn’t move workloads between node groups. A burst of batch jobs would scale out the spot ASG while the on-demand nodes sat half-empty. Bin-packing across the fleet was nonexistent. We were leaving real money on the table and paying for it again in operational complexity.
If you’ve read my piece on AWS cost optimization techniques, you know I’m not the kind of person who treats cloud bills as a problem for someone else. Karpenter promised to fix the structural problem: instead of pre-defined ASGs, you describe what you want and the controller picks the right instance, right now, from the entire EC2 catalog.
What v1 Actually Changed
Karpenter v1 went GA with a few things that matter for production use. The API is now stable — karpenter.sh/v1 for NodePool and karpenter.k8s.aws/v1 for EC2NodeClass. No more v1beta1 migration headaches every quarter. The disruption controls are first-class. Consolidation policies are explicit and well-defined. Drift detection works without weird flag-flipping.
If you were on the v1beta1 release line, the migration to v1 isn’t drop-in. The conversion webhook handled most of it for me, but disruption budgets, which arrived late in the v1beta1 line and gained per-reason controls in v1, needed manual configuration. I’ll get to those.
NodePool Design for a Real Cluster
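The first thing I did was throw away the mental model of “node groups.” With Karpenter, you have NodePools, and a NodePool is closer to a policy than a fleet. You can have multiple NodePools that overlap. Karpenter considers every NodePool whose requirements, labels, and taints are compatible with the pending pod; when more than one matches, it prefers the one with the highest weight, so either set weights or keep pools mutually exclusive if you want the choice to be predictable.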
Here’s the NodePool I built for general workloads. It targets a wide instance family range and prefers spot, but allows on-demand fallback:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    metadata:
      labels:
        workload-class: general
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: NotIn
          values: ["t2.micro", "t2.small", "t3.nano", "t3.micro"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h
      terminationGracePeriod: 30s
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      - nodes: "10%"
      - schedule: "0 9 * * mon-fri"
        duration: 8h
        nodes: "0"
Let me explain what’s actually happening there, because every line is doing work.
The requirements block is the heart of it. Karpenter will provision any instance type that matches all of these requirements. By saying instance-category is c, m, or r and generation is greater than 5, I’m letting Karpenter pick from c6i, c7i, m6i, m7i, r6i, r7i, c7g, m7g, r7g, and so on. For on-demand capacity it launches the cheapest type that fits the pending pods’ resource requests; for spot it weighs price against pool depth. This is why bin-packing improved so dramatically: Karpenter can pick a c7g.2xlarge if that’s the right size, rather than rounding up to whatever the ASG happened to have. One consequence of listing both amd64 and arm64: every image that can land on this pool has to be multi-arch.
The expireAfter of 720 hours forces every node to be replaced once a month. This is my paranoia about kernel patches and AMI drift. If you’re running eBPF observability tools that depend on kernel features, you really want fresh nodes regularly.
The disruption block is where v1 earns its keep. WhenEmptyOrUnderutilized is the consolidation policy that actually saves money: Karpenter will look at the fleet and figure out if it can replace bigger nodes with smaller ones, or pack pods more tightly and shut down a node entirely. The consolidateAfter of 1 minute is a per-node quiet period: Karpenter waits until one minute has passed since a pod was last added to or removed from a node before it considers that node for consolidation.
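From the workload side, landing on this pool can be sketched roughly as follows. This is a hypothetical Deployment, not one from our cluster: the name, image, and resource figures are made up for illustration. The two things that matter are the nodeSelector, which matches the workload-class label from the NodePool template and narrows the capacity type, and the resource requests, which are what Karpenter sizes nodes against.

```yaml
# Hypothetical Deployment; name, image, and resource figures are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
    spec:
      nodeSelector:
        # Label applied to nodes by the general-purpose NodePool template.
        workload-class: general
        # Narrows the pool's ["spot", "on-demand"] choice for this workload only.
        karpenter.sh/capacity-type: on-demand
      containers:
        - name: app
          image: registry.example.com/example-api:1.0.0
          resources:
            # Karpenter picks instance sizes from requests, so set them honestly.
            requests:
              cpu: "500m"
              memory: 1Gi
```

Dropping the capacity-type line would leave the spot-or-on-demand decision to the pool.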
Disruption Budgets: The Most Important New Feature
Disruption budgets are the feature that should change how you think about Karpenter. Early v1beta1 releases gave you little beyond expiry and consolidation settings; budgets arrived late in that line, and v1 made them stable and added per-reason budgets. You can now express things like “never disrupt more than 10% of nodes at once” and “during business hours, don’t disrupt anything.”
Look at that schedule clause again:
budgets:
  - nodes: "10%"
  - schedule: "0 9 * * mon-fri"
    duration: 8h
    nodes: "0"
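This says: on weekdays from 09:00 for 8 hours, allow zero disruptions. The rest of the time, allow disrupting up to 10% of nodes simultaneously. When several budgets are active, the most restrictive one wins. Two caveats: the cron schedule is evaluated in UTC, so shift the hour to match your business day, and budgets only govern voluntary disruption such as consolidation and drift, not spot interruptions or node expiry. This is enormous. We had a recurring pain point where Karpenter would consolidate a node mid-business-hours and an unlucky long-running request would get killed despite our PDBs. The schedule budget eliminated that class of incident overnight.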
I learned this the hard way. The first week after migration, I had no schedule budget set. Karpenter consolidated a node hosting an analytics service during a customer demo. The customer noticed. I noticed. The schedule clause went in that afternoon.
If you’re running stateful workloads or anything with sticky sessions, you need schedule budgets. They are not optional. The default of “always disrupt up to 10%” is fine for stateless APIs but disastrous for anything else.
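Budgets in v1 can also be scoped by disruption reason, which allows a finer split than the all-or-nothing window above. The fragment below is a sketch rather than our production config: it keeps the 10% baseline, blocks consolidation during the same weekday window, and still lets drift replacement proceed one node at a time. The specific numbers are assumptions chosen for illustration.

```yaml
# Fragment of a NodePool spec; only the disruption block is shown.
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 1m
  budgets:
    # Baseline for every reason: at most 10% of this pool's nodes at once.
    - nodes: "10%"
    # Weekday window (UTC): no consolidation of empty or underutilized nodes.
    - nodes: "0"
      schedule: "0 9 * * mon-fri"
      duration: 8h
      reasons:
        - Underutilized
        - Empty
    # Same window: drifted nodes (for example after a new AMI) go one at a time.
    - nodes: "1"
      schedule: "0 9 * * mon-fri"
      duration: 8h
      reasons:
        - Drifted
```

A budget with no reasons applies to all of them, so the 10% line still caps drift replacement outside the window.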
EC2NodeClass: Where AMIs and IAM Live
The EC2NodeClass is the AWS-specific configuration. It’s where you wire up subnets, security groups, AMIs, and instance metadata options.
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  role: KarpenterNodeRole-prod-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: prod-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: prod-cluster
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  blockDeviceMappings:
    # /dev/xvda is the root device on AL2023; /dev/xvdb would only attach an extra, unmounted volume
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true
  tags:
    Environment: production
    ManagedBy: karpenter
    NodePool: general-purpose
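A few things I want to call out. amiSelectorTerms with alias: al2023@latest lets Karpenter automatically use the latest AL2023 AMI. When AWS publishes a new AMI, Karpenter detects the drift, marks existing nodes as drifted, and replaces them according to your disruption budgets. This is the kind of thing that used to require a CI pipeline and a custom Lambda. Now it’s a one-line alias. The flip side is that @latest rolls untested AMIs straight into production; the Karpenter docs recommend pinning a specific version there and promoting it after testing in a lower environment.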
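If automatic rollout of every new AMI is more than you want, the alias can carry a pinned version instead of latest. A minimal sketch, assuming the rest of the EC2NodeClass stays as above; the version string here is an illustrative value, not the one we run:

```yaml
# Fragment of an EC2NodeClass spec; only the AMI selector is shown.
amiSelectorTerms:
  # Pinned AL2023 release instead of @latest. Bumping this string is what
  # triggers drift, so rollouts happen when you change it, not when AWS publishes.
  - alias: al2023@v20240807
```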
The httpTokens: required line is non-negotiable in production. IMDSv1 is an SSRF vulnerability waiting to happen. If your NodeClass doesn’t have this, fix it before you read another paragraph.
Spot Instance Handling
Spot interruption handling is something Cluster Autoscaler basically didn’t do well. You needed the AWS Node Termination Handler running as a DaemonSet, watching the spot interruption notice queue, and cordoning nodes. It worked, kind of. The two-minute warning often wasn’t enough to drain a heavily-loaded node gracefully.
Karpenter v1 handles spot interruptions natively. It consumes EC2 instance state change events, spot interruption warnings, and instance rebalance recommendations, which EventBridge rules deliver to an SQS queue that Karpenter watches. When a spot interruption notice arrives, Karpenter immediately starts provisioning replacement capacity and drains the doomed node. You don’t run a separate handler DaemonSet. You do still have to create the queue and the EventBridge rules and point Karpenter at the queue; once that’s in place, it just works.
There’s a subtle thing here about rebalance recommendations. They come earlier than interruption warnings (sometimes ten minutes before interruption) and are AWS’s way of saying “this instance is at elevated risk.” As far as the v1 documentation goes, Karpenter only records a Kubernetes event for them and does not drain on that signal; the preemptive behavior comes from reacting to the interruption warning itself, launching the replacement at once instead of waiting for pods to go pending. We saw a measurable drop in disrupted requests after migration, even though the underlying spot pool was the same.
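One piece of wiring is worth showing. In the setups I’m aware of, native interruption handling reads its events from an SQS queue, and Karpenter is told about that queue through its Helm values. The fragment below is a sketch of those values; the queue name is a hypothetical placeholder, and the queue plus the EventBridge rules feeding it are assumed to be created separately (in our case that would be Terraform).

```yaml
# Fragment of values.yaml for the Karpenter Helm chart.
settings:
  # Must match the EKS cluster name used in the discovery tags.
  clusterName: prod-cluster
  # Name of the SQS queue receiving interruption events (hypothetical name).
  # Leaving this unset disables native interruption handling.
  interruptionQueue: prod-cluster-karpenter-interruptions
```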
For batch workloads that can tolerate interruption, the spot strategy I use is essentially “give me anything, I don’t care”:
- key: karpenter.sh/capacity-type
  operator: In
  values: ["spot"]
- key: karpenter.k8s.aws/instance-family
  operator: In
  values: ["c6i", "c6a", "c7i", "c7a", "m6i", "m6a", "m7i", "m7a", "c7g", "m7g"]
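Wide instance family selection plus spot equals very high availability of capacity. Karpenter passes the whole candidate list to EC2 Fleet with the price-capacity-optimized allocation strategy, which favors the pools with the most spare capacity at the lowest price.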
Consolidation: The Money-Saving Magic
Consolidation is what makes Karpenter actually save money. The controller continuously looks at your fleet and asks: can I replace these nodes with cheaper or fewer nodes while still respecting all pod requirements?
In v1, the policy options are WhenEmpty (only delete empty nodes) or WhenEmptyOrUnderutilized (also replace bigger nodes with smaller ones, or consolidate workloads onto fewer nodes). The latter is more aggressive but is where the savings come from.
The consolidateAfter setting controls how long a node must go without pods being added or removed before Karpenter considers it for consolidation. I run 1 minute in production. You might think shorter is better: more aggressive consolidation, more savings. In my experience, anything less than 30 seconds caused thrashing, where Karpenter would consolidate, traffic would spike, and it would have to re-provision. The cost of provisioning a new node is real (cold start of kubelet, image pulls, application warmup) and you don’t want it happening constantly.
There’s a trade-off here that’s worth being explicit about. Aggressive consolidation saves EC2 dollars but increases pod restart frequency. If your workloads are slow to start (JVMs with long warmup, ML models that need to load weights, services with Lambda-style cold-start penalties), aggressive consolidation can hurt user-perceived availability. Tune consolidateAfter upward and use schedule budgets to protect business hours.
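For the slowest-starting workloads there is also a per-pod escape hatch: the karpenter.sh/do-not-disrupt annotation, which keeps Karpenter from voluntarily disrupting the node a pod is running on. The Deployment below is a hypothetical example (name, image, and sizes are invented) showing where the annotation goes. It is a sketch of the technique, not something lifted from our manifests.

```yaml
# Hypothetical slow-starting workload; name, image, and sizes are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
      annotations:
        # Blocks voluntary disruption (e.g. consolidation) of the hosting node
        # while this pod is running on it.
        karpenter.sh/do-not-disrupt: "true"
    spec:
      containers:
        - name: server
          image: registry.example.com/model-server:1.0.0
          resources:
            requests:
              cpu: "4"
              memory: 16Gi
```

The cost is that any node hosting such a pod is excluded from consolidation, so use it sparingly. It also does not protect against spot interruptions, and a NodePool terminationGracePeriod still bounds how long an expiring node will wait.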
The Migration: How I Actually Did It
Migration in place is risky. I did the cautious thing: I ran Cluster Autoscaler and Karpenter side by side for two weeks.
The trick is that Karpenter only manages nodes it provisioned. It looks at unschedulable pods and decides whether to spin up nodes. Existing nodes managed by Cluster Autoscaler are invisible to Karpenter. So I deployed Karpenter, set up NodePools, then started cordoning and draining Cluster Autoscaler-managed nodes one at a time. Cordoning alone only stops new pods from landing; the drain is what evicts the existing ones. Pods evicted from those nodes became unschedulable, and Karpenter provisioned new nodes for them.
After about ten days of this, all production workloads were running on Karpenter-managed nodes. I scaled the Cluster Autoscaler ASGs to zero, then deleted them.
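Draining node by node is only as safe as your PodDisruptionBudgets, since evictions honor them. A minimal sketch of the kind of PDB worth having in place before starting; the name and label are hypothetical and the threshold is an assumption you should size to your replica count:

```yaml
# Hypothetical PDB; name and selector label are illustrative.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-api
spec:
  # Evictions during a drain pause whenever more than one replica would be down.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: example-api
```

The same PDBs also limit how fast Karpenter can drain nodes later during consolidation and drift replacement.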
A few things bit me along the way:
The DaemonSets surprised me. Some DaemonSets had node selectors that matched only labels present on the old node-group nodes. They didn’t run on Karpenter nodes until I updated the selectors. This meant our log forwarder briefly stopped collecting logs from new nodes. Caught in alerts within an hour. Mortifying but recoverable.
Tolerations on system pods were another gotcha. We had a few critical system pods that tolerated only the taints the old node-group nodes carried. Updated them, redeployed.
PVC topology constraints were the third issue. Some pods had EBS volumes already attached in specific AZs. Karpenter respects topology constraints, but you have to make sure your NodePool requirements include the right zones. I had a NodePool that was inadvertently restricted to a single AZ for a few hours and a stateful workload couldn’t reschedule. Caught in monitoring; the fix was a one-line YAML change.
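The first two gotchas above come down to scheduling constraints on system pods. One way to make a DaemonSet safe during the transition is sketched below, under assumptions: the old nodes are taken to carry the eks.amazonaws.com/nodegroup label that EKS managed node groups apply (yours may differ), Karpenter nodes carry karpenter.sh/nodepool, and the DaemonSet name and image are hypothetical.

```yaml
# Hypothetical log forwarder; name, namespace, and image are illustrative.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-forwarder
  namespace: logging
spec:
  selector:
    matchLabels:
      app: log-forwarder
  template:
    metadata:
      labels:
        app: log-forwarder
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            # nodeSelectorTerms are ORed: either kind of node qualifies.
            nodeSelectorTerms:
              - matchExpressions:
                  - key: eks.amazonaws.com/nodegroup   # assumed label on old nodes
                    operator: Exists
              - matchExpressions:
                  - key: karpenter.sh/nodepool         # set by Karpenter on its nodes
                    operator: Exists
      tolerations:
        # Tolerate every taint so tainted pools still get a forwarder.
        - operator: Exists
      containers:
        - name: forwarder
          image: registry.example.com/log-forwarder:1.0.0
```

Once the old node groups are gone, the first term can be deleted, or the affinity removed entirely if the DaemonSet should run everywhere.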
What the Numbers Looked Like
In our cluster (and I want to be clear these are illustrative, not benchmarks) we saw EC2 spend drop by roughly 30% within a month. The breakdown, in percentage points of the original bill, was approximately:
- 15% from better bin-packing (Karpenter picked instance sizes that actually matched workload requirements)
- 10% from increased spot adoption (workloads that previously couldn’t easily go to spot now could, because Karpenter’s spot diversification was so much more robust)
- 5% from idle node consolidation (off-hours and weekends, the cluster shrank significantly)
The harder-to-measure win was operational. We deleted thousands of lines of Terraform. The on-call burden dropped because “node group X is full” alerts disappeared. Cluster scaling events became boring.
Where Cluster Autoscaler Still Wins
I want to be fair. Cluster Autoscaler still has a role. If your workload is extremely homogeneous (every pod looks the same, every node should be the same instance type), Cluster Autoscaler is simpler. If your compliance regime requires Auto Scaling Groups for some reason, you may not have a choice. If you’re running a tiny cluster where the operational simplicity of “managed node groups, no controller to maintain” outweighs efficiency gains, sure, stay on Cluster Autoscaler.
But for any cluster with mixed workloads, mixed instance type needs, or any sort of dynamic traffic pattern, Karpenter v1 is the answer. The migration is real work but it’s a one-time cost. The savings are continuous.
Karpenter Plus the Rest of the Stack
A few integrations worth mentioning. If you’re running Argo for deployments — see my GitOps with ArgoCD guide for the setup — Karpenter plays nicely. The NodePool and EC2NodeClass resources are just CRDs, so you can manage them through Argo like anything else.
For multi-tenant clusters, you can give different teams different NodePools with different tolerations and labels. Combined with Kubernetes RBAC for multi-tenant clusters, you get a clean separation where teams can request specific node behavior without stepping on each other.
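A per-team pool might look roughly like the sketch below. The team name, taint, and limits are hypothetical; the idea is that the taint keeps other tenants off these nodes, the label gives the team something to select on, and limits cap how much capacity the pool can ever provision.

```yaml
# Hypothetical tenant pool; team name, taint, and limits are illustrative.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: team-payments
spec:
  # Preferred over lower-weight pools when a pod is compatible with several.
  weight: 10
  # Hard ceiling on total resources this pool may provision.
  limits:
    cpu: "200"
    memory: 800Gi
  template:
    metadata:
      labels:
        team: payments
    spec:
      taints:
        # Only pods that tolerate team=payments:NoSchedule can land here.
        - key: team
          value: payments
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
```

The team’s pods then need both a matching toleration and a nodeSelector on team: payments; the toleration alone permits these nodes without requiring them.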
If you’re choosing between ECS and EKS for container orchestration, Karpenter is one of the things that genuinely tips the scale toward EKS. ECS has its capacity providers but they don’t approach Karpenter’s flexibility.
For infrastructure provisioning, I manage the Karpenter IRSA roles, the discovery tags, and the bootstrap CloudFormation through Terraform. If you’re newer to that workflow, the Terraform state management guide is worth a read — Karpenter touches enough resources that you want your state hygiene tight.
Things I’d Do Differently
If I were starting fresh, I’d skip the side-by-side phase. It was useful learning but it dragged on longer than necessary. With v1’s stability, I’d just deploy Karpenter, deploy NodePools, cordon and drain all the Cluster Autoscaler nodes one weekend, and let Karpenter take over.
I’d also set up disruption schedule budgets from day one. The “consolidating during business hours” incident was avoidable.
Lastly, I’d be more aggressive about NodePool segmentation by workload class. Having a single general-purpose NodePool sounded clean but in practice, having separate pools for “latency-sensitive” (on-demand only, narrower instance families) and “batch” (spot only, wide instance families) gave better resource isolation and made it easier to reason about disruption.
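That split can be sketched as two NodePools. Everything specific here is an assumption for illustration: the pool names, the instance families, the zones, the batch taint, and the consolidation timings. The explicit zone list on the first pool is there because of the single-AZ mistake described earlier.

```yaml
# Sketch of workload-class segmentation; names, families, and zones are assumed.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: latency-sensitive
spec:
  template:
    metadata:
      labels:
        workload-class: latency-sensitive
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c7i", "m7i"]
        # List every zone that holds volumes these pods may need to reattach.
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    # Conservative: only remove nodes that are already empty.
    consolidationPolicy: WhenEmpty
    consolidateAfter: 10m
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: batch
spec:
  template:
    metadata:
      labels:
        workload-class: batch
    spec:
      taints:
        # Keeps non-batch pods off interruptible capacity.
        - key: workload-class
          value: batch
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c6i", "c6a", "c7i", "c7a", "m6i", "m6a", "m7i", "m7a", "c7g", "m7g"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    # Aggressive: batch pods tolerate being moved.
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```

Because the two pools differ in capacity type and the batch pool is tainted, a given pod matches at most one of them, which is what makes disruption behavior easy to reason about.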
Karpenter v1 isn’t perfect. The documentation has some rough patches, the CRDs are verbose, and the failure modes when the controller itself has issues can be confusing. But it’s a substantial step forward from Cluster Autoscaler, and I haven’t met an EKS operator who’s gone back.