Karpenter v1 vs Cluster Autoscaler: A Production Migration Story
I’d been running Cluster Autoscaler on our production EKS cluster for years. It worked. It wasn’t exciting, it wasn’t cheap, but it …
111 articles about devops development, tools, and best practices
Single-account AWS is a ticking time bomb. I don’t say that lightly. I’ve watched it blow up firsthand, and I’ve spent more hours …
eBPF is the biggest shift in Linux observability since strace. I don’t say that lightly. I’ve spent years wiring up monitoring stacks, …
Security as a gate at the end of the pipeline is security theater. I’ve believed this for years, but it took watching a real incident unfold to …
You don’t know your system is resilient until you’ve broken it on purpose.
I believed our payment processing service was fault tolerant. …
Gateway API is what Ingress should have been from day one.
I don’t say that lightly. I’ve spent years wrangling Kubernetes Ingress …
If you’re not running scheduled terraform plan, you have drift. You just don’t know it yet.
I learned this the hard way. A colleague made …
Everything I’ve learned building on AWS since 2012, organized by domain.
This is the hub for everything I’ve written about Kubernetes. Whether you’re setting up your first cluster or optimizing a multi-tenant …
I’ve been running Kubernetes in production for years now, and there’s a specific kind of pain that only hits you once you cross the …
99.99% availability sounds great until you realize that’s 4 minutes and 19 seconds of downtime per month. Four minutes. That’s barely …
Read Article →I mass-deleted requirements.txt files from a monorepo last month. Fourteen of them. Some had unpinned dependencies, some had pins from 2021, one had a …
NGINX Ingress is the Honda Civic of ingress controllers. Boring, reliable, gets the job done. I’ve deployed it on dozens of clusters and …
I deleted roughly 2,000 lines of orchestration code from our payment processing service last year. Replaced it with about 200 lines of Amazon States …
Most Terraform code has zero tests. That’s insane for something managing production infrastructure. We wouldn’t ship application code …
I spent four hours on a Tuesday night debugging a 30-second API call. Four hours. The call touched 12 services: auth, inventory, pricing, three …
If you’re not scanning container images before they hit production, it’s only a matter of time before something ugly shows up in your …
EventBridge is the most underused AWS service. I’ll die on that hill. Teams will build these elaborate Rube Goldberg machines out of SNS topics, …
Operator SDK vs kubebuilder: I pick kubebuilder every time. Operator SDK wraps kubebuilder anyway, adds a layer of abstraction that mostly just gets …
I use both. Terraform for multi-cloud, CDK when it’s pure AWS and the team knows TypeScript. That’s the short answer. But the long answer …
Platform engineering is DevOps done right. Or maybe it’s DevOps with a product mindset. Either way, it’s the recognition that telling …
CPU-based autoscaling is a lie for most web services. There, I said it.
I spent a painful week last year watching an HPA scale our API pods from 3 to …
VPNs are not zero trust. Stop calling them that.
I can’t count how many times I’ve sat in architecture reviews where someone points at a …
I got a call from a startup founder last year. “Our AWS bill just hit $47,000 and we have twelve engineers.” They’d been running for …