AWS Architecture Guide: Production Patterns and Best Practices
Everything I’ve learned building on AWS since 2012, organized by domain.
103 articles about DevOps development, tools, and best practices
This is the hub for everything I’ve written about Kubernetes. Whether you’re setting up your first cluster or optimizing a multi-tenant …
I’ve been running Kubernetes in production for years now, and there’s a specific kind of pain that only hits you once you cross the …
99.99% availability sounds great until you realize that’s 4 minutes and 19 seconds of downtime per month. Four minutes. That’s barely …
I mass-deleted requirements.txt files from a monorepo last month. Fourteen of them. Some had unpinned dependencies, some had pins from 2021, one had a …
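The downtime figure in that teaser is plain arithmetic: the allowed downtime is the unavailable fraction (1 minus the availability target) multiplied by the length of the period. A minimal sketch in Python, assuming a fixed 30-day month, which is what the 4-minute-19-second figure implies (a 31-day month allows roughly 4 minutes 28 seconds). The function names are illustrative, not from any library.

```python
def downtime_budget_seconds(availability_pct: float, period_days: float = 30.0) -> float:
    """Allowed downtime in seconds for an availability target over a period.

    Assumes a fixed-length period (30 days by default), not a calendar month.
    """
    period_seconds = period_days * 24 * 60 * 60
    unavailable_fraction = 1 - availability_pct / 100
    return period_seconds * unavailable_fraction


def as_minutes_seconds(seconds: float) -> str:
    """Format a duration as 'M min S s', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s"


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99, 99.999):
        budget = downtime_budget_seconds(target)
        print(f"{target}% -> {as_minutes_seconds(budget)} per 30-day month")
    # 99.9% -> 43 min 12 s per 30-day month
    # 99.95% -> 21 min 36 s per 30-day month
    # 99.99% -> 4 min 19 s per 30-day month
    # 99.999% -> 0 min 26 s per 30-day month
```

Each extra nine cuts the budget by a factor of ten, which is why the jump from 99.9% to 99.99% feels so much harsher in practice than the numbers suggest on paper.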
NGINX Ingress is the Honda Civic of ingress controllers. Boring, reliable, gets the job done. I’ve deployed it on dozens of clusters and …
I deleted roughly 2,000 lines of orchestration code from our payment processing service last year. Replaced it with about 200 lines of Amazon States …
Most Terraform code has zero tests. That’s insane for something managing production infrastructure. We wouldn’t ship application code …
I spent four hours on a Tuesday night debugging a 30-second API call. Four hours. The call touched 12 services: auth, inventory, pricing, three …
If you’re not scanning container images before they hit production, it’s only a matter of time before something ugly shows up in your …
EventBridge is the most underused AWS service. I’ll die on that hill. Teams will build these elaborate Rube Goldberg machines out of SNS topics, …
Operator SDK vs. kubebuilder: I pick kubebuilder every time. Operator SDK wraps kubebuilder anyway and adds a layer of abstraction that mostly just gets …
I use both. Terraform for multi-cloud, CDK when it’s pure AWS and the team knows TypeScript. That’s the short answer. But the long answer …
Platform engineering is DevOps done right. Or maybe it’s DevOps with a product mindset. Either way, it’s the recognition that telling …
CPU-based autoscaling is a lie for most web services. There, I said it.
I spent a painful week last year watching an HPA scale our API pods from 3 to …
VPNs are not zero trust. Stop calling them that.
I can’t count how many times I’ve sat in architecture reviews where someone points at a …
I got a call from a startup founder last year. “Our AWS bill just hit $47,000 and we have twelve engineers.” They’d been running for …
I’m going to say something that’ll upset people: if your developers have cluster-admin access in production, you’re running on …
I once inherited a project with a single main.tf that was over 3,000 lines long. No modules. No abstractions. Just one enormous file that deployed an …
ArgoCD won the GitOps war. I’ll say it. Flux is fine—it works, it’s CNCF graduated, it has its fans—but ArgoCD’s UI alone makes it …
I started learning Rust as someone who’d spent years writing Python scripts and Go services for cloud infrastructure. My first reaction was …
ECS is underrated. Most teams picking EKS don’t need it. I’ve been saying this for years, and I’ll keep saying it until the industry …
I’ve shipped Docker images to production for years now, and the single biggest improvement I’ve made wasn’t some fancy orchestration …
I’m going to be blunt here. If you’re running Kubernetes without network policies, every pod in your cluster can talk to every other pod. …
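Kubernetes allows all pod-to-pod traffic by default; that only changes once a NetworkPolicy selects a pod. A common starting point is a namespace-wide default-deny policy. The sketch below builds one as a Python dict and prints it as JSON, which `kubectl apply -f` accepts as readily as YAML. The policy name `default-deny-all` and the namespace `payments` are hypothetical placeholders, and policies are only enforced if the cluster's network plugin supports NetworkPolicy.

```python
import json


def default_deny_policy(namespace: str, name: str = "default-deny-all") -> dict:
    """Build a NetworkPolicy that blocks all ingress and egress for every pod in a namespace.

    An empty podSelector ({}) selects every pod in the namespace. Listing both
    policyTypes without any ingress/egress rules means no traffic is allowed.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # "payments" is a hypothetical namespace used only for illustration.
    print(json.dumps(default_deny_policy("payments"), indent=2))
```

Piping the script's output into `kubectl apply -f -` would create the policy. Denying egress also blocks DNS lookups, so in practice you would pair this with explicit allow rules (for example, egress to the cluster DNS service on port 53) before rolling it out.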