Chaos Engineering on AWS: Fault Injection Simulator Guide
You don’t know your system is resilient until you’ve broken it on purpose.
I believed our payment processing service was fault tolerant. …

In-depth guides, insights, and best practices for modern software engineering

I’m going to say something that’ll upset a lot of people: pandas had its run. Polars is just better.
I don’t mean that lightly. I …

I’ve spent years writing Python for DevOps tooling and Go for services. Python is a joy to write but painfully slow for anything compute-heavy. …

Gateway API is what Ingress should have been from day one.
I don’t say that lightly. I’ve spent years wrangling Kubernetes Ingress …

Bedrock is AWS finally getting AI right. I don’t say that lightly. I’ve watched AWS stumble through SageMaker’s complexity, watched …

If you’re not running scheduled terraform plan, you have drift. You just don’t know it yet.
I learned this the hard way. A colleague made …

Everything I’ve learned building on AWS since 2012, organized by domain.

This is the hub for everything I’ve written about Kubernetes. Whether you’re setting up your first cluster or optimizing a multi-tenant …

I’ve been running Kubernetes in production for years now, and there’s a specific kind of pain that only hits you once you cross the …

Last year I ported an image processing pipeline from JavaScript to Rust compiled to WebAssembly. The JS version took 1.2 seconds to apply a chain of …

Aurora Serverless v2 is what v1 should have been. I don’t say that lightly: I ran v1 in production for two years and spent more time fighting …

99.99% availability sounds great until you realize that’s 4 minutes and 19 seconds of downtime per month. Four minutes. That’s barely …

I mass-deleted requirements.txt files from a monorepo last month. Fourteen of them. Some had unpinned dependencies, some had pins from 2021, one had a …

NGINX Ingress is the Honda Civic of ingress controllers. Boring, reliable, gets the job done. I’ve deployed it on dozens of clusters and …

I deleted roughly 2,000 lines of orchestration code from our payment processing service last year. Replaced it with about 200 lines of Amazon States …

Most Terraform code has zero tests. That’s insane for something managing production infrastructure. We wouldn’t ship application code …

I spent four hours on a Tuesday night debugging a 30-second API call. Four hours. The call touched 12 services: auth, inventory, pricing, three …

If you’re not scanning container images before they hit production, it’s only a matter of time before something ugly shows up in your …

EventBridge is the most underused AWS service. I’ll die on that hill. Teams will build these elaborate Rube Goldberg machines out of SNS topics, …

Don’t optimize until you’ve profiled. I’ve watched teams rewrite entire modules that weren’t even the bottleneck. Weeks of …

Operator SDK vs kubebuilder: I pick kubebuilder every time. Operator SDK wraps kubebuilder anyway, adds a layer of abstraction that mostly just gets …

I got paged at 3am on a Tuesday because a Rust service I’d deployed two weeks earlier crashed hard. No graceful degradation, no useful error …

I use both. Terraform for multi-cloud, CDK when it’s pure AWS and the team knows TypeScript. That’s the short answer. But the long answer …

Platform engineering is DevOps done right. Or maybe it’s DevOps with a product mindset. Either way, it’s the recognition that telling …