<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Andrew Odendaal</title>
    <link>https://andrewodendaal.com/</link>
    <description>Hands-on technical guides on AWS, Kubernetes, Terraform, Rust, Python &amp; Go by Andrew Odendaal — cloud architecture and DevOps notes from production, since 2007.</description>
    <language>en-us</language>
    <lastBuildDate>Sat, 16 May 2026 02:33:23 &#43;0000</lastBuildDate>
    <atom:link href="https://andrewodendaal.com/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Rust Async Runtime Deep Dive: Tokio Architecture</title>
      <link>https://andrewodendaal.com/rust-async-runtime-tokio-architecture/</link>
      <pubDate>Fri, 15 May 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/rust-async-runtime-tokio-architecture/</guid>
      <description>&lt;p&gt;Tokio is Rust&amp;rsquo;s killer app for network services. I don&amp;rsquo;t say that lightly. After spending years building concurrent systems in &lt;a href=&#34;https://andrewodendaal.com/go-concurrency-patterns-microservices&#34;&gt;Go&lt;/a&gt; and other languages, Tokio changed how I think about async I/O. It&amp;rsquo;s not just a library — it&amp;rsquo;s a full runtime that turns Rust&amp;rsquo;s zero-cost futures into something you can actually build production services with.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been running Tokio in production for a while now, and I&amp;rsquo;ve hit enough walls to have opinions about how it works under the hood. This post is the deep dive I wish I&amp;rsquo;d had when I started. If you&amp;rsquo;re coming from my &lt;a href=&#34;https://andrewodendaal.com/rust-for-cloud-engineers-systems-programming&#34;&gt;Rust for cloud engineers&lt;/a&gt; piece, this is the natural next step.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Chaos Engineering on AWS: Fault Injection Simulator Guide</title>
      <link>https://andrewodendaal.com/chaos-engineering-aws-fault-injection-simulator/</link>
      <pubDate>Tue, 12 May 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/chaos-engineering-aws-fault-injection-simulator/</guid>
      <description>&lt;p&gt;You don&amp;rsquo;t know your system is resilient until you&amp;rsquo;ve broken it on purpose.&lt;/p&gt;
&lt;p&gt;I believed our payment processing service was fault tolerant. We ran multi-AZ. We had health checks. We had auto scaling. We had all the boxes ticked on the Well-Architected review. Then us-east-1b had a networking event on a Tuesday afternoon, and we watched a service that was supposed to gracefully fail over instead fall flat on its face. The load balancer kept routing to unhealthy targets for nearly four minutes because our health check intervals were too generous. The database failover triggered, but the application&amp;rsquo;s connection pool held stale connections for another two minutes after that. Six minutes of degraded service for a payment processor. That&amp;rsquo;s the kind of thing that gets you a phone call from someone whose title starts with &amp;ldquo;Chief.&amp;rdquo;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Python Data Pipelines with Polars and DuckDB</title>
      <link>https://andrewodendaal.com/python-data-pipelines-polars-duckdb/</link>
      <pubDate>Fri, 08 May 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/python-data-pipelines-polars-duckdb/</guid>
      <description>&lt;p&gt;I&amp;rsquo;m going to say something that&amp;rsquo;ll upset a lot of people: pandas had its run. Polars is just better.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t mean that lightly. I spent years writing pandas code. I taught pandas to junior developers. I built production systems on pandas. But after migrating several data pipelines to Polars and DuckDB over the past year, I can&amp;rsquo;t go back. The performance difference isn&amp;rsquo;t incremental — it&amp;rsquo;s a different universe.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Why I Built Wyn: A Programming Language That Compiles to C</title>
      <link>https://andrewodendaal.com/why-i-built-wyn-programming-language/</link>
      <pubDate>Wed, 06 May 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/why-i-built-wyn-programming-language/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve spent years writing Python for DevOps tooling and Go for services. Python is a joy to write but painfully slow for anything compute-heavy. Go is fast but verbose — error handling alone accounts for a third of my code. Rust is powerful, but the learning curve is brutal for the kind of tools I build daily.&lt;/p&gt;
&lt;p&gt;So I built &lt;a href=&#34;https://wynlang.com/&#34;&gt;Wyn&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Wyn compiles to C, produces 49KB binaries, builds in under a second, and has a syntax that feels like Python with types. No garbage collector, no VM, no runtime. Just native code.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Kubernetes Gateway API: The Future of Ingress</title>
      <link>https://andrewodendaal.com/kubernetes-gateway-api-future-of-ingress/</link>
      <pubDate>Tue, 05 May 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/kubernetes-gateway-api-future-of-ingress/</guid>
      <description>&lt;p&gt;Gateway API is what Ingress should have been from day one.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t say that lightly. I&amp;rsquo;ve spent years wrangling Kubernetes Ingress resources, writing controller-specific annotations, and debugging routing issues that only existed because the Ingress spec was too simple for real-world traffic management. Gateway API fixes nearly every complaint I&amp;rsquo;ve ever had, and if you&amp;rsquo;re still running pure Ingress in production, it&amp;rsquo;s time to start planning your migration.&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t a &amp;ldquo;maybe someday&amp;rdquo; technology. Gateway API went GA with its v1.0 release in October 2023, major controllers already support it, and the ecosystem is moving fast. I&amp;rsquo;ve migrated three production clusters over the past year and I&amp;rsquo;m not looking back.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AWS Bedrock: Building AI Applications with Foundation Models</title>
      <link>https://andrewodendaal.com/aws-bedrock-building-ai-applications-foundation-models/</link>
      <pubDate>Fri, 01 May 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/aws-bedrock-building-ai-applications-foundation-models/</guid>
      <description>&lt;p&gt;Bedrock is AWS finally getting AI right. I don&amp;rsquo;t say that lightly. I&amp;rsquo;ve watched AWS stumble through SageMaker&amp;rsquo;s complexity, watched teams burn months trying to self-host open-source models on EC2, and watched startups hemorrhage money on OpenAI API calls with zero fallback plan. Bedrock cuts through all of that. You pick a foundation model, call an API, and you&amp;rsquo;re building. No infrastructure. No GPU capacity planning. No model weight management.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Infrastructure Drift Detection and Remediation</title>
      <link>https://andrewodendaal.com/infrastructure-drift-detection-remediation/</link>
      <pubDate>Sat, 25 Apr 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/infrastructure-drift-detection-remediation/</guid>
      <description>&lt;p&gt;If you&amp;rsquo;re not running &lt;code&gt;terraform plan&lt;/code&gt; on a schedule, you have drift. You just don&amp;rsquo;t know it yet.&lt;/p&gt;
&lt;p&gt;I learned this the hard way. A colleague made a &amp;ldquo;quick fix&amp;rdquo; in the AWS console — changed a security group rule to unblock a vendor integration. Totally reasonable in the moment. Nobody updated the Terraform code. Three weeks later, I ran a deploy that included security group changes for a different service. Terraform saw the console change as drift, reverted it, and killed the vendor connection. That vendor connection happened to feed data into our payment processing pipeline. Two hours of downtime, a war room, and a very uncomfortable post-mortem later, we had a new rule: nothing touches production infrastructure outside of code. Ever.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AWS Architecture Guide: Production Patterns and Best Practices</title>
      <link>https://andrewodendaal.com/aws-architecture-guide/</link>
      <pubDate>Fri, 24 Apr 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/aws-architecture-guide/</guid>
      <description>&lt;p&gt;Everything I&amp;rsquo;ve learned building on AWS since 2012, organized by domain.&lt;/p&gt;
&lt;h2 id=&#34;serverless&#34;&gt;Serverless&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/aws-lambda-cold-starts-causes-measurement-mitigation&#34;&gt;AWS Lambda Cold Starts: Causes, Measurement, and Mitigation&lt;/a&gt; — The definitive cold start guide&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/aws-step-functions-orchestrating-complex-workflows&#34;&gt;AWS Step Functions: Orchestrating Complex Workflows&lt;/a&gt; — State machine patterns&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/aws-eventbridge-event-driven-architectures&#34;&gt;AWS EventBridge: Event-Driven Architectures&lt;/a&gt; — Building event-driven systems&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;containers&#34;&gt;Containers&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/aws-ecs-vs-eks-choosing-container-orchestrator-2026&#34;&gt;AWS ECS vs EKS: Choosing Your Container Orchestrator&lt;/a&gt; — When to use each&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/building-production-ready-docker-images-multi-stage&#34;&gt;Building Production-Ready Docker Images&lt;/a&gt; — Multi-stage builds and distroless&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;data--ai&#34;&gt;Data &amp;amp; AI&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/aws-aurora-serverless-v2-architecture-performance&#34;&gt;AWS Aurora Serverless v2: Architecture and Performance&lt;/a&gt; — Serverless database deep dive&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/aws-bedrock-building-ai-applications-foundation-models&#34;&gt;AWS Bedrock: Building AI Applications&lt;/a&gt; — Foundation models in production&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;governance--cost&#34;&gt;Governance &amp;amp; Cost&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/aws-cost-optimization-techniques-that-work&#34;&gt;AWS Cost Optimization Techniques That Work&lt;/a&gt; — Practical cost reduction&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/aws-organizations-control-tower-multi-account-strategy&#34;&gt;AWS Organizations and Control Tower&lt;/a&gt; — Multi-account strategy&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/implementing-zero-trust-networking-aws&#34;&gt;Implementing Zero Trust Networking on AWS&lt;/a&gt; — Network security architecture&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;infrastructure-as-code&#34;&gt;Infrastructure as Code&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/aws-cdk-vs-terraform-practical-comparison-2026&#34;&gt;AWS CDK vs Terraform: Practical Comparison&lt;/a&gt; — Choosing your IaC tool&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/terraform-state-management-best-practices-2026&#34;&gt;Terraform State Management Best Practices&lt;/a&gt; — Remote state and locking&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/terraform-modules-design-patterns-reusable-infrastructure&#34;&gt;Terraform Modules: Design Patterns&lt;/a&gt; — Reusable infrastructure&lt;/li&gt;
&lt;/ul&gt;</description>
    </item>
    <item>
      <title>Kubernetes Guide: From Basics to Production Operations</title>
      <link>https://andrewodendaal.com/kubernetes-guide/</link>
      <pubDate>Thu, 23 Apr 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/kubernetes-guide/</guid>
      <description>&lt;p&gt;This is the hub for everything I&amp;rsquo;ve written about Kubernetes. Whether you&amp;rsquo;re setting up your first cluster or optimizing a multi-tenant production environment, start here.&lt;/p&gt;
&lt;h2 id=&#34;cluster-security&#34;&gt;Cluster Security&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/kubernetes-rbac-deep-dive-multi-tenant-clusters&#34;&gt;Kubernetes RBAC Deep Dive: Multi-Tenant Clusters&lt;/a&gt; — Role-based access control for teams sharing a cluster&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/kubernetes-network-policies-practical-security-guide&#34;&gt;Kubernetes Network Policies: Practical Security Guide&lt;/a&gt; — Pod-to-pod traffic control with Calico and Cilium&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/kubernetes-ebpf-observability-security&#34;&gt;Kubernetes eBPF Observability and Security&lt;/a&gt; — Runtime security with eBPF&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;scaling--performance&#34;&gt;Scaling &amp;amp; Performance&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/kubernetes-hpa-custom-metrics-autoscaling&#34;&gt;Kubernetes HPA with Custom Metrics&lt;/a&gt; — Autoscaling beyond CPU with Prometheus metrics&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/k8s-scaling-mastery-manual-hpa-metrics-apis&#34;&gt;K8s Scaling Mastery: Manual, HPA &amp;amp; Metrics APIs&lt;/a&gt; — Complete scaling overview&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/kubernetes-multi-cluster-management-fleet-rancher&#34;&gt;Kubernetes Multi-Cluster Management&lt;/a&gt; — Fleet and Rancher for multi-cluster&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;networking--ingress&#34;&gt;Networking &amp;amp; Ingress&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/kubernetes-ingress-controllers-nginx-traefik-istio&#34;&gt;Kubernetes Ingress Controllers: NGINX, Traefik, Istio&lt;/a&gt; — Choosing the right ingress controller&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/kubernetes-gateway-api-future-of-ingress&#34;&gt;Kubernetes Gateway API: The Future of Ingress&lt;/a&gt; — Gateway API vs Ingress resources&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;advanced-operations&#34;&gt;Advanced Operations&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/kubernetes-operators-custom-controllers-go&#34;&gt;Kubernetes Operators and Custom Controllers in Go&lt;/a&gt; — Building operators with Kubebuilder&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/implementing-slos-error-budgets-practice&#34;&gt;Implementing SLOs and Error Budgets in Practice&lt;/a&gt; — SRE practices for K8s workloads&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://andrewodendaal.com/distributed-tracing-opentelemetry-complete-guide&#34;&gt;Distributed Tracing with OpenTelemetry&lt;/a&gt; — Observability for microservices&lt;/li&gt;
&lt;/ul&gt;</description>
    </item>
    <item>
      <title>Kubernetes Multi-Cluster Management with Fleet and Rancher</title>
      <link>https://andrewodendaal.com/kubernetes-multi-cluster-management-fleet-rancher/</link>
      <pubDate>Wed, 22 Apr 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/kubernetes-multi-cluster-management-fleet-rancher/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been running Kubernetes in production for years now, and there&amp;rsquo;s a specific kind of pain that only hits you once you cross the threshold from &amp;ldquo;a couple of clusters&amp;rdquo; to &amp;ldquo;wait, how many do we have again?&amp;rdquo; That threshold, for me, was eight clusters. Eight clusters across three cloud providers and two on-prem data centers. And every single one of them had drifted into its own little snowflake.&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t a theoretical post. I&amp;rsquo;m going to walk through how I used Fleet and Rancher to wrangle that mess back into something manageable, and why I think GitOps-driven multi-cluster management is the only sane approach once you&amp;rsquo;re past three or four clusters.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Rust WebAssembly: Building High-Performance Web Applications</title>
      <link>https://andrewodendaal.com/rust-webassembly-high-performance-web-apps/</link>
      <pubDate>Sat, 18 Apr 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/rust-webassembly-high-performance-web-apps/</guid>
      <description>&lt;p&gt;Last year I ported an image processing pipeline from JavaScript to Rust compiled to WebAssembly. The JS version took 1.2 seconds to apply a chain of filters — blur, sharpen, color correction, resize — to a 4K image in the browser. The Rust Wasm version did the same work in 58 milliseconds. Not a typo. A 20x speedup, running in the same browser, on the same machine, called from the same React app.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AWS Aurora Serverless v2: Architecture and Performance Guide</title>
      <link>https://andrewodendaal.com/aws-aurora-serverless-v2-architecture-performance/</link>
      <pubDate>Wed, 15 Apr 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/aws-aurora-serverless-v2-architecture-performance/</guid>
      <description>&lt;p&gt;Aurora Serverless v2 is what v1 should have been. I don&amp;rsquo;t say that lightly — I ran v1 in production for two years and spent more time fighting its scaling quirks than actually building features. The pausing, the cold starts, the inability to add read replicas. It was a product that promised serverless databases and delivered something that felt like a managed instance with extra steps.&lt;/p&gt;
&lt;p&gt;When v2 landed, I was skeptical. AWS has a habit of slapping &amp;ldquo;v2&amp;rdquo; on things that are only marginally better. But I migrated a production PostgreSQL workload from provisioned RDS to Aurora Serverless v2 last year, and it genuinely changed how I think about &lt;a href=&#34;https://andrewodendaal.com/database-scaling-strategies&#34;&gt;database scaling strategies&lt;/a&gt;. The scaling is fast, granular, and — this is the part that surprised me — it doesn&amp;rsquo;t drop connections when it scales. That alone makes it a different product entirely.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Implementing SLOs and Error Budgets in Practice</title>
      <link>https://andrewodendaal.com/implementing-slos-error-budgets-practice/</link>
      <pubDate>Sat, 11 Apr 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/implementing-slos-error-budgets-practice/</guid>
      <description>&lt;p&gt;99.99% availability sounds great until you realize that&amp;rsquo;s 4 minutes and 19 seconds of downtime per month. Four minutes. That&amp;rsquo;s barely enough time to get paged, open your laptop, authenticate to the VPN, and find the right dashboard. You haven&amp;rsquo;t even started diagnosing anything yet.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve watched teams commit to four-nines SLOs because someone in a leadership meeting said &amp;ldquo;we need to be best in class.&amp;rdquo; No capacity planning. No discussion about what it would cost. No understanding that the jump from 99.9% to 99.99% isn&amp;rsquo;t a 0.09% improvement — it&amp;rsquo;s a 10x reduction in your margin for error.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Python Packaging in 2026: uv, Poetry, and the Modern Ecosystem</title>
      <link>https://andrewodendaal.com/python-packaging-2026-uv-poetry-modern-ecosystem/</link>
      <pubDate>Wed, 08 Apr 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/python-packaging-2026-uv-poetry-modern-ecosystem/</guid>
      <description>&lt;p&gt;I mass-deleted &lt;code&gt;requirements.txt&lt;/code&gt; files from a monorepo last month. Fourteen of them. Some had unpinned dependencies, some had pins from 2021, one had a comment that said &lt;code&gt;# TODO: fix this&lt;/code&gt; next to a package that no longer exists on PyPI. Nobody cried. The CI pipeline didn&amp;rsquo;t break. We&amp;rsquo;d already moved everything to &lt;code&gt;pyproject.toml&lt;/code&gt; and uv.&lt;/p&gt;
&lt;p&gt;Python packaging has been a punchline for years. &amp;ldquo;It&amp;rsquo;s 2024 and we still can&amp;rsquo;t install packages properly&amp;rdquo; was a meme that wrote itself. But here&amp;rsquo;s the thing — it&amp;rsquo;s 2026 now, and the landscape genuinely changed. Not incrementally. Fundamentally. uv showed up and rewrote the rules. Poetry matured into something reliable. &lt;code&gt;pyproject.toml&lt;/code&gt; won. The old &lt;code&gt;setup.py&lt;/code&gt; + &lt;code&gt;requirements.txt&lt;/code&gt; + &lt;code&gt;virtualenv&lt;/code&gt; + &lt;code&gt;pip&lt;/code&gt; stack isn&amp;rsquo;t dead, but it&amp;rsquo;s legacy. If you&amp;rsquo;re starting a new project today and reaching for that combo, you&amp;rsquo;re choosing the hard path for no reason.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Kubernetes Ingress Controllers: NGINX vs Traefik vs Istio Gateway</title>
      <link>https://andrewodendaal.com/kubernetes-ingress-controllers-nginx-traefik-istio/</link>
      <pubDate>Sat, 04 Apr 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/kubernetes-ingress-controllers-nginx-traefik-istio/</guid>
      <description>&lt;p&gt;NGINX Ingress is the Honda Civic of ingress controllers. Boring, reliable, gets the job done. I&amp;rsquo;ve deployed it on dozens of clusters and it&amp;rsquo;s never been the thing that woke me up at 3am. That&amp;rsquo;s the highest compliment I can give any piece of infrastructure.&lt;/p&gt;
&lt;p&gt;But boring doesn&amp;rsquo;t mean it&amp;rsquo;s always the right choice. I&amp;rsquo;ve spent the last three years running all three major ingress options — NGINX Ingress Controller, Traefik, and Istio&amp;rsquo;s Gateway — across production clusters of varying sizes. I migrated one platform from NGINX to Istio and nearly lost my mind in the process. I&amp;rsquo;ve also watched Traefik quietly become the best option for a lot of teams, even though nobody talks about it at conferences.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AWS Step Functions: Orchestrating Complex Workflows</title>
      <link>https://andrewodendaal.com/aws-step-functions-orchestrating-complex-workflows/</link>
      <pubDate>Wed, 01 Apr 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/aws-step-functions-orchestrating-complex-workflows/</guid>
      <description>&lt;p&gt;I deleted roughly 2,000 lines of orchestration code from our payment processing service last year. Replaced it with about 200 lines of Amazon States Language JSON. The system got more reliable, not less. That&amp;rsquo;s the short version of why I think Step Functions is one of the most underappreciated services in AWS.&lt;/p&gt;
&lt;p&gt;The longer version involves a 3am incident, a chain of Lambda functions calling each other through direct invocation, and a payment that got charged twice because nobody could tell where the workflow had actually failed.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Terraform Testing: Unit, Integration, and End-to-End</title>
      <link>https://andrewodendaal.com/terraform-testing-unit-integration-e2e/</link>
      <pubDate>Sat, 28 Mar 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/terraform-testing-unit-integration-e2e/</guid>
      <description>&lt;p&gt;Most Terraform code has zero tests. That&amp;rsquo;s insane for something managing production infrastructure. We wouldn&amp;rsquo;t ship application code without tests — why do we treat the thing that creates our VPCs, databases, and IAM roles like it&amp;rsquo;s somehow less important?&lt;/p&gt;
&lt;p&gt;I learned this lesson the painful way. Last year I pushed a Terraform change that modified a security group rule on a shared networking stack. The plan looked clean. Added an ingress rule, removed an old one. Terraform showed exactly two changes. I approved it, applied it, and went to lunch. By the time I got back, three services were down. The &amp;ldquo;old&amp;rdquo; rule I removed was the one allowing traffic between our application tier and the database subnet. The plan was technically correct — it did exactly what I told it to. But I&amp;rsquo;d told it the wrong thing, and nothing in our pipeline caught it.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Distributed Tracing with OpenTelemetry: A Complete Guide</title>
      <link>https://andrewodendaal.com/distributed-tracing-opentelemetry-complete-guide/</link>
      <pubDate>Wed, 25 Mar 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/distributed-tracing-opentelemetry-complete-guide/</guid>
      <description>&lt;p&gt;I spent four hours on a Tuesday night debugging a 30-second API call. Four hours. The call touched 12 services — auth, inventory, pricing, three different caching layers, a recommendation engine, two legacy adapters, and a handful of internal APIs that nobody remembered writing. Logs told me nothing useful. Metrics showed elevated latency somewhere in the pricing path, but &amp;ldquo;somewhere&amp;rdquo; isn&amp;rsquo;t actionable at 11pm when your on-call phone won&amp;rsquo;t stop buzzing.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Container Security Scanning in CI/CD Pipelines</title>
      <link>https://andrewodendaal.com/container-security-scanning-cicd-pipelines/</link>
      <pubDate>Sat, 21 Mar 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/container-security-scanning-cicd-pipelines/</guid>
      <description>&lt;p&gt;If you&amp;rsquo;re not scanning container images before they hit production, it&amp;rsquo;s only a matter of time before something ugly shows up in your environment. I learned this the hard way, and I&amp;rsquo;m going to walk you through exactly how I set up container security scanning in CI/CD pipelines so you don&amp;rsquo;t repeat my mistakes.&lt;/p&gt;
&lt;h3 id=&#34;the-wake-up-call&#34;&gt;The Wake-Up Call&lt;/h3&gt;
&lt;p&gt;About two years ago, I was running a handful of microservices on ECS. Everything was humming along. Deployments were smooth, monitoring looked clean, the team was shipping features weekly. Life was good.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AWS EventBridge: Building Event-Driven Architectures</title>
      <link>https://andrewodendaal.com/aws-eventbridge-event-driven-architectures/</link>
      <pubDate>Tue, 17 Mar 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/aws-eventbridge-event-driven-architectures/</guid>
      <description>&lt;p&gt;EventBridge is the most underused AWS service. I&amp;rsquo;ll die on that hill. Teams will build these elaborate Rube Goldberg machines out of SNS topics, SQS queues, and Lambda functions stitched together with duct tape and prayers, when EventBridge would&amp;rsquo;ve given them a cleaner architecture in a fraction of the time.&lt;/p&gt;
&lt;p&gt;I know this because I was on one of those teams. About two years ago I inherited a system where a single order placement triggered a cascade of 14 SNS topics fanning out to 23 SQS queues. Nobody could tell me what happened after an order was placed unless they opened a spreadsheet. A spreadsheet. For message routing. When I asked why they hadn&amp;rsquo;t used EventBridge, the answer was &amp;ldquo;we started before it existed and never migrated.&amp;rdquo; Fair enough. But the pain was real — we&amp;rsquo;d get phantom duplicate processing, messages landing in DLQs with no context about where they came from, and debugging meant grepping through six different CloudWatch log groups hoping to find a correlation ID someone remembered to pass along.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Python Performance Optimization: Profiling and Tuning Guide</title>
      <link>https://andrewodendaal.com/python-performance-optimization-profiling-tuning/</link>
      <pubDate>Sat, 14 Mar 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/python-performance-optimization-profiling-tuning/</guid>
      <description>&lt;p&gt;Don&amp;rsquo;t optimize until you&amp;rsquo;ve profiled. I&amp;rsquo;ve watched teams rewrite entire modules that weren&amp;rsquo;t even the bottleneck. Weeks of work, zero measurable improvement. The code was &amp;ldquo;cleaner,&amp;rdquo; I guess, but the endpoint was still slow because the actual problem was three database queries hiding inside a template tag.&lt;/p&gt;
&lt;p&gt;I learned this the hard way on a Django project a couple of years back. We had a view that took 4+ seconds to render. The team was convinced it was the serialization layer — we were building a big nested JSON response, lots of related objects. Someone had already started rewriting the serializers when I asked if anyone had actually profiled it. Blank stares.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Kubernetes Operators: Building Custom Controllers in Go</title>
      <link>https://andrewodendaal.com/kubernetes-operators-custom-controllers-go/</link>
      <pubDate>Tue, 10 Mar 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/kubernetes-operators-custom-controllers-go/</guid>
      <description>&lt;p&gt;Operator SDK vs Kubebuilder — I pick Kubebuilder every time. Operator SDK wraps Kubebuilder anyway, adds a layer of abstraction that mostly just gets in the way, and the documentation lags behind. Kubebuilder gives you the scaffolding, the code generation, and then gets out of your face. That&amp;rsquo;s what I want from a framework.&lt;/p&gt;
&lt;p&gt;I built my first operator about two years ago. The task: automate database provisioning for development teams. Every time a team needed a new PostgreSQL instance, they&amp;rsquo;d file a Jira ticket, wait for the platform team to provision it, get credentials back in a Slack DM (yes, really), and manually configure their app. The whole cycle took three to five days. Sometimes longer if someone was on leave.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Rust Error Handling Patterns for Production Applications</title>
      <link>https://andrewodendaal.com/rust-error-handling-patterns-production/</link>
      <pubDate>Sat, 07 Mar 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/rust-error-handling-patterns-production/</guid>
      <description>&lt;p&gt;I got paged at 3am on a Tuesday because a Rust service I&amp;rsquo;d deployed two weeks earlier crashed hard. No graceful degradation, no useful error message in the logs. Just a panic backtrace pointing at line 247 of our config parser: &lt;code&gt;.unwrap()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The config file had a trailing comma that our test fixtures didn&amp;rsquo;t cover. One &lt;code&gt;.unwrap()&lt;/code&gt; on a &lt;code&gt;serde_json::from_str&lt;/code&gt; call, and the whole service went down. I sat there in the dark, laptop balanced on my knees, fixing a one-line bug that should never have made it past code review.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AWS CDK vs Terraform: A Practical Comparison in 2026</title>
      <link>https://andrewodendaal.com/aws-cdk-vs-terraform-practical-comparison-2026/</link>
      <pubDate>Tue, 03 Mar 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/aws-cdk-vs-terraform-practical-comparison-2026/</guid>
      <description>&lt;p&gt;I use both. Terraform for multi-cloud, CDK when it&amp;rsquo;s pure AWS and the team knows TypeScript. That&amp;rsquo;s the short answer. But the long answer has a lot more nuance, and I&amp;rsquo;ve earned that nuance the hard way — including one migration that nearly broke a team&amp;rsquo;s shipping cadence for two months.&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t a &amp;ldquo;which one is better&amp;rdquo; post. I don&amp;rsquo;t think that question makes sense without context. What I can tell you is where each tool shines, where each one will bite you, and how to pick the right one for your situation in 2026. I&amp;rsquo;ve shipped production infrastructure with both, maintained both in anger, and migrated between them. Here&amp;rsquo;s what I&amp;rsquo;ve learned.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Platform Engineering: Building an Internal Developer Platform</title>
      <link>https://andrewodendaal.com/platform-engineering-internal-developer-platform/</link>
      <pubDate>Sat, 28 Feb 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/platform-engineering-internal-developer-platform/</guid>
      <description>&lt;p&gt;Platform engineering is DevOps done right. Or maybe it&amp;rsquo;s DevOps with a product mindset. Either way, it&amp;rsquo;s the recognition that telling every team to &amp;ldquo;own their own infrastructure&amp;rdquo; without giving them decent tooling is a recipe for chaos. I&amp;rsquo;ve watched organisations try the &amp;ldquo;you build it, you run it&amp;rdquo; approach and end up with fifteen different ways to deploy a container, nine half-configured Terraform repos, and developers who spend more time fighting YAML than writing features.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Kubernetes Horizontal Pod Autoscaling with Custom Metrics</title>
      <link>https://andrewodendaal.com/kubernetes-hpa-custom-metrics-autoscaling/</link>
      <pubDate>Wed, 25 Feb 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/kubernetes-hpa-custom-metrics-autoscaling/</guid>
      <description>&lt;p&gt;CPU-based autoscaling is a lie for most web services. There, I said it.&lt;/p&gt;
&lt;p&gt;I spent a painful week last year watching an HPA scale our API pods from 3 to 15 based on CPU utilization. The dashboards looked great — CPU was being &amp;ldquo;managed.&amp;rdquo; Meanwhile, the service was falling over because every single one of those 15 pods was fighting over a connection pool limited to 50 database connections. More pods made the problem worse. We were autoscaling ourselves into an outage.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Go Concurrency Patterns for Microservices</title>
      <link>https://andrewodendaal.com/go-concurrency-patterns-microservices/</link>
      <pubDate>Sat, 21 Feb 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/go-concurrency-patterns-microservices/</guid>
      <description>&lt;p&gt;Goroutines are cheap. Goroutine leaks are not.&lt;/p&gt;
&lt;p&gt;I learned this the hard way at 2am on a Tuesday, staring at Grafana dashboards showing one of our services consuming 40GB of RAM and climbing. The service normally sat around 500MB. We&amp;rsquo;d shipped a change three days earlier — a seemingly innocent fan-out pattern to parallelize calls to a downstream API. The code looked fine. Reviews passed. Tests passed. What we&amp;rsquo;d missed was that when the downstream service timed out, nothing was cancelling the spawned goroutines. They just&amp;hellip; accumulated. Thousands per minute, each holding onto its request body and response buffer, waiting for a context that would never expire because we&amp;rsquo;d used &lt;code&gt;context.Background()&lt;/code&gt; instead of propagating the parent context.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Implementing Zero-Trust Networking on AWS</title>
      <link>https://andrewodendaal.com/implementing-zero-trust-networking-aws/</link>
      <pubDate>Wed, 18 Feb 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/implementing-zero-trust-networking-aws/</guid>
      <description>&lt;p&gt;VPNs are not zero trust. Stop calling them that.&lt;/p&gt;
&lt;p&gt;I can&amp;rsquo;t count how many times I&amp;rsquo;ve sat in architecture reviews where someone points at a Site-to-Site VPN or a Client VPN endpoint and says &amp;ldquo;we&amp;rsquo;re zero trust.&amp;rdquo; No. You&amp;rsquo;ve built a tunnel. A tunnel that, once you&amp;rsquo;re inside, gives you access to everything on the network. That&amp;rsquo;s the opposite of zero trust. That&amp;rsquo;s a castle with a drawbridge and nothing inside but open hallways.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Python Type Hints and Static Analysis in Production Codebases</title>
      <link>https://andrewodendaal.com/python-type-hints-static-analysis-production/</link>
      <pubDate>Sat, 14 Feb 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/python-type-hints-static-analysis-production/</guid>
      <description>&lt;p&gt;If you&amp;rsquo;re writing Python without type hints in 2026, you&amp;rsquo;re making life harder for everyone — including future you. I held out for a while. I liked Python&amp;rsquo;s flexibility, the duck typing, the &amp;ldquo;we&amp;rsquo;re all consenting adults here&amp;rdquo; philosophy. Then a production bug cost my team three days of debugging, and I changed my mind permanently.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m going to walk through how I&amp;rsquo;ve adopted type hints across production codebases, the tooling that makes it practical, and the patterns that actually matter versus the ones that are just academic noise.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AWS Cost Optimization: 15 Techniques That Actually Work</title>
      <link>https://andrewodendaal.com/aws-cost-optimization-techniques-that-work/</link>
      <pubDate>Tue, 10 Feb 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/aws-cost-optimization-techniques-that-work/</guid>
      <description>&lt;p&gt;I got a call from a startup founder last year. &amp;ldquo;Our AWS bill just hit $47,000 and we have twelve engineers.&amp;rdquo; They&amp;rsquo;d been running for about eighteen months, never really looked at the bill, and suddenly it was eating their runway. I spent a week inside their account. We cut it to $28,000. That&amp;rsquo;s a 40% reduction, and honestly most of it was embarrassingly obvious stuff.&lt;/p&gt;
&lt;p&gt;That experience crystallized something I&amp;rsquo;d been thinking about for a while: most AWS cost problems aren&amp;rsquo;t sophisticated. They&amp;rsquo;re neglect. People provision things, forget about them, and the meter keeps running. The fixes aren&amp;rsquo;t glamorous either — they&amp;rsquo;re methodical, sometimes tedious, and they work.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Kubernetes RBAC Deep Dive: Securing Multi-Tenant Clusters</title>
      <link>https://andrewodendaal.com/kubernetes-rbac-deep-dive-multi-tenant-clusters/</link>
      <pubDate>Sat, 07 Feb 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/kubernetes-rbac-deep-dive-multi-tenant-clusters/</guid>
      <description>&lt;p&gt;I&amp;rsquo;m going to say something that&amp;rsquo;ll upset people: if your developers have cluster-admin access in production, you&amp;rsquo;re running on borrowed time. I don&amp;rsquo;t care how small your team is. I don&amp;rsquo;t care if &amp;ldquo;everyone&amp;rsquo;s responsible.&amp;rdquo; It&amp;rsquo;s insane, and I&amp;rsquo;ve got the scars to prove it.&lt;/p&gt;
&lt;p&gt;This article is the RBAC deep dive I wish I&amp;rsquo;d had before a developer on my team ran &lt;code&gt;kubectl delete namespace production-api&lt;/code&gt; on a Friday afternoon. Not maliciously. He thought he was pointed at his local minikube. He wasn&amp;rsquo;t. That namespace had 14 services, and we spent the weekend rebuilding it from manifests that were — let&amp;rsquo;s be generous — &amp;ldquo;mostly&amp;rdquo; up to date.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Terraform Modules: Design Patterns for Reusable Infrastructure</title>
      <link>https://andrewodendaal.com/terraform-modules-design-patterns-reusable-infrastructure/</link>
      <pubDate>Tue, 03 Feb 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/terraform-modules-design-patterns-reusable-infrastructure/</guid>
      <description>&lt;p&gt;I once inherited a project with a single &lt;code&gt;main.tf&lt;/code&gt; that was over 3,000 lines long. No modules. No abstractions. Just one enormous file that deployed an entire production environment — VPCs, ECS clusters, RDS instances, Lambda functions, IAM roles — all jammed together with hardcoded values and copy-pasted blocks. Changing a security group rule meant scrolling for five minutes and praying you edited the right resource. It was, without exaggeration, the worst Terraform I&amp;rsquo;ve ever seen.&lt;/p&gt;</description>
    </item>
    <item>
      <title>GitOps with ArgoCD: From Zero to Production</title>
      <link>https://andrewodendaal.com/gitops-argocd-zero-to-production/</link>
      <pubDate>Sat, 31 Jan 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/gitops-argocd-zero-to-production/</guid>
      <description>&lt;p&gt;ArgoCD won the GitOps war. I&amp;rsquo;ll say it. Flux is fine—it works, it&amp;rsquo;s CNCF graduated, it has its fans—but ArgoCD&amp;rsquo;s UI alone makes it worth choosing. When something&amp;rsquo;s out of sync at 2am, I don&amp;rsquo;t want to be parsing CLI output. I want to click on a resource tree and see exactly what drifted.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been running ArgoCD in production across multiple clusters for a couple of years now, and this is the guide I wish I&amp;rsquo;d had when I started. We&amp;rsquo;ll go from a fresh install to a production-grade setup with app-of-apps, RBAC, SSO, multi-cluster management, and sane sync policies.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Rust for Cloud Engineers: Why Systems Programming Matters</title>
      <link>https://andrewodendaal.com/rust-for-cloud-engineers-systems-programming/</link>
      <pubDate>Tue, 27 Jan 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/rust-for-cloud-engineers-systems-programming/</guid>
      <description>&lt;p&gt;I started learning Rust as someone who&amp;rsquo;d spent years writing Python scripts and Go services for cloud infrastructure. My first reaction was honestly frustration — the borrow checker felt like a compiler that existed purely to reject my code. But something kept pulling me back. The binaries were tiny. The startup times were instant. And once my code compiled, it just&amp;hellip; worked. No runtime panics at 3am. No mysterious memory leaks creeping up after a week in production.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AWS ECS vs EKS: Choosing the Right Container Orchestrator in 2026</title>
      <link>https://andrewodendaal.com/aws-ecs-vs-eks-choosing-container-orchestrator-2026/</link>
      <pubDate>Sat, 24 Jan 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/aws-ecs-vs-eks-choosing-container-orchestrator-2026/</guid>
      <description>&lt;p&gt;ECS is underrated. Most teams picking EKS don&amp;rsquo;t need it. I&amp;rsquo;ve been saying this for years, and I&amp;rsquo;ll keep saying it until the industry stops treating Kubernetes as the default answer to every container question.&lt;/p&gt;
&lt;p&gt;I watched a team — smart engineers, solid product — choose EKS for what was essentially a three-service CRUD application behind an ALB. They&amp;rsquo;d read the blog posts, watched the conference talks, and decided Kubernetes was the future. Three months later they were still stabilizing the cluster. Not building features. Not shipping value. Debugging Helm chart conflicts, fighting with the AWS VPC CNI plugin, and trying to understand why their pods kept getting evicted. The application itself worked fine. The orchestration layer was the problem.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Building Production-Ready Docker Images: A Multi-Stage Build Guide</title>
      <link>https://andrewodendaal.com/building-production-ready-docker-images-multi-stage/</link>
      <pubDate>Tue, 20 Jan 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/building-production-ready-docker-images-multi-stage/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve shipped Docker images to production for years now, and the single biggest improvement I&amp;rsquo;ve made wasn&amp;rsquo;t some fancy orchestration tool or a new CI platform. It was learning to write proper multi-stage Dockerfiles. My CI pipeline used to spend 20 minutes rebuilding a bloated 2GB image on every push. After switching to multi-stage builds, that image dropped to 45MB and builds finished in under 3 minutes. That&amp;rsquo;s not a typo.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Python Async Programming: asyncio, Tasks, and Real-World Patterns</title>
      <link>https://andrewodendaal.com/python-async-programming-asyncio-patterns/</link>
      <pubDate>Fri, 16 Jan 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/python-async-programming-asyncio-patterns/</guid>
      <description>&lt;p&gt;I avoided asyncio for years. Callbacks, event loops, futures — it all felt like unnecessary complexity when threads worked fine. Then we had an API endpoint making 200 sequential HTTP calls to an upstream service. 45 seconds per request. We threw &lt;code&gt;asyncio.gather&lt;/code&gt; at it and the whole thing dropped to 3 seconds. That was the moment it clicked.&lt;/p&gt;
&lt;p&gt;Python&amp;rsquo;s async story has matured enormously. What used to be a mess of &lt;code&gt;yield from&lt;/code&gt; and manual loop management is now clean, readable, and genuinely powerful. If you&amp;rsquo;ve been putting off learning asyncio properly, this is the guide I wish I&amp;rsquo;d had.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Kubernetes Network Policies: A Practical Security Guide</title>
      <link>https://andrewodendaal.com/kubernetes-network-policies-practical-security-guide/</link>
      <pubDate>Mon, 12 Jan 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/kubernetes-network-policies-practical-security-guide/</guid>
      <description>&lt;p&gt;I&amp;rsquo;m going to be blunt here. If you&amp;rsquo;re running Kubernetes without network policies, every pod in your cluster can talk to every other pod. That&amp;rsquo;s a flat network. It&amp;rsquo;s terrifying.&lt;/p&gt;
&lt;p&gt;I learned this the hard way. A few years back, a compromised container in our staging namespace made a direct TCP connection to the production PostgreSQL pod. No firewall, no segmentation, nothing stopping it. The attacker didn&amp;rsquo;t even need to be clever — they just scanned the internal network and found an open port. We had &lt;a href=&#34;https://andrewodendaal.com/kubernetes-pod-security-policies&#34;&gt;pod security policies&lt;/a&gt; in place, RBAC locked down, image scanning, the works. But zero network policies. That one gap made everything else irrelevant.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AWS Lambda Cold Starts: Causes, Measurement, and Mitigation Strategies</title>
      <link>https://andrewodendaal.com/aws-lambda-cold-starts-causes-measurement-mitigation/</link>
      <pubDate>Thu, 08 Jan 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/aws-lambda-cold-starts-causes-measurement-mitigation/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve lost count of how many times someone&amp;rsquo;s told me &amp;ldquo;Lambda has cold start problems&amp;rdquo; like it&amp;rsquo;s some fatal flaw. It isn&amp;rsquo;t. Cold starts are a tradeoff. You get near-infinite scale and zero idle cost, and in return, the first request to a new execution environment takes a bit longer. That&amp;rsquo;s the deal.&lt;/p&gt;
&lt;p&gt;The real problem is that most teams either panic about cold starts when they don&amp;rsquo;t matter, or ignore them completely when they absolutely do. I&amp;rsquo;ve seen both. We had a payment API on Lambda that was timing out on cold starts during Black Friday — the Java function took 6 seconds to initialize with Spring Boot, and our API Gateway timeout was set to 5 seconds. Every new concurrent request during the traffic spike just&amp;hellip; failed. That was a bad day.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Terraform State Management Best Practices in 2026</title>
      <link>https://andrewodendaal.com/terraform-state-management-best-practices-2026/</link>
      <pubDate>Mon, 05 Jan 2026 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/terraform-state-management-best-practices-2026/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been managing Terraform state across production environments for years now, and if there&amp;rsquo;s one thing I&amp;rsquo;m certain of, it&amp;rsquo;s this: state management is where most Terraform setups fall apart. Not modules. Not provider quirks. State.&lt;/p&gt;
&lt;p&gt;The state file is Terraform&amp;rsquo;s memory. It&amp;rsquo;s how Terraform knows what it built, what changed, and what to tear down. Lose it, corrupt it, or let two people write to it at the same time, and you&amp;rsquo;re in for a rough day. I once lost a state file for a networking stack and spent the better part of 6 hours reimporting over 200 resources by hand. VPCs, subnets, route tables, NAT gateways — one at a time. Never again.&lt;/p&gt;</description>
    </item>
    <item>
      <title>SRE Practices for Serverless Architectures: Ensuring Reliability Without Servers</title>
      <link>https://andrewodendaal.com/sre-serverless-architectures/</link>
      <pubDate>Tue, 30 Dec 2025 09:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/sre-serverless-architectures/</guid>
      <description>&lt;p&gt;Serverless architectures have transformed how organizations build and deploy applications, offering benefits like reduced operational overhead, automatic scaling, and consumption-based pricing. However, the ephemeral nature of serverless functions, limited execution contexts, and distributed architecture introduce unique reliability challenges. Site Reliability Engineering (SRE) practices must evolve to address these challenges while maintaining the core principles of reliability, observability, and automation.&lt;/p&gt;
&lt;p&gt;This comprehensive guide explores how to apply SRE practices to serverless architectures, with practical examples and implementation strategies for ensuring reliability in environments where you don&amp;rsquo;t manage the underlying infrastructure.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Rust Year in Review: 2025&#39;s Major Milestones and Achievements</title>
      <link>https://andrewodendaal.com/rust-year-in-review-2025/</link>
      <pubDate>Thu, 25 Dec 2025 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/rust-year-in-review-2025/</guid>
      <description>&lt;p&gt;As 2025 draws to a close, it&amp;rsquo;s time to look back on what has been an extraordinary year for the Rust programming language. From significant language enhancements and ecosystem growth to expanding industry adoption and community achievements, Rust has continued its impressive trajectory. What began as Mozilla&amp;rsquo;s research project has evolved into a mainstream programming language that&amp;rsquo;s reshaping how we think about systems programming, web development, and beyond.&lt;/p&gt;
&lt;p&gt;In this comprehensive year-in-review, we&amp;rsquo;ll explore the major milestones and achievements that defined Rust in 2025. We&amp;rsquo;ll examine the language improvements that landed, the ecosystem developments that expanded Rust&amp;rsquo;s capabilities, the industry adoption trends that solidified its position, and the community growth that fueled its success. Whether you&amp;rsquo;ve been following Rust closely throughout the year or are just catching up, this retrospective will provide valuable insights into Rust&amp;rsquo;s evolution over the past twelve months.&lt;/p&gt;</description>
    </item>
    <item>
      <title>AI-Driven Cybersecurity: Advanced Threat Detection and Response</title>
      <link>https://andrewodendaal.com/ai-driven-cybersecurity/</link>
      <pubDate>Tue, 16 Dec 2025 10:30:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/ai-driven-cybersecurity/</guid>
      <description>&lt;p&gt;The cybersecurity landscape has reached a critical inflection point. As threat actors deploy increasingly sophisticated attacks using automation and artificial intelligence, traditional security approaches are struggling to keep pace. Security teams face overwhelming volumes of alerts, complex attack patterns, and a persistent shortage of skilled personnel. In response, organizations are turning to AI-driven cybersecurity solutions to detect, analyze, and respond to threats with greater speed and accuracy than ever before.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Rust in 2025: Future Directions and Predictions</title>
      <link>https://andrewodendaal.com/rust-future-directions-2025/</link>
      <pubDate>Mon, 15 Dec 2025 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/rust-future-directions-2025/</guid>
      <description>&lt;p&gt;As 2025 draws to a close, the Rust programming language continues its impressive trajectory of growth and adoption. From its humble beginnings as Mozilla&amp;rsquo;s research project to its current status as a mainstream language used by tech giants and startups alike, Rust has proven that its unique combination of safety, performance, and expressiveness fills a critical gap in the programming language landscape. But what lies ahead for Rust from here? What new features, ecosystem developments, and adoption trends can we expect to see?&lt;/p&gt;</description>
    </item>
    <item>
      <title>Rust for AI and Machine Learning in 2025: Libraries, Performance, and Use Cases</title>
      <link>https://andrewodendaal.com/rust-ai-machine-learning/</link>
      <pubDate>Fri, 05 Dec 2025 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/rust-ai-machine-learning/</guid>
      <description>&lt;p&gt;Artificial Intelligence and Machine Learning continue to transform industries across the globe, driving innovations in everything from healthcare and finance to autonomous vehicles and creative tools. While Python has long dominated the AI/ML landscape due to its extensive ecosystem and ease of use, Rust has been steadily gaining ground as a compelling alternative for performance-critical components and production deployments. With its focus on safety, speed, and concurrency, Rust offers unique advantages for AI/ML workloads that require efficiency and reliability.&lt;/p&gt;</description>
    </item>
    <item>
      <title>DevOps for Edge Computing: Extending CI/CD to the Network Edge</title>
      <link>https://andrewodendaal.com/devops-edge-computing/</link>
      <pubDate>Tue, 02 Dec 2025 09:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/devops-edge-computing/</guid>
      <description>&lt;p&gt;The rise of edge computing is transforming how organizations deploy and manage applications. By moving computation closer to data sources and end users, edge computing reduces latency, conserves bandwidth, and enables new use cases that weren&amp;rsquo;t previously possible. However, this distributed architecture introduces significant challenges for DevOps teams accustomed to centralized cloud environments.&lt;/p&gt;
&lt;p&gt;This comprehensive guide explores how to extend DevOps principles and practices to edge computing environments, enabling reliable, secure, and scalable deployments across potentially thousands of edge locations.&lt;/p&gt;</description>
    </item>
    <item>
      <title>FinOps Practices for Cloud Cost Optimization in Distributed Systems</title>
      <link>https://andrewodendaal.com/finops-practices-cloud-cost-optimization/</link>
      <pubDate>Mon, 01 Dec 2025 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/finops-practices-cloud-cost-optimization/</guid>
      <description>&lt;p&gt;As organizations increasingly adopt distributed systems in the cloud, managing and optimizing costs has become a critical challenge. The dynamic, scalable nature of cloud resources that makes distributed systems powerful can also lead to unexpected expenses and inefficiencies if not properly managed. This is where FinOps—the practice of bringing financial accountability to cloud spending—comes into play.&lt;/p&gt;
&lt;p&gt;This article explores practical FinOps strategies and techniques for optimizing cloud costs in distributed systems without compromising performance, reliability, or security.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Hiring Cloud Engineers: What to Look For</title>
      <link>https://andrewodendaal.com/hiring-cloud-engineers-what-to-look-for/</link>
      <pubDate>Mon, 01 Dec 2025 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/hiring-cloud-engineers-what-to-look-for/</guid>
      <description>&lt;p&gt;As organizations accelerate their cloud adoption journeys, the demand for skilled cloud engineers has skyrocketed. Building a high-performing cloud team is now a critical competitive advantage, yet finding and retaining top cloud talent remains one of the most significant challenges facing technology leaders today. The rapid evolution of cloud technologies, combined with a global shortage of experienced professionals, has created a fiercely competitive hiring landscape.&lt;/p&gt;
&lt;p&gt;This comprehensive guide explores what to look for when hiring cloud engineers, from essential technical skills and certifications to soft skills and cultural fit. Whether you&amp;rsquo;re building a cloud team from scratch or expanding an existing one, this guide provides actionable strategies for attracting, assessing, and retaining the cloud talent your organization needs to succeed.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Rust Best Practices for Maintainable Code in 2025</title>
      <link>https://andrewodendaal.com/rust-maintainable-code-practices/</link>
      <pubDate>Tue, 25 Nov 2025 08:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/rust-maintainable-code-practices/</guid>
      <description>&lt;p&gt;Writing code that works is just the first step in software development. For projects that need to evolve and be maintained over time, code quality and maintainability are just as important as functionality. Rust, with its emphasis on safety and correctness, provides many tools and patterns that can help you write code that&amp;rsquo;s not only correct but also maintainable. However, like any language, it requires discipline and adherence to best practices to ensure your codebase remains clean, understandable, and sustainable.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Service Mesh Architecture: The SRE&#39;s Guide to Network Reliability</title>
      <link>https://andrewodendaal.com/service-mesh-architecture/</link>
      <pubDate>Tue, 18 Nov 2025 10:00:00 &#43;0400</pubDate>
      <guid>https://andrewodendaal.com/service-mesh-architecture/</guid>
      <description>&lt;p&gt;As organizations adopt microservices architectures, the complexity of service-to-service communication grows rapidly, because every new service can potentially talk to all the others. Managing this communication layer, including routing, security, reliability, and observability, has become one of the most challenging aspects of operating modern distributed systems. Service mesh architecture has emerged as a powerful solution to these challenges, providing a dedicated infrastructure layer that handles service-to-service communication.&lt;/p&gt;
&lt;p&gt;This comprehensive guide explores service mesh architecture from an SRE perspective, focusing on how it enhances reliability, security, and observability in microservices environments.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
