NGINX Ingress is the Honda Civic of ingress controllers. Boring, reliable, gets the job done. I’ve deployed it on dozens of clusters and it’s never been the thing that woke me up at 3am. That’s the highest compliment I can give any piece of infrastructure.

But boring doesn’t mean it’s always the right choice. I’ve spent the last three years running all three major ingress options — NGINX Ingress Controller, Traefik, and Istio’s Gateway — across production clusters of varying sizes. I migrated one platform from NGINX to Istio and nearly lost my mind in the process. I’ve also watched Traefik quietly become the best choice for a lot of teams, even though nobody talks about it at conferences.

Here’s what I actually think about each of them, based on running real traffic through all three.


Why Your Ingress Controller Choice Matters More Than You Think

If you’ve read my piece on K8s networking, you know that getting traffic into a cluster is one of the fundamental problems Kubernetes doesn’t solve out of the box. The Ingress API is just a spec. It doesn’t do anything by itself. You need a controller — a piece of software that watches for Ingress resources and configures a reverse proxy to actually route traffic.

The controller you pick determines your TLS termination strategy, your routing capabilities, your observability story, and — critically — how much YAML you’ll be writing at 11pm when something breaks. It’s the front door to your entire platform. Pick wrong and you’ll feel it every single day.

Most teams default to NGINX because it’s what they know. That’s not a terrible reason, honestly. But it’s worth understanding what you’re giving up.


NGINX Ingress Controller: The Reliable Workhorse

There are actually two NGINX ingress controllers, and this confuses everyone: one maintained by the Kubernetes community (kubernetes/ingress-nginx) and one maintained by F5/NGINX Inc (nginxinc/kubernetes-ingress). I’m talking about the community one. It’s what 90% of people mean when they say “NGINX Ingress.”

It works exactly how you’d expect. You create an Ingress resource, the controller sees it, generates an nginx.conf, and reloads. Simple mental model. Simple debugging — when something goes wrong, you can exec into the pod and read the generated nginx config like it’s 2012.

A basic setup:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1
            pathType: Prefix
            backend:
              service:
                name: api-v1
                port:
                  number: 8080
          - path: /v2
            pathType: Prefix
            backend:
              service:
                name: api-v2
                port:
                  number: 8080

That annotation-driven model is both NGINX’s greatest strength and its biggest weakness. Need rate limiting? There’s an annotation. Need CORS? Annotation. Need custom headers, proxy timeouts, upstream hashing? Annotations, annotations, annotations.

Here’s what rate limiting looks like:

metadata:
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "50"
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"
    nginx.ingress.kubernetes.io/limit-connections: "10"

It works. But once you’ve got 15 annotations on a single Ingress resource, you start to wonder if there’s a better way. There is — but we’ll get to that.

Where NGINX really shines is raw performance. In my benchmarks across a 6-node cluster running on m5.xlarge instances, NGINX consistently handled 45,000+ requests per second with p99 latencies under 5ms for simple HTTP routing. It’s been battle-tested by millions of deployments. The failure modes are well-understood. The community is massive. When you Google an NGINX Ingress problem, you’ll find the answer.

The downsides? Config reloads. Every time you create or modify an Ingress resource, the controller regenerates the entire nginx.conf and triggers a reload. On clusters with hundreds of Ingress resources, this can cause brief connection drops. The community has mitigated this with lua-based dynamic upstreams, but it’s still a fundamentally different architecture than controllers that were designed for dynamic environments from the start.

There’s also no native support for TCP/UDP routing through the standard Ingress API. You need special ConfigMaps for that, and it’s clunky. If you’re exposing anything beyond HTTP (raw TCP services, databases, message brokers), you’ll feel the friction.
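For reference, the ConfigMap workaround looks roughly like this (service names and ports are illustrative; the key is the port the controller listens on, the value is namespace/service:port):

```yaml
# tcp-services ConfigMap read by the ingress-nginx controller.
# Keys are ports the controller will listen on; values point at
# namespace/service:port. Names and ports here are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "5432": "databases/postgres:5432"
  "6379": "cache/redis:6379"
```

You also have to expose those ports on the controller’s Service separately, which is exactly where the clunkiness comes from.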


Traefik: The One Nobody Regrets Choosing

Traefik doesn’t get the conference keynotes. It doesn’t have a massive corporate sponsor pushing it at every KubeCon. But every team I know that’s adopted Traefik has been quietly happy with it. That’s a rare thing in this industry.

Traefik was built for dynamic environments. It doesn’t generate config files and reload — it watches the Kubernetes API and updates its routing table in real time. Zero downtime on config changes. Zero connection drops. This alone makes it worth considering if you’re running a cluster with frequent deployments.

The IngressRoute CRD is where Traefik pulls ahead of NGINX’s annotation soup:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-route
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`) && PathPrefix(`/v1`)
      kind: Rule
      services:
        - name: api-v1
          port: 8080
      middlewares:
        - name: rate-limit
        - name: compress
    - match: Host(`api.example.com`) && PathPrefix(`/v2`)
      kind: Rule
      services:
        - name: api-v2
          port: 8080
          weight: 90
        - name: api-v2-canary
          port: 8080
          weight: 10
  tls:
    certResolver: letsencrypt

Look at that canary deployment — weighted routing is a first-class concept, not an afterthought. And the middleware system is composable. Define a rate limiter once, reference it everywhere:

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
spec:
  rateLimit:
    average: 100
    burst: 200
    period: 1s
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: compress
spec:
  compress:
    excludedContentTypes:
      - text/event-stream

This is so much cleaner than cramming everything into annotations. Each middleware is a separate resource you can version, review, and reuse across routes. When I’m reviewing PRs, I’d rather look at a Middleware resource than decode nginx.ingress.kubernetes.io/configuration-snippet with inline Lua.

Traefik’s built-in Let’s Encrypt integration is another killer feature. Point it at an ACME server and it handles certificate provisioning and renewal automatically. No cert-manager, no separate Certificate resources, no debugging why your cert didn’t renew. With NGINX, you’re almost certainly running cert-manager alongside it — which works fine, but it’s another moving part.
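The certResolver: letsencrypt referenced in the IngressRoute earlier maps to a resolver defined in Traefik’s static configuration. A minimal sketch, assuming the TLS-ALPN challenge (the email and storage path are illustrative):

```yaml
# Traefik static configuration (traefik.yml): an ACME resolver named
# "letsencrypt", matching the certResolver used in IngressRoute resources.
certificatesResolvers:
  letsencrypt:
    acme:
      email: ops@example.com      # illustrative contact address
      storage: /data/acme.json    # should live on a persistent volume
      tlsChallenge: {}            # TLS-ALPN-01; http/dns challenges also exist
```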

Performance-wise, Traefik sits slightly behind NGINX in raw throughput. My benchmarks showed about 38,000 requests per second on the same hardware — roughly 15% less than NGINX. For most applications, this difference is irrelevant. You’ll hit application bottlenecks long before the ingress controller becomes the constraint. But if you’re running a high-traffic API gateway handling tens of thousands of concurrent connections, it’s worth knowing.

Where Traefik falls short is ecosystem maturity for advanced use cases. Need mutual TLS between services? You’ll want a service mesh for that. Need fine-grained traffic splitting based on headers or cookies? Traefik can do it, but the configuration gets verbose. And the documentation, while improved, still has gaps compared to NGINX’s decade of community-written guides.


Istio Gateway: When You Need the Full Arsenal

Istio Gateway isn’t an ingress controller in the traditional sense. It’s the edge component of a full service mesh architecture, and comparing it to NGINX or Traefik is like comparing a Swiss Army knife to a screwdriver. Yes, they both turn screws. But one of them also has a saw, a corkscrew, and a tiny pair of scissors you’ll never use.

Istio replaces the Ingress API entirely with its own Gateway and VirtualService resources:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: api-tls
      hosts:
        - api.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-routes
spec:
  hosts:
    - api.example.com
  gateways:
    - api-gateway
  http:
    - match:
        - uri:
            prefix: /v2
          headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: api-v2-canary
            port:
              number: 8080
    - match:
        - uri:
            prefix: /v2
      route:
        - destination:
            host: api-v2
            port:
              number: 8080
          weight: 95
        - destination:
            host: api-v2-canary
            port:
              number: 8080
          weight: 5
    - match:
        - uri:
            prefix: /v1
      route:
        - destination:
            host: api-v1
            port:
              number: 8080
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure

That’s header-based routing, weighted traffic splitting, and automatic retries — all declarative, all in one resource. You can do circuit breaking, fault injection, mirroring, and timeout policies without touching application code. For teams running complex microservice architectures with dozens of services, this level of control is genuinely transformative.

But here’s the thing nobody tells you at KubeCon.


The Migration War Story: NGINX to Istio

Two years ago I led a migration from NGINX Ingress to Istio on a platform running 40-something microservices. The pitch was compelling: we needed mTLS between services, canary deployments, and better observability. NGINX couldn’t give us that. Istio could. The migration plan looked clean on paper. Six weeks, phased rollout, namespace by namespace.

It took four months.

The first week was fine. We installed Istio with the demo profile, got the dashboard running, felt clever. Then we started migrating actual services and everything went sideways.

Problem one: sidecar injection. Istio injects an Envoy proxy sidecar into every pod. This changes your pod’s networking fundamentally. Init containers that make HTTP calls during startup? They fail because the sidecar isn’t ready yet. We had twelve services with init containers that pulled config from a central service. Every single one broke. The fix — holdApplicationUntilProxyStarts: true — wasn’t obvious, and we burned a week figuring it out.
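If you hit the same failure mode, the setting can be applied mesh-wide rather than Deployment by Deployment. A sketch via IstioOperator (it can also be set per pod with the proxy.istio.io/config annotation; the resource name is illustrative):

```yaml
# Mesh-wide: hold app containers until each Envoy sidecar is ready,
# so init/startup HTTP calls don't fail before the proxy comes up.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: mesh-defaults        # illustrative name
spec:
  meshConfig:
    defaultConfig:
      holdApplicationUntilProxyStarts: true
```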

Problem two: resource consumption. Each Envoy sidecar consumes about 50-100MB of memory and a non-trivial amount of CPU. Across 40 services with 3 replicas each, that’s 120 additional containers. Our cluster’s resource requests jumped by roughly 15%. We had to add two nodes just to accommodate the mesh overhead. Nobody had budgeted for that.

Problem three: debugging got harder, not easier. With NGINX, when a request failed, I’d check the NGINX logs and the application logs. Two places. With Istio, a failed request could be caused by the ingress gateway, the source sidecar, a DestinationRule, a PeerAuthentication policy, the destination sidecar, or the application itself. I once spent an entire afternoon debugging a 503 that turned out to be a mTLS misconfiguration between two namespaces with different PeerAuthentication policies. The error message was completely unhelpful.
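For the record, the kind of policy involved in that 503 is tiny, which is part of what makes the failure so hard to spot. A sketch with an illustrative namespace:

```yaml
# STRICT mTLS for one namespace. A caller without a sidecar sends
# plaintext and gets refused at the destination proxy; the client
# just sees an unhelpful 503.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: payments        # illustrative namespace
spec:
  mtls:
    mode: STRICT
```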

Problem four: the YAML explosion. We went from about 30 Ingress resources to roughly 90 Istio resources — Gateways, VirtualServices, DestinationRules, PeerAuthentication policies, AuthorizationPolicies. Our GitOps repo tripled in size. Every new service now required five Istio resources instead of one Ingress resource.

Was it worth it? Eventually, yes. The mTLS encryption, the traffic management, the observability through Kiali — it’s genuinely powerful once it’s stable. But the operational cost is real and ongoing. You need at least one person on the team who deeply understands Istio. Not “read the docs” understands — “can debug Envoy proxy configs at 2am” understands. If you don’t have that person, you’re going to have a bad time.

For teams that need advanced networking capabilities and are willing to invest in the operational overhead, Istio is unmatched. For everyone else, it’s a complexity trap.


Performance Benchmarks: Actual Numbers

I ran these benchmarks on a 6-node EKS cluster (m5.xlarge, Kubernetes 1.29) using hey with 200 concurrent connections hitting a simple Go HTTP server returning a 1KB JSON response. Each test ran for 60 seconds after a 10-second warmup.

Metric                 NGINX        Traefik      Istio Gateway
Requests/sec           45,200       38,100       34,800
p50 latency            1.2ms        1.8ms        2.4ms
p99 latency            4.8ms        6.2ms        9.1ms
Memory (controller)    120MB        85MB         180MB*
CPU (idle)             0.05 cores   0.03 cores   0.15 cores*

*Istio numbers include the ingress gateway pod only, not the istiod control plane or sidecars.

NGINX wins on raw throughput. No surprise — it’s NGINX. Traefik is close enough that the difference won’t matter for 99% of workloads. Istio’s overhead comes from the Envoy proxy doing more work per request — mTLS handshakes, telemetry collection, policy evaluation. If you’re using those features, the overhead is justified. If you’re not, you’re paying a tax for nothing.

The more interesting benchmark is config reload behavior. I created and deleted 100 Ingress/IngressRoute/VirtualService resources in rapid succession while running a steady load test:

  • NGINX: 3 brief connection resets during reload storms. Total dropped requests: ~150.
  • Traefik: Zero dropped requests. Routing updates were seamless.
  • Istio: Zero dropped requests. Envoy’s hot restart handled it cleanly.

For clusters with frequent deployments — think 50+ deploys per day — Traefik and Istio’s zero-downtime config updates are a meaningful advantage.


The Kubernetes Gateway API: The Future for All Three

The new Kubernetes Gateway API is worth mentioning because it changes the calculus. It’s a more expressive, role-oriented replacement for the Ingress API, and all three controllers support it (or are adding support).

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
    - name: main-gateway
  hostnames:
    - api.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1
      backendRefs:
        - name: api-v1
          port: 8080
    - matches:
        - path:
            type: PathPrefix
            value: /v2
      backendRefs:
        - name: api-v2
          port: 8080
          weight: 90
        - name: api-v2-canary
          port: 8080
          weight: 10

This is the same resource regardless of which controller you’re running. Weighted routing, header matching, URL rewriting — all standardized. The Gateway API eliminates the biggest argument against NGINX (annotation hell) and reduces the lock-in concern with Traefik’s CRDs or Istio’s VirtualServices.
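The parentRefs in the HTTPRoute above points at a Gateway resource, which might look something like this (the gatewayClassName depends on which controller you installed):

```yaml
# The Gateway that HTTPRoutes attach to via parentRefs. Typically owned
# by the platform team, while app teams own their HTTPRoutes.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gateway
spec:
  gatewayClassName: nginx        # or traefik / istio, per your controller
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: api.example.com
      tls:
        mode: Terminate
        certificateRefs:
          - name: api-tls
```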

It’s not fully mature yet. Some advanced features still require controller-specific extensions. But if you’re starting a new cluster today, I’d build on the Gateway API from day one. It gives you the option to swap controllers later without rewriting your entire routing configuration.


The Decision Framework

After running all three in production, here’s how I’d decide:

Choose NGINX Ingress if:

  • Your team already knows NGINX
  • You need maximum raw throughput
  • You’re running fewer than 50 services
  • You don’t need canary deployments or traffic splitting
  • You want the largest community and the most Stack Overflow answers

Choose Traefik if:

  • You deploy frequently and can’t tolerate config reload drops
  • You want built-in Let’s Encrypt without cert-manager
  • You prefer CRDs over annotations for configuration
  • You’re running a mid-size cluster (10-100 services)
  • You want something that just works without a dedicated platform team

Choose Istio Gateway if:

  • You need mTLS between services (not just at the edge)
  • You need advanced traffic management: fault injection, mirroring, circuit breaking
  • You have a dedicated platform team that can own the mesh
  • You’re running 50+ microservices with complex inter-service communication
  • You need the observability that comes with a full service mesh

If you’re unsure, start with Traefik. Seriously. It’s the best balance of capability and simplicity. You can always migrate to Istio later if you outgrow it — and you’ll know when you’ve outgrown it because you’ll be building service mesh features on top of your ingress controller instead of using a purpose-built tool.


Security Considerations Across All Three

Regardless of which controller you pick, there are security basics that apply everywhere. I’ve written about K8s network policies separately, but the ingress layer has its own concerns.

TLS termination — all three handle this well, but the defaults differ. NGINX defaults to a TLS 1.2 minimum, which is fine. Traefik also defaults to TLS 1.2 with a sensible cipher suite. Istio defaults to TLS 1.2 at the gateway and uses mTLS for mesh-internal traffic (negotiating TLS 1.3 where both sidecars support it). Make sure you’re explicitly setting your minimum TLS version regardless:

# NGINX - via ConfigMap
data:
  ssl-protocols: "TLSv1.2 TLSv1.3"
  ssl-ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
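The rough Traefik equivalent is a TLSOption resource, referenced from an IngressRoute’s tls.options. A sketch:

```yaml
# Traefik: minimum TLS version and cipher suites as a reusable resource.
apiVersion: traefik.io/v1alpha1
kind: TLSOption
metadata:
  name: modern-tls
  namespace: default
spec:
  minVersion: VersionTLS12
  cipherSuites:
    - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
```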

Rate limiting — NGINX and Traefik both support it natively. With Istio, you’ll need an EnvoyFilter or the Istio rate limiting service, which is a separate deployment. It works, but it’s another thing to manage.

WAF capabilities — NGINX has ModSecurity support through an annotation. Traefik and Istio don’t have native WAF. For all three, I’d recommend putting a cloud WAF (AWS WAF, Cloudflare) in front of the ingress controller rather than trying to do application-layer filtering at the Kubernetes level.
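If you do want ModSecurity on NGINX despite that advice, enabling it is roughly two annotations (the OWASP core rule set is optional):

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/enable-modsecurity: "true"
    nginx.ingress.kubernetes.io/enable-owasp-core-rules: "true"
```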


Production Hardening Tips

A few things I’ve learned the hard way that apply to all three controllers:

Run multiple replicas with pod anti-affinity. Your ingress controller is a single point of failure. Run at least 2 replicas spread across availability zones. This isn’t optional.

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: ingress-nginx
        topologyKey: kubernetes.io/hostname

Set resource requests and limits. I’ve seen ingress controllers OOM-killed during traffic spikes because nobody set memory limits. For NGINX, start with 256Mi request / 512Mi limit. For Traefik, 128Mi / 256Mi. For Istio’s gateway, 256Mi / 512Mi. Tune from there based on your traffic patterns.
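As YAML, the NGINX starting point above looks like this (the CPU request is illustrative, since the text only specifies memory):

```yaml
# Starting-point resources for the NGINX controller container.
# CPU limit omitted on purpose: hard CPU limits can throttle the
# proxy during traffic spikes. Tune from observed usage.
resources:
  requests:
    cpu: 100m        # illustrative; not specified in the text
    memory: 256Mi
  limits:
    memory: 512Mi
```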

Monitor the right metrics. Connection count, request latency percentiles, error rates, and config reload duration (for NGINX). All three expose Prometheus metrics. If you’re not scraping them, you’re flying blind.

Use the Gateway API for routing where possible. Even if you’re on NGINX today, writing your routes in Gateway API format means you can switch controllers without rewriting everything. That portability is worth the slight learning curve.


Where I’ve Landed

I run NGINX on clusters where simplicity matters most — small teams, straightforward routing, no fancy traffic management. I run Traefik on clusters where I want a good developer experience without a dedicated platform team. And I run Istio on exactly two clusters where the microservice complexity genuinely demands a service mesh.

The worst decision is picking Istio because it looks impressive on an architecture diagram. The second worst is sticking with NGINX when you’ve outgrown it and are duct-taping features together with annotations and Lua snippets.

Pick the tool that matches your team’s size, your application’s complexity, and your willingness to operate the thing at 2am. Everything else is noise.