I’m going to be blunt here. If you’re running Kubernetes without network policies, every pod in your cluster can talk to every other pod. That’s a flat network. It’s terrifying.

I learned this the hard way. A few years back, a compromised container in our staging namespace made a direct TCP connection to the production PostgreSQL pod. No firewall, no segmentation, nothing stopping it. The attacker didn’t even need to be clever — they just scanned the internal network and found an open port. We had pod security policies in place, RBAC locked down, image scanning, the works. But zero network policies. That one gap made everything else irrelevant.

Most teams skip network policies. Don’t be most teams.


First Things First: Your CNI Has to Support It

Here’s something that trips people up constantly. NetworkPolicy is a Kubernetes API resource, but the enforcement happens at the CNI plugin level. If your CNI doesn’t support it, you can create NetworkPolicy objects all day long and nothing will actually happen. No errors, no warnings. Just silent non-enforcement.

The default kubenet CNI that ships with some managed clusters? Doesn’t support network policies. You need one of these:

Calico — the most common choice. Solid, battle-tested, works everywhere. I’ve run it on EKS, GKE, and bare metal. It enforces standard NetworkPolicy resources and also offers its own Calico-specific policy types for more advanced rules.

Cilium — eBPF-based, newer, incredibly powerful. If you’re on a recent kernel (5.10+), Cilium gives you L7 visibility and policy enforcement that Calico can’t match. I’ve been moving new clusters to Cilium over the past year.

Weave Net — worked fine for years, but development has largely wound down since Weaveworks shut its doors. I wouldn’t pick it for a new cluster.
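To make the L7 point concrete, here’s a sketch of what a Cilium L7 policy can look like. It uses Cilium’s own CiliumNetworkPolicy CRD rather than the standard NetworkPolicy API, and the app labels here are placeholders for illustration:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-allow
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api        # placeholder label
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend   # placeholder label
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          # L7 rule: only GET requests against the v1 API surface get through;
          # anything else on 8080 is rejected at the HTTP layer
          rules:
            http:
              - method: "GET"
                path: "/api/v1/.*"
```

Standard NetworkPolicy can only say “frontend may reach api on 8080.” This says “and only for GET requests on these paths,” which is exactly the gap the eBPF-based CNIs close.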

If you’re on EKS, check out my piece on advanced networking in EKS — the VPC CNI plugin historically didn’t enforce network policies (newer versions have added native support), so on many setups you’ll still want Calico or Cilium for policy enforcement.

Verify your CNI actually works before writing a single policy:

# Deploy two test pods
kubectl run source --image=busybox --restart=Never -- sleep 3600
kubectl run target --image=nginx --restart=Never

# Confirm they can talk
kubectl exec source -- wget -qO- -T 3 http://$(kubectl get pod target -o jsonpath='{.status.podIP}')

Start with Default Deny

Every namespace should start with a default deny policy. This flips the model from “everything is allowed” to “nothing is allowed unless explicitly permitted.” It’s the single most impactful thing you can do for container security.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

That empty podSelector: {} means “apply to every pod in this namespace.” Once this is in place, all traffic — inbound and outbound — is blocked. Your pods can’t even resolve DNS.

Which brings me to the first thing everyone forgets.


Allow DNS or Everything Breaks

The moment you apply a default deny egress policy, DNS resolution stops working. Every pod that tries to reach a service by name will fail. I’ve seen teams apply default deny on a Friday afternoon and then spend the weekend debugging why their entire application is down.

Always pair default deny with a DNS allow:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

I target the kube-system namespace specifically rather than allowing DNS to anywhere. You don’t want pods resolving against arbitrary DNS servers.
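If you want to go a step tighter, combine the namespaceSelector with a podSelector so only the cluster DNS pods themselves are reachable, not everything in kube-system. This sketch assumes your DNS pods carry the conventional k8s-app: kube-dns label (true for CoreDNS in most distributions, but verify with kubectl get pods -n kube-system --show-labels):

```yaml
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          # same "to" entry as the namespaceSelector, so the two AND together:
          # the destination must be in kube-system AND carry this label
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```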


Ingress Rules: Who Can Talk to What

Now you selectively open things up. Say you’ve got a frontend that needs to reach your API, and the API needs to reach the database. Nothing else should be allowed.

For the API pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080

Only pods labeled app: frontend in the same namespace can reach the API on port 8080. Everything else gets dropped. If you want a deeper understanding of how pod-to-pod traffic flows, I wrote about it in Kubernetes networking demystified.

For the database:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 5432

Only the API can reach Postgres. The frontend can’t. A compromised frontend pod can’t pivot to the database. This is exactly the kind of lateral movement that got us burned in that staging incident I mentioned.


Egress Rules: Controlling Outbound Traffic

Egress policies are harder to get right but arguably more important. A compromised pod that can’t make outbound connections can’t exfiltrate data, can’t reach a command-and-control server, can’t do much of anything useful for an attacker.

Here’s an egress policy that lets the API talk to the database and nothing else (besides DNS):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53

If your API also needs to call external services, you can allow specific CIDR ranges:

  egress:
    - to:
        - ipBlock:
            cidr: 52.94.0.0/16  # specific AWS service range
      ports:
        - protocol: TCP
          port: 443

I prefer being explicit about external CIDRs rather than allowing 0.0.0.0/0 on port 443. Yes, it’s more maintenance. But it’s also the difference between a compromised pod being able to reach the entire internet and being limited to your known dependencies.
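If you do end up allowing a wide CIDR, know that ipBlock supports an except list for carving holes in it. One pattern worth having in your back pocket is blocking the cloud instance metadata endpoint even when outbound traffic is otherwise open. A sketch — looser than the explicit-CIDR approach, and the 10.0.0.0/8 entry is a placeholder for your own VPC or pod CIDR:

```yaml
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              # cloud instance metadata endpoint, a classic credential-theft target
              - 169.254.169.254/32
              # keep cluster-internal ranges governed by pod selectors,
              # not this catch-all external rule
              - 10.0.0.0/8
```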


Namespace Isolation

Cross-namespace traffic is where things get interesting. By default, pods in namespace A can freely reach pods in namespace B. That’s usually not what you want.

To allow traffic from a monitoring namespace to scrape metrics from production:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring-scrape
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              purpose: monitoring
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 9090

Notice both namespaceSelector and podSelector are under the same from entry. That’s an AND — the pod must be in the monitoring namespace AND have the prometheus label. If you put them as separate list items, it becomes an OR, which is almost certainly not what you want. I’ve seen this mistake in production more times than I can count.

# THIS IS AN OR - probably wrong
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            purpose: monitoring
      - podSelector:
          matchLabels:
            app: prometheus

# THIS IS AN AND - probably what you want
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            purpose: monitoring
        podSelector:
          matchLabels:
            app: prometheus

The YAML indentation difference is subtle. It’s bitten me. It’ll bite you too if you’re not careful.


Debugging Network Policies

When traffic gets blocked and you’re not sure why, here’s my debugging workflow:

# 1. Check what policies apply to a pod
kubectl get networkpolicies -n production

# 2. Describe a specific policy
kubectl describe networkpolicy api-ingress -n production

# 3. Check pod labels match your selectors
kubectl get pods -n production --show-labels

# 4. Test connectivity from a debug pod
kubectl run debug --image=nicolaka/netshoot --restart=Never -it --rm -- \
  bash -c "curl -v --connect-timeout 3 http://api.production.svc:8080/health"

If you’re on Calico, you can see denied connections in the logs (depending on your install, you may need a Log action on the policy or flow logs enabled for denies to be recorded):

# Calico flow logs
kubectl logs -n calico-system -l k8s-app=calico-node | grep -i denied

Cilium has even better tooling with Hubble:

# Install Hubble CLI, then:
hubble observe --namespace production --verdict DROPPED

Hubble’s real-time flow visibility is honestly one of the main reasons I’ve been pushing teams toward Cilium. Being able to see exactly which packet got dropped by which policy saves hours of debugging.

For a broader look at securing Kubernetes from container threats, network policies are just one layer — but they’re the foundation everything else builds on.


Rolling This Out Without Breaking Everything

Don’t apply default deny to production on a Monday morning. Here’s the approach I use:

Week 1: Deploy policies in audit mode. Cilium supports this natively. With Calico, you can use GlobalNetworkPolicy with action: Log to see what would be blocked without actually blocking it.
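The Calico side of that audit step might look roughly like this. It’s a sketch, not a drop-in config: Calico’s action: Log records a flow and falls through to the next rule, so pairing Log with Allow gives you visibility without enforcement:

```yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: audit-all-flows
spec:
  # all() selects every workload endpoint in the cluster
  selector: all()
  types:
    - Ingress
    - Egress
  ingress:
    - action: Log     # record the flow (lands in the node's syslog)
    - action: Allow   # then let it through, so nothing breaks yet
  egress:
    - action: Log
    - action: Allow
```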

Week 2: Review the logs. Map out actual traffic patterns. You’ll be surprised — there’s always some sidecar or init container making calls you didn’t know about.

Week 3: Write your allow policies based on observed traffic. Apply them alongside the audit policies.

Week 4: Switch to enforce mode. Keep the debug pod handy.

Start with your least critical namespace. Get comfortable with the workflow. Then move to production.


What I’d Do on a New Cluster Today

If I’m setting up a fresh cluster right now, here’s my playbook:

  1. Install Cilium as the CNI from day one
  2. Apply default deny to every namespace immediately — before any workloads are deployed
  3. Require network policies as part of the deployment manifest review process (enforce it in CI with something like Kyverno or OPA Gatekeeper)
  4. Use Hubble for flow visibility from the start
  5. Treat network policies like code — version them, review them, test them
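For step 3, one way to enforce “every namespace gets a default deny” is a Kyverno generate rule that stamps the policy into each new namespace automatically. This is a sketch based on Kyverno’s generate pattern, not a drop-in policy — test it against your Kyverno version first:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: default-deny-per-namespace
spec:
  rules:
    - name: generate-default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{request.object.metadata.name}}"
        # keep the generated policy in place if someone edits or deletes it
        synchronize: true
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
```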

Network policies aren’t glamorous. Nobody’s writing blog posts about how exciting they are. But they’re the difference between a security incident being contained to one pod and it being a full cluster compromise. I’ve seen both outcomes. Trust me — you want the policies in place before you need them.