Kubernetes Networking: Advanced Cluster Communication
Master Kubernetes networking, from CNI plugins and Services to Ingress and network policies.
Introduction and Setup
Kubernetes networking has a reputation for being complex, and honestly, that reputation is well-deserved. The challenge isn’t that the concepts are inherently difficult—it’s that they’re completely different from traditional networking. If you’re coming from a world of VLANs, subnets, and static IP addresses, Kubernetes networking requires a fundamental shift in thinking.
The good news is that once you understand the core principles, Kubernetes networking is actually quite elegant. It’s dynamic, software-defined, and surprisingly simple—once you stop fighting it and start working with its design philosophy.
The Kubernetes Networking Model
Traditional networking thinks in terms of physical locations and fixed addresses. Kubernetes thinks in terms of labels, services, and policies. This shift in mindset is crucial because it affects every networking decision you’ll make.
Every pod gets its own IP address, but here’s the catch—you should never care what that IP is. Pods are ephemeral; they come and go with different IPs every time. The moment you start hardcoding pod IPs, you’ve missed the point entirely.
Instead, Kubernetes uses Services as stable network endpoints. Think of Services as phone numbers that always reach the right person, even if that person moves apartments. The Service handles the routing; you just dial the number.
# A simple service that demonstrates the concept
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
This Service creates a stable endpoint called web-service that routes traffic to any pod labeled app: web. The pods can restart, move to different nodes, or scale up and down—the Service endpoint remains constant.
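For completeness, here is a minimal Deployment whose pods carry the app: web label that the Service above selects; the name, image, and replica count are illustrative.

# Pods created by this Deployment are picked up by web-service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web              # must match the Service's selector
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: myapp:latest     # illustrative image
        ports:
        - containerPort: 8080   # matches the Service's targetPort

Scale the Deployment up or down and the Service keeps routing without any changes.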
Understanding Pod-to-Pod Communication
The first networking principle in Kubernetes is that any pod can talk to any other pod without NAT (Network Address Translation). This sounds scary from a security perspective, and it should—by default, your cluster is one big flat network.
When I first learned this, I panicked. “You mean my database pod can be reached from anywhere in the cluster?” Yes, exactly. This is why network policies exist, but we’ll get to those later.
This flat network model makes development easier but requires you to think about security from day one. The good news is that Kubernetes gives you the tools to lock things down properly.
DNS and Service Discovery
Kubernetes runs its own DNS server inside the cluster, and it’s one of the most elegant parts of the system. Every Service automatically gets a DNS name following a predictable pattern: service-name.namespace.svc.cluster.local.
In practice, you rarely need the full DNS name. If you’re in the same namespace, just use the service name:
# Your app can connect to the database like this
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: app
        image: myapp:latest
        env:
        - name: DATABASE_URL
          value: "postgres://user:pass@postgres-service:5432/mydb"
Notice how we’re using postgres-service as the hostname. Kubernetes DNS resolves this name to the Service’s stable cluster IP, and kube-proxy forwards the traffic to whichever pods currently back the Service. It’s like having a phone book that updates itself automatically.
The Container Network Interface (CNI)
Here’s where things get interesting. Kubernetes doesn’t actually implement networking—it delegates that to CNI plugins. Different CNI plugins have different capabilities, and choosing the right one affects what networking features you can use.
Popular CNI plugins include:
- Flannel: Simple and reliable, good for basic setups
- Calico: Advanced features like network policies and BGP routing
- Cilium: eBPF-based with advanced security and observability
- Weave: Easy setup with built-in encryption
The CNI plugin you choose determines whether you can use network policies, what kind of load balancing you get, and how traffic flows between nodes. Most managed Kubernetes services (EKS, GKE, AKS) choose this for you, but it’s worth understanding the implications.
Service Types and External Access
Services come in different types, each solving different networking challenges:
ClusterIP is the default—it creates an internal-only endpoint. Perfect for backend services that don’t need external access.
NodePort opens a port on every node in your cluster. It’s simple but not very elegant for production use:
apiVersion: v1
kind: Service
metadata:
name: web-nodeport
spec:
type: NodePort
selector:
app: web
ports:
- port: 80
targetPort: 8080
nodePort: 30080
LoadBalancer is what you want for production external access. If you’re on a cloud provider, this creates an actual load balancer:
apiVersion: v1
kind: Service
metadata:
name: web-loadbalancer
spec:
type: LoadBalancer
selector:
app: web
ports:
- port: 80
targetPort: 8080
Setting Up Your Networking Environment
For this guide, you’ll need a Kubernetes cluster with a CNI that supports network policies. If you’re using a managed service:
- EKS: Use the AWS VPC CNI with Calico for network policies
- GKE: Enable network policy support when creating the cluster
- AKS: Use Azure CNI with network policies enabled
For local development, I recommend using kind (Kubernetes in Docker) with Calico:
# Create a kind cluster with Calico
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.244.0.0/16"
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
# Install Calico
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
Common Networking Gotchas
Let me save you some debugging time with the issues I see most often:
DNS resolution fails: Usually means CoreDNS isn’t running properly. Check kubectl get pods -n kube-system and look for the coredns pods.
Services can’t reach pods: Check that your service selector matches your pod labels exactly. Case matters, and typos are common.
External traffic can’t reach services: Make sure you’re using the right service type and that your cloud provider supports LoadBalancer services.
Pods can’t reach external services: Could be DNS configuration or network policies blocking egress traffic.
Testing Your Network Setup
Before diving into complex networking scenarios, let’s verify everything works:
# Create a test pod for network debugging
kubectl run netshoot --image=nicolaka/netshoot -it --rm -- /bin/bash
# Inside the pod, test DNS resolution
nslookup kubernetes.default.svc.cluster.local
# Test connectivity to a service
curl http://web-service
# Check what DNS servers are configured
cat /etc/resolv.conf
The netshoot image is invaluable for network debugging—it includes tools like curl, dig, nslookup, and tcpdump.
What’s Coming Next
Understanding these networking fundamentals sets you up for the more advanced topics we’ll cover:
- Service discovery patterns and DNS configuration
- Ingress controllers and HTTP routing
- Network policies for security and segmentation
- Advanced networking with service mesh
- Troubleshooting network issues in production
The key insight to remember: Kubernetes networking is about abstractions, not infrastructure. Stop thinking about IP addresses and start thinking about services, labels, and policies. Once that clicks, everything else becomes much clearer.
In the next part, we’ll dive deep into service discovery and DNS, exploring how applications find and communicate with each other in a dynamic container environment.
Core Concepts and Fundamentals
Service Discovery and DNS
I once spent an entire afternoon debugging why my microservices couldn’t find each other, only to discover I’d been using the wrong DNS names. The frustrating part? The error messages were completely unhelpful. “Connection refused” tells you nothing about whether you’re using the wrong hostname, wrong port, or if the service doesn’t exist at all.
Service discovery in Kubernetes is both simpler and more complex than traditional networking. Simpler because DNS “just works” most of the time. More complex because understanding the nuances can save you hours of debugging when things go wrong.
How Kubernetes DNS Actually Works
Every Kubernetes cluster runs CoreDNS (or kube-dns in older clusters) as a system service. This isn’t just any DNS server—it’s specifically designed to understand Kubernetes resources and automatically create DNS records for your services.
When you create a Service, Kubernetes immediately creates several DNS records:
- my-service.my-namespace.svc.cluster.local (the full FQDN)
- my-service.my-namespace (shorter form)
- my-service (if you’re in the same namespace)
The beauty is that you rarely need to think about this. Your applications can use the simplest form that works, and Kubernetes handles the rest.
# This service automatically gets DNS entries
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
From any pod in the production namespace, you can reach this service at api-service. From other namespaces, use api-service.production. The full FQDN api-service.production.svc.cluster.local works from anywhere, but it’s unnecessarily verbose.
Service Discovery Patterns
The most common pattern I see in production is environment-based service discovery. Instead of hardcoding service names, use environment variables that can change between environments:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: app
        image: myapp:latest
        env:
        - name: API_URL
          value: "http://api-service"
        - name: DATABASE_HOST
          value: "postgres-service"
        - name: REDIS_URL
          value: "redis://redis-service:6379"
This approach lets you use different service names in different environments (dev, staging, production) without changing your application code.
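One way to keep those values out of the Deployment entirely is to load them from a per-environment ConfigMap with envFrom; the names below (app-endpoints, the staging namespace) are illustrative, not something defined earlier in this guide.

# Per-environment endpoints live in a ConfigMap...
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-endpoints
  namespace: staging
data:
  API_URL: "http://api-service"
  DATABASE_HOST: "postgres-service"
---
# ...and the container imports every key as an environment variable
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: staging
spec:
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: app
        image: myapp:latest
        envFrom:
        - configMapRef:
            name: app-endpoints

Each environment gets its own ConfigMap, and the Deployment manifest stays identical everywhere.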
Understanding Service Endpoints
Services don’t actually handle traffic—they’re just configuration objects that tell Kubernetes how to route requests. The real work happens at the endpoint level. When you create a Service, Kubernetes automatically creates an Endpoints object that tracks which pods are ready to receive traffic.
You can see this in action:
# Check which pods are behind a service
kubectl get endpoints api-service
# Get detailed endpoint information
kubectl describe endpoints api-service
This is crucial for debugging. If your service isn’t working, check the endpoints. No endpoints usually means your service selector doesn’t match any pods, or the pods aren’t ready.
Headless Services and Direct Pod Access
Sometimes you don’t want load balancing—you want to talk directly to individual pods. This is common with databases or when you need to maintain session affinity. Headless services solve this by returning pod IPs directly instead of a service IP.
apiVersion: v1
kind: Service
metadata:
  name: database-headless
spec:
  clusterIP: None  # This makes it headless
  selector:
    app: database
  ports:
  - port: 5432
With a headless service, DNS queries return multiple A records—one for each pod. Your application can then choose which pod to connect to, or use all of them for different purposes.
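Headless services pair naturally with StatefulSets, which give every pod a stable DNS name of the form pod-name.service-name; here is a minimal sketch, with the image and replica count as placeholders.

# Each replica becomes resolvable as database-0.database-headless, database-1.database-headless, ...
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
spec:
  serviceName: database-headless   # ties the pods to the headless service above
  replicas: 3
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - name: postgres
        image: postgres:15           # illustrative image
        ports:
        - containerPort: 5432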
Cross-Namespace Communication
By default, services are only accessible within their namespace using the short name. For cross-namespace communication, you have several options:
Use the namespace-qualified name:
env:
- name: SHARED_API_URL
  value: "http://shared-api.shared-services"
Or create a Service in your namespace that points to a service in another namespace using ExternalName:
apiVersion: v1
kind: Service
metadata:
  name: external-api
  namespace: my-app
spec:
  type: ExternalName
  externalName: shared-api.shared-services.svc.cluster.local
Now your app can use external-api as if it were a local service, but it actually routes to the shared service.
DNS Configuration and Troubleshooting
Each pod gets its DNS configuration from the cluster’s DNS service. You can see this configuration:
# Check DNS config in a pod
kubectl exec -it my-pod -- cat /etc/resolv.conf
The typical configuration looks like:
nameserver 10.96.0.10
search my-namespace.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
The search domains are crucial—they let you use short names like api-service instead of full FQDNs. The ndots:5 setting means DNS will try the search domains for any hostname with fewer than 5 dots.
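If that expansion causes too many wasted lookups for a workload that mostly calls external hostnames, you can override it per pod with dnsConfig; the value of 2 below is just an example.

# Pod-level DNS tuning (values are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: external-heavy-app
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"        # external FQDNs skip most of the search-domain expansion
  containers:
  - name: app
    image: myapp:latest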
Service Discovery for External Services
Not everything runs in your cluster. For external databases, APIs, or legacy services, you can create Services without selectors:
apiVersion: v1
kind: Service
metadata:
  name: external-database
spec:
  ports:
  - port: 5432
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-database
subsets:
- addresses:
  - ip: 192.168.1.100
  ports:
  - port: 5432
This creates a service that routes to an external IP address. Your applications can use external-database just like any other service, making it easy to migrate between internal and external services.
Load Balancing and Session Affinity
By default, Services use round-robin load balancing between healthy pods. Sometimes you need more control:
apiVersion: v1
kind: Service
metadata:
  name: sticky-service
spec:
  selector:
    app: web
  sessionAffinity: ClientIP  # Route same client to same pod
  ports:
  - port: 80
Session affinity based on client IP ensures that requests from the same client always go to the same pod. This is useful for applications that store session data locally instead of in a shared store.
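The affinity window is configurable. ClientIP affinity lasts three hours (10800 seconds) by default, and you can change it with sessionAffinityConfig, as in this sketch:

apiVersion: v1
kind: Service
metadata:
  name: sticky-service
spec:
  selector:
    app: web
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600   # keep affinity for one hour instead of the three-hour default
  ports:
  - port: 80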
Service Mesh and Advanced Discovery
For complex microservice architectures, consider a service mesh like Istio or Linkerd. Service meshes provide advanced service discovery features:
- Automatic mutual TLS between services
- Advanced load balancing algorithms
- Circuit breakers and retry policies
- Detailed traffic metrics and tracing
Here’s a simple Istio DestinationRule that adds circuit breaking:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-circuit-breaker
spec:
  host: api-service
  trafficPolicy:
    outlierDetection:
      consecutiveErrors: 3
      interval: 30s
      baseEjectionTime: 30s
Debugging Service Discovery Issues
When service discovery fails, follow this debugging checklist:
- Check if the service exists: kubectl get svc
- Verify the service has endpoints: kubectl get endpoints service-name
- Test DNS resolution: kubectl exec -it pod -- nslookup service-name
- Check pod labels match service selector: kubectl get pods --show-labels
- Verify pods are ready: kubectl get pods (look for the Ready column)
The most common issues are:
- Typos in service names or selectors
- Pods not passing readiness checks
- Network policies blocking traffic (we’ll cover this in part 4)
- Wrong namespace or DNS configuration
Performance Considerations
DNS lookups add latency to every request. In high-performance applications, consider:
- Caching DNS results in your application
- Using IP addresses for very high-frequency internal calls
- Configuring appropriate DNS timeouts and retries
- Using headless services to reduce DNS overhead
However, premature optimization is the root of all evil. Start with the simple approach and optimize only when you have actual performance problems.
What’s Next
Service discovery gives you the foundation for reliable communication between services. In the next part, we’ll explore Ingress controllers and how to expose your services to external traffic with proper HTTP routing, SSL termination, and load balancing.
Understanding service discovery is crucial because it affects every other networking decision you’ll make. Whether you’re implementing network policies, setting up ingress, or debugging connectivity issues, it all comes back to how services find and communicate with each other.
Practical Applications and Examples
Ingress and External Traffic Management
Exposing Kubernetes services to the internet presents an interesting challenge. You could create a LoadBalancer service for each application, but that quickly becomes expensive and unwieldy—imagine managing dozens of load balancers, each with its own IP address and SSL certificate. There has to be a better way.
Ingress controllers solve this problem elegantly by providing HTTP routing, SSL termination, and load balancing all in one place. They act as a single entry point for external traffic, then route requests to the appropriate services based on hostnames, paths, and other HTTP attributes. Like most Kubernetes concepts, Ingress seems simple until you need it to work in production.
Understanding Ingress Controllers
An Ingress controller is essentially a reverse proxy that runs inside your cluster and routes external HTTP/HTTPS traffic to your services. The key insight is that Ingress is just configuration—you need an Ingress controller to actually implement that configuration.
Popular Ingress controllers include:
- NGINX Ingress Controller: Most common, reliable, lots of features
- Traefik: Great for microservices, automatic service discovery
- HAProxy Ingress: High performance, enterprise features
- Istio Gateway: Part of the Istio service mesh
- Cloud provider controllers: ALB (AWS), GCE (Google), etc.
The choice matters because different controllers have different capabilities and configuration options.
Setting Up NGINX Ingress Controller
Let’s start with the most popular option. Installing NGINX Ingress Controller is straightforward:
# Install NGINX Ingress Controller
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.1/deploy/static/provider/cloud/deploy.yaml
# Wait for it to be ready
kubectl wait --namespace ingress-nginx \
--for=condition=ready pod \
--selector=app.kubernetes.io/component=controller \
--timeout=120s
This creates a LoadBalancer service that receives all external traffic and routes it based on your Ingress rules.
Basic HTTP Routing
The simplest Ingress rule routes all traffic to a single service:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: simple-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
This routes all traffic for myapp.example.com to the web-service. The pathType: Prefix means any path starting with / (so, everything) gets routed to this service.
Path-Based Routing
More commonly, you’ll want to route different paths to different services:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: path-based-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 80
      - path: /orders
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80
Order matters here. More specific paths should come before general ones. The / path acts as a catch-all for anything that doesn’t match the other paths.
SSL/TLS Termination
SSL termination is where Ingress controllers really shine. Instead of managing certificates in each service, you handle them centrally:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tls-ingress
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - secure.example.com
    secretName: secure-example-tls
  rules:
  - host: secure.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
The cert-manager.io/cluster-issuer annotation tells cert-manager to automatically obtain and renew SSL certificates from Let’s Encrypt. The certificate gets stored in the secure-example-tls secret.
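The letsencrypt-prod issuer referenced by that annotation has to exist in the cluster. A minimal cert-manager ClusterIssuer might look like the following, with the email address as a placeholder:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com                 # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # Secret cert-manager creates for the ACME account
    solvers:
    - http01:
        ingress:
          class: nginx                     # solve HTTP-01 challenges through the NGINX ingress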
Advanced Routing with Annotations
NGINX Ingress Controller supports extensive customization through annotations. Here are some patterns I use regularly:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: advanced-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://myapp.com"
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth  # Secret holding htpasswd credentials
    nginx.ingress.kubernetes.io/auth-realm: "Please enter your credentials"
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/v1(/|$)(.*)
        pathType: ImplementationSpecific  # regex paths require ImplementationSpecific
        backend:
          service:
            name: api-v1-service
            port:
              number: 80
The rewrite-target annotation strips /api/v1 from the path before sending it to the backend service. The limit-rps annotation caps requests per second per client to deter abuse, the CORS annotations allow cross-origin requests from the listed origin, and the basic-auth annotations add simple authentication backed by an htpasswd Secret.
Multiple Ingress Controllers
In larger environments, you might run multiple Ingress controllers for different purposes:
# Internal-only ingress for admin interfaces
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: admin-ingress
  annotations:
    nginx.ingress.kubernetes.io/whitelist-source-range: "10.0.0.0/8,192.168.0.0/16"
spec:
  ingressClassName: nginx-internal
  rules:
  - host: admin.internal.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-service
            port:
              number: 80
This uses a separate nginx-internal Ingress class that might be configured to only accept traffic from internal networks.
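The nginx-internal class itself is just an IngressClass resource pointing at a second controller installation; a sketch, assuming that second controller was deployed to watch this class name.

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx-internal
spec:
  controller: k8s.io/ingress-nginx   # the internal controller instance must be configured to claim this class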
Load Balancing and Session Affinity
By default, Ingress controllers load balance requests across backend pods. Sometimes you need session affinity:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sticky-ingress
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/affinity-mode: "persistent"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-expires: "86400"
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: stateful-app-service
            port:
              number: 80
This creates a cookie-based session affinity that lasts 24 hours, ensuring users stick to the same backend pod.
Handling WebSockets and Streaming
WebSockets and long-lived connections need special handling:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: websocket-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx
  rules:
  - host: chat.example.com
    http:
      paths:
      - path: /ws
        pathType: Prefix
        backend:
          service:
            name: websocket-service
            port:
              number: 8080
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
The extended timeouts keep long-lived WebSocket connections from being closed prematurely; NGINX Ingress Controller forwards the Upgrade and Connection headers automatically, so no extra NGINX snippet is required.
Cloud Provider Integration
Cloud providers offer their own Ingress controllers that integrate with their load balancers:
# AWS ALB Ingress example
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aws-alb-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:123456789:certificate/abc123
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
This creates an AWS Application Load Balancer with SSL termination using an ACM certificate.
Monitoring and Observability
Ingress controllers provide valuable metrics about your external traffic:
# Port-forward to the controller's metrics endpoint (default port 10254)
kubectl port-forward -n ingress-nginx deployment/ingress-nginx-controller 10254:10254 &
curl -s http://localhost:10254/metrics | grep nginx_ingress
# View controller logs
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
# Check ingress status
kubectl describe ingress my-ingress
Most Ingress controllers expose Prometheus metrics that you can scrape for monitoring dashboards.
Troubleshooting Common Issues
When Ingress isn’t working, check these common issues:
- DNS not pointing to load balancer: Verify your domain points to the Ingress controller’s external IP
- Wrong Ingress class: Make sure ingressClassName matches your controller
- Service doesn’t exist: Check that the backend service and endpoints exist
- Path matching issues: Test with curl and check controller logs
- SSL certificate problems: Verify cert-manager is working and certificates are valid
Security Considerations
Ingress controllers are your cluster’s front door, so security is crucial:
- Always use HTTPS in production
- Implement rate limiting to prevent abuse
- Use Web Application Firewall (WAF) rules
- Regularly update your Ingress controller
- Monitor for suspicious traffic patterns
- Implement proper authentication and authorization
What’s Coming Next
Ingress gets traffic into your cluster, but what about controlling traffic between services inside your cluster? In the next part, we’ll explore network policies—Kubernetes’ built-in firewall that lets you implement micro-segmentation and zero-trust networking.
The combination of Ingress for external traffic and network policies for internal traffic gives you complete control over how data flows through your applications.
Advanced Techniques and Patterns
Network Policies and Security
Kubernetes clusters are surprisingly permissive by default. Any pod can communicate with any other pod, which means your frontend application can directly access your production database if it wants to. This flat network model makes development easier, but it’s a security nightmare that needs to be addressed before you go to production.
Network policies are Kubernetes’ answer to network segmentation. They’re like firewalls, but instead of working with IP addresses and ports, they work with labels and selectors. The challenge is that they require a different way of thinking about network security—one that embraces the dynamic, label-driven nature of Kubernetes.
Understanding Network Policy Fundamentals
Network policies are deny-by-default when they exist. This is crucial to understand: if no network policy selects a pod, that pod can communicate freely. But as soon as any network policy selects a pod, that pod can only communicate according to the rules in those policies.
This means you can’t just create a single “allow everything” policy and call it secure. You need to think through your application’s communication patterns and create policies that allow necessary traffic while blocking everything else.
# This policy blocks ALL traffic to selected pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}  # Selects all pods in the namespace
  policyTypes:
  - Ingress
  - Egress
This policy selects all pods in the production namespace and blocks all ingress and egress traffic. It’s the foundation of a zero-trust network model.
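With egress denied, even DNS lookups stop working, so a default-deny policy is usually paired with a namespace-wide rule that re-allows DNS; a sketch covering port 53 on both UDP and TCP:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - ports:               # no 'to' block: any destination, but only on DNS ports
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53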
Implementing Micro-Segmentation
The most effective approach I’ve found is to start with a default-deny policy and then explicitly allow the traffic you need. Let’s build a realistic example with a three-tier application:
# Allow frontend to communicate with backend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
    ports:
    - protocol: TCP
      port: 8080
This policy allows pods labeled tier: frontend to connect to pods labeled tier: backend on port 8080. Notice that we’re selecting the destination pods (backend) and defining who can reach them (frontend).
For the database tier, we want even stricter controls:
# Only backend can access database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-database
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: database
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: backend
    ports:
    - protocol: TCP
      port: 5432
  egress:
  - to: []  # Allow DNS resolution
    ports:
    - protocol: UDP
      port: 53
The database policy is more restrictive—it only allows ingress from backend pods and only allows egress for DNS resolution.
Cross-Namespace Communication Control
In multi-tenant environments, you often need to control communication between namespaces. Network policies support namespace selectors for this:
# Allow access from monitoring namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring-access
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: monitoring
      podSelector:
        matchLabels:
          app: prometheus
    ports:
    - protocol: TCP
      port: 9090
This allows Prometheus pods in the monitoring namespace to scrape metrics from web servers in the production namespace.
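For that namespaceSelector to match, the monitoring namespace actually needs the label; a sketch of labeling it yourself:

# The namespaceSelector above matches this label
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    name: monitoring

On recent Kubernetes versions every namespace also carries an automatic kubernetes.io/metadata.name label, so selecting matchLabels kubernetes.io/metadata.name: monitoring in the policy works without touching the namespace at all.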
Egress Control and External Services
Controlling outbound traffic is just as important as inbound traffic. Many attacks involve compromised pods making outbound connections to download malware or exfiltrate data:
# Restrict egress to specific external services
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-egress-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Egress
  egress:
  - to: []  # DNS resolution
    ports:
    - protocol: UDP
      port: 53
  - to:
    - podSelector:
        matchLabels:
          tier: database
    ports:
    - protocol: TCP
      port: 5432
  - to: []  # HTTPS to external APIs
    ports:
    - protocol: TCP
      port: 443
This policy allows backend pods to resolve DNS, connect to the database, and make HTTPS requests to external services, but blocks everything else.
Advanced Selector Patterns
Network policies support sophisticated label selectors that let you create flexible rules:
# Allow access based on multiple labels
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: complex-selector-policy
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchExpressions:
        - key: tier
          operator: In
          values: ["frontend", "mobile-app"]
        - key: version
          operator: NotIn
          values: ["deprecated"]
    ports:
    - protocol: TCP
      port: 8080
This policy allows access from pods that are either frontend or mobile-app tier, but not if they’re marked as deprecated.
IP Block Policies for Legacy Integration
Sometimes you need to allow traffic from specific IP ranges, especially when integrating with legacy systems:
# Allow traffic from specific IP ranges
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-legacy-systems
spec:
  podSelector:
    matchLabels:
      app: legacy-integration
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.1.0/24
        except:
        - 192.168.1.5/32  # Exclude compromised host
    ports:
    - protocol: TCP
      port: 8080
This allows traffic from the 192.168.1.0/24 network except for the specific host 192.168.1.5.
Policy Ordering and Conflicts
Network policies are additive—if multiple policies select the same pod, the union of all their rules applies. This can lead to unexpected behavior:
# Policy 1: Allow frontend access
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
spec:
  podSelector:
    matchLabels:
      app: api
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
---
# Policy 2: Allow monitoring access
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring
spec:
  podSelector:
    matchLabels:
      app: api
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: prometheus
Both policies select pods with app: api, so those pods can receive traffic from both frontend pods and Prometheus pods.
Testing and Validation
Network policies can be tricky to debug. Here’s my systematic approach to testing them:
# Create a test pod for network debugging
kubectl run netshoot --image=nicolaka/netshoot -it --rm -- /bin/bash
# Test connectivity between specific pods
kubectl exec -it frontend-pod -- curl backend-service:8080
# Check if a policy is selecting the right pods
kubectl describe networkpolicy my-policy
# See which policies apply to a pod
kubectl get networkpolicy -o yaml | grep -A 10 -B 5 "app: my-app"
I also use temporary “debug” policies that log denied connections:
# Temporary policy to see what's being blocked
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: debug-policy
  annotations:
    debug: "true"
spec:
  podSelector:
    matchLabels:
      app: debug-target
  policyTypes:
  - Ingress
  - Egress
  # No rules = deny all, but some CNIs will log denials
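If your CNI is Calico, you can go one step further with its own policy CRD, which supports an explicit Log action ahead of the deny; a sketch, assuming the projectcalico.org/v3 API is available (via calicoctl or the Calico API server).

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: log-then-deny
  namespace: production
spec:
  selector: app == 'debug-target'   # Calico uses selector expressions rather than matchLabels
  types:
  - Ingress
  ingress:
  - action: Log                     # record the flow on the node before...
  - action: Deny                    # ...denying it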
CNI Plugin Considerations
Not all CNI plugins support network policies, and those that do may have different capabilities:
- Calico: Full network policy support, including egress rules and IP blocks
- Cilium: Advanced features like L7 policies and DNS-based rules
- Weave: Basic network policy support
- Flannel: No network policy support (needs Calico overlay)
Check your CNI plugin’s documentation for specific features and limitations.
Performance and Scale Considerations
Network policies add overhead to packet processing. In high-throughput environments, consider:
- Minimizing the number of policies per pod
- Using efficient label selectors
- Avoiding overly complex rules
- Monitoring CNI plugin performance metrics
Most CNI plugins cache policy decisions, so the performance impact decreases over time as the cache warms up.
Compliance and Audit Requirements
Network policies are often required for compliance frameworks like PCI DSS, SOC 2, or HIPAA. Document your policies clearly:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pci-compliance-policy
  annotations:
    compliance.framework: "PCI DSS"
    compliance.requirement: "1.2.1"
    description: "Restrict inbound traffic to cardholder data environment"
spec:
  podSelector:
    matchLabels:
      data-classification: cardholder-data
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          authorized-for-chd: "true"
Monitoring and Alerting
Set up monitoring for network policy violations:
# Check for policy denials in CNI logs
kubectl logs -n kube-system -l k8s-app=calico-node | grep -i deny
# Monitor policy changes
kubectl get events --field-selector reason=NetworkPolicyUpdated
Many organizations set up alerts for:
- New network policies being created
- Policies being deleted
- High numbers of denied connections
- Pods without any network policy coverage
What’s Next
Network policies provide the foundation for secure networking, but they’re just one piece of the puzzle. In the final part, we’ll explore advanced networking patterns including service mesh integration, multi-cluster networking, and troubleshooting complex networking issues in production environments.
The key to successful network policy implementation is starting simple and iterating. Begin with basic segmentation between tiers, test thoroughly, and gradually add more sophisticated rules as your understanding grows.
Best Practices and Optimization
Advanced Patterns and Production Troubleshooting
After years of debugging Kubernetes networking issues at 3 AM, I’ve learned that the most complex problems usually have simple causes. A misconfigured DNS setting, a typo in a service selector, or a forgotten network policy can bring down entire applications. The key to effective troubleshooting is having a systematic approach and understanding how all the networking pieces fit together.
This final part covers the advanced patterns you’ll need in production and the troubleshooting skills that’ll save you hours of frustration when things go wrong.
Service Mesh Integration
Service meshes like Istio, Linkerd, and Consul Connect add a layer of sophistication to Kubernetes networking. They provide features that are difficult or impossible to achieve with basic Kubernetes networking: mutual TLS, advanced traffic management, circuit breakers, and detailed observability.
The trade-off is complexity. Service meshes introduce new concepts, configuration files, and potential failure points. I recommend starting with basic Kubernetes networking and adding a service mesh only when you have specific requirements that justify the complexity.
Here’s a simple Istio configuration that demonstrates the power of service mesh:
# Automatic mutual TLS between services
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
---
# Traffic splitting for canary deployments
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-vs
spec:
  hosts:
  - reviews-service
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: reviews-service
        subset: v2
  - route:
    - destination:
        host: reviews-service
        subset: v1
      weight: 90
    - destination:
        host: reviews-service
        subset: v2
      weight: 10
This configuration enables automatic encryption between all services and implements a canary deployment that sends 10% of traffic to version 2, with the ability to override using a header.
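For the v1 and v2 subsets to mean anything, a DestinationRule has to map them to pod labels; a minimal sketch, assuming the stable and canary Deployments carry a version label.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-subsets
spec:
  host: reviews-service
  subsets:
  - name: v1
    labels:
      version: v1    # assumed label on the stable pods
  - name: v2
    labels:
      version: v2    # assumed label on the canary pods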
Multi-Cluster Networking
As organizations scale, they often need to connect multiple Kubernetes clusters. This might be for disaster recovery, geographic distribution, or separating different environments while maintaining connectivity.
Cluster mesh solutions like Istio multi-cluster or Submariner enable cross-cluster service discovery and communication:
# Cross-cluster service in Istio
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-cluster-service
spec:
  hosts:
  - api-service.production.global
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  addresses:
  - 240.0.0.1  # Virtual IP for cross-cluster service
  endpoints:
  - address: api-service.production.svc.cluster.local
    network: cluster-2
    ports:
      http: 80
This allows services in one cluster to call api-service.production.global and have the traffic routed to the appropriate cluster.
Advanced Load Balancing Patterns
Beyond basic round-robin load balancing, production applications often need more sophisticated traffic distribution:
# Weighted routing based on geography
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: geographic-routing
spec:
  host: api-service
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        distribute:
        - from: "region1/*"
          to:
            "region1/*": 80
            "region2/*": 20
        - from: "region2/*"
          to:
            "region2/*": 80
            "region1/*": 20
        failover:
        - from: region1
          to: region2
This configuration keeps traffic local when possible but provides failover to other regions when needed.
Network Performance Optimization
Network performance in Kubernetes depends on several factors. Here are the optimizations that have made the biggest difference in my experience:
Pod Networking: Use host networking for high-throughput applications that can tolerate the security implications:
apiVersion: v1
kind: Pod
metadata:
name: high-performance-app
spec:
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: app
image: myapp:latest
ports:
- containerPort: 8080
hostPort: 8080
CPU Affinity: Pin network-intensive pods to specific CPU cores to reduce context switching:
apiVersion: v1
kind: Pod
metadata:
  name: network-intensive-app
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      requests:
        cpu: "2"       # integer CPU request equal to the limit...
        memory: "4Gi"
      limits:
        cpu: "2"       # ...gives Guaranteed QoS; with the kubelet's static CPU manager policy, the pod gets exclusive cores
        memory: "4Gi"
  nodeSelector:
    node-type: high-performance
Service Mesh Bypass: For very high-throughput internal communication, consider bypassing the service mesh:
# Exclude a port from the Envoy sidecar (the annotation goes on the pod template, not the Service)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: high-throughput-app
spec:
  selector:
    matchLabels:
      app: high-throughput-app
  template:
    metadata:
      labels:
        app: high-throughput-app
      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "8080"
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 8080
Comprehensive Troubleshooting Methodology
When networking issues occur, follow this systematic approach:
1. Verify Basic Connectivity
Start with the simplest tests:
# Check if pods are running and ready
kubectl get pods -o wide
# Test DNS resolution
kubectl exec -it test-pod -- nslookup kubernetes.default.svc.cluster.local
# Test service connectivity
kubectl exec -it test-pod -- curl -v http://my-service
2. Examine Service Configuration
Most networking issues stem from service misconfigurations:
# Check service details
kubectl describe service my-service
# Verify endpoints exist
kubectl get endpoints my-service
# Check if service selector matches pod labels
kubectl get pods --show-labels
kubectl get service my-service -o yaml | grep -A 5 selector
3. Network Policy Analysis
If basic connectivity works but specific traffic is blocked:
# List all network policies
kubectl get networkpolicy --all-namespaces
# Check which policies affect a specific pod
kubectl describe pod my-pod | grep -i labels
kubectl get networkpolicy -o yaml | grep -B 10 -A 10 "app: my-app"
# Test with a temporary pod in different namespaces
kubectl run test-pod --image=nicolaka/netshoot -n different-namespace
4. CNI Plugin Debugging
Different CNI plugins provide different debugging tools:
# Calico debugging
kubectl exec -n kube-system calico-node-xxx -- calicoctl get workloadendpoint
kubectl exec -n kube-system calico-node-xxx -- calicoctl get networkpolicy
# Check CNI plugin logs
kubectl logs -n kube-system -l k8s-app=calico-node
kubectl logs -n kube-system -l k8s-app=cilium
5. Ingress Controller Issues
For external connectivity problems:
# Check ingress controller status
kubectl get pods -n ingress-nginx
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
# Verify ingress configuration
kubectl describe ingress my-ingress
# Check external load balancer
kubectl get service -n ingress-nginx ingress-nginx-controller
Common Production Issues and Solutions
DNS Resolution Failures
Symptoms: Services can’t find each other, intermittent connection failures.
Causes: CoreDNS configuration issues, DNS policy problems, search domain conflicts.
# Check CoreDNS status
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Examine DNS configuration
kubectl get configmap -n kube-system coredns -o yaml
# Test DNS from different pods
kubectl exec -it pod1 -- nslookup service-name
kubectl exec -it pod2 -- dig service-name.namespace.svc.cluster.local
Service Discovery Latency
Symptoms: Slow response times, timeouts during startup.
Causes: DNS caching issues, service mesh overhead, inefficient service selectors.
# Monitor DNS query performance
kubectl exec -it test-pod -- time nslookup my-service
# Check service endpoint count
kubectl get endpoints my-service -o yaml
# Analyze service mesh metrics
kubectl exec -it my-pod -c istio-proxy -- curl -s localhost:15000/stats | grep dns
Network Policy Conflicts
Symptoms: Unexpected connection denials, services working intermittently.
Causes: Overlapping policies, incorrect label selectors, missing egress rules.
# Audit all policies affecting a pod
kubectl get networkpolicy --all-namespaces -o yaml | \
  yq eval '.items[] | select(.spec.podSelector.matchLabels.app == "my-app")'
# Test policy changes safely
kubectl apply -f test-policy.yaml --dry-run=server
Load Balancer Issues
Symptoms: Uneven traffic distribution, session affinity problems.
Causes: Incorrect service configuration, pod readiness issues, upstream health checks.
# Check service endpoints and their readiness
kubectl describe endpoints my-service
# Monitor traffic distribution
kubectl top pods -l app=my-app
# Verify load balancer configuration
kubectl describe service my-service | grep -i session
Monitoring and Observability
Effective networking monitoring requires metrics at multiple layers:
# ServiceMonitor for Prometheus to scrape network metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: network-metrics
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
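A ServiceMonitor selects Services (not pods), and the port field refers to a named port on that Service, so the application’s Service needs a matching entry; the port numbers here are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app            # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-app
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: metrics          # the named port the ServiceMonitor scrapes
    port: 9090
    targetPort: 9090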
Key metrics to monitor:
- DNS query latency and failure rates
- Service response times and error rates
- Network policy deny counts
- Ingress controller request rates and latencies
- Pod-to-pod communication patterns
Security Best Practices
Network security in production requires defense in depth:
- Default Deny: Always start with restrictive network policies
- Principle of Least Privilege: Only allow necessary communication
- Regular Audits: Review and update network policies regularly
- Encryption in Transit: Use service mesh or manual TLS for sensitive data
- Monitoring: Alert on policy violations and unusual traffic patterns
Performance Tuning Guidelines
Based on production experience, here are the settings that matter most:
# Optimize CoreDNS for high-throughput environments
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
Looking Forward
Kubernetes networking continues to evolve rapidly. Keep an eye on:
- eBPF-based networking (Cilium, Calico eBPF mode)
- Gateway API replacing Ingress
- Multi-cluster service mesh standardization
- IPv6 dual-stack networking
- Network security policy enhancements
The fundamentals we’ve covered—services, DNS, ingress, and network policies—will remain relevant, but the implementations and capabilities will continue to improve.
Final Thoughts
Kubernetes networking seems complex because it is complex. But that complexity serves a purpose: it provides the flexibility and power needed to run modern, distributed applications at scale. The key to mastering it is understanding the principles, practicing with real applications, and building your troubleshooting skills through experience.
Start with the basics, implement security from the beginning, and don’t be afraid to experiment. Every networking issue you debug makes you better at designing resilient, secure network architectures. The investment in understanding Kubernetes networking pays dividends in application reliability, security, and operational efficiency.