Service Discovery and DNS

I once spent an entire afternoon debugging why my microservices couldn’t find each other, only to discover I’d been using the wrong DNS names. The frustrating part? The error messages were completely unhelpful. “Connection refused” tells you nothing about whether you’re using the wrong hostname, wrong port, or if the service doesn’t exist at all.

Service discovery in Kubernetes is both simpler and more complex than traditional networking. Simpler because DNS “just works” most of the time. More complex because understanding the nuances can save you hours of debugging when things go wrong.

How Kubernetes DNS Actually Works

Every Kubernetes cluster runs CoreDNS (or kube-dns in older clusters) as a cluster add-on in the kube-system namespace. This isn’t just any DNS server—it’s specifically designed to understand Kubernetes resources and automatically create DNS records for your services.

When you create a Service, Kubernetes creates a DNS record for it that’s reachable under several names:

  • my-service.my-namespace.svc.cluster.local (the full FQDN)
  • my-service.my-namespace (resolved via the pod’s DNS search domains)
  • my-service (if you’re in the same namespace)

The beauty is that you rarely need to think about this. Your applications can use the simplest form that works, and Kubernetes handles the rest.

# This service automatically gets DNS entries
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080

From any pod in the production namespace, you can reach this service at api-service. From other namespaces, use api-service.production. The full FQDN api-service.production.svc.cluster.local works from anywhere, but it’s unnecessarily verbose.

Service Discovery Patterns

The most common pattern I see in production is environment-based service discovery. Instead of hardcoding service names, use environment variables that can change between environments:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  template:
    spec:
      containers:
      - name: app
        image: myapp:latest
        env:
        - name: API_URL
          value: "http://api-service"
        - name: DATABASE_HOST
          value: "postgres-service"
        - name: REDIS_URL
          value: "redis://redis-service:6379"

This approach lets you use different service names in different environments (dev, staging, production) without changing your application code.
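In application code, this pattern is just a matter of reading the environment at startup. A minimal Python sketch (the variable names match the Deployment above; the local-dev fallback values are illustrative assumptions, not anything Kubernetes provides):

```python
import os

def load_service_config():
    """Read service endpoints from the environment, with local-dev fallbacks."""
    return {
        # In-cluster, these come from the Deployment's env block;
        # the defaults only matter when running outside Kubernetes.
        "api_url": os.environ.get("API_URL", "http://localhost:8080"),
        "database_host": os.environ.get("DATABASE_HOST", "localhost"),
        "redis_url": os.environ.get("REDIS_URL", "redis://localhost:6379"),
    }

config = load_service_config()
```

Because the lookup happens at runtime, the same image works unchanged in every environment—only the env block in the manifest differs.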

Understanding Service Endpoints

Services don’t actually handle traffic—they’re just configuration objects that tell Kubernetes how to route requests. The real work happens at the endpoint level. When you create a Service, Kubernetes automatically creates an Endpoints object (and, on recent versions, EndpointSlices) that tracks which pods are ready to receive traffic.

You can see this in action:

# Check which pods are behind a service
kubectl get endpoints api-service

# Get detailed endpoint information
kubectl describe endpoints api-service

This is crucial for debugging. If your service isn’t working, check the endpoints. No endpoints usually means your service selector doesn’t match any pods, or the pods aren’t ready.

Headless Services and Direct Pod Access

Sometimes you don’t want load balancing—you want to talk directly to individual pods. This is common with databases and other stateful workloads, where clients need stable per-pod identities rather than a single virtual IP. Headless services solve this by returning pod IPs directly instead of a service IP.

apiVersion: v1
kind: Service
metadata:
  name: database-headless
spec:
  clusterIP: None  # This makes it headless
  selector:
    app: database
  ports:
  - port: 5432

With a headless service, DNS queries return multiple A records—one for each pod. Your application can then choose which pod to connect to, or use all of them for different purposes.
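A minimal Python sketch of how an application might enumerate those A records (the hostname in the comment assumes the headless service above; from inside the cluster, each returned address is a pod IP):

```python
import socket

def resolve_all(hostname, port):
    """Return every IP address DNS gives us for a hostname.

    Against a headless service this yields one address per ready pod;
    against a normal service it yields just the single cluster IP.
    """
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Deduplicate while preserving the order DNS returned.
    seen = []
    for info in infos:
        addr = info[4][0]
        if addr not in seen:
            seen.append(addr)
    return seen

# e.g. resolve_all("database-headless", 5432) from inside the cluster
```

Note that the order of records isn’t guaranteed, so don’t rely on it for load balancing—shuffle or round-robin in your own code if you need even distribution.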

Cross-Namespace Communication

A service’s short name only resolves from within its own namespace; the service itself is reachable from anywhere in the cluster. For cross-namespace communication, you have several options:

Use the namespace-qualified name:

env:
- name: SHARED_API_URL
  value: "http://shared-api.shared-services"

Or create a Service in your namespace that points to a service in another namespace using ExternalName:

apiVersion: v1
kind: Service
metadata:
  name: external-api
  namespace: my-app
spec:
  type: ExternalName
  externalName: shared-api.shared-services.svc.cluster.local

Now your app can use external-api as if it were a local service, but it actually routes to the shared service.

DNS Configuration and Troubleshooting

Each pod gets its DNS configuration from the cluster’s DNS service. You can see this configuration:

# Check DNS config in a pod
kubectl exec -it my-pod -- cat /etc/resolv.conf

The typical configuration looks like:

nameserver 10.96.0.10
search my-namespace.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

The search domains are crucial—they let you use short names like api-service instead of full FQDNs. The ndots:5 setting means that for any name with fewer than 5 dots, the resolver tries the search domains first before attempting an absolute lookup.
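You can see why ndots matters with a small simulation of the resolver’s candidate list. This mimics the glibc search-list logic without doing any real DNS queries; the search domains are taken from the resolv.conf above:

```python
def candidate_names(name, search_domains, ndots=5):
    """List the names the resolver will try, in order, for a given query.

    Mimics glibc's search-list behavior: if the name has fewer than
    `ndots` dots (and isn't rooted with a trailing dot), each search
    domain is appended and tried first; the literal name comes last.
    """
    if name.endswith("."):  # a rooted (absolute) name skips the search list
        return [name.rstrip(".")]
    candidates = []
    if name.count(".") < ndots:
        candidates = [f"{name}.{d}" for d in search_domains]
    candidates.append(name)
    return candidates

search = ["my-namespace.svc.cluster.local", "svc.cluster.local", "cluster.local"]
print(candidate_names("api-service", search))
```

This is also why external lookups like api.example.com (only 2 dots) generate several wasted queries against the cluster search domains before resolving—one reason high-traffic pods sometimes tune ndots down.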

Service Discovery for External Services

Not everything runs in your cluster. For external databases, APIs, or legacy services, you can create Services without selectors:

apiVersion: v1
kind: Service
metadata:
  name: external-database
spec:
  ports:
  - port: 5432
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-database
subsets:
- addresses:
  - ip: 192.168.1.100
  ports:
  - port: 5432

This creates a service that routes to an external IP address. Your applications can use external-database just like any other service, making it easy to migrate between internal and external services.

Load Balancing and Session Affinity

By default, Services spread traffic across healthy pods (with kube-proxy in iptables mode, a backend is picked at random rather than in strict round-robin order). Sometimes you need more control:

apiVersion: v1
kind: Service
metadata:
  name: sticky-service
spec:
  selector:
    app: web
  sessionAffinity: ClientIP  # Route same client to same pod
  ports:
  - port: 80

Session affinity based on client IP ensures that requests from the same client always go to the same pod. This is useful for applications that store session data locally instead of in a shared store.
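By default the affinity tracking times out after three hours (10800 seconds); you can tune this with sessionAffinityConfig. A sketch extending the service above (the 10-minute value is just an example):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sticky-service
spec:
  selector:
    app: web
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 600  # drop affinity after 10 minutes of inactivity
  ports:
  - port: 80
```

Keep in mind that ClientIP affinity breaks down behind NAT or a load balancer that rewrites source addresses—many distinct clients can appear as one IP.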

Service Mesh and Advanced Discovery

For complex microservice architectures, consider a service mesh like Istio or Linkerd. Service meshes provide advanced service discovery features:

  • Automatic mutual TLS between services
  • Advanced load balancing algorithms
  • Circuit breakers and retry policies
  • Detailed traffic metrics and tracing

Here’s a simple Istio DestinationRule that adds circuit breaking:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-circuit-breaker
spec:
  host: api-service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 30s
      baseEjectionTime: 30s

Debugging Service Discovery Issues

When service discovery fails, follow this debugging checklist:

  1. Check if the service exists: kubectl get svc
  2. Verify the service has endpoints: kubectl get endpoints service-name
  3. Test DNS resolution: kubectl exec -it pod -- nslookup service-name
  4. Check pod labels match service selector: kubectl get pods --show-labels
  5. Verify pods are ready: kubectl get pods (look for Ready column)

The most common issues are:

  • Typos in service names or selectors
  • Pods not passing readiness checks
  • Network policies blocking traffic (we’ll cover this in part 4)
  • Wrong namespace or DNS configuration

Performance Considerations

DNS lookups add latency to every request. In high-performance applications, consider:

  • Caching DNS results in your application
  • Using IP addresses for very high-frequency internal calls
  • Configuring appropriate DNS timeouts and retries
  • Using headless services to reduce DNS overhead
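A minimal sketch of application-level DNS caching (the fixed TTL and the use of socket.gethostbyname are assumptions for illustration; a production resolver should honor the record’s own TTL):

```python
import socket
import time

class DnsCache:
    """Cache hostname lookups for a fixed TTL to avoid per-request DNS hits."""

    def __init__(self, ttl_seconds=30, resolver=socket.gethostbyname):
        self.ttl = ttl_seconds
        self.resolver = resolver  # injectable, which also makes testing easy
        self._cache = {}          # hostname -> (ip, expires_at)

    def resolve(self, hostname):
        entry = self._cache.get(hostname)
        now = time.monotonic()
        if entry and entry[1] > now:
            return entry[0]       # still fresh: skip the DNS round trip
        ip = self.resolver(hostname)
        self._cache[hostname] = (ip, now + self.ttl)
        return ip
```

Keep the TTL short: a stale cached entry can outlive the pod behind it, especially with headless services where the answer is a pod IP rather than a stable cluster IP.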

However, premature optimization is the root of all evil. Start with the simple approach and optimize only when you have actual performance problems.

What’s Next

Service discovery gives you the foundation for reliable communication between services. In the next part, we’ll explore Ingress controllers and how to expose your services to external traffic with proper HTTP routing, SSL termination, and load balancing.

Understanding service discovery is crucial because it affects every other networking decision you’ll make. Whether you’re implementing network policies, setting up ingress, or debugging connectivity issues, it all comes back to how services find and communicate with each other.