Introduction and Setup

When I first started working with Kubernetes, I quickly realized that managing complex applications required more than just deploying pods and services. That’s where operators come in - they’re like having an experienced system administrator encoded in software, continuously managing your applications with domain-specific knowledge.

Understanding Kubernetes Operators

Operators extend Kubernetes by combining Custom Resource Definitions (CRDs) with controllers that understand how to manage specific applications. I’ve seen teams struggle with manual database backups, complex scaling decisions, and application lifecycle management. Operators solve these problems by automating operational tasks that would otherwise require human intervention.

Think of it this way: instead of writing runbooks for your operations team, you encode that knowledge into an operator that runs 24/7 in your cluster. When you need to deploy a PostgreSQL database, the operator knows how to configure storage, set up replication, schedule backups, and handle failover scenarios automatically.

The operator pattern consists of three key components that work together. Custom Resources (CRs) define your application’s desired state using familiar Kubernetes YAML syntax. Custom Resource Definitions (CRDs) act as the schema, defining what fields your custom resources can have and their validation rules. Controllers watch for changes to these resources and take action to maintain the desired state.
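To make that division of labor concrete, here is a toy model in plain Go with no Kubernetes libraries involved. Every name in it (`resource`, `schema`, `controller`, the `running` map) is invented for illustration; a real operator talks to the API server instead of an in-memory map.

```go
package main

import (
	"errors"
	"fmt"
)

// resource plays the role of a Custom Resource: it only states desired state.
type resource struct {
	Name     string
	Replicas int
}

// schema plays the role of a CRD: it decides which resources are acceptable.
type schema struct {
	MinReplicas, MaxReplicas int
}

func (s schema) validate(r resource) error {
	if r.Replicas < s.MinReplicas || r.Replicas > s.MaxReplicas {
		return errors.New("replicas out of range")
	}
	return nil
}

// controller plays the role of the operator's controller: it compares
// desired state with what is actually running and closes the gap.
type controller struct {
	running map[string]int // hypothetical stand-in for real cluster state
}

func (c *controller) reconcile(r resource) string {
	actual := c.running[r.Name]
	switch {
	case actual < r.Replicas:
		c.running[r.Name] = r.Replicas
		return fmt.Sprintf("scaled %s up from %d to %d", r.Name, actual, r.Replicas)
	case actual > r.Replicas:
		c.running[r.Name] = r.Replicas
		return fmt.Sprintf("scaled %s down from %d to %d", r.Name, actual, r.Replicas)
	default:
		return fmt.Sprintf("%s already at %d replicas", r.Name, actual)
	}
}

func main() {
	crd := schema{MinReplicas: 1, MaxReplicas: 10}
	ctrl := &controller{running: map[string]int{}}

	desired := resource{Name: "my-webapp", Replicas: 3}
	if err := crd.validate(desired); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(ctrl.reconcile(desired)) // first pass changes state
	fmt.Println(ctrl.reconcile(desired)) // second pass finds nothing to do
}
```

The point of the sketch is the separation: the resource carries no logic, the schema only accepts or rejects, and all behavior lives in the controller.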

Setting Up Your Development Environment

Before we build our first operator, let’s ensure you have the necessary tools installed. I recommend using operator-sdk as it provides scaffolding and best practices out of the box.

# Verify kubectl is working
kubectl version --client

# Check cluster connectivity
kubectl cluster-info

If you don’t have operator-sdk installed, here’s how to get it on macOS:

# Download and install operator-sdk (Intel Macs; on Apple Silicon, use operator-sdk_darwin_arm64 instead)
curl -LO https://github.com/operator-framework/operator-sdk/releases/latest/download/operator-sdk_darwin_amd64
chmod +x operator-sdk_darwin_amd64
sudo mv operator-sdk_darwin_amd64 /usr/local/bin/operator-sdk

Verify everything is working correctly:

operator-sdk version

You should see output showing the operator-sdk version, confirming it’s properly installed.

Creating Your First Custom Resource Definition

Let’s start with a practical example - a WebApp CRD that defines how we want to deploy web applications. This will give you hands-on experience with the concepts before we dive deeper into controller logic.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: webapps.example.com
spec:
  group: example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicas:
                type: integer
                minimum: 1
                maximum: 10
              image:
                type: string
              port:
                type: integer
                default: 8080
            required:
            - replicas
            - image
  scope: Namespaced
  names:
    plural: webapps
    singular: webapp
    kind: WebApp

This CRD defines a new Kubernetes resource type called WebApp. The group field creates a namespace for your API, similar to how built-in resources are organized into API groups such as apps and batch, and the CRD’s metadata.name must be the plural name followed by the group (webapps.example.com). The schema section defines what fields users can specify: in this case, the number of replicas, container image, and port number.

Notice how we’ve included validation rules like minimum and maximum values for replicas. This prevents users from accidentally creating deployments with zero replicas or overwhelming the cluster with too many instances.
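The API server enforces these rules for you, so you never write this check yourself. Purely to make the semantics concrete, the sketch below mirrors the schema’s rules in plain Go. `WebAppSpec`, `applyDefaults`, and `validate` are hypothetical names; the real enforcement is driven by the OpenAPI schema in the CRD, not by code like this.

```go
package main

import (
	"errors"
	"fmt"
)

// WebAppSpec mirrors the spec fields declared in the CRD schema.
// Hypothetical type for illustration; nil pointers mark omitted fields.
type WebAppSpec struct {
	Replicas *int
	Image    *string
	Port     *int
}

// applyDefaults fills in the port the way the schema's "default: 8080" does.
func applyDefaults(s *WebAppSpec) {
	if s.Port == nil {
		p := 8080
		s.Port = &p
	}
}

// validate mirrors "required", "minimum: 1", and "maximum: 10".
func validate(s WebAppSpec) error {
	if s.Replicas == nil {
		return errors.New("spec.replicas is required")
	}
	if s.Image == nil {
		return errors.New("spec.image is required")
	}
	if *s.Replicas < 1 || *s.Replicas > 10 {
		return fmt.Errorf("spec.replicas must be between 1 and 10, got %d", *s.Replicas)
	}
	return nil
}

func main() {
	image := "nginx:1.21"
	for _, n := range []int{0, 3, 11} {
		replicas := n
		spec := WebAppSpec{Replicas: &replicas, Image: &image}
		applyDefaults(&spec)
		if err := validate(spec); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("accepted: replicas=%d port=%d\n", *spec.Replicas, *spec.Port)
	}
}
```

Of the three replica counts tried in `main`, only 3 falls inside the allowed range, and its port is filled in from the default because none was given.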

Save the manifest as webapp-crd.yaml, apply it to your cluster, and confirm that the CRD was registered:

kubectl apply -f webapp-crd.yaml
kubectl get crd webapps.example.com

Creating Custom Resource Instances

Now that we’ve defined the structure, let’s create an actual WebApp resource. This demonstrates how users will interact with your operator - through familiar Kubernetes YAML manifests.

apiVersion: example.com/v1
kind: WebApp
metadata:
  name: my-webapp
spec:
  replicas: 3
  image: nginx:1.21
  port: 80

Save this as my-webapp.yaml, apply it, and observe how Kubernetes accepts it:

kubectl apply -f my-webapp.yaml
kubectl get webapps
kubectl describe webapp my-webapp

At this point, you’ll notice that while Kubernetes accepts and stores your WebApp resource, nothing actually happens. That’s because we haven’t built the controller yet - the component that watches for WebApp resources and takes action.
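Before a controller can act on a stored WebApp, it needs Go types that mirror the manifest. Here is a minimal sketch using only the standard library. The struct names are assumptions chosen to match the CRD, and `parseWebApp` is a hypothetical helper; operator-sdk scaffolds richer types than these. The manifest is written as the JSON equivalent of my-webapp.yaml so that `encoding/json` can decode it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WebAppSpec mirrors the spec section of the WebApp CRD (hypothetical names).
type WebAppSpec struct {
	Replicas int    `json:"replicas"`
	Image    string `json:"image"`
	Port     int    `json:"port,omitempty"`
}

// Metadata holds only the one field this sketch needs.
type Metadata struct {
	Name string `json:"name"`
}

// WebApp is the top-level object, matching apiVersion/kind/metadata/spec.
type WebApp struct {
	APIVersion string     `json:"apiVersion"`
	Kind       string     `json:"kind"`
	Metadata   Metadata   `json:"metadata"`
	Spec       WebAppSpec `json:"spec"`
}

// manifest is the JSON equivalent of my-webapp.yaml.
const manifest = `{
  "apiVersion": "example.com/v1",
  "kind": "WebApp",
  "metadata": {"name": "my-webapp"},
  "spec": {"replicas": 3, "image": "nginx:1.21", "port": 80}
}`

// parseWebApp decodes a manifest into the typed struct.
func parseWebApp(data []byte) (WebApp, error) {
	var w WebApp
	err := json.Unmarshal(data, &w)
	return w, err
}

func main() {
	w, err := parseWebApp([]byte(manifest))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s %q wants %d replicas of %s on port %d\n",
		w.Kind, w.Metadata.Name, w.Spec.Replicas, w.Spec.Image, w.Spec.Port)
}
```

Once the resource is available as a typed value like this, controller code can read `w.Spec.Replicas` directly instead of digging through untyped maps.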

Understanding Controller Basics

Controllers are the brains of the operator pattern. They continuously watch for changes to resources and work to reconcile the actual state with the desired state. Here’s a simplified controller structure that shows the essential pattern:

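// Assumes these imports: "context", apierrors "k8s.io/apimachinery/pkg/api/errors",
// "sigs.k8s.io/controller-runtime/pkg/client", "sigs.k8s.io/controller-runtime/pkg/reconcile"
func (r *WebAppReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
    // Fetch the WebApp resource; NotFound means it was deleted, so there is nothing to do
    webapp := &WebApp{}
    if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }

    // Build the Deployment from the WebApp spec (buildDeployment is a helper not shown here)
    deployment := buildDeployment(webapp)

    // Reconcile runs repeatedly, so a Deployment that already exists is not an error
    if err := r.Create(ctx, deployment); err != nil && !apierrors.IsAlreadyExists(err) {
        return reconcile.Result{}, err
    }
    return reconcile.Result{}, nil
}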

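The Reconcile function is called whenever a WebApp resource changes. It receives only the resource’s namespace and name, so its first step is to fetch the current object. From there it determines which Kubernetes objects should exist (like Deployments or Services) and creates or updates them accordingly. This simplified version only handles creation; a complete controller also updates the Deployment when the spec changes. The beauty of this pattern is that it’s declarative: you describe what you want, and the controller figures out how to make it happen.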
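Because Reconcile can run many times for the same resource, it has to be safe to repeat. The following sketch shows that property in isolation, using an in-memory stand-in for the cluster rather than controller-runtime. `fakeCluster`, `spec`, and the returned status strings are all hypothetical.

```go
package main

import "fmt"

// spec is the part of a WebApp this sketch cares about.
type spec struct {
	Image    string
	Replicas int
}

// fakeCluster is a hypothetical in-memory substitute for the API server.
type fakeCluster struct {
	webapps     map[string]spec // desired state, keyed by WebApp name
	deployments map[string]spec // actual state, keyed by WebApp name
}

// reconcile receives only a name, much like reconcile.Request, and looks
// up the current desired state itself. Calling it repeatedly is safe.
func (c *fakeCluster) reconcile(name string) string {
	desired, ok := c.webapps[name]
	if !ok {
		// The WebApp was deleted before we got here: nothing to do.
		return "not found, ignoring"
	}
	actual, exists := c.deployments[name]
	switch {
	case !exists:
		c.deployments[name] = desired
		return "created"
	case actual != desired:
		c.deployments[name] = desired
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	c := &fakeCluster{
		webapps:     map[string]spec{"my-webapp": {Image: "nginx:1.21", Replicas: 3}},
		deployments: map[string]spec{},
	}
	fmt.Println(c.reconcile("my-webapp")) // no Deployment yet
	fmt.Println(c.reconcile("my-webapp")) // repeat call is a no-op

	c.webapps["my-webapp"] = spec{Image: "nginx:1.21", Replicas: 5}
	fmt.Println(c.reconcile("my-webapp")) // spec changed

	delete(c.webapps, "my-webapp")
	fmt.Println(c.reconcile("my-webapp")) // resource is gone
}
```

Note that reconcile never asks what changed; it only compares desired state with actual state, which is why a missed or duplicated event does no harm.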

Testing Your Foundation

Let’s verify that your CRD is working by confirming that the API server now lists the new resource type and accepts instances of it:

# Check that your new resource type is available
kubectl api-resources | grep webapp

# Create a test instance
kubectl apply -f - <<EOF
apiVersion: example.com/v1
kind: WebApp
metadata:
  name: test-app
spec:
  replicas: 2
  image: nginx:alpine
  port: 80
EOF

You can now manage WebApp resources just like any other Kubernetes resource, using familiar commands like kubectl get, kubectl describe, and kubectl delete.

In Part 2, we’ll build the controller logic that brings these WebApp resources to life by creating actual Deployments and Services. You’ll learn about reconciliation loops, event handling, and how to properly manage the lifecycle of the resources your operator creates.