Practical Applications and Examples

Now that you understand the fundamentals, let’s build operators that solve real problems. I’ve found that the best way to learn operator development is by tackling scenarios you’ll actually encounter in production - database management, configuration handling, and backup automation.

Building a Database Operator

Database operators are among the most valuable because they handle complex lifecycle management that would otherwise require significant manual intervention. Let me walk you through building a PostgreSQL operator that manages not just the database itself, but also users, backups, and monitoring.

The first step is defining what users actually need from a database operator. In my experience, teams want to specify the database version, storage requirements, and backup policies without worrying about StatefulSets, persistent volumes, or backup scripts.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresqls.database.example.com
spec:
  group: database.example.com
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              version:
                type: string
                enum: ["12", "13", "14", "15"]
              storage:
                type: string
                pattern: '^[0-9]+Gi$'
              replicas:
                type: integer
                minimum: 1
                maximum: 5
              backup:
                type: object
                properties:
                  enabled:
                    type: boolean
                  schedule:
                    type: string
                  retention:
                    type: string
  scope: Namespaced
  names:
    plural: postgresqls
    singular: postgresql
    kind: PostgreSQL

This CRD captures the essential configuration while hiding the complexity of Kubernetes primitives. Notice how we use validation to prevent common mistakes like invalid storage formats or too many replicas.

The controller logic needs to handle the interdependencies between different resources. Databases require careful ordering - you can’t create users before the database is running, and you shouldn’t start backups until the data directory is properly initialized.

func (r *PostgreSQLReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    postgres := &databasev1.PostgreSQL{}
    err := r.Get(ctx, req.NamespacedName, postgres)
    if err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    
    // Create resources in dependency order
    if err := r.ensureSecret(ctx, postgres); err != nil {
        return ctrl.Result{}, fmt.Errorf("failed to create credentials: %w", err)
    }
    
    if err := r.ensureStatefulSet(ctx, postgres); err != nil {
        return ctrl.Result{}, fmt.Errorf("failed to create database: %w", err)
    }
    
    if err := r.ensureService(ctx, postgres); err != nil {
        return ctrl.Result{}, fmt.Errorf("failed to create service: %w", err)
    }
    
    // Only setup backups after database is running
    if postgres.Spec.Backup.Enabled && r.isDatabaseReady(ctx, postgres) {
        if err := r.ensureBackupCronJob(ctx, postgres); err != nil {
            return ctrl.Result{}, fmt.Errorf("failed to setup backups: %w", err)
        }
    }
    
    return ctrl.Result{RequeueAfter: time.Minute * 5}, r.updateStatus(ctx, postgres)
}
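
The updateStatus helper called at the end of the loop isn't shown above, but its job is to surface readiness to users. Below is a hedged sketch that records a Ready condition, assuming the PostgreSQL status has a Conditions field of []metav1.Condition, the status subresource is enabled on the CRD, and the usual apimachinery imports (including k8s.io/apimachinery/pkg/api/meta) are in place.

func (r *PostgreSQLReconciler) updateStatus(ctx context.Context, postgres *databasev1.PostgreSQL) error {
    // Default to "not ready" and flip the condition once the database responds.
    condition := metav1.Condition{
        Type:    "Ready",
        Status:  metav1.ConditionFalse,
        Reason:  "DatabaseStarting",
        Message: "waiting for the StatefulSet to report ready replicas",
    }
    if r.isDatabaseReady(ctx, postgres) {
        condition.Status = metav1.ConditionTrue
        condition.Reason = "DatabaseReady"
        condition.Message = "database is accepting connections"
    }

    // SetStatusCondition only updates LastTransitionTime when the status actually changes.
    meta.SetStatusCondition(&postgres.Status.Conditions, condition)
    return r.Status().Update(ctx, postgres)
}

This is what lets commands like kubectl wait --for=condition=Ready work against the custom resource, which we'll rely on in the testing section later.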

The key insight here is that each ensure function is idempotent - it checks the current state and only makes changes if necessary. This makes the operator resilient to failures and restarts.
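
To make that concrete, here is a minimal sketch of what one of these ensure functions might look like, using controller-runtime's controllerutil.CreateOrUpdate helper. The Service shape is illustrative, and the usual kubebuilder scaffolding (an embedded client.Client plus the corev1, metav1, and controllerutil imports) is assumed.

func (r *PostgreSQLReconciler) ensureService(ctx context.Context, postgres *databasev1.PostgreSQL) error {
    svc := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name:      postgres.Name,
            Namespace: postgres.Namespace,
        },
    }

    // CreateOrUpdate fetches the current object, applies the mutate function,
    // and only issues a write to the API server if something actually changed.
    _, err := controllerutil.CreateOrUpdate(ctx, r.Client, svc, func() error {
        svc.Spec.Selector = map[string]string{"app": postgres.Name}
        svc.Spec.Ports = []corev1.ServicePort{{Name: "postgres", Port: 5432}}
        return ctrl.SetControllerReference(postgres, svc, r.Scheme)
    })
    return err
}

Running this a hundred times in a row produces the same Service as running it once, which is exactly the property the reconcile loop relies on.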

Configuration Management Patterns

One of the most common operator use cases I encounter is managing application configuration across different environments. Teams often struggle with keeping configuration in sync between development, staging, and production while maintaining security boundaries.

Let’s build a configuration operator that can template values based on the environment and automatically update applications when configuration changes.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: appconfigs.config.example.com
spec:
  group: config.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              application:
                type: string
              environment:
                type: string
                enum: ["dev", "staging", "prod"]
              config:
                type: object
                additionalProperties:
                  type: string
  scope: Namespaced
  names:
    plural: appconfigs
    singular: appconfig
    kind: AppConfig

The controller for this operator demonstrates an important pattern - detecting changes and triggering updates in dependent resources. When configuration changes, applications need to be restarted to pick up the new values.

func (r *AppConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    appConfig := &configv1.AppConfig{}
    err := r.Get(ctx, req.NamespacedName, appConfig)
    if err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    
    configMap := &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{
            Name:      fmt.Sprintf("%s-config", appConfig.Spec.Application),
            Namespace: appConfig.Namespace,
        },
        Data: r.processConfigTemplate(appConfig),
    }
    
    if err := ctrl.SetControllerReference(appConfig, configMap, r.Scheme); err != nil {
        return ctrl.Result{}, err
    }
    
    // Check whether the ConfigMap already exists and needs updating
    existing := &corev1.ConfigMap{}
    err = r.Get(ctx, client.ObjectKeyFromObject(configMap), existing)
    if errors.IsNotFound(err) {
        return ctrl.Result{}, r.Create(ctx, configMap)
    }
    if err != nil {
        return ctrl.Result{}, err
    }
    
    if !reflect.DeepEqual(existing.Data, configMap.Data) {
        existing.Data = configMap.Data
        if err := r.Update(ctx, existing); err != nil {
            return ctrl.Result{}, err
        }
        
        // Trigger rolling update of deployments using this config
        return ctrl.Result{}, r.triggerDeploymentUpdate(ctx, appConfig)
    }
    
    return ctrl.Result{}, nil
}
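
The processConfigTemplate helper referenced above is where the environment-specific logic lives. A deliberately simple, hypothetical sketch is below - it copies the declared key/value pairs and stamps in the environment name; a real implementation might run values through text/template or merge per-environment overrides.

func (r *AppConfigReconciler) processConfigTemplate(appConfig *configv1.AppConfig) map[string]string {
    data := make(map[string]string, len(appConfig.Spec.Config)+1)
    for key, value := range appConfig.Spec.Config {
        data[key] = value
    }
    // Expose the environment so applications can adjust their behaviour per environment.
    data["ENVIRONMENT"] = appConfig.Spec.Environment
    return data
}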

This pattern of watching for changes and cascading updates is incredibly powerful. It means your applications automatically stay in sync with their configuration without manual intervention.
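
The other helper referenced above, triggerDeploymentUpdate, is the cascading half of the pattern. A common approach - sketched below with illustrative label and annotation names - is to bump a pod-template annotation on every Deployment that consumes the config, which is the same mechanism kubectl rollout restart uses.

func (r *AppConfigReconciler) triggerDeploymentUpdate(ctx context.Context, appConfig *configv1.AppConfig) error {
    // Assumes consuming Deployments are labelled app=<application>.
    deployments := &appsv1.DeploymentList{}
    if err := r.List(ctx, deployments,
        client.InNamespace(appConfig.Namespace),
        client.MatchingLabels{"app": appConfig.Spec.Application}); err != nil {
        return err
    }

    for i := range deployments.Items {
        deploy := &deployments.Items[i]
        if deploy.Spec.Template.Annotations == nil {
            deploy.Spec.Template.Annotations = map[string]string{}
        }
        // Changing a pod-template annotation triggers a rolling restart.
        deploy.Spec.Template.Annotations["config.example.com/restartedAt"] = time.Now().Format(time.RFC3339)
        if err := r.Update(ctx, deploy); err != nil {
            return err
        }
    }
    return nil
}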

Backup and Restore Automation

Backup operators solve one of the most critical operational challenges - ensuring data is safely backed up and can be restored when needed. I’ve seen too many teams lose data because backup scripts failed silently or weren’t tested properly.

Here’s how to build a backup operator that handles multiple database types and storage backends:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.backup.example.com
spec:
  group: backup.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              source:
                type: object
                properties:
                  type:
                    type: string
                    enum: ["postgresql", "mysql", "mongodb"]
                  connection:
                    type: object
              destination:
                type: object
                properties:
                  type:
                    type: string
                    enum: ["s3", "gcs", "azure"]
                  bucket:
                    type: string
              schedule:
                type: string
              retention:
                type: string
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup

The backup controller creates CronJobs that run backup scripts on a schedule. The beauty of this approach is that it leverages Kubernetes’ built-in job scheduling while providing a higher-level abstraction for backup management.

func (r *BackupReconciler) ensureCronJob(ctx context.Context, backup *backupv1.Backup) error {
    cronJob := &batchv1.CronJob{
        ObjectMeta: metav1.ObjectMeta{
            Name:      backup.Name + "-cronjob",
            Namespace: backup.Namespace,
        },
        Spec: batchv1.CronJobSpec{
            Schedule: backup.Spec.Schedule,
            JobTemplate: batchv1.JobTemplateSpec{
                Spec: batchv1.JobSpec{
                    Template: corev1.PodTemplateSpec{
                        Spec: corev1.PodSpec{
                            RestartPolicy: corev1.RestartPolicyOnFailure,
                            Containers: []corev1.Container{{
                                Name:    "backup",
                                Image:   r.getBackupImage(backup.Spec.Source.Type),
                                Env:     r.buildBackupEnv(backup),
                                Command: []string{"/backup.sh"},
                            }},
                        },
                    },
                },
            },
        },
    }
    
    if err := ctrl.SetControllerReference(backup, cronJob, r.Scheme); err != nil {
        return err
    }
    
    // Treat an existing CronJob as success so the function stays idempotent
    err := r.Create(ctx, cronJob)
    if errors.IsAlreadyExists(err) {
        return nil
    }
    return err
}

What makes this operator particularly useful is that it handles the complexity of different database types and storage backends behind a simple, consistent interface. Users don’t need to remember the specific flags for pg_dump or the AWS CLI syntax - they just specify what they want backed up and where.
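
Neither getBackupImage nor buildBackupEnv appears above, so here is a hedged sketch of both. The image names and environment variable names are purely illustrative - the assumption is that each backup image knows how to read its source and destination settings from the environment.

func (r *BackupReconciler) getBackupImage(sourceType string) string {
    // One purpose-built image per database type keeps the CronJob spec simple.
    images := map[string]string{
        "postgresql": "registry.example.com/backup-postgresql:latest",
        "mysql":      "registry.example.com/backup-mysql:latest",
        "mongodb":    "registry.example.com/backup-mongodb:latest",
    }
    return images[sourceType]
}

func (r *BackupReconciler) buildBackupEnv(backup *backupv1.Backup) []corev1.EnvVar {
    return []corev1.EnvVar{
        {Name: "BACKUP_SOURCE_TYPE", Value: backup.Spec.Source.Type},
        {Name: "BACKUP_DEST_TYPE", Value: backup.Spec.Destination.Type},
        {Name: "BACKUP_DEST_BUCKET", Value: backup.Spec.Destination.Bucket},
        {Name: "BACKUP_RETENTION", Value: backup.Spec.Retention},
    }
}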

Multi-Resource Coordination

Real applications rarely consist of a single component. Most production systems involve databases, caches, message queues, and multiple services that need to be deployed and configured together. This is where operators really shine - they can coordinate complex deployments that would be error-prone to manage manually.

Let me show you how to build an operator that manages a complete application stack:

func (r *AppStackReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    stack := &appv1.AppStack{}
    err := r.Get(ctx, req.NamespacedName, stack)
    if err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    
    // Deploy components in dependency order
    if err := r.ensureDatabase(ctx, stack); err != nil {
        return ctrl.Result{}, err
    }
    
    if err := r.ensureCache(ctx, stack); err != nil {
        return ctrl.Result{}, err
    }
    
    if err := r.ensureBackend(ctx, stack); err != nil {
        return ctrl.Result{}, err
    }
    
    if err := r.ensureFrontend(ctx, stack); err != nil {
        return ctrl.Result{}, err
    }
    
    return ctrl.Result{}, r.updateStackStatus(ctx, stack)
}

The key challenge in multi-resource coordination is handling dependencies correctly. You can't start the backend until the database is ready, and you shouldn't expose the frontend until the backend is healthy. The ensure functions above handle the ordering, but the operator also needs to wait for each component to report ready before moving on to the next - typically by checking status and requeuing, as sketched below.
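
A minimal sketch of that gating logic, assuming the database runs as a StatefulSet named after the stack (the name and fields here are illustrative):

func (r *AppStackReconciler) isDatabaseReady(ctx context.Context, stack *appv1.AppStack) (bool, error) {
    sts := &appsv1.StatefulSet{}
    key := client.ObjectKey{Name: stack.Name + "-db", Namespace: stack.Namespace}
    if err := r.Get(ctx, key, sts); err != nil {
        // A missing StatefulSet just means "not ready yet", not a failure.
        return false, client.IgnoreNotFound(err)
    }

    desired := int32(1)
    if sts.Spec.Replicas != nil {
        desired = *sts.Spec.Replicas
    }
    return sts.Status.ReadyReplicas == desired, nil
}

Inside Reconcile, the check sits between ensureDatabase and ensureBackend:

    ready, err := r.isDatabaseReady(ctx, stack)
    if err != nil {
        return ctrl.Result{}, err
    }
    if !ready {
        // Requeue instead of blocking the work queue while the database starts.
        return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    }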

Testing Your Operators

Testing operators requires a different approach than testing typical applications. You need to verify that your operator correctly manages Kubernetes resources and handles various failure scenarios.

I recommend starting with integration tests that run against a real Kubernetes cluster:

#!/bin/bash
set -euo pipefail

echo "Testing PostgreSQL Operator..."

kubectl apply -f - <<EOF
apiVersion: database.example.com/v1
kind: PostgreSQL
metadata:
  name: test-db
spec:
  version: "14"
  storage: "1Gi"
  replicas: 1
EOF

# Wait for database to be ready
kubectl wait --for=condition=Ready postgresql/test-db --timeout=300s

# Verify the database is accessible
kubectl exec test-db-0 -- psql -U postgres -c "SELECT version();"

echo "Database test passed!"

These tests give you confidence that your operator works correctly in real environments and help catch issues that unit tests might miss.
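
For the failure scenarios and edge cases that are awkward to script against a live cluster, controller-runtime's fake client lets you drive the reconciler directly from a Go test. The sketch below assumes a recent controller-runtime, the generated AddToScheme functions, and that ensureStatefulSet names the StatefulSet after the custom resource - adjust it to match your implementation.

func TestReconcileCreatesStatefulSet(t *testing.T) {
    scheme := runtime.NewScheme()
    _ = clientgoscheme.AddToScheme(scheme)
    _ = databasev1.AddToScheme(scheme)

    postgres := &databasev1.PostgreSQL{
        ObjectMeta: metav1.ObjectMeta{Name: "test-db", Namespace: "default"},
        Spec:       databasev1.PostgreSQLSpec{Version: "14", Storage: "1Gi", Replicas: 1},
    }

    fakeClient := fake.NewClientBuilder().
        WithScheme(scheme).
        WithObjects(postgres).
        WithStatusSubresource(&databasev1.PostgreSQL{}).
        Build()

    r := &PostgreSQLReconciler{Client: fakeClient, Scheme: scheme}
    _, err := r.Reconcile(context.Background(), ctrl.Request{
        NamespacedName: types.NamespacedName{Name: "test-db", Namespace: "default"},
    })
    if err != nil {
        t.Fatalf("reconcile failed: %v", err)
    }

    // The reconcile should have created the StatefulSet backing the database.
    sts := &appsv1.StatefulSet{}
    key := types.NamespacedName{Name: "test-db", Namespace: "default"}
    if err := fakeClient.Get(context.Background(), key, sts); err != nil {
        t.Fatalf("expected StatefulSet to be created: %v", err)
    }
}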

In Part 4, we’ll dive into advanced operator techniques like admission webhooks, performance optimization, and security considerations. You’ll learn how to build operators that are ready for production use at scale.