Practical Applications and Examples
Now that you understand the fundamentals, let’s build operators that solve real problems. I’ve found that the best way to learn operator development is by tackling scenarios you’ll actually encounter in production - database management, configuration handling, and backup automation.
Building a Database Operator
Database operators are among the most valuable because they handle complex lifecycle management that would otherwise require significant manual intervention. Let me walk you through building a PostgreSQL operator that manages not just the database itself, but also users, backups, and monitoring.
The first step is defining what users actually need from a database operator. In my experience, teams want to specify the database version, storage requirements, and backup policies without worrying about StatefulSets, persistent volumes, or backup scripts.
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresqls.database.example.com
spec:
  group: database.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              version:
                type: string
                enum: ["12", "13", "14", "15"]
              storage:
                type: string
                pattern: '^[0-9]+Gi$'
              replicas:
                type: integer
                minimum: 1
                maximum: 5
              backup:
                type: object
                properties:
                  enabled:
                    type: boolean
                  schedule:
                    type: string
                  retention:
                    type: string
  scope: Namespaced
  names:
    plural: postgresqls
    singular: postgresql
    kind: PostgreSQL
```
This CRD captures the essential configuration while hiding the complexity of Kubernetes primitives. Notice how we use validation to prevent common mistakes like invalid storage formats or too many replicas.
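To make the schema concrete, here is a resource a user might submit against it. The name and values are illustrative; any manifest within the enums, pattern, and replica bounds above would validate.

```yaml
apiVersion: database.example.com/v1
kind: PostgreSQL
metadata:
  name: orders-db
spec:
  version: "15"
  storage: "20Gi"
  replicas: 3
  backup:
    enabled: true
    schedule: "0 2 * * *"
    retention: "7d"
```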
The controller logic needs to handle the interdependencies between different resources. Databases require careful ordering - you can’t create users before the database is running, and you shouldn’t start backups until the data directory is properly initialized.
```go
func (r *PostgreSQLReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	postgres := &databasev1.PostgreSQL{}
	err := r.Get(ctx, req.NamespacedName, postgres)
	if err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Create resources in dependency order
	if err := r.ensureSecret(ctx, postgres); err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to create credentials: %w", err)
	}
	if err := r.ensureStatefulSet(ctx, postgres); err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to create database: %w", err)
	}
	if err := r.ensureService(ctx, postgres); err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to create service: %w", err)
	}

	// Only set up backups after the database is running
	if postgres.Spec.Backup.Enabled && r.isDatabaseReady(ctx, postgres) {
		if err := r.ensureBackupCronJob(ctx, postgres); err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to setup backups: %w", err)
		}
	}

	return ctrl.Result{RequeueAfter: time.Minute * 5}, r.updateStatus(ctx, postgres)
}
```
The key insight here is that each ensure function is idempotent - it checks the current state and only makes changes if necessary. This makes the operator resilient to failures and restarts.
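The idempotent ensure pattern can be sketched without any Kubernetes machinery. Here, a plain map stands in for the API server (a hypothetical stand-in, not the real client), so the create/update/no-op decision is easy to see:

```go
package main

import "fmt"

// fakeStore stands in for the Kubernetes API: it maps object names to
// their current spec. In a real controller this would be r.Get/r.Create.
type fakeStore map[string]string

// ensure is idempotent: it creates the object if missing, updates it only
// when the desired spec differs, and otherwise does nothing.
func ensure(store fakeStore, name, desired string) string {
	current, exists := store[name]
	switch {
	case !exists:
		store[name] = desired
		return "created"
	case current != desired:
		store[name] = desired
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	store := fakeStore{}
	fmt.Println(ensure(store, "credentials", "v1")) // created
	fmt.Println(ensure(store, "credentials", "v1")) // unchanged: safe to re-run
	fmt.Println(ensure(store, "credentials", "v2")) // updated
}
```

Because a second call with the same desired state is a no-op, the reconciler can be retried after any crash or requeue without side effects.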
Configuration Management Patterns
One of the most common operator use cases I encounter is managing application configuration across different environments. Teams often struggle with keeping configuration in sync between development, staging, and production while maintaining security boundaries.
Let’s build a configuration operator that can template values based on the environment and automatically update applications when configuration changes.
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: appconfigs.config.example.com
spec:
  group: config.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              application:
                type: string
              environment:
                type: string
                enum: ["dev", "staging", "prod"]
              config:
                type: object
                additionalProperties:
                  type: string
  scope: Namespaced
  names:
    plural: appconfigs
    singular: appconfig
    kind: AppConfig
```
The controller for this operator demonstrates an important pattern - detecting changes and triggering updates in dependent resources. When configuration changes, applications need to be restarted to pick up the new values.
```go
func (r *AppConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	appConfig := &configv1.AppConfig{}
	if err := r.Get(ctx, req.NamespacedName, appConfig); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	configMap := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("%s-config", appConfig.Spec.Application),
			Namespace: appConfig.Namespace,
		},
		Data: r.processConfigTemplate(appConfig),
	}
	if err := ctrl.SetControllerReference(appConfig, configMap, r.Scheme); err != nil {
		return ctrl.Result{}, err
	}

	// Check whether the ConfigMap needs creating or updating
	existing := &corev1.ConfigMap{}
	err := r.Get(ctx, client.ObjectKeyFromObject(configMap), existing)
	if errors.IsNotFound(err) {
		return ctrl.Result{}, r.Create(ctx, configMap)
	}
	if err != nil {
		return ctrl.Result{}, err
	}

	if !reflect.DeepEqual(existing.Data, configMap.Data) {
		existing.Data = configMap.Data
		if err := r.Update(ctx, existing); err != nil {
			return ctrl.Result{}, err
		}
		// Trigger a rolling update of deployments using this config
		return ctrl.Result{}, r.triggerDeploymentUpdate(ctx, appConfig)
	}
	return ctrl.Result{}, nil
}
```

Note that unexpected Get errors are returned rather than ignored: the original version fell through to the comparison with an empty ConfigMap on transient API errors, which would have clobbered live data.
This pattern of watching for changes and cascading updates is incredibly powerful. It means your applications automatically stay in sync with their configuration without manual intervention.
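The controller above leaves processConfigTemplate abstract. One plausible shape for it - an assumption, since the real method would read both maps from the AppConfig spec - is to layer per-environment overrides on top of base values:

```go
package main

import "fmt"

// processConfigTemplate layers environment-specific overrides on top of
// base values. The overrides map is hypothetical; a real controller would
// load it from the AppConfig spec or a shared source of truth.
func processConfigTemplate(base map[string]string, env string) map[string]string {
	overrides := map[string]string{
		"prod/log_level": "warn",
		"dev/log_level":  "debug",
		"prod/replicas":  "3",
	}
	out := make(map[string]string, len(base))
	for k, v := range base {
		if o, ok := overrides[env+"/"+k]; ok {
			v = o
		}
		out[k] = v
	}
	return out
}

func main() {
	base := map[string]string{"log_level": "info", "replicas": "1"}
	fmt.Println(processConfigTemplate(base, "prod")["log_level"]) // warn
}
```

Keeping the merge pure (inputs in, map out) makes this logic trivial to unit test, independent of any cluster.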
Backup and Restore Automation
Backup operators solve one of the most critical operational challenges - ensuring data is safely backed up and can be restored when needed. I’ve seen too many teams lose data because backup scripts failed silently or weren’t tested properly.
Here’s how to build a backup operator that handles multiple database types and storage backends:
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.backup.example.com
spec:
  group: backup.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              source:
                type: object
                properties:
                  type:
                    type: string
                    enum: ["postgresql", "mysql", "mongodb"]
                  connection:
                    type: object
              destination:
                type: object
                properties:
                  type:
                    type: string
                    enum: ["s3", "gcs", "azure"]
                  bucket:
                    type: string
              schedule:
                type: string
              retention:
                type: string
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
```
The backup controller creates CronJobs that run backup scripts on a schedule. The beauty of this approach is that it leverages Kubernetes’ built-in job scheduling while providing a higher-level abstraction for backup management.
```go
func (r *BackupReconciler) ensureCronJob(ctx context.Context, backup *backupv1.Backup) error {
	cronJob := &batchv1.CronJob{
		ObjectMeta: metav1.ObjectMeta{
			Name:      backup.Name + "-cronjob",
			Namespace: backup.Namespace,
		},
		Spec: batchv1.CronJobSpec{
			Schedule: backup.Spec.Schedule,
			JobTemplate: batchv1.JobTemplateSpec{
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							RestartPolicy: corev1.RestartPolicyOnFailure,
							Containers: []corev1.Container{{
								Name:    "backup",
								Image:   r.getBackupImage(backup.Spec.Source.Type),
								Env:     r.buildBackupEnv(backup),
								Command: []string{"/backup.sh"},
							}},
						},
					},
				},
			},
		},
	}
	if err := ctrl.SetControllerReference(backup, cronJob, r.Scheme); err != nil {
		return err
	}
	// Tolerate re-runs: an already-existing CronJob is not an error
	if err := r.Create(ctx, cronJob); err != nil && !errors.IsAlreadyExists(err) {
		return err
	}
	return nil
}
```
What makes this operator particularly useful is that it handles the complexity of different database types and storage backends behind a simple, consistent interface. Users don't need to remember the specific flags for pg_dump or the AWS CLI syntax - they just specify what they want backed up and where.
Multi-Resource Coordination
Real applications rarely consist of a single component. Most production systems involve databases, caches, message queues, and multiple services that need to be deployed and configured together. This is where operators really shine - they can coordinate complex deployments that would be error-prone to manage manually.
Let me show you how to build an operator that manages a complete application stack:
```go
func (r *AppStackReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	stack := &appv1.AppStack{}
	err := r.Get(ctx, req.NamespacedName, stack)
	if err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Deploy components in dependency order
	if err := r.ensureDatabase(ctx, stack); err != nil {
		return ctrl.Result{}, err
	}
	if err := r.ensureCache(ctx, stack); err != nil {
		return ctrl.Result{}, err
	}
	if err := r.ensureBackend(ctx, stack); err != nil {
		return ctrl.Result{}, err
	}
	if err := r.ensureFrontend(ctx, stack); err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, r.updateStackStatus(ctx, stack)
}
```
The key challenge in multi-resource coordination is handling dependencies correctly. You can’t start the backend until the database is ready, and you shouldn’t expose the frontend until the backend is healthy. The operator handles these dependencies automatically, waiting for each component to be ready before proceeding to the next.
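The wait-and-requeue behavior described here can be reduced to a small pure function. This sketch (with a map of readiness flags standing in for real health checks) walks the components in order and reports which one the next reconcile should wait on:

```go
package main

import "fmt"

// deployInOrder starts components in dependency order and stops at the
// first one that is not yet ready, mirroring how the reconciler requeues
// until each layer is healthy. The ready map stands in for real checks.
func deployInOrder(components []string, ready map[string]bool) (started []string, waitingOn string) {
	for _, c := range components {
		started = append(started, c)
		if !ready[c] {
			return started, c // requeue and retry from here next reconcile
		}
	}
	return started, ""
}

func main() {
	order := []string{"database", "cache", "backend", "frontend"}
	ready := map[string]bool{"database": true, "cache": true}
	started, waiting := deployInOrder(order, ready)
	fmt.Println(started, waiting) // [database cache backend] backend
}
```

Each reconcile pass makes whatever progress it can and then requeues; once the backend reports ready, the next pass proceeds to the frontend.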
Testing Your Operators
Testing operators requires a different approach than testing typical applications. You need to verify that your operator correctly manages Kubernetes resources and handles various failure scenarios.
I recommend starting with integration tests that run against a real Kubernetes cluster:
```bash
#!/bin/bash
set -e  # fail the test if any step fails, rather than reporting a false pass
echo "Testing PostgreSQL Operator..."

kubectl apply -f - <<EOF
apiVersion: database.example.com/v1
kind: PostgreSQL
metadata:
  name: test-db
spec:
  version: "14"
  storage: "1Gi"
  replicas: 1
EOF

# Wait for the database to report ready
kubectl wait --for=condition=Ready postgresql/test-db --timeout=300s

# Verify the database is accessible
kubectl exec test-db-0 -- psql -U postgres -c "SELECT version();"
echo "Database test passed!"
```
These tests give you confidence that your operator works correctly in real environments and help catch issues that unit tests might miss.
In Part 4, we’ll dive into advanced operator techniques like admission webhooks, performance optimization, and security considerations. You’ll learn how to build operators that are ready for production use at scale.