Core Concepts and Fundamentals
After building your first CRD in Part 1, you might be wondering how the magic actually happens - how does Kubernetes know what to do when you create a WebApp resource? The answer lies in understanding the reconciliation loop, which I consider the heart of any well-designed operator.
The Reconciliation Loop Explained
I’ve worked with many developers who initially think of controllers as event-driven systems, but that’s not quite right. Controllers are level-triggered, not edge-triggered. This means they don’t just react to changes - they continuously ensure the desired state matches reality.
The reconciliation loop follows a simple but powerful pattern. First, it observes the current state of resources in your cluster. Then it compares this with the desired state defined in your custom resources. Finally, it takes action to reconcile any differences. This happens continuously, which means your operator can recover from failures, handle manual changes, and maintain consistency even when things go wrong.
Here’s what a basic reconciliation function looks like in practice:
func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    webapp := &examplev1.WebApp{}
    err := r.Get(ctx, req.NamespacedName, webapp)
    if err != nil {
        if errors.IsNotFound(err) {
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }
    return r.reconcileWebApp(ctx, webapp)
}
This function gets called whenever something changes - whether that’s a user updating the WebApp resource, a deployment failing, or even when the operator restarts. The beauty is that the same logic handles all these scenarios.
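What actually gets the function called is how the controller is wired into the manager. The SetupWithManager method we invoke later in main does that wiring; here's a minimal sketch, assuming this operator owns a Deployment and a Service per WebApp (adjust the Owns() calls to whatever your controller actually creates):
// SetupWithManager registers the controller with the manager. For() watches the
// WebApp resources themselves; Owns() re-queues the owning WebApp whenever a
// Deployment or Service created by this controller changes or disappears.
func (r *WebAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&examplev1.WebApp{}).
        Owns(&appsv1.Deployment{}).
        Owns(&corev1.Service{}).
        Complete(r)
}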
Building a Production-Ready Controller
Let me show you how to structure a controller that can handle real-world complexity. In my experience, the biggest mistake developers make is trying to do everything in the main Reconcile function. Instead, break it down into focused, testable functions.
func (r *WebAppReconciler) reconcileWebApp(ctx context.Context, webapp *examplev1.WebApp) (ctrl.Result, error) {
    // Handle deletion first
    if !webapp.ObjectMeta.DeletionTimestamp.IsZero() {
        return r.handleDeletion(ctx, webapp)
    }
    // Ensure deployment exists and is correct
    if err := r.ensureDeployment(ctx, webapp); err != nil {
        return ctrl.Result{}, err
    }
    // Ensure service exists
    if err := r.ensureService(ctx, webapp); err != nil {
        return ctrl.Result{}, err
    }
    // Update status
    return ctrl.Result{}, r.updateStatus(ctx, webapp)
}
This structure makes it clear what the controller is responsible for and makes each piece independently testable. The ensure pattern is particularly powerful - these functions check whether a resource exists and create or update it as needed.
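To make the pattern concrete, here is a minimal ensureDeployment sketch. It leans on the createDeployment helper shown in the next section, and the drift check is an assumption - it only compares the fields this operator owns rather than doing a full diff:
func (r *WebAppReconciler) ensureDeployment(ctx context.Context, webapp *examplev1.WebApp) error {
    desired := r.createDeployment(webapp)
    existing := &appsv1.Deployment{}
    err := r.Get(ctx, types.NamespacedName{Name: desired.Name, Namespace: desired.Namespace}, existing)
    if errors.IsNotFound(err) {
        // Nothing there yet: create the deployment from scratch.
        return r.Create(ctx, desired)
    }
    if err != nil {
        return err
    }
    // Update only when the replica count or image has drifted from the spec.
    if existing.Spec.Replicas == nil ||
        *existing.Spec.Replicas != webapp.Spec.Replicas ||
        existing.Spec.Template.Spec.Containers[0].Image != webapp.Spec.Image {
        existing.Spec = desired.Spec
        return r.Update(ctx, existing)
    }
    return nil
}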
Managing Resource Ownership
One of the trickiest aspects of operator development is managing the lifecycle of resources your operator creates. Kubernetes provides owner references to establish parent-child relationships between resources, which enables automatic garbage collection.
func (r *WebAppReconciler) createDeployment(webapp *examplev1.WebApp) *appsv1.Deployment {
    deployment := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      webapp.Name,
            Namespace: webapp.Namespace,
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &webapp.Spec.Replicas,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{"app": webapp.Name},
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: map[string]string{"app": webapp.Name},
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{
                        Name:  "webapp",
                        Image: webapp.Spec.Image,
                        Ports: []corev1.ContainerPort{{
                            ContainerPort: int32(webapp.Spec.Port),
                        }},
                    }},
                },
            },
        },
    }
    // This is the crucial part - establishing ownership
    ctrl.SetControllerReference(webapp, deployment, r.Scheme)
    return deployment
}
The SetControllerReference call creates an owner reference from the deployment back to the WebApp resource. This means when you delete the WebApp, Kubernetes automatically cleans up the deployment. It also prevents other controllers from accidentally managing resources they don't own.
Handling Complex Lifecycle Events
Sometimes you need more control over cleanup than automatic garbage collection provides. That’s where finalizers come in. I use them when the operator needs to clean up external resources like databases or cloud infrastructure.
func (r *WebAppReconciler) handleDeletion(ctx context.Context, webapp *examplev1.WebApp) (ctrl.Result, error) {
    finalizerName := "webapp.example.com/finalizer"
    if controllerutil.ContainsFinalizer(webapp, finalizerName) {
        // Perform cleanup operations
        if err := r.cleanupExternalResources(ctx, webapp); err != nil {
            return ctrl.Result{}, err
        }
        // Remove finalizer to allow deletion
        controllerutil.RemoveFinalizer(webapp, finalizerName)
        return ctrl.Result{}, r.Update(ctx, webapp)
    }
    return ctrl.Result{}, nil
}
The finalizer pattern ensures that your cleanup code runs before Kubernetes deletes the resource. This is essential when your operator manages resources outside the cluster or needs to perform graceful shutdown procedures.
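One thing the snippet above assumes is that the finalizer is already on the object. It has to be added during normal reconciliation, before deletion is ever requested. A minimal sketch you could call near the top of reconcileWebApp, using the same finalizer name as above:
func (r *WebAppReconciler) ensureFinalizer(ctx context.Context, webapp *examplev1.WebApp) error {
    finalizerName := "webapp.example.com/finalizer"
    // Only write to the API server when the finalizer is actually missing.
    if !controllerutil.ContainsFinalizer(webapp, finalizerName) {
        controllerutil.AddFinalizer(webapp, finalizerName)
        return r.Update(ctx, webapp)
    }
    return nil
}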
Status Management and User Feedback
Users need to understand what’s happening with their resources, which is why proper status management is crucial. I always include both high-level phase information and detailed conditions that help with troubleshooting.
func (r *WebAppReconciler) updateStatus(ctx context.Context, webapp *examplev1.WebApp) error {
    // Get current deployment status
    deployment := &appsv1.Deployment{}
    err := r.Get(ctx, types.NamespacedName{
        Name:      webapp.Name,
        Namespace: webapp.Namespace,
    }, deployment)
    if err != nil {
        webapp.Status.Phase = "Pending"
        return r.Status().Update(ctx, webapp)
    }
    // Update based on deployment readiness
    if deployment.Status.ReadyReplicas == webapp.Spec.Replicas {
        webapp.Status.Phase = "Running"
        webapp.Status.ReadyReplicas = deployment.Status.ReadyReplicas
    } else {
        webapp.Status.Phase = "Deploying"
    }
    return r.Status().Update(ctx, webapp)
}
Notice how we use a separate status update call. Kubernetes treats the status subresource separately from the spec: status writes don't increment the object's generation, and they can't accidentally clobber the spec. Combined with generation-based event filtering on your watches, this keeps the controller's own status updates from triggering endless reconcile loops.
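Conditions deserve the same treatment. Assuming the WebApp status also carries a Conditions field of type []metav1.Condition, a small helper built on the apimachinery condition utilities might look like this:
// meta here is k8s.io/apimachinery/pkg/api/meta. SetStatusCondition inserts or
// updates the condition and only bumps LastTransitionTime when the status flips.
func setAvailableCondition(webapp *examplev1.WebApp, ready bool, reason, message string) {
    status := metav1.ConditionFalse
    if ready {
        status = metav1.ConditionTrue
    }
    meta.SetStatusCondition(&webapp.Status.Conditions, metav1.Condition{
        Type:    "Available",
        Status:  status,
        Reason:  reason,
        Message: message,
    })
}
Call it from updateStatus before the Status().Update so the phase and the condition land in the same write.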
Controller Manager Setup
To tie everything together, you need a controller manager that handles the infrastructure concerns like leader election, metrics, and health checks. Here’s the minimal setup that I use in production:
func main() {
    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        Scheme:             scheme,
        MetricsBindAddress: ":8080",
        Port:               9443,
        LeaderElection:     true,
        LeaderElectionID:   "webapp-operator-lock",
    })
    if err != nil {
        setupLog.Error(err, "unable to start manager")
        os.Exit(1)
    }
    if err := (&WebAppReconciler{
        Client: mgr.GetClient(),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr); err != nil {
        setupLog.Error(err, "unable to create controller")
        os.Exit(1)
    }
    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        setupLog.Error(err, "problem running manager")
        os.Exit(1)
    }
}
The leader election feature ensures that only one instance of your operator is actively reconciling resources at a time, which prevents conflicts when running multiple replicas for high availability.
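The manager can also serve the liveness and readiness probes mentioned earlier. A small sketch, assuming you add HealthProbeBindAddress (for example ":8081") to the options and place these calls before mgr.Start:
// healthz.Ping is a trivial always-healthy checker from
// sigs.k8s.io/controller-runtime/pkg/healthz.
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
    setupLog.Error(err, "unable to set up health check")
    os.Exit(1)
}
if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
    setupLog.Error(err, "unable to set up ready check")
    os.Exit(1)
}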
Testing Your Controller Logic
I can’t stress enough how important it is to test your reconciliation logic thoroughly. The controller-runtime framework provides excellent testing utilities that let you exercise your reconciler against an in-memory fake client instead of a real cluster.
func TestWebAppReconciler(t *testing.T) {
    scheme := runtime.NewScheme()
    _ = examplev1.AddToScheme(scheme)
    _ = appsv1.AddToScheme(scheme)
    _ = corev1.AddToScheme(scheme) // the reconciler also manages Services
    webapp := &examplev1.WebApp{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "test-webapp",
            Namespace: "default",
        },
        Spec: examplev1.WebAppSpec{
            Replicas: 3,
            Image:    "nginx:1.21",
            Port:     80,
        },
    }
    client := fake.NewClientBuilder().WithScheme(scheme).WithObjects(webapp).Build()
    reconciler := &WebAppReconciler{Client: client, Scheme: scheme}
    _, err := reconciler.Reconcile(context.TODO(), reconcile.Request{
        NamespacedName: types.NamespacedName{Name: "test-webapp", Namespace: "default"},
    })
    assert.NoError(t, err)
}
This type of testing catches logic errors early and gives you confidence that your operator will behave correctly in production.
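You can push the same test a bit further by asserting on what actually ended up in the fake client. Assuming ensureDeployment behaves like the sketch from earlier, appending something like this inside the test verifies that the created Deployment mirrors the WebApp spec:
// Fetch the Deployment the reconciler should have created.
deployment := &appsv1.Deployment{}
err = client.Get(context.TODO(), types.NamespacedName{Name: "test-webapp", Namespace: "default"}, deployment)
assert.NoError(t, err)
assert.Equal(t, int32(3), *deployment.Spec.Replicas)
assert.Equal(t, "nginx:1.21", deployment.Spec.Template.Spec.Containers[0].Image)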
In Part 3, we’ll put these concepts into practice by building operators for real-world scenarios like database management, backup automation, and configuration management. You’ll see how these fundamental patterns scale to handle complex, multi-resource applications.