Kubernetes Configuration Management
Master Kubernetes configuration with ConfigMaps.
Introduction and Setup
Configuration management in Kubernetes nearly broke me when I first started trying to use it. I spent three days debugging why my application couldn’t connect to the database, only to discover I’d misspelled “postgres” as “postgress” in a ConfigMap. That typo taught me more about Kubernetes configuration than any documentation ever could.
The frustrating truth about Kubernetes configuration is that it looks simple until you need it to work reliably across environments. ConfigMaps and Secrets seem straightforward, but managing configuration at scale requires patterns that aren’t obvious from the basic examples.
Why Configuration Management Matters
I’ve seen production outages caused by configuration mistakes more often than code bugs. A missing environment variable, an incorrect database URL, or a malformed JSON config can bring down entire services. The challenge isn’t just storing configuration - it’s managing it safely across development, staging, and production environments.
The key insight I’ve learned: treat configuration as code. Version it, test it, and deploy it with the same rigor you apply to application code.
ConfigMaps: The Foundation
ConfigMaps store non-sensitive configuration data as key-value pairs. I use them for application settings, feature flags, and any configuration that doesn’t contain secrets.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  database.host: "postgres.example.com"
  database.port: "5432"
  log.level: "info"
  feature.new_ui: "true"
The beauty of ConfigMaps is their flexibility. You can store simple key-value pairs, entire configuration files, or structured data like JSON or YAML.
Creating ConfigMaps from files saves time during development:
# Create from a properties file
kubectl create configmap app-config --from-file=application.properties
# Create from multiple files
kubectl create configmap nginx-config --from-file=nginx.conf --from-file=mime.types
I keep configuration files in my application repository and create ConfigMaps as part of the deployment process. This ensures configuration changes go through the same review process as code changes.
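One way to keep that review loop, sketched below, is to generate the manifest with a client-side dry run and commit the output instead of creating the ConfigMap directly; the file and path names here are carried over from the example above.
# Generate a manifest for review instead of creating the ConfigMap directly
kubectl create configmap app-config \
  --from-file=application.properties \
  --dry-run=client -o yaml > k8s/app-config.yaml
# Apply it later as part of the normal deployment
kubectl apply -f k8s/app-config.yaml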
Secrets: Handling Sensitive Data
Secrets manage sensitive information like passwords, API keys, and certificates. They're similar to ConfigMaps, but Kubernetes handles them differently: values are stored base64-encoded (which is encoding, not encryption), access can be restricted with RBAC, and they can be encrypted at rest if the cluster is configured for it.
apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
  namespace: production
type: Opaque
data:
  username: cG9zdGdyZXM= # postgres (base64 encoded)
  password: c3VwZXJzZWNyZXQ= # supersecret (base64 encoded)
Creating Secrets from the command line is often easier than managing base64 encoding manually:
kubectl create secret generic database-credentials \
--from-literal=username=postgres \
--from-literal=password=supersecret
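kubectl stores the values base64 encoded either way; to double-check what actually landed in the cluster, I decode them on the way out:
# Inspect and decode a stored Secret value
kubectl get secret database-credentials -o jsonpath='{.data.username}' | base64 -d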
For TLS certificates, Kubernetes provides a specific Secret type:
apiVersion: v1
kind: Secret
metadata:
  name: tls-secret
type: kubernetes.io/tls
data:
  tls.crt: LS0tLS1CRUdJTi... # base64 encoded certificate
  tls.key: LS0tLS1CRUdJTi... # base64 encoded private key
Using Configuration in Pods
Configuration becomes useful when you inject it into your applications. Kubernetes provides two main injection methods, environment variables and volume mounts, and init containers are handy for validating configuration before the main container starts.
Environment variables work well for simple configuration:
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app
    image: myapp:latest
    env:
    - name: DATABASE_HOST
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: database.host
    - name: DATABASE_PASSWORD
      valueFrom:
        secretKeyRef:
          name: database-credentials
          key: password
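Listing keys one by one gets tedious when a ConfigMap holds many settings. envFrom loads every key as an environment variable in one shot; a minimal sketch reusing the resources above (keys like database.host don't make convenient variable names, so this works best when keys are already UPPER_SNAKE_CASE):
apiVersion: v1
kind: Pod
metadata:
  name: app-pod-envfrom
spec:
  containers:
  - name: app
    image: myapp:latest
    envFrom:
    - configMapRef:
        name: app-config
    - secretRef:
        name: database-credentials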
Volume mounts work better for configuration files:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx:latest
    volumeMounts:
    - name: config-volume
      mountPath: /etc/nginx/nginx.conf
      subPath: nginx.conf
  volumes:
  - name: config-volume
    configMap:
      name: nginx-config
The subPath field lets you mount individual files instead of entire directories, which prevents overwriting existing files in the container. The tradeoff is that subPath mounts don't receive updates when the ConfigMap changes.
Environment-Specific Configuration
Managing configuration across environments is where things get complex. I’ve learned to use consistent naming patterns and separate ConfigMaps for each environment.
# Development environment
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-dev
  namespace: development
data:
  database.host: "postgres-dev.internal"
  log.level: "debug"
  feature.new_ui: "true"
---
# Production environment
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-prod
  namespace: production
data:
  database.host: "postgres-prod.internal"
  log.level: "warn"
  feature.new_ui: "false"
I use deployment templates that reference the appropriate ConfigMap for each environment. This ensures consistency while allowing environment-specific overrides.
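As a sketch of what that template looks like, only the ConfigMap name changes between environments; the ENVIRONMENT placeholder below is rendered per environment (for example with envsubst, which the templating section covers in more detail):
# deployment-template.yaml: identical everywhere except the ConfigMap name
    spec:
      containers:
      - name: app
        image: myapp:latest
        envFrom:
        - configMapRef:
            name: app-config-${ENVIRONMENT}   # becomes app-config-dev, app-config-prod, ...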
Configuration Validation
Testing configuration before deployment prevents runtime failures. I create init containers that validate configuration and fail fast if something’s wrong:
apiVersion: v1
kind: Pod
metadata:
name: app-with-validation
spec:
initContainers:
- name: config-validator
image: busybox
command: ['sh', '-c']
args:
- |
echo "Testing configuration..."
# Test environment variables
if [ -z "$DATABASE_HOST" ]; then
echo "ERROR: DATABASE_HOST not set"
exit 1
fi
# Test configuration files
if [ ! -f /config/app.yaml ]; then
echo "ERROR: app.yaml not found"
exit 1
fi
echo "Configuration validation passed"
env:
- name: DATABASE_HOST
valueFrom:
configMapKeyRef:
name: app-config
key: database.host
volumeMounts:
- name: config-volume
mountPath: /config
containers:
- name: app
image: myapp:latest
# ... rest of container spec
This validation catches configuration errors before the main application starts, making debugging much easier.
Common Pitfalls
I’ve made every configuration mistake possible. Here are the ones that hurt the most:
Case sensitivity matters. Kubernetes resource names and ConfigMap keys are case-sensitive, and so are environment variables inside Linux containers. I've spent hours debugging why DATABASE_HOST worked locally but database_host failed in Kubernetes.
Namespace isolation. ConfigMaps and Secrets are namespaced resources. A ConfigMap in the development namespace isn't accessible from pods in the production namespace. This seems obvious but catches everyone at least once.
Base64 encoding confusion. Secret values must be base64 encoded in YAML files, but kubectl create secret handles encoding automatically. Mixing these approaches leads to double-encoding errors.
Immutable updates. Changing a ConfigMap doesn't automatically restart pods that use it. Environment variables sourced from a ConfigMap never pick up the change, and mounted files only refresh after a delay (never through subPath mounts). You need to restart pods manually or use deployment strategies that trigger restarts when configuration changes; a minimal example follows.
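The simplest workaround is an explicit rollout restart after applying the updated ConfigMap (app-config.yaml and myapp are stand-ins for your own manifest and Deployment names):
# Apply the new configuration, then roll the Deployment so pods pick it up
kubectl apply -f app-config.yaml
kubectl rollout restart deployment/myapp
kubectl rollout status deployment/myapp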
Development Workflow
I’ve developed a workflow that makes configuration management less error-prone:
- Keep configuration in version control alongside application code
- Use consistent naming patterns across environments
- Validate configuration before deployment (see the dry-run example after this list)
- Test configuration changes in development first
- Monitor applications after configuration updates
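For the validation step, a server-side dry run catches schema and admission errors without changing anything in the cluster (config/ here is an assumed directory of manifests):
# Validate manifests against the live API server without persisting them
kubectl apply --dry-run=server -f config/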
This workflow has saved me from countless production issues and makes configuration changes feel as safe as code deployments.
Configuration management in Kubernetes requires discipline, but the patterns in this guide will help you avoid the mistakes that make it frustrating. The key is treating configuration with the same care you give to application code.
Next, we’ll explore advanced ConfigMap and Secret patterns that make configuration management scalable and maintainable in production environments.
Core Concepts and Fundamentals
After managing configuration for dozens of Kubernetes applications, I’ve learned that the basic ConfigMap and Secret examples only get you so far. Real applications need structured configuration, templating, and dynamic updates. The patterns I’ll share here come from years of debugging configuration issues at 3 AM.
The biggest lesson I’ve learned: configuration complexity grows exponentially with the number of services and environments. What works for a single application breaks down when you’re managing configuration for 50 microservices across multiple environments.
Structured Configuration Patterns
Simple key-value pairs work for basic settings, but complex applications need hierarchical configuration. I structure ConfigMaps to mirror how applications actually consume configuration:
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config-structured
namespace: production
data:
database.yaml: |
primary:
host: postgres-primary.example.com
port: 5432
pool_size: 20
timeout: 30s
replica:
host: postgres-replica.example.com
port: 5432
pool_size: 10
timeout: 30s
logging.yaml: |
level: info
format: json
outputs:
- console
- file:/var/log/app.log
features.yaml: |
new_ui: true
beta_features: false
rate_limiting: 1000
This approach lets me manage related configuration together while keeping it organized. Applications can mount these as files and parse them with their preferred configuration libraries.
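Mounted as a volume, each key in that ConfigMap becomes its own file under the mount path (/etc/config/database.yaml, /etc/config/logging.yaml, and so on); a minimal pod under that assumption:
apiVersion: v1
kind: Pod
metadata:
  name: app-structured-config
spec:
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: config
      mountPath: /etc/config
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: app-config-structured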
Configuration Templating
Hard-coding values in ConfigMaps becomes unmaintainable across environments. I use templating to generate environment-specific configuration from shared templates.
Here’s a template I use for database configuration:
# config-template.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-${Environment}
  namespace: ${Namespace}
data:
  database.yaml: |
    host: postgres-${Environment}.internal
    port: 5432
    database: myapp_${Environment}
    ssl: ${DatabaseSSL}
    pool_size: ${DatabasePoolSize}
  app.yaml: |
    environment: ${Environment}
    debug: ${DebugMode}
    log_level: ${LogLevel}
I process this template with envsubst, which substitutes shell-style ${VAR} placeholders from exported environment variables; each environment simply exports different values:
# Development values
export Environment=dev
export Namespace=development
export DatabaseSSL=false
export DatabasePoolSize=5
export DebugMode=true
export LogLevel=debug
# Generate development ConfigMap
envsubst < config-template.yaml > config-dev.yaml
This templating approach eliminates configuration drift between environments while allowing necessary differences.
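Before applying a rendered file, kubectl diff shows exactly what would change in the cluster, which I treat as a final review step:
# Review the rendered ConfigMap against what's currently deployed
kubectl diff -f config-dev.yaml
kubectl apply -f config-dev.yaml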
Advanced Secret Management
Basic Secrets work for simple use cases, but production applications need more sophisticated secret management. I’ve learned to integrate external secret management systems with Kubernetes.
External Secrets Operator integration:
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: vault-backend
namespace: production
spec:
provider:
vault:
server: "https://vault.company.com"
path: "secret"
version: "v2"
auth:
kubernetes:
mountPath: "kubernetes"
role: "myapp-role"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: database-credentials
namespace: production
spec:
refreshInterval: 15s
secretStoreRef:
name: vault-backend
kind: SecretStore
target:
name: database-secret
creationPolicy: Owner
data:
- secretKey: username
remoteRef:
key: database/production
property: username
- secretKey: password
remoteRef:
key: database/production
property: password
This setup automatically syncs secrets from Vault to Kubernetes, eliminating the need to manage secret values manually in the cluster.
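Once the operator is running, the ExternalSecret's status is the first thing I check when credentials look stale:
# Check sync status of the ExternalSecret and the Secret it manages
kubectl get externalsecret database-credentials -n production
kubectl describe externalsecret database-credentials -n production
kubectl get secret database-secret -n production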
Dynamic Configuration Updates
One of the most frustrating aspects of Kubernetes configuration is that changing a ConfigMap doesn’t automatically update running pods. I’ve developed patterns to handle dynamic updates gracefully.
For applications that can reload configuration, I use a sidecar container that watches for changes:
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-with-config-reload
spec:
template:
spec:
containers:
- name: app
image: myapp:latest
volumeMounts:
- name: config
mountPath: /etc/config
- name: config-reloader
image: jimmidyson/configmap-reload:latest
args:
- --volume-dir=/etc/config
- --webhook-url=http://localhost:8080/reload
volumeMounts:
- name: config
mountPath: /etc/config
readOnly: true
volumes:
- name: config
configMap:
name: app-config
The config-reloader sidecar watches for file changes and triggers a reload webhook in the main application. This pattern works well for applications that support graceful configuration reloading.
For applications that can’t reload configuration, I use deployment annotations to force pod restarts when configuration changes:
# Update ConfigMap and trigger deployment restart
kubectl patch configmap app-config --patch '{"data":{"new.setting":"value"}}'
kubectl patch deployment myapp -p \
'{"spec":{"template":{"metadata":{"annotations":{"configHash":"'$(date +%s)'"}}}}}'
Configuration Validation Patterns
I’ve learned to validate configuration at multiple levels to catch errors early. Here’s a comprehensive validation approach I use:
apiVersion: v1
kind: ConfigMap
metadata:
name: config-validator
data:
validate.sh: |
#!/bin/bash
set -e
echo "Validating configuration..."
# Validate required environment variables
required_vars=("DATABASE_HOST" "DATABASE_PORT" "API_KEY")
for var in "${required_vars[@]}"; do
if [ -z "${!var}" ]; then
echo "ERROR: Required variable $var is not set"
exit 1
fi
done
# Validate configuration file syntax
if [ -f /config/app.yaml ]; then
python -c "import yaml; yaml.safe_load(open('/config/app.yaml'))" || {
echo "ERROR: Invalid YAML in app.yaml"
exit 1
}
fi
# Validate database connectivity
if command -v nc >/dev/null; then
nc -z "$DATABASE_HOST" "$DATABASE_PORT" || {
echo "ERROR: Cannot connect to database"
exit 1
}
fi
echo "Configuration validation passed"
I run this validation in init containers to ensure configuration is correct before starting the main application.
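Wiring that up means mounting the config-validator ConfigMap into an init container and running the script from it; a sketch of the relevant pod spec fragment, assuming the app image ships bash, python, and nc (swap in a small toolbox image if it doesn't):
  initContainers:
  - name: validate-config
    image: myapp:latest            # assumes bash, python, and nc are available
    command: ["bash", "/scripts/validate.sh"]
    # plus the same env: entries the main container uses (DATABASE_HOST, etc.)
    volumeMounts:
    - name: validator
      mountPath: /scripts
    - name: config
      mountPath: /config
  volumes:
  - name: validator
    configMap:
      name: config-validator
      defaultMode: 0755
  - name: config
    configMap:
      name: app-config-structured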
Multi-Environment Configuration Strategy
Managing configuration across development, staging, and production environments requires a systematic approach. I use a layered configuration strategy:
Base configuration (shared across all environments):
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config-base
data:
app.yaml: |
service_name: myapp
port: 8080
metrics_port: 9090
health_check_path: /health
Environment-specific overlays:
# Development overlay
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config-dev
data:
app.yaml: |
debug: true
log_level: debug
database_pool_size: 5
---
# Production overlay
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config-prod
data:
app.yaml: |
debug: false
log_level: warn
database_pool_size: 20
Applications merge base configuration with environment-specific overrides at startup. This approach ensures consistency while allowing necessary environment differences.
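The merge itself doesn't need anything exotic. With the base and overlay files mounted side by side (the paths below are assumed mount points for illustration), a yq one-liner produces the effective configuration, assuming yq v4:
# Later files win on conflicting keys
yq eval-all 'select(fileIndex == 0) * select(fileIndex == 1)' \
  /etc/config/base/app.yaml /etc/config/prod/app.yaml > /tmp/effective-app.yaml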
Configuration Security Patterns
Security considerations become critical when managing configuration at scale. I follow these patterns to keep configuration secure:
Principle of least privilege: Each application gets only the configuration it needs:
apiVersion: v1
kind: ServiceAccount
metadata:
name: myapp-sa
namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: myapp-config-reader
namespace: production
rules:
- apiGroups: [""]
resources: ["configmaps"]
resourceNames: ["app-config", "app-config-prod"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["secrets"]
resourceNames: ["database-credentials"]
verbs: ["get"]
Configuration encryption: Sensitive configuration gets encrypted at rest. Encryption in etcd is configured on the API server, and tools like Sealed Secrets (covered later in this guide) encrypt values before they ever reach the cluster, so the stored Secret carries ciphertext rather than plain base64:
apiVersion: v1
kind: Secret
metadata:
  name: encrypted-config
type: Opaque
data:
  config.yaml: <encrypted-data>
Audit logging: I enable audit logging for configuration changes to track who changed what and when.
Troubleshooting Configuration Issues
Configuration problems can be subtle and hard to debug. I’ve developed a systematic approach to troubleshooting:
Check configuration mounting:
# Verify ConfigMap exists and has expected data
kubectl get configmap app-config -o yaml
# Check if configuration is properly mounted in pod
kubectl exec -it pod-name -- ls -la /etc/config
kubectl exec -it pod-name -- cat /etc/config/app.yaml
Validate environment variables:
# Check environment variables in running pod
kubectl exec -it pod-name -- env | grep DATABASE
Monitor configuration changes:
# Watch for ConfigMap changes
kubectl get events --field-selector involvedObject.kind=ConfigMap
# Check pod restart history
kubectl describe pod pod-name | grep -A 10 Events
These debugging techniques have saved me countless hours when configuration issues arise in production.
The patterns in this section form the foundation for scalable configuration management. They’ve evolved from real-world experience managing configuration across hundreds of applications and multiple environments.
Next, we’ll explore practical applications of these patterns with real-world examples and complete configuration setups for common application architectures.
Practical Applications and Examples
The real test of configuration management comes when you’re deploying actual applications. I’ve configured everything from simple web services to complex microservice architectures, and each taught me something new about what works in practice versus what looks good in documentation.
The most valuable lesson I’ve learned: start simple and add complexity only when you need it. I’ve seen teams over-engineer configuration systems that became harder to debug than the applications they were meant to support.
Microservices Configuration Architecture
Managing configuration for a microservices architecture requires coordination across multiple services. Here’s how I structure configuration for a typical e-commerce platform with user service, product service, and order service.
I start with shared configuration that all services need:
apiVersion: v1
kind: ConfigMap
metadata:
name: platform-config
namespace: ecommerce
data:
platform.yaml: |
cluster_name: production-east
region: us-east-1
environment: production
observability:
jaeger_endpoint: http://jaeger-collector:14268/api/traces
metrics_port: 9090
log_format: json
security:
cors_origins:
- https://app.example.com
- https://admin.example.com
rate_limit: 1000
Each service gets its own specific configuration:
# User Service Configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: user-service-config
namespace: ecommerce
data:
service.yaml: |
service_name: user-service
port: 8080
database:
host: postgres-users.internal
port: 5432
database: users
max_connections: 20
cache:
redis_url: redis://redis-users:6379
ttl: 3600
auth:
jwt_secret_key: user-jwt-secret
token_expiry: 24h
---
# Product Service Configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: product-service-config
namespace: ecommerce
data:
service.yaml: |
service_name: product-service
port: 8080
database:
host: postgres-products.internal
port: 5432
database: products
max_connections: 50
search:
elasticsearch_url: http://elasticsearch:9200
index_name: products
image_storage:
s3_bucket: product-images-prod
cdn_url: https://cdn.example.com
The deployment pattern I use injects both shared and service-specific configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
namespace: ecommerce
spec:
template:
spec:
containers:
- name: user-service
image: user-service:v1.2.0
env:
- name: CONFIG_PATH
value: /etc/config
volumeMounts:
- name: platform-config
mountPath: /etc/config/platform
- name: service-config
mountPath: /etc/config/service
- name: secrets
mountPath: /etc/secrets
volumes:
- name: platform-config
configMap:
name: platform-config
- name: service-config
configMap:
name: user-service-config
- name: secrets
secret:
secretName: user-service-secrets
This pattern scales well because each service gets exactly the configuration it needs while sharing common platform settings.
Database Configuration Patterns
Database configuration is where I’ve made the most painful mistakes. Connection strings, credentials, and connection pooling settings need careful management across environments.
Here’s my standard database configuration pattern:
apiVersion: v1
kind: ConfigMap
metadata:
name: database-config
namespace: production
data:
database.yaml: |
primary:
host: postgres-primary.internal
port: 5432
database: myapp
sslmode: require
pool:
min_size: 5
max_size: 20
max_lifetime: 1h
idle_timeout: 10m
timeouts:
connect: 30s
query: 60s
idle: 300s
replica:
host: postgres-replica.internal
port: 5432
database: myapp
sslmode: require
pool:
min_size: 2
max_size: 10
max_lifetime: 1h
idle_timeout: 10m
---
apiVersion: v1
kind: Secret
metadata:
name: database-credentials
namespace: production
type: Opaque
data:
username: bXlhcHA= # myapp
password: c3VwZXJzZWNyZXRwYXNzd29yZA== # supersecretpassword
# Connection strings for different use cases
primary_url: cG9zdGdyZXM6Ly9teWFwcDpzdXBlcnNlY3JldHBhc3N3b3JkQHBvc3RncmVzLXByaW1hcnkuaW50ZXJuYWw6NTQzMi9teWFwcD9zc2xtb2RlPXJlcXVpcmU=
replica_url: cG9zdGdyZXM6Ly9teWFwcDpzdXBlcnNlY3JldHBhc3N3b3JkQHBvc3RncmVzLXJlcGxpY2EuaW50ZXJuYWw6NTQzMi9teWFwcD9zc2xtb2RlPXJlcXVpcmU=
Applications can use either the structured configuration or the pre-built connection strings depending on their database libraries.
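For libraries that take a single DSN, I wire the pre-built connection string straight into the environment; a minimal container fragment:
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: primary_url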
Web Application Configuration
Web applications need configuration for frontend assets, API endpoints, and feature flags. I structure this configuration to support both server-side and client-side needs:
apiVersion: v1
kind: ConfigMap
metadata:
name: webapp-config
namespace: production
data:
# Server-side configuration
server.yaml: |
server:
port: 8080
host: 0.0.0.0
session:
secret_key: webapp-session-secret
cookie_name: webapp_session
max_age: 86400
secure: true
static_files:
path: /static
cache_duration: 3600
# Client-side configuration (injected into HTML)
client.json: |
{
"api_base_url": "https://api.example.com",
"websocket_url": "wss://ws.example.com",
"features": {
"new_dashboard": true,
"beta_features": false,
"analytics_enabled": true
},
"third_party": {
"google_analytics_id": "GA-XXXXXXXXX",
"stripe_public_key": "pk_live_..."
}
}
# Nginx configuration for serving the app
nginx.conf: |
server {
listen 80;
server_name example.com;
location / {
try_files $uri $uri/ /index.html;
}
location /api/ {
proxy_pass http://backend-service:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
location /static/ {
expires 1y;
add_header Cache-Control "public, immutable";
}
}
The deployment injects the client configuration into the HTML at startup rather than at build time, ensuring the frontend gets the right API endpoints for each environment.
Configuration Testing and Validation
I’ve learned to test configuration as rigorously as application code. Here’s my testing approach:
#!/bin/bash
# config-test.sh
set -e
echo "Testing Kubernetes configurations..."
# Test YAML syntax
echo "Validating YAML syntax..."
for file in *.yaml; do
if ! yq eval . "$file" >/dev/null 2>&1; then
echo "ERROR: Invalid YAML in $file"
exit 1
fi
done
# Test ConfigMap required keys
echo "Validating ConfigMap required keys..."
required_keys=("database.host" "database.port" "log.level")
for configmap in $(kubectl get configmaps -o name); do
  for key in "${required_keys[@]}"; do
    # jq -e exits non-zero when the key is absent, so missing keys are flagged reliably
    if ! kubectl get "$configmap" -o json | jq -e --arg k "$key" '.data[$k]' >/dev/null 2>&1; then
      echo "WARNING: Key '$key' missing in $configmap"
    fi
  done
done
# Test Secret references
echo "Validating Secret references..."
secret_refs=$(grep -rA2 "secretKeyRef" . | grep -o "name: [a-zA-Z0-9-]*" | cut -d' ' -f2 | sort -u)
for secret in $secret_refs; do
if ! kubectl get secret "$secret" >/dev/null 2>&1; then
echo "ERROR: Referenced secret '$secret' does not exist"
exit 1
fi
done
echo "Configuration tests passed!"
I run this script in CI/CD pipelines before deploying configuration changes. It catches most configuration errors before they reach production.
Configuration Monitoring and Alerting
Monitoring configuration changes helps catch issues early. I use this monitoring setup:
apiVersion: v1
kind: ConfigMap
metadata:
name: config-monitor
data:
monitor.py: |
#!/usr/bin/env python3
import time
import hashlib
from kubernetes import client, config, watch
def monitor_config_changes():
config.load_incluster_config()
v1 = client.CoreV1Api()
print("Starting configuration monitor...")
w = watch.Watch()
for event in w.stream(v1.list_config_map_for_all_namespaces):
config_map = event['object']
event_type = event['type']
if event_type in ['ADDED', 'MODIFIED']:
print(f"ConfigMap {config_map.metadata.name} {event_type}")
# Calculate configuration hash
config_hash = hashlib.md5(
str(config_map.data).encode()
).hexdigest()
# Log change for audit
print(f"Configuration hash: {config_hash}")
# Alert on production changes
if config_map.metadata.namespace == 'production':
send_alert(config_map.metadata.name, event_type)
def send_alert(config_name, event_type):
# Send alert to monitoring system
print(f"ALERT: Production config {config_name} {event_type}")
if __name__ == "__main__":
monitor_config_changes()
This monitor tracks all configuration changes and alerts when production configuration is modified.
Configuration Drift Detection
Configuration drift happens when running configuration differs from what’s in version control. I use this drift detection system:
#!/usr/bin/env python3
import yaml
import hashlib
from kubernetes import client, config
class ConfigDriftDetector:
def __init__(self):
config.load_incluster_config()
self.v1 = client.CoreV1Api()
def detect_drift(self):
print("Detecting configuration drift...")
# Load baseline configuration from git
baseline = self.load_baseline_config()
# Get current cluster configuration
current = self.get_cluster_config()
# Compare configurations
drift_detected = False
for name, baseline_config in baseline.items():
if name not in current:
print(f"DRIFT: ConfigMap {name} missing from cluster")
drift_detected = True
continue
current_config = current[name]
if self.config_hash(baseline_config) != self.config_hash(current_config):
print(f"DRIFT: ConfigMap {name} differs from baseline")
self.show_diff(name, baseline_config, current_config)
drift_detected = True
return drift_detected
def config_hash(self, config_data):
return hashlib.md5(str(config_data).encode()).hexdigest()
def show_diff(self, name, baseline, current):
print(f"Differences in {name}:")
# Implementation would show actual differences
pass
I run drift detection daily to ensure cluster configuration matches the intended state in version control.
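Scheduling it daily is just a CronJob; a sketch, assuming the detector script is baked into an image (drift-detector below is a placeholder) and runs under a service account with read access to ConfigMaps:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: config-drift-detector
spec:
  schedule: "0 6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: config-auditor   # placeholder SA with ConfigMap read access
          containers:
          - name: detector
            image: drift-detector:latest       # placeholder image wrapping the script above
          restartPolicy: OnFailure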
These practical patterns have evolved from managing configuration in real production environments. They handle the complexity that emerges when you move beyond simple examples to actual applications serving real users.
Next, we’ll explore advanced techniques including custom operators, policy-driven configuration, and enterprise-grade configuration management patterns.
Advanced Techniques and Patterns
After years of managing configuration at scale, I’ve learned that the real challenges emerge when you need governance, compliance, and automation. The basic ConfigMap and Secret patterns work for small teams, but enterprise environments require sophisticated approaches to configuration management.
The turning point in my understanding came when I had to manage configuration for 200+ microservices across multiple clusters and regions. The manual approaches that worked for 10 services became impossible at that scale.
Custom Configuration Operators
When standard Kubernetes resources aren’t enough, custom operators can automate complex configuration management tasks. I built my first configuration operator after spending too many hours manually updating configuration across environments.
Here’s a simplified version of a configuration operator I use:
// ConfigTemplate represents a configuration template
type ConfigTemplate struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec ConfigTemplateSpec `json:"spec,omitempty"`
Status ConfigTemplateStatus `json:"status,omitempty"`
}
type ConfigTemplateSpec struct {
Template string `json:"template"`
Variables map[string]string `json:"variables"`
Targets []TargetSpec `json:"targets"`
}
type TargetSpec struct {
Namespace string `json:"namespace"`
Name string `json:"name"`
Type string `json:"type"` // ConfigMap or Secret
}
The operator watches for ConfigTemplate resources and generates ConfigMaps or Secrets based on templates:
apiVersion: config.example.com/v1
kind: ConfigTemplate
metadata:
name: database-config-template
namespace: config-system
spec:
template: |
database:
host: postgres-{{.Environment}}.internal
port: 5432
database: myapp_{{.Environment}}
ssl: {{.DatabaseSSL}}
pool_size: {{.PoolSize}}
variables:
Environment: "{{ .Namespace }}"
DatabaseSSL: "true"
PoolSize: "20"
targets:
- namespace: development
name: database-config
type: ConfigMap
- namespace: staging
name: database-config
type: ConfigMap
- namespace: production
name: database-config
type: ConfigMap
The operator processes templates and creates the appropriate resources in each target namespace. This eliminates manual configuration management across environments.
Policy-Driven Configuration
As teams grow, configuration governance becomes critical. I use Open Policy Agent (OPA) to enforce configuration policies across all environments.
Here’s a policy that ensures production ConfigMaps have required metadata:
package kubernetes.configmaps
# Deny ConfigMaps in production without required labels
deny[msg] {
input.kind == "ConfigMap"
input.metadata.namespace == "production"
required_labels := ["app", "version", "environment", "owner"]
missing_label := required_labels[_]
not input.metadata.labels[missing_label]
msg := sprintf("Production ConfigMap missing required label: %v", [missing_label])
}
# Deny ConfigMaps with sensitive data in non-encrypted form
deny[msg] {
input.kind == "ConfigMap"
input.data[key]
# Check for common sensitive patterns
sensitive_patterns := ["password", "secret", "key", "token"]
pattern := sensitive_patterns[_]
contains(lower(key), pattern)
msg := sprintf("ConfigMap contains potentially sensitive key '%v' - use Secret instead", [key])
}
# Require configuration validation annotation
deny[msg] {
input.kind == "ConfigMap"
input.metadata.namespace == "production"
not input.metadata.annotations["config.kubernetes.io/validated"]
msg := "Production ConfigMaps must have validation annotation"
}
I integrate these policies with admission controllers to prevent non-compliant configuration from being deployed:
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: configmappolicy
spec:
  crd:
    spec:
      names:
        kind: ConfigMapPolicy
      validation:
        openAPIV3Schema:
          type: object
          properties:
            requiredLabels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package configmappolicy
        violation[{"msg": msg}] {
          input.review.object.kind == "ConfigMap"
          required := input.parameters.requiredLabels[_]
          not input.review.object.metadata.labels[required]
          msg := sprintf("Missing required label: %v", [required])
        }
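The template only defines the ConfigMapPolicy kind; an actual constraint instance applies it to ConfigMaps with concrete parameters:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: ConfigMapPolicy
metadata:
  name: require-standard-labels
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["ConfigMap"]
  parameters:
    requiredLabels: ["app", "version", "environment", "owner"]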
This policy framework prevents configuration mistakes before they reach production.
GitOps Configuration Management
I’ve found GitOps to be the most reliable approach for managing configuration at scale. Configuration changes go through the same review process as code changes, and deployments are automated and auditable.
My GitOps configuration structure:
config-repo/
├── environments/
│ ├── development/
│ │ ├── configmaps/
│ │ ├── secrets/
│ │ └── kustomization.yaml
│ ├── staging/
│ │ ├── configmaps/
│ │ ├── secrets/
│ │ └── kustomization.yaml
│ └── production/
│ ├── configmaps/
│ ├── secrets/
│ └── kustomization.yaml
├── base/
│ ├── configmaps/
│ ├── secrets/
│ └── kustomization.yaml
└── policies/
├── configmap-policies.rego
└── secret-policies.rego
Base configuration defines common settings:
# base/configmaps/platform-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: platform-config
data:
platform.yaml: |
observability:
metrics_port: 9090
log_format: json
security:
rate_limit: 1000
cors_enabled: true
Environment-specific overlays customize settings:
# environments/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
patchesStrategicMerge:
- configmaps/platform-config-patch.yaml
configMapGenerator:
- name: environment-config
literals:
- environment=production
- debug=false
- log_level=warn
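Before handing this to a GitOps controller, I render the overlay locally to confirm the patches do what I expect:
# Render the production overlay without applying it
kubectl kustomize environments/production/
# Or diff it against what's running in the cluster
kubectl diff -k environments/production/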
ArgoCD automatically syncs configuration changes from Git to Kubernetes:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: config-production
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/company/config-repo
targetRevision: main
path: environments/production
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
This GitOps approach provides complete audit trails and makes configuration changes as reliable as code deployments.
Configuration Encryption and Security
Protecting sensitive configuration requires encryption both at rest and in transit. I use Sealed Secrets to encrypt configuration in Git repositories:
# Create a sealed secret from a regular secret
kubectl create secret generic database-credentials \
--from-literal=username=myapp \
--from-literal=password=supersecret \
--dry-run=client -o yaml | \
kubeseal -o yaml > database-credentials-sealed.yaml
The sealed secret can be safely stored in Git:
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
name: database-credentials
namespace: production
spec:
encryptedData:
username: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEQAx...
password: AgAKAoiQm7xFtFqSJ8TqDE9I8tnHoRqKU...
template:
metadata:
name: database-credentials
namespace: production
type: Opaque
For even higher security, I integrate with external secret management systems:
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: vault-backend
namespace: production
spec:
provider:
vault:
server: "https://vault.company.com"
path: "secret"
version: "v2"
auth:
kubernetes:
mountPath: "kubernetes"
role: "production-role"
This approach keeps the source of truth for sensitive data outside Kubernetes while still making it available to applications as ordinary Secrets.
Configuration Performance Optimization
Large-scale configuration management can impact cluster performance. I’ve learned to optimize configuration for both storage and retrieval performance.
Configuration compression for large ConfigMaps:
import gzip
import base64
import yaml
def compress_configmap_data(data):
"""Compress ConfigMap data to reduce etcd storage"""
compressed = gzip.compress(data.encode('utf-8'))
return base64.b64encode(compressed).decode('utf-8')
def create_compressed_configmap(name, data):
"""Create a ConfigMap with compressed data"""
compressed_data = compress_configmap_data(yaml.dump(data))
configmap = {
'apiVersion': 'v1',
'kind': 'ConfigMap',
'metadata': {
'name': name,
'annotations': {
'config.kubernetes.io/compressed': 'true'
}
},
'data': {
'config.yaml.gz': compressed_data
}
}
return configmap
Configuration caching to reduce API server load:
type ConfigCache struct {
    client kubernetes.Interface
    cache  map[string]cachedConfigMap
    mutex  sync.RWMutex
    ttl    time.Duration
}

type cachedConfigMap struct {
    configMap *v1.ConfigMap
    fetchedAt time.Time
}

func (c *ConfigCache) GetConfigMap(namespace, name string) (*v1.ConfigMap, error) {
    key := fmt.Sprintf("%s/%s", namespace, name)

    c.mutex.RLock()
    cached, exists := c.cache[key]
    c.mutex.RUnlock()

    // Serve from cache while the entry is younger than the TTL
    if exists && time.Since(cached.fetchedAt) < c.ttl {
        return cached.configMap, nil
    }

    // Fetch from API server and cache the result with its fetch time
    configMap, err := c.client.CoreV1().ConfigMaps(namespace).Get(
        context.TODO(), name, metav1.GetOptions{})
    if err != nil {
        return nil, err
    }

    c.mutex.Lock()
    c.cache[key] = cachedConfigMap{configMap: configMap, fetchedAt: time.Now()}
    c.mutex.Unlock()
    return configMap, nil
}
These optimizations become important when managing thousands of ConfigMaps across large clusters.
Configuration Compliance and Auditing
Enterprise environments require comprehensive auditing of configuration changes. I use this auditing system:
apiVersion: v1
kind: ConfigMap
metadata:
name: config-audit-policy
data:
audit-policy.yaml: |
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
resources:
- group: ""
resources: ["configmaps", "secrets"]
namespaces: ["production", "staging"]
- level: RequestResponse
resources:
- group: ""
resources: ["configmaps", "secrets"]
namespaces: ["production"]
verbs: ["create", "update", "patch", "delete"]
Configuration compliance scanning:
class ConfigComplianceScanner:
def __init__(self):
self.policies = self.load_policies()
def scan_namespace(self, namespace):
"""Scan all configuration in a namespace for compliance"""
violations = []
# Scan ConfigMaps
configmaps = self.get_configmaps(namespace)
for cm in configmaps:
violations.extend(self.check_configmap_compliance(cm))
# Scan Secrets
secrets = self.get_secrets(namespace)
for secret in secrets:
violations.extend(self.check_secret_compliance(secret))
return violations
def check_configmap_compliance(self, configmap):
"""Check ConfigMap against compliance policies"""
violations = []
# Check required labels
required_labels = ["app", "version", "environment"]
for label in required_labels:
if label not in configmap.metadata.labels:
violations.append(f"Missing required label: {label}")
# Check for sensitive data
for key, value in configmap.data.items():
if self.contains_sensitive_data(key, value):
violations.append(f"Potentially sensitive data in key: {key}")
return violations
This compliance framework ensures configuration meets organizational standards and regulatory requirements.
These advanced patterns have evolved from managing configuration in large, complex environments. They provide the governance, security, and automation needed for enterprise-scale Kubernetes deployments.
Next, we’ll explore best practices and optimization techniques that tie all these concepts together into a comprehensive configuration management strategy.
Best Practices and Optimization
After managing configuration for hundreds of applications across multiple Kubernetes clusters, I’ve learned that the difference between good and great configuration management lies in the details. The patterns that work for small teams break down at enterprise scale, and the optimizations that seem unnecessary become critical for performance and reliability.
The most important lesson I’ve learned: configuration management is as much about people and processes as it is about technology. The best technical solution fails if the team can’t use it effectively.
Configuration Architecture Principles
I follow these principles when designing configuration systems:
Separation of Concerns: Configuration, secrets, and application code live in separate repositories with different access controls. This prevents developers from accidentally committing secrets and allows security teams to audit configuration independently.
Environment Parity: Development, staging, and production environments use identical configuration structures with only values differing. This eliminates environment-specific bugs and makes promotions predictable.
Immutable Configuration: Once deployed, configuration doesn’t change. Updates require new deployments, ensuring consistency and enabling rollbacks.
Least Privilege: Applications and users get only the configuration access they need. Over-broad permissions lead to security issues and make auditing difficult.
Production-Ready Configuration Patterns
Here’s the configuration architecture I use for production systems:
# Base configuration template
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config-template
annotations:
config.kubernetes.io/version: "v1.2.0"
config.kubernetes.io/validated: "true"
config.kubernetes.io/last-updated: "2024-01-15T10:30:00Z"
data:
app.yaml: |
service:
name: {{ .ServiceName }}
port: {{ .ServicePort }}
environment: {{ .Environment }}
database:
host: {{ .DatabaseHost }}
port: {{ .DatabasePort }}
name: {{ .DatabaseName }}
ssl: {{ .DatabaseSSL }}
pool:
min: {{ .PoolMin }}
max: {{ .PoolMax }}
timeout: {{ .PoolTimeout }}
observability:
metrics_enabled: {{ .MetricsEnabled }}
tracing_enabled: {{ .TracingEnabled }}
log_level: {{ .LogLevel }}
features:
new_ui: {{ .NewUIEnabled }}
beta_features: {{ .BetaEnabled }}
rate_limit: {{ .RateLimit }}
Environment-specific values are managed separately:
# Production values
production:
ServiceName: "user-service"
ServicePort: "8080"
Environment: "production"
DatabaseHost: "postgres-prod.internal"
DatabasePort: "5432"
DatabaseName: "users_prod"
DatabaseSSL: "true"
PoolMin: "10"
PoolMax: "50"
PoolTimeout: "30s"
MetricsEnabled: "true"
TracingEnabled: "true"
LogLevel: "warn"
NewUIEnabled: "true"
BetaEnabled: "false"
RateLimit: "1000"
# Development values
development:
ServiceName: "user-service"
ServicePort: "8080"
Environment: "development"
DatabaseHost: "postgres-dev.internal"
DatabasePort: "5432"
DatabaseName: "users_dev"
DatabaseSSL: "false"
PoolMin: "2"
PoolMax: "10"
PoolTimeout: "10s"
MetricsEnabled: "true"
TracingEnabled: "true"
LogLevel: "debug"
NewUIEnabled: "true"
BetaEnabled: "true"
RateLimit: "100"
This approach ensures consistency while allowing necessary environment differences.
Configuration Security Framework
Security becomes critical when managing configuration at scale. I implement defense-in-depth with multiple security layers:
Access Control:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: production
name: config-reader
rules:
- apiGroups: [""]
resources: ["configmaps"]
resourceNames: ["app-config", "platform-config"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["secrets"]
resourceNames: ["app-secrets"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: production
name: config-manager
rules:
- apiGroups: [""]
resources: ["configmaps", "secrets"]
verbs: ["get", "list", "create", "update", "patch"]
Secret Encryption (an API server configuration file passed via --encryption-provider-config, not a resource you apply with kubectl):
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-key>
      - identity: {}
Policy Enforcement:
package kubernetes.admission
# Deny ConfigMaps with sensitive data patterns
deny[msg] {
input.request.kind.kind == "ConfigMap"
input.request.object.data[key]
sensitive_patterns := [
"password", "secret", "key", "token",
"credential", "auth", "private"
]
pattern := sensitive_patterns[_]
contains(lower(key), pattern)
msg := sprintf("ConfigMap key '%v' appears to contain sensitive data - use Secret instead", [key])
}
# Require encryption for production secrets
deny[msg] {
input.request.kind.kind == "Secret"
input.request.namespace == "production"
not input.request.object.metadata.annotations["config.kubernetes.io/encrypted"]
msg := "Production secrets must be encrypted at rest"
}
This security framework prevents common configuration security mistakes.
Performance Optimization Strategies
Configuration performance impacts application startup time and cluster scalability. I optimize at multiple levels:
ConfigMap Size Optimization:
import gzip
import base64
import json
class ConfigOptimizer:
def __init__(self):
self.compression_threshold = 1024 # 1KB
def optimize_configmap(self, configmap_data):
"""Optimize ConfigMap for size and performance"""
optimized = {}
for key, value in configmap_data.items():
if len(value) > self.compression_threshold:
# Compress large configuration
compressed = self.compress_data(value)
optimized[f"{key}.gz"] = compressed
print(f"Compressed {key}: {len(value)} -> {len(compressed)} bytes")
else:
optimized[key] = value
return optimized
def compress_data(self, data):
"""Compress configuration data"""
compressed = gzip.compress(data.encode('utf-8'))
return base64.b64encode(compressed).decode('utf-8')
Configuration Caching:
type ConfigManager struct {
client kubernetes.Interface
cache map[string]*CachedConfig
cacheMux sync.RWMutex
cacheTTL time.Duration
}
type CachedConfig struct {
Data map[string]string
Timestamp time.Time
}
func (cm *ConfigManager) GetConfig(namespace, name string) (map[string]string, error) {
key := fmt.Sprintf("%s/%s", namespace, name)
cm.cacheMux.RLock()
cached, exists := cm.cache[key]
cm.cacheMux.RUnlock()
if exists && time.Since(cached.Timestamp) < cm.cacheTTL {
return cached.Data, nil
}
// Fetch from API server
configMap, err := cm.client.CoreV1().ConfigMaps(namespace).Get(
context.TODO(), name, metav1.GetOptions{})
if err != nil {
return nil, err
}
// Update cache
cm.cacheMux.Lock()
cm.cache[key] = &CachedConfig{
Data: configMap.Data,
Timestamp: time.Now(),
}
cm.cacheMux.Unlock()
return configMap.Data, nil
}
Batch Configuration Loading:
apiVersion: v1
kind: Pod
metadata:
name: app-with-batch-config
spec:
initContainers:
- name: config-loader
image: config-loader:v1.0.0
command:
- /bin/sh
- -c
- |
# Load all configuration in parallel
kubectl get configmap app-config -o jsonpath='{.data}' > /shared/app-config.json &
kubectl get configmap platform-config -o jsonpath='{.data}' > /shared/platform-config.json &
kubectl get secret app-secrets -o jsonpath='{.data}' > /shared/secrets.json &
wait
# Merge configurations
merge-configs /shared/app-config.json /shared/platform-config.json > /shared/merged-config.json
echo "Configuration loading complete"
volumeMounts:
- name: shared-config
mountPath: /shared
containers:
- name: app
image: myapp:latest
volumeMounts:
- name: shared-config
mountPath: /etc/config
volumes:
- name: shared-config
emptyDir: {}
This approach reduces configuration loading time by parallelizing operations and pre-processing configuration.
Operational Excellence Patterns
Running configuration management in production requires robust operational practices:
Configuration Monitoring:
class ConfigMonitor:
    def __init__(self):
        self.baseline_configs = self.load_baseline()
        self.alert_threshold = 0.1  # alert when more than 10% of keys change

    def monitor_drift(self):
        """Monitor configuration drift from baseline"""
        current_configs = self.get_current_configs()
        for name, current in current_configs.items():
            if name not in self.baseline_configs:
                self.alert(f"New configuration detected: {name}")
                continue
            baseline = self.baseline_configs[name]
            drift_percentage = self.calculate_drift(baseline, current)
            if drift_percentage > self.alert_threshold:
                self.alert(f"Configuration drift detected in {name}: {drift_percentage:.2%}")

    def calculate_drift(self, baseline, current):
        """Return the fraction of keys whose values differ between baseline and current"""
        all_keys = set(baseline.keys()) | set(current.keys())
        if not all_keys:
            return 0
        changed_keys = sum(1 for key in all_keys if baseline.get(key) != current.get(key))
        return changed_keys / len(all_keys)
Automated Compliance Scanning:
#!/bin/bash
# config-compliance-scan.sh
echo "Starting configuration compliance scan..."
# Check for required labels
echo "Checking required labels..."
kubectl get configmaps --all-namespaces -o json | \
jq -r '.items[] | select(.metadata.namespace == "production") |
select(.metadata.labels.app == null or
.metadata.labels.version == null or
.metadata.labels.environment == null) |
"\(.metadata.namespace)/\(.metadata.name): Missing required labels"'
# Check for sensitive data in ConfigMaps
echo "Checking for sensitive data patterns..."
kubectl get configmaps --all-namespaces -o json | \
jq -r '.items[] | .metadata as $meta |
.data | to_entries[] |
select(.key | test("password|secret|key|token"; "i")) |
"\($meta.namespace)/\($meta.name): Potentially sensitive key \(.key)"'
# Check Secret encryption
echo "Checking Secret encryption..."
kubectl get secrets --all-namespaces -o json | \
jq -r '.items[] | select(.metadata.namespace == "production") |
select(.metadata.annotations["config.kubernetes.io/encrypted"] != "true") |
"\(.metadata.namespace)/\(.metadata.name): Production Secret not encrypted"'
echo "Compliance scan complete"
Configuration Backup and Recovery:
class ConfigBackupManager:
def __init__(self, backup_storage):
self.storage = backup_storage
self.retention_days = 30
def backup_configuration(self, namespace):
"""Backup all configuration in a namespace"""
timestamp = datetime.now().isoformat()
backup_data = {
'timestamp': timestamp,
'namespace': namespace,
'configmaps': self.get_configmaps(namespace),
'secrets': self.get_secrets(namespace)
}
backup_key = f"config-backup/{namespace}/{timestamp}.json"
self.storage.store(backup_key, json.dumps(backup_data))
# Clean up old backups
self.cleanup_old_backups(namespace)
def restore_configuration(self, namespace, backup_timestamp):
"""Restore configuration from backup"""
backup_key = f"config-backup/{namespace}/{backup_timestamp}.json"
backup_data = json.loads(self.storage.retrieve(backup_key))
# Restore ConfigMaps
for cm_data in backup_data['configmaps']:
self.restore_configmap(cm_data)
# Restore Secrets
for secret_data in backup_data['secrets']:
self.restore_secret(secret_data)
Configuration Lifecycle Management
Managing configuration changes over time requires systematic lifecycle management:
Version Control Integration:
# .github/workflows/config-deploy.yml
name: Deploy Configuration
on:
push:
branches: [main]
paths: ['config/**']
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Validate Configuration
run: |
# Validate YAML syntax
find config/ -name "*.yaml" -exec yamllint {} \;
# Run policy checks
opa test policies/ config/
# Validate against schema
kubeval config/**/*.yaml
- name: Deploy to Development
run: |
kubectl apply -k config/environments/development/
- name: Run Integration Tests
run: |
./scripts/test-config-integration.sh development
- name: Deploy to Production
if: github.ref == 'refs/heads/main'
run: |
kubectl apply -k config/environments/production/
Change Management Process:
- Configuration Change Request: All changes start with a documented request including rationale and impact assessment
- Peer Review: Configuration changes require approval from at least two team members
- Automated Testing: Changes are tested in development environment before production deployment
- Gradual Rollout: Production changes are deployed gradually with monitoring at each step
- Rollback Plan: Every change includes a tested rollback procedure
This comprehensive approach to configuration management has evolved from managing real production systems at scale. The patterns and practices here provide the foundation for reliable, secure, and maintainable configuration management in any Kubernetes environment.
The key insight I’ve learned: configuration management is not just about storing and retrieving values - it’s about creating a system that enables teams to work effectively while maintaining security, compliance, and reliability standards.
You now have the knowledge and tools to build enterprise-grade configuration management systems that scale with your organization and support your operational requirements.