Monitoring & Compliance

Security monitoring in Kubernetes isn’t just about collecting logs and metrics—it’s about building a comprehensive observability strategy that helps you detect threats, investigate incidents, and prove compliance with security frameworks. I’ve learned that the most effective security monitoring combines real-time threat detection with long-term trend analysis and compliance reporting.

The challenge with Kubernetes security monitoring is the sheer volume of events and the dynamic nature of the environment. Pods come and go, services scale up and down, and network connections change constantly. Your monitoring strategy needs to separate normal operational events from genuine security concerns while maintaining the audit trail required for compliance.

Comprehensive Audit Logging Strategy

Kubernetes audit logging is your primary source of truth for security events, but the default configuration captures too much noise and not enough signal. An effective audit policy focuses on security-relevant events while filtering out routine operations that don’t indicate potential threats.

Here’s a production-ready audit policy that balances security coverage with log volume:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: None
  users: ["system:kube-proxy"]
  verbs: ["watch"]
  resources:
  - group: ""
    resources: ["endpoints", "services"]
- level: Metadata
  omitStages: ["RequestReceived"]
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
- level: Request
  omitStages: ["RequestReceived"]
  verbs: ["create", "update", "patch", "delete"]
  resources:
  - group: ""
    resources: ["pods", "services"]
  - group: "apps"
    resources: ["deployments", "daemonsets", "statefulsets"]

This policy excludes noisy system events while capturing detailed information about security-sensitive operations, with a final catch-all so unmatched events still leave a metadata trail. The key insight is using different audit levels for different event types: metadata for secret access, full request details for resource modifications.
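
The policy only takes effect once the API server knows where to find it and where to write events. A minimal sketch of the relevant kube-apiserver flags (the file paths are assumptions; use whatever you actually mount into the control plane):

- --audit-policy-file=/etc/kubernetes/audit-policy.yaml
- --audit-log-path=/var/log/kubernetes/audit.log
- --audit-log-maxage=30      # days of retention
- --audit-log-maxbackup=10   # rotated files to keep
- --audit-log-maxsize=100    # megabytes per file before rotation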

Audit log analysis requires structured approaches to handle the volume effectively. I recommend using log aggregation tools that can parse Kubernetes audit events and create meaningful alerts:

jq 'select(.verb == "create"
           and .objectRef.resource == "pods"
           and (.user.username | startswith("system:") | not))' \
  /var/log/kubernetes/audit.log

This filter shows pod creations by non-system users, which can help identify unauthorized workload deployments. Note that audit events land in the file set by --audit-log-path (or go to a webhook backend), not in the kube-apiserver container logs, so run the query wherever your pipeline ships that file.

Runtime Security Monitoring

Runtime security monitoring detects threats that occur after containers are running. This includes process execution monitoring, file system changes, network connection analysis, and system call filtering. Tools like Falco provide real-time detection of suspicious activities based on behavioral rules.

Falco rules can detect a wide range of security events. Here’s a custom rule that detects when containers try to access sensitive host directories:

- rule: Container Accessing Sensitive Directories
  desc: Detect containers accessing sensitive host paths
  condition: >
    open_read and container and
    (fd.name startswith /etc/shadow or
     fd.name startswith /etc/passwd or
     fd.name startswith /root/.ssh)
  output: >
    Sensitive file access in container
    (user=%user.name command=%proc.cmdline file=%fd.name
     container_id=%container.id image=%container.image.repository)
  priority: WARNING

This rule fires when a process inside a container opens password files or SSH key material for reading, which could indicate a container breakout attempt or a credential harvesting attack. (The open_read macro ships with Falco's default ruleset; file-access conditions need an open event rather than a process spawn, since fd.name is only populated on file descriptor events.)
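
To load a custom rule like this, Falco's stock configuration reads a local rules file after the bundled ruleset, so your additions can extend or override the defaults. A minimal sketch for a host-level install (the source file name is an assumption; Helm-based installs pass custom rules through chart values instead):

sudo cp sensitive-dirs.yaml /etc/falco/falco_rules.local.yaml
sudo systemctl restart falco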

Network monitoring complements process monitoring by tracking connection patterns and identifying unusual communication flows. Tools that integrate with your CNI plugin can provide detailed network flow analysis:

kubectl get networkpolicies --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.policyTypes | index("Egress")) | "\(.metadata.namespace)/\(.metadata.name)"'

This lists every policy that declares an Egress policyType so you can confirm outbound restrictions exist where you expect them. Keep in mind the inverse matters too: a namespace with no egress policy at all leaves outbound traffic unrestricted.
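
A baseline default-deny egress policy closes that gap. Here's a sketch that blocks all outbound traffic except DNS (the namespace is an assumption, and the exception assumes cluster DNS on port 53):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: production
spec:
  podSelector: {}           # matches every pod in the namespace
  policyTypes: ["Egress"]
  egress:
  - to:
    - namespaceSelector: {} # any namespace, but only on the DNS ports
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53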

Policy Enforcement and Validation

Policy enforcement goes beyond just having policies—you need to continuously validate that they’re working correctly and haven’t been bypassed. Admission controllers like Open Policy Agent (OPA) Gatekeeper provide powerful policy enforcement capabilities with detailed reporting.

Here’s a Gatekeeper constraint that enforces security context requirements:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: requiresecuritycontext
spec:
  crd:
    spec:
      names:
        kind: RequireSecurityContext
      validation:
        openAPIV3Schema:
          properties:
            runAsNonRoot:
              type: boolean
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package requiresecuritycontext

        violation[{"msg": msg}] {
          input.parameters.runAsNonRoot
          input.review.object.kind == "Pod"
          not input.review.object.spec.securityContext.runAsNonRoot
          msg := "Pod must run as non-root user"
        }

The corresponding constraint applies this template to specific namespaces:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: RequireSecurityContext
metadata:
  name: must-run-as-nonroot
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["production", "staging"]
  parameters:
    runAsNonRoot: true

Policy violations should be monitored and alerted on. Gatekeeper's audit controller records violations on each constraint's status, which you can spot-check directly, and exposes the same data as Prometheus metrics:

kubectl get constraints -o json | \
  jq '.items[] | {name: .metadata.name, violations: .status.totalViolations}'

Compliance Framework Implementation

Compliance frameworks like CIS Kubernetes Benchmark, NIST, and SOC 2 require specific security controls and evidence collection. Automated compliance scanning tools can continuously assess your cluster against these frameworks and generate reports for auditors.

The CIS Kubernetes Benchmark includes specific checks that you can implement as monitoring rules. For example, ensuring that the API server is not accessible without authentication:

kubectl get pods -n kube-system -l component=kube-apiserver -o yaml | \
  grep -E "anonymous-auth|insecure-port"

This surfaces the relevant API server flags so you can verify that --anonymous-auth is explicitly set to false; the --insecure-port check only applies to older clusters, since insecure serving was removed from recent Kubernetes releases.
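
Running such checks by hand doesn't scale across the full benchmark. Tools like kube-bench automate the CIS checks; here's a sketch of a one-shot run as a Kubernetes Job, assuming the job manifest published in the upstream repository:

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl wait --for=condition=complete job/kube-bench --timeout=120s
kubectl logs job/kube-bench    # PASS/FAIL results per CIS check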

Compliance reporting requires collecting evidence over time, not just point-in-time assessments. Your monitoring system should track:

  • RBAC changes and access patterns (a query sketch follows this list)
  • Security policy violations and remediation
  • Vulnerability scan results and patching timelines
  • Incident response activities and outcomes
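
The RBAC item falls straight out of the audit policy defined earlier, which records metadata for every RBAC object change. A sketch of extracting that evidence with jq (the log path is an assumption):

jq 'select(.objectRef.apiGroup == "rbac.authorization.k8s.io"
           and (.verb | IN("create", "update", "patch", "delete")))
   | {time: .requestReceivedTimestamp, user: .user.username,
      verb: .verb, resource: .objectRef.resource, name: .objectRef.name}' \
  /var/log/kubernetes/audit.log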

Vulnerability Management Integration

Container image scanning should be integrated into your CI/CD pipeline and runtime monitoring. Tools like Trivy can scan images for known vulnerabilities and generate reports that feed into your compliance documentation:

trivy image --format json --output scan-results.json nginx:latest

Runtime vulnerability monitoring goes beyond static image scanning to detect when running containers become vulnerable due to newly discovered CVEs. This requires continuous scanning and correlation with your inventory of running images.
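
A simple correlation loop can approximate this: inventory the images currently running, then rescan each one so CVEs published after build time still surface. A sketch, assuming registry access from wherever it runs:

kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' \
  | sort -u \
  | while read -r image; do
      trivy image --severity HIGH,CRITICAL --quiet "$image"
    done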

Policy-based vulnerability management helps prioritize remediation efforts. Not all vulnerabilities require immediate action—focus on those that are exploitable in your specific environment and affect critical workloads.
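
Trivy's exit-code and filtering flags make that policy enforceable in CI: fail the build only on fixable critical findings and report everything else without blocking. A sketch (the image reference is a placeholder):

trivy image --exit-code 1 --severity CRITICAL --ignore-unfixed registry.example.com/payments-api:1.4.2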

Incident Response and Forensics

When security incidents occur, your monitoring data becomes crucial for investigation and forensics. Kubernetes environments generate massive amounts of data, so having the right collection and retention policies is essential for effective incident response.

Key data sources for incident investigation include:

  • Kubernetes audit logs with detailed API access patterns
  • Container runtime logs showing process execution and file access
  • Network flow logs capturing communication patterns
  • Resource utilization metrics that might indicate compromise

Incident response playbooks should include specific procedures for Kubernetes environments, such as isolating compromised pods, collecting container forensic images, and analyzing cluster state at the time of the incident.
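
The isolation step in particular rewards preparation. A sketch of a first response that preserves evidence, assuming a pre-staged deny-all NetworkPolicy that selects the quarantine label (the pod, node, and label names are placeholders):

# Quarantine instead of delete, so the container filesystem and process
# state remain available for forensic collection
kubectl label pod compromised-pod-7f9c4 -n production quarantine=true --overwrite
kubectl cordon worker-node-3    # keep new workloads off the affected node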

The monitoring and compliance foundation we’ve established provides the visibility and evidence collection needed for both day-to-day security operations and formal compliance requirements. In the final part, we’ll bring everything together with real-world production security patterns that demonstrate how all these components work together in enterprise environments.