Docker Security and Performance Optimization
Implement Docker security best practices.
Introduction and Setup
Docker Security and Optimization: Introduction and Setup
Docker security and performance optimization are critical for production deployments. This guide covers comprehensive security hardening, performance tuning, and operational best practices for containerized environments.
Docker Security Fundamentals
Security Model Overview
┌─────────────────────────────────────────────────────────┐
│                  Host Operating System                  │
├─────────────────────────────────────────────────────────┤
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│ │   Kernel    │  │ Namespaces  │  │   cgroups   │       │
│ │ Capabilities│  │             │  │             │       │
│ └─────────────┘  └─────────────┘  └─────────────┘       │
├─────────────────────────────────────────────────────────┤
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│ │  SELinux/   │  │  AppArmor   │  │   seccomp   │       │
│ │  AppArmor   │  │             │  │             │       │
│ └─────────────┘  └─────────────┘  └─────────────┘       │
├─────────────────────────────────────────────────────────┤
│                      Docker Engine                      │
├─────────────────────────────────────────────────────────┤
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│ │ Container 1 │  │ Container 2 │  │ Container 3 │       │
│ └─────────────┘  └─────────────┘  └─────────────┘       │
└─────────────────────────────────────────────────────────┘
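These layers can be checked on a live container; a quick sketch (assuming a running container named secure-app):
# Security options, capabilities, and resource limits applied to a container
docker inspect --format 'SecurityOpt: {{.HostConfig.SecurityOpt}}' secure-app
docker inspect --format 'CapAdd: {{.HostConfig.CapAdd}} CapDrop: {{.HostConfig.CapDrop}}' secure-app
# Namespaces of the container's init process, viewed from the host
docker inspect --format '{{.State.Pid}}' secure-app | xargs -I{} sudo ls -l /proc/{}/ns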
Basic Security Commands
# Run container with security options
docker run --security-opt no-new-privileges:true \
--cap-drop ALL \
--cap-add NET_BIND_SERVICE \
--read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=100m \
nginx
# Run as non-root user
docker run --user 1000:1000 \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
alpine id
# Limit resources
docker run --memory=512m \
--cpus="1.5" \
--pids-limit=100 \
--ulimit nofile=1024:1024 \
myapp
# Network security
docker run --network none alpine
docker run --network custom-network \
--ip 172.20.0.10 \
myapp
Container Hardening Basics
Secure Dockerfile Practices
# Use specific versions, not latest
FROM node:16.17.0-alpine3.16
# Create non-root user early
RUN addgroup -g 1001 -S nodejs && \
adduser -S nextjs -u 1001 -G nodejs
# Set working directory
WORKDIR /app
# Copy package files first for better caching
COPY package*.json ./
# Install dependencies as root, then switch
RUN npm ci --only=production && \
npm cache clean --force && \
chown -R nextjs:nodejs /app
# Copy application files
COPY --chown=nextjs:nodejs . .
# Clean up caches and temp files while still root (apk and /var/cache require root)
RUN rm -rf /var/cache/apk/* /tmp/*
# Set secure permissions
RUN chmod -R 755 /app && \
chmod 644 /app/package.json
# Switch to non-root user
USER nextjs
# Health check (curl must be present in the image; the alpine node image does not include it by default)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:3000/health || exit 1
# Expose port
EXPOSE 3000
# Use exec form for CMD
CMD ["node", "server.js"]
Runtime Security Configuration
# Comprehensive security flags (Docker's default seccomp profile is applied automatically)
docker run -d \
--name secure-app \
--user 1000:1000 \
--read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=100m \
--tmpfs /var/run:rw,noexec,nosuid,size=50m \
--cap-drop ALL \
--cap-add CHOWN \
--cap-add SETGID \
--cap-add SETUID \
--security-opt no-new-privileges:true \
--security-opt apparmor:docker-default \
--memory=512m \
--memory-swap=512m \
--cpu-shares=512 \
--pids-limit=100 \
--ulimit nofile=1024:1024 \
--ulimit nproc=64:64 \
--restart=unless-stopped \
myapp:latest
Performance Optimization Basics
Resource Management
# CPU optimization (--cpus is shorthand for --cpu-quota/--cpu-period; set one or the other, not both)
docker run -d \
--cpus="2" \
--cpu-shares=1024 \
--cpuset-cpus="0,1" \
myapp
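--cpu-quota and --cpu-period express the same limit as --cpus: quota divided by period is the CPU fraction. For example:
# 50000 / 100000 = 0.5 CPUs, so these two commands are equivalent
docker run -d --cpu-quota=50000 --cpu-period=100000 myapp
docker run -d --cpus="0.5" myapp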
# Memory optimization (--kernel-memory is deprecated and ignored on cgroup v2 hosts)
docker run -d \
--memory=2g \
--memory-swap=2g \
--memory-reservation=1g \
--oom-kill-disable=false \
myapp
# I/O optimization
docker run -d \
--device-read-bps /dev/sda:50mb \
--device-write-bps /dev/sda:50mb \
--device-read-iops /dev/sda:1000 \
--device-write-iops /dev/sda:1000 \
myapp
# Network optimization (only network-namespaced net.* sysctls can be set per container; the rest must be tuned on the host)
docker run -d \
--sysctl net.core.somaxconn=65535 \
--sysctl net.ipv4.tcp_max_syn_backlog=65535 \
--sysctl net.core.rmem_max=134217728 \
--sysctl net.core.wmem_max=134217728 \
myapp
Docker Daemon Optimization
Settings for /etc/docker/daemon.json:
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
},
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
],
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 64000,
"Soft": 64000
}
},
"max-concurrent-downloads": 10,
"max-concurrent-uploads": 5,
"default-shm-size": "128M",
"userland-proxy": false,
"experimental": false,
"metrics-addr": "127.0.0.1:9323",
"live-restore": true
}
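Apply the configuration by restarting the daemon, then verify the settings took effect:
sudo systemctl restart docker
docker info --format 'Logging: {{.LoggingDriver}} Storage: {{.Driver}} Live restore: {{.LiveRestoreEnabled}}'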
Security Scanning and Assessment
Image Vulnerability Scanning
# Install and use Trivy
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
# Scan image for vulnerabilities
trivy image nginx:latest
# Scan with specific severity
trivy image --severity HIGH,CRITICAL nginx:latest
# Scan and output to file
trivy image --format json --output results.json nginx:latest
# Scan filesystem
trivy fs .
# Scan with ignore file
trivy image --ignorefile .trivyignore nginx:latest
.trivyignore example (one vulnerability ID per line; filter by severity with the --severity flag rather than the ignore file):
# Ignore specific, accepted CVEs
CVE-2021-12345
CVE-2021-67890
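In CI pipelines, Trivy's exit code can gate the build; a minimal sketch:
# Fail the build if any HIGH or CRITICAL vulnerability is found
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest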
Container Runtime Security
# Use Docker Bench Security
git clone https://github.com/docker/docker-bench-security.git
cd docker-bench-security
sudo ./docker-bench-security.sh
# Use Falco for runtime security
docker run -i -t \
--name falco \
--privileged \
-v /var/run/docker.sock:/host/var/run/docker.sock \
-v /dev:/host/dev \
-v /proc:/host/proc:ro \
-v /boot:/host/boot:ro \
-v /lib/modules:/host/lib/modules:ro \
-v /usr:/host/usr:ro \
falcosecurity/falco:latest
# Use Sysdig for monitoring
docker run -d --name sysdig-agent \
--restart always \
--privileged \
--net host \
--pid host \
-e ACCESS_KEY=your-access-key \
-e SECURE=true \
-v /var/run/docker.sock:/host/var/run/docker.sock \
-v /dev:/host/dev \
-v /proc:/host/proc:ro \
-v /boot:/host/boot:ro \
-v /lib/modules:/host/lib/modules:ro \
-v /usr:/host/usr:ro \
sysdig/agent
Secrets Management
Docker Secrets (Swarm Mode)
# Create secret from stdin
echo "mypassword" | docker secret create db_password -
# Create secret from a file
docker secret create ssl_cert cert.pem
# List secrets
docker secret ls
# Use secret in service
docker service create \
--name myapp \
--secret db_password \
--secret ssl_cert \
myapp:latest
# Access secret in container (available at /run/secrets/secret_name)
docker exec container cat /run/secrets/db_password
External Secrets Management
# docker-compose.yml with external secrets
version: '3.8'
services:
app:
image: myapp
environment:
- DB_PASSWORD_FILE=/run/secrets/db_password
- API_KEY_FILE=/run/secrets/api_key
secrets:
- db_password
- api_key
volumes:
- app_data:/data
vault:
image: vault:latest
cap_add:
- IPC_LOCK
environment:
- VAULT_DEV_ROOT_TOKEN_ID=myroot
- VAULT_DEV_LISTEN_ADDRESS=0.0.0.0:8200
ports:
- "8200:8200"
secrets:
db_password:
external: true
api_key:
external: true
volumes:
app_data:
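Applications typically consume these *_FILE variables in an entrypoint script; a minimal sketch (a hypothetical entrypoint.sh):
#!/bin/sh
# Load file-based secrets into environment variables at startup
if [ -n "$DB_PASSWORD_FILE" ]; then
DB_PASSWORD="$(cat "$DB_PASSWORD_FILE")"
export DB_PASSWORD
fi
exec "$@"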
Monitoring and Logging Setup
Security Monitoring
# docker-compose.security-monitoring.yml
version: '3.8'
services:
# Security event collector
falco:
image: falcosecurity/falco:latest
privileged: true
volumes:
- /var/run/docker.sock:/host/var/run/docker.sock
- /dev:/host/dev
- /proc:/host/proc:ro
- /boot:/host/boot:ro
- /lib/modules:/host/lib/modules:ro
- /usr:/host/usr:ro
- ./falco/falco.yaml:/etc/falco/falco.yaml:ro
environment:
- FALCO_GRPC_ENABLED=true
- FALCO_GRPC_BIND_ADDRESS=0.0.0.0:5060
# Log aggregation
fluentd:
image: fluent/fluentd:latest
volumes:
- ./fluentd/fluent.conf:/fluentd/etc/fluent.conf:ro
- /var/log:/var/log:ro
ports:
- "24224:24224"
# Metrics collection
node-exporter:
image: prom/node-exporter:latest
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- '--path.procfs=/host/proc'
- '--path.sysfs=/host/sys'
- '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
# Container metrics
cadvisor:
image: gcr.io/cadvisor/cadvisor:latest
volumes:
- /:/rootfs:ro
- /var/run:/var/run:rw
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
ports:
- "8080:8080"
Performance Monitoring
# Monitor container performance
docker stats
# Detailed container inspection
docker exec container top
docker exec container ps aux
docker exec container netstat -tlnp
docker exec container iostat -x 1
# System-wide monitoring
htop
iotop
nethogs
iftop
# Docker system information
docker system df
docker system events
docker system info
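For scripted checks, docker stats also supports one-shot, formatted output:
# Single snapshot of per-container CPU, memory, and PID counts
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.PIDs}}"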
Compliance and Governance
CIS Docker Benchmark
Key security controls:
1. Host Configuration
- Keep Docker up to date
- Use trusted users only
- Audit the Docker daemon and its files (see the auditd sketch after this list)
- Set up proper logging
2. Docker Daemon Configuration
- Restrict network traffic between containers
- Set logging level to ‘info’
- Allow Docker to make changes to iptables
- Do not use insecure registries
3. Docker Daemon Configuration Files
- Verify ownership and permissions
- Secure Docker socket
- Protect Docker daemon configuration
4. Container Images and Build Files
- Create user for containers
- Use trusted base images
- Do not install unnecessary packages
- Scan images for vulnerabilities
5. Container Runtime
- Do not share host’s network namespace
- Limit memory usage
- Set container CPU priority appropriately
- Mount container’s root filesystem as read-only
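The audit controls map to auditd rules on the host; a sketch of the rules the benchmark recommends (paths vary by distribution):
# /etc/audit/rules.d/docker.rules — load with: sudo augenrules --load
-w /usr/bin/dockerd -k docker
-w /var/lib/docker -k docker
-w /etc/docker -k docker
-w /usr/lib/systemd/system/docker.service -k docker
-w /var/run/docker.sock -k docker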
Automated Compliance Checking
#!/bin/bash
# docker-security-audit.sh
echo "Docker Security Audit Report"
echo "============================"
echo "Date: $(date)"
echo
# Check Docker version
echo "1. Docker Version Check"
docker --version
echo
# Check running containers security
echo "2. Container Security Analysis"
for container in $(docker ps -q); do
echo "Container: $container"
# Check if running as root
user=$(docker inspect --format='{{.Config.User}}' $container)
if [ -z "$user" ]; then
echo " WARNING: Container running as root"
else
echo " OK: Running as user $user"
fi
# Check privileged mode
privileged=$(docker inspect --format='{{.HostConfig.Privileged}}' $container)
if [ "$privileged" = "true" ]; then
echo " WARNING: Container running in privileged mode"
else
echo " OK: Not running in privileged mode"
fi
# Check read-only root filesystem
rootfs_ro=$(docker inspect --format='{{.HostConfig.ReadonlyRootfs}}' "$container") # avoid clashing with the readonly builtin
if [ "$rootfs_ro" = "false" ]; then
echo " WARNING: Root filesystem is writable"
else
echo " OK: Root filesystem is read-only"
fi
echo
done
# Check image vulnerabilities
echo "3. Image Vulnerability Scan"
for image in $(docker images --format "{{.Repository}}:{{.Tag}}" | grep -v "<none>"); do
echo "Scanning $image..."
trivy image --severity HIGH,CRITICAL --quiet $image
done
echo "Audit completed."
Summary
In this introduction, you’ve learned:
Security Fundamentals
- Security Model: Understanding Docker’s security architecture and isolation mechanisms
- Container Hardening: Secure Dockerfile practices and runtime security configuration
- Vulnerability Scanning: Using Trivy and other tools for security assessment
- Secrets Management: Proper handling of sensitive data in containers
Performance Basics
- Resource Management: CPU, memory, and I/O optimization techniques
- Docker Daemon Tuning: Configuration for optimal performance
- Monitoring Setup: Tools and techniques for performance monitoring
Operational Security
- Compliance: CIS Docker Benchmark and security controls
- Monitoring: Security event collection and analysis
- Governance: Automated compliance checking and reporting
Key Concepts Mastered
- Defense in Depth: Multiple layers of security controls
- Least Privilege: Running containers with minimal permissions
- Resource Limits: Preventing resource exhaustion attacks
- Continuous Monitoring: Real-time security and performance monitoring
Next Steps: Part 2 explores core security concepts including advanced hardening techniques, performance optimization strategies, and comprehensive monitoring solutions that form the foundation of production-ready Docker deployments.
Core Concepts and Fundamentals
Core Security and Optimization Concepts
This section explores advanced security hardening techniques, performance optimization strategies, and comprehensive monitoring solutions essential for production Docker deployments.
Advanced Security Hardening
Kernel Security Features
# Enable AppArmor profile
docker run --security-opt apparmor:docker-default nginx
# Custom AppArmor profile
docker run --security-opt apparmor:custom-profile myapp
# SELinux context
docker run --security-opt label:type:container_t \
--security-opt label:level:s0:c100,c200 \
myapp
# Custom seccomp profile
docker run --security-opt seccomp:./custom-seccomp.json myapp
Custom seccomp profile:
{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": ["SCMP_ARCH_X86_64"],
"syscalls": [
{
"names": [
"accept", "accept4", "access", "adjtimex", "alarm", "bind", "brk",
"capget", "capset", "chdir", "chmod", "chown", "chroot", "clock_getres",
"clock_gettime", "clock_nanosleep", "close", "connect", "copy_file_range",
"creat", "dup", "dup2", "dup3", "epoll_create", "epoll_create1",
"epoll_ctl", "epoll_pwait", "epoll_wait", "eventfd", "eventfd2",
"execve", "execveat", "exit", "exit_group", "faccessat", "fadvise64",
"fallocate", "fanotify_mark", "fchdir", "fchmod", "fchmodat", "fchown",
"fchownat", "fcntl", "fdatasync", "fgetxattr", "flistxattr", "flock",
"fork", "fremovexattr", "fsetxattr", "fstat", "fstatfs", "fsync",
"ftruncate", "futex", "getcwd", "getdents", "getdents64", "getegid",
"geteuid", "getgid", "getgroups", "getpeername", "getpgrp", "getpid",
"getppid", "getpriority", "getrandom", "getresgid", "getresuid",
"getrlimit", "get_robust_list", "getrusage", "getsid", "getsockname",
"getsockopt", "get_thread_area", "gettid", "gettimeofday", "getuid",
"getxattr", "inotify_add_watch", "inotify_init", "inotify_init1",
"inotify_rm_watch", "io_cancel", "ioctl", "io_destroy", "io_getevents",
"ioprio_get", "ioprio_set", "io_setup", "io_submit", "ipc", "kill",
"lchown", "lgetxattr", "link", "linkat", "listen", "listxattr",
"llistxattr", "lremovexattr", "lseek", "lsetxattr", "lstat", "madvise",
"memfd_create", "mincore", "mkdir", "mkdirat", "mknod", "mknodat",
"mlock", "mlock2", "mlockall", "mmap", "mount", "mprotect", "mq_getsetattr",
"mq_notify", "mq_open", "mq_timedreceive", "mq_timedsend", "mq_unlink",
"mremap", "msgctl", "msgget", "msgrcv", "msgsnd", "msync", "munlock",
"munlockall", "munmap", "nanosleep", "newfstatat", "open", "openat",
"pause", "pipe", "pipe2", "poll", "ppoll", "prctl", "pread64", "preadv",
"prlimit64", "pselect6", "ptrace", "pwrite64", "pwritev", "read",
"readahead", "readlink", "readlinkat", "readv", "recv", "recvfrom",
"recvmmsg", "recvmsg", "remap_file_pages", "removexattr", "rename",
"renameat", "renameat2", "restart_syscall", "rmdir", "rt_sigaction",
"rt_sigpending", "rt_sigprocmask", "rt_sigqueueinfo", "rt_sigreturn",
"rt_sigsuspend", "rt_sigtimedwait", "rt_tgsigqueueinfo", "sched_getaffinity",
"sched_getattr", "sched_getparam", "sched_get_priority_max",
"sched_get_priority_min", "sched_getscheduler", "sched_rr_get_interval",
"sched_setaffinity", "sched_setattr", "sched_setparam", "sched_setscheduler",
"sched_yield", "seccomp", "select", "semctl", "semget", "semop",
"semtimedop", "send", "sendfile", "sendmmsg", "sendmsg", "sendto",
"setfsgid", "setfsuid", "setgid", "setgroups", "setitimer", "setpgid",
"setpriority", "setregid", "setresgid", "setresuid", "setreuid",
"setrlimit", "set_robust_list", "setsid", "setsockopt", "set_thread_area",
"set_tid_address", "setuid", "setxattr", "shmat", "shmctl", "shmdt",
"shmget", "shutdown", "sigaltstack", "signalfd", "signalfd4", "sigreturn",
"socket", "socketcall", "socketpair", "splice", "stat", "statfs",
"symlink", "symlinkat", "sync", "sync_file_range", "syncfs", "sysinfo",
"tee", "tgkill", "time", "timer_create", "timer_delete", "timerfd_create",
"timerfd_gettime", "timerfd_settime", "timer_getoverrun", "timer_gettime",
"timer_settime", "times", "tkill", "truncate", "umask", "uname",
"unlink", "unlinkat", "utime", "utimensat", "utimes", "vfork", "vmsplice",
"wait4", "waitid", "waitpid", "write", "writev"
],
"action": "SCMP_ACT_ALLOW"
}
]
}
Container Image Security
# Multi-stage secure build
FROM node:16-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
FROM gcr.io/distroless/nodejs16-debian11 AS production
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --chown=nonroot:nonroot . .
USER nonroot
EXPOSE 3000
CMD ["server.js"]
Runtime Security Policies
# Pod Security Policy (Kubernetes; removed in v1.25 — shown for legacy clusters, see the Pod Security admission note below)
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: restricted
spec:
privileged: false
allowPrivilegeEscalation: false
requiredDropCapabilities:
- ALL
volumes:
- 'configMap'
- 'emptyDir'
- 'projected'
- 'secret'
- 'downwardAPI'
- 'persistentVolumeClaim'
runAsUser:
rule: 'MustRunAsNonRoot'
seLinux:
rule: 'RunAsAny'
fsGroup:
rule: 'RunAsAny'
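On current clusters the equivalent restriction is Pod Security admission, enabled per namespace with a label:
# Enforce the "restricted" Pod Security Standard on a namespace
kubectl label namespace production pod-security.kubernetes.io/enforce=restricted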
Performance Optimization Strategies
CPU and Memory Optimization
# CPU affinity and scheduling (the --cpu-rt-* flags require a daemon started with a real-time runtime budget)
docker run -d \
--cpuset-cpus="0,1" \
--cpu-shares=1024 \
--cpu-quota=50000 \
--cpu-period=100000 \
--cpu-rt-period=1000000 \
--cpu-rt-runtime=950000 \
myapp
# Memory optimization (--memory-swappiness applies to cgroup v1 only; --kernel-memory is deprecated and dropped here)
docker run -d \
--memory=2g \
--memory-swap=4g \
--memory-reservation=1g \
--memory-swappiness=10 \
--oom-kill-disable=false \
myapp
# NUMA optimization
docker run -d \
--cpuset-mems="0" \
--memory=4g \
myapp
Storage Performance Tuning
# docker-compose.yml with optimized storage
version: '3.8'
services:
database:
image: postgres:14
volumes:
# Separate data and WAL volumes
- type: volume
source: postgres-data
target: /var/lib/postgresql/data
volume:
driver: local
driver_opts:
type: ext4
o: noatime,nodiratime
- type: volume
source: postgres-wal
target: /var/lib/postgresql/wal
volume:
driver: local
driver_opts:
type: ext4
o: noatime,sync
# tmpfs for temporary operations
- type: tmpfs
target: /tmp
tmpfs:
size: 1G
mode: 1777
# Shared memory for PostgreSQL
- type: tmpfs
target: /dev/shm
tmpfs:
size: 2G
environment:
- POSTGRES_INITDB_WALDIR=/var/lib/postgresql/wal
command: |
postgres
-c shared_buffers=1GB
-c effective_cache_size=3GB
-c maintenance_work_mem=256MB
-c checkpoint_completion_target=0.9
-c wal_buffers=16MB
-c default_statistics_target=100
-c random_page_cost=1.1
-c effective_io_concurrency=200
volumes:
postgres-data:
postgres-wal:
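Note: with the local volume driver, type/o driver_opts also need a device entry pointing at a real block device or export; without one the volume fails to mount. A sketch (the device path is a placeholder):
docker volume create --driver local \
--opt type=ext4 --opt device=/dev/xvdf \
--opt o=noatime,nodiratime postgres-data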
Network Performance Optimization
# Network performance tuning (only network-namespaced net.* sysctls apply per container; global knobs such as netdev_* must be set on the host)
docker run -d \
--sysctl net.core.somaxconn=65535 \
--sysctl net.ipv4.tcp_max_syn_backlog=65535 \
--sysctl net.core.rmem_max=134217728 \
--sysctl net.core.wmem_max=134217728 \
--sysctl net.ipv4.tcp_rmem="4096 65536 134217728" \
--sysctl net.ipv4.tcp_wmem="4096 65536 134217728" \
--sysctl net.ipv4.tcp_congestion_control=bbr \
myapp
# Container with optimized networking (per-container net.* sysctls are not allowed
# in host network mode; tune the host instead, as shown below)
docker run -d \
--network=host \
--ulimit nofile=65536:65536 \
high-performance-app
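The global (non-namespaced) parameters belong on the host, where they apply to every container:
# Host-level network tuning (persist in /etc/sysctl.d/99-docker-net.conf)
sudo sysctl -w net.core.netdev_max_backlog=5000
sudo sysctl -w net.core.netdev_budget=600
sudo sysctl -w net.core.rmem_max=134217728
sudo sysctl -w net.core.wmem_max=134217728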
Comprehensive Monitoring Solutions
Security Monitoring Stack
# docker-compose.security-monitoring.yml
version: '3.8'
services:
# Falco for runtime security
falco:
image: falcosecurity/falco:latest
privileged: true
volumes:
- /var/run/docker.sock:/host/var/run/docker.sock
- /dev:/host/dev
- /proc:/host/proc:ro
- /boot:/host/boot:ro
- /lib/modules:/host/lib/modules:ro
- /usr:/host/usr:ro
- ./falco/falco_rules.yaml:/etc/falco/falco_rules.local.yaml:ro
environment:
- FALCO_GRPC_ENABLED=true
- FALCO_GRPC_BIND_ADDRESS=0.0.0.0:5060
# OSSEC for host intrusion detection
ossec:
image: wazuh/wazuh:latest
volumes:
- ossec-data:/var/ossec/data
- ./ossec/ossec.conf:/var/ossec/etc/ossec.conf:ro
- /var/log:/host/var/log:ro
ports:
- "1514:1514/udp"
- "1515:1515"
# Suricata for network intrusion detection
suricata:
image: jasonish/suricata:latest
network_mode: host
cap_add:
- NET_ADMIN
- SYS_NICE
volumes:
- suricata-logs:/var/log/suricata
- ./suricata/suricata.yaml:/etc/suricata/suricata.yaml:ro
command: -i eth0
# ELK Stack for log analysis
elasticsearch:
image: elasticsearch:7.17.0
environment:
- discovery.type=single-node
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
volumes:
- elasticsearch-data:/usr/share/elasticsearch/data
logstash:
image: logstash:7.17.0
volumes:
- ./logstash/pipeline:/usr/share/logstash/pipeline:ro
- ./logstash/config:/usr/share/logstash/config:ro
depends_on:
- elasticsearch
kibana:
image: kibana:7.17.0
ports:
- "5601:5601"
environment:
- ELASTICSEARCH_HOSTS=http://elasticsearch:9200
depends_on:
- elasticsearch
volumes:
ossec-data:
suricata-logs:
elasticsearch-data:
Performance Monitoring
# docker-compose.performance-monitoring.yml
version: '3.8'
services:
# Prometheus for metrics collection
prometheus:
image: prom/prometheus:latest
ports:
- "9090:9090"
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ./prometheus/rules:/etc/prometheus/rules:ro
- prometheus-data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.retention.time=30d'
- '--web.enable-lifecycle'
# Node Exporter for host metrics
node-exporter:
image: prom/node-exporter:latest
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- '--path.procfs=/host/proc'
- '--path.sysfs=/host/sys'
- '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
# cAdvisor for container metrics
cadvisor:
image: gcr.io/cadvisor/cadvisor:latest
volumes:
- /:/rootfs:ro
- /var/run:/var/run:rw
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
ports:
- "8080:8080"
# Grafana for visualization
grafana:
image: grafana/grafana:latest
ports:
- "3000:3000"
volumes:
- grafana-data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning:ro
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
# Jaeger for distributed tracing
jaeger:
image: jaegertracing/all-in-one:latest
ports:
- "16686:16686"
- "14268:14268"
environment:
- COLLECTOR_ZIPKIN_HTTP_PORT=9411
volumes:
prometheus-data:
grafana-data:
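Bringing the stack up and checking it is healthy (ports as defined above):
docker compose -f docker-compose.performance-monitoring.yml up -d
# Prometheus: http://localhost:9090  Grafana: http://localhost:3000 (admin/admin)
curl -s http://localhost:9090/-/healthy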
Advanced Security Patterns
Zero-Trust Container Security
#!/usr/bin/env python3
# zero-trust-enforcer.py
import docker
import json
from typing import Dict, List
class ZeroTrustEnforcer:
def __init__(self):
self.client = docker.from_env()
self.policies = self.load_policies()
def load_policies(self) -> Dict:
return {
"network_policies": {
"default_deny": True,
"allowed_connections": [
{"from": "web-tier", "to": "app-tier", "port": 8080},
{"from": "app-tier", "to": "db-tier", "port": 5432}
]
},
"container_policies": {
"required_labels": ["security.level", "security.owner"],
"forbidden_capabilities": ["SYS_ADMIN", "NET_ADMIN"],
"required_user": "nonroot"
}
}
def enforce_container_policies(self):
"""Enforce security policies on running containers"""
for container in self.client.containers.list():
violations = self.check_container_compliance(container)
if violations:
self.handle_violations(container, violations)
def check_container_compliance(self, container) -> List[str]:
"""Check container against security policies"""
violations = []
# Check required labels
for label in self.policies["container_policies"]["required_labels"]:
if label not in container.labels:
violations.append(f"Missing required label: {label}")
# Check user
config = container.attrs["Config"]
if not config.get("User"):
violations.append("Container running as root")
# Check capabilities
host_config = container.attrs["HostConfig"]
cap_add = host_config.get("CapAdd", [])
forbidden_caps = self.policies["container_policies"]["forbidden_capabilities"]
for cap in cap_add:
if cap in forbidden_caps:
violations.append(f"Forbidden capability: {cap}")
return violations
def handle_violations(self, container, violations: List[str]):
"""Handle policy violations"""
print(f"Policy violations in container {container.name}:")
for violation in violations:
print(f" - {violation}")
# Log violation
self.log_security_event({
"container_id": container.id,
"container_name": container.name,
"violations": violations,
"action": "quarantine"
})
# Quarantine container (stop networking)
self.quarantine_container(container)
def quarantine_container(self, container):
"""Isolate container by removing from networks"""
networks = container.attrs["NetworkSettings"]["Networks"]
for network_name in networks:
if network_name != "none":
network = self.client.networks.get(network_name)
network.disconnect(container)
# Connect to quarantine network
quarantine_net = self.get_or_create_quarantine_network()
quarantine_net.connect(container)
def get_or_create_quarantine_network(self):
"""Get or create quarantine network"""
try:
return self.client.networks.get("quarantine")
except docker.errors.NotFound:
return self.client.networks.create(
"quarantine",
driver="bridge",
internal=True,
labels={"purpose": "security-quarantine"}
)
def log_security_event(self, event: Dict):
"""Log security events"""
with open("/var/log/security-events.json", "a") as f:
json.dump(event, f)
f.write("\n")
if __name__ == "__main__":
enforcer = ZeroTrustEnforcer()
enforcer.enforce_container_policies()
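The enforcer is designed for periodic runs; a sketch of scheduling it (the install path is an assumption):
pip install docker
# Run once by hand, or schedule it; e.g. every 5 minutes via cron:
# */5 * * * * /usr/bin/python3 /opt/security/zero-trust-enforcer.py
python3 zero-trust-enforcer.py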
Summary
This section covered core security and optimization concepts:
Advanced Security
- Kernel Security: AppArmor, SELinux, and seccomp profile configuration
- Image Security: Distroless images and multi-stage secure builds
- Runtime Policies: Pod Security Policies and runtime enforcement
- Zero-Trust: Automated policy enforcement and violation handling
Performance Optimization
- Resource Tuning: Advanced CPU, memory, and NUMA optimization
- Storage Performance: Optimized volume configurations and database tuning
- Network Optimization: Kernel parameter tuning and high-performance networking
Monitoring Excellence
- Security Monitoring: Falco, OSSEC, and Suricata integration
- Performance Monitoring: Comprehensive metrics collection with Prometheus and Grafana
- Distributed Tracing: Jaeger for application performance monitoring
Operational Patterns
- Policy Enforcement: Automated compliance checking and remediation
- Incident Response: Security event logging and container quarantine
- Continuous Monitoring: Real-time security and performance assessment
Next Steps: Part 3 explores practical applications including real-world security implementations, performance optimization case studies, and enterprise monitoring deployments.
Practical Applications and Examples
Practical Security and Optimization Applications
This section demonstrates real-world Docker security implementations, performance optimization case studies, and enterprise monitoring deployments across various scenarios.
Enterprise Security Implementation
Financial Services Security Stack
# docker-compose.financial-security.yml
version: '3.8'
services:
# Web Application Firewall
waf:
image: owasp/modsecurity-crs:nginx
ports:
- "80:80"
- "443:443"
volumes:
- ./waf/nginx.conf:/etc/nginx/nginx.conf:ro
- ./waf/modsecurity.conf:/etc/modsecurity/modsecurity.conf:ro
- ./waf/crs-setup.conf:/etc/modsecurity/crs/crs-setup.conf:ro
- waf-logs:/var/log/nginx
networks:
- dmz
environment:
- PARANOIA=2
- ANOMALY_INBOUND=5
- ANOMALY_OUTBOUND=4
# Application with strict security
trading-app:
build: ./trading-app
networks:
- app-tier
volumes:
- trading-data:/app/data:ro
- audit-logs:/app/logs
environment:
- ENCRYPTION_KEY_FILE=/run/secrets/encryption_key
- DATABASE_URL_FILE=/run/secrets/db_connection
- AUDIT_ENABLED=true
- COMPLIANCE_MODE=PCI_DSS
secrets:
- encryption_key
- db_connection
security_opt:
- no-new-privileges:true
- apparmor:trading-app-profile
cap_drop:
- ALL
cap_add:
- CHOWN
- SETGID
- SETUID
read_only: true
tmpfs:
- /tmp:rw,noexec,nosuid,size=100m
user: "1000:1000"
deploy:
resources:
limits:
memory: 2G
cpus: '1.0'
reservations:
memory: 1G
cpus: '0.5'
# Secure Database
secure-db:
image: postgres:14
networks:
- db-tier
volumes:
- secure-db-data:/var/lib/postgresql/data
- secure-db-config:/etc/postgresql:ro
environment:
- POSTGRES_DB=trading
- POSTGRES_USER=trading_user
- POSTGRES_PASSWORD_FILE=/run/secrets/db_password
- POSTGRES_SSL_MODE=require
secrets:
- db_password
command: |
postgres
-c ssl=on
-c ssl_cert_file=/etc/ssl/certs/server.crt
-c ssl_key_file=/etc/ssl/private/server.key
-c log_statement=all
-c log_connections=on
-c log_disconnections=on
-c log_checkpoints=on
-c log_lock_waits=on
# HSM (Hardware Security Module) Simulator
hsm:
image: softhsm:latest
networks:
- security-tier
volumes:
- hsm-data:/var/lib/softhsm
environment:
- SOFTHSM2_CONF=/etc/softhsm2.conf
security_opt:
- no-new-privileges:true
cap_drop:
- ALL
# Compliance Scanner
compliance-scanner:
build: ./compliance-scanner
networks:
- monitoring
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- compliance-reports:/reports
environment:
- SCAN_SCHEDULE=0 */6 * * *
- COMPLIANCE_STANDARDS=PCI_DSS,SOX,GDPR
- ALERT_WEBHOOK=${COMPLIANCE_WEBHOOK_URL}
networks:
dmz:
driver: bridge
driver_opts:
com.docker.network.bridge.enable_icc: "false"
app-tier:
driver: bridge
internal: true
db-tier:
driver: bridge
internal: true
security-tier:
driver: bridge
internal: true
monitoring:
driver: bridge
volumes:
waf-logs:
trading-data:
audit-logs:
secure-db-data:
driver: local
driver_opts:
type: ext4
o: noatime,nodev,nosuid
secure-db-config:
hsm-data:
compliance-reports:
secrets:
encryption_key:
external: true
db_connection:
external: true
db_password:
external: true
High-Performance Computing Optimization
Scientific Computing Stack
# docker-compose.hpc-optimized.yml
version: '3.8'
services:
# Compute Node with GPU Support
compute-node:
build: ./compute-node
runtime: nvidia
environment:
- NVIDIA_VISIBLE_DEVICES=all
- CUDA_VISIBLE_DEVICES=0,1
- OMP_NUM_THREADS=16
- MKL_NUM_THREADS=16
volumes:
- compute-data:/data
- compute-scratch:/scratch
- type: tmpfs
target: /tmp
tmpfs:
size: 8G
networks:
- compute-network
deploy:
resources:
limits:
memory: 32G
cpus: '16.0'
reservations:
memory: 16G
cpus: '8.0'
sysctls:
- kernel.shmmax=68719476736
- kernel.shmall=4294967296
ulimits:
memlock:
soft: -1
hard: -1
stack:
soft: 67108864
hard: 67108864
# High-Performance Storage
storage-node:
image: gluster/gluster-centos
privileged: true
networks:
- storage-network
volumes:
- gluster-data:/data
- /sys/fs/cgroup:/sys/fs/cgroup:ro
environment:
- GLUSTER_VOLUME_NAME=compute-volume
- GLUSTER_REPLICA_COUNT=3
# Message Passing Interface (MPI) Coordinator
mpi-coordinator:
build: ./mpi-coordinator
networks:
- compute-network
volumes:
- mpi-config:/etc/mpi
environment:
- MPI_HOSTS=compute-node-1,compute-node-2,compute-node-3
- MPI_SLOTS_PER_HOST=16
command: |
mpirun --allow-run-as-root \
--hostfile /etc/mpi/hostfile \
--np 48 \
--map-by node \
--bind-to core \
/app/compute-job
# Performance Monitor
perf-monitor:
build: ./perf-monitor
privileged: true
networks:
- monitoring
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- perf-data:/data
environment:
- MONITOR_INTERVAL=1
- METRICS=cpu,memory,network,gpu,storage
command: |
sh -c "
while true; do
perf stat -a -e cycles,instructions,cache-misses,branch-misses sleep 1
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits
iostat -x 1 1
done
"
networks:
compute-network:
driver: bridge
driver_opts:
com.docker.network.bridge.name: compute-br
com.docker.network.mtu: 9000
storage-network:
driver: bridge
driver_opts:
com.docker.network.mtu: 9000
monitoring:
driver: bridge
volumes:
compute-data:
driver: local
driver_opts:
type: ext4
o: noatime,nodiratime,data=writeback
compute-scratch:
driver: local
driver_opts:
type: tmpfs
device: tmpfs
o: size=16G,noatime
gluster-data:
mpi-config:
perf-data:
Real-Time Security Monitoring
Security Operations Center (SOC)
#!/usr/bin/env python3
# soc-monitor.py
import asyncio
import docker
import json
import logging
import websockets
from datetime import datetime
from typing import Dict, List
import aioredis
class SOCMonitor:
def __init__(self):
self.docker_client = docker.from_env()
self.redis = None
self.websocket_clients = set()
self.security_rules = self.load_security_rules()
async def initialize(self):
"""Initialize async components"""
self.redis = await aioredis.from_url("redis://localhost:6379")
def load_security_rules(self) -> Dict:
"""Load security detection rules"""
return {
"suspicious_processes": [
"nc", "netcat", "nmap", "wget", "curl", "python", "perl", "ruby"
],
"suspicious_network": [
{"port": 22, "protocol": "tcp", "direction": "outbound"},
{"port": 3389, "protocol": "tcp", "direction": "outbound"},
{"port": 4444, "protocol": "tcp", "direction": "any"}
],
"file_integrity": [
"/etc/passwd", "/etc/shadow", "/etc/hosts", "/etc/crontab"
],
"resource_thresholds": {
"cpu_percent": 90,
"memory_percent": 95,
"network_connections": 1000
}
}
async def monitor_containers(self):
"""Monitor container security events"""
while True:
try:
for container in self.docker_client.containers.list():
await self.analyze_container_security(container)
await asyncio.sleep(5)
except Exception as e:
logging.error(f"Container monitoring error: {e}")
await asyncio.sleep(10)
async def analyze_container_security(self, container):
"""Analyze individual container for security issues"""
try:
# Check running processes
processes = await self.get_container_processes(container)
await self.check_suspicious_processes(container, processes)
# Check network connections
connections = await self.get_network_connections(container)
await self.check_suspicious_network(container, connections)
# Check resource usage
stats = container.stats(stream=False)
await self.check_resource_anomalies(container, stats)
# Check file integrity
await self.check_file_integrity(container)
except Exception as e:
logging.error(f"Error analyzing container {container.name}: {e}")
async def get_container_processes(self, container) -> List[Dict]:
"""Get running processes in container"""
try:
result = container.exec_run("ps aux", demux=True)
if result.exit_code == 0:
lines = result.output[0].decode().strip().split('\n')[1:] # Skip header
processes = []
for line in lines:
parts = line.split(None, 10)
if len(parts) >= 11:
processes.append({
'user': parts[0],
'pid': parts[1],
'cpu': parts[2],
'mem': parts[3],
'command': parts[10]
})
return processes
except Exception as e:
logging.error(f"Failed to get processes for {container.name}: {e}")
return []
async def check_suspicious_processes(self, container, processes: List[Dict]):
"""Check for suspicious processes"""
for process in processes:
command = process['command'].lower()
for suspicious_cmd in self.security_rules["suspicious_processes"]:
if suspicious_cmd in command:
await self.create_security_alert({
'type': 'suspicious_process',
'severity': 'medium',
'container': container.name,
'process': process,
'description': f"Suspicious process detected: {suspicious_cmd}"
})
async def get_network_connections(self, container) -> List[Dict]:
"""Get network connections from container"""
try:
result = container.exec_run("netstat -tuln", demux=True)
if result.exit_code == 0:
lines = result.output[0].decode().strip().split('\n')
connections = []
for line in lines:
if 'LISTEN' in line or 'ESTABLISHED' in line:
parts = line.split()
if len(parts) >= 4:
connections.append({
'protocol': parts[0],
'local_address': parts[3],
'state': parts[5] if len(parts) > 5 else 'UNKNOWN'
})
return connections
except Exception as e:
logging.error(f"Failed to get connections for {container.name}: {e}")
return []
async def check_suspicious_network(self, container, connections: List[Dict]):
"""Check for suspicious network activity"""
for conn in connections:
local_addr = conn['local_address']
if ':' in local_addr:
port = int(local_addr.split(':')[-1])
for rule in self.security_rules["suspicious_network"]:
if port == rule['port']:
await self.create_security_alert({
'type': 'suspicious_network',
'severity': 'high',
'container': container.name,
'connection': conn,
'description': f"Suspicious network activity on port {port}"
})
async def check_resource_anomalies(self, container, stats: Dict):
"""Check for resource usage anomalies"""
try:
# Calculate CPU percentage
cpu_delta = stats['cpu_stats']['cpu_usage']['total_usage'] - \
stats['precpu_stats']['cpu_usage']['total_usage']
system_delta = stats['cpu_stats']['system_cpu_usage'] - \
stats['precpu_stats']['system_cpu_usage']
cpu_percent = (cpu_delta / system_delta) * 100.0 if system_delta > 0 else 0.0
# Calculate memory percentage
memory_usage = stats['memory_stats']['usage']
memory_limit = stats['memory_stats']['limit']
memory_percent = (memory_usage / memory_limit) * 100.0
# Check thresholds
thresholds = self.security_rules["resource_thresholds"]
if cpu_percent > thresholds["cpu_percent"]:
await self.create_security_alert({
'type': 'resource_anomaly',
'severity': 'medium',
'container': container.name,
'metric': 'cpu',
'value': cpu_percent,
'threshold': thresholds["cpu_percent"],
'description': f"High CPU usage: {cpu_percent:.2f}%"
})
if memory_percent > thresholds["memory_percent"]:
await self.create_security_alert({
'type': 'resource_anomaly',
'severity': 'high',
'container': container.name,
'metric': 'memory',
'value': memory_percent,
'threshold': thresholds["memory_percent"],
'description': f"High memory usage: {memory_percent:.2f}%"
})
except Exception as e:
logging.error(f"Error checking resource anomalies: {e}")
async def check_file_integrity(self, container):
"""Check file integrity for critical files"""
for file_path in self.security_rules["file_integrity"]:
try:
# Get file hash
result = container.exec_run(f"sha256sum {file_path}", demux=True)
if result.exit_code == 0:
current_hash = result.output[0].decode().split()[0]
# Check against stored hash
stored_hash = await self.redis.get(f"hash:{container.name}:{file_path}")
if stored_hash:
if current_hash != stored_hash.decode():
await self.create_security_alert({
'type': 'file_integrity',
'severity': 'critical',
'container': container.name,
'file': file_path,
'description': f"File integrity violation: {file_path}"
})
else:
# Store initial hash
await self.redis.set(f"hash:{container.name}:{file_path}", current_hash)
except Exception as e:
logging.debug(f"File integrity check failed for {file_path}: {e}")
async def create_security_alert(self, alert: Dict):
"""Create and distribute security alert"""
alert['timestamp'] = datetime.now().isoformat()
alert['id'] = f"alert_{int(datetime.now().timestamp())}"
# Store in Redis
await self.redis.lpush("security_alerts", json.dumps(alert))
await self.redis.ltrim("security_alerts", 0, 999) # Keep last 1000 alerts
# Log alert
logging.warning(f"Security Alert: {alert}")
# Send to WebSocket clients
await self.broadcast_alert(alert)
# Trigger automated response if critical
if alert['severity'] == 'critical':
await self.trigger_incident_response(alert)
async def broadcast_alert(self, alert: Dict):
"""Broadcast alert to WebSocket clients"""
if self.websocket_clients:
message = json.dumps(alert)
await asyncio.gather(
*[client.send(message) for client in self.websocket_clients],
return_exceptions=True
)
async def trigger_incident_response(self, alert: Dict):
"""Trigger automated incident response"""
container_name = alert.get('container')
if container_name:
try:
container = self.docker_client.containers.get(container_name)
# Isolate container (disconnect from networks except monitoring)
networks = container.attrs['NetworkSettings']['Networks']
for network_name in networks:
if network_name != 'monitoring':
network = self.docker_client.networks.get(network_name)
network.disconnect(container)
logging.critical(f"Container {container_name} isolated due to critical alert")
except Exception as e:
logging.error(f"Failed to isolate container {container_name}: {e}")
async def websocket_handler(self, websocket, path):
"""Handle WebSocket connections for real-time alerts"""
self.websocket_clients.add(websocket)
try:
# Send recent alerts
recent_alerts = await self.redis.lrange("security_alerts", 0, 49)
for alert_json in recent_alerts:
await websocket.send(alert_json.decode())
# Keep connection alive
await websocket.wait_closed()
finally:
self.websocket_clients.remove(websocket)
async def run(self):
"""Run the SOC monitor"""
await self.initialize()
# Start monitoring tasks
monitor_task = asyncio.create_task(self.monitor_containers())
# Start WebSocket server
websocket_server = await websockets.serve(
self.websocket_handler, "localhost", 8765
)
logging.info("SOC Monitor started")
try:
await asyncio.gather(monitor_task)
except KeyboardInterrupt:
logging.info("SOC Monitor stopped")
finally:
websocket_server.close()
await websocket_server.wait_closed()
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
monitor = SOCMonitor()
asyncio.run(monitor.run())
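Running the monitor and streaming alerts over its WebSocket feed (dependencies inferred from the imports above):
pip install docker aioredis websockets
python3 soc-monitor.py
# In another terminal, stream alerts interactively
python3 -m websockets ws://localhost:8765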
Performance Optimization Case Studies
E-Commerce Platform Optimization
# docker-compose.ecommerce-optimized.yml
version: '3.8'
services:
# Load Balancer with Connection Pooling
haproxy:
image: haproxy:2.6
ports:
- "80:80"
- "443:443"
volumes:
- ./haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
networks:
- frontend
sysctls:
- net.core.somaxconn=65535
- net.ipv4.tcp_max_syn_backlog=65535
deploy:
resources:
limits:
memory: 1G
cpus: '2.0'
# Web Frontend with Caching
web:
build: ./web-optimized
networks:
- frontend
- backend
volumes:
- web-cache:/var/cache/nginx
- type: tmpfs
target: /tmp
tmpfs:
size: 512M
environment:
- NGINX_WORKER_PROCESSES=auto
- NGINX_WORKER_CONNECTIONS=4096
sysctls:
- net.core.rmem_max=134217728
- net.core.wmem_max=134217728
deploy:
replicas: 4
resources:
limits:
memory: 512M
cpus: '1.0'
# Application Server with JVM Tuning
app:
build: ./app-optimized
networks:
- backend
environment:
- JAVA_OPTS=-Xms2g -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=200
# -XX:+UseCGroupMemoryLimitForHeap was removed in JDK 10+; modern JVMs honor container limits by default
volumes:
- app-logs:/app/logs
deploy:
replicas: 6
resources:
limits:
memory: 6G
cpus: '2.0'
reservations:
memory: 4G
cpus: '1.0'
# Database with Performance Tuning
postgres:
image: postgres:14
networks:
- backend
volumes:
- postgres-data:/var/lib/postgresql/data
- postgres-wal:/var/lib/postgresql/wal
- type: tmpfs
target: /tmp
tmpfs:
size: 2G
environment:
- POSTGRES_INITDB_WALDIR=/var/lib/postgresql/wal
command: |
postgres
-c max_connections=200
-c shared_buffers=2GB
-c effective_cache_size=6GB
-c maintenance_work_mem=512MB
-c checkpoint_completion_target=0.9
-c wal_buffers=16MB
-c default_statistics_target=100
-c random_page_cost=1.1
-c effective_io_concurrency=200
-c work_mem=8MB
-c min_wal_size=2GB
-c max_wal_size=8GB
deploy:
resources:
limits:
memory: 8G
cpus: '4.0'
# Redis Cluster for Caching
redis:
image: redis:7-alpine
networks:
- backend
volumes:
- redis-data:/data
command: |
redis-server
--maxmemory 4gb
--maxmemory-policy allkeys-lru
--save 900 1
--save 300 10
--save 60 10000
--tcp-backlog 511
--tcp-keepalive 300
sysctls:
- net.core.somaxconn=65535
deploy:
resources:
limits:
memory: 6G
cpus: '2.0'
networks:
frontend:
driver: bridge
driver_opts:
com.docker.network.bridge.name: frontend-br
com.docker.network.mtu: 1500
backend:
driver: bridge
internal: true
driver_opts:
com.docker.network.bridge.name: backend-br
volumes:
web-cache:
app-logs:
postgres-data:
driver: local
driver_opts:
type: ext4
o: noatime,nodiratime
postgres-wal:
driver: local
driver_opts:
type: ext4
o: noatime,sync
redis-data:
Summary
This section demonstrated practical security and optimization applications:
Enterprise Security
- Financial Services: Comprehensive security stack with WAF, HSM, and compliance scanning
- Real-Time Monitoring: SOC implementation with automated threat detection and response
- Zero-Trust Architecture: Policy enforcement and container isolation
Performance Optimization
- HPC Computing: GPU-accelerated computing with MPI coordination and high-performance storage
- E-Commerce Platform: Multi-tier optimization with caching, connection pooling, and database tuning
- Resource Management: Advanced CPU, memory, and network optimization
Monitoring and Response
- Security Operations: Automated threat detection with process, network, and file integrity monitoring
- Performance Analytics: Comprehensive metrics collection and anomaly detection
- Incident Response: Automated container isolation and alert distribution
Key Patterns Applied
- Defense in Depth: Multiple security layers from network to application level
- Performance Tuning: Systematic optimization across all infrastructure components
- Automation: Automated monitoring, alerting, and response capabilities
- Compliance: Continuous compliance monitoring and reporting
Next Steps: Part 4 covers advanced techniques including custom security plugins, performance profiling tools, and enterprise-grade monitoring solutions.
Advanced Techniques and Patterns
Advanced Security and Optimization Techniques
This section explores sophisticated Docker security and performance patterns including custom security plugins, advanced profiling tools, and enterprise-grade monitoring solutions.
Custom Security Plugins
Runtime Security Engine
// security-engine/main.go
package main
import (
"context"
"encoding/json"
"fmt"
"log"
"net/http"
"time"
"github.com/docker/docker/api/types"
"github.com/docker/docker/client"
)
type SecurityEngine struct {
dockerClient *client.Client
policies *SecurityPolicies
violations chan SecurityViolation
}
type SecurityPolicies struct {
ImagePolicies []ImagePolicy `json:"image_policies"`
RuntimePolicies []RuntimePolicy `json:"runtime_policies"`
NetworkPolicies []NetworkPolicy `json:"network_policies"`
}
type ImagePolicy struct {
Name string `json:"name"`
AllowedTags []string `json:"allowed_tags"`
BlockedCVEs []string `json:"blocked_cves"`
MaxSeverity string `json:"max_severity"`
}
type RuntimePolicy struct {
Name string `json:"name"`
AllowedProcesses []string `json:"allowed_processes"`
BlockedSyscalls []string `json:"blocked_syscalls"`
MaxCPUPercent float64 `json:"max_cpu_percent"`
MaxMemoryMB int64 `json:"max_memory_mb"`
}
type NetworkPolicy struct {
Name string `json:"name"`
AllowedPorts []int `json:"allowed_ports"`
BlockedDomains []string `json:"blocked_domains"`
}
type SecurityViolation struct {
ContainerID string `json:"container_id"`
ContainerName string `json:"container_name"`
ViolationType string `json:"violation_type"`
Severity string `json:"severity"`
Description string `json:"description"`
Timestamp time.Time `json:"timestamp"`
Metadata map[string]interface{} `json:"metadata"`
}
func NewSecurityEngine() (*SecurityEngine, error) {
dockerClient, err := client.NewClientWithOpts(client.FromEnv)
if err != nil {
return nil, err
}
policies := &SecurityPolicies{
ImagePolicies: []ImagePolicy{
{
Name: "production-images",
AllowedTags: []string{"latest", "stable", "v*"},
BlockedCVEs: []string{"CVE-2021-44228", "CVE-2021-45046"},
MaxSeverity: "HIGH",
},
},
RuntimePolicies: []RuntimePolicy{
{
Name: "standard-runtime",
AllowedProcesses: []string{"node", "nginx", "postgres", "redis"},
BlockedSyscalls: []string{"ptrace", "mount", "umount"},
MaxCPUPercent: 80.0,
MaxMemoryMB: 2048,
},
},
NetworkPolicies: []NetworkPolicy{
{
Name: "web-tier",
AllowedPorts: []int{80, 443, 8080},
BlockedDomains: []string{"malicious.com", "suspicious.net"},
},
},
}
return &SecurityEngine{
dockerClient: dockerClient,
policies: policies,
violations: make(chan SecurityViolation, 1000),
}, nil
}
func (se *SecurityEngine) MonitorContainers(ctx context.Context) {
ticker := time.NewTicker(10 * time.Second)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
se.scanRunningContainers(ctx)
}
}
}
func (se *SecurityEngine) scanRunningContainers(ctx context.Context) {
containers, err := se.dockerClient.ContainerList(ctx, types.ContainerListOptions{})
if err != nil {
log.Printf("Error listing containers: %v", err)
return
}
for _, container := range containers {
go se.analyzeContainer(ctx, container)
}
}
func (se *SecurityEngine) analyzeContainer(ctx context.Context, container types.Container) {
// Check image compliance
se.checkImageCompliance(container)
// Check runtime compliance
se.checkRuntimeCompliance(ctx, container)
// Check network compliance
se.checkNetworkCompliance(ctx, container)
}
func (se *SecurityEngine) checkImageCompliance(container types.Container) {
for _, policy := range se.policies.ImagePolicies {
// Check if image tag is allowed
imageTag := container.Image
allowed := false
for _, allowedTag := range policy.AllowedTags {
if matchesPattern(imageTag, allowedTag) {
allowed = true
break
}
}
if !allowed {
se.violations <- SecurityViolation{
ContainerID: container.ID,
ContainerName: container.Names[0],
ViolationType: "image_policy",
Severity: "HIGH",
Description: fmt.Sprintf("Image tag %s not allowed by policy %s", imageTag, policy.Name),
Timestamp: time.Now(),
Metadata: map[string]interface{}{
"image": imageTag,
"policy": policy.Name,
},
}
}
}
}
func (se *SecurityEngine) checkRuntimeCompliance(ctx context.Context, container types.Container) {
// Get container stats
stats, err := se.dockerClient.ContainerStats(ctx, container.ID, false)
if err != nil {
return
}
defer stats.Body.Close()
var containerStats types.StatsJSON
if err := json.NewDecoder(stats.Body).Decode(&containerStats); err != nil {
return
}
// Check CPU usage
cpuPercent := calculateCPUPercent(&containerStats)
for _, policy := range se.policies.RuntimePolicies {
if cpuPercent > policy.MaxCPUPercent {
se.violations <- SecurityViolation{
ContainerID: container.ID,
ContainerName: container.Names[0],
ViolationType: "runtime_policy",
Severity: "MEDIUM",
Description: fmt.Sprintf("CPU usage %.2f%% exceeds policy limit %.2f%%", cpuPercent, policy.MaxCPUPercent),
Timestamp: time.Now(),
Metadata: map[string]interface{}{
"cpu_percent": cpuPercent,
"limit": policy.MaxCPUPercent,
},
}
}
}
// Check memory usage
memoryMB := containerStats.MemoryStats.Usage / 1024 / 1024
for _, policy := range se.policies.RuntimePolicies {
if int64(memoryMB) > policy.MaxMemoryMB {
se.violations <- SecurityViolation{
ContainerID: container.ID,
ContainerName: container.Names[0],
ViolationType: "runtime_policy",
Severity: "HIGH",
Description: fmt.Sprintf("Memory usage %dMB exceeds policy limit %dMB", memoryMB, policy.MaxMemoryMB),
Timestamp: time.Now(),
Metadata: map[string]interface{}{
"memory_mb": memoryMB,
"limit": policy.MaxMemoryMB,
},
}
}
}
}
func (se *SecurityEngine) checkNetworkCompliance(ctx context.Context, container types.Container) {
// Get container network settings
containerJSON, err := se.dockerClient.ContainerInspect(ctx, container.ID)
if err != nil {
return
}
// Check exposed ports
for port := range containerJSON.NetworkSettings.Ports {
portNum := port.Int()
allowed := false
for _, policy := range se.policies.NetworkPolicies {
for _, allowedPort := range policy.AllowedPorts {
if portNum == allowedPort {
allowed = true
break
}
}
}
if !allowed {
se.violations <- SecurityViolation{
ContainerID: container.ID,
ContainerName: container.Names[0],
ViolationType: "network_policy",
Severity: "MEDIUM",
Description: fmt.Sprintf("Port %d not allowed by network policy", portNum),
Timestamp: time.Now(),
Metadata: map[string]interface{}{
"port": portNum,
},
}
}
}
}
func (se *SecurityEngine) ProcessViolations(ctx context.Context) {
for {
select {
case <-ctx.Done():
return
case violation := <-se.violations:
se.handleViolation(violation)
}
}
}
func (se *SecurityEngine) handleViolation(violation SecurityViolation) {
// Log violation
log.Printf("Security Violation: %+v", violation)
// Send to external systems (SIEM, alerting, etc.)
se.sendToSIEM(violation)
// Take automated action based on severity
switch violation.Severity {
case "CRITICAL":
se.quarantineContainer(violation.ContainerID)
case "HIGH":
se.alertSecurityTeam(violation)
case "MEDIUM":
se.logForReview(violation)
}
}
func (se *SecurityEngine) quarantineContainer(containerID string) {
ctx := context.Background()
// Stop the container
timeout := 30 * time.Second
if err := se.dockerClient.ContainerStop(ctx, containerID, &timeout); err != nil {
log.Printf("Failed to stop container %s: %v", containerID, err)
}
log.Printf("Container %s quarantined due to critical security violation", containerID)
}
func (se *SecurityEngine) sendToSIEM(violation SecurityViolation) {
// Implementation for SIEM integration
// This could be Splunk, ELK, or other SIEM systems
}
func (se *SecurityEngine) alertSecurityTeam(violation SecurityViolation) {
// Implementation for alerting (Slack, PagerDuty, etc.)
}
func (se *SecurityEngine) logForReview(violation SecurityViolation) {
// Implementation for logging violations for manual review
}
func calculateCPUPercent(stats *types.StatsJSON) float64 {
cpuDelta := float64(stats.CPUStats.CPUUsage.TotalUsage - stats.PreCPUStats.CPUUsage.TotalUsage)
systemDelta := float64(stats.CPUStats.SystemUsage - stats.PreCPUStats.SystemUsage)
if systemDelta > 0.0 && cpuDelta > 0.0 {
return (cpuDelta / systemDelta) * float64(len(stats.CPUStats.CPUUsage.PercpuUsage)) * 100.0
}
return 0.0
}
func matchesPattern(text, pattern string) bool {
// Simple pattern matching - in production, use proper regex
return text == pattern || pattern == "*"
}
func main() {
engine, err := NewSecurityEngine()
if err != nil {
log.Fatal(err)
}
ctx := context.Background()
// Start monitoring
go engine.MonitorContainers(ctx)
go engine.ProcessViolations(ctx)
// Start HTTP API
http.HandleFunc("/violations", func(w http.ResponseWriter, r *http.Request) {
// Placeholder: returns a simple status; persist violations if you need to serve them here
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
})
log.Println("Security Engine started on :8080")
log.Fatal(http.ListenAndServe(":8080", nil))
}
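Building and querying the engine (the module path is hypothetical):
go mod init example.com/security-engine
go mod tidy
go build -o security-engine .
./security-engine &
curl http://localhost:8080/violations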
Advanced Performance Profiling
Container Performance Analyzer
#!/usr/bin/env python3
# performance-analyzer.py
import asyncio
import docker
import psutil
import json
import time
from datetime import datetime, timedelta
from typing import Dict, List, Optional
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
class PerformanceAnalyzer:
def __init__(self):
self.docker_client = docker.from_env()
self.metrics_history = {}
self.analysis_results = {}
def collect_system_metrics(self) -> Dict:
"""Collect system-wide performance metrics"""
return {
'timestamp': datetime.now().isoformat(),
'cpu': {
'percent': psutil.cpu_percent(interval=1),
'count': psutil.cpu_count(),
'freq': psutil.cpu_freq()._asdict() if psutil.cpu_freq() else None,
'per_cpu': psutil.cpu_percent(interval=1, percpu=True),
'load_avg': psutil.getloadavg() if hasattr(psutil, 'getloadavg') else None
},
'memory': {
'total': psutil.virtual_memory().total,
'available': psutil.virtual_memory().available,
'percent': psutil.virtual_memory().percent,
'used': psutil.virtual_memory().used,
'free': psutil.virtual_memory().free,
'buffers': psutil.virtual_memory().buffers,
'cached': psutil.virtual_memory().cached
},
'disk': {
'usage': {partition.mountpoint: psutil.disk_usage(partition.mountpoint)._asdict()
for partition in psutil.disk_partitions()},
'io': psutil.disk_io_counters()._asdict() if psutil.disk_io_counters() else None
},
'network': {
'io': psutil.net_io_counters()._asdict(),
'connections': len(psutil.net_connections())
}
}
def collect_container_metrics(self, container) -> Dict:
"""Collect detailed container performance metrics"""
try:
stats = container.stats(stream=False)
# Calculate CPU percentage
cpu_delta = stats['cpu_stats']['cpu_usage']['total_usage'] - \
stats['precpu_stats']['cpu_usage']['total_usage']
system_delta = stats['cpu_stats']['system_cpu_usage'] - \
stats['precpu_stats']['system_cpu_usage']
cpu_percent = 0.0
if system_delta > 0 and cpu_delta > 0:
online_cpus = stats['cpu_stats'].get('online_cpus') or \
len(stats['cpu_stats']['cpu_usage'].get('percpu_usage', [1]))
cpu_percent = (cpu_delta / system_delta) * online_cpus * 100.0
# Memory metrics
memory_usage = stats['memory_stats']['usage']
memory_limit = stats['memory_stats']['limit']
memory_percent = (memory_usage / memory_limit) * 100.0
# Network metrics
networks = stats.get('networks', {})
total_rx_bytes = sum(net['rx_bytes'] for net in networks.values())
total_tx_bytes = sum(net['tx_bytes'] for net in networks.values())
# Block I/O metrics
blkio_stats = stats.get('blkio_stats', {})
io_service_bytes = blkio_stats.get('io_service_bytes_recursive', [])
read_bytes = sum(entry['value'] for entry in io_service_bytes
if entry['op'] == 'Read')
write_bytes = sum(entry['value'] for entry in io_service_bytes
if entry['op'] == 'Write')
return {
'timestamp': datetime.now().isoformat(),
'container_id': container.id,
'container_name': container.name,
'cpu': {
'percent': cpu_percent,
'usage': stats['cpu_stats']['cpu_usage']['total_usage'],
'system_usage': stats['cpu_stats']['system_cpu_usage'],
'throttling': stats['cpu_stats'].get('throttling_data', {})
},
'memory': {
'usage': memory_usage,
'limit': memory_limit,
'percent': memory_percent,
'cache': stats['memory_stats'].get('stats', {}).get('cache', 0),
'rss': stats['memory_stats'].get('stats', {}).get('rss', 0)
},
'network': {
'rx_bytes': total_rx_bytes,
'tx_bytes': total_tx_bytes,
'rx_packets': sum(net['rx_packets'] for net in networks.values()),
'tx_packets': sum(net['tx_packets'] for net in networks.values())
},
'blkio': {
'read_bytes': read_bytes,
'write_bytes': write_bytes
}
}
except Exception as e:
print(f"Error collecting metrics for container {container.name}: {e}")
return None
def analyze_performance_trends(self, container_id: str, hours: int = 24) -> Dict:
"""Analyze performance trends for a container"""
if container_id not in self.metrics_history:
return {"error": "No metrics history found"}
metrics = self.metrics_history[container_id]
cutoff_time = datetime.now() - timedelta(hours=hours)
# Filter recent metrics
recent_metrics = [m for m in metrics
if datetime.fromisoformat(m['timestamp']) > cutoff_time]
if not recent_metrics:
return {"error": "No recent metrics found"}
        # Extract time series data
        cpu_values = [m['cpu']['percent'] for m in recent_metrics]
        memory_values = [m['memory']['percent'] for m in recent_metrics]
# Calculate statistics
analysis = {
'container_id': container_id,
'analysis_period': f"{hours} hours",
'sample_count': len(recent_metrics),
'cpu': {
'mean': np.mean(cpu_values),
'std': np.std(cpu_values),
'min': np.min(cpu_values),
'max': np.max(cpu_values),
'p95': np.percentile(cpu_values, 95),
'p99': np.percentile(cpu_values, 99)
},
'memory': {
'mean': np.mean(memory_values),
'std': np.std(memory_values),
'min': np.min(memory_values),
'max': np.max(memory_values),
'p95': np.percentile(memory_values, 95),
'p99': np.percentile(memory_values, 99)
}
}
# Detect anomalies
analysis['anomalies'] = self.detect_anomalies(recent_metrics)
# Performance recommendations
analysis['recommendations'] = self.generate_recommendations(analysis)
return analysis
def detect_anomalies(self, metrics: List[Dict]) -> List[Dict]:
"""Detect performance anomalies using statistical methods"""
anomalies = []
cpu_values = [m['cpu']['percent'] for m in metrics]
memory_values = [m['memory']['percent'] for m in metrics]
# CPU anomalies (values > 2 standard deviations from mean)
cpu_mean = np.mean(cpu_values)
cpu_std = np.std(cpu_values)
cpu_threshold = cpu_mean + 2 * cpu_std
        for metric in metrics:
if metric['cpu']['percent'] > cpu_threshold:
anomalies.append({
'type': 'cpu_spike',
'timestamp': metric['timestamp'],
'value': metric['cpu']['percent'],
'threshold': cpu_threshold,
'severity': 'high' if metric['cpu']['percent'] > cpu_mean + 3 * cpu_std else 'medium'
})
# Memory anomalies
memory_mean = np.mean(memory_values)
memory_std = np.std(memory_values)
memory_threshold = memory_mean + 2 * memory_std
        for metric in metrics:
if metric['memory']['percent'] > memory_threshold:
anomalies.append({
'type': 'memory_spike',
'timestamp': metric['timestamp'],
'value': metric['memory']['percent'],
'threshold': memory_threshold,
'severity': 'high' if metric['memory']['percent'] > memory_mean + 3 * memory_std else 'medium'
})
return anomalies
def generate_recommendations(self, analysis: Dict) -> List[str]:
"""Generate performance optimization recommendations"""
recommendations = []
cpu_stats = analysis['cpu']
memory_stats = analysis['memory']
# CPU recommendations
if cpu_stats['p95'] > 80:
recommendations.append("Consider increasing CPU limits or optimizing CPU-intensive operations")
if cpu_stats['std'] > 20:
recommendations.append("High CPU variance detected - investigate workload patterns")
# Memory recommendations
if memory_stats['p95'] > 85:
recommendations.append("Consider increasing memory limits or optimizing memory usage")
if memory_stats['max'] > 95:
recommendations.append("Memory usage approaching limits - risk of OOM kills")
# General recommendations
if len(analysis.get('anomalies', [])) > 10:
recommendations.append("Frequent anomalies detected - review application performance")
return recommendations
def generate_performance_report(self, container_id: str) -> str:
"""Generate comprehensive performance report"""
analysis = self.analyze_performance_trends(container_id)
if 'error' in analysis:
return f"Error generating report: {analysis['error']}"
report = f"""
Performance Analysis Report
Container ID: {container_id}
Analysis Period: {analysis['analysis_period']}
Sample Count: {analysis['sample_count']}
CPU Performance:
- Average: {analysis['cpu']['mean']:.2f}%
- 95th Percentile: {analysis['cpu']['p95']:.2f}%
- Maximum: {analysis['cpu']['max']:.2f}%
- Standard Deviation: {analysis['cpu']['std']:.2f}%
Memory Performance:
- Average: {analysis['memory']['mean']:.2f}%
- 95th Percentile: {analysis['memory']['p95']:.2f}%
- Maximum: {analysis['memory']['max']:.2f}%
- Standard Deviation: {analysis['memory']['std']:.2f}%
Anomalies Detected: {len(analysis['anomalies'])}
Recommendations:
"""
for i, rec in enumerate(analysis['recommendations'], 1):
report += f"{i}. {rec}\n"
return report
def create_performance_dashboard(self, container_id: str, output_file: str = "performance_dashboard.png"):
"""Create visual performance dashboard"""
if container_id not in self.metrics_history:
print("No metrics history found")
return
metrics = self.metrics_history[container_id]
# Extract data
timestamps = [datetime.fromisoformat(m['timestamp']) for m in metrics]
cpu_values = [m['cpu']['percent'] for m in metrics]
memory_values = [m['memory']['percent'] for m in metrics]
# Create dashboard
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
# CPU usage over time
ax1.plot(timestamps, cpu_values, label='CPU %', color='blue')
ax1.set_title('CPU Usage Over Time')
ax1.set_ylabel('CPU %')
ax1.legend()
ax1.grid(True)
# Memory usage over time
ax2.plot(timestamps, memory_values, label='Memory %', color='red')
ax2.set_title('Memory Usage Over Time')
ax2.set_ylabel('Memory %')
ax2.legend()
ax2.grid(True)
# CPU distribution
ax3.hist(cpu_values, bins=30, alpha=0.7, color='blue')
ax3.set_title('CPU Usage Distribution')
ax3.set_xlabel('CPU %')
ax3.set_ylabel('Frequency')
# Memory distribution
ax4.hist(memory_values, bins=30, alpha=0.7, color='red')
ax4.set_title('Memory Usage Distribution')
ax4.set_xlabel('Memory %')
ax4.set_ylabel('Frequency')
plt.tight_layout()
plt.savefig(output_file, dpi=300, bbox_inches='tight')
plt.close()
print(f"Performance dashboard saved to {output_file}")
async def continuous_monitoring(self, duration_hours: int = 24):
"""Run continuous performance monitoring"""
end_time = datetime.now() + timedelta(hours=duration_hours)
while datetime.now() < end_time:
# Collect system metrics
system_metrics = self.collect_system_metrics()
# Collect container metrics
for container in self.docker_client.containers.list():
container_metrics = self.collect_container_metrics(container)
if container_metrics:
container_id = container.id
if container_id not in self.metrics_history:
self.metrics_history[container_id] = []
self.metrics_history[container_id].append(container_metrics)
# Keep only last 1000 metrics per container
if len(self.metrics_history[container_id]) > 1000:
self.metrics_history[container_id] = self.metrics_history[container_id][-1000:]
# Wait before next collection
await asyncio.sleep(30) # Collect every 30 seconds
print(f"Monitoring completed after {duration_hours} hours")
if __name__ == "__main__":
analyzer = PerformanceAnalyzer()
# Run continuous monitoring for 1 hour
asyncio.run(analyzer.continuous_monitoring(duration_hours=1))
# Generate reports for all monitored containers
for container_id in analyzer.metrics_history:
print(analyzer.generate_performance_report(container_id))
analyzer.create_performance_dashboard(container_id, f"dashboard_{container_id[:12]}.png")
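For quick ad-hoc checks, the analyzer can also be driven synchronously instead of through continuous_monitoring. A short sketch under the assumption that the script is importable as performance_analyzer (the file would need renaming, since hyphenated filenames cannot be imported):
import time
from performance_analyzer import PerformanceAnalyzer  # hypothetical module name

analyzer = PerformanceAnalyzer()
# Assumes at least one container is running
container = analyzer.docker_client.containers.list()[0]

# Take ten samples roughly five seconds apart, then report
for _ in range(10):
    sample = analyzer.collect_container_metrics(container)
    if sample:
        analyzer.metrics_history.setdefault(container.id, []).append(sample)
    time.sleep(5)

print(analyzer.generate_performance_report(container.id))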
Summary
This section covered advanced security and optimization techniques:
Custom Security Solutions
- Runtime Security Engine: Go-based security policy enforcement with real-time monitoring
- Policy Framework: Comprehensive image, runtime, and network policy definitions
- Automated Response: Container quarantine and security team alerting
Advanced Performance Analysis
- Performance Analyzer: Python-based comprehensive metrics collection and analysis
- Anomaly Detection: Statistical methods for identifying performance issues
- Trend Analysis: Historical performance analysis with recommendations
Enterprise Patterns
- Security Automation: Policy-driven security enforcement and violation handling
- Performance Intelligence: Statistics-driven performance optimization recommendations
- Continuous Monitoring: Real-time security and performance assessment
Next Steps: Part 5 demonstrates complete production implementations combining all these advanced techniques into enterprise-ready security and optimization solutions.
Best Practices and Optimization
Docker Security and Optimization: Best Practices and Production Excellence
This final section demonstrates production-ready security and optimization implementations, combining comprehensive security frameworks, performance excellence, and operational best practices into enterprise-grade solutions.
Enterprise Security Framework
Complete Security Operations Platform
# docker-compose.security-platform.yml
version: '3.8'
services:
# Security Orchestrator
security-orchestrator:
build: ./security-orchestrator
networks:
- security-mgmt
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- security-policies:/etc/security/policies:ro
- security-logs:/var/log/security
environment:
- SECURITY_LEVEL=enterprise
- COMPLIANCE_STANDARDS=SOC2,PCI_DSS,HIPAA,GDPR
- AUTO_REMEDIATION=true
- ALERT_WEBHOOK=${SECURITY_WEBHOOK_URL}
secrets:
- security_encryption_key
- siem_api_key
# Vulnerability Scanner
vulnerability-scanner:
image: aquasec/trivy:latest
networks:
- security-mgmt
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- trivy-cache:/root/.cache/trivy
- scan-results:/results
environment:
- TRIVY_DB_REPOSITORY=ghcr.io/aquasecurity/trivy-db
- TRIVY_JAVA_DB_REPOSITORY=ghcr.io/aquasecurity/trivy-java-db
    # NOTE: enumerating images via the docker CLI assumes the CLI is present in
    # the scanner container; the stock trivy image ships without it.
    command: |
      sh -c "
      while true; do
        echo 'Starting vulnerability scan...'
        for image in $$(docker images --format '{{.Repository}}:{{.Tag}}' | grep -v '<none>'); do
          echo Scanning $$image
          trivy image --format json --output /results/scan_$$(echo $$image | tr '/:' '_')_$$(date +%Y%m%d_%H%M%S).json $$image
        done
        sleep 3600  # Scan every hour
      done
      "
# Runtime Security Monitor (Falco)
falco:
image: falcosecurity/falco:latest
privileged: true
networks:
- security-mgmt
volumes:
- /var/run/docker.sock:/host/var/run/docker.sock
- /dev:/host/dev
- /proc:/host/proc:ro
- /boot:/host/boot:ro
- /lib/modules:/host/lib/modules:ro
- /usr:/host/usr:ro
- ./falco/falco.yaml:/etc/falco/falco.yaml:ro
- ./falco/rules:/etc/falco/rules:ro
environment:
- FALCO_GRPC_ENABLED=true
- FALCO_GRPC_BIND_ADDRESS=0.0.0.0:5060
- FALCO_WEBSERVER_ENABLED=true
# Security Information and Event Management
siem-collector:
build: ./siem-collector
networks:
- security-mgmt
volumes:
- security-logs:/var/log/security:ro
- siem-data:/var/lib/siem
environment:
- ELASTICSEARCH_URL=http://elasticsearch:9200
- KIBANA_URL=http://kibana:5601
- LOG_SOURCES=falco,trivy,docker,system
depends_on:
- elasticsearch
- kibana
# Elasticsearch for log storage
elasticsearch:
image: elasticsearch:7.17.0
networks:
- security-mgmt
volumes:
- elasticsearch-data:/usr/share/elasticsearch/data
environment:
- discovery.type=single-node
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
- xpack.security.enabled=true
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
# Kibana for security dashboards
kibana:
image: kibana:7.17.0
networks:
- security-mgmt
ports:
- "5601:5601"
environment:
- ELASTICSEARCH_HOSTS=http://elasticsearch:9200
- ELASTICSEARCH_USERNAME=elastic
- ELASTICSEARCH_PASSWORD=${ELASTIC_PASSWORD}
volumes:
- ./kibana/dashboards:/usr/share/kibana/data/dashboards:ro
depends_on:
- elasticsearch
# Compliance Reporter
compliance-reporter:
build: ./compliance-reporter
networks:
- security-mgmt
volumes:
- compliance-reports:/reports
- /var/run/docker.sock:/var/run/docker.sock:ro
environment:
- REPORT_SCHEDULE=0 6 * * * # Daily at 6 AM
- COMPLIANCE_FRAMEWORKS=CIS_DOCKER,NIST_CSF,ISO27001
- REPORT_FORMAT=pdf,json,html
- NOTIFICATION_EMAIL=${COMPLIANCE_EMAIL}
# Certificate Management
cert-manager:
image: jetstack/cert-manager-controller:latest
networks:
- security-mgmt
volumes:
- cert-data:/var/lib/cert-manager
- ./cert-manager/config.yaml:/etc/cert-manager/config.yaml:ro
environment:
- ACME_EMAIL=${ACME_EMAIL}
- DNS_PROVIDER=${DNS_PROVIDER}
# Secrets Management (Vault)
vault:
    image: hashicorp/vault:latest
networks:
- security-mgmt
ports:
- "8200:8200"
volumes:
- vault-data:/vault/data
- vault-logs:/vault/logs
- ./vault/config.hcl:/vault/config/config.hcl:ro
environment:
- VAULT_CONFIG_DIR=/vault/config
- VAULT_LOG_LEVEL=info
cap_add:
- IPC_LOCK
command: vault server -config=/vault/config/config.hcl
networks:
  security-mgmt:
    driver: bridge
    driver_opts:
      com.docker.network.bridge.enable_icc: "true"
    # NOTE: the 'encrypted' driver option applies only to overlay networks,
    # so it is omitted for this single-host bridge network.
volumes:
security-policies:
security-logs:
trivy-cache:
scan-results:
siem-data:
elasticsearch-data:
compliance-reports:
cert-data:
vault-data:
vault-logs:
secrets:
security_encryption_key:
external: true
siem_api_key:
external: true
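The security-orchestrator can consume the scanner's output from the shared scan-results volume. A minimal consumption sketch, assuming Trivy's standard JSON report layout (a top-level Results list whose entries carry Vulnerabilities with a Severity field):
import json
from pathlib import Path

RESULTS_DIR = Path("/results")  # the scan-results volume mount point

def count_critical(report_path: Path) -> int:
    """Count CRITICAL findings in a single Trivy JSON report."""
    report = json.loads(report_path.read_text())
    return sum(
        1
        for result in report.get("Results") or []
        for vuln in result.get("Vulnerabilities") or []
        if vuln.get("Severity") == "CRITICAL"
    )

for path in sorted(RESULTS_DIR.glob("scan_*.json")):
    critical = count_critical(path)
    if critical:
        print(f"{path.name}: {critical} CRITICAL vulnerabilities")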
Performance Excellence Platform
High-Performance Computing Environment
# docker-compose.performance-platform.yml
version: '3.8'
services:
# Performance Orchestrator
performance-orchestrator:
build: ./performance-orchestrator
privileged: true
networks:
- performance-mgmt
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- performance-data:/data
environment:
- OPTIMIZATION_MODE=aggressive
- AUTO_SCALING=true
- PERFORMANCE_TARGETS=cpu:80,memory:85,latency:100ms
- MONITORING_INTERVAL=10s
# Application Performance Monitoring
apm-server:
image: elastic/apm-server:7.17.0
networks:
- performance-mgmt
volumes:
- ./apm/apm-server.yml:/usr/share/apm-server/apm-server.yml:ro
environment:
- ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    # NOTE: assumes an elasticsearch service is defined in this compose project
    # (e.g., merged in from the security platform file above)
    depends_on:
      - elasticsearch
# Distributed Tracing (Jaeger)
jaeger:
image: jaegertracing/all-in-one:latest
networks:
- performance-mgmt
ports:
- "16686:16686"
- "14268:14268"
environment:
- COLLECTOR_ZIPKIN_HTTP_PORT=9411
- SPAN_STORAGE_TYPE=elasticsearch
- ES_SERVER_URLS=http://elasticsearch:9200
depends_on:
- elasticsearch
# Metrics Collection (Prometheus)
prometheus:
image: prom/prometheus:latest
networks:
- performance-mgmt
ports:
- "9090:9090"
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ./prometheus/rules:/etc/prometheus/rules:ro
- prometheus-data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.retention.time=30d'
- '--web.enable-lifecycle'
- '--storage.tsdb.path=/prometheus'
# Performance Visualization (Grafana)
grafana:
image: grafana/grafana:latest
networks:
- performance-mgmt
ports:
- "3000:3000"
volumes:
- grafana-data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning:ro
- ./grafana/dashboards:/var/lib/grafana/dashboards:ro
environment:
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
- GF_INSTALL_PLUGINS=grafana-piechart-panel,grafana-worldmap-panel,grafana-clock-panel
# Load Testing Platform
load-tester:
build: ./load-tester
networks:
- performance-mgmt
volumes:
- load-test-results:/results
- ./load-tests:/tests:ro
environment:
- TEST_SCHEDULE=0 2 * * * # Daily at 2 AM
- TARGET_APPLICATIONS=${TARGET_APPS}
- PERFORMANCE_THRESHOLDS=response_time:500ms,throughput:1000rps,error_rate:1%
# Performance Profiler
profiler:
build: ./profiler
privileged: true
networks:
- performance-mgmt
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- profiler-data:/data
- /sys/kernel/debug:/sys/kernel/debug:ro
environment:
- PROFILING_MODE=continuous
- PROFILE_TARGETS=cpu,memory,io,network
- FLAME_GRAPH_ENABLED=true
# Auto-Scaler
auto-scaler:
build: ./auto-scaler
networks:
- performance-mgmt
volumes:
- /var/run/docker.sock:/var/run/docker.sock
environment:
- SCALING_POLICIES=cpu:80:scale_up,cpu:30:scale_down,memory:85:scale_up
- MIN_REPLICAS=2
- MAX_REPLICAS=20
- COOLDOWN_PERIOD=300s
networks:
performance-mgmt:
driver: bridge
volumes:
performance-data:
prometheus-data:
grafana-data:
load-test-results:
profiler-data:
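Once Prometheus is scraping, any service on performance-mgmt can pull metrics through its HTTP API. A sketch of a per-container CPU query, assuming a cAdvisor-style exporter is configured as a scrape target (it exposes container_cpu_usage_seconds_total with a name label):
import requests

PROM_URL = "http://prometheus:9090"  # service name on the performance-mgmt network
QUERY = 'rate(container_cpu_usage_seconds_total{name!=""}[5m])'

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    name = series["metric"].get("name", "<unknown>")
    cores = float(series["value"][1])  # value is [timestamp, "value-as-string"]
    print(f"{name}: {cores:.2f} CPU cores (5m average)")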
Automated Security and Performance Management
Intelligent Operations Platform
#!/usr/bin/env python3
# intelligent-ops-platform.py
import asyncio
import docker
import logging
import numpy as np
from datetime import datetime, timedelta
from typing import Dict, List, Optional
import redis.asyncio as aioredis  # successor to the deprecated aioredis package
import aiohttp
from dataclasses import dataclass
import yaml
@dataclass
class SecurityAlert:
id: str
severity: str
type: str
container_id: str
description: str
timestamp: datetime
metadata: Dict
@dataclass
class PerformanceMetric:
container_id: str
metric_type: str
value: float
timestamp: datetime
threshold: Optional[float] = None
class IntelligentOpsManager:
def __init__(self, config_path: str):
with open(config_path, 'r') as f:
self.config = yaml.safe_load(f)
self.docker_client = docker.from_env()
self.redis = None
self.session = None
# ML models for prediction (simplified)
self.performance_model = None
self.security_model = None
# Operational state
self.active_alerts = {}
self.performance_history = {}
self.security_incidents = {}
async def initialize(self):
"""Initialize async components"""
        self.redis = aioredis.from_url("redis://localhost:6379")  # from_url is synchronous
self.session = aiohttp.ClientSession()
# Load ML models
await self.load_ml_models()
async def load_ml_models(self):
"""Load machine learning models for predictions"""
# In production, load actual trained models
# For demo, using simple statistical models
self.performance_model = {
'cpu_threshold': 80.0,
'memory_threshold': 85.0,
'prediction_window': 300 # 5 minutes
}
self.security_model = {
'anomaly_threshold': 2.0, # Standard deviations
'risk_factors': {
'privileged_containers': 0.8,
'root_processes': 0.6,
'network_anomalies': 0.7
}
}
async def continuous_monitoring(self):
"""Main monitoring loop"""
while True:
try:
# Collect metrics
await self.collect_security_metrics()
await self.collect_performance_metrics()
# Analyze and predict
await self.analyze_security_posture()
await self.analyze_performance_trends()
# Take automated actions
await self.execute_automated_responses()
# Update dashboards
await self.update_operational_dashboards()
await asyncio.sleep(30) # Monitor every 30 seconds
except Exception as e:
logging.error(f"Monitoring error: {e}")
await asyncio.sleep(60)
async def collect_security_metrics(self):
"""Collect comprehensive security metrics"""
containers = self.docker_client.containers.list()
for container in containers:
try:
# Check container configuration
config_score = await self.assess_container_security(container)
# Check runtime behavior
runtime_score = await self.assess_runtime_security(container)
# Calculate overall security score
security_score = (config_score + runtime_score) / 2
# Store metrics
await self.redis.hset(
f"security:{container.id}",
mapping={
'config_score': config_score,
'runtime_score': runtime_score,
'overall_score': security_score,
'timestamp': datetime.now().isoformat()
}
)
# Generate alerts if needed
if security_score < self.config['security']['alert_threshold']:
await self.create_security_alert(container, security_score)
except Exception as e:
logging.error(f"Error collecting security metrics for {container.name}: {e}")
async def assess_container_security(self, container) -> float:
"""Assess container security configuration"""
score = 100.0
# Check if running as root
config = container.attrs['Config']
if not config.get('User'):
score -= 20
# Check privileged mode
host_config = container.attrs['HostConfig']
if host_config.get('Privileged'):
score -= 30
# Check capabilities
cap_add = host_config.get('CapAdd', [])
dangerous_caps = ['SYS_ADMIN', 'NET_ADMIN', 'SYS_PTRACE']
for cap in cap_add:
if cap in dangerous_caps:
score -= 15
# Check read-only filesystem
if not host_config.get('ReadonlyRootfs'):
score -= 10
# Check security options
security_opt = host_config.get('SecurityOpt', [])
if 'no-new-privileges:true' not in security_opt:
score -= 10
return max(0, score)
async def assess_runtime_security(self, container) -> float:
"""Assess container runtime security"""
score = 100.0
try:
# Check running processes
result = container.exec_run("ps aux", demux=True)
            if result.exit_code == 0 and result.output[0]:
                processes = result.output[0].decode()
# Check for suspicious processes
suspicious_procs = ['nc', 'netcat', 'nmap', 'wget', 'curl']
for proc in suspicious_procs:
if proc in processes.lower():
score -= 15
# Check for root processes
if 'root' in processes:
score -= 10
# Check network connections
result = container.exec_run("netstat -tuln", demux=True)
            if result.exit_code == 0 and result.output[0]:
                connections = result.output[0].decode()
# Check for suspicious ports
suspicious_ports = ['22', '23', '3389', '4444']
for port in suspicious_ports:
if f":{port}" in connections:
score -= 10
except Exception as e:
logging.debug(f"Runtime security assessment error: {e}")
score -= 5 # Penalty for inability to assess
return max(0, score)
async def collect_performance_metrics(self):
"""Collect comprehensive performance metrics"""
containers = self.docker_client.containers.list()
for container in containers:
try:
stats = container.stats(stream=False)
# Calculate metrics
cpu_percent = self.calculate_cpu_percent(stats)
memory_percent = self.calculate_memory_percent(stats)
# Store metrics
timestamp = datetime.now()
container_id = container.id
if container_id not in self.performance_history:
self.performance_history[container_id] = []
metrics = {
'timestamp': timestamp,
'cpu_percent': cpu_percent,
'memory_percent': memory_percent,
'network_rx': self.get_network_rx(stats),
'network_tx': self.get_network_tx(stats),
'disk_read': self.get_disk_read(stats),
'disk_write': self.get_disk_write(stats)
}
self.performance_history[container_id].append(metrics)
# Keep only last 1000 metrics
if len(self.performance_history[container_id]) > 1000:
self.performance_history[container_id] = self.performance_history[container_id][-1000:]
# Store in Redis for real-time access
await self.redis.hset(
f"performance:{container_id}",
mapping={
'cpu_percent': cpu_percent,
'memory_percent': memory_percent,
'timestamp': timestamp.isoformat()
}
)
except Exception as e:
logging.error(f"Error collecting performance metrics for {container.name}: {e}")
async def analyze_performance_trends(self):
"""Analyze performance trends and predict issues"""
for container_id, history in self.performance_history.items():
if len(history) < 10: # Need minimum data points
continue
try:
# Extract recent metrics
recent_metrics = history[-60:] # Last 60 data points
cpu_values = [m['cpu_percent'] for m in recent_metrics]
memory_values = [m['memory_percent'] for m in recent_metrics]
# Predict future performance
cpu_prediction = await self.predict_metric_trend(cpu_values)
memory_prediction = await self.predict_metric_trend(memory_values)
# Check for predicted issues
if cpu_prediction > self.performance_model['cpu_threshold']:
await self.create_performance_alert(
container_id, 'cpu', cpu_prediction, 'predicted_high_cpu'
)
if memory_prediction > self.performance_model['memory_threshold']:
await self.create_performance_alert(
container_id, 'memory', memory_prediction, 'predicted_high_memory'
)
# Detect anomalies
cpu_anomaly = self.detect_anomaly(cpu_values)
memory_anomaly = self.detect_anomaly(memory_values)
if cpu_anomaly:
await self.create_performance_alert(
container_id, 'cpu', cpu_values[-1], 'cpu_anomaly'
)
if memory_anomaly:
await self.create_performance_alert(
container_id, 'memory', memory_values[-1], 'memory_anomaly'
)
except Exception as e:
logging.error(f"Error analyzing performance trends for {container_id}: {e}")
async def predict_metric_trend(self, values: List[float]) -> float:
"""Simple linear regression prediction"""
if len(values) < 5:
return values[-1] if values else 0
x = np.arange(len(values))
y = np.array(values)
# Linear regression
coeffs = np.polyfit(x, y, 1)
# Predict next value
next_x = len(values)
prediction = coeffs[0] * next_x + coeffs[1]
return max(0, prediction)
def detect_anomaly(self, values: List[float]) -> bool:
"""Detect anomalies using statistical methods"""
if len(values) < 10:
return False
mean = np.mean(values[:-1]) # Exclude current value
std = np.std(values[:-1])
current = values[-1]
# Check if current value is more than 2 standard deviations from mean
return abs(current - mean) > 2 * std
async def execute_automated_responses(self):
"""Execute automated responses to alerts"""
# Get active alerts
alert_keys = await self.redis.keys("alert:*")
for alert_key in alert_keys:
alert_data = await self.redis.hgetall(alert_key)
if not alert_data:
continue
alert_type = alert_data.get(b'type', b'').decode()
container_id = alert_data.get(b'container_id', b'').decode()
severity = alert_data.get(b'severity', b'').decode()
# Execute response based on alert type and severity
if alert_type == 'predicted_high_cpu' and severity == 'high':
await self.scale_container_resources(container_id, cpu_increase=0.5)
elif alert_type == 'predicted_high_memory' and severity == 'high':
await self.scale_container_resources(container_id, memory_increase=512)
elif alert_type == 'security_violation' and severity == 'critical':
await self.quarantine_container(container_id)
elif alert_type == 'cpu_anomaly':
await self.restart_container_if_needed(container_id)
async def scale_container_resources(self, container_id: str, cpu_increase: float = 0, memory_increase: int = 0):
"""Scale container resources"""
try:
container = self.docker_client.containers.get(container_id)
            # Get current resource limits (CpuQuota of 0 or -1 means "unlimited")
            host_config = container.attrs['HostConfig']
            cpu_quota = host_config.get('CpuQuota') or 0
            current_cpu = cpu_quota / 100000 if cpu_quota > 0 else 1.0
            current_memory = host_config.get('Memory', 0)
            # Calculate new limits
            new_cpu = current_cpu + cpu_increase
            new_memory = current_memory + (memory_increase * 1024 * 1024)  # Convert MB to bytes
            # Applying new limits requires a container update or recreate in Docker
            logging.info(f"Scaling container {container_id}: "
                         f"CPU {current_cpu:.2f} -> {new_cpu:.2f}, "
                         f"Memory {current_memory} -> {new_memory} bytes")
# In production, this would integrate with orchestration platform
# For Docker Compose, would update the compose file and recreate
except Exception as e:
logging.error(f"Error scaling container {container_id}: {e}")
async def quarantine_container(self, container_id: str):
"""Quarantine container by isolating it"""
try:
container = self.docker_client.containers.get(container_id)
# Disconnect from all networks except monitoring
networks = container.attrs['NetworkSettings']['Networks']
for network_name in networks:
if network_name != 'monitoring':
network = self.docker_client.networks.get(network_name)
network.disconnect(container)
logging.critical(f"Container {container_id} quarantined due to security violation")
except Exception as e:
logging.error(f"Error quarantining container {container_id}: {e}")
    def calculate_cpu_percent(self, stats: Dict) -> float:
        """Calculate CPU percentage from stats"""
        cpu_delta = stats['cpu_stats']['cpu_usage']['total_usage'] - \
            stats['precpu_stats']['cpu_usage']['total_usage']
        system_delta = stats['cpu_stats']['system_cpu_usage'] - \
            stats['precpu_stats']['system_cpu_usage']
        if system_delta > 0 and cpu_delta > 0:
            # 'percpu_usage' is absent on cgroup v2 hosts; prefer 'online_cpus'
            online_cpus = stats['cpu_stats'].get('online_cpus') or \
                len(stats['cpu_stats']['cpu_usage'].get('percpu_usage', [])) or 1
            return (cpu_delta / system_delta) * online_cpus * 100.0
        return 0.0
    def calculate_memory_percent(self, stats: Dict) -> float:
        """Calculate memory percentage from stats"""
        usage = stats['memory_stats']['usage']
        limit = stats['memory_stats']['limit']
        return (usage / limit) * 100.0 if limit > 0 else 0.0

    def get_network_rx(self, stats: Dict) -> int:
        """Total bytes received across all container networks"""
        return sum(net['rx_bytes'] for net in stats.get('networks', {}).values())

    def get_network_tx(self, stats: Dict) -> int:
        """Total bytes transmitted across all container networks"""
        return sum(net['tx_bytes'] for net in stats.get('networks', {}).values())

    def get_disk_read(self, stats: Dict) -> int:
        """Total bytes read from block devices"""
        entries = stats.get('blkio_stats', {}).get('io_service_bytes_recursive') or []
        return sum(e['value'] for e in entries if e['op'].lower() == 'read')

    def get_disk_write(self, stats: Dict) -> int:
        """Total bytes written to block devices"""
        entries = stats.get('blkio_stats', {}).get('io_service_bytes_recursive') or []
        return sum(e['value'] for e in entries if e['op'].lower() == 'write')
async def create_security_alert(self, container, security_score: float):
"""Create security alert"""
alert_id = f"sec_{container.id}_{int(datetime.now().timestamp())}"
severity = 'critical' if security_score < 50 else 'high' if security_score < 70 else 'medium'
await self.redis.hset(
f"alert:{alert_id}",
mapping={
'type': 'security_violation',
'container_id': container.id,
'container_name': container.name,
'severity': severity,
'security_score': security_score,
'timestamp': datetime.now().isoformat()
}
)
logging.warning(f"Security alert created for {container.name}: score {security_score}")
async def create_performance_alert(self, container_id: str, metric_type: str, value: float, alert_type: str):
"""Create performance alert"""
alert_id = f"perf_{container_id}_{int(datetime.now().timestamp())}"
severity = 'high' if value > 90 else 'medium' if value > 80 else 'low'
await self.redis.hset(
f"alert:{alert_id}",
mapping={
'type': alert_type,
'container_id': container_id,
'metric_type': metric_type,
'value': value,
'severity': severity,
'timestamp': datetime.now().isoformat()
}
)
logging.warning(f"Performance alert created for {container_id}: {metric_type} = {value}")
async def main():
"""Main function"""
config = {
'security': {
'alert_threshold': 70.0,
'auto_quarantine': True
},
'performance': {
'cpu_threshold': 80.0,
'memory_threshold': 85.0,
'auto_scaling': True
}
}
# Save config
with open('/tmp/ops-config.yaml', 'w') as f:
yaml.dump(config, f)
# Initialize and run
ops_manager = IntelligentOpsManager('/tmp/ops-config.yaml')
await ops_manager.initialize()
logging.info("Intelligent Operations Platform started")
await ops_manager.continuous_monitoring()
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
asyncio.run(main())
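Because alerts are written to Redis as alert:* hashes, they can be inspected out-of-band while the platform runs. A small sketch, assuming Redis is reachable on localhost:6379 as configured in initialize():
import asyncio
import redis.asyncio as redis

async def show_alerts():
    r = redis.from_url("redis://localhost:6379", decode_responses=True)
    for key in sorted(await r.keys("alert:*")):
        alert = await r.hgetall(key)
        print(f"{key}: [{alert.get('severity')}] {alert.get('type')} "
              f"on container {alert.get('container_id', '')[:12]}")
    await r.aclose()

asyncio.run(show_alerts())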
Summary
This comprehensive Docker Security and Optimization guide has covered:
Foundation to Enterprise
- Security Fundamentals: Container hardening, vulnerability scanning, and secrets management
- Performance Basics: Resource optimization, monitoring, and tuning techniques
- Advanced Techniques: Custom security plugins, performance profiling, and ML-driven optimization
- Production Excellence: Complete security and performance platforms with automation
Enterprise-Grade Solutions
- Security Operations: Comprehensive SIEM integration with automated threat response
- Performance Intelligence: ML-driven performance prediction and auto-scaling
- Compliance Automation: Continuous compliance monitoring and reporting
- Operational Excellence: Intelligent operations platform with predictive capabilities
Key Achievements
You now have the expertise to:
- Implement Enterprise Security: Multi-layered security with automated threat detection and response
- Optimize Performance: Advanced performance tuning with predictive scaling and optimization
- Ensure Compliance: Automated compliance monitoring across multiple frameworks
- Operate Intelligently: AI-driven operations with predictive analytics and automated remediation
- Scale Securely: Production-ready security and performance at enterprise scale
Congratulations! You’ve mastered Docker security and optimization from basic concepts to enterprise-grade implementations. You can now design, implement, and operate production-ready containerized environments that meet the highest standards of security, performance, and operational excellence.
This completes our comprehensive journey through Docker security and optimization, providing you with the knowledge and tools to build and maintain secure, high-performance containerized applications in any environment.