Writing Effective Dockerfiles

I’ve written hundreds of Dockerfiles over the years, and I’ve learned that the difference between a good and bad Dockerfile often determines whether your containers succeed in production. A well-crafted Dockerfile creates smaller, more secure, and faster-building images.

Dockerfile Structure and Layer Optimization

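Instructions that change the filesystem, chiefly RUN, COPY, and ADD, each create a new layer; most other instructions only add metadata. Understanding this is crucial for building efficient images. Docker caches layers and reuses them until an instruction or its inputs change, so ordering instructions correctly can dramatically speed up your builds.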

# Poor layer ordering - cache invalidated frequently
FROM node:18-alpine
COPY . /app
WORKDIR /app
RUN npm install
EXPOSE 3000
CMD ["npm", "start"]

This Dockerfile copies all source code before installing dependencies. Any code change invalidates the cache for the COPY layer and every layer after it, so npm install reruns on every build.

# Better layer ordering - leverages cache effectively
FROM node:18-alpine
WORKDIR /app

# Copy package files first
COPY package*.json ./

# Install dependencies (cached unless package files change)
RUN npm ci --omit=dev

# Copy source code last
COPY . .

EXPOSE 3000
CMD ["npm", "start"]

Now dependency installation only runs when package files change, not on every code modification. I’ve seen this simple change reduce build times from 5 minutes to 30 seconds.
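When package files do change, the install still has to download everything again. A minimal sketch of one way to soften that, assuming BuildKit is the active builder (the default in recent Docker releases): a cache mount keeps npm's download cache between builds without adding it to the image.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-alpine
WORKDIR /app

COPY package*.json ./

# /root/.npm is npm's default cache directory when building as root.
# The cache mount persists across builds but is not stored in any image layer.
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev

COPY . .

EXPOSE 3000
CMD ["npm", "start"]
```

The layer cache still decides whether the RUN step executes at all; the cache mount only makes the step cheaper when it does run.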

Choosing the Right Base Image

Base image selection affects security, size, and compatibility. I’ve learned to be deliberate about this choice rather than defaulting to full distributions.

# Full Ubuntu image - large but familiar
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3 python3-pip
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY . .
CMD ["python3", "app.py"]

This works but creates a 200MB+ image for a simple Python app. Alpine Linux offers a much smaller alternative:

# Alpine-based image - smaller and more secure
FROM python:3.11-alpine
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]

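Alpine reduces the image size to roughly 50MB. However, Alpine uses musl libc instead of glibc, which can cause compatibility issues: Python packages that only publish glibc-based wheels must be compiled from source on Alpine, which needs extra build tools and can fail outright.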
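If a dependency does not install cleanly on Alpine, a Debian-based slim image is a common middle ground. The sketch below assumes the same app.py and requirements.txt layout as the examples above; it is larger than the Alpine image but keeps glibc, so prebuilt wheels generally install without a compiler.

```dockerfile
# Debian-based slim image - glibc compatibility at a moderate size
FROM python:3.11-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "app.py"]
```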

For maximum security and minimal size, consider distroless images:

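# Multi-stage build with distroless runtime
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
# --user installs into /root/.local, which is easy to copy into the next stage
RUN pip install --no-cache-dir --user -r requirements.txt

# The runtime image's Python minor version must match the builder's,
# otherwise the packages under /root/.local will not be found
FROM gcr.io/distroless/python3
COPY --from=builder /root/.local /root/.local
COPY . /app
WORKDIR /app
ENV PATH=/root/.local/bin:$PATH
# The distroless entrypoint is the Python interpreter, so CMD is just the script
CMD ["app.py"]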

Multi-Stage Builds for Compiled Applications

Multi-stage builds separate build-time dependencies from runtime requirements. This is especially powerful for compiled languages:

# Go application with multi-stage build
FROM golang:1.21-alpine AS builder

WORKDIR /app

# Copy go mod files first for better caching
COPY go.mod go.sum ./
RUN go mod download

# Copy source and build
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o app ./cmd/server

# Runtime stage
FROM alpine:3.18
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/app .
CMD ["./app"]

The builder stage includes the full Go toolchain (300MB+), while the runtime stage contains only the compiled binary and a minimal Alpine base (typically 10 to 20MB total, depending on the binary).
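Because the binary above is built with CGO_ENABLED=0, it is statically linked and does not need a libc at runtime. A sketch of a variant that goes one step further and uses an empty scratch base, assuming the same ./cmd/server layout and that the application only needs CA certificates from the OS:

```dockerfile
# Go application with a scratch runtime stage
FROM golang:1.21-alpine AS builder
# Make sure the CA bundle exists so it can be copied into the final image
RUN apk --no-cache add ca-certificates
WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /out/app ./cmd/server

# Runtime stage - contains nothing except what is copied in
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /out/app /app
# scratch has no /etc/passwd, so the non-root user is given as a numeric UID:GID
USER 65534:65534
ENTRYPOINT ["/app"]
```

The trade-off is debuggability: there is no shell or package manager in the final image, so the Alpine runtime stage is often the more practical choice while a service is still being developed.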

Managing Dependencies and Package Installation

How you install packages significantly impacts image size and security. I always clean up package caches and temporary files in the same layer where they’re created.

# Poor practice - leaves package cache
FROM ubuntu:22.04
RUN apt-get update
RUN apt-get install -y curl nginx
RUN rm -rf /var/lib/apt/lists/*

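Each RUN instruction creates a separate layer, so the package lists written by apt-get update remain in the earlier layers even though a later layer deletes them, and the image gets no smaller. A standalone apt-get update layer can also be served from cache long after the package index has gone stale.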

# Better practice - single layer with cleanup
FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        curl \
        nginx && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean

The --no-install-recommends flag stops apt from installing recommended (but not required) packages alongside the ones you asked for, and cleanup happens in the same layer as installation.

Using .dockerignore Effectively

The .dockerignore file prevents unnecessary files from being sent to the Docker daemon during builds:

# Version control
.git
.gitignore

# Dependencies
node_modules
__pycache__
*.pyc

# Build artifacts
dist
build
target

# IDE files
.vscode
.idea

# Environment files
.env
.env.local

# Documentation
README.md
docs/

I’ve seen builds fail because someone accidentally included a 2GB dataset in the build context. A good .dockerignore prevents these issues.

Environment Variables and Configuration

Handle configuration through environment variables rather than baking values into images:

FROM node:18-alpine

WORKDIR /app

# Set default environment variables
ENV NODE_ENV=production
ENV PORT=3000
ENV LOG_LEVEL=info

COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001 -G nodejs

USER nextjs

EXPOSE $PORT

CMD ["node", "server.js"]
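The example above sets runtime defaults with ENV. Values that are only known at build time are a slightly different case, and the sketch below shows one way to separate the two. APP_VERSION is a hypothetical name used for illustration, and the example assumes the same server.js entry point.

```dockerfile
FROM node:18-alpine
WORKDIR /app

# Build-time value, set with: docker build --build-arg APP_VERSION=1.4.2 .
# (APP_VERSION is a hypothetical name; it falls back to "dev" if not supplied)
ARG APP_VERSION=dev

# Runtime defaults, overridable with: docker run -e LOG_LEVEL=debug ...
ENV NODE_ENV=production \
    PORT=3000 \
    LOG_LEVEL=info \
    APP_VERSION=$APP_VERSION

COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# The official node images ship with a non-root "node" user
USER node

# EXPOSE is documentation only; overriding PORT at runtime does not change it
EXPOSE 3000
CMD ["node", "server.js"]
```

ARG values are not available in the running container unless they are copied into an ENV as shown, and neither mechanism is suitable for secrets because both are recorded in the image metadata.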

Security Considerations

Never run containers as root unless absolutely necessary:

FROM python:3.11-slim

# Create app user
RUN groupadd -r appuser && useradd -r -g appuser appuser

WORKDIR /app

# Install dependencies as root
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy app with the right ownership (a separate chown -R would duplicate the files in a new layer)
COPY --chown=appuser:appuser . .

# Switch to non-root user
USER appuser

CMD ["python", "app.py"]

Avoid including secrets in images:

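# Bad - secret baked into image (visible to anyone who can run docker history or docker inspect)
FROM alpine
ENV API_KEY=secret123
CMD ["./app"]

# Good - empty default only; the real value is supplied when the container starts,
# for example with docker run -e API_KEY=... or an orchestrator's secret store
FROM alpine
ENV API_KEY=""
CMD ["./app"]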
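Runtime injection covers secrets the application needs, but some secrets are needed during the build itself, such as credentials for a private package index. A minimal sketch using a BuildKit secret mount, assuming BuildKit is enabled; the secret id pip_index_url and the file name in the build command are hypothetical.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.11-slim
WORKDIR /app

COPY requirements.txt .

# The secret is mounted at /run/secrets/pip_index_url for this RUN step only
# and is never written to an image layer. "pip_index_url" is a hypothetical id.
RUN --mount=type=secret,id=pip_index_url \
    PIP_INDEX_URL="$(cat /run/secrets/pip_index_url)" \
    pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "app.py"]

# Build with (file name is an assumption):
#   docker build --secret id=pip_index_url,src=./pip_index_url.txt .
```

Unlike ARG or ENV, the value passed this way does not appear in the image history.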

The goal is creating images that are small, secure, and fast to build. Every instruction should serve a purpose, and the order should optimize for caching and security.

In the next part, I’ll cover container security in depth, including vulnerability scanning, image signing, and runtime security practices that I’ve learned are essential for production deployments.