Graceful Shutdown Patterns in Go
Implement proper shutdown procedures for Go applications.
Signal Handling Fundamentals
At the core of graceful shutdown is the ability to detect and respond to termination signals. Before diving into complex implementations, let’s establish a solid understanding of signal handling in Go.
Understanding OS Signals
Operating systems communicate with processes through signals. The most common signals relevant to application lifecycle management include:
- SIGINT: Interrupt signal, typically sent when the user presses Ctrl+C
- SIGTERM: Termination signal, the standard way to request graceful termination
- SIGKILL: Kill signal, forces immediate termination (cannot be caught or ignored)
- SIGHUP: Hangup signal, traditionally used to indicate a controlling terminal has closed
In Go, we can capture and handle these signals using the os/signal package and channels:
package main
import (
"fmt"
"os"
"os/signal"
"syscall"
"time"
)
func main() {
// Create a channel to receive OS signals
sigs := make(chan os.Signal, 1)
// Register for specific signals
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
// Create a channel to indicate when processing is done
done := make(chan bool, 1)
// Start a goroutine to handle signals
go func() {
// Block until a signal is received
sig := <-sigs
fmt.Printf("Received signal: %s\n", sig)
// Perform cleanup operations
fmt.Println("Starting graceful shutdown...")
time.Sleep(2 * time.Second) // Simulate cleanup work
fmt.Println("Cleanup completed, shutting down...")
// Signal completion
done <- true
}()
fmt.Println("Application running... Press Ctrl+C to terminate")
// Block until done signal is received
<-done
fmt.Println("Application stopped")
}
This simple example demonstrates the basic pattern for signal handling in Go:
- Create a channel to receive signals
- Register for specific signals using signal.Notify()
- Start a goroutine to handle signals and perform cleanup
- Block the main goroutine until cleanup is complete
Context-Based Cancellation
Go’s context package provides a powerful mechanism for propagating cancellation signals throughout your application. This is particularly useful for graceful shutdown scenarios:
package main
import (
"context"
"fmt"
"os"
"os/signal"
"sync"
"syscall"
"time"
)
func main() {
// Create a base context with cancellation capability
ctx, cancel := context.WithCancel(context.Background())
// Create a channel to receive OS signals
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
// Create a WaitGroup to track active workers
var wg sync.WaitGroup
// Start some worker goroutines
for i := 1; i <= 3; i++ {
wg.Add(1)
go worker(ctx, i, &wg)
}
// Handle signals
go func() {
sig := <-sigs
fmt.Printf("\nReceived signal: %s\n", sig)
fmt.Println("Cancelling context...")
cancel() // This will propagate cancellation to all workers
}()
fmt.Println("Application running with workers... Press Ctrl+C to terminate")
// Wait for all workers to finish
wg.Wait()
fmt.Println("All workers have completed, shutting down...")
}
func worker(ctx context.Context, id int, wg *sync.WaitGroup) {
defer wg.Done()
fmt.Printf("Worker %d starting\n", id)
// Simulate work with context awareness
for {
select {
case <-time.After(time.Second):
fmt.Printf("Worker %d performing task\n", id)
case <-ctx.Done():
fmt.Printf("Worker %d received cancellation signal, cleaning up...\n", id)
// Simulate cleanup work
time.Sleep(time.Duration(id) * 500 * time.Millisecond)
fmt.Printf("Worker %d cleanup complete\n", id)
return
}
}
}
This example demonstrates how to use context cancellation to coordinate shutdown across multiple goroutines:
- Create a cancellable context
- Pass the context to all workers
- When a termination signal is received, call cancel() to notify all workers
- Use a WaitGroup to ensure all workers complete their cleanup before the application exits
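If you are on Go 1.16 or later, the signal-plus-context wiring above can be collapsed into a single call: signal.NotifyContext returns a context that is cancelled automatically when one of the listed signals arrives. A minimal sketch of the same fundamentals:

package main

import (
    "context"
    "fmt"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // ctx is cancelled automatically when SIGINT or SIGTERM arrives;
    // stop() releases the signal registration.
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    fmt.Println("Application running... Press Ctrl+C to terminate")

    <-ctx.Done() // Block until a signal cancels the context
    fmt.Println("Shutdown signal received, cleaning up...")
    time.Sleep(1 * time.Second) // Simulate cleanup work
    fmt.Println("Application stopped")
}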
Graceful Shutdown Patterns
With the fundamentals established, let’s explore more sophisticated patterns for implementing graceful shutdown in different types of Go applications.
HTTP Server Graceful Shutdown
Go’s standard library has provided built-in support for graceful shutdown of HTTP servers since Go 1.8. This allows in-flight requests on existing connections to complete before the server shuts down:
package main
import (
"context"
"fmt"
"log"
"net/http"
"os"
"os/signal"
"syscall"
"time"
)
func main() {
// Create a new server
server := &http.Server{
Addr: ":8080",
Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Simulate a long-running request
time.Sleep(5 * time.Second)
fmt.Fprintf(w, "Hello, World!")
}),
}
// Channel to listen for errors coming from the listener
serverErrors := make(chan error, 1)
// Start the server in a goroutine
go func() {
log.Printf("Server listening on %s", server.Addr)
serverErrors <- server.ListenAndServe()
}()
// Channel to listen for interrupt signals
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)
// Block until we receive a signal or an error
select {
case err := <-serverErrors:
log.Fatalf("Error starting server: %v", err)
case sig := <-shutdown:
log.Printf("Received signal: %v", sig)
// Create a deadline for graceful shutdown
ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
defer cancel()
// Gracefully shutdown the server
log.Printf("Shutting down server gracefully with timeout: %s", ctx.Deadline())
if err := server.Shutdown(ctx); err != nil {
log.Printf("Server shutdown error: %v", err)
// Force close if graceful shutdown fails
if err := server.Close(); err != nil {
log.Printf("Server close error: %v", err)
}
}
log.Println("Server shutdown complete")
}
}
Key aspects of this pattern:
- Start the HTTP server in a separate goroutine
- Wait for termination signals
- When a signal is received, call server.Shutdown() with a timeout context
- If graceful shutdown fails within the timeout, force close the server
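Two details worth knowing: Shutdown does not wait for hijacked or upgraded connections (such as WebSockets), and it offers no direct hook for releasing related resources. http.Server.RegisterOnShutdown lets you register callbacks that run when Shutdown begins, which is a natural place for that kind of cleanup. A brief sketch, with an illustrative callback body:

package main

import (
    "context"
    "log"
    "net/http"
    "time"
)

func main() {
    server := &http.Server{Addr: ":8080"}

    // Callbacks registered here run when Shutdown is called. Shutdown does
    // not wait for hijacked connections, so this is where you would notify
    // and close them; the log line is a stand-in for that work.
    server.RegisterOnShutdown(func() {
        log.Println("shutdown started: closing hijacked/upgraded connections")
    })

    go func() {
        if err := server.ListenAndServe(); err != http.ErrServerClosed {
            log.Printf("server error: %v", err)
        }
    }()

    // ... wait for a termination signal as in the example above, then:
    ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    defer cancel()
    if err := server.Shutdown(ctx); err != nil {
        log.Printf("shutdown error: %v", err)
    }
}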
Multiple Server Coordination
In real-world applications, you might need to coordinate the shutdown of multiple servers or services:
package main
import (
"context"
"fmt"
"log"
"net/http"
"os"
"os/signal"
"sync"
"syscall"
"time"
)
type Server struct {
name string
httpServer *http.Server
}
func NewServer(name string, addr string, handler http.Handler) *Server {
return &Server{
name: name,
httpServer: &http.Server{
Addr: addr,
Handler: handler,
},
}
}
func (s *Server) Start(wg *sync.WaitGroup) {
defer wg.Done()
log.Printf("%s server starting on %s", s.name, s.httpServer.Addr)
if err := s.httpServer.ListenAndServe(); err != http.ErrServerClosed {
log.Printf("%s server error: %v", s.name, err)
}
log.Printf("%s server stopped", s.name)
}
func (s *Server) Shutdown(ctx context.Context) error {
log.Printf("Shutting down %s server...", s.name)
return s.httpServer.Shutdown(ctx)
}
func main() {
// Create API and metrics servers
apiServer := NewServer("API", ":8080", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
time.Sleep(2 * time.Second) // Simulate work
fmt.Fprintf(w, "API response")
}))
metricsServer := NewServer("Metrics", ":9090", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Metrics data")
}))
// WaitGroup for tracking running servers
var wg sync.WaitGroup
// Start servers
wg.Add(2)
go apiServer.Start(&wg)
go metricsServer.Start(&wg)
// Channel to listen for interrupt signals
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)
// Wait for shutdown signal
sig := <-shutdown
log.Printf("Received signal: %v", sig)
// Create a deadline for graceful shutdown
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Shutdown servers in order (metrics first, then API)
if err := metricsServer.Shutdown(ctx); err != nil {
log.Printf("Metrics server shutdown error: %v", err)
}
if err := apiServer.Shutdown(ctx); err != nil {
log.Printf("API server shutdown error: %v", err)
}
// Wait for servers to finish
log.Println("Waiting for servers to complete shutdown...")
wg.Wait()
log.Println("All servers shutdown complete")
}
This pattern demonstrates:
- Encapsulating servers in a common interface
- Starting each server in its own goroutine
- Coordinating shutdown in a specific order
- Using a WaitGroup to ensure all servers have fully stopped
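If you prefer not to hand-roll the WaitGroup and error plumbing, the golang.org/x/sync/errgroup module ties the servers' lifetimes to a shared context and surfaces the first error. A hedged sketch of the same two-server setup using errgroup (note that this pulls in an external module):

package main

import (
    "context"
    "log"
    "net/http"
    "os/signal"
    "syscall"
    "time"

    "golang.org/x/sync/errgroup"
)

func main() {
    // Root context is cancelled by SIGINT/SIGTERM.
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    api := &http.Server{Addr: ":8080"}
    metrics := &http.Server{Addr: ":9090"}

    g, gctx := errgroup.WithContext(ctx)

    for _, srv := range []*http.Server{api, metrics} {
        srv := srv
        // Run each server; a listen error cancels gctx and unwinds the group.
        g.Go(func() error {
            if err := srv.ListenAndServe(); err != http.ErrServerClosed {
                return err
            }
            return nil
        })
        // Shut the server down when the group context is cancelled
        // (signal received or a sibling server failed).
        g.Go(func() error {
            <-gctx.Done()
            shutdownCtx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
            defer cancel()
            return srv.Shutdown(shutdownCtx)
        })
    }

    if err := g.Wait(); err != nil {
        log.Printf("shutdown finished with error: %v", err)
    }
    log.Println("all servers stopped")
}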
Advanced Patterns and Techniques
Resource Cleanup and Management
Proper resource management is critical during shutdown. Let’s explore patterns for cleaning up various types of resources.
Database Connection Cleanup
Ensuring database connections are properly closed prevents connection leaks and allows transactions to complete:
package main
import (
"context"
"database/sql"
"fmt"
"log"
"os"
"os/signal"
"syscall"
"time"
_ "github.com/go-sql-driver/mysql"
)
type App struct {
db *sql.DB
}
func NewApp() (*App, error) {
// Open database connection
db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname")
if err != nil {
return nil, fmt.Errorf("failed to open database: %w", err)
}
// Configure connection pool
db.SetMaxOpenConns(25)
db.SetMaxIdleConns(25)
db.SetConnMaxLifetime(5 * time.Minute)
// Verify connection
if err := db.Ping(); err != nil {
db.Close() // Close on error
return nil, fmt.Errorf("failed to ping database: %w", err)
}
return &App{db: db}, nil
}
func (a *App) Shutdown(ctx context.Context) error {
log.Println("Closing database connections...")
// Create a timeout for DB shutdown if not already set
dbCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
// Use a channel to signal completion or timeout
done := make(chan struct{})
var err error
go func() {
// Close the database connection
err = a.db.Close()
close(done)
}()
// Wait for completion or timeout
select {
case <-done:
if err != nil {
return fmt.Errorf("error closing database: %w", err)
}
log.Println("Database connections closed successfully")
return nil
case <-dbCtx.Done():
return fmt.Errorf("database shutdown timed out: %w", dbCtx.Err())
}
}
func main() {
// Initialize application
app, err := NewApp()
if err != nil {
log.Fatalf("Failed to initialize app: %v", err)
}
// Channel to listen for interrupt signals
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)
// Block until we receive a signal
sig := <-shutdown
log.Printf("Received signal: %v", sig)
// Create a deadline for graceful shutdown
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Perform application shutdown
if err := app.Shutdown(ctx); err != nil {
log.Printf("Error during shutdown: %v", err)
os.Exit(1)
}
log.Println("Application shutdown complete")
}
This pattern demonstrates:
- Proper database connection pool configuration
- Graceful shutdown with timeout handling
- Error handling during shutdown
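Closing the pool is only half of the job: queries that are still running when shutdown starts should also observe the shutdown deadline. The Context variants of the database/sql methods make this straightforward. A small illustrative sketch (the orders table and queries are hypothetical, and the imports match the example above):

// processOrders is an illustrative worker: it passes the shutdown-aware
// context into every query, so in-flight database work is cancelled when
// the context's deadline expires or cancel() is called.
func processOrders(ctx context.Context, db *sql.DB) error {
    rows, err := db.QueryContext(ctx, "SELECT id FROM orders WHERE status = ?", "pending")
    if err != nil {
        return fmt.Errorf("query orders: %w", err)
    }
    defer rows.Close()

    for rows.Next() {
        var id int
        if err := rows.Scan(&id); err != nil {
            return err
        }
        // ExecContext also respects cancellation, so updates stop promptly
        // once shutdown begins.
        if _, err := db.ExecContext(ctx, "UPDATE orders SET status = ? WHERE id = ?", "processed", id); err != nil {
            return fmt.Errorf("update order %d: %w", id, err)
        }
    }
    return rows.Err()
}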
Worker Pool Graceful Shutdown
Worker pools are common in Go applications. Here’s a pattern for gracefully shutting down a worker pool:
package main
import (
"context"
"fmt"
"log"
"os"
"os/signal"
"sync"
"syscall"
"time"
)
// Job represents a unit of work
type Job struct {
ID int
}
// WorkerPool manages a pool of workers
type WorkerPool struct {
jobs chan Job
results chan Result
workerCount int
shutdown chan struct{}
wg sync.WaitGroup
}
// Result represents the outcome of a job
type Result struct {
JobID int
Output string
Error error
}
// NewWorkerPool creates a new worker pool
func NewWorkerPool(workerCount int, queueSize int) *WorkerPool {
return &WorkerPool{
jobs: make(chan Job, queueSize),
results: make(chan Result, queueSize),
workerCount: workerCount,
shutdown: make(chan struct{}),
}
}
// Start launches the worker pool
func (p *WorkerPool) Start() {
// Start workers
for i := 1; i <= p.workerCount; i++ {
p.wg.Add(1)
go p.worker(i)
}
log.Printf("Started worker pool with %d workers", p.workerCount)
}
// worker processes jobs
func (p *WorkerPool) worker(id int) {
defer p.wg.Done()
log.Printf("Worker %d starting", id)
for {
select {
case job, ok := <-p.jobs:
if !ok {
log.Printf("Worker %d shutting down: job channel closed", id)
return
}
// Process job
log.Printf("Worker %d processing job %d", id, job.ID)
// Simulate work
time.Sleep(time.Duration(job.ID%3+1) * time.Second)
// Send result
p.results <- Result{
JobID: job.ID,
Output: fmt.Sprintf("Result for job %d", job.ID),
}
case <-p.shutdown:
log.Printf("Worker %d received shutdown signal", id)
return
}
}
}
// Submit adds a job to the pool
func (p *WorkerPool) Submit(job Job) {
p.jobs <- job
}
// Results returns the results channel
func (p *WorkerPool) Results() <-chan Result {
return p.results
}
// Shutdown gracefully shuts down the worker pool
func (p *WorkerPool) Shutdown(ctx context.Context) {
log.Println("Worker pool shutting down...")
// Signal all workers to stop
close(p.shutdown)
// Close the jobs channel to prevent new jobs
close(p.jobs)
// Create a channel to signal when workers are done
done := make(chan struct{})
go func() {
// Wait for all workers to finish
p.wg.Wait()
close(done)
}()
// Wait for workers to finish or timeout
select {
case <-done:
log.Println("All workers have stopped")
// Close the results channel only after every worker has returned;
// closing it while a worker is still sending a result would panic
close(p.results)
case <-ctx.Done():
log.Printf("Worker pool shutdown timed out: %v", ctx.Err())
}
}
func main() {
// Create a worker pool with 5 workers and a queue size of 10
pool := NewWorkerPool(5, 10)
pool.Start()
// Start a goroutine to process results
go func() {
for result := range pool.Results() {
log.Printf("Got result: %s (error: %v)", result.Output, result.Error)
}
log.Println("Results channel closed")
}()
// Submit some jobs
for i := 1; i <= 10; i++ {
pool.Submit(Job{ID: i})
}
// Channel to listen for interrupt signals
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)
// Block until we receive a signal
sig := <-shutdown
log.Printf("Received signal: %v", sig)
// Create a deadline for graceful shutdown
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
// Shutdown the worker pool
pool.Shutdown(ctx)
log.Println("Application shutdown complete")
}
This pattern demonstrates:
- Creating a worker pool with controlled concurrency
- Signaling workers to stop processing
- Waiting for in-progress work to complete
- Handling shutdown timeouts
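One design choice to be aware of: because Shutdown above closes the shutdown channel, jobs that are queued but not yet started may be abandoned. If you would rather finish everything already accepted, a drain-style variant can close only the jobs channel and let the workers run the queue dry. A hedged sketch of such a method (DrainAndStop is a hypothetical addition to the WorkerPool above):

// DrainAndStop is a hypothetical alternative to Shutdown: it stops accepting
// new work but lets workers finish every job already in the queue.
func (p *WorkerPool) DrainAndStop(ctx context.Context) {
    log.Println("Worker pool draining...")

    // Closing jobs (and not the shutdown channel) means each worker keeps
    // pulling work until the channel is empty, then exits via the !ok branch.
    close(p.jobs)

    done := make(chan struct{})
    go func() {
        p.wg.Wait()
        close(done)
    }()

    select {
    case <-done:
        log.Println("All queued jobs processed")
        close(p.results) // safe: every worker has returned
    case <-ctx.Done():
        log.Printf("Worker pool drain timed out: %v", ctx.Err())
    }
}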
Implementation Strategies
Coordinating Multiple Services
In microservice architectures, coordinating shutdown across multiple services requires careful orchestration.
Dependency-Aware Shutdown
Services often have dependencies that dictate the order of shutdown. Here’s a pattern for dependency-aware shutdown:
package main
import (
"context"
"fmt"
"log"
"os"
"os/signal"
"sync"
"syscall"
"time"
)
// Service represents a component that can be started and stopped
type Service interface {
Name() string
Start() error
Stop(ctx context.Context) error
Dependencies() []Service
}
// BaseService provides common functionality for services
type BaseService struct {
name string
dependencies []Service
}
func (s *BaseService) Name() string {
return s.name
}
func (s *BaseService) Dependencies() []Service {
return s.dependencies
}
// DatabaseService represents a database connection
type DatabaseService struct {
BaseService
}
func NewDatabaseService() *DatabaseService {
return &DatabaseService{
BaseService: BaseService{
name: "database",
dependencies: []Service{},
},
}
}
func (s *DatabaseService) Start() error {
log.Printf("Starting %s service", s.Name())
time.Sleep(1 * time.Second) // Simulate startup
return nil
}
func (s *DatabaseService) Stop(ctx context.Context) error {
log.Printf("Stopping %s service", s.Name())
time.Sleep(2 * time.Second) // Simulate cleanup
return nil
}
// CacheService represents a cache service
type CacheService struct {
BaseService
}
func NewCacheService() *CacheService {
return &CacheService{
BaseService: BaseService{
name: "cache",
dependencies: []Service{},
},
}
}
func (s *CacheService) Start() error {
log.Printf("Starting %s service", s.Name())
time.Sleep(500 * time.Millisecond) // Simulate startup
return nil
}
func (s *CacheService) Stop(ctx context.Context) error {
log.Printf("Stopping %s service", s.Name())
time.Sleep(1 * time.Second) // Simulate cleanup
return nil
}
// APIService represents an API server
type APIService struct {
BaseService
}
func NewAPIService(db *DatabaseService, cache *CacheService) *APIService {
return &APIService{
BaseService: BaseService{
name: "api",
dependencies: []Service{db, cache},
},
}
}
func (s *APIService) Start() error {
log.Printf("Starting %s service", s.Name())
time.Sleep(1 * time.Second) // Simulate startup
return nil
}
func (s *APIService) Stop(ctx context.Context) error {
log.Printf("Stopping %s service", s.Name())
time.Sleep(3 * time.Second) // Simulate cleanup
return nil
}
// Application coordinates all services
type Application struct {
services []Service
mu sync.Mutex
}
func NewApplication(services ...Service) *Application {
return &Application{
services: services,
}
}
// Start starts all services in dependency order
func (a *Application) Start() error {
started := make(map[string]bool)
var startService func(Service) error
startService = func(s Service) error {
a.mu.Lock()
if started[s.Name()] {
a.mu.Unlock()
return nil
}
a.mu.Unlock()
// Start dependencies first
for _, dep := range s.Dependencies() {
if err := startService(dep); err != nil {
return fmt.Errorf("failed to start dependency %s: %w", dep.Name(), err)
}
}
// Start the service
if err := s.Start(); err != nil {
return fmt.Errorf("failed to start service %s: %w", s.Name(), err)
}
a.mu.Lock()
started[s.Name()] = true
a.mu.Unlock()
return nil
}
// Start all services
for _, s := range a.services {
if err := startService(s); err != nil {
return err
}
}
return nil
}
// Stop stops all services in reverse dependency order
func (a *Application) Stop(ctx context.Context) error {
// Build a reverse dependency graph
dependedOnBy := make(map[string][]Service)
for _, s := range a.services {
for _, dep := range s.Dependencies() {
dependedOnBy[dep.Name()] = append(dependedOnBy[dep.Name()], s)
}
}
// Find services with no dependents (leaf nodes)
var leaves []Service
for _, s := range a.services {
if len(dependedOnBy[s.Name()]) == 0 {
leaves = append(leaves, s)
}
}
// Stop services in reverse dependency order
stopped := make(map[string]bool)
var wg sync.WaitGroup
errCh := make(chan error, len(a.services))
var stopService func(Service)
stopService = func(s Service) {
defer wg.Done()
a.mu.Lock()
if stopped[s.Name()] {
a.mu.Unlock()
return
}
stopped[s.Name()] = true
a.mu.Unlock()
// Stop the service
if err := s.Stop(ctx); err != nil {
errCh <- fmt.Errorf("failed to stop service %s: %w", s.Name(), err)
return
}
// Stop dependencies after dependents
for _, dep := range s.Dependencies() {
// Check (under the lock) whether every service that depends on this
// dependency has already been stopped; stopService runs concurrently,
// so the stopped map must not be read without synchronization
a.mu.Lock()
canStopDep := true
for _, depDependent := range dependedOnBy[dep.Name()] {
if !stopped[depDependent.Name()] {
canStopDep = false
break
}
}
a.mu.Unlock()
if canStopDep {
wg.Add(1)
go stopService(dep)
}
}
}
// Start stopping leaf services
for _, s := range leaves {
wg.Add(1)
go stopService(s)
}
// Wait for all services to stop or context to be cancelled
done := make(chan struct{})
go func() {
wg.Wait()
close(done)
}()
select {
case <-done:
// Check for errors
close(errCh)
var errs []error
for err := range errCh {
errs = append(errs, err)
}
if len(errs) > 0 {
return fmt.Errorf("errors during shutdown: %v", errs)
}
return nil
case <-ctx.Done():
return ctx.Err()
}
}
func main() {
// Create services
db := NewDatabaseService()
cache := NewCacheService()
api := NewAPIService(db, cache)
// Create application
app := NewApplication(api, db, cache)
// Start application
if err := app.Start(); err != nil {
log.Fatalf("Failed to start application: %v", err)
}
log.Println("Application started successfully")
// Channel to listen for interrupt signals
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)
// Block until we receive a signal
sig := <-shutdown
log.Printf("Received signal: %v", sig)
// Create a deadline for graceful shutdown
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Stop application
if err := app.Stop(ctx); err != nil {
log.Printf("Error during shutdown: %v", err)
os.Exit(1)
}
log.Println("Application shutdown complete")
}
This sophisticated pattern demonstrates:
- Modeling service dependencies explicitly
- Starting services in dependency order
- Stopping services in reverse dependency order
- Parallel shutdown where possible
- Timeout handling for the entire shutdown process
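When the shutdown order is simply the reverse of the start order, a full dependency-graph walk may be more machinery than you need. A lighter alternative is a LIFO stack of closer functions, mirroring defer semantics. The Closers type below is a hypothetical sketch; it composes naturally with the Stop(ctx) methods defined above and reuses the context and sync imports from that example:

// Closers collects cleanup functions and runs them in reverse registration
// order, like defer but across the whole application.
type Closers struct {
    mu  sync.Mutex
    fns []func(context.Context) error
}

// Add registers a closer; call it right after the resource is started.
func (c *Closers) Add(fn func(context.Context) error) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.fns = append(c.fns, fn)
}

// CloseAll runs the closers last-registered-first and collects any errors.
func (c *Closers) CloseAll(ctx context.Context) []error {
    c.mu.Lock()
    defer c.mu.Unlock()
    var errs []error
    for i := len(c.fns) - 1; i >= 0; i-- {
        if err := c.fns[i](ctx); err != nil {
            errs = append(errs, err)
        }
    }
    return errs
}

// Usage, mirroring the services above (start order: database, cache, api):
//   var closers Closers
//   closers.Add(db.Stop)    // registered first, stopped last
//   closers.Add(cache.Stop)
//   closers.Add(api.Stop)   // registered last, stopped first
//   errs := closers.CloseAll(ctx)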
Health Checks and Readiness Probes
Health checks and readiness probes are essential for coordinating with orchestration systems like Kubernetes.
Implementing Health and Readiness Endpoints
package main
import (
"context"
"encoding/json"
"fmt"
"log"
"net/http"
"os"
"os/signal"
"sync"
"syscall"
"time"
)
// HealthStatus represents the health state of a component
type HealthStatus string
const (
StatusHealthy HealthStatus = "healthy"
StatusDegraded HealthStatus = "degraded"
StatusUnhealthy HealthStatus = "unhealthy"
StatusShutdown HealthStatus = "shutdown"
)
// HealthCheck represents a component that can report its health
type HealthCheck interface {
Name() string
Check() HealthStatus
}
// Component represents a service component with health reporting
type Component struct {
name string
status HealthStatus
mu sync.RWMutex
}
func NewComponent(name string) *Component {
return &Component{
name: name,
status: StatusHealthy,
}
}
func (c *Component) Name() string {
return c.name
}
func (c *Component) Check() HealthStatus {
c.mu.RLock()
defer c.mu.RUnlock()
return c.status
}
func (c *Component) SetStatus(status HealthStatus) {
c.mu.Lock()
defer c.mu.Unlock()
c.status = status
}
// HealthServer provides health and readiness endpoints
type HealthServer struct {
components []HealthCheck
server *http.Server
isShutdown bool
mu sync.RWMutex
}
func NewHealthServer(addr string) *HealthServer {
return &HealthServer{
components: []HealthCheck{},
server: &http.Server{
Addr: addr,
},
}
}
// AddComponent adds a component to health monitoring
func (hs *HealthServer) AddComponent(component HealthCheck) {
hs.components = append(hs.components, component)
}
// Start begins serving health and readiness endpoints
func (hs *HealthServer) Start() error {
mux := http.NewServeMux()
// Health endpoint returns overall system health
mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
hs.mu.RLock()
if hs.isShutdown {
hs.mu.RUnlock()
w.WriteHeader(http.StatusServiceUnavailable)
json.NewEncoder(w).Encode(map[string]string{
"status": string(StatusShutdown),
})
return
}
hs.mu.RUnlock()
overallStatus := StatusHealthy
componentStatuses := make(map[string]string)
for _, component := range hs.components {
status := component.Check()
componentStatuses[component.Name()] = string(status)
if status == StatusUnhealthy {
overallStatus = StatusUnhealthy
} else if status == StatusDegraded && overallStatus != StatusUnhealthy {
overallStatus = StatusDegraded
}
}
response := map[string]interface{}{
"status": string(overallStatus),
"components": componentStatuses,
"timestamp": time.Now().Format(time.RFC3339),
}
if overallStatus != StatusHealthy {
w.WriteHeader(http.StatusServiceUnavailable)
}
json.NewEncoder(w).Encode(response)
})
// Readiness endpoint indicates if the service is ready to receive traffic
mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
hs.mu.RLock()
isShutdown := hs.isShutdown
hs.mu.RUnlock()
if isShutdown {
w.WriteHeader(http.StatusServiceUnavailable)
json.NewEncoder(w).Encode(map[string]string{
"status": "not ready - shutting down",
})
return
}
w.WriteHeader(http.StatusOK)
json.NewEncoder(w).Encode(map[string]string{
"status": "ready",
})
})
hs.server.Handler = mux
go func() {
log.Printf("Health server listening on %s", hs.server.Addr)
if err := hs.server.ListenAndServe(); err != http.ErrServerClosed {
log.Printf("Health server error: %v", err)
}
}()
return nil
}
// BeginShutdown marks the service as shutting down
func (hs *HealthServer) BeginShutdown() {
hs.mu.Lock()
defer hs.mu.Unlock()
hs.isShutdown = true
log.Println("Health server marked as shutting down")
}
// Shutdown stops the health server
func (hs *HealthServer) Shutdown(ctx context.Context) error {
log.Println("Shutting down health server...")
return hs.server.Shutdown(ctx)
}
func main() {
// Create components
dbComponent := NewComponent("database")
apiComponent := NewComponent("api")
// Create health server
healthServer := NewHealthServer(":8081")
healthServer.AddComponent(dbComponent)
healthServer.AddComponent(apiComponent)
// Create API server
apiServer := &http.Server{
Addr: ":8080",
Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
time.Sleep(2 * time.Second) // Simulate work
fmt.Fprintf(w, "API response")
}),
}
// Start servers
if err := healthServer.Start(); err != nil {
log.Fatalf("Failed to start health server: %v", err)
}
go func() {
log.Printf("API server listening on %s", apiServer.Addr)
if err := apiServer.ListenAndServe(); err != http.ErrServerClosed {
log.Printf("API server error: %v", err)
}
}()
// Channel to listen for interrupt signals
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)
// Block until we receive a signal
sig := <-shutdown
log.Printf("Received signal: %v", sig)
// Mark as shutting down in health checks
healthServer.BeginShutdown()
// Simulate degraded status during shutdown
dbComponent.SetStatus(StatusDegraded)
// Create a deadline for graceful shutdown
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Shutdown API server first
log.Println("Shutting down API server...")
if err := apiServer.Shutdown(ctx); err != nil {
log.Printf("API server shutdown error: %v", err)
}
// Update component status
apiComponent.SetStatus(StatusShutdown)
// Shutdown health server last
if err := healthServer.Shutdown(ctx); err != nil {
log.Printf("Health server shutdown error: %v", err)
}
log.Println("Application shutdown complete")
}
This pattern demonstrates:
- Implementing health and readiness endpoints
- Tracking component health status
- Updating health status during shutdown
- Using health checks to coordinate with orchestration systems
Production Deployment Strategies
Graceful shutdown is particularly important in production environments, especially when dealing with orchestration systems like Kubernetes.
Connection Draining for Zero-Downtime Deployments
In production environments, you often need to ensure that in-flight requests are completed before shutting down:
package main
import (
"context"
"fmt"
"log"
"net/http"
"os"
"os/signal"
"sync"
"syscall"
"time"
)
// ConnectionTracker keeps track of active connections
type ConnectionTracker struct {
activeConnections int
mu sync.Mutex
drainComplete chan struct{}
}
func NewConnectionTracker() *ConnectionTracker {
return &ConnectionTracker{
drainComplete: make(chan struct{}),
}
}
// ConnectionStarted increments the active connection counter
func (ct *ConnectionTracker) ConnectionStarted() {
ct.mu.Lock()
defer ct.mu.Unlock()
ct.activeConnections++
log.Printf("Connection started. Active connections: %d", ct.activeConnections)
}
// ConnectionFinished decrements the active connection counter
func (ct *ConnectionTracker) ConnectionFinished() {
ct.mu.Lock()
defer ct.mu.Unlock()
ct.activeConnections--
log.Printf("Connection finished. Active connections: %d", ct.activeConnections)
// If we're draining and this was the last connection, signal completion
if ct.activeConnections == 0 && ct.drainComplete != nil {
select {
case <-ct.drainComplete:
// Channel already closed
default:
close(ct.drainComplete)
}
}
}
// WaitForDrain waits for all connections to finish
func (ct *ConnectionTracker) WaitForDrain(ctx context.Context) error {
ct.mu.Lock()
if ct.activeConnections == 0 {
ct.mu.Unlock()
return nil
}
ct.mu.Unlock()
select {
case <-ct.drainComplete:
return nil
case <-ctx.Done():
return ctx.Err()
}
}
// ConnectionDrainingHandler wraps an HTTP handler to track connections
func ConnectionDrainingHandler(tracker *ConnectionTracker, next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
tracker.ConnectionStarted()
defer tracker.ConnectionFinished()
next.ServeHTTP(w, r)
})
}
func main() {
// Create connection tracker
tracker := NewConnectionTracker()
// Create server with connection tracking
server := &http.Server{
Addr: ":8080",
Handler: ConnectionDrainingHandler(tracker, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Simulate a long-running request
duration := time.Duration(2+time.Now().Second()%3) * time.Second
log.Printf("Handling request, will take %v", duration)
time.Sleep(duration)
fmt.Fprintf(w, "Request processed after %v", duration)
})),
}
// Start server
go func() {
log.Printf("Server listening on %s", server.Addr)
if err := server.ListenAndServe(); err != http.ErrServerClosed {
log.Printf("Server error: %v", err)
}
}()
// Channel to listen for interrupt signals
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)
// Block until we receive a signal
sig := <-shutdown
log.Printf("Received signal: %v", sig)
// Create a deadline for graceful shutdown
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Step 1: Stop accepting new connections
log.Println("Shutting down server - no longer accepting new connections")
if err := server.Shutdown(ctx); err != nil {
log.Printf("Server shutdown error: %v", err)
}
// Step 2: Wait for existing connections to drain
log.Println("Waiting for active connections to complete...")
if err := tracker.WaitForDrain(ctx); err != nil {
log.Printf("Connection draining error: %v", err)
} else {
log.Println("All connections drained successfully")
}
log.Println("Server shutdown complete")
}
This pattern demonstrates:
- Tracking active connections
- Gracefully rejecting new connections while allowing existing ones to complete
- Waiting for all in-flight requests to finish before final shutdown
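Keep in mind that server.Shutdown already stops accepting new connections and waits for active requests, so the tracker above mainly adds visibility into the drain. What Shutdown does not do is cancel the contexts of requests still in flight. If long-running handlers should cut their work short once draining has dragged on, you can derive every request context from a server-wide context via http.Server.BaseContext and cancel it as a last resort. A hedged sketch:

package main

import (
    "context"
    "fmt"
    "log"
    "net"
    "net/http"
    "time"
)

func main() {
    // All request contexts will be derived from rootCtx.
    rootCtx, cancelInFlight := context.WithCancel(context.Background())
    defer cancelInFlight()

    server := &http.Server{
        Addr:        ":8080",
        BaseContext: func(net.Listener) context.Context { return rootCtx },
        Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            select {
            case <-time.After(10 * time.Second): // slow work
                fmt.Fprintln(w, "done")
            case <-r.Context().Done(): // client gone or drain cut short
                http.Error(w, "shutting down", http.StatusServiceUnavailable)
            }
        }),
    }

    go func() {
        if err := server.ListenAndServe(); err != http.ErrServerClosed {
            log.Printf("server error: %v", err)
        }
    }()

    // ... on shutdown: give Shutdown a grace period first, then cancel
    // whatever is still in flight so the process can exit promptly.
    time.Sleep(2 * time.Second) // stand-in for waiting on a signal
    ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    defer cancel()
    if err := server.Shutdown(ctx); err != nil {
        log.Printf("graceful shutdown incomplete: %v", err)
        cancelInFlight() // force remaining handlers to wrap up
    }
}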
Kubernetes-Ready Graceful Shutdown
When running in Kubernetes, you need to handle termination signals and coordinate with the container lifecycle:
package main
import (
"context"
"fmt"
"log"
"net/http"
"os"
"os/signal"
"sync"
"syscall"
"time"
)
// ShutdownManager coordinates the shutdown process
type ShutdownManager struct {
shutdownTimeout time.Duration
preStopTimeout time.Duration
server *http.Server
readyToShutdown bool
mu sync.RWMutex
}
func NewShutdownManager(server *http.Server, shutdownTimeout, preStopTimeout time.Duration) *ShutdownManager {
return &ShutdownManager{
server: server,
shutdownTimeout: shutdownTimeout,
preStopTimeout: preStopTimeout,
}
}
// StartPreStop marks the service as no longer ready and waits for the preStop hook duration
func (sm *ShutdownManager) StartPreStop() {
sm.mu.Lock()
sm.readyToShutdown = true
sm.mu.Unlock()
log.Printf("PreStop hook received, waiting %v before starting shutdown", sm.preStopTimeout)
time.Sleep(sm.preStopTimeout)
}
// IsReady returns whether the service is ready to receive traffic
func (sm *ShutdownManager) IsReady() bool {
sm.mu.RLock()
defer sm.mu.RUnlock()
return !sm.readyToShutdown
}
// Shutdown performs the actual server shutdown
func (sm *ShutdownManager) Shutdown() error {
log.Println("Starting graceful shutdown...")
ctx, cancel := context.WithTimeout(context.Background(), sm.shutdownTimeout)
defer cancel()
return sm.server.Shutdown(ctx)
}
func main() {
// Create server
server := &http.Server{
Addr: ":8080",
Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
time.Sleep(2 * time.Second) // Simulate work
fmt.Fprintf(w, "Hello, World!")
}),
}
// Create shutdown manager
shutdownManager := NewShutdownManager(
server,
30*time.Second, // Shutdown timeout
5*time.Second, // PreStop hook duration
)
// Create health server
healthServer := &http.Server{
Addr: ":8081",
Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Kubernetes readiness probe
if r.URL.Path == "/ready" {
if shutdownManager.IsReady() {
w.WriteHeader(http.StatusOK)
fmt.Fprintln(w, "Ready")
} else {
// Return not ready during shutdown
w.WriteHeader(http.StatusServiceUnavailable)
fmt.Fprintln(w, "Not Ready - Shutting Down")
}
return
}
// Kubernetes liveness probe
if r.URL.Path == "/health" {
w.WriteHeader(http.StatusOK)
fmt.Fprintln(w, "Healthy")
return
}
w.WriteHeader(http.StatusNotFound)
}),
}
// Start servers
go func() {
log.Printf("Main server listening on %s", server.Addr)
if err := server.ListenAndServe(); err != http.ErrServerClosed {
log.Printf("Main server error: %v", err)
}
}()
go func() {
log.Printf("Health server listening on %s", healthServer.Addr)
if err := healthServer.ListenAndServe(); err != http.ErrServerClosed {
log.Printf("Health server error: %v", err)
}
}()
// Channel to listen for interrupt signals
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, syscall.SIGINT, syscall.SIGTERM)
// Block until we receive a signal
sig := <-shutdown
log.Printf("Received signal: %v", sig)
// Start the pre-stop process
// This simulates the Kubernetes preStop hook
shutdownManager.StartPreStop()
// Shutdown the main server
if err := shutdownManager.Shutdown(); err != nil {
log.Printf("Main server shutdown error: %v", err)
}
// Shutdown the health server last
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := healthServer.Shutdown(ctx); err != nil {
log.Printf("Health server shutdown error: %v", err)
}
log.Println("Application shutdown complete")
}
This pattern demonstrates:
- Coordinating with Kubernetes lifecycle hooks
- Implementing readiness probes that reflect shutdown state
- Using a preStop hook delay to allow for load balancer reconfiguration
- Proper sequencing of shutdown steps
Monitoring and Logging During Shutdown
Proper monitoring and logging during shutdown is essential for troubleshooting and ensuring clean termination.
Structured Shutdown Logging
package main
import (
"context"
"encoding/json"
"fmt"
"log"
"net/http"
"os"
"os/signal"
"sync"
"syscall"
"time"
)
// LogLevel represents the severity of a log message
type LogLevel string
const (
LogLevelInfo LogLevel = "INFO"
LogLevelWarning LogLevel = "WARNING"
LogLevelError LogLevel = "ERROR"
)
// StructuredLogger provides structured logging
type StructuredLogger struct {
mu sync.Mutex
}
// Log outputs a structured log message
func (l *StructuredLogger) Log(level LogLevel, message string, fields map[string]interface{}) {
l.mu.Lock()
defer l.mu.Unlock()
if fields == nil {
fields = make(map[string]interface{})
}
fields["timestamp"] = time.Now().Format(time.RFC3339)
fields["level"] = level
fields["message"] = message
jsonData, err := json.Marshal(fields)
if err != nil {
log.Printf("Error marshaling log: %v", err)
return
}
fmt.Println(string(jsonData))
}
// ShutdownMonitor tracks the shutdown process
type ShutdownMonitor struct {
logger *StructuredLogger
startTime time.Time
shutdownSteps map[string]ShutdownStepStatus
mu sync.Mutex
}
// ShutdownStepStatus represents the status of a shutdown step
type ShutdownStepStatus struct {
Status string
StartTime time.Time
EndTime time.Time
Duration time.Duration
Error error
}
func NewShutdownMonitor(logger *StructuredLogger) *ShutdownMonitor {
return &ShutdownMonitor{
logger: logger,
shutdownSteps: make(map[string]ShutdownStepStatus),
}
}
// StartShutdown begins the shutdown process
func (sm *ShutdownMonitor) StartShutdown() {
sm.mu.Lock()
defer sm.mu.Unlock()
sm.startTime = time.Now()
sm.logger.Log(LogLevelInfo, "Starting application shutdown", map[string]interface{}{
"shutdown_id": sm.startTime.UnixNano(),
})
}
// BeginStep marks the beginning of a shutdown step
func (sm *ShutdownMonitor) BeginStep(stepName string) {
sm.mu.Lock()
defer sm.mu.Unlock()
sm.shutdownSteps[stepName] = ShutdownStepStatus{
Status: "in_progress",
StartTime: time.Now(),
}
sm.logger.Log(LogLevelInfo, fmt.Sprintf("Beginning shutdown step: %s", stepName), map[string]interface{}{
"step": stepName,
"status": "in_progress",
"shutdown_id": sm.startTime.UnixNano(),
})
}
// EndStep marks the end of a shutdown step
func (sm *ShutdownMonitor) EndStep(stepName string, err error) {
sm.mu.Lock()
defer sm.mu.Unlock()
step, exists := sm.shutdownSteps[stepName]
if !exists {
sm.logger.Log(LogLevelWarning, fmt.Sprintf("Ending unknown shutdown step: %s", stepName), map[string]interface{}{
"step": stepName,
"shutdown_id": sm.startTime.UnixNano(),
})
return
}
step.EndTime = time.Now()
step.Duration = step.EndTime.Sub(step.StartTime)
if err != nil {
step.Status = "failed"
step.Error = err
sm.logger.Log(LogLevelError, fmt.Sprintf("Shutdown step failed: %s", stepName), map[string]interface{}{
"step": stepName,
"status": "failed",
"duration_ms": step.Duration.Milliseconds(),
"error": err.Error(),
"shutdown_id": sm.startTime.UnixNano(),
})
} else {
step.Status = "completed"
sm.logger.Log(LogLevelInfo, fmt.Sprintf("Shutdown step completed: %s", stepName), map[string]interface{}{
"step": stepName,
"status": "completed",
"duration_ms": step.Duration.Milliseconds(),
"shutdown_id": sm.startTime.UnixNano(),
})
}
sm.shutdownSteps[stepName] = step
}
// CompleteShutdown finalizes the shutdown process
func (sm *ShutdownMonitor) CompleteShutdown() {
sm.mu.Lock()
defer sm.mu.Unlock()
duration := time.Since(sm.startTime)
// Count successes and failures
successes := 0
failures := 0
for _, step := range sm.shutdownSteps {
if step.Status == "completed" {
successes++
} else if step.Status == "failed" {
failures++
}
}
sm.logger.Log(LogLevelInfo, "Application shutdown complete", map[string]interface{}{
"shutdown_id": sm.startTime.UnixNano(),
"duration_ms": duration.Milliseconds(),
"total_steps": len(sm.shutdownSteps),
"success_steps": successes,
"failed_steps": failures,
})
}
func main() {
// Create structured logger
logger := &StructuredLogger{}
// Create shutdown monitor
monitor := NewShutdownMonitor(logger)
// Create server
server := &http.Server{
Addr: ":8080",
Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
time.Sleep(1 * time.Second) // Simulate work
fmt.Fprintf(w, "Hello, World!")
}),
}
// Start server
go func() {
logger.Log(LogLevelInfo, "Starting server", map[string]interface{}{
"address": server.Addr,
})
if err := server.ListenAndServe(); err != http.ErrServerClosed {
logger.Log(LogLevelError, "Server error", map[string]interface{}{
"error": err.Error(),
})
}
}()
// Channel to listen for interrupt signals
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)
// Block until we receive a signal
sig := <-shutdown
logger.Log(LogLevelInfo, "Received termination signal", map[string]interface{}{
"signal": sig.String(),
})
// Start the shutdown process
monitor.StartShutdown()
// Step 1: Stop accepting new connections
monitor.BeginStep("server_shutdown")
ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
defer cancel()
err := server.Shutdown(ctx)
monitor.EndStep("server_shutdown", err)
// Step 2: Close database connections (simulated)
monitor.BeginStep("database_shutdown")
time.Sleep(2 * time.Second) // Simulate DB shutdown
monitor.EndStep("database_shutdown", nil)
// Step 3: Flush metrics (simulated)
monitor.BeginStep("metrics_flush")
time.Sleep(1 * time.Second) // Simulate metrics flush
// Simulate an error
monitor.EndStep("metrics_flush", fmt.Errorf("failed to flush metrics: connection timeout"))
// Complete the shutdown process
monitor.CompleteShutdown()
}
This pattern demonstrates:
- Structured logging during shutdown
- Tracking individual shutdown steps
- Measuring shutdown duration
- Reporting success and failure metrics
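Since Go 1.21, the standard library's log/slog package provides JSON structured logging out of the box, so in many cases it can stand in for the hand-rolled StructuredLogger above. A brief sketch of the same shutdown-step logging with slog:

package main

import (
    "log/slog"
    "os"
    "time"
)

func main() {
    // The JSON handler writes one structured record per line to stdout.
    logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

    shutdownID := time.Now().UnixNano()
    logger.Info("Starting application shutdown", "shutdown_id", shutdownID)

    start := time.Now()
    // ... perform a shutdown step here ...
    logger.Info("Shutdown step completed",
        "step", "server_shutdown",
        "status", "completed",
        "duration_ms", time.Since(start).Milliseconds(),
        "shutdown_id", shutdownID,
    )
}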
The Bottom Line
Implementing robust graceful shutdown patterns is not just a best practice—it’s a critical requirement for production-grade Go applications. By properly handling termination signals, coordinating resource cleanup, and managing connection draining, you can ensure that your services terminate cleanly without disrupting users or compromising data integrity.
The patterns we’ve explored in this guide provide a comprehensive toolkit for implementing graceful shutdown in various contexts:
- Signal Handling: Capturing OS signals to trigger controlled shutdown
- Context-Based Cancellation: Propagating shutdown signals throughout your application
- HTTP Server Shutdown: Allowing in-flight requests to complete before termination
- Resource Cleanup: Properly closing database connections and other resources
- Worker Pool Management: Gracefully stopping worker pools and background tasks
- Service Coordination: Shutting down services in the correct order based on dependencies
- Health Checks: Integrating with orchestration systems through health and readiness endpoints
- Connection Draining: Ensuring zero-downtime deployments through proper connection handling
- Kubernetes Integration: Coordinating with container lifecycle hooks
- Monitoring and Logging: Tracking and troubleshooting the shutdown process
When implementing these patterns, remember these key principles:
- Timeout Everything: Always use timeouts to prevent indefinite blocking during shutdown
- Order Matters: Shut down services in the reverse order of their dependencies
- Be Defensive: Handle errors during shutdown gracefully
- Monitor and Log: Track the shutdown process for troubleshooting
- Test Thoroughly: Verify shutdown behavior under various conditions
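For the last principle, one practical technique is a test that sends SIGTERM to its own process and asserts the application exits within the grace period, as sketched below (Run is a hypothetical entry point that returns once shutdown completes, and syscall.Kill assumes a Unix-like platform):

package main

import (
    "context"
    "os/signal"
    "syscall"
    "testing"
    "time"
)

// TestGracefulShutdown assumes a hypothetical Run(ctx) entry point that
// starts the application and returns once shutdown has fully completed.
func TestGracefulShutdown(t *testing.T) {
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
    defer stop()

    done := make(chan error, 1)
    go func() { done <- Run(ctx) }()

    // Give the application a moment to start, then deliver the signal
    // to our own process (Unix-only).
    time.Sleep(100 * time.Millisecond)
    _ = syscall.Kill(syscall.Getpid(), syscall.SIGTERM)

    select {
    case err := <-done:
        if err != nil {
            t.Fatalf("shutdown returned error: %v", err)
        }
    case <-time.After(5 * time.Second):
        t.Fatal("application did not shut down within the grace period")
    }
}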
By applying these patterns and principles, you can build Go applications that not only perform well during normal operation but also terminate gracefully when needed, ensuring reliability and data integrity even during deployments, scaling events, or unexpected failures.