Timeouts That Actually Make Sense
Let me tell you about the worst timeout bug I ever encountered. A service was timing out after exactly 30 seconds, every single time, regardless of load or complexity. Turns out someone had hardcoded 30*time.Second
everywhere. During peak traffic, simple operations were timing out, but during quiet periods, complex operations were getting way more time than they needed.
That’s when I learned that smart timeout management isn’t about picking magic numbers—it’s about understanding your system’s behavior and adapting accordingly.
Timeouts vs Deadlines (And When to Use Each)
First, let’s clear up the confusion between timeouts and deadlines. A timeout is relative: “give this operation 5 seconds.” A deadline is absolute: “this must finish by 3:15 PM.”
func demonstrateTimeoutVsDeadline(ctx context.Context) {
    // Timeout: relative to now
    timeoutCtx, cancel1 := context.WithTimeout(ctx, 5*time.Second)
    defer cancel1()

    // Deadline: absolute point in time
    deadline := time.Now().Add(10 * time.Second)
    deadlineCtx, cancel2 := context.WithDeadline(ctx, deadline)
    defer cancel2()

    // Use a timeout for operations where you care about duration
    fetchUserData(timeoutCtx, "user123")

    // Use a deadline when you have a hard cutoff time
    generateReport(deadlineCtx, "monthly")
}
I use timeouts for most operations because they’re easier to reason about. Deadlines are great when you’re coordinating across multiple services or have external constraints (like “this report must be ready before the meeting starts”).
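For instance, the "report before the meeting" case maps naturally onto a deadline. Here's a minimal sketch, assuming the generateReport call from earlier returns an error and that the caller supplies a hypothetical meetingStart time:
// Sketch: turn an absolute external constraint into a context deadline.
// meetingStart is a hypothetical parameter supplied by the caller.
func reportBeforeMeeting(ctx context.Context, meetingStart time.Time) error {
    // Leave a small buffer so the report lands before the meeting, not at it.
    ctx, cancel := context.WithDeadline(ctx, meetingStart.Add(-2*time.Minute))
    defer cancel()
    return generateReport(ctx, "monthly")
}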
Adaptive Timeouts (The Smart Way)
Here’s the pattern that changed how I think about timeouts. Instead of hardcoding values, make them adapt based on actual performance:
type SmartTimeout struct {
    baseTimeout     time.Duration
    maxTimeout      time.Duration
    recentDurations []time.Duration
    mu              sync.RWMutex
}

func NewSmartTimeout(base, max time.Duration) *SmartTimeout {
    return &SmartTimeout{
        baseTimeout:     base,
        maxTimeout:      max,
        recentDurations: make([]time.Duration, 0, 50),
    }
}

func (st *SmartTimeout) GetTimeout() time.Duration {
    st.mu.RLock()
    defer st.mu.RUnlock()

    if len(st.recentDurations) < 10 {
        // Not enough data yet, use the base timeout
        return st.baseTimeout
    }

    // Calculate the 95th percentile of recent operations
    sorted := make([]time.Duration, len(st.recentDurations))
    copy(sorted, st.recentDurations)
    sort.Slice(sorted, func(i, j int) bool {
        return sorted[i] < sorted[j]
    })
    p95 := sorted[int(float64(len(sorted))*0.95)]
    adaptiveTimeout := p95 * 2 // Add some buffer

    // Clamp to our bounds
    if adaptiveTimeout > st.maxTimeout {
        return st.maxTimeout
    }
    if adaptiveTimeout < st.baseTimeout {
        return st.baseTimeout
    }
    return adaptiveTimeout
}

func (st *SmartTimeout) RecordDuration(d time.Duration) {
    st.mu.Lock()
    defer st.mu.Unlock()

    st.recentDurations = append(st.recentDurations, d)
    if len(st.recentDurations) > 50 {
        // Keep only the most recent measurements
        st.recentDurations = st.recentDurations[1:]
    }
}
This timeout learns from your system’s actual behavior. During slow periods, it gives operations more time. During fast periods, it fails fast. Much better than guessing.
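Wiring it into a call site is the easy part. Here's a minimal usage sketch, assuming the fetchUserData call from earlier returns an error and that one SmartTimeout is shared per operation type:
// Sketch: one SmartTimeout per operation type, shared across requests.
var userFetchTimeout = NewSmartTimeout(2*time.Second, 15*time.Second)

func fetchUserDataAdaptive(ctx context.Context, id string) error {
    ctx, cancel := context.WithTimeout(ctx, userFetchTimeout.GetTimeout())
    defer cancel()

    start := time.Now()
    err := fetchUserData(ctx, id) // assumed to return an error
    if err == nil {
        // Only feed successful durations back in; a timed-out call would
        // just echo the current limit and inflate the percentile.
        userFetchTimeout.RecordDuration(time.Since(start))
    }
    return err
}
One design choice worth noting: recording only successful durations keeps a burst of timeouts from ratcheting the adaptive value up to the limit it was supposed to enforce.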
Hierarchical Timeouts (Dividing Time Fairly)
When you have a complex operation with multiple stages, you need to divide the available time intelligently:
func processComplexRequest(ctx context.Context, req *Request) error {
    // Figure out how much time we have in total
    deadline, hasDeadline := ctx.Deadline()
    if !hasDeadline {
        deadline = time.Now().Add(30 * time.Second)
    }
    totalTime := time.Until(deadline)

    // Divide time between stages based on typical needs; the response
    // stage gets whatever is left (roughly 20%).
    authTime := totalTime * 10 / 100    // 10% for auth
    processTime := totalTime * 70 / 100 // 70% for processing

    // Stage 1: Authentication
    authDeadline := time.Now().Add(authTime)
    authCtx, authCancel := context.WithDeadline(ctx, authDeadline)
    defer authCancel()
    user, err := authenticateUser(authCtx, req.Token)
    if err != nil {
        return fmt.Errorf("auth failed: %w", err)
    }

    // Stage 2: Processing
    processDeadline := time.Now().Add(processTime)
    processCtx, processCancel := context.WithDeadline(ctx, processDeadline)
    defer processCancel()
    result, err := processUserRequest(processCtx, user, req)
    if err != nil {
        return fmt.Errorf("processing failed: %w", err)
    }

    // Stage 3: Response, bounded by the overall deadline
    responseCtx, responseCancel := context.WithDeadline(ctx, deadline)
    defer responseCancel()
    return sendResponse(responseCtx, result)
}
This ensures no single stage hogs all the time. I’ve seen too many systems where authentication takes 1ms but gets 10 seconds, while the actual work gets starved.
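If you find yourself repeating this splitting logic across handlers, it's easy to factor into a small helper. A minimal sketch; the helper name and weights here are mine, not part of the code above:
// Sketch: split whatever time remains on ctx across stages by weight.
// Falls back to defaultTotal when ctx carries no deadline.
func stageBudgets(ctx context.Context, defaultTotal time.Duration, weights []int) []time.Duration {
    total := defaultTotal
    if deadline, ok := ctx.Deadline(); ok {
        total = time.Until(deadline)
    }
    sum := 0
    for _, w := range weights {
        sum += w
    }
    budgets := make([]time.Duration, len(weights))
    for i, w := range weights {
        budgets[i] = time.Duration(int64(total) * int64(w) / int64(sum))
    }
    return budgets
}
Calling stageBudgets(ctx, 30*time.Second, []int{10, 70, 20}) reproduces the 10/70/20 split from the example above.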
Timeout Inheritance (When You Need More Time)
Sometimes a specific operation needs more time than its parent allows, but you still want to respect cancellation:
// Requires Go 1.21+ for context.WithoutCancel.
func extendTimeoutIfNeeded(parent context.Context, minTimeout time.Duration) (context.Context, context.CancelFunc) {
    // Check the parent's deadline
    if deadline, hasDeadline := parent.Deadline(); hasDeadline {
        if time.Until(deadline) >= minTimeout {
            // Parent has enough time, use it
            return context.WithCancel(parent)
        }
        // The parent's deadline is too tight. Deriving with WithTimeout would
        // inherit that earlier deadline, so detach from it (keeping values),
        // apply our own timeout, and forward only an explicit parent cancellation.
        ctx, cancel := context.WithTimeout(context.WithoutCancel(parent), minTimeout)
        go func() {
            select {
            case <-parent.Done():
                if parent.Err() == context.Canceled { // cancelled, not merely expired
                    cancel()
                }
            case <-ctx.Done():
            }
        }()
        return ctx, cancel
    }
    // Parent has no deadline; just apply our own timeout.
    return context.WithTimeout(parent, minTimeout)
}
func performCriticalOperation(ctx context.Context) error {
    // This operation needs at least 5 minutes
    criticalCtx, cancel := extendTimeoutIfNeeded(ctx, 5*time.Minute)
    defer cancel()
    return doImportantWork(criticalCtx)
}
This pattern lets critical operations get the time they need: an explicit cancellation from the parent still propagates, but the parent's too-tight deadline no longer cuts the work short.
Cross-Service Timeout Coordination
In microservices, you need to coordinate timeouts across service boundaries. Here’s how I handle it:
type ServiceTimeouts struct {
    services map[string]time.Duration
    overhead time.Duration
}

func NewServiceTimeouts() *ServiceTimeouts {
    return &ServiceTimeouts{
        services: map[string]time.Duration{
            "auth":     2 * time.Second,
            "user":     3 * time.Second,
            "billing":  5 * time.Second,
            "external": 10 * time.Second,
        },
        overhead: 500 * time.Millisecond, // Network/processing overhead
    }
}

func (st *ServiceTimeouts) CreateServiceContext(ctx context.Context, service string) (context.Context, context.CancelFunc) {
    timeout, exists := st.services[service]
    if !exists {
        timeout = 5 * time.Second // Default
    }

    // Check whether the parent context has enough time
    if deadline, hasDeadline := ctx.Deadline(); hasDeadline {
        remaining := time.Until(deadline) - st.overhead
        if remaining <= 0 {
            // No time left at all: hand back an already-cancelled context
            cancelledCtx, cancel := context.WithCancel(ctx)
            cancel()
            return cancelledCtx, cancel
        }
        if remaining < timeout {
            timeout = remaining
        }
    }
    return context.WithTimeout(ctx, timeout)
}
This ensures each service call gets appropriate time while respecting the overall request deadline.
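To make that concrete, here's a sketch of a handler that chains two downstream calls. handleCheckout and chargeCustomer are hypothetical; only the context plumbing is the point:
var timeouts = NewServiceTimeouts()

func handleCheckout(ctx context.Context, req *Request) error {
    // Each downstream call gets its own budget, clipped to the request deadline.
    authCtx, cancelAuth := timeouts.CreateServiceContext(ctx, "auth")
    defer cancelAuth()
    user, err := authenticateUser(authCtx, req.Token)
    if err != nil {
        return fmt.Errorf("auth failed: %w", err)
    }

    billingCtx, cancelBilling := timeouts.CreateServiceContext(ctx, "billing")
    defer cancelBilling()
    return chargeCustomer(billingCtx, user, req) // stand-in for your billing client
}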
Timeout Monitoring (Know When Things Go Wrong)
You can’t improve what you don’t measure. Here’s how I monitor timeout behavior:
type TimeoutTracker struct {
    operation string
    start     time.Time
    timeout   time.Duration
}

func NewTimeoutTracker(operation string, timeout time.Duration) *TimeoutTracker {
    return &TimeoutTracker{
        operation: operation,
        start:     time.Now(),
        timeout:   timeout,
    }
}

func (tt *TimeoutTracker) RecordResult(err error) {
    duration := time.Since(tt.start)
    if errors.Is(err, context.DeadlineExceeded) {
        // Operation timed out
        log.Printf("TIMEOUT: %s took %v (limit: %v)",
            tt.operation, duration, tt.timeout)
    } else if err == nil {
        // Success - record how long it actually took
        log.Printf("SUCCESS: %s completed in %v (limit: %v)",
            tt.operation, duration, tt.timeout)
        // Maybe the timeout is too aggressive?
        if duration > tt.timeout*95/100 {
            log.Printf("CLOSE_CALL: %s almost timed out", tt.operation)
        }
        // Maybe the timeout is too generous?
        if duration < tt.timeout/2 {
            log.Printf("FAST_COMPLETION: %s finished quickly", tt.operation)
        }
    }
}

func monitoredOperation(ctx context.Context, operation string) error {
    timeout := 5 * time.Second
    opCtx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()

    tracker := NewTimeoutTracker(operation, timeout)
    err := doActualWork(opCtx)
    tracker.RecordResult(err)
    return err
}
This gives you data to tune your timeouts based on real behavior, not guesswork.
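You can also close the loop by feeding the measured durations back into the adaptive timeout from earlier. A sketch combining the two types defined above (doActualWork is the same stand-in as before):
func monitoredAdaptiveOperation(ctx context.Context, st *SmartTimeout, operation string) error {
    timeout := st.GetTimeout()
    opCtx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()

    tracker := NewTimeoutTracker(operation, timeout)
    start := time.Now()
    err := doActualWork(opCtx)
    tracker.RecordResult(err)
    if err == nil {
        // Successful runs tune the next timeout; failures only get logged.
        st.RecordDuration(time.Since(start))
    }
    return err
}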
The key insight about timeout management is that good timeouts are dynamic, not static. They adapt to system conditions, coordinate across boundaries, and provide observability into system behavior. When you get timeouts right, your system becomes both responsive and resilient.
Next up, we’ll tackle context values and request scoping. You’ll learn how to carry request-specific data through your application without turning context into a dumping ground for random stuff.