Go Context Patterns: Advanced Request Lifecycle Management
Master advanced Go context patterns for sophisticated request lifecycle management, cancellation, and deadline handling in concurrent applications.
Context Fundamentals
Why Context Matters More Than You Think
When I first encountered Go’s context package, I’ll be honest—I thought it was just another way to pass around cancellation signals. Boy, was I wrong. After years of debugging production issues and watching teams struggle with request lifecycle management, I’ve come to realize that context is actually the backbone of well-architected Go applications.
Here’s the thing: context isn’t just about cancellation. It’s about coordination. Every request in your system has a lifecycle, and context gives you the tools to manage that lifecycle gracefully. Without proper context usage, you end up with goroutine leaks, hanging requests, and systems that don’t shut down cleanly.
What Context Actually Does
Let’s cut through the documentation speak. Context does three things that matter in real applications:
type Context interface {
Deadline() (deadline time.Time, ok bool)
Done() <-chan struct{}
Err() error
Value(key interface{}) interface{}
}
The Done() channel tells you when to stop working. The Deadline() method tells you when you must stop. The Err() method explains why you stopped. And Value() carries request-specific information along for the ride.
I’ve seen developers get hung up on the interface complexity, but really, it’s just answering the question: “Should I keep working on this request, and what do I need to know about it?”
Starting Simple: Root Contexts
Every context tree needs a root. Think of it like the trunk of a tree—everything else branches off from here:
func main() {
// This is your starting point for most applications
ctx := context.Background()
// Use this when you're not sure what context to use yet
// (but don't leave it in production code)
todoCtx := context.TODO()
processRequest(ctx, "user123")
}
I always use context.Background() in main functions and tests. The context.TODO() is handy during development when you’re refactoring and haven’t figured out the right context yet—but if you ship code with TODO contexts, you’re asking for trouble.
Building Context Trees
Here’s where context gets interesting. You don’t just pass the same context everywhere—you derive new contexts that inherit from their parents:
func handleUserRequest(ctx context.Context, userID string) error {
// Create a cancellable context for this specific request
requestCtx, cancel := context.WithCancel(ctx)
defer cancel() // This is crucial - always clean up
// Maybe add a timeout for database operations
dbCtx, dbCancel := context.WithTimeout(requestCtx, 5*time.Second)
defer dbCancel()
// Use the appropriate context for each operation
user, err := fetchUser(dbCtx, userID)
if err != nil {
return err
}
return processUser(requestCtx, user)
}
The beauty here is that if you cancel requestCtx, both dbCtx and any other derived contexts automatically get cancelled too. It’s like pulling the plug on an entire branch of work.
The Golden Rule: Always Call Cancel
This might be the most important thing I’ll tell you about contexts. Every time you create a context with a cancel function, you must call that function:
func doWork(ctx context.Context) error {
workCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel() // Even if the timeout expires naturally, call this
return performActualWork(workCtx)
}
I’ve debugged too many memory leaks caused by forgotten cancel calls. Even if your context times out naturally, calling cancel ensures resources are freed immediately instead of waiting for the garbage collector.
Respecting Context in Your Functions
When you’re writing functions that might take a while or need to be cancellable, always check the context:
func processLargeDataset(ctx context.Context, data []Item) error {
for i, item := range data {
// Check for cancellation periodically
select {
case <-ctx.Done():
return ctx.Err()
default:
// Continue processing
}
if err := processItem(ctx, item); err != nil {
return err
}
// For long-running loops, check more frequently
if i%100 == 0 {
select {
case <-ctx.Done():
return ctx.Err()
default:
}
}
}
return nil
}
The key is finding the right balance. Check too often and you hurt performance. Check too rarely and cancellation becomes sluggish.
Context as the First Parameter
There’s a convention in Go that context should be the first parameter of any function that needs it:
// Good - context comes first
func FetchUserData(ctx context.Context, userID string, includeHistory bool) (*User, error) {
// implementation
}
// Bad - context buried in parameters
func FetchUserData(userID string, includeHistory bool, ctx context.Context) (*User, error) {
// implementation
}
This isn’t just style—it makes context handling predictable across your codebase. When every function follows this pattern, you never have to guess where the context parameter is.
The real insight about context fundamentals is this: context isn’t overhead you add to your functions—it’s the coordination mechanism that makes your functions work reliably in concurrent, distributed systems. Once you start thinking of context as essential infrastructure rather than optional plumbing, everything else falls into place.
Next up, we’ll dive into cancellation patterns that go way beyond simple timeouts. You’ll learn how to coordinate complex operations, handle partial failures, and build systems that shut down gracefully even when things go wrong.
Cancellation Patterns
When Things Need to Stop (And How to Make Them)
Cancellation is where context really shines, but it’s also where I see the most confusion. Too many developers think cancellation is just about timeouts—press a button, operation stops. In reality, cancellation in distributed systems is more like conducting an orchestra: you need to coordinate multiple moving parts to stop gracefully at the same time.
The trick isn’t just stopping work—it’s stopping work cleanly, without leaving your system in a weird state or leaking resources all over the place.
The Cascade Effect
One of the coolest things about Go’s context model is how cancellation cascades down through derived contexts. Cancel a parent, and all the children stop automatically:
func runComplexWorkflow(ctx context.Context) error {
// Create a workflow-specific context
workflowCtx, cancel := context.WithCancel(ctx)
defer cancel()
// Channel to collect errors from goroutines
errChan := make(chan error, 3)
// Start three concurrent operations
go func() {
errChan <- fetchUserProfile(workflowCtx)
}()
go func() {
errChan <- generateAnalytics(workflowCtx)
}()
go func() {
errChan <- updateRecommendations(workflowCtx)
}()
// Wait for first completion or error
for i := 0; i < 3; i++ {
select {
case err := <-errChan:
if err != nil {
// Something failed - cancel everything else
cancel()
return fmt.Errorf("workflow failed: %w", err)
}
case <-ctx.Done():
// Parent context cancelled - we're done here
return ctx.Err()
}
}
return nil
}
What I love about this pattern is that one failure automatically stops all related work. No need to manually track and cancel individual operations—the context tree handles it for you.
Selective Cancellation (When You Need More Control)
Sometimes you don’t want to cancel everything. Maybe the user data fetch failed, but you still want to show cached recommendations. Here’s how I handle selective cancellation:
type WorkManager struct {
operations map[string]context.CancelFunc
mu sync.RWMutex
}
func NewWorkManager() *WorkManager {
return &WorkManager{
operations: make(map[string]context.CancelFunc),
}
}
func (wm *WorkManager) StartOperation(parent context.Context, name string) context.Context {
wm.mu.Lock()
defer wm.mu.Unlock()
ctx, cancel := context.WithCancel(parent)
wm.operations[name] = cancel
return ctx
}
func (wm *WorkManager) CancelOperation(name string) {
wm.mu.Lock()
defer wm.mu.Unlock()
if cancel, exists := wm.operations[name]; exists {
cancel()
delete(wm.operations, name)
}
}
func (wm *WorkManager) CancelAll() {
wm.mu.Lock()
defer wm.mu.Unlock()
for _, cancel := range wm.operations {
cancel()
}
wm.operations = make(map[string]context.CancelFunc)
}
This gives you fine-grained control over what gets cancelled when. I use this pattern in systems where different operations have different criticality levels.
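To make that concrete, here’s a hedged usage sketch; the dashboard scenario and the fetch helpers (fetchAccountSummary, fetchRecommendations) are hypothetical, not part of the original:
// Start two independent fetches under the manager, then shed only the
// optional work if the request is taking too long overall.
func loadDashboard(ctx context.Context, wm *WorkManager, userID string) {
    criticalCtx := wm.StartOperation(ctx, "account-summary")
    optionalCtx := wm.StartOperation(ctx, "recommendations")

    go fetchAccountSummary(criticalCtx, userID)  // must finish
    go fetchRecommendations(optionalCtx, userID) // nice to have

    // Under pressure, drop the optional branch but keep the critical one alive
    time.AfterFunc(2*time.Second, func() {
        wm.CancelOperation("recommendations")
    })
}
One thing the sketch glosses over: the operations map holds each cancel func until it is explicitly cancelled, so in a real system you would also remove entries when operations complete normally.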
Smart Timeout Coordination
Here’s something that took me a while to figure out: not all operations should have the same timeout. A cache lookup should fail fast, but a complex calculation might need more time:
func processRequestWithSmartTimeouts(ctx context.Context, req *Request) error {
// Fast operations get short timeouts
cacheCtx, cacheCancel := context.WithTimeout(ctx, 100*time.Millisecond)
defer cacheCancel()
// Slow operations get longer timeouts
dbCtx, dbCancel := context.WithTimeout(ctx, 5*time.Second)
defer dbCancel()
// Try cache first
if data, err := getFromCache(cacheCtx, req.Key); err == nil {
return processData(ctx, data)
}
// Cache miss - hit the database
data, err := getFromDatabase(dbCtx, req.Key)
if err != nil {
return err
}
// Update cache in background (with its own timeout)
go func() {
updateCtx, updateCancel := context.WithTimeout(context.Background(), 2*time.Second)
defer updateCancel()
updateCache(updateCtx, req.Key, data)
}()
return processData(ctx, data)
}
Notice how the cache update runs in a background goroutine with its own context? That’s because we don’t want cache update failures to affect the main request.
Cancellation with Cleanup
This is where things get tricky. When an operation gets cancelled, you often need to clean up resources, but the cleanup itself might take time:
func processWithCleanup(ctx context.Context) error {
// Track resources that need cleanup
var resources []io.Closer
defer func() {
// Clean up resources even if context is cancelled
cleanupCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
for _, resource := range resources {
if err := resource.Close(); err != nil {
log.Printf("Failed to close resource: %v", err)
}
}
}()
// Acquire resources
db, err := openDatabase(ctx)
if err != nil {
return err
}
resources = append(resources, db)
cache, err := openCache(ctx)
if err != nil {
return err
}
resources = append(resources, cache)
// Do the actual work
return performWork(ctx, db, cache)
}
The key insight here is using a separate context for cleanup. Even if the main context is cancelled, you still want to clean up properly.
Handling Different Cancellation Reasons
Not all cancellations are created equal. User cancellation is different from timeout, which is different from system shutdown:
func handleCancellation(ctx context.Context, operation string) error {
err := doSomeWork(ctx)
if err == nil {
return nil
}
// Figure out why we were cancelled
switch {
case errors.Is(err, context.Canceled):
// User hit the cancel button - that's fine
log.Printf("User cancelled %s operation", operation)
return nil
case errors.Is(err, context.DeadlineExceeded):
// Operation timed out - might be a problem
log.Printf("Operation %s timed out", operation)
return fmt.Errorf("operation timeout: %w", err)
default:
// Some other error occurred
return fmt.Errorf("operation failed: %w", err)
}
}
I treat user cancellation as success (they got what they wanted—the operation stopped), but timeouts might indicate a performance problem that needs investigation.
The “Cancel Everything” Pattern
Sometimes you need a nuclear option—cancel all ongoing work immediately. Here’s how I implement that:
type CancellationManager struct {
rootCtx context.Context
rootCancel context.CancelFunc
mu sync.RWMutex
}
func NewCancellationManager() *CancellationManager {
ctx, cancel := context.WithCancel(context.Background())
return &CancellationManager{
rootCtx: ctx,
rootCancel: cancel,
}
}
func (cm *CancellationManager) CreateContext() (context.Context, context.CancelFunc) {
cm.mu.RLock()
defer cm.mu.RUnlock()
// All contexts derive from the same cancellable root, so cancelling the
// root cancels every context handed out here
return context.WithCancel(cm.rootCtx)
}
func (cm *CancellationManager) CancelEverything() {
cm.mu.Lock()
defer cm.mu.Unlock()
if cm.rootCancel != nil {
cm.rootCancel()
cm.rootCancel = nil
}
}
This is useful for graceful shutdown scenarios where you want to stop all ongoing work before the process exits.
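Here’s a hedged sketch of how that might be wired into shutdown handling; the signal plumbing and the doWork call are assumptions for illustration, not from the original:
// Cancel all in-flight work when the process receives SIGINT/SIGTERM.
func runWithGracefulShutdown(cm *CancellationManager) {
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)
    go func() {
        <-sigCh
        log.Println("shutdown signal received, cancelling all work")
        cm.CancelEverything()
    }()

    // Every unit of work derives from the shared root
    ctx, cancel := cm.CreateContext()
    defer cancel()
    if err := doWork(ctx); err != nil {
        log.Printf("work stopped: %v", err)
    }
}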
The thing about cancellation patterns is that they’re not just about stopping work—they’re about stopping work in a way that leaves your system in a consistent state. Master these patterns, and you’ll build systems that handle failures gracefully instead of falling over in a heap.
Next, we’ll dive into timeout and deadline management. You’ll learn how to set intelligent timeouts that adapt to system conditions, coordinate deadlines across service boundaries, and handle the tricky edge cases that come up in distributed systems.
Timeout and Deadline Management
Timeouts That Actually Make Sense
Let me tell you about the worst timeout bug I ever encountered. A service was timing out after exactly 30 seconds, every single time, regardless of load or complexity. Turns out someone had hardcoded 30*time.Second everywhere. During peak traffic, simple operations were timing out, but during quiet periods, complex operations were getting way more time than they needed.
That’s when I learned that smart timeout management isn’t about picking magic numbers—it’s about understanding your system’s behavior and adapting accordingly.
Timeouts vs Deadlines (And When to Use Each)
First, let’s clear up the confusion between timeouts and deadlines. A timeout is relative: “give this operation 5 seconds.” A deadline is absolute: “this must finish by 3:15 PM.”
func demonstrateTimeoutVsDeadline(ctx context.Context) {
// Timeout: relative to now
timeoutCtx, cancel1 := context.WithTimeout(ctx, 5*time.Second)
defer cancel1()
// Deadline: absolute point in time
deadline := time.Now().Add(10*time.Second)
deadlineCtx, cancel2 := context.WithDeadline(ctx, deadline)
defer cancel2()
// Use timeout for operations where you care about duration
fetchUserData(timeoutCtx, "user123")
// Use deadline when you have a hard cutoff time
generateReport(deadlineCtx, "monthly")
}
I use timeouts for most operations because they’re easier to reason about. Deadlines are great when you’re coordinating across multiple services or have external constraints (like “this report must be ready before the meeting starts”).
Adaptive Timeouts (The Smart Way)
Here’s the pattern that changed how I think about timeouts. Instead of hardcoding values, make them adapt based on actual performance:
type SmartTimeout struct {
baseTimeout time.Duration
maxTimeout time.Duration
recentDurations []time.Duration
mu sync.RWMutex
}
func NewSmartTimeout(base, max time.Duration) *SmartTimeout {
return &SmartTimeout{
baseTimeout: base,
maxTimeout: max,
recentDurations: make([]time.Duration, 0, 50),
}
}
func (st *SmartTimeout) GetTimeout() time.Duration {
st.mu.RLock()
defer st.mu.RUnlock()
if len(st.recentDurations) < 10 {
// Not enough data yet, use base timeout
return st.baseTimeout
}
// Calculate 95th percentile of recent operations
sorted := make([]time.Duration, len(st.recentDurations))
copy(sorted, st.recentDurations)
sort.Slice(sorted, func(i, j int) bool {
return sorted[i] < sorted[j]
})
p95 := sorted[int(float64(len(sorted))*0.95)]
adaptiveTimeout := p95 * 2 // Add some buffer
// Clamp to our bounds
if adaptiveTimeout > st.maxTimeout {
return st.maxTimeout
}
if adaptiveTimeout < st.baseTimeout {
return st.baseTimeout
}
return adaptiveTimeout
}
func (st *SmartTimeout) RecordDuration(d time.Duration) {
st.mu.Lock()
defer st.mu.Unlock()
st.recentDurations = append(st.recentDurations, d)
if len(st.recentDurations) > 50 {
// Keep only recent measurements
st.recentDurations = st.recentDurations[1:]
}
}
This timeout learns from your system’s actual behavior. During slow periods, it gives operations more time. During fast periods, it fails fast. Much better than guessing.
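A brief usage sketch, assuming a hypothetical fetchFromBackend call:
// Wrap each call with the adaptive timeout and feed successful durations
// back in so future timeouts track real behavior.
func fetchWithAdaptiveTimeout(ctx context.Context, st *SmartTimeout, key string) (string, error) {
    opCtx, cancel := context.WithTimeout(ctx, st.GetTimeout())
    defer cancel()

    start := time.Now()
    value, err := fetchFromBackend(opCtx, key)
    if err == nil {
        st.RecordDuration(time.Since(start))
    }
    return value, err
}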
Hierarchical Timeouts (Dividing Time Fairly)
When you have a complex operation with multiple stages, you need to divide the available time intelligently:
func processComplexRequest(ctx context.Context, req *Request) error {
// Figure out how much time we have total
deadline, hasDeadline := ctx.Deadline()
if !hasDeadline {
deadline = time.Now().Add(30*time.Second)
}
totalTime := time.Until(deadline)
// Divide time between stages based on typical needs
authTime := totalTime * 10 / 100 // 10% for auth
processTime := totalTime * 70 / 100 // 70% for processing
responseTime := totalTime * 20 / 100 // 20% for response
// Stage 1: Authentication
authDeadline := time.Now().Add(authTime)
authCtx, authCancel := context.WithDeadline(ctx, authDeadline)
defer authCancel()
user, err := authenticateUser(authCtx, req.Token)
if err != nil {
return fmt.Errorf("auth failed: %w", err)
}
// Stage 2: Processing
processDeadline := time.Now().Add(processTime)
processCtx, processCancel := context.WithDeadline(ctx, processDeadline)
defer processCancel()
result, err := processUserRequest(processCtx, user, req)
if err != nil {
return fmt.Errorf("processing failed: %w", err)
}
// Stage 3: Response
responseDeadline := deadline
responseCtx, responseCancel := context.WithDeadline(ctx, responseDeadline)
defer responseCancel()
return sendResponse(responseCtx, result)
}
This ensures no single stage hogs all the time. I’ve seen too many systems where authentication takes 1ms but gets 10 seconds, while the actual work gets starved.
Timeout Inheritance (When You Need More Time)
Sometimes a specific operation needs more time than its parent allows, but you still want to respect cancellation:
func extendTimeoutIfNeeded(parent context.Context, minTimeout time.Duration) (context.Context, context.CancelFunc) {
// Check parent's deadline
if deadline, hasDeadline := parent.Deadline(); hasDeadline {
remaining := time.Until(deadline)
if remaining >= minTimeout {
// Parent has enough time, use it
return context.WithCancel(parent)
}
} else {
// No parent deadline at all - a plain timeout is enough
return context.WithTimeout(parent, minTimeout)
}
// The parent's deadline is too tight. A child can never outlive its parent,
// so WithTimeout on the parent would not extend anything. Detach from the
// parent's deadline (Go 1.21+) and forward only its explicit cancellation.
ctx, cancel := context.WithTimeout(context.WithoutCancel(parent), minTimeout)
go func() {
select {
case <-parent.Done():
if errors.Is(parent.Err(), context.Canceled) {
cancel() // propagate explicit cancellation, ignore the short deadline
}
case <-ctx.Done():
}
}()
return ctx, cancel
}
func performCriticalOperation(ctx context.Context) error {
// This operation needs at least 5 minutes
criticalCtx, cancel := extendTimeoutIfNeeded(ctx, 5*time.Minute)
defer cancel()
return doImportantWork(criticalCtx)
}
This pattern lets critical operations get the time they need while still being cancellable by parent contexts.
Cross-Service Timeout Coordination
In microservices, you need to coordinate timeouts across service boundaries. Here’s how I handle it:
type ServiceTimeouts struct {
services map[string]time.Duration
overhead time.Duration
}
func NewServiceTimeouts() *ServiceTimeouts {
return &ServiceTimeouts{
services: map[string]time.Duration{
"auth": 2 * time.Second,
"user": 3 * time.Second,
"billing": 5 * time.Second,
"external": 10 * time.Second,
},
overhead: 500 * time.Millisecond, // Network/processing overhead
}
}
func (st *ServiceTimeouts) CreateServiceContext(ctx context.Context, service string) (context.Context, context.CancelFunc) {
timeout, exists := st.services[service]
if !exists {
timeout = 5 * time.Second // Default
}
// Check if parent context has enough time
if deadline, hasDeadline := ctx.Deadline(); hasDeadline {
remaining := time.Until(deadline) - st.overhead
if remaining <= 0 {
// No time left!
cancelledCtx, cancel := context.WithCancel(ctx)
cancel()
return cancelledCtx, cancel
}
if remaining < timeout {
timeout = remaining
}
}
return context.WithTimeout(ctx, timeout)
}
This ensures each service call gets appropriate time while respecting the overall request deadline.
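A short usage sketch; billingClient and its Charge method are hypothetical:
// Give the billing call its service-specific budget while still honoring
// whatever deadline the incoming request already carries.
func chargeInvoice(ctx context.Context, st *ServiceTimeouts, invoiceID string) error {
    billingCtx, cancel := st.CreateServiceContext(ctx, "billing")
    defer cancel()
    return billingClient.Charge(billingCtx, invoiceID)
}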
Timeout Monitoring (Know When Things Go Wrong)
You can’t improve what you don’t measure. Here’s how I monitor timeout behavior:
type TimeoutTracker struct {
operation string
start time.Time
timeout time.Duration
}
func NewTimeoutTracker(operation string, timeout time.Duration) *TimeoutTracker {
return &TimeoutTracker{
operation: operation,
start: time.Now(),
timeout: timeout,
}
}
func (tt *TimeoutTracker) RecordResult(err error) {
duration := time.Since(tt.start)
if errors.Is(err, context.DeadlineExceeded) {
// Operation timed out - maybe the timeout is too aggressive?
log.Printf("TIMEOUT: %s took %v (limit: %v)",
tt.operation, duration, tt.timeout)
} else if err == nil {
// Success - record how long it actually took
log.Printf("SUCCESS: %s completed in %v (limit: %v)",
tt.operation, duration, tt.timeout)
// Nearly hit the limit? The timeout may be too tight
if duration > tt.timeout*95/100 {
log.Printf("CLOSE_CALL: %s almost timed out", tt.operation)
}
// Maybe the timeout is too generous?
if duration < tt.timeout/2 {
log.Printf("FAST_COMPLETION: %s finished quickly", tt.operation)
}
}
}
func monitoredOperation(ctx context.Context, operation string) error {
timeout := 5 * time.Second
opCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
tracker := NewTimeoutTracker(operation, timeout)
err := doActualWork(opCtx)
tracker.RecordResult(err)
return err
}
This gives you data to tune your timeouts based on real behavior, not guesswork.
The key insight about timeout management is that good timeouts are dynamic, not static. They adapt to system conditions, coordinate across boundaries, and provide observability into system behavior. When you get timeouts right, your system becomes both responsive and resilient.
Next up, we’ll tackle context values and request scoping. You’ll learn how to carry request-specific data through your application without turning context into a dumping ground for random stuff.
Value Propagation and Request Scoping
Context Values: The Good, The Bad, and The Ugly
Context values are probably the most controversial part of Go’s context package. I’ve seen teams ban them entirely, and I’ve seen other teams abuse them so badly that debugging becomes a nightmare. The truth is somewhere in the middle—context values are incredibly useful when used correctly, but they’re also easy to misuse.
Here’s my rule of thumb: context values should carry information about the request, not information for the request. Think user IDs, trace IDs, request metadata—stuff that helps you understand what’s happening, not stuff your business logic depends on.
Type-Safe Context Keys (No More String Collisions)
The biggest mistake I see with context values is using string keys. That leads to collisions, typos, and runtime panics. Here’s how to do it right:
// Define unexported key types to prevent collisions
type contextKey string
const (
userIDKey contextKey = "user_id"
requestIDKey contextKey = "request_id"
traceIDKey contextKey = "trace_id"
)
// Type-safe setters
func WithUserID(ctx context.Context, userID string) context.Context {
return context.WithValue(ctx, userIDKey, userID)
}
func WithRequestID(ctx context.Context, requestID string) context.Context {
return context.WithValue(ctx, requestIDKey, requestID)
}
// Type-safe getters with proper error handling
func GetUserID(ctx context.Context) (string, bool) {
userID, ok := ctx.Value(userIDKey).(string)
return userID, ok
}
func GetRequestID(ctx context.Context) (string, bool) {
requestID, ok := ctx.Value(requestIDKey).(string)
return requestID, ok
}
// Convenience function for when you don't care about the bool
func MustGetUserID(ctx context.Context) string {
if userID, ok := GetUserID(ctx); ok {
return userID
}
return "unknown"
}
The unexported contextKey type prevents other packages from accidentally using the same keys. The type assertions in getters ensure you handle missing values gracefully.
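As a quick, hedged sketch of these helpers in use (the middleware shape and the hard-coded user ID are illustrative assumptions):
// Attach the authenticated user's ID in middleware, read it back later
// with the type-safe getter.
func authMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // In practice this would come from a session or JWT
        ctx := WithUserID(r.Context(), "user-42")
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}

func auditLog(ctx context.Context, action string) {
    log.Printf("user=%s action=%s", MustGetUserID(ctx), action)
}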
Request Metadata Pattern
Instead of scattering individual values throughout your context, I prefer bundling related metadata together:
type RequestInfo struct {
ID string
UserID string
TraceID string
StartTime time.Time
UserAgent string
IPAddress string
}
type requestInfoKey struct{}
func WithRequestInfo(ctx context.Context, info RequestInfo) context.Context {
return context.WithValue(ctx, requestInfoKey{}, info)
}
func GetRequestInfo(ctx context.Context) (RequestInfo, bool) {
info, ok := ctx.Value(requestInfoKey{}).(RequestInfo)
return info, ok
}
// HTTP middleware to populate request info
func RequestInfoMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
info := RequestInfo{
ID: generateRequestID(),
TraceID: r.Header.Get("X-Trace-ID"),
StartTime: time.Now(),
UserAgent: r.UserAgent(),
IPAddress: getClientIP(r),
}
// Extract user ID from JWT or session
if userID := extractUserID(r); userID != "" {
info.UserID = userID
}
ctx := WithRequestInfo(r.Context(), info)
next.ServeHTTP(w, r.WithContext(ctx))
})
}
This approach keeps related data together and makes it easy to add new fields without changing function signatures throughout your codebase.
Context Composition (Merging Contexts Safely)
Sometimes you need to combine contexts from different sources while preserving important values:
type ContextMerger struct {
preserveKeys []interface{}
}
func NewContextMerger(keys ...interface{}) *ContextMerger {
return &ContextMerger{preserveKeys: keys}
}
func (cm *ContextMerger) Merge(base, source context.Context) context.Context {
result := base
// Copy specific values from source to base
for _, key := range cm.preserveKeys {
if value := source.Value(key); value != nil {
result = context.WithValue(result, key, value)
}
}
return result
}
// Example: Creating a background task with request context
func scheduleBackgroundTask(requestCtx context.Context, task Task) {
// Create a background context that won't be cancelled with the request
bgCtx := context.Background()
// But preserve important request metadata
merger := NewContextMerger(userIDKey, requestIDKey, traceIDKey)
taskCtx := merger.Merge(bgCtx, requestCtx)
go processTask(taskCtx, task)
}
This lets you create new contexts with different cancellation behavior while keeping the metadata you need for logging and tracing.
Structured Logging with Context
One of the best uses for context values is enriching your logs automatically:
type ContextLogger struct {
logger *slog.Logger
}
func NewContextLogger(logger *slog.Logger) *ContextLogger {
return &ContextLogger{logger: logger}
}
func (cl *ContextLogger) Info(ctx context.Context, msg string, args ...any) {
attrs := cl.extractContextAttrs(ctx)
cl.logger.Info(msg, append(attrs, args...)...)
}
func (cl *ContextLogger) Error(ctx context.Context, msg string, err error, args ...any) {
attrs := cl.extractContextAttrs(ctx)
attrs = append(attrs, slog.String("error", err.Error()))
cl.logger.Error(msg, append(attrs, args...)...)
}
func (cl *ContextLogger) extractContextAttrs(ctx context.Context) []any {
var attrs []any
if info, ok := GetRequestInfo(ctx); ok {
attrs = append(attrs,
slog.String("request_id", info.ID),
slog.String("user_id", info.UserID),
slog.String("trace_id", info.TraceID),
)
}
return attrs
}
// Usage in your handlers
func handleUserUpdate(ctx context.Context, logger *ContextLogger, req UpdateRequest) error {
logger.Info(ctx, "Starting user update", slog.String("operation", "update_user"))
if err := validateUpdate(req); err != nil {
logger.Error(ctx, "Validation failed", err)
return err
}
logger.Info(ctx, "User update completed successfully")
return nil
}
Now every log entry automatically includes request context without you having to remember to add it manually.
Context Value Validation
In production systems, you want to validate context values to prevent bad data from propagating:
type ContextValidator struct {
rules map[interface{}]ValidationRule
}
type ValidationRule func(interface{}) error
func NewContextValidator() *ContextValidator {
return &ContextValidator{
rules: make(map[interface{}]ValidationRule),
}
}
func (cv *ContextValidator) AddRule(key interface{}, rule ValidationRule) {
cv.rules[key] = rule
}
func (cv *ContextValidator) ValidateAndSet(ctx context.Context, key interface{}, value interface{}) (context.Context, error) {
if rule, exists := cv.rules[key]; exists {
if err := rule(value); err != nil {
return ctx, fmt.Errorf("validation failed for key %v: %w", key, err)
}
}
return context.WithValue(ctx, key, value), nil
}
// Set up validation rules
func setupValidator() *ContextValidator {
validator := NewContextValidator()
// User ID must be non-empty and reasonable length
validator.AddRule(userIDKey, func(v interface{}) error {
userID, ok := v.(string)
if !ok {
return fmt.Errorf("user ID must be string")
}
if len(userID) == 0 || len(userID) > 64 {
return fmt.Errorf("user ID length must be 1-64 characters")
}
return nil
})
return validator
}
This prevents malformed data from causing problems downstream.
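A minimal usage sketch of the validator at a request boundary:
// Validate before attaching, so malformed IDs are rejected at the edge
// instead of propagating into the call stack.
func attachUserID(ctx context.Context, validator *ContextValidator, userID string) (context.Context, error) {
    ctx, err := validator.ValidateAndSet(ctx, userIDKey, userID)
    if err != nil {
        return ctx, fmt.Errorf("rejecting request: %w", err)
    }
    return ctx, nil
}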
Performance Considerations
Context value lookups can get expensive in deep call stacks. Here’s a caching pattern for frequently accessed values:
type ContextCache struct {
cache sync.Map // key -> value
}
func NewContextCache() *ContextCache {
return &ContextCache{}
}
func (cc *ContextCache) Get(ctx context.Context, key interface{}) (interface{}, bool) {
// Check cache first
if value, exists := cc.cache.Load(key); exists {
return value, true
}
// Cache miss - check context
value := ctx.Value(key)
if value != nil {
cc.cache.Store(key, value)
return value, true
}
return nil, false
}
func (cc *ContextCache) Clear() {
cc.cache.Range(func(key, value interface{}) bool {
cc.cache.Delete(key)
return true
})
}
// Use it for expensive lookups
func expensiveOperation(ctx context.Context) error {
cache := NewContextCache()
defer cache.Clear()
// This lookup is now cached
if userID, ok := cache.Get(ctx, userIDKey); ok {
// Use userID
_ = userID
}
return nil
}
This reduces the overhead of repeated context value lookups in performance-critical paths.
What NOT to Put in Context
Let me be clear about what shouldn’t go in context values:
// DON'T do this - business logic data doesn't belong in context
func badExample(ctx context.Context) error {
// This is wrong - database connections should be injected properly
db := ctx.Value("database").(*sql.DB)
// This is wrong - configuration should be explicit
config := ctx.Value("config").(*Config)
// This is wrong - business data should be parameters
userData := ctx.Value("user_data").(*User)
return nil
}
// DO this instead - explicit dependencies and parameters
func goodExample(ctx context.Context, db *sql.DB, config *Config, user *User) error {
// Context only carries request metadata
if requestID, ok := GetRequestID(ctx); ok {
log.Printf("Processing request %s", requestID)
}
return nil
}
Context values are for request-scoped metadata, not for dependency injection or passing business data around.
The key insight about context values is that they should enhance observability and request coordination without becoming a crutch for poor API design. When used correctly, they provide powerful capabilities for tracing, logging, and metadata propagation. When abused, they make code harder to understand and test.
Next, we’ll explore advanced context patterns that combine everything we’ve learned so far to solve complex coordination problems in distributed systems.
Advanced Context Patterns
When Simple Context Isn’t Enough
By now you’ve got the basics down, but real-world systems throw curveballs that basic context patterns can’t handle. What do you do when you need different parts of an operation to have different timeout behaviors? How do you coordinate partial failures across multiple services? These are the problems that advanced context patterns solve.
I’ve spent years dealing with these edge cases, and I’ve learned that the most elegant solutions often involve combining multiple context concepts in creative ways.
Context Multiplexing (Different Rules for Different Operations)
Sometimes you need to run operations in parallel, but each one needs different cancellation and timeout rules:
type ContextMultiplexer struct {
parent context.Context
children map[string]contextInfo
mu sync.RWMutex
}
type contextInfo struct {
ctx context.Context
cancel context.CancelFunc
}
func NewContextMultiplexer(parent context.Context) *ContextMultiplexer {
return &ContextMultiplexer{
parent: parent,
children: make(map[string]contextInfo),
}
}
func (cm *ContextMultiplexer) CreateChild(name string, timeout time.Duration) context.Context {
cm.mu.Lock()
defer cm.mu.Unlock()
ctx, cancel := context.WithTimeout(cm.parent, timeout)
cm.children[name] = contextInfo{ctx: ctx, cancel: cancel}
return ctx
}
func (cm *ContextMultiplexer) CancelChild(name string) {
cm.mu.Lock()
defer cm.mu.Unlock()
if info, exists := cm.children[name]; exists {
info.cancel()
delete(cm.children, name)
}
}
func (cm *ContextMultiplexer) CancelAll() {
cm.mu.Lock()
defer cm.mu.Unlock()
for _, info := range cm.children {
info.cancel()
}
cm.children = make(map[string]contextInfo)
}
// Real-world usage: fetching user data from multiple sources
func fetchCompleteUserProfile(ctx context.Context, userID string) (*UserProfile, error) {
mux := NewContextMultiplexer(ctx)
defer mux.CancelAll()
// Different timeouts for different data sources
profileCtx := mux.CreateChild("profile", 2*time.Second) // Fast
prefsCtx := mux.CreateChild("preferences", 5*time.Second) // Medium
historyCtx := mux.CreateChild("history", 10*time.Second) // Slow
type fetchResult struct {
name string
data interface{}
err error
}
results := make(chan fetchResult, 3)
go func() {
data, err := fetchBasicProfile(profileCtx, userID)
results <- fetchResult{"profile", data, err}
}()
go func() {
data, err := fetchUserPreferences(prefsCtx, userID)
results <- fetchResult{"preferences", data, err}
}()
go func() {
data, err := fetchUserHistory(historyCtx, userID)
results <- fetchResult{"history", data, err}
}()
profile := &UserProfile{}
for i := 0; i < 3; i++ {
select {
case result := <-results:
if result.err != nil {
// Cancel everything on any error
mux.CancelAll()
return nil, fmt.Errorf("%s fetch failed: %w", result.name, result.err)
}
// Populate profile based on result type...
case <-ctx.Done():
return nil, ctx.Err()
}
}
return profile, nil
}
This pattern gives you fine-grained control over each operation while maintaining overall coordination.
Dynamic Context Adaptation
Sometimes you need context behavior to change based on runtime conditions. Here’s how I handle that:
type AdaptiveContext struct {
base context.Context
modifiers []ContextModifier
}
type ContextModifier interface {
ShouldApply(ctx context.Context) bool
Apply(ctx context.Context) (context.Context, context.CancelFunc)
}
// Example: Extend timeout for premium users
type PremiumUserModifier struct {
extraTime time.Duration
}
func (pum *PremiumUserModifier) ShouldApply(ctx context.Context) bool {
userID, ok := GetUserID(ctx)
return ok && isPremiumUser(userID)
}
func (pum *PremiumUserModifier) Apply(ctx context.Context) (context.Context, context.CancelFunc) {
return context.WithTimeout(ctx, pum.extraTime)
}
// Example: Reduce timeout under high load
type LoadBasedModifier struct {
reducedTimeout time.Duration
}
func (lbm *LoadBasedModifier) ShouldApply(ctx context.Context) bool {
return getCurrentSystemLoad() > 0.8
}
func (lbm *LoadBasedModifier) Apply(ctx context.Context) (context.Context, context.CancelFunc) {
return context.WithTimeout(ctx, lbm.reducedTimeout)
}
func NewAdaptiveContext(base context.Context) *AdaptiveContext {
return &AdaptiveContext{
base: base,
modifiers: make([]ContextModifier, 0),
}
}
func (ac *AdaptiveContext) AddModifier(modifier ContextModifier) {
ac.modifiers = append(ac.modifiers, modifier)
}
func (ac *AdaptiveContext) CreateContext() (context.Context, context.CancelFunc) {
ctx := ac.base
var cancels []context.CancelFunc
for _, modifier := range ac.modifiers {
if modifier.ShouldApply(ctx) {
var cancel context.CancelFunc
ctx, cancel = modifier.Apply(ctx)
if cancel != nil {
cancels = append(cancels, cancel)
}
}
}
// Return combined cancel function
return ctx, func() {
for _, cancel := range cancels {
cancel()
}
}
}
This lets your context behavior adapt to user types, system load, or any other runtime conditions.
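A hedged wiring sketch; runSearch is hypothetical, and note that modifiers are applied in the order they were added:
// Premium users get up to 30 seconds, but under heavy load everyone is
// clamped down to 2 seconds.
func handleSearch(ctx context.Context, query string) error {
    adaptive := NewAdaptiveContext(ctx)
    adaptive.AddModifier(&PremiumUserModifier{extraTime: 30 * time.Second})
    adaptive.AddModifier(&LoadBasedModifier{reducedTimeout: 2 * time.Second})

    searchCtx, cancel := adaptive.CreateContext()
    defer cancel()
    return runSearch(searchCtx, query)
}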
Context Pipelines (Chaining Operations with Context Evolution)
In complex processing pipelines, each stage might need to modify the context for subsequent stages:
type PipelineStage interface {
Process(ctx context.Context, data interface{}) (context.Context, interface{}, error)
Name() string
}
type ValidationStage struct{}
func (vs *ValidationStage) Name() string { return "validation" }
func (vs *ValidationStage) Process(ctx context.Context, data interface{}) (context.Context, interface{}, error) {
req := data.(*ProcessingRequest)
// High priority requests get extended timeouts
if req.Priority == "high" {
// A string key is used here for brevity; prefer a typed key (as shown
// in the earlier section) to avoid collisions
ctx = context.WithValue(ctx, "priority", "high")
// Extend timeout for high priority. The cancel func is dropped here for
// simplicity, which keeps the timer alive until it fires; a production
// pipeline should return it so the owner can release it early
newCtx, _ := context.WithTimeout(ctx, 60*time.Second)
ctx = newCtx
}
if err := validateRequest(req); err != nil {
return ctx, nil, err
}
return ctx, req, nil
}
type ProcessingStage struct{}
func (ps *ProcessingStage) Name() string { return "processing" }
func (ps *ProcessingStage) Process(ctx context.Context, data interface{}) (context.Context, interface{}, error) {
req := data.(*ProcessingRequest)
// Check if previous stage marked this as high priority
if priority := ctx.Value("priority"); priority == "high" {
result, err := processHighPriority(ctx, req)
return ctx, result, err
}
result, err := processNormal(ctx, req)
return ctx, result, err
}
type ContextPipeline struct {
stages []PipelineStage
}
func NewContextPipeline(stages ...PipelineStage) *ContextPipeline {
return &ContextPipeline{stages: stages}
}
func (cp *ContextPipeline) Execute(ctx context.Context, initialData interface{}) (interface{}, error) {
currentCtx := ctx
currentData := initialData
for _, stage := range cp.stages {
var err error
currentCtx, currentData, err = stage.Process(currentCtx, currentData)
if err != nil {
return nil, fmt.Errorf("stage %s failed: %w", stage.Name(), err)
}
// Check for cancellation between stages
select {
case <-currentCtx.Done():
return nil, currentCtx.Err()
default:
}
}
return currentData, nil
}
This pipeline pattern allows each stage to influence how subsequent stages behave through context modification.
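A small usage sketch assembling the two stages shown above:
// Run a request through validation and processing in order; the context
// returned by one stage feeds the next.
func handleProcessing(ctx context.Context, req *ProcessingRequest) (interface{}, error) {
    pipeline := NewContextPipeline(
        &ValidationStage{},
        &ProcessingStage{},
    )
    return pipeline.Execute(ctx, req)
}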
Context Resource Pooling
For expensive resources that need context-aware lifecycle management:
type ContextAwarePool struct {
pool sync.Pool
active map[interface{}]context.CancelFunc
mu sync.RWMutex
maxAge time.Duration
}
func NewContextAwarePool(factory func() interface{}, maxAge time.Duration) *ContextAwarePool {
return &ContextAwarePool{
pool: sync.Pool{New: factory},
active: make(map[interface{}]context.CancelFunc),
maxAge: maxAge,
}
}
func (cap *ContextAwarePool) Get(ctx context.Context) (interface{}, error) {
resource := cap.pool.Get()
// Create context for this resource with max age
resourceCtx, cancel := context.WithTimeout(ctx, cap.maxAge)
cap.mu.Lock()
cap.active[resource] = cancel
cap.mu.Unlock()
// Monitor for context cancellation
go func() {
<-resourceCtx.Done()
cap.forceReturn(resource)
}()
return resource, nil
}
func (cap *ContextAwarePool) Put(resource interface{}) {
cap.mu.Lock()
if cancel, exists := cap.active[resource]; exists {
cancel()
delete(cap.active, resource)
}
cap.mu.Unlock()
cap.pool.Put(resource)
}
func (cap *ContextAwarePool) forceReturn(resource interface{}) {
cap.mu.Lock()
cancel, wasActive := cap.active[resource]
if wasActive {
cancel()
delete(cap.active, resource)
}
cap.mu.Unlock()
// Only clean up resources that were still checked out; anything already
// returned via Put is back in the pool and must not be closed here
if !wasActive {
return
}
if closer, ok := resource.(io.Closer); ok {
closer.Close()
}
}
This pool automatically manages resource lifecycles based on context cancellation and age limits.
Context Merging (Combining Multiple Contexts)
When you need to combine contexts from different sources while preserving all their capabilities:
type MergedContext struct {
contexts []context.Context
done chan struct{}
err error
once sync.Once
}
func MergeContexts(contexts ...context.Context) *MergedContext {
mc := &MergedContext{
contexts: contexts,
done: make(chan struct{}),
}
go mc.monitor()
return mc
}
func (mc *MergedContext) monitor() {
// Use reflection to wait on multiple channels
cases := make([]reflect.SelectCase, len(mc.contexts))
for i, ctx := range mc.contexts {
cases[i] = reflect.SelectCase{
Dir: reflect.SelectRecv,
Chan: reflect.ValueOf(ctx.Done()),
}
}
chosen, _, _ := reflect.Select(cases)
mc.once.Do(func() {
mc.err = mc.contexts[chosen].Err()
close(mc.done)
})
}
func (mc *MergedContext) Done() <-chan struct{} {
return mc.done
}
func (mc *MergedContext) Err() error {
return mc.err
}
func (mc *MergedContext) Deadline() (time.Time, bool) {
var earliest time.Time
hasDeadline := false
for _, ctx := range mc.contexts {
if deadline, ok := ctx.Deadline(); ok {
if !hasDeadline || deadline.Before(earliest) {
earliest = deadline
hasDeadline = true
}
}
}
return earliest, hasDeadline
}
func (mc *MergedContext) Value(key interface{}) interface{} {
for _, ctx := range mc.contexts {
if value := ctx.Value(key); value != nil {
return value
}
}
return nil
}
This merged context cancels when any of its constituent contexts cancel, and uses the earliest deadline.
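A short usage sketch; handleJob is a hypothetical downstream call. Because *MergedContext implements all four context.Context methods, it can be passed anywhere a context is expected:
// Stop work when either the incoming request is cancelled or the server
// begins shutting down, whichever happens first.
func handleWithShutdown(requestCtx, shutdownCtx context.Context, userID string) error {
    var ctx context.Context = MergeContexts(requestCtx, shutdownCtx)
    return handleJob(ctx, userID)
}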
These advanced patterns become essential when you’re building complex distributed systems where simple request-response patterns aren’t enough. They give you the tools to coordinate sophisticated operations while maintaining the benefits of context-based lifecycle management.
Next, we’ll dive into error handling and recovery patterns that work with these advanced context scenarios.
Error Handling and Recovery
When Context Errors Aren’t Really Errors
Here’s something that took me way too long to figure out: not all context “errors” are actually problems. When a user cancels a request, that’s not a system failure—that’s the system working correctly. When an operation times out because the user set an aggressive deadline, that might be expected behavior, not a bug.
The challenge is building systems that can distinguish between different types of context errors and respond appropriately to each one.
Understanding Context Error Types
Context errors come in different flavors, and each one tells you something different about what happened:
type ContextErrorAnalyzer struct {
operation string
startTime time.Time
}
func NewContextErrorAnalyzer(operation string) *ContextErrorAnalyzer {
return &ContextErrorAnalyzer{
operation: operation,
startTime: time.Now(),
}
}
func (cea *ContextErrorAnalyzer) AnalyzeError(ctx context.Context, err error) string {
if err == nil {
return "success"
}
switch {
case errors.Is(err, context.Canceled):
// Was this user-initiated or system-initiated?
if cea.looksLikeUserCancellation(ctx) {
return "user_cancelled"
}
return "system_cancelled"
case errors.Is(err, context.DeadlineExceeded):
// Did we hit a timeout or an absolute deadline?
if deadline, hasDeadline := ctx.Deadline(); hasDeadline {
if time.Now().After(deadline) {
return "deadline_exceeded"
}
}
return "timeout"
default:
return "other_error"
}
}
func (cea *ContextErrorAnalyzer) looksLikeUserCancellation(ctx context.Context) bool {
// Quick cancellations are often user-initiated (they hit cancel)
elapsed := time.Since(cea.startTime)
if elapsed < 100*time.Millisecond {
return true
}
// Check for user cancellation markers in context
if source := ctx.Value("cancellation_source"); source == "user" {
return true
}
return false
}
// Usage in your error handling
func handleOperation(ctx context.Context) error {
analyzer := NewContextErrorAnalyzer("user_data_fetch")
err := fetchUserData(ctx)
errorType := analyzer.AnalyzeError(ctx, err)
switch errorType {
case "user_cancelled":
log.Printf("User cancelled operation - no action needed")
return nil // Treat as success
case "timeout":
log.Printf("Operation timed out - may need performance investigation")
return fmt.Errorf("operation timeout: %w", err)
case "deadline_exceeded":
log.Printf("Hard deadline exceeded - system may be overloaded")
return fmt.Errorf("deadline exceeded: %w", err)
default:
return err
}
}
This analysis helps you respond appropriately instead of treating all context errors the same way.
Smart Retry Strategies
Not all context errors should trigger retries. Here’s how I build retry logic that understands context:
type ContextAwareRetry struct {
maxAttempts int
baseDelay time.Duration
maxDelay time.Duration
backoffFactor float64
}
func NewContextAwareRetry() *ContextAwareRetry {
return &ContextAwareRetry{
maxAttempts: 3,
baseDelay: 100 * time.Millisecond,
maxDelay: 5 * time.Second,
backoffFactor: 2.0,
}
}
func (car *ContextAwareRetry) Execute(ctx context.Context, operation func(context.Context) error) error {
var lastErr error
for attempt := 0; attempt < car.maxAttempts; attempt++ {
// Check if we should even try
select {
case <-ctx.Done():
return ctx.Err()
default:
}
lastErr = operation(ctx)
if lastErr == nil {
return nil // Success!
}
// Analyze the error to decide if we should retry
if !car.shouldRetry(ctx, lastErr, attempt) {
return lastErr
}
// Calculate delay for next attempt
delay := car.calculateDelay(attempt)
// Wait with context awareness
select {
case <-time.After(delay):
// Continue to next attempt
case <-ctx.Done():
return ctx.Err()
}
}
return fmt.Errorf("operation failed after %d attempts: %w", car.maxAttempts, lastErr)
}
func (car *ContextAwareRetry) shouldRetry(ctx context.Context, err error, attempt int) bool {
// Don't retry if we're out of attempts
if attempt >= car.maxAttempts-1 {
return false
}
// Never retry user cancellations
if errors.Is(err, context.Canceled) {
return false
}
// Retry timeouts, but only if we have enough time left
if errors.Is(err, context.DeadlineExceeded) {
if deadline, hasDeadline := ctx.Deadline(); hasDeadline {
remaining := time.Until(deadline)
nextDelay := car.calculateDelay(attempt + 1)
return remaining > nextDelay*2 // Need at least 2x delay time remaining
}
return true
}
// Retry other errors
return true
}
func (car *ContextAwareRetry) calculateDelay(attempt int) time.Duration {
delay := time.Duration(float64(car.baseDelay) * math.Pow(car.backoffFactor, float64(attempt)))
if delay > car.maxDelay {
delay = car.maxDelay
}
return delay
}
This retry logic understands context constraints and won’t waste time on futile retry attempts.
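A brief usage sketch; fetchInvoice is hypothetical:
// Retry a flaky downstream call while respecting the request's own deadline
// and skipping retries on user cancellation.
func loadInvoice(ctx context.Context, id string) error {
    retry := NewContextAwareRetry()
    return retry.Execute(ctx, func(ctx context.Context) error {
        return fetchInvoice(ctx, id)
    })
}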
Graceful Degradation
When operations fail due to context issues, sometimes you can provide partial functionality instead of complete failure:
type GracefulDegradation struct {
fallbacks map[string]FallbackFunc
}
type FallbackFunc func(ctx context.Context) (interface{}, error)
func NewGracefulDegradation() *GracefulDegradation {
return &GracefulDegradation{
fallbacks: make(map[string]FallbackFunc),
}
}
func (gd *GracefulDegradation) RegisterFallback(operation string, fallback FallbackFunc) {
gd.fallbacks[operation] = fallback
}
func (gd *GracefulDegradation) ExecuteWithFallback(ctx context.Context, operation string,
primary func(context.Context) (interface{}, error)) (interface{}, error) {
// Try primary operation first
result, err := primary(ctx)
if err == nil {
return result, nil
}
// Check if we should try fallback
if !gd.shouldUseFallback(err) {
return nil, err
}
// Try fallback with fresh context (to avoid cascading cancellations)
fallbackCtx := context.Background()
// Copy important values but not cancellation
if userID, ok := GetUserID(ctx); ok {
fallbackCtx = WithUserID(fallbackCtx, userID)
}
if requestID, ok := GetRequestID(ctx); ok {
fallbackCtx = WithRequestID(fallbackCtx, requestID)
}
if fallback, exists := gd.fallbacks[operation]; exists {
log.Printf("Primary operation failed, trying fallback for %s", operation)
return fallback(fallbackCtx)
}
return nil, err
}
func (gd *GracefulDegradation) shouldUseFallback(err error) bool {
// Use fallback for timeouts and cancellations, but not for other errors
return errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled)
}
// Example usage
func fetchUserProfile(ctx context.Context, userID string) (*UserProfile, error) {
gd := NewGracefulDegradation()
// Register fallback that returns cached data
gd.RegisterFallback("user_profile", func(ctx context.Context) (interface{}, error) {
return getCachedUserProfile(userID), nil
})
result, err := gd.ExecuteWithFallback(ctx, "user_profile", func(ctx context.Context) (interface{}, error) {
return fetchUserProfileFromDB(ctx, userID)
})
if err != nil {
return nil, err
}
return result.(*UserProfile), nil
}
This degradation strategy provides partial functionality when full operations fail due to context constraints.
Context-Aware Circuit Breaker
Circuit breakers need to understand context errors to avoid tripping on user cancellations:
type ContextCircuitBreaker struct {
state CircuitState
failures int
successes int
lastFailure time.Time
timeout time.Duration
threshold int
mu sync.RWMutex
}
type CircuitState int
const (
Closed CircuitState = iota
Open
HalfOpen
)
func NewContextCircuitBreaker(threshold int, timeout time.Duration) *ContextCircuitBreaker {
return &ContextCircuitBreaker{
state: Closed,
threshold: threshold,
timeout: timeout,
}
}
func (ccb *ContextCircuitBreaker) Execute(ctx context.Context, operation func(context.Context) error) error {
ccb.mu.RLock()
state := ccb.state
ccb.mu.RUnlock()
if state == Open {
if time.Since(ccb.lastFailure) < ccb.timeout {
return fmt.Errorf("circuit breaker is open")
}
ccb.setState(HalfOpen)
}
err := operation(ctx)
if err != nil {
// Only count real failures, not user cancellations
if ccb.isRealFailure(err) {
ccb.recordFailure()
}
return err
}
ccb.recordSuccess()
return nil
}
func (ccb *ContextCircuitBreaker) isRealFailure(err error) bool {
// Don't count user cancellations as failures
if errors.Is(err, context.Canceled) {
return false
}
// Timeouts might indicate system problems, so count them
if errors.Is(err, context.DeadlineExceeded) {
return true
}
// Other errors are real failures
return true
}
func (ccb *ContextCircuitBreaker) recordFailure() {
ccb.mu.Lock()
defer ccb.mu.Unlock()
ccb.failures++
ccb.lastFailure = time.Now()
if ccb.failures >= ccb.threshold {
ccb.state = Open
}
}
func (ccb *ContextCircuitBreaker) recordSuccess() {
ccb.mu.Lock()
defer ccb.mu.Unlock()
ccb.successes++
if ccb.state == HalfOpen {
ccb.state = Closed
ccb.failures = 0
}
}
func (ccb *ContextCircuitBreaker) setState(state CircuitState) {
ccb.mu.Lock()
defer ccb.mu.Unlock()
ccb.state = state
}
This circuit breaker won’t trip just because users are cancelling requests—it focuses on actual system failures.
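A hedged usage sketch; callPaymentProvider is hypothetical, and the breaker is shared across requests rather than created per call:
// One breaker per downstream dependency: trips after 5 real failures and
// stays open for 30 seconds.
var paymentsBreaker = NewContextCircuitBreaker(5, 30*time.Second)

func chargeCard(ctx context.Context, orderID string) error {
    return paymentsBreaker.Execute(ctx, func(ctx context.Context) error {
        return callPaymentProvider(ctx, orderID)
    })
}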
Error Aggregation Across Operations
When you’re running multiple operations, you need smart error aggregation that understands context:
type ContextErrorCollector struct {
errors []ContextError
threshold int
mu sync.Mutex
}
type ContextError struct {
Operation string
Error error
ErrorType string
Timestamp time.Time
}
func NewContextErrorCollector(threshold int) *ContextErrorCollector {
return &ContextErrorCollector{
errors: make([]ContextError, 0),
threshold: threshold,
}
}
func (cec *ContextErrorCollector) AddError(operation string, err error) {
cec.mu.Lock()
defer cec.mu.Unlock()
errorType := "other"
if errors.Is(err, context.Canceled) {
errorType = "cancelled"
} else if errors.Is(err, context.DeadlineExceeded) {
errorType = "timeout"
}
cec.errors = append(cec.errors, ContextError{
Operation: operation,
Error: err,
ErrorType: errorType,
Timestamp: time.Now(),
})
}
func (cec *ContextErrorCollector) ShouldAbort() bool {
cec.mu.Lock()
defer cec.mu.Unlock()
if len(cec.errors) < cec.threshold {
return false
}
// Count only real failures, not user cancellations
realFailures := 0
for _, err := range cec.errors {
if err.ErrorType != "cancelled" {
realFailures++
}
}
return realFailures >= cec.threshold
}
func (cec *ContextErrorCollector) GetSummary() string {
cec.mu.Lock()
defer cec.mu.Unlock()
counts := make(map[string]int)
for _, err := range cec.errors {
counts[err.ErrorType]++
}
return fmt.Sprintf("Errors: %d cancelled, %d timeout, %d other",
counts["cancelled"], counts["timeout"], counts["other"])
}
This collector helps you make intelligent decisions about when to abort complex operations based on the types of errors you’re seeing.
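A small usage sketch inside a fan-out loop; fetchItem is hypothetical:
// Record each failure and stop the batch once too many real failures
// (not user cancellations) have piled up.
func fanOutFetch(ctx context.Context, ids []string) error {
    collector := NewContextErrorCollector(3)
    for _, id := range ids {
        if err := fetchItem(ctx, id); err != nil {
            collector.AddError("fetch_item", err)
        }
        if collector.ShouldAbort() {
            return fmt.Errorf("aborting batch: %s", collector.GetSummary())
        }
    }
    return nil
}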
The key insight about context error handling is that context errors are communication, not just failures. They tell you about user intentions, system constraints, and operational conditions. When you handle them appropriately, you build systems that are both robust and user-friendly.
In our final part, we’ll cover production best practices that tie everything together—monitoring, performance optimization, and operational considerations for context-aware systems.
Production Best Practices
Context in the Real World
Everything we’ve covered so far works great in development, but production is where context patterns either shine or fall apart spectacularly. I’ve learned this the hard way—context issues that never show up during testing can bring down entire systems under load.
The biggest production challenges with context aren’t about correctness—they’re about performance, observability, and operational complexity. You need to monitor context usage, prevent resource leaks, and debug issues across distributed systems.
Monitoring Context Performance
Context operations can become bottlenecks under high load. Here’s how I monitor context performance in production:
type ContextMetrics struct {
creationCounter *prometheus.CounterVec
cancellationCounter *prometheus.CounterVec
timeoutHistogram *prometheus.HistogramVec
activeContexts prometheus.Gauge
}
func NewContextMetrics() *ContextMetrics {
return &ContextMetrics{
creationCounter: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "context_creations_total",
Help: "Total context creations by type",
},
[]string{"type", "operation"},
),
cancellationCounter: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "context_cancellations_total",
Help: "Context cancellations by reason",
},
[]string{"reason", "operation"},
),
timeoutHistogram: prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "context_timeout_duration_seconds",
Help: "Context timeout durations",
Buckets: []float64{0.001, 0.01, 0.1, 1, 5, 10, 30, 60},
},
[]string{"operation"},
),
activeContexts: prometheus.NewGauge(
prometheus.GaugeOpts{
Name: "active_contexts_current",
Help: "Currently active contexts",
},
),
}
}
type MonitoredContext struct {
context.Context
metrics *ContextMetrics
operation string
startTime time.Time
}
func (cm *ContextMetrics) WrapContext(ctx context.Context, operation string) *MonitoredContext {
cm.creationCounter.WithLabelValues("wrapped", operation).Inc()
cm.activeContexts.Inc()
mc := &MonitoredContext{
Context: ctx,
metrics: cm,
operation: operation,
startTime: time.Now(),
}
// Monitor for cancellation in the background. Doing this once here, rather
// than in an overridden Done() method, avoids spawning a goroutine (and
// double-counting the cancellation) every time a caller selects on Done()
go func() {
<-ctx.Done()
mc.recordCancellation()
}()
return mc
}
func (mc *MonitoredContext) recordCancellation() {
mc.metrics.activeContexts.Dec()
reason := "unknown"
if errors.Is(mc.Err(), context.Canceled) {
reason = "cancelled"
} else if errors.Is(mc.Err(), context.DeadlineExceeded) {
reason = "timeout"
duration := time.Since(mc.startTime)
mc.metrics.timeoutHistogram.WithLabelValues(mc.operation).Observe(duration.Seconds())
}
mc.metrics.cancellationCounter.WithLabelValues(reason, mc.operation).Inc()
}
This monitoring gives you visibility into context usage patterns and helps identify performance issues.
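A hedged wiring sketch, assuming the standard Prometheus client library; how and where you register collectors will depend on your setup:
// Register the collectors once at startup, then wrap the contexts you want
// visibility into.
func setupContextMetrics() *ContextMetrics {
    m := NewContextMetrics()
    prometheus.MustRegister(
        m.creationCounter,
        m.cancellationCounter,
        m.timeoutHistogram,
        m.activeContexts,
    )
    return m
}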
Context Leak Detection
Context leaks are silent killers in production. Here’s my leak detection system:
type ContextLeakDetector struct {
activeContexts map[context.Context]*ContextInfo
mu sync.RWMutex
alertThreshold int
checkInterval time.Duration
stopChan chan struct{}
}
type ContextInfo struct {
CreatedAt time.Time
Operation string
StackTrace string
AccessCount int64
}
func NewContextLeakDetector(threshold int, interval time.Duration) *ContextLeakDetector {
detector := &ContextLeakDetector{
activeContexts: make(map[context.Context]*ContextInfo),
alertThreshold: threshold,
checkInterval: interval,
stopChan: make(chan struct{}),
}
go detector.monitor()
return detector
}
func (cld *ContextLeakDetector) RegisterContext(ctx context.Context, operation string) {
cld.mu.Lock()
defer cld.mu.Unlock()
// Capture a stack trace for debugging
buf := make([]byte, 2048)
n := runtime.Stack(buf, false)
// Key the map by the context value itself. Taking the address of the
// parameter (e.g. via unsafe.Pointer) gives a different pointer in
// Register and Unregister, so entries would never be removed.
cld.activeContexts[ctx] = &ContextInfo{
CreatedAt: time.Now(),
Operation: operation,
StackTrace: string(buf[:n]),
AccessCount: 1,
}
}
func (cld *ContextLeakDetector) UnregisterContext(ctx context.Context) {
cld.mu.Lock()
defer cld.mu.Unlock()
delete(cld.activeContexts, ctx)
}
func (cld *ContextLeakDetector) monitor() {
ticker := time.NewTicker(cld.checkInterval)
defer ticker.Stop()
for {
select {
case <-ticker.C:
cld.checkForLeaks()
case <-cld.stopChan:
return
}
}
}
func (cld *ContextLeakDetector) checkForLeaks() {
cld.mu.RLock()
defer cld.mu.RUnlock()
now := time.Now()
suspiciousContexts := 0
for _, info := range cld.activeContexts {
age := now.Sub(info.CreatedAt)
// Flag contexts older than 5 minutes
if age > 5*time.Minute {
suspiciousContexts++
if suspiciousContexts <= 5 { // Don't spam logs
log.Printf("POTENTIAL LEAK: Context %s created %v ago at:\n%s",
info.Operation, age, info.StackTrace)
}
}
}
if suspiciousContexts > cld.alertThreshold {
log.Printf("ALERT: %d potentially leaked contexts detected", suspiciousContexts)
}
}
This detector helps catch context leaks before they cause memory issues.
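The way I actually use it is to pair every registration with a deferred unregister, so the detector only ever flags contexts that genuinely outlive their request. A minimal sketch, with the handler and the fetchOrders call standing in for your real code:
var leakDetector = NewContextLeakDetector(25, 30*time.Second)
func handleOrders(ctx context.Context, customerID string) error {
    reqCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
    defer cancel()
    // Track this context for the lifetime of the request; the deferred
    // unregister runs on every return path, so only true leaks remain.
    id := leakDetector.RegisterContext("orders.fetch")
    defer leakDetector.UnregisterContext(id)
    return fetchOrders(reqCtx, customerID) // fetchOrders: your data access layer
}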
High-Performance Context Pooling
In high-throughput systems, context creation overhead matters. Here’s my pooling approach:
type ContextPool struct {
pool sync.Pool
maxPoolSize int
currentSize int64
metrics *ContextMetrics
}
func NewContextPool(maxSize int, metrics *ContextMetrics) *ContextPool {
return &ContextPool{
pool: sync.Pool{
New: func() interface{} {
return &PooledContext{}
},
},
maxPoolSize: maxSize,
metrics: metrics,
}
}
type PooledContext struct {
context.Context
pool *ContextPool
inUse bool
createdAt time.Time
}
func (cp *ContextPool) Get(parent context.Context) *PooledContext {
if atomic.LoadInt64(&cp.currentSize) >= int64(cp.maxPoolSize) {
// Pool full, create new
return &PooledContext{
Context: parent,
createdAt: time.Now(),
}
}
pooled := cp.pool.Get().(*PooledContext)
pooled.Context = parent
pooled.pool = cp
pooled.inUse = true
pooled.createdAt = time.Now()
atomic.AddInt64(&cp.currentSize, 1)
if cp.metrics != nil {
cp.metrics.creationCounter.WithLabelValues("pooled", "get").Inc()
}
return pooled
}
func (cp *ContextPool) Put(ctx *PooledContext) {
    if ctx.pool != cp || !ctx.inUse {
        return
    }
    ctx.inUse = false
    ctx.Context = nil
    // The context is no longer checked out, so always release its slot;
    // otherwise currentSize only ever grows and the pool reports "full" forever.
    atomic.AddInt64(&cp.currentSize, -1)
    // Don't recycle wrappers that have been alive for a long time
    if time.Since(ctx.createdAt) > time.Hour {
        return
    }
    cp.pool.Put(ctx)
}
func (pc *PooledContext) Release() {
if pc.pool != nil {
pc.pool.Put(pc)
}
}
This pooling reduces allocation overhead while preventing memory bloat.
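Usage looks like a borrow/return cycle: grab a wrapper at the start of the hot path and release it on every exit. A quick sketch, where processEvent stands in for whatever work you're actually doing:
var pool = NewContextPool(1024, nil) // metrics are optional; nil disables counting
func handleEvent(parent context.Context, payload []byte) error {
    // Borrow a wrapper for the duration of this event and always return it,
    // even on error paths, so the pool's in-flight count stays accurate.
    ctx := pool.Get(parent)
    defer ctx.Release()
    return processEvent(ctx, payload) // processEvent: your business logic
}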
Distributed Context Tracing
In microservices, you need to trace context across service boundaries:
type DistributedTracer struct {
serviceName string
}
func NewDistributedTracer(serviceName string) *DistributedTracer {
return &DistributedTracer{serviceName: serviceName}
}
func (dt *DistributedTracer) InjectHeaders(ctx context.Context, headers map[string]string) {
if requestID, ok := GetRequestID(ctx); ok {
headers["X-Request-ID"] = requestID
}
if traceID, ok := GetTraceID(ctx); ok {
headers["X-Trace-ID"] = traceID
}
if userID, ok := GetUserID(ctx); ok {
headers["X-User-ID"] = userID
}
// Add service hop information
headers["X-Service-Path"] = dt.serviceName
}
func (dt *DistributedTracer) ExtractContext(headers map[string]string) context.Context {
ctx := context.Background()
if requestID := headers["X-Request-ID"]; requestID != "" {
ctx = WithRequestID(ctx, requestID)
}
if traceID := headers["X-Trace-ID"]; traceID != "" {
ctx = WithTraceID(ctx, traceID)
}
if userID := headers["X-User-ID"]; userID != "" {
ctx = WithUserID(ctx, userID)
}
return ctx
}
// HTTP client wrapper
func (dt *DistributedTracer) DoRequest(ctx context.Context, req *http.Request) (*http.Response, error) {
    headers := make(map[string]string)
    dt.InjectHeaders(ctx, headers)
    for key, value := range headers {
        req.Header.Set(key, value)
    }
    // Attach the context to the request so cancellation and deadlines
    // propagate to the outbound call, not just the tracing headers.
    req = req.WithContext(ctx)
    return http.DefaultClient.Do(req)
}
This ensures context information flows correctly across service boundaries.
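On the receiving side, the same headers get turned back into context values before your handler runs. Here's a minimal middleware sketch; note that ExtractContext above starts from context.Background(), so a production version would also want to preserve the incoming request's own cancellation:
func tracedHandler(tracer *DistributedTracer, next func(context.Context, http.ResponseWriter, *http.Request)) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // Flatten the incoming headers into the map shape ExtractContext expects
        headers := make(map[string]string)
        for _, key := range []string{"X-Request-ID", "X-Trace-ID", "X-User-ID"} {
            if value := r.Header.Get(key); value != "" {
                headers[key] = value
            }
        }
        ctx := tracer.ExtractContext(headers)
        next(ctx, w, r)
    }
}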
Production Configuration Management
Production systems need configurable context behavior:
type ContextConfig struct {
DefaultTimeout time.Duration `json:"default_timeout"`
MaxTimeout time.Duration `json:"max_timeout"`
EnableLeakDetection bool `json:"enable_leak_detection"`
EnablePooling bool `json:"enable_pooling"`
MaxPoolSize int `json:"max_pool_size"`
EnableMetrics bool `json:"enable_metrics"`
}
type ProductionContextManager struct {
config *ContextConfig
pool *ContextPool
leakDetector *ContextLeakDetector
metrics *ContextMetrics
mu sync.RWMutex
}
func NewProductionContextManager(config *ContextConfig) *ProductionContextManager {
manager := &ProductionContextManager{config: config}
if config.EnableMetrics {
manager.metrics = NewContextMetrics()
}
if config.EnablePooling {
manager.pool = NewContextPool(config.MaxPoolSize, manager.metrics)
}
if config.EnableLeakDetection {
manager.leakDetector = NewContextLeakDetector(10, 30*time.Second)
}
return manager
}
func (pcm *ProductionContextManager) CreateContext(parent context.Context, operation string) (context.Context, context.CancelFunc) {
    pcm.mu.RLock()
    config := pcm.config
    pcm.mu.RUnlock()
    // Apply the default timeout if the parent has no deadline. Keep the
    // timeout's cancel function so its timer is released promptly.
    var timeoutCancel context.CancelFunc = func() {}
    if _, hasDeadline := parent.Deadline(); !hasDeadline {
        parent, timeoutCancel = context.WithTimeout(parent, config.DefaultTimeout)
    }
    ctx, cancel := context.WithCancel(parent)
    // Register with the leak detector and keep the returned token
    var leakID uint64
    if pcm.leakDetector != nil {
        leakID = pcm.leakDetector.RegisterContext(operation)
    }
    // Wrap with metrics
    if pcm.metrics != nil {
        ctx = pcm.metrics.WrapContext(ctx, operation)
    }
    // Enhanced cancel with cleanup
    enhancedCancel := func() {
        cancel()
        timeoutCancel()
        if pcm.leakDetector != nil {
            pcm.leakDetector.UnregisterContext(leakID)
        }
    }
    return ctx, enhancedCancel
}
func (pcm *ProductionContextManager) UpdateConfig(newConfig *ContextConfig) error {
pcm.mu.Lock()
defer pcm.mu.Unlock()
if newConfig.DefaultTimeout <= 0 || newConfig.MaxTimeout <= 0 {
return fmt.Errorf("invalid timeout configuration")
}
if newConfig.DefaultTimeout > newConfig.MaxTimeout {
return fmt.Errorf("default timeout exceeds max timeout")
}
pcm.config = newConfig
return nil
}
This manager provides runtime configuration of context behavior for production environments.
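Wiring it into a service is straightforward. Here's a minimal sketch, where generateReport stands in for your handler logic: every request gets the default timeout, leak tracking, and metrics without the handler having to think about any of it.
func main() {
    manager := NewProductionContextManager(&ContextConfig{
        DefaultTimeout:      10 * time.Second,
        MaxTimeout:          60 * time.Second,
        EnableLeakDetection: true,
        EnableMetrics:       true,
    })
    http.HandleFunc("/reports", func(w http.ResponseWriter, r *http.Request) {
        // Every request gets a managed context: default timeout applied, leak
        // tracking registered, metrics wrapped, and cleanup bundled into cancel.
        ctx, cancel := manager.CreateContext(r.Context(), "reports.generate")
        defer cancel()
        if err := generateReport(ctx, r.URL.Query().Get("id")); err != nil { // generateReport: your handler logic
            http.Error(w, err.Error(), http.StatusInternalServerError)
        }
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}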
The key insight about production context patterns is that observability and operational control are just as important as functional correctness. The most successful context implementations provide comprehensive monitoring, efficient resource management, and operational flexibility that enable teams to maintain reliable service at scale.
By implementing these production best practices, you’ll have a robust foundation for context-aware applications that can handle the complexities of real-world distributed systems while providing the visibility and control needed for effective operations. The patterns we’ve covered throughout this guide give you a complete toolkit for building sophisticated request lifecycle management that scales from development to production.