Goroutines are cheap. Goroutine leaks are not.

I learned this the hard way at 2am on a Tuesday, staring at Grafana dashboards showing one of our services consuming 40GB of RAM and climbing. The service normally sat around 500MB. We’d shipped a change three days earlier — a seemingly innocent fan-out pattern to parallelize calls to a downstream API. The code looked fine. Reviews passed. Tests passed. What we’d missed was that when the downstream service timed out, nothing was cancelling the spawned goroutines. They just… accumulated. Thousands per minute, each holding onto its request body and response buffer, waiting for a context that would never expire because we’d used context.Background() instead of propagating the parent context.

We fixed it by propagating the request context and adding a context.WithTimeout wrapper around the fan-out. The fix was six lines. The incident cost us about four hours of downtime and a very awkward post-mortem where I had to explain that yes, I had reviewed the PR, and no, I hadn’t thought about what happens when the downstream is slow.

That incident changed how I think about concurrency in Go. Not as a feature to reach for, but as a tool that demands the same discipline as manual memory management in C. Every goroutine you spawn is a commitment — you need to know exactly how it ends. If you can’t point to the line of code where a goroutine exits, you have a bug. Maybe not today, but eventually.

This article covers the patterns I’ve settled on after running Go microservices in production for several years. They’re not clever. They’re boring. Boring is what you want at 2am.


Worker Pools

The worker pool is the workhorse pattern. You’ve got N items to process, you don’t want to spawn N goroutines, and you need backpressure. Here’s the version I use everywhere:

func WorkerPool[T any, R any](ctx context.Context, workers int, items []T, fn func(context.Context, T) (R, error)) ([]R, error) {
	type job struct {
		idx  int
		item T
	}
	jobs := make(chan job)
	type result struct {
		idx int
		val R
		err error
	}
	results := make(chan result, len(items))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				val, err := fn(ctx, j.item)
				results <- result{idx: j.idx, val: val, err: err}
			}
		}()
	}

	for i, item := range items {
		select {
		case jobs <- job{idx: i, item: item}:
		case <-ctx.Done():
			close(jobs)
			wg.Wait()
			close(results)
			return nil, ctx.Err()
		}
	}
	close(jobs)
	wg.Wait()
	close(results)

	out := make([]R, len(items))
	for r := range results {
		if r.err != nil {
			return nil, r.err
		}
		out[r.idx] = r.val // results land at their input position
	}
	return out, nil
}

A few things to notice. The jobs channel is unbuffered — that’s the backpressure. If all workers are busy, the sender blocks. The select on ctx.Done() means we stop feeding work if the context is cancelled. And we close jobs before waiting, so workers drain what’s in-flight and exit cleanly.

I’ve seen people buffer the jobs channel to “improve throughput.” Don’t. If your workers can’t keep up, you want to know immediately, not after you’ve queued ten thousand items into memory. Buffered channels hide latency problems; unbuffered channels surface them, which is exactly what you want in production.

How many workers should you use? I start with runtime.NumCPU() for CPU-bound work and 10-20 for I/O-bound work, then tune from there based on actual metrics. The right number depends on your workload, your downstream services, and how much memory each in-flight item consumes. Profile, don’t guess.

The generic version above is what I reach for now that Go has generics. Before 1.18, I had about fifteen copies of this pattern with different types. They were all slightly different. They all had slightly different bugs.


Fan-Out/Fan-In

Fan-out/fan-in is what bit us in the 40GB incident. The pattern itself is fine — the problem is always lifecycle management. Here’s how I do it now:

func FanOut[T any](ctx context.Context, fns ...func(context.Context) (T, error)) ([]T, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	type result struct {
		val T
		err error
		idx int
	}
	ch := make(chan result, len(fns))

	for i, fn := range fns {
		i, fn := i, fn // capture loop variables (no longer needed as of Go 1.22)
		go func() {
			val, err := fn(ctx)
			ch <- result{val: val, err: err, idx: i}
		}()
	}

	results := make([]T, len(fns))
	for range fns {
		r := <-ch
		if r.err != nil {
			return nil, r.err
		}
		results[r.idx] = r.val
	}
	return results, nil
}

The critical line is ctx, cancel := context.WithCancel(ctx) with defer cancel(). When any function returns an error and we bail out early, the deferred cancel fires and all the other goroutines get the signal to stop. Without this, you get my 2am incident.

The buffered channel (len(fns)) is also important. If we return early on error, the remaining goroutines still need somewhere to send their results. An unbuffered channel would leave them blocked forever — another leak. The buffer ensures every goroutine can complete its send and exit, even if nobody’s reading anymore.

This is the kind of detail that doesn’t show up in tutorials. When I was first learning concurrency in Go, every example assumed the happy path. Real services don’t live on the happy path.


Context Propagation

If there’s one rule I’d tattoo on every Go developer’s forehead, it’s this: never use context.Background() in request-handling code. Ever.

Every goroutine spawned during request processing needs the request’s context. Full stop. Here’s what goes wrong when you don’t:

// DON'T DO THIS
func (s *Service) HandleRequest(ctx context.Context, req Request) error {
	go func() {
		// This goroutine survives the request. If the client
		// disconnects, this keeps running. Forever, potentially.
		result, _ := s.expensiveOperation(context.Background())
		s.cache.Set(req.Key, result)
	}()
	return nil
}

// Do this instead
func (s *Service) HandleRequest(ctx context.Context, req Request) error {
	go func() {
		result, err := s.expensiveOperation(ctx)
		if err != nil {
			return // context cancelled, move on
		}
		s.cache.Set(req.Key, result)
	}()
	return nil
}

But wait — that second version still has problems. With net/http, the request context is cancelled as soon as the handler returns, so this goroutine may be cut off almost immediately; if the work is genuinely meant to outlive the response, Go 1.21’s context.WithoutCancel keeps the request’s values while dropping its cancellation. And we’re still spawning a goroutine without tracking it: if the service shuts down, it can get killed mid-operation. This is where the patterns start composing together, and where things get interesting.

In microservices architecture patterns, I talked about how services need clean boundaries. Context propagation is how you enforce those boundaries at the code level. A context is a contract: “you have this long to finish, and here’s how you’ll know if I don’t need the answer anymore.”


Graceful Shutdown

This is the pattern that ties everything together. A microservice needs to stop accepting new work, finish in-flight work, and exit cleanly — all within a deadline. Here’s the skeleton I use for every service:

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	srv := &http.Server{Addr: ":8080", Handler: newRouter()}

	// Start server in background
	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	// Block until signal
	<-ctx.Done()
	log.Println("shutting down...")

	// Give in-flight requests 30 seconds to finish
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Fatalf("forced shutdown: %v", err)
	}
	log.Println("clean shutdown")
}

signal.NotifyContext is one of those stdlib gems that not enough people know about. It gives you a context that cancels on OS signals — no channel juggling required.

The context.Background() in the shutdown timeout is one of the rare cases where it’s correct. The signal context is already cancelled (that’s why we’re shutting down), so deriving the timeout from it would produce an already-done context and Shutdown would bail out immediately. The cleanup phase needs a fresh deadline.

But this only handles HTTP requests. What about background goroutines — worker pools processing queue messages, periodic tasks, that sort of thing? You need a sync.WaitGroup or something like errgroup.Group:

func run(ctx context.Context) error {
	g, ctx := errgroup.WithContext(ctx)

	g.Go(func() error {
		return runHTTPServer(ctx)
	})

	g.Go(func() error {
		return runQueueConsumer(ctx)
	})

	g.Go(func() error {
		return runPeriodicCleanup(ctx)
	})

	return g.Wait()
}

When any component fails, errgroup cancels the shared context, which signals all other components to shut down. Then Wait() blocks until everyone’s done. It’s clean, it’s composable, and it means you can’t accidentally orphan a background process.


Rate Limiting and Backpressure

In a microservices architecture, your service is someone else’s downstream dependency. If you don’t rate-limit your outbound calls, you become the reason their service falls over. I’ve been on both sides of this conversation and neither is fun.

Go’s golang.org/x/time/rate package gives you a token bucket limiter:

type Client struct {
	http    *http.Client
	limiter *rate.Limiter // e.g. rate.NewLimiter(rate.Limit(100), 10): 100 req/s, burst of 10
}

func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
	if err := c.limiter.Wait(ctx); err != nil {
		return nil, err // context cancelled, or deadline too soon for a token
	}
	return c.http.Do(req.WithContext(ctx))
}

Wait blocks until a token is available or the context expires. That’s it. No goroutine pool, no ticker, no custom queue. The limiter handles the math. I’ve seen teams build elaborate rate-limiting middleware from scratch when this package does exactly what they need in three lines.

Combine this with the worker pool pattern and you get bounded concurrency with rate limiting — which is what most microservice-to-microservice communication actually needs. Ten workers, each rate-limited to the downstream’s published capacity. Simple, predictable, and your downstream team won’t page you at 3am.

Coming from Python async programming, I kept looking for the equivalent of semaphores and event loops. Go’s approach is different. You don’t manage an event loop. You spawn goroutines and constrain them with channels and contexts. It took me a while to stop fighting this and just let the runtime do its job.


The Patterns Compose

Here’s the thing that makes Go’s concurrency model click: these patterns aren’t isolated techniques. They compose.

A real service might look like this: HTTP server with graceful shutdown, spawning worker pools for batch endpoints, using fan-out for aggregation endpoints, rate-limiting calls to downstream services, and propagating contexts through all of it so that when a client disconnects or the service shuts down, everything unwinds cleanly.

func (s *Service) AggregatePricing(ctx context.Context, productIDs []string) ([]Price, error) {
	// Fan out to multiple pricing sources, with rate limiting
	prices, err := FanOut(ctx,
		func(ctx context.Context) ([]Price, error) {
			return s.internalPricing.GetBatch(ctx, productIDs) // rate-limited client
		},
		func(ctx context.Context) ([]Price, error) {
			return s.partnerAPI.GetBatch(ctx, productIDs) // rate-limited client
		},
	)
	if err != nil {
		return nil, fmt.Errorf("pricing aggregation: %w", err)
	}
	return mergePrices(prices[0], prices[1]), nil
}

Context flows down. Cancellation flows up. Errors propagate. Resources get cleaned up. That’s the whole model.


What I Got Wrong

When I first started writing concurrent Go, I over-engineered everything. Custom scheduler. Priority queues for goroutines. Elaborate channel topologies that looked like circuit diagrams. I thought complexity meant I was handling edge cases.

It didn’t. It meant I was creating edge cases.

The 40GB leak came from a “sophisticated” fan-out implementation with retry logic, circuit breakers, and fallback paths — all hand-rolled. The code was about 400 lines. The replacement, using the patterns in this article plus a few stdlib packages, was about 60 lines. It’s been running for over a year without incident.

If you’re building distributed systems, the concurrency primitives are the easy part. The hard part is knowing when a goroutine ends. If you can answer that question for every go statement in your codebase, you’re in good shape.

My advice if you’re learning Go as a DevOps engineer or coming from another language: start with errgroup. It forces you to handle errors and cancellation from day one. Write your worker pools by hand once to understand the mechanics, then use the generic version everywhere. And please, for the love of your on-call rotation, propagate your contexts.

Goroutines are cheap. Your sleep isn’t.