Rust Error Handling Patterns for Production Applications
I got paged at 3am on a Tuesday because a Rust service I’d deployed two weeks earlier crashed hard. No graceful degradation, no useful error message in the logs. Just a panic backtrace pointing at line 247 of our config parser: .unwrap().
The config file had a trailing comma that our test fixtures didn’t cover. One .unwrap() on a serde_json::from_str call, and the whole service went down. I sat there in the dark, laptop balanced on my knees, fixing a one-line bug that should never have made it past code review.
That night changed how I think about error handling in Rust. I’d already written about the basics of Rust error handling and even covered unwrap specifically, but knowing the theory and actually building production-resilient systems are different things. This post is about the patterns I’ve settled on after shipping Rust to production for a few years now.
unwrap() in Production Code Is a Bug Waiting to Happen
I’m going to say it plainly: every .unwrap() in production code is a latent crash. Full stop. It’s a conscious decision to say “I’m so confident this can’t fail that I’m willing to take down the entire process if I’m wrong.”
You’re almost never that confident. And when you are, you’re often wrong.
The same goes for .expect() — it’s marginally better because you get a message in the panic output, but your service is still dead. In Rust for cloud engineers, I talked about how Rust’s safety guarantees are one of its biggest selling points. But those guarantees only hold if you actually use Result and Option properly instead of bypassing them with unwrap().
Here’s what I do instead. For cases where I genuinely believe a value can’t be None or an operation can’t fail, I still handle it:
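use std::net::SocketAddr;

// Missing port: fall back to a sane default instead of unwrapping.
let port = config.port.unwrap_or(8080);
// Malformed address: fall back to localhost rather than crashing.
let addr = socket_addr
    .parse::<SocketAddr>()
    .unwrap_or_else(|_| SocketAddr::from(([127, 0, 0, 1], port)));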
For anything that actually can fail — which is most things — I propagate with ? or handle it explicitly. More on that below.
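Here is a minimal runnable sketch of those fallbacks. RawConfig and resolve_bind_addr are hypothetical names. It also assumes the bind address is stored as a bare IP with the port kept separately, which differs slightly from the SocketAddr parse above.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Hypothetical raw config: fields may be missing or malformed.
struct RawConfig {
    port: Option<u16>,
    bind_ip: String,
}

/// Resolve a usable socket address without ever panicking.
fn resolve_bind_addr(config: &RawConfig) -> SocketAddr {
    // Missing port: use a default instead of unwrap().
    let port = config.port.unwrap_or(8080);
    // Malformed IP: fall back to localhost instead of unwrap().
    let ip = config
        .bind_ip
        .parse::<IpAddr>()
        .unwrap_or(IpAddr::V4(Ipv4Addr::LOCALHOST));
    SocketAddr::new(ip, port)
}
```

Defaults like these only make sense when a fallback is actually safe. When it isn't, the failure should travel back to the caller instead.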
The ? Operator Is Your Best Friend
If you’ve read my earlier piece on Rust error handling, you know the ? operator replaces the old try! macro. But it’s worth revisiting because it’s the backbone of production error handling.
The ? operator does three things: it unwraps the Ok value, returns early on Err, and — crucially — converts the error type using From. That last part is what makes the whole ecosystem work.
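use std::fs;
use std::net::SocketAddr;

// Config and AppError are defined elsewhere; AppError needs a From impl
// for each error type that `?` converts below.
fn load_config(path: &str) -> Result<Config, AppError> {
    let contents = fs::read_to_string(path)?; // io::Error -> AppError
    let config: Config = serde_json::from_str(&contents)?; // serde_json::Error -> AppError
    // Validate the bind address up front; the parsed value itself isn't kept.
    let _addr = config.bind_address.parse::<SocketAddr>()?; // AddrParseError -> AppError
    Ok(config)
}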
Three different error types, all converted automatically through From implementations. No manual matching, no .map_err() chains. This is clean, readable, and it preserves the original error information.
The key insight is that ? isn’t just syntactic sugar for early returns. It’s a composable error conversion pipeline. Once you set up your From implementations (or let a crate do it for you), error propagation becomes almost invisible.
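For reference, this is roughly what those From implementations look like when written by hand. It's a std-only sketch, not the AppError from my service. The variants are assumptions, serde_json::Error is swapped for std's ParseIntError so the block compiles without external crates, and parse_settings is a hypothetical function.

```rust
use std::fmt;
use std::net::{AddrParseError, SocketAddr};
use std::num::ParseIntError;

/// Hypothetical application error covering three std error types.
#[derive(Debug)]
enum AppError {
    Io(std::io::Error),
    Port(ParseIntError),
    Addr(AddrParseError),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Io(e) => write!(f, "I/O error: {e}"),
            AppError::Port(e) => write!(f, "invalid port: {e}"),
            AppError::Addr(e) => write!(f, "invalid bind address: {e}"),
        }
    }
}

impl std::error::Error for AppError {}

// Each From impl is what lets `?` convert the underlying error automatically.
impl From<std::io::Error> for AppError {
    fn from(e: std::io::Error) -> Self {
        AppError::Io(e)
    }
}

impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError::Port(e)
    }
}

impl From<AddrParseError> for AppError {
    fn from(e: AddrParseError) -> Self {
        AppError::Addr(e)
    }
}

/// Hypothetical settings parser: two different error types, one `?` each.
fn parse_settings(bind: &str, metrics_port: &str) -> Result<(SocketAddr, u16), AppError> {
    let addr = bind.parse::<SocketAddr>()?; // AddrParseError -> AppError
    let port = metrics_port.parse::<u16>()?; // ParseIntError -> AppError
    Ok((addr, port))
}
```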
thiserror for Libraries, anyhow for Applications
This is the single most useful heuristic I’ve found for Rust error handling, and I wish someone had told me earlier.
If you’re writing a library (something other people will depend on), use thiserror. It derives Display and std::error::Error for your custom error enums, plus From conversions for any field marked #[from]. That gives downstream users structured errors they can pattern match on:
use thiserror::Error;

#[derive(Error, Debug)]
pub enum StorageError {
    #[error("object not found: {key}")]
    NotFound { key: String },
    #[error("permission denied for bucket {bucket}")]
    PermissionDenied { bucket: String },
    // thiserror treats a field named `source` as this variant's source().
    #[error("connection failed after {attempts} attempts")]
    ConnectionFailed { attempts: u32, source: std::io::Error },
    #[error(transparent)]
    Unexpected(#[from] anyhow::Error),
}
Callers can match on StorageError::NotFound and handle it differently from StorageError::PermissionDenied. That’s the whole point of a library error type — it gives consumers meaningful variants to branch on.
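On the caller's side, branching on variants is an ordinary match. Because thiserror is an external crate, this sketch uses a trimmed, hand-written stand-in for StorageError with only two variants. fetch_or_default is a hypothetical helper that treats a missing object as empty and passes every other error through unchanged.

```rust
use std::fmt;

/// Trimmed, hand-written stand-in for the thiserror-derived StorageError.
#[derive(Debug)]
pub enum StorageError {
    NotFound { key: String },
    PermissionDenied { bucket: String },
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StorageError::NotFound { key } => write!(f, "object not found: {key}"),
            StorageError::PermissionDenied { bucket } => {
                write!(f, "permission denied for bucket {bucket}")
            }
        }
    }
}

impl std::error::Error for StorageError {}

/// Hypothetical caller: a missing object is recoverable, a permission problem is not.
fn fetch_or_default(result: Result<Vec<u8>, StorageError>) -> Result<Vec<u8>, StorageError> {
    match result {
        Ok(bytes) => Ok(bytes),
        // Branch on the variant: treat "not found" as an empty object.
        Err(StorageError::NotFound { .. }) => Ok(Vec::new()),
        // Everything else goes back to the caller untouched.
        Err(other) => Err(other),
    }
}
```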
For applications — your actual binaries, your services, your CLI tools — use anyhow. It erases the concrete error type and focuses on context:
use anyhow::{Context, Result};

fn sync_objects(config: &Config) -> Result<()> {
    let client = create_client(config)
        .context("failed to create storage client")?;

    let objects = client.list_objects(&config.bucket)
        .context("failed to list objects")?;

    for obj in objects {
        client.download(&obj.key)
            .with_context(|| format!("failed to download {}", obj.key))?;
    }

    Ok(())
}
The .context() calls are what make anyhow shine. When this fails, you don’t just get “connection refused”. Printed with {:#}, you get “failed to download assets/logo.png: failed to connect: connection refused”. That chain of context is exactly what you need at 3am when you’re reading logs.
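To see where that chain comes from, here is a std-only sketch of the idea behind .context(). It is not anyhow's implementation, and ContextError and the Context trait are hypothetical. Each layer stores a message plus the wrapped error, and source() exposes the wrapped error. The alternate {:#} flag walks the chain, following the same {:#} convention anyhow uses.

```rust
use std::error::Error;
use std::fmt;

/// One layer of context on top of an underlying error.
#[derive(Debug)]
struct ContextError {
    msg: String,
    inner: Box<dyn Error + Send + Sync + 'static>,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.msg)?;
        // `{:#}` appends every cause by walking source(); plain `{}` stops here.
        if f.alternate() {
            let mut cause = self.source();
            while let Some(err) = cause {
                write!(f, ": {err}")?;
                cause = err.source();
            }
        }
        Ok(())
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.inner)
    }
}

/// Hypothetical extension trait with the same shape as anyhow's `.context()`.
trait Context<T> {
    fn context(self, msg: &str) -> Result<T, ContextError>;
}

impl<T, E> Context<T> for Result<T, E>
where
    E: Error + Send + Sync + 'static,
{
    fn context(self, msg: &str) -> Result<T, ContextError> {
        self.map_err(|e| ContextError {
            msg: msg.to_string(),
            inner: Box::new(e),
        })
    }
}
```

Each .context() call adds one layer. The message you see first is the outermost one, which is why the most specific, highest-level description should go on the last call.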
Custom Error Types Done Right
Sometimes thiserror and anyhow aren’t enough, or you want to avoid the dependency. Building error types by hand isn’t hard once you understand what traits you need to implement.
At minimum, a production error type needs Debug, Display, and std::error::Error:
use std::fmt;

#[derive(Debug)]
pub enum ApiError {
    BadRequest(String),
    Unauthorized,
    RateLimit { retry_after: u64 },
    Internal(Box<dyn std::error::Error + Send + Sync>),
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::BadRequest(msg) => write!(f, "bad request: {msg}"),
            Self::Unauthorized => write!(f, "unauthorized"),
            Self::RateLimit { retry_after } => {
                write!(f, "rate limited, retry after {retry_after}s")
            }
            Self::Internal(e) => write!(f, "internal error: {e}"),
        }
    }
}

impl std::error::Error for ApiError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Internal(e) => Some(e.as_ref()),
            _ => None,
        }
    }
}
The source() method is important and often overlooked. It’s how error chains work — tools like anyhow walk the source() chain to build those nice nested error messages. If you skip it, you lose that context.
Add From implementations for the error types you want automatic conversion from:
impl From<std::io::Error> for ApiError {
    fn from(e: std::io::Error) -> Self {
        ApiError::Internal(Box::new(e))
    }
}
Now ? works seamlessly with io::Error in any function returning Result<T, ApiError>.
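Here is a trimmed copy of ApiError (two variants only) that exercises both pieces. ? converts the io::Error through From, and source() still hands back the original error, so a caller can downcast it. read_api_key is a hypothetical function used only for illustration.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Trimmed copy of ApiError: just enough to exercise From and source().
#[derive(Debug)]
pub enum ApiError {
    Unauthorized,
    Internal(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::Unauthorized => write!(f, "unauthorized"),
            ApiError::Internal(e) => write!(f, "internal error: {e}"),
        }
    }
}

impl Error for ApiError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ApiError::Internal(e) => Some(&**e),
            ApiError::Unauthorized => None,
        }
    }
}

impl From<io::Error> for ApiError {
    fn from(e: io::Error) -> Self {
        ApiError::Internal(Box::new(e))
    }
}

/// Hypothetical step: `?` converts io::Error into ApiError through From.
fn read_api_key(path: &str) -> Result<String, ApiError> {
    let raw = std::fs::read_to_string(path)?;
    let key = raw.trim();
    if key.is_empty() {
        return Err(ApiError::Unauthorized);
    }
    Ok(key.to_string())
}
```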
Error Context Is Everything in Production
The difference between a 5-minute fix and a 2-hour debugging session usually comes down to error context. “Connection refused” tells you nothing. “Failed to connect to metrics endpoint at 10.0.3.12:9090 while flushing batch 47: connection refused” tells you everything.
I’ve adopted a personal rule: every ? at a boundary — network calls, file I/O, parsing, anything that crosses a trust boundary — gets a .context() or .with_context(). It feels verbose when you’re writing it. It feels like a gift when you’re debugging at 3am.
use anyhow::{Context, Result};

fn process_batch(batch_id: u64, items: &[Item]) -> Result<()> {
    let conn = db::connect(&DB_URL)
        .with_context(|| format!("batch {batch_id}: db connect failed"))?;

    for item in items {
        conn.insert(item)
            .with_context(|| format!("batch {batch_id}: insert failed for {}", item.id))?;
    }

    Ok(())
}
Use .with_context() (the closure version) when you need to format strings — it only allocates if the error actually happens. .context() with a static string is fine for simple cases.
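The laziness comes from the closure only being called on the Err path. This std-only sketch mimics the shape of with_context using a hypothetical WithContext trait. Unlike anyhow, which returns anyhow::Error, it returns a plain String error to keep things short.

```rust
use std::error::Error;

/// Hypothetical extension trait: `f` only runs when there is an error to describe.
trait WithContext<T> {
    fn with_context<F: FnOnce() -> String>(self, f: F) -> Result<T, String>;
}

impl<T, E: Error> WithContext<T> for Result<T, E> {
    fn with_context<F: FnOnce() -> String>(self, f: F) -> Result<T, String> {
        // map_err only calls its closure for Err, so `f` never runs on success.
        self.map_err(|e| format!("{}: {e}", f()))
    }
}
```

Passing format!(...) straight to .context() builds the String even when the call succeeds. The closure defers that work until something has actually gone wrong.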
Mapping Errors Across Boundaries
Real applications have layers. Your HTTP handler calls a service layer, which calls a repository, which calls a database driver. Each layer has its own error type, and you need clean conversions between them.
The pattern I keep coming back to is: each layer defines its own error enum, and implements From for the layer below it.
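// Repository layer
#[derive(Debug, thiserror::Error)]
pub enum RepoError {
    #[error("not found: {0}")]
    NotFound(String),
    #[error("database error")]
    Database(#[from] sqlx::Error),
}

// Service layer
#[derive(Debug, thiserror::Error)]
pub enum ServiceError {
    #[error("resource not found: {0}")]
    NotFound(String),
    #[error("validation failed: {0}")]
    Validation(String),
    #[error(transparent)]
    Internal(RepoError),
}

// Map repo errors explicitly: a blanket #[from] on Internal would turn
// a missing row into a 500 instead of a 404.
impl From<RepoError> for ServiceError {
    fn from(e: RepoError) -> Self {
        match e {
            RepoError::NotFound(what) => ServiceError::NotFound(what),
            other => ServiceError::Internal(other),
        }
    }
}

// HTTP layer: maps to status codes. HttpResponse and its constructors
// stand in for whatever response type your web framework provides.
impl From<ServiceError> for HttpResponse {
    fn from(e: ServiceError) -> Self {
        match e {
            ServiceError::NotFound(msg) => HttpResponse::not_found(msg),
            ServiceError::Validation(msg) => HttpResponse::bad_request(msg),
            ServiceError::Internal(err) => {
                // Log the full detail server-side; the client only sees a generic 500.
                eprintln!("internal error: {err:?}");
                HttpResponse::internal_error()
            }
        }
    }
}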
Notice the HTTP layer doesn’t expose internal details. ServiceError::Internal becomes a generic 500. This is deliberate — you don’t want database error messages leaking to clients. Log the full chain server-side, return something safe to the caller.
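Here is a std-only sketch of the same layering. It assumes hand-written From impls instead of thiserror, reduces the driver error to a String, and uses a hypothetical Response struct in place of your framework's response type. One design choice is worth noting: From<RepoError> maps a repository not-found to ServiceError::NotFound explicitly, so a missing row becomes a 404 instead of being folded into the generic 500.

```rust
/// Repository layer (hand-written stand-in; the driver error is just a String).
#[derive(Debug)]
enum RepoError {
    NotFound(String),
    Database(String),
}

/// Service layer.
#[derive(Debug)]
enum ServiceError {
    NotFound(String),
    Validation(String),
    Internal(RepoError),
}

impl From<RepoError> for ServiceError {
    fn from(e: RepoError) -> Self {
        match e {
            // A missing row stays a not-found at the service layer...
            RepoError::NotFound(what) => ServiceError::NotFound(what),
            // ...while driver failures become opaque internal errors.
            other @ RepoError::Database(_) => ServiceError::Internal(other),
        }
    }
}

/// Hypothetical HTTP-facing response: status code plus the body the client sees.
struct Response {
    status: u16,
    body: String,
}

impl From<ServiceError> for Response {
    fn from(e: ServiceError) -> Self {
        match e {
            ServiceError::NotFound(msg) => Response { status: 404, body: msg },
            ServiceError::Validation(msg) => Response { status: 400, body: msg },
            ServiceError::Internal(inner) => {
                // Full detail goes to the server log only.
                eprintln!("internal error: {inner:?}");
                Response { status: 500, body: "internal server error".to_string() }
            }
        }
    }
}

/// Hypothetical repository lookup.
fn find_user(id: u32) -> Result<String, RepoError> {
    match id {
        1 => Ok("alice".to_string()),
        2 => Err(RepoError::Database("connection reset by 10.0.0.5".to_string())),
        _ => Err(RepoError::NotFound(format!("user {id}"))),
    }
}

/// Hypothetical service function: `?` converts RepoError via From.
fn get_user(id: u32) -> Result<String, ServiceError> {
    if id == 0 {
        return Err(ServiceError::Validation("id must be non-zero".to_string()));
    }
    Ok(find_user(id)?)
}

/// Hypothetical handler: the only place errors turn into responses.
fn handle(id: u32) -> Response {
    match get_user(id) {
        Ok(name) => Response { status: 200, body: name },
        Err(e) => e.into(),
    }
}
```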
Handling Errors in Async Code
Async Rust doesn’t change the fundamentals of error handling, but it adds a few wrinkles. ? works the same way inside async fn, which is great. The tricky part is when you’re dealing with JoinHandle results — you get a Result<Result<T, E>, JoinError>, and that double-Result catches people off guard.
use anyhow::{Context, Result};
use tokio::task;

async fn run_workers(tasks: Vec<Work>) -> Result<()> {
    let mut handles = Vec::new();
    for t in tasks {
        handles.push(task::spawn(async move { process(t).await }));
    }

    for handle in handles {
        handle
            .await
            .context("task panicked or was cancelled")? // JoinError
            .context("task failed")?; // error returned by process()
    }

    Ok(())
}
The first ? handles the JoinError (the task panicked or was cancelled). The second handles whatever error your actual work returned. Two distinct failure modes, two distinct context messages.
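The same two-layer shape shows up without tokio. std::thread::JoinHandle::join returns a Result whose Err means the thread panicked, and that Result wraps whatever Result your closure returned. This sketch swaps tokio tasks for std threads, so there is no cancellation case. process here is a hypothetical function that fails on odd inputs.

```rust
use std::thread;

/// Hypothetical unit of work: fails for odd inputs.
fn process(n: u32) -> Result<u32, String> {
    if n % 2 == 0 {
        Ok(n * 10)
    } else {
        Err(format!("odd input {n}"))
    }
}

/// Spawn one thread per item and surface both failure modes separately.
fn run_workers(items: Vec<u32>) -> Result<Vec<u32>, String> {
    let handles: Vec<_> = items
        .into_iter()
        .map(|n| thread::spawn(move || process(n)))
        .collect();

    let mut results = Vec::new();
    for handle in handles {
        // Outer Result: Err only if the thread panicked.
        let inner = handle
            .join()
            .map_err(|_| "worker thread panicked".to_string())?;
        // Inner Result: the work's own error.
        let value = inner.map_err(|e| format!("task failed: {e}"))?;
        results.push(value);
    }
    Ok(results)
}
```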
When to Actually Panic
I said unwrap() is a bug, and I stand by that. But there are legitimate reasons to panic in Rust. The key distinction is: panic for programmer errors, return Result for runtime errors.
Panics are appropriate for:
- Invariants that should be enforced at compile time but can’t be (yet)
- Test code (.unwrap() in tests is fine)
- Setup code that runs once at startup, where failure means the program can’t function at all
fn main() {
    // This is acceptable: if config doesn't load at startup, we can't run.
    let config = load_config("config.json")
        .expect("failed to load config.json, cannot start");

    // After startup, everything returns Result.
    if let Err(e) = run_server(config) {
        eprintln!("server error: {e:#}");
        std::process::exit(1);
    }
}
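The {:#} format specifier with anyhow errors prints the full chain on one line, separated by colons. Use {:?} if you want the multi-line debug representation, which lists each cause under “Caused by:” and includes a backtrace when one was captured (for example with RUST_BACKTRACE=1 or RUST_LIB_BACKTRACE=1).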
Startup panics are a pragmatic choice. If your database URL is missing from the environment, there’s no point in gracefully handling that — the service literally cannot do its job. Crash loud, crash early, let the orchestrator restart you (or don’t, if the config is still broken).
Everything after startup though? Result all the way down.
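To make the programmer-error versus runtime-error split concrete, here is a hypothetical pair of functions. A batch size of zero passed in code is a caller bug, so batches panics. A batch size typed in by a user is ordinary input, so parse_batch_size returns Result. Both are illustrations only.

```rust
/// Hypothetical helper: a zero chunk size is a bug in the calling code,
/// so it panics with a clear message instead of returning Result.
fn batches<T: Clone>(items: &[T], size: usize) -> Vec<Vec<T>> {
    assert!(size > 0, "batch size must be non-zero (programmer error)");
    items.chunks(size).map(|c| c.to_vec()).collect()
}

/// Hypothetical runtime input: a user-supplied size can be wrong,
/// so that failure is an ordinary Result.
fn parse_batch_size(input: &str) -> Result<usize, String> {
    let n = input
        .trim()
        .parse::<usize>()
        .map_err(|e| format!("invalid batch size {input:?}: {e}"))?;
    if n == 0 {
        return Err("batch size must be at least 1".to_string());
    }
    Ok(n)
}
```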
Putting It All Together
Here’s the mental model I use for every Rust project now:
- Application binaries: anyhow::Result as the return type for most functions, with .context() used liberally. main() can use expect() for startup-critical stuff.
- Libraries: thiserror enums with meaningful variants. Never anyhow in your public API; your users need to match on errors.
- Boundaries: explicit From implementations or .map_err() at layer transitions. Never leak internal error details outward.
- Logging: log the full error chain with {:#} or by walking .source(). Return sanitized messages to callers.
- Testing: .unwrap() is fine in tests. That’s literally what it’s for.
The 3am incident I mentioned at the start? It took me about ten minutes to fix once I was awake enough to read the backtrace. But it took weeks to rebuild confidence that the service was solid. I went through the entire codebase replacing every .unwrap() with proper error handling, adding .context() calls, and setting up structured logging so that when things did fail, I’d know exactly where and why.
That’s the real cost of sloppy error handling — it’s not the bug itself, it’s the trust you lose. In yourself, from your team, from whoever’s relying on your service.
I’ve since added a clippy lint to our CI pipeline that flags any .unwrap() outside of test modules. It’s caught dozens of them in code review. Some were genuinely safe — indexing into a vec we’d just checked the length of, that kind of thing — but even those got rewritten to use .get() or pattern matching. The discipline matters more than any individual instance.
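One way to set this up uses Clippy's unwrap_used restriction lint. That's an assumption, since the lint isn't named above. A common crate-level form is #![cfg_attr(not(test), deny(clippy::unwrap_used))], which leaves test builds alone. The sketch below scopes the lint to individual functions so it stays self-contained, and it shows the Option-returning and pattern-matching rewrites. first_and_last, nth_reading and head_and_rest are hypothetical.

```rust
/// First and last readings, or None for an empty slice.
#[deny(clippy::unwrap_used)]
fn first_and_last(readings: &[f64]) -> Option<(f64, f64)> {
    // Instead of `readings[0]` after a length check, let the slice
    // return Option and propagate it with `?`.
    let first = *readings.first()?;
    let last = *readings.last()?;
    Some((first, last))
}

/// Bounds-checked indexing: `.get()` returns None instead of panicking.
#[deny(clippy::unwrap_used)]
fn nth_reading(readings: &[f64], i: usize) -> Option<f64> {
    readings.get(i).copied()
}

/// Same idea with a slice pattern: the empty case is part of the match.
#[deny(clippy::unwrap_used)]
fn head_and_rest(readings: &[f64]) -> Option<(f64, &[f64])> {
    match readings {
        [head, rest @ ..] => Some((*head, rest)),
        [] => None,
    }
}
```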
Rust gives you the tools to handle errors properly. The type system encodes fallibility directly into function signatures. The compiler practically begs you to deal with every failure case. Pattern matching makes it ergonomic. thiserror and anyhow make it painless. There’s really no excuse for shipping .unwrap() in code that runs in production.
Listen to the compiler. Your 3am self will thank you.