AWS Lambda Cold Starts: Causes, Measurement, and Mitigation Strategies
I’ve lost count of how many times someone’s told me “Lambda has cold start problems” like it’s some fatal flaw. It isn’t. Cold starts are a tradeoff. You get near-infinite scale and zero idle cost, and in return, the first request to a new execution environment takes a bit longer. That’s the deal.
The real problem is that most teams either panic about cold starts when they don’t matter, or ignore them completely when they absolutely do. I’ve seen both. We had a payment API on Lambda that was timing out on cold starts during Black Friday — the Java function took 6 seconds to initialize with Spring Boot, and our API Gateway timeout was set to 5 seconds. Every new concurrent request during the traffic spike just… failed. That was a bad day.
So let’s talk about what’s actually happening, how to measure it properly, and what to do about it when it matters.
What Actually Happens During a Cold Start
Three things happen in sequence, and understanding which phase is slow tells you what to fix.
The sandbox spins up. Lambda provisions a Firecracker microVM, mounts the filesystem, sets up networking. If your function is in a VPC, there’s additional work to attach a Hyperplane-backed ENI. This used to be brutal — like 10+ seconds brutal — but AWS fixed it back in 2019. Now it’s mostly a non-issue. You don’t control this phase. Move on.
The runtime loads. Python interpreter starts, or the JVM boots, or the Node.js engine initializes. Python and Node.js are fast here. Java is not. That’s just reality. A Python 3.12 runtime initializes in under 200ms typically. Java 21 without SnapStart? Anywhere from 2 to 5 seconds depending on what frameworks you’ve dragged in.
Your INIT code runs. This is everything outside your handler function — imports, global variables, SDK clients, database connections. This is the part you own.
import boto3
import os

# Runs during INIT — adds to cold start time
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["TABLE_NAME"])

def handler(event, context):
    # Runs on every invocation
    return table.get_item(Key={"pk": event["id"]})
The Init Duration field in your CloudWatch REPORT line covers phases two and three: the runtime load plus your INIT code. Sandbox provisioning happens before measurement starts and isn't broken out separately. That's your number.
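If you're scraping logs yourself rather than using Logs Insights, the REPORT line is easy to parse. A minimal sketch in Python (the sample line below is illustrative, not a real request):

```python
import re

# Illustrative REPORT line; real ones come from your CloudWatch log group
SAMPLE = (
    "REPORT RequestId: 11111111-2222-3333-4444-555555555555\t"
    "Duration: 12.34 ms\tBilled Duration: 13 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 78 MB\t"
    "Init Duration: 187.65 ms"
)

def parse_report(line):
    """Pull Duration and Init Duration (ms) out of a Lambda REPORT line.

    Init Duration only appears on cold starts, so it can be None.
    """
    # Lookbehinds keep "Billed Duration" and "Init Duration" from matching
    dur = re.search(r"(?<!Billed )(?<!Init )Duration: ([\d.]+) ms", line)
    init = re.search(r"Init Duration: ([\d.]+) ms", line)
    return {
        "duration_ms": float(dur.group(1)) if dur else None,
        "init_ms": float(init.group(1)) if init else None,
    }
```

Feed every REPORT line through this and a cold start is simply any record where `init_ms` is not None.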
Measuring Cold Starts (Do This First)
I can’t stress this enough: measure before you optimize. I’ve watched teams spend weeks implementing Provisioned Concurrency for functions that had a 0.3% cold start rate and 180ms init duration. Total waste of money.
If you’re following SRE practices for serverless architectures, cold start percentage and P99 init duration should be in your SLI definitions. Here’s how to get those numbers.
CloudWatch Logs Insights is your best friend. Every Lambda invocation produces a REPORT line, and cold starts include Init Duration. Select your function’s log group and run these:
Cold start percentage over time:
filter @type = "REPORT"
| stats sum(strcontains(@message, "Init Duration")) / count(*) * 100 as coldStartPct,
    avg(@duration) as avgDuration
    by bin(5m)
P50/P95/P99 of init durations:
filter @type = "REPORT" and @initDuration > 0
| stats avg(@initDuration) as avgInit,
    percentile(@initDuration, 50) as p50,
    percentile(@initDuration, 95) as p95,
    percentile(@initDuration, 99) as p99
    by bin(1h)
Find your worst offenders:
filter @type = "REPORT" and @initDuration > 0
| fields @requestId, @initDuration, @duration, @maxMemoryUsed / 1024 / 1024 as memMB
| sort @initDuration desc
| limit 25
That last one is gold for debugging. Grab the request ID, trace it through X-Ray, and you’ll see exactly where time went.
X-Ray tracing gives you a visual breakdown. Enable it:
aws lambda update-function-configuration \
    --function-name my-function \
    --tracing-config Mode=Active
Cold starts show up as an Initialization subsegment in the trace. Filter for traces that have it and you can see whether the latency is your code or the runtime.
Quick CLI test when you just want a number:
aws lambda invoke \
    --function-name my-function \
    --payload '{"test": true}' \
    --log-type Tail \
    --query 'LogResult' \
    --output text \
    --cli-binary-format raw-in-base64-out \
    response.json | base64 --decode | grep "Init Duration"
To force a cold start, flip an environment variable before invoking. Lambda creates a new environment every time the configuration changes.
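Scripted, the flip looks roughly like this. A sketch assuming boto3 and valid AWS credentials; the `COLD_START_NONCE` key is made up, any otherwise-unused variable works:

```python
import time

def with_fresh_nonce(env_vars):
    """Return a copy of the env var map with a changed throwaway value."""
    updated = dict(env_vars)
    updated["COLD_START_NONCE"] = str(time.time_ns())
    return updated

def force_cold_start(function_name):
    """Flip an env var so the next invoke lands on a fresh environment."""
    import boto3  # imported here so the pure helper above works without it
    lam = boto3.client("lambda")
    cfg = lam.get_function_configuration(FunctionName=function_name)
    env = cfg.get("Environment", {}).get("Variables", {})
    lam.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": with_fresh_nonce(env)},
    )
    # Wait for the update to finish, or the next invoke may hit a warm env
    lam.get_waiter("function_updated").wait(FunctionName=function_name)
```

Invoke with `--log-type Tail` right after and the REPORT line should carry an Init Duration.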
Provisioned Concurrency — The Expensive Fix
Provisioned Concurrency is expensive and most teams don’t need it. There, I said it.
It works by pre-initializing execution environments and keeping them warm. Your INIT code runs ahead of time, and when a request arrives it hits a warm environment with zero init latency. It’s effective. It’s also a running cost whether those environments handle traffic or not.
aws lambda put-provisioned-concurrency-config \
    --function-name my-function \
    --qualifier my-alias \
    --provisioned-concurrent-executions 10
I use it for exactly one scenario: user-facing APIs with strict P99 latency SLOs where cold starts would breach the target. That’s it. If you’re processing SQS messages or handling S3 event notifications, you don’t need this. Nobody cares if an async file processor takes an extra 400ms on the first invocation.
If you do use it, pair it with auto-scaling or you’ll either overpay or run out of warm environments during spikes:
aws application-autoscaling register-scalable-target \
    --service-namespace lambda \
    --resource-id function:my-function:my-alias \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --min-capacity 2 \
    --max-capacity 50

aws application-autoscaling put-scaling-policy \
    --service-namespace lambda \
    --resource-id function:my-function:my-alias \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --policy-name target-tracking \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration \
    '{"TargetValue": 0.7, "PredefinedMetricSpecification": {"PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"}}'
That keeps utilization around 70%, scaling up before you exhaust the warm pool. I’ve written about this pattern more broadly in designing scalable systems in AWS — the principle is the same whether you’re scaling EC2, ECS, or Lambda provisioned concurrency.
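The sizing math behind that 0.7 target is simple division: provisioned capacity is expected peak concurrency divided by target utilization. A quick helper (mine, not an AWS API):

```python
import math

def provisioned_for_target(expected_concurrency, target_utilization=0.7):
    """Environments needed to keep utilization at or below the target.

    round() guards against float noise at exact boundaries (e.g. 35 / 0.7).
    """
    return math.ceil(round(expected_concurrency / target_utilization, 9))

# 40 expected concurrent requests at a 0.7 target needs 58 warm environments
```

Run the number against your measured peak concurrency, not a guess, and make sure it fits inside your min/max capacity bounds.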
SnapStart — The One That Changed Everything for Java
Java on Lambda is pain unless you use SnapStart. I’m not being dramatic. A Spring Boot function cold-starting in 5+ seconds is unusable for synchronous APIs. SnapStart fixes this by snapshotting the initialized execution environment — memory, disk state, everything — when you publish a version. Future cold starts restore from the snapshot instead of running the full INIT.
I’ve seen Java cold starts drop from 5.2 seconds to under 400ms. That’s not a typo.
It’s available for Java 11+, Python 3.12+, and .NET 8+. For Python and .NET the gains are smaller since those runtimes already init fast, but it still helps if you’ve got heavy INIT code.
aws lambda create-function \
    --function-name my-function \
    --runtime python3.12 \
    --handler app.handler \
    --role arn:aws:iam::123456789012:role/lambda-role \
    --zip-file fileb://function.zip \
    --snap-start ApplyOn=PublishedVersions
Then publish a version to trigger the snapshot:
aws lambda publish-version --function-name my-function
A few gotchas that have bitten me:
- It doesn’t work with Provisioned Concurrency. Pick one.
- Network connections from INIT don’t survive the snapshot. You need to validate and re-establish them after restore.
- Any random values or UUIDs generated during INIT get baked into the snapshot. Every restored environment shares them. This is a real security concern if you’re generating tokens or nonces at module scope.
- No EFS support, and ephemeral storage must be 512MB or less.
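That uniqueness gotcha deserves a concrete example. Anything random computed at module scope is frozen into the snapshot; generate per-request values inside the handler instead (SnapStart also offers runtime hooks that fire after restore, which is the right place to re-seed long-lived state). A minimal sketch:

```python
import uuid

# Anti-pattern under SnapStart: this would run during INIT, get frozen into
# the snapshot, and every restored environment would share the same "secret".
# BAKED_TOKEN = uuid.uuid4().hex

def handler(event, context):
    # Safe: generated after restore, unique per invocation
    request_token = uuid.uuid4().hex
    return {"statusCode": 200, "token": request_token}
```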
For monitoring, SnapStart functions report Restore Duration instead of Init Duration:
filter @type = "REPORT" and @message like /Restore Duration/
| stats avg(@restoreDuration) as avgRestore,
    percentile(@restoreDuration, 99) as p99Restore
    by bin(1h)
If you’re building event-driven architecture patterns with Java Lambdas, SnapStart is basically mandatory now. The cold start tax without it is just too high.
The Free Stuff — Code-Level Optimizations
Before you spend money on Provisioned Concurrency, try the things that cost nothing. I’m consistently surprised by how much teams leave on the table here.
Shrink your deployment package. Every byte gets downloaded and extracted during INIT. I’ve seen functions shipping 80MB packages because someone ran pip install without cleaning up. Strip the junk:
pip install -r requirements.txt -t ./package --no-cache-dir
cd package && zip -r9 ../function.zip . -x "*.pyc" "*.dist-info/*" "*__pycache__*" "tests/*"
cd .. && zip -g function.zip app.py
A 5MB package initializes noticeably faster than a 50MB one. This isn’t theoretical — I’ve measured it.
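To see which imports eat your INIT time, `python -X importtime` prints a per-module breakdown, or you can time candidates directly. A rough sketch (boto3 is just an example of a typically heavy import):

```python
import importlib
import time

def time_import(module_name):
    """Wall-clock seconds to import a module. Only meaningful the first
    time in a process, since later imports hit sys.modules cache."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start

if __name__ == "__main__":
    for mod in ("json", "decimal", "boto3"):
        try:
            print(f"{mod}: {time_import(mod) * 1000:.1f} ms")
        except ImportError:
            print(f"{mod}: not installed")
```

Anything that shows up hot here is a candidate for removal, replacement, or lazy loading.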
Bump the memory. This is the single easiest optimization and people constantly overlook it. Lambda allocates CPU proportionally to memory. At 128MB you get a sliver of a vCPU. At 1769MB you get a full vCPU. More CPU means your INIT code runs faster. I’ve seen cold starts drop 40-60% just by going from 128MB to 512MB. The cost increase per invocation is often negligible because the function finishes faster too. Use Lambda Power Tuning to find the sweet spot.
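The cost argument is easy to sanity-check. Lambda bills compute in GB-seconds, so when extra memory cuts duration, the bill barely moves. A back-of-envelope sketch (the rate below is the published x86 per-GB-second price at the time of writing; treat it as an assumption and check current pricing):

```python
# Assumed rate: published x86 price per GB-second at time of writing
PRICE_PER_GB_SECOND = 0.0000166667

def compute_cost(memory_mb, duration_ms, rate=PRICE_PER_GB_SECOND):
    """Per-invocation compute cost in USD (request fee excluded)."""
    return (memory_mb / 1024) * (duration_ms / 1000) * rate

# 128 MB for 2000 ms is 0.25 GB-s; 512 MB for 600 ms is 0.30 GB-s:
# 4x the memory, roughly 20% more compute cost, and 3x faster
```

That 20% buys you a function that cold-starts and runs in a fraction of the time, which is usually a trade worth making.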
Lazy-load what you can. If some code path only fires for certain event types, don’t initialize it eagerly:
import boto3
import os

_table = None

def _get_table():
    global _table
    if _table is None:
        dynamodb = boto3.resource("dynamodb")
        _table = dynamodb.Table(os.environ["TABLE_NAME"])
    return _table

def handler(event, context):
    if event.get("source") == "warmup":
        return {"statusCode": 200}
    return _get_table().get_item(Key={"pk": event["id"]})
Use this selectively though. For your main code path, eager initialization at module scope is still better — it runs once and every subsequent invocation benefits.
Reuse connections. SDK clients and database connections created during INIT persist across invocations on the same execution environment. Always create them at module scope:
import boto3
import psycopg2
import os

# Created once during INIT, reused on every invocation of this environment
s3 = boto3.client("s3")
conn = psycopg2.connect(
    host=os.environ["DB_HOST"],
    dbname=os.environ["DB_NAME"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    connect_timeout=5,
)

def handler(event, context):
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM items WHERE id = %s", (event["id"],))
        return cur.fetchone()
The execution environment is your connection pool. This is fundamental to serverless architecture patterns and I’m still amazed when I see teams creating new boto3 clients inside the handler on every invocation.
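One refinement worth adding, especially with SnapStart or long idle gaps between invocations: validate the cached connection and rebuild it when it has gone stale. A generic sketch; `connect` is any zero-argument factory you supply (for example a wrapper around psycopg2.connect), and the names are illustrative:

```python
def ensure_conn(conn, connect):
    """Return a usable connection, rebuilding it if the cached one is stale."""
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")  # cheap liveness probe
        return conn
    except Exception:
        # Idle timeouts, DB failovers, and SnapStart restores all land here
        return connect()
```

In the handler, swap the cached connection with `conn = ensure_conn(conn, make_conn)` before the real query. The probe costs one round trip, which is cheap insurance against a mid-request reconnect.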
Architecture Decisions That Affect Cold Starts
Your function doesn’t exist in a vacuum. Some architectural choices make cold starts worse before you write a single line of handler code.
VPC attachment. If your function doesn’t need to talk to resources inside a VPC, don’t attach it. Simple as that. The Hyperplane improvements from 2019 made VPC cold starts tolerable, but “tolerable” still means extra latency you don’t need. I’ve seen teams put functions in a VPC “for security” when the function only calls public AWS APIs. That’s not security, that’s overhead.
Runtime choice matters. Python and Node.js cold-start in 150-400ms typically. Java and .NET without SnapStart are 2-5x slower. If you’re starting a new project and cold start latency is a concern, pick Python or Node.js. I know that’s an unpopular opinion with the Java crowd, but the numbers don’t lie.
Keep functions focused. A function with 30 imports and 4 SDK clients cold-starts slower than one with 3 imports and 1 client. When you create an AWS Lambda in Terraform, keep each function single-purpose with minimal dependencies. Monolithic Lambda functions are an anti-pattern for multiple reasons, and cold start performance is one of them.
Stop using warming hacks. I still see teams running scheduled EventBridge rules to ping their functions every 5 minutes. This doesn’t scale. It keeps one environment warm. If you get 10 concurrent requests, 9 of them still cold-start. Provisioned Concurrency and SnapStart exist specifically because warming hacks don’t work. Let them go.
Picking the Right Strategy
Here’s my honest decision framework after running Lambda in production for years:
| Situation | What I’d do |
|---|---|
| Sync API, strict P99 SLO | Provisioned Concurrency with auto-scaling |
| Java anything | SnapStart, full stop |
| Python/Node, moderate latency tolerance | Code optimizations + bump memory |
| Async processing (SQS, S3 triggers, etc.) | Nothing. Accept the cold starts. |
| Unpredictable spiky traffic | SnapStart + code optimizations |
| Steady high-throughput API | Provisioned Concurrency, static allocation |
Most of the time? Code optimizations and memory tuning get you 80% of the way there. I’ve worked on dozens of Lambda-based systems and only a handful genuinely needed Provisioned Concurrency. SnapStart is the middle ground — meaningful improvement, minimal cost — and I’d evaluate it before reaching for the expensive option.
Where This Fits in the Bigger Picture
Cold starts are one variable in a much larger system design problem. A 300ms cold start doesn’t matter if your downstream database query takes 2 seconds. Optimize the bottleneck that actually affects your users.
If you’re building production serverless systems, cold start optimization is just one piece. You need solid SRE practices for serverless architectures to know when things degrade, thoughtful event-driven architecture patterns to decouple components, and a clear understanding of designing scalable systems in AWS so the whole thing holds together under load.
Measure first. Optimize what matters. Don’t spend money solving problems you don’t have.