I got a call from a startup founder last year. “Our AWS bill just hit $47,000 and we have twelve engineers.” They’d been running for about eighteen months, never really looked at the bill, and suddenly it was eating their runway. I spent a week inside their account. We cut it to $28,000. That’s a 40% reduction, and honestly most of it was embarrassingly obvious stuff.

That experience crystallized something I’d been thinking about for a while: most AWS cost problems aren’t sophisticated. They’re neglect. People provision things, forget about them, and the meter keeps running. The fixes aren’t glamorous either — they’re methodical, sometimes tedious, and they work.

Here are fifteen techniques I’ve used repeatedly. Not theory. Not “consider doing X.” Actual things, with actual numbers.


1. Right-Size Your EC2 Instances

This is always where I start. Always. In that startup engagement I mentioned, we found 23 instances running m5.2xlarge that averaged 8% CPU utilization. Twenty-three. They’d picked the instance size during a load test six months earlier and never revisited it.

Pull your CloudWatch metrics:

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --start-time 2026-01-01T00:00:00Z \
  --end-time 2026-02-01T00:00:00Z \
  --period 86400 \
  --statistics Average Maximum

If your average CPU is under 20% and your max is under 50%, you’re oversized. Drop a size. An m5.xlarge costs half what an m5.2xlarge does — in us-east-1 that’s roughly $140/month back per instance ($0.384/hour versus $0.192/hour). Multiply by 23 instances and you’re saving over $3,200/month just from this one change.
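Turned into a quick screen, that rule looks like this — a sketch only, using the thresholds above with hypothetical utilization numbers:

```shell
# Flags an instance as a downsize candidate using the thresholds above:
# average CPU under 20% AND max under 50%. Inputs are CloudWatch percentages.
is_oversized() {
  avg=$1; max=$2
  awk -v a="$avg" -v m="$max" 'BEGIN {
    print ((a < 20 && m < 50) ? "downsize" : "keep")
  }'
}

is_oversized 8 31    # the kind of numbers that m5.2xlarge fleet showed
is_oversized 45 85   # a genuinely busy box
```

Feed it the Average and Maximum values from the get-metric-statistics call above.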

AWS Compute Optimizer will actually tell you this for free. Turn it on:

aws compute-optimizer update-enrollment-status --status Active

Then check recommendations a few days later. It’s not perfect, but it catches the obvious stuff.


2. Savings Plans Over Reserved Instances

I’m going to be opinionated here: Savings Plans are almost always better than Reserved Instances now. RIs lock you to a specific instance family in a specific region. Savings Plans (Compute flavor) cover EC2, Fargate, and Lambda across any region and any instance family.

The discount is comparable — 30-40% for a 1-year no-upfront commitment, up to 66% for a 3-year all-upfront Compute plan (EC2 Instance Savings Plans reach 72%, but they give up most of the flexibility). And the flexibility is night and day.

Look at your last 30 days of spend:

aws ce get-cost-and-usage \
  --time-period Start=2026-01-10,End=2026-02-10 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

Find your steady-state compute baseline and commit to that. Don’t commit to your peak — commit to your floor. If you’re spending $10,000/month on EC2 and your minimum is around $6,000, buy a Savings Plan for $6,000. Let the rest ride on-demand or Spot.
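One wrinkle worth knowing: Savings Plans are purchased as an hourly commitment in discounted dollars, not as a monthly on-demand figure. A rough sizing sketch, assuming the ~30% one-year no-upfront discount mentioned above:

```shell
# Convert a monthly on-demand floor into an hourly Savings Plan commitment.
# usage: sp_hourly_commit <monthly-floor-usd> <expected-discount>
sp_hourly_commit() {
  awk -v f="$1" -v d="$2" 'BEGIN {
    printf "%.2f\n", f * (1 - d) / 730   # 730 ~ hours in a month
  }'
}

sp_hourly_commit 6000 0.30   # the $6,000 floor from above
```

For that $6,000 floor it works out to about $5.75/hour of commitment.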


3. Spot Instances for Stateless Workloads

Spot pricing is 60-90% off on-demand. That’s not a typo. I’ve run production batch processing on Spot for years and the interruption rate on a diversified fleet is genuinely low — under 5% in most regions for most instance types.

The trick is diversification. Don’t request a single instance type. Use a mixed fleet:

aws ec2 run-instances \
  --instance-market-options '{"MarketType":"spot","SpotOptions":{"SpotInstanceType":"persistent","InstanceInterruptionBehavior":"stop"}}' \
  --instance-type m5.large \
  --min-count 1 --max-count 1 \
  --image-id ami-0abcdef1234567890

Better yet, use an EC2 Fleet or Auto Scaling group with mixed instance policies. Spread across m5.large, m5a.large, m5d.large, m4.large — whatever’s cheapest at the moment. Your application shouldn’t care which one it lands on.
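Here’s what that looks like as an Auto Scaling group — a sketch that assumes a launch template named batch-template already exists; the group name, subnets, and sizing are all placeholders:

```shell
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name batch-workers \
  --min-size 1 --max-size 20 \
  --vpc-zone-identifier "subnet-0abc123,subnet-0def456" \
  --mixed-instances-policy '{
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {"LaunchTemplateName": "batch-template", "Version": "$Latest"},
      "Overrides": [
        {"InstanceType": "m5.large"}, {"InstanceType": "m5a.large"},
        {"InstanceType": "m5d.large"}, {"InstanceType": "m4.large"}
      ]
    },
    "InstancesDistribution": {
      "OnDemandBaseCapacity": 0,
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "price-capacity-optimized"
    }
  }'
```

The price-capacity-optimized strategy tells AWS to pick pools that are both cheap and unlikely to be interrupted; the all-Spot distribution here is for batch work only.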

For anything stateful or latency-sensitive, don’t bother. Spot is for batch jobs, CI/CD runners, dev environments, and horizontally scaled web tiers with proper health checks.


4. Kill Zombie Resources

This was the single biggest win at that startup. Unattached EBS volumes. Idle load balancers. Forgotten RDS snapshots from a migration two years ago. Elastic IPs sitting there doing nothing (those cost $3.65/month each now — AWS started charging for all public IPv4 addresses, attached or not, in February 2024).

Find unattached EBS volumes:

aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Type:VolumeType}' \
  --output table

Find idle load balancers (no healthy targets):

aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123

We found $3,200/month in zombie resources at that startup. Just sitting there. Burning money. Nobody knew they existed.

I wrote about a similar cleanup in a case study on cutting costs 30% — different client, same pattern.


5. S3 Lifecycle Policies

S3 Standard costs $0.023/GB/month. S3 Glacier Instant Retrieval costs $0.004/GB/month. S3 Glacier Deep Archive costs $0.00099/GB/month. If you’ve got terabytes of logs or old data sitting in Standard, you’re paying 5-23x more than you need to.

Set up lifecycle rules:

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-data-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-data",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER_IR"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ],
      "Expiration": {"Days": 730}
    }]
  }'

One client had 14TB of CloudTrail logs in S3 Standard. Fourteen terabytes. That’s $322/month. After lifecycle policies, it dropped to about $40/month. Not life-changing money, but it adds up — and it’s a set-and-forget fix.

Also check S3 Intelligent-Tiering if your access patterns are unpredictable. There’s a small monitoring fee ($0.0025/1000 objects) but it automatically moves data between tiers.


6. NAT Gateway Costs Are a Silent Killer

NAT Gateways charge $0.045/hour ($32/month) plus $0.045/GB of data processed. I’ve seen accounts where NAT Gateway data processing fees were the third-largest line item and nobody had any idea.

Check yours:

# usage types carry a region prefix — USE1 is us-east-1; adjust for yours
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-02-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --filter '{"Dimensions":{"Key":"USAGE_TYPE","Values":["USE1-NatGateway-Bytes"]}}' \
  --group-by Type=DIMENSION,Key=USAGE_TYPE

Fixes: use VPC endpoints for S3 and DynamoDB (they’re free for Gateway endpoints), consolidate NAT Gateways where possible, and move chatty services into public subnets if they don’t need to be private. I know that last one makes security folks twitch, but a dev environment doesn’t need the same network topology as production.

Gateway VPC endpoint for S3 — takes thirty seconds:

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc123 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0abc123

At that startup, S3 traffic through the NAT Gateway was costing them $800/month. The VPC endpoint made it zero.


7. Use Graviton Instances

ARM-based Graviton instances (m7g, c7g, r7g families) are 20% cheaper than their x86 equivalents and deliver better performance for most workloads. If your code runs on Linux and you’re not using some weird x86-only binary dependency, there’s almost no reason not to switch.

I’ve migrated dozens of workloads to Graviton. The only ones that caused trouble were apps with native x86 compiled dependencies — some older Java JNI libraries, a couple of Python packages with C extensions that didn’t have ARM wheels. Everything else just worked.

For containerized workloads, build multi-arch images and you can run on either. The savings are real — a c7g.xlarge is about $0.145/hour versus $0.1785/hour for a c7i.xlarge in us-east-1. That’s roughly 19% off, forever, with no commitment required.
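For the multi-arch image route, this is the usual shape — the registry and image name are placeholders, and it assumes Docker with the buildx plugin installed:

```shell
# One-time: create a builder that can target multiple architectures
docker buildx create --name multiarch --use

# Build and push a single tag that runs on both x86 and Graviton nodes
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag registry.example.com/my-app:latest \
  --push .
```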


8. RDS Right-Sizing and Aurora Serverless

RDS instances are often the most oversized resources in an account. People pick db.r5.2xlarge “just in case” and the database sits at 15% CPU for months.

Same drill as EC2 — check CloudWatch, drop a size. But also consider Aurora Serverless v2 for variable workloads. It scales from 0.5 ACU to whatever you set as max, and you pay per ACU-hour. For databases that are busy during business hours and idle at night, the savings can be 40-60% versus a fixed-size instance.

Also: stop your dev and staging RDS instances outside business hours. Seriously. A db.r6g.xlarge costs $0.48/hour. Running it 24/7 is $350/month. Running it 10 hours a day on weekdays is $100/month.

aws rds stop-db-instance --db-instance-identifier my-dev-db

Automate it with EventBridge and Lambda. I covered patterns like this in my post on designing scalable systems on AWS.
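If you’d rather skip the Lambda entirely, EventBridge Scheduler can call the RDS StopDBInstance API directly through its universal targets — a sketch with the role ARN and schedule name as placeholders (check the universal-target docs for the exact parameter casing):

```shell
aws scheduler create-schedule \
  --name stop-dev-db-nightly \
  --schedule-expression 'cron(0 20 ? * MON-FRI *)' \
  --flexible-time-window '{"Mode": "OFF"}' \
  --target '{
    "Arn": "arn:aws:scheduler:::aws-sdk:rds:stopDBInstance",
    "RoleArn": "arn:aws:iam::123456789012:role/scheduler-rds-role",
    "Input": "{\"DbInstanceIdentifier\": \"my-dev-db\"}"
  }'
```

A mirror schedule calling startDBInstance in the morning completes the loop.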


9. CloudWatch Log Retention

By default, CloudWatch Logs never expire. Never. I’ve seen accounts with years of debug-level logs sitting in CloudWatch at $0.03/GB/month for storage.

Set retention on every log group:

aws logs put-retention-policy \
  --log-group-name /aws/lambda/my-function \
  --retention-in-days 30

Want to do them all at once? Quick script:

for lg in $(aws logs describe-log-groups --query 'logGroups[?!retentionInDays].logGroupName' --output text); do
  aws logs put-retention-policy --log-group-name "$lg" --retention-in-days 30
  echo "Set 30-day retention on $lg"
done

If you need logs longer than 30 days, export them to S3 first and apply lifecycle policies there. CloudWatch storage is $0.03/GB/month versus $0.023 for S3 Standard — a modest gap on its own — but once lifecycle rules push the data into Glacier tiers you’re paying roughly 7x less, and about 30x less in Deep Archive.


10. Data Transfer Awareness

Data transfer is the hidden tax of AWS. Transfer between AZs costs $0.01/GB each way. Transfer out to the internet is $0.09/GB for the first 10TB. It doesn’t sound like much until you’re moving terabytes.

A few things that help:

  • Keep communicating services in the same AZ when possible (use AZ-aware routing)
  • Use CloudFront for content delivery — transfer from S3 to CloudFront is free, and CloudFront-to-internet is cheaper than direct
  • Compress everything — API responses, log shipments, backups
  • Use VPC endpoints (Interface endpoints for most services, Gateway endpoints for S3/DynamoDB)
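A quick back-of-envelope helper for those rates — pure arithmetic, nothing AWS-specific:

```shell
# usage: transfer_cost <GB> <rate-per-GB>
transfer_cost() {
  awk -v g="$1" -v r="$2" 'BEGIN { printf "$%.2f\n", g * r }'
}

transfer_cost 5000 0.02   # 5 TB crossing AZs ($0.01/GB each way)
transfer_cost 5000 0.09   # 5 TB out to the internet
```

Five terabytes a month between AZs is $100; the same volume out to the internet is $450. Scale that to your actual traffic before deciding what’s worth fixing.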

I worked with a video processing company that was spending $4,000/month on data transfer alone. We put CloudFront in front of their delivery pipeline and cut it to $1,100. My post on cloud cost optimization strategies goes deeper on this.


11. Lambda Right-Sizing with Power Tuning

Lambda bills by memory-milliseconds. More memory means faster execution, which sometimes means lower cost. The relationship isn’t linear and it’s different for every function.

Use the AWS Lambda Power Tuning tool. It runs your function at different memory settings and shows you the cost-performance curve. I’ve seen functions where bumping memory from 128MB to 512MB cut execution time by 80% and reduced cost by 60%.

Also: watch out for over-provisioned functions. A function that runs for 50ms at 1024MB doesn’t need 3008MB. Check your actual memory usage in CloudWatch:

aws logs filter-log-events \
  --log-group-name /aws/lambda/my-function \
  --filter-pattern "REPORT" \
  --limit 10

The REPORT lines show Max Memory Used versus Memory Size. If there’s a big gap, you’re overpaying.
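That gap check is easy to script. The REPORT lines below are made up for illustration; in practice, pipe in the output of the filter-log-events call above:

```shell
# Flags invocations that used less than half their configured memory.
flag_overprovisioned() {
  awk '/REPORT/ {
    for (i = 1; i <= NF; i++) {
      if ($i == "Memory" && $(i+1) == "Size:") size = $(i+2)
      if ($i == "Max" && $(i+1) == "Memory")   used = $(i+3)
    }
    verdict = (used / size < 0.5) ? "overprovisioned" : "sized ok"
    printf "size=%sMB used=%sMB -> %s\n", size, used, verdict
  }' "$@"
}

cat <<'EOF' > /tmp/reports.txt
REPORT RequestId: 1a2b Duration: 52.1 ms Billed Duration: 53 ms Memory Size: 1024 MB Max Memory Used: 98 MB
REPORT RequestId: 3c4d Duration: 410.7 ms Billed Duration: 411 ms Memory Size: 512 MB Max Memory Used: 487 MB
EOF

flag_overprovisioned /tmp/reports.txt
```

The 50% cutoff is my own rule of thumb, not an AWS recommendation — tune it to taste.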


12. Use Cost Allocation Tags Religiously

You can’t optimize what you can’t see. Tag everything — every EC2 instance, every RDS database, every S3 bucket — with at minimum: Environment, Team, and Project.

Then activate those tags for billing:

aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status '[{"TagKey":"Environment","Status":"Active"},{"TagKey":"Team","Status":"Active"}]'

Once tags are active in Cost Explorer, you can answer questions like “how much is the data team’s dev environment costing us?” Without tags, you’re guessing. With tags, you’re managing.

This is foundational FinOps practice. Not exciting, but it makes everything else possible.


13. Reserved Capacity for DynamoDB

If you’ve got DynamoDB tables with predictable traffic, switch from on-demand to provisioned capacity with auto-scaling. On-demand is convenient, but per unit of fully utilized capacity it runs several times more expensive than provisioned — check current pricing for the exact multiple, since AWS has cut on-demand rates in recent years.

For tables with very stable baselines, buy DynamoDB Reserved Capacity. It’s a 1-year or 3-year commitment, similar to EC2 RIs, and the discount is significant — up to 77% for a 3-year term.

Check your table’s consumed capacity over time:

aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ConsumedReadCapacityUnits \
  --dimensions Name=TableName,Value=my-table \
  --start-time 2026-01-01T00:00:00Z \
  --end-time 2026-02-01T00:00:00Z \
  --period 3600 \
  --statistics Average Maximum

If the pattern is flat and predictable, provisioned with reserved capacity will save you a lot. If it’s spiky and unpredictable, stay on-demand — the premium is worth the flexibility.


14. Automate Non-Production Shutdowns

Dev and staging environments don’t need to run at 3 AM on a Sunday. I’m amazed how often I find full production-mirror environments running 24/7 for teams that work 9-to-5 Monday through Friday.

The math: running non-prod 50 hours/week instead of 168 saves 70%. If your non-prod environment costs $5,000/month, that’s $3,500 back.

Use AWS Instance Scheduler or a simple EventBridge + Lambda combo. For ECS services, scale desired count to zero. For EC2, stop instances. For RDS, stop the database.
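The EC2 half of that can be a one-liner on a schedule — a sketch assuming your dev instances carry an Environment=dev tag (swap in whatever tag scheme you actually use):

```shell
# Stop every running instance tagged Environment=dev.
aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=dev" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId' \
  --output text |
  xargs -r aws ec2 stop-instances --instance-ids
```

Run it from cron, an EventBridge-triggered Lambda, or whatever scheduler you already have; a mirror script with start-instances brings things back in the morning.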

We’ve built exactly this kind of automation for startups scaling on the cloud — it’s one of the first things I recommend to any team burning through runway.


15. Set Up AWS Budgets and Anomaly Detection

This isn’t a cost-cutting technique per se — it’s a cost-not-exploding technique. Set budgets with alerts:

aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-total",
    "BudgetLimit": {"Amount": "30000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {"NotificationType":"ACTUAL","ComparisonOperator":"GREATER_THAN","Threshold":80},
    "Subscribers": [{"SubscriptionType":"EMAIL","Address":"<email>"}]
  }]'

Also enable Cost Anomaly Detection — it uses ML to spot unusual spending patterns and alerts you before a runaway process racks up a five-figure bill over a weekend. It’s free, so there’s no reason not to have it on. One gotcha: the monitor alone doesn’t notify anyone — pair it with an anomaly subscription (create-anomaly-subscription) listing who gets the alerts.

aws ce create-anomaly-monitor \
  --anomaly-monitor '{"MonitorName":"account-monitor","MonitorType":"DIMENSIONAL","MonitorDimension":"SERVICE"}'

Putting It All Together

That startup I mentioned at the top? Here’s roughly where the $19,000/month in savings came from:

  • Right-sizing EC2: $3,800
  • Zombie resources: $3,200
  • Savings Plans: $4,500
  • NAT Gateway optimization: $800
  • S3 lifecycle policies: $1,200
  • Non-prod shutdowns: $2,800
  • CloudWatch log retention: $600
  • Misc (data transfer, Lambda tuning): $2,100

None of it was rocket science. It was a week of methodical work — pulling metrics, checking utilization, deleting things that shouldn’t exist, and committing to discounts on things that should.

The hardest part wasn’t technical. It was convincing the team to actually look at the bill regularly. Cost optimization isn’t a project — it’s a practice. You do it once, you save money. You do it continuously, you keep saving money.

If you’re staring at an AWS bill that feels too high, start with techniques 1, 4, and 5. Right-size, kill zombies, lifecycle your storage. You’ll probably find 20-30% savings in a day. Then work through the rest of the list over the following weeks.

The money is there. You just have to go get it.