1. Establish Visibility and Baseline
Before optimizing, you need comprehensive visibility into your cloud resources and spending patterns.
Implementation Steps:
- Deploy comprehensive monitoring
  - Enable detailed billing data
  - Implement resource tagging strategy
  - Set up monitoring dashboards
- Establish cost allocation
  - Tag resources by department, project, environment (see the tagging sketch after the AWS example below)
  - Implement showback or chargeback mechanisms
  - Create accountability for cloud spending
- Define KPIs and metrics
  - Cost per service/application
  - Utilization percentages
  - Cost vs. business metrics (cost per transaction; see the sketch after the Azure example below)
AWS Implementation Example:
# Enable AWS Cost and Usage Reports
aws cur put-report-definition \
--region us-east-1 \
--report-definition '{
  "ReportName": "DetailedBillingReport",
  "TimeUnit": "HOURLY",
  "Format": "textORcsv",
  "Compression": "GZIP",
  "AdditionalSchemaElements": ["RESOURCES"],
  "S3Bucket": "cost-reports-bucket",
  "S3Prefix": "reports",
  "S3Region": "us-east-1",
  "AdditionalArtifacts": ["REDSHIFT", "QUICKSIGHT"]
}'
# Create a CloudWatch dashboard for cost monitoring
aws cloudwatch put-dashboard \
--dashboard-name "CostMonitoring" \
--dashboard-body file://cost-dashboard.json
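The tagging and cost-allocation steps listed above can be scripted as well. A minimal boto3 sketch that applies department/project/environment tags to a resource and then activates those keys as cost allocation tags; the ARN and tag keys are placeholders, and activation only succeeds once the keys have appeared in your billing data:
import boto3

tagging = boto3.client('resourcegroupstaggingapi')
ce = boto3.client('ce')

# Apply cost-allocation tags to a resource (the ARN is a placeholder)
tagging.tag_resources(
    ResourceARNList=['arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0'],
    Tags={'Department': 'Finance', 'Project': 'Payments', 'Environment': 'prod'}
)

# Activate the keys as cost allocation tags so they appear in Cost Explorer
# and the Cost and Usage Report (keys must already exist in billing data)
ce.update_cost_allocation_tags_status(
    CostAllocationTagsStatus=[
        {'TagKey': 'Department', 'Status': 'Active'},
        {'TagKey': 'Project', 'Status': 'Active'},
        {'TagKey': 'Environment', 'Status': 'Active'},
    ]
)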
Azure Implementation Example:
# Enable Azure Cost Management exports
az costmanagement export create \
--name "DailyCostExport" \
--scope "subscriptions/00000000-0000-0000-0000-000000000000" \
--storage-account-id "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/cost-management/providers/Microsoft.Storage/storageAccounts/costexports" \
--storage-container "exports" \
--timeframe MonthToDate \
--recurrence Daily \
--recurrence-period from="2025-03-01T00:00:00Z" to="2025-12-31T00:00:00Z" \
--schedule-status Active \
--type ActualCost
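For the cost-per-transaction KPI, Cost Explorer can supply the numerator; the transaction count has to come from your own analytics. A minimal boto3 sketch, with the tag key, project name, dates, and transaction volume as placeholders:
import boto3

ce = boto3.client('ce')

# Pull last month's cost for one application, filtered by a Project tag
result = ce.get_cost_and_usage(
    TimePeriod={'Start': '2025-02-01', 'End': '2025-03-01'},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    Filter={'Tags': {'Key': 'Project', 'Values': ['Payments']}},
)
cost = float(result['ResultsByTime'][0]['Total']['UnblendedCost']['Amount'])

# Transaction volume comes from your own analytics; hard-coded here for illustration
transactions = 1_250_000
print(f"Cost per transaction: ${cost / transactions:.6f}")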
2. Identify and Eliminate Idle Resources
Idle resources are the low-hanging fruit of cloud waste reduction.
Implementation Steps:
- Set utilization thresholds
  - Define what constitutes “idle” (e.g., <5% CPU for 7 days)
  - Consider different thresholds for different resource types
- Create regular reports
  - Schedule automated scans for idle resources
  - Generate actionable reports with resource details
- Implement automated remediation
  - Automatically stop or terminate idle resources
  - Implement approval workflows for production resources (see the notification sketch after the Python example below)
AWS Implementation Example:
# Python script using boto3 to identify idle EC2 instances
import boto3
import datetime

cloudwatch = boto3.client('cloudwatch')
ec2 = boto3.client('ec2')

# Get all running instances
instances = ec2.describe_instances(
    Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
)

for reservation in instances['Reservations']:
    for instance in reservation['Instances']:
        instance_id = instance['InstanceId']

        # Get CPU utilization for the past 14 days
        response = cloudwatch.get_metric_statistics(
            Namespace='AWS/EC2',
            MetricName='CPUUtilization',
            Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
            StartTime=datetime.datetime.utcnow() - datetime.timedelta(days=14),
            EndTime=datetime.datetime.utcnow(),
            Period=86400,  # 1 day
            Statistics=['Average']
        )

        # Check if the instance is idle (average CPU < 5% for every datapoint)
        if response['Datapoints'] and all(dp['Average'] < 5.0 for dp in response['Datapoints']):
            print(f"Idle instance detected: {instance_id}")

            # Tag the instance for review
            ec2.create_tags(
                Resources=[instance_id],
                Tags=[{'Key': 'Status', 'Value': 'Idle-Scheduled-For-Review'}]
            )

            # Optionally stop the instance (with appropriate approvals)
            # ec2.stop_instances(InstanceIds=[instance_id])
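For production resources, the approval-workflow step usually starts as notification rather than automatic action. A minimal sketch that publishes findings to an SNS topic for human review; the topic ARN is a placeholder:
import boto3

sns = boto3.client('sns')

def notify_for_review(idle_instance_ids):
    # Route findings to a review channel (an email or chat subscription on the topic)
    # instead of stopping production resources automatically
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789012:idle-resource-review',
        Subject='Idle EC2 instances pending review',
        Message='The following instances look idle:\n' + '\n'.join(idle_instance_ids),
    )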
GCP Implementation Example:
# Surface idle VM recommendations (backed by Cloud Monitoring utilization data)
# via the Recommender API; requires recommender.googleapis.com to be enabled
for zone in $(gcloud compute zones list --format="value(name)"); do
  gcloud recommender recommendations list \
    --location="$zone" \
    --recommender=google.compute.instance.IdleResourceRecommender \
    --format="table(name.basename(), description, stateInfo.state)"
done

# Label a flagged instance so owners can review it before any stop or terminate action
gcloud compute instances add-labels INSTANCE_NAME \
  --zone ZONE \
  --labels=status=idle-review-required
3. Implement Rightsizing Recommendations
Rightsizing ensures your resources match your actual needs, eliminating waste from overprovisioning.
Implementation Steps:
- Collect performance data
  - Monitor CPU, memory, network, and disk usage
  - Gather data over meaningful time periods (2-4 weeks minimum)
  - Consider peak usage and patterns
- Generate rightsizing recommendations
  - Use cloud provider tools or third-party solutions
  - Consider performance requirements and constraints
  - Calculate potential savings
- Implement and validate
  - Apply recommendations in phases
  - Monitor performance after changes
  - Document savings achieved
AWS Implementation Example:
# Use AWS Compute Optimizer for rightsizing recommendations
aws compute-optimizer get-ec2-instance-recommendations \
--instance-arns arn:aws:ec2:us-west-2:123456789012:instance/i-0e9801d129EXAMPLE
# Export all recommendations to S3
aws compute-optimizer export-ec2-instance-recommendations \
--s3-destination-config bucket=my-bucket,keyPrefix=compute-optimizer/ec2
Azure Implementation Example:
# Get Azure Advisor recommendations for VM rightsizing
az advisor recommendation list --category Cost | \
jq '.[] | select(.shortDescription.solution | ascii_downcase | contains("right-size"))'
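GCP surfaces comparable rightsizing guidance through the Recommender API's machine type recommender. A minimal sketch using the google-cloud-recommender Python client; the project ID and zone are placeholders:
from google.cloud import recommender_v1

client = recommender_v1.RecommenderClient()

# VM rightsizing recommendations are produced per zone by the machine type recommender
parent = (
    "projects/my-project/locations/us-central1-a/"
    "recommenders/google.compute.instance.MachineTypeRecommender"
)
for rec in client.list_recommendations(parent=parent):
    # Each recommendation describes a suggested machine type change and its impact
    print(rec.name)
    print(" ", rec.description)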
4. Optimize Storage Costs
Storage often represents a significant portion of cloud waste due to its persistent nature.
Implementation Steps:
- Identify storage waste
  - Unattached volumes
  - Oversized volumes with low utilization
  - Redundant snapshots (see the snapshot report sketch after the AWS example below)
  - Obsolete backups
- Implement lifecycle policies
  - Automate transition to lower-cost tiers
  - Set retention policies for backups and snapshots
  - Delete unnecessary data automatically
- Optimize storage classes
  - Match storage class to access patterns
  - Use infrequent access or archive storage where appropriate
  - Implement compression where beneficial
AWS Implementation Example:
# Find unattached EBS volumes
aws ec2 describe-volumes \
--filters Name=status,Values=available \
--query 'Volumes[*].{ID:VolumeId,Size:Size,Type:VolumeType,Created:CreateTime}' \
--output table
# Create S3 lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
--bucket my-bucket \
--lifecycle-configuration file://lifecycle-config.json
lifecycle-config.json:
{
"Rules": [
{
"ID": "Move to Glacier after 90 days",
"Status": "Enabled",
"Filter": {
"Prefix": "logs/"
},
"Transitions": [
{
"Days": 30,
"StorageClass": "STANDARD_IA"
},
{
"Days": 90,
"StorageClass": "GLACIER"
}
],
"Expiration": {
"Days": 365
}
}
]
}
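For the snapshot and backup items above, a simple age report is often enough to drive cleanup. A minimal boto3 sketch; the 90-day threshold is an arbitrary example:
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client('ec2')
cutoff = datetime.now(timezone.utc) - timedelta(days=90)

# List snapshots owned by this account that are older than the cutoff
paginator = ec2.get_paginator('describe_snapshots')
for page in paginator.paginate(OwnerIds=['self']):
    for snap in page['Snapshots']:
        if snap['StartTime'] < cutoff:
            print(f"{snap['SnapshotId']}  {snap['StartTime']:%Y-%m-%d}  {snap['VolumeSize']} GiB")
            # After review, delete with:
            # ec2.delete_snapshot(SnapshotId=snap['SnapshotId'])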
GCP Implementation Example:
# Find unattached persistent disks
gcloud compute disks list --filter="NOT users:*" --format="table(name,zone,sizeGb,status)"
# Create Object Lifecycle Management policy
cat > lifecycle.json << EOF
{
"lifecycle": {
"rule": [
{
"action": {
"type": "SetStorageClass",
"storageClass": "NEARLINE"
},
"condition": {
"age": 30,
"matchesPrefix": ["logs/"]
}
},
{
"action": {
"type": "SetStorageClass",
"storageClass": "COLDLINE"
},
"condition": {
"age": 90,
"matchesPrefix": ["logs/"]
}
},
{
"action": {
"type": "Delete"
},
"condition": {
"age": 365,
"matchesPrefix": ["logs/"]
}
}
]
}
}
EOF
gsutil lifecycle set lifecycle.json gs://my-bucket
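Azure offers the same quick win for unattached managed disks. A minimal sketch using the azure-mgmt-compute Python client; the subscription ID is a placeholder:
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Subscription ID is a placeholder
compute = ComputeManagementClient(
    DefaultAzureCredential(), "00000000-0000-0000-0000-000000000000"
)

# Managed disks not attached to any VM are pure waste until reviewed or deleted
for disk in compute.disks.list():
    if disk.disk_state == "Unattached":
        print(f"{disk.name}  {disk.disk_size_gb} GiB  {disk.location}")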
5. Implement Scheduling for Non-Production Resources
Development, testing, and staging environments often run 24/7 despite only being used during business hours.
Implementation Steps:
- Identify scheduling candidates
  - Development and test environments
  - Demo and training environments
  - Batch processing resources
- Define scheduling policies
  - Business hours only (e.g., 8 AM - 6 PM weekdays)
  - Custom schedules based on usage patterns
  - On-demand scheduling with automation
- Implement automated scheduling
  - Use cloud provider native tools
  - Consider third-party scheduling solutions
  - Implement override mechanisms for exceptions
AWS Implementation Example:
# Create an EventBridge rule to start instances on weekday mornings
aws events put-rule \
--name "StartDevInstances" \
--schedule-expression "cron(0 8 ? * MON-FRI *)" \
--state ENABLED
# Create an EventBridge rule to stop instances in the evening
aws events put-rule \
--name "StopDevInstances" \
--schedule-expression "cron(0 18 ? * MON-FRI *)" \
--state ENABLED
# Create a Lambda function target for the start rule
aws events put-targets \
--rule "StartDevInstances" \
--targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:StartDevInstances"
# Create a Lambda function target for the stop rule
aws events put-targets \
--rule "StopDevInstances" \
--targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:StopDevInstances"
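The two rules above invoke Lambda functions that are not shown here. A minimal sketch of a start handler, assuming instances opt in to scheduling via a Schedule=business-hours tag (an assumed convention); the stop handler mirrors it with stop_instances and a filter on running instances:
import boto3

ec2 = boto3.client('ec2')

def handler(event, context):
    # Only touch instances that opted in to scheduling via a tag;
    # anything without the tag is left alone, which doubles as an override mechanism
    reservations = ec2.describe_instances(
        Filters=[
            {'Name': 'tag:Schedule', 'Values': ['business-hours']},
            {'Name': 'instance-state-name', 'Values': ['stopped']},
        ]
    )['Reservations']
    instance_ids = [i['InstanceId'] for r in reservations for i in r['Instances']]
    if instance_ids:
        ec2.start_instances(InstanceIds=instance_ids)
    return {'started': instance_ids}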
Azure Implementation Example:
# Create an Azure Automation account
az automation account create \
--name "ResourceScheduler" \
--resource-group "CostOptimization" \
--location "eastus"
# Create a runbook to start VMs, import its content, and publish it
az automation runbook create \
--automation-account-name "ResourceScheduler" \
--resource-group "CostOptimization" \
--name "StartDevVMs" \
--type "PowerShell"
az automation runbook replace-content \
--automation-account-name "ResourceScheduler" \
--resource-group "CostOptimization" \
--name "StartDevVMs" \
--content @start-vms.ps1
az automation runbook publish \
--automation-account-name "ResourceScheduler" \
--resource-group "CostOptimization" \
--name "StartDevVMs"
# Create, populate, and publish a "StopDevVMs" runbook the same way from stop-vms.ps1
# Create schedules
az automation schedule create \
--automation-account-name "ResourceScheduler" \
--resource-group "CostOptimization" \
--name "WeekdayMornings" \
--frequency "Week" \
--interval 1 \
--start-time "2025-03-01T08:00:00+00:00" \
--week-days "Monday Tuesday Wednesday Thursday Friday"
az automation schedule create \
--automation-account-name "ResourceScheduler" \
--resource-group "CostOptimization" \
--name "WeekdayEvenings" \
--frequency "Week" \
--interval 1 \
--start-time "2025-03-01T18:00:00+00:00" \
--week-days "Monday Tuesday Wednesday Thursday Friday"
# Link schedules to runbooks
az automation job schedule create \
--automation-account-name "ResourceScheduler" \
--resource-group "CostOptimization" \
--runbook-name "StartDevVMs" \
--schedule-name "WeekdayMornings"
az automation job schedule create \
--automation-account-name "ResourceScheduler" \
--resource-group "CostOptimization" \
--runbook-name "StopDevVMs" \
--schedule-name "WeekdayEvenings"