Terraform for AWS: Cloud-Native Infrastructure
AWS and Terraform are a powerful combination, but AWS’s complexity means there are specific patterns, gotchas, and best practices that aren’t obvious from general Terraform knowledge. This guide bridges that gap, covering the AWS-specific techniques that separate basic resource creation from production-ready, well-architected infrastructure.
From VPC design patterns to multi-account strategies, this guide covers the real-world challenges you’ll face when managing AWS infrastructure at scale with Terraform.
AWS Provider Setup
The AWS provider is Terraform’s gateway to Amazon Web Services, but configuring it properly for production use involves more than just setting a region. Authentication strategies, provider aliases for multi-region deployments, and proper credential management are essential for building reliable, secure infrastructure automation.
Getting the provider configuration right from the start prevents authentication headaches, security issues, and deployment failures down the road. The patterns in this part work whether you’re managing a single AWS account or a complex multi-account organization.
Authentication Strategies
AWS credentials can be provided to Terraform in several ways, each with different security and operational implications:
Environment variables (recommended for local development):
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-west-2"
terraform plan
AWS CLI profiles for multiple account management:
# Configure profiles
aws configure --profile dev
aws configure --profile prod
# Use with Terraform
export AWS_PROFILE=dev
terraform plan
IAM roles (recommended for production):
provider "aws" {
region = "us-west-2"
assume_role {
role_arn = "arn:aws:iam::123456789012:role/TerraformRole"
session_name = "terraform-session"
}
}
Instance profiles for EC2-based CI/CD:
provider "aws" {
region = "us-west-2"
# Automatically uses instance profile when running on EC2
}
Multi-Region Provider Configuration
Real AWS architectures often span multiple regions for disaster recovery, compliance, or performance reasons:
# Primary region provider
provider "aws" {
region = "us-west-2"
alias = "primary"
default_tags {
tags = {
Environment = var.environment
ManagedBy = "terraform"
Project = var.project_name
}
}
}
# Secondary region for DR
provider "aws" {
region = "us-east-1"
alias = "dr"
default_tags {
tags = {
Environment = var.environment
ManagedBy = "terraform"
Project = var.project_name
}
}
}
# Use providers in resources
resource "aws_s3_bucket" "primary" {
provider = aws.primary
bucket = "my-app-primary-${var.environment}"
}
resource "aws_s3_bucket" "dr" {
provider = aws.dr
bucket = "my-app-dr-${var.environment}"
}
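Aliased providers can also be passed into child modules, which keeps multi-region modules reusable. A minimal sketch, assuming a hypothetical ./modules/s3-replica module that creates the DR bucket:
module "dr_bucket" {
  source = "./modules/s3-replica"

  # Hand the DR region's provider to the module explicitly
  providers = {
    aws = aws.dr
  }

  bucket_name = "my-app-dr-${var.environment}"
}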
Cross-Account Provider Setup
Multi-account AWS architectures require careful provider configuration:
# Shared services account
provider "aws" {
region = "us-west-2"
alias = "shared"
assume_role {
role_arn = "arn:aws:iam::111111111111:role/TerraformCrossAccountRole"
}
}
# Production account
provider "aws" {
region = "us-west-2"
alias = "prod"
assume_role {
role_arn = "arn:aws:iam::222222222222:role/TerraformCrossAccountRole"
}
}
# Create resources in different accounts
resource "aws_route53_zone" "shared" {
provider = aws.shared
name = "example.com"
}
resource "aws_route53_record" "prod" {
provider = aws.prod
zone_id = aws_route53_zone.shared.zone_id
name = "api.example.com"
type = "A"
ttl = 300
records = [aws_instance.api.public_ip]
}
Provider Version Management
Pin provider versions to ensure consistent deployments:
terraform {
required_version = ">= 1.6"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.20"
}
}
}
# Provider configuration
provider "aws" {
region = var.aws_region
# Skip the EC2 metadata API check for faster provider initialization
skip_metadata_api_check = true
# Leave region and credential validation enabled so misconfiguration is
# caught at init/plan time rather than partway through an apply
skip_region_validation = false
skip_credentials_validation = false
}
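Terraform records the provider versions it selects in .terraform.lock.hcl. Committing that file, and pre-populating hashes for every platform your team and CI run on, keeps deployments consistent:
# Record provider checksums for all platforms in use
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64

# Upgrade within the version constraint and refresh the lock file
terraform init -upgrade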
Default Tags and Resource Naming
Consistent tagging and naming are crucial for AWS cost management and organization:
provider "aws" {
region = "us-west-2"
default_tags {
tags = {
Environment = var.environment
Project = var.project_name
ManagedBy = "terraform"
Owner = var.team_name
CostCenter = var.cost_center
# Note: timestamp() is re-evaluated on every run, so this tag shows a diff
# on each plan; use a static value if that noise is unwanted
CreatedDate = formatdate("YYYY-MM-DD", timestamp())
}
}
}
# Local values for consistent naming
locals {
name_prefix = "${var.project_name}-${var.environment}"
common_tags = {
Application = var.application_name
Component = "infrastructure"
}
}
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-vpc"
Type = "networking"
})
}
AWS CLI Integration
The AWS CLI complements Terraform for verifying credentials, regions, and IAM permissions before a run:
# Validate AWS credentials
aws sts get-caller-identity
# Check current region
aws configure get region
# List available regions
aws ec2 describe-regions --query 'Regions[].RegionName' --output table
# Validate IAM permissions
aws iam simulate-principal-policy \
--policy-source-arn $(aws sts get-caller-identity --query Arn --output text) \
--action-names ec2:DescribeInstances \
--resource-arns "*"
Environment-Specific Configuration
Different environments often need different provider configurations:
# variables.tf
variable "environment" {
description = "Environment name"
type = string
}
variable "aws_region" {
description = "AWS region"
type = string
default = "us-west-2"
}
variable "assume_role_arn" {
description = "IAM role ARN to assume"
type = string
default = null
}
# main.tf
provider "aws" {
region = var.aws_region
dynamic "assume_role" {
for_each = var.assume_role_arn != null ? [1] : []
content {
role_arn = var.assume_role_arn
}
}
default_tags {
tags = {
Environment = var.environment
ManagedBy = "terraform"
}
}
}
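With this setup, each environment typically reduces to a different variable file (file names and the role ARN below are illustrative):
# environments/dev.tfvars
environment     = "dev"
aws_region      = "us-west-2"
assume_role_arn = null

# environments/prod.tfvars
environment     = "prod"
aws_region      = "us-west-2"
assume_role_arn = "arn:aws:iam::222222222222:role/TerraformRole"
Select the environment at plan time with terraform plan -var-file=environments/prod.tfvars.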
Security Best Practices
Secure provider configuration prevents credential leaks and unauthorized access:
Never hardcode credentials:
# DON'T DO THIS
provider "aws" {
access_key = "AKIAIOSFODNN7EXAMPLE" # Never hardcode!
secret_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}
# DO THIS INSTEAD
provider "aws" {
region = "us-west-2"
# Use environment variables, profiles, or IAM roles
}
Use least-privilege IAM policies:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:Describe*",
"ec2:CreateTags",
"ec2:RunInstances",
"ec2:TerminateInstances"
],
"Resource": "*",
"Condition": {
"StringEquals": {
"aws:RequestedRegion": ["us-west-2", "us-east-1"]
}
}
}
]
}
Enable CloudTrail logging:
resource "aws_cloudtrail" "terraform_audit" {
name = "terraform-audit-${var.environment}"
s3_bucket_name = aws_s3_bucket.audit_logs.bucket
event_selector {
read_write_type = "All"
include_management_events = true
data_resource {
type = "AWS::S3::Object"
values = ["${aws_s3_bucket.terraform_state.arn}/*"]
}
}
}
Troubleshooting Common Issues
Authentication failures:
# Check current credentials
aws sts get-caller-identity
# Verify region configuration
echo $AWS_DEFAULT_REGION
# Test specific profile
aws sts get-caller-identity --profile myprofile
Provider initialization issues:
# Clear provider cache
rm -rf .terraform/
# Reinitialize with debug logging
TF_LOG=DEBUG terraform init
Cross-account access problems:
# Test role assumption
aws sts assume-role \
--role-arn arn:aws:iam::123456789012:role/TerraformRole \
--role-session-name test-session
What’s Next
Proper AWS provider configuration is the foundation for everything else you’ll build with Terraform on AWS. With authentication, regions, and basic security patterns in place, you’re ready to tackle AWS networking—the backbone of well-architected cloud infrastructure.
In the next part, we’ll explore VPC design patterns, subnet strategies, and the networking building blocks that support scalable, secure AWS architectures.
VPC and Networking
AWS networking forms the foundation of every well-architected system, but designing VPCs that scale, perform well, and maintain security requires understanding both AWS networking concepts and Terraform patterns for managing complex network topologies. The decisions you make about CIDR blocks, subnet design, and connectivity patterns affect everything you’ll build on top.
This part covers the networking patterns that work well in production—from basic VPC design to complex multi-tier architectures with proper isolation and connectivity.
VPC Design Patterns
A well-designed VPC balances security, scalability, and operational simplicity:
# VPC with carefully planned CIDR blocks
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.name_prefix}-vpc"
Type = "networking"
}
}
# Internet Gateway for public connectivity
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.name_prefix}-igw"
}
}
# Data source for availability zones
data "aws_availability_zones" "available" {
state = "available"
}
# Calculate subnet CIDRs automatically
locals {
az_count = min(length(data.aws_availability_zones.available.names), 3)
# Public subnets: 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24
public_subnet_cidrs = [
for i in range(local.az_count) :
cidrsubnet(var.vpc_cidr, 8, i + 1)
]
# Private subnets: 10.0.11.0/24, 10.0.12.0/24, 10.0.13.0/24
private_subnet_cidrs = [
for i in range(local.az_count) :
cidrsubnet(var.vpc_cidr, 8, i + 11)
]
# Database subnets: 10.0.21.0/24, 10.0.22.0/24, 10.0.23.0/24
database_subnet_cidrs = [
for i in range(local.az_count) :
cidrsubnet(var.vpc_cidr, 8, i + 21)
]
}
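The cidrsubnet() arithmetic is easy to sanity-check in terraform console before applying:
terraform console
> cidrsubnet("10.0.0.0/16", 8, 1)
"10.0.1.0/24"
> cidrsubnet("10.0.0.0/16", 8, 11)
"10.0.11.0/24"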
Multi-Tier Subnet Architecture
Separate tiers provide security isolation and traffic control:
# Public subnets for load balancers and NAT gateways
resource "aws_subnet" "public" {
count = local.az_count
vpc_id = aws_vpc.main.id
cidr_block = local.public_subnet_cidrs[count.index]
availability_zone = data.aws_availability_zones.available.names[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.name_prefix}-public-${count.index + 1}"
Type = "public"
Tier = "public"
}
}
# Private subnets for application servers
resource "aws_subnet" "private" {
count = local.az_count
vpc_id = aws_vpc.main.id
cidr_block = local.private_subnet_cidrs[count.index]
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = {
Name = "${var.name_prefix}-private-${count.index + 1}"
Type = "private"
Tier = "application"
}
}
# Database subnets with additional isolation
resource "aws_subnet" "database" {
count = local.az_count
vpc_id = aws_vpc.main.id
cidr_block = local.database_subnet_cidrs[count.index]
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = {
Name = "${var.name_prefix}-database-${count.index + 1}"
Type = "private"
Tier = "database"
}
}
# Database subnet group for RDS
resource "aws_db_subnet_group" "main" {
name = "${var.name_prefix}-db-subnet-group"
subnet_ids = aws_subnet.database[*].id
tags = {
Name = "${var.name_prefix}-db-subnet-group"
}
}
NAT Gateway Configuration
NAT Gateways provide secure internet access for private subnets:
# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
count = var.enable_nat_gateway ? local.az_count : 0
domain = "vpc"
depends_on = [aws_internet_gateway.main]
tags = {
Name = "${var.name_prefix}-nat-eip-${count.index + 1}"
}
}
# NAT Gateways in public subnets
resource "aws_nat_gateway" "main" {
count = var.enable_nat_gateway ? local.az_count : 0
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = {
Name = "${var.name_prefix}-nat-${count.index + 1}"
}
depends_on = [aws_internet_gateway.main]
}
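One NAT Gateway per AZ is the resilient default, but non-production environments often use a single shared gateway to save cost. A sketch of how the counts above could be parameterized, assuming a var.single_nat_gateway flag introduced here for illustration:
locals {
  # Collapse to one NAT Gateway when single_nat_gateway is set
  nat_gateway_count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : local.az_count) : 0
}

# The EIP and NAT Gateway resources above would then use local.nat_gateway_count,
# and every private route table below would route through aws_nat_gateway.main[0]
# when only one gateway exists.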
Route Table Management
Proper routing ensures traffic flows correctly between tiers:
# Public route table
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.name_prefix}-public-rt"
Type = "public"
}
}
# Associate public subnets with public route table
resource "aws_route_table_association" "public" {
count = local.az_count
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
# Private route tables (one per AZ for NAT Gateway redundancy)
resource "aws_route_table" "private" {
count = local.az_count
vpc_id = aws_vpc.main.id
dynamic "route" {
for_each = var.enable_nat_gateway ? [1] : []
content {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
}
tags = {
Name = "${var.name_prefix}-private-rt-${count.index + 1}"
Type = "private"
}
}
# Associate private subnets with their route tables
resource "aws_route_table_association" "private" {
count = local.az_count
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
# Database route tables (isolated from internet)
resource "aws_route_table" "database" {
count = local.az_count
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.name_prefix}-database-rt-${count.index + 1}"
Type = "database"
}
}
resource "aws_route_table_association" "database" {
count = local.az_count
subnet_id = aws_subnet.database[count.index].id
route_table_id = aws_route_table.database[count.index].id
}
Security Group Patterns
Security groups provide stateful firewall rules at the instance level:
# Web tier security group
resource "aws_security_group" "web" {
name_prefix = "${var.name_prefix}-web-"
vpc_id = aws_vpc.main.id
description = "Security group for web tier"
ingress {
description = "HTTP from ALB"
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
ingress {
description = "HTTPS from ALB"
from_port = 443
to_port = 443
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.name_prefix}-web-sg"
Tier = "web"
}
}
# Application Load Balancer security group
resource "aws_security_group" "alb" {
name_prefix = "${var.name_prefix}-alb-"
vpc_id = aws_vpc.main.id
description = "Security group for Application Load Balancer"
ingress {
description = "HTTP from internet"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# The HTTP egress to the web tier is defined as a standalone rule below:
# referencing aws_security_group.web here, while the web group already
# references this group in its ingress rules, would create a dependency cycle.
tags = {
Name = "${var.name_prefix}-alb-sg"
Tier = "load-balancer"
}
}
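Defining the ALB-to-web rule as a standalone resource breaks the circular reference between the two security groups while keeping the same traffic policy:
resource "aws_security_group_rule" "alb_egress_to_web" {
  type                     = "egress"
  description              = "HTTP to web tier"
  from_port                = 80
  to_port                  = 80
  protocol                 = "tcp"
  security_group_id        = aws_security_group.alb.id
  source_security_group_id = aws_security_group.web.id
}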
# Database security group
resource "aws_security_group" "database" {
name_prefix = "${var.name_prefix}-db-"
vpc_id = aws_vpc.main.id
description = "Security group for database tier"
ingress {
description = "MySQL/Aurora from application tier"
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.web.id]
}
tags = {
Name = "${var.name_prefix}-db-sg"
Tier = "database"
}
}
VPC Endpoints for AWS Services
VPC endpoints provide private connectivity to AWS services:
# S3 VPC Endpoint (Gateway endpoint)
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.aws_region}.s3"
route_table_ids = concat(
[aws_route_table.public.id],
aws_route_table.private[*].id
)
tags = {
Name = "${var.name_prefix}-s3-endpoint"
}
}
# EC2 VPC Endpoint (Interface endpoint)
resource "aws_vpc_endpoint" "ec2" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.aws_region}.ec2"
vpc_endpoint_type = "Interface"
subnet_ids = aws_subnet.private[*].id
security_group_ids = [aws_security_group.vpc_endpoints.id]
private_dns_enabled = true
tags = {
Name = "${var.name_prefix}-ec2-endpoint"
}
}
# Security group for VPC endpoints
resource "aws_security_group" "vpc_endpoints" {
name_prefix = "${var.name_prefix}-vpc-endpoints-"
vpc_id = aws_vpc.main.id
description = "Security group for VPC endpoints"
ingress {
description = "HTTPS from VPC"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [aws_vpc.main.cidr_block]
}
tags = {
Name = "${var.name_prefix}-vpc-endpoints-sg"
}
}
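Most environments need several interface endpoints, so it's common to drive them from a set of service names. A sketch, with the variable introduced here for illustration:
variable "interface_endpoint_services" {
  description = "Interface endpoint service suffixes to create"
  type        = set(string)
  default     = ["ecr.api", "ecr.dkr", "logs", "ssm"]
}

resource "aws_vpc_endpoint" "interface" {
  for_each = var.interface_endpoint_services

  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.aws_region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "${var.name_prefix}-${each.value}-endpoint"
  }
}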
Network ACLs for Additional Security
Network ACLs provide subnet-level security controls:
# Database tier Network ACL
resource "aws_network_acl" "database" {
vpc_id = aws_vpc.main.id
subnet_ids = aws_subnet.database[*].id
# Allow inbound MySQL from private subnets
ingress {
protocol = "tcp"
rule_no = 100
action = "allow"
cidr_block = "10.0.0.0/8"
from_port = 3306
to_port = 3306
}
# Allow return traffic
ingress {
protocol = "tcp"
rule_no = 110
action = "allow"
cidr_block = "0.0.0.0/0"
from_port = 1024
to_port = 65535
}
# Allow outbound responses
egress {
protocol = "tcp"
rule_no = 100
action = "allow"
cidr_block = "0.0.0.0/0"
from_port = 1024
to_port = 65535
}
tags = {
Name = "${var.name_prefix}-database-nacl"
Tier = "database"
}
}
Outputs for Network Resources
Expose network information for use by other configurations:
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.main.id
}
output "vpc_cidr_block" {
description = "CIDR block of the VPC"
value = aws_vpc.main.cidr_block
}
output "public_subnet_ids" {
description = "IDs of the public subnets"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "IDs of the private subnets"
value = aws_subnet.private[*].id
}
output "database_subnet_ids" {
description = "IDs of the database subnets"
value = aws_subnet.database[*].id
}
output "database_subnet_group_name" {
description = "Name of the database subnet group"
value = aws_db_subnet_group.main.name
}
output "security_group_ids" {
description = "Security group IDs by tier"
value = {
web = aws_security_group.web.id
alb = aws_security_group.alb.id
database = aws_security_group.database.id
}
}
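Other configurations can consume these outputs through a terraform_remote_state data source (the backend bucket and key below are placeholders):
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"        # placeholder backend bucket
    key    = "network/terraform.tfstate" # placeholder state key
    region = "us-west-2"
  }
}

# Example reference:
# subnet_id = data.terraform_remote_state.network.outputs.private_subnet_ids[0]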
What’s Next
Well-designed networking provides the foundation for secure, scalable AWS architectures. With VPCs, subnets, and security groups properly configured, you’re ready to tackle AWS’s most complex topic: Identity and Access Management.
In the next part, we’ll explore IAM patterns that provide least-privilege access, enable cross-account workflows, and automate security controls across your AWS infrastructure.
IAM and Security
AWS Identity and Access Management is both the most critical and most complex aspect of AWS security. Getting IAM wrong can expose your entire infrastructure to attack or lock you out of your own resources. Terraform helps by making IAM policies version-controlled and repeatable, but you still need to understand the principles of least privilege, role-based access, and AWS’s various authentication mechanisms.
We’ll explore IAM patterns that work well in production, from basic role creation to complex cross-account access and automated security controls.
IAM Role Patterns
Roles are the foundation of AWS security, providing temporary credentials without long-lived access keys:
# EC2 instance role for application servers
resource "aws_iam_role" "app_server" {
name = "${var.name_prefix}-app-server-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
tags = {
Name = "${var.name_prefix}-app-server-role"
Environment = var.environment
}
}
# Instance profile for EC2
resource "aws_iam_instance_profile" "app_server" {
name = "${var.name_prefix}-app-server-profile"
role = aws_iam_role.app_server.name
}
# Policy for application access
resource "aws_iam_role_policy" "app_server" {
name = "${var.name_prefix}-app-server-policy"
role = aws_iam_role.app_server.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject"
]
Resource = [
"${aws_s3_bucket.app_data.arn}/*"
]
},
{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue"
]
Resource = [
aws_secretsmanager_secret.app_secrets.arn
]
}
]
})
}
Cross-Account Access Patterns
Multi-account architectures require careful cross-account role configuration:
# Cross-account role for CI/CD access
resource "aws_iam_role" "cicd_cross_account" {
name = "${var.name_prefix}-cicd-cross-account"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::${var.cicd_account_id}:root"
}
Condition = {
StringEquals = {
"sts:ExternalId" = var.external_id
}
StringLike = {
"aws:userid" = "AIDACKCEVSQ6C2EXAMPLE:*"
}
}
}
]
})
max_session_duration = 3600 # 1 hour
tags = {
Purpose = "CI/CD cross-account access"
}
}
# Policy for deployment permissions
resource "aws_iam_role_policy" "cicd_deployment" {
name = "${var.name_prefix}-cicd-deployment"
role = aws_iam_role.cicd_cross_account.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"ec2:DescribeInstances",
"ec2:DescribeImages",
"ec2:RunInstances",
"ec2:TerminateInstances",
"ec2:CreateTags"
]
Resource = "*"
Condition = {
StringEquals = {
"aws:RequestedRegion" = [var.aws_region]
}
}
},
{
Effect = "Allow"
Action = [
"ecs:UpdateService",
"ecs:DescribeServices",
"ecs:RegisterTaskDefinition"
]
Resource = "*"
}
]
})
}
Service-Linked Roles and Managed Policies
Use AWS managed policies where appropriate, but understand their implications:
# Attach AWS managed policy
resource "aws_iam_role_policy_attachment" "app_server_ssm" {
role = aws_iam_role.app_server.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
# Create service-linked role for ECS
resource "aws_iam_service_linked_role" "ecs" {
aws_service_name = "ecs.amazonaws.com"
description = "Service-linked role for ECS"
}
# Custom policy with specific permissions
resource "aws_iam_policy" "app_specific" {
name = "${var.name_prefix}-app-specific"
description = "Application-specific permissions"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:DeleteItem"
]
Resource = [
aws_dynamodb_table.app_data.arn,
"${aws_dynamodb_table.app_data.arn}/index/*"
]
}
]
})
}
resource "aws_iam_role_policy_attachment" "app_specific" {
role = aws_iam_role.app_server.name
policy_arn = aws_iam_policy.app_specific.arn
}
User and Group Management
Manage users and groups for human access:
# Developer group with limited permissions
resource "aws_iam_group" "developers" {
name = "${var.name_prefix}-developers"
}
resource "aws_iam_group_policy" "developers" {
name = "${var.name_prefix}-developers-policy"
group = aws_iam_group.developers.name
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"ec2:Describe*",
"s3:ListBucket",
"s3:GetObject",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams",
"logs:GetLogEvents"
]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"sts:AssumeRole"
]
Resource = [
aws_iam_role.developer_assume_role.arn
]
}
]
})
}
# Users (typically managed outside Terraform in production)
resource "aws_iam_user" "developers" {
for_each = var.developer_users
name = each.key
path = "/developers/"
tags = {
Team = each.value.team
Role = "developer"
}
}
resource "aws_iam_user_group_membership" "developers" {
for_each = aws_iam_user.developers
user = each.value.name
groups = [aws_iam_group.developers.name]
}
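The developer_users variable referenced above isn't defined in this snippet; a minimal definition consistent with how it's used might look like this (usernames are examples):
variable "developer_users" {
  description = "Developer IAM users to create, keyed by username"
  type = map(object({
    team = string
  }))
  default = {
    alice = { team = "platform" }
    bob   = { team = "web" }
  }
}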
Secrets Management Integration
Integrate with AWS Secrets Manager and Parameter Store:
# Application secrets in Secrets Manager
resource "aws_secretsmanager_secret" "app_secrets" {
name = "${var.name_prefix}/app/secrets"
description = "Application secrets"
replica {
region = var.backup_region
}
tags = {
Application = var.application_name
Environment = var.environment
}
}
resource "aws_secretsmanager_secret_version" "app_secrets" {
secret_id = aws_secretsmanager_secret.app_secrets.id
secret_string = jsonencode({
database_password = random_password.db_password.result
api_key = random_password.api_key.result
jwt_secret = random_password.jwt_secret.result
})
}
# Configuration in Parameter Store
resource "aws_ssm_parameter" "app_config" {
for_each = var.app_parameters
name = "/${var.name_prefix}/config/${each.key}"
type = each.value.secure ? "SecureString" : "String"
value = each.value.value
tags = {
Application = var.application_name
Environment = var.environment
}
}
# IAM policy for secrets access
resource "aws_iam_role_policy" "secrets_access" {
name = "${var.name_prefix}-secrets-access"
role = aws_iam_role.app_server.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue"
]
Resource = [
aws_secretsmanager_secret.app_secrets.arn
]
},
{
Effect = "Allow"
Action = [
"ssm:GetParameter",
"ssm:GetParameters",
"ssm:GetParametersByPath"
]
Resource = [
"arn:aws:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter/${var.name_prefix}/config/*"
]
}
]
})
}
Security Automation
Automate security controls and compliance:
# CloudTrail for audit logging
resource "aws_cloudtrail" "main" {
name = "${var.name_prefix}-cloudtrail"
s3_bucket_name = aws_s3_bucket.cloudtrail_logs.bucket
event_selector {
read_write_type = "All"
include_management_events = true
data_resource {
type = "AWS::S3::Object"
values = ["arn:aws:s3:::${aws_s3_bucket.sensitive_data.bucket}/*"]
}
}
insight_selector {
insight_type = "ApiCallRateInsight"
}
tags = {
Purpose = "Security audit logging"
}
}
# Config for compliance monitoring
resource "aws_config_configuration_recorder" "main" {
name = "${var.name_prefix}-config-recorder"
role_arn = aws_iam_role.config.arn
recording_group {
all_supported = true
include_global_resource_types = true
}
}
resource "aws_config_delivery_channel" "main" {
name = "${var.name_prefix}-config-delivery"
s3_bucket_name = aws_s3_bucket.config_logs.bucket
}
# Config rules for compliance
resource "aws_config_config_rule" "root_access_key_check" {
name = "${var.name_prefix}-root-access-key-check"
source {
owner = "AWS"
source_identifier = "ROOT_ACCESS_KEY_CHECK"
}
depends_on = [aws_config_configuration_recorder.main]
}
resource "aws_config_config_rule" "encrypted_volumes" {
name = "${var.name_prefix}-encrypted-volumes"
source {
owner = "AWS"
source_identifier = "ENCRYPTED_VOLUMES"
}
depends_on = [aws_config_configuration_recorder.main]
}
KMS Key Management
Manage encryption keys for different services:
# Application-specific KMS key
resource "aws_kms_key" "app_key" {
description = "KMS key for ${var.application_name}"
deletion_window_in_days = 7
enable_key_rotation = true
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "Enable IAM User Permissions"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
}
Action = "kms:*"
Resource = "*"
},
{
Sid = "Allow use of the key"
Effect = "Allow"
Principal = {
AWS = [
aws_iam_role.app_server.arn
]
}
Action = [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
]
Resource = "*"
}
]
})
tags = {
Name = "${var.name_prefix}-app-key"
Application = var.application_name
}
}
resource "aws_kms_alias" "app_key" {
name = "alias/${var.name_prefix}-app-key"
target_key_id = aws_kms_key.app_key.key_id
}
# S3 bucket encryption with KMS
resource "aws_s3_bucket_server_side_encryption_configuration" "app_data" {
bucket = aws_s3_bucket.app_data.id
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.app_key.arn
sse_algorithm = "aws:kms"
}
bucket_key_enabled = true
}
}
Security Group Automation
Create security groups with proper ingress/egress rules:
# Application security group with dynamic rules
resource "aws_security_group" "app" {
name_prefix = "${var.name_prefix}-app-"
vpc_id = var.vpc_id
description = "Security group for ${var.application_name}"
# Dynamic ingress rules
dynamic "ingress" {
for_each = var.ingress_rules
content {
description = ingress.value.description
from_port = ingress.value.from_port
to_port = ingress.value.to_port
protocol = ingress.value.protocol
cidr_blocks = ingress.value.cidr_blocks
security_groups = ingress.value.security_groups
}
}
# Allow all outbound traffic
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.name_prefix}-app-sg"
}
}
# Database security group with restricted access
resource "aws_security_group" "database" {
name_prefix = "${var.name_prefix}-db-"
vpc_id = var.vpc_id
description = "Security group for database"
ingress {
description = "MySQL from application"
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.app.id]
}
tags = {
Name = "${var.name_prefix}-db-sg"
}
}
IAM Access Analyzer
Use Access Analyzer to identify overly permissive policies:
resource "aws_accessanalyzer_analyzer" "main" {
analyzer_name = "${var.name_prefix}-access-analyzer"
type = "ACCOUNT"
tags = {
Environment = var.environment
Purpose = "IAM policy analysis"
}
}
# Archive findings that are expected
resource "aws_accessanalyzer_archive_rule" "ignore_public_s3" {
analyzer_name = aws_accessanalyzer_analyzer.main.analyzer_name
rule_name = "ignore-public-s3-buckets"
filter {
criteria = "resourceType"
eq = ["AWS::S3::Bucket"]
}
filter {
criteria = "isPublic"
eq = ["true"]
}
}
What’s Next
IAM and security form the foundation of AWS infrastructure protection, but managing multiple AWS accounts requires additional patterns for organization setup, cross-account access, and centralized governance.
In the next part, we’ll explore multi-account strategies using AWS Organizations, including account creation automation, cross-account role management, and centralized billing and compliance controls.
Multi-Account Strategies
AWS multi-account architecture is the gold standard for enterprise cloud deployments, providing isolation, security boundaries, and simplified billing. However, managing dozens or hundreds of AWS accounts manually becomes impossible. Terraform can automate account creation, organization setup, and cross-account access patterns, but it requires careful planning and understanding of AWS Organizations.
Here we’ll dive into patterns and practices for implementing multi-account AWS architectures with Terraform, from basic organization setup to complex cross-account workflows.
AWS Organizations Setup
AWS Organizations provides centralized management for multiple AWS accounts:
# Create the organization (run this in the master account)
resource "aws_organizations_organization" "main" {
aws_service_access_principals = [
"cloudtrail.amazonaws.com",
"config.amazonaws.com",
"guardduty.amazonaws.com",
"securityhub.amazonaws.com",
"sso.amazonaws.com"
]
feature_set = "ALL"
enabled_policy_types = [
"SERVICE_CONTROL_POLICY",
"TAG_POLICY",
"BACKUP_POLICY"
]
}
# Organizational Units for different environments
resource "aws_organizations_organizational_unit" "production" {
name = "Production"
parent_id = aws_organizations_organization.main.roots[0].id
}
resource "aws_organizations_organizational_unit" "non_production" {
name = "Non-Production"
parent_id = aws_organizations_organization.main.roots[0].id
}
resource "aws_organizations_organizational_unit" "security" {
name = "Security"
parent_id = aws_organizations_organization.main.roots[0].id
}
resource "aws_organizations_organizational_unit" "shared_services" {
name = "Shared Services"
parent_id = aws_organizations_organization.main.roots[0].id
}
Account Creation Automation
Automate the creation of new AWS accounts:
# Account creation with proper naming and email conventions
resource "aws_organizations_account" "accounts" {
for_each = var.aws_accounts
name = each.value.name
email = each.value.email
role_name = "OrganizationAccountAccessRole"
# Move to appropriate OU after creation
parent_id = each.value.parent_ou_id
tags = {
Environment = each.value.environment
Purpose = each.value.purpose
Owner = each.value.owner
}
}
# Variable definition for accounts. Variable defaults can't reference other
# resources, so the OU IDs below are placeholders; in practice pass real OU
# IDs in via tfvars, or build the map in a locals block that references the
# aws_organizations_organizational_unit resources.
variable "aws_accounts" {
description = "AWS accounts to create"
type = map(object({
name = string
email = string
environment = string
purpose = string
owner = string
parent_ou_id = string
}))
default = {
prod_web = {
name = "Production Web Services"
email = "[email protected]"
environment = "production"
purpose = "web-services"
owner = "web-team"
parent_ou_id = "ou-xxxx-prodexample" # placeholder: Production OU ID
}
prod_data = {
name = "Production Data Services"
email = "[email protected]"
environment = "production"
purpose = "data-services"
owner = "data-team"
parent_ou_id = "ou-xxxx-prodexample" # placeholder: Production OU ID
}
dev_sandbox = {
name = "Development Sandbox"
email = "[email protected]"
environment = "development"
purpose = "sandbox"
owner = "engineering"
parent_ou_id = "ou-xxxx-nonprodexample" # placeholder: Non-Production OU ID
}
}
}
Service Control Policies
Implement governance through Service Control Policies:
# Prevent deletion of CloudTrail logs
resource "aws_organizations_policy" "prevent_cloudtrail_deletion" {
name = "PreventCloudTrailDeletion"
description = "Prevent deletion of CloudTrail logs and configuration"
type = "SERVICE_CONTROL_POLICY"
content = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "PreventCloudTrailDeletion"
Effect = "Deny"
Action = [
"cloudtrail:DeleteTrail",
"cloudtrail:StopLogging",
"cloudtrail:UpdateTrail"
]
Resource = "*"
Condition = {
StringNotEquals = {
"aws:PrincipalArn" = [
"arn:aws:iam::*:role/OrganizationAccountAccessRole",
"arn:aws:iam::*:role/SecurityAuditRole"
]
}
}
}
]
})
}
# Restrict regions for compliance
resource "aws_organizations_policy" "restrict_regions" {
name = "RestrictRegions"
description = "Restrict operations to approved regions"
type = "SERVICE_CONTROL_POLICY"
content = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "RestrictRegions"
Effect = "Deny"
NotAction = [
"iam:*",
"sts:*",
"cloudfront:*",
"route53:*",
"support:*",
"trustedadvisor:*"
]
Resource = "*"
Condition = {
StringNotEquals = {
"aws:RequestedRegion" = [
"us-east-1",
"us-west-2",
"eu-west-1"
]
}
}
}
]
})
}
# Attach policies to OUs
resource "aws_organizations_policy_attachment" "production_cloudtrail" {
policy_id = aws_organizations_policy.prevent_cloudtrail_deletion.id
target_id = aws_organizations_organizational_unit.production.id
}
resource "aws_organizations_policy_attachment" "all_regions" {
policy_id = aws_organizations_policy.restrict_regions.id
target_id = aws_organizations_organization.main.roots[0].id
}
Cross-Account Role Management
Set up roles for cross-account access:
# Cross-account role in each member account
resource "aws_iam_role" "cross_account_admin" {
provider = aws.member_account
name = "CrossAccountAdminRole"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
AWS = [
"arn:aws:iam::${var.master_account_id}:root",
"arn:aws:iam::${var.security_account_id}:root"
]
}
Condition = {
StringEquals = {
"sts:ExternalId" = var.external_id
}
IpAddress = {
"aws:SourceIp" = var.allowed_ip_ranges
}
}
}
]
})
max_session_duration = 3600
tags = {
Purpose = "Cross-account administration"
}
}
# Attach appropriate policies
resource "aws_iam_role_policy_attachment" "cross_account_admin" {
provider = aws.member_account
role = aws_iam_role.cross_account_admin.name
policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}
# Read-only role for auditing
resource "aws_iam_role" "cross_account_readonly" {
provider = aws.member_account
name = "CrossAccountReadOnlyRole"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::${var.security_account_id}:root"
}
}
]
})
}
resource "aws_iam_role_policy_attachment" "cross_account_readonly" {
provider = aws.member_account
role = aws_iam_role.cross_account_readonly.name
policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
Centralized Logging and Monitoring
Set up centralized logging across all accounts:
# Central logging bucket in security account
resource "aws_s3_bucket" "central_logs" {
provider = aws.security_account
bucket = "${var.organization_name}-central-logs"
}
resource "aws_s3_bucket_policy" "central_logs" {
provider = aws.security_account
bucket = aws_s3_bucket.central_logs.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "AWSCloudTrailAclCheck"
Effect = "Allow"
Principal = {
Service = "cloudtrail.amazonaws.com"
}
Action = "s3:GetBucketAcl"
Resource = aws_s3_bucket.central_logs.arn
},
{
Sid = "AWSCloudTrailWrite"
Effect = "Allow"
Principal = {
Service = "cloudtrail.amazonaws.com"
}
Action = "s3:PutObject"
Resource = "${aws_s3_bucket.central_logs.arn}/*"
Condition = {
StringEquals = {
"s3:x-amz-acl" = "bucket-owner-full-control"
}
}
}
]
})
}
# CloudTrail in each member account. Note that Terraform provider references
# must be static, so the aws.member_accounts[each.key] pattern used in this
# part is illustrative only; in practice each member account gets its own
# provider alias (or its own module/workspace instantiation).
resource "aws_cloudtrail" "member_account" {
for_each = var.member_accounts
provider = aws.member_accounts[each.key]
name = "${each.key}-cloudtrail"
s3_bucket_name = aws_s3_bucket.central_logs.bucket
s3_key_prefix = "cloudtrail/${each.key}"
include_global_service_events = true
is_multi_region_trail = true
enable_logging = true
tags = {
Account = each.key
Purpose = "Centralized audit logging"
}
}
AWS SSO Integration
Integrate with AWS Single Sign-On for centralized access:
# SSO instance (created automatically when SSO is enabled)
data "aws_ssoadmin_instances" "main" {}
# Permission sets for different roles
resource "aws_ssoadmin_permission_set" "admin" {
name = "AdministratorAccess"
description = "Full administrative access"
instance_arn = tolist(data.aws_ssoadmin_instances.main.arns)[0]
session_duration = "PT2H" # 2 hours
tags = {
Purpose = "Administrative access"
}
}
resource "aws_ssoadmin_managed_policy_attachment" "admin" {
instance_arn = tolist(data.aws_ssoadmin_instances.main.arns)[0]
managed_policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
permission_set_arn = aws_ssoadmin_permission_set.admin.arn
}
# Developer permission set with limited access
resource "aws_ssoadmin_permission_set" "developer" {
name = "DeveloperAccess"
description = "Developer access with restrictions"
instance_arn = tolist(data.aws_ssoadmin_instances.main.arns)[0]
session_duration = "PT8H" # 8 hours
}
resource "aws_ssoadmin_permission_set_inline_policy" "developer" {
inline_policy = data.aws_iam_policy_document.developer_policy.json
instance_arn = tolist(data.aws_ssoadmin_instances.main.arns)[0]
permission_set_arn = aws_ssoadmin_permission_set.developer.arn
}
data "aws_iam_policy_document" "developer_policy" {
statement {
effect = "Allow"
actions = [
"ec2:Describe*",
"s3:ListBucket",
"s3:GetObject",
"logs:*",
"cloudwatch:*"
]
resources = ["*"]
}
statement {
effect = "Deny"
actions = [
"ec2:TerminateInstances",
"rds:DeleteDBInstance",
"s3:DeleteBucket"
]
resources = ["*"]
}
}
# Account assignments
resource "aws_ssoadmin_account_assignment" "admin_prod" {
instance_arn = tolist(data.aws_ssoadmin_instances.main.arns)[0]
permission_set_arn = aws_ssoadmin_permission_set.admin.arn
principal_id = var.admin_group_id
principal_type = "GROUP"
target_id = aws_organizations_account.accounts["prod_web"].id
target_type = "AWS_ACCOUNT"
}
Cost Management and Billing
Implement cost controls across accounts:
# Billing alerts for each account
resource "aws_budgets_budget" "account_budget" {
for_each = var.aws_accounts
provider = aws.member_accounts[each.key]
name = "${each.key}-monthly-budget"
budget_type = "COST"
limit_amount = each.value.monthly_budget
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "Service"
values = ["Amazon Elastic Compute Cloud - Compute"]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = [each.value.billing_email]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = [each.value.billing_email]
}
}
# Cost anomaly detection
resource "aws_ce_anomaly_detector" "account_anomaly" {
for_each = var.aws_accounts
provider = aws.member_accounts[each.key]
name = "${each.key}-cost-anomaly-detector"
monitor_type = "DIMENSIONAL"
specification = jsonencode({
Dimension = "SERVICE"
MatchOptions = ["EQUALS"]
Values = ["EC2-Instance", "RDS"]
})
}
resource "aws_ce_anomaly_subscription" "account_anomaly" {
for_each = var.aws_accounts
provider = aws.member_accounts[each.key]
name = "${each.key}-anomaly-subscription"
frequency = "DAILY"
monitor_arn_list = [
aws_ce_anomaly_monitor.account_anomaly[each.key].arn
]
subscriber {
type = "EMAIL"
address = each.value.billing_email
}
threshold_expression {
and {
dimension {
key = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
values = ["100"]
match_options = ["GREATER_THAN_OR_EQUAL"]
}
}
}
}
Account Baseline Configuration
Apply consistent baseline configuration to all accounts:
# Module for account baseline
module "account_baseline" {
source = "./modules/account-baseline"
for_each = var.aws_accounts
providers = {
aws = aws.member_accounts[each.key]
}
account_name = each.key
environment = each.value.environment
security_account_id = var.security_account_id
log_bucket_name = aws_s3_bucket.central_logs.bucket
# Enable services based on account type
enable_guardduty = true
enable_config = true
enable_securityhub = each.value.environment == "production"
enable_cloudtrail = true
# Tagging strategy
default_tags = {
Account = each.key
Environment = each.value.environment
Owner = each.value.owner
ManagedBy = "terraform"
}
}
Cross-Account Resource Sharing
Share resources across accounts using Resource Access Manager:
# Share VPC subnets across accounts
resource "aws_ram_resource_share" "shared_subnets" {
provider = aws.shared_services
name = "shared-subnets"
allow_external_principals = false
tags = {
Purpose = "Share networking resources"
}
}
resource "aws_ram_resource_association" "shared_subnets" {
provider = aws.shared_services
for_each = toset(var.shared_subnet_ids)
resource_arn = "arn:aws:ec2:${var.aws_region}:${var.shared_services_account_id}:subnet/${each.value}"
resource_share_arn = aws_ram_resource_share.shared_subnets.arn
}
resource "aws_ram_principal_association" "shared_subnets" {
provider = aws.shared_services
for_each = var.member_account_ids
principal = each.value
resource_share_arn = aws_ram_resource_share.shared_subnets.arn
}
What’s Next
Multi-account strategies provide the organizational foundation for enterprise AWS deployments, but managing costs and implementing proper tagging strategies becomes critical as your infrastructure scales.
In the next part, we’ll explore cost optimization techniques, including resource lifecycle management, automated cost controls, and tagging strategies that enable accurate cost allocation and optimization across your AWS infrastructure.
Cost Optimization
AWS costs can spiral out of control quickly without proper governance and optimization strategies. Terraform helps by making cost controls repeatable and enforceable, but you need to understand AWS pricing models, implement proper tagging strategies, and automate resource lifecycle management to keep costs under control.
This part covers the patterns and practices for implementing cost optimization with Terraform, from basic tagging strategies to advanced automation that right-sizes resources and manages their lifecycle.
Comprehensive Tagging Strategy
Consistent tagging is the foundation of cost management and allocation:
# Global tagging strategy
locals {
# Required tags for all resources
required_tags = {
Environment = var.environment
Project = var.project_name
Owner = var.team_name
CostCenter = var.cost_center
ManagedBy = "terraform"
CreatedDate = formatdate("YYYY-MM-DD", timestamp())
}
# Optional tags that can be merged
optional_tags = {
Application = var.application_name
Component = var.component_name
Version = var.application_version
}
# Combined tags
common_tags = merge(local.required_tags, local.optional_tags)
}
# Provider-level default tags
provider "aws" {
region = var.aws_region
default_tags {
tags = local.required_tags
}
}
# Resource-specific tagging
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = var.instance_type
tags = merge(local.common_tags, {
Name = "${var.name_prefix}-web-${count.index + 1}"
Role = "webserver"
Backup = "daily"
AutoShutdown = var.environment != "production" ? "true" : "false"
})
}
# Enforce tagging with a lifecycle postcondition (added to the same
# aws_instance.web resource shown above)
resource "aws_instance" "web" {
# ... other configuration as above ...
lifecycle {
postcondition {
condition = alltrue([
for tag in keys(local.required_tags) :
contains(keys(self.tags_all), tag) # tags_all includes provider default_tags
])
error_message = "All required tags must be present: ${join(", ", keys(local.required_tags))}"
}
}
}
Resource Right-Sizing
Implement policies to prevent oversized resources:
# Instance type validation
variable "allowed_instance_types" {
description = "Allowed EC2 instance types by environment"
type = map(list(string))
default = {
dev = [
"t3.nano", "t3.micro", "t3.small", "t3.medium"
]
staging = [
"t3.small", "t3.medium", "t3.large",
"m5.large", "m5.xlarge"
]
production = [
"t3.medium", "t3.large", "t3.xlarge",
"m5.large", "m5.xlarge", "m5.2xlarge",
"c5.large", "c5.xlarge", "c5.2xlarge"
]
}
}
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = var.instance_type
lifecycle {
precondition {
condition = contains(
var.allowed_instance_types[var.environment],
var.instance_type
)
error_message = "Instance type ${var.instance_type} is not allowed in ${var.environment} environment. Allowed types: ${join(", ", var.allowed_instance_types[var.environment])}"
}
}
}
# RDS instance size controls
variable "allowed_db_instance_classes" {
description = "Allowed RDS instance classes by environment"
type = map(list(string))
default = {
dev = [
"db.t3.micro", "db.t3.small"
]
staging = [
"db.t3.small", "db.t3.medium", "db.r5.large"
]
production = [
"db.t3.medium", "db.t3.large",
"db.r5.large", "db.r5.xlarge", "db.r5.2xlarge"
]
}
}
resource "aws_db_instance" "main" {
identifier = "${var.name_prefix}-database"
engine = "mysql"
engine_version = "8.0"
instance_class = var.db_instance_class
lifecycle {
precondition {
condition = contains(
var.allowed_db_instance_classes[var.environment],
var.db_instance_class
)
error_message = "DB instance class ${var.db_instance_class} is not allowed in ${var.environment} environment."
}
}
}
Automated Resource Scheduling
Implement automated start/stop for non-production resources:
# Lambda function for EC2 scheduling
resource "aws_lambda_function" "ec2_scheduler" {
filename = "ec2_scheduler.zip"
function_name = "${var.name_prefix}-ec2-scheduler"
role = aws_iam_role.ec2_scheduler.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 60
environment {
variables = {
ENVIRONMENT = var.environment
}
}
tags = local.common_tags
}
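The ec2_scheduler.zip artifact has to exist before Terraform can upload it. One common approach is to build it with the archive provider; the source path here is an assumption about where the handler code lives:
data "archive_file" "ec2_scheduler" {
  type        = "zip"
  source_file = "${path.module}/lambda/ec2_scheduler/index.py" # assumed location
  output_path = "${path.module}/ec2_scheduler.zip"
}
The Lambda resource can then use data.archive_file.ec2_scheduler.output_path as its filename and set source_code_hash = data.archive_file.ec2_scheduler.output_base64sha256 so code changes trigger redeployments.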
# IAM role for scheduler
resource "aws_iam_role" "ec2_scheduler" {
name = "${var.name_prefix}-ec2-scheduler-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy" "ec2_scheduler" {
name = "${var.name_prefix}-ec2-scheduler-policy"
role = aws_iam_role.ec2_scheduler.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "arn:aws:logs:*:*:*"
},
{
Effect = "Allow"
Action = [
"ec2:DescribeInstances",
"ec2:StartInstances",
"ec2:StopInstances"
]
Resource = "*"
}
]
})
}
# CloudWatch Events for scheduling
resource "aws_cloudwatch_event_rule" "stop_instances" {
name = "${var.name_prefix}-stop-instances"
description = "Stop non-production instances at 6 PM"
schedule_expression = "cron(0 18 ? * MON-FRI *)"
tags = local.common_tags
}
resource "aws_cloudwatch_event_rule" "start_instances" {
name = "${var.name_prefix}-start-instances"
description = "Start non-production instances at 8 AM"
schedule_expression = "cron(0 8 ? * MON-FRI *)"
tags = local.common_tags
}
resource "aws_cloudwatch_event_target" "stop_instances" {
rule = aws_cloudwatch_event_rule.stop_instances.name
target_id = "StopInstancesTarget"
arn = aws_lambda_function.ec2_scheduler.arn
input = jsonencode({
action = "stop"
})
}
resource "aws_cloudwatch_event_target" "start_instances" {
rule = aws_cloudwatch_event_rule.start_instances.name
target_id = "StartInstancesTarget"
arn = aws_lambda_function.ec2_scheduler.arn
input = jsonencode({
action = "start"
})
}
resource "aws_lambda_permission" "allow_cloudwatch_stop" {
statement_id = "AllowExecutionFromCloudWatchStop"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.ec2_scheduler.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.stop_instances.arn
}
resource "aws_lambda_permission" "allow_cloudwatch_start" {
statement_id = "AllowExecutionFromCloudWatchStart"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.ec2_scheduler.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.start_instances.arn
}
Storage Lifecycle Management
Implement intelligent tiering and lifecycle policies:
# S3 bucket with intelligent tiering
resource "aws_s3_bucket" "data_storage" {
bucket = "${var.name_prefix}-data-storage"
tags = local.common_tags
}
resource "aws_s3_bucket_intelligent_tiering_configuration" "data_storage" {
bucket = aws_s3_bucket.data_storage.id
name = "EntireBucket"
status = "Enabled"
filter {
prefix = ""
}
tiering {
access_tier = "DEEP_ARCHIVE_ACCESS"
days = 180
}
tiering {
access_tier = "ARCHIVE_ACCESS"
days = 125
}
}
# Lifecycle configuration for different data types
resource "aws_s3_bucket_lifecycle_configuration" "data_storage" {
bucket = aws_s3_bucket.data_storage.id
rule {
id = "logs_lifecycle"
status = "Enabled"
filter {
prefix = "logs/"
}
transition {
days = 30
storage_class = "STANDARD_IA"
}
transition {
days = 90
storage_class = "GLACIER"
}
transition {
days = 365
storage_class = "DEEP_ARCHIVE"
}
expiration {
days = 2555 # 7 years
}
}
rule {
id = "temp_data_cleanup"
status = "Enabled"
filter {
prefix = "temp/"
}
expiration {
days = 7
}
}
rule {
id = "incomplete_multipart_uploads"
status = "Enabled"
abort_incomplete_multipart_upload {
days_after_initiation = 1
}
}
}
# EBS volume optimization
resource "aws_ebs_volume" "data" {
availability_zone = var.availability_zone
size = var.volume_size
type = var.environment == "production" ? "gp3" : "gp2"
encrypted = true
# Use gp3 for better cost/performance in production; throughput and iops are
# plain arguments (not blocks) and only apply to gp3 volumes
throughput = var.environment == "production" ? 125 : null # baseline gp3 throughput (MiB/s)
iops = var.environment == "production" ? 3000 : null # baseline gp3 IOPS
tags = merge(local.common_tags, {
Name = "${var.name_prefix}-data-volume"
Type = "data"
})
}
Reserved Instance and Savings Plans Management
Track and manage reserved capacity:
# Data source to check existing reserved instances
data "aws_ec2_reserved_instances" "existing" {
filter {
name = "state"
values = ["active"]
}
}
# Local calculation for RI coverage
locals {
# Calculate running instances by type
running_instances = {
for instance_type, count in var.instance_counts :
instance_type => count
}
# Calculate RI coverage
ri_coverage = {
for ri in data.aws_ec2_reserved_instances.existing.reserved_instances :
ri.instance_type => ri.instance_count
}
# Identify gaps in RI coverage
ri_gaps = {
for instance_type, running_count in local.running_instances :
instance_type => max(0, running_count - lookup(local.ri_coverage, instance_type, 0))
}
}
# Output RI recommendations
output "ri_recommendations" {
description = "Reserved Instance purchase recommendations"
value = {
for instance_type, gap in local.ri_gaps :
instance_type => {
running_instances = local.running_instances[instance_type]
reserved_instances = lookup(local.ri_coverage, instance_type, 0)
recommended_purchase = gap
}
if gap > 0
}
}
Cost Monitoring and Alerting
Set up comprehensive cost monitoring:
# Budget for overall account spending
resource "aws_budgets_budget" "monthly_budget" {
name = "${var.name_prefix}-monthly-budget"
budget_type = "COST"
limit_amount = var.monthly_budget_limit
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "LinkedAccount"
values = [data.aws_caller_identity.current.account_id]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Service-specific budgets
resource "aws_budgets_budget" "service_budgets" {
for_each = var.service_budgets
name = "${var.name_prefix}-${each.key}-budget"
budget_type = "COST"
limit_amount = each.value.limit
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "Service"
values = [each.value.service_name]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = each.value.threshold
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Cost anomaly detection
resource "aws_ce_anomaly_detector" "service_anomaly" {
name = "${var.name_prefix}-service-anomaly-detector"
monitor_type = "DIMENSIONAL"
specification = jsonencode({
Dimension = "SERVICE"
MatchOptions = ["EQUALS"]
Values = ["Amazon Elastic Compute Cloud - Compute", "Amazon Relational Database Service"]
})
tags = local.common_tags
}
resource "aws_ce_anomaly_subscription" "service_anomaly" {
name = "${var.name_prefix}-anomaly-subscription"
frequency = "DAILY"
monitor_arn_list = [
aws_ce_anomaly_monitor.service_anomaly.arn
]
subscriber {
type = "EMAIL"
address = var.cost_anomaly_email
}
threshold_expression {
and {
dimension {
key = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
values = ["50"]
match_options = ["GREATER_THAN_OR_EQUAL"]
}
}
}
tags = local.common_tags
}
Spot Instance Integration
Use Spot instances for cost-effective compute:
# Launch template for Spot instances
resource "aws_launch_template" "spot_template" {
name_prefix = "${var.name_prefix}-spot-"
image_id = data.aws_ami.amazon_linux.id
instance_type = var.spot_instance_type
vpc_security_group_ids = [aws_security_group.web.id]
iam_instance_profile {
name = aws_iam_instance_profile.app_server.name
}
user_data = base64encode(templatefile("${path.module}/user_data.sh", {
environment = var.environment
}))
tag_specifications {
resource_type = "instance"
tags = merge(local.common_tags, {
Name = "${var.name_prefix}-spot-instance"
Type = "spot"
})
}
}
# Auto Scaling Group with mixed instances
resource "aws_autoscaling_group" "mixed_instances" {
name = "${var.name_prefix}-mixed-asg"
vpc_zone_identifier = var.private_subnet_ids
target_group_arns = [aws_lb_target_group.web.arn]
health_check_type = "ELB"
min_size = var.min_size
max_size = var.max_size
desired_capacity = var.desired_capacity
mixed_instances_policy {
launch_template {
launch_template_specification {
launch_template_id = aws_launch_template.spot_template.id
version = "$Latest"
}
override {
instance_type = "t3.medium"
weighted_capacity = "1"
}
override {
instance_type = "t3.large"
weighted_capacity = "2"
}
}
instances_distribution {
on_demand_base_capacity = 1
on_demand_percentage_above_base_capacity = 25
spot_allocation_strategy = "capacity-optimized"
spot_instance_pools = 2
spot_max_price = var.spot_max_price
}
}
tag {
key = "Name"
value = "${var.name_prefix}-mixed-asg"
propagate_at_launch = true
}
dynamic "tag" {
for_each = local.common_tags
content {
key = tag.key
value = tag.value
propagate_at_launch = true
}
}
}
Resource Cleanup Automation
Automate cleanup of unused resources:
# Lambda function for resource cleanup
resource "aws_lambda_function" "resource_cleanup" {
filename = "resource_cleanup.zip"
function_name = "${var.name_prefix}-resource-cleanup"
role = aws_iam_role.resource_cleanup.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 300
environment {
variables = {
ENVIRONMENT = var.environment
DRY_RUN = var.cleanup_dry_run
}
}
tags = local.common_tags
}
# Schedule cleanup to run weekly
resource "aws_cloudwatch_event_rule" "resource_cleanup" {
name = "${var.name_prefix}-resource-cleanup"
description = "Weekly resource cleanup"
schedule_expression = "cron(0 2 ? * SUN *)" # 2 AM every Sunday
tags = local.common_tags
}
resource "aws_cloudwatch_event_target" "resource_cleanup" {
rule = aws_cloudwatch_event_rule.resource_cleanup.name
target_id = "ResourceCleanupTarget"
arn = aws_lambda_function.resource_cleanup.arn
}
resource "aws_lambda_permission" "allow_cloudwatch_cleanup" {
statement_id = "AllowExecutionFromCloudWatch"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.resource_cleanup.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.resource_cleanup.arn
}
What’s Next
Cost optimization provides the financial discipline needed for sustainable AWS operations, but implementing reusable patterns and modules is what makes these optimizations scalable across your organization.
In the next part, we’ll explore AWS-specific module patterns that encapsulate these cost optimization strategies along with security and operational best practices, creating reusable building blocks for your infrastructure.
AWS-Specific Modules
Creating reusable modules for AWS infrastructure patterns accelerates development and ensures consistency across projects. However, AWS-specific modules need to handle the complexity of AWS services, regional differences, and the various configuration options that make AWS both powerful and complicated.
This part covers patterns for building robust, reusable AWS modules that encapsulate best practices while remaining flexible enough for different use cases.
VPC Module with Best Practices
A comprehensive VPC module that handles common networking patterns:
# modules/aws-vpc/variables.tf
variable "name" {
description = "Name prefix for all resources"
type = string
}
variable "cidr_block" {
description = "CIDR block for the VPC"
type = string
default = "10.0.0.0/16"
validation {
condition = can(cidrhost(var.cidr_block, 0))
error_message = "Must be a valid CIDR block."
}
}
variable "availability_zones" {
description = "List of availability zones"
type = list(string)
default = []
}
variable "enable_nat_gateway" {
description = "Enable NAT Gateway for private subnets"
type = bool
default = true
}
variable "single_nat_gateway" {
description = "Use a single NAT Gateway for all private subnets"
type = bool
default = false
}
variable "enable_vpn_gateway" {
description = "Enable VPN Gateway"
type = bool
default = false
}
variable "enable_dns_hostnames" {
description = "Enable DNS hostnames in the VPC"
type = bool
default = true
}
variable "enable_dns_support" {
description = "Enable DNS support in the VPC"
type = bool
default = true
}
variable "tags" {
description = "Additional tags for all resources"
type = map(string)
default = {}
}
# modules/aws-vpc/main.tf
data "aws_availability_zones" "available" {
state = "available"
}
locals {
# Use provided AZs or default to first 3 available
azs = length(var.availability_zones) > 0 ? var.availability_zones : slice(data.aws_availability_zones.available.names, 0, 3)
# Calculate subnet CIDRs
public_subnet_cidrs = [
for i, az in local.azs :
cidrsubnet(var.cidr_block, 8, i + 1)
]
private_subnet_cidrs = [
for i, az in local.azs :
cidrsubnet(var.cidr_block, 8, i + 11)
]
database_subnet_cidrs = [
for i, az in local.azs :
cidrsubnet(var.cidr_block, 8, i + 21)
]
common_tags = merge(var.tags, {
ManagedBy = "terraform"
})
}
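# For the default 10.0.0.0/16 CIDR and three AZs, the cidrsubnet() calls above
# carve /24 networks out of the /16 (easy to verify in `terraform console`):
#   public   -> 10.0.1.0/24,  10.0.2.0/24,  10.0.3.0/24
#   private  -> 10.0.11.0/24, 10.0.12.0/24, 10.0.13.0/24
#   database -> 10.0.21.0/24, 10.0.22.0/24, 10.0.23.0/24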
# VPC
resource "aws_vpc" "main" {
cidr_block = var.cidr_block
enable_dns_hostnames = var.enable_dns_hostnames
enable_dns_support = var.enable_dns_support
tags = merge(local.common_tags, {
Name = "${var.name}-vpc"
})
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = merge(local.common_tags, {
Name = "${var.name}-igw"
})
}
# Public Subnets
resource "aws_subnet" "public" {
count = length(local.azs)
vpc_id = aws_vpc.main.id
cidr_block = local.public_subnet_cidrs[count.index]
availability_zone = local.azs[count.index]
map_public_ip_on_launch = true
tags = merge(local.common_tags, {
Name = "${var.name}-public-${count.index + 1}"
Type = "public"
Tier = "public"
})
}
# Private Subnets
resource "aws_subnet" "private" {
count = length(local.azs)
vpc_id = aws_vpc.main.id
cidr_block = local.private_subnet_cidrs[count.index]
availability_zone = local.azs[count.index]
tags = merge(local.common_tags, {
Name = "${var.name}-private-${count.index + 1}"
Type = "private"
Tier = "application"
})
}
# Database Subnets
resource "aws_subnet" "database" {
count = length(local.azs)
vpc_id = aws_vpc.main.id
cidr_block = local.database_subnet_cidrs[count.index]
availability_zone = local.azs[count.index]
tags = merge(local.common_tags, {
Name = "${var.name}-database-${count.index + 1}"
Type = "private"
Tier = "database"
})
}
# NAT Gateways
resource "aws_eip" "nat" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(local.azs)) : 0
domain = "vpc"
depends_on = [aws_internet_gateway.main]
tags = merge(local.common_tags, {
Name = "${var.name}-nat-eip-${count.index + 1}"
})
}
resource "aws_nat_gateway" "main" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(local.azs)) : 0
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = merge(local.common_tags, {
Name = "${var.name}-nat-${count.index + 1}"
})
depends_on = [aws_internet_gateway.main]
}
# Route Tables
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = merge(local.common_tags, {
Name = "${var.name}-public-rt"
Type = "public"
})
}
resource "aws_route_table" "private" {
count = var.enable_nat_gateway ? length(local.azs) : 1
vpc_id = aws_vpc.main.id
dynamic "route" {
for_each = var.enable_nat_gateway ? [1] : []
content {
cidr_block = "0.0.0.0/0"
nat_gateway_id = var.single_nat_gateway ? aws_nat_gateway.main[0].id : aws_nat_gateway.main[count.index].id
}
}
tags = merge(local.common_tags, {
Name = "${var.name}-private-rt-${count.index + 1}"
Type = "private"
})
}
# Route Table Associations
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = length(aws_subnet.private)
subnet_id = aws_subnet.private[count.index].id
route_table_id = var.enable_nat_gateway ? aws_route_table.private[count.index].id : aws_route_table.private[0].id
}
# VPN Gateway (optional)
resource "aws_vpn_gateway" "main" {
count = var.enable_vpn_gateway ? 1 : 0
vpc_id = aws_vpc.main.id
tags = merge(local.common_tags, {
Name = "${var.name}-vpn-gateway"
})
}
# modules/aws-vpc/outputs.tf
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.main.id
}
output "vpc_cidr_block" {
description = "CIDR block of the VPC"
value = aws_vpc.main.cidr_block
}
output "public_subnet_ids" {
description = "IDs of the public subnets"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "IDs of the private subnets"
value = aws_subnet.private[*].id
}
output "database_subnet_ids" {
description = "IDs of the database subnets"
value = aws_subnet.database[*].id
}
output "internet_gateway_id" {
description = "ID of the Internet Gateway"
value = aws_internet_gateway.main.id
}
output "nat_gateway_ids" {
description = "IDs of the NAT Gateways"
value = aws_nat_gateway.main[*].id
}
output "availability_zones" {
description = "List of availability zones used"
value = local.azs
}
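To see how the module fits into a root configuration, here is a minimal, illustrative call; the module path, tag values, and the single_nat_gateway choice are assumptions rather than requirements:
# Example usage (illustrative)
module "vpc" {
  source = "./modules/aws-vpc"

  name               = "my-app"
  cidr_block         = "10.0.0.0/16"
  enable_nat_gateway = true
  single_nat_gateway = true # one shared NAT Gateway keeps non-production costs down

  tags = {
    Environment = "staging"
    Project     = "my-app"
  }
}

# Downstream configuration consumes the outputs, for example:
#   subnet_ids = module.vpc.private_subnet_ids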
Application Load Balancer Module
A comprehensive ALB module with security best practices:
# modules/aws-alb/variables.tf
variable "name" {
description = "Name for the load balancer"
type = string
}
variable "vpc_id" {
description = "VPC ID where the load balancer will be created"
type = string
}
variable "subnet_ids" {
description = "List of subnet IDs for the load balancer"
type = list(string)
}
variable "certificate_arn" {
description = "ARN of the SSL certificate"
type = string
default = null
}
variable "enable_deletion_protection" {
description = "Enable deletion protection"
type = bool
default = true
}
variable "idle_timeout" {
description = "Connection idle timeout in seconds"
type = number
default = 60
}
variable "enable_http2" {
description = "Enable HTTP/2"
type = bool
default = true
}
variable "ip_address_type" {
description = "IP address type (ipv4 or dualstack)"
type = string
default = "ipv4"
}
variable "target_groups" {
description = "Map of target group configurations"
type = map(object({
port = number
protocol = string
target_type = string
health_check_path = string
health_check_matcher = string
health_check_timeout = number
health_check_interval = number
healthy_threshold = number
unhealthy_threshold = number
}))
default = {}
}
variable "listeners" {
description = "Map of listener configurations"
type = map(object({
port = number
protocol = string
certificate_arn = string
ssl_policy = string
default_action = object({
type = string
target_group_name = string
redirect_config = object({
status_code = string
protocol = string
port = string
})
})
}))
default = {}
}
variable "tags" {
description = "Additional tags"
type = map(string)
default = {}
}
# modules/aws-alb/main.tf
locals {
common_tags = merge(var.tags, {
ManagedBy = "terraform"
})
}
# Security Group for ALB
resource "aws_security_group" "alb" {
name_prefix = "${var.name}-alb-"
vpc_id = var.vpc_id
description = "Security group for ${var.name} ALB"
ingress {
description = "HTTP"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = merge(local.common_tags, {
Name = "${var.name}-alb-sg"
})
lifecycle {
create_before_destroy = true
}
}
# Application Load Balancer
resource "aws_lb" "main" {
name = var.name
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = var.subnet_ids
enable_deletion_protection = var.enable_deletion_protection
idle_timeout = var.idle_timeout
enable_http2 = var.enable_http2
ip_address_type = var.ip_address_type
access_logs {
bucket = aws_s3_bucket.alb_logs.bucket
prefix = "alb-logs"
enabled = true
}
tags = merge(local.common_tags, {
Name = var.name
})
}
# S3 bucket for ALB access logs
resource "aws_s3_bucket" "alb_logs" {
bucket = "${var.name}-alb-logs-${random_id.bucket_suffix.hex}"
force_destroy = true
tags = local.common_tags
}
resource "random_id" "bucket_suffix" {
byte_length = 4
}
resource "aws_s3_bucket_lifecycle_configuration" "alb_logs" {
bucket = aws_s3_bucket.alb_logs.id
rule {
id = "delete_old_logs"
status = "Enabled"
expiration {
days = 90
}
}
}
resource "aws_s3_bucket_policy" "alb_logs" {
bucket = aws_s3_bucket.alb_logs.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
AWS = data.aws_elb_service_account.main.arn
}
Action = "s3:PutObject"
Resource = "${aws_s3_bucket.alb_logs.arn}/*"
},
{
Effect = "Allow"
Principal = {
Service = "delivery.logs.amazonaws.com"
}
Action = "s3:PutObject"
Resource = "${aws_s3_bucket.alb_logs.arn}/*"
Condition = {
StringEquals = {
"s3:x-amz-acl" = "bucket-owner-full-control"
}
}
}
]
})
}
data "aws_elb_service_account" "main" {}
# Target Groups
resource "aws_lb_target_group" "main" {
for_each = var.target_groups
name = "${var.name}-${each.key}"
port = each.value.port
protocol = each.value.protocol
vpc_id = var.vpc_id
target_type = each.value.target_type
health_check {
enabled = true
healthy_threshold = each.value.healthy_threshold
unhealthy_threshold = each.value.unhealthy_threshold
timeout = each.value.health_check_timeout
interval = each.value.health_check_interval
path = each.value.health_check_path
matcher = each.value.health_check_matcher
port = "traffic-port"
protocol = each.value.protocol
}
tags = merge(local.common_tags, {
Name = "${var.name}-${each.key}-tg"
})
}
# Listeners
resource "aws_lb_listener" "main" {
for_each = var.listeners
load_balancer_arn = aws_lb.main.arn
port = each.value.port
protocol = each.value.protocol
certificate_arn = each.value.certificate_arn
ssl_policy = each.value.ssl_policy
default_action {
type = each.value.default_action.type
dynamic "target_group_arn" {
for_each = each.value.default_action.type == "forward" ? [1] : []
content {
target_group_arn = aws_lb_target_group.main[each.value.default_action.target_group_name].arn
}
}
dynamic "redirect" {
for_each = each.value.default_action.type == "redirect" ? [1] : []
content {
port = each.value.default_action.redirect_config.port
protocol = each.value.default_action.redirect_config.protocol
status_code = each.value.default_action.redirect_config.status_code
}
}
}
tags = local.common_tags
}
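Because target_groups and listeners are strictly typed objects, every attribute must be supplied by the caller. The sketch below is illustrative: it assumes the VPC module call from the previous example and a certificate_arn variable in the calling configuration:
# Example usage (illustrative)
module "alb" {
  source = "./modules/aws-alb"

  name       = "my-app"
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.public_subnet_ids

  target_groups = {
    web = {
      port                  = 8080
      protocol              = "HTTP"
      target_type           = "ip"
      health_check_path     = "/health"
      health_check_matcher  = "200"
      health_check_timeout  = 5
      health_check_interval = 30
      healthy_threshold     = 2
      unhealthy_threshold   = 3
    }
  }

  listeners = {
    https = {
      port            = 443
      protocol        = "HTTPS"
      certificate_arn = var.certificate_arn
      ssl_policy      = "ELBSecurityPolicy-TLS13-1-2-2021-06"
      default_action = {
        type              = "forward"
        target_group_name = "web"
        redirect_config   = null
      }
    }
  }

  tags = {
    Environment = "staging"
  }
}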
RDS Module with High Availability
A production-ready RDS module with backup and monitoring:
# modules/aws-rds/main.tf
resource "aws_db_subnet_group" "main" {
name = "${var.name}-db-subnet-group"
subnet_ids = var.subnet_ids
tags = merge(var.tags, {
Name = "${var.name}-db-subnet-group"
})
}
resource "aws_security_group" "rds" {
name_prefix = "${var.name}-rds-"
vpc_id = var.vpc_id
description = "Security group for ${var.name} RDS instance"
ingress {
description = "Database access"
from_port = var.port
to_port = var.port
protocol = "tcp"
security_groups = var.allowed_security_groups
}
tags = merge(var.tags, {
Name = "${var.name}-rds-sg"
})
lifecycle {
create_before_destroy = true
}
}
resource "aws_db_instance" "main" {
identifier = var.name
# Engine configuration
engine = var.engine
engine_version = var.engine_version
instance_class = var.instance_class
# Storage configuration
allocated_storage = var.allocated_storage
max_allocated_storage = var.max_allocated_storage
storage_type = var.storage_type
storage_encrypted = var.storage_encrypted
kms_key_id = var.kms_key_id
# Database configuration
db_name = var.database_name
username = var.username
password = var.password
port = var.port
# Network configuration
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.rds.id]
publicly_accessible = var.publicly_accessible
# Backup configuration
backup_retention_period = var.backup_retention_period
backup_window = var.backup_window
maintenance_window = var.maintenance_window
# High availability
multi_az = var.multi_az
# Monitoring
monitoring_interval = var.monitoring_interval
monitoring_role_arn = var.monitoring_interval > 0 ? aws_iam_role.rds_monitoring[0].arn : null
# Performance Insights
performance_insights_enabled = var.performance_insights_enabled
performance_insights_kms_key_id = var.performance_insights_kms_key_id
performance_insights_retention_period = var.performance_insights_retention_period
# Deletion protection
deletion_protection = var.deletion_protection
skip_final_snapshot = var.skip_final_snapshot
# Avoid timestamp() here: it changes on every run and causes perpetual diffs
final_snapshot_identifier = var.skip_final_snapshot ? null : "${var.name}-final-snapshot"
tags = merge(var.tags, {
Name = var.name
})
}
# IAM role for enhanced monitoring
resource "aws_iam_role" "rds_monitoring" {
count = var.monitoring_interval > 0 ? 1 : 0
name = "${var.name}-rds-monitoring-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "monitoring.rds.amazonaws.com"
}
}
]
})
tags = var.tags
}
resource "aws_iam_role_policy_attachment" "rds_monitoring" {
count = var.monitoring_interval > 0 ? 1 : 0
role = aws_iam_role.rds_monitoring[0].name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}
# CloudWatch alarms
resource "aws_cloudwatch_metric_alarm" "database_cpu" {
alarm_name = "${var.name}-database-cpu-utilization"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/RDS"
period = "300"
statistic = "Average"
threshold = "80"
alarm_description = "This metric monitors RDS CPU utilization"
dimensions = {
DBInstanceIdentifier = aws_db_instance.main.id
}
alarm_actions = var.alarm_actions
tags = var.tags
}
resource "aws_cloudwatch_metric_alarm" "database_connections" {
alarm_name = "${var.name}-database-connection-count"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "DatabaseConnections"
namespace = "AWS/RDS"
period = "300"
statistic = "Average"
threshold = var.max_connections_threshold
alarm_description = "This metric monitors RDS connection count"
dimensions = {
DBInstanceIdentifier = aws_db_instance.main.id
}
alarm_actions = var.alarm_actions
tags = var.tags
}
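The module expects the master password as an input, so secret handling happens at the call site. A minimal sketch using the hashicorp/random provider and Secrets Manager; the secret name and resource labels are assumptions:
# Generate the master password and keep it out of plain variables (illustrative)
resource "random_password" "db_master" {
  length  = 24
  special = false
}

resource "aws_secretsmanager_secret" "db_master" {
  name = "my-app/rds/master-password"
}

resource "aws_secretsmanager_secret_version" "db_master" {
  secret_id     = aws_secretsmanager_secret.db_master.id
  secret_string = random_password.db_master.result
}

# The generated value then feeds the module's password input:
#   password = random_password.db_master.result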
ECS Fargate Module
A complete ECS Fargate module for containerized applications:
# modules/aws-ecs-fargate/main.tf
resource "aws_ecs_cluster" "main" {
name = var.cluster_name
configuration {
execute_command_configuration {
kms_key_id = var.kms_key_id
logging = "OVERRIDE"
log_configuration {
cloud_watch_encryption_enabled = true
cloud_watch_log_group_name = aws_cloudwatch_log_group.ecs_exec.name
}
}
}
setting {
name = "containerInsights"
value = var.enable_container_insights ? "enabled" : "disabled"
}
tags = var.tags
}
resource "aws_ecs_cluster_capacity_providers" "main" {
cluster_name = aws_ecs_cluster.main.name
capacity_providers = ["FARGATE", "FARGATE_SPOT"]
default_capacity_provider_strategy {
base = 1
weight = 100
capacity_provider = "FARGATE"
}
}
# Task Definition
resource "aws_ecs_task_definition" "main" {
family = var.service_name
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = var.cpu
memory = var.memory
execution_role_arn = aws_iam_role.ecs_execution.arn
task_role_arn = aws_iam_role.ecs_task.arn
container_definitions = jsonencode([
{
name = var.container_name
image = var.container_image
portMappings = [
{
containerPort = var.container_port
protocol = "tcp"
}
]
environment = [
for key, value in var.environment_variables : {
name = key
value = value
}
]
secrets = [
for key, value in var.secrets : {
name = key
valueFrom = value
}
]
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-group = aws_cloudwatch_log_group.app.name
awslogs-region = data.aws_region.current.name
awslogs-stream-prefix = "ecs"
}
}
healthCheck = var.health_check_command != null ? {
command = var.health_check_command
interval = 30
timeout = 5
retries = 3
startPeriod = 60
} : null
essential = true
}
])
tags = var.tags
}
# ECS Service
resource "aws_ecs_service" "main" {
name = var.service_name
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.main.arn
desired_count = var.desired_count
capacity_provider_strategy {
capacity_provider = "FARGATE"
weight = var.fargate_weight
base = var.fargate_base
}
capacity_provider_strategy {
capacity_provider = "FARGATE_SPOT"
weight = var.fargate_spot_weight
}
network_configuration {
security_groups = concat([aws_security_group.ecs_service.id], var.additional_security_groups)
subnets = var.subnet_ids
assign_public_ip = var.assign_public_ip
}
dynamic "load_balancer" {
for_each = var.target_group_arn != null ? [1] : []
content {
target_group_arn = var.target_group_arn
container_name = var.container_name
container_port = var.container_port
}
}
# Deployment limits are top-level arguments on aws_ecs_service
deployment_maximum_percent = 200
deployment_minimum_healthy_percent = 100
enable_execute_command = var.enable_execute_command
tags = var.tags
depends_on = [
aws_iam_role_policy_attachment.ecs_execution,
aws_cloudwatch_log_group.app
]
}
# Auto Scaling
resource "aws_appautoscaling_target" "ecs_target" {
count = var.enable_autoscaling ? 1 : 0
max_capacity = var.max_capacity
min_capacity = var.min_capacity
resource_id = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.main.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
resource "aws_appautoscaling_policy" "ecs_policy_cpu" {
count = var.enable_autoscaling ? 1 : 0
name = "${var.service_name}-cpu-scaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.ecs_target[0].resource_id
scalable_dimension = aws_appautoscaling_target.ecs_target[0].scalable_dimension
service_namespace = aws_appautoscaling_target.ecs_target[0].service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
target_value = var.cpu_target_value
}
}
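CPU is rarely the only scaling signal for containerized workloads; memory pressure is just as common. A companion policy follows the same pattern; this sketch assumes a memory_target_value variable alongside the existing inputs:
resource "aws_appautoscaling_policy" "ecs_policy_memory" {
  count              = var.enable_autoscaling ? 1 : 0
  name               = "${var.service_name}-memory-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target[0].resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target[0].scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target[0].service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    # Assumed variable: target average memory utilization percentage
    target_value = var.memory_target_value
  }
}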
What’s Next
AWS-specific modules provide the building blocks for consistent, well-architected infrastructure, but monitoring and maintaining that infrastructure requires comprehensive observability and compliance automation.
In the next part, we’ll explore monitoring and compliance patterns that provide visibility into your AWS infrastructure, automate compliance checks, and integrate with AWS native monitoring services.
Monitoring and Compliance
Effective monitoring and compliance automation are essential for maintaining reliable, secure AWS infrastructure at scale. Terraform enables you to implement comprehensive observability and compliance controls as code, ensuring consistent monitoring across all your resources and automated compliance validation.
This part covers patterns for implementing monitoring, logging, alerting, and compliance automation using AWS native services and Terraform.
CloudWatch Monitoring Foundation
Establish comprehensive CloudWatch monitoring for all critical resources:
# CloudWatch Log Groups with proper retention
resource "aws_cloudwatch_log_group" "application_logs" {
for_each = var.log_groups
name = "/aws/${each.key}/${var.application_name}"
retention_in_days = each.value.retention_days
kms_key_id = var.log_encryption_key_id
tags = merge(var.common_tags, {
LogType = each.value.log_type
Application = var.application_name
})
}
# Custom CloudWatch Metrics
resource "aws_cloudwatch_metric_alarm" "application_errors" {
for_each = var.error_alarms
alarm_name = "${var.application_name}-${each.key}-errors"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = each.value.evaluation_periods
metric_name = each.value.metric_name
namespace = each.value.namespace
period = each.value.period
statistic = each.value.statistic
threshold = each.value.threshold
alarm_description = "High error rate for ${each.key}"
treat_missing_data = "notBreaching"
dimensions = each.value.dimensions
alarm_actions = [
aws_sns_topic.alerts.arn
]
ok_actions = [
aws_sns_topic.alerts.arn
]
tags = var.common_tags
}
# Composite Alarms for complex conditions
resource "aws_cloudwatch_composite_alarm" "application_health" {
alarm_name = "${var.application_name}-overall-health"
alarm_description = "Overall application health based on multiple metrics"
alarm_rule = join(" OR ", [
for alarm in aws_cloudwatch_metric_alarm.application_errors :
"ALARM(${alarm.alarm_name})"
])
actions_enabled = true
alarm_actions = [
aws_sns_topic.critical_alerts.arn
]
ok_actions = [
aws_sns_topic.critical_alerts.arn
]
tags = var.common_tags
}
# CloudWatch Dashboard
resource "aws_cloudwatch_dashboard" "application" {
dashboard_name = "${var.application_name}-dashboard"
dashboard_body = jsonencode({
widgets = [
{
type = "metric"
x = 0
y = 0
width = 12
height = 6
properties = {
metrics = [
["AWS/ApplicationELB", "RequestCount", "LoadBalancer", var.load_balancer_arn_suffix],
[".", "TargetResponseTime", ".", "."],
[".", "HTTPCode_Target_2XX_Count", ".", "."],
[".", "HTTPCode_Target_4XX_Count", ".", "."],
[".", "HTTPCode_Target_5XX_Count", ".", "."]
]
view = "timeSeries"
stacked = false
region = data.aws_region.current.name
title = "Load Balancer Metrics"
period = 300
}
},
{
type = "log"
x = 0
y = 6
width = 24
height = 6
properties = {
query = "SOURCE '${aws_cloudwatch_log_group.application_logs["app"].name}' | fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 100"
region = data.aws_region.current.name
title = "Recent Errors"
}
}
]
})
}
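The error alarms above assume the metrics they watch already exist. When errors only appear in application logs, a metric filter can derive a count metric from the log group first; a sketch, assuming the string ERROR appears literally in matching log lines:
resource "aws_cloudwatch_log_metric_filter" "application_errors" {
  name           = "${var.application_name}-error-count"
  log_group_name = aws_cloudwatch_log_group.application_logs["app"].name
  pattern        = "ERROR"

  metric_transformation {
    name      = "ErrorCount"
    namespace = "${var.application_name}/Application"
    value     = "1"
  }
}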
AWS Config for Compliance
Implement AWS Config for continuous compliance monitoring:
# Config Configuration Recorder
resource "aws_config_configuration_recorder" "main" {
name = "${var.organization_name}-config-recorder"
role_arn = aws_iam_role.config.arn
recording_group {
all_supported = true
include_global_resource_types = true
exclusion_by_resource_types {
resource_types = var.config_excluded_resource_types
}
}
depends_on = [aws_config_delivery_channel.main]
}
# Config Delivery Channel
resource "aws_config_delivery_channel" "main" {
name = "${var.organization_name}-config-delivery"
s3_bucket_name = aws_s3_bucket.config_logs.bucket
s3_key_prefix = "config"
snapshot_delivery_properties {
delivery_frequency = "TwentyFour_Hours"
}
}
# Config Rules for Compliance
resource "aws_config_config_rule" "compliance_rules" {
for_each = var.config_rules
name = "${var.organization_name}-${each.key}"
source {
owner = each.value.source_owner
source_identifier = each.value.source_identifier
}
dynamic "source_detail" {
for_each = each.value.source_details
content {
event_source = source_detail.value.event_source
message_type = source_detail.value.message_type
maximum_execution_frequency = source_detail.value.maximum_execution_frequency
}
}
input_parameters = jsonencode(each.value.input_parameters)
depends_on = [aws_config_configuration_recorder.main]
tags = var.common_tags
}
# Config Remediation Configurations
resource "aws_config_remediation_configuration" "auto_remediation" {
for_each = var.auto_remediation_rules
config_rule_name = aws_config_config_rule.compliance_rules[each.key].name
resource_type = each.value.resource_type
target_type = "SSM_DOCUMENT"
target_id = each.value.ssm_document_name
target_version = "1"
parameter {
name = "AutomationAssumeRole"
static_value = aws_iam_role.config_remediation.arn
}
dynamic "parameter" {
for_each = each.value.parameters
content {
name = parameter.key
static_value = parameter.value
}
}
automatic = each.value.automatic
maximum_automatic_attempts = each.value.maximum_automatic_attempts
}
# Config Conformance Packs
resource "aws_config_conformance_pack" "security_pack" {
name = "${var.organization_name}-security-conformance-pack"
template_body = file("${path.module}/conformance-packs/security-pack.yaml")
input_parameter {
parameter_name = "AccessLoggingBucketParameter"
parameter_value = aws_s3_bucket.access_logs.bucket
}
depends_on = [aws_config_configuration_recorder.main]
}
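The recorder above references aws_iam_role.config, which is not shown. A minimal sketch of that role using the AWS managed service-role policy; depending on your recording scope and bucket policy, the delivery bucket may need additional permissions:
resource "aws_iam_role" "config" {
  name = "${var.organization_name}-config-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = "sts:AssumeRole"
        Principal = {
          Service = "config.amazonaws.com"
        }
      }
    ]
  })

  tags = var.common_tags
}

resource "aws_iam_role_policy_attachment" "config" {
  role       = aws_iam_role.config.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWS_ConfigRole"
}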
Security Hub Integration
Centralize security findings with AWS Security Hub:
# Enable Security Hub
resource "aws_securityhub_account" "main" {
enable_default_standards = true
}
# Security Standards Subscriptions
resource "aws_securityhub_standards_subscription" "aws_foundational" {
standards_arn = "arn:aws:securityhub:::ruleset/finding-format/aws-foundational-security-standard/v/1.0.0"
depends_on = [aws_securityhub_account.main]
}
resource "aws_securityhub_standards_subscription" "cis" {
standards_arn = "arn:aws:securityhub:::ruleset/finding-format/cis-aws-foundations-benchmark/v/1.2.0"
depends_on = [aws_securityhub_account.main]
}
resource "aws_securityhub_standards_subscription" "pci_dss" {
count = var.enable_pci_dss ? 1 : 0
standards_arn = "arn:aws:securityhub:::ruleset/finding-format/pci-dss/v/3.2.1"
depends_on = [aws_securityhub_account.main]
}
# Custom Security Hub Insights
resource "aws_securityhub_insight" "high_severity_findings" {
filters {
severity_label {
comparison = "EQUALS"
value = "HIGH"
}
record_state {
comparison = "EQUALS"
value = "ACTIVE"
}
}
group_by_attribute = "ResourceId"
name = "High Severity Active Findings"
depends_on = [aws_securityhub_account.main]
}
# EventBridge Rule for Security Hub Findings
resource "aws_cloudwatch_event_rule" "security_hub_findings" {
name = "${var.organization_name}-security-hub-findings"
description = "Capture Security Hub findings"
event_pattern = jsonencode({
source = ["aws.securityhub"]
detail-type = ["Security Hub Findings - Imported"]
detail = {
findings = {
Severity = {
Label = ["HIGH", "CRITICAL"]
}
RecordState = ["ACTIVE"]
}
}
})
}
resource "aws_cloudwatch_event_target" "security_hub_sns" {
rule = aws_cloudwatch_event_rule.security_hub_findings.name
target_id = "SecurityHubSNSTarget"
arn = aws_sns_topic.security_alerts.arn
}
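Delivery to SNS only works if the topic policy allows EventBridge to publish; a minimal sketch of that policy statement (assumed to be the only policy the topic needs):
resource "aws_sns_topic_policy" "security_alerts_eventbridge" {
  arn = aws_sns_topic.security_alerts.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowEventBridgePublish"
        Effect = "Allow"
        Principal = {
          Service = "events.amazonaws.com"
        }
        Action   = "sns:Publish"
        Resource = aws_sns_topic.security_alerts.arn
      }
    ]
  })
}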
GuardDuty Threat Detection
Implement GuardDuty for threat detection and response:
# Enable GuardDuty
resource "aws_guardduty_detector" "main" {
enable = true
datasources {
s3_logs {
enable = true
}
kubernetes {
audit_logs {
enable = var.enable_eks_audit_logs
}
}
malware_protection {
scan_ec2_instance_with_findings {
ebs_volumes {
enable = true
}
}
}
}
finding_publishing_frequency = "FIFTEEN_MINUTES"
tags = var.common_tags
}
# GuardDuty Threat Intel Set
resource "aws_guardduty_threatintelset" "custom_threats" {
count = length(var.threat_intel_sets) > 0 ? 1 : 0
activate = true
detector_id = aws_guardduty_detector.main.id
format = "TXT"
location = "s3://${aws_s3_bucket.threat_intel[0].bucket}/threat-intel.txt"
name = "${var.organization_name}-custom-threat-intel"
tags = var.common_tags
}
# GuardDuty IP Set for trusted IPs
resource "aws_guardduty_ipset" "trusted_ips" {
count = length(var.trusted_ip_ranges) > 0 ? 1 : 0
activate = true
detector_id = aws_guardduty_detector.main.id
format = "TXT"
location = "s3://${aws_s3_bucket.threat_intel[0].bucket}/trusted-ips.txt"
name = "${var.organization_name}-trusted-ips"
tags = var.common_tags
}
# EventBridge Rule for GuardDuty Findings
resource "aws_cloudwatch_event_rule" "guardduty_findings" {
name = "${var.organization_name}-guardduty-findings"
description = "Capture GuardDuty findings"
event_pattern = jsonencode({
source = ["aws.guardduty"]
detail-type = ["GuardDuty Finding"]
detail = {
# Match every high and critical finding (severity 7.0 and above)
severity = [{ numeric = [">=", 7] }]
}
})
}
resource "aws_cloudwatch_event_target" "guardduty_lambda" {
rule = aws_cloudwatch_event_rule.guardduty_findings.name
target_id = "GuardDutyResponseLambda"
arn = aws_lambda_function.security_response.arn
}
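As with the cleanup function earlier, EventBridge needs explicit permission to invoke the target Lambda, or findings will never reach it. A sketch, assuming aws_lambda_function.security_response is defined elsewhere in the configuration:
resource "aws_lambda_permission" "allow_eventbridge_guardduty" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.security_response.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.guardduty_findings.arn
}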
Automated Compliance Reporting
Generate automated compliance reports:
# Lambda function for compliance reporting
resource "aws_lambda_function" "compliance_reporter" {
filename = "compliance_reporter.zip"
function_name = "${var.organization_name}-compliance-reporter"
role = aws_iam_role.compliance_reporter.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 300
environment {
variables = {
CONFIG_BUCKET = aws_s3_bucket.compliance_reports.bucket
SECURITY_HUB_REGION = data.aws_region.current.name
SNS_TOPIC_ARN = aws_sns_topic.compliance_reports.arn
}
}
tags = var.common_tags
}
# Schedule compliance reporting
resource "aws_cloudwatch_event_rule" "compliance_report" {
name = "${var.organization_name}-compliance-report"
description = "Generate weekly compliance report"
schedule_expression = "cron(0 8 ? * MON *)" # Every Monday at 8 AM
tags = var.common_tags
}
resource "aws_cloudwatch_event_target" "compliance_report" {
rule = aws_cloudwatch_event_rule.compliance_report.name
target_id = "ComplianceReportTarget"
arn = aws_lambda_function.compliance_reporter.arn
}
# S3 bucket for compliance reports
resource "aws_s3_bucket" "compliance_reports" {
bucket = "${var.organization_name}-compliance-reports-${random_id.bucket_suffix.hex}"
tags = var.common_tags
}
resource "aws_s3_bucket_lifecycle_configuration" "compliance_reports" {
bucket = aws_s3_bucket.compliance_reports.id
rule {
id = "compliance_report_lifecycle"
status = "Enabled"
transition {
days = 90
storage_class = "STANDARD_IA"
}
transition {
days = 365
storage_class = "GLACIER"
}
expiration {
days = 2555 # 7 years retention
}
}
}
Cost and Usage Monitoring
Monitor costs and usage patterns:
# Cost Budget with multiple notifications
resource "aws_budgets_budget" "monthly_cost" {
name = "${var.organization_name}-monthly-cost-budget"
budget_type = "COST"
limit_amount = var.monthly_budget_limit
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "LinkedAccount"
values = [data.aws_caller_identity.current.account_id]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 50
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Usage Budget for specific services
resource "aws_budgets_budget" "ec2_usage" {
name = "${var.organization_name}-ec2-usage-budget"
budget_type = "USAGE"
limit_amount = var.ec2_usage_limit
limit_unit = "Hrs"
time_unit = "MONTHLY"
cost_filter {
name = "Service"
values = ["Amazon Elastic Compute Cloud - Compute"]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Cost Anomaly Detection
resource "aws_ce_anomaly_detector" "cost_anomaly" {
name = "${var.organization_name}-cost-anomaly-detector"
monitor_type = "DIMENSIONAL"
specification = jsonencode({
Dimension = "SERVICE"
MatchOptions = ["EQUALS"]
Values = ["Amazon Elastic Compute Cloud - Compute", "Amazon Relational Database Service"]
})
tags = var.common_tags
}
resource "aws_ce_anomaly_subscription" "cost_anomaly" {
name = "${var.organization_name}-cost-anomaly-subscription"
frequency = "DAILY"
monitor_arn_list = [
aws_ce_anomaly_detector.cost_anomaly.arn
]
subscriber {
type = "EMAIL"
address = var.cost_anomaly_email
}
threshold_expression {
and {
dimension {
key = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
values = ["100"]
match_options = ["GREATER_THAN_OR_EQUAL"]
}
}
}
tags = var.common_tags
}
Notification and Alerting
Implement comprehensive notification systems:
# SNS Topics for different alert types
resource "aws_sns_topic" "alerts" {
name = "${var.organization_name}-alerts"
tags = var.common_tags
}
resource "aws_sns_topic" "critical_alerts" {
name = "${var.organization_name}-critical-alerts"
tags = var.common_tags
}
resource "aws_sns_topic" "security_alerts" {
name = "${var.organization_name}-security-alerts"
tags = var.common_tags
}
# SNS Topic Subscriptions
resource "aws_sns_topic_subscription" "email_alerts" {
for_each = toset(var.alert_email_addresses)
topic_arn = aws_sns_topic.alerts.arn
protocol = "email"
endpoint = each.value
}
resource "aws_sns_topic_subscription" "slack_alerts" {
count = var.slack_webhook_url != null ? 1 : 0
topic_arn = aws_sns_topic.critical_alerts.arn
protocol = "https"
endpoint = var.slack_webhook_url
}
# Lambda function for alert processing
resource "aws_lambda_function" "alert_processor" {
filename = "alert_processor.zip"
function_name = "${var.organization_name}-alert-processor"
role = aws_iam_role.alert_processor.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 60
environment {
variables = {
SLACK_WEBHOOK_URL = var.slack_webhook_url
TEAMS_WEBHOOK_URL = var.teams_webhook_url
}
}
tags = var.common_tags
}
resource "aws_sns_topic_subscription" "lambda_processor" {
topic_arn = aws_sns_topic.alerts.arn
protocol = "lambda"
endpoint = aws_lambda_function.alert_processor.arn
}
resource "aws_lambda_permission" "allow_sns" {
statement_id = "AllowExecutionFromSNS"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.alert_processor.function_name
principal = "sns.amazonaws.com"
source_arn = aws_sns_topic.alerts.arn
}
What’s Next
Comprehensive monitoring and compliance automation provide the observability and governance needed for production AWS infrastructure. These patterns ensure you can detect issues early, maintain compliance standards, and respond quickly to security threats.
In the final part, we’ll explore advanced AWS service integrations, including EKS, serverless architectures, and complex multi-service patterns that demonstrate how all these concepts work together in real-world applications.
Advanced Integration
Modern AWS architectures combine multiple services in complex patterns—EKS clusters with RDS databases, Lambda functions triggered by S3 events, API Gateway integrations with multiple backends. Terraform excels at orchestrating these complex integrations, but you need to understand service dependencies, data flow patterns, and the operational considerations that make these architectures work reliably.
This final part demonstrates advanced integration patterns that bring together everything you’ve learned about AWS and Terraform.
EKS Cluster with Complete Ecosystem
A production-ready EKS cluster with all supporting services:
# EKS Cluster
resource "aws_eks_cluster" "main" {
name = var.cluster_name
role_arn = aws_iam_role.eks_cluster.arn
version = var.kubernetes_version
vpc_config {
subnet_ids = concat(var.private_subnet_ids, var.public_subnet_ids)
endpoint_private_access = true
endpoint_public_access = var.enable_public_access
public_access_cidrs = var.public_access_cidrs
security_group_ids = [aws_security_group.eks_cluster.id]
}
encryption_config {
provider {
key_arn = aws_kms_key.eks.arn
}
resources = ["secrets"]
}
enabled_cluster_log_types = [
"api", "audit", "authenticator", "controllerManager", "scheduler"
]
depends_on = [
aws_iam_role_policy_attachment.eks_cluster_policy,
aws_iam_role_policy_attachment.eks_vpc_resource_controller,
aws_cloudwatch_log_group.eks_cluster
]
tags = var.tags
}
# EKS Node Groups with mixed instance types
resource "aws_eks_node_group" "main" {
for_each = var.node_groups
cluster_name = aws_eks_cluster.main.name
node_group_name = each.key
node_role_arn = aws_iam_role.eks_node_group.arn
subnet_ids = var.private_subnet_ids
capacity_type = each.value.capacity_type
instance_types = each.value.instance_types
ami_type = each.value.ami_type
# disk_size cannot be combined with a launch template; set the root volume
# size via block_device_mappings in the launch template instead
scaling_config {
desired_size = each.value.desired_size
max_size = each.value.max_size
min_size = each.value.min_size
}
update_config {
max_unavailable_percentage = 25
}
# Launch template for advanced configuration
launch_template {
id = aws_launch_template.eks_nodes[each.key].id
version = aws_launch_template.eks_nodes[each.key].latest_version
}
labels = merge(each.value.labels, {
"node-group" = each.key
})
dynamic "taint" {
for_each = each.value.taints
content {
key = taint.value.key
value = taint.value.value
effect = taint.value.effect
}
}
tags = merge(var.tags, {
"kubernetes.io/cluster/${var.cluster_name}" = "owned"
})
depends_on = [
aws_iam_role_policy_attachment.eks_worker_node_policy,
aws_iam_role_policy_attachment.eks_cni_policy,
aws_iam_role_policy_attachment.eks_container_registry_policy
]
}
# Launch template for EKS nodes
resource "aws_launch_template" "eks_nodes" {
for_each = var.node_groups
name_prefix = "${var.cluster_name}-${each.key}-"
vpc_security_group_ids = [aws_security_group.eks_nodes.id]
user_data = base64encode(templatefile("${path.module}/user_data.sh", {
cluster_name = var.cluster_name
cluster_endpoint = aws_eks_cluster.main.endpoint
cluster_ca = aws_eks_cluster.main.certificate_authority[0].data
bootstrap_arguments = each.value.bootstrap_arguments
}))
tag_specifications {
resource_type = "instance"
tags = merge(var.tags, {
Name = "${var.cluster_name}-${each.key}-node"
})
}
lifecycle {
create_before_destroy = true
}
}
# EKS Add-ons
resource "aws_eks_addon" "addons" {
for_each = var.eks_addons
cluster_name = aws_eks_cluster.main.name
addon_name = each.key
addon_version = each.value.version
resolve_conflicts = "OVERWRITE"
service_account_role_arn = each.value.service_account_role_arn
tags = var.tags
}
# OIDC Identity Provider for service accounts
data "tls_certificate" "eks_oidc" {
url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}
resource "aws_iam_openid_connect_provider" "eks_oidc" {
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.eks_oidc.certificates[0].sha1_fingerprint]
url = aws_eks_cluster.main.identity[0].oidc[0].issuer
tags = var.tags
}
# Service account roles for common services
resource "aws_iam_role" "aws_load_balancer_controller" {
name = "${var.cluster_name}-aws-load-balancer-controller"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRoleWithWebIdentity"
Effect = "Allow"
Principal = {
Federated = aws_iam_openid_connect_provider.eks_oidc.arn
}
Condition = {
StringEquals = {
"${replace(aws_iam_openid_connect_provider.eks_oidc.url, "https://", "")}:sub" = "system:serviceaccount:kube-system:aws-load-balancer-controller"
"${replace(aws_iam_openid_connect_provider.eks_oidc.url, "https://", "")}:aud" = "sts.amazonaws.com"
}
}
}
]
})
tags = var.tags
}
resource "aws_iam_role_policy_attachment" "aws_load_balancer_controller" {
policy_arn = "arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess"
role = aws_iam_role.aws_load_balancer_controller.name
}
Serverless Application with API Gateway
A complete serverless application with API Gateway, Lambda, and DynamoDB:
# API Gateway REST API
resource "aws_api_gateway_rest_api" "main" {
name = var.api_name
description = "Serverless API for ${var.application_name}"
endpoint_configuration {
types = ["REGIONAL"]
}
tags = var.tags
}
# API Gateway Resources and Methods
resource "aws_api_gateway_resource" "users" {
rest_api_id = aws_api_gateway_rest_api.main.id
parent_id = aws_api_gateway_rest_api.main.root_resource_id
path_part = "users"
}
resource "aws_api_gateway_resource" "user_id" {
rest_api_id = aws_api_gateway_rest_api.main.id
parent_id = aws_api_gateway_resource.users.id
path_part = "{id}"
}
# Lambda functions for different operations
resource "aws_lambda_function" "api_functions" {
for_each = var.lambda_functions
filename = each.value.filename
function_name = "${var.application_name}-${each.key}"
role = aws_iam_role.lambda_execution.arn
handler = each.value.handler
runtime = each.value.runtime
timeout = each.value.timeout
memory_size = each.value.memory_size
environment {
variables = merge(each.value.environment_variables, {
DYNAMODB_TABLE = aws_dynamodb_table.main.name
REGION = data.aws_region.current.name
})
}
vpc_config {
subnet_ids = var.lambda_subnet_ids
security_group_ids = [aws_security_group.lambda.id]
}
dead_letter_config {
target_arn = aws_sqs_queue.dlq.arn
}
tags = var.tags
}
# API Gateway Methods and Integrations
resource "aws_api_gateway_method" "users_get" {
rest_api_id = aws_api_gateway_rest_api.main.id
resource_id = aws_api_gateway_resource.users.id
http_method = "GET"
authorization = "AWS_IAM"
request_parameters = {
"method.request.querystring.limit" = false
"method.request.querystring.offset" = false
}
}
resource "aws_api_gateway_integration" "users_get" {
rest_api_id = aws_api_gateway_rest_api.main.id
resource_id = aws_api_gateway_resource.users.id
http_method = aws_api_gateway_method.users_get.http_method
integration_http_method = "POST"
type = "AWS_PROXY"
uri = aws_lambda_function.api_functions["list_users"].invoke_arn
}
# API Gateway Deployment
resource "aws_api_gateway_deployment" "main" {
depends_on = [
aws_api_gateway_integration.users_get,
# Add other integrations here
]
rest_api_id = aws_api_gateway_rest_api.main.id
stage_name = var.api_stage
variables = {
deployed_at = timestamp()
}
lifecycle {
create_before_destroy = true
}
}
# API Gateway Stage with logging and throttling
resource "aws_api_gateway_stage" "main" {
deployment_id = aws_api_gateway_deployment.main.id
rest_api_id = aws_api_gateway_rest_api.main.id
stage_name = var.api_stage
access_log_settings {
destination_arn = aws_cloudwatch_log_group.api_gateway.arn
format = jsonencode({
requestId = "$context.requestId"
ip = "$context.identity.sourceIp"
caller = "$context.identity.caller"
user = "$context.identity.user"
requestTime = "$context.requestTime"
httpMethod = "$context.httpMethod"
resourcePath = "$context.resourcePath"
status = "$context.status"
protocol = "$context.protocol"
responseLength = "$context.responseLength"
})
}
xray_tracing_enabled = true
tags = var.tags
}
# API Gateway Method Settings
resource "aws_api_gateway_method_settings" "main" {
rest_api_id = aws_api_gateway_rest_api.main.id
stage_name = aws_api_gateway_stage.main.stage_name
method_path = "*/*"
settings {
metrics_enabled = true
logging_level = "INFO"
throttling_rate_limit = var.api_throttling_rate_limit
throttling_burst_limit = var.api_throttling_burst_limit
}
}
# DynamoDB Table with Global Secondary Indexes
resource "aws_dynamodb_table" "main" {
name = "${var.application_name}-data"
billing_mode = "PAY_PER_REQUEST"
hash_key = "id"
stream_enabled = true
stream_view_type = "NEW_AND_OLD_IMAGES"
attribute {
name = "id"
type = "S"
}
attribute {
name = "email"
type = "S"
}
attribute {
name = "created_at"
type = "S"
}
global_secondary_index {
name = "email-index"
hash_key = "email"
projection_type = "ALL"
}
global_secondary_index {
name = "created-at-index"
hash_key = "created_at"
projection_type = "ALL"
}
server_side_encryption {
enabled = true
kms_key_arn = aws_kms_key.dynamodb.arn
}
point_in_time_recovery {
enabled = true
}
tags = var.tags
}
# DynamoDB Stream Lambda Trigger
resource "aws_lambda_event_source_mapping" "dynamodb_stream" {
event_source_arn = aws_dynamodb_table.main.stream_arn
function_name = aws_lambda_function.api_functions["stream_processor"].arn
starting_position = "LATEST"
maximum_batching_window_in_seconds = 5
batch_size = 10
parallelization_factor = 2
}
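Event source mappings poll the stream on Lambda's behalf, so the function's execution role, not a resource policy, needs read access to the stream. A sketch, assuming the aws_iam_role.lambda_execution role referenced above:
resource "aws_iam_role_policy" "dynamodb_stream_read" {
  name = "${var.application_name}-dynamodb-stream-read"
  role = aws_iam_role.lambda_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "dynamodb:DescribeStream",
          "dynamodb:GetRecords",
          "dynamodb:GetShardIterator",
          "dynamodb:ListStreams"
        ]
        Resource = aws_dynamodb_table.main.stream_arn
      }
    ]
  })
}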
Data Pipeline with S3, Lambda, and RDS
A data processing pipeline that demonstrates event-driven architecture:
# S3 Bucket for data ingestion
resource "aws_s3_bucket" "data_ingestion" {
bucket = "${var.application_name}-data-ingestion-${random_id.bucket_suffix.hex}"
tags = var.tags
}
resource "aws_s3_bucket_notification" "data_ingestion" {
bucket = aws_s3_bucket.data_ingestion.id
lambda_function {
lambda_function_arn = aws_lambda_function.data_processor.arn
events = ["s3:ObjectCreated:*"]
filter_prefix = "incoming/"
filter_suffix = ".json"
}
depends_on = [aws_lambda_permission.s3_invoke]
}
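# The notification above depends on aws_lambda_permission.s3_invoke, which is
# not shown; a minimal sketch based on the references in this pipeline:
resource "aws_lambda_permission" "s3_invoke" {
statement_id = "AllowExecutionFromS3"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.data_processor.function_name
principal = "s3.amazonaws.com"
source_arn = aws_s3_bucket.data_ingestion.arn
}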
# Lambda function for data processing
resource "aws_lambda_function" "data_processor" {
filename = "data_processor.zip"
function_name = "${var.application_name}-data-processor"
role = aws_iam_role.data_processor.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 300
memory_size = 1024
environment {
variables = {
RDS_ENDPOINT = aws_db_instance.analytics.endpoint
RDS_DATABASE = aws_db_instance.analytics.db_name
S3_BUCKET = aws_s3_bucket.processed_data.bucket
SQS_QUEUE = aws_sqs_queue.processing_queue.url
}
}
vpc_config {
subnet_ids = var.lambda_subnet_ids
security_group_ids = [aws_security_group.lambda_data_processor.id]
}
dead_letter_config {
target_arn = aws_sqs_queue.processing_dlq.arn
}
tags = var.tags
}
# RDS Instance for analytics
resource "aws_db_instance" "analytics" {
identifier = "${var.application_name}-analytics"
engine = "postgres"
engine_version = "14.9"
instance_class = var.analytics_db_instance_class
allocated_storage = var.analytics_db_storage
max_allocated_storage = var.analytics_db_max_storage
storage_type = "gp3"
storage_encrypted = true
kms_key_id = aws_kms_key.rds.arn
db_name = "analytics"
username = "analytics_user"
password = random_password.analytics_db.result
db_subnet_group_name = aws_db_subnet_group.analytics.name
vpc_security_group_ids = [aws_security_group.analytics_db.id]
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "sun:04:00-sun:05:00"
multi_az = var.environment == "production"
monitoring_interval = 60
monitoring_role_arn = aws_iam_role.rds_monitoring.arn
performance_insights_enabled = true
performance_insights_kms_key_id = aws_kms_key.rds.arn
deletion_protection = var.environment == "production"
skip_final_snapshot = var.environment != "production"
tags = var.tags
}
# SQS Queue for processing coordination
resource "aws_sqs_queue" "processing_queue" {
name = "${var.application_name}-processing-queue"
delay_seconds = 0
max_message_size = 262144
message_retention_seconds = 1209600 # 14 days
receive_wait_time_seconds = 20
redrive_policy = jsonencode({
deadLetterTargetArn = aws_sqs_queue.processing_dlq.arn
maxReceiveCount = 3
})
tags = var.tags
}
resource "aws_sqs_queue" "processing_dlq" {
name = "${var.application_name}-processing-dlq"
tags = var.tags
}
# EventBridge for workflow orchestration
resource "aws_cloudwatch_event_rule" "data_processing_workflow" {
name = "${var.application_name}-data-processing-workflow"
description = "Orchestrate data processing workflow"
event_pattern = jsonencode({
source = ["custom.dataprocessing"]
detail-type = ["Data Processing Complete"]
})
tags = var.tags
}
resource "aws_cloudwatch_event_target" "start_analytics" {
rule = aws_cloudwatch_event_rule.data_processing_workflow.name
target_id = "StartAnalyticsTarget"
arn = aws_lambda_function.analytics_processor.arn
}
# Step Functions for complex workflows
resource "aws_sfn_state_machine" "data_pipeline" {
name = "${var.application_name}-data-pipeline"
role_arn = aws_iam_role.step_functions.arn
definition = jsonencode({
Comment = "Data processing pipeline"
StartAt = "ProcessData"
States = {
ProcessData = {
Type = "Task"
Resource = aws_lambda_function.data_processor.arn
Next = "CheckProcessingResult"
Retry = [
{
ErrorEquals = ["Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"]
IntervalSeconds = 2
MaxAttempts = 6
BackoffRate = 2
}
]
}
CheckProcessingResult = {
Type = "Choice"
Choices = [
{
Variable = "$.status"
StringEquals = "SUCCESS"
Next = "RunAnalytics"
}
]
Default = "ProcessingFailed"
}
RunAnalytics = {
Type = "Task"
Resource = aws_lambda_function.analytics_processor.arn
End = true
}
ProcessingFailed = {
Type = "Fail"
Cause = "Data processing failed"
}
}
})
tags = var.tags
}
Multi-Service Integration with Service Discovery
Complex service integration using AWS Cloud Map:
# Service Discovery Namespace
resource "aws_service_discovery_private_dns_namespace" "main" {
name = "${var.application_name}.local"
description = "Service discovery for ${var.application_name}"
vpc = var.vpc_id
tags = var.tags
}
# Service Discovery Services
resource "aws_service_discovery_service" "services" {
for_each = var.services
name = each.key
dns_config {
namespace_id = aws_service_discovery_private_dns_namespace.main.id
dns_records {
ttl = 10
type = "A"
}
routing_policy = "MULTIVALUE"
}
# Cloud Map relies on a custom health check here; ECS reports instance health
health_check_custom_config {
failure_threshold = 1
}
tags = var.tags
}
# ECS Services with Service Discovery
resource "aws_ecs_service" "microservices" {
for_each = var.services
name = each.key
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.microservices[each.key].arn
desired_count = each.value.desired_count
network_configuration {
security_groups = [aws_security_group.microservices[each.key].id]
subnets = var.private_subnet_ids
}
service_registries {
registry_arn = aws_service_discovery_service.services[each.key].arn
}
load_balancer {
target_group_arn = aws_lb_target_group.microservices[each.key].arn
container_name = each.key
container_port = each.value.port
}
depends_on = [aws_lb_listener.microservices]
tags = var.tags
}
# Application Load Balancer with path-based routing
resource "aws_lb" "microservices" {
name = "${var.application_name}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = var.public_subnet_ids
enable_deletion_protection = var.environment == "production"
tags = var.tags
}
resource "aws_lb_listener" "microservices" {
load_balancer_arn = aws_lb.microservices.arn
port = "443"
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-TLS-1-2-2017-01"
certificate_arn = var.certificate_arn
default_action {
type = "fixed-response"
fixed_response {
content_type = "text/plain"
message_body = "Service not found"
status_code = "404"
}
}
}
# Listener rules for path-based routing
resource "aws_lb_listener_rule" "microservices" {
for_each = var.services
listener_arn = aws_lb_listener.microservices.arn
priority = each.value.priority
action {
type = "forward"
target_group_arn = aws_lb_target_group.microservices[each.key].arn
}
condition {
path_pattern {
values = each.value.path_patterns
}
}
}
Final Integration Example
A complete example that brings together multiple services:
# Main application module that uses all components
module "complete_application" {
source = "./modules/complete-application"
# Basic configuration
application_name = "my-app"
environment = "production"
# Network configuration
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
public_subnet_ids = module.vpc.public_subnet_ids
# EKS configuration
enable_eks = true
eks_config = {
kubernetes_version = "1.28"
node_groups = {
general = {
instance_types = ["t3.medium", "t3.large"]
capacity_type = "ON_DEMAND"
desired_size = 3
max_size = 10
min_size = 1
}
spot = {
instance_types = ["t3.medium", "t3.large", "t3.xlarge"]
capacity_type = "SPOT"
desired_size = 2
max_size = 20
min_size = 0
}
}
}
# Database configuration
database_config = {
engine = "postgres"
instance_class = "db.r5.large"
multi_az = true
backup_retention_period = 7
}
# Serverless configuration
enable_serverless = true
lambda_functions = {
api_handler = {
runtime = "python3.9"
handler = "app.handler"
memory_size = 512
timeout = 30
}
data_processor = {
runtime = "python3.9"
handler = "processor.handler"
memory_size = 1024
timeout = 300
}
}
# Monitoring configuration
monitoring_config = {
enable_detailed_monitoring = true
log_retention_days = 30
enable_xray_tracing = true
}
# Security configuration
security_config = {
enable_guardduty = true
enable_security_hub = true
enable_config = true
}
tags = {
Environment = "production"
Project = "my-app"
ManagedBy = "terraform"
}
}
Conclusion
This comprehensive guide has covered the essential patterns for using Terraform with AWS, from basic provider setup to complex multi-service architectures. The key to success with AWS and Terraform is understanding not just the individual services, but how they work together to create reliable, scalable, and secure systems.
The patterns and practices covered in this guide provide a foundation for building production-ready AWS infrastructure that scales with your organization’s needs while maintaining security, compliance, and operational excellence.