Terraform for AWS: Cloud-Native Infrastructure
AWS and Terraform are a powerful combination, but AWS’s complexity means there are specific patterns, gotchas, and best practices that aren’t obvious from general Terraform knowledge. This guide bridges that gap, covering the AWS-specific techniques that separate basic resource creation from production-ready, well-architected infrastructure.
From VPC design patterns to multi-account strategies, this guide covers the real-world challenges you’ll face when managing AWS infrastructure at scale with Terraform.
AWS Provider Setup
The AWS provider is Terraform’s gateway to Amazon Web Services, but configuring it properly for production use involves more than just setting a region. Authentication strategies, provider aliases for multi-region deployments, and proper credential management are essential for building reliable, secure infrastructure automation.
Getting the provider configuration right from the start prevents authentication headaches, security issues, and deployment failures down the road. The patterns in this part work whether you’re managing a single AWS account or a complex multi-account organization.
Authentication Strategies
AWS credentials can be provided to Terraform in several ways, each with different security and operational implications:
Environment variables (recommended for local development):
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-west-2"
terraform plan
AWS CLI profiles for multiple account management:
# Configure profiles
aws configure --profile dev
aws configure --profile prod
# Use with Terraform
export AWS_PROFILE=dev
terraform plan
IAM roles (recommended for production):
provider "aws" {
region = "us-west-2"
assume_role {
role_arn = "arn:aws:iam::123456789012:role/TerraformRole"
session_name = "terraform-session"
}
}
Instance profiles for EC2-based CI/CD:
provider "aws" {
region = "us-west-2"
# Automatically uses instance profile when running on EC2
}
Multi-Region Provider Configuration
Real AWS architectures often span multiple regions for disaster recovery, compliance, or performance reasons:
# Primary region provider
provider "aws" {
region = "us-west-2"
alias = "primary"
default_tags {
tags = {
Environment = var.environment
ManagedBy = "terraform"
Project = var.project_name
}
}
}
# Secondary region for DR
provider "aws" {
region = "us-east-1"
alias = "dr"
default_tags {
tags = {
Environment = var.environment
ManagedBy = "terraform"
Project = var.project_name
}
}
}
# Use providers in resources
resource "aws_s3_bucket" "primary" {
provider = aws.primary
bucket = "my-app-primary-${var.environment}"
}
resource "aws_s3_bucket" "dr" {
provider = aws.dr
bucket = "my-app-dr-${var.environment}"
}
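Aliased providers can also be passed into child modules, which keeps multi-region modules reusable. A minimal sketch, assuming a hypothetical ./modules/s3-replica module that creates the DR bucket:
module "dr_bucket" {
  source = "./modules/s3-replica"

  # Hand the DR region's provider to the module explicitly
  providers = {
    aws = aws.dr
  }

  bucket_name = "my-app-dr-${var.environment}"
}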
Cross-Account Provider Setup
Multi-account AWS architectures require careful provider configuration:
# Shared services account
provider "aws" {
region = "us-west-2"
alias = "shared"
assume_role {
role_arn = "arn:aws:iam::111111111111:role/TerraformCrossAccountRole"
}
}
# Production account
provider "aws" {
region = "us-west-2"
alias = "prod"
assume_role {
role_arn = "arn:aws:iam::222222222222:role/TerraformCrossAccountRole"
}
}
# Create resources in different accounts
resource "aws_route53_zone" "shared" {
provider = aws.shared
name = "example.com"
}
resource "aws_route53_record" "prod" {
provider = aws.prod
zone_id = aws_route53_zone.shared.zone_id
name = "api.example.com"
type = "A"
ttl = 300
records = [aws_instance.api.public_ip]
}
Provider Version Management
Pin provider versions to ensure consistent deployments:
terraform {
required_version = ">= 1.6"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.20"
}
}
}
# Provider configuration
provider "aws" {
region = var.aws_region
# Skip the EC2 metadata API check for faster provider initialization
skip_metadata_api_check = true
# Leave region and credential validation enabled so misconfiguration is
# caught at init/plan time rather than partway through an apply
skip_region_validation = false
skip_credentials_validation = false
}
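Terraform records the provider versions it selects in .terraform.lock.hcl. Committing that file, and pre-populating hashes for every platform your team and CI run on, keeps deployments consistent:
# Record provider checksums for all platforms in use
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64

# Upgrade within the version constraint and refresh the lock file
terraform init -upgrade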
Default Tags and Resource Naming
Consistent tagging and naming are crucial for AWS cost management and organization:
provider "aws" {
region = "us-west-2"
default_tags {
tags = {
Environment = var.environment
Project = var.project_name
ManagedBy = "terraform"
Owner = var.team_name
CostCenter = var.cost_center
# Note: timestamp() is re-evaluated on every run, so this tag shows a diff
# on each plan; use a static value if that noise is unwanted
CreatedDate = formatdate("YYYY-MM-DD", timestamp())
}
}
}
# Local values for consistent naming
locals {
name_prefix = "${var.project_name}-${var.environment}"
common_tags = {
Application = var.application_name
Component = "infrastructure"
}
}
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = merge(local.common_tags, {
Name = "${local.name_prefix}-vpc"
Type = "networking"
})
}
AWS CLI Integration
The AWS CLI complements Terraform for verifying credentials, regions, and IAM permissions before a run:
# Validate AWS credentials
aws sts get-caller-identity
# Check current region
aws configure get region
# List available regions
aws ec2 describe-regions --query 'Regions[].RegionName' --output table
# Validate IAM permissions
aws iam simulate-principal-policy \
--policy-source-arn $(aws sts get-caller-identity --query Arn --output text) \
--action-names ec2:DescribeInstances \
--resource-arns "*"
Environment-Specific Configuration
Different environments often need different provider configurations:
# variables.tf
variable "environment" {
description = "Environment name"
type = string
}
variable "aws_region" {
description = "AWS region"
type = string
default = "us-west-2"
}
variable "assume_role_arn" {
description = "IAM role ARN to assume"
type = string
default = null
}
# main.tf
provider "aws" {
region = var.aws_region
dynamic "assume_role" {
for_each = var.assume_role_arn != null ? [1] : []
content {
role_arn = var.assume_role_arn
}
}
default_tags {
tags = {
Environment = var.environment
ManagedBy = "terraform"
}
}
}
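With this setup, each environment typically reduces to a different variable file (file names and the role ARN below are illustrative):
# environments/dev.tfvars
environment     = "dev"
aws_region      = "us-west-2"
assume_role_arn = null

# environments/prod.tfvars
environment     = "prod"
aws_region      = "us-west-2"
assume_role_arn = "arn:aws:iam::222222222222:role/TerraformRole"
Select the environment at plan time with terraform plan -var-file=environments/prod.tfvars.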
Security Best Practices
Secure provider configuration prevents credential leaks and unauthorized access:
Never hardcode credentials:
# DON'T DO THIS
provider "aws" {
access_key = "AKIAIOSFODNN7EXAMPLE" # Never hardcode!
secret_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}
# DO THIS INSTEAD
provider "aws" {
region = "us-west-2"
# Use environment variables, profiles, or IAM roles
}
Use least-privilege IAM policies:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:Describe*",
"ec2:CreateTags",
"ec2:RunInstances",
"ec2:TerminateInstances"
],
"Resource": "*",
"Condition": {
"StringEquals": {
"aws:RequestedRegion": ["us-west-2", "us-east-1"]
}
}
}
]
}
Enable CloudTrail logging:
resource "aws_cloudtrail" "terraform_audit" {
name = "terraform-audit-${var.environment}"
s3_bucket_name = aws_s3_bucket.audit_logs.bucket
event_selector {
read_write_type = "All"
include_management_events = true
data_resource {
type = "AWS::S3::Object"
values = ["${aws_s3_bucket.terraform_state.arn}/*"]
}
}
}
Troubleshooting Common Issues
Authentication failures:
# Check current credentials
aws sts get-caller-identity
# Verify region configuration
echo $AWS_DEFAULT_REGION
# Test specific profile
aws sts get-caller-identity --profile myprofile
Provider initialization issues:
# Clear provider cache
rm -rf .terraform/
# Reinitialize with debug logging
TF_LOG=DEBUG terraform init
Cross-account access problems:
# Test role assumption
aws sts assume-role \
--role-arn arn:aws:iam::123456789012:role/TerraformRole \
--role-session-name test-session
What’s Next
Proper AWS provider configuration is the foundation for everything else you’ll build with Terraform on AWS. With authentication, regions, and basic security patterns in place, you’re ready to tackle AWS networking—the backbone of well-architected cloud infrastructure.
In the next part, we’ll explore VPC design patterns, subnet strategies, and the networking building blocks that support scalable, secure AWS architectures.
VPC and Networking
AWS networking forms the foundation of every well-architected system, but designing VPCs that scale, perform well, and maintain security requires understanding both AWS networking concepts and Terraform patterns for managing complex network topologies. The decisions you make about CIDR blocks, subnet design, and connectivity patterns affect everything you’ll build on top.
This part covers the networking patterns that work well in production—from basic VPC design to complex multi-tier architectures with proper isolation and connectivity.
VPC Design Patterns
A well-designed VPC balances security, scalability, and operational simplicity:
# VPC with carefully planned CIDR blocks
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.name_prefix}-vpc"
Type = "networking"
}
}
# Internet Gateway for public connectivity
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.name_prefix}-igw"
}
}
# Data source for availability zones
data "aws_availability_zones" "available" {
state = "available"
}
# Calculate subnet CIDRs automatically
locals {
az_count = min(length(data.aws_availability_zones.available.names), 3)
# Public subnets: 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24
public_subnet_cidrs = [
for i in range(local.az_count) :
cidrsubnet(var.vpc_cidr, 8, i + 1)
]
# Private subnets: 10.0.11.0/24, 10.0.12.0/24, 10.0.13.0/24
private_subnet_cidrs = [
for i in range(local.az_count) :
cidrsubnet(var.vpc_cidr, 8, i + 11)
]
# Database subnets: 10.0.21.0/24, 10.0.22.0/24, 10.0.23.0/24
database_subnet_cidrs = [
for i in range(local.az_count) :
cidrsubnet(var.vpc_cidr, 8, i + 21)
]
}
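The cidrsubnet() arithmetic is easy to sanity-check in terraform console before applying:
terraform console
> cidrsubnet("10.0.0.0/16", 8, 1)
"10.0.1.0/24"
> cidrsubnet("10.0.0.0/16", 8, 11)
"10.0.11.0/24"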
Multi-Tier Subnet Architecture
Separate tiers provide security isolation and traffic control:
# Public subnets for load balancers and NAT gateways
resource "aws_subnet" "public" {
count = local.az_count
vpc_id = aws_vpc.main.id
cidr_block = local.public_subnet_cidrs[count.index]
availability_zone = data.aws_availability_zones.available.names[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.name_prefix}-public-${count.index + 1}"
Type = "public"
Tier = "public"
}
}
# Private subnets for application servers
resource "aws_subnet" "private" {
count = local.az_count
vpc_id = aws_vpc.main.id
cidr_block = local.private_subnet_cidrs[count.index]
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = {
Name = "${var.name_prefix}-private-${count.index + 1}"
Type = "private"
Tier = "application"
}
}
# Database subnets with additional isolation
resource "aws_subnet" "database" {
count = local.az_count
vpc_id = aws_vpc.main.id
cidr_block = local.database_subnet_cidrs[count.index]
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = {
Name = "${var.name_prefix}-database-${count.index + 1}"
Type = "private"
Tier = "database"
}
}
# Database subnet group for RDS
resource "aws_db_subnet_group" "main" {
name = "${var.name_prefix}-db-subnet-group"
subnet_ids = aws_subnet.database[*].id
tags = {
Name = "${var.name_prefix}-db-subnet-group"
}
}
NAT Gateway Configuration
NAT Gateways provide secure internet access for private subnets:
# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
count = var.enable_nat_gateway ? local.az_count : 0
domain = "vpc"
depends_on = [aws_internet_gateway.main]
tags = {
Name = "${var.name_prefix}-nat-eip-${count.index + 1}"
}
}
# NAT Gateways in public subnets
resource "aws_nat_gateway" "main" {
count = var.enable_nat_gateway ? local.az_count : 0
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = {
Name = "${var.name_prefix}-nat-${count.index + 1}"
}
depends_on = [aws_internet_gateway.main]
}
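One NAT Gateway per AZ is the resilient default, but non-production environments often use a single shared gateway to save cost. A sketch of how the counts above could be parameterized, assuming a var.single_nat_gateway flag introduced here for illustration:
locals {
  # Collapse to one NAT Gateway when single_nat_gateway is set
  nat_gateway_count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : local.az_count) : 0
}

# The EIP and NAT Gateway resources above would then use local.nat_gateway_count,
# and every private route table below would route through aws_nat_gateway.main[0]
# when only one gateway exists.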
Route Table Management
Proper routing ensures traffic flows correctly between tiers:
# Public route table
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.name_prefix}-public-rt"
Type = "public"
}
}
# Associate public subnets with public route table
resource "aws_route_table_association" "public" {
count = local.az_count
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
# Private route tables (one per AZ for NAT Gateway redundancy)
resource "aws_route_table" "private" {
count = local.az_count
vpc_id = aws_vpc.main.id
dynamic "route" {
for_each = var.enable_nat_gateway ? [1] : []
content {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
}
tags = {
Name = "${var.name_prefix}-private-rt-${count.index + 1}"
Type = "private"
}
}
# Associate private subnets with their route tables
resource "aws_route_table_association" "private" {
count = local.az_count
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
# Database route tables (isolated from internet)
resource "aws_route_table" "database" {
count = local.az_count
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.name_prefix}-database-rt-${count.index + 1}"
Type = "database"
}
}
resource "aws_route_table_association" "database" {
count = local.az_count
subnet_id = aws_subnet.database[count.index].id
route_table_id = aws_route_table.database[count.index].id
}
Security Group Patterns
Security groups provide stateful firewall rules at the instance level:
# Web tier security group
resource "aws_security_group" "web" {
name_prefix = "${var.name_prefix}-web-"
vpc_id = aws_vpc.main.id
description = "Security group for web tier"
ingress {
description = "HTTP from ALB"
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
ingress {
description = "HTTPS from ALB"
from_port = 443
to_port = 443
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.name_prefix}-web-sg"
Tier = "web"
}
}
# Application Load Balancer security group
resource "aws_security_group" "alb" {
name_prefix = "${var.name_prefix}-alb-"
vpc_id = aws_vpc.main.id
description = "Security group for Application Load Balancer"
ingress {
description = "HTTP from internet"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# The HTTP egress to the web tier is defined as a standalone rule below:
# referencing aws_security_group.web here, while the web group already
# references this group in its ingress rules, would create a dependency cycle.
tags = {
Name = "${var.name_prefix}-alb-sg"
Tier = "load-balancer"
}
}
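Defining the ALB-to-web rule as a standalone resource breaks the circular reference between the two security groups while keeping the same traffic policy:
resource "aws_security_group_rule" "alb_egress_to_web" {
  type                     = "egress"
  description              = "HTTP to web tier"
  from_port                = 80
  to_port                  = 80
  protocol                 = "tcp"
  security_group_id        = aws_security_group.alb.id
  source_security_group_id = aws_security_group.web.id
}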
# Database security group
resource "aws_security_group" "database" {
name_prefix = "${var.name_prefix}-db-"
vpc_id = aws_vpc.main.id
description = "Security group for database tier"
ingress {
description = "MySQL/Aurora from application tier"
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.web.id]
}
tags = {
Name = "${var.name_prefix}-db-sg"
Tier = "database"
}
}
VPC Endpoints for AWS Services
VPC endpoints provide private connectivity to AWS services:
# S3 VPC Endpoint (Gateway endpoint)
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.aws_region}.s3"
route_table_ids = concat(
[aws_route_table.public.id],
aws_route_table.private[*].id
)
tags = {
Name = "${var.name_prefix}-s3-endpoint"
}
}
# EC2 VPC Endpoint (Interface endpoint)
resource "aws_vpc_endpoint" "ec2" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.aws_region}.ec2"
vpc_endpoint_type = "Interface"
subnet_ids = aws_subnet.private[*].id
security_group_ids = [aws_security_group.vpc_endpoints.id]
private_dns_enabled = true
tags = {
Name = "${var.name_prefix}-ec2-endpoint"
}
}
# Security group for VPC endpoints
resource "aws_security_group" "vpc_endpoints" {
name_prefix = "${var.name_prefix}-vpc-endpoints-"
vpc_id = aws_vpc.main.id
description = "Security group for VPC endpoints"
ingress {
description = "HTTPS from VPC"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [aws_vpc.main.cidr_block]
}
tags = {
Name = "${var.name_prefix}-vpc-endpoints-sg"
}
}
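Most environments need several interface endpoints, so it's common to drive them from a set of service names. A sketch, with the variable introduced here for illustration:
variable "interface_endpoint_services" {
  description = "Interface endpoint service suffixes to create"
  type        = set(string)
  default     = ["ecr.api", "ecr.dkr", "logs", "ssm"]
}

resource "aws_vpc_endpoint" "interface" {
  for_each = var.interface_endpoint_services

  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.aws_region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "${var.name_prefix}-${each.value}-endpoint"
  }
}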
Network ACLs for Additional Security
Network ACLs provide subnet-level security controls:
# Database tier Network ACL
resource "aws_network_acl" "database" {
vpc_id = aws_vpc.main.id
subnet_ids = aws_subnet.database[*].id
# Allow inbound MySQL from private subnets
ingress {
protocol = "tcp"
rule_no = 100
action = "allow"
cidr_block = "10.0.0.0/8"
from_port = 3306
to_port = 3306
}
# Allow return traffic
ingress {
protocol = "tcp"
rule_no = 110
action = "allow"
cidr_block = "0.0.0.0/0"
from_port = 1024
to_port = 65535
}
# Allow outbound responses
egress {
protocol = "tcp"
rule_no = 100
action = "allow"
cidr_block = "0.0.0.0/0"
from_port = 1024
to_port = 65535
}
tags = {
Name = "${var.name_prefix}-database-nacl"
Tier = "database"
}
}
Outputs for Network Resources
Expose network information for use by other configurations:
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.main.id
}
output "vpc_cidr_block" {
description = "CIDR block of the VPC"
value = aws_vpc.main.cidr_block
}
output "public_subnet_ids" {
description = "IDs of the public subnets"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "IDs of the private subnets"
value = aws_subnet.private[*].id
}
output "database_subnet_ids" {
description = "IDs of the database subnets"
value = aws_subnet.database[*].id
}
output "database_subnet_group_name" {
description = "Name of the database subnet group"
value = aws_db_subnet_group.main.name
}
output "security_group_ids" {
description = "Security group IDs by tier"
value = {
web = aws_security_group.web.id
alb = aws_security_group.alb.id
database = aws_security_group.database.id
}
}
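Other configurations can consume these outputs through a terraform_remote_state data source (the backend bucket and key below are placeholders):
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"        # placeholder backend bucket
    key    = "network/terraform.tfstate" # placeholder state key
    region = "us-west-2"
  }
}

# Example reference:
# subnet_id = data.terraform_remote_state.network.outputs.private_subnet_ids[0]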
What’s Next
Well-designed networking provides the foundation for secure, scalable AWS architectures. With VPCs, subnets, and security groups properly configured, you’re ready to tackle AWS’s most complex topic: Identity and Access Management.
In the next part, we’ll explore IAM patterns that provide least-privilege access, enable cross-account workflows, and automate security controls across your AWS infrastructure.
IAM and Security
AWS Identity and Access Management is both the most critical and most complex aspect of AWS security. Getting IAM wrong can expose your entire infrastructure to attack or lock you out of your own resources. Terraform helps by making IAM policies version-controlled and repeatable, but you still need to understand the principles of least privilege, role-based access, and AWS’s various authentication mechanisms.
We’ll explore IAM patterns that work well in production, from basic role creation to complex cross-account access and automated security controls.
IAM Role Patterns
Roles are the foundation of AWS security, providing temporary credentials without long-lived access keys:
# EC2 instance role for application servers
resource "aws_iam_role" "app_server" {
name = "${var.name_prefix}-app-server-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
tags = {
Name = "${var.name_prefix}-app-server-role"
Environment = var.environment
}
}
# Instance profile for EC2
resource "aws_iam_instance_profile" "app_server" {
name = "${var.name_prefix}-app-server-profile"
role = aws_iam_role.app_server.name
}
# Policy for application access
resource "aws_iam_role_policy" "app_server" {
name = "${var.name_prefix}-app-server-policy"
role = aws_iam_role.app_server.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject"
]
Resource = [
"${aws_s3_bucket.app_data.arn}/*"
]
},
{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue"
]
Resource = [
aws_secretsmanager_secret.app_secrets.arn
]
}
]
})
}
Cross-Account Access Patterns
Multi-account architectures require careful cross-account role configuration:
# Cross-account role for CI/CD access
resource "aws_iam_role" "cicd_cross_account" {
name = "${var.name_prefix}-cicd-cross-account"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::${var.cicd_account_id}:root"
}
Condition = {
StringEquals = {
"sts:ExternalId" = var.external_id
}
StringLike = {
"aws:userid" = "AIDACKCEVSQ6C2EXAMPLE:*"
}
}
}
]
})
max_session_duration = 3600 # 1 hour
tags = {
Purpose = "CI/CD cross-account access"
}
}
# Policy for deployment permissions
resource "aws_iam_role_policy" "cicd_deployment" {
name = "${var.name_prefix}-cicd-deployment"
role = aws_iam_role.cicd_cross_account.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"ec2:DescribeInstances",
"ec2:DescribeImages",
"ec2:RunInstances",
"ec2:TerminateInstances",
"ec2:CreateTags"
]
Resource = "*"
Condition = {
StringEquals = {
"aws:RequestedRegion" = [var.aws_region]
}
}
},
{
Effect = "Allow"
Action = [
"ecs:UpdateService",
"ecs:DescribeServices",
"ecs:RegisterTaskDefinition"
]
Resource = "*"
}
]
})
}
Service-Linked Roles and Managed Policies
Use AWS managed policies where appropriate, but understand their implications:
# Attach AWS managed policy
resource "aws_iam_role_policy_attachment" "app_server_ssm" {
role = aws_iam_role.app_server.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
# Create service-linked role for ECS
resource "aws_iam_service_linked_role" "ecs" {
aws_service_name = "ecs.amazonaws.com"
description = "Service-linked role for ECS"
}
# Custom policy with specific permissions
resource "aws_iam_policy" "app_specific" {
name = "${var.name_prefix}-app-specific"
description = "Application-specific permissions"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:DeleteItem"
]
Resource = [
aws_dynamodb_table.app_data.arn,
"${aws_dynamodb_table.app_data.arn}/index/*"
]
}
]
})
}
resource "aws_iam_role_policy_attachment" "app_specific" {
role = aws_iam_role.app_server.name
policy_arn = aws_iam_policy.app_specific.arn
}
User and Group Management
Manage users and groups for human access:
# Developer group with limited permissions
resource "aws_iam_group" "developers" {
name = "${var.name_prefix}-developers"
}
resource "aws_iam_group_policy" "developers" {
name = "${var.name_prefix}-developers-policy"
group = aws_iam_group.developers.name
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"ec2:Describe*",
"s3:ListBucket",
"s3:GetObject",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams",
"logs:GetLogEvents"
]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"sts:AssumeRole"
]
Resource = [
aws_iam_role.developer_assume_role.arn
]
}
]
})
}
# Users (typically managed outside Terraform in production)
resource "aws_iam_user" "developers" {
for_each = var.developer_users
name = each.key
path = "/developers/"
tags = {
Team = each.value.team
Role = "developer"
}
}
resource "aws_iam_user_group_membership" "developers" {
for_each = aws_iam_user.developers
user = each.value.name
groups = [aws_iam_group.developers.name]
}
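The developer_users variable referenced above isn't defined in this snippet; a minimal definition consistent with how it's used might look like this (usernames are examples):
variable "developer_users" {
  description = "Developer IAM users to create, keyed by username"
  type = map(object({
    team = string
  }))
  default = {
    alice = { team = "platform" }
    bob   = { team = "web" }
  }
}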
Secrets Management Integration
Integrate with AWS Secrets Manager and Parameter Store:
# Application secrets in Secrets Manager
resource "aws_secretsmanager_secret" "app_secrets" {
name = "${var.name_prefix}/app/secrets"
description = "Application secrets"
replica {
region = var.backup_region
}
tags = {
Application = var.application_name
Environment = var.environment
}
}
resource "aws_secretsmanager_secret_version" "app_secrets" {
secret_id = aws_secretsmanager_secret.app_secrets.id
secret_string = jsonencode({
database_password = random_password.db_password.result
api_key = random_password.api_key.result
jwt_secret = random_password.jwt_secret.result
})
}
# Configuration in Parameter Store
resource "aws_ssm_parameter" "app_config" {
for_each = var.app_parameters
name = "/${var.name_prefix}/config/${each.key}"
type = each.value.secure ? "SecureString" : "String"
value = each.value.value
tags = {
Application = var.application_name
Environment = var.environment
}
}
# IAM policy for secrets access
resource "aws_iam_role_policy" "secrets_access" {
name = "${var.name_prefix}-secrets-access"
role = aws_iam_role.app_server.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue"
]
Resource = [
aws_secretsmanager_secret.app_secrets.arn
]
},
{
Effect = "Allow"
Action = [
"ssm:GetParameter",
"ssm:GetParameters",
"ssm:GetParametersByPath"
]
Resource = [
"arn:aws:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter/${var.name_prefix}/config/*"
]
}
]
})
}
Security Automation
Automate security controls and compliance:
# CloudTrail for audit logging
resource "aws_cloudtrail" "main" {
name = "${var.name_prefix}-cloudtrail"
s3_bucket_name = aws_s3_bucket.cloudtrail_logs.bucket
event_selector {
read_write_type = "All"
include_management_events = true
data_resource {
type = "AWS::S3::Object"
values = ["arn:aws:s3:::${aws_s3_bucket.sensitive_data.bucket}/*"]
}
}
insight_selector {
insight_type = "ApiCallRateInsight"
}
tags = {
Purpose = "Security audit logging"
}
}
# Config for compliance monitoring
resource "aws_config_configuration_recorder" "main" {
name = "${var.name_prefix}-config-recorder"
role_arn = aws_iam_role.config.arn
recording_group {
all_supported = true
include_global_resource_types = true
}
}
resource "aws_config_delivery_channel" "main" {
name = "${var.name_prefix}-config-delivery"
s3_bucket_name = aws_s3_bucket.config_logs.bucket
}
# Config rules for compliance
resource "aws_config_config_rule" "root_access_key_check" {
name = "${var.name_prefix}-root-access-key-check"
source {
owner = "AWS"
source_identifier = "ROOT_ACCESS_KEY_CHECK"
}
depends_on = [aws_config_configuration_recorder.main]
}
resource "aws_config_config_rule" "encrypted_volumes" {
name = "${var.name_prefix}-encrypted-volumes"
source {
owner = "AWS"
source_identifier = "ENCRYPTED_VOLUMES"
}
depends_on = [aws_config_configuration_recorder.main]
}
KMS Key Management
Manage encryption keys for different services:
# Application-specific KMS key
resource "aws_kms_key" "app_key" {
description = "KMS key for ${var.application_name}"
deletion_window_in_days = 7
enable_key_rotation = true
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "Enable IAM User Permissions"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
}
Action = "kms:*"
Resource = "*"
},
{
Sid = "Allow use of the key"
Effect = "Allow"
Principal = {
AWS = [
aws_iam_role.app_server.arn
]
}
Action = [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
]
Resource = "*"
}
]
})
tags = {
Name = "${var.name_prefix}-app-key"
Application = var.application_name
}
}
resource "aws_kms_alias" "app_key" {
name = "alias/${var.name_prefix}-app-key"
target_key_id = aws_kms_key.app_key.key_id
}
# S3 bucket encryption with KMS
resource "aws_s3_bucket_server_side_encryption_configuration" "app_data" {
bucket = aws_s3_bucket.app_data.id
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.app_key.arn
sse_algorithm = "aws:kms"
}
bucket_key_enabled = true
}
}
Security Group Automation
Create security groups with proper ingress/egress rules:
# Application security group with dynamic rules
resource "aws_security_group" "app" {
name_prefix = "${var.name_prefix}-app-"
vpc_id = var.vpc_id
description = "Security group for ${var.application_name}"
# Dynamic ingress rules
dynamic "ingress" {
for_each = var.ingress_rules
content {
description = ingress.value.description
from_port = ingress.value.from_port
to_port = ingress.value.to_port
protocol = ingress.value.protocol
cidr_blocks = ingress.value.cidr_blocks
security_groups = ingress.value.security_groups
}
}
# Allow all outbound traffic
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.name_prefix}-app-sg"
}
}
# Database security group with restricted access
resource "aws_security_group" "database" {
name_prefix = "${var.name_prefix}-db-"
vpc_id = var.vpc_id
description = "Security group for database"
ingress {
description = "MySQL from application"
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.app.id]
}
tags = {
Name = "${var.name_prefix}-db-sg"
}
}
IAM Access Analyzer
Use Access Analyzer to identify overly permissive policies:
resource "aws_accessanalyzer_analyzer" "main" {
analyzer_name = "${var.name_prefix}-access-analyzer"
type = "ACCOUNT"
tags = {
Environment = var.environment
Purpose = "IAM policy analysis"
}
}
# Archive findings that are expected
resource "aws_accessanalyzer_archive_rule" "ignore_public_s3" {
analyzer_name = aws_accessanalyzer_analyzer.main.analyzer_name
rule_name = "ignore-public-s3-buckets"
filter {
criteria = "resourceType"
eq = ["AWS::S3::Bucket"]
}
filter {
criteria = "isPublic"
eq = ["true"]
}
}
What’s Next
IAM and security form the foundation of AWS infrastructure protection, but managing multiple AWS accounts requires additional patterns for organization setup, cross-account access, and centralized governance.
In the next part, we’ll explore multi-account strategies using AWS Organizations, including account creation automation, cross-account role management, and centralized billing and compliance controls.
Multi-Account Strategies
AWS multi-account architecture is the gold standard for enterprise cloud deployments, providing isolation, security boundaries, and simplified billing. However, managing dozens or hundreds of AWS accounts manually becomes impossible. Terraform can automate account creation, organization setup, and cross-account access patterns, but it requires careful planning and understanding of AWS Organizations.
Here we’ll dive into patterns and practices for implementing multi-account AWS architectures with Terraform, from basic organization setup to complex cross-account workflows.
AWS Organizations Setup
AWS Organizations provides centralized management for multiple AWS accounts:
# Create the organization (run this in the master account)
resource "aws_organizations_organization" "main" {
aws_service_access_principals = [
"cloudtrail.amazonaws.com",
"config.amazonaws.com",
"guardduty.amazonaws.com",
"securityhub.amazonaws.com",
"sso.amazonaws.com"
]
feature_set = "ALL"
enabled_policy_types = [
"SERVICE_CONTROL_POLICY",
"TAG_POLICY",
"BACKUP_POLICY"
]
}
# Organizational Units for different environments
resource "aws_organizations_organizational_unit" "production" {
name = "Production"
parent_id = aws_organizations_organization.main.roots[0].id
}
resource "aws_organizations_organizational_unit" "non_production" {
name = "Non-Production"
parent_id = aws_organizations_organization.main.roots[0].id
}
resource "aws_organizations_organizational_unit" "security" {
name = "Security"
parent_id = aws_organizations_organization.main.roots[0].id
}
resource "aws_organizations_organizational_unit" "shared_services" {
name = "Shared Services"
parent_id = aws_organizations_organization.main.roots[0].id
}
Account Creation Automation
Automate the creation of new AWS accounts:
# Account creation with proper naming and email conventions
resource "aws_organizations_account" "accounts" {
for_each = var.aws_accounts
name = each.value.name
email = each.value.email
role_name = "OrganizationAccountAccessRole"
# Move to appropriate OU after creation
parent_id = each.value.parent_ou_id
tags = {
Environment = each.value.environment
Purpose = each.value.purpose
Owner = each.value.owner
}
}
# Variable definition for accounts. Variable defaults can't reference other
# resources, so the OU IDs below are placeholders; in practice pass real OU
# IDs in via tfvars, or build the map in a locals block that references the
# aws_organizations_organizational_unit resources.
variable "aws_accounts" {
description = "AWS accounts to create"
type = map(object({
name = string
email = string
environment = string
purpose = string
owner = string
parent_ou_id = string
}))
default = {
prod_web = {
name = "Production Web Services"
email = "[email protected]"
environment = "production"
purpose = "web-services"
owner = "web-team"
parent_ou_id = "ou-xxxx-prodexample" # placeholder: Production OU ID
}
prod_data = {
name = "Production Data Services"
email = "[email protected]"
environment = "production"
purpose = "data-services"
owner = "data-team"
parent_ou_id = "ou-xxxx-prodexample" # placeholder: Production OU ID
}
dev_sandbox = {
name = "Development Sandbox"
email = "[email protected]"
environment = "development"
purpose = "sandbox"
owner = "engineering"
parent_ou_id = "ou-xxxx-nonprodexample" # placeholder: Non-Production OU ID
}
}
}
Service Control Policies
Implement governance through Service Control Policies:
# Prevent deletion of CloudTrail logs
resource "aws_organizations_policy" "prevent_cloudtrail_deletion" {
name = "PreventCloudTrailDeletion"
description = "Prevent deletion of CloudTrail logs and configuration"
type = "SERVICE_CONTROL_POLICY"
content = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "PreventCloudTrailDeletion"
Effect = "Deny"
Action = [
"cloudtrail:DeleteTrail",
"cloudtrail:StopLogging",
"cloudtrail:UpdateTrail"
]
Resource = "*"
Condition = {
StringNotEquals = {
"aws:PrincipalArn" = [
"arn:aws:iam::*:role/OrganizationAccountAccessRole",
"arn:aws:iam::*:role/SecurityAuditRole"
]
}
}
}
]
})
}
# Restrict regions for compliance
resource "aws_organizations_policy" "restrict_regions" {
name = "RestrictRegions"
description = "Restrict operations to approved regions"
type = "SERVICE_CONTROL_POLICY"
content = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "RestrictRegions"
Effect = "Deny"
NotAction = [
"iam:*",
"sts:*",
"cloudfront:*",
"route53:*",
"support:*",
"trustedadvisor:*"
]
Resource = "*"
Condition = {
StringNotEquals = {
"aws:RequestedRegion" = [
"us-east-1",
"us-west-2",
"eu-west-1"
]
}
}
}
]
})
}
# Attach policies to OUs
resource "aws_organizations_policy_attachment" "production_cloudtrail" {
policy_id = aws_organizations_policy.prevent_cloudtrail_deletion.id
target_id = aws_organizations_organizational_unit.production.id
}
resource "aws_organizations_policy_attachment" "all_regions" {
policy_id = aws_organizations_policy.restrict_regions.id
target_id = aws_organizations_organization.main.roots[0].id
}
Cross-Account Role Management
Set up roles for cross-account access:
# Cross-account role in each member account
resource "aws_iam_role" "cross_account_admin" {
provider = aws.member_account
name = "CrossAccountAdminRole"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
AWS = [
"arn:aws:iam::${var.master_account_id}:root",
"arn:aws:iam::${var.security_account_id}:root"
]
}
Condition = {
StringEquals = {
"sts:ExternalId" = var.external_id
}
IpAddress = {
"aws:SourceIp" = var.allowed_ip_ranges
}
}
}
]
})
max_session_duration = 3600
tags = {
Purpose = "Cross-account administration"
}
}
# Attach appropriate policies
resource "aws_iam_role_policy_attachment" "cross_account_admin" {
provider = aws.member_account
role = aws_iam_role.cross_account_admin.name
policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}
# Read-only role for auditing
resource "aws_iam_role" "cross_account_readonly" {
provider = aws.member_account
name = "CrossAccountReadOnlyRole"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::${var.security_account_id}:root"
}
}
]
})
}
resource "aws_iam_role_policy_attachment" "cross_account_readonly" {
provider = aws.member_account
role = aws_iam_role.cross_account_readonly.name
policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
Centralized Logging and Monitoring
Set up centralized logging across all accounts:
# Central logging bucket in security account
resource "aws_s3_bucket" "central_logs" {
provider = aws.security_account
bucket = "${var.organization_name}-central-logs"
}
resource "aws_s3_bucket_policy" "central_logs" {
provider = aws.security_account
bucket = aws_s3_bucket.central_logs.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "AWSCloudTrailAclCheck"
Effect = "Allow"
Principal = {
Service = "cloudtrail.amazonaws.com"
}
Action = "s3:GetBucketAcl"
Resource = aws_s3_bucket.central_logs.arn
},
{
Sid = "AWSCloudTrailWrite"
Effect = "Allow"
Principal = {
Service = "cloudtrail.amazonaws.com"
}
Action = "s3:PutObject"
Resource = "${aws_s3_bucket.central_logs.arn}/*"
Condition = {
StringEquals = {
"s3:x-amz-acl" = "bucket-owner-full-control"
}
}
}
]
})
}
# CloudTrail in each member account. Note that Terraform provider references
# must be static, so the aws.member_accounts[each.key] pattern used in this
# part is illustrative only; in practice each member account gets its own
# provider alias (or its own module/workspace instantiation).
resource "aws_cloudtrail" "member_account" {
for_each = var.member_accounts
provider = aws.member_accounts[each.key]
name = "${each.key}-cloudtrail"
s3_bucket_name = aws_s3_bucket.central_logs.bucket
s3_key_prefix = "cloudtrail/${each.key}"
include_global_service_events = true
is_multi_region_trail = true
enable_logging = true
tags = {
Account = each.key
Purpose = "Centralized audit logging"
}
}
AWS SSO Integration
Integrate with AWS Single Sign-On for centralized access:
# SSO instance (created automatically when SSO is enabled)
data "aws_ssoadmin_instances" "main" {}
# Permission sets for different roles
resource "aws_ssoadmin_permission_set" "admin" {
name = "AdministratorAccess"
description = "Full administrative access"
instance_arn = tolist(data.aws_ssoadmin_instances.main.arns)[0]
session_duration = "PT2H" # 2 hours
tags = {
Purpose = "Administrative access"
}
}
resource "aws_ssoadmin_managed_policy_attachment" "admin" {
instance_arn = tolist(data.aws_ssoadmin_instances.main.arns)[0]
managed_policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
permission_set_arn = aws_ssoadmin_permission_set.admin.arn
}
# Developer permission set with limited access
resource "aws_ssoadmin_permission_set" "developer" {
name = "DeveloperAccess"
description = "Developer access with restrictions"
instance_arn = tolist(data.aws_ssoadmin_instances.main.arns)[0]
session_duration = "PT8H" # 8 hours
}
resource "aws_ssoadmin_permission_set_inline_policy" "developer" {
inline_policy = data.aws_iam_policy_document.developer_policy.json
instance_arn = tolist(data.aws_ssoadmin_instances.main.arns)[0]
permission_set_arn = aws_ssoadmin_permission_set.developer.arn
}
data "aws_iam_policy_document" "developer_policy" {
statement {
effect = "Allow"
actions = [
"ec2:Describe*",
"s3:ListBucket",
"s3:GetObject",
"logs:*",
"cloudwatch:*"
]
resources = ["*"]
}
statement {
effect = "Deny"
actions = [
"ec2:TerminateInstances",
"rds:DeleteDBInstance",
"s3:DeleteBucket"
]
resources = ["*"]
}
}
# Account assignments
resource "aws_ssoadmin_account_assignment" "admin_prod" {
instance_arn = tolist(data.aws_ssoadmin_instances.main.arns)[0]
permission_set_arn = aws_ssoadmin_permission_set.admin.arn
principal_id = var.admin_group_id
principal_type = "GROUP"
target_id = aws_organizations_account.accounts["prod_web"].id
target_type = "AWS_ACCOUNT"
}
Cost Management and Billing
Implement cost controls across accounts:
# Billing alerts for each account
resource "aws_budgets_budget" "account_budget" {
for_each = var.aws_accounts
provider = aws.member_accounts[each.key]
name = "${each.key}-monthly-budget"
budget_type = "COST"
limit_amount = each.value.monthly_budget
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "Service"
values = ["Amazon Elastic Compute Cloud - Compute"]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = [each.value.billing_email]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = [each.value.billing_email]
}
}
# Cost anomaly detection
resource "aws_ce_anomaly_detector" "account_anomaly" {
for_each = var.aws_accounts
provider = aws.member_accounts[each.key]
name = "${each.key}-cost-anomaly-detector"
monitor_type = "DIMENSIONAL"
specification = jsonencode({
Dimension = "SERVICE"
MatchOptions = ["EQUALS"]
Values = ["EC2-Instance", "RDS"]
})
}
resource "aws_ce_anomaly_subscription" "account_anomaly" {
for_each = var.aws_accounts
provider = aws.member_accounts[each.key]
name = "${each.key}-anomaly-subscription"
frequency = "DAILY"
monitor_arn_list = [
aws_ce_anomaly_monitor.account_anomaly[each.key].arn
]
subscriber {
type = "EMAIL"
address = each.value.billing_email
}
threshold_expression {
and {
dimension {
key = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
values = ["100"]
match_options = ["GREATER_THAN_OR_EQUAL"]
}
}
}
}
Account Baseline Configuration
Apply consistent baseline configuration to all accounts:
# Module for account baseline
module "account_baseline" {
source = "./modules/account-baseline"
for_each = var.aws_accounts
providers = {
aws = aws.member_accounts[each.key]
}
account_name = each.key
environment = each.value.environment
security_account_id = var.security_account_id
log_bucket_name = aws_s3_bucket.central_logs.bucket
# Enable services based on account type
enable_guardduty = true
enable_config = true
enable_securityhub = each.value.environment == "production"
enable_cloudtrail = true
# Tagging strategy
default_tags = {
Account = each.key
Environment = each.value.environment
Owner = each.value.owner
ManagedBy = "terraform"
}
}
Cross-Account Resource Sharing
Share resources across accounts using Resource Access Manager:
# Share VPC subnets across accounts
resource "aws_ram_resource_share" "shared_subnets" {
provider = aws.shared_services
name = "shared-subnets"
allow_external_principals = false
tags = {
Purpose = "Share networking resources"
}
}
resource "aws_ram_resource_association" "shared_subnets" {
provider = aws.shared_services
for_each = toset(var.shared_subnet_ids)
resource_arn = "arn:aws:ec2:${var.aws_region}:${var.shared_services_account_id}:subnet/${each.value}"
resource_share_arn = aws_ram_resource_share.shared_subnets.arn
}
resource "aws_ram_principal_association" "shared_subnets" {
provider = aws.shared_services
for_each = var.member_account_ids
principal = each.value
resource_share_arn = aws_ram_resource_share.shared_subnets.arn
}
What’s Next
Multi-account strategies provide the organizational foundation for enterprise AWS deployments, but managing costs and implementing proper tagging strategies becomes critical as your infrastructure scales.
In the next part, we’ll explore cost optimization techniques, including resource lifecycle management, automated cost controls, and tagging strategies that enable accurate cost allocation and optimization across your AWS infrastructure.
Cost Optimization
AWS costs can spiral out of control quickly without proper governance and optimization strategies. Terraform helps by making cost controls repeatable and enforceable, but you need to understand AWS pricing models, implement proper tagging strategies, and automate resource lifecycle management to keep costs under control.
This part covers the patterns and practices for implementing cost optimization with Terraform, from basic tagging strategies to advanced automation that right-sizes resources and manages their lifecycle.
Comprehensive Tagging Strategy
Consistent tagging is the foundation of cost management and allocation:
# Global tagging strategy
locals {
# Required tags for all resources
required_tags = {
Environment = var.environment
Project = var.project_name
Owner = var.team_name
CostCenter = var.cost_center
ManagedBy = "terraform"
CreatedDate = formatdate("YYYY-MM-DD", timestamp())
}
# Optional tags that can be merged
optional_tags = {
Application = var.application_name
Component = var.component_name
Version = var.application_version
}
# Combined tags
common_tags = merge(local.required_tags, local.optional_tags)
}
# Provider-level default tags
provider "aws" {
region = var.aws_region
default_tags {
tags = local.required_tags
}
}
# Resource-specific tagging
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = var.instance_type
tags = merge(local.common_tags, {
Name = "${var.name_prefix}-web-${count.index + 1}"
Role = "webserver"
Backup = "daily"
AutoShutdown = var.environment != "production" ? "true" : "false"
})
}
# Enforce tagging with a lifecycle postcondition (added to the same
# aws_instance.web resource shown above)
resource "aws_instance" "web" {
# ... other configuration as above ...
lifecycle {
postcondition {
condition = alltrue([
for tag in keys(local.required_tags) :
contains(keys(self.tags_all), tag) # tags_all includes provider default_tags
])
error_message = "All required tags must be present: ${join(", ", keys(local.required_tags))}"
}
}
}
Resource Right-Sizing
Implement policies to prevent oversized resources:
# Instance type validation
variable "allowed_instance_types" {
description = "Allowed EC2 instance types by environment"
type = map(list(string))
default = {
dev = [
"t3.nano", "t3.micro", "t3.small", "t3.medium"
]
staging = [
"t3.small", "t3.medium", "t3.large",
"m5.large", "m5.xlarge"
]
production = [
"t3.medium", "t3.large", "t3.xlarge",
"m5.large", "m5.xlarge", "m5.2xlarge",
"c5.large", "c5.xlarge", "c5.2xlarge"
]
}
}
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = var.instance_type
lifecycle {
precondition {
condition = contains(
var.allowed_instance_types[var.environment],
var.instance_type
)
error_message = "Instance type ${var.instance_type} is not allowed in ${var.environment} environment. Allowed types: ${join(", ", var.allowed_instance_types[var.environment])}"
}
}
}
# RDS instance size controls
variable "allowed_db_instance_classes" {
description = "Allowed RDS instance classes by environment"
type = map(list(string))
default = {
dev = [
"db.t3.micro", "db.t3.small"
]
staging = [
"db.t3.small", "db.t3.medium", "db.r5.large"
]
production = [
"db.t3.medium", "db.t3.large",
"db.r5.large", "db.r5.xlarge", "db.r5.2xlarge"
]
}
}
resource "aws_db_instance" "main" {
identifier = "${var.name_prefix}-database"
engine = "mysql"
engine_version = "8.0"
instance_class = var.db_instance_class
lifecycle {
precondition {
condition = contains(
var.allowed_db_instance_classes[var.environment],
var.db_instance_class
)
error_message = "DB instance class ${var.db_instance_class} is not allowed in ${var.environment} environment."
}
}
}
Automated Resource Scheduling
Implement automated start/stop for non-production resources:
# Lambda function for EC2 scheduling
resource "aws_lambda_function" "ec2_scheduler" {
filename = "ec2_scheduler.zip"
function_name = "${var.name_prefix}-ec2-scheduler"
role = aws_iam_role.ec2_scheduler.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 60
environment {
variables = {
ENVIRONMENT = var.environment
}
}
tags = local.common_tags
}
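The ec2_scheduler.zip artifact has to exist before Terraform can upload it. One common approach is to build it with the archive provider; the source path here is an assumption about where the handler code lives:
data "archive_file" "ec2_scheduler" {
  type        = "zip"
  source_file = "${path.module}/lambda/ec2_scheduler/index.py" # assumed location
  output_path = "${path.module}/ec2_scheduler.zip"
}
The Lambda resource can then use data.archive_file.ec2_scheduler.output_path as its filename and set source_code_hash = data.archive_file.ec2_scheduler.output_base64sha256 so code changes trigger redeployments.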
# IAM role for scheduler
resource "aws_iam_role" "ec2_scheduler" {
name = "${var.name_prefix}-ec2-scheduler-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy" "ec2_scheduler" {
name = "${var.name_prefix}-ec2-scheduler-policy"
role = aws_iam_role.ec2_scheduler.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "arn:aws:logs:*:*:*"
},
{
Effect = "Allow"
Action = [
"ec2:DescribeInstances",
"ec2:StartInstances",
"ec2:StopInstances"
]
Resource = "*"
}
]
})
}
# CloudWatch Events for scheduling
resource "aws_cloudwatch_event_rule" "stop_instances" {
name = "${var.name_prefix}-stop-instances"
description = "Stop non-production instances at 6 PM"
schedule_expression = "cron(0 18 ? * MON-FRI *)"
tags = local.common_tags
}
resource "aws_cloudwatch_event_rule" "start_instances" {
name = "${var.name_prefix}-start-instances"
description = "Start non-production instances at 8 AM"
schedule_expression = "cron(0 8 ? * MON-FRI *)"
tags = local.common_tags
}
resource "aws_cloudwatch_event_target" "stop_instances" {
rule = aws_cloudwatch_event_rule.stop_instances.name
target_id = "StopInstancesTarget"
arn = aws_lambda_function.ec2_scheduler.arn
input = jsonencode({
action = "stop"
})
}
resource "aws_cloudwatch_event_target" "start_instances" {
rule = aws_cloudwatch_event_rule.start_instances.name
target_id = "StartInstancesTarget"
arn = aws_lambda_function.ec2_scheduler.arn
input = jsonencode({
action = "start"
})
}
resource "aws_lambda_permission" "allow_cloudwatch_stop" {
statement_id = "AllowExecutionFromCloudWatchStop"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.ec2_scheduler.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.stop_instances.arn
}
resource "aws_lambda_permission" "allow_cloudwatch_start" {
statement_id = "AllowExecutionFromCloudWatchStart"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.ec2_scheduler.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.start_instances.arn
}
Storage Lifecycle Management
Implement intelligent tiering and lifecycle policies:
# S3 bucket with intelligent tiering
resource "aws_s3_bucket" "data_storage" {
bucket = "${var.name_prefix}-data-storage"
tags = local.common_tags
}
resource "aws_s3_bucket_intelligent_tiering_configuration" "data_storage" {
bucket = aws_s3_bucket.data_storage.id
name = "EntireBucket"
status = "Enabled"
filter {
prefix = ""
}
tiering {
access_tier = "DEEP_ARCHIVE_ACCESS"
days = 180
}
tiering {
access_tier = "ARCHIVE_ACCESS"
days = 125
}
}
# Lifecycle configuration for different data types
resource "aws_s3_bucket_lifecycle_configuration" "data_storage" {
bucket = aws_s3_bucket.data_storage.id
rule {
id = "logs_lifecycle"
status = "Enabled"
filter {
prefix = "logs/"
}
transition {
days = 30
storage_class = "STANDARD_IA"
}
transition {
days = 90
storage_class = "GLACIER"
}
transition {
days = 365
storage_class = "DEEP_ARCHIVE"
}
expiration {
days = 2555 # 7 years
}
}
rule {
id = "temp_data_cleanup"
status = "Enabled"
filter {
prefix = "temp/"
}
expiration {
days = 7
}
}
rule {
id = "incomplete_multipart_uploads"
status = "Enabled"
abort_incomplete_multipart_upload {
days_after_initiation = 1
}
}
}
# EBS volume optimization
resource "aws_ebs_volume" "data" {
availability_zone = var.availability_zone
size = var.volume_size
type = var.environment == "production" ? "gp3" : "gp2"
encrypted = true
# Use gp3 for better cost/performance in production; throughput and iops are
# plain arguments (not blocks) and only apply to gp3 volumes
throughput = var.environment == "production" ? 125 : null # baseline gp3 throughput (MiB/s)
iops = var.environment == "production" ? 3000 : null # baseline gp3 IOPS
tags = merge(local.common_tags, {
Name = "${var.name_prefix}-data-volume"
Type = "data"
})
}
Reserved Instance and Savings Plans Management
Track and manage reserved capacity:
# Data source to check existing reserved instances
data "aws_ec2_reserved_instances" "existing" {
filter {
name = "state"
values = ["active"]
}
}
# Local calculation for RI coverage
locals {
# Calculate running instances by type
running_instances = {
for instance_type, count in var.instance_counts :
instance_type => count
}
# Calculate RI coverage
ri_coverage = {
for ri in data.aws_ec2_reserved_instances.existing.reserved_instances :
ri.instance_type => ri.instance_count
}
# Identify gaps in RI coverage
ri_gaps = {
for instance_type, running_count in local.running_instances :
instance_type => max(0, running_count - lookup(local.ri_coverage, instance_type, 0))
}
}
# Output RI recommendations
output "ri_recommendations" {
description = "Reserved Instance purchase recommendations"
value = {
for instance_type, gap in local.ri_gaps :
instance_type => {
running_instances = local.running_instances[instance_type]
reserved_instances = lookup(local.ri_coverage, instance_type, 0)
recommended_purchase = gap
}
if gap > 0
}
}
Cost Monitoring and Alerting
Set up comprehensive cost monitoring:
# Budget for overall account spending
resource "aws_budgets_budget" "monthly_budget" {
name = "${var.name_prefix}-monthly-budget"
budget_type = "COST"
limit_amount = var.monthly_budget_limit
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "LinkedAccount"
values = [data.aws_caller_identity.current.account_id]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Service-specific budgets
resource "aws_budgets_budget" "service_budgets" {
for_each = var.service_budgets
name = "${var.name_prefix}-${each.key}-budget"
budget_type = "COST"
limit_amount = each.value.limit
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "Service"
values = [each.value.service_name]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = each.value.threshold
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Cost anomaly detection
resource "aws_ce_anomaly_detector" "service_anomaly" {
name = "${var.name_prefix}-service-anomaly-detector"
monitor_type = "DIMENSIONAL"
specification = jsonencode({
Dimension = "SERVICE"
MatchOptions = ["EQUALS"]
Values = ["Amazon Elastic Compute Cloud - Compute", "Amazon Relational Database Service"]
})
tags = local.common_tags
}
resource "aws_ce_anomaly_subscription" "service_anomaly" {
name = "${var.name_prefix}-anomaly-subscription"
frequency = "DAILY"
monitor_arn_list = [
aws_ce_anomaly_monitor.service_anomaly.arn
]
subscriber {
type = "EMAIL"
address = var.cost_anomaly_email
}
threshold_expression {
and {
dimension {
key = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
values = ["50"]
match_options = ["GREATER_THAN_OR_EQUAL"]
}
}
}
tags = local.common_tags
}
Spot Instance Integration
Use Spot instances for cost-effective compute:
# Launch template for Spot instances
resource "aws_launch_template" "spot_template" {
name_prefix = "${var.name_prefix}-spot-"
image_id = data.aws_ami.amazon_linux.id
instance_type = var.spot_instance_type
vpc_security_group_ids = [aws_security_group.web.id]
iam_instance_profile {
name = aws_iam_instance_profile.app_server.name
}
user_data = base64encode(templatefile("${path.module}/user_data.sh", {
environment = var.environment
}))
tag_specifications {
resource_type = "instance"
tags = merge(local.common_tags, {
Name = "${var.name_prefix}-spot-instance"
Type = "spot"
})
}
}
# Auto Scaling Group with mixed instances
resource "aws_autoscaling_group" "mixed_instances" {
name = "${var.name_prefix}-mixed-asg"
vpc_zone_identifier = var.private_subnet_ids
target_group_arns = [aws_lb_target_group.web.arn]
health_check_type = "ELB"
min_size = var.min_size
max_size = var.max_size
desired_capacity = var.desired_capacity
mixed_instances_policy {
launch_template {
launch_template_specification {
launch_template_id = aws_launch_template.spot_template.id
version = "$Latest"
}
override {
instance_type = "t3.medium"
weighted_capacity = "1"
}
override {
instance_type = "t3.large"
weighted_capacity = "2"
}
}
instances_distribution {
on_demand_base_capacity = 1
on_demand_percentage_above_base_capacity = 25
spot_allocation_strategy = "capacity-optimized"
spot_instance_pools = 2
spot_max_price = var.spot_max_price
}
}
tag {
key = "Name"
value = "${var.name_prefix}-mixed-asg"
propagate_at_launch = true
}
dynamic "tag" {
for_each = local.common_tags
content {
key = tag.key
value = tag.value
propagate_at_launch = true
}
}
}
Resource Cleanup Automation
Automate cleanup of unused resources:
# Lambda function for resource cleanup
resource "aws_lambda_function" "resource_cleanup" {
filename = "resource_cleanup.zip"
function_name = "${var.name_prefix}-resource-cleanup"
role = aws_iam_role.resource_cleanup.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 300
environment {
variables = {
ENVIRONMENT = var.environment
DRY_RUN = var.cleanup_dry_run
}
}
tags = local.common_tags
}
# Schedule cleanup to run weekly
resource "aws_cloudwatch_event_rule" "resource_cleanup" {
name = "${var.name_prefix}-resource-cleanup"
description = "Weekly resource cleanup"
schedule_expression = "cron(0 2 ? * SUN *)" # 2 AM every Sunday
tags = local.common_tags
}
resource "aws_cloudwatch_event_target" "resource_cleanup" {
rule = aws_cloudwatch_event_rule.resource_cleanup.name
target_id = "ResourceCleanupTarget"
arn = aws_lambda_function.resource_cleanup.arn
}
resource "aws_lambda_permission" "allow_cloudwatch_cleanup" {
statement_id = "AllowExecutionFromCloudWatch"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.resource_cleanup.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.resource_cleanup.arn
}
What’s Next
Cost optimization provides the financial discipline needed for sustainable AWS operations, but implementing reusable patterns and modules is what makes these optimizations scalable across your organization.
In the next part, we’ll explore AWS-specific module patterns that encapsulate these cost optimization strategies along with security and operational best practices, creating reusable building blocks for your infrastructure.
AWS-Specific Modules
Creating reusable modules for AWS infrastructure patterns accelerates development and ensures consistency across projects. However, AWS-specific modules need to handle the complexity of AWS services, regional differences, and the various configuration options that make AWS both powerful and complicated.
This part covers patterns for building robust, reusable AWS modules that encapsulate best practices while remaining flexible enough for different use cases.
VPC Module with Best Practices
A comprehensive VPC module that handles common networking patterns:
# modules/aws-vpc/variables.tf
variable "name" {
description = "Name prefix for all resources"
type = string
}
variable "cidr_block" {
description = "CIDR block for the VPC"
type = string
default = "10.0.0.0/16"
validation {
condition = can(cidrhost(var.cidr_block, 0))
error_message = "Must be a valid CIDR block."
}
}
variable "availability_zones" {
description = "List of availability zones"
type = list(string)
default = []
}
variable "enable_nat_gateway" {
description = "Enable NAT Gateway for private subnets"
type = bool
default = true
}
variable "single_nat_gateway" {
description = "Use a single NAT Gateway for all private subnets"
type = bool
default = false
}
variable "enable_vpn_gateway" {
description = "Enable VPN Gateway"
type = bool
default = false
}
variable "enable_dns_hostnames" {
description = "Enable DNS hostnames in the VPC"
type = bool
default = true
}
variable "enable_dns_support" {
description = "Enable DNS support in the VPC"
type = bool
default = true
}
variable "tags" {
description = "Additional tags for all resources"
type = map(string)
default = {}
}
# modules/aws-vpc/main.tf
data "aws_availability_zones" "available" {
state = "available"
}
locals {
# Use provided AZs or default to first 3 available
azs = length(var.availability_zones) > 0 ? var.availability_zones : slice(data.aws_availability_zones.available.names, 0, 3)
# Calculate subnet CIDRs
public_subnet_cidrs = [
for i, az in local.azs :
cidrsubnet(var.cidr_block, 8, i + 1)
]
private_subnet_cidrs = [
for i, az in local.azs :
cidrsubnet(var.cidr_block, 8, i + 11)
]
database_subnet_cidrs = [
for i, az in local.azs :
cidrsubnet(var.cidr_block, 8, i + 21)
]
common_tags = merge(var.tags, {
ManagedBy = "terraform"
})
}
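# For the default 10.0.0.0/16 CIDR and three AZs, the cidrsubnet() calls above
# carve /24 networks out of the /16 (easy to verify in `terraform console`):
#   public   -> 10.0.1.0/24,  10.0.2.0/24,  10.0.3.0/24
#   private  -> 10.0.11.0/24, 10.0.12.0/24, 10.0.13.0/24
#   database -> 10.0.21.0/24, 10.0.22.0/24, 10.0.23.0/24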
# VPC
resource "aws_vpc" "main" {
cidr_block = var.cidr_block
enable_dns_hostnames = var.enable_dns_hostnames
enable_dns_support = var.enable_dns_support
tags = merge(local.common_tags, {
Name = "${var.name}-vpc"
})
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = merge(local.common_tags, {
Name = "${var.name}-igw"
})
}
# Public Subnets
resource "aws_subnet" "public" {
count = length(local.azs)
vpc_id = aws_vpc.main.id
cidr_block = local.public_subnet_cidrs[count.index]
availability_zone = local.azs[count.index]
map_public_ip_on_launch = true
tags = merge(local.common_tags, {
Name = "${var.name}-public-${count.index + 1}"
Type = "public"
Tier = "public"
})
}
# Private Subnets
resource "aws_subnet" "private" {
count = length(local.azs)
vpc_id = aws_vpc.main.id
cidr_block = local.private_subnet_cidrs[count.index]
availability_zone = local.azs[count.index]
tags = merge(local.common_tags, {
Name = "${var.name}-private-${count.index + 1}"
Type = "private"
Tier = "application"
})
}
# Database Subnets
resource "aws_subnet" "database" {
count = length(local.azs)
vpc_id = aws_vpc.main.id
cidr_block = local.database_subnet_cidrs[count.index]
availability_zone = local.azs[count.index]
tags = merge(local.common_tags, {
Name = "${var.name}-database-${count.index + 1}"
Type = "private"
Tier = "database"
})
}
# NAT Gateways
resource "aws_eip" "nat" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(local.azs)) : 0
domain = "vpc"
depends_on = [aws_internet_gateway.main]
tags = merge(local.common_tags, {
Name = "${var.name}-nat-eip-${count.index + 1}"
})
}
resource "aws_nat_gateway" "main" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(local.azs)) : 0
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = merge(local.common_tags, {
Name = "${var.name}-nat-${count.index + 1}"
})
depends_on = [aws_internet_gateway.main]
}
# Route Tables
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = merge(local.common_tags, {
Name = "${var.name}-public-rt"
Type = "public"
})
}
resource "aws_route_table" "private" {
count = var.enable_nat_gateway ? length(local.azs) : 1
vpc_id = aws_vpc.main.id
dynamic "route" {
for_each = var.enable_nat_gateway ? [1] : []
content {
cidr_block = "0.0.0.0/0"
nat_gateway_id = var.single_nat_gateway ? aws_nat_gateway.main[0].id : aws_nat_gateway.main[count.index].id
}
}
tags = merge(local.common_tags, {
Name = "${var.name}-private-rt-${count.index + 1}"
Type = "private"
})
}
# Route Table Associations
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = length(aws_subnet.private)
subnet_id = aws_subnet.private[count.index].id
route_table_id = var.enable_nat_gateway ? aws_route_table.private[count.index].id : aws_route_table.private[0].id
}
# VPN Gateway (optional)
resource "aws_vpn_gateway" "main" {
count = var.enable_vpn_gateway ? 1 : 0
vpc_id = aws_vpc.main.id
tags = merge(local.common_tags, {
Name = "${var.name}-vpn-gateway"
})
}
# modules/aws-vpc/outputs.tf
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.main.id
}
output "vpc_cidr_block" {
description = "CIDR block of the VPC"
value = aws_vpc.main.cidr_block
}
output "public_subnet_ids" {
description = "IDs of the public subnets"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "IDs of the private subnets"
value = aws_subnet.private[*].id
}
output "database_subnet_ids" {
description = "IDs of the database subnets"
value = aws_subnet.database[*].id
}
output "internet_gateway_id" {
description = "ID of the Internet Gateway"
value = aws_internet_gateway.main.id
}
output "nat_gateway_ids" {
description = "IDs of the NAT Gateways"
value = aws_nat_gateway.main[*].id
}
output "availability_zones" {
description = "List of availability zones used"
value = local.azs
}
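To see how the module fits into a root configuration, here is a minimal, illustrative call; the module path, tag values, and the single_nat_gateway choice are assumptions rather than requirements:
# Example usage (illustrative)
module "vpc" {
  source = "./modules/aws-vpc"

  name               = "my-app"
  cidr_block         = "10.0.0.0/16"
  enable_nat_gateway = true
  single_nat_gateway = true # one shared NAT Gateway keeps non-production costs down

  tags = {
    Environment = "staging"
    Project     = "my-app"
  }
}

# Downstream configuration consumes the outputs, for example:
#   subnet_ids = module.vpc.private_subnet_ids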
Application Load Balancer Module
A comprehensive ALB module with security best practices:
# modules/aws-alb/variables.tf
variable "name" {
description = "Name for the load balancer"
type = string
}
variable "vpc_id" {
description = "VPC ID where the load balancer will be created"
type = string
}
variable "subnet_ids" {
description = "List of subnet IDs for the load balancer"
type = list(string)
}
variable "certificate_arn" {
description = "ARN of the SSL certificate"
type = string
default = null
}
variable "enable_deletion_protection" {
description = "Enable deletion protection"
type = bool
default = true
}
variable "idle_timeout" {
description = "Connection idle timeout in seconds"
type = number
default = 60
}
variable "enable_http2" {
description = "Enable HTTP/2"
type = bool
default = true
}
variable "ip_address_type" {
description = "IP address type (ipv4 or dualstack)"
type = string
default = "ipv4"
}
variable "target_groups" {
description = "Map of target group configurations"
type = map(object({
port = number
protocol = string
target_type = string
health_check_path = string
health_check_matcher = string
health_check_timeout = number
health_check_interval = number
healthy_threshold = number
unhealthy_threshold = number
}))
default = {}
}
variable "listeners" {
description = "Map of listener configurations"
type = map(object({
port = number
protocol = string
certificate_arn = string
ssl_policy = string
default_action = object({
type = string
target_group_name = string
redirect_config = object({
status_code = string
protocol = string
port = string
})
})
}))
default = {}
}
variable "tags" {
description = "Additional tags"
type = map(string)
default = {}
}
# modules/aws-alb/main.tf
locals {
common_tags = merge(var.tags, {
ManagedBy = "terraform"
})
}
# Security Group for ALB
resource "aws_security_group" "alb" {
name_prefix = "${var.name}-alb-"
vpc_id = var.vpc_id
description = "Security group for ${var.name} ALB"
ingress {
description = "HTTP"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "All outbound traffic"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = merge(local.common_tags, {
Name = "${var.name}-alb-sg"
})
lifecycle {
create_before_destroy = true
}
}
# Application Load Balancer
resource "aws_lb" "main" {
name = var.name
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = var.subnet_ids
enable_deletion_protection = var.enable_deletion_protection
idle_timeout = var.idle_timeout
enable_http2 = var.enable_http2
ip_address_type = var.ip_address_type
access_logs {
bucket = aws_s3_bucket.alb_logs.bucket
prefix = "alb-logs"
enabled = true
}
tags = merge(local.common_tags, {
Name = var.name
})
}
# S3 bucket for ALB access logs
resource "aws_s3_bucket" "alb_logs" {
bucket = "${var.name}-alb-logs-${random_id.bucket_suffix.hex}"
force_destroy = true
tags = local.common_tags
}
resource "random_id" "bucket_suffix" {
byte_length = 4
}
resource "aws_s3_bucket_lifecycle_configuration" "alb_logs" {
bucket = aws_s3_bucket.alb_logs.id
rule {
id = "delete_old_logs"
status = "Enabled"
expiration {
days = 90
}
}
}
resource "aws_s3_bucket_policy" "alb_logs" {
bucket = aws_s3_bucket.alb_logs.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
AWS = data.aws_elb_service_account.main.arn
}
Action = "s3:PutObject"
Resource = "${aws_s3_bucket.alb_logs.arn}/*"
},
{
Effect = "Allow"
Principal = {
Service = "delivery.logs.amazonaws.com"
}
Action = "s3:PutObject"
Resource = "${aws_s3_bucket.alb_logs.arn}/*"
Condition = {
StringEquals = {
"s3:x-amz-acl" = "bucket-owner-full-control"
}
}
}
]
})
}
data "aws_elb_service_account" "main" {}
# Target Groups
resource "aws_lb_target_group" "main" {
for_each = var.target_groups
name = "${var.name}-${each.key}"
port = each.value.port
protocol = each.value.protocol
vpc_id = var.vpc_id
target_type = each.value.target_type
health_check {
enabled = true
healthy_threshold = each.value.healthy_threshold
unhealthy_threshold = each.value.unhealthy_threshold
timeout = each.value.health_check_timeout
interval = each.value.health_check_interval
path = each.value.health_check_path
matcher = each.value.health_check_matcher
port = "traffic-port"
protocol = each.value.protocol
}
tags = merge(local.common_tags, {
Name = "${var.name}-${each.key}-tg"
})
}
# Listeners
resource "aws_lb_listener" "main" {
for_each = var.listeners
load_balancer_arn = aws_lb.main.arn
port = each.value.port
protocol = each.value.protocol
certificate_arn = each.value.certificate_arn
ssl_policy = each.value.ssl_policy
default_action {
type = each.value.default_action.type
dynamic "target_group_arn" {
for_each = each.value.default_action.type == "forward" ? [1] : []
content {
target_group_arn = aws_lb_target_group.main[each.value.default_action.target_group_name].arn
}
}
dynamic "redirect" {
for_each = each.value.default_action.type == "redirect" ? [1] : []
content {
port = each.value.default_action.redirect_config.port
protocol = each.value.default_action.redirect_config.protocol
status_code = each.value.default_action.redirect_config.status_code
}
}
}
tags = local.common_tags
}
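Because target_groups and listeners are strictly typed objects, every attribute must be supplied by the caller. The sketch below is illustrative: it assumes the VPC module call from the previous example and a certificate_arn variable in the calling configuration:
# Example usage (illustrative)
module "alb" {
  source = "./modules/aws-alb"

  name       = "my-app"
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.public_subnet_ids

  target_groups = {
    web = {
      port                  = 8080
      protocol              = "HTTP"
      target_type           = "ip"
      health_check_path     = "/health"
      health_check_matcher  = "200"
      health_check_timeout  = 5
      health_check_interval = 30
      healthy_threshold     = 2
      unhealthy_threshold   = 3
    }
  }

  listeners = {
    https = {
      port            = 443
      protocol        = "HTTPS"
      certificate_arn = var.certificate_arn
      ssl_policy      = "ELBSecurityPolicy-TLS13-1-2-2021-06"
      default_action = {
        type              = "forward"
        target_group_name = "web"
        redirect_config   = null
      }
    }
  }

  tags = {
    Environment = "staging"
  }
}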
RDS Module with High Availability
A production-ready RDS module with backup and monitoring:
# modules/aws-rds/main.tf
resource "aws_db_subnet_group" "main" {
name = "${var.name}-db-subnet-group"
subnet_ids = var.subnet_ids
tags = merge(var.tags, {
Name = "${var.name}-db-subnet-group"
})
}
resource "aws_security_group" "rds" {
name_prefix = "${var.name}-rds-"
vpc_id = var.vpc_id
description = "Security group for ${var.name} RDS instance"
ingress {
description = "Database access"
from_port = var.port
to_port = var.port
protocol = "tcp"
security_groups = var.allowed_security_groups
}
tags = merge(var.tags, {
Name = "${var.name}-rds-sg"
})
lifecycle {
create_before_destroy = true
}
}
resource "aws_db_instance" "main" {
identifier = var.name
# Engine configuration
engine = var.engine
engine_version = var.engine_version
instance_class = var.instance_class
# Storage configuration
allocated_storage = var.allocated_storage
max_allocated_storage = var.max_allocated_storage
storage_type = var.storage_type
storage_encrypted = var.storage_encrypted
kms_key_id = var.kms_key_id
# Database configuration
db_name = var.database_name
username = var.username
password = var.password
port = var.port
# Network configuration
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.rds.id]
publicly_accessible = var.publicly_accessible
# Backup configuration
backup_retention_period = var.backup_retention_period
backup_window = var.backup_window
maintenance_window = var.maintenance_window
# High availability
multi_az = var.multi_az
# Monitoring
monitoring_interval = var.monitoring_interval
monitoring_role_arn = var.monitoring_interval > 0 ? aws_iam_role.rds_monitoring[0].arn : null
# Performance Insights
performance_insights_enabled = var.performance_insights_enabled
performance_insights_kms_key_id = var.performance_insights_kms_key_id
performance_insights_retention_period = var.performance_insights_retention_period
# Deletion protection
deletion_protection = var.deletion_protection
skip_final_snapshot = var.skip_final_snapshot
# Avoid timestamp() here: it changes on every run and causes perpetual diffs
final_snapshot_identifier = var.skip_final_snapshot ? null : "${var.name}-final-snapshot"
tags = merge(var.tags, {
Name = var.name
})
}
# IAM role for enhanced monitoring
resource "aws_iam_role" "rds_monitoring" {
count = var.monitoring_interval > 0 ? 1 : 0
name = "${var.name}-rds-monitoring-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "monitoring.rds.amazonaws.com"
}
}
]
})
tags = var.tags
}
resource "aws_iam_role_policy_attachment" "rds_monitoring" {
count = var.monitoring_interval > 0 ? 1 : 0
role = aws_iam_role.rds_monitoring[0].name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}
# CloudWatch alarms
resource "aws_cloudwatch_metric_alarm" "database_cpu" {
alarm_name = "${var.name}-database-cpu-utilization"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/RDS"
period = "300"
statistic = "Average"
threshold = "80"
alarm_description = "This metric monitors RDS CPU utilization"
dimensions = {
DBInstanceIdentifier = aws_db_instance.main.id
}
alarm_actions = var.alarm_actions
tags = var.tags
}
resource "aws_cloudwatch_metric_alarm" "database_connections" {
alarm_name = "${var.name}-database-connection-count"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "DatabaseConnections"
namespace = "AWS/RDS"
period = "300"
statistic = "Average"
threshold = var.max_connections_threshold
alarm_description = "This metric monitors RDS connection count"
dimensions = {
DBInstanceIdentifier = aws_db_instance.main.id
}
alarm_actions = var.alarm_actions
tags = var.tags
}
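The module expects the master password as an input, so secret handling happens at the call site. A minimal sketch using the hashicorp/random provider and Secrets Manager; the secret name and resource labels are assumptions:
# Generate the master password and keep it out of plain variables (illustrative)
resource "random_password" "db_master" {
  length  = 24
  special = false
}

resource "aws_secretsmanager_secret" "db_master" {
  name = "my-app/rds/master-password"
}

resource "aws_secretsmanager_secret_version" "db_master" {
  secret_id     = aws_secretsmanager_secret.db_master.id
  secret_string = random_password.db_master.result
}

# The generated value then feeds the module's password input:
#   password = random_password.db_master.result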
ECS Fargate Module
A complete ECS Fargate module for containerized applications:
# modules/aws-ecs-fargate/main.tf
resource "aws_ecs_cluster" "main" {
name = var.cluster_name
configuration {
execute_command_configuration {
kms_key_id = var.kms_key_id
logging = "OVERRIDE"
log_configuration {
cloud_watch_encryption_enabled = true
cloud_watch_log_group_name = aws_cloudwatch_log_group.ecs_exec.name
}
}
}
setting {
name = "containerInsights"
value = var.enable_container_insights ? "enabled" : "disabled"
}
tags = var.tags
}
resource "aws_ecs_cluster_capacity_providers" "main" {
cluster_name = aws_ecs_cluster.main.name
capacity_providers = ["FARGATE", "FARGATE_SPOT"]
default_capacity_provider_strategy {
base = 1
weight = 100
capacity_provider = "FARGATE"
}
}
# Task Definition
resource "aws_ecs_task_definition" "main" {
family = var.service_name
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = var.cpu
memory = var.memory
execution_role_arn = aws_iam_role.ecs_execution.arn
task_role_arn = aws_iam_role.ecs_task.arn
container_definitions = jsonencode([
{
name = var.container_name
image = var.container_image
portMappings = [
{
containerPort = var.container_port
protocol = "tcp"
}
]
environment = [
for key, value in var.environment_variables : {
name = key
value = value
}
]
secrets = [
for key, value in var.secrets : {
name = key
valueFrom = value
}
]
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-group = aws_cloudwatch_log_group.app.name
awslogs-region = data.aws_region.current.name
awslogs-stream-prefix = "ecs"
}
}
healthCheck = var.health_check_command != null ? {
command = var.health_check_command
interval = 30
timeout = 5
retries = 3
startPeriod = 60
} : null
essential = true
}
])
tags = var.tags
}
# ECS Service
resource "aws_ecs_service" "main" {
name = var.service_name
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.main.arn
desired_count = var.desired_count
capacity_provider_strategy {
capacity_provider = "FARGATE"
weight = var.fargate_weight
base = var.fargate_base
}
capacity_provider_strategy {
capacity_provider = "FARGATE_SPOT"
weight = var.fargate_spot_weight
}
network_configuration {
security_groups = concat([aws_security_group.ecs_service.id], var.additional_security_groups)
subnets = var.subnet_ids
assign_public_ip = var.assign_public_ip
}
dynamic "load_balancer" {
for_each = var.target_group_arn != null ? [1] : []
content {
target_group_arn = var.target_group_arn
container_name = var.container_name
container_port = var.container_port
}
}
# Deployment limits are top-level arguments on aws_ecs_service
deployment_maximum_percent = 200
deployment_minimum_healthy_percent = 100
enable_execute_command = var.enable_execute_command
tags = var.tags
depends_on = [
aws_iam_role_policy_attachment.ecs_execution,
aws_cloudwatch_log_group.app
]
}
# Auto Scaling
resource "aws_appautoscaling_target" "ecs_target" {
count = var.enable_autoscaling ? 1 : 0
max_capacity = var.max_capacity
min_capacity = var.min_capacity
resource_id = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.main.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
resource "aws_appautoscaling_policy" "ecs_policy_cpu" {
count = var.enable_autoscaling ? 1 : 0
name = "${var.service_name}-cpu-scaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.ecs_target[0].resource_id
scalable_dimension = aws_appautoscaling_target.ecs_target[0].scalable_dimension
service_namespace = aws_appautoscaling_target.ecs_target[0].service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
target_value = var.cpu_target_value
}
}
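CPU is rarely the only scaling signal for containerized workloads; memory pressure is just as common. A companion policy follows the same pattern; this sketch assumes a memory_target_value variable alongside the existing inputs:
resource "aws_appautoscaling_policy" "ecs_policy_memory" {
  count              = var.enable_autoscaling ? 1 : 0
  name               = "${var.service_name}-memory-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target[0].resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target[0].scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target[0].service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    # Assumed variable: target average memory utilization percentage
    target_value = var.memory_target_value
  }
}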
What’s Next
AWS-specific modules provide the building blocks for consistent, well-architected infrastructure, but monitoring and maintaining that infrastructure requires comprehensive observability and compliance automation.
In the next part, we’ll explore monitoring and compliance patterns that provide visibility into your AWS infrastructure, automate compliance checks, and integrate with AWS native monitoring services.
Monitoring and Compliance
Effective monitoring and compliance automation are essential for maintaining reliable, secure AWS infrastructure at scale. Terraform enables you to implement comprehensive observability and compliance controls as code, ensuring consistent monitoring across all your resources and automated compliance validation.
This part covers patterns for implementing monitoring, logging, alerting, and compliance automation using AWS native services and Terraform.
CloudWatch Monitoring Foundation
Establish comprehensive CloudWatch monitoring for all critical resources:
# CloudWatch Log Groups with proper retention
resource "aws_cloudwatch_log_group" "application_logs" {
for_each = var.log_groups
name = "/aws/${each.key}/${var.application_name}"
retention_in_days = each.value.retention_days
kms_key_id = var.log_encryption_key_id
tags = merge(var.common_tags, {
LogType = each.value.log_type
Application = var.application_name
})
}
# Custom CloudWatch Metrics
resource "aws_cloudwatch_metric_alarm" "application_errors" {
for_each = var.error_alarms
alarm_name = "${var.application_name}-${each.key}-errors"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = each.value.evaluation_periods
metric_name = each.value.metric_name
namespace = each.value.namespace
period = each.value.period
statistic = each.value.statistic
threshold = each.value.threshold
alarm_description = "High error rate for ${each.key}"
treat_missing_data = "notBreaching"
dimensions = each.value.dimensions
alarm_actions = [
aws_sns_topic.alerts.arn
]
ok_actions = [
aws_sns_topic.alerts.arn
]
tags = var.common_tags
}
# Composite Alarms for complex conditions
resource "aws_cloudwatch_composite_alarm" "application_health" {
alarm_name = "${var.application_name}-overall-health"
alarm_description = "Overall application health based on multiple metrics"
alarm_rule = join(" OR ", [
for alarm in aws_cloudwatch_metric_alarm.application_errors :
"ALARM(${alarm.alarm_name})"
])
actions_enabled = true
alarm_actions = [
aws_sns_topic.critical_alerts.arn
]
ok_actions = [
aws_sns_topic.critical_alerts.arn
]
tags = var.common_tags
}
# CloudWatch Dashboard
resource "aws_cloudwatch_dashboard" "application" {
dashboard_name = "${var.application_name}-dashboard"
dashboard_body = jsonencode({
widgets = [
{
type = "metric"
x = 0
y = 0
width = 12
height = 6
properties = {
metrics = [
["AWS/ApplicationELB", "RequestCount", "LoadBalancer", var.load_balancer_arn_suffix],
[".", "TargetResponseTime", ".", "."],
[".", "HTTPCode_Target_2XX_Count", ".", "."],
[".", "HTTPCode_Target_4XX_Count", ".", "."],
[".", "HTTPCode_Target_5XX_Count", ".", "."]
]
view = "timeSeries"
stacked = false
region = data.aws_region.current.name
title = "Load Balancer Metrics"
period = 300
}
},
{
type = "log"
x = 0
y = 6
width = 24
height = 6
properties = {
query = "SOURCE '${aws_cloudwatch_log_group.application_logs["app"].name}' | fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 100"
region = data.aws_region.current.name
title = "Recent Errors"
}
}
]
})
}
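The error alarms above assume the metrics they watch already exist. When errors only appear in application logs, a metric filter can derive a count metric from the log group first; a sketch, assuming the string ERROR appears literally in matching log lines:
resource "aws_cloudwatch_log_metric_filter" "application_errors" {
  name           = "${var.application_name}-error-count"
  log_group_name = aws_cloudwatch_log_group.application_logs["app"].name
  pattern        = "ERROR"

  metric_transformation {
    name      = "ErrorCount"
    namespace = "${var.application_name}/Application"
    value     = "1"
  }
}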
AWS Config for Compliance
Implement AWS Config for continuous compliance monitoring:
# Config Configuration Recorder
resource "aws_config_configuration_recorder" "main" {
name = "${var.organization_name}-config-recorder"
role_arn = aws_iam_role.config.arn
recording_group {
all_supported = true
include_global_resource_types = true
exclusion_by_resource_types {
resource_types = var.config_excluded_resource_types
}
}
depends_on = [aws_config_delivery_channel.main]
}
# Config Delivery Channel
resource "aws_config_delivery_channel" "main" {
name = "${var.organization_name}-config-delivery"
s3_bucket_name = aws_s3_bucket.config_logs.bucket
s3_key_prefix = "config"
snapshot_delivery_properties {
delivery_frequency = "TwentyFour_Hours"
}
}
# Config Rules for Compliance
resource "aws_config_config_rule" "compliance_rules" {
for_each = var.config_rules
name = "${var.organization_name}-${each.key}"
source {
owner = each.value.source_owner
source_identifier = each.value.source_identifier
}
dynamic "source_detail" {
for_each = each.value.source_details
content {
event_source = source_detail.value.event_source
message_type = source_detail.value.message_type
maximum_execution_frequency = source_detail.value.maximum_execution_frequency
}
}
input_parameters = jsonencode(each.value.input_parameters)
depends_on = [aws_config_configuration_recorder.main]
tags = var.common_tags
}
# Config Remediation Configurations
resource "aws_config_remediation_configuration" "auto_remediation" {
for_each = var.auto_remediation_rules
config_rule_name = aws_config_config_rule.compliance_rules[each.key].name
resource_type = each.value.resource_type
target_type = "SSM_DOCUMENT"
target_id = each.value.ssm_document_name
target_version = "1"
parameter {
name = "AutomationAssumeRole"
static_value = aws_iam_role.config_remediation.arn
}
dynamic "parameter" {
for_each = each.value.parameters
content {
name = parameter.key
static_value = parameter.value
}
}
automatic = each.value.automatic
maximum_automatic_attempts = each.value.maximum_automatic_attempts
}
# Config Conformance Packs
resource "aws_config_conformance_pack" "security_pack" {
name = "${var.organization_name}-security-conformance-pack"
template_body = file("${path.module}/conformance-packs/security-pack.yaml")
input_parameter {
parameter_name = "AccessLoggingBucketParameter"
parameter_value = aws_s3_bucket.access_logs.bucket
}
depends_on = [aws_config_configuration_recorder.main]
}
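The recorder above references aws_iam_role.config, which is not shown. A minimal sketch of that role using the AWS managed service-role policy; depending on your recording scope and bucket policy, the delivery bucket may need additional permissions:
resource "aws_iam_role" "config" {
  name = "${var.organization_name}-config-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = "sts:AssumeRole"
        Principal = {
          Service = "config.amazonaws.com"
        }
      }
    ]
  })

  tags = var.common_tags
}

resource "aws_iam_role_policy_attachment" "config" {
  role       = aws_iam_role.config.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWS_ConfigRole"
}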
Security Hub Integration
Centralize security findings with AWS Security Hub:
# Enable Security Hub
resource "aws_securityhub_account" "main" {
enable_default_standards = true
}
# Security Standards Subscriptions
resource "aws_securityhub_standards_subscription" "aws_foundational" {
standards_arn = "arn:aws:securityhub:::ruleset/finding-format/aws-foundational-security-standard/v/1.0.0"
depends_on = [aws_securityhub_account.main]
}
resource "aws_securityhub_standards_subscription" "cis" {
standards_arn = "arn:aws:securityhub:::ruleset/finding-format/cis-aws-foundations-benchmark/v/1.2.0"
depends_on = [aws_securityhub_account.main]
}
resource "aws_securityhub_standards_subscription" "pci_dss" {
count = var.enable_pci_dss ? 1 : 0
standards_arn = "arn:aws:securityhub:::ruleset/finding-format/pci-dss/v/3.2.1"
depends_on = [aws_securityhub_account.main]
}
# Custom Security Hub Insights
resource "aws_securityhub_insight" "high_severity_findings" {
filters {
severity_label {
comparison = "EQUALS"
value = "HIGH"
}
record_state {
comparison = "EQUALS"
value = "ACTIVE"
}
}
group_by_attribute = "ResourceId"
name = "High Severity Active Findings"
depends_on = [aws_securityhub_account.main]
}
# EventBridge Rule for Security Hub Findings
resource "aws_cloudwatch_event_rule" "security_hub_findings" {
name = "${var.organization_name}-security-hub-findings"
description = "Capture Security Hub findings"
event_pattern = jsonencode({
source = ["aws.securityhub"]
detail-type = ["Security Hub Findings - Imported"]
detail = {
findings = {
Severity = {
Label = ["HIGH", "CRITICAL"]
}
RecordState = ["ACTIVE"]
}
}
})
}
resource "aws_cloudwatch_event_target" "security_hub_sns" {
rule = aws_cloudwatch_event_rule.security_hub_findings.name
target_id = "SecurityHubSNSTarget"
arn = aws_sns_topic.security_alerts.arn
}
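Delivery to SNS only works if the topic policy allows EventBridge to publish; a minimal sketch of that policy statement (assumed to be the only policy the topic needs):
resource "aws_sns_topic_policy" "security_alerts_eventbridge" {
  arn = aws_sns_topic.security_alerts.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowEventBridgePublish"
        Effect = "Allow"
        Principal = {
          Service = "events.amazonaws.com"
        }
        Action   = "sns:Publish"
        Resource = aws_sns_topic.security_alerts.arn
      }
    ]
  })
}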
GuardDuty Threat Detection
Implement GuardDuty for threat detection and response:
# Enable GuardDuty
resource "aws_guardduty_detector" "main" {
enable = true
datasources {
s3_logs {
enable = true
}
kubernetes {
audit_logs {
enable = var.enable_eks_audit_logs
}
}
malware_protection {
scan_ec2_instance_with_findings {
ebs_volumes {
enable = true
}
}
}
}
finding_publishing_frequency = "FIFTEEN_MINUTES"
tags = var.common_tags
}
# GuardDuty Threat Intel Set
resource "aws_guardduty_threatintelset" "custom_threats" {
count = length(var.threat_intel_sets) > 0 ? 1 : 0
activate = true
detector_id = aws_guardduty_detector.main.id
format = "TXT"
location = "s3://${aws_s3_bucket.threat_intel[0].bucket}/threat-intel.txt"
name = "${var.organization_name}-custom-threat-intel"
tags = var.common_tags
}
# GuardDuty IP Set for trusted IPs
resource "aws_guardduty_ipset" "trusted_ips" {
count = length(var.trusted_ip_ranges) > 0 ? 1 : 0
activate = true
detector_id = aws_guardduty_detector.main.id
format = "TXT"
location = "s3://${aws_s3_bucket.threat_intel[0].bucket}/trusted-ips.txt"
name = "${var.organization_name}-trusted-ips"
tags = var.common_tags
}
# EventBridge Rule for GuardDuty Findings
resource "aws_cloudwatch_event_rule" "guardduty_findings" {
name = "${var.organization_name}-guardduty-findings"
description = "Capture GuardDuty findings"
event_pattern = jsonencode({
source = ["aws.guardduty"]
detail-type = ["GuardDuty Finding"]
detail = {
# Match every high and critical finding (severity 7.0 and above)
severity = [{ numeric = [">=", 7] }]
}
})
}
resource "aws_cloudwatch_event_target" "guardduty_lambda" {
rule = aws_cloudwatch_event_rule.guardduty_findings.name
target_id = "GuardDutyResponseLambda"
arn = aws_lambda_function.security_response.arn
}
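As with the cleanup function earlier, EventBridge needs explicit permission to invoke the target Lambda, or findings will never reach it. A sketch, assuming aws_lambda_function.security_response is defined elsewhere in the configuration:
resource "aws_lambda_permission" "allow_eventbridge_guardduty" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.security_response.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.guardduty_findings.arn
}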
Automated Compliance Reporting
Generate automated compliance reports:
# Lambda function for compliance reporting
resource "aws_lambda_function" "compliance_reporter" {
filename = "compliance_reporter.zip"
function_name = "${var.organization_name}-compliance-reporter"
role = aws_iam_role.compliance_reporter.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 300
environment {
variables = {
CONFIG_BUCKET = aws_s3_bucket.compliance_reports.bucket
SECURITY_HUB_REGION = data.aws_region.current.name
SNS_TOPIC_ARN = aws_sns_topic.compliance_reports.arn
}
}
tags = var.common_tags
}
# Schedule compliance reporting
resource "aws_cloudwatch_event_rule" "compliance_report" {
name = "${var.organization_name}-compliance-report"
description = "Generate weekly compliance report"
schedule_expression = "cron(0 8 ? * MON *)" # Every Monday at 8 AM
tags = var.common_tags
}
resource "aws_cloudwatch_event_target" "compliance_report" {
rule = aws_cloudwatch_event_rule.compliance_report.name
target_id = "ComplianceReportTarget"
arn = aws_lambda_function.compliance_reporter.arn
}
# S3 bucket for compliance reports
resource "aws_s3_bucket" "compliance_reports" {
bucket = "${var.organization_name}-compliance-reports-${random_id.bucket_suffix.hex}"
tags = var.common_tags
}
resource "aws_s3_bucket_lifecycle_configuration" "compliance_reports" {
bucket = aws_s3_bucket.compliance_reports.id
rule {
id = "compliance_report_lifecycle"
status = "Enabled"
transition {
days = 90
storage_class = "STANDARD_IA"
}
transition {
days = 365
storage_class = "GLACIER"
}
expiration {
days = 2555 # 7 years retention
}
}
}
Cost and Usage Monitoring
Monitor costs and usage patterns:
# Cost Budget with multiple notifications
resource "aws_budgets_budget" "monthly_cost" {
name = "${var.organization_name}-monthly-cost-budget"
budget_type = "COST"
limit_amount = var.monthly_budget_limit
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "LinkedAccount"
values = [data.aws_caller_identity.current.account_id]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 50
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Usage Budget for specific services
resource "aws_budgets_budget" "ec2_usage" {
name = "${var.organization_name}-ec2-usage-budget"
budget_type = "USAGE"
limit_amount = var.ec2_usage_limit
limit_unit = "Hrs"
time_unit = "MONTHLY"
cost_filter {
name = "Service"
values = ["Amazon Elastic Compute Cloud - Compute"]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Cost Anomaly Detection
resource "aws_ce_anomaly_detector" "cost_anomaly" {
name = "${var.organization_name}-cost-anomaly-detector"
monitor_type = "DIMENSIONAL"
specification = jsonencode({
Dimension = "SERVICE"
MatchOptions = ["EQUALS"]
Values = ["Amazon Elastic Compute Cloud - Compute", "Amazon Relational Database Service"]
})
tags = var.common_tags
}
resource "aws_ce_anomaly_subscription" "cost_anomaly" {
name = "${var.organization_name}-cost-anomaly-subscription"
frequency = "DAILY"
monitor_arn_list = [
aws_ce_anomaly_detector.cost_anomaly.arn
]
subscriber {
type = "EMAIL"
address = var.cost_anomaly_email
}
threshold_expression {
and {
dimension {
key = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
values = ["100"]
match_options = ["GREATER_THAN_OR_EQUAL"]
}
}
}
tags = var.common_tags
}
Notification and Alerting
Implement comprehensive notification systems:
# SNS Topics for different alert types
resource "aws_sns_topic" "alerts" {
name = "${var.organization_name}-alerts"
tags = var.common_tags
}
resource "aws_sns_topic" "critical_alerts" {
name = "${var.organization_name}-critical-alerts"
tags = var.common_tags
}
resource "aws_sns_topic" "security_alerts" {
name = "${var.organization_name}-security-alerts"
tags = var.common_tags
}
# SNS Topic Subscriptions
resource "aws_sns_topic_subscription" "email_alerts" {
for_each = toset(var.alert_email_addresses)
topic_arn = aws_sns_topic.alerts.arn
protocol = "email"
endpoint = each.value
}
resource "aws_sns_topic_subscription" "slack_alerts" {
count = var.slack_webhook_url != null ? 1 : 0
topic_arn = aws_sns_topic.critical_alerts.arn
protocol = "https"
endpoint = var.slack_webhook_url
}
# Lambda function for alert processing
resource "aws_lambda_function" "alert_processor" {
filename = "alert_processor.zip"
function_name = "${var.organization_name}-alert-processor"
role = aws_iam_role.alert_processor.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 60
environment {
variables = {
SLACK_WEBHOOK_URL = var.slack_webhook_url
TEAMS_WEBHOOK_URL = var.teams_webhook_url
}
}
tags = var.common_tags
}
resource "aws_sns_topic_subscription" "lambda_processor" {
topic_arn = aws_sns_topic.alerts.arn
protocol = "lambda"
endpoint = aws_lambda_function.alert_processor.arn
}
resource "aws_lambda_permission" "allow_sns" {
statement_id = "AllowExecutionFromSNS"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.alert_processor.function_name
principal = "sns.amazonaws.com"
source_arn = aws_sns_topic.alerts.arn
}
What’s Next
Comprehensive monitoring and compliance automation provide the observability and governance needed for production AWS infrastructure. These patterns ensure you can detect issues early, maintain compliance standards, and respond quickly to security threats.
In the final part, we’ll explore advanced AWS service integrations, including EKS, serverless architectures, and complex multi-service patterns that demonstrate how all these concepts work together in real-world applications.
Advanced Integration
Modern AWS architectures combine multiple services in complex patterns—EKS clusters with RDS databases, Lambda functions triggered by S3 events, API Gateway integrations with multiple backends. Terraform excels at orchestrating these complex integrations, but you need to understand service dependencies, data flow patterns, and the operational considerations that make these architectures work reliably.
This final part demonstrates advanced integration patterns that bring together everything you’ve learned about AWS and Terraform.
EKS Cluster with Complete Ecosystem
A production-ready EKS cluster with all supporting services:
# EKS Cluster
resource "aws_eks_cluster" "main" {
name = var.cluster_name
role_arn = aws_iam_role.eks_cluster.arn
version = var.kubernetes_version
vpc_config {
subnet_ids = concat(var.private_subnet_ids, var.public_subnet_ids)
endpoint_private_access = true
endpoint_public_access = var.enable_public_access
public_access_cidrs = var.public_access_cidrs
security_group_ids = [aws_security_group.eks_cluster.id]
}
encryption_config {
provider {
key_arn = aws_kms_key.eks.arn
}
resources = ["secrets"]
}
enabled_cluster_log_types = [
"api", "audit", "authenticator", "controllerManager", "scheduler"
]
depends_on = [
aws_iam_role_policy_attachment.eks_cluster_policy,
aws_iam_role_policy_attachment.eks_vpc_resource_controller,
aws_cloudwatch_log_group.eks_cluster
]
tags = var.tags
}
# EKS Node Groups with mixed instance types
resource "aws_eks_node_group" "main" {
for_each = var.node_groups
cluster_name = aws_eks_cluster.main.name
node_group_name = each.key
node_role_arn = aws_iam_role.eks_node_group.arn
subnet_ids = var.private_subnet_ids
capacity_type = each.value.capacity_type
instance_types = each.value.instance_types
ami_type = each.value.ami_type
# disk_size cannot be combined with a launch template; set the root volume
# size via block_device_mappings in the launch template instead
scaling_config {
desired_size = each.value.desired_size
max_size = each.value.max_size
min_size = each.value.min_size
}
update_config {
max_unavailable_percentage = 25
}
# Launch template for advanced configuration
launch_template {
id = aws_launch_template.eks_nodes[each.key].id
version = aws_launch_template.eks_nodes[each.key].latest_version
}
labels = merge(each.value.labels, {
"node-group" = each.key
})
dynamic "taint" {
for_each = each.value.taints
content {
key = taint.value.key
value = taint.value.value
effect = taint.value.effect
}
}
tags = merge(var.tags, {
"kubernetes.io/cluster/${var.cluster_name}" = "owned"
})
depends_on = [
aws_iam_role_policy_attachment.eks_worker_node_policy,
aws_iam_role_policy_attachment.eks_cni_policy,
aws_iam_role_policy_attachment.eks_container_registry_policy
]
}
# Launch template for EKS nodes
resource "aws_launch_template" "eks_nodes" {
for_each = var.node_groups
name_prefix = "${var.cluster_name}-${each.key}-"
vpc_security_group_ids = [aws_security_group.eks_nodes.id]
user_data = base64encode(templatefile("${path.module}/user_data.sh", {
cluster_name = var.cluster_name
cluster_endpoint = aws_eks_cluster.main.endpoint
cluster_ca = aws_eks_cluster.main.certificate_authority[0].data
bootstrap_arguments = each.value.bootstrap_arguments
}))
tag_specifications {
resource_type = "instance"
tags = merge(var.tags, {
Name = "${var.cluster_name}-${each.key}-node"
})
}
lifecycle {
create_before_destroy = true
}
}
# EKS Add-ons
resource "aws_eks_addon" "addons" {
for_each = var.eks_addons
cluster_name = aws_eks_cluster.main.name
addon_name = each.key
addon_version = each.value.version
resolve_conflicts = "OVERWRITE"
service_account_role_arn = each.value.service_account_role_arn
tags = var.tags
}
# OIDC Identity Provider for service accounts
data "tls_certificate" "eks_oidc" {
url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}
resource "aws_iam_openid_connect_provider" "eks_oidc" {
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.eks_oidc.certificates[0].sha1_fingerprint]
url = aws_eks_cluster.main.identity[0].oidc[0].issuer
tags = var.tags
}
# Service account roles for common services
resource "aws_iam_role" "aws_load_balancer_controller" {
name = "${var.cluster_name}-aws-load-balancer-controller"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRoleWithWebIdentity"
Effect = "Allow"
Principal = {
Federated = aws_iam_openid_connect_provider.eks_oidc.arn
}
Condition = {
StringEquals = {
"${replace(aws_iam_openid_connect_provider.eks_oidc.url, "https://", "")}:sub" = "system:serviceaccount:kube-system:aws-load-balancer-controller"
"${replace(aws_iam_openid_connect_provider.eks_oidc.url, "https://", "")}:aud" = "sts.amazonaws.com"
}
}
}
]
})
tags = var.tags
}
resource "aws_iam_role_policy_attachment" "aws_load_balancer_controller" {
policy_arn = "arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess"
role = aws_iam_role.aws_load_balancer_controller.name
}
Serverless Application with API Gateway
A complete serverless application with API Gateway, Lambda, and DynamoDB:
# API Gateway REST API
resource "aws_api_gateway_rest_api" "main" {
name = var.api_name
description = "Serverless API for ${var.application_name}"
endpoint_configuration {
types = ["REGIONAL"]
}
tags = var.tags
}
# API Gateway Resources and Methods
resource "aws_api_gateway_resource" "users" {
rest_api_id = aws_api_gateway_rest_api.main.id
parent_id = aws_api_gateway_rest_api.main.root_resource_id
path_part = "users"
}
resource "aws_api_gateway_resource" "user_id" {
rest_api_id = aws_api_gateway_rest_api.main.id
parent_id = aws_api_gateway_resource.users.id
path_part = "{id}"
}
# Lambda functions for different operations
resource "aws_lambda_function" "api_functions" {
for_each = var.lambda_functions
filename = each.value.filename
function_name = "${var.application_name}-${each.key}"
role = aws_iam_role.lambda_execution.arn
handler = each.value.handler
runtime = each.value.runtime
timeout = each.value.timeout
memory_size = each.value.memory_size
environment {
variables = merge(each.value.environment_variables, {
DYNAMODB_TABLE = aws_dynamodb_table.main.name
REGION = data.aws_region.current.name
})
}
vpc_config {
subnet_ids = var.lambda_subnet_ids
security_group_ids = [aws_security_group.lambda.id]
}
dead_letter_config {
target_arn = aws_sqs_queue.dlq.arn
}
tags = var.tags
}
# API Gateway Methods and Integrations
resource "aws_api_gateway_method" "users_get" {
rest_api_id = aws_api_gateway_rest_api.main.id
resource_id = aws_api_gateway_resource.users.id
http_method = "GET"
authorization = "AWS_IAM"
request_parameters = {
"method.request.querystring.limit" = false
"method.request.querystring.offset" = false
}
}
resource "aws_api_gateway_integration" "users_get" {
rest_api_id = aws_api_gateway_rest_api.main.id
resource_id = aws_api_gateway_resource.users.id
http_method = aws_api_gateway_method.users_get.http_method
integration_http_method = "POST"
type = "AWS_PROXY"
uri = aws_lambda_function.api_functions["list_users"].invoke_arn
}
# API Gateway Deployment
resource "aws_api_gateway_deployment" "main" {
depends_on = [
aws_api_gateway_integration.users_get,
# Add other integrations here
]
rest_api_id = aws_api_gateway_rest_api.main.id
stage_name = var.api_stage
variables = {
deployed_at = timestamp()
}
lifecycle {
create_before_destroy = true
}
}
# API Gateway Stage with logging and throttling
resource "aws_api_gateway_stage" "main" {
deployment_id = aws_api_gateway_deployment.main.id
rest_api_id = aws_api_gateway_rest_api.main.id
stage_name = var.api_stage
access_log_settings {
destination_arn = aws_cloudwatch_log_group.api_gateway.arn
format = jsonencode({
requestId = "$context.requestId"
ip = "$context.identity.sourceIp"
caller = "$context.identity.caller"
user = "$context.identity.user"
requestTime = "$context.requestTime"
httpMethod = "$context.httpMethod"
resourcePath = "$context.resourcePath"
status = "$context.status"
protocol = "$context.protocol"
responseLength = "$context.responseLength"
})
}
xray_tracing_enabled = true
tags = var.tags
}
# API Gateway Method Settings
resource "aws_api_gateway_method_settings" "main" {
rest_api_id = aws_api_gateway_rest_api.main.id
stage_name = aws_api_gateway_stage.main.stage_name
method_path = "*/*"
settings {
metrics_enabled = true
logging_level = "INFO"
throttling_rate_limit = var.api_throttling_rate_limit
throttling_burst_limit = var.api_throttling_burst_limit
}
}
# DynamoDB Table with Global Secondary Indexes
resource "aws_dynamodb_table" "main" {
name = "${var.application_name}-data"
billing_mode = "PAY_PER_REQUEST"
hash_key = "id"
stream_enabled = true
stream_view_type = "NEW_AND_OLD_IMAGES"
attribute {
name = "id"
type = "S"
}
attribute {
name = "email"
type = "S"
}
attribute {
name = "created_at"
type = "S"
}
global_secondary_index {
name = "email-index"
hash_key = "email"
projection_type = "ALL"
}
global_secondary_index {
name = "created-at-index"
hash_key = "created_at"
projection_type = "ALL"
}
server_side_encryption {
enabled = true
kms_key_arn = aws_kms_key.dynamodb.arn
}
point_in_time_recovery {
enabled = true
}
tags = var.tags
}
# DynamoDB Stream Lambda Trigger
resource "aws_lambda_event_source_mapping" "dynamodb_stream" {
event_source_arn = aws_dynamodb_table.main.stream_arn
function_name = aws_lambda_function.api_functions["stream_processor"].arn
starting_position = "LATEST"
maximum_batching_window_in_seconds = 5
batch_size = 10
parallelization_factor = 2
}
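Event source mappings poll the stream on Lambda's behalf, so the function's execution role, not a resource policy, needs read access to the stream. A sketch, assuming the aws_iam_role.lambda_execution role referenced above:
resource "aws_iam_role_policy" "dynamodb_stream_read" {
  name = "${var.application_name}-dynamodb-stream-read"
  role = aws_iam_role.lambda_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "dynamodb:DescribeStream",
          "dynamodb:GetRecords",
          "dynamodb:GetShardIterator",
          "dynamodb:ListStreams"
        ]
        Resource = aws_dynamodb_table.main.stream_arn
      }
    ]
  })
}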
Data Pipeline with S3, Lambda, and RDS
A data processing pipeline that demonstrates event-driven architecture:
# S3 Bucket for data ingestion
resource "aws_s3_bucket" "data_ingestion" {
bucket = "${var.application_name}-data-ingestion-${random_id.bucket_suffix.hex}"
tags = var.tags
}
resource "aws_s3_bucket_notification" "data_ingestion" {
bucket = aws_s3_bucket.data_ingestion.id
lambda_function {
lambda_function_arn = aws_lambda_function.data_processor.arn
events = ["s3:ObjectCreated:*"]
filter_prefix = "incoming/"
filter_suffix = ".json"
}
depends_on = [aws_lambda_permission.s3_invoke]
}
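# The notification above depends on aws_lambda_permission.s3_invoke, which is
# not shown; a minimal sketch based on the references in this pipeline:
resource "aws_lambda_permission" "s3_invoke" {
statement_id = "AllowExecutionFromS3"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.data_processor.function_name
principal = "s3.amazonaws.com"
source_arn = aws_s3_bucket.data_ingestion.arn
}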
# Lambda function for data processing
resource "aws_lambda_function" "data_processor" {
filename = "data_processor.zip"
function_name = "${var.application_name}-data-processor"
role = aws_iam_role.data_processor.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 300
memory_size = 1024
environment {
variables = {
RDS_ENDPOINT = aws_db_instance.analytics.endpoint
RDS_DATABASE = aws_db_instance.analytics.db_name
S3_BUCKET = aws_s3_bucket.processed_data.bucket
SQS_QUEUE = aws_sqs_queue.processing_queue.url
}
}
vpc_config {
subnet_ids = var.lambda_subnet_ids
security_group_ids = [aws_security_group.lambda_data_processor.id]
}
dead_letter_config {
target_arn = aws_sqs_queue.processing_dlq.arn
}
tags = var.tags
}
# RDS Instance for analytics
resource "aws_db_instance" "analytics" {
identifier = "${var.application_name}-analytics"
engine = "postgres"
engine_version = "14.9"
instance_class = var.analytics_db_instance_class
allocated_storage = var.analytics_db_storage
max_allocated_storage = var.analytics_db_max_storage
storage_type = "gp3"
storage_encrypted = true
kms_key_id = aws_kms_key.rds.arn
db_name = "analytics"
username = "analytics_user"
password = random_password.analytics_db.result
db_subnet_group_name = aws_db_subnet_group.analytics.name
vpc_security_group_ids = [aws_security_group.analytics_db.id]
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "sun:04:00-sun:05:00"
multi_az = var.environment == "production"
monitoring_interval = 60
monitoring_role_arn = aws_iam_role.rds_monitoring.arn
performance_insights_enabled = true
performance_insights_kms_key_id = aws_kms_key.rds.arn
deletion_protection = var.environment == "production"
skip_final_snapshot = var.environment != "production"
tags = var.tags
}
# SQS Queue for processing coordination
resource "aws_sqs_queue" "processing_queue" {
name = "${var.application_name}-processing-queue"
delay_seconds = 0
max_message_size = 262144
message_retention_seconds = 1209600 # 14 days
receive_wait_time_seconds = 20
redrive_policy = jsonencode({
deadLetterTargetArn = aws_sqs_queue.processing_dlq.arn
maxReceiveCount = 3
})
tags = var.tags
}
resource "aws_sqs_queue" "processing_dlq" {
name = "${var.application_name}-processing-dlq"
tags = var.tags
}
# EventBridge for workflow orchestration
resource "aws_cloudwatch_event_rule" "data_processing_workflow" {
name = "${var.application_name}-data-processing-workflow"
description = "Orchestrate data processing workflow"
event_pattern = jsonencode({
source = ["custom.dataprocessing"]
detail-type = ["Data Processing Complete"]
})
tags = var.tags
}
resource "aws_cloudwatch_event_target" "start_analytics" {
rule = aws_cloudwatch_event_rule.data_processing_workflow.name
target_id = "StartAnalyticsTarget"
arn = aws_lambda_function.analytics_processor.arn
}
# Step Functions for complex workflows
resource "aws_sfn_state_machine" "data_pipeline" {
name = "${var.application_name}-data-pipeline"
role_arn = aws_iam_role.step_functions.arn
definition = jsonencode({
Comment = "Data processing pipeline"
StartAt = "ProcessData"
States = {
ProcessData = {
Type = "Task"
Resource = aws_lambda_function.data_processor.arn
Next = "CheckProcessingResult"
Retry = [
{
ErrorEquals = ["Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"]
IntervalSeconds = 2
MaxAttempts = 6
BackoffRate = 2
}
]
}
CheckProcessingResult = {
Type = "Choice"
Choices = [
{
Variable = "$.status"
StringEquals = "SUCCESS"
Next = "RunAnalytics"
}
]
Default = "ProcessingFailed"
}
RunAnalytics = {
Type = "Task"
Resource = aws_lambda_function.analytics_processor.arn
End = true
}
ProcessingFailed = {
Type = "Fail"
Cause = "Data processing failed"
}
}
})
tags = var.tags
}
Multi-Service Integration with Service Discovery
Complex service integration using AWS Cloud Map:
# Service Discovery Namespace
resource "aws_service_discovery_private_dns_namespace" "main" {
name = "${var.application_name}.local"
description = "Service discovery for ${var.application_name}"
vpc = var.vpc_id
tags = var.tags
}
# Service Discovery Services
resource "aws_service_discovery_service" "services" {
for_each = var.services
name = each.key
dns_config {
namespace_id = aws_service_discovery_private_dns_namespace.main.id
dns_records {
ttl = 10
type = "A"
}
routing_policy = "MULTIVALUE"
}
# Cloud Map relies on a custom health check here; ECS reports instance health
health_check_custom_config {
failure_threshold = 1
}
tags = var.tags
}
# ECS Services with Service Discovery
resource "aws_ecs_service" "microservices" {
for_each = var.services
name = each.key
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.microservices[each.key].arn
desired_count = each.value.desired_count
network_configuration {
security_groups = [aws_security_group.microservices[each.key].id]
subnets = var.private_subnet_ids
}
service_registries {
registry_arn = aws_service_discovery_service.services[each.key].arn
}
load_balancer {
target_group_arn = aws_lb_target_group.microservices[each.key].arn
container_name = each.key
container_port = each.value.port
}
depends_on = [aws_lb_listener.microservices]
tags = var.tags
}
# Application Load Balancer with path-based routing
resource "aws_lb" "microservices" {
name = "${var.application_name}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = var.public_subnet_ids
enable_deletion_protection = var.environment == "production"
tags = var.tags
}
resource "aws_lb_listener" "microservices" {
load_balancer_arn = aws_lb.microservices.arn
port = "443"
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-TLS-1-2-2017-01"
certificate_arn = var.certificate_arn
default_action {
type = "fixed-response"
fixed_response {
content_type = "text/plain"
message_body = "Service not found"
status_code = "404"
}
}
}
# Listener rules for path-based routing
resource "aws_lb_listener_rule" "microservices" {
for_each = var.services
listener_arn = aws_lb_listener.microservices.arn
priority = each.value.priority
action {
type = "forward"
target_group_arn = aws_lb_target_group.microservices[each.key].arn
}
condition {
path_pattern {
values = each.value.path_patterns
}
}
}
Final Integration Example
A complete example that brings together multiple services:
# Main application module that uses all components
module "complete_application" {
source = "./modules/complete-application"
# Basic configuration
application_name = "my-app"
environment = "production"
# Network configuration
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
public_subnet_ids = module.vpc.public_subnet_ids
# EKS configuration
enable_eks = true
eks_config = {
kubernetes_version = "1.28"
node_groups = {
general = {
instance_types = ["t3.medium", "t3.large"]
capacity_type = "ON_DEMAND"
desired_size = 3
max_size = 10
min_size = 1
}
spot = {
instance_types = ["t3.medium", "t3.large", "t3.xlarge"]
capacity_type = "SPOT"
desired_size = 2
max_size = 20
min_size = 0
}
}
}
# Database configuration
database_config = {
engine = "postgres"
instance_class = "db.r5.large"
multi_az = true
backup_retention_period = 7
}
# Serverless configuration
enable_serverless = true
lambda_functions = {
api_handler = {
runtime = "python3.9"
handler = "app.handler"
memory_size = 512
timeout = 30
}
data_processor = {
runtime = "python3.9"
handler = "processor.handler"
memory_size = 1024
timeout = 300
}
}
# Monitoring configuration
monitoring_config = {
enable_detailed_monitoring = true
log_retention_days = 30
enable_xray_tracing = true
}
# Security configuration
security_config = {
enable_guardduty = true
enable_security_hub = true
enable_config = true
}
tags = {
Environment = "production"
Project = "my-app"
ManagedBy = "terraform"
}
}
Conclusion
This comprehensive guide has covered the essential patterns for using Terraform with AWS, from basic provider setup to complex multi-service architectures. The key to success with AWS and Terraform is understanding not just the individual services, but how they work together to create reliable, scalable, and secure systems.
The patterns and practices covered in this guide provide a foundation for building production-ready AWS infrastructure that scales with your organization’s needs while maintaining security, compliance, and operational excellence.