Multi-Cloud Terraform: Managing Infrastructure Across Providers
Multi-cloud strategies are becoming increasingly common as organizations seek to avoid vendor lock-in, leverage best-of-breed services, and meet compliance requirements. However, managing infrastructure across multiple cloud providers introduces complexity in networking, identity management, monitoring, and operational processes.
This guide covers the patterns and practices for successfully implementing multi-cloud infrastructure with Terraform, from basic provider configuration to advanced cross-cloud networking and unified governance.
Multi-Provider Setup
Managing infrastructure across multiple cloud providers requires careful planning of provider configurations, authentication strategies, and resource organization. Each cloud provider has different authentication mechanisms, regional structures, and service offerings that need to be coordinated in a unified Terraform configuration.
This part covers the foundational patterns for multi-cloud Terraform configurations, from basic provider setup to advanced authentication and resource management strategies.
Multi-Provider Configuration
A typical multi-cloud setup involves configuring multiple providers with appropriate aliases and authentication:
terraform {
required_version = ">= 1.6"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.20"
}
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.70"
}
google = {
source = "hashicorp/google"
version = "~> 4.80"
}
}
}
# AWS Provider Configuration
provider "aws" {
region = var.aws_region
alias = "primary"
default_tags {
tags = local.common_tags
}
}
provider "aws" {
region = var.aws_secondary_region
alias = "secondary"
default_tags {
tags = local.common_tags
}
}
# Azure Provider Configuration
provider "azurerm" {
features {}
subscription_id = var.azure_subscription_id
tenant_id = var.azure_tenant_id
# Use managed identity when running in Azure
use_msi = var.use_azure_msi
}
# Google Cloud Provider Configuration
provider "google" {
project = var.gcp_project_id
region = var.gcp_region
# Use service account key or application default credentials
credentials = var.gcp_credentials_file
}
provider "google" {
project = var.gcp_project_id
region = var.gcp_secondary_region
alias = "secondary"
credentials = var.gcp_credentials_file
}
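With the aliased configurations in place, individual resources select a configuration explicitly through the provider meta-argument. A minimal sketch (the bucket names are placeholders, not from the original configuration):
# Resources pick an aliased configuration with the provider meta-argument.
resource "aws_s3_bucket" "primary_artifacts" {
  provider = aws.primary
  bucket   = "example-artifacts-primary" # placeholder name
}

resource "aws_s3_bucket" "secondary_artifacts" {
  provider = aws.secondary
  bucket   = "example-artifacts-secondary" # placeholder name
}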
Authentication Strategies
Different providers require different authentication approaches:
AWS Authentication:
# Method 1: IAM Roles (recommended for production)
provider "aws" {
region = "us-west-2"
assume_role {
role_arn = "arn:aws:iam::123456789012:role/TerraformRole"
}
}
# Method 2: Environment variables
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
# Method 3: AWS CLI profiles
# AWS_PROFILE=production terraform apply
Azure Authentication:
# Method 1: Service Principal
provider "azurerm" {
features {}
subscription_id = var.azure_subscription_id
client_id = var.azure_client_id
client_secret = var.azure_client_secret
tenant_id = var.azure_tenant_id
}
# Method 2: Managed Identity (when running in Azure)
provider "azurerm" {
features {}
use_msi = true
}
# Method 3: Azure CLI authentication
# az login && terraform apply
Google Cloud Authentication:
# Method 1: Service Account Key
provider "google" {
project = var.gcp_project_id
region = var.gcp_region
credentials = file("path/to/service-account-key.json")
}
# Method 2: Application Default Credentials
provider "google" {
project = var.gcp_project_id
region = var.gcp_region
# Uses gcloud application-default login
}
# Method 3: Workload Identity (when running in GKE)
provider "google" {
project = var.gcp_project_id
region = var.gcp_region
# Automatically uses workload identity
}
Environment-Specific Provider Configuration
Different environments often require different provider configurations:
# variables.tf
variable "environment" {
description = "Environment name"
type = string
}
variable "cloud_providers" {
description = "Cloud providers to use by environment"
type = map(object({
aws_enabled = bool
azure_enabled = bool
gcp_enabled = bool
aws_region = string
azure_region = string
gcp_region = string
}))
default = {
dev = {
aws_enabled = true
azure_enabled = false
gcp_enabled = false
aws_region = "us-west-2"
azure_region = "West US 2"
gcp_region = "us-west1"
}
staging = {
aws_enabled = true
azure_enabled = true
gcp_enabled = false
aws_region = "us-west-2"
azure_region = "West US 2"
gcp_region = "us-west1"
}
production = {
aws_enabled = true
azure_enabled = true
gcp_enabled = true
aws_region = "us-west-2"
azure_region = "West US 2"
gcp_region = "us-west1"
}
}
}
# main.tf
locals {
config = var.cloud_providers[var.environment]
common_tags = {
Environment = var.environment
ManagedBy = "terraform"
Project = var.project_name
}
}
# Provider blocks do not support count or for_each, so they cannot be created
# conditionally. Configure each provider you might use and apply the
# enabled/disabled flags at the resource level instead (see the count
# arguments on the resources below).
provider "aws" {
  region = local.config.aws_region
  default_tags {
    tags = local.common_tags
  }
}
provider "azurerm" {
  features {}
}
provider "google" {
  project = var.gcp_project_id
  region  = local.config.gcp_region
}
Resource Organization Patterns
Organize multi-cloud resources for maintainability:
# AWS Resources
resource "aws_vpc" "main" {
count = local.config.aws_enabled ? 1 : 0
provider = aws
cidr_block = "10.0.0.0/16"
tags = merge(local.common_tags, {
Name = "${var.project_name}-aws-vpc"
Provider = "aws"
})
}
# Azure Resources
resource "azurerm_resource_group" "main" {
count = local.config.azure_enabled ? 1 : 0
provider = azurerm
name = "${var.project_name}-rg"
location = local.config.azure_region
tags = merge(local.common_tags, {
Provider = "azure"
})
}
resource "azurerm_virtual_network" "main" {
count = local.config.azure_enabled ? 1 : 0
provider = azurerm
name = "${var.project_name}-vnet"
address_space = ["10.1.0.0/16"]
location = azurerm_resource_group.main[0].location
resource_group_name = azurerm_resource_group.main[0].name
tags = merge(local.common_tags, {
Provider = "azure"
})
}
# Google Cloud Resources
resource "google_compute_network" "main" {
count = local.config.gcp_enabled ? 1 : 0
provider = google
name = "${var.project_name}-vpc"
auto_create_subnetworks = false
labels = {
environment = var.environment
managed_by = "terraform"
project = var.project_name
provider = "gcp"
}
}
Cross-Provider Data Sharing
Share data between providers using outputs and data sources:
# outputs.tf
output "network_info" {
description = "Network information across all providers"
value = {
aws = local.config.aws_enabled ? {
vpc_id = aws_vpc.main[0].id
vpc_cidr_block = aws_vpc.main[0].cidr_block
region = local.config.aws_region
} : null
azure = local.config.azure_enabled ? {
vnet_id = azurerm_virtual_network.main[0].id
vnet_address_space = azurerm_virtual_network.main[0].address_space
resource_group = azurerm_resource_group.main[0].name
region = local.config.azure_region
} : null
gcp = local.config.gcp_enabled ? {
network_id = google_compute_network.main[0].id
network_name = google_compute_network.main[0].name
region = local.config.gcp_region
} : null
}
}
# Use in other configurations
data "terraform_remote_state" "network" {
backend = "s3"
config = {
bucket = "company-terraform-state"
key = "network/terraform.tfstate"
region = "us-west-2"
}
}
locals {
aws_vpc_id = data.terraform_remote_state.network.outputs.network_info.aws.vpc_id
azure_vnet_id = data.terraform_remote_state.network.outputs.network_info.azure.vnet_id
gcp_network_id = data.terraform_remote_state.network.outputs.network_info.gcp.network_id
}
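If a provider is disabled for the current environment, the corresponding object in network_info is null and the lookups above would fail. A safer variant (a sketch) wraps each lookup in try() so disabled providers simply yield null:
locals {
  aws_vpc_id     = try(data.terraform_remote_state.network.outputs.network_info.aws.vpc_id, null)
  azure_vnet_id  = try(data.terraform_remote_state.network.outputs.network_info.azure.vnet_id, null)
  gcp_network_id = try(data.terraform_remote_state.network.outputs.network_info.gcp.network_id, null)
}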
Provider Version Management
Pin provider versions for consistency across clouds:
terraform {
required_version = ">= 1.6"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.20"
}
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.70"
}
google = {
source = "hashicorp/google"
version = "~> 4.80"
}
random = {
source = "hashicorp/random"
version = "~> 3.4"
}
tls = {
source = "hashicorp/tls"
version = "~> 4.0"
}
}
}
# The dependency lock file (.terraform.lock.hcl) is generated automatically by
# terraform init; commit it, and run `terraform providers lock` with -platform
# flags to record checksums for every platform your team uses.
Multi-Cloud Module Structure
Organize modules for multi-cloud scenarios:
modules/
├── multi-cloud-network/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── aws.tf
│ ├── azure.tf
│ └── gcp.tf
├── cloud-agnostic-database/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── aws-rds.tf
│ ├── azure-sql.tf
│ └── gcp-sql.tf
└── monitoring/
├── main.tf
├── variables.tf
├── outputs.tf
├── aws-cloudwatch.tf
├── azure-monitor.tf
└── gcp-monitoring.tf
Multi-cloud network module example:
# modules/multi-cloud-network/main.tf
variable "providers_config" {
description = "Configuration for each cloud provider"
type = object({
aws_enabled = bool
azure_enabled = bool
gcp_enabled = bool
})
}
variable "network_cidrs" {
description = "CIDR blocks for each provider"
type = object({
aws = string
azure = string
gcp = string
})
default = {
aws = "10.0.0.0/16"
azure = "10.1.0.0/16"
gcp = "10.2.0.0/16"
}
}
# AWS networking resources
resource "aws_vpc" "main" {
count = var.providers_config.aws_enabled ? 1 : 0
cidr_block = var.network_cidrs.aws
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.name_prefix}-aws-vpc"
Provider = "aws"
}
}
# Azure networking resources
resource "azurerm_virtual_network" "main" {
count = var.providers_config.azure_enabled ? 1 : 0
name = "${var.name_prefix}-vnet"
address_space = [var.network_cidrs.azure]
location = var.azure_location
resource_group_name = var.azure_resource_group_name
tags = {
Provider = "azure"
}
}
# GCP networking resources
resource "google_compute_network" "main" {
  count = var.providers_config.gcp_enabled ? 1 : 0
  name = "${var.name_prefix}-vpc"
  auto_create_subnetworks = false
  # google_compute_network does not support labels; use the description for metadata
  description = "Multi-cloud network managed by Terraform"
}
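The module above also references var.name_prefix, var.azure_location, and var.azure_resource_group_name, which are not shown in the excerpt. Declaring them keeps the module self-contained (a sketch using the names the resources already reference):
variable "name_prefix" {
  description = "Prefix applied to resource names"
  type        = string
}

variable "azure_location" {
  description = "Azure location for the virtual network"
  type        = string
  default     = ""
}

variable "azure_resource_group_name" {
  description = "Existing Azure resource group for the virtual network"
  type        = string
  default     = ""
}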
Error Handling and Debugging
Multi-cloud configurations can be complex to debug:
# Enable detailed logging
export TF_LOG=DEBUG
export TF_LOG_PATH=terraform.log
# Test provider authentication
terraform providers
# Validate configuration
terraform validate
# Plan with specific providers
terraform plan -target="aws_vpc.main"
terraform plan -target="azurerm_virtual_network.main"
terraform plan -target="google_compute_network.main"
# Check provider plugin cache
ls -la .terraform/providers/
Provider-specific debugging:
# AWS debugging
aws sts get-caller-identity
aws configure list
# Azure debugging
az account show
az account list
# GCP debugging
gcloud auth list
gcloud config list
gcloud projects list
What’s Next
Multi-provider setup provides the foundation for multi-cloud infrastructure, but the real challenges emerge when you need to connect networks across different cloud providers. Cross-cloud networking requires understanding each provider’s networking model and implementing secure, performant connections.
In the next part, we’ll explore cross-cloud networking patterns, including VPN connections, private peering, and hybrid connectivity solutions that enable seamless communication across AWS, Azure, and Google Cloud.
Cross-Cloud Networking
Connecting networks across different cloud providers is one of the most complex aspects of multi-cloud architecture. Each provider has different networking models, security requirements, and connectivity options. This part covers practical patterns for establishing secure, reliable connections between AWS, Azure, and GCP.
AWS-Azure VPN Connection
Establish site-to-site VPN between AWS and Azure:
# AWS side configuration
data "aws_availability_zones" "available" {
state = "available"
}
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "aws-vpc"
}
}
resource "aws_subnet" "private" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = data.aws_availability_zones.available.names[0]
tags = {
Name = "aws-private-subnet"
}
}
resource "aws_vpn_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "aws-vpn-gateway"
}
}
resource "aws_customer_gateway" "azure" {
bgp_asn = 65000
ip_address = azurerm_public_ip.vpn_gateway.ip_address
type = "ipsec.1"
tags = {
Name = "azure-customer-gateway"
}
}
resource "aws_vpn_connection" "azure" {
vpn_gateway_id = aws_vpn_gateway.main.id
customer_gateway_id = aws_customer_gateway.azure.id
type = "ipsec.1"
static_routes_only = true
tags = {
Name = "aws-azure-vpn"
}
}
resource "aws_vpn_connection_route" "azure" {
vpn_connection_id = aws_vpn_connection.azure.id
destination_cidr_block = "10.1.0.0/16"
}
# Azure side configuration
resource "azurerm_resource_group" "main" {
name = "multi-cloud-rg"
location = "East US"
}
resource "azurerm_virtual_network" "main" {
name = "azure-vnet"
address_space = ["10.1.0.0/16"]
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
}
resource "azurerm_subnet" "gateway" {
name = "GatewaySubnet"
resource_group_name = azurerm_resource_group.main.name
virtual_network_name = azurerm_virtual_network.main.name
address_prefixes = ["10.1.255.0/27"]
}
resource "azurerm_public_ip" "vpn_gateway" {
name = "azure-vpn-gateway-ip"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
allocation_method = "Static"
sku = "Standard"
}
resource "azurerm_virtual_network_gateway" "main" {
name = "azure-vpn-gateway"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
type = "Vpn"
vpn_type = "RouteBased"
active_active = false
enable_bgp = false
sku = "VpnGw1"
ip_configuration {
name = "vnetGatewayConfig"
public_ip_address_id = azurerm_public_ip.vpn_gateway.id
private_ip_address_allocation = "Dynamic"
subnet_id = azurerm_subnet.gateway.id
}
}
resource "azurerm_local_network_gateway" "aws" {
name = "aws-local-gateway"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
gateway_address = aws_vpn_connection.azure.tunnel1_address
address_space = ["10.0.0.0/16"]
}
resource "azurerm_virtual_network_gateway_connection" "aws" {
name = "azure-aws-connection"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
type = "IPsec"
virtual_network_gateway_id = azurerm_virtual_network_gateway.main.id
local_network_gateway_id = azurerm_local_network_gateway.aws.id
shared_key = aws_vpn_connection.azure.tunnel1_preshared_key
}
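Each aws_vpn_connection exposes two tunnels, and the configuration above only wires up tunnel 1. A sketch of the tunnel 2 counterpart (mirroring the resources above) adds tunnel-level redundancy:
resource "azurerm_local_network_gateway" "aws_tunnel2" {
  name                = "aws-local-gateway-tunnel2"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  gateway_address     = aws_vpn_connection.azure.tunnel2_address
  address_space       = ["10.0.0.0/16"]
}

resource "azurerm_virtual_network_gateway_connection" "aws_tunnel2" {
  name                       = "azure-aws-connection-tunnel2"
  location                   = azurerm_resource_group.main.location
  resource_group_name        = azurerm_resource_group.main.name
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.main.id
  local_network_gateway_id   = azurerm_local_network_gateway.aws_tunnel2.id
  shared_key                 = aws_vpn_connection.azure.tunnel2_preshared_key
}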
AWS-GCP VPN Connectivity
Establish a site-to-site VPN between AWS and GCP (a dedicated interconnect follows a similar pattern but uses the Cloud Interconnect and Direct Connect services):
# GCP side configuration
resource "google_compute_network" "main" {
name = "gcp-vpc"
auto_create_subnetworks = false
}
resource "google_compute_subnetwork" "private" {
name = "gcp-private-subnet"
ip_cidr_range = "10.2.1.0/24"
region = "us-central1"
network = google_compute_network.main.id
}
resource "google_compute_router" "main" {
name = "gcp-router"
region = "us-central1"
network = google_compute_network.main.id
bgp {
asn = 64512
}
}
resource "google_compute_vpn_gateway" "main" {
name = "gcp-vpn-gateway"
network = google_compute_network.main.id
region = "us-central1"
}
resource "google_compute_address" "vpn_static_ip" {
name = "gcp-vpn-ip"
region = "us-central1"
}
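The tunnel below depends on ESP, UDP 500, and UDP 4500 forwarding rules that classic Cloud VPN gateways require but that the excerpt does not show; it also references aws_vpn_connection.gcp, which would be defined on the AWS side analogously to the Azure example. A minimal sketch of the forwarding rules:
resource "google_compute_forwarding_rule" "esp" {
  name        = "gcp-vpn-esp"
  region      = "us-central1"
  ip_protocol = "ESP"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.main.id
}

resource "google_compute_forwarding_rule" "udp500" {
  name        = "gcp-vpn-udp500"
  region      = "us-central1"
  ip_protocol = "UDP"
  port_range  = "500"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.main.id
}

resource "google_compute_forwarding_rule" "udp4500" {
  name        = "gcp-vpn-udp4500"
  region      = "us-central1"
  ip_protocol = "UDP"
  port_range  = "4500"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.main.id
}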
resource "google_compute_vpn_tunnel" "aws" {
name = "gcp-aws-tunnel"
peer_ip = aws_vpn_connection.gcp.tunnel1_address
shared_secret = aws_vpn_connection.gcp.tunnel1_preshared_key
target_vpn_gateway = google_compute_vpn_gateway.main.id
depends_on = [
google_compute_forwarding_rule.esp,
google_compute_forwarding_rule.udp500,
google_compute_forwarding_rule.udp4500,
]
}
resource "google_compute_route" "aws" {
name = "route-to-aws"
network = google_compute_network.main.name
dest_range = "10.0.0.0/16"
priority = 1000
next_hop_vpn_tunnel = google_compute_vpn_tunnel.aws.id
}
Multi-Cloud Transit Gateway
Create a hub-and-spoke network topology:
# Central transit hub in AWS
resource "aws_ec2_transit_gateway" "hub" {
description = "Multi-cloud transit hub"
tags = {
Name = "multi-cloud-tgw"
}
}
resource "aws_ec2_transit_gateway_vpc_attachment" "aws_vpc" {
subnet_ids = [aws_subnet.private.id]
transit_gateway_id = aws_ec2_transit_gateway.hub.id
vpc_id = aws_vpc.main.id
tags = {
Name = "aws-vpc-attachment"
}
}
resource "aws_ec2_transit_gateway_vpn_attachment" "azure" {
vpn_connection_id = aws_vpn_connection.azure.id
transit_gateway_id = aws_ec2_transit_gateway.hub.id
tags = {
Name = "azure-vpn-attachment"
}
}
resource "aws_ec2_transit_gateway_route_table" "main" {
transit_gateway_id = aws_ec2_transit_gateway.hub.id
tags = {
Name = "multi-cloud-route-table"
}
}
resource "aws_ec2_transit_gateway_route" "azure" {
destination_cidr_block = "10.1.0.0/16"
transit_gateway_attachment_id = aws_ec2_transit_gateway_vpn_attachment.azure.id
transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.main.id
}
Network Automation Script
Automate cross-cloud network setup and validation:
#!/bin/bash
# scripts/setup-cross-cloud-network.sh
set -e
AWS_REGION=${1:-"us-west-2"}
AZURE_REGION=${2:-"East US"}
GCP_REGION=${3:-"us-central1"}
setup_aws_networking() {
echo "Setting up AWS networking..."
cd aws/
terraform init
terraform plan -var="region=$AWS_REGION"
terraform apply -auto-approve -var="region=$AWS_REGION"
# Export connection details
terraform output -json > ../aws-outputs.json
cd ..
}
setup_azure_networking() {
echo "Setting up Azure networking..."
# Import AWS VPN details
AWS_VPN_IP=$(jq -r '.vpn_tunnel1_address.value' aws-outputs.json)
AWS_PRESHARED_KEY=$(jq -r '.vpn_tunnel1_preshared_key.value' aws-outputs.json)
cd azure/
terraform init
terraform plan \
-var="location=$AZURE_REGION" \
-var="aws_vpn_ip=$AWS_VPN_IP" \
-var="aws_preshared_key=$AWS_PRESHARED_KEY"
terraform apply -auto-approve \
-var="location=$AZURE_REGION" \
-var="aws_vpn_ip=$AWS_VPN_IP" \
-var="aws_preshared_key=$AWS_PRESHARED_KEY"
cd ..
}
setup_gcp_networking() {
echo "Setting up GCP networking..."
cd gcp/
terraform init
terraform plan -var="region=$GCP_REGION"
terraform apply -auto-approve -var="region=$GCP_REGION"
cd ..
}
validate_connectivity() {
echo "Validating cross-cloud connectivity..."
# Test AWS to Azure
AWS_INSTANCE_IP=$(jq -r '.test_instance_private_ip.value' aws-outputs.json)
AZURE_INSTANCE_IP=$(jq -r '.test_instance_private_ip.value' azure-outputs.json)
echo "Testing AWS ($AWS_INSTANCE_IP) to Azure ($AZURE_INSTANCE_IP)..."
# This would typically involve SSH to instances and running ping tests
# For demo purposes, we'll just check VPN status
aws ec2 describe-vpn-connections \
--region "$AWS_REGION" \
--query 'VpnConnections[0].State' \
--output text
}
# Execute setup
setup_aws_networking
setup_azure_networking
setup_gcp_networking
validate_connectivity
echo "✅ Multi-cloud network setup completed"
Network Monitoring
Monitor cross-cloud network performance:
#!/usr/bin/env python3
# scripts/network_monitor.py
import boto3
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from google.cloud import monitoring_v3
class MultiCloudNetworkMonitor:
def __init__(self):
self.aws_ec2 = boto3.client('ec2')
self.aws_cloudwatch = boto3.client('cloudwatch')
def check_aws_vpn_status(self, vpn_connection_id: str) -> dict:
"""Check AWS VPN connection status"""
response = self.aws_ec2.describe_vpn_connections(
VpnConnectionIds=[vpn_connection_id]
)
connection = response['VpnConnections'][0]
return {
'state': connection['State'],
'tunnel1_state': connection['VgwTelemetry'][0]['Status'],
'tunnel2_state': connection['VgwTelemetry'][1]['Status'],
'tunnel1_accepted_routes': connection['VgwTelemetry'][0]['AcceptedRouteCount'],
'tunnel2_accepted_routes': connection['VgwTelemetry'][1]['AcceptedRouteCount']
}
def get_network_metrics(self, vpn_connection_id: str) -> dict:
"""Get network performance metrics"""
end_time = time.time()
start_time = end_time - 3600 # Last hour
metrics = {}
# Get tunnel state metrics
for tunnel_num in [1, 2]:
response = self.aws_cloudwatch.get_metric_statistics(
Namespace='AWS/VPN',
MetricName='TunnelState',
Dimensions=[
{'Name': 'VpnId', 'Value': vpn_connection_id},
                    {'Name': 'TunnelIpAddress', 'Value': f'tunnel-{tunnel_num}'}  # placeholder; in practice this must be the tunnel's outside IP
],
StartTime=start_time,
EndTime=end_time,
Period=300,
Statistics=['Average']
)
metrics[f'tunnel_{tunnel_num}_uptime'] = len([
dp for dp in response['Datapoints'] if dp['Average'] == 1
]) / len(response['Datapoints']) * 100 if response['Datapoints'] else 0
return metrics
def generate_report(self, vpn_connection_id: str) -> str:
"""Generate network status report"""
status = self.check_aws_vpn_status(vpn_connection_id)
metrics = self.get_network_metrics(vpn_connection_id)
report = [
"Multi-Cloud Network Status Report",
"=" * 40,
f"VPN Connection: {vpn_connection_id}",
f"Overall State: {status['state']}",
"",
"Tunnel Status:",
f" Tunnel 1: {status['tunnel1_state']} ({status['tunnel1_accepted_routes']} routes)",
f" Tunnel 2: {status['tunnel2_state']} ({status['tunnel2_accepted_routes']} routes)",
"",
"Uptime (Last Hour):",
f" Tunnel 1: {metrics.get('tunnel_1_uptime', 0):.1f}%",
f" Tunnel 2: {metrics.get('tunnel_2_uptime', 0):.1f}%"
]
return "\n".join(report)
def main():
import argparse
parser = argparse.ArgumentParser(description='Multi-Cloud Network Monitor')
parser.add_argument('--vpn-connection-id', required=True, help='AWS VPN Connection ID')
args = parser.parse_args()
monitor = MultiCloudNetworkMonitor()
report = monitor.generate_report(args.vpn_connection_id)
print(report)
if __name__ == "__main__":
main()
What’s Next
Cross-cloud networking provides the foundation for multi-cloud architecture, but managing identities and permissions across providers requires unified identity strategies. In the next part, we’ll explore how to implement consistent access control and identity management across AWS, Azure, and GCP.
Unified Identity and Access
Managing identities and permissions consistently across multiple cloud providers is critical for security and operational efficiency. Each provider has different identity models, but you can create unified patterns that provide consistent access control while leveraging each platform’s strengths.
Cross-Cloud Service Accounts
Create service accounts that can access multiple cloud providers:
# AWS IAM role for cross-cloud access
resource "aws_iam_role" "cross_cloud_service" {
name = "cross-cloud-service-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
},
{
Action = "sts:AssumeRoleWithWebIdentity"
Effect = "Allow"
Principal = {
Federated = aws_iam_openid_connect_provider.azure_ad.arn
}
Condition = {
StringEquals = {
"${aws_iam_openid_connect_provider.azure_ad.url}:aud" = var.azure_application_id
}
}
}
]
})
}
resource "aws_iam_policy" "cross_cloud_policy" {
name = "cross-cloud-access-policy"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject",
"secretsmanager:GetSecretValue"
]
Resource = "*"
}
]
})
}
resource "aws_iam_role_policy_attachment" "cross_cloud" {
role = aws_iam_role.cross_cloud_service.name
policy_arn = aws_iam_policy.cross_cloud_policy.arn
}
# Azure AD application for cross-cloud identity
resource "azuread_application" "cross_cloud" {
display_name = "cross-cloud-service"
web {
redirect_uris = ["https://signin.aws.amazon.com/saml"]
}
}
resource "azuread_service_principal" "cross_cloud" {
application_id = azuread_application.cross_cloud.application_id
}
resource "azuread_service_principal_password" "cross_cloud" {
service_principal_id = azuread_service_principal.cross_cloud.object_id
}
# GCP service account
resource "google_service_account" "cross_cloud" {
account_id = "cross-cloud-service"
display_name = "Cross-Cloud Service Account"
}
resource "google_service_account_key" "cross_cloud" {
service_account_id = google_service_account.cross_cloud.name
}
resource "google_project_iam_member" "cross_cloud_storage" {
project = var.gcp_project_id
role = "roles/storage.admin"
member = "serviceAccount:${google_service_account.cross_cloud.email}"
}
Federated Identity Setup
Configure identity federation between providers:
# AWS OIDC provider for Azure AD
resource "aws_iam_openid_connect_provider" "azure_ad" {
url = "https://sts.windows.net/${var.azure_tenant_id}/"
client_id_list = [
var.azure_application_id
]
thumbprint_list = [
"626d44e704d1ceabe3bf0d53397464ac8080142c"
]
}
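Hard-coding the certificate thumbprint is brittle because it changes whenever the identity provider rotates certificates. Since the tls provider is already pinned in required_providers, the thumbprint can be looked up at plan time; a sketch:
data "tls_certificate" "azure_ad" {
  url = "https://sts.windows.net/${var.azure_tenant_id}/"
}

# Then, in aws_iam_openid_connect_provider.azure_ad:
#   thumbprint_list = [data.tls_certificate.azure_ad.certificates[0].sha1_fingerprint]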
# Azure AD SAML configuration for AWS
resource "azuread_application" "aws_sso" {
display_name = "AWS-SSO"
web {
redirect_uris = ["https://signin.aws.amazon.com/saml"]
}
app_role {
allowed_member_types = ["User"]
description = "AWS SSO Access"
display_name = "AWS Access"
enabled = true
id = "b9632174-c057-4f7e-951b-b3adc3ddb778"
value = "AWSAccess"
}
}
# GCP Workload Identity for cross-cloud access
resource "google_iam_workload_identity_pool" "cross_cloud" {
workload_identity_pool_id = "cross-cloud-pool"
display_name = "Cross-Cloud Identity Pool"
description = "Identity pool for cross-cloud access"
}
resource "google_iam_workload_identity_pool_provider" "aws" {
workload_identity_pool_id = google_iam_workload_identity_pool.cross_cloud.workload_identity_pool_id
workload_identity_pool_provider_id = "aws-provider"
display_name = "AWS Provider"
aws {
account_id = var.aws_account_id
}
attribute_mapping = {
"google.subject" = "assertion.arn"
"attribute.aws_role" = "assertion.arn.contains('role') ? assertion.arn.extract('{account_arn}role/') : ''"
"attribute.account_id" = "assertion.account"
}
}
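For the AWS-federated identity to actually act as the GCP service account created earlier, it also needs a workloadIdentityUser binding on that service account. A sketch, where binding on the mapped account_id attribute is an assumption about how you want to scope access:
resource "google_service_account_iam_member" "aws_federation" {
  service_account_id = google_service_account.cross_cloud.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.cross_cloud.name}/attribute.account_id/${var.aws_account_id}"
}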
Unified RBAC Implementation
Create consistent role-based access control across providers:
# Define common roles
locals {
common_roles = {
admin = {
description = "Full administrative access"
permissions = ["*"]
}
developer = {
description = "Development environment access"
permissions = ["read", "write", "deploy"]
}
readonly = {
description = "Read-only access"
permissions = ["read"]
}
}
}
# AWS IAM roles based on common roles
resource "aws_iam_role" "common_roles" {
for_each = local.common_roles
name = "multi-cloud-${each.key}"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Federated = aws_iam_openid_connect_provider.azure_ad.arn
}
}
]
})
}
resource "aws_iam_policy" "common_role_policies" {
for_each = local.common_roles
name = "multi-cloud-${each.key}-policy"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = each.value.permissions
Resource = "*"
}
]
})
}
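Note that the abstract permission names above ("read", "write", "deploy") are placeholders rather than valid IAM actions, and the same applies to the GCP custom role permissions below. In practice each cloud needs a lookup table from the common role vocabulary to concrete actions; a minimal sketch assuming S3-centric access:
locals {
  # Hypothetical mapping from common permission names to concrete AWS actions
  aws_action_map = {
    read   = ["s3:GetObject", "s3:ListBucket"]
    write  = ["s3:PutObject", "s3:DeleteObject"]
    deploy = ["s3:PutObject"]
    "*"    = ["*"]
  }

  # Example expansion for the developer role defined earlier
  developer_aws_actions = flatten([
    for p in local.common_roles.developer.permissions : local.aws_action_map[p]
  ])
}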
# Azure AD groups for common roles
resource "azuread_group" "common_roles" {
for_each = local.common_roles
display_name = "MultiCloud-${title(each.key)}"
description = each.value.description
security_enabled = true
}
# GCP IAM custom roles
resource "google_project_iam_custom_role" "common_roles" {
for_each = local.common_roles
role_id = "multiCloud${title(each.key)}"
title = "Multi-Cloud ${title(each.key)}"
description = each.value.description
permissions = [
for perm in each.value.permissions :
"storage.objects.${perm}" if perm != "*"
]
}
Secret Management Across Clouds
Implement unified secret management:
#!/usr/bin/env python3
# scripts/cross_cloud_secrets.py
import boto3
import json
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential
from google.cloud import secretmanager
class CrossCloudSecretManager:
def __init__(self, aws_region: str, azure_vault_url: str, gcp_project_id: str):
self.aws_secrets = boto3.client('secretsmanager', region_name=aws_region)
self.azure_secrets = SecretClient(vault_url=azure_vault_url, credential=DefaultAzureCredential())
self.gcp_secrets = secretmanager.SecretManagerServiceClient()
self.gcp_project_id = gcp_project_id
def create_secret_everywhere(self, secret_name: str, secret_value: str) -> dict:
"""Create the same secret in all three cloud providers"""
results = {}
# AWS Secrets Manager
try:
self.aws_secrets.create_secret(
Name=secret_name,
SecretString=secret_value,
Description=f"Cross-cloud secret: {secret_name}"
)
results['aws'] = 'success'
except Exception as e:
results['aws'] = f'error: {str(e)}'
# Azure Key Vault
try:
self.azure_secrets.set_secret(secret_name, secret_value)
results['azure'] = 'success'
except Exception as e:
results['azure'] = f'error: {str(e)}'
# GCP Secret Manager
try:
parent = f"projects/{self.gcp_project_id}"
# Create secret
secret = self.gcp_secrets.create_secret(
request={
"parent": parent,
"secret_id": secret_name,
"secret": {"replication": {"automatic": {}}},
}
)
# Add secret version
self.gcp_secrets.add_secret_version(
request={
"parent": secret.name,
"payload": {"data": secret_value.encode("UTF-8")},
}
)
results['gcp'] = 'success'
except Exception as e:
results['gcp'] = f'error: {str(e)}'
return results
def get_secret_from_all(self, secret_name: str) -> dict:
"""Retrieve secret from all providers for comparison"""
secrets = {}
# AWS
try:
response = self.aws_secrets.get_secret_value(SecretId=secret_name)
secrets['aws'] = response['SecretString']
except Exception as e:
secrets['aws'] = f'error: {str(e)}'
# Azure
try:
secret = self.azure_secrets.get_secret(secret_name)
secrets['azure'] = secret.value
except Exception as e:
secrets['azure'] = f'error: {str(e)}'
# GCP
try:
name = f"projects/{self.gcp_project_id}/secrets/{secret_name}/versions/latest"
response = self.gcp_secrets.access_secret_version(request={"name": name})
secrets['gcp'] = response.payload.data.decode("UTF-8")
except Exception as e:
secrets['gcp'] = f'error: {str(e)}'
return secrets
def sync_secrets(self, secret_mappings: dict) -> dict:
"""Sync secrets across providers based on mapping"""
sync_results = {}
for secret_name, config in secret_mappings.items():
source_provider = config['source']
target_providers = config['targets']
# Get secret from source
if source_provider == 'aws':
try:
response = self.aws_secrets.get_secret_value(SecretId=secret_name)
secret_value = response['SecretString']
except Exception as e:
sync_results[secret_name] = f'Failed to read from AWS: {e}'
continue
# Sync to targets
for target in target_providers:
if target == 'azure':
try:
self.azure_secrets.set_secret(secret_name, secret_value)
sync_results[f'{secret_name}_to_azure'] = 'success'
except Exception as e:
sync_results[f'{secret_name}_to_azure'] = f'error: {e}'
elif target == 'gcp':
try:
name = f"projects/{self.gcp_project_id}/secrets/{secret_name}/versions/latest"
self.gcp_secrets.add_secret_version(
request={
"parent": f"projects/{self.gcp_project_id}/secrets/{secret_name}",
"payload": {"data": secret_value.encode("UTF-8")},
}
)
sync_results[f'{secret_name}_to_gcp'] = 'success'
except Exception as e:
sync_results[f'{secret_name}_to_gcp'] = f'error: {e}'
return sync_results
def main():
import argparse
parser = argparse.ArgumentParser(description='Cross-Cloud Secret Manager')
parser.add_argument('--aws-region', default='us-west-2', help='AWS region')
parser.add_argument('--azure-vault-url', required=True, help='Azure Key Vault URL')
parser.add_argument('--gcp-project-id', required=True, help='GCP Project ID')
parser.add_argument('--action', choices=['create', 'get', 'sync'], required=True)
parser.add_argument('--secret-name', help='Secret name')
parser.add_argument('--secret-value', help='Secret value')
parser.add_argument('--config-file', help='JSON config file for sync operation')
args = parser.parse_args()
manager = CrossCloudSecretManager(
args.aws_region,
args.azure_vault_url,
args.gcp_project_id
)
if args.action == 'create':
if not args.secret_name or not args.secret_value:
print("Error: --secret-name and --secret-value required for create")
return
results = manager.create_secret_everywhere(args.secret_name, args.secret_value)
print(json.dumps(results, indent=2))
elif args.action == 'get':
if not args.secret_name:
print("Error: --secret-name required for get")
return
secrets = manager.get_secret_from_all(args.secret_name)
print(json.dumps(secrets, indent=2))
elif args.action == 'sync':
if not args.config_file:
print("Error: --config-file required for sync")
return
with open(args.config_file, 'r') as f:
config = json.load(f)
results = manager.sync_secrets(config)
print(json.dumps(results, indent=2))
if __name__ == "__main__":
main()
Access Control Automation
Automate user provisioning across all providers:
#!/bin/bash
# scripts/provision-user.sh
set -e
USER_EMAIL=${1:-""}
ROLE=${2:-"readonly"}
GROUPS=${3:-""}
if [ -z "$USER_EMAIL" ]; then
echo "Usage: $0 <user_email> [role] [additional_groups]"
exit 1
fi
provision_aws_access() {
echo "Provisioning AWS access for $USER_EMAIL..."
# Create IAM user
aws iam create-user --user-name "$USER_EMAIL" || true
# Attach role policy
aws iam attach-user-policy \
--user-name "$USER_EMAIL" \
--policy-arn "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/multi-cloud-${ROLE}-policy"
# Generate access keys
aws iam create-access-key --user-name "$USER_EMAIL" --output table
}
provision_azure_access() {
echo "Provisioning Azure access for $USER_EMAIL..."
# Create user (if external, invite as guest)
az ad user create \
--display-name "$USER_EMAIL" \
--user-principal-name "$USER_EMAIL" \
--password "TempPassword123!" \
--force-change-password-next-sign-in true || \
az ad user invite --invited-user-email-address "$USER_EMAIL"
# Add to role group
GROUP_ID=$(az ad group show --group "MultiCloud-$(echo $ROLE | sed 's/.*/\u&/')" --query objectId --output tsv)
USER_ID=$(az ad user show --id "$USER_EMAIL" --query objectId --output tsv)
az ad group member add --group "$GROUP_ID" --member-id "$USER_ID"
}
provision_gcp_access() {
echo "Provisioning GCP access for $USER_EMAIL..."
# Add IAM policy binding
gcloud projects add-iam-policy-binding "$GCP_PROJECT_ID" \
--member="user:$USER_EMAIL" \
--role="projects/$GCP_PROJECT_ID/roles/multiCloud$(echo $ROLE | sed 's/.*/\u&/')"
}
# Execute provisioning
provision_aws_access
provision_azure_access
provision_gcp_access
echo "✅ User $USER_EMAIL provisioned with $ROLE access across all clouds"
What’s Next
Unified identity and access management provides the security foundation for multi-cloud operations. In the next part, we’ll explore provider abstraction patterns that allow you to create modules and configurations that work consistently across different cloud providers.
Provider Abstraction Patterns
Creating truly portable infrastructure requires abstraction layers that hide provider-specific differences while exposing common functionality. This part covers patterns for building cloud-agnostic modules that can deploy the same logical infrastructure across AWS, Azure, and GCP.
Universal Compute Module
Create a compute module that works across all providers:
# modules/universal-compute/variables.tf
variable "provider_type" {
description = "Cloud provider (aws, azure, gcp)"
type = string
validation {
condition = contains(["aws", "azure", "gcp"], var.provider_type)
error_message = "Provider must be aws, azure, or gcp."
}
}
variable "instance_config" {
description = "Instance configuration"
type = object({
name = string
size = string
image = string
subnet_id = string
key_name = optional(string)
user_data = optional(string)
tags = optional(map(string), {})
})
}
variable "gcp_region" {
description = "GCP region (required when provider_type is gcp)"
type = string
default = ""
}
variable "gcp_project_id" {
description = "GCP project ID (required when provider_type is gcp)"
type = string
default = ""
}
# modules/universal-compute/main.tf
locals {
# Size mapping across providers
size_mapping = {
aws = {
small = "t3.micro"
medium = "t3.small"
large = "t3.medium"
}
azure = {
small = "Standard_B1s"
medium = "Standard_B2s"
large = "Standard_B4ms"
}
gcp = {
small = "e2-micro"
medium = "e2-small"
large = "e2-medium"
}
}
actual_size = local.size_mapping[var.provider_type][var.instance_config.size]
}
# AWS EC2 Instance
resource "aws_instance" "this" {
count = var.provider_type == "aws" ? 1 : 0
ami = var.instance_config.image
instance_type = local.actual_size
subnet_id = var.instance_config.subnet_id
key_name = var.instance_config.key_name
user_data = var.instance_config.user_data
tags = merge(var.instance_config.tags, {
Name = var.instance_config.name
})
}
# Azure Virtual Machine
resource "azurerm_network_interface" "this" {
count = var.provider_type == "azure" ? 1 : 0
name = "${var.instance_config.name}-nic"
  # The azurerm_subnet data source does not expose a location, so take it from a variable
  location            = var.azure_location
resource_group_name = data.azurerm_subnet.this[0].resource_group_name
ip_configuration {
name = "internal"
subnet_id = var.instance_config.subnet_id
private_ip_address_allocation = "Dynamic"
}
}
resource "azurerm_linux_virtual_machine" "this" {
count = var.provider_type == "azure" ? 1 : 0
name = var.instance_config.name
resource_group_name = data.azurerm_subnet.this[0].resource_group_name
  location            = var.azure_location
size = local.actual_size
disable_password_authentication = true
admin_username = "adminuser"
network_interface_ids = [
azurerm_network_interface.this[0].id,
]
os_disk {
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-focal"
sku = "20_04-lts-gen2"
version = "latest"
}
admin_ssh_key {
username = "adminuser"
public_key = file("~/.ssh/id_rsa.pub")
}
custom_data = base64encode(var.instance_config.user_data)
tags = var.instance_config.tags
}
# GCP Compute Instance
resource "google_compute_instance" "this" {
count = var.provider_type == "gcp" ? 1 : 0
name = var.instance_config.name
machine_type = local.actual_size
zone = "${data.google_compute_subnetwork.this[0].region}-a" # Use first zone in region
boot_disk {
initialize_params {
image = var.instance_config.image
}
}
network_interface {
subnetwork = var.instance_config.subnet_id
}
metadata = {
ssh-keys = "adminuser:${file("~/.ssh/id_rsa.pub")}"
}
metadata_startup_script = var.instance_config.user_data
labels = var.instance_config.tags
}
# Data sources for provider-specific information
data "azurerm_subnet" "this" {
count = var.provider_type == "azure" ? 1 : 0
name = split("/", var.instance_config.subnet_id)[10] # Extract subnet name from resource ID
virtual_network_name = split("/", var.instance_config.subnet_id)[8] # Extract VNet name
resource_group_name = split("/", var.instance_config.subnet_id)[4] # Extract RG name
}
data "google_compute_subnetwork" "this" {
count = var.provider_type == "gcp" ? 1 : 0
name = var.instance_config.subnet_id
region = var.gcp_region
project = var.gcp_project_id
}
# modules/universal-compute/outputs.tf
output "instance_id" {
description = "Instance ID"
value = var.provider_type == "aws" ? aws_instance.this[0].id : (
var.provider_type == "azure" ? azurerm_linux_virtual_machine.this[0].id :
google_compute_instance.this[0].id
)
}
output "private_ip" {
description = "Private IP address"
value = var.provider_type == "aws" ? aws_instance.this[0].private_ip : (
var.provider_type == "azure" ? azurerm_linux_virtual_machine.this[0].private_ip_address :
google_compute_instance.this[0].network_interface[0].network_ip
)
}
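Calling the module then looks the same regardless of cloud; only provider_type and the provider-specific IDs change. A sketch with placeholder values:
module "app_server" {
  source        = "./modules/universal-compute"
  provider_type = "aws"

  instance_config = {
    name      = "app-server-1"
    size      = "medium"
    image     = "ami-0123456789abcdef0" # placeholder AMI ID
    subnet_id = aws_subnet.private.id   # assumes a subnet defined elsewhere
    tags      = { Environment = "dev" }
  }
}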
Universal Storage Module
Create storage that works across providers:
# modules/universal-storage/variables.tf
variable "provider_type" {
description = "Cloud provider"
type = string
}
variable "bucket_config" {
description = "Storage bucket configuration"
type = object({
name = string
versioning_enabled = optional(bool, false)
encryption_enabled = optional(bool, true)
public_access = optional(bool, false)
lifecycle_rules = optional(list(object({
days = number
action = string
})), [])
tags = optional(map(string), {})
})
}
variable "resource_group_name" {
description = "Azure resource group name (required when provider_type is azure)"
type = string
default = ""
}
variable "location" {
description = "Azure location (required when provider_type is azure)"
type = string
default = ""
}
variable "gcp_region" {
description = "GCP region (required when provider_type is gcp)"
type = string
default = ""
}
# modules/universal-storage/main.tf
# AWS S3 Bucket
resource "aws_s3_bucket" "this" {
count = var.provider_type == "aws" ? 1 : 0
bucket = var.bucket_config.name
tags = var.bucket_config.tags
}
resource "aws_s3_bucket_versioning" "this" {
count = var.provider_type == "aws" && var.bucket_config.versioning_enabled ? 1 : 0
bucket = aws_s3_bucket.this[0].id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
count = var.provider_type == "aws" && var.bucket_config.encryption_enabled ? 1 : 0
bucket = aws_s3_bucket.this[0].id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
resource "aws_s3_bucket_lifecycle_configuration" "this" {
count = var.provider_type == "aws" && length(var.bucket_config.lifecycle_rules) > 0 ? 1 : 0
bucket = aws_s3_bucket.this[0].id
dynamic "rule" {
for_each = var.bucket_config.lifecycle_rules
content {
id = "rule-${rule.key}"
status = "Enabled"
expiration {
days = rule.value.days
}
}
}
}
# Azure Storage Account
resource "azurerm_storage_account" "this" {
count = var.provider_type == "azure" ? 1 : 0
name = replace(var.bucket_config.name, "-", "")
resource_group_name = var.resource_group_name
location = var.location
account_tier = "Standard"
account_replication_type = "LRS"
blob_properties {
versioning_enabled = var.bucket_config.versioning_enabled
}
tags = var.bucket_config.tags
}
resource "azurerm_storage_container" "this" {
count = var.provider_type == "azure" ? 1 : 0
name = "data"
storage_account_name = azurerm_storage_account.this[0].name
container_access_type = var.bucket_config.public_access ? "blob" : "private"
}
# GCP Storage Bucket
resource "google_storage_bucket" "this" {
count = var.provider_type == "gcp" ? 1 : 0
name = var.bucket_config.name
location = var.gcp_region
versioning {
enabled = var.bucket_config.versioning_enabled
}
dynamic "lifecycle_rule" {
for_each = var.bucket_config.lifecycle_rules
content {
condition {
age = lifecycle_rule.value.days
}
action {
  type = lifecycle_rule.value.action == "delete" ? "Delete" : "SetStorageClass"
  # SetStorageClass actions also require a target storage class (NEARLINE assumed here)
  storage_class = lifecycle_rule.value.action == "delete" ? null : "NEARLINE"
}
}
}
labels = var.bucket_config.tags
}
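A matching outputs.tf (a sketch, following the same conditional pattern as the compute module) gives callers a provider-neutral handle to the bucket:
# modules/universal-storage/outputs.tf
output "bucket_name" {
  description = "Name of the created S3 bucket, storage account, or GCS bucket"
  value = var.provider_type == "aws" ? aws_s3_bucket.this[0].bucket : (
    var.provider_type == "azure" ? azurerm_storage_account.this[0].name :
    google_storage_bucket.this[0].name
  )
}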
Configuration Factory Pattern
Generate provider-specific configurations from common definitions:
#!/usr/bin/env python3
# scripts/config_factory.py
import json
import yaml
from typing import Dict, Any, List
from pathlib import Path
class MultiCloudConfigFactory:
def __init__(self):
self.provider_mappings = {
'compute': {
'aws': self._generate_aws_compute,
'azure': self._generate_azure_compute,
'gcp': self._generate_gcp_compute
},
'storage': {
'aws': self._generate_aws_storage,
'azure': self._generate_azure_storage,
'gcp': self._generate_gcp_storage
},
'network': {
'aws': self._generate_aws_network,
'azure': self._generate_azure_network,
'gcp': self._generate_gcp_network
}
}
def generate_configs(self, spec_file: str, output_dir: str) -> Dict[str, str]:
"""Generate provider-specific configs from universal spec"""
with open(spec_file, 'r') as f:
spec = yaml.safe_load(f)
output_path = Path(output_dir)
output_path.mkdir(exist_ok=True)
generated_files = {}
for provider in spec.get('providers', []):
provider_name = provider['name']
provider_config = {
'terraform': {
'required_providers': {
provider_name: provider.get('version_constraint', {})
}
},
'provider': {
provider_name: provider.get('config', {})
}
}
# Generate resources for each service
for service_name, service_config in spec.get('services', {}).items():
if service_name in self.provider_mappings:
generator = self.provider_mappings[service_name].get(provider_name)
if generator:
resources = generator(service_config)
provider_config.update(resources)
# Write provider-specific configuration
config_file = output_path / f"{provider_name}.tf.json"
with open(config_file, 'w') as f:
json.dump(provider_config, f, indent=2)
generated_files[provider_name] = str(config_file)
return generated_files
def _generate_aws_compute(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate AWS compute resources"""
resources = {'resource': {'aws_instance': {}}}
for instance_name, instance_config in config.get('instances', {}).items():
resources['resource']['aws_instance'][instance_name] = {
'ami': instance_config['image'],
'instance_type': self._map_instance_size('aws', instance_config['size']),
'subnet_id': instance_config['subnet_id'],
'tags': instance_config.get('tags', {})
}
if 'user_data' in instance_config:
resources['resource']['aws_instance'][instance_name]['user_data'] = instance_config['user_data']
return resources
def _generate_azure_compute(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate Azure compute resources"""
resources = {
'resource': {
'azurerm_linux_virtual_machine': {},
'azurerm_network_interface': {}
}
}
for instance_name, instance_config in config.get('instances', {}).items():
# Network interface
resources['resource']['azurerm_network_interface'][f"{instance_name}_nic"] = {
'name': f"{instance_name}-nic",
'location': '${var.location}',
'resource_group_name': '${var.resource_group_name}',
'ip_configuration': [{
'name': 'internal',
'subnet_id': instance_config['subnet_id'],
'private_ip_address_allocation': 'Dynamic'
}]
}
# Virtual machine
resources['resource']['azurerm_linux_virtual_machine'][instance_name] = {
'name': instance_name,
'resource_group_name': '${var.resource_group_name}',
'location': '${var.location}',
'size': self._map_instance_size('azure', instance_config['size']),
'disable_password_authentication': True,
'network_interface_ids': [f"${{azurerm_network_interface.{instance_name}_nic.id}}"],
'os_disk': [{
'caching': 'ReadWrite',
'storage_account_type': 'Standard_LRS'
}],
'source_image_reference': [{
'publisher': 'Canonical',
'offer': '0001-com-ubuntu-server-focal',
'sku': '20_04-lts-gen2',
'version': 'latest'
}],
'tags': instance_config.get('tags', {})
}
return resources
def _generate_gcp_compute(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate GCP compute resources"""
resources = {'resource': {'google_compute_instance': {}}}
for instance_name, instance_config in config.get('instances', {}).items():
resources['resource']['google_compute_instance'][instance_name] = {
'name': instance_name,
'machine_type': self._map_instance_size('gcp', instance_config['size']),
'zone': '${var.zone}',
'boot_disk': [{
'initialize_params': [{
'image': instance_config['image']
}]
}],
'network_interface': [{
'subnetwork': instance_config['subnet_id']
}],
'labels': instance_config.get('tags', {})
}
return resources
def _generate_aws_storage(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate AWS storage resources"""
resources = {'resource': {'aws_s3_bucket': {}}}
for bucket_name, bucket_config in config.get('buckets', {}).items():
resources['resource']['aws_s3_bucket'][bucket_name] = {
'bucket': bucket_config['name'],
'tags': bucket_config.get('tags', {})
}
return resources
def _generate_azure_storage(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate Azure storage resources"""
resources = {'resource': {'azurerm_storage_account': {}}}
for bucket_name, bucket_config in config.get('buckets', {}).items():
resources['resource']['azurerm_storage_account'][bucket_name] = {
'name': bucket_config['name'].replace('-', ''),
'resource_group_name': '${var.resource_group_name}',
'location': '${var.location}',
'account_tier': 'Standard',
'account_replication_type': 'LRS',
'tags': bucket_config.get('tags', {})
}
return resources
def _generate_gcp_storage(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate GCP storage resources"""
resources = {'resource': {'google_storage_bucket': {}}}
for bucket_name, bucket_config in config.get('buckets', {}).items():
resources['resource']['google_storage_bucket'][bucket_name] = {
'name': bucket_config['name'],
'location': '${var.region}',
'labels': bucket_config.get('tags', {})
}
return resources
def _generate_aws_network(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate AWS network resources"""
resources = {
'resource': {
'aws_vpc': {},
'aws_subnet': {}
}
}
for vpc_name, vpc_config in config.get('vpcs', {}).items():
resources['resource']['aws_vpc'][vpc_name] = {
'cidr_block': vpc_config['cidr'],
'enable_dns_hostnames': True,
'enable_dns_support': True,
'tags': vpc_config.get('tags', {})
}
for subnet_name, subnet_config in vpc_config.get('subnets', {}).items():
resources['resource']['aws_subnet'][subnet_name] = {
'vpc_id': f"${{aws_vpc.{vpc_name}.id}}",
'cidr_block': subnet_config['cidr'],
'availability_zone': subnet_config.get('az', '${data.aws_availability_zones.available.names[0]}'),
'tags': subnet_config.get('tags', {})
}
return resources
def _generate_azure_network(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate Azure network resources"""
resources = {
'resource': {
'azurerm_virtual_network': {},
'azurerm_subnet': {}
}
}
for vpc_name, vpc_config in config.get('vpcs', {}).items():
resources['resource']['azurerm_virtual_network'][vpc_name] = {
'name': vpc_name,
'address_space': [vpc_config['cidr']],
'location': '${var.location}',
'resource_group_name': '${var.resource_group_name}',
'tags': vpc_config.get('tags', {})
}
for subnet_name, subnet_config in vpc_config.get('subnets', {}).items():
resources['resource']['azurerm_subnet'][subnet_name] = {
'name': subnet_name,
'resource_group_name': '${var.resource_group_name}',
'virtual_network_name': f"${{azurerm_virtual_network.{vpc_name}.name}}",
'address_prefixes': [subnet_config['cidr']]
}
return resources
def _generate_gcp_network(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate GCP network resources"""
resources = {
'resource': {
'google_compute_network': {},
'google_compute_subnetwork': {}
}
}
for vpc_name, vpc_config in config.get('vpcs', {}).items():
resources['resource']['google_compute_network'][vpc_name] = {
'name': vpc_name,
'auto_create_subnetworks': False
}
for subnet_name, subnet_config in vpc_config.get('subnets', {}).items():
resources['resource']['google_compute_subnetwork'][subnet_name] = {
'name': subnet_name,
'ip_cidr_range': subnet_config['cidr'],
'region': '${var.region}',
'network': f"${{google_compute_network.{vpc_name}.id}}"
}
return resources
def _map_instance_size(self, provider: str, size: str) -> str:
"""Map universal size to provider-specific instance type"""
size_mappings = {
'aws': {
'small': 't3.micro',
'medium': 't3.small',
'large': 't3.medium',
'xlarge': 't3.large'
},
'azure': {
'small': 'Standard_B1s',
'medium': 'Standard_B2s',
'large': 'Standard_B4ms',
'xlarge': 'Standard_B8ms'
},
'gcp': {
'small': 'e2-micro',
'medium': 'e2-small',
'large': 'e2-medium',
'xlarge': 'e2-standard-2'
}
}
return size_mappings.get(provider, {}).get(size, size)
def main():
import argparse
parser = argparse.ArgumentParser(description='Multi-Cloud Configuration Factory')
parser.add_argument('--spec-file', required=True, help='Universal specification file')
parser.add_argument('--output-dir', default='./generated', help='Output directory')
args = parser.parse_args()
factory = MultiCloudConfigFactory()
generated_files = factory.generate_configs(args.spec_file, args.output_dir)
print("Generated configurations:")
for provider, file_path in generated_files.items():
print(f" {provider}: {file_path}")
if __name__ == "__main__":
main()
What’s Next
Provider abstraction patterns enable you to write infrastructure code once and deploy it across multiple clouds. However, data often needs to move between these environments. In the next part, we’ll explore data and storage strategies for multi-cloud architectures, including replication, backup, and disaster recovery patterns.
Data and Storage Strategies
Multi-cloud data strategies require careful planning for replication, backup, disaster recovery, and compliance. Each cloud provider offers different storage services with varying performance characteristics, pricing models, and integration capabilities. This part covers patterns for managing data across multiple clouds effectively.
Cross-Cloud Data Replication
Set up automated data replication between cloud providers:
# Primary storage in AWS
resource "aws_s3_bucket" "primary" {
bucket = "company-data-primary"
tags = {
Environment = "production"
Role = "primary"
}
}
resource "aws_s3_bucket_versioning" "primary" {
bucket = aws_s3_bucket.primary.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_notification" "replication_trigger" {
bucket = aws_s3_bucket.primary.id
lambda_function {
lambda_function_arn = aws_lambda_function.cross_cloud_replicator.arn
events = ["s3:ObjectCreated:*"]
}
}
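S3 also needs explicit permission to invoke the Lambda function, or the notification will fail to deliver events; a sketch:
resource "aws_lambda_permission" "allow_s3" {
  statement_id  = "AllowS3Invoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.cross_cloud_replicator.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.primary.arn
}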
# Backup storage in Azure
resource "azurerm_storage_account" "backup" {
name = "companydatabackup"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
account_tier = "Standard"
account_replication_type = "GRS"
tags = {
Environment = "production"
Role = "backup"
}
}
resource "azurerm_storage_container" "backup" {
name = "data-backup"
storage_account_name = azurerm_storage_account.backup.name
container_access_type = "private"
}
# Archive storage in GCP
resource "google_storage_bucket" "archive" {
name = "company-data-archive"
location = "US"
storage_class = "COLDLINE"
lifecycle_rule {
condition {
age = 90
}
action {
type = "SetStorageClass"
storage_class = "ARCHIVE"
}
}
labels = {
environment = "production"
role = "archive"
}
}
# Cross-cloud replication Lambda
resource "aws_lambda_function" "cross_cloud_replicator" {
filename = "replicator.zip"
function_name = "cross-cloud-replicator"
role = aws_iam_role.replicator.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 300
environment {
variables = {
AZURE_STORAGE_ACCOUNT = azurerm_storage_account.backup.name
AZURE_CONTAINER = azurerm_storage_container.backup.name
GCP_BUCKET = google_storage_bucket.archive.name
}
}
}
resource "aws_iam_role" "replicator" {
name = "cross-cloud-replicator-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy" "replicator" {
name = "cross-cloud-replicator-policy"
role = aws_iam_role.replicator.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject",
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "*"
}
]
})
}
Database Replication Strategy
Implement cross-cloud database replication:
# Primary database in AWS RDS
resource "aws_db_instance" "primary" {
identifier = "company-db-primary"
engine = "postgres"
engine_version = "13.7"
instance_class = "db.t3.medium"
allocated_storage = 100
max_allocated_storage = 1000
storage_encrypted = true
db_name = "companydb"
username = "dbadmin"
password = var.db_password
vpc_security_group_ids = [aws_security_group.rds.id]
db_subnet_group_name = aws_db_subnet_group.main.name
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "sun:04:00-sun:05:00"
skip_final_snapshot = false
final_snapshot_identifier = "company-db-final-snapshot"
tags = {
Environment = "production"
Role = "primary"
}
}
# Standby PostgreSQL server in Azure (RDS cannot natively replicate to Azure;
# cross-cloud replication is handled outside Terraform, e.g. with logical replication)
resource "azurerm_postgresql_server" "replica" {
name = "company-db-replica"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
administrator_login = "dbadmin"
administrator_login_password = var.db_password
sku_name = "GP_Gen5_2"
version = "11"
storage_mb = 102400
backup_retention_days = 7
geo_redundant_backup_enabled = true
auto_grow_enabled = true
ssl_enforcement_enabled = true
tags = {
Environment = "production"
Role = "replica"
}
}
# Backup database in GCP Cloud SQL
resource "google_sql_database_instance" "backup" {
name = "company-db-backup"
database_version = "POSTGRES_13"
region = "us-central1"
settings {
tier = "db-custom-2-7680"
backup_configuration {
enabled = true
start_time = "03:00"
point_in_time_recovery_enabled = true
}
ip_configuration {
ipv4_enabled = false
private_network = google_compute_network.main.id
}
database_flags {
name = "log_statement"
value = "all"
}
}
deletion_protection = true
}
Data Synchronization Pipeline
Create automated data synchronization across clouds:
#!/usr/bin/env python3
# scripts/data_sync_pipeline.py
import boto3
import asyncio
from azure.storage.blob import BlobServiceClient
from google.cloud import storage as gcs
from typing import Dict, List, Optional, Any
import hashlib
import json
from datetime import datetime
class MultiCloudDataSync:
    def __init__(self, config: Dict[str, Any]):
self.config = config
# Initialize cloud clients
self.s3_client = boto3.client('s3')
self.azure_client = BlobServiceClient(
account_url=f"https://{config['azure']['account_name']}.blob.core.windows.net",
credential=config['azure']['access_key']
)
self.gcp_client = gcs.Client(project=config['gcp']['project_id'])
    async def sync_all_data(self) -> Dict[str, Any]:
"""Synchronize data across all cloud providers"""
sync_results = {
'timestamp': datetime.utcnow().isoformat(),
'synced_objects': 0,
'errors': [],
'providers': {}
}
# Get source data inventory
source_objects = await self._get_source_inventory()
# Sync to each target provider
for target_provider in self.config['sync_targets']:
provider_results = await self._sync_to_provider(
source_objects,
target_provider
)
sync_results['providers'][target_provider] = provider_results
sync_results['synced_objects'] += provider_results.get('synced_count', 0)
sync_results['errors'].extend(provider_results.get('errors', []))
return sync_results
async def _get_source_inventory(self) -> List[Dict[str, Any]]:
"""Get inventory of objects from source provider"""
source_config = self.config['source']
objects = []
if source_config['provider'] == 'aws':
paginator = self.s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=source_config['bucket']):
for obj in page.get('Contents', []):
objects.append({
'key': obj['Key'],
'size': obj['Size'],
'etag': obj['ETag'].strip('"'),
'last_modified': obj['LastModified'].isoformat(),
'provider': 'aws'
})
return objects
async def _sync_to_provider(self, source_objects: List[Dict], target_provider: str) -> Dict[str, Any]:
"""Sync objects to target provider"""
results = {
'synced_count': 0,
'skipped_count': 0,
'errors': []
}
target_config = self.config['targets'][target_provider]
# Get existing objects in target
existing_objects = await self._get_target_inventory(target_provider, target_config)
existing_keys = {obj['key']: obj for obj in existing_objects}
for source_obj in source_objects:
try:
# Check if object needs sync
if await self._needs_sync(source_obj, existing_keys.get(source_obj['key'])):
await self._copy_object(source_obj, target_provider, target_config)
results['synced_count'] += 1
else:
results['skipped_count'] += 1
except Exception as e:
results['errors'].append({
'object': source_obj['key'],
'error': str(e)
})
return results
async def _get_target_inventory(self, provider: str, config: Dict) -> List[Dict[str, Any]]:
"""Get inventory from target provider"""
objects = []
if provider == 'azure':
container_client = self.azure_client.get_container_client(config['container'])
for blob in container_client.list_blobs():  # this BlobServiceClient is synchronous, so list_blobs() is a plain iterator
objects.append({
'key': blob.name,
'size': blob.size,
'etag': blob.etag.strip('"'),
'last_modified': blob.last_modified.isoformat()
})
elif provider == 'gcp':
bucket = self.gcp_client.bucket(config['bucket'])
for blob in bucket.list_blobs():
objects.append({
'key': blob.name,
'size': blob.size,
'etag': blob.etag.strip('"'),
'last_modified': blob.time_created.isoformat()
})
return objects
async def _needs_sync(self, source_obj: Dict, target_obj: Optional[Dict]) -> bool:
"""Determine if object needs synchronization"""
if not target_obj:
return True
# Compare ETags (checksums)
if source_obj['etag'] != target_obj['etag']:
return True
# Compare sizes
if source_obj['size'] != target_obj['size']:
return True
return False
async def _copy_object(self, source_obj: Dict, target_provider: str, target_config: Dict):
"""Copy object from source to target provider"""
if self.config.get('dry_run'): return  # dry run: skip the actual copy
# Download from source
source_config = self.config['source']
if source_config['provider'] == 'aws':
response = self.s3_client.get_object(
Bucket=source_config['bucket'],
Key=source_obj['key']
)
data = response['Body'].read()
# Upload to target
if target_provider == 'azure':
blob_client = self.azure_client.get_blob_client(
container=target_config['container'],
blob=source_obj['key']
)
blob_client.upload_blob(data, overwrite=True)
elif target_provider == 'gcp':
bucket = self.gcp_client.bucket(target_config['bucket'])
blob = bucket.blob(source_obj['key'])
blob.upload_from_string(data)
def generate_sync_report(self, results: Dict[str, Any]) -> str:
"""Generate human-readable sync report"""
report_lines = [
"Multi-Cloud Data Sync Report",
"=" * 40,
f"Timestamp: {results['timestamp']}",
f"Total Objects Synced: {results['synced_objects']}",
f"Total Errors: {len(results['errors'])}",
""
]
for provider, provider_results in results['providers'].items():
report_lines.extend([
f"{provider.upper()} Results:",
f" Synced: {provider_results['synced_count']}",
f" Skipped: {provider_results['skipped_count']}",
f" Errors: {len(provider_results['errors'])}",
""
])
if results['errors']:
report_lines.extend(["Errors:", ""])
for error in results['errors'][:10]: # Show first 10 errors
report_lines.append(f" {error['object']}: {error['error']}")
return "\n".join(report_lines)
async def main():
import argparse
parser = argparse.ArgumentParser(description='Multi-Cloud Data Sync')
parser.add_argument('--config', required=True, help='Sync configuration file')
parser.add_argument('--dry-run', action='store_true', help='Show what would be synced')
args = parser.parse_args()
with open(args.config, 'r') as f:
config = json.load(f)
config['dry_run'] = args.dry_run
sync_manager = MultiCloudDataSync(config)
if args.dry_run:
print("DRY RUN - no data will be copied; counts show what would sync")
results = await sync_manager.sync_all_data()
report = sync_manager.generate_sync_report(results)
print(report)
if __name__ == "__main__":
asyncio.run(main())
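A common follow-up is running the sync on a schedule. If the logic lives in the cross-cloud replicator Lambda from earlier, an EventBridge rule can trigger it on an interval; a sketch, again assuming that Lambda is addressed as aws_lambda_function.replicator:
resource "aws_cloudwatch_event_rule" "sync_schedule" {
  name                = "cross-cloud-sync-schedule"
  schedule_expression = "rate(1 hour)"
}

resource "aws_cloudwatch_event_target" "sync_lambda" {
  rule = aws_cloudwatch_event_rule.sync_schedule.name
  arn  = aws_lambda_function.replicator.arn
}

resource "aws_lambda_permission" "allow_eventbridge_invoke" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.replicator.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.sync_schedule.arn
}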
Disaster Recovery Automation
Implement automated disaster recovery across clouds:
#!/bin/bash
# scripts/disaster-recovery.sh
set -e
DR_CONFIG_FILE=${1:-"dr-config.json"}
RECOVERY_TYPE=${2:-"full"} # full, partial, test
execute_disaster_recovery() {
echo "🚨 Executing disaster recovery: $RECOVERY_TYPE"
# Load DR configuration
if [ ! -f "$DR_CONFIG_FILE" ]; then
echo "❌ DR configuration file not found: $DR_CONFIG_FILE"
exit 1
fi
PRIMARY_PROVIDER=$(jq -r '.primary_provider' "$DR_CONFIG_FILE")
DR_PROVIDER=$(jq -r '.dr_provider' "$DR_CONFIG_FILE")
echo "Primary: $PRIMARY_PROVIDER"
echo "DR Target: $DR_PROVIDER"
# Check primary provider health
if check_provider_health "$PRIMARY_PROVIDER"; then
echo "⚠️ Primary provider is healthy. Are you sure you want to proceed?"
read -p "Continue with DR? (y/N): " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
exit 1
fi
fi
# Execute recovery steps
case "$RECOVERY_TYPE" in
"full")
execute_full_recovery
;;
"partial")
execute_partial_recovery
;;
"test")
execute_test_recovery
;;
*)
echo "❌ Unknown recovery type: $RECOVERY_TYPE"
exit 1
;;
esac
}
check_provider_health() {
local provider=$1
case "$provider" in
"aws")
aws sts get-caller-identity >/dev/null 2>&1
;;
"azure")
az account show >/dev/null 2>&1
;;
"gcp")
[ -n "$(gcloud auth list --filter=status:ACTIVE --format="value(account)" 2>/dev/null | head -1)" ]
;;
esac
}
execute_full_recovery() {
echo "🔄 Executing full disaster recovery..."
# 1. Activate DR infrastructure
activate_dr_infrastructure
# 2. Restore data from backups
restore_data_from_backups
# 3. Update DNS to point to DR site
update_dns_to_dr
# 4. Validate recovery
validate_recovery
echo "✅ Full disaster recovery completed"
}
activate_dr_infrastructure() {
echo "Activating DR infrastructure..."
DR_TERRAFORM_DIR=$(jq -r '.dr_terraform_dir' "$DR_CONFIG_FILE")
cd "$DR_TERRAFORM_DIR"
# Initialize and apply DR infrastructure
terraform init -input=false
terraform plan -input=false -out=dr.tfplan -var="dr_mode=active"
terraform apply -input=false dr.tfplan
# Wait for infrastructure to be ready
sleep 60
}
restore_data_from_backups() {
echo "Restoring data from backups..."
BACKUP_LOCATIONS=$(jq -r '.backup_locations[]' "$DR_CONFIG_FILE")
for backup_location in $BACKUP_LOCATIONS; do
echo "Restoring from: $backup_location"
# This would call provider-specific restore scripts
case "$DR_PROVIDER" in
"aws")
restore_from_s3_backup "$backup_location"
;;
"azure")
restore_from_azure_backup "$backup_location"
;;
"gcp")
restore_from_gcs_backup "$backup_location"
;;
esac
done
}
update_dns_to_dr() {
echo "Updating DNS to point to DR site..."
DR_ENDPOINT=$(jq -r '.dr_endpoint' "$DR_CONFIG_FILE")
DNS_ZONE=$(jq -r '.dns_zone' "$DR_CONFIG_FILE")
# Update DNS record to point to DR endpoint
aws route53 change-resource-record-sets \
--hosted-zone-id "$DNS_ZONE" \
--change-batch "{
\"Changes\": [{
\"Action\": \"UPSERT\",
\"ResourceRecordSet\": {
\"Name\": \"$(jq -r '.primary_domain' "$DR_CONFIG_FILE")\",
\"Type\": \"A\",
\"TTL\": 60,
\"ResourceRecords\": [{\"Value\": \"$DR_ENDPOINT\"}]
}
}]
}"
}
validate_recovery() {
echo "Validating disaster recovery..."
HEALTH_CHECK_URL=$(jq -r '.health_check_url' "$DR_CONFIG_FILE")
# Wait for application to be healthy
for i in {1..30}; do
if curl -f "$HEALTH_CHECK_URL" >/dev/null 2>&1; then
echo "✅ Application is healthy"
return 0
fi
echo "Waiting for application to be healthy... ($i/30)"
sleep 10
done
echo "❌ Application health check failed"
return 1
}
# Execute based on parameters
case "${3:-execute}" in
"execute")
execute_disaster_recovery
;;
"test")
echo "🧪 Testing disaster recovery procedures..."
RECOVERY_TYPE="test"
execute_disaster_recovery
;;
*)
echo "Usage: $0 <dr_config_file> <recovery_type> [execute|test]"
echo ""
echo "Recovery types: full, partial, test"
exit 1
;;
esac
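The script flips DNS by hand. If you want Route 53 to fail over automatically when the primary health check fails, failover routing records can complement the manual runbook. A minimal sketch; var.dns_zone_id, var.primary_domain, var.primary_endpoint, and var.dr_endpoint are assumed variables, not defined above:
resource "aws_route53_health_check" "primary" {
  fqdn              = var.primary_domain
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id         = var.dns_zone_id
  name            = var.primary_domain
  type            = "A"
  ttl             = 60
  records         = [var.primary_endpoint]
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

resource "aws_route53_record" "dr" {
  zone_id        = var.dns_zone_id
  name           = var.primary_domain
  type           = "A"
  ttl            = 60
  records        = [var.dr_endpoint]
  set_identifier = "dr"

  failover_routing_policy {
    type = "SECONDARY"
  }
}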
What’s Next
Data and storage strategies provide the foundation for reliable multi-cloud operations, but monitoring and observability across multiple providers require a unified approach. In the next part, we’ll explore how to implement comprehensive monitoring and observability that gives you visibility into your entire multi-cloud infrastructure from a single pane of glass.
Monitoring and Observability
Monitoring multi-cloud infrastructure requires aggregating metrics, logs, and traces from different providers into unified dashboards and alerting systems. Each cloud provider has native monitoring services, but you need centralized observability to understand your entire system’s health and performance.
Unified Metrics Collection
Set up centralized metrics collection from all cloud providers:
# Prometheus deployment for centralized metrics
resource "kubernetes_namespace" "monitoring" {
metadata {
name = "monitoring"
}
}
resource "helm_release" "prometheus" {
name = "prometheus"
repository = "https://prometheus-community.github.io/helm-charts"
chart = "kube-prometheus-stack"
namespace = kubernetes_namespace.monitoring.metadata[0].name
values = [
yamlencode({
prometheus = {
prometheusSpec = {
retention = "30d"
storageSpec = {
volumeClaimTemplate = {
spec = {
storageClassName = "fast-ssd"
accessModes = ["ReadWriteOnce"]
resources = {
requests = {
storage = "100Gi"
}
}
}
}
}
additionalScrapeConfigs = [
{
job_name = "aws-cloudwatch"
static_configs = [{
targets = ["cloudwatch-exporter:9106"]
}]
},
{
job_name = "azure-monitor"
static_configs = [{
targets = ["azure-exporter:9107"]
}]
},
{
job_name = "gcp-monitoring"
static_configs = [{
targets = ["gcp-exporter:9108"]
}]
}
]
}
}
grafana = {
adminPassword = var.grafana_admin_password
persistence = {
enabled = true
size = "10Gi"
}
}
})
]
}
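The cross-cloud latency panel later in this part assumes a blackbox exporter probing endpoints in each cloud under the job name blackbox. Deploying one alongside Prometheus is one more Helm release; a sketch (the probe targets themselves would be added as another scrape config):
resource "helm_release" "blackbox_exporter" {
  name       = "blackbox-exporter"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "prometheus-blackbox-exporter"
  namespace  = kubernetes_namespace.monitoring.metadata[0].name
}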
# CloudWatch Exporter for AWS metrics
resource "kubernetes_deployment" "cloudwatch_exporter" {
metadata {
name = "cloudwatch-exporter"
namespace = kubernetes_namespace.monitoring.metadata[0].name
}
spec {
replicas = 1
selector {
match_labels = {
app = "cloudwatch-exporter"
}
}
template {
metadata {
labels = {
app = "cloudwatch-exporter"
}
}
spec {
container {
name = "cloudwatch-exporter"
image = "prom/cloudwatch-exporter:latest"
port {
container_port = 9106
}
env {
name = "AWS_REGION"
value = var.aws_region
}
volume_mount {
name = "config"
mount_path = "/config"
}
}
volume {
name = "config"
config_map {
name = kubernetes_config_map.cloudwatch_config.metadata[0].name
}
}
}
}
}
}
resource "kubernetes_config_map" "cloudwatch_config" {
metadata {
name = "cloudwatch-exporter-config"
namespace = kubernetes_namespace.monitoring.metadata[0].name
}
data = {
"config.yml" = yamlencode({
region = var.aws_region
metrics = [
{
aws_namespace = "AWS/EC2"
aws_metric_name = "CPUUtilization"
aws_dimensions = ["InstanceId"]
aws_statistics = ["Average"]
},
{
aws_namespace = "AWS/RDS"
aws_metric_name = "DatabaseConnections"
aws_dimensions = ["DBInstanceIdentifier"]
aws_statistics = ["Average"]
},
{
aws_namespace = "AWS/S3"
aws_metric_name = "BucketSizeBytes"
aws_dimensions = ["BucketName", "StorageType"]
aws_statistics = ["Average"]
}
]
})
}
}
# Azure Monitor Exporter
resource "kubernetes_deployment" "azure_exporter" {
metadata {
name = "azure-exporter"
namespace = kubernetes_namespace.monitoring.metadata[0].name
}
spec {
replicas = 1
selector {
match_labels = {
app = "azure-exporter"
}
}
template {
metadata {
labels = {
app = "azure-exporter"
}
}
spec {
container {
name = "azure-exporter"
image = "webdevops/azure-metrics-exporter:latest"
port {
container_port = 9107
}
env {
name = "AZURE_SUBSCRIPTION_ID"
value = var.azure_subscription_id
}
env {
name = "AZURE_CLIENT_ID"
value_from {
secret_key_ref {
name = kubernetes_secret.azure_credentials.metadata[0].name
key = "client_id"
}
}
}
env {
name = "AZURE_CLIENT_SECRET"
value_from {
secret_key_ref {
name = kubernetes_secret.azure_credentials.metadata[0].name
key = "client_secret"
}
}
}
}
}
}
}
}
# GCP Monitoring Exporter
resource "kubernetes_deployment" "gcp_exporter" {
metadata {
name = "gcp-exporter"
namespace = kubernetes_namespace.monitoring.metadata[0].name
}
spec {
replicas = 1
selector {
match_labels = {
app = "gcp-exporter"
}
}
template {
metadata {
labels = {
app = "gcp-exporter"
}
}
spec {
container {
name = "gcp-exporter"
image = "prometheuscommunity/stackdriver-exporter:latest"
port {
container_port = 9108
}
env {
name = "GOOGLE_APPLICATION_CREDENTIALS"
value = "/credentials/service-account.json"
}
env {
name = "STACKDRIVER_EXPORTER_GOOGLE_PROJECT_ID"
value = var.gcp_project_id
}
volume_mount {
name = "gcp-credentials"
mount_path = "/credentials"
}
}
volume {
name = "gcp-credentials"
secret {
secret_name = kubernetes_secret.gcp_credentials.metadata[0].name
}
}
}
}
}
}
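The Azure and GCP exporters above reference kubernetes_secret.azure_credentials and kubernetes_secret.gcp_credentials, which are not defined in this part. A minimal sketch, assuming var.azure_client_id, var.azure_client_secret, and var.gcp_credentials_file (a service account key file path) are defined:
resource "kubernetes_secret" "azure_credentials" {
  metadata {
    name      = "azure-credentials"
    namespace = kubernetes_namespace.monitoring.metadata[0].name
  }

  data = {
    client_id     = var.azure_client_id
    client_secret = var.azure_client_secret
  }
}

resource "kubernetes_secret" "gcp_credentials" {
  metadata {
    name      = "gcp-credentials"
    namespace = kubernetes_namespace.monitoring.metadata[0].name
  }

  data = {
    # assumes a service account key file is provided; the provider base64-encodes values
    "service-account.json" = file(var.gcp_credentials_file)
  }
}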
Cross-Cloud Alerting System
Implement unified alerting across all providers:
#!/usr/bin/env python3
# scripts/multi_cloud_alerting.py
import boto3
import json
import requests
from azure.monitor.query import LogsQueryClient
from azure.identity import DefaultAzureCredential
from google.cloud import monitoring_v3
from typing import Dict, List, Any
from datetime import datetime, timedelta
class MultiCloudAlertManager:
def __init__(self, config: Dict[str, Any]):
self.config = config
# Initialize cloud monitoring clients
self.aws_cloudwatch = boto3.client('cloudwatch')
self.azure_credential = DefaultAzureCredential()
self.azure_logs_client = LogsQueryClient(self.azure_credential)
self.gcp_monitoring = monitoring_v3.MetricServiceClient()
# Alert channels
self.slack_webhook = config.get('slack_webhook_url')
self.pagerduty_key = config.get('pagerduty_integration_key')
def check_all_providers(self) -> Dict[str, Any]:
"""Check health across all cloud providers"""
results = {
'timestamp': datetime.utcnow().isoformat(),
'overall_status': 'healthy',
'providers': {},
'alerts': []
}
# Check each provider
for provider_config in self.config['providers']:
provider_name = provider_config['name']
try:
if provider_name == 'aws':
provider_results = self._check_aws_health(provider_config)
elif provider_name == 'azure':
provider_results = self._check_azure_health(provider_config)
elif provider_name == 'gcp':
provider_results = self._check_gcp_health(provider_config)
else:
continue
results['providers'][provider_name] = provider_results
# Collect alerts
if provider_results['alerts']:
results['alerts'].extend(provider_results['alerts'])
results['overall_status'] = 'degraded'
except Exception as e:
results['providers'][provider_name] = {
'status': 'error',
'error': str(e),
'alerts': [{
'severity': 'critical',
'message': f"Failed to check {provider_name}: {str(e)}"
}]
}
results['alerts'].append({
'provider': provider_name,
'severity': 'critical',
'message': f"Monitoring failure: {str(e)}"
})
results['overall_status'] = 'critical'
return results
def _check_aws_health(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Check AWS resource health"""
results = {
'status': 'healthy',
'metrics': {},
'alerts': []
}
# Check EC2 instances
for instance_check in config.get('ec2_checks', []):
metric_data = self.aws_cloudwatch.get_metric_statistics(
Namespace='AWS/EC2',
MetricName='CPUUtilization',
Dimensions=[{'Name': 'InstanceId', 'Value': instance_check['instance_id']}],
StartTime=datetime.utcnow() - timedelta(minutes=10),
EndTime=datetime.utcnow(),
Period=300,
Statistics=['Average']
)
if metric_data['Datapoints']:
# take the most recent datapoint; CloudWatch does not guarantee ordering
cpu_usage = sorted(metric_data['Datapoints'], key=lambda d: d['Timestamp'])[-1]['Average']
results['metrics'][f"ec2_{instance_check['instance_id']}_cpu"] = cpu_usage
if cpu_usage > instance_check.get('cpu_threshold', 80):
results['alerts'].append({
'severity': 'warning',
'resource': instance_check['instance_id'],
'message': f"High CPU usage: {cpu_usage:.1f}%"
})
# Check RDS instances
for rds_check in config.get('rds_checks', []):
metric_data = self.aws_cloudwatch.get_metric_statistics(
Namespace='AWS/RDS',
MetricName='DatabaseConnections',
Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': rds_check['db_instance']}],
StartTime=datetime.utcnow() - timedelta(minutes=10),
EndTime=datetime.utcnow(),
Period=300,
Statistics=['Average']
)
if metric_data['Datapoints']:
connections = sorted(metric_data['Datapoints'], key=lambda d: d['Timestamp'])[-1]['Average']
results['metrics'][f"rds_{rds_check['db_instance']}_connections"] = connections
if connections > rds_check.get('connection_threshold', 80):
results['alerts'].append({
'severity': 'warning',
'resource': rds_check['db_instance'],
'message': f"High database connections: {connections}"
})
return results
def _check_azure_health(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Check Azure resource health"""
results = {
'status': 'healthy',
'metrics': {},
'alerts': []
}
# Check virtual machines
for vm_check in config.get('vm_checks', []):
query = f"""
Perf
| where TimeGenerated > ago(10m)
| where Computer == "{vm_check['vm_name']}"
| where CounterName == "% Processor Time"
| summarize avg(CounterValue) by bin(TimeGenerated, 5m)
| order by TimeGenerated desc
| limit 1
"""
try:
response = self.azure_logs_client.query_workspace(
workspace_id=config['workspace_id'],
query=query,
timespan=timedelta(minutes=10)
)
if response.tables and response.tables[0].rows:
cpu_usage = response.tables[0].rows[0][1]
results['metrics'][f"vm_{vm_check['vm_name']}_cpu"] = cpu_usage
if cpu_usage > vm_check.get('cpu_threshold', 80):
results['alerts'].append({
'severity': 'warning',
'resource': vm_check['vm_name'],
'message': f"High CPU usage: {cpu_usage:.1f}%"
})
except Exception as e:
results['alerts'].append({
'severity': 'error',
'resource': vm_check['vm_name'],
'message': f"Failed to query metrics: {str(e)}"
})
return results
def _check_gcp_health(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Check GCP resource health"""
results = {
'status': 'healthy',
'metrics': {},
'alerts': []
}
project_name = f"projects/{config['project_id']}"
# Check Compute Engine instances
for instance_check in config.get('instance_checks', []):
interval = monitoring_v3.TimeInterval({
"end_time": {"seconds": int(datetime.utcnow().timestamp())},
"start_time": {"seconds": int((datetime.utcnow() - timedelta(minutes=10)).timestamp())},
})
request = monitoring_v3.ListTimeSeriesRequest({
"name": project_name,
"filter": f'metric.type="compute.googleapis.com/instance/cpu/utilization" AND resource.labels.instance_name="{instance_check["instance_name"]}"',
"interval": interval,
"view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
})
try:
page_result = self.gcp_monitoring.list_time_series(request=request)
for time_series in page_result:
if time_series.points:
cpu_usage = time_series.points[0].value.double_value * 100
results['metrics'][f"gce_{instance_check['instance_name']}_cpu"] = cpu_usage
if cpu_usage > instance_check.get('cpu_threshold', 80):
results['alerts'].append({
'severity': 'warning',
'resource': instance_check['instance_name'],
'message': f"High CPU usage: {cpu_usage:.1f}%"
})
except Exception as e:
results['alerts'].append({
'severity': 'error',
'resource': instance_check['instance_name'],
'message': f"Failed to query metrics: {str(e)}"
})
return results
def send_alerts(self, alerts: List[Dict[str, Any]]):
"""Send alerts to configured channels"""
if not alerts:
return
# Group alerts by severity
critical_alerts = [a for a in alerts if a.get('severity') == 'critical']
warning_alerts = [a for a in alerts if a.get('severity') == 'warning']
# Send to Slack
if self.slack_webhook:
self._send_slack_alert(critical_alerts, warning_alerts)
# Send to PagerDuty for critical alerts
if self.pagerduty_key and critical_alerts:
self._send_pagerduty_alert(critical_alerts)
def _send_slack_alert(self, critical_alerts: List, warning_alerts: List):
"""Send alert to Slack"""
color = "danger" if critical_alerts else "warning"
message = {
"attachments": [{
"color": color,
"title": "Multi-Cloud Infrastructure Alert",
"fields": []
}]
}
if critical_alerts:
message["attachments"][0]["fields"].append({
"title": f"Critical Alerts ({len(critical_alerts)})",
"value": "\n".join([f"• {alert['message']}" for alert in critical_alerts[:5]]),
"short": False
})
if warning_alerts:
message["attachments"][0]["fields"].append({
"title": f"Warning Alerts ({len(warning_alerts)})",
"value": "\n".join([f"• {alert['message']}" for alert in warning_alerts[:5]]),
"short": False
})
requests.post(self.slack_webhook, json=message, timeout=10)
def _send_pagerduty_alert(self, critical_alerts: List):
"""Send critical alert to PagerDuty"""
payload = {
"routing_key": self.pagerduty_key,
"event_action": "trigger",
"payload": {
"summary": f"Multi-Cloud Critical Alert: {len(critical_alerts)} issues detected",
"source": "multi-cloud-monitor",
"severity": "critical",
"custom_details": {
"alerts": critical_alerts
}
}
}
requests.post("https://events.pagerduty.com/v2/enqueue", json=payload)
def main():
import argparse
parser = argparse.ArgumentParser(description='Multi-Cloud Alert Manager')
parser.add_argument('--config', required=True, help='Configuration file')
parser.add_argument('--send-alerts', action='store_true', help='Send alerts to configured channels')
args = parser.parse_args()
with open(args.config, 'r') as f:
config = json.load(f)
alert_manager = MultiCloudAlertManager(config)
results = alert_manager.check_all_providers()
print(f"Overall Status: {results['overall_status']}")
print(f"Total Alerts: {len(results['alerts'])}")
for provider, provider_results in results['providers'].items():
print(f"\n{provider.upper()}:")
print(f" Status: {provider_results['status']}")
print(f" Alerts: {len(provider_results.get('alerts', []))}")
if args.send_alerts and results['alerts']:
alert_manager.send_alerts(results['alerts'])
print(f"\n📧 Sent {len(results['alerts'])} alerts")
if __name__ == "__main__":
main()
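One way to run the checker on a schedule is a Kubernetes CronJob in the same monitoring namespace. A sketch; the image name and config path are placeholders for wherever you package the script and its configuration:
resource "kubernetes_cron_job_v1" "multi_cloud_alerting" {
  metadata {
    name      = "multi-cloud-alerting"
    namespace = kubernetes_namespace.monitoring.metadata[0].name
  }

  spec {
    schedule = "*/5 * * * *"

    job_template {
      metadata {}

      spec {
        template {
          metadata {}

          spec {
            restart_policy = "Never"

            container {
              name  = "alert-check"
              image = "registry.example.com/multi-cloud-alerting:latest" # placeholder image containing the script
              args  = ["--config", "/config/alerting.json", "--send-alerts"]
            }
          }
        }
      }
    }
  }
}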
Unified Dashboard Creation
Create comprehensive dashboards showing all cloud providers:
#!/bin/bash
# scripts/setup-dashboards.sh
set -e
GRAFANA_URL=${1:-"http://localhost:3000"}
GRAFANA_USER=${2:-"admin"}
GRAFANA_PASSWORD=${3:-"admin"}
create_multi_cloud_dashboard() {
echo "Creating multi-cloud overview dashboard..."
cat > multi-cloud-dashboard.json << 'EOF'
{
"dashboard": {
"title": "Multi-Cloud Infrastructure Overview",
"tags": ["multi-cloud", "overview"],
"timezone": "browser",
"panels": [
{
"title": "AWS EC2 CPU Utilization",
"type": "stat",
"targets": [
{
"expr": "aws_ec2_cpuutilization_average",
"legendFormat": "{{instance_id}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"thresholds": {
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
},
"gridPos": {"h": 8, "w": 8, "x": 0, "y": 0}
},
{
"title": "Azure VM CPU Utilization",
"type": "stat",
"targets": [
{
"expr": "azure_vm_cpu_percent",
"legendFormat": "{{vm_name}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"thresholds": {
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
},
"gridPos": {"h": 8, "w": 8, "x": 8, "y": 0}
},
{
"title": "GCP Compute CPU Utilization",
"type": "stat",
"targets": [
{
"expr": "gcp_compute_instance_cpu_utilization",
"legendFormat": "{{instance_name}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"thresholds": {
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
},
"gridPos": {"h": 8, "w": 8, "x": 16, "y": 0}
},
{
"title": "Cross-Cloud Network Latency",
"type": "graph",
"targets": [
{
"expr": "probe_duration_seconds{job=\"blackbox\"}",
"legendFormat": "{{instance}}"
}
],
"yAxes": [
{
"label": "Latency (seconds)",
"min": 0
}
],
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
},
{
"title": "Storage Usage by Provider",
"type": "piechart",
"targets": [
{
"expr": "aws_s3_bucket_size_bytes",
"legendFormat": "AWS S3"
},
{
"expr": "azure_storage_account_used_capacity",
"legendFormat": "Azure Storage"
},
{
"expr": "gcp_storage_bucket_size",
"legendFormat": "GCP Storage"
}
],
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16}
},
{
"title": "Database Connections",
"type": "graph",
"targets": [
{
"expr": "aws_rds_database_connections",
"legendFormat": "AWS RDS {{db_instance_identifier}}"
},
{
"expr": "azure_sql_connections",
"legendFormat": "Azure SQL {{server_name}}"
},
{
"expr": "gcp_cloudsql_connections",
"legendFormat": "GCP Cloud SQL {{database_id}}"
}
],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16}
}
],
"time": {
"from": "now-1h",
"to": "now"
},
"refresh": "30s"
}
}
EOF
# Import dashboard to Grafana
curl -X POST \
-H "Content-Type: application/json" \
-u "$GRAFANA_USER:$GRAFANA_PASSWORD" \
-d @multi-cloud-dashboard.json \
"$GRAFANA_URL/api/dashboards/db"
echo "✅ Multi-cloud dashboard created"
}
create_cost_dashboard() {
echo "Creating cost monitoring dashboard..."
cat > cost-dashboard.json << 'EOF'
{
"dashboard": {
"title": "Multi-Cloud Cost Analysis",
"tags": ["cost", "billing"],
"panels": [
{
"title": "Daily Costs by Provider",
"type": "graph",
"targets": [
{
"expr": "aws_billing_estimated_charges",
"legendFormat": "AWS"
},
{
"expr": "azure_consumption_cost",
"legendFormat": "Azure"
},
{
"expr": "gcp_billing_cost",
"legendFormat": "GCP"
}
],
"yAxes": [
{
"label": "Cost (USD)",
"min": 0
}
],
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 0}
},
{
"title": "Cost by Service Category",
"type": "table",
"targets": [
{
"expr": "sum by (service) (aws_billing_estimated_charges)",
"format": "table"
}
],
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}
},
{
"title": "Monthly Cost Trend",
"type": "graph",
"targets": [
{
"expr": "increase(aws_billing_estimated_charges[30d])",
"legendFormat": "AWS Monthly"
}
],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
}
]
}
}
EOF
curl -X POST \
-H "Content-Type: application/json" \
-u "$GRAFANA_USER:$GRAFANA_PASSWORD" \
-d @cost-dashboard.json \
"$GRAFANA_URL/api/dashboards/db"
echo "✅ Cost dashboard created"
}
create_sla_dashboard() {
echo "Creating SLA monitoring dashboard..."
cat > sla-dashboard.json << 'EOF'
{
"dashboard": {
"title": "Multi-Cloud SLA Monitoring",
"tags": ["sla", "uptime"],
"panels": [
{
"title": "Service Uptime",
"type": "stat",
"targets": [
{
"expr": "avg_over_time(up{job=\"multi-cloud-services\"}[24h]) * 100",
"legendFormat": "{{service}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 95,
"max": 100,
"thresholds": {
"steps": [
{"color": "red", "value": null},
{"color": "yellow", "value": 99},
{"color": "green", "value": 99.9}
]
}
}
},
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 0}
},
{
"title": "Response Time SLA",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, http_request_duration_seconds_bucket)",
"legendFormat": "95th percentile"
},
{
"expr": "histogram_quantile(0.99, http_request_duration_seconds_bucket)",
"legendFormat": "99th percentile"
}
],
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
}
]
}
}
EOF
curl -X POST \
-H "Content-Type: application/json" \
-u "$GRAFANA_USER:$GRAFANA_PASSWORD" \
-d @sla-dashboard.json \
"$GRAFANA_URL/api/dashboards/db"
echo "✅ SLA dashboard created"
}
# Create all dashboards
create_multi_cloud_dashboard
create_cost_dashboard
create_sla_dashboard
# Cleanup temp files
rm -f multi-cloud-dashboard.json cost-dashboard.json sla-dashboard.json
echo "✅ All multi-cloud dashboards created successfully"
echo "Access them at: $GRAFANA_URL"
What’s Next
Unified monitoring and observability provide the visibility needed to operate multi-cloud infrastructure effectively. With comprehensive metrics, alerting, and dashboards in place, you can maintain high availability and performance across all your cloud providers.
In the final part of this guide, we’ll explore governance and cost management strategies that help you maintain control, compliance, and cost efficiency across your entire multi-cloud environment.
Governance and Cost Management
Managing governance and costs across multiple cloud providers presents unique challenges. Each provider has different pricing models, compliance frameworks, and management tools. Effective multi-cloud governance requires unified policies, consistent tagging strategies, and comprehensive cost monitoring that works across AWS, Azure, and Google Cloud.
This final part covers the patterns and practices for implementing governance and cost management in multi-cloud Terraform environments.
Unified Tagging Strategy
Implement consistent tagging across all cloud providers:
# Global tagging strategy
locals {
# Standard tags that work across all providers
standard_tags = {
Environment = var.environment
Project = var.project_name
Owner = var.team_name
CostCenter = var.cost_center
ManagedBy = "terraform"
CreatedDate = formatdate("YYYY-MM-DD", timestamp()) # timestamp() changes every run; consider ignore_changes on tags to avoid perpetual diffs
LastModified = formatdate("YYYY-MM-DD", timestamp())
}
# Provider-specific tag formats
aws_tags = local.standard_tags
azure_tags = local.standard_tags # Azure tags accept the same key/value format
gcp_labels = {
for k, v in local.standard_tags :
lower(replace(k, " ", "_")) => lower(replace(v, " ", "_"))
}
}
# AWS resources with standard tags
resource "aws_instance" "web" {
count = var.providers_config.aws_enabled ? var.instance_count : 0
ami = data.aws_ami.latest[0].id
instance_type = var.instance_type
tags = merge(local.aws_tags, {
Name = "${var.project_name}-web-${count.index + 1}"
Role = "webserver"
Provider = "aws"
})
}
# Azure resources with standard tags
resource "azurerm_virtual_machine" "web" {
count = var.providers_config.azure_enabled ? var.instance_count : 0
name = "${var.project_name}-web-${count.index + 1}"
location = var.azure_location
resource_group_name = azurerm_resource_group.main[0].name
vm_size = var.azure_vm_size
tags = merge(local.azure_tags, {
Role = "webserver"
Provider = "azure"
})
}
# GCP resources with standard labels
resource "google_compute_instance" "web" {
count = var.providers_config.gcp_enabled ? var.instance_count : 0
name = "${var.project_name}-web-${count.index + 1}"
machine_type = var.gcp_machine_type
zone = var.gcp_zone
labels = merge(local.gcp_labels, {
role = "webserver"
provider = "gcp"
})
}
Multi-Cloud Policy Framework
Implement consistent policies across providers:
# Policy configuration for all providers
variable "governance_policies" {
description = "Governance policies to apply across all providers"
type = object({
allowed_regions = object({
aws = list(string)
azure = list(string)
gcp = list(string)
})
allowed_instance_types = object({
aws = list(string)
azure = list(string)
gcp = list(string)
})
required_tags = list(string)
cost_limits = object({
monthly_budget = number
alert_threshold = number
})
})
default = {
allowed_regions = {
aws = ["us-west-2", "us-east-1", "eu-west-1"]
azure = ["West US 2", "East US", "West Europe"]
gcp = ["us-west1", "us-east1", "europe-west1"]
}
allowed_instance_types = {
aws = ["t3.micro", "t3.small", "t3.medium", "t3.large"]
azure = ["Standard_B1s", "Standard_B2s", "Standard_D2s_v3"]
gcp = ["e2-micro", "e2-small", "e2-medium", "e2-standard-2"]
}
required_tags = ["Environment", "Project", "Owner", "CostCenter"]
cost_limits = {
monthly_budget = 10000
alert_threshold = 80
}
}
}
# AWS policy validation
resource "aws_instance" "web" {
count = var.providers_config.aws_enabled ? var.instance_count : 0
ami = data.aws_ami.latest[0].id
instance_type = var.aws_instance_type
lifecycle {
precondition {
condition = contains(
var.governance_policies.allowed_instance_types.aws,
var.aws_instance_type
)
error_message = "Instance type ${var.aws_instance_type} is not allowed. Allowed types: ${join(", ", var.governance_policies.allowed_instance_types.aws)}"
}
postcondition {
condition = alltrue([
for tag in var.governance_policies.required_tags :
contains(keys(self.tags), tag)
])
error_message = "All required tags must be present: ${join(", ", var.governance_policies.required_tags)}"
}
}
tags = local.aws_tags
}
# Azure policy validation
resource "azurerm_virtual_machine" "web" {
count = var.providers_config.azure_enabled ? var.instance_count : 0
name = "${var.project_name}-web-${count.index + 1}"
location = var.azure_location
resource_group_name = azurerm_resource_group.main[0].name
vm_size = var.azure_vm_size
lifecycle {
precondition {
condition = contains(
var.governance_policies.allowed_regions.azure,
var.azure_location
)
error_message = "Region ${var.azure_location} is not allowed for Azure resources."
}
precondition {
condition = contains(
var.governance_policies.allowed_instance_types.azure,
var.azure_vm_size
)
error_message = "VM size ${var.azure_vm_size} is not allowed."
}
}
tags = local.azure_tags
}
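The same precondition pattern extends to GCP; a sketch mirroring the AWS and Azure checks above (boot disk and network blocks omitted, as in the earlier GCP example):
# GCP policy validation
resource "google_compute_instance" "web" {
  count        = var.providers_config.gcp_enabled ? var.instance_count : 0
  name         = "${var.project_name}-web-${count.index + 1}"
  machine_type = var.gcp_machine_type
  zone         = var.gcp_zone

  lifecycle {
    precondition {
      condition = contains(
        var.governance_policies.allowed_instance_types.gcp,
        var.gcp_machine_type
      )
      error_message = "Machine type ${var.gcp_machine_type} is not allowed."
    }
  }

  labels = local.gcp_labels
}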
Cost Monitoring and Budgets
Implement comprehensive cost monitoring across providers:
# AWS cost monitoring
resource "aws_budgets_budget" "monthly_aws" {
count = var.providers_config.aws_enabled ? 1 : 0
name = "${var.project_name}-aws-monthly-budget"
budget_type = "COST"
limit_amount = var.governance_policies.cost_limits.monthly_budget * 0.4 # 40% allocation to AWS
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "LinkedAccount"
values = [data.aws_caller_identity.current[0].account_id]
}
cost_filter {
name = "TagKeyValue"
values = [format("user:Project$%s", var.project_name)]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = var.governance_policies.cost_limits.alert_threshold
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Azure cost monitoring
resource "azurerm_consumption_budget_resource_group" "monthly_azure" {
count = var.providers_config.azure_enabled ? 1 : 0
name = "${var.project_name}-azure-monthly-budget"
resource_group_id = azurerm_resource_group.main[0].id
amount = var.governance_policies.cost_limits.monthly_budget * 0.4 # 40% allocation to Azure
time_grain = "Monthly"
time_period {
start_date = formatdate("YYYY-MM-01'T'00:00:00Z", timestamp())
end_date = formatdate("YYYY-MM-01'T'00:00:00Z", timeadd(timestamp(), "8760h")) # 1 year
}
notification {
enabled = true
threshold = var.governance_policies.cost_limits.alert_threshold
operator = "GreaterThan"
threshold_type = "Actual"
contact_emails = var.budget_notification_emails
}
}
# GCP cost monitoring
resource "google_billing_budget" "monthly_gcp" {
count = var.providers_config.gcp_enabled ? 1 : 0
billing_account = var.gcp_billing_account
display_name = "${var.project_name}-gcp-monthly-budget"
budget_filter {
projects = ["projects/${var.gcp_project_id}"]
labels = {
project = var.project_name
}
}
amount {
specified_amount {
currency_code = "USD"
units = tostring(floor(var.governance_policies.cost_limits.monthly_budget * 0.2)) # 20% allocation to GCP
}
}
threshold_rules {
threshold_percent = var.governance_policies.cost_limits.alert_threshold / 100
spend_basis = "CURRENT_SPEND"
}
all_updates_rule {
monitoring_notification_channels = var.gcp_notification_channels
disable_default_iam_recipients = false
}
}
Unified Cost Reporting
Create unified cost reporting across all providers:
#!/usr/bin/env python3
# scripts/multi_cloud_cost_report.py
import boto3
import json
from datetime import datetime, timedelta
from google.cloud import billing_v1
from azure.identity import DefaultAzureCredential
from azure.mgmt.consumption import ConsumptionManagementClient
class MultiCloudCostReporter:
def __init__(self, config):
self.config = config
self.aws_client = boto3.client('ce', region_name='us-east-1') if config.get('aws_enabled') else None
self.azure_client = ConsumptionManagementClient(
DefaultAzureCredential(),
config.get('azure_subscription_id')
) if config.get('azure_enabled') else None
self.gcp_client = billing_v1.CloudBillingClient() if config.get('gcp_enabled') else None
def get_aws_costs(self, start_date, end_date):
"""Get AWS costs for the specified period"""
if not self.aws_client:
return {"provider": "aws", "total_cost": 0, "services": []}
try:
response = self.aws_client.get_cost_and_usage(
TimePeriod={
'Start': start_date.strftime('%Y-%m-%d'),
'End': end_date.strftime('%Y-%m-%d')
},
Granularity='MONTHLY',
Metrics=['BlendedCost'],
GroupBy=[
{'Type': 'DIMENSION', 'Key': 'SERVICE'},
]
)
total_cost = 0
services = []
for result in response['ResultsByTime']:
for group in result['Groups']:
service_name = group['Keys'][0]
cost = float(group['Metrics']['BlendedCost']['Amount'])
total_cost += cost
services.append({
'service': service_name,
'cost': cost
})
return {
"provider": "aws",
"total_cost": total_cost,
"services": services
}
except Exception as e:
print(f"Error getting AWS costs: {e}")
return {"provider": "aws", "total_cost": 0, "services": []}
def get_azure_costs(self, start_date, end_date):
"""Get Azure costs for the specified period"""
if not self.azure_client:
return {"provider": "azure", "total_cost": 0, "services": []}
try:
# Azure consumption API call would go here
# This is a simplified example
return {
"provider": "azure",
"total_cost": 0, # Placeholder
"services": []
}
except Exception as e:
print(f"Error getting Azure costs: {e}")
return {"provider": "azure", "total_cost": 0, "services": []}
def get_gcp_costs(self, start_date, end_date):
"""Get GCP costs for the specified period"""
if not self.gcp_client:
return {"provider": "gcp", "total_cost": 0, "services": []}
try:
# GCP billing API call would go here
# This is a simplified example
return {
"provider": "gcp",
"total_cost": 0, # Placeholder
"services": []
}
except Exception as e:
print(f"Error getting GCP costs: {e}")
return {"provider": "gcp", "total_cost": 0, "services": []}
def generate_unified_report(self, days_back=30):
"""Generate a unified cost report across all providers"""
end_date = datetime.now()
start_date = end_date - timedelta(days=days_back)
# Get costs from all providers
aws_costs = self.get_aws_costs(start_date, end_date)
azure_costs = self.get_azure_costs(start_date, end_date)
gcp_costs = self.get_gcp_costs(start_date, end_date)
# Combine results
total_cost = aws_costs['total_cost'] + azure_costs['total_cost'] + gcp_costs['total_cost']
report = {
"report_date": datetime.now().isoformat(),
"period": {
"start": start_date.isoformat(),
"end": end_date.isoformat()
},
"total_cost": total_cost,
"providers": [aws_costs, azure_costs, gcp_costs],
"cost_breakdown": {
"aws_percentage": (aws_costs['total_cost'] / total_cost * 100) if total_cost > 0 else 0,
"azure_percentage": (azure_costs['total_cost'] / total_cost * 100) if total_cost > 0 else 0,
"gcp_percentage": (gcp_costs['total_cost'] / total_cost * 100) if total_cost > 0 else 0
}
}
return report
def save_report(self, report, filename=None):
"""Save the cost report to a file"""
if not filename:
filename = f"multi_cloud_cost_report_{datetime.now().strftime('%Y%m%d')}.json"
with open(filename, 'w') as f:
json.dump(report, f, indent=2)
print(f"Cost report saved to {filename}")
return filename
# Usage example
if __name__ == "__main__":
config = {
'aws_enabled': True,
'azure_enabled': True,
'gcp_enabled': True,
'azure_subscription_id': 'your-subscription-id',
'gcp_project_id': 'your-project-id'
}
reporter = MultiCloudCostReporter(config)
report = reporter.generate_unified_report(30)
reporter.save_report(report)
print(f"Total multi-cloud cost: ${report['total_cost']:.2f}")
for provider in report['providers']:
print(f"{provider['provider'].upper()}: ${provider['total_cost']:.2f}")
Compliance and Audit Framework
Implement unified compliance monitoring:
# Compliance monitoring module
module "compliance_monitoring" {
source = "./modules/compliance-monitoring"
providers_config = var.providers_config
project_name = var.project_name
# Compliance requirements
compliance_frameworks = [
"SOC2",
"ISO27001",
"GDPR",
"HIPAA"
]
# Audit requirements
audit_config = {
log_retention_days = 2555 # 7 years
enable_encryption = true
enable_monitoring = true
}
# Notification settings
compliance_alerts = {
email_addresses = var.compliance_notification_emails
slack_webhook = var.compliance_slack_webhook
}
}
# AWS compliance resources
resource "aws_config_configuration_recorder" "compliance" {
count = var.providers_config.aws_enabled ? 1 : 0
name = "${var.project_name}-compliance-recorder"
role_arn = aws_iam_role.config_role[0].arn
recording_group {
all_supported = true
include_global_resource_types = true
}
}
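A recorder on its own does not record anything; it also needs a delivery channel and has to be switched on. A sketch, including a hypothetical S3 bucket for the Config snapshots:
resource "aws_s3_bucket" "config_logs" {
  count  = var.providers_config.aws_enabled ? 1 : 0
  bucket = "${var.project_name}-config-logs" # bucket names must be globally unique
}

resource "aws_config_delivery_channel" "compliance" {
  count          = var.providers_config.aws_enabled ? 1 : 0
  name           = "${var.project_name}-compliance-channel"
  s3_bucket_name = aws_s3_bucket.config_logs[0].bucket
  depends_on     = [aws_config_configuration_recorder.compliance]
}

resource "aws_config_configuration_recorder_status" "compliance" {
  count      = var.providers_config.aws_enabled ? 1 : 0
  name       = aws_config_configuration_recorder.compliance[0].name
  is_enabled = true
  depends_on = [aws_config_delivery_channel.compliance]
}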
resource "aws_config_config_rule" "required_tags" {
count = var.providers_config.aws_enabled ? 1 : 0
name = "${var.project_name}-required-tags"
source {
owner = "AWS"
source_identifier = "REQUIRED_TAGS"
}
input_parameters = jsonencode({
tag1Key = "Environment"
tag2Key = "Project"
tag3Key = "Owner"
tag4Key = "CostCenter"
})
depends_on = [aws_config_configuration_recorder.compliance]
}
# Azure compliance resources
resource "azurerm_policy_assignment" "required_tags" {
count = var.providers_config.azure_enabled ? 1 : 0
name = "${var.project_name}-required-tags"
scope = azurerm_resource_group.main[0].id
policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/1e30110a-5ceb-460c-a204-c1c3969c6d62"
parameters = jsonencode({
tagName = {
value = "Environment"
}
})
}
# GCP compliance resources: restrict where resources can be created
resource "google_project_organization_policy" "allowed_locations" {
count = var.providers_config.gcp_enabled ? 1 : 0
project = var.gcp_project_id
constraint = "constraints/gcp.resourceLocations"
list_policy {
allow {
values = var.governance_policies.allowed_regions.gcp
}
}
}
Automated Governance Enforcement
Implement automated policy enforcement:
# .github/workflows/governance-check.yml
name: Multi-Cloud Governance Check
on:
pull_request:
paths: ['infrastructure/**']
jobs:
governance-validation:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: 1.6.0
- name: Setup OPA
uses: open-policy-agent/setup-opa@v2
- name: Generate Terraform Plans
run: |
find infrastructure -name "*.tf" -exec dirname {} \; | sort -u | while read dir; do
cd "$dir"
terraform init -backend=false
terraform plan -out=plan.tfplan
terraform show -json plan.tfplan > plan.json
cd - > /dev/null
done
- name: Run Multi-Cloud Policy Checks
run: |
# Check AWS resources
find infrastructure -name "plan.json" | while read plan; do
echo "Checking AWS policies for $plan"
opa eval --fail-defined -d policies/aws/ -i "$plan" "data.aws.deny[x]"
done
# Check Azure resources
find infrastructure -name "plan.json" | while read plan; do
echo "Checking Azure policies for $plan"
opa eval --fail-defined -d policies/azure/ -i "$plan" "data.azure.deny[x]"
done
# Check GCP resources
find infrastructure -name "plan.json" | while read plan; do
echo "Checking GCP policies for $plan"
opa eval --fail-defined -d policies/gcp/ -i "$plan" "data.gcp.deny[x]"
done
- name: Cost Impact Analysis
run: |
python3 scripts/cost_impact_analysis.py \
--terraform-plans "infrastructure/*/plan.json" \
--budget-limit ${{ vars.MONTHLY_BUDGET_LIMIT }} \
--output cost-impact.json
- name: Generate Governance Report
run: |
python3 scripts/governance_report.py \
--terraform-plans "infrastructure/*/plan.json" \
--policies policies/ \
--output governance-report.md
- name: Comment on PR
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const report = fs.readFileSync('governance-report.md', 'utf8');
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `## Multi-Cloud Governance Report\n\n${report}`
});
Conclusion
Effective multi-cloud governance and cost management require unified policies, consistent monitoring, and automated enforcement across all cloud providers. The patterns covered in this guide provide a framework for implementing governance that scales across AWS, Azure, and Google Cloud while maintaining cost control and compliance requirements.
The key to successful multi-cloud governance is treating it as a unified system rather than managing each provider separately. Consistent tagging, unified cost reporting, and automated policy enforcement ensure that your multi-cloud infrastructure remains manageable, compliant, and cost-effective as it scales.
Remember that governance is an ongoing process that requires regular review and adjustment as your multi-cloud architecture evolves and as cloud providers introduce new services and pricing models.