Multi-Cloud Terraform: Managing Infrastructure Across Providers
Multi-cloud strategies are becoming increasingly common as organizations seek to avoid vendor lock-in, leverage best-of-breed services, and meet compliance requirements. However, managing infrastructure across multiple cloud providers introduces complexity in networking, identity management, monitoring, and operational processes.
This guide covers the patterns and practices for successfully implementing multi-cloud infrastructure with Terraform, from basic provider configuration to advanced cross-cloud networking and unified governance.
Multi-Provider Setup
Managing infrastructure across multiple cloud providers requires careful planning of provider configurations, authentication strategies, and resource organization. Each cloud provider has different authentication mechanisms, regional structures, and service offerings that need to be coordinated in a unified Terraform configuration.
This part covers the foundational patterns for multi-cloud Terraform configurations, from basic provider setup to advanced authentication and resource management strategies.
Multi-Provider Configuration
A typical multi-cloud setup involves configuring multiple providers with appropriate aliases and authentication:
terraform {
required_version = ">= 1.6"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.20"
}
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.70"
}
google = {
source = "hashicorp/google"
version = "~> 4.80"
}
}
}
# AWS Provider Configuration
provider "aws" {
region = var.aws_region
alias = "primary"
default_tags {
tags = local.common_tags
}
}
provider "aws" {
region = var.aws_secondary_region
alias = "secondary"
default_tags {
tags = local.common_tags
}
}
# Azure Provider Configuration
provider "azurerm" {
features {}
subscription_id = var.azure_subscription_id
tenant_id = var.azure_tenant_id
# Use managed identity when running in Azure
use_msi = var.use_azure_msi
}
# Google Cloud Provider Configuration
provider "google" {
project = var.gcp_project_id
region = var.gcp_region
# Use service account key or application default credentials
credentials = var.gcp_credentials_file
}
provider "google" {
project = var.gcp_project_id
region = var.gcp_secondary_region
alias = "secondary"
credentials = var.gcp_credentials_file
}
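With the aliased configurations in place, individual resources select a configuration explicitly through the provider meta-argument. A minimal sketch (the bucket names are placeholders, not from the original configuration):
# Resources pick an aliased configuration with the provider meta-argument.
resource "aws_s3_bucket" "primary_artifacts" {
  provider = aws.primary
  bucket   = "example-artifacts-primary" # placeholder name
}

resource "aws_s3_bucket" "secondary_artifacts" {
  provider = aws.secondary
  bucket   = "example-artifacts-secondary" # placeholder name
}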
Authentication Strategies
Different providers require different authentication approaches:
AWS Authentication:
# Method 1: IAM Roles (recommended for production)
provider "aws" {
region = "us-west-2"
assume_role {
role_arn = "arn:aws:iam::123456789012:role/TerraformRole"
}
}
# Method 2: Environment variables
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
# Method 3: AWS CLI profiles
# AWS_PROFILE=production terraform apply
Azure Authentication:
# Method 1: Service Principal
provider "azurerm" {
features {}
subscription_id = var.azure_subscription_id
client_id = var.azure_client_id
client_secret = var.azure_client_secret
tenant_id = var.azure_tenant_id
}
# Method 2: Managed Identity (when running in Azure)
provider "azurerm" {
features {}
use_msi = true
}
# Method 3: Azure CLI authentication
# az login && terraform apply
Google Cloud Authentication:
# Method 1: Service Account Key
provider "google" {
project = var.gcp_project_id
region = var.gcp_region
credentials = file("path/to/service-account-key.json")
}
# Method 2: Application Default Credentials
provider "google" {
project = var.gcp_project_id
region = var.gcp_region
# Uses gcloud application-default login
}
# Method 3: Workload Identity (when running in GKE)
provider "google" {
project = var.gcp_project_id
region = var.gcp_region
# Automatically uses workload identity
}
Environment-Specific Provider Configuration
Different environments often require different provider configurations:
# variables.tf
variable "environment" {
description = "Environment name"
type = string
}
variable "cloud_providers" {
description = "Cloud providers to use by environment"
type = map(object({
aws_enabled = bool
azure_enabled = bool
gcp_enabled = bool
aws_region = string
azure_region = string
gcp_region = string
}))
default = {
dev = {
aws_enabled = true
azure_enabled = false
gcp_enabled = false
aws_region = "us-west-2"
azure_region = "West US 2"
gcp_region = "us-west1"
}
staging = {
aws_enabled = true
azure_enabled = true
gcp_enabled = false
aws_region = "us-west-2"
azure_region = "West US 2"
gcp_region = "us-west1"
}
production = {
aws_enabled = true
azure_enabled = true
gcp_enabled = true
aws_region = "us-west-2"
azure_region = "West US 2"
gcp_region = "us-west1"
}
}
}
# main.tf
locals {
config = var.cloud_providers[var.environment]
common_tags = {
Environment = var.environment
ManagedBy = "terraform"
Project = var.project_name
}
}
# Provider blocks do not support count or for_each, so they cannot be created
# conditionally. Configure each provider you might use and apply the
# enabled/disabled flags at the resource level instead (see the count
# arguments on the resources below).
provider "aws" {
  region = local.config.aws_region
  default_tags {
    tags = local.common_tags
  }
}
provider "azurerm" {
  features {}
}
provider "google" {
  project = var.gcp_project_id
  region  = local.config.gcp_region
}
Resource Organization Patterns
Organize multi-cloud resources for maintainability:
# AWS Resources
resource "aws_vpc" "main" {
count = local.config.aws_enabled ? 1 : 0
provider = aws
cidr_block = "10.0.0.0/16"
tags = merge(local.common_tags, {
Name = "${var.project_name}-aws-vpc"
Provider = "aws"
})
}
# Azure Resources
resource "azurerm_resource_group" "main" {
count = local.config.azure_enabled ? 1 : 0
provider = azurerm
name = "${var.project_name}-rg"
location = local.config.azure_region
tags = merge(local.common_tags, {
Provider = "azure"
})
}
resource "azurerm_virtual_network" "main" {
count = local.config.azure_enabled ? 1 : 0
provider = azurerm
name = "${var.project_name}-vnet"
address_space = ["10.1.0.0/16"]
location = azurerm_resource_group.main[0].location
resource_group_name = azurerm_resource_group.main[0].name
tags = merge(local.common_tags, {
Provider = "azure"
})
}
# Google Cloud Resources
resource "google_compute_network" "main" {
count = local.config.gcp_enabled ? 1 : 0
provider = google
name = "${var.project_name}-vpc"
auto_create_subnetworks = false
labels = {
environment = var.environment
managed_by = "terraform"
project = var.project_name
provider = "gcp"
}
}
Cross-Provider Data Sharing
Share data between providers using outputs and data sources:
# outputs.tf
output "network_info" {
description = "Network information across all providers"
value = {
aws = local.config.aws_enabled ? {
vpc_id = aws_vpc.main[0].id
vpc_cidr_block = aws_vpc.main[0].cidr_block
region = local.config.aws_region
} : null
azure = local.config.azure_enabled ? {
vnet_id = azurerm_virtual_network.main[0].id
vnet_address_space = azurerm_virtual_network.main[0].address_space
resource_group = azurerm_resource_group.main[0].name
region = local.config.azure_region
} : null
gcp = local.config.gcp_enabled ? {
network_id = google_compute_network.main[0].id
network_name = google_compute_network.main[0].name
region = local.config.gcp_region
} : null
}
}
# Use in other configurations
data "terraform_remote_state" "network" {
backend = "s3"
config = {
bucket = "company-terraform-state"
key = "network/terraform.tfstate"
region = "us-west-2"
}
}
locals {
aws_vpc_id = data.terraform_remote_state.network.outputs.network_info.aws.vpc_id
azure_vnet_id = data.terraform_remote_state.network.outputs.network_info.azure.vnet_id
gcp_network_id = data.terraform_remote_state.network.outputs.network_info.gcp.network_id
}
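If a provider is disabled for the current environment, the corresponding object in network_info is null and the lookups above would fail. A safer variant (a sketch) wraps each lookup in try() so disabled providers simply yield null:
locals {
  aws_vpc_id     = try(data.terraform_remote_state.network.outputs.network_info.aws.vpc_id, null)
  azure_vnet_id  = try(data.terraform_remote_state.network.outputs.network_info.azure.vnet_id, null)
  gcp_network_id = try(data.terraform_remote_state.network.outputs.network_info.gcp.network_id, null)
}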
Provider Version Management
Pin provider versions for consistency across clouds:
terraform {
required_version = ">= 1.6"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.20"
}
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.70"
}
google = {
source = "hashicorp/google"
version = "~> 4.80"
}
random = {
source = "hashicorp/random"
version = "~> 3.4"
}
tls = {
source = "hashicorp/tls"
version = "~> 4.0"
}
}
}
# The dependency lock file (.terraform.lock.hcl) is generated automatically by
# terraform init; commit it, and run `terraform providers lock` with -platform
# flags to record checksums for every platform your team uses.
Multi-Cloud Module Structure
Organize modules for multi-cloud scenarios:
modules/
├── multi-cloud-network/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── aws.tf
│ ├── azure.tf
│ └── gcp.tf
├── cloud-agnostic-database/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── aws-rds.tf
│ ├── azure-sql.tf
│ └── gcp-sql.tf
└── monitoring/
├── main.tf
├── variables.tf
├── outputs.tf
├── aws-cloudwatch.tf
├── azure-monitor.tf
└── gcp-monitoring.tf
Multi-cloud network module example:
# modules/multi-cloud-network/main.tf
variable "providers_config" {
description = "Configuration for each cloud provider"
type = object({
aws_enabled = bool
azure_enabled = bool
gcp_enabled = bool
})
}
variable "network_cidrs" {
description = "CIDR blocks for each provider"
type = object({
aws = string
azure = string
gcp = string
})
default = {
aws = "10.0.0.0/16"
azure = "10.1.0.0/16"
gcp = "10.2.0.0/16"
}
}
# AWS networking resources
resource "aws_vpc" "main" {
count = var.providers_config.aws_enabled ? 1 : 0
cidr_block = var.network_cidrs.aws
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.name_prefix}-aws-vpc"
Provider = "aws"
}
}
# Azure networking resources
resource "azurerm_virtual_network" "main" {
count = var.providers_config.azure_enabled ? 1 : 0
name = "${var.name_prefix}-vnet"
address_space = [var.network_cidrs.azure]
location = var.azure_location
resource_group_name = var.azure_resource_group_name
tags = {
Provider = "azure"
}
}
# GCP networking resources
resource "google_compute_network" "main" {
  count = var.providers_config.gcp_enabled ? 1 : 0
  name = "${var.name_prefix}-vpc"
  auto_create_subnetworks = false
  # google_compute_network does not support labels; use the description for metadata
  description = "Multi-cloud network managed by Terraform"
}
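The module above also references var.name_prefix, var.azure_location, and var.azure_resource_group_name, which are not shown in the excerpt. Declaring them keeps the module self-contained (a sketch using the names the resources already reference):
variable "name_prefix" {
  description = "Prefix applied to resource names"
  type        = string
}

variable "azure_location" {
  description = "Azure location for the virtual network"
  type        = string
  default     = ""
}

variable "azure_resource_group_name" {
  description = "Existing Azure resource group for the virtual network"
  type        = string
  default     = ""
}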
Error Handling and Debugging
Multi-cloud configurations can be complex to debug:
# Enable detailed logging
export TF_LOG=DEBUG
export TF_LOG_PATH=terraform.log
# Test provider authentication
terraform providers
# Validate configuration
terraform validate
# Plan with specific providers
terraform plan -target="aws_vpc.main"
terraform plan -target="azurerm_virtual_network.main"
terraform plan -target="google_compute_network.main"
# Check provider plugin cache
ls -la .terraform/providers/
Provider-specific debugging:
# AWS debugging
aws sts get-caller-identity
aws configure list
# Azure debugging
az account show
az account list
# GCP debugging
gcloud auth list
gcloud config list
gcloud projects list
What’s Next
Multi-provider setup provides the foundation for multi-cloud infrastructure, but the real challenges emerge when you need to connect networks across different cloud providers. Cross-cloud networking requires understanding each provider’s networking model and implementing secure, performant connections.
In the next part, we’ll explore cross-cloud networking patterns, including VPN connections, private peering, and hybrid connectivity solutions that enable seamless communication across AWS, Azure, and Google Cloud.
Cross-Cloud Networking
Connecting networks across different cloud providers is one of the most complex aspects of multi-cloud architecture. Each provider has different networking models, security requirements, and connectivity options. This part covers practical patterns for establishing secure, reliable connections between AWS, Azure, and GCP.
AWS-Azure VPN Connection
Establish site-to-site VPN between AWS and Azure:
# AWS side configuration
data "aws_availability_zones" "available" {
state = "available"
}
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "aws-vpc"
}
}
resource "aws_subnet" "private" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = data.aws_availability_zones.available.names[0]
tags = {
Name = "aws-private-subnet"
}
}
resource "aws_vpn_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "aws-vpn-gateway"
}
}
resource "aws_customer_gateway" "azure" {
bgp_asn = 65000
ip_address = azurerm_public_ip.vpn_gateway.ip_address
type = "ipsec.1"
tags = {
Name = "azure-customer-gateway"
}
}
resource "aws_vpn_connection" "azure" {
vpn_gateway_id = aws_vpn_gateway.main.id
customer_gateway_id = aws_customer_gateway.azure.id
type = "ipsec.1"
static_routes_only = true
tags = {
Name = "aws-azure-vpn"
}
}
resource "aws_vpn_connection_route" "azure" {
vpn_connection_id = aws_vpn_connection.azure.id
destination_cidr_block = "10.1.0.0/16"
}
# Azure side configuration
resource "azurerm_resource_group" "main" {
name = "multi-cloud-rg"
location = "East US"
}
resource "azurerm_virtual_network" "main" {
name = "azure-vnet"
address_space = ["10.1.0.0/16"]
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
}
resource "azurerm_subnet" "gateway" {
name = "GatewaySubnet"
resource_group_name = azurerm_resource_group.main.name
virtual_network_name = azurerm_virtual_network.main.name
address_prefixes = ["10.1.255.0/27"]
}
resource "azurerm_public_ip" "vpn_gateway" {
name = "azure-vpn-gateway-ip"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
allocation_method = "Static"
sku = "Standard"
}
resource "azurerm_virtual_network_gateway" "main" {
name = "azure-vpn-gateway"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
type = "Vpn"
vpn_type = "RouteBased"
active_active = false
enable_bgp = false
sku = "VpnGw1"
ip_configuration {
name = "vnetGatewayConfig"
public_ip_address_id = azurerm_public_ip.vpn_gateway.id
private_ip_address_allocation = "Dynamic"
subnet_id = azurerm_subnet.gateway.id
}
}
resource "azurerm_local_network_gateway" "aws" {
name = "aws-local-gateway"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
gateway_address = aws_vpn_connection.azure.tunnel1_address
address_space = ["10.0.0.0/16"]
}
resource "azurerm_virtual_network_gateway_connection" "aws" {
name = "azure-aws-connection"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
type = "IPsec"
virtual_network_gateway_id = azurerm_virtual_network_gateway.main.id
local_network_gateway_id = azurerm_local_network_gateway.aws.id
shared_key = aws_vpn_connection.azure.tunnel1_preshared_key
}
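Each aws_vpn_connection exposes two tunnels, and the configuration above only wires up tunnel 1. A sketch of the tunnel 2 counterpart (mirroring the resources above) adds tunnel-level redundancy:
resource "azurerm_local_network_gateway" "aws_tunnel2" {
  name                = "aws-local-gateway-tunnel2"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  gateway_address     = aws_vpn_connection.azure.tunnel2_address
  address_space       = ["10.0.0.0/16"]
}

resource "azurerm_virtual_network_gateway_connection" "aws_tunnel2" {
  name                       = "azure-aws-connection-tunnel2"
  location                   = azurerm_resource_group.main.location
  resource_group_name        = azurerm_resource_group.main.name
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.main.id
  local_network_gateway_id   = azurerm_local_network_gateway.aws_tunnel2.id
  shared_key                 = aws_vpn_connection.azure.tunnel2_preshared_key
}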
AWS-GCP VPN Connectivity
Establish a site-to-site VPN between AWS and GCP (a dedicated interconnect follows a similar pattern but uses the Cloud Interconnect and Direct Connect services):
# GCP side configuration
resource "google_compute_network" "main" {
name = "gcp-vpc"
auto_create_subnetworks = false
}
resource "google_compute_subnetwork" "private" {
name = "gcp-private-subnet"
ip_cidr_range = "10.2.1.0/24"
region = "us-central1"
network = google_compute_network.main.id
}
resource "google_compute_router" "main" {
name = "gcp-router"
region = "us-central1"
network = google_compute_network.main.id
bgp {
asn = 64512
}
}
resource "google_compute_vpn_gateway" "main" {
name = "gcp-vpn-gateway"
network = google_compute_network.main.id
region = "us-central1"
}
resource "google_compute_address" "vpn_static_ip" {
name = "gcp-vpn-ip"
region = "us-central1"
}
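The tunnel below depends on ESP, UDP 500, and UDP 4500 forwarding rules that classic Cloud VPN gateways require but that the excerpt does not show; it also references aws_vpn_connection.gcp, which would be defined on the AWS side analogously to the Azure example. A minimal sketch of the forwarding rules:
resource "google_compute_forwarding_rule" "esp" {
  name        = "gcp-vpn-esp"
  region      = "us-central1"
  ip_protocol = "ESP"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.main.id
}

resource "google_compute_forwarding_rule" "udp500" {
  name        = "gcp-vpn-udp500"
  region      = "us-central1"
  ip_protocol = "UDP"
  port_range  = "500"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.main.id
}

resource "google_compute_forwarding_rule" "udp4500" {
  name        = "gcp-vpn-udp4500"
  region      = "us-central1"
  ip_protocol = "UDP"
  port_range  = "4500"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.main.id
}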
resource "google_compute_vpn_tunnel" "aws" {
name = "gcp-aws-tunnel"
peer_ip = aws_vpn_connection.gcp.tunnel1_address
shared_secret = aws_vpn_connection.gcp.tunnel1_preshared_key
target_vpn_gateway = google_compute_vpn_gateway.main.id
depends_on = [
google_compute_forwarding_rule.esp,
google_compute_forwarding_rule.udp500,
google_compute_forwarding_rule.udp4500,
]
}
resource "google_compute_route" "aws" {
name = "route-to-aws"
network = google_compute_network.main.name
dest_range = "10.0.0.0/16"
priority = 1000
next_hop_vpn_tunnel = google_compute_vpn_tunnel.aws.id
}
Multi-Cloud Transit Gateway
Create a hub-and-spoke network topology:
# Central transit hub in AWS
resource "aws_ec2_transit_gateway" "hub" {
description = "Multi-cloud transit hub"
tags = {
Name = "multi-cloud-tgw"
}
}
resource "aws_ec2_transit_gateway_vpc_attachment" "aws_vpc" {
subnet_ids = [aws_subnet.private.id]
transit_gateway_id = aws_ec2_transit_gateway.hub.id
vpc_id = aws_vpc.main.id
tags = {
Name = "aws-vpc-attachment"
}
}
resource "aws_ec2_transit_gateway_vpn_attachment" "azure" {
vpn_connection_id = aws_vpn_connection.azure.id
transit_gateway_id = aws_ec2_transit_gateway.hub.id
tags = {
Name = "azure-vpn-attachment"
}
}
resource "aws_ec2_transit_gateway_route_table" "main" {
transit_gateway_id = aws_ec2_transit_gateway.hub.id
tags = {
Name = "multi-cloud-route-table"
}
}
resource "aws_ec2_transit_gateway_route" "azure" {
destination_cidr_block = "10.1.0.0/16"
transit_gateway_attachment_id = aws_ec2_transit_gateway_vpn_attachment.azure.id
transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.main.id
}
Network Automation Script
Automate cross-cloud network setup and validation:
#!/bin/bash
# scripts/setup-cross-cloud-network.sh
set -e
AWS_REGION=${1:-"us-west-2"}
AZURE_REGION=${2:-"East US"}
GCP_REGION=${3:-"us-central1"}
setup_aws_networking() {
echo "Setting up AWS networking..."
cd aws/
terraform init
terraform plan -var="region=$AWS_REGION"
terraform apply -auto-approve -var="region=$AWS_REGION"
# Export connection details
terraform output -json > ../aws-outputs.json
cd ..
}
setup_azure_networking() {
echo "Setting up Azure networking..."
# Import AWS VPN details
AWS_VPN_IP=$(jq -r '.vpn_tunnel1_address.value' aws-outputs.json)
AWS_PRESHARED_KEY=$(jq -r '.vpn_tunnel1_preshared_key.value' aws-outputs.json)
cd azure/
terraform init
terraform plan \
-var="location=$AZURE_REGION" \
-var="aws_vpn_ip=$AWS_VPN_IP" \
-var="aws_preshared_key=$AWS_PRESHARED_KEY"
terraform apply -auto-approve \
-var="location=$AZURE_REGION" \
-var="aws_vpn_ip=$AWS_VPN_IP" \
-var="aws_preshared_key=$AWS_PRESHARED_KEY"
cd ..
}
setup_gcp_networking() {
echo "Setting up GCP networking..."
cd gcp/
terraform init
terraform plan -var="region=$GCP_REGION"
terraform apply -auto-approve -var="region=$GCP_REGION"
cd ..
}
validate_connectivity() {
echo "Validating cross-cloud connectivity..."
# Test AWS to Azure
AWS_INSTANCE_IP=$(jq -r '.test_instance_private_ip.value' aws-outputs.json)
AZURE_INSTANCE_IP=$(jq -r '.test_instance_private_ip.value' azure-outputs.json)
echo "Testing AWS ($AWS_INSTANCE_IP) to Azure ($AZURE_INSTANCE_IP)..."
# This would typically involve SSH to instances and running ping tests
# For demo purposes, we'll just check VPN status
aws ec2 describe-vpn-connections \
--region "$AWS_REGION" \
--query 'VpnConnections[0].State' \
--output text
}
# Execute setup
setup_aws_networking
setup_azure_networking
setup_gcp_networking
validate_connectivity
echo "✅ Multi-cloud network setup completed"
Network Monitoring
Monitor cross-cloud network performance:
#!/usr/bin/env python3
# scripts/network_monitor.py
import boto3
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from google.cloud import monitoring_v3
class MultiCloudNetworkMonitor:
def __init__(self):
self.aws_ec2 = boto3.client('ec2')
self.aws_cloudwatch = boto3.client('cloudwatch')
def check_aws_vpn_status(self, vpn_connection_id: str) -> dict:
"""Check AWS VPN connection status"""
response = self.aws_ec2.describe_vpn_connections(
VpnConnectionIds=[vpn_connection_id]
)
connection = response['VpnConnections'][0]
return {
'state': connection['State'],
'tunnel1_state': connection['VgwTelemetry'][0]['Status'],
'tunnel2_state': connection['VgwTelemetry'][1]['Status'],
'tunnel1_accepted_routes': connection['VgwTelemetry'][0]['AcceptedRouteCount'],
'tunnel2_accepted_routes': connection['VgwTelemetry'][1]['AcceptedRouteCount']
}
def get_network_metrics(self, vpn_connection_id: str) -> dict:
"""Get network performance metrics"""
end_time = time.time()
start_time = end_time - 3600 # Last hour
metrics = {}
# Get tunnel state metrics
for tunnel_num in [1, 2]:
response = self.aws_cloudwatch.get_metric_statistics(
Namespace='AWS/VPN',
MetricName='TunnelState',
Dimensions=[
{'Name': 'VpnId', 'Value': vpn_connection_id},
                    {'Name': 'TunnelIpAddress', 'Value': f'tunnel-{tunnel_num}'}  # placeholder; in practice this must be the tunnel's outside IP
],
StartTime=start_time,
EndTime=end_time,
Period=300,
Statistics=['Average']
)
metrics[f'tunnel_{tunnel_num}_uptime'] = len([
dp for dp in response['Datapoints'] if dp['Average'] == 1
]) / len(response['Datapoints']) * 100 if response['Datapoints'] else 0
return metrics
def generate_report(self, vpn_connection_id: str) -> str:
"""Generate network status report"""
status = self.check_aws_vpn_status(vpn_connection_id)
metrics = self.get_network_metrics(vpn_connection_id)
report = [
"Multi-Cloud Network Status Report",
"=" * 40,
f"VPN Connection: {vpn_connection_id}",
f"Overall State: {status['state']}",
"",
"Tunnel Status:",
f" Tunnel 1: {status['tunnel1_state']} ({status['tunnel1_accepted_routes']} routes)",
f" Tunnel 2: {status['tunnel2_state']} ({status['tunnel2_accepted_routes']} routes)",
"",
"Uptime (Last Hour):",
f" Tunnel 1: {metrics.get('tunnel_1_uptime', 0):.1f}%",
f" Tunnel 2: {metrics.get('tunnel_2_uptime', 0):.1f}%"
]
return "\n".join(report)
def main():
import argparse
parser = argparse.ArgumentParser(description='Multi-Cloud Network Monitor')
parser.add_argument('--vpn-connection-id', required=True, help='AWS VPN Connection ID')
args = parser.parse_args()
monitor = MultiCloudNetworkMonitor()
report = monitor.generate_report(args.vpn_connection_id)
print(report)
if __name__ == "__main__":
main()
What’s Next
Cross-cloud networking provides the foundation for multi-cloud architecture, but managing identities and permissions across providers requires unified identity strategies. In the next part, we’ll explore how to implement consistent access control and identity management across AWS, Azure, and GCP.
Unified Identity and Access
Managing identities and permissions consistently across multiple cloud providers is critical for security and operational efficiency. Each provider has different identity models, but you can create unified patterns that provide consistent access control while leveraging each platform’s strengths.
Cross-Cloud Service Accounts
Create service accounts that can access multiple cloud providers:
# AWS IAM role for cross-cloud access
resource "aws_iam_role" "cross_cloud_service" {
name = "cross-cloud-service-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
},
{
Action = "sts:AssumeRoleWithWebIdentity"
Effect = "Allow"
Principal = {
Federated = aws_iam_openid_connect_provider.azure_ad.arn
}
Condition = {
StringEquals = {
"${aws_iam_openid_connect_provider.azure_ad.url}:aud" = var.azure_application_id
}
}
}
]
})
}
resource "aws_iam_policy" "cross_cloud_policy" {
name = "cross-cloud-access-policy"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject",
"secretsmanager:GetSecretValue"
]
Resource = "*"
}
]
})
}
resource "aws_iam_role_policy_attachment" "cross_cloud" {
role = aws_iam_role.cross_cloud_service.name
policy_arn = aws_iam_policy.cross_cloud_policy.arn
}
# Azure AD application for cross-cloud identity
resource "azuread_application" "cross_cloud" {
display_name = "cross-cloud-service"
web {
redirect_uris = ["https://signin.aws.amazon.com/saml"]
}
}
resource "azuread_service_principal" "cross_cloud" {
application_id = azuread_application.cross_cloud.application_id
}
resource "azuread_service_principal_password" "cross_cloud" {
service_principal_id = azuread_service_principal.cross_cloud.object_id
}
# GCP service account
resource "google_service_account" "cross_cloud" {
account_id = "cross-cloud-service"
display_name = "Cross-Cloud Service Account"
}
resource "google_service_account_key" "cross_cloud" {
service_account_id = google_service_account.cross_cloud.name
}
resource "google_project_iam_member" "cross_cloud_storage" {
project = var.gcp_project_id
role = "roles/storage.admin"
member = "serviceAccount:${google_service_account.cross_cloud.email}"
}
Federated Identity Setup
Configure identity federation between providers:
# AWS OIDC provider for Azure AD
resource "aws_iam_openid_connect_provider" "azure_ad" {
url = "https://sts.windows.net/${var.azure_tenant_id}/"
client_id_list = [
var.azure_application_id
]
thumbprint_list = [
"626d44e704d1ceabe3bf0d53397464ac8080142c"
]
}
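Hard-coding the certificate thumbprint is brittle because it changes whenever the identity provider rotates certificates. Since the tls provider is already pinned in required_providers, the thumbprint can be looked up at plan time; a sketch:
data "tls_certificate" "azure_ad" {
  url = "https://sts.windows.net/${var.azure_tenant_id}/"
}

# Then, in aws_iam_openid_connect_provider.azure_ad:
#   thumbprint_list = [data.tls_certificate.azure_ad.certificates[0].sha1_fingerprint]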
# Azure AD SAML configuration for AWS
resource "azuread_application" "aws_sso" {
display_name = "AWS-SSO"
web {
redirect_uris = ["https://signin.aws.amazon.com/saml"]
}
app_role {
allowed_member_types = ["User"]
description = "AWS SSO Access"
display_name = "AWS Access"
enabled = true
id = "b9632174-c057-4f7e-951b-b3adc3ddb778"
value = "AWSAccess"
}
}
# GCP Workload Identity for cross-cloud access
resource "google_iam_workload_identity_pool" "cross_cloud" {
workload_identity_pool_id = "cross-cloud-pool"
display_name = "Cross-Cloud Identity Pool"
description = "Identity pool for cross-cloud access"
}
resource "google_iam_workload_identity_pool_provider" "aws" {
workload_identity_pool_id = google_iam_workload_identity_pool.cross_cloud.workload_identity_pool_id
workload_identity_pool_provider_id = "aws-provider"
display_name = "AWS Provider"
aws {
account_id = var.aws_account_id
}
attribute_mapping = {
"google.subject" = "assertion.arn"
"attribute.aws_role" = "assertion.arn.contains('role') ? assertion.arn.extract('{account_arn}role/') : ''"
"attribute.account_id" = "assertion.account"
}
}
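For the AWS-federated identity to actually act as the GCP service account created earlier, it also needs a workloadIdentityUser binding on that service account. A sketch, where binding on the mapped account_id attribute is an assumption about how you want to scope access:
resource "google_service_account_iam_member" "aws_federation" {
  service_account_id = google_service_account.cross_cloud.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.cross_cloud.name}/attribute.account_id/${var.aws_account_id}"
}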
Unified RBAC Implementation
Create consistent role-based access control across providers:
# Define common roles
locals {
common_roles = {
admin = {
description = "Full administrative access"
permissions = ["*"]
}
developer = {
description = "Development environment access"
permissions = ["read", "write", "deploy"]
}
readonly = {
description = "Read-only access"
permissions = ["read"]
}
}
}
# AWS IAM roles based on common roles
resource "aws_iam_role" "common_roles" {
for_each = local.common_roles
name = "multi-cloud-${each.key}"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Federated = aws_iam_openid_connect_provider.azure_ad.arn
}
}
]
})
}
resource "aws_iam_policy" "common_role_policies" {
for_each = local.common_roles
name = "multi-cloud-${each.key}-policy"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = each.value.permissions
Resource = "*"
}
]
})
}
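Note that the abstract permission names above ("read", "write", "deploy") are placeholders rather than valid IAM actions, and the same applies to the GCP custom role permissions below. In practice each cloud needs a lookup table from the common role vocabulary to concrete actions; a minimal sketch assuming S3-centric access:
locals {
  # Hypothetical mapping from common permission names to concrete AWS actions
  aws_action_map = {
    read   = ["s3:GetObject", "s3:ListBucket"]
    write  = ["s3:PutObject", "s3:DeleteObject"]
    deploy = ["s3:PutObject"]
    "*"    = ["*"]
  }

  # Example expansion for the developer role defined earlier
  developer_aws_actions = flatten([
    for p in local.common_roles.developer.permissions : local.aws_action_map[p]
  ])
}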
# Azure AD groups for common roles
resource "azuread_group" "common_roles" {
for_each = local.common_roles
display_name = "MultiCloud-${title(each.key)}"
description = each.value.description
security_enabled = true
}
# GCP IAM custom roles
resource "google_project_iam_custom_role" "common_roles" {
for_each = local.common_roles
role_id = "multiCloud${title(each.key)}"
title = "Multi-Cloud ${title(each.key)}"
description = each.value.description
permissions = [
for perm in each.value.permissions :
"storage.objects.${perm}" if perm != "*"
]
}
Secret Management Across Clouds
Implement unified secret management:
#!/usr/bin/env python3
# scripts/cross_cloud_secrets.py
import boto3
import json
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential
from google.cloud import secretmanager
class CrossCloudSecretManager:
def __init__(self, aws_region: str, azure_vault_url: str, gcp_project_id: str):
self.aws_secrets = boto3.client('secretsmanager', region_name=aws_region)
self.azure_secrets = SecretClient(vault_url=azure_vault_url, credential=DefaultAzureCredential())
self.gcp_secrets = secretmanager.SecretManagerServiceClient()
self.gcp_project_id = gcp_project_id
def create_secret_everywhere(self, secret_name: str, secret_value: str) -> dict:
"""Create the same secret in all three cloud providers"""
results = {}
# AWS Secrets Manager
try:
self.aws_secrets.create_secret(
Name=secret_name,
SecretString=secret_value,
Description=f"Cross-cloud secret: {secret_name}"
)
results['aws'] = 'success'
except Exception as e:
results['aws'] = f'error: {str(e)}'
# Azure Key Vault
try:
self.azure_secrets.set_secret(secret_name, secret_value)
results['azure'] = 'success'
except Exception as e:
results['azure'] = f'error: {str(e)}'
# GCP Secret Manager
try:
parent = f"projects/{self.gcp_project_id}"
# Create secret
secret = self.gcp_secrets.create_secret(
request={
"parent": parent,
"secret_id": secret_name,
"secret": {"replication": {"automatic": {}}},
}
)
# Add secret version
self.gcp_secrets.add_secret_version(
request={
"parent": secret.name,
"payload": {"data": secret_value.encode("UTF-8")},
}
)
results['gcp'] = 'success'
except Exception as e:
results['gcp'] = f'error: {str(e)}'
return results
def get_secret_from_all(self, secret_name: str) -> dict:
"""Retrieve secret from all providers for comparison"""
secrets = {}
# AWS
try:
response = self.aws_secrets.get_secret_value(SecretId=secret_name)
secrets['aws'] = response['SecretString']
except Exception as e:
secrets['aws'] = f'error: {str(e)}'
# Azure
try:
secret = self.azure_secrets.get_secret(secret_name)
secrets['azure'] = secret.value
except Exception as e:
secrets['azure'] = f'error: {str(e)}'
# GCP
try:
name = f"projects/{self.gcp_project_id}/secrets/{secret_name}/versions/latest"
response = self.gcp_secrets.access_secret_version(request={"name": name})
secrets['gcp'] = response.payload.data.decode("UTF-8")
except Exception as e:
secrets['gcp'] = f'error: {str(e)}'
return secrets
def sync_secrets(self, secret_mappings: dict) -> dict:
"""Sync secrets across providers based on mapping"""
sync_results = {}
for secret_name, config in secret_mappings.items():
source_provider = config['source']
target_providers = config['targets']
# Get secret from source
if source_provider == 'aws':
try:
response = self.aws_secrets.get_secret_value(SecretId=secret_name)
secret_value = response['SecretString']
except Exception as e:
sync_results[secret_name] = f'Failed to read from AWS: {e}'
continue
# Sync to targets
for target in target_providers:
if target == 'azure':
try:
self.azure_secrets.set_secret(secret_name, secret_value)
sync_results[f'{secret_name}_to_azure'] = 'success'
except Exception as e:
sync_results[f'{secret_name}_to_azure'] = f'error: {e}'
elif target == 'gcp':
try:
name = f"projects/{self.gcp_project_id}/secrets/{secret_name}/versions/latest"
self.gcp_secrets.add_secret_version(
request={
"parent": f"projects/{self.gcp_project_id}/secrets/{secret_name}",
"payload": {"data": secret_value.encode("UTF-8")},
}
)
sync_results[f'{secret_name}_to_gcp'] = 'success'
except Exception as e:
sync_results[f'{secret_name}_to_gcp'] = f'error: {e}'
return sync_results
def main():
import argparse
parser = argparse.ArgumentParser(description='Cross-Cloud Secret Manager')
parser.add_argument('--aws-region', default='us-west-2', help='AWS region')
parser.add_argument('--azure-vault-url', required=True, help='Azure Key Vault URL')
parser.add_argument('--gcp-project-id', required=True, help='GCP Project ID')
parser.add_argument('--action', choices=['create', 'get', 'sync'], required=True)
parser.add_argument('--secret-name', help='Secret name')
parser.add_argument('--secret-value', help='Secret value')
parser.add_argument('--config-file', help='JSON config file for sync operation')
args = parser.parse_args()
manager = CrossCloudSecretManager(
args.aws_region,
args.azure_vault_url,
args.gcp_project_id
)
if args.action == 'create':
if not args.secret_name or not args.secret_value:
print("Error: --secret-name and --secret-value required for create")
return
results = manager.create_secret_everywhere(args.secret_name, args.secret_value)
print(json.dumps(results, indent=2))
elif args.action == 'get':
if not args.secret_name:
print("Error: --secret-name required for get")
return
secrets = manager.get_secret_from_all(args.secret_name)
print(json.dumps(secrets, indent=2))
elif args.action == 'sync':
if not args.config_file:
print("Error: --config-file required for sync")
return
with open(args.config_file, 'r') as f:
config = json.load(f)
results = manager.sync_secrets(config)
print(json.dumps(results, indent=2))
if __name__ == "__main__":
main()
Access Control Automation
Automate user provisioning across all providers:
#!/bin/bash
# scripts/provision-user.sh
set -e
USER_EMAIL=${1:-""}
ROLE=${2:-"readonly"}
GROUPS=${3:-""}
if [ -z "$USER_EMAIL" ]; then
echo "Usage: $0 <user_email> [role] [additional_groups]"
exit 1
fi
provision_aws_access() {
echo "Provisioning AWS access for $USER_EMAIL..."
# Create IAM user
aws iam create-user --user-name "$USER_EMAIL" || true
# Attach role policy
aws iam attach-user-policy \
--user-name "$USER_EMAIL" \
--policy-arn "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/multi-cloud-${ROLE}-policy"
# Generate access keys
aws iam create-access-key --user-name "$USER_EMAIL" --output table
}
provision_azure_access() {
echo "Provisioning Azure access for $USER_EMAIL..."
# Create user (if external, invite as guest)
az ad user create \
--display-name "$USER_EMAIL" \
--user-principal-name "$USER_EMAIL" \
--password "TempPassword123!" \
--force-change-password-next-sign-in true || \
az ad user invite --invited-user-email-address "$USER_EMAIL"
# Add to role group
GROUP_ID=$(az ad group show --group "MultiCloud-$(echo $ROLE | sed 's/.*/\u&/')" --query objectId --output tsv)
USER_ID=$(az ad user show --id "$USER_EMAIL" --query objectId --output tsv)
az ad group member add --group "$GROUP_ID" --member-id "$USER_ID"
}
provision_gcp_access() {
echo "Provisioning GCP access for $USER_EMAIL..."
# Add IAM policy binding
gcloud projects add-iam-policy-binding "$GCP_PROJECT_ID" \
--member="user:$USER_EMAIL" \
--role="projects/$GCP_PROJECT_ID/roles/multiCloud$(echo $ROLE | sed 's/.*/\u&/')"
}
# Execute provisioning
provision_aws_access
provision_azure_access
provision_gcp_access
echo "✅ User $USER_EMAIL provisioned with $ROLE access across all clouds"
What’s Next
Unified identity and access management provides the security foundation for multi-cloud operations. In the next part, we’ll explore provider abstraction patterns that allow you to create modules and configurations that work consistently across different cloud providers.
Provider Abstraction Patterns
Creating truly portable infrastructure requires abstraction layers that hide provider-specific differences while exposing common functionality. This part covers patterns for building cloud-agnostic modules that can deploy the same logical infrastructure across AWS, Azure, and GCP.
Universal Compute Module
Create a compute module that works across all providers:
# modules/universal-compute/variables.tf
variable "provider_type" {
description = "Cloud provider (aws, azure, gcp)"
type = string
validation {
condition = contains(["aws", "azure", "gcp"], var.provider_type)
error_message = "Provider must be aws, azure, or gcp."
}
}
variable "instance_config" {
description = "Instance configuration"
type = object({
name = string
size = string
image = string
subnet_id = string
key_name = optional(string)
user_data = optional(string)
tags = optional(map(string), {})
})
}
variable "gcp_region" {
description = "GCP region (required when provider_type is gcp)"
type = string
default = ""
}
variable "gcp_project_id" {
description = "GCP project ID (required when provider_type is gcp)"
type = string
default = ""
}
# modules/universal-compute/main.tf
locals {
# Size mapping across providers
size_mapping = {
aws = {
small = "t3.micro"
medium = "t3.small"
large = "t3.medium"
}
azure = {
small = "Standard_B1s"
medium = "Standard_B2s"
large = "Standard_B4ms"
}
gcp = {
small = "e2-micro"
medium = "e2-small"
large = "e2-medium"
}
}
actual_size = local.size_mapping[var.provider_type][var.instance_config.size]
}
# AWS EC2 Instance
resource "aws_instance" "this" {
count = var.provider_type == "aws" ? 1 : 0
ami = var.instance_config.image
instance_type = local.actual_size
subnet_id = var.instance_config.subnet_id
key_name = var.instance_config.key_name
user_data = var.instance_config.user_data
tags = merge(var.instance_config.tags, {
Name = var.instance_config.name
})
}
# Azure Virtual Machine
resource "azurerm_network_interface" "this" {
count = var.provider_type == "azure" ? 1 : 0
name = "${var.instance_config.name}-nic"
  # The azurerm_subnet data source does not expose a location, so take it from a variable
  location            = var.azure_location
resource_group_name = data.azurerm_subnet.this[0].resource_group_name
ip_configuration {
name = "internal"
subnet_id = var.instance_config.subnet_id
private_ip_address_allocation = "Dynamic"
}
}
resource "azurerm_linux_virtual_machine" "this" {
count = var.provider_type == "azure" ? 1 : 0
name = var.instance_config.name
resource_group_name = data.azurerm_subnet.this[0].resource_group_name
  location            = var.azure_location
size = local.actual_size
disable_password_authentication = true
admin_username = "adminuser"
network_interface_ids = [
azurerm_network_interface.this[0].id,
]
os_disk {
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-focal"
sku = "20_04-lts-gen2"
version = "latest"
}
admin_ssh_key {
username = "adminuser"
public_key = file("~/.ssh/id_rsa.pub")
}
custom_data = base64encode(var.instance_config.user_data)
tags = var.instance_config.tags
}
# GCP Compute Instance
resource "google_compute_instance" "this" {
count = var.provider_type == "gcp" ? 1 : 0
name = var.instance_config.name
machine_type = local.actual_size
zone = "${data.google_compute_subnetwork.this[0].region}-a" # Use first zone in region
boot_disk {
initialize_params {
image = var.instance_config.image
}
}
network_interface {
subnetwork = var.instance_config.subnet_id
}
metadata = {
ssh-keys = "adminuser:${file("~/.ssh/id_rsa.pub")}"
}
metadata_startup_script = var.instance_config.user_data
labels = var.instance_config.tags
}
# Data sources for provider-specific information
data "azurerm_subnet" "this" {
count = var.provider_type == "azure" ? 1 : 0
name = split("/", var.instance_config.subnet_id)[10] # Extract subnet name from resource ID
virtual_network_name = split("/", var.instance_config.subnet_id)[8] # Extract VNet name
resource_group_name = split("/", var.instance_config.subnet_id)[4] # Extract RG name
}
data "google_compute_subnetwork" "this" {
count = var.provider_type == "gcp" ? 1 : 0
name = var.instance_config.subnet_id
region = var.gcp_region
project = var.gcp_project_id
}
# modules/universal-compute/outputs.tf
output "instance_id" {
description = "Instance ID"
value = var.provider_type == "aws" ? aws_instance.this[0].id : (
var.provider_type == "azure" ? azurerm_linux_virtual_machine.this[0].id :
google_compute_instance.this[0].id
)
}
output "private_ip" {
description = "Private IP address"
value = var.provider_type == "aws" ? aws_instance.this[0].private_ip : (
var.provider_type == "azure" ? azurerm_linux_virtual_machine.this[0].private_ip_address :
google_compute_instance.this[0].network_interface[0].network_ip
)
}
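Calling the module then looks the same regardless of cloud; only provider_type and the provider-specific IDs change. A sketch with placeholder values:
module "app_server" {
  source        = "./modules/universal-compute"
  provider_type = "aws"

  instance_config = {
    name      = "app-server-1"
    size      = "medium"
    image     = "ami-0123456789abcdef0" # placeholder AMI ID
    subnet_id = aws_subnet.private.id   # assumes a subnet defined elsewhere
    tags      = { Environment = "dev" }
  }
}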
Universal Storage Module
Create storage that works across providers:
# modules/universal-storage/variables.tf
variable "provider_type" {
description = "Cloud provider"
type = string
}
variable "bucket_config" {
description = "Storage bucket configuration"
type = object({
name = string
versioning_enabled = optional(bool, false)
encryption_enabled = optional(bool, true)
public_access = optional(bool, false)
lifecycle_rules = optional(list(object({
days = number
action = string
})), [])
tags = optional(map(string), {})
})
}
variable "resource_group_name" {
description = "Azure resource group name (required when provider_type is azure)"
type = string
default = ""
}
variable "location" {
description = "Azure location (required when provider_type is azure)"
type = string
default = ""
}
variable "gcp_region" {
description = "GCP region (required when provider_type is gcp)"
type = string
default = ""
}
# modules/universal-storage/main.tf
# AWS S3 Bucket
resource "aws_s3_bucket" "this" {
count = var.provider_type == "aws" ? 1 : 0
bucket = var.bucket_config.name
tags = var.bucket_config.tags
}
resource "aws_s3_bucket_versioning" "this" {
count = var.provider_type == "aws" && var.bucket_config.versioning_enabled ? 1 : 0
bucket = aws_s3_bucket.this[0].id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
count = var.provider_type == "aws" && var.bucket_config.encryption_enabled ? 1 : 0
bucket = aws_s3_bucket.this[0].id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
resource "aws_s3_bucket_lifecycle_configuration" "this" {
count = var.provider_type == "aws" && length(var.bucket_config.lifecycle_rules) > 0 ? 1 : 0
bucket = aws_s3_bucket.this[0].id
dynamic "rule" {
for_each = var.bucket_config.lifecycle_rules
content {
id = "rule-${rule.key}"
status = "Enabled"
expiration {
days = rule.value.days
}
}
}
}
# Azure Storage Account
resource "azurerm_storage_account" "this" {
count = var.provider_type == "azure" ? 1 : 0
name = replace(var.bucket_config.name, "-", "")
resource_group_name = var.resource_group_name
location = var.location
account_tier = "Standard"
account_replication_type = "LRS"
blob_properties {
versioning_enabled = var.bucket_config.versioning_enabled
}
tags = var.bucket_config.tags
}
resource "azurerm_storage_container" "this" {
count = var.provider_type == "azure" ? 1 : 0
name = "data"
storage_account_name = azurerm_storage_account.this[0].name
container_access_type = var.bucket_config.public_access ? "blob" : "private"
}
# GCP Storage Bucket
resource "google_storage_bucket" "this" {
count = var.provider_type == "gcp" ? 1 : 0
name = var.bucket_config.name
location = var.gcp_region
versioning {
enabled = var.bucket_config.versioning_enabled
}
dynamic "lifecycle_rule" {
for_each = var.bucket_config.lifecycle_rules
content {
condition {
age = lifecycle_rule.value.days
}
action {
  type = lifecycle_rule.value.action == "delete" ? "Delete" : "SetStorageClass"
  # SetStorageClass actions also require a target storage class (NEARLINE assumed here)
  storage_class = lifecycle_rule.value.action == "delete" ? null : "NEARLINE"
}
}
}
labels = var.bucket_config.tags
}
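A matching outputs.tf (a sketch, following the same conditional pattern as the compute module) gives callers a provider-neutral handle to the bucket:
# modules/universal-storage/outputs.tf
output "bucket_name" {
  description = "Name of the created S3 bucket, storage account, or GCS bucket"
  value = var.provider_type == "aws" ? aws_s3_bucket.this[0].bucket : (
    var.provider_type == "azure" ? azurerm_storage_account.this[0].name :
    google_storage_bucket.this[0].name
  )
}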
Configuration Factory Pattern
Generate provider-specific configurations from common definitions:
#!/usr/bin/env python3
# scripts/config_factory.py
import json
import yaml
from typing import Dict, Any, List
from pathlib import Path
class MultiCloudConfigFactory:
def __init__(self):
self.provider_mappings = {
'compute': {
'aws': self._generate_aws_compute,
'azure': self._generate_azure_compute,
'gcp': self._generate_gcp_compute
},
'storage': {
'aws': self._generate_aws_storage,
'azure': self._generate_azure_storage,
'gcp': self._generate_gcp_storage
},
'network': {
'aws': self._generate_aws_network,
'azure': self._generate_azure_network,
'gcp': self._generate_gcp_network
}
}
def generate_configs(self, spec_file: str, output_dir: str) -> Dict[str, str]:
"""Generate provider-specific configs from universal spec"""
with open(spec_file, 'r') as f:
spec = yaml.safe_load(f)
output_path = Path(output_dir)
output_path.mkdir(exist_ok=True)
generated_files = {}
for provider in spec.get('providers', []):
provider_name = provider['name']
provider_config = {
'terraform': {
'required_providers': {
provider_name: provider.get('version_constraint', {})
}
},
'provider': {
provider_name: provider.get('config', {})
}
}
# Generate resources for each service
for service_name, service_config in spec.get('services', {}).items():
if service_name in self.provider_mappings:
generator = self.provider_mappings[service_name].get(provider_name)
if generator:
resources = generator(service_config)
provider_config.update(resources)
# Write provider-specific configuration
config_file = output_path / f"{provider_name}.tf.json"
with open(config_file, 'w') as f:
json.dump(provider_config, f, indent=2)
generated_files[provider_name] = str(config_file)
return generated_files
def _generate_aws_compute(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate AWS compute resources"""
resources = {'resource': {'aws_instance': {}}}
for instance_name, instance_config in config.get('instances', {}).items():
resources['resource']['aws_instance'][instance_name] = {
'ami': instance_config['image'],
'instance_type': self._map_instance_size('aws', instance_config['size']),
'subnet_id': instance_config['subnet_id'],
'tags': instance_config.get('tags', {})
}
if 'user_data' in instance_config:
resources['resource']['aws_instance'][instance_name]['user_data'] = instance_config['user_data']
return resources
def _generate_azure_compute(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate Azure compute resources"""
resources = {
'resource': {
'azurerm_linux_virtual_machine': {},
'azurerm_network_interface': {}
}
}
for instance_name, instance_config in config.get('instances', {}).items():
# Network interface
resources['resource']['azurerm_network_interface'][f"{instance_name}_nic"] = {
'name': f"{instance_name}-nic",
'location': '${var.location}',
'resource_group_name': '${var.resource_group_name}',
'ip_configuration': [{
'name': 'internal',
'subnet_id': instance_config['subnet_id'],
'private_ip_address_allocation': 'Dynamic'
}]
}
# Virtual machine
resources['resource']['azurerm_linux_virtual_machine'][instance_name] = {
'name': instance_name,
'resource_group_name': '${var.resource_group_name}',
'location': '${var.location}',
'size': self._map_instance_size('azure', instance_config['size']),
'disable_password_authentication': True,
'network_interface_ids': [f"${{azurerm_network_interface.{instance_name}_nic.id}}"],
'os_disk': [{
'caching': 'ReadWrite',
'storage_account_type': 'Standard_LRS'
}],
'source_image_reference': [{
'publisher': 'Canonical',
'offer': '0001-com-ubuntu-server-focal',
'sku': '20_04-lts-gen2',
'version': 'latest'
}],
'tags': instance_config.get('tags', {})
}
return resources
def _generate_gcp_compute(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate GCP compute resources"""
resources = {'resource': {'google_compute_instance': {}}}
for instance_name, instance_config in config.get('instances', {}).items():
resources['resource']['google_compute_instance'][instance_name] = {
'name': instance_name,
'machine_type': self._map_instance_size('gcp', instance_config['size']),
'zone': '${var.zone}',
'boot_disk': [{
'initialize_params': [{
'image': instance_config['image']
}]
}],
'network_interface': [{
'subnetwork': instance_config['subnet_id']
}],
'labels': instance_config.get('tags', {})
}
return resources
def _generate_aws_storage(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate AWS storage resources"""
resources = {'resource': {'aws_s3_bucket': {}}}
for bucket_name, bucket_config in config.get('buckets', {}).items():
resources['resource']['aws_s3_bucket'][bucket_name] = {
'bucket': bucket_config['name'],
'tags': bucket_config.get('tags', {})
}
return resources
def _generate_azure_storage(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate Azure storage resources"""
resources = {'resource': {'azurerm_storage_account': {}}}
for bucket_name, bucket_config in config.get('buckets', {}).items():
resources['resource']['azurerm_storage_account'][bucket_name] = {
'name': bucket_config['name'].replace('-', ''),
'resource_group_name': '${var.resource_group_name}',
'location': '${var.location}',
'account_tier': 'Standard',
'account_replication_type': 'LRS',
'tags': bucket_config.get('tags', {})
}
return resources
def _generate_gcp_storage(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate GCP storage resources"""
resources = {'resource': {'google_storage_bucket': {}}}
for bucket_name, bucket_config in config.get('buckets', {}).items():
resources['resource']['google_storage_bucket'][bucket_name] = {
'name': bucket_config['name'],
'location': '${var.region}',
'labels': bucket_config.get('tags', {})
}
return resources
def _generate_aws_network(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate AWS network resources"""
resources = {
'resource': {
'aws_vpc': {},
'aws_subnet': {}
}
}
for vpc_name, vpc_config in config.get('vpcs', {}).items():
resources['resource']['aws_vpc'][vpc_name] = {
'cidr_block': vpc_config['cidr'],
'enable_dns_hostnames': True,
'enable_dns_support': True,
'tags': vpc_config.get('tags', {})
}
for subnet_name, subnet_config in vpc_config.get('subnets', {}).items():
resources['resource']['aws_subnet'][subnet_name] = {
'vpc_id': f"${{aws_vpc.{vpc_name}.id}}",
'cidr_block': subnet_config['cidr'],
'availability_zone': subnet_config.get('az', '${data.aws_availability_zones.available.names[0]}'),
'tags': subnet_config.get('tags', {})
}
return resources
def _generate_azure_network(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate Azure network resources"""
resources = {
'resource': {
'azurerm_virtual_network': {},
'azurerm_subnet': {}
}
}
for vpc_name, vpc_config in config.get('vpcs', {}).items():
resources['resource']['azurerm_virtual_network'][vpc_name] = {
'name': vpc_name,
'address_space': [vpc_config['cidr']],
'location': '${var.location}',
'resource_group_name': '${var.resource_group_name}',
'tags': vpc_config.get('tags', {})
}
for subnet_name, subnet_config in vpc_config.get('subnets', {}).items():
resources['resource']['azurerm_subnet'][subnet_name] = {
'name': subnet_name,
'resource_group_name': '${var.resource_group_name}',
'virtual_network_name': f"${{azurerm_virtual_network.{vpc_name}.name}}",
'address_prefixes': [subnet_config['cidr']]
}
return resources
def _generate_gcp_network(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Generate GCP network resources"""
resources = {
'resource': {
'google_compute_network': {},
'google_compute_subnetwork': {}
}
}
for vpc_name, vpc_config in config.get('vpcs', {}).items():
resources['resource']['google_compute_network'][vpc_name] = {
'name': vpc_name,
'auto_create_subnetworks': False
}
for subnet_name, subnet_config in vpc_config.get('subnets', {}).items():
resources['resource']['google_compute_subnetwork'][subnet_name] = {
'name': subnet_name,
'ip_cidr_range': subnet_config['cidr'],
'region': '${var.region}',
'network': f"${{google_compute_network.{vpc_name}.id}}"
}
return resources
def _map_instance_size(self, provider: str, size: str) -> str:
"""Map universal size to provider-specific instance type"""
size_mappings = {
'aws': {
'small': 't3.micro',
'medium': 't3.small',
'large': 't3.medium',
'xlarge': 't3.large'
},
'azure': {
'small': 'Standard_B1s',
'medium': 'Standard_B2s',
'large': 'Standard_B4ms',
'xlarge': 'Standard_B8ms'
},
'gcp': {
'small': 'e2-micro',
'medium': 'e2-small',
'large': 'e2-medium',
'xlarge': 'e2-standard-2'
}
}
return size_mappings.get(provider, {}).get(size, size)
def main():
import argparse
parser = argparse.ArgumentParser(description='Multi-Cloud Configuration Factory')
parser.add_argument('--spec-file', required=True, help='Universal specification file')
parser.add_argument('--output-dir', default='./generated', help='Output directory')
args = parser.parse_args()
factory = MultiCloudConfigFactory()
generated_files = factory.generate_configs(args.spec_file, args.output_dir)
print("Generated configurations:")
for provider, file_path in generated_files.items():
print(f" {provider}: {file_path}")
if __name__ == "__main__":
main()
What’s Next
Provider abstraction patterns enable you to write infrastructure code once and deploy it across multiple clouds. However, data often needs to move between these environments. In the next part, we’ll explore data and storage strategies for multi-cloud architectures, including replication, backup, and disaster recovery patterns.
Data and Storage Strategies
Multi-cloud data strategies require careful planning for replication, backup, disaster recovery, and compliance. Each cloud provider offers different storage services with varying performance characteristics, pricing models, and integration capabilities. This part covers patterns for managing data across multiple clouds effectively.
Cross-Cloud Data Replication
Set up automated data replication between cloud providers:
# Primary storage in AWS
resource "aws_s3_bucket" "primary" {
bucket = "company-data-primary"
tags = {
Environment = "production"
Role = "primary"
}
}
resource "aws_s3_bucket_versioning" "primary" {
bucket = aws_s3_bucket.primary.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_notification" "replication_trigger" {
bucket = aws_s3_bucket.primary.id
lambda_function {
lambda_function_arn = aws_lambda_function.cross_cloud_replicator.arn
events = ["s3:ObjectCreated:*"]
}
}
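S3 also needs explicit permission to invoke the Lambda function, or the notification will fail to deliver events; a sketch:
resource "aws_lambda_permission" "allow_s3" {
  statement_id  = "AllowS3Invoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.cross_cloud_replicator.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.primary.arn
}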
# Backup storage in Azure
resource "azurerm_storage_account" "backup" {
name = "companydatabackup"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
account_tier = "Standard"
account_replication_type = "GRS"
tags = {
Environment = "production"
Role = "backup"
}
}
resource "azurerm_storage_container" "backup" {
name = "data-backup"
storage_account_name = azurerm_storage_account.backup.name
container_access_type = "private"
}
# Archive storage in GCP
resource "google_storage_bucket" "archive" {
name = "company-data-archive"
location = "US"
storage_class = "COLDLINE"
lifecycle_rule {
condition {
age = 90
}
action {
type = "SetStorageClass"
storage_class = "ARCHIVE"
}
}
labels = {
environment = "production"
role = "archive"
}
}
# Cross-cloud replication Lambda
resource "aws_lambda_function" "cross_cloud_replicator" {
filename = "replicator.zip"
function_name = "cross-cloud-replicator"
role = aws_iam_role.replicator.arn
handler = "index.handler"
runtime = "python3.9"
timeout = 300
environment {
variables = {
AZURE_STORAGE_ACCOUNT = azurerm_storage_account.backup.name
AZURE_CONTAINER = azurerm_storage_container.backup.name
GCP_BUCKET = google_storage_bucket.archive.name
}
}
}
resource "aws_iam_role" "replicator" {
name = "cross-cloud-replicator-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy" "replicator" {
name = "cross-cloud-replicator-policy"
role = aws_iam_role.replicator.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject",
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "*"
}
]
})
}
Database Replication Strategy
Implement cross-cloud database replication:
# Primary database in AWS RDS
resource "aws_db_instance" "primary" {
identifier = "company-db-primary"
engine = "postgres"
engine_version = "13.7"
instance_class = "db.t3.medium"
allocated_storage = 100
max_allocated_storage = 1000
storage_encrypted = true
db_name = "companydb"
username = "dbadmin"
password = var.db_password
vpc_security_group_ids = [aws_security_group.rds.id]
db_subnet_group_name = aws_db_subnet_group.main.name
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "sun:04:00-sun:05:00"
skip_final_snapshot = false
final_snapshot_identifier = "company-db-final-snapshot"
tags = {
Environment = "production"
Role = "primary"
}
}
# Standby PostgreSQL server in Azure (RDS cannot natively replicate to Azure;
# cross-cloud replication is handled outside Terraform, e.g. with logical replication)
resource "azurerm_postgresql_server" "replica" {
name = "company-db-replica"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
administrator_login = "dbadmin"
administrator_login_password = var.db_password
sku_name = "GP_Gen5_2"
version = "11"
storage_mb = 102400
backup_retention_days = 7
geo_redundant_backup_enabled = true
auto_grow_enabled = true
ssl_enforcement_enabled = true
tags = {
Environment = "production"
Role = "replica"
}
}
# Backup database in GCP Cloud SQL
resource "google_sql_database_instance" "backup" {
name = "company-db-backup"
database_version = "POSTGRES_13"
region = "us-central1"
settings {
tier = "db-custom-2-7680"
backup_configuration {
enabled = true
start_time = "03:00"
point_in_time_recovery_enabled = true
}
ip_configuration {
ipv4_enabled = false
private_network = google_compute_network.main.id
}
database_flags {
name = "log_statement"
value = "all"
}
}
deletion_protection = true
}
Data Synchronization Pipeline
Create automated data synchronization across clouds:
#!/usr/bin/env python3
# scripts/data_sync_pipeline.py
import boto3
import asyncio
from azure.storage.blob import BlobServiceClient
from google.cloud import storage as gcs
from typing import Dict, List, Optional, Any
import hashlib
import json
from datetime import datetime
class MultiCloudDataSync:
    def __init__(self, config: Dict[str, Any]):
self.config = config
# Initialize cloud clients
self.s3_client = boto3.client('s3')
self.azure_client = BlobServiceClient(
account_url=f"https://{config['azure']['account_name']}.blob.core.windows.net",
credential=config['azure']['access_key']
)
self.gcp_client = gcs.Client(project=config['gcp']['project_id'])
    async def sync_all_data(self) -> Dict[str, Any]:
"""Synchronize data across all cloud providers"""
sync_results = {
'timestamp': datetime.utcnow().isoformat(),
'synced_objects': 0,
'errors': [],
'providers': {}
}
# Get source data inventory
source_objects = await self._get_source_inventory()
# Sync to each target provider
for target_provider in self.config['sync_targets']:
provider_results = await self._sync_to_provider(
source_objects,
target_provider
)
sync_results['providers'][target_provider] = provider_results
sync_results['synced_objects'] += provider_results.get('synced_count', 0)
sync_results['errors'].extend(provider_results.get('errors', []))
return sync_results
async def _get_source_inventory(self) -> List[Dict[str, Any]]:
"""Get inventory of objects from source provider"""
source_config = self.config['source']
objects = []
if source_config['provider'] == 'aws':
paginator = self.s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=source_config['bucket']):
for obj in page.get('Contents', []):
objects.append({
'key': obj['Key'],
'size': obj['Size'],
'etag': obj['ETag'].strip('"'),
'last_modified': obj['LastModified'].isoformat(),
'provider': 'aws'
})
return objects
async def _sync_to_provider(self, source_objects: List[Dict], target_provider: str) -> Dict[str, Any]:
"""Sync objects to target provider"""
results = {
'synced_count': 0,
'skipped_count': 0,
'errors': []
}
target_config = self.config['targets'][target_provider]
# Get existing objects in target
existing_objects = await self._get_target_inventory(target_provider, target_config)
existing_keys = {obj['key']: obj for obj in existing_objects}
for source_obj in source_objects:
try:
# Check if object needs sync
if await self._needs_sync(source_obj, existing_keys.get(source_obj['key'])):
await self._copy_object(source_obj, target_provider, target_config)
results['synced_count'] += 1
else:
results['skipped_count'] += 1
except Exception as e:
results['errors'].append({
'object': source_obj['key'],
'error': str(e)
})
return results
async def _get_target_inventory(self, provider: str, config: Dict) -> List[Dict[str, Any]]:
"""Get inventory from target provider"""
objects = []
if provider == 'azure':
container_client = self.azure_client.get_container_client(config['container'])
for blob in container_client.list_blobs():  # this BlobServiceClient is synchronous, so list_blobs() is a plain iterator
objects.append({
'key': blob.name,
'size': blob.size,
'etag': blob.etag.strip('"'),
'last_modified': blob.last_modified.isoformat()
})
elif provider == 'gcp':
bucket = self.gcp_client.bucket(config['bucket'])
for blob in bucket.list_blobs():
objects.append({
'key': blob.name,
'size': blob.size,
'etag': blob.etag.strip('"'),
'last_modified': blob.time_created.isoformat()
})
return objects
async def _needs_sync(self, source_obj: Dict, target_obj: Optional[Dict]) -> bool:
"""Determine if object needs synchronization"""
if not target_obj:
return True
# Compare ETags (checksums)
if source_obj['etag'] != target_obj['etag']:
return True
# Compare sizes
if source_obj['size'] != target_obj['size']:
return True
return False
async def _copy_object(self, source_obj: Dict, target_provider: str, target_config: Dict):
"""Copy object from source to target provider"""
if self.config.get('dry_run'): return  # dry run: skip the actual copy
# Download from source
source_config = self.config['source']
if source_config['provider'] == 'aws':
response = self.s3_client.get_object(
Bucket=source_config['bucket'],
Key=source_obj['key']
)
data = response['Body'].read()
# Upload to target
if target_provider == 'azure':
blob_client = self.azure_client.get_blob_client(
container=target_config['container'],
blob=source_obj['key']
)
blob_client.upload_blob(data, overwrite=True)
elif target_provider == 'gcp':
bucket = self.gcp_client.bucket(target_config['bucket'])
blob = bucket.blob(source_obj['key'])
blob.upload_from_string(data)
def generate_sync_report(self, results: Dict[str, Any]) -> str:
"""Generate human-readable sync report"""
report_lines = [
"Multi-Cloud Data Sync Report",
"=" * 40,
f"Timestamp: {results['timestamp']}",
f"Total Objects Synced: {results['synced_objects']}",
f"Total Errors: {len(results['errors'])}",
""
]
for provider, provider_results in results['providers'].items():
report_lines.extend([
f"{provider.upper()} Results:",
f" Synced: {provider_results['synced_count']}",
f" Skipped: {provider_results['skipped_count']}",
f" Errors: {len(provider_results['errors'])}",
""
])
if results['errors']:
report_lines.extend(["Errors:", ""])
for error in results['errors'][:10]: # Show first 10 errors
report_lines.append(f" {error['object']}: {error['error']}")
return "\n".join(report_lines)
async def main():
import argparse
parser = argparse.ArgumentParser(description='Multi-Cloud Data Sync')
parser.add_argument('--config', required=True, help='Sync configuration file')
parser.add_argument('--dry-run', action='store_true', help='Show what would be synced')
args = parser.parse_args()
with open(args.config, 'r') as f:
config = json.load(f)
config['dry_run'] = args.dry_run
sync_manager = MultiCloudDataSync(config)
if args.dry_run:
print("DRY RUN - no data will be copied; counts show what would sync")
results = await sync_manager.sync_all_data()
report = sync_manager.generate_sync_report(results)
print(report)
if __name__ == "__main__":
asyncio.run(main())
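A common follow-up is running the sync on a schedule. If the logic lives in the cross-cloud replicator Lambda from earlier, an EventBridge rule can trigger it on an interval; a sketch, again assuming that Lambda is addressed as aws_lambda_function.replicator:
resource "aws_cloudwatch_event_rule" "sync_schedule" {
  name                = "cross-cloud-sync-schedule"
  schedule_expression = "rate(1 hour)"
}

resource "aws_cloudwatch_event_target" "sync_lambda" {
  rule = aws_cloudwatch_event_rule.sync_schedule.name
  arn  = aws_lambda_function.replicator.arn
}

resource "aws_lambda_permission" "allow_eventbridge_invoke" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.replicator.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.sync_schedule.arn
}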
Disaster Recovery Automation
Implement automated disaster recovery across clouds:
#!/bin/bash
# scripts/disaster-recovery.sh
set -e
DR_CONFIG_FILE=${1:-"dr-config.json"}
RECOVERY_TYPE=${2:-"full"} # full, partial, test
execute_disaster_recovery() {
echo "🚨 Executing disaster recovery: $RECOVERY_TYPE"
# Load DR configuration
if [ ! -f "$DR_CONFIG_FILE" ]; then
echo "❌ DR configuration file not found: $DR_CONFIG_FILE"
exit 1
fi
PRIMARY_PROVIDER=$(jq -r '.primary_provider' "$DR_CONFIG_FILE")
DR_PROVIDER=$(jq -r '.dr_provider' "$DR_CONFIG_FILE")
echo "Primary: $PRIMARY_PROVIDER"
echo "DR Target: $DR_PROVIDER"
# Check primary provider health
if check_provider_health "$PRIMARY_PROVIDER"; then
echo "⚠️ Primary provider is healthy. Are you sure you want to proceed?"
read -p "Continue with DR? (y/N): " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
exit 1
fi
fi
# Execute recovery steps
case "$RECOVERY_TYPE" in
"full")
execute_full_recovery
;;
"partial")
execute_partial_recovery
;;
"test")
execute_test_recovery
;;
*)
echo "❌ Unknown recovery type: $RECOVERY_TYPE"
exit 1
;;
esac
}
check_provider_health() {
local provider=$1
case "$provider" in
"aws")
aws sts get-caller-identity >/dev/null 2>&1
;;
"azure")
az account show >/dev/null 2>&1
;;
"gcp")
[ -n "$(gcloud auth list --filter=status:ACTIVE --format="value(account)" 2>/dev/null | head -1)" ]
;;
esac
}
execute_full_recovery() {
echo "🔄 Executing full disaster recovery..."
# 1. Activate DR infrastructure
activate_dr_infrastructure
# 2. Restore data from backups
restore_data_from_backups
# 3. Update DNS to point to DR site
update_dns_to_dr
# 4. Validate recovery
validate_recovery
echo "✅ Full disaster recovery completed"
}
activate_dr_infrastructure() {
echo "Activating DR infrastructure..."
DR_TERRAFORM_DIR=$(jq -r '.dr_terraform_dir' "$DR_CONFIG_FILE")
cd "$DR_TERRAFORM_DIR"
# Initialize and apply DR infrastructure
terraform init -input=false
terraform plan -input=false -out=dr.tfplan -var="dr_mode=active"
terraform apply -input=false dr.tfplan
# Wait for infrastructure to be ready
sleep 60
}
restore_data_from_backups() {
echo "Restoring data from backups..."
BACKUP_LOCATIONS=$(jq -r '.backup_locations[]' "$DR_CONFIG_FILE")
for backup_location in $BACKUP_LOCATIONS; do
echo "Restoring from: $backup_location"
# This would call provider-specific restore scripts
case "$DR_PROVIDER" in
"aws")
restore_from_s3_backup "$backup_location"
;;
"azure")
restore_from_azure_backup "$backup_location"
;;
"gcp")
restore_from_gcs_backup "$backup_location"
;;
esac
done
}
update_dns_to_dr() {
echo "Updating DNS to point to DR site..."
DR_ENDPOINT=$(jq -r '.dr_endpoint' "$DR_CONFIG_FILE")
DNS_ZONE=$(jq -r '.dns_zone' "$DR_CONFIG_FILE")
# Update DNS record to point to DR endpoint
aws route53 change-resource-record-sets \
--hosted-zone-id "$DNS_ZONE" \
--change-batch "{
\"Changes\": [{
\"Action\": \"UPSERT\",
\"ResourceRecordSet\": {
\"Name\": \"$(jq -r '.primary_domain' "$DR_CONFIG_FILE")\",
\"Type\": \"A\",
\"TTL\": 60,
\"ResourceRecords\": [{\"Value\": \"$DR_ENDPOINT\"}]
}
}]
}"
}
validate_recovery() {
echo "Validating disaster recovery..."
HEALTH_CHECK_URL=$(jq -r '.health_check_url' "$DR_CONFIG_FILE")
# Wait for application to be healthy
for i in {1..30}; do
if curl -f "$HEALTH_CHECK_URL" >/dev/null 2>&1; then
echo "✅ Application is healthy"
return 0
fi
echo "Waiting for application to be healthy... ($i/30)"
sleep 10
done
echo "❌ Application health check failed"
return 1
}
# Execute based on parameters
case "${3:-execute}" in
"execute")
execute_disaster_recovery
;;
"test")
echo "🧪 Testing disaster recovery procedures..."
RECOVERY_TYPE="test"
execute_disaster_recovery
;;
*)
echo "Usage: $0 <dr_config_file> <recovery_type> [execute|test]"
echo ""
echo "Recovery types: full, partial, test"
exit 1
;;
esac
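The script flips DNS by hand. If you want Route 53 to fail over automatically when the primary health check fails, failover routing records can complement the manual runbook. A minimal sketch; var.dns_zone_id, var.primary_domain, var.primary_endpoint, and var.dr_endpoint are assumed variables, not defined above:
resource "aws_route53_health_check" "primary" {
  fqdn              = var.primary_domain
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id         = var.dns_zone_id
  name            = var.primary_domain
  type            = "A"
  ttl             = 60
  records         = [var.primary_endpoint]
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

resource "aws_route53_record" "dr" {
  zone_id        = var.dns_zone_id
  name           = var.primary_domain
  type           = "A"
  ttl            = 60
  records        = [var.dr_endpoint]
  set_identifier = "dr"

  failover_routing_policy {
    type = "SECONDARY"
  }
}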
What’s Next
Data and storage strategies provide the foundation for reliable multi-cloud operations, but monitoring and observability across multiple providers require a unified approach. In the next part, we’ll explore how to implement comprehensive monitoring and observability that gives you visibility into your entire multi-cloud infrastructure from a single pane of glass.
Monitoring and Observability
Monitoring multi-cloud infrastructure requires aggregating metrics, logs, and traces from different providers into unified dashboards and alerting systems. Each cloud provider has native monitoring services, but you need centralized observability to understand your entire system’s health and performance.
Unified Metrics Collection
Set up centralized metrics collection from all cloud providers:
# Prometheus deployment for centralized metrics
resource "kubernetes_namespace" "monitoring" {
metadata {
name = "monitoring"
}
}
resource "helm_release" "prometheus" {
name = "prometheus"
repository = "https://prometheus-community.github.io/helm-charts"
chart = "kube-prometheus-stack"
namespace = kubernetes_namespace.monitoring.metadata[0].name
values = [
yamlencode({
prometheus = {
prometheusSpec = {
retention = "30d"
storageSpec = {
volumeClaimTemplate = {
spec = {
storageClassName = "fast-ssd"
accessModes = ["ReadWriteOnce"]
resources = {
requests = {
storage = "100Gi"
}
}
}
}
}
additionalScrapeConfigs = [
{
job_name = "aws-cloudwatch"
static_configs = [{
targets = ["cloudwatch-exporter:9106"]
}]
},
{
job_name = "azure-monitor"
static_configs = [{
targets = ["azure-exporter:9107"]
}]
},
{
job_name = "gcp-monitoring"
static_configs = [{
targets = ["gcp-exporter:9108"]
}]
}
]
}
}
grafana = {
adminPassword = var.grafana_admin_password
persistence = {
enabled = true
size = "10Gi"
}
}
})
]
}
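The cross-cloud latency panel later in this part assumes a blackbox exporter probing endpoints in each cloud under the job name blackbox. Deploying one alongside Prometheus is one more Helm release; a sketch (the probe targets themselves would be added as another scrape config):
resource "helm_release" "blackbox_exporter" {
  name       = "blackbox-exporter"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "prometheus-blackbox-exporter"
  namespace  = kubernetes_namespace.monitoring.metadata[0].name
}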
# CloudWatch Exporter for AWS metrics
resource "kubernetes_deployment" "cloudwatch_exporter" {
metadata {
name = "cloudwatch-exporter"
namespace = kubernetes_namespace.monitoring.metadata[0].name
}
spec {
replicas = 1
selector {
match_labels = {
app = "cloudwatch-exporter"
}
}
template {
metadata {
labels = {
app = "cloudwatch-exporter"
}
}
spec {
container {
name = "cloudwatch-exporter"
image = "prom/cloudwatch-exporter:latest"
port {
container_port = 9106
}
env {
name = "AWS_REGION"
value = var.aws_region
}
volume_mount {
name = "config"
mount_path = "/config"
}
}
volume {
name = "config"
config_map {
name = kubernetes_config_map.cloudwatch_config.metadata[0].name
}
}
}
}
}
}
resource "kubernetes_config_map" "cloudwatch_config" {
metadata {
name = "cloudwatch-exporter-config"
namespace = kubernetes_namespace.monitoring.metadata[0].name
}
data = {
"config.yml" = yamlencode({
region = var.aws_region
metrics = [
{
aws_namespace = "AWS/EC2"
aws_metric_name = "CPUUtilization"
aws_dimensions = ["InstanceId"]
aws_statistics = ["Average"]
},
{
aws_namespace = "AWS/RDS"
aws_metric_name = "DatabaseConnections"
aws_dimensions = ["DBInstanceIdentifier"]
aws_statistics = ["Average"]
},
{
aws_namespace = "AWS/S3"
aws_metric_name = "BucketSizeBytes"
aws_dimensions = ["BucketName", "StorageType"]
aws_statistics = ["Average"]
}
]
})
}
}
# Azure Monitor Exporter
resource "kubernetes_deployment" "azure_exporter" {
metadata {
name = "azure-exporter"
namespace = kubernetes_namespace.monitoring.metadata[0].name
}
spec {
replicas = 1
selector {
match_labels = {
app = "azure-exporter"
}
}
template {
metadata {
labels = {
app = "azure-exporter"
}
}
spec {
container {
name = "azure-exporter"
image = "webdevops/azure-metrics-exporter:latest"
port {
container_port = 9107
}
env {
name = "AZURE_SUBSCRIPTION_ID"
value = var.azure_subscription_id
}
env {
name = "AZURE_CLIENT_ID"
value_from {
secret_key_ref {
name = kubernetes_secret.azure_credentials.metadata[0].name
key = "client_id"
}
}
}
env {
name = "AZURE_CLIENT_SECRET"
value_from {
secret_key_ref {
name = kubernetes_secret.azure_credentials.metadata[0].name
key = "client_secret"
}
}
}
}
}
}
}
}
# GCP Monitoring Exporter
resource "kubernetes_deployment" "gcp_exporter" {
metadata {
name = "gcp-exporter"
namespace = kubernetes_namespace.monitoring.metadata[0].name
}
spec {
replicas = 1
selector {
match_labels = {
app = "gcp-exporter"
}
}
template {
metadata {
labels = {
app = "gcp-exporter"
}
}
spec {
container {
name = "gcp-exporter"
image = "prometheuscommunity/stackdriver-exporter:latest"
port {
container_port = 9108
}
env {
name = "GOOGLE_APPLICATION_CREDENTIALS"
value = "/credentials/service-account.json"
}
env {
name = "STACKDRIVER_EXPORTER_GOOGLE_PROJECT_ID"
value = var.gcp_project_id
}
volume_mount {
name = "gcp-credentials"
mount_path = "/credentials"
}
}
volume {
name = "gcp-credentials"
secret {
secret_name = kubernetes_secret.gcp_credentials.metadata[0].name
}
}
}
}
}
}
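The Azure and GCP exporters above reference kubernetes_secret.azure_credentials and kubernetes_secret.gcp_credentials, which are not defined in this part. A minimal sketch, assuming var.azure_client_id, var.azure_client_secret, and var.gcp_credentials_file (a service account key file path) are defined:
resource "kubernetes_secret" "azure_credentials" {
  metadata {
    name      = "azure-credentials"
    namespace = kubernetes_namespace.monitoring.metadata[0].name
  }

  data = {
    client_id     = var.azure_client_id
    client_secret = var.azure_client_secret
  }
}

resource "kubernetes_secret" "gcp_credentials" {
  metadata {
    name      = "gcp-credentials"
    namespace = kubernetes_namespace.monitoring.metadata[0].name
  }

  data = {
    # assumes a service account key file is provided; the provider base64-encodes values
    "service-account.json" = file(var.gcp_credentials_file)
  }
}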
Cross-Cloud Alerting System
Implement unified alerting across all providers:
#!/usr/bin/env python3
# scripts/multi_cloud_alerting.py
import boto3
import json
import requests
from azure.monitor.query import LogsQueryClient
from azure.identity import DefaultAzureCredential
from google.cloud import monitoring_v3
from typing import Dict, List, Any
from datetime import datetime, timedelta
class MultiCloudAlertManager:
def __init__(self, config: Dict[str, Any]):
self.config = config
# Initialize cloud monitoring clients
self.aws_cloudwatch = boto3.client('cloudwatch')
self.azure_credential = DefaultAzureCredential()
self.azure_logs_client = LogsQueryClient(self.azure_credential)
self.gcp_monitoring = monitoring_v3.MetricServiceClient()
# Alert channels
self.slack_webhook = config.get('slack_webhook_url')
self.pagerduty_key = config.get('pagerduty_integration_key')
def check_all_providers(self) -> Dict[str, Any]:
"""Check health across all cloud providers"""
results = {
'timestamp': datetime.utcnow().isoformat(),
'overall_status': 'healthy',
'providers': {},
'alerts': []
}
# Check each provider
for provider_config in self.config['providers']:
provider_name = provider_config['name']
try:
if provider_name == 'aws':
provider_results = self._check_aws_health(provider_config)
elif provider_name == 'azure':
provider_results = self._check_azure_health(provider_config)
elif provider_name == 'gcp':
provider_results = self._check_gcp_health(provider_config)
else:
continue
results['providers'][provider_name] = provider_results
# Collect alerts
if provider_results['alerts']:
results['alerts'].extend(provider_results['alerts'])
results['overall_status'] = 'degraded'
except Exception as e:
results['providers'][provider_name] = {
'status': 'error',
'error': str(e),
'alerts': [{
'severity': 'critical',
'message': f"Failed to check {provider_name}: {str(e)}"
}]
}
results['alerts'].append({
'provider': provider_name,
'severity': 'critical',
'message': f"Monitoring failure: {str(e)}"
})
results['overall_status'] = 'critical'
return results
def _check_aws_health(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Check AWS resource health"""
results = {
'status': 'healthy',
'metrics': {},
'alerts': []
}
# Check EC2 instances
for instance_check in config.get('ec2_checks', []):
metric_data = self.aws_cloudwatch.get_metric_statistics(
Namespace='AWS/EC2',
MetricName='CPUUtilization',
Dimensions=[{'Name': 'InstanceId', 'Value': instance_check['instance_id']}],
StartTime=datetime.utcnow() - timedelta(minutes=10),
EndTime=datetime.utcnow(),
Period=300,
Statistics=['Average']
)
if metric_data['Datapoints']:
# take the most recent datapoint; CloudWatch does not guarantee ordering
cpu_usage = sorted(metric_data['Datapoints'], key=lambda d: d['Timestamp'])[-1]['Average']
results['metrics'][f"ec2_{instance_check['instance_id']}_cpu"] = cpu_usage
if cpu_usage > instance_check.get('cpu_threshold', 80):
results['alerts'].append({
'severity': 'warning',
'resource': instance_check['instance_id'],
'message': f"High CPU usage: {cpu_usage:.1f}%"
})
# Check RDS instances
for rds_check in config.get('rds_checks', []):
metric_data = self.aws_cloudwatch.get_metric_statistics(
Namespace='AWS/RDS',
MetricName='DatabaseConnections',
Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': rds_check['db_instance']}],
StartTime=datetime.utcnow() - timedelta(minutes=10),
EndTime=datetime.utcnow(),
Period=300,
Statistics=['Average']
)
if metric_data['Datapoints']:
connections = sorted(metric_data['Datapoints'], key=lambda d: d['Timestamp'])[-1]['Average']
results['metrics'][f"rds_{rds_check['db_instance']}_connections"] = connections
if connections > rds_check.get('connection_threshold', 80):
results['alerts'].append({
'severity': 'warning',
'resource': rds_check['db_instance'],
'message': f"High database connections: {connections}"
})
return results
def _check_azure_health(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Check Azure resource health"""
results = {
'status': 'healthy',
'metrics': {},
'alerts': []
}
# Check virtual machines
for vm_check in config.get('vm_checks', []):
query = f"""
Perf
| where TimeGenerated > ago(10m)
| where Computer == "{vm_check['vm_name']}"
| where CounterName == "% Processor Time"
| summarize avg(CounterValue) by bin(TimeGenerated, 5m)
| order by TimeGenerated desc
| limit 1
"""
try:
response = self.azure_logs_client.query_workspace(
workspace_id=config['workspace_id'],
query=query,
timespan=timedelta(minutes=10)
)
if response.tables and response.tables[0].rows:
cpu_usage = response.tables[0].rows[0][1]
results['metrics'][f"vm_{vm_check['vm_name']}_cpu"] = cpu_usage
if cpu_usage > vm_check.get('cpu_threshold', 80):
results['alerts'].append({
'severity': 'warning',
'resource': vm_check['vm_name'],
'message': f"High CPU usage: {cpu_usage:.1f}%"
})
except Exception as e:
results['alerts'].append({
'severity': 'error',
'resource': vm_check['vm_name'],
'message': f"Failed to query metrics: {str(e)}"
})
return results
def _check_gcp_health(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""Check GCP resource health"""
results = {
'status': 'healthy',
'metrics': {},
'alerts': []
}
project_name = f"projects/{config['project_id']}"
# Check Compute Engine instances
for instance_check in config.get('instance_checks', []):
interval = monitoring_v3.TimeInterval({
"end_time": {"seconds": int(datetime.utcnow().timestamp())},
"start_time": {"seconds": int((datetime.utcnow() - timedelta(minutes=10)).timestamp())},
})
request = monitoring_v3.ListTimeSeriesRequest({
"name": project_name,
"filter": f'metric.type="compute.googleapis.com/instance/cpu/utilization" AND resource.labels.instance_name="{instance_check["instance_name"]}"',
"interval": interval,
"view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
})
try:
page_result = self.gcp_monitoring.list_time_series(request=request)
for time_series in page_result:
if time_series.points:
cpu_usage = time_series.points[0].value.double_value * 100
results['metrics'][f"gce_{instance_check['instance_name']}_cpu"] = cpu_usage
if cpu_usage > instance_check.get('cpu_threshold', 80):
results['alerts'].append({
'severity': 'warning',
'resource': instance_check['instance_name'],
'message': f"High CPU usage: {cpu_usage:.1f}%"
})
except Exception as e:
results['alerts'].append({
'severity': 'error',
'resource': instance_check['instance_name'],
'message': f"Failed to query metrics: {str(e)}"
})
return results
def send_alerts(self, alerts: List[Dict[str, Any]]):
"""Send alerts to configured channels"""
if not alerts:
return
# Group alerts by severity
critical_alerts = [a for a in alerts if a.get('severity') == 'critical']
warning_alerts = [a for a in alerts if a.get('severity') == 'warning']
# Send to Slack
if self.slack_webhook:
self._send_slack_alert(critical_alerts, warning_alerts)
# Send to PagerDuty for critical alerts
if self.pagerduty_key and critical_alerts:
self._send_pagerduty_alert(critical_alerts)
def _send_slack_alert(self, critical_alerts: List, warning_alerts: List):
"""Send alert to Slack"""
color = "danger" if critical_alerts else "warning"
message = {
"attachments": [{
"color": color,
"title": "Multi-Cloud Infrastructure Alert",
"fields": []
}]
}
if critical_alerts:
message["attachments"][0]["fields"].append({
"title": f"Critical Alerts ({len(critical_alerts)})",
"value": "\n".join([f"• {alert['message']}" for alert in critical_alerts[:5]]),
"short": False
})
if warning_alerts:
message["attachments"][0]["fields"].append({
"title": f"Warning Alerts ({len(warning_alerts)})",
"value": "\n".join([f"• {alert['message']}" for alert in warning_alerts[:5]]),
"short": False
})
requests.post(self.slack_webhook, json=message, timeout=10)
def _send_pagerduty_alert(self, critical_alerts: List):
"""Send critical alert to PagerDuty"""
payload = {
"routing_key": self.pagerduty_key,
"event_action": "trigger",
"payload": {
"summary": f"Multi-Cloud Critical Alert: {len(critical_alerts)} issues detected",
"source": "multi-cloud-monitor",
"severity": "critical",
"custom_details": {
"alerts": critical_alerts
}
}
}
requests.post("https://events.pagerduty.com/v2/enqueue", json=payload)
def main():
import argparse
parser = argparse.ArgumentParser(description='Multi-Cloud Alert Manager')
parser.add_argument('--config', required=True, help='Configuration file')
parser.add_argument('--send-alerts', action='store_true', help='Send alerts to configured channels')
args = parser.parse_args()
with open(args.config, 'r') as f:
config = json.load(f)
alert_manager = MultiCloudAlertManager(config)
results = alert_manager.check_all_providers()
print(f"Overall Status: {results['overall_status']}")
print(f"Total Alerts: {len(results['alerts'])}")
for provider, provider_results in results['providers'].items():
print(f"\n{provider.upper()}:")
print(f" Status: {provider_results['status']}")
print(f" Alerts: {len(provider_results.get('alerts', []))}")
if args.send_alerts and results['alerts']:
alert_manager.send_alerts(results['alerts'])
print(f"\n📧 Sent {len(results['alerts'])} alerts")
if __name__ == "__main__":
main()
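One way to run the checker on a schedule is a Kubernetes CronJob in the same monitoring namespace. A sketch; the image name and config path are placeholders for wherever you package the script and its configuration:
resource "kubernetes_cron_job_v1" "multi_cloud_alerting" {
  metadata {
    name      = "multi-cloud-alerting"
    namespace = kubernetes_namespace.monitoring.metadata[0].name
  }

  spec {
    schedule = "*/5 * * * *"

    job_template {
      metadata {}

      spec {
        template {
          metadata {}

          spec {
            restart_policy = "Never"

            container {
              name  = "alert-check"
              image = "registry.example.com/multi-cloud-alerting:latest" # placeholder image containing the script
              args  = ["--config", "/config/alerting.json", "--send-alerts"]
            }
          }
        }
      }
    }
  }
}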
Unified Dashboard Creation
Create comprehensive dashboards showing all cloud providers:
#!/bin/bash
# scripts/setup-dashboards.sh
set -e
GRAFANA_URL=${1:-"http://localhost:3000"}
GRAFANA_USER=${2:-"admin"}
GRAFANA_PASSWORD=${3:-"admin"}
create_multi_cloud_dashboard() {
echo "Creating multi-cloud overview dashboard..."
cat > multi-cloud-dashboard.json << 'EOF'
{
"dashboard": {
"title": "Multi-Cloud Infrastructure Overview",
"tags": ["multi-cloud", "overview"],
"timezone": "browser",
"panels": [
{
"title": "AWS EC2 CPU Utilization",
"type": "stat",
"targets": [
{
"expr": "aws_ec2_cpuutilization_average",
"legendFormat": "{{instance_id}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"thresholds": {
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
},
"gridPos": {"h": 8, "w": 8, "x": 0, "y": 0}
},
{
"title": "Azure VM CPU Utilization",
"type": "stat",
"targets": [
{
"expr": "azure_vm_cpu_percent",
"legendFormat": "{{vm_name}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"thresholds": {
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
},
"gridPos": {"h": 8, "w": 8, "x": 8, "y": 0}
},
{
"title": "GCP Compute CPU Utilization",
"type": "stat",
"targets": [
{
"expr": "gcp_compute_instance_cpu_utilization",
"legendFormat": "{{instance_name}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"thresholds": {
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 70},
{"color": "red", "value": 90}
]
}
}
},
"gridPos": {"h": 8, "w": 8, "x": 16, "y": 0}
},
{
"title": "Cross-Cloud Network Latency",
"type": "graph",
"targets": [
{
"expr": "probe_duration_seconds{job=\"blackbox\"}",
"legendFormat": "{{instance}}"
}
],
"yAxes": [
{
"label": "Latency (seconds)",
"min": 0
}
],
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
},
{
"title": "Storage Usage by Provider",
"type": "piechart",
"targets": [
{
"expr": "aws_s3_bucket_size_bytes",
"legendFormat": "AWS S3"
},
{
"expr": "azure_storage_account_used_capacity",
"legendFormat": "Azure Storage"
},
{
"expr": "gcp_storage_bucket_size",
"legendFormat": "GCP Storage"
}
],
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16}
},
{
"title": "Database Connections",
"type": "graph",
"targets": [
{
"expr": "aws_rds_database_connections",
"legendFormat": "AWS RDS {{db_instance_identifier}}"
},
{
"expr": "azure_sql_connections",
"legendFormat": "Azure SQL {{server_name}}"
},
{
"expr": "gcp_cloudsql_connections",
"legendFormat": "GCP Cloud SQL {{database_id}}"
}
],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16}
}
],
"time": {
"from": "now-1h",
"to": "now"
},
"refresh": "30s"
}
}
EOF
# Import dashboard to Grafana
curl -X POST \
-H "Content-Type: application/json" \
-u "$GRAFANA_USER:$GRAFANA_PASSWORD" \
-d @multi-cloud-dashboard.json \
"$GRAFANA_URL/api/dashboards/db"
echo "✅ Multi-cloud dashboard created"
}
create_cost_dashboard() {
echo "Creating cost monitoring dashboard..."
cat > cost-dashboard.json << 'EOF'
{
"dashboard": {
"title": "Multi-Cloud Cost Analysis",
"tags": ["cost", "billing"],
"panels": [
{
"title": "Daily Costs by Provider",
"type": "graph",
"targets": [
{
"expr": "aws_billing_estimated_charges",
"legendFormat": "AWS"
},
{
"expr": "azure_consumption_cost",
"legendFormat": "Azure"
},
{
"expr": "gcp_billing_cost",
"legendFormat": "GCP"
}
],
"yAxes": [
{
"label": "Cost (USD)",
"min": 0
}
],
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 0}
},
{
"title": "Cost by Service Category",
"type": "table",
"targets": [
{
"expr": "sum by (service) (aws_billing_estimated_charges)",
"format": "table"
}
],
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}
},
{
"title": "Monthly Cost Trend",
"type": "graph",
"targets": [
{
"expr": "increase(aws_billing_estimated_charges[30d])",
"legendFormat": "AWS Monthly"
}
],
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
}
]
}
}
EOF
curl -X POST \
-H "Content-Type: application/json" \
-u "$GRAFANA_USER:$GRAFANA_PASSWORD" \
-d @cost-dashboard.json \
"$GRAFANA_URL/api/dashboards/db"
echo "✅ Cost dashboard created"
}
create_sla_dashboard() {
echo "Creating SLA monitoring dashboard..."
cat > sla-dashboard.json << 'EOF'
{
"dashboard": {
"title": "Multi-Cloud SLA Monitoring",
"tags": ["sla", "uptime"],
"panels": [
{
"title": "Service Uptime",
"type": "stat",
"targets": [
{
"expr": "avg_over_time(up{job=\"multi-cloud-services\"}[24h]) * 100",
"legendFormat": "{{service}}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 95,
"max": 100,
"thresholds": {
"steps": [
{"color": "red", "value": null},
{"color": "yellow", "value": 99},
{"color": "green", "value": 99.9}
]
}
}
},
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 0}
},
{
"title": "Response Time SLA",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, http_request_duration_seconds_bucket)",
"legendFormat": "95th percentile"
},
{
"expr": "histogram_quantile(0.99, http_request_duration_seconds_bucket)",
"legendFormat": "99th percentile"
}
],
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 8}
}
]
}
}
EOF
curl -X POST \
-H "Content-Type: application/json" \
-u "$GRAFANA_USER:$GRAFANA_PASSWORD" \
-d @sla-dashboard.json \
"$GRAFANA_URL/api/dashboards/db"
echo "✅ SLA dashboard created"
}
# Create all dashboards
create_multi_cloud_dashboard
create_cost_dashboard
create_sla_dashboard
# Cleanup temp files
rm -f multi-cloud-dashboard.json cost-dashboard.json sla-dashboard.json
echo "✅ All multi-cloud dashboards created successfully"
echo "Access them at: $GRAFANA_URL"
What’s Next
Unified monitoring and observability provide the visibility needed to operate multi-cloud infrastructure effectively. With comprehensive metrics, alerting, and dashboards in place, you can maintain high availability and performance across all your cloud providers.
In the final part of this guide, we’ll explore governance and cost management strategies that help you maintain control, compliance, and cost efficiency across your entire multi-cloud environment.
Governance and Cost Management
Managing governance and costs across multiple cloud providers presents unique challenges. Each provider has different pricing models, compliance frameworks, and management tools. Effective multi-cloud governance requires unified policies, consistent tagging strategies, and comprehensive cost monitoring that works across AWS, Azure, and Google Cloud.
This final part covers the patterns and practices for implementing governance and cost management in multi-cloud Terraform environments.
Unified Tagging Strategy
Implement consistent tagging across all cloud providers:
# Global tagging strategy
locals {
# Standard tags that work across all providers
standard_tags = {
Environment = var.environment
Project = var.project_name
Owner = var.team_name
CostCenter = var.cost_center
ManagedBy = "terraform"
CreatedDate = formatdate("YYYY-MM-DD", timestamp()) # timestamp() changes every run; consider ignore_changes on tags to avoid perpetual diffs
LastModified = formatdate("YYYY-MM-DD", timestamp())
}
# Provider-specific tag formats
aws_tags = local.standard_tags
azure_tags = local.standard_tags # Azure tags accept the same key/value format
gcp_labels = {
for k, v in local.standard_tags :
lower(replace(k, " ", "_")) => lower(replace(v, " ", "_"))
}
}
# AWS resources with standard tags
resource "aws_instance" "web" {
count = var.providers_config.aws_enabled ? var.instance_count : 0
ami = data.aws_ami.latest[0].id
instance_type = var.instance_type
tags = merge(local.aws_tags, {
Name = "${var.project_name}-web-${count.index + 1}"
Role = "webserver"
Provider = "aws"
})
}
# Azure resources with standard tags
resource "azurerm_virtual_machine" "web" {
count = var.providers_config.azure_enabled ? var.instance_count : 0
name = "${var.project_name}-web-${count.index + 1}"
location = var.azure_location
resource_group_name = azurerm_resource_group.main[0].name
vm_size = var.azure_vm_size
tags = merge(local.azure_tags, {
Role = "webserver"
Provider = "azure"
})
}
# GCP resources with standard labels
resource "google_compute_instance" "web" {
count = var.providers_config.gcp_enabled ? var.instance_count : 0
name = "${var.project_name}-web-${count.index + 1}"
machine_type = var.gcp_machine_type
zone = var.gcp_zone
labels = merge(local.gcp_labels, {
role = "webserver"
provider = "gcp"
})
}
Multi-Cloud Policy Framework
Implement consistent policies across providers:
# Policy configuration for all providers
variable "governance_policies" {
description = "Governance policies to apply across all providers"
type = object({
allowed_regions = object({
aws = list(string)
azure = list(string)
gcp = list(string)
})
allowed_instance_types = object({
aws = list(string)
azure = list(string)
gcp = list(string)
})
required_tags = list(string)
cost_limits = object({
monthly_budget = number
alert_threshold = number
})
})
default = {
allowed_regions = {
aws = ["us-west-2", "us-east-1", "eu-west-1"]
azure = ["West US 2", "East US", "West Europe"]
gcp = ["us-west1", "us-east1", "europe-west1"]
}
allowed_instance_types = {
aws = ["t3.micro", "t3.small", "t3.medium", "t3.large"]
azure = ["Standard_B1s", "Standard_B2s", "Standard_D2s_v3"]
gcp = ["e2-micro", "e2-small", "e2-medium", "e2-standard-2"]
}
required_tags = ["Environment", "Project", "Owner", "CostCenter"]
cost_limits = {
monthly_budget = 10000
alert_threshold = 80
}
}
}
# AWS policy validation
resource "aws_instance" "web" {
count = var.providers_config.aws_enabled ? var.instance_count : 0
ami = data.aws_ami.latest[0].id
instance_type = var.aws_instance_type
lifecycle {
precondition {
condition = contains(
var.governance_policies.allowed_instance_types.aws,
var.aws_instance_type
)
error_message = "Instance type ${var.aws_instance_type} is not allowed. Allowed types: ${join(", ", var.governance_policies.allowed_instance_types.aws)}"
}
postcondition {
condition = alltrue([
for tag in var.governance_policies.required_tags :
contains(keys(self.tags), tag)
])
error_message = "All required tags must be present: ${join(", ", var.governance_policies.required_tags)}"
}
}
tags = local.aws_tags
}
# Azure policy validation
resource "azurerm_virtual_machine" "web" {
count = var.providers_config.azure_enabled ? var.instance_count : 0
name = "${var.project_name}-web-${count.index + 1}"
location = var.azure_location
resource_group_name = azurerm_resource_group.main[0].name
vm_size = var.azure_vm_size
lifecycle {
precondition {
condition = contains(
var.governance_policies.allowed_regions.azure,
var.azure_location
)
error_message = "Region ${var.azure_location} is not allowed for Azure resources."
}
precondition {
condition = contains(
var.governance_policies.allowed_instance_types.azure,
var.azure_vm_size
)
error_message = "VM size ${var.azure_vm_size} is not allowed."
}
}
tags = local.azure_tags
}
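The same precondition pattern extends to GCP; a sketch mirroring the AWS and Azure checks above (boot disk and network blocks omitted, as in the earlier GCP example):
# GCP policy validation
resource "google_compute_instance" "web" {
  count        = var.providers_config.gcp_enabled ? var.instance_count : 0
  name         = "${var.project_name}-web-${count.index + 1}"
  machine_type = var.gcp_machine_type
  zone         = var.gcp_zone

  lifecycle {
    precondition {
      condition = contains(
        var.governance_policies.allowed_instance_types.gcp,
        var.gcp_machine_type
      )
      error_message = "Machine type ${var.gcp_machine_type} is not allowed."
    }
  }

  labels = local.gcp_labels
}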
Cost Monitoring and Budgets
Implement comprehensive cost monitoring across providers:
# AWS cost monitoring
resource "aws_budgets_budget" "monthly_aws" {
count = var.providers_config.aws_enabled ? 1 : 0
name = "${var.project_name}-aws-monthly-budget"
budget_type = "COST"
limit_amount = var.governance_policies.cost_limits.monthly_budget * 0.4 # 40% allocation to AWS
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "LinkedAccount"
values = [data.aws_caller_identity.current[0].account_id]
}
cost_filter {
name = "TagKeyValue"
values = [format("user:Project$%s", var.project_name)]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = var.governance_policies.cost_limits.alert_threshold
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.budget_notification_emails
}
}
# Azure cost monitoring
resource "azurerm_consumption_budget_resource_group" "monthly_azure" {
count = var.providers_config.azure_enabled ? 1 : 0
name = "${var.project_name}-azure-monthly-budget"
resource_group_id = azurerm_resource_group.main[0].id
amount = var.governance_policies.cost_limits.monthly_budget * 0.4 # 40% allocation to Azure
time_grain = "Monthly"
time_period {
start_date = formatdate("YYYY-MM-01'T'00:00:00Z", timestamp())
end_date = formatdate("YYYY-MM-01'T'00:00:00Z", timeadd(timestamp(), "8760h")) # 1 year
}
notification {
enabled = true
threshold = var.governance_policies.cost_limits.alert_threshold
operator = "GreaterThan"
threshold_type = "Actual"
contact_emails = var.budget_notification_emails
}
}
# GCP cost monitoring
resource "google_billing_budget" "monthly_gcp" {
count = var.providers_config.gcp_enabled ? 1 : 0
billing_account = var.gcp_billing_account
display_name = "${var.project_name}-gcp-monthly-budget"
budget_filter {
projects = ["projects/${var.gcp_project_id}"]
labels = {
project = var.project_name
}
}
amount {
specified_amount {
currency_code = "USD"
units = tostring(floor(var.governance_policies.cost_limits.monthly_budget * 0.2)) # 20% allocation to GCP
}
}
threshold_rules {
threshold_percent = var.governance_policies.cost_limits.alert_threshold / 100
spend_basis = "CURRENT_SPEND"
}
all_updates_rule {
monitoring_notification_channels = var.gcp_notification_channels
disable_default_iam_recipients = false
}
}
Unified Cost Reporting
Create unified cost reporting across all providers:
#!/usr/bin/env python3
# scripts/multi_cloud_cost_report.py
import boto3
import json
from datetime import datetime, timedelta
from google.cloud import billing_v1
from azure.identity import DefaultAzureCredential
from azure.mgmt.consumption import ConsumptionManagementClient
class MultiCloudCostReporter:
def __init__(self, config):
self.config = config
self.aws_client = boto3.client('ce', region_name='us-east-1') if config.get('aws_enabled') else None
self.azure_client = ConsumptionManagementClient(
DefaultAzureCredential(),
config.get('azure_subscription_id')
) if config.get('azure_enabled') else None
self.gcp_client = billing_v1.CloudBillingClient() if config.get('gcp_enabled') else None
def get_aws_costs(self, start_date, end_date):
"""Get AWS costs for the specified period"""
if not self.aws_client:
return {"provider": "aws", "total_cost": 0, "services": []}
try:
response = self.aws_client.get_cost_and_usage(
TimePeriod={
'Start': start_date.strftime('%Y-%m-%d'),
'End': end_date.strftime('%Y-%m-%d')
},
Granularity='MONTHLY',
Metrics=['BlendedCost'],
GroupBy=[
{'Type': 'DIMENSION', 'Key': 'SERVICE'},
]
)
total_cost = 0
services = []
for result in response['ResultsByTime']:
for group in result['Groups']:
service_name = group['Keys'][0]
cost = float(group['Metrics']['BlendedCost']['Amount'])
total_cost += cost
services.append({
'service': service_name,
'cost': cost
})
return {
"provider": "aws",
"total_cost": total_cost,
"services": services
}
except Exception as e:
print(f"Error getting AWS costs: {e}")
return {"provider": "aws", "total_cost": 0, "services": []}
def get_azure_costs(self, start_date, end_date):
"""Get Azure costs for the specified period"""
if not self.azure_client:
return {"provider": "azure", "total_cost": 0, "services": []}
try:
# Azure consumption API call would go here
# This is a simplified example
return {
"provider": "azure",
"total_cost": 0, # Placeholder
"services": []
}
except Exception as e:
print(f"Error getting Azure costs: {e}")
return {"provider": "azure", "total_cost": 0, "services": []}
def get_gcp_costs(self, start_date, end_date):
"""Get GCP costs for the specified period"""
if not self.gcp_client:
return {"provider": "gcp", "total_cost": 0, "services": []}
try:
# GCP billing API call would go here
# This is a simplified example
return {
"provider": "gcp",
"total_cost": 0, # Placeholder
"services": []
}
except Exception as e:
print(f"Error getting GCP costs: {e}")
return {"provider": "gcp", "total_cost": 0, "services": []}
def generate_unified_report(self, days_back=30):
"""Generate a unified cost report across all providers"""
end_date = datetime.now()
start_date = end_date - timedelta(days=days_back)
# Get costs from all providers
aws_costs = self.get_aws_costs(start_date, end_date)
azure_costs = self.get_azure_costs(start_date, end_date)
gcp_costs = self.get_gcp_costs(start_date, end_date)
# Combine results
total_cost = aws_costs['total_cost'] + azure_costs['total_cost'] + gcp_costs['total_cost']
report = {
"report_date": datetime.now().isoformat(),
"period": {
"start": start_date.isoformat(),
"end": end_date.isoformat()
},
"total_cost": total_cost,
"providers": [aws_costs, azure_costs, gcp_costs],
"cost_breakdown": {
"aws_percentage": (aws_costs['total_cost'] / total_cost * 100) if total_cost > 0 else 0,
"azure_percentage": (azure_costs['total_cost'] / total_cost * 100) if total_cost > 0 else 0,
"gcp_percentage": (gcp_costs['total_cost'] / total_cost * 100) if total_cost > 0 else 0
}
}
return report
def save_report(self, report, filename=None):
"""Save the cost report to a file"""
if not filename:
filename = f"multi_cloud_cost_report_{datetime.now().strftime('%Y%m%d')}.json"
with open(filename, 'w') as f:
json.dump(report, f, indent=2)
print(f"Cost report saved to {filename}")
return filename
# Usage example
if __name__ == "__main__":
config = {
'aws_enabled': True,
'azure_enabled': True,
'gcp_enabled': True,
'azure_subscription_id': 'your-subscription-id',
'gcp_project_id': 'your-project-id'
}
reporter = MultiCloudCostReporter(config)
report = reporter.generate_unified_report(30)
reporter.save_report(report)
print(f"Total multi-cloud cost: ${report['total_cost']:.2f}")
for provider in report['providers']:
print(f"{provider['provider'].upper()}: ${provider['total_cost']:.2f}")
Compliance and Audit Framework
Implement unified compliance monitoring:
# Compliance monitoring module
module "compliance_monitoring" {
source = "./modules/compliance-monitoring"
providers_config = var.providers_config
project_name = var.project_name
# Compliance requirements
compliance_frameworks = [
"SOC2",
"ISO27001",
"GDPR",
"HIPAA"
]
# Audit requirements
audit_config = {
log_retention_days = 2555 # 7 years
enable_encryption = true
enable_monitoring = true
}
# Notification settings
compliance_alerts = {
email_addresses = var.compliance_notification_emails
slack_webhook = var.compliance_slack_webhook
}
}
# AWS compliance resources
resource "aws_config_configuration_recorder" "compliance" {
count = var.providers_config.aws_enabled ? 1 : 0
name = "${var.project_name}-compliance-recorder"
role_arn = aws_iam_role.config_role[0].arn
recording_group {
all_supported = true
include_global_resource_types = true
}
}
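A recorder on its own does not record anything; it also needs a delivery channel and has to be switched on. A sketch, including a hypothetical S3 bucket for the Config snapshots:
resource "aws_s3_bucket" "config_logs" {
  count  = var.providers_config.aws_enabled ? 1 : 0
  bucket = "${var.project_name}-config-logs" # bucket names must be globally unique
}

resource "aws_config_delivery_channel" "compliance" {
  count          = var.providers_config.aws_enabled ? 1 : 0
  name           = "${var.project_name}-compliance-channel"
  s3_bucket_name = aws_s3_bucket.config_logs[0].bucket
  depends_on     = [aws_config_configuration_recorder.compliance]
}

resource "aws_config_configuration_recorder_status" "compliance" {
  count      = var.providers_config.aws_enabled ? 1 : 0
  name       = aws_config_configuration_recorder.compliance[0].name
  is_enabled = true
  depends_on = [aws_config_delivery_channel.compliance]
}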
resource "aws_config_config_rule" "required_tags" {
count = var.providers_config.aws_enabled ? 1 : 0
name = "${var.project_name}-required-tags"
source {
owner = "AWS"
source_identifier = "REQUIRED_TAGS"
}
input_parameters = jsonencode({
tag1Key = "Environment"
tag2Key = "Project"
tag3Key = "Owner"
tag4Key = "CostCenter"
})
depends_on = [aws_config_configuration_recorder.compliance]
}
# Azure compliance resources
resource "azurerm_policy_assignment" "required_tags" {
count = var.providers_config.azure_enabled ? 1 : 0
name = "${var.project_name}-required-tags"
scope = azurerm_resource_group.main[0].id
policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/1e30110a-5ceb-460c-a204-c1c3969c6d62"
parameters = jsonencode({
tagName = {
value = "Environment"
}
})
}
# GCP compliance resources: restrict where resources can be created
resource "google_project_organization_policy" "allowed_locations" {
count = var.providers_config.gcp_enabled ? 1 : 0
project = var.gcp_project_id
constraint = "constraints/gcp.resourceLocations"
list_policy {
allow {
values = var.governance_policies.allowed_regions.gcp
}
}
}
Automated Governance Enforcement
Implement automated policy enforcement:
# .github/workflows/governance-check.yml
name: Multi-Cloud Governance Check
on:
pull_request:
paths: ['infrastructure/**']
jobs:
governance-validation:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: 1.6.0
- name: Setup OPA
uses: open-policy-agent/setup-opa@v2
- name: Generate Terraform Plans
run: |
find infrastructure -name "*.tf" -exec dirname {} \; | sort -u | while read dir; do
cd "$dir"
terraform init -backend=false
terraform plan -out=plan.tfplan
terraform show -json plan.tfplan > plan.json
cd - > /dev/null
done
- name: Run Multi-Cloud Policy Checks
run: |
# Check AWS resources
find infrastructure -name "plan.json" | while read plan; do
echo "Checking AWS policies for $plan"
opa eval --fail-defined -d policies/aws/ -i "$plan" "data.aws.deny[x]"
done
# Check Azure resources
find infrastructure -name "plan.json" | while read plan; do
echo "Checking Azure policies for $plan"
opa eval --fail-defined -d policies/azure/ -i "$plan" "data.azure.deny[x]"
done
# Check GCP resources
find infrastructure -name "plan.json" | while read plan; do
echo "Checking GCP policies for $plan"
opa eval --fail-defined -d policies/gcp/ -i "$plan" "data.gcp.deny[x]"
done
- name: Cost Impact Analysis
run: |
python3 scripts/cost_impact_analysis.py \
--terraform-plans "infrastructure/*/plan.json" \
--budget-limit ${{ vars.MONTHLY_BUDGET_LIMIT }} \
--output cost-impact.json
- name: Generate Governance Report
run: |
python3 scripts/governance_report.py \
--terraform-plans "infrastructure/*/plan.json" \
--policies policies/ \
--output governance-report.md
- name: Comment on PR
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const report = fs.readFileSync('governance-report.md', 'utf8');
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `## Multi-Cloud Governance Report\n\n${report}`
});
Conclusion
Effective multi-cloud governance and cost management require unified policies, consistent monitoring, and automated enforcement across all cloud providers. The patterns covered in this guide provide a framework for implementing governance that scales across AWS, Azure, and Google Cloud while maintaining cost control and compliance requirements.
The key to successful multi-cloud governance is treating it as a unified system rather than managing each provider separately. Consistent tagging, unified cost reporting, and automated policy enforcement ensure that your multi-cloud infrastructure remains manageable, compliant, and cost-effective as it scales.
Remember that governance is an ongoing process that requires regular review and adjustment as your multi-cloud architecture evolves and as cloud providers introduce new services and pricing models.