Team Collaboration

Terraform collaboration goes beyond sharing code repositories. When multiple people need to modify shared infrastructure, you’re dealing with coordination challenges that don’t exist in application development. State conflicts, permission boundaries, and deployment coordination become critical concerns that can make or break your team’s productivity.

Successful Terraform collaboration requires processes, conventions, and technical patterns that prevent conflicts while enabling teams to move quickly. The approaches in this part address the organizational and technical challenges that emerge when infrastructure management scales beyond individual contributors.

Git Workflows for Infrastructure

Infrastructure code needs the same discipline as application code, but with higher stakes. A bug in application code might affect users; a bug in infrastructure code can take down entire systems.

Branch protection and code review:

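# .github/branch-protection.yml
# Illustrative only: GitHub does not read this file natively. Apply the same
# settings through the repository settings UI, the REST API, or a settings-sync tool.
protection_rules: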
  main:
    required_status_checks:
      - terraform-plan
      - terraform-validate
      - security-scan
    required_pull_request_reviews:
      required_approving_review_count: 2
      dismiss_stale_reviews: true
      require_code_owner_reviews: true
    restrictions:
      users: []
      teams: ["infrastructure-team"]

CODEOWNERS for infrastructure:

# CODEOWNERS
# Global infrastructure requires platform team approval
/infrastructure/global/           @platform-team
/modules/                        @platform-team

# Environment-specific changes
/environments/production/        @platform-team @security-team
/environments/staging/           @platform-team
/environments/development/       @development-team

# Application-specific infrastructure
/applications/web-app/           @web-team
/applications/api/               @backend-team

Conventional commits for infrastructure:

feat(vpc): add support for IPv6 dual-stack
fix(rds): correct backup retention period
docs(modules): update VPC module documentation
refactor(security): consolidate security group rules
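
A convention only helps if something checks it. As a minimal sketch, the function below tests a commit subject against the type(scope): description shape used above. The name `is_conventional_commit` and the list of allowed types are assumptions for illustration, not part of any standard tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the commit subject in $1 follows the
# type(scope): description pattern. The allowed types are an assumed list.
is_conventional_commit() {
  printf '%s\n' "$1" |
    grep -Eq '^(feat|fix|docs|refactor|test|chore)\([a-z0-9-]+\): .+'
}

# Example: report on two subjects
for subject in "fix(rds): correct backup retention period" "updated stuff"; do
  if is_conventional_commit "$subject"; then
    echo "ok: $subject"
  else
    echo "rejected: $subject"
  fi
done
```

In a Git commit-msg hook, the message file path arrives as the first argument, so the call would look like `is_conventional_commit "$(head -n 1 "$1")"`.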

CI/CD Pipeline Design

Terraform CI/CD pipelines have to handle concerns that most application pipelines do not: state locking, plan review, and safe deployment.

GitHub Actions workflow:

name: Terraform CI/CD
on:
  pull_request:
    paths: ['infrastructure/**']
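  push:
    branches: [main]
    paths: ['infrastructure/**']

# OIDC role assumption needs id-token; the plan comment needs pull-requests
permissions:
  id-token: write
  contents: read
  pull-requests: write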

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.6.0
      
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      
      - name: Terraform Validate
        run: |
          cd infrastructure
          terraform init -backend=false
          terraform validate
      
      - name: Security Scan
        uses: bridgecrewio/checkov-action@master
        with:
          directory: infrastructure/
          framework: terraform

  plan:
    needs: validate
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.6.0
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-west-2
      
      - name: Terraform Plan
        run: |
          cd infrastructure/staging
          terraform init
          terraform plan -out=tfplan
          terraform show -no-color tfplan > plan.txt
      
      - name: Comment Plan
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('infrastructure/staging/plan.txt', 'utf8');
            const body = `## Terraform Plan\n\`\`\`\n${plan}\n\`\`\``;
            
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });

  apply:
    needs: validate
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.6.0
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-west-2
      
      - name: Terraform Apply
        run: |
          cd infrastructure/production
          terraform init
          terraform apply -auto-approve

GitLab CI pipeline:

stages:
  - validate
  - plan
  - apply

variables:
  TF_ROOT: infrastructure
  TF_VERSION: 1.6.0

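.terraform_base:
  image:
    name: hashicorp/terraform:$TF_VERSION
    # The image's default entrypoint is the terraform binary; reset it so
    # GitLab can run the job's shell script.
    entrypoint: [""]
  before_script:
    - cd $TF_ROOT
    - terraform init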

validate:
  extends: .terraform_base
  stage: validate
  script:
    - terraform fmt -check -recursive
    - terraform validate
  rules:
    - changes:
      - infrastructure/**/*

plan:
  extends: .terraform_base
  stage: plan
  script:
    - terraform plan -out=tfplan
    - terraform show -no-color tfplan
  artifacts:
    paths:
      - $TF_ROOT/tfplan
    expire_in: 1 week
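  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    # Also plan on main so the apply job has a tfplan artifact to consume
    - if: $CI_COMMIT_BRANCH == "main"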

apply:
  extends: .terraform_base
  stage: apply
  script:
    - terraform apply -auto-approve tfplan
  dependencies:
    - plan
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
  environment:
    name: production

State Locking and Coordination

When several people or pipelines share a state file, a lock keeps two runs from writing to it at the same time:

DynamoDB locking configuration:

resource "aws_dynamodb_table" "terraform_locks" {
  name           = "terraform-locks"
  billing_mode   = "PAY_PER_REQUEST"
  hash_key       = "LockID"
  
  attribute {
    name = "LockID"
    type = "S"
  }
  
  tags = {
    Name = "Terraform State Locks"
  }
}

# Use in backend configuration
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Handling stuck locks:

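# Release a lock that a crashed or cancelled run left behind
# (the lock ID is printed in the "Error acquiring the state lock" message)
terraform force-unlock <LOCK_ID>

# Or use AWS CLI to inspect DynamoDB
aws dynamodb scan --table-name terraform-locks

# Remove stuck locks (use carefully!)
# The lock item's LockID is <bucket>/<key>; the "-md5" item holds the state
# checksum and should be left in place.
aws dynamodb delete-item \
  --table-name terraform-locks \
  --key '{"LockID":{"S":"my-terraform-state/infrastructure/terraform.tfstate"}}'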
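
With the S3 backend, the lock item's LockID is the bucket name and the state object key joined by a slash. A separate item with a "-md5" suffix stores the state checksum. The sketch below builds the key JSON for the lock item so it does not have to be typed by hand. The function name `lock_item_key` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the DynamoDB key JSON for the lock item of a state
# stored at s3://$1/$2. The lock item carries no "-md5" suffix.
lock_item_key() {
  printf '{"LockID":{"S":"%s/%s"}}' "$1" "$2"
}

# Print the key for the backend configured above
lock_item_key "my-terraform-state" "infrastructure/terraform.tfstate"
echo
```

Passing the result to `aws dynamodb get-item --table-name terraform-locks --key "$(lock_item_key ...)"` lets you inspect the lock item before deciding to delete it.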

Environment Promotion Strategies

Teams need reliable ways to promote changes through environments:

Gitflow with environment branches:

main (production)
├── staging
├── development
└── feature/new-vpc-config

Directory-based environments:

infrastructure/
├── modules/
│   ├── vpc/
│   ├── database/
│   └── application/
├── environments/
│   ├── development/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.hcl
│   ├── staging/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.hcl
│   └── production/
│       ├── main.tf
│       ├── terraform.tfvars
│       └── backend.hcl
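
Each environment directory carries its own backend.hcl, which `terraform init` can load through the `-backend-config` option. The sketch below wraps that call. It assumes it is run from the infrastructure/ directory and that each main.tf declares a backend block whose settings come from backend.hcl. The function name `init_environment` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: initialise one environment with its own backend settings.
# Assumes it is run from the infrastructure/ directory shown above and that each
# main.tf declares a backend block whose settings are supplied by backend.hcl.
init_environment() {
  # The subshell keeps the caller's working directory unchanged.
  ( cd "environments/$1" && terraform init -backend-config=backend.hcl )
}
```

For example, `init_environment staging` initialises only the staging directory, so each environment keeps a separate state.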

Automated promotion pipeline:

name: Environment Promotion
on:
  workflow_dispatch:
    inputs:
      source_env:
        description: 'Source environment'
        required: true
        type: choice
        options: ['development', 'staging']
      target_env:
        description: 'Target environment'
        required: true
        type: choice
        options: ['staging', 'production']

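jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Validate Promotion
        env:
          SOURCE_ENV: ${{ inputs.source_env }}
          TARGET_ENV: ${{ inputs.target_env }}
        run: |
          # Only allow one step forward: development -> staging -> production
          case "$SOURCE_ENV->$TARGET_ENV" in
            "development->staging"|"staging->production") ;;
            *)
              echo "Unsupported promotion: $SOURCE_ENV -> $TARGET_ENV"
              exit 1
              ;;
          esac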
      
      - name: Copy Configuration
        run: |
          # Copy module versions and configuration
          cp environments/${{ inputs.source_env }}/versions.tf \
             environments/${{ inputs.target_env }}/versions.tf
          
          # Update environment-specific variables
          sed -i 's/${{ inputs.source_env }}/${{ inputs.target_env }}/g' \
            environments/${{ inputs.target_env }}/terraform.tfvars
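
The ordering rule behind the validation step can also be written against an ordered list of environments, which keeps working if more environments are added. This is a minimal sketch. `PROMOTION_PATH`, `next_environment`, and `can_promote` are hypothetical names.

```shell
#!/bin/sh
# Ordered promotion path (assumed); names match the environment directories.
PROMOTION_PATH="development staging production"

# Print the environment that directly follows $1.
# Fails if $1 is the last environment or is not in the path.
next_environment() {
  found=0
  for env in $PROMOTION_PATH; do
    if [ "$found" -eq 1 ]; then
      printf '%s\n' "$env"
      return 0
    fi
    if [ "$env" = "$1" ]; then
      found=1
    fi
  done
  return 1
}

# Succeeds only when $2 is the immediate successor of $1.
can_promote() {
  [ -n "$2" ] && [ "$(next_environment "$1")" = "$2" ]
}
```

A pipeline step could then call `can_promote "$SOURCE_ENV" "$TARGET_ENV" || exit 1` before copying any configuration.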

Code Organization Patterns

Large teams need consistent code organization:

Monorepo structure:

terraform-infrastructure/
├── modules/
│   ├── networking/
│   │   ├── vpc/
│   │   ├── subnets/
│   │   └── security-groups/
│   ├── compute/
│   │   ├── ec2/
│   │   ├── ecs/
│   │   └── lambda/
│   └── data/
│       ├── rds/
│       ├── s3/
│       └── dynamodb/
├── environments/
│   ├── shared/
│   │   ├── dns/
│   │   ├── iam/
│   │   └── monitoring/
│   ├── development/
│   ├── staging/
│   └── production/
├── applications/
│   ├── web-app/
│   ├── api-service/
│   └── data-pipeline/
└── tools/
    ├── scripts/
    ├── policies/
    └── templates/

Multi-repo structure for team autonomy:

platform-infrastructure/     # Shared infrastructure
├── networking/
├── security/
└── monitoring/

web-team-infrastructure/      # Team-specific infrastructure
├── applications/
├── databases/
└── environments/

data-team-infrastructure/     # Another team's infrastructure
├── pipelines/
├── storage/
└── analytics/

Access Control and Permissions

Teams need different levels of access to different parts of the infrastructure:

Role-based access control:

# Platform team - full access
data "aws_iam_policy_document" "platform_team" {
  statement {
    effect = "Allow"
    actions = ["*"]
    resources = ["*"]
  }
}

# Development team - limited to dev environment
data "aws_iam_policy_document" "dev_team" {
  statement {
    effect = "Allow"
    actions = [
      "ec2:*",
      "rds:*",
      "s3:*"
    ]
    resources = ["*"]
    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = ["us-west-2"]
    }
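    condition {
      # aws:ResourceTag is single-valued, so use a plain operator; a
      # ForAllValues qualifier also matches requests where the tag is absent.
      test     = "StringLike"
      variable = "aws:ResourceTag/Environment"
      values   = ["development", "dev-*"]
    }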
  }
}

# Read-only access for security team
data "aws_iam_policy_document" "security_team" {
  statement {
    effect = "Allow"
    actions = [
      "ec2:Describe*",
      "rds:Describe*",
      "s3:List*",
      "s3:Get*"
    ]
    resources = ["*"]
  }
}

Environment-specific CI/CD roles:

resource "aws_iam_role" "terraform_ci" {
  for_each = toset(["development", "staging", "production"])
  
  name = "terraform-ci-${each.key}"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRoleWithWebIdentity"
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.github.arn
        }
        Condition = {
          StringEquals = {
            "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
            "token.actions.githubusercontent.com:sub" = "repo:myorg/infrastructure:environment:${each.key}"
          }
        }
      }
    ]
  })
}

Collaboration Tools and Practices

Terraform Cloud/Enterprise for team collaboration:

terraform {
  cloud {
    organization = "my-company"
    
    workspaces {
      name = "production-infrastructure"
    }
  }
}

Atlantis for pull request automation:

# atlantis.yaml
version: 3
projects:
  - name: production
    dir: environments/production
    workspace: production
    autoplan:
      when_modified: ["*.tf", "*.tfvars"]
    apply_requirements: ["approved", "mergeable"]
    
  - name: staging
    dir: environments/staging
    workspace: staging
    autoplan:
      when_modified: ["*.tf", "*.tfvars"]

Documentation as code:

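# Generate documentation automatically
# fileset() matches files, not directories, so derive each module name from
# the variables.yaml file it contains.
resource "local_file" "module_docs" {
  for_each = toset([
    for f in fileset("${path.module}/modules", "*/variables.yaml") : dirname(f)
  ])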
  
  content = templatefile("${path.module}/templates/module-doc.md.tpl", {
    module_name = each.key
    variables   = yamldecode(file("${path.module}/modules/${each.key}/variables.yaml"))
    outputs     = yamldecode(file("${path.module}/modules/${each.key}/outputs.yaml"))
  })
  
  filename = "${path.module}/docs/modules/${each.key}.md"
}

Conflict Resolution and Recovery

When things go wrong in team environments:

State file recovery:

# Backup current state before recovery
terraform state pull > backup-$(date +%Y%m%d-%H%M%S).tfstate

# Import resources that exist but aren't in state
terraform import aws_instance.web i-1234567890abcdef0

# Remove resources from state that no longer exist
terraform state rm aws_instance.old_server

# Move a resource to a new address, such as into a module
terraform state mv aws_instance.web module.web.aws_instance.server
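
Because `state rm` and `state mv` rewrite state directly, it helps to make the backup step hard to skip. The wrapper below is a sketch of that idea. The name `with_state_backup` is hypothetical, and it assumes `terraform` is on the PATH and the working directory is already initialised.

```shell
#!/bin/sh
# Hypothetical wrapper: snapshot the remote state, then run the given command.
# Refuses to continue if the snapshot could not be written or is empty.
with_state_backup() {
  backup="backup-$(date +%Y%m%d-%H%M%S).tfstate"
  if ! terraform state pull > "$backup" || [ ! -s "$backup" ]; then
    echo "state backup failed; not running: $*" >&2
    return 1
  fi
  echo "state saved to $backup" >&2
  "$@"
}
```

For example, `with_state_backup terraform state rm aws_instance.old_server` writes a timestamped snapshot first and runs the removal only if the snapshot succeeded.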

Merge conflict resolution:

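# State belongs in the remote backend, not in Git. If a committed
# terraform.tfstate conflicts, keep a copy of the remote state, restore the
# committed file, then reconcile against real infrastructure.
terraform state pull > current-state.tfstate
git checkout HEAD -- terraform.tfstate
terraform plan -refresh-only  # Review drift (replaces the deprecated `terraform refresh`)
terraform plan  # Review differences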

What’s Coming Next

Team collaboration patterns are essential for scaling Terraform beyond individual use. The workflows, access controls, and organizational practices we’ve covered enable multiple teams to work together safely and efficiently while maintaining the reliability and security that production infrastructure requires.

In the final part, we’ll explore scaling and optimization—how to handle very large Terraform configurations, multi-cloud scenarios, performance optimization, and the enterprise patterns that support infrastructure management at massive scale.