Team Collaboration
Terraform collaboration goes beyond sharing code repositories. When multiple people modify shared infrastructure, you face coordination challenges that rarely arise in application development. State conflicts, permission boundaries, and deployment coordination become critical concerns that directly affect your team’s productivity.
Successful Terraform collaboration requires processes, conventions, and technical patterns that prevent conflicts while enabling teams to move quickly. The approaches in this part address the organizational and technical challenges that emerge when infrastructure management scales beyond individual contributors.
Git Workflows for Infrastructure
Infrastructure code needs the same discipline as application code, but with higher stakes. A bug in application code might affect users; a bug in infrastructure code can take down entire systems.
Branch protection and code review:
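# .github/branch-protection.yml
# Illustrative rule set: GitHub does not read this file natively. Apply the same
# settings through the repository settings, the REST API, or a settings-sync tool.
protection_rules:
  main:
    required_status_checks:
      - terraform-plan
      - terraform-validate
      - security-scan
    required_pull_request_reviews:
      required_approving_review_count: 2
      dismiss_stale_reviews: true
      require_code_owner_reviews: true
    restrictions:
      users: []
      teams: ["infrastructure-team"]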
CODEOWNERS for infrastructure:
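# CODEOWNERS
# Team owners must be written as @org/team-name; "myorg" is a placeholder organization.
# When several patterns match a path, the last matching pattern wins.

# Global infrastructure requires platform team approval
/infrastructure/global/ @myorg/platform-team
/modules/ @myorg/platform-team

# Environment-specific changes
/environments/production/ @myorg/platform-team @myorg/security-team
/environments/staging/ @myorg/platform-team
/environments/development/ @myorg/development-team

# Application-specific infrastructure
/applications/web-app/ @myorg/web-team
/applications/api/ @myorg/backend-team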
Conventional commits for infrastructure:
feat(vpc): add support for IPv6 dual-stack
fix(rds): correct backup retention period
docs(modules): update VPC module documentation
refactor(security): consolidate security group rules
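A commit-msg hook can enforce this format locally. The following is a minimal sketch rather than a definitive implementation: the function name `is_conventional_commit` and the accepted type list are assumptions chosen to match the examples above, so adjust them to your team's convention.

```sh
# Sketch of a commit message check (hypothetical helper, not part of Git or Terraform).
# Accepts "type(scope): description", where type comes from an assumed list.
is_conventional_commit() {
  # Only the first line (the subject) is checked.
  printf '%s\n' "$1" | head -n 1 |
    grep -Eq '^(feat|fix|docs|refactor|chore|test)\([a-z0-9-]+\): .+'
}

# Example wiring in .git/hooks/commit-msg, where $1 is the path to the message file:
#   is_conventional_commit "$(cat "$1")" || { echo "Invalid commit message" >&2; exit 1; }
```

Requiring a scope, as this sketch does, makes it easy to see which part of the infrastructure a commit touches; relax the pattern if your team treats the scope as optional.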
CI/CD Pipeline Design
Terraform CI/CD pipelines have to handle concerns specific to infrastructure management, such as state locking, plan review, and safe deployment:
GitHub Actions workflow:
name: Terraform CI/CD
on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']
# OIDC role assumption needs id-token; posting the plan comment needs pull-requests
permissions:
  id-token: write
  contents: read
  pull-requests: write
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.6.0
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Validate
        run: |
          cd infrastructure
          terraform init -backend=false
          terraform validate
      - name: Security Scan
        uses: bridgecrewio/checkov-action@master
        with:
          directory: infrastructure/
          framework: terraform
  plan:
    if: github.event_name == 'pull_request'
    needs: validate
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.6.0
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-west-2
      - name: Terraform Plan
        run: |
          cd infrastructure/staging
          terraform init
          terraform plan -out=tfplan
          terraform show -no-color tfplan > plan.txt
      - name: Comment Plan
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('infrastructure/staging/plan.txt', 'utf8');
            const body = `## Terraform Plan\n\`\`\`\n${plan}\n\`\`\``;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });
  apply:
    if: github.ref == 'refs/heads/main'
    needs: validate
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.6.0
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-west-2
      - name: Terraform Apply
        run: |
          cd infrastructure/production
          terraform init
          terraform apply -auto-approve
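One way to let later pipeline steps react to a plan is `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when changes are pending. The following is a minimal sketch of interpreting that code; `plan_status` is a hypothetical helper name, not something Terraform provides.

```sh
# Map the exit code of `terraform plan -detailed-exitcode` to a label.
# 0 = no changes, 2 = changes present, anything else = error.
plan_status() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "changes" ;;
    *) echo "error" ;;
  esac
}

# Example use in a CI step (requires an initialized working directory).
# Capturing the code with "||" keeps the step alive in shells that run with -e:
#   code=0
#   terraform plan -detailed-exitcode -out=tfplan || code=$?
#   [ "$(plan_status "$code")" = "error" ] && exit 1
```

A pipeline could use the "no-changes" label to skip the plan comment and the apply step entirely.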
GitLab CI pipeline:
stages:
  - validate
  - plan
  - apply

variables:
  TF_ROOT: infrastructure
  TF_VERSION: "1.6.0"

.terraform_base:
  image:
    name: hashicorp/terraform:$TF_VERSION
    entrypoint: [""]  # the image's default entrypoint is the terraform binary
  before_script:
    - cd $TF_ROOT
    - terraform init

validate:
  extends: .terraform_base
  stage: validate
  script:
    - terraform fmt -check -recursive
    - terraform validate
  rules:
    - changes:
        - infrastructure/**/*

plan:
  extends: .terraform_base
  stage: plan
  script:
    - terraform plan -out=tfplan
    - terraform show -no-color tfplan
  artifacts:
    paths:
      - $TF_ROOT/tfplan
    expire_in: 1 week
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    # Also plan on main so the apply job has a plan artifact to consume
    - if: $CI_COMMIT_BRANCH == "main"

apply:
  extends: .terraform_base
  stage: apply
  script:
    # A saved plan is applied without an interactive prompt
    - terraform apply tfplan
  dependencies:
    - plan
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
  environment:
    name: production
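Because the plan is printed as plain text, a pipeline can also refuse to continue when a plan would destroy resources unexpectedly. The sketch below assumes the summary line format `Plan: X to add, Y to change, Z to destroy.` that Terraform prints with `-no-color`; `destroy_count` is a hypothetical helper, and the parsing should be re-checked against the Terraform version you run.

```sh
# Read plan text on stdin and print the number of resources to destroy.
# Prints 0 when no "Plan:" summary line is found (for example, "No changes.").
destroy_count() {
  count=$(sed -n 's/^Plan: .* \([0-9][0-9]*\) to destroy\.$/\1/p' | head -n 1)
  echo "${count:-0}"
}

# Example:
#   terraform show -no-color tfplan | destroy_count
```

For anything stricter than a simple guard, the machine-readable output of `terraform show -json` is a more robust input than the human-readable summary.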
State Locking and Coordination
Multiple team members need to coordinate access to shared state files:
DynamoDB locking configuration:
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Name = "Terraform State Locks"
  }
}

# Use in backend configuration
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
Handling stuck locks:
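# Inspect current lock entries in DynamoDB
aws dynamodb scan --table-name terraform-locks
# Release a stuck lock, using the lock ID printed in Terraform's lock error message
terraform force-unlock <LOCK_ID>
# Last resort: delete the lock item directly (use carefully!)
# The lock item's LockID is "<bucket>/<key>"; the "<bucket>/<key>-md5" item stores the state checksum, not the lock
aws dynamodb delete-item \
  --table-name terraform-locks \
  --key '{"LockID":{"S":"my-terraform-state/infrastructure/terraform.tfstate"}}'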
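Before force-unlocking, it is often safer to wait for the other run to finish. Terraform's `-lock-timeout` flag makes a command keep retrying to acquire the lock for a given duration, and a small wrapper can retry the whole command on top of that. This is a hedged sketch: `retry` is a hypothetical helper, and the attempt count and delays are placeholders.

```sh
# Run a command up to MAX_ATTEMPTS times, sleeping DELAY seconds between failures.
# Usage: retry MAX_ATTEMPTS DELAY command [args...]
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while :; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    # Give up once the attempt budget is used.
    [ "$retry_n" -ge "$retry_max" ] && return 1
    sleep "$retry_delay"
    retry_n=$((retry_n + 1))
  done
}

# Example: wait up to 2 minutes for the lock on each of 3 attempts.
#   retry 3 30 terraform plan -lock-timeout=2m -out=tfplan
```

Note that the wrapper retries on any failure, not only lock contention, so keep the attempt count small.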
Environment Promotion Strategies
Teams need reliable ways to promote changes through environments:
Gitflow with environment branches:
main (production)
├── staging
├── development
└── feature/new-vpc-config
Directory-based environments:
infrastructure/
├── modules/
│   ├── vpc/
│   ├── database/
│   └── application/
└── environments/
    ├── development/
    │   ├── main.tf
    │   ├── terraform.tfvars
    │   └── backend.hcl
    ├── staging/
    │   ├── main.tf
    │   ├── terraform.tfvars
    │   └── backend.hcl
    └── production/
        ├── main.tf
        ├── terraform.tfvars
        └── backend.hcl
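With one directory per environment, CI can limit plans to the environments a change actually touches. The following is a minimal sketch, assuming the layout above and a list of changed paths on stdin (for example from `git diff --name-only`); `touched_environments` is a hypothetical helper.

```sh
# Read changed file paths on stdin and print each affected environment once.
touched_environments() {
  sed -n 's#^infrastructure/environments/\([^/]*\)/.*#\1#p' | sort -u
}

# Example:
#   git diff --name-only origin/main...HEAD | touched_environments
```

Changes under `modules/` are deliberately ignored here; since a shared module can affect every environment, a real pipeline would probably plan all environments when a module changes.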
Automated promotion pipeline:
name: Environment Promotion
on:
  workflow_dispatch:
    inputs:
      source_env:
        description: 'Source environment'
        required: true
        type: choice
        options: ['development', 'staging']
      target_env:
        description: 'Target environment'
        required: true
        type: choice
        options: ['staging', 'production']
jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      # The copy step needs the repository contents
      - uses: actions/checkout@v3
      - name: Validate Promotion
        run: |
          # The input options already rule out backwards promotion; reject staging -> staging
          if [[ "${{ inputs.source_env }}" == "${{ inputs.target_env }}" ]]; then
            echo "Source and target environments must differ"
            exit 1
          fi
      - name: Copy Configuration
        run: |
          # Copy module versions and configuration
          cp environments/${{ inputs.source_env }}/versions.tf \
            environments/${{ inputs.target_env }}/versions.tf
          # Update environment-specific variables
          sed -i 's/${{ inputs.source_env }}/${{ inputs.target_env }}/g' \
            environments/${{ inputs.target_env }}/terraform.tfvars
          # The result still has to be committed or proposed as a pull request in a later step
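The workflow's inputs still allow promoting straight from development to production. If your policy is that every promotion moves exactly one step forward, which is an assumption here rather than something the workflow requires, a stricter check could look like this sketch. `can_promote` is a hypothetical helper.

```sh
# Succeed only for single-step forward promotions:
# development -> staging and staging -> production.
can_promote() {
  case "$1:$2" in
    development:staging | staging:production) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   can_promote "$SOURCE_ENV" "$TARGET_ENV" || { echo "Unsupported promotion path" >&2; exit 1; }
```

Listing the allowed pairs explicitly keeps the rule easy to audit and fails closed when a new environment name is introduced.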
Code Organization Patterns
Large teams need consistent code organization:
Monorepo structure:
terraform-infrastructure/
├── modules/
│   ├── networking/
│   │   ├── vpc/
│   │   ├── subnets/
│   │   └── security-groups/
│   ├── compute/
│   │   ├── ec2/
│   │   ├── ecs/
│   │   └── lambda/
│   └── data/
│       ├── rds/
│       ├── s3/
│       └── dynamodb/
├── environments/
│   ├── shared/
│   │   ├── dns/
│   │   ├── iam/
│   │   └── monitoring/
│   ├── development/
│   ├── staging/
│   └── production/
├── applications/
│   ├── web-app/
│   ├── api-service/
│   └── data-pipeline/
└── tools/
    ├── scripts/
    ├── policies/
    └── templates/
Multi-repo structure for team autonomy:
platform-infrastructure/ # Shared infrastructure
├── networking/
├── security/
└── monitoring/
web-team-infrastructure/ # Team-specific infrastructure
├── applications/
├── databases/
└── environments/
data-team-infrastructure/ # Another team's infrastructure
├── pipelines/
├── storage/
└── analytics/
Access Control and Permissions
Teams need different levels of access to different parts of the infrastructure:
Role-based access control:
# Platform team - full access
data "aws_iam_policy_document" "platform_team" {
  statement {
    effect    = "Allow"
    actions   = ["*"]
    resources = ["*"]
  }
}

# Development team - limited to dev environment
data "aws_iam_policy_document" "dev_team" {
  statement {
    effect = "Allow"
    actions = [
      "ec2:*",
      "rds:*",
      "s3:*"
    ]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = ["us-west-2"]
    }

    # aws:ResourceTag is single-valued, so use StringLike rather than ForAllValues:
    # (a ForAllValues condition also matches when the tag is absent)
    condition {
      test     = "StringLike"
      variable = "aws:ResourceTag/Environment"
      values   = ["development", "dev-*"]
    }
  }
}

# Read-only access for security team
data "aws_iam_policy_document" "security_team" {
  statement {
    effect = "Allow"
    actions = [
      "ec2:Describe*",
      "rds:Describe*",
      "s3:List*",
      "s3:Get*"
    ]
    resources = ["*"]
  }
}
Environment-specific CI/CD roles:
resource "aws_iam_role" "terraform_ci" {
  for_each = toset(["development", "staging", "production"])

  name = "terraform-ci-${each.key}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRoleWithWebIdentity"
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.github.arn
        }
        Condition = {
          StringEquals = {
            # Restrict both the token audience and the GitHub environment
            "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
            "token.actions.githubusercontent.com:sub" = "repo:myorg/infrastructure:environment:${each.key}"
          }
        }
      }
    ]
  })
}
Collaboration Tools and Practices
Terraform Cloud/Enterprise for team collaboration:
terraform {
  cloud {
    organization = "my-company"

    workspaces {
      name = "production-infrastructure"
    }
  }
}
Atlantis for pull request automation:
# atlantis.yaml
version: 3
projects:
  - name: production
    dir: environments/production
    workspace: production
    autoplan:
      when_modified: ["*.tf", "*.tfvars"]
    apply_requirements: ["approved", "mergeable"]
  - name: staging
    dir: environments/staging
    workspace: staging
    autoplan:
      when_modified: ["*.tf", "*.tfvars"]
Documentation as code:
# Generate documentation automatically
# Assumes each module directory contains variables.yaml and outputs.yaml
resource "local_file" "module_docs" {
  # fileset() matches files, not directories, so derive module names from a known file
  for_each = toset([
    for f in fileset("${path.module}/modules", "*/variables.yaml") : dirname(f)
  ])

  content = templatefile("${path.module}/templates/module-doc.md.tpl", {
    module_name = each.key
    variables   = yamldecode(file("${path.module}/modules/${each.key}/variables.yaml"))
    outputs     = yamldecode(file("${path.module}/modules/${each.key}/outputs.yaml"))
  })

  filename = "${path.module}/docs/modules/${each.key}.md"
}
Conflict Resolution and Recovery
When things go wrong in team environments, the following commands help you recover:
State file recovery:
# Backup current state before recovery
terraform state pull > backup-$(date +%Y%m%d-%H%M%S).tfstate
# Import resources that exist but aren't in state
terraform import aws_instance.web i-1234567890abcdef0
# Remove resources from state that no longer exist
terraform state rm aws_instance.old_server
# Move resources between configurations
terraform state mv aws_instance.web module.web.aws_instance.server
Merge conflict resolution:
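# When state copies conflict, treat the remote backend as the source of truth
# Save a copy of the remote state
terraform state pull > current-state.tfstate
# Discard local edits to a committed state file (only applies if state is tracked in Git)
git checkout HEAD -- terraform.tfstate
# Reconcile state with real infrastructure (replaces the deprecated `terraform refresh`)
terraform apply -refresh-only
terraform plan # Review differences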
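When two copies of a state file disagree, the `serial` field, which Terraform increments on each state write, helps identify the newer one. The following is a minimal sketch that reads it without extra tools, assuming the pretty-printed JSON that `terraform state pull` emits; `state_serial` is a hypothetical helper.

```sh
# Print the top-level "serial" number from a Terraform state file.
# Relies on the pretty-printed layout where "serial" sits on its own line.
state_serial() {
  sed -n 's/^ *"serial": *\([0-9][0-9]*\),\{0,1\} *$/\1/p' "$1" | head -n 1
}

# Example:
#   [ "$(state_serial backup.tfstate)" -gt "$(state_serial current-state.tfstate)" ] &&
#     echo "backup is newer"
```

Serials are only meaningful to compare between files that share the same `lineage` value, so check that field as well before deciding which copy to keep.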
What’s Coming Next
Team collaboration patterns are essential for scaling Terraform beyond individual use. The workflows, access controls, and organizational practices we’ve covered enable multiple teams to work together safely and efficiently while maintaining the reliability and security that production infrastructure requires.
In the final part, we’ll explore scaling and optimization—how to handle very large Terraform configurations, multi-cloud scenarios, performance optimization, and the enterprise patterns that support infrastructure management at massive scale.