Most Terraform code has zero tests. That’s insane for something managing production infrastructure. We wouldn’t ship application code without tests — why do we treat the thing that creates our VPCs, databases, and IAM roles like it’s somehow less important?

I learned this lesson the painful way. Last year I pushed a Terraform change that modified a security group rule on a shared networking stack. The plan looked clean. Added an ingress rule, removed an old one. Terraform showed exactly two changes. I approved it, applied it, and went to lunch. By the time I got back, three services were down. The “old” rule I removed was the one allowing traffic between our application tier and the database subnet. The plan was technically correct — it did exactly what I told it to. But I’d told it the wrong thing, and nothing in our pipeline caught it.

That incident cost us about four hours of partial downtime and a very uncomfortable post-mortem. It also kicked off a six-week effort to build a proper testing strategy for our Terraform code. Here’s everything I’ve learned since then.


The Testing Pyramid for Infrastructure

The testing pyramid applies to infrastructure code just like application code, but the layers look different:

  • Static analysis (linting, policy checks) — fast, cheap, catches obvious mistakes
  • Unit tests — validate individual module logic without creating real resources
  • Integration tests — deploy real infrastructure, verify it works, tear it down
  • End-to-end tests — validate complete environments and cross-stack dependencies

Most teams stop at static analysis if they test at all. That’s better than nothing, but it wouldn’t have caught my security group disaster. You need all four layers, weighted toward the bottom of the pyramid.


Static Analysis: The First Line of Defense

Static analysis is the easiest win. No infrastructure gets created, tests run in seconds, and they catch a surprising number of issues. I run two tools on every commit: tflint and checkov.

tflint catches HCL errors, deprecated syntax, and provider-specific issues that terraform validate misses. Here’s a minimal config:

# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.32.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "terraform_naming_convention" {
  enabled = true
}

rule "terraform_unused_declarations" {
  enabled = true
}

Run it:

tflint --init
tflint --recursive

checkov handles policy-as-code. It checks your Terraform against hundreds of security best practices — things like “is this S3 bucket encrypted?” or “does this security group allow 0.0.0.0/0 on port 22?”

checkov -d . --framework terraform --quiet

The --quiet flag suppresses passing checks so you only see failures. I pipe this into CI and fail the build on any HIGH or CRITICAL finding.

You can write custom checkov policies too. We have one that enforces our tagging standard:

from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck
from checkov.common.models.enums import CheckResult, CheckCategories

class RequiredTagsCheck(BaseResourceCheck):
    def __init__(self):
        supported = ["aws_instance", "aws_s3_bucket", "aws_rds_cluster"]
        super().__init__(
            name="Ensure required tags are present",
            id="CUSTOM_001",
            categories=[CheckCategories.GENERAL_SECURITY],
            supported_resources=supported,
        )

    def scan_resource_conf(self, conf):
        # checkov wraps attribute values in lists; tags may be absent or
        # unresolved (e.g. tags = var.tags), so guard before calling .keys()
        tags = conf.get("tags", [{}])[0]
        if not isinstance(tags, dict):
            return CheckResult.FAILED
        required = {"Environment", "Team", "CostCenter"}
        if required.issubset(tags.keys()):
            return CheckResult.PASSED
        return CheckResult.FAILED

check = RequiredTagsCheck()
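Checkov picks the check up from a directory of custom checks passed at run time (the directory name here is just our convention):

```shell
checkov -d . --framework terraform --external-checks-dir ./custom_checks
```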

Static analysis runs in under 10 seconds on most codebases. There’s no excuse not to have it. But it only catches what it knows to look for — it wouldn’t have caught my security group issue because the configuration was technically valid. It just wasn’t what I intended.


Unit Testing with terraform test

HashiCorp shipped the terraform test framework in Terraform 1.6, and it’s matured significantly since then (mock providers, which this section leans on, landed in 1.7). This is where things get interesting. You can now write tests in HCL that validate your module logic without deploying anything.

Here’s a module that creates a VPC with public and private subnets. I covered module design patterns in Terraform modules, but here’s the testing side.

The module:

# modules/vpc/main.tf
variable "cidr_block" {
  type = string
}

variable "environment" {
  type = string
}

variable "private_subnet_count" {
  type    = number
  default = 2
}

resource "aws_vpc" "this" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  tags = {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
  }
}

data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "private" {
  count             = var.private_subnet_count
  vpc_id            = aws_vpc.this.id
  cidr_block        = cidrsubnet(var.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
  tags = {
    Name = "${var.environment}-private-${count.index}"
  }
}

The test:

# tests/vpc_unit.tftest.hcl
mock_provider "aws" {}

variables {
  cidr_block           = "10.0.0.0/16"
  environment          = "test"
  private_subnet_count = 3
}

run "vpc_creates_correct_cidr" {
  command = plan

  assert {
    condition     = aws_vpc.this.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block doesn't match input"
  }
}

run "vpc_has_dns_hostnames" {
  command = plan

  assert {
    condition     = aws_vpc.this.enable_dns_hostnames == true
    error_message = "DNS hostnames should be enabled"
  }
}

run "correct_subnet_count" {
  command = plan

  assert {
    condition     = length(aws_subnet.private) == 3
    error_message = "Expected 3 private subnets"
  }
}

run "subnet_cidrs_are_valid" {
  command = plan

  assert {
    condition     = aws_subnet.private[0].cidr_block == "10.0.0.0/24"
    error_message = "First subnet CIDR incorrect"
  }
}

Run it with:

terraform test

The command = plan directive means Terraform only generates a plan — no resources get created. The mock provider intercepts all API calls. This runs in seconds and catches logic errors in your module configuration.
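One wrinkle with mock providers: computed attributes (IDs, ARNs) get auto-generated placeholder values, so don’t assert on them unless you pin them. Since Terraform 1.7 you can supply your own defaults; a sketch:

```hcl
mock_provider "aws" {
  mock_resource "aws_vpc" {
    defaults = {
      # fixed placeholder so assertions on the ARN stay stable
      arn = "arn:aws:ec2:us-east-1:123456789012:vpc/vpc-mock"
    }
  }
}
```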

I use terraform test for things like:

  • Validating CIDR math and subnet calculations
  • Checking that tags propagate correctly
  • Ensuring conditional resources are created (or not) based on input variables
  • Verifying output values match expectations
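The CIDR bullet is the one that pays off most for me. cidrsubnet(var.cidr_block, 8, count.index) in the module above carves /24s out of the /16, and when I’m unsure what value an assertion should expect, I reproduce the arithmetic outside Terraform. A standalone sketch (my own IPv4-only helper with no bounds checking, not a library function):

```go
package main

import (
	"fmt"
	"net/netip"
)

// cidrSubnet mimics Terraform's cidrsubnet(prefix, newbits, netnum) for
// IPv4: extend the prefix length by newbits and offset the network by
// netnum within the new subnet bits.
func cidrSubnet(prefix string, newbits, netnum int) (string, error) {
	p, err := netip.ParsePrefix(prefix)
	if err != nil {
		return "", err
	}
	newLen := p.Bits() + newbits
	base := p.Addr().As4()
	// Treat the address as a 32-bit integer and add the shifted netnum.
	n := uint32(base[0])<<24 | uint32(base[1])<<16 | uint32(base[2])<<8 | uint32(base[3])
	n += uint32(netnum) << (32 - newLen)
	out := netip.AddrFrom4([4]byte{byte(n >> 24), byte(n >> 16), byte(n >> 8), byte(n)})
	return netip.PrefixFrom(out, newLen).String(), nil
}

func main() {
	for i := 0; i < 3; i++ {
		s, _ := cidrSubnet("10.0.0.0/16", 8, i)
		fmt.Println(s)
	}
}
```

Running this prints 10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24, which is exactly what the subnet_cidrs_are_valid assertion above expects.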

It wouldn’t have caught my security group bug directly, but it would’ve caught it indirectly — if I’d had a test asserting “these specific ingress rules must exist,” the test would’ve failed when I removed one.
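That guardrail is only a few lines of HCL. A sketch, assuming the stack defines an aws_security_group_rule resource named db_ingress (hypothetical name):

```hcl
# tests/sg_guardrail.tftest.hcl (sketch; resource name is illustrative)
mock_provider "aws" {}

run "db_ingress_must_exist" {
  command = plan

  assert {
    condition     = aws_security_group_rule.db_ingress.from_port == 5432
    error_message = "App-to-database ingress on 5432 must not be removed"
  }
}
```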


Integration Testing with Terratest

Unit tests validate logic. Integration tests validate reality. Terratest is a Go library that deploys real infrastructure, runs assertions against it, and tears it down. It’s slower and more expensive, but it catches things nothing else can.

Here’s a Terratest test for that VPC module:

package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/aws"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVpcModule(t *testing.T) {
	t.Parallel()

	opts := &terraform.Options{
		TerraformDir: "../modules/vpc",
		Vars: map[string]interface{}{
			"cidr_block":           "10.99.0.0/16",
			"environment":          "test",
			"private_subnet_count": 2,
		},
	}

	defer terraform.Destroy(t, opts)
	terraform.InitAndApply(t, opts)

	vpcID := terraform.Output(t, opts, "vpc_id")
	subnets := terraform.OutputList(t, opts, "private_subnet_ids")

	assert.NotEmpty(t, vpcID)
	assert.Equal(t, 2, len(subnets))

	vpc := aws.GetVpcById(t, vpcID, "us-east-1")
	assert.Equal(t, "10.99.0.0/16", *vpc.CidrBlock)
}

A few things I’ve learned running Terratest in production pipelines:

Always use t.Parallel(). Integration tests are slow. Running them in parallel cuts your test time dramatically. Use unique naming (I append random suffixes) to avoid resource conflicts.
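Terratest ships random.UniqueId() for this, and the pattern is trivial to sketch with the standard library; feed the suffix into your Vars so parallel runs never fight over resource names:

```go
package main

import (
	"fmt"
	"math/rand"
)

// uniqueSuffix returns a short random string to append to test resource
// names so parallel test runs don't collide.
func uniqueSuffix(n int) string {
	const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
	b := make([]byte, n)
	for i := range b {
		b[i] = alphabet[rand.Intn(len(alphabet))]
	}
	return string(b)
}

func main() {
	fmt.Println("test-vpc-" + uniqueSuffix(6))
}
```

In a test, that becomes something like Vars: map[string]interface{}{"environment": "test-" + uniqueSuffix(6)}.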

Always defer terraform.Destroy(). If your test panics or fails mid-way, the defer ensures cleanup still runs. Leaked test infrastructure is expensive — I once found $400/month in orphaned test resources because a test was failing before the destroy step and nobody noticed.

Use a dedicated test AWS account. Don’t run integration tests in your production account. Ever. I covered account structure in Terraform state management — the same principles apply here. Separate accounts, separate blast radius.

Set timeouts and retries. Some resources take forever to create (looking at you, RDS). Set reasonable timeouts, and retry errors you know are transient, so your CI pipeline doesn’t hang for 45 minutes on a stuck resource:

opts := &terraform.Options{
	TerraformDir: "../modules/rds",
	Vars: map[string]interface{}{
		"instance_class": "db.t3.micro",
		"engine":         "postgres",
	},
	// Regex on the error message, mapped to the log line printed while
	// retrying (TimeBetweenRetries needs "time" in the import block)
	RetryableTerraformErrors: map[string]string{
		".*timeout.*": "Resource creation timeout, retrying",
	},
	MaxRetries:         2,
	TimeBetweenRetries: 30 * time.Second,
}


Testing Security Group Rules (The Thing That Bit Me)

After the incident, I wrote a specific integration test for our networking module. This is the test that would’ve saved me four hours of downtime:

func TestNetworkingSecurityGroups(t *testing.T) {
	t.Parallel()

	opts := &terraform.Options{
		TerraformDir: "../stacks/networking",
		Vars: map[string]interface{}{
			"environment": "test",
		},
	}

	defer terraform.Destroy(t, opts)
	terraform.InitAndApply(t, opts)

	sgID := terraform.Output(t, opts, "app_security_group_id")
	rules := aws.GetSecurityGroupIngressRules(t, sgID, "us-east-1")

	// This is the rule I accidentally deleted
	hasDBRule := false
	for _, rule := range rules {
		if rule.FromPort == 5432 && rule.ToPort == 5432 {
			hasDBRule = true
		}
	}
	assert.True(t, hasDBRule, "App SG must allow port 5432 to database subnet")
}

It’s not complicated. It just checks that the security group has the rules we expect. But that simple assertion would’ve caught my mistake before it hit production. The test runs in about 90 seconds — creating a VPC, subnets, security groups, verifying the rules, and tearing it all down.


End-to-End Testing

End-to-end tests validate that your complete environment works as a system. Individual modules might be fine, but do they work together? Can the application actually reach the database through the networking stack you built?

I keep e2e tests minimal because they’re expensive and slow. Usually one or two tests per environment that validate critical paths:

import (
	"fmt"
	"testing"
	"time"

	http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
	"github.com/gruntwork-io/terratest/modules/terraform"
)

func TestFullEnvironmentConnectivity(t *testing.T) {
	// Deploy networking stack
	netOpts := &terraform.Options{
		TerraformDir: "../stacks/networking",
		Vars: map[string]interface{}{
			"environment": "e2e-test",
		},
	}
	defer terraform.Destroy(t, netOpts)
	terraform.InitAndApply(t, netOpts)

	vpcID := terraform.Output(t, netOpts, "vpc_id")
	privateSubnets := terraform.OutputList(t, netOpts, "private_subnet_ids")

	// Deploy compute stack using networking outputs
	computeOpts := &terraform.Options{
		TerraformDir: "../stacks/compute",
		Vars: map[string]interface{}{
			"vpc_id":     vpcID,
			"subnet_ids": privateSubnets,
		},
	}
	defer terraform.Destroy(t, computeOpts)
	terraform.InitAndApply(t, computeOpts)

	// Verify the endpoint is reachable
	endpoint := terraform.Output(t, computeOpts, "lb_dns_name")
	http_helper.HttpGetWithRetry(
		t, fmt.Sprintf("http://%s/health", endpoint),
		nil, 200, "ok", 30, 10*time.Second,
	)
}

This test deploys two stacks, wires them together, and verifies the application is reachable. It takes 5-10 minutes to run. I don’t run this on every commit — it runs nightly and on PRs that touch the networking or compute stacks.


Wiring It Into CI/CD

Here’s how I structure the pipeline. I use GitHub Actions for this, but the pattern works anywhere:

# .github/workflows/terraform-test.yml
name: Terraform Tests
on:
  pull_request:
    paths: ["terraform/**"]

jobs:
  static:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: terraform-linters/setup-tflint@v4
      - name: tflint
        run: |
          tflint --init
          tflint --recursive
      - name: checkov
        run: |
          pip install checkov
          checkov -d terraform/ --framework terraform --quiet --compact

  unit:
    runs-on: ubuntu-latest
    needs: static
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: terraform test
        run: |
          terraform init -backend=false
          terraform test
        working-directory: terraform/modules/vpc

  integration:
    runs-on: ubuntu-latest
    needs: unit
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: "1.22"
      # OIDC role in the dedicated test account; the secret name is ours
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_TEST_ROLE_ARN }}
          aws-region: us-east-1
      - name: Run integration tests
        run: go test -v -timeout 30m ./...
        working-directory: terraform/test
        env:
          AWS_REGION: us-east-1

Static analysis runs first — if it fails, nothing else runs. Unit tests next. Integration tests last, and only if the cheaper tests pass. This keeps feedback fast. Most PRs get feedback in under a minute from static analysis. Only changes that pass linting and unit tests burn the time and money on integration tests.


The Cost of Testing Infrastructure

Let’s be honest about costs. Integration tests create real AWS resources. Every test run costs money.

My VPC integration test costs about $0.02 per run. The full e2e test costs around $0.50 per run (mostly the ALB and NAT gateway). Running integration tests on every PR across a team of 8 engineers costs us roughly $80/month. The nightly e2e tests add another $15/month.

That’s $95/month to catch infrastructure bugs before they hit production. My security group incident cost us roughly $12,000 in engineering time and customer impact. The math isn’t even close.

To keep costs down:

  • Use the smallest instance types possible (t3.micro, db.t3.micro)
  • Always clean up — defer terraform.Destroy() is non-negotiable
  • Run expensive tests less frequently (nightly, not per-commit)
  • Use t.Parallel() to reduce wall-clock time


What I Test vs What I Don’t

I don’t test everything. That’s a trap. Here’s my rule of thumb:

Always test:

  • Networking rules (security groups, NACLs, route tables) — because mistakes here break everything
  • IAM policies — because overly permissive policies are a security incident waiting to happen
  • Module logic with conditional resources — because count and for_each bugs are subtle
  • Cross-stack references — because broken outputs cascade

Don’t bother testing:

  • Simple resource creation with no logic (a single S3 bucket with static config)
  • Provider behavior (you’re testing AWS at that point, not your code)
  • Exact resource attributes that are just pass-through from variables
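On the IAM bullet above: most of the value comes from one blunt assertion, no Allow of * on *. A standalone Go sketch of that check (the policy JSON is illustrative, and real policies have conditions and principals this ignores):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Statement is the subset of an IAM policy statement we care about.
type Statement struct {
	Effect   string          `json:"Effect"`
	Action   json.RawMessage `json:"Action"`   // string or []string
	Resource json.RawMessage `json:"Resource"` // string or []string
}

type policy struct {
	Statement []Statement `json:"Statement"`
}

// hasWildcardAdmin reports whether any Allow statement grants "*" on "*".
func hasWildcardAdmin(policyJSON string) (bool, error) {
	var p policy
	if err := json.Unmarshal([]byte(policyJSON), &p); err != nil {
		return false, err
	}
	// contains handles both the scalar and list forms of Action/Resource.
	contains := func(raw json.RawMessage, want string) bool {
		var s string
		if json.Unmarshal(raw, &s) == nil {
			return s == want
		}
		var list []string
		if json.Unmarshal(raw, &list) == nil {
			for _, v := range list {
				if v == want {
					return true
				}
			}
		}
		return false
	}
	for _, st := range p.Statement {
		if st.Effect == "Allow" && contains(st.Action, "*") && contains(st.Resource, "*") {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	bad := `{"Statement":[{"Effect":"Allow","Action":"*","Resource":"*"}]}`
	found, _ := hasWildcardAdmin(bad)
	fmt.Println(found) // prints true
}
```

In practice I run this kind of assertion against the rendered policy output of the module, after terraform.InitAndApply, rather than against hand-written JSON.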

If you’re coming from a codebase with zero tests, start with static analysis. Get tflint and checkov running in CI this week. Then add terraform test for your most complex modules. Integration tests come last — they’re the most valuable but also the most effort to set up and maintain.

I’ve written a broader Terraform primer and covered how Terraform compares to CDK if you’re evaluating your IaC approach. But regardless of which tool you pick, test your infrastructure code. The alternative is finding bugs in production, and I can tell you from experience that it’s a lot more expensive than a few integration tests.


Where I’ve Landed

After a year of building out this testing strategy, here’s what changed: we went from roughly one infrastructure incident per month to one per quarter. Our mean time to detect dropped from “someone notices the app is down” to “CI catches it before merge.” And engineers actually trust Terraform changes now — they’re not white-knuckling every apply.

The security group incident was a turning point. Not because the fix was hard — it was a one-line change — but because it exposed how fragile our process was. We were treating terraform plan as our test suite. That’s like treating a compiler as your QA team. It tells you the code is syntactically valid. It doesn’t tell you it does what you think it does.

Test your Terraform. Start small, build up. Your future self will thank you when a plan looks clean but the tests catch what your eyes missed.