EventBridge is the most underused AWS service. I’ll die on that hill. Teams will build these elaborate Rube Goldberg machines out of SNS topics, SQS queues, and Lambda functions stitched together with duct tape and prayers, when EventBridge would’ve given them a cleaner architecture in a fraction of the time.

I know this because I was one of those teams. About two years ago I inherited a system where a single order placement triggered a cascade of 14 SNS topics fanning out to 23 SQS queues. Nobody could tell me what happened when an order was placed without opening a spreadsheet. A spreadsheet. For message routing. When I asked why they hadn’t used EventBridge, the answer was “we started before it existed and never migrated.” Fair enough. But the pain was real — we’d get phantom duplicate processing, messages landing in DLQs with no context about where they came from, and debugging meant grepping through six different CloudWatch log groups hoping to find a correlation ID someone remembered to pass along.

We migrated the whole thing to EventBridge over about three months. The 14 SNS topics and 23 SQS queues collapsed into two custom event buses with a handful of rules. The spreadsheet became unnecessary because the rules themselves documented the routing. If you’ve been building event-driven architecture patterns on AWS and haven’t seriously looked at EventBridge, this post is for you.


Why EventBridge Over SNS/SQS

SNS and SQS aren’t going anywhere, and they’re still the right choice for certain patterns. SQS is unbeatable for simple work queues with backpressure. SNS is great for straightforward fan-out where every subscriber gets every message. But the moment your routing logic gets conditional — “send order events to the fulfillment service, but only if the order total exceeds $100 and the shipping destination is international” — you’re either filtering in the consumer (wasteful) or building a Lambda function between SNS and your targets (fragile).

EventBridge handles this natively with content-based filtering in rules. The event bus receives events, rules match against event content using patterns, and matched events get routed to targets. No intermediate compute. No custom filtering code to maintain.

The other thing that sold me: EventBridge has a schema registry. Events aren’t just opaque blobs anymore. The registry discovers and catalogs event shapes automatically, generates code bindings, and gives you a place to look when you’re wondering “what fields does the OrderPlaced event actually contain?” Try getting that from an SNS topic.


Custom Event Buses and When to Use Them

Every AWS account comes with a default event bus that receives AWS service events — things like EC2 state changes, CodePipeline execution updates, S3 object notifications via CloudTrail. You can put rules on the default bus, and for reacting to AWS service events, that’s exactly what you should do.

For your own application events, create custom event buses. I typically create one per bounded context or domain. An e-commerce platform might have orders-bus, inventory-bus, and payments-bus. This gives you isolation — a bad rule on the orders bus can’t accidentally match inventory events — and it maps cleanly to team ownership.

Here’s the CLI to create one:

aws events create-event-bus --name orders-bus

And the Terraform equivalent:

resource "aws_cloudwatch_event_bus" "orders" {
  name = "orders-bus"
}

Putting an event on the bus:

aws events put-events --entries '[
  {
    "Source": "com.myapp.orders",
    "DetailType": "OrderPlaced",
    "Detail": "{\"orderId\":\"ord-123\",\"total\":149.99,\"currency\":\"USD\"}",
    "EventBusName": "orders-bus"
  }
]'

One thing that tripped me up early: the Detail field is a JSON string, not a JSON object. You’re JSON-encoding JSON. It looks weird but that’s the API contract. Every 64KB chunk of payload counts as one event for billing purposes, and the maximum event size is 256KB.
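
If you're building entries in code, it's worth centralizing the double encoding so nobody passes a dict by accident. Here's a minimal sketch (plain Python, no AWS call; the helper names are mine) that builds a put-events entry and estimates billed event units under the 64KB rule:

```python
import json
import math

def build_entry(source, detail_type, detail, bus="orders-bus"):
    """Build one PutEvents entry. Detail must be a JSON *string*, not an object."""
    return {
        "Source": source,
        "DetailType": detail_type,
        "Detail": json.dumps(detail),  # the double encoding the API contract requires
        "EventBusName": bus,
    }

def billed_units(entry):
    """Rough billing estimate: every 64KB chunk of payload counts as one event."""
    size = sum(len(json.dumps(v).encode("utf-8")) for v in entry.values())
    return max(1, math.ceil(size / (64 * 1024)))

entry = build_entry("com.myapp.orders", "OrderPlaced",
                    {"orderId": "ord-123", "total": 149.99, "currency": "USD"})
```

From here the entry goes straight to boto3's events client via put_events(Entries=[entry]); the point is that Detail is already a string by the time it leaves your code.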


Rules and Event Patterns

Rules are where EventBridge earns its keep. A rule matches incoming events against a pattern and routes matches to one or more targets (up to five per rule). The pattern language is declarative — no code, no Lambda, just JSON.

Match all OrderPlaced events:

{
  "source": ["com.myapp.orders"],
  "detail-type": ["OrderPlaced"]
}

Match only high-value international orders:

{
  "source": ["com.myapp.orders"],
  "detail-type": ["OrderPlaced"],
  "detail": {
    "total": [{"numeric": [">=", 100]}],
    "shipping": {
      "country": [{"anything-but": "US"}]
    }
  }
}

In Terraform:

resource "aws_cloudwatch_event_rule" "high_value_international" {
  name           = "high-value-international-orders"
  event_bus_name = aws_cloudwatch_event_bus.orders.name

  event_pattern = jsonencode({
    source      = ["com.myapp.orders"]
    detail-type = ["OrderPlaced"]
    detail = {
      total    = [{ numeric = [">=", 100] }]
      shipping = { country = [{ "anything-but" = "US" }] }
    }
  })
}

resource "aws_cloudwatch_event_target" "fulfillment_lambda" {
  rule           = aws_cloudwatch_event_rule.high_value_international.name
  event_bus_name = aws_cloudwatch_event_bus.orders.name
  arn            = aws_lambda_function.fulfillment.arn
  target_id      = "fulfillment"
}

One wrinkle with Lambda targets: you also need an aws_lambda_permission resource that lets events.amazonaws.com invoke the function, scoped to the rule's ARN. Without it, deliveries fail and nothing shows up in the function's logs.

The pattern language supports prefix matching, suffix matching, numeric comparisons, anything-but, exists, and the $or operator for combining conditions. It’s surprisingly expressive. I’ve yet to hit a routing scenario it couldn’t handle.
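
To make the matching semantics concrete, here's a toy re-implementation of a slice of the pattern language in Python — exact values, prefix, anything-but, and single numeric comparisons only. It's an illustration of the semantics, not the real evaluator (no $or, exists, suffix, or numeric ranges):

```python
def matches(pattern, event):
    """Return True if the event satisfies every key in the (simplified) pattern."""
    for key, expected in pattern.items():
        value = event.get(key) if isinstance(event, dict) else None
        if isinstance(expected, dict):                 # nested pattern, e.g. "detail"
            if not matches(expected, value or {}):
                return False
        elif not any(_match_one(item, value) for item in expected):
            return False                               # list means "any of these"
    return True

def _match_one(item, value):
    if not isinstance(item, dict):                     # plain value: exact match
        return item == value
    if "prefix" in item:
        return isinstance(value, str) and value.startswith(item["prefix"])
    if "anything-but" in item:
        banned = item["anything-but"]
        return value not in (banned if isinstance(banned, list) else [banned])
    if "numeric" in item:                              # single comparison, e.g. [">=", 100]
        if not isinstance(value, (int, float)):
            return False
        op, bound = item["numeric"]
        return {">": value > bound, ">=": value >= bound,
                "<": value < bound, "<=": value <= bound, "=": value == bound}[op]
    return False

pattern = {"source": ["com.myapp.orders"], "detail-type": ["OrderPlaced"],
           "detail": {"total": [{"numeric": [">=", 100]}],
                      "shipping": {"country": [{"anything-but": "US"}]}}}
event = {"source": "com.myapp.orders", "detail-type": "OrderPlaced",
         "detail": {"total": 149.99, "shipping": {"country": "DE"}}}
```

The structure mirrors how EventBridge reads patterns: dicts descend into the event, lists are OR, and keys at the same level are AND.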

One gotcha from our migration: make your patterns as specific as possible. Always include source and detail-type at minimum. We had a rule early on that only matched on a detail field, and it started catching events from a completely different service that happened to use the same field name. Debugging that was not fun. The EventBridge best practices docs are worth reading — especially the section on avoiding infinite loops.

You can validate patterns before deploying with the CLI. Note that --event expects a complete event envelope, including id, account, time, region, and resources:

aws events test-event-pattern \
  --event-pattern '{"source":["com.myapp.orders"]}' \
  --event '{"id":"1","account":"123456789012","time":"2026-03-15T00:00:00Z","region":"us-east-1","resources":[],"source":"com.myapp.orders","detail-type":"OrderPlaced","detail":{}}'

Schema Discovery and the Schema Registry

This is the feature that doesn’t get enough attention. When you enable schema discovery on an event bus, EventBridge watches the events flowing through and automatically catalogs their shapes in the schema registry. It creates new schema versions when event structures change.

aws schemas create-discoverer \
  --source-arn arn:aws:events:us-east-1:123456789012:event-bus/orders-bus \
  --description "Discover order event schemas"

Once schemas are discovered, you can generate code bindings. The binding comes back as a zip of generated source, written to the positional output file (if the binding hasn't been generated yet, kick it off first with put-code-binding):

aws schemas get-code-binding-source \
  --registry-name discovered-schemas \
  --schema-name com.myapp.orders@OrderPlaced \
  --language Python36 \
  order_placed.zip

In practice, I don’t rely solely on auto-discovery for production schemas. I define them explicitly using OpenAPI 3.0 or JSONSchema Draft 4 format and register them manually. Auto-discovery is brilliant for development — you fire events and the registry tells you what shape they are. But for production contracts between teams, explicit schemas prevent the “someone added a field and broke our consumer” problem.

The registry also integrates with the EventBridge console’s sandbox, where you can test event patterns against sample events before deploying rules. If you’re building serverless architecture patterns with multiple teams publishing events, the schema registry becomes your contract layer.


Cross-Account Event Delivery

This is where EventBridge really shines for organizations with multiple AWS accounts — which is everyone following AWS best practices at this point. You’ve got your workload accounts, your shared services account, your security account. Events need to flow between them.

The old way required chaining event buses: source account bus → rule → target event bus in destination account → another rule → actual target. Two buses, two rules, double the configuration.

As of early 2025, EventBridge supports direct cross-account delivery. You can route events directly to SQS queues, Lambda functions, SNS topics, Kinesis streams, and API Gateway endpoints in other accounts. No intermediate bus needed.

The setup requires mutual trust. In the source account, your rule needs an IAM execution role:

resource "aws_iam_role" "eventbridge_cross_account" {
  name = "eventbridge-cross-account-delivery"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = { Service = "events.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "send_to_target" {
  role = aws_iam_role.eventbridge_cross_account.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "sqs:SendMessage"
      Resource = "arn:aws:sqs:us-east-1:TARGET_ACCOUNT:order-processing"
    }]
  })
}

In the target account, the SQS queue needs a resource policy allowing the execution role:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::SOURCE_ACCOUNT:role/eventbridge-cross-account-delivery"
    },
    "Action": "sqs:SendMessage",
    "Resource": "arn:aws:sqs:us-east-1:TARGET_ACCOUNT:order-processing"
  }]
}

This simplified our multi-account setup enormously. The receiving team doesn’t need to manage any EventBridge resources — they just grant IAM permissions on their target. If you’re working with microservices patterns across account boundaries, direct delivery cuts both latency and operational overhead.


EventBridge Pipes

Pipes are the newer addition that handles point-to-point integrations. Where event buses do many-to-many routing, Pipes connect a single source to a single target with optional filtering, enrichment, and transformation in between. No compute required for the basic flow.

Supported sources include SQS, DynamoDB Streams, Kinesis, MSK, and self-managed Kafka. You can filter events before they leave the source, enrich them by calling a Lambda or API Gateway or Step Functions, and transform the payload before it hits the target.

I use Pipes for the “DynamoDB change → do something” pattern that used to require a Lambda function just to reshape the DynamoDB stream record into something the downstream service could understand. Now a Pipe with an input transformer handles it directly.
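
For context, this is roughly the glue a Pipe replaces — a sketch (my own helper, not AWS code) of unmarshalling a DynamoDB stream record's attribute-value format into a plain payload, which a Pipe with an input transformer now does declaratively:

```python
def unmarshal(av):
    """Convert one DynamoDB attribute value ({"S": "x"}, {"N": "1"}, ...) to a plain value."""
    (tag, val), = av.items()
    if tag == "S":    return val
    if tag == "N":    return float(val) if "." in val else int(val)
    if tag == "BOOL": return val
    if tag == "M":    return {k: unmarshal(v) for k, v in val.items()}
    if tag == "L":    return [unmarshal(v) for v in val]
    raise ValueError(f"unhandled attribute type {tag}")

record = {  # shape of a DynamoDB stream record as delivered to a Pipe source
    "eventName": "INSERT",
    "dynamodb": {"NewImage": {"orderId": {"S": "ord-123"},
                              "total":   {"N": "149.99"}}},
}
payload = {k: unmarshal(v) for k, v in record["dynamodb"]["NewImage"].items()}
# payload == {"orderId": "ord-123", "total": 149.99}
```

Every Lambda like this that you delete is one less function to deploy, monitor, and cold-start.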

The key distinction: use event buses when multiple consumers need the same events or when routing logic is complex. Use Pipes when you have a straightforward source-to-target flow and want to avoid writing glue code.


Archive and Replay

EventBridge lets you archive events and replay them later. This is invaluable for debugging, testing, and disaster recovery. You create an archive on an event bus, optionally with a filter pattern so you’re not archiving everything, and set a retention period.

aws events create-archive \
  --archive-name order-events-archive \
  --event-source-arn arn:aws:events:us-east-1:123456789012:event-bus/orders-bus \
  --event-pattern '{"source":["com.myapp.orders"]}' \
  --retention-days 90

When you need to replay — say a downstream service was down and missed events, or you’re testing a new consumer — you start a replay. A replay reads from the archive, so the source ARN here is the archive's, not the bus's:

aws events start-replay \
  --replay-name replay-missed-orders \
  --event-source-arn arn:aws:events:us-east-1:123456789012:archive/order-events-archive \
  --destination '{"Arn":"arn:aws:events:us-east-1:123456789012:event-bus/orders-bus"}' \
  --event-start-time 2026-03-15T00:00:00Z \
  --event-end-time 2026-03-16T00:00:00Z

Replayed events have a replay-name field in their metadata, so your consumers can distinguish replays from live events if needed. Make sure your consumers are idempotent — this matters for any event-driven architecture, but especially when you’re replaying potentially thousands of events.
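
A minimal idempotency sketch (in-memory for illustration; in production you'd use a conditional write to DynamoDB or similar, and the orderId-based key is my assumption about what's stable in your events):

```python
processed = set()  # stand-in for a durable store with conditional writes

def handle(event):
    """Process each logical event once, whether it arrives live or via replay."""
    key = (event["detail-type"], event["detail"]["orderId"])  # business key
    if key in processed:
        return "skipped"                      # duplicate, or a replay of work already done
    processed.add(key)
    # ... real business logic goes here ...
    return "replayed" if "replay-name" in event else "processed"
```

Keying on a business identifier rather than anything EventBridge assigns keeps the dedupe logic valid across replays and across any retries in between.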


EventBridge Scheduler

Scheduler is the cron replacement you didn’t know you needed. It supports one-time schedules, rate-based schedules, and full cron expressions with timezone support. Unlike scheduled rules on an event bus, it gives you flexible time windows, far higher quotas, and built-in at-least-once delivery with configurable retry policies.

resource "aws_scheduler_schedule" "daily_report" {
  name       = "daily-order-report"
  group_name = "default"

  flexible_time_window {
    mode = "OFF"
  }

  schedule_expression          = "cron(0 8 * * ? *)"
  schedule_expression_timezone = "America/New_York"

  target {
    arn      = aws_lambda_function.report_generator.arn
    role_arn = aws_iam_role.scheduler.arn

    input = jsonencode({
      reportType = "daily-orders"
      format     = "csv"
    })
  }
}

The free tier includes 14 million invocations per month, which is generous. I’ve moved all our scheduled Lambda invocations from CloudWatch Events rules to Scheduler — the timezone support alone was worth it. No more UTC-to-local mental math.


Integration Patterns That Work

After running EventBridge in production across several projects, here are the patterns I keep coming back to.

Event notification, not event-carried state transfer. Keep events small. Include the entity ID and what happened, not the entire entity state. Let consumers call back for the full data if they need it. This keeps your events under the 256KB limit and avoids coupling consumers to your data model.

{
  "source": "com.myapp.orders",
  "detail-type": "OrderPlaced",
  "detail": {
    "orderId": "ord-123",
    "customerId": "cust-456",
    "total": 149.99,
    "itemCount": 3
  }
}

Dead-letter queues on every rule target. EventBridge retries failed deliveries for 24 hours with exponential backoff. After that, events are gone unless you’ve configured a DLQ. Always configure a DLQ.

resource "aws_cloudwatch_event_target" "with_dlq" {
  rule           = aws_cloudwatch_event_rule.orders.name
  event_bus_name = aws_cloudwatch_event_bus.orders.name
  arn            = aws_lambda_function.processor.arn

  dead_letter_config {
    arn = aws_sqs_queue.dlq.arn
  }

  retry_policy {
    maximum_event_age_in_seconds = 3600
    maximum_retry_attempts       = 3
  }
}

Input transformers for target-specific payloads. Your event schema shouldn’t be dictated by what your targets expect. Use input transformers to reshape events before delivery.
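
The mechanics are simple: named JSON paths are extracted from the event, then substituted into a template. A local sketch of those semantics (my own simplified model, dotted paths only — the real feature uses JSONPath and more):

```python
import json

def transform(event, input_paths, template):
    """Mimic an input transformer: resolve named paths, substitute <name> slots."""
    resolved = {}
    for name, path in input_paths.items():
        value = event
        for part in path.split(".")[1:]:   # "$.detail.orderId" -> ["detail", "orderId"]
            value = value[part]
        resolved[name] = value
    out = template
    for name, value in resolved.items():
        out = out.replace(f"<{name}>", json.dumps(value))
    return json.loads(out)

event = {"detail": {"orderId": "ord-123", "total": 149.99}}
payload = transform(
    event,
    {"id": "$.detail.orderId", "amount": "$.detail.total"},
    '{"order": <id>, "charge": <amount>}',
)
# payload == {"order": "ord-123", "charge": 149.99}
```

The target sees only the reshaped payload; your event schema stays untouched.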

Observability from day one. Enable CloudWatch metrics on your event buses. Monitor FailedInvocations, TriggeredRules, and Invocations. Set alarms on FailedInvocations — if events aren’t reaching targets, you want to know immediately, not when a customer complains. If you’re dealing with Lambda cold starts on your target functions, factor that into your retry configuration.


The Migration Playbook

If you’re sitting on a tangled SNS/SQS architecture like I was, here’s how I’d approach the migration.

Don’t try to migrate everything at once. Pick one event flow — ideally one that’s causing pain — and move it to EventBridge. Run the old and new paths in parallel. Compare outputs. When you’re confident, cut over and decommission the old path.

Start by mapping your current routing. For every SNS topic, document: what publishes to it, what subscribes, and what filtering (if any) happens in the consumers. This map becomes your EventBridge rule definitions. Each SNS topic with conditional routing becomes a custom event bus with rules. Each SQS subscriber becomes a rule target.

The hardest part isn’t the technical migration — it’s getting teams to agree on event schemas. Invest time here. Use the schema registry. Define your source naming convention (I use reverse domain notation: com.company.service) and your detail-type naming convention (past-tense verbs: OrderPlaced, PaymentProcessed, ShipmentDispatched). Document it. Enforce it in code review.
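
Conventions only stick if something checks them. A trivial lint you could run in CI against event definitions (the regexes and function name are my own illustration of the conventions above, not a standard tool):

```python
import re

SOURCE_RE = re.compile(r"^com\.[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)+$")   # reverse domain
DETAIL_TYPE_RE = re.compile(r"^([A-Z][a-z0-9]+){2,}$")                # e.g. OrderPlaced

def check_event_naming(source, detail_type):
    """Return a list of convention violations (empty list means compliant)."""
    errors = []
    if not SOURCE_RE.match(source):
        errors.append(f"source {source!r} is not reverse-domain (com.company.service)")
    if not DETAIL_TYPE_RE.match(detail_type):
        errors.append(f"detail-type {detail_type!r} is not PascalCase (OrderPlaced)")
    return errors
```

Wire it into whatever validates your schema registry entries and the convention enforces itself.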

EventBridge has a default quota of 300 rules per event bus and a maximum of five targets per rule. If you’re hitting those limits, you probably need to rethink your event bus boundaries rather than requesting increases.


What I’d Do Differently

Looking back at our migration, I’d change a few things. I’d have enabled schema discovery on day one in our dev environment — we spent too long manually documenting event shapes that the registry would’ve captured automatically. I’d have set up archive and replay earlier — we lost events during the parallel-run phase because a consumer bug went unnoticed for two days.

And I’d have pushed harder for EventBridge from the start of the project, before the SNS/SQS sprawl became entrenched. The best time to adopt EventBridge is before you need it. The second best time is now.

If you’re building anything with more than two or three services communicating asynchronously, EventBridge should be your default choice. It’s not the right tool for strict ordering (use SQS FIFO or Kinesis for that) and it’s not the right tool for high-throughput streaming (Kinesis again). But for event-driven routing, schema management, cross-account delivery, and operational sanity? Nothing else on AWS comes close.