Real-Time Data Processing Fundamentals

Core Concepts and Terminology

Understanding the building blocks of real-time systems:

Real-Time Processing vs. Batch Processing:

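  • Real-time: Continuous processing, with each event handled within a short, bounded latency of its arrival
  • Batch: Periodic processing of data accumulated over an interval
  • Micro-batch: Small batches processed at short, frequent intervals
  • Near real-time: Low latency (typically seconds to minutes) rather than immediate
  • Stream processing: Processing events as they arrive, over a continuous and unbounded flow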

Key Concepts:

  • Events: Discrete data records representing occurrences
  • Streams: Unbounded sequences of events
  • Producers: Systems generating event data
  • Consumers: Systems processing event data
  • Topics/Channels: Named streams for event organization
  • Partitions: Subdivisions of streams for parallelism
  • Offsets: Positions within event streams
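
The relationship between topics, partitions, and offsets can be sketched with a toy in-memory log. This is a minimal illustration, not how any particular broker is implemented; the `Topic` class, its method names, and the CRC32-based key hashing are assumptions made for the example.

```python
import zlib


class Topic:
    """A toy named stream split into append-only partitions (hypothetical)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, event):
        # A stable hash keeps all events for one key in one partition,
        # which preserves their relative order.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(event)
        # The offset is the event's position within its partition.
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def consume(self, partition, offset):
        # Read everything from the given offset to the current end.
        return self.partitions[partition][offset:]


topic = Topic("orders", num_partitions=3)
first = topic.produce("user-1", {"order": 1})
second = topic.produce("user-1", {"order": 2})
```

Both events share a key, so they land in the same partition at offsets 0 and 1. A consumer that remembers the last offset it handled can resume from that position after a restart.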

Processing Semantics:

  • At-most-once: Events may be lost but never processed twice
  • At-least-once: Events are never lost but may be processed multiple times
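  • Exactly-once: Each event affects the result once and only once, even after failures and retries
  • Processing guarantees vs. delivery guarantees: Delivery describes how often a message reaches a consumer; processing describes how often its effects are applied to state
  • End-to-end exactly-once semantics: The guarantee holds from source through processing to sink, which generally requires replayable sources and idempotent or transactional sinks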
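
One common way to get exactly-once effects on top of at-least-once delivery is an idempotent consumer that remembers which event IDs it has already applied. The sketch below assumes every event carries a unique `id` field; the `IdempotentConsumer` class and the event shape are hypothetical. A production system would also need to persist the seen IDs and the state together atomically.

```python
class IdempotentConsumer:
    """Applies each event's effect at most once, keyed by event id."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def handle(self, event):
        # A redelivered event is recognized by its id and skipped.
        if event["id"] in self.seen_ids:
            return False
        self.total += event["amount"]
        self.seen_ids.add(event["id"])
        return True


# At-least-once delivery: "e1" arrives twice, e.g. after a lost acknowledgement.
deliveries = [
    {"id": "e1", "amount": 10},
    {"id": "e2", "amount": 5},
    {"id": "e1", "amount": 10},
]

consumer = IdempotentConsumer()
applied = [consumer.handle(event) for event in deliveries]
```

Here `applied` is `[True, True, False]` and `consumer.total` is 15, so the duplicate delivery did not change the result.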

Time Concepts in Streaming:

  • Event time: When the event actually occurred
  • Processing time: When the system processes the event
  • Ingestion time: When the system receives the event
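  • Watermarks: Markers of event-time progress, asserting that no events with an earlier event time are expected
  • Windows: Bounded groupings of events, usually by time (e.g., tumbling, sliding, or session windows)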
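
The interaction between event time, watermarks, and windows can be sketched with tumbling windows. The code assumes a simple heuristic watermark (the largest event time seen, minus a fixed allowed lateness); `WINDOW_SIZE`, `ALLOWED_LATENESS`, and `process` are names chosen for the example, and real engines offer more elaborate strategies.

```python
WINDOW_SIZE = 10       # seconds per tumbling window (assumed)
ALLOWED_LATENESS = 5   # watermark trails the max event time by this much (assumed)


def window_start(event_time):
    return event_time - (event_time % WINDOW_SIZE)


def process(events):
    """events: (event_time, value) pairs in arrival order.

    Returns (emitted, dropped): emitted holds (window_start, sum) for
    closed windows; dropped holds events that arrived too late.
    """
    open_windows = {}
    emitted, dropped = [], []
    watermark = float("-inf")
    for event_time, value in events:
        start = window_start(event_time)
        # The window already closed, so this event is too late.
        if start + WINDOW_SIZE <= watermark:
            dropped.append((event_time, value))
            continue
        open_windows[start] = open_windows.get(start, 0) + value
        # The watermark only moves forward.
        watermark = max(watermark, event_time - ALLOWED_LATENESS)
        # Close every window that ends at or before the watermark.
        for closed in sorted(s for s in open_windows if s + WINDOW_SIZE <= watermark):
            emitted.append((closed, open_windows.pop(closed)))
    return emitted, dropped


# The event at time 8 is out of order but still accepted; time 3 is too late.
events = [(1, 1), (4, 1), (12, 1), (8, 1), (17, 1), (3, 1), (26, 1)]
emitted, dropped = process(events)
```

With this input, the window starting at 0 closes with a sum of 3 and the window starting at 10 closes with a sum of 2. The event at time 3 is dropped because its window had already closed, and the window starting at 20 stays open until the watermark passes 30.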

Real-Time Processing Architectures

Common patterns for building real-time systems:

Lambda Architecture:

  • Combines batch and stream processing
  • Batch layer for accuracy
  • Speed layer for low latency
  • Serving layer for query access
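  • Reconciliation between layers: queries merge batch views with real-time views
  • Duplicate processing logic: the same computation must be implemented and maintained in both the batch and speed layers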

Example Lambda Architecture:

          ┌───────────────┐
          │               │
          │  Data Sources │
          │               │
          └───────┬───────┘
                  │
        ┌─────────┴───────────┐
        │                     │
        ▼                     ▼
┌───────────────┐     ┌───────────────┐
│               │     │               │
│  Batch Layer  │     │ Speed Layer   │
│               │     │               │
└───────┬───────┘     └───────┬───────┘
        │                     │
        ▼                     ▼
┌───────────────┐     ┌───────────────┐
│               │     │               │
│  Batch Views  │     │ Real-time     │
│               │     │ Views         │
└───────┬───────┘     └───────┬───────┘
        │                     │
        └─────────┬───────────┘
                  │
                  ▼
          ┌───────────────┐
          │               │
          │  Serving      │
          │  Layer        │
          │               │
          └───────────────┘
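
The serving layer's job of combining the two views can be sketched as a merge at query time. This example assumes both views hold additive counts keyed by page, and that the real-time view covers only events that arrived after the last batch run; `merge_views` and the sample data are hypothetical.

```python
def merge_views(batch_view, realtime_view):
    """Combine a batch view with the real-time increments on top of it."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged


# Batch view: accurate counts up to the last batch run.
batch_view = {"page_a": 100, "page_b": 40}
# Real-time view: counts for events seen since that run.
realtime_view = {"page_a": 3, "page_c": 1}

merged = merge_views(batch_view, realtime_view)
```

Here `merged` is `{"page_a": 103, "page_b": 40, "page_c": 1}`. When the next batch run completes, its view replaces the old one and the real-time view is reset for the covered period. That handoff is the reconciliation step.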

Kappa Architecture:

  • Stream processing only
  • Single processing path
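  • Reprocessing for historical data by replaying the retained event log
  • Simplified maintenance: one codebase instead of separate batch and speed implementations
  • Unified programming model
  • Reduced operational complexity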

Example Kappa Architecture:

┌───────────────┐
│               │
│  Data Sources │
│               │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│               │
│  Stream       │
│  Processing   │
│  Layer        │
│               │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│               │
│  Serving      │
│  Layer        │
│               │
└───────────────┘
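
Reprocessing in a Kappa-style system amounts to replaying the retained log through a new version of the stream logic. The sketch below uses an in-memory list as the log; `build_view`, the two transform functions, and the event fields are assumptions made for illustration.

```python
def build_view(log, transform):
    """Replay the whole log through one processing function."""
    view = {}
    for event in log:
        key, value = transform(event)
        view[key] = view.get(key, 0) + value
    return view


# The retained event log is the single source of truth.
log = [
    {"user": "ann", "cents": 250},
    {"user": "bob", "cents": 100},
    {"user": "ann", "cents": 50},
]


def count_purchases(event):   # version 1 of the logic
    return event["user"], 1


def total_spend(event):       # version 2 of the logic
    return event["user"], event["cents"]


view_v1 = build_view(log, count_purchases)
# Changing the logic needs no separate batch job: replay from the start.
view_v2 = build_view(log, total_spend)
```

Here `view_v1` is `{"ann": 2, "bob": 1}` and `view_v2` is `{"ann": 300, "bob": 100}`. In practice the new view is typically built alongside the old one and swapped in once the replay has caught up.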

Modern Event-Driven Architecture:

  • Event backbone (e.g., Kafka)
  • Event processors (e.g., Flink, Kafka Streams)
  • Event stores
  • Command and query responsibility segregation (CQRS)
  • Event sourcing
  • Materialized views

Example Event-Driven Architecture:

┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│               │     │               │     │               │
│  Event        │     │  Event        │     │  Event        │
│  Producers    │────▶│  Backbone     │────▶│  Processors   │
│               │     │               │     │               │
└───────────────┘     └───────────────┘     └───────┬───────┘
                                                    │
                                                    │
                      ┌───────────────┐             │
                      │               │             │
                      │  Query        │◀────────────┘
                      │  Services     │
                      │               │
                      └───────────────┘
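
Event sourcing, CQRS, and materialized views fit together roughly as follows: commands append events to a store, and a separate read model is kept up to date from those events. This is a minimal in-memory sketch; `EventStore`, `BalanceView`, and the account events are hypothetical, and a real system would put a durable event backbone between the two sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str
    kind: str      # "deposited" or "withdrew"
    amount: int


class EventStore:
    """Write side: an append-only list of events that notifies subscribers."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def append(self, event):
        self._events.append(event)
        for handler in self._subscribers:
            handler(event)

    def all_events(self):
        return list(self._events)


class BalanceView:
    """Read side: a materialized view of the balance per account."""

    def __init__(self):
        self.balances = {}

    def apply(self, event):
        sign = 1 if event.kind == "deposited" else -1
        current = self.balances.get(event.account, 0)
        self.balances[event.account] = current + sign * event.amount

    def balance(self, account):
        return self.balances.get(account, 0)


store = EventStore()
view = BalanceView()
store.subscribe(view.apply)

store.append(Event("acct-1", "deposited", 100))
store.append(Event("acct-1", "withdrew", 30))

# A view can be rebuilt at any time by replaying the stored events.
rebuilt = BalanceView()
for past_event in store.all_events():
    rebuilt.apply(past_event)
```

After the two events, `view.balance("acct-1")` is 70, and the rebuilt view reaches the same state. Queries read only from the view, so the read model can be reshaped or rebuilt without touching the write path.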