Circuit Breakers and Bulkheads
Preventing cascading failures:
Circuit Breaker Pattern:
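- Monitors calls to a dependency for failures
- Trips (opens) when the failure threshold is reached
- Rejects calls while open, so failures do not cascade to callers
- Allows periodic trial calls to detect recovery
- Provides fallback mechanisms while the dependency is unavailable
- Improves system stability by shedding load from a failing dependency
- Enables graceful degradation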
Circuit Breaker States:
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│               │     │               │     │               │
│    Closed     │────▶│     Open      │────▶│   Half-Open   │
│   (Normal)    │     │   (Failing)   │     │   (Testing)   │
│               │     │               │     │               │
└───────────────┘     └───────────────┘     └───────────────┘
        ▲                                           │
        └───────────────────────────────────────────┘
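The transitions in the diagram can be sketched as a small state machine. This is a minimal illustration, not how Resilience4j is implemented: the class and parameter names (`SimpleCircuitBreaker`, `failure_threshold`, `open_seconds`) are hypothetical, and it counts consecutive failures rather than a failure rate. It also includes the path the diagram leaves out: a failed trial call in the half-open state sends the breaker back to open.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class SimpleCircuitBreaker:
    """Hypothetical count-based breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(self, failure_threshold=3, open_seconds=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.clock = clock  # injectable so the wait can be tested without sleeping
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("circuit is open; call rejected")
            self.state = "HALF_OPEN"  # wait elapsed: let a trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any successful call closes the breaker and clears the failure count.
        self.state = "CLOSED"
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens the breaker.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

Callers wrap the risky operation in `breaker.call(...)` and treat `CircuitOpenError` as the signal to use a fallback.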
Example Circuit Breaker Implementation (Java):
// Resilience4j Circuit Breaker example
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.vavr.control.Try;

import java.time.Duration;
import java.util.function.Supplier;

public class OrderService {
    private final PaymentService paymentService;
    private final CircuitBreaker circuitBreaker;

    public OrderService(PaymentService paymentService) {
        this.paymentService = paymentService;

        // Configure the circuit breaker.
        // Older Resilience4j releases named the last two settings
        // ringBufferSizeInHalfOpenState and ringBufferSizeInClosedState.
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50)                         // Trip at a 50% failure rate
            .waitDurationInOpenState(Duration.ofSeconds(10))  // Stay open 10s before testing
            .permittedNumberOfCallsInHalfOpenState(5)         // Trial calls allowed in half-open state
            .slidingWindowSize(10)                            // Calls measured in closed state
            .automaticTransitionFromOpenToHalfOpenEnabled(true)
            .build();

        this.circuitBreaker = CircuitBreaker.of("paymentService", config);
    }

    public PaymentResult processPayment(Order order) {
        // Decorate the payment service call with the circuit breaker
        Supplier<PaymentResult> decoratedSupplier = CircuitBreaker
            .decorateSupplier(circuitBreaker, () -> paymentService.processPayment(order));

        // Execute the call; fall back on failure or when the breaker rejects the call
        return Try.ofSupplier(decoratedSupplier)
            .recover(e -> fallbackPaymentMethod(order))
            .get();
    }

    private PaymentResult fallbackPaymentMethod(Order order) {
        // Fallback logic when the payment service is unavailable
        return new PaymentResult(
            PaymentStatus.PENDING,
            "Payment queued for processing",
            order.getId()
        );
    }
}
Bulkhead Pattern:
- Isolates components and failures
- Prevents resource exhaustion
- Limits concurrent calls
- Compartmentalizes failures
- Improves fault tolerance
- Enables partial availability
- Protects critical services
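The "limits concurrent calls" idea can be sketched with a counting semaphore. This is a minimal illustration under assumed names (`SemaphoreBulkhead`, `BulkheadFullError`, `max_wait_seconds` are hypothetical), not a description of any particular library: each protected dependency gets its own pool of slots, so a slow dependency can exhaust only its own slots.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when no slot becomes free within the wait limit."""


class SemaphoreBulkhead:
    """Hypothetical bulkhead that caps concurrent calls to one dependency."""

    def __init__(self, max_concurrent_calls, max_wait_seconds=0.0):
        self._slots = threading.BoundedSemaphore(max_concurrent_calls)
        self._max_wait_seconds = max_wait_seconds

    def call(self, func, *args, **kwargs):
        # Wait at most max_wait_seconds for a free slot; zero means do not wait.
        if self._max_wait_seconds > 0:
            acquired = self._slots.acquire(timeout=self._max_wait_seconds)
        else:
            acquired = self._slots.acquire(blocking=False)
        if not acquired:
            raise BulkheadFullError("bulkhead is full; call rejected")
        try:
            return func(*args, **kwargs)
        finally:
            # Always free the slot, even if the call raised.
            self._slots.release()
```

A gateway would typically hold one such object per downstream service and size each according to how critical and how slow that service is.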
Example Bulkhead Implementation (Java):
// Resilience4j Bulkhead example
import io.github.resilience4j.bulkhead.Bulkhead;
import io.github.resilience4j.bulkhead.BulkheadConfig;
import io.vavr.control.Try;

import java.time.Duration;
import java.util.function.Supplier;

public class ApiGateway {
    private final UserService userService;
    private final OrderService orderService;
    private final InventoryService inventoryService;

    private final Bulkhead userServiceBulkhead;
    private final Bulkhead orderServiceBulkhead;
    private final Bulkhead inventoryServiceBulkhead;

    public ApiGateway(
            UserService userService,
            OrderService orderService,
            InventoryService inventoryService) {
        this.userService = userService;
        this.orderService = orderService;
        this.inventoryService = inventoryService;

        // Configure bulkheads with different capacities based on criticality
        BulkheadConfig userConfig = BulkheadConfig.custom()
            .maxConcurrentCalls(20)
            .maxWaitDuration(Duration.ofMillis(500))
            .build();

        BulkheadConfig orderConfig = BulkheadConfig.custom()
            .maxConcurrentCalls(30)
            .maxWaitDuration(Duration.ofMillis(1000))
            .build();

        BulkheadConfig inventoryConfig = BulkheadConfig.custom()
            .maxConcurrentCalls(10)
            .maxWaitDuration(Duration.ofMillis(200))
            .build();

        this.userServiceBulkhead = Bulkhead.of("userService", userConfig);
        this.orderServiceBulkhead = Bulkhead.of("orderService", orderConfig);
        this.inventoryServiceBulkhead = Bulkhead.of("inventoryService", inventoryConfig);
    }

    public UserProfile getUserProfile(String userId) {
        Supplier<UserProfile> decoratedSupplier = Bulkhead
            .decorateSupplier(userServiceBulkhead, () -> userService.getProfile(userId));

        return Try.ofSupplier(decoratedSupplier)
            .recover(e -> new UserProfile(userId, "Unknown", "Guest"))
            .get();
    }

    public OrderDetails getOrderDetails(String orderId) {
        Supplier<OrderDetails> decoratedSupplier = Bulkhead
            .decorateSupplier(orderServiceBulkhead, () -> orderService.getDetails(orderId));

        return Try.ofSupplier(decoratedSupplier)
            .recover(e -> new OrderDetails(orderId, OrderStatus.UNKNOWN))
            .get();
    }
}
Retry and Backoff Strategies
Handling transient failures:
Retry Pattern:
- Automatically retry failed operations
- Handle transient failures
- Improve success probability
- Implement retry limits
- Use appropriate backoff strategies
- Consider idempotency requirements
- Monitor retry metrics
Backoff Strategies:
- Constant backoff
- Linear backoff
- Exponential backoff
- Exponential backoff with jitter
- Decorrelated jitter
- Random backoff
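The strategies listed above differ only in how the next delay is computed. The functions below are a sketch under assumed conventions: the function names are hypothetical, `attempt` is 1-based, delays are in seconds, and `rng` is injectable so results can be made repeatable. The decorrelated variant follows the commonly cited form, where the next delay is drawn between the base and three times the previous delay.

```python
import random


def constant_backoff(attempt, base=0.1):
    # Same delay before every retry.
    return base


def linear_backoff(attempt, base=0.1):
    # Delay grows by a fixed step per attempt.
    return base * attempt


def exponential_backoff(attempt, base=0.1, cap=30.0):
    # Delay doubles per attempt, up to a cap.
    return min(cap, base * 2 ** (attempt - 1))


def exponential_backoff_full_jitter(attempt, base=0.1, cap=30.0, rng=random):
    # Pick uniformly between zero and the capped exponential delay.
    return rng.uniform(0, exponential_backoff(attempt, base, cap))


def decorrelated_jitter(previous_delay, base=0.1, cap=30.0, rng=random):
    # Next delay depends on the previous delay rather than the attempt number.
    return min(cap, rng.uniform(base, previous_delay * 3))


def random_backoff(low=0.1, high=1.0, rng=random):
    # Delay drawn from a fixed range, independent of the attempt number.
    return rng.uniform(low, high)
```

Jittered variants spread retries from many clients over time, which matters most when those clients all failed at the same moment.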
Example Retry with Exponential Backoff (Python):
# Python retry with exponential backoff
import random
import time
from functools import wraps

import requests


def retry_with_exponential_backoff(
    max_retries=5,
    base_delay_ms=100,
    max_delay_ms=30000,
    jitter=True,
    retry_on=(ConnectionError, TimeoutError),
):
    """Retry decorator with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            retries = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except retry_on as e:
                    retries += 1
                    if retries > max_retries:
                        raise Exception(f"Failed after {max_retries} retries") from e
                    # Calculate delay with exponential backoff, capped at max_delay_ms
                    delay_ms = min(base_delay_ms * (2 ** (retries - 1)), max_delay_ms)
                    # Add jitter to prevent thundering herd; drawing from
                    # [0, delay_ms] keeps the delay within the cap
                    if jitter:
                        delay_ms = random.uniform(0, delay_ms)
                    print(f"Retry {retries}/{max_retries} after {delay_ms:.2f}ms")
                    time.sleep(delay_ms / 1000)
        return wrapper
    return decorator


# requests raises its own exception types, which do not subclass the built-in
# ConnectionError and TimeoutError, so they are named explicitly here.
@retry_with_exponential_backoff(
    retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)
)
def fetch_data_from_api(url):
    """Fetch data from an API with retry capability."""
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    return response.json()
Retry Considerations:
- Idempotency of operations
- Retry budget and limits
- Timeout configurations
- Failure categorization
- Retry storm prevention
- Circuit breaker integration
- Monitoring and alerting
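One way to approach "retry budget and limits" and "retry storm prevention" is to cap retries as a fraction of normal traffic. The sketch below is an assumption-laden illustration, not a standard API: the class name `RetryBudget` and its parameters are hypothetical. Every first attempt earns a fraction of a retry, every retry spends a whole one, and credit is tracked in integer hundredths to avoid floating-point drift.

```python
class RetryBudget:
    """Hypothetical budget: retries may not exceed retry_percent of requests."""

    def __init__(self, retry_percent=10, max_banked_retries=10):
        self.retry_percent = retry_percent
        self.max_credit = max_banked_retries * 100
        self.credit = 0  # integer hundredths of a retry

    def record_request(self):
        # Each first attempt earns retry_percent hundredths of a retry.
        self.credit = min(self.max_credit, self.credit + self.retry_percent)

    def try_spend_retry(self):
        # A retry is allowed only if a whole retry's worth of credit is banked.
        if self.credit >= 100:
            self.credit -= 100
            return True
        return False
```

A client would call `record_request()` once per original request and check `try_spend_retry()` before each retry, failing fast when it returns `False`. During an outage the budget drains quickly, so retries cannot multiply the load on a dependency that is already struggling.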