Saga Pattern in Distributed Transactions - With Examples in Go

Transactions in Microservices with Saga pattern


In microservices architectures, maintaining data consistency across services is one of the most challenging problems. Traditional ACID transactions don’t work when operations span multiple services with independent databases, leaving developers searching for alternative approaches to ensure data integrity.

The Saga pattern provides an elegant solution by breaking a distributed transaction into a series of local transactions, each paired with a compensating action.

Instead of relying on distributed locks that can block operations across services, a saga achieves eventual consistency through a sequence of reversible steps, which makes it well suited to long-running business processes.

This guide demonstrates Saga pattern implementation in Go with practical examples covering both orchestration and choreography approaches. If you need a quick reference for Go fundamentals, the Go Cheat Sheet provides a helpful overview.


Understanding the Saga Pattern

The Saga pattern was originally described by Hector Garcia-Molina and Kenneth Salem in 1987. In the context of microservices, it’s a sequence of local transactions where each transaction updates data within a single service. If any step fails, compensating transactions are executed to undo the effects of preceding steps.

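Unlike traditional distributed transactions that use two-phase commit (2PC), Saga doesn’t hold locks across services, making it suitable for long-running business processes. The trade-off is eventual consistency rather than strong consistency, plus a lack of isolation: other transactions can observe intermediate states, such as an order that exists before its payment has cleared.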

Key Characteristics

  • No Distributed Locks: Each service manages its own local transaction
  • Compensating Actions: Every operation has a corresponding rollback mechanism
  • Eventual Consistency: The system eventually reaches a consistent state
  • Long-Running: Suitable for processes that take seconds, minutes, or even hours

Saga Implementation Approaches

There are two primary approaches to implementing the Saga pattern: orchestration and choreography.

Orchestration Pattern

In orchestration, a central coordinator (orchestrator) manages the entire transaction flow. The orchestrator is responsible for:

  • Invoking services in the correct order
  • Handling failures and triggering compensations
  • Maintaining the state of the saga
  • Coordinating retries and timeouts

Advantages:

  • Centralized control and visibility
  • Easier to understand and debug
  • Better error handling and recovery
  • Simpler testing of the overall flow

Disadvantages:

  • Single point of failure (mitigated by persisting saga state and running redundant orchestrator instances)
  • Additional service to maintain
  • Can become a bottleneck for complex flows

Example in Go:

type OrderSagaOrchestrator struct {
    orderService    OrderService
    paymentService  PaymentService
    inventoryService InventoryService
    shippingService ShippingService
}

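func (o *OrderSagaOrchestrator) CreateOrder(order Order) error {
    // Compensations run in reverse order of the completed steps. Their errors are
    // ignored here for brevity; production code must log and retry them, because a
    // failed compensation leaves the saga in an inconsistent state.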
    
    // Step 1: Create order
    orderID, err := o.orderService.Create(order)
    if err != nil {
        return err
    }
    
    // Step 2: Reserve inventory
    if err := o.inventoryService.Reserve(order.Items); err != nil {
        o.orderService.Cancel(orderID) // Compensate
        return err
    }
    
    // Step 3: Process payment
    paymentID, err := o.paymentService.Charge(order.CustomerID, order.Total)
    if err != nil {
        o.inventoryService.Release(order.Items) // Compensate
        o.orderService.Cancel(orderID)          // Compensate
        return err
    }
    
    // Step 4: Create shipment
    if err := o.shippingService.CreateShipment(orderID); err != nil {
        o.paymentService.Refund(paymentID)      // Compensate
        o.inventoryService.Release(order.Items) // Compensate
        o.orderService.Cancel(orderID)          // Compensate
        return err
    }
    
    return nil
}

Choreography Pattern

In choreography, there’s no central coordinator. Each service knows what to do and communicates through events. Services listen for events and react accordingly. This event-driven approach is particularly powerful when combined with message streaming platforms like AWS Kinesis, which provide scalable infrastructure for event distribution across microservices. For a comprehensive guide on implementing event-driven microservices with Kinesis, see Building Event-Driven Microservices with AWS Kinesis.

Advantages:

  • Decentralized and scalable
  • No single point of failure
  • Services remain loosely coupled
  • Natural fit for event-driven architectures

Disadvantages:

  • Harder to understand the overall flow
  • Difficult to debug and trace
  • Complex error handling
  • Risk of cyclic dependencies

Example with Event-Driven Architecture:

// Order Service
type OrderService struct {
    eventBus EventBus
    repo     OrderRepository
}

func (s *OrderService) CreateOrder(order Order) (string, error) {
    orderID, err := s.repo.Save(order)
    if err != nil {
        return "", err
    }
    
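    // If Publish fails after Save succeeds, no other service hears about the order
    // and the saga stalls; the Outbox pattern (see Patterns to Follow) closes this gap.
    s.eventBus.Publish("OrderCreated", OrderCreatedEvent{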
        OrderID:    orderID,
        CustomerID: order.CustomerID,
        Items:      order.Items,
        Total:      order.Total,
    })
    
    return orderID, nil
}

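func (s *OrderService) HandlePaymentFailed(event PaymentFailedEvent) error {
    return s.repo.Cancel(event.OrderID) // Compensation
}

// The order must also be cancelled when a step after payment fails
func (s *OrderService) HandleInventoryReservationFailed(event InventoryReservationFailedEvent) error {
    return s.repo.Cancel(event.OrderID) // Compensation
}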

// Payment Service
type PaymentService struct {
    eventBus EventBus
    client   PaymentClient
}

func (s *PaymentService) HandleOrderCreated(event OrderCreatedEvent) {
    paymentID, err := s.client.Charge(event.CustomerID, event.Total)
    if err != nil {
        s.eventBus.Publish("PaymentFailed", PaymentFailedEvent{
            OrderID: event.OrderID,
        })
        return
    }
    
    s.eventBus.Publish("PaymentSucceeded", PaymentSucceededEvent{
        OrderID:   event.OrderID,
        PaymentID: paymentID,
    })
}

func (s *PaymentService) HandleInventoryReservationFailed(event InventoryReservationFailedEvent) error {
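    // Compensation: refund payment. The inventory service must copy PaymentID
    // from PaymentSucceededEvent into this event for the refund to be possible.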
    return s.client.Refund(event.PaymentID)
}
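To see the whole choreography run end to end, including both compensations, here is a self-contained toy version. Assumptions: the `Bus` type, the `wire` function, and the `InventoryReserved` event are illustrative; events carry only an order ID, delivery is synchronous and in-process, and two maps stand in for each service's own database. As in the handlers above, payment is charged before inventory is reserved.

```go
package main

import "fmt"

// Bus is a toy, synchronous, in-process event bus. It stands in for a real
// broker (Kinesis, Kafka, ...) only to show the choreography flow.
type Bus struct {
	handlers map[string][]func(orderID string)
}

func NewBus() *Bus { return &Bus{handlers: map[string][]func(string){}} }

func (b *Bus) Subscribe(topic string, h func(orderID string)) {
	b.handlers[topic] = append(b.handlers[topic], h)
}

func (b *Bus) Publish(topic, orderID string) {
	for _, h := range b.handlers[topic] {
		h(orderID)
	}
}

// wire registers each service's reactions. No component knows the whole flow;
// orders and payments stand in for each service's own database.
func wire(bus *Bus, chargeOK, reserveOK bool, orders, payments map[string]string) {
	// Payment service: reacts to new orders.
	bus.Subscribe("OrderCreated", func(id string) {
		if !chargeOK {
			bus.Publish("PaymentFailed", id)
			return
		}
		payments[id] = "charged"
		bus.Publish("PaymentSucceeded", id)
	})
	// Inventory service: reacts to successful payments.
	bus.Subscribe("PaymentSucceeded", func(id string) {
		if !reserveOK {
			bus.Publish("InventoryReservationFailed", id)
			return
		}
		bus.Publish("InventoryReserved", id)
	})
	// Order service: records the outcome and compensates on failure.
	bus.Subscribe("InventoryReserved", func(id string) { orders[id] = "confirmed" })
	bus.Subscribe("InventoryReservationFailed", func(id string) { orders[id] = "cancelled" })
	bus.Subscribe("PaymentFailed", func(id string) { orders[id] = "cancelled" })
	// Payment service: compensation.
	bus.Subscribe("InventoryReservationFailed", func(id string) { payments[id] = "refunded" })
}

func main() {
	orders, payments := map[string]string{}, map[string]string{}
	bus := NewBus()
	wire(bus, true, false, orders, payments) // payment succeeds, inventory fails
	orders["o-1"] = "pending"                // order service saves the order...
	bus.Publish("OrderCreated", "o-1")       // ...and announces it
	fmt.Println(orders["o-1"], payments["o-1"]) // cancelled refunded
}
```

Note how the failure path depends on two services independently subscribing to `InventoryReservationFailed`; forgetting either subscription is exactly the kind of gap that makes choreography harder to trace.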

Compensation Strategies

Compensation is the heart of the Saga pattern. Each operation must have a corresponding compensation that reverses its effects semantically rather than physically: a refund is a new transaction, not a deleted charge.

Types of Compensation

  1. Reversible Operations: Operations that can be directly undone

    • Example: Releasing reserved inventory, refunding payments
  2. Compensating Actions: Different operations that achieve the reverse effect

    • Example: Canceling an order instead of deleting it
  3. Pessimistic Compensation: Pre-allocate resources that can be released

    • Example: Reserve inventory before charging payment
  4. Optimistic Compensation: Execute operations and compensate if needed

    • Example: Charge payment first, refund if inventory unavailable

Idempotency Requirements

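All operations and compensations must be idempotent. Message brokers commonly deliver at least once and orchestrators retry after timeouts, so the same request can arrive more than once; idempotency ensures that a retried operation doesn’t cause duplicate effects.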

func (s *PaymentService) Refund(paymentID string) error {
    // Check if already refunded
    payment, err := s.getPayment(paymentID)
    if err != nil {
        return err
    }
    
    if payment.Status == "refunded" {
        return nil // Already refunded, idempotent
    }
    
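    // Process refund. The status check alone is not safe under concurrency:
    // two calls can both see "not refunded". processRefund should use a
    // conditional update (only where status is not already "refunded") or
    // pass paymentID as an idempotency key to the payment provider.
    return s.processRefund(paymentID)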
}

Best Practices

1. Saga State Management

Maintain the state of each saga instance to track progress and enable recovery: if the orchestrator crashes mid-saga, the persisted state tells a restarted instance which steps completed and which compensations are still owed. When persisting saga state to a database, the choice of ORM or query library affects performance and maintainability. For PostgreSQL-based implementations, see Comparing Go ORMs for PostgreSQL: GORM vs Ent vs Bun vs sqlc to select the best fit for your saga state storage:

type SagaState struct {
    ID           string
    Status       SagaStatus
    Steps        []SagaStep
    CurrentStep  int
    CreatedAt    time.Time
    UpdatedAt    time.Time
}

type SagaStep struct {
    Service     string
    Operation   string
    Status      StepStatus
    Compensated bool
    Data        map[string]interface{}
}
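The struct above leaves `SagaStatus` undefined. One way to define it, assuming a small set of statuses chosen for illustration, is as a state machine that rejects illegal transitions (the "Saga State Machine" approach listed under Patterns to Follow). The status names and the transition table are assumptions to adapt to your own saga.

```go
package main

import "fmt"

// SagaStatus values and allowed transitions are an illustrative assumption.
type SagaStatus string

const (
	StatusStarted      SagaStatus = "STARTED"
	StatusCompleted    SagaStatus = "COMPLETED"
	StatusCompensating SagaStatus = "COMPENSATING"
	StatusCompensated  SagaStatus = "COMPENSATED"
	StatusFailed       SagaStatus = "FAILED" // compensation itself failed; needs manual attention
)

// Terminal states (COMPLETED, COMPENSATED, FAILED) have no outgoing transitions.
var allowedTransitions = map[SagaStatus][]SagaStatus{
	StatusStarted:      {StatusCompleted, StatusCompensating},
	StatusCompensating: {StatusCompensated, StatusFailed},
}

// Transition returns an error for any move the state machine does not allow,
// e.g. completing a saga that is already compensating.
func Transition(from, to SagaStatus) error {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("invalid saga transition %s -> %s", from, to)
}

func main() {
	fmt.Println(Transition(StatusStarted, StatusCompensating))  // <nil>
	fmt.Println(Transition(StatusCompensating, StatusCompleted)) // invalid saga transition COMPENSATING -> COMPLETED
}
```

Calling `Transition` before every status update turns bugs such as a late success message arriving after compensation has started into explicit errors instead of silently corrupted state.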

2. Timeout Handling

Implement timeouts for each step to prevent sagas from hanging indefinitely:

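import (
    "context"
    "fmt"
    "time"
)

// Step is the executable side of a saga step; SagaStep above only records its state.
type Step interface {
    Name() string
    Execute(ctx context.Context) error
    Compensate(ctx context.Context) error
}

type SagaOrchestrator struct {
    timeout time.Duration
}

func (o *SagaOrchestrator) ExecuteWithTimeout(step Step) error {
    ctx, cancel := context.WithTimeout(context.Background(), o.timeout)
    defer cancel()
    
    // Buffered so the goroutine can always send and exit, even after a timeout
    done := make(chan error, 1)
    go func() {
        // Passing ctx lets a well-behaved step abort its own I/O on timeout
        done <- step.Execute(ctx)
    }()
    
    select {
    case err := <-done:
        return err
    case <-ctx.Done():
        // Timeout: the step may still succeed later, so Compensate must be
        // idempotent and safe to run before the forward action lands.
        // ctx has already expired, so compensation gets a fresh context.
        if err := step.Compensate(context.Background()); err != nil {
            return fmt.Errorf("step %s timed out and compensation failed: %w", step.Name(), err)
        }
        return fmt.Errorf("step %s timed out after %v", step.Name(), o.timeout)
    }
}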

3. Retry Logic

Implement exponential backoff for transient failures:

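func retryWithBackoff(operation func() error, maxAttempts int) error {
    backoff := time.Second
    var err error
    for attempt := 1; attempt <= maxAttempts; attempt++ {
        err = operation()
        if err == nil {
            return nil
        }
        
        // Permanent errors (e.g. a declined card) should trigger compensation, not retries
        if !isTransientError(err) {
            return err
        }
        
        // Don't sleep after the final attempt
        if attempt < maxAttempts {
            time.Sleep(backoff)
            backoff *= 2
        }
    }
    // Wrap the last error so callers can still inspect it with errors.Is/As
    return fmt.Errorf("operation failed after %d attempts: %w", maxAttempts, err)
}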

4. Event Sourcing for Saga State

Use event sourcing to maintain a complete audit trail. When implementing event stores and replay mechanisms, Go generics can help create type-safe, reusable event handling code. For advanced patterns using generics in Go, see Go Generics: Use Cases and Patterns.

type SagaEvent struct {
    SagaID    string
    EventType string
    Payload   []byte
    Timestamp time.Time
    Version   int64
}

type SagaEventStore struct {
    store EventRepository
}

func (s *SagaEventStore) AppendEvent(sagaID string, eventType string, payload interface{}) error {
    data, err := json.Marshal(payload)
    if err != nil {
        return fmt.Errorf("failed to marshal payload: %w", err)
    }
    
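    // GetNextVersion followed by Save is racy: two writers can read the same
    // version. Enforce a unique (SagaID, Version) constraint in the store so
    // the second Save fails and that writer can reload and retry.
    version, err := s.store.GetNextVersion(sagaID)
    if err != nil {
        return fmt.Errorf("failed to get version: %w", err)
    }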
    
    event := SagaEvent{
        SagaID:    sagaID,
        EventType: eventType,
        Payload:   data,
        Timestamp: time.Now(),
        Version:   version,
    }
    
    return s.store.Save(event)
}

func (s *SagaEventStore) ReplaySaga(sagaID string) (*Saga, error) {
    events, err := s.store.GetEvents(sagaID)
    if err != nil {
        return nil, fmt.Errorf("failed to get events: %w", err)
    }
    
    saga := NewSaga()
    for _, event := range events {
        if err := saga.Apply(event); err != nil {
            return nil, fmt.Errorf("failed to apply event: %w", err)
        }
    }
    
    return saga, nil
}
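As a sketch of how generics can make replay type-safe, the helper below folds an ordered event slice of any type into a state of any type, so each saga gets compile-time-checked event handling without `interface{}` payloads. `Replay`, `OrderEvent`, `OrderSagaState`, and the event names are hypothetical, and the version check assumes events are numbered contiguously starting at 1. Requires Go 1.18+.

```go
package main

import "fmt"

// Replay folds events into state in order. Generic over state S and event E.
// Illustrative helper, not a library API.
func Replay[S any, E any](initial S, events []E, apply func(S, E) (S, error)) (S, error) {
	state := initial
	for i, ev := range events {
		next, err := apply(state, ev)
		if err != nil {
			return state, fmt.Errorf("event %d: %w", i, err)
		}
		state = next
	}
	return state, nil
}

// Hypothetical event and state types for an order saga.
type OrderEvent struct {
	Type    string
	Version int64
}

type OrderSagaState struct {
	Status  string
	Version int64
}

func applyOrderEvent(s OrderSagaState, e OrderEvent) (OrderSagaState, error) {
	// Versions must be contiguous; a gap means the stream is incomplete.
	if e.Version != s.Version+1 {
		return s, fmt.Errorf("expected version %d, got %d", s.Version+1, e.Version)
	}
	switch e.Type {
	case "SagaStarted":
		s.Status = "started"
	case "PaymentFailed":
		s.Status = "compensating"
	case "OrderCancelled":
		s.Status = "compensated"
	default:
		return s, fmt.Errorf("unknown event type %q", e.Type)
	}
	s.Version = e.Version
	return s, nil
}

func main() {
	events := []OrderEvent{{"SagaStarted", 1}, {"PaymentFailed", 2}, {"OrderCancelled", 3}}
	state, err := Replay(OrderSagaState{}, events, applyOrderEvent)
	fmt.Println(state.Status, state.Version, err) // compensated 3 <nil>
}
```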

5. Monitoring and Observability

Implement comprehensive logging and tracing:

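func (o *OrderSagaOrchestrator) CreateOrder(order Order) error {
    // Assumes tracer is an OpenTracing-style tracer, logger a logrus logger
    // (imported as log), and generateSagaID an application helper.
    sagaID := generateSagaID()
    span := tracer.StartSpan("saga.create_order")
    defer span.Finish()
    
    span.SetTag("saga.id", sagaID)
    span.SetTag("order.id", order.ID)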
    
    logger.WithFields(log.Fields{
        "saga_id": sagaID,
        "order_id": order.ID,
        "step": "create_order",
    }).Info("Saga started")
    
    // ... saga execution
    
    return nil
}

Common Patterns and Anti-Patterns

Patterns to Follow

  • Saga Coordinator Pattern: Use a dedicated service for orchestration
  • Outbox Pattern: Ensure reliable event publishing
  • Idempotency Keys: Use unique keys for all operations
  • Saga State Machine: Model saga as a state machine

Anti-Patterns to Avoid

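  • Blocking on Compensation: Don’t make the caller’s request wait for compensations to finish; record the failure and run (and retry) compensations asynchronously
  • Uncoordinated Nested Sagas: Avoid sagas that implicitly start other sagas; if you need composition, model the inner saga as an explicit sub-saga whose outcome the parent tracks and can compensate
  • Shared State: Don’t let saga steps communicate through shared mutable state such as another service’s tables; pass data explicitly in commands, events, or the saga’s own state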
  • Long-Running Steps: Break down steps that take too long

Tools and Frameworks

Several frameworks can help implement Saga patterns:

  • Temporal: Workflow orchestration platform with built-in Saga support
  • Zeebe: Workflow engine for microservices orchestration
  • Eventuate Tram Sagas: Saga framework for Java services (Spring Boot, Micronaut)
  • AWS Step Functions: Serverless workflow orchestration
  • Apache Camel: Integration framework with Saga support

For orchestrator services that need CLI interfaces for management and monitoring, Building CLI Applications in Go with Cobra & Viper provides excellent patterns for creating command-line tools to interact with saga orchestrators.

When deploying saga-based microservices in Kubernetes, implementing a service mesh can significantly improve observability, security, and traffic management. Implementing Service Mesh with Istio and Linkerd covers how service meshes complement distributed transaction patterns by providing cross-cutting concerns like distributed tracing and circuit breaking.

When to Use Saga Pattern

Use the Saga pattern when:

  • ✅ Operations span multiple microservices
  • ✅ Long-running business processes
  • ✅ Eventual consistency is acceptable
  • ✅ You need to avoid distributed locks
  • ✅ Services have independent databases

Avoid when:

  • ❌ Strong consistency is required
  • ❌ Operations are simple and fast
  • ❌ All services share the same database
  • ❌ Compensation logic is too complex

Conclusion

The Saga pattern is essential for managing distributed transactions in microservices architectures. While it introduces complexity, it provides a practical solution for maintaining data consistency across service boundaries. Choose orchestration for better control and visibility, or choreography for scalability and loose coupling. Always ensure operations are idempotent, implement proper compensation logic, and maintain comprehensive observability.

The key to successful Saga implementation is understanding your consistency requirements, carefully designing compensation logic, and choosing the right approach for your use case. With proper implementation, Saga enables you to build resilient, scalable microservices that maintain data integrity across distributed systems.