What is the Saga pattern in microservices?

The Saga pattern is a design pattern for managing distributed transactions across multiple microservices. Instead of using traditional ACID transactions (which don’t work well across service boundaries), Saga breaks a transaction into a series of local transactions, each with a compensating action that can undo the operation if needed.

What’s the difference between orchestration and choreography in Saga?

Orchestration uses a central coordinator (orchestrator) that manages the entire transaction flow and decides which services to call and in what order. Choreography is decentralized, where each service knows what to do next and communicates through events, making it more scalable but harder to debug.

When should I use the Saga pattern?

Use Saga when you need to maintain data consistency across multiple microservices that don’t share a database. It’s ideal for long-running business processes, order processing, payment workflows, and any scenario where you need eventual consistency rather than strong consistency.

What are the main challenges with Saga pattern?

Key challenges include handling partial failures, ensuring idempotency of operations, managing compensation logic complexity, dealing with concurrent sagas, and maintaining visibility into distributed transaction state. Testing can also be complex due to the distributed nature.

How do I implement compensation in Saga?

Each step in a Saga should have a corresponding compensation action that can reverse its effects. Compensations must be idempotent and should handle cases where the original operation partially completed. Store compensation logic alongside business logic for maintainability.

Can Saga guarantee ACID properties?

No. A saga gives up isolation: each local transaction commits on its own, so other requests can observe intermediate state before the saga finishes. What it provides instead is eventual consistency: either every step completes, or compensating actions run to undo the steps that already committed. This is often acceptable in microservices, where enforcing strong consistency would create tight coupling between services.

Saga Pattern in Distributed Transactions - With Examples in Go

Transactions in Microservices with Saga pattern


In microservices architectures, maintaining data consistency across services is one of the hardest problems. Traditional ACID transactions don’t work when operations span multiple services with independent databases, so developers need another way to ensure data integrity.

The Saga pattern addresses this by breaking a distributed transaction into a series of local transactions, each paired with a compensating action.

Instead of relying on distributed locks that can block operations across services, Saga achieves eventual consistency through a sequence of reversible steps, which makes it well suited to long-running business processes.

This guide demonstrates Saga pattern implementation in Go with practical examples covering both orchestration and choreography approaches. If you need a quick reference for Go fundamentals, the Go Cheat Sheet provides a helpful overview.


Understanding the Saga Pattern

The Saga pattern was originally described by Hector Garcia-Molina and Kenneth Salem in 1987. In the context of microservices, it’s a sequence of local transactions where each transaction updates data within a single service. If any step fails, compensating transactions are executed to undo the effects of preceding steps.

Unlike traditional distributed transactions that use two-phase commit (2PC), Saga doesn’t hold locks across services, making it suitable for long-running business processes. The trade-off is eventual consistency rather than strong consistency.

Key Characteristics

No Distributed Locks: Each service manages its own local transaction
Compensating Actions: Every operation has a corresponding rollback mechanism
Eventual Consistency: The system eventually reaches a consistent state
Long-Running: Suitable for processes that take seconds, minutes, or even hours

Saga Implementation Approaches

There are two primary approaches to implementing the Saga pattern: orchestration and choreography.

Orchestration Pattern

In orchestration, a central coordinator (orchestrator) manages the entire transaction flow. The orchestrator is responsible for:

Invoking services in the correct order
Handling failures and triggering compensations
Maintaining the state of the saga
Coordinating retries and timeouts

Advantages:

Centralized control and visibility
Easier to understand and debug
Better error handling and recovery
Simpler testing of the overall flow

Disadvantages:

Single point of failure (though this can be mitigated)
Additional service to maintain
Can become a bottleneck for complex flows

Example in Go:

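import (
    "errors"
    "fmt"
)

type OrderSagaOrchestrator struct {
    orderService     OrderService
    paymentService   PaymentService
    inventoryService InventoryService
    shippingService  ShippingService
}

// CreateOrder runs the saga steps in order. If a step fails, it compensates
// the completed steps in reverse order and returns the step error joined
// with any compensation errors. This assumes Cancel, Release, and Refund
// each return an error.
func (o *OrderSagaOrchestrator) CreateOrder(order Order) error {
    sagaID := generateSagaID()

    // fail tags the step error with the saga ID and joins any non-nil
    // compensation errors (errors.Join, Go 1.20+, drops nil values).
    fail := func(step string, err error, compensations ...error) error {
        stepErr := fmt.Errorf("saga %s: %s: %w", sagaID, step, err)
        return errors.Join(append([]error{stepErr}, compensations...)...)
    }

    // Step 1: Create order
    orderID, err := o.orderService.Create(order)
    if err != nil {
        return fail("create order", err)
    }

    // Step 2: Reserve inventory
    if err := o.inventoryService.Reserve(order.Items); err != nil {
        return fail("reserve inventory", err,
            o.orderService.Cancel(orderID), // Compensate
        )
    }

    // Step 3: Process payment
    paymentID, err := o.paymentService.Charge(order.CustomerID, order.Total)
    if err != nil {
        return fail("charge payment", err,
            o.inventoryService.Release(order.Items), // Compensate
            o.orderService.Cancel(orderID),          // Compensate
        )
    }

    // Step 4: Create shipment
    if err := o.shippingService.CreateShipment(orderID); err != nil {
        return fail("create shipment", err,
            o.paymentService.Refund(paymentID),      // Compensate
            o.inventoryService.Release(order.Items), // Compensate
            o.orderService.Cancel(orderID),          // Compensate
        )
    }

    return nil
}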

Choreography Pattern

In choreography, there’s no central coordinator. Each service knows what to do and communicates through events. Services listen for events and react accordingly. This event-driven approach is particularly powerful when combined with message streaming platforms like AWS Kinesis, which provide scalable infrastructure for event distribution across microservices. For a comprehensive guide on implementing event-driven microservices with Kinesis, see Building Event-Driven Microservices with AWS Kinesis.

Advantages:

Decentralized and scalable
No single point of failure
Services remain loosely coupled
Natural fit for event-driven architectures

Disadvantages:

Harder to understand the overall flow
Difficult to debug and trace
Complex error handling
Risk of cyclic dependencies

Example with Event-Driven Architecture:

// Order Service
type OrderService struct {
    eventBus EventBus
    repo     OrderRepository
}

func (s *OrderService) CreateOrder(order Order) (string, error) {
    orderID, err := s.repo.Save(order)
    if err != nil {
        return "", err
    }
    
    s.eventBus.Publish("OrderCreated", OrderCreatedEvent{
        OrderID:    orderID,
        CustomerID: order.CustomerID,
        Items:      order.Items,
        Total:      order.Total,
    })
    
    return orderID, nil
}

func (s *OrderService) HandlePaymentFailed(event PaymentFailedEvent) error {
    return s.repo.Cancel(event.OrderID) // Compensation
}

// Payment Service
type PaymentService struct {
    eventBus EventBus
    client   PaymentClient
}

func (s *PaymentService) HandleOrderCreated(event OrderCreatedEvent) {
    paymentID, err := s.client.Charge(event.CustomerID, event.Total)
    if err != nil {
        s.eventBus.Publish("PaymentFailed", PaymentFailedEvent{
            OrderID: event.OrderID,
        })
        return
    }
    
    s.eventBus.Publish("PaymentSucceeded", PaymentSucceededEvent{
        OrderID:   event.OrderID,
        PaymentID: paymentID,
    })
}

func (s *PaymentService) HandleInventoryReservationFailed(event InventoryReservationFailedEvent) error {
    // Compensation: refund payment
    return s.client.Refund(event.PaymentID)
}
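
The services above depend on an EventBus that the example leaves undefined. For local experiments, a synchronous in-memory stand-in is enough to watch events flow between handlers. This is a minimal sketch under that assumption: InMemoryBus, Handler, and Subscribe are hypothetical names, and a real deployment would use a durable broker with at-least-once delivery.

```go
package main

import "sync"

// Handler receives the payload published on a topic.
type Handler func(event any)

// InMemoryBus is a hypothetical, synchronous stand-in for the EventBus
// used by the services above. It offers no durability and no retries.
type InMemoryBus struct {
    mu       sync.RWMutex
    handlers map[string][]Handler
}

func NewInMemoryBus() *InMemoryBus {
    return &InMemoryBus{handlers: make(map[string][]Handler)}
}

// Subscribe registers a handler for a topic.
func (b *InMemoryBus) Subscribe(topic string, h Handler) {
    b.mu.Lock()
    defer b.mu.Unlock()
    b.handlers[topic] = append(b.handlers[topic], h)
}

// Publish calls every handler subscribed to the topic, in subscription
// order, on the caller's goroutine. The handler list is copied and the
// lock released first, so a handler may publish follow-up events.
func (b *InMemoryBus) Publish(topic string, event any) {
    b.mu.RLock()
    hs := append([]Handler(nil), b.handlers[topic]...)
    b.mu.RUnlock()
    for _, h := range hs {
        h(event)
    }
}
```

Wiring a handler for "OrderCreated" that publishes "PaymentFailed", and a second handler for "PaymentFailed" that cancels the order, reproduces the compensation chain from the example in a single process.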

Compensation Strategies

Compensation is the heart of the Saga pattern. Each operation must have a corresponding compensation that can reverse its effects.

Types of Compensation

Reversible Operations: Operations that can be directly undone
- Example: Releasing reserved inventory, refunding payments
Compensating Actions: Different operations that achieve the reverse effect
- Example: Canceling an order instead of deleting it
Pessimistic Compensation: Pre-allocate resources that can be released
- Example: Reserve inventory before charging payment
Optimistic Compensation: Execute operations and compensate if needed
- Example: Charge payment first, refund if inventory unavailable
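
To make the "compensating action" case concrete, here is a minimal sketch of cancelling an order by changing its status instead of deleting it. The Order type, its status values, and the Reason field are assumptions made for illustration; they are not the types used elsewhere in this article.

```go
package main

import "fmt"

type OrderStatus string

const (
    OrderPending   OrderStatus = "pending"
    OrderConfirmed OrderStatus = "confirmed"
    OrderCancelled OrderStatus = "cancelled"
)

// Order is a simplified, hypothetical order record.
type Order struct {
    ID     string
    Status OrderStatus
    Reason string // why the order was cancelled, kept for auditing
}

// Cancel is a semantic compensation: it records that the order was
// cancelled rather than deleting it. Cancelling twice is a no-op, so
// the compensation can be retried safely.
func (o *Order) Cancel(reason string) error {
    switch o.Status {
    case OrderCancelled:
        return nil // already compensated
    case OrderPending, OrderConfirmed:
        o.Status = OrderCancelled
        o.Reason = reason
        return nil
    default:
        return fmt.Errorf("order %s: cannot cancel from status %q", o.ID, o.Status)
    }
}
```

Keeping the record preserves an audit trail, and other services that already saw the order can still resolve its ID.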

Idempotency Requirements

All operations and compensations must be idempotent. This ensures that retrying a failed operation doesn’t cause duplicate effects.

func (s *PaymentService) Refund(paymentID string) error {
    // Check if already refunded
    payment, err := s.getPayment(paymentID)
    if err != nil {
        return err
    }
    
    if payment.Status == "refunded" {
        return nil // Already refunded, idempotent
    }
    
    // Process refund
    return s.processRefund(paymentID)
}

Best Practices

1. Saga State Management

Maintain the state of each saga instance to track progress and enable recovery. When persisting saga state to a database, the choice of ORM affects both performance and maintainability; for PostgreSQL-based implementations, see Comparing Go ORMs for PostgreSQL: GORM vs Ent vs Bun vs sqlc. A saga state record might look like this:

type SagaState struct {
    ID           string
    Status       SagaStatus
    Steps        []SagaStep
    CurrentStep  int
    CreatedAt    time.Time
    UpdatedAt    time.Time
}

type SagaStep struct {
    Service     string
    Operation   string
    Status      StepStatus
    Compensated bool
    Data        map[string]interface{}
}

2. Timeout Handling

Implement timeouts for each step to prevent sagas from hanging indefinitely:

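import (
    "context"
    "fmt"
    "time"
)

type SagaOrchestrator struct {
    timeout time.Duration
}

// ExecuteWithTimeout assumes SagaStep provides Name, Execute, and Compensate
// methods, and that Execute stops its work when ctx is cancelled.
func (o *SagaOrchestrator) ExecuteWithTimeout(step SagaStep) error {
    ctx, cancel := context.WithTimeout(context.Background(), o.timeout)
    defer cancel()

    done := make(chan error, 1)
    go func() {
        done <- step.Execute(ctx)
    }()

    select {
    case err := <-done:
        return err
    case <-ctx.Done():
        // Timeout occurred. The step may or may not have taken effect, so
        // Compensate must be idempotent and safe to run in either case.
        if err := step.Compensate(); err != nil {
            return fmt.Errorf("step %s timed out; compensation failed: %w", step.Name(), err)
        }
        return fmt.Errorf("step %s timed out after %v", step.Name(), o.timeout)
    }
}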

3. Retry Logic

Implement exponential backoff for transient failures:

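func retryWithBackoff(operation func() error, maxRetries int) error {
    if maxRetries < 1 {
        maxRetries = 1 // always make at least one attempt
    }

    backoff := time.Second
    var err error
    for i := 0; i < maxRetries; i++ {
        if err = operation(); err == nil {
            return nil
        }

        if !isTransientError(err) {
            return err
        }

        // Wait before the next attempt, but not after the last one.
        if i < maxRetries-1 {
            time.Sleep(backoff)
            backoff *= 2
        }
    }
    // Wrap the last error so callers can still inspect the cause.
    return fmt.Errorf("operation failed after %d attempts: %w", maxRetries, err)
}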

4. Event Sourcing for Saga State

Use event sourcing to maintain a complete audit trail. When implementing event stores and replay mechanisms, Go generics can help create type-safe, reusable event handling code. For advanced patterns using generics in Go, see Go Generics: Use Cases and Patterns.

type SagaEvent struct {
    SagaID    string
    EventType string
    Payload   []byte
    Timestamp time.Time
    Version   int64
}

type SagaEventStore struct {
    store EventRepository
}

func (s *SagaEventStore) AppendEvent(sagaID string, eventType string, payload interface{}) error {
    data, err := json.Marshal(payload)
    if err != nil {
        return fmt.Errorf("failed to marshal payload: %w", err)
    }
    
    version, err := s.store.GetNextVersion(sagaID)
    if err != nil {
        return fmt.Errorf("failed to get version: %w", err)
    }
    
    event := SagaEvent{
        SagaID:    sagaID,
        EventType: eventType,
        Payload:   data,
        Timestamp: time.Now(),
        Version:   version,
    }
    
    return s.store.Save(event)
}

func (s *SagaEventStore) ReplaySaga(sagaID string) (*Saga, error) {
    events, err := s.store.GetEvents(sagaID)
    if err != nil {
        return nil, fmt.Errorf("failed to get events: %w", err)
    }
    
    saga := NewSaga()
    for _, event := range events {
        if err := saga.Apply(event); err != nil {
            return nil, fmt.Errorf("failed to apply event: %w", err)
        }
    }
    
    return saga, nil
}
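
ReplaySaga relies on NewSaga and Apply, which are not shown. One way they could look is sketched below. The event type names, the Saga fields, and the rule that versions start at 1 and increase by one are all assumptions made for this illustration, and SagaEvent is trimmed to the fields the sketch needs.

```go
package main

import "fmt"

// SagaEvent is a reduced version of the event record shown above.
type SagaEvent struct {
    SagaID    string
    EventType string
    Version   int64
}

// Saga is the state rebuilt from events. Field names are hypothetical.
type Saga struct {
    ID             string
    Status         string
    CompletedSteps int
    Version        int64
}

func NewSaga() *Saga { return &Saga{Status: "new"} }

// Apply folds one event into the state. It rejects events whose version
// is not exactly one higher than the last applied version, and leaves
// the state untouched when it returns an error.
func (s *Saga) Apply(e SagaEvent) error {
    if e.Version != s.Version+1 {
        return fmt.Errorf("saga %s: expected version %d, got %d",
            e.SagaID, s.Version+1, e.Version)
    }
    switch e.EventType {
    case "SagaStarted":
        s.ID, s.Status = e.SagaID, "running"
    case "StepCompleted":
        s.CompletedSteps++
    case "StepFailed":
        s.Status = "compensating"
    case "StepCompensated":
        s.CompletedSteps--
    case "SagaCompleted":
        s.Status = "completed"
    default:
        return fmt.Errorf("saga %s: unknown event type %q", e.SagaID, e.EventType)
    }
    s.Version = e.Version
    return nil
}
```

Because the state is derived only from the events, replaying the same event list always rebuilds the same Saga value, which is what makes recovery after a crash possible.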

5. Monitoring and Observability

Implement comprehensive logging and tracing:

func (o *OrderSagaOrchestrator) CreateOrder(order Order) error {
    // Generate the ID first so every span and log line can carry it.
    sagaID := generateSagaID()

    span := tracer.StartSpan("saga.create_order")
    defer span.Finish()

    span.SetTag("saga.id", sagaID)
    span.SetTag("order.id", order.ID)

    logger.WithFields(log.Fields{
        "saga_id":  sagaID,
        "order_id": order.ID,
        "step":     "create_order",
    }).Info("Saga started")

    // ... saga execution

    return nil
}

Common Patterns and Anti-Patterns

Patterns to Follow

Saga Coordinator Pattern: Use a dedicated service for orchestration
Outbox Pattern: Ensure reliable event publishing
Idempotency Keys: Use unique keys for all operations
Saga State Machine: Model saga as a state machine
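
As an illustration of the state machine pattern, the sketch below lists the legal transitions in a table and rejects everything else. The status names and the Transition function are assumptions for this example; the SagaStatus type used earlier in the article may define different values.

```go
package main

import "fmt"

type SagaStatus string

const (
    StatusPending      SagaStatus = "pending"
    StatusRunning      SagaStatus = "running"
    StatusCompleted    SagaStatus = "completed"
    StatusCompensating SagaStatus = "compensating"
    StatusCompensated  SagaStatus = "compensated"
    StatusFailed       SagaStatus = "failed" // compensation itself failed
)

// allowed lists the legal next states for each state.
// Terminal states (completed, compensated, failed) have no entry.
var allowed = map[SagaStatus][]SagaStatus{
    StatusPending:      {StatusRunning},
    StatusRunning:      {StatusCompleted, StatusCompensating},
    StatusCompensating: {StatusCompensated, StatusFailed},
}

// Transition returns the new state, or the unchanged state and an error
// if the move is not allowed.
func Transition(from, to SagaStatus) (SagaStatus, error) {
    for _, next := range allowed[from] {
        if next == to {
            return to, nil
        }
    }
    return from, fmt.Errorf("invalid saga transition %s -> %s", from, to)
}
```

Routing every status change through one function like this keeps illegal moves, such as restarting a completed saga, from slipping in through a direct field assignment.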

Anti-Patterns to Avoid

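Synchronous Compensation: Don’t block the caller while compensation runs; trigger it asynchronously and retry until it succeeds
Nested Sagas: Avoid sagas that start other independent sagas; model dependent work as sub-sagas that the parent tracks and can compensate
Shared State: Don’t share mutable state between saga steps; pass data explicitly through saga state or event payloads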
Long-Running Steps: Break down steps that take too long

Tools and Frameworks

Several frameworks can help implement Saga patterns:

Temporal: Workflow orchestration platform with built-in Saga support
Zeebe: Workflow engine for microservices orchestration
Eventuate Tram: Saga framework for Spring Boot
AWS Step Functions: Serverless workflow orchestration
Apache Camel: Integration framework with Saga support

For orchestrator services that need CLI interfaces for management and monitoring, Building CLI Applications in Go with Cobra & Viper provides excellent patterns for creating command-line tools to interact with saga orchestrators.

When deploying saga-based microservices in Kubernetes, implementing a service mesh can significantly improve observability, security, and traffic management. Implementing Service Mesh with Istio and Linkerd covers how service meshes complement distributed transaction patterns by providing cross-cutting concerns like distributed tracing and circuit breaking.

When to Use Saga Pattern

Use the Saga pattern when:

✅ Operations span multiple microservices
✅ Long-running business processes
✅ Eventual consistency is acceptable
✅ You need to avoid distributed locks
✅ Services have independent databases

Avoid when:

❌ Strong consistency is required
❌ Operations are simple and fast
❌ All services share the same database
❌ Compensation logic is too complex

Conclusion

The Saga pattern is essential for managing distributed transactions in microservices architectures. While it introduces complexity, it provides a practical solution for maintaining data consistency across service boundaries. Choose orchestration for better control and visibility, or choreography for scalability and loose coupling. Always ensure operations are idempotent, implement proper compensation logic, and maintain comprehensive observability.

The key to successful Saga implementation is understanding your consistency requirements, carefully designing compensation logic, and choosing the right approach for your use case. With proper implementation, Saga enables you to build resilient, scalable microservices that maintain data integrity across distributed systems.