Chapter 4: Managing Transactions with Sagas

How to maintain data consistency across services without distributed transactions — using the Saga pattern, choreography vs. orchestration, and countermeasures for isolation anomalies.


4.1 Transaction Management in a Microservice Architecture

In a monolith, you can use a single ACID database transaction to keep data consistent. In a microservice architecture, each service has its own database. When an operation spans multiple services, you need a different approach to maintain consistency.

The Need for Distributed Transactions

Many business operations touch data owned by more than one service. For example, creating an order in the FTGO application requires verifying the consumer, validating the restaurant menu, authorizing the credit card, and creating a kitchen ticket. Each of these actions belongs to a different service with its own database.

In a monolith, all this data lives in one database, so a single transaction guarantees consistency. In a microservice architecture, there is no single database. You cannot wrap all these steps in one ACID transaction. You need a mechanism that coordinates updates across multiple service databases.

Analogy: Imagine a group project where each team member keeps their own notebook. There is no shared notebook. If the project needs changes across all notebooks at once, the team needs a protocol to make sure everyone updates their notebook correctly — or rolls back if something goes wrong.

The Trouble with Distributed Transactions

The traditional solution for cross-database consistency is distributed transactions, typically using the X/Open XA standard and two-phase commit (2PC). However, this approach has serious problems in a microservice architecture:

Problem: Distributed transactions (2PC) require all participants to be available at the same time, which reduces overall system availability. Many modern technologies, including NoSQL databases and message brokers, do not support them. For these reasons, they are not a viable solution for microservices.

Conclusion: The failure of distributed transactions in microservices motivates the need for a different pattern — the Saga pattern.

Using the Saga Pattern to Maintain Data Consistency

A saga is a sequence of local transactions. Each local transaction updates a single service's database using standard ACID guarantees, and then publishes a message or event to trigger the next step. If any step fails, the saga executes compensating transactions in reverse order to undo the work done by previous steps.

Pattern: Saga

Problem: How to maintain data consistency across services without distributed transactions?
Solution: Define a saga, a sequence of local transactions coordinated via asynchronous messaging. On failure, compensating transactions undo completed steps in reverse order.
Properties: ACD (Atomicity, Consistency, Durability) but NO Isolation.
Requires: Transactional Messaging (Ch. 3) to atomically update the database and send messages.
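
The core of the pattern can be illustrated with a minimal in-memory sketch. The names SagaStep and SimpleSagaRunner are hypothetical, and the sketch deliberately leaves out messaging and persistence; it only shows steps running in order and, when a step fails, the compensations of the completed steps running in reverse.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration types; not part of any framework.
final class SagaStep {
    final String name;
    final Runnable action;        // stands in for the local transaction
    final Runnable compensation;  // undoes the action; may be a no-op

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

final class SimpleSagaRunner {
    /** Runs the steps in order. Returns true if all succeed, false if the saga was rolled back. */
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step); // most recently completed step ends up on top
            } catch (RuntimeException failure) {
                // Undo the completed steps in reverse order of completion.
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

In a real saga each action and compensation would be a local ACID transaction in a different service, triggered by a message rather than a direct method call.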

Saga Properties: ACD without Isolation

A saga provides three of the four ACID properties: atomicity, consistency, and durability.

A saga does not provide isolation. Other transactions can see intermediate data while the saga is in progress. This lack of isolation can cause anomalies, which are addressed in section 4.3.

Transaction Types in a Saga

The steps of a saga fall into three categories:

Type | Description | Compensation
Compensatable | Steps that can be undone. They precede the pivot transaction. | Yes: has a compensating transaction
Pivot | The go/no-go decision point. If it commits, the saga will run to completion. | No: it is the boundary
Retriable | Steps that follow the pivot. They are guaranteed to succeed. | No: always succeeds

Example: Create Order Saga

The Create Order Saga in FTGO has six steps:

  1. Order Service — create order in PENDING state (compensatable)
  2. Consumer Service — verify consumer (compensatable)
  3. Kitchen Service — create ticket (compensatable)
  4. Accounting Service — authorize credit card (pivot)
  5. Kitchen Service — approve ticket (retriable)
  6. Order Service — approve order (retriable)

Steps 1–3 are compensatable — each has a compensating transaction that undoes its work. Step 4 is the pivot — if the credit card is authorized, the saga will definitely complete. Steps 5–6 are retriable — they are guaranteed to succeed because the pivot already committed.
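
The ordering rule (compensatable steps, then the pivot, then retriable steps) can be expressed as a small check. This is a sketch only: StepType and SagaStructure are hypothetical names, and the rule is encoded as "any number of compensatable steps, at most one pivot, then any number of retriable steps".

```java
import java.util.Arrays;
import java.util.List;

enum StepType { COMPENSATABLE, PIVOT, RETRIABLE }

// Hypothetical helper that checks the ordering of step types in a saga.
final class SagaStructure {
    /** Returns true if the steps are ordered: compensatable*, at most one pivot, retriable*. */
    static boolean isWellFormed(List<StepType> steps) {
        boolean pivotSeen = false;
        for (StepType type : steps) {
            switch (type) {
                case COMPENSATABLE:
                    if (pivotSeen) return false; // compensatable step after the pivot
                    break;
                case PIVOT:
                    if (pivotSeen) return false; // a second pivot
                    pivotSeen = true;
                    break;
                case RETRIABLE:
                    if (!pivotSeen) return false; // retriable step before the pivot
                    break;
            }
        }
        return true;
    }

    // The six Create Order Saga steps, classified as in the list above.
    static final List<StepType> CREATE_ORDER_SAGA = Arrays.asList(
            StepType.COMPENSATABLE, // 1. create order
            StepType.COMPENSATABLE, // 2. verify consumer
            StepType.COMPENSATABLE, // 3. create ticket
            StepType.PIVOT,         // 4. authorize credit card
            StepType.RETRIABLE,     // 5. approve ticket
            StepType.RETRIABLE);    // 6. approve order
}
```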

Analogy: Think of booking a vacation. You reserve a flight, then a hotel, then a rental car. If the rental car is not available, you cancel the hotel and the flight in reverse order. The "credit card charge" step is the point of no return — once you pay, the rest of the trip will happen.

4.2 Coordinating Sagas

A saga involves multiple services that must execute their steps in the right order. There are two approaches to coordinate a saga: choreography and orchestration.

Aspect | Choreography | Orchestration
Coordination | Distributed: participants exchange events | Centralized: orchestrator sends commands
Coupling | Loose (via events) | Low (orchestrator knows the flow)
Complexity | Hard to understand as saga grows | Clear, linear flow
Dependencies | Risk of cyclic dependencies | No cyclic dependencies
Separation of concerns | Saga logic spread across participants | Saga logic centralized in orchestrator
Risk | Tight coupling through event chains | Bloated orchestrator with too much logic

Recommendation: Use orchestration for all but the simplest sagas. Keep the orchestrator focused only on sequencing — it should not contain business logic.

Important: Both choreography and orchestration require Transactional Messaging (covered in Ch. 3) to atomically update the database and send/publish messages in the same operation.

Choreography-Based Sagas

In a choreography-based saga, there is no central coordinator. Each participant listens for events from other participants and decides what to do next. When a participant completes its local transaction, it publishes an event. Other participants subscribe to that event and react.

How It Works

  1. Order Service creates an order and publishes an OrderCreated event.
  2. Consumer Service hears the event, verifies the consumer, and publishes a ConsumerVerified event.
  3. Kitchen Service hears the event, creates a ticket, and publishes a TicketCreated event.
  4. And so on — each step is triggered by the previous step's event.
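
The event chain in the steps above can be sketched with an in-memory event bus. EventBus and ChoreographyDemo are hypothetical names, the bus stands in for a real message broker, and each service is reduced to a single subscription that immediately publishes its own event.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory event bus standing in for a message broker.
final class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    final List<String> published = new ArrayList<>(); // record of event types, in publication order

    void subscribe(String eventType, Consumer<String> handler) {
        subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        published.add(eventType);
        List<Consumer<String>> handlers =
                subscribers.getOrDefault(eventType, Collections.<Consumer<String>>emptyList());
        for (Consumer<String> handler : handlers) {
            handler.accept(orderId);
        }
    }
}

final class ChoreographyDemo {
    /** Wires three participants together and returns the event types published, in order. */
    static List<String> run() {
        EventBus bus = new EventBus();
        // Consumer Service: reacts to OrderCreated, verifies the consumer, publishes its own event.
        bus.subscribe("OrderCreated", orderId -> bus.publish("ConsumerVerified", orderId));
        // Kitchen Service: reacts to the previous step's event and creates a ticket.
        bus.subscribe("ConsumerVerified", orderId -> bus.publish("TicketCreated", orderId));
        // Order Service: creates the order and starts the chain.
        bus.publish("OrderCreated", "order-1");
        return bus.published;
    }
}
```

Note that no class in the sketch describes the whole flow; the sequence only emerges from the individual subscriptions.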

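Benefits

Loose coupling: participants interact only through events and have no direct knowledge of each other.

Drawbacks

The saga becomes hard to understand as it grows, because its logic is spread across the participants. There is also a risk of cyclic dependencies and of tight coupling through event chains.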

Analogy: Choreography is like a dance where every dancer watches the others and decides their next move based on what they see. It works for a small group, but with many dancers it becomes chaotic because nobody has the full picture.

Orchestration-Based Sagas

In an orchestration-based saga, a central orchestrator object controls the saga. The orchestrator sends command messages to participants and receives reply messages. Based on each reply, it decides what command to send next.

How It Works

  1. The orchestrator sends a command to the first participant (e.g., "verify consumer").
  2. The participant processes the command and sends a reply (success or failure).
  3. The orchestrator receives the reply and sends the next command (e.g., "create ticket").
  4. If any step fails, the orchestrator sends compensating commands in reverse order.

Modeling as a State Machine

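An orchestration-based saga is naturally modeled as a state machine. It has a set of states and a set of transitions between them, each triggered by a reply from a participant.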

The state machine makes the saga easy to understand, test, and debug because the entire flow is visible in one place.
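
As an illustration, the Create Order Saga could be written as an enum-based state machine. The state names below are hypothetical, and the sketch assumes that a failed retriable or compensating step is simply retried (the state does not change).

```java
// Hypothetical states for the Create Order Saga; names are illustrative only.
enum CreateOrderState {
    VERIFYING_CONSUMER, CREATING_TICKET, AUTHORIZING_CARD,
    APPROVING_TICKET, APPROVING_ORDER, ORDER_APPROVED,
    REJECTING_TICKET, REJECTING_ORDER, ORDER_REJECTED;

    /** Returns the next state, given whether the participant's reply was a success. */
    CreateOrderState next(boolean success) {
        switch (this) {
            // Compensatable steps: a failure starts the rollback path.
            case VERIFYING_CONSUMER: return success ? CREATING_TICKET : REJECTING_ORDER;
            case CREATING_TICKET:    return success ? AUTHORIZING_CARD : REJECTING_ORDER;
            // Pivot: success commits the saga to completion; failure undoes the ticket first.
            case AUTHORIZING_CARD:   return success ? APPROVING_TICKET : REJECTING_TICKET;
            // Retriable steps: on failure, stay in the same state and retry.
            case APPROVING_TICKET:   return success ? APPROVING_ORDER : APPROVING_TICKET;
            case APPROVING_ORDER:    return success ? ORDER_APPROVED : APPROVING_ORDER;
            // Compensating steps run in reverse order and are also retried on failure.
            case REJECTING_TICKET:   return success ? REJECTING_ORDER : REJECTING_TICKET;
            case REJECTING_ORDER:    return success ? ORDER_REJECTED : REJECTING_ORDER;
            default:
                throw new IllegalStateException("terminal state: " + this);
        }
    }
}
```

Because every transition is in one switch statement, a test can walk the happy path and each failure path without starting any service.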

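Benefits

The flow is clear and linear, there are no cyclic dependencies, and the saga logic is centralized in the orchestrator instead of being spread across the participants.

Drawbacks

The orchestrator can become bloated if too much logic is placed in it.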

Analogy: Orchestration is like a conductor leading an orchestra. The conductor tells each section when to play and what to play. Each musician focuses on their own part. The conductor keeps everyone in sync and handles the overall flow.

4.3 Handling the Lack of Isolation

Unlike ACID transactions, sagas do not provide isolation. While a saga is running, other transactions can read and modify the same data. This can cause anomalies. This section describes the anomalies and the countermeasures to handle them.

Overview of Anomalies

The lack of isolation can cause three types of anomalies:

Anomaly | Description | Example
Lost Updates | One saga overwrites changes made by another saga without reading them first. | Saga A changes an order. Saga B also changes the same order. Saga A's changes are lost.
Dirty Reads | A saga reads data written by another saga that has not yet completed, and whose changes may later be undone by compensation. | Saga B reads an order that Saga A created, but Saga A later rolls back and cancels the order.
Fuzzy / Nonrepeatable Reads | A saga reads the same data twice and gets different results because another saga modified it in between. | Saga A reads an order's total, then reads it again later and the total has changed because Saga B updated it.

Key problem: These anomalies can cause incorrect business decisions. For example, a dirty read might cause a saga to approve an order that will be cancelled moments later. You must use countermeasures to reduce or eliminate these risks.

Countermeasures for Handling the Lack of Isolation

There are six countermeasures that reduce or eliminate isolation anomalies. Choose the right one based on the specific anomaly and your business requirements.

1. Semantic Lock

Set a flag on the record to indicate it is being processed by a saga. For example, an order might have a status like APPROVAL_PENDING or CANCEL_PENDING. This acts as an application-level lock.

Other sagas or operations that see a *_PENDING state know the record is locked. They can either fail and ask the client to retry later, or block until the saga finishes and releases the lock.

Countermeasure: Semantic Lock

How: Use *_PENDING states as application-level locks on records being modified by a saga.
Handles: Lost Updates, Dirty Reads
Trade-off: Other operations must handle locked records (retry or block).
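
A minimal sketch of a semantic lock follows, using the "fail and let the caller retry" option. The class SemanticLockOrder and its method names are hypothetical; only the APPROVAL_PENDING and CANCEL_PENDING states come from the text above.

```java
enum OrderState { APPROVAL_PENDING, APPROVED, CANCEL_PENDING, CANCELLED }

// Hypothetical order entity whose *_PENDING states act as an application-level lock.
final class SemanticLockOrder {
    // The first step of the Create Order Saga creates the order in a locked state.
    private OrderState state = OrderState.APPROVAL_PENDING;

    OrderState state() { return state; }

    /** Last step of the Create Order Saga: releases the lock. */
    void noteApproved() {
        state = OrderState.APPROVED;
    }

    /** First step of a cancel saga: allowed only when no other saga holds the lock. */
    void beginCancel() {
        if (state != OrderState.APPROVED) {
            // Another saga is still working on this order (or it is not cancellable);
            // the caller is expected to retry later.
            throw new IllegalStateException("order cannot be cancelled in state " + state);
        }
        state = OrderState.CANCEL_PENDING; // the cancel saga now holds the lock
    }
}
```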

2. Commutative Updates

Design your update operations so that they can be executed in any order and still produce the same result. If operations are commutative, the order in which concurrent sagas run does not matter, and lost updates are eliminated.

For example, instead of "set balance to X," use "add amount to balance" and "subtract amount from balance." These operations produce the same final result regardless of order.
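
The difference is easy to see in a small sketch. CommutativeAccount is a hypothetical class; amounts are in cents, and overdraft checks are deliberately left out because a conditional rejection would make the operations order-dependent again.

```java
// Hypothetical account used only to contrast commutative and noncommutative updates.
final class CommutativeAccount {
    private long balanceCents;

    CommutativeAccount(long initialCents) {
        this.balanceCents = initialCents;
    }

    // Commutative: any interleaving of credits and debits yields the same final balance.
    void credit(long amountCents) { balanceCents += amountCents; }
    void debit(long amountCents)  { balanceCents -= amountCents; }

    // Noncommutative, shown for contrast: the last writer wins.
    void setBalance(long newBalanceCents) { balanceCents = newBalanceCents; }

    long balanceCents() { return balanceCents; }
}
```

Starting from the same balance, a credit of 50 followed by a debit of 30 leaves the same result as the debit followed by the credit, whereas two setBalance calls leave whichever value was written last.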

3. Pessimistic View

Reorder the steps of a saga so that risky updates happen in retriable transactions (after the pivot) instead of compensatable transactions (before the pivot). Retriable transactions are never compensated, so a value they write is never undone later, and another saga that reads it cannot act on data that is subsequently rolled back.

This reduces the window in which another saga can read inconsistent data.

4. Reread Value

Before updating a record, read it again and verify that it has not changed since the saga first read it. If the value has changed, abort the saga and restart it. This is essentially the Optimistic Offline Lock pattern.

This countermeasure prevents lost updates by detecting concurrent modifications.
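
One common way to implement the check is a version number on the record. The sketch below assumes that approach; VersionedOrder and its fields are hypothetical.

```java
// Hypothetical record with a version counter used to detect concurrent modification.
final class VersionedOrder {
    private int version = 0;
    private String deliveryAddress;

    VersionedOrder(String deliveryAddress) {
        this.deliveryAddress = deliveryAddress;
    }

    synchronized int version() { return version; }
    synchronized String deliveryAddress() { return deliveryAddress; }

    /**
     * Applies the update only if the record still has the version the caller read earlier.
     * Returns false if another saga changed it in the meantime; the caller should then
     * abort and restart.
     */
    synchronized boolean updateIfUnchanged(int expectedVersion, String newAddress) {
        if (version != expectedVersion) {
            return false;
        }
        deliveryAddress = newAddress;
        version++;
        return true;
    }
}
```

If two sagas both read version 0, the first update succeeds and bumps the version to 1; the second sees the mismatch and is rejected instead of silently overwriting the first.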

5. Version File

Instead of applying updates directly, record the operations in a log (version file). Then reorder the logged operations so that they become commutative before applying them. This turns noncommutative operations into commutative ones.

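For example, if a Cancel Order Saga's "cancel authorization" request reaches the Accounting Service before the Create Order Saga's "authorize credit card" request for the same order, the service records the cancellation. When the authorization request arrives later, the service sees the recorded cancellation and skips the authorization.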
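
A rough sketch of the idea, under the assumption that the operations in question are an authorization and its cancellation for the same order: the service records what it has seen, so both arrival orders end in the same state. AuthorizationLog is a hypothetical name, and a real implementation would persist this record.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory record of operations seen per order.
final class AuthorizationLog {
    private final Set<String> cancelled = new HashSet<>();
    private final Set<String> authorized = new HashSet<>();

    /** Records a cancellation, whether or not the matching authorization has arrived yet. */
    void cancelAuthorization(String orderId) {
        cancelled.add(orderId);
        authorized.remove(orderId);
    }

    /** Returns true if the authorization was applied, false if skipped due to an earlier cancel. */
    boolean authorize(String orderId) {
        if (cancelled.contains(orderId)) {
            return false; // the cancel arrived first; skip the authorization
        }
        authorized.add(orderId);
        return true;
    }

    boolean isAuthorized(String orderId) {
        return authorized.contains(orderId);
    }
}
```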

6. By Value

Dynamically choose the concurrency mechanism based on the business risk of each request. Low-risk requests use a saga (with its weaker isolation). High-risk requests use a distributed transaction (with stronger guarantees but lower availability).

This is a pragmatic approach: most requests are low-risk, so the saga handles them efficiently. Only critical operations pay the cost of stricter consistency.
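
As a sketch, the selection can be a simple function of the request. The use of a monetary threshold as the risk measure, and the names ByValueSelector and ConsistencyMechanism, are assumptions made for illustration.

```java
enum ConsistencyMechanism { SAGA, DISTRIBUTED_TRANSACTION }

// Hypothetical selector: treats the amount of money involved as the measure of business risk.
final class ByValueSelector {
    private final long highRiskThresholdCents;

    ByValueSelector(long highRiskThresholdCents) {
        this.highRiskThresholdCents = highRiskThresholdCents;
    }

    /** Requests at or above the threshold use a distributed transaction; the rest use a saga. */
    ConsistencyMechanism choose(long amountCents) {
        return amountCents >= highRiskThresholdCents
                ? ConsistencyMechanism.DISTRIBUTED_TRANSACTION
                : ConsistencyMechanism.SAGA;
    }
}
```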

Countermeasure | Anomaly Addressed | How It Works
Semantic Lock | Lost Updates, Dirty Reads | Application-level *_PENDING lock flag
Commutative Updates | Lost Updates | Design operations to be order-independent
Pessimistic View | Dirty Reads | Move risky updates to retriable transactions
Reread Value | Lost Updates | Verify data unchanged before update (optimistic lock)
Version File | Fuzzy Reads | Log and reorder operations before applying
By Value | All (selective) | Choose saga or distributed transaction per request risk

4.4 The Design of the Order Service and the Create Order Saga

This section shows how the Saga pattern is implemented in practice using the Create Order Saga from the FTGO application. It uses the Eventuate Tram Saga framework for orchestration.

The OrderService Class

The OrderService class is the entry point for creating orders. When it receives a "create order" request, it does two things in a single transaction:

  1. Creates an Order entity in the database (in a PENDING state).
  2. Creates a CreateOrderSagaState object that holds the saga's persistent state.

It then calls SagaManager.create() to start the saga. The SagaManager takes over and drives the saga through its steps.

Note: The Order Service acts as both the orchestrator (it owns the saga) and a participant (it receives commands to approve or reject the order). This dual role is common in practice.

The Implementation of the Create Order Saga

CreateOrderSaga — The Saga Definition

The CreateOrderSaga class is a singleton that defines the saga's steps. It uses the Eventuate Tram Saga DSL (domain-specific language) to declare each step.

The saga definition is purely declarative. It describes what to do, not how to do it. The framework handles the execution.

CreateOrderSagaState — The Persistent State

The CreateOrderSagaState class holds data that the saga needs across steps. It creates command messages for each participant and processes reply messages. The SagaManager persists this state in a SAGA_INSTANCE table so the saga can survive restarts and failures.

Saga Participant Proxy Classes

Each participant has a proxy class (e.g., KitchenServiceProxy, OrderServiceProxy) that defines the participant's messaging API: the command channel, the command message types, and the expected reply types.

These proxy classes provide a clean, typed interface for the orchestrator to communicate with participants.
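
The shape of such a proxy can be sketched without the framework. Everything below is hypothetical and is not the Eventuate Tram API: CommandEndpoint, the message classes, and the channel name "kitchenService" are illustrative stand-ins.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical description of one operation in a participant's messaging API.
final class CommandEndpoint {
    final String channel;              // where the command is sent
    final Class<?> commandType;        // the command message type
    final List<Class<?>> replyTypes;   // the reply types the orchestrator should expect

    CommandEndpoint(String channel, Class<?> commandType, Class<?>... replyTypes) {
        this.channel = channel;
        this.commandType = commandType;
        this.replyTypes = Collections.unmodifiableList(Arrays.asList(replyTypes));
    }
}

// Hypothetical message types.
final class CreateTicketCommand { }
final class CreateTicketReply { }
final class CancelCreateTicketCommand { }
final class SuccessReply { }

// Hypothetical proxy: one endpoint per operation the Kitchen Service exposes to sagas.
final class KitchenServiceProxySketch {
    static final String CHANNEL = "kitchenService"; // assumed channel name

    static final CommandEndpoint CREATE =
            new CommandEndpoint(CHANNEL, CreateTicketCommand.class, CreateTicketReply.class);

    static final CommandEndpoint CANCEL =
            new CommandEndpoint(CHANNEL, CancelCreateTicketCommand.class, SuccessReply.class);
}
```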

SagaManager — The Execution Engine

The SagaManager is the framework component that drives the saga. It:

  1. Persists the saga instance in the SAGA_INSTANCE database table.
  2. Sends the first command message to the first participant.
  3. Subscribes to reply messages from participants.
  4. On each reply, advances the state machine to the next state and sends the next command.
  5. On failure, sends compensating commands in reverse order.

Implementation Architecture

Orchestrator: CreateOrderSaga (defines steps) + CreateOrderSagaState (holds data)
Engine: SagaManager persists saga state and drives the state machine
Proxies: KitchenServiceProxy, OrderServiceProxy (define participant channels and command/reply types)
Persistence: SAGA_INSTANCE table stores saga state across steps
Framework: Eventuate Tram Saga

The OrderCommandHandlers Class

The OrderCommandHandlers class is an adapter that handles commands sent by sagas to the Order Service. Remember: the Order Service is both the orchestrator and a participant. When the saga sends a command like "approve order" or "reject order," the OrderCommandHandlers class receives it and calls the appropriate method on the Order entity.

A SagaCommandDispatcher routes incoming command messages to the correct handler method in OrderCommandHandlers. This dispatcher is configured to listen on the Order Service's command channel.
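
The routing idea can be sketched as a map from command type to handler. This is not the framework's SagaCommandDispatcher; SimpleCommandDispatcher and the two command classes are hypothetical, and the sketch ignores channels, replies, and message serialization.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical command messages sent by the saga to the Order Service.
final class ApproveOrderCommand {
    final long orderId;
    ApproveOrderCommand(long orderId) { this.orderId = orderId; }
}

final class RejectOrderCommand {
    final long orderId;
    RejectOrderCommand(long orderId) { this.orderId = orderId; }
}

// Hypothetical dispatcher: looks up the handler registered for the command's class.
final class SimpleCommandDispatcher {
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();

    /** Registers a handler for one command type. */
    <C> void onCommand(Class<C> type, Consumer<C> handler) {
        handlers.put(type, command -> handler.accept(type.cast(command)));
    }

    /** Routes the command to its handler, or fails if none is registered. */
    void dispatch(Object command) {
        Consumer<Object> handler = handlers.get(command.getClass());
        if (handler == null) {
            throw new IllegalArgumentException(
                    "no handler for " + command.getClass().getSimpleName());
        }
        handler.accept(command);
    }
}
```

A command handlers class would register one method per command type, for example a handler for ApproveOrderCommand that loads the order and approves it.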

Pattern: The separation between the saga definition (CreateOrderSaga), the saga state (CreateOrderSagaState), and the command handlers (OrderCommandHandlers) keeps the code organized. Each class has a single responsibility.

The OrderServiceConfiguration Class

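The OrderServiceConfiguration class wires everything together. It configures the components described in this section: the OrderService, the SagaManager for the Create Order Saga, the CreateOrderSaga itself, the participant proxies, the OrderCommandHandlers, and the SagaCommandDispatcher.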

This configuration class is the glue that connects the orchestrator, the participants, and the messaging infrastructure.

Summary of the architecture: OrderService creates the order and starts the saga. SagaManager drives the state machine, sending commands to participants via proxy classes and processing their replies. OrderCommandHandlers handle commands sent to the Order Service itself. The framework persists all saga state so it survives failures.

Key Takeaways