How to maintain data consistency across services without distributed transactions — using the Saga pattern, choreography vs. orchestration, and countermeasures for isolation anomalies.
In a monolith, you can use a single ACID database transaction to keep data consistent. In a microservice architecture, each service has its own database. When an operation spans multiple services, you need a different approach to maintain consistency.
Many business operations touch data owned by more than one service. For example, creating an order in the FTGO application requires verifying the consumer, validating the restaurant menu, authorizing the credit card, and creating a kitchen ticket. Each of these actions belongs to a different service with its own database.
In a monolith, all this data lives in one database, so a single transaction guarantees consistency. In a microservice architecture, there is no single database. You cannot wrap all these steps in one ACID transaction. You need a mechanism that coordinates updates across multiple service databases.
Analogy: Imagine a group project where each team member keeps their own notebook. There is no shared notebook. If the project needs changes across all notebooks at once, the team needs a protocol to make sure everyone updates their notebook correctly — or rolls back if something goes wrong.
The traditional solution for cross-database consistency is distributed transactions, typically using the X/Open XA standard and two-phase commit (2PC). However, this approach has serious problems in a microservice architecture:
Problem: Distributed transactions (2PC) require all participants to be available simultaneously. This reduces system availability and does not work with many modern databases and brokers. They are not a viable solution for microservices.
Conclusion: The failure of distributed transactions in microservices motivates the need for a different pattern — the Saga pattern.
A saga is a sequence of local transactions. Each local transaction updates a single service's database using standard ACID guarantees, and then publishes a message or event to trigger the next step. If any step fails, the saga executes compensating transactions in reverse order to undo the work done by previous steps.
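A saga provides three of the four ACID properties: atomicity (either every step completes or every completed step is compensated), consistency (each local transaction preserves the invariants of its own service), and durability (each local transaction is committed by its service's database).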
A saga does not provide isolation. Other transactions can see intermediate data while the saga is in progress. This lack of isolation can cause anomalies, which are addressed in section 4.3.
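The definition above (local transactions run in sequence, compensations run in reverse on failure) can be sketched in a few lines of plain Java. This is only an in-memory illustration: `SagaStep` and `SimpleSaga` are hypothetical names, and a real saga would trigger each step through messages rather than direct calls.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: one saga step is a local transaction plus its compensation.
final class SagaStep {
    final String name;
    final Runnable action;        // stands in for a local ACID transaction in one service
    final Runnable compensation;  // undoes the action if a later step fails

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

final class SimpleSaga {
    /** Runs steps in order; on failure, compensates completed steps in reverse order. */
    static boolean run(List<SagaStep> steps, List<String> log) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                log.add("did " + step.name);
                completed.push(step);
            } catch (RuntimeException e) {
                log.add("failed " + step.name);
                // Undo the most recently completed step first.
                while (!completed.isEmpty()) {
                    SagaStep done = completed.pop();
                    done.compensation.run();
                    log.add("undid " + done.name);
                }
                return false;
            }
        }
        return true;
    }
}
```

Note that nothing here hides intermediate state from other readers: between "did a" and "undid a", the effect of step a is visible, which is exactly the missing isolation.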
The steps of a saga fall into three categories:
| Type | Description | Compensation |
|---|---|---|
| Compensatable | Steps that can be undone. They precede the pivot transaction. | Yes — has a compensating transaction |
| Pivot | The go/no-go decision point. If it commits, the saga will run to completion. | No — it is the boundary |
| Retriable | Steps that follow the pivot. They are guaranteed to succeed. | No — always succeeds |
The Create Order Saga in FTGO has six steps.
Steps 1–3 are compensatable: each has a compensating transaction that undoes its work. Step 4 is the pivot: if the credit card is authorized, the saga will definitely complete. Steps 5–6 are retriable: they come after the pivot, cannot be rolled back, and are retried until they succeed.
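The three-plus-one-plus-two layout described above can be checked mechanically. The sketch below is an illustration only: `SagaStructure` is a hypothetical helper, and steps are identified by their 1-based position rather than by name.

```java
import java.util.ArrayList;
import java.util.List;

final class SagaStructure {
    enum StepType { COMPENSATABLE, PIVOT, RETRIABLE }

    /** True if the sequence is: zero or more COMPENSATABLE, exactly one PIVOT, zero or more RETRIABLE. */
    static boolean isWellFormed(List<StepType> steps) {
        int i = 0;
        while (i < steps.size() && steps.get(i) == StepType.COMPENSATABLE) i++;
        if (i == steps.size() || steps.get(i) != StepType.PIVOT) return false;
        i++;
        while (i < steps.size() && steps.get(i) == StepType.RETRIABLE) i++;
        return i == steps.size();
    }

    /** 1-based numbers of the steps to compensate, in execution order of the rollback. */
    static List<Integer> stepsToCompensate(List<StepType> steps, int failedStep) {
        List<Integer> result = new ArrayList<>();
        if (steps.get(failedStep - 1) == StepType.RETRIABLE) {
            return result; // retriable steps are retried, never rolled back
        }
        // In a well-formed saga, everything before the failed step is compensatable.
        for (int n = failedStep - 1; n >= 1; n--) {
            result.add(n);
        }
        return result;
    }
}
```

With the Create Order layout (three compensatable steps, a pivot, two retriable steps), a failure at step 4 yields the rollback order 3, 2, 1.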
Analogy: Think of booking a vacation. You reserve a flight, then a hotel, then a rental car. If the rental car is not available, you cancel the hotel and the flight in reverse order. The "credit card charge" step is the point of no return — once you pay, the rest of the trip will happen.
A saga involves multiple services that must execute their steps in the right order. There are two approaches to coordinate a saga: choreography and orchestration.
| Aspect | Choreography | Orchestration |
|---|---|---|
| Coordination | Distributed — participants exchange events | Centralized — orchestrator sends commands |
| Coupling | Loose (via events) | Low (orchestrator knows the flow) |
| Complexity | Hard to understand as saga grows | Clear, linear flow |
| Dependencies | Risk of cyclic dependencies | No cyclic dependencies |
| Separation of concerns | Saga logic spread across participants | Saga logic centralized in orchestrator |
| Risk | Tight coupling through event chains | Bloated orchestrator with too much logic |
Recommendation: Use orchestration for all but the simplest sagas. Keep the orchestrator focused only on sequencing — it should not contain business logic.
Important: Both choreography and orchestration require Transactional Messaging (covered in Ch. 3) to atomically update the database and send/publish messages in the same operation.
In a choreography-based saga, there is no central coordinator. Each participant listens for events from other participants and decides what to do next. When a participant completes its local transaction, it publishes an event. Other participants subscribe to that event and react.
Analogy: Choreography is like a dance where every dancer watches the others and decides their next move based on what they see. It works for a small group, but with many dancers it becomes chaotic because nobody has the full picture.
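A toy, in-memory version of choreography might look like the sketch below. Everything in it is an assumption made for illustration: the `EventBus` class, the event names, and the synchronous delivery. A real system would use a message broker and transactional messaging.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory event bus; delivery is synchronous and depth-first.
final class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    final List<String> published = new ArrayList<>();

    void subscribe(String eventType, Consumer<String> handler) {
        subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        published.add(eventType);
        for (Consumer<String> handler : subscribers.getOrDefault(eventType, new ArrayList<>())) {
            handler.accept(orderId);
        }
    }
}

final class ChoreographyDemo {
    /** Wires up hypothetical participants and returns the events published, in order. */
    static List<String> run(boolean cardAuthorized) {
        EventBus bus = new EventBus();
        // Each participant reacts to one event and publishes its own; no one sees the whole flow.
        bus.subscribe("OrderCreated", id -> bus.publish("ConsumerVerified", id));
        bus.subscribe("ConsumerVerified", id -> bus.publish("TicketCreated", id));
        bus.subscribe("TicketCreated", id ->
            bus.publish(cardAuthorized ? "CardAuthorized" : "CardAuthorizationFailed", id));
        bus.subscribe("CardAuthorized", id -> bus.publish("OrderApproved", id));
        // Compensation is also event-driven: two participants react to the failure event.
        bus.subscribe("CardAuthorizationFailed", id -> bus.publish("TicketCancelled", id));
        bus.subscribe("CardAuthorizationFailed", id -> bus.publish("OrderRejected", id));
        bus.publish("OrderCreated", "order-1");
        return bus.published;
    }
}
```

Even at this size, the flow can only be reconstructed by reading every subscription, which is the readability problem the text describes.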
In an orchestration-based saga, a central orchestrator object controls the saga. The orchestrator sends command messages to participants and receives reply messages. Based on each reply, it decides what command to send next.
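An orchestration-based saga is naturally modeled as a state machine. It has a set of states, transitions between those states that are triggered by participant replies, and an action on each transition, which for a saga is sending the next command to a participant.
The state machine makes the saga easy to understand, test, and debug because the entire flow is visible in one place.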
Analogy: Orchestration is like a conductor leading an orchestra. The conductor tells each section when to play and what to play. Each musician focuses on their own part. The conductor keeps everyone in sync and handles the overall flow.
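To make the state-machine view concrete, here is a simplified sketch of an orchestrator. The state and command names are hypothetical and the flow is shortened compared with the real Create Order Saga; replies are reduced to a success flag.

```java
import java.util.ArrayList;
import java.util.List;

final class CreateOrderStateMachine {
    enum State {
        VERIFYING_CONSUMER, CREATING_TICKET, AUTHORIZING_CARD, APPROVING_ORDER,
        CANCELLING_TICKET, REJECTING_ORDER, ORDER_APPROVED, ORDER_REJECTED
    }

    private State state = State.VERIFYING_CONSUMER;
    private final List<String> commandsSent = new ArrayList<>();

    CreateOrderStateMachine() {
        commandsSent.add("VerifyConsumer"); // initial action when the saga starts
    }

    State state() { return state; }
    List<String> commandsSent() { return commandsSent; }

    /** Handles one participant reply: picks the next state and sends the next command. */
    void onReply(boolean success) {
        switch (state) {
            case VERIFYING_CONSUMER:
                if (success) go(State.CREATING_TICKET, "CreateTicket");
                else go(State.REJECTING_ORDER, "RejectOrder");
                break;
            case CREATING_TICKET:
                if (success) go(State.AUTHORIZING_CARD, "AuthorizeCard");
                else go(State.REJECTING_ORDER, "RejectOrder");
                break;
            case AUTHORIZING_CARD:
                if (success) go(State.APPROVING_ORDER, "ApproveOrder");
                else go(State.CANCELLING_TICKET, "CancelTicket"); // start compensating
                break;
            case CANCELLING_TICKET:
                go(State.REJECTING_ORDER, "RejectOrder");
                break;
            case APPROVING_ORDER:
                go(State.ORDER_APPROVED, null);
                break;
            case REJECTING_ORDER:
                go(State.ORDER_REJECTED, null);
                break;
            default:
                throw new IllegalStateException("saga already finished in state " + state);
        }
    }

    private void go(State next, String command) {
        state = next;
        if (command != null) commandsSent.add(command);
    }
}
```

The whole flow, including the compensation path, sits in one `switch`, which is what makes the orchestrated version easier to read and test than the event chain.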
Unlike ACID transactions, sagas do not provide isolation. While a saga is running, other transactions can read and modify the same data. This can cause anomalies. This section describes the anomalies and the countermeasures to handle them.
The lack of isolation can cause three types of anomalies:
| Anomaly | Description | Example |
|---|---|---|
| Lost Updates | One saga overwrites changes made by another saga without reading them first. | Saga A changes an order. Saga B also changes the same order. Saga A's changes are lost. |
| Dirty Reads | A saga reads updates made by another saga that has not yet completed and may later be compensated. | Saga B reads an order that Saga A created, but Saga A later rolls back and cancels the order. |
| Fuzzy / Nonrepeatable Reads | A saga reads the same data twice and gets different results because another saga modified it in between. | Saga A reads an order's total, then reads it again later and the total has changed because Saga B updated it. |
Key problem: These anomalies can cause incorrect business decisions. For example, a dirty read might cause a saga to approve an order that will be cancelled moments later. You must use countermeasures to reduce or eliminate these risks.
There are six countermeasures that reduce or eliminate isolation anomalies. Choose the right one based on the specific anomaly and your business requirements.
Semantic lock: Set a flag on the record to indicate it is being processed by a saga. For example, an order might have a status like APPROVAL_PENDING or CANCEL_PENDING. This acts as an application-level lock.
Other sagas or operations that see the *_PENDING flag know the record is locked. They can either fail and tell the client to retry later, or block until the lock is released.
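A minimal sketch of the *_PENDING flag as an application-level lock follows. The `LockedOrder` class and its method names are hypothetical, and it implements the "fail and retry later" option rather than blocking.

```java
final class LockedOrder {
    enum State { APPROVAL_PENDING, APPROVED, REJECTED, CANCEL_PENDING, CANCELLED }

    // The first step of the create saga leaves the order locked in APPROVAL_PENDING.
    private State state = State.APPROVAL_PENDING;

    State state() { return state; }

    /** Last step of the create saga: releases the lock. */
    void approve() {
        require(State.APPROVAL_PENDING);
        state = State.APPROVED;
    }

    /** Compensation for the create saga: also releases the lock. */
    void reject() {
        require(State.APPROVAL_PENDING);
        state = State.REJECTED;
    }

    /**
     * First step of a cancel saga. Returns false if the order is not in APPROVED,
     * for example because another saga still holds a *_PENDING lock on it.
     */
    boolean beginCancel() {
        if (state != State.APPROVED) return false;
        state = State.CANCEL_PENDING;
        return true;
    }

    void confirmCancel() {
        require(State.CANCEL_PENDING);
        state = State.CANCELLED;
    }

    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }
}
```

A caller that gets `false` from `beginCancel()` while the order is pending would typically report a retryable error to the client.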
Commutative updates: Design your update operations so that they can be executed in any order and still produce the same result. If operations are commutative, the order in which concurrent sagas run does not matter, and lost updates are eliminated.
For example, instead of "set balance to X," use "add amount to balance" and "subtract amount from balance." These operations produce the same final result regardless of order.
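The add/subtract idea fits in a few lines. `Account` is a hypothetical class, and amounts are kept in cents to avoid floating-point rounding.

```java
final class Account {
    private long balanceCents;

    Account(long initialCents) {
        this.balanceCents = initialCents;
    }

    // Relative updates commute: credit then debit equals debit then credit.
    void credit(long amountCents) { balanceCents += amountCents; }
    void debit(long amountCents) { balanceCents -= amountCents; }

    long balanceCents() { return balanceCents; }
}
```

This sketch deliberately allows the balance to go negative. Adding an "insufficient funds" check to `debit` would make the result depend on ordering again, so such a rule needs one of the other countermeasures.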
Pessimistic view: Reorder the steps of a saga so that risky updates happen in retriable transactions (after the pivot) instead of compensatable transactions (before the pivot). Since retriable transactions are guaranteed to succeed, dirty reads of their intermediate state are less dangerous.
This reduces the window in which another saga can read inconsistent data.
Reread value: Before updating a record, read it again and verify that it has not changed since the saga first read it. If the value has changed, abort the saga and restart it. This is essentially the Optimistic Offline Lock pattern.
This countermeasure prevents lost updates by detecting concurrent modifications.
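One common way to implement the reread check is a version counter on the record. The sketch below assumes that approach; `VersionedOrder` and its fields are hypothetical.

```java
final class VersionedOrder {
    private String deliveryAddress;
    private long version; // incremented on every successful write

    VersionedOrder(String deliveryAddress) {
        this.deliveryAddress = deliveryAddress;
    }

    synchronized long version() { return version; }
    synchronized String deliveryAddress() { return deliveryAddress; }

    /**
     * Writes only if nobody changed the record since expectedVersion was read.
     * On false, the calling saga should abort and possibly restart.
     */
    synchronized boolean updateIfUnchanged(long expectedVersion, String newAddress) {
        if (version != expectedVersion) {
            return false; // a concurrent modification was detected
        }
        deliveryAddress = newAddress;
        version++;
        return true;
    }
}
```

If two sagas both read version 0, only the first write succeeds; the second sees a changed version and has to reread before trying again.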
Version file: Instead of applying updates directly, record the operations in a log (version file). Then reorder the logged operations so that they become commutative before applying them. This turns noncommutative operations into commutative ones.
For example, if "authorize credit card" arrives before "create order" due to a concurrent saga, the version file can reorder them into the correct sequence before execution.
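One way to picture the version file is a buffer that holds early arrivals until their predecessors show up. The sketch assumes each operation carries a sequence number that says where it belongs; that is an assumption for illustration, since the text does not say how the correct order is determined. `VersionFile` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class VersionFile {
    private final TreeMap<Integer, String> pending = new TreeMap<>();
    private final List<String> applied = new ArrayList<>();
    private int nextSeq = 1; // sequence number of the next operation allowed to run

    /** Records an operation, then applies every operation whose predecessors have all arrived. */
    void record(int seq, String operation) {
        pending.put(seq, operation);
        while (pending.containsKey(nextSeq)) {
            applied.add(pending.remove(nextSeq));
            nextSeq++;
        }
    }

    /** Operations in the order they were actually applied. */
    List<String> applied() { return applied; }
}
```

Whichever order the two messages arrive in, the applied sequence is the same, which is what "turning noncommutative operations into commutative ones" means here.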
By value: Dynamically choose the concurrency mechanism based on the business risk of each request. Low-risk requests use a saga (with its weaker isolation). High-risk requests use a distributed transaction (with stronger guarantees but lower availability).
This is a pragmatic approach: most requests are low-risk, so the saga handles them efficiently. Only critical operations pay the cost of stricter consistency.
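The per-request choice could be as simple as the sketch below. Using the monetary amount as the risk signal, and a fixed threshold, are assumptions for illustration; `ConcurrencyChooser` is a hypothetical name.

```java
final class ConcurrencyChooser {
    enum Mechanism { SAGA, DISTRIBUTED_TRANSACTION }

    /** Picks the stricter mechanism only for requests at or above the risk threshold. */
    static Mechanism choose(long amountCents, long highRiskThresholdCents) {
        return amountCents >= highRiskThresholdCents
            ? Mechanism.DISTRIBUTED_TRANSACTION
            : Mechanism.SAGA;
    }
}
```

In practice the risk rule would come from the business, not from a constant in code.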
| Countermeasure | Anomaly Addressed | How It Works |
|---|---|---|
| Semantic Lock | Lost Updates, Dirty Reads | Application-level *_PENDING lock flag |
| Commutative Updates | Lost Updates | Design operations to be order-independent |
| Pessimistic View | Dirty Reads | Move risky updates to retriable transactions |
| Reread Value | Lost Updates | Verify data unchanged before update (optimistic lock) |
| Version File | Fuzzy Reads | Log and reorder operations before applying |
| By Value | All (selective) | Choose saga or distributed transaction per request risk |
This section shows how the Saga pattern is implemented in practice using the Create Order Saga from the FTGO application. It uses the Eventuate Tram Saga framework for orchestration.
The OrderService class is the entry point for creating orders. When it receives a "create order" request, it does two things in a single transaction:
It then calls SagaManager.create() to start the saga. The SagaManager takes over and drives the saga through its steps.
Note: The Order Service acts as both the orchestrator (it owns the saga) and a participant (it receives commands to approve or reject the order). This dual role is common in practice.
The CreateOrderSaga class is a singleton that defines the saga's steps. It uses the Eventuate Tram Saga DSL (domain-specific language) to declare each step:
- step(): defines a step in the saga
- invokeParticipant(): specifies the command to send to a participant
- withCompensation(): specifies the compensating command if the saga must roll back
- onReply(): handles the participant's reply

The saga definition is purely declarative. It describes what to do, not how to do it. The framework handles the execution.
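To show what "declarative" means here, the toy builder below mimics the shape of the four calls. It is not the Eventuate Tram API: the signatures are simplified stand-ins (commands are plain strings), and `ToySagaDefinition` is a hypothetical class written only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ToySagaDefinition {
    static final class Step {
        String command;                                // command sent to the participant
        String compensation;                           // compensating command, or null
        Consumer<String> replyHandler = reply -> { };  // no-op unless onReply() is called
    }

    private final List<Step> steps = new ArrayList<>();

    ToySagaDefinition step() { steps.add(new Step()); return this; }
    ToySagaDefinition invokeParticipant(String command) { last().command = command; return this; }
    ToySagaDefinition withCompensation(String command) { last().compensation = command; return this; }
    ToySagaDefinition onReply(Consumer<String> handler) { last().replyHandler = handler; return this; }

    List<Step> steps() { return steps; }

    /** Compensating commands to send, in reverse order, when the step at failedIndex (0-based) fails. */
    List<String> compensationsBefore(int failedIndex) {
        List<String> result = new ArrayList<>();
        for (int i = failedIndex - 1; i >= 0; i--) {
            String compensation = steps.get(i).compensation;
            if (compensation != null) result.add(compensation);
        }
        return result;
    }

    private Step last() {
        if (steps.isEmpty()) throw new IllegalStateException("call step() first");
        return steps.get(steps.size() - 1);
    }
}
```

A definition then reads as data, for example `new ToySagaDefinition().step().invokeParticipant("CreateTicket").withCompensation("CancelTicket").step().invokeParticipant("AuthorizeCard")`. Nothing executes until some engine walks the list of steps.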
The CreateOrderSagaState class holds data that the saga needs across steps. It creates command messages for each participant and processes reply messages. The SagaManager persists this state in a SAGA_INSTANCE table so the saga can survive restarts and failures.
Each participant has a proxy class (e.g., KitchenServiceProxy, OrderServiceProxy) that defines the participant's messaging API: the command channel, the command message types, and the expected reply types.
These proxy classes provide a clean, typed interface for the orchestrator to communicate with participants.
The SagaManager is the framework component that drives the saga. Among its responsibilities, it persists each saga instance in the SAGA_INSTANCE database table.
The OrderCommandHandlers class is an adapter that handles commands sent by sagas to the Order Service. Remember: the Order Service is both the orchestrator and a participant. When the saga sends a command like "approve order" or "reject order," the OrderCommandHandlers class receives it and calls the appropriate method on the Order entity.
A SagaCommandDispatcher routes incoming command messages to the correct handler method in OrderCommandHandlers. This dispatcher is configured to listen on the Order Service's command channel.
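The routing idea can be shown with a map from command type to handler method. This is a plain-Java stand-in, not the framework's dispatcher: `ToyCommandDispatcher`, `ToyOrderCommandHandlers`, the command names, and the string replies are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical handlers: each method takes an order ID and returns a reply.
final class ToyOrderCommandHandlers {
    final Map<String, String> orderStates = new HashMap<>();

    String approveOrder(String orderId) {
        orderStates.put(orderId, "APPROVED");
        return "SUCCESS";
    }

    String rejectOrder(String orderId) {
        orderStates.put(orderId, "REJECTED");
        return "SUCCESS";
    }
}

final class ToyCommandDispatcher {
    private final Map<String, Function<String, String>> routes = new HashMap<>();

    ToyCommandDispatcher(ToyOrderCommandHandlers handlers) {
        // One route per command type, each pointing at a handler method.
        routes.put("ApproveOrder", handlers::approveOrder);
        routes.put("RejectOrder", handlers::rejectOrder);
    }

    /** Routes a command to its handler and returns the handler's reply. */
    String dispatch(String commandType, String orderId) {
        Function<String, String> handler = routes.get(commandType);
        if (handler == null) {
            return "FAILURE: no handler for " + commandType;
        }
        return handler.apply(orderId);
    }
}
```

The handlers know nothing about channels or message formats; that concern stays in the dispatcher.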
Pattern: The separation between the saga definition (CreateOrderSaga), the saga state (CreateOrderSagaState), and the command handlers (OrderCommandHandlers) keeps the code organized. Each class has a single responsibility.
The OrderServiceConfiguration class wires everything together. It is the glue that connects the orchestrator, the participants, and the messaging infrastructure.
Summary of the architecture: OrderService creates the order and starts the saga. SagaManager drives the state machine, sending commands to participants via proxy classes and processing their replies. OrderCommandHandlers handle commands sent to the Order Service itself. The framework persists all saga state so it survives failures.