Chapter 4 — Managing Transactions with Sagas | Microservices Patterns Study Guide

4.1 Transaction Management in a Microservice Architecture

In a monolith, you can use a single ACID database transaction to keep data consistent. In a microservice architecture, each service has its own database. When an operation spans multiple services, you need a different approach to maintain consistency.

The Need for Distributed Transactions

Many business operations touch data owned by more than one service. For example, creating an order in the FTGO application requires verifying the consumer, validating the restaurant menu, authorizing the credit card, and creating a kitchen ticket. Each of these actions belongs to a different service with its own database.

In a monolith, all this data lives in one database, so a single transaction guarantees consistency. In a microservice architecture, there is no single database. You cannot wrap all these steps in one ACID transaction. You need a mechanism that coordinates updates across multiple service databases.

Analogy: Imagine a group project where each team member keeps their own notebook. There is no shared notebook. If the project needs changes across all notebooks at once, the team needs a protocol to make sure everyone updates their notebook correctly — or rolls back if something goes wrong.

The Trouble with Distributed Transactions

The traditional solution for cross-database consistency is distributed transactions, typically using the X/Open XA standard and two-phase commit (2PC). However, this approach has serious problems in a microservice architecture:

Many modern technologies do not support it — NoSQL databases (MongoDB, Cassandra) and message brokers (Apache Kafka) do not participate in XA transactions.
It reduces availability — all participants must be available for the transaction to commit. The overall availability is the product of each participant's availability, which drops quickly as you add more services.
CAP theorem: when a network partition occurs, a distributed system can preserve consistency or availability, but not both. Modern architectures usually favor availability over strict consistency.
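The availability point can be checked with a little arithmetic. The sketch below assumes every participant is independently available 99.5% of the time; that figure and the function name are illustrative assumptions, not measurements.

```python
import math


def combined_availability(availabilities):
    """Availability of an operation that needs every participant up at once."""
    return math.prod(availabilities)


# Illustrative assumption: each service is up 99.5% of the time.
for participants in (1, 2, 5, 10):
    value = combined_availability([0.995] * participants)
    print(participants, round(value, 4))
```

Under that assumption, ten participants together are available only about 95% of the time, even though each one is individually at 99.5%.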

Problem: Distributed transactions (2PC) require all participants to be available simultaneously. This reduces system availability and does not work with many modern databases and brokers. They are not a viable solution for microservices.

Conclusion: The failure of distributed transactions in microservices motivates the need for a different pattern — the Saga pattern.

Using the Saga Pattern to Maintain Data Consistency

A saga is a sequence of local transactions. Each local transaction updates a single service's database using standard ACID guarantees, and then publishes a message or event to trigger the next step. If any step fails, the saga executes compensating transactions in reverse order to undo the work done by previous steps.
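The control flow described here can be sketched in a few lines. This is a simplified, in-process illustration: the function name run_saga and the step names are hypothetical, and a real saga would trigger each step through messaging rather than direct calls.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure.

    Returns True if every action succeeded, False if the saga was rolled back.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Undo the already-committed local transactions, newest first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []


def fail_authorization():
    raise RuntimeError("card declined")


ok = run_saga([
    (lambda: log.append("create order"), lambda: log.append("reject order")),
    (lambda: log.append("create ticket"), lambda: log.append("reject ticket")),
    (fail_authorization, lambda: None),
])
```

After this run, ok is False and log shows the two forward steps followed by their compensations in reverse order: reject ticket, then reject order.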

Pattern: Saga

Problem How to maintain data consistency across services without distributed transactions?

Solution Define a saga — a sequence of local transactions coordinated via asynchronous messaging. On failure, compensating transactions undo completed steps in reverse order.

Properties ACD (Atomicity, Consistency, Durability) but NO Isolation.

Requires Transactional Messaging (Ch. 3) to atomically update the database and send messages.

Saga Properties: ACD without Isolation

A saga provides three of the four ACID properties:

Atomicity — either all local transactions execute, or all compensating transactions run to undo the work.
Consistency — each local transaction preserves consistency within its service. Cross-service consistency is maintained through the saga protocol.
Durability — each local transaction commits to its service's database, which provides durability.

A saga does not provide isolation. Other transactions can see intermediate data while the saga is in progress. This lack of isolation can cause anomalies, which are addressed in section 4.3.

Transaction Types in a Saga

The steps of a saga fall into three categories:

Type	Description	Compensation
Compensatable	Steps that can be undone. They precede the pivot transaction.	Yes — has a compensating transaction
Pivot	The go/no-go decision point. If it commits, the saga will run to completion.	No — it is the boundary
Retriable	Steps that follow the pivot. They are guaranteed to succeed.	No — always succeeds

Example: Create Order Saga

The Create Order Saga in FTGO has six steps:

1. Order Service: create order in APPROVAL_PENDING state (compensatable)
2. Consumer Service: verify consumer (compensatable)
3. Kitchen Service: create ticket (compensatable)
4. Accounting Service: authorize credit card (pivot)
5. Kitchen Service: approve ticket (retriable)
6. Order Service: approve order (retriable)

Steps 1 to 3 precede the pivot and are compensatable. Steps 1 and 3 have compensating transactions (reject order and reject ticket); step 2 only reads data, so it has nothing to undo. Step 4 is the pivot: if the credit card is authorized, the saga will definitely complete. Steps 5 and 6 are retriable: they are guaranteed to succeed eventually because the pivot has already committed.
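The ordering rule (compensatable steps, then the pivot, then retriable steps) can be written down as data and checked mechanically. The sketch below is only an illustration; the names StepType, CREATE_ORDER_SAGA, and is_well_formed are hypothetical.

```python
from enum import Enum


class StepType(Enum):
    COMPENSATABLE = "compensatable"
    PIVOT = "pivot"
    RETRIABLE = "retriable"


# The six steps of the Create Order Saga as (service, action, type).
CREATE_ORDER_SAGA = [
    ("Order Service", "create order", StepType.COMPENSATABLE),
    ("Consumer Service", "verify consumer", StepType.COMPENSATABLE),
    ("Kitchen Service", "create ticket", StepType.COMPENSATABLE),
    ("Accounting Service", "authorize credit card", StepType.PIVOT),
    ("Kitchen Service", "approve ticket", StepType.RETRIABLE),
    ("Order Service", "approve order", StepType.RETRIABLE),
]


def is_well_formed(steps):
    """Check for compensatable steps, then exactly one pivot, then retriable steps."""
    types = [step_type for _, _, step_type in steps]
    if types.count(StepType.PIVOT) != 1:
        return False
    pivot = types.index(StepType.PIVOT)
    before_ok = all(t is StepType.COMPENSATABLE for t in types[:pivot])
    after_ok = all(t is StepType.RETRIABLE for t in types[pivot + 1:])
    return before_ok and after_ok
```

A check like this makes the structure explicit: nothing that might need undoing is allowed after the go/no-go decision.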

Analogy: Think of booking a vacation. You reserve a flight, then a hotel, then a rental car. If the rental car is not available, you cancel the hotel and the flight in reverse order. The "credit card charge" step is the point of no return — once you pay, the rest of the trip will happen.

4.2 Coordinating Sagas

A saga involves multiple services that must execute their steps in the right order. There are two approaches to coordinate a saga: choreography and orchestration.

Aspect	Choreography	Orchestration
Coordination	Distributed — participants exchange events	Centralized — orchestrator sends commands
Coupling	Loose (participants share only events)	Loose between participants (only the orchestrator depends on them)
Complexity	Hard to understand as saga grows	Clear, linear flow
Dependencies	Risk of cyclic dependencies	No cyclic dependencies
Separation of concerns	Saga logic spread across participants	Saga logic centralized in orchestrator
Risk	Tight coupling through event chains	Bloated orchestrator with too much logic

Recommendation: Use orchestration for all but the simplest sagas. Keep the orchestrator focused only on sequencing — it should not contain business logic.

Important: Both choreography and orchestration require Transactional Messaging (covered in Ch. 3) to atomically update the database and send/publish messages in the same operation.

Choreography-Based Sagas

In a choreography-based saga, there is no central coordinator. Each participant listens for events from other participants and decides what to do next. When a participant completes its local transaction, it publishes an event. Other participants subscribe to that event and react.

How It Works

Order Service creates an order and publishes an OrderCreated event.
Consumer Service hears the event, verifies the consumer, and publishes a ConsumerVerified event.
Kitchen Service hears the event, creates a ticket, and publishes a TicketCreated event.
And so on — each step is triggered by the previous step's event.
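The event chain above can be imitated with a tiny in-memory publish/subscribe bus. This is a sketch under stated assumptions: the EventBus class is a hypothetical stand-in for a real message broker, and each handler does no real work beyond publishing the next event.

```python
from collections import defaultdict, deque


class EventBus:
    """In-memory stand-in for a message broker (an assumption of this sketch)."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.queue = deque()
        self.history = []

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        # Deliver events in publication order until none remain.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.history.append(event_type)
            for handler in self.subscribers[event_type]:
                handler(payload)


bus = EventBus()

# Each participant reacts to one event and publishes its own.
# No participant calls another participant directly.
bus.subscribe("OrderCreated", lambda order: bus.publish("ConsumerVerified", order))
bus.subscribe("ConsumerVerified", lambda order: bus.publish("TicketCreated", order))

bus.publish("OrderCreated", {"order_id": 1})
bus.run()
```

Notice that the overall flow is visible only by reading every subscription, which is exactly the comprehension problem choreography runs into as a saga grows.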

Benefits

Simple — no extra orchestrator service needed
Loose coupling — participants only know about events, not about each other

Drawbacks

Hard to understand — the saga logic is spread across all participants. There is no single place to see the full flow.
Cyclic dependencies — participants may subscribe to each other's events, creating dependency cycles.
Risk of tight coupling — participants may need to know about events from many other services, which creates implicit coupling.

Analogy: Choreography is like a dance where every dancer watches the others and decides their next move based on what they see. It works for a small group, but with many dancers it becomes chaotic because nobody has the full picture.

Orchestration-Based Sagas

In an orchestration-based saga, a central orchestrator object controls the saga. The orchestrator sends command messages to participants and receives reply messages. Based on each reply, it decides what command to send next.

How It Works

The orchestrator sends a command to the first participant (e.g., "verify consumer").
The participant processes the command and sends a reply (success or failure).
The orchestrator receives the reply and sends the next command (e.g., "create ticket").
If any step fails, the orchestrator sends compensating commands in reverse order.

Modeling as a State Machine

An orchestration-based saga is naturally modeled as a state machine. It has:

States — represent where the saga is in its sequence (e.g., VerifyingConsumer, CreatingTicket, AuthorizingCard)
Transitions — triggered by reply messages from participants
Actions — commands sent to participants when entering a new state

The state machine makes the saga easy to understand, test, and debug because the entire flow is visible in one place.
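A transition table is one simple way to express such a state machine. The sketch below is illustrative only: the first three state names come from the text, while the remaining states, the reply values, and the class name are assumptions made for the example.

```python
# (current state, reply) -> (next state, command to send on entering it)
TRANSITIONS = {
    ("VerifyingConsumer", "success"): ("CreatingTicket", "create ticket"),
    ("VerifyingConsumer", "failure"): ("RejectingOrder", "reject order"),
    ("CreatingTicket", "success"): ("AuthorizingCard", "authorize card"),
    ("CreatingTicket", "failure"): ("RejectingOrder", "reject order"),
    ("AuthorizingCard", "success"): ("ApprovingTicket", "approve ticket"),
    ("AuthorizingCard", "failure"): ("RejectingTicket", "reject ticket"),
    ("RejectingTicket", "success"): ("RejectingOrder", "reject order"),
    ("ApprovingTicket", "success"): ("ApprovingOrder", "approve order"),
    ("ApprovingOrder", "success"): ("OrderApproved", None),
    ("RejectingOrder", "success"): ("OrderRejected", None),
}


class CreateOrderStateMachine:
    def __init__(self):
        self.state = "VerifyingConsumer"
        self.sent = ["verify consumer"]  # action for the initial state

    def on_reply(self, reply):
        """Advance on a participant reply and record the command to send next."""
        next_state, command = TRANSITIONS[(self.state, reply)]
        self.state = next_state
        if command is not None:
            self.sent.append(command)
```

Because the whole flow lives in one table, both the happy path and the compensation path can be read, and tested, in one place.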

Benefits

Clear flow — the entire saga logic is in one place (the orchestrator)
No cyclic dependencies: the orchestrator depends on the participants, but the participants do not depend on the orchestrator or on each other
Better separation of concerns — participants only need to implement the command they receive; they do not need to know about the saga

Drawbacks

Risk of bloated orchestrator — if you put too much business logic in the orchestrator, it becomes a "god class." Keep it focused on sequencing only.

Analogy: Orchestration is like a conductor leading an orchestra. The conductor tells each section when to play and what to play. Each musician focuses on their own part. The conductor keeps everyone in sync and handles the overall flow.

4.3 Handling the Lack of Isolation

Unlike ACID transactions, sagas do not provide isolation. While a saga is running, other transactions can read and modify the same data. This can cause anomalies. This section describes the anomalies and the countermeasures to handle them.

Overview of Anomalies

The lack of isolation can cause three types of anomalies:

Anomaly	Description	Example
Lost Updates	One saga overwrites changes made by another saga without reading them first.	Saga A changes an order. Saga B also changes the same order. Saga A's changes are lost.
Dirty Reads	A saga or transaction reads updates made by another saga that has not yet completed and may still be compensated.	Saga B reads an order that Saga A created, but Saga A later rolls back and cancels the order.
Fuzzy / Nonrepeatable Reads	A saga reads the same data twice and gets different results because another saga modified it in between.	Saga A reads an order's total, then reads it again later and the total has changed because Saga B updated it.

Key problem: These anomalies can cause incorrect business decisions. For example, a dirty read might cause a saga to approve an order that will be cancelled moments later. You must use countermeasures to reduce or eliminate these risks.

Countermeasures for Handling the Lack of Isolation

There are six countermeasures that reduce or eliminate isolation anomalies. Choose the right one based on the specific anomaly and your business requirements.

1. Semantic Lock

Set a flag on the record to indicate it is being processed by a saga. For example, an order might have a status like APPROVAL_PENDING or CANCEL_PENDING. This acts as an application-level lock.

Other sagas or operations that see a *_PENDING state know the record is locked. They can either fail and ask the client to retry later, or block until the saga releases the lock.

Countermeasure: Semantic Lock

How Use *_PENDING states as application-level locks on records being modified by a saga.

Handles Lost Updates, Dirty Reads

Trade-off Other operations must handle locked records (retry or block).
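A semantic lock can be as small as a state check on the entity. The following is a minimal sketch using the fail-and-retry option; the class and method names are hypothetical, and only the *_PENDING state names come from the text.

```python
class OrderLockedError(Exception):
    """Raised when a saga holds the semantic lock; the caller may retry later."""


class Order:
    def __init__(self):
        # The first step of the creating saga sets the lock.
        self.state = "APPROVAL_PENDING"

    def is_locked(self):
        return self.state.endswith("_PENDING")

    def note_approved(self):
        # A later step of the same saga releases the lock.
        self.state = "APPROVED"

    def begin_cancel(self):
        # Another saga must not touch the order while it is locked.
        if self.is_locked():
            raise OrderLockedError(self.state)
        self.state = "CANCEL_PENDING"
```

Calling begin_cancel() on a freshly created order raises OrderLockedError; after note_approved(), the same call succeeds and takes the lock for the cancelling saga.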

2. Commutative Updates

Design your update operations so that they can be executed in any order and still produce the same result. If operations are commutative, the order in which concurrent sagas run does not matter, and lost updates are eliminated.

For example, instead of "set balance to X," use "add amount to balance" and "subtract amount from balance." These operations produce the same final result regardless of order.
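The difference is easy to demonstrate by applying the same operations in every possible order. The balances and amounts below are made-up illustrative values.

```python
from itertools import permutations


def apply_all(balance, operations):
    """Apply each operation to the balance in the given order."""
    for operation in operations:
        balance = operation(balance)
    return balance


# Commutative: relative adjustments to the balance.
deltas = [lambda b: b + 50, lambda b: b - 20, lambda b: b + 5]
commutative_results = {apply_all(100, order) for order in permutations(deltas)}

# Not commutative: absolute writes, where the last writer wins.
absolute_writes = [lambda b: 150, lambda b: 80]
absolute_results = {apply_all(100, order) for order in permutations(absolute_writes)}
```

Here commutative_results contains the single value 135, while absolute_results contains both 80 and 150, depending on which write happened last.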

3. Pessimistic View

Reorder the steps of a saga so that risky updates happen in retriable transactions (after the pivot) instead of compensatable transactions (before the pivot). Since retriable transactions are guaranteed to succeed, dirty reads of their intermediate state are less dangerous.

This reduces the window in which another saga can read inconsistent data.

4. Reread Value

Before updating a record, read it again and verify that it has not changed since the saga first read it. If the value has changed, abort the saga and restart it. This is essentially the Optimistic Offline Lock pattern.

This countermeasure prevents lost updates by detecting concurrent modifications.
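One common way to implement the reread check is a version number on each record. The sketch below assumes an in-memory store with a hypothetical version field; a real service would typically perform the same check in its database update.

```python
class StaleDataError(Exception):
    """Raised when the record changed after the saga first read it."""


class OrderStore:
    """In-memory stand-in for a table with a version column (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def insert(self, order_id, total):
        self.rows[order_id] = {"total": total, "version": 1}

    def read(self, order_id):
        return dict(self.rows[order_id])  # copy, like a snapshot read

    def update_total(self, order_id, new_total, expected_version):
        row = self.rows[order_id]
        # Reread: refuse the write if someone else changed the row meanwhile.
        if row["version"] != expected_version:
            raise StaleDataError(order_id)
        row["total"] = new_total
        row["version"] += 1
```

If two sagas read version 1 and both try to write, the first update succeeds and bumps the version to 2; the second raises StaleDataError and can abort and restart.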

5. Version File

Instead of applying updates directly, record the operations in a log (version file). Then reorder the logged operations so that they become commutative before applying them. This turns noncommutative operations into commutative ones.

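For example, when the Create Order Saga and a Cancel Order Saga run concurrently, the Accounting Service might receive the request to cancel a credit card authorization before the request to authorize the card. Because the version file records the cancel request, the service can skip the authorization when the authorize request finally arrives.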
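The idea can be sketched with a per-order operation log. The scenario is a hypothetical one in which a cancel request for an authorization may overtake the authorize request; the class and method names are assumptions made for the example.

```python
class AuthorizationLog:
    """Records operations per order so out-of-order requests can be reconciled."""

    def __init__(self):
        self.operations = {}   # order_id -> operation names seen so far
        self.authorized = set()

    def cancel_authorization(self, order_id):
        self.operations.setdefault(order_id, []).append("cancel")
        self.authorized.discard(order_id)

    def authorize(self, order_id):
        seen = self.operations.setdefault(order_id, [])
        seen.append("authorize")
        # A cancel already recorded for this order wins, whatever the arrival order.
        if "cancel" not in seen:
            self.authorized.add(order_id)
```

Whether authorize arrives before or after cancel, the order ends up unauthorized, so the two operations now behave as if they were commutative.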

6. By Value

Dynamically choose the concurrency mechanism based on the business risk of each request. Low-risk requests use a saga (with its weaker isolation). High-risk requests use a distributed transaction (with stronger guarantees but lower availability).

This is a pragmatic approach: most requests are low-risk, so the saga handles them efficiently. Only critical operations pay the cost of stricter consistency.
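In code, the decision can be a simple per-request rule. The threshold, its value, and the function name below are hypothetical; a real application would derive the rule from its own business risk.

```python
HIGH_RISK_THRESHOLD = 10_000  # hypothetical cutoff, in the account currency


def choose_mechanism(amount):
    """Pick a concurrency mechanism from the business risk of one request."""
    if amount >= HIGH_RISK_THRESHOLD:
        return "distributed_transaction"
    return "saga"
```

With this rule, a request for 25 runs as a saga, while a request for 50,000 pays the availability cost of a distributed transaction.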

Countermeasure	Anomaly Addressed	How It Works
Semantic Lock	Lost Updates, Dirty Reads	Application-level *_PENDING lock flag
Commutative Updates	Lost Updates	Design operations to be order-independent
Pessimistic View	Dirty Reads	Move risky updates to retriable transactions
Reread Value	Lost Updates	Verify data unchanged before update (optimistic lock)
Version File	Fuzzy Reads	Log and reorder operations before applying
By Value	All (selective)	Choose saga or distributed transaction per request risk

4.4 The Design of the Order Service and the Create Order Saga

This section shows how the Saga pattern is implemented in practice using the Create Order Saga from the FTGO application. It uses the Eventuate Tram Saga framework for orchestration.

The OrderService Class

The OrderService class is the entry point for creating orders. When it receives a "create order" request, it does two things in a single transaction:

Creates an Order entity in the database (in the APPROVAL_PENDING state).
Creates a CreateOrderSagaState object that holds the saga's persistent state.

It then calls SagaManager.create() to start the saga. The SagaManager takes over and drives the saga through its steps.

Note: The Order Service acts as both the orchestrator (it owns the saga) and a participant (it receives commands to approve or reject the order). This dual role is common in practice.

The Implementation of the Create Order Saga

CreateOrderSaga — The Saga Definition

The CreateOrderSaga class is a singleton that defines the saga's steps. It uses the Eventuate Tram Saga DSL (domain-specific language) to declare each step:

step() — defines a step in the saga
invokeParticipant() — specifies the command to send to a participant
withCompensation() — specifies the compensating command if the saga must roll back
onReply() — handles the participant's reply

The saga definition is purely declarative. It describes what to do, not how to do it. The framework handles the execution.
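To show what "purely declarative" means, here is a small Python imitation of a fluent saga builder. It is not the Eventuate Tram Saga API, which is a Java library; the class SagaBuilder, its snake_case methods, and the command strings are hypothetical and only mirror the four DSL concepts listed above.

```python
class SagaBuilder:
    """Hypothetical fluent builder; it records steps and executes nothing."""

    def __init__(self):
        self.steps = []

    def step(self):
        self.steps.append({"command": None, "compensation": None, "on_reply": None})
        return self

    def invoke_participant(self, command):
        self.steps[-1]["command"] = command
        return self

    def with_compensation(self, command):
        self.steps[-1]["compensation"] = command
        return self

    def on_reply(self, handler):
        self.steps[-1]["on_reply"] = handler
        return self

    def build(self):
        return list(self.steps)


def record_ticket_id(state, reply):
    # Copy data from a participant's reply into the saga state.
    state["ticket_id"] = reply["ticket_id"]


definition = (
    SagaBuilder()
    .step().with_compensation("reject order")
    .step().invoke_participant("verify consumer")
    .step().invoke_participant("create ticket")
           .on_reply(record_ticket_id)
           .with_compensation("reject ticket")
    .step().invoke_participant("authorize card")
    .step().invoke_participant("approve ticket")
    .step().invoke_participant("approve order")
    .build()
)
```

The result is plain data describing six steps. A separate engine would decide when to send each command and when to run the compensations.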

CreateOrderSagaState — The Persistent State

The CreateOrderSagaState class holds data that the saga needs across steps. It creates command messages for each participant and processes reply messages. The SagaManager persists this state in a SAGA_INSTANCE table so the saga can survive restarts and failures.

Saga Participant Proxy Classes

Each participant has a proxy class (e.g., KitchenServiceProxy, OrderServiceProxy) that defines the participant's messaging API:

The message channel to send commands to
The command types the participant accepts
The reply types the participant sends back

These proxy classes provide a clean, typed interface for the orchestrator to communicate with participants.

SagaManager — The Execution Engine

The SagaManager is the framework component that drives the saga. It:

Persists the saga instance in the SAGA_INSTANCE database table.
Sends the first command message to the first participant.
Subscribes to reply messages from participants.
On each reply, advances the state machine to the next state and sends the next command.
On failure, sends compensating commands in reverse order.
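The persist-then-advance loop can be sketched with SQLite from the Python standard library. Everything here is an assumption made for illustration: the MiniSagaManager class, the table columns, and the outbox list that stands in for a message broker. The sketch covers only successful replies and omits compensation and transactional messaging.

```python
import json
import sqlite3

STEPS = ["verify consumer", "create ticket", "authorize card"]


class MiniSagaManager:
    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS saga_instance ("
            "id INTEGER PRIMARY KEY, step INTEGER, state TEXT)"
        )
        self.outbox = []  # commands a real system would send through a broker

    def create(self, saga_state):
        """Persist a new saga instance and send the first command."""
        cursor = self.conn.execute(
            "INSERT INTO saga_instance (step, state) VALUES (0, ?)",
            (json.dumps(saga_state),),
        )
        self.conn.commit()
        self.outbox.append((cursor.lastrowid, STEPS[0]))
        return cursor.lastrowid

    def handle_success_reply(self, saga_id):
        """Load the persisted position, advance it, and send the next command."""
        (step,) = self.conn.execute(
            "SELECT step FROM saga_instance WHERE id = ?", (saga_id,)
        ).fetchone()
        step += 1
        self.conn.execute(
            "UPDATE saga_instance SET step = ? WHERE id = ?", (step, saga_id)
        )
        self.conn.commit()
        if step < len(STEPS):
            self.outbox.append((saga_id, STEPS[step]))
```

Because the current step lives in the table rather than in memory, a second manager created on the same database can pick up a saga where the first one left off.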

Implementation Architecture

Orchestrator CreateOrderSaga (defines steps) + CreateOrderSagaState (holds data)

Engine SagaManager persists saga state and drives the state machine

Proxies KitchenServiceProxy, OrderServiceProxy — define participant channels and command/reply types

Persistence SAGA_INSTANCE table stores saga state across steps

Framework Eventuate Tram Saga

The OrderCommandHandlers Class

The OrderCommandHandlers class is an adapter that handles commands sent by sagas to the Order Service. Remember: the Order Service is both the orchestrator and a participant. When the saga sends a command like "approve order" or "reject order," the OrderCommandHandlers class receives it and calls the appropriate method on the Order entity.

A SagaCommandDispatcher routes incoming command messages to the correct handler method in OrderCommandHandlers. This dispatcher is configured to listen on the Order Service's command channel.
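The routing idea can be sketched as a lookup from command type to handler method. The Python classes below are hypothetical stand-ins, not the framework's classes; the message shape (a dictionary with type and payload keys) and the channel name are also assumptions.

```python
class Order:
    def __init__(self):
        self.state = "APPROVAL_PENDING"


class OrderCommandHandlers:
    """Adapter: turns command messages into calls on Order entities."""

    def __init__(self, orders):
        self.orders = orders

    def approve_order(self, payload):
        self.orders[payload["order_id"]].state = "APPROVED"
        return "success"

    def reject_order(self, payload):
        self.orders[payload["order_id"]].state = "REJECTED"
        return "success"


class CommandDispatcher:
    """Routes a command message to the handler registered for its type."""

    def __init__(self, channel):
        self.channel = channel
        self.routes = {}

    def register(self, command_type, handler):
        self.routes[command_type] = handler

    def dispatch(self, message):
        handler = self.routes[message["type"]]
        return handler(message["payload"])


orders = {1: Order(), 2: Order()}
handlers = OrderCommandHandlers(orders)
dispatcher = CommandDispatcher(channel="orderService")
dispatcher.register("ApproveOrder", handlers.approve_order)
dispatcher.register("RejectOrder", handlers.reject_order)
```

Dispatching a message of type ApproveOrder for order 1 moves that order to APPROVED and returns a success reply for the orchestrator.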

Pattern: The separation between the saga definition (CreateOrderSaga), the saga state (CreateOrderSagaState), and the command handlers (OrderCommandHandlers) keeps the code organized. Each class has a single responsibility.

The OrderServiceConfiguration Class

The OrderServiceConfiguration class wires everything together. It configures:

The CreateOrderSaga singleton — the saga definition
The SagaManager — the execution engine
The OrderCommandHandlers — the command handlers for when Order Service is a participant
The SagaCommandDispatcher — routes commands to handlers
The participant proxy classes — define messaging APIs for other services

This configuration class is the glue that connects the orchestrator, the participants, and the messaging infrastructure.

Summary of the architecture: OrderService creates the order and starts the saga. SagaManager drives the state machine, sending commands to participants via proxy classes and processing their replies. OrderCommandHandlers handle commands sent to the Order Service itself. The framework persists all saga state so it survives failures.

Key Takeaways

In a microservice architecture, each service has its own database. Cross-service operations cannot use a single ACID transaction.
Distributed transactions (X/Open XA, 2PC) are a poor fit for microservices because many modern technologies do not support them and they reduce availability.
The Saga pattern maintains consistency through a sequence of local transactions coordinated via asynchronous messaging. On failure, compensating transactions undo completed steps.
Sagas provide Atomicity, Consistency, and Durability — but not Isolation.
Saga steps are classified as compensatable (before the pivot), pivot (the go/no-go point), or retriable (after the pivot, guaranteed to succeed).
Choreography coordinates sagas through events exchanged between participants — simple but hard to understand at scale.
Orchestration uses a central orchestrator modeled as a state machine — clearer flow, recommended for most sagas.
The lack of isolation causes three anomalies: Lost Updates, Dirty Reads, and Fuzzy Reads.
Six countermeasures address these anomalies: Semantic Lock, Commutative Updates, Pessimistic View, Reread Value, Version File, and By Value.
In practice, the Eventuate Tram Saga framework provides a DSL for defining sagas, a SagaManager for execution, and participant proxy classes for typed communication.
A service can be both the saga orchestrator and a saga participant — the Order Service in FTGO plays both roles.
